[Mesa-dev] [Bug 104302] Wolfenstein 2 (2017) under wine graphical artifacting on RADV

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104302

gloriouseggr...@gmail.com changed:

   What|Removed |Added

 CC||gloriouseggr...@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 2/2] anv: enable VK_EXT_shader_viewport_index_layer

2018-02-27 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

However, I think we should not push this until we have testing coverage
for using gl_Layer from Vertex and TessEval shaders. So far you have only
submitted a test for gl_ViewportIndex to the CTS, right?

Iago

On Fri, 2018-02-23 at 14:09 -0800, Caio Marcelo de Oliveira Filho
wrote:
> ---
>  src/intel/vulkan/anv_extensions.py | 1 +
>  src/intel/vulkan/anv_pipeline.c| 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/src/intel/vulkan/anv_extensions.py
> b/src/intel/vulkan/anv_extensions.py
> index 581921e62a..00760fdd4e 100644
> --- a/src/intel/vulkan/anv_extensions.py
> +++ b/src/intel/vulkan/anv_extensions.py
> @@ -86,6 +86,7 @@ EXTENSIONS = [
>  Extension('VK_KHX_multiview', 1, True),
>  Extension('VK_EXT_debug_report',  8, True),
>  Extension('VK_EXT_external_memory_dma_buf',   1, True),
> +Extension('VK_EXT_shader_viewport_index_layer',   1, True),
>  ]
>  
>  class VkVersion:
> diff --git a/src/intel/vulkan/anv_pipeline.c
> b/src/intel/vulkan/anv_pipeline.c
> index e16a7a1994..115e1374d3 100644
> --- a/src/intel/vulkan/anv_pipeline.c
> +++ b/src/intel/vulkan/anv_pipeline.c
> @@ -143,6 +143,7 @@ anv_shader_compile_to_nir(struct anv_pipeline
> *pipeline,
>   .multiview = true,
>   .variable_pointers = true,
>   .storage_16bit = device->instance->physicalDevice.info.gen
> >= 8,
> + .shader_viewport_index_layer = true,
>},
> };
>  


Re: [Mesa-dev] [PATCH v3 1/2] spirv: Add SpvCapabilityShaderViewportIndexLayerEXT

2018-02-27 Thread Iago Toral
This patch is:

Reviewed-by: Iago Toral Quiroga 

On Fri, 2018-02-23 at 14:09 -0800, Caio Marcelo de Oliveira Filho
wrote:
> This capability allows gl_ViewportIndex and gl_Layer to also be used
> as outputs in Vertex and Tessellation shaders.
> 
> v2: Make conditional on the capability, add gl_Layer, add tessellation
> shaders. (Iago)
> 
> v3: Don't export to tessellation control shader.
> ---
>  src/compiler/shader_info.h | 1 +
>  src/compiler/spirv/spirv_to_nir.c  | 4 
>  src/compiler/spirv/vtn_variables.c | 8 
>  3 files changed, 13 insertions(+)
> 
> diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h
> index e7fd7dbe62..2fcbde74be 100644
> --- a/src/compiler/shader_info.h
> +++ b/src/compiler/shader_info.h
> @@ -43,6 +43,7 @@ struct spirv_supported_capabilities {
> bool multiview;
> bool variable_pointers;
> bool storage_16bit;
> +   bool shader_viewport_index_layer;
>  };
>  
>  typedef struct shader_info {
> diff --git a/src/compiler/spirv/spirv_to_nir.c
> b/src/compiler/spirv/spirv_to_nir.c
> index c6df764682..fdb2993db5 100644
> --- a/src/compiler/spirv/spirv_to_nir.c
> +++ b/src/compiler/spirv/spirv_to_nir.c
> @@ -3203,6 +3203,10 @@ vtn_handle_preamble_instruction(struct
> vtn_builder *b, SpvOp opcode,
>   spv_check_supported(storage_16bit, cap);
>   break;
>  
> +  case SpvCapabilityShaderViewportIndexLayerEXT:
> + spv_check_supported(shader_viewport_index_layer, cap);
> + break;
> +
>default:
>   vtn_fail("Unhandled capability");
>}
> diff --git a/src/compiler/spirv/vtn_variables.c
> b/src/compiler/spirv/vtn_variables.c
> index 9eb85c24e9..75d8aec5da 100644
> --- a/src/compiler/spirv/vtn_variables.c
> +++ b/src/compiler/spirv/vtn_variables.c
> @@ -1192,6 +1192,10 @@ vtn_get_builtin_location(struct vtn_builder
> *b,
>   *mode = nir_var_shader_in;
>else if (b->shader->info.stage == MESA_SHADER_GEOMETRY)
>   *mode = nir_var_shader_out;
> +  else if (b->options && b->options-
> >caps.shader_viewport_index_layer &&
> +   (b->shader->info.stage == MESA_SHADER_VERTEX ||
> +b->shader->info.stage == MESA_SHADER_TESS_EVAL))
> + *mode = nir_var_shader_out;
>else
>   vtn_fail("invalid stage for SpvBuiltInLayer");
>break;
> @@ -1199,6 +1203,10 @@ vtn_get_builtin_location(struct vtn_builder
> *b,
>*location = VARYING_SLOT_VIEWPORT;
>if (b->shader->info.stage == MESA_SHADER_GEOMETRY)
>   *mode = nir_var_shader_out;
> +  else if (b->options && b->options-
> >caps.shader_viewport_index_layer &&
> +   (b->shader->info.stage == MESA_SHADER_VERTEX ||
> +b->shader->info.stage == MESA_SHADER_TESS_EVAL))
> + *mode = nir_var_shader_out;
>else if (b->shader->info.stage == MESA_SHADER_FRAGMENT)
>   *mode = nir_var_shader_in;
>else


Re: [Mesa-dev] [PATCH 04/12] vbo: Implement vbo_loopback_vertex_list in terms of the VAO.

2018-02-27 Thread Mathias Fröhlich
Hi Brian,

On Wednesday, 28 February 2018 00:55:33 CET Brian Paul wrote:
> Minor nit-picks below.
> > @@ -94,17 +94,15 @@ struct loopback_attr {
> >*/
> >   static void
> >   loopback_prim(struct gl_context *ctx,
> > -  const GLfloat *buffer,
> > +  const GLubyte *buffer,
> > const struct _mesa_prim *prim,
> > GLuint wrap_count,
> > -  GLuint vertex_size,
> > +  GLuint stride,
> > const struct loopback_attr *la, GLuint nr)
> >   {
> > -   GLint start = prim->start;
> > -   GLint end = start + prim->count;
> > -   const GLfloat *data;
> > -   GLint j;
> > -   GLuint k;
> > +   GLuint start = prim->start;
> > +   GLuint end = start + prim->count;
> 
> const on those two?

start may be modified afterwards by that wrap_count offset.
So I can't constify start. But end is now constified.

Thanks for the review!!

best

Mathias
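
The reason start cannot be const while end can is sketched below in plain C.
The struct prim_range, the function replayed_vertex_count and the
is_continuation flag are hypothetical names for illustration, not the vbo
code; the sketch assumes the wrap_count offset is applied only to a
primitive that continues a wrapped one.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two _mesa_prim fields used here. */
struct prim_range {
   uint32_t start;
   uint32_t count;
};

/* Returns how many vertices would be replayed for this primitive. */
static uint32_t
replayed_vertex_count(const struct prim_range *prim,
                      uint32_t wrap_count, int is_continuation)
{
   uint32_t start = prim->start;             /* adjusted below, so not const */
   const uint32_t end = start + prim->count; /* fixed once computed */

   /* Assumption: a continued primitive skips the wrapped vertices. */
   if (is_continuation)
      start += wrap_count;

   return end > start ? end - start : 0;
}
```

Because end is computed before start is adjusted, only end can carry the
const qualifier.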





Re: [Mesa-dev] [PATCH 11/12] vbo: Remove vbo_save_vertex_list::buffer_offset.

2018-02-27 Thread Mathias Fröhlich
Hi Brian,

On Wednesday, 28 February 2018 00:56:36 CET Brian Paul wrote:
> On 02/26/2018 11:12 PM, mathias.froehl...@gmx.net wrote:
> > From: Mathias Fröhlich 
> > 
> > The buffer_offset is used in aligned_vertex_buffer_offset.
> > But now that most of these decisions are done in compile_vertex_list
> > we can work on local variables instead of struct members in the
> > display list code. Clean that up and remove buffer_offset.
> 
> I presume the optimization I implemented here still works after 
> this change.

I looked at what you last did there, and I have carefully tried to keep
that behavior.

Well, the major purpose of the bigger series is that direct OpenGL API
users, as well as internal users like the dlist and immediate mode code,
can build up VAOs that already contain just a single buffer object binding
and so on. It also gives the mesa layer a chance to see early that there is
no change in the vertex arrays.

So, what I mention in the cover letter, that more optimization should be
possible, refers to at least one completely unfinished change that I tried
while playing around to check for more optimizations. The display list
compiler now keeps the VAOs from the previous list. One thing that we can
do now is to apply your optimization against the offset of the previous
display list's VAOs. The idea is that a lot of calling code compiles its
display lists in the same order in which it later executes them. So
checking the remainder of the division against the previous VAO's offset,
instead of the buffer object's start offset, helps much more often. Then,
if we can, as a first order optimization, keep the dlist compiler's VAOs as
long as possible, we do not flag DriverFlags.NewArray, and the driver in
turn does not even need to look at the arrays to detect changes.
I'll try to split out that easy change from the hackeries for review within
the next week ...
But apart from that, the dlist compiler can now be hacked to keep the same
VAO used in the previous list, by offsetting the primitives or padding
vertices or what not, to share the same pair of VAOs across more successive
dlist nodes.
You can be pretty creative here ...

BTW: I am only mentioning legacy draw entry points here. But note that the
legacy entry points now basically use the same basic entry point that a
modern OpenGL application uses. That means optimizing the modern main draw
entry point no longer partly collides with the already present dlist
optimizations.

The next changes will try to incrementally address the way from the VAO down
into the drivers.

> If so, and with the minor comments on patch 4, the series LGTM.
> 
> Reviewed-by: Brian Paul 
> 
> Nice work!

Thanks for the review!!
I will apply the requested changes!
And rerun the tests wrt the inserted assertions.

best

Mathias
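
The remainder check discussed above can be sketched in plain C. This is one
reading of the idea, not the vbo code: the function offset_reuses_binding
and all of its parameters are hypothetical, and it assumes a new list can
share an existing binding when its data starts a whole number of vertices
after the offset that binding already uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: can vertex data at 'offset' be drawn through a
 * binding that starts at 'base_offset' with the given stride, just by
 * picking a different first vertex? */
static bool
offset_reuses_binding(size_t offset, size_t base_offset, size_t stride,
                      size_t *first_vertex)
{
   if (stride == 0 || offset < base_offset)
      return false;

   const size_t delta = offset - base_offset;

   /* A non-zero remainder means the data is not vertex aligned relative
    * to the existing binding, so a new binding would be needed. */
   if (delta % stride != 0)
      return false;

   *first_vertex = delta / stride;
   return true;
}
```

With a 32 byte stride, data at offset 4136 is not aligned relative to a
buffer start of 0, but it is aligned relative to a previous list's offset
of 4104, which is the case where comparing against the previous VAO would
help.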





Re: [Mesa-dev] [PATCH v2] i965/sbe: fix number of inputs for active components

2018-02-27 Thread Iago Toral
Can someone review this patch? The bug it fixes is blocking the 18.0
release.

Iago

On Tue, 2018-02-27 at 08:02 +0100, Iago Toral Quiroga wrote:
> In 16631ca30ea6 we fixed gen9 active components to account for padded
> inputs in the URB, which we can have with SSO programs. To do that,
> instead of going through the bitfield of inputs (which doesn't
> include
> padding information), we compute the number of inputs from the size
> of the URB entry.
> 
> Unfortunately, there are some special inputs that are not stored in
> the URB and that we also need to account for. These special inputs
> are identified and handled during calculate_attr_overrides(), so this
> patch modifies this function to return a value with the total number
> of inputs, including the ones that are not stored in the URB, so we
> can use that number to program the correct number of active
> components.
> 
> This fixes a regression in a WebGL program that uses Point Sprite
> functionality (specifically, VARYING_SLOT_PNTC).
> 
> v2:
>  - Add 'Fixes' tag (Mark Janes)
>  - make no_vue_inputs int instead of uint32_t, and add const
> qualifier
>to num_inputs variable (Ian)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105224
> Fixes: 16631ca30ea6 (i965/sbe: fix active components for SSO programs
> with over 16 inputs)
> ---
>  src/mesa/drivers/dri/i965/genX_state_upload.c | 31
> ---
>  1 file changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c
> b/src/mesa/drivers/dri/i965/genX_state_upload.c
> index 8668abd591..891bab1746 100644
> --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
> @@ -1015,7 +1015,7 @@ genX(get_attr_override)(struct
> GENX(SF_OUTPUT_ATTRIBUTE_DETAIL) *attr,
>  }
>  
>  
> -static void
> +static int
>  genX(calculate_attr_overrides)(const struct brw_context *brw,
> struct
> GENX(SF_OUTPUT_ATTRIBUTE_DETAIL) *attr_overrides,
> uint32_t *point_sprite_enables,
> @@ -1064,6 +1064,7 @@ genX(calculate_attr_overrides)(const struct
> brw_context *brw,
>  */
> bool drawing_points = brw_is_drawing_points(brw);
>  
> +   int no_vue_inputs = 0;
> for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
>int input_index = wm_prog_data->urb_setup[attr];
>  
> @@ -1097,6 +1098,12 @@ genX(calculate_attr_overrides)(const struct
> brw_context *brw,
>   &max_source_attr);
>}
>  
> +  if (point_sprite ||
> +  (attr == VARYING_SLOT_PRIMITIVE_ID &&
> +   brw->vue_map_geom_out.varying_to_slot[attr] == -1)) {
> + no_vue_inputs++;
> +  }
> +
>/* The hardware can only do the overrides on 16 overrides at a
> * time, and the other up to 16 have to be lined up so that
> the
> * input index = the output index.  We'll need to do some
> @@ -1124,6 +1131,8 @@ genX(calculate_attr_overrides)(const struct
> brw_context *brw,
>  * Similar text exists for Ivy Bridge.
>  */
> *urb_entry_read_length = DIV_ROUND_UP(max_source_attr + 1, 2);
> +
> +   return *urb_entry_read_length * 2 + no_vue_inputs;
>  }
>  #endif
>  
> @@ -3434,11 +3443,12 @@ genX(upload_sbe)(struct brw_context *brw)
> * BRW_NEW_GS_PROG_DATA | BRW_NEW_PRIMITIVE |
> BRW_NEW_TES_PROG_DATA |
> * BRW_NEW_VUE_MAP_GEOM_OUT
> */
> -  genX(calculate_attr_overrides)(brw,
> - attr_overrides,
> - &point_sprite_enables,
> - &urb_entry_read_length,
> - &urb_entry_read_offset);
> +  const int num_inputs =
> + genX(calculate_attr_overrides)(brw,
> +attr_overrides,
> +&point_sprite_enables,
> +&urb_entry_read_length,
> +&urb_entry_read_offset);
>  
>/* Typically, the URB entry read length and offset should be
> programmed
> * in 3DSTATE_VS and 3DSTATE_GS; SBE inherits it from the last
> active
> @@ -3459,8 +3469,13 @@ genX(upload_sbe)(struct brw_context *brw)
>  #endif
>  
>  #if GEN_GEN >= 9
> -  /* prepare the active component dwords */
> -  const int num_inputs = urb_entry_read_length * 2;
> +  /* prepare the active component dwords
> +   *
> +   * For this, we need to account for padded inputs (that we can
> have with
> +   * SSO programs), so we take the number of inputs from the
> size of the
> +   * URB entry. To that, we need to add the number of special
> inputs, if
> +   * any, that are not in the URB (such as point sprite inputs).
> +   */
>for (int input_index = 0; input_index < num_inputs;
> input_index++) {
>   sbe.AttributeActiveComponentFormat[input_index] =
> 
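
The input count described in the quoted commit message can be sketched as a
small standalone C function. The name count_sbe_inputs is hypothetical; the
arithmetic mirrors the return statement in the quoted diff, where the URB
entry read length counts pairs of attributes and the inputs that are not
stored in the URB are added on top.

```c
/* Same rounding helper the quoted diff relies on. */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Hypothetical wrapper around the arithmetic from the quoted patch:
 * max_source_attr is the highest URB attribute index that is read,
 * no_vue_inputs is the number of special inputs not stored in the URB
 * (for example point sprite inputs). */
static int
count_sbe_inputs(int max_source_attr, int no_vue_inputs)
{
   /* The read length is expressed in units of two attributes. */
   const int urb_entry_read_length = DIV_ROUND_UP(max_source_attr + 1, 2);

   /* Padded URB inputs are covered by the read length; the special
    * inputs have to be added separately. */
   return urb_entry_read_length * 2 + no_vue_inputs;
}
```

Without the second term, a program reading one point sprite input on top of
its URB inputs would leave the last active component entry unprogrammed.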

Re: [Mesa-dev] [PATCH 4/4] i965/tiled_memcpy: Add support for Yf and Ys tiling/detiling

2018-02-27 Thread Scott D Phillips
Ilia Mirkin  writes:

> On Feb 27, 2018 11:22 PM, "Scott D Phillips" 
> wrote:
>
> > Yf and Ys are a family of tilings similar to Y. The actual address
> > bit interleavings for Yf* and Ys* depend upon the bits-per-pixel
> > value of the surface, where 128-, 32-, and 8-bpp tiles are square
> > and 64- and 16-bpp tiles have a 2:1 aspect ratio.
> > 
> > The address bit layout of Yf and Ys are the same in the low
> > 12 bits (4-kbytes); however Ys tiles are actually 64-kbytes in
> > size, but can be handled as if they were composed of 4-kbyte
> > sub-tiles with a different overall tile ordering.
> > 
> > Because all of Y, Yf*, and Ys* have the same least significant
> > 6 bits of address bit layout, the same tiling/detiling routine can
> > be used between them. The inner loop that writes or reads cache
> > lines at a time is the same, while the outer loop (which walks in
> > a mostly linear order) uses different address increment values for
> > each tiling format to adjust the tile addresses.
> > ---
> >  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 219
> > -
> >  1 file changed, 179 insertions(+), 40 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> > b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> > index a78e2b97d45..2b040a69524 100644
> > --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> > +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> > @@ -57,6 +57,12 @@ static const uint32_t xtile_span = 64;
> >  static const uint32_t ytile_width = 128;
> >  static const uint32_t ytile_height = 32;
> >  static const uint32_t ytile_span = 16;
> > +static const uint32_t std_ytile128_width = 256;
> > +static const uint32_t std_ytile128_height = 16;
> > +static const uint32_t std_ytile32_width = 128;
> > +static const uint32_t std_ytile32_height = 32;
> > +static const uint32_t std_ytile8_width = 64;
> > +static const uint32_t std_ytile8_height = 64;
> > 
> >  static inline uint32_t
> >  ror(uint32_t n, uint32_t d)
> > @@ -253,6 +259,48 @@ ytile_addr(uint32_t x, uint32_t y, char *src, uint32_t
> > src_pitch)
> > return src + (((y >> 5) * (src_pitch >> 7) + (x >> 7)) << 12);
> >  }
> > 
> > +static char *
> > +yf128_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
> > +{
> > +   return src + (((y >> 4) * (src_pitch >> 8) + (x >> 8)) << 12);
> > +}
> > +
> > +static char *
> > +yf32_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
> > +{
> > +   return src + (((y >> 5) * (src_pitch >> 7) + (x >> 7)) << 12);
> > +}
> > +
> > +static char *
> > +yf8_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
> > +{
> > +   return src + (((y >> 6) * (src_pitch >> 6) + (x >> 6)) << 12);
> > +}
> > +
> > +static char *
> > +ys128_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
> > +{
> > +   return src + (((y & 0x10) << 8) + ((y & 0x20) << 9) + ((x & 0x100) <<
> > 5) +
> > + ((x & 200) << 6) +
>
>
> 0x200?
>
> > + (((y >> 6) * (src_pitch >> 10) + (x >> 10)) << 16));
> > +}
> > +
> > +static char *
> > +ys32_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
> > +{
> > +   return src + (((y & 0x20) << 7) + ((y & 0x40) << 8) + ((x & 0x80) << 6)
> > +
> > + ((x & 100) << 7) +
>
>
> 0x100? Looks like there are more of these below.

doh, why can't the compiler always tell me when i'm being dumb? as you
can obviously tell the ys and yf stuff is just coded up, not tested yet
(though the y0 stuff is tested). thanks, fixed up locally.

-- 
scott
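
The literal mix-up pointed out in the review is easy to reproduce in
isolation. The two helpers below are hypothetical and only contrast the
decimal literal 200 with the hexadecimal 0x200 when they are used as bit
masks; the compiler accepts both because each is a valid integer constant.

```c
#include <stdint.h>

/* Decimal 200 is 0xC8, so this keeps bits 3, 6 and 7 of x. */
static uint32_t
mask_decimal(uint32_t x)
{
   return x & 200;
}

/* Hexadecimal 0x200 keeps bit 9 of x only, which is what an address
 * swizzle selecting a single coordinate bit needs. */
static uint32_t
mask_hex(uint32_t x)
{
   return x & 0x200;
}
```

For an x coordinate with only bit 9 set, the decimal mask yields zero, so
the corresponding address bit would silently never be set.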


[Mesa-dev] [PATCH] r600/shader: when using images always load thread id gpr at start (v2)

2018-02-27 Thread Dave Airlie
From: Dave Airlie 

The delayed loading code would fail if we had control flow.

This fixes:
tests/spec/arb_shader_image_load_store/execution/image_checkerboard.shader_test

v2: don't use temp_reg before setting temp_reg up.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/r600_shader.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 1256e271b5b..5ddc5987326 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -363,7 +363,6 @@ struct r600_shader_ctx {
unsignedtess_input_info; /* temp with 
tess input offsets */
unsignedtess_output_info; /* temp with 
tess input offsets */
unsignedthread_id_gpr; /* temp with 
thread id calculated for images */
-   bool thread_id_gpr_loaded;
 };
 
 struct r600_shader_tgsi_instruction {
@@ -3275,9 +3274,6 @@ static int load_thread_id_gpr(struct r600_shader_ctx *ctx)
struct r600_bytecode_alu alu;
int r;
 
-   if (ctx->thread_id_gpr_loaded)
-   return 0;
-
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
alu.op = ALU_OP1_MBCNT_32LO_ACCUM_PREV_INT;
alu.dst.sel = ctx->temp_reg;
@@ -3322,7 +3318,6 @@ static int load_thread_id_gpr(struct r600_shader_ctx *ctx)
   ctx->temp_reg, 0);
if (r)
return r;
-   ctx->thread_id_gpr_loaded = true;
return 0;
 }
 
@@ -3431,12 +3426,12 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
ctx.gs_next_vertex = 0;
ctx.gs_stream_output_info = &so;
 
+   ctx.thread_id_gpr = -1;
ctx.face_gpr = -1;
ctx.fixed_pt_position_gpr = -1;
ctx.fragcoord_input = -1;
ctx.colors_used = 0;
ctx.clip_vertex_write = 0;
-   ctx.thread_id_gpr_loaded = false;
 
ctx.helper_invoc_reg = -1;
ctx.cs_block_size_reg = -1;
@@ -3570,7 +3565,6 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
 
if (shader->uses_images) {
ctx.thread_id_gpr = ++regno;
-   ctx.thread_id_gpr_loaded = false;
}
ctx.temp_reg = ++regno;
 
@@ -3613,6 +3607,12 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
if (shader->vs_as_gs_a)
vs_add_primid_output(&ctx, key.vs.prim_id_out);
 
+   if (ctx.thread_id_gpr != -1) {
+   r = load_thread_id_gpr(&ctx);
+   if (r)
+   return r;
+   }
+
if (ctx.type == PIPE_SHADER_TESS_EVAL)
r600_fetch_tess_io_info(&ctx);
 
@@ -8647,10 +8647,6 @@ static int tgsi_load_rat(struct r600_shader_ctx *ctx)
unsigned rat_index_mode;
unsigned immed_base;
 
-   r = load_thread_id_gpr(ctx);
-   if (r)
-   return r;
-
rat_index_mode = inst->Src[0].Indirect.Index == 2 ? 2 : 0; // 
CF_INDEX_1 : CF_INDEX_NONE
 
immed_base = R600_IMAGE_IMMED_RESOURCE_OFFSET;
@@ -8978,10 +8974,6 @@ static int tgsi_atomic_op_rat(struct r600_shader_ctx 
*ctx)
immed_base = R600_IMAGE_IMMED_RESOURCE_OFFSET;
rat_base = ctx->shader->rat_base;
 
-   r = load_thread_id_gpr(ctx);
-   if (r)
-   return r;
-
 if (inst->Src[0].Register.File == TGSI_FILE_BUFFER) {
immed_base += ctx->info.file_count[TGSI_FILE_IMAGE];
rat_base += ctx->info.file_count[TGSI_FILE_IMAGE];
-- 
2.14.3



[Mesa-dev] [PATCH] r600/shader: when using images always load thread id gpr at start

2018-02-27 Thread Dave Airlie
From: Dave Airlie 

The delayed loading code would fail if we had control flow.

This fixes:
tests/spec/arb_shader_image_load_store/execution/image_checkerboard.shader_test

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/r600_shader.c | 18 +++---
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 1256e271b5b..3ed91f02e84 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -363,7 +363,6 @@ struct r600_shader_ctx {
unsignedtess_input_info; /* temp with 
tess input offsets */
unsignedtess_output_info; /* temp with 
tess input offsets */
unsignedthread_id_gpr; /* temp with 
thread id calculated for images */
-   bool thread_id_gpr_loaded;
 };
 
 struct r600_shader_tgsi_instruction {
@@ -3275,9 +3274,6 @@ static int load_thread_id_gpr(struct r600_shader_ctx *ctx)
struct r600_bytecode_alu alu;
int r;
 
-   if (ctx->thread_id_gpr_loaded)
-   return 0;
-
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
alu.op = ALU_OP1_MBCNT_32LO_ACCUM_PREV_INT;
alu.dst.sel = ctx->temp_reg;
@@ -3322,7 +3318,6 @@ static int load_thread_id_gpr(struct r600_shader_ctx *ctx)
   ctx->temp_reg, 0);
if (r)
return r;
-   ctx->thread_id_gpr_loaded = true;
return 0;
 }
 
@@ -3436,7 +3431,6 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
ctx.fragcoord_input = -1;
ctx.colors_used = 0;
ctx.clip_vertex_write = 0;
-   ctx.thread_id_gpr_loaded = false;
 
ctx.helper_invoc_reg = -1;
ctx.cs_block_size_reg = -1;
@@ -3570,7 +3564,9 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
 
if (shader->uses_images) {
ctx.thread_id_gpr = ++regno;
-   ctx.thread_id_gpr_loaded = false;
+   r = load_thread_id_gpr(&ctx);
+   if (r)
+   return r;
}
ctx.temp_reg = ++regno;
 
@@ -8647,10 +8643,6 @@ static int tgsi_load_rat(struct r600_shader_ctx *ctx)
unsigned rat_index_mode;
unsigned immed_base;
 
-   r = load_thread_id_gpr(ctx);
-   if (r)
-   return r;
-
rat_index_mode = inst->Src[0].Indirect.Index == 2 ? 2 : 0; // 
CF_INDEX_1 : CF_INDEX_NONE
 
immed_base = R600_IMAGE_IMMED_RESOURCE_OFFSET;
@@ -8978,10 +8970,6 @@ static int tgsi_atomic_op_rat(struct r600_shader_ctx 
*ctx)
immed_base = R600_IMAGE_IMMED_RESOURCE_OFFSET;
rat_base = ctx->shader->rat_base;
 
-   r = load_thread_id_gpr(ctx);
-   if (r)
-   return r;
-
 if (inst->Src[0].Register.File == TGSI_FILE_BUFFER) {
immed_base += ctx->info.file_count[TGSI_FILE_IMAGE];
rat_base += ctx->info.file_count[TGSI_FILE_IMAGE];
-- 
2.14.3



[Mesa-dev] [PATCH v4] anv: enable VK_EXT_shader_stencil_export

2018-02-27 Thread Gustavo Lima Chaves
v2:
An attempt to support SpvExecutionModeStencilRefReplacingEXT's behavior
also follows. Our interpretation of that mode is that we prevent writes
to the built-in FragStencilRefEXT variable when the execution mode isn't
set.

v3:
A more cautious reading of 1db44252d01bf7539452ccc2b5210c74b8dcd573 led
me to a missing change that would stop (what I later discovered were)
GPU hangs on the CTS test written to exercise this.

v4:
Turn FragStencilRefEXT decoration usage without StencilRefReplacingEXT
mode into a warning, instead of trying to make the variable read-only.
If we are to follow the originating extension on GL, the built-in
variable in question should never be readable anyway.
---
 src/compiler/shader_info.h | 2 ++
 src/compiler/spirv/spirv_to_nir.c  | 4 
 src/compiler/spirv/vtn_variables.c | 5 +
 src/intel/vulkan/anv_extensions.py | 2 ++
 src/intel/vulkan/anv_pipeline.c| 1 +
 src/intel/vulkan/genX_pipeline.c   | 1 +
 6 files changed, 15 insertions(+)

diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h
index 6de707f672..f99cbc27a7 100644
--- a/src/compiler/shader_info.h
+++ b/src/compiler/shader_info.h
@@ -162,6 +162,8 @@ typedef struct shader_info {
 
  bool pixel_center_integer;
 
+ bool outputs_stencil;
+
  /** gl_FragDepth layout for ARB_conservative_depth. */
  enum gl_frag_depth_layout depth_layout;
   } fs;
diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index e00dcafa12..dcb8b31967 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -3395,6 +3395,10 @@ vtn_handle_execution_mode(struct vtn_builder *b, struct 
vtn_value *entry_point,
case SpvExecutionModeContractionOff:
   break; /* OpenCL */
 
+   case SpvExecutionModeStencilRefReplacingEXT:
+  b->shader->info.fs.outputs_stencil = true;
+  break;
+
default:
   vtn_fail("Unhandled execution mode");
}
diff --git a/src/compiler/spirv/vtn_variables.c 
b/src/compiler/spirv/vtn_variables.c
index 66a5fba6b7..3af54ede66 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -1382,6 +1382,11 @@ apply_var_decoration(struct vtn_builder *b, nir_variable 
*nir_var,
   case SpvBuiltInFragCoord:
  nir_var->data.pixel_center_integer = b->pixel_center_integer;
  break;
+  case SpvBuiltInFragStencilRefEXT:
+ if (!b->shader->info.fs.outputs_stencil)
+ vtn_warn("The StencilRefReplacingEXT mode should be declared when"
+  " the decoration FragStencilRefEXT is used on a 
variable");
+ break;
   default:
  break;
   }
diff --git a/src/intel/vulkan/anv_extensions.py 
b/src/intel/vulkan/anv_extensions.py
index 581921e62a..cd90c6ae52 100644
--- a/src/intel/vulkan/anv_extensions.py
+++ b/src/intel/vulkan/anv_extensions.py
@@ -86,6 +86,8 @@ EXTENSIONS = [
 Extension('VK_KHX_multiview', 1, True),
 Extension('VK_EXT_debug_report',  8, True),
 Extension('VK_EXT_external_memory_dma_buf',   1, True),
+Extension('VK_EXT_shader_stencil_export', 1,
+  'device->info.gen >= 9'),
 ]
 
 class VkVersion:
diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index e16a7a1994..ed63fa42cd 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -143,6 +143,7 @@ anv_shader_compile_to_nir(struct anv_pipeline *pipeline,
  .multiview = true,
  .variable_pointers = true,
  .storage_16bit = device->instance->physicalDevice.info.gen >= 8,
+ .stencil_export = device->instance->physicalDevice.info.gen >= 9,
   },
};
 
diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 89cbe293b8..683a4607e6 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -1600,6 +1600,7 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
  ps.PixelShaderHasUAV = true;
 
 #if GEN_GEN >= 9
+  ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
   ps.PixelShaderPullsBary= wm_prog_data->pulls_bary;
   ps.InputCoverageMaskState  = wm_prog_data->uses_sample_mask ?
ICMS_INNER_CONSERVATIVE : ICMS_NONE;
-- 
2.14.3



Re: [Mesa-dev] [PATCH 3/3] mesa: Fix tautological compare

2018-02-27 Thread Ian Romanick
On 02/27/2018 07:38 PM, Brian Paul wrote:
> On 02/27/2018 06:07 PM, Ian Romanick wrote:
>> From: Ian Romanick 
>>
>> _BaseFormat is GLenum16, but it is compared with an integer -1.  Even if
>> _mesa_base_tex_format returned -1, it becomes 0xffff, and the
>> comparison, after type promotion, will always fail.  Fix this by
>> explicitly casting both to GLenum16.  This should be more future proof
>> that casting just the -1.
> 
> "than"
> 
> Series is, Reviewed-by: Brian Paul 
> 
> How about we change _mesa_base_tex_format() to return GL_INVALID_ENUM
> instead of -1?  Or, even 0 like _mesa_base_fbo_format()?

I considered that.  It would be a much more invasive change, and I was
feeling lazy. :) There are a lot of callers.  Most either test the value
directly as 'if (_mesa_base_tex_format(...) < 0)' or assign the value to
a GLenum.  I can still do that...

> -Brian
> 
>> CID: 1429517
>> Fixes: ca721b3 ("mesa: use GLenum16 in a few more places")
>> Signed-off-by: Ian Romanick 
>> ---
>>   src/mesa/main/teximage.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
>> index 35ed875..1d7bff5 100644
>> --- a/src/mesa/main/teximage.c
>> +++ b/src/mesa/main/teximage.c
>> @@ -853,7 +853,7 @@ init_teximage_fields_ms(struct gl_context *ctx,
>>    target = img->TexObject->Target;
>>  img->_BaseFormat = _mesa_base_tex_format( ctx, internalFormat );
>> -   assert(img->_BaseFormat != -1);
>> +   assert((GLenum16) img->_BaseFormat != (GLenum16) -1);
>>  img->InternalFormat = internalFormat;
>>  img->Border = border;
>>  img->Width = width;
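
The promotion problem described in the quoted commit message can be shown
with a small standalone C sketch. The typedef enum16 stands in for GLenum16
and both helper functions are hypothetical; the first one spells out the
integer promotion that the uncast comparison performs implicitly.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t enum16;   /* stand-in for GLenum16 */

/* What 'stored != -1' amounts to: the 16-bit value is promoted to int,
 * giving a value in 0..65535, which can never equal -1. */
static bool
differs_without_cast(enum16 stored)
{
   const int promoted = stored;
   return promoted != -1;
}

/* With both sides cast, -1 becomes 0xffff before the promotion, so a
 * stored 0xffff is detected. */
static bool
differs_with_cast(enum16 stored)
{
   return (enum16) stored != (enum16) -1;
}
```

Storing -1 in an enum16 field and then calling differs_without_cast on it
still reports a difference, which is why the original assert could never
fire.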


[Mesa-dev] [Bug 104302] Wolfenstein 2 (2017) under wine graphical artifacting on RADV

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104302

--- Comment #9 from gloriouseggr...@gmail.com ---
Sorry for the delayed response, I have been at work. Today the
roderickc/wine-vulkan patches were merged into mainline wine and
wine-staging was rebased, so I was able to record the following video with a
full Vulkan implementation on wine with no additional patches. Here is my
current setup:

Wine-staging 3.3 git. (github.com/wine-staging/wine-staging)
with rebase hash 9ae8b8c00f2cca205fdf4ce76e221778b7dfbea7

Arch Linux Kernel: 4.16 rc3 
Mesa 18.0.0 rc4 
LLVM 6.0.0 rc3 
AMD RX Vega 64

Video:
https://www.youtube.com/watch?v=bM-g1TBNczo

Notes:
shooting seems to work fine
cutscene backgrounds render fine

*mouth facial animations are a bit off/odd/melty

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH 4/4] i965/tiled_memcpy: Add support for Yf and Ys tiling/detiling

2018-02-27 Thread Ilia Mirkin
On Feb 27, 2018 11:22 PM, "Scott D Phillips" 
wrote:

Yf and Ys are a family of tilings similar to Y. The actual address
bit interleavings for Yf* and Ys* depend upon the bits-per-pixel
value of the surface, where 128-, 32-, and 8-bpp tiles are square
and 64- and 16-bpp tiles have a 2:1 aspect ratio.

The address bit layout of Yf and Ys are the same in the low
12 bits (4-kbytes); however Ys tiles are actually 64-kbytes in
size, but can be handled as if they were composed of 4-kbyte
sub-tiles with a different overall tile ordering.

Because all of Y, Yf*, and Ys* have the same least significant
6 bits of address bit layout, the same tiling/detiling routine can
be used between them. The inner loop that writes or reads cache
lines at a time is the same, while the outer loop (which walks in
a mostly linear order) uses different address increment values for
each tiling format to adjust the tile addresses.
---
 src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 219
-
 1 file changed, 179 insertions(+), 40 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
index a78e2b97d45..2b040a69524 100644
--- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
+++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
@@ -57,6 +57,12 @@ static const uint32_t xtile_span = 64;
 static const uint32_t ytile_width = 128;
 static const uint32_t ytile_height = 32;
 static const uint32_t ytile_span = 16;
+static const uint32_t std_ytile128_width = 256;
+static const uint32_t std_ytile128_height = 16;
+static const uint32_t std_ytile32_width = 128;
+static const uint32_t std_ytile32_height = 32;
+static const uint32_t std_ytile8_width = 64;
+static const uint32_t std_ytile8_height = 64;

 static inline uint32_t
 ror(uint32_t n, uint32_t d)
@@ -253,6 +259,48 @@ ytile_addr(uint32_t x, uint32_t y, char *src, uint32_t
src_pitch)
return src + (((y >> 5) * (src_pitch >> 7) + (x >> 7)) << 12);
 }

+static char *
+yf128_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y >> 4) * (src_pitch >> 8) + (x >> 8)) << 12);
+}
+
+static char *
+yf32_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y >> 5) * (src_pitch >> 7) + (x >> 7)) << 12);
+}
+
+static char *
+yf8_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y >> 6) * (src_pitch >> 6) + (x >> 6)) << 12);
+}
+
+static char *
+ys128_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y & 0x10) << 8) + ((y & 0x20) << 9) + ((x & 0x100) <<
5) +
+ ((x & 200) << 6) +


0x200?

+ (((y >> 6) * (src_pitch >> 10) + (x >> 10)) << 16));
+}
+
+static char *
+ys32_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y & 0x20) << 7) + ((y & 0x40) << 8) + ((x & 0x80) << 6)
+
+ ((x & 100) << 7) +


0x100? Looks like there are more of these below.
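
To make the effect of the missing prefix concrete, here is a small sketch. It assumes the hex mask is what was intended, and the helper names are made up for illustration (they are not in the patch). Decimal 200 is 0xc8, so the posted mask picks up x bits 3, 6 and 7 instead of bit 9:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: they isolate one term of ys128_addr so the two
 * masks can be compared side by side. */
static uint32_t
term_as_posted(uint32_t x)
{
   return (x & 200) << 6;     /* decimal 200 == 0xc8: bits 3, 6 and 7 */
}

static uint32_t
term_with_hex_mask(uint32_t x)
{
   return (x & 0x200) << 6;   /* bit 9 only, shifted up to bit 15 */
}

void
hex_mask_selfcheck(void)
{
   /* x bit 9 is dropped by the decimal mask ... */
   assert(term_as_posted(0x200) == 0);
   assert(term_with_hex_mask(0x200) == 0x8000);
   /* ... while an unrelated low bit leaks into the address. */
   assert(term_as_posted(0x8) == 0x200);
   assert(term_with_hex_mask(0x8) == 0);
}
```

The same reasoning would apply to the decimal 100 and 80 masks further down, if those were also meant to be hex.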

+ (((y >> 7) * (src_pitch >> 9) + (x >> 9)) << 16));
+}
+
+static char *
+ys8_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y & 0x40) << 6) + ((y & 0x80) << 7) + ((x & 0x40) << 7)
+
+ ((x & 80) << 8) +
+ (((y >> 8) * (src_pitch >> 8) + (x >> 8)) << 16));
+}
+
 /**
  * Copy texture data from linear to X tile layout.
  *
@@ -302,7 +350,8 @@ linear_to_xtiled(uint32_t x0, uint32_t x1, uint32_t x2,
uint32_t x3,
 }

 /**
- * Copy texture data from linear to Y tile layout.
+ * Copy texture data from linear to Y tile layout. This function tiles a
+ * single 4KB portion of the tiling (even for the 64KB tiling variants)
  *
  * \copydoc tile_copy_fn
  */
@@ -312,28 +361,71 @@ linear_to_ytiled(uint32_t x0, uint32_t x1, uint32_t
x2, uint32_t x3,
  char *dst, const char *src,
  int32_t src_pitch,
  uint32_t swizzle_bit,
- UNUSED enum isl_tiling tiling,
- UNUSED int cpp,
+ enum isl_tiling tiling,
+ int cpp,
  mem_copy_fn mem_copy,
  mem_copy_fn mem_copy_align16)
 {
-   /* Y tiles consist of columns that are 'ytile_span' wide (and the same
height
-* as the tile).  Thus the destination offset for (x,y) is the sum of:
-*   (x % column_width)// position within column
-*   (x / column_width) * bytes_per_column // column number * bytes per
column
-*   y * column_width
+   /* The Y tilings are a family of different tilings with the following
+* linear-to-tiled address mapping for the low 12-bits of the tiled
+* addresses:
+*
+* Tiling          bpp        11 10  9  8  7  6  5  4  3  2  1  0
+* --------------------------------------------------------------
+* TileYF/TileYS   64 & 128   u7 v3 u6 v2 u5 u4 v1 v0 u3 u2 u1 u0
+* TileYF/TileYS   16 &  32   u6 v4 u5 v3 u4 v2 v1 v0 u3 u2 u1 u0
+* TileYF/TileYS   8

[Mesa-dev] [PATCH] r600: partly revert disabling tiling for 1d texture.

2018-02-27 Thread Dave Airlie
From: Dave Airlie 

Previously we had a check for 1D or narrow 2D textures; however,
narrow 2D textures caused GPU hangs, but the check was correct for 1D
textures.

This fixes a bunch of 1D image piglits for me.
---
 src/gallium/drivers/r600/r600_texture.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_texture.c 
b/src/gallium/drivers/r600/r600_texture.c
index 03cdcd22eee..1fbb682d671 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -1053,6 +1053,11 @@ r600_choose_tiling(struct r600_common_screen *rscreen,
if (templ->bind & PIPE_BIND_LINEAR)
return RADEON_SURF_MODE_LINEAR_ALIGNED;
 
+   /* 1D textures should be linear - fixes image operations on 1d 
*/
+   if (templ->target == PIPE_TEXTURE_1D ||
+   templ->target == PIPE_TEXTURE_1D_ARRAY)
+   return RADEON_SURF_MODE_LINEAR_ALIGNED;
+
/* Textures likely to be mapped often. */
if (templ->usage == PIPE_USAGE_STAGING ||
templ->usage == PIPE_USAGE_STREAM)
-- 
2.14.3



[Mesa-dev] [PATCH 3/4] i965/tiled_memcpy: Add tile_addr_fn to compute the address of a tile

2018-02-27 Thread Scott D Phillips
Replace the calculation of the individual tile address with a call
through a function pointer to the calculation. This will be
important with Ys tiling where a more complicated calculation is
needed to derive the 4 kbyte sub-tile address.
---
 src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 29 --
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
index 06663a8b3ce..a78e2b97d45 100644
--- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
+++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
@@ -234,6 +234,25 @@ typedef void (*tile_copy_fn)(uint32_t x0, uint32_t x1, 
uint32_t x2, uint32_t x3,
  int cpp,
  mem_copy_fn mem_copy);
 
+/* a tile_addr_fn returns the address of a 4kbyte tile (or a 4kbyte sub-tile
+ * in larger tiling formats) which contains the points (x, y) within the given
+ * tiled image.
+ */
+typedef char* (*tile_addr_fn)(uint32_t x, uint32_t y, char *src,
+  uint32_t src_pitch);
+
+static char *
+xtile_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y >> 3) * (src_pitch >> 9) + (x >> 9)) << 12);
+}
+
+static char *
+ytile_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y >> 5) * (src_pitch >> 7) + (x >> 7)) << 12);
+}
+
 /**
  * Copy texture data from linear to X tile layout.
  *
@@ -775,6 +794,7 @@ linear_to_tiled(uint32_t xt1, uint32_t xt2,
 int cpp,
 mem_copy_fn mem_copy)
 {
+   tile_addr_fn tile_addr;
tile_copy_fn tile_copy;
uint32_t xt0, xt3;
uint32_t yt0, yt3;
@@ -787,11 +807,13 @@ linear_to_tiled(uint32_t xt1, uint32_t xt2,
   th = xtile_height;
   span = xtile_span;
   tile_copy = linear_to_xtiled_faster;
+  tile_addr = xtile_addr;
} else if (tiling == ISL_TILING_Y0) {
   tw = ytile_width;
   th = ytile_height;
   span = ytile_span;
   tile_copy = linear_to_ytiled_faster;
+  tile_addr = ytile_addr;
} else {
   unreachable("unsupported tiling");
}
@@ -837,7 +859,7 @@ linear_to_tiled(uint32_t xt1, uint32_t xt2,
  /* Translate by (xt,yt) for single-tile copier. */
  tile_copy(x0-xt, x1-xt, x2-xt, x3-xt,
y0-yt, y1-yt,
-   dst + (ptrdiff_t)xt * th  +  (ptrdiff_t)yt* 
dst_pitch,
+   tile_addr(xt, yt, dst, dst_pitch),
src + (ptrdiff_t)xt - xt1 + ((ptrdiff_t)yt - yt1) * 
src_pitch,
src_pitch,
swizzle_bit,
@@ -869,6 +891,7 @@ tiled_to_linear(uint32_t xt1, uint32_t xt2,
 int cpp,
 mem_copy_fn mem_copy)
 {
+   tile_addr_fn tile_addr;
tile_copy_fn tile_copy;
uint32_t xt0, xt3;
uint32_t yt0, yt3;
@@ -881,11 +904,13 @@ tiled_to_linear(uint32_t xt1, uint32_t xt2,
   th = xtile_height;
   span = xtile_span;
   tile_copy = xtiled_to_linear_faster;
+  tile_addr = xtile_addr;
} else if (tiling == ISL_TILING_Y0) {
   tw = ytile_width;
   th = ytile_height;
   span = ytile_span;
   tile_copy = ytiled_to_linear_faster;
+  tile_addr = ytile_addr;
} else {
   unreachable("unsupported tiling");
}
@@ -932,7 +957,7 @@ tiled_to_linear(uint32_t xt1, uint32_t xt2,
  tile_copy(x0-xt, x1-xt, x2-xt, x3-xt,
y0-yt, y1-yt,
dst + (ptrdiff_t)xt - xt1 + ((ptrdiff_t)yt - yt1) * 
dst_pitch,
-   src + (ptrdiff_t)xt * th  +  (ptrdiff_t)yt* 
src_pitch,
+   tile_addr(xt, yt, (char*)src, src_pitch),
dst_pitch,
swizzle_bit,
tiling,
-- 
2.14.3
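
The function-pointer indirection in the patch above can be sketched in isolation. The two address functions and the typedef are copied from the patch; the `sketch_tiling` enum and `pick_tile_addr` selector are hypothetical stand-ins for the `isl_tiling` checks in `linear_to_tiled`, added only so the example is self-contained:

```c
#include <assert.h>
#include <stdint.h>

/* From the patch: returns the address of the 4 kbyte tile containing
 * the point (x, y), where x and src_pitch are in bytes. */
typedef char *(*tile_addr_fn)(uint32_t x, uint32_t y, char *src,
                              uint32_t src_pitch);

static char *
xtile_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
{
   /* X tiles are 512 bytes wide and 8 rows high. */
   return src + (((y >> 3) * (src_pitch >> 9) + (x >> 9)) << 12);
}

static char *
ytile_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
{
   /* Y tiles are 128 bytes wide and 32 rows high. */
   return src + (((y >> 5) * (src_pitch >> 7) + (x >> 7)) << 12);
}

/* Hypothetical selector, not part of the patch. */
enum sketch_tiling { SKETCH_TILING_X, SKETCH_TILING_Y0 };

static tile_addr_fn
pick_tile_addr(enum sketch_tiling tiling)
{
   return tiling == SKETCH_TILING_X ? xtile_addr : ytile_addr;
}

void
tile_addr_selfcheck(void)
{
   static char surface[4 * 4096];

   /* Pitch 1024: two X tiles per row, so (512, 8) is in tile index 3. */
   assert(pick_tile_addr(SKETCH_TILING_X)(512, 8, surface, 1024) - surface
          == 3 * 4096);
   /* Pitch 1024: eight Y tiles per row, so (128, 0) is in tile index 1. */
   assert(pick_tile_addr(SKETCH_TILING_Y0)(128, 0, surface, 1024) - surface
          == 4096);
}
```

The outer copy loop then only needs to call through the pointer once per tile, which is what leaves room for a more involved Ys calculation later.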



[Mesa-dev] [PATCH 2/4] i965/tiled_memcpy: Pass cpp to tiling/untiling functions

2018-02-27 Thread Scott D Phillips
As preparation for doing Yf/Ys tiling, pass the image's cpp into
the tiling/untiling functions. The layout of Yf/Ys differs
depending on cpp.

Also plumb tiling and cpp through to the per-tile functions for
ytile.
---
 src/mesa/drivers/dri/i965/intel_pixel_read.c   |  1 +
 src/mesa/drivers/dri/i965/intel_tex_image.c|  2 +
 src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 72 +-
 src/mesa/drivers/dri/i965/intel_tiled_memcpy.h |  2 +
 4 files changed, 53 insertions(+), 24 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c 
b/src/mesa/drivers/dri/i965/intel_pixel_read.c
index cf957378f92..012eac48504 100644
--- a/src/mesa/drivers/dri/i965/intel_pixel_read.c
+++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c
@@ -207,6 +207,7 @@ intel_readpixels_tiled_memcpy(struct gl_context * ctx,
   dst_pitch, irb->mt->surf.row_pitch,
   brw->has_swizzling,
   irb->mt->surf.tiling,
+  irb->mt->cpp,
   mem_copy
);
 
diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index e25bc9a0c08..1d576627d05 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -298,6 +298,7 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
   image->mt->surf.row_pitch, src_pitch,
   brw->has_swizzling,
   image->mt->surf.tiling,
+  image->mt->cpp,
   mem_copy
);
 
@@ -804,6 +805,7 @@ intel_gettexsubimage_tiled_memcpy(struct gl_context *ctx,
   dst_pitch, image->mt->surf.row_pitch,
   brw->has_swizzling,
   image->mt->surf.tiling,
+  image->mt->cpp,
   mem_copy
);
 
diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
index 7c6bde990d6..06663a8b3ce 100644
--- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
+++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
@@ -230,6 +230,8 @@ typedef void (*tile_copy_fn)(uint32_t x0, uint32_t x1, 
uint32_t x2, uint32_t x3,
  char *dst, const char *src,
  int32_t linear_pitch,
  uint32_t swizzle_bit,
+ enum isl_tiling tiling,
+ int cpp,
  mem_copy_fn mem_copy);
 
 /**
@@ -291,6 +293,8 @@ linear_to_ytiled(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
  char *dst, const char *src,
  int32_t src_pitch,
  uint32_t swizzle_bit,
+ UNUSED enum isl_tiling tiling,
+ UNUSED int cpp,
  mem_copy_fn mem_copy,
  mem_copy_fn mem_copy_align16)
 {
@@ -455,6 +459,8 @@ ytiled_to_linear(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
  char *dst, const char *src,
  int32_t dst_pitch,
  uint32_t swizzle_bit,
+ UNUSED enum isl_tiling tiling,
+ UNUSED int cpp,
  mem_copy_fn mem_copy,
  mem_copy_fn mem_copy_align16)
 {
@@ -582,6 +588,8 @@ linear_to_xtiled_faster(uint32_t x0, uint32_t x1, uint32_t 
x2, uint32_t x3,
 char *dst, const char *src,
 int32_t src_pitch,
 uint32_t swizzle_bit,
+UNUSED enum isl_tiling tiling,
+UNUSED int cpp,
 mem_copy_fn mem_copy)
 {
if (x0 == 0 && x3 == xtile_width && y0 == 0 && y1 == xtile_height) {
@@ -625,31 +633,35 @@ linear_to_ytiled_faster(uint32_t x0, uint32_t x1, 
uint32_t x2, uint32_t x3,
 char *dst, const char *src,
 int32_t src_pitch,
 uint32_t swizzle_bit,
+enum isl_tiling tiling,
+int cpp,
 mem_copy_fn mem_copy)
 {
if (x0 == 0 && x3 == ytile_width && y0 == 0 && y1 == ytile_height) {
   if (mem_copy == memcpy)
- return linear_to_ytiled(0, 0, ytile_width, ytile_width, 0, 
ytile_height,
- dst, src, src_pitch, swizzle_bit, memcpy, 
memcpy);
+ return linear_to_ytiled(0, 0, ytile_width, ytile_width, 0,
+ ytile_height, dst, src, src_pitch, 
swizzle_bit,
+ tiling, cpp, memcpy, memcpy);
   else if (mem_copy == rgba8_copy)
- return linear_to_ytiled(0, 0, ytile_width, ytile_width, 0, 
ytile_height,
- dst, src, src_pitch, swizzle_bit,
- rgba8_copy, rgba8_copy_aligned_dst);
+ return linear_to_ytiled(0, 0, ytile_width, ytile_width, 0,
+ ytile_height, dst, src, src_pitch, 
swizzle_bit,
+ tiling, cpp, rgba8_copy,
+ rgba8_copy_aligned_dst);
   else
  

[Mesa-dev] [PATCH 1/4] i965/tiled_memcpy: ytiled_to_linear a cache line at a time

2018-02-27 Thread Scott D Phillips
Similar to the transformation applied to linear_to_ytiled, also align
each readback from the ytiled source to a cacheline (i.e. transfer a
whole cacheline from the source before moving on to the next column).
This will allow us to utilize movntdqa (_mm_stream_load_si128) in a
subsequent patch to obtain near WB readback performance when accessing
the uncached ytiled memory, an order of magnitude improvement.
---
 src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 72 +++---
 1 file changed, 66 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
index 69306828d72..7c6bde990d6 100644
--- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
+++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
@@ -451,7 +451,7 @@ xtiled_to_linear(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
  */
 static inline void
 ytiled_to_linear(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3,
- uint32_t y0, uint32_t y1,
+ uint32_t y0, uint32_t y3,
  char *dst, const char *src,
  int32_t dst_pitch,
  uint32_t swizzle_bit,
@@ -470,6 +470,9 @@ ytiled_to_linear(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
const uint32_t column_width = ytile_span;
const uint32_t bytes_per_column = column_width * ytile_height;
 
+   uint32_t y1 = MIN2(y3, ALIGN_UP(y0, 4));
+   uint32_t y2 = MAX2(y1, ALIGN_DOWN(y3, 4));
+
uint32_t xo0 = (x0 % ytile_span) + (x0 / ytile_span) * bytes_per_column;
uint32_t xo1 = (x1 % ytile_span) + (x1 / ytile_span) * bytes_per_column;
 
@@ -485,24 +488,81 @@ ytiled_to_linear(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
 
dst += (ptrdiff_t)y0 * dst_pitch;
 
-   for (yo = y0 * column_width; yo < y1 * column_width; yo += column_width) {
+   if (y0 != y1) {
+  for (yo = y0 * column_width; yo < y1 * column_width; yo += column_width) 
{
+ uint32_t xo = xo1;
+ uint32_t swizzle = swizzle1;
+
+ mem_copy(dst + x0, src + ((xo0 + yo) ^ swizzle0), x1 - x0);
+
+ /* Step by spans/columns.  As it happens, the swizzle bit flips
+  * at each step so we don't need to calculate it explicitly.
+  */
+ for (x = x1; x < x2; x += ytile_span) {
+mem_copy_align16(dst + x, src + ((xo + yo) ^ swizzle), ytile_span);
+xo += bytes_per_column;
+swizzle ^= swizzle_bit;
+ }
+
+ mem_copy_align16(dst + x2, src + ((xo + yo) ^ swizzle), x3 - x2);
+
+ dst += dst_pitch;
+  }
+   }
+
+   for (yo = y1 * column_width; yo < y2 * column_width; yo += 4 * 
column_width) {
   uint32_t xo = xo1;
   uint32_t swizzle = swizzle1;
 
-  mem_copy(dst + x0, src + ((xo0 + yo) ^ swizzle0), x1 - x0);
+  if (x0 != x1) {
+ mem_copy(dst + x0 + 0 * dst_pitch, src + ((xo0 + yo + 0 * 
column_width) ^ swizzle0), x1 - x0);
+ mem_copy(dst + x0 + 1 * dst_pitch, src + ((xo0 + yo + 1 * 
column_width) ^ swizzle0), x1 - x0);
+ mem_copy(dst + x0 + 2 * dst_pitch, src + ((xo0 + yo + 2 * 
column_width) ^ swizzle0), x1 - x0);
+ mem_copy(dst + x0 + 3 * dst_pitch, src + ((xo0 + yo + 3 * 
column_width) ^ swizzle0), x1 - x0);
+  }
 
   /* Step by spans/columns.  As it happens, the swizzle bit flips
* at each step so we don't need to calculate it explicitly.
*/
   for (x = x1; x < x2; x += ytile_span) {
- mem_copy_align16(dst + x, src + ((xo + yo) ^ swizzle), ytile_span);
+ mem_copy_align16(dst + x + 0 * dst_pitch, src + ((xo + yo + 0 * 
column_width) ^ swizzle), ytile_span);
+ mem_copy_align16(dst + x + 1 * dst_pitch, src + ((xo + yo + 1 * 
column_width) ^ swizzle), ytile_span);
+ mem_copy_align16(dst + x + 2 * dst_pitch, src + ((xo + yo + 2 * 
column_width) ^ swizzle), ytile_span);
+ mem_copy_align16(dst + x + 3 * dst_pitch, src + ((xo + yo + 3 * 
column_width) ^ swizzle), ytile_span);
  xo += bytes_per_column;
  swizzle ^= swizzle_bit;
   }
 
-  mem_copy_align16(dst + x2, src + ((xo + yo) ^ swizzle), x3 - x2);
+  if (x2 != x3) {
+ mem_copy_align16(dst + x2 + 0 * dst_pitch, src + ((xo + yo + 0 * 
column_width) ^ swizzle), x3 - x2);
+ mem_copy_align16(dst + x2 + 1 * dst_pitch, src + ((xo + yo + 1 * 
column_width) ^ swizzle), x3 - x2);
+ mem_copy_align16(dst + x2 + 2 * dst_pitch, src + ((xo + yo + 2 * 
column_width) ^ swizzle), x3 - x2);
+ mem_copy_align16(dst + x2 + 3 * dst_pitch, src + ((xo + yo + 3 * 
column_width) ^ swizzle), x3 - x2);
+  }
 
-  dst += dst_pitch;
+  dst += 4 * dst_pitch;
+   }
+
+   if (y2 != y3) {
+  for (yo = y2 * column_width; yo < y3 * column_width; yo += column_width) 
{
+ uint32_t xo = xo1;
+ uint32_t swizzle = swizzle1;
+
+ mem_copy(dst + x0, src + ((xo0 + yo) ^ swizzle0), x1 - x0);
+
+ /* Step by spans/columns.  As it happen
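
The row-range split that the patch introduces (an unaligned head, a middle section walked four rows at a time, and an unaligned tail) can be sketched on its own. The patch uses Mesa's MIN2, MAX2, ALIGN_UP and ALIGN_DOWN macros; the small helpers below are local stand-ins with made-up names. With a 16-byte ytile_span, four rows of one column add up to 64 bytes, which is presumably the cacheline the commit message has in mind:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for Mesa's macros, specialised to a step of 4 rows. */
static uint32_t align_down4(uint32_t v) { return v & ~3u; }
static uint32_t align_up4(uint32_t v)   { return (v + 3u) & ~3u; }
static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Rows [y0, y1) are the head, [y1, y2) the 4-row-aligned middle and
 * [y2, y3) the tail, mirroring the y1/y2 computation in the patch. */
struct row_split {
   uint32_t y1;
   uint32_t y2;
};

static struct row_split
split_rows(uint32_t y0, uint32_t y3)
{
   struct row_split s;

   assert(y0 <= y3);
   s.y1 = min_u32(y3, align_up4(y0));
   s.y2 = max_u32(s.y1, align_down4(y3));
   return s;
}

void
row_split_selfcheck(void)
{
   struct row_split s;

   s = split_rows(1, 11);   /* head 1..3, middle 4..7, tail 8..10 */
   assert(s.y1 == 4 && s.y2 == 8);

   s = split_rows(5, 7);    /* too short for a middle: everything is head */
   assert(s.y1 == 7 && s.y2 == 7);

   s = split_rows(0, 32);   /* a full tile is all middle */
   assert(s.y1 == 0 && s.y2 == 32);
}
```

The MIN2/MAX2 clamps are what keep the three ranges ordered when the span is shorter than four rows.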

[Mesa-dev] [PATCH 4/4] i965/tiled_memcpy: Add support for Yf and Ys tiling/detiling

2018-02-27 Thread Scott D Phillips
Yf and Ys are a family of tilings similar to Y. The actual address
bit interleavings for Yf* and Ys* depend upon the bits-per-pixel
value of the surface, where 128-, 32-, and 8-bpp tiles are square
and 64- and 16-bpp tiles have a 2:1 aspect ratio.

The address bit layouts of Yf and Ys are the same in the low
12 bits (4 kbytes); however, Ys tiles are actually 64 kbytes in
size, but can be handled as if they were composed of 4-kbyte
sub-tiles with a different overall tile ordering.

Because all of Y, Yf*, and Ys* have the same least significant
6 bits of address bit layout, the same tiling/detiling routine can
be used between them. The inner loop that writes or reads cache
lines at a time is the same, while the outer loop (which walks in
a mostly linear order) uses different address increment values for
each tiling format to adjust the tile addresses.
---
 src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 219 -
 1 file changed, 179 insertions(+), 40 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
index a78e2b97d45..2b040a69524 100644
--- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
+++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
@@ -57,6 +57,12 @@ static const uint32_t xtile_span = 64;
 static const uint32_t ytile_width = 128;
 static const uint32_t ytile_height = 32;
 static const uint32_t ytile_span = 16;
+static const uint32_t std_ytile128_width = 256;
+static const uint32_t std_ytile128_height = 16;
+static const uint32_t std_ytile32_width = 128;
+static const uint32_t std_ytile32_height = 32;
+static const uint32_t std_ytile8_width = 64;
+static const uint32_t std_ytile8_height = 64;
 
 static inline uint32_t
 ror(uint32_t n, uint32_t d)
@@ -253,6 +259,48 @@ ytile_addr(uint32_t x, uint32_t y, char *src, uint32_t 
src_pitch)
return src + (((y >> 5) * (src_pitch >> 7) + (x >> 7)) << 12);
 }
 
+static char *
+yf128_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y >> 4) * (src_pitch >> 8) + (x >> 8)) << 12);
+}
+
+static char *
+yf32_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y >> 5) * (src_pitch >> 7) + (x >> 7)) << 12);
+}
+
+static char *
+yf8_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y >> 6) * (src_pitch >> 6) + (x >> 6)) << 12);
+}
+
+static char *
+ys128_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y & 0x10) << 8) + ((y & 0x20) << 9) + ((x & 0x100) << 5) +
+ ((x & 200) << 6) +
+ (((y >> 6) * (src_pitch >> 10) + (x >> 10)) << 16));
+}
+
+static char *
+ys32_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y & 0x20) << 7) + ((y & 0x40) << 8) + ((x & 0x80) << 6) +
+ ((x & 100) << 7) +
+ (((y >> 7) * (src_pitch >> 9) + (x >> 9)) << 16));
+}
+
+static char *
+ys8_addr(uint32_t x, uint32_t y, char *src, uint32_t src_pitch)
+{
+   return src + (((y & 0x40) << 6) + ((y & 0x80) << 7) + ((x & 0x40) << 7) +
+ ((x & 80) << 8) +
+ (((y >> 8) * (src_pitch >> 8) + (x >> 8)) << 16));
+}
+
 /**
  * Copy texture data from linear to X tile layout.
  *
@@ -302,7 +350,8 @@ linear_to_xtiled(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
 }
 
 /**
- * Copy texture data from linear to Y tile layout.
+ * Copy texture data from linear to Y tile layout. This function tiles a
+ * single 4KB portion of the tiling (even for the 64KB tiling variants)
  *
  * \copydoc tile_copy_fn
  */
@@ -312,28 +361,71 @@ linear_to_ytiled(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
  char *dst, const char *src,
  int32_t src_pitch,
  uint32_t swizzle_bit,
- UNUSED enum isl_tiling tiling,
- UNUSED int cpp,
+ enum isl_tiling tiling,
+ int cpp,
  mem_copy_fn mem_copy,
  mem_copy_fn mem_copy_align16)
 {
-   /* Y tiles consist of columns that are 'ytile_span' wide (and the same 
height
-* as the tile).  Thus the destination offset for (x,y) is the sum of:
-*   (x % column_width)// position within column
-*   (x / column_width) * bytes_per_column // column number * bytes per 
column
-*   y * column_width
+   /* The Y tilings are a family of different tilings with the following
+* linear-to-tiled address mapping for the low 12-bits of the tiled
+* addresses:
+*
+* Tiling          bpp        11 10  9  8  7  6  5  4  3  2  1  0
+* --------------------------------------------------------------
+* TileYF/TileYS   64 & 128   u7 v3 u6 v2 u5 u4 v1 v0 u3 u2 u1 u0
+* TileYF/TileYS   16 &  32   u6 v4 u5 v3 u4 v2 v1 v0 u3 u2 u1 u0
+* TileYF/TileYS   8          u5 v5 u4 v4 v3 v2 v1 v0 u3 u2 u1 u0
+* TileY                      u6 u5 u4 v4 v3 v2 v1 v0 u3 u2 u1 u0
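
As a sanity check on the TileY row of the table, the sketch below builds the 12-bit intra-tile offset directly from the listed bits and compares it with the formula from the comment the patch removes. It reads u as the byte x-offset within the tile and v as the row, which is an interpretation (the archived comment is cut off before it says so), and the function names are invented for the example:

```c
#include <assert.h>
#include <stdint.h>

/* TileY, bits 11..0: u6 u5 u4 v4 v3 v2 v1 v0 u3 u2 u1 u0 */
static uint32_t
ytile_offset_from_bits(uint32_t u, uint32_t v)
{
   assert(u < 128 && v < 32);
   return (u & 0xf) |            /* u3..u0 -> bits 3..0  */
          ((v & 0x1f) << 4) |    /* v4..v0 -> bits 8..4  */
          (((u >> 4) & 0x7) << 9); /* u6..u4 -> bits 11..9 */
}

/* The removed comment's form: position within the 16-byte column, plus
 * column number times bytes per column (16 * 32), plus row times 16. */
static uint32_t
ytile_offset_from_columns(uint32_t u, uint32_t v)
{
   return (u % 16) + (u / 16) * 512 + v * 16;
}

void
ytile_offset_selfcheck(void)
{
   uint32_t u, v;

   for (v = 0; v < 32; v++)
      for (u = 0; u < 128; u++)
         assert(ytile_offset_from_bits(u, v) ==
                ytile_offset_from_columns(u, v));

   assert(ytile_offset_from_bits(16, 0) == 512);
   assert(ytile_offset_from_bits(127, 31) == 4095);
}
```

Under that reading, the low six bits (v1 v0 u3 u2 u1 u0) are identical in every row of the table, which matches the commit message's point that one inner loop can serve all of the Y variants.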
  

[Mesa-dev] [PATCH 1/2] cso: don't cycle through PIPE_MAX_SHADER_SAMPLER_VIEWS on context destroy

2018-02-27 Thread sroland
From: Roland Scheidegger 

There's no point, we know the highest non-null one.
---
 src/gallium/auxiliary/cso_cache/cso_context.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c 
b/src/gallium/auxiliary/cso_cache/cso_context.c
index 1b5d4b5..3fa57f1 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.c
+++ b/src/gallium/auxiliary/cso_cache/cso_context.c
@@ -407,8 +407,10 @@ void cso_destroy_context( struct cso_context *ctx )
  ctx->pipe->set_stream_output_targets(ctx->pipe, 0, NULL, NULL);
}
 
-   for (i = 0; i < PIPE_MAX_SHADER_SAMPLER_VIEWS; i++) {
+   for (i = 0; i < ctx->nr_fragment_views; i++) {
   pipe_sampler_view_reference(&ctx->fragment_views[i], NULL);
+   }
+   for (i = 0; i < ctx->nr_fragment_views_saved; i++) {
   pipe_sampler_view_reference(&ctx->fragment_views_saved[i], NULL);
}
 
-- 
2.7.4
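
The idea behind the change, releasing only the slots known to be in use instead of walking the whole array, can be sketched with simplified stand-in types. Everything prefixed `sketch_` below is hypothetical and much simpler than the real cso_context and pipe_sampler_view_reference machinery:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_VIEWS 128

struct sketch_view {
   int refcount;
};

struct sketch_ctx {
   struct sketch_view *views[SKETCH_MAX_VIEWS];
   unsigned nr_views;   /* every slot at index >= nr_views is NULL */
};

/* Drop one reference and clear the slot; a NULL slot is left alone. */
static void
sketch_unref(struct sketch_view **slot)
{
   if (*slot) {
      (*slot)->refcount--;
      *slot = NULL;
   }
}

/* Release only the tracked range; returns how many slots were visited. */
static unsigned
sketch_release_views(struct sketch_ctx *ctx)
{
   unsigned i;

   assert(ctx != NULL);
   for (i = 0; i < ctx->nr_views; i++)
      sketch_unref(&ctx->views[i]);
   ctx->nr_views = 0;
   return i;
}
```

With the limit raised to 128, a context holding two views would visit two slots here rather than all 128.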



[Mesa-dev] [PATCH 2/2] softpipe: don't iterate through PIPE_MAX_SHADER_SAMPLER_VIEWS

2018-02-27 Thread sroland
From: Roland Scheidegger 

We were setting view to NULL once the iteration index reached num.
But if the view is NULL the code does nothing anyway...
---
 src/gallium/drivers/softpipe/sp_state_sampler.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_state_sampler.c 
b/src/gallium/drivers/softpipe/sp_state_sampler.c
index c10fd91..751eb76 100644
--- a/src/gallium/drivers/softpipe/sp_state_sampler.c
+++ b/src/gallium/drivers/softpipe/sp_state_sampler.c
@@ -181,8 +181,8 @@ prepare_shader_sampling(
if (!num)
   return;
 
-   for (i = 0; i < PIPE_MAX_SHADER_SAMPLER_VIEWS; i++) {
-  struct pipe_sampler_view *view = i < num ? views[i] : NULL;
+   for (i = 0; i < num; i++) {
+  struct pipe_sampler_view *view = views[i];
 
   if (view) {
  struct pipe_resource *tex = view->texture;
-- 
2.7.4



Re: [Mesa-dev] [PATCH] RFC: gallium: increase PIPE_MAX_SHADER_SAMPLER_VIEWS to 128

2018-02-27 Thread Roland Scheidegger
Am 27.02.2018 um 19:07 schrieb Roland Scheidegger:
> Am 27.02.2018 um 17:39 schrieb Brian Paul:
>> On 02/26/2018 07:45 PM, Roland Scheidegger wrote:
>>> Am 27.02.2018 um 03:38 schrieb srol...@vmware.com:
>>>> From: Roland Scheidegger 
>>>>
>>>> Some state trackers require 128.
>>>> (There are no plans to increase PIPE_MAX_SAMPLERS too, since with gl
>>>> state tracker it's unlikely more than 32 will be needed, if you need
>>>> more use bindless.)
>>>> ---
>>>>   src/gallium/include/pipe/p_state.h | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/src/gallium/include/pipe/p_state.h
>>>> b/src/gallium/include/pipe/p_state.h
>>>> index 2b56d60..cddb3b4 100644
>>>> --- a/src/gallium/include/pipe/p_state.h
>>>> +++ b/src/gallium/include/pipe/p_state.h
>>>> @@ -64,7 +64,7 @@ extern "C" {
>>>>   #define PIPE_MAX_SAMPLERS 32
>>>>   #define PIPE_MAX_SHADER_INPUTS    80 /* 32 GENERIC + 32 PATCH + 16
>>>> others */
>>>>   #define PIPE_MAX_SHADER_OUTPUTS   80 /* 32 GENERIC + 32 PATCH + 16
>>>> others */
>>>> -#define PIPE_MAX_SHADER_SAMPLER_VIEWS 32
>>>> +#define PIPE_MAX_SHADER_SAMPLER_VIEWS 128
>>>>   #define PIPE_MAX_SHADER_BUFFERS   32
>>>>   #define PIPE_MAX_SHADER_IMAGES    32
>>>>   #define PIPE_MAX_TEXTURE_LEVELS   16

>>>
>>>
>>> FWIW if drivers are doing the right thing (so, don't blindly announce
>>> support for PIPE_MAX_SHADER_SAMPLER_VIEWS assuming it's 32 if you can't
>>> handle more etc.), this should just work. It will, however, make some
>>> allocations bigger (cso context comes to mind, but there's probably
>>> more), so I think it warrants some discussion. Apart from that I don't
>>> think there should really be any performance degradation, since code
>>> should not blindly iterate over max views typically (rather, only the
>>> views up to the max number which has changed and so on).
>>
>> Looks like there's such a loop at sp_state_sampler.c:184 that should be
>> fixed.
> I once fixed all such loops in llvmpipe (because even with 32, they
> actually showed up in profiles). I probably missed the one in draw
> (since that one doesn't do ref counting just set things to zero it
> probably was much lower overhead) and wasn't that worried about
> softpipe... But I'll fix this one.
> 
>>
>> Other instances found with git grep "for.*PIPE_MAX_SHADER_SAMPLER_VIEWS"
>> src/gallium/ seem to be in context init/destroy functions.
> The one in cso_context destroy should be fixable easily.
> The problem with the ones in draw aaline/pstipple is that they don't
> actually store the number of active views (albeit they pretend they do).
> Since the interface there with start/num just says which ones are
> replaced, but the ones above start+num remain. This is not handled
> correctly by these pipeline stages, but at least at stage destruction we
> definitely free them. Probably time to fix that...
> Those were the only 3 I've found with a quick grep.
> 

Ok I've got fixes for the one in cso_context and softpipe.

The draw pstipple/aaline ones though are difficult to fix. The code
there is really broken in a dozen ways, and non-trivial to fix up.
I hate these stages for a reason:
- they have to intercept all fs / sampler / view calls, ref count the
views (and great, we're doing it twice due to two stages needing it),
even though it's likely the stage may never get used at all.
- they will only use TEX opcodes and will decide which sampler (and
therefore view) slot to use based solely on the used samplers. The view
slot could already be in use (with dx10 style shaders).
- they don't use the syntax of setting samplers and views (with start
and num) correctly. So the num_samplers/views used isn't actually what
they think it is, albeit it should still work with some luck (but it's
due to this that the loop can't be avoided easily on context destruction).
I gave up on it; if context destruction is a performance-critical
path in your app, you're doing it wrong.

Roland



[Mesa-dev] [Bug 105285] Mesa-utils -8.4 - glxinfo report has a blank value in GL_NUM_SHADING_LANGUAGE_VERSIONS subsection and also the alignment below 4.3 seems not correct

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105285

Bug ID: 105285
   Summary: Mesa-utils -8.4 - glxinfo report has a blank value in
GL_NUM_SHADING_LANGUAGE_VERSIONS subsection and also
the alignment below 4.3 seems not correct
   Product: Mesa
   Version: unspecified
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Other
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: arunsivaraman...@gmail.com
QA Contact: mesa-dev@lists.freedesktop.org

Created attachment 137681
  --> https://bugs.freedesktop.org/attachment.cgi?id=137681&action=edit
glxinfo

Hi

The latest version of glxinfo displays a blank value under the
GL_NUM_SHADING_LANGUAGE_VERSIONS section. According to the report,
GL_NUM_SHADING_LANGUAGE_VERSIONS is 26 including the blank entry; without it,
it should be 25. Also, GL_VERTEX_PROGRAM_ARB and GL_FRAGMENT_PROGRAM_ARB appear
to be aligned under 4.3. I am not sure if that is correct. Please confirm.



Re: [Mesa-dev] [PATCH] nir: fix divide by zero crash during constant folding

2018-02-27 Thread Roland Scheidegger
I think you just have to rely on the default of them being disabled.
It is a bug of an app calling into mesa with non-default float state.

Roland

Am 28.02.2018 um 04:35 schrieb Jason Ekstrand:
> On Tue, Feb 27, 2018 at 7:28 PM, Connor Abbott wrote:
> 
> Floating point division shouldn't signal on division by zero, it
> should just return an appropriately-signed infinity, which seems like
> a sane thing to do, and way better than just returning 0.
> 
> 
> I think that depends on whether or not floating-point exceptions are
> enabled.  Unfortunately, I don't really know how to enable/disable them. :-(
>  
> 
> So we
> shouldn't do this with fdiv. I guess 0 is as good a result as any for
> the integer division, though -- there aren't really any great choices.
> 
> 
> Yeah, 0 is as good as anything.  Honestly, I'm a bit tempted to say that
> integer division by 0 is always 42. :-)
>  
> 
> On Tue, Feb 27, 2018 at 10:07 PM, Timothy Arceri
> <tarc...@itsqueeze.com> wrote:
> > From the GLSL 4.60 spec Section 5.9 (Expressions):
> >
> >    "Dividing by zero does not cause an exception but does result in
> >     an unspecified value."
> >
> > Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
> 
> 
> > ---
> >  src/compiler/nir/nir_opcodes.py | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/compiler/nir/nir_opcodes.py
> b/src/compiler/nir/nir_opcodes.py
> > index 278562b2bd..dcc5b07d05 100644
> > --- a/src/compiler/nir/nir_opcodes.py
> > +++ b/src/compiler/nir/nir_opcodes.py
> > @@ -403,9 +403,9 @@ binop("imul_high", tint32, commutative,
> >  binop("umul_high", tuint32, commutative,
> >        "(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")
> >
> > -binop("fdiv", tfloat, "", "src0 / src1")
> > -binop("idiv", tint, "", "src0 / src1")
> > -binop("udiv", tuint, "", "src0 / src1")
> > +binop("fdiv", tfloat, "", "src1 == 0 ? 0 : (src0 / src1)")
> > +binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
> > +binop("udiv", tuint, "", "src1 == 0 ? 0 : (src0 / src1)")
> >
> >  # returns a boolean representing the carry resulting from the
> addition of
> >  # the two unsigned arguments.
> > --
> > 2.14.3
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org 
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org 
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 
> 
> 
> 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] mesa: Fix tautological compare

2018-02-27 Thread Brian Paul

On 02/27/2018 06:07 PM, Ian Romanick wrote:

From: Ian Romanick 

_BaseFormat is GLenum16, but it is compared with an integer -1.  Even if
_mesa_base_tex_format returned -1, it becomes 0xffff, and the
comparison, after type promotion, will always fail.  Fix this by
explicitly casting both to GLenum16.  This should be more future proof
that casting just the -1.


"than"

Series is, Reviewed-by: Brian Paul 

How about we change _mesa_base_tex_format() to return GL_INVALID_ENUM 
instead of -1?  Or, even 0 like _mesa_base_fbo_format()?


-Brian




CID: 1429517
Fixes: ca721b3 ("mesa: use GLenum16 in a few more places")
Signed-off-by: Ian Romanick 
---
  src/mesa/main/teximage.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index 35ed875..1d7bff5 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -853,7 +853,7 @@ init_teximage_fields_ms(struct gl_context *ctx,
  
 target = img->TexObject->Target;

 img->_BaseFormat = _mesa_base_tex_format( ctx, internalFormat );
-   assert(img->_BaseFormat != -1);
+   assert((GLenum16) img->_BaseFormat != (GLenum16) -1);
 img->InternalFormat = internalFormat;
 img->Border = border;
 img->Width = width;



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: fix divide by zero crash during constant folding

2018-02-27 Thread Jason Ekstrand
On Tue, Feb 27, 2018 at 7:28 PM, Connor Abbott  wrote:

> Floating point division shouldn't signal on division by zero, it
> should just return an appropriately-signed infinity, which seems like
> a sane thing to do, and way better than just returning 0.


I think that depends on whether or not floating-point exceptions are
enabled.  Unfortunately, I don't really know how to enable/disable them. :-(


> So we
> shouldn't do this with fdiv. I guess 0 is as good a result as any for
> the integer division, though -- there aren't really any great choices.
>

Yeah, 0 is as good as anything.  Honestly, I'm a bit tempted to say that
integer division by 0 is always 42. :-)


> On Tue, Feb 27, 2018 at 10:07 PM, Timothy Arceri 
> wrote:
> > From the GLSL 4.60 spec Section 5.9 (Expressions):
> >
> >"Dividing by zero does not cause an exception but does result in
> > an unspecified value."
> >
> > Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
> > ---
> >  src/compiler/nir/nir_opcodes.py | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> > index 278562b2bd..dcc5b07d05 100644
> > --- a/src/compiler/nir/nir_opcodes.py
> > +++ b/src/compiler/nir/nir_opcodes.py
> > @@ -403,9 +403,9 @@ binop("imul_high", tint32, commutative,
> >  binop("umul_high", tuint32, commutative,
> >"(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")
> >
> > -binop("fdiv", tfloat, "", "src0 / src1")
> > -binop("idiv", tint, "", "src0 / src1")
> > -binop("udiv", tuint, "", "src0 / src1")
> > +binop("fdiv", tfloat, "", "src1 == 0 ? 0 : (src0 / src1)")
> > +binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
> > +binop("udiv", tuint, "", "src1 == 0 ? 0 : (src0 / src1)")
> >
> >  # returns a boolean representing the carry resulting from the addition of
> >  # the two unsigned arguments.
> > --
> > 2.14.3
> >
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] nir: fix integer divide by zero crash during constant folding

2018-02-27 Thread Timothy Arceri
From the GLSL 4.60 spec Section 5.9 (Expressions):

   "Dividing by zero does not cause an exception but does result in
an unspecified value."

Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
---
 src/compiler/nir/nir_opcodes.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index 278562b2bd..97da4db28f 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -404,8 +404,8 @@ binop("umul_high", tuint32, commutative,
   "(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")
 
 binop("fdiv", tfloat, "", "src0 / src1")
-binop("idiv", tint, "", "src0 / src1")
-binop("udiv", tuint, "", "src0 / src1")
+binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
+binop("udiv", tuint, "", "src1 == 0 ? 0 : (src0 / src1)")
 
 # returns a boolean representing the carry resulting from the addition of
 # the two unsigned arguments.
-- 
2.14.3
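
For readers who want to see the guard outside the opcode table, here is a
minimal C sketch of what the two changed expressions compute. The function
names fold_idiv and fold_udiv are hypothetical and are not part of Mesa; the
sketch only assumes that the expression strings in nir_opcodes.py are
evaluated as ordinary C, as the patch implies.

```c
#include <stdint.h>

/* Hypothetical stand-in for the folded "idiv" expression:
 * a zero divisor yields 0 instead of reaching the C division,
 * which would be undefined behaviour for integers. */
static int32_t fold_idiv(int32_t src0, int32_t src1)
{
    return src1 == 0 ? 0 : (src0 / src1);
}

/* Hypothetical stand-in for the folded "udiv" expression. */
static uint32_t fold_udiv(uint32_t src0, uint32_t src1)
{
    return src1 == 0 ? 0 : (src0 / src1);
}
```

Any value would satisfy the spec text quoted above, since the result is
unspecified; 0 is simply the value chosen in this patch.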

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: fix divide by zero crash during constant folding

2018-02-27 Thread Jason Ekstrand
On Tue, Feb 27, 2018 at 7:20 PM, Timothy Arceri 
wrote:

> On 28/02/18 14:13, Jason Ekstrand wrote:
>
>> On February 27, 2018 19:11:49 Jason Ekstrand 
>> wrote:
>>
>> On February 27, 2018 19:07:43 Timothy Arceri 
>>> wrote:
>>>
>>> From the GLSL 4.60 spec Section 5.9 (Expressions):

"Dividing by zero does not cause an exception but does result in
 an unspecified value."

 Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
 ---
  src/compiler/nir/nir_opcodes.py | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

 diff --git a/src/compiler/nir/nir_opcodes.py
 b/src/compiler/nir/nir_opcodes.py
 index 278562b2bd..dcc5b07d05 100644
 --- a/src/compiler/nir/nir_opcodes.py
 +++ b/src/compiler/nir/nir_opcodes.py
 @@ -403,9 +403,9 @@ binop("imul_high", tint32, commutative,
  binop("umul_high", tuint32, commutative,
"(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")

 -binop("fdiv", tfloat, "", "src0 / src1")
 -binop("idiv", tint, "", "src0 / src1")
 -binop("udiv", tuint, "", "src0 / src1")
 +binop("fdiv", tfloat, "", "src1 == 0 ? 0 : (src0 / src1)")

>>>
>>> Should this return inf?
>>>
>>
>> Or maybe an unsignaling NaN?  I'm not really sure.
>>
>
> I'm not sure. GLSL IR doesn't even try to catch this currently.
>

According to Wikipedia, it's supposed to result in +-inf:

https://en.wikipedia.org/wiki/IEEE_754#Exception_handling

The D3D10 docs say:

"Divide by 0 produces +/- INF, except 0/0 which results in NaN."

https://msdn.microsoft.com/en-us/library/windows/desktop/cc308050(v=vs.85).aspx

Another option would be to temporarily disable FP exceptions during
constant folding.


>
>> +binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
 +binop("udiv", tuint, "", "src1 == 0 ? 0 : (src0 / src1)")

  # returns a boolean representing the carry resulting from the addition
 of
  # the two unsigned arguments.
 --
 2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 54763] glCopyPixels dirties the context and all rendering is slower after that

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=54763

Daniel van Vugt  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|NEW |RESOLVED

--- Comment #8 from Daniel van Vugt  ---
I'm no longer involved in this work so can't verify the bug any more.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: fix divide by zero crash during constant folding

2018-02-27 Thread Connor Abbott
Floating point division shouldn't signal on division by zero, it
should just return an appropriately-signed infinity, which seems like
a sane thing to do, and way better than just returning 0. So we
shouldn't do this with fdiv. I guess 0 is as good a result as any for
the integer division, though -- there aren't really any great choices.
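
A small C sketch of the behaviour described above, assuming an IEEE 754
platform on which floating-point exceptions are not set to trap (the usual
default, but still an assumption here) and default compiler flags (no
-ffast-math). The helper names are hypothetical; nothing below is Mesa code.

```c
#include <float.h>
#include <stdbool.h>

/* Unguarded float division, as in the original "fdiv" expression.
 * The volatile copies keep the compiler from folding the division
 * at build time, so the hardware result is what gets observed. */
static float fdiv_unguarded(float src0, float src1)
{
    volatile float a = src0, b = src1;
    return a / b;
}

/* Classification helpers written without <math.h>. */
static bool is_pos_inf(float x) { return x > FLT_MAX; }
static bool is_neg_inf(float x) { return x < -FLT_MAX; }
static bool is_nan(float x)     { return x != x; }
```

Under those assumptions 1/0 gives +inf, -1/0 gives -inf, and 0/0 gives NaN,
with no guard needed around the division.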

On Tue, Feb 27, 2018 at 10:07 PM, Timothy Arceri  wrote:
> From the GLSL 4.60 spec Section 5.9 (Expressions):
>
>"Dividing by zero does not cause an exception but does result in
> an unspecified value."
>
> Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
> ---
>  src/compiler/nir/nir_opcodes.py | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index 278562b2bd..dcc5b07d05 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -403,9 +403,9 @@ binop("imul_high", tint32, commutative,
>  binop("umul_high", tuint32, commutative,
>"(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")
>
> -binop("fdiv", tfloat, "", "src0 / src1")
> -binop("idiv", tint, "", "src0 / src1")
> -binop("udiv", tuint, "", "src0 / src1")
> +binop("fdiv", tfloat, "", "src1 == 0 ? 0 : (src0 / src1)")
> +binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
> +binop("udiv", tuint, "", "src1 == 0 ? 0 : (src0 / src1)")
>
>  # returns a boolean representing the carry resulting from the addition of
>  # the two unsigned arguments.
> --
> 2.14.3
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: fix divide by zero crash during constant folding

2018-02-27 Thread Timothy Arceri

On 28/02/18 14:13, Jason Ekstrand wrote:

On February 27, 2018 19:11:49 Jason Ekstrand  wrote:

On February 27, 2018 19:07:43 Timothy Arceri  
wrote:



From the GLSL 4.60 spec Section 5.9 (Expressions):

   "Dividing by zero does not cause an exception but does result in
    an unspecified value."

Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
---
 src/compiler/nir/nir_opcodes.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/compiler/nir/nir_opcodes.py 
b/src/compiler/nir/nir_opcodes.py

index 278562b2bd..dcc5b07d05 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -403,9 +403,9 @@ binop("imul_high", tint32, commutative,
 binop("umul_high", tuint32, commutative,
   "(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")

-binop("fdiv", tfloat, "", "src0 / src1")
-binop("idiv", tint, "", "src0 / src1")
-binop("udiv", tuint, "", "src0 / src1")
+binop("fdiv", tfloat, "", "src1 == 0 ? 0 : (src0 / src1)")


Should this return inf?


Or maybe an unsignaling NaN?  I'm not really sure.


I'm not sure. GLSL IR doesn't even try to catch this currently.




+binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
+binop("udiv", tuint, "", "src1 == 0 ? 0 : (src0 / src1)")

 # returns a boolean representing the carry resulting from the 
addition of

 # the two unsigned arguments.
--
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: fix divide by zero crash during constant folding

2018-02-27 Thread Jason Ekstrand

On February 27, 2018 19:11:49 Jason Ekstrand  wrote:


On February 27, 2018 19:07:43 Timothy Arceri  wrote:


From the GLSL 4.60 spec Section 5.9 (Expressions):

   "Dividing by zero does not cause an exception but does result in
an unspecified value."

Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
---
 src/compiler/nir/nir_opcodes.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index 278562b2bd..dcc5b07d05 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -403,9 +403,9 @@ binop("imul_high", tint32, commutative,
 binop("umul_high", tuint32, commutative,
   "(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")

-binop("fdiv", tfloat, "", "src0 / src1")
-binop("idiv", tint, "", "src0 / src1")
-binop("udiv", tuint, "", "src0 / src1")
+binop("fdiv", tfloat, "", "src1 == 0 ? 0 : (src0 / src1)")


Should this return inf?


Or maybe an unsignaling NaN?  I'm not really sure.


+binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
+binop("udiv", tuint, "", "src1 == 0 ? 0 : (src0 / src1)")

 # returns a boolean representing the carry resulting from the addition of
 # the two unsigned arguments.
--
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: fix divide by zero crash during constant folding

2018-02-27 Thread Jason Ekstrand

On February 27, 2018 19:07:43 Timothy Arceri  wrote:


From the GLSL 4.60 spec Section 5.9 (Expressions):

   "Dividing by zero does not cause an exception but does result in
an unspecified value."

Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
---
 src/compiler/nir/nir_opcodes.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index 278562b2bd..dcc5b07d05 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -403,9 +403,9 @@ binop("imul_high", tint32, commutative,
 binop("umul_high", tuint32, commutative,
   "(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")

-binop("fdiv", tfloat, "", "src0 / src1")
-binop("idiv", tint, "", "src0 / src1")
-binop("udiv", tuint, "", "src0 / src1")
+binop("fdiv", tfloat, "", "src1 == 0 ? 0 : (src0 / src1)")


Should this return inf?


+binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
+binop("udiv", tuint, "", "src1 == 0 ? 0 : (src0 / src1)")

 # returns a boolean representing the carry resulting from the addition of
 # the two unsigned arguments.
--
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir: fix divide by zero crash during constant folding

2018-02-27 Thread Timothy Arceri
From the GLSL 4.60 spec Section 5.9 (Expressions):

   "Dividing by zero does not cause an exception but does result in
an unspecified value."

Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure"

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
---
 src/compiler/nir/nir_opcodes.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index 278562b2bd..dcc5b07d05 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -403,9 +403,9 @@ binop("imul_high", tint32, commutative,
 binop("umul_high", tuint32, commutative,
   "(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")
 
-binop("fdiv", tfloat, "", "src0 / src1")
-binop("idiv", tint, "", "src0 / src1")
-binop("udiv", tuint, "", "src0 / src1")
+binop("fdiv", tfloat, "", "src1 == 0 ? 0 : (src0 / src1)")
+binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
+binop("udiv", tuint, "", "src1 == 0 ? 0 : (src0 / src1)")
 
 # returns a boolean representing the carry resulting from the addition of
 # the two unsigned arguments.
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] intel/l3: Don't allocate SLM partition on ICL+.

2018-02-27 Thread Jordan Justen
Reviewed-by: Jordan Justen 

On 2018-02-27 13:38:23, Francisco Jerez wrote:
> SLM has a chunk of special-purpose memory separate from L3 on ICL+, we
> shouldn't allocate a partition for it on L3 anymore.
> ---
>  src/intel/common/gen_l3_config.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/intel/common/gen_l3_config.c 
> b/src/intel/common/gen_l3_config.c
> index aff13c06ec0..7d58ad8d7c8 100644
> --- a/src/intel/common/gen_l3_config.c
> +++ b/src/intel/common/gen_l3_config.c
> @@ -232,7 +232,7 @@ gen_get_default_l3_weights(const struct gen_device_info 
> *devinfo,
>  {
> struct gen_l3_weights w = {{ 0 }};
>  
> -   w.w[GEN_L3P_SLM] = needs_slm;
> +   w.w[GEN_L3P_SLM] = devinfo->gen < 11 && needs_slm;
> w.w[GEN_L3P_URB] = 1.0;
>  
> if (devinfo->gen >= 8) {
> -- 
> 2.16.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
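
The one-line change quoted above can be read as the following standalone
sketch. The enum, struct and function names are hypothetical simplifications
(the real code takes a gen_device_info pointer and has more partitions); only
the gen < 11 condition and the URB weight of 1.0 come from the patch.

```c
#include <stdbool.h>

/* Hypothetical two-entry subset of the L3 partition list. */
enum sketch_l3_partition { SKETCH_L3P_SLM, SKETCH_L3P_URB, SKETCH_L3P_COUNT };

struct sketch_l3_weights {
    float w[SKETCH_L3P_COUNT];
};

/* SLM only gets an L3 weight before gen 11; from gen 11 on it is
 * assumed to live in its own memory, per the commit message. */
static struct sketch_l3_weights
sketch_default_l3_weights(int gen, bool needs_slm)
{
    struct sketch_l3_weights w = {{ 0 }};

    w.w[SKETCH_L3P_SLM] = (gen < 11 && needs_slm) ? 1.0f : 0.0f;
    w.w[SKETCH_L3P_URB] = 1.0f;
    return w;
}
```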
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] mesa: Silence unused parameter warning in compressedteximage_only_format

2018-02-27 Thread Jordan Justen
Series Reviewed-by: Jordan Justen 

On 2018-02-27 17:06:59, Ian Romanick wrote:
> From: Ian Romanick 
> 
> Passing ctx to compressedteximage_only_format was the only use of the
> ctx parameter in _mesa_format_no_online_compression, so that parameter
> had to go too.
> 
> ../../SOURCE/master/src/mesa/main/teximage.c: In function 
> ‘compressedteximage_only_format’:
> ../../SOURCE/master/src/mesa/main/teximage.c:1355:57: warning: unused 
> parameter ‘ctx’ [-Wunused-parameter]
>  compressedteximage_only_format(const struct gl_context *ctx, GLenum format)
>  ^~~
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/mesa/main/formatquery.c |  2 +-
>  src/mesa/main/teximage.c| 16 
>  src/mesa/main/teximage.h|  2 +-
>  3 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/src/mesa/main/formatquery.c b/src/mesa/main/formatquery.c
> index 834f111..3b000fa 100644
> --- a/src/mesa/main/formatquery.c
> +++ b/src/mesa/main/formatquery.c
> @@ -502,7 +502,7 @@ _is_resource_supported(struct gl_context *ctx, GLenum 
> target,
>/* additional checks for compressed textures */
>if (_mesa_is_compressed_format(ctx, internalformat) &&
>(!_mesa_target_can_be_compressed(ctx, target, internalformat, 
> NULL) ||
> -   _mesa_format_no_online_compression(ctx, internalformat)))
> +   _mesa_format_no_online_compression(internalformat)))
>   return false;
>  
>break;
> diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
> index 928e50d..fbca7a7 100644
> --- a/src/mesa/main/teximage.c
> +++ b/src/mesa/main/teximage.c
> @@ -1352,7 +1352,7 @@ _mesa_test_proxy_teximage(struct gl_context *ctx, 
> GLenum target,
>   * Return true if the format is only valid for glCompressedTexImage.
>   */
>  static bool
> -compressedteximage_only_format(const struct gl_context *ctx, GLenum format)
> +compressedteximage_only_format(GLenum format)
>  {
> switch (format) {
> case GL_PALETTE4_RGB8_OES:
> @@ -1375,11 +1375,11 @@ compressedteximage_only_format(const struct 
> gl_context *ctx, GLenum format)
>   * Return true if the format doesn't support online compression.
>   */
>  bool
> -_mesa_format_no_online_compression(const struct gl_context *ctx, GLenum 
> format)
> +_mesa_format_no_online_compression(GLenum format)
>  {
> return _mesa_is_astc_format(format) ||
>_mesa_is_etc2_format(format) ||
> -  compressedteximage_only_format(ctx, format);
> +  compressedteximage_only_format(format);
>  }
>  
>  /* Writes to an GL error pointer if non-null and returns whether or not the
> @@ -1979,7 +1979,7 @@ texture_error_check( struct gl_context *ctx,
>   "glTexImage%dD(target can't be compressed)", 
> dimensions);
>   return GL_TRUE;
>}
> -  if (_mesa_format_no_online_compression(ctx, internalFormat)) {
> +  if (_mesa_format_no_online_compression(internalFormat)) {
>   _mesa_error(ctx, GL_INVALID_OPERATION,
>   "glTexImage%dD(no compression for format)", dimensions);
>   return GL_TRUE;
> @@ -2252,7 +2252,7 @@ texsubimage_error_check(struct gl_context *ctx, GLuint 
> dimensions,
> }
>  
> if (_mesa_is_format_compressed(texImage->TexFormat)) {
> -  if (_mesa_format_no_online_compression(ctx, texImage->InternalFormat)) 
> {
> +  if (_mesa_format_no_online_compression(texImage->InternalFormat)) {
>   _mesa_error(ctx, GL_INVALID_OPERATION,
> "%s(no compression for format)", callerName);
>   return GL_TRUE;
> @@ -2529,7 +2529,7 @@ copytexture_error_check( struct gl_context *ctx, GLuint 
> dimensions,
>   "glCopyTexImage%dD(target can't be compressed)", 
> dimensions);
>   return GL_TRUE;
>}
> -  if (_mesa_format_no_online_compression(ctx, internalFormat)) {
> +  if (_mesa_format_no_online_compression(internalFormat)) {
>   _mesa_error(ctx, GL_INVALID_OPERATION,
> "glCopyTexImage%dD(no compression for format)", dimensions);
>   return GL_TRUE;
> @@ -2611,7 +2611,7 @@ copytexsubimage_error_check(struct gl_context *ctx, 
> GLuint dimensions,
> }
>  
> if (_mesa_is_format_compressed(texImage->TexFormat)) {
> -  if (_mesa_format_no_online_compression(ctx, texImage->InternalFormat)) 
> {
> +  if (_mesa_format_no_online_compression(texImage->InternalFormat)) {
>   _mesa_error(ctx, GL_INVALID_OPERATION,
> "%s(no compression for format)", caller);
>   return GL_TRUE;
> @@ -4833,7 +4833,7 @@ compressed_subtexture_error_check(struct gl_context 
> *ctx, GLint dims,
>return GL_TRUE;
> }
>  
> -   if (compressedteximage_only_format(ctx, format)) {
> +   if (compressedteximage_only_format(format)) {
>_mesa_error(ctx, GL_INVALID_OPERATION, "%s(format=%s cannot be 
> updated)",
>callerName, _mesa_enum_to

Re: [Mesa-dev] [PATCH v3 06/13] intel/isl: Add support to emit clear value address.

2018-02-27 Thread Rafael Antognolli
On Tue, Feb 27, 2018 at 05:00:24PM -0800, Jordan Justen wrote:
> On 2018-02-26 21:35:42, Jason Ekstrand wrote:
> > On Mon, Feb 26, 2018 at 8:12 PM, Jordan Justen 
> > wrote:
> > 
> > > On 2018-02-26 17:08:12, Jason Ekstrand wrote:
> > > > On Mon, Feb 26, 2018 at 1:14 PM, Jordan Justen <
> > > jordan.l.jus...@intel.com>
> > > > wrote:
> > > >
> > > > > On 2018-02-21 13:45:15, Rafael Antognolli wrote:
> > > > > > +   bool use_clear_address;
> > > > >
> > > > > > I'm still wondering about this field. I think at the end we can just
> > > > > assume that if gen >= 10 and aux_usage != ISL_AUX_USAGE_NONE, then
> > > > > we'll use the address.
> > > > >
> > > >
> > > > That's not going to work if we want to turn this on for blorp, anv, and
> > > > i965 separately.
> > >
> > > I guess this goes to the point I mentioned below. Maybe it makes it
> > > easier to break it up for enabling it. (I'm not certain that we
> > > couldn't slice it up another way, but the argument seems fine.)
> > >
> > > But, after that, is it needed? If it's always enabled when gen >= 10
> > > and aux_usage != ISL_AUX_USAGE_NONE, then once everything is in place,
> > > then isl can easily check for that condition, and there's no purpose
> > > for use_clear_address. Correct?
> > >
> > 
> > I suppose.  Once everything's moved over, there's really no reason to keep
> > it around on gen10.
> > 
> > > I also wonder if clear_address is needed in the info struct. It did
> > > not look like blorp set it by the end of the series, yet blorp was
> > > enabling the feature. (I'm guessing that the reloc must be handling the
> > > aux buf offset for blorp.)
> > >
> > 
> > Yes and no.  It's not really used today but it is needed the moment we get
> > rid of relocations.
> 
> Ah. Good point.
> 
> I guess the pinned address could still be written by the 'reloc'
> function, without emitting the reloc. That would also mean
> clear_address wouldn't be needed in the isl info struct. The name
> 'reloc' starts adding confusion in this case, and maybe something like
> emit_bo_address might make more sense at that point.
> 
> Or, we could make the reloc functions total no-ops with pinned
> addresses, in which case clear_address would be needed.
> 
> Based on this, we might want to decide if patch 7 should be doing
> anything with clear_address... (Right now it doesn't set it.)

If that's the case, then we not only need to set clear_address, but also
address and aux_address (they are also not being set there).

> > >
> > > > > I think you mentioned that it could be tough to implement the support in
> > > > > steps if we had an all or nothing enabling of the address usage. But,
> > > > > does that mean that at the end of your series you could add a patch to
> > > > > remove this `use_clear_address` field?
> > > > >
> > > > > Maybe as a test in jenkins, you could add a patch that asserts that if
> > > > > gen >= 10 and there is an aux_buffer, then use_clear_address==true in
> > > > > your current series.
> > > > >
> > > > > -Jordan
> > >
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] mesa: Silence remaining unused parameter warnings in teximage.c

2018-02-27 Thread Ian Romanick
From: Ian Romanick 

src/mesa/main/teximage.c: In function ‘_mesa_test_proxy_teximage’:
src/mesa/main/teximage.c:1301:51: warning: unused parameter ‘level’ 
[-Wunused-parameter]
   GLuint numLevels, GLint level,
   ^
src/mesa/main/teximage.c: In function ‘texsubimage_error_check’:
src/mesa/main/teximage.c:2186:30: warning: unused parameter ‘dsa’ 
[-Wunused-parameter]
 bool dsa, const char *callerName)
  ^~~
src/mesa/main/teximage.c: In function ‘copytexture_error_check’:
src/mesa/main/teximage.c:2297:32: warning: unused parameter ‘width’ 
[-Wunused-parameter]
  GLint width, GLint height, GLint border )
^
src/mesa/main/teximage.c:2297:45: warning: unused parameter ‘height’ 
[-Wunused-parameter]
  GLint width, GLint height, GLint border )
 ^~
src/mesa/main/teximage.c: In function ‘check_rtt_cb’:
src/mesa/main/teximage.c:2679:21: warning: unused parameter ‘key’ 
[-Wunused-parameter]
 check_rtt_cb(GLuint key, void *data, void *userData)
 ^~~
src/mesa/main/teximage.c: In function ‘override_internal_format’:
src/mesa/main/teximage.c:2756:55: warning: unused parameter ‘width’ 
[-Wunused-parameter]
 override_internal_format(GLenum internalFormat, GLint width, GLint height)
   ^
src/mesa/main/teximage.c:2756:68: warning: unused parameter ‘height’ 
[-Wunused-parameter]
 override_internal_format(GLenum internalFormat, GLint width, GLint height)
^~
src/mesa/main/teximage.c: In function ‘texture_sub_image’:
src/mesa/main/teximage.c:3293:24: warning: unused parameter ‘dsa’ 
[-Wunused-parameter]
   bool dsa)
^~~
src/mesa/main/teximage.c: In function ‘can_avoid_reallocation’:
src/mesa/main/teximage.c:3788:53: warning: unused parameter ‘x’ 
[-Wunused-parameter]
mesa_format texFormat, GLint x, GLint y, GLsizei width,
 ^
src/mesa/main/teximage.c:3788:62: warning: unused parameter ‘y’ 
[-Wunused-parameter]
mesa_format texFormat, GLint x, GLint y, GLsizei width,
  ^
src/mesa/main/teximage.c: In function ‘valid_texstorage_ms_parameters’:
src/mesa/main/teximage.c:5987:40: warning: unused parameter ‘samples’ 
[-Wunused-parameter]
GLsizei samples, unsigned dims)
^~~

Signed-off-by: Ian Romanick 
---
 src/mesa/main/teximage.c | 42 +-
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index fbca7a7..35ed875 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -1298,7 +1298,7 @@ error_check_subtexture_dimensions(struct gl_context *ctx, 
GLuint dims,
  */
 GLboolean
 _mesa_test_proxy_teximage(struct gl_context *ctx, GLenum target,
-  GLuint numLevels, GLint level,
+  GLuint numLevels, MAYBE_UNUSED GLint level,
   mesa_format format, GLuint numSamples,
   GLint width, GLint height, GLint depth)
 {
@@ -2183,7 +2183,7 @@ texsubimage_error_check(struct gl_context *ctx, GLuint 
dimensions,
 GLint xoffset, GLint yoffset, GLint zoffset,
 GLint width, GLint height, GLint depth,
 GLenum format, GLenum type, const GLvoid *pixels,
-bool dsa, const char *callerName)
+const char *callerName)
 {
struct gl_texture_image *texImage;
GLenum err;
@@ -2294,7 +2294,7 @@ texsubimage_error_check(struct gl_context *ctx, GLuint 
dimensions,
 static GLboolean
 copytexture_error_check( struct gl_context *ctx, GLuint dimensions,
  GLenum target, GLint level, GLint internalFormat,
- GLint width, GLint height, GLint border )
+ GLint border )
 {
GLint baseFormat;
GLint rb_base_format;
@@ -2676,7 +2676,7 @@ struct cb_info
  * Check render to texture callback.  Called from _mesa_HashWalk().
  */
 static void
-check_rtt_cb(GLuint key, void *data, void *userData)
+check_rtt_cb(UNUSED GLuint key, void *data, void *userData)
 {
struct gl_framebuffer *fb = (struct gl_framebuffer *) data;
const struct cb_info *info = (struct cb_info *) userData;
@@ -2753,7 +2753,8 @@ check_gen_mipmap(struct gl_context *ctx, GLenum target,
 
 /** Debug helper: override the user-requested internal format */
 static GLenum
-override_internal_format(GLenum internalFormat, GLint width, GLint hei

[Mesa-dev] [PATCH 3/3] mesa: Fix tautological compare

2018-02-27 Thread Ian Romanick
From: Ian Romanick 

_BaseFormat is GLenum16, but it is compared with an integer -1.  Even if
_mesa_base_tex_format returned -1, it becomes 0xffff, and the
comparison, after type promotion, will always fail.  Fix this by
explicitly casting both to GLenum16.  This should be more future proof
than casting just the -1.

CID: 1429517
Fixes: ca721b3 ("mesa: use GLenum16 in a few more places")
Signed-off-by: Ian Romanick 
---
 src/mesa/main/teximage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index 35ed875..1d7bff5 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -853,7 +853,7 @@ init_teximage_fields_ms(struct gl_context *ctx,
 
target = img->TexObject->Target;
img->_BaseFormat = _mesa_base_tex_format( ctx, internalFormat );
-   assert(img->_BaseFormat != -1);
+   assert((GLenum16) img->_BaseFormat != (GLenum16) -1);
img->InternalFormat = internalFormat;
img->Border = border;
img->Width = width;
-- 
2.9.5
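
To make the promotion issue concrete, here is a small C sketch. It assumes
GLenum16 is a 16-bit unsigned type and uses uint16_t as a stand-in; the
typedef and function names are hypothetical and not taken from Mesa.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t sketch_enum16;   /* stand-in for GLenum16 */

/* Pre-patch form: the 16-bit operand is promoted to int, so its value
 * lies in [0, 65535] and can never compare equal to -1. The check is
 * therefore true for every possible input. */
static bool differs_after_promotion(sketch_enum16 base_format)
{
    int sentinel = -1;
    return base_format != sentinel;
}

/* Post-patch form: both sides are narrowed to 16 bits first, so a
 * stored -1 (0xffff) is recognised as the sentinel. */
static bool differs_after_cast(sketch_enum16 base_format)
{
    return (sketch_enum16) base_format != (sketch_enum16) -1;
}
```

With a stored sentinel of 0xffff the first helper still reports "differs",
which is why the original assert could never fire; the second reports the
match.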

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] mesa: Silence unused parameter warning in compressedteximage_only_format

2018-02-27 Thread Ian Romanick
From: Ian Romanick 

Passing ctx to compressedteximage_only_format was the only use of the
ctx parameter in _mesa_format_no_online_compression, so that parameter
had to go too.

../../SOURCE/master/src/mesa/main/teximage.c: In function 
‘compressedteximage_only_format’:
../../SOURCE/master/src/mesa/main/teximage.c:1355:57: warning: unused parameter 
‘ctx’ [-Wunused-parameter]
 compressedteximage_only_format(const struct gl_context *ctx, GLenum format)
 ^~~
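
As a rough illustration of the two ways such a warning can be silenced, here
is a C sketch with hypothetical names throughout. SKETCH_UNUSED stands in
for an unused-parameter annotation and is assumed to map to the GCC/Clang
attribute; the enum values are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__GNUC__)
#define SKETCH_UNUSED __attribute__((unused))
#else
#define SKETCH_UNUSED
#endif

/* Hypothetical format values, for illustration only. */
enum sketch_format { SKETCH_PALETTE4 = 1, SKETCH_RGBA = 2 };

/* Internal helper: nothing constrains its signature, so the unused
 * context parameter is simply removed, as this patch does. */
static bool sketch_palette_only_format(enum sketch_format format)
{
    return format == SKETCH_PALETTE4;
}

/* Callback: the signature is dictated by whoever walks the table,
 * so an unused parameter is annotated instead of removed. */
static void sketch_count_cb(SKETCH_UNUSED unsigned key, void *data,
                            void *user_data)
{
    int *count = user_data;

    if (data != NULL)
        *count += 1;
}
```

Dropping the parameter ripples out to every caller, which is why the ctx
argument of the wrapper function had to go as well.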

Signed-off-by: Ian Romanick 
---
 src/mesa/main/formatquery.c |  2 +-
 src/mesa/main/teximage.c| 16 
 src/mesa/main/teximage.h|  2 +-
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/mesa/main/formatquery.c b/src/mesa/main/formatquery.c
index 834f111..3b000fa 100644
--- a/src/mesa/main/formatquery.c
+++ b/src/mesa/main/formatquery.c
@@ -502,7 +502,7 @@ _is_resource_supported(struct gl_context *ctx, GLenum 
target,
   /* additional checks for compressed textures */
   if (_mesa_is_compressed_format(ctx, internalformat) &&
   (!_mesa_target_can_be_compressed(ctx, target, internalformat, NULL) 
||
-   _mesa_format_no_online_compression(ctx, internalformat)))
+   _mesa_format_no_online_compression(internalformat)))
  return false;
 
   break;
diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index 928e50d..fbca7a7 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -1352,7 +1352,7 @@ _mesa_test_proxy_teximage(struct gl_context *ctx, GLenum 
target,
  * Return true if the format is only valid for glCompressedTexImage.
  */
 static bool
-compressedteximage_only_format(const struct gl_context *ctx, GLenum format)
+compressedteximage_only_format(GLenum format)
 {
switch (format) {
case GL_PALETTE4_RGB8_OES:
@@ -1375,11 +1375,11 @@ compressedteximage_only_format(const struct gl_context 
*ctx, GLenum format)
  * Return true if the format doesn't support online compression.
  */
 bool
-_mesa_format_no_online_compression(const struct gl_context *ctx, GLenum format)
+_mesa_format_no_online_compression(GLenum format)
 {
return _mesa_is_astc_format(format) ||
   _mesa_is_etc2_format(format) ||
-  compressedteximage_only_format(ctx, format);
+  compressedteximage_only_format(format);
 }
 
 /* Writes to an GL error pointer if non-null and returns whether or not the
@@ -1979,7 +1979,7 @@ texture_error_check( struct gl_context *ctx,
  "glTexImage%dD(target can't be compressed)", dimensions);
  return GL_TRUE;
   }
-  if (_mesa_format_no_online_compression(ctx, internalFormat)) {
+  if (_mesa_format_no_online_compression(internalFormat)) {
  _mesa_error(ctx, GL_INVALID_OPERATION,
  "glTexImage%dD(no compression for format)", dimensions);
  return GL_TRUE;
@@ -2252,7 +2252,7 @@ texsubimage_error_check(struct gl_context *ctx, GLuint 
dimensions,
}
 
if (_mesa_is_format_compressed(texImage->TexFormat)) {
-  if (_mesa_format_no_online_compression(ctx, texImage->InternalFormat)) {
+  if (_mesa_format_no_online_compression(texImage->InternalFormat)) {
  _mesa_error(ctx, GL_INVALID_OPERATION,
"%s(no compression for format)", callerName);
  return GL_TRUE;
@@ -2529,7 +2529,7 @@ copytexture_error_check( struct gl_context *ctx, GLuint 
dimensions,
  "glCopyTexImage%dD(target can't be compressed)", 
dimensions);
  return GL_TRUE;
   }
-  if (_mesa_format_no_online_compression(ctx, internalFormat)) {
+  if (_mesa_format_no_online_compression(internalFormat)) {
  _mesa_error(ctx, GL_INVALID_OPERATION,
"glCopyTexImage%dD(no compression for format)", dimensions);
  return GL_TRUE;
@@ -2611,7 +2611,7 @@ copytexsubimage_error_check(struct gl_context *ctx, 
GLuint dimensions,
}
 
if (_mesa_is_format_compressed(texImage->TexFormat)) {
-  if (_mesa_format_no_online_compression(ctx, texImage->InternalFormat)) {
+  if (_mesa_format_no_online_compression(texImage->InternalFormat)) {
  _mesa_error(ctx, GL_INVALID_OPERATION,
"%s(no compression for format)", caller);
  return GL_TRUE;
@@ -4833,7 +4833,7 @@ compressed_subtexture_error_check(struct gl_context *ctx, 
GLint dims,
   return GL_TRUE;
}
 
-   if (compressedteximage_only_format(ctx, format)) {
+   if (compressedteximage_only_format(format)) {
   _mesa_error(ctx, GL_INVALID_OPERATION, "%s(format=%s cannot be updated)",
   callerName, _mesa_enum_to_string(format));
   return GL_TRUE;
diff --git a/src/mesa/main/teximage.h b/src/mesa/main/teximage.h
index 2e950bf..fbc69df 100644
--- a/src/mesa/main/teximage.h
+++ b/src/mesa/main/teximage.h
@@ -213,7 +213,7 @@ _mesa_legal_texture_base_format_for_target(struct 
gl_context *ctx,
   

Re: [Mesa-dev] [PATCH v3 06/13] intel/isl: Add support to emit clear value address.

2018-02-27 Thread Jordan Justen
On 2018-02-26 21:35:42, Jason Ekstrand wrote:
> On Mon, Feb 26, 2018 at 8:12 PM, Jordan Justen 
> wrote:
> 
> > On 2018-02-26 17:08:12, Jason Ekstrand wrote:
> > > On Mon, Feb 26, 2018 at 1:14 PM, Jordan Justen <
> > jordan.l.jus...@intel.com>
> > > wrote:
> > >
> > > > On 2018-02-21 13:45:15, Rafael Antognolli wrote:
> > > > > +   bool use_clear_address;
> > > >
> > > > I'm still wondering about this field. I think at the end we can just
> > > > assume that if gen >= 10 and aux_usage != ISL_AUX_USAGE_NONE, then
> > > > we'll use the address.
> > > >
> > >
> > > That's not going to work if we want to turn this on for blorp, anv, and
> > > i965 separately.
> >
> > I guess this goes to the point I mentioned below. Maybe it makes it
> > easier to break it up for enabling it. (I'm not certain that we
> > couldn't slice it up another way, but the argument seems fine.)
> >
> > But, after that, is it needed? If it's always enabled when gen >= 10
> > and aux_usage != ISL_AUX_USAGE_NONE, then once everything is in place,
> > then isl can easily check for that condition, and there's no purpose
> > for use_clear_address. Correct?
> >
> 
> I suppose.  Once everything's moved over, there's really no reason to keep
> it around on gen10.
> 
> > I also wonder if clear_address is needed in the info struct. It did
> > not look like blorp set it by the end of the series, yet blorp was
> > enabling the feature. (I'm guessing that the reloc must be handling the
> > aux buf offset for blorp.)
> >
> 
> Yes and no.  It's not really used today but it is needed the moment we get
> rid of relocations.

Ah. Good point.

I guess the pinned address could still be written by the 'reloc'
function, without emitting the reloc. That would also mean
clear_address wouldn't be needed in the isl info struct. The name
'reloc' starts adding confusion in this case, and maybe something like
emit_bo_address might make more sense at that point.

Or, we could make the reloc functions total no-ops with pinned
addresses, in which case clear_address would be needed.

Based on this, we might want to decide if patch 7 should be doing
anything with clear_address... (Right now it doesn't set it.)

-Jordan

> >
> > > > I think you mentioned that it could be tough to implement the support in
> > > > steps if we had an all or nothing enabling of the address usage. But,
> > > > does that mean that at the end of your series you could add a patch to
> > > > remove this `use_clear_address` field?
> > > >
> > > > Maybe as a test in jenkins, you could add a patch that asserts that if
> > > > gen >= 10 and there is an aux_buffer, then use_clear_address==true in
> > > > your current series.
> > > >
> > > > -Jordan
> >
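
The jenkins check suggested above (assert that use_clear_address is set
whenever gen >= 10 and an aux buffer is in use) could look roughly like the
sketch below. It is not isl code: struct surf_info, enum aux_usage and the
helper names are hypothetical stand-ins, and the condition is taken from the
discussion in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the isl aux usage enum; only two values. */
enum aux_usage {
   AUX_USAGE_NONE,
   AUX_USAGE_CCS_E,
};

/* Hypothetical subset of the surface-state init info. */
struct surf_info {
   int gen;
   enum aux_usage aux_usage;
   bool use_clear_address;
};

/* The condition proposed in the thread for deriving the flag. */
static bool
expects_clear_address(int gen, enum aux_usage aux_usage)
{
   return gen >= 10 && aux_usage != AUX_USAGE_NONE;
}

/* True when the explicit flag carries no extra information, i.e. it
 * matches what could be derived from gen and aux_usage alone.
 */
static bool
clear_address_flag_is_redundant(const struct surf_info *info)
{
   return info->use_clear_address ==
          expects_clear_address(info->gen, info->aux_usage);
}
```

If such a check held for every surface at the end of the series, the field
could be dropped as discussed.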


Re: [Mesa-dev] [PATCH v3 12/13] anv: Emit the fast clear color address, instead of value.

2018-02-27 Thread Jason Ekstrand
On Tue, Feb 27, 2018 at 4:45 PM, Rafael Antognolli <
rafael.antogno...@intel.com> wrote:

> On Tue, Feb 27, 2018 at 02:58:01PM -0800, Jason Ekstrand wrote:
> > On Wed, Feb 21, 2018 at 1:45 PM, Rafael Antognolli <
> rafael.antogno...@intel.com
> > > wrote:
> >
> > On Gen10+, instead of copying the clear color from the state buffer to
> > the surface state, just use the address of the state buffer in the
> > surface state directly. This way we can avoid the copy from state buffer
> > to surface state.
> >
> > Signed-off-by: Rafael Antognolli 
> > ---
> >  src/intel/vulkan/anv_image.c   | 19 ++
> >  src/intel/vulkan/anv_private.h |  5 
> >  src/intel/vulkan/genX_cmd_buffer.c | 52
> ++
> > +---
> >  3 files changed, 72 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/intel/vulkan/anv_image.c
> b/src/intel/vulkan/anv_image.c
> > index 0dafe03442d..6b7ea32cbb3 100644
> > --- a/src/intel/vulkan/anv_image.c
> > +++ b/src/intel/vulkan/anv_image.c
> > @@ -1023,6 +1023,15 @@ anv_image_fill_surface_state(struct
> anv_device
> > *device,
> > const uint64_t aux_address = aux_usage == ISL_AUX_USAGE_NONE ?
> >0 : (image->planes[plane].bo_offset + aux_surface->offset);
> >
> > +   bool use_clear_address = false;
> > +   struct anv_address clear_address = { .bo = NULL };
> > +   state_inout->clear_address = 0;
> > +   if (device->info.gen >= 10 && aux_usage != ISL_AUX_USAGE_NONE &&
> > +   aux_usage != ISL_AUX_USAGE_HIZ) {
> > +  clear_address = anv_image_get_clear_color_addr(device, image,
> > aspect);
> > +  use_clear_address = true;
> > +   }
> > +
> > if (view_usage == ISL_SURF_USAGE_STORAGE_BIT &&
> > !(flags & ANV_IMAGE_VIEW_STATE_STORAGE_WRITE_ONLY) &&
> > !isl_has_matching_typed_storage_image_format(&device->info,
> > @@ -1040,6 +1049,7 @@ anv_image_fill_surface_state(struct anv_device
> > *device,
> >  .mocs = device->default_mocs);
> >state_inout->address = address,
> >state_inout->aux_address = 0;
> > +  state_inout->clear_address = 0;
> > } else {
> >if (view_usage == ISL_SURF_USAGE_STORAGE_BIT &&
> >!(flags & ANV_IMAGE_VIEW_STATE_STORAGE_WRITE_ONLY)) {
> > @@ -1113,6 +1123,8 @@ anv_image_fill_surface_state(struct anv_device
> > *device,
> >.aux_surf = &aux_surface->isl,
> >.aux_usage = aux_usage,
> >.aux_address = aux_address,
> > +  .clear_address = clear_address.offset,
> > +  .use_clear_address = use_clear_address,
> >.mocs = device->default_mocs,
> >.x_offset_sa = tile_x_sa,
> >.y_offset_sa = tile_y_sa);
> > @@ -1134,6 +1146,13 @@ anv_image_fill_surface_state(struct
> anv_device
> > *device,
> >assert((aux_address & 0xfff) == 0);
> >assert(aux_address == (*aux_addr_dw & 0xf000));
> >state_inout->aux_address = *aux_addr_dw;
> > +
> > +  if (device->info.gen >= 10 && clear_address.bo) {
> >
> >
> > Here you use clear_address.bo != NULL but above you use use_clear_address.
> > Probably best to just pick one and stick with it.
> >
> >
> > + uint32_t *clear_addr_dw = state_inout->state.map +
> > +   device->isl_dev.ss.clear_value_offset;
> > + assert((clear_address.offset & 0x3f) == 0);
> > + state_inout->clear_address = *clear_addr_dw;
> > +  }
> > }
> >
> > anv_state_flush(device, state_inout->state);
> > diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
> > index b8c381d2665..5c077987cef 100644
> > --- a/src/intel/vulkan/anv_private.h
> > +++ b/src/intel/vulkan/anv_private.h
> > @@ -1674,6 +1674,11 @@ struct anv_surface_state {
> >  * bits of this address include extra aux information.
> >  */
> > uint64_t aux_address;
> > +   /* Address of the clear color, if any
> > +*
> > +* This address is relative to the start of the BO.
> > +*/
> > +   uint64_t clear_address;
> >  };
> >
> >  /**
> > diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> > index 939a795c2b1..b9e1d50cbe3 100644
> > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > @@ -200,6 +200,16 @@ add_image_view_relocs(struct anv_cmd_buffer
> > *cmd_buffer,
> >if (result != VK_SUCCESS)
> >   anv_batch_set_error(&cmd_buffer->batch, result);
> > }
> > 

Re: [Mesa-dev] [PATCH v3 12/13] anv: Emit the fast clear color address, instead of value.

2018-02-27 Thread Rafael Antognolli
On Tue, Feb 27, 2018 at 02:58:01PM -0800, Jason Ekstrand wrote:
> On Wed, Feb 21, 2018 at 1:45 PM, Rafael Antognolli 
>  > wrote:
> 
> On Gen10+, instead of copying the clear color from the state buffer to
> the surface state, just use the address of the state buffer in the
> surface state directly. This way we can avoid the copy from state buffer
> to surface state.
> 
> Signed-off-by: Rafael Antognolli 
> ---
>  src/intel/vulkan/anv_image.c   | 19 ++
>  src/intel/vulkan/anv_private.h |  5 
>  src/intel/vulkan/genX_cmd_buffer.c | 52 ++
> +---
>  3 files changed, 72 insertions(+), 4 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
> index 0dafe03442d..6b7ea32cbb3 100644
> --- a/src/intel/vulkan/anv_image.c
> +++ b/src/intel/vulkan/anv_image.c
> @@ -1023,6 +1023,15 @@ anv_image_fill_surface_state(struct anv_device
> *device,
> const uint64_t aux_address = aux_usage == ISL_AUX_USAGE_NONE ?
>0 : (image->planes[plane].bo_offset + aux_surface->offset);
> 
> +   bool use_clear_address = false;
> +   struct anv_address clear_address = { .bo = NULL };
> +   state_inout->clear_address = 0;
> +   if (device->info.gen >= 10 && aux_usage != ISL_AUX_USAGE_NONE &&
> +   aux_usage != ISL_AUX_USAGE_HIZ) {
> +  clear_address = anv_image_get_clear_color_addr(device, image,
> aspect);
> +  use_clear_address = true;
> +   }
> +
> if (view_usage == ISL_SURF_USAGE_STORAGE_BIT &&
> !(flags & ANV_IMAGE_VIEW_STATE_STORAGE_WRITE_ONLY) &&
> !isl_has_matching_typed_storage_image_format(&device->info,
> @@ -1040,6 +1049,7 @@ anv_image_fill_surface_state(struct anv_device
> *device,
>  .mocs = device->default_mocs);
>state_inout->address = address,
>state_inout->aux_address = 0;
> +  state_inout->clear_address = 0;
> } else {
>if (view_usage == ISL_SURF_USAGE_STORAGE_BIT &&
>!(flags & ANV_IMAGE_VIEW_STATE_STORAGE_WRITE_ONLY)) {
> @@ -1113,6 +1123,8 @@ anv_image_fill_surface_state(struct anv_device
> *device,
>.aux_surf = &aux_surface->isl,
>.aux_usage = aux_usage,
>.aux_address = aux_address,
> +  .clear_address = clear_address.offset,
> +  .use_clear_address = use_clear_address,
>.mocs = device->default_mocs,
>.x_offset_sa = tile_x_sa,
>.y_offset_sa = tile_y_sa);
> @@ -1134,6 +1146,13 @@ anv_image_fill_surface_state(struct anv_device
> *device,
>assert((aux_address & 0xfff) == 0);
>assert(aux_address == (*aux_addr_dw & 0xf000));
>state_inout->aux_address = *aux_addr_dw;
> +
> +  if (device->info.gen >= 10 && clear_address.bo) {
> 
> 
> Here you use clear_address.bo != NULL but above you use use_clear_address. 
> Probably best to just pick one and stick with it.
>  
> 
> + uint32_t *clear_addr_dw = state_inout->state.map +
> +   device->isl_dev.ss.clear_value_offset;
> + assert((clear_address.offset & 0x3f) == 0);
> + state_inout->clear_address = *clear_addr_dw;
> +  }
> }
> 
> anv_state_flush(device, state_inout->state);
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
> index b8c381d2665..5c077987cef 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -1674,6 +1674,11 @@ struct anv_surface_state {
>  * bits of this address include extra aux information.
>  */
> uint64_t aux_address;
> +   /* Address of the clear color, if any
> +*
> +* This address is relative to the start of the BO.
> +*/
> +   uint64_t clear_address;
>  };
> 
>  /**
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> index 939a795c2b1..b9e1d50cbe3 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -200,6 +200,16 @@ add_image_view_relocs(struct anv_cmd_buffer
> *cmd_buffer,
>if (result != VK_SUCCESS)
>   anv_batch_set_error(&cmd_buffer->batch, result);
> }
> +
> +   if (state.clear_address) {
> +  VkResult result =
> + anv_reloc_list_add(&cmd_buffer->surface_relocs,
> +&cmd_buffer->pool->alloc,
> +state.state.offset + isl_dev->ss.clear_value_
> offset,
> 
> 
> I'm not sure how comfortable I am wit
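
As a side note on the two alignment asserts in the quoted hunk: masking with
0xfff and 0x3f tests for 4096-byte and 64-byte alignment respectively, and
the low bits of the packed dword are then free to carry extra information.
The sketch below shows that pattern in isolation; the helper names are
invented for the example and do not exist in anv or isl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool
is_aligned_pow2(uint64_t addr, uint64_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   return (addr & (align - 1)) == 0;
}

/* Address part of a packed dword: everything above the alignment bits. */
static uint32_t
address_bits(uint32_t dword, uint32_t align)
{
   return dword & ~(align - 1);
}

/* Low-order bits of a packed dword, usable for extra information. */
static uint32_t
low_bits(uint32_t dword, uint32_t align)
{
   return dword & (align - 1);
}
```

For example, is_aligned_pow2(offset, 64) is equivalent to the
(offset & 0x3f) == 0 test in the patch.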

Re: [Mesa-dev] [PATCH v2] anv: set maxResourceSize to the respective value for each generation

2018-02-27 Thread Jason Ekstrand
On Tue, Feb 27, 2018 at 2:43 AM, Samuel Iglesias Gonsálvez <
sigles...@igalia.com> wrote:

> v2:
> - Add the proper values to gen9+ (Jason)
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/intel/vulkan/anv_formats.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/vulkan/anv_formats.c b/src/intel/vulkan/anv_formats.c
> index 9c52ad5acbd..cbc3d357922 100644
> --- a/src/intel/vulkan/anv_formats.c
> +++ b/src/intel/vulkan/anv_formats.c
> @@ -842,6 +842,18 @@ anv_get_image_format_properties(
> */
> }
>
> +   /* Pre-gen9 has a 2 GB limitation of the size in bytes,
> +* gen9 and gen10 have a 256 GB limitation and gen11+
> +* has a 16 TB limitation.
>

Might be worth saying that this comes from the bspec section entitled
"Surface Layout and Tiling".

Reviewed-by: Jason Ekstrand 


> +*/
> +   uint64_t maxResourceSize = 0;
> +   if (devinfo->gen < 9)
> +  maxResourceSize = (uint64_t) 1 << 31;
> +   else if (devinfo->gen < 11)
> +  maxResourceSize = (uint64_t) 1 << 38;
> +   else
> +  maxResourceSize = (uint64_t) 1 << 44;
> +
> *pImageFormatProperties = (VkImageFormatProperties) {
>.maxExtent = maxExtent,
>.maxMipLevels = maxMipLevels,
> @@ -851,7 +863,7 @@ anv_get_image_format_properties(
>/* FINISHME: Accurately calculate
> * VkImageFormatProperties::maxResourceSize.
> */
> -  .maxResourceSize = UINT32_MAX,
> +  .maxResourceSize = maxResourceSize,
> };
>
> if (pYcbcrImageFormatProperties) {
> --
> 2.14.1
>
>
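
For reference, the per-generation limits in the patch work out as below.
This is a standalone sketch rather than anv code: max_resource_size_for_gen
is a hypothetical helper and gen is a plain int instead of devinfo->gen. The
sizes are the ones stated in the patch comment (2^31, 2^38 and 2^44 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Maximum resource size in bytes for a hardware generation, following
 * the limits quoted in the patch.  The shift is done on a 64-bit value:
 * shifting a plain int by 38 or 44 would be undefined behavior.
 */
static uint64_t
max_resource_size_for_gen(int gen)
{
   if (gen < 9)
      return UINT64_C(1) << 31;   /* 2 GB */
   else if (gen < 11)
      return UINT64_C(1) << 38;   /* 256 GB */
   else
      return UINT64_C(1) << 44;   /* 16 TB */
}
```

Note that the gen9+ values no longer fit in 32 bits, which is why the old
UINT32_MAX placeholder could not represent them.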


Re: [Mesa-dev] [PATCH] ac/radv: move load base vertex abi setup to vertex shader.

2018-02-27 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On Wed, Feb 28, 2018 at 12:55 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This was segfaulting:
> dEQP-VK.memory.pipeline_barrier.host_write_index_buffer.1024
>
> Fixes: 8de6f797070 (ac/radeonsi: add load_base_vertex() to the abi)
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/common/ac_nir_to_llvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 44d96d27d7..8b662f884f 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -6920,7 +6920,6 @@ LLVMModuleRef 
> ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
> ctx.gs_max_out_vertices = 
> shaders[i]->info.gs.vertices_out;
> ctx.abi.load_inputs = load_gs_input;
> ctx.abi.emit_primitive = visit_end_primitive;
> -   ctx.abi.load_base_vertex = radv_load_base_vertex;
> } else if (shaders[i]->info.stage == MESA_SHADER_TESS_CTRL) {
> ctx.tcs_outputs_read = shaders[i]->info.outputs_read;
> ctx.tcs_patch_outputs_read = 
> shaders[i]->info.patch_outputs_read;
> @@ -6944,6 +6943,7 @@ LLVMModuleRef 
> ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
> MAX2(1, 
> ctx.shader_info->vs.vgpr_comp_cnt);
> }
> }
> +   ctx.abi.load_base_vertex = radv_load_base_vertex;
> } else if (shaders[i]->info.stage == MESA_SHADER_FRAGMENT) {
> shader_info->fs.can_discard = 
> shaders[i]->info.fs.uses_discard;
> ctx.abi.lookup_interp_param = lookup_interp_param;
> --
> 2.14.3
>


[Mesa-dev] [PATCH] radeon/vcn: use enc profile instead of pic profile

2018-02-27 Thread boyuan.zhang
From: Boyuan Zhang 

Picture profile might not be set in some cases. Therefore, better to use
the profile stored in encoder base.

Signed-off-by: Boyuan Zhang 
---
 src/gallium/drivers/radeon/radeon_vcn_enc.c |  4 ++--
 src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c | 16 
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_vcn_enc.c 
b/src/gallium/drivers/radeon/radeon_vcn_enc.c
index 388a333..dcc25f2 100644
--- a/src/gallium/drivers/radeon/radeon_vcn_enc.c
+++ b/src/gallium/drivers/radeon/radeon_vcn_enc.c
@@ -40,7 +40,7 @@
 
 static void radeon_vcn_enc_get_param(struct radeon_encoder *enc, struct 
pipe_picture_desc *picture)
 {
-   if (u_reduce_video_profile(picture->profile) == 
PIPE_VIDEO_FORMAT_MPEG4_AVC) {
+   if (u_reduce_video_profile(enc->base.profile) == 
PIPE_VIDEO_FORMAT_MPEG4_AVC) {
   struct pipe_h264_enc_picture_desc *pic = (struct 
pipe_h264_enc_picture_desc *)picture;
   enc->enc_pic.picture_type = pic->picture_type;
   enc->enc_pic.frame_num = pic->frame_num;
@@ -54,7 +54,7 @@ static void radeon_vcn_enc_get_param(struct radeon_encoder 
*enc, struct pipe_pic
   enc->enc_pic.crop_right = (align(enc->base.width, 16) - enc->base.width) 
/ 2;
   enc->enc_pic.crop_top = 0;
   enc->enc_pic.crop_bottom = (align(enc->base.height, 16) - 
enc->base.height) / 2;
-   } else if (u_reduce_video_profile(picture->profile) == 
PIPE_VIDEO_FORMAT_HEVC) {
+   } else if (u_reduce_video_profile(enc->base.profile) == 
PIPE_VIDEO_FORMAT_HEVC) {
   struct pipe_h265_enc_picture_desc *pic = (struct 
pipe_h265_enc_picture_desc *)picture;
   enc->enc_pic.picture_type = pic->picture_type;
   enc->enc_pic.frame_num = pic->frame_num;
diff --git a/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c 
b/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c
index 07493d8..9adf40f 100644
--- a/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c
+++ b/src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c
@@ -341,7 +341,7 @@ static void radeon_enc_spec_misc_hevc(struct radeon_encoder 
*enc, struct pipe_pi
 
 static void radeon_enc_rc_session_init(struct radeon_encoder *enc, struct 
pipe_picture_desc *picture)
 {
-   if (u_reduce_video_profile(picture->profile) == 
PIPE_VIDEO_FORMAT_MPEG4_AVC) {
+   if (u_reduce_video_profile(enc->base.profile) == 
PIPE_VIDEO_FORMAT_MPEG4_AVC) {
struct pipe_h264_enc_picture_desc *pic = (struct 
pipe_h264_enc_picture_desc *)picture;
enc->enc_pic.rc_session_init.vbv_buffer_level = 
pic->rate_ctrl.vbv_buf_lv;
switch(pic->rate_ctrl.rate_ctrl_method) {
@@ -359,7 +359,7 @@ static void radeon_enc_rc_session_init(struct 
radeon_encoder *enc, struct pipe_p
default:

enc->enc_pic.rc_session_init.rate_control_method = 
RENCODE_RATE_CONTROL_METHOD_NONE;
}
-   } else if (u_reduce_video_profile(picture->profile) == 
PIPE_VIDEO_FORMAT_HEVC) {
+   } else if (u_reduce_video_profile(enc->base.profile) == 
PIPE_VIDEO_FORMAT_HEVC) {
struct pipe_h265_enc_picture_desc *pic = (struct 
pipe_h265_enc_picture_desc *)picture;
enc->enc_pic.rc_session_init.vbv_buffer_level = 
pic->rc.vbv_buf_lv;
switch(pic->rc.rate_ctrl_method) {
@@ -387,7 +387,7 @@ static void radeon_enc_rc_session_init(struct 
radeon_encoder *enc, struct pipe_p
 
 static void radeon_enc_rc_layer_init(struct radeon_encoder *enc, struct 
pipe_picture_desc *picture)
 {
-   if (u_reduce_video_profile(picture->profile) == 
PIPE_VIDEO_FORMAT_MPEG4_AVC) {
+   if (u_reduce_video_profile(enc->base.profile) == 
PIPE_VIDEO_FORMAT_MPEG4_AVC) {
struct pipe_h264_enc_picture_desc *pic = (struct 
pipe_h264_enc_picture_desc *)picture;
enc->enc_pic.rc_layer_init.target_bit_rate = 
pic->rate_ctrl.target_bitrate;
enc->enc_pic.rc_layer_init.peak_bit_rate = 
pic->rate_ctrl.peak_bitrate;
@@ -397,7 +397,7 @@ static void radeon_enc_rc_layer_init(struct radeon_encoder 
*enc, struct pipe_pic
enc->enc_pic.rc_layer_init.avg_target_bits_per_picture = 
pic->rate_ctrl.target_bits_picture;
enc->enc_pic.rc_layer_init.peak_bits_per_picture_integer = 
pic->rate_ctrl.peak_bits_picture_integer;
enc->enc_pic.rc_layer_init.peak_bits_per_picture_fractional = 
pic->rate_ctrl.peak_bits_picture_fraction;
-   } else if (u_reduce_video_profile(picture->profile) == 
PIPE_VIDEO_FORMAT_HEVC) {
+   } else if (u_reduce_video_profile(enc->base.profile) == 
PIPE_VIDEO_FORMAT_HEVC) {
struct pipe_h265_enc_picture_desc *pic = (struct 
pipe_h265_enc_picture_desc *)picture;
enc->enc_pic.rc_layer_init.target_bit_rate = 
pic->rc.target_bitrate;
enc->enc_pic.rc_layer_init.peak_bit_rate = pic->rc.peak_bitrate;
@@ -1113,7 +1113,7 @@ static void radeon_enc_intra_refresh(struct 
radeon_encoder *enc)

Re: [Mesa-dev] [PATCH 11/12] vbo: Remove vbo_save_vertex_list::buffer_offset.

2018-02-27 Thread Brian Paul

On 02/26/2018 11:12 PM, mathias.froehl...@gmx.net wrote:

From: Mathias Fröhlich 

The buffer_offset is used in aligned_vertex_buffer_offset.
But now that most of these decisions are done in compile_vertex_list
we can work on local variables instead of struct members in the
display list code. Clean that up and remove buffer_offset.


I presume the optimization I implemented here still works after
this change.


If so, and with the minor comments on patch 4, the series LGTM.

Reviewed-by: Brian Paul 

Nice work!

-Brian



Signed-off-by: Mathias Fröhlich 
---
  src/mesa/vbo/vbo_save.h | 14 --
  src/mesa/vbo/vbo_save_api.c | 28 +---
  2 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/src/mesa/vbo/vbo_save.h b/src/mesa/vbo/vbo_save.h
index 6cd82094e3..3ad50e9151 100644
--- a/src/mesa/vbo/vbo_save.h
+++ b/src/mesa/vbo/vbo_save.h
@@ -70,7 +70,6 @@ struct vbo_save_vertex_list {
  */
 fi_type *current_data;
  
-   GLuint buffer_offset;/**< in bytes */

 GLuint vertex_count; /**< number of vertices in this list */
 GLuint wrap_count; /* number of copied vertices at start */
  
@@ -81,19 +80,6 @@ struct vbo_save_vertex_list {

  };
  
  
-/**

- * Is the vertex list's buffer offset an exact multiple of the
- * vertex size (in bytes)?  This is used to check for a vertex array /
- * drawing optimization.
- */
-static inline bool
-aligned_vertex_buffer_offset(const struct vbo_save_vertex_list *node)
-{
-   unsigned vertex_size = node->vertex_size * sizeof(GLfloat); /* in bytes */
-   return vertex_size != 0 && node->buffer_offset % vertex_size == 0;
-}
-
-
  /**
   * Return the stride in bytes of the display list node.
   */
diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c
index e8d027f15c..e6cd04281e 100644
--- a/src/mesa/vbo/vbo_save_api.c
+++ b/src/mesa/vbo/vbo_save_api.c
@@ -527,7 +527,6 @@ compile_vertex_list(struct gl_context *ctx)
  {
 struct vbo_save_context *save = &vbo_context(ctx)->save;
 struct vbo_save_vertex_list *node;
-   GLintptr buffer_offset = 0;
  
 /* Allocate space for this structure in the display list currently

  * being compiled.
@@ -543,10 +542,12 @@ compile_vertex_list(struct gl_context *ctx)
  
 /* Duplicate our template, increment refcounts to the storage structs:

  */
+   const GLsizei stride = save->vertex_size*sizeof(GLfloat);
 node->vertex_size = save->vertex_size;
-   node->buffer_offset =
-  (save->buffer_map - save->vertex_store->buffer_map) * sizeof(GLfloat);
-   if (aligned_vertex_buffer_offset(node)) {
+   GLintptr buffer_offset =
+   (save->buffer_map - save->vertex_store->buffer_map) * sizeof(GLfloat);
+   GLuint start_offset = 0;
+   if (0 < buffer_offset && 0 < stride && buffer_offset % stride == 0) {
/* The vertex size is an exact multiple of the buffer offset.
 * This means that we can use zero-based vertex attribute pointers
 * and specify the start of the primitive with the _mesa_prim::start
@@ -555,9 +556,11 @@ compile_vertex_list(struct gl_context *ctx)
 * changes in drivers.  In particular, the Gallium CSO module will
 * filter out redundant vertex buffer changes.
 */
+  /* We cannot immediately update the primitives as some methods below
+   * still need the uncorrected start vertices
+   */
+  start_offset = buffer_offset/stride;
buffer_offset = 0;
-   } else {
-  buffer_offset = node->buffer_offset;
 }
 GLuint offsets[VBO_ATTRIB_MAX];
 for (unsigned i = 0, offset = 0; i < VBO_ATTRIB_MAX; ++i) {
@@ -596,7 +599,7 @@ compile_vertex_list(struct gl_context *ctx)
if (current_size) {
   node->current_data = malloc(current_size * sizeof(GLfloat));
   if (node->current_data) {
-const char *buffer = (const char *) save->vertex_store->buffer_map;
+const char *buffer = (const char *)save->buffer_map;
  unsigned attr_offset = save->attrsz[0] * sizeof(GLfloat);
  unsigned vertex_offset = 0;
  
@@ -604,8 +607,7 @@ compile_vertex_list(struct gl_context *ctx)

 vertex_offset =
(node->vertex_count - 1) * node->vertex_size * 
sizeof(GLfloat);
  
-memcpy(node->current_data,

-   buffer + node->buffer_offset + vertex_offset + attr_offset,
+memcpy(node->current_data, buffer + vertex_offset + attr_offset,
 current_size * sizeof(GLfloat));
   } else {
  _mesa_error(ctx, GL_OUT_OF_MEMORY, "Current value allocation");
@@ -636,12 +638,8 @@ compile_vertex_list(struct gl_context *ctx)
  * On the other hand the _vbo_loopback_vertex_list call below needs the
  * primitives to be corrected already.
  */
-   if (aligned_vertex_buffer_offset(node)) {
-  const unsigned start_offset =
- node->buffer_offset / (node->vertex_size * sizeof(GLfloat));
- 
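
As an aside, the alignment decision that this patch inlines into
compile_vertex_list can be sketched on its own. When the byte offset into
the buffer is a positive exact multiple of the vertex stride, the offset is
folded into a start vertex and the attribute pointers become zero-based.
struct draw_base and rebase_offset are hypothetical names used only for this
example; they are not part of the vbo code.

```c
#include <assert.h>
#include <stddef.h>

/* Result of the decision: where attribute pointers start (in bytes)
 * and which vertex index the primitives should start from.
 */
struct draw_base {
   size_t buffer_offset;
   unsigned start_vertex;
};

/* Fold an aligned byte offset into a start vertex.  If the offset is
 * zero or not a multiple of the stride, it is left untouched and the
 * start vertex stays 0.
 */
static struct draw_base
rebase_offset(size_t buffer_offset, size_t stride)
{
   struct draw_base base = { buffer_offset, 0 };

   if (buffer_offset > 0 && stride > 0 && buffer_offset % stride == 0) {
      base.start_vertex = (unsigned) (buffer_offset / stride);
      base.buffer_offset = 0;
   }
   return base;
}
```

With a 24-byte stride, an offset of 96 becomes start vertex 4 with a zero
buffer offset, while an offset of 100 is kept as a byte offset.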

Re: [Mesa-dev] [PATCH 04/12] vbo: Implement vbo_loopback_vertex_list in terms of the VAO.

2018-02-27 Thread Brian Paul

Minor nit-picks below.

On 02/26/2018 11:12 PM, mathias.froehl...@gmx.net wrote:

From: Mathias Fröhlich 

Use the information already present in the VAO to replay a display list
node using immediate mode draw commands. Use a handful of helper methods
that will be useful for the next patches also.

Signed-off-by: Mathias Fröhlich 
---
  src/mesa/vbo/vbo_save.h  |  53 ++---
  src/mesa/vbo/vbo_save_api.c  |  42 ++
  src/mesa/vbo/vbo_save_draw.c |  28 +++--
  src/mesa/vbo/vbo_save_loopback.c | 119 +--
  4 files changed, 149 insertions(+), 93 deletions(-)

diff --git a/src/mesa/vbo/vbo_save.h b/src/mesa/vbo/vbo_save.h
index 14ac831ffd..0672557c0e 100644
--- a/src/mesa/vbo/vbo_save.h
+++ b/src/mesa/vbo/vbo_save.h
@@ -100,6 +100,50 @@ aligned_vertex_buffer_offset(const struct 
vbo_save_vertex_list *node)
  }
  
  
+/**

+ * Return the stride in bytes of the display list node.
+ */
+static inline GLsizei
+_vbo_save_get_stride(const struct vbo_save_vertex_list *node)
+{
+   return node->VAO[0]->BufferBinding[0].Stride;
+}
+
+
+/**
+ * Return the first referenced vertex index in the display list node.
+ */
+static inline GLuint
+_vbo_save_get_min_index(const struct vbo_save_vertex_list *node)
+{
+   const struct _mesa_prim *first_prim = &node->prims[0];
+   return first_prim->start;


Maybe just  return node->prims[0].start;



+}
+
+
+/**
+ * Return the last referenced vertex index in the display list node.
+ */
+static inline GLuint
+_vbo_save_get_max_index(const struct vbo_save_vertex_list *node)
+{


assert (node->prim_count > 0);


+   const struct _mesa_prim *last_prim = &node->prims[node->prim_count - 1];
+   return last_prim->start + last_prim->count - 1;
+}
+
+
+/**
+ * Return the vertex count in the display list node.
+ */
+static inline GLuint
+_vbo_save_get_vertex_count(const struct vbo_save_vertex_list *node)
+{
+   const struct _mesa_prim *first_prim = &node->prims[0];


assert (node->prim_count > 0);


+   const struct _mesa_prim *last_prim = &node->prims[node->prim_count - 1];
+   return last_prim->start - first_prim->start + last_prim->count;
+}
+
+
  /* These buffers should be a reasonable size to support upload to
   * hardware.  Current vbo implementation will re-upload on any
   * changes, so don't make too big or apps which dynamically create
@@ -178,13 +222,8 @@ void vbo_save_fallback(struct gl_context *ctx, GLboolean 
fallback);
  
  /* save_loopback.c:

   */
-void vbo_loopback_vertex_list(struct gl_context *ctx,
-  const GLfloat *buffer,
-  const GLubyte *attrsz,
-  const struct _mesa_prim *prim,
-  GLuint prim_count,
-  GLuint wrap_count,
-  GLuint vertex_size);
+void _vbo_loopback_vertex_list(struct gl_context *ctx,
+   const struct vbo_save_vertex_list* node);
  
  /* Callbacks:

   */
diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c
index b6fc7daa35..dc248934f7 100644
--- a/src/mesa/vbo/vbo_save_api.c
+++ b/src/mesa/vbo/vbo_save_api.c
@@ -641,6 +641,22 @@ compile_vertex_list(struct gl_context *ctx)
  
 merge_prims(node->prims, &node->prim_count);
  
+   /* Correct the primitive starts, we can only do this here as copy_vertices

+* and convert_line_loop_to_strip above consume the uncorrected starts.
+* On the other hand the _vbo_loopback_vertex_list call below needs the
+* primitives to be corrected already.
+*/
+   if (aligned_vertex_buffer_offset(node)) {
+  const unsigned start_offset =
+ node->buffer_offset / (node->vertex_size * sizeof(GLfloat));
+  for (unsigned i = 0; i < node->prim_count; i++) {
+ node->prims[i].start += start_offset;
+  }
+  node->start_vertex = start_offset;
+   } else {
+  node->start_vertex = 0;
+   }
+
 /* Deal with GL_COMPILE_AND_EXECUTE:
  */
 if (ctx->ExecuteFlag) {
@@ -648,13 +664,8 @@ compile_vertex_list(struct gl_context *ctx)
  
_glapi_set_dispatch(ctx->Exec);
  
-  const GLfloat *buffer = (const GLfloat *)

- ((const char *) save->vertex_store->buffer_map +
-  node->buffer_offset);
-
-  vbo_loopback_vertex_list(ctx, buffer,
-   node->attrsz, node->prims, node->prim_count,
-   node->wrap_count, node->vertex_size);
+  /* Note that the range of referenced vertices must be mapped already */
+  _vbo_loopback_vertex_list(ctx, node);
  
_glapi_set_dispatch(dispatch);

 }
@@ -693,23 +704,6 @@ compile_vertex_list(struct gl_context *ctx)
save->prim_store = alloc_prim_store();
 }
  
-   /*

-* If the vertex buffer offset is a multiple of the vertex size,
-* we can use the _mesa_prim::start value to indicate where the
-* vertices starts, instead of the buffer offset.  Al

[Mesa-dev] [PATCH] ac/radv: move load base vertex abi setup to vertex shader.

2018-02-27 Thread Dave Airlie
From: Dave Airlie 

This was segfaulting:
dEQP-VK.memory.pipeline_barrier.host_write_index_buffer.1024

Fixes: 8de6f797070 (ac/radeonsi: add load_base_vertex() to the abi)
Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 44d96d27d7..8b662f884f 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -6920,7 +6920,6 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
ctx.gs_max_out_vertices = 
shaders[i]->info.gs.vertices_out;
ctx.abi.load_inputs = load_gs_input;
ctx.abi.emit_primitive = visit_end_primitive;
-   ctx.abi.load_base_vertex = radv_load_base_vertex;
} else if (shaders[i]->info.stage == MESA_SHADER_TESS_CTRL) {
ctx.tcs_outputs_read = shaders[i]->info.outputs_read;
ctx.tcs_patch_outputs_read = 
shaders[i]->info.patch_outputs_read;
@@ -6944,6 +6943,7 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
MAX2(1, 
ctx.shader_info->vs.vgpr_comp_cnt);
}
}
+   ctx.abi.load_base_vertex = radv_load_base_vertex;
} else if (shaders[i]->info.stage == MESA_SHADER_FRAGMENT) {
shader_info->fs.can_discard = 
shaders[i]->info.fs.uses_discard;
ctx.abi.lookup_interp_param = lookup_interp_param;
-- 
2.14.3



Re: [Mesa-dev] [PATCH v3 00/13] Use clear color address in surface state.

2018-02-27 Thread Jason Ekstrand
I've now read through the whole thing and sprinkled some comments.
Over-all, I think it's looking pretty good.

There are a couple of bits of follow-on work that could be done:

 1) Now that we have indirect clear colors, we can allow non-default clear
values for texturing and transfer ops in ANV.  This is mostly a matter of
making anv_blorp hand off the clear address in more cases and then enabling
it in anv_layout_to_fast_clear_type

 2) Add a new I915_FORMAT_MOD_Y_TILED_CCS_CLEAR modifier and wire it up in
Vulkan and GL.  This will require adding a third fake plane for the
CLEAR_COLOR state.

Neither of those need to be done as a prerequisite to landing this and you
don't have to be the one to do them.  However, I thought I should write it
down somewhere.

On Wed, Feb 21, 2018 at 1:45 PM, Rafael Antognolli <
rafael.antogno...@intel.com> wrote:

> Rebase of this series after lots of aux surface changes on anv.
>
> Cc: Jason Ekstrand 
> Cc: Jordan Justen 
> Cc: Topi Pohjolainen 
>
> Rafael Antognolli (13):
>   anv/image: Do not override lower bits of dword.
>   genxml: Preserve fields that share dword space with addresses.
>   intel/genxml: Use a single field for clear color address on gen10.
>   intel/isl: Update size of clear color value.
>   intel/genxml: Add Clear Color struct.
>   intel/isl: Add support to emit clear value address.
>   intel/blorp: Add suport for fast clear address.
>   i965/miptree: Add space to store the clear value in the aux surface.
>   i965/blorp: Update the fast clear color entry buffer.
>   i965/surface_state: Emit the clear color address instead of value.
>   i965/surface_state: Silence warning.
>   anv: Emit the fast clear color address, instead of value.
>   anv: Use clear address for HiZ fast clears too.
>
>  src/intel/blorp/blorp_genX_exec.h| 12 --
>  src/intel/genxml/gen10.xml   | 15 +--
>  src/intel/genxml/gen11.xml   |  7 ++--
>  src/intel/genxml/gen_pack_header.py  |  9 +++-
>  src/intel/isl/isl.c  | 29 +
>  src/intel/isl/isl.h  | 14 +++
>  src/intel/isl/isl_surface_state.c| 18 ++--
>  src/intel/vulkan/anv_device.c| 19 +
>  src/intel/vulkan/anv_image.c | 53
> ++--
>  src/intel/vulkan/anv_private.h   |  8 +++-
>  src/intel/vulkan/genX_cmd_buffer.c   | 52
> +--
>  src/mesa/drivers/dri/i965/brw_blorp.c| 26 
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 18 +++-
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c| 10 +
>  14 files changed, 245 insertions(+), 45 deletions(-)
>
> --
> 2.14.3
>
>


Re: [Mesa-dev] [Mesa-users] GraphicsFuzz metamorphic testing of shader compilers

2018-02-27 Thread Brian Paul

On 02/27/2018 10:27 AM, Hugues Evrard wrote:

Hi all,

I have just reported a Mesa (i965) crash which was triggered by a shader 
from the GraphicsFuzz demo (bug ID 105271), and I wanted to give a 
broader context on that bug report.


We are three academics (Alastair, Paul and myself) from Imperial College 
London who work on metamorphic testing of shader compilers. Last year we 
reported driver bugs across all major GPU vendors and wrote some blog 
posts about this (https://medium.com/@afd_icl/689d15ce922b). 
We also had the chance to visit some driver developers, including the 
Intel Mesa team in Portland -- thanks again for hosting us!


After months of further development and tedious paperwork, we are now 
spinning GraphicsFuzz out of academia with the aim to raise graphics 
drivers reliability across the board. Our first effort focuses on the 
mobile landscape, you can see wrong images and crashes due to graphics 
driver bugs in the Samsung S8s, Nvidia Shields, Google NexusTV and 
Pixels, Huawei Honors and Apple iPhones here (more to come!):
http://www.graphicsfuzz.com/#results 



On the technical side, a summary of our testing approach is here: 
http://www.graphicsfuzz.com/howitworks.html 



We are looking forward to covering the Mesa drivers, but not immediately 
given our current focus on mobile devices. Meanwhile, anyone can easily 
try our demo, which executes 15 of our test shaders, on any 
WebGL2-capable web browser. Today's bug report comes from this demo, 
which crashes i965 (Mesa 17.3.3) on my Intel HD 520:
http://www.graphicsfuzz.com/#demo 



Please let me know if the demo crashes other Mesa setups; we will be 
happy to help provide good reproduction examples!


Many thanks, and if you have questions or feedback, we are all ears :)
Hugues


Hi Hugues,

I've been following your work since I came across it a few months ago. 
It's a great project.  I look forward to finding/fixing any issues in Mesa.


I'm cc'ing your message to the mesa-dev list.  That's probably the 
better place to get the attention of the i965 driver developers.  Also, 
you should probably file a Mesa bug for any issues you find.


-Brian

PS: Sorry about mangling your URLs above.  It's a VMware mail thing. :-(




Re: [Mesa-dev] [PATCH v3 11/13] i965/surface_state: Silence warning.

2018-02-27 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Wed, Feb 21, 2018 at 1:45 PM, Rafael Antognolli <
rafael.antogno...@intel.com> wrote:

> This warning showed up after aux_bo started being used inside
>
> if (use_clear_address) {...
>
> But use_clear_address depends on aux_surf being not null, in which case
> aux_bo would also be set. Make the compiler happy anyway.
>
> Signed-off-by: Rafael Antognolli 
> Reviewed-by: Nanley Chery 
> ---
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index b1f94f181b6..f69984e6f6d 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -150,7 +150,7 @@ brw_emit_surface_state(struct brw_context *brw,
>
> union isl_color_value clear_color = { .u32 = { 0, 0, 0, 0 } };
>
> -   struct brw_bo *aux_bo;
> +   struct brw_bo *aux_bo = NULL;
> struct isl_surf *aux_surf = NULL;
> uint64_t aux_offset = 0;
> switch (aux_usage) {
> --
> 2.14.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH v3 08/13] i965/miptree: Add space to store the clear value in the aux surface.

2018-02-27 Thread Jason Ekstrand
On Wed, Feb 21, 2018 at 1:45 PM, Rafael Antognolli <
rafael.antogno...@intel.com> wrote:

> Similarly to vulkan where we store the clear value in the aux surface,
> we can do the same in GL.
>
> v2: Remove unneeded extra function.
> v3: Use clear_value_state_size instead of clear_value_size.
>
> Signed-off-by: Rafael Antognolli 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 6d35c9d3928..7bdd4d89ccc 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -1676,6 +1676,16 @@ intel_alloc_aux_buffer(struct brw_context *brw,
>return false;
>
> buf->size = aux_surf->size;
> +
> +   const struct gen_device_info *devinfo = &brw->screen->devinfo;
> +   if (devinfo->gen >= 10) {
> +  /* On CNL, instead of setting the clear color in the SURFACE_STATE,
> we
> +   * will set a pointer to a dword somewhere that contains the color.
> So,
> +   * allocate the space for the clear color value here on the aux
> buffer.
> +   */
> +  buf->size += brw->isl_dev.ss.clear_value_state_size;
>

So, this is a bit sticky...  There are two places that we get CCS buffers.
One is here and the other is when the image gets created through the window
system and has the I915_FORMAT_MOD_Y_TILED_CCS modifier.  In the second
(modifier) case, we need to put the clear color in its own BO.  Yeah,
that's a bit silly but there's not much more we can do.


> +   }
> +
> buf->pitch = aux_surf->row_pitch;
> buf->qpitch = isl_surf_get_array_pitch_sa_rows(aux_surf);
>
> --
> 2.14.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH v3 09/13] i965/blorp: Update the fast clear color entry buffer.

2018-02-27 Thread Jason Ekstrand
On Wed, Feb 21, 2018 at 1:45 PM, Rafael Antognolli <
rafael.antogno...@intel.com> wrote:

> On Gen10, whenever the fast clear color changes, update it on the clear
> color entry buffer. This allows us to use it directly when emitting the
> surface state.
>
> Signed-off-by: Rafael Antognolli 
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.c | 26 ++
>  1 file changed, 26 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c
> b/src/mesa/drivers/dri/i965/brw_blorp.c
> index 1d586e5ef38..25bfd118e7e 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> @@ -1142,6 +1142,27 @@ set_write_disables(const struct intel_renderbuffer
> *irb,
> return disables;
>  }
>
> +static void
> +update_fast_clear_color(struct brw_context *brw,
> +struct blorp_surf *surf,
> +const union isl_color_value clear_color)
> +{
> +   assert(surf);
> +   /* Clear values are stored at the same bo as the aux surface, right
> +* after the surface.
> +*/
> +   uint32_t clear_offset = surf->aux_addr.offset + surf->aux_surf->size;
> +   for (int i = 0; i < brw->isl_dev.ss.clear_value_size / 4; i++) {
> +  brw_store_data_imm32(brw, surf->aux_addr.buffer,
> +   clear_offset + i * 4, clear_color.u32[i]);
> +   }
> +
> +   surf->clear_color_addr = (struct blorp_address) {
> +  .buffer = surf->aux_addr.buffer,
> +  .offset = clear_offset,
> +   };
>

I kind-of wonder if we don't want to just make core blorp do this.
Something like

if (GEN_GEN >= 10 && params->dst.clear_color_addr.buffer &&
params->fast_clear_op == ISL_AUX_OP_FAST_CLEAR) {
   /* Fill out the clear color */
}

I'm not actually sure that we do, but it's an idea.  Other than that, this
patch looks fine by me.
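
To make the layout under discussion concrete, here is a minimal standalone sketch of "clear color stored right after the aux surface in the same BO". It is not blorp or i965 code: `struct aux_layout`, `clear_color_offset` and `write_clear_color` are hypothetical names, and a plain byte array stands in for the mapped buffer object.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical description of where the aux surface lives inside a BO. */
struct aux_layout {
   uint64_t aux_offset;       /* start of the aux surface in the BO */
   uint64_t aux_size;         /* size of the aux surface in bytes */
   uint32_t clear_value_size; /* bytes used by the clear color (assumed <= 16) */
};

/* The clear color sits immediately after the aux surface. */
static uint64_t clear_color_offset(const struct aux_layout *l)
{
   return l->aux_offset + l->aux_size;
}

/* Store the clear color one dword at a time, mirroring the loop in the
 * quoted patch, but writing into a CPU mapping instead of emitting GPU
 * store commands. */
static void write_clear_color(uint8_t *bo_map, const struct aux_layout *l,
                              const uint32_t color[4])
{
   const uint64_t base = clear_color_offset(l);
   for (uint32_t i = 0; i < l->clear_value_size / 4 && i < 4; i++)
      memcpy(bo_map + base + i * 4, &color[i], sizeof(color[i]));
}
```

Whichever layer ends up doing the write (the driver or core blorp), the offset arithmetic stays the same; only the mechanism for storing the dwords would change.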


> +}
> +
>  static void
>  do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
>struct gl_renderbuffer *rb, unsigned buf,
> @@ -1238,6 +1259,11 @@ do_single_blorp_clear(struct brw_context *brw,
> struct gl_framebuffer *fb,
>blorp_surf_for_miptree(brw, &surf, irb->mt, irb->mt->aux_usage,
> true,
>   &level, irb->mt_layer, num_layers, isl_tmp);
>
> +  /* update clear color */
> +  const struct gen_device_info *devinfo = &brw->screen->devinfo;
> +  if (devinfo->gen >= 10 && !same_clear_color)
> + update_fast_clear_color(brw, &surf, clear_color);
> +
>/* Ivybrigde PRM Vol 2, Part 1, "11.7 MCS Buffer for Render
> Target(s)":
> *
> *"Any transition from any value in {Clear, Render, Resolve} to
> a
> --
> 2.14.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH 1/4] radv: Implement WaitForFences with !waitAll.

2018-02-27 Thread Dave Airlie
For the series.

Reviewed-by: Dave Airlie 

On 27 February 2018 at 09:37, Bas Nieuwenhuizen  
wrote:
> Nothing to do except using a busy wait loop. At least for old kernels.
>
> A better implementation for newer kernels to come later.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105255
> Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
> ---
>  src/amd/vulkan/radv_device.c | 20 +++-
>  1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index f197b7f484..ca99ba50ba 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -2908,13 +2908,17 @@ void radv_DestroyFence(
> vk_free2(&device->alloc, pAllocator, fence);
>  }
>
> -static uint64_t radv_get_absolute_timeout(uint64_t timeout)
> +
> +static uint64_t radv_get_current_time()
>  {
> -   uint64_t current_time;
> struct timespec tv;
> -
> clock_gettime(CLOCK_MONOTONIC, &tv);
> -   current_time = tv.tv_nsec + tv.tv_sec*1000000000ull;
> +   return tv.tv_nsec + tv.tv_sec*1000000000ull;
> +}
> +
> +static uint64_t radv_get_absolute_timeout(uint64_t timeout)
> +{
> +   uint64_t current_time = radv_get_current_time();
>
> timeout = MIN2(UINT64_MAX - current_time, timeout);
>
> @@ -2932,7 +2936,13 @@ VkResult radv_WaitForFences(
> timeout = radv_get_absolute_timeout(timeout);
>
> if (!waitAll && fenceCount > 1) {
> -   fprintf(stderr, "radv: WaitForFences without waitAll not 
> implemented yet\n");
> +   while(radv_get_current_time() <= timeout) {
> +   for (uint32_t i = 0; i < fenceCount; ++i) {
> +   if (radv_GetFenceStatus(_device, pFences[i]) 
> == VK_SUCCESS)
> +   return VK_SUCCESS;
> +   }
> +   }
> +   return VK_TIMEOUT;
> }
>
> for (uint32_t i = 0; i < fenceCount; ++i) {
> --
> 2.16.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] gallium: remove llvm from ir struct

2018-02-27 Thread Dave Airlie
Acked-by: Dave Airlie 

On 28 February 2018 at 08:36, Timothy Arceri  wrote:
> Ping?
>
> On 02/02/18 08:54, Timothy Arceri wrote:
>>
>> This was added in 425dc4c4b366 but never used. Also since
>> 100796c15c3a native has superseded llvm.
>> ---
>>   src/gallium/include/pipe/p_state.h | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/src/gallium/include/pipe/p_state.h
>> b/src/gallium/include/pipe/p_state.h
>> index 2b56d60b5e..640e6ed26d 100644
>> --- a/src/gallium/include/pipe/p_state.h
>> +++ b/src/gallium/include/pipe/p_state.h
>> @@ -267,7 +267,6 @@ struct pipe_shader_state
>>  /* TODO move tokens into union. */
>>  const struct tgsi_token *tokens;
>>  union {
>> -  void *llvm;
>> void *native;
>> void *nir;
>>  } ir;
>>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 105274] Buffer overflow in gallium/auxiliary/hud/hud_cpufreq.c

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105274

--- Comment #3 from Gert Wollny  ---
The sscanf will accept the number and not check the rest of the string, yes,
but later in the code you have

  char basename[256];
  snprintf(basename, sizeof(basename), 
   "/sys/devices/system/cpu/%s", dp->d_name);

so basename is limited to 256 chars including the terminator 0, and then 

  snprintf(fn, sizeof(fn), "%s/cpufreq/scaling_cur_freq", basename);

with sizeof(fn) == 128, which means fn is no longer than 128 bytes, just like it
is defined in cpufreq_info::sysfs_filename, hence no buffer overflow there.
Then 

  if (stat(fn, &stat_buf) < 0)
 continue;

must pass, which means whatever string was created before, it must still point
to a valid file name that at this point is

  /sys/devices/system/cpu/??/cpufreq/scaling_cur_freq

and this limits the size of ?? == dp->d_name to less than 80 characters (This,
by the way, also means one has to create this special file, it is not
sufficient to add an arbitrary string to  /sys/devices/system/cpu/cpuN). So
yes, one can get a buffer overflow in cpufreq_info::name but it can not
overwrite anything important, because given the memory layout of cpufreq_info
it may temporarily overwrite cpufreq_info::cpu_index and a part of
cpufreq_info::sysfs_filename, but these are set later in add_object anyway, and
because of cpufreq_info::sysfs_filename being a null terminated string,
cpufreq_info::name would also always be null-terminated, limiting the impact of
its later use.

IMHO the better fix is snprintf, because it creates a null terminated string
and doesn't do any post-hoc error handling. Unless someone beats me to it I'll
send a patch tomorrow.
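
As a rough illustration of why snprintf is the safer choice, the sketch below copies a name into a fixed-size buffer and reports truncation. It is not the hud_cpufreq.c code: `NAME_SIZE` and `copy_name` are made-up names, and the buffer size is arbitrary.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical size for a fixed name field; the real cpufreq_info layout
 * is not reproduced here. */
#define NAME_SIZE 16

/* Copy src into dst without ever writing past NAME_SIZE bytes.
 * Returns 1 if the whole string fit, 0 if it had to be truncated.
 * In both cases dst ends up null terminated. */
static int copy_name(char dst[NAME_SIZE], const char *src)
{
   assert(dst != NULL && src != NULL);
   /* snprintf returns the length the full string would have had. */
   const int n = snprintf(dst, NAME_SIZE, "%s", src);
   return n >= 0 && n < NAME_SIZE;
}
```

With a 16-byte buffer, a short name such as "cpu0" is copied whole, while an overlong directory name is cut to 15 characters plus the terminator, so neighbouring struct members are never touched.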

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH v3 12/13] anv: Emit the fast clear color address, instead of value.

2018-02-27 Thread Jason Ekstrand
On Wed, Feb 21, 2018 at 1:45 PM, Rafael Antognolli <
rafael.antogno...@intel.com> wrote:

> On Gen10+, instead of copying the clear color from the state buffer to
> the surface state, just use the address of the state buffer in the
> surface state directly. This way we can avoid the copy from state buffer
> to surface state.
>
> Signed-off-by: Rafael Antognolli 
> ---
>  src/intel/vulkan/anv_image.c   | 19 ++
>  src/intel/vulkan/anv_private.h |  5 
>  src/intel/vulkan/genX_cmd_buffer.c | 52 ++
> +---
>  3 files changed, 72 insertions(+), 4 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
> index 0dafe03442d..6b7ea32cbb3 100644
> --- a/src/intel/vulkan/anv_image.c
> +++ b/src/intel/vulkan/anv_image.c
> @@ -1023,6 +1023,15 @@ anv_image_fill_surface_state(struct anv_device
> *device,
> const uint64_t aux_address = aux_usage == ISL_AUX_USAGE_NONE ?
>0 : (image->planes[plane].bo_offset + aux_surface->offset);
>
> +   bool use_clear_address = false;
> +   struct anv_address clear_address = { .bo = NULL };
> +   state_inout->clear_address = 0;
> +   if (device->info.gen >= 10 && aux_usage != ISL_AUX_USAGE_NONE &&
> +   aux_usage != ISL_AUX_USAGE_HIZ) {
> +  clear_address = anv_image_get_clear_color_addr(device, image,
> aspect);
> +  use_clear_address = true;
> +   }
> +
> if (view_usage == ISL_SURF_USAGE_STORAGE_BIT &&
> !(flags & ANV_IMAGE_VIEW_STATE_STORAGE_WRITE_ONLY) &&
> !isl_has_matching_typed_storage_image_format(&device->info,
> @@ -1040,6 +1049,7 @@ anv_image_fill_surface_state(struct anv_device
> *device,
>  .mocs = device->default_mocs);
>state_inout->address = address,
>state_inout->aux_address = 0;
> +  state_inout->clear_address = 0;
> } else {
>if (view_usage == ISL_SURF_USAGE_STORAGE_BIT &&
>!(flags & ANV_IMAGE_VIEW_STATE_STORAGE_WRITE_ONLY)) {
> @@ -1113,6 +1123,8 @@ anv_image_fill_surface_state(struct anv_device
> *device,
>.aux_surf = &aux_surface->isl,
>.aux_usage = aux_usage,
>.aux_address = aux_address,
> +  .clear_address = clear_address.offset,
> +  .use_clear_address = use_clear_address,
>.mocs = device->default_mocs,
>.x_offset_sa = tile_x_sa,
>.y_offset_sa = tile_y_sa);
> @@ -1134,6 +1146,13 @@ anv_image_fill_surface_state(struct anv_device
> *device,
>assert((aux_address & 0xfff) == 0);
>assert(aux_address == (*aux_addr_dw & 0xfffff000));
>state_inout->aux_address = *aux_addr_dw;
> +
> +  if (device->info.gen >= 10 && clear_address.bo) {
>

Here you use clear_address.bo != NULL but above you use use_clear_address.
Probably best to just pick one and stick with it.


> + uint32_t *clear_addr_dw = state_inout->state.map +
> +   device->isl_dev.ss.clear_value_offset;
> + assert((clear_address.offset & 0x3f) == 0);
> + state_inout->clear_address = *clear_addr_dw;
> +  }
> }
>
> anv_state_flush(device, state_inout->state);
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_
> private.h
> index b8c381d2665..5c077987cef 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -1674,6 +1674,11 @@ struct anv_surface_state {
>  * bits of this address include extra aux information.
>  */
> uint64_t aux_address;
> +   /* Address of the clear color, if any
> +*
> +* This address is relative to the start of the BO.
> +*/
> +   uint64_t clear_address;
>  };
>
>  /**
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> b/src/intel/vulkan/genX_cmd_buffer.c
> index 939a795c2b1..b9e1d50cbe3 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -200,6 +200,16 @@ add_image_view_relocs(struct anv_cmd_buffer
> *cmd_buffer,
>if (result != VK_SUCCESS)
>   anv_batch_set_error(&cmd_buffer->batch, result);
> }
> +
> +   if (state.clear_address) {
> +  VkResult result =
> + anv_reloc_list_add(&cmd_buffer->surface_relocs,
> +&cmd_buffer->pool->alloc,
> +state.state.offset + isl_dev->ss.clear_value_
> offset,
>

I'm not sure how comfortable I am with ss.clear_value_offset doing
double-duty for inline clear values and clear value addresses.  I suppose
it's probably ok because the only overlap is gen10 and we know it matches
there.


> +image->planes[image_plane].bo,
> state.clear_address);
> +  if (result != VK_SUCCESS)
> + anv_batch_set_error(&cmd_buffer->batch, result);
> +   }
>  }
>
>  static void
> @@ -1056,6 +1066,35 @@ transition_color_buffer(struct anv_

Re: [Mesa-dev] [PATCH] radeonsi: fix radeon create encoder return

2018-02-27 Thread Alex Deucher
On Tue, Feb 27, 2018 at 5:43 PM,   wrote:
> From: Boyuan Zhang 
>
> Previous patch missed a "return" when trying to modify the create encoder
> function, which made the whole logic fail. Therefore, add the return back.
>
> Signed-off-by: Boyuan Zhang 

Reviewed-by: Alex Deucher 

> ---
>  src/gallium/drivers/radeonsi/si_uvd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_uvd.c 
> b/src/gallium/drivers/radeonsi/si_uvd.c
> index 3906bbd..b6cb4cb 100644
> --- a/src/gallium/drivers/radeonsi/si_uvd.c
> +++ b/src/gallium/drivers/radeonsi/si_uvd.c
> @@ -150,7 +150,7 @@ struct pipe_video_codec *si_uvd_create_decoder(struct 
> pipe_context *context,
>
> if (templ->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
> if (vcn) {
> -   radeon_create_encoder(context, templ, ctx->b.ws, 
> si_vce_get_buffer);
> +   return radeon_create_encoder(context, templ, 
> ctx->b.ws, si_vce_get_buffer);
> } else {
> if (u_reduce_video_profile(templ->profile) == 
> PIPE_VIDEO_FORMAT_HEVC)
> return radeon_uvd_create_encoder(context, 
> templ, ctx->b.ws, si_vce_get_buffer);
> --
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] radeonsi: fix radeon create encoder return

2018-02-27 Thread boyuan.zhang
From: Boyuan Zhang 

Previous patch missed a "return" when trying to modify the create encoder
function, which made the whole logic fail. Therefore, add the return back.

Signed-off-by: Boyuan Zhang 
---
 src/gallium/drivers/radeonsi/si_uvd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_uvd.c 
b/src/gallium/drivers/radeonsi/si_uvd.c
index 3906bbd..b6cb4cb 100644
--- a/src/gallium/drivers/radeonsi/si_uvd.c
+++ b/src/gallium/drivers/radeonsi/si_uvd.c
@@ -150,7 +150,7 @@ struct pipe_video_codec *si_uvd_create_decoder(struct 
pipe_context *context,
 
if (templ->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
if (vcn) {
-   radeon_create_encoder(context, templ, ctx->b.ws, 
si_vce_get_buffer);
+   return radeon_create_encoder(context, templ, ctx->b.ws, 
si_vce_get_buffer);
} else {
if (u_reduce_video_profile(templ->profile) == 
PIPE_VIDEO_FORMAT_HEVC)
return radeon_uvd_create_encoder(context, 
templ, ctx->b.ws, si_vce_get_buffer);
-- 
2.7.4



Re: [Mesa-dev] [PATCH] gallium: remove llvm from ir struct

2018-02-27 Thread Timothy Arceri

Ping?

On 02/02/18 08:54, Timothy Arceri wrote:

This was added in 425dc4c4b366 but never used. Also since
100796c15c3a native has superseded llvm.
---
  src/gallium/include/pipe/p_state.h | 1 -
  1 file changed, 1 deletion(-)

diff --git a/src/gallium/include/pipe/p_state.h 
b/src/gallium/include/pipe/p_state.h
index 2b56d60b5e..640e6ed26d 100644
--- a/src/gallium/include/pipe/p_state.h
+++ b/src/gallium/include/pipe/p_state.h
@@ -267,7 +267,6 @@ struct pipe_shader_state
 /* TODO move tokens into union. */
 const struct tgsi_token *tokens;
 union {
-  void *llvm;
void *native;
void *nir;
 } ir;




Re: [Mesa-dev] [PATCH] i965: Allow 48-bit addressing on Gen8+.

2018-02-27 Thread Jordan Justen
On 2018-02-26 16:05:46, Kenneth Graunke wrote:
> This allows most GPU objects to use the full 48-bit address space
> offered by Gen8+ platforms, rather than being stuck with 32-bit.
> This expands the available GPU memory from 4G to 256TB or so.
> 
> A few objects - instruction, scratch, and vertex buffers - need to
> remain pinned in the low 4GB of the address space for various reasons.
> We default everything to 48-bit but disable it in those cases.
> 
> Thanks to Jason Ekstrand for blazing this trail in anv first and
> finding the nasty undocumented hardware issues.  This patch simply
> rips off all of his findings.
> ---
>  src/mesa/drivers/dri/i965/brw_bufmgr.c   | 23 +
>  src/mesa/drivers/dri/i965/brw_misc_state.c   | 13 +++--
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 23 ++---
>  src/mesa/drivers/dri/i965/genX_blorp_exec.c  |  9 
>  src/mesa/drivers/dri/i965/genX_state_upload.c| 60 
> 
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c| 15 ++
>  src/mesa/drivers/dri/i965/intel_batchbuffer.h|  2 +
>  7 files changed, 127 insertions(+), 18 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c 
> b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> index fb180289a0c..2e54adb3ed2 100644
> --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> @@ -119,6 +119,7 @@ struct brw_bufmgr {
> bool has_llc:1;
> bool has_mmap_wc:1;
> bool bo_reuse:1;
> +   bool supports_48b_addresses:1;
>  };
>  
>  static int bo_set_tiling_internal(struct brw_bo *bo, uint32_t tiling_mode,
> @@ -409,6 +410,8 @@ retry:
> bo->reusable = true;
> bo->cache_coherent = bufmgr->has_llc;
> bo->index = -1;
> +   if (bufmgr->supports_48b_addresses)
> +  bo->kflags = EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
>  
> mtx_unlock(&bufmgr->lock);
>  
> @@ -1385,6 +1388,24 @@ gem_param(int fd, int name)
> return v;
>  }
>  
> +static bool
> +gem_supports_48b_addresses(int fd)
> +{
> +   struct drm_i915_gem_exec_object2 obj = {
> +  .flags = EXEC_OBJECT_SUPPORTS_48B_ADDRESS,
> +   };
> +
> +   struct drm_i915_gem_execbuffer2 execbuf = {
> +  .buffers_ptr = (uintptr_t)&obj,
> +  .buffer_count = 1,
> +  .rsvd1 = 0xffu,
> +   };
> +
> +   int ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
> +
> +   return ret == -1 && errno == ENOENT;
> +}
> +
>  /**
>   * Initializes the GEM buffer manager, which uses the kernel to allocate, 
> map,
>   * and manage map buffer objections.
> @@ -1418,6 +1439,8 @@ brw_bufmgr_init(struct gen_device_info *devinfo, int fd)
>  
> bufmgr->has_llc = devinfo->has_llc;
> bufmgr->has_mmap_wc = gem_param(fd, I915_PARAM_MMAP_VERSION) > 0;
> +   bufmgr->supports_48b_addresses =
> +  devinfo->gen >= 8 && gem_supports_48b_addresses(fd);
>  
> init_cache_buckets(bufmgr);
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
> b/src/mesa/drivers/dri/i965/brw_misc_state.c
> index c4ef6812bff..29d74876c27 100644
> --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
> @@ -634,6 +634,12 @@ brw_upload_state_base_address(struct brw_context *brw)
> }
>  
> if (devinfo->gen >= 8) {
> +  /* STATE_BASE_ADDRESS has issues with 48-bit address spaces.  If the
> +   * address + size as seen by STATE_BASE_ADDRESS overflows 48 bits,
> +   * the GPU appears to treat all accesses to the buffer as being out
> +   * of bounds and returns zero.  To work around this, we pin all SBAs
> +   * to the bottom 4GB.
> +   */
>uint32_t mocs_wb = devinfo->gen >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB;
>int pkt_len = devinfo->gen >= 9 ? 19 : 16;
>  
> @@ -644,15 +650,14 @@ brw_upload_state_base_address(struct brw_context *brw)
>OUT_BATCH(0);
>OUT_BATCH(mocs_wb << 16);
>/* Surface state base address: */
> -  OUT_RELOC64(brw->batch.state.bo, 0, mocs_wb << 4 | 1);
> +  OUT_RELOC64(brw->batch.state.bo, RELOC_32BIT, mocs_wb << 4 | 1);

Lines like this seem a little confusing with the RELOC_32BIT name.
What about something like RELOC_BELOW4G?

Reviewed-by: Jordan Justen 
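
For reference, the address-space sizes mentioned in the commit message, and what "below 4G" means for a buffer, can be checked with a few lines of standalone C. The helper names are hypothetical and are not part of the i965 driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define GIB (1ull << 30)
#define TIB (1ull << 40)

/* Size in bytes of an address space with `bits` address bits (bits < 64). */
static uint64_t address_space_size(unsigned bits)
{
   return 1ull << bits;
}

/* True if the range [offset, offset + size) lies entirely in the low
 * 4 GiB, written so that the comparison itself cannot overflow. */
static bool fits_below_4g(uint64_t offset, uint64_t size)
{
   const uint64_t limit = 4 * GIB;
   return size <= limit && offset <= limit - size;
}
```

Here 32 address bits give 4 GiB and 48 bits give 256 TiB, which matches the "4G to 256TB or so" figure above. A flag named along the lines of RELOC_BELOW4G would describe the second helper's constraint (where the buffer must be placed) rather than the width of the relocation.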

>/* Dynamic state base address: */
> -  OUT_RELOC64(brw->batch.state.bo, 0, mocs_wb << 4 | 1);
> +  OUT_RELOC64(brw->batch.state.bo, RELOC_32BIT, mocs_wb << 4 | 1);
>/* Indirect object base address: MEDIA_OBJECT data */
>OUT_BATCH(mocs_wb << 4 | 1);
>OUT_BATCH(0);
>/* Instruction base address: shader kernels (incl. SIP) */
> -  OUT_RELOC64(brw->cache.bo, 0, mocs_wb << 4 | 1);
> -
> +  OUT_RELOC64(brw->cache.bo, RELOC_32BIT, mocs_wb << 4 | 1);
>/* General state buffer size */
>OUT_BATCH(0xf001);
>/* Dynamic state buffer size */
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index 0b6016427bd..55e752261a5 100644
> --- a/src/me

Re: [Mesa-dev] [PATCH v4] anv: implement VK_EXT_global_priority extension

2018-02-27 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Tue, Feb 27, 2018 at 12:14 AM, Tapani Pälli 
wrote:

> v2: add ANV_CONTEXT_REALTIME_PRIORITY (Chris)
> use unreachable with unknown priority (Samuel)
>
> v3: add stubs in gem_stubs.c (Emil)
> use priority defines from gen_defines.h
>
> v4: cleanup, add anv_gem_set_context_param (Jason)
>
> Signed-off-by: Tapani Pälli 
> Reviewed-by: Samuel Iglesias Gonsálvez  (v2)
> Reviewed-by: Chris Wilson  (v2)
> Reviewed-by: Emil Velikov  (v3)
>
> Signed-off-by: Tapani Pälli 
> ---
>  src/intel/vulkan/anv_device.c  | 44 ++
> 
>  src/intel/vulkan/anv_extensions.py |  2 ++
>  src/intel/vulkan/anv_gem.c | 32 +++
>  src/intel/vulkan/anv_gem_stubs.c   | 12 +++
>  src/intel/vulkan/anv_private.h |  5 +
>  5 files changed, 95 insertions(+)
>
> diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
> index a83b7a39f6..417969110b 100644
> --- a/src/intel/vulkan/anv_device.c
> +++ b/src/intel/vulkan/anv_device.c
> @@ -37,6 +37,7 @@
>  #include "util/build_id.h"
>  #include "util/mesa-sha1.h"
>  #include "vk_util.h"
> +#include "common/gen_defines.h"
>
>  #include "genxml/gen7_pack.h"
>
> @@ -366,6 +367,9 @@ anv_physical_device_init(struct anv_physical_device
> *device,
> device->has_syncobj_wait = device->has_syncobj &&
>anv_gem_supports_syncobj_wait(fd);
>
> +   if (anv_gem_has_context_priority(fd))
> +  device->has_context_priority = true;
> +
> bool swizzled = anv_gem_get_bit6_swizzle(fd, I915_TILING_X);
>
> /* Starting with Gen10, the timestamp frequency of the command
> streamer may
> @@ -1316,6 +1320,23 @@ anv_device_init_dispatch(struct anv_device *device)
> }
>  }
>
> +static int
> +vk_priority_to_gen(int priority)
> +{
> +   switch (priority) {
> +   case VK_QUEUE_GLOBAL_PRIORITY_LOW_EXT:
> +  return GEN_CONTEXT_LOW_PRIORITY;
> +   case VK_QUEUE_GLOBAL_PRIORITY_MEDIUM_EXT:
> +  return GEN_CONTEXT_MEDIUM_PRIORITY;
> +   case VK_QUEUE_GLOBAL_PRIORITY_HIGH_EXT:
> +  return GEN_CONTEXT_HIGH_PRIORITY;
> +   case VK_QUEUE_GLOBAL_PRIORITY_REALTIME_EXT:
> +  return GEN_CONTEXT_REALTIME_PRIORITY;
> +   default:
> +  unreachable("Invalid priority");
> +   }
> +}
> +
>  VkResult anv_CreateDevice(
>  VkPhysicalDevicephysicalDevice,
>  const VkDeviceCreateInfo*   pCreateInfo,
> @@ -1359,6 +1380,15 @@ VkResult anv_CreateDevice(
>}
> }
>
> +   /* Check if client specified queue priority. */
> +   const VkDeviceQueueGlobalPriorityCreateInfoEXT *queue_priority =
> +  vk_find_struct_const(pCreateInfo->pQueueCreateInfos[0].pNext,
> +   DEVICE_QUEUE_GLOBAL_PRIORITY_CREATE_INFO_EXT);
> +
> +   VkQueueGlobalPriorityEXT priority =
> +  queue_priority ? queue_priority->globalPriority :
> + VK_QUEUE_GLOBAL_PRIORITY_MEDIUM_EXT;
> +
> device = vk_alloc2(&physical_device->instance->alloc, pAllocator,
> sizeof(*device), 8,
> VK_SYSTEM_ALLOCATION_SCOPE_DEVICE);
> @@ -1388,6 +1418,20 @@ VkResult anv_CreateDevice(
>goto fail_fd;
> }
>
> +   /* As per spec, the driver implementation may deny requests to acquire
> +* a priority above the default priority (MEDIUM) if the caller does
> not
> +* have sufficient privileges. In this scenario
> VK_ERROR_NOT_PERMITTED_EXT
> +* is returned.
> +*/
> +   if (physical_device->has_context_priority) {
> +  int err =
> + anv_gem_set_context_priority(device,
> vk_priority_to_gen(priority));
> +  if (err != 0 && priority > VK_QUEUE_GLOBAL_PRIORITY_MEDIUM_EXT) {
> + result = vk_error(VK_ERROR_NOT_PERMITTED_EXT);
> + goto fail_fd;
> +  }
> +   }
> +
> device->info = physical_device->info;
> device->isl_dev = physical_device->isl_dev;
>
> diff --git a/src/intel/vulkan/anv_extensions.py b/src/intel/vulkan/anv_
> extensions.py
> index 581921e62a..6194eb0ad6 100644
> --- a/src/intel/vulkan/anv_extensions.py
> +++ b/src/intel/vulkan/anv_extensions.py
> @@ -86,6 +86,8 @@ EXTENSIONS = [
>  Extension('VK_KHX_multiview', 1, True),
>  Extension('VK_EXT_debug_report',  8, True),
>  Extension('VK_EXT_external_memory_dma_buf',   1, True),
> +Extension('VK_EXT_global_priority',   1,
> +  'device->has_context_priority'),
>  ]
>
>  class VkVersion:
> diff --git a/src/intel/vulkan/anv_gem.c b/src/intel/vulkan/anv_gem.c
> index 34c0989108..93072c7d3b 100644
> --- a/src/intel/vulkan/anv_gem.c
> +++ b/src/intel/vulkan/anv_gem.c
> @@ -30,6 +30,7 @@
>  #include 
>
>  #include "anv_private.h"
> +#include "common/gen_defines.h"
>
>  static int
>  anv_ioctl(int fd, unsigned long request, void *arg)
> @@ -302,6 +303,22 @@ close_and_return:
> return swizzled;
>  }
>
> +int
> +anv_gem_set_context_priority(struc

Re: [Mesa-dev] [PATCH 01/17] intel: Add a preliminary device for Ice Lake

2018-02-27 Thread Kenneth Graunke
On Tuesday, February 20, 2018 9:15:08 PM PST Matt Turner wrote:
> From: Anuj Phogat 
> 
> Signed-off-by: Anuj Phogat 
[...]
> +   .urb = { \
> +  .size = 1024, \
> +   }

This might need to be 768 on some variants - I'm not certain, though.
Sometimes there's autoscaling that applies, and I haven't followed
exactly how it works these days.

Otherwise, the updated version with the subslices thing fixed is,
Reviewed-by: Kenneth Graunke 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/6] intel/ir: Allow representing additional flag subregisters in the IR.

2018-02-27 Thread Francisco Jerez
This allows representing conditional mods and predicates on f1.0-f1.1
at the IR level by adding an extra bit to the flag_subreg
backend_instruction field.
---
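For readers following the encoding: a minimal sketch of how a flat
flag_subreg index could be split into a flag register and subregister.
Only the "/ 2" and "% 2" split is taken from this patch; the struct and
helper names below are hypothetical and exist purely for illustration.

```c
#include <stdio.h>

/* Hypothetical pair type; not part of the patch. */
struct flag_ref {
   unsigned reg;     /* 0 for f0, 1 for f1 */
   unsigned subreg;  /* 0 or 1 */
};

/* Split a flat index 0..3 into (register, subregister). */
static struct flag_ref
decode_flag_subreg(unsigned flag_subreg)
{
   struct flag_ref ref = { flag_subreg / 2, flag_subreg % 2 };
   return ref;
}

/* Format the flag in the "fN.M" notation used by dump_instruction(). */
static int
format_flag(char *buf, size_t len, unsigned flag_subreg)
{
   const struct flag_ref ref = decode_flag_subreg(flag_subreg);
   return snprintf(buf, len, "f%u.%u", ref.reg, ref.subreg);
}
```

Under this scheme indices 0 and 1 name f0.0 and f0.1, while 2 and 3
name f1.0 and f1.1.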
 src/intel/compiler/brw_fs.cpp| 12 +++-
 src/intel/compiler/brw_fs_generator.cpp  |  4 ++--
 src/intel/compiler/brw_reg.h |  7 +++
 src/intel/compiler/brw_schedule_instructions.cpp |  2 +-
 src/intel/compiler/brw_shader.h  |  4 ++--
 src/intel/compiler/brw_vec4.cpp  |  7 ---
 src/intel/compiler/brw_vec4_generator.cpp|  2 +-
 7 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index bed632d21b9..6c86b1592fd 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -5460,9 +5460,10 @@ fs_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
fs_inst *inst = (fs_inst *)be_inst;
 
if (inst->predicate) {
-  fprintf(file, "(%cf0.%d) ",
- inst->predicate_inverse ? '-' : '+',
- inst->flag_subreg);
+  fprintf(file, "(%cf%d.%d) ",
+  inst->predicate_inverse ? '-' : '+',
+  inst->flag_subreg / 2,
+  inst->flag_subreg % 2);
}
 
fprintf(file, "%s", brw_instruction_name(devinfo, inst->opcode));
@@ -5474,7 +5475,8 @@ fs_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
   (devinfo->gen < 5 || (inst->opcode != BRW_OPCODE_SEL &&
 inst->opcode != BRW_OPCODE_IF &&
 inst->opcode != BRW_OPCODE_WHILE))) {
- fprintf(file, ".f0.%d", inst->flag_subreg);
+ fprintf(file, ".f%d.%d", inst->flag_subreg / 2,
+ inst->flag_subreg % 2);
   }
}
fprintf(file, "(%d) ", inst->exec_size);
@@ -5860,7 +5862,7 @@ fs_visitor::calculate_register_pressure()
 bool
 fs_visitor::opt_drop_redundant_mov_to_flags()
 {
-   bool flag_mov_found[2] = {false};
+   bool flag_mov_found[4] = {false};
bool progress = false;
 
/* Instructions removed by this pass can only be added if this were true */
diff --git a/src/intel/compiler/brw_fs_generator.cpp 
b/src/intel/compiler/brw_fs_generator.cpp
index cd5be054f69..1aed7fb850f 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -1459,7 +1459,7 @@ 
fs_generator::generate_varying_pull_constant_load_gen7(fs_inst *inst,
 void
 fs_generator::generate_mov_dispatch_to_flags(fs_inst *inst)
 {
-   struct brw_reg flags = brw_flag_reg(0, inst->flag_subreg);
+   struct brw_reg flags = brw_flag_subreg(inst->flag_subreg);
struct brw_reg dispatch_mask;
 
if (devinfo->gen >= 6)
@@ -1715,7 +1715,7 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
   brw_set_default_access_mode(p, BRW_ALIGN_1);
   brw_set_default_predicate_control(p, inst->predicate);
   brw_set_default_predicate_inverse(p, inst->predicate_inverse);
-  brw_set_default_flag_reg(p, 0, inst->flag_subreg);
+  brw_set_default_flag_reg(p, inst->flag_subreg / 2, inst->flag_subreg % 
2);
   brw_set_default_saturate(p, inst->saturate);
   brw_set_default_mask_control(p, inst->force_writemask_all);
   brw_set_default_acc_write_control(p, inst->writes_accumulator);
diff --git a/src/intel/compiler/brw_reg.h b/src/intel/compiler/brw_reg.h
index 17d5b97bf31..c41408104fa 100644
--- a/src/intel/compiler/brw_reg.h
+++ b/src/intel/compiler/brw_reg.h
@@ -842,6 +842,13 @@ brw_flag_reg(int reg, int subreg)
   BRW_ARF_FLAG + reg, subreg);
 }
 
+static inline struct brw_reg
+brw_flag_subreg(unsigned subreg)
+{
+   return brw_uw1_reg(BRW_ARCHITECTURE_REGISTER_FILE,
+  BRW_ARF_FLAG + subreg / 2, subreg % 2);
+}
+
 /**
  * Return the mask register present in Gen4-5, or the related register present
  * in Gen7.5 and later hardware referred to as "channel enable" register in
diff --git a/src/intel/compiler/brw_schedule_instructions.cpp 
b/src/intel/compiler/brw_schedule_instructions.cpp
index 692f7125323..0e793de4ddf 100644
--- a/src/intel/compiler/brw_schedule_instructions.cpp
+++ b/src/intel/compiler/brw_schedule_instructions.cpp
@@ -974,7 +974,7 @@ fs_instruction_scheduler::calculate_deps()
 */
schedule_node *last_grf_write[grf_count * 16];
schedule_node *last_mrf_write[BRW_MAX_MRF(v->devinfo->gen)];
-   schedule_node *last_conditional_mod[4] = {};
+   schedule_node *last_conditional_mod[8] = {};
schedule_node *last_accumulator_write = NULL;
/* Fixed HW registers are assumed to be separate from the virtual
 * GRFs, so they can be tracked separately.  We don't really write
diff --git a/src/intel/compiler/brw_shader.h b/src/intel/compiler/brw_shader.h
index 06abdc4d175..fd02feb9107 100644
--- a/src/intel/compiler/brw_shader.h
+++ b/src/intel/compiler/brw_shader.h
@@ -169,10 +169,10 @@ struct backend_instruction {
bool shadow_com

[Mesa-dev] [PATCH 1/6] intel/l3: Don't allocate SLM partition on ICL+.

2018-02-27 Thread Francisco Jerez
SLM has a chunk of special-purpose memory separate from L3 on ICL+, we
shouldn't allocate a partition for it on L3 anymore.
---
 src/intel/common/gen_l3_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/common/gen_l3_config.c b/src/intel/common/gen_l3_config.c
index aff13c06ec0..7d58ad8d7c8 100644
--- a/src/intel/common/gen_l3_config.c
+++ b/src/intel/common/gen_l3_config.c
@@ -232,7 +232,7 @@ gen_get_default_l3_weights(const struct gen_device_info 
*devinfo,
 {
struct gen_l3_weights w = {{ 0 }};
 
-   w.w[GEN_L3P_SLM] = needs_slm;
+   w.w[GEN_L3P_SLM] = devinfo->gen < 11 && needs_slm;
w.w[GEN_L3P_URB] = 1.0;
 
if (devinfo->gen >= 8) {
-- 
2.16.1



[Mesa-dev] [PATCH 6/6] Revert "i965/fs: Predicate byte scattered writes if needed"

2018-02-27 Thread Francisco Jerez
This reverts commit a4031bdfa927fb4c3c5d0bdadc70634f3c1a5eac.  It's
redundant with the sample mask predication done at this point by the
common logical send lowering infrastructure, and rather buggy because
it wasn't applying the correct sample mask in shaders using discard,
since the dispatch mask returned by FS_OPCODE_MOV_DISPATCH_TO_FLAGS
doesn't reflect samples discarded by the shader, so it could have led
to data corruption in fragment shader invocations that execute discard
based on a non-dynamically uniform condition.
---
Not CC'ing stable because AFAIK the affected code is not exposed to
the user yet.
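
As a rough illustration of the problem described above, here is a
simplified bitmask model (an assumption for explanation only, not the
hardware or compiler behavior verbatim; both names are hypothetical).
The dispatch mask alone still includes channels that have executed
discard, so predicating on it would let those channels write.

```c
#include <stdint.h>

/* Channels that may still write memory: dispatched and not discarded.
 * Both parameters are hypothetical per-channel bitmasks.
 */
static uint32_t
live_sample_mask(uint32_t dispatch_mask, uint32_t discarded_mask)
{
   return dispatch_mask & ~discarded_mask;
}
```

With four dispatched channels (0xf) and channel 1 discarded (0x2), the
mask that should gate the write is 0xd rather than 0xf.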

 src/intel/compiler/brw_fs_nir.cpp | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 8efec34cc9d..b8d1182c00f 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4172,25 +4172,12 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
  * to rely on byte scattered in order to write 16-bit elements.
  * The byte_scattered_write message needs that every written 16-bit
  * type to be aligned 32-bits (stride=2).
- * Additionally, while on Untyped Surface messages the
- * bits of the execution mask are ANDed with the corresponding
- * bits of the Pixel/Sample Mask, that is not the case for byte
- * scattered writes. That is needed to avoid ssbo stores writing
- * on helper invocations. So when that can affect, we load the
- * sample mask, and predicate the send message.
  */
-brw_predicate pred = BRW_PREDICATE_NONE;
-
-if (stage == MESA_SHADER_FRAGMENT) {
-   bld.emit(FS_OPCODE_MOV_DISPATCH_TO_FLAGS);
-   pred = BRW_PREDICATE_NORMAL;
-}
-
 emit_byte_scattered_write(bld, surf_index, offset_reg,
   write_src,
   1 /* dims */, 1,
   bit_size,
-  pred);
+  BRW_PREDICATE_NONE);
  } else {
 assert(num_components * type_size <= 16);
 assert((num_components * type_size) % 4 == 0);
-- 
2.16.1



[Mesa-dev] [PATCH 3/6] intel/ir: Allow arbitrary scratch flag registers for SHADER_OPCODE_FIND_LIVE_CHANNEL.

2018-02-27 Thread Francisco Jerez
This shouldn't cause any functional change at this point, it changes
SHADER_OPCODE_FIND_LIVE_CHANNEL to use the flag register specified at
the IR level instead of the hard-coded f1.0, now that it can be
represented in backend_instruction::flag_subreg.  This will be
necessary for scheduling to behave correctly once more things start
making use of f1.0.
---
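For context, a minimal software model of what finding a live channel
means, assuming the result is the index of the lowest enabled bit of an
execution mask. The function below is a hypothetical illustration and
does not reflect how the generator emits the instruction sequence or
uses the flag register.

```c
#include <stdint.h>

/* Return the index of the lowest enabled channel, or -1 if none is. */
static int
find_live_channel(uint32_t exec_mask)
{
   for (int i = 0; i < 32; i++) {
      if (exec_mask & (UINT32_C(1) << i))
         return i;
   }
   return -1;
}
```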
 src/intel/compiler/brw_eu_emit.c| 5 +++--
 src/intel/compiler/brw_fs.cpp   | 3 ++-
 src/intel/compiler/brw_fs_builder.h | 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index c25d8d6eda0..9fc6d12f288 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -3393,7 +3393,9 @@ brw_find_live_channel(struct brw_codegen *p, struct 
brw_reg dst,
   */
  inst = brw_FBL(p, vec1(dst), exec_mask);
   } else {
- const struct brw_reg flag = brw_flag_reg(1, 0);
+ const struct brw_reg flag = brw_flag_reg(
+brw_inst_flag_reg_nr(devinfo, p->current),
+brw_inst_flag_subreg_nr(devinfo, p->current));
 
  brw_set_default_exec_size(p, BRW_EXECUTE_1);
  brw_MOV(p, retype(flag, BRW_REGISTER_TYPE_UD), brw_imm_ud(0));
@@ -3412,7 +3414,6 @@ brw_find_live_channel(struct brw_codegen *p, struct 
brw_reg dst,
 brw_inst_set_mask_control(devinfo, inst, BRW_MASK_ENABLE);
 brw_inst_set_group(devinfo, inst, lower_size * i + 8 * 
qtr_control);
 brw_inst_set_cond_modifier(devinfo, inst, BRW_CONDITIONAL_Z);
-brw_inst_set_flag_reg_nr(devinfo, inst, 1);
 brw_inst_set_exec_size(devinfo, inst, cvt(lower_size) - 1);
  }
 
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 6c86b1592fd..0b87d8ab14e 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -931,7 +931,8 @@ fs_inst::flags_written() const
if ((conditional_mod && (opcode != BRW_OPCODE_SEL &&
 opcode != BRW_OPCODE_IF &&
 opcode != BRW_OPCODE_WHILE)) ||
-   opcode == FS_OPCODE_MOV_DISPATCH_TO_FLAGS) {
+   opcode == FS_OPCODE_MOV_DISPATCH_TO_FLAGS ||
+   opcode == SHADER_OPCODE_FIND_LIVE_CHANNEL) {
   return flag_mask(this);
} else {
   return flag_mask(dst, size_written);
diff --git a/src/intel/compiler/brw_fs_builder.h 
b/src/intel/compiler/brw_fs_builder.h
index 87394bc17b3..963588611c6 100644
--- a/src/intel/compiler/brw_fs_builder.h
+++ b/src/intel/compiler/brw_fs_builder.h
@@ -406,7 +406,7 @@ namespace brw {
  const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD);
  const dst_reg dst = vgrf(src.type);
 
- ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
+ ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index)->flag_subreg = 
2;
  ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, component(chan_index, 
0));
 
  return src_reg(component(dst, 0));
-- 
2.16.1



[Mesa-dev] [PATCH 5/6] intel/fs: Handle surface opcode sample masks via predication.

2018-02-27 Thread Francisco Jerez
The main motivation is to enable HDC surface opcodes on ICL which no
longer allows the sample mask to be provided in a message header, but
this is enabled all the way back to IVB when possible because it
decreases the instruction count of some shaders using HDC messages
significantly, e.g. one of the SynMark2 CSDof compute shaders
decreases instruction count by about 40% due to the removal of header
setup boilerplate which in turn makes a number of send message
payloads more easily CSE-able.  Shader-db results on SKL:

 total instructions in shared programs: 15325319 -> 15314384 (-0.07%)
 instructions in affected programs: 311532 -> 300597 (-3.51%)
 helped: 491
 HURT: 1

Shader-db results on BDW where the optimization needs to be disabled
in some cases due to hardware restrictions:

 total instructions in shared programs: 15604794 -> 15598028 (-0.04%)
 instructions in affected programs: 220863 -> 214097 (-3.06%)
 helped: 351
 HURT: 0

The FPS of SynMark2 CSDof improves by 5.09% ±0.36% (n=10) on my SKL
laptop with this change.
---
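The percentages quoted above can be re-derived from the instruction
counts. A small sketch (the helper name is hypothetical) that computes
the relative change:

```c
/* Relative change in percent between two instruction counts. */
static double
percent_change(unsigned long before, unsigned long after)
{
   return 100.0 * ((double)after - (double)before) / (double)before;
}
```

For example, percent_change(311532, 300597) evaluates to roughly -3.51,
and percent_change(220863, 214097) to roughly -3.06, matching the
"affected programs" lines for SKL and BDW.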
 src/intel/compiler/brw_fs.cpp | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 0b87d8ab14e..639432b4f49 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -4432,6 +4432,8 @@ static void
 lower_surface_logical_send(const fs_builder &bld, fs_inst *inst, opcode op,
const fs_reg &sample_mask)
 {
+   const gen_device_info *devinfo = bld.shader->devinfo;
+
/* Get the logical send arguments. */
const fs_reg &addr = inst->src[0];
const fs_reg &src = inst->src[1];
@@ -4442,7 +4444,20 @@ lower_surface_logical_send(const fs_builder &bld, 
fs_inst *inst, opcode op,
/* Calculate the total number of components of the payload. */
const unsigned addr_sz = inst->components_read(0);
const unsigned src_sz = inst->components_read(1);
-   const unsigned header_sz = (sample_mask.file == BAD_FILE ? 0 : 1);
+   /* From the BDW PRM Volume 7, page 147:
+*
+*  "For the Data Cache Data Port*, the header must be present for the
+*   following message types: [...] Typed read/write/atomics"
+*
+* Earlier generations have a similar wording.  Because of this restriction
+* we don't attempt to implement sample masks via predication for such
+* messages prior to Gen9, since we have to provide a header anyway.  On
+* Gen11+ the header has been removed so we can only use predication.
+*/
+   const unsigned header_sz = devinfo->gen < 9 &&
+  (op == SHADER_OPCODE_TYPED_SURFACE_READ ||
+   op == SHADER_OPCODE_TYPED_SURFACE_WRITE ||
+   op == SHADER_OPCODE_TYPED_ATOMIC) ? 1 : 0;
const unsigned sz = header_sz + addr_sz + src_sz;
 
/* Allocate space for the payload. */
@@ -4462,6 +4477,31 @@ lower_surface_logical_send(const fs_builder &bld, 
fs_inst *inst, opcode op,
 
bld.LOAD_PAYLOAD(payload, components, sz, header_sz);
 
+   /* Predicate the instruction on the sample mask if no header is
+* provided.
+*/
+   if (!header_sz && sample_mask.file != BAD_FILE &&
+   sample_mask.file != IMM) {
+  const fs_builder ubld = bld.group(1, 0).exec_all();
+  if (inst->predicate) {
+ assert(inst->predicate == BRW_PREDICATE_NORMAL);
+ assert(!inst->predicate_inverse);
+ assert(inst->flag_subreg < 2);
+ /* Combine the sample mask with the existing predicate by using a
+  * vertical predication mode.
+   */
+ inst->predicate = BRW_PREDICATE_ALIGN1_ALLV;
+ ubld.MOV(retype(brw_flag_subreg(inst->flag_subreg + 2),
+ sample_mask.type),
+  sample_mask);
+  } else {
+ inst->flag_subreg = 2;
+ inst->predicate = BRW_PREDICATE_NORMAL;
+ ubld.MOV(retype(brw_flag_subreg(inst->flag_subreg), sample_mask.type),
+  sample_mask);
+  }
+   }
+
/* Update the original instruction. */
inst->opcode = op;
inst->mlen = header_sz + (addr_sz + src_sz) * inst->exec_size / 8;
-- 
2.16.1



[Mesa-dev] [PATCH 4/6] intel/eu: Plumb header present bit to codegen helpers for HDC messages.

2018-02-27 Thread Francisco Jerez
This makes sure that the header-present bit of the message descriptor
is in sync with the IR instruction fields, which gives the optimizer
more control to avoid the overhead of setting up a message header when
it's possible to do so.
---
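To give a feel for the overhead in question, a minimal sketch of the
message length computation, using the formula that appears in patch 5/6
of this series (mlen = header_sz + (addr_sz + src_sz) * exec_size / 8).
The function name is hypothetical.

```c
/* Message length in registers; header_sz is 1 when a header is
 * present and 0 otherwise.
 */
static unsigned
surface_message_length(unsigned header_sz, unsigned addr_sz,
                       unsigned src_sz, unsigned exec_size)
{
   return header_sz + (addr_sz + src_sz) * exec_size / 8;
}
```

For a SIMD8 message with one address and one source component, dropping
the header takes the length from 3 registers to 2.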
 src/intel/compiler/brw_eu.h   | 18 --
 src/intel/compiler/brw_eu_emit.c  | 30 ++
 src/intel/compiler/brw_fs_generator.cpp   | 20 ++--
 src/intel/compiler/brw_vec4_generator.cpp | 11 ++-
 4 files changed, 50 insertions(+), 29 deletions(-)

diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h
index 2d0f56f7938..a5f28d8fc65 100644
--- a/src/intel/compiler/brw_eu.h
+++ b/src/intel/compiler/brw_eu.h
@@ -444,7 +444,8 @@ brw_untyped_atomic(struct brw_codegen *p,
struct brw_reg surface,
unsigned atomic_op,
unsigned msg_length,
-   bool response_expected);
+   bool response_expected,
+   bool header_present);
 
 void
 brw_untyped_surface_read(struct brw_codegen *p,
@@ -459,7 +460,8 @@ brw_untyped_surface_write(struct brw_codegen *p,
   struct brw_reg payload,
   struct brw_reg surface,
   unsigned msg_length,
-  unsigned num_channels);
+  unsigned num_channels,
+  bool header_present);
 
 void
 brw_typed_atomic(struct brw_codegen *p,
@@ -468,7 +470,8 @@ brw_typed_atomic(struct brw_codegen *p,
  struct brw_reg surface,
  unsigned atomic_op,
  unsigned msg_length,
- bool response_expected);
+ bool response_expected,
+ bool header_present);
 
 void
 brw_typed_surface_read(struct brw_codegen *p,
@@ -476,14 +479,16 @@ brw_typed_surface_read(struct brw_codegen *p,
struct brw_reg payload,
struct brw_reg surface,
unsigned msg_length,
-   unsigned num_channels);
+   unsigned num_channels,
+   bool header_present);
 
 void
 brw_typed_surface_write(struct brw_codegen *p,
 struct brw_reg payload,
 struct brw_reg surface,
 unsigned msg_length,
-unsigned num_channels);
+unsigned num_channels,
+bool header_present);
 
 void
 brw_byte_scattered_read(struct brw_codegen *p,
@@ -498,7 +503,8 @@ brw_byte_scattered_write(struct brw_codegen *p,
  struct brw_reg payload,
  struct brw_reg surface,
  unsigned msg_length,
- unsigned bit_size);
+ unsigned bit_size,
+ bool header_present);
 
 void
 brw_memory_fence(struct brw_codegen *p,
diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index 9fc6d12f288..9529a30d27e 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -2877,7 +2877,8 @@ brw_untyped_atomic(struct brw_codegen *p,
struct brw_reg surface,
unsigned atomic_op,
unsigned msg_length,
-   bool response_expected)
+   bool response_expected,
+   bool header_present)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned sfid = (devinfo->gen >= 8 || devinfo->is_haswell ?
@@ -2895,7 +2896,7 @@ brw_untyped_atomic(struct brw_codegen *p,
   p, sfid, brw_writemask(dst, mask), payload, surface, msg_length,
   brw_surface_payload_size(p, response_expected,
devinfo->gen >= 8 || devinfo->is_haswell, true),
-  align1);
+  header_present);
 
brw_set_dp_untyped_atomic_message(
   p, insn, atomic_op, response_expected);
@@ -2978,7 +2979,8 @@ brw_untyped_surface_write(struct brw_codegen *p,
   struct brw_reg payload,
   struct brw_reg surface,
   unsigned msg_length,
-  unsigned num_channels)
+  unsigned num_channels,
+  bool header_present)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned sfid = (devinfo->gen >= 8 || devinfo->is_haswell ?
@@ -2990,7 +2992,7 @@ brw_untyped_surface_write(struct brw_codegen *p,
   WRITEMASK_X : WRITEMASK_XYZW;
struct brw_inst *insn = brw_send_indirect_surface_message(
   p, sfid, brw_writemask(brw_null_reg(), mask),
-  payload, surface, msg_length, 0, align1);
+  payload, surface, msg_length, 0, header_present);
 
brw_set_dp_untyped_surface_write_message(
   p, in

[Mesa-dev] [Bug 105238] ast.h:648:16: error: union member 'i' has a non-trivial constructor

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105238

--- Comment #6 from Brian Paul  ---
(In reply to Francisco Jerez from comment #5)
> Should be fixed in master now, closing.

Looks good here.  Thanks!

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH] i965: Allow 48-bit addressing on Gen8+.

2018-02-27 Thread Chris Wilson
Quoting Kenneth Graunke (2018-02-27 20:56:29)
> On Tuesday, February 27, 2018 12:35:32 AM PST Chris Wilson wrote:
> > Quoting Kenneth Graunke (2018-02-27 00:05:46)
> > > +static bool
> > > +gem_supports_48b_addresses(int fd)
> > > +{
> > > +   struct drm_i915_gem_exec_object2 obj = {
> > > +  .flags = EXEC_OBJECT_SUPPORTS_48B_ADDRESS,
> > > +   };
> > > +
> > > +   struct drm_i915_gem_execbuffer2 execbuf = {
> > > +  .buffers_ptr = (uintptr_t)&obj,
> > > +  .buffer_count = 1,
> > > +  .rsvd1 = 0xffu,
> > > +   };
> > > +
> > > +   int ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
> > > +
> > > +   return ret == -1 && errno == ENOENT;
> > 
> > Note that this reports ENOENT due to the unknown ctx id which is
> > evaluated before the object now. But since you are only looking for the
> > flag, that's ok as the behaviour changed after the flag was introduced.
> 
> Alright.  I just stole this from anv, so I figured it was correct.

It does the job. I was worrying that it works despite not doing the
intended test. :)

One thing that is unambiguous is that only kernels with 48b ppgtt
support 48B address, so you could just query the context's gtt size and
only set the flag when dealing with a large ppgtt.
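
A minimal sketch of that alternative, assuming the context's GTT size in
bytes has already been obtained from the kernel (e.g. via the
I915_CONTEXT_PARAM_GTT_SIZE context parameter; the query itself is left
out here). The helper name and the 4 GiB threshold standing in for
"large ppgtt" are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: only opt into EXEC_OBJECT_SUPPORTS_48B_ADDRESS
 * when the address space is larger than 4 GiB.
 */
static bool
gtt_is_large(uint64_t gtt_size)
{
   return gtt_size > (UINT64_C(1) << 32);
}
```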

> > > if (devinfo->gen >= 8) {
> > > +  /* STATE_BASE_ADDRESS has issues with 48-bit address spaces.  If 
> > > the
> > > +   * address + size as seen by STATE_BASE_ADDRESS overflows 48 bits,
> > > +   * the GPU appears to treat all accesses to the buffer as being out
> > > +   * of bounds and returns zero.  To work around this, we pin all 
> > > SBAs
> > > +   * to the bottom 4GB.
> > > +   */
> > 
> > We could do with a quick explanation as to why we can't program the max
> > size. Aiui, it's because of dynamic resize and so you cannot program
> > here the size of object as reported to the kernel.
> 
> I don't think it has anything to do with dynamic resize.  I think Jason
> told me that it seems to compute an "upper bound address" by doing base
> address + size and if that's at the very end, it can wrap from 2**48 to
> 0, causing the upper bound to be less than the base, at which point it
> assumes zero size and all data reads back as 0.
> 
> If that's the actual problem, we don't need to restrict it to 4GB, we
> just need to restrict it away from the very end...

Ok, I have a patch around to exclude the last page in ppgtt which would
accomplish this. (The pain being having to check kernel versions, at
which point you will probably have finished softpinning before the
kernel is deployed.)
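
To make the suspected failure mode concrete, a small sketch of the
wraparound, assuming the hardware really does compute an upper bound as
base + size truncated to 48 bits (this is the hypothesis quoted above,
not confirmed behavior; the names are hypothetical, and base and size
are assumed to fit in 48 bits).

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* True if base + size wraps past the top of a 48-bit address space,
 * leaving the computed upper bound below the base.
 */
static bool
upper_bound_wraps(uint64_t base, uint64_t size)
{
   const uint64_t upper = (base + size) & ADDR_MASK;
   return size != 0 && upper < base;
}
```

Only a buffer ending exactly at (or beyond) 2**48 trips this check, which
is why excluding the last page would be enough under this hypothesis.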

> > > diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
> > > b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > > index 26718e0d1a2..0a8d3a80b64 100644
> > > --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > > +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > > @@ -1093,6 +1093,21 @@ emit_reloc(struct intel_batchbuffer *batch,
> > > unsigned int index = add_exec_bo(batch, target);
> > > struct drm_i915_gem_exec_object2 *entry = 
> > > &batch->validation_list[index];
> > >  
> > > +   if (reloc_flags & RELOC_32BIT) {
> > > +  /* Restrict this buffer to the low 32 bits of the address space.
> > > +   *
> > > +   * Altering the validation list flags restricts it for this batch,
> > > +   * but we also alter the BO's kflags to restrict it permanently
> > > +   * (until the BO is destroyed and put back in the cache).  Buffers
> > > +   * may stay bound across batches, and we want keep it constrained.
> > > +   */
> > 
> > Hmm, I think the intent is more along the lines of the buffer may be
> > reused for non-r32b targets between batches, but we want to avoid
> > ping-pong migrations into r32b (as moving an active buffer will stall,
> > hmm, can fix if needs must), so once we go r32b we never go back.
> 
> Yeah, that's a better explanation.  How about:
> 
>   /* Restrict this buffer to the low 32 bits of the address space,
>* both now, and forever.  Altering the validation list's flags
>* restricts it for this batch, but we also modify the BO's kflags
>* to restrict it permanently.  This avoids ping-ponging the BO
>* between the low 32 bits and the full address space.  Moving an
>* active buffer can cause stalls.
>*/

Ok. Trying to explain that the migration would only happen on active
buffers for 48b->32b transitions is too painful. The gist is as in the
comment: we expect to use it as r32b in future, so if we allow the kernel
to move it (for swap out/in), we should expect to stall to fit it back
into 32b.
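
A minimal sketch of the "once r32b, always r32b" idea, using stand-in
types: the flag value and both structs below are hypothetical
simplifications, not the driver's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for EXEC_OBJECT_SUPPORTS_48B_ADDRESS. */
#define SUPPORTS_48B (UINT32_C(1) << 3)

struct bo { uint32_t kflags; };         /* persistent per-buffer flags */
struct exec_entry { uint32_t flags; };  /* flags for the current batch */

/* Drop the 48-bit capability for this batch and for every later one. */
static void
restrict_to_32bit(struct bo *target, struct exec_entry *entry)
{
   target->kflags &= ~SUPPORTS_48B;  /* sticks until the BO is freed */
   entry->flags &= ~SUPPORTS_48B;    /* applies to the batch in flight */
}
```

Since later validation entries would start from the BO's kflags, the
buffer stays in the low 4GB instead of bouncing between address ranges.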
 
> > > +  target->kflags &= ~EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
> > > +  entry->flags &= ~EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
> > > +
> > > +  /* RELOC_32BIT is not an EXEC_OBJECT_* flag, so get rid of it. */
> > > +  reloc_flags &= ~RELOC_32BIT;
> > > +   }
> > > +
> > > if (reloc_flags)
> > >entry->flags |= reloc_flags & batch->valid_reloc_f

[Mesa-dev] [Bug 105274] Buffer overflow in gallium/auxiliary/hud/hud_cpufreq.c

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105274

--- Comment #2 from vesim...@gmail.com ---
It is possible in both: you can create a directory named "cpu0"
and it will be accepted by that sscanf.
The easiest fix is to use strcpy_s instead of strcpy.
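
Note that strcpy_s comes from the optional C11 Annex K and is not
provided by glibc, so a portable bounded copy may be preferable. A
minimal sketch using snprintf follows; the helper name is hypothetical
and this is not the fix that was eventually applied.

```c
#include <stdbool.h>
#include <stdio.h>

/* Copy src into dst, always NUL-terminating.  Returns false if the
 * string had to be truncated to fit dst_size bytes.
 */
static bool
copy_name(char *dst, size_t dst_size, const char *src)
{
   const int n = snprintf(dst, dst_size, "%s", src);
   return n >= 0 && (size_t)n < dst_size;
}
```

A caller can then reject overly long directory names instead of writing
past the end of the destination buffer.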



[Mesa-dev] [AppVeyor] mesa master #7037 completed

2018-02-27 Thread AppVeyor


Build mesa 7037 completed



Commit cb309d27c5 by Francisco Jerez on 1/26/2018 7:48 PM:

intel/ir: Fix invalid type aliasing with undefined behavior in test_eu_compact.

test_fuzz_compact_instruction() was attempting to modify the uint64_t
data array of a brw_inst through a pointer to uint32_t, which has
undefined behavior.  This was causing the test_eu_compact unit test to
fail mysteriously for me on GCC 7 with some additional
harmless-looking changes I had applied to my tree, which happened to
affect the order instructions are emitted by GCC causing the bit
twiddling to be done after the clear_pad_bits() call which is supposed
to overwrite the same data through a pointer of different type,
leading to data corruption.  A similar failure has been reported by
Vinson Lee on the master branch built with GCC 8.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105052
Tested-by: Vinson Lee
Reviewed-by: Matt Turner
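
As a generic illustration of the aliasing issue (not the actual change
in commit cb309d27c5), here is a sketch of two well-defined ways to
modify part of a uint64_t array: operate on the uint64_t words directly,
or go through memcpy. All helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Flip one bit of a 128-bit value stored as two uint64_t words. */
static void
flip_bit(uint64_t data[2], unsigned bit)
{
   data[bit / 64] ^= UINT64_C(1) << (bit % 64);
}

/* Overwrite 32-bit word `index` (0..3) via memcpy, which is allowed to
 * access the bytes of any object, unlike a uint32_t pointer cast.
 */
static void
set_dword(uint64_t data[2], unsigned index, uint32_t value)
{
   memcpy((char *)data + index * sizeof(value), &value, sizeof(value));
}

/* Read 32-bit word `index` (0..3) back the same way. */
static uint32_t
get_dword(const uint64_t data[2], unsigned index)
{
   uint32_t value;
   memcpy(&value, (const char *)data + index * sizeof(value), sizeof(value));
   return value;
}
```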




[Mesa-dev] [Bug 105052] compiler/test_eu_compact failure with GCC 8

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105052

Francisco Jerez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Francisco Jerez  ---
Thanks, should be fixed in master now.



[Mesa-dev] [Bug 105238] ast.h:648:16: error: union member 'i' has a non-trivial constructor

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105238

Francisco Jerez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Francisco Jerez  ---
Should be fixed in master now, closing.



Re: [Mesa-dev] [PATCH] i965: Allow 48-bit addressing on Gen8+.

2018-02-27 Thread Kenneth Graunke
On Tuesday, February 27, 2018 12:35:32 AM PST Chris Wilson wrote:
> Quoting Kenneth Graunke (2018-02-27 00:05:46)
> > +static bool
> > +gem_supports_48b_addresses(int fd)
> > +{
> > +   struct drm_i915_gem_exec_object2 obj = {
> > +  .flags = EXEC_OBJECT_SUPPORTS_48B_ADDRESS,
> > +   };
> > +
> > +   struct drm_i915_gem_execbuffer2 execbuf = {
> > +  .buffers_ptr = (uintptr_t)&obj,
> > +  .buffer_count = 1,
> > +  .rsvd1 = 0xffu,
> > +   };
> > +
> > +   int ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
> > +
> > +   return ret == -1 && errno == ENOENT;
> 
> Note that this reports ENOENT due to the unknown ctx id which is
> evaluated before the object now. But since you are only looking for the
> flag, that's ok as the behaviour changed after the flag was introduced.

Alright.  I just stole this from anv, so I figured it was correct.

> > +}
> > +
> >  /**
> >   * Initializes the GEM buffer manager, which uses the kernel to allocate, map,
> >   * and manage map buffer objections.
> > @@ -1418,6 +1439,8 @@ brw_bufmgr_init(struct gen_device_info *devinfo, int fd)
> >  
> > bufmgr->has_llc = devinfo->has_llc;
> > bufmgr->has_mmap_wc = gem_param(fd, I915_PARAM_MMAP_VERSION) > 0;
> > +   bufmgr->supports_48b_addresses =
> > +  devinfo->gen >= 8 && gem_supports_48b_addresses(fd);
> >  
> > init_cache_buckets(bufmgr);
> >  
> > diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
> > index c4ef6812bff..29d74876c27 100644
> > --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
> > +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
> > @@ -634,6 +634,12 @@ brw_upload_state_base_address(struct brw_context *brw)
> > }
> >  
> > if (devinfo->gen >= 8) {
> > +  /* STATE_BASE_ADDRESS has issues with 48-bit address spaces.  If the
> > +   * address + size as seen by STATE_BASE_ADDRESS overflows 48 bits,
> > +   * the GPU appears to treat all accesses to the buffer as being out
> > +   * of bounds and returns zero.  To work around this, we pin all SBAs
> > +   * to the bottom 4GB.
> > +   */
> 
> We could do with a quick explanation as to why we can't program the max
> size. Aiui, it's because of dynamic resize and so you cannot program
> here the size of object as reported to the kernel.

I don't think it has anything to do with dynamic resize.  I think Jason
told me that it seems to compute an "upper bound address" by doing base
address + size and if that's at the very end, it can wrap from 2**48 to
0, causing the upper bound to be less than the base, at which point it
assumes zero size and all data reads back as 0.

If that's the actual problem, we don't need to restrict it to 4GB, we
just need to restrict it away from the very end...
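The wrap-around described above can be modelled with a few lines of arithmetic. This is only a sketch of the behaviour as relayed in this thread, not confirmed hardware behaviour; `upper_bound_48` and `bound_wraps` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Upper bound as a 48-bit unit might compute it: base + size, truncated
 * to 48 bits.  Both inputs are assumed to fit in 48 bits, so the 64-bit
 * addition itself cannot overflow.
 */
static uint64_t
upper_bound_48(uint64_t base, uint64_t size)
{
   return (base + size) & ADDR_MASK;
}

/* True when the truncated upper bound lands below the base, which is the
 * case where every access would look out of bounds.
 */
static bool
bound_wraps(uint64_t base, uint64_t size)
{
   return upper_bound_48(base, size) < base;
}
```

Under this model only a buffer that ends exactly at (or past) 2**48 is affected, which matches the remark that keeping the buffer away from the very end of the address space would be enough.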

> > diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > index 26718e0d1a2..0a8d3a80b64 100644
> > --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > @@ -1093,6 +1093,21 @@ emit_reloc(struct intel_batchbuffer *batch,
> > unsigned int index = add_exec_bo(batch, target);
> > struct drm_i915_gem_exec_object2 *entry = &batch->validation_list[index];
> >  
> > +   if (reloc_flags & RELOC_32BIT) {
> > +  /* Restrict this buffer to the low 32 bits of the address space.
> > +   *
> > +   * Altering the validation list flags restricts it for this batch,
> > +   * but we also alter the BO's kflags to restrict it permanently
> > +   * (until the BO is destroyed and put back in the cache).  Buffers
> > +   * may stay bound across batches, and we want keep it constrained.
> > +   */
> 
> Hmm, I think the intent is more along the lines of the buffer may be
> reused for non-r32b targets between batches, but we want to avoid
> ping-pong migrations into r32b (as moving an active buffer will stall,
> hmm, can fix if needs must), so once we go r32b we never go back.

Yeah, that's a better explanation.  How about:

  /* Restrict this buffer to the low 32 bits of the address space,
   * both now, and forever.  Altering the validation list's flags
   * restricts it for this batch, but we also modify the BO's kflags
   * to restrict it permanently.  This avoids ping-ponging the BO
   * between the low 32 bits and the full address space.  Moving an
   * active buffer can cause stalls.
   */
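The "once restricted, always restricted" behaviour this comment describes can be sketched in isolation. The flag values and the `sketch_*` types below are stand-ins invented for the example and are not the real i915 or i965 definitions; only the clearing logic mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag values for illustration only; not the real definitions. */
#define SKETCH_SUPPORTS_48B (UINT64_C(1) << 3)
#define SKETCH_RELOC_32BIT  (UINT64_C(1) << 63)

struct sketch_bo    { uint64_t kflags; };  /* persistent per-buffer flags */
struct sketch_entry { uint64_t flags; };   /* per-batch validation entry */

/* Clear the 48-bit capability on both the per-batch entry and the
 * buffer's persistent flags, then strip the driver-internal request bit
 * so it is never passed on as a kernel flag.  Returns the remaining
 * relocation flags.
 */
static uint64_t
apply_reloc_flags(struct sketch_bo *bo, struct sketch_entry *entry,
                  uint64_t reloc_flags)
{
   if (reloc_flags & SKETCH_RELOC_32BIT) {
      bo->kflags &= ~SKETCH_SUPPORTS_48B;
      entry->flags &= ~SKETCH_SUPPORTS_48B;
      reloc_flags &= ~SKETCH_RELOC_32BIT;
   }
   return reloc_flags;
}
```

Because the persistent flags are what a later batch would start from, the buffer stays in the low 32 bits afterwards and is never migrated back and forth.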

> > +  target->kflags &= ~EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
> > +  entry->flags &= ~EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
> > +
> > +  /* RELOC_32BIT is not an EXEC_OBJECT_* flag, so get rid of it. */
> > +  reloc_flags &= ~RELOC_32BIT;
> > +   }
> > +
> > if (reloc_flags)
> >entry->flags |= reloc_flags & batch->valid_reloc_flags;
> 
> > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > index fb180289a0c..2e54adb3ed2 

[Mesa-dev] [PATCH 2/4] intel/isl: Add isl_format_is_valid

2018-02-27 Thread Jordan Justen
Signed-off-by: Jordan Justen 
---
 src/intel/isl/gen_format_layout.py | 8 
 src/intel/isl/isl.h| 2 ++
 2 files changed, 10 insertions(+)

diff --git a/src/intel/isl/gen_format_layout.py b/src/intel/isl/gen_format_layout.py
index 0ca42dbab8d..53cdd3b8110 100644
--- a/src/intel/isl/gen_format_layout.py
+++ b/src/intel/isl/gen_format_layout.py
@@ -89,6 +89,14 @@ isl_format_layouts[] = {
 % endfor
 };
 
+bool
+isl_format_is_valid(enum isl_format format)
+{
+if (format >= sizeof(isl_format_layouts) / sizeof(isl_format_layouts[0]))
+return false;
+return isl_format_layouts[format].name;
+}
+
 enum isl_format
 isl_format_srgb_to_linear(enum isl_format format)
 {
diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h
index 863b6e4befa..0da6abb71d4 100644
--- a/src/intel/isl/isl.h
+++ b/src/intel/isl/isl.h
@@ -1410,6 +1410,8 @@ isl_format_get_layout(enum isl_format fmt)
return &isl_format_layouts[fmt];
 }
 
+bool isl_format_is_valid(enum isl_format);
+
 static inline const char * ATTRIBUTE_CONST
 isl_format_get_name(enum isl_format fmt)
 {
-- 
2.15.1



[Mesa-dev] [PATCH 1/4] intel: Split gen_device_info out into libintel_dev

2018-02-27 Thread Jordan Justen
Signed-off-by: Jordan Justen 
---
 src/intel/Android.dev.mk   | 35 ++
 src/intel/Makefile.am  |  1 +
 src/intel/Makefile.dev.am  | 31 +++
 src/intel/Makefile.isl.am  |  2 +-
 src/intel/Makefile.sources |  6 ++--
 src/intel/Makefile.tools.am|  4 +++
 src/intel/Makefile.vulkan.am   |  1 +
 src/intel/blorp/blorp_genX_exec.h  |  2 +-
 src/intel/common/gen_decoder.h |  2 +-
 src/intel/common/gen_l3_config.h   |  2 +-
 src/intel/common/meson.build   |  2 --
 src/intel/compiler/brw_compiler.h  |  2 +-
 src/intel/compiler/brw_inst.h  |  2 +-
 src/intel/compiler/brw_reg_type.c  |  2 +-
 src/intel/{common => dev}/gen_device_info.c|  0
 src/intel/{common => dev}/gen_device_info.h|  0
 src/intel/dev/meson.build  | 33 
 src/intel/genxml/gen_bits_header.py|  2 +-
 src/intel/isl/isl_drm.c|  2 +-
 src/intel/isl/isl_format.c |  2 +-
 src/intel/isl/isl_priv.h   |  2 +-
 src/intel/isl/meson.build  |  2 +-
 .../isl/tests/isl_surf_get_image_offset_test.c |  2 +-
 src/intel/meson.build  |  1 +
 src/intel/tools/gen_disasm.h   |  2 +-
 src/intel/tools/meson.build|  4 +--
 src/intel/vulkan/anv_private.h |  2 +-
 src/intel/vulkan/meson.build   |  8 ++---
 src/mesa/drivers/dri/i965/Makefile.am  |  1 +
 src/mesa/drivers/dri/i965/brw_bufmgr.c |  2 +-
 src/mesa/drivers/dri/i965/genX_state_upload.c  |  2 +-
 src/mesa/drivers/dri/i965/intel_screen.h   |  2 +-
 src/mesa/drivers/dri/i965/meson.build  |  3 +-
 33 files changed, 137 insertions(+), 29 deletions(-)
 create mode 100644 src/intel/Android.dev.mk
 create mode 100644 src/intel/Makefile.dev.am
 rename src/intel/{common => dev}/gen_device_info.c (100%)
 rename src/intel/{common => dev}/gen_device_info.h (100%)
 create mode 100644 src/intel/dev/meson.build

diff --git a/src/intel/Android.dev.mk b/src/intel/Android.dev.mk
new file mode 100644
index 000..956f32c119f
--- /dev/null
+++ b/src/intel/Android.dev.mk
@@ -0,0 +1,35 @@
+# Copyright © 2016 Intel Corporation
+# Copyright © 2016 Mauro Rossi 
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included
+# in all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+
+# ---
+# Build libmesa_intel_dev
+# ---
+
+include $(CLEAR_VARS)
+
+LOCAL_MODULE := libmesa_intel_dev
+
+LOCAL_MODULE_CLASS := STATIC_LIBRARIES
+
+LOCAL_SRC_FILES := $(DEV_FILES)
+
+include $(MESA_COMMON_MK)
+include $(BUILD_STATIC_LIBRARY)
diff --git a/src/intel/Makefile.am b/src/intel/Makefile.am
index cde4a70fbc6..bfb7f5b9f08 100644
--- a/src/intel/Makefile.am
+++ b/src/intel/Makefile.am
@@ -75,6 +75,7 @@ EXTRA_DIST = \
 include Makefile.blorp.am
 include Makefile.common.am
 include Makefile.compiler.am
+include Makefile.dev.am
 include Makefile.genxml.am
 include Makefile.isl.am
 include Makefile.tools.am
diff --git a/src/intel/Makefile.dev.am b/src/intel/Makefile.dev.am
new file mode 100644
index 000..2bf5b4e232a
--- /dev/null
+++ b/src/intel/Makefile.dev.am
@@ -0,0 +1,31 @@
+# Copyright © 2016 Intel Corporation
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# 

[Mesa-dev] [PATCH 3/4] intel/common: Use isl for decoder surface formats

2018-02-27 Thread Jordan Justen
Signed-off-by: Jordan Justen 
---
 src/intel/common/gen_decoder.c | 8 
 src/intel/common/meson.build   | 1 +
 src/intel/meson.build  | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 87a81b4b05a..7ca71c0d9fc 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -35,6 +35,7 @@
 
 #include "gen_decoder.h"
 
+#include "isl/isl.h"
 #include "genxml/genX_xml.h"
 
 #define XML_BUFFER_SIZE 4096
@@ -954,6 +955,13 @@ iter_decode_field(struct gen_field_iterator *iter)
   int length = strlen(iter->value);
   snprintf(iter->value + length, sizeof(iter->value) - length,
" (%s)", enum_name);
+   } else if (strcmp(iter->name, "Surface Format") == 0) {
+  if (isl_format_is_valid((enum isl_format)v.qw)) {
+ const char *fmt_name = isl_format_get_name((enum isl_format)v.qw);
+ int length = strlen(iter->value);
+ snprintf(iter->value + length, sizeof(iter->value) - length,
+  " (%s)", fmt_name);
+  }
}
 }
 
diff --git a/src/intel/common/meson.build b/src/intel/common/meson.build
index d35d5e8f78e..5e0394a5b86 100644
--- a/src/intel/common/meson.build
+++ b/src/intel/common/meson.build
@@ -38,5 +38,6 @@ libintel_common = static_library(
   files_libintel_common,
   include_directories : [inc_common, inc_intel],
   c_args : [c_vis_args, no_override_init_args],
+  link_with : [libisl],
   dependencies : [dep_expat, dep_libdrm, dep_thread],
 )
diff --git a/src/intel/meson.build b/src/intel/meson.build
index 28a2d793509..ccaf16a76f9 100644
--- a/src/intel/meson.build
+++ b/src/intel/meson.build
@@ -23,8 +23,8 @@ inc_intel = include_directories('.')
 subdir('blorp')
 subdir('dev')
 subdir('genxml')
-subdir('common')
 subdir('isl')
+subdir('common')
 subdir('compiler')
 subdir('tools')
 if with_intel_vk
-- 
2.15.1



Re: [Mesa-dev] [PATCH] disk cache: Link with -latomic if necessary

2018-02-27 Thread Matt Turner
On Tue, Feb 27, 2018 at 2:26 AM, Thierry Reding
 wrote:
> On Mon, Feb 26, 2018 at 11:14:05AM -0800, Matt Turner wrote:
>> On Fri, Feb 23, 2018 at 5:18 AM, Thierry Reding
>>  wrote:
>> > From: Thierry Reding 
>> >
>> > The disk cache implementation uses 64-bit atomic operations. For some
>> > architectures, such as 32-bit ARM, GCC will not be able to translate
>> > these operations into lock-free instructions and will instead rely on
>>
>> Here, and in the comment in meson.build, I think you mean "atomic"
>> rather than "lock-free" instructions? It's at least confusing, since
>> on x86 atomic instructions have a "lock" prefix.
>
> This uses the terminology used by the GCC documentation, see:
>
> https://gcc.gnu.org/wiki/Atomic/GCCMM
>
> I think the GCC terms merely mean that you don't need any explicit
> locking for these operations to be atomic.
>
> How about this instead:
>
> "... operations into atomic, lock-free instructions..."
>
> ?

Ah, I see. Thanks, that works for me.
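For readers following the terminology discussion, a minimal C11 sketch of the distinction: the operation is atomic either way, and "lock-free" only says whether the compiler can do it with native instructions. The helper names are made up for this example. On targets where 64-bit atomics are not lock-free, GCC may emit calls into libatomic, which is the situation the patch's -latomic check is about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* ATOMIC_LLONG_LOCK_FREE is 2 when 64-bit atomics are always lock-free,
 * 1 when it depends on the object, and 0 when they never are.
 */
static int
u64_atomics_always_lock_free(void)
{
   return ATOMIC_LLONG_LOCK_FREE == 2;
}

/* Atomically add n to *counter and return the new value.  This is atomic
 * on every target; only the implementation strategy differs.
 */
static uint64_t
bump(_Atomic uint64_t *counter, uint64_t n)
{
   return atomic_fetch_add(counter, n) + n;
}
```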


Re: [Mesa-dev] [PATCH 2/2] anv/pipeline: don't pass constant view index in multiview

2018-02-27 Thread Jason Ekstrand
On Tue, Feb 27, 2018 at 11:46 AM, Caio Marcelo de Oliveira Filho <
caio.olive...@intel.com> wrote:

> If view mask has only one bit set, view index is effectively a
> constant, so doesn't need to be passed to the next stages, just always
> set it.
>
> Part of this was in the original patch that added
> anv_nir_lower_multiview.c but disabled.
> ---
>  src/intel/vulkan/anv_nir_lower_multiview.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_nir_lower_multiview.c b/src/intel/vulkan/anv_nir_lower_multiview.c
> index d2aefdee62..365a70d757 100644
> --- a/src/intel/vulkan/anv_nir_lower_multiview.c
> +++ b/src/intel/vulkan/anv_nir_lower_multiview.c
> @@ -72,7 +72,8 @@ build_view_index(struct lower_multiview_state *state)
>b->cursor = nir_before_block(nir_start_block(b->impl));
>
>assert(state->view_mask != 0);
> -  if (0 && _mesa_bitcount(state->view_mask) == 1) {
> +  if (_mesa_bitcount(state->view_mask) == 1) {
>

Yes, I think it's probably safe to turn this on now.  Originally, I had it
commented out because I was afraid of not getting enough test coverage.
For all I know, we still aren't getting enough test coverage but I think
we've proven by now that the calculations below work.
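The single-bit case enabled here boils down to simple bit arithmetic, sketched below outside of NIR. `bit_count` and `constant_view_index` are hypothetical helpers written for this example; the patch itself uses `_mesa_bitcount()` and `ffs()`.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits by clearing the lowest set bit until none remain. */
static unsigned
bit_count(uint32_t v)
{
   unsigned n = 0;
   for (; v != 0; v &= v - 1)
      n++;
   return n;
}

/* If exactly one view is enabled, the view index is a constant equal to
 * the position of that bit (the same value as ffs(mask) - 1).  Returns
 * -1 when more than one view is enabled and the index must be computed
 * per invocation instead.
 */
static int
constant_view_index(uint32_t view_mask)
{
   assert(view_mask != 0);
   if (bit_count(view_mask) != 1)
      return -1;

   int index = 0;
   while (!(view_mask & 1)) {
      view_mask >>= 1;
      index++;
   }
   return index;
}
```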


> + /* Set the view index directly. */
>   state->view_index = nir_imm_int(b, ffs(state->view_mask) - 1);
>} else if (state->builder.shader->info.stage ==
> MESA_SHADER_VERTEX) {
>   /* We only support 16 viewports */
> @@ -210,11 +211,15 @@ anv_nir_lower_multiview(nir_shader *shader,
> uint32_t view_mask)
>assert(view_index->parent_instr->block ==
> nir_start_block(entrypoint));
>b->cursor = nir_after_instr(view_index->parent_instr);
>
> -  nir_variable *view_index_out =
> - nir_variable_create(shader, nir_var_shader_out,
> - glsl_int_type(), "view index");
> -  view_index_out->data.location = VARYING_SLOT_VIEW_INDEX;
> -  nir_store_var(b, view_index_out, view_index, 0x1);
> +  /* Unless there is only one possible view index (that would be set
> +   * directly), pass it to the next stage. */
>

With multi-line comments, we usually put the "*/" on its own line.


> +  if (_mesa_bitcount(state.view_mask) != 1) {
> + nir_variable *view_index_out =
> +nir_variable_create(shader, nir_var_shader_out,
> +glsl_int_type(), "view index");
> + view_index_out->data.location = VARYING_SLOT_VIEW_INDEX;
> + nir_store_var(b, view_index_out, view_index, 0x1);
> +  }
>
>nir_variable *layer_id_out =
>   nir_variable_create(shader, nir_var_shader_out,
> --
> 2.16.2
>


[Mesa-dev] [PATCH v2 6/7] swr/rast: Faster frustum prim culling

2018-02-27 Thread George Kyriazis
Fix clipper validMask setting. We don't need to run frustum-rejected
primitives through the clipper.  Perform frustum culling with only
frustum clip codes. Guardband clip codes cannot be used because they
overlap frustum codes.
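As a scalar illustration of the culling test described above (the real code is SIMD): a primitive can be trivially rejected when all of its vertices share at least one frustum clip code. The bit assignments and the `SK_*` names below are invented for this sketch and do not match the SWR_CLIPCODES enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative clip-code bits; not the values used by SWR. */
enum {
   SK_FRUSTUM_LEFT    = 1 << 0,
   SK_FRUSTUM_RIGHT   = 1 << 1,
   SK_FRUSTUM_TOP     = 1 << 2,
   SK_FRUSTUM_BOTTOM  = 1 << 3,
   SK_FRUSTUM_NEAR    = 1 << 4,
   SK_FRUSTUM_FAR     = 1 << 5,
   SK_GUARDBAND_LEFT  = 1 << 6,
   SK_GUARDBAND_RIGHT = 1 << 7,
};

#define SK_FRUSTUM_MASK 0x3fu

/* AND the per-vertex clip codes together, keep only the frustum bits,
 * and reject the primitive if any frustum plane has every vertex on its
 * outside.
 */
static bool
frustum_rejects(const uint32_t *codes, unsigned num_verts)
{
   uint32_t intersection = ~UINT32_C(0);
   for (unsigned i = 0; i < num_verts; i++)
      intersection &= codes[i];
   return (intersection & SK_FRUSTUM_MASK) != 0;
}
```

A primitive that survives this test may still need clipping; the point of the patch is that one that fails it never needs to reach the clipper.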
---
 src/gallium/drivers/swr/rasterizer/core/clip.h | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h b/src/gallium/drivers/swr/rasterizer/core/clip.h
index 8d2590a..0f8399c 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.h
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
@@ -60,6 +60,7 @@ enum SWR_CLIPCODES
 };
 
 #define GUARDBAND_CLIP_MASK (FRUSTUM_NEAR|FRUSTUM_FAR|GUARDBAND_LEFT|GUARDBAND_TOP|GUARDBAND_RIGHT|GUARDBAND_BOTTOM|NEGW)
+#define FRUSTUM_CLIP_MASK (FRUSTUM_NEAR|FRUSTUM_FAR|FRUSTUM_LEFT|FRUSTUM_RIGHT|FRUSTUM_TOP|FRUSTUM_BOTTOM)
 
 template
 void ComputeClipCodes(const API_STATE &state, const Vec4 &vertex, Float &clipCodes, Integer const &viewportIndexes)
@@ -708,15 +709,18 @@ public:
 primMask &= ~ComputeUserClipCullMask(pa, prim);
 }
 
-// cull prims outside view frustum
 Float clipIntersection = ComputeClipCodeIntersection();
+// Mask out non-frustum codes
+clipIntersection = SIMD_T::and_ps(clipIntersection, SIMD_T::castsi_ps(SIMD_T::set1_epi32(FRUSTUM_CLIP_MASK)));
+
+// cull prims outside view frustum
 int validMask = primMask & SimdHelper::cmpeq_ps_mask(clipIntersection, SIMD_T::setzero_ps());
 
 // skip clipping for points
 uint32_t clipMask = 0;
 if (NumVertsPerPrim != 1)
 {
-clipMask = primMask & ComputeClipMask();
+clipMask = validMask & ComputeClipMask();
 }
 
 AR_EVENT(ClipInfoEvent(numInvoc, validMask, clipMask));
@@ -726,7 +730,7 @@ public:
 RDTSC_BEGIN(FEGuardbandClip, pa.pDC->drawId);
 // we have to clip tris, execute the clipper, which will also
 // call the binner
-ClipSimd(prim, SIMD_T::vmask_ps(primMask), SIMD_T::vmask_ps(clipMask), pa, primId, viewportIdx, rtIdx);
+ClipSimd(prim, SIMD_T::vmask_ps(validMask), SIMD_T::vmask_ps(clipMask), pa, primId, viewportIdx, rtIdx);
 RDTSC_END(FEGuardbandClip, 1);
 }
 else if (validMask)
-- 
2.7.4



[Mesa-dev] [PATCH v2 5/7] swr/rast: Consolidate TRANSLATE_ADDRESS

2018-02-27 Thread George Kyriazis
Address translation is now part of the overloaded LOAD calls. This required
changing the code generator to skip the CreateLoad functions, which are now
written by hand so that they can be made virtual.
---
 .../swr/rasterizer/codegen/gen_llvm_ir_macros.py |  3 ++-
 .../drivers/swr/rasterizer/jitter/builder_mem.cpp| 20 
 .../drivers/swr/rasterizer/jitter/builder_mem.h  |  7 ++-
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp  |  4 
 4 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
index 3b19cb4..aab499b 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
+++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
@@ -152,7 +152,8 @@ def parse_ir_builder(input_file):
 # The following functions need to be ignored.
 if (func_name == 'CreateInsertNUWNSWBinOp' or
 func_name == 'CreateMaskedIntrinsic' or
-func_name == 'CreateAlignmentAssumptionHelper'):
+func_name == 'CreateAlignmentAssumptionHelper' or
+func_name == 'CreateLoad'):
 ignore = True
 
 # Convert CamelCase to CAMEL_CASE
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_mem.cpp b/src/gallium/drivers/swr/rasterizer/jitter/builder_mem.cpp
index f7d0402..05e450f 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_mem.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_mem.cpp
@@ -69,6 +69,26 @@ namespace SwrJit
 return IN_BOUNDS_GEP(ptr, indices);
 }
 
+LoadInst* Builder::LOAD(Value *Ptr, const char *Name)
+{
+return IRB()->CreateLoad(Ptr, Name);
+}
+
+LoadInst* Builder::LOAD(Value *Ptr, const Twine &Name)
+{
+return IRB()->CreateLoad(Ptr, Name);
+}
+
+LoadInst* Builder::LOAD(Type *Ty, Value *Ptr, const Twine &Name)
+{
+return IRB()->CreateLoad(Ty, Ptr, Name);
+}
+
+LoadInst* Builder::LOAD(Value *Ptr, bool isVolatile, const Twine &Name)
+{
+return IRB()->CreateLoad(Ptr, isVolatile, Name);
+}
+
 LoadInst *Builder::LOAD(Value *basePtr, const 
std::initializer_list &indices, const llvm::Twine& name)
 {
 std::vector valIndices;
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_mem.h b/src/gallium/drivers/swr/rasterizer/jitter/builder_mem.h
index 4f49634..b3a0e2b 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_mem.h
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_mem.h
@@ -34,7 +34,12 @@ Value *GEP(Value* ptr, const std::initializer_list 
&indexList);
 Value *IN_BOUNDS_GEP(Value* ptr, const std::initializer_list 
&indexList);
 Value *IN_BOUNDS_GEP(Value* ptr, const std::initializer_list 
&indexList);
 
-LoadInst *LOAD(Value *BasePtr, const std::initializer_list &offset, 
const llvm::Twine& name = "");
+virtual LoadInst* LOAD(Value *Ptr, const char *Name);
+virtual LoadInst* LOAD(Value *Ptr, const Twine &Name = "");
+virtual LoadInst* LOAD(Type *Ty, Value *Ptr, const Twine &Name = "");
+virtual LoadInst* LOAD(Value *Ptr, bool isVolatile, const Twine &Name = "");
+virtual LoadInst* LOAD(Value *BasePtr, const std::initializer_list 
&offset, const llvm::Twine& Name = "");
+
 LoadInst *LOADV(Value *BasePtr, const std::initializer_list &offset, 
const llvm::Twine& name = "");
 StoreInst *STORE(Value *Val, Value *BasePtr, const 
std::initializer_list &offset);
 StoreInst *STOREV(Value *Val, Value *BasePtr, const 
std::initializer_list &offset);
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
index 68bd4c1..f1dc002 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
@@ -1830,16 +1830,12 @@ Value* FetchJit::GetSimdValid16bitIndices(Value* 
pIndices, Value* pLastIndex)
 Value* pZeroIndex = ALLOCA(mInt16Ty);
 STORE(C((uint16_t)0), pZeroIndex);
 
-pLastIndex = TRANSLATE_ADDRESS(pLastIndex);
-
 // Load a SIMD of index pointers
 for(int64_t lane = 0; lane < mVWidth; lane++)
 {
 // Calculate the address of the requested index
 Value *pIndex = GEP(pIndices, C(lane));
 
-pIndex = TRANSLATE_ADDRESS(pIndex);
-
 // check if the address is less than the max index, 
 Value* mask = ICMP_ULT(pIndex, pLastIndex);
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 3/7] swr/rast: Remove draw type from event definitions

2018-02-27 Thread George Kyriazis
- Have the draw type sent to DrawInfoEvent in handlers created in
  archrast.cpp.  The draw type no longer needs to be sent during the
  AR_API_EVENT() call in api.cpp.

- Remove draw type from event definitions in events_private.proto, no
  longer needed
---
 src/gallium/drivers/swr/rasterizer/archrast/archrast.cpp | 8 
 src/gallium/drivers/swr/rasterizer/archrast/events_private.proto | 4 
 src/gallium/drivers/swr/rasterizer/core/api.cpp  | 8 
 3 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/archrast/archrast.cpp b/src/gallium/drivers/swr/rasterizer/archrast/archrast.cpp
index d7a3b29..8c09411 100644
--- a/src/gallium/drivers/swr/rasterizer/archrast/archrast.cpp
+++ b/src/gallium/drivers/swr/rasterizer/archrast/archrast.cpp
@@ -175,28 +175,28 @@ namespace ArchRast
 
 virtual void Handle(const DrawInstancedEvent& event)
 {
-DrawInfoEvent e(event.data.drawId, event.data.type, 
event.data.topology, event.data.numVertices, 0, 0, event.data.startVertex, 
event.data.numInstances, event.data.startInstance);
+DrawInfoEvent e(event.data.drawId, ArchRast::Instanced, 
event.data.topology, event.data.numVertices, 0, 0, event.data.startVertex, 
event.data.numInstances, event.data.startInstance);
 
 EventHandlerFile::Handle(e);
 }
 
 virtual void Handle(const DrawIndexedInstancedEvent& event)
 {
-DrawInfoEvent e(event.data.drawId, event.data.type, 
event.data.topology, 0, event.data.numIndices, event.data.indexOffset, 
event.data.baseVertex, event.data.numInstances, event.data.startInstance);
+DrawInfoEvent e(event.data.drawId, ArchRast::IndexedInstanced, 
event.data.topology, 0, event.data.numIndices, event.data.indexOffset, 
event.data.baseVertex, event.data.numInstances, event.data.startInstance);
 
 EventHandlerFile::Handle(e);
 }
 
 virtual void Handle(const DrawInstancedSplitEvent& event)
 {
-DrawInfoEvent e(event.data.drawId, event.data.type, 0, 0, 0, 0, 0, 
0, 0);
+DrawInfoEvent e(event.data.drawId, ArchRast::InstancedSplit, 0, 0, 
0, 0, 0, 0, 0);
 
 EventHandlerFile::Handle(e);
 }
 
 virtual void Handle(const DrawIndexedInstancedSplitEvent& event)
 {
-DrawInfoEvent e(event.data.drawId, event.data.type, 0, 0, 0, 0, 0, 
0, 0);
+DrawInfoEvent e(event.data.drawId, 
ArchRast::IndexedInstancedSplit, 0, 0, 0, 0, 0, 0, 0);
 
 EventHandlerFile::Handle(e);
 }
diff --git a/src/gallium/drivers/swr/rasterizer/archrast/events_private.proto b/src/gallium/drivers/swr/rasterizer/archrast/events_private.proto
index 71b723d..8970141 100644
--- a/src/gallium/drivers/swr/rasterizer/archrast/events_private.proto
+++ b/src/gallium/drivers/swr/rasterizer/archrast/events_private.proto
@@ -117,7 +117,6 @@ event ClipInfoEvent
 event DrawInstancedEvent
 {
 uint32_t drawId;
-AR_DRAW_TYPE type;
 uint32_t topology;
 uint32_t numVertices;
 int32_t  startVertex;
@@ -128,7 +127,6 @@ event DrawInstancedEvent
 event DrawIndexedInstancedEvent
 {
 uint32_t drawId;
-AR_DRAW_TYPE type;
 uint32_t topology;
 uint32_t numIndices;
 int32_t  indexOffset;
@@ -141,12 +139,10 @@ event DrawIndexedInstancedEvent
 event DrawInstancedSplitEvent
 {
 uint32_t drawId;
-AR_DRAW_TYPE type;
 };
 
 ///@brief API Stat: Split draw event for DrawIndexedInstanced.
 event DrawIndexedInstancedSplitEvent
 {
 uint32_t drawId;
-AR_DRAW_TYPE type;
 };
diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp b/src/gallium/drivers/swr/rasterizer/core/api.cpp
index cb98cbe..99d3cd5 100644
--- a/src/gallium/drivers/swr/rasterizer/core/api.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp
@@ -1169,7 +1169,7 @@ void DrawInstanced(
 DRAW_CONTEXT* pDC = GetDrawContext(pContext);
 
 RDTSC_BEGIN(APIDraw, pDC->drawId);
-AR_API_EVENT(DrawInstancedEvent(pDC->drawId, ArchRast::Instanced, 
topology, numVertices, startVertex, numInstances, startInstance));
+AR_API_EVENT(DrawInstancedEvent(pDC->drawId, topology, numVertices, 
startVertex, numInstances, startInstance));
 
 uint32_t maxVertsPerDraw = MaxVertsPerDraw(pDC, numVertices, topology);
 uint32_t primsPerDraw = GetNumPrims(topology, maxVertsPerDraw);
@@ -1221,7 +1221,7 @@ void DrawInstanced(
 //enqueue DC
 QueueDraw(pContext);
 
-AR_API_EVENT(DrawInstancedSplitEvent(pDC->drawId, 
ArchRast::InstancedSplit));
+AR_API_EVENT(DrawInstancedSplitEvent(pDC->drawId));
 
 remainingVerts -= numVertsForDraw;
 draw++;
@@ -1297,7 +1297,7 @@ void DrawIndexedInstance(
 API_STATE* pState = &pDC->pState->state;
 
 RDTSC_BEGIN(APIDrawIndexed, pDC->drawId);
-AR_API_EVENT(DrawIndexedInstancedEvent(pDC->drawId, 
ArchRast::IndexedInstancedSplit, topology, numIndices, indexOffset

[Mesa-dev] [PATCH v2 7/7] swr/rast: revert clip distance precision

2018-02-27 Thread George Kyriazis
Breaks a whole bunch of piglit tests
---
 src/gallium/drivers/swr/rasterizer/core/backend_impl.h |  4 +---
 src/gallium/drivers/swr/rasterizer/core/binner.cpp | 17 -
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/backend_impl.h b/src/gallium/drivers/swr/rasterizer/core/backend_impl.h
index 454f473..2cfd52e 100644
--- a/src/gallium/drivers/swr/rasterizer/core/backend_impl.h
+++ b/src/gallium/drivers/swr/rasterizer/core/backend_impl.h
@@ -62,10 +62,8 @@ static INLINE simdmask ComputeUserClipMask(uint8_t clipMask, 
float* pUserClipBuf
 simdscalar vB = _simd_broadcast_ss(pUserClipBuffer++);
 simdscalar vC = _simd_broadcast_ss(pUserClipBuffer++);
 
-simdscalar vK = _simd_sub_ps(_simd_sub_ps(_simd_set1_ps(1.0f), vI), 
vJ);
-
 // interpolate
-simdscalar vInterp = vplaneps(vA, vB, _simd_mul_ps(vK, vC), vI, vJ);
+simdscalar vInterp = vplaneps(vA, vB, vC, vI, vJ);
 
 // clip if interpolated clip distance is < 0 || NAN
 simdscalar vCull = _simd_cmp_ps(_simd_setzero_ps(), vInterp, 
_CMP_NLE_UQ);
diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 3b093ce..c9a37cb 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -256,12 +256,27 @@ void ProcessUserClipDist(const SWR_BACKEND_STATE& state, 
PA_STATE& pa, uint32_t
 simd4scalar primClipDist[3];
 pa.AssembleSingle(clipAttribSlot, primIndex, primClipDist);
 
+float vertClipDist[NumVerts];
 for (uint32_t e = 0; e < NumVerts; ++e)
 {
 OSALIGNSIMD(float) aVertClipDist[4];
 SIMD128::store_ps(aVertClipDist, primClipDist[e]);
-*(pUserClipBuffer++) = aVertClipDist[clipComp];
+vertClipDist[e] = aVertClipDist[clipComp];
 };
+
+// setup plane equations for barycentric interpolation in the backend
+float baryCoeff[NumVerts];
+float last = vertClipDist[NumVerts - 1] * pRecipW[NumVerts - 1];
+for (uint32_t e = 0; e < NumVerts - 1; ++e)
+{
+baryCoeff[e] = vertClipDist[e] * pRecipW[e] - last;
+}
+baryCoeff[NumVerts - 1] = last;
+
+for (uint32_t e = 0; e < NumVerts; ++e)
+{
+*(pUserClipBuffer++) = baryCoeff[e];
+}
 }
 }
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 4/7] swr/rast: Code generation cleanup

2018-02-27 Thread George Kyriazis
Generate more compact code from gen_llvm.hpp.
---
 .../swr/rasterizer/codegen/templates/gen_llvm.hpp  | 36 +-
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_llvm.hpp b/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_llvm.hpp
index d61194d..190e660 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_llvm.hpp
+++ b/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_llvm.hpp
@@ -1,5 +1,5 @@
 /
-* Copyright (C) 2014-2017 Intel Corporation.   All Rights Reserved.
+* Copyright (C) 2014-2018 Intel Corporation.   All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
@@ -39,19 +39,19 @@ namespace SwrJit
 %for type in types:
 INLINE static StructType *Gen_${type['name']}(JitManager* pJitMgr)
 {
+%if needs_ctx(type):
 LLVMContext& ctx = pJitMgr->mContext;
-   (void) ctx;
 
+%endif
 StructType* pRetType = 
pJitMgr->mpCurrentModule->getTypeByName("${type['name']}");
 if (pRetType == nullptr)
 {
-std::vector members;
-<%
-(max_type_len, max_name_len) = calc_max_len(type['members'])
-%>
-%for member in type['members']:
-/* ${member['name']} ${pad(len(member['name']), max_name_len)}*/ 
members.push_back(${ member['type'] });
-%endfor
+std::vector<Type*> members =<% (max_type_len, max_name_len) = 
calc_max_len(type['members']) %>
+{
+%for member in type['members']:
+/* ${member['name']} ${pad(len(member['name']), 
max_name_len)}*/ ${member['type']},
+%endfor
+};
 
 pRetType = StructType::create(members, "${type['name']}", false);
 
@@ -59,13 +59,13 @@ namespace SwrJit
 llvm::DIBuilder builder(*pJitMgr->mpCurrentModule);
 llvm::DIFile* pFile = builder.createFile("${input_file}", 
"${os.path.normpath(input_dir).replace('\\', '/')}");
 
-std::vector<std::pair<std::string, uint32_t>> dbgMembers;
-%for member in type['members']:
-dbgMembers.push_back(std::make_pair("${member['name']}", ${ 
member['lineNum'] }));
-%endfor
-
+std::vector<std::pair<std::string, uint32_t>> dbgMembers =
+{
+%for member in type['members']:
+std::make_pair("${member['name']}", ${pad(len(member['name']), 
max_name_len)}${member['lineNum']}),
+%endfor
+};
 pJitMgr->CreateDebugStructType(pRetType, "${type['name']}", pFile, 
${type['lineNum']}, dbgMembers);
-
 }
 
 return pRetType;
@@ -80,6 +80,12 @@ namespace SwrJit
 
 <%! # Global function definitions
 import os
+def needs_ctx(struct_type):
+for m in struct_type.get('members', []):
+if '(ctx)' in m.get('type', ''):
+return True
+return False
+
 def calc_max_len(fields):
 max_type_len = 0
 max_name_len = 0
-- 
2.7.4
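
For reference, a small sketch of the two shapes of generated code this template change moves between: one push_back statement per member versus a single brace-initialized vector. The member names and line numbers below are made up, and plain standard-library types stand in for the LLVM ones, so this only illustrates the pattern, not the real generator output.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using DebugMember = std::pair<std::string, uint32_t>;

// Old shape: one push_back statement per generated member.
inline std::vector<DebugMember> BuildWithPushBack()
{
    std::vector<DebugMember> dbgMembers;
    dbgMembers.push_back(std::make_pair("pVertices", 10)); // hypothetical member
    dbgMembers.push_back(std::make_pair("numVerts", 11));  // hypothetical member
    return dbgMembers;
}

// New shape: a single initializer list, one line per generated member.
inline std::vector<DebugMember> BuildWithInitList()
{
    std::vector<DebugMember> dbgMembers =
    {
        std::make_pair("pVertices", 10),
        std::make_pair("numVerts",  11),
    };
    return dbgMembers;
}
```

Both functions should produce the same vector; the second form just emits less text per member.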



[Mesa-dev] [PATCH v2 0/7] Merged 2 patchsets

2018-02-27 Thread George Kyriazis
v2 is a merged patchset of the following email threads:
<1519312447-27403-1-git-send-email-george.kyria...@intel.com>
<1519317122-3529-1-git-send-email-george.kyria...@intel.com>

The cc to mesa-stable is going to be dealt with in a different patch,
since the surrounding context is different, and the same patch won't work
on both branches.

George Kyriazis (7):
  swr/rast: Fix index buffer overfetch issue for non-indexed draws
  swr/rast: whitespace change
  swr/rast: Remove draw type from event definitions
  swr/rast: Code generation cleanup
  swr/rast: Consolidate TRANSLATE_ADDRESS
  swr/rast: Faster frustum prim culling
  swr/rast: revert clip distance precision

 .../drivers/swr/rasterizer/archrast/archrast.cpp   |  8 ++---
 .../swr/rasterizer/archrast/events_private.proto   |  4 ---
 .../swr/rasterizer/codegen/gen_llvm_ir_macros.py   |  3 +-
 .../swr/rasterizer/codegen/templates/gen_llvm.hpp  | 36 +-
 src/gallium/drivers/swr/rasterizer/core/api.cpp|  8 ++---
 .../drivers/swr/rasterizer/core/backend_impl.h |  4 +--
 src/gallium/drivers/swr/rasterizer/core/binner.cpp | 17 +-
 src/gallium/drivers/swr/rasterizer/core/clip.h | 10 --
 .../drivers/swr/rasterizer/core/frontend.cpp   | 15 +
 .../drivers/swr/rasterizer/jitter/builder_mem.cpp  | 20 
 .../drivers/swr/rasterizer/jitter/builder_mem.h|  7 -
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp|  6 +---
 12 files changed, 97 insertions(+), 41 deletions(-)

-- 
2.7.4



[Mesa-dev] [PATCH v2 1/7] swr/rast: Fix index buffer overfetch issue for non-indexed draws

2018-02-27 Thread George Kyriazis
Populate pLastIndex, even for the non-indexed case.  A zero pLastIndex
can cause the index offsets inside the fetcher to have nonsensical values
that can be either very large positive or very large negative numbers.

v2: split whitespace change into new commit
---
 src/gallium/drivers/swr/rasterizer/core/frontend.cpp | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp 
b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
index 1c4b522..c2be5d7 100644
--- a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
@@ -1719,6 +1719,21 @@ void ProcessDraw(
 
 if (i < endVertex)
 {
+if (!IsIndexedT::value)
+{
+fetchInfo_lo.pLastIndex = fetchInfo_lo.pIndices;
+uint32_t offset;
+offset = std::min(endVertex-i, (uint32_t) 
KNOB_SIMD16_WIDTH);
+#if USE_SIMD16_SHADERS
+fetchInfo_lo.pLastIndex += offset;
+#else
+fetchInfo_lo.pLastIndex += std::min(offset, (uint32_t) 
KNOB_SIMD_WIDTH);
+uint32_t offset2 = std::min(offset, (uint32_t) 
KNOB_SIMD16_WIDTH)-KNOB_SIMD_WIDTH;
+assert(offset >= 0);
+fetchInfo_hi.pLastIndex = fetchInfo_hi.pIndices;
+fetchInfo_hi.pLastIndex += offset2;
+#endif
+}
 // 1. Execute FS/VS for a single SIMD.
 RDTSC_BEGIN(FEFetchShader, pDC->drawId);
 #if USE_SIMD16_SHADERS
-- 
2.7.4
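
A note on the non-SIMD16 branch of the hunk above, with a sketch under stated assumptions. The names IndexSplit and SplitRemaining are hypothetical, and the widths are passed in as parameters rather than read from the KNOB_* macros. If the subtraction that produces offset2 is carried out in unsigned arithmetic, it would wrap whenever fewer than KNOB_SIMD_WIDTH vertices remain, and an assert of the form "unsigned value >= 0" cannot fire. One way to split the remaining count into a low and a high half without any wrap is:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical result type: how many indices the low and high SIMD halves
// may touch past their respective pIndices base.
struct IndexSplit
{
    uint32_t lo;
    uint32_t hi;
};

// Split the number of remaining vertices across two SIMD halves.
// simdWidth / simd16Width stand in for KNOB_SIMD_WIDTH / KNOB_SIMD16_WIDTH.
inline IndexSplit SplitRemaining(uint32_t remaining,
                                 uint32_t simdWidth,
                                 uint32_t simd16Width)
{
    const uint32_t offset = std::min(remaining, simd16Width);
    const uint32_t lo     = std::min(offset, simdWidth);
    const uint32_t hi     = offset - lo; // cannot wrap because lo <= offset
    return {lo, hi};
}
```

With widths 8 and 16, 11 remaining vertices split into 8 and 3, and 5 remaining vertices split into 5 and 0.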



[Mesa-dev] [PATCH v2 2/7] swr/rast: whitespace change

2018-02-27 Thread George Kyriazis
v2: this commit contains the whitespace change split from another commit
---
 src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp 
b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
index 880aaf8..68bd4c1 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
@@ -1881,7 +1881,7 @@ Value* FetchJit::GetSimdValid32bitIndices(Value* 
pIndices, Value* pLastIndex)
 // vIndexMask-1-1-1-1 0 0 0 0 : offsets < max pass
 // vLoadedIndices 0 1 2 3 0 0 0 0 : offsets >= max masked to 0
 Value* vMaxIndex = VBROADCAST(numIndicesLeft);
-Value* vIndexMask = VPCMPGTD(vMaxIndex,vIndexOffsets);
+Value* vIndexMask = VPCMPGTD(vMaxIndex, vIndexOffsets);
 
 // VMASKLOAD takes an *i8 src pointer
 pIndices = BITCAST(pIndices,PointerType::get(mInt8Ty,0));
-- 
2.7.4



Re: [Mesa-dev] [PATCH v3 05/13] intel/genxml: Add Clear Color struct.

2018-02-27 Thread Jason Ekstrand
On Tue, Feb 27, 2018 at 11:55 AM, Rafael Antognolli <
rafael.antogno...@intel.com> wrote:

> On Tue, Feb 27, 2018 at 11:46:12AM -0800, Jason Ekstrand wrote:
> > On Tue, Feb 27, 2018 at 9:35 AM, Rafael Antognolli <
> rafael.antogno...@intel.com
> > > wrote:
> >
> > On Mon, Feb 26, 2018 at 05:04:37PM -0800, Jason Ekstrand wrote:
> > > On Wed, Feb 21, 2018 at 1:45 PM, Rafael Antognolli <
> > rafael.antogno...@intel.com
> > > > wrote:
> > >
> > > The size of the clear color struct (expected by the hardware)
> is 8
> > > dwords (isl_dev.ss.clear_value_state_size here). But we still
> need to
> > > track the size of the clear color, used when memcopying it
> to/from
> > the
> > > state buffer. For that we keep isl_dev.ss.clear_value_size.
> > >
> > > Signed-off-by: Rafael Antognolli 
> > > ---
> > >  src/intel/genxml/gen10.xml | 8 
> > >  src/intel/isl/isl.c| 4 +++-
> > >  src/intel/isl/isl.h| 5 +
> > >  src/intel/vulkan/anv_image.c   | 2 +-
> > >  src/intel/vulkan/anv_private.h | 2 +-
> > >  5 files changed, 18 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/src/intel/genxml/gen10.xml
> b/src/intel/genxml/gen10.xml
> > > index b434d1b0f66..58b83954c4c 100644
> > > --- a/src/intel/genxml/gen10.xml
> > > +++ b/src/intel/genxml/gen10.xml
> > >
> > >
> > > We need gen11 as well
> > >
> > >
> > > @@ -809,6 +809,14 @@
> > >   type="int"
> > />
> > >
> > >
> > > +  
> > > + type="int"
> > />
> > > + type=
> > "int"/>
> > > + type=
> > "int"/>
> > > + type=
> > "int"/>
> > >
> > >
> > > Might be good to put Converted Clear Value Hi/Low in here as well.
> > >
> > >
> > > +
> > > +  
> > > +
> > > length="4">
> > >   type=
> > "int"/>
> > >   type=
> > "uint"/>
> > > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > > index 77641a89f86..e94470362e2 100644
> > > --- a/src/intel/isl/isl.c
> > > +++ b/src/intel/isl/isl.c
> > > @@ -79,9 +79,10 @@ isl_device_init(struct isl_device *dev,
> > > * - 2 dwords that can be used by the hardware for
> converted
> > clear
> > > color
> > > *   + some extra bits.
> > > */
> > > -  dev->ss.clear_value_size = 8 * 4;
> > > +  dev->ss.clear_value_size = 4 * 4;
> > >dev->ss.clear_value_offset =
> > >   RENDER_SURFACE_STATE_ClearValueAddress_start(info)
> / 32 *
> > 4;
> > > +  dev->ss.clear_value_state_size =
> CLEAR_COLOR_length(info) * 4;
> > > } else {
> > >dev->ss.clear_value_size =
> > >   isl_align(RENDER_SURFACE_STATE_RedClearColor_bits(info)
> +
> > > @@ -90,6 +91,7 @@ isl_device_init(struct isl_device *dev,
> > > RENDER_SURFACE_STATE_
> AlphaClearColor_bits(info),
> > 32) /
> > > 8;
> > >dev->ss.clear_value_offset =
> > >   RENDER_SURFACE_STATE_RedClearColor_start(info) / 32
> * 4;
> > > +  dev->ss.clear_value_state_size =
> dev->ss.clear_value_size;
> > >
> > >
> > > Ugh... Let's just make these two separate things.
> clear_value_size will
> > be 4 *
> > > 4 on gen9-10 and clear_color_state_size will be 8*4 on gen10+
> >
> > I'm not sure I understand/agree with you here. clear_value_size
> should
> > be 4 * 4 everywhere, since we use this to memcpy the 4 dwords of the
> > clear color. So, are you suggesting we remove clear_value_state_size
> > from gen9?
> >
> >
> > I mean that we have two separate things here: clear_value_size which is
> 4 B on
> > gen7-8, 16 B on gen9-10, and doesn't exist on gen11+ and
> clear_color_state_size
> > which doesn't exist prior to gen10 and is 32 B on gen10+.  Does that
> make more
> > sense?
>
> Hmm... I was planning to use clear_value_size on gen11+ as well. If I'm
> not wrong, there are some loops where we cycle through the dwords in the
> clear color state buffer using clear_value_size as a limit, but we can
> simply drop it and use 4 (dwords). But yeah, I can drop it...
>

I see what you mean.  I'm not quite sure what the right solution is without
knowing exactly what piece of code you're looking at.  The only one I can
think of is init_fast_clear_color and I think the right thing there might
be to just structure the gen split a little differently and do

if (GEN_GEN <= 8) {
   /* Fill out the one dword */
} else {
   for (unsigned i = 0; i < 4; i++) {
  /* Fill out the dwords */
   }
}
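
To make the sizes discussed in this thread concrete, here is a small sketch under stated assumptions. The function names are hypothetical and the gen number is passed as a plain integer instead of using GEN_GEN; the values are only the ones quoted in this message (4 B on gen7-8, 16 B on gen9-10, no clear_value_size on gen11+, and a 32 B clear color state from gen10 on), with 0 meaning "does not exist".

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of the clear value as described in the thread.
// Assumption: anything at or below gen8 is treated like gen7-8.
inline uint32_t ClearValueSize(int gen)
{
    if (gen <= 8)
        return 4;   // one dword
    if (gen <= 10)
        return 16;  // four dwords
    return 0;       // described as not existing on gen11+
}

// Size in bytes of the CLEAR_COLOR state buffer, 0 before gen10.
inline uint32_t ClearColorStateSize(int gen)
{
    return gen >= 10 ? 32 : 0;
}

// Number of dwords a fill loop would write, following the gen split
// sketched in the message: one dword up to gen8, four afterwards.
inline uint32_t ClearValueDwords(int gen)
{
    return gen <= 8 ? 1 : 4;
}
```

With this split, a loop over the clear color dwords can use a fixed count of 4 on gen9+ without needing clear_value_size as its limit.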


> I agree with clear_color_state_size, though.
>
> > > I think this means droppi

[Mesa-dev] [Bug 105238] ast.h:648:16: error: union member 'i' has a non-trivial constructor

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105238

--- Comment #4 from Francisco Jerez  ---
(In reply to Brian Paul from comment #3)
> What's going on with this bug?  I'm tempted to revert the change soon if it
> can't be fixed.

It can be fixed; a fix has been on the ML since before this bug was filed, and
I just got an R-b for it now.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH v3 05/13] intel/genxml: Add Clear Color struct.

2018-02-27 Thread Rafael Antognolli
On Tue, Feb 27, 2018 at 11:46:12AM -0800, Jason Ekstrand wrote:
> On Tue, Feb 27, 2018 at 9:35 AM, Rafael Antognolli 
>  > wrote:
> 
> On Mon, Feb 26, 2018 at 05:04:37PM -0800, Jason Ekstrand wrote:
> > On Wed, Feb 21, 2018 at 1:45 PM, Rafael Antognolli <
> rafael.antogno...@intel.com
> > > wrote:
> >
> > The size of the clear color struct (expected by the hardware) is 8
> > dwords (isl_dev.ss.clear_value_state_size here). But we still need 
> to
> > track the size of the clear color, used when memcopying it to/from
> the
> > state buffer. For that we keep isl_dev.ss.clear_value_size.
> >
> > Signed-off-by: Rafael Antognolli 
> > ---
> >  src/intel/genxml/gen10.xml | 8 
> >  src/intel/isl/isl.c| 4 +++-
> >  src/intel/isl/isl.h| 5 +
> >  src/intel/vulkan/anv_image.c   | 2 +-
> >  src/intel/vulkan/anv_private.h | 2 +-
> >  5 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml
> > index b434d1b0f66..58b83954c4c 100644
> > --- a/src/intel/genxml/gen10.xml
> > +++ b/src/intel/genxml/gen10.xml
> >
> >
> > We need gen11 as well
> >
> >
> > @@ -809,6 +809,14 @@
> >   type="int"
> />
> >
> >
> > +  
> > + />
> > + "int"/>
> > + "int"/>
> > + "int"/>
> >
> >
> > Might be good to put Converted Clear Value Hi/Low in here as well.
> >
> >
> > +
> > +  
> > +
> >
> >   "int"/>
> >   "uint"/>
> > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > index 77641a89f86..e94470362e2 100644
> > --- a/src/intel/isl/isl.c
> > +++ b/src/intel/isl/isl.c
> > @@ -79,9 +79,10 @@ isl_device_init(struct isl_device *dev,
> > * - 2 dwords that can be used by the hardware for converted
> clear
> > color
> > *   + some extra bits.
> > */
> > -  dev->ss.clear_value_size = 8 * 4;
> > +  dev->ss.clear_value_size = 4 * 4;
> >dev->ss.clear_value_offset =
> >   RENDER_SURFACE_STATE_ClearValueAddress_start(info) / 32 *
> 4;
> > +  dev->ss.clear_value_state_size = CLEAR_COLOR_length(info) * 
> 4;
> > } else {
> >dev->ss.clear_value_size =
> >   isl_align(RENDER_SURFACE_STATE_RedClearColor_bits(info) +
> > @@ -90,6 +91,7 @@ isl_device_init(struct isl_device *dev,
> > RENDER_SURFACE_STATE_AlphaClearColor_bits(info),
> 32) /
> > 8;
> >dev->ss.clear_value_offset =
> >   RENDER_SURFACE_STATE_RedClearColor_start(info) / 32 * 4;
> > +  dev->ss.clear_value_state_size = dev->ss.clear_value_size;
> >
> >
> > Ugh... Let's just make these two separate things.  clear_value_size will
> be 4 *
> > 4 on gen9-10 and clear_color_state_size will be 8*4 on gen10+
> 
> I'm not sure I understand/agree with you here. clear_value_size should
> be 4 * 4 everywhere, since we use this to memcpy the 4 dwords of the
> clear color. So, are you suggesting we remove clear_value_state_size
> from gen9?
> 
> 
> I mean that we have two separate things here: clear_value_size which is 4 B on
> gen7-8, 16 B on gen9-10, and doesn't exist on gen11+ and 
> clear_color_state_size
> which doesn't exist prior to gen10 and is 32 B on gen10+.  Does that make more
> sense?

Hmm... I was planning to use clear_value_size on gen11+ as well. If I'm
not wrong, there are some loops where we cycle through the dwords in the
clear color state buffer using clear_value_size as a limit, but we can
simply drop it and use 4 (dwords). But yeah, I can drop it...

I agree with clear_color_state_size, though.

> > I think this means dropping the previous patch entirely and just making
> this
> > patch add a size field.
> 
> Ack.
>
> > }
> > assert(RENDER_SURFACE_STATE_SurfaceBaseAddress_start(info) % 8 
> ==
> 0);
> > dev->ss.addr_offset =


Re: [Mesa-dev] [PATCH 05/21] intel/isl: Use a 4D physical total extent for size calculations

2018-02-27 Thread Jason Ekstrand
On Tue, Feb 27, 2018 at 4:33 AM, Pohjolainen, Topi <
topi.pohjolai...@gmail.com> wrote:

> On Thu, Feb 22, 2018 at 11:06:45PM -0800, Jason Ekstrand wrote:
> > With Yf and Ys tiling, everything is actually four dimensional because
> > we can have multiple depth or multisampled array slices in the same
> > tile.  This commit just enhances the calculations so they can handle it.
> > ---
> >  src/intel/isl/isl.c | 71 ++
> +--
> >  1 file changed, 53 insertions(+), 18 deletions(-)
> >
> > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > index 5bdc726..46aff46 100644
> > --- a/src/intel/isl/isl.c
> > +++ b/src/intel/isl/isl.c
> > @@ -988,7 +988,7 @@ isl_calc_phys_total_extent_el_gen4_2d(
> >const struct isl_extent4d *phys_level0_sa,
> >enum isl_array_pitch_span array_pitch_span,
> >uint32_t *array_pitch_el_rows,
> > -  struct isl_extent2d *total_extent_el)
> > +  struct isl_extent4d *phys_total_el)
> >  {
> > const struct isl_format_layout *fmtl = isl_format_get_layout(info->
> format);
> >
> > @@ -1001,10 +1001,12 @@ isl_calc_phys_total_extent_el_gen4_2d(
> > image_align_sa,
> phys_level0_sa,
> > array_pitch_span,
> > &phys_slice0_sa);
> > -   *total_extent_el = (struct isl_extent2d) {
> > +   *phys_total_el = (struct isl_extent4d) {
> >.w = isl_assert_div(phys_slice0_sa.w, fmtl->bw),
> >.h = *array_pitch_el_rows * (phys_level0_sa->array_len - 1) +
> > isl_assert_div(phys_slice0_sa.h, fmtl->bh),
> > +  .d = 1,
> > +  .a = 1,
> > };
> >  }
> >
> > @@ -1019,7 +1021,7 @@ isl_calc_phys_total_extent_el_gen4_3d(
> >const struct isl_extent3d *image_align_sa,
> >const struct isl_extent4d *phys_level0_sa,
> >uint32_t *array_pitch_el_rows,
> > -  struct isl_extent2d *phys_total_el)
> > +  struct isl_extent4d *phys_total_el)
> >  {
> > const struct isl_format_layout *fmtl = isl_format_get_layout(info->
> format);
> >
> > @@ -1066,9 +1068,11 @@ isl_calc_phys_total_extent_el_gen4_3d(
> >  */
> > *array_pitch_el_rows =
> >isl_align_npot(phys_level0_sa->h, image_align_sa->h) / fmtl->bw;
> > -   *phys_total_el = (struct isl_extent2d) {
> > +   *phys_total_el = (struct isl_extent4d) {
> >.w = isl_assert_div(total_w, fmtl->bw),
> >.h = isl_assert_div(total_h, fmtl->bh),
> > +  .d = 1,
> > +  .a = 1,
> > };
> >  }
> >
> > @@ -1084,7 +1088,7 @@ isl_calc_phys_total_extent_el_gen6_stencil_hiz(
> >const struct isl_extent3d *image_align_sa,
> >const struct isl_extent4d *phys_level0_sa,
> >uint32_t *array_pitch_el_rows,
> > -  struct isl_extent2d *phys_total_el)
> > +  struct isl_extent4d *phys_total_el)
> >  {
> > const struct isl_format_layout *fmtl = isl_format_get_layout(info->
> format);
> >
> > @@ -1127,9 +1131,11 @@ isl_calc_phys_total_extent_el_gen6_stencil_hiz(
> >
> > *array_pitch_el_rows =
> >isl_assert_div(isl_align(H0, image_align_sa->h), fmtl->bh);
> > -   *phys_total_el = (struct isl_extent2d) {
> > +   *phys_total_el = (struct isl_extent4d) {
> >.w = isl_assert_div(MAX(total_top_w, total_bottom_w), fmtl->bw),
> >.h = isl_assert_div(total_h, fmtl->bh),
> > +  .d = 1,
> > +  .a = 1,
> > };
> >  }
> >
> > @@ -1144,7 +1150,7 @@ isl_calc_phys_total_extent_el_gen9_1d(
> >const struct isl_extent3d *image_align_sa,
> >const struct isl_extent4d *phys_level0_sa,
> >uint32_t *array_pitch_el_rows,
> > -  struct isl_extent2d *phys_total_el)
> > +  struct isl_extent4d *phys_total_el)
> >  {
> > MAYBE_UNUSED const struct isl_format_layout *fmtl =
> isl_format_get_layout(info->format);
> >
> > @@ -1164,9 +1170,11 @@ isl_calc_phys_total_extent_el_gen9_1d(
> > }
> >
> > *array_pitch_el_rows = 1;
> > -   *phys_total_el = (struct isl_extent2d) {
> > +   *phys_total_el = (struct isl_extent4d) {
> >.w = isl_assert_div(slice_w, fmtl->bw),
> >.h = phys_level0_sa->array_len,
> > +  .d = 1,
> > +  .a = 1,
> > };
> >  }
> >
> > @@ -1184,7 +1192,7 @@ isl_calc_phys_total_extent_el(const struct
> isl_device *dev,
> >const struct isl_extent4d *phys_level0_sa,
> >enum isl_array_pitch_span
> array_pitch_span,
> >uint32_t *array_pitch_el_rows,
> > -  struct isl_extent2d *total_extent_el)
> > +  struct isl_extent4d *phys_total_el)
> >  {
> > switch (dim_layout) {
> > case ISL_DIM_LAYOUT_GEN9_1D:
> > @@ -1192,14 +1200,14 @@ isl_calc_phys_total_extent_el(const struct
> isl_device *dev,
> >isl_calc_phys_total_extent_el_gen9_1d(dev, info,
> >  image_align_sa,
> phys

[Mesa-dev] [PATCH 2/2] anv/pipeline: don't pass constant view index in multiview

2018-02-27 Thread Caio Marcelo de Oliveira Filho
If the view mask has only one bit set, the view index is effectively a
constant, so it doesn't need to be passed to the next stages; just always
set it.

Part of this was in the original patch that added
anv_nir_lower_multiview.c, but it was disabled.
---
 src/intel/vulkan/anv_nir_lower_multiview.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/intel/vulkan/anv_nir_lower_multiview.c 
b/src/intel/vulkan/anv_nir_lower_multiview.c
index d2aefdee62..365a70d757 100644
--- a/src/intel/vulkan/anv_nir_lower_multiview.c
+++ b/src/intel/vulkan/anv_nir_lower_multiview.c
@@ -72,7 +72,8 @@ build_view_index(struct lower_multiview_state *state)
   b->cursor = nir_before_block(nir_start_block(b->impl));
 
   assert(state->view_mask != 0);
-  if (0 && _mesa_bitcount(state->view_mask) == 1) {
+  if (_mesa_bitcount(state->view_mask) == 1) {
+ /* Set the view index directly. */
  state->view_index = nir_imm_int(b, ffs(state->view_mask) - 1);
   } else if (state->builder.shader->info.stage == MESA_SHADER_VERTEX) {
  /* We only support 16 viewports */
@@ -210,11 +211,15 @@ anv_nir_lower_multiview(nir_shader *shader, uint32_t 
view_mask)
   assert(view_index->parent_instr->block == nir_start_block(entrypoint));
   b->cursor = nir_after_instr(view_index->parent_instr);
 
-  nir_variable *view_index_out =
- nir_variable_create(shader, nir_var_shader_out,
- glsl_int_type(), "view index");
-  view_index_out->data.location = VARYING_SLOT_VIEW_INDEX;
-  nir_store_var(b, view_index_out, view_index, 0x1);
+  /* Unless there is only one possible view index (that would be set
+   * directly), pass it to the next stage. */
+  if (_mesa_bitcount(state.view_mask) != 1) {
+ nir_variable *view_index_out =
+nir_variable_create(shader, nir_var_shader_out,
+glsl_int_type(), "view index");
+ view_index_out->data.location = VARYING_SLOT_VIEW_INDEX;
+ nir_store_var(b, view_index_out, view_index, 0x1);
+  }
 
   nir_variable *layer_id_out =
  nir_variable_create(shader, nir_var_shader_out,
-- 
2.16.2
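
A minimal sketch of the single-bit case described in this commit message, assuming nothing beyond standard C++. PopCount, FindFirstSet and ConstantViewIndex are hypothetical stand-ins for _mesa_bitcount, ffs and the check in build_view_index; they only illustrate why a mask with exactly one bit set pins the view index to a constant.

```cpp
#include <cassert>
#include <cstdint>

// Portable bit count (stand-in for _mesa_bitcount).
inline int PopCount(uint32_t v)
{
    int n = 0;
    while (v)
    {
        v &= v - 1; // clear the lowest set bit
        ++n;
    }
    return n;
}

// 1-based position of the lowest set bit, 0 if none (same convention as ffs).
inline int FindFirstSet(uint32_t v)
{
    if (v == 0)
        return 0;
    int pos = 1;
    while ((v & 1u) == 0)
    {
        v >>= 1;
        ++pos;
    }
    return pos;
}

// Returns true and writes the constant view index when exactly one view is
// enabled; returns false when the index has to be computed per instance.
inline bool ConstantViewIndex(uint32_t viewMask, int* viewIndex)
{
    if (PopCount(viewMask) != 1)
        return false;
    *viewIndex = FindFirstSet(viewMask) - 1;
    return true;
}
```

For a mask of 0x4 the only possible view index is 2, so there is nothing to forward to later stages.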



[Mesa-dev] [Bug 105238] ast.h:648:16: error: union member 'i' has a non-trivial constructor

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105238

--- Comment #3 from Brian Paul  ---
What's going on with this bug?  I'm tempted to revert the change soon if it
can't be fixed.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH 1/2] anv/pipeline: use less instructions for multiview

2018-02-27 Thread Caio Marcelo de Oliveira Filho
The view_index is encoded in the remainder of dividing instance id by
the number of views in the view mask (n). In the general case (handled
by the else clause), there is a need to map from 0..n-1 into the
number of the view being masked. For that a map is encoded.

When only the first n bits of the mask are set, the mapping is
trivial: 0..n-1 already identifies the view being referred to.

That case was in the original patch that added
anv_nir_lower_multiview.c, but it was disabled.
---
 src/intel/vulkan/anv_nir_lower_multiview.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_nir_lower_multiview.c 
b/src/intel/vulkan/anv_nir_lower_multiview.c
index 88e6f9af87..d2aefdee62 100644
--- a/src/intel/vulkan/anv_nir_lower_multiview.c
+++ b/src/intel/vulkan/anv_nir_lower_multiview.c
@@ -86,7 +86,7 @@ build_view_index(struct lower_multiview_state *state)
 nir_umod(b, nir_load_instance_id(b),
 nir_imm_int(b, _mesa_bitcount(state->view_mask)));
 
- if (0 && util_is_power_of_two(state->view_mask + 1)) {
+ if (util_is_power_of_two(state->view_mask + 1)) {
 /* If we have a full view mask, then compacted is what we want */
 state->view_index = compacted;
  } else {
-- 
2.16.2
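
The mapping described in this commit message can be sketched on the CPU as follows. This is an illustration under assumptions, not the NIR the pass emits: the helper names are hypothetical, and the general case walks the mask bit by bit where the pass encodes a lookup map instead.

```cpp
#include <cassert>
#include <cstdint>

// Number of set bits in the mask (stand-in for _mesa_bitcount).
inline uint32_t CountViews(uint32_t viewMask)
{
    uint32_t n = 0;
    while (viewMask)
    {
        viewMask &= viewMask - 1;
        ++n;
    }
    return n;
}

// True when the mask is a contiguous run of low bits, i.e. mask + 1 is a
// power of two.
inline bool IsFullMask(uint32_t viewMask)
{
    return viewMask != 0 && ((viewMask + 1) & viewMask) == 0;
}

// View index for a given instance id. Precondition: viewMask != 0.
inline uint32_t ViewIndexForInstance(uint32_t viewMask, uint32_t instanceId)
{
    uint32_t compacted = instanceId % CountViews(viewMask);
    if (IsFullMask(viewMask))
        return compacted; // 0..n-1 already names the view

    // General case: return the position of the compacted-th set bit.
    for (uint32_t bit = 0; bit < 32; ++bit)
    {
        if (viewMask & (1u << bit))
        {
            if (compacted == 0)
                return bit;
            --compacted;
        }
    }
    return 0; // not reached for non-zero masks
}
```

With mask 0x7 the compacted value is returned unchanged, while with mask 0xA the compacted values 0 and 1 map to views 1 and 3.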



Re: [Mesa-dev] [PATCH v3 05/13] intel/genxml: Add Clear Color struct.

2018-02-27 Thread Jason Ekstrand
On Tue, Feb 27, 2018 at 9:35 AM, Rafael Antognolli <
rafael.antogno...@intel.com> wrote:

> On Mon, Feb 26, 2018 at 05:04:37PM -0800, Jason Ekstrand wrote:
> > On Wed, Feb 21, 2018 at 1:45 PM, Rafael Antognolli <
> rafael.antogno...@intel.com
> > > wrote:
> >
> > The size of the clear color struct (expected by the hardware) is 8
> > dwords (isl_dev.ss.clear_value_state_size here). But we still need
> to
> > track the size of the clear color, used when memcopying it to/from
> the
> > state buffer. For that we keep isl_dev.ss.clear_value_size.
> >
> > Signed-off-by: Rafael Antognolli 
> > ---
> >  src/intel/genxml/gen10.xml | 8 
> >  src/intel/isl/isl.c| 4 +++-
> >  src/intel/isl/isl.h| 5 +
> >  src/intel/vulkan/anv_image.c   | 2 +-
> >  src/intel/vulkan/anv_private.h | 2 +-
> >  5 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml
> > index b434d1b0f66..58b83954c4c 100644
> > --- a/src/intel/genxml/gen10.xml
> > +++ b/src/intel/genxml/gen10.xml
> >
> >
> > We need gen11 as well
> >
> >
> > @@ -809,6 +809,14 @@
> >   type="int"/>
> >
> >
> > +  
> > + type="int"/>
> > + type="int"/>
> > + type="int"/>
> > + type="int"/>
> >
> >
> > Might be good to put Converted Clear Value Hi/Low in here as well.
> >
> >
> > +
> > +  
> > +
> >
> >   type="int"/>
> >   type="uint"/>
> > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > index 77641a89f86..e94470362e2 100644
> > --- a/src/intel/isl/isl.c
> > +++ b/src/intel/isl/isl.c
> > @@ -79,9 +79,10 @@ isl_device_init(struct isl_device *dev,
> > * - 2 dwords that can be used by the hardware for converted
> clear
> > color
> > *   + some extra bits.
> > */
> > -  dev->ss.clear_value_size = 8 * 4;
> > +  dev->ss.clear_value_size = 4 * 4;
> >dev->ss.clear_value_offset =
> >   RENDER_SURFACE_STATE_ClearValueAddress_start(info) / 32 *
> 4;
> > +  dev->ss.clear_value_state_size = CLEAR_COLOR_length(info) * 4;
> > } else {
> >dev->ss.clear_value_size =
> >   isl_align(RENDER_SURFACE_STATE_RedClearColor_bits(info) +
> > @@ -90,6 +91,7 @@ isl_device_init(struct isl_device *dev,
> > RENDER_SURFACE_STATE_AlphaClearColor_bits(info),
> 32) /
> > 8;
> >dev->ss.clear_value_offset =
> >   RENDER_SURFACE_STATE_RedClearColor_start(info) / 32 * 4;
> > +  dev->ss.clear_value_state_size = dev->ss.clear_value_size;
> >
> >
> > Ugh... Let's just make these two separate things.  clear_value_size will
> be 4 *
> > 4 on gen9-10 and clear_color_state_size will be 8*4 on gen10+
>
> I'm not sure I understand/agree with you here. clear_value_size should
> be 4 * 4 everywhere, since we use this to memcpy the 4 dwords of the
> clear color. So, are you suggesting we remove clear_value_state_size
> from gen9?
>

I mean that we have two separate things here: clear_value_size which is 4 B
on gen7-8, 16 B on gen9-10, and doesn't exist on gen11+ and
clear_color_state_size which doesn't exist prior to gen10 and is 32 B on
gen10+.  Does that make more sense?


> > I think this means dropping the previous patch entirely and just making
> this
> > patch add a size field.
>
> Ack.
>
> > }
> > assert(RENDER_SURFACE_STATE_SurfaceBaseAddress_start(info) % 8
> == 0);
> > dev->ss.addr_offset =
> > diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h
> > index 209769a9a99..f1b38efed44 100644
> > --- a/src/intel/isl/isl.h
> > +++ b/src/intel/isl/isl.h
> > @@ -963,6 +963,11 @@ struct isl_device {
> >uint8_t aux_addr_offset;
> >
> >/* Rounded up to the nearest dword to simplify GPU memcpy
> > operations. */
> > +
> > +  /* size of the state buffer used to store the clear value +
> extra
> > +   * additional space used by the hardware */
> > +  uint8_t clear_value_state_size;
> >
> >
> > Maybe call this clear_color_state_size since it is the size of the
> CLEAR_COLOR
> > state.
>
> Ack.
>
> >
> > +  /* size of the clear color itself - used to copy it to/from a
> BO */
> >uint8_t clear_value_size;
> >uint8_t clear_value_offset;
> > } ss;
> > diff --git a/src/intel/vulkan/anv_image.c
> b/src/intel/vulkan/anv_image.c
> > index a0aee43bd21..0dafe03442d 100644
> > --- a/src/intel/vulkan/anv_image.c
> > +++ b/src/intel/vulkan/anv_image.c
> > @@ -264,7 +264,7 @@ add_aux_state_tracking_buffer(struct anv_image
> *image,
> > }
> >
> > /* Clear color and fast clear type */
> > -   unsigned state_size = device->isl_dev.ss.clear_value_size + 4;
> > +   unsigned state_size = de

Re: [Mesa-dev] [PATCH 08/21] intel/isl: Implement correct tile size calculations for Ys/Yf

2018-02-27 Thread Jason Ekstrand
On Tue, Feb 27, 2018 at 6:13 AM, Pohjolainen, Topi <
topi.pohjolai...@gmail.com> wrote:

> On Thu, Feb 22, 2018 at 11:06:48PM -0800, Jason Ekstrand wrote:
> > The tile size calculations use a clever bit of math to make them short
> > and simple.  We add unit tests to assert that they identically match the
> > tables in the PRM.
> > ---
> >  src/intel/Makefile.isl.am |   9 +-
> >  src/intel/isl/isl.c   |  56 ++-
> >  src/intel/isl/meson.build |  11 ++
> >  src/intel/isl/tests/isl_tile_std_y_test.c | 160
> ++
> >  4 files changed, 230 insertions(+), 6 deletions(-)
> >  create mode 100644 src/intel/isl/tests/isl_tile_std_y_test.c
> >
> > diff --git a/src/intel/Makefile.isl.am b/src/intel/Makefile.isl.am
> > index 9525f9e..a498f2f 100644
> > --- a/src/intel/Makefile.isl.am
> > +++ b/src/intel/Makefile.isl.am
> > @@ -75,7 +75,9 @@ isl/isl_format_layout.c: isl/gen_format_layout.py \
> >  #  Tests
> >  # 
> 
> >
> > -check_PROGRAMS += isl/tests/isl_surf_get_image_offset_test
> > +check_PROGRAMS += \
> > + isl/tests/isl_surf_get_image_offset_test \
> > + isl/tests/isl_tile_std_y_test
> >
> >  TESTS += $(check_PROGRAMS)
> >
> > @@ -84,6 +86,11 @@ isl_tests_isl_surf_get_image_offset_test_LDADD = \
> >   isl/libisl.la \
> >   -lm
> >
> > +isl_tests_isl_tile_std_y_test_LDADD = \
> > + common/libintel_common.la \
> > + isl/libisl.la \
> > + -lm
> > +
> >  # 
> 
> >
> >  EXTRA_DIST += \
> > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > index aa56a3c..fcbe2ad 100644
> > --- a/src/intel/isl/isl.c
> > +++ b/src/intel/isl/isl.c
> > @@ -217,13 +217,59 @@ isl_tiling_get_info(enum isl_tiling tiling,
> > case ISL_TILING_Yf:
> > case ISL_TILING_Ys: {
> >bool is_Ys = tiling == ISL_TILING_Ys;
> > +  assert(format_bpb >= 8);
> >
> > -  assert(bs > 0);
> > -  unsigned width = 1 << (6 + (ffs(bs) / 2) + (2 * is_Ys));
> > -  unsigned height = 1 << (6 - (ffs(bs) / 2) + (2 * is_Ys));
> > +  switch (dim) {
> > +  case ISL_SURF_DIM_1D:
> > + /* See the Skylake BSpec > Memory Views > Common Surface
> Formats >
> > +  * Surface Layout and Tiling > 1D Surfaces > 1D Alignment
> > +  * Requirements.
>
> I wonder if I'm looking at the right version: under "Memory Views" there is
> no section called "Common Surface Formats", but under "Memory Data Formats"
> there is one. There, though, the "1D Surfaces > 1D Alignment" section is
> pretty limited. It only says:

This is the problem with citing the bspec: it changes.  If you follow that
path in the PRMs, you should get to a useful section.


> 1D surfaces are not tiled, but laid out linearly in memory.
>
> Tiled Resource Mode    Bits per Element    Horizontal Alignment
> TRMODE_NONE            Any                 64
>
> > +  */
> > + logical_el = (struct isl_extent4d) {
> > +.w = 1 << (12 - (ffs(format_bpb) - 4) + (4 * is_Ys)),
> > +.h = 1,
> > +.d = 1,
> > +.a = 1,
> > + };
> > + break;
> > +
> > +  case ISL_SURF_DIM_2D:
> > + /* See the Skylake BSpec > Memory Views > Common Surface
> Formats >
> > +  * Surface Layout and Tiling > 2D Surfaces > 2D/CUBE Alignment
> > +  * Requirements.
> > +  */
> > + logical_el = (struct isl_extent4d) {
> > +.w = 1 << (6 - ((ffs(format_bpb) - 4) / 2) + (2 * is_Ys)),
> > +.h = 1 << (6 - ((ffs(format_bpb) - 3) / 2) + (2 * is_Ys)),
> > +.d = 1,
> > +.a = 1,
> > + };
>
> In the case of the "2D/CUBE Alignment" section I'm having a similar problem:
> there is only a simple table. The equations above, however, suggest that
> there is more to it.
>

There are tables in the PRM.  For tile size calculations, there are these
fairly simple (Ha!) closed-form calculations.  The unit tests below contain
the actual tables and test that the calculations above match exactly.
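
As a quick sanity check of the 2D closed form quoted above, here is a standalone sketch. TileExtent2D, FindFirstSet and StdYTileExtent2D are hypothetical names, and the function only transcribes the two shift expressions from the patch for the single-sampled case; it assumes format_bpb is a power of two and at least 8. A useful property to check is that the resulting tile always covers 4 KB for Yf and 64 KB for Ys.

```cpp
#include <cassert>
#include <cstdint>

struct TileExtent2D
{
    uint32_t w; // width in elements
    uint32_t h; // height in elements
};

// 1-based position of the lowest set bit, 0 if none (same convention as ffs).
inline int FindFirstSet(uint32_t v)
{
    if (v == 0)
        return 0;
    int pos = 1;
    while ((v & 1u) == 0)
    {
        v >>= 1;
        ++pos;
    }
    return pos;
}

// Transcription of the 2D Yf/Ys tile extent expressions from the patch.
// Assumes formatBpb is a power of two and >= 8.
inline TileExtent2D StdYTileExtent2D(uint32_t formatBpb, bool isYs)
{
    const int f  = FindFirstSet(formatBpb);
    const int ys = isYs ? 2 : 0;
    TileExtent2D e;
    e.w = 1u << (6 - ((f - 4) / 2) + ys);
    e.h = 1u << (6 - ((f - 3) / 2) + ys);
    return e;
}
```

For example, a 32 bpb format gives a 32x32 element Yf tile, and 32 * 32 * 4 bytes is 4096.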


> > +
> > + if (is_Ys && samples > 1) {
> > +logical_el.w >>= (ffs(samples) - 0) / 2;
> > +logical_el.h >>= (ffs(samples) - 1) / 2;
> > +logical_el.a = samples;
> > + }
> > + break;
> > +
> > +  case ISL_SURF_DIM_3D:
> > + /* See the Skylake BSpec > Memory Views > Common Surface
> Formats >
> > +  * Surface Layout and Tiling > 3D Surfaces > 3D Alignment
> > +  * Requirements.
> > +  */
> > + logical_el = (struct isl_extent4d) {
> > +.w = 1 << (4 - ((ffs(format_bpb) - 2) / 3) + (2 * is_Ys)),
> > +.h = 1 << (4 - ((ffs(format_bpb) - 4) / 3) + (1 * is_Ys)),
> > +.d = 1 << (4 - ((ffs(format_bpb) - 3) / 3) + (1 * is_Ys)),
> > +.a = 1,
> > + 

Re: [Mesa-dev] [PATCH 08/15] glsl: Switch ast_type_qualifier to a 128-bit bitset.

2018-02-27 Thread Roland Scheidegger
Not my area of expertise, but sure.
Reviewed-by: Roland Scheidegger 


Am 27.02.2018 um 20:14 schrieb Francisco Jerez:
> Do you care enough to give me a reviewed-by so I could land it right
> away?
> 
> Roland Scheidegger  writes:
> 
>> Please don't wait any longer. We really want appveyor (and some of our
>> own build systems) going again...
>>
>> Roland
>>
>> Am 27.02.2018 um 19:58 schrieb Francisco Jerez:
>>> Thanks for testing.  I'm going to land the build fix with your
>>> Tested-by's if nobody raises any concerns in the next 24h.
>>>
>>> "Kyriazis, George"  writes:
>>>
 It also fixes the errors that I was getting with gcc 5.4.0 with configure 
 build on ubuntu 16.04.

 Tested-By: George Kyriazis <george.kyria...@intel.com>

 On Feb 25, 2018, at 6:52 PM, Roland Scheidegger <srol...@vmware.com> wrote:

 Am 25.02.2018 um 21:12 schrieb Francisco Jerez:
 Roland Scheidegger <srol...@vmware.com> writes:

 Am 25.02.2018 um 03:35 schrieb Francisco Jerez:
 Roland Scheidegger <srol...@vmware.com> writes:

 This seems to have broken compilation with some gcc versions (with scons
 build):

 In file included from src/compiler/glsl/ast_array_index.cpp:24:0:
 src/compiler/glsl/ast.h:648:16: error: member
 ‘ast_type_qualifier::bitset_t ast_type_qualifier::flags::i’ with
 constructor not allowed in union
   bitset_t i;
^

 Oops...  And the only reason bitset_t has a default constructor was...
 to avoid using another C++11 feature (defaulted member functions).
 Does the attached patch fix the build failure for you?  The cleaner
 alternative would be to define the default constructor of the bitset
 object like 'T() = default', but that would imply dropping support for
 GCC 4.2-4.3 which don't implement the feature...
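
 The difference between the two kinds of default constructor can be sketched as follows. This is an illustration only: bits_user_ctor, bits_defaulted and flags_defaulted are hypothetical stand-ins, not the types declared by DECLARE_BITSET_T or ast_type_qualifier, and the sketch needs a C++11 compiler for the defaulted constructor and the type traits.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for a 128-bit bitset; not the Mesa definitions.
struct bits_user_ctor {
   unsigned words[4];
   bits_user_ctor() : words() {}   // user-provided, hence non-trivial
};

struct bits_defaulted {
   unsigned words[4];
   bits_defaulted() = default;     // C++11 defaulted, stays trivial
};

static_assert(!std::is_trivially_default_constructible<bits_user_ctor>::value,
              "a user-provided default constructor is never trivial");
static_assert(std::is_trivially_default_constructible<bits_defaulted>::value,
              "a defaulted default constructor keeps the type trivial");

// A member with a trivial default constructor does not rely on C++11
// unrestricted unions, which is what the GCC 4.8 error complained about.
union flags_defaulted {
   bits_defaulted i;
   unsigned raw[4];
};

inline unsigned first_word_after_zeroing()
{
   flags_defaulted f;
   f.i = bits_defaulted();   // value-initialization zeroes the words
   return f.i.words[0];
}

inline void check_zeroing()
{
   assert(first_word_after_zeroing() == 0u);
}
```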

 FWIW the compile error was happening with gcc 4.8 - I didn't see it with
 gcc 5.4.
 (I don't think at vmware we'd care about anything older than gcc 4.4 at
 least but last time someone wanted to bump gcc requirements there were
 still people requiring gcc 4.2.)

 The patch compiles, albeit with about two dozen warnings like the
 following:
 glsl/ast_type.cpp: In member function 'bool
 ast_fully_specified_type::has_qualifiers(_mesa_glsl_parse_state*) const':
 glsl/ast_type.cpp:50:67: warning: ISO C++ says that these are ambiguous,
 even though the worst conversion for the first is better than the worst
 conversion for the second: [enabled by default]
return (this->qualifier.flags.i & ~subroutine_only.flags.i) != 0;
   ^
 In file included from glsl/ast.h:31:0,
 from glsl/ast_type.cpp:24:
 ../../src/util/bitset.h:181:7: note: candidate 1: bool operator!=(const
 ast_type_qualifier::bitset_t&, unsigned int)
   operator!=(const T &b, BITSET_WORD x) \
   ^
 glsl/ast.h:477:4: note: in expansion of macro 'DECLARE_BITSET_T'
DECLARE_BITSET_T(bitset_t, 128);
^
 glsl/ast_type.cpp:50:67: note: candidate 2: operator!=(int, int) <built-in>
return (this->qualifier.flags.i & ~subroutine_only.flags.i) != 0;
   ^
 Roland


 Ah, yeah, that's because I didn't provide overloads for signed integer
 types, but it should be harmless since the two candidates have the same
 semantics, and should go away with a C++11-capable compiler.  I think
 the attached patch should shut the warnings on older compilers.
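
 A reduced sketch of how a signed overload removes this kind of ambiguity is given below. The type tiny_bits is hypothetical, and its implicit conversion to bool is an assumption made only so that the built-in operator!=(int, int) becomes a candidate; it is not a claim about what src/util/bitset.h actually declares.

```cpp
#include <cassert>

// Hypothetical one-word bitset. The implicit bool conversion is an assumption
// made to reproduce the ambiguity, not a description of src/util/bitset.h.
struct tiny_bits {
   unsigned word;
   operator bool() const { return word != 0; }
};

// Comparison against an unsigned word, analogous to "candidate 1".
inline bool operator!=(const tiny_bits &b, unsigned x)
{
   return b.word != x;
}

// Overload for signed operands such as the literal 0. It is an exact match on
// both arguments, so it beats both the unsigned overload and the built-in
// operator!=(int, int) reached through the bool conversion.
inline bool operator!=(const tiny_bits &b, int x)
{
   return b.word != static_cast<unsigned>(x);
}

inline bool any_bit_set(const tiny_bits &b)
{
   return b != 0;   // resolves to the int overload
}

inline void check_any_bit_set()
{
   tiny_bits none = {0u};
   tiny_bits some = {4u};
   assert(!any_bit_set(none));
   assert(any_bit_set(some));
}
```

 Without the int overload, `b != 0` has to choose between converting 0 to unsigned and converting b through bool, which is the situation the quoted GCC warning describes.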

 Yes, that compiles without warnings (with gcc 4.8)
 Tested-by: Roland Scheidegger <srol...@vmware.com>






 src/compiler/glsl/ast.h:648:16: note: unrestricted unions only available
 with -std=c++11 or -std=gnu++11
 scons: *** [build/linux-x86_64-checked/compiler/glsl/ast_array_index.os]
 Error 1
 src/gallium/tests/unit/u_format_test.c: In function ‘main’:
 src/gallium/tests/unit/u_format_test.c:649:44: warning: array subscript
 is above array bounds [-Warray-bounds]
  unpacked[i][j] = test->unpacked[i][j][1];
^
 In file included from src/compiler/glsl/ast_expr.cpp:24:0:
 src/compiler/glsl/ast.h:648:16: error: member
 ‘ast_type_qualifier::bitset_t ast_type_qualifier::flags::i’ with
 constructor not allowed in union
   bitset_t i;
^
 src/compiler/glsl/ast.h:648:16: note: unrestricted unions only available
 with -std=c++11 or -std=gnu++11

 Roland

 [...]




[Mesa-dev] [Bug 105274] Buffer overflow in gallium/auxiliary/hud/hud_cpufreq.c

2018-02-27 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105274

--- Comment #1 from Gert Wollny  ---

cpufreq_info::sysfs_filename is a char[128] buffer, and the string fn is the
same size and is written with snprintf, which is passed that size. So unless I
am missing something, the buffer overflow is only possible for
cpufreq_info::name.
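
To illustrate the distinction, here is a minimal sketch of a bounded write that also reports truncation. Only the 128-byte size of sysfs_filename comes from the comment above; the struct cpufreq_info_sketch, the 16-byte size of name and the helper format_bounded are assumptions for illustration and do not reproduce hud_cpufreq.c.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical layout: only the 128-byte sysfs_filename size comes from the
// bug comment; the size of name is an illustrative assumption.
struct cpufreq_info_sketch {
   char name[16];
   char sysfs_filename[128];
};

// Formats "<prefix><index>" into a fixed-size array. snprintf never writes
// more than N bytes (including the terminator) and returns the length the
// full string would have had, so a result >= N means it was truncated.
template <std::size_t N>
inline bool format_bounded(char (&dst)[N], const char *prefix, int index)
{
   static_assert(N > 0, "destination must have room for the terminator");
   assert(prefix != nullptr);
   const int needed = std::snprintf(dst, N, "%s%d", prefix, index);
   return needed >= 0 && static_cast<std::size_t>(needed) < N;
}
```

A caller can then treat a false return as "name did not fit" instead of silently overflowing or truncating the smaller buffer.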

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

