Re: [Mesa-dev] [PATCH] i965: Advertise 65536 for GL_MAX_UNIFORM_BLOCK_SIZE.

2015-09-09 Thread Tapani Pälli

On 09/09/2015 04:04 AM, Kenneth Graunke wrote:

Our old value of 16384 is the minimum value.  DirectX apparently
requires 65536 at a minimum; that's also what nVidia and the Intel
Windows driver advertise.  AMD advertises MAX_INT.

Ilia Mirkin noticed that "Shadow Warrior" uses UBOs larger than 16k
on Nouveau, which advertises 65536 bytes for this limit.  Traces
captured on Nouveau don't work on i965 because our lower limit causes
the GLSL linker to reject the captured shaders.  While this isn't
important in and of itself, it does suggest that raising the limit
would be beneficial.


There's also one unreleased Linux game that asserts for this.

Reviewed-by: Tapani Pälli 


We can read linear buffers up to 2^27 bytes in size, so raising this
should be safe; we could probably even go larger.  For now, matching
nVidia and Intel/Windows seems like a good plan.

We have to reinitialize MaxCombinedUniformComponents as core Mesa will
have set it based on a stale value for MaxUniformBlockSize.

Signed-off-by: Kenneth Graunke 
---
  src/mesa/drivers/dri/i965/brw_context.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 907b2a0..7c1c133 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -323,6 +323,15 @@ brw_initialize_context_constants(struct brw_context *brw)
  
 ctx->Const.StripTextureBorder = true;
  
+   ctx->Const.MaxUniformBlockSize = 65536;

+   for (int i = 0; i < MESA_SHADER_STAGES; i++) {
+  struct gl_program_constants *prog = >Const.Program[i];
+  prog->MaxUniformBlocks = 12;
+  prog->MaxCombinedUniformComponents =
+ prog->MaxUniformComponents +
+ ctx->Const.MaxUniformBlockSize / 4 * prog->MaxUniformBlocks;
+   }
+
 ctx->Const.MaxDualSourceDrawBuffers = 1;
 ctx->Const.MaxDrawBuffers = BRW_MAX_DRAW_BUFFERS;
 ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits = 
max_samplers;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] radeonsi: don't update polygon offset state if it has no effect

2015-09-09 Thread Michel Dänzer
On 07.09.2015 07:14, Marek Olšák wrote:
> From: Marek Olšák 

This patch is

Reviewed-by: Michel Dänzer 


Patches 1 & 2 are

Acked-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Match MESA_FORMAT_B5G6R5 for a shallow pixel format of GL_RGB

2015-09-09 Thread Chris Wilson
On Thu, Sep 03, 2015 at 11:08:11AM -0700, Ian Romanick wrote:
> On 09/03/2015 09:05 AM, Chris Wilson wrote:
> > If the user supplies a pixel format of GL_RGB + GL_UNSIGNED_SHORT_5_6_5
> > and specifies a generic unsized GL_RGB internal format, match that to a
> > texture format of MESA_FORMAT_B5G6R5 if supported by the hardware.
> > 
> > Noticed while playing with mesa-demos/teximage:
> > 
> >   TexImage(RGB/565 256 x 256): 79.8 images/sec, 10.0 MB/sec
> >   TexSubImage(RGB/565 256 x 256): 3804.9 images/sec, 475.6 MB/sec
> >   GetTexImage(RGB/565 256 x 256): 99.5 images/sec, 12.4 MB/sec
> > 
> > becomes
> > 
> >   TexImage(RGB/565 256 x 256): 3439.1 images/sec, 429.9 MB/sec
> >   TexSubImage(RGB/565 256 x 256): 3744.1 images/sec, 468.0 MB/sec
> >   GetTexImage(RGB/565 256 x 256): 4713.5 images/sec, 589.2 MB/sec
> > 
> > on a puny Baytrail which is still far from what it is capable of. The
> > reason for the disparity is that the teximage demo uses a busy texture
> > which is performs an accelerated pixel conversion from the user's B5G6R5
> > into the native X8B8G8R8. After the patch, no conversion is required
> > allowing use of the blitter and memcpy fast paths.
> > 
> > Signed-off-by: Chris Wilson 
> > ---
> >  src/mesa/main/texformat.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/src/mesa/main/texformat.c b/src/mesa/main/texformat.c
> > index fd9f335..866c7b3 100644
> > --- a/src/mesa/main/texformat.c
> > +++ b/src/mesa/main/texformat.c
> > @@ -114,6 +114,8 @@ _mesa_choose_tex_format(struct gl_context *ctx, GLenum 
> > target,
> > case GL_RGB:
> >if (type == GL_UNSIGNED_INT_2_10_10_10_REV) {
> >   RETURN_IF_SUPPORTED(MESA_FORMAT_B10G10R10A2_UNORM);
> > +  } else if (type == GL_UNSIGNED_SHORT_5_6_5) {
> > + RETURN_IF_SUPPORTED(MESA_FORMAT_B5G6R5_UNORM);
> 
> Oh man... I hate this function.  It's such a big pile of hacks.  I think
> this change is good for now, but I think you're other idea is in the
> right direction.  This function really should check, "If internalFormat
> is a generic base format that matches , is there supported
> format that matches  exactly?"

An issue that cropped in testing was glGenerateMipmap(). As this
switches to using B5G6R5 internally we also use that format for storing
the intermediate layers when generating the mipmap (thus incurring
quantisation errors at each step). What I have in mind to remedy this is
to convert the texture to floating point, generate the mipmap, then
convert back to the internal format. (However, I expect this also to
change test results!) Does that seem reasonable?

"The internal formats of the derived mipmap images all match those of
the levelbase image. The contents of the derived images are computed by
repeated, filtered reduction of the levelbase+1 image. For one- and
two-dimensional array and cube map array textures, each layer is
filtered independently."

Does not seem to exclude the use of resampling.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC 1/2] nir: Add a new lowering pass nir_lower_vec_and_coalesce

2015-09-09 Thread Eduardo Lima Mitev
This pass will propagate the destination components of a vecN instructions,
as destination of the instructions that define its sources; if certain
conditions are met.

If all the components of the destination register in the vecN instruction
can be propagated, the instruction is removed. Otherwise, a new, reduced
vecN instruction is emitted with the channels that remained.

This effectively coalesces registers and reduces indirection.

By now, this pass will only propagate to ALU instructions, but it could
be extended to include other instructions like load_input intrinsic.

It also propagates to instructions within the same block as the vecN
instruction. But it could be made to work cross-block in the future,
though there are non-trivial issues with this like considering
registers that are written in different branches of a conditional.
More analysis is needed to correctly cover these cases.

This pass works on a NIR shader in final form (after SSA), and is
expected to run before nir_lower_vec_to_movs().
---
 src/glsl/Makefile.sources |   1 +
 src/glsl/nir/nir.h|   1 +
 src/glsl/nir/nir_lower_vec_and_coalesce.c | 301 ++
 3 files changed, 303 insertions(+)
 create mode 100644 src/glsl/nir/nir_lower_vec_and_coalesce.c

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index c422303..015f242 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -48,6 +48,7 @@ NIR_FILES = \
nir/nir_lower_vars_to_ssa.c \
nir/nir_lower_var_copies.c \
nir/nir_lower_vec_to_movs.c \
+   nir/nir_lower_vec_and_coalesce.c \
nir/nir_metadata.c \
nir/nir_normalize_cubemap_coords.c \
nir/nir_opt_constant_folding.c \
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 3c375f3..6a89f1d 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1786,6 +1786,7 @@ void nir_lower_vars_to_ssa(nir_shader *shader);
 void nir_remove_dead_variables(nir_shader *shader);
 
 void nir_lower_vec_to_movs(nir_shader *shader);
+void nir_lower_vec_and_coalesce(nir_shader *shader);
 void nir_lower_alu_to_scalar(nir_shader *shader);
 void nir_lower_load_const_to_scalar(nir_shader *shader);
 
diff --git a/src/glsl/nir/nir_lower_vec_and_coalesce.c 
b/src/glsl/nir/nir_lower_vec_and_coalesce.c
new file mode 100644
index 000..2b21ec1
--- /dev/null
+++ b/src/glsl/nir/nir_lower_vec_and_coalesce.c
@@ -0,0 +1,301 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Eduardo Lima Mitev (el...@igalia.com)
+ *
+ */
+
+#include "nir.h"
+
+/*
+ * Implements a pass that lowers vecN instructions by propagating the
+ * components of their destinations, as the destination of the
+ * instructions that defines the sources of the vecN instruction.
+ *
+ * This effectively coalesces registers and reduces indirection.
+ *
+ * If all the components of the destination register in the vecN
+ * instruction can be propagated, the instruction is removed. Otherwise,
+ * a new, reduced vecN instruction is emitted with the channels that
+ * remained.
+ *
+ * By now, this pass will only propagate to ALU instructions, but it could
+ * be extended to include load_const instructions or some intrinsics like
+ * load_input.
+ *
+ * This pass works on a NIR shader in final form (after SSA), and is
+ * expected to run before nir_lower_vec_to_movs().
+ */
+
+/**
+ * Clone an ALU instruction and override the destination with the one given by
+ * new_dest. It copies sources from original ALU to the new one, adjusting
+ * their swizzles.
+ *
+ * Returns the new ALU instruction.
+ */
+static nir_alu_instr *
+clone_alu_instr_and_override_dest(nir_alu_instr *alu_instr,
+  nir_alu_dest *new_dest, unsigned index,
+  void *mem_ctx)
+{
+   nir_alu_instr 

[Mesa-dev] [RFC 2/2] i965/nir/vec4: Activate lower_vec_and_coalesce pass on non-scalar shaders

2015-09-09 Thread Eduardo Lima Mitev
Enable the lower_vec_and_coalesce pass on non-scalar shaders, after
the shader is out of SSA-form and before passing lower_vec_to_movs.

shader-db VS results against non-NIR before this patch:

total instructions in shared programs: 1853747 -> 1804428 (-2.66%)
instructions in affected programs: 1687580 -> 1638261 (-2.92%)
helped:6748
HURT:  11101
GAINED:0
LOST:  0

shader-db VS results against non-NIR after this patch:

total instructions in shared programs: 1853747 -> 1762126 (-4.94%)
instructions in affected programs: 1681255 -> 1589634 (-5.45%)
helped:7751
HURT:  9344
GAINED:0
LOST:  0
---
 src/mesa/drivers/dri/i965/brw_nir.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index 8f3edc5..0797fda 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -187,6 +187,9 @@ brw_create_nir(struct brw_context *brw,
nir_validate_shader(nir);
 
if (!is_scalar) {
+  nir_lower_vec_and_coalesce(nir);
+  nir_validate_shader(nir);
+
   nir_lower_vec_to_movs(nir);
   nir_validate_shader(nir);
}
-- 
2.4.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] gallium/radeon: add separate stencil level dirty flags

2015-09-09 Thread Michel Dänzer
On 07.09.2015 07:17, Marek Olšák wrote:
> From: Marek Olšák 
> 
> We will only do depth-only or stencil-only decompress blits, whichever is
> needed by textures, instead of always doing both.
> ---
>  src/gallium/drivers/r600/evergreen_state.c| 4 ++--
>  src/gallium/drivers/r600/r600_state_common.c  | 3 +++
>  src/gallium/drivers/radeon/r600_pipe_common.h | 1 +
>  src/gallium/drivers/radeonsi/cik_sdma.c   | 4 ++--
>  src/gallium/drivers/radeonsi/si_dma.c | 4 ++--
>  src/gallium/drivers/radeonsi/si_state_draw.c  | 3 +++
>  6 files changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/drivers/r600/evergreen_state.c 
> b/src/gallium/drivers/r600/evergreen_state.c
> index 6f4cb55..8ecc498 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -3372,11 +3372,11 @@ static void evergreen_dma_copy(struct pipe_context 
> *ctx,
>   }
>  
>   if (src->format != dst->format || src_box->depth > 1 ||
> - rdst->dirty_level_mask != 0) {
> + (rdst->dirty_level_mask | rdst->stencil_dirty_level_mask) & (1 << 
> dst_level)) {
>   goto fallback;
>   }
>  
> - if (rsrc->dirty_level_mask) {
> + if ((rsrc->dirty_level_mask | rsrc->stencil_dirty_level_mask) & (1 << 
> src_level)) {
>   ctx->flush_resource(ctx, src);
>   }

AFAICT ctx->flush_resource only decompresses colour resources, so is the
second change really necessary? Same in si_dma_copy().


With that fixed, or if it's correct as is,

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] radeonsi: only do depth-only or stencil-only in-place decompression

2015-09-09 Thread Michel Dänzer
On 07.09.2015 07:17, Marek Olšák wrote:
> From: Marek Olšák 
> 
> instead of always doing both.
> Usually, only depth is needed, so stencil decompression is useless.

[...]

> @@ -247,6 +256,7 @@ void si_flush_depth_textures(struct si_context *sctx,
>   assert(tex->is_depth && !tex->is_flushing_texture);
>  
>   si_blit_decompress_depth_in_place(sctx, tex,
> +   ((struct 
> si_sampler_view*)view)->is_stencil_sampler,

This cast is a bit ugly, please use a separate local variable of type
struct si_sampler_view* for this.


With that fixed,

Reviewed-by: Michel Dänzer 


Correspondingly for patch 3.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Match MESA_FORMAT_B5G6R5 for a shallow pixel format of GL_RGB

2015-09-09 Thread Erik Faye-Lund
On Thu, Sep 3, 2015 at 6:05 PM, Chris Wilson  wrote:
> If the user supplies a pixel format of GL_RGB + GL_UNSIGNED_SHORT_5_6_5
> and specifies a generic unsized GL_RGB internal format, match that to a
> texture format of MESA_FORMAT_B5G6R5 if supported by the hardware.
>
> Noticed while playing with mesa-demos/teximage:
>
>   TexImage(RGB/565 256 x 256): 79.8 images/sec, 10.0 MB/sec
>   TexSubImage(RGB/565 256 x 256): 3804.9 images/sec, 475.6 MB/sec
>   GetTexImage(RGB/565 256 x 256): 99.5 images/sec, 12.4 MB/sec
>
> becomes
>
>   TexImage(RGB/565 256 x 256): 3439.1 images/sec, 429.9 MB/sec
>   TexSubImage(RGB/565 256 x 256): 3744.1 images/sec, 468.0 MB/sec
>   GetTexImage(RGB/565 256 x 256): 4713.5 images/sec, 589.2 MB/sec
>
> on a puny Baytrail which is still far from what it is capable of. The
> reason for the disparity is that the teximage demo uses a busy texture
> which is performs an accelerated pixel conversion from the user's B5G6R5
> into the native X8B8G8R8. After the patch, no conversion is required
> allowing use of the blitter and memcpy fast paths.
>
> Signed-off-by: Chris Wilson 
> ---
>  src/mesa/main/texformat.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/mesa/main/texformat.c b/src/mesa/main/texformat.c
> index fd9f335..866c7b3 100644
> --- a/src/mesa/main/texformat.c
> +++ b/src/mesa/main/texformat.c
> @@ -114,6 +114,8 @@ _mesa_choose_tex_format(struct gl_context *ctx, GLenum 
> target,
> case GL_RGB:
>if (type == GL_UNSIGNED_INT_2_10_10_10_REV) {
>   RETURN_IF_SUPPORTED(MESA_FORMAT_B10G10R10A2_UNORM);
> +  } else if (type == GL_UNSIGNED_SHORT_5_6_5) {
> + RETURN_IF_SUPPORTED(MESA_FORMAT_B5G6R5_UNORM);

Shouldn't this be MESA_FORMAT_R5G6B5_UNORM? AFAICT, the reason for
using BGR above, is the _REV suffix on the type...
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nvc0: keep track of cb bindings per buffer, use for upload settings

2015-09-09 Thread Ilia Mirkin
CB updates to bound buffers need to go through the CB_DATA endpoints,
otherwise the shader may not notice that the updates happened.
Furthermore, these updates have to go in to the same address as the
bound buffer, otherwise, again, the shader may not notice updates.

So we keep track of all the places where a constbuf is bound, and
iterate over all of them when updating data. If a binding is found that
encompasses the region to be updated, then we use the settings of that
binding for the upload. Otherwise we upload as a regular data update.

This fixes piglit 'arb_uniform_buffer_object-rendering offset' as well
as blurriness in Witcher2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91890
Signed-off-by: Ilia Mirkin 
Cc: "11.0" 
---
 src/gallium/drivers/nouveau/nouveau_buffer.c   |  4 +-
 src/gallium/drivers/nouveau/nouveau_buffer.h   |  2 +
 src/gallium/drivers/nouveau/nouveau_context.h  |  5 ++-
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h|  8 ++--
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c  |  2 +
 .../drivers/nouveau/nvc0/nvc0_state_validate.c |  3 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c   | 46 --
 7 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_buffer.c 
b/src/gallium/drivers/nouveau/nouveau_buffer.c
index 912b778..4937dae 100644
--- a/src/gallium/drivers/nouveau/nouveau_buffer.c
+++ b/src/gallium/drivers/nouveau/nouveau_buffer.c
@@ -206,8 +206,8 @@ nouveau_transfer_write(struct nouveau_context *nv, struct 
nouveau_transfer *tx,
   nv->copy_data(nv, buf->bo, buf->offset + base, buf->domain,
 tx->bo, tx->offset + offset, NOUVEAU_BO_GART, size);
else
-   if ((buf->base.bind & PIPE_BIND_CONSTANT_BUFFER) && nv->push_cb && can_cb)
-  nv->push_cb(nv, buf->bo, buf->domain, buf->offset, buf->base.width0,
+   if (nv->push_cb && can_cb)
+  nv->push_cb(nv, buf,
   base, size / 4, (const uint32_t *)data);
else
   nv->push_data(nv, buf->bo, buf->offset + base, buf->domain, size, data);
diff --git a/src/gallium/drivers/nouveau/nouveau_buffer.h 
b/src/gallium/drivers/nouveau/nouveau_buffer.h
index 7e6a6cc..d45bf7a 100644
--- a/src/gallium/drivers/nouveau/nouveau_buffer.h
+++ b/src/gallium/drivers/nouveau/nouveau_buffer.h
@@ -41,6 +41,8 @@ struct nv04_resource {
uint8_t status;
uint8_t domain;
 
+   uint16_t cb_bindings[6]; /* per-shader per-slot bindings */
+
struct nouveau_fence *fence;
struct nouveau_fence *fence_wr;
 
diff --git a/src/gallium/drivers/nouveau/nouveau_context.h 
b/src/gallium/drivers/nouveau/nouveau_context.h
index 24deb7e..decb271 100644
--- a/src/gallium/drivers/nouveau/nouveau_context.h
+++ b/src/gallium/drivers/nouveau/nouveau_context.h
@@ -6,6 +6,8 @@
 
 #define NOUVEAU_MAX_SCRATCH_BUFS 4
 
+struct nv04_resource;
+
 struct nouveau_context {
struct pipe_context pipe;
struct nouveau_screen *screen;
@@ -23,8 +25,7 @@ struct nouveau_context {
  unsigned, const void *);
/* base, size refer to the whole constant buffer */
void (*push_cb)(struct nouveau_context *,
-   struct nouveau_bo *, unsigned domain,
-   unsigned base, unsigned size,
+   struct nv04_resource *,
unsigned offset, unsigned words, const uint32_t *);
 
/* @return: @ref reduced by nr of references found in context */
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index 6ed79cf..30bee3a 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -299,10 +299,10 @@ nve4_p2mf_push_linear(struct nouveau_context *nv,
   struct nouveau_bo *dst, unsigned offset, unsigned domain,
   unsigned size, const void *data);
 void
-nvc0_cb_push(struct nouveau_context *,
- struct nouveau_bo *bo, unsigned domain,
- unsigned base, unsigned size,
- unsigned offset, unsigned words, const uint32_t *data);
+nvc0_cb_bo_push(struct nouveau_context *,
+struct nouveau_bo *bo, unsigned domain,
+unsigned base, unsigned size,
+unsigned offset, unsigned words, const uint32_t *data);
 
 /* nvc0_vbo.c */
 void nvc0_draw_vbo(struct pipe_context *, const struct pipe_draw_info *);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
index ee29912..c5bfd03 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
@@ -831,6 +831,8 @@ nvc0_set_constant_buffer(struct pipe_context *pipe, uint 
shader, uint index,
}
nvc0->constbuf_dirty[s] |= 1 << i;
 
+   if (nvc0->constbuf[s][i].u.buf)
+  nv04_resource(nvc0->constbuf[s][i].u.buf)->cb_bindings[s] &= ~(1 

Re: [Mesa-dev] [PATCH v4 (part2) 24/59] glsl: a shader storage buffer must be smaller than the maximum size allowed

2015-09-09 Thread Samuel Iglesias Gonsálvez


On 08/09/15 19:50, Jordan Justen wrote:
> On 2015-08-05 01:30:21, Iago Toral Quiroga wrote:
>> From: Samuel Iglesias Gonsalvez 
>>
>> Otherwise, generate a link time error as per the
>> ARB_shader_storage_buffer_object spec.
>>
>> Signed-off-by: Samuel Iglesias Gonsalvez 
>> ---
>>  src/glsl/glsl_types.cpp  |  9 +++--
>>  src/glsl/link_uniform_blocks.cpp | 17 +
>>  src/glsl/linker.cpp  |  2 +-
>>  src/glsl/linker.h|  1 +
>>  4 files changed, 26 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
>> index 5758029..1b8e2e4 100644
>> --- a/src/glsl/glsl_types.cpp
>> +++ b/src/glsl/glsl_types.cpp
>> @@ -1312,7 +1312,7 @@ glsl_type::std140_size(bool row_major) const
>>  * rounded up to the next multiple of the base alignment of the
>>  * structure.
>>  */
>> -   if (this->is_record()) {
>> +   if (this->is_record() || this->is_interface()) {
>>unsigned size = 0;
>>unsigned max_align = 0;
>>  
>> @@ -1328,7 +1328,12 @@ glsl_type::std140_size(bool row_major) const
>>  
>>  const struct glsl_type *field_type = this->fields.structure[i].type;
>>  unsigned align = field_type->std140_base_alignment(field_row_major);
>> -size = glsl_align(size, align);
>> +
>> + /* Ignore unsized arrays when calculating size */
>> + if (field_type->is_unsized_array())
>> +continue;
>> +
>> + size = glsl_align(size, align);
>>  size += field_type->std140_size(field_row_major);
>>  
>>   max_align = MAX2(align, max_align);
>> diff --git a/src/glsl/link_uniform_blocks.cpp 
>> b/src/glsl/link_uniform_blocks.cpp
>> index c891d03..2cda78d 100644
>> --- a/src/glsl/link_uniform_blocks.cpp
>> +++ b/src/glsl/link_uniform_blocks.cpp
>> @@ -178,6 +178,7 @@ struct block {
>>  
>>  unsigned
>>  link_uniform_blocks(void *mem_ctx,
>> +struct gl_context *ctx,
>>  struct gl_shader_program *prog,
>>  struct gl_shader **shader_list,
>>  unsigned num_shaders,
>> @@ -299,6 +300,14 @@ link_uniform_blocks(void *mem_ctx,
>>  
>>  blocks[i].UniformBufferSize = parcel.buffer_size;
>>  
>> +/* Check SSBO size is lower than maximum supported size for 
>> SSBO */
>> +if (b->is_shader_storage &&
>> +parcel.buffer_size > ctx->Const.MaxShaderStorageBlockSize) {
>> +   linker_error(prog, "shader storage block `%s' has size %d > 
>> %d",
>> +   block_type->name,
>> +   parcel.buffer_size,
>> +   ctx->Const.MaxShaderStorageBlockSize);
> 
> This error message (and below) seems confusing. What about "shader
> storage block `%s' has size %d, which is larger than than the maximum
> allowed (%d)"?
> 

Yeah, it is more clear. I will do that change.

> Reviewed-by: Jordan Justen 

Thanks for your reviews!

Sam

> 
>> +}
>>  blocks[i].NumUniforms =
>> (unsigned)(ptrdiff_t)([parcel.index] - 
>> blocks[i].Uniforms);
>>  
>> @@ -319,6 +328,14 @@ link_uniform_blocks(void *mem_ctx,
>>  
>>   blocks[i].UniformBufferSize = parcel.buffer_size;
>>  
>> + /* Check SSBO size is lower than maximum supported size for SSBO */
>> + if (b->is_shader_storage &&
>> + parcel.buffer_size > ctx->Const.MaxShaderStorageBlockSize) {
>> +linker_error(prog, "shader storage block `%s' has size %d > %d",
>> +block_type->name,
>> +parcel.buffer_size,
>> +ctx->Const.MaxShaderStorageBlockSize);
>> + }
>>   blocks[i].NumUniforms =
>>  (unsigned)(ptrdiff_t)([parcel.index] - 
>> blocks[i].Uniforms);
>>  
>> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
>> index 9aff3ce..7ba685b 100644
>> --- a/src/glsl/linker.cpp
>> +++ b/src/glsl/linker.cpp
>> @@ -2023,7 +2023,7 @@ link_intrastage_shaders(void *mem_ctx,
>>  
>> /* Link up uniform blocks defined within this stage. */
>> const unsigned num_uniform_blocks =
>> -  link_uniform_blocks(mem_ctx, prog, shader_list, num_shaders,
>> +  link_uniform_blocks(mem_ctx, ctx, prog, shader_list, num_shaders,
>>_blocks);
>> if (!prog->LinkStatus)
>>return NULL;
>> diff --git a/src/glsl/linker.h b/src/glsl/linker.h
>> index ce3dc32..ada5a1f 100644
>> --- a/src/glsl/linker.h
>> +++ b/src/glsl/linker.h
>> @@ -56,6 +56,7 @@ link_uniform_blocks_are_compatible(const gl_uniform_block 
>> *a,
>>  
>>  extern unsigned
>>  link_uniform_blocks(void *mem_ctx,
>> +struct gl_context *ctx,
>>  struct gl_shader_program *prog,
>>  struct gl_shader **shader_list,
>>  

Re: [Mesa-dev] [PATCH v4 (part2) 23/59] glsl: add std430 interface packing support to ssbo store and unsized array length

2015-09-09 Thread Samuel Iglesias Gonsálvez


On 08/09/15 18:52, Jordan Justen wrote:
> On 2015-08-05 01:30:20, Iago Toral Quiroga wrote:
>> From: Samuel Iglesias Gonsalvez 
>>
>> Signed-off-by: Samuel Iglesias Gonsalvez 
>> ---
>>  src/glsl/lower_ubo_reference.cpp | 65 
>> ++--
>>  1 file changed, 49 insertions(+), 16 deletions(-)
>>
>> diff --git a/src/glsl/lower_ubo_reference.cpp 
>> b/src/glsl/lower_ubo_reference.cpp
>> index 4f42abb..495ffcf 100644
>> --- a/src/glsl/lower_ubo_reference.cpp
>> +++ b/src/glsl/lower_ubo_reference.cpp
>> @@ -164,7 +164,7 @@ public:
>> void emit_access(bool is_write, ir_dereference *deref,
>>  ir_variable *base_offset, unsigned int deref_offset,
>>  bool row_major, int matrix_columns,
>> -unsigned write_mask);
>> +bool is_std430, unsigned write_mask);
>>  
>> ir_visitor_status visit_enter(class ir_expression *);
>> void check_ssbo_unsized_array_length_expression(class ir_expression *);
>> @@ -343,8 +343,15 @@ 
>> lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var,
>>  const bool array_row_major =
>> is_dereferenced_thing_row_major(deref_array);
>>  
>> -array_stride = deref_array->type->std140_size(array_row_major);
>> -array_stride = glsl_align(array_stride, 16);
>> +/* The array type will give the correct interface packing
>> + * information
>> + */
>> +if (deref_array->array->type->interface_packing == 
>> GLSL_INTERFACE_PACKING_STD430) {
>> +   array_stride = 
>> deref_array->type->std430_size(array_row_major);
>> +} else {
>> +   array_stride = 
>> deref_array->type->std140_size(array_row_major);
>> +   array_stride = glsl_align(array_stride, 16);
>> +}
>>   }
>>  
>>   ir_rvalue *array_index = deref_array->array_index;
>> @@ -380,7 +387,12 @@ 
>> lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var,
>>  
>>  ralloc_free(field_deref);
>>  
>> -unsigned field_align = 
>> type->std140_base_alignment(field_row_major);
>> +unsigned field_align = 0;
>> +
>> +if (struct_type->interface_packing == 
>> GLSL_INTERFACE_PACKING_STD430)
>> +   field_align = type->std430_base_alignment(field_row_major);
>> +else
>> +   field_align = type->std140_base_alignment(field_row_major);
>>  
>>  intra_struct_offset = glsl_align(intra_struct_offset, 
>> field_align);
>>  
>> @@ -388,7 +400,10 @@ 
>> lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var,
>> deref_record->field) == 0)
>> break;
>>  
>> -intra_struct_offset += type->std140_size(field_row_major);
>> +if (struct_type->interface_packing == 
>> GLSL_INTERFACE_PACKING_STD430)
>> +   intra_struct_offset += type->std430_size(field_row_major);
>> +else
>> +   intra_struct_offset += type->std140_size(field_row_major);
>>  
>>  /* If the field just examined was itself a structure, apply rule
>>   * #9:
>> @@ -463,7 +478,7 @@ lower_ubo_reference_visitor::handle_rvalue(ir_rvalue 
>> **rvalue)
>>  
>> deref = new(mem_ctx) ir_dereference_variable(load_var);
>> emit_access(false, deref, load_offset, const_offset,
>> -   row_major, matrix_columns, 0);
>> +   row_major, matrix_columns, false, 0);
>> *rvalue = deref;
>>  
>> progress = true;
>> @@ -581,6 +596,7 @@ lower_ubo_reference_visitor::emit_access(bool is_write,
>>   unsigned int deref_offset,
>>   bool row_major,
>>   int matrix_columns,
>> + bool is_std430,
>>   unsigned write_mask)
>>  {
>> if (deref->type->is_record()) {
>> @@ -599,7 +615,7 @@ lower_ubo_reference_visitor::emit_access(bool is_write,
>>  
>>   emit_access(is_write, field_deref, base_offset,
>>   deref_offset + field_offset,
>> - row_major, 1,
>> + row_major, 1, is_std430,
>>   
>> writemask_for_size(field_deref->type->vector_elements));
>>  
>>   field_offset += field->type->std140_size(row_major);
>> @@ -608,7 +624,8 @@ lower_ubo_reference_visitor::emit_access(bool is_write,
>> }
>>  
>> if (deref->type->is_array()) {
>> -  unsigned array_stride =
>> +  unsigned array_stride = is_std430 ?
>> + deref->type->fields.array->std430_size(row_major) :
>>   glsl_align(deref->type->fields.array->std140_size(row_major), 16);
>>  
>>for (unsigned i = 0; i < deref->type->length; i++) {
>> @@ 

Re: [Mesa-dev] [PATCH v4 (part2) 20/59] glsl: Add parser/compiler support for std430 interface packing qualifier

2015-09-09 Thread Samuel Iglesias Gonsálvez


On 31/08/15 08:38, Samuel Iglesias Gonsálvez wrote:
> 
> 
> On 29/08/15 02:27, Jordan Justen wrote:
>> On 2015-08-05 01:30:17, Iago Toral Quiroga wrote:
>>> From: Samuel Iglesias Gonsalvez 
>>>
>>> This commit also adds functions to calculate std430 base alignment and sizes
>>>
>>> Signed-off-by: Samuel Iglesias Gonsalvez 
>>> ---
>>>  src/glsl/ast.h   |   1 +
>>>  src/glsl/ast_to_hir.cpp  |  20 +--
>>>  src/glsl/ast_type.cpp|   1 +
>>>  src/glsl/glsl_parser.yy  |   2 +
>>>  src/glsl/glsl_types.cpp  | 117 
>>> +++
>>>  src/glsl/glsl_types.h|  15 -
>>>  src/glsl/link_uniform_blocks.cpp |  15 -
>>>  src/mesa/main/mtypes.h   |   3 +-
>>>  8 files changed, 165 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/src/glsl/ast.h b/src/glsl/ast.h
>>> index d8c6cea..ac3f80f 100644
>>> --- a/src/glsl/ast.h
>>> +++ b/src/glsl/ast.h
>>> @@ -491,6 +491,7 @@ struct ast_type_qualifier {
>>>  /** \name Layout qualifiers for GL_ARB_uniform_buffer_object */
>>>  /** \{ */
>>>   unsigned std140:1;
>>> + unsigned std430:1;
>>>   unsigned shared:1;
>>>   unsigned packed:1;
>>>   unsigned column_major:1;
>>> diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
>>> index 6e63d86..2dadb03 100644
>>> --- a/src/glsl/ast_to_hir.cpp
>>> +++ b/src/glsl/ast_to_hir.cpp
>>> @@ -2886,11 +2886,12 @@ apply_type_qualifier_to_variable(const struct 
>>> ast_type_qualifier *qual,
>>> var->data.depth_layout = ir_depth_layout_none;
>>>  
>>> if (qual->flags.q.std140 ||
>>> +   qual->flags.q.std430 ||
>>> qual->flags.q.packed ||
>>> qual->flags.q.shared) {
>>>_mesa_glsl_error(loc, state,
>>> -   "uniform block layout qualifiers std140, packed, 
>>> and "
>>> -   "shared can only be applied to uniform blocks, not "
>>> +   "uniform block layout qualifiers std140, std430, 
>>> packed, "
>>> +   "and shared can only be applied to uniform blocks, 
>>> not "
>>> "members");
>>> }
>>>  
>>> @@ -5655,12 +5656,14 @@ ast_process_structure_or_interface_block(exec_list 
>>> *instructions,
>>>   const struct ast_type_qualifier *const qual =
>>>  & decl_list->type->qualifier;
>>>   if (qual->flags.q.std140 ||
>>> + qual->flags.q.std430 ||
>>>   qual->flags.q.packed ||
>>>   qual->flags.q.shared) {
>>>  _mesa_glsl_error(, state,
>>>   "uniform/shader storage block layout 
>>> qualifiers "
>>> - "std140, packed, and shared can only be 
>>> applied "
>>> - "to uniform/shader storage blocks, not 
>>> members");
>>> + "std140, std430, packed, and shared can only 
>>> be "
>>> + "applied to uniform/shader storage blocks, 
>>> not "
>>> + "members");
>>>   }
>>>  
>>>   if (qual->flags.q.constant) {
>>> @@ -5872,6 +5875,13 @@ ast_interface_block::hir(exec_list *instructions,
>>> this->block_name);
>>> }
>>>  
>>> +   if (!this->layout.flags.q.buffer &&
>>> +   this->layout.flags.q.std430) {
>>> +  _mesa_glsl_error(, state,
>>> +   "std430 storage block layout qualifier is supported 
>>> "
>>> +   "only for shader storage blocks");
>>> +   }
>>> +
>>> /* The ast_interface_block has a list of ast_declarator_lists.  We
>>>  * need to turn those into ir_variables with an association
>>>  * with this uniform block.
>>> @@ -5881,6 +5891,8 @@ ast_interface_block::hir(exec_list *instructions,
>>>packing = GLSL_INTERFACE_PACKING_SHARED;
>>> } else if (this->layout.flags.q.packed) {
>>>packing = GLSL_INTERFACE_PACKING_PACKED;
>>> +   } else if (this->layout.flags.q.std430) {
>>> +  packing = GLSL_INTERFACE_PACKING_STD430;
>>> } else {
>>>/* The default layout is std140.
>>> */
>>> diff --git a/src/glsl/ast_type.cpp b/src/glsl/ast_type.cpp
>>> index a4671e2..6e69ba2 100644
>>> --- a/src/glsl/ast_type.cpp
>>> +++ b/src/glsl/ast_type.cpp
>>> @@ -123,6 +123,7 @@ ast_type_qualifier::merge_qualifier(YYLTYPE *loc,
>>> ubo_layout_mask.flags.q.std140 = 1;
>>> ubo_layout_mask.flags.q.packed = 1;
>>> ubo_layout_mask.flags.q.shared = 1;
>>> +   ubo_layout_mask.flags.q.std430 = 1;
>>>  
>>> ast_type_qualifier ubo_binding_mask;
>>> ubo_binding_mask.flags.i = 0;
>>> diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
>>> index 2b0c8bd..e1c0f3d 100644
>>> --- a/src/glsl/glsl_parser.yy
>>> +++ b/src/glsl/glsl_parser.yy
>>> @@ -1197,6 +1197,8 @@ layout_qualifier_id:
>>>  $$.flags.q.std140 = 1;

Re: [Mesa-dev] [PATCH] mesa: Match MESA_FORMAT_B5G6R5 for a shallow pixel format of GL_RGB

2015-09-09 Thread Erik Faye-Lund
On Wed, Sep 9, 2015 at 11:25 AM, Chris Wilson  wrote:
> On Wed, Sep 09, 2015 at 11:11:59AM +0200, Erik Faye-Lund wrote:
>> On Thu, Sep 3, 2015 at 6:05 PM, Chris Wilson  
>> wrote:
>> > If the user supplies a pixel format of GL_RGB + GL_UNSIGNED_SHORT_5_6_5
>> > and specifies a generic unsized GL_RGB internal format, match that to a
>> > texture format of MESA_FORMAT_B5G6R5 if supported by the hardware.
>> >
>> > Noticed while playing with mesa-demos/teximage:
>> >
>> >   TexImage(RGB/565 256 x 256): 79.8 images/sec, 10.0 MB/sec
>> >   TexSubImage(RGB/565 256 x 256): 3804.9 images/sec, 475.6 MB/sec
>> >   GetTexImage(RGB/565 256 x 256): 99.5 images/sec, 12.4 MB/sec
>> >
>> > becomes
>> >
>> >   TexImage(RGB/565 256 x 256): 3439.1 images/sec, 429.9 MB/sec
>> >   TexSubImage(RGB/565 256 x 256): 3744.1 images/sec, 468.0 MB/sec
>> >   GetTexImage(RGB/565 256 x 256): 4713.5 images/sec, 589.2 MB/sec
>> >
>> > on a puny Baytrail which is still far from what it is capable of. The
>> > reason for the disparity is that the teximage demo uses a busy texture
>> > which is performs an accelerated pixel conversion from the user's B5G6R5
>> > into the native X8B8G8R8. After the patch, no conversion is required
>> > allowing use of the blitter and memcpy fast paths.
>> >
>> > Signed-off-by: Chris Wilson 
>> > ---
>> >  src/mesa/main/texformat.c | 2 ++
>> >  1 file changed, 2 insertions(+)
>> >
>> > diff --git a/src/mesa/main/texformat.c b/src/mesa/main/texformat.c
>> > index fd9f335..866c7b3 100644
>> > --- a/src/mesa/main/texformat.c
>> > +++ b/src/mesa/main/texformat.c
>> > @@ -114,6 +114,8 @@ _mesa_choose_tex_format(struct gl_context *ctx, GLenum 
>> > target,
>> > case GL_RGB:
>> >if (type == GL_UNSIGNED_INT_2_10_10_10_REV) {
>> >   RETURN_IF_SUPPORTED(MESA_FORMAT_B10G10R10A2_UNORM);
>> > +  } else if (type == GL_UNSIGNED_SHORT_5_6_5) {
>> > + RETURN_IF_SUPPORTED(MESA_FORMAT_B5G6R5_UNORM);
>>
>> Shouldn't this be MESA_FORMAT_R5G6B5_UNORM? AFAICT, the reason for
>> using BGR above, is the _REV suffix on the type...
>
> No, it's the first line that's "wrong" since the B10G10R10A2 matches
> GL_BGRA + GL_UNSIGNED_INT_2_10_10_10_REV, GL_RGB implies that the
> incoming data only has 3 channels (no alpha at all).

Good point about the alpha channel. It sounds like it should be
changed to MESA_FORMAT_B10G10R10X2_UNORM instead.

But according to src/mesa/main/format_pack.c's
pack_ubyte_b10g10r10a2_unorm(), mesa's B10G10R10A2 corresponds to
OpenGL's UNSIGNED_INT_2_10_10_10_REV. So I think it matches GL_RGBA,
not GL_BGRA. The latter would mean another swizzle, AFAICT.

> i965 know how to do B5G6R5 and not R5G6B5, but for completeness we could
> also add RETURN_IF_SUPPORTED(MESA_FORMAT_R5G6B5_UNORM);

I think intel-specific hacks (like preferring B5G6R5 over R5G6B5)
shouldn't leak into _mesa_choose_tex_format(), so I think it'd be a
good move to add "RETURN_IF_SUPPORTED(MESA_FORMAT_R5G6B5_UNORM);"
before the latter return.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/9] softpipe: Add functions for computing mipmap level

2015-09-09 Thread Krzesimir Nowak
These functions will be used by textureQueryLod.
---
 src/gallium/drivers/softpipe/sp_tex_sample.c | 100 +--
 src/gallium/drivers/softpipe/sp_tex_sample.h |   7 ++
 2 files changed, 101 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index cdec984..6e639e0 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -1937,6 +1937,38 @@ get_gather_component(const float lod_in[TGSI_QUAD_SIZE])
 }
 
 static void
+clamp_lod(const struct sp_sampler_view *sp_sview,
+  const struct sp_sampler *sp_samp,
+  const float lod[TGSI_QUAD_SIZE],
+  float clamped[TGSI_QUAD_SIZE])
+{
+   const float min_lod = sp_samp->base.min_lod;
+   const float max_lod = sp_samp->base.max_lod;
+   const float min_level = sp_sview->base.u.tex.first_level;
+   const float max_level = sp_sview->base.u.tex.last_level;
+   int i;
+
+   for (i = 0; i < TGSI_QUAD_SIZE; i++) {
+  float cl = lod[i];
+
+  cl = CLAMP(cl, min_lod, max_lod);
+  /* XXX: Is min_level ever different from 0?
+   */
+  cl = CLAMP(cl, 0, max_level - min_level);
+  clamped[i] = cl;
+   }
+}
+
+static void
+mip_level_linear(struct sp_sampler_view *sp_sview,
+ struct sp_sampler *sp_samp,
+ const float lod[TGSI_QUAD_SIZE],
+ float level[TGSI_QUAD_SIZE])
+{
+   clamp_lod(sp_sview, sp_samp, lod, level);
+}
+
+static void
 mip_filter_linear(struct sp_sampler_view *sp_sview,
   struct sp_sampler *sp_samp,
   img_filter_func min_filter,
@@ -1998,6 +2030,23 @@ mip_filter_linear(struct sp_sampler_view *sp_sview,
 }
 
 
+static void
+mip_level_nearest(struct sp_sampler_view *sp_sview,
+  struct sp_sampler *sp_samp,
+  const float lod[TGSI_QUAD_SIZE],
+  float level[TGSI_QUAD_SIZE])
+{
+   const int first_level = sp_sview->base.u.tex.first_level;
+   int j;
+
+   clamp_lod(sp_sview, sp_samp, lod, level);
+   for (j = 0; j < TGSI_QUAD_SIZE; j++)
+  /* TODO: It should rather be:
+   * level[j] = first_level + ceil(level[j] + 0.5F) - 1.0F;
+   */
+  level[j] = first_level + (int)(level[j] + 0.5F);
+}
+
 /**
  * Compute nearest mipmap level from texcoords.
  * Then sample the texture level for four elements of a quad.
@@ -2050,6 +2099,19 @@ mip_filter_nearest(struct sp_sampler_view *sp_sview,
 
 
 static void
+mip_level_none(struct sp_sampler_view *sp_sview,
+   struct sp_sampler *sp_samp,
+   const float lod[TGSI_QUAD_SIZE],
+   float level[TGSI_QUAD_SIZE])
+{
+   int j;
+
+   for (j = 0; j < TGSI_QUAD_SIZE; j++) {
+  level[j] = sp_sview->base.u.tex.first_level;
+   }
+}
+
+static void
 mip_filter_none(struct sp_sampler_view *sp_sview,
 struct sp_sampler *sp_samp,
 img_filter_func min_filter,
@@ -2088,6 +2150,15 @@ mip_filter_none(struct sp_sampler_view *sp_sview,
 
 
 static void
+mip_level_none_no_filter_select(struct sp_sampler_view *sp_sview,
+struct sp_sampler *sp_samp,
+const float lod[TGSI_QUAD_SIZE],
+float level[TGSI_QUAD_SIZE])
+{
+   mip_level_none(sp_sview, sp_samp, lod, level);
+}
+
+static void
 mip_filter_none_no_filter_select(struct sp_sampler_view *sp_sview,
  struct sp_sampler *sp_samp,
  img_filter_func min_filter,
@@ -2339,6 +2410,15 @@ img_filter_2d_ewa(struct sp_sampler_view *sp_sview,
 }
 
 
+static void
+mip_level_linear_aniso(struct sp_sampler_view *sp_sview,
+   struct sp_sampler *sp_samp,
+   const float lod[TGSI_QUAD_SIZE],
+   float level[TGSI_QUAD_SIZE])
+{
+   mip_level_linear(sp_sview, sp_samp, lod, level);
+}
+
 /**
  * Sample 2D texture using an anisotropic filter.
  */
@@ -2450,6 +2530,14 @@ mip_filter_linear_aniso(struct sp_sampler_view *sp_sview,
}
 }
 
+static void
+mip_level_linear_2d_linear_repeat_POT(struct sp_sampler_view *sp_sview,
+  struct sp_sampler *sp_samp,
+  const float lod[TGSI_QUAD_SIZE],
+  float level[TGSI_QUAD_SIZE])
+{
+   mip_level_linear(sp_sview, sp_samp, lod, level);
+}
 
 /**
  * Specialized version of mip_filter_linear with hard-wired calls to
@@ -2515,12 +2603,12 @@ mip_filter_linear_2d_linear_repeat_POT(
}
 }
 
-static struct sp_mip mip_linear = {mip_filter_linear};
-static struct sp_mip mip_nearest = {mip_filter_nearest};
-static struct sp_mip mip_none = {mip_filter_none};
-static struct sp_mip mip_none_no_filter_select = 
{mip_filter_none_no_filter_select};
-static struct sp_mip mip_linear_aniso = {mip_filter_linear_aniso};
-static struct sp_mip 

[Mesa-dev] [PATCH 1/9] tgsi: Remove trailing backslash in comment

2015-09-09 Thread Krzesimir Nowak
It clearly is here by accident.
---
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 75cd0d5..9544623 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -2021,7 +2021,7 @@ fetch_sampler_unit(struct tgsi_exec_machine *mach,
 /*
  * execute a texture instruction.
  *
- * modifier is used to control the channel routing for the\
+ * modifier is used to control the channel routing for the
  * instruction variants like proj, lod, and texture with lod bias.
  * sampler indicates which src register the sampler is contained in.
  */
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/9] softpipe: Split compute_lambda_lod into two functions

2015-09-09 Thread Krzesimir Nowak
textureQueryLod returns a vec2 with a mipmap information and a
LOD. The latter needs to be not clamped.
---
 src/gallium/drivers/softpipe/sp_tex_sample.c | 55 
 1 file changed, 39 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index 19188b0..38bdc93 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -1855,24 +1855,23 @@ compute_lod(const struct pipe_sampler_state *sampler,
 }
 
 
-/* Calculate level of detail for every fragment.
+/* Calculate level of detail for every fragment. The computed value is not
+ * clamped into lod_min and lod_max.
  * \param lod_in per-fragment lod_bias or explicit_lod.
  * \param lod results per-fragment lod.
  */
 static inline void
-compute_lambda_lod(struct sp_sampler_view *sp_sview,
-   struct sp_sampler *sp_samp,
-   const float s[TGSI_QUAD_SIZE],
-   const float t[TGSI_QUAD_SIZE],
-   const float p[TGSI_QUAD_SIZE],
-   const float lod_in[TGSI_QUAD_SIZE],
-   enum tgsi_sampler_control control,
-   float lod[TGSI_QUAD_SIZE])
+compute_lambda_lod_not_clamped(struct sp_sampler_view *sp_sview,
+   struct sp_sampler *sp_samp,
+   const float s[TGSI_QUAD_SIZE],
+   const float t[TGSI_QUAD_SIZE],
+   const float p[TGSI_QUAD_SIZE],
+   const float lod_in[TGSI_QUAD_SIZE],
+   enum tgsi_sampler_control control,
+   float lod[TGSI_QUAD_SIZE])
 {
const struct pipe_sampler_state *sampler = _samp->base;
-   float lod_bias = sampler->lod_bias;
-   float min_lod = sampler->min_lod;
-   float max_lod = sampler->max_lod;
+   const float lod_bias = sampler->lod_bias;
float lambda;
uint i;
 
@@ -1881,24 +1880,23 @@ compute_lambda_lod(struct sp_sampler_view *sp_sview,
   /* XXX FIXME */
case tgsi_sampler_derivs_explicit:
   lambda = sp_sview->compute_lambda(sp_sview, s, t, p) + lod_bias;
-  lod[0] = lod[1] = lod[2] = lod[3] = CLAMP(lambda, min_lod, max_lod);
+  lod[0] = lod[1] = lod[2] = lod[3] = lambda;
   break;
case tgsi_sampler_lod_bias:
   lambda = sp_sview->compute_lambda(sp_sview, s, t, p) + lod_bias;
   for (i = 0; i < TGSI_QUAD_SIZE; i++) {
  lod[i] = lambda + lod_in[i];
- lod[i] = CLAMP(lod[i], min_lod, max_lod);
   }
   break;
case tgsi_sampler_lod_explicit:
   for (i = 0; i < TGSI_QUAD_SIZE; i++) {
- lod[i] = CLAMP(lod_in[i] + lod_bias, min_lod, max_lod);
+ lod[i] = lod_in[i] + lod_bias;
   }
   break;
case tgsi_sampler_lod_zero:
case tgsi_sampler_gather:
   /* this is all static state in the sampler really need clamp here? */
-  lod[0] = lod[1] = lod[2] = lod[3] = CLAMP(lod_bias, min_lod, max_lod);
+  lod[0] = lod[1] = lod[2] = lod[3] = lod_bias;
   break;
default:
   assert(0);
@@ -1906,6 +1904,31 @@ compute_lambda_lod(struct sp_sampler_view *sp_sview,
}
 }
 
+/* Calculate level of detail for every fragment.
+ * \param lod_in per-fragment lod_bias or explicit_lod.
+ * \param lod results per-fragment lod.
+ */
+static inline void
+compute_lambda_lod(struct sp_sampler_view *sp_sview,
+   struct sp_sampler *sp_samp,
+   const float s[TGSI_QUAD_SIZE],
+   const float t[TGSI_QUAD_SIZE],
+   const float p[TGSI_QUAD_SIZE],
+   const float lod_in[TGSI_QUAD_SIZE],
+   enum tgsi_sampler_control control,
+   float lod[TGSI_QUAD_SIZE])
+{
+   const struct pipe_sampler_state *sampler = _samp->base;
+   const float min_lod = sampler->min_lod;
+   const float max_lod = sampler->max_lod;
+   int i;
+
+   compute_lambda_lod_not_clamped(sp_sview, sp_samp, s, t, p, lod_in, control, 
lod);
+   for (i = 0; i < TGSI_QUAD_SIZE; i++) {
+  lod[i] = CLAMP(lod[i], min_lod, max_lod);
+   }
+}
+
 static inline unsigned
 get_gather_component(const float lod_in[TGSI_QUAD_SIZE])
 {
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 8/9] tgsi: Add code for handling lodq opcode

2015-09-09 Thread Krzesimir Nowak
This introduces new vfunc in tgsi_sampler just for this opcode. I
decided against extending get_samples vfunc to return the mipmap level
and LOD - the function's prototype is already too scary and doing the
sampling for textureQueryLod would be a waste of time.
---
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 44 ++
 src/gallium/auxiliary/tgsi/tgsi_exec.h | 10 
 2 files changed, 54 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 9544623..054ad08 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -2132,6 +2132,44 @@ exec_tex(struct tgsi_exec_machine *mach,
}
 }
 
+static void
+exec_lodq(struct tgsi_exec_machine *mach,
+  const struct tgsi_full_instruction *inst)
+{
+   uint unit;
+   int dim;
+   int i;
+   union tgsi_exec_channel coords[4];
+   const union tgsi_exec_channel *args[Elements(coords)];
+   union tgsi_exec_channel r[2];
+
+   unit = fetch_sampler_unit(mach, inst, 1);
+   dim = tgsi_util_get_texture_coord_dim(inst->Texture.Texture, NULL);
+   assert(dim <= Elements(coords));
+   /* fetch coordinates */
+   for (i = 0; i < dim; i++) {
+  FETCH([i], 0, TGSI_CHAN_X + i);
+  args[i] = [i];
+   }
+   for (i = dim; i < Elements(coords); i++) {
+  args[i] = 
+   }
+   mach->Sampler->query_lod(mach->Sampler, unit, unit,
+args[0]->f,
+args[1]->f,
+args[2]->f,
+args[3]->f,
+tgsi_sampler_lod_none,
+r[0].f,
+r[1].f);
+
+   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_X) {
+  store_dest(mach, [0], >Dst[0], inst, TGSI_CHAN_X, 
TGSI_EXEC_DATA_FLOAT);
+   }
+   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_Y) {
+  store_dest(mach, [1], >Dst[0], inst, TGSI_CHAN_Y, 
TGSI_EXEC_DATA_FLOAT);
+   }
+}
 
 static void
 exec_txd(struct tgsi_exec_machine *mach,
@@ -4378,6 +4416,12 @@ exec_instruction(
   exec_tex(mach, inst, TEX_MODIFIER_GATHER, 2);
   break;
 
+   case TGSI_OPCODE_LODQ:
+  /* src[0] = texcoord */
+  /* src[1] = sampler unit */
+  exec_lodq(mach, inst);
+  break;
+
case TGSI_OPCODE_UP2H:
   assert (0);
   break;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h 
b/src/gallium/auxiliary/tgsi/tgsi_exec.h
index 5d56aab..556e0af 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
@@ -138,6 +138,16 @@ struct tgsi_sampler
  const int j[TGSI_QUAD_SIZE], const int k[TGSI_QUAD_SIZE],
  const int lod[TGSI_QUAD_SIZE], const int8_t offset[3],
  float rgba[TGSI_NUM_CHANNELS][TGSI_QUAD_SIZE]);
+   void (*query_lod)(struct tgsi_sampler *tgsi_sampler,
+ const unsigned sview_index,
+ const unsigned sampler_index,
+ const float s[TGSI_QUAD_SIZE],
+ const float t[TGSI_QUAD_SIZE],
+ const float p[TGSI_QUAD_SIZE],
+ const float c0[TGSI_QUAD_SIZE],
+ enum tgsi_sampler_control control,
+ float mipmap[TGSI_QUAD_SIZE],
+ float lod[TGSI_QUAD_SIZE]);
 };
 
 #define TGSI_EXEC_NUM_TEMPS   4096
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/9] softpipe: Split code getting a filter into separate function

2015-09-09 Thread Krzesimir Nowak
This function will be later used by textureQueryLod. The
img_filter_func are optional, because textureQueryLod will not need
them.
---
 src/gallium/drivers/softpipe/sp_tex_sample.c | 53 +++-
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index cd4a659..0a3fc20 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -2912,6 +2912,42 @@ get_img_filter(const struct sp_sampler_view *sp_sview,
}
 }
 
+/**
+ * Get mip filter, and optionally both img min filter and img mag filter
+ */
+static void
+get_filters(struct sp_sampler_view *sp_sview,
+struct sp_sampler *sp_samp,
+enum tgsi_sampler_control control,
+struct sp_mip **mip,
+img_filter_func *min,
+img_filter_func *mag)
+{
+   assert(mip);
+   if (control == tgsi_sampler_gather) {
+  *mip = _nearest;
+  if (min)
+ *min = get_img_filter(sp_sview, _samp->base, 
PIPE_TEX_FILTER_LINEAR, true);
+   } else if (sp_sview->pot2d & sp_samp->min_mag_equal_repeat_linear) {
+  *mip = _linear_2d_linear_repeat_POT;
+   }
+   else {
+  *mip = _samp->mip;
+  if (min || mag) {
+ img_filter_func tmp_filter = get_img_filter(sp_sview, _samp->base, 
sp_samp->min_img_filter, false);
+ if (min)
+*min = tmp_filter;
+ if (mag) {
+if (sp_samp->min_mag_equal) {
+   *mag = tmp_filter;
+}
+else {
+   *mag = get_img_filter(sp_sview, _samp->base, 
sp_samp->base.mag_img_filter, false);
+}
+ }
+  }
+   }
+}
 
 static void
 sample_mip(struct sp_sampler_view *sp_sview,
@@ -2928,22 +2964,7 @@ sample_mip(struct sp_sampler_view *sp_sview,
img_filter_func min_img_filter = NULL;
img_filter_func mag_img_filter = NULL;
 
-   if (filt_args->control == tgsi_sampler_gather) {
-  mip = _nearest;
-  min_img_filter = get_img_filter(sp_sview, _samp->base, 
PIPE_TEX_FILTER_LINEAR, true);
-   } else if (sp_sview->pot2d & sp_samp->min_mag_equal_repeat_linear) {
-  mip = _linear_2d_linear_repeat_POT;
-   }
-   else {
-  mip = _samp->mip;
-  min_img_filter = get_img_filter(sp_sview, _samp->base, 
sp_samp->min_img_filter, false);
-  if (sp_samp->min_mag_equal) {
- mag_img_filter = min_img_filter;
-  }
-  else {
- mag_img_filter = get_img_filter(sp_sview, _samp->base, 
sp_samp->base.mag_img_filter, false);
-  }
-   }
+   get_filters(sp_sview, sp_samp, filt_args->control, , _img_filter, 
_img_filter);
 
mip->filter(sp_sview, sp_samp, min_img_filter, mag_img_filter,
s, t, p, c0, lod, filt_args, rgba);
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/9] softpipe: Fix textureLod with nonzero GL_TEXTURE_LOD_BIAS value.

2015-09-09 Thread Krzesimir Nowak
The level-of-detail bias wasn't simply added in the explicit LOD case.
This case seems to be tested only in piglit's
fs-texturequerylod-nearest-biased test, which is currently skipped, as
softpipe does not support textureQueryLod at the moment.
---
 src/gallium/drivers/softpipe/sp_tex_sample.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index 565fca6..19188b0 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -1892,7 +1892,7 @@ compute_lambda_lod(struct sp_sampler_view *sp_sview,
   break;
case tgsi_sampler_lod_explicit:
   for (i = 0; i < TGSI_QUAD_SIZE; i++) {
- lod[i] = CLAMP(lod_in[i], min_lod, max_lod);
+ lod[i] = CLAMP(lod_in[i] + lod_bias, min_lod, max_lod);
   }
   break;
case tgsi_sampler_lod_zero:
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 9/9] softpipe: Implement and enable textureQueryLod

2015-09-09 Thread Krzesimir Nowak
Passes the shader piglit tests and introduces no regressions.

This commit finally makes use of the refactoring in previous
commits.
---
 src/gallium/drivers/softpipe/sp_screen.c |  2 +-
 src/gallium/drivers/softpipe/sp_tex_sample.c | 47 +++-
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_screen.c 
b/src/gallium/drivers/softpipe/sp_screen.c
index 0bfd9c3..7ca8a67 100644
--- a/src/gallium/drivers/softpipe/sp_screen.c
+++ b/src/gallium/drivers/softpipe/sp_screen.c
@@ -193,9 +193,9 @@ softpipe_get_param(struct pipe_screen *screen, enum 
pipe_cap param)
case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
   return 4;
case PIPE_CAP_TEXTURE_GATHER_SM5:
+   case PIPE_CAP_TEXTURE_QUERY_LOD:
   return 1;
case PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT:
-   case PIPE_CAP_TEXTURE_QUERY_LOD:
case PIPE_CAP_SAMPLE_SHADING:
case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
   return 0;
diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index 6e639e0..499c8f9 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -3566,6 +3566,51 @@ sp_tgsi_get_samples(struct tgsi_sampler *tgsi_sampler,
sample_mip(sp_sview, sp_samp, cs, ct, cp, c0, lod, _args, rgba);
 }
 
+static void
+sp_tgsi_query_lod(struct tgsi_sampler *tgsi_sampler,
+  const unsigned sview_index,
+  const unsigned sampler_index,
+  const float s[TGSI_QUAD_SIZE],
+  const float t[TGSI_QUAD_SIZE],
+  const float p[TGSI_QUAD_SIZE],
+  const float c0[TGSI_QUAD_SIZE],
+  enum tgsi_sampler_control control,
+  float mipmap[TGSI_QUAD_SIZE],
+  float lod[TGSI_QUAD_SIZE])
+{
+   static const float lod_in[TGSI_QUAD_SIZE] = { 0.0, 0.0, 0.0, 0.0 };
+
+   struct sp_tgsi_sampler *sp_tgsi_samp = (struct sp_tgsi_sampler 
*)tgsi_sampler;
+   struct sp_sampler_view *sp_sview;
+   struct sp_sampler *sp_samp;
+   struct sp_mip *mip;
+   int i;
+   float cs[TGSI_QUAD_SIZE];
+   float ct[TGSI_QUAD_SIZE];
+   float cp[TGSI_QUAD_SIZE];
+
+   assert(sview_index < PIPE_MAX_SHADER_SAMPLER_VIEWS);
+   assert(sampler_index < PIPE_MAX_SAMPLERS);
+   assert(sp_tgsi_samp->sp_sampler[sampler_index]);
+
+   sp_sview = _tgsi_samp->sp_sview[sview_index];
+   sp_samp = sp_tgsi_samp->sp_sampler[sampler_index];
+   /* always have a view here but texture is NULL if no sampler view was set. 
*/
+   if (!sp_sview->base.texture) {
+  for (i = 0; i < TGSI_QUAD_SIZE; i++) {
+ mipmap[i] = 0.0f;
+ lod[i] = 0.0f;
+  }
+  return;
+   }
+
+   sp_sview->convert_coords(sp_sview, sp_samp, s, t, p, c0, cs, ct, cp);
+
+   compute_lambda_lod_not_clamped(sp_sview, sp_samp,
+  cs, ct, cp, lod_in, control, lod);
+   get_filters(sp_sview, sp_samp, control, , NULL, NULL);
+   mip->level(sp_sview, sp_samp, lod, mipmap);
+}
 
 static void
 sp_tgsi_get_texel(struct tgsi_sampler *tgsi_sampler,
@@ -3602,7 +3647,7 @@ sp_create_tgsi_sampler(void)
samp->base.get_dims = sp_tgsi_get_dims;
samp->base.get_samples = sp_tgsi_get_samples;
samp->base.get_texel = sp_tgsi_get_texel;
+   samp->base.query_lod = sp_tgsi_query_lod;
 
return samp;
 }
-
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/9] softpipe: Split 3D to 2D coords conversion into separate function.

2015-09-09 Thread Krzesimir Nowak
This is to avoid tying the conversion to sampling - textureQueryLod
will need to do the conversion too, but it does not do any sampling.

So instead of a "sample" vfunc, there is a "convert" vfunc. The
drawback of this approach is that a "noop" convert copies 3 arrays of
floats. Would be nice to avoid it in some clean way.
---
 src/gallium/drivers/softpipe/sp_tex_sample.c | 73 +---
 src/gallium/drivers/softpipe/sp_tex_sample.h | 20 
 2 files changed, 54 insertions(+), 39 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index 0a3fc20..cdec984 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -2981,29 +2981,36 @@ sample_mip(struct sp_sampler_view *sp_sview,
 
 }
 
+static void
+convert_noop(struct sp_sampler_view *sp_sview,
+ struct sp_sampler *sp_samp,
+ const float s[TGSI_QUAD_SIZE],
+ const float t[TGSI_QUAD_SIZE],
+ const float p[TGSI_QUAD_SIZE],
+ const float c0[TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE])
+{
+   const size_t len = sizeof(float) * TGSI_QUAD_SIZE;
+
+   memcpy(, s, len);
+   memcpy(, t, len);
+   memcpy(, p, len);
+}
 
-/**
- * Use 3D texcoords to choose a cube face, then sample the 2D cube faces.
- * Put face info into the sampler faces[] array.
- */
 static void
-sample_cube(struct sp_sampler_view *sp_sview,
-struct sp_sampler *sp_samp,
-const float s[TGSI_QUAD_SIZE],
-const float t[TGSI_QUAD_SIZE],
-const float p[TGSI_QUAD_SIZE],
-const float c0[TGSI_QUAD_SIZE],
-const float c1[TGSI_QUAD_SIZE],
-const struct filter_args *filt_args,
-float rgba[TGSI_NUM_CHANNELS][TGSI_QUAD_SIZE])
+convert_cube(struct sp_sampler_view *sp_sview,
+ struct sp_sampler *sp_samp,
+ const float s[TGSI_QUAD_SIZE],
+ const float t[TGSI_QUAD_SIZE],
+ const float p[TGSI_QUAD_SIZE],
+ const float c0[TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE])
 {
unsigned j;
-   float [4], [4];
-
-   /* Not actually used, but the intermediate steps that do the
-* dereferencing don't know it.
-*/
-   static float [4] = { 0, 0, 0, 0 };
 
[0] = c0[0];
[1] = c0[1];
@@ -3071,8 +3078,6 @@ sample_cube(struct sp_sampler_view *sp_sview,
  }
   }
}
-
-   sample_mip(sp_sview, sp_samp, , , , c0, c1, filt_args, rgba);
 }
 
 
@@ -3393,9 +3398,9 @@ softpipe_create_sampler_view(struct pipe_context *pipe,
 
   if (view->target == PIPE_TEXTURE_CUBE ||
   view->target == PIPE_TEXTURE_CUBE_ARRAY)
- sview->get_samples = sample_cube;
+ sview->convert_coords = convert_cube;
   else {
- sview->get_samples = sample_mip;
+ sview->convert_coords = convert_noop;
   }
   sview->pot2d = spr->pot &&
  (view->target == PIPE_TEXTURE_2D ||
@@ -3440,13 +3445,22 @@ sp_tgsi_get_samples(struct tgsi_sampler *tgsi_sampler,
 enum tgsi_sampler_control control,
 float rgba[TGSI_NUM_CHANNELS][TGSI_QUAD_SIZE])
 {
-   struct sp_tgsi_sampler *sp_samp = (struct sp_tgsi_sampler *)tgsi_sampler;
+   struct sp_tgsi_sampler *sp_tgsi_samp = (struct sp_tgsi_sampler 
*)tgsi_sampler;
+   struct sp_sampler_view *sp_sview;
+   struct sp_sampler *sp_samp;
struct filter_args filt_args;
+   float cs[TGSI_QUAD_SIZE];
+   float ct[TGSI_QUAD_SIZE];
+   float cp[TGSI_QUAD_SIZE];
+
assert(sview_index < PIPE_MAX_SHADER_SAMPLER_VIEWS);
assert(sampler_index < PIPE_MAX_SAMPLERS);
-   assert(sp_samp->sp_sampler[sampler_index]);
+   assert(sp_tgsi_samp->sp_sampler[sampler_index]);
+
+   sp_sview = _tgsi_samp->sp_sview[sview_index];
+   sp_samp = sp_tgsi_samp->sp_sampler[sampler_index];
/* always have a view here but texture is NULL if no sampler view was set. 
*/
-   if (!sp_samp->sp_sview[sview_index].base.texture) {
+   if (!sp_sview->base.texture) {
   int i, j;
   for (j = 0; j < TGSI_NUM_CHANNELS; j++) {
  for (i = 0; i < TGSI_QUAD_SIZE; i++) {
@@ -3456,11 +3470,12 @@ sp_tgsi_get_samples(struct tgsi_sampler *tgsi_sampler,
   return;
}
 
+   sp_sview->convert_coords(sp_sview, sp_samp, s, t, p, c0, cs, ct, cp);
+
filt_args.control = control;
filt_args.offset = offset;
-   sp_samp->sp_sview[sview_index].get_samples(_samp->sp_sview[sview_index],
-  
sp_samp->sp_sampler[sampler_index],
-  s, t, p, c0, lod, _args, 
rgba);
+
+   sample_mip(sp_sview, sp_samp, cs, ct, cp, c0, lod, _args, rgba);
 }
 
 
diff --git 

[Mesa-dev] [PATCH 4/9] softpipe: Put mip_filter_func inside a struct

2015-09-09 Thread Krzesimir Nowak
Putting this function pointer into a struct enables grouping of
several related functions in a single place. For now it is just a
single function, but the struct will be later extended with a
mip_level_func for returning mip level.
---
 src/gallium/drivers/softpipe/sp_tex_sample.c | 28 +---
 src/gallium/drivers/softpipe/sp_tex_sample.h |  5 -
 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index 38bdc93..cd4a659 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -2515,6 +2515,12 @@ mip_filter_linear_2d_linear_repeat_POT(
}
 }
 
+static struct sp_mip mip_linear = {mip_filter_linear};
+static struct sp_mip mip_nearest = {mip_filter_nearest};
+static struct sp_mip mip_none = {mip_filter_none};
+static struct sp_mip mip_none_no_filter_select = 
{mip_filter_none_no_filter_select};
+static struct sp_mip mip_linear_aniso = {mip_filter_linear_aniso};
+static struct sp_mip mip_linear_2d_linear_repeat_POT = 
{mip_filter_linear_2d_linear_repeat_POT};
 
 /**
  * Do shadow/depth comparisons.
@@ -2918,18 +2924,18 @@ sample_mip(struct sp_sampler_view *sp_sview,
const struct filter_args *filt_args,
float rgba[TGSI_NUM_CHANNELS][TGSI_QUAD_SIZE])
 {
-   mip_filter_func mip_filter;
+   struct sp_mip *mip = NULL;
img_filter_func min_img_filter = NULL;
img_filter_func mag_img_filter = NULL;
 
if (filt_args->control == tgsi_sampler_gather) {
-  mip_filter = mip_filter_nearest;
+  mip = _nearest;
   min_img_filter = get_img_filter(sp_sview, _samp->base, 
PIPE_TEX_FILTER_LINEAR, true);
} else if (sp_sview->pot2d & sp_samp->min_mag_equal_repeat_linear) {
-  mip_filter = mip_filter_linear_2d_linear_repeat_POT;
+  mip = _linear_2d_linear_repeat_POT;
}
else {
-  mip_filter = sp_samp->mip_filter;
+  mip = _samp->mip;
   min_img_filter = get_img_filter(sp_sview, _samp->base, 
sp_samp->min_img_filter, false);
   if (sp_samp->min_mag_equal) {
  mag_img_filter = min_img_filter;
@@ -2939,8 +2945,8 @@ sample_mip(struct sp_sampler_view *sp_sview,
   }
}
 
-   mip_filter(sp_sview, sp_samp, min_img_filter, mag_img_filter,
-  s, t, p, c0, lod, filt_args, rgba);
+   mip->filter(sp_sview, sp_samp, min_img_filter, mag_img_filter,
+   s, t, p, c0, lod, filt_args, rgba);
 
if (sp_samp->base.compare_mode != PIPE_TEX_COMPARE_NONE) {
   sample_compare(sp_sview, sp_samp, s, t, p, c0, lod, filt_args->control, 
rgba);
@@ -3239,13 +3245,13 @@ softpipe_create_sampler_state(struct pipe_context *pipe,
switch (sampler->min_mip_filter) {
case PIPE_TEX_MIPFILTER_NONE:
   if (sampler->min_img_filter == sampler->mag_img_filter)
- samp->mip_filter = mip_filter_none_no_filter_select;
+ samp->mip = mip_none_no_filter_select;
   else
- samp->mip_filter = mip_filter_none;
+ samp->mip = mip_none;
   break;
 
case PIPE_TEX_MIPFILTER_NEAREST:
-  samp->mip_filter = mip_filter_nearest;
+  samp->mip = mip_nearest;
   break;
 
case PIPE_TEX_MIPFILTER_LINEAR:
@@ -3257,11 +3263,11 @@ softpipe_create_sampler_state(struct pipe_context *pipe,
   sampler->max_anisotropy <= 1) {
  samp->min_mag_equal_repeat_linear = TRUE;
   }
-  samp->mip_filter = mip_filter_linear;
+  samp->mip = mip_linear;
 
   /* Anisotropic filtering extension. */
   if (sampler->max_anisotropy > 1) {
- samp->mip_filter = mip_filter_linear_aniso;
+ samp->mip = mip_linear_aniso;
 
  /* Override min_img_filter:
   * min_img_filter needs to be set to NEAREST since we need to access
diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.h 
b/src/gallium/drivers/softpipe/sp_tex_sample.h
index 7d1aafc..78541e1 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.h
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.h
@@ -128,6 +128,9 @@ struct sp_sampler_view
 
 };
 
+struct sp_mip {
+   mip_filter_func filter;
+};
 
 struct sp_sampler {
struct pipe_sampler_state base;
@@ -144,7 +147,7 @@ struct sp_sampler {
wrap_linear_func linear_texcoord_t;
wrap_linear_func linear_texcoord_p;
 
-   mip_filter_func mip_filter;
+   struct sp_mip mip;
 };
 
 
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/9] Implement textureQueryLod in softpipe

2015-09-09 Thread Krzesimir Nowak
The following patches do some preparatory refactoring in softpipe's
sp_tex_sample.{c,h} before actually implementing the feature.

There is also a patch that fixes softpipe's textureLod with bias. I
caught this bug when finished the textureQueryLod implementation - one
of four textureQueryLod tests were failing because of it. I sent
patches adding the testcases with non-zero lod-bias value to piglit
mailing list
(http://lists.freedesktop.org/archives/piglit/2015-September/017018.html).

Also, this is my first contribution to mesa, but I hope I didn't make
any obvious mistakes.

Thanks.

Krzesimir Nowak (9):
  tgsi: Remove trailing backslash in comment
  softpipe: Fix textureLod with nonzero GL_TEXTURE_LOD_BIAS value.
  softpipe: Split compute_lambda_lod into two functions
  softpipe: Put mip_filter_func inside a struct
  softpipe: Split code getting a filter into separate function
  softpipe: Split 3D to 2D coords conversion into separate function.
  softpipe: Add functions for computing mipmap level
  tgsi: Add code for handling lodq opcode
  softpipe: Implement and enable textureQueryLod

 src/gallium/auxiliary/tgsi/tgsi_exec.c   |  46 +++-
 src/gallium/auxiliary/tgsi/tgsi_exec.h   |  10 +
 src/gallium/drivers/softpipe/sp_screen.c |   2 +-
 src/gallium/drivers/softpipe/sp_tex_sample.c | 338 +--
 src/gallium/drivers/softpipe/sp_tex_sample.h |  32 ++-
 5 files changed, 345 insertions(+), 83 deletions(-)

-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i915: fixing driver crashes if too few vertices are submitted

2015-09-09 Thread Marius Predut
Comparison with a signed expression and unsigned value
is converted to unsigned value, reason for minus value is interpreted
as a big unsigned value. For this case the "for" loop
is going into unexpected behavior.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38109
Signed-off-by: Marius Predut 
---
 src/mesa/tnl_dd/t_dd_dmatmp.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/tnl_dd/t_dd_dmatmp.h b/src/mesa/tnl_dd/t_dd_dmatmp.h
index 7be3954..2ae33c0 100644
--- a/src/mesa/tnl_dd/t_dd_dmatmp.h
+++ b/src/mesa/tnl_dd/t_dd_dmatmp.h
@@ -627,6 +627,8 @@ static void TAG(render_quads_verts)( struct gl_context *ctx,
   LOCAL_VARS;
   GLuint j;
 
+  if(count%4 != 0) return;
+
   INIT(GL_TRIANGLES);
 
   for (j = start; j < count-3; j += 4) {
@@ -1248,7 +1250,7 @@ static GLboolean TAG(validate_render)( struct gl_context 
*ctx,
ok = (GLint) count < GET_SUBSEQUENT_VB_MAX_ELTS();
 }
 else {
-   ok = HAVE_TRIANGLES; /* flatshading is ok. */
+   ok = HAVE_TRIANGLES && (count%4 == 0); /* flatshading is ok. */
 }
 break;
   default:
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Don't tell the hardware about our UAV access.

2015-09-09 Thread Francisco Jerez
Mark Janes  writes:

> Francisco Jerez  writes:
>
>> Mark Janes  writes:
>>
>>> When I tested this, I saw an intermittent BSW gpu hang.  I haven't been
>>> able to confirm that it is due to the host-mem-barrier test.
>>>
>> It would probably be useful to know if the hang is due to any of the
>> image load/store tests or if it's something unrelated.
>
> Yes, you are right.  I will take some time tomorrow to catch the
> specific test.  Given the low rate of gpu hangs on BSW lately, I expect
> it will be due to image load/store tests.  However I need to confirm it.
>
Anyway, unless the BSW hang is introduced by this patch or caused by the
host-mem-barrier/Indirect/RaW subtest, feel free to open a separate bug
report for it.  It's unlikely to be related to this thread.

> -Mark


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Match MESA_FORMAT_B5G6R5 for a shallow pixel format of GL_RGB

2015-09-09 Thread Chris Wilson
On Wed, Sep 09, 2015 at 11:11:59AM +0200, Erik Faye-Lund wrote:
> On Thu, Sep 3, 2015 at 6:05 PM, Chris Wilson  wrote:
> > If the user supplies a pixel format of GL_RGB + GL_UNSIGNED_SHORT_5_6_5
> > and specifies a generic unsized GL_RGB internal format, match that to a
> > texture format of MESA_FORMAT_B5G6R5 if supported by the hardware.
> >
> > Noticed while playing with mesa-demos/teximage:
> >
> >   TexImage(RGB/565 256 x 256): 79.8 images/sec, 10.0 MB/sec
> >   TexSubImage(RGB/565 256 x 256): 3804.9 images/sec, 475.6 MB/sec
> >   GetTexImage(RGB/565 256 x 256): 99.5 images/sec, 12.4 MB/sec
> >
> > becomes
> >
> >   TexImage(RGB/565 256 x 256): 3439.1 images/sec, 429.9 MB/sec
> >   TexSubImage(RGB/565 256 x 256): 3744.1 images/sec, 468.0 MB/sec
> >   GetTexImage(RGB/565 256 x 256): 4713.5 images/sec, 589.2 MB/sec
> >
> > on a puny Baytrail which is still far from what it is capable of. The
> > reason for the disparity is that the teximage demo uses a busy texture
> > which is performs an accelerated pixel conversion from the user's B5G6R5
> > into the native X8B8G8R8. After the patch, no conversion is required
> > allowing use of the blitter and memcpy fast paths.
> >
> > Signed-off-by: Chris Wilson 
> > ---
> >  src/mesa/main/texformat.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/src/mesa/main/texformat.c b/src/mesa/main/texformat.c
> > index fd9f335..866c7b3 100644
> > --- a/src/mesa/main/texformat.c
> > +++ b/src/mesa/main/texformat.c
> > @@ -114,6 +114,8 @@ _mesa_choose_tex_format(struct gl_context *ctx, GLenum 
> > target,
> > case GL_RGB:
> >if (type == GL_UNSIGNED_INT_2_10_10_10_REV) {
> >   RETURN_IF_SUPPORTED(MESA_FORMAT_B10G10R10A2_UNORM);
> > +  } else if (type == GL_UNSIGNED_SHORT_5_6_5) {
> > + RETURN_IF_SUPPORTED(MESA_FORMAT_B5G6R5_UNORM);
> 
> Shouldn't this be MESA_FORMAT_R5G6B5_UNORM? AFAICT, the reason for
> using BGR above, is the _REV suffix on the type...

No, it's the first line that's "wrong" since the B10G10R10A2 matches
GL_BGRA + GL_UNSIGNED_INT_2_10_10_10_REV, GL_RGB implies that the
incoming data only has 3 channels (no alpha at all).

i965 know how to do B5G6R5 and not R5G6B5, but for completeness we could
also add RETURN_IF_SUPPORTED(MESA_FORMAT_R5G6B5_UNORM);
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 80821] When LIBGL_ALWAYS_SOFTWARE is set, KHR_create_context is not supported

2015-09-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=80821

Emil Velikov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Emil Velikov  ---
Props to Boyan, a slightly different version has landed for the DRI based
libEGL.
On the st/egl side, it has been dropped from upstream due lack of interested
maintainer :'(

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Match MESA_FORMAT_B5G6R5 for a shallow pixel format of GL_RGB

2015-09-09 Thread Chris Wilson
On Wed, Sep 09, 2015 at 12:09:40PM +0200, Erik Faye-Lund wrote:
> On Wed, Sep 9, 2015 at 11:25 AM, Chris Wilson  
> wrote:
> > On Wed, Sep 09, 2015 at 11:11:59AM +0200, Erik Faye-Lund wrote:
> >> On Thu, Sep 3, 2015 at 6:05 PM, Chris Wilson  
> >> wrote:
> >> > If the user supplies a pixel format of GL_RGB + GL_UNSIGNED_SHORT_5_6_5
> >> > and specifies a generic unsized GL_RGB internal format, match that to a
> >> > texture format of MESA_FORMAT_B5G6R5 if supported by the hardware.
> >> >
> >> > Noticed while playing with mesa-demos/teximage:
> >> >
> >> >   TexImage(RGB/565 256 x 256): 79.8 images/sec, 10.0 MB/sec
> >> >   TexSubImage(RGB/565 256 x 256): 3804.9 images/sec, 475.6 MB/sec
> >> >   GetTexImage(RGB/565 256 x 256): 99.5 images/sec, 12.4 MB/sec
> >> >
> >> > becomes
> >> >
> >> >   TexImage(RGB/565 256 x 256): 3439.1 images/sec, 429.9 MB/sec
> >> >   TexSubImage(RGB/565 256 x 256): 3744.1 images/sec, 468.0 MB/sec
> >> >   GetTexImage(RGB/565 256 x 256): 4713.5 images/sec, 589.2 MB/sec
> >> >
> >> > on a puny Baytrail which is still far from what it is capable of. The
> >> > reason for the disparity is that the teximage demo uses a busy texture
> >> > which is performs an accelerated pixel conversion from the user's B5G6R5
> >> > into the native X8B8G8R8. After the patch, no conversion is required
> >> > allowing use of the blitter and memcpy fast paths.
> >> >
> >> > Signed-off-by: Chris Wilson 
> >> > ---
> >> >  src/mesa/main/texformat.c | 2 ++
> >> >  1 file changed, 2 insertions(+)
> >> >
> >> > diff --git a/src/mesa/main/texformat.c b/src/mesa/main/texformat.c
> >> > index fd9f335..866c7b3 100644
> >> > --- a/src/mesa/main/texformat.c
> >> > +++ b/src/mesa/main/texformat.c
> >> > @@ -114,6 +114,8 @@ _mesa_choose_tex_format(struct gl_context *ctx, 
> >> > GLenum target,
> >> > case GL_RGB:
> >> >if (type == GL_UNSIGNED_INT_2_10_10_10_REV) {
> >> >   RETURN_IF_SUPPORTED(MESA_FORMAT_B10G10R10A2_UNORM);
> >> > +  } else if (type == GL_UNSIGNED_SHORT_5_6_5) {
> >> > + RETURN_IF_SUPPORTED(MESA_FORMAT_B5G6R5_UNORM);
> >>
> >> Shouldn't this be MESA_FORMAT_R5G6B5_UNORM? AFAICT, the reason for
> >> using BGR above, is the _REV suffix on the type...
> >
> > No, it's the first line that's "wrong" since the B10G10R10A2 matches
> > GL_BGRA + GL_UNSIGNED_INT_2_10_10_10_REV, GL_RGB implies that the
> > incoming data only has 3 channels (no alpha at all).
> 
> Good point about the alpha channel. It sounds like it should be
> changed to MESA_FORMAT_B10G10R10X2_UNORM instead.
> 
> But according to src/mesa/main/format_pack.c's
> pack_ubyte_b10g10r10a2_unorm(), mesa's B10G10R10A2 corresponds to
> OpenGL's UNSIGNED_INT_2_10_10_10_REV. So I think it matches GL_RGBA,
> not GL_BGRA. The latter would mean another swizzle, AFAICT.

The mapping is (_mesa_format_from_format_and_type):

   case GL_UNSIGNED_INT_2_10_10_10_REV:
  if (format == GL_RGB)
 return MESA_FORMAT_R10G10B10X2_UNORM;
  if (format == GL_RGBA)
 return MESA_FORMAT_R10G10B10A2_UNORM;
  else if (format == GL_RGBA_INTEGER)
 return MESA_FORMAT_R10G10B10A2_UINT;
  else if (format == GL_BGRA)
 return MESA_FORMAT_B10G10R10A2_UNORM;
  else if (format == GL_BGRA_INTEGER)
 return MESA_FORMAT_B10G10R10A2_UINT;
  break;

The trick is that the packed formats are written as lsb first (or it may
just be native and my lsb bias is showing).

> > i965 know how to do B5G6R5 and not R5G6B5, but for completeness we could
> > also add RETURN_IF_SUPPORTED(MESA_FORMAT_R5G6B5_UNORM);
> 
> I think intel-specific hacks (like preferring B5G6R5 over R5G6B5)
> shouldn't leak into _mesa_choose_tex_format(),

Hah, did you look at _mesa_choose_tex_format()? :)
I sent an another patch to do the hardware agnostic unswizzled conversions.

> So I think it'd be a
> good move to add "RETURN_IF_SUPPORTED(MESA_FORMAT_R5G6B5_UNORM);"
> before the latter return.

Not quite, because the mapping for 565 is:

   case GL_UNSIGNED_SHORT_5_6_5:
  if (format == GL_RGB)
 return MESA_FORMAT_B5G6R5_UNORM;
  else if (format == GL_BGR)
 return MESA_FORMAT_R5G6B5_UNORM;
  else if (format == GL_RGB_INTEGER)
 return MESA_FORMAT_B5G6R5_UINT;
  break;
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] i965/fs: Fix hang on IVB and VLV with image format mismatch.

2015-09-09 Thread Francisco Jerez
Ian Romanick  writes:

> On 09/03/2015 06:03 AM, Francisco Jerez wrote:
>> IVB and VLV hang sporadically when an untyped surface read or write
>> message is used to access a surface of format other than RAW, as may
>> happen when there is a mismatch between the format qualifier of the
>> image uniform and the format of the actual image bound to the
>> pipeline.  According to the spec this condition gives undefined
>> results but may not lead to program termination (which is one of the
>> possible outcomes of the hang).  Fix it by checking at runtime whether
>> the surface is of the right type.
>> 
>> Fixes the "arb_shader_image_load_store.invalid/format mismatch" piglit
>> subtest.
>> 
>> Reported-by: Mark Janes 
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91718
>> CC: mesa-sta...@lists.freedesktop.org
>> ---
>>  .../drivers/dri/i965/brw_fs_surface_builder.cpp| 42 
>> +++---
>>  1 file changed, 38 insertions(+), 4 deletions(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>> index f60afc9..57ce87f 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>> @@ -313,12 +313,42 @@ namespace {
>>  
>> namespace image_validity {
>>/**
>> +   * Check whether the bound image is suitable for untyped access.
>> +   */
>> +  brw_predicate
>> +  emit_untyped_image_check(const fs_builder , const fs_reg ,
>> +   brw_predicate pred)
>> +  {
>> + const brw_device_info *devinfo = bld.shader->devinfo;
>> + const fs_reg stride = offset(image, bld, 
>> BRW_IMAGE_PARAM_STRIDE_OFFSET);
>> +
>> + if (devinfo->gen == 7 && !devinfo->is_haswell) {
>> +/* Check whether the first stride component (i.e. the Bpp value)
>> + * is greater than four, what on Gen7 indicates that a surface 
>> of
>> + * type RAW has been bound for untyped access.  Reading or 
>> writing
>> + * to a surface of type other than RAW using untyped surface
>> + * messages causes a hang on IVB and VLV.
>> + */
>> +set_predicate(pred,
>> +  bld.CMP(bld.null_reg_ud(), stride, fs_reg(4),
>> +  BRW_CONDITIONAL_G));
>> +
>> +return BRW_PREDICATE_NORMAL;
>> + } else {
>> +/* More recent generations handle the format mismatch
>> + * gracefully.
>> + */
>> +return pred;
>> + }
>> +  }
>> +
>> +  /**
>> * Check whether there is an image bound at the given index and write
>> * the comparison result to f0.0.  Returns an appropriate predication
>> * mode to use on subsequent image operations.
>> */
>>brw_predicate
>> -  emit_surface_check(const fs_builder , const fs_reg )
>> +  emit_typed_atomic_check(const fs_builder , const fs_reg )
>
> This change seems spurious (and also reasonable).
>
The problem is that this patch introduces a new kind of surface check
applicable to untyped surface reads and writes only, so it would have
been confusing to keep the other surface check which is applicable to
atomics only with its previous rather unspecific name.

>>{
>>   const brw_device_info *devinfo = bld.shader->devinfo;
>>   const fs_reg size = offset(image, bld, 
>> BRW_IMAGE_PARAM_SIZE_OFFSET);
>> @@ -895,7 +925,9 @@ namespace brw {
>>   * surface read on the result,
>>   */
>>  const brw_predicate pred =
>> -   emit_bounds_check(bld, image, saddr, dims);
>> +   emit_untyped_image_check(bld, image,
>> +emit_bounds_check(bld, image,
>> +  saddr, dims));
>
> These appear to be the only two users of emit_bounds_check... shouldn't
> the bounds still be tested?
>
Yes, they are still.

>>  
>>  /* and they don't know about surface coordinates, we need to
>>   * convert them to a raw memory offset.
>> @@ -1041,7 +1073,9 @@ namespace brw {
>>  * the surface write on the result,
>>  */
>> const brw_predicate pred =
>> -  emit_bounds_check(bld, image, saddr, dims);
>> +  emit_untyped_image_check(bld, image,
>> +   emit_bounds_check(bld, image,
>> + saddr, dims));
>>  
>> /* and, phew, they don't know about surface coordinates, we
>>  * need to convert them to a raw memory offset.
>> @@ -1072,7 +1106,7 @@ namespace brw {
>>   using namespace image_coordinates;
>>   using namespace 

Re: [Mesa-dev] Adaptive Vsync

2015-09-09 Thread Lauri Kasanen
On Wed, 9 Sep 2015 09:11:50 -0400
Alex Deucher  wrote:

> > Oh, absolutely - I had no issues with "this needs changing". My issue
> > was with the fact it took months to get that. Had I come up with a new
> > patch, it would likely have taken a similar time, months again, which
> > did not inspire confidence.
> 
> You need to drive it.  Developers are busy and can miss patches
> especially if the subject is not necessarily an area they are familiar
> with.  This is not specific to mesa; the same is true about just about
> any open source project.  If you don't get any feedback right away
> send a gentle reminder to the list asking if there are any objections
> stating your desire to get it committed.  Wait a bit more time and if
> there is still no feedback, feel free to commit or ask someone to
> commit it.  If you just drive by and dump patches with no follow up,
> we have no idea how serious about the patches you are. Were they just
> an RFC, did you really want to commit them?  If you don't follow up on
> them, developers assume you don't really care about getting them
> committed.

I had followed up twice on ML in this particular case, and some times
on IRC.

- Lauri
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Don't tell the hardware about our UAV access.

2015-09-09 Thread Roland Scheidegger
Am 09.09.2015 um 15:04 schrieb Roland Scheidegger:
> Am 09.09.2015 um 05:56 schrieb Ian Romanick:
>> On 08/15/2015 02:24 AM, Francisco Jerez wrote:
>>> Roland Scheidegger  writes:
>>>
 I guess though you'd need these bits when implementing things like
 ARB_fragment_shader_ordering? (Thus stuff actually looks useful but I
 don't know if anybody wants to implement it in the near term.)

>>> I believe that could be implemented using the thread dependency
>>> mechanism which is independent of the UAV coherency stuff.  BTW I wasn't
>>> aware that the extension had become ARB.
>>
>> It hasn't.  It's still just an Intel extension.
> 
> Ah yes that's right not sure what I was looking at. There's a very
> similar NV extension though too (NV_fragment_shader_interlock), so there
> may be an ARB extension in the future.
> 

Actually it already exists, ARB_fragment_shader_interlock. Was just
missing the different name :-).

Roland


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH mesa v2 2/2] nv30: Disable msaa unless requested from the env by NV30_MAX_MSAA

2015-09-09 Thread Hans de Goede
Some modern apps try to use msaa without keeping in mind the
restrictions on videomem of older cards. Resulting in dmesg saying:

 [ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
 [ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
 [ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12

Because we are running out of video memory, after which the program
using the msaa visual freezes, and eventually the entire system freezes.

To work around this we do not allow msaa visauls by default and allow
the user to override this via NV30_MAX_MSAA.

Signed-off-by: Hans de Goede 
---
Changes in v2:
-Allow re-enabling msaa by setting NV30_MAX_MSAA in the environment
---
 src/gallium/drivers/nouveau/nv30/nv30_screen.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
index 7aad26b..4b77f43 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
@@ -319,8 +319,28 @@ nv30_screen_is_format_supported(struct pipe_screen 
*pscreen,
 unsigned sample_count,
 unsigned bindings)
 {
-   if (sample_count > 4)
+   unsigned int max_sample_count;
+
+   /*
+* Some modern apps try to use msaa without keeping in mind the
+* restrictions on videomem of older cards. Resulting in dmesg saying:
+* [ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
+* [ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
+* [ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12
+*
+* Because we are running out of video memory, after which the program
+* using the msaa visual freezes, and eventually the entire system freezes.
+*
+* To work around this we do not allow msaa visauls by default and allow
+* the user to override this via NV30_MAX_MSAA.
+*/
+   max_sample_count = debug_get_num_option("NV30_MAX_MSAA", 0);
+   if (max_sample_count > 4)
+  max_sample_count = 4;
+
+   if (sample_count > max_sample_count)
   return false;
+
if (!(0x0017 & (1 << sample_count)))
   return false;
 
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH mesa v2 1/2] nv30: Fix color resolving for nv3x cards

2015-09-09 Thread Hans de Goede
We do not have a generic blitter on nv3x cards, so we must use the
sifm object for color resolving.

This commit divides the sources and dest surfaces in to tiles which
match the constraints of the sifm object, so that color resolving
will work properly on nv3x cards.

Signed-off-by: Hans de Goede 
---
Changes in v2:
-Use 1024x1024 blocks
-Use the sifm on both nv3x and nv4x cards instead of only on nv3x cards
---
 src/gallium/drivers/nouveau/nv30/nv30_miptree.c | 38 -
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv30/nv30_miptree.c 
b/src/gallium/drivers/nouveau/nv30/nv30_miptree.c
index 76bb8b8..735c718 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_miptree.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_miptree.c
@@ -149,14 +149,50 @@ static void
 nv30_resource_resolve(struct nv30_context *nv30,
   const struct pipe_blit_info *info)
 {
+   struct nv30_miptree *src_mt = nv30_miptree(info->src.resource);
struct nv30_rect src, dst;
+   unsigned x, x0, x1, y, y1, w, h;
 
define_rect(info->src.resource, 0, info->src.box.z, info->src.box.x,
   info->src.box.y, info->src.box.width, info->src.box.height, );
define_rect(info->dst.resource, 0, info->dst.box.z, info->dst.box.x,
   info->dst.box.y, info->dst.box.width, info->dst.box.height, );
 
-   nv30_transfer_rect(nv30, BILINEAR, , );
+   x0 = src.x0;
+   x1 = src.x1;
+   y1 = src.y1;
+
+   /* On nv3x we must use sifm which is restricted to 1024x1024 tiles */
+   for (y = src.y0; y < y1; y += h) {
+  h = y1 - y;
+  if (h > 1024)
+ h = 1024;
+
+  src.y0 = 0;
+  src.y1 = h;
+  src.h = h;
+
+  dst.y1 = dst.y0 + (h >> src_mt->ms_y);
+  dst.h = h >> src_mt->ms_y;
+
+  for (x = x0; x < x1; x += w) {
+ w = x1 - x;
+ if (w > 1024)
+w = 1024;
+
+ src.offset = y * src.pitch + x * src.cpp;
+ src.x0 = 0;
+ src.x1 = w;
+ src.w = w;
+
+ dst.offset = (y >> src_mt->ms_y) * dst.pitch +
+  (x >> src_mt->ms_x) * dst.cpp;
+ dst.x1 = dst.x0 + (w >> src_mt->ms_x);
+ dst.w = w >> src_mt->ms_x;
+
+ nv30_transfer_rect(nv30, BILINEAR, , );
+  }
+   }
 }
 
 void
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] i965: Move resolve from state-update notification to draw

2015-09-09 Thread Chris Wilson
Since _mesa_update_state() may be called several times per draw call,
we wish to be judicious about the amount of work we place inside the
intel_update_state() callback. By moving the current HiZ/color resolves
we need before drawing from out of the notify and into the draw itself,
we can save a few percent of overhead in OglBatch7 on Atom class
devices. The catch is that putting the resolve inside the draw call
itself causes recursion so we need a new vbo entry point so that we do
the resolves prior to sealing the vbo context.

No i965 piglits were harmed.
-Chris

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Nouveau] [PATCH mesa 2/3] nv30: Fix color resolving for nv3x cards

2015-09-09 Thread Hans de Goede

Hi,

On 08-09-15 09:53, Hans de Goede wrote:

Hi,

On 07-09-15 21:55, Ilia Mirkin wrote:

May I ask why you're doing 512x512 instead of 1024x1024? These are
already scaled up coordinates, so 1024x1024 should work no? Or is it
because of the seams on the edges? Do those not also appear with
512x512 or does it sample outside of the box?


This is my bad because of the bug fixed by patch 1/3 I had 512x512 in there
for a while, I've moved back and forth between 512 and 1024 a couple of times.

I've also tried 2048 but the hardware does not like that.

I will retest with 1024 and submit a fixed version.

I've not noticed any seams on the edges, even though I've been actively looking
for them.


Separately, why not use this approach on nv40 as well? I can't imagine
the blitter would be faster... does this result in lower quality?


I've the feeling the sifm bilinear filtering is more "blurry" then the
blitter one. I will do some more objective (ahem) tests on nv4x and get
back to you on this.


Ok I've run some tests comparing the rendering between using the blitter
and the sifm and the results seem identical, so one v2 using 1024x1024
blocks and doing so on all nv3x/nv4x cards coming up.

Regards,

Hans
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v6 2/2] drm/i915: Add soft-pinning API for execbuffer

2015-09-09 Thread Chris Wilson
On Wed, Sep 09, 2015 at 04:07:09PM +0200, Michał Winiarski wrote:
> From: Chris Wilson 
> 
> Userspace can pass in an offset that it presumes the object is located
> at. The kernel will then do its utmost to fit the object into that
> location. The assumption is that userspace is handling its own object
> locations (for example along with full-ppgtt) and that the kernel will
> rarely have to make space for the user's requests.
> 
> v2: Fixed incorrect eviction found by Michal Winiarski - fix suggested by 
> Chris
> Wilson.  Fixed incorrect error paths causing crash found by Michal Winiarski.
> (Not published externally)
> 
> v3: Rebased because of trivial conflict in object_bind_to_vm.  Fixed eviction
> to allow eviction of soft-pinned objects when another soft-pinned object used
> by a subsequent execbuffer overlaps reported by Michal Winiarski.
> (Not published externally)
> 
> v4: Moved soft-pinned objects to the front of ordered_vmas so that they are
> pinned first after an address conflict happens to avoid repeated conflicts in
> rare cases (Suggested by Chris Wilson).  Expanded comment on
> drm_i915_gem_exec_object2.offset to cover this new API.
> 
> v5: Added I915_PARAM_HAS_EXEC_SOFTPIN parameter for detecting this capability
> (Kristian). Added check for multiple pinnings on eviction (Akash). Made sure
> buffers are not considered misplaced without the user specifying
> EXEC_OBJECT_SUPPORTS_48BBADDRESS.  User must assume responsibility for any
> addressing workarounds.  Updated object2.offset field comment again to clarify
> NO_RELOC case (Chris).  checkpatch cleanup.
> 
> v6: Rebase on top of current nightly. Dropped any references to
> EXEC_OBJECT_SUPPORTS_48BBADDRESS since those don't exist in upstream
> yet, and are not a dependency.
> 
> Cc: Chris Wilson 
> Cc: Akash Goel 
> Cc: Vinay Belgaumkar 
> Cc: Michal Winiarski 
> Cc: Zou Nanhai 
> Cc: Kristian Høgsberg 
> Signed-off-by: Chris Wilson 
> Signed-off-by: Thomas Daniel 
> Signed-off-by: Michał Winiarski 

Again, the precursors are not included in this series or upstream, so
NAK.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v1] i915: fixing driver crashes if too few vertices are submitted

2015-09-09 Thread Marius Predut
Comparison with a signed expression and unsigned value
is converted to unsigned value, reason for minus value is interpreted
as a big unsigned value. For this case the "for" loop
is going into unexpected behavior.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38109
Signed-off-by: Marius Predut 
---
 src/mesa/tnl_dd/t_dd_dmatmp.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/tnl_dd/t_dd_dmatmp.h b/src/mesa/tnl_dd/t_dd_dmatmp.h
index 7be3954..88ecc78 100644
--- a/src/mesa/tnl_dd/t_dd_dmatmp.h
+++ b/src/mesa/tnl_dd/t_dd_dmatmp.h
@@ -627,6 +627,8 @@ static void TAG(render_quads_verts)( struct gl_context *ctx,
   LOCAL_VARS;
   GLuint j;
 
+  if(count%4 != 0) return;
+
   INIT(GL_TRIANGLES);
 
   for (j = start; j < count-3; j += 4) {
@@ -1248,7 +1250,7 @@ static GLboolean TAG(validate_render)( struct gl_context 
*ctx,
ok = (GLint) count < GET_SUBSEQUENT_VB_MAX_ELTS();
 }
 else {
-   ok = HAVE_TRIANGLES; /* flatshading is ok. */
+   ok = HAVE_TRIANGLES && (count%4 == 0); /* flatshading is ok. */
 }
 break;
   default:
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/7] i965: Remove redundant test for NULL intel_texture_object

2015-09-09 Thread Chris Wilson
Having checked whether the base class (gl_texture_object) is NULL, we
know that intel_texture_object itself cannot be NULL.

Signed-off-by: Chris Wilson 
Cc: Jordan Justen 
Cc: Jason Ekstrand 
Cc: Kenneth Graunke 
Cc: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_draw.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 0ffcc24..3cea331 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -346,7 +346,6 @@ static void
 brw_draw_resolve(struct gl_context *ctx)
 {
struct brw_context *brw = brw_context(ctx);
-   struct intel_texture_object *tex_obj;
struct intel_renderbuffer *depth_irb;
 
/* Resolve the depth buffer's HiZ buffer. */
@@ -358,11 +357,15 @@ brw_draw_resolve(struct gl_context *ctx)
if (brw->NewGLState & _NEW_TEXTURE) {
   int maxEnabledUnit = brw->ctx.Texture._MaxEnabledTexImageUnit;
   for (int i = 0; i <= maxEnabledUnit; i++) {
+ struct intel_texture_object *tex_obj;
+
  if (!brw->ctx.Texture.Unit[i]._Current)
 continue;
+
  tex_obj = intel_texture_object(brw->ctx.Texture.Unit[i]._Current);
- if (!tex_obj || !tex_obj->mt)
+ if (!tex_obj->mt)
 continue;
+
  intel_miptree_all_slices_resolve_depth(brw, tex_obj->mt);
  intel_miptree_resolve_color(brw, tex_obj->mt);
  brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] i915/aa: fixing anti-aliasing bug for thinnest width lines

2015-09-09 Thread Brian Paul

On 09/09/2015 07:59 AM, Marius Predut wrote:

On PNV platform, for 1 pixel line thickness or less,
the general anti-aliasing algorithm gives up, and a garbage line is generated.
Setting a Line Width of 0.0 specifies the rasterization
of the "thinnest" (one-pixel-wide), non-antialiased lines.
Lines rendered with zero Line Width are rasterized using
Grid Intersection Quantization rules as specified by bspec G45: Volume 2: 
3D/Media,
7.3.13.1 Zero-Width (Cosmetic) Line Rasterization section.

This patch follow the same rules as patches fixing the
https://bugs.freedesktop.org/show_bug.cgi?id=28832
bug.

v1: Eduardo Lima Mitev:  Wrong indentation inside the if clause.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90367

Signed-off-by: Marius Predut 
---
  src/mesa/drivers/dri/i915/i915_state.c | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/src/mesa/drivers/dri/i915/i915_state.c 
b/src/mesa/drivers/dri/i915/i915_state.c
index 4c83073..ebb4e9a 100644
--- a/src/mesa/drivers/dri/i915/i915_state.c
+++ b/src/mesa/drivers/dri/i915/i915_state.c
@@ -599,6 +599,21 @@ i915LineWidth(struct gl_context * ctx, GLfloat widthf)

 width = (int) (widthf * 2);
 width = CLAMP(width, 1, 0xf);
+
+   if (ctx->Line.Width < 1.5 || widthf < 1.5) {


By using "1.5f" we can avoid some float/double conversion.



+ /* For 1 pixel line thickness or less, the general
+  * anti-aliasing algorithm gives up, and a garbage line is
+  * generated.  Setting a Line Width of 0.0 specifies the
+  * rasterization of the "thinnest" (one-pixel-wide),
+  * non-antialiased lines.
+  *
+  * Lines rendered with zero Line Width are rasterized using
+  * Grid Intersection Quantization rules as specified by
+  * bspec G45: Volume 2: 3D/Media,


bspec?


+  * 7.3.13.1 Zero-Width (Cosmetic) Line Rasterization section
+  */
+  width = 0;
+   }
 lis4 |= width << S4_LINE_WIDTH_SHIFT;

 if (lis4 != i915->state.Ctx[I915_CTXREG_LIS4]) {



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/7] i965: Move the predraw resolve from _mesa_update_state into brw_draw_prims()

2015-09-09 Thread Chris Wilson
Performing the HiZ/color resolves is a fairly heavyweight operation that
we don't need to perform on every state update, only before use the
unresolved surfaces in a draw call. Since we use meta operations to
perform the resolve, we cannot do so from inside brw_draw_prims() itself
(as that will reenter and corrupt the vbo context). Instead we move the
resolve into a vbo->resolve() hook that is called just before the
primitives are drawn.

Signed-off-by: Chris Wilson 
Cc: Jordan Justen 
Cc: Jason Ekstrand 
Cc: Kenneth Graunke 
Cc: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_context.c | 24 
 src/mesa/drivers/dri/i965/brw_draw.c| 28 
 2 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 907b2a0..fdeadda 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -160,36 +160,12 @@ static void
 intel_update_state(struct gl_context * ctx, GLuint new_state)
 {
struct brw_context *brw = brw_context(ctx);
-   struct intel_texture_object *tex_obj;
-   struct intel_renderbuffer *depth_irb;
 
if (ctx->swrast_context)
   _swrast_InvalidateState(ctx, new_state);
_vbo_InvalidateState(ctx, new_state);
 
brw->NewGLState |= new_state;
-
-   _mesa_unlock_context_textures(ctx);
-
-   /* Resolve the depth buffer's HiZ buffer. */
-   depth_irb = intel_get_renderbuffer(ctx->DrawBuffer, BUFFER_DEPTH);
-   if (depth_irb)
-  intel_renderbuffer_resolve_hiz(brw, depth_irb);
-
-   /* Resolve depth buffer and render cache of each enabled texture. */
-   int maxEnabledUnit = ctx->Texture._MaxEnabledTexImageUnit;
-   for (int i = 0; i <= maxEnabledUnit; i++) {
-  if (!ctx->Texture.Unit[i]._Current)
-continue;
-  tex_obj = intel_texture_object(ctx->Texture.Unit[i]._Current);
-  if (!tex_obj || !tex_obj->mt)
-continue;
-  intel_miptree_all_slices_resolve_depth(brw, tex_obj->mt);
-  intel_miptree_resolve_color(brw, tex_obj->mt);
-  brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
-   }
-
-   _mesa_lock_context_textures(ctx);
 }
 
 #define flushFront(screen)  ((screen)->image.loader ? 
(screen)->image.loader->flushFrontBuffer : 
(screen)->dri2.loader->flushFrontBuffer)
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index e5de420..d1b6279 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -342,6 +342,33 @@ brw_merge_inputs(struct brw_context *brw,
}
 }
 
+static void
+brw_draw_resolve(struct gl_context *ctx)
+{
+   struct brw_context *brw = brw_context(ctx);
+   struct intel_texture_object *tex_obj;
+   struct intel_renderbuffer *depth_irb;
+
+   /* Resolve the depth buffer's HiZ buffer. */
+   depth_irb = intel_get_renderbuffer(brw->ctx.DrawBuffer, BUFFER_DEPTH);
+   if (depth_irb)
+  intel_renderbuffer_resolve_hiz(brw, depth_irb);
+
+   /* Resolve depth buffer and render cache of each enabled texture. */
+   int maxEnabledUnit = brw->ctx.Texture._MaxEnabledTexImageUnit;
+   for (int i = 0; i <= maxEnabledUnit; i++) {
+  if (!brw->ctx.Texture.Unit[i]._Current)
+ continue;
+  tex_obj = intel_texture_object(brw->ctx.Texture.Unit[i]._Current);
+  if (!tex_obj || !tex_obj->mt)
+ continue;
+  intel_miptree_all_slices_resolve_depth(brw, tex_obj->mt);
+  intel_miptree_resolve_color(brw, tex_obj->mt);
+  brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
+   }
+}
+
+
 /**
  * \brief Call this after drawing to mark which buffers need resolving
  *
@@ -615,6 +642,7 @@ brw_draw_init(struct brw_context *brw)
 
/* Register our drawing function:
 */
+   vbo->resolve = brw_draw_resolve;
vbo->draw_prims = brw_draw_prims;
 
for (int i = 0; i < VERT_ATTRIB_MAX; i++)
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/7] vbo: Reorder clearing NeedFlush flag with performing the flush

2015-09-09 Thread Chris Wilson
When we come to flush the vertices, we need to clear the NeedFlush
flag first before calling into the backend as that backend may trigger
FLUSH_VERTICES() itself (through use of meta) before dirtying the vertex
state. If the NeedFlush flag is still in place, we end up recursing
into the backend again (ad infinitum or until we hit an assertion).

Signed-off-by: Chris Wilson 
Cc: Brian Paul 
Cc: Jordan Justen 
Cc: Jason Ekstrand 
Cc: Kenneth Graunke 
Cc: Francisco Jerez 
---
 src/mesa/vbo/vbo_exec_api.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c
index 138cd60..5486104 100644
--- a/src/mesa/vbo/vbo_exec_api.c
+++ b/src/mesa/vbo/vbo_exec_api.c
@@ -1187,6 +1187,9 @@ void vbo_exec_FlushVertices( struct gl_context *ctx, 
GLuint flags )
   return;
}
 
+   /* Don't try and recursively flush the current VBO */
+   ctx->Driver.NeedFlush &= ~(FLUSH_UPDATE_CURRENT | flags);
+
/* Flush (draw), and make sure VBO is left unmapped when done */
vbo_exec_FlushVertices_internal(exec, GL_TRUE);
 
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/7] i965: Move update_state from brw_meta_resolve_color() to after all resolves

2015-09-09 Thread Chris Wilson
Now that we moved the resolving from out of the _mesa_update_state()
callback we can remove the trick of calling _mesa_update_state() again
from inside the resolver (as we no longer need to adhere to the
_mesa_update_state() contract). However, we still need to make sure that
the context is up to date before proceeding on with the vbo draw.

Signed-off-by: Chris Wilson 
Cc: Jordan Justen 
Cc: Jason Ekstrand 
Cc: Kenneth Graunke 
Cc: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_draw.c| 7 +++
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 9 -
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index d1b6279..1269b65 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -366,6 +366,13 @@ brw_draw_resolve(struct gl_context *ctx)
   intel_miptree_resolve_color(brw, tex_obj->mt);
   brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
}
+
+   /* As we may have resolved using meta operations and so left the
+* context in a dirty state, we need to complete the invalidations
+* before continuing on with the vbo draw (and its context checks).
+*/
+   if (ctx->NewState)
+  _mesa_update_state(ctx);
 }
 
 
diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index f5ecbb5..dd7f148 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -710,13 +710,4 @@ brw_meta_resolve_color(struct brw_context *brw,
_mesa_DeleteFramebuffers(1, );
 
_mesa_meta_end(ctx);
-
-   /* We're typically called from intel_update_state() and we're supposed to
-* return with the state all updated to what it was before
-* brw_meta_resolve_color() was called.  The meta rendering will have
-* messed up the state and we need to call _mesa_update_state() again to
-* get back to where we were supposed to be when resolve was called.
-*/
-   if (ctx->NewState)
-  _mesa_update_state(ctx);
 }
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/7] vbo: Add a predraw resolve callback

2015-09-09 Thread Chris Wilson
A common problem with using HiZ and multisampling is that surfaces need
to resolved prior to use. Currently i965 does this inside its state
update hook, but that is a comparatively heavyweight operation that need
not be performed so frequently. The obvious solution (and therefore
fraught with dragons) is to move the HiZ/color resolves into the
brw_draw_prims() - however, the resolves are performed using meta and
end up re-entering brw_draw_prims() corrupting the context state of the
original call. To avoid the meta recursion, we can add a new callback
(vbo->resolve()) into the vbo pipeline that is called just before
vbo->draw().

Signed-off-by: Chris Wilson 
Cc: Brian Paul 
Cc: Jordan Justen 
Cc: Jason Ekstrand 
Cc: Kenneth Graunke 
Cc: Francisco Jerez 
---
 src/mesa/vbo/vbo.h|  1 +
 src/mesa/vbo/vbo_context.c| 19 +++
 src/mesa/vbo/vbo_context.h|  1 +
 src/mesa/vbo/vbo_exec_array.c |  1 +
 src/mesa/vbo/vbo_exec_draw.c  |  5 -
 src/mesa/vbo/vbo_save_draw.c  |  2 ++
 6 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/src/mesa/vbo/vbo.h b/src/mesa/vbo/vbo.h
index 2aaff5d..b64c468 100644
--- a/src/mesa/vbo/vbo.h
+++ b/src/mesa/vbo/vbo.h
@@ -89,6 +89,7 @@ vbo_initialize_save_dispatch(const struct gl_context *ctx,
  struct _glapi_table *exec);
 
 
+typedef void (*vbo_resolve_func)( struct gl_context *ctx);
 typedef void (*vbo_draw_func)( struct gl_context *ctx,
   const struct _mesa_prim *prims,
   GLuint nr_prims,
diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c
index e3eb286..1f0b46a 100644
--- a/src/mesa/vbo/vbo_context.c
+++ b/src/mesa/vbo/vbo_context.c
@@ -148,11 +148,30 @@ static void init_mat_currval(struct gl_context *ctx)
 }
 
 
+static void nop_resolve(struct gl_context *ctx)
+{
+}
+
+static void nop_draw(struct gl_context *ctx,
+ const struct _mesa_prim *prims,
+ GLuint nr_prims,
+ const struct _mesa_index_buffer *ib,
+ GLboolean index_bounds_valid,
+ GLuint min_index,
+ GLuint max_index,
+ struct gl_transform_feedback_object *tfb_vertcount,
+ unsigned stream,
+ struct gl_buffer_object *indirect)
+{
+}
+
 GLboolean _vbo_CreateContext( struct gl_context *ctx )
 {
struct vbo_context *vbo = CALLOC_STRUCT(vbo_context);
 
ctx->vbo_context = vbo;
+   vbo->draw_prims = nop_draw;
+   vbo->resolve = nop_resolve;
 
/* Initialize the arrayelt helper
 */
diff --git a/src/mesa/vbo/vbo_context.h b/src/mesa/vbo/vbo_context.h
index a376efe..c4033ee4 100644
--- a/src/mesa/vbo/vbo_context.h
+++ b/src/mesa/vbo/vbo_context.h
@@ -75,6 +75,7 @@ struct vbo_context {
/* Callback into the driver.  This must always succeed, the driver
 * is responsible for initiating any fallback actions required:
 */
+   vbo_resolve_func resolve;
vbo_draw_func draw_prims;
 };
 
diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
index 34d2c1d..d592ae4 100644
--- a/src/mesa/vbo/vbo_exec_array.c
+++ b/src/mesa/vbo/vbo_exec_array.c
@@ -549,6 +549,7 @@ vbo_bind_arrays(struct gl_context *ctx)
struct vbo_context *vbo = vbo_context(ctx);
struct vbo_exec_context *exec = >exec;
 
+   vbo->resolve(ctx);
vbo_draw_method(vbo, DRAW_ARRAYS);
 
if (exec->array.recalculate_inputs) {
diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c
index 2bfb0c3..fa5b06b 100644
--- a/src/mesa/vbo/vbo_exec_draw.c
+++ b/src/mesa/vbo/vbo_exec_draw.c
@@ -388,11 +388,14 @@ vbo_exec_vtx_flush(struct vbo_exec_context *exec, 
GLboolean keepUnmapped)
 
   if (exec->vtx.copied.nr != exec->vtx.vert_count) {
 struct gl_context *ctx = exec->ctx;
+
+ vbo_context(ctx)->resolve( ctx );
 
 /* Before the update_state() as this may raise _NEW_VARYING_VP_INPUTS
   * from _mesa_set_varying_vp_inputs().
  */
-vbo_exec_bind_arrays( ctx );
+ vbo_draw_method( vbo_context(ctx), DRAW_BEGIN_END);
+ vbo_exec_bind_arrays( ctx );
 
  if (ctx->NewState)
 _mesa_update_state( ctx );
diff --git a/src/mesa/vbo/vbo_save_draw.c b/src/mesa/vbo/vbo_save_draw.c
index b1fd689..2103b8e 100644
--- a/src/mesa/vbo/vbo_save_draw.c
+++ b/src/mesa/vbo/vbo_save_draw.c
@@ -297,6 +297,8 @@ vbo_save_playback_vertex_list(struct gl_context *ctx, void 
*data)
  return;
   }
 
+  vbo_context(ctx)->resolve(ctx);
+
   vbo_bind_vertex_list( ctx, node );
 
   vbo_draw_method(vbo_context(ctx), DRAW_DISPLAY_LIST);
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

[Mesa-dev] [PATCH 5/7] i965: Remove redundant update_state from intelReadPixels()

2015-09-09 Thread Chris Wilson
The very first operation performed by _mesa_readpixels() is to call
_mesa_update_state() - we do not need to do so ourselves.

Signed-off-by: Chris Wilson 
Cc: Jordan Justen 
Cc: Jason Ekstrand 
Cc: Kenneth Graunke 
Cc: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/intel_pixel_read.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c 
b/src/mesa/drivers/dri/i965/intel_pixel_read.c
index eb366cd..16dcc55 100644
--- a/src/mesa/drivers/dri/i965/intel_pixel_read.c
+++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c
@@ -269,14 +269,6 @@ intelReadPixels(struct gl_context * ctx,
intel_prepare_render(brw);
brw->front_buffer_dirty = dirty;
 
-   /* Update Mesa state before calling _mesa_readpixels().
-* XXX this may not be needed since ReadPixels no longer uses the
-* span code.
-*/
-
-   if (ctx->NewState)
-  _mesa_update_state(ctx);
-
_mesa_readpixels(ctx, x, y, width, height, format, type, pack, pixels);
 
/* There's an intel_prepare_render() call in intelSpanRenderStart(). */
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/7] i965: Only resolve the textures if a _NEW_STATE state change is flagged

2015-09-09 Thread Chris Wilson
Only walk through the set of enabled TextureUnits looking for a texture
that needs to be resolved if the context state flags a new texture.

Note that this will miss if the client is rendering into a texture that
it is reading from, though that needs explicit barriers (and futhermore
no piglits complain).

Signed-off-by: Chris Wilson 
Cc: Jordan Justen 
Cc: Jason Ekstrand 
Cc: Kenneth Graunke 
Cc: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_draw.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 1269b65..0ffcc24 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -355,16 +355,18 @@ brw_draw_resolve(struct gl_context *ctx)
   intel_renderbuffer_resolve_hiz(brw, depth_irb);
 
/* Resolve depth buffer and render cache of each enabled texture. */
-   int maxEnabledUnit = brw->ctx.Texture._MaxEnabledTexImageUnit;
-   for (int i = 0; i <= maxEnabledUnit; i++) {
-  if (!brw->ctx.Texture.Unit[i]._Current)
- continue;
-  tex_obj = intel_texture_object(brw->ctx.Texture.Unit[i]._Current);
-  if (!tex_obj || !tex_obj->mt)
- continue;
-  intel_miptree_all_slices_resolve_depth(brw, tex_obj->mt);
-  intel_miptree_resolve_color(brw, tex_obj->mt);
-  brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
+   if (brw->NewGLState & _NEW_TEXTURE) {
+  int maxEnabledUnit = brw->ctx.Texture._MaxEnabledTexImageUnit;
+  for (int i = 0; i <= maxEnabledUnit; i++) {
+ if (!brw->ctx.Texture.Unit[i]._Current)
+continue;
+ tex_obj = intel_texture_object(brw->ctx.Texture.Unit[i]._Current);
+ if (!tex_obj || !tex_obj->mt)
+continue;
+ intel_miptree_all_slices_resolve_depth(brw, tex_obj->mt);
+ intel_miptree_resolve_color(brw, tex_obj->mt);
+ brw_render_cache_set_check_flush(brw, tex_obj->mt->bo);
+  }
}
 
/* As we may have resolved using meta operations and so left the
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] i915/aa: fixing anti-aliasing bug for thinnest width lines

2015-09-09 Thread Marius Predut
On PNV platform, for 1 pixel line thickness or less,
the general anti-aliasing algorithm gives up, and a garbage line is generated.
Setting a Line Width of 0.0 specifies the rasterization
of the "thinnest" (one-pixel-wide), non-antialiased lines.
Lines rendered with zero Line Width are rasterized using
Grid Intersection Quantization rules as specified by bspec G45: Volume 2: 
3D/Media,
7.3.13.1 Zero-Width (Cosmetic) Line Rasterization section.

This patch follow the same rules as patches fixing the
https://bugs.freedesktop.org/show_bug.cgi?id=28832
bug.

v1: Eduardo Lima Mitev:  Wrong indentation inside the if clause.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90367

Signed-off-by: Marius Predut 
---
 src/mesa/drivers/dri/i915/i915_state.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/mesa/drivers/dri/i915/i915_state.c 
b/src/mesa/drivers/dri/i915/i915_state.c
index 4c83073..ebb4e9a 100644
--- a/src/mesa/drivers/dri/i915/i915_state.c
+++ b/src/mesa/drivers/dri/i915/i915_state.c
@@ -599,6 +599,21 @@ i915LineWidth(struct gl_context * ctx, GLfloat widthf)

width = (int) (widthf * 2);
width = CLAMP(width, 1, 0xf);
+
+   if (ctx->Line.Width < 1.5 || widthf < 1.5) {
+ /* For 1 pixel line thickness or less, the general
+  * anti-aliasing algorithm gives up, and a garbage line is
+  * generated.  Setting a Line Width of 0.0 specifies the
+  * rasterization of the "thinnest" (one-pixel-wide),
+  * non-antialiased lines.
+  *
+  * Lines rendered with zero Line Width are rasterized using
+  * Grid Intersection Quantization rules as specified by
+  * bspec G45: Volume 2: 3D/Media,
+  * 7.3.13.1 Zero-Width (Cosmetic) Line Rasterization section
+  */
+  width = 0;
+   }
lis4 |= width << S4_LINE_WIDTH_SHIFT;
 
if (lis4 != i915->state.Ctx[I915_CTXREG_LIS4]) {
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Adaptive Vsync

2015-09-09 Thread Alex Deucher
On Sun, Sep 6, 2015 at 4:28 AM, Lauri Kasanen  wrote:
> On Sat, 5 Sep 2015 23:29:05 +
> Albert Freeman  wrote:
>
>> The reply from Eric Anholt made two suggestions that should not be
>> difficult to implement for someone who made the patch in the first
>> place. Why would code be committed when improvements could be easily
>> made? From what I have seen, this kind of thing happens even to
>> experienced mesa developers who know exactly what they are doing.
>>
>> It is much better to arrive at a solution without the issue now than
>> wait a few years (or some other period of time) for it to be replaced
>> by someone who has likely forgotten the details involved with its
>> implementation (or by someone who did not write the code in the first
>> place).
>
> Oh, absolutely - I had no issues with "this needs changing". My issue
> was with the fact it took months to get that. Had I come up with a new
> patch, it would likely have taken a similar time, months again, which
> did not inspire confidence.

You need to drive it.  Developers are busy and can miss patches
especially if the subject is not necessarily an area they are familiar
with.  This is not specific to mesa; the same is true about just about
any open source project.  If you don't get any feedback right away
send a gentle reminder to the list asking if there are any objections
stating your desire to get it committed.  Wait a bit more time and if
there is still no feedback, feel free to commit or ask someone to
commit it.  If you just drive by and dump patches with no follow up,
we have no idea how serious about the patches you are. Were they just
an RFC, did you really want to commit them?  If you don't follow up on
them, developers assume you don't really care about getting them
committed.

Alex

>
> Benjamin, if you want to take it forward, please do.
>
> - Lauri
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Don't tell the hardware about our UAV access.

2015-09-09 Thread Roland Scheidegger
Am 09.09.2015 um 05:56 schrieb Ian Romanick:
> On 08/15/2015 02:24 AM, Francisco Jerez wrote:
>> Roland Scheidegger  writes:
>>
>>> I guess though you'd need these bits when implementing things like
>>> ARB_fragment_shader_ordering? (Thus stuff actually looks useful but I
>>> don't know if anybody wants to implement it in the near term.)
>>>
>> I believe that could be implemented using the thread dependency
>> mechanism which is independent of the UAV coherency stuff.  BTW I wasn't
>> aware that the extension had become ARB.
> 
> It hasn't.  It's still just an Intel extension.

Ah yes that's right not sure what I was looking at. There's a very
similar NV extension though too (NV_fragment_shader_interlock), so there
may be an ARB extension in the future.

Roland


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v1] i915: fixing driver crashes if too few vertices are submitted

2015-09-09 Thread Brian Paul

On 09/09/2015 08:16 AM, Marius Predut wrote:

Comparison with a signed expression and unsigned value
is converted to unsigned value, reason for minus value is interpreted
as a big unsigned value. For this case the "for" loop
is going into unexpected behavior.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38109
Signed-off-by: Marius Predut 
---
  src/mesa/tnl_dd/t_dd_dmatmp.h | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/tnl_dd/t_dd_dmatmp.h b/src/mesa/tnl_dd/t_dd_dmatmp.h
index 7be3954..88ecc78 100644
--- a/src/mesa/tnl_dd/t_dd_dmatmp.h
+++ b/src/mesa/tnl_dd/t_dd_dmatmp.h
@@ -627,6 +627,8 @@ static void TAG(render_quads_verts)( struct gl_context *ctx,
LOCAL_VARS;
GLuint j;

+  if(count%4 != 0) return;


Should be:

   if (count % 4 != 0)
  return

In addition to following our code style, this allows one to put a 
breakpoint on the return statement if one wanted to catch that 
particular condition when debugging.




+
INIT(GL_TRIANGLES);

for (j = start; j < count-3; j += 4) {
@@ -1248,7 +1250,7 @@ static GLboolean TAG(validate_render)( struct gl_context 
*ctx,
ok = (GLint) count < GET_SUBSEQUENT_VB_MAX_ELTS();
 }
 else {
-   ok = HAVE_TRIANGLES; /* flatshading is ok. */
+   ok = HAVE_TRIANGLES && (count%4 == 0); /* flatshading is ok. */


We put spaces around operators, so "count % 4" please.



 }
 break;
default:



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 9/9] softpipe: Implement and enable textureQueryLod

2015-09-09 Thread Brian Paul

On 09/09/2015 04:35 AM, Krzesimir Nowak wrote:

Passes the shader piglit tests and introduces no regressions.

This commit finally makes use of the refactoring in previous
commits.
---
  src/gallium/drivers/softpipe/sp_screen.c |  2 +-
  src/gallium/drivers/softpipe/sp_tex_sample.c | 47 +++-
  2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_screen.c 
b/src/gallium/drivers/softpipe/sp_screen.c
index 0bfd9c3..7ca8a67 100644
--- a/src/gallium/drivers/softpipe/sp_screen.c
+++ b/src/gallium/drivers/softpipe/sp_screen.c
@@ -193,9 +193,9 @@ softpipe_get_param(struct pipe_screen *screen, enum 
pipe_cap param)
 case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
return 4;
 case PIPE_CAP_TEXTURE_GATHER_SM5:
+   case PIPE_CAP_TEXTURE_QUERY_LOD:
return 1;
 case PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT:
-   case PIPE_CAP_TEXTURE_QUERY_LOD:
 case PIPE_CAP_SAMPLE_SHADING:
 case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
return 0;
diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index 6e639e0..499c8f9 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -3566,6 +3566,51 @@ sp_tgsi_get_samples(struct tgsi_sampler *tgsi_sampler,
 sample_mip(sp_sview, sp_samp, cs, ct, cp, c0, lod, _args, rgba);
  }

+static void
+sp_tgsi_query_lod(struct tgsi_sampler *tgsi_sampler,
+  const unsigned sview_index,
+  const unsigned sampler_index,
+  const float s[TGSI_QUAD_SIZE],
+  const float t[TGSI_QUAD_SIZE],
+  const float p[TGSI_QUAD_SIZE],
+  const float c0[TGSI_QUAD_SIZE],
+  enum tgsi_sampler_control control,
+  float mipmap[TGSI_QUAD_SIZE],
+  float lod[TGSI_QUAD_SIZE])
+{
+   static const float lod_in[TGSI_QUAD_SIZE] = { 0.0, 0.0, 0.0, 0.0 };
+
+   struct sp_tgsi_sampler *sp_tgsi_samp = (struct sp_tgsi_sampler 
*)tgsi_sampler;


Can that be const-qualified, and the tgsi_sampler function parameter?

Ideally, we'd also have a cast-wrapper function to use instead of an 
inline cast here and elsewhere.  That could be done as a follow-up.




+   struct sp_sampler_view *sp_sview;
+   struct sp_sampler *sp_samp;
+   struct sp_mip *mip;
+   int i;
+   float cs[TGSI_QUAD_SIZE];
+   float ct[TGSI_QUAD_SIZE];
+   float cp[TGSI_QUAD_SIZE];
+
+   assert(sview_index < PIPE_MAX_SHADER_SAMPLER_VIEWS);
+   assert(sampler_index < PIPE_MAX_SAMPLERS);
+   assert(sp_tgsi_samp->sp_sampler[sampler_index]);
+
+   sp_sview = _tgsi_samp->sp_sview[sview_index];
+   sp_samp = sp_tgsi_samp->sp_sampler[sampler_index];
+   /* always have a view here but texture is NULL if no sampler view was set. 
*/
+   if (!sp_sview->base.texture) {
+  for (i = 0; i < TGSI_QUAD_SIZE; i++) {
+ mipmap[i] = 0.0f;
+ lod[i] = 0.0f;
+  }
+  return;
+   }
+
+   sp_sview->convert_coords(sp_sview, sp_samp, s, t, p, c0, cs, ct, cp);
+
+   compute_lambda_lod_not_clamped(sp_sview, sp_samp,
+  cs, ct, cp, lod_in, control, lod);
+   get_filters(sp_sview, sp_samp, control, , NULL, NULL);
+   mip->level(sp_sview, sp_samp, lod, mipmap);
+}

  static void
  sp_tgsi_get_texel(struct tgsi_sampler *tgsi_sampler,
@@ -3602,7 +3647,7 @@ sp_create_tgsi_sampler(void)
 samp->base.get_dims = sp_tgsi_get_dims;
 samp->base.get_samples = sp_tgsi_get_samples;
 samp->base.get_texel = sp_tgsi_get_texel;
+   samp->base.query_lod = sp_tgsi_query_lod;

 return samp;
  }
-



Reviewed-by: Brian Paul 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/9] tgsi: Add code for handling lodq opcode

2015-09-09 Thread Brian Paul

On 09/09/2015 04:35 AM, Krzesimir Nowak wrote:

This introduces new vfunc in tgsi_sampler just for this opcode. I
decided against extending get_samples vfunc to return the mipmap level
and LOD - the function's prototype is already too scary and doing the
sampling for textureQueryLod would be a waste of time.
---
  src/gallium/auxiliary/tgsi/tgsi_exec.c | 44 ++
  src/gallium/auxiliary/tgsi/tgsi_exec.h | 10 
  2 files changed, 54 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 9544623..054ad08 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -2132,6 +2132,44 @@ exec_tex(struct tgsi_exec_machine *mach,
 }
  }

+static void
+exec_lodq(struct tgsi_exec_machine *mach,
+  const struct tgsi_full_instruction *inst)
+{
+   uint unit;
+   int dim;
+   int i;
+   union tgsi_exec_channel coords[4];
+   const union tgsi_exec_channel *args[Elements(coords)];
+   union tgsi_exec_channel r[2];
+
+   unit = fetch_sampler_unit(mach, inst, 1);
+   dim = tgsi_util_get_texture_coord_dim(inst->Texture.Texture, NULL);
+   assert(dim <= Elements(coords));
+   /* fetch coordinates */
+   for (i = 0; i < dim; i++) {
+  FETCH([i], 0, TGSI_CHAN_X + i);
+  args[i] = [i];
+   }
+   for (i = dim; i < Elements(coords); i++) {
+  args[i] = 
+   }
+   mach->Sampler->query_lod(mach->Sampler, unit, unit,
+args[0]->f,
+args[1]->f,
+args[2]->f,
+args[3]->f,
+tgsi_sampler_lod_none,
+r[0].f,
+r[1].f);
+
+   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_X) {
+  store_dest(mach, [0], >Dst[0], inst, TGSI_CHAN_X, 
TGSI_EXEC_DATA_FLOAT);
+   }
+   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_Y) {
+  store_dest(mach, [1], >Dst[0], inst, TGSI_CHAN_Y, 
TGSI_EXEC_DATA_FLOAT);
+   }
+}

  static void
  exec_txd(struct tgsi_exec_machine *mach,
@@ -4378,6 +4416,12 @@ exec_instruction(
exec_tex(mach, inst, TEX_MODIFIER_GATHER, 2);
break;

+   case TGSI_OPCODE_LODQ:
+  /* src[0] = texcoord */
+  /* src[1] = sampler unit */
+  exec_lodq(mach, inst);
+  break;
+
 case TGSI_OPCODE_UP2H:
assert (0);
break;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h 
b/src/gallium/auxiliary/tgsi/tgsi_exec.h
index 5d56aab..556e0af 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
@@ -138,6 +138,16 @@ struct tgsi_sampler
   const int j[TGSI_QUAD_SIZE], const int k[TGSI_QUAD_SIZE],
   const int lod[TGSI_QUAD_SIZE], const int8_t offset[3],
   float rgba[TGSI_NUM_CHANNELS][TGSI_QUAD_SIZE]);
+   void (*query_lod)(struct tgsi_sampler *tgsi_sampler,


Can tgsi_sampler be const-qualified?



+ const unsigned sview_index,
+ const unsigned sampler_index,
+ const float s[TGSI_QUAD_SIZE],
+ const float t[TGSI_QUAD_SIZE],
+ const float p[TGSI_QUAD_SIZE],
+ const float c0[TGSI_QUAD_SIZE],
+ enum tgsi_sampler_control control,
+ float mipmap[TGSI_QUAD_SIZE],
+ float lod[TGSI_QUAD_SIZE]);
  };

  #define TGSI_EXEC_NUM_TEMPS   4096



Reviewed-by: Brian Paul 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH mesa v2 1/2] nv30: Fix color resolving for nv3x cards

2015-09-09 Thread Ilia Mirkin
On Wed, Sep 9, 2015 at 9:52 AM, Hans de Goede  wrote:
> We do not have a generic blitter on nv3x cards, so we must use the
> sifm object for color resolving.
>
> This commit divides the sources and dest surfaces in to tiles which
> match the constraints of the sifm object, so that color resolving
> will work properly on nv3x cards.
>
> Signed-off-by: Hans de Goede 
> ---
> Changes in v2:
> -Use 1024x1024 blocks
> -Use the sifm on both nv3x and nv4x cards instead of only on nv3x cards

Thanks, pushed.

> ---
>  src/gallium/drivers/nouveau/nv30/nv30_miptree.c | 38 
> -
>  1 file changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/nv30/nv30_miptree.c 
> b/src/gallium/drivers/nouveau/nv30/nv30_miptree.c
> index 76bb8b8..735c718 100644
> --- a/src/gallium/drivers/nouveau/nv30/nv30_miptree.c
> +++ b/src/gallium/drivers/nouveau/nv30/nv30_miptree.c
> @@ -149,14 +149,50 @@ static void
>  nv30_resource_resolve(struct nv30_context *nv30,
>const struct pipe_blit_info *info)
>  {
> +   struct nv30_miptree *src_mt = nv30_miptree(info->src.resource);
> struct nv30_rect src, dst;
> +   unsigned x, x0, x1, y, y1, w, h;
>
> define_rect(info->src.resource, 0, info->src.box.z, info->src.box.x,
>info->src.box.y, info->src.box.width, info->src.box.height, );
> define_rect(info->dst.resource, 0, info->dst.box.z, info->dst.box.x,
>info->dst.box.y, info->dst.box.width, info->dst.box.height, );
>
> -   nv30_transfer_rect(nv30, BILINEAR, , );
> +   x0 = src.x0;
> +   x1 = src.x1;
> +   y1 = src.y1;
> +
> +   /* On nv3x we must use sifm which is restricted to 1024x1024 tiles */
> +   for (y = src.y0; y < y1; y += h) {
> +  h = y1 - y;
> +  if (h > 1024)
> + h = 1024;
> +
> +  src.y0 = 0;
> +  src.y1 = h;
> +  src.h = h;
> +
> +  dst.y1 = dst.y0 + (h >> src_mt->ms_y);
> +  dst.h = h >> src_mt->ms_y;
> +
> +  for (x = x0; x < x1; x += w) {
> + w = x1 - x;
> + if (w > 1024)
> +w = 1024;
> +
> + src.offset = y * src.pitch + x * src.cpp;
> + src.x0 = 0;
> + src.x1 = w;
> + src.w = w;
> +
> + dst.offset = (y >> src_mt->ms_y) * dst.pitch +
> +  (x >> src_mt->ms_x) * dst.cpp;
> + dst.x1 = dst.x0 + (w >> src_mt->ms_x);
> + dst.w = w >> src_mt->ms_x;
> +
> + nv30_transfer_rect(nv30, BILINEAR, , );
> +  }
> +   }
>  }
>
>  void
> --
> 2.4.3
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/9] Implement textureQueryLod in softpipe

2015-09-09 Thread Brian Paul

On 09/09/2015 04:35 AM, Krzesimir Nowak wrote:

The following patches do some preparatory refactoring in softpipe's
sp_tex_sample.{c,h} before actually implementing the feature.

There is also a patch that fixes softpipe's textureLod with bias. I
caught this bug when finished the textureQueryLod implementation - one
of four textureQueryLod tests were failing because of it. I sent
patches adding the testcases with non-zero lod-bias value to piglit
mailing list
(http://lists.freedesktop.org/archives/piglit/2015-September/017018.html).

Also, this is my first contribution to mesa, but I hope I didn't make
any obvious mistakes.

Thanks.

Krzesimir Nowak (9):
   tgsi: Remove trailing backslash in comment
   softpipe: Fix textureLod with nonzero GL_TEXTURE_LOD_BIAS value.
   softpipe: Split compute_lambda_lod into two functions
   softpipe: Put mip_filter_func inside a struct
   softpipe: Split code getting a filter into separate function
   softpipe: Split 3D to 2D coords conversion into separate function.
   softpipe: Add functions for computing mipmap level
   tgsi: Add code for handling lodq opcode
   softpipe: Implement and enable textureQueryLod

  src/gallium/auxiliary/tgsi/tgsi_exec.c   |  46 +++-
  src/gallium/auxiliary/tgsi/tgsi_exec.h   |  10 +
  src/gallium/drivers/softpipe/sp_screen.c |   2 +-
  src/gallium/drivers/softpipe/sp_tex_sample.c | 338 +--
  src/gallium/drivers/softpipe/sp_tex_sample.h |  32 ++-
  5 files changed, 345 insertions(+), 83 deletions(-)



Overall, this looks really nice.  You did a good job of following the 
coding style and documenting your patches.


I just have some nitpicks on some of the patches that I'll post/reply 
on.  Basically, I think a few more things could be const-qualified and 
more comments on the new functions to explain what's going on would be 
helpful.


If you can clean up those minor things I'll commit the series for you.

Thanks!

-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH mesa v2 2/2] nv30: Disable msaa unless requested from the env by NV30_MAX_MSAA

2015-09-09 Thread Ilia Mirkin
On Wed, Sep 9, 2015 at 9:52 AM, Hans de Goede  wrote:
> Some modern apps try to use msaa without keeping in mind the
> restrictions on videomem of older cards. Resulting in dmesg saying:
>
>  [ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
>  [ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
>  [ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12
>
> Because we are running out of video memory, after which the program
> using the msaa visual freezes, and eventually the entire system freezes.
>
> To work around this we do not allow msaa visauls by default and allow
> the user to override this via NV30_MAX_MSAA.
>
> Signed-off-by: Hans de Goede 
> ---
> Changes in v2:
> -Allow re-enabling msaa by setting NV30_MAX_MSAA in the environment

Moved this to screen init and pushed. is_format_supported is called a
lot, not sure how expensive getenv() is, don't want to find out.

> ---
>  src/gallium/drivers/nouveau/nv30/nv30_screen.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
> b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
> index 7aad26b..4b77f43 100644
> --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
> +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
> @@ -319,8 +319,28 @@ nv30_screen_is_format_supported(struct pipe_screen 
> *pscreen,
>  unsigned sample_count,
>  unsigned bindings)
>  {
> -   if (sample_count > 4)
> +   unsigned int max_sample_count;
> +
> +   /*
> +* Some modern apps try to use msaa without keeping in mind the
> +* restrictions on videomem of older cards. Resulting in dmesg saying:
> +* [ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
> +* [ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
> +* [ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12
> +*
> +* Because we are running out of video memory, after which the program
> +* using the msaa visual freezes, and eventually the entire system 
> freezes.
> +*
> +* To work around this we do not allow msaa visauls by default and allow
> +* the user to override this via NV30_MAX_MSAA.
> +*/
> +   max_sample_count = debug_get_num_option("NV30_MAX_MSAA", 0);
> +   if (max_sample_count > 4)
> +  max_sample_count = 4;
> +
> +   if (sample_count > max_sample_count)
>return false;
> +
> if (!(0x0017 & (1 << sample_count)))
>return false;
>
> --
> 2.4.3
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/9] softpipe: Put mip_filter_func inside a struct

2015-09-09 Thread Brian Paul

On 09/09/2015 04:35 AM, Krzesimir Nowak wrote:

Putting this function pointer into a struct enables grouping of
several related functions in a single place. For now it is just a
single function, but the struct will be later extended with a
mip_level_func for returning mip level.
---
  src/gallium/drivers/softpipe/sp_tex_sample.c | 28 +---
  src/gallium/drivers/softpipe/sp_tex_sample.h |  5 -
  2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index 38bdc93..cd4a659 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -2515,6 +2515,12 @@ mip_filter_linear_2d_linear_repeat_POT(
 }
  }

+static struct sp_mip mip_linear = {mip_filter_linear};
+static struct sp_mip mip_nearest = {mip_filter_nearest};
+static struct sp_mip mip_none = {mip_filter_none};
+static struct sp_mip mip_none_no_filter_select = 
{mip_filter_none_no_filter_select};
+static struct sp_mip mip_linear_aniso = {mip_filter_linear_aniso};
+static struct sp_mip mip_linear_2d_linear_repeat_POT = 
{mip_filter_linear_2d_linear_repeat_POT};


Can those be const-qualified too?




  /**
   * Do shadow/depth comparisons.
@@ -2918,18 +2924,18 @@ sample_mip(struct sp_sampler_view *sp_sview,
 const struct filter_args *filt_args,
 float rgba[TGSI_NUM_CHANNELS][TGSI_QUAD_SIZE])
  {
-   mip_filter_func mip_filter;
+   struct sp_mip *mip = NULL;
 img_filter_func min_img_filter = NULL;
 img_filter_func mag_img_filter = NULL;

 if (filt_args->control == tgsi_sampler_gather) {
-  mip_filter = mip_filter_nearest;
+  mip = _nearest;
min_img_filter = get_img_filter(sp_sview, _samp->base, 
PIPE_TEX_FILTER_LINEAR, true);
 } else if (sp_sview->pot2d & sp_samp->min_mag_equal_repeat_linear) {
-  mip_filter = mip_filter_linear_2d_linear_repeat_POT;
+  mip = _linear_2d_linear_repeat_POT;
 }
 else {
-  mip_filter = sp_samp->mip_filter;
+  mip = _samp->mip;
min_img_filter = get_img_filter(sp_sview, _samp->base, 
sp_samp->min_img_filter, false);
if (sp_samp->min_mag_equal) {
   mag_img_filter = min_img_filter;
@@ -2939,8 +2945,8 @@ sample_mip(struct sp_sampler_view *sp_sview,
}
 }

-   mip_filter(sp_sview, sp_samp, min_img_filter, mag_img_filter,
-  s, t, p, c0, lod, filt_args, rgba);
+   mip->filter(sp_sview, sp_samp, min_img_filter, mag_img_filter,
+   s, t, p, c0, lod, filt_args, rgba);

 if (sp_samp->base.compare_mode != PIPE_TEX_COMPARE_NONE) {
sample_compare(sp_sview, sp_samp, s, t, p, c0, lod, filt_args->control, 
rgba);
@@ -3239,13 +3245,13 @@ softpipe_create_sampler_state(struct pipe_context *pipe,
 switch (sampler->min_mip_filter) {
 case PIPE_TEX_MIPFILTER_NONE:
if (sampler->min_img_filter == sampler->mag_img_filter)
- samp->mip_filter = mip_filter_none_no_filter_select;
+ samp->mip = mip_none_no_filter_select;
else
- samp->mip_filter = mip_filter_none;
+ samp->mip = mip_none;
break;

 case PIPE_TEX_MIPFILTER_NEAREST:
-  samp->mip_filter = mip_filter_nearest;
+  samp->mip = mip_nearest;
break;

 case PIPE_TEX_MIPFILTER_LINEAR:
@@ -3257,11 +3263,11 @@ softpipe_create_sampler_state(struct pipe_context *pipe,
sampler->max_anisotropy <= 1) {
   samp->min_mag_equal_repeat_linear = TRUE;
}
-  samp->mip_filter = mip_filter_linear;
+  samp->mip = mip_linear;

/* Anisotropic filtering extension. */
if (sampler->max_anisotropy > 1) {
- samp->mip_filter = mip_filter_linear_aniso;
+ samp->mip = mip_linear_aniso;

   /* Override min_img_filter:
* min_img_filter needs to be set to NEAREST since we need to access
diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.h 
b/src/gallium/drivers/softpipe/sp_tex_sample.h
index 7d1aafc..78541e1 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.h
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.h
@@ -128,6 +128,9 @@ struct sp_sampler_view

  };

+struct sp_mip {


I wonder if we could come up with a more descriptive name than "sp_mip". 
 Maybe something like sp_filter_funcs since it contains functions 
related to filtering?




+   mip_filter_func filter;
+};

  struct sp_sampler {
 struct pipe_sampler_state base;
@@ -144,7 +147,7 @@ struct sp_sampler {
 wrap_linear_func linear_texcoord_t;
 wrap_linear_func linear_texcoord_p;

-   mip_filter_func mip_filter;
+   struct sp_mip mip;
  };





___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] softpipe: Split compute_lambda_lod into two functions

2015-09-09 Thread Brian Paul

On 09/09/2015 04:35 AM, Krzesimir Nowak wrote:

textureQueryLod returns a vec2 with a mipmap information and a
LOD. The latter needs to be not clamped.
---
  src/gallium/drivers/softpipe/sp_tex_sample.c | 55 
  1 file changed, 39 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index 19188b0..38bdc93 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -1855,24 +1855,23 @@ compute_lod(const struct pipe_sampler_state *sampler,
  }


-/* Calculate level of detail for every fragment.
+/* Calculate level of detail for every fragment. The computed value is not
+ * clamped into lod_min and lod_max.
   * \param lod_in per-fragment lod_bias or explicit_lod.
   * \param lod results per-fragment lod.
   */
  static inline void
-compute_lambda_lod(struct sp_sampler_view *sp_sview,
-   struct sp_sampler *sp_samp,
-   const float s[TGSI_QUAD_SIZE],
-   const float t[TGSI_QUAD_SIZE],
-   const float p[TGSI_QUAD_SIZE],
-   const float lod_in[TGSI_QUAD_SIZE],
-   enum tgsi_sampler_control control,
-   float lod[TGSI_QUAD_SIZE])
+compute_lambda_lod_not_clamped(struct sp_sampler_view *sp_sview,


Not a big deal, but we use "unclamped" in various other places in Gallium.

Patches 1-3: Reviewed-by: Brian Paul 



+   struct sp_sampler *sp_samp,
+   const float s[TGSI_QUAD_SIZE],
+   const float t[TGSI_QUAD_SIZE],
+   const float p[TGSI_QUAD_SIZE],
+   const float lod_in[TGSI_QUAD_SIZE],
+   enum tgsi_sampler_control control,
+   float lod[TGSI_QUAD_SIZE])
  {
 const struct pipe_sampler_state *sampler = _samp->base;
-   float lod_bias = sampler->lod_bias;
-   float min_lod = sampler->min_lod;
-   float max_lod = sampler->max_lod;
+   const float lod_bias = sampler->lod_bias;
 float lambda;
 uint i;

@@ -1881,24 +1880,23 @@ compute_lambda_lod(struct sp_sampler_view *sp_sview,
/* XXX FIXME */
 case tgsi_sampler_derivs_explicit:
lambda = sp_sview->compute_lambda(sp_sview, s, t, p) + lod_bias;
-  lod[0] = lod[1] = lod[2] = lod[3] = CLAMP(lambda, min_lod, max_lod);
+  lod[0] = lod[1] = lod[2] = lod[3] = lambda;
break;
 case tgsi_sampler_lod_bias:
lambda = sp_sview->compute_lambda(sp_sview, s, t, p) + lod_bias;
for (i = 0; i < TGSI_QUAD_SIZE; i++) {
   lod[i] = lambda + lod_in[i];
- lod[i] = CLAMP(lod[i], min_lod, max_lod);
}
break;
 case tgsi_sampler_lod_explicit:
for (i = 0; i < TGSI_QUAD_SIZE; i++) {
- lod[i] = CLAMP(lod_in[i] + lod_bias, min_lod, max_lod);
+ lod[i] = lod_in[i] + lod_bias;
}
break;
 case tgsi_sampler_lod_zero:
 case tgsi_sampler_gather:
/* this is all static state in the sampler really need clamp here? */
-  lod[0] = lod[1] = lod[2] = lod[3] = CLAMP(lod_bias, min_lod, max_lod);
+  lod[0] = lod[1] = lod[2] = lod[3] = lod_bias;
break;
 default:
assert(0);
@@ -1906,6 +1904,31 @@ compute_lambda_lod(struct sp_sampler_view *sp_sview,
 }
  }

+/* Calculate level of detail for every fragment.
+ * \param lod_in per-fragment lod_bias or explicit_lod.
+ * \param lod results per-fragment lod.
+ */
+static inline void
+compute_lambda_lod(struct sp_sampler_view *sp_sview,
+   struct sp_sampler *sp_samp,
+   const float s[TGSI_QUAD_SIZE],
+   const float t[TGSI_QUAD_SIZE],
+   const float p[TGSI_QUAD_SIZE],
+   const float lod_in[TGSI_QUAD_SIZE],
+   enum tgsi_sampler_control control,
+   float lod[TGSI_QUAD_SIZE])
+{
+   const struct pipe_sampler_state *sampler = _samp->base;
+   const float min_lod = sampler->min_lod;
+   const float max_lod = sampler->max_lod;
+   int i;
+
+   compute_lambda_lod_not_clamped(sp_sview, sp_samp, s, t, p, lod_in, control, 
lod);
+   for (i = 0; i < TGSI_QUAD_SIZE; i++) {
+  lod[i] = CLAMP(lod[i], min_lod, max_lod);
+   }
+}
+
  static inline unsigned
  get_gather_component(const float lod_in[TGSI_QUAD_SIZE])
  {



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/9] softpipe: Split code getting a filter into separate function

2015-09-09 Thread Brian Paul

On 09/09/2015 04:35 AM, Krzesimir Nowak wrote:

This function will be later used by textureQueryLod. The
img_filter_func are optional, because textureQueryLod will not need
them.
---
  src/gallium/drivers/softpipe/sp_tex_sample.c | 53 +++-
  1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index cd4a659..0a3fc20 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -2912,6 +2912,42 @@ get_img_filter(const struct sp_sampler_view *sp_sview,
 }
  }

+/**
+ * Get mip filter, and optionally both img min filter and img mag filter
+ */
+static void
+get_filters(struct sp_sampler_view *sp_sview,
+struct sp_sampler *sp_samp,
+enum tgsi_sampler_control control,
+struct sp_mip **mip,
+img_filter_func *min,
+img_filter_func *mag)
+{
+   assert(mip);
+   if (control == tgsi_sampler_gather) {
+  *mip = _nearest;
+  if (min)
+ *min = get_img_filter(sp_sview, _samp->base, 
PIPE_TEX_FILTER_LINEAR, true);
+   } else if (sp_sview->pot2d & sp_samp->min_mag_equal_repeat_linear) {
+  *mip = _linear_2d_linear_repeat_POT;
+   }
+   else {
+  *mip = _samp->mip;
+  if (min || mag) {
+ img_filter_func tmp_filter = get_img_filter(sp_sview, _samp->base, 
sp_samp->min_img_filter, false);
+ if (min)
+*min = tmp_filter;
+ if (mag) {
+if (sp_samp->min_mag_equal) {
+   *mag = tmp_filter;
+}
+else {
+   *mag = get_img_filter(sp_sview, _samp->base, 
sp_samp->base.mag_img_filter, false);
+}
+ }
+  }
+   }
+}

  static void
  sample_mip(struct sp_sampler_view *sp_sview,
@@ -2928,22 +2964,7 @@ sample_mip(struct sp_sampler_view *sp_sview,
 img_filter_func min_img_filter = NULL;
 img_filter_func mag_img_filter = NULL;

-   if (filt_args->control == tgsi_sampler_gather) {
-  mip = _nearest;
-  min_img_filter = get_img_filter(sp_sview, _samp->base, 
PIPE_TEX_FILTER_LINEAR, true);
-   } else if (sp_sview->pot2d & sp_samp->min_mag_equal_repeat_linear) {
-  mip = _linear_2d_linear_repeat_POT;
-   }
-   else {
-  mip = _samp->mip;
-  min_img_filter = get_img_filter(sp_sview, _samp->base, 
sp_samp->min_img_filter, false);
-  if (sp_samp->min_mag_equal) {
- mag_img_filter = min_img_filter;
-  }
-  else {
- mag_img_filter = get_img_filter(sp_sview, _samp->base, 
sp_samp->base.mag_img_filter, false);
-  }
-   }
+   get_filters(sp_sview, sp_samp, filt_args->control, , _img_filter, 
_img_filter);

 mip->filter(sp_sview, sp_samp, min_img_filter, mag_img_filter,
 s, t, p, c0, lod, filt_args, rgba);



Reviewed-by: Brian Paul 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC libdrm] intel: Add support for softpin

2015-09-09 Thread Michał Winiarski
Softpin allows userspace to take greater control of GPU virtual address
space and eliminates the need of relocations. It can also be used to
mirror addresses between GPU and CPU (shared virtual memory).
Calls to drm_intel_bo_emit_reloc are still required to build the list of
drm_i915_gem_exec_objects at exec time, but no entries in relocs are
created. Self-relocs don't make any sense for softpinned objects and can
indicate a programming errors, thus are forbidden. Softpinned objects
are marked by asterisk in debug dumps.

Cc: Thomas Daniel 
Cc: Kristian Høgsberg 
Cc: Zou Nanhai 
Cc: Michel Thierry 
Cc: Ben Widawsky 
Cc: Chris Wilson 
Signed-off-by: Michał Winiarski 
---
 include/drm/i915_drm.h|   4 +-
 intel/intel_bufmgr.c  |   9 +++
 intel/intel_bufmgr.h  |   1 +
 intel/intel_bufmgr_gem.c  | 176 --
 intel/intel_bufmgr_priv.h |   7 ++
 5 files changed, 173 insertions(+), 24 deletions(-)

diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index ded43b1..2b99fc6 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -350,6 +350,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_REVISION  32
 #define I915_PARAM_SUBSLICE_TOTAL   33
 #define I915_PARAM_EU_TOTAL 34
+#define I915_PARAM_HAS_EXEC_SOFTPIN 37
 
 typedef struct drm_i915_getparam {
int param;
@@ -680,7 +681,8 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
 #define EXEC_OBJECT_NEEDS_GTT  (1<<1)
 #define EXEC_OBJECT_WRITE  (1<<2)
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
+#define EXEC_OBJECT_PINNED (1<<3)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PINNED<<1)
__u64 flags;
 
__u64 rsvd1;
diff --git a/intel/intel_bufmgr.c b/intel/intel_bufmgr.c
index 14ea9f9..bd92335 100644
--- a/intel/intel_bufmgr.c
+++ b/intel/intel_bufmgr.c
@@ -261,6 +261,15 @@ drm_intel_bo_get_tiling(drm_intel_bo *bo, uint32_t * 
tiling_mode,
 }
 
 int
+drm_intel_bo_set_softpin_offset(drm_intel_bo *bo, uint64_t offset)
+{
+   if (bo->bufmgr->bo_set_softpin_offset)
+   return bo->bufmgr->bo_set_softpin_offset(bo, offset);
+
+   return -ENODEV;
+}
+
+int
 drm_intel_bo_disable_reuse(drm_intel_bo *bo)
 {
if (bo->bufmgr->bo_disable_reuse)
diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index 95eecb8..62ea4a0 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -164,6 +164,7 @@ int drm_intel_bo_get_tiling(drm_intel_bo *bo, uint32_t * 
tiling_mode,
 int drm_intel_bo_flink(drm_intel_bo *bo, uint32_t * name);
 int drm_intel_bo_busy(drm_intel_bo *bo);
 int drm_intel_bo_madvise(drm_intel_bo *bo, int madv);
+int drm_intel_bo_set_softpin_offset(drm_intel_bo *bo, uint64_t offset);
 
 int drm_intel_bo_disable_reuse(drm_intel_bo *bo);
 int drm_intel_bo_is_reusable(drm_intel_bo *bo);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index 2723e21..615741c 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -184,6 +184,13 @@ struct _drm_intel_bo_gem {
drm_intel_reloc_target *reloc_target_info;
/** Number of entries in relocs */
int reloc_count;
+   /** Array of BOs that are referenced by this buffer and will be 
softpinned */
+   drm_intel_bo **softpin_target;
+   /** Number softpinned BOs that are referenced by this buffer */
+   int softpin_target_count;
+   /** Maximum amount of softpinned BOs that are referenced by this buffer 
*/
+   int softpin_target_size;
+
/** Mapped address for the buffer, saved across map/unmap cycles */
void *mem_virtual;
/** GTT virtual address for the buffer, saved across map/unmap cycles */
@@ -237,6 +244,11 @@ struct _drm_intel_bo_gem {
bool is_userptr;
 
/**
+* Whether this buffer is softpinned at offset specified by the user
+*/
+   bool is_softpin;
+
+   /**
 * Size in bytes of this buffer and its relocation descendents.
 *
 * Used to avoid costly tree walking in
@@ -384,8 +396,9 @@ drm_intel_gem_dump_validation_list(drm_intel_bufmgr_gem 
*bufmgr_gem)
drm_intel_bo *bo = bufmgr_gem->exec_bos[i];
drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
 
-   if (bo_gem->relocs == NULL) {
-   DBG("%2d: %d (%s)\n", i, bo_gem->gem_handle,
+   if (bo_gem->relocs == NULL && bo_gem->softpin_target == NULL) {
+   DBG("%2d: %d %s(%s)\n", i, bo_gem->gem_handle,
+   bo_gem->is_softpin ? "*" : "",
bo_gem->name);
continue;
}
@@ -395,16 +408,33 @@ drm_intel_gem_dump_validation_list(drm_intel_bufmgr_gem 
*bufmgr_gem)

[Mesa-dev] [PATCH 1/2] drm/i915/gtt: Allow adventurous users to select enable_ppgtt=3

2015-09-09 Thread Michał Winiarski
While support for 48b ppgtt is here, parameter enabling it is not known
to the sanitize function. Let's update it to allow selecting
full_48bit_ppgtt using module parameter.

Cc: Michel Thierry 
Cc: Mika Kuoppala 
Signed-off-by: Michał Winiarski 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 8786281..f598d63 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -104,9 +104,11 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, 
int enable_ppgtt)
 {
bool has_aliasing_ppgtt;
bool has_full_ppgtt;
+   bool has_full_48bit_ppgtt;
 
has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
+   has_full_48bit_ppgtt = INTEL_INFO(dev)->gen >= 9 || IS_BROADWELL(dev);
 
if (intel_vgpu_active(dev))
has_full_ppgtt = false; /* emulation is too hard */
@@ -125,6 +127,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, 
int enable_ppgtt)
if (enable_ppgtt == 2 && has_full_ppgtt)
return 2;
 
+   if (enable_ppgtt == 3 && has_full_48bit_ppgtt)
+   return 3;
+
 #ifdef CONFIG_INTEL_IOMMU
/* Disable ppgtt on SNB if VT-d is on. */
if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC libdrm] intel: Softpin support

2015-09-09 Thread Michał Winiarski
The goal of this series is to start a discussion whether the interface and
implementation of softpin support proposed for libdrm is acceptable by all
interested parties.
The i915 patches are added so that it's easy to apply the series on latest
drm-intel-nightly and libdrm master and start using softpin.

Michał Winiarski (1):
  drm/i915/gtt: Allow adventurous users to select enable_ppgtt=3
is not strictly necessary for softpin itself, but it is needed for SVM usecase.

Chris Wilson (1):
  drm/i915: Add soft-pinning API for execbuffer
is just v5 from Thomas Daniel rebased on top of current nightly.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v6 2/2] drm/i915: Add soft-pinning API for execbuffer

2015-09-09 Thread Michał Winiarski
From: Chris Wilson 

Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.

v2: Fixed incorrect eviction found by Michal Winiarski - fix suggested by Chris
Wilson.  Fixed incorrect error paths causing crash found by Michal Winiarski.
(Not published externally)

v3: Rebased because of trivial conflict in object_bind_to_vm.  Fixed eviction
to allow eviction of soft-pinned objects when another soft-pinned object used
by a subsequent execbuffer overlaps reported by Michal Winiarski.
(Not published externally)

v4: Moved soft-pinned objects to the front of ordered_vmas so that they are
pinned first after an address conflict happens to avoid repeated conflicts in
rare cases (Suggested by Chris Wilson).  Expanded comment on
drm_i915_gem_exec_object2.offset to cover this new API.

v5: Added I915_PARAM_HAS_EXEC_SOFTPIN parameter for detecting this capability
(Kristian). Added check for multiple pinnings on eviction (Akash). Made sure
buffers are not considered misplaced without the user specifying
EXEC_OBJECT_SUPPORTS_48BBADDRESS.  User must assume responsibility for any
addressing workarounds.  Updated object2.offset field comment again to clarify
NO_RELOC case (Chris).  checkpatch cleanup.

v6: Rebase on top of current nightly. Dropped any references to
EXEC_OBJECT_SUPPORTS_48BBADDRESS since those don't exist in upstream
yet, and are not a dependency.

Cc: Chris Wilson 
Cc: Akash Goel 
Cc: Vinay Belgaumkar 
Cc: Michal Winiarski 
Cc: Zou Nanhai 
Cc: Kristian Høgsberg 
Signed-off-by: Chris Wilson 
Signed-off-by: Thomas Daniel 
Signed-off-by: Michał Winiarski 
---
 drivers/gpu/drm/i915/i915_dma.c|  3 ++
 drivers/gpu/drm/i915/i915_drv.h|  3 ++
 drivers/gpu/drm/i915/i915_gem.c| 52 ++
 drivers/gpu/drm/i915/i915_gem_evict.c  | 39 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 16 +++--
 include/uapi/drm/i915_drm.h| 10 +-
 6 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 066a0ef..bd619af 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
case I915_PARAM_HAS_RESOURCE_STREAMER:
value = HAS_RESOURCE_STREAMER(dev);
break;
+   case I915_PARAM_HAS_EXEC_SOFTPIN:
+   value = 1;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2ef83c5..8eb01e0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2815,6 +2815,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_OFFSET_BIAS(1<<3)
 #define PIN_USER   (1<<4)
 #define PIN_UPDATE (1<<5)
+#define PIN_OFFSET_FIXED   (1<<6)
 #define PIN_OFFSET_MASK (~4095)
 int __must_check
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
@@ -3157,6 +3158,8 @@ int __must_check i915_gem_evict_something(struct 
drm_device *dev,
  unsigned long start,
  unsigned long end,
  unsigned flags);
+int __must_check
+i915_gem_evict_for_vma(struct i915_vma *target);
 int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
 int i915_gem_evict_everything(struct drm_device *dev);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 41263cd..bb2ff81 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3437,22 +3437,42 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
*obj,
if (IS_ERR(vma))
goto err_unpin;
 
+   if (flags & PIN_OFFSET_FIXED) {
+   uint64_t offset = flags & PIN_OFFSET_MASK;
+
+   if (offset & (alignment - 1)) {
+   ret = -EINVAL;
+   goto err_free_vma;
+   }
+   vma->node.start = offset;
+   vma->node.size = size;
+   vma->node.color = obj->cache_level;
+   ret = drm_mm_reserve_node(>mm, >node);
+   if (ret) {
+   ret = i915_gem_evict_for_vma(vma);
+   if (ret == 0)
+  

Re: [Mesa-dev] [PATCH 7/9] softpipe: Add functions for computing mipmap level

2015-09-09 Thread Brian Paul

On 09/09/2015 04:35 AM, Krzesimir Nowak wrote:

These functions will be used by textureQueryLod.
---
  src/gallium/drivers/softpipe/sp_tex_sample.c | 100 +--
  src/gallium/drivers/softpipe/sp_tex_sample.h |   7 ++
  2 files changed, 101 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index cdec984..6e639e0 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -1937,6 +1937,38 @@ get_gather_component(const float lod_in[TGSI_QUAD_SIZE])
  }

  static void
+clamp_lod(const struct sp_sampler_view *sp_sview,
+  const struct sp_sampler *sp_samp,
+  const float lod[TGSI_QUAD_SIZE],
+  float clamped[TGSI_QUAD_SIZE])


Can you add a comment on this function too?  We also have 
compute_lambda_lod() which does lod clamping so the comments could help 
to explain what's different between them.




+{
+   const float min_lod = sp_samp->base.min_lod;
+   const float max_lod = sp_samp->base.max_lod;
+   const float min_level = sp_sview->base.u.tex.first_level;
+   const float max_level = sp_sview->base.u.tex.last_level;
+   int i;
+
+   for (i = 0; i < TGSI_QUAD_SIZE; i++) {
+  float cl = lod[i];
+
+  cl = CLAMP(cl, min_lod, max_lod);
+  /* XXX: Is min_level ever different from 0?
+   */
+  cl = CLAMP(cl, 0, max_level - min_level);
+  clamped[i] = cl;
+   }
+}
+
+static void
+mip_level_linear(struct sp_sampler_view *sp_sview,
+ struct sp_sampler *sp_samp,
+ const float lod[TGSI_QUAD_SIZE],
+ float level[TGSI_QUAD_SIZE])
+{
+   clamp_lod(sp_sview, sp_samp, lod, level);
+}
+
+static void
  mip_filter_linear(struct sp_sampler_view *sp_sview,
struct sp_sampler *sp_samp,
img_filter_func min_filter,
@@ -1998,6 +2030,23 @@ mip_filter_linear(struct sp_sampler_view *sp_sview,
  }


+static void
+mip_level_nearest(struct sp_sampler_view *sp_sview,
+  struct sp_sampler *sp_samp,
+  const float lod[TGSI_QUAD_SIZE],
+  float level[TGSI_QUAD_SIZE])
+{
+   const int first_level = sp_sview->base.u.tex.first_level;
+   int j;
+
+   clamp_lod(sp_sview, sp_samp, lod, level);
+   for (j = 0; j < TGSI_QUAD_SIZE; j++)
+  /* TODO: It should rather be:
+   * level[j] = first_level + ceil(level[j] + 0.5F) - 1.0F;
+   */
+  level[j] = first_level + (int)(level[j] + 0.5F);
+}
+
  /**
   * Compute nearest mipmap level from texcoords.
   * Then sample the texture level for four elements of a quad.
@@ -2050,6 +2099,19 @@ mip_filter_nearest(struct sp_sampler_view *sp_sview,


  static void
+mip_level_none(struct sp_sampler_view *sp_sview,
+   struct sp_sampler *sp_samp,
+   const float lod[TGSI_QUAD_SIZE],
+   float level[TGSI_QUAD_SIZE])
+{
+   int j;
+
+   for (j = 0; j < TGSI_QUAD_SIZE; j++) {
+  level[j] = sp_sview->base.u.tex.first_level;
+   }
+}
+
+static void
  mip_filter_none(struct sp_sampler_view *sp_sview,
  struct sp_sampler *sp_samp,
  img_filter_func min_filter,
@@ -2088,6 +2150,15 @@ mip_filter_none(struct sp_sampler_view *sp_sview,


  static void
+mip_level_none_no_filter_select(struct sp_sampler_view *sp_sview,
+struct sp_sampler *sp_samp,
+const float lod[TGSI_QUAD_SIZE],
+float level[TGSI_QUAD_SIZE])
+{
+   mip_level_none(sp_sview, sp_samp, lod, level);
+}
+
+static void
  mip_filter_none_no_filter_select(struct sp_sampler_view *sp_sview,
   struct sp_sampler *sp_samp,
   img_filter_func min_filter,
@@ -2339,6 +2410,15 @@ img_filter_2d_ewa(struct sp_sampler_view *sp_sview,
  }


+static void
+mip_level_linear_aniso(struct sp_sampler_view *sp_sview,
+   struct sp_sampler *sp_samp,
+   const float lod[TGSI_QUAD_SIZE],
+   float level[TGSI_QUAD_SIZE])
+{
+   mip_level_linear(sp_sview, sp_samp, lod, level);
+}
+
  /**
   * Sample 2D texture using an anisotropic filter.
   */
@@ -2450,6 +2530,14 @@ mip_filter_linear_aniso(struct sp_sampler_view *sp_sview,
 }
  }

+static void
+mip_level_linear_2d_linear_repeat_POT(struct sp_sampler_view *sp_sview,
+  struct sp_sampler *sp_samp,
+  const float lod[TGSI_QUAD_SIZE],
+  float level[TGSI_QUAD_SIZE])
+{
+   mip_level_linear(sp_sview, sp_samp, lod, level);
+}

  /**
   * Specialized version of mip_filter_linear with hard-wired calls to
@@ -2515,12 +2603,12 @@ mip_filter_linear_2d_linear_repeat_POT(
 }
  }

-static struct sp_mip mip_linear = {mip_filter_linear};
-static struct sp_mip mip_nearest = {mip_filter_nearest};
-static 

Re: [Mesa-dev] [PATCH 6/9] softpipe: Split 3D to 2D coords conversion into separate function.

2015-09-09 Thread Brian Paul

On 09/09/2015 04:35 AM, Krzesimir Nowak wrote:

This is to avoid tying the conversion to sampling - textureQueryLod
will need to do the conversion too, but it does not do any sampling.

So instead of a "sample" vfunc, there is a "convert" vfunc. The
drawback of this approach is that a "noop" convert copies 3 arrays of
floats. Would be nice to avoid it in some clean way.
---
  src/gallium/drivers/softpipe/sp_tex_sample.c | 73 +---
  src/gallium/drivers/softpipe/sp_tex_sample.h | 20 
  2 files changed, 54 insertions(+), 39 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index 0a3fc20..cdec984 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -2981,29 +2981,36 @@ sample_mip(struct sp_sampler_view *sp_sview,

  }

+static void
+convert_noop(struct sp_sampler_view *sp_sview,
+ struct sp_sampler *sp_samp,
+ const float s[TGSI_QUAD_SIZE],
+ const float t[TGSI_QUAD_SIZE],
+ const float p[TGSI_QUAD_SIZE],
+ const float c0[TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE])
+{
+   const size_t len = sizeof(float) * TGSI_QUAD_SIZE;
+
+   memcpy(, s, len);
+   memcpy(, t, len);
+   memcpy(, p, len);
+}

-/**
- * Use 3D texcoords to choose a cube face, then sample the 2D cube faces.
- * Put face info into the sampler faces[] array.
- */
  static void
-sample_cube(struct sp_sampler_view *sp_sview,
-struct sp_sampler *sp_samp,
-const float s[TGSI_QUAD_SIZE],
-const float t[TGSI_QUAD_SIZE],
-const float p[TGSI_QUAD_SIZE],
-const float c0[TGSI_QUAD_SIZE],
-const float c1[TGSI_QUAD_SIZE],
-const struct filter_args *filt_args,
-float rgba[TGSI_NUM_CHANNELS][TGSI_QUAD_SIZE])
+convert_cube(struct sp_sampler_view *sp_sview,
+ struct sp_sampler *sp_samp,
+ const float s[TGSI_QUAD_SIZE],
+ const float t[TGSI_QUAD_SIZE],
+ const float p[TGSI_QUAD_SIZE],
+ const float c0[TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE],
+ float [TGSI_QUAD_SIZE])


Could you put some comments on top of these new functions to explain 
what they're doing?


Otherwise,
Reviewed-by: Brian Paul 



  {
 unsigned j;
-   float [4], [4];
-
-   /* Not actually used, but the intermediate steps that do the
-* dereferencing don't know it.
-*/
-   static float [4] = { 0, 0, 0, 0 };

 [0] = c0[0];
 [1] = c0[1];
@@ -3071,8 +3078,6 @@ sample_cube(struct sp_sampler_view *sp_sview,
   }
}
 }
-
-   sample_mip(sp_sview, sp_samp, , , , c0, c1, filt_args, rgba);
  }


@@ -3393,9 +3398,9 @@ softpipe_create_sampler_view(struct pipe_context *pipe,

if (view->target == PIPE_TEXTURE_CUBE ||
view->target == PIPE_TEXTURE_CUBE_ARRAY)
- sview->get_samples = sample_cube;
+ sview->convert_coords = convert_cube;
else {
- sview->get_samples = sample_mip;
+ sview->convert_coords = convert_noop;
}
sview->pot2d = spr->pot &&
   (view->target == PIPE_TEXTURE_2D ||
@@ -3440,13 +3445,22 @@ sp_tgsi_get_samples(struct tgsi_sampler *tgsi_sampler,
  enum tgsi_sampler_control control,
  float rgba[TGSI_NUM_CHANNELS][TGSI_QUAD_SIZE])
  {
-   struct sp_tgsi_sampler *sp_samp = (struct sp_tgsi_sampler *)tgsi_sampler;
+   struct sp_tgsi_sampler *sp_tgsi_samp = (struct sp_tgsi_sampler 
*)tgsi_sampler;
+   struct sp_sampler_view *sp_sview;
+   struct sp_sampler *sp_samp;
 struct filter_args filt_args;
+   float cs[TGSI_QUAD_SIZE];
+   float ct[TGSI_QUAD_SIZE];
+   float cp[TGSI_QUAD_SIZE];
+
 assert(sview_index < PIPE_MAX_SHADER_SAMPLER_VIEWS);
 assert(sampler_index < PIPE_MAX_SAMPLERS);
-   assert(sp_samp->sp_sampler[sampler_index]);
+   assert(sp_tgsi_samp->sp_sampler[sampler_index]);
+
+   sp_sview = _tgsi_samp->sp_sview[sview_index];
+   sp_samp = sp_tgsi_samp->sp_sampler[sampler_index];
 /* always have a view here but texture is NULL if no sampler view was set. 
*/
-   if (!sp_samp->sp_sview[sview_index].base.texture) {
+   if (!sp_sview->base.texture) {
int i, j;
for (j = 0; j < TGSI_NUM_CHANNELS; j++) {
   for (i = 0; i < TGSI_QUAD_SIZE; i++) {
@@ -3456,11 +3470,12 @@ sp_tgsi_get_samples(struct tgsi_sampler *tgsi_sampler,
return;
 }

+   sp_sview->convert_coords(sp_sview, sp_samp, s, t, p, c0, cs, ct, cp);
+
 filt_args.control = control;
 filt_args.offset = offset;
-   sp_samp->sp_sview[sview_index].get_samples(_samp->sp_sview[sview_index],
- 

Re: [Mesa-dev] [Mesa-dev, 3/5] i965/fs: Set first_non_payload_grf in assign_curb_setup

2015-09-09 Thread Kristian Høgsberg
Jordan Justen  writes:

> first_non_payload_grf may be updated in assign_urb_setup for FS or
> assign_vs_urb_setup for VS.

Patch looks good, but I think we should clarify in the commit message
that we're doing this so that compute will get first_non_payload_grf set
up in assign_curb_setup().

Reviewed-by: Kristian Høgsberg 

> Signed-off-by: Jordan Justen 
>
> ---
> src/mesa/drivers/dri/i965/brw_fs.cpp | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 4292aa6..39ed89a 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1324,6 +1324,9 @@ fs_visitor::assign_curb_setup()
>}
>}
> }
> +
> +   /* This may be updated in assign_urb_setup or assign_vs_urb_setup. */
> +   this->first_non_payload_grf = payload.num_regs + 
> prog_data->curb_read_length;
>  }
>  
>  void
> @@ -1438,8 +1441,7 @@ fs_visitor::assign_urb_setup()
> }
>  
> /* Each attribute is 4 setup channels, each of which is half a reg. */
> -   this->first_non_payload_grf =
> -  urb_start + prog_data->num_varying_inputs * 2;
> +   this->first_non_payload_grf += prog_data->num_varying_inputs * 2;
>  }
>  
>  void
> @@ -1454,8 +1456,7 @@ fs_visitor::assign_vs_urb_setup()
>count++;
>  
> /* Each attribute is 4 regs. */
> -   this->first_non_payload_grf =
> -  payload.num_regs + prog_data->curb_read_length + count * 4;
> +   this->first_non_payload_grf += count * 4;
>  
> unsigned vue_entries =
>MAX2(count, vs_prog_data->base.vue_map.num_slots);
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] gallium/ttn: Simplify control flow stack handling.

2015-09-09 Thread Kenneth Graunke
Instead of storing insertion points, we simply store a single stack of
control flow nodes representing the nir_if / nir_loop we're currently
filling in.

When handling ELSE, we can simply switch to the current nir_if's
else_list.  When handling ENDIF or ENDLOOP, we move the cursor to
after the current if/loop, and pop it off since we're done.

This has a few advantages:
- We only need one stack entry for IF/ELSE/ENDIF instead of two.
- Each stack entry is sizeof(nir_cf_node *), which is smaller than
  sizeof(nir_cursor) (this is how it used to be).
- We only need one stack, not two.

Lightly tested with ir3_compiler - both TGSI programs Rob gave me
generate identical code before and after this patch.

Cc: Rob Clark 
Cc: Eric Anholt 
Signed-off-by: Kenneth Graunke 
---
 src/gallium/auxiliary/nir/tgsi_to_nir.c | 53 -
 1 file changed, 19 insertions(+), 34 deletions(-)

diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c 
b/src/gallium/auxiliary/nir/tgsi_to_nir.c
index 560..41aca6b 100644
--- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
+++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
@@ -65,25 +65,13 @@ struct ttn_compile {
nir_register *addr_reg;
 
/**
-* Stack of nir_cursors where instructions should be pushed as we pop
-* back out of the control flow stack.
+* Stack of CF nodes.
 *
-* For each IF/ELSE/ENDIF block, if_stack[if_stack_pos] has where the else
-* instructions should be placed, and if_stack[if_stack_pos - 1] has where
-* the next instructions outside of the if/then/else block go.
+* The top element stores the nir_if or nir_loop that we're currently
+* emitting code for.
 */
-   nir_cursor *if_stack;
-   unsigned if_stack_pos;
-
-   /**
-* Stack of nir_cursors where instructions should be pushed as we pop
-* back out of the control flow stack.
-*
-* loop_stack[loop_stack_pos - 1] contains the cf_node_list for the outside
-* of the loop.
-*/
-   nir_cursor *loop_stack;
-   unsigned loop_stack_pos;
+   nir_cf_node **cf_stack;
+   unsigned cf_stack_pos;
 
/* How many TGSI_FILE_IMMEDIATE vec4s have been parsed so far. */
unsigned next_imm;
@@ -931,13 +919,10 @@ ttn_if(struct ttn_compile *c, nir_ssa_def *src, bool 
is_uint)
}
nir_builder_cf_insert(b, _stmt->cf_node);
 
-   c->if_stack[c->if_stack_pos] = nir_after_cf_node(_stmt->cf_node);
-   c->if_stack_pos++;
-
b->cursor = nir_after_cf_list(_stmt->then_list);
 
-   c->if_stack[c->if_stack_pos] = nir_after_cf_list(_stmt->else_list);
-   c->if_stack_pos++;
+   c->cf_stack[c->cf_stack_pos] = _stmt->cf_node;
+   c->cf_stack_pos++;
 }
 
 static void
@@ -945,7 +930,8 @@ ttn_else(struct ttn_compile *c)
 {
nir_builder *b = >build;
 
-   b->cursor = c->if_stack[c->if_stack_pos - 1];
+   nir_if *if_node = nir_cf_node_as_if(c->cf_stack[c->cf_stack_pos - 1]);
+   b->cursor = nir_after_cf_list(_node->else_list);
 }
 
 static void
@@ -953,8 +939,8 @@ ttn_endif(struct ttn_compile *c)
 {
nir_builder *b = >build;
 
-   c->if_stack_pos -= 2;
-   b->cursor = c->if_stack[c->if_stack_pos];
+   c->cf_stack_pos--;
+   b->cursor = nir_after_cf_node(c->cf_stack[c->cf_stack_pos]);
 }
 
 static void
@@ -965,8 +951,8 @@ ttn_bgnloop(struct ttn_compile *c)
nir_loop *loop = nir_loop_create(b->shader);
nir_builder_cf_insert(b, >cf_node);
 
-   c->loop_stack[c->loop_stack_pos] = nir_after_cf_node(>cf_node);
-   c->loop_stack_pos++;
+   c->cf_stack[c->cf_stack_pos] = >cf_node;
+   c->cf_stack_pos++;
 
b->cursor = nir_after_cf_list(>body);
 }
@@ -990,8 +976,8 @@ ttn_endloop(struct ttn_compile *c)
 {
nir_builder *b = >build;
 
-   c->loop_stack_pos--;
-   b->cursor = c->loop_stack[c->loop_stack_pos];
+   c->cf_stack_pos--;
+   b->cursor = nir_after_cf_node(c->cf_stack[c->cf_stack_pos]);
 }
 
 static void
@@ -1817,11 +1803,10 @@ tgsi_to_nir(const void *tgsi_tokens,
c->num_samp_types = scan.file_max[TGSI_FILE_SAMPLER_VIEW] + 1;
c->samp_types = rzalloc_array(c, nir_alu_type, c->num_samp_types);
 
-   c->if_stack = rzalloc_array(c, nir_cursor,
-   (scan.opcode_count[TGSI_OPCODE_IF] +
-scan.opcode_count[TGSI_OPCODE_UIF]) * 2);
-   c->loop_stack = rzalloc_array(c, nir_cursor,
- scan.opcode_count[TGSI_OPCODE_BGNLOOP]);
+   c->cf_stack = rzalloc_array(c, nir_cf_node *,
+   scan.opcode_count[TGSI_OPCODE_IF] +
+   scan.opcode_count[TGSI_OPCODE_UIF] +
+   scan.opcode_count[TGSI_OPCODE_BGNLOOP]);
 
ret = tgsi_parse_init(, tgsi_tokens);
assert(ret == TGSI_PARSE_OK);
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] tgsi/scan: add support to figure out max nesting depth

2015-09-09 Thread Rob Clark
From: Rob Clark 

Sometimes a useful thing for compilers (or, for example, tgsi_to_nir) to
know.  And pretty trivial for scan to figure this out for us.

Signed-off-by: Rob Clark 
---
 src/gallium/auxiliary/tgsi/tgsi_scan.c | 16 
 src/gallium/auxiliary/tgsi/tgsi_scan.h |  5 +
 2 files changed, 21 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c 
b/src/gallium/auxiliary/tgsi/tgsi_scan.c
index 9810b54..66306d7 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
@@ -56,6 +56,7 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
 {
uint procType, i;
struct tgsi_parse_context parse;
+   unsigned current_depth = 0;
 
memset(info, 0, sizeof(*info));
for (i = 0; i < TGSI_FILE_COUNT; i++)
@@ -100,6 +101,21 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
 assert(fullinst->Instruction.Opcode < TGSI_OPCODE_LAST);
 info->opcode_count[fullinst->Instruction.Opcode]++;
 
+switch (fullinst->Instruction.Opcode) {
+case TGSI_OPCODE_IF:
+case TGSI_OPCODE_UIF:
+case TGSI_OPCODE_BGNLOOP:
+   current_depth++;
+   info->max_depth = MAX2(info->max_depth, current_depth);
+   break;
+case TGSI_OPCODE_ENDIF:
+case TGSI_OPCODE_ENDLOOP:
+   current_depth--;
+   break;
+default:
+   break;
+}
+
 if (fullinst->Instruction.Opcode >= TGSI_OPCODE_F2D &&
 fullinst->Instruction.Opcode <= TGSI_OPCODE_DSSG)
info->uses_doubles = true;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.h 
b/src/gallium/auxiliary/tgsi/tgsi_scan.h
index c5900bc..42539ee 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.h
@@ -113,6 +113,11 @@ struct tgsi_shader_info
unsigned indirect_files_written;
 
unsigned properties[TGSI_PROPERTY_COUNT]; /* index with TGSI_PROPERTY_ */
+
+   /**
+* Max nesting limit of loops/if's
+*/
+   unsigned max_depth;
 };
 
 extern void
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] tgsi/scan: add support to figure out max nesting depth

2015-09-09 Thread Kenneth Graunke
On Wednesday, September 09, 2015 06:31:06 PM Rob Clark wrote:
> From: Rob Clark 
> 
> Sometimes a useful thing for compilers (or, for example, tgsi_to_nir) to
> know.  And pretty trivial for scan to figure this out for us.
> 
> Signed-off-by: Rob Clark 
> ---
>  src/gallium/auxiliary/tgsi/tgsi_scan.c | 16 
>  src/gallium/auxiliary/tgsi/tgsi_scan.h |  5 +
>  2 files changed, 21 insertions(+)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c 
> b/src/gallium/auxiliary/tgsi/tgsi_scan.c
> index 9810b54..66306d7 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
> @@ -56,6 +56,7 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
>  {
> uint procType, i;
> struct tgsi_parse_context parse;
> +   unsigned current_depth = 0;
>  
> memset(info, 0, sizeof(*info));
> for (i = 0; i < TGSI_FILE_COUNT; i++)
> @@ -100,6 +101,21 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
>  assert(fullinst->Instruction.Opcode < TGSI_OPCODE_LAST);
>  info->opcode_count[fullinst->Instruction.Opcode]++;
>  
> +switch (fullinst->Instruction.Opcode) {
> +case TGSI_OPCODE_IF:
> +case TGSI_OPCODE_UIF:
> +case TGSI_OPCODE_BGNLOOP:
> +   current_depth++;
> +   info->max_depth = MAX2(info->max_depth, current_depth);
> +   break;
> +case TGSI_OPCODE_ENDIF:
> +case TGSI_OPCODE_ENDLOOP:
> +   current_depth--;
> +   break;
> +default:
> +   break;
> +}
> +
>  if (fullinst->Instruction.Opcode >= TGSI_OPCODE_F2D &&
>  fullinst->Instruction.Opcode <= TGSI_OPCODE_DSSG)
> info->uses_doubles = true;
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.h 
> b/src/gallium/auxiliary/tgsi/tgsi_scan.h
> index c5900bc..42539ee 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_scan.h
> +++ b/src/gallium/auxiliary/tgsi/tgsi_scan.h
> @@ -113,6 +113,11 @@ struct tgsi_shader_info
> unsigned indirect_files_written;
>  
> unsigned properties[TGSI_PROPERTY_COUNT]; /* index with TGSI_PROPERTY_ */
> +
> +   /**
> +* Max nesting limit of loops/if's
> +*/
> +   unsigned max_depth;
>  };
>  
>  extern void
> 

FWIW, this is
Reviewed-by: Kenneth Graunke 

but you should probably have other Gallium folks review it too.

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-dev, 4/5] i965/cs: Support texture sampling for CS

2015-09-09 Thread Kristian Høgsberg
Jordan Justen  writes:

> Signed-off-by: Jordan Justen 

Looks good - maybe change the subject to

i965: Set up sampler state for compute shaders

since this really only sets up that part of the state.

Reviewed-by: Kristian Høgsberg 

> ---
> src/mesa/drivers/dri/i965/brw_cs.cpp | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp 
> b/src/mesa/drivers/dri/i965/brw_cs.cpp
> index 63e3b8d..bf43def 100644
> --- a/src/mesa/drivers/dri/i965/brw_cs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_cs.cpp
> @@ -221,12 +221,18 @@ brw_codegen_cs_prog(struct brw_context *brw,
>  static void
>  brw_cs_populate_key(struct brw_context *brw, struct brw_cs_prog_key *key)
>  {
> +   struct gl_context *ctx = >ctx;
> /* BRW_NEW_COMPUTE_PROGRAM */
> const struct brw_compute_program *cp =
>(struct brw_compute_program *) brw->compute_program;
> +   const struct gl_program *prog = (struct gl_program *) cp;
>  
> memset(key, 0, sizeof(*key));
>  
> +   /* _NEW_TEXTURE */
> +   brw_populate_sampler_prog_key_data(ctx, prog, brw->cs.base.sampler_count,
> +  >tex);
> +
> /* The unique compute program ID */
> key->program_string_id = cp->id;
>  }
> @@ -244,9 +250,12 @@ brw_upload_cs_prog(struct brw_context *brw)
> if (!cp)
>return;
>  
> -   if (!brw_state_dirty(brw, 0, BRW_NEW_COMPUTE_PROGRAM))
> +   if (!brw_state_dirty(brw, _NEW_TEXTURE, BRW_NEW_COMPUTE_PROGRAM))
>return;
>  
> +   brw->cs.base.sampler_count =
> +  _mesa_fls(ctx->ComputeProgram._Current->Base.SamplersUsed);
> +
> brw_cs_populate_key(brw, );
>  
> if (!brw_search_cache(>cache, BRW_CACHE_CS_PROG,
> @@ -391,7 +400,8 @@ brw_upload_cs_state(struct brw_context *brw)
> if (brw->gen >= 8)
>desc[dw++] = 0; /* Kernel Start Pointer High */
> desc[dw++] = 0;
> -   desc[dw++] = 0;
> +   desc[dw++] = stage_state->sampler_offset |
> +  ((stage_state->sampler_count + 3) / 4);
> desc[dw++] = stage_state->bind_bo_offset;
> desc[dw++] = SET_FIELD((reg_aligned_constant_size / 32) + 0,
>MEDIA_CURBE_READ_LENGTH);
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-dev, 3/5] i965/fs: Set first_non_payload_grf in assign_curb_setup

2015-09-09 Thread Kristian Høgsberg

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] i915: fixing driver crashes if too few vertices are submitted

2015-09-09 Thread Ville Syrjälä
On Wed, Sep 09, 2015 at 11:53:54AM -0700, Ian Romanick wrote:
> On 09/09/2015 11:16 AM, Marius Predut wrote:
> > Comparison with a signed expression and unsigned value
> > is converted to unsigned value, reason for minus value is interpreted
> > as a big unsigned value. For this case the "for" loop
> > is going into unexpected behavior.
> > 
> > v1:Brian Paul: code style fix.
> 
> I don't think you really did...
> 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38109
> > Signed-off-by: Marius Predut 
> > ---
> >  src/mesa/tnl_dd/t_dd_dmatmp.h | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/tnl_dd/t_dd_dmatmp.h b/src/mesa/tnl_dd/t_dd_dmatmp.h
> > index 7be3954..79de224 100644
> > --- a/src/mesa/tnl_dd/t_dd_dmatmp.h
> > +++ b/src/mesa/tnl_dd/t_dd_dmatmp.h
> > @@ -627,6 +627,9 @@ static void TAG(render_quads_verts)( struct gl_context 
> > *ctx,
> >LOCAL_VARS;
> >GLuint j;
> >  
> > +  if(count % 4 != 0)
>^
> ...because I'm quite sure Brian's code had a space here, per Mesa's
> coding standards.
> 
> Also, this change is incorrect.  If an application does
> glDrawArrays(GL_QUADS, 0, (n * 4) + 3), this change will cause zero
> quads to be drawn when n quads should be drawn.

Due to the other check added to validate_render() we would never even
get here since we would have dropped off the fast path entrirely. Which
is not a nice thing to do simply due to an extra vertex hanging around
somewhere.

> 
> Page 18 (page 32 of the PDF) of the OpenGL 2.1 spec says:
> 
> "The total number of vertices between Begin and End is 4n + k,
> where 0 ≤ k ≤ 3; if k is not zero, the final k vertices are
> ignored."
> 
> We probably don't have a piglit test for this scenario, so one should be
> added.  You can CC me on that patch. :)
> 
> I think the correct change is to trim count such that (count % 4) == 0.
>  If the modified value of count is zero, bail out.  With that change,
> the other hunk (below) is unnecessary.

I would suggest having one function that gets passed the primitive and
vertex count and have it return the trimmed vertex count. That could
then be called from both intel_run_render() and and TAG(validate_render())
to skip any extra vertices. That way none of the actual render functions
in t_dd_dmatmp.h need worry about any extra vertices.

> 
> > +  return;
> > +
> >INIT(GL_TRIANGLES);
> >  
> >for (j = start; j < count-3; j += 4) {
> > @@ -1248,7 +1251,7 @@ static GLboolean TAG(validate_render)( struct 
> > gl_context *ctx,
> > ok = (GLint) count < GET_SUBSEQUENT_VB_MAX_ELTS();
> >  }
> >  else {
> > -   ok = HAVE_TRIANGLES; /* flatshading is ok. */
> > +   ok = HAVE_TRIANGLES && (count % 4 == 0); /* flatshading is ok. */
> >  }
> >  break;
> >default:
> > 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

-- 
Ville Syrjälä
Intel OTC
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] i965/cs: Enable barrier in MEDIA_INTERFACE_DESCRIPTOR

2015-09-09 Thread Kristian Høgsberg
Jordan Justen  writes:

> Enable barrier in MEDIA_INTERFACE_DESCRIPTOR if the program uses the
> barrier() GLSL function.
>
> On Ivy Bridge and Haswell, this allows the piglit test
> tests/spec/arb_compute_shader/execution/simple-barrier-atomics.shader_test
> to pass. On gen8, this enables a similar test with a local group size
> of 896 to pass.
>
> Signed-off-by: Jordan Justen 
>
> ---
> src/mesa/drivers/dri/i965/brw_context.h  | 1 +
>  src/mesa/drivers/dri/i965/brw_cs.cpp | 4 +++-
>  src/mesa/drivers/dri/i965/brw_defines.h  | 2 ++
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 2 ++
>  4 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index cd43ac5..883c75c 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -441,6 +441,7 @@ struct brw_cs_prog_data {
> GLuint dispatch_grf_start_reg_16;
> unsigned local_size[3];
> unsigned simd_size;
> +   bool uses_barrier;
>  };
>  
>  /**
> diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp 
> b/src/mesa/drivers/dri/i965/brw_cs.cpp
> index 29ee75b..4ff012a 100644
> --- a/src/mesa/drivers/dri/i965/brw_cs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_cs.cpp
> @@ -384,7 +384,9 @@ brw_upload_cs_state(struct brw_context *brw)
>SET_FIELD(threads, GEN8_MEDIA_GPGPU_THREAD_COUNT) :
>SET_FIELD(threads, MEDIA_GPGPU_THREAD_COUNT);
> assert(threads <= brw->max_cs_threads);
> -   desc[dw++] = media_threads;
> +   desc[dw++] =
> +  SET_FIELD(cs_prog_data->uses_barrier, MEDIA_BARRIER_ENABLE) |
> +  media_threads;
>  
> BEGIN_BATCH(4);
> OUT_BATCH(MEDIA_INTERFACE_DESCRIPTOR_LOAD << 16 | (4 - 2));
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index f595366..95d74ba 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -2615,6 +2615,8 @@ enum brw_wm_barycentric_interp_mode {
>  
>  #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002
>  /* GEN7 DW5, GEN8+ DW6 */
> +# define MEDIA_BARRIER_ENABLE_SHIFT 21
> +# define MEDIA_BARRIER_ENABLE_MASK  INTEL_MASK(21, 21)
>  # define MEDIA_GPGPU_THREAD_COUNT_SHIFT 0
>  # define MEDIA_GPGPU_THREAD_COUNT_MASK  INTEL_MASK(7, 0)
>  # define GEN8_MEDIA_GPGPU_THREAD_COUNT_SHIFT0
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index e922a85..5549a2d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1575,7 +1575,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
> nir_intrinsic_instr *instr
> }
>  
> case nir_intrinsic_barrier:
> +  assert(stage == MESA_SHADER_COMPUTE);
>emit_barrier();
> +  ((struct brw_cs_prog_data *) prog_data)->uses_barrier = true;
>break;

We can get a barrier in tess control shaders too - how about

if (stage == MESA_SHADER_COMPUTE)
   ((struct brw_cs_prog_data *) prog_data)->uses_barrier = true;

instead?

With that change,

Reviewed-by: Kristian Høgsberg 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Don't tell the hardware about our UAV access.

2015-09-09 Thread Mark Janes
Mark Janes  writes:

> Francisco Jerez  writes:
>
>> Mark Janes  writes:
>>
>>> When I tested this, I saw an intermittent BSW gpu hang.  I haven't been
>>> able to confirm that it is due to the host-mem-barrier test.
>>>
>> It would probably be useful to know if the hang is due to any of the
>> image load/store tests or if it's something unrelated.
>
> Yes, you are right.  I will take some time tomorrow to catch the
> specific test.  Given the low rate of gpu hangs on BSW lately, I expect
> it will be due to image load/store tests.  However I need to confirm it.

I spent time attempting to reproduce hangs with this patch on BSW today.
I forced 3 hangs, but none of them could be attributed to image
load/store tests.

> -Mark
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-dev, 5/5] i965/cs: Emit texture surfaces to enable CS sampling

2015-09-09 Thread Kristian Høgsberg
Jordan Justen  writes:

> Signed-off-by: Jordan Justen 
>

Reviewed-by: Kristian Høgsberg 

> ---
> src/mesa/drivers/dri/i965/brw_context.h  | 2 +-
>  src/mesa/drivers/dri/i965/brw_state_upload.c | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index 51a1447..67ace37 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1449,7 +1449,7 @@ struct brw_context
>  
> int num_atoms[BRW_NUM_PIPELINES];
> const struct brw_tracked_state render_atoms[57];
> -   const struct brw_tracked_state compute_atoms[4];
> +   const struct brw_tracked_state compute_atoms[5];
>  
> /* If (INTEL_DEBUG & DEBUG_BATCH) */
> struct {
> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> index c2d0ba1..f677826 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -253,6 +253,7 @@ static const struct brw_tracked_state 
> *gen7_compute_atoms[] =
> _state_base_address,
> _cs_push_constants,
> _cs_abo_surfaces,
> +   _texture_surfaces,
> _cs_state,
>  };
>  
> @@ -337,6 +338,7 @@ static const struct brw_tracked_state 
> *gen8_compute_atoms[] =
> _state_base_address,
> _cs_push_constants,
> _cs_abo_surfaces,
> +   _texture_surfaces,
> _cs_state,
>  };
>  
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] meta: Use result of texture coordinate clamping operation

2015-09-09 Thread Anuj Phogat
On Wed, Sep 9, 2015 at 11:13 AM, Ian Romanick  wrote:
> From: Ian Romanick 
>
> Previously the result of the complicated clamp() expression just dropped
> on the floor: clamp does not modify any of its parameters.  Looking at
> the surrounding code, I believe this is supposed to modify the value of
> tex_coord.
>
> This change (along with a change to avoid the use of
> brw_blorp_framebuffer) does not affect any existing piglit tests.  I'm
> not sure what this clamp is trying to accomplish, so I'm not sure how to
> write a test to exercise this path.
>
texCoords for the src texture are offset and scaled in the fragment shader.
The resulting tex_coord need to be clamped to avoid picking samples
beyond the texture edges.

Unfortunately piglit blit scaled test don't hit this bug. I remember trying
to exercise this case but it's little tricky.

> I also noticed another bug in this code.  There is no way the array
> texture case could possibly work.  This will generate code for the
> TEXEL_FETCH macro like:
>
> #define TEXEL_FETCH(coord) texelFetch(texSampler, ivec3(coord), 
> sample_map[int(2 * fract(coord.x))]);
>
> Since the coord parameter of this macro is a vec2 at all invocations, no
> expansion of this macro will even compile.
>
> Signed-off-by: Ian Romanick 
> Cc: Anuj Phogat 
> Cc: Topi Pohjolainen 
> Cc: Jordan Justen 
> ---
>  src/mesa/drivers/common/meta_blit.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/common/meta_blit.c 
> b/src/mesa/drivers/common/meta_blit.c
> index 71d18de..a41fe42 100644
> --- a/src/mesa/drivers/common/meta_blit.c
> +++ b/src/mesa/drivers/common/meta_blit.c
> @@ -187,8 +187,8 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
> "   vec2 tex_coord = texCoords - 
> s_0_offset;\n"
> "\n"
> "   tex_coord *= scale;\n"
> -   "   clamp(tex_coord.x, 0.0f, scale.x * 
> src_width - 1.0f);\n"
> -   "   clamp(tex_coord.y, 0.0f, scale.y * 
> src_height - 1.0f);\n"
> +   "   tex_coord.x = clamp(tex_coord.x, 0.0f, 
> scale.x * src_width - 1.0f);\n"
> +   "   tex_coord.y = clamp(tex_coord.y, 0.0f, 
> scale.y * src_height - 1.0f);\n"
> "   interp = fract(tex_coord);\n"
> "   tex_coord = ivec2(tex_coord) * 
> scale_inv;\n"
> "\n"
> --
> 2.1.0
>

Thanks for fixing it Ian.
Reviewed-by: Anuj Phogat 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/11] nir/lower_vec_to_movs: Add a state struct

2015-09-09 Thread Jason Ekstrand
While we're here, we also fix up a couple of potential ralloc-parenting
bugs.  In particular, we were calling nir_src_copy with the shader as the
mem context instead of the instruction.
---
 src/glsl/nir/nir_lower_vec_to_movs.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c 
b/src/glsl/nir/nir_lower_vec_to_movs.c
index b7f096d..e297cb8 100644
--- a/src/glsl/nir/nir_lower_vec_to_movs.c
+++ b/src/glsl/nir/nir_lower_vec_to_movs.c
@@ -32,6 +32,10 @@
  * moves with partial writes.
  */
 
+struct vec_to_movs_state {
+   nir_shader *shader;
+};
+
 static bool
 src_matches_dest_reg(nir_dest *dest, nir_src *src)
 {
@@ -54,12 +58,12 @@ src_matches_dest_reg(nir_dest *dest, nir_src *src)
  */
 static unsigned
 insert_mov(nir_alu_instr *vec, unsigned start_channel,
-unsigned start_src_idx, void *mem_ctx)
+unsigned start_src_idx, nir_shader *shader)
 {
unsigned src_idx = start_src_idx;
assert(src_idx < nir_op_infos[vec->op].num_inputs);
 
-   nir_alu_instr *mov = nir_alu_instr_create(mem_ctx, nir_op_imov);
+   nir_alu_instr *mov = nir_alu_instr_create(shader, nir_op_imov);
nir_alu_src_copy(>src[0], >src[src_idx], mov);
nir_alu_dest_copy(>dest, >dest, mov);
 
@@ -84,8 +88,10 @@ insert_mov(nir_alu_instr *vec, unsigned start_channel,
 }
 
 static bool
-lower_vec_to_movs_block(nir_block *block, void *mem_ctx)
+lower_vec_to_movs_block(nir_block *block, void *void_state)
 {
+   struct vec_to_movs_state *state = void_state;
+
nir_foreach_instr_safe(block, instr) {
   if (instr->type != nir_instr_type_alu)
  continue;
@@ -115,7 +121,7 @@ lower_vec_to_movs_block(nir_block *block, void *mem_ctx)
 continue;
 
  if (src_matches_dest_reg(>dest.dest, >src[src_idx].src)) {
-finished_write_mask |= insert_mov(vec, i, src_idx, mem_ctx);
+finished_write_mask |= insert_mov(vec, i, src_idx, state->shader);
 break;
  }
  src_idx++;
@@ -127,7 +133,7 @@ lower_vec_to_movs_block(nir_block *block, void *mem_ctx)
 continue;
 
  if (!(finished_write_mask & (1 << i)))
-finished_write_mask |= insert_mov(vec, i, src_idx, mem_ctx);
+finished_write_mask |= insert_mov(vec, i, src_idx, state->shader);
 
  src_idx++;
   }
@@ -142,7 +148,11 @@ lower_vec_to_movs_block(nir_block *block, void *mem_ctx)
 static void
 nir_lower_vec_to_movs_impl(nir_function_impl *impl)
 {
-   nir_foreach_block(impl, lower_vec_to_movs_block, ralloc_parent(impl));
+   struct vec_to_movs_state state;
+
+   state.shader = impl->overload->function->shader;
+
+   nir_foreach_block(impl, lower_vec_to_movs_block, );
 }
 
 void
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 09/11] nir/lower_vec_to_movs: Get rid of start_idx and swizzle compacting

2015-09-09 Thread Jason Ekstrand
Previously, we did this thing with keeping track of a separate start_idx
which was different from the iteration variable.  I think this was a relic
of the way that GLSL IR implements writemasks.  In NIR, if a given bit in
the writemask is unset then that channel is just "unused", not missing.  In
particular, a vec4 operation with a writemask of 0xd will use sources 0, 2,
and 3 and leave source 1 alone.  We can simplify things a good deal (and
make them correct) by removing this "compacting" step.

Cc: Eric Anholt 
---
 src/glsl/nir/nir_lower_vec_to_movs.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c 
b/src/glsl/nir/nir_lower_vec_to_movs.c
index 7d31e36..ed8ec9b 100644
--- a/src/glsl/nir/nir_lower_vec_to_movs.c
+++ b/src/glsl/nir/nir_lower_vec_to_movs.c
@@ -58,29 +58,25 @@ src_matches_dest_reg(nir_dest *dest, nir_src *src)
  * which ones have been processed.
  */
 static unsigned
-insert_mov(nir_alu_instr *vec, unsigned start_channel,
-unsigned start_src_idx, nir_shader *shader)
+insert_mov(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
 {
-   unsigned src_idx = start_src_idx;
-   assert(src_idx < nir_op_infos[vec->op].num_inputs);
+   assert(start_idx < nir_op_infos[vec->op].num_inputs);
 
nir_alu_instr *mov = nir_alu_instr_create(shader, nir_op_imov);
-   nir_alu_src_copy(>src[0], >src[src_idx], mov);
+   nir_alu_src_copy(>src[0], >src[start_idx], mov);
nir_alu_dest_copy(>dest, >dest, mov);
 
-   mov->dest.write_mask = (1u << start_channel);
-   mov->src[0].swizzle[start_channel] = vec->src[src_idx].swizzle[0];
-   src_idx++;
+   mov->dest.write_mask = (1u << start_idx);
+   mov->src[0].swizzle[start_idx] = vec->src[start_idx].swizzle[0];
 
-   for (unsigned i = start_channel + 1; i < 4; i++) {
+   for (unsigned i = start_idx + 1; i < 4; i++) {
   if (!(vec->dest.write_mask & (1 << i)))
  continue;
 
-  if (nir_srcs_equal(vec->src[src_idx].src, vec->src[start_src_idx].src)) {
+  if (nir_srcs_equal(vec->src[i].src, vec->src[start_idx].src)) {
  mov->dest.write_mask |= (1 << i);
- mov->src[0].swizzle[i] = vec->src[src_idx].swizzle[0];
+ mov->src[0].swizzle[i] = vec->src[i].swizzle[0];
   }
-  src_idx++;
}
 
nir_instr_insert_before(>instr, >instr);
@@ -129,26 +125,23 @@ lower_vec_to_movs_block(nir_block *block, void 
*void_state)
* destination reg, in case other values we're populating in the dest
* might overwrite them.
*/
-  for (unsigned i = 0, src_idx = 0; i < 4; i++) {
+  for (unsigned i = 0; i < 4; i++) {
  if (!(vec->dest.write_mask & (1 << i)))
 continue;
 
- if (src_matches_dest_reg(>dest.dest, >src[src_idx].src)) {
-finished_write_mask |= insert_mov(vec, i, src_idx, state->shader);
+ if (src_matches_dest_reg(>dest.dest, >src[i].src)) {
+finished_write_mask |= insert_mov(vec, i, state->shader);
 break;
  }
- src_idx++;
   }
 
   /* Now, emit MOVs for all the other src channels. */
-  for (unsigned i = 0, src_idx = 0; i < 4; i++) {
+  for (unsigned i = 0; i < 4; i++) {
  if (!(vec->dest.write_mask & (1 << i)))
 continue;
 
  if (!(finished_write_mask & (1 << i)))
-finished_write_mask |= insert_mov(vec, i, src_idx, state->shader);
-
- src_idx++;
+finished_write_mask |= insert_mov(vec, i, state->shader);
   }
 
   nir_instr_remove(>instr);
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/11] i965/vec4_nir: Use partial SSA form rather than full non-SSA

2015-09-09 Thread Jason Ekstrand
We made this switch in the FS backend some time ago and it seems to make a
number of things a bit easier.
---
 src/mesa/drivers/dri/i965/brw_nir.c|  2 +-
 src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 ++---
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index 8f3edc5..f326b23 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -183,7 +183,7 @@ brw_create_nir(struct brw_context *brw,
   nir_print_shader(nir, stderr);
}
 
-   nir_convert_from_ssa(nir, is_scalar);
+   nir_convert_from_ssa(nir, true);
nir_validate_shader(nir);
 
if (!is_scalar) {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 01c6e84..de74ec9 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -423,6 +423,7 @@ public:
virtual void nir_emit_alu(nir_alu_instr *instr);
virtual void nir_emit_jump(nir_jump_instr *instr);
virtual void nir_emit_texture(nir_tex_instr *instr);
+   virtual void nir_emit_undef(nir_ssa_undef_instr *instr);
 
dst_reg get_nir_dest(nir_dest dest, enum brw_reg_type type);
dst_reg get_nir_dest(nir_dest dest, nir_alu_type type);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 751ec73..87a7941 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -367,6 +367,10 @@ vec4_visitor::nir_emit_instr(nir_instr *instr)
   nir_emit_texture(nir_instr_as_tex(instr));
   break;
 
+   case nir_instr_type_ssa_undef:
+  nir_emit_undef(nir_instr_as_ssa_undef(instr));
+  break;
+
default:
   fprintf(stderr, "VS instruction not yet implemented by NIR->vec4\n");
   break;
@@ -393,9 +397,14 @@ dst_reg_for_nir_reg(vec4_visitor *v, nir_register *nir_reg,
 dst_reg
 vec4_visitor::get_nir_dest(nir_dest dest)
 {
-   assert(!dest.is_ssa);
-   return dst_reg_for_nir_reg(this, dest.reg.reg, dest.reg.base_offset,
-  dest.reg.indirect);
+   if (dest.is_ssa) {
+  dst_reg dst = dst_reg(GRF, alloc.allocate(1));
+  nir_ssa_values[dest.ssa.index] = dst;
+  return dst;
+   } else {
+  return dst_reg_for_nir_reg(this, dest.reg.reg, dest.reg.base_offset,
+ dest.reg.indirect);
+   }
 }
 
 dst_reg
@@ -1528,4 +1537,10 @@ vec4_visitor::nir_emit_texture(nir_tex_instr *instr)
 mcs, is_cube_array, sampler, sampler_reg);
 }
 
+void
+vec4_visitor::nir_emit_undef(nir_ssa_undef_instr *instr)
+{
+   nir_ssa_values[instr->def.index] = dst_reg(GRF, alloc.allocate(1));
+}
+
 }
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 05/11] nir/from_ssa: Use instr_rewrite_dest

2015-09-09 Thread Jason Ekstrand
---
 src/glsl/nir/nir_from_ssa.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c
index 084f43d..eaf883d 100644
--- a/src/glsl/nir/nir_from_ssa.c
+++ b/src/glsl/nir/nir_from_ssa.c
@@ -512,9 +512,7 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state)
 */
nir_dest *dest = exec_node_data(nir_dest, def, ssa);
 
-   *dest = nir_dest_for_reg(reg);
-   dest->reg.parent_instr = state->instr;
-   list_addtail(>reg.def_link, >defs);
+   nir_instr_rewrite_dest(state->instr, dest, nir_dest_for_reg(reg));
 
return true;
 }
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] meta: Use result of texture coordinate clamping operation

2015-09-09 Thread Anuj Phogat
On Wed, Sep 9, 2015 at 6:19 PM, Ian Romanick  wrote:
> On 09/09/2015 04:55 PM, Anuj Phogat wrote:
>> On Wed, Sep 9, 2015 at 11:30 AM, Ian Romanick 
wrote:
>>> I'm pretty sure our implementation of this extension is complete
>>> rubbish.  I have attached an image from the blit-scaled test.  This
>>> result cannot be useful to anyone.
>>>
>> Paul Berry and I agreed to follow the "NVIDIA Implementation Details"
>> given in the extension spec. i965 driver currently use bilinear filtering
>> for both SCALED_RESOLVE_FASTEST and SCALED_RESOLVE_NICEST.
>> Do you think the implementation doesn't match what is suggested in
>> the spec? I'll be happy to work on improving the implementation if you
>> have any specific comments.
>>
>> Extension spec mentions the potential loss of quality when using a
>> single pass scaled resolve blit. Also the quality of image generated
>> depends on sample count and scaling factor. I vaguely remember
>> Paul mentioned the scaling factor of ~1.5, beyond which single pass
>> scaled resolve blit quickly becomes useless. I think the attached
>> image is using a scaling factor of 2.4.
>
> Yes, I think so.  It was the last image produced by the test.
>
>> I'm attaching an old image generated by me doing the quality
>> comparison with Nvidia's proprietary driver implementation. I posted
>> the image on mesa-dev before enabling the extension on i965. Please
>> see the images under "New implementation" for the comparison.
>
> Comparing our images with NVIDIA's image, I don't see very much
> difference.  That's both reassuring and depressing.  Looking at all of
> those images, I can't imagine why anyone would bother using this
> functionality.  I doubt there's any scenario where the little "spikes"
> that grow of the sides of the triangles is acceptable.  Oh well.

Markus Wick (IRC name: degasus) shared these dolphin emulator
images (rendered using i965 drivers) with me on dri-devel. They show
the effect of using the extension over resolve and liner scaling. I think
most of the images look crisper with little loss of smooth edges.
https://www.dropbox.com/sh/vmt7tolzd6qlnpo/AABlGkXZ4W6_ucFk0Gk3tfNpa?dl=0
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 (part2) 20/59] glsl: Add parser/compiler support for std430 interface packing qualifier

2015-09-09 Thread Ian Romanick
On 09/09/2015 03:32 AM, Samuel Iglesias Gonsálvez wrote:
> On 31/08/15 08:38, Samuel Iglesias Gonsálvez wrote:
>> On 29/08/15 02:27, Jordan Justen wrote:
>>> Hmm, you quoted the extension spec, but the 4.30 spec has: "When using
>>> the std430 storage layout, shader storage blocks will be laid out in
>>> buffer storage identically to uniform and shader storage blocks using
>>> the std140 layout, except that the base alignment and stride of arrays
>>> of scalars and vectors in rule 4 and of structures in rule 9 are not
>>> rounded up a multiple of the base alignment of a vec4 ."
>>>
>>> Notably, it mentions stride. Wouldn't this be wrong for a vec3 array,
>>> where the stride should be 12 bytes?
>>>
>>
>> Good catch! Both specs differ in this case. I will update the comment.
>>
>> Right now, we are using a 16 bytes stride for an array of vec3s. NVIDIA
>> proprietary driver (4.50 NVIDIA 352.21) does the same.
>>
>> I am going to modify our code to fix this case. Thanks!
>>
> 
> Before doing any change, I have investigated which is the proper stride
> value for array of vec3. From 4.30 spec:
> 
> "3. If the member is a three-component vector with components consuming
> N basic machine units, the base alignment is 4N."

Not sure if this helps, but issue 8 in the ARB_ssbo spec says:

  Note that in the "std430" packing, arrays of vec3s are still not tightly
  packed; vec3 types still require a 16B alignment as in "std140".

> std430 explicitly says that the base alignment and stride of arrays of
> scalars and vectors in rule 4 are not rounded up a multiple of the base
> alignment of a vec4. But it doesn't mention any change to rule 3.
> 
> According to rule (3) and std430 changes to rule (4), the stride should
> be 16 for an array of vec3.
> 
> Sam

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] meta: Use result of texture coordinate clamping operation

2015-09-09 Thread Anuj Phogat
On Wed, Sep 9, 2015 at 11:30 AM, Ian Romanick  wrote:
> I'm pretty sure our implementation of this extension is complete
> rubbish.  I have attached an image from the blit-scaled test.  This
> result cannot be useful to anyone.
>
Resending it with link to the image in place of big attachment.

Paul Berry and I agreed to follow the "NVIDIA Implementation Details"
given in the extension spec. i965 driver currently use bilinear filtering
for both SCALED_RESOLVE_FASTEST and SCALED_RESOLVE_NICEST.
Do you think the implementation doesn't match what is suggested in
the spec? I'll be happy to work on improving the implementation if you
have any specific comments.

Extension spec mentions the potential loss of quality when using a
single pass scaled resolve blit. Also the quality of image generated
depends on sample count and scaling factor. I vaguely remember
Paul mentioned the scaling factor of ~1.5, beyond which single pass
scaled resolve blit quickly becomes useless. I think the attached
image is using a scaling factor of 2.4.

Below link shows an old image generated by me doing the quality
comparison with Nvidia's proprietary driver implementation. I posted
the image on mesa-dev before enabling the extension on i965. Please
see the images under "New implementation" for the comparison.
Image: https://www.dropbox.com/s/z4ebhx0lwfmui8m/scaled-blit.png?dl=0

> On 09/09/2015 11:16 AM, Ian Romanick wrote:
>> On 09/09/2015 11:13 AM, Ian Romanick wrote:
>>> From: Ian Romanick 
>>>
>>> Previously the result of the complicated clamp() expression just dropped
>>> on the floor: clamp does not modify any of its parameters.  Looking at
>>> the surrounding code, I believe this is supposed to modify the value of
>>> tex_coord.
>>>
>>> This change (along with a change to avoid the use of
>>> brw_blorp_framebuffer) does not affect any existing piglit tests.  I'm
>>> not sure what this clamp is trying to accomplish, so I'm not sure how to
>>> write a test to exercise this path.
>>
>> I just noticed that the piglit test contains the same incorrect shader
>> as meta for the "reference" implementation.
>>
>>> I also noticed another bug in this code.  There is no way the array
>>> texture case could possibly work.  This will generate code for the
>>> TEXEL_FETCH macro like:
>>>
>>> #define TEXEL_FETCH(coord) texelFetch(texSampler, ivec3(coord),
sample_map[int(2 * fract(coord.x))]);
>>>
>>> Since the coord parameter of this macro is a vec2 at all invocations, no
>>> expansion of this macro will even compile.
>>>
>>> Signed-off-by: Ian Romanick 
>>> Cc: Anuj Phogat 
>>> Cc: Topi Pohjolainen 
>>> Cc: Jordan Justen 
>>> ---
>>>  src/mesa/drivers/common/meta_blit.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/common/meta_blit.c
b/src/mesa/drivers/common/meta_blit.c
>>> index 71d18de..a41fe42 100644
>>> --- a/src/mesa/drivers/common/meta_blit.c
>>> +++ b/src/mesa/drivers/common/meta_blit.c
>>> @@ -187,8 +187,8 @@ setup_glsl_msaa_blit_scaled_shader(struct
gl_context *ctx,
>>> "   vec2 tex_coord = texCoords -
s_0_offset;\n"
>>> "\n"
>>> "   tex_coord *= scale;\n"
>>> -   "   clamp(tex_coord.x, 0.0f, scale.x *
src_width - 1.0f);\n"
>>> -   "   clamp(tex_coord.y, 0.0f, scale.y *
src_height - 1.0f);\n"
>>> +   "   tex_coord.x = clamp(tex_coord.x,
0.0f, scale.x * src_width - 1.0f);\n"
>>> +   "   tex_coord.y = clamp(tex_coord.y,
0.0f, scale.y * src_height - 1.0f);\n"
>>> "   interp = fract(tex_coord);\n"
>>> "   tex_coord = ivec2(tex_coord) *
scale_inv;\n"
>>> "\n"
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/11] nir: Add a function for rewriting instruction destinations

2015-09-09 Thread Jason Ekstrand
---
 src/glsl/nir/nir.c | 24 
 src/glsl/nir/nir.h |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index e173b21..0090b08 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -1175,6 +1175,30 @@ nir_if_rewrite_condition(nir_if *if_stmt, nir_src 
new_src)
 }
 
 void
+nir_instr_rewrite_dest(nir_instr *instr, nir_dest *dest, nir_dest new_dest)
+{
+   if (dest->is_ssa) {
+  /* We can only overwrite an SSA destination if it has no uses. */
+  assert(list_empty(>ssa.uses) && list_empty(>ssa.if_uses));
+   } else {
+  list_del(>reg.def_link);
+  if (dest->reg.indirect)
+ src_remove_all_uses(dest->reg.indirect);
+   }
+
+   /* We can't re-write with an SSA def */
+   assert(!new_dest.is_ssa);
+
+   nir_dest_copy(dest, _dest, instr);
+
+   dest->reg.parent_instr = instr;
+   list_addtail(>reg.def_link, _dest.reg.reg->defs);
+
+   if (dest->reg.indirect)
+  src_add_all_uses(dest->reg.indirect, instr, NULL);
+}
+
+void
 nir_ssa_def_init(nir_instr *instr, nir_ssa_def *def,
  unsigned num_components, const char *name)
 {
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 32c338c..a8e7363 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1725,6 +1725,8 @@ bool nir_srcs_equal(nir_src src1, nir_src src2);
 void nir_instr_rewrite_src(nir_instr *instr, nir_src *src, nir_src new_src);
 void nir_instr_move_src(nir_instr *dest_instr, nir_src *dest, nir_src *src);
 void nir_if_rewrite_condition(nir_if *if_stmt, nir_src new_src);
+void nir_instr_rewrite_dest(nir_instr *instr, nir_dest *dest,
+nir_dest new_dest);
 
 void nir_ssa_dest_init(nir_instr *instr, nir_dest *dest,
unsigned num_components, const char *name);
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/11] nir: Fix a bunch of ralloc parenting errors

2015-09-09 Thread Jason Ekstrand
As of a10d4937, we would really like things associated with an instruction
to be allocated out of that instruction and not out of the shader.  In
particular, you should be passing the instruction that will ultimately be
holding the source into nir_src_copy rather than an arbitrary memory
context.

We also change the prototypes of nir_dest_copy and nir_alu_src/dest_copy to
explicitly take an instruction so we catch this earlier in the future.
---
 src/glsl/nir/nir.c  | 20 +++-
 src/glsl/nir/nir.h  | 13 +++--
 src/glsl/nir/nir_from_ssa.c |  2 +-
 src/glsl/nir/nir_lower_alu_to_scalar.c  |  6 +++---
 src/glsl/nir/nir_lower_atomics.c|  2 +-
 src/glsl/nir/nir_lower_io.c |  2 +-
 src/glsl/nir/nir_lower_locals_to_regs.c |  7 +++
 src/glsl/nir/nir_lower_vec_to_movs.c|  4 ++--
 src/glsl/nir/nir_opt_peephole_ffma.c|  3 +--
 src/glsl/nir/nir_opt_peephole_select.c  |  4 ++--
 10 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index ab06ea2..6389d45 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -150,7 +150,7 @@ void nir_src_copy(nir_src *dest, const nir_src *src, void 
*mem_ctx)
}
 }
 
-void nir_dest_copy(nir_dest *dest, const nir_dest *src, void *mem_ctx)
+void nir_dest_copy(nir_dest *dest, const nir_dest *src, nir_instr *instr)
 {
/* Copying an SSA definition makes no sense whatsoever. */
assert(!src->is_ssa);
@@ -160,17 +160,18 @@ void nir_dest_copy(nir_dest *dest, const nir_dest *src, 
void *mem_ctx)
dest->reg.base_offset = src->reg.base_offset;
dest->reg.reg = src->reg.reg;
if (src->reg.indirect) {
-  dest->reg.indirect = ralloc(mem_ctx, nir_src);
-  nir_src_copy(dest->reg.indirect, src->reg.indirect, mem_ctx);
+  dest->reg.indirect = ralloc(instr, nir_src);
+  nir_src_copy(dest->reg.indirect, src->reg.indirect, instr);
} else {
   dest->reg.indirect = NULL;
}
 }
 
 void
-nir_alu_src_copy(nir_alu_src *dest, const nir_alu_src *src, void *mem_ctx)
+nir_alu_src_copy(nir_alu_src *dest, const nir_alu_src *src,
+ nir_alu_instr *instr)
 {
-   nir_src_copy(>src, >src, mem_ctx);
+   nir_src_copy(>src, >src, >instr);
dest->abs = src->abs;
dest->negate = src->negate;
for (unsigned i = 0; i < 4; i++)
@@ -178,9 +179,10 @@ nir_alu_src_copy(nir_alu_src *dest, const nir_alu_src 
*src, void *mem_ctx)
 }
 
 void
-nir_alu_dest_copy(nir_alu_dest *dest, const nir_alu_dest *src, void *mem_ctx)
+nir_alu_dest_copy(nir_alu_dest *dest, const nir_alu_dest *src,
+  nir_alu_instr *instr)
 {
-   nir_dest_copy(>dest, >dest, mem_ctx);
+   nir_dest_copy(>dest, >dest, >instr);
dest->write_mask = src->write_mask;
dest->saturate = src->saturate;
 }
@@ -1207,14 +1209,14 @@ nir_ssa_def_rewrite_uses(nir_ssa_def *def, nir_src 
new_src, void *mem_ctx)
nir_foreach_use_safe(def, use_src) {
   nir_instr *src_parent_instr = use_src->parent_instr;
   list_del(_src->use_link);
-  nir_src_copy(use_src, _src, mem_ctx);
+  nir_src_copy(use_src, _src, src_parent_instr);
   src_add_all_uses(use_src, src_parent_instr, NULL);
}
 
nir_foreach_if_use_safe(def, use_src) {
   nir_if *src_parent_if = use_src->parent_if;
   list_del(_src->use_link);
-  nir_src_copy(use_src, _src, mem_ctx);
+  nir_src_copy(use_src, _src, src_parent_if);
   src_add_all_uses(use_src, NULL, src_parent_if);
}
 }
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 2ba7731..dce2d6a 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -580,8 +580,8 @@ nir_dest_for_reg(nir_register *reg)
return dest;
 }
 
-void nir_src_copy(nir_src *dest, const nir_src *src, void *mem_ctx);
-void nir_dest_copy(nir_dest *dest, const nir_dest *src, void *mem_ctx);
+void nir_src_copy(nir_src *dest, const nir_src *src, void *instr_or_if);
+void nir_dest_copy(nir_dest *dest, const nir_dest *src, nir_instr *instr);
 
 typedef struct {
nir_src src;
@@ -630,10 +630,6 @@ typedef struct {
unsigned write_mask : 4; /* ignored if dest.is_ssa is true */
 } nir_alu_dest;
 
-void nir_alu_src_copy(nir_alu_src *dest, const nir_alu_src *src, void 
*mem_ctx);
-void nir_alu_dest_copy(nir_alu_dest *dest, const nir_alu_dest *src,
-   void *mem_ctx);
-
 typedef enum {
nir_type_invalid = 0, /* Not a valid type */
nir_type_float,
@@ -702,6 +698,11 @@ typedef struct nir_alu_instr {
nir_alu_src src[];
 } nir_alu_instr;
 
+void nir_alu_src_copy(nir_alu_src *dest, const nir_alu_src *src,
+  nir_alu_instr *instr);
+void nir_alu_dest_copy(nir_alu_dest *dest, const nir_alu_dest *src,
+   nir_alu_instr *instr);
+
 /* is this source channel used? */
 static inline bool
 nir_alu_instr_channel_used(nir_alu_instr *instr, unsigned src, unsigned 
channel)
diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c

[Mesa-dev] [PATCH 02/11] nir: Remove the mem_ctx parameter from ssa_def_rewrite_uses

2015-09-09 Thread Jason Ekstrand
---
 src/glsl/nir/nir.c| 2 +-
 src/glsl/nir/nir.h| 2 +-
 src/glsl/nir/nir_control_flow.c   | 2 +-
 src/glsl/nir/nir_from_ssa.c   | 5 ++---
 src/glsl/nir/nir_lower_alu_to_scalar.c| 6 ++
 src/glsl/nir/nir_lower_atomics.c  | 3 +--
 src/glsl/nir/nir_lower_idiv.c | 4 +---
 src/glsl/nir/nir_lower_io.c   | 3 +--
 src/glsl/nir/nir_lower_load_const_to_scalar.c | 3 +--
 src/glsl/nir/nir_lower_locals_to_regs.c   | 3 +--
 src/glsl/nir/nir_lower_phis_to_scalar.c   | 3 +--
 src/glsl/nir/nir_lower_system_values.c| 3 +--
 src/glsl/nir/nir_lower_vars_to_ssa.c  | 6 ++
 src/glsl/nir/nir_opt_constant_folding.c   | 4 ++--
 src/glsl/nir/nir_opt_cse.c| 6 ++
 src/glsl/nir/nir_opt_dead_cf.c| 4 +---
 src/glsl/nir/nir_opt_peephole_ffma.c  | 3 +--
 src/glsl/nir/nir_opt_peephole_select.c| 3 +--
 src/glsl/nir/nir_opt_remove_phis.c| 5 +
 src/glsl/nir/nir_search.c | 2 +-
 20 files changed, 25 insertions(+), 47 deletions(-)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index 6389d45..9a22fea 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -1202,7 +1202,7 @@ nir_ssa_dest_init(nir_instr *instr, nir_dest *dest,
 }
 
 void
-nir_ssa_def_rewrite_uses(nir_ssa_def *def, nir_src new_src, void *mem_ctx)
+nir_ssa_def_rewrite_uses(nir_ssa_def *def, nir_src new_src)
 {
assert(!new_src.is_ssa || def != new_src.ssa);
 
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index dce2d6a..32c338c 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1730,7 +1730,7 @@ void nir_ssa_dest_init(nir_instr *instr, nir_dest *dest,
unsigned num_components, const char *name);
 void nir_ssa_def_init(nir_instr *instr, nir_ssa_def *def,
   unsigned num_components, const char *name);
-void nir_ssa_def_rewrite_uses(nir_ssa_def *def, nir_src new_src, void 
*mem_ctx);
+void nir_ssa_def_rewrite_uses(nir_ssa_def *def, nir_src new_src);
 
 /* visits basic blocks in source-code order */
 typedef bool (*nir_foreach_block_cb)(nir_block *block, void *state);
diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
index 5c03375..768dfd2 100644
--- a/src/glsl/nir/nir_control_flow.c
+++ b/src/glsl/nir/nir_control_flow.c
@@ -654,7 +654,7 @@ replace_ssa_def_uses(nir_ssa_def *def, void *void_impl)
nir_ssa_undef_instr *undef =
   nir_ssa_undef_instr_create(mem_ctx, def->num_components);
nir_instr_insert_before_cf_list(>body, >instr);
-   nir_ssa_def_rewrite_uses(def, nir_src_for_ssa(>def), mem_ctx);
+   nir_ssa_def_rewrite_uses(def, nir_src_for_ssa(>def));
return true;
 }
 
diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c
index d6569d8..084f43d 100644
--- a/src/glsl/nir/nir_from_ssa.c
+++ b/src/glsl/nir/nir_from_ssa.c
@@ -359,8 +359,7 @@ isolate_phi_nodes_block(nir_block *block, void *void_state)
   exec_list_push_tail(_pcopy->entries, >node);
 
   nir_ssa_def_rewrite_uses(>dest.ssa,
-   nir_src_for_ssa(>dest.ssa),
-   state->mem_ctx);
+   nir_src_for_ssa(>dest.ssa));
 
   nir_instr_rewrite_src(_pcopy->instr, >src,
 nir_src_for_ssa(>dest.ssa));
@@ -493,7 +492,7 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state)
   reg->num_array_elems = 0;
}
 
-   nir_ssa_def_rewrite_uses(def, nir_src_for_reg(reg), state->mem_ctx);
+   nir_ssa_def_rewrite_uses(def, nir_src_for_reg(reg));
assert(list_empty(>uses) && list_empty(>if_uses));
 
if (def->parent_instr->type == nir_instr_type_ssa_undef) {
diff --git a/src/glsl/nir/nir_lower_alu_to_scalar.c 
b/src/glsl/nir/nir_lower_alu_to_scalar.c
index 1607308..710bb37 100644
--- a/src/glsl/nir/nir_lower_alu_to_scalar.c
+++ b/src/glsl/nir/nir_lower_alu_to_scalar.c
@@ -70,8 +70,7 @@ lower_reduction(nir_alu_instr *instr, nir_op chan_op, nir_op 
merge_op,
}
 
assert(instr->dest.write_mask == 1);
-   nir_ssa_def_rewrite_uses(>dest.dest.ssa, nir_src_for_ssa(last),
-mem_ctx);
+   nir_ssa_def_rewrite_uses(>dest.dest.ssa, nir_src_for_ssa(last));
nir_instr_remove(>instr);
 }
 
@@ -168,8 +167,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, void *mem_ctx)
nir_instr_insert_before(>instr, _instr->instr);
 
nir_ssa_def_rewrite_uses(>dest.dest.ssa,
-nir_src_for_ssa(_instr->dest.dest.ssa),
-mem_ctx);
+nir_src_for_ssa(_instr->dest.dest.ssa));
 
nir_instr_remove(>instr);
 }
diff --git a/src/glsl/nir/nir_lower_atomics.c b/src/glsl/nir/nir_lower_atomics.c
index 7ae8462..6f9ecc0 100644
--- a/src/glsl/nir/nir_lower_atomics.c
+++ b/src/glsl/nir/nir_lower_atomics.c
@@ -116,8 +116,7 @@ 

[Mesa-dev] [PATCH 07/11] nir/lower_vec_to_movs: Handle partially SSA shaders

2015-09-09 Thread Jason Ekstrand
---
 src/glsl/nir/nir_lower_vec_to_movs.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c 
b/src/glsl/nir/nir_lower_vec_to_movs.c
index e297cb8..7d31e36 100644
--- a/src/glsl/nir/nir_lower_vec_to_movs.c
+++ b/src/glsl/nir/nir_lower_vec_to_movs.c
@@ -34,6 +34,7 @@
 
 struct vec_to_movs_state {
nir_shader *shader;
+   nir_function_impl *impl;
 };
 
 static bool
@@ -107,8 +108,20 @@ lower_vec_to_movs_block(nir_block *block, void *void_state)
  continue; /* The loop */
   }
 
-  /* Since we insert multiple MOVs, we have to be non-SSA. */
-  assert(!vec->dest.dest.is_ssa);
+  if (vec->dest.dest.is_ssa) {
+ /* Since we insert multiple MOVs, we have a register destination. */
+ nir_register *reg = nir_local_reg_create(state->impl);
+ reg->num_components = vec->dest.dest.ssa.num_components;
+
+ nir_ssa_def_rewrite_uses(>dest.dest.ssa, nir_src_for_reg(reg));
+
+ assert(list_empty(>dest.dest.ssa.uses));
+ assert(list_empty(>dest.dest.ssa.if_uses));
+
+ vec->dest.dest = nir_dest_for_reg(reg);
+ vec->dest.dest.reg.parent_instr = >instr;
+ list_addtail(>dest.dest.reg.def_link, >defs);
+  }
 
   unsigned finished_write_mask = 0;
 
@@ -151,6 +164,7 @@ nir_lower_vec_to_movs_impl(nir_function_impl *impl)
struct vec_to_movs_state state;
 
state.shader = impl->overload->function->shader;
+   state.impl = impl;
 
nir_foreach_block(impl, lower_vec_to_movs_block, );
 }
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 11/11] HACK: nir/lower_vec_to_movs: Coalesce into destinations of fdot instructions

2015-09-09 Thread Jason Ekstrand
This is labeled HACK because it relies on the validator not validating
destination write-masks with respect to nir_op_infos[op].output_size and
relies on the fact that the i965 vec4 hardware does an implicit splat of
any dot-product results.  Technically, nir_op_fdotN produces a single
component that lives in the first slot and the others don't exist.
However, most hardware splats dot-products so this is probably reasonable.

One solution to doing this "properly" would be to add a set of
nir_op_fdotN_replicated opcodes that do the splat and somehow lower to
those at the end of optimizations.  I don't think we want to have the
nir_op_fdot opcodes splat in SSA form because that could hurt our
opportunity for CSE.  However, the shader-db ressults below show that we
might want to have it splat for the purposes of coalescing.

Shader-db results for vec4 programs on Haswell:

   total instructions in shared programs: 1778849 -> 1751223 (-1.55%)
   instructions in affected programs: 763104 -> 735478 (-3.62%)
   helped:7067
   HURT:  26

It turns out that dot-products matter...

Cc: Eduardo Lima Mitev 
---
 src/glsl/nir/nir_lower_vec_to_movs.c | 47 ++--
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c 
b/src/glsl/nir/nir_lower_vec_to_movs.c
index 0ebf3e3..1aa6add 100644
--- a/src/glsl/nir/nir_lower_vec_to_movs.c
+++ b/src/glsl/nir/nir_lower_vec_to_movs.c
@@ -84,6 +84,14 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx, 
nir_shader *shader)
return mov->dest.write_mask;
 }
 
+static bool
+is_fdot(nir_alu_instr *alu)
+{
+   return alu->op == nir_op_fdot2 ||
+  alu->op == nir_op_fdot3 ||
+  alu->op == nir_op_fdot4;
+}
+
 /* Attempts to coalesce the "move" from the given source of the vec to the
  * destination of the instruction generating the value. If, for whatever
  * reason, we cannot coalesce the mmove, it does nothing and returns 0.  We
@@ -121,19 +129,28 @@ try_coalesce(nir_alu_instr *vec, unsigned start_idx, 
nir_shader *shader)
nir_alu_instr *src_alu =
   nir_instr_as_alu(vec->src[start_idx].src.ssa->parent_instr);
 
-   /* We only care about being able to re-swizzle the instruction if it is
-* something that we can reswizzle.  It must be per-component.
-*/
-   if (nir_op_infos[src_alu->op].output_size != 0)
-  return 0;
-
-   /* If we are going to reswizzle the instruction, we can't have any
-* non-per-component sources either.
-*/
-   for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
-  if (nir_op_infos[src_alu->op].input_sizes[j] != 0)
+   if (is_fdot(src_alu)) {
+  /* The fdot instruction is special: It splats its result to all
+   * components.  This means that we can always rewrite its destination
+   * and we don't need to swizzle anything.
+   */
+   } else {
+  /* We only care about being able to re-swizzle the instruction if it is
+   * something that we can reswizzle.  It must be per-component.  The one
+   * exception to this is the fdotN instructions which implicitly splat
+   * their result out to all channels.
+   */
+  if (nir_op_infos[src_alu->op].output_size != 0)
  return 0;
 
+  /* If we are going to reswizzle the instruction, we can't have any
+   * non-per-component sources either.
+   */
+  for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
+ if (nir_op_infos[src_alu->op].input_sizes[j] != 0)
+return 0;
+   }
+
/* Stash off all of the ALU instruction's swizzles. */
uint8_t swizzles[4][4];
for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
@@ -153,8 +170,12 @@ try_coalesce(nir_alu_instr *vec, unsigned start_idx, 
nir_shader *shader)
* instruction so we can re-swizzle that component to match.
*/
   write_mask |= 1 << i;
-  for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
- src_alu->src[j].swizzle[i] = swizzles[j][vec->src[i].swizzle[0]];
+  if (is_fdot(src_alu)) {
+ /* Since fdot splats, we don't need to do any reswizzling */
+  } else {
+ for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
+src_alu->src[j].swizzle[i] = swizzles[j][vec->src[i].swizzle[0]];
+  }
 
   /* Clear the no longer needed vec source */
   nir_instr_rewrite_src(>instr, >src[i].src, NIR_SRC_INIT);
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/11] nir/lower_vec_to_movs: Coalesce movs on-the-fly when possible

2015-09-09 Thread Jason Ekstrand
The old pass blindly inserted a bunch of moves into the shader with no
concern for whether or not it was really needed.  This adds code to try and
coalesce into the destination of the instruction providing the value.

Shader-db results for vec4 shaders on Haswell:

   total instructions in shared programs: 1801527 -> 1778849 (-1.26%)
   instructions in affected programs: 1205644 -> 1182966 (-1.88%)
   helped:9817
   HURT:  1334

This approach is heavily based on a different patch by Eduardo Lima Mitev
.  Eduardo's patch did this in a separate pass as opposed
to integrating it into nir_lower_vec_to_movs.

Cc: Eduardo Lima Mitev 
---
 src/glsl/nir/nir_lower_vec_to_movs.c | 85 
 1 file changed, 85 insertions(+)

diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c 
b/src/glsl/nir/nir_lower_vec_to_movs.c
index ed8ec9b..0ebf3e3 100644
--- a/src/glsl/nir/nir_lower_vec_to_movs.c
+++ b/src/glsl/nir/nir_lower_vec_to_movs.c
@@ -84,6 +84,88 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx, 
nir_shader *shader)
return mov->dest.write_mask;
 }
 
+/* Attempts to coalesce the "move" from the given source of the vec to the
+ * destination of the instruction generating the value. If, for whatever
+ * reason, we cannot coalesce the mmove, it does nothing and returns 0.  We
+ * can then call insert_mov as normal.
+ */
+static unsigned
+try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
+{
+   assert(start_idx < nir_op_infos[vec->op].num_inputs);
+
+   /* We will only even try if the source is SSA */
+   if (!vec->src[start_idx].src.is_ssa)
+  return 0;
+
+   assert(vec->src[start_idx].src.ssa);
+
+   /* If we are going to do a reswizzle, then the vecN operation must be the
+* only use of the source value.  We also can't have any source modifiers.
+*/
+   nir_foreach_use(vec->src[start_idx].src.ssa, src) {
+  if (src->parent_instr != >instr)
+ return 0;
+
+  nir_alu_src *alu_src = exec_node_data(nir_alu_src, src, src);
+  if (alu_src->abs || alu_src->negate)
+ return 0;
+   }
+
+   if (!list_empty(>src[start_idx].src.ssa->if_uses))
+  return 0;
+
+   if (vec->src[start_idx].src.ssa->parent_instr->type != nir_instr_type_alu)
+  return 0;
+
+   nir_alu_instr *src_alu =
+  nir_instr_as_alu(vec->src[start_idx].src.ssa->parent_instr);
+
+   /* We only care about being able to re-swizzle the instruction if it is
+* something that we can reswizzle.  It must be per-component.
+*/
+   if (nir_op_infos[src_alu->op].output_size != 0)
+  return 0;
+
+   /* If we are going to reswizzle the instruction, we can't have any
+* non-per-component sources either.
+*/
+   for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
+  if (nir_op_infos[src_alu->op].input_sizes[j] != 0)
+ return 0;
+
+   /* Stash off all of the ALU instruction's swizzles. */
+   uint8_t swizzles[4][4];
+   for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
+  for (unsigned i = 0; i < 4; i++)
+ swizzles[j][i] = src_alu->src[j].swizzle[i];
+
+   unsigned write_mask = 0;
+   for (unsigned i = start_idx; i < 4; i++) {
+  if (!(vec->dest.write_mask & (1 << i)))
+ continue;
+
+  if (!vec->src[i].src.is_ssa ||
+  vec->src[i].src.ssa != _alu->dest.dest.ssa)
+ continue;
+
+  /* At this point, the give vec source matchese up with the ALU
+   * instruction so we can re-swizzle that component to match.
+   */
+  write_mask |= 1 << i;
+  for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
+ src_alu->src[j].swizzle[i] = swizzles[j][vec->src[i].swizzle[0]];
+
+  /* Clear the no longer needed vec source */
+  nir_instr_rewrite_src(>instr, >src[i].src, NIR_SRC_INIT);
+   }
+
+   nir_instr_rewrite_dest(_alu->instr, _alu->dest.dest, 
vec->dest.dest);
+   src_alu->dest.write_mask = write_mask;
+
+   return write_mask;
+}
+
 static bool
 lower_vec_to_movs_block(nir_block *block, void *void_state)
 {
@@ -141,6 +223,9 @@ lower_vec_to_movs_block(nir_block *block, void *void_state)
 continue;
 
  if (!(finished_write_mask & (1 << i)))
+finished_write_mask |= try_coalesce(vec, i, state->shader);
+
+ if (!(finished_write_mask & (1 << i)))
 finished_write_mask |= insert_mov(vec, i, state->shader);
   }
 
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/11] nir: Only unlink sources that are actually valid

2015-09-09 Thread Jason Ekstrand
---
 src/glsl/nir/nir.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index 9a22fea..e173b21 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -714,9 +714,16 @@ nir_instr_insert(nir_cursor cursor, nir_instr *instr)
 }
 
 static bool
+src_is_valid(const nir_src *src)
+{
+   return src->is_ssa ? (src->ssa != NULL) : (src->reg.reg != NULL);
+}
+
+static bool
 remove_use_cb(nir_src *src, void *state)
 {
-   list_del(>use_link);
+   if (src_is_valid(src))
+  list_del(>use_link);
 
return true;
 }
@@ -1099,12 +1106,6 @@ nir_srcs_equal(nir_src src1, nir_src src2)
}
 }
 
-static bool
-src_is_valid(const nir_src *src)
-{
-   return src->is_ssa ? (src->ssa != NULL) : (src->reg.reg != NULL);
-}
-
 static void
 src_remove_all_uses(nir_src *src)
 {
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 00/11] Various vec4 NIR improvements

2015-09-09 Thread Jason Ekstrand
This patch series started out with me suggesting to Eduardo that his
coalescing pass would be easier if the shader was in partial SSA.  So I
started working towards getting the vec4 backend using partial SSA like the
FS backend does.  Then I started adding more stuff to lower_vec_to_movs and
the rest is history...

It's worth noting that I'm not the first one to attempt to do coalescing in
NIR.  This is heavily inspired by Eduardo Lima Mitev's patch that does
something similar.  The big differences are that a) His does it as a
separate pass and b) Mine takes advantage of partial SSA form.

Shader-db results for vec4 programs on Haswell:

   total instructions in shared programs: 1801527 -> 1751223 (-2.79%)
   instructions in affected programs: 1242113 -> 1191809 (-4.05%)
   helped:12078
   HURT:  6

Together with the RFC patches I sent out yesterday, we get the following
total delta between GLSL IR and NIR:

   total instructions in shared programs: 1853737 -> 1683930 (-9.16%)
   instructions in affected programs: 1694137 -> 1524330 (-10.02%)
   helped:12748
   HURT:  4348

Not sure if we want all these patches to land in their current form, but it
seems like our shader-db regression problem is totally solvable. :-)

Jason Ekstrand (11):
  nir: Fix a bunch of ralloc parenting errors
  nir: Remove the mem_ctx parameter from ssa_def_rewrite_uses
  nir: Only unlink sources that are actually valid
  nir: Add a function for rewriting instruction destinations
  nir/from_ssa: Use instr_rewrite_dest
  nir/lower_vec_to_movs: Add a state struct
  nir/lower_vec_to_movs: Handle partially SSA shaders
  i965/vec4_nir: Use partial SSA form rather than full non-SSA
  nir/lower_vec_to_movs: Get rid of start_idx and swizzle compacting
  nir/lower_vec_to_movs: Coalesce movs on-the-fly when possible
  HACK: nir/lower_vec_to_movs: Coalesce into destinations of fdot
instructions

 src/glsl/nir/nir.c|  61 ++---
 src/glsl/nir/nir.h|  17 +--
 src/glsl/nir/nir_control_flow.c   |   2 +-
 src/glsl/nir/nir_from_ssa.c   |  11 +-
 src/glsl/nir/nir_lower_alu_to_scalar.c|  12 +-
 src/glsl/nir/nir_lower_atomics.c  |   5 +-
 src/glsl/nir/nir_lower_idiv.c |   4 +-
 src/glsl/nir/nir_lower_io.c   |   5 +-
 src/glsl/nir/nir_lower_load_const_to_scalar.c |   3 +-
 src/glsl/nir/nir_lower_locals_to_regs.c   |  10 +-
 src/glsl/nir/nir_lower_phis_to_scalar.c   |   3 +-
 src/glsl/nir/nir_lower_system_values.c|   3 +-
 src/glsl/nir/nir_lower_vars_to_ssa.c  |   6 +-
 src/glsl/nir/nir_lower_vec_to_movs.c  | 173 ++
 src/glsl/nir/nir_opt_constant_folding.c   |   4 +-
 src/glsl/nir/nir_opt_cse.c|   6 +-
 src/glsl/nir/nir_opt_dead_cf.c|   4 +-
 src/glsl/nir/nir_opt_peephole_ffma.c  |   6 +-
 src/glsl/nir/nir_opt_peephole_select.c|   7 +-
 src/glsl/nir/nir_opt_remove_phis.c|   5 +-
 src/glsl/nir/nir_search.c |   2 +-
 src/mesa/drivers/dri/i965/brw_nir.c   |   2 +-
 src/mesa/drivers/dri/i965/brw_vec4.h  |   1 +
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp|  21 +++-
 24 files changed, 258 insertions(+), 115 deletions(-)

-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] meta: Use result of texture coordinate clamping operation

2015-09-09 Thread Ian Romanick
On 09/09/2015 04:55 PM, Anuj Phogat wrote:
> On Wed, Sep 9, 2015 at 11:30 AM, Ian Romanick  wrote:
>> I'm pretty sure our implementation of this extension is complete
>> rubbish.  I have attached an image from the blit-scaled test.  This
>> result cannot be useful to anyone.
>>
> Paul Berry and I agreed to follow the "NVIDIA Implementation Details"
> given in the extension spec. i965 driver currently use bilinear filtering
> for both SCALED_RESOLVE_FASTEST and SCALED_RESOLVE_NICEST.
> Do you think the implementation doesn't match what is suggested in
> the spec? I'll be happy to work on improving the implementation if you
> have any specific comments.
> 
> Extension spec mentions the potential loss of quality when using a
> single pass scaled resolve blit. Also the quality of image generated
> depends on sample count and scaling factor. I vaguely remember
> Paul mentioned the scaling factor of ~1.5, beyond which single pass
> scaled resolve blit quickly becomes useless. I think the attached
> image is using a scaling factor of 2.4.

Yes, I think so.  It was the last image produced by the test.

> I'm attaching an old image generated by me doing the quality
> comparison with Nvidia's proprietary driver implementation. I posted
> the image on mesa-dev before enabling the extension on i965. Please
> see the images under "New implementation" for the comparison.

Comparing our images with NVIDIA's image, I don't see very much
difference.  That's both reassuring and depressing.  Looking at all of
those images, I can't imagine why anyone would bother using this
functionality.  I doubt there's any scenario where the little "spikes"
that grow of the sides of the triangles is acceptable.  Oh well.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/10] i965: add ARB_shader_texture_image_samples support

2015-09-09 Thread Kenneth Graunke
On Thursday, August 27, 2015 11:48:29 PM Ilia Mirkin wrote:
> This should include everything. I sent a test for textureSamples to
> piglit a while ago, not sure how to test imageSamples -- apparently ms
> images aren't entirely supported on i965? But I'm not sure via what
> feat that happens.
> 
> I'll send some tests for compiler tests though.
> 
> Ilia Mirkin (10):
>   mesa: add infra for ARB_shader_texture_image_samples
>   glsl: add ir_texture_samples texture opcode
>   nir: add nir_texop_texture_samples and convert from glsl
>   glsl: add ARB_shader_texture_image_samples infrastructure
>   glsl: add support for the textureSamples function
>   i965: add support for textureSamples function
>   glsl: add support for the imageSamples function
>   nir: convert glsl imageSamples into a new intrinsic
>   i965: add handling for imageSamples
>   i965: enable ARB_shader_texture_image_samples

Series is:
Reviewed-by: Kenneth Graunke 

(with v2 of the one patch)


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] glsl/cs: Initialize gl_LocalInvocationIndex in main()

2015-09-09 Thread Ian Romanick
On 08/23/2015 01:50 AM, Jordan Justen wrote:
> We initialize gl_LocalInvocationIndex based on the extension spec
> formula:
> 
> gl_LocalInvocationIndex =
> gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
> gl_LocalInvocationID.y * gl_WorkGroupSize.x +
> gl_LocalInvocationID.x;
> 
> https://www.opengl.org/registry/specs/ARB/compute_shader.txt
> 
> Signed-off-by: Jordan Justen 
> ---
>  src/glsl/builtin_variables.cpp | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp
> index 8f8be90..082c73a 100644
> --- a/src/glsl/builtin_variables.cpp
> +++ b/src/glsl/builtin_variables.cpp
> @@ -1059,6 +1059,7 @@ builtin_variable_generator::generate_cs_special_vars()
> add_system_value(SYSTEM_VALUE_WORK_GROUP_ID, glsl_type::uvec3_type,
>  "gl_WorkGroupID");
> add_variable("gl_GlobalInvocationID", glsl_type::uvec3_type, ir_var_auto, 
> 0);
> +   add_variable("gl_LocalInvocationIndex", glsl_type::uint_type, 
> ir_var_auto, 0);
> /* TODO: finish this. */
>  }
>  
> @@ -1217,6 +1218,11 @@ _mesa_glsl_initialize_variables(exec_list 
> *instructions,
>  }
>  
>  
> +using ir_builder::swizzle_x;
> +using ir_builder::swizzle_y;
> +using ir_builder::swizzle_z;
> +
> +

Too man blank lines.

Also... I though that other places did 'using namespace ir_builder;' to
avoid all the ir_builder:: stuff.  It would certainly improve the
readability of the hunk below.

>  /**
>   * Initialize compute shader variables with values that are derived from 
> other
>   * compute shader variable.
> @@ -1249,6 +1255,28 @@ initialize_cs_derived_variables(gl_shader *shader,
>   gl_WorkGroupSize),
>   gl_LocalInvocationID));
> main_sig->body.push_head(inst);
> +
> +   /* gl_LocalInvocationIndex =
> +*gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
> +*gl_LocalInvocationID.y * gl_WorkGroupSize.x +
> +*gl_LocalInvocationID.x;
> +*/
> +   ir_expression *index_z =
> +  ir_builder::mul(ir_builder::mul(swizzle_z(gl_LocalInvocationID),
> +  swizzle_x(gl_WorkGroupSize)),
> +  swizzle_y(gl_WorkGroupSize));
> +   ir_expression *index_y =
> +  ir_builder::mul(swizzle_y(gl_LocalInvocationID),
> +  swizzle_x(gl_WorkGroupSize));
> +   ir_expression *index_y_plus_z = ir_builder::add(index_y, index_z);
> +   ir_builder::operand index_x(swizzle_x(gl_LocalInvocationID));
> +   ir_expression *index_x_plus_y_plus_z =
> +  ir_builder::add(index_y_plus_z, index_x);
> +   ir_variable *gl_LocalInvocationIndex =
> +  shader->symbols->get_variable("gl_LocalInvocationIndex");
> +   assert(gl_LocalInvocationIndex);
> +   inst = ir_builder::assign(gl_LocalInvocationIndex, index_x_plus_y_plus_z);
> +   main_sig->body.push_head(inst);
>  }
>  
>  

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] ralloc: Use __attribute__((destructor)) instead of atexit(3)

2015-09-09 Thread Ian Romanick
On 09/07/2015 06:54 AM, Jose Fonseca wrote:
> On 07/09/15 10:17, Jean-Sébastien Pédron wrote:
>> On 04.09.2015 01:37, Matt Turner wrote:
>>> You need to test for this support in configure.ac. It's as simple as
>>> adding a call to AX_GCC_FUNC_ATTRIBUTE in the existing alphabetized
>>> list and then a little bit of preprocessor in src/util/macros.h.
>>
>> Should the code fallbacks on atexit(3) if the attribute is not
>> supported?
> 
> At least on Windows, with MSVC,  atexit should be the right thing to do,
> since we statically link MSVC RunTime,
> 
> 
>> Can I use the HAVE_FUNC_ATTRIBUTE_DESTRUCTOR macro in
>> ralloc.c for this purpose?
> 
> For the record, another alternative (way more portable), is you have a
> simple .cpp file with a static destructior:

Arg. :(  We used to have code that used this in several places, but that
idiom has gradually been removed.  See c61bc6ed (although that was using
a constructor to initialize instead of a destructor to free).  I don't
care too much what idiom we pick, but I think we should pick one and
stick with it.

>   class AtExit
>   {
>   public:
>  ~AtExit() {
>  // do what must be done
>   }
>   };
> 
>   AtExit atExit();
> 
> 
> After all, it seems wrong to use non-standard C to replicate what
> standard C++ can do.
> 
> 
> Jose
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/9] i965: Silence unused parameter warnings in intel_mipmap_tree.c

2015-09-09 Thread Ian Romanick
From: Ian Romanick 

The target parameter of compute_msaa_layout appears to be unused since
83b83fb when support for CMS textures was added for Gen7.

The brw parameter of intel_get_non_msrt_mcs_alignment appears to be
unused since e92fbdc when the GEN check (along with the "can we fast
clear" decision) was moved to a different function.

intel_mipmap_tree.c: In function 'compute_msaa_layout':
intel_mipmap_tree.c:62:73: warning: unused parameter 'target' 
[-Wunused-parameter]
 compute_msaa_layout(struct brw_context *brw, mesa_format format, GLenum target,
 ^
intel_mipmap_tree.c: In function 'intel_get_non_msrt_mcs_alignment':
intel_mipmap_tree.c:143:54: warning: unused parameter 'brw' [-Wunused-parameter]
 intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
  ^

Signed-off-by: Ian Romanick 
Cc: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c   | 9 -
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h   | 3 +--
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index f5ecbb5..eb20173 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -204,7 +204,7 @@ brw_draw_rectlist(struct gl_context *ctx, struct rect 
*rect, int num_instances)
 }
 
 static void
-get_fast_clear_rect(struct brw_context *brw, struct gl_framebuffer *fb,
+get_fast_clear_rect(struct gl_framebuffer *fb,
 struct intel_renderbuffer *irb, struct rect *rect)
 {
unsigned int x_align, y_align;
@@ -226,7 +226,7 @@ get_fast_clear_rect(struct brw_context *brw, struct 
gl_framebuffer *fb,
* alignment size returned by intel_get_non_msrt_mcs_alignment(), but
* with X alignment multiplied by 16 and Y alignment multiplied by 32.
*/
-  intel_get_non_msrt_mcs_alignment(brw, irb->mt, _align, _align);
+  intel_get_non_msrt_mcs_alignment(irb->mt, _align, _align);
   x_align *= 16;
   y_align *= 32;
 
@@ -516,7 +516,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
  irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED;
  irb->need_downsample = true;
  fast_clear_buffers |= 1 << index;
- get_fast_clear_rect(brw, fb, irb, _clear_rect);
+ get_fast_clear_rect(fb, irb, _clear_rect);
  break;
 
   case REP_CLEAR:
@@ -653,7 +653,7 @@ get_resolve_rect(struct brw_context *brw,
 * by 8 and 16 and 8 and 8 for SKL.
 */
 
-   intel_get_non_msrt_mcs_alignment(brw, mt, _align, _align);
+   intel_get_non_msrt_mcs_alignment(mt, _align, _align);
if (brw->gen >= 9) {
   x_scaledown = x_align * 8;
   y_scaledown = y_align * 8;
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 19f66b7..485752f 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -59,7 +59,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
  * created, based on the chip generation and the surface type.
  */
 static enum intel_msaa_layout
-compute_msaa_layout(struct brw_context *brw, mesa_format format, GLenum target,
+compute_msaa_layout(struct brw_context *brw, mesa_format format,
 bool disable_aux_buffers)
 {
/* Prior to Gen7, all MSAA surfaces used IMS layout. */
@@ -140,8 +140,7 @@ compute_msaa_layout(struct brw_context *brw, mesa_format 
format, GLenum target,
  *   by half the block width, and Y coordinates by half the block height.
  */
 void
-intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
- struct intel_mipmap_tree *mt,
+intel_get_non_msrt_mcs_alignment(struct intel_mipmap_tree *mt,
  unsigned *width_px, unsigned *height)
 {
switch (mt->tiling) {
@@ -322,7 +321,7 @@ intel_miptree_create_layout(struct brw_context *brw,
if (num_samples > 1) {
   /* Adjust width/height/depth for MSAA */
   mt->msaa_layout = compute_msaa_layout(brw, format,
-mt->target, 
mt->disable_aux_buffers);
+mt->disable_aux_buffers);
   if (mt->msaa_layout == INTEL_MSAA_LAYOUT_IMS) {
  /* From the Ivybridge PRM, Volume 1, Part 1, page 108:
   * "If the surface is multisampled and it is a depth or stencil
@@ -1427,7 +1426,7 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw,
const mesa_format format = MESA_FORMAT_R_UINT32;
unsigned block_width_px;
unsigned block_height;
-   intel_get_non_msrt_mcs_alignment(brw, mt, _width_px, _height);
+   

[Mesa-dev] [PATCH 9/9] i915, i965: Silence unused parameter warnings in intel_batchbuffer_advance

2015-09-09 Thread Ian Romanick
From: Ian Romanick 

These only occurred in release builds, but they occurred in every file
that included intel_batchbuffer.h.  Lots of spam. :(

intel_batchbuffer.h: In function 'intel_batchbuffer_advance':
intel_batchbuffer.h:153:47: warning: unused parameter 'brw' [-Wunused-parameter]
 intel_batchbuffer_advance(struct brw_context *brw)
   ^

Signed-off-by: Ian Romanick 
---
 src/mesa/drivers/dri/i915/intel_batchbuffer.h | 2 ++
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/src/mesa/drivers/dri/i915/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i915/intel_batchbuffer.h
index feecc01..c4efa76 100644
--- a/src/mesa/drivers/dri/i915/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i915/intel_batchbuffer.h
@@ -128,6 +128,8 @@ intel_batchbuffer_advance(struct intel_context *intel)
   abort();
}
batch->total = 0;
+#else
+   (void) intel;
 #endif
 }
 
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index 84add92..2b177d3 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -162,6 +162,8 @@ intel_batchbuffer_advance(struct brw_context *brw)
   abort();
}
batch->total = 0;
+#else
+   (void) brw;
 #endif
 }
 
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/9] i915: Silence unused parameter warnings

2015-09-09 Thread Ian Romanick
From: Ian Romanick 

intel_mipmap_tree.c: In function 'old_intel_miptree_unmap_raw':
intel_mipmap_tree.c:726:51: warning: unused parameter 'intel' 
[-Wunused-parameter]
 intel_miptree_unmap_raw(struct intel_context *intel,
   ^

Signed-off-by: Ian Romanick 
---
 src/mesa/drivers/dri/i915/intel_mipmap_tree.c  | 7 +++
 src/mesa/drivers/dri/i915/intel_mipmap_tree.h  | 3 +--
 src/mesa/drivers/dri/i915/intel_tex_subimage.c | 2 +-
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i915/intel_mipmap_tree.c
index 1aa06c1..0fcc2b1 100644
--- a/src/mesa/drivers/dri/i915/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i915/intel_mipmap_tree.c
@@ -723,8 +723,7 @@ intel_miptree_map_raw(struct intel_context *intel, struct 
intel_mipmap_tree *mt)
 }
 
 void
-intel_miptree_unmap_raw(struct intel_context *intel,
-struct intel_mipmap_tree *mt)
+intel_miptree_unmap_raw(struct intel_mipmap_tree *mt)
 {
drm_intel_bo_unmap(mt->region->bo);
 }
@@ -778,7 +777,7 @@ intel_miptree_unmap_gtt(struct intel_context *intel,
unsigned int level,
unsigned int slice)
 {
-   intel_miptree_unmap_raw(intel, mt);
+   intel_miptree_unmap_raw(mt);
 }
 
 static void
@@ -833,7 +832,7 @@ intel_miptree_unmap_blit(struct intel_context *intel,
 {
struct gl_context *ctx = >ctx;
 
-   intel_miptree_unmap_raw(intel, map->mt);
+   intel_miptree_unmap_raw(map->mt);
 
if (map->mode & GL_MAP_WRITE_BIT) {
   bool ok = intel_miptree_blit(intel,
diff --git a/src/mesa/drivers/dri/i915/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i915/intel_mipmap_tree.h
index f6ed717..aab30ed 100644
--- a/src/mesa/drivers/dri/i915/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i915/intel_mipmap_tree.h
@@ -348,8 +348,7 @@ void i945_miptree_layout(struct intel_mipmap_tree *mt);
 void *intel_miptree_map_raw(struct intel_context *intel,
 struct intel_mipmap_tree *mt);
 
-void intel_miptree_unmap_raw(struct intel_context *intel,
- struct intel_mipmap_tree *mt);
+void intel_miptree_unmap_raw(struct intel_mipmap_tree *mt);
 
 void
 intel_miptree_map(struct intel_context *intel,
diff --git a/src/mesa/drivers/dri/i915/intel_tex_subimage.c 
b/src/mesa/drivers/dri/i915/intel_tex_subimage.c
index f11ef2e..4083d69 100644
--- a/src/mesa/drivers/dri/i915/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i915/intel_tex_subimage.c
@@ -102,7 +102,7 @@ intel_blit_texsubimage(struct gl_context * ctx,
   _mesa_error(ctx, GL_OUT_OF_MEMORY, "intelTexSubImage");
}
 
-   intel_miptree_unmap_raw(intel, temp_mt);
+   intel_miptree_unmap_raw(temp_mt);
 
bool ret;
 
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/9] i915, i965: Silence unused parameter warnings in intel_miptree_unmap_gtt

2015-09-09 Thread Ian Romanick
From: Ian Romanick 

intel_mipmap_tree.c: In function 'intel_miptree_unmap_gtt':
intel_mipmap_tree.c:777:34: warning: unused parameter 'map' [-Wunused-parameter]
struct intel_miptree_map *map,
  ^
intel_mipmap_tree.c:778:17: warning: unused parameter 'level' 
[-Wunused-parameter]
unsigned int level,
 ^
intel_mipmap_tree.c:779:17: warning: unused parameter 'slice' 
[-Wunused-parameter]
unsigned int slice)
 ^

Signed-off-by: Ian Romanick 
---
 src/mesa/drivers/dri/i915/intel_mipmap_tree.c | 8 ++--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 8 ++--
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i915/intel_mipmap_tree.c
index 0fcc2b1..307ad4d 100644
--- a/src/mesa/drivers/dri/i915/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i915/intel_mipmap_tree.c
@@ -771,11 +771,7 @@ intel_miptree_map_gtt(struct intel_context *intel,
 }
 
 static void
-intel_miptree_unmap_gtt(struct intel_context *intel,
-   struct intel_mipmap_tree *mt,
-   struct intel_miptree_map *map,
-   unsigned int level,
-   unsigned int slice)
+intel_miptree_unmap_gtt(struct intel_mipmap_tree *mt)
 {
intel_miptree_unmap_raw(mt);
 }
@@ -948,7 +944,7 @@ intel_miptree_unmap(struct intel_context *intel,
if (map->mt) {
   intel_miptree_unmap_blit(intel, mt, map, level, slice);
} else {
-  intel_miptree_unmap_gtt(intel, mt, map, level, slice);
+  intel_miptree_unmap_gtt(mt);
}
 
intel_miptree_release_map(mt, level, slice);
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index f68aba2..61c49de 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -2128,11 +2128,7 @@ intel_miptree_map_gtt(struct brw_context *brw,
 }
 
 static void
-intel_miptree_unmap_gtt(struct brw_context *brw,
-   struct intel_mipmap_tree *mt,
-   struct intel_miptree_map *map,
-   unsigned int level,
-   unsigned int slice)
+intel_miptree_unmap_gtt(struct intel_mipmap_tree *mt)
 {
intel_miptree_unmap_raw(mt);
 }
@@ -2738,7 +2734,7 @@ intel_miptree_unmap(struct brw_context *brw,
   intel_miptree_unmap_movntdqa(brw, mt, map, level, slice);
 #endif
} else {
-  intel_miptree_unmap_gtt(brw, mt, map, level, slice);
+  intel_miptree_unmap_gtt(mt);
}
 
intel_miptree_release_map(mt, level, slice);
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/9] i965: Silence unused parameter warnings in intel_fbo.c

2015-09-09 Thread Ian Romanick
From: Ian Romanick 

intel_fbo.c: In function 'intel_alloc_window_storage':
intel_fbo.c:415:48: warning: unused parameter 'ctx' [-Wunused-parameter]
 intel_alloc_window_storage(struct gl_context * ctx, struct gl_renderbuffer *rb,
^
intel_fbo.c: In function 'intel_nop_alloc_storage':
intel_fbo.c:428:74: warning: unused parameter 'rb' [-Wunused-parameter]
 intel_nop_alloc_storage(struct gl_context * ctx, struct gl_renderbuffer *rb,
  ^
intel_fbo.c:429:32: warning: unused parameter 'internalFormat' 
[-Wunused-parameter]
 GLenum internalFormat, GLuint width, GLuint height)
^
intel_fbo.c:429:55: warning: unused parameter 'width' [-Wunused-parameter]
 GLenum internalFormat, GLuint width, GLuint height)
   ^
intel_fbo.c:429:69: warning: unused parameter 'height' [-Wunused-parameter]
 GLenum internalFormat, GLuint width, GLuint height)
 ^
intel_fbo.c: In function 'intel_blit_framebuffer_with_blitter':
intel_fbo.c:790:61: warning: unused parameter 'filter' [-Wunused-parameter]
 GLbitfield mask, GLenum filter)
 ^

Signed-off-by: Ian Romanick 
---
 src/mesa/drivers/dri/i965/intel_fbo.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c 
b/src/mesa/drivers/dri/i965/intel_fbo.c
index 64d57e8..cc90a49 100644
--- a/src/mesa/drivers/dri/i965/intel_fbo.c
+++ b/src/mesa/drivers/dri/i965/intel_fbo.c
@@ -415,6 +415,7 @@ static GLboolean
 intel_alloc_window_storage(struct gl_context * ctx, struct gl_renderbuffer *rb,
GLenum internalFormat, GLuint width, GLuint height)
 {
+   (void) ctx;
assert(rb->Name == 0);
rb->Width = width;
rb->Height = height;
@@ -428,6 +429,10 @@ static GLboolean
 intel_nop_alloc_storage(struct gl_context * ctx, struct gl_renderbuffer *rb,
 GLenum internalFormat, GLuint width, GLuint height)
 {
+   (void) rb;
+   (void) internalFormat;
+   (void) width;
+   (void) height;
_mesa_problem(ctx, "intel_nop_alloc_storage should never be called.");
return false;
 }
@@ -787,7 +792,7 @@ intel_blit_framebuffer_with_blitter(struct gl_context *ctx,
 GLint srcX1, GLint srcY1,
 GLint dstX0, GLint dstY0,
 GLint dstX1, GLint dstY1,
-GLbitfield mask, GLenum filter)
+GLbitfield mask)
 {
struct brw_context *brw = brw_context(ctx);
 
@@ -907,7 +912,7 @@ intel_blit_framebuffer(struct gl_context *ctx,
mask = intel_blit_framebuffer_with_blitter(ctx, readFb, drawFb,
   srcX0, srcY0, srcX1, srcY1,
   dstX0, dstY0, dstX1, dstY1,
-  mask, filter);
+  mask);
if (mask == 0x0)
   return;
 
@@ -945,7 +950,7 @@ gen4_blit_framebuffer(struct gl_context *ctx,
mask = intel_blit_framebuffer_with_blitter(ctx, readFb, drawFb,
   srcX0, srcY0, srcX1, srcY1,
   dstX0, dstY0, dstX1, dstY1,
-  mask, filter);
+  mask);
if (mask == 0x0)
   return;
 
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/9] i965: Make intel_miptree_map_raw static

2015-09-09 Thread Ian Romanick
From: Ian Romanick 

This hasn't been used outside intel_mipmap_tree.c since d5d4ba9 started
using meta instead of the blitter for PBO TexSubImage.  While we're here,

1. s/intel_miptree_\([^_]\)_raw/miptree_\1_raw/g because we don't
generally prefix static functions.

2. Remove the unused brw parameter from the function formerly known as
intel_miptree_unmap_raw.

Signed-off-by: Ian Romanick 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 30 +++
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  6 --
 2 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 485752f..f68aba2 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -49,6 +49,11 @@
 
 #define FILE_DEBUG_FLAG DEBUG_MIPTREE
 
+static void *intel_miptree_map_raw(struct brw_context *brw,
+   struct intel_mipmap_tree *mt);
+
+static void intel_miptree_unmap_raw(struct intel_mipmap_tree *mt);
+
 static bool
 intel_miptree_alloc_mcs(struct brw_context *brw,
 struct intel_mipmap_tree *mt,
@@ -1400,7 +1405,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
 */
void *data = intel_miptree_map_raw(brw, mt->mcs_mt);
memset(data, 0xff, mt->mcs_mt->total_height * mt->mcs_mt->pitch);
-   intel_miptree_unmap_raw(brw, mt->mcs_mt);
+   intel_miptree_unmap_raw(mt->mcs_mt);
mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
 
return mt->mcs_mt;
@@ -2072,8 +2077,7 @@ intel_miptree_map_raw(struct brw_context *brw, struct 
intel_mipmap_tree *mt)
 }
 
 void
-intel_miptree_unmap_raw(struct brw_context *brw,
-struct intel_mipmap_tree *mt)
+intel_miptree_unmap_raw(struct intel_mipmap_tree *mt)
 {
drm_intel_bo_unmap(mt->bo);
 }
@@ -2130,7 +2134,7 @@ intel_miptree_unmap_gtt(struct brw_context *brw,
unsigned int level,
unsigned int slice)
 {
-   intel_miptree_unmap_raw(brw, mt);
+   intel_miptree_unmap_raw(mt);
 }
 
 static void
@@ -2191,7 +2195,7 @@ intel_miptree_unmap_blit(struct brw_context *brw,
 {
struct gl_context *ctx = >ctx;
 
-   intel_miptree_unmap_raw(brw, map->mt);
+   intel_miptree_unmap_raw(map->mt);
 
if (map->mode & GL_MAP_WRITE_BIT) {
   bool ok = intel_miptree_blit(brw,
@@ -2263,7 +2267,7 @@ intel_miptree_map_movntdqa(struct brw_context *brw,
   _mesa_streaming_load_memcpy(dst_ptr, src_ptr, width_bytes);
}
 
-   intel_miptree_unmap_raw(brw, mt);
+   intel_miptree_unmap_raw(mt);
 }
 
 static void
@@ -2312,7 +2316,7 @@ intel_miptree_map_s8(struct brw_context *brw,
 }
   }
 
-  intel_miptree_unmap_raw(brw, mt);
+  intel_miptree_unmap_raw(mt);
 
   DBG("%s: %d,%d %dx%d from mt %p %d,%d = %p/%d\n", __func__,
  map->x, map->y, map->w, map->h,
@@ -2348,7 +2352,7 @@ intel_miptree_unmap_s8(struct brw_context *brw,
 }
   }
 
-  intel_miptree_unmap_raw(brw, mt);
+  intel_miptree_unmap_raw(mt);
}
 
free(map->buffer);
@@ -2402,7 +2406,7 @@ intel_miptree_unmap_etc(struct brw_context *brw,
map->ptr, map->stride,
map->w, map->h, mt->etc_format);
 
-   intel_miptree_unmap_raw(brw, mt);
+   intel_miptree_unmap_raw(mt);
free(map->buffer);
 }
 
@@ -2472,8 +2476,8 @@ intel_miptree_map_depthstencil(struct brw_context *brw,
 }
   }
 
-  intel_miptree_unmap_raw(brw, s_mt);
-  intel_miptree_unmap_raw(brw, z_mt);
+  intel_miptree_unmap_raw(s_mt);
+  intel_miptree_unmap_raw(z_mt);
 
   DBG("%s: %d,%d %dx%d from z mt %p %d,%d, s mt %p %d,%d = %p/%d\n",
  __func__,
@@ -2532,8 +2536,8 @@ intel_miptree_unmap_depthstencil(struct brw_context *brw,
 }
   }
 
-  intel_miptree_unmap_raw(brw, s_mt);
-  intel_miptree_unmap_raw(brw, z_mt);
+  intel_miptree_unmap_raw(s_mt);
+  intel_miptree_unmap_raw(z_mt);
 
   DBG("%s: %d,%d %dx%d from z mt %p (%s) %d,%d, s mt %p %d,%d = %p/%d\n",
  __func__,
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index d91d4be..93cbd62 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -774,12 +774,6 @@ brw_miptree_layout(struct brw_context *brw,
struct intel_mipmap_tree *mt,
uint32_t layout_flags);
 
-void *intel_miptree_map_raw(struct brw_context *brw,
-struct intel_mipmap_tree *mt);
-
-void intel_miptree_unmap_raw(struct brw_context *brw,
- struct intel_mipmap_tree *mt);
-
 void
 intel_miptree_map(struct brw_context *brw,
  struct intel_mipmap_tree *mt,
-- 
2.1.0


[Mesa-dev] [PATCH 3/9] i915, i965: Silence unused parameter warnings in intel_mipmap_tree.h

2015-09-09 Thread Ian Romanick
From: Ian Romanick 

These only occurred in release builds, but they occurred in every file
that included intel_mipmap_tree.h.  Lots of spam. :(

intel_mipmap_tree.h: In function 'intel_miptree_check_level_layer':
intel_mipmap_tree.h:595:59: warning: unused parameter 'mt' [-Wunused-parameter]
 intel_miptree_check_level_layer(struct intel_mipmap_tree *mt,
   ^
intel_mipmap_tree.h:596:42: warning: unused parameter 'level' 
[-Wunused-parameter]
 uint32_t level,
  ^
intel_mipmap_tree.h:597:42: warning: unused parameter 'layer' 
[-Wunused-parameter]
 uint32_t layer)
  ^

Signed-off-by: Ian Romanick 
---
 src/mesa/drivers/dri/i915/intel_mipmap_tree.h | 4 
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 4 
 2 files changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/i915/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i915/intel_mipmap_tree.h
index 77b1f54..d711a29 100644
--- a/src/mesa/drivers/dri/i915/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i915/intel_mipmap_tree.h
@@ -285,6 +285,10 @@ intel_miptree_check_level_layer(struct intel_mipmap_tree 
*mt,
 uint32_t level,
 uint32_t layer)
 {
+   (void) mt;
+   (void) level;
+   (void) layer;
+
assert(level >= mt->first_level);
assert(level <= mt->last_level);
assert(layer < mt->level[level].depth);
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 81e5f52..d91d4be 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -596,6 +596,10 @@ intel_miptree_check_level_layer(struct intel_mipmap_tree 
*mt,
 uint32_t level,
 uint32_t layer)
 {
+   (void) mt;
+   (void) level;
+   (void) layer;
+
assert(level >= mt->first_level);
assert(level <= mt->last_level);
assert(layer < mt->level[level].depth);
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


  1   2   >