[Mesa-dev] [PATCH] Allow sampler arrays to have precision qualifier

2014-02-03 Thread Kevin Rogovin
Fixes a minor oversight where a precision qualifier was not
allowed for an array of samplers.

---
 src/glsl/ast_to_hir.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 1bfb4e5..08fd0d3 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -3397,7 +3397,8 @@ ast_declarator_list::hir(exec_list *instructions,
          && !var->type->is_sampler()
          && !(var->type->is_array()
               && (var->type->fields.array->is_float()
-                   || var->type->fields.array->is_integer()))) {
+                   || var->type->fields.array->is_integer()
+                   || var->type->fields.array->is_sampler()))) {
 
          _mesa_glsl_error(&loc, state,
                           "precision qualifiers apply only to floating point"
-- 
1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 26/35] meta: Use common GLSL code for blits

2014-02-03 Thread Rogovin, Kevin
Hi,

 We can't do stencil blits with GLSL because no driver that uses meta can
 do the GL_ARB_shader_stencil_export extension.  For depth and color
 blits, we can always write the values from the shader, and disable
 writes to the buffers using glColorMask and glDepthMask.  So... we just
 need one shader.

Should we add GL_ARB_shader_stencil_export as one of the extensions Mesa
can support? Naturally that is a major tangent from this, and I am not sure
what hardware beyond newer AMD hardware has that capability.

The shader writes the red channel of the fragment color to the fragment
depth, so should there not be three shaders (color only, depth only, and
color plus depth), with the last one using two samplers?

Also, should the case of blitting from MSAA to non-MSAA not be handled
correctly, i.e. by using the sampler type sampler2DMS and averaging over
the number of samples, or possibly with fancier filtering kernels?
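
For illustration, the averaging asked about here amounts to a box-filter
resolve over the samples of one pixel. The following is only a minimal C
sketch of that arithmetic, assuming the per-sample values were already
fetched (in a shader that would be one texelFetch per sample on a
sampler2DMS). The helper name resolve_average is hypothetical and is not
Mesa code.

```c
#include <stddef.h>

/* Box-filter resolve of one channel: the plain mean of all samples
 * that belong to a pixel. Returns 0 for an empty sample list.
 * Hypothetical helper for illustration only. */
static float
resolve_average(const float *samples, size_t num_samples)
{
   float sum = 0.0f;

   if (num_samples == 0)
      return 0.0f;

   for (size_t i = 0; i < num_samples; i++)
      sum += samples[i];

   return sum / (float)num_samples;
}
```

A fancier kernel would replace the uniform weights with per-sample
weights; the structure of the loop stays the same.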

Best Regards,
 -Kevin Rogovin

From: Ian Romanick [i...@freedesktop.org]
Sent: Sunday, February 02, 2014 5:45 PM
To: Rogovin, Kevin; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 26/35] meta: Use common GLSL code for blits

On 01/30/2014 09:51 AM, Rogovin, Kevin wrote:
 @@ -487,6 +486,7 @@ setup_shader_for_sampler(struct gl_context *ctx, struct glsl_sampler *sampler)
   "void main()\n"
   "{\n"
   "   gl_FragColor = %s(texSampler, %s);\n"
 + "   gl_FragDepth = gl_FragColor.x;\n"

 *IF* I understand what this is for, it provides the FBO blit API
 implementation from upstairs mesa. Should not a different shader be used
 based on the 8 possibilities (really 7, since all 0 means no-op) of the
 bits being up or down: COLOR_BUFFER_BIT, DEPTH_BUFFER_BIT and
 STENCIL_BUFFER_BIT? The above looks to me like it handles either
 COLOR_BUFFER_BIT or DEPTH_BUFFER_BIT up, but not both. With that in mind
 there would be up to 3 separate samplers (one for color, one for
 stencil and one for depth).

We can't do stencil blits with GLSL because no driver that uses meta can
do the GL_ARB_shader_stencil_export extension.  For depth and color
blits, we can always write the values from the shader, and disable
writes to the buffers using glColorMask and glDepthMask.  So... we just
need one shader.
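
A minimal C sketch of the one-shader idea described above, under the
assumption that the shader always writes both color and depth and the
blit mask only decides which writes are enabled. The BLIT_* names and
masks_for_blit are hypothetical; only the numeric values are those of
the corresponding GL bitfield constants. Stencil is left out because,
as stated, it cannot be written from the shader here.

```c
#include <stdbool.h>

/* Local names carrying the values of the GL buffer-bit constants. */
#define BLIT_DEPTH_BIT   0x00000100u  /* GL_DEPTH_BUFFER_BIT */
#define BLIT_STENCIL_BIT 0x00000400u  /* GL_STENCIL_BUFFER_BIT */
#define BLIT_COLOR_BIT   0x00004000u  /* GL_COLOR_BUFFER_BIT */

struct blit_write_masks {
   bool color;   /* value to pass to glColorMask for all four channels */
   bool depth;   /* value to pass to glDepthMask */
};

/* Map a blit mask to the write masks used with a single shader that
 * writes both gl_FragColor and gl_FragDepth. Stencil is ignored. */
static struct blit_write_masks
masks_for_blit(unsigned mask)
{
   struct blit_write_masks m;

   m.color = (mask & BLIT_COLOR_BIT) != 0;
   m.depth = (mask & BLIT_DEPTH_BIT) != 0;
   return m;
}
```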

 Or am I missing something?

 -Kevin




[Mesa-dev] [PATCH 4/5] i965/vec4: Emit shader w/a for Gen6 gather

2014-02-03 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 22 ++
 2 files changed, 23 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index e17b5cd..1cf74db 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -482,6 +482,7 @@ public:
 
uint32_t gather_channel(ir_texture *ir, int sampler);
src_reg emit_mcs_fetch(ir_texture *ir, src_reg coordinate, int sampler);
+   void emit_gen6_gather_wa(uint8_t wa, dst_reg dst);
void swizzle_result(ir_texture *ir, src_reg orig_val, int sampler);
 
void emit_ndc_computation();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index aa5fb6a..c03e4fb 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2499,9 +2499,31 @@ vec4_visitor::visit(ir_texture *ir)
   }
}
 
+   if (brw->gen == 6 && ir->op == ir_tg4 && key->tex.gen6_gather_wa[sampler]) {
+      emit_gen6_gather_wa(key->tex.gen6_gather_wa[sampler], inst->dst);
+   }
+
    swizzle_result(ir, src_reg(inst->dst), sampler);
 }
 
+/*
+ * Apply workarounds for Gen6 gather with UINT/SINT
+ */
+void
+vec4_visitor::emit_gen6_gather_wa(uint8_t wa, dst_reg dst)
+{
+   int width = (wa & WA_8BIT) ? 8 : 16;
+   dst_reg dst_f = dst;
+   dst_f.type = BRW_REGISTER_TYPE_F;
+   emit(MUL(dst_f, src_reg(dst_f), src_reg((float)((1 << width) - 1))));
+   emit(MOV(dst, src_reg(dst_f)));
+
+   if (wa & WA_SIGN) {
+  emit(SHL(dst, src_reg(dst), src_reg(32 - width)));
+  emit(ASR(dst, src_reg(dst), src_reg(32 - width)));
+   }
+}
+
 /**
  * Set up the gather channel based on the swizzle, for gather4.
  */
-- 
1.8.5.3



[Mesa-dev] [PATCH 1/5] i965: Add Gen6 gather wa to sampler key

2014-02-03 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_program.h | 11 +++
 src/mesa/drivers/dri/i965/brw_wm.c  | 20 
 2 files changed, 31 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_program.h 
b/src/mesa/drivers/dri/i965/brw_program.h
index 51182ea..c071b5b 100644
--- a/src/mesa/drivers/dri/i965/brw_program.h
+++ b/src/mesa/drivers/dri/i965/brw_program.h
@@ -24,6 +24,12 @@
 #ifndef BRW_PROGRAM_H
 #define BRW_PROGRAM_H
 
+enum gen6_gather_sampler_wa {
+   WA_SIGN = 1, /* whether we need to sign extend */
+   WA_8BIT = 2,  /* if we have an 8bit format needing wa */
+   WA_16BIT = 4, /* if we have a 16bit format needing wa */
+};
+
 /**
  * Sampler information needed by VS, WM, and GS program cache keys.
  */
@@ -50,6 +56,11 @@ struct brw_sampler_prog_key_data {
 * Whether this sampler uses the compressed multisample surface layout.
 */
uint16_t compressed_multisample_layout_mask;
+
+   /**
+* For Sandybridge, which shader w/a we need for gather quirks.
+*/
+   uint8_t gen6_gather_wa[MAX_SAMPLERS];
 };
 
 #ifdef __cplusplus
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c 
b/src/mesa/drivers/dri/i965/brw_wm.c
index a0758d2..97016c6 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -317,6 +317,20 @@ brw_wm_debug_recompile(struct brw_context *brw,
}
 }
 
+static uint8_t
+_gen6_gather_wa(GLenum internalformat)
+{
+   switch (internalformat) {
+  case GL_R8I: return WA_SIGN | WA_8BIT;
+  case GL_R8UI: return WA_8BIT;
+  case GL_R16I: return WA_SIGN | WA_16BIT;
+  case GL_R16UI: return WA_16BIT;
+  /* note that even though GL_R32I and GL_R32UI have format overrides
+   * in the surface state, there is no shader w/a required */
+  default: return 0;
+   }
+}
+
 void
 brw_populate_sampler_prog_key_data(struct gl_context *ctx,
   const struct gl_program *prog,
@@ -372,6 +386,12 @@ brw_populate_sampler_prog_key_data(struct gl_context *ctx,
          key->gather_channel_quirk_mask |= 1 << s;
  }
 
+         /* Gen6's gather4 is broken for UINT/SINT; we treat them as UNORM/FLOAT instead
+          * and fix it in the shader. */
+         if (brw->gen == 6 && prog->UsesGather) {
+            key->gen6_gather_wa[s] = _gen6_gather_wa(img->InternalFormat);
+         }
+
  /* If this is a multisample sampler, and uses the CMS MSAA layout,
   * then we need to emit slightly different code to first sample the
   * MCS surface.
-- 
1.8.5.3



[Mesa-dev] [PATCH 5/5] i965: Enable ARB_texture_gather for one component on Gen6.

2014-02-03 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_context.c  | 2 ++
 src/mesa/drivers/dri/i965/intel_extensions.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index d9d1ae5..b140d3d 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -367,6 +367,8 @@ brw_initialize_context_constants(struct brw_context *brw)
 
    if (brw->gen >= 7)
       ctx->Const.MaxProgramTextureGatherComponents = 4;
+   else if (brw->gen == 6)
+      ctx->Const.MaxProgramTextureGatherComponents = 1;
 
    ctx->Const.MinLineWidth = 1.0;
    ctx->Const.MinLineWidthAA = 1.0;
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 2b5ed0b..5fd6ec2 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -270,6 +270,7 @@ intelInitExtensions(struct gl_context *ctx)
       ctx->Extensions.ARB_texture_multisample = true;
       ctx->Extensions.ARB_sample_shading = true;
       ctx->Extensions.ARB_vertex_type_10f_11f_11f_rev = true;
+      ctx->Extensions.ARB_texture_gather = true;
 
       /* Test if the kernel has the ioctl. */
       if (drm_intel_reg_read(brw->bufmgr, TIMESTAMP, &dummy) == 0)
@@ -284,7 +285,6 @@ intelInitExtensions(struct gl_context *ctx)
}
 
    if (brw->gen >= 7) {
-      ctx->Extensions.ARB_texture_gather = true;
       ctx->Extensions.ARB_conservative_depth = true;
       ctx->Extensions.AMD_vertex_shader_layer = true;
       if (can_do_pipelined_register_writes(brw)) {
-- 
1.8.5.3



[Mesa-dev] [PATCH 2/5] i965: Add surface format overrides for Gen6 gather

2014-02-03 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 30 
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index dd96c9b..247b663 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -282,15 +282,35 @@ brw_update_texture_surface(struct gl_context *ctx,
surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
  6 * 4, 32, surf_offset);
 
-   (void) for_gather;   /* no w/a to apply for this gen */
+   uint32_t tex_format = translate_tex_format(brw, mt->format,
+                                              sampler->sRGBDecode);
+
+   if (for_gather) {
+  switch (tex_format) {
+  case BRW_SURFACEFORMAT_R8_SINT:
+  case BRW_SURFACEFORMAT_R8_UINT:
+ tex_format = BRW_SURFACEFORMAT_R8_UNORM;
+ break;
+
+  case BRW_SURFACEFORMAT_R16_SINT:
+  case BRW_SURFACEFORMAT_R16_UINT:
+ tex_format = BRW_SURFACEFORMAT_R16_UNORM;
+ break;
+
+  case BRW_SURFACEFORMAT_R32_SINT:
+  case BRW_SURFACEFORMAT_R32_UINT:
+ tex_format = BRW_SURFACEFORMAT_R32_FLOAT;
+ break;
+
+  default:
+ break;
+  }
+   }
 
    surf[0] = (translate_tex_target(tObj->Target) << BRW_SURFACE_TYPE_SHIFT |
               BRW_SURFACE_MIPMAPLAYOUT_BELOW << BRW_SURFACE_MIPLAYOUT_SHIFT |
               BRW_SURFACE_CUBEFACE_ENABLES |
-              (translate_tex_format(brw,
-                                    mt->format,
-                                    sampler->sRGBDecode) <<
-               BRW_SURFACE_FORMAT_SHIFT));
+              tex_format << BRW_SURFACE_FORMAT_SHIFT);
 
    surf[1] = intelObj->mt->region->bo->offset64 + intelObj->mt->offset; /* reloc */
 
-- 
1.8.5.3



[Mesa-dev] ARB_texture_gather support for Sandy Bridge

2014-02-03 Thread Chris Forbes
This series adds a bunch of workarounds to enable ARB_texture_gather
(in its more restrictive form) on Gen6 hardware.

These are necessary because Gen6's gather4 instruction doesn't work
correctly with integer or unsigned integer formats.

The approach is:

* For 32-bit wide formats, pretend the surface is FLOAT, and reinterpret
the bits as INT/UINT. This requires only a surface format override; nothing
in the shader.

* For 8- and 16-bit wide formats, pretend the surface is UNORM,
and recover the appropriate unsigned integer value by multiplying up,
and then converting to INT/UINT. If INT is required, then fix the sign
extension of the value by the usual SHL/ASR method.

This now passes all the applicable ARB_texture_gather piglit tests.
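
The 8- and 16-bit path can be illustrated on the CPU. The following is
only a C sketch of the arithmetic, not the driver code: the function
name gen6_gather_fixup is hypothetical, the flag values are copied from
the enum in patch 1/5, and rounding to nearest in the float-to-integer
step is an assumption of this sketch.

```c
#include <stdint.h>

/* Flag values as in the enum added by patch 1/5. */
enum { WA_SIGN = 1, WA_8BIT = 2, WA_16BIT = 4 };

/* 'unorm' is what the sampler returns for one channel when the R8 or
 * R16 integer surface is read through the UNORM format override. */
static int32_t
gen6_gather_fixup(float unorm, uint8_t wa)
{
   int width = (wa & WA_8BIT) ? 8 : 16;

   /* MUL: scale [0, 1] back up to [0, 2^width - 1]. */
   float scaled = unorm * (float)((1 << width) - 1);

   /* MOV float -> integer. Rounding to nearest is an assumption here. */
   uint32_t bits = (uint32_t)(scaled + 0.5f);

   if (wa & WA_SIGN) {
      /* SHL then ASR: move the value's sign bit up to bit 31, then
       * shift back. This relies on the compiler implementing >> on
       * negative values as an arithmetic shift, which mainstream
       * compilers do. */
      int shift = 32 - width;
      return (int32_t)(bits << shift) >> shift;
   }

   return (int32_t)bits;
}
```

For example, a stored GL_R8I byte of 0xC8 is sampled as 200/255, scaled
back to 200, and sign-extended to -56; the same byte in a GL_R8UI
texture stays 200.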



[Mesa-dev] [PATCH 3/5] i965/fs: Emit shader w/a for Gen6 gather

2014-02-03 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 26 ++
 2 files changed, 27 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 9c5c13a..3d668b9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -360,6 +360,7 @@ public:
   fs_reg shadow_comp, fs_reg lod, fs_reg lod2,
   fs_reg sample_index, fs_reg mcs, int sampler);
fs_reg emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, int sampler);
+   void emit_gen6_gather_wa(uint8_t wa, fs_reg dst);
fs_reg fix_math_operand(fs_reg src);
fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0);
fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0, fs_reg src1);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index d88d24c..109f2e8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1699,9 +1699,35 @@ fs_visitor::visit(ir_texture *ir)
   }
}
 
+   if (brw->gen == 6 && ir->op == ir_tg4 && c->key.tex.gen6_gather_wa[sampler]) {
+      emit_gen6_gather_wa(c->key.tex.gen6_gather_wa[sampler], dst);
+   }
+
swizzle_result(ir, dst, sampler);
 }
 
+/*
+ * Apply workarounds for Gen6 gather with UINT/SINT
+ */
+void
+fs_visitor::emit_gen6_gather_wa(uint8_t wa, fs_reg dst)
+{
+   int width = (wa & WA_8BIT) ? 8 : 16;
+
+   for (int i = 0; i < 4; i++) {
+      fs_reg dst_f = dst.retype(BRW_REGISTER_TYPE_F);
+      emit(MUL(dst_f, dst_f, fs_reg((float)((1 << width) - 1))));
+      emit(MOV(dst, dst_f));
+
+      if (wa & WA_SIGN) {
+ emit(SHL(dst, dst, fs_reg(32 - width)));
+ emit(ASR(dst, dst, fs_reg(32 - width)));
+  }
+
+  dst.reg_offset++;
+   }
+}
+
 /**
  * Set up the gather channel based on the swizzle, for gather4.
  */
-- 
1.8.5.3



Re: [Mesa-dev] ARB_texture_gather support for Sandy Bridge

2014-02-03 Thread Chris Forbes
One further note:

It initially looked like the USCALED/SSCALED surface formats might be
usable for this, and if so would reduce the required shader workaround
to just a single mov for the float to [u]int conversion -- but
unfortunately the sampler is broken for these as well.

-- Chris

On Mon, Feb 3, 2014 at 10:29 PM, Chris Forbes chr...@ijw.co.nz wrote:
 This series adds a bunch of workarounds to enable ARB_texture_gather
 (in its more restrictive form) on Gen6 hardware.

 These are necessary because Gen6's gather4 instruction doesn't work
 correctly with integer or unsigned integer formats.

 The approach is:

 * For 32-bit wide formats, pretend the surface is FLOAT, and reinterpret
 the bits as INT/UINT. This requires only a surface format override; nothing
 in the shader.

 * For 8- and 16-bit wide formats, pretend the surface is UNORM,
 and recover the appropriate unsigned integer value by multiplying up,
 and then converting to INT/UINT. If INT is required, then fix the sign
 extension of the value by the usual SHL/ASR method.

 This now passes all the applicable ARB_texture_gather piglit tests.



[Mesa-dev] [PATCH] i965/blorp: Fix hw blending coefficients

2014-02-03 Thread Topi Pohjolainen
Previously the color components were evaluated using the formula
(src_color * src_color + 0.0 * dst_color) and alpha in turn using
(1.0 * src_alpha + 0.0 * dst_alpha). The intention is to keep the
source color components unmodified and to force the alpha channel to a
fixed value of one regardless of source or destination.

If one modifies the piglit test gl-3.2-layered-rendering-blit to
use color component values other than zero or one, this change
will kick in on IVB. No regressions on IVB.
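
To illustrate why only values other than zero or one expose the
problem, here is a small C sketch of the ADD blend for one color
channel, using the color blend factors from this patch (SRC_COLOR and
ZERO before, ONE and ZERO after). The function names are hypothetical
and this is plain arithmetic, not the hardware state setup.

```c
/* ADD blend of a single channel: src_factor * src + dst_factor * dst. */
static float
blend_add(float src, float src_factor, float dst, float dst_factor)
{
   return src_factor * src + dst_factor * dst;
}

/* Before the patch: source factor SRC_COLOR, destination factor ZERO,
 * so the source value is multiplied by itself. */
static float
color_before(float src, float dst)
{
   return blend_add(src, src, dst, 0.0f);
}

/* After the patch: source factor ONE, destination factor ZERO, so the
 * source value passes through unmodified. */
static float
color_after(float src, float dst)
{
   return blend_add(src, 1.0f, dst, 0.0f);
}
```

With a source value of 0.5 the old factors yield 0.25 while the new
ones yield 0.5. For 0.0 and 1.0 both variants agree, since squaring
leaves those values unchanged.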

CC: Ian Romanick ian.d.roman...@intel.com
CC: Kenneth Graunke kenn...@whitecape.org
CC: Martin Steigerwald mar...@lichtvoll.de
CC: Eric Anholt e...@anholt.net
CC: mesa-sta...@lists.freedesktop.org
Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/gen6_blorp.cpp | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp 
b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
index 90b9fbb..281ea5c 100644
--- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
@@ -262,15 +262,27 @@ gen6_blorp_emit_blend_state(struct brw_context *brw,
    if (params->src.mt &&
        _mesa_get_format_bits(params->dst.mt->format, GL_ALPHA_BITS) > 0 &&
        _mesa_get_format_bits(params->src.mt->format, GL_ALPHA_BITS) == 0) {
+      /* The blending function cannot be set for the alpha channel alone. It
+       * is an addition to the basic color blending and hence both need to
+       * be enabled explicitly.
+       */
       blend->blend0.blend_enable = 1;
       blend->blend0.ia_blend_enable = 1;
 
       blend->blend0.blend_func = BRW_BLENDFUNCTION_ADD;
       blend->blend0.ia_blend_func = BRW_BLENDFUNCTION_ADD;
 
-      blend->blend0.source_blend_factor = BRW_BLENDFACTOR_SRC_COLOR;
+      /* Only alpha channel needs adjusting whereas the color components are
+       * to be evaluated regardless of the existing destination value. This
+       * can be achieved using formula: 1.0 * src_color + 0.0 * dst_color.
+       */
+      blend->blend0.source_blend_factor = BRW_BLENDFACTOR_ONE;
       blend->blend0.dest_blend_factor = BRW_BLENDFACTOR_ZERO;
-      blend->blend0.ia_source_blend_factor = BRW_BLENDFACTOR_ONE;
+      /* Forces alpha to fixed value of one by ignoring source value explicitly
+       * by replacing it with the constant one. The existing destination value
+       * in turn is ignored by using multiplier of zero: 1.0 + 0.0 * dst_color.
+       */
+      blend->blend0.ia_source_blend_factor = BRW_BLENDFACTOR_CONST_ALPHA;
       blend->blend0.ia_dest_blend_factor = BRW_BLENDFACTOR_ZERO;
    }
 
@@ -291,6 +303,12 @@ gen6_blorp_emit_cc_state(struct brw_context *brw,
   cc_state_offset);
memset(cc, 0, sizeof(*cc));
 
+   /* When using hw color blending for forcing alpha channel to one independent
+* of the source, the blender is told to refer to the constant table here
+* for the fixed value.
+*/
+   cc->constant_a = 1.0;
+
return cc_state_offset;
 }
 
-- 
1.8.3.1



Re: [Mesa-dev] [PATCH 1/4] r600g, radeonsi: force VRAM placement for DRI2 buffers

2014-02-03 Thread Marek Olšák
Do you mean something like:

write_domains = VRAM
read_domains = VRAM | GTT

I don't think it's perfect. We should treat rendering by an app and
blitting by u_blitter (= texture transfers) differently. We don't want
u_blitter to mess up our idea about which textures are used as a
framebuffer, because once u_blitter comes into play, everything can be
a framebuffer.

I've got the following idea:

1) Set the domain to VRAM | GTT by default for all buffers and textures.
2) In set_framebuffer_state, if u_blitter isn't blitting, do this to
the domains of all renderbuffers: if (domains & VRAM) domains &= ~GTT;
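
A minimal C sketch of step 2, assuming the domains are a plain bitmask.
The DOMAIN_* names and renderbuffer_domains are hypothetical stand-ins
for the RADEON_DOMAIN_* flags and for whatever set_framebuffer_state
would call; this is not existing driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for RADEON_DOMAIN_GTT / RADEON_DOMAIN_VRAM. */
enum { DOMAIN_GTT = 2, DOMAIN_VRAM = 4 };

/* Domains a renderbuffer should end up with when it is bound as a
 * framebuffer attachment. While u_blitter is blitting, nothing
 * changes, so texture transfers do not pin buffers to VRAM. */
static uint32_t
renderbuffer_domains(uint32_t domains, bool blitter_running)
{
   if (!blitter_running && (domains & DOMAIN_VRAM))
      domains &= ~(uint32_t)DOMAIN_GTT;

   return domains;
}
```

A buffer that was never allowed in VRAM keeps its GTT placement, which
is the case a shared buffer would rely on.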

That said, I'm not too excited about this. Allowing more stuff to be
moved to GTT will have an adverse effect on performance. Being more
aggressive and trying to put more stuff in VRAM should almost always
be better if you can avoid buffer ping-ponging. I have been experimenting
with the domains for a long time and what we have now seems to work best.
It's not perfect, but I have already tried all the other options and
they weren't better.

Marek

On Mon, Feb 3, 2014 at 4:13 AM, Michel Dänzer mic...@daenzer.net wrote:
 On Sat, 2014-02-01 at 15:08 +0100, Marek Olšák wrote:
 From: Marek Olšák marek.ol...@amd.com

 ---
  src/gallium/drivers/radeon/r600_texture.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/gallium/drivers/radeon/r600_texture.c 
 b/src/gallium/drivers/radeon/r600_texture.c
 index f80a6a2..434a651 100644
 --- a/src/gallium/drivers/radeon/r600_texture.c
 +++ b/src/gallium/drivers/radeon/r600_texture.c
 @@ -668,7 +668,7 @@ r600_texture_create_object(struct pipe_screen *screen,
   } else {
   resource->buf = buf;
   resource->cs_buf = rscreen->ws->buffer_get_cs_handle(buf);
 - resource->domains = RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM;
 + resource->domains = RADEON_DOMAIN_VRAM;

 This will force all textures read by a compositor using OpenGL to VRAM
 as well. I think we need to manage the domains separately for reads and
 writes.


 The other patches in the series look good to me.


 --
 Earthling Michel Dänzer|  http://www.amd.com
 Libre software enthusiast  |Mesa and X developer



Re: [Mesa-dev] [PATCH 2/2] pipe-loader: Add support for render nodes v2

2014-02-03 Thread David Herrmann
Hi

On Fri, Jan 31, 2014 at 9:58 PM, Tom Stellard t...@stellard.net wrote:
 On Thu, Jan 30, 2014 at 10:16:41PM +, Emil Velikov wrote:
 On 27/01/14 16:13, Tom Stellard wrote:
  From: Tom Stellard thomas.stell...@amd.com
 
  v2:
 - Add missing call to pipe_loader_drm_release()
 - Fix render node macros
 - Drop render-node configure option
  ---
 
  For reference, version 1 of this patch:
  http://lists.freedesktop.org/archives/mesa-dev/2013-October/047296.html
 
   .../auxiliary/pipe-loader/pipe_loader_drm.c| 80 
  +-
   1 file changed, 77 insertions(+), 3 deletions(-)
 
  diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c 
  b/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c
  index 9484db8..7a1af91 100644
  --- a/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c
  +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c
  @@ -49,6 +49,11 @@
   #include "util/u_dl.h"
   #include "util/u_debug.h"
 
  +#define DRM_RENDER_NODE_DEV_NAME_FORMAT "%s/renderD%d"
  +#define DRM_RENDER_NODE_MAX_NODES 63
  +#define DRM_RENDER_NODE_MIN_MINOR 128
  +#define DRM_RENDER_NODE_MAX_MINOR (DRM_RENDER_NODE_MIN_MINOR + DRM_RENDER_NODE_MAX_NODES)
  +
 Hello gents,
 Slightly silly question here, so please be gentle. Shouldn't the above
 defines come from a libdrm header?

 Similar to the following (from xf86drm.h)

 #define DRM_DIR_NAME  "/dev/dri"
 #define DRM_DEV_NAME  "%s/card%d"
 #define DRM_CONTROL_DEV_NAME  "%s/controlD%d"

 -Emil

 I'm not sure about this, maybe David has an opinion.

You should use libudev to search for device-nodes. Everything else I
don't really care about. If you want to hard-code the minors, feel
free to do that. So I don't mind adding it to libdrm, but I'd also be
fine if everyone does it themselves. It's a kernel API so it will
never change. And it's not really magic to do that..
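
For the hard-coded option, a minimal C sketch of building a render node
path from a minor number. The helper render_node_path and the
RENDER_NODE_* names are hypothetical; the format string and the minor
range mirror the defines in the quoted patch, and treating the upper
bound as inclusive is an assumption of this sketch. Enumerating devices
through libudev, as recommended above, is not shown.

```c
#include <stddef.h>
#include <stdio.h>

#define RENDER_NODE_DIR       "/dev/dri"
#define RENDER_NODE_MIN_MINOR 128
#define RENDER_NODE_MAX_MINOR (RENDER_NODE_MIN_MINOR + 63)

/* Writes a path such as "/dev/dri/renderD128" into buf.
 * Returns 0 on success, -1 if the minor is outside the render node
 * range or the buffer is too small. */
static int
render_node_path(char *buf, size_t len, int minor)
{
   int n;

   if (minor < RENDER_NODE_MIN_MINOR || minor > RENDER_NODE_MAX_MINOR)
      return -1;

   n = snprintf(buf, len, "%s/renderD%d", RENDER_NODE_DIR, minor);
   return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```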

Thanks
David


Re: [Mesa-dev] [PATCH 26/35] meta: Use common GLSL code for blits

2014-02-03 Thread Alex Deucher
On Mon, Feb 3, 2014 at 3:54 AM, Rogovin, Kevin kevin.rogo...@intel.com wrote:
 Hi,

 We can't do stencil blits with GLSL because no driver that uses meta can
 do the GL_ARB_shader_stencil_export extension.  For depth and color
 blits, we can always write the values from the shader, and disable
 writes to the buffers using glColorMask and glDepthMask.  So... we just
 need one shader.

 Should we add GL_ARB_shader_stencil_export as one of the extensions Mesa
 can support? Naturally that is a major tangent from this, and I am not sure
 what hardware beyond newer AMD hardware has that capability.

Mesa already supports it, at least for AMD hardware.

Alex


 The shader writes the red channel of the fragment color to the fragment
 depth, so should there not be three shaders (color only, depth only, and
 color plus depth), with the last one using two samplers?

 Also, should the case of blitting from MSAA to non-MSAA not be handled
 correctly, i.e. by using the sampler type sampler2DMS and averaging over
 the number of samples, or possibly with fancier filtering kernels?

 Best Regards,
  -Kevin Rogovin
 
 From: Ian Romanick [i...@freedesktop.org]
 Sent: Sunday, February 02, 2014 5:45 PM
 To: Rogovin, Kevin; mesa-dev@lists.freedesktop.org
 Subject: Re: [Mesa-dev] [PATCH 26/35] meta: Use common GLSL code for blits

 On 01/30/2014 09:51 AM, Rogovin, Kevin wrote:
 @@ -487,6 +486,7 @@ setup_shader_for_sampler(struct gl_context *ctx, struct glsl_sampler *sampler)
   "void main()\n"
   "{\n"
   "   gl_FragColor = %s(texSampler, %s);\n"
 + "   gl_FragDepth = gl_FragColor.x;\n"

 *IF* I understand what this is for, it provides the FBO blit API
 implementation from upstairs mesa. Should not a different shader be used
 based on the 8 possibilities (really 7, since all 0 means no-op) of the
 bits being up or down: COLOR_BUFFER_BIT, DEPTH_BUFFER_BIT and
 STENCIL_BUFFER_BIT? The above looks to me like it handles either
 COLOR_BUFFER_BIT or DEPTH_BUFFER_BIT up, but not both. With that in mind
 there would be up to 3 separate samplers (one for color, one for
 stencil and one for depth).

 We can't do stencil blits with GLSL because no driver that uses meta can
 do the GL_ARB_shader_stencil_export extension.  For depth and color
 blits, we can always write the values from the shader, and disable
 writes to the buffers using glColorMask and glDepthMask.  So... we just
 need one shader.

 Or am I missing something?

 -Kevin




[Mesa-dev] [PATCH] radeon/uvd: fix feedback buffer handling

2014-02-03 Thread Christian König
From: Christian König christian.koe...@amd.com

Without the correct feedback buffer size UVD runs
into an error on each frame, reducing the maximum FPS.
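
As a rough illustration of the layout this patch relies on, here is a C
sketch in which one mapping holds the message at offset 0 and the
feedback area at FB_BUFFER_OFFSET, with the first dword of the feedback
area carrying its size. The struct msg_fb_view and both helpers are
hypothetical names; only the two constants come from the patch.

```c
#include <stdint.h>

#define FB_BUFFER_OFFSET 0x1000
#define FB_BUFFER_SIZE   2048

/* Two views into one mapped message/feedback buffer. */
struct msg_fb_view {
   void     *msg;   /* message, starts at offset 0 */
   uint32_t *fb;    /* feedback area, starts at FB_BUFFER_OFFSET */
};

/* Split a CPU mapping of the combined buffer into its two parts. */
static struct msg_fb_view
split_msg_fb(void *ptr)
{
   struct msg_fb_view v;

   v.msg = ptr;
   v.fb = (uint32_t *)((uint8_t *)ptr + FB_BUFFER_OFFSET);
   return v;
}

/* Store the feedback buffer size in its first dword before the
 * buffer is handed to the hardware. */
static void
set_feedback_size(struct msg_fb_view v)
{
   v.fb[0] = FB_BUFFER_SIZE;
}
```

The cast through uint8_t * keeps the offset arithmetic in standard C;
arithmetic directly on a void pointer is a compiler extension.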

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/drivers/radeon/radeon_uvd.c | 38 +++--
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c
index 95757e3..6ac2199 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -58,6 +58,9 @@
 #define NUM_H264_REFS 17
 #define NUM_VC1_REFS 5
 
+#define FB_BUFFER_OFFSET 0x1000
+#define FB_BUFFER_SIZE 2048
+
 /* UVD buffer representation */
 struct ruvd_buffer
 {
@@ -81,6 +84,7 @@ struct ruvd_decoder {
 
struct ruvd_buffer  msg_fb_buffers[NUM_BUFFERS];
struct ruvd_msg *msg;
+   uint32_t*fb;
 
struct ruvd_buffer  bs_buffers[NUM_BUFFERS];
void*   bs_ptr;
@@ -132,15 +136,20 @@ static void send_cmd(struct ruvd_decoder *dec, unsigned 
cmd,
 }
 
 /* map the next available message buffer */
-static void map_msg_buf(struct ruvd_decoder *dec)
+static void map_msg_fb_buf(struct ruvd_decoder *dec)
 {
struct ruvd_buffer* buf;
+   void *ptr;
 
-   /* grap the current message buffer */
+   /* grap the current message/feedback buffer */
    buf = &dec->msg_fb_buffers[dec->cur_buffer];
 
-   /* copy the message into it */
-   dec->msg = dec->ws->buffer_map(buf->cs_handle, dec->cs, PIPE_TRANSFER_WRITE);
+   /* and map it for CPU access */
+   ptr = dec->ws->buffer_map(buf->cs_handle, dec->cs, PIPE_TRANSFER_WRITE);
+
+   /* calc buffer offsets */
+   dec->msg = ptr;
+   dec->fb = ptr + FB_BUFFER_OFFSET;
 }
 
 /* unmap and send a message command to the VCPU */
@@ -148,8 +157,8 @@ static void send_msg_buf(struct ruvd_decoder *dec)
 {
struct ruvd_buffer* buf;
 
-   /* ignore the request if message buffer isn't mapped */
-   if (!dec->msg)
+   /* ignore the request if message/feedback buffer isn't mapped */
+   if (!dec->msg || !dec->fb)
return;
 
/* grap the current message buffer */
@@ -157,6 +166,8 @@ static void send_msg_buf(struct ruvd_decoder *dec)
 
/* unmap the buffer */
    dec->ws->buffer_unmap(buf->cs_handle);
+   dec->msg = NULL;
+   dec->fb = NULL;
 
    /* and send it to the hardware */
    send_cmd(dec, RUVD_CMD_MSG_BUFFER, buf->cs_handle, 0,
@@ -644,7 +655,7 @@ static void ruvd_destroy(struct pipe_video_codec *decoder)
 
assert(decoder);
 
-   map_msg_buf(dec);
+   map_msg_fb_buf(dec);
    memset(dec->msg, 0, sizeof(*dec->msg));
    dec->msg->size = sizeof(*dec->msg);
    dec->msg->msg_type = RUVD_MSG_DESTROY;
@@ -773,7 +784,7 @@ static void ruvd_end_frame(struct pipe_video_codec *decoder,
    memset(dec->bs_ptr, 0, bs_size - dec->bs_size);
    dec->ws->buffer_unmap(bs_buf->cs_handle);
 
-   map_msg_buf(dec);
+   map_msg_fb_buf(dec);
    dec->msg->size = sizeof(*dec->msg);
    dec->msg->msg_type = RUVD_MSG_DECODE;
    dec->msg->stream_handle = dec->stream_handle;
@@ -813,6 +824,10 @@ static void ruvd_end_frame(struct pipe_video_codec 
*decoder,
 
    dec->msg->body.decode.db_surf_tile_config = dec->msg->body.decode.dt_surf_tile_config;
    dec->msg->body.decode.extension_support = 0x1;
+
+   /* set at least the feedback buffer size */
+   dec->fb[0] = FB_BUFFER_SIZE;
+
send_msg_buf(dec);
 
    send_cmd(dec, RUVD_CMD_DPB_BUFFER, dec->dpb.cs_handle, 0,
@@ -822,7 +837,7 @@ static void ruvd_end_frame(struct pipe_video_codec *decoder,
send_cmd(dec, RUVD_CMD_DECODING_TARGET_BUFFER, dt, 0,
 RADEON_USAGE_WRITE, RADEON_DOMAIN_VRAM);
    send_cmd(dec, RUVD_CMD_FEEDBACK_BUFFER, msg_fb_buf->cs_handle,
-0x1000, RADEON_USAGE_WRITE, RADEON_DOMAIN_GTT);
+FB_BUFFER_OFFSET, RADEON_USAGE_WRITE, RADEON_DOMAIN_GTT);
set_reg(dec, RUVD_ENGINE_CNTL, 1);
 
flush(dec);
@@ -898,7 +913,8 @@ struct pipe_video_codec *ruvd_create_decoder(struct 
pipe_context *context,
 
bs_buf_size = width * height * 512 / (16 * 16);
    for (i = 0; i < NUM_BUFFERS; ++i) {
-       unsigned msg_fb_size = align(sizeof(struct ruvd_msg), 0x1000) + 0x1000;
+       unsigned msg_fb_size = FB_BUFFER_OFFSET + FB_BUFFER_SIZE;
+       assert(sizeof(struct ruvd_msg) <= FB_BUFFER_OFFSET);
        if (!create_buffer(dec, &dec->msg_fb_buffers[i], msg_fb_size)) {
            RUVD_ERR("Can't allocated message buffers.\n");
goto error;
@@ -920,7 +936,7 @@ struct pipe_video_codec *ruvd_create_decoder(struct 
pipe_context *context,
 
    clear_buffer(dec, &dec->dpb);
 
-   map_msg_buf(dec);
+   map_msg_fb_buf(dec);
    dec->msg->size = sizeof(*dec->msg);

Re: [Mesa-dev] [PATCH 1/4] r600g, radeonsi: force VRAM placement for DRI2 buffers

2014-02-03 Thread Axel Davy

I don't know of any way to know if the buffer should be in GTT.

There could also be the case where the buffer is first not shared with 
another device,
and then is shared with another device. So at first we shouldn't have 
forced the buffer to stay in VRAM.


Just to confirm, I have tested your patch as is with my Wayland Prime 
patches, and as expected it doesn't work well anymore.


Axel Davy

On 03/02/2014, Marek Olšák wrote :

Well, it's kinda obvious what would happen. The buffer would be
relocated to VRAM.

Do you have any suggestion how to find out if the buffer should be in
GTT instead?

Marek

On Mon, Feb 3, 2014 at 12:13 AM, Axel Davy axel.d...@ens.fr wrote:

From: Marek Olšák marek.olsak at amd.com

---
src/gallium/drivers/radeon/r600_texture.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/r600_texture.c
b/src/gallium/drivers/radeon/r600_texture.c
index f80a6a2..434a651 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -668,7 +668,7 @@ r600_texture_create_object(struct pipe_screen *screen,
} else {
resource->buf = buf;
resource->cs_buf = rscreen->ws->buffer_get_cs_handle(buf);
-   resource->domains = RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM;
+   resource->domains = RADEON_DOMAIN_VRAM;
}

if (rtex-cmask.size) {
--
1.8.3.2

If I understand correctly, this code is always executed after importing a
buffer with a handle, a GEM name or a prime fd.

Given the imported buffer can be shared with another card (in the prime fd
case), I imagine there are some cases where the buffer must stay in GTT.

What would happen in this case with your patch?


Axel Davy




Re: [Mesa-dev] Stable 10.0 branch updated for imminent release

2014-02-03 Thread Carl Worth
Matt Turner matts...@gmail.com writes:
 We definitely want to apply "glx: Update glxext.h to revision 24777."
 in time for the next 10.0 release. It fixes a really annoying problem
 with the GL headers that makes distributions' lives awful.

Thanks. I'm glad I asked people to point out anything I missed. This is
exactly the kind of thing I'm looking for.

 I sent this to mesa-stable@ on the 25th. It's slightly different
 from what's on master due to some other patches being applied to one
 branch but not the other.

Oh, I see. This was still in my queue of patches, but I had categorized
it along with "Things that have been emailed to mesa-stable@ but haven't
yet appeared in master."

Since we do have a lot of patches to mesa-stable@ that are in that
category, it would probably help to have something to more easily
distinguish this case (which is "This is a backport of a patch which
already appears in master.").

I'd like to recommend a note in the commit message along the lines of:

(Backported from commit 3f3aafbfeeb3939cb5cf710954ccefb8bbe9cff9)

Independently, it will be nice to have that commit identifier in the
history (much like we already get for things we cherry pick with "git
cherry-pick -x").

I've now added that note to this commit and applied it to 10.0.

Thanks again!

-Carl

-- 
carl.d.wo...@intel.com




Re: [Mesa-dev] [PATCH 09/13] r300g, r600g, radeonsi: add support for ARB_buffer_storage

2014-02-03 Thread Christian König

Is there a way to do the HDP flush at the bottom of the pipe? Can this
be fixed in the kernel?


I'm not 100% sure, but I don't think so. As far as I know the only thing 
you can do on the bottom of the pipe is writing a fence/timestamp value 
or signaling a semaphore (the different EOP events).



For now, it would be safer to use GTT for all persistent mappings.
I'll update the patch.


That sounds like a good idea anyway. Persistent mappings only seem to 
make sense for data that the CPU accesses a lot, and VRAM isn't a 
good choice for that.


Christian.

On 02.02.2014 19:29, Marek Olšák wrote:

Is there a way to do the HDP flush at the bottom of the pipe? Can this
be fixed in the kernel?

For now, it would be safer to use GTT for all persistent mappings.
I'll update the patch.

Marek

On Sat, Feb 1, 2014 at 11:58 AM, Christian König
deathsim...@vodafone.de wrote:

On 31.01.2014 21:36, Alex Deucher wrote:


On Fri, Jan 31, 2014 at 7:05 AM, Marek Olšák mar...@gmail.com wrote:

I think we always flush the HDP cache after (before?) command submission.


The kernel flushes the HDP cache in the fence command sequence.


But we do this at the top of the pipe instead of the bottom and then
silently assume that between the top and the bottom the CPU isn't accessing
this VRAM buffer.

At least theoretically this can lead to a problem if the CPU wants to read a
buffer value that's in the same cache line as a value the GPU writes. For
example if a stupid application tries something like
while(!value_written_by_GPU_in_VRAM); this won't work correctly.

Christian.



Alex


This patch adds nothing new to the drivers - we have always had
persistent buffer mappings for all buffers and it has always worked.
The only thing this does is that persistent mappings are now also
supported by Gallium and OpenGL.

If there is a missing cache flush somewhere, the new
ARB_buffer_storage piglit test will show it.

Marek

On Fri, Jan 31, 2014 at 2:04 AM, Michel Dänzer mic...@daenzer.net
wrote:

On Thu, 2014-01-30 at 23:46 +0100, Fredrik Höglund wrote:

On Thursday 30 January 2014, Marek Olšák wrote:

From: Marek Olšák marek.ol...@amd.com

All GTT memory mappings are coherent and therefore can be persistent.

As we discussed on IRC, I think there should be a comment somewhere
explaining that VRAM mappings are uncached, so the memory_barrier
implementations don't need to do anything for those.

VRAM is mapped uncacheable by the CPU, but there is an HDP cache which
must be flushed to ensure coherency between the CPU and GPU. So I
suspect memory_barrier actually needs to flush the HDP cache for VRAM.

I'm wondering about GTT mappings on AGP as well. I think we're using CPU
write-combining for those, so we probably need to flush the
write-combining buffers?


--
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer




Re: [Mesa-dev] Stable 10.0 branch updated for imminent release

2014-02-03 Thread Carl Worth
Matt Turner matts...@gmail.com writes:
 Another couple 10.0.3 requests

 - fcefdc9a595c5 (see https://bugs.freedesktop.org/show_bug.cgi?id=73900)
 - 1cdeeef6c400 (see https://bugs.freedesktop.org/show_bug.cgi?id=73902)

Thanks. Applied.

-Carl

-- 
carl.d.wo...@intel.com




Re: [Mesa-dev] [PATCH] i965: Drop need_workaround_flush flag; always do post-sync workaround.

2014-02-03 Thread Eric Anholt
Kenneth Graunke kenn...@whitecape.org writes:

 The need_workaround_flush flag was an attempt to avoid doing the
 post-sync non-zero workaround flush unless it was actually necessary.

 Getting the tracking right is tricky; we've messed it up several times.

 The cost of getting this wrong is high: almost certainly a GPU hang.
 The cost of always doing the flush is low: it's just another flush,
 and on Sandybridge, we're doing piles of them anyway.  And every time
 we've added more post-sync non-zero workarounds, the system has become
 more stable.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_context.h   |  1 -
  src/mesa/drivers/dri/i965/brw_draw.c  |  3 ---
  src/mesa/drivers/dri/i965/gen6_blorp.cpp  |  6 --
  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 12 
  4 files changed, 22 deletions(-)

 People are still getting daily GPU hangs...in the past, adding post-sync
 workarounds has helped a lot.  Emitting more of them should be safe to do
 (the post-sync workaround doesn't require any preceding PIPE_CONTROLs),
 and may help.  Worth a try at least...?

If we were going to do this, I'd rather drop all the callers of
emit_post_sync_nonzero_flush() and just call it once at the top of each
primitive.  Emitting the same stall multiple times per primitive, which
this patch will do, seems silly.




[Mesa-dev] gl_ViewportIndex/EmitVertex semantics question

2014-02-03 Thread Ilia Mirkin
Hi Ian, mesa-dev,

I've been trying to implement ARB_viewport_array support on nv50, with
little success. Christoph pointed out that if I write the
gl_ViewportIndex value before each emit (rather than once at the top,
as the piglit tests do it), everything magically starts working.

I glanced at http://www.opengl.org/sdk/docs/manglsl/xhtml/gl_ViewportIndex.xml
which is a bit inconclusive: "The viewport index used will come from
one of the vertices in the primitive being shaded. However, which
vertex the viewport index comes from is implementation-defined, and so
it is recommended to use the same viewport index for all vertices of a
primitive." And there's various text about how if it's unset, it's
undefined. But it's unclear what "set" means -- does it become unset
after an EmitVertex() call or not?

Actually Christoph just pointed out that EmitVertex says: "Emit the
current values of output variables to the current output primitive. On
return from this call, the values of output variables are undefined."
So I guess the piglit tests should change?

Could someone with actual understanding of GLSL/etc take a look and
let me know whether I need to register gl_ViewportIndex (and, in all
probability, gl_Layer) writes and then write the registered value on
every EmitVertex in the driver, or whether the piglit tests should be
changed? (An observation is that I did _not_ need to move the 'color'
out into the loop in render-scissor, just gl_ViewportIndex -- but that
could be an aberration of the nvidia geometry program execution
logic.)

Thanks,

  -ilia


[Mesa-dev] [PATCH] draw: fix incorrect color of flat-shaded clipped lines

2014-02-03 Thread Brian Paul
When we clipped a line we weren't copying the provoking vertex
color to the second vertex.  We also weren't checking for
first vs. last provoking vertex.

Fixes failures found with the new piglit line-flat-clip-color test.

Cc: 10.0, 10.1 mesa-sta...@lists.freedesktop.org
---
 src/gallium/auxiliary/draw/draw_pipe_clip.c |   13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c 
b/src/gallium/auxiliary/draw/draw_pipe_clip.c
index de354e9..d8b56de 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_clip.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c
@@ -588,7 +588,12 @@ do_clip_line( struct draw_stage *stage,
 
if (v0->clipmask) {
   interp( clipper, stage->tmp[0], t0, v0, v1, viewport_index );
-  copy_flat(stage, stage->tmp[0], v0);
+  if (stage->draw->rasterizer->flatshade_first) {
+ copy_flat(stage, stage->tmp[0], v0);  /* copy v0 color to tmp[0] */
+  }
+  else {
+ copy_flat(stage, stage->tmp[0], v1);  /* copy v1 color to tmp[0] */
+  }
   newprim.v[0] = stage->tmp[0];
}
else {
@@ -597,6 +602,12 @@ do_clip_line( struct draw_stage *stage,
 
if (v1->clipmask) {
   interp( clipper, stage->tmp[1], t1, v1, v0, viewport_index );
+  if (stage->draw->rasterizer->flatshade_first) {
+ copy_flat(stage, stage->tmp[1], v0);  /* copy v0 color to tmp[1] */
+  }
+  else {
+ copy_flat(stage, stage->tmp[1], v1);  /* copy v1 color to tmp[1] */
+  }
   newprim.v[1] = stage->tmp[1];
}
else {
-- 
1.7.10.4
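
The rule the patch applies can be sketched on its own: whenever an endpoint is replaced by a clipped vertex, the new vertex takes the flat attributes of the provoking vertex, which is v0 for first-vertex convention and v1 otherwise. This is a minimal sketch, not the draw module's code: `struct vert_sketch`, `provoking_color()` and `clip_line_flat()` are hypothetical, the color is reduced to one integer, and position interpolation is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced vertex: one flat attribute plus a clip flag. */
struct vert_sketch {
    int  color;    /* flat-shaded attribute */
    bool clipped;  /* true if this endpoint lies outside the clip volume */
};

/* Flat attributes come from the provoking vertex of the original line. */
static int provoking_color(const struct vert_sketch *v0,
                           const struct vert_sketch *v1,
                           bool flatshade_first)
{
    return flatshade_first ? v0->color : v1->color;
}

/* Build the two output vertices. A clipped endpoint is replaced by a new
 * vertex, which must carry the provoking vertex's color. */
static void clip_line_flat(const struct vert_sketch *v0,
                           const struct vert_sketch *v1,
                           bool flatshade_first,
                           struct vert_sketch out[2])
{
    out[0] = *v0;
    out[1] = *v1;

    if (v0->clipped) {
        out[0].clipped = false;
        out[0].color = provoking_color(v0, v1, flatshade_first);
    }
    if (v1->clipped) {
        out[1].clipped = false;
        out[1].color = provoking_color(v0, v1, flatshade_first);
    }
}
```

With last-vertex convention and both endpoints clipped, both outputs end up with v1's color, which is the case the old code got wrong.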



[Mesa-dev] [PATCH] i965: Fix INTEL_DEBUG=vs for fixed-function/ARB programs.

2014-02-03 Thread Kenneth Graunke
Since commit 9cee3ff562f3e4b51bfd30338fd1ba7716ac5737, INTEL_DEBUG=vs
has caused a NULL pointer dereference for fixed-function/ARB programs.

In the vec4 generators, prog is a gl_program, and shader_prog is the
gl_shader_program.  This is different than the FS visitor.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp  | 4 ++--
 src/mesa/drivers/dri/i965/gen8_vec4_generator.cpp | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 94d1e79..e5ed2c2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1255,7 +1255,7 @@ vec4_generator::generate_code(exec_list *instructions)
const void *last_annotation_ir = NULL;
 
if (unlikely(debug_flag)) {
-  if (prog) {
+  if (shader_prog) {
  printf("Native code for vertex shader %d:\n", shader_prog->Name);
   } else {
  printf("Native code for vertex program %d:\n", prog->Id);
@@ -1271,7 +1271,7 @@ vec4_generator::generate_code(exec_list *instructions)
last_annotation_ir = inst->ir;
if (last_annotation_ir) {
   printf("   ");
-   if (prog) {
+   if (shader_prog) {
   ((ir_instruction *) last_annotation_ir)->print();
} else {
   const prog_instruction *vpi;
diff --git a/src/mesa/drivers/dri/i965/gen8_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/gen8_vec4_generator.cpp
index 5a34858..03ed05a 100644
--- a/src/mesa/drivers/dri/i965/gen8_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/gen8_vec4_generator.cpp
@@ -782,7 +782,7 @@ gen8_vec4_generator::generate_code(exec_list *instructions)
const void *last_annotation_ir = NULL;
 
if (unlikely(debug_flag)) {
-  if (prog) {
+  if (shader_prog) {
  printf("Native code for vertex shader %d:\n", shader_prog->Name);
   } else {
  printf("Native code for vertex program %d:\n", prog->Id);
@@ -798,7 +798,7 @@ gen8_vec4_generator::generate_code(exec_list *instructions)
 last_annotation_ir = ir->ir;
 if (last_annotation_ir) {
printf("   ");
-   if (prog) {
+   if (shader_prog) {
   ((ir_instruction *) last_annotation_ir)->print();
} else {
   const prog_instruction *vpi;
-- 
1.8.5.2
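
The failure mode is easy to reproduce in miniature: when the branch tests the pointer that is always set (prog) and then dereferences the one that may be NULL (shader_prog), fixed-function and ARB programs crash. The sketch below tests the pointer it is about to dereference. It is only an illustration: the two structs and `describe_vertex_code()` are hypothetical stand-ins, not the Mesa types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for gl_program and gl_shader_program. */
struct program_sketch        { unsigned Id; };
struct shader_program_sketch { unsigned Name; };

/* Format the debug header. shader_prog is NULL for fixed-function/ARB
 * programs, so it is the pointer that has to be tested. */
static int describe_vertex_code(char *out, size_t n,
                                const struct shader_program_sketch *shader_prog,
                                const struct program_sketch *prog)
{
    if (shader_prog)
        return snprintf(out, n, "Native code for vertex shader %u:",
                        shader_prog->Name);
    return snprintf(out, n, "Native code for vertex program %u:", prog->Id);
}
```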



Re: [Mesa-dev] [PATCH] draw: fix incorrect color of flat-shaded clipped lines

2014-02-03 Thread Jose Fonseca


- Original Message -
 When we clipped a line we weren't copying the provoking vertex
 color to the second vertex.  We also weren't checking for
 first vs. last provoking vertex.
 
 Fixes failures found with the new piglit line-flat-clip-color test.
 
 Cc: 10.0, 10.1 mesa-sta...@lists.freedesktop.org
 ---
  src/gallium/auxiliary/draw/draw_pipe_clip.c |   13 -
  1 file changed, 12 insertions(+), 1 deletion(-)
 
 diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c
 b/src/gallium/auxiliary/draw/draw_pipe_clip.c
 index de354e9..d8b56de 100644
 --- a/src/gallium/auxiliary/draw/draw_pipe_clip.c
 +++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c
 @@ -588,7 +588,12 @@ do_clip_line( struct draw_stage *stage,
  
 if (v0->clipmask) {
interp( clipper, stage->tmp[0], t0, v0, v1, viewport_index );
 -  copy_flat(stage, stage->tmp[0], v0);
 +  if (stage->draw->rasterizer->flatshade_first) {
 + copy_flat(stage, stage->tmp[0], v0);  /* copy v0 color to tmp[0] */
 +  }
 +  else {
 + copy_flat(stage, stage->tmp[0], v1);  /* copy v1 color to tmp[0] */
 +  }
newprim.v[0] = stage->tmp[0];
 }
 else {
 @@ -597,6 +602,12 @@ do_clip_line( struct draw_stage *stage,
  
 if (v1->clipmask) {
interp( clipper, stage->tmp[1], t1, v1, v0, viewport_index );
 +  if (stage->draw->rasterizer->flatshade_first) {
 + copy_flat(stage, stage->tmp[1], v0);  /* copy v0 color to tmp[1] */
 +  }
 +  else {
 + copy_flat(stage, stage->tmp[1], v1);  /* copy v1 color to tmp[1] */
 +  }
newprim.v[1] = stage->tmp[1];
 }
 else {
 --
 1.7.10.4
 
 

Reviewed-by: Jose Fonseca jfons...@vmware.com


Re: [Mesa-dev] [PATCH] gallium/auxiliary/indices: replace free() with FREE()

2014-02-03 Thread Jose Fonseca


- Original Message -
 To match the CALLOC_STRUCT() call.
 ---
  src/gallium/auxiliary/indices/u_primconvert.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/src/gallium/auxiliary/indices/u_primconvert.c
 b/src/gallium/auxiliary/indices/u_primconvert.c
 index 14da7a6..f2a77ce 100644
 --- a/src/gallium/auxiliary/indices/u_primconvert.c
 +++ b/src/gallium/auxiliary/indices/u_primconvert.c
 @@ -74,7 +74,7 @@ void
  util_primconvert_destroy(struct primconvert_context *pc)
  {
 util_primconvert_save_index_buffer(pc, NULL);
 -   free(pc);
 +   FREE(pc);
  }
  
  void
 --
 1.7.10.4
 
 


Reviewed-by: Jose Fonseca jfons...@vmware.com


Re: [Mesa-dev] [PATCH 2/2] svga: check shader size against max command buffer size

2014-02-03 Thread Jose Fonseca
Looks good AFAICT.

Jose

- Original Message -
 If the shader is too large, plug in a dummy shader.  This patch also
 reworks the existing dummy shader code.
 ---
  src/gallium/drivers/svga/include/svga_reg.h |1 +
  src/gallium/drivers/svga/svga_state_fs.c|   60
  +--
  2 files changed, 49 insertions(+), 12 deletions(-)
 
 diff --git a/src/gallium/drivers/svga/include/svga_reg.h
 b/src/gallium/drivers/svga/include/svga_reg.h
 index 1b96c2e..dee719d 100644
 --- a/src/gallium/drivers/svga/include/svga_reg.h
 +++ b/src/gallium/drivers/svga/include/svga_reg.h
 @@ -889,6 +889,7 @@ typedef enum {
  } SVGAFifoCmdId;
  
  #define SVGA_CMD_MAX_ARGS   64
 +#define SVGA_CB_MAX_COMMAND_SIZE (32 * 1024) // 32 KB
  
  
  /*
 diff --git a/src/gallium/drivers/svga/svga_state_fs.c
 b/src/gallium/drivers/svga/svga_state_fs.c
 index 860a0c8..7119a19 100644
 --- a/src/gallium/drivers/svga/svga_state_fs.c
 +++ b/src/gallium/drivers/svga/svga_state_fs.c
 @@ -76,12 +76,18 @@ search_fs_key(const struct svga_fragment_shader *fs,
  /**
   * If we fail to compile a fragment shader (because it uses too many
   * registers, for example) we'll use a dummy/fallback shader that
 - * simply emits a constant color.
 + * simply emits a constant color (red for debug, black for release).
 + * We hit this with the Unigine/Heaven demo when Shaders = High.
 + * With black, the demo still looks good.
   */
  static const struct tgsi_token *
  get_dummy_fragment_shader(void)
  {
 -   static const float red[4] = { 1.0, 0.0, 0.0, 0.0 };
 +#ifdef DEBUG
 +   static const float color[4] = { 1.0, 0.0, 0.0, 0.0 }; /* red */
 +#else
 +   static const float color[4] = { 0.0, 0.0, 0.0, 0.0 }; /* black */
 +#endif
 struct ureg_program *ureg;
 const struct tgsi_token *tokens;
 struct ureg_src src;
 @@ -93,7 +99,7 @@ get_dummy_fragment_shader(void)
return NULL;
  
 dst = ureg_DECL_output(ureg, TGSI_SEMANTIC_COLOR, 0);
 -   src = ureg_DECL_immediate(ureg, red, 4);
 +   src = ureg_DECL_immediate(ureg, color, 4);
 ureg_MOV(ureg, dst, src);
 ureg_END(ureg);
  
 @@ -106,6 +112,29 @@ get_dummy_fragment_shader(void)
  
  
  /**
 + * Replace the given shader's instruction with a simple constant-color
 + * shader.  We use this when normal shader translation fails.
 + */
 +static struct svga_shader_variant *
 +get_compiled_dummy_shader(struct svga_fragment_shader *fs,
 +  const struct svga_fs_compile_key *key)
 +{
 +   const struct tgsi_token *dummy = get_dummy_fragment_shader();
 +   struct svga_shader_variant *variant;
 +
 +   if (!dummy) {
 +  return NULL;
 +   }
 +
 +   FREE((void *) fs->base.tokens);
 +   fs->base.tokens = dummy;
 +
 +   variant = svga_translate_fragment_program(fs, key);
 +   return variant;
 +}
 +
 +
 +/**
   * Translate TGSI shader into an svga shader variant.
   */
  static enum pipe_error
 @@ -119,17 +148,24 @@ compile_fs(struct svga_context *svga,
  
 variant = svga_translate_fragment_program( fs, key );
 if (variant == NULL) {
 -  /* some problem during translation, try the dummy shader */
 -  const struct tgsi_token *dummy = get_dummy_fragment_shader();
 -  if (!dummy) {
 - ret = PIPE_ERROR_OUT_OF_MEMORY;
 +  debug_printf("Failed to compile fragment shader,"
 +" using dummy shader instead.\n");
 +  variant = get_compiled_dummy_shader(fs, key);
 +  if (!variant) {
 + ret = PIPE_ERROR;
   goto fail;
}
 -  debug_printf("Failed to compile fragment shader, using dummy shader instead.\n");
 -  FREE((void *) fs->base.tokens);
 -  fs->base.tokens = dummy;
 -  variant = svga_translate_fragment_program(fs, key);
 -  if (variant == NULL) {
 +   }
 +
 +   if (variant->nr_tokens * sizeof(variant->tokens[0])
 +   + sizeof(SVGA3dCmdDefineShader) + sizeof(SVGA3dCmdHeader)
 +   >= SVGA_CB_MAX_COMMAND_SIZE) {
 +  /* too big, use dummy shader */
 +  debug_printf("Shader too large (%lu bytes),"
 +" using dummy shader instead.\n",
 +   (unsigned long ) variant->nr_tokens * sizeof(variant->tokens[0]));
 +  variant = get_compiled_dummy_shader(fs, key);
 +  if (!variant) {
   ret = PIPE_ERROR;
   goto fail;
}
 --
 1.7.10.4
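
The size check in the quoted patch amounts to simple arithmetic: the token payload plus the two command headers must stay below the 32 KB command limit. The sketch below shows that arithmetic under stated assumptions: the two header sizes are made-up placeholders (the real values come from `sizeof(SVGA3dCmdDefineShader)` and `sizeof(SVGA3dCmdHeader)`), `shader_too_large()` is a hypothetical helper, and the `>=` comparison is assumed because the operator is partly lost in the archived text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SVGA_CB_MAX_COMMAND_SIZE (32 * 1024)  /* 32 KB, as in the patch */

/* Placeholder header sizes, chosen only for illustration. */
#define SKETCH_CMD_HEADER_BYTES        8
#define SKETCH_CMD_DEFINE_SHADER_BYTES 16

/* True if the shader tokens plus command headers reach the limit, in which
 * case the caller would fall back to the dummy shader. */
static bool shader_too_large(size_t nr_tokens, size_t token_bytes)
{
    size_t total = nr_tokens * token_bytes
                 + SKETCH_CMD_DEFINE_SHADER_BYTES
                 + SKETCH_CMD_HEADER_BYTES;

    return total >= SVGA_CB_MAX_COMMAND_SIZE;
}
```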
 
 


[Mesa-dev] [Bug 74471] New: [swrast] piglit ext_texture_array-gen-mipmap regression

2014-02-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=74471

  Priority: medium
Bug ID: 74471
  Keywords: regression
CC: bri...@vmware.com, i...@freedesktop.org
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: [swrast] piglit ext_texture_array-gen-mipmap
regression
  Severity: normal
Classification: Unclassified
OS: Linux (All)
  Reporter: v...@freedesktop.org
  Hardware: x86-64 (AMD64)
Status: NEW
   Version: git
 Component: Mesa core
   Product: Mesa

mesa: 595bcf38a6620cc6b69b170bd5c48ace720dd607 (master)

$ ./bin/ext_texture_array-gen-mipmap -auto
Probe at level 0, x 64, y 32, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGBA
Probe at level 1, x 32, y 16, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGBA
Probe at level 2, x 16, y 8, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGBA
Probe at level 3, x 8, y 4, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGBA
Probe at level 4, x 4, y 2, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGBA
Probe at level 5, x 2, y 1, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGBA
Probe at level 6, x 1, y 0, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGBA
Probe at level 7, x 0, y 0, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGBA
Probe at level 0, x 64, y 32, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGB
Probe at level 1, x 32, y 16, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGB
Probe at level 2, x 16, y 8, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGB
Probe at level 3, x 8, y 4, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGB
Probe at level 4, x 4, y 2, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGB
Probe at level 5, x 2, y 1, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGB
Probe at level 6, x 1, y 0, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGB
Probe at level 7, x 0, y 0, z 0 =  (0, 0, 0, 0), expected (255,255,255,255)
Internal tex format GL_COMPRESSED_RGB
PIGLIT: {'result': 'fail' }


2bf4db1697a02cd58062e0ebac086fda72f29945 is the first bad commit
commit 2bf4db1697a02cd58062e0ebac086fda72f29945
Author: Ian Romanick ian.d.roman...@intel.com
Date:   Fri Dec 13 13:40:48 2013 -0800

meta: Don't use fixed-function to decompress array textures

Array textures can't be used with fixed-function, so don't.  Instead,
just drop the decompress request on the floor.  This is no worse than
what was done previously because generating the GL error (in
_mesa_set_enable) broke everything anyway.

A later patch will get GL_TEXTURE_2D_ARRAY targets working.

Signed-off-by: Ian Romanick ian.d.roman...@intel.com
Reviewed-by: Brian Paul bri...@vmware.com

:04 04 ee023cf7abb440daa69f60646014a81357f39df0
a5fde0128d2b3e8cfb620724b06392d601d0da13 Msrc
bisect run success



Re: [Mesa-dev] [PATCH] i965/blorp: Fix hw blending coefficients

2014-02-03 Thread Eric Anholt
Topi Pohjolainen topi.pohjolai...@intel.com writes:

 Previously the color components were evaluated using formula
 (src_color * src_color + 1.0 dst_color) and alpha in turn using
   ^ 0.0
 (1.0 * src_alpha + 0.0 * dst_alpha). The intention is to keep
 source color components unmodified and force alpha channel to
 fixed value of one regardless of source or destination.

Not multiplying the source color by source color makes a lot of sense.
But your CONSTANT_ALPHA with ALPHA of 1.0 is the same as just using ONE.
(you're still multiplying the src alpha by 1.0 with your change, not
replacing the source alpha value with 1.0).

Basically, I think your original patch is equivalent to this and better
than this, and if we want to actually make the no-alpha-bits-present
thing work, we need to override the bits in the surface state or in the
generated code.  In the normal draw path, it's done for sampling by the
swizzling code in brw_wm_surface_state.c, and the blending overrides are
just to fix up the alpha blending stage which doesn't pay attention to
that for the destination surface.

So, the original patch is:

Reviewed-by: Eric Anholt e...@anholt.net
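
The point about CONSTANT_ALPHA can be checked with the generic blend equation, result = src * src_factor + dst * dst_factor. With a constant alpha of 1.0 the source factor is just 1.0, so the source alpha is passed through unchanged rather than replaced by 1.0. A small sketch, where `blend()` is a hypothetical helper and not driver code:

```c
#include <assert.h>

/* Generic per-channel blend equation: src * sf + dst * df. */
static float blend(float src, float src_factor, float dst, float dst_factor)
{
    return src * src_factor + dst * dst_factor;
}
```

For a source alpha of 0.25, a source factor of CONSTANT_ALPHA with the constant set to 1.0 and a destination factor of ZERO still yields 0.25, exactly what a source factor of ONE would give.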




Re: [Mesa-dev] [PATCH] radeon/uvd: fix feedback buffer handling

2014-02-03 Thread Alex Deucher
On Mon, Feb 3, 2014 at 5:33 AM, Christian König deathsim...@vodafone.de wrote:
 From: Christian König christian.koe...@amd.com

 Without the correct feedback buffer size UVD runs
 into an error on each frame, reducing the maximum FPS.

 Signed-off-by: Christian König christian.koe...@amd.com

Reviewed-by: Alex Deucher alexander.deuc...@amd.com

Should probably also CC 10.1, 10.0, and 9.2?

 ---
  src/gallium/drivers/radeon/radeon_uvd.c | 38 
 +++--
  1 file changed, 27 insertions(+), 11 deletions(-)

 diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
 b/src/gallium/drivers/radeon/radeon_uvd.c
 index 95757e3..6ac2199 100644
 --- a/src/gallium/drivers/radeon/radeon_uvd.c
 +++ b/src/gallium/drivers/radeon/radeon_uvd.c
 @@ -58,6 +58,9 @@
  #define NUM_H264_REFS 17
  #define NUM_VC1_REFS 5

 +#define FB_BUFFER_OFFSET 0x1000
 +#define FB_BUFFER_SIZE 2048
 +
  /* UVD buffer representation */
  struct ruvd_buffer
  {
 @@ -81,6 +84,7 @@ struct ruvd_decoder {

 struct ruvd_buffer  msg_fb_buffers[NUM_BUFFERS];
 struct ruvd_msg *msg;
 +   uint32_t*fb;

 struct ruvd_buffer  bs_buffers[NUM_BUFFERS];
 void*   bs_ptr;
 @@ -132,15 +136,20 @@ static void send_cmd(struct ruvd_decoder *dec, unsigned cmd,
  }

  /* map the next available message buffer */
 -static void map_msg_buf(struct ruvd_decoder *dec)
 +static void map_msg_fb_buf(struct ruvd_decoder *dec)
  {
 struct ruvd_buffer* buf;
 +   void *ptr;

 -   /* grap the current message buffer */
 +   /* grap the current message/feedback buffer */
 buf = &dec->msg_fb_buffers[dec->cur_buffer];

 -   /* copy the message into it */
 -   dec->msg = dec->ws->buffer_map(buf->cs_handle, dec->cs, PIPE_TRANSFER_WRITE);
 +   /* and map it for CPU access */
 +   ptr = dec->ws->buffer_map(buf->cs_handle, dec->cs, PIPE_TRANSFER_WRITE);
 +
 +   /* calc buffer offsets */
 +   dec->msg = ptr;
 +   dec->fb = ptr + FB_BUFFER_OFFSET;
  }

  /* unmap and send a message command to the VCPU */
 @@ -148,8 +157,8 @@ static void send_msg_buf(struct ruvd_decoder *dec)
  {
 struct ruvd_buffer* buf;

 -   /* ignore the request if message buffer isn't mapped */
 -   if (!dec->msg)
 +   /* ignore the request if message/feedback buffer isn't mapped */
 +   if (!dec->msg || !dec->fb)
 return;

 /* grap the current message buffer */
 @@ -157,6 +166,8 @@ static void send_msg_buf(struct ruvd_decoder *dec)

 /* unmap the buffer */
 dec->ws->buffer_unmap(buf->cs_handle);
 +   dec->msg = NULL;
 +   dec->fb = NULL;

 /* and send it to the hardware */
 send_cmd(dec, RUVD_CMD_MSG_BUFFER, buf->cs_handle, 0,
 @@ -644,7 +655,7 @@ static void ruvd_destroy(struct pipe_video_codec *decoder)

 assert(decoder);

 -   map_msg_buf(dec);
 +   map_msg_fb_buf(dec);
 memset(dec->msg, 0, sizeof(*dec->msg));
 dec->msg->size = sizeof(*dec->msg);
 dec->msg->msg_type = RUVD_MSG_DESTROY;
 @@ -773,7 +784,7 @@ static void ruvd_end_frame(struct pipe_video_codec *decoder,
 memset(dec->bs_ptr, 0, bs_size - dec->bs_size);
 dec->ws->buffer_unmap(bs_buf->cs_handle);

 -   map_msg_buf(dec);
 +   map_msg_fb_buf(dec);
 dec->msg->size = sizeof(*dec->msg);
 dec->msg->msg_type = RUVD_MSG_DECODE;
 dec->msg->stream_handle = dec->stream_handle;
 @@ -813,6 +824,10 @@ static void ruvd_end_frame(struct pipe_video_codec *decoder,

 dec->msg->body.decode.db_surf_tile_config = dec->msg->body.decode.dt_surf_tile_config;
 dec->msg->body.decode.extension_support = 0x1;
 +
 +   /* set at least the feedback buffer size */
 +   dec->fb[0] = FB_BUFFER_SIZE;
 +
 send_msg_buf(dec);

 send_cmd(dec, RUVD_CMD_DPB_BUFFER, dec->dpb.cs_handle, 0,
 @@ -822,7 +837,7 @@ static void ruvd_end_frame(struct pipe_video_codec *decoder,
 send_cmd(dec, RUVD_CMD_DECODING_TARGET_BUFFER, dt, 0,
  RADEON_USAGE_WRITE, RADEON_DOMAIN_VRAM);
 send_cmd(dec, RUVD_CMD_FEEDBACK_BUFFER, msg_fb_buf->cs_handle,
 -0x1000, RADEON_USAGE_WRITE, RADEON_DOMAIN_GTT);
 +FB_BUFFER_OFFSET, RADEON_USAGE_WRITE, RADEON_DOMAIN_GTT);
 set_reg(dec, RUVD_ENGINE_CNTL, 1);

 flush(dec);
 @@ -898,7 +913,8 @@ struct pipe_video_codec *ruvd_create_decoder(struct pipe_context *context,

 bs_buf_size = width * height * 512 / (16 * 16);
 for (i = 0; i < NUM_BUFFERS; ++i) {
 -   unsigned msg_fb_size = align(sizeof(struct ruvd_msg), 0x1000) + 0x1000;
 +   unsigned msg_fb_size = FB_BUFFER_OFFSET + FB_BUFFER_SIZE;
 +   assert(sizeof(struct ruvd_msg) <= FB_BUFFER_OFFSET);
 if (!create_buffer(dec, &dec->msg_fb_buffers[i], msg_fb_size)) {
 RUVD_ERR("Can't allocated message 

Re: [Mesa-dev] [PATCH 3/3] i965: Use brw_bo_map[_gtt]() in intel_miptree_map_raw().

2014-02-03 Thread Eric Anholt
Kenneth Graunke kenn...@whitecape.org writes:

 This moves the intel_batchbuffer_flush before the drm_intel_bo_busy
 call, so it isn't entirely equivalent, but I think that should be fine.

Where by "isn't entirely equivalent", you mean "actually works". :)

This series is:

Reviewed-by: Eric Anholt e...@anholt.net




[Mesa-dev] [PATCH] glsl: Initialize ubo_binding_mask flags to zero.

2014-02-03 Thread Matt Turner
Missed in commit e63bb298. Caused sporadic test failures, like
incorrect-in-layout-qualifier-repeated-prim.geom.

Cc: 10.0 mesa-sta...@lists.freedesktop.org
---
 src/glsl/ast_type.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/glsl/ast_type.cpp b/src/glsl/ast_type.cpp
index 637da0d..5069326 100644
--- a/src/glsl/ast_type.cpp
+++ b/src/glsl/ast_type.cpp
@@ -118,6 +118,7 @@ ast_type_qualifier::merge_qualifier(YYLTYPE *loc,
ubo_layout_mask.flags.q.shared = 1;
 
ast_type_qualifier ubo_binding_mask;
+   ubo_binding_mask.flags.i = 0;
ubo_binding_mask.flags.q.explicit_binding = 1;
ubo_binding_mask.flags.q.explicit_offset = 1;
 
-- 
1.8.3.2
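
For readers unfamiliar with why the missing initialization caused sporadic failures, here is a minimal C sketch of the general pattern. It is not the Mesa code: the union layout, the flag names other than explicit_binding/explicit_offset, and both helper functions are hypothetical stand-ins. The point is only that a flags word built bit by bit on the stack has indeterminate remaining bits unless the integer view is cleared first.

```c
#include <assert.h>

/* Hypothetical stand-in for a qualifier flags word: a bitfield struct
 * overlaid with an integer so all flags can be tested in one operation. */
union example_flags {
   struct {
      unsigned explicit_binding:1;
      unsigned explicit_offset:1;
      unsigned shared:1;
   } q;
   unsigned i;
};

/* Build a mask that has only the binding-related flags set. */
static union example_flags example_binding_mask(void)
{
   union example_flags mask;

   mask.i = 0;  /* without this, every bit not assigned below is garbage */
   mask.q.explicit_binding = 1;
   mask.q.explicit_offset = 1;
   return mask;
}

/* Nonzero if 'flags' contains any flag that is not part of 'allowed'. */
static int example_has_flags_outside(union example_flags flags,
                                     union example_flags allowed)
{
   return (flags.i & ~allowed.i) != 0;
}
```

With the mask zeroed first, a check such as example_has_flags_outside() gives the same answer on every run; with the zeroing removed, the answer depends on whatever happened to be on the stack.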



[Mesa-dev] Mesa 10.0.3

2014-02-03 Thread Carl Worth
Mesa 10.0.3 has been released. Mesa 10.0.3 is a bug fix release which
fixes bugs found since the 10.0.2 release (see below for a list of
changes).

The tag in the git repository for Mesa 10.0.3 is 'mesa-10.0.3'.

Mesa 10.0.3 is available for download at
ftp://freedesktop.org/pub/mesa/10.0.3/

md5sums:

5f9f463ef08129f6762106b434910adb  MesaLib-10.0.3.tar.bz2
fb3997b6500e153bc32370cb3fc4ca9e  MesaLib-10.0.3.tar.gz
a07b4b6b9eb449b88a6cb5061e51c331  MesaLib-10.0.3.zip

I have verified building from the .tar.bz2 file by doing:

tar xjf MesaLib-10.0.3.tar.bz2
cd Mesa-10.0.3
./configure --enable-gallium-llvm --with-llvm-shared-libs
make -j6
make install

I have also verified that I pushed the tag.

-Carl

-- 
carl.d.wo...@intel.com

Changes from 10.0.2 to 10.0.3:

Aaron Watry (2):
  radeon: Move gfx/dma cs cleanup to r600_common_context_cleanup
  st/dri: prevent leak of dri option default values

Andreas Fänger (1):
  swrast: fix delayed texel buffer allocation regression for OpenMP

Anuj Phogat (3):
  glsl: Disable ARB_texture_rectangle in shader version 100.
  i965: Use sample barycentric coordinates with per sample shading
  i965: Ignore 'centroid' interpolation qualifier in case of persample shading

Brian Paul (3):
  mesa: implement missing glGet(GL_RGBA_SIGNED_COMPONENTS_EXT) query
  st/mesa: fix glReadBuffer(GL_NONE) segfault
  draw: fix incorrect vertex size computation in LLVM drawing code

Carl Worth (6):
  Add md5sums for 10.0.2. release.
  cherry-ignore: Ignore several patches not yet ready for the stable branch
  Drop another couple of patches.
  cherry-ignore: Ignore 4 patches at teh request of the author, (Anuj).
  Update version to 10.0.3
  docs: Add release notes for 10.0.3 release.

Chad Versace (1):
  i965/gen6/blorp: Emit more flushes to workaround hangs

Chris Forbes (1):
  i965: fold offset into coord for textureOffset(gsampler2DRect)

Emil Velikov (5):
  mesa: use signed temporary variable to store _ColorDrawBufferIndexes
  st/mesa: use signed temporary variable to store _ColorDrawBufferIndexes
  nv50: access only the available amount of textures
  nv50: access only the available amount of constbuf
  gallium/rtasm: handle mmap failures appropriately

Eric Anholt (2):
  i965: Fix handling of MESA_pack_invert in blit (PBO) readpixels.
  i965: Don't do the temporary-and-blit-copy for INVALIDATE_RANGE maps.

Ian Romanick (2):
  mesa: Add COMPRESSED_RGBA_S3TC_DXT1_EXT to COMPRESSED_TEXTURE_FORMATS for GLES
  radeon / r200: Pass the API into _mesa_initialize_context

Ilia Mirkin (2):
  mesa: fix GL_COLOR_SUM enum for drivers without ARB_vertex_program
  st/vdpau: don't return a device if the screen doesn't support NPOT

José Fonseca (1):
  mesa: Use IROUND instead of roundf.

Kenneth Graunke (2):
  glsl: Rename expr to lhs_expr in vector_extract munging code.
  glsl: Fix chained assignments of vector channels.

Lauri Kasanen (1):
  mesa: Fix build to properly check for supported compiler flags

Marek Olšák (2):
  st/mesa: use sRGB formats for MSAA resolving if destination is sRGB
  gallium/util: util_format_srgb should not return FORMAT_NONE for sRGB formats

Matt Turner (2):
  glcpp: Define GL_EXT_shader_integer_mix in both GL and ES.
  glx: Update glxext.h to revision 24777.

Michał Górny (1):
  Use AC_PATH_TOOL instead of AC_PATH_PROG for llvm-config.

Paul Berry (1):
  i965: Ensure that all necessary state is re-emitted if we run out of aperture.

Paul Seidler (1):
  build: move ARCH_LIBS definition outside of ASM definition

Thomas Sondergaard (4):
  mesa: Preliminary support for MSVC_VERSION=12.0
  mesa: Fix compile error with MSVC 2013
  mesa: Work around internal compiler error
  mesa: Namespace qualify fma to override ambiguity with fma from math.h

Tom Stellard (1):
  r600g/compute: Emit DEALLOC_STATE on cayman after dispatching a compute shader.

-- 
carl.d.wo...@intel.com




Re: [Mesa-dev] [PATCH] R600/SI: Fix fneg for 0.0

2014-02-03 Thread Tom Stellard
On Mon, Feb 03, 2014 at 04:43:10PM +0900, Michel Dänzer wrote:
 On Thu, 2014-01-30 at 10:43 -0500, Tom Stellard wrote:
  On Wed, Jan 29, 2014 at 06:23:00PM +0900, Michel Dänzer wrote:
   From: Michel Dänzer michel.daen...@amd.com
   
   V_ADD_F32 with source modifier does not produce -0.0 for this. Just
   manipulate the sign bit directly instead.
  
  That's strange, so does this mean we can never use these modifiers?
 
 I think we could use them for folding fabs/fneg into other instructions
 using their results, as we're already doing for pre-SI.
 
 The problem here is that adding -0.0 to 0.0 results in 0.0, not -0.0.
 
 
   Also add a pattern for (fneg (fabs ...)).
   
   Fixes a bunch of bit encoding piglit tests with radeonsi.
   
   Signed-off-by: Michel Dänzer michel.daen...@amd.com
   ---
lib/Target/R600/SIInstructions.td | 11 +++
1 file changed, 7 insertions(+), 4 deletions(-)
   
   diff --git a/lib/Target/R600/SIInstructions.td 
   b/lib/Target/R600/SIInstructions.td
   index 912b59a..43fe63c 100644
   --- a/lib/Target/R600/SIInstructions.td
   +++ b/lib/Target/R600/SIInstructions.td
   @@ -1684,15 +1684,18 @@ def : Pat <
    >;
    
    def : Pat <
   +  (fneg (fabs f32:$src)),
   +  (V_OR_B32_e32 $src, (V_MOV_B32_e32 0x80000000)) /* Set sign bit */
   +>;
   +
   +def : Pat <
      (fabs f32:$src),
   -  (V_ADD_F32_e64 $src, (i32 0 /* SRC1 */),
   -   1 /* ABS */, 0 /* CLAMP */, 0 /* OMOD */, 0 /* NEG */)
   +  (V_AND_B32_e32 $src, (V_MOV_B32_e32 0x7fffffff)) /* Clear sign bit */
    >;
    
    def : Pat <
      (fneg f32:$src),
   -  (V_ADD_F32_e64 $src, (i32 0 /* SRC1 */),
   -   0 /* ABS */, 0 /* CLAMP */, 0 /* OMOD */, 1 /* NEG */)
   +  (V_XOR_B32_e32 $src, (V_MOV_B32_e32 0x80000000)) /* Toggle sign bit */
    >;
  
  I think you may be able to achieve the same results by marking
  ISD::FNEG and ISD::FABS as Expand in SIISelLowering.
 
 That seems to work as expected for the *-floatBitsToInt-neg(_abs) piglit
 tests, but the lit tests end up using V_SUB_F32 vX, -0.0, vY for fneg,
 and while fabs results in V_AND_B32 as expected for a single f32, it
 ends up using more complex comparisons and selects for v2f32 and v4f32.
 So I'm not sure what to do about the lit tests in that case.
 

This sounds like a bug in the vector lowering code.

 
  Also, we have implemented isFAbsFree() and isFNegFree() in
  AMDGPUISelLowering.cpp
  We will need to move these implementations into R600ISelLowering.cpp
  now that FAbs and FNeg are no longer free on SI.
 
 FWIW, they're not really more expensive with this change than before. :)
 I think implementing these for SI is already wrong at this point.
 

Yes, good point.

 
 May I ask you to fix this in your preferred way?
 

It's clear that there are a few things wrong which are unrelated to this
patch, so I think it is fine as is.  Could you add a comment above the pattern
explaining why we need to manually toggle the sign bit, and also a TODO to fix
isFAbsFree() and isFNegFree()?

If you add a lit test, then this patch has my r-b.

-Tom
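
For reference, the sign-bit manipulation discussed in this thread can be sketched in plain C. This is an illustration only, not part of the patch: the helper names are made up, and example_fneg_via_add() merely assumes that the old V_ADD_F32-with-NEG pattern behaves like computing (-x) + 0, which is how the thread describes the failure.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern (memcpy avoids aliasing issues). */
static uint32_t example_float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

static float example_bits_float(uint32_t u)
{
   float f;
   memcpy(&f, &u, sizeof(f));
   return f;
}

/* fneg: toggle the sign bit (the XOR pattern). */
static float example_fneg(float x)
{
   return example_bits_float(example_float_bits(x) ^ 0x80000000u);
}

/* fabs: clear the sign bit (the AND pattern). */
static float example_fabs(float x)
{
   return example_bits_float(example_float_bits(x) & 0x7fffffffu);
}

/* fneg(fabs(x)): set the sign bit (the OR pattern). */
static float example_fneg_fabs(float x)
{
   return example_bits_float(example_float_bits(x) | 0x80000000u);
}

/* Assumed model of the old approach: negate the source, then add zero.
 * For x == 0.0 this yields +0.0, because (-0.0) + (+0.0) is +0.0 in the
 * default rounding mode. */
static float example_fneg_via_add(float x)
{
   volatile float zero = 0.0f;  /* volatile: keep the addition at run time */
   return -x + zero;
}
```

Only the bitwise version returns -0.0 for an input of 0.0; for every other finite input the two approaches agree, which is why the problem only showed up in bit-encoding tests.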


 
 -- 
 Earthling Michel Dänzer|  http://www.amd.com
 Libre software enthusiast  |Mesa and X developer
 


Re: [Mesa-dev] [PATCH 22/30] mesa/cs: Implement MAX_COMPUTE_WORK_GROUP_INVOCATIONS constant.

2014-02-03 Thread Paul Berry
On 1 February 2014 21:25, Jordan Justen jljus...@gmail.com wrote:

 On Thu, Jan 9, 2014 at 6:19 PM, Paul Berry stereotype...@gmail.com
 wrote:
  ---
   src/glsl/main.cpp   | 1 +
   src/glsl/standalone_scaffolding.cpp | 1 +
   src/mesa/main/context.c | 1 +
   src/mesa/main/get.c | 1 +
   src/mesa/main/get_hash_params.py| 3 +++
   src/mesa/main/mtypes.h  | 1 +
   6 files changed, 8 insertions(+)
 
  diff --git a/src/glsl/main.cpp b/src/glsl/main.cpp
  index bb2054f..94bc1cc 100644
  --- a/src/glsl/main.cpp
  +++ b/src/glsl/main.cpp
  @@ -53,6 +53,7 @@ initialize_context(struct gl_context *ctx, gl_api api)
  ctx->Const.MaxComputeWorkGroupSize[0] = 1024;
  ctx->Const.MaxComputeWorkGroupSize[1] = 1024;
  ctx->Const.MaxComputeWorkGroupSize[2] = 64;
  +   ctx->Const.MaxComputeWorkGroupInvocations = 1024;
  ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 16;
  ctx->Const.Program[MESA_SHADER_COMPUTE].MaxUniformComponents = 1024;
  ctx->Const.Program[MESA_SHADER_COMPUTE].MaxInputComponents = 0; /*
 not used */
  diff --git a/src/glsl/standalone_scaffolding.cpp
 b/src/glsl/standalone_scaffolding.cpp
  index e8eb529..0c83ea3 100644
  --- a/src/glsl/standalone_scaffolding.cpp
  +++ b/src/glsl/standalone_scaffolding.cpp
  @@ -143,6 +143,7 @@ void initialize_context_to_defaults(struct
 gl_context *ctx, gl_api api)
  ctx->Const.MaxComputeWorkGroupSize[0] = 1024;
  ctx->Const.MaxComputeWorkGroupSize[1] = 1024;
  ctx->Const.MaxComputeWorkGroupSize[2] = 64;
  +   ctx->Const.MaxComputeWorkGroupInvocations = 1024;
  ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 16;
  ctx->Const.Program[MESA_SHADER_COMPUTE].MaxUniformComponents = 1024;
  ctx->Const.Program[MESA_SHADER_COMPUTE].MaxInputComponents = 0; /*
 not used */
  diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
  index ebe27b4..942f247 100644
  --- a/src/mesa/main/context.c
  +++ b/src/mesa/main/context.c
  @@ -695,6 +695,7 @@ _mesa_init_constants(struct gl_context *ctx)
  ctx->Const.MaxComputeWorkGroupSize[0] = 1024;
  ctx->Const.MaxComputeWorkGroupSize[1] = 1024;
  ctx->Const.MaxComputeWorkGroupSize[2] = 64;
  +   ctx->Const.MaxComputeWorkGroupInvocations = 1024;
   }
 
 
  diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
  index 6b914f4..bcbb5d5 100644
  --- a/src/mesa/main/get.c
  +++ b/src/mesa/main/get.c
  @@ -382,6 +382,7 @@ EXTRA_EXT(ARB_texture_multisample);
   EXTRA_EXT(ARB_texture_gather);
   EXTRA_EXT(ARB_shader_atomic_counters);
   EXTRA_EXT(ARB_draw_indirect);
  +EXTRA_EXT(ARB_compute_shader);
 
   static const int
   extra_ARB_color_buffer_float_or_glcore[] = {
  diff --git a/src/mesa/main/get_hash_params.py
 b/src/mesa/main/get_hash_params.py
  index 7f025a9..c18e848 100644
  --- a/src/mesa/main/get_hash_params.py
  +++ b/src/mesa/main/get_hash_params.py
  @@ -742,6 +742,9 @@ descriptor=[
   # GL_ARB_vertex_attrib_binding
 [ MAX_VERTEX_ATTRIB_RELATIVE_OFFSET,
 CONTEXT_ENUM(Const.MaxVertexAttribRelativeOffset), NO_EXTRA ],
 [ MAX_VERTEX_ATTRIB_BINDINGS,
 CONTEXT_ENUM(Const.MaxVertexAttribBindings), NO_EXTRA ],
  +
  +# GL_ARB_compute_shader
  +  [ MAX_COMPUTE_WORK_GROUP_INVOCATIONS,
 CONTEXT_ENUM(Const.MaxComputeWorkGroupInvocations),
 extra_ARB_compute_shader ],

 CONTEXT_ENUM and CONTEXT_INT seem to follow the same code paths, but
 would CONTEXT_INT be a better fit here?


You're right, CONTEXT_INT makes more sense here.  I've changed it.



 Also, how do we decide between adding to gl_constants and just using
 CONST(1024)? My thought would be to use CONST unless we are pretty
 sure drivers will want to vary the value.


My reasoning for using gl_constants is that the value of the constant is
determined by a hardware limitation rather than a core Mesa software
limitation, hence it's likely to vary from one back end to the next.  I had
a quick look through our uses of CONST and it looks like most of the time
we use CONST it's when we're talking about a core Mesa software limitation
(e.g. MAX_MODELVIEW_STACK_DEPTH is CONST(MAX_MODELVIEW_STACK_DEPTH) because
that's the size of the per-context data structure we use to keep track of
the modelview matrix stack).

I admit that we haven't historically been terribly consistent about this
decision, though :)
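
The distinction drawn above can be sketched in C roughly as follows. This is a hypothetical illustration, not Mesa code: the struct, the function names, the stack depth of 32 and the driver limit of 512 are all invented. It shows a compile-time constant tied to the size of a core data structure next to a per-context field that a back end may override.

```c
#include <assert.h>

/* Software limit: fixed at compile time because it sizes a core array. */
#define EXAMPLE_MAX_STACK_DEPTH 32

struct example_matrix_stack {
   float matrices[EXAMPLE_MAX_STACK_DEPTH][16];
   unsigned depth;
};

/* Hardware-dependent limits: stored per context so drivers can change them. */
struct example_constants {
   unsigned MaxComputeWorkGroupSize[3];
   unsigned MaxComputeWorkGroupInvocations;
};

/* Core defaults, using the values that appear in the patch. */
static void example_init_constants(struct example_constants *c)
{
   c->MaxComputeWorkGroupSize[0] = 1024;
   c->MaxComputeWorkGroupSize[1] = 1024;
   c->MaxComputeWorkGroupSize[2] = 64;
   c->MaxComputeWorkGroupInvocations = 1024;
}

/* A made-up back end whose hardware supports fewer invocations. */
static void example_driver_init_constants(struct example_constants *c)
{
   example_init_constants(c);
   c->MaxComputeWorkGroupInvocations = 512;  /* invented hardware limit */
}

/* A query reads the per-context value rather than a literal. */
static unsigned example_query_max_invocations(const struct example_constants *c)
{
   return c->MaxComputeWorkGroupInvocations;
}
```

A value reported through something like CONST(1024) could not be lowered by the hypothetical driver above without touching core code, which is the argument for keeping it in the constants structure.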



 21-23 Reviewed-by: Jordan Justen jordan.l.jus...@intel.com

   ]},
 
   # Enums restricted to OpenGL Core profile
  diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
  index d3f3a30..6481dc1 100644
  --- a/src/mesa/main/mtypes.h
  +++ b/src/mesa/main/mtypes.h
  @@ -3323,6 +3323,7 @@ struct gl_constants
 
  /** GL_ARB_compute_shader */
  GLuint MaxComputeWorkGroupSize[3];
  +   GLuint MaxComputeWorkGroupInvocations;
   };
 
 
  --
  1.8.5.2
 


[Mesa-dev] [Bug 74251] Segfault in st_finalize_texture with Texture Buffer

2014-02-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=74251

--- Comment #9 from Ian Milligan ianml...@gmail.com ---
It appears that loading a shader which expects a texture buffer causes the
first texture unit's target to be set to TEXTURE_BUFFER. When st_validate_state
is called without a texture bound to this texture unit, st_validate_texture is
called on this texture unit's texture object. However st_validate_texture
expects a texture object with target TEXTURE_BUFFER to have defined the field
BufferObject, causing the null pointer dereference. Unfortunately, I'm not sure
where the fault lies here.



Re: [Mesa-dev] [PATCH 26/30] main/cs: Implement query for COMPUTE_WORK_GROUP_SIZE.

2014-02-03 Thread Paul Berry
On 1 February 2014 22:28, Jordan Justen jljus...@gmail.com wrote:

 On Thu, Jan 9, 2014 at 6:19 PM, Paul Berry stereotype...@gmail.com
 wrote:
  ---
   src/mesa/main/shaderapi.c | 18 ++
   1 file changed, 18 insertions(+)
 
  diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
  index 053f27b..680d449 100644
  --- a/src/mesa/main/shaderapi.c
  +++ b/src/mesa/main/shaderapi.c
  @@ -663,6 +663,24 @@ get_programiv(struct gl_context *ctx, GLuint
 program, GLenum pname, GLint *param
 
 *params = shProg->NumAtomicBuffers;
 return;
  +   case GL_COMPUTE_WORK_GROUP_SIZE: {
  +  int i;
  +  if (!_mesa_is_desktop_gl(ctx) ||
 !ctx->Extensions.ARB_compute_shader)
  + break;
  +  if (!shProg->LinkStatus) {
  + _mesa_error(ctx, GL_INVALID_OPERATION, "glGetProgramiv(program not "
  + "linked successfully)");

 Nit, but dropping " successfully" sounds better to me. It seems more
 likely that they tried to get the value before linking than that they
 ignored a link error.


Sure, I can go along with that.  I've changed it to just
"glGetProgramiv(program not linked)".



 24-26 Reviewed-by: Jordan Justen jordan.l.jus...@intel.com

  + return;
  +  }
  +  if (shProg->_LinkedShaders[MESA_SHADER_COMPUTE] == NULL) {
  + _mesa_error(ctx, GL_INVALID_OPERATION, "glGetProgramiv(no compute "
  + "shaders)");
  + return;
  +  }
  +  for (i = 0; i < 3; i++)
  + params[i] = shProg->Comp.LocalSize[i];
  +  return;
  +   }
  default:
 break;
  }
  --
  1.8.5.2
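
The order of checks in the query above (not linked, then no compute stage, then copy three values) can be modelled with a small self-contained C sketch. Everything here is a hypothetical stand-in: the struct, the error enum and the function name do not exist in Mesa, and the extension/API check from the patch is left out for brevity.

```c
#include <assert.h>

enum example_error {
   EXAMPLE_NO_ERROR,
   EXAMPLE_INVALID_OPERATION
};

/* Hypothetical reduction of a linked shader program. */
struct example_program {
   int link_status;          /* nonzero once the program has been linked */
   int has_compute_shader;   /* nonzero if a compute stage was linked in */
   unsigned local_size[3];   /* declared local work group size */
};

/* Mirrors the validation order of the query: both failure cases report
 * an invalid operation and leave 'params' untouched. */
static enum example_error
example_get_work_group_size(const struct example_program *prog, int params[3])
{
   int i;

   if (!prog->link_status)
      return EXAMPLE_INVALID_OPERATION;   /* "program not linked" */
   if (!prog->has_compute_shader)
      return EXAMPLE_INVALID_OPERATION;   /* "no compute shaders" */

   for (i = 0; i < 3; i++)
      params[i] = (int)prog->local_size[i];
   return EXAMPLE_NO_ERROR;
}
```

A caller would check the returned error before reading the three values, just as an application would check glGetError() after the real query.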
 


Re: [Mesa-dev] [PATCH 29/30] i965/cs: Create the brw_compute_program struct, and the code to initialize it.

2014-02-03 Thread Paul Berry
On 1 February 2014 22:37, Jordan Justen jljus...@gmail.com wrote:

 On Thu, Jan 9, 2014 at 6:19 PM, Paul Berry stereotype...@gmail.com
 wrote:
  ---
   src/mesa/drivers/dri/i965/brw_context.h |  8 
   src/mesa/drivers/dri/i965/brw_program.c | 11 +++
   2 files changed, 19 insertions(+)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_context.h
 b/src/mesa/drivers/dri/i965/brw_context.h
  index df32ccb..abc1783 100644
  --- a/src/mesa/drivers/dri/i965/brw_context.h
  +++ b/src/mesa/drivers/dri/i965/brw_context.h
  @@ -316,6 +316,14 @@ struct brw_fragment_program {
  GLuint id;  /** serial no. to identify frag progs, never re-used */
   };
 
  +
  +/** Subclass of Mesa compute program */
  +struct brw_compute_program {
  +   struct gl_compute_program program;
  +   unsigned id;  /** serial no. to identify frag progs, never re-used
 */

 "frag" in comment


Whoops, thanks.  I've changed it to "serial no. to identify compute
progs"



 Reviewed-by: Jordan Justen jordan.l.jus...@intel.com

  +};
  +
  +
   struct brw_shader {
  struct gl_shader base;
 
  diff --git a/src/mesa/drivers/dri/i965/brw_program.c
 b/src/mesa/drivers/dri/i965/brw_program.c
  index 90844e5..2d92acb 100644
  --- a/src/mesa/drivers/dri/i965/brw_program.c
  +++ b/src/mesa/drivers/dri/i965/brw_program.c
  @@ -113,6 +113,17 @@ static struct gl_program *brwNewProgram( struct
 gl_context *ctx,
 }
  }
 
  +   case GL_COMPUTE_PROGRAM_NV: {
  +  struct brw_compute_program *prog =
 CALLOC_STRUCT(brw_compute_program);
  +  if (prog) {
  + prog->id = get_new_program_id(brw->intelScreen);
  +
  + return _mesa_init_compute_program(ctx, &prog->program, target,
 id);
  +  } else {
  + return NULL;
  +  }
  +   }
  +
  default:
 assert(!"Unsupported target in brwNewProgram()");
 return NULL;
  --
  1.8.5.2
 


[Mesa-dev] r600g geometry shader and hence GL 3.3 support

2014-02-03 Thread Dave Airlie
These 30 patches comprise geometry shader support for r600-cayman GPUs.

Since I last posted, I've fixed all the regressions caused by the initial
"introduction of geometry shaders" patch. The series passes a large number of
the geometry tests, and I'd like to merge it soon; I think any remaining
problems can be fixed in tree.

Dave.



[Mesa-dev] [PATCH 17/30] r600g: write proper output prim type

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

Vadim's code derived it from the info.mode, but it needs
to be taken from the geometry shader output primitive.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/evergreen_state.c   | 27 ++-
 src/gallium/drivers/r600/r600_state_common.c | 26 --
 2 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 0578520..2828a75 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3593,6 +3593,30 @@ void evergreen_update_es_state(struct pipe_context *ctx, 
struct r600_pipe_shader
/* After that, the NOP relocation packet must be emitted (shader-bo, 
RADEON_USAGE_READ). */
 }
 
+static unsigned r600_conv_prim_to_gs_out(unsigned mode)
+{
+   static const int prim_conv[] = {
+   V_028A6C_OUTPRIM_TYPE_POINTLIST,
+   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   V_028A6C_OUTPRIM_TYPE_TRISTRIP
+   };
+   assert(mode < Elements(prim_conv));
+
+   return prim_conv[mode];
+}
+
 void evergreen_update_gs_state(struct pipe_context *ctx, struct 
r600_pipe_shader *shader)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
@@ -3604,13 +3628,14 @@ void evergreen_update_gs_state(struct pipe_context 
*ctx, struct r600_pipe_shader
 
r600_init_command_buffer(cb, 64);
 
-   /* VGT_GS_OUT_PRIM_TYPE is written by r6000_draw_vbo */
/* VGT_GS_MODE is written by evergreen_emit_shader_stages */
 
r600_store_context_reg(cb, R_028AB8_VGT_VTX_CNT_EN, 1);
 
r600_store_context_reg(cb, R_028B38_VGT_GS_MAX_VERT_OUT,
   S_028B38_MAX_VERT_OUT(rshader->gs_max_out_vertices));
+   r600_store_context_reg(cb, R_028A6C_VGT_GS_OUT_PRIM_TYPE,
+  r600_conv_prim_to_gs_out(rshader->gs_output_prim));
 
if (rctx->screen->b.info.drm_minor >= 35) {
r600_store_context_reg(cb, R_028B90_VGT_GS_INSTANCE_CNT,
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 6cc6ef0..96fdd0e 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1285,30 +1285,6 @@ static bool r600_update_derived_state(struct 
r600_context *rctx)
return true;
 }
 
-static unsigned r600_conv_prim_to_gs_out(unsigned mode)
-{
-   static const int prim_conv[] = {
-   V_028A6C_OUTPRIM_TYPE_POINTLIST,
-   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP
-   };
-   assert(mode < Elements(prim_conv));
-
-   return prim_conv[mode];
-}
-
 void r600_emit_clip_misc_state(struct r600_context *rctx, struct r600_atom 
*atom)
 {
struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
@@ -1438,8 +1414,6 @@ static void r600_draw_vbo(struct pipe_context *ctx, const 
struct pipe_draw_info
r600_write_context_reg(cs, R_028A0C_PA_SC_LINE_STIPPLE,
   S_028A0C_AUTO_RESET_CNTL(ls_mask) |
   (rctx->rasterizer ? rctx->rasterizer->pa_sc_line_stipple : 0));
-   r600_write_context_reg(cs, R_028A6C_VGT_GS_OUT_PRIM_TYPE,
-  r600_conv_prim_to_gs_out(info.mode));
r600_write_config_reg(cs, R_008958_VGT_PRIMITIVE_TYPE,
  r600_conv_pipe_prim(info.mode));
 
-- 
1.8.3.1



[Mesa-dev] [PATCH 15/30] r600g: add primitive input support for gs

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

only enable prim id if gs uses it

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/evergreen_state.c | 6 +-
 src/gallium/drivers/r600/evergreend.h  | 4 
 src/gallium/drivers/r600/r600_shader.c | 9 +
 src/gallium/drivers/r600/r600_shader.h | 1 +
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index e6aa122..c206c08 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -2725,7 +2725,7 @@ static void evergreen_emit_shader_stages(struct 
r600_context *rctx, struct r600_
struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
struct r600_shader_stages_state *state = (struct 
r600_shader_stages_state*)a;
 
-   uint32_t v = 0, v2 = 0;
+   uint32_t v = 0, v2 = 0, primid = 0;
 
if (state->geom_enable) {
v = S_028B54_ES_EN(V_028B54_ES_STAGE_REAL) |
@@ -2734,10 +2734,14 @@ static void evergreen_emit_shader_stages(struct 
r600_context *rctx, struct r600_
 
v2 = S_028A40_MODE(V_028A40_GS_SCENARIO_G) |
S_028A40_CUT_MODE(V_028A40_GS_CUT_128);
+
+   if (rctx->gs_shader->current->shader.gs_prim_id_input)
+   primid = 1;
}
 
r600_write_context_reg(cs, R_028B54_VGT_SHADER_STAGES_EN, v);
r600_write_context_reg(cs, R_028A40_VGT_GS_MODE, v2);
+   r600_write_context_reg(cs, R_028A84_VGT_PRIMITIVEID_EN, primid);
 }
 
 static void evergreen_emit_gs_rings(struct r600_context *rctx, struct 
r600_atom *a)
diff --git a/src/gallium/drivers/r600/evergreend.h 
b/src/gallium/drivers/r600/evergreend.h
index 73b37c2..986fd16 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -1983,6 +1983,10 @@
 #define R_028A58_ES_PER_GS   0x00028A58
 #define R_028A5C_GS_PER_VS   0x00028A5C
 
+#define R_028A84_VGT_PRIMITIVEID_EN  0x028A84
+#define   S_028A84_PRIMITIVEID_EN(x)   (((x) & 0x1) << 0)
+#define   G_028A84_PRIMITIVEID_EN(x)   (((x) >> 0) & 0x1)
+#define   C_028A84_PRIMITIVEID_EN  0xFFFFFFFE
 #define R_028A94_VGT_MULTI_PRIM_IB_RESET_EN  0x00028A94
 #define   S_028A94_RESET_EN(x) (((x) & 0x1) << 0)
 #define   G_028A94_RESET_EN(x) (((x) >> 0) & 0x1)
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index e24b64d..fd11f61 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -590,6 +590,8 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
/* FIXME probably skip inputs if they aren't passed in 
the ring */
ctx->shader->input[i].ring_offset = ctx->next_ring_offset;
ctx->next_ring_offset += 16;
+   if (ctx->shader->input[i].name == TGSI_SEMANTIC_PRIMID)
+   ctx->shader->gs_prim_id_input = true;
}
for (j = 1; j < count; ++j) {
ctx->shader->input[i + j] = ctx->shader->input[i];
@@ -872,6 +874,13 @@ static int tgsi_split_gs_inputs(struct r600_shader_ctx 
*ctx)
for (i = 0; i < inst->Instruction.NumSrcRegs; i++) {
struct tgsi_full_src_register *src = &inst->Src[i];
 
+   if (src->Register.File == TGSI_FILE_INPUT) {
+   if (ctx->shader->input[src->Register.Index].name == TGSI_SEMANTIC_PRIMID) {
+   /* primitive id is in R0.z */
+   ctx->src[i].sel = 0;
+   ctx->src[i].swizzle[0] = 2;
+   }
+   }
if (src->Register.File == TGSI_FILE_INPUT && src->Register.Dimension) {
int treg = r600_get_temp(ctx);
int index = src->Register.Index;
diff --git a/src/gallium/drivers/r600/r600_shader.h 
b/src/gallium/drivers/r600/r600_shader.h
index 0bf8b51..d31c65f 100644
--- a/src/gallium/drivers/r600/r600_shader.h
+++ b/src/gallium/drivers/r600/r600_shader.h
@@ -64,6 +64,7 @@ struct r600_shader {
boolean vs_out_point_size;
boolean has_txq_cube_array_z_comp;
boolean uses_tex_buffers;
+   boolean gs_prim_id_input;
 
/* geometry shader properties */
unsignedgs_input_prim;
-- 
1.8.3.1
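
As a side note on the S_/G_/C_ register field macros this patch adds to evergreend.h, the convention can be sketched in standalone C as below. The EXAMPLE_ names and the helper function are hypothetical; only the shape (set, get, clear-mask for a 1-bit field at bit 0) is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the header's field macros, for a 1-bit field at bit 0. */
#define S_EXAMPLE_EN(x)  (((uint32_t)(x) & 0x1) << 0)  /* value -> field bits */
#define G_EXAMPLE_EN(x)  (((x) >> 0) & 0x1)            /* register -> value */
#define C_EXAMPLE_EN     0xFFFFFFFEu                   /* AND-mask clearing the field */

/* Replace the field in 'reg' with 'enable', leaving the other bits alone. */
static uint32_t example_update_en(uint32_t reg, unsigned enable)
{
   return (reg & C_EXAMPLE_EN) | S_EXAMPLE_EN(enable);
}
```

The patch itself only needs the S_-style direction implicitly, since it writes the whole register as 0 or 1; the G_ and C_ forms matter when a register carries several fields.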



[Mesa-dev] [PATCH 09/30] r600g: don't enable SB for geom shaders

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

SB needs fixes for three GS instructions it seems to raise
them outside loops etc despite my best efforts.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 73b2909..59fda7b 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -158,6 +158,9 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
return r;
}
 
+   /* disable SB for geom shaders - it can't handle the CF_EMIT 
instructions */
+   use_sb = (shader->shader.processor_type != TGSI_PROCESSOR_GEOMETRY);
+
/* Check if the bytecode has already been built.  When using the llvm
 * backend, r600_shader_from_tgsi() will take care of building the
 * bytecode.
-- 
1.8.3.1



[Mesa-dev] [PATCH 07/30] r600g: don't fail if we can't map VS-GS ring entries

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

This can happen in normal operation, so don't report an error on it,
just continue.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 5606724..73b2909 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1299,10 +1299,9 @@ static int emit_gs_ring_writes(struct r600_shader_ctx 
*ctx)
if (in->name == out->name && in->sid == out->sid)
ring_offset = in->ring_offset;
}
-   if (ring_offset == -1) {
-   R600_ERR("error mapping VS->GS outputs\n");
-   return -1;
-   }
+
+   if (ring_offset == -1)
+   continue;
} else
ring_offset = i * 16;
 
-- 
1.8.3.1



[Mesa-dev] [PATCH 08/30] r600g/sb: add MEM_RING support

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

Although we don't use SB on geom shaders, the VS copy shader will use it
so we might as well implement MEM_RING support in sb.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp | 3 +++
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 2 +-
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp   | 6 +++---
 src/gallium/drivers/r600/sb/sb_dump.cpp| 2 +-
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
index f79dff1..1551e6d 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
@@ -150,6 +150,9 @@ void bc_dump::dump(cf_node n) {
if ((n.bc.op_ptr->flags & CF_RAT) && (n.bc.type & 1)) {
s << ", @R" << n.bc.index_gpr << ".xyz";
}
+   if ((n.bc.op_ptr->flags & CF_MEM) && (n.bc.type & 1)) {
+   s << ", @R" << n.bc.index_gpr << ".x";
+   }
 
s << "  ES:" << n.bc.elem_size;
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index d965483..54717f9 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -695,7 +695,7 @@ void bc_finalizer::finalize_cf(cf_node* c) {
c->bc.rw_gpr = reg >= 0 ? reg : 0;
c->bc.comp_mask = mask;
 
-   if ((flags & CF_RAT) && (c->bc.type & 1)) {
+   if (((flags & CF_RAT) || (!(flags & CF_STRM))) && (c->bc.type & 1)) {
 
reg = -1;
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index 24c4854..7181e27 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -209,7 +209,7 @@ int bc_parser::decode_cf(unsigned i, bool eop) {
if (cf->bc.rw_rel)
gpr_reladdr = true;
assert(!cf->bc.rw_rel);
-   } else if (flags & (CF_STRM | CF_RAT)) {
+   } else if (flags & CF_MEM) {
if (cf->bc.rw_rel)
gpr_reladdr = true;
assert(!cf->bc.rw_rel);
@@ -683,7 +683,7 @@ int bc_parser::prepare_ir() {
} while (1);
 
c->bc.end_of_program = eop;
-   } else if (flags & (CF_STRM | CF_RAT)) {
+   } else if (flags & CF_MEM) {
 
unsigned burst_count = c->bc.burst_count;
unsigned eop = c->bc.end_of_program;
@@ -701,7 +701,7 @@ int bc_parser::prepare_ir() {

sh->get_gpr_value(true, c->bc.rw_gpr, s, false);
}
 
-   if ((flags & CF_RAT) && (c->bc.type & 1)) { // indexed write
+   if (((flags & CF_RAT) || (!(flags & CF_STRM))) && (c->bc.type & 1)) { // indexed write
c->src.resize(8);
for(int s = 0; s < 3; ++s) {
c->src[4 + s] =
diff --git a/src/gallium/drivers/r600/sb/sb_dump.cpp 
b/src/gallium/drivers/r600/sb/sb_dump.cpp
index 7e317ab..b2130a4 100644
--- a/src/gallium/drivers/r600/sb/sb_dump.cpp
+++ b/src/gallium/drivers/r600/sb/sb_dump.cpp
@@ -349,7 +349,7 @@ void dump::dump_op(node n, const char *name) {
static const char *exp_type[] = {"PIXEL", "POS  ", "PARAM"};
sblog  exp_type[c-bc.type] 
c-bc.array_base;
has_dst = false;
-   } else if (c->bc.op_ptr->flags & CF_STRM) {
+   } else if (c->bc.op_ptr->flags & (CF_MEM)) {
static const char *exp_type[] = {"WRITE", "WRITE_IND", "WRITE_ACK",
"WRITE_IND_ACK"};
sblog  exp_type[c-bc.type] 
c-bc.array_base
-- 
1.8.3.1



[Mesa-dev] [PATCH 11/30] r600g: initialise the geom shader loop registers.

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

Initialise the geometry shader loop constant registers, as we already do
for vertex and pixel shaders.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/evergreen_state.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 6896617..e6aa122 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -2967,6 +2967,7 @@ static void cayman_init_atom_start_cs(struct r600_context 
*rctx)
 
eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0, 0x01000FFF);
eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0 + (32 * 4), 
0x01000FFF);
+   eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0 + (64 * 4), 
0x01000FFF);
 }
 
 void evergreen_init_common_regs(struct r600_command_buffer *cb,
@@ -3425,6 +3426,7 @@ void evergreen_init_atom_start_cs(struct r600_context 
*rctx)
 
eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0, 0x01000FFF);
eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0 + (32 * 4), 
0x01000FFF);
+   eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0 + (64 * 4), 
0x01000FFF);
 }
 
 void evergreen_update_ps_state(struct pipe_context *ctx, struct 
r600_pipe_shader *shader)
-- 
1.8.3.1
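
For readers following the offsets, here is a minimal sketch of how the three register addresses relate. It assumes each bank holds 32 loop constants of 4 bytes, which is what the `32 * 4` and `64 * 4` offsets in the patch suggest; the helper name is hypothetical and the bank-to-stage mapping of the first two banks is not stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Register name from the patch; the address follows from the name. */
#define R_03A200_SQ_LOOP_CONST_0 0x3A200u

/* Assumed layout: 32 loop constants of 4 bytes per bank.
 * Banks 0 and 1 were already initialised; bank 2 is the one added here. */
static uint32_t loop_const_bank_reg(unsigned bank)
{
    assert(bank < 3);
    return R_03A200_SQ_LOOP_CONST_0 + bank * 32u * 4u;
}
```

Under that assumption the new write lands 0x100 bytes past the first bank.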



[Mesa-dev] [PATCH 13/30] r600g/gs: fix cases where number of gs inputs != number of gs outputs

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

Track the size of one GS output vertex separately from the input ring
offset. This fixes a number of the geometry shader built-in tests.
---
 src/gallium/drivers/r600/r600_shader.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index f4a3194..07b343b 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -285,6 +285,7 @@ struct r600_shader_ctx {
int fragcoord_input;
int native_integers;
int next_ring_offset;
+   int gs_out_ring_offset;
int gs_next_vertex;
struct r600_shader  *gs_for_vs;
 };
@@ -619,6 +620,9 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
    ctx->cv_output = i;
    break;
    }
+   if (ctx->type == TGSI_PROCESSOR_GEOMETRY) {
+   ctx->gs_out_ring_offset += 16;
+   }
    } else if (ctx->type == TGSI_PROCESSOR_FRAGMENT) {
    switch (d->Semantic.Name) {
    case TGSI_SEMANTIC_COLOR:
@@ -1313,7 +1317,7 @@ static int emit_gs_ring_writes(struct r600_shader_ctx *ctx)
 
    /* next_ring_offset after parsing input decls contains total size of
     * single vertex data, gs_next_vertex - current vertex index */
-   ring_offset += ctx->next_ring_offset * ctx->gs_next_vertex;
+   ring_offset += ctx->gs_out_ring_offset * ctx->gs_next_vertex;
 
    memset(&output, 0, sizeof(struct r600_bytecode_output));
    output.gpr = ctx->shader->output[i].gpr;
@@ -1379,6 +1383,7 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
}
 
ctx.next_ring_offset = 0;
+   ctx.gs_out_ring_offset = 0;
ctx.gs_next_vertex = 0;
 
ctx.face_gpr = -1;
-- 
1.8.3.1
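
The arithmetic behind the fix can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the `out_idx * 16` starting term is an assumption, since the hunk shows only the per-vertex term. The point is that the vertex stride has to be the size of one output vertex (16 bytes per declared output), not the size of one input vertex.

```c
#include <assert.h>

#define GS_RING_SLOT_BYTES 16  /* one vec4, as in "gs_out_ring_offset += 16" */

/* Byte offset of output 'out_idx' of emitted vertex 'vertex'.
 * ASSUMPTION: each output starts at out_idx * 16 within its vertex. */
static int gs_ring_byte_offset(int num_outputs, int out_idx, int vertex)
{
    int vertex_size = num_outputs * GS_RING_SLOT_BYTES;

    assert(out_idx >= 0 && out_idx < num_outputs && vertex >= 0);
    return out_idx * GS_RING_SLOT_BYTES + vertex_size * vertex;
}

/* The export wants the offset in dwords, hence the shift by 2. */
static int gs_ring_array_base(int num_outputs, int out_idx, int vertex)
{
    return gs_ring_byte_offset(num_outputs, out_idx, vertex) >> 2;
}
```

With three outputs, output 1 of vertex 2 sits at byte 112, i.e. dword 28. Using the input size as the stride would make consecutive vertices overlap or leave gaps whenever the input and output counts differ.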



[Mesa-dev] [PATCH 03/30] r600g/bc: add support for indexed memory writes.

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

It looks like we need these for geom shaders in the future.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/eg_asm.c   | 6 --
 src/gallium/drivers/r600/r600_asm.c | 9 +++--
 src/gallium/drivers/r600/r600_asm.h | 1 +
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/eg_asm.c 
b/src/gallium/drivers/r600/eg_asm.c
index 42e78c0..5ad3d77 100644
--- a/src/gallium/drivers/r600/eg_asm.c
+++ b/src/gallium/drivers/r600/eg_asm.c
@@ -79,7 +79,8 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode_cf *cf)
    bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD0_RW_GPR(cf->output.gpr) |
    S_SQ_CF_ALLOC_EXPORT_WORD0_ELEM_SIZE(cf->output.elem_size) |
    S_SQ_CF_ALLOC_EXPORT_WORD0_ARRAY_BASE(cf->output.array_base) |
-   S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type);
+   S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type) |
+   S_SQ_CF_ALLOC_EXPORT_WORD0_INDEX_GPR(cf->output.index_gpr);
    bc->bytecode[id] =
    S_SQ_CF_ALLOC_EXPORT_WORD1_BURST_COUNT(cf->output.burst_count - 1) |
    S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_X(cf->output.swizzle_x) |
@@ -97,7 +98,8 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode_cf *cf)
    bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD0_RW_GPR(cf->output.gpr) |
    S_SQ_CF_ALLOC_EXPORT_WORD0_ELEM_SIZE(cf->output.elem_size) |
    S_SQ_CF_ALLOC_EXPORT_WORD0_ARRAY_BASE(cf->output.array_base) |
-   S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type);
+   S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type) |
+   S_SQ_CF_ALLOC_EXPORT_WORD0_INDEX_GPR(cf->output.index_gpr);
    bc->bytecode[id] = S_SQ_CF_ALLOC_EXPORT_WORD1_BURST_COUNT(cf->output.burst_count - 1) |
    S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->barrier) |
    S_SQ_CF_ALLOC_EXPORT_WORD1_CF_INST(opcode) |
diff --git a/src/gallium/drivers/r600/r600_asm.c 
b/src/gallium/drivers/r600/r600_asm.c
index e062fcb..14ee2c2 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -1525,7 +1525,8 @@ static int r600_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode
    bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD0_RW_GPR(cf->output.gpr) |
    S_SQ_CF_ALLOC_EXPORT_WORD0_ELEM_SIZE(cf->output.elem_size) |
    S_SQ_CF_ALLOC_EXPORT_WORD0_ARRAY_BASE(cf->output.array_base) |
-   S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type);
+   S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type) |
+   S_SQ_CF_ALLOC_EXPORT_WORD0_INDEX_GPR(cf->output.index_gpr);
    bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD1_BURST_COUNT(cf->output.burst_count - 1) |
    S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_X(cf->output.swizzle_x) |
    S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_Y(cf->output.swizzle_y) |
@@ -1538,7 +1539,8 @@ static int r600_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode
    bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD0_RW_GPR(cf->output.gpr) |
    S_SQ_CF_ALLOC_EXPORT_WORD0_ELEM_SIZE(cf->output.elem_size) |
    S_SQ_CF_ALLOC_EXPORT_WORD0_ARRAY_BASE(cf->output.array_base) |
-   S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type);
+   S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type) |
+   S_SQ_CF_ALLOC_EXPORT_WORD0_INDEX_GPR(cf->output.index_gpr);
    bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD1_BURST_COUNT(cf->output.burst_count - 1) |
    S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->barrier) |
    S_SQ_CF_ALLOC_EXPORT_WORD1_CF_INST(opcode) |
@@ -1963,6 +1965,9 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
    o += print_swizzle(7);
    }
 
+   if (cf->output.type == V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_WRITE_IND)
+   o += fprintf(stderr, " R%d", cf->output.index_gpr);
+
    o += print_indent(o, 67);
 
    fprintf(stderr, " ES:%i ", cf->output.elem_size);
diff --git a/src/gallium/drivers/r600/r600_asm.h 
b/src/gallium/drivers/r600/r600_asm.h
index 3bfbcb2..4fb0ef0 100644
--- 

[Mesa-dev] [PATCH 05/30] r600g: add hw register definitions for GS block setup

2014-02-03 Thread Dave Airlie
From: Vadim Girlin vadimgir...@gmail.com

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/evergreend.h | 80 ---
 src/gallium/drivers/r600/r600d.h  |  1 +
 2 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreend.h 
b/src/gallium/drivers/r600/evergreend.h
index 9ba3db7..73b37c2 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -48,6 +48,7 @@
 #define EVENT_TYPE_ZPASS_DONE  0x15
 #define EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT   0x16
 #define EVENT_TYPE_SO_VGTSTREAMOUT_FLUSH   0x1f
+#define EVENT_TYPE_VGT_FLUSH   0x24
 #define EVENT_TYPE_FLUSH_AND_INV_DB_META   0x2c
 
 #define EVENT_TYPE(x)   ((x) << 0)
@@ -274,6 +275,11 @@
 #define   G_008E2C_NUM_LS_LDS(x)   (((x)  16)  0x)
 #define   C_008E2C_NUM_LS_LDS(x)   0x
 
+#define R_008C40_SQ_ESGS_RING_BASE 0x8C40
+#define R_008C44_SQ_ESGS_RING_SIZE 0x8C44
+#define R_008C48_SQ_GSVS_RING_BASE 0x8C48
+#define R_008C4C_SQ_GSVS_RING_SIZE 0x8C4C
+
 #define R_008CF0_SQ_MS_FIFO_SIZES 0x8CF0
 #define   S_008CF0_CACHE_FIFO_SIZE(x)  (((x) & 0xFF) << 0)
 #define   G_008CF0_CACHE_FIFO_SIZE(x)  (((x) >> 0) & 0xFF)
@@ -821,12 +827,22 @@
 #define   S_028A40_MODE(x) (((x) & 0x3) << 0)
 #define   G_028A40_MODE(x) (((x) >> 0) & 0x3)
 #define   C_028A40_MODE    0xFFFFFFFC
+#define V_028A40_GS_OFF        0
+#define V_028A40_GS_SCENARIO_A 1
+#define V_028A40_GS_SCENARIO_B 2
+#define V_028A40_GS_SCENARIO_G 3
+#define V_028A40_GS_SCENARIO_C 4
+#define V_028A40_SPRITE_EN     5
 #define   S_028A40_ES_PASSTHRU(x)  (((x) & 0x1) << 2)
 #define   G_028A40_ES_PASSTHRU(x)  (((x) >> 2) & 0x1)
 #define   C_028A40_ES_PASSTHRU     0xFFFFFFFB
 #define   S_028A40_CUT_MODE(x) (((x) & 0x3) << 3)
 #define   G_028A40_CUT_MODE(x) (((x) >> 3) & 0x3)
 #define   C_028A40_CUT_MODE    0xFFFFFFE7
+#define V_028A40_GS_CUT_1024   0
+#define V_028A40_GS_CUT_512    1
+#define V_028A40_GS_CUT_256    2
+#define V_028A40_GS_CUT_128    3
 #define   S_028A40_COMPUTE_MODE(x) (x << 14)
 #define   S_028A40_PARTIAL_THD_AT_EOI(x)   (x << 17)
 #define R_028A6C_VGT_GS_OUT_PRIM_TYPE 0x028A6C
@@ -1201,6 +1217,7 @@
 #define   C_030008_ENDIAN_SWAP 0x3FFFFFFF
 
 #define R_03000C_SQ_VTX_CONSTANT_WORD3_0 0x03000C
+#define   S_03000C_UNCACHED(x) (((x) & 0x1) << 2)
 #define   S_03000C_DST_SEL_X(x) (((x) & 0x7) << 3)
 #define   G_03000C_DST_SEL_X(x) (((x) >> 3) & 0x7)
 #define V_03000C_SQ_SEL_X  0x00000000
@@ -1457,6 +1474,34 @@
 #define   G_028860_UNCACHED_FIRST_INST(x)  (((x) >> 28) & 0x1)
 #define   C_028860_UNCACHED_FIRST_INST     0xEFFFFFFF
 
+#define R_028878_SQ_PGM_RESOURCES_GS 0x028878
+#define   S_028878_NUM_GPRS(x) (((x) & 0xFF) << 0)
+#define   G_028878_NUM_GPRS(x) (((x) >> 0) & 0xFF)
+#define   C_028878_NUM_GPRS    0xFFFFFF00
+#define   S_028878_STACK_SIZE(x)   (((x) & 0xFF) << 8)
+#define   G_028878_STACK_SIZE(x)   (((x) >> 8) & 0xFF)
+#define   C_028878_STACK_SIZE      0xFFFF00FF
+#define   S_028878_DX10_CLAMP(x)   (((x) & 0x1) << 21)
+#define   G_028878_DX10_CLAMP(x)   (((x) >> 21) & 0x1)
+#define   C_028878_DX10_CLAMP      0xFFDFFFFF
+#define   S_028878_UNCACHED_FIRST_INST(x)  (((x) & 0x1) << 28)
+#define   G_028878_UNCACHED_FIRST_INST(x)  (((x) >> 28) & 0x1)
+#define   C_028878_UNCACHED_FIRST_INST     0xEFFFFFFF
+
+#define R_028890_SQ_PGM_RESOURCES_ES 0x028890
+#define   S_028890_NUM_GPRS(x) (((x) & 0xFF) << 0)
+#define   G_028890_NUM_GPRS(x) (((x) >> 0) & 0xFF)
+#define   C_028890_NUM_GPRS    0xFFFFFF00
+#define   S_028890_STACK_SIZE(x)   (((x) & 0xFF) << 8)
+#define   G_028890_STACK_SIZE(x)   (((x) >> 8) & 0xFF)
+#define   C_028890_STACK_SIZE      0xFFFF00FF
+#define   S_028890_DX10_CLAMP(x)   (((x) & 0x1) << 21)
+#define  
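
The S_/G_/C_ triplets in this header follow a set/get/clear pattern. A minimal sketch of how the three macros for one field fit together, assuming the usual r600 convention (S_ masks and shifts a value into place, G_ extracts it, C_ is the AND-mask that clears the field); the helper `set_stack_size` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* STACK_SIZE field of SQ_PGM_RESOURCES_GS: 8 bits starting at bit 8. */
#define S_028878_STACK_SIZE(x)  (((x) & 0xFFu) << 8)   /* set   */
#define G_028878_STACK_SIZE(x)  (((x) >> 8) & 0xFFu)   /* get   */
#define C_028878_STACK_SIZE     0xFFFF00FFu            /* clear */

/* Replace the STACK_SIZE field of 'reg', leaving the other bits alone. */
static uint32_t set_stack_size(uint32_t reg, uint32_t stack_size)
{
    return (reg & C_028878_STACK_SIZE) | S_028878_STACK_SIZE(stack_size);
}
```

The clear mask is the complement of the field mask shifted into position, so the value can be derived from the S_ definition when in doubt.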

[Mesa-dev] [PATCH 10/30] r600g: emit NOPs at end of shaders in more cases

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

If the shader has no CF clauses at all, emit a NOP.
If the last instruction is an ENDLOOP, add a NOP for the LOOP to jump to.
If the last instruction is CALL_FS, add a NOP.

These fix a bunch of hangs in the geometry shader tests.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 59fda7b..5d3f7c8 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1931,10 +1931,13 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
    if (ctx.bc->chip_class == CAYMAN)
    cm_bytecode_add_cf_end(ctx.bc);
    else {
-   const struct cf_op_info *last = r600_isa_cf(ctx.bc->cf_last->op);
+   const struct cf_op_info *last = NULL;
+
+   if (ctx.bc->cf_last)
+   last = r600_isa_cf(ctx.bc->cf_last->op);
 
    /* alu clause instructions don't have EOP bit, so add NOP */
-   if (last->flags & CF_ALU)
+   if (!last || last->flags & CF_ALU || ctx.bc->cf_last->op == CF_OP_LOOP_END || ctx.bc->cf_last->op == CF_OP_CALL_FS)
    r600_bytecode_add_cfinst(ctx.bc, CF_OP_NOP);
 
    ctx.bc->cf_last->end_of_program = 1;
-- 
1.8.3.1
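
The decision the patched condition makes can be written as a standalone predicate. This is a sketch with a hypothetical, reduced model of a CF instruction, not the driver's real types; it only restates the four cases from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of the last CF instruction. */
enum cf_kind { CF_KIND_ALU, CF_KIND_LOOP_END, CF_KIND_CALL_FS, CF_KIND_EXPORT };
struct cf_insn { enum cf_kind kind; };

/* True when a NOP must be appended so that the end-of-program bit has
 * an instruction to live on. 'last' is NULL for a shader without any
 * CF clauses. */
static bool needs_trailing_nop(const struct cf_insn *last)
{
    if (last == NULL)
        return true;
    switch (last->kind) {
    case CF_KIND_ALU:       /* ALU clauses have no EOP bit */
    case CF_KIND_LOOP_END:  /* the loop needs a target after it */
    case CF_KIND_CALL_FS:
        return true;
    default:
        return false;
    }
}
```

Checking for the NULL case first also avoids the NULL dereference that the old code would hit on an empty shader.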



[Mesa-dev] [PATCH 22/30] r600g: initial VS output layer support

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

This just adds support for emitting the proper value in the VS out misc.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/evergreen_state.c |  3 +-
 src/gallium/drivers/r600/r600_pipe.c   |  2 +-
 src/gallium/drivers/r600/r600_shader.c | 58 +++---
 src/gallium/drivers/r600/r600_shader.h |  1 +
 4 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index e23f434..c784f2f 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3726,7 +3726,8 @@ void evergreen_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader
    S_02881C_VS_OUT_CCDIST0_VEC_ENA((rshader->clip_dist_write & 0x0F) != 0) |
    S_02881C_VS_OUT_CCDIST1_VEC_ENA((rshader->clip_dist_write & 0xF0) != 0) |
    S_02881C_VS_OUT_MISC_VEC_ENA(rshader->vs_out_misc_write) |
-   S_02881C_USE_VTX_POINT_SIZE(rshader->vs_out_point_size);
+   S_02881C_USE_VTX_POINT_SIZE(rshader->vs_out_point_size) |
+   S_02881C_USE_VTX_RENDER_TARGET_INDX(rshader->vs_out_layer);
 }
 
 void *evergreen_create_resolve_blend(struct r600_context *rctx)
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 3ca9580..e5fdac0 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -383,6 +383,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
/* Supported on Evergreen. */
case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
case PIPE_CAP_CUBE_MAP_ARRAY:
+   case PIPE_CAP_TGSI_VS_LAYER:
    return family >= CHIP_CEDAR ? 1 : 0;
 
/* Unsupported features. */
@@ -392,7 +393,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
case PIPE_CAP_VERTEX_COLOR_CLAMPED:
case PIPE_CAP_USER_VERTEX_BUFFERS:
-   case PIPE_CAP_TGSI_VS_LAYER:
return 0;
 
/* Stream output. */
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index a58955a..0a9f55a 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -487,6 +487,7 @@ static int r600_spi_sid(struct r600_shader_io * io)
 */
if (name == TGSI_SEMANTIC_POSITION ||
name == TGSI_SEMANTIC_PSIZE ||
+   name == TGSI_SEMANTIC_LAYER ||
name == TGSI_SEMANTIC_FACE)
index = 0;
else {
@@ -618,6 +619,10 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
    ctx->shader->vs_out_misc_write = 1;
    ctx->shader->vs_out_point_size = 1;
    break;
+   case TGSI_SEMANTIC_LAYER:
+   ctx->shader->vs_out_misc_write = 1;
+   ctx->shader->vs_out_layer = 1;
+   break;
    case TGSI_SEMANTIC_CLIPVERTEX:
    ctx->clip_vertex_write = TRUE;
    ctx->cv_output = i;
@@ -1157,7 +1162,7 @@ static int generate_gs_copy_shader(struct r600_context 
*rctx,
struct r600_bytecode_output output;
struct r600_bytecode_cf *cf_jump, *cf_pop,
*last_exp_pos = NULL, *last_exp_param = NULL;
-   int i, next_pos = 60, next_param = 0;
+   int i, next_clip_pos = 62, next_param = 0;
 
cshader = calloc(1, sizeof(struct r600_pipe_shader));
if (!cshader)
@@ -1252,13 +1257,24 @@ static int generate_gs_copy_shader(struct r600_context 
*rctx,
output.op = CF_OP_EXPORT;
switch (out-name) {
case TGSI_SEMANTIC_POSITION:
-   output.array_base = next_pos++;
+   output.array_base = 60;
output.type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_POS;
break;
 
case TGSI_SEMANTIC_PSIZE:
-   output.array_base = next_pos++;
+   output.array_base = 61;
+   output.type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_POS;
+   output.swizzle_y = 7;
+   output.swizzle_z = 7;
+   output.swizzle_w = 7;
+   break;
+   case TGSI_SEMANTIC_LAYER:
+   output.array_base = 61;
output.type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_POS;
+   output.swizzle_x = 7;
+   output.swizzle_y = 7;
+   output.swizzle_z = 0;
+   output.swizzle_w = 7;
break;
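
The swizzle values in the hunk above can be read as a routing table into the misc vector. A minimal sketch, assuming swizzle 7 masks a destination channel and swizzle 0 selects source channel x; the enum and struct names are hypothetical and only mirror the values set in generate_gs_copy_shader().

```c
#include <assert.h>

enum { SWZ_SRC_X = 0, SWZ_MASK = 7 };           /* 7 masks the channel */
enum misc_semantic { MISC_PSIZE, MISC_LAYER };  /* hypothetical names */

struct misc_export {
    int array_base;  /* position export slot */
    int swizzle[4];  /* per destination channel x, y, z, w */
};

/* Both values share export slot 61; each one enables a single channel. */
static struct misc_export misc_vector_export(enum misc_semantic sem)
{
    struct misc_export e = { 61, { SWZ_MASK, SWZ_MASK, SWZ_MASK, SWZ_MASK } };

    assert(sem == MISC_PSIZE || sem == MISC_LAYER);
    if (sem == MISC_PSIZE)
        e.swizzle[0] = SWZ_SRC_X;  /* point size lands in .x */
    else
        e.swizzle[2] = SWZ_SRC_X;  /* layer lands in .z */
    return e;
}
```

Because the two exports write disjoint channels of the same slot, a shader can emit both point size and layer.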
 

[Mesa-dev] [PATCH 16/30] r600g: enable instance cnt register with new enough kernel

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

The kernel command checker in older kernels rejects the instance count
register, so it was left commented out. With a new enough kernel
(drm_minor 35 or later) we can emit it.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/evergreen_state.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index c206c08..0578520 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3595,6 +3595,7 @@ void evergreen_update_es_state(struct pipe_context *ctx, struct r600_pipe_shader
 
 void evergreen_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
 {
+   struct r600_context *rctx = (struct r600_context *)ctx;
    struct r600_command_buffer *cb = &shader->command_buffer;
    struct r600_shader *rshader = &shader->shader;
    struct r600_shader *cp_shader = &shader->gs_copy_shader->shader;
@@ -3611,12 +3612,11 @@ void evergreen_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader
    r600_store_context_reg(cb, R_028B38_VGT_GS_MAX_VERT_OUT,
    S_028B38_MAX_VERT_OUT(rshader->gs_max_out_vertices));
 
-
-/* XXX kernel checker fails
-   r600_store_context_reg(cb, R_028B90_VGT_GS_INSTANCE_CNT,
- S_028B90_CNT(0) |
-  S_028B90_ENABLE(0));
-*/
+   if (rctx->screen->b.info.drm_minor >= 35) {
+   r600_store_context_reg(cb, R_028B90_VGT_GS_INSTANCE_CNT,
+   S_028B90_CNT(0) |
+   S_028B90_ENABLE(0));
+   }
    r600_store_context_reg_seq(cb, R_02891C_SQ_GS_VERT_ITEMSIZE, 4);
    r600_store_value(cb, cp_shader->ring_item_size >> 2);
    r600_store_value(cb, 0);
-- 
1.8.3.1



[Mesa-dev] [PATCH 18/30] r600g: add support for indirect geom ring writes

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

We need to be able to write to the ring using a base register when we
emit vertices in a loop. In theory the SB compiler could collapse these
indirect writes into direct writes when the register value is constant
and known, but that is outside my pay grade.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 65 ++
 1 file changed, 58 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index fd11f61..cc2000d 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -288,6 +288,7 @@ struct r600_shader_ctx {
int gs_out_ring_offset;
int gs_next_vertex;
struct r600_shader  *gs_for_vs;
+   int gs_export_gpr_treg;
 };
 
 struct r600_shader_tgsi_instruction {
@@ -297,7 +298,7 @@ struct r600_shader_tgsi_instruction {
int (*process)(struct r600_shader_ctx *ctx);
 };
 
-static int emit_gs_ring_writes(struct r600_shader_ctx *ctx);
+static int emit_gs_ring_writes(struct r600_shader_ctx *ctx, bool ind);
 static struct r600_shader_tgsi_instruction r600_shader_tgsi_instruction[], 
eg_shader_tgsi_instruction[], cm_shader_tgsi_instruction[];
 static int tgsi_helper_tempx_replicate(struct r600_shader_ctx *ctx);
 static inline void callstack_push(struct r600_shader_ctx *ctx, unsigned 
reason);
@@ -1303,7 +1304,7 @@ static int generate_gs_copy_shader(struct r600_context 
*rctx,
return r600_bytecode_build(ctx.bc);
 }
 
-static int emit_gs_ring_writes(struct r600_shader_ctx *ctx)
+static int emit_gs_ring_writes(struct r600_shader_ctx *ctx, bool ind)
 {
struct r600_bytecode_output output;
int i, k, ring_offset;
@@ -1328,17 +1329,48 @@ static int emit_gs_ring_writes(struct r600_shader_ctx *ctx)
 
    /* next_ring_offset after parsing input decls contains total size of
     * single vertex data, gs_next_vertex - current vertex index */
-   ring_offset += ctx->gs_out_ring_offset * ctx->gs_next_vertex;
+   if (!ind)
+   ring_offset += ctx->gs_out_ring_offset * ctx->gs_next_vertex;
 
+   /* get a temp and add the ring offset to the next vertex base in the shader */
    memset(&output, 0, sizeof(struct r600_bytecode_output));
    output.gpr = ctx->shader->output[i].gpr;
    output.elem_size = 3;
    output.comp_mask = 0xF;
    output.burst_count = 1;
+
+   if (ind)
+   output.type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_WRITE_IND;
+   else
+   output.type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_WRITE;
    output.op = CF_OP_MEM_RING;
-   output.array_base = ring_offset >> 2; /* in dwords */
+
+
+   if (ind) {
+   output.array_base = ring_offset >> 2; /* in dwords */
+   output.array_size = 0xff
+   output.index_gpr = ctx->gs_export_gpr_treg;
+   } else
+   output.array_base = ring_offset >> 2; /* in dwords */
    r600_bytecode_add_output(ctx->bc, &output);
    }
+
+   if (ind) {
+   struct r600_bytecode_alu alu;
+   int r;
+
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP2_ADD_INT;
+   alu.src[0].sel = ctx->gs_export_gpr_treg;
+   alu.src[1].sel = V_SQ_ALU_SRC_LITERAL;
+   alu.src[1].value = ctx->gs_out_ring_offset >> 4;
+   alu.dst.sel = ctx->gs_export_gpr_treg;
+   alu.dst.write = 1;
+   alu.last = 1;
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+   }
++ctx-gs_next_vertex;
return 0;
 }
@@ -1473,7 +1505,11 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
    ctx.file_offset[TGSI_FILE_IMMEDIATE] = V_SQ_ALU_SRC_LITERAL;
    ctx.bc->ar_reg = ctx.file_offset[TGSI_FILE_TEMPORARY] +
    ctx.info.file_max[TGSI_FILE_TEMPORARY] + 1;
-   ctx.temp_reg = ctx.bc->ar_reg + 1;
+   if (ctx.type == TGSI_PROCESSOR_GEOMETRY) {
+   ctx.gs_export_gpr_treg = ctx.bc->ar_reg + 1;
+   ctx.temp_reg = ctx.bc->ar_reg + 2;
+   } else
+   ctx.temp_reg = ctx.bc->ar_reg + 1;
 
if (indirect_gprs) {
shader-max_arrays = 0;
@@ -1667,6 +1703,21 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
}
}
 
+   if (ctx.type == TGSI_PROCESSOR_GEOMETRY) {
+   struct r600_bytecode_alu alu;
+   int r;
+
+   
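
To see why the indirect path can replace the compile-time vertex counter, the two addressing schemes can be modelled side by side. This is a sketch only. It assumes the hardware adds the index register to the base in units of one 4-dword element (consistent with `elem_size = 3` and the `>> 4` on the increment, but not confirmed by the patch), and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SLOT_BYTES 16  /* one vec4 per output */

struct ring_write {
    int array_base;  /* in dwords */
    int index;       /* value of the index GPR, 0 for direct writes */
};

/* Direct write: the vertex number is known when the shader is built. */
static struct ring_write direct_write(int vertex_size, int out_idx, int vertex)
{
    struct ring_write w = { (out_idx * SLOT_BYTES + vertex_size * vertex) >> 2, 0 };
    return w;
}

/* Indirect write: the base covers only the output slot, the index
 * register carries the position of the current vertex. */
static struct ring_write indirect_write(int out_idx, int index_reg)
{
    struct ring_write w = { (out_idx * SLOT_BYTES) >> 2, index_reg };
    return w;
}

/* Mirrors the ADD_INT emitted after each vertex. */
static int advance_index(int index_reg, int vertex_size)
{
    return index_reg + (vertex_size >> 4);
}

/* ASSUMPTION: the index counts 4-dword (16-byte) elements. */
static int effective_dword(struct ring_write w)
{
    return w.array_base + w.index * 4;
}

/* Walk num_vertices emits and compare both schemes for every output. */
static bool indirect_matches_direct(int num_outputs, int num_vertices)
{
    int vertex_size = num_outputs * SLOT_BYTES;
    int index_reg = 0;

    for (int v = 0; v < num_vertices; v++) {
        for (int o = 0; o < num_outputs; o++) {
            if (effective_dword(direct_write(vertex_size, o, v)) !=
                effective_dword(indirect_write(o, index_reg)))
                return false;
        }
        index_reg = advance_index(index_reg, vertex_size);
    }
    return true;
}
```

Under that assumption both schemes address the same dwords, which is also why the direct form could in principle be recovered when the index value is a known constant.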

[Mesa-dev] [PATCH 12/30] r600g: increase array base for exported parameters

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

Trivial fix to Vadim's code.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 5d3f7c8..f4a3194 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1221,6 +1221,9 @@ static int generate_gs_copy_shader(struct r600_context 
*rctx,
output.swizzle_z = 4; /* 0 */
output.swizzle_w = 5; /* 1 */
break;
+   default:
+   output.array_base = next_param++;
+   break;
}
r600_bytecode_add_output(ctx.bc, output);
if (output.type == V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PARAM)
-- 
1.8.3.1



[Mesa-dev] [PATCH 04/30] r600g: defer shader variant selection and depending state updates

2014-02-03 Thread Dave Airlie
From: Vadim Girlin vadimgir...@gmail.com

[airlied: fix dropped streamout line - fix for master]

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/evergreen_state.c   |   1 -
 src/gallium/drivers/r600/r600_state.c|   2 -
 src/gallium/drivers/r600/r600_state_common.c | 123 +--
 3 files changed, 57 insertions(+), 69 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 77e3c95..48bea1f 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1829,7 +1829,6 @@ static void evergreen_set_framebuffer_state(struct pipe_context *ctx,
    rctx->db_misc_state.atom.dirty = true;
    }
 
-   evergreen_update_db_shader_control(rctx);
 
    /* Calculate the CS size. */
    rctx->framebuffer.atom.num_dw = 4; /* SCISSOR */
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index a0d527b..98e6924 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -1667,8 +1667,6 @@ static void r600_set_framebuffer_state(struct pipe_context *ctx,
    rctx->alphatest_state.atom.dirty = true;
    }
 
-   r600_update_db_shader_control(rctx);
-
    /* Calculate the CS size. */
    rctx->framebuffer.atom.num_dw =
    10 /*COLOR_INFO*/ + 4 /*SCISSOR*/ + 3 /*SHADER_CONTROL*/ + 8 /*MSAA*/;
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index d8fab10..c05b74d 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -301,11 +301,6 @@ static void r600_bind_dsa_state(struct pipe_context *ctx, void *state)
    rctx->alphatest_state.sx_alpha_test_control = dsa->sx_alpha_test_control;
    rctx->alphatest_state.sx_alpha_ref = dsa->alpha_ref;
    rctx->alphatest_state.atom.dirty = true;
-   if (rctx->b.chip_class >= EVERGREEN) {
-   evergreen_update_db_shader_control(rctx);
-   } else {
-   r600_update_db_shader_control(rctx);
-   }
    }
 }
 
@@ -709,7 +704,6 @@ static int r600_shader_select(struct pipe_context *ctx,
    bool *dirty)
 {
    struct r600_shader_key key;
-   struct r600_context *rctx = (struct r600_context *)ctx;
    struct r600_pipe_shader * shader = NULL;
    int r;
 
@@ -771,11 +765,6 @@ static int r600_shader_select(struct pipe_context *ctx,
    shader->next_variant = sel->current;
    sel->current = shader;
 
-   if (rctx->ps_shader &&
-   rctx->cb_misc_state.nr_ps_color_outputs != rctx->ps_shader->current->nr_ps_color_outputs) {
-   rctx->cb_misc_state.nr_ps_color_outputs = rctx->ps_shader->current->nr_ps_color_outputs;
-   rctx->cb_misc_state.atom.dirty = true;
-   }
    return 0;
 }
 
@@ -784,16 +773,10 @@ static void *r600_create_shader_state(struct pipe_context *ctx,
    unsigned pipe_shader_type)
 {
    struct r600_pipe_shader_selector *sel = CALLOC_STRUCT(r600_pipe_shader_selector);
-   int r;
 
    sel->type = pipe_shader_type;
    sel->tokens = tgsi_dup_tokens(state->tokens);
    sel->so = state->stream_output;
-
-   r = r600_shader_select(ctx, sel, NULL);
-   if (r)
-   return NULL;
-
    return sel;
 }
 
@@ -816,31 +799,7 @@ static void r600_bind_ps_state(struct pipe_context *ctx, void *state)
    if (!state)
    state = rctx->dummy_pixel_shader;
 
-   rctx->pixel_shader.shader = rctx->ps_shader = (struct r600_pipe_shader_selector *)state;
-   rctx->pixel_shader.atom.num_dw = rctx->ps_shader->current->command_buffer.num_dw;
-   rctx->pixel_shader.atom.dirty = true;
-
-   r600_context_add_resource_size(ctx, (struct pipe_resource *)rctx->ps_shader->current->bo);
-
-   if (rctx->b.chip_class <= R700) {
-   bool multiwrite = rctx->ps_shader->current->shader.fs_write_all;
-
-   if (rctx->cb_misc_state.multiwrite != multiwrite) {
-   rctx->cb_misc_state.multiwrite = multiwrite;
-   rctx->cb_misc_state.atom.dirty = true;
-   }
-   }
-
-   if (rctx->cb_misc_state.nr_ps_color_outputs != rctx->ps_shader->current->nr_ps_color_outputs) {
-   rctx->cb_misc_state.nr_ps_color_outputs = rctx->ps_shader->current->nr_ps_color_outputs;
-   rctx->cb_misc_state.atom.dirty = true;
-   }
-
-   if (rctx->b.chip_class >= EVERGREEN) {
-   evergreen_update_db_shader_control(rctx);
-   } else {
-   r600_update_db_shader_control(rctx);
-   }
+   rctx->ps_shader = (struct r600_pipe_shader_selector *)state;
 }
 
 static void r600_bind_vs_state(struct pipe_context *ctx, void 

[Mesa-dev] [PATCH 19/30] r600g: fix dynamic_input_array_index.shader_test

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

This follows what fglrx does: it unpacks the input we are going to
index indirectly into a set of registers and then indexes inside them.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 48 +++---
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index cc2000d..a58955a 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -834,10 +834,12 @@ static int tgsi_fetch_rel_const(struct r600_shader_ctx *ctx, unsigned int cb_idx
    return 0;
 }
 
-static int fetch_gs_input(struct r600_shader_ctx *ctx, unsigned index, unsigned vtx_id, unsigned int dst_reg)
+static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_register *src, unsigned int dst_reg)
 {
    struct r600_bytecode_vtx vtx;
    int r;
+   unsigned index = src->Register.Index;
+   unsigned vtx_id = src->Dimension.Index;
    int offset_reg = vtx_id / 3;
    int offset_chan = vtx_id % 3;
 
@@ -847,6 +849,46 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, unsigned index, unsigned
    if (offset_reg == 0 && offset_chan == 2)
    offset_chan = 3;
 
+   if (src->Dimension.Indirect) {
+   int treg[3];
+   int t2;
+   struct r600_bytecode_alu alu;
+   int r, i;
+
+   /* you have got to be shitting me -
+      we have to put the R0.x/y/w into Rt.x Rt+1.x Rt+2.x then index reg from Rt.
+      at least this is what fglrx seems to do. */
+   for (i = 0; i < 3; i++) {
+   treg[i] = r600_get_temp(ctx);
+   }
+   t2 = r600_get_temp(ctx);
+   for (i = 0; i < 3; i++) {
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP1_MOV;
+   alu.src[0].sel = 0;
+   alu.src[0].chan = i == 2 ? 3 : i;
+   alu.dst.sel = treg[i];
+   alu.dst.chan = 0;
+   alu.dst.write = 1;
+   alu.last = 1;
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+   }
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP1_MOV;
+   alu.src[0].sel = treg[0];
+   alu.src[0].rel = 1;
+   alu.dst.sel = t2;
+   alu.dst.write = 1;
+   alu.last = 1;
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+   offset_reg = t2;
+   }
+
+
    memset(&vtx, 0, sizeof(vtx));
    vtx.buffer_id = R600_GS_RING_CONST_BUFFER;
    vtx.fetch_type = 2; /* VTX_FETCH_NO_INDEX_OFFSET */
@@ -884,10 +926,8 @@ static int tgsi_split_gs_inputs(struct r600_shader_ctx *ctx)
    }
    if (src->Register.File == TGSI_FILE_INPUT && src->Register.Dimension) {
    int treg = r600_get_temp(ctx);
-   int index = src->Register.Index;
-   int vtx_id = src->Dimension.Index;
 
-   fetch_gs_input(ctx, index, vtx_id, treg);
+   fetch_gs_input(ctx, src, treg);
    ctx->src[i].sel = treg;
    }
    }
-- 
1.8.3.1
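
The register and channel arithmetic in fetch_gs_input() can be sketched outside the driver. The first helper restates the static lookup from the hunk (three vertex offsets per register, with the third one of R0 living in .w rather than .z). The second models the unpack step of the indirect path with plain arrays; both helper names are hypothetical.

```c
#include <assert.h>

struct gs_vtx_offset { int reg; int chan; };

/* Static case: where the ring offset for input vertex 'vtx_id' lives. */
static struct gs_vtx_offset gs_vertex_offset_location(unsigned vtx_id)
{
    struct gs_vtx_offset loc;

    loc.reg = (int)(vtx_id / 3);
    loc.chan = (int)(vtx_id % 3);
    if (loc.reg == 0 && loc.chan == 2)
        loc.chan = 3;  /* R0.z is skipped, the third offset is in R0.w */
    return loc;
}

/* Indirect case, modelled with arrays: copy R0.x, R0.y and R0.w into the
 * .x channel of three consecutive temporaries, then index into them. */
static int lookup_offset_indirect(const int r0[4], unsigned vtx_id)
{
    int treg[3];

    assert(vtx_id < 3);
    for (int i = 0; i < 3; i++)
        treg[i] = r0[i == 2 ? 3 : i];
    return treg[vtx_id];
}
```

The unpack is needed because relative addressing selects a register, not a channel, so the three offsets have to sit in the same channel of adjacent registers before they can be indexed.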



[Mesa-dev] [PATCH 21/30] r600g: setup const texture buffers for geom shaders

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

This just enables the workarounds we have for vertex/pixel shaders
for geom shaders as well.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_state_common.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 96fdd0e..0c32974 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1252,11 +1252,15 @@ static bool r600_update_derived_state(struct r600_context *rctx)
    r600_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT);
    if (rctx->vs_shader && rctx->vs_shader->current->shader.uses_tex_buffers)
    r600_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX);
+   if (rctx->gs_shader && rctx->gs_shader->current->shader.uses_tex_buffers)
+   r600_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY);
    } else {
    if (rctx->ps_shader && rctx->ps_shader->current->shader.uses_tex_buffers)
    eg_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT);
    if (rctx->vs_shader && rctx->vs_shader->current->shader.uses_tex_buffers)
    eg_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX);
+   if (rctx->gs_shader && rctx->gs_shader->current->shader.uses_tex_buffers)
+   eg_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY);
    }
 
 
@@ -1264,6 +1268,8 @@ static bool r600_update_derived_state(struct r600_context *rctx)
    r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_FRAGMENT);
    if (rctx->vs_shader && rctx->vs_shader->current->shader.has_txq_cube_array_z_comp)
    r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_VERTEX);
+   if (rctx->gs_shader && rctx->gs_shader->current->shader.has_txq_cube_array_z_comp)
+   r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_GEOMETRY);
 
    if (rctx->b.chip_class < EVERGREEN && rctx->ps_shader && rctx->vs_shader) {
    if (!r600_adjust_gprs(rctx)) {
-- 
1.8.3.1



[Mesa-dev] [PATCH 27/30] r600g: calculate a better value for array_size

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

Attempt to calculate a better value for array_size to avoid breaking apps.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 8fa7054..f0e980b 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1416,7 +1416,7 @@ static int emit_gs_ring_writes(struct r600_shader_ctx *ctx, bool ind)
 
if (ind) {
output.array_base = ring_offset >> 2; /* in dwords */
-   output.array_size = 0xff
+   output.array_size = ctx->shader->gs_max_out_vertices * 4;
output.index_gpr = ctx->gs_export_gpr_treg;
} else
output.array_base = ring_offset >> 2; /* in dwords */
-- 
1.8.3.1



[Mesa-dev] [PATCH 14/30] r600g: emit streamout from dma copy shader

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

This enables streamout with GS in the mix, from the
VS dma shader.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c   | 6 --
 src/gallium/drivers/r600/r600_state_common.c | 4 
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 07b343b..e24b64d 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1095,7 +1095,8 @@ out_err:
 }
 
 static int generate_gs_copy_shader(struct r600_context *rctx,
-  struct r600_pipe_shader *gs)
+  struct r600_pipe_shader *gs,
+  struct pipe_stream_output_info *so)
 {
struct r600_shader_ctx ctx = {};
struct r600_shader *gs_shader = &gs->shader;
@@ -1179,6 +1180,7 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
}
 
/* XXX handle clipvertex, streamout? */
+   emit_streamout(&ctx, so);
 
/* export vertex data */
/* XXX factor out common code with r600_shader_from_tgsi ? */
@@ -1961,7 +1963,7 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
}
 
if (ctx.type == TGSI_PROCESSOR_GEOMETRY) {
-   if ((r = generate_gs_copy_shader(rctx, pipeshader)))
+   if ((r = generate_gs_copy_shader(rctx, pipeshader, so)))
return r;
}
 
diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index 3221a8e..6cc6ef0 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -826,6 +826,10 @@ static void r600_bind_gs_state(struct pipe_context *ctx, void *state)
struct r600_context *rctx = (struct r600_context *)ctx;
 
rctx->gs_shader = (struct r600_pipe_shader_selector *)state;
+
+   if (!state)
+   return;
+   rctx->b.streamout.stride_in_dw = rctx->gs_shader->so.stride;
 }
 
 static void r600_delete_shader_selector(struct pipe_context *ctx,
-- 
1.8.3.1



[Mesa-dev] [PATCH 02/30] r600g: move barrier and end_of_program bits from output to cf struct (v2)

2014-02-03 Thread Dave Airlie
From: Vadim Girlin vadimgir...@gmail.com

v2: fix regression on r600 NOP instructions.

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
Signed-off-by: Dave Airlie airl...@redhat.com

Fix regression since eop moving
---
 src/gallium/drivers/r600/eg_asm.c  | 10 ++
 src/gallium/drivers/r600/r600_asm.c| 24 +---
 src/gallium/drivers/r600/r600_asm.h|  4 ++--
 src/gallium/drivers/r600/r600_shader.c | 26 +-
 4 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/src/gallium/drivers/r600/eg_asm.c b/src/gallium/drivers/r600/eg_asm.c
index fffc436..42e78c0 100644
--- a/src/gallium/drivers/r600/eg_asm.c
+++ b/src/gallium/drivers/r600/eg_asm.c
@@ -86,11 +86,11 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode_cf *cf)
S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_Y(cf->output.swizzle_y) |
S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_Z(cf->output.swizzle_z) |
S_SQ_CF_ALLOC_EXPORT_WORD1_SWIZ_SEL_W(cf->output.swizzle_w) |
-   S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->output.barrier) |
+   S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->barrier) |
S_SQ_CF_ALLOC_EXPORT_WORD1_CF_INST(opcode);
 
if (bc->chip_class == EVERGREEN) /* no EOP on cayman */
-   bc->bytecode[id] |= S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->output.end_of_program);
+   bc->bytecode[id] |= S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->end_of_program);
id++;
} else if (cfop->flags & CF_STRM) {
/* MEM_STREAM instructions */
@@ -99,12 +99,12 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode_cf *cf)
S_SQ_CF_ALLOC_EXPORT_WORD0_ARRAY_BASE(cf->output.array_base) |
S_SQ_CF_ALLOC_EXPORT_WORD0_TYPE(cf->output.type);
bc->bytecode[id] = S_SQ_CF_ALLOC_EXPORT_WORD1_BURST_COUNT(cf->output.burst_count - 1) |
-   S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->output.barrier) |
+   S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->barrier) |
S_SQ_CF_ALLOC_EXPORT_WORD1_CF_INST(opcode) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_COMP_MASK(cf->output.comp_mask) |
S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_ARRAY_SIZE(cf->output.array_size);
if (bc->chip_class == EVERGREEN) /* no EOP on cayman */
-   bc->bytecode[id] |= S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->output.end_of_program);
+   bc->bytecode[id] |= S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->end_of_program);
id++;
} else {
/* branch, loop, call, return instructions */
@@ -118,6 +118,7 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode_cf *cf)
return 0;
 }
 
+#if 0
 void eg_bytecode_export_read(struct r600_bytecode *bc,
struct r600_bytecode_output *output, uint32_t word0, uint32_t word1)
 {
@@ -138,3 +139,4 @@ void eg_bytecode_export_read(struct r600_bytecode *bc,
output->array_size = G_SQ_CF_ALLOC_EXPORT_WORD1_BUF_ARRAY_SIZE(word1);
output->comp_mask = G_SQ_CF_ALLOC_EXPORT_WORD1_BUF_COMP_MASK(word1);
 }
+#endif
diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index c5922a8..e062fcb 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -193,7 +193,6 @@ int r600_bytecode_add_output(struct r600_bytecode *bc,
if ((output->gpr + output->burst_count) == bc->cf_last->output.gpr &&
(output->array_base + output->burst_count) == bc->cf_last->output.array_base) {
 
-   bc->cf_last->output.end_of_program |= output->end_of_program;
bc->cf_last->op = bc->cf_last->output.op = output->op;
bc->cf_last->output.gpr = output->gpr;
bc->cf_last->output.array_base = output->array_base;
@@ -203,7 +202,6 @@ int r600_bytecode_add_output(struct r600_bytecode *bc,
} else if (output->gpr == (bc->cf_last->output.gpr + bc->cf_last->output.burst_count) &&
output->array_base == (bc->cf_last->output.array_base + bc->cf_last->output.burst_count)) {
 
-   bc->cf_last->output.end_of_program |= output->end_of_program;
bc->cf_last->op = bc->cf_last->output.op = output->op;
bc->cf_last->output.burst_count += output->burst_count;
return 0;

[Mesa-dev] [PATCH 28/30] r600g/sb: fix assert in SB code

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

---
 src/gallium/drivers/r600/sb/sb_shader.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp b/src/gallium/drivers/r600/sb/sb_shader.cpp
index 9f49ac8..f996c07 100644
--- a/src/gallium/drivers/r600/sb/sb_shader.cpp
+++ b/src/gallium/drivers/r600/sb/sb_shader.cpp
@@ -215,7 +215,7 @@ void shader::init() {
 void shader::init_call_fs(cf_node* cf) {
unsigned gpr = 0;
 
-   assert(target == TARGET_VS);
+   assert(target == TARGET_VS || target == TARGET_ES);
 
for(inputs_vec::const_iterator I = inputs.begin(),
E = inputs.end(); I != E; ++I, ++gpr) {
-- 
1.8.3.1



[Mesa-dev] [PATCH 20/30] r600g: calculate correct cut value

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

This selects the cut value depending on the shader selected.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/evergreen_state.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
index 2828a75..e23f434 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -2728,12 +2728,22 @@ static void evergreen_emit_shader_stages(struct r600_context *rctx, struct r600_
uint32_t v = 0, v2 = 0, primid = 0;
 
if (state->geom_enable) {
+   uint32_t cut_val;
+
+   if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 128)
+   cut_val = V_028A40_GS_CUT_128;
+   else if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 256)
+   cut_val = V_028A40_GS_CUT_256;
+   else if (rctx->gs_shader->current->shader.gs_max_out_vertices <= 512)
+   cut_val = V_028A40_GS_CUT_512;
+   else
+   cut_val = V_028A40_GS_CUT_1024;
v = S_028B54_ES_EN(V_028B54_ES_STAGE_REAL) |
S_028B54_GS_EN(1) |
S_028B54_VS_EN(V_028B54_VS_STAGE_COPY_SHADER);
 
v2 = S_028A40_MODE(V_028A40_GS_SCENARIO_G) |
-   S_028A40_CUT_MODE(V_028A40_GS_CUT_128);
+   S_028A40_CUT_MODE(cut_val);
 
if (rctx->gs_shader->current->shader.gs_prim_id_input)
-- 
1.8.3.1
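
The selection in this patch can be looked at in isolation. The following is a minimal sketch, not the driver code: `enum gs_cut_mode` and `select_gs_cut_mode()` are hypothetical names, and the enum values are placeholders rather than the real `V_028A40_GS_CUT_*` register encodings. It only mirrors the threshold logic of the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the V_028A40_GS_CUT_* values. The numbers
 * assigned here are placeholders, not the hardware encodings. */
enum gs_cut_mode {
    GS_CUT_128,
    GS_CUT_256,
    GS_CUT_512,
    GS_CUT_1024
};

/* Pick the smallest cut mode whose limit covers max_out_vertices. */
enum gs_cut_mode select_gs_cut_mode(uint32_t max_out_vertices)
{
    enum gs_cut_mode mode;

    if (max_out_vertices <= 128)
        mode = GS_CUT_128;
    else if (max_out_vertices <= 256)
        mode = GS_CUT_256;
    else if (max_out_vertices <= 512)
        mode = GS_CUT_512;
    else
        mode = GS_CUT_1024;

    /* Postcondition: the result is one of the four known modes. */
    assert(mode >= GS_CUT_128 && mode <= GS_CUT_1024);
    return mode;
}
```

Anything above 512 falls through to the largest mode, which matches the final `else` in the patch.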



[Mesa-dev] [PATCH 23/30] r600g: port the layered surface rendering patch from radeonsi

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

This just makes r600 and evergreen do what the radeonsi codepaths do
for layered rendering. This makes the 2d amd_vertex_shader_layer test
pass on evergreen.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/evergreen_state.c   | 18 +-
 src/gallium/drivers/r600/r600_state.c| 21 ++---
 src/gallium/drivers/r600/r600_state_common.c |  1 -
 3 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
index c784f2f..c86e812 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1407,7 +1407,7 @@ void evergreen_init_color_surface(struct r600_context *rctx,
struct pipe_resource *pipe_tex = surf->base.texture;
unsigned level = surf->base.u.tex.level;
unsigned pitch, slice;
-   unsigned color_info, color_attrib, color_dim = 0;
+   unsigned color_info, color_attrib, color_dim = 0, color_view;
unsigned format, swap, ntype, endian;
uint64_t offset, base_offset;
unsigned non_disp_tiling, macro_aspect, tile_split, bankh, bankw, fmask_bankh, nbanks;
@@ -1416,10 +1416,15 @@ void evergreen_init_color_surface(struct r600_context *rctx,
bool blend_clamp = 0, blend_bypass = 0;
 
offset = rtex->surface.level[level].offset;
-   if (rtex->surface.level[level].mode < RADEON_SURF_MODE_1D) {
+   if (rtex->surface.level[level].mode == RADEON_SURF_MODE_LINEAR) {
+   assert(surf->base.u.tex.first_layer == surf->base.u.tex.last_layer);
offset += rtex->surface.level[level].slice_size *
  surf->base.u.tex.first_layer;
-   }
+   color_view = 0;
+   } else
+   color_view = S_028C6C_SLICE_START(surf->base.u.tex.first_layer) |
+S_028C6C_SLICE_MAX(surf->base.u.tex.last_layer);
+
pitch = (rtex->surface.level[level].nblk_x) / 8 - 1;
slice = (rtex->surface.level[level].nblk_x * rtex->surface.level[level].nblk_y) / 64;
if (slice) {
@@ -1569,12 +1574,7 @@ void evergreen_init_color_surface(struct r600_context *rctx,
surf->cb_color_info = color_info;
surf->cb_color_pitch = S_028C64_PITCH_TILE_MAX(pitch);
surf->cb_color_slice = S_028C68_SLICE_TILE_MAX(slice);
-   if (rtex->surface.level[level].mode < RADEON_SURF_MODE_1D) {
-   surf->cb_color_view = 0;
-   } else {
-   surf->cb_color_view = S_028C6C_SLICE_START(surf->base.u.tex.first_layer) |
- S_028C6C_SLICE_MAX(surf->base.u.tex.last_layer);
-   }
+   surf->cb_color_view = color_view;
surf->cb_color_attrib = color_attrib;
if (rtex->fmask.size) {
surf->cb_color_fmask = (base_offset + rtex->fmask.offset) >> 8;
diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
index 98e6924..e0c801e 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -1264,6 +1264,7 @@ static void r600_init_color_surface(struct r600_context *rctx,
unsigned level = surf->base.u.tex.level;
unsigned pitch, slice;
unsigned color_info;
+   unsigned color_view;
unsigned format, swap, ntype, endian;
unsigned offset;
const struct util_format_description *desc;
@@ -1277,10 +1278,15 @@ static void r600_init_color_surface(struct r600_context *rctx,
}
 
offset = rtex->surface.level[level].offset;
-   if (rtex->surface.level[level].mode < RADEON_SURF_MODE_1D) {
+   if (rtex->surface.level[level].mode == RADEON_SURF_MODE_LINEAR) {
+   assert(surf->base.u.tex.first_layer == surf->base.u.tex.last_layer);
offset += rtex->surface.level[level].slice_size *
- surf->base.u.tex.first_layer;
-   }
+   surf->base.u.tex.first_layer;
+   color_view = 0;
+   } else
+   color_view = S_028080_SLICE_START(surf->base.u.tex.first_layer) |
+S_028080_SLICE_MAX(surf->base.u.tex.last_layer);
+
pitch = rtex->surface.level[level].nblk_x / 8 - 1;
slice = (rtex->surface.level[level].nblk_x * rtex->surface.level[level].nblk_y) / 64;
if (slice) {
@@ -1466,14 +1472,7 @@ static void r600_init_color_surface(struct r600_context *rctx,
}
 
surf->cb_color_info = color_info;
-
-   if (rtex->surface.level[level].mode < RADEON_SURF_MODE_1D) {
-   surf->cb_color_view = 0;
-   } else {
-   surf->cb_color_view = S_028080_SLICE_START(surf->base.u.tex.first_layer) |
- S_028080_SLICE_MAX(surf->base.u.tex.last_layer);
-   }
-
+   surf->cb_color_view = color_view;
surf->color_initialized = true;
 }
 
diff --git 

[Mesa-dev] [PATCH 25/30] r600g: bump max combined samplers

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

This just bumps the number to the 3.2 minimum to represent geometry
shader samplers.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_pipe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index e5fdac0..2fcef28 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -416,7 +416,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
return rscreen->b.info.drm_minor >= 9 ?
(family >= CHIP_CEDAR ? 16384 : 8192) : 0;
case PIPE_CAP_MAX_COMBINED_SAMPLERS:
-   return 32;
+   return 48;
 
/* Render targets. */
case PIPE_CAP_MAX_RENDER_TARGETS:
-- 
1.8.3.1
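
The jump from 32 to 48 is consistent with one more shader stage getting its own block of sampler slots. A minimal sketch of that arithmetic, assuming 16 samplers per stage (an assumption inferred from 32 covering vertex plus fragment); `SAMPLERS_PER_STAGE` and `max_combined_samplers()` are hypothetical names, not Mesa identifiers:

```c
#include <assert.h>

/* Assumed per-stage sampler count (not taken from the patch itself). */
#define SAMPLERS_PER_STAGE 16

/* The old value of 32 corresponds to two stages under this assumption. */
static_assert(SAMPLERS_PER_STAGE * 2 == 32, "two stages give the old limit");

/* Combined limit when every enabled stage has its own sampler slots. */
unsigned max_combined_samplers(unsigned num_stages)
{
    return num_stages * SAMPLERS_PER_STAGE;
}
```

With vertex, geometry and fragment stages enabled, `max_combined_samplers(3)` gives the 48 returned by the patch.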



[Mesa-dev] [PATCH 26/30] r600g: fix CAYMAN geometry shader support

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

Cayman has a different end-of-program bit, so handle that properly.

Fixes hangs with geometry shader tests on Cayman.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 3535a13..8fa7054 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1357,8 +1357,12 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
cf_pop->cf_addr = cf_pop->id + 2;
cf_pop->pop_count = 1;
 
-   r600_bytecode_add_cfinst(ctx.bc, CF_OP_NOP);
-   ctx.bc->cf_last->end_of_program = 1;
+   if (ctx.bc->chip_class == CAYMAN)
+   cm_bytecode_add_cf_end(ctx.bc);
+   else {
+   r600_bytecode_add_cfinst(ctx.bc, CF_OP_NOP);
+   ctx.bc->cf_last->end_of_program = 1;
+   }
 
gs->gs_copy_shader = cshader;
 
-- 
1.8.3.1



[Mesa-dev] [PATCH 30/30] r600g: add support for geom shaders to r600/r700 chipsets

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

This is my first attempt at enabling r600/r700 geometry shaders.
The basic tests pass on both my rv770 and my rv635.

It requires this kernel patch:
http://www.spinics.net/lists/dri-devel/msg52745.html

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_asm.c  |   2 +-
 src/gallium/drivers/r600/r600_pipe.c |  16 +-
 src/gallium/drivers/r600/r600_pipe.h |   3 +
 src/gallium/drivers/r600/r600_shader.c   |  26 ++-
 src/gallium/drivers/r600/r600_state.c| 264 +++
 src/gallium/drivers/r600/r600_state_common.c |   2 +-
 src/gallium/drivers/r600/r600d.h |  50 -
 7 files changed, 314 insertions(+), 49 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index 899a8ef..3afe7b2 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -1535,7 +1535,7 @@ static int r600_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode
S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->barrier) |
S_SQ_CF_ALLOC_EXPORT_WORD1_CF_INST(opcode) |
S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->end_of_program);
-   } else if (cfop->flags & CF_STRM) {
+   } else if (cfop->flags & CF_MEM) {
bc->bytecode[id++] = S_SQ_CF_ALLOC_EXPORT_WORD0_RW_GPR(cf->output.gpr) |
S_SQ_CF_ALLOC_EXPORT_WORD0_ELEM_SIZE(cf->output.elem_size) |
S_SQ_CF_ALLOC_EXPORT_WORD0_ARRAY_BASE(cf->output.array_base) |
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 7ed70e1..edf37ad 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -372,7 +372,12 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
return 1;
 
case PIPE_CAP_GLSL_FEATURE_LEVEL:
-   return family >= CHIP_CEDAR ? 330 : 140;
+   if (family >= CHIP_CEDAR)
+      return 330;
+   /* pre-evergreen geom shaders need newer kernel */
+   if (rscreen->b.info.drm_minor >= 37)
+      return 330;
+   return 140;
 
/* Supported except the original R600. */
case PIPE_CAP_INDEP_BLEND_ENABLE:
@@ -458,9 +463,12 @@ static int r600_get_shader_param(struct pipe_screen* pscreen, unsigned shader, e
case PIPE_SHADER_COMPUTE:
break;
case PIPE_SHADER_GEOMETRY:
-   if (rscreen->b.chip_class < EVERGREEN)
-   return 0;
-   break;
+   if (rscreen->b.family >= CHIP_CEDAR)
+   break;
+   /* pre-evergreen geom shaders need newer kernel */
+   if (rscreen->b.info.drm_minor >= 37)
+   break;
+   return 0;
default:
/* XXX: support tessellation on Evergreen */
return 0;
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 2d2c79b..9f27a17 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -160,6 +160,7 @@ struct r600_sample_mask {
 struct r600_config_state {
struct r600_atom atom;
unsigned sq_gpr_resource_mgmt_1;
+   unsigned sq_gpr_resource_mgmt_2;
 };
 
 struct r600_stencil_ref
@@ -565,6 +566,8 @@ r600_create_sampler_view_custom(struct pipe_context *ctx,
 void r600_init_state_functions(struct r600_context *rctx);
 void r600_init_atom_start_cs(struct r600_context *rctx);
 void r600_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
+void r600_update_es_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
+void r600_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
 void r600_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
 void *r600_create_db_flush_dsa(struct r600_context *rctx);
 void *r600_create_resolve_blend(struct r600_context *rctx);
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index f0e980b..ffe4749 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -210,7 +210,8 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
evergreen_update_gs_state(ctx, shader);
evergreen_update_vs_state(ctx, shader->gs_copy_shader);
} else {
-   assert(!"not suported yet");
+   r600_update_gs_state(ctx, shader);
+   r600_update_vs_state(ctx, shader->gs_copy_shader);
}
break;
case TGSI_PROCESSOR_VERTEX:
@@ -220,7 +221,10 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
else
  

[Mesa-dev] [PATCH 29/30] r600g: enable GLSL 3.30 on evergreen GPUs

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

This throws the switch to enable GL 3.3 and GLSL 330.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_pipe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 2fcef28..7ed70e1 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -372,7 +372,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
return 1;
 
case PIPE_CAP_GLSL_FEATURE_LEVEL:
-   return 140;
+   return family >= CHIP_CEDAR ? 330 : 140;
 
/* Supported except the original R600. */
case PIPE_CAP_INDEP_BLEND_ENABLE:
-- 
1.8.3.1
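
The feature-level check introduced here, combined with the kernel-version gate that patch 30/30 adds for pre-evergreen chips, can be sketched as a standalone function. This is a simplified illustration, not the driver code: `glsl_feature_level()` is a hypothetical name, `is_evergreen_or_newer` stands in for `family >= CHIP_CEDAR`, and `drm_minor` stands in for `rscreen->b.info.drm_minor`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplification of the PIPE_CAP_GLSL_FEATURE_LEVEL case. */
int glsl_feature_level(bool is_evergreen_or_newer, int drm_minor)
{
    assert(drm_minor >= 0);

    /* Evergreen and newer expose GLSL 3.30 (patch 29/30). */
    if (is_evergreen_or_newer)
        return 330;

    /* Pre-evergreen geometry shaders need a newer kernel (patch 30/30). */
    if (drm_minor >= 37)
        return 330;

    return 140;
}
```

With only patch 29/30 applied, the second branch does not exist and pre-evergreen chips always report 140.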



[Mesa-dev] [PATCH 24/30] r600g: fix up shader out misc stuff for copy shader

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

Set the correct values so the misc out register is set up correctly
for the copy shader.

This also updates the state for the gs copy shader so the hw
gets programmed correctly.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c   | 10 +-
 src/gallium/drivers/r600/r600_state_common.c |  7 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 0a9f55a..3535a13 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1162,7 +1162,7 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
struct r600_bytecode_output output;
struct r600_bytecode_cf *cf_jump, *cf_pop,
*last_exp_pos = NULL, *last_exp_param = NULL;
-   int i, next_clip_pos = 62, next_param = 0;
+   int i, next_clip_pos = 61, next_param = 0;
 
cshader = calloc(1, sizeof(struct r600_pipe_shader));
if (!cshader)
@@ -1263,18 +1263,26 @@ static int generate_gs_copy_shader(struct r600_context *rctx,
 
case TGSI_SEMANTIC_PSIZE:
output.array_base = 61;
+   if (next_clip_pos == 61)
+   next_clip_pos = 62;
output.type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_POS;
output.swizzle_y = 7;
output.swizzle_z = 7;
output.swizzle_w = 7;
+   ctx.shader->vs_out_misc_write = 1;
+   ctx.shader->vs_out_point_size = 1;
break;
case TGSI_SEMANTIC_LAYER:
output.array_base = 61;
+   if (next_clip_pos == 61)
+   next_clip_pos = 62;
output.type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_POS;
output.swizzle_x = 7;
output.swizzle_y = 7;
output.swizzle_z = 0;
output.swizzle_w = 7;
+   ctx.shader->vs_out_misc_write = 1;
+   ctx.shader->vs_out_layer = 1;
break;
case TGSI_SEMANTIC_CLIPDIST:
/* spi_sid is 0 for clipdistance outputs that were generated
diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index f237d26..ffa1eb0 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1171,6 +1171,13 @@ static bool r600_update_derived_state(struct r600_context *rctx)
if (unlikely(rctx->geometry_shader.shader != rctx->gs_shader->current)) {
update_shader_atom(ctx, &rctx->geometry_shader, rctx->gs_shader->current);
update_shader_atom(ctx, &rctx->vertex_shader, rctx->gs_shader->current->gs_copy_shader);
+   /* Update clip misc state. */
+   if (rctx->gs_shader->current->gs_copy_shader->pa_cl_vs_out_cntl != rctx->clip_misc_state.pa_cl_vs_out_cntl ||
+   rctx->gs_shader->current->gs_copy_shader->shader.clip_dist_write != rctx->clip_misc_state.clip_dist_write) {
+   rctx->clip_misc_state.pa_cl_vs_out_cntl = rctx->gs_shader->current->gs_copy_shader->pa_cl_vs_out_cntl;
+   rctx->clip_misc_state.clip_dist_write = rctx->gs_shader->current->gs_copy_shader->shader.clip_dist_write;
+   rctx->clip_misc_state.atom.dirty = true;
+   }
}
 
r600_shader_select(ctx, rctx->vs_shader, &vs_dirty);
-- 
1.8.3.1
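
The `next_clip_pos` handling in the first hunk can be sketched on its own. This is a minimal illustration, assuming that clip distance exports start at `next_clip_pos` (as the variable name suggests) and that export slot 61 is only occupied when the shader writes point size or layer. The function name `first_clip_export_slot()` and its boolean parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns the first export slot available for clip distances.
 * Slot 61 holds the misc vector (point size, layer) when either is
 * written, in which case clip distances move up to slot 62. */
int first_clip_export_slot(bool writes_psize, bool writes_layer)
{
    int next_clip_pos = 61;

    if (writes_psize || writes_layer) {
        if (next_clip_pos == 61)
            next_clip_pos = 62;
    }

    assert(next_clip_pos == 61 || next_clip_pos == 62);
    return next_clip_pos;
}
```

Before this patch the starting value was always 62, so slot 61 stayed unused whenever neither point size nor layer was written.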



[Mesa-dev] [PATCH 01/30] r600g: split streamout emit code into a separate function

2014-02-03 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

For geometry shaders we need to call this code from a second place.

Just move it out for now to keep future patches cleaner.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 213 +
 1 file changed, 110 insertions(+), 103 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 1ea4ae6..5fd445e 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -875,6 +875,114 @@ static int process_twoside_color_inputs(struct r600_shader_ctx *ctx)
return 0;
 }
 
+static int emit_streamout(struct r600_shader_ctx *ctx, struct pipe_stream_output_info *so)
+{
+   unsigned so_gpr[PIPE_MAX_SHADER_OUTPUTS];
+   int i, j, r;
+
+   /* Sanity checking. */
+   if (so->num_outputs > PIPE_MAX_SHADER_OUTPUTS) {
+   R600_ERR("Too many stream outputs: %d\n", so->num_outputs);
+   r = -EINVAL;
+   goto out_err;
+   }
+   for (i = 0; i < so->num_outputs; i++) {
+   if (so->output[i].output_buffer >= 4) {
+   R600_ERR("Exceeded the max number of stream output buffers, got: %d\n",
+so->output[i].output_buffer);
+   r = -EINVAL;
+   goto out_err;
+   }
+   }
+
+   /* Initialize locations where the outputs are stored. */
+   for (i = 0; i < so->num_outputs; i++) {
+   so_gpr[i] = ctx->shader->output[so->output[i].register_index].gpr;
+
+   /* Lower outputs with dst_offset < start_component.
+*
+* We can only output 4D vectors with a write mask, e.g. we can
+* only output the W component at offset 3, etc. If we want
+* to store Y, Z, or W at buffer offset 0, we need to use MOV
+* to move it to X and output X. */
+   if (so->output[i].dst_offset < so->output[i].start_component) {
+   unsigned tmp = r600_get_temp(ctx);
+
+   for (j = 0; j < so->output[i].num_components; j++) {
+   struct r600_bytecode_alu alu;
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP1_MOV;
+   alu.src[0].sel = so_gpr[i];
+   alu.src[0].chan = so->output[i].start_component + j;
+
+   alu.dst.sel = tmp;
+   alu.dst.chan = j;
+   alu.dst.write = 1;
+   if (j == so->output[i].num_components - 1)
+   alu.last = 1;
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+   }
+   so->output[i].start_component = 0;
+   so_gpr[i] = tmp;
+   }
+   }
+
+   /* Write outputs to buffers. */
+   for (i = 0; i < so->num_outputs; i++) {
+   struct r600_bytecode_output output;
+
+   memset(&output, 0, sizeof(struct r600_bytecode_output));
+   output.gpr = so_gpr[i];
+   output.elem_size = so->output[i].num_components;
+   output.array_base = so->output[i].dst_offset - so->output[i].start_component;
+   output.type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_WRITE;
+   output.burst_count = 1;
+   output.barrier = 1;
+   /* array_size is an upper limit for the burst_count
+* with MEM_STREAM instructions */
+   output.array_size = 0xFFF;
+   output.comp_mask = ((1 << so->output[i].num_components) - 1) << so->output[i].start_component;
+   if (ctx->bc->chip_class >= EVERGREEN) {
+   switch (so->output[i].output_buffer) {
+   case 0:
+   output.op = CF_OP_MEM_STREAM0_BUF0;
+   break;
+   case 1:
+   output.op = CF_OP_MEM_STREAM0_BUF1;
+   break;
+   case 2:
+   output.op = CF_OP_MEM_STREAM0_BUF2;
+   break;
+   case 3:
+   output.op = CF_OP_MEM_STREAM0_BUF3;
+   break;
+   }
+   } else {
+   switch (so->output[i].output_buffer) {
+   case 0:
+   output.op = CF_OP_MEM_STREAM0;
+   break;
+   case 1:
+   output.op = CF_OP_MEM_STREAM1;

Re: [Mesa-dev] [PATCH 1/2] gallium: remove PIPE_CAP_MAX_COMBINED_SAMPLERS

2014-02-03 Thread Marek Olšák
Ilia,

I see you worked on nv30. Are vertex shader texture slots shared with
fragment shader texture slots on nv40? If yes, does the nouveau team
plan to implement vertex shader textures for nv40?

Marek

On Wed, Jan 22, 2014 at 9:36 PM, Marek Olšák mar...@gmail.com wrote:
> On Mon, Jan 20, 2014 at 3:35 PM, Brian Paul bri...@vmware.com wrote:
>> I'm not sure about this.  The concept of MAX_COMBINED_TEXTURE_IMAGE_UNITS
>> goes back to the first shader hardware where texture image units might be
>> shared between the VS and FS.
>>
>> The question is whether any of the hardware we care about has that
>> restriction.
>
> I'm not aware of any hardware which has the restriction. I think
> the reason MAX_COMBINED_TEXTURE_IMAGE_UNITS exists is that the OpenGL
> texture units are shared by all shader stages.
>
> Marek


Re: [Mesa-dev] [PATCH 1/2] gallium: remove PIPE_CAP_MAX_COMBINED_SAMPLERS

2014-02-03 Thread Ilia Mirkin
[+ben, nv30 driver author]

On Mon, Feb 3, 2014 at 7:18 PM, Marek Olšák mar...@gmail.com wrote:
> Ilia,
>
> I see you worked on nv30. Are vertex shader texture slots shared with
> fragment shader texture slots on nv40? If yes, does the nouveau team
> plan to implement vertex shader textures for nv40?

Unfortunately my knowledge of these things is... limited. I haven't
really touched the nv30 shader stuff much, esp not relating to
textures. Ben, do you know what Marek is talking about? If not, I'll
try to investigate.

Marek -- perhaps you can give me an example of what you mean? Are you
talking about, e.g. texture(sampler2D, ...), and whether one can do
that from within the vertex shader and whether one can attach those
samplers/textures to vs/fs independently or not? (Sorry, my knowledge
of opengl and 3d hardware and esp terminology is still very poor. But
I'm learning...)


> Marek
>
> On Wed, Jan 22, 2014 at 9:36 PM, Marek Olšák mar...@gmail.com wrote:
>> On Mon, Jan 20, 2014 at 3:35 PM, Brian Paul bri...@vmware.com wrote:
>>> I'm not sure about this.  The concept of MAX_COMBINED_TEXTURE_IMAGE_UNITS
>>> goes back to the first shader hardware where texture image units might be
>>> shared between the VS and FS.
>>>
>>> The question is whether any of the hardware we care about has that
>>> restriction.
>>
>> I'm not aware of any hardware which has the restriction. I think
>> the reason MAX_COMBINED_TEXTURE_IMAGE_UNITS exists is that the OpenGL
>> texture units are shared by all shader stages.
>>
>> Marek


[Mesa-dev] [PATCH] glsl-compiler: ast: Precise locations positions.

2014-02-03 Thread Sir Anthony
I'm using the Mesa GLSL compiler as the backend for a GLSL debugger interface
that makes heavy use of locations for graphical representation. The locations
produced by the original parser are wrong in most cases, because it uses yylloc
for all bison rules. This patch includes:
1. Change the location setup in glsl_parser.yy from yylloc to the appropriate
token locations.
2. Add two fields to the ast_node location to hold the end position of a token.
3. Add an ast_node method to set up range locations (for aggregate tokens).
4. Fix glcpp-lex.l, which handled spaces incorrectly: it converted two adjacent
spaces into one, which added a location offset for shaders with indentation.
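
Items 2 and 3 above can be sketched roughly as follows. This is a minimal
illustration in plain C rather than the C++ of ast.h; the names src_loc and
loc_range are hypothetical stand-ins, not the identifiers used by the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the parser's location type. */
struct src_loc {
    unsigned source;        /* source string number */
    unsigned first_line;    /* where the token starts */
    unsigned first_column;
    unsigned last_line;     /* where the token ends */
    unsigned last_column;
};

/* Build the location of an aggregate node from its first and last tokens. */
struct src_loc loc_range(struct src_loc begin, struct src_loc end)
{
    struct src_loc r;

    /* Both tokens are expected to come from the same source string. */
    assert(begin.source == end.source);

    r.source = begin.source;
    r.first_line = begin.first_line;
    r.first_column = begin.first_column;
    r.last_line = end.last_line;
    r.last_column = end.last_column;
    return r;
}
```

A debugger front end could then highlight everything from (first_line,
first_column) to (last_line, last_column) instead of a single point.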

---
 src/glsl/ast.h  |  36 +--
 src/glsl/glcpp/glcpp-lex.l  |   5 +-
 src/glsl/glsl_lexer.ll  |   3 +-
 src/glsl/glsl_parser.yy | 215 +---
 src/glsl/glsl_parser_extras.cpp |   6 +-
 5 files changed, 150 insertions(+), 115 deletions(-)

-- 
1.8.3.2



[Mesa-dev] [PATCH] glsl-compiler: ast: Precise locations positions.

2014-02-03 Thread Sir Anthony
1. Change the location setup in glsl_parser.yy from yylloc to the appropriate
token locations.
2. Add two fields to the ast_node location to hold the end position of a token.
3. Add an ast_node method to set up range locations (for aggregate tokens).
4. Fix glcpp-lex.l, which handled spaces incorrectly: it converted two adjacent
spaces into one, which added a location offset for shaders with indentation.
---
 src/glsl/ast.h  |  36 +--
 src/glsl/glcpp/glcpp-lex.l  |   5 +-
 src/glsl/glsl_lexer.ll  |   3 +-
 src/glsl/glsl_parser.yy | 215 +---
 src/glsl/glsl_parser_extras.cpp |   6 +-
 5 files changed, 150 insertions(+), 115 deletions(-)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index 0bda28d..9b5bc47 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -75,10 +75,10 @@ public:
   struct YYLTYPE locp;
 
   locp.source = this->location.source;
-  locp.first_line = this->location.line;
-  locp.first_column = this->location.column;
-  locp.last_line = locp.first_line;
-  locp.last_column = locp.first_column;
+  locp.first_line = this->location.first_line;
+  locp.first_column = this->location.first_column;
+  locp.last_line = this->location.last_line;
+  locp.last_column = this->location.last_column;
 
   return locp;
}
@@ -91,17 +91,35 @@ public:
void set_location(const struct YYLTYPE locp)
{
   this->location.source = locp.source;
-  this->location.line = locp.first_line;
-  this->location.column = locp.first_column;
+  this->location.first_line = locp.first_line;
+  this->location.first_column = locp.first_column;
+  this->location.last_line = locp.last_line;
+  this->location.last_column = locp.last_column;
+   }
+
+   /**
+* Set the source location range of an AST node using two location nodes
+*
+* \sa ast_node::set_location
+*/
+   void set_location_range(const struct YYLTYPE begin, const struct YYLTYPE end)
+   {
+  this->location.source = begin.source;
+  this->location.first_line = begin.first_line;
+  this->location.last_line = end.last_line;
+  this->location.first_column = begin.first_column;
+  this->location.last_column = end.last_column;
}
 
/**
 * Source location of the AST node.
 */
struct {
-  unsigned source;    /**< GLSL source number. */
-  unsigned line;      /**< Line number within the source string. */
-  unsigned column;    /**< Column in the line. */
+  unsigned source;          /**< GLSL source number. */
+  unsigned first_line;      /**< Line number within the source string. */
+  unsigned first_column;    /**< Column in the line. */
+  unsigned last_line;       /**< Line number within the source string. */
+  unsigned last_column;     /**< Column in the line. */
} location;
 
exec_node link;
diff --git a/src/glsl/glcpp/glcpp-lex.l b/src/glsl/glcpp/glcpp-lex.l
index ea3b862..188e454 100644
--- a/src/glsl/glcpp/glcpp-lex.l
+++ b/src/glsl/glcpp/glcpp-lex.l
@@ -47,8 +47,9 @@ void glcpp_set_column (int  column_no , yyscan_t yyscanner);
if (parser->has_new_source_number)  \
yylloc->source = parser->new_source_number; \
yylloc->first_column = yycolumn + 1;\
-   yylloc->first_line = yylineno;  \
+   yylloc->first_line = yylloc->last_line = yylineno;  \
yycolumn += yyleng; \
+   yylloc->last_column = yycolumn + 1; \
parser->has_new_line_number = 0;\
parser->has_new_source_number = 0;  \
  } while(0);
@@ -337,7 +338,7 @@ HEXADECIMAL_INTEGER 0[xX][0-9a-fA-F]+[uU]?
return OTHER;
 }
 
-{HSPACE}+ {
+{HSPACE} {
if (yyextra->space_tokens) {
return SPACE;
}
diff --git a/src/glsl/glsl_lexer.ll b/src/glsl/glsl_lexer.ll
index 50875bf..e7766a8 100644
--- a/src/glsl/glsl_lexer.ll
+++ b/src/glsl/glsl_lexer.ll
@@ -38,8 +38,9 @@ static int classify_identifier(struct _mesa_glsl_parse_state 
*, const char *);
do {\
   yylloc->source = 0;  \
   yylloc->first_column = yycolumn + 1; \
-  yylloc->first_line = yylineno + 1;   \
+  yylloc->first_line = yylloc->last_line = yylineno + 1;   \
   yycolumn += yyleng;  \
+  yylloc->last_column = yycolumn + 1;  \
} while(0);
 
 #define YY_USER_INIT yylineno = 0; yycolumn = 0;
diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
index 928c57e..c6a585f 100644
--- a/src/glsl/glsl_parser.yy
+++ b/src/glsl/glsl_parser.yy
@@ -386,35 +386,35 @@ primary_expression:
{
   void *ctx = state;

Re: [Mesa-dev] [PATCH] r600g: Prevent SIGFPE when using r600_dma_copy_tile with large textures

2014-02-03 Thread Marek Olšák
Unless Jerome has a better idea, your best bet would be to just return
FALSE from that function.

For Jerome:

The code crashes on a division by 0 (cheight is 0). The problem occurs
with the texture format R32G32B32A32 and width = 4096. Do you have
any idea what is wrong with it?
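
A rough worked example of how cheight can end up as 0, as a sketch only. It
assumes the pitch is in bytes, that the size limit is 0xffff dwords and that
the alignment mask is 0xfffffff8 (the constants are garbled in the archived
patch, so these are assumptions). R32G32B32A32 is 16 bytes per texel, so a
width of 4096 gives a pitch of 65536 bytes; (0xffff << 2) / 65536 is 3, and
rounding 3 down to a multiple of 8 gives 0. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed size limit in dwords; not confirmed by the archived patch. */
#define DMA_MAX_DWORDS 0x0000ffffu

/* Largest multiple-of-8 line count that fits in one copy, or 0 if none fits. */
uint32_t max_copy_lines(uint32_t pitch_bytes)
{
    assert(pitch_bytes != 0);
    return ((DMA_MAX_DWORDS << 2) / pitch_bytes) & 0xfffffff8u;
}

/* Returns false when the DMA path cannot be used and a fallback is needed. */
bool plan_dma_copy(uint32_t pitch_bytes, uint32_t copy_height, uint32_t *ncopy)
{
    uint32_t cheight = max_copy_lines(pitch_bytes);

    if (cheight == 0)
        return false;   /* bail out instead of dividing by zero below */

    /* Number of copies, rounding up. */
    *ncopy = copy_height / cheight + !!(copy_height % cheight);
    return true;
}
```

With these assumptions, plan_dma_copy(4096 * 16, ...) returns false, which
corresponds to the "just return FALSE" suggestion above.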

Thank you.

Marek

On Mon, Feb 3, 2014 at 6:40 AM, Ahmed Allam ahmabd...@hotmail.com wrote:
 Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=73781

 Signed-off-by: Ahmed Allam ahmabd...@hotmail.com
 ---
  src/gallium/drivers/r600/r600_state.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/drivers/r600/r600_state.c 
 b/src/gallium/drivers/r600/r600_state.c
 index a0d527b..8690849 100644
 --- a/src/gallium/drivers/r600/r600_state.c
 +++ b/src/gallium/drivers/r600/r600_state.c
 @@ -3095,7 +3095,8 @@ static boolean r600_dma_copy_tile(struct r600_context 
 *rctx,
 /* It's a r6xx/r7xx limitation, the blit must be on 8 boundary for 
 number
  * line in the blit. Compute max 8 line we can copy in the size limit
  */
 -   cheight = ((0x  2) / pitch)  0xfff8;
 +   cheight = ((0x  2) / pitch)  0x0008 ?
 +   ((0x  2) / pitch)  0xfff8 : ((0x  2) 
 / pitch);
 ncopy = (copy_height / cheight) + !!(copy_height % cheight);
 r600_need_dma_space(rctx, ncopy * 7);

 --
 1.9.rc1



Re: [Mesa-dev] [PATCH 1/2] gallium: remove PIPE_CAP_MAX_COMBINED_SAMPLERS

2014-02-03 Thread Marek Olšák
nv40 supports texturing in the vertex shader, but I don't know whether the
number of textures used by the vertex shader limits the maximum
number of textures which can be used by the fragment shader. For
example, if the vertex shader uses 4 textures, the fragment shader
could only use 12 instead of 16. I'm asking if nv40 has such a
limitation.
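
To make the question concrete, here is a small sketch of the two possible
models. The function name and the "shared" flag are hypothetical; nothing here
reflects how nv40 actually behaves.

```c
#include <assert.h>
#include <stdbool.h>

/* How many texture slots remain for the fragment shader under each model. */
unsigned fragment_slots_left(unsigned fs_slots, unsigned vs_used, bool shared)
{
    if (!shared)
        return fs_slots;        /* separate banks: VS usage is irrelevant */

    assert(vs_used <= fs_slots);
    return fs_slots - vs_used;  /* shared pool: VS textures eat into it */
}
```

With a shared pool, fragment_slots_left(16, 4, true) is 12; with separate
banks, fragment_slots_left(16, 4, false) stays at 16.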

Thank you.

Marek

On Tue, Feb 4, 2014 at 1:37 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 [+ben, nv30 driver author]

 On Mon, Feb 3, 2014 at 7:18 PM, Marek Olšák mar...@gmail.com wrote:
 Ilia,

 I see you worked on nv30. Are vertex shader texture slots shared with
 fragment shader texture slots on nv40? If yes, does the nouveau team
 plan to implement vertex shader textures for nv40?

 Unfortunately my knowledge of these things is... limited. I haven't
 really touched the nv30 shader stuff much, esp not relating to
 textures. Ben, do you know what Marek is talking about? If not, I'll
 try to investigate.

 Marek -- perhaps you can give me an example of what you mean? Are you
 talking about, e.g. texture(sampler2D, ...), and whether one can do
 that from within the vertex shader and whether one can attach those
 samplers/textures to vs/fs independently or not? (Sorry, my knowledge
 of opengl and 3d hardware and esp terminology is still very poor. But
 I'm learning...)


 Marek

 On Wed, Jan 22, 2014 at 9:36 PM, Marek Olšák mar...@gmail.com wrote:
 On Mon, Jan 20, 2014 at 3:35 PM, Brian Paul bri...@vmware.com wrote:
 I'm not sure about this.  The concept of MAX_COMBINED_TEXTURE_IMAGE_UNITS
 goes back to the first shader hardware where texture image units might be
 shared between the VS and FS.

 The question is whether any of the hardware we care about has that
 restriction.

 I'm not aware of any hardware which has the restriction. I think
 the reason MAX_COMBINED_TEXTURE_IMAGE_UNITS exists is that the OpenGL
 texture units are shared by all shader stages.

 Marek


Re: [Mesa-dev] [PATCH 1/2] gallium: remove PIPE_CAP_MAX_COMBINED_SAMPLERS

2014-02-03 Thread Ilia Mirkin
Hmmm... unclear from a quick read of the code. nv40_verttex.c looks at
nv30->fragprog.textures/samplers (instead of nv30->vertprog) to
disable VTXTEX_ENABLE, but doesn't do anything with them if they're
set. Could just be a copy-pasta situation, or could be that TEX_ENABLE
and VTXTEX_ENABLE can't be set for the same index... The
enables/formats/etc are all in separate registers, but there's nothing
saying that you can use both at the same time. I'll see if there's any
docs on it when I get home.

Separately, the chances of major new features being implemented in
nv30 at this point are relatively low. I'm probably the only person
looking at it, and I'm just trying to make the current functionality
work (or at least be not as wrong). There's still quite some work
ahead in that regard --
http://people.freedesktop.org/~imirkin/nv40-comparison/problems.html

On Mon, Feb 3, 2014 at 7:58 PM, Marek Olšák mar...@gmail.com wrote:
 nv40 supports texturing in the vertex shader, but I don't know whether the
 number of textures used by the vertex shader limits the maximum
 number of textures which can be used by the fragment shader. For
 example, if the vertex shader uses 4 textures, the fragment shader
 could only use 12 instead of 16. I'm asking if nv40 has such a
 limitation.

 Thank you.

 Marek



Re: [Mesa-dev] [PATCH] radeon/uvd: fix feedback buffer handling

2014-02-03 Thread Michel Dänzer
On Mon, 2014-02-03 at 11:33 +0100, Christian König wrote:
 From: Christian König christian.koe...@amd.com
 
 Without the correct feedback buffer size UVD runs
 into an error on each frame, reducing the maximum FPS.
 
 Signed-off-by: Christian König christian.koe...@amd.com

Some minor comments below, other than that

Reviewed-by: Michel Dänzer michel.daen...@amd.com


 diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
 b/src/gallium/drivers/radeon/radeon_uvd.c
 index 95757e3..6ac2199 100644
 --- a/src/gallium/drivers/radeon/radeon_uvd.c
 +++ b/src/gallium/drivers/radeon/radeon_uvd.c
 @@ -132,15 +136,20 @@ static void send_cmd(struct ruvd_decoder *dec, unsigned 
 cmd,
  }
  
  /* map the next available message buffer */
 -static void map_msg_buf(struct ruvd_decoder *dec)
 +static void map_msg_fb_buf(struct ruvd_decoder *dec)

Maybe the function comment should be updated as well, e.g. to:

/* map the next available message/feedback buffer */


 + void *ptr;
  
 - /* grap the current message buffer */
 + /* grap the current message/feedback buffer */

While you're at it, why not fix the spelling: 'grab'


 - /* copy the message into it */
 - dec-msg = dec-ws-buffer_map(buf-cs_handle, dec-cs, 
 PIPE_TRANSFER_WRITE);
 + /* and map it for CPU access */
 + ptr = dec-ws-buffer_map(buf-cs_handle, dec-cs, PIPE_TRANSFER_WRITE);
 +
 + /* calc buffer offsets */
 + dec-msg = ptr;
 + dec-fb = ptr + FB_BUFFER_OFFSET;
  }

This is pointer arithmetic on a void* pointer, which is not defined by
the C standard and not supported by all compilers. Maybe make ptr a
char* instead, or just cast it to that for the assignment.
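
A minimal sketch of the suggested fix, assuming an FB_BUFFER_OFFSET of 0x1000
(the real value is not shown in the quoted hunk); the struct and function
names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define FB_BUFFER_OFFSET 0x1000   /* assumed value, for illustration only */

/* Hypothetical holder for the two pointers derived from one mapping. */
struct msg_fb {
    void *msg;
    void *fb;
};

struct msg_fb split_mapping(void *mapped)
{
    struct msg_fb out;
    char *ptr = mapped;           /* char* makes the arithmetic well defined */

    assert(mapped != NULL);
    out.msg = ptr;
    out.fb = ptr + FB_BUFFER_OFFSET;
    return out;
}
```

Arithmetic on char* advances in bytes, which is what the void* arithmetic
relies on as a compiler extension.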


 @@ -898,7 +913,8 @@ struct pipe_video_codec *ruvd_create_decoder(struct 
 pipe_context *context,
  
   bs_buf_size = width * height * 512 / (16 * 16);
   for (i = 0; i  NUM_BUFFERS; ++i) {
 - unsigned msg_fb_size = align(sizeof(struct ruvd_msg), 0x1000) + 
 0x1000;
 + unsigned msg_fb_size = FB_BUFFER_OFFSET + FB_BUFFER_SIZE;
 + assert(sizeof(struct ruvd_msg) = FB_BUFFER_OFFSET);

This looks like it could be a STATIC_ASSERT.
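
For illustration, the same check written with C11's _Static_assert, which is
swapped in here for Mesa's STATIC_ASSERT macro. The sizes and the struct are
made-up stand-ins, and the comparison is assumed to be "<=" (the operator is
garbled in the archive).

```c
#define FB_BUFFER_OFFSET 0x1000   /* assumed values, for illustration only */
#define FB_BUFFER_SIZE   0x1000

/* Hypothetical stand-in for struct ruvd_msg. */
struct example_msg {
    unsigned char payload[0x800];
};

/* Fails the build, not the run, if the message outgrows its slot. */
_Static_assert(sizeof(struct example_msg) <= FB_BUFFER_OFFSET,
               "message must fit in front of the feedback buffer");

/* Total size of one combined message/feedback buffer. */
unsigned msg_fb_size(void)
{
    return FB_BUFFER_OFFSET + FB_BUFFER_SIZE;
}
```

Since both operands are compile-time constants, nothing needs to be checked
each time a decoder is created.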


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer



[Mesa-dev] [PATCH] gallivm: allow large numbers of temporaries

2014-02-03 Thread Zack Rusin
The number of allowed temporaries increases with almost every
iteration of an API. We used to support 128, then we started
increasing the limit, and the newer APIs support 4096+. So if we notice
that the number of temporaries is larger than our statically
allocated storage allows, we just treat them as indexable
temporaries and allocate them as an array from the start.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 9db41a9..7c5de21 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -2672,8 +2672,8 @@ lp_emit_declaration_soa(
       assert(last <= bld->bld_base.info->file_max[decl->Declaration.File]);
       switch (decl->Declaration.File) {
       case TGSI_FILE_TEMPORARY:
-         assert(idx < LP_MAX_TGSI_TEMPS);
          if (!(bld->indirect_files & (1 << TGSI_FILE_TEMPORARY))) {
+            assert(idx < LP_MAX_TGSI_TEMPS);
             for (i = 0; i < TGSI_NUM_CHANNELS; i++)
                bld->temps[idx][i] = lp_build_alloca(gallivm, vec_type, "temp");
          }
@@ -3621,6 +3621,15 @@ lp_build_tgsi_soa(struct gallivm_state *gallivm,
    bld.bld_base.info = info;
    bld.indirect_files = info->indirect_files;
 
+   /*
+    * If the number of temporaries is rather large then we just
+    * allocate them as an array right from the start and treat
+    * like indirect temporaries.
+    */
+   if (info->file_max[TGSI_FILE_TEMPORARY] >= LP_MAX_TGSI_TEMPS) {
+      bld.indirect_files |= (1 << TGSI_FILE_TEMPORARY);
+   }
+
    bld.bld_base.soa = TRUE;
    bld.bld_base.emit_debug = emit_debug;
    bld.bld_base.emit_fetch_funcs[TGSI_FILE_CONSTANT] = emit_fetch_constant;
-- 
1.8.3.2
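
The decision made by this patch can be sketched in isolation as below. The
constant values and the function name are hypothetical; only the shape of the
check follows the hunk above.

```c
#define MAX_STATIC_TEMPS 128u   /* hypothetical size of the static storage */
#define FILE_TEMPORARY   4u     /* hypothetical bit index of the temp file */

/*
 * temp_file_max is the highest temporary index declared by the shader.
 * If it would not fit in the static array, flag the whole file as
 * indirectly addressed so that it gets allocated as one array instead.
 */
unsigned mark_large_temp_file(unsigned indirect_files, unsigned temp_file_max)
{
    if (temp_file_max >= MAX_STATIC_TEMPS)
        indirect_files |= 1u << FILE_TEMPORARY;
    return indirect_files;
}
```

Because the argument is the highest index rather than a count, the comparison
is ">=": index 128 is already one past the end of a 128-entry array.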


[Mesa-dev] [PATCH 1/3] d3d10: allow indexable temporaries as relative registers

2014-02-03 Thread Zack Rusin
Indexable temporaries are 2D (the index of the array and the index
within the array) and can be used as outputs, inputs, and relative
addressing registers. This fixes the parsing of indexable temporaries,
both on their own and in relative addressing.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/state_trackers/d3d10/ShaderParse.c | 14 ++
 src/gallium/state_trackers/d3d10/ShaderParse.h |  2 +-
 src/gallium/state_trackers/d3d10/ShaderTGSI.c  |  8 +++-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/src/gallium/state_trackers/d3d10/ShaderParse.c 
b/src/gallium/state_trackers/d3d10/ShaderParse.c
index 38ec2fe..7cec385 100644
--- a/src/gallium/state_trackers/d3d10/ShaderParse.c
+++ b/src/gallium/state_trackers/d3d10/ShaderParse.c
@@ -207,13 +207,19 @@ parse_relative_operand(const unsigned **curr,
    assert(operand->type != D3D10_SB_OPERAND_TYPE_IMMEDIATE32);
 
    /* Index dimension. */
-   assert(DECODE_D3D10_SB_OPERAND_INDEX_DIMENSION(**curr) == D3D10_SB_OPERAND_INDEX_1D);
    assert(DECODE_D3D10_SB_OPERAND_INDEX_REPRESENTATION(0, **curr) == D3D10_SB_OPERAND_INDEX_IMMEDIATE32);
 
-   (*curr)++;
-
-   operand->index[0].imm = **curr;
+   if (DECODE_D3D10_SB_OPERAND_INDEX_DIMENSION(**curr) == D3D10_SB_OPERAND_INDEX_1D) {
+      (*curr)++;
+      operand->index[0].imm = **curr;
+   } else {
+      assert(DECODE_D3D10_SB_OPERAND_INDEX_DIMENSION(**curr) == D3D10_SB_OPERAND_INDEX_2D);
+      (*curr)++;
+      operand->index[0].imm = **curr;
+      (*curr)++;
+      operand->index[1].imm = **curr;
 
+   }
    (*curr)++;
 }
 
diff --git a/src/gallium/state_trackers/d3d10/ShaderParse.h 
b/src/gallium/state_trackers/d3d10/ShaderParse.h
index 64f177c..5971864 100644
--- a/src/gallium/state_trackers/d3d10/ShaderParse.h
+++ b/src/gallium/state_trackers/d3d10/ShaderParse.h
@@ -54,7 +54,7 @@ struct Shader_relative_index {
 
 struct Shader_relative_operand {
D3D10_SB_OPERAND_TYPE type;
-   struct Shader_relative_index index[1];
+   struct Shader_relative_index index[2];
D3D10_SB_4_COMPONENT_NAME comp;
 };
 
diff --git a/src/gallium/state_trackers/d3d10/ShaderTGSI.c 
b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
index 9fb6b1d..2e42b8b 100644
--- a/src/gallium/state_trackers/d3d10/ShaderTGSI.c
+++ b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
@@ -637,9 +637,15 @@ translate_relative_operand(struct Shader_xlate *sx,
       reg = sx->prim_id;
       break;
 
+   case D3D10_SB_OPERAND_TYPE_INDEXABLE_TEMP:
+      assert(operand->index[1].imm < SHADER_MAX_TEMPS);
+
+      reg = ureg_src(sx->temps[sx->indexable_temp_offsets[operand->index[0].imm] +
+                               operand->index[1].imm]);
+      break;
+
case D3D10_SB_OPERAND_TYPE_INPUT:
case D3D10_SB_OPERAND_TYPE_OUTPUT:
-   case D3D10_SB_OPERAND_TYPE_INDEXABLE_TEMP:
case D3D10_SB_OPERAND_TYPE_IMMEDIATE32:
case D3D10_SB_OPERAND_TYPE_IMMEDIATE64:
case D3D10_SB_OPERAND_TYPE_SAMPLER:
-- 
1.8.3.2
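
The 2D lookup in the ShaderTGSI.c hunk (offset of the array plus index within
the array) can be illustrated with the sketch below. How the state tracker
actually fills indexable_temp_offsets is not shown in this patch, so the
prefix-sum construction and all names here are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Fill offsets[i] with the first flat slot of array i (a prefix sum). */
void build_offsets(const unsigned *sizes, size_t count, unsigned *offsets)
{
    unsigned next = 0;

    for (size_t i = 0; i < count; i++) {
        offsets[i] = next;
        next += sizes[i];
    }
}

/* Map (array, element) to a slot in one flat register array. */
unsigned flat_temp_index(const unsigned *offsets, const unsigned *sizes,
                         unsigned array, unsigned element)
{
    assert(element < sizes[array]);
    return offsets[array] + element;
}
```

For arrays of sizes 4, 2 and 8, element 3 of the third array lands in flat
slot 4 + 2 + 3 = 9.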


[Mesa-dev] [PATCH 2/3] d3d10: allow indirect addressing on outputs

2014-02-03 Thread Zack Rusin
Outputs can have relative addressing. This adds basic support for it.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/state_trackers/d3d10/ShaderTGSI.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/src/gallium/state_trackers/d3d10/ShaderTGSI.c 
b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
index 2e42b8b..1cf9e0e 100644
--- a/src/gallium/state_trackers/d3d10/ShaderTGSI.c
+++ b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
@@ -687,20 +687,26 @@ translate_operand(struct Shader_xlate *sx,
 
    case D3D10_SB_OPERAND_TYPE_OUTPUT:
       assert(operand->index_dim == 1);
-      assert(operand->index[0].index_rep == D3D10_SB_OPERAND_INDEX_IMMEDIATE32);
       assert(operand->index[0].imm < SHADER_MAX_OUTPUTS);
 
-      if (!writemask) {
-         reg = sx->outputs[operand->index[0].imm].reg[0];
-      } else {
-         unsigned i;
-         for (i = 0; i < 4; ++i) {
-            unsigned mask = 1 << i;
-            if ((writemask & mask)) {
-               reg = sx->outputs[operand->index[0].imm].reg[i];
-               break;
+      if (operand->index[0].index_rep == D3D10_SB_OPERAND_INDEX_IMMEDIATE32) {
+         if (!writemask) {
+            reg = sx->outputs[operand->index[0].imm].reg[0];
+         } else {
+            unsigned i;
+            for (i = 0; i < 4; ++i) {
+               unsigned mask = 1 << i;
+               if ((writemask & mask)) {
+                  reg = sx->outputs[operand->index[0].imm].reg[i];
+                  break;
+               }
             }
          }
+      } else {
+         struct ureg_src addr =
+            translate_relative_operand(sx, operand->index[0].rel);
+         assert(operand->index[0].index_rep == D3D10_SB_OPERAND_INDEX_IMMEDIATE32_PLUS_RELATIVE);
+         reg = ureg_dst_indirect(sx->outputs[operand->index[0].imm].reg[0], addr);
       }
       break;
 
-- 
1.8.3.2
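
The immediate-index branch in this patch picks the register for the first
channel enabled in the writemask. Read on its own, and assuming the garbled
operators in the archive were "<", "<<" and "&", that selection looks roughly
like this (the function name is hypothetical):

```c
/* Index (0..3) of the lowest channel enabled in a 4-bit writemask. */
unsigned first_written_channel(unsigned writemask)
{
    for (unsigned i = 0; i < 4; ++i) {
        if (writemask & (1u << i))
            return i;
    }
    return 0;   /* empty mask: fall back to channel 0, as the patch does */
}
```

For example, a mask of 0xA (channels y and w) selects channel 1.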


[Mesa-dev] [PATCH 3/3] d3d10: support 1d indirect addressing on inputs

2014-02-03 Thread Zack Rusin
We supported 2D indirect addressing (GS tests were using it) but
not 1D indirect addressing (which can be used in VS and PS). This
adds support for 1D indirect addressing.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/state_trackers/d3d10/ShaderTGSI.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/gallium/state_trackers/d3d10/ShaderTGSI.c 
b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
index 1cf9e0e..76126c5 100644
--- a/src/gallium/state_trackers/d3d10/ShaderTGSI.c
+++ b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
@@ -828,11 +828,29 @@ translate_src_operand(struct Shader_xlate *sx,
    switch (operand->base.type) {
    case D3D10_SB_OPERAND_TYPE_INPUT:
       if (operand->base.index_dim == 1) {
-         assert(operand->base.index[0].index_rep ==
-                D3D10_SB_OPERAND_INDEX_IMMEDIATE32);
-         assert(operand->base.index[0].imm < SHADER_MAX_INPUTS);
+         switch (operand->base.index[0].index_rep) {
+         case D3D10_SB_OPERAND_INDEX_IMMEDIATE32:
+            assert(operand->base.index[0].imm < SHADER_MAX_INPUTS);
+            reg = sx->inputs[operand->base.index[0].imm].reg;
+            break;
+         case D3D10_SB_OPERAND_INDEX_RELATIVE: {
+            struct ureg_src tmp =
+               translate_relative_operand(sx, operand->base.index[0].rel);
+            reg = ureg_src_indirect(sx->inputs[0].reg, tmp);
+         }
+            break;
+         case D3D10_SB_OPERAND_INDEX_IMMEDIATE32_PLUS_RELATIVE: {
+            struct ureg_src tmp =
+               translate_relative_operand(sx, operand->base.index[0].rel);
+            reg = ureg_src_indirect(sx->inputs[operand->base.index[0].imm].reg, tmp);
+         }
+            break;
+         default:
+            /* XXX: Other index representations.
+             */
+            LOG_UNSUPPORTED(TRUE);
 
-         reg = sx->inputs[operand->base.index[0].imm].reg;
+         }
       } else {
          assert(operand->base.index_dim == 2);
          assert(operand->base.index[1].imm < SHADER_MAX_INPUTS);
-- 
1.8.3.2
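
As a rough model of what the three index representations mean once the shader
runs (this is not how the state tracker emits TGSI, and the enum and function
names are hypothetical):

```c
/* Simplified stand-ins for the D3D10 index representations. */
enum index_rep {
    INDEX_IMMEDIATE,
    INDEX_RELATIVE,
    INDEX_IMMEDIATE_PLUS_RELATIVE,
    INDEX_OTHER
};

/*
 * Resolve which input register is read, given the immediate part and the
 * run-time value of the relative (address) register.
 * Returns -1 for representations this sketch does not handle.
 */
int resolve_input_index(enum index_rep rep, int imm, int rel_value)
{
    switch (rep) {
    case INDEX_IMMEDIATE:
        return imm;                 /* fixed register */
    case INDEX_RELATIVE:
        return rel_value;           /* base 0 plus address register */
    case INDEX_IMMEDIATE_PLUS_RELATIVE:
        return imm + rel_value;     /* fixed base plus address register */
    default:
        return -1;
    }
}
```

This mirrors the patch in that the purely relative case is indexed from
inputs[0], while the immediate-plus-relative case is indexed from inputs[imm].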


Re: [Mesa-dev] [PATCH 1/3] d3d10: allow indexable temporaries as relative registers

2014-02-03 Thread Dave Airlie
Hi Zack,

you forgot to post the initial patch to add the d3d10 state tracker :-P

Dave.


Re: [Mesa-dev] [PATCH 1/2] gallium: remove PIPE_CAP_MAX_COMBINED_SAMPLERS

2014-02-03 Thread Ilia Mirkin
OK, so from the _looks_ of it, vertex and fragment texture/sampler
bindings are totally separate. Based on what I'm seeing, there are 16
slots for fragment shaders, and 4 for vertex shaders.

On Mon, Feb 3, 2014 at 8:51 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 Hmmm... unclear from a quick read of the code. nv40_verttex.c looks at
 nv30->fragprog.textures/samplers (instead of nv30->vertprog) to
 disable VTXTEX_ENABLE, but doesn't do anything with them if they're
 set. Could just be a copy-pasta situation, or could be that TEX_ENABLE
 and VTXTEX_ENABLE can't be set for the same index... The
 enables/formats/etc are all in separate registers, but there's nothing
 saying that you can use both at the same time. I'll see if there's any
 docs on it when I get home.

 Separately, the chances of major new features being implemented in
 nv30 at this point are relatively low. I'm probably the only person
 looking at it, and I'm just trying to make the current functionality
 work (or at least be not as wrong). There's still quite some work
 ahead in that regard --
 http://people.freedesktop.org/~imirkin/nv40-comparison/problems.html



Re: [Mesa-dev] [PATCH 1/2] gallium: remove PIPE_CAP_MAX_COMBINED_SAMPLERS

2014-02-03 Thread Ben Skeggs
On Tue, Feb 4, 2014 at 1:22 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 OK, so from the _looks_ of it, vertex and fragment texture/sampler
 bindings are totally separate. Based on what I'm seeing, there are 16
 slots for fragment shaders, and 4 for vertex shaders.
From what I understand, they're completely separate and the use of one
imposes no limitations on the other.


 On Mon, Feb 3, 2014 at 8:51 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 Hmmm... unclear from a quick read of the code. nv40_verttex.c looks at
 nv30->fragprog.textures/samplers (instead of nv30->vertprog) to
 disable VTXTEX_ENABLE, but doesn't do anything with them if they're
 set. Could just be a copy-pasta situation, or could be that TEX_ENABLE
 and VTXTEX_ENABLE can't be set for the same index... The
 enables/formats/etc are all in separate registers, but there's nothing
 saying that you can use both at the same time. I'll see if there's any
 docs on it when I get home.

 Separately, the chances of major new features being implemented in
 nv30 at this point are relatively low. I'm probably the only person
 looking at it, and I'm just trying to make the current functionality
 work (or at least be not as wrong). There's still quite some work
 ahead in that regard --
 http://people.freedesktop.org/~imirkin/nv40-comparison/problems.html



[Mesa-dev] [Bug 74476] libGL complains about missing symbol __driDriverGetExtensions_radeonsi

2014-02-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=74476

Michel Dänzer mic...@daenzer.net changed:

           What    |Removed                         |Added
----------------------------------------------------------------------------
           Assignee|dri-devel@lists.freedesktop.org |mesa-dev@lists.freedesktop.org
          Component|Drivers/Gallium/radeonsi        |GLX

--- Comment #1 from Michel Dänzer mic...@daenzer.net ---
The loader prints this for any non-mega-driver. It's harmless though.



Re: [Mesa-dev] [PATCH] glsl: Initialize ubo_binding_mask flags to zero.

2014-02-03 Thread Kenneth Graunke
On 02/03/2014 11:53 AM, Matt Turner wrote:
 Missed in commit e63bb298. Caused sporadic test failures, like
 incorrect-in-layout-qualifier-repeated-prim.geom.
 
 Cc: 10.0 mesa-sta...@lists.freedesktop.org
 ---
  src/glsl/ast_type.cpp | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/src/glsl/ast_type.cpp b/src/glsl/ast_type.cpp
 index 637da0d..5069326 100644
 --- a/src/glsl/ast_type.cpp
 +++ b/src/glsl/ast_type.cpp
 @@ -118,6 +118,7 @@ ast_type_qualifier::merge_qualifier(YYLTYPE *loc,
 ubo_layout_mask.flags.q.shared = 1;
  
 ast_type_qualifier ubo_binding_mask;
 +   ubo_binding_mask.flags.i = 0;
 ubo_binding_mask.flags.q.explicit_binding = 1;
 ubo_binding_mask.flags.q.explicit_offset = 1;
  
 

LGTM

Reviewed-by: Kenneth Graunke kenn...@whitecape.org
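
The class of bug being fixed can be sketched in C as follows: a local union
of bit flags is not zeroed automatically, so setting two bits leaves every
other bit indeterminate unless the whole word is cleared first. The union
below is a simplified, hypothetical stand-in for the flags in
ast_type_qualifier.

```c
/* Simplified stand-in: named bits overlaid on one integer. */
union qualifier_flags {
    struct {
        unsigned explicit_binding:1;
        unsigned explicit_offset:1;
    } q;
    unsigned i;
};

union qualifier_flags make_binding_mask(void)
{
    union qualifier_flags mask;

    mask.i = 0;                     /* clear every bit first */
    mask.q.explicit_binding = 1;    /* then set only the bits we mean */
    mask.q.explicit_offset = 1;
    return mask;
}
```

Without the "mask.i = 0" line, comparing or masking against mask.i would
depend on whatever happened to be on the stack, which fits the sporadic test
failures mentioned in the commit message.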





[Mesa-dev] [PATCH] R600/SI: Add pattern for zero-extending i1 to i32

2014-02-03 Thread Michel Dänzer
From: Michel Dänzer michel.daen...@amd.com

Fixes opencl-example if_* tests with radeonsi.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74469
Signed-off-by: Michel Dänzer michel.daen...@amd.com
---
 lib/Target/R600/SIInstructions.td | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 7e37821..59fe2ae 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1827,6 +1827,11 @@ def : Pat <
   (V_CNDMASK_B32_e64 (i32 0), (i32 -1), $src0)
 >;
 
+def : Pat <
+  (i32 (zext i1:$src0)),
+  (V_CNDMASK_B32_e64 (i32 0), (i32 1), $src0)
+>;
+
 // 1. Offset as 8bit DWORD immediate
 def : Pat <
   (SIload_constant i128:$sbase, IMM8bitDWORD:$offset),
-- 
1.9.rc1



[Mesa-dev] [PATCH v3] R600/SI: Fix fneg for 0.0

2014-02-03 Thread Michel Dänzer
From: Michel Dänzer michel.daen...@amd.com

V_ADD_F32 with source modifier does not produce -0.0 for this. Just
manipulate the sign bit directly instead.

Also add a pattern for (fneg (fabs ...)).

Fixes a bunch of bit encoding piglit tests with radeonsi.

Signed-off-by: Michel Dänzer michel.daen...@amd.com
---

v3: Add explanatory comments.

 lib/Target/R600/SIInstructions.td | 22 +---
 test/CodeGen/R600/fabs.ll | 14 +-
 test/CodeGen/R600/fneg-fabs.ll| 55 +++
 test/CodeGen/R600/fneg.ll | 14 +-
 4 files changed, 87 insertions(+), 18 deletions(-)
 create mode 100644 test/CodeGen/R600/fneg-fabs.ll

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index b9e0c17..25fd7d5 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1688,16 +1688,30 @@ def : Pat <
     0 /* ABS */, 1 /* CLAMP */, 0 /* OMOD */, 0 /* NEG */)
 >;
 
+/**  **/
+/** Floating point absolute/negative **/
+/**  **/
+
+// Manipulate the sign bit directly, as e.g. using the source negation modifier
+// in V_ADD_F32_e64 $src, 0, [...] does not result in -0.0 for $src == +0.0,
+// breaking the piglit *s-floatBitsToInt-neg* tests
+
+// TODO: Look into not implementing isFNegFree/isFAbsFree for SI, and possibly
+// removing these patterns
+
+def : Pat <
+  (fneg (fabs f32:$src)),
+  (V_OR_B32_e32 $src, (V_MOV_B32_e32 0x80000000)) /* Set sign bit */
+>;
+
 def : Pat <
   (fabs f32:$src),
-  (V_ADD_F32_e64 $src, (i32 0 /* SRC1 */),
-   1 /* ABS */, 0 /* CLAMP */, 0 /* OMOD */, 0 /* NEG */)
+  (V_AND_B32_e32 $src, (V_MOV_B32_e32 0x7fffffff)) /* Clear sign bit */
 >;
 
 def : Pat <
   (fneg f32:$src),
-  (V_ADD_F32_e64 $src, (i32 0 /* SRC1 */),
-   0 /* ABS */, 0 /* CLAMP */, 0 /* OMOD */, 1 /* NEG */)
+  (V_XOR_B32_e32 $src, (V_MOV_B32_e32 0x80000000)) /* Toggle sign bit */
 >;
 
 /** == **/
diff --git a/test/CodeGen/R600/fabs.ll b/test/CodeGen/R600/fabs.ll
index a5f5df9..2cd3a4f 100644
--- a/test/CodeGen/R600/fabs.ll
+++ b/test/CodeGen/R600/fabs.ll
@@ -9,7 +9,7 @@
 ; R600-CHECK-NOT: AND
 ; R600-CHECK: |PV.{{[XYZW]}}|
 ; SI-CHECK-LABEL: @fabs_free
-; SI-CHECK: V_ADD_F32_e64 v{{[0-9]}}, s{{[0-9]}}, 0, 1, 0, 0, 0
+; SI-CHECK: V_AND_B32
 
 define void @fabs_free(float addrspace(1)* %out, i32 %in) {
 entry:
@@ -23,8 +23,8 @@ entry:
 ; R600-CHECK: |{{(PV|T[0-9])\.[XYZW]}}|
 ; R600-CHECK: |{{(PV|T[0-9])\.[XYZW]}}|
 ; SI-CHECK-LABEL: @fabs_v2
-; SI-CHECK: V_ADD_F32_e64 v{{[0-9]}}, s{{[0-9]}}, 0, 1, 0, 0, 0
-; SI-CHECK: V_ADD_F32_e64 v{{[0-9]}}, s{{[0-9]}}, 0, 1, 0, 0, 0
+; SI-CHECK: V_AND_B32
+; SI-CHECK: V_AND_B32
 define void @fabs_v2(<2 x float> addrspace(1)* %out, <2 x float> %in) {
 entry:
   %0 = call <2 x float> @llvm.fabs.v2f32(<2 x float> %in)
@@ -38,10 +38,10 @@ entry:
 ; R600-CHECK: |{{(PV|T[0-9])\.[XYZW]}}|
 ; R600-CHECK: |{{(PV|T[0-9])\.[XYZW]}}|
 ; SI-CHECK-LABEL: @fabs_v4
-; SI-CHECK: V_ADD_F32_e64 v{{[0-9]}}, s{{[0-9]}}, 0, 1, 0, 0, 0
-; SI-CHECK: V_ADD_F32_e64 v{{[0-9]}}, s{{[0-9]}}, 0, 1, 0, 0, 0
-; SI-CHECK: V_ADD_F32_e64 v{{[0-9]}}, s{{[0-9]}}, 0, 1, 0, 0, 0
-; SI-CHECK: V_ADD_F32_e64 v{{[0-9]}}, s{{[0-9]}}, 0, 1, 0, 0, 0
+; SI-CHECK: V_AND_B32
+; SI-CHECK: V_AND_B32
+; SI-CHECK: V_AND_B32
+; SI-CHECK: V_AND_B32
 define void @fabs_v4(<4 x float> addrspace(1)* %out, <4 x float> %in) {
 entry:
   %0 = call <4 x float> @llvm.fabs.v4f32(<4 x float> %in)
diff --git a/test/CodeGen/R600/fneg-fabs.ll b/test/CodeGen/R600/fneg-fabs.ll
new file mode 100644
index 0000000..d95e131
--- /dev/null
+++ b/test/CodeGen/R600/fneg-fabs.ll
@@ -0,0 +1,55 @@
+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s --check-prefix=R600-CHECK
+; RUN: llc < %s -march=r600 -mcpu=SI -verify-machineinstrs | FileCheck %s --check-prefix=SI-CHECK
+
+; DAGCombiner will transform:
+; (fabs (f32 bitcast (i32 a))) => (f32 bitcast (and (i32 a), 0x7FFFFFFF))
+; unless isFabsFree returns true
+
+; R600-CHECK-LABEL: @fneg_fabs_free
+; R600-CHECK-NOT: AND
+; R600-CHECK: |PV.{{[XYZW]}}|
+; R600-CHECK: -PV
+; SI-CHECK-LABEL: @fneg_fabs_free
+; SI-CHECK: V_OR_B32
+
+define void @fneg_fabs_free(float addrspace(1)* %out, i32 %in) {
+entry:
+  %0 = bitcast i32 %in to float
+  %1 = call float @fabs(float %0)
+  %2 = fsub float -0.00e+00, %1
+  store float %2, float addrspace(1)* %out
+  ret void
+}
+
+; R600-CHECK-LABEL: @fneg_fabs_v2
+; R600-CHECK: |{{(PV|T[0-9])\.[XYZW]}}|
+; R600-CHECK: -PV
+; R600-CHECK: |{{(PV|T[0-9])\.[XYZW]}}|
+; R600-CHECK: -PV
+; SI-CHECK-LABEL: @fneg_fabs_v2
+; SI-CHECK: V_OR_B32
+; SI-CHECK: V_OR_B32
+define void @fneg_fabs_v2(<2 x float> addrspace(1)* %out, <2 x float> %in) {
+entry:
+  %0 = call <2 x float> @llvm.fabs.v2f32(<2 x float> %in)
+  %1 = fsub <2 x float> <float -0.000000e+00, float -0.000000e+00>, %0
+  store <2 x float> %1, <2 x float> addrspace(1)* %out
+  ret void
+}
+
+; SI-CHECK-LABEL: 

Re: [Mesa-dev] [PATCH] R600/SI: Fix fneg for 0.0

2014-02-03 Thread Michel Dänzer
On Mon, 2014-02-03 at 14:16 -0800, Tom Stellard wrote:
 
 It's clear that there are a few things wrong which are unrelated to this
 patch, so I think it is fine as is.  Could you add a comment above the pattern
 explaining why we need to manually toggle the sign bit, and also a TODO to fix
 FabsFree and FNegFree?

Comments added in v3.


 If you add a lit test, then this patch has my r-b.

The v2 patch updates the two existing lit tests and adds a third one. Do
you have any other additional lit test(s) in mind?


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer
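
The sign-bit issue discussed in this thread can be sketched in plain C. This
is only a host-side model under IEEE 754 round-to-nearest, not the hardware
behaviour itself: the helper names below (float_bits, fneg_via_add, and so on)
are made up for illustration, and modelling the source negation modifier as
"(-x) + 0.0" is an assumption about why V_ADD_F32 loses the sign of zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIGN_BIT 0x80000000u

/* Reinterpret a float as its 32-bit pattern (similar to GLSL floatBitsToInt). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit pattern as a float. */
float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Bitwise forms matching the XOR / AND / OR patterns in the patch. */
float fneg_bits(float x)      { return bits_float(float_bits(x) ^ SIGN_BIT); }
float fabs_bits(float x)      { return bits_float(float_bits(x) & ~SIGN_BIT); }
float fneg_fabs_bits(float x) { return bits_float(float_bits(x) | SIGN_BIT); }

/* Assumed model of "negate the source, then add zero": (-x) + (+0.0).
 * volatile keeps the compiler from folding the addition away. */
float fneg_via_add(float x)
{
    volatile float neg = -x;
    volatile float zero = 0.0f;
    return neg + zero;
}

/* Small self-check showing where the two approaches differ. */
void sign_bit_demo(void)
{
    /* (-0.0) + (+0.0) rounds to +0.0, so the sign bit is lost. */
    assert(float_bits(fneg_via_add(0.0f)) == 0x00000000u);
    /* Toggling the bit directly yields -0.0. */
    assert(float_bits(fneg_bits(0.0f)) == SIGN_BIT);
}
```

For any non-zero input both approaches agree; only +0.0 exposes the
difference, which would be consistent with the failing tests comparing raw
bit patterns.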



Re: [Mesa-dev] [PATCH 26/35] meta: Use common GLSL code for blits

2014-02-03 Thread Kenneth Graunke
On 02/03/2014 12:54 AM, Rogovin, Kevin wrote:
 Hi,
 
 We can't do stencil blits with GLSL because no driver that uses meta can
 do the GL_ARB_shader_stencil_export extension.  For depth and color
 blits, we can always write the values from the shader, and disable
 writes to the buffers using glColorMask and glDepthMask.  So... we just
 need one shader.
 
 Should we add GL_ARB_shader_stencil_export as one of the extensions Mesa
 can support? Naturally that is a major tangent from this, and I am not too sure
 what hardware beyond newer AMD hardware has that capability.

Hi Kevin,

I don't believe our hardware can support GL_ARB_shader_stencil_export.
The render target write message can take RGBA, depth, and sample masks,
but not stencil.  Without that, it's not at all obvious how to implement it.

 The shader had the fragment depth use the same value as the red channel of
 the fragment color, so should there not be 3 shaders (color only, depth only,
 color and depth), with the last one having two samplers?
 
 Also, should the case of blitting from MSAA to non-MSAA not be handled
 correctly, i.e. use the sampler type sampler2DMS and average over the number
 of samples, or possibly fancier filtering kernels?

We could definitely support MSAA to non-MSAA blits via
ARB_texture_multisample, but we don't today.  Gallium drivers don't use
Meta, i965 instead uses BLORP, and the other classic drivers don't
support multisampling.

--Ken
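
The "average over the number of samples" idea from the question above can be
illustrated with a small CPU-side sketch. This is not what meta does (per the
reply, it does not handle this case today); on the GPU the per-sample reads
would come from a sampler2DMS instead. The function name and the memory
layout (all samples of one pixel stored contiguously) are assumptions made
only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Box-filter resolve of one channel: each output pixel is the plain average
 * of its `samples` input values.
 * src holds pixels * samples floats, dst holds pixels floats. */
void resolve_box(const float *src, float *dst, size_t pixels, unsigned samples)
{
    assert(samples > 0);

    for (size_t p = 0; p < pixels; p++) {
        float sum = 0.0f;

        /* Accumulate every sample that belongs to pixel p. */
        for (unsigned s = 0; s < samples; s++)
            sum += src[p * samples + s];

        dst[p] = sum / (float)samples;
    }
}
```

A fancier kernel would replace the uniform 1/samples weight with per-sample
weights; the loop structure would stay the same.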





Re: [Mesa-dev] [PATCH v3] R600/SI: Fix fneg for 0.0

2014-02-03 Thread Tom Stellard
On Tue, Feb 04, 2014 at 01:18:24PM +0900, Michel Dänzer wrote:
 From: Michel Dänzer michel.daen...@amd.com
 
 V_ADD_F32 with source modifier does not produce -0.0 for this. Just
 manipulate the sign bit directly instead.
 
 Also add a pattern for (fneg (fabs ...)).
 
 Fixes a bunch of bit encoding piglit tests with radeonsi.
 
 Signed-off-by: Michel Dänzer michel.daen...@amd.com
Reviewed-by: Tom Stellard thomas.stell...@amd.com

[Mesa-dev] [PATCH 2/3] nv50: fix layerid to be the fp input number rather than vp output number

2014-02-03 Thread Ilia Mirkin
In the tests they were the same so it didn't matter, but indications are
that this is the correct behaviour. Also take this opportunity to
(trivially) support using gl_Layer in fp.

Cc: 10.1 mesa-sta...@lists.freedesktop.org
Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---

No regressions in piglit. Pretty sure that rewriting some of the
gl_Layer-using tests to have gp out's would have caused failures. Also this
gets rid of C99 stuff.

 src/gallium/drivers/nouveau/nv50/nv50_program.c  |  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_program.h  |  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_shader_state.c | 12 +++-
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index f14d3ef..e506438 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -104,7 +104,7 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
          prog->vp.bfc[info->out[i].si] = i;
          break;
       case TGSI_SEMANTIC_LAYER:
-         prog->gp.has_layer = true;
+         prog->gp.has_layer = TRUE;
          prog->gp.layerid = n;
          break;
       default:
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h 
b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index 75de007..8c1b327 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -89,7 +89,7 @@ struct nv50_program {
struct {
   uint32_t vert_count;
   uint8_t prim_type; /* point, line strip or tri strip */
-  bool has_layer;
+  uint8_t has_layer;
   ubyte layerid; /* hw value of layer output */
} gp;
 
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index f8b1c1b..129ed2a 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -346,7 +346,7 @@ nv50_fp_linkage_validate(struct nv50_context *nv50)
    struct nv50_varying dummy;
    int i, n, c, m;
    uint32_t primid = 0;
-   uint32_t layerid = vp->gp.layerid;
+   uint32_t layerid = 0;
    uint32_t psiz = 0x000;
    uint32_t interp = fp->fp.interp;
    uint32_t colors = fp->fp.colors;
@@ -405,15 +405,17 @@ nv50_fp_linkage_validate(struct nv50_context *nv50)
       case TGSI_SEMANTIC_PRIMID:
          primid = m;
          break;
+      case TGSI_SEMANTIC_LAYER:
+         layerid = m;
+         break;
       }
       m = nv50_vec4_map(map, m, lin,
                         &fp->in[i], (n < vp->out_nr) ? &vp->out[n] : &dummy);
    }
 
-   if (vp->gp.has_layer) {
-      // In GL4.x, layer can be an fp input, but not in 3.x. Make sure to add
-      // it to the output map.
-      map[m++] = layerid;
+   if (vp->gp.has_layer && !layerid) {
+      layerid = m;
+      map[m++] = vp->gp.layerid;
    }
 
if (nv50-rast-pipe.point_size_per_vertex) {
-- 
1.8.3.2
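
The slot selection in the last hunk above can be modelled in isolation. The
sketch below is a heavily simplified stand-in, not the driver code: the enum,
the struct, and link_layer() are hypothetical, each fragment input is assumed
to take exactly one map slot (nv50_vec4_map() can take several), and the
starting slot is assumed to be non-zero so that 0 can mean "not assigned yet",
mirroring the !layerid test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TGSI semantics used in the patch. */
enum semantic { SEM_GENERIC, SEM_PRIMID, SEM_LAYER };

struct link_result {
    uint32_t layerid;  /* map slot that carries the layer, 0 if none */
    int      map_len;  /* number of map slots in use */
};

/* fp_in:        semantics of the fragment program inputs, in order
 * vp_has_layer: whether the previous stage writes a layer
 * vp_layerid:   hw output number of that layer write
 * start:        first free map slot (assumed > 0) */
struct link_result link_layer(const enum semantic *fp_in, int fp_in_nr,
                              int vp_has_layer, uint8_t vp_layerid,
                              uint8_t *map, int start)
{
    struct link_result r = { 0, start };

    assert(start > 0);

    for (int i = 0; i < fp_in_nr; i++) {
        /* The fragment program reads the layer: use its input slot. */
        if (fp_in[i] == SEM_LAYER)
            r.layerid = (uint32_t)r.map_len;
        map[r.map_len++] = (uint8_t)i;  /* simplified input mapping */
    }

    /* Layer is written but not read by the fp: append a slot for it. */
    if (vp_has_layer && !r.layerid) {
        r.layerid = (uint32_t)r.map_len;
        map[r.map_len++] = vp_layerid;
    }
    return r;
}
```

The point of the change, as far as this model captures it, is that layerid
becomes a position in the map (the fp input number) while the value stored at
that position is the previous stage's output number.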



[Mesa-dev] [PATCH 3/3] nv50: implement multiple viewports/scissors, enable ARB_viewport_array

2014-02-03 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---

This was tested along with airlied's patches to add support to mesa/st. With
the piglit tests modified to emit the viewport index for every vertex, this
passes all but the minmax test (it wants a viewport of 16K while we only offer
8K).

No apparent piglit regressions.

 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |   1 +
 src/gallium/drivers/nouveau/nv50/nv50_context.h|   6 +-
 src/gallium/drivers/nouveau/nv50/nv50_program.c|   5 +
 src/gallium/drivers/nouveau/nv50/nv50_program.h|   2 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |  20 ++--
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |   2 +
 .../drivers/nouveau/nv50/nv50_shader_state.c   |  15 ++-
 src/gallium/drivers/nouveau/nv50/nv50_state.c  |  28 +++--
 .../drivers/nouveau/nv50/nv50_state_validate.c | 113 +
 9 files changed, 136 insertions(+), 56 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 33ebb54..51d3d08 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1031,6 +1031,7 @@ bool Source::scanInstruction(const struct tgsi_full_instruction *inst)
          if (info->out[dst.getIndex(0)].sn == TGSI_SEMANTIC_PSIZE ||
              info->out[dst.getIndex(0)].sn == TGSI_SEMANTIC_PRIMID ||
              info->out[dst.getIndex(0)].sn == TGSI_SEMANTIC_LAYER ||
+             info->out[dst.getIndex(0)].sn == TGSI_SEMANTIC_VIEWPORT_INDEX ||
              info->out[dst.getIndex(0)].sn == TGSI_SEMANTIC_FOG)
             info->out[dst.getIndex(0)].mask = 1;
 
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h b/src/gallium/drivers/nouveau/nv50/nv50_context.h
index 1ce52c9..57a3090 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_context.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_context.h
@@ -163,8 +163,10 @@ struct nv50_context {
struct pipe_blend_color blend_colour;
struct pipe_stencil_ref stencil_ref;
struct pipe_poly_stipple stipple;
-   struct pipe_scissor_state scissor;
-   struct pipe_viewport_state viewport;
+   struct pipe_scissor_state scissors[NV50_MAX_VIEWPORTS];
+   unsigned scissors_dirty;
+   struct pipe_viewport_state viewports[NV50_MAX_VIEWPORTS];
+   unsigned viewports_dirty;
struct pipe_clip_state clip;
 
unsigned sample_mask;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index e506438..0e06125 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -107,6 +107,10 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
          prog->gp.has_layer = TRUE;
          prog->gp.layerid = n;
          break;
+      case TGSI_SEMANTIC_VIEWPORT_INDEX:
+         prog->gp.has_viewport = true;
+         prog->gp.viewportid = n;
+         break;
       default:
          break;
       }
@@ -344,6 +348,7 @@ nv50_program_translate(struct nv50_program *prog, uint16_t chipset)
    prog->vp.clpd[1] = map_undef;
    prog->vp.psiz = map_undef;
    prog->gp.has_layer = 0;
+   prog->gp.has_viewport = 0;
 
    info->driverPriv = prog;
 
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index 8c1b327..87b0679 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -91,6 +91,8 @@ struct nv50_program {
   uint8_t prim_type; /* point, line strip or tri strip */
   uint8_t has_layer;
   ubyte layerid; /* hw value of layer output */
+  uint8_t has_viewport;
+  ubyte viewportid; /* hw value of viewport index output */
} gp;
 
void *fixups; /* relocation records */
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 5bb341d..bdfb931 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -196,6 +196,8 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
       return PIPE_ENDIAN_LITTLE;
    case PIPE_CAP_TGSI_VS_LAYER:
       return 0;
+   case PIPE_CAP_MAX_VIEWPORTS:
+      return NV50_MAX_VIEWPORTS;
    default:
       NOUVEAU_ERR("unknown PIPE_CAP %d\n", param);
       return 0;
@@ -524,9 +526,11 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
 
    BEGIN_NV04(push, NV50_3D(VIEWPORT_TRANSFORM_EN), 1);
    PUSH_DATA (push, 1);
-   BEGIN_NV04(push, NV50_3D(DEPTH_RANGE_NEAR(0)), 2);
-   PUSH_DATAf(push, 0.0f);
-   PUSH_DATAf(push, 1.0f);
+   for (i = 0; i < NV50_MAX_VIEWPORTS; i++) {
+      BEGIN_NV04(push, NV50_3D(DEPTH_RANGE_NEAR(i)), 2);
+      PUSH_DATAf(push, 0.0f);
+      PUSH_DATAf(push, 1.0f);
+   }
 
BEGIN_NV04(push, NV50_3D(VIEW_VOLUME_CLIP_CTRL), 1);
 #ifdef NV50_SCISSORS_CLIPPING
@@ -541,10 +545,12 

[Mesa-dev] [PATCH 1/3] nv50: rework primid logic

2014-02-03 Thread Ilia Mirkin
Functionally identical but much simpler. Should also better integrate
with future layer/viewport changes/fixes.

Cc: 10.1 mesa-sta...@lists.freedesktop.org
Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---

Not *strictly* necessary in stable, but it will make backporting later fixes
easier. No regressions in piglit.

 src/gallium/drivers/nouveau/nv50/nv50_program.c  | 5 +
 src/gallium/drivers/nouveau/nv50/nv50_program.h  | 1 -
 src/gallium/drivers/nouveau/nv50/nv50_shader_state.c | 4 +++-
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 636ef87..f14d3ef 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -170,10 +170,8 @@ nv50_fragprog_assign_slots(struct nv50_ir_prog_info *info)
 
          if (info->in[i].sn == TGSI_SEMANTIC_COLOR)
             prog->vp.bfc[info->in[i].si] = j;
-         else if (info->in[i].sn == TGSI_SEMANTIC_PRIMID) {
+         else if (info->in[i].sn == TGSI_SEMANTIC_PRIMID)
             prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_PRIMITIVE_ID;
-            prog->gp.primid = j;
-         }
 
          prog->in[j].id = i;
          prog->in[j].mask = info->in[i].mask;
@@ -345,7 +343,6 @@ nv50_program_translate(struct nv50_program *prog, uint16_t chipset)
    prog->vp.clpd[0] = map_undef;
    prog->vp.clpd[1] = map_undef;
    prog->vp.psiz = map_undef;
-   prog->gp.primid = 0x80;
    prog->gp.has_layer = 0;
 
    info->driverPriv = prog;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index 5b092bd..75de007 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -88,7 +88,6 @@ struct nv50_program {
 
struct {
   uint32_t vert_count;
-  ubyte primid; /* primitive id output register */
   uint8_t prim_type; /* point, line strip or tri strip */
   bool has_layer;
   ubyte layerid; /* hw value of layer output */
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index 9a43502..f8b1c1b 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -401,8 +401,10 @@ nv50_fp_linkage_validate(struct nv50_context *nv50)
          if (vp->out[n].sn == fp->in[i].sn &&
              vp->out[n].si == fp->in[i].si)
             break;
-      if (i == fp->gp.primid) {
+      switch (fp->in[i].sn) {
+      case TGSI_SEMANTIC_PRIMID:
          primid = m;
+         break;
       }
       m = nv50_vec4_map(map, m, lin,
                         &fp->in[i], (n < vp->out_nr) ? &vp->out[n] : &dummy);
-- 
1.8.3.2



[Mesa-dev] [PATCH] nv50: only over-allocate by a page for code

2014-02-03 Thread Ilia Mirkin
The pre-fetching doesn't go too far. Tested with over-allocating by only
a page, and didn't see any errors in dmesg. Saves ~512KB of VRAM.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
Cc: 10.1 mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index bdfb931..b02b6f8 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -747,12 +747,13 @@ nv50_screen_create(struct nouveau_device *dev)
       goto fail;
    }
 
-   /* This over-allocates by a whole code BO. The GP, which would execute at
-    * the end of the last page, would trigger faults. The going theory is that
-    * it prefetches up to a certain amount. This avoids dmesg spam.
+   /* This over-allocates by a page. The GP, which would execute at the end of
+    * the last page, would trigger faults. The going theory is that it
+    * prefetches up to a certain amount. This avoids dmesg spam.
     */
    ret = nouveau_bo_new(dev, NOUVEAU_BO_VRAM, 1 << 16,
-                        4 << NV50_CODE_BO_SIZE_LOG2, NULL, &screen->code);
+                        (3 << NV50_CODE_BO_SIZE_LOG2) + 0x1000,
+                        NULL, &screen->code);
    if (ret) {
       NOUVEAU_ERR("Failed to allocate code bo: %d\n", ret);
       goto fail;
-- 
1.8.3.2
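
The "~512KB" figure in the commit message can be checked with a little
arithmetic. The sketch below assumes NV50_CODE_BO_SIZE_LOG2 is 19 (512 KiB
per code segment); that value is not shown in the patch and is inferred here
from the stated saving. The macro and function names are local to this
example.

```c
#include <assert.h>
#include <stdint.h>

#define CODE_BO_SIZE_LOG2 19u     /* assumed value: 512 KiB per segment */
#define EXTRA_PAGE_BYTES  0x1000u /* the one page of over-allocation */

/* Size requested before the patch: four whole segments. */
uint32_t code_bo_old_size(void)
{
    return 4u << CODE_BO_SIZE_LOG2;
}

/* Size requested after the patch: three segments plus one page. */
uint32_t code_bo_new_size(void)
{
    return (3u << CODE_BO_SIZE_LOG2) + EXTRA_PAGE_BYTES;
}

/* VRAM saved by the change, in bytes. */
uint32_t code_bo_bytes_saved(void)
{
    assert(code_bo_old_size() > code_bo_new_size());
    return code_bo_old_size() - code_bo_new_size();
}
```

Under that assumption the request drops from 2 MiB to 1.5 MiB plus 4 KiB,
i.e. a saving of 508 KiB, which matches the "~512KB" in the message.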



Re: [Mesa-dev] [PATCH] i965/blorp: Fix hw blending coefficients

2014-02-03 Thread Pohjolainen, Topi
On Mon, Feb 03, 2014 at 11:30:58AM -0800, Eric Anholt wrote:
 Topi Pohjolainen topi.pohjolai...@intel.com writes:
 
  Previously the color components were evaluated using the formula
  (src_color * src_color + 1.0 * dst_color) and alpha in turn using
^ the destination factor here should read 0.0, not 1.0
  (1.0 * src_alpha + 0.0 * dst_alpha). The intention is to keep
  source color components unmodified and force alpha channel to
  fixed value of one regardless of source or destination.
 
 Not multiplying the source color by source color makes a lot of sense.
 But your CONSTANT_ALPHA with ALPHA of 1.0 is the same as just using ONE.
 (you're still multiplying the src alpha by 1.0 with your change, not
 replacing the source alpha value with 1.0).

Oh, that is true, it is still just a multiplier.

 
 Basically, I think your original patch is equivalent to this and better
 than this, and if we want to actually make the no-alpha-bits-present
 thing work, we need to override the bits in the surface state or in the
 generated code.  In the normal draw path, it's done for sampling by the
 swizzling code in brw_wm_surface_state.c, and the blending overrides is
 just to fix up the alpha blending stage which doesn't pay attention to
 that for the destination surface.
 
 So, the original patch is:
 
 Reviewed-by: Eric Anholt e...@anholt.net

Ok, thanks for the careful review!
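
Eric's point about CONSTANT_ALPHA can be made concrete with the per-channel
blend equation, result = src * src_factor + dst * dst_factor. The sketch
below is a plain C model with made-up function names, not i965 or GL code;
it only shows that a constant-alpha factor of 1.0 multiplies the source
alpha by 1.0 and therefore behaves exactly like a factor of ONE.

```c
#include <assert.h>

/* One channel of the blend equation: src * src_factor + dst * dst_factor. */
float blend_channel(float src, float src_factor, float dst, float dst_factor)
{
    return src * src_factor + dst * dst_factor;
}

/* Alpha with source factor ONE and destination factor ZERO. */
float alpha_factor_one(float src_a, float dst_a)
{
    return blend_channel(src_a, 1.0f, dst_a, 0.0f);
}

/* Alpha with source factor CONSTANT_ALPHA (blend constant supplied by the
 * caller) and destination factor ZERO. */
float alpha_factor_constant(float src_a, float dst_a, float blend_const_a)
{
    return blend_channel(src_a, blend_const_a, dst_a, 0.0f);
}

/* Color with source factor SRC_COLOR, as in the formula being fixed. */
float color_factor_src_color(float src_c, float dst_c)
{
    return blend_channel(src_c, src_c, dst_c, 0.0f);
}
```

With src_a = 0.25 both alpha variants return 0.25, so neither forces the
output alpha to 1.0; that would need the value to be overridden elsewhere, as
the review suggests. The color helper shows the original problem: a source
color of 0.5 comes out as 0.25.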