Re: [Mesa-dev] [PATCH] i965: Actually check every primitive for cut index support.

2013-09-03 Thread Ian Romanick
On 09/01/2013 07:05 PM, Kenneth Graunke wrote:
 can_cut_index_handle_prims() was passed an array of _mesa_prim objects
 and a count, and ran a loop for that many iterations.  However, it
 treated the array like a pointer, repeatedly checking the first element.

Blarg.  How would an application be able to observe the old bug
behavior?  Can we come up with a test case that might tickle it?

 This patch makes it actually check every primitive.
 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org

Either way, the patch is obviously correct.

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

 ---
  src/mesa/drivers/dri/i965/brw_primitive_restart.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_primitive_restart.c 
 b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
 index 0dbc48f..b305dca 100644
 --- a/src/mesa/drivers/dri/i965/brw_primitive_restart.c
 +++ b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
 @@ -92,8 +92,8 @@ can_cut_index_handle_prims(struct gl_context *ctx,
return false;
 }
  
 -   for ( ; nr_prims > 0; nr_prims--) {
 -      switch(prim->mode) {
 +   for (int i = 0; i < nr_prims; i++) {
 +      switch (prim[i].mode) {
case GL_POINTS:
case GL_LINES:
case GL_LINE_STRIP:
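
To see the difference in isolation, here is a minimal standalone sketch. It is not the driver code: `struct prim`, the `MODE_*` values, and both function names are hypothetical stand-ins for `_mesa_prim`, the GL enums, and can_cut_index_handle_prims(). It suggests one way the old behavior could show up: a draw with several primitives where only a later one uses a mode the cut-index path cannot handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for _mesa_prim and the GL primitive enums. */
enum prim_mode { MODE_POINTS, MODE_LINES, MODE_QUADS /* treated as unsupported */ };
struct prim { enum prim_mode mode; };

static bool mode_supported(enum prim_mode mode)
{
   return mode == MODE_POINTS || mode == MODE_LINES;
}

/* Old behavior: loops nr_prims times but only ever looks at prim[0]. */
static bool check_first_only(const struct prim *prim, int nr_prims)
{
   for ( ; nr_prims > 0; nr_prims--) {
      if (!mode_supported(prim->mode))
         return false;
   }
   return true;
}

/* Fixed behavior: indexes every element of the array. */
static bool check_every_prim(const struct prim *prim, int nr_prims)
{
   for (int i = 0; i < nr_prims; i++) {
      if (!mode_supported(prim[i].mode))
         return false;
   }
   return true;
}
```

With the array `{ MODE_POINTS, MODE_QUADS }`, the old loop accepts the whole batch while the fixed loop rejects it.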
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glx: Initialize OpenGL version to 1.0

2013-09-03 Thread Ian Romanick
Please send patches only using git-send-email.  Sending patches as
attachments prevents people from being able to provide in-line review
comments.

On 09/01/2013 12:30 PM, Rico Schüller wrote:
 
 Some driver/card combinations (r200/RV280, i915/915G) don't support
 OpenGL 2.1. These create in some corner cases an indirect context

This was a typo on my part.  The Linux ABI requires OpenGL 1.2, so every
driver will support that.  I meant to type

   uint32_t major_ver = 1;
   uint32_t minor_ver = 2;

but instead typed

   uint32_t minor_ver = 1;
   uint32_t major_ver = 2;

Copy and paste did the rest. :(

All of your other changes are, I think, unnecessary code motion.

Does making that one change in dri2_glx.c and drisw_glx.c fix the problem?

 instead of a direct context when calling glXCreateContextAttribsARB().
 This happens because of a bad default value. To avoid this, choose a
 more sane default OpenGL 1.0 as mentioned in the ARB_create_context spec:
 The default values for GLX_CONTEXT_MAJOR_VERSION_ARB and
 GLX_CONTEXT_MINOR_VERSION_ARB are 1 and 0 respectively. In this
 case, implementations will typically return the most recent version
 of OpenGL they support which is backwards compatible with OpenGL 1.0
 (e.g. 3.0, 3.1 + GL_ARB_compatibility, or 3.2 compatibility profile)
 
 This fixes: http://bugs.winehq.org/show_bug.cgi?id=34238
 
 Signed-off-by: Rico Schüller kgbric...@web.de
 ---
  src/glx/dri2_glx.c   | 10 +-
  src/glx/dri_common.c | 14 +++---
  src/glx/drisw_glx.c  | 10 +-
  3 files changed, 17 insertions(+), 17 deletions(-)
 


[Mesa-dev] [Bug 54080] glXQueryDrawable fails with GLXBadDrawable for a Window in direct context

2013-09-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=54080

Alexander Monakov amona...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |a...@nwnk.net

--- Comment #2 from Alexander Monakov amona...@gmail.com ---
Adam, you seem to have looked at this a couple of times.  Will it be fixed with
your pending GLX patches?

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH] glx: Initialize OpenGL version to 1.0

2013-09-03 Thread Rico Schüller

On 03.09.2013 01:54, Ian Romanick wrote:

Please send patches only using git-send-email.  Sending patches as
attachments prevents people from being able to provide in-line review
comments.

Ok, sorry for the trouble.



On 09/01/2013 12:30 PM, Rico Schüller wrote:


Some driver/card combinations (r200/RV280, i915/915G) don't support
OpenGL 2.1. These create in some corner cases an indirect context


This was a typo on my part.  The Linux ABI requires OpenGL 1.2, so every
driver will support that.  I meant to type

uint32_t major_ver = 1;
uint32_t minor_ver = 2;

but instead typed

uint32_t minor_ver = 1;
uint32_t major_ver = 2;

Copy and paste did the rest. :(


So yes, we agree here: the version number needs to be fixed. The 
simplest fix is to just change the number. I'm fine with that and have no 
strong opinion about it, though I think it should be consistent across 
all initialization occurrences (in dri_common.c/dri2_glx.c/drisw_glx.c).




All of your other changes are, I think, unnecessary code motion.


Yes, that's correct. It just removed the duplicated code and initialized 
all values in one location. It seems unnecessary to initialize 
some variables in e.g. src/glx/dri2_glx.c:dri2_create_context_attribs 
only for the case when num_attribs is 0 in 
src/glx/dri_common.c:dri2_convert_glx_attribs. Hell, in all other cases 
they are overwritten (in dri_common.c) later anyway. So why bother to 
initialize them in dri2_glx.c/drisw_glx.c at all? It probably should be 
put into a separate patch (if at all)...




Does making that one change in dri2_glx.c and drisw_glx.c fix the problem?


So imho, the only change really needed to fix this is setting the 
version to the correct value (major_ver = 1, minor_ver = 2 (or 0)). The 
other changes are just cosmetic. I tested this with the environment 
variable MESA_GL_VERSION_OVERRIDE=1.3 (and the attachment 
http://bugs.winehq.org/attachment.cgi?id=45801 from wine bug 34238). I 
haven't run the tests on real hardware yet (as I don't have such hardware).
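
As a rough illustration of why one sane default is enough, here is a sketch of the attribute conversion. `convert_version_attribs` is a hypothetical helper, not the actual dri2_convert_glx_attribs signature, and the two token values are the ones I remember from the GLX_ARB_create_context spec, so please double-check them against glxext.h. The defaults only survive when the attribute list does not mention a version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Token values as listed in the GLX_ARB_create_context spec (assumed here);
 * defined locally so the sketch builds without GL headers. */
#define GLX_CONTEXT_MAJOR_VERSION_ARB 0x2091
#define GLX_CONTEXT_MINOR_VERSION_ARB 0x2092

/* Hypothetical helper, not a Mesa function: fills in the requested
 * version from a list of (name, value) attribute pairs. */
static void
convert_version_attribs(unsigned num_attribs, const uint32_t *attribs,
                        uint32_t *major_ver, uint32_t *minor_ver)
{
   /* Spec defaults; they are kept only if no attribute overrides them. */
   *major_ver = 1;
   *minor_ver = 0;

   for (unsigned i = 0; i < num_attribs; i++) {
      switch (attribs[i * 2]) {
      case GLX_CONTEXT_MAJOR_VERSION_ARB:
         *major_ver = attribs[i * 2 + 1];
         break;
      case GLX_CONTEXT_MINOR_VERSION_ARB:
         *minor_ver = attribs[i * 2 + 1];
         break;
      default:
         break;
      }
   }
}
```

Keeping the defaults in this one place is the "initialize all values in one location" idea; callers would not need their own copies.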


Cheers
Rico




instead of a direct context when calling glXCreateContextAttribsARB().
This happens because of a bad default value. To avoid this, choose a
more sane default OpenGL 1.0 as mentioned in the ARB_create_context spec:
 The default values for GLX_CONTEXT_MAJOR_VERSION_ARB and
 GLX_CONTEXT_MINOR_VERSION_ARB are 1 and 0 respectively. In this
 case, implementations will typically return the most recent version
 of OpenGL they support which is backwards compatible with OpenGL 1.0
 (e.g. 3.0, 3.1 + GL_ARB_compatibility, or 3.2 compatibility profile)

This fixes: http://bugs.winehq.org/show_bug.cgi?id=34238

Signed-off-by: Rico Schüller kgbric...@web.de
---
  src/glx/dri2_glx.c   | 10 +-
  src/glx/dri_common.c | 14 +++---
  src/glx/drisw_glx.c  | 10 +-
  3 files changed, 17 insertions(+), 17 deletions(-)



[Mesa-dev] [PATCH 00/11] Transform Feedback for Radeon SI

2013-09-03 Thread Marek Olšák
This series implements transform feedback for Radeon SI, which also enables 
OpenGL 3.0. It requires the LLVM patch I sent yesterday.

Transform feedback is disabled by default on CIK, because my card is very 
unstable with the current kernel DRM. It should work though.

Please review.

Marek


[Mesa-dev] [PATCH 01/11] radeon: don't emit VGT_STRMOUT_BUFFER_BASE on SI

2013-09-03 Thread Marek Olšák
The register doesn't exist on SI.
---
 src/gallium/drivers/radeon/r600_streamout.c | 85 ++---
 1 file changed, 54 insertions(+), 31 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_streamout.c 
b/src/gallium/drivers/radeon/r600_streamout.c
index ab40630..313d737 100644
--- a/src/gallium/drivers/radeon/r600_streamout.c
+++ b/src/gallium/drivers/radeon/r600_streamout.c
@@ -74,23 +74,35 @@ static void r600_so_target_destroy(struct pipe_context *ctx,
 
 void r600_streamout_buffers_dirty(struct r600_common_context *rctx)
 {
+   struct r600_atom *begin = &rctx->streamout.begin_atom;
+   unsigned num_bufs = util_bitcount(rctx->streamout.enabled_mask);
+   unsigned num_bufs_appended = util_bitcount(rctx->streamout.enabled_mask &
+   rctx->streamout.append_bitmask);
+
rctx->streamout.num_dw_for_end =
12 + /* flush_vgt_streamout */
-   util_bitcount(rctx->streamout.enabled_mask) * 8 + /* STRMOUT_BUFFER_UPDATE */
+   num_bufs * 8 + /* STRMOUT_BUFFER_UPDATE */
3 /* set_streamout_enable(0) */;
 
-   rctx->streamout.begin_atom.num_dw =
-   12 + /* flush_vgt_streamout */
-   6 + /* set_streamout_enable */
-   util_bitcount(rctx->streamout.enabled_mask) * 7 + /* SET_CONTEXT_REG */
-   (rctx->family >= CHIP_RS780 &&
-   rctx->family <= CHIP_RV740 ? util_bitcount(rctx->streamout.enabled_mask) * 5 : 0) + /* STRMOUT_BASE_UPDATE */
-   util_bitcount(rctx->streamout.enabled_mask & rctx->streamout.append_bitmask) * 8 + /* STRMOUT_BUFFER_UPDATE */
-   util_bitcount(rctx->streamout.enabled_mask & ~rctx->streamout.append_bitmask) * 6 + /* STRMOUT_BUFFER_UPDATE */
+   begin->num_dw = 12 + /* flush_vgt_streamout */
+   6; /* set_streamout_enable */
+
+   if (rctx->chip_class >= SI) {
+   begin->num_dw += num_bufs * 4; /* SET_CONTEXT_REG */
+   } else {
+   begin->num_dw += num_bufs * 7; /* SET_CONTEXT_REG */
+
+   if (rctx->family >= CHIP_RS780 && rctx->family <= CHIP_RV740)
+   begin->num_dw += num_bufs * 5; /* STRMOUT_BASE_UPDATE */
+   }
+
+   begin->num_dw +=
+   num_bufs_appended * 8 + /* STRMOUT_BUFFER_UPDATE */
+   (num_bufs - num_bufs_appended) * 6 + /* STRMOUT_BUFFER_UPDATE */
(rctx->family > CHIP_R600 && rctx->family < CHIP_RS780 ? 2 : 0) + /* SURFACE_BASE_UPDATE */
rctx->streamout.num_dw_for_end;
 
-   rctx->streamout.begin_atom.dirty = true;
+   begin->dirty = true;
 }
 
 void r600_set_streamout_targets(struct pipe_context *ctx,
@@ -209,7 +221,6 @@ static void r600_emit_streamout_begin(struct r600_common_context *rctx, struct r
struct r600_so_target **t = rctx->streamout.targets;
unsigned *stride_in_dw = rctx->streamout.stride_in_dw;
unsigned i, update_flags = 0;
-   uint64_t va;
 
if (rctx->chip_class >= EVERGREEN) {
evergreen_flush_vgt_streamout(rctx);
@@ -225,34 +236,46 @@ static void r600_emit_streamout_begin(struct r600_common_context *rctx, struct r
 
t[i]->stride_in_dw = stride_in_dw[i];
 
-   va = r600_resource_va(rctx->b.screen,
-   (void*)t[i]->b.buffer);
-
-   update_flags |= SURFACE_BASE_UPDATE_STRMOUT(i);
-
-   r600_write_context_reg_seq(cs, R_028AD0_VGT_STRMOUT_BUFFER_SIZE_0 + 16*i, 3);
-   radeon_emit(cs, (t[i]->b.buffer_offset +
-   t[i]->b.buffer_size) >> 2); /* BUFFER_SIZE (in DW) */
-   radeon_emit(cs, stride_in_dw[i]); /* VTX_STRIDE (in DW) */
-   radeon_emit(cs, va >> 8); /* BUFFER_BASE */
+   if (rctx->chip_class >= SI) {
+   /* SI binds streamout buffers as shader resources.
+* VGT only counts primitives and tells the shader
+* through SGPRs what to do. */
+   r600_write_context_reg_seq(cs, R_028AD0_VGT_STRMOUT_BUFFER_SIZE_0 + 16*i, 2);
+   radeon_emit(cs, (t[i]->b.buffer_offset +
+   t[i]->b.buffer_size) >> 2); /* BUFFER_SIZE (in DW) */
+   radeon_emit(cs, stride_in_dw[i]); /* VTX_STRIDE (in DW) */
+   } else {
+   uint64_t va = r600_resource_va(rctx->b.screen,
+   (void*)t[i]->b.buffer);
 
-   r600_emit_reloc(rctx, &rctx->rings.gfx, r600_resource(t[i]->b.buffer),
-   RADEON_USAGE_WRITE);
+   update_flags |= SURFACE_BASE_UPDATE_STRMOUT(i);
 
-   /* R7xx requires this packet after updating BUFFER_BASE.
-* Without this, R7xx locks up. */
-   if 
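
The dword accounting done in r600_streamout_buffers_dirty() in the first hunk can be modeled on its own. This is only a sketch under assumptions: `struct streamout_counts` and its boolean fields are hypothetical and stand in for the chip_class/family comparisons of the real code; only the numeric constants are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified inputs; the real code reads these from
 * r600_common_context. */
struct streamout_counts {
   unsigned num_bufs;              /* bits set in enabled_mask */
   unsigned num_bufs_appended;     /* bits set in enabled_mask & append_bitmask */
   bool is_si_or_later;            /* stands in for the chip_class check */
   bool needs_base_update;         /* stands in for the RS780..RV740 check */
   bool needs_surface_base_update; /* stands in for the early-R600 check */
};

static unsigned num_dw_for_end(const struct streamout_counts *c)
{
   return 12 +              /* flush_vgt_streamout */
          c->num_bufs * 8 + /* STRMOUT_BUFFER_UPDATE */
          3;                /* set_streamout_enable(0) */
}

static unsigned begin_num_dw(const struct streamout_counts *c)
{
   unsigned num_dw = 12 + /* flush_vgt_streamout */
                     6;   /* set_streamout_enable */

   if (c->is_si_or_later) {
      num_dw += c->num_bufs * 4; /* SET_CONTEXT_REG, no BUFFER_BASE */
   } else {
      num_dw += c->num_bufs * 7; /* SET_CONTEXT_REG */
      if (c->needs_base_update)
         num_dw += c->num_bufs * 5; /* STRMOUT_BASE_UPDATE */
   }

   num_dw += c->num_bufs_appended * 8 +                 /* STRMOUT_BUFFER_UPDATE */
             (c->num_bufs - c->num_bufs_appended) * 6 + /* STRMOUT_BUFFER_UPDATE */
             (c->needs_surface_base_update ? 2 : 0) +   /* SURFACE_BASE_UPDATE */
             num_dw_for_end(c);
   return num_dw;
}
```

For two enabled buffers with one appended, this model gives 71 dwords on SI and 87 on a chip that needs both the 7-dword SET_CONTEXT_REG and STRMOUT_BASE_UPDATE.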

[Mesa-dev] [PATCH 02/11] radeon: don't emit streamout state if there are no streamout buffers

2013-09-03 Thread Marek Olšák
This could happen if set_stream_output_targets is called twice
in a row without a draw call in between.
---
 src/gallium/drivers/radeon/r600_streamout.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_streamout.c 
b/src/gallium/drivers/radeon/r600_streamout.c
index 313d737..18f7d88 100644
--- a/src/gallium/drivers/radeon/r600_streamout.c
+++ b/src/gallium/drivers/radeon/r600_streamout.c
@@ -137,6 +137,8 @@ void r600_set_streamout_targets(struct pipe_context *ctx,
 
if (num_targets) {
r600_streamout_buffers_dirty(rctx);
+   } else {
+   rctx-streamout.begin_atom.dirty = false;
}
 }
 
-- 
1.8.1.2



[Mesa-dev] [PATCH 03/11] radeonsi: integrate shared streamout state

2013-09-03 Thread Marek Olšák
---
 src/gallium/drivers/radeonsi/r600_blit.c   |  4 ++--
 src/gallium/drivers/radeonsi/r600_hw_context.c | 26 +++---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  2 ++
 src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  9 +
 src/gallium/drivers/radeonsi/radeonsi_shader.h |  1 -
 src/gallium/drivers/radeonsi/si_state.c|  6 --
 src/gallium/drivers/radeonsi/si_state_draw.c   | 10 --
 7 files changed, 20 insertions(+), 38 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600_blit.c 
b/src/gallium/drivers/radeonsi/r600_blit.c
index 20c1767..9d7c738 100644
--- a/src/gallium/drivers/radeonsi/r600_blit.c
+++ b/src/gallium/drivers/radeonsi/r600_blit.c
@@ -64,8 +64,8 @@ static void r600_blitter_begin(struct pipe_context *ctx, enum r600_blitter_op op
util_blitter_save_viewport(rctx->blitter, &rctx->queued.named.viewport->viewport);
}
util_blitter_save_vertex_buffer_slot(rctx->blitter, rctx->vertex_buffer);
-   util_blitter_save_so_targets(rctx->blitter, rctx->num_so_targets,
-   (struct pipe_stream_output_target**)rctx->so_targets);
+   util_blitter_save_so_targets(rctx->blitter, rctx->b.streamout.num_targets,
+   (struct pipe_stream_output_target**)rctx->b.streamout.targets);
 
if (op & R600_SAVE_FRAMEBUFFER)
util_blitter_save_framebuffer(rctx->blitter, &rctx->framebuffer);
diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c 
b/src/gallium/drivers/radeonsi/r600_hw_context.c
index db622ba..1a2128e 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -149,7 +149,9 @@ void si_need_cs_space(struct r600_context *ctx, unsigned num_dw,
num_dw += ctx->num_cs_dw_nontimer_queries_suspend;
 
/* Count in streamout_end at the end of CS. */
-   num_dw += ctx->num_cs_dw_streamout_end;
+   if (ctx->b.streamout.begin_emitted) {
+   num_dw += ctx->b.streamout.num_dw_for_end;
+   }
 
/* Count in render_condition(NULL) at the end of CS. */
if (ctx->predicate_drawing) {
@@ -179,10 +181,6 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
struct radeon_winsys_cs *cs = ctx->b.rings.gfx.cs;
bool queries_suspended = false;
 
-#if 0
-   bool streamout_suspended = false;
-#endif
-
if (!cs->cdw)
return;
 
@@ -192,12 +190,12 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
queries_suspended = true;
}
 
-#if 0
-   if (ctx->num_cs_dw_streamout_end) {
-   r600_context_streamout_end(ctx);
-   streamout_suspended = true;
+   ctx->b.streamout.suspended = false;
+
+   if (ctx->b.streamout.begin_emitted) {
+   r600_emit_streamout_end(&ctx->b);
+   ctx->b.streamout.suspended = true;
}
-#endif
 
ctx->b.flags |= R600_CONTEXT_FLUSH_AND_INV_CB |
R600_CONTEXT_FLUSH_AND_INV_CB_META |
@@ -263,12 +261,10 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
si_pm4_emit(ctx, ctx->queued.named.init);
ctx->emitted.named.init = ctx->queued.named.init;
 
-#if 0
-   if (streamout_suspended) {
-   ctx->streamout_start = TRUE;
-   ctx->streamout_append_bitmask = ~0;
+   if (ctx->b.streamout.suspended) {
+   ctx->b.streamout.append_bitmask = ctx->b.streamout.enabled_mask;
+   r600_streamout_buffers_dirty(&ctx->b);
}
-#endif
 
/* resume queries */
if (queries_suspended) {
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index 6ca138f..993f30a 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -248,6 +248,8 @@ static struct pipe_context *r600_create_context(struct pipe_screen *screen, void
rctx->cache_flush = si_atom_cache_flush;
rctx->atoms.cache_flush = &rctx->cache_flush;
 
+   rctx->atoms.streamout_begin = &rctx->b.streamout.begin_atom;
+
switch (rctx->b.chip_class) {
case SI:
case CIK:
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
index 61fdfe2..ed17f2c 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
@@ -137,6 +137,7 @@ struct r600_context {
/* Caches must be flushed after resource descriptors are
 * updated in memory. */
struct r600_atom *cache_flush;
+   struct r600_atom *streamout_begin;
};
struct r600_atom *array[0];
} atoms;
@@ -179,19 +180,11 @@ struct r600_context {
/* The list of active queries. Only one query of each type can be 
active. */
struct 

[Mesa-dev] [PATCH 04/11] radeonsi: initialize the first CS like any other

2013-09-03 Thread Marek Olšák
So that the init state is always emitted first and not later in draw_vbo.

This fixes streamout where the init state, which disables streamout,
was emitted in draw_vbo after streamout was enabled.
---
 src/gallium/drivers/radeonsi/r600.h|  1 +
 src/gallium/drivers/radeonsi/r600_hw_context.c | 11 ---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  5 +++--
 src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  2 ++
 src/gallium/drivers/radeonsi/radeonsi_pm4.c|  1 +
 5 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600.h 
b/src/gallium/drivers/radeonsi/r600.h
index a914ce2..4b43169 100644
--- a/src/gallium/drivers/radeonsi/r600.h
+++ b/src/gallium/drivers/radeonsi/r600.h
@@ -74,6 +74,7 @@ struct r600_screen;
 
 void si_get_backend_mask(struct r600_context *ctx);
 void si_context_flush(struct r600_context *ctx, unsigned flags);
+void si_begin_new_cs(struct r600_context *ctx);
 
 struct r600_query *r600_context_query_create(struct r600_context *ctx, unsigned query_type);
 void r600_context_query_destroy(struct r600_context *ctx, struct r600_query *query);
diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c 
b/src/gallium/drivers/radeonsi/r600_hw_context.c
index 1a2128e..c8fa66c 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -179,15 +179,15 @@ void si_need_cs_space(struct r600_context *ctx, unsigned num_dw,
 void si_context_flush(struct r600_context *ctx, unsigned flags)
 {
struct radeon_winsys_cs *cs = ctx->b.rings.gfx.cs;
-   bool queries_suspended = false;
 
if (!cs->cdw)
return;
 
/* suspend queries */
+   ctx->nontimer_queries_suspended = false;
if (ctx->num_cs_dw_nontimer_queries_suspend) {
r600_context_queries_suspend(ctx);
-   queries_suspended = true;
+   ctx->nontimer_queries_suspended = true;
}
 
ctx->b.streamout.suspended = false;
@@ -245,6 +245,11 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
}
 #endif
 
+   si_begin_new_cs(ctx);
+}
+
+void si_begin_new_cs(struct r600_context *ctx)
+{
ctx->pm4_dirty_cdwords = 0;
 
/* Flush read caches at the beginning of CS. */
@@ -267,7 +272,7 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
}
 
/* resume queries */
-   if (queries_suspended) {
+   if (ctx->nontimer_queries_suspended) {
r600_context_queries_resume(ctx);
}
 
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index 993f30a..e219e36 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -279,14 +279,15 @@ static struct pipe_context *r600_create_context(struct pipe_screen *screen, void
if (rctx->blitter == NULL)
goto fail;
 
-   si_get_backend_mask(rctx); /* this emits commands and must be last */
-
rctx->dummy_pixel_shader =
util_make_fragment_cloneinput_shader(&rctx->b.b, 0,
 TGSI_SEMANTIC_GENERIC,
 TGSI_INTERPOLATE_CONSTANT);
rctx->b.b.bind_fs_state(&rctx->b.b, rctx->dummy_pixel_shader);
 
+   /* these must be last */
+   si_begin_new_cs(rctx);
+   si_get_backend_mask(rctx);
return &rctx->b.b;
 fail:
r600_destroy_context(&rctx->b.b);
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
index ed17f2c..c5059e8 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
@@ -180,6 +180,8 @@ struct r600_context {
/* The list of active queries. Only one query of each type can be active. */
struct list_head active_nontimer_query_list;
unsigned num_cs_dw_nontimer_queries_suspend;
+   /* If queries have been suspended. */
+   bool nontimer_queries_suspended;
 
unsigned backend_mask;
unsigned max_db; /* for OQ */
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pm4.c 
b/src/gallium/drivers/radeonsi/radeonsi_pm4.c
index 37a199d..eed0c47 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pm4.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pm4.c
@@ -242,6 +242,7 @@ void si_pm4_emit_dirty(struct r600_context *rctx)
if (!state || rctx->emitted.array[i] == state)
continue;
 
+   assert(state != rctx->queued.named.init);
si_pm4_emit(rctx, state);
rctx->emitted.array[i] = state;
}
-- 
1.8.1.2
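
A small model of the split may help explain why `queries_suspended` moves from a local variable into the context. Everything here is hypothetical (`struct ctx`, the counters, and the function names are simplified stand-ins, not the radeonsi structures): once the "begin a new CS" half is a separate function that is also called at context creation, the flag has to live somewhere both halves can see.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified context. */
struct ctx {
   unsigned num_dw_queries_suspend; /* nonzero while queries are active */
   bool queries_suspended;          /* must outlive flush_cs(), so it lives here */
   unsigned init_emits;             /* counts init-state emissions */
   unsigned resumes;                /* counts query resumes */
};

/* Runs at the start of every command stream, including the first one. */
static void begin_new_cs(struct ctx *c)
{
   c->init_emits++; /* init state always goes first */
   if (c->queries_suspended)
      c->resumes++; /* resume what flush_cs() suspended */
}

/* Ends the current command stream and starts the next one. */
static void flush_cs(struct ctx *c)
{
   c->queries_suspended = false;
   if (c->num_dw_queries_suspend)
      c->queries_suspended = true;
   begin_new_cs(c);
}
```

Calling `begin_new_cs()` on a zero-initialized context emits the init state without trying to resume anything, which mirrors the call added to r600_create_context().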



[Mesa-dev] [PATCH 05/11] radeonsi: handle rasterizer_discard and set GS_OUT_PRIM_TYPE

2013-09-03 Thread Marek Olšák
---
 src/gallium/drivers/radeonsi/si_state.c  |  1 +
 src/gallium/drivers/radeonsi/si_state_draw.c | 28 +++-
 src/gallium/drivers/radeonsi/sid.h   |  3 +++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index f409af4..650db4f 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -552,6 +552,7 @@ static void *si_create_rs_state(struct pipe_context *ctx,
S_028810_PS_UCP_MODE(3) |
S_028810_ZCLIP_NEAR_DISABLE(!state->depth_clip) |
S_028810_ZCLIP_FAR_DISABLE(!state->depth_clip) |
+   S_028810_DX_RASTERIZATION_KILL(state->rasterizer_discard) |
S_028810_DX_LINEAR_ATTR_CLIP_ENA(1);
 
clip_rule = state->scissor ? 0x : 0x;
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index 581d289..3529660 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -273,12 +273,36 @@ static unsigned si_conv_pipe_prim(unsigned pprim)
return result;
 }
 
+static unsigned r600_conv_prim_to_gs_out(unsigned mode)
+{
+   static const int prim_conv[] = {
+   [PIPE_PRIM_POINTS] = V_028A6C_OUTPRIM_TYPE_POINTLIST,
+   [PIPE_PRIM_LINES] = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   [PIPE_PRIM_LINE_LOOP] = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   [PIPE_PRIM_LINE_STRIP] = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   [PIPE_PRIM_TRIANGLES] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_TRIANGLE_STRIP] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_TRIANGLE_FAN] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_QUADS] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_QUAD_STRIP] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_POLYGON] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_LINES_ADJACENCY] = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   [PIPE_PRIM_LINE_STRIP_ADJACENCY] = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   [PIPE_PRIM_TRIANGLES_ADJACENCY] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY] = V_028A6C_OUTPRIM_TYPE_TRISTRIP
+   };
+   assert(mode < Elements(prim_conv));
+
+   return prim_conv[mode];
+}
+
 static bool si_update_draw_info_state(struct r600_context *rctx,
   const struct pipe_draw_info *info)
 {
struct si_pm4_state *pm4 = si_pm4_alloc_state(rctx);
struct si_shader *vs = &rctx->vs_shader->current->shader;
unsigned prim = si_conv_pipe_prim(info->mode);
+   unsigned gs_out_prim = r600_conv_prim_to_gs_out(info->mode);
unsigned ls_mask = 0;
 
if (pm4 == NULL)
@@ -291,8 +315,10 @@ static bool si_update_draw_info_state(struct r600_context *rctx,
 
if (rctx->b.chip_class >= CIK)
si_pm4_set_reg(pm4, R_030908_VGT_PRIMITIVE_TYPE, prim);
-   else
+   else {
si_pm4_set_reg(pm4, R_008958_VGT_PRIMITIVE_TYPE, prim);
+   si_pm4_set_reg(pm4, R_028A6C_VGT_GS_OUT_PRIM_TYPE, gs_out_prim);
+   }
si_pm4_set_reg(pm4, R_028400_VGT_MAX_VTX_INDX, ~0);
si_pm4_set_reg(pm4, R_028404_VGT_MIN_VTX_INDX, 0);
si_pm4_set_reg(pm4, R_028408_VGT_INDX_OFFSET,
diff --git a/src/gallium/drivers/radeonsi/sid.h 
b/src/gallium/drivers/radeonsi/sid.h
index 7f3329c..c6688b3 100644
--- a/src/gallium/drivers/radeonsi/sid.h
+++ b/src/gallium/drivers/radeonsi/sid.h
@@ -7423,6 +7423,9 @@
 #define   S_028A6C_OUTPRIM_TYPE(x) (((x) & 0x3F) << 0)
 #define   G_028A6C_OUTPRIM_TYPE(x) (((x) >> 0) & 0x3F)
 #define   C_028A6C_OUTPRIM_TYPE 0xFFC0
+#define V_028A6C_OUTPRIM_TYPE_POINTLIST 0
+#define V_028A6C_OUTPRIM_TYPE_LINESTRIP 1
+#define V_028A6C_OUTPRIM_TYPE_TRISTRIP 2
 #define   S_028A6C_OUTPRIM_TYPE_1(x) (((x) & 0x3F) << 8)
 #define   G_028A6C_OUTPRIM_TYPE_1(x) (((x) >> 8) & 0x3F)
 #define   C_028A6C_OUTPRIM_TYPE_1 0xC0FF
-- 
1.8.1.2
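
The lookup in r600_conv_prim_to_gs_out() relies on C99 designated initializers, which keep each table entry next to the enum value that indexes it. Below is a reduced sketch of the same technique. The `PRIM_*` and `OUT_*` names are hypothetical stand-ins for the PIPE_PRIM_* and V_028A6C_OUTPRIM_TYPE_* values (only the 0/1/2 encodings come from the sid.h hunk), and `ARRAY_LEN` replaces Mesa's `Elements` macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few PIPE_PRIM_* values. */
enum prim {
   PRIM_POINTS,
   PRIM_LINES,
   PRIM_LINE_STRIP,
   PRIM_TRIANGLES,
   PRIM_TRIANGLE_FAN
};

/* Output primitive classes, using the encodings from the patch. */
enum out_prim { OUT_POINTLIST = 0, OUT_LINESTRIP = 1, OUT_TRISTRIP = 2 };

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static unsigned conv_prim_to_gs_out(unsigned mode)
{
   /* Each input primitive collapses to one of three output classes. */
   static const int prim_conv[] = {
      [PRIM_POINTS]       = OUT_POINTLIST,
      [PRIM_LINES]        = OUT_LINESTRIP,
      [PRIM_LINE_STRIP]   = OUT_LINESTRIP,
      [PRIM_TRIANGLES]    = OUT_TRISTRIP,
      [PRIM_TRIANGLE_FAN] = OUT_TRISTRIP
   };
   assert(mode < ARRAY_LEN(prim_conv));

   return (unsigned)prim_conv[mode];
}
```

One thing to keep in mind with this pattern: any index that is not listed is zero-initialized, so an omitted primitive would silently map to the point-list value.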



[Mesa-dev] [PATCH 06/11] radeonsi: bind streamout buffers to VGT and the vertex shader

2013-09-03 Thread Marek Olšák
---
 src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  2 +
 src/gallium/drivers/radeonsi/radeonsi_shader.c |  1 +
 src/gallium/drivers/radeonsi/radeonsi_shader.h | 18 ---
 src/gallium/drivers/radeonsi/si_descriptors.c  | 68 ++
 4 files changed, 81 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
index c5059e8..9306790 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
@@ -134,6 +134,7 @@ struct r600_context {
/* The order matters. */
struct r600_atom *const_buffers[SI_NUM_SHADERS];
struct r600_atom *sampler_views[SI_NUM_SHADERS];
+   struct r600_atom *streamout_buffers;
/* Caches must be flushed after resource descriptors are
 * updated in memory. */
struct r600_atom *cache_flush;
@@ -164,6 +165,7 @@ struct r600_context {
unsigned sprite_coord_enable;
unsigned export_16bpc;
struct si_buffer_resources  const_buffers[SI_NUM_SHADERS];
+   struct si_buffer_resources  streamout_buffers;
struct r600_textures_info   samplers[SI_NUM_SHADERS];
struct r600_resource *border_color_table;
unsigned border_color_offset;
diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c 
b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 77915ae..335cd79 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -1364,6 +1364,7 @@ static void create_function(struct si_shader_context *si_shader_ctx)
switch (si_shader_ctx->type) {
case TGSI_PROCESSOR_VERTEX:
params[SI_PARAM_VERTEX_BUFFER] = params[SI_PARAM_CONST];
+   params[SI_PARAM_SO_BUFFER] = params[SI_PARAM_CONST];
params[SI_PARAM_START_INSTANCE] = i32;
last_sgpr = SI_PARAM_START_INSTANCE;
params[SI_PARAM_VERTEX_ID] = i32;
diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.h 
b/src/gallium/drivers/radeonsi/radeonsi_shader.h
index ede8bde..64766c9 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.h
@@ -34,10 +34,11 @@
 #define SI_SGPR_CONST  0
 #define SI_SGPR_SAMPLER 2
 #define SI_SGPR_RESOURCE   4
-#define SI_SGPR_VERTEX_BUFFER  6
-#define SI_SGPR_START_INSTANCE 8
+#define SI_SGPR_VERTEX_BUFFER  6  /* VS only */
+#define SI_SGPR_SO_BUFFER  8  /* VS only, stream-out */
+#define SI_SGPR_START_INSTANCE 10 /* VS only */
 
-#define SI_VS_NUM_USER_SGPR 9
+#define SI_VS_NUM_USER_SGPR 11
 #define SI_PS_NUM_USER_SGPR 6
 
 /* LLVM function parameter indices */
@@ -47,11 +48,12 @@
 
 /* VS only parameters */
 #define SI_PARAM_VERTEX_BUFFER 3
-#define SI_PARAM_START_INSTANCE 4
-#define SI_PARAM_VERTEX_ID 5
-#define SI_PARAM_DUMMY_0   6
-#define SI_PARAM_DUMMY_1   7
-#define SI_PARAM_INSTANCE_ID   8
+#define SI_PARAM_SO_BUFFER 4
+#define SI_PARAM_START_INSTANCE 5
+#define SI_PARAM_VERTEX_ID 6
+#define SI_PARAM_DUMMY_0   7
+#define SI_PARAM_DUMMY_1   8
+#define SI_PARAM_INSTANCE_ID   9
 
 /* PS only parameters */
 #define SI_PARAM_PRIM_MASK 3
diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 5d85448..a8f8781 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -456,6 +456,67 @@ static void si_set_constant_buffer(struct pipe_context *ctx, uint shader, uint s
si_update_descriptors(rctx, &buffers->desc);
 }
 
+/* STREAMOUT BUFFERS */
+
+static void si_set_streamout_targets(struct pipe_context *ctx,
+unsigned num_targets,
+struct pipe_stream_output_target **targets,
+unsigned append_bitmask)
+{
+   struct r600_context *rctx = (struct r600_context *)ctx;
+   struct si_buffer_resources *buffers = &rctx->streamout_buffers;
+   unsigned old_num_targets = rctx->b.streamout.num_targets;
+   unsigned i;
+
+   /* Streamout buffers must be bound in 2 places:
+* 1) in VGT by setting the VGT_STRMOUT registers
+* 2) as shader resources
+*/
+
+   /* Set the VGT regs. */
+   r600_set_streamout_targets(ctx, num_targets, targets, append_bitmask);
+
+   /* Set the shader resources.*/
+   for (i = 0; i < num_targets; i++) {
+   if (targets[i]) {
+   struct pipe_resource *buffer = targets[i]->buffer;
+   uint64_t va = r600_resource_va(ctx->screen, buffer);
+
+   /* Set 
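
The renumbering in radeonsi_shader.h follows from how the user SGPR slots are packed. In the sketch below, the slot widths are my reading of the offsets in the patch (two SGPRs for each descriptor pointer, one for the 32-bit start instance) and are an assumption rather than something the patch states; the enum names are shortened, hypothetical versions of the SI_SGPR_* defines.

```c
#include <assert.h>

/* Slot widths in SGPRs; assumed, inferred from the offsets in the patch. */
#define SGPRS_PER_POINTER 2 /* descriptor pointer */
#define SGPRS_PER_DWORD   1 /* 32-bit value */

/* Each slot starts where the previous one ends. */
enum {
   SGPR_CONST          = 0,
   SGPR_SAMPLER        = SGPR_CONST + SGPRS_PER_POINTER,
   SGPR_RESOURCE       = SGPR_SAMPLER + SGPRS_PER_POINTER,
   SGPR_VERTEX_BUFFER  = SGPR_RESOURCE + SGPRS_PER_POINTER,
   SGPR_SO_BUFFER      = SGPR_VERTEX_BUFFER + SGPRS_PER_POINTER,
   SGPR_START_INSTANCE = SGPR_SO_BUFFER + SGPRS_PER_POINTER,
   VS_NUM_USER_SGPR    = SGPR_START_INSTANCE + SGPRS_PER_DWORD
};

/* Total user SGPRs a vertex shader would need with this layout. */
static unsigned vs_num_user_sgpr(void)
{
   return VS_NUM_USER_SGPR;
}
```

Inserting the streamout pointer after the vertex buffer pointer pushes the start instance from 8 to 10 and the VS total from 9 to 11, matching the new defines.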

[Mesa-dev] [PATCH 07/11] radeonsi: implement streamout flush properly

2013-09-03 Thread Marek Olšák
---
 src/gallium/drivers/radeonsi/si_state_draw.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index 3529660..e65b0cf 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -649,10 +649,16 @@ void si_emit_cache_flush(struct r600_common_context *rctx, struct r600_atom *ato
radeon_emit(cs, EVENT_TYPE(V_028A90_FLUSH_AND_INV_CB_META) | EVENT_INDEX(0));
}
 
+   if (rctx->flags & R600_CONTEXT_STREAMOUT_FLUSH) {
+   /* Needed if streamout buffers are going to be used as a source. */
+   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
+   radeon_emit(cs, EVENT_TYPE(V_028A90_VS_PARTIAL_FLUSH) | EVENT_INDEX(4));
+   }
+
rctx->flags = 0;
 }
 
-const struct r600_atom si_atom_cache_flush = { si_emit_cache_flush, 9 }; /* number of CS dwords */
+const struct r600_atom si_atom_cache_flush = { si_emit_cache_flush, 11 }; /* number of CS dwords */
 
 void si_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info *info)
 {
-- 
1.8.1.2
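
The bump from 9 to 11 in si_atom_cache_flush tracks the two dwords the new branch can emit, since the atom's size has to cover the worst case. A toy model of that bookkeeping follows; `struct cmd_stream`, `emit()`, the flag bit, and the zero packet values are all hypothetical placeholders, not the real winsys API or PM4 encodings.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_STREAMOUT_FLUSH (1u << 0) /* hypothetical flag bit */

enum {
   CACHE_FLUSH_OLD_MAX_DW = 9, /* declared size before the patch */
   STREAMOUT_FLUSH_DW     = 2, /* packet header + event dword */
   CACHE_FLUSH_MAX_DW     = CACHE_FLUSH_OLD_MAX_DW + STREAMOUT_FLUSH_DW
};

/* Hypothetical, fixed-size command stream. */
struct cmd_stream {
   uint32_t buf[16];
   unsigned cdw; /* dwords written so far */
};

static void emit(struct cmd_stream *cs, uint32_t value)
{
   assert(cs->cdw < 16);
   cs->buf[cs->cdw++] = value;
}

/* Emits the optional streamout flush and returns the dwords written. */
static unsigned emit_streamout_flush(struct cmd_stream *cs, unsigned flags)
{
   unsigned start = cs->cdw;

   if (flags & FLAG_STREAMOUT_FLUSH) {
      emit(cs, 0); /* placeholder for the EVENT_WRITE packet header */
      emit(cs, 0); /* placeholder for the VS_PARTIAL_FLUSH event */
   }
   return cs->cdw - start;
}
```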



[Mesa-dev] [PATCH 08/11] radeonsi: fix streamout queries

2013-09-03 Thread Marek Olšák
---
 src/gallium/drivers/radeonsi/r600_hw_context.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c 
b/src/gallium/drivers/radeonsi/r600_hw_context.c
index c8fa66c..b6e7a0f 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -474,8 +474,8 @@ void r600_query_begin(struct r600_context *ctx, struct 
r600_query *query)
case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
cs-buf[cs-cdw++] = PKT3(PKT3_EVENT_WRITE, 2, 0);
cs-buf[cs-cdw++] = 
EVENT_TYPE(EVENT_TYPE_SAMPLE_STREAMOUTSTATS) | EVENT_INDEX(3);
-   cs-buf[cs-cdw++] = query-results_end;
-   cs-buf[cs-cdw++] = 0;
+   cs-buf[cs-cdw++] = va;
+   cs-buf[cs-cdw++] = (va  32UL)  0xFF;
break;
case PIPE_QUERY_TIME_ELAPSED:
cs-buf[cs-cdw++] = PKT3(PKT3_EVENT_WRITE_EOP, 4, 0);
@@ -529,10 +529,11 @@ void r600_query_end(struct r600_context *ctx, struct 
r600_query *query)
case PIPE_QUERY_PRIMITIVES_GENERATED:
case PIPE_QUERY_SO_STATISTICS:
case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
+   va += query->results_end + query->result_size/2;
cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 2, 0);
cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_SAMPLE_STREAMOUTSTATS) | EVENT_INDEX(3);
-   cs->buf[cs->cdw++] = query->results_end + query->result_size/2;
-   cs->buf[cs->cdw++] = 0;
+   cs->buf[cs->cdw++] = va;
+   cs->buf[cs->cdw++] = (va >> 32UL) & 0xFF;
break;
case PIPE_QUERY_TIME_ELAPSED:
va += query->results_end + query->result_size/2;
-- 
1.8.1.2



[Mesa-dev] [PATCH 09/11] radeonsi: implement glDrawTransformFeedback functionality

2013-09-03 Thread Marek Olšák
---
 src/gallium/drivers/radeonsi/si_state.c  |  1 +
 src/gallium/drivers/radeonsi/si_state_draw.c | 23 +++
 src/gallium/drivers/radeonsi/sid.h   |  6 ++
 3 files changed, 30 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 650db4f..e1b4e32 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -3241,6 +3241,7 @@ void si_init_config(struct r600_context *rctx)
si_pm4_set_reg(pm4, R_028A40_VGT_GS_MODE, 0x0);
si_pm4_set_reg(pm4, R_028A84_VGT_PRIMITIVEID_EN, 0x0);
si_pm4_set_reg(pm4, R_028A8C_VGT_PRIMITIVEID_RESET, 0x0);
+   si_pm4_set_reg(pm4, R_028B28_VGT_STRMOUT_DRAW_OPAQUE_OFFSET, 0);
si_pm4_set_reg(pm4, R_028B94_VGT_STRMOUT_CONFIG, 0x0);
si_pm4_set_reg(pm4, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0);
si_pm4_set_reg(pm4, R_028AA8_IA_MULTI_VGT_PARAM,
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index e65b0cf..687410c 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -557,6 +557,29 @@ static void si_state_draw(struct r600_context *rctx,
   S_02800C_NOOP_CULL_DISABLE(1));
}
 
+   if (info->count_from_stream_output) {
+   struct r600_so_target *t =
+   (struct r600_so_target*)info->count_from_stream_output;
+   uint64_t va = r600_resource_va(&rctx->screen->b.b,
+  &t->buf_filled_size->b.b);
+   va += t->buf_filled_size_offset;
+
+   si_pm4_set_reg(pm4, R_028B30_VGT_STRMOUT_DRAW_OPAQUE_VERTEX_STRIDE,
+  t->stride_in_dw);
+
+   si_pm4_cmd_begin(pm4, PKT3_COPY_DATA);
+   si_pm4_cmd_add(pm4,
+  COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
+  COPY_DATA_DST_SEL(COPY_DATA_REG) |
+  COPY_DATA_WR_CONFIRM);
+   si_pm4_cmd_add(pm4, va); /* src address lo */
+   si_pm4_cmd_add(pm4, va >> 32UL); /* src address hi */
+   si_pm4_cmd_add(pm4, R_028B2C_VGT_STRMOUT_DRAW_OPAQUE_BUFFER_FILLED_SIZE >> 2);
+   si_pm4_cmd_add(pm4, 0); /* unused */
+   si_pm4_add_bo(pm4, t->buf_filled_size, RADEON_USAGE_READ);
+   si_pm4_cmd_end(pm4, true);
+   }
+
/* draw packet */
si_pm4_cmd_begin(pm4, PKT3_INDEX_TYPE);
if (ib->index_size == 4) {
diff --git a/src/gallium/drivers/radeonsi/sid.h 
b/src/gallium/drivers/radeonsi/sid.h
index c6688b3..021f4eb 100644
--- a/src/gallium/drivers/radeonsi/sid.h
+++ b/src/gallium/drivers/radeonsi/sid.h
@@ -103,6 +103,12 @@
 #define    WAIT_REG_MEM_EQUAL  3
 #define PKT3_MEM_WRITE 0x3D /* not on CIK */
 #define PKT3_INDIRECT_BUFFER   0x32
+#define PKT3_COPY_DATA 0x40
+#define    COPY_DATA_SRC_SEL(x)    ((x) & 0xf)
+#define    COPY_DATA_REG   0
+#define    COPY_DATA_MEM   1
+#define    COPY_DATA_DST_SEL(x)    (((x) & 0xf) << 8)
+#define    COPY_DATA_WR_CONFIRM    (1 << 20)
 #define PKT3_SURFACE_SYNC  0x43 /* deprecated on CIK, use ACQUIRE_MEM */
 #define PKT3_ME_INITIALIZE 0x44 /* not on CIK */
 #define PKT3_COND_WRITE0x45
-- 
1.8.1.2



[Mesa-dev] [PATCH 11/11] radeonsi: enable streamout AKA transform feedback on SI

2013-09-03 Thread Marek Olšák
CIK is not enabled, because it's very unstable regardless of transform
feedback.
---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index e219e36..5220e41 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -342,6 +342,7 @@ static const char* r600_get_name(struct pipe_screen* 
pscreen)
 static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 {
struct r600_screen *rscreen = (struct r600_screen *)pscreen;
+   bool has_streamout = HAVE_LLVM >= 0x0304 && rscreen->b.chip_class == SI;
 
switch (param) {
/* Supported features (boolean caps). */
@@ -414,20 +415,13 @@ static int r600_get_param(struct pipe_screen* pscreen, 
enum pipe_cap param)
return 0;
 
/* Stream output. */
-#if 0
case PIPE_CAP_MAX_STREAM_OUTPUT_BUFFERS:
-   return debug_get_bool_option("R600_STREAMOUT", FALSE) ? 4 : 0;
+   return has_streamout ? 4 : 0;
case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
-   return debug_get_bool_option("R600_STREAMOUT", FALSE) ? 1 : 0;
+   return has_streamout ? 1 : 0;
case PIPE_CAP_MAX_STREAM_OUTPUT_SEPARATE_COMPONENTS:
case PIPE_CAP_MAX_STREAM_OUTPUT_INTERLEAVED_COMPONENTS:
-   return 16*4;
-#endif
-   case PIPE_CAP_MAX_STREAM_OUTPUT_BUFFERS:
-   case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
-   case PIPE_CAP_MAX_STREAM_OUTPUT_SEPARATE_COMPONENTS:
-   case PIPE_CAP_MAX_STREAM_OUTPUT_INTERLEAVED_COMPONENTS:
-   return 0;
+   return has_streamout ? 32*4 : 0;
 
/* Texturing. */
case PIPE_CAP_MAX_TEXTURE_2D_LEVELS:
-- 
1.8.1.2



[Mesa-dev] [PATCH 10/11] radeonsi: implement streamout shader support

2013-09-03 Thread Marek Olšák
The shader is responsible for writing to streamout buffers using
the TBUFFER_STORE_FORMAT_* instructions.

The locations of some input SGPRs and VGPRs are assigned dynamically, because
the input SGPRs controlling streamout are not declared if they are not needed,
decreasing the indices of all following inputs.
---
 src/gallium/drivers/radeonsi/radeonsi_shader.c | 279 -
 src/gallium/drivers/radeonsi/radeonsi_shader.h |   5 +-
 src/gallium/drivers/radeonsi/si_state_draw.c   |   7 +-
 3 files changed, 276 insertions(+), 15 deletions(-)
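
Editorial note: the dynamic assignment of input locations described in the commit message can be pictured with the small standalone sketch below. It is not the driver's code. The struct, the function name, and the rule that an unused buffer gets no offset parameter are assumptions made for illustration only; the point is that skipping the streamout inputs shifts every later index down.

```c
#include <stdbool.h>

/* Hypothetical mirror of the param_* fields added to si_shader_context. */
struct toy_params {
   int streamout_config;
   int streamout_write_index;
   int streamout_offset[4];
   int vertex_id;
   int instance_id;
   int num_params;
};

/* Assign parameter indices after `num_fixed` always-present inputs.
 * Streamout inputs are only declared when needed, so the indices of
 * vertex_id and instance_id depend on how many of them were declared. */
static void toy_assign_params(struct toy_params *p, int num_fixed,
                              bool has_streamout, const bool buffer_used[4])
{
   int next = num_fixed;

   p->streamout_config = -1;
   p->streamout_write_index = -1;
   for (int i = 0; i < 4; i++)
      p->streamout_offset[i] = -1;

   if (has_streamout) {
      p->streamout_config = next++;
      p->streamout_write_index = next++;
      for (int i = 0; i < 4; i++) {
         if (buffer_used[i]) /* assumption: unused buffers get no input */
            p->streamout_offset[i] = next++;
      }
   }

   p->vertex_id = next++;
   p->instance_id = next++;
   p->num_params = next;
}
```

With four fixed inputs and no streamout, vertex_id lands at index 4; with streamout using buffers 0 and 2, it lands at index 8.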

diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c 
b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 335cd79..92f7cf5 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -34,6 +34,7 @@
 #include "gallivm/lp_bld_logic.h"
 #include "gallivm/lp_bld_tgsi.h"
 #include "gallivm/lp_bld_arit.h"
+#include "gallivm/lp_bld_flow.h"
 #include "radeon_llvm.h"
 #include "radeon_llvm_emit.h"
 #include "util/u_memory.h"
@@ -59,6 +60,11 @@ struct si_shader_context
struct tgsi_token * tokens;
struct si_pipe_shader *shader;
unsigned type; /* TGSI_PROCESSOR_* specifies the type of shader. */
+   int param_streamout_config;
+   int param_streamout_write_index;
+   int param_streamout_offset[4];
+   int param_vertex_id;
+   int param_instance_id;
LLVMValueRef const_md;
LLVMValueRef const_resource;
 #if HAVE_LLVM >= 0x0304
@@ -67,6 +73,7 @@ struct si_shader_context
LLVMValueRef *constants;
LLVMValueRef *resources;
LLVMValueRef *samplers;
+   LLVMValueRef so_buffers[4];
 };
 
 static struct si_shader_context * si_shader_context(
@@ -119,9 +126,12 @@ static LLVMValueRef get_instance_index(
struct radeon_llvm_context * radeon_bld,
unsigned divisor)
 {
+   struct si_shader_context *si_shader_ctx =
+   si_shader_context(&radeon_bld->soa.bld_base);
struct gallivm_state * gallivm = radeon_bld->soa.bld_base.base.gallivm;
 
-   LLVMValueRef result = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_INSTANCE_ID);
+   LLVMValueRef result = LLVMGetParam(radeon_bld->main_fn,
+  si_shader_ctx->param_instance_id);
result = LLVMBuildAdd(gallivm->builder, result, LLVMGetParam(
radeon_bld->main_fn, SI_PARAM_START_INSTANCE), "");
 
@@ -168,7 +178,8 @@ static void declare_input_vs(
} else {
/* Load the buffer index, which is always stored in VGPR0
 * for Vertex Shaders */
-   buffer_index = LLVMGetParam(si_shader_ctx->radeon_bld.main_fn, SI_PARAM_VERTEX_ID);
+   buffer_index = LLVMGetParam(si_shader_ctx->radeon_bld.main_fn,
+   si_shader_ctx->param_vertex_id);
}
 
vec4_type = LLVMVectorType(base->elem_type, 4);
@@ -397,7 +408,8 @@ static void declare_system_value(
unsigned index,
const struct tgsi_full_declaration *decl)
 {
-
+   struct si_shader_context *si_shader_ctx =
+   si_shader_context(&radeon_bld->soa.bld_base);
LLVMValueRef value = 0;
 
switch (decl->Semantic.Name) {
@@ -406,7 +418,8 @@ static void declare_system_value(
break;
 
case TGSI_SEMANTIC_VERTEXID:
-   value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_VERTEX_ID);
+   value = LLVMGetParam(radeon_bld->main_fn,
+si_shader_ctx->param_vertex_id);
break;
 
default:
@@ -651,6 +664,206 @@ static void si_llvm_emit_clipvertex(struct 
lp_build_tgsi_context * bld_base,
}
 }
 
+static void si_dump_streamout(struct pipe_stream_output_info *so)
+{
+   unsigned i;
+
+   if (so->num_outputs)
+   fprintf(stderr, "STREAMOUT\n");
+
+   for (i = 0; i < so->num_outputs; i++) {
+   unsigned mask = ((1 << so->output[i].num_components) - 1) <<
+   so->output[i].start_component;
+   fprintf(stderr, "  %i: BUF%i[%i..%i] <- OUT[%i].%s%s%s%s\n",
+   i, so->output[i].output_buffer,
+   so->output[i].dst_offset, so->output[i].dst_offset + so->output[i].num_components - 1,
+   so->output[i].register_index,
+   mask & 1 ? "x" : "",
+   mask & 2 ? "y" : "",
+   mask & 4 ? "z" : "",
+   mask & 8 ? "w" : "");
+   }
+}
+
+/* TBUFFER_STORE_FORMAT_{X,XY,XYZ,XYZW} - the suffix is selected by 
num_channels=1..4.
+ * The type of vdata must be one of i32 (num_channels=1), v2i32 
(num_channels=2),
+ * or v4i32 (num_channels=3,4). */
+static void build_tbuffer_store(struct si_shader_context *shader,
+   LLVMValueRef rsrc,
+   LLVMValueRef vdata,
+   unsigned num_channels,
+ 

Re: [Mesa-dev] [PATCH] i965: Actually check every primitive for cut index support.

2013-09-03 Thread Paul Berry
On 2 September 2013 16:47, Ian Romanick i...@freedesktop.org wrote:

 On 09/01/2013 07:05 PM, Kenneth Graunke wrote:
  can_cut_index_handle_prims() was passed an array of _mesa_prim objects
  and a count, and ran a loop for that many iterations.  However, it
  treated the array like a pointer, repeatedly checking the first element.

 Blarg.  How would an application be able to observe the old bug
 behavior?  Can we come up with a test case that might tickle it?


We already had some discussion about this (see
http://lists.freedesktop.org/archives/mesa-dev/2013-August/044129.html).
 Briefly, I wasn't able to come up with a test case that produced incorrect
rendering, but I was able to produce a test case where prim[i] != prim[j].
 Considering how difficult the VBO module is to understand, I think it's
wise to err on the safe side and do what Ken has done in this patch.

Patch is:

Reviewed-by: Paul Berry stereotype...@gmail.com
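
Editorial note: for readers skimming the thread, the difference between the old and the new loop can be reproduced in a few lines of standalone C. This is only a sketch; `toy_prim`, `toy_mode_supported` and the choice of an unsupported mode are stand-ins invented here, not the i965 code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct _mesa_prim; only the mode matters. */
enum toy_mode { TOY_POINTS, TOY_LINES, TOY_QUADS };
struct toy_prim { enum toy_mode mode; };

/* Assumption for the sketch: quads are the one unsupported mode. */
static bool toy_mode_supported(enum toy_mode mode)
{
   return mode != TOY_QUADS;
}

/* Old behaviour: iterates nr_prims times but only ever reads prim[0]. */
static bool check_first_only(const struct toy_prim *prim, int nr_prims)
{
   for ( ; nr_prims > 0; nr_prims--) {
      if (!toy_mode_supported(prim->mode))
         return false;
   }
   return true;
}

/* Fixed behaviour: indexes the array so every primitive is inspected. */
static bool check_all(const struct toy_prim *prim, int nr_prims)
{
   for (int i = 0; i < nr_prims; i++) {
      if (!toy_mode_supported(prim[i].mode))
         return false;
   }
   return true;
}
```

Given the array { POINTS, QUADS }, the first function accepts the draw while the second rejects it, which is the case where prim[i] != prim[j].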



  This patch makes it actually check every primitive.
 
  Signed-off-by: Kenneth Graunke kenn...@whitecape.org

 Either way, the patch is obviously correct.

 Reviewed-by: Ian Romanick ian.d.roman...@intel.com

  ---
   src/mesa/drivers/dri/i965/brw_primitive_restart.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_primitive_restart.c
 b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
  index 0dbc48f..b305dca 100644
  --- a/src/mesa/drivers/dri/i965/brw_primitive_restart.c
  +++ b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
  @@ -92,8 +92,8 @@ can_cut_index_handle_prims(struct gl_context *ctx,
 return false;
  }
 
  -   for ( ; nr_prims > 0; nr_prims--) {
  -  switch(prim->mode) {
  +   for (int i = 0; i < nr_prims; i++) {
  +  switch (prim[i].mode) {
 case GL_POINTS:
 case GL_LINES:
 case GL_LINE_STRIP:
 



Re: [Mesa-dev] [PATCH 4/6] i965/fs: Optimize IF/MOV/ELSE/MOV/ENDIF to SEL when possible.

2013-09-03 Thread Rob Clark
On Tue, Aug 6, 2013 at 3:24 PM, Christoph Bumiller
e0425...@student.tuwien.ac.at wrote:
 On 06.08.2013 19:19, Matt Turner wrote:
 On Tue, Aug 6, 2013 at 4:14 AM, Christoph Bumiller
 e0425...@student.tuwien.ac.at wrote:
 On 06.08.2013 03:28, Kenneth Graunke wrote:
 Many GLSL shaders contain code of the form:

x = condition ? foo : bar

 The compiler emits an ir_if tree for this, since each subexpression
 might be a complex tree that could have side-effects and short-circuit
 logic operations.

 However, the common case is to simply pick one of two constants or
 variable's values---which is exactly what SEL is for.  Replacing IF/ELSE
 with SEL also simplifies the control flow graph, making optimization
 passes which work on basic blocks more effective.
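
Editorial note: the transformation described above can be sketched on a toy instruction list. Everything here (the `toy_inst` layout, the opcode names, the rule that both MOVs must write the same destination) is an assumption for illustration and not the i965 backend's representation. The condition consumed by SEL is taken to be the flag the IF already tested, so it is left implicit.

```c
#include <stddef.h>

enum toy_op { TOY_IF, TOY_ELSE, TOY_ENDIF, TOY_MOV, TOY_SEL, TOY_NOP };

struct toy_inst {
   enum toy_op op;
   int dst;   /* destination register, -1 if unused */
   int src0;  /* first source register, -1 if unused */
   int src1;  /* second source register, -1 if unused */
};

/* Rewrite  IF / MOV d,a / ELSE / MOV d,b / ENDIF  into  SEL d,a,b
 * in place, turning the four leftover slots into NOPs.
 * Returns the number of sequences that were rewritten. */
static int toy_opt_if_to_sel(struct toy_inst *insts, size_t count)
{
   int progress = 0;

   for (size_t i = 0; i + 5 <= count; i++) {
      struct toy_inst *s = &insts[i];

      if (s[0].op != TOY_IF || s[1].op != TOY_MOV ||
          s[2].op != TOY_ELSE || s[3].op != TOY_MOV ||
          s[4].op != TOY_ENDIF)
         continue;

      /* Both branches must assign the same destination. */
      if (s[1].dst != s[3].dst)
         continue;

      int dst = s[1].dst, a = s[1].src0, b = s[3].src0;
      s[0] = (struct toy_inst){ TOY_SEL, dst, a, b };
      for (int k = 1; k < 5; k++)
         s[k] = (struct toy_inst){ TOY_NOP, -1, -1, -1 };
      progress++;
   }
   return progress;
}
```

Because the IF/ELSE/ENDIF markers disappear, the five instructions collapse into a single basic block, which is the control-flow simplification the commit message mentions.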

 Don't you think something like that should be implemented in common code
 so that all drivers can profit ?
 We would love that. As part of a work in progress, I'm adding
 conditional-select to the GLSL IR. We planned a few months ago to do
 this as a step toward SSA at the IR level, but have only laid a little
 bit of groundwork in that direction (Ian's vector insert/extract
 series).

 Looks like your backend already does SSA. Shouldn't that be
 implemented in common code? :)

 Then the code would have to run on GLSL IR as well as my internal IR
 because the intermediate one, TGSI, shouldn't be in SSA form, and
 abstracting an IR doesn't sound particularly fun.

btw, I'd *love* an option to get TGSI in SSA form (or at least a form
easier to turn back into SSA)..  it is starting to look like doing
anything vaguely clever w/ freedreno compiler will require essentially
turning TGSI into SSA, and I guess other drivers will need the same.
(Tegra will for sure, for if/else/endif in frag shader.  But I guess
it would be useful to others.)

I guess add TGSI_OPCODE_PHI plus maybe some hint or instruction to
indicate when a register is no longer used (maybe not needed, but
otherwise maybe for large programs tgsi_{src,dst}_register.Index might
overflow?).

BR,
-R

 Also I don't have to handle vectors so it's a bit simpler, actually
 pretty straightforward if you implement an existing algorithm.
 As for some other passes that could be shared, I still need them in the
 backend to be applied to device-specific code sequences, you probably
 have a similar situation.

 It would be really nice to have more, useful device-independent
 optimizations or simplifications like this already done instead of
 requiring each driver to re-implement them (or use llvm).
 Yes, it definitely would.



Re: [Mesa-dev] [PATCH 4/4] glsl: Remove unused prog parameter from tfeedback_decl::init

2013-09-03 Thread Paul Berry
On 30 August 2013 16:07, Ian Romanick i...@freedesktop.org wrote:

 From: Ian Romanick ian.d.roman...@intel.com

 It looks like commit 53febac removed the last user of that parameter.

 Signed-off-by: Ian Romanick ian.d.roman...@intel.com
 Cc: Paul Berry stereotype...@gmail.com


Patches 2-4 are:

Reviewed-by: Paul Berry stereotype...@gmail.com


 ---
  src/glsl/link_varyings.cpp | 6 +++---
  src/glsl/link_varyings.h   | 3 +--
  2 files changed, 4 insertions(+), 5 deletions(-)

 diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
 index 7a61b1a..c3b5855 100644
 --- a/src/glsl/link_varyings.cpp
 +++ b/src/glsl/link_varyings.cpp
 @@ -236,8 +236,8 @@ cross_validate_outputs_to_inputs(struct
 gl_shader_program *prog,
   * will fail to find any matching variable.
   */
  void
 -tfeedback_decl::init(struct gl_context *ctx, struct gl_shader_program
 *prog,
 - const void *mem_ctx, const char *input)
 +tfeedback_decl::init(struct gl_context *ctx, const void *mem_ctx,
 + const char *input)
  {
 /* We don't have to be pedantic about what is a valid GLSL variable
 name,
  * because any variable with an invalid name can't exist in the IR
 anyway.
 @@ -507,7 +507,7 @@ parse_tfeedback_decls(struct gl_context *ctx, struct
 gl_shader_program *prog,
char **varying_names, tfeedback_decl *decls)
  {
 for (unsigned i = 0; i < num_names; ++i) {
 -  decls[i].init(ctx, prog, mem_ctx, varying_names[i]);
 +  decls[i].init(ctx, mem_ctx, varying_names[i]);

if (!decls[i].is_varying())
   continue;
 diff --git a/src/glsl/link_varyings.h b/src/glsl/link_varyings.h
 index 302ab5c..6264ef0 100644
 --- a/src/glsl/link_varyings.h
 +++ b/src/glsl/link_varyings.h
 @@ -91,8 +91,7 @@ struct tfeedback_candidate
  class tfeedback_decl
  {
  public:
 -   void init(struct gl_context *ctx, struct gl_shader_program *prog,
 - const void *mem_ctx, const char *input);
 +   void init(struct gl_context *ctx, const void *mem_ctx, const char
 *input);
 static bool is_same(const tfeedback_decl &x, const tfeedback_decl &y);
 bool assign_location(struct gl_context *ctx,
  struct gl_shader_program *prog);
 --
 1.8.1.4




Re: [Mesa-dev] [PATCH] i965/gen7.5: Fix lower bound on number of VS URB entries.

2013-09-03 Thread Chad Versace

On 08/31/2013 10:34 PM, Paul Berry wrote:

Haswell GT2 and GT3 require the number of vertex shader URB entries to
be at least 64, not 32.

At the moment, we always meet this requirement automatically, because
in the absence of a geometry shader, we assign all available URB space
to the vertex shader.  But when we turn on support for geometry
shaders, this lower limit will become important.
---
  src/mesa/drivers/dri/i965/brw_context.c | 7 +++
  src/mesa/drivers/dri/i965/brw_context.h | 1 +
  src/mesa/drivers/dri/i965/gen6_urb.c| 2 +-
  src/mesa/drivers/dri/i965/gen7_urb.c| 7 ---
  4 files changed, 13 insertions(+), 4 deletions(-)


Reviewed-by: Chad Versace chad.vers...@linux.intel.com
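
Editorial note: the lower bound discussed in the commit message amounts to a clamp on the requested entry count. The sketch below is a guess at the shape of such a check, not the code in gen7_urb.c; the struct, the function name and the upper limit used in the usage note are placeholders, and only the minimum of 64 comes from the message above.

```c
/* Hypothetical per-device limits; field names are illustrative. */
struct toy_urb_limits {
   unsigned min_vs_entries;
   unsigned max_vs_entries;
};

/* Clamp a requested VS entry count into the device's legal range. */
static unsigned toy_clamp_vs_entries(const struct toy_urb_limits *limits,
                                     unsigned wanted)
{
   if (wanted < limits->min_vs_entries)
      return limits->min_vs_entries;
   if (wanted > limits->max_vs_entries)
      return limits->max_vs_entries;
   return wanted;
}
```

With limits of { 64, 1000 } (the 1000 is a made-up ceiling), a request for 32 entries is raised to 64, which is the situation that only starts to matter once a geometry shader takes part of the URB.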



Re: [Mesa-dev] [PATCH 1/4] glsl: Refactor a bunch of the code out of cross_validate_outputs_to_inputs

2013-09-03 Thread Paul Berry
On 30 August 2013 16:07, Ian Romanick i...@freedesktop.org wrote:

 From: Ian Romanick ian.d.roman...@intel.com

 The new function, cross_validate_types_and_qualifiers, will have
 multiple callers from this file in future commits.

 Signed-off-by: Ian Romanick ian.d.roman...@intel.com
 ---
  src/glsl/link_varyings.cpp | 171
 +
  1 file changed, 94 insertions(+), 77 deletions(-)

 diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
 index 4ceb1d3..a1899f7 100644
 --- a/src/glsl/link_varyings.cpp
 +++ b/src/glsl/link_varyings.cpp
 @@ -41,6 +41,97 @@


  /**
 + * Validate the types and qualifiers of an output from one stage against
 the
 + * matching input to another stage.
 + */
 +static void
 +cross_validate_types_and_qualifiers(struct gl_shader_program *prog,
 +const ir_variable *input,
 +const ir_variable *output,
 +GLenum consumer_type,
 +const char *consumer_stage,
 +const char *producer_stage)


It seems redundant to pass both consumer_type and consumer_stage as
arguments, since the latter is just
_mesa_glsl_shader_target_name(consumer_type).  You might want to just pass
consumer_type and producer_type, and use _mesa_glsl_shader_target_name() to
convert them to strings in the event of an error.

However, it's extra bookkeeping work to do that, so I'm ambivalent about
it.  Either way,

Reviewed-by: Paul Berry stereotype...@gmail.com
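
Editorial note: the suggestion above (pass only the stage types and derive the names when an error is actually reported) could look roughly like the following. The enum, `toy_stage_name` and the message text are hypothetical stand-ins; in Mesa the lookup would be `_mesa_glsl_shader_target_name()`, whose exact signature is not reproduced here.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stage enum standing in for the GLenum shader types. */
enum toy_stage { TOY_STAGE_VERTEX, TOY_STAGE_GEOMETRY, TOY_STAGE_FRAGMENT };

/* Stand-in for the type-to-string helper named in the review. */
static const char *toy_stage_name(enum toy_stage stage)
{
   switch (stage) {
   case TOY_STAGE_VERTEX:   return "vertex";
   case TOY_STAGE_GEOMETRY: return "geometry";
   case TOY_STAGE_FRAGMENT: return "fragment";
   }
   return "unknown";
}

/* Callers pass only the stage types; the names are looked up here,
 * on the error path, instead of being carried as extra arguments. */
static int toy_report_type_mismatch(char *buf, size_t size,
                                    enum toy_stage producer,
                                    enum toy_stage consumer,
                                    const char *name)
{
   return snprintf(buf, size,
                   "%s shader output `%s' does not match %s shader input",
                   toy_stage_name(producer), name,
                   toy_stage_name(consumer));
}
```

The trade-off is the one noted in the review: fewer arguments at each call site, at the cost of a little extra bookkeeping for the producer type.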


 +{
 +   /* Check that the types match between stages.
 +*/
 +   const glsl_type *type_to_match = input->type;
 +   if (consumer_type == GL_GEOMETRY_SHADER) {
 +  assert(type_to_match->is_array()); /* Enforced by ast_to_hir */
 +  type_to_match = type_to_match->element_type();
 +   }
 +   if (type_to_match != output->type) {
 +  /* There is a bit of a special case for gl_TexCoord.  This
 +   * built-in is unsized by default.  Applications that variable
 +   * access it must redeclare it with a size.  There is some
 +   * language in the GLSL spec that implies the fragment shader
 +   * and vertex shader do not have to agree on this size.  Other
 +   * driver behave this way, and one or two applications seem to
 +   * rely on it.
 +   *
 +   * Neither declaration needs to be modified here because the array
 +   * sizes are fixed later when update_array_sizes is called.
 +   *
 +   * From page 48 (page 54 of the PDF) of the GLSL 1.10 spec:
 +   *
 +   * "Unlike user-defined varying variables, the built-in
 +   * varying variables don't have a strict one-to-one
 +   * correspondence between the vertex language and the
 +   * fragment language."
 +   */
 +  if (!output->type->is_array()
 +  || (strncmp("gl_", output->name, 3) != 0)) {
 + linker_error(prog,
 +  "%s shader output `%s' declared as type `%s', "
 +  "but %s shader input declared as type `%s'\n",
 +  producer_stage, output->name,
 +  output->type->name,
 +  consumer_stage, input->type->name);
 + return;
 +  }
 +   }
 +
 +   /* Check that all of the qualifiers match between stages.
 +*/
 +   if (input->centroid != output->centroid) {
 +  linker_error(prog,
 +   "%s shader output `%s' %s centroid qualifier, "
 +   "but %s shader input %s centroid qualifier\n",
 +   producer_stage,
 +   output->name,
 +   (output->centroid) ? "has" : "lacks",
 +   consumer_stage,
 +   (input->centroid) ? "has" : "lacks");
 +  return;
 +   }
 +
 +   if (input->invariant != output->invariant) {
 +  linker_error(prog,
 +   "%s shader output `%s' %s invariant qualifier, "
 +   "but %s shader input %s invariant qualifier\n",
 +   producer_stage,
 +   output->name,
 +   (output->invariant) ? "has" : "lacks",
 +   consumer_stage,
 +   (input->invariant) ? "has" : "lacks");
 +  return;
 +   }
 +
 +   if (input->interpolation != output->interpolation) {
 +  linker_error(prog,
 +   "%s shader output `%s' specifies %s "
 +   "interpolation qualifier, "
 +   "but %s shader input specifies %s "
 +   "interpolation qualifier\n",
 +   producer_stage,
 +   output->name,
 +   output->interpolation_string(),
 +   consumer_stage,
 +   input->interpolation_string());
 +  return;
 +   }
 +}
 +
 +/**
   * Validate that outputs from one stage match inputs of another
   */
  void
 @@ -81,83 +172,9 @@ cross_validate_outputs_to_inputs(struct
 gl_shader_program *prog,

  

[Mesa-dev] [PATCH] gallivm: support indirect registers on both dimensions

2013-09-03 Thread Zack Rusin
We support indirect addressing only on the vertex index, but some
shaders also use indirect addressing on attributes. This patch
adds support for indirect addressing on both dimensions inside
gs arrays.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_llvm.c  | 23 +--
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h |  3 ++-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |  4 +++-
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 820d6b0..03668d9 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1360,8 +1360,9 @@ clipmask_booli32(struct gallivm_state *gallivm,
 static LLVMValueRef
 draw_gs_llvm_fetch_input(const struct lp_build_tgsi_gs_iface *gs_iface,
  struct lp_build_tgsi_context * bld_base,
- boolean is_indirect,
+ boolean is_vindex_indirect,
  LLVMValueRef vertex_index,
+ boolean is_aindex_indirect,
  LLVMValueRef attrib_index,
  LLVMValueRef swizzle_index)
 {
@@ -1372,18 +1373,28 @@ draw_gs_llvm_fetch_input(const struct 
lp_build_tgsi_gs_iface *gs_iface,
LLVMValueRef res;
struct lp_type type = bld_base->base.type;
 
-   if (is_indirect) {
+   if (is_vindex_indirect || is_aindex_indirect) {
   int i;
   res = bld_base->base.zero;
   for (i = 0; i < type.length; ++i) {
  LLVMValueRef idx = lp_build_const_int32(gallivm, i);
- LLVMValueRef vert_chan_index = LLVMBuildExtractElement(builder,
-vertex_index, idx, "");
+ LLVMValueRef vert_chan_index = vertex_index;
+ LLVMValueRef attr_chan_index = attrib_index;
  LLVMValueRef channel_vec, value;
+
+ if (is_vindex_indirect) {
+vert_chan_index = LLVMBuildExtractElement(builder,
+  vertex_index, idx, "");
+ }
+ if (is_aindex_indirect) {
+attr_chan_index = LLVMBuildExtractElement(builder,
+  attrib_index, idx, "");
+ }
+
  indices[0] = vert_chan_index;
- indices[1] = attrib_index;
+ indices[1] = attr_chan_index;
  indices[2] = swizzle_index;
- 
+
  channel_vec = LLVMBuildGEP(builder, gs->input, indices, 3, "");
  channel_vec = LLVMBuildLoad(builder, channel_vec, "");
  value = LLVMBuildExtractElement(builder, channel_vec, idx, "");
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
index 522302e..8bcdbc8 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
@@ -395,8 +395,9 @@ struct lp_build_tgsi_gs_iface
 {
LLVMValueRef (*fetch_input)(const struct lp_build_tgsi_gs_iface *gs_iface,
struct lp_build_tgsi_context * bld_base,
-   boolean is_indirect,
+   boolean is_vindex_indirect,
LLVMValueRef vertex_index,
+   boolean is_aindex_indirect,
LLVMValueRef attrib_index,
LLVMValueRef swizzle_index);
void (*emit_vertex)(const struct lp_build_tgsi_gs_iface *gs_iface,
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 4c6b6ec..e50f1d1 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -1135,7 +1135,9 @@ emit_fetch_gs_input(
 
res = bld->gs_iface->fetch_input(bld->gs_iface, bld_base,
 reg->Dimension.Indirect,
-vertex_index, attrib_index,
+vertex_index,
+reg->Register.Indirect,
+attrib_index,
 swizzle_index);
 
assert(res);
-- 
1.8.3.2


Re: [Mesa-dev] [PATCH] gallivm: support indirect registers on both dimensions

2013-09-03 Thread Brian Paul

On 09/03/2013 11:50 AM, Zack Rusin wrote:

We support indirect addressing only on the vertex index, but some
shaders also use indirect addressing on attributes. This patch
adds support for indirect addressing on both dimensions inside
gs arrays.

Signed-off-by: Zack Rusin za...@vmware.com
---
  src/gallium/auxiliary/draw/draw_llvm.c  | 23 +--
  src/gallium/auxiliary/gallivm/lp_bld_tgsi.h |  3 ++-
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |  4 +++-
  3 files changed, 22 insertions(+), 8 deletions(-)




Reviewed-by: Brian Paul bri...@vmware.com
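
Editorial note: stripped of LLVM, the per-channel gather that the patch builds can be sketched in scalar C. This is a simplified model under stated assumptions: the swizzle dimension is dropped, the input is a flat array laid out as [vertex][attribute][lane], and a non-indirect index is represented by element 0 of its index array. None of these names or layouts come from draw_llvm.c.

```c
#include <stdbool.h>

#define TOY_LANES   4  /* SIMD width of the sketch */
#define TOY_ATTRIBS 2  /* attributes per vertex in the sketch */

/* For every lane, pick a vertex and an attribute. An index that is
 * not indirect is uniform across lanes (element 0 is used); an
 * indirect one is read per lane, mirroring the two
 * LLVMBuildExtractElement calls in the patch. */
static void toy_fetch_input(const float *input,
                            bool is_vindex_indirect,
                            const int vertex_index[TOY_LANES],
                            bool is_aindex_indirect,
                            const int attrib_index[TOY_LANES],
                            float result[TOY_LANES])
{
   for (int lane = 0; lane < TOY_LANES; lane++) {
      int v = is_vindex_indirect ? vertex_index[lane] : vertex_index[0];
      int a = is_aindex_indirect ? attrib_index[lane] : attrib_index[0];

      result[lane] = input[(v * TOY_ATTRIBS + a) * TOY_LANES + lane];
   }
}
```

Before the patch only the vertex index could vary per lane; the second flag lets the attribute index vary as well, without changing the path where both are uniform.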



Re: [Mesa-dev] [PATCH 2/3] i965/gen7: Set MOCS L3 cacheability for IVB/BYT

2013-09-03 Thread Ville Syrjälä
On Thu, Aug 15, 2013 at 10:39:31PM +0200, Vedran Rodic wrote:
  We do have the set_caching ioctl. It's enough to flip the PTEs to UC and
  let MOCS manage things. I actually did a few experiments on my IVB. I
  made all Mesa's buffers UC via PTEs by patching libdrm to change the
  cache mode of each bo after allocation. Then I fiddled with the MOCS
  LLC bits in various ways. It definitely has an effect, sometimes making
  things slower, sometimes faster. xonotic again seemed to benefit. IIRC
  leaving everything LLC uncached was actually the fastest (w/ high quality
  at least) so we may be thrashing the LLC a bit there. But eg. reaction
  quake regressed quite a lot if most things were left as UC.
 
 Can you share the libdrm patch?

Sorry, forgot to reply.

Here's the patch if you're still interested.

From 47f51b19137603dccaa4fcb2a703d56335c292fe Mon Sep 17 00:00:00 2001
From: Ville Syrjälä ville.syrj...@linux.intel.com
Date: Wed, 14 Aug 2013 15:12:29 +0300
Subject: [PATCH] make bos uncached in PTEs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 intel/intel_bufmgr_gem.c | 60 ++--
 1 file changed, 53 insertions(+), 7 deletions(-)

diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index f98f7a7..32ff260 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -243,6 +243,10 @@ drm_intel_gem_bo_get_tiling(drm_intel_bo *bo, uint32_t * 
tiling_mode,
uint32_t * swizzle_mode);
 
 static int
+drm_intel_gem_bo_set_caching_internal(drm_intel_bo *bo,
+ uint32_t cache_mode);
+
+static int
 drm_intel_gem_bo_set_tiling_internal(drm_intel_bo *bo,
 uint32_t tiling_mode,
 uint32_t stride);
@@ -695,6 +699,7 @@ retry:
drm_intel_gem_bo_free(&bo_gem->bo);
goto retry;
}
+
}
}
pthread_mutex_unlock(&bufmgr_gem->lock);
@@ -761,9 +766,16 @@ drm_intel_gem_bo_alloc_for_render(drm_intel_bufmgr *bufmgr,
  unsigned long size,
  unsigned int alignment)
 {
-   return drm_intel_gem_bo_alloc_internal(bufmgr, name, size,
-  BO_ALLOC_FOR_RENDER,
-  I915_TILING_NONE, 0);
+   drm_intel_bo *bo;
+
+   bo = drm_intel_gem_bo_alloc_internal(bufmgr, name, size,
+BO_ALLOC_FOR_RENDER,
+I915_TILING_NONE, 0);
+
+   if (bo)
+   drm_intel_gem_bo_set_caching_internal(bo, I915_CACHEING_NONE);
+
+   return bo;
 }
 
 static drm_intel_bo *
@@ -772,8 +784,15 @@ drm_intel_gem_bo_alloc(drm_intel_bufmgr *bufmgr,
   unsigned long size,
   unsigned int alignment)
 {
-   return drm_intel_gem_bo_alloc_internal(bufmgr, name, size, 0,
-  I915_TILING_NONE, 0);
+   drm_intel_bo *bo;
+
+   bo = drm_intel_gem_bo_alloc_internal(bufmgr, name, size, 0,
+I915_TILING_NONE, 0);
+
+   if (bo)
+   drm_intel_gem_bo_set_caching_internal(bo, I915_CACHEING_CACHED);
+
+   return bo;
 }
 
 static drm_intel_bo *
@@ -784,6 +803,7 @@ drm_intel_gem_bo_alloc_tiled(drm_intel_bufmgr *bufmgr, 
const char *name,
drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
unsigned long size, stride;
uint32_t tiling;
+   drm_intel_bo *bo;
 
do {
unsigned long aligned_y, height_alignment;
@@ -824,8 +844,13 @@ drm_intel_gem_bo_alloc_tiled(drm_intel_bufmgr *bufmgr, 
const char *name,
if (tiling == I915_TILING_NONE)
stride = 0;
 
-   return drm_intel_gem_bo_alloc_internal(bufmgr, name, size, flags,
-  tiling, stride);
+   bo = drm_intel_gem_bo_alloc_internal(bufmgr, name, size, flags,
+tiling, stride);
+
+   if (bo)
+   drm_intel_gem_bo_set_caching_internal(bo, I915_CACHEING_NONE);
+
+   return bo;
 }
 
 /**
@@ -2363,6 +2388,27 @@ drm_intel_gem_bo_unpin(drm_intel_bo *bo)
 }
 
 static int
+drm_intel_gem_bo_set_caching_internal(drm_intel_bo *bo,
+ uint32_t cache_mode)
+{
+   drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+   drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+   struct drm_i915_gem_cacheing set_caching;
+   int ret;
+
+   memset(&set_caching, 0, sizeof(set_caching));
+
+   set_caching.handle = bo_gem->gem_handle;
+   

Re: [Mesa-dev] [PATCH] swrast: Fix crash in sPriv->swrast_loader->getImage().

2013-09-03 Thread Brian Paul

On 09/01/2013 08:19 AM, Johannes Obermayr wrote:

From: Egbert Eich e...@freedesktop.org

When glXBindTexImageEXT is called and SWrast is used there will be a crash when
sPriv->swrast_loader->getImage() is called from swrastSetTexBuffer2(). Reason:
no memory has been allocated for the destination thus texImage->Data is NULL.
Call ctx->Driver.TexImage2D() to initialize this. If memory has been allocated
in a previous call free it first.

Fixes: https://bugzilla.novell.com/show_bug.cgi?id=641297

Signed-off-by: Egbert Eich e...@freedesktop.org
Adapted-by: Stefan Dirsch sndir...@suse.com
Adapted-by: Tobias Johannes Klausmann tobias.johannes.klausm...@mni.thm.de
---

I am not sure whether this patch from Q3/2010 is required these days. But
openSUSE still applies it ...
If yes it should also land in 9.2, 9.1, 9.0 and 8.0 branches.

---
  src/mesa/drivers/dri/swrast/swrast.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/swrast/swrast.c 
b/src/mesa/drivers/dri/swrast/swrast.c
index 332c7b7..b1c67a9 100644
--- a/src/mesa/drivers/dri/swrast/swrast.c
+++ b/src/mesa/drivers/dri/swrast/swrast.c
@@ -67,6 +67,7 @@
  static void swrastSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
GLint texture_format, __DRIdrawable *dPriv)
  {
+GET_CURRENT_CONTEXT(ctx);


The current gl_context can be found at &dri_ctx->Base as seen below in
several places.

  struct dri_context *dri_ctx;
  int x, y, w, h;
  __DRIscreen *sPriv = dPriv->driScreenPriv;
@@ -98,6 +99,13 @@ static void swrastSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
  _mesa_init_teximage_fields(&dri_ctx->Base, texImage,
   w, h, 1, 0, internalFormat, texFormat);

+if (texImage->Data)

This won't work on master/9.2/9.1 (at least) since gl_texture_image
doesn't have a 'Data' field.

+   ctx->Driver.FreeTexImageData(ctx, texImage);

There's no such driver hook in recent Mesa.

+
+ctx->Driver.TexImage2D(ctx, target, 0, internalFormat,
+  w, h, 0, texture_format, GL_UNSIGNED_INT_8_8_8_8,
+  NULL, &ctx->Unpack, texObj, texImage);

If you just need to allocate texture memory you should probably call
ctx->Driver.AllocTextureImageBuffer().

+
  sPriv->swrast_loader->getImage(dPriv, x, y, w, h, (char *)swImage->Buffer,
   dPriv->loaderPrivate);





-Brian



[Mesa-dev] [PATCH] i965/gs: Don't assign gl_Layer its own slot in the VUE map.

2013-09-03 Thread Paul Berry
---
 src/mesa/drivers/dri/i965/brw_vs.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
b/src/mesa/drivers/dri/i965/brw_vs.c
index b81a538..7c7493f 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -64,6 +64,11 @@ brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
    vue_map->slots_valid = slots_valid;
    int i;
 
+   /* gl_Layer doesn't get its own varying slot--it's stored in the first VUE
+    * slot (VARYING_SLOT_PSIZ).
+    */
+   slots_valid &= ~VARYING_BIT_LAYER;
+
    /* Make sure that the values we store in vue_map->varying_to_slot and
     * vue_map->slot_to_varying won't overflow the signed chars that are used
     * to store them.  Note that since vue_map->slot_to_varying sometimes holds
-- 
1.8.4



Re: [Mesa-dev] [PATCH] glsl: propagate max_array_access through function calls

2013-09-03 Thread Matt Turner
On Wed, Aug 28, 2013 at 1:10 PM, Dominik Behr db...@chromium.org wrote:
 Fixes a bug where if an uniform array is passed to a function the accesses
 to the array are not propagated so later all but the first vector of the
 uniform array are removed in parcel_out_uniform_storage resulting in
 broken shaders and out of bounds access to arrays in
 brw::vec4_visitor::pack_uniform_registers.

 Signed-off-by: Dominik Behr db...@chromium.org
 ---
  src/glsl/link_functions.cpp | 29 +
  1 file changed, 29 insertions(+)

 diff --git a/src/glsl/link_functions.cpp b/src/glsl/link_functions.cpp
 index 6b3e154..d935546 100644
 --- a/src/glsl/link_functions.cpp
 +++ b/src/glsl/link_functions.cpp
 @@ -173,6 +173,35 @@ public:
return visit_continue;
 }

 +   virtual ir_visitor_status visit_leave(ir_call *ir)
 +   {
 +      /* Traverse list of function parameters, and for array parameters
 +         propagate max_array_access, Otherwise arrays that are only referenced
 +         from inside functions via function parameters will be incorrectly
 +         optimized. This will lead to incorrect code being generated (or worse).
 +         Do it when leaving the node so the children would propagate their
 +         array accesses first */
 +
 +      const exec_node *formal_param_node = ir->callee->parameters.get_head();
 +      const exec_node *actual_param_node = ir->actual_parameters.get_head();
 +      while (!actual_param_node->is_tail_sentinel()) {
 +         ir_variable *formal_param = (ir_variable *) formal_param_node;
 +         ir_rvalue *actual_param = (ir_rvalue *) actual_param_node;
 +
 +         formal_param_node = formal_param_node->get_next();
 +         actual_param_node = actual_param_node->get_next();
 +
 +         if (formal_param->type->is_array()) {
 +            ir_dereference_variable *deref = actual_param->as_dereference_variable();
 +            if (deref && deref->var && deref->var->type->is_array()) {
 +               deref->var->max_array_access =
 +                  MAX2(formal_param->max_array_access, deref->var->max_array_access);
 +            }
 +         }
 +      }
 +      return visit_continue;
 +   }
 +
     virtual ir_visitor_status visit(ir_dereference_variable *ir)
     {
        if (hash_table_find(locals, ir->var) == NULL) {
 --
 1.8.3.1

Reviewed-and-Tested-by: Matt Turner matts...@gmail.com

I've sent four tests to the piglit list and Cc'd you. Take a look at
them and make sure they're exercising the thing you want to test.

I'll commit this patch tomorrow, assuming no other comments or
problems with the tests. I'll also tag it for the stable branches,
since it's definitely a bug fix.

Thanks a bunch, Dominik!

Matt


Re: [Mesa-dev] [PATCH] glsl: propagate max_array_access through function calls

2013-09-03 Thread Dominik Behr
Thanks,
I looked at piglit tests and they look OK if they are only supposed to test
whether the shader compiles and links. It doesn't look like they test the
results of rendering which could be more useful?




Re: [Mesa-dev] [PATCH] i965/gs: Don't assign gl_Layer its own slot in the VUE map.

2013-09-03 Thread Chad Versace

On 09/03/2013 04:11 PM, Paul Berry wrote:

---
  src/mesa/drivers/dri/i965/brw_vs.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
b/src/mesa/drivers/dri/i965/brw_vs.c
index b81a538..7c7493f 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -64,6 +64,11 @@ brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
     vue_map->slots_valid = slots_valid;
     int i;

+   /* gl_Layer doesn't get its own varying slot--it's stored in the first VUE
+    * slot (VARYING_SLOT_PSIZ).
+    */
+   slots_valid &= ~VARYING_BIT_LAYER;
+
     /* Make sure that the values we store in vue_map->varying_to_slot and
      * vue_map->slot_to_varying won't overflow the signed chars that are used
      * to store them.  Note that since vue_map->slot_to_varying sometimes holds




Reviewed-by: Chad Versace chad.vers...@linux.intel.com


[Mesa-dev] [PATCH 00/15] i965/gen6+: Support 128 varying components.

2013-09-03 Thread Paul Berry
GL 3.2 requires us to support 128 varying components for geometry
shader outputs and fragment shader inputs, and 64 varying components
otherwise.  But there's no hardware limitation that restricts us to 64
varying components, and core Mesa doesn't currently allow different
stages to have different maximum values, so I've gone ahead and
enabled 128 varying components for all stages.  This has the advantage
of increased test coverage, since piglit already has a number of tests
to validate that the maximum advertised number of varying components
can be exchanged between VS and FS.  I've also gone ahead and
increased the limit for gen6 as well as gen7, since it required very
little extra work.

Previously, on gen6+, we relied on the SF/SBE stage of the pipeline to
reorder the outputs from the GS (or VS) to match the input ordering
required by the FS.  This allowed us to determine the order of FS
inputs solely based on the FS, so we avoided recompiles when separate
shader objects were in use.  But there's a problem with that: the
SF/SBE stage can't arbitrarily reorder more than 16 VUE slots (1 slot
= 4 varying components).  To avoid introducing additional recompiles
with previously-supported shaders, I've taken a hybrid approach to
choosing the FS input ordering: if the FS uses 16 or fewer input
varying slots, then it orders them solely based on its own
requirements.  If it uses more than 16 input varying slots, then it
orders them according to the GS (or VS) output VUE map, so that the
SF/SBE stage doesn't have to do any reordering.

Patches 1-3 modify the FS so that it exposes the order of input
varyings it needs via prog_data.

Patches 4-6 modify the SF/SBE setup so that it consults the FS
prog_data when choosing how to re-order varyings (previously, it
implicitly assumed an order that happened to match the order the FS
was using).

Patch 7 is a minor optimization made possible by patches 1-6: now that
the SF/SBE setup no longer makes implicit assumptions about the order
of the FS inputs, the FS no longer has to have dummy input slots for
gl_FragCoord and gl_FrontFacing.

Patch 8 tweaks the VUE map slightly so that it is uniquely determined
by a single 64-bit bitfield.  This will allow us to store the bitfield
in the FS program key rather than the entire VUE map.

Patch 9 is a minor optimization made possible by patch 8: now that the
VUE map is uniquely determined by a single 64-bit bitfield, we no
longer have to store the entire VUE map in the GS program key.

Patches 10-11 modify the FS to order its inputs according to the GS
(or VS) output VUE map when there are more than 16 input slots in use.

Patch 12 adjusts the VS and GS code so that it can output all 32
varyings to the VUE, even if it requires more than two URB writes to
do so.

Patches 13-14 make some minor gen6-specific adjustments to allow for
the larger URB entries needed for 32 varyings: the Gen6 transform
feedback code sometimes needs to do 2 URB writes instead of 1, and an
incorrect assertion in the gen6 URB setup needs to be fixed.

Patch 15 increases the value of MaxVarying from 16 to 32 for gen6+.

The series is available on branch increase-max-varyings of
https://github.com/stereotype441/mesa.git.  I've piglit tested it on
gen5, gen6, and gen7.

[PATCH 01/15] i965/fs: Expose urb_setup as part of brw_wm_prog_data.
[PATCH 02/15] i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs.
[PATCH 03/15] i965/fs: Consult brw_wm_prog_data::num_varying_inputs when setting up WM state.
[PATCH 04/15] i965/sf: Use BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoded values.
[PATCH 05/15] i965/sf: Consolidate common code for setting up gen6-7 attribute overrides.
[PATCH 06/15] i965/sf: Consult brw_wm_prog_data when setting up SF/SBE state.
[PATCH 07/15] i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_FrontFacing.
[PATCH 08/15] i965/gen6+: Remove VUE map dependency on userclip_active.
[PATCH 09/15] i965/gs: Stop storing an input VUE map in the GS program key.
[PATCH 10/15] i965/fs: Simplify computation of key.input_slots_valid during precompile.
[PATCH 11/15] i965/fs: When > 64 input components, order them to match prev pipeline stage.
[PATCH 12/15] i965/vec4: Generate URB writes using a loop.
[PATCH 13/15] i965/gen6: Fix assertions on VS/GS URB size.
[PATCH 14/15] i965/ff_gs: Generate URB writes using a loop.
[PATCH 15/15] i965/gen6+: Support 128 varying components.


[Mesa-dev] [PATCH 02/15] i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs.

2013-09-03 Thread Paul Berry
On gen4-5, the FS stage reads varying inputs from URB entries that
were output by the SF thread, where each register stores the
interpolation setup for two components of a vec4.  Therefore the FS
urb_read_length is twice the number of FS input varyings.  On gen6+,
varying inputs are directly deposited in the FS payload by the SF/SBE
fixed function logic, so urb_read_length is irrelevant.

However, in future patches, it will be nice to be able to consult
brw_wm_prog_data to determine how many varying inputs the FS expects
(rather than inferring it from gl_program::InputsRead).  So instead of
storing urb_read_length, we simply store num_varying_inputs in
brw_wm_prog_data.  On gen4-5, we multiply this by 2 to recover the URB
read length.
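
As a small illustration of the arithmetic described above, here is a sketch in C.  fs_urb_read_length() is a hypothetical helper, not a function in the driver, and returning 0 on gen6+ is only a placeholder for "not used".

```c
#include <assert.h>

/*
 * Hypothetical helper: recover the FS URB read length (in registers) from
 * the number of varying inputs.
 */
static unsigned fs_urb_read_length(int gen, unsigned num_varying_inputs)
{
   assert(gen >= 4);

   /* gen4-5: each varying is 4 setup channels and each register holds the
    * setup for two of them, so every varying costs two registers.
    * gen6+: SF/SBE deposits the inputs directly in the FS payload, so the
    * read length is irrelevant; 0 is returned here as a placeholder.
    */
   return gen < 6 ? num_varying_inputs * 2 : 0;
}
```

For example, three varying inputs on gen5 would give a read length of 6.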
---
 src/mesa/drivers/dri/i965/brw_context.h  | 2 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp | 7 ---
 src/mesa/drivers/dri/i965/brw_wm_state.c | 3 ++-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 41001d8..4c6aebe 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -314,7 +314,7 @@ struct brw_shader {
  */
 struct brw_wm_prog_data {
GLuint curb_read_length;
-   GLuint urb_read_length;
+   GLuint num_varying_inputs;
 
GLuint first_curbe_grf;
GLuint first_curbe_grf_16;
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 9e7d203..444c2b5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1272,8 +1272,7 @@ fs_visitor::calculate_urb_setup()
          c->prog_data.urb_setup[VARYING_SLOT_PNTC] = urb_next++;
    }
 
-   /* Each attribute is 4 setup channels, each of which is half a reg. */
-   c->prog_data.urb_read_length = urb_next * 2;
+   c->prog_data.num_varying_inputs = urb_next;
 }
 
 void
@@ -1298,7 +1297,9 @@ fs_visitor::assign_urb_setup()
   }
}
 
-   this->first_non_payload_grf = urb_start + c->prog_data.urb_read_length;
+   /* Each attribute is 4 setup channels, each of which is half a reg. */
+   this->first_non_payload_grf =
+      urb_start + c->prog_data.num_varying_inputs * 2;
 }
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_wm_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_state.c
index 404fdad..4b06f66 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_state.c
@@ -133,7 +133,8 @@ brw_upload_wm_unit(struct brw_context *brw)
}
 
    wm->thread3.dispatch_grf_start_reg = brw->wm.prog_data->first_curbe_grf;
-   wm->thread3.urb_entry_read_length = brw->wm.prog_data->urb_read_length;
+   wm->thread3.urb_entry_read_length =
+      brw->wm.prog_data->num_varying_inputs * 2;
    wm->thread3.urb_entry_read_offset = 0;
    wm->thread3.const_urb_entry_read_length =
       brw->wm.prog_data->curb_read_length;
-- 
1.8.4



[Mesa-dev] [PATCH 01/15] i965/fs: Expose urb_setup as part of brw_wm_prog_data.

2013-09-03 Thread Paul Berry
At the moment, for Gen6+, the FS assumes that all varying inputs are
delivered to it in the order in which they appear in the
gl_program::InputsRead bitfield, and the SF/SBE setup code ensures
that they are delivered in this order.

When we add support for more than 64 varying components, this will no
longer always be possible, because the Gen6+ SF/SBE stage is only
capable of performing arbitrary reorderings of 16 varying slots.

To allow extra flexibility in the ordering of FS varyings, this patch
causes the FS to advertise exactly what ordering it expects.
---
 src/mesa/drivers/dri/i965/brw_context.h  |  7 +++
 src/mesa/drivers/dri/i965/brw_fs.cpp | 10 +-
 src/mesa/drivers/dri/i965/brw_fs.h   |  1 -
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  4 ++--
 4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 939083b..41001d8 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -336,6 +336,13 @@ struct brw_wm_prog_data {
 */
uint32_t barycentric_interp_modes;
 
+   /**
+* Map from gl_varying_slot to the position within the FS setup data
+* payload where the varying's attribute vertex deltas should be delivered.
+* For varying slots that are not used by the FS, the value is -1.
+*/
+   int urb_setup[VARYING_SLOT_MAX];
+
/* Pointers to tracked values (only valid once
 * _mesa_load_state_parameters has been called at runtime).
 *
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 96cb2ee..9e7d203 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1004,7 +1004,7 @@ fs_visitor::emit_general_interpolation(ir_variable *ir)
    int location = ir->location;
    for (unsigned int i = 0; i < array_elements; i++) {
       for (unsigned int j = 0; j < type->matrix_columns; j++) {
-         if (urb_setup[location] == -1) {
+         if (c->prog_data.urb_setup[location] == -1) {
             /* If there's no incoming setup data for this slot, don't
              * emit interpolation for it.
              */
@@ -1231,7 +1231,7 @@ void
 fs_visitor::calculate_urb_setup()
 {
    for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
-      urb_setup[i] = -1;
+      c->prog_data.urb_setup[i] = -1;
    }
 
    int urb_next = 0;
@@ -1239,7 +1239,7 @@ fs_visitor::calculate_urb_setup()
    if (brw->gen >= 6) {
       for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
          if (fp->Base.InputsRead & BITFIELD64_BIT(i)) {
-            urb_setup[i] = urb_next++;
+            c->prog_data.urb_setup[i] = urb_next++;
          }
       }
    } else {
@@ -1257,7 +1257,7 @@ fs_visitor::calculate_urb_setup()
              * incremented, mapped or not.
              */
             if (_mesa_varying_slot_in_fs((gl_varying_slot) i))
-               urb_setup[i] = urb_next;
+               c->prog_data.urb_setup[i] = urb_next;
             urb_next++;
          }
       }
@@ -1269,7 +1269,7 @@ fs_visitor::calculate_urb_setup()
        * See compile_sf_prog() for more info.
        */
       if (fp->Base.InputsRead & BITFIELD64_BIT(VARYING_SLOT_PNTC))
-         urb_setup[VARYING_SLOT_PNTC] = urb_next++;
+         c->prog_data.urb_setup[VARYING_SLOT_PNTC] = urb_next++;
    }
 
/* Each attribute is 4 setup channels, each of which is half a reg. */
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index cb4ac3b..b77d4de 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -454,7 +454,6 @@ public:
int first_non_payload_grf;
/** Either BRW_MAX_GRF or GEN7_MRF_HACK_START */
int max_grf;
-   int urb_setup[VARYING_SLOT_MAX];
 
fs_reg *fp_temp_regs;
fs_reg *fp_input_regs;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index b049436..911b53e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -2159,10 +2159,10 @@ fs_visitor::emit_dummy_fs()
 struct brw_reg
 fs_visitor::interp_reg(int location, int channel)
 {
-   int regnr = urb_setup[location] * 2 + channel / 2;
+   int regnr = c->prog_data.urb_setup[location] * 2 + channel / 2;
    int stride = (channel & 1) * 4;
 
-   assert(urb_setup[location] != -1);
+   assert(c->prog_data.urb_setup[location] != -1);
 
return brw_vec1_grf(regnr, stride);
 }
-- 
1.8.4



[Mesa-dev] [PATCH 03/15] i965/fs: Consult brw_wm_prog_data::num_varying_inputs when setting up WM state.

2013-09-03 Thread Paul Berry
Previously, we assumed that the number of varying inputs consumed by
the fragment shader was equal to the number of bits set in
gl_program::InputsRead.  However, we'll soon be making two changes
that will cause that not to be true:

- We'll stop wasting varying input space for gl_FragCoord and
  gl_FrontFacing, which aren't varyings.

- For fragment shaders that have more than 16 varying inputs, we'll
  adjust the layout of the inputs to account for the fact that the
  SF/SBE pipeline stage can't reorder inputs beyond the first 16; if
  there are GS outputs that the FS doesn't use (or vice versa) this
  may cause the number of FS varying inputs to change.

So, instead of trying to guess the number of FS inputs from
gl_program::InputsRead, simply read it from
brw_wm_prog_data::num_varying_inputs, which is guaranteed to be correct
since it's populated by fs_visitor::calculate_urb_setup().
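
To illustrate the first bullet, here is a standalone C sketch of how a bit count of the inputs-read mask can overcount once gl_FragCoord and gl_FrontFacing stop taking up input space.  SLOT_POS, SLOT_FACE and both function names are hypothetical; the slot numbers are made up for the example and are not Mesa's.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_POS  0   /* hypothetical slot number for gl_FragCoord */
#define SLOT_FACE 1   /* hypothetical slot number for gl_FrontFacing */

/* Count the bits set in a 64-bit mask. */
static int popcount64(uint64_t v)
{
   int n = 0;
   for (; v; v &= v - 1)
      n++;
   return n;
}

/*
 * Number of inputs that actually occupy varying space: everything the FS
 * reads except the two built-ins that are not real varyings.
 */
static int count_varying_inputs(uint64_t inputs_read)
{
   assert(SLOT_POS != SLOT_FACE);

   const uint64_t non_varyings =
      (UINT64_C(1) << SLOT_POS) | (UINT64_C(1) << SLOT_FACE);

   return popcount64(inputs_read & ~non_varyings);
}
```

A shader that reads both built-ins plus two real varyings has four bits set in its mask but only two varying inputs, which is why the count is better taken from the value the FS compiler stored.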
---
 src/mesa/drivers/dri/i965/gen6_wm_state.c | 3 ++-
 src/mesa/drivers/dri/i965/gen7_wm_state.c | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c 
b/src/mesa/drivers/dri/i965/gen6_wm_state.c
index 6725805..fff5cc4 100644
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
@@ -187,7 +187,8 @@ upload_wm_state(struct brw_context *brw)
   dw5 |= GEN6_WM_DISPATCH_ENABLE;
}
 
-   dw6 |= _mesa_bitcount_64(brw->fragment_program->Base.InputsRead) <<
+   /* CACHE_NEW_WM_PROG */
+   dw6 |= brw->wm.prog_data->num_varying_inputs <<
       GEN6_WM_NUM_SF_OUTPUTS_SHIFT;
if (multisampled_fbo) {
   /* _NEW_MULTISAMPLE */
diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_state.c
index e5691fb..842f744 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
@@ -167,6 +167,7 @@ upload_ps_state(struct brw_context *brw)
 * rendering, CurrentFragmentProgram is used for this check to
 * differentiate between the GLSL and non-GLSL cases.
 */
+   /* BRW_NEW_FRAGMENT_PROGRAM */
    if (ctx->Shader.CurrentFragmentProgram == NULL)
   dw2 |= GEN7_PS_FLOATING_POINT_MODE_ALT;
 
@@ -190,8 +191,8 @@ upload_ps_state(struct brw_context *brw)
   dw4 |= GEN7_PS_DUAL_SOURCE_BLEND_ENABLE;
}
 
-   /* BRW_NEW_FRAGMENT_PROGRAM */
-   if (brw->fragment_program->Base.InputsRead != 0)
+   /* CACHE_NEW_WM_PROG */
+   if (brw->wm.prog_data->num_varying_inputs != 0)
   dw4 |= GEN7_PS_ATTRIBUTE_ENABLE;
 
dw4 |= GEN7_PS_8_DISPATCH_ENABLE;
-- 
1.8.4



[Mesa-dev] [PATCH 04/15] i965/sf: Use BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoded values.

2013-09-03 Thread Paul Berry
We always program the SF unit to start reading the vertex URB entry at
offset 1.  In upcoming patches, we'll be adding FS code that relies on
this.  So consistently use the constant BRW_SF_URB_ENTRY_READ_OFFSET
rather than hardcoding a 1.
---
 src/mesa/drivers/dri/i965/brw_context.h   | 10 ++
 src/mesa/drivers/dri/i965/brw_sf.h|  2 --
 src/mesa/drivers/dri/i965/gen6_sf_state.c |  2 +-
 src/mesa/drivers/dri/i965/gen7_sf_state.c |  2 +-
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 4c6aebe..a0a8d4f 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -478,6 +478,16 @@ struct brw_sf_prog_data {
GLuint urb_entry_size;
 };
 
+
+/**
+ * We always program SF to start reading at an offset of 1 (2 varying slots)
+ * from the start of the vertex URB entry.  This causes it to skip:
+ * - VARYING_SLOT_PSIZ and BRW_VARYING_SLOT_NDC on gen4-5
+ * - VARYING_SLOT_PSIZ and VARYING_SLOT_POS on gen6+
+ */
+#define BRW_SF_URB_ENTRY_READ_OFFSET 1
+
+
 struct brw_clip_prog_data {
GLuint curb_read_length;/* user planes? */
GLuint clip_mode;
diff --git a/src/mesa/drivers/dri/i965/brw_sf.h 
b/src/mesa/drivers/dri/i965/brw_sf.h
index 09880fe..0006239 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.h
+++ b/src/mesa/drivers/dri/i965/brw_sf.h
@@ -105,6 +105,4 @@ void brw_emit_point_setup( struct brw_sf_compile *c, bool 
allocate );
 void brw_emit_point_sprite_setup( struct brw_sf_compile *c, bool allocate );
 void brw_emit_anyprim_setup( struct brw_sf_compile *c );
 
-#define BRW_SF_URB_ENTRY_READ_OFFSET 1
-
 #endif
diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c 
b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index c76debe..dfe9a31 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -138,7 +138,7 @@ upload_sf_state(struct brw_context *brw)
    bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1;
 
int attr = 0, input_index = 0;
-   int urb_entry_read_offset = 1;
+   const int urb_entry_read_offset = BRW_SF_URB_ENTRY_READ_OFFSET;
float point_size;
uint16_t attr_overrides[VARYING_SLOT_MAX];
uint32_t point_sprite_origin;
diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c 
b/src/mesa/drivers/dri/i965/gen7_sf_state.c
index 0ff3388..715eb6c 100644
--- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
@@ -40,7 +40,7 @@ upload_sbe_state(struct brw_context *brw)
uint32_t dw1, dw10, dw11;
int i;
int attr = 0, input_index = 0;
-   int urb_entry_read_offset = 1;
+   const int urb_entry_read_offset = BRW_SF_URB_ENTRY_READ_OFFSET;
uint16_t attr_overrides[VARYING_SLOT_MAX];
/* _NEW_BUFFERS */
    bool render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer);
-- 
1.8.4



[Mesa-dev] [PATCH 05/15] i965/sf: Consolidate common code for setting up gen6-7 attribute overrides.

2013-09-03 Thread Paul Berry
---
 src/mesa/drivers/dri/i965/brw_state.h |   9 +-
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 153 +-
 src/mesa/drivers/dri/i965/gen7_sf_state.c |  64 +
 3 files changed, 97 insertions(+), 129 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index 22e4a61..dd3e216 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -223,9 +223,12 @@ void gen4_init_vtable_sampler_functions(struct brw_context 
*brw);
 void gen7_init_vtable_sampler_functions(struct brw_context *brw);
 
 /* gen6_sf_state.c */
-uint32_t
-get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset,
-  int fs_attr, bool two_side_color, uint32_t *max_source_attr);
+void
+calculate_attr_overrides(const struct brw_context *brw,
+ uint16_t *attr_overrides,
+ uint32_t *point_sprite_enables,
+ uint32_t *flat_enables,
+ uint32_t *urb_entry_read_length);
 
 /* brw_vs_surface_state.c */
 void
diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c 
b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index dfe9a31..7094994 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -52,7 +52,7 @@
  * the VUE that are not needed by the fragment shader.  It is measured in
  * 256-bit increments.
  */
-uint32_t
+static uint32_t
 get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset,
   int fs_attr, bool two_side_color, uint32_t *max_source_attr)
 {
@@ -123,21 +123,98 @@ get_attr_override(const struct brw_vue_map *vue_map, int 
urb_entry_read_offset,
return source_attr;
 }
 
+
+/**
+ * Create the mapping from the FS inputs we produce to the VS outputs they
+ * source from.
+ */
+void
+calculate_attr_overrides(const struct brw_context *brw,
+ uint16_t *attr_overrides,
+ uint32_t *point_sprite_enables,
+ uint32_t *flat_enables,
+ uint32_t *urb_entry_read_length)
+{
+   const int urb_entry_read_offset = BRW_SF_URB_ENTRY_READ_OFFSET;
+   uint32_t max_source_attr = 0;
+   int input_index = 0;
+
+   /* _NEW_LIGHT */
+   bool shade_model_flat = brw->ctx.Light.ShadeModel == GL_FLAT;
+
+   for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
+      enum glsl_interp_qualifier interp_qualifier =
+         brw->fragment_program->InterpQualifier[attr];
+      bool is_gl_Color = attr == VARYING_SLOT_COL0 || attr == VARYING_SLOT_COL1;
+
+      if (!(brw->fragment_program->Base.InputsRead & BITFIELD64_BIT(attr)))
+         continue;
+
+      /* _NEW_POINT */
+      if (brw->ctx.Point.PointSprite &&
+          (attr >= VARYING_SLOT_TEX0 && attr <= VARYING_SLOT_TEX7) &&
+          brw->ctx.Point.CoordReplace[attr - VARYING_SLOT_TEX0]) {
+         *point_sprite_enables |= (1 << input_index);
+      }
+
+      if (attr == VARYING_SLOT_PNTC)
+         *point_sprite_enables |= (1 << input_index);
+
+      /* flat shading */
+      if (interp_qualifier == INTERP_QUALIFIER_FLAT ||
+          (shade_model_flat && is_gl_Color &&
+           interp_qualifier == INTERP_QUALIFIER_NONE))
+         *flat_enables |= (1 << input_index);
+
+      /* The hardware can only do the overrides on 16 overrides at a
+       * time, and the other up to 16 have to be lined up so that the
+       * input index = the output index.  We'll need to do some
+       * tweaking to make sure that's the case.
+       */
+      assert(input_index < 16 || attr == input_index);
+
+      /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_LIGHT | _NEW_PROGRAM */
+      attr_overrides[input_index++] =
+         get_attr_override(&brw->vue_map_geom_out,
+                           urb_entry_read_offset, attr,
+                           brw->ctx.VertexProgram._TwoSideEnabled,
+                           &max_source_attr);
+   }
+
+   for (; input_index < VARYING_SLOT_MAX; input_index++)
+      attr_overrides[input_index] = 0;
+
+   /* From the Sandy Bridge PRM, Volume 2, Part 1, documentation for
+* 3DSTATE_SF DWord 1 bits 15:11, Vertex URB Entry Read Length:
+*
+* This field should be set to the minimum length required to read the
+*  maximum source attribute.  The maximum source attribute is indicated
+*  by the maximum value of the enabled Attribute # Source Attribute if
+*  Attribute Swizzle Enable is set, Number of Output Attributes-1 if
+*  enable is not set.
+*  read_length = ceiling((max_source_attr + 1) / 2)
+*
+*  [errata] Corruption/Hang possible if length programmed larger than
+*  recommended
+*
+* Similar text exists for Ivy Bridge.
+*/
+   *urb_entry_read_length = ALIGN(max_source_attr + 1, 2) / 2;
+}
+
+
 static void
 upload_sf_state(struct brw_context *brw)
 {
struct gl_context *ctx = &brw->ctx;
/* BRW_NEW_FRAGMENT_PROGRAM */
uint32_t 

[Mesa-dev] [PATCH 06/15] i965/sf: Consult brw_wm_prog_data when setting up SF/SBE state.

2013-09-03 Thread Paul Berry
Previously, the SF/SBE setup code delivered varying inputs to the FS
in the order in which they appear in the gl_program::InputsRead
bitfield, since that's what the FS expects.

When we add support for more than 64 varying components, this will no
longer always be the case, because the Gen6+ SF/SBE stage is only
capable of performing arbitrary reorderings of 16 varying slots.  So,
when there are more than 16 vec4's worth of varying inputs, the FS
will have to adjust the order of its input varyings to partially
match the order of outputs from the geometry or vertex shader.

To allow extra flexibility in the ordering of FS varyings, this patch
causes the SF/SBE to deliver varying inputs to the FS in exactly the
order that the FS requests, by consulting brw_wm_prog_data::urb_setup
and brw_wm_prog_data::num_varying_inputs.
---
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 45 ++-
 src/mesa/drivers/dri/i965/gen7_sf_state.c | 13 +
 2 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c 
b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index 7094994..bcad5a4 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -137,17 +137,23 @@ calculate_attr_overrides(const struct brw_context *brw,
 {
const int urb_entry_read_offset = BRW_SF_URB_ENTRY_READ_OFFSET;
uint32_t max_source_attr = 0;
-   int input_index = 0;
 
/* _NEW_LIGHT */
bool shade_model_flat = brw->ctx.Light.ShadeModel == GL_FLAT;
 
+   /* Initialize all the attr_overrides to 0.  In the loop below we'll modify
+* just the ones that correspond to inputs used by the fs.
+*/
+   memset(attr_overrides, 0, 16*sizeof(*attr_overrides));
+
for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
   enum glsl_interp_qualifier interp_qualifier =
  brw->fragment_program->InterpQualifier[attr];
   bool is_gl_Color = attr == VARYING_SLOT_COL0 || attr == VARYING_SLOT_COL1;
+  /* CACHE_NEW_WM_PROG */
+  int input_index = brw->wm.prog_data->urb_setup[attr];
 
-  if (!(brw->fragment_program->Base.InputsRead & BITFIELD64_BIT(attr)))
+  if (input_index < 0)
 continue;
 
   /* _NEW_POINT */
@@ -166,23 +172,23 @@ calculate_attr_overrides(const struct brw_context *brw,
interp_qualifier == INTERP_QUALIFIER_NONE))
  *flat_enables |= (1 << input_index);
 
-  /* The hardware can only do the overrides on 16 overrides at a
-   * time, and the other up to 16 have to be lined up so that the
-   * input index = the output index.  We'll need to do some
-   * tweaking to make sure that's the case.
-   */
-  assert(input_index < 16 || attr == input_index);
-
   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_LIGHT | _NEW_PROGRAM */
-  attr_overrides[input_index++] =
+  uint16_t attr_override =
  get_attr_override(&brw->vue_map_geom_out,
   urb_entry_read_offset, attr,
brw->ctx.VertexProgram._TwoSideEnabled,
&max_source_attr);
-   }
 
-   for (; input_index < VARYING_SLOT_MAX; input_index++)
-  attr_overrides[input_index] = 0;
+  /* The hardware can only do the overrides on 16 overrides at a
+   * time, and the other up to 16 have to be lined up so that the
+   * input index = the output index.  We'll need to do some
+   * tweaking to make sure that's the case.
+   */
+  if (input_index < 16)
+ attr_overrides[input_index] = attr_override;
+  else
+ assert(attr_override == input_index);
+   }
 
/* From the Sandy Bridge PRM, Volume 2, Part 1, documentation for
 * 3DSTATE_SF DWord 1 bits 15:11, Vertex URB Entry Read Length:
@@ -207,8 +213,8 @@ static void
 upload_sf_state(struct brw_context *brw)
 {
struct gl_context *ctx = &brw->ctx;
-   /* BRW_NEW_FRAGMENT_PROGRAM */
-   uint32_t num_outputs = _mesa_bitcount_64(brw->fragment_program->Base.InputsRead);
+   /* CACHE_NEW_WM_PROG */
+   uint32_t num_outputs = brw->wm.prog_data->num_varying_inputs;
uint32_t dw1, dw2, dw3, dw4, dw16, dw17;
int i;
/* _NEW_BUFFER */
@@ -217,7 +223,7 @@ upload_sf_state(struct brw_context *brw)
 
const int urb_entry_read_offset = BRW_SF_URB_ENTRY_READ_OFFSET;
float point_size;
-   uint16_t attr_overrides[VARYING_SLOT_MAX];
+   uint16_t attr_overrides[16];
uint32_t point_sprite_origin;
 
dw1 = GEN6_SF_SWIZZLE_ENABLE | num_outputs << GEN6_SF_NUM_OUTPUTS_SHIFT;
@@ -353,7 +359,9 @@ upload_sf_state(struct brw_context *brw)
 (1 << GEN6_SF_TRIFAN_PROVOKE_SHIFT);
}
 
-   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM */
+   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM |
+* CACHE_NEW_WM_PROG
+*/
uint32_t urb_entry_read_length;
calculate_attr_overrides(brw, attr_overrides, &dw16, &dw17,
 &urb_entry_read_length);
@@ -391,7 +399,8 @@ const 

[Mesa-dev] [PATCH 07/15] i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_FrontFacing.

2013-09-03 Thread Paul Berry
Previously, if a fragment shader accessed gl_FragCoord or
gl_FrontFacing, we would assign them their own slots in the fragment
shader input attribute array, using up space that could be made
available to real varyings.  This was not strictly necessary (since
these values are not true varyings, and are instead computed from
other data available in the FS payload).  But we had to do it anyway
because the SF/SBE setup code assumed that every 1 bit in the
gl_program::InputsRead bitfield corresponded to a genuine varying
variable.

Now that the SF/SBE code consults brw_wm_prog_data and only sets up
the attributes that the fragment shader actually needs, we don't have
to do this anymore.
---
 src/mesa/drivers/dri/i965/brw_context.h   | 9 +
 src/mesa/drivers/dri/i965/brw_fs.cpp  | 3 ++-
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 8 
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index a0a8d4f..167ed4a 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -439,6 +439,15 @@ void brw_compute_vue_map(struct brw_context *brw, struct 
brw_vue_map *vue_map,
  GLbitfield64 slots_valid, bool userclip_active);
 
 
+/**
+ * Bitmask indicating which fragment shader inputs represent varyings (and
+ * hence have to be delivered to the fragment shader by the SF/SBE stage).
+ */
+#define BRW_FS_VARYING_INPUT_MASK \
+   (BITFIELD64_RANGE(0, VARYING_SLOT_MAX) & \
+~VARYING_BIT_POS & ~VARYING_BIT_FACE)
+
+
 /*
  * Mapping of VUE map slots to interpolation modes.
  */
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 444c2b5..013dc29 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1238,7 +1238,8 @@ fs_visitor::calculate_urb_setup()
/* Figure out where each of the incoming setup attributes lands. */
if (brw->gen >= 6) {
   for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
-if (fp->Base.InputsRead & BITFIELD64_BIT(i)) {
+if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
+ BITFIELD64_BIT(i)) {
c->prog_data.urb_setup[i] = urb_next++;
 }
   }
diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c 
b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index bcad5a4..a093dc1 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -56,14 +56,6 @@ static uint32_t
 get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset,
   int fs_attr, bool two_side_color, uint32_t *max_source_attr)
 {
-   if (fs_attr == VARYING_SLOT_POS) {
-  /* This attribute will be overwritten by the fragment shader's
-   * interpolation code (see emit_interp() in brw_wm_fp.c), so just let it
-   * reference the first available attribute.
-   */
-  return 0;
-   }
-
/* Find the VUE slot for this attribute. */
int slot = vue_map->varying_to_slot[fs_attr];
 
-- 
1.8.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
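
The effect of the BRW_FS_VARYING_INPUT_MASK change in PATCH 07/15 can be
illustrated with a minimal standalone sketch.  The slot numbers, BIT64 and
assign_urb_setup() below are hypothetical stand-ins chosen for illustration,
not Mesa's real enum values or functions; the point is only that position and
facing are masked out before input indices are handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot numbers for illustration; not Mesa's real enum values. */
enum { SLOT_POS = 0, SLOT_COL0 = 1, SLOT_TEX0 = 2, SLOT_FACE = 3, SLOT_MAX = 8 };

#define BIT64(b) (UINT64_C(1) << (b))

/* Every slot except position and facing counts as a genuine varying. */
#define VARYING_INPUT_MASK \
   ((BIT64(SLOT_MAX) - 1) & ~BIT64(SLOT_POS) & ~BIT64(SLOT_FACE))

/* Assign consecutive input indices to genuine varyings; every other slot
 * gets -1.  Returns the number of indices handed out.
 */
static int
assign_urb_setup(uint64_t inputs_read, int urb_setup[SLOT_MAX])
{
   assert(urb_setup != NULL);

   int urb_next = 0;
   for (int i = 0; i < SLOT_MAX; i++) {
      if (inputs_read & VARYING_INPUT_MASK & BIT64(i))
         urb_setup[i] = urb_next++;
      else
         urb_setup[i] = -1;
   }
   return urb_next;
}
```

With this layout, a shader that reads position, facing and two real varyings
would consume two input indices instead of four.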


[Mesa-dev] [PATCH 08/15] i965/gen6+: Remove VUE map dependency on userclip_active.

2013-09-03 Thread Paul Berry
Previously, on Gen6+, we laid out the vertex (or geometry) shader VUE
map differently depending on whether user clipping was active.  If it was
active, we put the clip distances in slots 2 and 3 (where the clipper
expects them); if it was inactive, we assigned them in the order of
the gl_varying_slot enum.

This made for unnecessary recompiles, since turning clipping on/off
for a shader that used gl_ClipDistance might rearrange the varyings.
It also required extra bookkeeping, since it required the user
clipping flag to be provided to brw_compute_vue_map() as a parameter.

With this patch, we always put clip distances in slots 2 and 3 if
they are written to.  do_vs_prog() and do_gs_prog() are responsible
for ensuring that clip distances are written to when user clipping is
enabled (as do_vs_prog() previously did for gen4-5).

This makes the only input to brw_compute_vue_map() a bitfield of which
varyings the shader writes to, a fact that we'll take advantage of in
forthcoming patches.
---
 src/mesa/drivers/dri/i965/brw_context.h |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs.c | 15 ---
 src/mesa/drivers/dri/i965/brw_vs.c  | 26 +-
 3 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 167ed4a..0c1fd9e 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -436,7 +436,7 @@ static inline GLuint brw_varying_to_offset(struct 
brw_vue_map *vue_map,
 }
 
 void brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
- GLbitfield64 slots_valid, bool userclip_active);
+ GLbitfield64 slots_valid);
 
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c 
b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index 7ab03ac..94c4017 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -62,9 +62,18 @@ do_gs_prog(struct brw_context *brw,
c.prog_data.base.param = rzalloc_array(NULL, const float *, param_count);
c.prog_data.base.pull_param = rzalloc_array(NULL, const float *, 
param_count);
 
-   brw_compute_vue_map(brw, &c.prog_data.base.vue_map,
-   gp->program.Base.OutputsWritten,
-   c.key.base.userclip_active);
+   GLbitfield64 outputs_written = gp->program.Base.OutputsWritten;
+
+   /* In order for legacy clipping to work, we need to populate the clip
+* distance varying slots whenever clipping is enabled, even if the vertex
+* shader doesn't write to gl_ClipDistance.
+*/
+   if (c.key.base.userclip_active) {
+  outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST0);
+  outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST1);
+   }
+
+   brw_compute_vue_map(brw, &c.prog_data.base.vue_map, outputs_written);
 
/* Compute the output vertex size.
 *
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
b/src/mesa/drivers/dri/i965/brw_vs.c
index b81a538..6b97f01 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -52,14 +52,10 @@ static inline void assign_vue_slot(struct brw_vue_map 
*vue_map,
 
 /**
  * Compute the VUE map for vertex shader program.
- *
- * Note that consumers of this map using cache keys must include
- * prog_data->userclip and prog_data->outputs_written in their key
- * (generated by CACHE_NEW_VS_PROG).
  */
 void
 brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
-GLbitfield64 slots_valid, bool userclip_active)
+GLbitfield64 slots_valid)
 {
vue_map->slots_valid = slots_valid;
int i;
@@ -107,10 +103,11 @@ brw_compute_vue_map(struct brw_context *brw, struct 
brw_vue_map *vue_map,
*/
   assign_vue_slot(vue_map, VARYING_SLOT_PSIZ);
   assign_vue_slot(vue_map, VARYING_SLOT_POS);
-  if (userclip_active) {
+  if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST0))
  assign_vue_slot(vue_map, VARYING_SLOT_CLIP_DIST0);
+  if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST1))
  assign_vue_slot(vue_map, VARYING_SLOT_CLIP_DIST1);
-  }
+
   /* front and back colors need to be consecutive so that we can use
* ATTRIBUTE_SWIZZLE_INPUTATTR_FACING to swizzle them when doing
* two-sided color.
@@ -267,15 +264,18 @@ do_vs_prog(struct brw_context *brw,
  outputs_written |= BITFIELD64_BIT(VARYING_SLOT_COL0);
   if (outputs_written & BITFIELD64_BIT(VARYING_SLOT_BFC1))
  outputs_written |= BITFIELD64_BIT(VARYING_SLOT_COL1);
+   }
 
-  if (c.key.base.userclip_active) {
- outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST0);
- outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST1);
-  }
+   /* In order for legacy clipping to work, we need to populate the clip
+* distance varying slots whenever clipping is enabled, even if the 

[Mesa-dev] [PATCH 09/15] i965/gs: Stop storing an input VUE map in the GS program key.

2013-09-03 Thread Paul Berry
Now that the vertex shader output VUE map is determined solely by a
64-bit bitfield, we don't have to store it in its entirety in the
geometry shader program key; instead, we can just store the bitfield,
and let the geometry shader infer the VUE map at compile time.

This dramatically reduces the size of the geometry shader program key,
which we want to keep small since it gets recomputed whenever the
active program changes.
---
 src/mesa/drivers/dri/i965/brw_vec4_gs.c   | 6 --
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h   | 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c 
b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index 94c4017..5e67d1a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -167,10 +167,12 @@ do_gs_prog(struct brw_context *brw,
 
c.prog_data.output_topology = prim_to_hw_prim[gp->program.OutputType];
 
+   brw_compute_vue_map(brw, &c.input_vue_map, c.key.input_varyings);
+
/* GS inputs are read from the VUE 256 bits (2 vec4's) at a time, so we
 * need to program a URB read length of ceiling(num_slots / 2).
 */
-   c.prog_data.base.urb_read_length = (c.key.input_vue_map.num_slots + 1) / 2;
+   c.prog_data.base.urb_read_length = (c.input_vue_map.num_slots + 1) / 2;
 
void *mem_ctx = ralloc_context(NULL);
unsigned program_size;
@@ -239,7 +241,7 @@ brw_upload_gs_prog(struct brw_context *brw)
   key.base.tex);
 
/* BRW_NEW_VUE_MAP_VS */
-   key.input_vue_map = brw->vue_map_vs;
+   key.input_varyings = brw->vue_map_vs.slots_valid;
 
if (!brw_search_cache(&brw->cache, BRW_GS_PROG,
  &key, sizeof(key),
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index d82a26e..ae78855 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -70,8 +70,8 @@ vec4_gs_visitor::setup_varying_inputs(int payload_reg, int 
*attribute_map)
assert(num_input_vertices <= MAX_GS_INPUT_VERTICES);
unsigned input_array_stride = c->prog_data.base.urb_read_length * 2;
 
-   for (int slot = 0; slot < c->key.input_vue_map.num_slots; slot++) {
-  int varying = c->key.input_vue_map.slot_to_varying[slot];
+   for (int slot = 0; slot < c->input_vue_map.num_slots; slot++) {
+  int varying = c->input_vue_map.slot_to_varying[slot];
   for (unsigned vertex = 0; vertex < num_input_vertices; vertex++) {
  attribute_map[BRW_VARYING_SLOT_COUNT * vertex + varying] =
 payload_reg + input_array_stride * vertex + slot;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
index fba0ac6..48623d8 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
@@ -37,7 +37,7 @@ struct brw_gs_prog_key
 {
struct brw_vec4_prog_key base;
 
-   struct brw_vue_map input_vue_map;
+   GLbitfield64 input_varyings;
 };
 
 
@@ -49,6 +49,7 @@ struct brw_gs_compile
struct brw_vec4_compile base;
struct brw_gs_prog_key key;
struct brw_gs_prog_data prog_data;
+   struct brw_vue_map input_vue_map;
 
struct brw_geometry_program *gp;
 };
-- 
1.8.4



[Mesa-dev] [PATCH 10/15] i965/fs: Simplify computation of key.input_slots_valid during precompile.

2013-09-03 Thread Paul Berry
The for loop was rather silly.  In addition to checking brw->gen < 6
on each loop iteration, it took pains to exclude bits from
fp->Base.InputsRead that don't correspond to fragment shader inputs.
But those bits would never have been set in the first place, since the
only bits that are ever set in fp->Base.InputsRead are fragment shader
inputs.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 013dc29..7950d5f6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3150,17 +3150,7 @@ brw_fs_precompile(struct gl_context *ctx, struct 
gl_shader_program *prog)
}
 
if (brw->gen < 6)
-  key.input_slots_valid |= BITFIELD64_BIT(VARYING_SLOT_POS);
-
-   for (int i = 0; i < VARYING_SLOT_MAX; i++) {
-  if (!(fp->Base.InputsRead & BITFIELD64_BIT(i)))
-continue;
-
-  if (brw->gen < 6) {
- if (_mesa_varying_slot_in_fs((gl_varying_slot) i))
-key.input_slots_valid |= BITFIELD64_BIT(i);
-  }
-   }
+  key.input_slots_valid = fp->Base.InputsRead | VARYING_BIT_POS;
 
key.clamp_fragment_color = ctx->API == API_OPENGL_COMPAT;
 
-- 
1.8.4



[Mesa-dev] [PATCH 11/15] i965/fs: When >64 input components, order them to match prev pipeline stage.

2013-09-03 Thread Paul Berry
Since the SF/SBE stage is only capable of performing arbitrary
reorderings of 16 varying slots, we can't arrange the fragment shader
inputs in an arbitrary order if there are more than 16 input varying
slots in use.  We need to make sure that slots 16-31 match the
corresponding outputs of the previous pipeline stage.

The easiest way to accomplish this is to just make all varying slots
match up with the previous pipeline stage.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 42 ++--
 src/mesa/drivers/dri/i965/brw_wm.c   |  3 ++-
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 7950d5f6..8d73a0f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1237,11 +1237,40 @@ fs_visitor::calculate_urb_setup()
int urb_next = 0;
/* Figure out where each of the incoming setup attributes lands. */
if (brw->gen >= 6) {
-  for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
-if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
- BITFIELD64_BIT(i)) {
-   c->prog_data.urb_setup[i] = urb_next++;
-}
+  if (_mesa_bitcount_64(fp->Base.InputsRead &
+BRW_FS_VARYING_INPUT_MASK) <= 16) {
+ /* The SF/SBE pipeline stage can do arbitrary rearrangement of the
+  * first 16 varying inputs, so we can put them wherever we want.
+  * Just put them in order.
+  */
+ for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
+if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
+BITFIELD64_BIT(i)) {
+   c->prog_data.urb_setup[i] = urb_next++;
+}
+ }
+  } else {
+ /* We have enough input varyings that the SF/SBE pipeline stage can't
+  * arbitrarily rearrange them to suit our whim; we have to put them
+  * in an order that matches the output of the previous pipeline stage
+  * (geometry or vertex shader).
+  */
+ struct brw_vue_map prev_stage_vue_map;
+ brw_compute_vue_map(brw, &prev_stage_vue_map,
+ c->key.input_slots_valid);
+ int first_slot = 2 * BRW_SF_URB_ENTRY_READ_OFFSET;
+ assert(prev_stage_vue_map.num_slots <= first_slot + 32);
+ for (int slot = first_slot; slot < prev_stage_vue_map.num_slots;
+  slot++) {
+int varying = prev_stage_vue_map.slot_to_varying[slot];
+if (varying != BRW_VARYING_SLOT_COUNT &&
+(fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
+ BITFIELD64_BIT(varying))) {
+   c->prog_data.urb_setup[varying] = slot - first_slot;
+   urb_next = MAX2(urb_next, slot + 1);
+}
+ }
+ urb_next = prev_stage_vue_map.num_slots - first_slot;
   }
} else {
   /* FINISHME: The sf doesn't map VS->FS inputs for us very well. */
@@ -3149,7 +3178,8 @@ brw_fs_precompile(struct gl_context *ctx, struct 
gl_shader_program *prog)
   key.iz_lookup |= IZ_DEPTH_WRITE_ENABLE_BIT;
}
 
-   if (brw->gen < 6)
+   if (brw->gen < 6 || _mesa_bitcount_64(fp->Base.InputsRead &
+ BRW_FS_VARYING_INPUT_MASK) > 16)
   key.input_slots_valid = fp->Base.InputsRead | VARYING_BIT_POS;
 
key.clamp_fragment_color = ctx->API == API_OPENGL_COMPAT;
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c 
b/src/mesa/drivers/dri/i965/brw_wm.c
index 3df2b7d..3e59880 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -466,7 +466,8 @@ static void brw_wm_populate_key( struct brw_context *brw,
   (ctx->Multisample.SampleAlphaToCoverage || ctx->Color.AlphaEnabled);
 
/* BRW_NEW_VUE_MAP_GEOM_OUT */
-   if (brw->gen < 6)
+   if (brw->gen < 6 || _mesa_bitcount_64(fp->program.Base.InputsRead &
+ BRW_FS_VARYING_INPUT_MASK) > 16)
   key->input_slots_valid = brw->vue_map_geom_out.slots_valid;
 
/* The unique fragment program ID */
-- 
1.8.4



[Mesa-dev] [PATCH 12/15] i965/vec4: Generate URB writes using a loop.

2013-09-03 Thread Paul Berry
Previously we only ever did 1 or 2 URB writes, since the maximum
number of varyings we support is small enough to fit in 2 URB writes.
But GL 3.2 requires the geometry shader to support 128 output varying
components, and this could require up to 3 URB writes.
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 52 +++---
 1 file changed, 21 insertions(+), 31 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 6771630..98b0a9b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2851,47 +2851,37 @@ vec4_visitor::emit_vertex()
   emit_clip_distances(output_reg[VARYING_SLOT_CLIP_DIST1], 4);
}
 
-   /* Set up the VUE data for the first URB write */
-   int slot;
-   for (slot = 0; slot < prog_data->vue_map.num_slots; ++slot) {
-  emit_urb_slot(mrf++, prog_data->vue_map.slot_to_varying[slot]);
-
-  /* If this was max_usable_mrf, we can't fit anything more into this URB
-   * WRITE.
+   /* We may need to split this up into several URB writes, so do them in a
+* loop.
+*/
+   int slot = 0;
+   bool complete = false;
+   do {
+  /* URB offset is in URB row increments, and each of our MRFs is half of
+   * one of those, since we're doing interleaved writes.
*/
-  if (mrf > max_usable_mrf) {
-slot++;
-break;
-  }
-   }
-
-   bool complete = slot >= prog_data->vue_map.num_slots;
-   current_annotation = "URB write";
-   vec4_instruction *inst = emit_urb_write_opcode(complete);
-   inst->base_mrf = base_mrf;
-   inst->mlen = align_interleaved_urb_mlen(brw, mrf - base_mrf);
+  int offset = slot / 2;
 
-   /* Optional second URB write */
-   if (!complete) {
   mrf = base_mrf + 1;
-
   for (; slot < prog_data->vue_map.num_slots; ++slot) {
-assert(mrf < max_usable_mrf);
-
  emit_urb_slot(mrf++, prog_data->vue_map.slot_to_varying[slot]);
+
+ /* If this was max_usable_mrf, we can't fit anything more into this
+  * URB WRITE.
+  */
+ if (mrf > max_usable_mrf) {
+slot++;
+break;
+ }
   }
 
+  complete = slot >= prog_data->vue_map.num_slots;
   current_annotation = "URB write";
-  inst = emit_urb_write_opcode(true /* complete */);
+  vec4_instruction *inst = emit_urb_write_opcode(complete);
   inst->base_mrf = base_mrf;
   inst->mlen = align_interleaved_urb_mlen(brw, mrf - base_mrf);
-  /* URB destination offset.  In the previous write, we got MRFs
-   * 2-13 minus the one header MRF, so 12 regs.  URB offset is in
-   * URB row increments, and each of our MRFs is half of one of
-   * those, since we're doing interleaved writes.
-   */
-  inst->offset = (max_usable_mrf - base_mrf) / 2;
-   }
+  inst->offset = offset;
+   } while(!complete);
 }
 
 void
-- 
1.8.4



[Mesa-dev] [PATCH 13/15] i965/gen6: Fix assertions on VS/GS URB size.

2013-09-03 Thread Paul Berry
The {VS,GS} URB Entry Allocation Size fields of 3DSTATE_URB allow
values in the range 0-4, but they are U8-1 fields, so the range of
possible allocation sizes is 1-5.  We were erroneously prohibiting a
size of 5.
---
 src/mesa/drivers/dri/i965/gen6_urb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_urb.c 
b/src/mesa/drivers/dri/i965/gen6_urb.c
index e16d30a..86de9bd 100644
--- a/src/mesa/drivers/dri/i965/gen6_urb.c
+++ b/src/mesa/drivers/dri/i965/gen6_urb.c
@@ -86,8 +86,8 @@ gen6_upload_urb( struct brw_context *brw )
assert(brw->urb.nr_vs_entries >= 24);
assert(brw->urb.nr_vs_entries % 4 == 0);
assert(brw->urb.nr_gs_entries % 4 == 0);
-   assert(vs_size < 5);
-   assert(gs_size < 5);
+   assert(vs_size <= 5);
+   assert(gs_size <= 5);
 
BEGIN_BATCH(3);
OUT_BATCH(_3DSTATE_URB << 16 | (3 - 2));
-- 
1.8.4



[Mesa-dev] [PATCH 14/15] i965/ff_gs: Generate URB writes using a loop.

2013-09-03 Thread Paul Berry
Previously we only ever did 1 URB write, since the maximum number of
varyings we support is small enough to fit in 1 URB write (when using
BRW_URB_SWIZZLE_NONE, which is what the pre-Gen7 GS always uses).  But
we're about to increase the number of varying components we support
from 64 to 128.

With 128 varyings, the most URB writes we'll have to do is 2, but it's
just as easy to write a general-purpose loop.
---
 src/mesa/drivers/dri/i965/brw_gs_emit.c | 61 -
 1 file changed, 38 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_gs_emit.c 
b/src/mesa/drivers/dri/i965/brw_gs_emit.c
index 2c94eb0..9050b95 100644
--- a/src/mesa/drivers/dri/i965/brw_gs_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_gs_emit.c
@@ -169,31 +169,46 @@ static void brw_ff_gs_emit_vue(struct brw_ff_gs_compile 
*c,
bool last)
 {
struct brw_compile *p = &c->func;
-   bool allocate = !last;
+   int write_offset = 0;
+   bool complete = false;
 
-   /* Copy the vertex from vertn into m1..mN+1:
-*/
-   brw_copy8(p, brw_message_reg(1), vert, c->nr_regs);
+   do {
+  /* We can't write more than 14 registers at a time to the URB */
+  int write_len = MIN2(c->nr_regs - write_offset, 14);
+  if (write_len == c->nr_regs - write_offset)
+ complete = true;
 
-   /* Send each vertex as a seperate write to the urb.  This is
-* different to the concept in brw_sf_emit.c, where subsequent
-* writes are used to build up a single urb entry.  Each of these
-* writes instantiates a seperate urb entry, and a new one must be
-* allocated each time.
-*/
-   brw_urb_WRITE(p, 
-allocate ? c->reg.temp
-  : retype(brw_null_reg(), BRW_REGISTER_TYPE_UD),
-0,
-c->reg.header,
-allocate ? BRW_URB_WRITE_ALLOCATE_COMPLETE
-  : BRW_URB_WRITE_EOT_COMPLETE,
-c->nr_regs + 1, /* msg length */
-allocate ? 1 : 0, /* response length */
-0, /* urb offset */
-BRW_URB_SWIZZLE_NONE);
-
-   if (allocate) {
+  /* Copy the vertex from vertn into m1..mN+1:
+   */
+  brw_copy8(p, brw_message_reg(1), offset(vert, write_offset), write_len);
+
+  /* Send the vertex data to the URB.  If this is the last write for this
+   * vertex, then we mark it as complete, and either end the thread or
+   * allocate another vertex URB entry (depending whether this is the last
+   * vertex).
+   */
+  enum brw_urb_write_flags flags;
+  if (!complete)
+ flags = BRW_URB_WRITE_NO_FLAGS;
+  else if (last)
+ flags = BRW_URB_WRITE_EOT_COMPLETE;
+  else
+ flags = BRW_URB_WRITE_ALLOCATE_COMPLETE;
+  brw_urb_WRITE(p,
+(flags & BRW_URB_WRITE_ALLOCATE) ? c->reg.temp
+: retype(brw_null_reg(), BRW_REGISTER_TYPE_UD),
+0,
+c->reg.header,
+flags,
+write_len + 1, /* msg length */
+(flags & BRW_URB_WRITE_ALLOCATE) ? 1
+: 0, /* response length */
+write_offset,  /* urb offset */
+BRW_URB_SWIZZLE_NONE);
+  write_offset += write_len;
+   } while (!complete);
+
+   if (!last) {
   brw_MOV(p, get_element_ud(c->reg.header, 0),
   get_element_ud(c->reg.temp, 0));
}
-- 
1.8.4



[Mesa-dev] [PATCH 15/15] i965/gen6+: Support 128 varying components.

2013-09-03 Thread Paul Berry
GL 3.2 requires us to support 128 varying components for geometry
shader outputs and fragment shader inputs, and 64 varying components
otherwise.  But there's no hardware limitation that restricts us to 64
varying components, and core Mesa doesn't currently allow different
stages to have different maximum values, so just go ahead and enable
128 varying components for all stages.  This gets us better test
coverage anyway.

Even though we are only working on GL 3.2 support for gen7 right now,
gen6 also supports 128 varying components, so go ahead and switch it
on there too.
---
 src/mesa/drivers/dri/i965/brw_context.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 2321076..3c1e409 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -247,6 +247,9 @@ brw_initialize_context_constants(struct brw_context *brw)
ctx->Const.DisableGLSLLineContinuations =
   driQueryOptionb(&brw->optionCache, "disable_glsl_line_continuations");
 
+   if (brw->gen >= 6)
+  ctx->Const.MaxVarying = 32;
+
/* We want the GLSL compiler to emit code that uses condition codes */
for (int i = 0; i < MESA_SHADER_TYPES; i++) {
   ctx->ShaderCompilerOptions[i].MaxIfDepth = brw->gen < 6 ? 16 : UINT_MAX;
-- 
1.8.4



Re: [Mesa-dev] [PATCH 1/3] draw: cleanup the extra attribs

2013-09-03 Thread Stéphane Marchesin
Hi Zack,

This change regresses a bunch of point sprite piglit tests on i915g. Should
we revert back to the old behaviour? As far as I can see, it was correct
(it was keeping the attributes in case another stage is using them).

Stéphane



On Thu, Aug 8, 2013 at 12:46 PM, Zack Rusin za...@vmware.com wrote:

> Before inserting new front face and prim id outputs cleanup
> the old extra outputs, otherwise our cache will use previous
> output slots which will break as soon as outputs of the current
> shader don't match the last.
>
> Signed-off-by: Zack Rusin za...@vmware.com
> ---
>  src/gallium/auxiliary/draw/draw_context.c |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/gallium/auxiliary/draw/draw_context.c
> b/src/gallium/auxiliary/draw/draw_context.c
> index af9caee..2dc6772 100644
> --- a/src/gallium/auxiliary/draw/draw_context.c
> +++ b/src/gallium/auxiliary/draw/draw_context.c
> @@ -555,6 +555,7 @@ draw_get_shader_info(const struct draw_context *draw)
>  void
>  draw_prepare_shader_outputs(struct draw_context *draw)
>  {
> +   draw_remove_extra_vertex_attribs(draw);
> draw_ia_prepare_outputs(draw, draw->pipeline.ia);
> draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
>  }
> --
> 1.7.10.4
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH 1/3] draw: cleanup the extra attribs

2013-09-03 Thread Matt Turner
On Tue, Sep 3, 2013 at 8:20 PM, Stéphane Marchesin
stephane.marche...@gmail.com wrote:
> Hi Zack,
>
> This change regresses a bunch of point sprite piglit tests on i915g. Should
> we revert back to the old behaviour? As far as I can see, it was correct (it
> was keeping the attributes in case another stage is using them).
>
> Stéphane

This commit actually already led to three regression reports:

https://bugs.freedesktop.org/show_bug.cgi?id=67963
https://bugs.freedesktop.org/show_bug.cgi?id=67965
https://bugs.freedesktop.org/show_bug.cgi?id=67966

In fact, Zack has 11 regression reports filed (from Aug 3-10) against
commits he made, including one commit titled "softpipe: fix the
regressions" that oddly enough caused two regressions.

I have no idea why running piglit on Zack's system didn't catch these. ;)


[Mesa-dev] Mesa (git 20130828) fails to build on MIPS

2013-09-03 Thread Christophe Jarry
Hello,

I encounter an error when I build mesa from git. I am on a MIPS
computer with ATI RS780E.

Here are the instructions I use for the build:

./autogen.sh \
--prefix=/usr \
--enable-gles2 \
--disable-gallium-egl \
--with-egl-platforms=x11,wayland,drm \
--enable-gbm \
--enable-shared-glapi \
--with-gallium-drivers=r300,r600,swrast \
--with-dri-drivers=radeon,swrast

make

Then, I get this error message:

make[2]: Entering directory `/usr/src/mesa/mesa-20130828/src/glsl'
  YACC glsl_parser.cpp
make[2]: *** [glsl_parser.cpp] Error 141

How do I know what the error #141 corresponds to?

Do you have an idea on how to fix this?

Thanks,

Christophe