date:20130815

Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

2013-08-15 Thread Michel Dänzer

On Don, 2013-08-15 at 05:25 +0200, Marek Olšák wrote:
 (This should be applied before MSAA, which will need to be rebased.)
 
 It moves all sampler view descriptors to a buffer.
 It supports partial resource updates and it can also unbind resources
 (required for FMASK texturing).
 
 The buffer contains all sampler view descriptors for one shader stage,
 represented as an array. On top of that, there are N arrays in the buffer,
 which are used to emulate context registers as implemented by the previous
 ASICs (each array is a context).
 
 This uses the RCU synchronization approach to avoid read-after-write hazards
 as discussed in the thread:
 radeonsi: add FMASK texture binding slots and resource setup
 
 CP DMA is used to clear the descriptors at context initialization and to copy
 the descriptors from one context to the next.
 
 IMPORTANT:
   128 resource contexts are needed, 64 doesn't work.

Hmm, I suppose this might depend on the specific GPU?

   If I set SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are 
 needed.

But that hurts performance?


The patch looks good to me, just a minor comment:


 +/* Emit a CP DMA packet to do a copy from one buffer to another.
 + * The size must fit in bits [20:0]. Notes:
 + *
 + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is 
 done.
 + *
 + * 2) Set raw_hazard_wait to true if the source data was used as a 
 destination
 + *in a previous CP DMA packet. It's for preventing a read-after-write 
 hazard
 + *between two CP DMA packets.
 + */
 +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx,
 +uint64_t dst_va, uint64_t src_va,
 +unsigned size,
 +bool sync, bool raw_hazard_wait)

[...]

 + /* Copy the descriptors to a new context slot. */
 + si_emit_cp_dma_copy_buffer(rctx,
 +va_base + new_context_id * 
 desc-context_size,
 +va_base + desc-current_context_id * 
 desc-context_size,
 +desc-context_size, true, false);

It's hard to remember / guess the meaning of such boolean parameters, so
it might be better to use self-explanatory flags instead.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

2013-08-15 Thread Christian König


Am 15.08.2013 09:33, schrieb Michel Dänzer:

On Don, 2013-08-15 at 05:25 +0200, Marek Olšák wrote:

(This should be applied before MSAA, which will need to be rebased.)

It moves all sampler view descriptors to a buffer.
It supports partial resource updates and it can also unbind resources
(required for FMASK texturing).

The buffer contains all sampler view descriptors for one shader stage,
represented as an array. On top of that, there are N arrays in the buffer,
which are used to emulate context registers as implemented by the previous
ASICs (each array is a context).

This uses the RCU synchronization approach to avoid read-after-write hazards
as discussed in the thread:
radeonsi: add FMASK texture binding slots and resource setup

CP DMA is used to clear the descriptors at context initialization and to copy
the descriptors from one context to the next.

IMPORTANT:
   128 resource contexts are needed, 64 doesn't work.

Hmm, I suppose this might depend on the specific GPU?


That looks like the SQ is indeed capable of handling more request at once.

Well there has to be an upper limit to this, we just have to figure it 
out from the docs.



   If I set SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are 
needed.

But that hurts performance?


Yeah, probably allot. Sounds like a complete pipeline drain to me.

Don't fully understand it either, but I think the commands from the IB 
are read through the KCACHE as well. So when you flush it the CP 
completely stops until that command is complete.




The patch looks good to me, just a minor comment:


Also have some more comments on this, but going to write a separate mail 
on it.


Christian.




+/* Emit a CP DMA packet to do a copy from one buffer to another.
+ * The size must fit in bits [20:0]. Notes:
+ *
+ * 1) Set sync to true if you want the 3D engine to wait until CP DMA is done.
+ *
+ * 2) Set raw_hazard_wait to true if the source data was used as a destination
+ *in a previous CP DMA packet. It's for preventing a read-after-write 
hazard
+ *between two CP DMA packets.
+ */
+static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx,
+  uint64_t dst_va, uint64_t src_va,
+  unsigned size,
+  bool sync, bool raw_hazard_wait)

[...]


+   /* Copy the descriptors to a new context slot. */
+   si_emit_cp_dma_copy_buffer(rctx,
+  va_base + new_context_id * 
desc-context_size,
+  va_base + desc-current_context_id * 
desc-context_size,
+  desc-context_size, true, false);

It's hard to remember / guess the meaning of such boolean parameters, so
it might be better to use self-explanatory flags instead.




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] gallivm: already pass coords in the right place in the sampler interface

2013-08-15 Thread Michel Dänzer

On Mit, 2013-08-14 at 18:35 +0200, srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com
 
 This makes things a bit nicer, and more importantly it fixes an issue
 where a downgraded array texture (due to view reduced to 1 layer and
 addressed with (non-array) samplec instruction) would use the wrong
 coord as shadow reference value. (This could also be fixed by passing
 target through the sampler interface much the same way as is done for
 size queries, might do this eventually anyway.)
 And if we'd ever want to support (shadow) cube map arrays, we'd need
 5 coords in any case.
 
 v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex
 using wrong shadow coord for 2d...). Plus need to project the shadow
 coord, and just for fun keep projecting the layer coord too.

[...]

 @@ -1631,55 +1632,58 @@ emit_tex( struct lp_build_tgsi_soa_context *bld,
 }
  
 switch (inst-Texture.Texture) {
 -   case TGSI_TEXTURE_1D:
 -  num_coords = 1;
 -  num_offsets = 1;
 -  num_derivs = 1;
 -  break;
 case TGSI_TEXTURE_1D_ARRAY:
 -  num_coords = 2;
 +  layer_coord = 1;
 +  /* fallthrough */
 +   case TGSI_TEXTURE_1D:
num_offsets = 1;
num_derivs = 1;
break;
 +   case TGSI_TEXTURE_2D_ARRAY:
 +  layer_coord = 2;
 +  /* fallthrough */
 case TGSI_TEXTURE_2D:
 case TGSI_TEXTURE_RECT:
 -  num_coords = 2;
num_offsets = 2;
num_derivs = 2;
break;
 -   case TGSI_TEXTURE_SHADOW1D:
 case TGSI_TEXTURE_SHADOW1D_ARRAY:
 -  num_coords = 3;
 +  layer_coord = 1;
 +  /* fallthrough */
 +   case TGSI_TEXTURE_SHADOW1D:
 +  shadow_coord = 2;
num_offsets = 1;
num_derivs = 1;
break;
 +   case TGSI_TEXTURE_SHADOW2D_ARRAY:
 +  layer_coord = 2;
 +  shadow_coord = 3;
 +  num_offsets = 2;
 +  num_derivs = 2;
 +  break;
 case TGSI_TEXTURE_SHADOW2D:
 case TGSI_TEXTURE_SHADOWRECT:
 -   case TGSI_TEXTURE_2D_ARRAY:
 -  num_coords = 3;
 +  shadow_coord = 2;
num_offsets = 2;
num_derivs = 2;
break;
 case TGSI_TEXTURE_CUBE:
 -  num_coords = 3;
num_offsets = 2;
num_derivs = 3;
break;
 case TGSI_TEXTURE_3D:
 -  num_coords = 3;
num_offsets = 3;
num_derivs = 3;
break;
 -   case TGSI_TEXTURE_SHADOW2D_ARRAY:
 -  num_coords = 4;
 -  num_offsets = 2;
 -  num_derivs = 2;
 -  break;
 case TGSI_TEXTURE_SHADOWCUBE:
 -  num_coords = 4;
 +  shadow_coord = 3;
num_offsets = 2;
num_derivs = 3;
break;
 +   case TGSI_TEXTURE_CUBE_ARRAY:
 +   case TGSI_TEXTURE_SHADOWCUBE_ARRAY:
 +   case TGSI_TEXTURE_2D_MSAA:
 +   case TGSI_TEXTURE_2D_ARRAY_MSAA:
 default:
assert(0);
return;

Couldn't some of this code be simplified using
tgsi_util_get_texture_coord_dim()?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

2013-08-15 Thread Christian König


Am 15.08.2013 05:25, schrieb Marek Olšák:

(This should be applied before MSAA, which will need to be rebased.)

It moves all sampler view descriptors to a buffer.
It supports partial resource updates and it can also unbind resources
(required for FMASK texturing).

The buffer contains all sampler view descriptors for one shader stage,
represented as an array. On top of that, there are N arrays in the buffer,
which are used to emulate context registers as implemented by the previous
ASICs (each array is a context).

This uses the RCU synchronization approach to avoid read-after-write hazards
as discussed in the thread:
radeonsi: add FMASK texture binding slots and resource setup

CP DMA is used to clear the descriptors at context initialization and to copy
the descriptors from one context to the next.

IMPORTANT:
   128 resource contexts are needed, 64 doesn't work. If I set
   SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are needed.
   I don't have an explanation for this.
---


The idea itself looks really good to me, but we should probably also 
move the all resources and samplers to the new model and then rip out 
the code that stores them directly into the IB.



+/* Emit a CP DMA packet to do a copy from one buffer to another.
+ * The size must fit in bits [20:0]. Notes:
+ *
+ * 1) Set sync to true if you want the 3D engine to wait until CP DMA is done.
+ *
+ * 2) Set raw_hazard_wait to true if the source data was used as a destination
+ *in a previous CP DMA packet. It's for preventing a read-after-write 
hazard
+ *between two CP DMA packets.
+ */
+static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx,
+  uint64_t dst_va, uint64_t src_va,
+  unsigned size,
+  bool sync, bool raw_hazard_wait)
+{
+   struct radeon_winsys_cs *cs = rctx-cs;
+   uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
+   uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT : 0;
+
+   assert(size);
+   assert((size  ((121)-1)) == size);
+
+   cs-buf[cs-cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
+   cs-buf[cs-cdw++] = src_va;  /* SRC_ADDR_LO [31:0] 
*/
+   cs-buf[cs-cdw++] = sync_flag | ((src_va  32)  0xff); /* CP_SYNC 
[31] | SRC_ADDR_HI [7:0] */
+   cs-buf[cs-cdw++] = dst_va;  /* DST_ADDR_LO [31:0] 
*/
+   cs-buf[cs-cdw++] = (dst_va  32)  0xff; /* DST_ADDR_HI [7:0] */
+   cs-buf[cs-cdw++] = size | raw_wait; /* COMMAND [29:22] | 
BYTE_COUNT [20:0] */
+}
+
+/* Emit a CP DMA packet to clear a buffer. The size must fit in bits [20:0]. */
+static void si_emit_cp_dma_clear_buffer(struct r600_context *rctx,
+   uint64_t dst_va, unsigned size,
+   uint32_t clear_value,
+   bool sync, bool raw_hazard_wait)
+{
+   struct radeon_winsys_cs *cs = rctx-cs;
+   uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
+   uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT : 0;
+
+   assert(size);
+   assert((size  ((121)-1)) == size);
+
+   cs-buf[cs-cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
+   cs-buf[cs-cdw++] = clear_value; /* DATA [31:0] */
+   cs-buf[cs-cdw++] = sync_flag | PKT3_CP_DMA_SRC_SEL(2); /* CP_SYNC 
[31] | SRC_SEL[30:29] */
+   cs-buf[cs-cdw++] = dst_va;  /* DST_ADDR_LO [31:0] 
*/
+   cs-buf[cs-cdw++] = (dst_va  32)  0xff; /* DST_ADDR_HI [7:0] */
+   cs-buf[cs-cdw++] = size | raw_wait; /* COMMAND [29:22] | 
BYTE_COUNT [20:0] */
+}


Can we use some kind of macro or inline function instead of 
cs-buf[cs-cdw++]  ? That should help of we need to port that over to 
a different CS mechanism.


And IIRC the CP DMA is identical on all chipset generation (maybe 
excluding early R6xx, but I'm not 100% sure of that), so it might be a 
good idea to start sharing code again by putting this under 
src/gallium/drivers/radeon/radeon_cp_dma.c. Not necessary now, but 
more as a general idea. What do you think?


Christian.


+
+static void si_init_descriptors(struct r600_context *rctx,
+   struct si_descriptors *desc,
+   unsigned shader_userdata_reg,
+   unsigned element_dw_size,
+   unsigned num_elements,
+   void (*emit_func)(struct r600_context *ctx, 
struct si_atom *state))
+{
+   uint64_t va;
+
+   desc-atom.emit = emit_func;
+   desc-shader_userdata_reg = shader_userdata_reg;
+   desc-element_dw_size = element_dw_size;
+   desc-num_elements = num_elements;
+   desc-context_size = num_elements * element_dw_size * 4;
+
+   desc-buffer = (struct si_resource*)
+   pipe_buffer_create(rctx-context.screen, PIPE_BIND_CUSTOM,
+

Re: [Mesa-dev] radeonsi: LLVM r187139 broke some piglit tests

2013-08-15 Thread Michel Dänzer

On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote:
 On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote:
  On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote:
   
   LLVM revision 187139 ('Allocate local registers in order for optimal
   coloring.') broke some derivative related piglit tests with the radeonsi
   driver. 
   
   I'm attaching a diff between the bad and good generated code (as printed
   with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only
   difference I can see is in which registers are used in which order.
   
   I wonder if we might be missing S_WAITCNT after DS_READ/WRITE
   instructions in some cases, but I haven't spotted any candidates for
   that in the bad code which aren't there in the good code as well. Can
   anyone else spot something I've missed?
  
  Shouldn't we be using the S_BARRIER instruction to keep the threads in sync?
 
 Doesn't seem to help unfortunately, but thanks for the good suggestion.

I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR
register number instead of the $gds operand for encoding the GDS field
(the asm output from llc even shows the VGPR name). If the VGPR number
happens to be odd (i.e. to have the least significant bit set), the
shader ends up writing to GDS instead of LDS.

But I have no idea why this is happening, or how to fix it. :(


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

2013-08-15 Thread Marek Olšák

On Thu, Aug 15, 2013 at 9:33 AM, Michel Dänzer mic...@daenzer.net wrote:
 On Don, 2013-08-15 at 05:25 +0200, Marek Olšák wrote:
 (This should be applied before MSAA, which will need to be rebased.)

 It moves all sampler view descriptors to a buffer.
 It supports partial resource updates and it can also unbind resources
 (required for FMASK texturing).

 The buffer contains all sampler view descriptors for one shader stage,
 represented as an array. On top of that, there are N arrays in the buffer,
 which are used to emulate context registers as implemented by the previous
 ASICs (each array is a context).

 This uses the RCU synchronization approach to avoid read-after-write hazards
 as discussed in the thread:
 radeonsi: add FMASK texture binding slots and resource setup

 CP DMA is used to clear the descriptors at context initialization and to copy
 the descriptors from one context to the next.

 IMPORTANT:
   128 resource contexts are needed, 64 doesn't work.

 Hmm, I suppose this might depend on the specific GPU?

Very likely. I'm using SI VERDE.


   If I set SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are 
 needed.

 But that hurts performance?

I didn't test performance in that case, but presumably yes.



 The patch looks good to me, just a minor comment:


 +/* Emit a CP DMA packet to do a copy from one buffer to another.
 + * The size must fit in bits [20:0]. Notes:
 + *
 + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is 
 done.
 + *
 + * 2) Set raw_hazard_wait to true if the source data was used as a 
 destination
 + *in a previous CP DMA packet. It's for preventing a read-after-write 
 hazard
 + *between two CP DMA packets.
 + */
 +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx,
 +uint64_t dst_va, uint64_t src_va,
 +unsigned size,
 +bool sync, bool raw_hazard_wait)

 [...]

 + /* Copy the descriptors to a new context slot. */
 + si_emit_cp_dma_copy_buffer(rctx,
 +va_base + new_context_id * 
 desc-context_size,
 +va_base + desc-current_context_id * 
 desc-context_size,
 +desc-context_size, true, false);

 It's hard to remember / guess the meaning of such boolean parameters, so
 it might be better to use self-explanatory flags instead.

I was thinking of that too. Alright, I'll add the flags.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] gallium, intel: Implements new __DRI_IMAGE_USE_LINEAR and PIPE_BIND_LINEAR flags to enforce no tiling.

2013-08-15 Thread Axel Davy

Signed-off-by: Axel Davy axel.d...@ens.fr
---
 include/GL/internal/dri_interface.h | 1 +
 src/gallium/drivers/i915/i915_resource.c| 8 ++--
 src/gallium/drivers/ilo/ilo_resource.c  | 2 +-
 src/gallium/drivers/nv50/nv50_miptree.c | 3 +++
 src/gallium/drivers/nvc0/nvc0_miptree.c | 3 +++
 src/gallium/drivers/r300/r300_texture.c | 2 +-
 src/gallium/drivers/r600/r600_texture.c | 3 ++-
 src/gallium/drivers/radeonsi/r600_texture.c | 2 +-
 src/gallium/include/pipe/p_defines.h| 4 
 src/gallium/state_trackers/dri/drm/dri2.c   | 2 ++
 src/mesa/drivers/dri/i915/intel_screen.c| 3 +++
 src/mesa/drivers/dri/i965/intel_screen.c| 3 +++
 12 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index be31bb8..709fece 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -968,6 +968,7 @@ struct __DRIdri2ExtensionRec {
 #define __DRI_IMAGE_USE_SHARE  0x0001
 #define __DRI_IMAGE_USE_SCANOUT0x0002
 #define __DRI_IMAGE_USE_CURSOR 0x0004 /* Depricated */
+#define __DRI_IMAGE_USE_LINEAR 0x0008
 
 
 /**
diff --git a/src/gallium/drivers/i915/i915_resource.c 
b/src/gallium/drivers/i915/i915_resource.c
index 314ebe9..627ed2b 100644
--- a/src/gallium/drivers/i915/i915_resource.c
+++ b/src/gallium/drivers/i915/i915_resource.c
@@ -12,8 +12,12 @@ i915_resource_create(struct pipe_screen *screen,
if (template-target == PIPE_BUFFER)
   return i915_buffer_create(screen, template);
else
-  return i915_texture_create(screen, template, FALSE);
-
+   {
+  if (!(template-bind  PIPE_BIND_LINEAR))
+ return i915_texture_create(screen, template, FALSE);
+  else
+ return i915_texture_create(screen, template, TRUE);
+   }
 }
 
 static struct pipe_resource *
diff --git a/src/gallium/drivers/ilo/ilo_resource.c 
b/src/gallium/drivers/ilo/ilo_resource.c
index 5061f69..7dd3435 100644
--- a/src/gallium/drivers/ilo/ilo_resource.c
+++ b/src/gallium/drivers/ilo/ilo_resource.c
@@ -473,7 +473,7 @@ tex_layout_init_tiling(struct tex_layout *layout)
 * The cursor surface address must be 4K byte aligned. The cursor must
 *  be in linear memory, it cannot be tiled.
 */
-   if (unlikely(templ-bind  PIPE_BIND_CURSOR))
+   if (unlikely(templ-bind  (PIPE_BIND_CURSOR | PIPE_BIND_LINEAR)))
   valid_tilings = tile_none;
 
/*
diff --git a/src/gallium/drivers/nv50/nv50_miptree.c 
b/src/gallium/drivers/nv50/nv50_miptree.c
index 28be768..e44c843 100644
--- a/src/gallium/drivers/nv50/nv50_miptree.c
+++ b/src/gallium/drivers/nv50/nv50_miptree.c
@@ -326,6 +326,9 @@ nv50_miptree_create(struct pipe_screen *pscreen,
pipe_reference_init(pt-reference, 1);
pt-screen = pscreen;
 
+   if (pt-bind  PIPE_BIND_LINEAR)
+  pt-flags |= NOUVEAU_RESOURCE_FLAG_LINEAR;
+
bo_config.nv50.memtype = nv50_mt_choose_storage_type(mt, TRUE);
 
if (!nv50_miptree_init_ms_mode(mt)) {
diff --git a/src/gallium/drivers/nvc0/nvc0_miptree.c 
b/src/gallium/drivers/nvc0/nvc0_miptree.c
index 9e57d74..f359207 100644
--- a/src/gallium/drivers/nvc0/nvc0_miptree.c
+++ b/src/gallium/drivers/nvc0/nvc0_miptree.c
@@ -274,6 +274,9 @@ nvc0_miptree_create(struct pipe_screen *pscreen,
   }
}
 
+   if (pt-bind  PIPE_BIND_LINEAR)
+  pt-flags |= NOUVEAU_RESOURCE_FLAG_LINEAR;
+
bo_config.nvc0.memtype = nvc0_mt_choose_storage_type(mt, compressed);
 
if (!nvc0_miptree_init_ms_mode(mt)) {
diff --git a/src/gallium/drivers/r300/r300_texture.c 
b/src/gallium/drivers/r300/r300_texture.c
index 13e9bc3..b7fb081 100644
--- a/src/gallium/drivers/r300/r300_texture.c
+++ b/src/gallium/drivers/r300/r300_texture.c
@@ -1079,7 +1079,7 @@ struct pipe_resource *r300_texture_create(struct 
pipe_screen *screen,
 enum radeon_bo_layout microtile, macrotile;
 
 if ((base-flags  R300_RESOURCE_FLAG_TRANSFER) ||
-(base-bind  PIPE_BIND_SCANOUT)) {
+(base-bind  (PIPE_BIND_SCANOUT | PIPE_BIND_LINEAR))) {
 microtile = RADEON_LAYOUT_LINEAR;
 macrotile = RADEON_LAYOUT_LINEAR;
 } else {
diff --git a/src/gallium/drivers/r600/r600_texture.c 
b/src/gallium/drivers/r600/r600_texture.c
index 36cca17..b81a432 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -609,7 +609,8 @@ struct pipe_resource *r600_texture_create(struct 
pipe_screen *screen,
 * because 422 formats are used for videos, which prefer linear buffers
 * for fast uploads anyway. */
if (!(templ-flags  R600_RESOURCE_FLAG_TRANSFER) 
-   desc-layout != UTIL_FORMAT_LAYOUT_SUBSAMPLED) {
+   (desc-layout != UTIL_FORMAT_LAYOUT_SUBSAMPLED) 
+   !(templ-bind  PIPE_BIND_LINEAR)) {
if (templ-flags  R600_RESOURCE_FLAG_FORCE_TILING) {
array_mode = V_038000_ARRAY_2D_TILED_THIN1;
} else if (!(templ-bind

Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

2013-08-15 Thread Marek Olšák

On Thu, Aug 15, 2013 at 10:27 AM, Christian König
deathsim...@vodafone.de wrote:
 Am 15.08.2013 05:25, schrieb Marek Olšák:

 (This should be applied before MSAA, which will need to be rebased.)

 It moves all sampler view descriptors to a buffer.
 It supports partial resource updates and it can also unbind resources
 (required for FMASK texturing).

 The buffer contains all sampler view descriptors for one shader stage,
 represented as an array. On top of that, there are N arrays in the buffer,
 which are used to emulate context registers as implemented by the previous
 ASICs (each array is a context).

 This uses the RCU synchronization approach to avoid read-after-write
 hazards
 as discussed in the thread:
 radeonsi: add FMASK texture binding slots and resource setup

 CP DMA is used to clear the descriptors at context initialization and to
 copy
 the descriptors from one context to the next.

 IMPORTANT:
128 resource contexts are needed, 64 doesn't work. If I set
SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are
 needed.
I don't have an explanation for this.
 ---


 The idea itself looks really good to me, but we should probably also move
 the all resources and samplers to the new model and then rip out the code
 that stores them directly into the IB.

I'd like MSAA to land first, but yes, the plan is to eventually move
all resources and samplers to the new model.



 +/* Emit a CP DMA packet to do a copy from one buffer to another.
 + * The size must fit in bits [20:0]. Notes:
 + *
 + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is
 done.
 + *
 + * 2) Set raw_hazard_wait to true if the source data was used as a
 destination
 + *in a previous CP DMA packet. It's for preventing a read-after-write
 hazard
 + *between two CP DMA packets.
 + */
 +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx,
 +  uint64_t dst_va, uint64_t src_va,
 +  unsigned size,
 +  bool sync, bool raw_hazard_wait)
 +{
 +   struct radeon_winsys_cs *cs = rctx-cs;
 +   uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
 +   uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT :
 0;
 +
 +   assert(size);
 +   assert((size  ((121)-1)) == size);
 +
 +   cs-buf[cs-cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
 +   cs-buf[cs-cdw++] = src_va;/* SRC_ADDR_LO
 [31:0] */
 +   cs-buf[cs-cdw++] = sync_flag | ((src_va  32)  0xff); /*
 CP_SYNC [31] | SRC_ADDR_HI [7:0] */
 +   cs-buf[cs-cdw++] = dst_va;/* DST_ADDR_LO
 [31:0] */
 +   cs-buf[cs-cdw++] = (dst_va  32)  0xff; /* DST_ADDR_HI
 [7:0] */
 +   cs-buf[cs-cdw++] = size | raw_wait;   /* COMMAND [29:22]
 | BYTE_COUNT [20:0] */
 +}
 +
 +/* Emit a CP DMA packet to clear a buffer. The size must fit in bits
 [20:0]. */
 +static void si_emit_cp_dma_clear_buffer(struct r600_context *rctx,
 +   uint64_t dst_va, unsigned size,
 +   uint32_t clear_value,
 +   bool sync, bool raw_hazard_wait)
 +{
 +   struct radeon_winsys_cs *cs = rctx-cs;
 +   uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
 +   uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT :
 0;
 +
 +   assert(size);
 +   assert((size  ((121)-1)) == size);
 +
 +   cs-buf[cs-cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
 +   cs-buf[cs-cdw++] = clear_value;   /* DATA [31:0] */
 +   cs-buf[cs-cdw++] = sync_flag | PKT3_CP_DMA_SRC_SEL(2); /*
 CP_SYNC [31] | SRC_SEL[30:29] */
 +   cs-buf[cs-cdw++] = dst_va;/* DST_ADDR_LO
 [31:0] */
 +   cs-buf[cs-cdw++] = (dst_va  32)  0xff; /* DST_ADDR_HI
 [7:0] */
 +   cs-buf[cs-cdw++] = size | raw_wait;   /* COMMAND [29:22]
 | BYTE_COUNT [20:0] */
 +}


 Can we use some kind of macro or inline function instead of
 cs-buf[cs-cdw++]  ? That should help of we need to port that over to a
 different CS mechanism.

How about this?

static INLINE void
r600_write_value(struct radeon_winsys_cs *cs, unsigned value)
{
cs-buf[cs-cdw++] = value;
}



 And IIRC the CP DMA is identical on all chipset generation (maybe excluding
 early R6xx, but I'm not 100% sure of that), so it might be a good idea to
 start sharing code again by putting this under
 src/gallium/drivers/radeon/radeon_cp_dma.c. Not necessary now, but more as
 a general idea. What do you think?

I agree.

CP DMA is indeed identical on all chipsets. The copying is supported
since R600 and the clearing is supported since Evergreen.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] gallium, intel: Implements new __DRI_IMAGE_USE_LINEAR and PIPE_BIND_LINEAR flags to enforce no tiling.

2013-08-15 Thread Axel Davy

Signed-off-by: Axel Davy axel.d...@ens.fr
---
 include/GL/internal/dri_interface.h | 1 +
 src/gallium/drivers/i915/i915_resource.c| 8 ++--
 src/gallium/drivers/ilo/ilo_resource.c  | 2 +-
 src/gallium/drivers/nv50/nv50_miptree.c | 3 +++
 src/gallium/drivers/nvc0/nvc0_miptree.c | 3 +++
 src/gallium/drivers/r300/r300_texture.c | 2 +-
 src/gallium/drivers/r600/r600_texture.c | 3 ++-
 src/gallium/drivers/radeonsi/r600_texture.c | 2 +-
 src/gallium/include/pipe/p_defines.h| 4 
 src/gallium/state_trackers/dri/drm/dri2.c   | 2 ++
 src/mesa/drivers/dri/i915/intel_screen.c| 3 +++
 src/mesa/drivers/dri/i965/intel_screen.c| 3 +++
 12 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index be31bb8..709fece 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -968,6 +968,7 @@ struct __DRIdri2ExtensionRec {
 #define __DRI_IMAGE_USE_SHARE  0x0001
 #define __DRI_IMAGE_USE_SCANOUT0x0002
 #define __DRI_IMAGE_USE_CURSOR 0x0004 /* Depricated */
+#define __DRI_IMAGE_USE_LINEAR 0x0008
 
 
 /**
diff --git a/src/gallium/drivers/i915/i915_resource.c 
b/src/gallium/drivers/i915/i915_resource.c
index 314ebe9..627ed2b 100644
--- a/src/gallium/drivers/i915/i915_resource.c
+++ b/src/gallium/drivers/i915/i915_resource.c
@@ -12,8 +12,12 @@ i915_resource_create(struct pipe_screen *screen,
if (template-target == PIPE_BUFFER)
   return i915_buffer_create(screen, template);
else
-  return i915_texture_create(screen, template, FALSE);
-
+   {
+  if (!(template-bind  PIPE_BIND_LINEAR))
+ return i915_texture_create(screen, template, FALSE);
+  else
+ return i915_texture_create(screen, template, TRUE);
+   }
 }
 
 static struct pipe_resource *
diff --git a/src/gallium/drivers/ilo/ilo_resource.c 
b/src/gallium/drivers/ilo/ilo_resource.c
index 5061f69..7dd3435 100644
--- a/src/gallium/drivers/ilo/ilo_resource.c
+++ b/src/gallium/drivers/ilo/ilo_resource.c
@@ -473,7 +473,7 @@ tex_layout_init_tiling(struct tex_layout *layout)
 * The cursor surface address must be 4K byte aligned. The cursor must
 *  be in linear memory, it cannot be tiled.
 */
-   if (unlikely(templ-bind  PIPE_BIND_CURSOR))
+   if (unlikely(templ-bind  (PIPE_BIND_CURSOR | PIPE_BIND_LINEAR)))
   valid_tilings = tile_none;
 
/*
diff --git a/src/gallium/drivers/nv50/nv50_miptree.c 
b/src/gallium/drivers/nv50/nv50_miptree.c
index 28be768..e44c843 100644
--- a/src/gallium/drivers/nv50/nv50_miptree.c
+++ b/src/gallium/drivers/nv50/nv50_miptree.c
@@ -326,6 +326,9 @@ nv50_miptree_create(struct pipe_screen *pscreen,
pipe_reference_init(pt-reference, 1);
pt-screen = pscreen;
 
+   if (pt-bind  PIPE_BIND_LINEAR)
+  pt-flags |= NOUVEAU_RESOURCE_FLAG_LINEAR;
+
bo_config.nv50.memtype = nv50_mt_choose_storage_type(mt, TRUE);
 
if (!nv50_miptree_init_ms_mode(mt)) {
diff --git a/src/gallium/drivers/nvc0/nvc0_miptree.c 
b/src/gallium/drivers/nvc0/nvc0_miptree.c
index 9e57d74..f359207 100644
--- a/src/gallium/drivers/nvc0/nvc0_miptree.c
+++ b/src/gallium/drivers/nvc0/nvc0_miptree.c
@@ -274,6 +274,9 @@ nvc0_miptree_create(struct pipe_screen *pscreen,
   }
}
 
+   if (pt-bind  PIPE_BIND_LINEAR)
+  pt-flags |= NOUVEAU_RESOURCE_FLAG_LINEAR;
+
bo_config.nvc0.memtype = nvc0_mt_choose_storage_type(mt, compressed);
 
if (!nvc0_miptree_init_ms_mode(mt)) {
diff --git a/src/gallium/drivers/r300/r300_texture.c 
b/src/gallium/drivers/r300/r300_texture.c
index 13e9bc3..b7fb081 100644
--- a/src/gallium/drivers/r300/r300_texture.c
+++ b/src/gallium/drivers/r300/r300_texture.c
@@ -1079,7 +1079,7 @@ struct pipe_resource *r300_texture_create(struct 
pipe_screen *screen,
 enum radeon_bo_layout microtile, macrotile;
 
 if ((base-flags  R300_RESOURCE_FLAG_TRANSFER) ||
-(base-bind  PIPE_BIND_SCANOUT)) {
+(base-bind  (PIPE_BIND_SCANOUT | PIPE_BIND_LINEAR))) {
 microtile = RADEON_LAYOUT_LINEAR;
 macrotile = RADEON_LAYOUT_LINEAR;
 } else {
diff --git a/src/gallium/drivers/r600/r600_texture.c 
b/src/gallium/drivers/r600/r600_texture.c
index 36cca17..b81a432 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -609,7 +609,8 @@ struct pipe_resource *r600_texture_create(struct 
pipe_screen *screen,
 * because 422 formats are used for videos, which prefer linear buffers
 * for fast uploads anyway. */
if (!(templ-flags  R600_RESOURCE_FLAG_TRANSFER) 
-   desc-layout != UTIL_FORMAT_LAYOUT_SUBSAMPLED) {
+   (desc-layout != UTIL_FORMAT_LAYOUT_SUBSAMPLED) 
+   !(templ-bind  PIPE_BIND_LINEAR)) {
if (templ-flags  R600_RESOURCE_FLAG_FORCE_TILING) {
array_mode = V_038000_ARRAY_2D_TILED_THIN1;
} else if (!(templ-bind

Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

2013-08-15 Thread Christian König


Am 15.08.2013 12:54, schrieb Marek Olšák:

On Thu, Aug 15, 2013 at 10:27 AM, Christian König
deathsim...@vodafone.de wrote:

Am 15.08.2013 05:25, schrieb Marek Olšák:


(This should be applied before MSAA, which will need to be rebased.)

It moves all sampler view descriptors to a buffer.
It supports partial resource updates and it can also unbind resources
(required for FMASK texturing).

The buffer contains all sampler view descriptors for one shader stage,
represented as an array. On top of that, there are N arrays in the buffer,
which are used to emulate context registers as implemented by the previous
ASICs (each array is a context).

This uses the RCU synchronization approach to avoid read-after-write
hazards
as discussed in the thread:
radeonsi: add FMASK texture binding slots and resource setup

CP DMA is used to clear the descriptors at context initialization and to
copy
the descriptors from one context to the next.

IMPORTANT:
128 resource contexts are needed, 64 doesn't work. If I set
SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are
needed.
I don't have an explanation for this.
---


The idea itself looks really good to me, but we should probably also move
the all resources and samplers to the new model and then rip out the code
that stores them directly into the IB.

I'd like MSAA to land first, but yes, the plan is to eventually move
all resources and samplers to the new model.


Sounds good.




Can we use some kind of macro or inline function instead of
cs-buf[cs-cdw++]  ? That should help of we need to port that over to a
different CS mechanism.

How about this?

static INLINE void
r600_write_value(struct radeon_winsys_cs *cs, unsigned value)
{
 cs-buf[cs-cdw++] = value;
}


I would name it more like radeon_emit(cs, value) and put it in 
radeon_winsys.h, but yeah that's the general idea.


Christian.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nouveau/video: use correct parameter name

2013-08-15 Thread Rico Schüller


On 15.08.2013 02:10, Emil Velikov wrote:

Fix a typo introduced with commit d1ba1055d9 -
vl: Add support for max level query v2

Cc: Rico Schüller kgbric...@web.de
Cc: Christian König christian.koe...@amd.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68126
Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
---

FWIW the original patch could have introduced default params for all profiles,
and let the individual driver provide their own function to ease duplication.
A call to get_params(VIDEO_CAP_SUPPORTED) ensures that we do not request the
CAP_MAX_LEVEL of a unsupported profile


And what are the correct default values? Are these the values which most 
hardware can do (at this time) or the maximum values? (for reference 
only for H.264 see: http://en.wikipedia.org/wiki/H.264#Levels ) It's 
pretty much likely that some video decoding units may have different 
supported levels and then we would add that back in again. I'm not sure 
this is really better. Does something like the attached meet your needs?


Thanks for the fix.

Cheers
Rico




  src/gallium/drivers/nouveau/nouveau_video.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_video.c 
b/src/gallium/drivers/nouveau/nouveau_video.c
index 1563b22..5c4ec0f 100644
--- a/src/gallium/drivers/nouveau/nouveau_video.c
+++ b/src/gallium/drivers/nouveau/nouveau_video.c
@@ -863,7 +863,7 @@ nouveau_screen_get_video_param(struct pipe_screen *pscreen,
 case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE:
return true;
 case PIPE_VIDEO_CAP_MAX_LEVEL:
-  return vl_level_supported(screen, profile);
+  return vl_level_supported(pscreen, profile);
 default:
debug_printf(unknown video param: %d\n, param);
return 0;



commit d53abeff5304bb6148ddc1aae1d4810282042ee5
Author: Rico SchÃ¼ller kgbric...@web.de
Date:   Wed Aug 14 13:00:33 2013 +0200

vl/decoder: Use defaults for the supported level.

diff --git a/src/gallium/auxiliary/vl/vl_decoder.c b/src/gallium/auxiliary/vl/vl_decoder.c
index 16f09b5..68c6e4c 100644
--- a/src/gallium/auxiliary/vl/vl_decoder.c
+++ b/src/gallium/auxiliary/vl/vl_decoder.c
@@ -54,6 +54,20 @@ vl_level_supported(struct pipe_screen *screen, enum pipe_video_profile profile)
   case PIPE_VIDEO_PROFILE_MPEG2_SIMPLE:
   case PIPE_VIDEO_PROFILE_MPEG2_MAIN:
  return 3;
+  case PIPE_VIDEO_PROFILE_MPEG4_SIMPLE:
+ return 3;
+  case PIPE_VIDEO_PROFILE_MPEG4_ADVANCED_SIMPLE:
+ return 5;
+  case PIPE_VIDEO_PROFILE_VC1_SIMPLE:
+ return 1;
+  case PIPE_VIDEO_PROFILE_VC1_MAIN:
+ return 2;
+  case PIPE_VIDEO_PROFILE_VC1_ADVANCED:
+ return 4;
+  case PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE:
+  case PIPE_VIDEO_PROFILE_MPEG4_AVC_MAIN:
+  case PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH:
+ return 41;
   default:
  return 0;
}
diff --git a/src/gallium/auxiliary/vl/vl_decoder.h b/src/gallium/auxiliary/vl/vl_decoder.h
index b0b4161..cb4811d 100644
--- a/src/gallium/auxiliary/vl/vl_decoder.h
+++ b/src/gallium/auxiliary/vl/vl_decoder.h
@@ -38,7 +38,7 @@ bool
 vl_profile_supported(struct pipe_screen *screen, enum pipe_video_profile profile);
 
 /**
- * get the maximum supported level for the given profile with shader based decoding
+ * get the default maximum supported level for the given profile
  */
 int
 vl_level_supported(struct pipe_screen *screen, enum pipe_video_profile profile);
diff --git a/src/gallium/drivers/nv50/nv84_video.c b/src/gallium/drivers/nv50/nv84_video.c
index 3602a6d..0d1af8e 100644
--- a/src/gallium/drivers/nv50/nv84_video.c
+++ b/src/gallium/drivers/nv50/nv84_video.c
@@ -779,20 +779,7 @@ nv84_screen_get_video_param(struct pipe_screen *pscreen,
case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE:
   return false;
case PIPE_VIDEO_CAP_MAX_LEVEL:
-  switch (profile) {
-  case PIPE_VIDEO_PROFILE_MPEG1:
- return 0;
-  case PIPE_VIDEO_PROFILE_MPEG2_SIMPLE:
-  case PIPE_VIDEO_PROFILE_MPEG2_MAIN:
- return 3;
-  case PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE:
-  case PIPE_VIDEO_PROFILE_MPEG4_AVC_MAIN:
-  case PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH:
- return 41;
-  default:
- debug_printf(unknown video profile: %d\n, profile);
- return 0;
-  }
+  return vl_level_supported(pscreen, profile);
default:
   debug_printf(unknown video param: %d\n, param);
   return 0;
diff --git a/src/gallium/drivers/nvc0/nvc0_video.c b/src/gallium/drivers/nvc0/nvc0_video.c
index a871ab7..b987344 100644
--- a/src/gallium/drivers/nvc0/nvc0_video.c
+++ b/src/gallium/drivers/nvc0/nvc0_video.c
@@ -49,30 +49,7 @@ nvc0_screen_get_video_param(struct pipe_screen *pscreen,
case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE:
   return false;
case PIPE_VIDEO_CAP_MAX_LEVEL:
-  switch (profile) {
-  case PIPE_VIDEO_PROFILE_MPEG1:
- return 0;
-  case PIPE_VIDEO_PROFILE_MPEG2_SIMPLE:
-

Re: [Mesa-dev] [PATCH] gallivm: already pass coords in the right place in the sampler interface

2013-08-15 Thread Roland Scheidegger

Am 15.08.2013 10:16, schrieb Michel Dänzer:
 On Mit, 2013-08-14 at 18:35 +0200, srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com

 This makes things a bit nicer, and more importantly it fixes an issue
 where a downgraded array texture (due to view reduced to 1 layer and
 addressed with (non-array) samplec instruction) would use the wrong
 coord as shadow reference value. (This could also be fixed by passing
 target through the sampler interface much the same way as is done for
 size queries, might do this eventually anyway.)
 And if we'd ever want to support (shadow) cube map arrays, we'd need
 5 coords in any case.

 v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex
 using wrong shadow coord for 2d...). Plus need to project the shadow
 coord, and just for fun keep projecting the layer coord too.
 
 [...]
 
 @@ -1631,55 +1632,58 @@ emit_tex( struct lp_build_tgsi_soa_context *bld,
 }
  
 switch (inst-Texture.Texture) {
 -   case TGSI_TEXTURE_1D:
 -  num_coords = 1;
 -  num_offsets = 1;
 -  num_derivs = 1;
 -  break;
 case TGSI_TEXTURE_1D_ARRAY:
 -  num_coords = 2;
 +  layer_coord = 1;
 +  /* fallthrough */
 +   case TGSI_TEXTURE_1D:
num_offsets = 1;
num_derivs = 1;
break;
 +   case TGSI_TEXTURE_2D_ARRAY:
 +  layer_coord = 2;
 +  /* fallthrough */
 case TGSI_TEXTURE_2D:
 case TGSI_TEXTURE_RECT:
 -  num_coords = 2;
num_offsets = 2;
num_derivs = 2;
break;
 -   case TGSI_TEXTURE_SHADOW1D:
 case TGSI_TEXTURE_SHADOW1D_ARRAY:
 -  num_coords = 3;
 +  layer_coord = 1;
 +  /* fallthrough */
 +   case TGSI_TEXTURE_SHADOW1D:
 +  shadow_coord = 2;
num_offsets = 1;
num_derivs = 1;
break;
 +   case TGSI_TEXTURE_SHADOW2D_ARRAY:
 +  layer_coord = 2;
 +  shadow_coord = 3;
 +  num_offsets = 2;
 +  num_derivs = 2;
 +  break;
 case TGSI_TEXTURE_SHADOW2D:
 case TGSI_TEXTURE_SHADOWRECT:
 -   case TGSI_TEXTURE_2D_ARRAY:
 -  num_coords = 3;
 +  shadow_coord = 2;
num_offsets = 2;
num_derivs = 2;
break;
 case TGSI_TEXTURE_CUBE:
 -  num_coords = 3;
num_offsets = 2;
num_derivs = 3;
break;
 case TGSI_TEXTURE_3D:
 -  num_coords = 3;
num_offsets = 3;
num_derivs = 3;
break;
 -   case TGSI_TEXTURE_SHADOW2D_ARRAY:
 -  num_coords = 4;
 -  num_offsets = 2;
 -  num_derivs = 2;
 -  break;
 case TGSI_TEXTURE_SHADOWCUBE:
 -  num_coords = 4;
 +  shadow_coord = 3;
num_offsets = 2;
num_derivs = 3;
break;
 +   case TGSI_TEXTURE_CUBE_ARRAY:
 +   case TGSI_TEXTURE_SHADOWCUBE_ARRAY:
 +   case TGSI_TEXTURE_2D_MSAA:
 +   case TGSI_TEXTURE_2D_ARRAY_MSAA:
 default:
assert(0);
return;
 
 Couldn't some of this code be simplified using
 tgsi_util_get_texture_coord_dim()?
 
 

Hmm yes this code is a bunch of crap but whenever I try to simplify it
it just gets more complicated.
Maybe it was a mistake to adjust layer coord position here, could do
that in the sampling code itself, which would mean
tgsi_util_get_texture_coord_dim() would fit quite well. The different
handling of derivs/offsets though sort of looks required to not fetch
these components for explicit derivatives, texel offsets as for instance
cube maps have 3 coords but only 2 offsets etc. Though I wonder if it's
possible that these can end up as invalid pointers, I guess not so could
probably stop caring about them and just fetch more components than
potentially necessary (after all llvm will immediately drop them anyway
as they won't get used). That definitely would clean things up.

Roland
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nouveau/video: use correct parameter name

2013-08-15 Thread Emil Velikov

On 15/08/13 13:41, Rico Schüller wrote:
 On 15.08.2013 02:10, Emil Velikov wrote:
 Fix a typo introduced with commit d1ba1055d9 -
 vl: Add support for max level query v2

 Cc: Rico Schüller kgbric...@web.de
 Cc: Christian König christian.koe...@amd.com
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68126
 Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
 ---

 FWIW the original patch could have introduced default params for all
 profiles,
 and let the individual driver provide their own function to ease
 duplication.
 A call to get_params(VIDEO_CAP_SUPPORTED) ensures that we do not
 request the
 CAP_MAX_LEVEL of a unsupported profile

 And what are the correct default values? Are these the values which most
 hardware can do (at this time) or the maximum values? (for reference
 only for H.264 see: http://en.wikipedia.org/wiki/H.264#Levels ) It's
 pretty much likely that some video decoding units may have different
 supported levels and then we would add that back in again. I'm not sure
 this is really better. Does something like the attached meet your needs?

I was thinking that the ones used by most hardware can be considered
default. Anyway all this is a silly bikeshedding, which I could have
omitted.

Patch looks good, thanks.

Emil

 Thanks for the fix.

 Cheers
 Rico

   src/gallium/drivers/nouveau/nouveau_video.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/gallium/drivers/nouveau/nouveau_video.c
 b/src/gallium/drivers/nouveau/nouveau_video.c
 index 1563b22..5c4ec0f 100644
 --- a/src/gallium/drivers/nouveau/nouveau_video.c
 +++ b/src/gallium/drivers/nouveau/nouveau_video.c
 @@ -863,7 +863,7 @@ nouveau_screen_get_video_param(struct pipe_screen
 *pscreen,
  case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE:
 return true;
  case PIPE_VIDEO_CAP_MAX_LEVEL:
 -  return vl_level_supported(screen, profile);
 +  return vl_level_supported(pscreen, profile);
  default:
 debug_printf(unknown video param: %d\n, param);
 return 0;

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

2013-08-15 Thread Alex Deucher

On Wed, Aug 14, 2013 at 11:25 PM, Marek Olšák mar...@gmail.com wrote:
 (This should be applied before MSAA, which will need to be rebased.)

 It moves all sampler view descriptors to a buffer.
 It supports partial resource updates and it can also unbind resources
 (required for FMASK texturing).

 The buffer contains all sampler view descriptors for one shader stage,
 represented as an array. On top of that, there are N arrays in the buffer,
 which are used to emulate context registers as implemented by the previous
 ASICs (each array is a context).

 This uses the RCU synchronization approach to avoid read-after-write hazards
 as discussed in the thread:
 radeonsi: add FMASK texture binding slots and resource setup

 CP DMA is used to clear the descriptors at context initialization and to copy
 the descriptors from one context to the next.

 IMPORTANT:
   128 resource contexts are needed, 64 doesn't work. If I set
   SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are needed.
   I don't have an explanation for this.
 ---
  src/gallium/drivers/radeonsi/Makefile.sources  |   1 +
  src/gallium/drivers/radeonsi/r600_blit.c   |  12 +-
  src/gallium/drivers/radeonsi/r600_hw_context.c |  14 ++
  src/gallium/drivers/radeonsi/radeonsi_pipe.c   |   7 +-
  src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  19 +-
  src/gallium/drivers/radeonsi/si_descriptors.c  | 335 
 +
  src/gallium/drivers/radeonsi/si_state.c|  47 ++--
  src/gallium/drivers/radeonsi/si_state.h|  56 +
  src/gallium/drivers/radeonsi/si_state_draw.c   |  18 +-
  src/gallium/drivers/radeonsi/sid.h |  43 
  10 files changed, 500 insertions(+), 52 deletions(-)
  create mode 100644 src/gallium/drivers/radeonsi/si_descriptors.c

 diff --git a/src/gallium/drivers/radeonsi/Makefile.sources 
 b/src/gallium/drivers/radeonsi/Makefile.sources
 index b3ffa72..68c8282 100644
 --- a/src/gallium/drivers/radeonsi/Makefile.sources
 +++ b/src/gallium/drivers/radeonsi/Makefile.sources
 @@ -10,6 +10,7 @@ C_SOURCES := \
 r600_translate.c \
 radeonsi_pm4.c \
 radeonsi_compute.c \
 +   si_descriptors.c \
 si_state.c \
 si_state_streamout.c \
 si_state_draw.c \
 diff --git a/src/gallium/drivers/radeonsi/r600_blit.c 
 b/src/gallium/drivers/radeonsi/r600_blit.c
 index bab108e..5bd1a62 100644
 --- a/src/gallium/drivers/radeonsi/r600_blit.c
 +++ b/src/gallium/drivers/radeonsi/r600_blit.c
 @@ -70,12 +70,12 @@ static void r600_blitter_begin(struct pipe_context *ctx, 
 enum r600_blitter_op op

 if (op  R600_SAVE_TEXTURES) {
 util_blitter_save_fragment_sampler_states(
 -   rctx-blitter, rctx-ps_samplers.n_samplers,
 -   (void**)rctx-ps_samplers.samplers);
 +   rctx-blitter, 
 rctx-samplers[PIPE_SHADER_FRAGMENT].n_samplers,
 +   
 (void**)rctx-samplers[PIPE_SHADER_FRAGMENT].samplers);

 -   util_blitter_save_fragment_sampler_views(
 -   rctx-blitter, rctx-ps_samplers.n_views,
 -   (struct pipe_sampler_view**)rctx-ps_samplers.views);
 +   util_blitter_save_fragment_sampler_views(rctx-blitter,
 +   
 util_bitcount(rctx-samplers[PIPE_SHADER_FRAGMENT].views.desc.enabled_mask),
 +   rctx-samplers[PIPE_SHADER_FRAGMENT].views.views);
 }

 if ((op  R600_DISABLE_RENDER_COND)  rctx-current_render_cond) {
 @@ -224,7 +224,7 @@ void si_flush_depth_textures(struct r600_context *rctx,
 struct pipe_sampler_view *view;
 struct r600_texture *tex;

 -   view = textures-views[i]-base;
 +   view = textures-views.views[i];
 if (!view) continue;

 tex = (struct r600_texture *)view-texture;
 diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c 
 b/src/gallium/drivers/radeonsi/r600_hw_context.c
 index 25c972b..cf43089 100644
 --- a/src/gallium/drivers/radeonsi/r600_hw_context.c
 +++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
 @@ -114,9 +114,17 @@ err:
  void si_need_cs_space(struct r600_context *ctx, unsigned num_dw,
 boolean count_draw_in)
  {
 +   int i;
 +
 /* The number of dwords we already used in the CS so far. */
 num_dw += ctx-cs-cdw;

 +   for (i = 0; i  SI_NUM_ATOMS(ctx); i++) {
 +   if (ctx-atoms.array[i]-dirty) {
 +   num_dw += ctx-atoms.array[i]-num_dw;
 +   }
 +   }
 +
 if (count_draw_in) {
 /* The number of dwords all the dirty states would take. */
 num_dw += ctx-pm4_dirty_cdwords;
 @@ -254,6 +262,10 @@ void si_context_flush(struct r600_context *ctx, unsigned 
 flags)
 ctx-pm4_dirty_cdwords = 0;
 ctx-flags = 0;

 +   /* The CS initialization should be emitted before everything else. */
 +

[Mesa-dev] Bison 3 fixes

2013-08-15 Thread Armin K.

Can we have Bison 3 fixes in 9.2 branch so they make it into 9.2 release?

eb7c8c7fb6e49a04f3fe84a6d438160dc4a14ac0
f043381334a0760ec118d07b6fb7785b5692572a
de917b4c4c4dfc949d5f8e3d9eb2dd48b63a3de5
6d2a9220b832d9a0c0cf35fcc5b9de1542af267d
5ffa28df4e4cc22481b4ed41c78632f35765f41d
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Bison 3 fixes

2013-08-15 Thread Armin K.

On 08/15/2013 03:55 PM, Armin K. wrote:
 Can we have Bison 3 fixes in 9.2 branch so they make it into 9.2 release?
 
 eb7c8c7fb6e49a04f3fe84a6d438160dc4a14ac0
 f043381334a0760ec118d07b6fb7785b5692572a
 de917b4c4c4dfc949d5f8e3d9eb2dd48b63a3de5
 6d2a9220b832d9a0c0cf35fcc5b9de1542af267d
 5ffa28df4e4cc22481b4ed41c78632f35765f41d
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
 

It appears that only last one is missing. Others have been applied to
9.2 branch.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] osmesa: Copy shared library to LIB_DIR to fix two symlinks

2013-08-15 Thread Armin K

---
 src/mesa/drivers/osmesa/Makefile.am | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/osmesa/Makefile.am 
b/src/mesa/drivers/osmesa/Makefile.am
index a1c0bb5..d625faa 100644
--- a/src/mesa/drivers/osmesa/Makefile.am
+++ b/src/mesa/drivers/osmesa/Makefile.am
@@ -56,6 +56,7 @@ all-local: lib@OSMESA_LIB@.la
$(MKDIR_P) $(top_builddir)/$(LIB_DIR);
ln -f .libs/lib@OSMESA_LIB@.so 
$(top_builddir)/$(LIB_DIR)/lib@OSMESA_LIB@.so;
ln -f .libs/lib@OSMESA_LIB@.so.@OSMESA_VERSION@ 
$(top_builddir)/$(LIB_DIR)/lib@OSMESA_LIB@.so.@OSMESA_VERSION@;
+   cp .libs/lib@OSMESA_LIB@.so.@OSMESA_VERSION@.0.0 
$(top_builddir)/$(LIB_DIR)/
 endif
 
 pkgconfigdir = $(libdir)/pkgconfig
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] llvmpipe: fix stencil bug if we have both stencil and depth tests

2013-08-15 Thread sroland

From: Roland Scheidegger srol...@vmware.com

This is a very well hidden bug found by accident (only the fixed glean
tstencil2 test so far seems to hit it).
We must use new mask with combined s_pass values and orig_mask values
for zpass/zfail stencil ops, otherwise both the sfail op and one of
zpass/zfail op are applied (probably not hit in most tests because
some of the ops tend to be KEEP usually).

Note: this is a candidate for the 9.2 branch.
---
 src/gallium/drivers/llvmpipe/lp_bld_depth.c |   27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_bld_depth.c 
b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
index 06556dc..5c13ee5 100644
--- a/src/gallium/drivers/llvmpipe/lp_bld_depth.c
+++ b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
@@ -836,7 +836,7 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm,
LLVMValueRef stencil_vals = NULL;
LLVMValueRef z_bitmask = NULL, stencil_shift = NULL;
LLVMValueRef z_pass = NULL, s_pass_mask = NULL;
-   LLVMValueRef orig_mask = lp_build_mask_value(mask);
+   LLVMValueRef current_mask = lp_build_mask_value(mask);
LLVMValueRef front_facing = NULL;
boolean have_z, have_s;
 
@@ -984,7 +984,7 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm,
 
   /* apply stencil-fail operator */
   {
- LLVMValueRef s_fail_mask = lp_build_andnot(s_bld, orig_mask, 
s_pass_mask);
+ LLVMValueRef s_fail_mask = lp_build_andnot(s_bld, current_mask, 
s_pass_mask);
  stencil_vals = lp_build_stencil_op(s_bld, stencil, S_FAIL_OP,
 stencil_refs, stencil_vals,
 s_fail_mask, front_facing);
@@ -1032,6 +1032,11 @@ lp_build_depth_stencil_test(struct gallivm_state 
*gallivm,
   /* compare src Z to dst Z, returning 'pass' mask */
   z_pass = lp_build_cmp(z_bld, depth-func, z_src, z_dst);
 
+  /* mask off bits that failed stencil test */
+  if (s_pass_mask) {
+ current_mask = LLVMBuildAnd(builder, current_mask, s_pass_mask, );
+  }
+
   if (!stencil[0].enabled) {
  /* We can potentially skip all remaining operations here, but only
   * if stencil is disabled because we still need to update the stencil
@@ -1041,25 +1046,19 @@ lp_build_depth_stencil_test(struct gallivm_state 
*gallivm,
 
  if (do_branch) {
 lp_build_mask_check(mask);
-do_branch = FALSE;
  }
   }
 
   if (depth-writemask) {
- LLVMValueRef zselectmask;
+ LLVMValueRef z_pass_mask;
 
  /* mask off bits that failed Z test */
- zselectmask = LLVMBuildAnd(builder, orig_mask, z_pass, );
-
- /* mask off bits that failed stencil test */
- if (s_pass_mask) {
-zselectmask = LLVMBuildAnd(builder, zselectmask, s_pass_mask, );
- }
+ z_pass_mask = LLVMBuildAnd(builder, current_mask, z_pass, );
 
  /* Mix the old and new Z buffer values.
   * z_dst[i] = zselectmask[i] ? z_src[i] : z_dst[i]
   */
- z_dst = lp_build_select(z_bld, zselectmask, z_src, z_dst);
+ z_dst = lp_build_select(z_bld, z_pass_mask, z_src, z_dst);
   }
 
   if (stencil[0].enabled) {
@@ -1067,13 +1066,13 @@ lp_build_depth_stencil_test(struct gallivm_state 
*gallivm,
  LLVMValueRef z_fail_mask, z_pass_mask;
 
  /* apply Z-fail operator */
- z_fail_mask = lp_build_andnot(s_bld, orig_mask, z_pass);
+ z_fail_mask = lp_build_andnot(s_bld, current_mask, z_pass);
  stencil_vals = lp_build_stencil_op(s_bld, stencil, Z_FAIL_OP,
 stencil_refs, stencil_vals,
 z_fail_mask, front_facing);
 
  /* apply Z-pass operator */
- z_pass_mask = LLVMBuildAnd(builder, orig_mask, z_pass, );
+ z_pass_mask = LLVMBuildAnd(builder, current_mask, z_pass, );
  stencil_vals = lp_build_stencil_op(s_bld, stencil, Z_PASS_OP,
 stencil_refs, stencil_vals,
 z_pass_mask, front_facing);
@@ -1083,7 +1082,7 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm,
   /* No depth test: apply Z-pass operator to stencil buffer values which
* passed the stencil test.
*/
-  s_pass_mask = LLVMBuildAnd(builder, orig_mask, s_pass_mask, );
+  s_pass_mask = LLVMBuildAnd(builder, current_mask, s_pass_mask, );
   stencil_vals = lp_build_stencil_op(s_bld, stencil, Z_PASS_OP,
  stencil_refs, stencil_vals,
  s_pass_mask, front_facing);
-- 
1.7.9.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

2013-08-15 Thread Michel Dänzer

On Don, 2013-08-15 at 12:36 +0200, Marek Olšák wrote:
 On Thu, Aug 15, 2013 at 9:33 AM, Michel Dänzer mic...@daenzer.net wrote:
  On Don, 2013-08-15 at 05:25 +0200, Marek Olšák wrote:
  (This should be applied before MSAA, which will need to be rebased.)
 
  It moves all sampler view descriptors to a buffer.
  It supports partial resource updates and it can also unbind resources
  (required for FMASK texturing).
 
  The buffer contains all sampler view descriptors for one shader stage,
  represented as an array. On top of that, there are N arrays in the buffer,
  which are used to emulate context registers as implemented by the previous
  ASICs (each array is a context).
 
  This uses the RCU synchronization approach to avoid read-after-write 
  hazards
  as discussed in the thread:
  radeonsi: add FMASK texture binding slots and resource setup
 
  CP DMA is used to clear the descriptors at context initialization and to 
  copy
  the descriptors from one context to the next.
 
  IMPORTANT:
128 resource contexts are needed, 64 doesn't work.
 
  Hmm, I suppose this might depend on the specific GPU?
 
 Very likely. I'm using SI VERDE.

Then it seems likely more will be needed for Pitcairn or Tahiti due to
multiple shader engines?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Removing egl_glx?

2013-08-15 Thread Chad Versace


On 08/14/2013 10:04 PM, Chia-I Wu wrote:

On Thu, Aug 15, 2013 at 10:03 AM, Kenneth Graunke kenn...@whitecape.org wrote:

On 08/08/2013 03:13 PM, Chad Versace wrote:
[snip]


By the way, I talked to krh today, and he suggested that we delete egl_glx
rather than allow it to bitrot.



I'm in favor, but I don't know who uses that.

GLX on EGL would be an interesting experiment.  EGL on GLX is not useful,
IMHO.

That sounds good to me.


If Dr Jakob also agrees, then let's plan to kill egl_glx for 9.3/10.0.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/3] i965/gen7: Set MOCS L3 cacheability for IVB/BYT

2013-08-15 Thread Chad Versace


On 08/14/2013 12:50 AM, Ville Syrjälä wrote:

On Wed, Aug 14, 2013 at 10:45:23AM +0300, Ville Syrjälä wrote:

On Tue, Aug 13, 2013 at 05:46:55PM -0700, Chad Versace wrote:

On 08/13/2013 03:31 PM, Vedran Rodic wrote:

On Mon, Aug 12, 2013 at 3:07 PM,  ville.syrj...@linux.intel.com wrote:

From: Ville Syrjälä ville.syrj...@linux.intel.com




For L3 cacheability, IVB won't consult the PTE for anything that has a
relevant MOCS field. So even if you make everything L3 cacheable through
the PTEs, MOCS will always override it. How do i know you ask? Well,
BSpec says so for one, and more importantly I verified this by running
some tests [...]



I suspected this. Thanks for sharing your experimental evidence.



For LLC cachebility the story will be different because there IVB MOCS
can only say LLC cacheable or consult the PTE. So to make stuff
uncached in LLC on IVB, we'd need to issues the set_caching ioctl to
change the PTE to uncached, and after that we could use just the MOCS
to select the LLC caching policy. Since the set_caching ioctl only needs
to be issued once per object (or you could use the , there


Sorry hit send by accident. Was going to say we could use the new
create2 ioctl Chris has proposed that allows you to set the cache mode
when creating the object. So there won't be a performance hit from
extra ioctls getting issued all the time.



I would like such a cache-control ioctl, as long the ioctl can also
be used to change the object's cacheing policy in addition to
setting it at object creation. This would be
needed when an object's usage oscillates between texture surface
and render target.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 67962] undefined reference to `wayland_drm_buffer_get'

2013-08-15 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=67962

U. Artie Eoff ullysses.a.e...@intel.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from U. Artie Eoff ullysses.a.e...@intel.com ---
Fixed in commit
http://cgit.freedesktop.org/mesa/mesa/commit/?id=f423eba46e080b975a2b8366b490d99dee4729ad

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 67962] undefined reference to `wayland_drm_buffer_get'

2013-08-15 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=67962

U. Artie Eoff ullysses.a.e...@intel.com changed:

   What|Removed |Added

 Status|RESOLVED|VERIFIED

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] radeonsi: LLVM r187139 broke some piglit tests

2013-08-15 Thread Tom Stellard

On Thu, Aug 15, 2013 at 11:55:36AM +0200, Michel Dänzer wrote:
 On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote:
  On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote:
   On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote:

LLVM revision 187139 ('Allocate local registers in order for optimal
coloring.') broke some derivative related piglit tests with the radeonsi
driver. 

I'm attaching a diff between the bad and good generated code (as printed
with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only
difference I can see is in which registers are used in which order.

I wonder if we might be missing S_WAITCNT after DS_READ/WRITE
instructions in some cases, but I haven't spotted any candidates for
that in the bad code which aren't there in the good code as well. Can
anyone else spot something I've missed?
   
   Shouldn't we be using the S_BARRIER instruction to keep the threads in 
   sync?
  
  Doesn't seem to help unfortunately, but thanks for the good suggestion.
 
 I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR
 register number instead of the $gds operand for encoding the GDS field
 (the asm output from llc even shows the VGPR name). If the VGPR number
 happens to be odd (i.e. to have the least significant bit set), the
 shader ends up writing to GDS instead of LDS.


Ouch, that's a pretty bad bug.

 But I have no idea why this is happening, or how to fix it. :(
 
 

I can take a look at it.

-Tom

 -- 
 Earthling Michel Dänzer   |   http://www.amd.com
 Libre software enthusiast |  Debian, X and DRI developer
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] llvmpipe: fix stencil bug if we have both stencil and depth tests

2013-08-15 Thread Zack Rusin

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 This is a very well hidden bug found by accident (only the fixed glean
 tstencil2 test so far seems to hit it).
 We must use new mask with combined s_pass values and orig_mask values
 for zpass/zfail stencil ops, otherwise both the sfail op and one of
 zpass/zfail op are applied (probably not hit in most tests because
 some of the ops tend to be KEEP usually).
 
 Note: this is a candidate for the 9.2 branch.

Looks good
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: Enable LTO by default on i965/libdricore release builds.

2013-08-15 Thread Eric Anholt

Chad Versace chad.vers...@linux.intel.com writes:

 On 08/08/2013 01:43 PM, Eric Anholt wrote:
 We can't just smash it on globally due to (probably resolvable) issues
 with the asm in glapi.  And we don't want to penalize developers with
 longer build times for their normal debug environment.

 Due to libdricore making almost all of our symbols public, the effect is
 very small -- cairo-gl with INTEL_NO_HW=1 shows -0.798709% +/- 0.333703%
 change in runtime (n=30).
 ---

 If we were to avoid dricore, there's an additional 5% improvement available
 (see the megadriver branch of my tree).

   configure.ac  | 25 +
   src/mesa/Makefile.am  |  4 ++--
   src/mesa/drivers/dri/i965/Makefile.am |  1 +
   src/mesa/libdricore/Makefile.am   |  8 +++-
   src/mesa/program/Makefile.am  |  4 ++--
   5 files changed, 37 insertions(+), 5 deletions(-)

 diff --git a/configure.ac b/configure.ac
 index 62d06e0..26c230d 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -314,6 +314,7 @@ AC_ARG_ENABLE([debug],
   [enable_debug=$enableval],
   [enable_debug=no]
   )
 +enable_lto=yes
   if test x$enable_debug = xyes; then
   DEFINES_FOR_BUILD=$DEFINES_FOR_BUILD -DDEBUG
   if test x$GCC_FOR_BUILD = xyes; then
 @@ -330,7 +331,31 @@ if test x$enable_debug = xyes; then
   if test x$GXX = xyes; then
   CXXFLAGS=$CXXFLAGS -g -O0
   fi
 +
 +# Disable LTO by default on debug builds, since it's so expensive at
 +# compile time.
 +enable_lto=no
 +fi

 I'd like to emit a configuration error if someone tries to enable LTO
 in a debug build. According the gcc-4.8 manpage, that's a really bad
 idea.

 Link-time optimization does not work well with generation of
 debugging information.  Combining -flto with -g is currently
 experimental and expected to produce wrong results.

 Other than that, the patch looks good to me.

Yeah, debug symbols appear to be quite scrambled.  As is, I don't think
we can ship with it -- breaking backtraces (and thus profiling) is not
acceptable.


pgpkub4MzCftN.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] configure: link against -lLLVM to determine build type

2013-08-15 Thread Julien Cristau

On Wed, Aug 14, 2013 at 09:46:44 -0700, Tom Stellard wrote:

 On Wed, Aug 14, 2013 at 09:08:55AM +0200, Maarten Lankhorst wrote:
  Op 14-08-13 04:37, Tom Stellard schreef:
   On Tue, Aug 13, 2013 at 05:53:52PM +0200, Maarten Lankhorst wrote:
   Fixes a build failure of 9.2 on ubuntu, because libLLVM-3.3.so is not 
   present in /usr/lib/llvm-3.2/lib.
  
   I'm trying to understand the problem here, could you give a little more
   information about how Ubuntu packages LLVM?  Where are the LLVM
   libraries installed and what does llvm-config --libdir --ldflags report?
  libLLVM-3.3.so gets installed in /usr/lib/x86_64-linux-gnu/libLLVM-3.3.so,
  there is no /usr/lib/llvm-3.2/lib/libLLVM-3.3.so
  All the other libs still get installed to /usr/lib/llvm-3.3/lib.
  
  ~$ llvm-config-3.3 --libdir --ldflags
  /usr/lib/llvm-3.3/lib
  -L/usr/lib/llvm-3.3/lib  -lpthread -lffi -ldl -lm
 
 So, if /usr/lib/x86_64-linux-gnu/ is not in the ldflags, then how will
 the linker find it?
 
It's in the default search path.

Cheers,
Julien
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] osmesa: Copy shared library to LIB_DIR to fix two symlinks

2013-08-15 Thread Matt Turner

On Thu, Aug 15, 2013 at 7:19 AM, Armin K kre...@email.com wrote:
 ---
  src/mesa/drivers/osmesa/Makefile.am | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/src/mesa/drivers/osmesa/Makefile.am 
 b/src/mesa/drivers/osmesa/Makefile.am
 index a1c0bb5..d625faa 100644
 --- a/src/mesa/drivers/osmesa/Makefile.am
 +++ b/src/mesa/drivers/osmesa/Makefile.am
 @@ -56,6 +56,7 @@ all-local: lib@OSMESA_LIB@.la
 $(MKDIR_P) $(top_builddir)/$(LIB_DIR);
 ln -f .libs/lib@OSMESA_LIB@.so 
 $(top_builddir)/$(LIB_DIR)/lib@OSMESA_LIB@.so;
 ln -f .libs/lib@OSMESA_LIB@.so.@OSMESA_VERSION@ 
 $(top_builddir)/$(LIB_DIR)/lib@OSMESA_LIB@.so.@OSMESA_VERSION@;
 +   cp .libs/lib@OSMESA_LIB@.so.@OSMESA_VERSION@.0.0 
 $(top_builddir)/$(LIB_DIR)/
  endif

  pkgconfigdir = $(libdir)/pkgconfig
 --
 1.8.3.4

Just ln -f it like the others.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/3] i965/gen7: Set MOCS L3 cacheability for IVB/BYT

2013-08-15 Thread Ville Syrjälä

On Thu, Aug 15, 2013 at 08:08:12AM -0700, Chad Versace wrote:
 On 08/14/2013 12:50 AM, Ville Syrjälä wrote:
  On Wed, Aug 14, 2013 at 10:45:23AM +0300, Ville Syrjälä wrote:
  On Tue, Aug 13, 2013 at 05:46:55PM -0700, Chad Versace wrote:
  On 08/13/2013 03:31 PM, Vedran Rodic wrote:
  On Mon, Aug 12, 2013 at 3:07 PM,  ville.syrj...@linux.intel.com wrote:
  From: Ville Syrjälä ville.syrj...@linux.intel.com
 
 
  For L3 cacheability, IVB won't consult the PTE for anything that has a
  relevant MOCS field. So even if you make everything L3 cacheable through
  the PTEs, MOCS will always override it. How do i know you ask? Well,
  BSpec says so for one, and more importantly I verified this by running
  some tests [...]
 
 
 I suspected this. Thanks for sharing your experimental evidence.
 
 
  For LLC cachebility the story will be different because there IVB MOCS
  can only say LLC cacheable or consult the PTE. So to make stuff
  uncached in LLC on IVB, we'd need to issues the set_caching ioctl to
  change the PTE to uncached, and after that we could use just the MOCS
  to select the LLC caching policy. Since the set_caching ioctl only needs
  to be issued once per object (or you could use the , there
 
  Sorry hit send by accident. Was going to say we could use the new
  create2 ioctl Chris has proposed that allows you to set the cache mode
  when creating the object. So there won't be a performance hit from
  extra ioctls getting issued all the time.
 
 
 I would like such a cache-control ioctl, as long the ioctl can also
 be used to change the object's cacheing policy in addition to
 setting it at object creation. This would be
 needed when an object's usage oscillates between texture surface
 and render target.

We do have the set_caching ioctl. It's enough to flip the PTEs to UC and
let MOCS manage things. I actually did a few experiments on my IVB. I
made all Mesa's buffers UC via PTEs by patching libdrm to change the
cache mode of each bo after allocation. Then I fiddled with the MOCS
LLC bits in various ways. It definitely has an effect, sometimes making
things slower, sometimes faster. xonotic again seemed to benefit. IIRC
leaving everything LLC uncached was actually the fastest (w/ high quality
at least) so we may be thrashing the LLC a bit there. But eg. reaction
quake regressed quite a lot if most things were left as UC.

I should probably run through a few MOCS combinations and collect a bit
more data. But it's looking like some sensible heuristic has to be
involved since different benchmarks show conflicting results. Maybe
your LLC overcommit prevention approach would be the one. Are you
planning to continue with that work?

-- 
Ville Syrjälä
Intel OTC
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] radeonsi: LLVM r187139 broke some piglit tests

2013-08-15 Thread Tom Stellard

On Thu, Aug 15, 2013 at 08:22:39AM -0700, Tom Stellard wrote:
 On Thu, Aug 15, 2013 at 11:55:36AM +0200, Michel Dänzer wrote:
  On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote:
   On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote:
On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote:
 
 LLVM revision 187139 ('Allocate local registers in order for optimal
 coloring.') broke some derivative related piglit tests with the 
 radeonsi
 driver. 
 
 I'm attaching a diff between the bad and good generated code (as 
 printed
 with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only
 difference I can see is in which registers are used in which order.
 
 I wonder if we might be missing S_WAITCNT after DS_READ/WRITE
 instructions in some cases, but I haven't spotted any candidates for
 that in the bad code which aren't there in the good code as well. Can
 anyone else spot something I've missed?

Shouldn't we be using the S_BARRIER instruction to keep the threads in 
sync?
   
   Doesn't seem to help unfortunately, but thanks for the good suggestion.
  
  I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR
  register number instead of the $gds operand for encoding the GDS field
  (the asm output from llc even shows the VGPR name). If the VGPR number
  happens to be odd (i.e. to have the least significant bit set), the
  shader ends up writing to GDS instead of LDS.
 
 
 Ouch, that's a pretty bad bug.
 
  But I have no idea why this is happening, or how to fix it. :(
  
  
 
 I can take a look at it.


The attached patch should fix the problem, can you test?

-Tom
From c837737e2807b523fffac2ddf9aa9d8f293cf622 Mon Sep 17 00:00:00 2001
From: Tom Stellard thomas.stell...@amd.com
Date: Thu, 15 Aug 2013 09:03:08 -0700
Subject: [PATCH] R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions

The SIInsertWaits pass was overwriting the first operand (gds bit) of
DS_WRITE_B32 with the second operand (value to write).  This meant that
any time the value to write was stored in an odd number VGPR, the gds
bit would be set causing the instruction to write to GDS instead of LDS.
---
 lib/Target/R600/SIInsertWaits.cpp | 4 +---
 lib/Target/R600/SIInstrInfo.td| 8 
 lib/Target/R600/SIInstructions.td | 6 +++---
 test/CodeGen/R600/local-memory.ll | 4 ++--
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/lib/Target/R600/SIInsertWaits.cpp 
b/lib/Target/R600/SIInsertWaits.cpp
index ba202e3..c477be5 100644
--- a/lib/Target/R600/SIInsertWaits.cpp
+++ b/lib/Target/R600/SIInsertWaits.cpp
@@ -134,9 +134,7 @@ Counters SIInsertWaits::getHwCounts(MachineInstr MI) {
   // LGKM may uses larger values
   if (TSFlags  SIInstrFlags::LGKM_CNT) {
 
-MachineOperand Op = MI.getOperand(0);
-if (!Op.isReg())
-  Op = MI.getOperand(1);
+const MachineOperand Op = MI.getOperand(0);
 assert(Op.isReg()  First LGKM operand must be a register!);
 
 unsigned Reg = Op.getReg();
diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index ecc4718..1965ba0 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -342,8 +342,8 @@ class VOP3_64 bits9 op, string opName, listdag 
pattern : VOP3 
 class DS_Load_Helper bits8 op, string asm, RegisterClass regClass : DS 
   op,
   (outs regClass:$vdst),
-  (ins i1imm:$gds, VReg_32:$addr, VReg_32:$data0, VReg_32:$data1,
-   i8imm:$offset0, i8imm:$offset1),
+  (ins VReg_32:$addr, VReg_32:$data0, VReg_32:$data1,
+   i8imm:$offset0, i8imm:$offset1, i1imm:$gds),
   asm# $vdst, $gds, $addr, $data0, $data1, $offset0, $offset1, [M0],
   [] {
   let mayLoad = 1;
@@ -353,8 +353,8 @@ class DS_Load_Helper bits8 op, string asm, RegisterClass 
regClass : DS 
 class DS_Store_Helper bits8 op, string asm, RegisterClass regClass : DS 
   op,
   (outs),
-  (ins i1imm:$gds, VReg_32:$addr, VReg_32:$data0, VReg_32:$data1,
-   i8imm:$offset0, i8imm:$offset1),
+  (ins VReg_32:$addr, VReg_32:$data0, VReg_32:$data1,
+   i8imm:$offset0, i8imm:$offset1, i1imm:$gds),
   asm# $gds, $addr, $data0, $data1, $offset0, $offset1, [M0],
   [] {
   let mayStore = 1;
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index 4eb3566..9856fa6 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1745,13 +1745,13 @@ def : Pat 
 
 def : Pat 
 (local_load i64:$src0),
-(i32 (DS_READ_B32 0, (EXTRACT_SUBREG $src0, sub0),
-  (EXTRACT_SUBREG $src0, sub0), (EXTRACT_SUBREG $src0, 
sub0), 0, 0))
+(i32 (DS_READ_B32 (EXTRACT_SUBREG $src0, sub0),
+  (EXTRACT_SUBREG $src0, sub0), (EXTRACT_SUBREG $src0, 
sub0), 0, 0, 0))
 ;
 
 def : Pat 
 (local_store i32:$src1, i64:$src0),
-(DS_WRITE_B32 0, (EXTRACT_SUBREG $src0, sub0), $src1, $src1, 0, 0)
+(DS_WRITE_B32 (EXTRACT_SUBREG $src0, sub0), $src1, $src1, 0, 0, 0)
 ;

[Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

2013-08-15 Thread Marek Olšák

(This should be applied before MSAA, which will need to be rebased.)

It moves all sampler view descriptors to a buffer.
It supports partial resource updates and it can also unbind resources
(required for FMASK texturing).

The buffer contains all sampler view descriptors for one shader stage,
represented as an array. On top of that, there are N arrays in the buffer,
which are used to emulate context registers as implemented by the previous
ASICs (each array is a context).

This uses the RCU synchronization approach to avoid read-after-write hazards
as discussed in the thread:
radeonsi: add FMASK texture binding slots and resource setup

CP DMA is used to clear the descriptors at context initialization and to copy
the descriptors from one context to the next.

v2: - use PKT3_DMA_DATA on CIK (I'll test CIK later)
- turn the bool CP DMA parameters into self-explanatory flags
- add a nice simple API for packet emission to radeon_winsys.h
- use 256 contexts, 128 causes texture corruption in openarena

DISCUSSION:
  Maybe there is a synchronization issue and we don't actually need so many
  contexts? We can always flush KCACHE at the end of the ring if needed.
---
 src/gallium/drivers/radeonsi/Makefile.sources  |   1 +
 src/gallium/drivers/radeonsi/r600_blit.c   |  12 +-
 src/gallium/drivers/radeonsi/r600_hw_context.c |  22 +-
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |   7 +-
 src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  19 +-
 src/gallium/drivers/radeonsi/si_descriptors.c  | 355 +
 src/gallium/drivers/radeonsi/si_state.c|  47 +---
 src/gallium/drivers/radeonsi/si_state.h|  56 
 src/gallium/drivers/radeonsi/si_state_draw.c   |  18 +-
 src/gallium/drivers/radeonsi/sid.h |  54 
 src/gallium/winsys/radeon/drm/radeon_winsys.h  |  12 +
 11 files changed, 547 insertions(+), 56 deletions(-)
 create mode 100644 src/gallium/drivers/radeonsi/si_descriptors.c

diff --git a/src/gallium/drivers/radeonsi/Makefile.sources 
b/src/gallium/drivers/radeonsi/Makefile.sources
index b3ffa72..68c8282 100644
--- a/src/gallium/drivers/radeonsi/Makefile.sources
+++ b/src/gallium/drivers/radeonsi/Makefile.sources
@@ -10,6 +10,7 @@ C_SOURCES := \
r600_translate.c \
radeonsi_pm4.c \
radeonsi_compute.c \
+   si_descriptors.c \
si_state.c \
si_state_streamout.c \
si_state_draw.c \
diff --git a/src/gallium/drivers/radeonsi/r600_blit.c 
b/src/gallium/drivers/radeonsi/r600_blit.c
index bab108e..bdd9bb4 100644
--- a/src/gallium/drivers/radeonsi/r600_blit.c
+++ b/src/gallium/drivers/radeonsi/r600_blit.c
@@ -70,12 +70,12 @@ static void r600_blitter_begin(struct pipe_context *ctx, 
enum r600_blitter_op op
 
if (op  R600_SAVE_TEXTURES) {
util_blitter_save_fragment_sampler_states(
-   rctx-blitter, rctx-ps_samplers.n_samplers,
-   (void**)rctx-ps_samplers.samplers);
+   rctx-blitter, 
rctx-samplers[PIPE_SHADER_FRAGMENT].n_samplers,
+   (void**)rctx-samplers[PIPE_SHADER_FRAGMENT].samplers);
 
-   util_blitter_save_fragment_sampler_views(
-   rctx-blitter, rctx-ps_samplers.n_views,
-   (struct pipe_sampler_view**)rctx-ps_samplers.views);
+   util_blitter_save_fragment_sampler_views(rctx-blitter,
+   
util_last_bit(rctx-samplers[PIPE_SHADER_FRAGMENT].views.desc.enabled_mask),
+   rctx-samplers[PIPE_SHADER_FRAGMENT].views.views);
}
 
if ((op  R600_DISABLE_RENDER_COND)  rctx-current_render_cond) {
@@ -224,7 +224,7 @@ void si_flush_depth_textures(struct r600_context *rctx,
struct pipe_sampler_view *view;
struct r600_texture *tex;
 
-   view = textures-views[i]-base;
+   view = textures-views.views[i];
if (!view) continue;
 
tex = (struct r600_texture *)view-texture;
diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c 
b/src/gallium/drivers/radeonsi/r600_hw_context.c
index 25c972b..bc6ba0b 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -114,9 +114,17 @@ err:
 void si_need_cs_space(struct r600_context *ctx, unsigned num_dw,
boolean count_draw_in)
 {
+   int i;
+
/* The number of dwords we already used in the CS so far. */
num_dw += ctx-cs-cdw;
 
+   for (i = 0; i  SI_NUM_ATOMS(ctx); i++) {
+   if (ctx-atoms.array[i]-dirty) {
+   num_dw += ctx-atoms.array[i]-num_dw;
+   }
+   }
+
if (count_draw_in) {
/* The number of dwords all the dirty states would take. */
num_dw += ctx-pm4_dirty_cdwords;
@@ -254,6 +262,15 @@ void si_context_flush(struct r600_context *ctx, unsigned 
flags)
ctx-pm4_dirty_cdwords

Re: [Mesa-dev] [PATCH 2/3] i965/gen7: Set MOCS L3 cacheability for IVB/BYT

2013-08-15 Thread Chad Versace


On 08/15/2013 09:11 AM, Ville Syrjälä wrote:

On Thu, Aug 15, 2013 at 08:08:12AM -0700, Chad Versace wrote:



I would like such a cache-control ioctl, as long the ioctl can also
be used to change the object's cacheing policy in addition to
setting it at object creation. This would be
needed when an object's usage oscillates between texture surface
and render target.


We do have the set_caching ioctl. It's enough to flip the PTEs to UC and
let MOCS manage things. I actually did a few experiments on my IVB. I
made all Mesa's buffers UC via PTEs by patching libdrm to change the
cache mode of each bo after allocation. Then I fiddled with the MOCS
LLC bits in various ways. It definitely has an effect, sometimes making
things slower, sometimes faster. xonotic again seemed to benefit. IIRC
leaving everything LLC uncached was actually the fastest (w/ high quality
at least) so we may be thrashing the LLC a bit there. But eg. reaction
quake regressed quite a lot if most things were left as UC.

I should probably run through a few MOCS combinations and collect a bit
more data. But it's looking like some sensible heuristic has to be
involved since different benchmarks show conflicting results. Maybe
your LLC overcommit prevention approach would be the one. Are you
planning to continue with that work?


I do plan to continue that work. I plan to return to it the week of
Aug 26, because I need to first make more progress on Broadwell.

My simple heuristic that prevents overcommit of the
LLC, in its current form, gives varying results too. Some benchmarks benefit, 
some harmed.
In each experiment, I set the LLC commit threshhold to 0.80,
1.00, or 1.25. (That is, for a given draw call, Mesa stops putting objects
in the LLC when the draw call has filled that ratio of the LLC).

Hopefully, to get consistent benefit across all apps, all we need is to choose 
a significantly
higher or lower threshold than I've previously chosen. What I fear, though, is 
that
since the GPU shares the LLC with the CPU (GPU-LLC=CPU-L3), to find a heuristic 
that's
near-globally beneficial, we may need to consider the CPU load to intelligently 
choose the LLC commit threshold
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [RFC] Consolidate the remaining source files to Makefile.sources

2013-08-15 Thread Emil Velikov

Hello list

Feeling inspired by the automake work done in mesa, I felt like
finishing a few things that did not receive too much attention
 * use Makefile.sources wherever possible
 * cleanup the duplicated C{,PP,XX}FLAGS and factor out the the common
ones into Automake.inc

If anyone is interested I have pushed my preliminary work to the
makefile.sources branch at https://github.com/evelikov/Mesa/

Currently I have gone through the following:
 gallium/drivers
 gallium/state_trackers/dri/drm
 gallium/state_trackers/dri/sw
 gallium/state_trackers/vega
 gallium/state_trackers/xorg

Producing the following diff stat:
 54 files changed, 352 insertions(+), 552 deletions(-)


Pros:
* patches such as gallium/dri-targets: hide all symbols except for
__driDriverExtensions will be brought down to changing only with 2-3 lines
* one less chance of breaking builds when someone forgets to update the
SCons/Makefile.am/etc build, after adding/removing a file

Cons:
* Non nouveau changes will be only(lacking any other the hardware atm).
Note that I will check the symbols for all drivers, to ensure that I do
not cause chaos.

Curious if anyone is interested and have any comments on this.
Note: there may be some git rebase fails as I've started this ~3months ago

Cheers
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 02/15] i965/fs: Don't kill ACP entries due to their generating instruction.

2013-08-15 Thread Paul Berry

On 12 August 2013 13:11, Kenneth Graunke kenn...@whitecape.org wrote:

 fs_copy_prop_dataflow::setup_kills() walks through each basic block's
 instructions, looking for instructions which overwrite registers used
 in ACP entries.

 This would be fine, except that it didn't exclude the instructions
 which generated the ACP entries in the first place.  This meant that
 every copy was killed by its own generating instruction, making the
 dataflow analysis useless.


I don't think this is actually a problem, because the bd[b].kills bits are
only used to prevent liveness information from being propagated from
bd[b].livein to bd[b].liveout, and the fs_copy_prop_dataflow constructor
initializes bd[b].liveout with the set of ACP's that are generated from
block b.  So the only situation in which your patch would allow liveness
information to be propagated from bd[b].livein to bd[b].liveout is for bits
in bd[b].liveout that are already set.

Or am I misunderstanding things?



 To fix this, this patch records the generating instruction in the
 ACP entry.  It then skips the kill checks when processing that
 instruction.

 Using a void pointer ensures it will only be used for comparisons,
 and not dereferenced.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 index d8d1546..379605c 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 @@ -42,6 +42,9 @@ namespace { /* avoid conflict with
 opt_copy_propagation_elements */
  struct acp_entry : public exec_node {
 fs_reg dst;
 fs_reg src;
 +
 +   /** MOV instruction which generated this copy/ACP entry. */
 +   void *inst;
  };

  struct block_data {
 @@ -146,8 +149,9 @@ fs_copy_prop_dataflow::setup_kills()
  continue;

   for (int i = 0; i  num_acp; i++) {
 -if (inst-overwrites_reg(acp[i]-dst) ||
 -inst-overwrites_reg(acp[i]-src)) {
 +if (inst != acp[i]-inst 
 +(inst-overwrites_reg(acp[i]-dst) ||
 + inst-overwrites_reg(acp[i]-src))) {
 BITSET_SET(bd[b].kill, i);
  }
   }
 @@ -418,6 +422,7 @@ fs_visitor::opt_copy_propagate_local(void *mem_ctx,
 bblock_t *block,
  acp_entry *entry = ralloc(mem_ctx, acp_entry);
  entry-dst = inst-dst;
  entry-src = inst-src[0];
 + entry-inst = inst;
  acp[entry-dst.reg % ACP_HASH_SIZE].push_tail(entry);
}
 }
 --
 1.8.3.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] draw: handle nan clipdistance

2013-08-15 Thread Zack Rusin

If clipdistance for one of the vertices is nan (or inf) then the
entire primitive should be discarded.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_cliptest_tmp.h |2 +-
 src/gallium/auxiliary/draw/draw_llvm.c |3 ++
 src/gallium/auxiliary/draw/draw_pipe_clip.c|   13 +-
 src/gallium/auxiliary/gallivm/lp_bld_arit.c|   53 
 src/gallium/auxiliary/gallivm/lp_bld_arit.h|   11 +
 5 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h 
b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
index e4500db..fc54810 100644
--- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
+++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
@@ -140,7 +140,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs,
  clipdist = out-data[cd[0]][i];
   else
  clipdist = out-data[cd[1]][i-4];
-  if (clipdist  0)
+  if (clipdist  0 || util_is_inf_or_nan(clipdist))
  mask |= 1  plane_idx;
} else {
   if (dot4(clipvertex, plane[plane_idx])  0)
diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 84e3392..1e9eadb 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1261,6 +1261,7 @@ generate_clipmask(struct draw_llvm *llvm,
if (clip_user) {
   LLVMValueRef planes_ptr = draw_jit_context_planes(gallivm, context_ptr);
   LLVMValueRef indices[3];
+  LLVMValueRef is_nan;
 
   /* userclip planes */
   while (ucp_enable) {
@@ -1280,6 +1281,8 @@ generate_clipmask(struct draw_llvm *llvm,
clipdist = LLVMBuildLoad(builder, outputs[cd[1]][i-4], );
 }
 test = lp_build_compare(gallivm, f32_type, PIPE_FUNC_GREATER, 
zero, clipdist);
+is_nan = lp_build_is_inf_or_nan(gallivm, vs_type, clipdist);
+test = LLVMBuildOr(builder, test, is_nan, );
 temp = lp_build_const_int_vec(gallivm, i32_type, 1  plane_idx);
 test = LLVMBuildAnd(builder, test, temp, );
 mask = LLVMBuildOr(builder, mask, test, );
diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c 
b/src/gallium/auxiliary/draw/draw_pipe_clip.c
index b76e9a5..2f2aadb 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_clip.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c
@@ -104,7 +104,7 @@ static void interp_attr( float dst[4],
 float t,
 const float in[4],
 const float out[4] )
-{  
+{
dst[0] = LINTERP( t, out[0], in[0] );
dst[1] = LINTERP( t, out[1], in[1] );
dst[2] = LINTERP( t, out[2], in[2] );
@@ -380,6 +380,9 @@ do_clip_tri( struct draw_stage *stage,
   dp_prev = getclipdist(clipper, vert_prev, plane_idx);
   clipmask = ~(1plane_idx);
 
+  if (util_is_inf_or_nan(dp_prev))
+ return; //discard nan
+
   assert(n  MAX_CLIPPED_VERTICES);
   if (n = MAX_CLIPPED_VERTICES)
  return;
@@ -392,6 +395,9 @@ do_clip_tri( struct draw_stage *stage,
 
  float dp = getclipdist(clipper, vert, plane_idx);
 
+ if (util_is_inf_or_nan(dp))
+return; //discard nan
+
 if (!IS_NEGATIVE(dp_prev)) {
 assert(outcount  MAX_CLIPPED_VERTICES);
 if (outcount = MAX_CLIPPED_VERTICES)
@@ -522,6 +528,9 @@ do_clip_line( struct draw_stage *stage,
   const float dp0 = getclipdist(clipper, v0, plane_idx);
   const float dp1 = getclipdist(clipper, v1, plane_idx);
 
+  if (util_is_inf_or_nan(dp0) || util_is_inf_or_nan(dp1))
+ return; //discard nan
+
   if (dp1  0.0F) {
 float t = dp1 / (dp1 - dp0);
  t1 = MAX2(t1, t);
@@ -594,7 +603,7 @@ clip_tri( struct draw_stage *stage,
unsigned clipmask = (header-v[0]-clipmask | 
 header-v[1]-clipmask | 
 header-v[2]-clipmask);
-   
+
if (clipmask == 0) {
   /* no clipping needed */
   stage-next-tri( stage-next, header );
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c 
b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
index 98409c3..72b563e 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
@@ -3671,3 +3671,56 @@ lp_build_isfinite(struct lp_build_context *bld,
return lp_build_compare(bld-gallivm, int_type, PIPE_FUNC_NOTEQUAL,
intx, infornan32);
 }
+
+/*
+ * Returns true if the number is nan or inf or false otherwise.
+ * The input has to be a floating point vector.
+ */
+LLVMValueRef
+lp_build_is_inf_or_nan(struct gallivm_state *gallivm,
+   const struct lp_type type,
+   LLVMValueRef x)
+{
+   LLVMBuilderRef builder = gallivm-builder;
+   struct lp_type int_type = lp_int_type(type);
+

Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

2013-08-15 Thread Christian König


Am 15.08.2013 19:01, schrieb Marek Olšák:

(This should be applied before MSAA, which will need to be rebased.)

It moves all sampler view descriptors to a buffer.
It supports partial resource updates and it can also unbind resources
(required for FMASK texturing).

The buffer contains all sampler view descriptors for one shader stage,
represented as an array. On top of that, there are N arrays in the buffer,
which are used to emulate context registers as implemented by the previous
ASICs (each array is a context).

This uses the RCU synchronization approach to avoid read-after-write hazards
as discussed in the thread:
radeonsi: add FMASK texture binding slots and resource setup

CP DMA is used to clear the descriptors at context initialization and to copy
the descriptors from one context to the next.

v2: - use PKT3_DMA_DATA on CIK (I'll test CIK later)
 - turn the bool CP DMA parameters into self-explanatory flags
 - add a nice simple API for packet emission to radeon_winsys.h
 - use 256 contexts, 128 causes texture corruption in openarena

DISCUSSION:
   Maybe there is a synchronization issue and we don't actually need so many
   contexts? We can always flush KCACHE at the end of the ring if needed.


Well you definitely need to flush the texture cache when changing the 
descriptors, but I assume you already do so.


Going to dig up the documentation for it, but 256 indeed sounds a bit much.

Christian.


---
  src/gallium/drivers/radeonsi/Makefile.sources  |   1 +
  src/gallium/drivers/radeonsi/r600_blit.c   |  12 +-
  src/gallium/drivers/radeonsi/r600_hw_context.c |  22 +-
  src/gallium/drivers/radeonsi/radeonsi_pipe.c   |   7 +-
  src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  19 +-
  src/gallium/drivers/radeonsi/si_descriptors.c  | 355 +
  src/gallium/drivers/radeonsi/si_state.c|  47 +---
  src/gallium/drivers/radeonsi/si_state.h|  56 
  src/gallium/drivers/radeonsi/si_state_draw.c   |  18 +-
  src/gallium/drivers/radeonsi/sid.h |  54 
  src/gallium/winsys/radeon/drm/radeon_winsys.h  |  12 +
  11 files changed, 547 insertions(+), 56 deletions(-)
  create mode 100644 src/gallium/drivers/radeonsi/si_descriptors.c

diff --git a/src/gallium/drivers/radeonsi/Makefile.sources 
b/src/gallium/drivers/radeonsi/Makefile.sources
index b3ffa72..68c8282 100644
--- a/src/gallium/drivers/radeonsi/Makefile.sources
+++ b/src/gallium/drivers/radeonsi/Makefile.sources
@@ -10,6 +10,7 @@ C_SOURCES := \
r600_translate.c \
radeonsi_pm4.c \
radeonsi_compute.c \
+   si_descriptors.c \
si_state.c \
si_state_streamout.c \
si_state_draw.c \
diff --git a/src/gallium/drivers/radeonsi/r600_blit.c 
b/src/gallium/drivers/radeonsi/r600_blit.c
index bab108e..bdd9bb4 100644
--- a/src/gallium/drivers/radeonsi/r600_blit.c
+++ b/src/gallium/drivers/radeonsi/r600_blit.c
@@ -70,12 +70,12 @@ static void r600_blitter_begin(struct pipe_context *ctx, 
enum r600_blitter_op op
  
  	if (op  R600_SAVE_TEXTURES) {

util_blitter_save_fragment_sampler_states(
-   rctx-blitter, rctx-ps_samplers.n_samplers,
-   (void**)rctx-ps_samplers.samplers);
+   rctx-blitter, 
rctx-samplers[PIPE_SHADER_FRAGMENT].n_samplers,
+   (void**)rctx-samplers[PIPE_SHADER_FRAGMENT].samplers);
  
-		util_blitter_save_fragment_sampler_views(

-   rctx-blitter, rctx-ps_samplers.n_views,
-   (struct pipe_sampler_view**)rctx-ps_samplers.views);
+   util_blitter_save_fragment_sampler_views(rctx-blitter,
+   
util_last_bit(rctx-samplers[PIPE_SHADER_FRAGMENT].views.desc.enabled_mask),
+   rctx-samplers[PIPE_SHADER_FRAGMENT].views.views);
}
  
  	if ((op  R600_DISABLE_RENDER_COND)  rctx-current_render_cond) {

@@ -224,7 +224,7 @@ void si_flush_depth_textures(struct r600_context *rctx,
struct pipe_sampler_view *view;
struct r600_texture *tex;
  
-		view = textures-views[i]-base;

+   view = textures-views.views[i];
if (!view) continue;
  
  		tex = (struct r600_texture *)view-texture;

diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c 
b/src/gallium/drivers/radeonsi/r600_hw_context.c
index 25c972b..bc6ba0b 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -114,9 +114,17 @@ err:
  void si_need_cs_space(struct r600_context *ctx, unsigned num_dw,
boolean count_draw_in)
  {
+   int i;
+
/* The number of dwords we already used in the CS so far. */
num_dw += ctx-cs-cdw;
  
+	for (i = 0; i  SI_NUM_ATOMS(ctx); i++) {

+   if (ctx-atoms.array[i]-dirty) {
+   num_dw += ctx-atoms.array[i]-num_dw;
+   }
+   }
+
if (count_draw_in) {

Re: [Mesa-dev] radeonsi: LLVM r187139 broke some piglit tests

2013-08-15 Thread Michel Dänzer

On Don, 2013-08-15 at 09:16 -0700, Tom Stellard wrote:
 On Thu, Aug 15, 2013 at 08:22:39AM -0700, Tom Stellard wrote:
  On Thu, Aug 15, 2013 at 11:55:36AM +0200, Michel Dänzer wrote:
   On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote:
On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote:
 On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote:
  
  LLVM revision 187139 ('Allocate local registers in order for optimal
  coloring.') broke some derivative related piglit tests with the 
  radeonsi
  driver. 
  
  I'm attaching a diff between the bad and good generated code (as 
  printed
  with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only
  difference I can see is in which registers are used in which order.
  
  I wonder if we might be missing S_WAITCNT after DS_READ/WRITE
  instructions in some cases, but I haven't spotted any candidates for
  that in the bad code which aren't there in the good code as well. 
  Can
  anyone else spot something I've missed?
 
 Shouldn't we be using the S_BARRIER instruction to keep the threads 
 in sync?

Doesn't seem to help unfortunately, but thanks for the good suggestion.
   
   I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR
   register number instead of the $gds operand for encoding the GDS field
   (the asm output from llc even shows the VGPR name). If the VGPR number
   happens to be odd (i.e. to have the least significant bit set), the
   shader ends up writing to GDS instead of LDS.
  
  
  Ouch, that's a pretty bad bug.
  
   But I have no idea why this is happening, or how to fix it. :(
   
   
  
  I can take a look at it.
 
 The attached patch should fix the problem, can you test?

Thanks for finding my silly mistake.

However, I'd like to preserve the ability to use these instructions for
GDS access, and the logic in SIInsertWaits::getHwCounts() only really
makes sense for SMRD anyway.

How about this patch instead? It fixes the piglit regressions that
prompted me to start this thread.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer
From 4ad4d9bba31b06bbadb83fead3e53d989b80d83e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michel=20D=C3=A4nzer?= michel.daen...@amd.com
Date: Thu, 15 Aug 2013 19:43:02 +0200
Subject: [PATCH] R600/SI: Fix broken encoding of DS_WRITE_B32
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD
instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused
it to corrupt the encoding of that by clobbering the first operand with
the second one.

Undo that damage and only apply the SMRD logic to that.

Fixes some derivates related piglit regressions with radeonsi.

Signed-off-by: Michel Dänzer michel.daen...@amd.com
---
 lib/Target/R600/SIInsertWaits.cpp | 22 ++
 test/CodeGen/R600/local-memory.ll |  4 ++--
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp
index ba202e3..200e064 100644
--- a/lib/Target/R600/SIInsertWaits.cpp
+++ b/lib/Target/R600/SIInsertWaits.cpp
@@ -134,14 +134,20 @@ Counters SIInsertWaits::getHwCounts(MachineInstr MI) {
   // LGKM may uses larger values
   if (TSFlags  SIInstrFlags::LGKM_CNT) {
 
-MachineOperand Op = MI.getOperand(0);
-if (!Op.isReg())
-  Op = MI.getOperand(1);
-assert(Op.isReg()  First LGKM operand must be a register!);
-
-unsigned Reg = Op.getReg();
-unsigned Size = TRI-getMinimalPhysRegClass(Reg)-getSize();
-Result.Named.LGKM = Size  4 ? 2 : 1;
+if (MI.getNumOperands() == 3) {
+
+  // SMRD
+  MachineOperand Op = MI.getOperand(0);
+  assert(Op.isReg()  First LGKM operand must be a register!);
+
+  unsigned Reg = Op.getReg();
+  unsigned Size = TRI-getMinimalPhysRegClass(Reg)-getSize();
+  Result.Named.LGKM = Size  4 ? 2 : 1;
+
+} else {
+  // DS
+  Result.Named.LGKM = 1;
+}
 
   } else {
 Result.Named.LGKM = 0;
diff --git a/test/CodeGen/R600/local-memory.ll b/test/CodeGen/R600/local-memory.ll
index 5458fb9..9ebb769 100644
--- a/test/CodeGen/R600/local-memory.ll
+++ b/test/CodeGen/R600/local-memory.ll
@@ -13,7 +13,7 @@
 ; SI-CHECK-NEXT: .long 32768
 
 ; EG-CHECK: LDS_WRITE
-; SI-CHECK: DS_WRITE_B32
+; SI-CHECK: DS_WRITE_B32 0
 
 ; GROUP_BARRIER must be the last instruction in a clause
 ; EG-CHECK: GROUP_BARRIER
@@ -21,7 +21,7 @@
 ; SI-CHECK: S_BARRIER
 
 ; EG-CHECK: LDS_READ_RET
-; SI-CHECK: DS_READ_B32
+; SI-CHECK: DS_READ_B32 {{VGPR[0-9]+}}, 0
 
 define void @local_memory(i32 addrspace(1)* %out) {
 entry:
-- 
1.8.4.rc2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] draw: handle nan clipdistance

2013-08-15 Thread Roland Scheidegger

Am 15.08.2013 19:12, schrieb Zack Rusin:
 If clipdistance for one of the vertices is nan (or inf) then the
 entire primitive should be discarded.
 
 Signed-off-by: Zack Rusin za...@vmware.com
 ---
  src/gallium/auxiliary/draw/draw_cliptest_tmp.h |2 +-
  src/gallium/auxiliary/draw/draw_llvm.c |3 ++
  src/gallium/auxiliary/draw/draw_pipe_clip.c|   13 +-
  src/gallium/auxiliary/gallivm/lp_bld_arit.c|   53 
 
  src/gallium/auxiliary/gallivm/lp_bld_arit.h|   11 +
  5 files changed, 79 insertions(+), 3 deletions(-)
 
 diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h 
 b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
 index e4500db..fc54810 100644
 --- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
 +++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
 @@ -140,7 +140,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs,
   clipdist = out-data[cd[0]][i];
else
   clipdist = out-data[cd[1]][i-4];
 -  if (clipdist  0)
 +  if (clipdist  0 || util_is_inf_or_nan(clipdist))
   mask |= 1  plane_idx;
 } else {
if (dot4(clipvertex, plane[plane_idx])  0)
 diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
 b/src/gallium/auxiliary/draw/draw_llvm.c
 index 84e3392..1e9eadb 100644
 --- a/src/gallium/auxiliary/draw/draw_llvm.c
 +++ b/src/gallium/auxiliary/draw/draw_llvm.c
 @@ -1261,6 +1261,7 @@ generate_clipmask(struct draw_llvm *llvm,
 if (clip_user) {
LLVMValueRef planes_ptr = draw_jit_context_planes(gallivm, 
 context_ptr);
LLVMValueRef indices[3];
 +  LLVMValueRef is_nan;
  
/* userclip planes */
while (ucp_enable) {
 @@ -1280,6 +1281,8 @@ generate_clipmask(struct draw_llvm *llvm,
 clipdist = LLVMBuildLoad(builder, outputs[cd[1]][i-4], );
  }
  test = lp_build_compare(gallivm, f32_type, PIPE_FUNC_GREATER, 
 zero, clipdist);
 +is_nan = lp_build_is_inf_or_nan(gallivm, vs_type, clipdist);
I think this should either use is_inf_or_nan (if it only cares about
nan, though I guess it really does care about inf too) or else the var
should be named is_inf_or_nan.

 +test = LLVMBuildOr(builder, test, is_nan, );
  temp = lp_build_const_int_vec(gallivm, i32_type, 1  plane_idx);
  test = LLVMBuildAnd(builder, test, temp, );
  mask = LLVMBuildOr(builder, mask, test, );
 diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c 
 b/src/gallium/auxiliary/draw/draw_pipe_clip.c
 index b76e9a5..2f2aadb 100644
 --- a/src/gallium/auxiliary/draw/draw_pipe_clip.c
 +++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c
 @@ -104,7 +104,7 @@ static void interp_attr( float dst[4],
float t,
const float in[4],
const float out[4] )
 -{  
 +{
 dst[0] = LINTERP( t, out[0], in[0] );
 dst[1] = LINTERP( t, out[1], in[1] );
 dst[2] = LINTERP( t, out[2], in[2] );
 @@ -380,6 +380,9 @@ do_clip_tri( struct draw_stage *stage,
dp_prev = getclipdist(clipper, vert_prev, plane_idx);
clipmask = ~(1plane_idx);
  
 +  if (util_is_inf_or_nan(dp_prev))
 + return; //discard nan
 +
assert(n  MAX_CLIPPED_VERTICES);
if (n = MAX_CLIPPED_VERTICES)
   return;
 @@ -392,6 +395,9 @@ do_clip_tri( struct draw_stage *stage,
  
   float dp = getclipdist(clipper, vert, plane_idx);
  
 + if (util_is_inf_or_nan(dp))
 +return; //discard nan
 +
if (!IS_NEGATIVE(dp_prev)) {
  assert(outcount  MAX_CLIPPED_VERTICES);
  if (outcount = MAX_CLIPPED_VERTICES)
 @@ -522,6 +528,9 @@ do_clip_line( struct draw_stage *stage,
const float dp0 = getclipdist(clipper, v0, plane_idx);
const float dp1 = getclipdist(clipper, v1, plane_idx);
  
 +  if (util_is_inf_or_nan(dp0) || util_is_inf_or_nan(dp1))
 + return; //discard nan
 +
if (dp1  0.0F) {
float t = dp1 / (dp1 - dp0);
   t1 = MAX2(t1, t);
 @@ -594,7 +603,7 @@ clip_tri( struct draw_stage *stage,
 unsigned clipmask = (header-v[0]-clipmask | 
  header-v[1]-clipmask | 
  header-v[2]-clipmask);
 -   
 +
 if (clipmask == 0) {
/* no clipping needed */
stage-next-tri( stage-next, header );
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c 
 b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
 index 98409c3..72b563e 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
 @@ -3671,3 +3671,56 @@ lp_build_isfinite(struct lp_build_context *bld,
 return lp_build_compare(bld-gallivm, int_type, PIPE_FUNC_NOTEQUAL,
 intx, infornan32);
  }
 +
 +/*
 + * Returns true if the number is nan or inf or false otherwise.
 + *

[Mesa-dev] [PATCH 1/4] glsl: Rename ubo_qualifiers_valid to ubo_qualifiers_allowed.

2013-08-15 Thread Matt Turner

The variable means that UBO qualifiers are allowed in a particular
context (e.g., not allowed in a struct field declaration), rather than a
particular set of UBO qualifiers are valid.
---
 src/glsl/ast.h  | 2 +-
 src/glsl/ast_to_hir.cpp | 6 +++---
 src/glsl/glsl_parser.yy | 2 +-
 src/glsl/glsl_parser_extras.cpp | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index 9b194db..b0446b5 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -640,7 +640,7 @@ public:
 * Flag indicating that these declarators are in a uniform block,
 * allowing UBO type qualifiers.
 */
-   bool ubo_qualifiers_valid;
+   bool ubo_qualifiers_allowed;
 };
 
 
diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 04b16c8..9bc713d 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -1929,7 +1929,7 @@ apply_type_qualifier_to_variable(const struct 
ast_type_qualifier *qual,
 ir_variable *var,
 struct _mesa_glsl_parse_state *state,
 YYLTYPE *loc,
-bool ubo_qualifiers_valid,
+bool ubo_qualifiers_allowed,
  bool is_parameter)
 {
STATIC_ASSERT(sizeof(qual-flags.q) = sizeof(qual-flags.i));
@@ -2275,7 +2275,7 @@ apply_type_qualifier_to_variable(const struct 
ast_type_qualifier *qual,
}
 
if (qual-flags.q.row_major || qual-flags.q.column_major) {
-  if (!ubo_qualifiers_valid) {
+  if (!ubo_qualifiers_allowed) {
 _mesa_glsl_error(loc, state,
  uniform block layout qualifiers row_major and 
  column_major can only be applied to uniform block 
@@ -2830,7 +2830,7 @@ ast_declarator_list::hir(exec_list *instructions,
   }
 
   apply_type_qualifier_to_variable( this-type-qualifier, var, state,
-   loc, this-ubo_qualifiers_valid, 
false);
+   loc, this-ubo_qualifiers_allowed, 
false);
 
   if (this-type-qualifier.flags.q.invariant) {
 if ((state-target == vertex_shader) 
diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
index e3a57ea..0ed3453 100644
--- a/src/glsl/glsl_parser.yy
+++ b/src/glsl/glsl_parser.yy
@@ -2295,7 +2295,7 @@ member_declaration:
 
   $$ = new(ctx) ast_declarator_list(type);
   $$-set_location(yylloc);
-  $$-ubo_qualifiers_valid = true;
+  $$-ubo_qualifiers_allowed = true;
 
   $$-declarations.push_degenerate_list_at_head( $2-link);
}
diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 88f0483..98fd510 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -1170,7 +1170,7 @@ 
ast_declarator_list::ast_declarator_list(ast_fully_specified_type *type)
 {
this-type = type;
this-invariant = false;
-   this-ubo_qualifiers_valid = false;
+   this-ubo_qualifiers_allowed = false;
 }
 
 void
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/4] glsl: Drop duplicate error messages.

2013-08-15 Thread Matt Turner

This same message is printed in the validate_matrix_layout_for_type
function.
---
 src/glsl/ast_to_hir.cpp | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 9bc713d..c9ffa30 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2275,13 +2275,7 @@ apply_type_qualifier_to_variable(const struct 
ast_type_qualifier *qual,
}
 
if (qual-flags.q.row_major || qual-flags.q.column_major) {
-  if (!ubo_qualifiers_allowed) {
-_mesa_glsl_error(loc, state,
- uniform block layout qualifiers row_major and 
- column_major can only be applied to uniform block 
- members);
-  } else
-validate_matrix_layout_for_type(state, loc, var-type);
+  validate_matrix_layout_for_type(state, loc, var-type);
}
 }
 
@@ -4415,11 +4409,6 @@ ast_process_structure_or_interface_block(exec_list 
*instructions,
_mesa_glsl_error(loc, state,
 row_major and column_major can only be 
 applied to uniform interface blocks);
-} else if (!field_type-is_matrix()  !field_type-is_record()) {
-   _mesa_glsl_error(loc, state,
-uniform block layout qualifiers row_major and 

-column_major can only be applied to matrix 
and 
-structure types);
 } else
validate_matrix_layout_for_type(state, loc, field_type);
  }
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/4] glsl: Remove ubo_qualifiers_allowed variable.

2013-08-15 Thread Matt Turner

No longer used.
---
 src/glsl/ast.h  | 6 --
 src/glsl/ast_to_hir.cpp | 5 ++---
 src/glsl/glsl_parser.yy | 1 -
 src/glsl/glsl_parser_extras.cpp | 1 -
 4 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index b0446b5..33a00e0 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -635,12 +635,6 @@ public:
 * is used to note these cases when no type is specified.
 */
int invariant;
-
-   /**
-* Flag indicating that these declarators are in a uniform block,
-* allowing UBO type qualifiers.
-*/
-   bool ubo_qualifiers_allowed;
 };
 
 
diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index c9ffa30..c2fdbd5 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -1929,7 +1929,6 @@ apply_type_qualifier_to_variable(const struct 
ast_type_qualifier *qual,
 ir_variable *var,
 struct _mesa_glsl_parse_state *state,
 YYLTYPE *loc,
-bool ubo_qualifiers_allowed,
  bool is_parameter)
 {
STATIC_ASSERT(sizeof(qual-flags.q) = sizeof(qual-flags.i));
@@ -2824,7 +2823,7 @@ ast_declarator_list::hir(exec_list *instructions,
   }
 
   apply_type_qualifier_to_variable( this-type-qualifier, var, state,
-   loc, this-ubo_qualifiers_allowed, 
false);
+   loc, false);
 
   if (this-type-qualifier.flags.q.invariant) {
 if ((state-target == vertex_shader) 
@@ -3334,7 +,7 @@ ast_parameter_declarator::hir(exec_list *instructions,
 * for function parameters the default mode is 'in'.
 */
apply_type_qualifier_to_variable( this-type-qualifier, var, state,  loc,
-   false, true);
+   true);
 
/* From page 17 (page 23 of the PDF) of the GLSL 1.20 spec:
 *
diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
index 0ed3453..d8e589d 100644
--- a/src/glsl/glsl_parser.yy
+++ b/src/glsl/glsl_parser.yy
@@ -2295,7 +2295,6 @@ member_declaration:
 
   $$ = new(ctx) ast_declarator_list(type);
   $$-set_location(yylloc);
-  $$-ubo_qualifiers_allowed = true;
 
   $$-declarations.push_degenerate_list_at_head( $2-link);
}
diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 98fd510..8e87556 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -1170,7 +1170,6 @@ 
ast_declarator_list::ast_declarator_list(ast_fully_specified_type *type)
 {
this-type = type;
this-invariant = false;
-   this-ubo_qualifiers_allowed = false;
 }
 
 void
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/4] glsl: Give a warning, not an error, for UBO qualifiers on struct fields.

2013-08-15 Thread Matt Turner

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59648
---
 src/glsl/ast_to_hir.cpp | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index c2fdbd5..f35c11f 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -1830,10 +1830,11 @@ validate_matrix_layout_for_type(struct 
_mesa_glsl_parse_state *state,
const glsl_type *type)
 {
if (!type-is_matrix()  !type-is_record()) {
-  _mesa_glsl_error(loc, state,
-   uniform block layout qualifiers row_major and 
-   column_major can only be applied to matrix and 
-   structure types);
+  _mesa_glsl_warning(loc, state,
+ uniform block layout qualifiers row_major and 
+ column_major applied to structure fields is not 
+ strictly conformant and my be rejected by other 
+ compilers);
} else if (type-is_record()) {
   /* We allow 'layout(row_major)' on structure types because it's the only
* way to get row-major layouts on matrices contained in structures.
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] draw: handle nan clipdistance

2013-08-15 Thread Zack Rusin

 I realize this function isn't used but it looks unnecessarily
 complicated - two constants one AND plus one comparison when you could
 simply do a single comparison (compare x with x with unordered not
 equal). This is actually doubly bad with AVX because the int comparison
 is going to use 4 instructions instead of 1 (extract/2 cmp/1 insert),
 well if this runs 8-wide at least.

I'm going to kill that function, we already have lp_build_isnan that does the 
correct thing.

 Otherwise looks good. Though I'm not sure you really need to kill the
 prims if the clip distances are infinite?

The d3d10 spec says Coordinates coming in to clipping with infinites at x, y, 
z may or may not result in a discarded primitive.. I liked handling them the 
same way as nan, otherwise we're just generating pointless primitives. I don't 
have a strong opinion though, wlk doesn't seem to test infinites.

z
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/4] glsl: Give a warning, not an error, for UBO qualifiers on struct fields.

2013-08-15 Thread Anuj Phogat

On Thu, Aug 15, 2013 at 11:27 AM, Matt Turner matts...@gmail.com wrote:
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59648
 ---
  src/glsl/ast_to_hir.cpp | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

 diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
 index c2fdbd5..f35c11f 100644
 --- a/src/glsl/ast_to_hir.cpp
 +++ b/src/glsl/ast_to_hir.cpp
 @@ -1830,10 +1830,11 @@ validate_matrix_layout_for_type(struct 
 _mesa_glsl_parse_state *state,
 const glsl_type *type)
  {
 if (!type-is_matrix()  !type-is_record()) {
 -  _mesa_glsl_error(loc, state,
 -   uniform block layout qualifiers row_major and 
 -   column_major can only be applied to matrix and 
 -   structure types);
 +  _mesa_glsl_warning(loc, state,
 + uniform block layout qualifiers row_major and 
 + column_major applied to structure fields is not 
 + strictly conformant and my be rejected by other 
typo in spelling of may.
 + compilers);
 } else if (type-is_record()) {
/* We allow 'layout(row_major)' on structure types because it's the 
 only
 * way to get row-major layouts on matrices contained in structures.
 --
 1.8.3.2

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

This series is Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] draw: handle nan clipdistance

2013-08-15 Thread Roland Scheidegger

Am 15.08.2013 20:27, schrieb Zack Rusin:
 I realize this function isn't used but it looks unnecessarily 
 complicated - two constants one AND plus one comparison when you
 could simply do a single comparison (compare x with x with
 unordered not equal). This is actually doubly bad with AVX because
 the int comparison is going to use 4 instructions instead of 1
 (extract/2 cmp/1 insert), well if this runs 8-wide at least.
 
 I'm going to kill that function, we already have lp_build_isnan that
 does the correct thing.
Ah right. It actually uses OEQ then does a not, I think it would be
easier to understand if it would just use UNE.
I think at some point we might want to optimize these functions a bit,
even if it's really llvm's fault.
Consider this (a trivial select 1.0 if input is a nan otherwise 0.0)

define 4 x float @une(4 x float %in)
{
entry:
  %0 = fcmp une 4 x float %in, %in
  %1 = sext 4 x i1 %0 to 4 x i32
  %2 = and 4 x i32 %1, i32 1065353216,i32 1065353216,i32
1065353216,i32 1065353216
  %3 = bitcast 4 x i32 %2 to 4 x float
  ret 4 x float %3
}
This generates some neat code:
vcmpunordps %xmm0, %xmm0, %xmm0
vandps  .LCPI0_0(%rip), %xmm0, %xmm0
ret

(FWIW the OEQ version using a xor generates much the same just vcmpordps
and vandnotps instead).

But now do it with a 1-vec (which we will generally never do, just for
illustration):

define 1 x float @une(1 x float %in)
{
entry:
  %0 = fcmp une 1 x float %in, %in
  %1 = sext 1 x i1 %0 to 1 x i32
  %2 = and 1 x i32 %1, i32 1065353216
  %3 = bitcast 1 x i32 %2 to 1 x float
  ret 1 x float %3
}
Generates not-so-hot-code:
vucomiss%xmm0, %xmm0
setp%al
movzbl  %al, %eax
shll$31, %eax
sarl$31, %eax
andl$1065353216, %eax   # imm = 0x3F80
vmovd   %eax, %xmm0
ret

We generally use true scalars instead:
define float @une(float %in)
{
entry:
  %0 = fcmp une float %in, %in
  %1 = sext i1 %0 to i32
  %2 = and i32 %1, 1065353216
  %3 = bitcast i32 %2 to float
  ret float %3
}
Which generates this abomination (don't ask me why 1 x float vs. float
makes a difference in generated code):
vucomiss%xmm0, %xmm0
setp%cl
setne   %al
orb %cl, %al
movl$-1, %ecx
testb   %al, %al
movl$0, %eax
cmovnel %ecx, %eax
andl$1065353216, %eax   # imm = 0x3F80
vmovd   %eax, %xmm0
ret

So, it looks like for the scalar case using int comparisons would be
definitely better as it should avoid most of that ugliness (but I
couldn't be bothered to write the test case for that).
Now granted my guess is llvm simply should have used the simd unit no
matter which version is used in the first place - nothing forbids it to
use vector instructions for scalars. But no luck unfortunately...
(It is possible newer llvm versions handle this better, didn't check,
but I suspect shit like that could potentially make a performance
difference if these functions are used enough.)


 
 Otherwise looks good. Though I'm not sure you really need to kill
 the prims if the clip distances are infinite?
 
 The d3d10 spec says Coordinates coming in to clipping with infinites
 at x, y, z may or may not result in a discarded primitive.. I liked
 handling them the same way as nan, otherwise we're just generating
 pointless primitives. I don't have a strong opinion though, wlk
 doesn't seem to test infinites.

Ok whatever works. I just thought the test may be simpler but maybe not
(given the above...) so that's just fine.

Roland
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/3] i965/gen7: Set MOCS L3 cacheability for IVB/BYT

2013-08-15 Thread Vedran Rodic

 We do have the set_caching ioctl. It's enough to flip the PTEs to UC and
 let MOCS manage things. I actually did a few experiments on my IVB. I
 made all Mesa's buffers UC via PTEs by patching libdrm to change the
 cache mode of each bo after allocation. Then I fiddled with the MOCS
 LLC bits in various ways. It definitely has an effect, sometimes making
 things slower, sometimes faster. xonotic again seemed to benefit. IIRC
 leaving everything LLC uncached was actually the fastest (w/ high quality
 at least) so we may be thrashing the LLC a bit there. But eg. reaction
 quake regressed quite a lot if most things were left as UC.

Can you share the libdrm patch?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] radeonsi: LLVM r187139 broke some piglit tests

2013-08-15 Thread Tom Stellard

On Thu, Aug 15, 2013 at 07:50:10PM +0200, Michel Dänzer wrote:
 On Don, 2013-08-15 at 09:16 -0700, Tom Stellard wrote:
  On Thu, Aug 15, 2013 at 08:22:39AM -0700, Tom Stellard wrote:
   On Thu, Aug 15, 2013 at 11:55:36AM +0200, Michel Dänzer wrote:
On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote:
 On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote:
  On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote:
   
   LLVM revision 187139 ('Allocate local registers in order for 
   optimal
   coloring.') broke some derivative related piglit tests with the 
   radeonsi
   driver. 
   
   I'm attaching a diff between the bad and good generated code (as 
   printed
   with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only
   difference I can see is in which registers are used in which 
   order.
   
   I wonder if we might be missing S_WAITCNT after DS_READ/WRITE
   instructions in some cases, but I haven't spotted any candidates 
   for
   that in the bad code which aren't there in the good code as well. 
   Can
   anyone else spot something I've missed?
  
  Shouldn't we be using the S_BARRIER instruction to keep the threads 
  in sync?
 
 Doesn't seem to help unfortunately, but thanks for the good 
 suggestion.

I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR
register number instead of the $gds operand for encoding the GDS field
(the asm output from llc even shows the VGPR name). If the VGPR number
happens to be odd (i.e. to have the least significant bit set), the
shader ends up writing to GDS instead of LDS.
   
   
   Ouch, that's a pretty bad bug.
   
But I have no idea why this is happening, or how to fix it. :(


   
   I can take a look at it.
  
  The attached patch should fix the problem, can you test?
 
 Thanks for finding my silly mistake.
 
 However, I'd like to preserve the ability to use these instructions for
 GDS access, and the logic in SIInsertWaits::getHwCounts() only really
 makes sense for SMRD anyway.
 
 How about this patch instead? It fixes the piglit regressions that
 prompted me to start this thread.
 
 
 -- 
 Earthling Michel Dänzer   |   http://www.amd.com
 Libre software enthusiast |  Debian, X and DRI developer

 From 4ad4d9bba31b06bbadb83fead3e53d989b80d83e Mon Sep 17 00:00:00 2001
 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= michel.daen...@amd.com
 Date: Thu, 15 Aug 2013 19:43:02 +0200
 Subject: [PATCH] R600/SI: Fix broken encoding of DS_WRITE_B32
 MIME-Version: 1.0
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 
 The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD
 instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused
 it to corrupt the encoding of that by clobbering the first operand with
 the second one.
 
 Undo that damage and only apply the SMRD logic to that.
 
 Fixes some derivates related piglit regressions with radeonsi.
 
 Signed-off-by: Michel Dänzer michel.daen...@amd.com
 ---
  lib/Target/R600/SIInsertWaits.cpp | 22 ++
  test/CodeGen/R600/local-memory.ll |  4 ++--
  2 files changed, 16 insertions(+), 10 deletions(-)
 
 diff --git a/lib/Target/R600/SIInsertWaits.cpp 
 b/lib/Target/R600/SIInsertWaits.cpp
 index ba202e3..200e064 100644
 --- a/lib/Target/R600/SIInsertWaits.cpp
 +++ b/lib/Target/R600/SIInsertWaits.cpp
 @@ -134,14 +134,20 @@ Counters SIInsertWaits::getHwCounts(MachineInstr MI) {
// LGKM may uses larger values
if (TSFlags  SIInstrFlags::LGKM_CNT) {
  
 -MachineOperand Op = MI.getOperand(0);
 -if (!Op.isReg())
 -  Op = MI.getOperand(1);
 -assert(Op.isReg()  First LGKM operand must be a register!);
 -
 -unsigned Reg = Op.getReg();
 -unsigned Size = TRI-getMinimalPhysRegClass(Reg)-getSize();
 -Result.Named.LGKM = Size  4 ? 2 : 1;
 +if (MI.getNumOperands() == 3) {

We should add a TSFlag for SMRD like we do for MIMG and add a helper
function isSMRD to SIInstrInfo and use it here.  The number of operands
for instructions tends to change from time to time.

-Tom

 +
 +  // SMRD
 +  MachineOperand Op = MI.getOperand(0);
 +  assert(Op.isReg()  First LGKM operand must be a register!);
 +
 +  unsigned Reg = Op.getReg();
 +  unsigned Size = TRI-getMinimalPhysRegClass(Reg)-getSize();
 +  Result.Named.LGKM = Size  4 ? 2 : 1;
 +
 +} else {
 +  // DS
 +  Result.Named.LGKM = 1;
 +}
  
} else {
  Result.Named.LGKM = 0;
 diff --git a/test/CodeGen/R600/local-memory.ll 
 b/test/CodeGen/R600/local-memory.ll
 index 5458fb9..9ebb769 100644
 --- a/test/CodeGen/R600/local-memory.ll
 +++ b/test/CodeGen/R600/local-memory.ll
 @@ -13,7 +13,7 @@
  ; SI-CHECK-NEXT: .long 32768
  
  ; EG-CHECK: LDS_WRITE
 -; SI-CHECK: DS_WRITE_B32
 +; SI-CHECK: DS_WRITE_B32 0
  
  ; GROUP_BARRIER must be the

Re: [Mesa-dev] [PATCH 07/15] i965/fs: Create the COPY() set for use in copy propagation dataflow.

2013-08-15 Thread Paul Berry

On 12 August 2013 13:11, Kenneth Graunke kenn...@whitecape.org wrote:

 This is the COPY set from Muchnick's textbook, which is necessary
 to do the dataflow algorithm correctly.


I believe this can be done more easily.  The only ACP entries that are in
the hastable on exit from opt_copy_propagate_local() are the ones that made
it to the end of the block.  So we should be able to simply populate the
new COPY set in the for loop at the bottom of the fs_copy_prop_dataflow
constructor.  We don't need to do the extra work you're doing in
setup_initial_values().



 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 27
 ++
  1 file changed, 23 insertions(+), 4 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 index 7aff36b..2970af6 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 @@ -64,6 +64,13 @@ struct block_data {
 BITSET_WORD *liveout;

 /**
 +* Which entries in the fs_copy_prop_dataflow acp table are generated
 by
 +* instructions in this block which reach the end of the block without
 +* being killed.
 +*/
 +   BITSET_WORD *copy;
 +
 +   /**
  * Which entries in the fs_copy_prop_dataflow acp table are killed
 over the
  * course of this block.
  */
 @@ -113,6 +120,7 @@ fs_copy_prop_dataflow::fs_copy_prop_dataflow(void
 *mem_ctx, cfg_t *cfg,
 for (int b = 0; b  cfg-num_blocks; b++) {
bd[b].livein = rzalloc_array(bd, BITSET_WORD, bitset_words);
bd[b].liveout = rzalloc_array(bd, BITSET_WORD, bitset_words);
 +  bd[b].copy = rzalloc_array(bd, BITSET_WORD, bitset_words);
bd[b].kill = rzalloc_array(bd, BITSET_WORD, bitset_words);

for (int i = 0; i  ACP_HASH_SIZE; i++) {
 @@ -148,15 +156,26 @@ fs_copy_prop_dataflow::setup_initial_values()
   if (inst-dst.file != GRF)
  continue;

 - /* Mark ACP entries which are killed by this instruction. */
   for (int i = 0; i  num_acp; i++) {
 -if (inst != acp[i]-inst 
 -(inst-overwrites_reg(acp[i]-dst) ||
 - inst-overwrites_reg(acp[i]-src))) {
 +if (inst == acp[i]-inst) {
 +   /* Add this entry to the COPY set. */
 +   BITSET_SET(bd[b].copy, i);
 +} else if (inst-overwrites_reg(acp[i]-dst) ||
 +   inst-overwrites_reg(acp[i]-src)) {
 +   /* The current instruction kills this copy.  Add the entry
 to
 +* the KILL set.
 +*/
 BITSET_SET(bd[b].kill, i);
  }
   }
}
 +
 +  /* Anything killed did not make it to the end of the block, so it
 +   * shouldn't be in COPY.
 +   */
 +  for (int i = 0; i  bitset_words; i++) {
 + bd[b].copy[i] = ~bd[b].kill[i];
 +  }
 }
  }

 --
 1.8.3.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] Bison 3 fixes

2013-08-15 Thread Ian Romanick


On 08/15/2013 06:55 AM, Armin K. wrote:

Can we have Bison 3 fixes in 9.2 branch so they make it into 9.2 release?

eb7c8c7fb6e49a04f3fe84a6d438160dc4a14ac0
f043381334a0760ec118d07b6fb7785b5692572a
de917b4c4c4dfc949d5f8e3d9eb2dd48b63a3de5
6d2a9220b832d9a0c0cf35fcc5b9de1542af267d
5ffa28df4e4cc22481b4ed41c78632f35765f41d


It looks like this are all on the release branch already.  If they're on 
the branch, they'll be in the release.



___
mesa-stable mailing list
mesa-sta...@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-stable


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/fs: Fix Sandybridge regressions from SEL optimization.

2013-08-15 Thread Matt Turner

On Tue, Aug 13, 2013 at 4:47 PM, Kenneth Graunke kenn...@whitecape.org wrote:
 Sandybridge is the only platform that supports an IF instruction
 with an embedded comparison.  In this case, we need to emit a CMP
 to go along with the SEL.

 Fixes regressions in Piglit's glsl-fs-atan-3, fs-unpackHalf2x16,
 fs-faceforward-float-float-float, isinf-and-isnan fs_basic, and
 isinf-and-isnan fs_fbo.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 17 +
  1 file changed, 13 insertions(+), 4 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 index a36c248..984b08a 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 @@ -1911,10 +1911,19 @@ fs_visitor::try_replace_with_sel()
   emit(MOV(src0, then_mov-src[0]));
}

 -  fs_inst *sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0, 
 else_mov-src[0]);
 -  sel-predicate = if_inst-predicate;
 -  sel-predicate_inverse = if_inst-predicate_inverse;
 -  sel-conditional_mod = if_inst-conditional_mod;
 +  fs_inst *sel;
 +  if (if_inst-conditional_mod) {
 + /* Sandybridge-specific IF with embedded comparison */
 + emit(CMP(reg_null_d, if_inst-src[0], if_inst-src[1],
 +  if_inst-conditional_mod));
 + sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0, else_mov-src[0]);
 + sel-predicate = BRW_PREDICATE_NORMAL;
 +  } else {
 + /* Separate CMP and IF instructions */
 + sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0, else_mov-src[0]);
 + sel-predicate = if_inst-predicate;
 + sel-predicate_inverse = if_inst-predicate_inverse;
 +  }
 }
  }

 --
 1.8.3.4

Reviewed-by: Matt Turner matts...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 13/15] i965/fs: Fully recompute liveout at each step.

2013-08-15 Thread Paul Berry

On 12 August 2013 13:11, Kenneth Graunke kenn...@whitecape.org wrote:

 Since we start with an overestimation of livein (0x), successive
 steps may should actually take away values.  This means we can't simply


Is may should a Southernism?

In any case,

Reviewed-by: Paul Berry stereotype...@gmail.com


 OR in new liveout values; we need to recompute it from scratch at each
 iteration of the fixed-point algorithm.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 index f5c8e4a..9522649 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 @@ -219,7 +219,7 @@ fs_copy_prop_dataflow::run()
   for (int i = 0; i  bitset_words; i++) {
  const BITSET_WORD old_liveout = bd[b].liveout[i];

 -bd[b].liveout[i] |=
 +bd[b].liveout[i] =
 bd[b].copy[i] | (bd[b].livein[i]  ~bd[b].kill[i]);

  if (old_liveout != bd[b].liveout[i])
 --
 1.8.3.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] i965 global copy propagation fixes

2013-08-15 Thread Paul Berry

On 12 August 2013 13:11, Kenneth Graunke kenn...@whitecape.org wrote:

 Hello,

 This surprisingly large series fixes bugs in the data flow algorithm
 for copy propagation.  As far as I can tell, there were a myriad of
 issues.  The updated algorithm seems to follow the textbook and
 online materials much more closely.

 No Piglit regressions on Ivybridge.

 shader-db statistics are more or less a wash:

 total instructions in shared programs: 1569184 - 1569423 (0.02%)
 instructions in affected programs: 44961 - 45200 (0.53%)

 A bunch of things are helped by a small margin, while other things are hurt
 by a small margin (a couple of extra MOVs).  The few hurt shaders I looked
 into appeared to suffer from compute-to-MRF not handling control flow.
 There may be other reasons as well.

 I'd still like to land it, though, as the old algorithm seems to be
 pretty broken.


 --Ken


I sent comments on patch 2, 7, and 13.  The remainder are:

Reviewed-by: Paul Berry stereotype...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/fs: Fix Sandybridge regressions from SEL optimization.

2013-08-15 Thread Anuj Phogat

On Tue, Aug 13, 2013 at 5:57 PM, Kenneth Graunke kenn...@whitecape.org wrote:
 On 08/13/2013 05:49 PM, Ian Romanick wrote:

 On 08/13/2013 04:47 PM, Kenneth Graunke wrote:

 Sandybridge is the only platform that supports an IF instruction
 with an embedded comparison.  In this case, we need to emit a CMP
 to go along with the SEL.

 Fixes regressions in Piglit's glsl-fs-atan-3, fs-unpackHalf2x16,
 fs-faceforward-float-float-float, isinf-and-isnan fs_basic, and
 isinf-and-isnan fs_fbo.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
   src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 17 +
   1 file changed, 13 insertions(+), 4 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 index a36c248..984b08a 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 @@ -1911,10 +1911,19 @@ fs_visitor::try_replace_with_sel()
emit(MOV(src0, then_mov-src[0]));
 }

 -  fs_inst *sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0,
 else_mov-src[0]);
 -  sel-predicate = if_inst-predicate;
 -  sel-predicate_inverse = if_inst-predicate_inverse;
 -  sel-conditional_mod = if_inst-conditional_mod;
 +  fs_inst *sel;
 +  if (if_inst-conditional_mod) {
 + /* Sandybridge-specific IF with embedded comparison */


 This doesn't appear to be SNB-specific code.  Can you explain this?  Is
 if_inst-conditional_mod only set on SNB?  I really need to learn more
 about the back end...


 Normally, control flow looks like:

 cmp.l.f0(8)  null   g58,8,1F0F
 (+f0) if(8)

 For Sandybridge, the hardware designers extended IF to support built-in
 comparisons, so you can simply do:

 if.l(8)g58,8,1F0F

 They immediately dropped this with Ivybridge; it's not been present on any
 other platform.

 fs_inst::conditional_mod represents that conditional modifier (always = 0,
 never, less, equal, lequal, greater, notequal, gequal).

Thanks for explaining Ken.
Reviewed-by: Anuj Phogat anuj.pho...@gmail.com

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] Bison 3 fixes

2013-08-15 Thread Armin K.


On 16.8.2013 0:20, Ian Romanick wrote:

On 08/15/2013 06:55 AM, Armin K. wrote:

Can we have Bison 3 fixes in 9.2 branch so they make it into 9.2 release?

eb7c8c7fb6e49a04f3fe84a6d438160dc4a14ac0
f043381334a0760ec118d07b6fb7785b5692572a
de917b4c4c4dfc949d5f8e3d9eb2dd48b63a3de5
6d2a9220b832d9a0c0cf35fcc5b9de1542af267d
5ffa28df4e4cc22481b4ed41c78632f35765f41d


It looks like this are all on the release branch already.  If they're on
the branch, they'll be in the release.


___
mesa-stable mailing list
mesa-sta...@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-stable




Last one is missing.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/fs: Fix Sandybridge regressions from SEL optimization.

2013-08-15 Thread Ian Romanick


On 08/13/2013 05:57 PM, Kenneth Graunke wrote:

On 08/13/2013 05:49 PM, Ian Romanick wrote:

On 08/13/2013 04:47 PM, Kenneth Graunke wrote:

Sandybridge is the only platform that supports an IF instruction
with an embedded comparison.  In this case, we need to emit a CMP
to go along with the SEL.

Fixes regressions in Piglit's glsl-fs-atan-3, fs-unpackHalf2x16,
fs-faceforward-float-float-float, isinf-and-isnan fs_basic, and
isinf-and-isnan fs_fbo.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 17 +
  1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index a36c248..984b08a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1911,10 +1911,19 @@ fs_visitor::try_replace_with_sel()
   emit(MOV(src0, then_mov-src[0]));
}

-  fs_inst *sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0,
else_mov-src[0]);
-  sel-predicate = if_inst-predicate;
-  sel-predicate_inverse = if_inst-predicate_inverse;
-  sel-conditional_mod = if_inst-conditional_mod;
+  fs_inst *sel;
+  if (if_inst-conditional_mod) {
+ /* Sandybridge-specific IF with embedded comparison */


This doesn't appear to be SNB-specific code.  Can you explain this?  Is
if_inst-conditional_mod only set on SNB?  I really need to learn more
about the back end...


Normally, control flow looks like:

cmp.l.f0(8)  null   g58,8,1F0F
(+f0) if(8)

For Sandybridge, the hardware designers extended IF to support built-in
comparisons, so you can simply do:

if.l(8)g58,8,1F0F

They immediately dropped this with Ivybridge; it's not been present on
any other platform.

fs_inst::conditional_mod represents that conditional modifier (always =
0, never, less, equal, lequal, greater, notequal, gequal).



In other words... yes to Is if_inst-conditional_mod only set on SNB? :)

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] i965: Emit MOVs for neg/abs.

2013-08-15 Thread Paul Berry

On 12 August 2013 13:18, Matt Turner matts...@gmail.com wrote:

 Necessary to avoid combining a bitcast and a modifier into a single
 operation. Otherwise if safe, the MOV should be removed by
 copy-propagation or register coalescing.
 ---
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 4 ++--
  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)


This patch is:

Reviewed-by: Paul Berry stereoytpe...@gmail.com



 diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 index ee7728c..fa4554b 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 @@ -361,12 +361,12 @@ fs_visitor::visit(ir_expression *ir)
break;
 case ir_unop_neg:
op[0].negate = !op[0].negate;
 -  this-result = op[0];
 +  emit(MOV(this-result, op[0]));
break;
 case ir_unop_abs:
op[0].abs = true;
op[0].negate = false;
 -  this-result = op[0];
 +  emit(MOV(this-result, op[0]));
break;
 case ir_unop_sign:
temp = fs_reg(this, ir-type);
 diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
 b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
 index 8d4a5d4..05c0091 100644
 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
 @@ -1391,12 +1391,12 @@ vec4_visitor::visit(ir_expression *ir)
break;
 case ir_unop_neg:
op[0].negate = !op[0].negate;
 -  this-result = op[0];
 +  emit(MOV(result_dst, op[0]));
break;
 case ir_unop_abs:
op[0].abs = true;
op[0].negate = false;
 -  this-result = op[0];
 +  emit(MOV(result_dst, op[0]));
break;

 case ir_unop_sign:
 --
 1.8.3.2

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: Don't copy propagate bitcasts with source modifiers.

2013-08-15 Thread Paul Berry

On 12 August 2013 13:18, Matt Turner matts...@gmail.com wrote:

 ---
  src/mesa/drivers/dri/i965/brw_fs.cpp| 13 +
  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp   |  3 +++
  src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 10 ++
  3 files changed, 22 insertions(+), 4 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index a81e97f..f104f8c 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -2118,6 +2118,19 @@ fs_visitor::register_coalesce()
 }
  }

 + if (has_source_modifiers) {
 +for (int i = 0; i  3; i++) {
 +   if (scan_inst-src[i].file == GRF 
 +   scan_inst-src[i].reg == inst-dst.reg 
 +   scan_inst-src[i].reg_offset == inst-dst.reg_offset 
 +   inst-dst.type != scan_inst-src[i].type)
 +   {
 + return false;
 +   }
 +}
 + }
 +
 +


I don't understand the whole patch, but I'm pretty sure it's wrong to
return false in this case, because that will exit the entire
fs_visitor::register_coalesce() function.


  /* The gen6 MATH instruction can't handle source modifiers or
   * unusual register regions, so avoid coalescing those for
   * now.  We should do something more specific.
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 index 234f8bd..a5cd858 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
 @@ -221,6 +221,9 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg,
 acp_entry *entry)
  entry-src.smear != -1)  !can_do_source_mods(inst))
return false;

 +   if (has_source_modifiers  (entry-dst.type != inst-src[arg].type))
 +  return false;
 +
 inst-src[arg].file = entry-src.file;
 inst-src[arg].reg = entry-src.reg;
 inst-src[arg].reg_offset = entry-src.reg_offset;
 diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
 b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
 index c28d0de..2a2d403 100644
 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
 @@ -206,14 +206,16 @@ vec4_visitor::try_copy_propagation(vec4_instruction
 *inst, int arg,
 if (inst-src[arg].negate)
value.negate = !value.negate;

 -   bool has_source_modifiers = (value.negate || value.abs ||
 -value.swizzle != BRW_SWIZZLE_XYZW ||
 -value.file == UNIFORM);
 +   bool has_source_modifiers = value.negate || value.abs;

 /* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on
  * instructions.
  */
 -   if (has_source_modifiers  !can_do_source_mods(inst))
 +   if ((has_source_modifiers || value.file == UNIFORM ||
 +value.swizzle != BRW_SWIZZLE_XYZW)  !can_do_source_mods(inst))
 +  return false;
 +
 +   if (has_source_modifiers  (value.type != inst-src[arg].type))
return false;

 bool is_3src_inst = (inst-opcode == BRW_OPCODE_LRP ||
 --
 1.8.3.2

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] i965: Don't copy propagate bitcasts with source modifiers.

2013-08-15 Thread Matt Turner

Previously, copy propagation would cause bitcast_f2u(abs(float)) to
be performed in a single step, but the application of source modifiers
(abs, neg) happens after type conversion, leading to incorrect results.

That is, for bitcast_f2u(abs(float)) we would in fact generate code to
do abs(bitcast_f2u(float)).

For example, whereas bitcast_f2u(abs(float)) might result in a register
argument such as
   (abs)g2.20,1,0UD

v2: Set interfered = true and break in register_coalesce instead of
returning false.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp| 14 ++
 src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp   |  3 +++
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 10 ++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 69e544a..d111cbd 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2118,6 +2118,20 @@ fs_visitor::register_coalesce()
}
 }
 
+ if (has_source_modifiers) {
+for (int i = 0; i  3; i++) {
+   if (scan_inst-src[i].file == GRF 
+   scan_inst-src[i].reg == inst-dst.reg 
+   scan_inst-src[i].reg_offset == inst-dst.reg_offset 
+   inst-dst.type != scan_inst-src[i].type)
+   {
+ interfered = true;
+ break;
+   }
+}
+ }
+
+
 /* The gen6 MATH instruction can't handle source modifiers or
  * unusual register regions, so avoid coalescing those for
  * now.  We should do something more specific.
diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 234f8bd..61b2617 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -221,6 +221,9 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
acp_entry *entry)
 entry-src.smear != -1)  !can_do_source_mods(inst))
   return false;
 
+   if (has_source_modifiers  entry-dst.type != inst-src[arg].type)
+  return false;
+
inst-src[arg].file = entry-src.file;
inst-src[arg].reg = entry-src.reg;
inst-src[arg].reg_offset = entry-src.reg_offset;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index c28d0de..fdbe96c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -206,14 +206,16 @@ vec4_visitor::try_copy_propagation(vec4_instruction 
*inst, int arg,
if (inst-src[arg].negate)
   value.negate = !value.negate;
 
-   bool has_source_modifiers = (value.negate || value.abs ||
-value.swizzle != BRW_SWIZZLE_XYZW ||
-value.file == UNIFORM);
+   bool has_source_modifiers = value.negate || value.abs;
 
/* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on
 * instructions.
 */
-   if (has_source_modifiers  !can_do_source_mods(inst))
+   if ((has_source_modifiers || value.file == UNIFORM ||
+value.swizzle != BRW_SWIZZLE_XYZW)  !can_do_source_mods(inst))
+  return false;
+
+   if (has_source_modifiers  value.type != inst-src[arg].type)
   return false;
 
bool is_3src_inst = (inst-opcode == BRW_OPCODE_LRP ||
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] gallivm: implement better control of per-quad/per-element/scalar lod

2013-08-15 Thread sroland

From: Roland Scheidegger srol...@vmware.com

There's a new debug value used to disable per-quad lod optimizations
in fragment shader (ignored for vs/gs as the results are just too wrong
typically). Also trying to detect if a supplied lod value is really a
scalar (if it's coming from immediate or constant file) in which case
sampler code can use this to stay on per-quad-lod path (in fact for
explicit lod could simplify even further and use same lod for both
quads in the avx case but this is not implemented yet).
Still need to actually implement per-element lod bias (and derivatives),
and need to handle per-element lod in size queries.
---
 src/gallium/auxiliary/draw/draw_llvm_sample.c |8 +-
 src/gallium/auxiliary/gallivm/lp_bld_debug.h  |3 +-
 src/gallium/auxiliary/gallivm/lp_bld_init.c   |1 +
 src/gallium/auxiliary/gallivm/lp_bld_sample.h |   11 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |8 +-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h   |5 +-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c   |  158 -
 src/gallium/drivers/llvmpipe/lp_tex_sample.c  |8 +-
 8 files changed, 147 insertions(+), 55 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm_sample.c 
b/src/gallium/auxiliary/draw/draw_llvm_sample.c
index 97b0255..a6341fa 100644
--- a/src/gallium/auxiliary/draw/draw_llvm_sample.c
+++ b/src/gallium/auxiliary/draw/draw_llvm_sample.c
@@ -238,7 +238,7 @@ draw_llvm_sampler_soa_emit_fetch_texel(const struct 
lp_build_sampler_soa *base,
const struct lp_derivatives *derivs,
LLVMValueRef lod_bias, /* optional */
LLVMValueRef explicit_lod, /* optional 
*/
-   boolean scalar_lod,
+   enum lp_sampler_lod_property 
lod_property,
LLVMValueRef *texel)
 {
struct draw_llvm_sampler_soa *sampler = (struct draw_llvm_sampler_soa 
*)base;
@@ -257,7 +257,7 @@ draw_llvm_sampler_soa_emit_fetch_texel(const struct 
lp_build_sampler_soa *base,
coords,
offsets,
derivs,
-   lod_bias, explicit_lod, scalar_lod,
+   lod_bias, explicit_lod, lod_property,
texel);
 }
 
@@ -272,7 +272,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct 
lp_build_sampler_soa *base,
   unsigned texture_unit,
   unsigned target,
   boolean is_sviewinfo,
-  boolean scalar_lod,
+  enum lp_sampler_lod_property 
lod_property,
   LLVMValueRef explicit_lod, /* optional */
   LLVMValueRef *sizes_out)
 {
@@ -287,7 +287,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct 
lp_build_sampler_soa *base,
texture_unit,
target,
is_sviewinfo,
-   scalar_lod,
+   lod_property,
explicit_lod,
sizes_out);
 }
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_debug.h 
b/src/gallium/auxiliary/gallivm/lp_bld_debug.h
index 4f38edf..76c39af 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_debug.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_debug.h
@@ -43,7 +43,8 @@
 #define GALLIVM_DEBUG_PERF  (1  4)
 #define GALLIVM_DEBUG_NO_BRILINEAR  (1  5)
 #define GALLIVM_DEBUG_NO_RHO_APPROX (1  6)
-#define GALLIVM_DEBUG_GC(1  7)
+#define GALLIVM_DEBUG_NO_QUAD_LOD   (1  7)
+#define GALLIVM_DEBUG_GC(1  8)
 
 
 #ifdef __cplusplus
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c 
b/src/gallium/auxiliary/gallivm/lp_bld_init.c
index e4cc058..61eadb8 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
@@ -80,6 +80,7 @@ static const struct debug_named_value lp_bld_debug_flags[] = {
{ perf,   GALLIVM_DEBUG_PERF, NULL },
{ no_brilinear, GALLIVM_DEBUG_NO_BRILINEAR, NULL },
{ no_rho_approx, GALLIVM_DEBUG_NO_RHO_APPROX, NULL },
+   { no_quad_lod, GALLIVM_DEBUG_NO_QUAD_LOD, NULL },
{ gc, GALLIVM_DEBUG_GC, NULL },
DEBUG_NAMED_VALUE_END
 };
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.h 
b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
index 6d8fe88..a6901d1 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
@@ -61,6 +61,13 @@ struct lp_derivatives
 };
 
 
+enum lp_sampler_lod_property {
+   lp_sampler_lod_scalar,
+   lp_sampler_lod_perelement,
+   lp_sampler_lod_perquad
+};
+
+
 /**
  * Texture static state.
  *
@@ -476,7 +483,7

[Mesa-dev] [PATCH 1/2] radeonsi: add FMASK texture binding slots and resource setup (v2)

2013-08-15 Thread Marek Olšák

v2: bind FMASK textures to shader resource slots 16..31
---
This depends on the patch:
radeonsi: add flexible shader descriptor management and use it for sampler 
views

 src/gallium/drivers/radeonsi/r600_resource.h  |  1 +
 src/gallium/drivers/radeonsi/r600_texture.c   |  1 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.h  |  1 +
 src/gallium/drivers/radeonsi/si_descriptors.c |  5 ++-
 src/gallium/drivers/radeonsi/si_state.c   | 52 +++
 src/gallium/drivers/radeonsi/si_state.h   | 10 --
 6 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600_resource.h 
b/src/gallium/drivers/radeonsi/r600_resource.h
index e5dd36a..ab5c7b7 100644
--- a/src/gallium/drivers/radeonsi/r600_resource.h
+++ b/src/gallium/drivers/radeonsi/r600_resource.h
@@ -44,6 +44,7 @@ struct r600_fmask_info {
unsigned offset;
unsigned size;
unsigned alignment;
+   unsigned pitch;
unsigned bank_height;
unsigned slice_tile_max;
unsigned tile_mode_index;
diff --git a/src/gallium/drivers/radeonsi/r600_texture.c 
b/src/gallium/drivers/radeonsi/r600_texture.c
index 59e3604..1507239 100644
--- a/src/gallium/drivers/radeonsi/r600_texture.c
+++ b/src/gallium/drivers/radeonsi/r600_texture.c
@@ -463,6 +463,7 @@ static void r600_texture_get_fmask_info(struct r600_screen 
*rscreen,
out-slice_tile_max -= 1;
 
out-tile_mode_index = fmask.tiling_index[0];
+   out-pitch = fmask.level[0].nblk_x;
out-bank_height = fmask.bankh;
out-alignment = MAX2(256, fmask.bo_alignment);
out-size = fmask.bo_size;
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
index 147368c..27913ed 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
@@ -83,6 +83,7 @@ struct si_pipe_sampler_view {
struct pipe_sampler_viewbase;
struct si_resource  *resource;
uint32_tstate[8];
+   uint32_tfmask_state[8];
 };
 
 struct si_pipe_sampler_state {
diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index f05c8f4..e171f8a 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -115,6 +115,9 @@ static void si_init_descriptors(struct r600_context *rctx,
 {
uint64_t va;
 
+   assert(num_elements = sizeof(desc-enabled_mask)*8);
+   assert(num_elements = sizeof(desc-dirty_mask)*8);
+
desc-atom.emit = emit_func;
desc-shader_userdata_reg = shader_userdata_reg;
desc-element_dw_size = element_dw_size;
@@ -263,7 +266,7 @@ static void si_init_sampler_views(struct r600_context *rctx,
si_init_descriptors(rctx, views-desc,
si_get_shader_user_data_base(shader) +
SI_SGPR_RESOURCE * 4,
-   8, 16, si_emit_sampler_views);
+   8, NUM_SAMPLER_VIEWS, si_emit_sampler_views);
 }
 
 static void si_release_sampler_views(struct si_sampler_views *views)
diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index b86872b..77a4339 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -2705,6 +2705,44 @@ static struct pipe_sampler_view 
*si_create_sampler_view(struct pipe_context *ctx
view-state[6] = 0;
view-state[7] = 0;
 
+   /* Initialize the sampler view for FMASK. */
+   if (tmp-fmask.size) {
+   uint64_t va = r600_resource_va(ctx-screen, texture) + 
tmp-fmask.offset;
+   uint32_t fmask_format;
+
+   switch (texture-nr_samples) {
+   case 2:
+   fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S2_F2;
+   break;
+   case 4:
+   fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S4_F4;
+   break;
+   case 8:
+   fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK32_S8_F8;
+   break;
+   default:
+   assert(0);
+   }
+
+   view-fmask_state[0] = va  8;
+   view-fmask_state[1] = S_008F14_BASE_ADDRESS_HI(va  40) |
+  S_008F14_DATA_FORMAT(fmask_format) |
+  
S_008F14_NUM_FORMAT(V_008F14_IMG_NUM_FORMAT_UINT);
+   view-fmask_state[2] = S_008F18_WIDTH(width - 1) |
+  S_008F18_HEIGHT(height - 1);
+   view-fmask_state[3] = S_008F1C_DST_SEL_X(V_008F1C_SQ_SEL_X) |
+  S_008F1C_DST_SEL_Y(V_008F1C_SQ_SEL_X) |
+

[Mesa-dev] [PATCH 2/2] radeonsi: implement texture fetching for compressed MSAA textures (v2)

2013-08-15 Thread Marek Olšák

v2: use resource slots 16..31 for FMASK textures
---
 src/gallium/drivers/radeonsi/radeonsi_shader.c | 121 -
 1 file changed, 116 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c 
b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index cbe10cc..3e62858 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -915,6 +915,12 @@ handle_semantic:
 /* ctx-shader-output[i].spi_sid = 
r600_spi_sid(ctx-shader-output[i]);*/
 }
 
+static const struct lp_build_tgsi_action txf_action;
+
+static void build_tex_intrinsic(const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data);
+
 static void tex_fetch_args(
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
@@ -924,9 +930,11 @@ static void tex_fetch_args(
const struct tgsi_full_instruction * inst = emit_data-inst;
unsigned opcode = inst-Instruction.Opcode;
unsigned target = inst-Texture.Texture;
-   unsigned sampler_src;
+   unsigned sampler_src, sampler_index;
LLVMValueRef coords[4];
LLVMValueRef address[16];
+   LLVMValueRef sample_index_rewrite = NULL;
+   LLVMValueRef sample_chan = NULL;
int ref_pos;
unsigned num_coords = tgsi_util_get_texture_coord_dim(target, ref_pos);
unsigned count = 0;
@@ -1021,9 +1029,99 @@ static void tex_fetch_args(
}
 
sampler_src = emit_data-inst-Instruction.NumSrcRegs - 1;
+   sampler_index = emit_data-inst-Src[sampler_src].Register.Index;
+
+   /* Adjust the sample index according to FMASK.
+*
+* For uncompressed MSAA surfaces, FMASK should return 0x76543210,
+* which is the identity mapping. Each nibble says which physical sample
+* should be fetched to get that sample.
+*
+* For example, 0x1100 means there are only 2 samples stored and
+* the second sample covers 3/4 of the pixel. When reading samples 0
+* and 1, return physical sample 0 (determined by the first two 0s
+* in FMASK), otherwise return physical sample 1.
+*
+* The sample index should be adjusted as follows:
+*   sample_index = (fmask  (sample_index * 4))  0xF;
+*/
+   if (target == TGSI_TEXTURE_2D_MSAA ||
+   target == TGSI_TEXTURE_2D_ARRAY_MSAA) {
+   struct lp_build_context *uint_bld = bld_base-uint_bld;
+   struct lp_build_emit_data txf_emit_data = *emit_data;
+   LLVMValueRef txf_address[16];
+   unsigned txf_count = count;
+
+   memcpy(txf_address, address, sizeof(address));
+
+   /* Pad to a power-of-two size. */
+   while (txf_count  util_next_power_of_two(txf_count))
+   txf_address[txf_count++] = 
LLVMGetUndef(LLVMInt32TypeInContext(gallivm-context));
+
+   /* Read FMASK using TXF. */
+   txf_emit_data.chan = 0;
+   txf_emit_data.dst_type = LLVMVectorType(
+   
LLVMInt32TypeInContext(bld_base-base.gallivm-context), 4);
+   txf_emit_data.args[0] = lp_build_gather_values(gallivm, 
txf_address, txf_count);
+   txf_emit_data.args[1] = 
si_shader_ctx-resources[FMASK_TEX_OFFSET + sampler_index];
+   txf_emit_data.args[2] = 
lp_build_const_int32(bld_base-base.gallivm, target);
+   txf_emit_data.arg_count = 3;
+
+   build_tex_intrinsic(txf_action, bld_base, txf_emit_data);
+
+   /* Initialize some constants. */
+   if (target == TGSI_TEXTURE_2D_MSAA) {
+   sample_chan = LLVMConstInt(uint_bld-elem_type, 2, 0);
+   } else {
+   sample_chan = LLVMConstInt(uint_bld-elem_type, 3, 0);
+   }
+
+   LLVMValueRef four = LLVMConstInt(uint_bld-elem_type, 4, 0);
+   LLVMValueRef F = LLVMConstInt(uint_bld-elem_type, 0xF, 0);
+
+   /* Apply the formula. */
+   LLVMValueRef fmask =
+   LLVMBuildExtractElement(gallivm-builder,
+   txf_emit_data.output[0],
+   uint_bld-zero, );
+
+   LLVMValueRef sample_index =
+   LLVMBuildExtractElement(gallivm-builder,
+   txf_emit_data.args[0],
+   sample_chan, );
+
+   LLVMValueRef sample_index4 =
+   LLVMBuildMul(gallivm-builder, sample_index, four, );
+
+   LLVMValueRef shifted_fmask =
+   LLVMBuildLShr(gallivm-builder, fmask, sample_index4, 
);
+
+   LLVMValueRef

Re: [Mesa-dev] [PATCH 1/2] i965: Emit MOVs for neg/abs.

2013-08-15 Thread Ian Romanick


On 08/12/2013 01:18 PM, Matt Turner wrote:

Necessary to avoid combining a bitcast and a modifier into a single
operation. Otherwise if safe, the MOV should be removed by
copy-propagation or register coalescing.


Has that been verified with shaderdb?


---
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 4 ++--
  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index ee7728c..fa4554b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -361,12 +361,12 @@ fs_visitor::visit(ir_expression *ir)
break;
 case ir_unop_neg:
op[0].negate = !op[0].negate;
-  this-result = op[0];
+  emit(MOV(this-result, op[0]));
break;
 case ir_unop_abs:
op[0].abs = true;
op[0].negate = false;
-  this-result = op[0];
+  emit(MOV(this-result, op[0]));
break;
 case ir_unop_sign:
temp = fs_reg(this, ir-type);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 8d4a5d4..05c0091 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1391,12 +1391,12 @@ vec4_visitor::visit(ir_expression *ir)
break;
 case ir_unop_neg:
op[0].negate = !op[0].negate;
-  this-result = op[0];
+  emit(MOV(result_dst, op[0]));
break;
 case ir_unop_abs:
op[0].abs = true;
op[0].negate = false;
-  this-result = op[0];
+  emit(MOV(result_dst, op[0]));
break;

 case ir_unop_sign:



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 67925] ARB_vertex_array_bgra api-errors test fails when normalized is false

2013-08-15 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=67925

Kenneth Graunke kenn...@whitecape.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|mesa-dev@lists.freedesktop. |kenn...@whitecape.org
   |org |
 QA Contact||e...@anholt.net
Product|Mesa|piglit
Version|git |unspecified
  Component|Mesa core   |tests
 Blocks|67224   |

--- Comment #2 from Kenneth Graunke kenn...@whitecape.org ---
I picked Fredrik's version as I liked it best.  It's now pushed to master as:

commit 0e7a61a29f883c63a5439ac16eddeba3aaf4
Author: Fredrik Höglund fred...@kde.org
Date:   Fri Apr 12 17:36:06 2013 +0200

mesa: Update the BGRA vertex array error handling

The error code was changed from INVALID_VALUE to INVALID_OPERATION
in OpenGL 3.3. We should also generate an error when size is BGRA
and normalized is FALSE.

Reviewed-by: Kenneth Graunke kenn...@whitecape.org

Since this uses INVALID_OPERATION, it remains to change the tests.  I've posted
patches for this:
http://lists.freedesktop.org/archives/piglit/2013-August/006774.html
http://lists.freedesktop.org/archives/piglit/2013-August/006775.html

In the meantime, I've changed this bug to Piglit and removed it from the Mesa
9.2 release tracker bug (as it isn't a Mesa bug now).

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/4] glsl: Drop duplicate error messages.

2013-08-15 Thread Kenneth Graunke

On Thursday, August 15, 2013 11:27:14 AM Matt Turner wrote:
 This same message is printed in the validate_matrix_layout_for_type
 function.
 ---
  src/glsl/ast_to_hir.cpp | 13 +
  1 file changed, 1 insertion(+), 12 deletions(-)
 
 diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
 index 9bc713d..c9ffa30 100644
 --- a/src/glsl/ast_to_hir.cpp
 +++ b/src/glsl/ast_to_hir.cpp
 @@ -2275,13 +2275,7 @@ apply_type_qualifier_to_variable(const struct
 ast_type_qualifier *qual, }
 
 if (qual-flags.q.row_major || qual-flags.q.column_major) {
 -  if (!ubo_qualifiers_allowed) {
 -  _mesa_glsl_error(loc, state,
 -   uniform block layout qualifiers row_major and 
 -   column_major can only be applied to uniform 
block 
 -   members);
 -  } else
 -  validate_matrix_layout_for_type(state, loc, var-type);
 +  validate_matrix_layout_for_type(state, loc, var-type);

Maybe I'm blind, but I don't see where row_major/column_major are only allowed 
for uniform blocks, as opposed to other interface blocks or variables in 
general.  I thought that was part of what ubo_qualifiers_valid did, but 
clearly it doesn't.

Regardless, assuming you fix the typo in patch 4's error message...

This series is:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] i965: Improve comments for driver hooks in intel_buffer_object.c.

2013-08-15 Thread Paul Berry

On 14 August 2013 12:07, Kenneth Graunke kenn...@whitecape.org wrote:

 Consistently using a The ___ driver hook. line at the the top of each
 function's comment block makes it easy to see at a glance what function
 is being implemented.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org


Series is:

Reviewed-by: Paul Berry stereotype...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Direct3D 9 state tracker

2013-08-15 Thread Carlo Marchiori

Hello,

one day I bought Mass Effect 3, but currently I'm not really able to play
it on my Linux laptop featuring

3.0 Mesa 9.1.4.
Gallium 0.4 on AMD CEDAR

I tried the fglrx driver, but it gives more problems than it solves (yet
the fps is there).
With the radeon driver I manage to get to an ALMOST decent frame rate,
meaning that is barely playable.

In conclusion I would like to try this new D3D9, but would somebody
be so kind to drop me 10 lines as a guideline to do it?

Thanks in advance,
Carlo.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nouveau/video: use correct parameter name

2013-08-15 Thread Christian König


Am 15.08.2013 14:54, schrieb Emil Velikov:

On 15/08/13 13:41, Rico Schüller wrote:

On 15.08.2013 02:10, Emil Velikov wrote:

Fix a typo introduced with commit d1ba1055d9 -
vl: Add support for max level query v2

Cc: Rico Schüller kgbric...@web.de
Cc: Christian König christian.koe...@amd.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68126
Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
---

FWIW the original patch could have introduced default params for all
profiles,
and let the individual driver provide their own function to ease
duplication.
A call to get_params(VIDEO_CAP_SUPPORTED) ensures that we do not
request the
CAP_MAX_LEVEL of a unsupported profile

And what are the correct default values? Are these the values which most
hardware can do (at this time) or the maximum values? (for reference
only for H.264 see: http://en.wikipedia.org/wiki/H.264#Levels ) It's
pretty much likely that some video decoding units may have different
supported levels and then we would add that back in again. I'm not sure
this is really better. Does something like the attached meet your needs?


I was thinking that the ones used by most hardware can be considered
default. Anyway all this is a silly bikeshedding, which I could have
omitted.

Patch looks good, thanks.


Only for the shader based MPEG2 implementation a sane fallback is 
needed, so the original implementation is actually already fine, cause 
if anybody things the levels should be different for a specific driver 
can just change that driver.


Christian.


Emil


Thanks for the fix.

Cheers
Rico



   src/gallium/drivers/nouveau/nouveau_video.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_video.c
b/src/gallium/drivers/nouveau/nouveau_video.c
index 1563b22..5c4ec0f 100644
--- a/src/gallium/drivers/nouveau/nouveau_video.c
+++ b/src/gallium/drivers/nouveau/nouveau_video.c
@@ -863,7 +863,7 @@ nouveau_screen_get_video_param(struct pipe_screen
*pscreen,
  case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE:
 return true;
  case PIPE_VIDEO_CAP_MAX_LEVEL:
-  return vl_level_supported(screen, profile);
+  return vl_level_supported(pscreen, profile);
  default:
 debug_printf(unknown video param: %d\n, param);
 return 0;






___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Direct3D 9 state tracker [amended]

2013-08-15 Thread Carlo Marchiori

Hello,

one day I bought Mass Effect 3, but currently I'm not really able to play
it on my Linux laptop featuring

3.0 Mesa 9.1.4.
Gallium 0.4 on AMD CEDAR

I tried the fglrx driver, but it gives more problems than it solves (yet
the fps is there).
With the radeon driver I manage to get to an ALMOST decent frame rate,
meaning that is barely playable.

In conclusion I would like to try this new D3D9, but would somebody
be so kind to drop me 10 lines as a guideline to do it?

Plese consider that Mass Effect 3 appear to run only on
wine 5.8 i386 (my laptop is 64 bit), so I should patch that very version.

Thanks in advance,
Carlo


2013/8/15 Carlo Marchiori carlo.marchi...@gmail.com

 Hello,

 one day I bought Mass Effect 3, but currently I'm not really able to play
 it on my Linux laptop featuring

 3.0 Mesa 9.1.4.
 Gallium 0.4 on AMD CEDAR

 I tried the fglrx driver, but it gives more problems than it solves (yet
 the fps is there).
 With the radeon driver I manage to get to an ALMOST decent frame rate,
 meaning that is barely playable.

 In conclusion I would like to try this new D3D9, but would somebody
 be so kind to drop me 10 lines as a guideline to do it?

 Thanks in advance,
 Carlo.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 09/10] i965: Shorten sampler loops in precompile key setup.

2013-08-15 Thread Paul Berry

On 14 August 2013 18:55, Kenneth Graunke kenn...@whitecape.org wrote:

 Now that we have the number of samplers available, we don't need to
 iterate over all 16.  This should be particularly helpful for vertex
 shaders.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +-
  src/mesa/drivers/dri/i965/brw_vs.c   | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index 69e544a..27263fd 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -3143,7 +3143,7 @@ brw_fs_precompile(struct gl_context *ctx, struct
 gl_shader_program *prog)

 key.clamp_fragment_color = ctx-API == API_OPENGL_COMPAT;

 -   for (int i = 0; i  MAX_SAMPLERS; i++) {
 +   for (unsigned i = 0; i  brw-wm.sampler_count; i++) {


Precompile is called during link, before the program is made current.  So
brw-wm is some other program.  I think you need to do this instead:
_mesa_fls(fp-Base.SamplersUsed).


if (fp-Base.ShadowSamplers  (1  i)) {
   /* Assume DEPTH_TEXTURE_MODE is the default: X, X, X, 1 */
   key.tex.swizzles[i] =
 diff --git a/src/mesa/drivers/dri/i965/brw_vs.c
 b/src/mesa/drivers/dri/i965/brw_vs.c
 index 5b8173d..7df93c2 100644
 --- a/src/mesa/drivers/dri/i965/brw_vs.c
 +++ b/src/mesa/drivers/dri/i965/brw_vs.c
 @@ -544,7 +544,7 @@ brw_vs_precompile(struct gl_context *ctx, struct
 gl_shader_program *prog)
 key.base.program_string_id = bvp-id;
 key.base.clamp_vertex_color = ctx-API == API_OPENGL_COMPAT;

 -   for (int i = 0; i  MAX_SAMPLERS; i++) {
 +   for (unsigned i = 0; i  brw-vs.sampler_count; i++) {


Similarly, this should be _mesa_fls(vp-Base.SamplersUsed).

With those fixes, this patch is:

Reviewed-by: Paul Berry stereotype...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 00/10] i965: Separate VS/FS sampler tables.

2013-08-15 Thread Paul Berry

On 14 August 2013 18:55, Kenneth Graunke kenn...@whitecape.org wrote:

 Currently, i965 uploads a single SAMPLER_STATE table shared across all
 shader stages (VS, FS).  This series splits it out, uploading a unique
 table for each stage.

 I think this may actually fix some bugs with vertex texturing:
 Piglit's fragment-and-vertex-texturing uses two textures (unit 0 and
 unit 1), one in each shader.  vs-SamplersUsed and fs-SamplersUsed
 both only have one bit set (bit 0), but vs-SamplerUnits and
 fs-SamplerUnits map them differently (units 0 and 1).  The existing
 code would select fs-SamplerUnits[0], ignoring vs-SamplerUnits[0].
 If the two textures had, say, different wrap modes, this would probably
 illustrate the problem.

 It also just seems like a good idea.  The border color code in particular
 is much nicer after this change, as it's not directly tied to brw-wm
 any longer (even though textures can be used in all shader stages).

 No observed Piglit changes on Ivybridge.


Nice!  I think this will make things easier for me in geometry shader land
as well.

I sent a comment on patch 9.  The rest are:

Reviewed-by: Paul Berry stereotype...@gmail.com

Possible follow-up idea: ctx-vs and ctx-fs now have a lot of elements in
common: scratch_bo, const_bo, prog_offset, state_offset, push_const_offset,
bind_bo_offset, surf_offset, sampler_count, sampler_offset, and
sdc_offset.  Let's put those in their own struct, and then we can pass a
pointer to that struct to upload_sampler_state_table virtual function.  I
might should try doing that as part of my upcoming geometry shader back-end
series.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

70 matches

Mail list logo