date:20160207

Re: [Mesa-dev] [PATCH 04/12] nvc0: bind driver consts on buffer 15 for compute on Fermi

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 12:02 AM, Ilia Mirkin wrote:

On Sat, Feb 6, 2016 at 5:38 PM, Samuel Pitoiset wrote:

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 13 ++---
  src/gallium/drivers/nouveau/nvc0/nvc0_program.c |  2 ++
  2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index 3ac7ce1..49a58ce 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -236,10 +236,17 @@ nvc0_compute_upload_input(struct nvc0_context *nvc0, const void *input)
BEGIN_1IC0(push, NVC0_COMPUTE(CB_POS), 1 + cp->parm_size / 4);
PUSH_DATA (push, 0);
PUSH_DATAp(push, input, cp->parm_size / 4);
-
-  BEGIN_NVC0(push, NVC0_COMPUTE(FLUSH), 1);
-  PUSH_DATA (push, NVC0_COMPUTE_FLUSH_CB);
+   } else {
+  BEGIN_NVC0(push, NVC0_COMPUTE(CB_SIZE), 3);
+  PUSH_DATA (push, 1024);
+  PUSH_DATAh(push, nvc0->screen->uniform_bo->offset + (6 << 16) + (5 << 10));
+  PUSH_DATA (push, nvc0->screen->uniform_bo->offset + (6 << 16) + (5 << 10));
+  BEGIN_NVC0(push, NVC0_COMPUTE(CB_BIND), 1);
+  PUSH_DATA (push, (15 << 8) | 1);
 }


Why are these two mutually exclusive? The driver constbufs should be
always bound. And user parameters are only going to come in via the
clover path. Not 100% sure how they're going to be used yet, but it
does seem like they're going to be separate things...


User parameters also come in via the compute shaders which read MP 
performance counters. Well, binding the driver constbufs all the time 
doesn't seem crazy, though.
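
For reference, a minimal sketch of the two bind words that appear in this series: the compute path in this patch pushes (15 << 8) | 1, while the 3D setup in nvc0_screen_create() pushes (15 << 4) | 1. The helper names are hypothetical, and reading bit 0 as a "valid" flag and limiting the slot to 0..15 are assumptions drawn from these patches, not from hardware documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of Mesa. */

/* Bind word as pushed to NVC0_COMPUTE(CB_BIND) in this patch:
 * slot index shifted by 8, bit 0 assumed to mean "valid". */
static inline uint32_t
compute_cb_bind_word(unsigned slot, int valid)
{
   assert(slot < 16); /* assumption: the series only ever uses slot 15 */
   return ((uint32_t)slot << 8) | (valid ? 1u : 0u);
}

/* Bind word as pushed to NVC0_3D(CB_BIND(i)) elsewhere in the series:
 * slot index shifted by 4, bit 0 assumed to mean "valid". */
static inline uint32_t
gfx_cb_bind_word(unsigned slot, int valid)
{
   assert(slot < 16); /* same assumption as above */
   return ((uint32_t)slot << 4) | (valid ? 1u : 0u);
}
```

With slot 15 these give 0xf01 for compute and 0xf1 for 3D, matching the literals in the patches.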





+
+   BEGIN_NVC0(push, NVC0_COMPUTE(FLUSH), 1);
+   PUSH_DATA (push, NVC0_COMPUTE_FLUSH_CB);
  }

  void
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index 93f211b..afcff53 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -544,6 +544,8 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t chipset,
   info->io.texBindBase = NVE4_CP_INPUT_TEX(0);
   info->io.suInfoBase = NVE4_CP_INPUT_SUF(0);
   info->prop.cp.gridInfoBase = NVE4_CP_INPUT_GRID_INFO(0);
+  } else {
+ info->io.resInfoCBSlot = 15;
}
info->io.msInfoCBSlot = 0;
info->io.msInfoBase = NVE4_CP_INPUT_MS_OFFSETS;
--
2.6.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] glsl: return cloned signature, not the builtin one

2016-02-07 Thread Timothy Arceri

On Sat, 2016-02-06 at 17:10 -0500, Ilia Mirkin wrote:
> The builtin data can get released with a glReleaseShaderCompiler
> call.
> We're careful everywhere to clone everything that comes out of
> builtins
> except here, where we accidentally return the signature belonging to
> the
> builtin version, rather than the locally-cloned one.
> 
> Signed-off-by: Ilia Mirkin 

Reviewed-by: Timothy Arceri 


Re: [Mesa-dev] [PATCH v2 3/5] nv50: add PIPE_QUERY_OCCLUSION_PREDICATE support

2016-02-07 Thread Samuel Pitoiset


Reviewed-by: Samuel Pitoiset 

On 02/07/2016 02:53 AM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin 

v1 -> v2: adjust begin/end methods as well
---

Tested this time around.

  src/gallium/drivers/nouveau/nv50/nv50_query_hw.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c b/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c
index cccd3b7..727b509 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_query_hw.c
@@ -156,6 +156,7 @@ nv50_hw_begin_query(struct nv50_context *nv50, struct nv50_query *q)

 switch (q->type) {
 case PIPE_QUERY_OCCLUSION_COUNTER:
+   case PIPE_QUERY_OCCLUSION_PREDICATE:
hq->nesting = nv50->screen->num_occlusion_queries_active++;
if (hq->nesting) {
   nv50_hw_query_get(push, q, 0x10, 0x0100f002);
@@ -213,6 +214,7 @@ nv50_hw_end_query(struct nv50_context *nv50, struct nv50_query *q)

 switch (q->type) {
 case PIPE_QUERY_OCCLUSION_COUNTER:
+   case PIPE_QUERY_OCCLUSION_PREDICATE:
nv50_hw_query_get(push, q, 0, 0x0100f002);
if (--nv50->screen->num_occlusion_queries_active == 0) {
   PUSH_SPACE(push, 2);
@@ -304,6 +306,9 @@ nv50_hw_get_query_result(struct nv50_context *nv50, struct nv50_query *q,
 case PIPE_QUERY_OCCLUSION_COUNTER: /* u32 sequence, u32 count, u64 time */
res64[0] = hq->data[1] - hq->data[5];
break;
+   case PIPE_QUERY_OCCLUSION_PREDICATE:
+  res8[0] = hq->data[1] != hq->data[5];
+  break;
 case PIPE_QUERY_PRIMITIVES_GENERATED: /* u64 count, u64 time */
 case PIPE_QUERY_PRIMITIVES_EMITTED: /* u64 count, u64 time */
res64[0] = data64[0] - data64[2];
@@ -372,6 +377,7 @@ nv50_hw_create_query(struct nv50_context *nv50, unsigned type, unsigned index)

 switch (q->type) {
 case PIPE_QUERY_OCCLUSION_COUNTER:
+   case PIPE_QUERY_OCCLUSION_PREDICATE:
hq->rotate = 32;
break;
 case PIPE_QUERY_PRIMITIVES_GENERATED:
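
To make the result computation in nv50_hw_get_query_result() concrete, here is a small standalone sketch of the two occlusion cases. It assumes the layout given in the existing comment (u32 sequence, u32 count, u64 time, i.e. four 32-bit words per report), so words 1 and 5 are the sample counts of two consecutive reports. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two cases in the patch.
 * data[] holds two reports of four 32-bit words each:
 *   word 0: sequence, word 1: sample count, words 2-3: timestamp. */

/* PIPE_QUERY_OCCLUSION_COUNTER: difference of the two sample counts,
 * computed in 32 bits (so it wraps) and then widened. */
static uint64_t
occlusion_counter_result(const uint32_t *data)
{
   assert(data != NULL);
   return (uint32_t)(data[1] - data[5]);
}

/* PIPE_QUERY_OCCLUSION_PREDICATE: true as soon as the counts differ. */
static bool
occlusion_predicate_result(const uint32_t *data)
{
   assert(data != NULL);
   return data[1] != data[5];
}
```

The predicate only needs to know whether any sample passed, which is why it can share the begin/end paths with the counter and differ only in how the result is read back.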



Re: [Mesa-dev] [PATCH 06/12] nvc0: bind textures/samplers for compute on Fermi

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 12:04 AM, Ilia Mirkin wrote:

Seems like it would make sense to stick these into nvc0_tex along with
the existing 3d functions that these are mostly copying. Perhaps
refactor so that they share logic?


I wasn't sure about sharing the same code for both 3D and compute, 
but this is definitely possible; I'll do that.
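
As an illustration of the logic that a shared 3D/compute path could factor out, here is a sketch of the BIND_TSC command packing and the lock bitmap update used in the patch below. The helper names are hypothetical; the bit layout ((id << 12) | (slot << 4) | 1 to bind, (slot << 4) | 0 to unbind) and the lock[id / 32] |= 1 << (id % 32) update are copied from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of Mesa. */

/* Command word that binds sampler entry 'id' to sampler slot 'slot'. */
static uint32_t
tsc_bind_cmd(unsigned slot, unsigned id)
{
   return ((uint32_t)id << 12) | ((uint32_t)slot << 4) | 1u;
}

/* Command word that unbinds whatever is in sampler slot 'slot'. */
static uint32_t
tsc_unbind_cmd(unsigned slot)
{
   return ((uint32_t)slot << 4) | 0u;
}

/* Mark entry 'id' as in use in a bitmap of 32-bit words.
 * The caller must size 'lock' to cover every valid id. */
static void
tsc_lock_entry(uint32_t *lock, unsigned id)
{
   assert(lock != NULL);
   lock[id / 32] |= 1u << (id % 32);
}
```

Nothing in these helpers depends on the shader stage, which suggests the stage index and the method (NVC0_3D vs. NVC0_COMPUTE) are the only things a shared implementation would need as parameters.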




On Sat, Feb 6, 2016 at 5:38 PM, Samuel Pitoiset wrote:

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 140 +++-
  src/gallium/drivers/nouveau/nvc0/nvc0_context.h |   2 +
  src/gallium/drivers/nouveau/nvc0/nvc0_tex.c |   2 +-
  3 files changed, 141 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index bd399e6..e63bdcb 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -105,7 +105,17 @@ nvc0_screen_compute_setup(struct nvc0_screen *screen,
 PUSH_DATAh(push, screen->text->offset);
 PUSH_DATA (push, screen->text->offset);

-   /* TODO: textures & samplers */
+   /* textures */
+   BEGIN_NVC0(push, NVC0_COMPUTE(TIC_ADDRESS_HIGH), 3);
+   PUSH_DATAh(push, screen->txc->offset);
+   PUSH_DATA (push, screen->txc->offset);
+   PUSH_DATA (push, NVC0_TIC_MAX_ENTRIES - 1);
+
+   /* samplers */
+   BEGIN_NVC0(push, NVC0_COMPUTE(TSC_ADDRESS_HIGH), 3);
+   PUSH_DATAh(push, screen->txc->offset + 65536);
+   PUSH_DATA (push, screen->txc->offset + 65536);
+   PUSH_DATA (push, NVC0_TSC_MAX_ENTRIES - 1);

 return 0;
  }
@@ -139,6 +149,128 @@ nvc0_compute_validate_program(struct nvc0_context *nvc0)
  }

  static void
+nvc0_compute_validate_samplers(struct nvc0_context *nvc0)
+{
+   uint32_t commands[16];
+   struct nouveau_pushbuf *push = nvc0->base.pushbuf;
+   const int s = 5;
+   unsigned i;
+   unsigned n = 0;
+   bool need_flush = false;
+
+   for (i = 0; i < nvc0->num_samplers[s]; ++i) {
+  struct nv50_tsc_entry *tsc = nv50_tsc_entry(nvc0->samplers[s][i]);
+
+  if (!(nvc0->samplers_dirty[s] & (1 << i)))
+ continue;
+  if (!tsc) {
+ commands[n++] = (i << 4) | 0;
+ continue;
+  }
+  if (tsc->id < 0) {
+ tsc->id = nvc0_screen_tsc_alloc(nvc0->screen, tsc);
+
+ nvc0_m2mf_push_linear(&nvc0->base, nvc0->screen->txc,
+   65536 + tsc->id * 32, NV_VRAM_DOMAIN(&nvc0->screen->base),
+   32, tsc->tsc);
+ need_flush = true;
+  }
+  nvc0->screen->tsc.lock[tsc->id / 32] |= 1 << (tsc->id % 32);
+
+  commands[n++] = (tsc->id << 12) | (i << 4) | 1;
+   }
+   for (; i < nvc0->state.num_samplers[s]; ++i)
+  commands[n++] = (i << 4) | 0;
+
+   nvc0->state.num_samplers[s] = nvc0->num_samplers[s];
+
+   if (n) {
+  BEGIN_NIC0(push, NVC0_COMPUTE(BIND_TSC), n);
+  PUSH_DATAp(push, commands, n);
+   }
+   nvc0->samplers_dirty[s] = 0;
+
+   if (need_flush) {
+  BEGIN_NVC0(push, NVC0_COMPUTE(TSC_FLUSH), 1);
+  PUSH_DATA (push, 0);
+   }
+}
+
+static void
+nvc0_compute_validate_textures(struct nvc0_context *nvc0)
+{
+   uint32_t commands[32];
+   struct nouveau_pushbuf *push = nvc0->base.pushbuf;
+   struct nouveau_bo *txc = nvc0->screen->txc;
+   const int s = 5;
+   unsigned i;
+   unsigned n = 0;
+   bool need_flush = false;
+
+   for (i = 0; i < nvc0->num_textures[s]; ++i) {
+  struct nv50_tic_entry *tic = nv50_tic_entry(nvc0->textures[s][i]);
+  struct nv04_resource *res;
+  const bool dirty = !!(nvc0->textures_dirty[s] & (1 << i));
+
+  if (!tic) {
+ if (dirty)
+commands[n++] = (i << 1) | 0;
+ continue;
+  }
+  res = nv04_resource(tic->pipe.texture);
+  nvc0_update_tic(nvc0, tic, res);
+
+  if (tic->id < 0) {
+ tic->id = nvc0_screen_tic_alloc(nvc0->screen, tic);
+
+ PUSH_SPACE(push, 17);
+ BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2);
+ PUSH_DATAh(push, txc->offset + (tic->id * 32));
+ PUSH_DATA (push, txc->offset + (tic->id * 32));
+ BEGIN_NVC0(push, NVC0_M2MF(LINE_LENGTH_IN), 2);
+ PUSH_DATA (push, 32);
+ PUSH_DATA (push, 1);
+ BEGIN_NVC0(push, NVC0_M2MF(EXEC), 1);
+ PUSH_DATA (push, 0x100111);
+ BEGIN_NIC0(push, NVC0_M2MF(DATA), 8);
+ PUSH_DATAp(push, &tic->tic[0], 8);
+
+ need_flush = true;
+  } else
+  if (res->status & NOUVEAU_BUFFER_STATUS_GPU_WRITING) {
+ BEGIN_NVC0(push, NVC0_COMPUTE(TEX_CACHE_CTL), 1);
+ PUSH_DATA (push, (tic->id << 4) | 1);
+ NOUVEAU_DRV_STAT(&nvc0->screen->base, tex_cache_flush_count, 1);
+  }
+  nvc0->screen->tic.lock[tic->id / 32] |= 1 << (tic->id % 32);
+
+  res->status &= ~NOUVEAU_BUFFER_STATUS_GPU_WRITING;
+  res->status |=  NOUVEAU_BUFFER_STATUS_GPU_READING;
+
+  if (!dirty)
+ continue;
+  commands[n++] = (tic->id << 9) | (i << 1) |

Re: [Mesa-dev] [PATCH 11/12] nv50/ir: add atomics support on shared memory for Fermi

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 12:23 AM, Ilia Mirkin wrote:

On Sat, Feb 6, 2016 at 5:38 PM, Samuel Pitoiset wrote:

Signed-off-by: Samuel Pitoiset 
---
  .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |   1 +
  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 102 -
  .../nouveau/codegen/nv50_ir_lowering_nvc0.h|   1 +
  3 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index f6605eb..42b2a84 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -398,6 +398,7 @@ CodeEmitterNVC0::emitForm_A(const Instruction *i, uint64_t opc)
   srcId(i->src(s), s ? ((s == 2) ? 49 : s1) : 20);
   break;
default:
+ srcId(i->src(s), 49);


Yeah no :) I'd want to see some asserts here to make sure that
this is what you think it is. Also, as I recall this is related to
SELP emission, nothing here.


Oh right, I forgot to clean up this part. :-)




   // ignore here, can be predicate or flags, but must not be address
   break;
}
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index e7cb54b..243e23a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1033,6 +1033,99 @@ NVC0LoweringPass::handleSUQ(Instruction *suq)
 return true;
  }

+void
+NVC0LoweringPass::handleSharedATOM(Instruction *atom)
+{
+   assert(atom->src(0).getFile() == FILE_MEMORY_SHARED);
+
+   BasicBlock *currBB = atom->bb;
+   BasicBlock *tryLockAndSetBB = atom->bb->splitBefore(atom, false);
+   BasicBlock *joinBB = atom->bb->splitAfter(atom);
+
+   bld.setPosition(currBB, true);
+   assert(!currBB->joinAt);
+   currBB->joinAt = bld.mkFlow(OP_JOINAT, joinBB, CC_ALWAYS, NULL);
+
+   bld.mkFlow(OP_BRA, tryLockAndSetBB, CC_ALWAYS, NULL);
+   currBB->cfg.attach(&tryLockAndSetBB->cfg, Graph::Edge::TREE);
+
+   bld.setPosition(tryLockAndSetBB, true);
+
+   Instruction *ld =
+  bld.mkLoad(TYPE_U32, atom->getDef(0),
+ bld.mkSymbol(FILE_MEMORY_SHARED, 0, TYPE_U32, 0), NULL);
+   ld->setDef(1, bld.getSSA(1, FILE_PREDICATE));
+   ld->subOp = NV50_IR_SUBOP_LOAD_LOCKED;
+
+   Value *stVal;
+   if (atom->subOp == NV50_IR_SUBOP_ATOM_EXCH) {
+  // Read the old value, and write the new one.
+  stVal = atom->getSrc(1);
+   } else if (atom->subOp == NV50_IR_SUBOP_ATOM_CAS) {
+  CmpInstruction *set =
+ bld.mkCmp(OP_SET, CC_EQ, TYPE_U32, bld.getSSA(1, FILE_PREDICATE),
+   TYPE_U32, ld->getDef(0), atom->getSrc(1));
+  set->setPredicate(CC_P, ld->getDef(1));
+
+  CmpInstruction *selp =
+ bld.mkCmp(OP_SELP, CC_NOT_P, TYPE_U32, bld.getSSA(4, FILE_ADDRESS),
+   TYPE_U32, ld->getDef(0), atom->getSrc(2),
+   set->getDef(0));
+  selp->setPredicate(CC_P, ld->getDef(1));
+
+  stVal = selp->getDef(0);
+   } else {
+  operation op;
+
+  switch (atom->subOp) {
+  case NV50_IR_SUBOP_ATOM_ADD:
+ op = OP_ADD;
+ break;
+  case NV50_IR_SUBOP_ATOM_AND:
+ op = OP_AND;
+ break;
+  case NV50_IR_SUBOP_ATOM_OR:
+ op = OP_OR;
+ break;
+  case NV50_IR_SUBOP_ATOM_XOR:
+ op = OP_XOR;
+ break;
+  case NV50_IR_SUBOP_ATOM_MIN:
+ op = OP_MIN;
+ break;
+  case NV50_IR_SUBOP_ATOM_MAX:
+ op = OP_MAX;
+ break;
+  default:
+ assert(0);
+  }
+
+  Instruction *i =
+ bld.mkOp2(op, atom->dType, bld.getSSA(4, FILE_ADDRESS), ld->getDef(0),
+   atom->getSrc(1));


Why is this FILE_ADDRESS? This is just a regular operation, nothing to
do with address registers. Just bld.getSSA() should be fine here.


Ok.




+  i->setPredicate(CC_P, ld->getDef(1));
+
+  stVal = i->getDef(0);
+   }
+
+   Instruction *st =
+  bld.mkStore(OP_STORE, TYPE_U32,
+  bld.mkSymbol(FILE_MEMORY_SHARED, 0, TYPE_U32, 0),
+  NULL, stVal);
+   st->setPredicate(CC_P, ld->getDef(1));
+   st->subOp = NV50_IR_SUBOP_STORE_UNLOCKED;
+
+   // Loop until the lock is acquired.
+   bld.mkFlow(OP_BRA, tryLockAndSetBB, CC_NOT_P, ld->getDef(1));
+   tryLockAndSetBB->cfg.attach(&tryLockAndSetBB->cfg, Graph::Edge::BACK);
+   bld.mkFlow(OP_BRA, joinBB, CC_ALWAYS, NULL);


You need an edge to the joinBB as well, no? (a CROSS edge, I guess).


Mmmh... Yeah probably.




+
+   bld.remove(atom);
+
+   bld.setPosition(joinBB, false);
+   bld.mkFlow(OP_JOIN, NULL, CC_ALWAYS, NULL)->fixed = 1;
+}
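
The control flow this lowering emits (load with lock, predicated operation, store with unlock, branch back until the lock was acquired) can be modeled on the CPU as follows. This is a single-threaded sketch with hypothetical names: the lock flag stands in for whatever the hardware tracks, operations are unsigned 32-bit only, and the CAS case follows the usual compare-and-swap meaning rather than the exact SET/SELP sequence in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one shared-memory word plus its lock state. */
struct shared_word {
   uint32_t value;
   bool locked;
};

enum atom_op {
   ATOM_ADD, ATOM_AND, ATOM_OR, ATOM_XOR,
   ATOM_MIN, ATOM_MAX, ATOM_EXCH, ATOM_CAS
};

/* Models the locked load: yields the value and whether the lock was taken
 * (the predicate written by the load in the patch). */
static bool
load_locked(struct shared_word *w, uint32_t *out)
{
   *out = w->value;
   if (w->locked)
      return false;
   w->locked = true;
   return true;
}

/* Models the unlocking store. */
static void
store_unlocked(struct shared_word *w, uint32_t v)
{
   w->value = v;
   w->locked = false;
}

/* Returns the old value. For ATOM_CAS, src1 is the compare value and
 * src2 the new value; src2 is ignored otherwise. In this single-threaded
 * model the loop spins forever if the word starts out locked. */
static uint32_t
shared_atom(struct shared_word *w, enum atom_op op,
            uint32_t src1, uint32_t src2)
{
   uint32_t old, val = 0;
   bool got;

   do {
      got = load_locked(w, &old);
      if (!got)
         continue; /* lock not acquired: retry */

      switch (op) {
      case ATOM_ADD:  val = old + src1; break;
      case ATOM_AND:  val = old & src1; break;
      case ATOM_OR:   val = old | src1; break;
      case ATOM_XOR:  val = old ^ src1; break;
      case ATOM_MIN:  val = old < src1 ? old : src1; break;
      case ATOM_MAX:  val = old > src1 ? old : src1; break;
      case ATOM_EXCH: val = src1; break;
      case ATOM_CAS:  val = (old == src1) ? src2 : old; break;
      default:        assert(0);
      }
      store_unlocked(w, val);
   } while (!got);

   return old;
}
```

The point of the model is that everything between the locked load and the unlocking store is predicated on having acquired the lock, which is what the setPredicate(CC_P, ld->getDef(1)) calls in the patch express.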
+
  bool
  NVC0LoweringPass::handleATOM(Instruction *atom)
  {
@@ -1044,8 +1137,8 @@ NVC0LoweringPass::handleATOM(Instruction *atom)
sv = SV_LBASE;

Re: [Mesa-dev] [PATCH 01/12] nvc0: allocate an area for compute user constbufs

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 10:38 AM, Michael Schellenberger Costa wrote:

Hi,

Am 06/02/2016 um 23:38 schrieb Samuel Pitoiset:

For compute shaders, we might need to upload uniforms.

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 14 +++---
  src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c | 12 ++--
  src/gallium/drivers/nouveau/nvc0/nvc0_tex.c|  2 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c| 10 ++
  4 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 2b12de4..84e4253 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -889,7 +889,7 @@ nvc0_screen_create(struct nouveau_device *dev)
  */
 nouveau_heap_init(&screen->text_heap, 0, (1 << 20) - 0x100);

-   ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 1 << 12, 6 << 16, NULL,
+   ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 1 << 12, 7 << 16, NULL,
  &screen->uniform_bo);


There aren't any enums for those magic numbers here and below?


Hi,

Well, we have a bunch of magic numbers in the nvc0 driver, but they are 
quite understandable. I agree with you that it would be better to 
use some macros here and there, but that is not the main intention of 
this patch.





 if (ret)
goto fail;
@@ -901,8 +901,8 @@ nvc0_screen_create(struct nouveau_device *dev)
/* auxiliary constants (6 user clip planes, base instance id) */
BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
PUSH_DATA (push, 1024);
-  PUSH_DATAh(push, screen->uniform_bo->offset + (5 << 16) + (i << 10));
-  PUSH_DATA (push, screen->uniform_bo->offset + (5 << 16) + (i << 10));
+  PUSH_DATAh(push, screen->uniform_bo->offset + (6 << 16) + (i << 10));
+  PUSH_DATA (push, screen->uniform_bo->offset + (6 << 16) + (i << 10));

The pattern (N << 16) + (M << 10) seems repetitive; would a helper make
sense here (it might help to avoid the magic numbers)?


Cf. my comment above.



Michael
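
A helper along the lines suggested here could look like the following sketch. The name nvc0_uniform_bo_suboffset and the 64-slot bound are assumptions for illustration and do not exist in the tree. The only facts taken from the patch are that the auxiliary area moves from block 5 to block 6 and that blocks are 64 KiB (<< 16) and slots 1 KiB (<< 10) apart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of Mesa: byte offset of 1 KiB slot 'slot'
 * inside 64 KiB block 'block' of the uniform buffer object. */
static inline uint64_t
nvc0_uniform_bo_suboffset(unsigned block, unsigned slot)
{
   assert(slot < 64); /* assumption: 64 KiB block / 1 KiB per slot */
   return ((uint64_t)block << 16) + ((uint64_t)slot << 10);
}
```

A call site such as screen->uniform_bo->offset + (6 << 16) + (i << 10) would then read screen->uniform_bo->offset + nvc0_uniform_bo_suboffset(6, i), and the block number could become a named constant.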


BEGIN_NVC0(push, NVC0_3D(CB_BIND(i)), 1);
PUSH_DATA (push, (15 << 4) | 1);
if (screen->eng3d->oclass >= NVE4_3D_CLASS) {
@@ -922,8 +922,8 @@ nvc0_screen_create(struct nouveau_device *dev)
 /* return { 0.0, 0.0, 0.0, 0.0 } for out-of-bounds vtxbuf access */
 BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
 PUSH_DATA (push, 256);
-   PUSH_DATAh(push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
-   PUSH_DATA (push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
+   PUSH_DATAh(push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));
+   PUSH_DATA (push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));
 BEGIN_1IC0(push, NVC0_3D(CB_POS), 5);
 PUSH_DATA (push, 0);
 PUSH_DATAf(push, 0.0f);
@@ -931,8 +931,8 @@ nvc0_screen_create(struct nouveau_device *dev)
 PUSH_DATAf(push, 0.0f);
 PUSH_DATAf(push, 0.0f);
 BEGIN_NVC0(push, NVC0_3D(VERTEX_RUNOUT_ADDRESS_HIGH), 2);
-   PUSH_DATAh(push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
-   PUSH_DATA (push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
+   PUSH_DATAh(push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));
+   PUSH_DATA (push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));

 if (screen->base.drm->version >= 0x01000101) {
ret = nouveau_getparam(dev, NOUVEAU_GETPARAM_GRAPH_UNITS, &value);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
index c17223a..2bb9b44 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
@@ -184,8 +184,8 @@ nvc0_validate_fb(struct nvc0_context *nvc0)
  ms = 1 << ms_mode;
  BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
  PUSH_DATA (push, 1024);
-PUSH_DATAh(push, nvc0->screen->uniform_bo->offset + (5 << 16) + (4 << 10));
-PUSH_DATA (push, nvc0->screen->uniform_bo->offset + (5 << 16) + (4 << 10));
+PUSH_DATAh(push, nvc0->screen->uniform_bo->offset + (6 << 16) + (4 << 10));
+PUSH_DATA (push, nvc0->screen->uniform_bo->offset + (6 << 16) + (4 << 10));
  BEGIN_1IC0(push, NVC0_3D(CB_POS), 1 + 2 * ms);
  PUSH_DATA (push, 256 + 128);
  for (i = 0; i < ms; i++) {
@@ -318,8 +318,8 @@ nvc0_upload_uclip_planes(struct nvc0_context *nvc0, unsigned s)

 BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
 PUSH_DATA (push, 1024);
-   PUSH_DATAh(push, bo->offset + (5 << 16) + (s << 10));
-   PUSH_DATA (push, bo->offset + (5 << 16) + (s << 10));
+   PUSH_DATAh(push, bo->offset + (6 << 16) + (s << 10));
+   PUSH_DATA (push, bo->offset + (6 << 16) + (s << 10));
 BEGIN_1IC0(push, NVC0_3D(CB_POS), PIPE_MAX_CLIP_PLANES * 4 + 1);
 PUSH_DATA (push, 256);
 PUSH_DATAp(push, &nvc0->clip.ucp[0][0], PIPE_MAX_CLIP_PLANES * 4);
@@ -479,8

Re: [Mesa-dev] [PATCH 02/12] nvc0: allow to push constant buffers for compute on Fermi

2016-02-07 Thread Samuel Pitoiset




On 02/06/2016 11:52 PM, Ilia Mirkin wrote:

On Sat, Feb 6, 2016 at 5:38 PM, Samuel Pitoiset wrote:

Constant buffers must be bound for compute like for 3D. This is done
by adding a new 'shader' parameter to nvc0_cb_bo_push() which allows
to use the compute channel for compute shaders, and the 3D channel
for other shader types.

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/nouveau/nvc0/nvc0_context.h|  2 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c |  2 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c   | 15 +++
  3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index 2e726e6..2ab70e8 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -301,7 +301,7 @@ nve4_p2mf_push_linear(struct nouveau_context *nv,
unsigned size, const void *data);
  void
  nvc0_cb_bo_push(struct nouveau_context *,
-struct nouveau_bo *bo, unsigned domain,
+struct nouveau_bo *bo, unsigned domain, unsigned shader,
  unsigned base, unsigned size,
  unsigned offset, unsigned words, const uint32_t *data);

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
index 2bb9b44..97fcfbc 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
@@ -441,7 +441,7 @@ nvc0_constbufs_validate(struct nvc0_context *nvc0)
 PUSH_DATA (push, (0 << 4) | 1);
  }
  nvc0_cb_bo_push(&nvc0->base, bo, NV_VRAM_DOMAIN(&nvc0->screen->base),
- base, nvc0->state.uniform_buffer_bound[s],
+ base, nvc0->state.uniform_buffer_bound[s], s,
   0, (size + 3) / 4,
   nvc0->constbuf[s][0].u.data);
   } else {
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
index 279c7e9..5cea822 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
@@ -4,6 +4,7 @@
  #include "nvc0/nvc0_context.h"

  #include "nv50/nv50_defs.xml.h"
+#include "nvc0/nvc0_compute.xml.h"

  struct nvc0_transfer {
 struct pipe_transfer base;
@@ -532,7 +533,7 @@ nvc0_cb_push(struct nouveau_context *nv,

 if (cb) {
nvc0_cb_bo_push(nv, res->bo, res->domain,
-  res->offset + cb->offset, cb->size,
+  res->offset + cb->offset, cb->size, s,


I think you want s - 1 here. Also if a CB is bound to both a graphics
and compute pipeline, it'll get uploaded via the graphics pipeline.
Which leads me to my below comment:


Yeah, it's s-1.




offset - cb->offset, words, data);
 } else {
nv->push_data(nv, res->bo, res->offset + offset, res->domain,
@@ -543,7 +544,7 @@ nvc0_cb_push(struct nouveau_context *nv,
  void
  nvc0_cb_bo_push(struct nouveau_context *nv,
  struct nouveau_bo *bo, unsigned domain,
-unsigned base, unsigned size,
+unsigned base, unsigned size, unsigned shader,
  unsigned offset, unsigned words, const uint32_t *data)
  {
 struct nouveau_pushbuf *push = nv->pushbuf;
@@ -557,7 +558,10 @@ nvc0_cb_bo_push(struct nouveau_context *nv,
 assert(offset < size);
 assert(offset + words * 4 <= size);

-   BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
+   if (unlikely(shader == 5))
+  BEGIN_NVC0(push, NVC0_COMPUTE(CB_SIZE), 3);
+   else
+  BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);


Did you observe this to *actually* matter (i.e. using the compute cb
upload thing)? I find that hard to believe... Also aren't these gone
on kepler compute? That'll be fun...


I'll try to upload the constbufs used for compute with the 3D channel and we 
will see. Anyway, uploading CBs for compute can clobber CBs for 3D on 
Fermi... that's sad.


Yes, this stuff is gone on Kepler. :-)




 PUSH_DATA (push, size);
 PUSH_DATAh(push, bo->offset + base);
 PUSH_DATA (push, bo->offset + base);
@@ -567,7 +571,10 @@ nvc0_cb_bo_push(struct nouveau_context *nv,

PUSH_SPACE(push, nr + 2);
PUSH_REFN (push, bo, NOUVEAU_BO_WR | domain);
-  BEGIN_1IC0(push, NVC0_3D(CB_POS), nr + 1);
+  if (unlikely(shader == 5))
+ BEGIN_1IC0(push, NVC0_COMPUTE(CB_POS), nr + 1);
+  else
+ BEGIN_1IC0(push, NVC0_3D(CB_POS), nr + 1);
PUSH_DATA (push, offset);
PUSH_DATAp(push, data, nr);

--
2.6.4


Re: [Mesa-dev] [PATCH 1/2] glsl: make sure builtins are initialized before getting the shader

2016-02-07 Thread Timothy Arceri

On Sat, 2016-02-06 at 17:10 -0500, Ilia Mirkin wrote:

It would have been nice to have some reasoning in here,
the scenario it's fixing, etc.

> Signed-off-by: Ilia Mirkin 
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/compiler/glsl/linker.cpp | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/compiler/glsl/linker.cpp
> b/src/compiler/glsl/linker.cpp
> index 4776ffa..f1ac53a 100644
> --- a/src/compiler/glsl/linker.cpp
> +++ b/src/compiler/glsl/linker.cpp
> @@ -2125,6 +2125,7 @@ link_intrastage_shaders(void *mem_ctx,
>  
>    if (ok) {
>   memcpy(linking_shaders, shader_list, num_shaders *
> sizeof(gl_shader *));
> + _mesa_glsl_initialize_builtin_functions();

Doesn't this defeat the purpose of glReleaseShaderCompiler() as it's
currently implemented in Mesa? Although, thinking about it, I can't think
of any better alternative.

With something added to the commit message describing why we need
to make sure this is initialized here, i.e. the problem scenario:

Reviewed-by: Timothy Arceri 


>   linking_shaders[num_shaders] =
> _mesa_glsl_get_builtin_function_shader();
>  
>   ok = link_function_calls(prog, linked, linking_shaders,
> num_shaders + 1);

Re: [Mesa-dev] [PATCH 07/12] nvc0: add support for indirect compute on Fermi

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 12:13 AM, Ilia Mirkin wrote:

On Sat, Feb 6, 2016 at 5:38 PM, Samuel Pitoiset wrote:

When indirect compute is used, the size of the grid (in blocks) is
stored as three integers inside a buffer. This requires a macro to
set up GRIDDIM_YX and GRIDDIM_Z.

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/nouveau/nvc0/mme/Makefile  |  2 +-
  src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme   | 19 +++
  src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h | 13 +
  src/gallium/drivers/nouveau/nvc0/nvc0_compute.c| 18 +++---
  src/gallium/drivers/nouveau/nvc0/nvc0_macros.h |  2 ++
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |  2 ++
  6 files changed, 52 insertions(+), 4 deletions(-)
  create mode 100644 src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme
  create mode 100644 src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h

diff --git a/src/gallium/drivers/nouveau/nvc0/mme/Makefile b/src/gallium/drivers/nouveau/nvc0/mme/Makefile
index 1c0f583..52fb0a5 100644
--- a/src/gallium/drivers/nouveau/nvc0/mme/Makefile
+++ b/src/gallium/drivers/nouveau/nvc0/mme/Makefile
@@ -1,5 +1,5 @@
  ENVYAS?=envyas
-TARGETS=com9097.mme.h
+TARGETS=com9097.mme.h com90c0.mme.h

  all: $(TARGETS)

diff --git a/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme
new file mode 100644
index 000..ee7f726
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme
@@ -0,0 +1,19 @@
+/* NVC0_COMPUTE_MACRO_LAUNCH_GRID_INDIRECT
+ *
+ * arg = num_groups_x
+ * parm[0] = num_groups_y
+ * parm[1] = num_groups_z
+ */
+.section #mme90c0_launch_grid_indirect
+   parm $r2
+   parm $r3
+   mov $r4 (or $r1 $r2)
+   mov $r4 (or $r3 $r4)
+   braz $r4 #fail
+   maddr 0x108e /* GRIDDIM_YX */


You can move this up, e.g.

parm $r2 maddr 0x108e /* GRIDDIM_YX */


+   mov $r4 (extrshl $r2 $r0 0x10 0x10)


If you make this

(extrinsrt $r1 $r2 0x0 0x10 0x10)

then you can make it directly an argument to send, avoiding the separate or.


+   exit send (or $r4 $r1) /* (num_groups_y << 16) | num_groups_x */
+   send $r3
+fail:
+   exit


I think you need a nop here.


Okay, I'll reduce the number of insn. :-)
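
As a cross-check of what the macro is meant to compute, here is a CPU-side sketch of the same steps: bail out when the OR of the three group counts is zero, otherwise emit (num_groups_y << 16) | num_groups_x followed by num_groups_z. The struct and function names are hypothetical, and the 16-bit mask on y mirrors the extrshl ... 0x10 0x10 operands, which is an assumption about the MME instruction semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two values the macro sends. */
struct griddim {
   uint32_t yx; /* value for GRIDDIM_YX */
   uint32_t z;  /* value for GRIDDIM_Z */
};

/* Mirrors the macro as posted: x arrives as the macro argument,
 * y and z as the two following parameters. Returns false when the
 * macro would take the 'fail' branch and send nothing. */
static bool
launch_grid_indirect_model(uint32_t x, uint32_t y, uint32_t z,
                           struct griddim *out)
{
   assert(out != NULL);

   /* mov $r4 (or $r1 $r2); mov $r4 (or $r3 $r4); braz $r4 #fail */
   if ((x | y | z) == 0)
      return false;

   /* extrshl $r2 $r0 0x10 0x10, then or with $r1 */
   out->yx = ((y & 0xffffu) << 16) | x;
   out->z = z;
   return true;
}
```

Note that, as written, the check only rejects the case where all three counts are zero, since an OR of the three values is zero only then.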




+
diff --git a/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h
new file mode 100644


I think Emil is going to yell at you about not adding this file to
some list somewhere so that make dist picks it up.


Well, try to 'git grep com9097.h' and you will see that this file is not 
defined in any Makefiles... I just followed the same rule, but as this 
file is automatically generated with envyas, I don't think it's a real 
problem.





index 000..89076cf
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h
@@ -0,0 +1,13 @@
+uint32_t mme90c0_launch_grid_indirect[] = {
+   0x0201,
+   0x0301,
+/* 0x0009: fail */
+   0x00128c10,
+   0x00131c10,
+   0x00016007,
+   0x04238021,
+   0x84008413,
+   0x001260c0,
+   0x1841,
+   0x0091,
+};
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index e63bdcb..dbf2148 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -452,9 +452,21 @@ nvc0_launch_grid(struct pipe_context *pipe, const struct pipe_grid_info *info)
 PUSH_DATA (push, cp->num_gprs);

 /* grid/block setup */
-   BEGIN_NVC0(push, NVC0_COMPUTE(GRIDDIM_YX), 2);
-   PUSH_DATA (push, (info->grid[1] << 16) | info->grid[0]);
-   PUSH_DATA (push, info->grid[2]);
+   if (unlikely(info->indirect)) {
+  struct nv04_resource *res = nv04_resource(info->indirect);
+  uint32_t offset = res->offset + info->indirect_offset;
+  unsigned macro = NVC0_COMPUTE_MACRO_LAUNCH_GRID_INDIRECT;
+
+  nouveau_pushbuf_space(push, 16, 0, 1);
+  PUSH_REFN(push, res->bo, NOUVEAU_BO_RD | res->domain);
+  PUSH_DATA(push, NVC0_FIFO_PKHDR_1I(1, macro, 3));
+  nouveau_pushbuf_data(push, res->bo, offset,
+   NVC0_IB_ENTRY_1_NO_PREFETCH | 3 * 4);
+   } else {
+  BEGIN_NVC0(push, NVC0_COMPUTE(GRIDDIM_YX), 2);
+  PUSH_DATA (push, (info->grid[1] << 16) | info->grid[0]);
+  PUSH_DATA (push, info->grid[2]);
+   }
 BEGIN_NVC0(push, NVC0_COMPUTE(BLOCKDIM_YX), 2);
 PUSH_DATA (push, (info->block[1] << 16) | info->block[0]);
 PUSH_DATA (push, info->block[2]);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_macros.h b/src/gallium/drivers/nouveau/nvc0/nvc0_macros.h
index 49e176c..57262fe 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_macros.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_macros.h
@@ -35,4 +35,6 @@

  #define NVC0_3D_MACRO_QUERY_BUFFER_WRITE   0x3858

+#define NVC0_COMPUTE_MACRO_LAUNCH_GRID_INDIRECT   0x3860
+
  #endif

Re: [Mesa-dev] [PATCH v2 06/20] gallium: add a new interface for pipe_context::launch_grid()

2016-02-07 Thread Samuel Pitoiset




On 02/06/2016 11:04 PM, Samuel Pitoiset wrote:

This introduces pipe_grid_info which contains all information to
describe a launch_grid call. This will be used to implement indirect
compute in the same fashion as indirect draw.

Signed-off-by: Samuel Pitoiset 
Reviewed-by: Marek Olšák 
Reviewed-by: Ilia Mirkin 
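
For readers without the tree at hand, here is a reduced sketch of the kind of structure this patch passes around. Only the fields that the hunks in this message actually dereference (block, grid, pc, input) are shown. The struct name, field types and ordering are assumptions; the real definition is presumably the one added to p_state.h per the diffstat below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in, not the real pipe_grid_info. */
struct grid_info_sketch {
   uint32_t pc;        /* entry point, formerly the 'pc'/'label' argument */
   const void *input;  /* kernel input data, formerly 'input' */
   unsigned block[3];  /* threads per block, formerly 'block_layout' */
   unsigned grid[3];   /* blocks per grid, formerly 'grid_layout' */
};

/* Same computation as the nv50 hunk: threads in one block. */
static unsigned
grid_info_block_size(const struct grid_info_sketch *info)
{
   assert(info != NULL);
   return info->block[0] * info->block[1] * info->block[2];
}
```

Bundling the arguments this way means a later change can add fields, such as an indirect buffer, without touching every driver's launch_grid() signature again.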
---
  src/gallium/drivers/ilo/ilo_gpgpu.c|  8 ++
  src/gallium/drivers/nouveau/nv50/nv50_compute.c| 16 +--
  src/gallium/drivers/nouveau/nv50/nv50_context.h|  3 +-
  .../drivers/nouveau/nv50/nv50_query_hw_sm.c| 12 ++--
  src/gallium/drivers/nouveau/nvc0/nvc0_compute.c| 19 ++---
  src/gallium/drivers/nouveau/nvc0/nvc0_context.h|  6 ++--
  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 12 ++--
  src/gallium/drivers/nouveau/nvc0/nve4_compute.c| 10 +++
  src/gallium/drivers/r600/evergreen_compute.c   | 19 ++---
  src/gallium/drivers/radeonsi/si_compute.c  | 33 +++---
  src/gallium/include/pipe/p_context.h   | 17 ++-
  src/gallium/include/pipe/p_state.h | 27 ++
  src/gallium/state_trackers/clover/core/kernel.cpp  | 13 +
  src/gallium/tests/trivial/compute.c| 11 +++-
  14 files changed, 117 insertions(+), 89 deletions(-)

diff --git a/src/gallium/drivers/ilo/ilo_gpgpu.c b/src/gallium/drivers/ilo/ilo_gpgpu.c
index b741590..ab165b6 100644
--- a/src/gallium/drivers/ilo/ilo_gpgpu.c
+++ b/src/gallium/drivers/ilo/ilo_gpgpu.c
@@ -79,9 +79,7 @@ launch_grid(struct ilo_context *ilo,
  }

  static void
-ilo_launch_grid(struct pipe_context *pipe,
-const uint *block_layout, const uint *grid_layout,
-uint32_t pc, const void *input)
+ilo_launch_grid(struct pipe_context *pipe, const struct pipe_grid_info *info)
  {
 struct ilo_context *ilo = ilo_context(pipe);
 struct ilo_shader_state *cs = ilo->state_vector.cs;
@@ -92,13 +90,13 @@ ilo_launch_grid(struct pipe_context *pipe,
 input_buf.buffer_size =
ilo_shader_get_kernel_param(cs, ILO_KERNEL_CS_INPUT_SIZE);
 if (input_buf.buffer_size) {
-  u_upload_data(ilo->uploader, 0, input_buf.buffer_size, 16, input,
+  u_upload_data(ilo->uploader, 0, input_buf.buffer_size, 16, info->input,
  &input_buf.buffer_offset, &input_buf.buffer);
 }

 ilo_shader_cache_upload(ilo->shader_cache, >cp->builder);

-   launch_grid(ilo, block_layout, grid_layout, &input_buf, pc);
+   launch_grid(ilo, info->block, info->grid, &input_buf, info->pc);

 ilo_render_invalidate_hw(ilo->render);

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_compute.c 
b/src/gallium/drivers/nouveau/nv50/nv50_compute.c
index 6d23fd6..04488d6 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_compute.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_compute.c
@@ -270,13 +270,11 @@ nv50_compute_find_symbol(struct nv50_context *nv50, 
uint32_t label)
  }

  void
-nv50_launch_grid(struct pipe_context *pipe,
- const uint *block_layout, const uint *grid_layout,
- uint32_t label, const void *input)
+nv50_launch_grid(struct pipe_context *pipe, const struct pipe_grid_info *info)
  {
 struct nv50_context *nv50 = nv50_context(pipe);
 struct nouveau_pushbuf *push = nv50->base.pushbuf;
-   unsigned block_size = block_layout[0] * block_layout[1] * block_layout[2];
+   unsigned block_size = info->block[0] * info->block[1] * info->block[2];
 struct nv50_program *cp = nv50->compprog;
 bool ret;

@@ -286,10 +284,10 @@ nv50_launch_grid(struct pipe_context *pipe,
return;
 }

-   nv50_compute_upload_input(nv50, input);
+   nv50_compute_upload_input(nv50, info->input);

 BEGIN_NV04(push, NV50_COMPUTE(CP_START_ID), 1);
-   PUSH_DATA (push, nv50_compute_find_symbol(nv50, label));
+   PUSH_DATA (push, nv50_compute_find_symbol(nv50, info->pc));

 BEGIN_NV04(push, NV50_COMPUTE(SHARED_SIZE), 1);
 PUSH_DATA (push, align(cp->cp.smem_size + cp->parm_size + 0x10, 0x40));
@@ -298,14 +296,14 @@ nv50_launch_grid(struct pipe_context *pipe,

 /* grid/block setup */
 BEGIN_NV04(push, NV50_COMPUTE(BLOCKDIM_XY), 2);
-   PUSH_DATA (push, block_layout[1] << 16 | block_layout[0]);
-   PUSH_DATA (push, block_layout[2]);
+   PUSH_DATA (push, info->block[1] << 16 | info->block[0]);
+   PUSH_DATA (push, info->block[2]);
 BEGIN_NV04(push, NV50_COMPUTE(BLOCK_ALLOC), 1);
 PUSH_DATA (push, 1 << 16 | block_size);
 BEGIN_NV04(push, NV50_COMPUTE(BLOCKDIM_LATCH), 1);
 PUSH_DATA (push, 1);
 BEGIN_NV04(push, NV50_COMPUTE(GRIDDIM), 1);
-   PUSH_DATA (push, grid_layout[1] << 16 | grid_layout[0]);
+   PUSH_DATA (push, info->grid[1] << 16 | info->grid[0]);
 BEGIN_NV04(push, NV50_COMPUTE(GRIDID), 1);
 PUSH_DATA (push, 1);

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h 
b/src/gallium/drivers/nouveau/nv50/nv50_context.h
index

Re: [Mesa-dev] [PATCH 1/2] glsl: make sure builtins are initialized before getting the shader

2016-02-07 Thread Ilia Mirkin

On Sun, Feb 7, 2016 at 3:49 AM, Timothy Arceri
 wrote:
> On Sat, 2016-02-06 at 17:10 -0500, Ilia Mirkin wrote:
>
> It would have been nice to have some reasoning in here,
> the scenario it's fixing, etc.
>
>> Signed-off-by: Ilia Mirkin 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>  src/compiler/glsl/linker.cpp | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/src/compiler/glsl/linker.cpp
>> b/src/compiler/glsl/linker.cpp
>> index 4776ffa..f1ac53a 100644
>> --- a/src/compiler/glsl/linker.cpp
>> +++ b/src/compiler/glsl/linker.cpp
>> @@ -2125,6 +2125,7 @@ link_intrastage_shaders(void *mem_ctx,
>>
>>if (ok) {
>>   memcpy(linking_shaders, shader_list, num_shaders *
>> sizeof(gl_shader *));
>> + _mesa_glsl_initialize_builtin_functions();
>
> Doesn't this defeat the purpose of glReleaseShaderCompiler() as it's
> currently implemented in mesa? Although thinking about it I can't think
> of any better alternative.

Yeah, not sure. It does seem like glReleaseShaderCompiler should be
effective before glLinkProgram while this will, effectively, bring all
that data back in. But I'm just trying to fix crashes here :)

>
> If you add something to the commit message describing why we need to
> make sure this is initialized here, i.e. the problem scenario, then:

Sure, I can do that. At first I read this as adding a comment to the
code, but all the other call sites just do it without comment as well.
However something in the commit message makes sense... how about:

"The builtin function shader is part of the builtin state, released
when glReleaseShaderCompiler is called. We must ensure that the
builtins have been (re)initialized before attempting to link with the
builtin shader."
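The scenario in that proposed commit message can be sketched as follows. This is a toy model under stated assumptions, not Mesa's implementation: the example_ functions and the state struct are hypothetical stand-ins for the builtin state, the release path and the getter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the builtin state; not Mesa's real data. */
static struct { bool initialized; int shader; } example_builtins;

/* Idempotent: safe to call before every use. */
static void example_initialize_builtins(void)
{
   if (!example_builtins.initialized) {
      example_builtins.shader = 42;   /* pretend to build the shader */
      example_builtins.initialized = true;
   }
}

/* Models glReleaseShaderCompiler dropping the builtin state. */
static void example_release_compiler(void)
{
   example_builtins.initialized = false;
   example_builtins.shader = 0;
}

/* Returns NULL when the state has been released. */
static const int *example_get_builtin_shader(void)
{
   return example_builtins.initialized ? &example_builtins.shader : NULL;
}

/* Link step: (re)initialize first, then fetch the builtin shader. */
static const int *example_link(void)
{
   example_initialize_builtins();
   assert(example_builtins.initialized);
   return example_get_builtin_shader();
}
```

In this model, calling the getter directly after a release yields NULL, which is the kind of situation the one-line patch guards against; the initializing call is cheap when the state is already present.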

>
> Reviewed-by: Timothy Arceri 
>
>
>>   linking_shaders[num_shaders] =
>> _mesa_glsl_get_builtin_function_shader();
>>
>>   ok = link_function_calls(prog, linked, linking_shaders,
>> num_shaders + 1);
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/5] nir: Do opt_algebraic in reverse order.

2016-02-07 Thread Eduardo Lima Mitev

On 02/05/2016 02:47 AM, Matt Turner wrote:
> Walking the SSA definitions in order means that we consider the smallest
> algebraic optimizations before larger optimizations. So if a smaller
> rule is part of a larger rule, the smaller one will happen first,
> preventing the larger one from happening.
> 
> instructions in affected programs: 32721 -> 32611 (-0.34%)
> helped: 106
> 
> Prevents regressions and annoyances in the next commits.
> ---
>  src/compiler/nir/nir_algebraic.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/compiler/nir/nir_algebraic.py 
> b/src/compiler/nir/nir_algebraic.py
> index a30652f..77ad35e 100644
> --- a/src/compiler/nir/nir_algebraic.py
> +++ b/src/compiler/nir/nir_algebraic.py
> @@ -216,7 +216,7 @@ ${pass_name}_block(nir_block *block, void *void_state)
>  {
> struct opt_state *state = void_state;
>  
> -   nir_foreach_instr_safe(block, instr) {
> +   nir_foreach_instr_reverse_safe(block, instr) {

I would add an explicit comment here explaining why we walk in reverse
order. It is not immediately clear (at least to me) that the smallest
algebraic optimizations come before the larger ones. I could not find any
comment in opt_algebraic.py or anywhere else that would suggest this is
the case.
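For what it's worth, the ordering effect described in the commit message can be reproduced with a toy rewriter. This is only a sketch under assumptions: the two rules (fmul(x, 2.0) -> fadd(x, x) and fadd(fmul(x, y), z) -> ffma(x, y, z)) and all ex_ names are made up for illustration and are not taken from nir_opt_algebraic.

```c
#include <assert.h>

enum ex_op { EX_INPUT, EX_CONST2, EX_FMUL, EX_FADD, EX_FFMA };

/* Toy SSA instruction: the destination is the array index. */
struct ex_instr {
   enum ex_op op;
   int src[3];   /* indices of earlier instructions, -1 if unused */
};

/* Small rule: fmul(x, 2.0) -> fadd(x, x). Returns 1 on a match. */
static int ex_rule_mul2(struct ex_instr *p, int i)
{
   if (p[i].op != EX_FMUL || p[p[i].src[1]].op != EX_CONST2)
      return 0;
   p[i].op = EX_FADD;
   p[i].src[1] = p[i].src[0];
   return 1;
}

/* Large rule: fadd(fmul(x, y), z) -> ffma(x, y, z). */
static int ex_rule_ffma(struct ex_instr *p, int i)
{
   int mul, addend;

   if (p[i].op != EX_FADD || p[p[i].src[0]].op != EX_FMUL)
      return 0;
   mul = p[i].src[0];
   addend = p[i].src[1];
   assert(mul >= 0 && addend >= 0);
   p[i].op = EX_FFMA;
   p[i].src[0] = p[mul].src[0];
   p[i].src[1] = p[mul].src[1];
   p[i].src[2] = addend;
   return 1;
}

/* Run one pass over (x * 2.0) + z and count the resulting ffma ops. */
static int ex_count_ffma(int reverse)
{
   struct ex_instr p[] = {
      { EX_INPUT,  { -1, -1, -1 } },   /* 0: x */
      { EX_CONST2, { -1, -1, -1 } },   /* 1: 2.0 */
      { EX_INPUT,  { -1, -1, -1 } },   /* 2: z */
      { EX_FMUL,   {  0,  1, -1 } },   /* 3: x * 2.0 */
      { EX_FADD,   {  3,  2, -1 } },   /* 4: (x * 2.0) + z */
   };
   const int n = sizeof(p) / sizeof(p[0]);
   int i, k, count = 0;

   for (k = 0; k < n; k++) {
      i = reverse ? n - 1 - k : k;
      if (!ex_rule_ffma(p, i))
         ex_rule_mul2(p, i);
   }
   for (i = 0; i < n; i++)
      count += p[i].op == EX_FFMA;
   return count;
}
```

Walking forward, the fmul is rewritten before the fadd that uses it is visited, so the larger pattern no longer matches. Walking in reverse, the fadd is visited first and becomes an ffma; the fmul is still rewritten afterwards but is then unused.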

>if (instr->type != nir_instr_type_alu)
>   continue;
>  
> @@ -255,7 +255,7 @@ ${pass_name}_impl(nir_function_impl *impl, const bool 
> *condition_flags)
> state.progress = false;
> state.condition_flags = condition_flags;
>  
> -   nir_foreach_block(impl, ${pass_name}_block, &state);
> +   nir_foreach_block_reverse(impl, ${pass_name}_block, &state);
>

Does it make sense to reverse the traversal of blocks too? As far as I
understand, opt_algebraic rules don't extend across blocks (maybe I'm
wrong). I also don't see any difference in shader-db results when running
with or without this chunk.

These are my results on HSW (with patches 1 to 3):

total instructions in shared programs: 6265414 -> 6265312 (-0.00%)
instructions in affected programs: 31499 -> 31397 (-0.32%)
helped: 98
HURT: 0

total cycles in shared programs: 56081290 -> 56078442 (-0.01%)
cycles in affected programs: 562440 -> 559592 (-0.51%)
helped: 102
HURT: 6


Patches 1 to 3 are:

Reviewed-by: Eduardo Lima Mitev 

> if (state.progress)
>nir_metadata_preserve(impl, nir_metadata_block_index |
> 


Re: [Mesa-dev] [PATCH v2 05/20] gallium/cso: add support for compute shaders

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 12:28 AM, Ilia Mirkin wrote:

I believe that the previous suggestion was that cso didn't need to
know anything about compute shaders... just call the functions
directly.


It seems like the previous suggestion was to *only* remove 
cso_{save,restore}_compute_shader() because they are currently not used.


Let's wait for Marek to be sure. :-)



On Sat, Feb 6, 2016 at 5:04 PM, Samuel Pitoiset
 wrote:

Changes in v2:
  - removed cso_{save,restore}_compute_shader() functions and the
compute_shader_saved variable because disabling compute shaders for
meta ops is not currently needed

Signed-off-by: Samuel Pitoiset 
Reviewed-by: Ilia Mirkin  (v1)
---
  src/gallium/auxiliary/cso_cache/cso_context.c | 30 +++
  src/gallium/auxiliary/cso_cache/cso_context.h |  4 
  2 files changed, 34 insertions(+)

diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c 
b/src/gallium/auxiliary/cso_cache/cso_context.c
index 6b29b20..79ae753 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.c
+++ b/src/gallium/auxiliary/cso_cache/cso_context.c
@@ -69,6 +69,7 @@ struct cso_context {

 boolean has_geometry_shader;
 boolean has_tessellation;
+   boolean has_compute_shader;
 boolean has_streamout;

 struct pipe_sampler_view *fragment_views[PIPE_MAX_SHADER_SAMPLER_VIEWS];
@@ -106,6 +107,7 @@ struct cso_context {
 void *geometry_shader, *geometry_shader_saved;
 void *tessctrl_shader, *tessctrl_shader_saved;
 void *tesseval_shader, *tesseval_shader_saved;
+   void *compute_shader;
 void *velements, *velements_saved;
 struct pipe_query *render_condition, *render_condition_saved;
 uint render_condition_mode, render_condition_mode_saved;
@@ -272,6 +274,10 @@ struct cso_context *cso_create_context( struct 
pipe_context *pipe )
  PIPE_SHADER_CAP_MAX_INSTRUCTIONS) > 0) {
ctx->has_tessellation = TRUE;
 }
+   if (pipe->screen->get_shader_param(pipe->screen, PIPE_SHADER_COMPUTE,
+  PIPE_SHADER_CAP_MAX_INSTRUCTIONS) > 0) {
+  ctx->has_compute_shader = TRUE;
+   }
 if (pipe->screen->get_param(pipe->screen,
 PIPE_CAP_MAX_STREAM_OUTPUT_BUFFERS) != 0) {
ctx->has_streamout = TRUE;
@@ -333,6 +339,10 @@ void cso_destroy_context( struct cso_context *ctx )
   ctx->pipe->bind_tes_state(ctx->pipe, NULL);
   ctx->pipe->set_constant_buffer(ctx->pipe, PIPE_SHADER_TESS_EVAL, 0, 
NULL);
}
+  if (ctx->has_compute_shader) {
+ ctx->pipe->bind_compute_state(ctx->pipe, NULL);
+ ctx->pipe->set_constant_buffer(ctx->pipe, PIPE_SHADER_COMPUTE, 0, 
NULL);
+  }
ctx->pipe->bind_vertex_elements_state( ctx->pipe, NULL );

if (ctx->has_streamout)
@@ -907,6 +917,26 @@ void cso_restore_tesseval_shader(struct cso_context *ctx)
 ctx->tesseval_shader_saved = NULL;
  }

+void cso_set_compute_shader_handle(struct cso_context *ctx, void *handle)
+{
+   assert(ctx->has_compute_shader || !handle);
+
+   if (ctx->has_compute_shader && ctx->compute_shader != handle) {
+  ctx->compute_shader = handle;
+  ctx->pipe->bind_compute_state(ctx->pipe, handle);
+   }
+}
+
+void cso_delete_compute_shader(struct cso_context *ctx, void *handle)
+{
+if (handle == ctx->compute_shader) {
+  /* unbind before deleting */
+  ctx->pipe->bind_compute_state(ctx->pipe, NULL);
+  ctx->compute_shader = NULL;
+   }
+   ctx->pipe->delete_compute_state(ctx->pipe, handle);
+}
+
  enum pipe_error
  cso_set_vertex_elements(struct cso_context *ctx,
  unsigned count,
diff --git a/src/gallium/auxiliary/cso_cache/cso_context.h 
b/src/gallium/auxiliary/cso_cache/cso_context.h
index f0a2739..ec9112b 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.h
+++ b/src/gallium/auxiliary/cso_cache/cso_context.h
@@ -151,6 +151,10 @@ void cso_save_tesseval_shader(struct cso_context *cso);
  void cso_restore_tesseval_shader(struct cso_context *cso);


+void cso_set_compute_shader_handle(struct cso_context *ctx, void *handle);
+void cso_delete_compute_shader(struct cso_context *ctx, void *handle);
+
+
  void cso_set_framebuffer(struct cso_context *cso,
   const struct pipe_framebuffer_state *fb);
  void cso_save_framebuffer(struct cso_context *cso);
--
2.6.4


Re: [Mesa-dev] [PATCH] mesa/extensions: Fix NVX_gpu_memory_info lexicographical order.

2016-02-07 Thread Marek Olšák

Reviewed-by: Marek Olšák 

On Sat, Feb 6, 2016 at 8:30 AM, Vinson Lee  wrote:
> Fixes MesaExtensionsTest.AlphabeticallySorted.
>
> Fixes: 1d79b9958090 ("mesa: implement GL_NVX_gpu_memory_info (v2)")
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94016
> Signed-off-by: Vinson Lee 
> ---
>  src/mesa/main/extensions_table.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/extensions_table.h 
> b/src/mesa/main/extensions_table.h
> index ded6f2c..d1e3a99 100644
> --- a/src/mesa/main/extensions_table.h
> +++ b/src/mesa/main/extensions_table.h
> @@ -273,6 +273,8 @@ EXT(MESA_texture_signed_rgba, 
> EXT_texture_snorm
>  EXT(MESA_window_pos , dummy_true 
> , GLL,  x ,  x ,  x , 2000)
>  EXT(MESA_ycbcr_texture  , MESA_ycbcr_texture 
> , GLL, GLC,  x ,  x , 2002)
>
> +EXT(NVX_gpu_memory_info , NVX_gpu_memory_info
> , GLL, GLC,  x ,  x , 2013)
> +
>  EXT(NV_blend_square , dummy_true 
> , GLL,  x ,  x ,  x , 1999)
>  EXT(NV_conditional_render   , NV_conditional_render  
> , GLL, GLC,  x ,  x , 2008)
>  EXT(NV_depth_clamp  , ARB_depth_clamp
> , GLL, GLC,  x ,  x , 2001)
> @@ -293,7 +295,6 @@ EXT(NV_texture_barrier  , 
> NV_texture_barrier
>  EXT(NV_texture_env_combine4 , NV_texture_env_combine4
> , GLL,  x ,  x ,  x , 1999)
>  EXT(NV_texture_rectangle, NV_texture_rectangle   
> , GLL,  x ,  x ,  x , 2000)
>  EXT(NV_vdpau_interop, NV_vdpau_interop   
> , GLL, GLC,  x ,  x , 2010)
> -EXT(NVX_gpu_memory_info , NVX_gpu_memory_info
> , GLL, GLC,  x ,  x , 2013)
>
>  EXT(OES_EGL_image   , OES_EGL_image  
> , GLL, GLC, ES1, ES2, 2006) /* FIXME: Mesa expects GL_OES_EGL_image 
> to be available in OpenGL contexts. */
>  EXT(OES_EGL_image_external  , OES_EGL_image_external 
> ,  x ,  x , ES1, ES2, 2010)
> --
> 2.7.0
>

Re: [Mesa-dev] [PATCH 3/5] nir: Do opt_algebraic in reverse order.

2016-02-07 Thread Jason Ekstrand

On Feb 4, 2016 5:45 PM, "Matt Turner"  wrote:
>
> Walking the SSA definitions in order means that we consider the smallest
> algebraic optimizations before larger optimizations. So if a smaller
> rule is part of a larger rule, the smaller one will happen first,
> preventing the larger one from happening.
>
> instructions in affected programs: 32721 -> 32611 (-0.34%)
> helped: 106
>
> Prevents regressions and annoyances in the next commits.

Mind doing just a little tooling to try and determine whether or not this
increases the number of times the optimization loop runs? Some
optimizations may immediately allow some other optimization on their result,
which will now have to wait until the next time through the loop.
--Jason
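A rough idea of what such tooling could look like, as a sketch only: wrap the run-until-no-progress loop and count its trips. The pass signature and the two toy passes below are assumptions made for illustration; they are not NIR passes.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*ex_pass_fn)(int *value);

/* Toy pass: removes one factor of two per call. */
static bool ex_pass_one_step(int *value)
{
   if (*value != 0 && *value % 2 == 0) {
      *value /= 2;
      return true;
   }
   return false;
}

/* Toy pass: removes every factor of two in a single call. */
static bool ex_pass_greedy(int *value)
{
   bool progress = false;

   while (ex_pass_one_step(value))
      progress = true;
   return progress;
}

/* Run the pass until it reports no progress; return the trip count. */
static unsigned ex_count_iterations(ex_pass_fn pass, int *value)
{
   unsigned iterations = 0;
   bool progress;

   assert(pass);
   do {
      progress = pass(value);
      iterations++;
   } while (progress);
   return iterations;
}
```

Both toy passes reach the same final value, but the one that defers follow-up work to the next trip needs more iterations, which is the cost being asked about.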

> ---
>  src/compiler/nir/nir_algebraic.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/nir/nir_algebraic.py
b/src/compiler/nir/nir_algebraic.py
> index a30652f..77ad35e 100644
> --- a/src/compiler/nir/nir_algebraic.py
> +++ b/src/compiler/nir/nir_algebraic.py
> @@ -216,7 +216,7 @@ ${pass_name}_block(nir_block *block, void *void_state)
>  {
> struct opt_state *state = void_state;
>
> -   nir_foreach_instr_safe(block, instr) {
> +   nir_foreach_instr_reverse_safe(block, instr) {
>if (instr->type != nir_instr_type_alu)
>   continue;
>
> @@ -255,7 +255,7 @@ ${pass_name}_impl(nir_function_impl *impl, const bool
*condition_flags)
> state.progress = false;
> state.condition_flags = condition_flags;
>
> -   nir_foreach_block(impl, ${pass_name}_block, &state);
> +   nir_foreach_block_reverse(impl, ${pass_name}_block, &state);
>
> if (state.progress)
>nir_metadata_preserve(impl, nir_metadata_block_index |
> --
> 2.4.10
>

Re: [Mesa-dev] [PATCH v2 09/20] tgsi/ureg: add shared variables support for compute shaders

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 12:51 AM, Ilia Mirkin wrote:

On Sat, Feb 6, 2016 at 5:04 PM, Samuel Pitoiset
 wrote:

This introduces TGSI_FILE_MEMORY for shared, global and local memory.
Only shared memory is currently supported.

Changes in v2:
  - introduce TGSI_FILE_MEMORY

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/auxiliary/tgsi/tgsi_build.c|  1 +
  src/gallium/auxiliary/tgsi/tgsi_dump.c |  5 +
  src/gallium/auxiliary/tgsi/tgsi_strings.c  |  1 +
  src/gallium/auxiliary/tgsi/tgsi_text.c |  3 +++
  src/gallium/auxiliary/tgsi/tgsi_ureg.c | 32 ++
  src/gallium/auxiliary/tgsi/tgsi_ureg.h |  3 +++
  src/gallium/include/pipe/p_shader_tokens.h |  4 +++-
  7 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c 
b/src/gallium/auxiliary/tgsi/tgsi_build.c
index 83f5062..cfe9b92 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -111,6 +111,7 @@ tgsi_default_declaration( void )
 declaration.Local = 0;
 declaration.Array = 0;
 declaration.Atomic = 0;
+   declaration.Shared = 0;
 declaration.Padding = 0;

 return declaration;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c 
b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index 2ad29b9..36f0cc5 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -364,6 +364,11 @@ iter_declaration(
   TXT(", ATOMIC");
 }

+   if (decl->Declaration.File == TGSI_FILE_MEMORY) {
+  if (decl->Declaration.Shared)
+ TXT(", SHARED");
+   }
+
 if (decl->Declaration.File == TGSI_FILE_SAMPLER_VIEW) {
TXT(", ");
ENM(decl->SamplerView.Resource, tgsi_texture_names);
diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c 
b/src/gallium/auxiliary/tgsi/tgsi_strings.c
index f2d70d4..b15ae69 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
@@ -57,6 +57,7 @@ static const char *tgsi_file_names[] =
 "IMAGE",
 "SVIEW",
 "BUFFER",
+   "MEMORY",
  };

  const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] =
diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c 
b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 97b1869..ef43ebc 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -1381,6 +1381,9 @@ static boolean parse_declaration( struct translate_ctx 
*ctx )
   if (str_match_nocase_whole(, "ATOMIC")) {
  decl.Declaration.Atomic = 1;
  ctx->cur = cur;
+ } else if (str_match_nocase_whole(, "SHARED")) {
+decl.Declaration.Shared = 1;
+ctx->cur = cur;
   }
} else {
   if (str_match_nocase_whole(, "LOCAL")) {
diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c 
b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index 9654ac5..e1a7278 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -189,6 +189,8 @@ struct ureg_program
 unsigned nr_instructions;

 struct ureg_tokens domain[2];
+
+   bool use_shared_memory;
  };

  static union tgsi_any_token error_tokens[32];
@@ -727,6 +729,16 @@ struct ureg_src ureg_DECL_buffer(struct ureg_program 
*ureg, unsigned nr,
 return reg;
  }

+/* Allocate a shared memory area.
+ */
+struct ureg_src ureg_DECL_shared_memory(struct ureg_program *ureg)
+{
+   struct ureg_src reg = ureg_src_register(TGSI_FILE_MEMORY, 0);
+
+   ureg->use_shared_memory = true;
+   return reg;
+}
+
  static int
  match_or_expand_immediate64( const unsigned *v,
   int type,
@@ -1654,6 +1666,23 @@ emit_decl_buffer(struct ureg_program *ureg,
  }

  static void
+emit_decl_shared_memory(struct ureg_program *ureg)
+{
+   union tgsi_any_token *out = get_tokens(ureg, DOMAIN_DECL, 2);
+
+   out[0].value = 0;
+   out[0].decl.Type = TGSI_TOKEN_TYPE_DECLARATION;
+   out[0].decl.NrTokens = 2;
+   out[0].decl.File = TGSI_FILE_MEMORY;
+   out[0].decl.UsageMask = TGSI_WRITEMASK_XYZW;
+   out[0].decl.Shared = true;
+
+   out[1].value = 0;
+   out[1].decl_range.First = 0;
+   out[1].decl_range.Last = 0;
+}
+
+static void
  emit_immediate( struct ureg_program *ureg,
  const unsigned *v,
  unsigned type )
@@ -1825,6 +1854,9 @@ static void emit_decls( struct ureg_program *ureg )
emit_decl_buffer(ureg, ureg->buffer[i].index, ureg->buffer[i].atomic);
 }

+   if (ureg->use_shared_memory)
+  emit_decl_shared_memory(ureg);
+
 if (ureg->const_decls.nr_constant_ranges) {
for (i = 0; i < ureg->const_decls.nr_constant_ranges; i++) {
   emit_decl_range(ureg,
diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h 
b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
index 86e58a9..6a3b5dd 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
@@ -337,6 +337,9 @@

Re: [Mesa-dev] [PATCH v2 19/20] st/mesa: expose ARB_compute_shader when compute is supported

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 01:02 AM, Ilia Mirkin wrote:

On Sat, Feb 6, 2016 at 5:04 PM, Samuel Pitoiset
 wrote:

ARB_compute_shader is only enabled if the underlying driver exposes
TGSI through the PIPE_CAP_SHADER_SUPPORTED_IRS cap.

Changes in v2:
  - make use of the new PIPE_CAP_SHADER_SUPPORTED_IRS cap instead of
enabling the extension when PIPE_CAP_COMPUTE is enabled.

Signed-off-by: Samuel Pitoiset 
---
  src/mesa/state_tracker/st_extensions.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 660c05e..072e9fd 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -479,6 +479,7 @@ void st_init_extensions(struct pipe_screen *screen,
  {
 unsigned i;
 int glsl_feature_level;
+   int shader_supported_irs;
 GLboolean *extension_table = (GLboolean *) extensions;

 static const struct st_extension_cap_mapping cap_mapping[] = {
@@ -1020,7 +1021,10 @@ void st_init_extensions(struct pipe_screen *screen,
 screen->get_param(screen, PIPE_CAP_STRING_MARKER))
extensions->GREMEDY_string_marker = GL_TRUE;

-   if (extensions->ARB_compute_shader) {
+   shader_supported_irs =
+  screen->get_shader_param(screen, PIPE_SHADER_COMPUTE,
+   PIPE_SHADER_CAP_SUPPORTED_IRS);


You should query for PIPE_CAP_COMPUTE first. If that returns false,
the get_shader_param function could reasonably be expected to not
properly handle the unexpected PIPE_SHADER_COMPUTE shader.


That seems better actually.



Also the shader_supported_irs thing seems a bit misnamed, since it
only refers to the compute shader supported irs...


compute_supported_irs?
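The ordering suggested above could look roughly like this. It is a sketch with a mocked screen, not the st_extensions.c code: ex_screen, its fields and EX_IR_TGSI are hypothetical stand-ins for the PIPE_CAP_COMPUTE query, the compute SUPPORTED_IRS query and the TGSI bit index.

```c
#include <assert.h>
#include <stdbool.h>

enum { EX_IR_TGSI = 0 };   /* illustrative bit index, not the real enum */

struct ex_screen {
   int has_compute;          /* models PIPE_CAP_COMPUTE */
   int compute_irs;          /* models the SUPPORTED_IRS mask for compute */
   int shader_param_calls;   /* counts per-shader queries */
};

/* Models a per-shader query that may not cope with compute when the
 * driver has no compute support at all. */
static int ex_get_compute_irs(struct ex_screen *s)
{
   assert(s->has_compute);
   s->shader_param_calls++;
   return s->compute_irs;
}

/* Check the global cap first, then the IR mask for compute. */
static bool ex_enable_compute_shader(struct ex_screen *s)
{
   int compute_supported_irs;

   if (!s->has_compute)
      return false;

   compute_supported_irs = ex_get_compute_irs(s);
   return (compute_supported_irs & (1 << EX_IR_TGSI)) != 0;
}
```

With this order the per-shader query is never issued on a screen without compute support.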




+   if (shader_supported_irs & (1 << PIPE_SHADER_IR_TGSI)) {
uint64_t grid_size[3], block_size[3];

screen->get_compute_param(screen, PIPE_COMPUTE_CAP_MAX_GRID_SIZE,
@@ -1036,5 +1040,6 @@ void st_init_extensions(struct pipe_screen *screen,
   consts->MaxComputeWorkGroupCount[i] = grid_size[i];
   consts->MaxComputeWorkGroupSize[i] = block_size[i];
}
+  extensions->ARB_compute_shader = true;
 }
  }
--
2.6.4


Re: [Mesa-dev] [PATCH 4/4] tgsi: minor whitespace fixes in tgsi_scan.c

2016-02-07 Thread Marek Olšák

For the series:
Reviewed-by: Marek Olšák 

Marek

On Sat, Feb 6, 2016 at 1:56 AM, Brian Paul  wrote:
> ---
>  src/gallium/auxiliary/tgsi/tgsi_scan.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c 
> b/src/gallium/auxiliary/tgsi/tgsi_scan.c
> index 42b62aa..489423d 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
> @@ -462,12 +462,10 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
>procType == TGSI_PROCESSOR_COMPUTE);
> info->processor = procType;
>
> -
> /**
>  ** Loop over incoming program tokens/instructions
>  */
> -   while( !tgsi_parse_end_of_tokens(  ) ) {
> -
> +   while (!tgsi_parse_end_of_tokens()) {
>info->num_tokens++;
>
>tgsi_parse_token(  );
> @@ -510,7 +508,7 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
>}
> }
>
> -   tgsi_parse_free ();
> +   tgsi_parse_free();
>  }
>
>
> --
> 1.9.1
>

Re: [Mesa-dev] [PATCH v2 08/20] gallium: add PIPE_SHADER_CAP_SUPPORTED_IRS

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 12:50 AM, Ilia Mirkin wrote:

On Sat, Feb 6, 2016 at 5:04 PM, Samuel Pitoiset
 wrote:

This cap indicates the supported representations of programs. It should
be a flag with the pipe_shader_ir enum values. It will allow to enable
ARB_compute_shader if the underlying driver supports TGSI.

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/auxiliary/gallivm/lp_bld_limits.h| 2 ++
  src/gallium/auxiliary/tgsi/tgsi_exec.h   | 2 ++
  src/gallium/docs/source/screen.rst   | 2 ++
  src/gallium/drivers/freedreno/freedreno_screen.c | 2 ++
  src/gallium/drivers/ilo/ilo_screen.c | 2 ++
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 2 ++
  src/gallium/drivers/r300/r300_screen.c   | 4 
  src/gallium/drivers/r600/r600_pipe.c | 2 ++
  src/gallium/drivers/radeonsi/si_pipe.c   | 5 +
  src/gallium/drivers/svga/svga_screen.c   | 6 ++
  src/gallium/drivers/vc4/vc4_screen.c | 2 ++
  src/gallium/include/pipe/p_defines.h | 1 +
  12 files changed, 32 insertions(+)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_limits.h 
b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
index 4598db8..a123b4a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_limits.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
@@ -128,6 +128,8 @@ gallivm_get_shader_param(enum pipe_shader_cap param)
return PIPE_MAX_SHADER_SAMPLER_VIEWS;
 case PIPE_SHADER_CAP_PREFERRED_IR:
return PIPE_SHADER_IR_TGSI;
+   case PIPE_SHADER_CAP_SUPPORTED_IRS:
+  return 1 << PIPE_SHADER_IR_TGSI;
 case PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED:
 case PIPE_SHADER_CAP_TGSI_ANY_INOUT_DECL_RANGE:
return 1;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h 
b/src/gallium/auxiliary/tgsi/tgsi_exec.h
index 26fec8e..c807af9 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
@@ -465,6 +465,8 @@ tgsi_exec_get_shader_param(enum pipe_shader_cap param)
return PIPE_MAX_SHADER_SAMPLER_VIEWS;
 case PIPE_SHADER_CAP_PREFERRED_IR:
return PIPE_SHADER_IR_TGSI;
+   case PIPE_SHADER_CAP_SUPPORTED_IRS:
+  return 1 << PIPE_SHADER_IR_TGSI;


Everywhere else you return 0, but here and above you return TGSI...


Yes, as I told you the other day on IRC, I'm not sure whether to use 0 or
1 << PIPE_SHADER_IR_TGSI here. Comments are welcome.





 case PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED:
return 1;
 case PIPE_SHADER_CAP_DOUBLES:
diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index 3324bcc..ee8b446 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -415,6 +415,8 @@ to be 0.
(also used to implement atomic counters). Having this be non-0 also
implies support for the ``LOAD``, ``STORE``, and ``ATOM*`` TGSI
opcodes.
+* ``PIPE_SHADER_CAP_SUPPORTED_IRS``: Supported representations of the
+  program.  It should be a flag with the ``pipe_shader_ir`` enum values.


... be a mask of ``pipe_shader_ir`` bits

perhaps that'd be more clear?


Yes.






  .. _pipe_compute_cap:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index 27f4d26..5387ef3 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -434,6 +434,8 @@ fd_screen_get_shader_param(struct pipe_screen *pscreen, 
unsigned shader,
 return 16;
 case PIPE_SHADER_CAP_PREFERRED_IR:
 return PIPE_SHADER_IR_TGSI;
+   case PIPE_SHADER_CAP_SUPPORTED_IRS:
+   return 0;
 case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT:
 return 32;
 case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS:
diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
b/src/gallium/drivers/ilo/ilo_screen.c
index 44d7c11..ef9da6b 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -136,6 +136,8 @@ ilo_get_shader_param(struct pipe_screen *screen, unsigned 
shader,
return ILO_MAX_SAMPLER_VIEWS;
 case PIPE_SHADER_CAP_PREFERRED_IR:
return PIPE_SHADER_IR_TGSI;
+   case PIPE_SHADER_CAP_SUPPORTED_IRS:
+  return 0;
 case PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED:
return 1;
 case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT:
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index d368fda..2b12de4 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -272,6 +272,8 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen, 
unsigned shader,
 switch (param) {
 case PIPE_SHADER_CAP_PREFERRED_IR:
return PIPE_SHADER_IR_TGSI;
+   case PIPE_SHADER_CAP_SUPPORTED_IRS:
+  return 0;
 case PIPE_SHADER_CAP_MAX_INSTRUCTIONS:

Re: [Mesa-dev] [PATCH v2 05/20] gallium/cso: add support for compute shaders

2016-02-07 Thread Marek Olšák

On Sun, Feb 7, 2016 at 12:02 PM, Samuel Pitoiset
 wrote:
>
>
> On 02/07/2016 12:28 AM, Ilia Mirkin wrote:
>>
>> I believe that the previous suggestion was that cso didn't need to
>> know anything about compute shaders... just call the functions
>> directly.
>
>
> It seems like the previous suggestion was to *only* remove
> cso_{save,restore}_compute_shader() because they are currently not used.
>
> Let's wait for Marek to be sure. :-)

It's okay this way. The unbinding on delete seems useful.
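As a minimal sketch of why the unbind on delete helps, assuming a mocked driver that must never be asked to delete the state it currently has bound (ex_pipe, ex_cso and the helpers are hypothetical, only loosely modelled on the two cso functions in the patch):

```c
#include <assert.h>
#include <stddef.h>

struct ex_pipe {
   void *bound;        /* what the mocked driver currently has bound */
   int bind_calls;
   int delete_calls;
};

struct ex_cso {
   struct ex_pipe *pipe;
   void *compute_shader;   /* cached binding */
};

static void ex_bind(struct ex_pipe *p, void *handle)
{
   p->bound = handle;
   p->bind_calls++;
}

static void ex_delete(struct ex_pipe *p, void *handle)
{
   /* Deleting the currently bound state would leave a dangling binding. */
   assert(handle == NULL || p->bound != handle);
   p->delete_calls++;
}

/* Skip the driver call when the handle is already bound. */
static void ex_set_compute_shader(struct ex_cso *ctx, void *handle)
{
   if (ctx->compute_shader != handle) {
      ctx->compute_shader = handle;
      ex_bind(ctx->pipe, handle);
   }
}

/* Unbind first if the handle being deleted is the bound one. */
static void ex_delete_compute_shader(struct ex_cso *ctx, void *handle)
{
   if (handle == ctx->compute_shader) {
      ex_bind(ctx->pipe, NULL);
      ctx->compute_shader = NULL;
   }
   ex_delete(ctx->pipe, handle);
}
```

Deleting an unrelated handle leaves the binding alone, while deleting the bound one clears both the cached pointer and the driver binding before the delete reaches the driver.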

Reviewed-by: Marek Olšák 

Marek

Re: [Mesa-dev] [PATCH 07/12] nvc0: add support for indirect compute on Fermi

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 05:56 AM, Ilia Mirkin wrote:

On Sat, Feb 6, 2016 at 6:13 PM, Ilia Mirkin  wrote:

On Sat, Feb 6, 2016 at 5:38 PM, Samuel Pitoiset
 wrote:

When indirect compute is used, the size of the grid (in blocks) is
stored as three integers inside a buffer. This requires a macro to
set up GRIDDIM_YX and GRIDDIM_Z.

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/nouveau/nvc0/mme/Makefile  |  2 +-
  src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme   | 19 +++
  src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h | 13 +
  src/gallium/drivers/nouveau/nvc0/nvc0_compute.c| 18 +++---
  src/gallium/drivers/nouveau/nvc0/nvc0_macros.h |  2 ++
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |  2 ++
  6 files changed, 52 insertions(+), 4 deletions(-)
  create mode 100644 src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme
  create mode 100644 src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h

diff --git a/src/gallium/drivers/nouveau/nvc0/mme/Makefile 
b/src/gallium/drivers/nouveau/nvc0/mme/Makefile
index 1c0f583..52fb0a5 100644
--- a/src/gallium/drivers/nouveau/nvc0/mme/Makefile
+++ b/src/gallium/drivers/nouveau/nvc0/mme/Makefile
@@ -1,5 +1,5 @@
  ENVYAS?=envyas
-TARGETS=com9097.mme.h
+TARGETS=com9097.mme.h com90c0.mme.h

  all: $(TARGETS)

diff --git a/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme 
b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme
new file mode 100644
index 000..ee7f726
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme
@@ -0,0 +1,19 @@
+/* NVC0_COMPUTE_MACRO_LAUNCH_GRID_INDIRECT
+ *
+ * arg = num_groups_x
+ * parm[0] = num_groups_y
+ * parm[1] = num_groups_z
+ */
+.section #mme90c0_launch_grid_indirect
+   parm $r2
+   parm $r3
+   mov $r4 (or $r1 $r2)
+   mov $r4 (or $r3 $r4)
+   braz $r4 #fail
+   maddr 0x108e /* GRIDDIM_YX */


You can move this up, e.g.

parm $r2 maddr 0x108e /* GRIDDIM_YX */


+   mov $r4 (extrshl $r2 $r0 0x10 0x10)


If you make this

(extrinsrt $r1 $r2 0x0 0x10 0x10)


Oh and even better, do this as part of the computation that precedes
the braz, that way you save another op :)


Hmm, how can this still be reduced? Currently I have:

.section #mme90c0_launch_grid_indirect
   parm $r2 maddr 0x108e /* GRIDDIM_YX */
   parm $r3
   mov $r4 (or $r1 $r2)
   mov $r4 (or $r3 $r4)
   braz $r4 #fail
   exit send (extrinsrt $r1 $r2 0x0 0x10 0x10) /* (num_groups_y << 16) | num_groups_x */

   send $r3
fail:
   nop
   exit
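
For readers less familiar with the macro language, the intent of the listing
above can be modelled in plain C. This is only a sketch under stated
assumptions: `struct grid_dims` and `pack_griddim()` are hypothetical names
that do not exist in the driver, the real work is done by the MME macro on the
GPU, and branch delay slots and other macro-engine details are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the three integers read from the indirect buffer. */
struct grid_dims {
   uint32_t x, y, z;
};

/*
 * CPU-side model of the macro's intent (not driver code):
 *  - bail out when x | y | z is zero, the same value the listing feeds to braz;
 *  - otherwise produce the two words sent to GRIDDIM_YX and GRIDDIM_Z.
 */
static bool
pack_griddim(const struct grid_dims *g, uint32_t *yx, uint32_t *z)
{
   assert(g && yx && z);

   if ((g->x | g->y | g->z) == 0)
      return false;

   /* extrinsrt: keep the low 16 bits of x, insert the low 16 bits of y at bit 16 */
   *yx = (g->x & 0xffffu) | ((g->y & 0xffffu) << 16);
   *z = g->z;
   return true;
}
```

Note that OR-ing the three values only rejects a grid in which every dimension
is zero; checking each dimension on its own would be stricter.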





then you can make it directly an argument to send, avoiding the separate or.


+   exit send (or $r4 $r1) /* (num_groups_y << 16) | num_groups_x */
+   send $r3
+fail:
+   exit


I think you need a nop here.


+
diff --git a/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h 
b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h
new file mode 100644


I think Emil is going to yell at you about not adding this file to
some list somewhere so that make dist picks it up.


index 000..89076cf
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h
@@ -0,0 +1,13 @@
+uint32_t mme90c0_launch_grid_indirect[] = {
+   0x0201,
+   0x0301,
+/* 0x0009: fail */
+   0x00128c10,
+   0x00131c10,
+   0x00016007,
+   0x04238021,
+   0x84008413,
+   0x001260c0,
+   0x1841,
+   0x0091,
+};
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index e63bdcb..dbf2148 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -452,9 +452,21 @@ nvc0_launch_grid(struct pipe_context *pipe, const struct 
pipe_grid_info *info)
 PUSH_DATA (push, cp->num_gprs);

 /* grid/block setup */
-   BEGIN_NVC0(push, NVC0_COMPUTE(GRIDDIM_YX), 2);
-   PUSH_DATA (push, (info->grid[1] << 16) | info->grid[0]);
-   PUSH_DATA (push, info->grid[2]);
+   if (unlikely(info->indirect)) {
+  struct nv04_resource *res = nv04_resource(info->indirect);
+  uint32_t offset = res->offset + info->indirect_offset;
+  unsigned macro = NVC0_COMPUTE_MACRO_LAUNCH_GRID_INDIRECT;
+
+  nouveau_pushbuf_space(push, 16, 0, 1);
+  PUSH_REFN(push, res->bo, NOUVEAU_BO_RD | res->domain);
+  PUSH_DATA(push, NVC0_FIFO_PKHDR_1I(1, macro, 3));
+  nouveau_pushbuf_data(push, res->bo, offset,
+   NVC0_IB_ENTRY_1_NO_PREFETCH | 3 * 4);
+   } else {
+  BEGIN_NVC0(push, NVC0_COMPUTE(GRIDDIM_YX), 2);
+  PUSH_DATA (push, (info->grid[1] << 16) | info->grid[0]);
+  PUSH_DATA (push, info->grid[2]);
+   }
 BEGIN_NVC0(push, NVC0_COMPUTE(BLOCKDIM_YX), 2);
 PUSH_DATA (push, (info->block[1] << 16) | info->block[0]);
 PUSH_DATA (push, info->block[2]);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_macros.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_macros.h
index 49e176c..57262fe 100644
---

Re: [Mesa-dev] [PATCH 3/3] gallium/util: switch over to new u_debug_image.[ch] code

2016-02-07 Thread Marek Olšák

On Sat, Feb 6, 2016 at 1:54 AM, Brian Paul  wrote:
> ---
>  src/gallium/auxiliary/Makefile.sources |   4 +-
>  src/gallium/auxiliary/util/u_debug.c   | 313 
> +
>  src/gallium/auxiliary/util/u_debug.h   |  39 
>  src/gallium/drivers/llvmpipe/lp_flush.c|   1 +
>  src/gallium/drivers/softpipe/sp_flush.c|   1 +
>  src/gallium/drivers/svga/svga_pipe_flush.c |   1 +
>  src/gallium/targets/graw-null/graw_util.c  |   1 +
>  src/gallium/tests/graw/graw_util.h |   1 +
>  src/gallium/tests/trivial/quad-tex.c   |   2 +-
>  src/gallium/tests/trivial/tri.c|   2 +-
>  10 files changed, 12 insertions(+), 353 deletions(-)
>
> diff --git a/src/gallium/auxiliary/Makefile.sources 
> b/src/gallium/auxiliary/Makefile.sources
> index 6f50f71..84da85c 100644
> --- a/src/gallium/auxiliary/Makefile.sources
> +++ b/src/gallium/auxiliary/Makefile.sources
> @@ -191,11 +191,13 @@ C_SOURCES := \
> util/u_cpu_detect.c \
> util/u_cpu_detect.h \
> util/u_debug.c \
> +   util/u_debug.h \
> util/u_debug_describe.c \
> util/u_debug_describe.h \
> util/u_debug_flush.c \
> util/u_debug_flush.h \
> -   util/u_debug.h \
> +   util/u_debug_image.c \
> +   util/u_debug_image.h \
> util/u_debug_memory.c \
> util/u_debug_refcnt.c \
> util/u_debug_refcnt.h \
> diff --git a/src/gallium/auxiliary/util/u_debug.c 
> b/src/gallium/auxiliary/util/u_debug.c
> index 7a3d51f..f378415 100644
> --- a/src/gallium/auxiliary/util/u_debug.c
> +++ b/src/gallium/auxiliary/util/u_debug.c
> @@ -38,9 +38,9 @@
>  #include "util/u_memory.h"
>  #include "util/u_string.h"
>  #include "util/u_math.h"
> -#include "util/u_tile.h"
> +//#include "util/u_tile.h"
>  #include "util/u_prim.h"
> -#include "util/u_surface.h"
> +//#include "util/u_surface.h"

Did you mean to remove these? With that done, the series is:
Reviewed-by: Marek Olšák 

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] gallium AoA support and indirect sampler fixes

2016-02-07 Thread Laurent Carlier

On Friday, 5 February 2016, 13:40:26 CET, Dave Airlie wrote:
> Hi,
> 
> In fixing some indirect sampler issues with ARB_gpu_shader5,
> I realised AoA was mostly fixed as well by the same things.
> 
> Ilia made me fix atomics as well.
> 
> So this patch set enables AoA support on all gallium drivers
> exposing GLSL 1.30.
> 
> Dave.

I've quickly tested the series: Shadow of Mordor segfaults on start, and the
first intro movie of The Witcher 2 is greenish and then it segfaults.

Tested on top of mesa-git with llvm-svn (both trunk) and amdgpu/kernel-4.5rc2
-- 
Laurent Carlier
http://www.archlinux.org


Re: [Mesa-dev] [PATCH] trace: remove useless MALLOC() in trace_context_draw_vbo()

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 10:54 PM, Ilia Mirkin wrote:

Reviewed-by: Ilia Mirkin 

Please double-check that GALLIUM_TRACE=foo and
bin/arb_indirect_parameters-tf-count -fbo -auto work together.


I have already double-checked with bin/arb_draw_indirect-draw-arrays.
Both tests work fine, and the trace looks good.



On Sun, Feb 7, 2016 at 4:36 PM, Samuel Pitoiset
 wrote:

There is no need to allocate memory when unwrapping the indirect buf.

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/trace/tr_context.c | 17 ++---
  1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/trace/tr_context.c 
b/src/gallium/drivers/trace/tr_context.c
index b04c88d..2280898 100644
--- a/src/gallium/drivers/trace/tr_context.c
+++ b/src/gallium/drivers/trace/tr_context.c
@@ -120,18 +120,13 @@ trace_context_draw_vbo(struct pipe_context *_pipe,
 trace_dump_trace_flush();

 if (info->indirect) {
-  struct pipe_draw_info *_info = NULL;
+  struct pipe_draw_info _info;

-  _info = MALLOC(sizeof(*_info));
-  if (!_info)
- return;
-
-  memcpy(_info, info, sizeof(*_info));
-  _info->indirect = trace_resource_unwrap(tr_ctx, _info->indirect);
-  _info->indirect_params = trace_resource_unwrap(tr_ctx,
- _info->indirect_params);
-  pipe->draw_vbo(pipe, _info);
-  FREE(_info);
+  memcpy(&_info, info, sizeof(_info));
+  _info.indirect = trace_resource_unwrap(tr_ctx, _info.indirect);
+  _info.indirect_params = trace_resource_unwrap(tr_ctx,
+_info.indirect_params);
+  pipe->draw_vbo(pipe, &_info);
 } else {
pipe->draw_vbo(pipe, info);
 }
--
2.6.4
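
The idea behind the patch above, copying the draw info onto the stack instead
of the heap before patching two pointers, can be shown in isolation. The
following is a minimal sketch, not the trace driver's code: every type and
function name in it (`struct resource`, `unwrap()`, `traced_draw()` and so on)
is hypothetical and only stands in for the gallium equivalents.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for pipe_resource, the trace wrapper and pipe_draw_info. */
struct resource { int id; };
struct wrapped_resource { struct resource base; struct resource *real; };
struct draw_info { struct resource *indirect; struct resource *indirect_params; };

/* Return the real resource behind a wrapper; NULL stays NULL. */
static struct resource *
unwrap(struct resource *res)
{
   return res ? ((struct wrapped_resource *)res)->real : NULL;
}

/* Stand-in for the wrapped driver's draw call: report which buffer it saw. */
static int
driver_draw(const struct draw_info *info)
{
   return info->indirect ? info->indirect->id : -1;
}

static int
traced_draw(const struct draw_info *info)
{
   assert(info);

   if (info->indirect) {
      /* Stack copy: no allocation, no failure path, nothing to free. */
      struct draw_info copy = *info;

      copy.indirect = unwrap(copy.indirect);
      copy.indirect_params = unwrap(copy.indirect_params);
      return driver_draw(&copy);
   }
   return driver_draw(info);
}
```

Because the copy lives on the stack, the caller's structure is never modified
and the early return on allocation failure disappears.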


Re: [Mesa-dev] [PATCH 2/2] glsl: return cloned signature, not the builtin one

2016-02-07 Thread Rob Herring

On Sat, Feb 6, 2016 at 4:10 PM, Ilia Mirkin  wrote:
> The builtin data can get released with a glReleaseShaderCompiler call.
> We're careful everywhere to clone everything that comes out of builtins
> except here, where we accidentally return the signature belonging to the
> builtin version, rather than the locally-cloned one.
>
> Signed-off-by: Ilia Mirkin 
> Cc: mesa-sta...@lists.freedesktop.org

Tested-by: Rob Herring 

Re: [Mesa-dev] [PATCH 1/2] glsl: make sure builtins are initialized before getting the shader

2016-02-07 Thread Rob Herring

On Sat, Feb 6, 2016 at 4:10 PM, Ilia Mirkin  wrote:
> Signed-off-by: Ilia Mirkin 
> Cc: mesa-sta...@lists.freedesktop.org

Thanks for digging into this.

Tested-by: Rob Herring 

> ---
>  src/compiler/glsl/linker.cpp | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
> index 4776ffa..f1ac53a 100644
> --- a/src/compiler/glsl/linker.cpp
> +++ b/src/compiler/glsl/linker.cpp
> @@ -2125,6 +2125,7 @@ link_intrastage_shaders(void *mem_ctx,
>
>if (ok) {
>   memcpy(linking_shaders, shader_list, num_shaders * sizeof(gl_shader 
> *));
> + _mesa_glsl_initialize_builtin_functions();
>   linking_shaders[num_shaders] = 
> _mesa_glsl_get_builtin_function_shader();
>
>   ok = link_function_calls(prog, linked, linking_shaders, num_shaders 
> + 1);
> --
> 2.4.10

[Mesa-dev] [PATCH v2 09/11] nv50/ir: make OP_SELP a compare instruction

2016-02-07 Thread Samuel Pitoiset

This OP_SELP insn will be used to handle compare and swap subops.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 8 
 src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index c7239b3..f6605eb 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -120,7 +120,7 @@ private:
 
void emitSET(const CmpInstruction *);
void emitSLCT(const CmpInstruction *);
-   void emitSELP(const Instruction *);
+   void emitSELP(const CmpInstruction *);
 
void emitTEXBAR(const Instruction *);
void emitTEX(const TexInstruction *);
@@ -1170,11 +1170,11 @@ CodeEmitterNVC0::emitSLCT(const CmpInstruction *i)
   code[0] |= 1 << 5;
 }
 
-void CodeEmitterNVC0::emitSELP(const Instruction *i)
+void CodeEmitterNVC0::emitSELP(const CmpInstruction *i)
 {
emitForm_A(i, HEX64(2000, 0004));
 
-   if (i->cc == CC_NOT_P || i->src(2).mod & Modifier(NV50_IR_MOD_NOT))
+   if (i->setCond == CC_NOT_P || i->src(2).mod & Modifier(NV50_IR_MOD_NOT))
   code[1] |= 1 << 20;
 }
 
@@ -2433,7 +2433,7 @@ CodeEmitterNVC0::emitInstruction(Instruction *insn)
   emitSET(insn->asCmp());
   break;
case OP_SELP:
-  emitSELP(insn);
+  emitSELP(insn->asCmp());
   break;
case OP_SLCT:
   emitSLCT(insn->asCmp());
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h
index e465f24..02e6157 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h
@@ -281,14 +281,14 @@ Value *TexInstruction::getIndirectS() const
 
 CmpInstruction *Instruction::asCmp()
 {
-   if (op >= OP_SET_AND && op <= OP_SLCT && op != OP_SELP)
+   if (op >= OP_SET_AND && op <= OP_SLCT)
   return static_cast<CmpInstruction *>(this);
return NULL;
 }
 
 const CmpInstruction *Instruction::asCmp() const
 {
-   if (op >= OP_SET_AND && op <= OP_SLCT && op != OP_SELP)
+   if (op >= OP_SET_AND && op <= OP_SLCT)
   return static_cast<const CmpInstruction *>(this);
return NULL;
 }
-- 
2.6.4
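
The asCmp() change above relies on the compare-style opcodes forming one
contiguous range of the operation enum. A rough C analogue of that
range-checked downcast is sketched below. The enum is an assumed, heavily
shortened ordering in which OP_SELP sits between OP_SET_AND and OP_SLCT, and
the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed subset of the operation enum; only the relative order matters here. */
enum operation {
   OP_ADD,
   OP_SET_AND, OP_SET_OR, OP_SET_XOR, OP_SET, OP_SELP, OP_SLCT,
   OP_RCP
};

struct instruction { enum operation op; };

/* A compare instruction extends the base instruction with a condition code. */
struct cmp_instruction { struct instruction base; int set_cond; };

/*
 * Range-checked downcast: with the "op != OP_SELP" exclusion gone, OP_SELP is
 * treated like every other opcode in the compare range. The cast is only
 * valid if such instructions really are allocated as cmp_instruction.
 */
static struct cmp_instruction *
as_cmp(struct instruction *insn)
{
   assert(insn);

   if (insn->op >= OP_SET_AND && insn->op <= OP_SLCT)
      return (struct cmp_instruction *)insn;
   return NULL;
}
```

The consequence for callers is that whatever creates an OP_SELP instruction
has to create it as a compare instruction, otherwise the downcast would read
fields that do not exist.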


[Mesa-dev] [PATCH v2 01/11] nvc0: allocate an area for compute user constbufs

2016-02-07 Thread Samuel Pitoiset

For compute shaders, we might need to upload uniforms.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 14 +++---
 src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c | 12 ++--
 src/gallium/drivers/nouveau/nvc0/nvc0_tex.c|  2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c| 10 ++
 4 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 2b12de4..84e4253 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -889,7 +889,7 @@ nvc0_screen_create(struct nouveau_device *dev)
 */
nouveau_heap_init(&screen->text_heap, 0, (1 << 20) - 0x100);
 
-   ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 1 << 12, 6 << 16, NULL,
+   ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 1 << 12, 7 << 16, NULL,
 &screen->uniform_bo);
if (ret)
   goto fail;
@@ -901,8 +901,8 @@ nvc0_screen_create(struct nouveau_device *dev)
   /* auxiliary constants (6 user clip planes, base instance id) */
   BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
   PUSH_DATA (push, 1024);
-  PUSH_DATAh(push, screen->uniform_bo->offset + (5 << 16) + (i << 10));
-  PUSH_DATA (push, screen->uniform_bo->offset + (5 << 16) + (i << 10));
+  PUSH_DATAh(push, screen->uniform_bo->offset + (6 << 16) + (i << 10));
+  PUSH_DATA (push, screen->uniform_bo->offset + (6 << 16) + (i << 10));
   BEGIN_NVC0(push, NVC0_3D(CB_BIND(i)), 1);
   PUSH_DATA (push, (15 << 4) | 1);
   if (screen->eng3d->oclass >= NVE4_3D_CLASS) {
@@ -922,8 +922,8 @@ nvc0_screen_create(struct nouveau_device *dev)
/* return { 0.0, 0.0, 0.0, 0.0 } for out-of-bounds vtxbuf access */
BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
PUSH_DATA (push, 256);
-   PUSH_DATAh(push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
-   PUSH_DATA (push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
+   PUSH_DATAh(push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));
+   PUSH_DATA (push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));
BEGIN_1IC0(push, NVC0_3D(CB_POS), 5);
PUSH_DATA (push, 0);
PUSH_DATAf(push, 0.0f);
@@ -931,8 +931,8 @@ nvc0_screen_create(struct nouveau_device *dev)
PUSH_DATAf(push, 0.0f);
PUSH_DATAf(push, 0.0f);
BEGIN_NVC0(push, NVC0_3D(VERTEX_RUNOUT_ADDRESS_HIGH), 2);
-   PUSH_DATAh(push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
-   PUSH_DATA (push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
+   PUSH_DATAh(push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));
+   PUSH_DATA (push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));
 
if (screen->base.drm->version >= 0x01000101) {
   ret = nouveau_getparam(dev, NOUVEAU_GETPARAM_GRAPH_UNITS, );
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
index c17223a..2bb9b44 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
@@ -184,8 +184,8 @@ nvc0_validate_fb(struct nvc0_context *nvc0)
 ms = 1 << ms_mode;
 BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
 PUSH_DATA (push, 1024);
-PUSH_DATAh(push, nvc0->screen->uniform_bo->offset + (5 << 16) + (4 << 10));
-PUSH_DATA (push, nvc0->screen->uniform_bo->offset + (5 << 16) + (4 << 10));
+PUSH_DATAh(push, nvc0->screen->uniform_bo->offset + (6 << 16) + (4 << 10));
+PUSH_DATA (push, nvc0->screen->uniform_bo->offset + (6 << 16) + (4 << 10));
 BEGIN_1IC0(push, NVC0_3D(CB_POS), 1 + 2 * ms);
 PUSH_DATA (push, 256 + 128);
 for (i = 0; i < ms; i++) {
@@ -318,8 +318,8 @@ nvc0_upload_uclip_planes(struct nvc0_context *nvc0, 
unsigned s)
 
BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
PUSH_DATA (push, 1024);
-   PUSH_DATAh(push, bo->offset + (5 << 16) + (s << 10));
-   PUSH_DATA (push, bo->offset + (5 << 16) + (s << 10));
+   PUSH_DATAh(push, bo->offset + (6 << 16) + (s << 10));
+   PUSH_DATA (push, bo->offset + (6 << 16) + (s << 10));
BEGIN_1IC0(push, NVC0_3D(CB_POS), PIPE_MAX_CLIP_PLANES * 4 + 1);
PUSH_DATA (push, 256);
PUSH_DATAp(push, &nvc0->clip.ucp[0][0], PIPE_MAX_CLIP_PLANES * 4);
@@ -479,8 +479,8 @@ nvc0_validate_buffers(struct nvc0_context *nvc0)
for (s = 0; s < 5; s++) {
   BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
   PUSH_DATA (push, 1024);
-  PUSH_DATAh(push, nvc0->screen->uniform_bo->offset + (5 << 16) + (s << 
10));
-  PUSH_DATA (push, nvc0->screen->uniform_bo->offset + (5 << 16) + (s << 
10));
+  PUSH_DATAh(push, nvc0->screen->uniform_bo->offset + (6 << 16) + (s << 
10));
+  PUSH_DATA (push, nvc0->screen->uniform_bo->offset + (6 << 16) + (s << 
10));
   BEGIN_1IC0(push, NVC0_3D(CB_POS), 1 + 4 * NVC0_MAX_BUFFERS);
   PUSH_DATA (push, 512);
   for (i = 0; i <
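
The offset arithmetic that this patch shifts around can be summarised as a
small layout model. It is a sketch based only on the constants visible in this
series (a buffer of 7 << 16 bytes, user constbufs at s << 16, auxiliary
constants at (6 << 16) + (slot << 10)). The macro and function names are
hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

#define REGION_SHIFT     16u  /* each region is 64 KiB */
#define AUX_SLOT_SHIFT   10u  /* each auxiliary slot is 1 KiB */
#define AUX_REGION_INDEX 6u   /* was 5 before the patch */
#define NUM_REGIONS      7u   /* was 6 before the patch */

/* Offset of the user constbuf region for stage s (0..4 for 3D, 5 for compute). */
static uint32_t
user_cb_offset(unsigned stage)
{
   assert(stage < AUX_REGION_INDEX);
   return (uint32_t)stage << REGION_SHIFT;
}

/* Offset of a 1 KiB auxiliary slot inside the last 64 KiB region. */
static uint32_t
aux_cb_offset(unsigned slot)
{
   assert(slot < (1u << (REGION_SHIFT - AUX_SLOT_SHIFT)));
   return (AUX_REGION_INDEX << REGION_SHIFT) + ((uint32_t)slot << AUX_SLOT_SHIFT);
}

/* Total size of the uniform buffer object after the patch. */
static uint32_t
uniform_bo_size(void)
{
   return NUM_REGIONS << REGION_SHIFT;
}
```

Moving the auxiliary region from index 5 to index 6 is what frees region 5 for
the compute user constbufs.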

[Mesa-dev] [PATCH v2 06/11] nvc0: add support for indirect compute on Fermi

2016-02-07 Thread Samuel Pitoiset

When indirect compute is used, the size of the grid (in blocks) is
stored as three integers inside a buffer. This requires a macro to
set up GRIDDIM_YX and GRIDDIM_Z.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/mme/Makefile  |  2 +-
 src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme   | 24 ++
 src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h | 19 
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c| 52 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_macros.h |  2 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |  2 +
 6 files changed, 81 insertions(+), 20 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme
 create mode 100644 src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h

diff --git a/src/gallium/drivers/nouveau/nvc0/mme/Makefile 
b/src/gallium/drivers/nouveau/nvc0/mme/Makefile
index 1c0f583..52fb0a5 100644
--- a/src/gallium/drivers/nouveau/nvc0/mme/Makefile
+++ b/src/gallium/drivers/nouveau/nvc0/mme/Makefile
@@ -1,5 +1,5 @@
 ENVYAS?=envyas
-TARGETS=com9097.mme.h
+TARGETS=com9097.mme.h com90c0.mme.h
 
 all: $(TARGETS)
 
diff --git a/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme 
b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme
new file mode 100644
index 000..a3f1bde
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme
@@ -0,0 +1,24 @@
+/* NVC0_COMPUTE_MACRO_LAUNCH_GRID_INDIRECT
+ *
+ * arg = num_groups_x
+ * parm[0] = num_groups_y
+ * parm[1] = num_groups_z
+ */
+.section #mme90c0_launch_grid_indirect
+   parm $r2 maddr 0x108e /* GRIDDIM_YX */
+   braz $r1 #fail
+   parm $r3
+   braz annul $r2 #fail
+   braz annul $r3 #fail
+   send (extrinsrt $r1 $r2 0x0 0x10 0x10) /* num_groups_y << 16 | num_groups_x */
+   send $r3
+   maddrsend 0xa7 /* COMPUTE_BEGIN */
+   maddrsend 0x282 /* UNKA08 */
+   maddr 0xda /* LAUNCH */
+   send 0x1000
+   maddrsend 0x281 /* COMPUTE_END */
+   exit maddr 0xd8 /* UNK360 */
+   send 0x1
+fail:
+   exit
+   nop
diff --git a/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h 
b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h
new file mode 100644
index 000..1dc06e5
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nvc0/mme/com90c0.mme.h
@@ -0,0 +1,19 @@
+uint32_t mme90c0_launch_grid_indirect[] = {
+   0x04238251,
+   0x00034807,
+   0x0301,
+/* 0x000e: fail */
+   0x0002d027,
+   0x00029827,
+   0x84008842,
+   0x1841,
+   0x0029c071,
+   0x00a08071,
+   0x00368021,
+   0x0441,
+   0x00a04071,
+   0x003600a1,
+   0x4041,
+   0x0091,
+   0x0011,
+};
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index 0a4efc0..bcd1c7c 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -348,14 +348,6 @@ nvc0_launch_grid(struct pipe_context *pipe, const struct 
pipe_grid_info *info)
BEGIN_NVC0(push, NVC0_COMPUTE(CP_GPR_ALLOC), 1);
PUSH_DATA (push, cp->num_gprs);
 
-   /* grid/block setup */
-   BEGIN_NVC0(push, NVC0_COMPUTE(GRIDDIM_YX), 2);
-   PUSH_DATA (push, (info->grid[1] << 16) | info->grid[0]);
-   PUSH_DATA (push, info->grid[2]);
-   BEGIN_NVC0(push, NVC0_COMPUTE(BLOCKDIM_YX), 2);
-   PUSH_DATA (push, (info->block[1] << 16) | info->block[0]);
-   PUSH_DATA (push, info->block[2]);
-
/* launch preliminary setup */
BEGIN_NVC0(push, NVC0_COMPUTE(GRIDID), 1);
PUSH_DATA (push, 0x1);
@@ -364,17 +356,39 @@ nvc0_launch_grid(struct pipe_context *pipe, const struct 
pipe_grid_info *info)
BEGIN_NVC0(push, NVC0_COMPUTE(FLUSH), 1);
PUSH_DATA (push, NVC0_COMPUTE_FLUSH_GLOBAL | NVC0_COMPUTE_FLUSH_UNK8);
 
-   /* kernel launching */
-   BEGIN_NVC0(push, NVC0_COMPUTE(COMPUTE_BEGIN), 1);
-   PUSH_DATA (push, 0);
-   BEGIN_NVC0(push, SUBC_COMPUTE(0x0a08), 1);
-   PUSH_DATA (push, 0);
-   BEGIN_NVC0(push, NVC0_COMPUTE(LAUNCH), 1);
-   PUSH_DATA (push, 0x1000);
-   BEGIN_NVC0(push, NVC0_COMPUTE(COMPUTE_END), 1);
-   PUSH_DATA (push, 0);
-   BEGIN_NVC0(push, SUBC_COMPUTE(0x0360), 1);
-   PUSH_DATA (push, 0x1);
+   /* block setup */
+   BEGIN_NVC0(push, NVC0_COMPUTE(BLOCKDIM_YX), 2);
+   PUSH_DATA (push, (info->block[1] << 16) | info->block[0]);
+   PUSH_DATA (push, info->block[2]);
+
+   if (unlikely(info->indirect)) {
+  struct nv04_resource *res = nv04_resource(info->indirect);
+  uint32_t offset = res->offset + info->indirect_offset;
+  unsigned macro = NVC0_COMPUTE_MACRO_LAUNCH_GRID_INDIRECT;
+
+  nouveau_pushbuf_space(push, 16, 0, 1);
+  PUSH_REFN(push, res->bo, NOUVEAU_BO_RD | res->domain);
+  PUSH_DATA(push, NVC0_FIFO_PKHDR_1I(1, macro, 3));
+  nouveau_pushbuf_data(push, res->bo, offset,
+   NVC0_IB_ENTRY_1_NO_PREFETCH | 3 * 4);
+   } else {
+  /* grid setup */
+  BEGIN_NVC0(push, NVC0_COMPUTE(GRIDDIM_YX), 2);
+  PUSH_DATA (push,

[Mesa-dev] [PATCH v2 08/11] nv50/ir: add lock/unlock subops for load/store

2016-02-07 Thread Samuel Pitoiset

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h|  2 ++
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp| 16 ++--
 src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp| 10 ++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 9d7becf..97ebed4 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -232,6 +232,8 @@ enum operation
 #define NV50_IR_SUBOP_SHFL_UP   1
 #define NV50_IR_SUBOP_SHFL_DOWN 2
 #define NV50_IR_SUBOP_SHFL_BFLY 3
+#define NV50_IR_SUBOP_LOAD_LOCKED1
+#define NV50_IR_SUBOP_STORE_UNLOCKED 2
 #define NV50_IR_SUBOP_MADSP_SD 0x
 // Yes, we could represent those with DataType.
 // Or put the type into operation and have a couple 1000 values in that enum.
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 8637db9..c7239b3 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1773,7 +1773,13 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
switch (i->src(0).getFile()) {
case FILE_MEMORY_GLOBAL: opc = 0x9000; break;
case FILE_MEMORY_LOCAL:  opc = 0xc800; break;
-   case FILE_MEMORY_SHARED: opc = 0xc900; break;
+   case FILE_MEMORY_SHARED:
+  opc = 0xc800;
+  if (i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED)
+ opc |= (1 << 26);
+  else
+ opc |= (1 << 24);
+  break;
default:
   assert(!"invalid memory file");
   opc = 0;
@@ -1804,7 +1810,13 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
switch (i->src(0).getFile()) {
case FILE_MEMORY_GLOBAL: opc = 0x8000; break;
case FILE_MEMORY_LOCAL:  opc = 0xc000; break;
-   case FILE_MEMORY_SHARED: opc = 0xc100; break;
+   case FILE_MEMORY_SHARED:
+  opc = 0xc000;
+  if (i->subOp == NV50_IR_SUBOP_LOAD_LOCKED)
+ opc |= (1 << 26);
+  else
+ opc |= (1 << 24);
+  break;
case FILE_MEMORY_CONST:
   if (!i->src(0).isIndirect(0) && typeSizeof(i->dType) == 4) {
  emitMOV(i); // not sure if this is any better
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index 47285a2..85f7704 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -198,6 +198,11 @@ static const char *atomSubOpStr[] =
"add", "min", "max", "inc", "dec", "and", "or", "xor", "cas", "exch"
 };
 
+static const char *ldstSubOpStr[] =
+{
+   "", "lock", "unlock"
+};
+
 static const char *DataTypeStr[] =
 {
"-",
@@ -537,6 +542,11 @@ void Instruction::print() const
  if (subOp < Elements(atomSubOpStr))
 PRINT("%s ", atomSubOpStr[subOp]);
  break;
+  case OP_LOAD:
+  case OP_STORE:
+ if (subOp < Elements(ldstSubOpStr))
+PRINT("%s ", ldstSubOpStr[subOp]);
+ break;
   default:
  if (subOp)
 PRINT("(SUBOP:%u) ", subOp);
-- 
2.6.4
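
The printing side of the patch above is a bounds-checked table lookup. A
stand-alone C sketch of the same pattern follows. The subop values and the
string table are taken from the patch, while the identifiers themselves
(`SUBOP_LOAD_LOCKED`, `ldst_subop_name()` and so on) are shortened,
hypothetical names, and returning NULL for an unknown subop is a choice made
for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

#define SUBOP_LOAD_LOCKED    1u
#define SUBOP_STORE_UNLOCKED 2u

/* Index 0 is the plain load/store, which prints no suffix. */
static const char *const ldst_subop_str[] = { "", "lock", "unlock" };

/* Return the printable name of a load/store subop, or NULL if out of range. */
static const char *
ldst_subop_name(unsigned subop)
{
   const size_t n = sizeof(ldst_subop_str) / sizeof(ldst_subop_str[0]);

   /* The table has to cover every subop defined above. */
   assert(SUBOP_STORE_UNLOCKED < n);

   return subop < n ? ldst_subop_str[subop] : NULL;
}
```

Since both new subops share one table, a load tagged with subop 2 would print
as "unlock" even though only stores are meant to use that value.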


[Mesa-dev] [PATCH v2 05/11] nvc0: bind textures/samplers for compute on Fermi

2016-02-07 Thread Samuel Pitoiset

Changes in v2:
 - refactor the code to share (almost) the same logic between 3d and
   compute

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 38 +++--
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h |  2 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_tex.c | 25 
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index 2e8a69e..0a4efc0 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -105,7 +105,17 @@ nvc0_screen_compute_setup(struct nvc0_screen *screen,
PUSH_DATAh(push, screen->text->offset);
PUSH_DATA (push, screen->text->offset);
 
-   /* TODO: textures & samplers */
+   /* textures */
+   BEGIN_NVC0(push, NVC0_COMPUTE(TIC_ADDRESS_HIGH), 3);
+   PUSH_DATAh(push, screen->txc->offset);
+   PUSH_DATA (push, screen->txc->offset);
+   PUSH_DATA (push, NVC0_TIC_MAX_ENTRIES - 1);
+
+   /* samplers */
+   BEGIN_NVC0(push, NVC0_COMPUTE(TSC_ADDRESS_HIGH), 3);
+   PUSH_DATAh(push, screen->txc->offset + 65536);
+   PUSH_DATA (push, screen->txc->offset + 65536);
+   PUSH_DATA (push, NVC0_TSC_MAX_ENTRIES - 1);
 
return 0;
 }
@@ -139,6 +149,26 @@ nvc0_compute_validate_program(struct nvc0_context *nvc0)
 }
 
 static void
+nvc0_compute_validate_samplers(struct nvc0_context *nvc0)
+{
+   bool need_flush = nvc0_validate_tsc(nvc0, 5);
+   if (need_flush) {
+  BEGIN_NVC0(nvc0->base.pushbuf, NVC0_COMPUTE(TSC_FLUSH), 1);
+  PUSH_DATA (nvc0->base.pushbuf, 0);
+   }
+}
+
+static void
+nvc0_compute_validate_textures(struct nvc0_context *nvc0)
+{
+   bool need_flush = nvc0_validate_tic(nvc0, 5);
+   if (need_flush) {
+  BEGIN_NVC0(nvc0->base.pushbuf, NVC0_COMPUTE(TIC_FLUSH), 1);
+  PUSH_DATA (nvc0->base.pushbuf, 0);
+   }
+}
+
+static void
 nvc0_compute_validate_constbufs(struct nvc0_context *nvc0)
 {
struct nouveau_pushbuf *push = nvc0->base.pushbuf;
@@ -233,12 +263,16 @@ nvc0_compute_state_validate(struct nvc0_context *nvc0)
 {
if (!nvc0_compute_validate_program(nvc0))
   return false;
+   if (nvc0->dirty_cp & NVC0_NEW_CP_TEXTURES)
+  nvc0_compute_validate_textures(nvc0);
+   if (nvc0->dirty_cp & NVC0_NEW_CP_SAMPLERS)
+  nvc0_compute_validate_samplers(nvc0);
if (nvc0->dirty_cp & NVC0_NEW_CP_CONSTBUF)
   nvc0_compute_validate_constbufs(nvc0);
if (nvc0->dirty_cp & NVC0_NEW_CP_BUFFERS)
   nvc0_compute_validate_buffers(nvc0);
 
-   /* TODO: textures, samplers, surfaces, global memory buffers */
+   /* TODO: surfaces, global memory buffers */
 
nvc0_bufctx_fence(nvc0, nvc0->bufctx_cp, false);
 
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index c6936c1..08f0966 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -270,6 +270,8 @@ extern void nvc0_clear(struct pipe_context *, unsigned 
buffers,
 extern void nvc0_init_surface_functions(struct nvc0_context *);
 
 /* nvc0_tex.c */
+bool nvc0_validate_tic(struct nvc0_context *nvc0, int s);
+bool nvc0_validate_tsc(struct nvc0_context *nvc0, int s);
 bool nve4_validate_tsc(struct nvc0_context *nvc0, int s);
 void nvc0_validate_textures(struct nvc0_context *);
 void nvc0_validate_samplers(struct nvc0_context *);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c
index 24bbff6..6dc5932 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c
@@ -24,6 +24,7 @@
 #include "nvc0/nvc0_resource.h"
 #include "nv50/nv50_texture.xml.h"
 #include "nv50/nv50_defs.xml.h"
+#include "nvc0/nvc0_compute.xml.h"
 
 #include "util/u_format.h"
 
@@ -244,7 +245,7 @@ nvc0_update_tic(struct nvc0_context *nvc0, struct 
nv50_tic_entry *tic,
tic->tic[2] |= address >> 32;
 }
 
-static bool
+bool
 nvc0_validate_tic(struct nvc0_context *nvc0, int s)
 {
uint32_t commands[32];
@@ -285,7 +286,10 @@ nvc0_validate_tic(struct nvc0_context *nvc0, int s)
  need_flush = true;
   } else
   if (res->status & NOUVEAU_BUFFER_STATUS_GPU_WRITING) {
- BEGIN_NVC0(push, NVC0_3D(TEX_CACHE_CTL), 1);
+ if (unlikely(s == 5))
+BEGIN_NVC0(push, NVC0_COMPUTE(TEX_CACHE_CTL), 1);
+ else
+BEGIN_NVC0(push, NVC0_3D(TEX_CACHE_CTL), 1);
  PUSH_DATA (push, (tic->id << 4) | 1);
  NOUVEAU_DRV_STAT(&nvc0->screen->base, tex_cache_flush_count, 1);
   }
@@ -298,7 +302,10 @@ nvc0_validate_tic(struct nvc0_context *nvc0, int s)
  continue;
   commands[n++] = (tic->id << 9) | (i << 1) | 1;
 
-  BCTX_REFN(nvc0->bufctx_3d, TEX(s, i), res, RD);
+  if (unlikely(s == 5))
+ BCTX_REFN(nvc0->bufctx_cp, CP_TEX(i), res, RD);
+  else
+ BCTX_REFN(nvc0->bufctx_3d, TEX(s,

[Mesa-dev] [PATCH v2 02/11] nvc0: bind constant buffers for compute on Fermi

2016-02-07 Thread Samuel Pitoiset

Loosely based on 3D.

Changes in v2:
 - drop the 's' param of nvc0_cb_bo_push(), since constbufs for compute
   can just as well be uploaded through the 3d chan

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 60 +
 src/gallium/drivers/nouveau/nvc0/nvc0_context.c | 11 +++--
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.h  |  2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c   |  4 +-
 4 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index 5c7dc0e..5985da5 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -138,11 +138,71 @@ nvc0_compute_validate_program(struct nvc0_context *nvc0)
return false;
 }
 
+static void
+nvc0_compute_validate_constbufs(struct nvc0_context *nvc0)
+{
+   struct nouveau_pushbuf *push = nvc0->base.pushbuf;
+   const int s = 5;
+
+   while (nvc0->constbuf_dirty[s]) {
+  int i = ffs(nvc0->constbuf_dirty[s]) - 1;
+  nvc0->constbuf_dirty[s] &= ~(1 << i);
+
+  if (nvc0->constbuf[s][i].user) {
+ struct nouveau_bo *bo = nvc0->screen->uniform_bo;
+ const unsigned base = s << 16;
+ const unsigned size = nvc0->constbuf[s][0].size;
+ assert(i == 0); /* we really only want OpenGL uniforms here */
+ assert(nvc0->constbuf[s][0].u.data);
+
+ if (nvc0->state.uniform_buffer_bound[s] < size) {
+nvc0->state.uniform_buffer_bound[s] = align(size, 0x100);
+
+BEGIN_NVC0(push, NVC0_COMPUTE(CB_SIZE), 3);
+PUSH_DATA (push, nvc0->state.uniform_buffer_bound[s]);
+PUSH_DATAh(push, bo->offset + base);
+PUSH_DATA (push, bo->offset + base);
+BEGIN_NVC0(push, NVC0_COMPUTE(CB_BIND), 1);
+PUSH_DATA (push, (0 << 8) | 1);
+ }
+ nvc0_cb_bo_push(&nvc0->base, bo, NV_VRAM_DOMAIN(&nvc0->screen->base),
+ base, nvc0->state.uniform_buffer_bound[s],
+ 0, (size + 3) / 4,
+ nvc0->constbuf[s][0].u.data);
+  } else {
+ struct nv04_resource *res =
+nv04_resource(nvc0->constbuf[s][i].u.buf);
+ if (res) {
+BEGIN_NVC0(push, NVC0_COMPUTE(CB_SIZE), 3);
+PUSH_DATA (push, nvc0->constbuf[s][i].size);
+PUSH_DATAh(push, res->address + nvc0->constbuf[s][i].offset);
+PUSH_DATA (push, res->address + nvc0->constbuf[s][i].offset);
+BEGIN_NVC0(push, NVC0_COMPUTE(CB_BIND), 1);
+PUSH_DATA (push, (i << 8) | 1);
+
+BCTX_REFN(nvc0->bufctx_cp, CP_CB(i), res, RD);
+
+res->cb_bindings[s] |= 1 << i;
+ } else {
+BEGIN_NVC0(push, NVC0_COMPUTE(CB_BIND), 1);
+PUSH_DATA (push, (i << 8) | 0);
+ }
+ if (i == 0)
+nvc0->state.uniform_buffer_bound[s] = 0;
+  }
+   }
+
+   BEGIN_NVC0(push, NVC0_COMPUTE(FLUSH), 1);
+   PUSH_DATA (push, NVC0_COMPUTE_FLUSH_CB);
+}
+
 static bool
 nvc0_compute_state_validate(struct nvc0_context *nvc0)
 {
if (!nvc0_compute_validate_program(nvc0))
   return false;
+   if (nvc0->dirty_cp & NVC0_NEW_CP_CONSTBUF)
+  nvc0_compute_validate_constbufs(nvc0);
 
/* TODO: textures, samplers, surfaces, global memory buffers */
 
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
index 547b8f5..4fed7b2 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
@@ -241,15 +241,20 @@ nvc0_invalidate_resource_storage(struct nouveau_context 
*ctx,
   }
   }
 
-  for (s = 0; s < 5; ++s) {
+  for (s = 0; s < 6; ++s) {
   for (i = 0; i < NVC0_MAX_PIPE_CONSTBUFS; ++i) {
  if (!(nvc0->constbuf_valid[s] & (1 << i)))
 continue;
  if (!nvc0->constbuf[s][i].user &&
  nvc0->constbuf[s][i].u.buf == res) {
-nvc0->dirty |= NVC0_NEW_CONSTBUF;
 nvc0->constbuf_dirty[s] |= 1 << i;
-nouveau_bufctx_reset(nvc0->bufctx_3d, NVC0_BIND_CB(s, i));
+if (unlikely(s == 5)) {
+   nvc0->dirty_cp |= NVC0_NEW_CP_CONSTBUF;
+   nouveau_bufctx_reset(nvc0->bufctx_cp, NVC0_BIND_CP_CB(i));
+} else {
+   nvc0->dirty |= NVC0_NEW_CONSTBUF;
+   nouveau_bufctx_reset(nvc0->bufctx_3d, NVC0_BIND_CB(s, i));
+}
 if (!--ref)
return ref;
  }
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
index 1a56177..d7c427b 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
@@ -51,7 +51,7 @@ struct nvc0_graph_state {
uint8_t c14_bound; /* whether immediate array constbuf
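
The dirty-mask loop in nvc0_compute_validate_constbufs() above can be read in isolation. What follows is a minimal sketch, assuming a POSIX ffs() from <strings.h>; collect_dirty_slots() and its parameters are hypothetical names used only for illustration and are not part of the driver:

```c
#include <assert.h>
#include <strings.h>   /* ffs() (POSIX) */

/* Hypothetical helper: drain a dirty bitmask the same way the patch does,
 * recording each slot index in ascending order. Returns the slot count. */
static int
collect_dirty_slots(unsigned *dirty, int *slots, int max_slots)
{
   int n = 0;

   while (*dirty) {
      int i = ffs((int)*dirty) - 1;   /* index of the lowest set bit */
      *dirty &= ~(1u << i);           /* mark slot i as clean */
      assert(n < max_slots);
      slots[n++] = i;
   }
   return n;
}
```

Each iteration clears exactly one bit, so the loop runs once per dirty slot and leaves the mask at zero.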

[Mesa-dev] [PATCH v2 04/11] nvc0: bind shader buffers for compute on Fermi

2016-02-07 Thread Samuel Pitoiset

Loosely based on 3D.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 34 +
 src/gallium/drivers/nouveau/nvc0/nvc0_context.c | 12 ++---
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h |  4 ++-
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c |  1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c   | 13 +++---
 5 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index c76b707..2e8a69e 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -196,6 +196,38 @@ nvc0_compute_validate_constbufs(struct nvc0_context *nvc0)
PUSH_DATA (push, NVC0_COMPUTE_FLUSH_CB);
 }
 
+static void
+nvc0_compute_validate_buffers(struct nvc0_context *nvc0)
+{
+   struct nouveau_pushbuf *push = nvc0->base.pushbuf;
+   const int s = 5;
+   int i;
+
+   BEGIN_NVC0(push, NVC0_COMPUTE(CB_SIZE), 3);
+   PUSH_DATA (push, 1024);
+   PUSH_DATAh(push, nvc0->screen->uniform_bo->offset + (6 << 16) + (s << 10));
+   PUSH_DATA (push, nvc0->screen->uniform_bo->offset + (6 << 16) + (s << 10));
+   BEGIN_1IC0(push, NVC0_COMPUTE(CB_POS), 1 + 4 * NVC0_MAX_BUFFERS);
+   PUSH_DATA (push, 512);
+
+   for (i = 0; i < NVC0_MAX_BUFFERS; i++) {
+  if (nvc0->buffers[s][i].buffer) {
+ struct nv04_resource *res =
+nv04_resource(nvc0->buffers[s][i].buffer);
+ PUSH_DATA (push, res->address + nvc0->buffers[s][i].buffer_offset);
+ PUSH_DATAh(push, res->address + nvc0->buffers[s][i].buffer_offset);
+ PUSH_DATA (push, nvc0->buffers[s][i].buffer_size);
+ PUSH_DATA (push, 0);
+ BCTX_REFN(nvc0->bufctx_cp, CP_BUF, res, RDWR);
+  } else {
+ PUSH_DATA (push, 0);
+ PUSH_DATA (push, 0);
+ PUSH_DATA (push, 0);
+ PUSH_DATA (push, 0);
+  }
+   }
+}
+
 static bool
 nvc0_compute_state_validate(struct nvc0_context *nvc0)
 {
@@ -203,6 +235,8 @@ nvc0_compute_state_validate(struct nvc0_context *nvc0)
   return false;
if (nvc0->dirty_cp & NVC0_NEW_CP_CONSTBUF)
   nvc0_compute_validate_constbufs(nvc0);
+   if (nvc0->dirty_cp & NVC0_NEW_CP_BUFFERS)
+  nvc0_compute_validate_buffers(nvc0);
 
/* TODO: textures, samplers, surfaces, global memory buffers */
 
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
index 4fed7b2..0635b98 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
@@ -261,12 +261,17 @@ nvc0_invalidate_resource_storage(struct nouveau_context 
*ctx,
   }
   }
 
-  for (s = 0; s < 5; ++s) {
+  for (s = 0; s < 6; ++s) {
   for (i = 0; i < NVC0_MAX_BUFFERS; ++i) {
  if (nvc0->buffers[s][i].buffer == res) {
 nvc0->buffers_dirty[s] |= 1 << i;
-nvc0->dirty |= NVC0_NEW_BUFFERS;
-nouveau_bufctx_reset(nvc0->bufctx_3d, NVC0_BIND_BUF);
+if (unlikely(s == 5)) {
+   nvc0->dirty_cp |= NVC0_NEW_CP_BUFFERS;
+   nouveau_bufctx_reset(nvc0->bufctx_cp, NVC0_BIND_CP_BUF);
+} else {
+   nvc0->dirty |= NVC0_NEW_BUFFERS;
+   nouveau_bufctx_reset(nvc0->bufctx_3d, NVC0_BIND_BUF);
+}
 if (!--ref)
return ref;
  }
@@ -368,6 +373,7 @@ nvc0_create(struct pipe_screen *pscreen, void *priv, 
unsigned ctxflags)
BCTX_REFN_bo(nvc0->bufctx_3d, SCREEN, flags, screen->txc);
if (screen->compute) {
   BCTX_REFN_bo(nvc0->bufctx_cp, CP_SCREEN, flags, screen->text);
+  BCTX_REFN_bo(nvc0->bufctx_cp, CP_SCREEN, flags, screen->uniform_bo);
   BCTX_REFN_bo(nvc0->bufctx_cp, CP_SCREEN, flags, screen->txc);
   BCTX_REFN_bo(nvc0->bufctx_cp, CP_SCREEN, flags, screen->parm);
}
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index 2e726e6..c6936c1 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -64,6 +64,7 @@
 #define NVC0_NEW_CP_SAMPLERS  (1 << 3)
 #define NVC0_NEW_CP_CONSTBUF  (1 << 4)
 #define NVC0_NEW_CP_GLOBALS   (1 << 5)
+#define NVC0_NEW_CP_BUFFERS   (1 << 6)
 
 /* 3d bufctx (during draw_vbo, blit_3d) */
 #define NVC0_BIND_FB0
@@ -87,7 +88,8 @@
 #define NVC0_BIND_CP_DESC50
 #define NVC0_BIND_CP_SCREEN  51
 #define NVC0_BIND_CP_QUERY   52
-#define NVC0_BIND_CP_COUNT   53
+#define NVC0_BIND_CP_BUF 53
+#define NVC0_BIND_CP_COUNT   54
 
 /* bufctx for other operations */
 #define NVC0_BIND_2D0
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index afcff53..bc884d6 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++

[Mesa-dev] [PATCH] trace: add missing pipe_context::clear_texture()

2016-02-07 Thread Samuel Pitoiset

This fixes a crash with bin/arb_clear_texture-base-formats and
probably some other tests which use clear_texture().

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/trace/tr_context.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/src/gallium/drivers/trace/tr_context.c 
b/src/gallium/drivers/trace/tr_context.c
index 2280898..c49e4f2 100644
--- a/src/gallium/drivers/trace/tr_context.c
+++ b/src/gallium/drivers/trace/tr_context.c
@@ -1325,6 +1325,35 @@ trace_context_clear_depth_stencil(struct pipe_context 
*_pipe,
 }
 
 static inline void
+trace_context_clear_texture(struct pipe_context *_pipe,
+struct pipe_resource *_res,
+unsigned level,
+const struct pipe_box *box,
+const void *data)
+{
+   struct trace_context *tr_ctx = trace_context(_pipe);
+   struct pipe_context *pipe = tr_ctx->pipe;
+   struct trace_resource *tr_res = trace_resource(_res);
+   struct pipe_resource *res = tr_res->resource;
+
+   trace_dump_call_begin("pipe_context", "clear_texture");
+
+   trace_dump_arg(ptr, pipe);
+   trace_dump_arg_begin("res");
+   trace_dump_resource_ptr(_res);
+   trace_dump_arg_end();
+   trace_dump_arg(uint, level);
+   trace_dump_arg_begin("box");
+   trace_dump_box(box);
+   trace_dump_arg_end();
+   trace_dump_arg(ptr, data);
+
+   pipe->clear_texture(pipe, res, level, box, data);
+
+   trace_dump_call_end();
+}
+
+static inline void
 trace_context_flush(struct pipe_context *_pipe,
 struct pipe_fence_handle **fence,
 unsigned flags)
@@ -1778,6 +1807,7 @@ trace_context_create(struct trace_screen *tr_scr,
TR_CTX_INIT(clear);
TR_CTX_INIT(clear_render_target);
TR_CTX_INIT(clear_depth_stencil);
+   TR_CTX_INIT(clear_texture);
TR_CTX_INIT(flush);
TR_CTX_INIT(generate_mipmap);
TR_CTX_INIT(texture_barrier);
-- 
2.6.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/5] nir: Do opt_algebraic in reverse order.

2016-02-07 Thread Matt Turner

On Sun, Feb 7, 2016 at 6:04 AM, Eduardo Lima Mitev  wrote:
> On 02/05/2016 02:47 AM, Matt Turner wrote:
>> Walking the SSA definitions in order means that we consider the smallest
>> algebraic optimizations before larger optimizations. So if a smaller
>> rule is part of a larger rule, the smaller one will happen first,
>> preventing the larger one from happening.
>>
>> instructions in affected programs: 32721 -> 32611 (-0.34%)
>> helped: 106
>>
>> Prevents regressions and annoyances in the next commits.
>> ---
>>  src/compiler/nir/nir_algebraic.py | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/compiler/nir/nir_algebraic.py 
>> b/src/compiler/nir/nir_algebraic.py
>> index a30652f..77ad35e 100644
>> --- a/src/compiler/nir/nir_algebraic.py
>> +++ b/src/compiler/nir/nir_algebraic.py
>> @@ -216,7 +216,7 @@ ${pass_name}_block(nir_block *block, void *void_state)
>>  {
>> struct opt_state *state = void_state;
>>
>> -   nir_foreach_instr_safe(block, instr) {
>> +   nir_foreach_instr_reverse_safe(block, instr) {
>
> I would add an explicit comment here as to why we walk in reverse order. It
> is not immediately clear (at least to me) that the smallest algebraic
> optimizations come before the larger ones. I could not find any comment
> in opt_algebraic.py or anywhere else that would suggest this is the case.

I think you've misunderstood. The walk, reverse or otherwise, isn't
over the optimizations in nir_opt_algebraic. It's over the NIR
instructions.

Walking the instructions in reverse is beneficial because it
necessarily allows larger patterns to be recognized before smaller
patterns. Take for instance a portion of the bitfield_reverse pattern
in patch 5:

('ior', ('ishl', u, 16), ('ushr', u, 16))

If there were also a rule that matched ('ushr', u, 16) (as
('extract_u16', u, 1) for example), walking the instructions in order
would cause the extract_u16 rule to match first. Once that had
happened, the bitfield_reverse pattern would not match.

Walking the NIR in reverse means that you look at the largest
expression trees first.
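
The ordering effect described above can be reproduced with a toy. This is only a sketch under stated assumptions: the toy_* types, opcodes and rules below are hypothetical and are not NIR; TOY_SWAP16 merely stands in for whatever the larger pattern would be replaced with, and the extract_u16 rule is the hypothetical one from the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_op { TOY_INPUT, TOY_ISHL16, TOY_USHR16, TOY_IOR,
              TOY_EXTRACT_U16, TOY_SWAP16 };

struct toy_instr {
   enum toy_op op;
   int src0, src1;   /* indices of earlier instructions, -1 if unused */
};

/* Small rule: ushr(u, 16) -> extract_u16(u, 1). */
static bool
toy_match_small(struct toy_instr *prog, int i)
{
   if (prog[i].op != TOY_USHR16)
      return false;
   prog[i].op = TOY_EXTRACT_U16;
   return true;
}

/* Large rule: ior(ishl(u, 16), ushr(u, 16)) -> swap16(u). */
static bool
toy_match_large(struct toy_instr *prog, int i)
{
   if (prog[i].op != TOY_IOR)
      return false;
   const struct toy_instr *a = &prog[prog[i].src0];
   const struct toy_instr *b = &prog[prog[i].src1];
   if (a->op != TOY_ISHL16 || b->op != TOY_USHR16 || a->src0 != b->src0)
      return false;
   prog[i].op = TOY_SWAP16;
   prog[i].src0 = a->src0;
   prog[i].src1 = -1;
   return true;
}

/* Run one pass over a fixed four-instruction program and return the
 * opcode that ends up in the final (root) instruction. */
static enum toy_op
toy_run_pass(bool reverse)
{
   struct toy_instr prog[] = {
      { TOY_INPUT,  -1, -1 },   /* 0: u */
      { TOY_ISHL16,  0, -1 },   /* 1: ishl(u, 16) */
      { TOY_USHR16,  0, -1 },   /* 2: ushr(u, 16) */
      { TOY_IOR,     1,  2 },   /* 3: ior(%1, %2) */
   };
   const int n = (int)(sizeof(prog) / sizeof(prog[0]));

   for (int k = 0; k < n; k++) {
      int i = reverse ? n - 1 - k : k;
      if (!toy_match_large(prog, i))
         toy_match_small(prog, i);
   }
   return prog[n - 1].op;
}
```

In this toy, toy_run_pass(false) leaves the root as TOY_IOR because the small rule has already rewritten its operand, while toy_run_pass(true) reaches the root first and rewrites it to TOY_SWAP16.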

>>if (instr->type != nir_instr_type_alu)
>>   continue;
>>
>> @@ -255,7 +255,7 @@ ${pass_name}_impl(nir_function_impl *impl, const bool 
>> *condition_flags)
>> state.progress = false;
>> state.condition_flags = condition_flags;
>>
>> -   nir_foreach_block(impl, ${pass_name}_block, &state);
>> +   nir_foreach_block_reverse(impl, ${pass_name}_block, &state);
>>
>
> Does it make sense to reverse the traversal of blocks too? As far as I
> understand opt_algebraic rules don't expand to other blocks (maybe I'm
> wrong). I also don't see any difference in shader-db results running
> with or without this chunk.

I think it does make sense, because the expression trees can span
multiple basic blocks.

> These are my results on HSW (with patches 1 to 3):
>
> total instructions in shared programs: 6265414 -> 6265312 (-0.00%)
> instructions in affected programs: 31499 -> 31397 (-0.32%)
> helped: 98
> HURT: 0
>
> total cycles in shared programs: 56081290 -> 56078442 (-0.01%)
> cycles in affected programs: 562440 -> 559592 (-0.51%)
> helped: 102
> HURT: 6
>
>
> Patches 1 to 3 are:
>
> Reviewed-by: Eduardo Lima Mitev 

Thanks!

Re: [Mesa-dev] [PATCH v2 14/20] st/mesa: keep track of shared memory declarations

2016-02-07 Thread Ilia Mirkin

Reviewed-by: Ilia Mirkin 

On Sat, Feb 6, 2016 at 5:04 PM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index d74b84c..9303495 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -401,6 +401,7 @@ public:
> bool native_integers;
> bool have_sqrt;
> bool have_fma;
> +   bool use_shared_memory;
>
> variable_storage *find_variable_storage(ir_variable *var);
>
> @@ -3979,6 +3980,7 @@ glsl_to_tgsi_visitor::glsl_to_tgsi_visitor()
> options = NULL;
> have_sqrt = false;
> have_fma = false;
> +   use_shared_memory = false;
>  }
>
>  glsl_to_tgsi_visitor::~glsl_to_tgsi_visitor()
> @@ -4024,6 +4026,8 @@ count_resources(glsl_to_tgsi_visitor *v, gl_program 
> *prog)
>  inst->op == TGSI_OPCODE_STORE)) {
>   if (inst->buffer.file == PROGRAM_BUFFER)
>  v->buffers_used |= 1 << inst->buffer.index;
> + if (inst->buffer.file == PROGRAM_MEMORY)
> +v->use_shared_memory = true;
>}
> }
> prog->SamplersUsed = v->samplers_used;
> @@ -4807,6 +4811,7 @@ struct st_translate {
> struct ureg_src samplers[PIPE_MAX_SAMPLERS];
> struct ureg_src buffers[PIPE_MAX_SHADER_BUFFERS];
> struct ureg_src systemValues[SYSTEM_VALUE_MAX];
> +   struct ureg_src shared_memory;
> struct tgsi_texture_offset tex_offsets[MAX_GLSL_TEXTURE_OFFSET];
> unsigned *array_sizes;
> struct array_decl *input_arrays;
> @@ -5295,7 +5300,10 @@ compile_tgsi_instruction(struct st_translate *t,
>for (i = num_src - 1; i >= 0; i--)
>   src[i + 1] = src[i];
>num_src++;
> -  src[0] = t->buffers[inst->buffer.index];
> +  if (inst->buffer.file == PROGRAM_MEMORY)
> + src[0] = t->shared_memory;
> +  else
> + src[0] = t->buffers[inst->buffer.index];
>if (inst->buffer.reladdr)
>   src[0] = ureg_src_indirect(src[0], ureg_src(t->address[2]));
>assert(src[0].File != TGSI_FILE_NULL);
> @@ -5304,7 +5312,11 @@ compile_tgsi_instruction(struct st_translate *t,
>break;
>
> case TGSI_OPCODE_STORE:
> -  dst[0] = ureg_writemask(ureg_dst(t->buffers[inst->buffer.index]), 
> inst->dst[0].writemask);
> +  if (inst->buffer.file == PROGRAM_MEMORY)
> + dst[0] = ureg_dst(t->shared_memory);
> +  else
> + dst[0] = ureg_dst(t->buffers[inst->buffer.index]);
> +  dst[0] = ureg_writemask(dst[0], inst->dst[0].writemask);
>if (inst->buffer.reladdr)
>   dst[0] = ureg_dst_indirect(dst[0], ureg_src(t->address[2]));
>assert(dst[0].File != TGSI_FILE_NULL);
> @@ -5959,7 +5971,8 @@ st_translate_program(
>}
> }
>
> -
> +   if (program->use_shared_memory)
> +  t->shared_memory = ureg_DECL_shared_memory(ureg);
>
> /* Emit each instruction in turn:
>  */
> --
> 2.6.4
>

Re: [Mesa-dev] [PATCH v2 04/20] mesa: add PROGRAM_MEMORY

2016-02-07 Thread Ilia Mirkin

Ugh, with this we're going to end up overflowing gl_register_file when
I add images... it ends up getting stored as 4 bits somewhere. Oh
well, not your problem.
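
The overflow concern can be illustrated with a small sketch. The names below (struct toy_reg, toy_store_file) are hypothetical and do not correspond to any real Mesa structure; the only assumption taken from the message is that the register file ends up in a 4-bit field somewhere:

```c
#include <assert.h>

/* Hypothetical register descriptor with a 4-bit file field. */
struct toy_reg {
   unsigned file  : 4;    /* can represent values 0..15 only */
   unsigned index : 12;
};

/* Store a register-file value and read back what the field kept. */
static unsigned
toy_store_file(unsigned file)
{
   struct toy_reg r = { 0, 0 };

   r.file = file;   /* unsigned bit-field: value is reduced modulo 16 */
   return r.file;
}
```

An enum value of 15 survives the round trip, but a seventeenth entry (value 16) would silently read back as 0.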

Reviewed-by: Ilia Mirkin 

On Sat, Feb 6, 2016 at 5:04 PM, Samuel Pitoiset
 wrote:
> This will be used for shared, global and local memory areas.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/mesa/main/mtypes.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index 432cda9..d50376b 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -1872,6 +1872,7 @@ typedef enum
> PROGRAM_UNDEFINED,   /**< Invalid/TBD value */
> PROGRAM_IMMEDIATE,   /**< Immediate value, used by TGSI */
> PROGRAM_BUFFER,  /**< for shader buffers, compile-time only */
> +   PROGRAM_MEMORY,  /**< for shared, global and local memory */
> PROGRAM_FILE_MAX
>  } gl_register_file;
>
> --
> 2.6.4
>

[Mesa-dev] [PATCH] trace: remove useless MALLOC() in trace_context_draw_vbo()

2016-02-07 Thread Samuel Pitoiset

There is no need to allocate memory when unwrapping the indirect buf.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/trace/tr_context.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/trace/tr_context.c 
b/src/gallium/drivers/trace/tr_context.c
index b04c88d..2280898 100644
--- a/src/gallium/drivers/trace/tr_context.c
+++ b/src/gallium/drivers/trace/tr_context.c
@@ -120,18 +120,13 @@ trace_context_draw_vbo(struct pipe_context *_pipe,
trace_dump_trace_flush();
 
if (info->indirect) {
-  struct pipe_draw_info *_info = NULL;
+  struct pipe_draw_info _info;
 
-  _info = MALLOC(sizeof(*_info));
-  if (!_info)
- return;
-
-  memcpy(_info, info, sizeof(*_info));
-  _info->indirect = trace_resource_unwrap(tr_ctx, _info->indirect);
-  _info->indirect_params = trace_resource_unwrap(tr_ctx,
- _info->indirect_params);
-  pipe->draw_vbo(pipe, _info);
-  FREE(_info);
+  memcpy(&_info, info, sizeof(_info));
+  _info.indirect = trace_resource_unwrap(tr_ctx, _info.indirect);
+  _info.indirect_params = trace_resource_unwrap(tr_ctx,
+_info.indirect_params);
+  pipe->draw_vbo(pipe, &_info);
} else {
   pipe->draw_vbo(pipe, info);
}
-- 
2.6.4
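
The change above amounts to copying the struct onto the stack instead of the heap before patching one field. A minimal sketch of that idea, with hypothetical toy_* names standing in for the trace driver's types and helpers:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for pipe_draw_info and a wrapped resource. */
struct toy_draw_info { int count; const void *indirect; };
struct toy_wrapper   { const void *real; };

/* Records what the "driver" was called with, for inspection. */
static const void *toy_seen_indirect;

static void
toy_draw(const struct toy_draw_info *info)
{
   toy_seen_indirect = info->indirect;
}

static void
toy_draw_unwrapped(const struct toy_draw_info *info)
{
   struct toy_draw_info local;   /* lives on the stack: no MALLOC/FREE */

   memcpy(&local, info, sizeof(local));
   local.indirect = ((const struct toy_wrapper *)info->indirect)->real;
   toy_draw(&local);             /* callee must not keep the pointer */
}
```

The caller's struct is left untouched, and there is no allocation-failure path to handle. The sketch assumes, as the patch does, that the callee does not retain the pointer after returning.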

Re: [Mesa-dev] [PATCH] trace: remove useless MALLOC() in trace_context_draw_vbo()

2016-02-07 Thread Ilia Mirkin

Reviewed-by: Ilia Mirkin 

Please double-check that GALLIUM_TRACE=foo and
bin/arb_indirect_parameters-tf-count -fbo -auto work together.

On Sun, Feb 7, 2016 at 4:36 PM, Samuel Pitoiset
 wrote:
> There is no need to allocate memory when unwrapping the indirect buf.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/trace/tr_context.c | 17 ++---
>  1 file changed, 6 insertions(+), 11 deletions(-)
>
> diff --git a/src/gallium/drivers/trace/tr_context.c 
> b/src/gallium/drivers/trace/tr_context.c
> index b04c88d..2280898 100644
> --- a/src/gallium/drivers/trace/tr_context.c
> +++ b/src/gallium/drivers/trace/tr_context.c
> @@ -120,18 +120,13 @@ trace_context_draw_vbo(struct pipe_context *_pipe,
> trace_dump_trace_flush();
>
> if (info->indirect) {
> -  struct pipe_draw_info *_info = NULL;
> +  struct pipe_draw_info _info;
>
> -  _info = MALLOC(sizeof(*_info));
> -  if (!_info)
> - return;
> -
> -  memcpy(_info, info, sizeof(*_info));
> -  _info->indirect = trace_resource_unwrap(tr_ctx, _info->indirect);
> -  _info->indirect_params = trace_resource_unwrap(tr_ctx,
> - _info->indirect_params);
> -  pipe->draw_vbo(pipe, _info);
> -  FREE(_info);
> +  memcpy(&_info, info, sizeof(_info));
> +  _info.indirect = trace_resource_unwrap(tr_ctx, _info.indirect);
> +  _info.indirect_params = trace_resource_unwrap(tr_ctx,
> +_info.indirect_params);
> +  pipe->draw_vbo(pipe, &_info);
> } else {
>pipe->draw_vbo(pipe, info);
> }
> --
> 2.6.4
>

Re: [Mesa-dev] [PATCH 0/2] Simple Klocwork patches

2016-02-07 Thread Matt Turner

On Sun, Feb 7, 2016 at 1:37 PM, Juha-Pekka Heikkilä
 wrote:
> Hi Iago,
>
> I know there are a lot of places where malloc is still unchecked
> -- and then there is ralloc, which is a story of its own. The reason I
> think checking these would be remotely useful on Windows only (or, the
> other way around, not under the Linux kernel) is that on Windows one
> can get a null pointer from malloc. On Android I think memory
> overcommit has always been enabled, and on Linux I suspect I belong to
> the minority who like to set ulimits for memory.
>
> I agree that checking these is mostly quite useless, but there are
> corners where it may suddenly become valuable. Once a process is running
> and everything has settled, it would be weird to hit any of these checks,
> but code which runs while the process is starting is, I notice, the
> place where things will fail if they fail. This is of course just my
> opinion about the value of these checks, but I really dislike the
> possibility of a segfault coming from a library.
>
> I didn't quickly notice where _mesa_error() gets more heap. It of
> course needs stack, but when I stress-tested these, _mesa_error() did
> still work. I cannot promise my test was 100% correct though; I think it
> was over a year ago when I was playing with it.

There's no guarantee that fprintf() doesn't call malloc. In fact, glibc's does.

Adding these checks is really useless.

Re: [Mesa-dev] [PATCH] trace: add missing pipe_context::clear_texture()

2016-02-07 Thread Ilia Mirkin

On Sun, Feb 7, 2016 at 5:32 PM, Samuel Pitoiset
 wrote:
> This fixes a crash with bin/arb_clear_texture-base-formats and
> probably some other tests which use clear_texture().
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/trace/tr_context.c | 30 ++
>  1 file changed, 30 insertions(+)
>
> diff --git a/src/gallium/drivers/trace/tr_context.c 
> b/src/gallium/drivers/trace/tr_context.c
> index 2280898..c49e4f2 100644
> --- a/src/gallium/drivers/trace/tr_context.c
> +++ b/src/gallium/drivers/trace/tr_context.c
> @@ -1325,6 +1325,35 @@ trace_context_clear_depth_stencil(struct pipe_context 
> *_pipe,
>  }
>
>  static inline void
> +trace_context_clear_texture(struct pipe_context *_pipe,
> +struct pipe_resource *_res,
> +unsigned level,
> +const struct pipe_box *box,
> +const void *data)
> +{
> +   struct trace_context *tr_ctx = trace_context(_pipe);
> +   struct pipe_context *pipe = tr_ctx->pipe;
> +   struct trace_resource *tr_res = trace_resource(_res);
> +   struct pipe_resource *res = tr_res->resource;

I guess it might be nicer to use trace_resource_unwrap here?

> +
> +   trace_dump_call_begin("pipe_context", "clear_texture");
> +
> +   trace_dump_arg(ptr, pipe);
> +   trace_dump_arg_begin("res");

Is this begin/end thing necessary here?

With the two things above fixed or otherwise addressed, this is
Reviewed-by: Ilia Mirkin 

> +   trace_dump_resource_ptr(_res);
> +   trace_dump_arg_end();
> +   trace_dump_arg(uint, level);
> +   trace_dump_arg_begin("box");
> +   trace_dump_box(box);
> +   trace_dump_arg_end();
> +   trace_dump_arg(ptr, data);
> +
> +   pipe->clear_texture(pipe, res, level, box, data);
> +
> +   trace_dump_call_end();
> +}
> +
> +static inline void
>  trace_context_flush(struct pipe_context *_pipe,
>  struct pipe_fence_handle **fence,
>  unsigned flags)
> @@ -1778,6 +1807,7 @@ trace_context_create(struct trace_screen *tr_scr,
> TR_CTX_INIT(clear);
> TR_CTX_INIT(clear_render_target);
> TR_CTX_INIT(clear_depth_stencil);
> +   TR_CTX_INIT(clear_texture);
> TR_CTX_INIT(flush);
> TR_CTX_INIT(generate_mipmap);
> TR_CTX_INIT(texture_barrier);
> --
> 2.6.4
>

Re: [Mesa-dev] [PATCH] winsys/radeon: fix a wrong NUM_TILE_PIPES value from the kernel

2016-02-07 Thread Nick Sarnie

Hi,

This fixes the bug for me.

Tested-by: Nick Sarnie 

Thanks

On Sun, Feb 7, 2016 at 2:25 PM, Marek Olšák  wrote:

> From: Marek Olšák 
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94019
> ---
>  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
> b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
> index 35dc7e6..49c310c 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
> @@ -405,6 +405,12 @@ static boolean do_winsys_init(struct
> radeon_drm_winsys *ws)
>  radeon_get_drm_value(ws->fd, RADEON_INFO_NUM_TILE_PIPES, NULL,
>   &ws->info.num_tile_pipes);
>
> +/* The kernel returns 12 for some cards for an unknown reason.
> + * I thought this was supposed to be a power of two.
> + */
> +if (ws->gen == DRV_SI && ws->info.num_tile_pipes == 12)
> +ws->info.num_tile_pipes = 8;
> +
>  if (radeon_get_drm_value(ws->fd, RADEON_INFO_BACKEND_MAP,
> NULL,
>   &ws->info.r600_gb_backend_map))
>  ws->info.r600_gb_backend_map_valid = TRUE;
> --
> 2.1.4
>

[Mesa-dev] [PATCH v2 03/11] nvc0: bind driver consts on buffer 15 for compute on Fermi

2016-02-07 Thread Samuel Pitoiset

Changes in v2:
 - always bind the driver consts even if user params come in via clover

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 12 +---
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c |  2 ++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index 5985da5..c76b707 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -236,10 +236,16 @@ nvc0_compute_upload_input(struct nvc0_context *nvc0, 
const void *input)
   BEGIN_1IC0(push, NVC0_COMPUTE(CB_POS), 1 + cp->parm_size / 4);
   PUSH_DATA (push, 0);
   PUSH_DATAp(push, input, cp->parm_size / 4);
-
-  BEGIN_NVC0(push, NVC0_COMPUTE(FLUSH), 1);
-  PUSH_DATA (push, NVC0_COMPUTE_FLUSH_CB);
}
+   BEGIN_NVC0(push, NVC0_COMPUTE(CB_SIZE), 3);
+   PUSH_DATA (push, 1024);
+   PUSH_DATAh(push, nvc0->screen->uniform_bo->offset + (6 << 16) + (5 << 10));
+   PUSH_DATA (push, nvc0->screen->uniform_bo->offset + (6 << 16) + (5 << 10));
+   BEGIN_NVC0(push, NVC0_COMPUTE(CB_BIND), 1);
+   PUSH_DATA (push, (15 << 8) | 1);
+
+   BEGIN_NVC0(push, NVC0_COMPUTE(FLUSH), 1);
+   PUSH_DATA (push, NVC0_COMPUTE_FLUSH_CB);
 }
 
 void
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index 93f211b..afcff53 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -544,6 +544,8 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t 
chipset,
  info->io.texBindBase = NVE4_CP_INPUT_TEX(0);
  info->io.suInfoBase = NVE4_CP_INPUT_SUF(0);
  info->prop.cp.gridInfoBase = NVE4_CP_INPUT_GRID_INFO(0);
+  } else {
+ info->io.resInfoCBSlot = 15;
   }
   info->io.msInfoCBSlot = 0;
   info->io.msInfoBase = NVE4_CP_INPUT_MS_OFFSETS;
-- 
2.6.4
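
The address pushed for the driver constants above is plain offset arithmetic, split into a high and a low 32-bit word. A hedged sketch follows; the toy_* function names are hypothetical, and reading (6 << 16) as a fixed 0x60000 base with a 1 KiB stride per stage index is an interpretation of the shifts, not something the patch states:

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the driver-constants area for a given stage index,
 * mirroring "offset + (6 << 16) + (s << 10)" from the patch. */
static uint64_t
toy_driver_consts_addr(uint64_t bo_offset, unsigned stage)
{
   return bo_offset + (6u << 16) + ((uint64_t)stage << 10);
}

/* High and low 32-bit halves, as pushed by two consecutive data words. */
static uint32_t toy_addr_hi(uint64_t addr) { return (uint32_t)(addr >> 32); }
static uint32_t toy_addr_lo(uint64_t addr) { return (uint32_t)addr; }
```

For the compute stage index 5 used in the patch, the added offset works out to 0x60000 + 0x1400 = 0x61400.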

[Mesa-dev] [PATCH v2 10/11] nv50/ir: add atomics support on shared memory for Fermi

2016-02-07 Thread Samuel Pitoiset

Changes in v2:
 - make sure the op is OP_SELP when emitting the predicate and add one
   assert
 - use bld.getSSA() for mkOp2()
 - add cross edge between tryLockAndSetBB and joinBB

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |   4 +
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 103 -
 .../nouveau/codegen/nv50_ir_lowering_nvc0.h|   1 +
 3 files changed, 106 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index f6605eb..b17bb86 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -398,6 +398,10 @@ CodeEmitterNVC0::emitForm_A(const Instruction *i, uint64_t 
opc)
  srcId(i->src(s), s ? ((s == 2) ? 49 : s1) : 20);
  break;
   default:
+ if (i->op == OP_SELP) {
+assert(i->src(s).getFile() == FILE_PREDICATE);
+srcId(i->src(s), 49);
+ }
  // ignore here, can be predicate or flags, but must not be address
  break;
   }
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index e7cb54b..21a6f1e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1033,6 +1033,100 @@ NVC0LoweringPass::handleSUQ(Instruction *suq)
return true;
 }
 
+void
+NVC0LoweringPass::handleSharedATOM(Instruction *atom)
+{
+   assert(atom->src(0).getFile() == FILE_MEMORY_SHARED);
+
+   BasicBlock *currBB = atom->bb;
+   BasicBlock *tryLockAndSetBB = atom->bb->splitBefore(atom, false);
+   BasicBlock *joinBB = atom->bb->splitAfter(atom);
+
+   bld.setPosition(currBB, true);
+   assert(!currBB->joinAt);
+   currBB->joinAt = bld.mkFlow(OP_JOINAT, joinBB, CC_ALWAYS, NULL);
+
+   bld.mkFlow(OP_BRA, tryLockAndSetBB, CC_ALWAYS, NULL);
+   currBB->cfg.attach(&tryLockAndSetBB->cfg, Graph::Edge::TREE);
+
+   bld.setPosition(tryLockAndSetBB, true);
+
+   Instruction *ld =
+  bld.mkLoad(TYPE_U32, atom->getDef(0),
+ bld.mkSymbol(FILE_MEMORY_SHARED, 0, TYPE_U32, 0), NULL);
+   ld->setDef(1, bld.getSSA(1, FILE_PREDICATE));
+   ld->subOp = NV50_IR_SUBOP_LOAD_LOCKED;
+
+   Value *stVal;
+   if (atom->subOp == NV50_IR_SUBOP_ATOM_EXCH) {
+  // Read the old value, and write the new one.
+  stVal = atom->getSrc(1);
+   } else if (atom->subOp == NV50_IR_SUBOP_ATOM_CAS) {
+  CmpInstruction *set =
+ bld.mkCmp(OP_SET, CC_EQ, TYPE_U32, bld.getSSA(1, FILE_PREDICATE),
+   TYPE_U32, ld->getDef(0), atom->getSrc(1));
+  set->setPredicate(CC_P, ld->getDef(1));
+
+  CmpInstruction *selp =
+ bld.mkCmp(OP_SELP, CC_NOT_P, TYPE_U32, bld.getSSA(4, FILE_ADDRESS),
+   TYPE_U32, ld->getDef(0), atom->getSrc(2),
+   set->getDef(0));
+  selp->setPredicate(CC_P, ld->getDef(1));
+
+  stVal = selp->getDef(0);
+   } else {
+  operation op;
+
+  switch (atom->subOp) {
+  case NV50_IR_SUBOP_ATOM_ADD:
+ op = OP_ADD;
+ break;
+  case NV50_IR_SUBOP_ATOM_AND:
+ op = OP_AND;
+ break;
+  case NV50_IR_SUBOP_ATOM_OR:
+ op = OP_OR;
+ break;
+  case NV50_IR_SUBOP_ATOM_XOR:
+ op = OP_XOR;
+ break;
+  case NV50_IR_SUBOP_ATOM_MIN:
+ op = OP_MIN;
+ break;
+  case NV50_IR_SUBOP_ATOM_MAX:
+ op = OP_MAX;
+ break;
+  default:
+ assert(0);
+  }
+
+  Instruction *i =
+ bld.mkOp2(op, atom->dType, bld.getSSA(), ld->getDef(0),
+   atom->getSrc(1));
+  i->setPredicate(CC_P, ld->getDef(1));
+
+  stVal = i->getDef(0);
+   }
+
+   Instruction *st =
+  bld.mkStore(OP_STORE, TYPE_U32,
+  bld.mkSymbol(FILE_MEMORY_SHARED, 0, TYPE_U32, 0),
+  NULL, stVal);
+   st->setPredicate(CC_P, ld->getDef(1));
+   st->subOp = NV50_IR_SUBOP_STORE_UNLOCKED;
+
+   // Loop until the lock is acquired.
+   bld.mkFlow(OP_BRA, tryLockAndSetBB, CC_NOT_P, ld->getDef(1));
+   tryLockAndSetBB->cfg.attach(&tryLockAndSetBB->cfg, Graph::Edge::BACK);
+   tryLockAndSetBB->cfg.attach(&joinBB->cfg, Graph::Edge::CROSS);
+   bld.mkFlow(OP_BRA, joinBB, CC_ALWAYS, NULL);
+
+   bld.remove(atom);
+
+   bld.setPosition(joinBB, false);
+   bld.mkFlow(OP_JOIN, NULL, CC_ALWAYS, NULL)->fixed = 1;
+}
+
 bool
 NVC0LoweringPass::handleATOM(Instruction *atom)
 {
@@ -1044,8 +1138,8 @@ NVC0LoweringPass::handleATOM(Instruction *atom)
   sv = SV_LBASE;
   break;
case FILE_MEMORY_SHARED:
-  sv = SV_SBASE;
-  break;
+  handleSharedATOM(atom);
+  return true;
default:
   assert(atom->src(0).getFile() == FILE_MEMORY_GLOBAL);
   base = loadResInfo64(ind, atom->getSrc(0)->reg.fileIndex * 16);
@@

[Mesa-dev] [PATCH v2 11/11] nvc0: enable compute shaders on Fermi

2016-02-07 Thread Samuel Pitoiset

Kepler compute support is really different than Fermi and it's not
ready yet.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 85be1cc..863a52e 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -274,7 +274,9 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen, 
unsigned shader,
case PIPE_SHADER_CAP_PREFERRED_IR:
   return PIPE_SHADER_IR_TGSI;
case PIPE_SHADER_CAP_SUPPORTED_IRS:
-  return 0;
+  if (class_3d >= NVE4_3D_CLASS)
+ return 0;
+  return 1 << PIPE_SHADER_IR_TGSI;
case PIPE_SHADER_CAP_MAX_INSTRUCTIONS:
case PIPE_SHADER_CAP_MAX_ALU_INSTRUCTIONS:
case PIPE_SHADER_CAP_MAX_TEX_INSTRUCTIONS:
-- 
2.6.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 07/11] nv50/ir: use s[] addr space for shared buffers

2016-02-07 Thread Samuel Pitoiset

Shared memory address space (FILE_MEMORY_SHARED) must be used instead
of global memory when a shared memory area is declared.
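
As a rough illustration of the rule above (not the driver code itself), the
choice of address space can be modelled as a lookup on the declaration's
"shared" flag. The enum and function names below are simplified stand-ins
invented for this sketch; only MemoryFile and its "shared" member come from
the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-ins for the TGSI file enum and nv50_ir::DataFile.
enum class TgsiFile { Buffer, Memory };
enum class DataFile { MemoryGlobal, MemoryShared };

// Same shape as the MemoryFile struct added by the patch.
struct MemoryFile {
    bool shared = false;
};

// Picks the address space for a symbol: MEMORY declarations flagged as
// shared go to s[], everything else stays in global memory.
DataFile resolve_file(TgsiFile file, std::size_t index,
                      const std::vector<MemoryFile> &memory_files)
{
    if (file == TgsiFile::Memory) {
        assert(index < memory_files.size());
        if (memory_files[index].shared)
            return DataFile::MemoryShared;
    }
    return DataFile::MemoryGlobal;
}
```

A BUFFER access, or a MEMORY access whose declaration is not shared, keeps
using the global address space in this model.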

Changes from v2:
 - oops, do not remove TGSI_FILE_BUFFER in a switch in
   nv50_ir_from_tgsi.cpp

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 41 --
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 52ac198..d06e9ef 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -374,6 +374,7 @@ static nv50_ir::DataFile translateFile(uint file)
case TGSI_FILE_IMMEDIATE:   return nv50_ir::FILE_IMMEDIATE;
case TGSI_FILE_SYSTEM_VALUE:return nv50_ir::FILE_SYSTEM_VALUE;
case TGSI_FILE_BUFFER:  return nv50_ir::FILE_MEMORY_GLOBAL;
+   case TGSI_FILE_MEMORY:  return nv50_ir::FILE_MEMORY_GLOBAL;
case TGSI_FILE_SAMPLER:
case TGSI_FILE_NULL:
default:
@@ -858,6 +859,11 @@ public:
};
std::vector<Resource> resources;
 
+   struct MemoryFile {
+  bool shared;
+   };
+   std::vector<MemoryFile> memoryFiles;
+
 private:
int inferSysValDirection(unsigned sn) const;
bool scanDeclaration(const struct tgsi_full_declaration *);
@@ -904,6 +910,7 @@ bool Source::scanSource()
textureViews.resize(scan.file_max[TGSI_FILE_SAMPLER_VIEW] + 1);
//resources.resize(scan.file_max[TGSI_FILE_RESOURCE] + 1);
tempArrayId.resize(scan.file_max[TGSI_FILE_TEMPORARY] + 1);
+   memoryFiles.resize(scan.file_max[TGSI_FILE_MEMORY] + 1);
 
info->immd.bufSize = 0;
 
@@ -1213,6 +1220,11 @@ bool Source::scanDeclaration(const struct 
tgsi_full_declaration *decl)
   for (i = first; i <= last; ++i)
  textureViews[i].target = decl->SamplerView.Resource;
   break;
+   case TGSI_FILE_MEMORY:
+  for (i = first; i <= last; ++i)
+ memoryFiles[i].shared = decl->Declaration.Shared;
+  break;
+   case TGSI_FILE_NULL:
case TGSI_FILE_TEMPORARY:
   for (i = first; i <= last; ++i)
  tempArrayId[i] = arrayId;
@@ -1220,7 +1232,6 @@ bool Source::scanDeclaration(const struct 
tgsi_full_declaration *decl)
  tempArrayInfo.insert(std::make_pair(arrayId, std::make_pair(
first, last - first + 1)));
   break;
-   case TGSI_FILE_NULL:
case TGSI_FILE_ADDRESS:
case TGSI_FILE_CONSTANT:
case TGSI_FILE_IMMEDIATE:
@@ -1516,6 +1527,9 @@ Converter::makeSym(uint tgsiFile, int fileIdx, int idx, 
int c, uint32_t address)
 
sym->reg.fileIndex = fileIdx;
 
+   if (tgsiFile == TGSI_FILE_MEMORY && code->memoryFiles[fileIdx].shared)
+  sym->setFile(FILE_MEMORY_SHARED);
+
if (idx >= 0) {
   if (sym->reg.file == FILE_SHADER_INPUT)
  sym->setOffset(info->in[idx].slot[c] * 4);
@@ -1769,7 +1783,7 @@ Converter::acquireDst(int d, int c)
int idx = dst.getIndex(0);
int idx2d = dst.is2D() ? dst.getIndex(1) : 0;
 
-   if (dst.isMasked(c) || f == TGSI_FILE_BUFFER)
+   if (dst.isMasked(c) || f == TGSI_FILE_BUFFER || f == TGSI_FILE_MEMORY)
   return NULL;
 
if (dst.isIndirect(0) ||
@@ -2239,7 +2253,8 @@ Converter::handleLOAD(Value *dst0[4])
int c;
std::vector<Value *> off, src, ldv, def;
 
-   if (tgsi.getSrc(0).getFile() == TGSI_FILE_BUFFER) {
+   if (tgsi.getSrc(0).getFile() == TGSI_FILE_BUFFER ||
+   tgsi.getSrc(0).getFile() == TGSI_FILE_MEMORY) {
   for (c = 0; c < 4; ++c) {
  if (!dst0[c])
 continue;
@@ -2248,9 +2263,10 @@ Converter::handleLOAD(Value *dst0[4])
  Symbol *sym;
  if (tgsi.getSrc(1).getFile() == TGSI_FILE_IMMEDIATE) {
 off = NULL;
-sym = makeSym(TGSI_FILE_BUFFER, r, -1, c, 
tgsi.getSrc(1).getValueU32(0, info) + 4 * c);
+sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
+  tgsi.getSrc(1).getValueU32(0, info) + 4 * c);
  } else {
-sym = makeSym(TGSI_FILE_BUFFER, r, -1, c, 4 * c);
+sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, 4 * c);
  }
 
  Instruction *ld = mkLoad(TYPE_U32, dst0[c], sym, off);
@@ -2337,7 +2353,8 @@ Converter::handleSTORE()
int c;
std::vector<Value *> off, src, dummy;
 
-   if (tgsi.getDst(0).getFile() == TGSI_FILE_BUFFER) {
+   if (tgsi.getDst(0).getFile() == TGSI_FILE_BUFFER ||
+   tgsi.getDst(0).getFile() == TGSI_FILE_MEMORY) {
   for (c = 0; c < 4; ++c) {
  if (!(tgsi.getDst(0).getMask() & (1 << c)))
 continue;
@@ -2346,11 +2363,11 @@ Converter::handleSTORE()
  Value *off;
  if (tgsi.getSrc(0).getFile() == TGSI_FILE_IMMEDIATE) {
 off = NULL;
-sym = makeSym(TGSI_FILE_BUFFER, r, -1, c,
+sym = makeSym(tgsi.getDst(0).getFile(), r, -1, c,
   tgsi.getSrc(0).getValueU32(0, info) +

Re: [Mesa-dev] [PATCH 0/2] Simple Klocwork patches

2016-02-07 Thread Juha-Pekka Heikkilä

Hi Iago,

I know there are still a lot of places where malloc goes unchecked, and
then there is ralloc, which is a story of its own. The reason I think
checking these would be remotely useful on Windows only (or, the other
way around, not under the Linux kernel) is that on Windows one can
actually get a null pointer from malloc. On Android I think memory
overcommit has always been enabled, and on Linux I suspect I belong to
the minority who like to set ulimits for memory.

I agree that checking these is mostly useless, but there are corner
cases where it may suddenly become valuable. Once a process is running
and everything has settled, it would be weird to hit any of these
checks, but I have noticed that code run at process startup is where
things fail if they fail at all. This is of course just my opinion about
the value of these checks, but I really dislike the possibility of a
segfault coming from a library.
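
To make the point concrete, here is a minimal sketch of the pattern under
discussion: check the allocation, and report failure to the caller rather
than dereferencing NULL inside the library. None of this is Mesa code; the
type and function names are hypothetical, and the allocator hook exists
only so the out-of-memory path can be exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator hook; lets a test force the out-of-memory path.
using alloc_fn = void *(*)(std::size_t);

// Hypothetical record, loosely modelled on a per-variable refcount entry.
struct refcount_entry {
    unsigned referenced_count;
    unsigned assigned_count;
};

// Allocates and zero-initializes an entry, or returns nullptr on failure.
refcount_entry *create_entry(alloc_fn alloc)
{
    void *mem = alloc(sizeof(refcount_entry));
    if (mem == nullptr)
        return nullptr; // report failure instead of dereferencing NULL
    refcount_entry *e = static_cast<refcount_entry *>(mem);
    e->referenced_count = 0;
    e->assigned_count = 0;
    return e;
}

// Caller-side check: returns false when the allocation failed.
bool record_reference(alloc_fn alloc)
{
    refcount_entry *e = create_entry(alloc);
    if (e == nullptr)
        return false;
    e->referenced_count++;
    assert(e->referenced_count == 1);
    std::free(e);
    return true;
}

// Always-failing allocator that stands in for malloc() returning NULL.
void *failing_alloc(std::size_t) { return nullptr; }
```

With std::malloc passed in, record_reference() succeeds; with failing_alloc
it returns false and nothing is dereferenced, which is the behaviour the
checks are meant to guarantee.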

At a quick look I didn't see where _mesa_error() would need more heap.
It needs stack, of course, but when I stress tested these paths
_mesa_error() still worked. I cannot promise my test was 100% correct,
though; I think it was over a year ago when I was playing with it.

/Juha-Pekka

On Wed, Feb 3, 2016 at 5:12 PM, Iago Toral  wrote:
> Hi Juha,
>
> I don't know why checking for this might be more relevant in Windows,
> but in any case:
>
> There are a ton of other places in mesa where we allocate memory via
> calloc/malloc and we don't check that the allocation actually succeeded
> so I am not sure that fixing a couple of instances of *small*
> allocations changes anything.
>
> IMHO, this kind of things are only really useful when allocating memory
> for large amounts of data, otherwise even if you check for a NULL
> allocation you still need to make sure that you don't need any extra
> memory to handle that situation, and _mesa_error() needs memory, so it
> is probably not really giving us anything in practice other than
> silencing Klocwork...
>
> Iago
>
> On Wed, 2016-02-03 at 10:56 +0200, Juha-Pekka Heikkila wrote:
>> I'm thinking these things maybe could be wrapped up inside something like
>> "#ifdef windows" or so in the future. At least for Android and Linux these
>> are normally quite useless.
>>
>> /Juha-Pekka
>>
>> Juha-Pekka Heikkila (2):
>>   i965: in brw_link_shader() react to low memory
>>   glsl: Check for null pointer at ir_variable_refcount_visitor()
>>
>>  src/compiler/glsl/ir_variable_refcount.cpp | 7 +++
>>  src/mesa/drivers/dri/i965/brw_link.cpp | 4 
>>  src/mesa/main/ff_fragment_shader.cpp   | 6 --
>>  3 files changed, 15 insertions(+), 2 deletions(-)
>>
>
>

Re: [Mesa-dev] [PATCH] trace: add missing pipe_context::clear_texture()

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 11:39 PM, Ilia Mirkin wrote:

On Sun, Feb 7, 2016 at 5:32 PM, Samuel Pitoiset
 wrote:

This fixes a crash with bin/arb_clear_texture-base-formats and
probably some other tests which use clear_texture().

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/trace/tr_context.c | 30 ++
  1 file changed, 30 insertions(+)

diff --git a/src/gallium/drivers/trace/tr_context.c 
b/src/gallium/drivers/trace/tr_context.c
index 2280898..c49e4f2 100644
--- a/src/gallium/drivers/trace/tr_context.c
+++ b/src/gallium/drivers/trace/tr_context.c
@@ -1325,6 +1325,35 @@ trace_context_clear_depth_stencil(struct pipe_context 
*_pipe,
  }

  static inline void
+trace_context_clear_texture(struct pipe_context *_pipe,
+struct pipe_resource *_res,
+unsigned level,
+const struct pipe_box *box,
+const void *data)
+{
+   struct trace_context *tr_ctx = trace_context(_pipe);
+   struct pipe_context *pipe = tr_ctx->pipe;
+   struct trace_resource *tr_res = trace_resource(_res);
+   struct pipe_resource *res = tr_res->resource;


I guess it might be nicer to use trace_resource_unwrap here?


Right.




+
+   trace_dump_call_begin("pipe_context", "clear_texture");
+
+   trace_dump_arg(ptr, pipe);
+   trace_dump_arg_begin("res");


Is this begin/end thing necessary here?


With trace_dump_resource_ptr() it's necessary, but I can use 
trace_dump_arg(ptr, res) which does the same job actually.




With the two things above fixed or otherwise addressed, this is
Reviewed-by: Ilia Mirkin 


+   trace_dump_resource_ptr(_res);
+   trace_dump_arg_end();
+   trace_dump_arg(uint, level);
+   trace_dump_arg_begin("box");
+   trace_dump_box(box);
+   trace_dump_arg_end();
+   trace_dump_arg(ptr, data);
+
+   pipe->clear_texture(pipe, res, level, box, data);
+
+   trace_dump_call_end();
+}
+
+static inline void
  trace_context_flush(struct pipe_context *_pipe,
  struct pipe_fence_handle **fence,
  unsigned flags)
@@ -1778,6 +1807,7 @@ trace_context_create(struct trace_screen *tr_scr,
 TR_CTX_INIT(clear);
 TR_CTX_INIT(clear_render_target);
 TR_CTX_INIT(clear_depth_stencil);
+   TR_CTX_INIT(clear_texture);
 TR_CTX_INIT(flush);
 TR_CTX_INIT(generate_mipmap);
 TR_CTX_INIT(texture_barrier);
--
2.6.4

Re: [Mesa-dev] gallium AoA support and indirect sampler fixes

2016-02-07 Thread Laurent Carlier

2016-02-07 15:13 GMT+01:00 Laurent Carlier :

> On Friday, 5 February 2016, 13:40:26 CET, Dave Airlie wrote:
> > Hi,
> >
> > In fixing some indirect sampler issues with ARB_gpu_shader5,
> > I realised AoA was mostly fixed as well by the same things.
> >
> > Ilia made me fix atomics as well.
> >
> > So this patch set enables AoA support on all gallium drivers
> > exposing GLSL 1.30.
> >
> > Dave.
> >
>
> I've quickly tested the series: Shadow of Mordor segfaults on start, and the
> first intro movie from Witcher 2 is greenish and then it segfaults.
>
> On top of mesa-git with llvm-svn (both trunk) and amdgpu/kernel-4.5rc2
> --
> Laurent Carlier
> http://www.archlinux.org


Here is the backtrace from Shadow of Mordor with gdb:
 Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7fffca5b9700 (LWP 30471)]
0x7fffe92c81f6 in _debug_assert_fail (expr=expr@entry=0x7fffe94843f0
"idx < (int)ARRAY_SIZE(v->sampler_types)", file=file@entry=0x7fffe9483a18
"state_tracker/st_glsl_to_tgsi.cpp", line=line@entry=4033,
function=function@entry=0x7fffe9485f00

"count_resources") at util/u_debug.c:324
324       os_abort();
(gdb) bt full
#0  0x7fffe92c81f6 in _debug_assert_fail (expr=expr@entry=0x7fffe94843f0
"idx < (int)ARRAY_SIZE(v->sampler_types)", file=file@entry=0x7fffe9483a18
"state_tracker/st_glsl_to_tgsi.cpp",
line=line@entry=4033, function=function@entry=0x7fffe9485f00

"count_resources") at util/u_debug.c:324
No locals.
#1  0x7fffe915a8a2 in count_resources (prog=0x7fff8d8804e0,
v=0x7fff8d8849c0) at state_tracker/st_glsl_to_tgsi.cpp:4033
idx = 
i = 
inst = 0x7fff8d883750
#2  get_mesa_program (ctx=ctx@entry=0xbde0d20,
shader_program=shader_program@entry=0x7fff8d7f90b0, shader=0x7fff8d8800d0)
at state_tracker/st_glsl_to_tgsi.cpp:6158
v = 0x7fff8d8849c0
prog = 0x7fff8d8804e0
progress = 
options = 
pscreen = 
stfp = 
stgp = 
sttcp = 
sttep = 
__PRETTY_FUNCTION__ = "gl_program* get_mesa_program(gl_context*,
gl_shader_program*, gl_shader*)"
#3  0x7fffe9161054 in st_link_shader (ctx=0xbde0d20,
prog=0x7fff8d7f90b0) at state_tracker/st_glsl_to_tgsi.cpp:6398
linked_prog = 0x0
i = 4
pscreen = 0xb86fb30
__PRETTY_FUNCTION__ = "GLboolean st_link_shader(gl_context*,
gl_shader_program*)"
#4  0x7fffe917823a in _mesa_glsl_link_shader (ctx=0xbde0d20,
prog=0x7fff8d7f90b0) at program/ir_to_mesa.cpp:2963
i = 
#5  0x7fffe909357a in link_program (ctx=0xbde0d20, program=) at main/shaderapi.c:1048
shProg = 
__func__ = "link_program"
#6  0x0243a81a in ?? ()
No symbol table info available.
#7  0x0243802c in ?? ()
No symbol table info available.
#8  0x024385dd in ?? ()
No symbol table info available.
#9  0x021da358 in ?? ()
No symbol table info available.
#10 0x021dacb4 in ?? ()
No symbol table info available.
#11 0x021c3512 in ?? ()
No symbol table info available.
#12 0x005a4dae in ?? ()
No symbol table info available.
#13 0x0053f4f9 in ?? ()
No symbol table info available.
#14 0x0053f567 in ?? ()
No symbol table info available.
#15 0x0059ce4e in ?? ()
No symbol table info available.
#16 0x0236c9d9 in ?? ()
No symbol table info available.
#17 0x0252fee1 in ?? ()
No symbol table info available.
#18 0x774ba4a4 in start_thread () from /usr/lib/libpthread.so.0
No symbol table info available.
#19 0x7594d13d in clone () from /usr/lib/libc.so.6
No symbol table info available.

Re: [Mesa-dev] [PATCH v2 06/20] gallium: add a new interface for pipe_context::launch_grid()

2016-02-07 Thread Ilia Mirkin

On Sun, Feb 7, 2016 at 5:59 AM, Samuel Pitoiset
 wrote:
>
>
> On 02/06/2016 11:04 PM, Samuel Pitoiset wrote:
>>
>> This introduces pipe_grid_info which contains all information to
>> describe a launch_grid call. This will be used to implement indirect
>> compute in the same fashion as indirect draw.
>>
>> Signed-off-by: Samuel Pitoiset 
>> Reviewed-by: Marek Olšák 
>> Reviewed-by: Ilia Mirkin 
>> ---
>>   src/gallium/drivers/ilo/ilo_gpgpu.c|  8 ++
>>   src/gallium/drivers/nouveau/nv50/nv50_compute.c| 16 +--
>>   src/gallium/drivers/nouveau/nv50/nv50_context.h|  3 +-
>>   .../drivers/nouveau/nv50/nv50_query_hw_sm.c| 12 ++--
>>   src/gallium/drivers/nouveau/nvc0/nvc0_compute.c| 19 ++---
>>   src/gallium/drivers/nouveau/nvc0/nvc0_context.h|  6 ++--
>>   .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 12 ++--
>>   src/gallium/drivers/nouveau/nvc0/nve4_compute.c| 10 +++
>>   src/gallium/drivers/r600/evergreen_compute.c   | 19 ++---
>>   src/gallium/drivers/radeonsi/si_compute.c  | 33
>> +++---
>>   src/gallium/include/pipe/p_context.h   | 17 ++-
>>   src/gallium/include/pipe/p_state.h | 27
>> ++
>>   src/gallium/state_trackers/clover/core/kernel.cpp  | 13 +
>>   src/gallium/tests/trivial/compute.c| 11 +++-
>>   14 files changed, 117 insertions(+), 89 deletions(-)
>>
>> diff --git a/src/gallium/drivers/ilo/ilo_gpgpu.c
>> b/src/gallium/drivers/ilo/ilo_gpgpu.c
>> index b741590..ab165b6 100644
>> --- a/src/gallium/drivers/ilo/ilo_gpgpu.c
>> +++ b/src/gallium/drivers/ilo/ilo_gpgpu.c
>> @@ -79,9 +79,7 @@ launch_grid(struct ilo_context *ilo,
>>   }
>>
>>   static void
>> -ilo_launch_grid(struct pipe_context *pipe,
>> -const uint *block_layout, const uint *grid_layout,
>> -uint32_t pc, const void *input)
>> +ilo_launch_grid(struct pipe_context *pipe, const struct pipe_grid_info
>> *info)
>>   {
>>  struct ilo_context *ilo = ilo_context(pipe);
>>  struct ilo_shader_state *cs = ilo->state_vector.cs;
>> @@ -92,13 +90,13 @@ ilo_launch_grid(struct pipe_context *pipe,
>>  input_buf.buffer_size =
>> ilo_shader_get_kernel_param(cs, ILO_KERNEL_CS_INPUT_SIZE);
>>  if (input_buf.buffer_size) {
>> -  u_upload_data(ilo->uploader, 0, input_buf.buffer_size, 16, input,
>> +  u_upload_data(ilo->uploader, 0, input_buf.buffer_size, 16,
>> info->input,
>>   &input_buf.buffer_offset, &input_buf.buffer);
>>  }
>>
>>  ilo_shader_cache_upload(ilo->shader_cache, &ilo->cp->builder);
>>
>> -   launch_grid(ilo, block_layout, grid_layout, &input_buf, pc);
>> +   launch_grid(ilo, info->block, info->grid, &input_buf, info->pc);
>>
>>  ilo_render_invalidate_hw(ilo->render);
>>
>> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_compute.c
>> b/src/gallium/drivers/nouveau/nv50/nv50_compute.c
>> index 6d23fd6..04488d6 100644
>> --- a/src/gallium/drivers/nouveau/nv50/nv50_compute.c
>> +++ b/src/gallium/drivers/nouveau/nv50/nv50_compute.c
>> @@ -270,13 +270,11 @@ nv50_compute_find_symbol(struct nv50_context *nv50,
>> uint32_t label)
>>   }
>>
>>   void
>> -nv50_launch_grid(struct pipe_context *pipe,
>> - const uint *block_layout, const uint *grid_layout,
>> - uint32_t label, const void *input)
>> +nv50_launch_grid(struct pipe_context *pipe, const struct pipe_grid_info
>> *info)
>>   {
>>  struct nv50_context *nv50 = nv50_context(pipe);
>>  struct nouveau_pushbuf *push = nv50->base.pushbuf;
>> -   unsigned block_size = block_layout[0] * block_layout[1] *
>> block_layout[2];
>> +   unsigned block_size = info->block[0] * info->block[1] *
>> info->block[2];
>>  struct nv50_program *cp = nv50->compprog;
>>  bool ret;
>>
>> @@ -286,10 +284,10 @@ nv50_launch_grid(struct pipe_context *pipe,
>> return;
>>  }
>>
>> -   nv50_compute_upload_input(nv50, input);
>> +   nv50_compute_upload_input(nv50, info->input);
>>
>>  BEGIN_NV04(push, NV50_COMPUTE(CP_START_ID), 1);
>> -   PUSH_DATA (push, nv50_compute_find_symbol(nv50, label));
>> +   PUSH_DATA (push, nv50_compute_find_symbol(nv50, info->pc));
>>
>>  BEGIN_NV04(push, NV50_COMPUTE(SHARED_SIZE), 1);
>>  PUSH_DATA (push, align(cp->cp.smem_size + cp->parm_size + 0x10,
>> 0x40));
>> @@ -298,14 +296,14 @@ nv50_launch_grid(struct pipe_context *pipe,
>>
>>  /* grid/block setup */
>>  BEGIN_NV04(push, NV50_COMPUTE(BLOCKDIM_XY), 2);
>> -   PUSH_DATA (push, block_layout[1] << 16 | block_layout[0]);
>> -   PUSH_DATA (push, block_layout[2]);
>> +   PUSH_DATA (push, info->block[1] << 16 | info->block[0]);
>> +   PUSH_DATA (push, info->block[2]);
>>  BEGIN_NV04(push, NV50_COMPUTE(BLOCK_ALLOC), 1);
>>  PUSH_DATA (push, 1 << 16 | block_size);
>>  BEGIN_NV04(push, NV50_COMPUTE(BLOCKDIM_LATCH),

Re: [Mesa-dev] [PATCH v2 06/20] gallium: add a new interface for pipe_context::launch_grid()

2016-02-07 Thread Samuel Pitoiset




On 02/07/2016 07:00 PM, Ilia Mirkin wrote:

On Sun, Feb 7, 2016 at 5:59 AM, Samuel Pitoiset
 wrote:



On 02/06/2016 11:04 PM, Samuel Pitoiset wrote:


This introduces pipe_grid_info which contains all information to
describe a launch_grid call. This will be used to implement indirect
compute in the same fashion as indirect draw.
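
For readers skimming the diff, the shape of the change can be sketched as
follows. This is only an approximation built from the fields the patch
reads through `info->` (pc, input, block, grid); the struct name
grid_info_sketch and the helper functions are made up for illustration and
are not the definitions added to p_state.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the fields the drivers read via `info->`.
struct grid_info_sketch {
    std::uint32_t pc;   // was the `pc` / `label` argument
    const void *input;  // was the `input` argument
    unsigned block[3];  // was `block_layout`
    unsigned grid[3];   // was `grid_layout`
};

// Threads per block, as computed in nv50_launch_grid().
unsigned block_size(const grid_info_sketch &info)
{
    return info.block[0] * info.block[1] * info.block[2];
}

// Packs the X and Y dimensions into one word, Y in the upper 16 bits,
// matching the `dim[1] << 16 | dim[0]` expressions in the diff.
std::uint32_t pack_xy(const unsigned dim[3])
{
    assert(dim[0] < (1u << 16) && dim[1] < (1u << 16));
    return dim[1] << 16 | dim[0];
}
```

Bundling the arguments in one struct means a later field (for example an
indirect buffer) can be added without touching every driver's
launch_grid() signature again.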

Signed-off-by: Samuel Pitoiset 
Reviewed-by: Marek Olšák 
Reviewed-by: Ilia Mirkin 
---
   src/gallium/drivers/ilo/ilo_gpgpu.c|  8 ++
   src/gallium/drivers/nouveau/nv50/nv50_compute.c| 16 +--
   src/gallium/drivers/nouveau/nv50/nv50_context.h|  3 +-
   .../drivers/nouveau/nv50/nv50_query_hw_sm.c| 12 ++--
   src/gallium/drivers/nouveau/nvc0/nvc0_compute.c| 19 ++---
   src/gallium/drivers/nouveau/nvc0/nvc0_context.h|  6 ++--
   .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 12 ++--
   src/gallium/drivers/nouveau/nvc0/nve4_compute.c| 10 +++
   src/gallium/drivers/r600/evergreen_compute.c   | 19 ++---
   src/gallium/drivers/radeonsi/si_compute.c  | 33
+++---
   src/gallium/include/pipe/p_context.h   | 17 ++-
   src/gallium/include/pipe/p_state.h | 27
++
   src/gallium/state_trackers/clover/core/kernel.cpp  | 13 +
   src/gallium/tests/trivial/compute.c| 11 +++-
   14 files changed, 117 insertions(+), 89 deletions(-)

diff --git a/src/gallium/drivers/ilo/ilo_gpgpu.c
b/src/gallium/drivers/ilo/ilo_gpgpu.c
index b741590..ab165b6 100644
--- a/src/gallium/drivers/ilo/ilo_gpgpu.c
+++ b/src/gallium/drivers/ilo/ilo_gpgpu.c
@@ -79,9 +79,7 @@ launch_grid(struct ilo_context *ilo,
   }

   static void
-ilo_launch_grid(struct pipe_context *pipe,
-const uint *block_layout, const uint *grid_layout,
-uint32_t pc, const void *input)
+ilo_launch_grid(struct pipe_context *pipe, const struct pipe_grid_info
*info)
   {
  struct ilo_context *ilo = ilo_context(pipe);
  struct ilo_shader_state *cs = ilo->state_vector.cs;
@@ -92,13 +90,13 @@ ilo_launch_grid(struct pipe_context *pipe,
  input_buf.buffer_size =
 ilo_shader_get_kernel_param(cs, ILO_KERNEL_CS_INPUT_SIZE);
  if (input_buf.buffer_size) {
-  u_upload_data(ilo->uploader, 0, input_buf.buffer_size, 16, input,
+  u_upload_data(ilo->uploader, 0, input_buf.buffer_size, 16,
info->input,
   &input_buf.buffer_offset, &input_buf.buffer);
  }

  ilo_shader_cache_upload(ilo->shader_cache, &ilo->cp->builder);

-   launch_grid(ilo, block_layout, grid_layout, &input_buf, pc);
+   launch_grid(ilo, info->block, info->grid, &input_buf, info->pc);

  ilo_render_invalidate_hw(ilo->render);

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_compute.c
b/src/gallium/drivers/nouveau/nv50/nv50_compute.c
index 6d23fd6..04488d6 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_compute.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_compute.c
@@ -270,13 +270,11 @@ nv50_compute_find_symbol(struct nv50_context *nv50,
uint32_t label)
   }

   void
-nv50_launch_grid(struct pipe_context *pipe,
- const uint *block_layout, const uint *grid_layout,
- uint32_t label, const void *input)
+nv50_launch_grid(struct pipe_context *pipe, const struct pipe_grid_info
*info)
   {
  struct nv50_context *nv50 = nv50_context(pipe);
  struct nouveau_pushbuf *push = nv50->base.pushbuf;
-   unsigned block_size = block_layout[0] * block_layout[1] *
block_layout[2];
+   unsigned block_size = info->block[0] * info->block[1] *
info->block[2];
  struct nv50_program *cp = nv50->compprog;
  bool ret;

@@ -286,10 +284,10 @@ nv50_launch_grid(struct pipe_context *pipe,
 return;
  }

-   nv50_compute_upload_input(nv50, input);
+   nv50_compute_upload_input(nv50, info->input);

  BEGIN_NV04(push, NV50_COMPUTE(CP_START_ID), 1);
-   PUSH_DATA (push, nv50_compute_find_symbol(nv50, label));
+   PUSH_DATA (push, nv50_compute_find_symbol(nv50, info->pc));

  BEGIN_NV04(push, NV50_COMPUTE(SHARED_SIZE), 1);
  PUSH_DATA (push, align(cp->cp.smem_size + cp->parm_size + 0x10,
0x40));
@@ -298,14 +296,14 @@ nv50_launch_grid(struct pipe_context *pipe,

  /* grid/block setup */
  BEGIN_NV04(push, NV50_COMPUTE(BLOCKDIM_XY), 2);
-   PUSH_DATA (push, block_layout[1] << 16 | block_layout[0]);
-   PUSH_DATA (push, block_layout[2]);
+   PUSH_DATA (push, info->block[1] << 16 | info->block[0]);
+   PUSH_DATA (push, info->block[2]);
  BEGIN_NV04(push, NV50_COMPUTE(BLOCK_ALLOC), 1);
  PUSH_DATA (push, 1 << 16 | block_size);
  BEGIN_NV04(push, NV50_COMPUTE(BLOCKDIM_LATCH), 1);
  PUSH_DATA (push, 1);
  BEGIN_NV04(push, NV50_COMPUTE(GRIDDIM), 1);
-   PUSH_DATA (push, grid_layout[1] << 16 | grid_layout[0]);
+   PUSH_DATA (push, info->grid[1] << 16 | info->grid[0]);
  BEGIN_NV04(push,

[Mesa-dev] [PATCH] winsys/radeon: fix a wrong NUM_TILE_PIPES value from the kernel

2016-02-07 Thread Marek Olšák

From: Marek Olšák 

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94019
---
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
index 35dc7e6..49c310c 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
@@ -405,6 +405,12 @@ static boolean do_winsys_init(struct radeon_drm_winsys *ws)
 radeon_get_drm_value(ws->fd, RADEON_INFO_NUM_TILE_PIPES, NULL,
  &ws->info.num_tile_pipes);
 
+/* The kernel returns 12 for some cards for an unknown reason.
+ * I thought this was supposed to be a power of two.
+ */
+if (ws->gen == DRV_SI && ws->info.num_tile_pipes == 12)
+ws->info.num_tile_pipes = 8;
+
 if (radeon_get_drm_value(ws->fd, RADEON_INFO_BACKEND_MAP, NULL,
   &ws->info.r600_gb_backend_map))
 ws->info.r600_gb_backend_map_valid = TRUE;
-- 
2.1.4

[Mesa-dev] [PATCH 06/12] st/nine: Support ATI1/ATI2 for CubeTexture

2016-02-07 Thread Axel Davy

Texture and CubeTexture use common code,
and thus ATI1/ATI2 is already implemented
for CubeTexture.

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/adapter9.c   | 5 +++--
 src/gallium/state_trackers/nine/cubetexture9.c   | 4 
 src/gallium/state_trackers/nine/volumetexture9.c | 2 +-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/src/gallium/state_trackers/nine/adapter9.c 
b/src/gallium/state_trackers/nine/adapter9.c
index 8428b1b..5e9c7f7 100644
--- a/src/gallium/state_trackers/nine/adapter9.c
+++ b/src/gallium/state_trackers/nine/adapter9.c
@@ -338,8 +338,9 @@ NineAdapter9_CheckDeviceFormat( struct NineAdapter9 *This,
 return D3DERR_NOTAVAILABLE;
 }
 
-/* we support ATI1 and ATI2 hack only for 2D textures */
-if (RType != D3DRTYPE_TEXTURE && (CheckFormat == D3DFMT_ATI1 || 
CheckFormat == D3DFMT_ATI2))
+/* we support ATI1 and ATI2 hack only for 2D and Cube textures */
+if (RType != D3DRTYPE_TEXTURE && RType != D3DRTYPE_CUBETEXTURE &&
+(CheckFormat == D3DFMT_ATI1 || CheckFormat == D3DFMT_ATI2))
 return D3DERR_NOTAVAILABLE;
 /* if (Usage & D3DUSAGE_NONSECURE) { don't know the implications of this } 
*/
 /* if (Usage & D3DUSAGE_SOFTWAREPROCESSING) { we can always support this } 
*/
diff --git a/src/gallium/state_trackers/nine/cubetexture9.c 
b/src/gallium/state_trackers/nine/cubetexture9.c
index 1749190..03b5fca 100644
--- a/src/gallium/state_trackers/nine/cubetexture9.c
+++ b/src/gallium/state_trackers/nine/cubetexture9.c
@@ -69,10 +69,6 @@ NineCubeTexture9_ctor( struct NineCubeTexture9 *This,
 if (pf == PIPE_FORMAT_NONE)
 return D3DERR_INVALIDCALL;
 
-/* We support ATI1 and ATI2 hacks only for 2D textures */
-if (Format == D3DFMT_ATI1 || Format == D3DFMT_ATI2)
-return D3DERR_INVALIDCALL;
-
 if (compressed_format(Format)) {
 const unsigned w = util_format_get_blockwidth(pf);
 const unsigned h = util_format_get_blockheight(pf);
diff --git a/src/gallium/state_trackers/nine/volumetexture9.c 
b/src/gallium/state_trackers/nine/volumetexture9.c
index cdec21f..cd94a36 100644
--- a/src/gallium/state_trackers/nine/volumetexture9.c
+++ b/src/gallium/state_trackers/nine/volumetexture9.c
@@ -63,7 +63,7 @@ NineVolumeTexture9_ctor( struct NineVolumeTexture9 *This,
 if (pf == PIPE_FORMAT_NONE)
 return D3DERR_INVALIDCALL;
 
-/* We support ATI1 and ATI2 hacks only for 2D textures */
+/* We support ATI1 and ATI2 hacks only for 2D and Cube textures */
 if (Format == D3DFMT_ATI1 || Format == D3DFMT_ATI2)
 return D3DERR_INVALIDCALL;
 
-- 
2.7.0

[Mesa-dev] [PATCH 05/12] st/nine: Clean pSharedHandle Texture ctors checks

2016-02-07 Thread Axel Davy

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/cubetexture9.c   |  7 ---
 src/gallium/state_trackers/nine/texture9.c   | 25 
 src/gallium/state_trackers/nine/volumetexture9.c |  7 ---
 3 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/src/gallium/state_trackers/nine/cubetexture9.c 
b/src/gallium/state_trackers/nine/cubetexture9.c
index c6fa397..1749190 100644
--- a/src/gallium/state_trackers/nine/cubetexture9.c
+++ b/src/gallium/state_trackers/nine/cubetexture9.c
@@ -54,12 +54,13 @@ NineCubeTexture9_ctor( struct NineCubeTexture9 *This,
 Format, Pool, pSharedHandle);
 
 user_assert(EdgeLength, D3DERR_INVALIDCALL);
-user_assert(!pSharedHandle || Pool == D3DPOOL_DEFAULT, D3DERR_INVALIDCALL);
-user_assert(!(Usage & D3DUSAGE_AUTOGENMIPMAP) ||
-(Pool != D3DPOOL_SYSTEMMEM && Levels <= 1), 
D3DERR_INVALIDCALL);
 
+/* user_assert(!pSharedHandle || Pool == D3DPOOL_DEFAULT, 
D3DERR_INVALIDCALL); */
 user_assert(!pSharedHandle, D3DERR_INVALIDCALL); /* TODO */
 
+user_assert(!(Usage & D3DUSAGE_AUTOGENMIPMAP) ||
+(Pool != D3DPOOL_SYSTEMMEM && Levels <= 1), 
D3DERR_INVALIDCALL);
+
 if (Usage & D3DUSAGE_AUTOGENMIPMAP)
 Levels = 0;
 
diff --git a/src/gallium/state_trackers/nine/texture9.c 
b/src/gallium/state_trackers/nine/texture9.c
index 7338215..3052937 100644
--- a/src/gallium/state_trackers/nine/texture9.c
+++ b/src/gallium/state_trackers/nine/texture9.c
@@ -61,18 +61,22 @@ NineTexture9_ctor( struct NineTexture9 *This,
 d3dformat_to_string(Format), nine_D3DPOOL_to_str(Pool), pSharedHandle);
 
 user_assert(Width && Height, D3DERR_INVALIDCALL);
+
+/* pSharedHandle: can be non-null for ex only.
+ * D3DPOOL_SYSTEMMEM: Levels must be 1
+ * D3DPOOL_DEFAULT: no restriction for Levels
+ * Other Pools are forbidden. */
 user_assert(!pSharedHandle || pParams->device->ex, D3DERR_INVALIDCALL);
-/* When is used shared handle, Pool must be
- * SYSTEMMEM with Levels 1 or DEFAULT with any Levels */
-user_assert(!pSharedHandle || Pool != D3DPOOL_SYSTEMMEM || Levels == 1,
-D3DERR_INVALIDCALL);
-user_assert(!pSharedHandle || Pool == D3DPOOL_SYSTEMMEM || Pool == 
D3DPOOL_DEFAULT,
-D3DERR_INVALIDCALL);
-user_assert((Usage != D3DUSAGE_AUTOGENMIPMAP || Levels <= 1), 
D3DERR_INVALIDCALL);
+user_assert(!pSharedHandle ||
+(Pool == D3DPOOL_SYSTEMMEM && Levels == 1) ||
+Pool == D3DPOOL_DEFAULT, D3DERR_INVALIDCALL);
+
 user_assert(!(Usage & D3DUSAGE_AUTOGENMIPMAP) ||
-(Pool != D3DPOOL_SYSTEMMEM && Levels <= 1), 
D3DERR_INVALIDCALL);
+(Pool != D3DPOOL_SYSTEMMEM && Pool != D3DPOOL_SCRATCH && 
Levels <= 1),
+D3DERR_INVALIDCALL);
 
-/* TODO: implement buffer sharing (should work with cross process too)
+/* TODO: implement pSharedHandle for D3DPOOL_DEFAULT (cross process
+ * buffer sharing).
  *
  * Gem names may have fit but they're depreciated and won't work on 
render-nodes.
  * One solution is to use shm buffers. We would use a /dev/shm file, fill 
the first
@@ -85,9 +89,6 @@ NineTexture9_ctor( struct NineTexture9 *This,
  * invalid handle, that we would fail to import. Please note that we don't 
advertise
  * the flag indicating the support for that feature, but apps seem to not 
care.
  */
-user_assert(!pSharedHandle ||
-Pool == D3DPOOL_SYSTEMMEM ||
-Pool == D3DPOOL_DEFAULT, D3DERR_INVALIDCALL);
 
 if (pSharedHandle && Pool == D3DPOOL_DEFAULT) {
 if (!*pSharedHandle) {
diff --git a/src/gallium/state_trackers/nine/volumetexture9.c 
b/src/gallium/state_trackers/nine/volumetexture9.c
index cdfe7f2..cdec21f 100644
--- a/src/gallium/state_trackers/nine/volumetexture9.c
+++ b/src/gallium/state_trackers/nine/volumetexture9.c
@@ -49,14 +49,15 @@ NineVolumeTexture9_ctor( struct NineVolumeTexture9 *This,
 Usage, Format, Pool, pSharedHandle);
 
 user_assert(Width && Height && Depth, D3DERR_INVALIDCALL);
-user_assert(!pSharedHandle || Pool == D3DPOOL_DEFAULT, D3DERR_INVALIDCALL);
+
+/* user_assert(!pSharedHandle || Pool == D3DPOOL_DEFAULT, 
D3DERR_INVALIDCALL); */
+user_assert(!pSharedHandle, D3DERR_INVALIDCALL); /* TODO */
+
 /* An IDirect3DVolume9 cannot be bound as a render target can it ? */
 user_assert(!(Usage & (D3DUSAGE_RENDERTARGET | D3DUSAGE_DEPTHSTENCIL)),
 D3DERR_INVALIDCALL);
 user_assert(!(Usage & D3DUSAGE_AUTOGENMIPMAP), D3DERR_INVALIDCALL);
 
-user_assert(!pSharedHandle, D3DERR_INVALIDCALL); /* TODO */
-
 pf = d3d9_to_pipe_format_checked(screen, Format, PIPE_TEXTURE_3D, 0,
  PIPE_BIND_SAMPLER_VIEW, FALSE);
 if (pf == PIPE_FORMAT_NONE)
-- 
2.7.0


Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code

2016-02-07 Thread Ilia Mirkin

On Sun, Feb 7, 2016 at 6:26 PM, Axel Davy  wrote:
> On 08/02/2016 00:21, Ilia Mirkin wrote:
>>
>> On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy  wrote:
>>>
>>> SQRT is not supported everywhere, so replace
>>> it by RSQ + RCP
>>>
>>> Signed-off-by: Axel Davy 
>>> ---
>>>   src/gallium/state_trackers/nine/nine_ff.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/gallium/state_trackers/nine/nine_ff.c
>>> b/src/gallium/state_trackers/nine/nine_ff.c
>>> index a5466a7..894fc63 100644
>>> --- a/src/gallium/state_trackers/nine/nine_ff.c
>>> +++ b/src/gallium/state_trackers/nine/nine_ff.c
>>> @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct
>>> vs_build_ctx *vs)
>>>   struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);
>>>
>>>   ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1]));
>>> -ureg_SQRT(ureg, tmp_y, _X(tmp));
>>> +ureg_RSQ(ureg, tmp_y, _X(tmp));
>>> +ureg_RCP(ureg, tmp_y, _Y(tmp));
>>
>> I'd recommend doing
>>
>> ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp))
>>
>> instead. That should be (a) more numerically stable (rcp doesn't have
>> great precision), and (b) not blow up for 0.
>
> Ok for the precision, but I'm not sure for 0
>
> With the mul version, with 0, it ends up computing inf * 0 = NaN,
> whereas with the rcp version, it does 1/inf == 0 (as far as I know),
> which is the expected result.

Hmmm... not sure what RSQ(0) returns actually. I assumed it was NaN.
What you really want is a "flush nan to 0" option on the mul like nvc0
has, but there's no way to express that in TGSI.

Perhaps you can keep the SQRT if PIPE_CAP_TGSI_SQRT is exposed, and
otherwise do the MUL or the RCP. FWIW this is what glsl_to_tgsi does:

 emit_scalar(ir, TGSI_OPCODE_RSQ, result_dst, op[0]);
 emit_asm(ir, TGSI_OPCODE_MUL, result_dst, result_src, op[0]);
 /* For incoming channels <= 0, set the result to 0. */
 op[0].negate = ~op[0].negate;
 emit_asm(ir, TGSI_OPCODE_CMP, result_dst,
  op[0], result_src, st_src_reg_for_float(0.0));
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 02/11] nvc0: bind constant buffers for compute on Fermi

2016-02-07 Thread Ilia Mirkin

Don't these all overwrite one another, i.e. 3d and compute? So don't
you need to adjust the state s.t. everything is dirtied on the "other
side"? (Coincidentally, it seems likely that compute only aliases with
frag shaders.)
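
A rough sketch of the kind of cross-dirtying meant here, assuming (as guessed above) that compute bindings alias the fragment stage. The struct, the stage indices and the function names are hypothetical and only illustrate the bookkeeping; this is not the nvc0 driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stage numbering: fragment = 4, compute = 5. */
enum { EX_STAGE_FRAGMENT = 4, EX_STAGE_COMPUTE = 5, EX_NUM_STAGES = 6 };

struct ex_ctx {
    uint16_t constbuf_valid[EX_NUM_STAGES]; /* slots with a buffer bound */
    uint16_t constbuf_dirty[EX_NUM_STAGES]; /* slots to (re)emit to hardware */
};

/* Called once the bindings of 'validated' have been emitted. If the
 * hardware slots are shared with 'aliased', that stage's bindings were
 * just overwritten, so every valid slot there must be emitted again. */
void ex_validate_and_dirty_alias(struct ex_ctx *ctx, int validated, int aliased)
{
    ctx->constbuf_dirty[validated] = 0;
    ctx->constbuf_dirty[aliased] |= ctx->constbuf_valid[aliased];
}

/* Returns 1 if validating compute re-dirties the fragment bindings. */
int ex_alias_demo(void)
{
    struct ex_ctx ctx = { {0}, {0} };

    ctx.constbuf_valid[EX_STAGE_FRAGMENT] = 0x5;
    ctx.constbuf_valid[EX_STAGE_COMPUTE] = 0x3;
    ctx.constbuf_dirty[EX_STAGE_COMPUTE] = 0x3;

    ex_validate_and_dirty_alias(&ctx, EX_STAGE_COMPUTE, EX_STAGE_FRAGMENT);

    return ctx.constbuf_dirty[EX_STAGE_COMPUTE] == 0 &&
           ctx.constbuf_dirty[EX_STAGE_FRAGMENT] == 0x5;
}
```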

On Sun, Feb 7, 2016 at 3:49 PM, Samuel Pitoiset
 wrote:
> Loosely based on 3D.
>
> Changes from v2:
>  - get rid of the 's' param to nvc0_cb_bo_push() because it doesn't
>matter to upload constbufs for compute using the 3d chan
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 60 
> +
>  src/gallium/drivers/nouveau/nvc0/nvc0_context.c | 11 +++--
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.h  |  2 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_state.c   |  4 +-
>  4 files changed, 72 insertions(+), 5 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
> index 5c7dc0e..5985da5 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
> @@ -138,11 +138,71 @@ nvc0_compute_validate_program(struct nvc0_context *nvc0)
> return false;
>  }
>
> +static void
> +nvc0_compute_validate_constbufs(struct nvc0_context *nvc0)
> +{
> +   struct nouveau_pushbuf *push = nvc0->base.pushbuf;
> +   const int s = 5;
> +
> +   while (nvc0->constbuf_dirty[s]) {
> +  int i = ffs(nvc0->constbuf_dirty[s]) - 1;
> +  nvc0->constbuf_dirty[s] &= ~(1 << i);
> +
> +  if (nvc0->constbuf[s][i].user) {
> + struct nouveau_bo *bo = nvc0->screen->uniform_bo;
> + const unsigned base = s << 16;
> + const unsigned size = nvc0->constbuf[s][0].size;
> + assert(i == 0); /* we really only want OpenGL uniforms here */
> + assert(nvc0->constbuf[s][0].u.data);
> +
> + if (nvc0->state.uniform_buffer_bound[s] < size) {
> +nvc0->state.uniform_buffer_bound[s] = align(size, 0x100);
> +
> +BEGIN_NVC0(push, NVC0_COMPUTE(CB_SIZE), 3);
> +PUSH_DATA (push, nvc0->state.uniform_buffer_bound[s]);
> +PUSH_DATAh(push, bo->offset + base);
> +PUSH_DATA (push, bo->offset + base);
> +BEGIN_NVC0(push, NVC0_COMPUTE(CB_BIND), 1);
> +PUSH_DATA (push, (0 << 8) | 1);
> + }
> + nvc0_cb_bo_push(&nvc0->base, bo, NV_VRAM_DOMAIN(&nvc0->screen->base),
> + base, nvc0->state.uniform_buffer_bound[s],
> + 0, (size + 3) / 4,
> + nvc0->constbuf[s][0].u.data);
> +  } else {
> + struct nv04_resource *res =
> +nv04_resource(nvc0->constbuf[s][i].u.buf);
> + if (res) {
> +BEGIN_NVC0(push, NVC0_COMPUTE(CB_SIZE), 3);
> +PUSH_DATA (push, nvc0->constbuf[s][i].size);
> +PUSH_DATAh(push, res->address + nvc0->constbuf[s][i].offset);
> +PUSH_DATA (push, res->address + nvc0->constbuf[s][i].offset);
> +BEGIN_NVC0(push, NVC0_COMPUTE(CB_BIND), 1);
> +PUSH_DATA (push, (i << 8) | 1);
> +
> +BCTX_REFN(nvc0->bufctx_cp, CP_CB(i), res, RD);
> +
> +res->cb_bindings[s] |= 1 << i;
> + } else {
> +BEGIN_NVC0(push, NVC0_COMPUTE(CB_BIND), 1);
> +PUSH_DATA (push, (i << 8) | 0);
> + }
> + if (i == 0)
> +nvc0->state.uniform_buffer_bound[s] = 0;
> +  }
> +   }
> +
> +   BEGIN_NVC0(push, NVC0_COMPUTE(FLUSH), 1);
> +   PUSH_DATA (push, NVC0_COMPUTE_FLUSH_CB);
> +}
> +
>  static bool
>  nvc0_compute_state_validate(struct nvc0_context *nvc0)
>  {
> if (!nvc0_compute_validate_program(nvc0))
>return false;
> +   if (nvc0->dirty_cp & NVC0_NEW_CP_CONSTBUF)
> +  nvc0_compute_validate_constbufs(nvc0);
>
> /* TODO: textures, samplers, surfaces, global memory buffers */
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
> index 547b8f5..4fed7b2 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c
> @@ -241,15 +241,20 @@ nvc0_invalidate_resource_storage(struct nouveau_context 
> *ctx,
>}
>}
>
> -  for (s = 0; s < 5; ++s) {
> +  for (s = 0; s < 6; ++s) {
>for (i = 0; i < NVC0_MAX_PIPE_CONSTBUFS; ++i) {
>   if (!(nvc0->constbuf_valid[s] & (1 << i)))
>  continue;
>   if (!nvc0->constbuf[s][i].user &&
>   nvc0->constbuf[s][i].u.buf == res) {
> -nvc0->dirty |= NVC0_NEW_CONSTBUF;
>  nvc0->constbuf_dirty[s] |= 1 << i;
> -nouveau_bufctx_reset(nvc0->bufctx_3d, NVC0_BIND_CB(s, i));
> +if (unlikely(s == 5)) {
> +   nvc0->dirty_cp |= NVC0_NEW_CP_CONSTBUF;
> +   nouveau_bufctx_reset(nvc0->bufctx_cp, NVC0_BIND_CP_CB(i));
> +} else {
> +

Re: [Mesa-dev] [android-x86-devel] [PATCH 4/5] android: fix building with new glsl, nir, compiler libraries

2016-02-07 Thread Mauro Rossi

Hi,

Just to close this thread,
I've checked on both marshmallow-x86 and kitkat-x86 builds and the line

$(MESA_TOP)/src/glsl

which is now a non-existent path, is not needed anymore and can be removed
in LOCAL_C_INCLUDES for all the following android makefiles:

./src/mesa/Android.libmesa_glsl_utils.mk
./src/mesa/Android.libmesa_st_mesa.mk
./src/mesa/program/Android.mk
./src/mesa/Android.mesa_gen_matypes.mk
./src/gallium/drivers/r300/Android.mk

My doubt was whether the new correct path $(MESA_TOP)/src/compiler/glsl had
to be included instead, but I've checked and, without the new glsl path, it
builds just fine with both marshmallow-x86 and kitkat-x86.

The doubt came from automatic header picking, which behaves differently in
L/M than in kitkat. I just wanted to be sure.

Mauro

PS: A different build error appeared this week; I'll submit a separate
patch for it in the next few days, if needed.

The $(MESA_TOP)/src/mesa/main path is needed in LOCAL_C_INCLUDES of
./src/mesa/program/Android.mk
to avoid the following build error:

external/mesa/src/mesa/program/prog_statevars.c:43:25: fatal error:
framebuffer.h: No such file or directory
compilation terminated.
build/core/binary.mk:512: recipe for target
'out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_program_intermediates/prog_statevars.o'
failed
make: ***
[out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_program_intermediates/prog_statevars.o]
Error 1

Re: [Mesa-dev] [PATCH 00/12] Some more Nine fixes

2016-02-07 Thread Axel Davy


The last patch is awaiting moderation because of its size;
you can find it here:
https://github.com/iXit/Mesa-3D/commit/29e2ccf64273814071655d84aca69b6496fbb4bd


On 08/02/2016 00:13, Axel Davy wrote:

A few more patches I'd like to get in 11.2.

There are a few cleanup patches and some fixes.

The last patch fixes the 32-bit build with llvm
when it isn't built with -mstackrealign.
Basically, apps have a 4-byte-aligned stack,
and it needs to be converted at some point
to a 16-byte-aligned stack for SSE code
and llvm to work correctly. I think the best
approach is to just realign at the d3d entry points.
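
As an illustration of realigning at entry points, here is a minimal sketch that uses GCC's force_align_arg_pointer function attribute on 32-bit x86. The macro and function names are hypothetical, and this is not necessarily how the patch itself does it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: only 32-bit x86 builds need the attribute, which
 * makes the function prologue realign the stack if necessary. */
#if defined(__GNUC__) && defined(__i386__)
#define EX_ALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define EX_ALIGN_STACK
#endif

/* Hypothetical entry point that may be called by an app whose stack is
 * only 4-byte aligned. Returns 1 if a local that requires 16-byte
 * alignment (as SSE spills do) is actually aligned. */
EX_ALIGN_STACK int ex_entry_point(void)
{
    _Alignas(16) float v[4] = { 1.0f, 2.0f, 3.0f, 4.0f };

    return ((uintptr_t)v % 16) == 0;
}

/* Self-check: the over-aligned local must be 16-byte aligned. */
void ex_entry_point_selfcheck(void)
{
    assert(ex_entry_point() == 1);
}
```

Tagging only the public entry points keeps the realignment cost out of internal helpers, which can then assume an aligned stack.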

Any suggestions on whether the last patch should
be sent to mesa stable?

Yours.

Axel Davy (12):
   st/nine: Do not set resource usage for SYSTEMMEM
   st/nine: Do not set SHARED flag for shared textures.
   st/nine: Clean useless code in texture9.c
   st/nine: Move texture creation checks
   st/nine: Clean pSharedHandle Texture ctors checks
   st/nine: Support ATI1/ATI2 for CubeTexture
   st/nine: Add format checks to create_zs_or_rt_surface
   st/nine: SCRATCH does support all formats
   st/nine: Fix stateblocks crashes with lights
   st/nine: Remove usage of SQRT in ff code
   st/nine: Drop path for ureg_NRM and ureg_CLAMP
   st/nine: Align stack for entry points

  src/gallium/state_trackers/nine/adapter9.c |  44 +-
  src/gallium/state_trackers/nine/adapter9.h |  18 +-
  .../state_trackers/nine/authenticatedchannel9.c|  10 +-
  .../state_trackers/nine/authenticatedchannel9.h|  10 +-
  src/gallium/state_trackers/nine/basetexture9.c |  14 +-
  src/gallium/state_trackers/nine/basetexture9.h |  14 +-
  src/gallium/state_trackers/nine/buffer9.c  |   4 +-
  src/gallium/state_trackers/nine/buffer9.h  |   4 +-
  src/gallium/state_trackers/nine/cryptosession9.c   |  18 +-
  src/gallium/state_trackers/nine/cryptosession9.h   |  18 +-
  src/gallium/state_trackers/nine/cubetexture9.c |  25 +-
  src/gallium/state_trackers/nine/cubetexture9.h |  10 +-
  src/gallium/state_trackers/nine/device9.c  | 250 ++--
  src/gallium/state_trackers/nine/device9.h  | 232 +--
  src/gallium/state_trackers/nine/device9ex.c|  34 +-
  src/gallium/state_trackers/nine/device9ex.h|  36 +-
  src/gallium/state_trackers/nine/device9video.c |   6 +-
  src/gallium/state_trackers/nine/device9video.h |   6 +-
  src/gallium/state_trackers/nine/indexbuffer9.c |   6 +-
  src/gallium/state_trackers/nine/indexbuffer9.h |   6 +-
  src/gallium/state_trackers/nine/iunknown.c |   8 +-
  src/gallium/state_trackers/nine/iunknown.h |   9 +-
  src/gallium/state_trackers/nine/nine_alignment.h   |  14 +
  src/gallium/state_trackers/nine/nine_ff.c  |  31 +-
  src/gallium/state_trackers/nine/nine_lock.c| 444 ++---
  src/gallium/state_trackers/nine/nine_pipe.h|   8 +-
  .../state_trackers/nine/nineexoverlayextension.c   |   2 +-
  .../state_trackers/nine/nineexoverlayextension.h   |   2 +-
  src/gallium/state_trackers/nine/pixelshader9.c |   2 +-
  src/gallium/state_trackers/nine/pixelshader9.h |   2 +-
  src/gallium/state_trackers/nine/query9.c   |   8 +-
  src/gallium/state_trackers/nine/query9.h   |   8 +-
  src/gallium/state_trackers/nine/resource9.c|  14 +-
  src/gallium/state_trackers/nine/resource9.h|  14 +-
  src/gallium/state_trackers/nine/stateblock9.c  |  44 +-
  src/gallium/state_trackers/nine/stateblock9.h  |   4 +-
  src/gallium/state_trackers/nine/surface9.c |  26 +-
  src/gallium/state_trackers/nine/surface9.h |  12 +-
  src/gallium/state_trackers/nine/swapchain9.c   |  20 +-
  src/gallium/state_trackers/nine/swapchain9.h   |  12 +-
  src/gallium/state_trackers/nine/swapchain9ex.c |   6 +-
  src/gallium/state_trackers/nine/swapchain9ex.h |   6 +-
  src/gallium/state_trackers/nine/texture9.c |  48 +--
  src/gallium/state_trackers/nine/texture9.h |  10 +-
  src/gallium/state_trackers/nine/vertexbuffer9.c|   6 +-
  src/gallium/state_trackers/nine/vertexbuffer9.h|   6 +-
  .../state_trackers/nine/vertexdeclaration9.c   |   2 +-
  .../state_trackers/nine/vertexdeclaration9.h   |   2 +-
  src/gallium/state_trackers/nine/vertexshader9.c|   2 +-
  src/gallium/state_trackers/nine/vertexshader9.h|   2 +-
  src/gallium/state_trackers/nine/volume9.c  |  20 +-
  src/gallium/state_trackers/nine/volume9.h  |  14 +-
  src/gallium/state_trackers/nine/volumetexture9.c   |  23 +-
  src/gallium/state_trackers/nine/volumetexture9.h   |  10 +-
  54 files changed, 812 insertions(+), 794 deletions(-)
  create mode 100644 src/gallium/state_trackers/nine/nine_alignment.h




[Mesa-dev] [PATCH 08/12] st/nine: SCRATCH does support all formats

2016-02-07 Thread Axel Davy

Add a new argument to d3d9_to_pipe_format_checked to
allow bypassing format support checks. This argument
is set to TRUE when the requested Pool is SCRATCH.
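
The idea can be sketched as follows. This is a simplified stand-alone illustration: the enum, the support query and the function names are hypothetical stand-ins, not the real d3d9_to_pipe_format_checked signature.

```c
#include <assert.h>
#include <stdbool.h>

enum ex_format { EX_FORMAT_NONE = 0, EX_FORMAT_A, EX_FORMAT_B };

/* Hypothetical stand-in for the driver's format-support query:
 * pretend that only EX_FORMAT_A is supported. */
static bool ex_is_format_supported(enum ex_format f)
{
    return f == EX_FORMAT_A;
}

/* Returns the format if it is supported or if the caller asked to
 * bypass the check, and EX_FORMAT_NONE otherwise. */
enum ex_format ex_format_checked(enum ex_format f, bool bypass_check)
{
    if (bypass_check || ex_is_format_supported(f))
        return f;
    return EX_FORMAT_NONE;
}

/* Self-check: an unsupported format is rejected unless bypassed. */
void ex_format_checked_selfcheck(void)
{
    assert(ex_format_checked(EX_FORMAT_B, false) == EX_FORMAT_NONE);
    assert(ex_format_checked(EX_FORMAT_B, true) == EX_FORMAT_B);
}
```

A caller would pass something like `pool == SCRATCH` for the last argument, so that SCRATCH resources accept every format while other pools still go through the support query.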

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/adapter9.c   | 21 +++--
 src/gallium/state_trackers/nine/cubetexture9.c   |  4 +++-
 src/gallium/state_trackers/nine/device9.c|  2 +-
 src/gallium/state_trackers/nine/nine_pipe.h  |  8 ++--
 src/gallium/state_trackers/nine/surface9.c   |  3 ++-
 src/gallium/state_trackers/nine/swapchain9.c |  8 
 src/gallium/state_trackers/nine/texture9.c   |  4 +++-
 src/gallium/state_trackers/nine/volume9.c|  3 ++-
 src/gallium/state_trackers/nine/volumetexture9.c |  4 +++-
 9 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/src/gallium/state_trackers/nine/adapter9.c 
b/src/gallium/state_trackers/nine/adapter9.c
index 5e9c7f7..e677c7b 100644
--- a/src/gallium/state_trackers/nine/adapter9.c
+++ b/src/gallium/state_trackers/nine/adapter9.c
@@ -207,11 +207,11 @@ NineAdapter9_CheckDeviceType( struct NineAdapter9 *This,
 dfmt = d3d9_to_pipe_format_checked(screen, AdapterFormat, PIPE_TEXTURE_2D,
1,
PIPE_BIND_DISPLAY_TARGET |
-   PIPE_BIND_SHARED, FALSE);
+   PIPE_BIND_SHARED, FALSE, FALSE);
 bfmt = d3d9_to_pipe_format_checked(screen, BackBufferFormat, 
PIPE_TEXTURE_2D,
1,
PIPE_BIND_DISPLAY_TARGET |
-   PIPE_BIND_SHARED, FALSE);
+   PIPE_BIND_SHARED, FALSE, FALSE);
 if (dfmt == PIPE_FORMAT_NONE || bfmt == PIPE_FORMAT_NONE) {
 DBG("Unsupported Adapter/BackBufferFormat.\n");
 return D3DERR_NOTAVAILABLE;
@@ -270,7 +270,7 @@ NineAdapter9_CheckDeviceFormat( struct NineAdapter9 *This,
 return hr;
 pf = d3d9_to_pipe_format_checked(screen, AdapterFormat, PIPE_TEXTURE_2D, 0,
  PIPE_BIND_DISPLAY_TARGET |
- PIPE_BIND_SHARED, FALSE);
+ PIPE_BIND_SHARED, FALSE, FALSE);
 if (pf == PIPE_FORMAT_NONE) {
 DBG("AdapterFormat %s not available.\n",
 d3dformat_to_string(AdapterFormat));
@@ -332,7 +332,8 @@ NineAdapter9_CheckDeviceFormat( struct NineAdapter9 *This,
 
 
 srgb = (Usage & (D3DUSAGE_QUERY_SRGBREAD | D3DUSAGE_QUERY_SRGBWRITE)) != 0;
-pf = d3d9_to_pipe_format_checked(screen, CheckFormat, target, 0, bind, 
srgb);
+pf = d3d9_to_pipe_format_checked(screen, CheckFormat, target,
+ 0, bind, srgb, FALSE);
 if (pf == PIPE_FORMAT_NONE) {
 DBG("NOT AVAILABLE\n");
 return D3DERR_NOTAVAILABLE;
@@ -379,7 +380,7 @@ NineAdapter9_CheckDeviceMultiSampleType( struct 
NineAdapter9 *This,
PIPE_BIND_TRANSFER_WRITE | PIPE_BIND_RENDER_TARGET;
 
 pf = d3d9_to_pipe_format_checked(screen, SurfaceFormat, PIPE_TEXTURE_2D,
- MultiSampleType, bind, FALSE);
+ MultiSampleType, bind, FALSE, FALSE);
 
 if (pf == PIPE_FORMAT_NONE) {
 DBG("%s with %u samples not available.\n",
@@ -418,16 +419,16 @@ NineAdapter9_CheckDepthStencilMatch( struct NineAdapter9 
*This,
 
 dfmt = d3d9_to_pipe_format_checked(screen, AdapterFormat, PIPE_TEXTURE_2D, 
0,
PIPE_BIND_DISPLAY_TARGET |
-   PIPE_BIND_SHARED, FALSE);
+   PIPE_BIND_SHARED, FALSE, FALSE);
 bfmt = d3d9_to_pipe_format_checked(screen, RenderTargetFormat,
PIPE_TEXTURE_2D, 0,
-   PIPE_BIND_RENDER_TARGET, FALSE);
+   PIPE_BIND_RENDER_TARGET, FALSE, FALSE);
 if (RenderTargetFormat == D3DFMT_NULL)
 bfmt = dfmt;
 zsfmt = d3d9_to_pipe_format_checked(screen, DepthStencilFormat,
 PIPE_TEXTURE_2D, 0,
 
d3d9_get_pipe_depth_format_bindings(DepthStencilFormat),
-FALSE);
+FALSE, FALSE);
 if (dfmt == PIPE_FORMAT_NONE ||
 bfmt == PIPE_FORMAT_NONE ||
 zsfmt == PIPE_FORMAT_NONE) {
@@ -462,10 +463,10 @@ NineAdapter9_CheckDeviceFormatConversion( struct 
NineAdapter9 *This,
 
 dfmt = d3d9_to_pipe_format_checked(screen, TargetFormat, PIPE_TEXTURE_2D, 
1,
PIPE_BIND_DISPLAY_TARGET |
-   PIPE_BIND_SHARED, FALSE);
+   PIPE_BIND_SHARED, FALSE, FALSE);
 bfmt =

[Mesa-dev] [PATCH 07/12] st/nine: Add format checks to create_zs_or_rt_surface

2016-02-07 Thread Axel Davy

Return INVALIDCALL when trying to create a surface
with an unsupported format.

In practice, apps are supposed to check for format
support before trying to create a render target
of that format. However, some badly behaved apps
may just try to create the surface and, if that
fails, deduce that the format isn't supported.

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/device9.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/state_trackers/nine/device9.c 
b/src/gallium/state_trackers/nine/device9.c
index b6e75b4..3ebff3a 100644
--- a/src/gallium/state_trackers/nine/device9.c
+++ b/src/gallium/state_trackers/nine/device9.c
@@ -1126,6 +1126,9 @@ create_zs_or_rt_surface(struct NineDevice9 *This,
templ.nr_samples, templ.bind,
FALSE);
 
+if (templ.format == PIPE_FORMAT_NONE && Format != D3DFMT_NULL)
+return D3DERR_INVALIDCALL;
+
 desc.Format = Format;
 desc.Type = D3DRTYPE_SURFACE;
 desc.Usage = 0;
-- 
2.7.0


[Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code

2016-02-07 Thread Axel Davy

SQRT is not supported everywhere, so replace
it with RSQ + RCP.

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/nine_ff.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/nine/nine_ff.c 
b/src/gallium/state_trackers/nine/nine_ff.c
index a5466a7..894fc63 100644
--- a/src/gallium/state_trackers/nine/nine_ff.c
+++ b/src/gallium/state_trackers/nine/nine_ff.c
@@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct 
vs_build_ctx *vs)
 struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);
 
 ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1]));
-ureg_SQRT(ureg, tmp_y, _X(tmp));
+ureg_RSQ(ureg, tmp_y, _X(tmp));
+ureg_RCP(ureg, tmp_y, _Y(tmp));
 ureg_MAD(ureg, tmp_x, _Y(tmp), _(cPsz2), _(cPsz2));
 ureg_MAD(ureg, tmp_x, _Y(tmp), _X(tmp), _(cPsz1));
 ureg_RCP(ureg, tmp_x, ureg_src(tmp));
-- 
2.7.0


[Mesa-dev] [PATCH 09/12] st/nine: Fix stateblocks crashes with lights

2016-02-07 Thread Axel Davy

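We had several crashes when stateblocks were used with lights.
Grow both the source and the destination light arrays to the
same size before copying, which should fix them.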
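
A simplified sketch of the approach: grow both light arrays to a common size, mark the new entries invalid, then copy only the lights that the mask marks as valid. The types and names are hypothetical, and unlike the patch below this sketch marks new entries invalid on both sides in every case.

```c
#include <assert.h>
#include <stdlib.h>

enum { EX_LIGHT_INVALID = -1 };

struct ex_light { int type; float intensity; };
struct ex_lights { struct ex_light *light; unsigned num; };

/* Grow the array to n entries and mark the new ones invalid.
 * Returns 0 on allocation failure, 1 otherwise. */
static int ex_grow(struct ex_lights *s, unsigned n)
{
    struct ex_light *p;
    unsigned i;

    if (s->num >= n)
        return 1;
    p = realloc(s->light, n * sizeof(*p));
    if (!p)
        return 0;
    for (i = s->num; i < n; ++i) {
        p[i].type = EX_LIGHT_INVALID;
        p[i].intensity = 0.0f;
    }
    s->light = p;
    s->num = n;
    return 1;
}

/* Copy src lights into dst wherever mask marks the light as valid.
 * mask must alias either src or dst, so after growing both, all three
 * hold the same number of lights and no index is out of bounds. */
int ex_copy_lights(struct ex_lights *dst, struct ex_lights *src,
                   const struct ex_lights *mask)
{
    unsigned n = dst->num > src->num ? dst->num : src->num;
    unsigned i;

    if (!ex_grow(dst, n) || !ex_grow(src, n))
        return 0;
    for (i = 0; i < n; ++i)
        if (mask->light[i].type != EX_LIGHT_INVALID)
            dst->light[i] = src->light[i];
    return 1;
}

/* Returns 1 if copying 3 lights into an empty destination works. */
int ex_lights_demo(void)
{
    struct ex_lights dst = { NULL, 0 }, src = { NULL, 0 };
    int ok;

    if (!ex_grow(&src, 3))
        return 0;
    src.light[2].type = 1;
    src.light[2].intensity = 0.5f;

    ok = ex_copy_lights(&dst, &src, &src) &&
         dst.num == 3 &&
         dst.light[0].type == EX_LIGHT_INVALID &&
         dst.light[2].type == 1;

    free(dst.light);
    free(src.light);
    return ok;
}
```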

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/stateblock9.c | 40 +--
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/src/gallium/state_trackers/nine/stateblock9.c 
b/src/gallium/state_trackers/nine/stateblock9.c
index 0d1a04b..4789346 100644
--- a/src/gallium/state_trackers/nine/stateblock9.c
+++ b/src/gallium/state_trackers/nine/stateblock9.c
@@ -86,7 +86,7 @@ NineStateBlock9_dtor( struct NineStateBlock9 *This )
  */
 static void
 nine_state_copy_common(struct nine_state *dst,
-   const struct nine_state *src,
+   struct nine_state *src,
struct nine_state *mask, /* aliases either src or dst */
const boolean apply,
struct nine_range_pool *pool)
@@ -267,17 +267,41 @@ nine_state_copy_common(struct nine_state *dst,
 }
 }
 if (mask->changed.group & NINE_STATE_FF_LIGHTING) {
-if (dst->ff.num_lights < mask->ff.num_lights) {
+unsigned num_lights = MAX2(dst->ff.num_lights, src->ff.num_lights);
+/* Can happen in Capture() if device state has created new lights after
+ * the stateblock was created.
+ * Can happen in Apply() if the stateblock had recorded the creation of
+ * new lights. */
+if (dst->ff.num_lights < num_lights) {
 dst->ff.light = REALLOC(dst->ff.light,
 dst->ff.num_lights * sizeof(D3DLIGHT9),
-mask->ff.num_lights * sizeof(D3DLIGHT9));
-for (i = dst->ff.num_lights; i < mask->ff.num_lights; ++i) {
-memset(&dst->ff.light[i], 0, sizeof(D3DLIGHT9));
-dst->ff.light[i].Type = (D3DLIGHTTYPE)NINED3DLIGHT_INVALID;
+num_lights * sizeof(D3DLIGHT9));
+memset(&dst->ff.light[dst->ff.num_lights], 0, (num_lights - dst->ff.num_lights) * sizeof(D3DLIGHT9));
+/* if mask == dst, a Type of 0 will trigger
+ * "dst->ff.light[i] = src->ff.light[i];" later,
+ * which is what we want in that case. */
+if (mask != dst) {
+for (i = src->ff.num_lights; i < num_lights; ++i)
+src->ff.light[i].Type = (D3DLIGHTTYPE)NINED3DLIGHT_INVALID;
 }
-dst->ff.num_lights = mask->ff.num_lights;
+dst->ff.num_lights = num_lights;
 }
-for (i = 0; i < mask->ff.num_lights; ++i)
+/* Can happen in Capture() if the stateblock had recorded the creation of
+ * new lights.
+ * Can happen in Apply() if device state has created new lights after
+ * the stateblock was created. */
+if (src->ff.num_lights < num_lights) {
+src->ff.light = REALLOC(src->ff.light,
+src->ff.num_lights * sizeof(D3DLIGHT9),
+num_lights * sizeof(D3DLIGHT9));
+memset(&src->ff.light[src->ff.num_lights], 0, (num_lights - src->ff.num_lights) * sizeof(D3DLIGHT9));
+for (i = src->ff.num_lights; i < num_lights; ++i)
+src->ff.light[i].Type = (D3DLIGHTTYPE)NINED3DLIGHT_INVALID;
+src->ff.num_lights = num_lights;
+}
+/* Note: mask is either src or dst, so at this point src, dst and mask
+ * have num_lights lights. */
+for (i = 0; i < num_lights; ++i)
 if (mask->ff.light[i].Type != NINED3DLIGHT_INVALID)
 dst->ff.light[i] = src->ff.light[i];
 
-- 
2.7.0


[Mesa-dev] [PATCH 04/12] st/nine: Move texture creation checks

2016-02-07 Thread Axel Davy

We had checks both in the Create*Texture functions
and in the ctors.

Move all Create*Texture checks to the ctors.

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/cubetexture9.c   |  2 ++
 src/gallium/state_trackers/nine/device9.c| 13 -
 src/gallium/state_trackers/nine/texture9.c   |  9 +
 src/gallium/state_trackers/nine/volumetexture9.c |  2 ++
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/gallium/state_trackers/nine/cubetexture9.c 
b/src/gallium/state_trackers/nine/cubetexture9.c
index 460cc85..c6fa397 100644
--- a/src/gallium/state_trackers/nine/cubetexture9.c
+++ b/src/gallium/state_trackers/nine/cubetexture9.c
@@ -53,6 +53,8 @@ NineCubeTexture9_ctor( struct NineCubeTexture9 *This,
 This, pParams, EdgeLength, Levels, Usage,
 Format, Pool, pSharedHandle);
 
+user_assert(EdgeLength, D3DERR_INVALIDCALL);
+user_assert(!pSharedHandle || Pool == D3DPOOL_DEFAULT, D3DERR_INVALIDCALL);
 user_assert(!(Usage & D3DUSAGE_AUTOGENMIPMAP) ||
 (Pool != D3DPOOL_SYSTEMMEM && Levels <= 1), 
D3DERR_INVALIDCALL);
 
diff --git a/src/gallium/state_trackers/nine/device9.c 
b/src/gallium/state_trackers/nine/device9.c
index 475ef96..b6e75b4 100644
--- a/src/gallium/state_trackers/nine/device9.c
+++ b/src/gallium/state_trackers/nine/device9.c
@@ -925,15 +925,6 @@ NineDevice9_CreateTexture( struct NineDevice9 *This,
  D3DUSAGE_SOFTWAREPROCESSING | D3DUSAGE_TEXTAPI;
 
 *ppTexture = NULL;
-user_assert(Width && Height, D3DERR_INVALIDCALL);
-user_assert(!pSharedHandle || This->ex, D3DERR_INVALIDCALL);
-/* When is used shared handle, Pool must be
- * SYSTEMMEM with Levels 1 or DEFAULT with any Levels */
-user_assert(!pSharedHandle || Pool != D3DPOOL_SYSTEMMEM || Levels == 1,
-D3DERR_INVALIDCALL);
-user_assert(!pSharedHandle || Pool == D3DPOOL_SYSTEMMEM || Pool == 
D3DPOOL_DEFAULT,
-D3DERR_INVALIDCALL);
-user_assert((Usage != D3DUSAGE_AUTOGENMIPMAP || Levels <= 1), 
D3DERR_INVALIDCALL);
 
 hr = NineTexture9_new(This, Width, Height, Levels, Usage, Format, Pool,
   , pSharedHandle);
@@ -967,8 +958,6 @@ NineDevice9_CreateVolumeTexture( struct NineDevice9 *This,
  D3DUSAGE_SOFTWAREPROCESSING;
 
 *ppVolumeTexture = NULL;
-user_assert(Width && Height && Depth, D3DERR_INVALIDCALL);
-user_assert(!pSharedHandle || Pool == D3DPOOL_DEFAULT, D3DERR_INVALIDCALL);
 
 hr = NineVolumeTexture9_new(This, Width, Height, Depth, Levels,
 Usage, Format, Pool, , pSharedHandle);
@@ -1001,8 +990,6 @@ NineDevice9_CreateCubeTexture( struct NineDevice9 *This,
  D3DUSAGE_SOFTWAREPROCESSING;
 
 *ppCubeTexture = NULL;
-user_assert(EdgeLength, D3DERR_INVALIDCALL);
-user_assert(!pSharedHandle || Pool == D3DPOOL_DEFAULT, D3DERR_INVALIDCALL);
 
 hr = NineCubeTexture9_new(This, EdgeLength, Levels, Usage, Format, Pool,
   , pSharedHandle);
diff --git a/src/gallium/state_trackers/nine/texture9.c 
b/src/gallium/state_trackers/nine/texture9.c
index 0bc37d3..7338215 100644
--- a/src/gallium/state_trackers/nine/texture9.c
+++ b/src/gallium/state_trackers/nine/texture9.c
@@ -60,6 +60,15 @@ NineTexture9_ctor( struct NineTexture9 *This,
 nine_D3DUSAGE_to_str(Usage),
 d3dformat_to_string(Format), nine_D3DPOOL_to_str(Pool), pSharedHandle);
 
+user_assert(Width && Height, D3DERR_INVALIDCALL);
+user_assert(!pSharedHandle || pParams->device->ex, D3DERR_INVALIDCALL);
+/* When is used shared handle, Pool must be
+ * SYSTEMMEM with Levels 1 or DEFAULT with any Levels */
+user_assert(!pSharedHandle || Pool != D3DPOOL_SYSTEMMEM || Levels == 1,
+D3DERR_INVALIDCALL);
+user_assert(!pSharedHandle || Pool == D3DPOOL_SYSTEMMEM || Pool == 
D3DPOOL_DEFAULT,
+D3DERR_INVALIDCALL);
+user_assert((Usage != D3DUSAGE_AUTOGENMIPMAP || Levels <= 1), 
D3DERR_INVALIDCALL);
 user_assert(!(Usage & D3DUSAGE_AUTOGENMIPMAP) ||
 (Pool != D3DPOOL_SYSTEMMEM && Levels <= 1), 
D3DERR_INVALIDCALL);
 
diff --git a/src/gallium/state_trackers/nine/volumetexture9.c 
b/src/gallium/state_trackers/nine/volumetexture9.c
index e5b2b53..cdfe7f2 100644
--- a/src/gallium/state_trackers/nine/volumetexture9.c
+++ b/src/gallium/state_trackers/nine/volumetexture9.c
@@ -48,6 +48,8 @@ NineVolumeTexture9_ctor( struct NineVolumeTexture9 *This,
 This, pParams, Width, Height, Depth, Levels,
 Usage, Format, Pool, pSharedHandle);
 
+user_assert(Width && Height && Depth, D3DERR_INVALIDCALL);
+user_assert(!pSharedHandle || Pool == D3DPOOL_DEFAULT, D3DERR_INVALIDCALL);
 /* An IDirect3DVolume9 cannot be bound as a render target can it ? */
 user_assert(!(Usage & (D3DUSAGE_RENDERTARGET | D3DUSAGE_DEPTHSTENCIL)),
 D3DERR_INVALIDCALL);
-- 
2.7.0

[Mesa-dev] [PATCH 01/12] st/nine: Do not set resource usage for SYSTEMMEM

2016-02-07 Thread Axel Davy

We do not create a resource for SYSTEMMEM textures,
thus we do not need to set resource usage.

The only exception is SYSTEMMEM vertex buffers, since
we do use a pipe resource for them.

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/surface9.c | 11 ---
 src/gallium/state_trackers/nine/texture9.c |  3 ---
 src/gallium/state_trackers/nine/volume9.c  |  3 ---
 3 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/src/gallium/state_trackers/nine/surface9.c 
b/src/gallium/state_trackers/nine/surface9.c
index f88b75c..ce0f74c 100644
--- a/src/gallium/state_trackers/nine/surface9.c
+++ b/src/gallium/state_trackers/nine/surface9.c
@@ -116,13 +116,10 @@ NineSurface9_ctor( struct NineSurface9 *This,
 return E_OUTOFMEMORY;
 }
 
-if (pDesc->Pool == D3DPOOL_SYSTEMMEM) {
-This->base.info.usage = PIPE_USAGE_STAGING;
-assert(!pResource);
-} else {
-if (pResource && (pDesc->Usage & D3DUSAGE_DYNAMIC))
-pResource->flags |= NINE_RESOURCE_FLAG_LOCKABLE;
-}
+assert(pDesc->Pool != D3DPOOL_SYSTEMMEM || !pResource);
+
+if (pResource && (pDesc->Usage & D3DUSAGE_DYNAMIC))
+pResource->flags |= NINE_RESOURCE_FLAG_LOCKABLE;
 
 hr = NineResource9_ctor(&This->base, pParams, pResource, FALSE, D3DRTYPE_SURFACE,
 pDesc->Pool, pDesc->Usage);
diff --git a/src/gallium/state_trackers/nine/texture9.c 
b/src/gallium/state_trackers/nine/texture9.c
index ada08ce..a11dad4 100644
--- a/src/gallium/state_trackers/nine/texture9.c
+++ b/src/gallium/state_trackers/nine/texture9.c
@@ -143,9 +143,6 @@ NineTexture9_ctor( struct NineTexture9 *This,
 if (pSharedHandle)
 info->bind |= PIPE_BIND_SHARED;
 
-if (Pool == D3DPOOL_SYSTEMMEM)
-info->usage = PIPE_USAGE_STAGING;
-
 if (pSharedHandle && *pSharedHandle) { /* Pool == D3DPOOL_SYSTEMMEM */
 user_buffer = (void *)*pSharedHandle;
 level_offsets = alloca(sizeof(unsigned) * (info->last_level + 1));
diff --git a/src/gallium/state_trackers/nine/volume9.c 
b/src/gallium/state_trackers/nine/volume9.c
index f698892..5ef1141 100644
--- a/src/gallium/state_trackers/nine/volume9.c
+++ b/src/gallium/state_trackers/nine/volume9.c
@@ -116,9 +116,6 @@ NineVolume9_ctor( struct NineVolume9 *This,
 This->layer_stride = util_format_get_2d_size(This->info.format,
  This->stride, pDesc->Height);
 
-if (pDesc->Pool == D3DPOOL_SYSTEMMEM)
-This->info.usage = PIPE_USAGE_STAGING;
-
 if (!This->resource) {
 hr = NineVolume9_AllocateData(This);
 if (FAILED(hr))
-- 
2.7.0


[Mesa-dev] [PATCH 00/12] Some more Nine fixes

2016-02-07 Thread Axel Davy

A few more patches I'd like to get in 11.2.

There are a few cleanup patches and some fixes.

The last patch fixes the 32-bit build with llvm
when it isn't built with -mstackrealign.
Basically, apps have a 4-byte-aligned stack,
and it needs to be converted at some point
to a 16-byte-aligned stack for SSE code
and llvm to work correctly. I think the best
approach is to just realign at the d3d entry points.

Any suggestions on whether the last patch should
be sent to mesa stable?

Yours.

Axel Davy (12):
  st/nine: Do not set resource usage for SYSTEMMEM
  st/nine: Do not set SHARED flag for shared textures.
  st/nine: Clean useless code in texture9.c
  st/nine: Move texture creation checks
  st/nine: Clean pSharedHandle Texture ctors checks
  st/nine: Support ATI1/ATI2 for CubeTexture
  st/nine: Add format checks to create_zs_or_rt_surface
  st/nine: SCRATCH does support all formats
  st/nine: Fix stateblocks crashes with lights
  st/nine: Remove usage of SQRT in ff code
  st/nine: Drop path for ureg_NRM and ureg_CLAMP
  st/nine: Align stack for entry points

 src/gallium/state_trackers/nine/adapter9.c |  44 +-
 src/gallium/state_trackers/nine/adapter9.h |  18 +-
 .../state_trackers/nine/authenticatedchannel9.c|  10 +-
 .../state_trackers/nine/authenticatedchannel9.h|  10 +-
 src/gallium/state_trackers/nine/basetexture9.c |  14 +-
 src/gallium/state_trackers/nine/basetexture9.h |  14 +-
 src/gallium/state_trackers/nine/buffer9.c  |   4 +-
 src/gallium/state_trackers/nine/buffer9.h  |   4 +-
 src/gallium/state_trackers/nine/cryptosession9.c   |  18 +-
 src/gallium/state_trackers/nine/cryptosession9.h   |  18 +-
 src/gallium/state_trackers/nine/cubetexture9.c |  25 +-
 src/gallium/state_trackers/nine/cubetexture9.h |  10 +-
 src/gallium/state_trackers/nine/device9.c  | 250 ++--
 src/gallium/state_trackers/nine/device9.h  | 232 +--
 src/gallium/state_trackers/nine/device9ex.c|  34 +-
 src/gallium/state_trackers/nine/device9ex.h|  36 +-
 src/gallium/state_trackers/nine/device9video.c |   6 +-
 src/gallium/state_trackers/nine/device9video.h |   6 +-
 src/gallium/state_trackers/nine/indexbuffer9.c |   6 +-
 src/gallium/state_trackers/nine/indexbuffer9.h |   6 +-
 src/gallium/state_trackers/nine/iunknown.c |   8 +-
 src/gallium/state_trackers/nine/iunknown.h |   9 +-
 src/gallium/state_trackers/nine/nine_alignment.h   |  14 +
 src/gallium/state_trackers/nine/nine_ff.c  |  31 +-
 src/gallium/state_trackers/nine/nine_lock.c| 444 ++---
 src/gallium/state_trackers/nine/nine_pipe.h|   8 +-
 .../state_trackers/nine/nineexoverlayextension.c   |   2 +-
 .../state_trackers/nine/nineexoverlayextension.h   |   2 +-
 src/gallium/state_trackers/nine/pixelshader9.c |   2 +-
 src/gallium/state_trackers/nine/pixelshader9.h |   2 +-
 src/gallium/state_trackers/nine/query9.c   |   8 +-
 src/gallium/state_trackers/nine/query9.h   |   8 +-
 src/gallium/state_trackers/nine/resource9.c|  14 +-
 src/gallium/state_trackers/nine/resource9.h|  14 +-
 src/gallium/state_trackers/nine/stateblock9.c  |  44 +-
 src/gallium/state_trackers/nine/stateblock9.h  |   4 +-
 src/gallium/state_trackers/nine/surface9.c |  26 +-
 src/gallium/state_trackers/nine/surface9.h |  12 +-
 src/gallium/state_trackers/nine/swapchain9.c   |  20 +-
 src/gallium/state_trackers/nine/swapchain9.h   |  12 +-
 src/gallium/state_trackers/nine/swapchain9ex.c |   6 +-
 src/gallium/state_trackers/nine/swapchain9ex.h |   6 +-
 src/gallium/state_trackers/nine/texture9.c |  48 +--
 src/gallium/state_trackers/nine/texture9.h |  10 +-
 src/gallium/state_trackers/nine/vertexbuffer9.c|   6 +-
 src/gallium/state_trackers/nine/vertexbuffer9.h|   6 +-
 .../state_trackers/nine/vertexdeclaration9.c   |   2 +-
 .../state_trackers/nine/vertexdeclaration9.h   |   2 +-
 src/gallium/state_trackers/nine/vertexshader9.c|   2 +-
 src/gallium/state_trackers/nine/vertexshader9.h|   2 +-
 src/gallium/state_trackers/nine/volume9.c  |  20 +-
 src/gallium/state_trackers/nine/volume9.h  |  14 +-
 src/gallium/state_trackers/nine/volumetexture9.c   |  23 +-
 src/gallium/state_trackers/nine/volumetexture9.h   |  10 +-
 54 files changed, 812 insertions(+), 794 deletions(-)
 create mode 100644 src/gallium/state_trackers/nine/nine_alignment.h

-- 
2.7.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 02/12] st/nine: Do not set SHARED flag for shared textures.

2016-02-07 Thread Axel Davy

We do not support shared textures, thus no need to set
the shared flag.

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/texture9.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/gallium/state_trackers/nine/texture9.c 
b/src/gallium/state_trackers/nine/texture9.c
index a11dad4..6d1f897 100644
--- a/src/gallium/state_trackers/nine/texture9.c
+++ b/src/gallium/state_trackers/nine/texture9.c
@@ -140,9 +140,6 @@ NineTexture9_ctor( struct NineTexture9 *This,
 DBG("Application asked for Software Vertex Processing, "
 "but this is unimplemented\n");
 
-if (pSharedHandle)
-info->bind |= PIPE_BIND_SHARED;
-
 if (pSharedHandle && *pSharedHandle) { /* Pool == D3DPOOL_SYSTEMMEM */
 user_buffer = (void *)*pSharedHandle;
 level_offsets = alloca(sizeof(unsigned) * (info->last_level + 1));
-- 
2.7.0


[Mesa-dev] [PATCH 11/12] st/nine: Drop path for ureg_NRM and ureg_CLAMP

2016-02-07 Thread Axel Davy

Using MIN/MAX instead of CLAMP is fine.
NRM doesn't exist anymore.
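
As a rough sketch of why the two-instruction form is equivalent, here is the same clamp in plain C. The names lo and hi are assumptions made for the example; which component of cPsz1 holds which bound is not visible in this excerpt.

```c
/* Helpers standing in for the MAX and MIN opcodes. */
static float max_f(float a, float b) { return a > b ? a : b; }
static float min_f(float a, float b) { return a < b ? a : b; }

/* Clamp x to [lo, hi] in the same two steps the patch emits:
 * a MAX against the lower bound, then a MIN against the upper bound.
 * Assumes lo <= hi. */
float clamp_min_max(float x, float lo, float hi)
{
    float t = max_f(x, lo);   /* MAX tmp, x, lo */
    return min_f(t, hi);      /* MIN dst, tmp, hi */
}
```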

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/nine_ff.c | 28 
 1 file changed, 4 insertions(+), 24 deletions(-)

diff --git a/src/gallium/state_trackers/nine/nine_ff.c 
b/src/gallium/state_trackers/nine/nine_ff.c
index 894fc63..a9d9e75 100644
--- a/src/gallium/state_trackers/nine/nine_ff.c
+++ b/src/gallium/state_trackers/nine/nine_ff.c
@@ -24,8 +24,6 @@
 #include "util/u_hash_table.h"
 #include "util/u_upload_mgr.h"
 
-#define NINE_TGSI_LAZY_DEVS 1
-
 #define DBG_CHANNEL DBG_FF
 
 #define NINE_FF_NUM_VS_CONST 256
@@ -319,15 +317,11 @@ ureg_normalize3(struct ureg_program *ureg,
 struct ureg_dst dst, struct ureg_src src,
 struct ureg_dst tmp)
 {
-#ifdef NINE_TGSI_LAZY_DEVS
 struct ureg_dst tmp_x = ureg_writemask(tmp, TGSI_WRITEMASK_X);
 
 ureg_DP3(ureg, tmp_x, src, src);
 ureg_RSQ(ureg, tmp_x, _X(tmp));
 ureg_MUL(ureg, dst, src, _X(tmp));
-#else
-ureg_NRM(ureg, dst, src);
-#endif
 }
 
 static void *
@@ -549,15 +543,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct 
vs_build_ctx *vs)
  */
 if (key->vertexpointsize) {
 struct ureg_src cPsz1 = ureg_DECL_constant(ureg, 26);
-#ifdef NINE_TGSI_LAZY_DEVS
-struct ureg_dst tmp_clamp = ureg_DECL_temporary(ureg);
-
-ureg_MAX(ureg, tmp_clamp, vs->aPsz, _(cPsz1));
-ureg_MIN(ureg, oPsz, ureg_src(tmp_clamp), _(cPsz1));
-ureg_release_temporary(ureg, tmp_clamp);
-#else
-ureg_CLAMP(ureg, oPsz, vs->aPsz, _(cPsz1), _(cPsz1));
-#endif
+ureg_MAX(ureg, tmp_x, _(vs->aPsz), _(cPsz1));
+ureg_MIN(ureg, oPsz, _X(tmp), _(cPsz1));
 } else if (key->pointscale) {
 struct ureg_src cPsz1 = ureg_DECL_constant(ureg, 26);
 struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);
@@ -569,15 +556,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct 
vs_build_ctx *vs)
 ureg_MAD(ureg, tmp_x, _Y(tmp), _X(tmp), _(cPsz1));
 ureg_RCP(ureg, tmp_x, ureg_src(tmp));
 ureg_MUL(ureg, tmp_x, ureg_src(tmp), _(cPsz1));
-#ifdef NINE_TGSI_LAZY_DEVS
-struct ureg_dst tmp_clamp = ureg_DECL_temporary(ureg);
-
-ureg_MAX(ureg, tmp_clamp, _X(tmp), _(cPsz1));
-ureg_MIN(ureg, oPsz, ureg_src(tmp_clamp), _(cPsz1));
-ureg_release_temporary(ureg, tmp_clamp);
-#else
-ureg_CLAMP(ureg, oPsz, _X(tmp), _(cPsz1), _(cPsz1));
-#endif
+ureg_MAX(ureg, tmp_x, _X(tmp), _(cPsz1));
+ureg_MIN(ureg, oPsz, _X(tmp), _(cPsz1));
 }
 
 for (i = 0; i < 8; ++i) {
-- 
2.7.0


[Mesa-dev] [PATCH 03/12] st/nine: Clean useless code in texture9.c

2016-02-07 Thread Axel Davy

This->base.base.resource is NULL
for SYSTEMMEM textures.

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/texture9.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/gallium/state_trackers/nine/texture9.c 
b/src/gallium/state_trackers/nine/texture9.c
index 6d1f897..0bc37d3 100644
--- a/src/gallium/state_trackers/nine/texture9.c
+++ b/src/gallium/state_trackers/nine/texture9.c
@@ -48,7 +48,6 @@ NineTexture9_ctor( struct NineTexture9 *This,
 {
 struct pipe_screen *screen = pParams->device->screen;
 struct pipe_resource *info = &This->base.base.info;
-struct pipe_resource *resource;
 enum pipe_format pf;
 unsigned *level_offsets;
 unsigned l;
@@ -182,11 +181,6 @@ NineTexture9_ctor( struct NineTexture9 *This,
 sfdesc.MultiSampleType = D3DMULTISAMPLE_NONE;
 sfdesc.MultiSampleQuality = 0;
 
-if (Pool == D3DPOOL_SYSTEMMEM)
-resource = NULL;
-else
-resource = This->base.base.resource;
-
 for (l = 0; l <= info->last_level; ++l) {
 sfdesc.Width = u_minify(Width, l);
 sfdesc.Height = u_minify(Height, l);
@@ -196,7 +190,7 @@ NineTexture9_ctor( struct NineTexture9 *This,
 level_offsets[l] : NULL;
 
 hr = NineSurface9_new(This->base.base.base.device, NineUnknown(This),
-  resource, user_buffer_for_level,
+  This->base.base.resource, user_buffer_for_level,
   D3DRTYPE_TEXTURE, l, 0,
   &sfdesc, &This->surfaces[l]);
 if (FAILED(hr))
-- 
2.7.0


Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code

2016-02-07 Thread Ilia Mirkin

On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy  wrote:
> SQRT is not supported everywhere, so replace
> it by RSQ + RCP
>
> Signed-off-by: Axel Davy 
> ---
>  src/gallium/state_trackers/nine/nine_ff.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/state_trackers/nine/nine_ff.c 
> b/src/gallium/state_trackers/nine/nine_ff.c
> index a5466a7..894fc63 100644
> --- a/src/gallium/state_trackers/nine/nine_ff.c
> +++ b/src/gallium/state_trackers/nine/nine_ff.c
> @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct 
> vs_build_ctx *vs)
>  struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);
>
>  ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1]));
> -ureg_SQRT(ureg, tmp_y, _X(tmp));
> +ureg_RSQ(ureg, tmp_y, _X(tmp));
> +ureg_RCP(ureg, tmp_y, _Y(tmp));

I'd recommend doing

ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp))

instead. That should be (a) more numerically stable (rcp doesn't have
great precision), and (b) not blow up for 0.

>  ureg_MAD(ureg, tmp_x, _Y(tmp), _(cPsz2), _(cPsz2));
>  ureg_MAD(ureg, tmp_x, _Y(tmp), _X(tmp), _(cPsz1));
>  ureg_RCP(ureg, tmp_x, ureg_src(tmp));
> --
> 2.7.0
>

[Mesa-dev] [Bug 94016] make check MesaExtensionsTest.AlphabeticallySorted regression

2016-02-07 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=94016

Vinson Lee  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Vinson Lee  ---
commit ccaf734275ede89bfc86f274a64570be715fed94
Author: Vinson Lee 
Date:   Fri Feb 5 23:16:31 2016 -0800

mesa/extensions: Fix NVX_gpu_memory_info lexicographical order.

Fixes MesaExtensionsTest.AlphabeticallySorted.

Fixes: 1d79b9958090 ("mesa: implement GL_NVX_gpu_memory_info (v2)")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94016
Signed-off-by: Vinson Lee 
    Reviewed-by: Marek Olšák

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code

2016-02-07 Thread Axel Davy


On 08/02/2016 00:21, Ilia Mirkin wrote:

> On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy  wrote:
>> SQRT is not supported everywhere, so replace
>> it by RSQ + RCP
>>
>> Signed-off-by: Axel Davy 
>> ---
>>  src/gallium/state_trackers/nine/nine_ff.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/state_trackers/nine/nine_ff.c 
>> b/src/gallium/state_trackers/nine/nine_ff.c
>> index a5466a7..894fc63 100644
>> --- a/src/gallium/state_trackers/nine/nine_ff.c
>> +++ b/src/gallium/state_trackers/nine/nine_ff.c
>> @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct 
>> vs_build_ctx *vs)
>>  struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);
>>
>>  ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1]));
>> -ureg_SQRT(ureg, tmp_y, _X(tmp));
>> +ureg_RSQ(ureg, tmp_y, _X(tmp));
>> +ureg_RCP(ureg, tmp_y, _Y(tmp));
>
> I'd recommend doing
>
> ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp))
>
> instead. That should be (a) more numerically stable (rcp doesn't have
> great precision), and (b) not blow up for 0.

OK on the precision, but I'm not sure about 0.

With the mul version, an input of 0 ends up computing inf * 0 = NaN,
whereas the rcp version computes 1/inf == 0 (as far as I know),
which is the expected result.
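
The two variants can be compared in plain C floating point. This is only a sketch under IEEE 754 assumptions; actual RSQ/RCP results on a given GPU may differ. The function names are made up for the example, and the value returned by rsq_of_zero is the assumed result of RSQ(0).

```c
#include <math.h>   /* INFINITY, isnan (both are macros) */

/* Assumed result of RSQ(0): 1/sqrt(+0) is +infinity under IEEE 754. */
float rsq_of_zero(void)
{
    return INFINITY;
}

/* sqrt(x) rebuilt from rsq = 1/sqrt(x) by taking the reciprocal (RSQ + RCP). */
float sqrt_via_rcp(float rsq)
{
    return 1.0f / rsq;
}

/* sqrt(x) rebuilt as x * (1/sqrt(x)) (RSQ + MUL). */
float sqrt_via_mul(float x, float rsq)
{
    return x * rsq;
}
```

For x = 4 (rsq = 0.5) both return 2. For x = 0, sqrt_via_rcp(rsq_of_zero()) is 1/inf = 0, while sqrt_via_mul(0.0f, rsq_of_zero()) is 0 * inf = NaN.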




>>  ureg_MAD(ureg, tmp_x, _Y(tmp), _(cPsz2), _(cPsz2));
>>  ureg_MAD(ureg, tmp_x, _Y(tmp), _X(tmp), _(cPsz1));
>>  ureg_RCP(ureg, tmp_x, ureg_src(tmp));
>> --
>> 2.7.0

Re: [Mesa-dev] [llvm] r259796 - [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads

2016-02-07 Thread Simon Pilgrim

Michel, thanks for the report, this should be fixed by rL259991.

> On 5 Feb 2016, at 10:10, Michel Dänzer  wrote:
> 
> 
> Hi Simon,
> 
> 
> On 05.02.2016 01:12, Simon Pilgrim via llvm-commits wrote:
>> Author: rksimon
>> Date: Thu Feb  4 10:12:56 2016
>> New Revision: 259796
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=259796&view=rev
>> Log:
>> [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to 
>> EltsFromConsecutiveLoads
>> 
>> This patch adds support for consecutive (load/undef elements) 32-bit loads, 
>> followed by trailing undef/zero elements to be combined to a single MOVD 
>> load.
>> 
>> Differential Revision: http://reviews.llvm.org/D16729
> 
> This change introduced an assertion failure with the Mesa llvmpipe
> driver unit test lp_test_format. See below for information about the
> CPU, the IR, the assertion failure and the backtrace.
> 
> 
> processor : 0
> vendor_id : AuthenticAMD
> cpu family: 21
> model : 48
> model name: AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
> stepping  : 1
> microcode : 0x6003106
> cpu MHz   : 4100.000
> cache size: 2048 KB
> physical id   : 0
> siblings  : 4
> core id   : 0
> cpu cores : 2
> apicid: 16
> initial apicid: 0
> fpu   : yes
> fpu_exception : yes
> cpuid level   : 13
> wp: yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
> rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf 
> eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave 
> avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
> 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext 
> perfctr_core perfctr_nb bpext arat cpb hw_pstate npt lbrv svm_lock nrip_save 
> tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold 
> vmmcall fsgsbase bmi1 xsaveopt
> bugs  : fxsave_leak sysret_ss_attrs
> bogomips  : 8200.55
> TLB size  : 1536 4K pages
> clflush size  : 64
> cache_alignment   : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
> 
> 
> define void @fetch_r32_unorm_float(<4 x float>*, i8*, i32, i32, { [2048 x 
> i32], [128 x i64] }*) {
> entry:
>  %5 = getelementptr i8, i8* %1, i32 0
>  %6 = bitcast i8* %5 to i32*
>  %7 = load i32, i32* %6
>  %8 = insertelement <4 x i32> undef, i32 %7, i32 0
>  %9 = shufflevector <4 x i32> %8, <4 x i32> undef, <4 x i32> zeroinitializer
>  %10 = lshr <4 x i32> %9, 
>  %11 = and <4 x i32> %10, 
>  %12 = uitofp <4 x i32> %11 to <4 x float>
>  %13 = fmul <4 x float> %12,  float 0.00e+00, float 0.00e+00>
>  %14 = shufflevector <4 x float> %13, <4 x float>  1.00e+00, float undef, float undef>, <4 x i32>  5>
>  store <4 x float> %14, <4 x float>* %0
>  ret void
> }
> 
> 
> lp_test_format: ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5776: 
> llvm::SDNode* llvm::SelectionDAG::UpdateNodeOperands(llvm::SDNode*, 
> llvm::SDValue, llvm::SDValue): Assertion `N->getNumOperands() == 2 && "Update 
> with wrong number of operands"' failed.
> 
> Program received signal SIGABRT, Aborted.
> 0x745a7507 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/unix/sysv/linux/raise.c:55
> 55../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  0x745a7507 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/unix/sysv/linux/raise.c:55
> #1  0x745a88da in __GI_abort () at abort.c:89
> #2  0x745a059d in __assert_fail_base (fmt=0x746dd6b8 "%s%s%s:%u: 
> %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x76c99f70 
> "N->getNumOperands() == 2 && \"Update with wrong number of operands\"", 
>file=file@entry=0x76c97f40 
> "../lib/CodeGen/SelectionDAG/SelectionDAG.cpp", line=line@entry=5776, 
>function=function@entry=0x76ca3bc0 
>  llvm::SDValue)::__PRETTY_FUNCTION__> "llvm::SDNode* 
> llvm::SelectionDAG::UpdateNodeOperands(llvm::SDNode*, llvm::SDValue, 
> llvm::SDValue)")
>at assert.c:92
> #3  0x745a0652 in __GI___assert_fail 
> (assertion=assertion@entry=0x76c99f70 "N->getNumOperands() == 2 && 
> \"Update with wrong number of operands\"", file=file@entry=0x76c97f40 
> "../lib/CodeGen/SelectionDAG/SelectionDAG.cpp", line=line@entry=5776, 
>function=function@entry=0x76ca3bc0 
>  llvm::SDValue)::__PRETTY_FUNCTION__> "llvm::SDNode* 
> llvm::SelectionDAG::UpdateNodeOperands(llvm::SDNode*, llvm::SDValue, 
> llvm::SDValue)")
>at assert.c:101
> #4  0x75f87451 in llvm::SelectionDAG::UpdateNodeOperands 
> (this=, N=N@entry=0x8af230, Op1=..., Op2=...) at 
>

Re: [Mesa-dev] [PATCH 01/12] nvc0: allocate an area for compute user constbufs

2016-02-07 Thread Michael Schellenberger Costa

Hi,

On 06/02/2016 at 23:38, Samuel Pitoiset wrote:
> For compute shaders, we might need to upload uniforms.
> 
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 14 +++---
>  src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c | 12 ++--
>  src/gallium/drivers/nouveau/nvc0/nvc0_tex.c|  2 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c| 10 ++
>  4 files changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> index 2b12de4..84e4253 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> @@ -889,7 +889,7 @@ nvc0_screen_create(struct nouveau_device *dev)
>  */
> nouveau_heap_init(&screen->text_heap, 0, (1 << 20) - 0x100);
>  
> -   ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 1 << 12, 6 << 
> 16, NULL,
> +   ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 1 << 12, 7 << 
> 16, NULL,
>  &screen->uniform_bo);

Aren't there any enums for those magic numbers, here and below?

> if (ret)
>goto fail;
> @@ -901,8 +901,8 @@ nvc0_screen_create(struct nouveau_device *dev)
>/* auxiliary constants (6 user clip planes, base instance id) */
>BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
>PUSH_DATA (push, 1024);
> -  PUSH_DATAh(push, screen->uniform_bo->offset + (5 << 16) + (i << 10));
> -  PUSH_DATA (push, screen->uniform_bo->offset + (5 << 16) + (i << 10));
> +  PUSH_DATAh(push, screen->uniform_bo->offset + (6 << 16) + (i << 10));
> +  PUSH_DATA (push, screen->uniform_bo->offset + (6 << 16) + (i << 10));
The pattern (N << 16) + (M << 10) seems repetitive; would a helper make
sense here? (It might help to avoid the magic numbers.)

Michael
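
One possible shape for such a helper, as a sketch only. The macro and function names are hypothetical and do not exist in the nouveau driver; the shift amounts are simply the ones used in the quoted patch.

```c
#include <stdint.h>

/* Hypothetical names; the values mirror the shifts in the quoted patch:
 * regions of 1 << 16 bytes, each holding slots of 1 << 10 bytes. */
#define UNIFORM_REGION_SHIFT 16
#define UNIFORM_SLOT_SHIFT   10

/* Byte offset of slot `slot` inside region `region` of the uniform buffer. */
static inline uint64_t uniform_slot_offset(unsigned region, unsigned slot)
{
    return ((uint64_t)region << UNIFORM_REGION_SHIFT) +
           ((uint64_t)slot << UNIFORM_SLOT_SHIFT);
}
```

A call site would then read something like `bo->offset + uniform_slot_offset(6, s)`, and moving the auxiliary area from region 5 to region 6 would be a one-line change if the region number were a named constant.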

>BEGIN_NVC0(push, NVC0_3D(CB_BIND(i)), 1);
>PUSH_DATA (push, (15 << 4) | 1);
>if (screen->eng3d->oclass >= NVE4_3D_CLASS) {
> @@ -922,8 +922,8 @@ nvc0_screen_create(struct nouveau_device *dev)
> /* return { 0.0, 0.0, 0.0, 0.0 } for out-of-bounds vtxbuf access */
> BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
> PUSH_DATA (push, 256);
> -   PUSH_DATAh(push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
> -   PUSH_DATA (push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
> +   PUSH_DATAh(push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));
> +   PUSH_DATA (push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));
> BEGIN_1IC0(push, NVC0_3D(CB_POS), 5);
> PUSH_DATA (push, 0);
> PUSH_DATAf(push, 0.0f);
> @@ -931,8 +931,8 @@ nvc0_screen_create(struct nouveau_device *dev)
> PUSH_DATAf(push, 0.0f);
> PUSH_DATAf(push, 0.0f);
> BEGIN_NVC0(push, NVC0_3D(VERTEX_RUNOUT_ADDRESS_HIGH), 2);
> -   PUSH_DATAh(push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
> -   PUSH_DATA (push, screen->uniform_bo->offset + (5 << 16) + (6 << 10));
> +   PUSH_DATAh(push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));
> +   PUSH_DATA (push, screen->uniform_bo->offset + (6 << 16) + (6 << 10));
>  
> if (screen->base.drm->version >= 0x01000101) {
>ret = nouveau_getparam(dev, NOUVEAU_GETPARAM_GRAPH_UNITS, );
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
> index c17223a..2bb9b44 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
> @@ -184,8 +184,8 @@ nvc0_validate_fb(struct nvc0_context *nvc0)
>  ms = 1 << ms_mode;
>  BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
>  PUSH_DATA (push, 1024);
> -PUSH_DATAh(push, nvc0->screen->uniform_bo->offset + (5 << 16) + (4 << 
> 10));
> -PUSH_DATA (push, nvc0->screen->uniform_bo->offset + (5 << 16) + (4 << 
> 10));
> +PUSH_DATAh(push, nvc0->screen->uniform_bo->offset + (6 << 16) + (4 << 
> 10));
> +PUSH_DATA (push, nvc0->screen->uniform_bo->offset + (6 << 16) + (4 << 
> 10));
>  BEGIN_1IC0(push, NVC0_3D(CB_POS), 1 + 2 * ms);
>  PUSH_DATA (push, 256 + 128);
>  for (i = 0; i < ms; i++) {
> @@ -318,8 +318,8 @@ nvc0_upload_uclip_planes(struct nvc0_context *nvc0, 
> unsigned s)
>  
> BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
> PUSH_DATA (push, 1024);
> -   PUSH_DATAh(push, bo->offset + (5 << 16) + (s << 10));
> -   PUSH_DATA (push, bo->offset + (5 << 16) + (s << 10));
> +   PUSH_DATAh(push, bo->offset + (6 << 16) + (s << 10));
> +   PUSH_DATA (push, bo->offset + (6 << 16) + (s << 10));
> BEGIN_1IC0(push, NVC0_3D(CB_POS), PIPE_MAX_CLIP_PLANES * 4 + 1);
> PUSH_DATA (push, 256);
> PUSH_DATAp(push, &nvc0->clip.ucp[0][0], PIPE_MAX_CLIP_PLANES * 4);
> @@ -479,8 +479,8 @@ nvc0_validate_buffers(struct nvc0_context *nvc0)
> for (s = 0; s < 5; s++) {
>BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
>PUSH_DATA (push, 1024);
> -

[Mesa-dev] [PATCH 3/5] mesa: drop unused nonconst sampler functions.

2016-02-07 Thread Dave Airlie

Since we fixed the glsl->tgsi conversion we no longer need
this function.

Signed-off-by: Dave Airlie 
---
 src/mesa/program/sampler.cpp | 10 --
 src/mesa/program/sampler.h   |  4 
 2 files changed, 14 deletions(-)

diff --git a/src/mesa/program/sampler.cpp b/src/mesa/program/sampler.cpp
index f118552..994495a 100644
--- a/src/mesa/program/sampler.cpp
+++ b/src/mesa/program/sampler.cpp
@@ -132,13 +132,3 @@ _mesa_get_sampler_uniform_value(class ir_dereference 
*sampler,
   getname.offset;
 }
 
-
-class ir_rvalue *
-_mesa_get_sampler_array_nonconst_index(class ir_dereference *sampler)
-{
-   ir_dereference_array *deref_arr = sampler->as_dereference_array();
-   if (!deref_arr || deref_arr->array_index->as_constant())
-  return NULL;
-
-   return deref_arr->array_index;
-}
diff --git a/src/mesa/program/sampler.h b/src/mesa/program/sampler.h
index 61c7f58..397805a 100644
--- a/src/mesa/program/sampler.h
+++ b/src/mesa/program/sampler.h
@@ -32,8 +32,4 @@ _mesa_get_sampler_uniform_value(class ir_dereference *sampler,
struct gl_shader_program *shader_program,
const struct gl_program *prog);
 
-class ir_rvalue *
-_mesa_get_sampler_array_nonconst_index(class ir_dereference *sampler);
-
-
 #endif /* SAMPLER_H */
-- 
2.5.0


[Mesa-dev] gallium AoA and indirect samplers fixes (second posting)

2016-02-07 Thread Dave Airlie

Since the last set posted, I've taken Timothy's review on board,
discovered the double life of var->data.location and managed
to find a balance between base/index issues that doesn't regress.

This should work a lot better, despite increasing the size of
ir_variable.

Dave.


[Mesa-dev] [PATCH 2/5] st/mesa: handle indirect samplers in arrays/structs properly (v4)

2016-02-07 Thread Dave Airlie

From: Dave Airlie 

The state tracker never handled this properly, and it finally
annoyed me for the second time so I decided to fix it properly.

This is inspired by the NIR sampler lowering code. I only realised at
the last minute that NIR seems to order its derefs differently from
GLSL; once I got that, things got much easier.

It fixes a bunch of tests in
tests/spec/arb_gpu_shader5/execution/sampler_array_indexing/

v2: fix AoA tests when forced on.
I was right that I didn't need all that code; fixing the AoA code
meant cleaning up a chunk of code I didn't like in the array
handling.

v3: start generalising the code a bit more for atomics.
v3.1: use UniformRemapTable

v4: handle uniforms differently using the param_index,
and go back to UniformStorage
fix issues identified by Timothy with deref handling.

Signed-off-by: Dave Airlie 
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 136 +
 1 file changed, 118 insertions(+), 18 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 4b5f2a3..9e268bc 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -40,7 +40,6 @@
 #include "main/uniforms.h"
 #include "main/shaderapi.h"
 #include "program/prog_instruction.h"
-#include "program/sampler.h"
 
 #include "pipe/p_context.h"
 #include "pipe/p_screen.h"
@@ -257,6 +256,7 @@ public:
GLboolean cond_update;
bool saturate;
st_src_reg sampler; /**< sampler register */
+   int sampler_base;
int sampler_array_size; /**< 1-based size of sampler array, 1 if not array 
*/
int tex_target; /**< One of TEXTURE_*_INDEX */
glsl_base_type tex_type;
@@ -502,6 +502,19 @@ public:
 
void emit_arl(ir_instruction *ir, st_dst_reg dst, st_src_reg src0);
 
+   void get_deref_offsets(ir_dereference *ir,
+  unsigned *array_size,
+  unsigned *base,
+  unsigned *index,
+  st_src_reg *reladdr);
+  void calc_deref_offsets(ir_dereference *head,
+  ir_dereference *tail,
+  unsigned *array_elements,
+  unsigned *base,
+  unsigned *index,
+  st_src_reg *indirect,
+  unsigned *location);
+
bool try_emit_mad(ir_expression *ir,
   int mul_operand);
bool try_emit_mad_for_and_not(ir_expression *ir,
@@ -3437,17 +3450,107 @@ glsl_to_tgsi_visitor::visit(ir_call *ir)
 }
 
 void
+glsl_to_tgsi_visitor::calc_deref_offsets(ir_dereference *head,
+ ir_dereference *tail,
+ unsigned *array_elements,
+ unsigned *base,
+ unsigned *index,
+ st_src_reg *indirect,
+ unsigned *location)
+{
+   switch (tail->ir_type) {
+   case ir_type_dereference_record: {
+  ir_dereference_record *deref_record = tail->as_dereference_record();
+  const glsl_type *struct_type = deref_record->record->type;
+  int field_index = 
deref_record->record->type->field_index(deref_record->field);
+
+  calc_deref_offsets(head, deref_record->record->as_dereference(), 
array_elements, base, index, indirect, location);
+
+  assert(field_index >= 0);
+  *location += struct_type->record_location_offset(field_index);
+  break;
+   }
+
+   case ir_type_dereference_array: {
+  ir_dereference_array *deref_arr = tail->as_dereference_array();
+  ir_constant *array_index = 
deref_arr->array_index->constant_expression_value();
+
+  if (!array_index) {
+ st_src_reg temp_reg;
+ st_dst_reg temp_dst;
+
+ temp_reg = get_temp(glsl_type::uint_type);
+ temp_dst = st_dst_reg(temp_reg);
+ temp_dst.writemask = 1;
+
+ deref_arr->array_index->accept(this);
+ if (*array_elements != 1)
+emit_asm(NULL, TGSI_OPCODE_MUL, temp_dst, this->result, 
st_src_reg_for_int(*array_elements));
+ else
+emit_asm(NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
+
+ if (indirect->file == PROGRAM_UNDEFINED)
+*indirect = temp_reg;
+ else {
+temp_dst = st_dst_reg(*indirect);
+temp_dst.writemask = 1;
+emit_asm(NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
+ }
+  } else
+ *index += array_index->value.u[0] * *array_elements;
+
+  /* if this is just a constant 1D array deref - adjust base and return 1 
array elements */
+  if (array_index && deref_arr->array->ir_type == 
ir_type_dereference_variable && head == tail) {
+ *base = *index;
+  } else {
+ *array_elements *= deref_arr->array->type->length;
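
The constant-index arithmetic in the array case above can be sketched in isolation. This assumes the walk starts with array_elements = 1 and proceeds from the last index toward the variable, which the truncated diff does not show; the function name and the array-based interface are made up for the example.

```c
#include <stddef.h>

/* Flatten constant indices of an array of arrays, walking from the last
 * dimension to the first, with the same two updates as the excerpt:
 *   index          += idx[d] * array_elements;
 *   array_elements *= length[d];
 * On return *array_elements holds the total element count. */
unsigned flatten_aoa_index(const unsigned *idx, const unsigned *length,
                           size_t dims, unsigned *array_elements)
{
    unsigned index = 0;
    *array_elements = 1;
    for (size_t d = dims; d-- > 0; ) {
        index += idx[d] * *array_elements;
        *array_elements *= length[d];
    }
    return index;
}
```

For a declaration like `s[3][4]` and the access `s[2][1]`, this gives index 1 * 1 + 2 * 4 = 9 out of 12 elements.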
+

[Mesa-dev] [PATCH 1/5] glsl/ir: add param index to variable.

2016-02-07 Thread Dave Airlie

From: Dave Airlie 

We have a requirement to store the index into the mesa parameterlist
for uniforms. Up until now we've overwritten var->data.location with
this info. However this then stops us accessing UniformStorage,
which is needed to do proper dereferencing.

Add a new variable to ir_variable to store this value in, and change
the two uses to use it correctly.
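
Reduced to plain C, the idea looks roughly like this. The struct is a hypothetical stand-in, not the real ir_variable, and assign_param_index is a made-up name.

```c
/* Hypothetical, heavily reduced stand-in for the per-variable data. */
struct var_data {
    int location;     /* still usable to reach the uniform storage entry */
    int param_index;  /* index into the parameter list, -1 until assigned */
};

/* Record the parameter-list index without overwriting `location`. */
void assign_param_index(struct var_data *v, int idx)
{
    v->param_index = idx;
}
```

With two fields, code that needs the parameter-list slot and code that needs the original location can both get their value from the same variable.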

Signed-off-by: Dave Airlie 
---
 src/compiler/glsl/ir.h | 8 
 src/mesa/program/ir_to_mesa.cpp| 5 ++---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
index 09e21b2..bf9b7ca 100644
--- a/src/compiler/glsl/ir.h
+++ b/src/compiler/glsl/ir.h
@@ -864,6 +864,14 @@ public:
   int location;
 
   /**
+   * for glsl->tgsi/mesa IR we need to store the index into the
+   * parameters for uniforms, initially the code overloaded location
+   * but this causes problems with indirect samplers and AoA.
+   * This is assigned in _mesa_generate_parameters_list_for_uniforms.
+   */
+  int param_index;
+
+  /**
* Vertex stream output identifier.
*/
   unsigned stream;
diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp
index 768d921..68cc4a5 100644
--- a/src/mesa/program/ir_to_mesa.cpp
+++ b/src/mesa/program/ir_to_mesa.cpp
@@ -1389,7 +1389,7 @@ ir_to_mesa_visitor::visit(ir_dereference_variable *ir)
   switch (var->data.mode) {
   case ir_var_uniform:
 entry = new(mem_ctx) variable_storage(var, PROGRAM_UNIFORM,
-  var->data.location);
+  var->data.param_index);
 this->variables.push_tail(entry);
 break;
   case ir_var_shader_in:
@@ -2268,8 +2268,7 @@ public:
{
   this->idx = -1;
   this->program_resource_visitor::process(var);
-
-  var->data.location = this->idx;
+  var->data.param_index = this->idx;
}
 
 private:
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index b8182de..4b5f2a3 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -2350,7 +2350,7 @@ glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
   switch (var->data.mode) {
   case ir_var_uniform:
  entry = new(mem_ctx) variable_storage(var, PROGRAM_UNIFORM,
-   var->data.location);
+   var->data.param_index);
  this->variables.push_tail(entry);
  break;
   case ir_var_shader_in:
-- 
2.5.0


[Mesa-dev] [PATCH 5/5] st/mesa: enable AoA for gallium drivers reporting GLSL 1.30

2016-02-07 Thread Dave Airlie

From: Dave Airlie 

Signed-off-by: Dave Airlie 
---
 docs/GL3.txt   | 2 +-
 docs/relnotes/11.2.0.html  | 1 +
 src/mesa/state_tracker/st_extensions.c | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 257fc73..02dc842 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -149,7 +149,7 @@ GL 4.2, GLSL 4.20:
 
 GL 4.3, GLSL 4.30:
 
-  GL_ARB_arrays_of_arrays  DONE (i965)
+  GL_ARB_arrays_of_arrays  DONE (all drivers that 
support GLSL 1.30)
   GL_ARB_ES3_compatibility DONE (all drivers that 
support GLSL 3.30)
   GL_ARB_clear_buffer_object   DONE (all drivers)
   GL_ARB_compute_shaderDONE (i965)
diff --git a/docs/relnotes/11.2.0.html b/docs/relnotes/11.2.0.html
index 0d92ed4..069eca2 100644
--- a/docs/relnotes/11.2.0.html
+++ b/docs/relnotes/11.2.0.html
@@ -44,6 +44,7 @@ Note: some of the new features are only available with 
certain drivers.
 
 
 
+GL_ARB_arrays_of_arrays on all gallium drivers that provide GLSL 1.30
 GL_ARB_base_instance on freedreno/a4xx
 GL_ARB_compute_shader on i965
 GL_ARB_copy_image on r600
diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index f25bd74..feabe62 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -808,6 +808,7 @@ void st_init_extensions(struct pipe_screen *screen,
   }
 
   extensions->EXT_shader_integer_mix = GL_TRUE;
+  extensions->ARB_arrays_of_arrays = GL_TRUE;
} else {
   /* Optional integer support for GLSL 1.2. */
   if (screen->get_shader_param(screen, PIPE_SHADER_VERTEX,
-- 
2.5.0


[Mesa-dev] [PATCH 4/5] st/mesa: add atomic AoA support

2016-02-07 Thread Dave Airlie

Reuse the sampler deref handling code to do the same
thing for atomics.

Signed-off-by: Dave Airlie 
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 9e268bc..b4b9dae 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -3160,19 +3160,17 @@ 
glsl_to_tgsi_visitor::visit_atomic_counter_intrinsic(ir_call *ir)
 
/* Calculate the surface offset */
st_src_reg offset;
-   ir_dereference_array *deref_array = deref->as_dereference_array();
+   unsigned array_size = 0, base = 0, index = 0;
 
-   if (deref_array) {
-  offset = get_temp(glsl_type::uint_type);
-
-  deref_array->array_index->accept(this);
+   get_deref_offsets(deref, &array_size, &base, &index, &offset);
 
+   if (offset.file != PROGRAM_UNDEFINED) {
   emit_asm(ir, TGSI_OPCODE_MUL, st_dst_reg(offset),
-   this->result, st_src_reg_for_int(ATOMIC_COUNTER_SIZE));
+   offset, st_src_reg_for_int(ATOMIC_COUNTER_SIZE));
   emit_asm(ir, TGSI_OPCODE_ADD, st_dst_reg(offset),
-   offset, st_src_reg_for_int(location->data.offset));
+   offset, st_src_reg_for_int(location->data.offset + index * ATOMIC_COUNTER_SIZE));
} else {
-  offset = st_src_reg_for_int(location->data.offset);
+  offset = st_src_reg_for_int(location->data.offset + index * ATOMIC_COUNTER_SIZE);
}
 
ir->return_deref->accept(this);
-- 
2.5.0
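
The offset arithmetic in this patch can be illustrated outside the TGSI emitter. The sketch below is not Mesa code: `sketch_atomic_counter_offset` and `SKETCH_ATOMIC_COUNTER_SIZE` are hypothetical names, and the 4-byte counter size is an assumption standing in for `ATOMIC_COUNTER_SIZE`. It only mirrors the MUL/ADD pair above: the constant part of the index is folded into the immediate, and the indirect part, when there is one, is scaled and added on top.

```c
#include <stdbool.h>

/* Assumption: 4 bytes per atomic counter, standing in for Mesa's
 * ATOMIC_COUNTER_SIZE. */
#define SKETCH_ATOMIC_COUNTER_SIZE 4u

/* Hypothetical helper, not part of Mesa.
 *
 * binding_offset  plays the role of location->data.offset
 * const_index     plays the role of the constant 'index' from the deref walk
 * has_indirect    models 'offset.file != PROGRAM_UNDEFINED'
 * indirect_index  is the run-time value held in the indirect register
 */
static unsigned
sketch_atomic_counter_offset(unsigned binding_offset, unsigned const_index,
                             bool has_indirect, unsigned indirect_index)
{
   /* Constant part: folded into a single immediate in the patch. */
   unsigned offset = binding_offset + const_index * SKETCH_ATOMIC_COUNTER_SIZE;

   /* Indirect part: the MUL by the counter size, then the ADD. */
   if (has_indirect)
      offset += indirect_index * SKETCH_ATOMIC_COUNTER_SIZE;

   return offset;
}
```

With a binding offset of 8 and a constant index of 2, the purely constant case gives byte offset 16; adding an indirect index of 3 moves it to 28.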

Re: [Mesa-dev] [PATCH 5/5] st/mesa: enable AoA for gallium drivers reporting GLSL 1.30

2016-02-07 Thread Mike Lothian

Does that also add in AoA for OpenGL ES 3.1 or will that require more work?

On Mon, 8 Feb 2016 at 03:46 Dave Airlie  wrote:

> From: Dave Airlie 
>
> Signed-off-by: Dave Airlie 
> ---
>  docs/GL3.txt   | 2 +-
>  docs/relnotes/11.2.0.html  | 1 +
>  src/mesa/state_tracker/st_extensions.c | 1 +
>  3 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/docs/GL3.txt b/docs/GL3.txt
> index 257fc73..02dc842 100644
> --- a/docs/GL3.txt
> +++ b/docs/GL3.txt
> @@ -149,7 +149,7 @@ GL 4.2, GLSL 4.20:
>
>  GL 4.3, GLSL 4.30:
>
> -  GL_ARB_arrays_of_arrays                               DONE (i965)
> +  GL_ARB_arrays_of_arrays                               DONE (all drivers that support GLSL 1.30)
>    GL_ARB_ES3_compatibility                              DONE (all drivers that support GLSL 3.30)
>    GL_ARB_clear_buffer_object                            DONE (all drivers)
>    GL_ARB_compute_shader                                 DONE (i965)
> diff --git a/docs/relnotes/11.2.0.html b/docs/relnotes/11.2.0.html
> index 0d92ed4..069eca2 100644
> --- a/docs/relnotes/11.2.0.html
> +++ b/docs/relnotes/11.2.0.html
> @@ -44,6 +44,7 @@ Note: some of the new features are only available with
> certain drivers.
>  
>
>  
> +GL_ARB_arrays_of_arrays on all gallium drivers that provide GLSL
> 1.30
>  GL_ARB_base_instance on freedreno/a4xx
>  GL_ARB_compute_shader on i965
>  GL_ARB_copy_image on r600
> diff --git a/src/mesa/state_tracker/st_extensions.c
> b/src/mesa/state_tracker/st_extensions.c
> index f25bd74..feabe62 100644
> --- a/src/mesa/state_tracker/st_extensions.c
> +++ b/src/mesa/state_tracker/st_extensions.c
> @@ -808,6 +808,7 @@ void st_init_extensions(struct pipe_screen *screen,
>}
>
>extensions->EXT_shader_integer_mix = GL_TRUE;
> +  extensions->ARB_arrays_of_arrays = GL_TRUE;
> } else {
>/* Optional integer support for GLSL 1.2. */
>if (screen->get_shader_param(screen, PIPE_SHADER_VERTEX,
> --
> 2.5.0
>

Re: [Mesa-dev] [PATCH 5/5] st/mesa: enable AoA for gallium drivers reporting GLSL 1.30

2016-02-07 Thread Dave Airlie

On 8 February 2016 at 14:26, Mike Lothian  wrote:
> Does that also add in AoA for OpenGL ES 3.1 or will that require more work?

Good question, I've no idea, but I think desktop is > GLES in this
case, so I should update GL3.txt for that as well then.

Dave.

[Mesa-dev] [PATCH] st/mesa: handle const initialisers better

2016-02-07 Thread Dave Airlie

If we have constant initialisers up until
this point, collapse things and set the
array size to 1.

This fixes
tests/spec/arb_arrays_of_arrays/execution/sampler/fs-initializer-const-index.shader_test

Signed-off-by: Dave Airlie 
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index b4b9dae..f087220 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -3498,8 +3498,9 @@ glsl_to_tgsi_visitor::calc_deref_offsets(ir_dereference 
*head,
  *index += array_index->value.u[0] * *array_elements;
 
   /* if this is just a constant 1D array deref - adjust base and return 1 array elements */
-  if (array_index && deref_arr->array->ir_type == ir_type_dereference_variable && head == tail) {
+  if (array_index && deref_arr->array->ir_type == ir_type_dereference_variable && indirect->file == PROGRAM_UNDEFINED) {
  *base = *index;
+ *array_elements = 1;
   } else {
  *array_elements *= deref_arr->array->type->length;
   }
-- 
2.5.0
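
The hunk above sits in the loop that turns a chain of array dereferences into one flat index. As a rough illustration only, the sketch below flattens fully constant array-of-arrays indices the same way the `*index += ... * *array_elements` and `*array_elements *= ...` lines do. `sketch_flatten_aoa_index` is a hypothetical helper, not a Mesa function, and it deliberately ignores indirect indexing and the base/array-size collapse that the patch itself adds.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Mesa.
 *
 * lengths[d] is the declared length of dimension d and indices[d] the
 * constant index used for it, both ordered outermost dimension first.
 * The walk goes from the innermost dimension outwards, which is the
 * order in which the dereference chain is visited.
 */
static unsigned
sketch_flatten_aoa_index(const unsigned *lengths, const unsigned *indices,
                         size_t depth)
{
   unsigned index = 0;
   unsigned array_elements = 1; /* stride of the dimension being visited */

   for (size_t d = depth; d-- > 0; ) {
      index += indices[d] * array_elements;
      array_elements *= lengths[d];
   }

   return index;
}
```

For a declaration shaped like `s[2][3]`, the access `s[1][2]` flattens to 1 * 3 + 2 = 5. When every index is constant like this, the patch treats the result as the base and reports an array size of 1.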

Re: [Mesa-dev] gallium AoA support and indirect sampler fixes

2016-02-07 Thread Dave Airlie

On 8 February 2016 at 00:13, Laurent Carlier  wrote:
> On Friday, 5 February 2016, 13:40:26 CET, Dave Airlie wrote:
>> Hi,
>>
>> In fixing some indirect sampler issues with ARB_gpu_shader5,
>> I realised AoA was mostly fixed as well by the same things.
>>
>> Ilia made me fix atomics as well.
>>
>> So this patch set enables AoA support on all gallium drivers
>> exposing GLSL 1.30.
>>
>> Dave.
>>
>
> I've quickly tested the series: Shadow of Mordor segfaults on start, and the
> first intro movie from Witcher 2 is greenish, then it segfaults.
>
> On top of mesa-git with llvm-svn (both trunk) and amdgpu/kernel-4.5rc2

Thanks for giving it a go,

The series I reposted should fix all this; it is on my gallium-sampler-fix
branch as well.

Dave.

[Mesa-dev] [Bug 94040] clGetPlatformIDs causes futex race condition

2016-02-07 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=94040

--- Comment #3 from b...@bob131.so ---
Fabian Deutsch 

The backtrace shows that clover is used.
Component: ocl-icd → mesa

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

[Mesa-dev] [Bug 94040] clGetPlatformIDs causes futex race condition

2016-02-07 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=94040

--- Comment #4 from b...@bob131.so ---
I've tried a variety of different configurations to check whether this bug
might be setup-specific. I normally use two GPUs from mixed vendors, but I
tried using a single GTX570, HD6870, HD6950, 9800GT and just the plain Intel
HD4000, and the bug still persists.

[Mesa-dev] [Bug 94040] clGetPlatformIDs causes futex race condition

2016-02-07 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=94040

--- Comment #2 from b...@bob131.so ---
Created attachment 121583
  --> https://bugs.freedesktop.org/attachment.cgi?id=121583&action=edit
Blender GDB backtrace

[Mesa-dev] [Bug 94040] clGetPlatformIDs causes futex race condition

2016-02-07 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=94040

--- Comment #5 from b...@bob131.so ---
Fabian Deutsch 

I strongly suspect that it's an issue in clover/mesa's opencl tracker, please
file a bug in upstream or retry with the latest release from rawhide.
Status: NEW → CLOSED

Re: [Mesa-dev] [PATCH 2/5] android: radeonsi: fix building error in si_shader.c

2016-02-07 Thread Michel Dänzer

On 07.02.2016 08:56, Mauro Rossi wrote:
> From e33d112be85e86c2537c26622969dea7dfd16186 Mon Sep 17 00:00:00 2001
> From: Mauro Rossi
> Date: Sat, 6 Feb 2016 23:54:24 +0100
> Subject: [PATCH 2/2] android: radeonsi: fix building error in si_shader.c
> 
> Android Bionic does not support strchrnul() function,
> causing the following building error:
> 
> external/mesa/src/gallium/drivers/radeonsi/si_shader.c:3863: error:
> undefined reference to 'strchrnul'
> collect2: error: ld returned 1 exit status
> ---
>  src/gallium/drivers/radeonsi/si_shader.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c
> b/src/gallium/drivers/radeonsi/si_shader.c
> index d9ed6b2..1b5e984 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -49,6 +49,10 @@
>  
>  #include 
>  
> +#if defined(__ANDROID__)

This guard needs to be in the header, as I suggested.


> +#include "strchrnul.h"

This should be

#include <strchrnul.h>

The "..." syntax is for files which are located in the same directory as
the file which has the #include statement.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

Re: [Mesa-dev] [llvm] r259796 - [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads

2016-02-07 Thread Michel Dänzer

On 07.02.2016 01:51, Simon Pilgrim wrote:
> Michel, thanks for the report, this should be fixed by rL259991.

Yeah it's fixed, thanks!


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

[Mesa-dev] [Bug 94040] clGetPlatformIDs causes futex race condition

2016-02-07 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=94040

--- Comment #1 from b...@bob131.so ---
User-Agent:   Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:41.0)
Gecko/20100101 Firefox/41.0
Build Identifier: 

Apologies if I've mischaracterised anything above, I'm going off of my
uninformed analysis of the attached backtrace from GDB.

Attempting to use Blender in any meaningful way results in the UI hanging.
strace clued me into the blocking futex call and gdb seems to point the finger
at libOpenCL.so. 

Oddly enough, clinfo works as it should.

Reproducible: Always

Steps to Reproduce:
1. Open Blender
2. Open system tab of preferences panel, switch to the Cycles renderer etc
Actual Results:  
UI freeze, application hang

Expected Results:  
Application doesn't hang

rpm -q fedora-release
fedora-release-23-0.17.noarch

rpm -q blender   
blender-2.75-4.fc23.x86_64

rpm -q ocl-icd
ocl-icd-2.2.7-2.git20150606.ebbc4c1.fc23.x86_64

[Mesa-dev] [Bug 94040] clGetPlatformIDs causes futex race condition

2016-02-07 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=94040

Bug ID: 94040
   Summary: clGetPlatformIDs causes futex race condition
   Product: Mesa
   Version: 11.0
  Hardware: Other
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: Other
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: b...@bob131.so
QA Contact: mesa-dev@lists.freedesktop.org

I originally opened this bug on the Redhat bugzilla[1], but I was instructed to
take this upstream. Just so everyone's on the same page, I'm going to
copy-paste the report from RH.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1273131

Re: [Mesa-dev] [PATCH 2/5] android: radeonsi: fix building error in si_shader.c

2016-02-07 Thread Michel Dänzer

On 07.02.2016 08:55, Mauro Rossi wrote:
> From 8030a6cd9d7bb3320fca94038f1969db56223598 Mon Sep 17 00:00:00 2001
> From: Mauro Rossi
> Date: Sat, 6 Feb 2016 23:52:36 +0100
> Subject: [PATCH 1/2] android: add support for strchrnul
> 
> Android Bionic has no strchrnul in string functions,
> radeonsi uses strchrnul, so we need an implementation.
> 
> strchrnul.h is added in top mesa include path.
> ---
>  include/strchrnul.h | 40 
>  1 file changed, 40 insertions(+)
>  create mode 100644 include/strchrnul.h
> 
> diff --git a/include/strchrnul.h b/include/strchrnul.h
> new file mode 100644
> index 000..83477e2
> --- /dev/null
> +++ b/include/strchrnul.h
> @@ -0,0 +1,40 @@
> +/**
> + *
> + * Copyright (C) 2014 Emil Velikov
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> "Software"),
> + * to deal in the Software without restriction, including without
> limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included
> + * in all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + *
> +
> **/
> +
> +#ifndef _STRCHRNUL_H_
> +#define _STRCHRNUL_H_
> +
> +char *
> +strchrnul(const char *s, int c)
> +{
> +char * result = strchr(s, c);
> +
> +if (result == NULL) {
> +result = s + strlen(s);
> +}
> +
> +return result;
> +}
> +
> +#endif /* _STRCHRNUL_H_ */

Apart from my review of patch 2, this header should

#include 

so that it also works on platforms which do provide strchrnul.
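
To make the review comments on both patches concrete, here is one possible shape for such a fallback header. It is a sketch under assumptions, not the patch that was eventually merged: the guard macro `SKETCH_STRCHRNUL_H` and the name `compat_strchrnul` are hypothetical, and the helper is `static inline` so that the header can be included from more than one source file without duplicate-symbol errors. The platform guard lives inside the header, so callers include it unconditionally.

```c
#ifndef SKETCH_STRCHRNUL_H
#define SKETCH_STRCHRNUL_H

#include <string.h> /* strchr, strlen */

/* Hypothetical fallback: like strchr(), but returns a pointer to the
 * terminating NUL byte instead of NULL when 'c' is not found. */
static inline char *
compat_strchrnul(const char *s, int c)
{
   const char *result = strchr(s, c);

   if (result == NULL)
      result = s + strlen(s);

   /* Cast away const to match the return type of the GNU function. */
   return (char *)result;
}

#if defined(__ANDROID__)
/* Assumption: only redirect callers where the C library lacks strchrnul,
 * as reported for Bionic in this thread. */
#define strchrnul compat_strchrnul
#endif

#endif /* SKETCH_STRCHRNUL_H */
```

A caller can then test `*strchrnul(str, ',')` against `'\0'` on every platform, because a miss yields the end of the string and never NULL.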


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
