Re: [Mesa-dev] Backporting bufmgr fixes to libdrm_intel (Was Re: [PATCH 6/9] i965/bufmgr: Garbage-collect vma cache/pruning)

2017-04-10 Thread Kenneth Graunke
On Monday, April 10, 2017 7:11:18 AM PDT Emil Velikov wrote:
> Hi all,
> 
> On 10 April 2017 at 08:18, Kenneth Graunke  wrote:
> > From: Daniel Vetter 
> >
> > This was done because the kernel has 1 global address space, shared
> > with all render clients, for gtt mmap offsets, and that address space
> > was only 32bit on 32bit kernels.
> >
> > This was fixed in
> >
> > commit 440fd5283a87345cdd4237bdf45fb01130ea0056
> > Author: Thierry Reding 
> > Date:   Fri Jan 23 09:05:06 2015 +0100
> >
> > drm/mm: Support 4 GiB and larger ranges
> >
> > which shipped in 4.0. Of course you still want to limit the bo cache
> > to a reasonable size on 32bit apps to avoid ENOMEM, but that's better
> > solved by tuning the cache a bit. On 64bit, this was never an issue.
> >
> While this patch is _not_ a bugfix, it inspired an interesting question/topic:
> 
> Do we want to backport fixes from mesa's bufmgr to libdrm_intel?
> 
> Or in general what's the plan about the library - leave it as-is, sync
> fixes, remove it, other.
> Can we have the decision documented somewhere, please?
> 
> After all: good science/engineering is good documentation.
> 
> Thanks
> Emil

It makes sense to backport bug fixes, given that there are still
libdrm_intel uses out there (libva, beignet, i915_dri).

That said, brw_bufmgr is diverging enough that backporting fixes
probably means reimplementing the same idea in the other codebase.
If we find a nasty bug, we certainly should, but I imagine most
patches won't apply and that's OK.

--Ken


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic.

2017-04-10 Thread Kenneth Graunke
On Monday, April 10, 2017 5:23:20 PM PDT Francisco Jerez wrote:
> The individual branches of an if/else/endif construct will be executed
> some unknown number of times between 0 and 1 relative to the parent
> block.  Use some factor in between as weight while approximating the
> cost of spill/fill instructions within a conditional if-else branch.
> This favors spilling registers used within conditional branches which
> are likely to be executed less frequently than registers used at the
> top level.
> 
> Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
> my SKL GT4e.  Should have a comparable effect on other platforms.  No
> significant regressions.
> ---
>  src/intel/compiler/brw_fs_reg_allocate.cpp | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
> b/src/intel/compiler/brw_fs_reg_allocate.cpp
> index 5c6f3d4..c981d72 100644
> --- a/src/intel/compiler/brw_fs_reg_allocate.cpp
> +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
> @@ -806,7 +806,7 @@ emit_spill(const fs_builder , fs_reg src,
>  int
>  fs_visitor::choose_spill_reg(struct ra_graph *g)
>  {
> -   float loop_scale = 1.0;
> +   float block_scale = 1.0;
> float spill_costs[this->alloc.count];
> bool no_spill[this->alloc.count];
>  
> @@ -822,23 +822,32 @@ fs_visitor::choose_spill_reg(struct ra_graph *g)
> foreach_block_and_inst(block, fs_inst, inst, cfg) {
>for (unsigned int i = 0; i < inst->sources; i++) {
>if (inst->src[i].file == VGRF)
> -spill_costs[inst->src[i].nr] += loop_scale;
> +spill_costs[inst->src[i].nr] += block_scale;
>}
>  
>if (inst->dst.file == VGRF)
>   spill_costs[inst->dst.nr] += DIV_ROUND_UP(inst->size_written, 
> REG_SIZE)
> -  * loop_scale;
> +  * block_scale;
>  
>switch (inst->opcode) {
>  
>case BRW_OPCODE_DO:
> -  loop_scale *= 10;
> +  block_scale *= 10;
>break;
>  
>case BRW_OPCODE_WHILE:
> -  loop_scale /= 10;
> +  block_scale /= 10;
>break;
>  
> +  case BRW_OPCODE_IF:
> +  case BRW_OPCODE_IFF:
> + block_scale *= 0.5;
> + break;
> +
> +  case BRW_OPCODE_ENDIF:
> + block_scale /= 0.5;
> + break;
> +
>case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
>if (inst->src[0].file == VGRF)
>  no_spill[inst->src[0].nr] = true;
> 

Makes sense, nice simple improvement!

Reviewed-by: Kenneth Graunke 
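
For readers skimming the archive, the weighting idea in the patch can be sketched outside the compiler. This is only an illustration under stated assumptions: the `op` enum, `struct inst`, and `estimate_spill_costs()` are hypothetical stand-ins, not the i965 data structures, and only the 10x loop factor and the 0.5 conditional factor are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes standing in for the BRW_OPCODE_* values in the patch. */
enum op { OP_USE, OP_DO, OP_WHILE, OP_IF, OP_ENDIF };

struct inst {
   enum op op;
   int reg;   /* register touched by OP_USE; ignored for other opcodes */
};

/* Accumulate a per-register cost estimate, weighting each use by how often
 * its enclosing block is assumed to run relative to the top level. */
static void
estimate_spill_costs(const struct inst *insts, size_t count,
                     float *costs, size_t num_regs)
{
   float block_scale = 1.0f;

   for (size_t r = 0; r < num_regs; r++)
      costs[r] = 0.0f;

   for (size_t i = 0; i < count; i++) {
      switch (insts[i].op) {
      case OP_USE:
         assert(insts[i].reg >= 0 && (size_t)insts[i].reg < num_regs);
         costs[insts[i].reg] += block_scale;
         break;
      case OP_DO:    block_scale *= 10.0f; break;   /* loop body: assume 10 trips */
      case OP_WHILE: block_scale /= 10.0f; break;
      case OP_IF:    block_scale *= 0.5f;  break;   /* branch: assume taken half the time */
      case OP_ENDIF: block_scale /= 0.5f;  break;
      }
   }
}
```

A register used once inside an if block therefore counts half as much as one used at the top level, and a use inside a loop nested in that if block counts five times as much.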




[Mesa-dev] [PATCH] anv/allocator: Add a BO cache

2017-04-10 Thread Jason Ekstrand
This cache allows us to easily ensure that we have a unique anv_bo for
each gem handle.  We'll need this in order to support multiple-import of
memory objects and semaphores.
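
The "one object per handle" property described above can be sketched roughly as follows. This is a simplified illustration, not the anv code: `obj_cache`, `cached_obj`, and the three functions are hypothetical names, a fixed-size array replaces the hash table, and the mutex the patch takes around lookups is left out, so the sketch is only valid single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CACHED 64   /* arbitrary capacity for the sketch */

struct cached_obj {
   uint32_t handle;     /* 0 means the slot is free */
   uint32_t refcount;
};

struct obj_cache {
   struct cached_obj slots[MAX_CACHED];
};

static struct cached_obj *
cache_lookup(struct obj_cache *cache, uint32_t handle)
{
   for (size_t i = 0; i < MAX_CACHED; i++) {
      if (cache->slots[i].handle == handle)
         return &cache->slots[i];
   }
   return NULL;
}

/* Return the unique object for `handle`, creating it on the first import and
 * bumping the reference count on every later one.  NULL if the table is full. */
static struct cached_obj *
cache_import(struct obj_cache *cache, uint32_t handle)
{
   assert(handle != 0);

   struct cached_obj *obj = cache_lookup(cache, handle);
   if (obj) {
      obj->refcount++;
      return obj;
   }

   obj = cache_lookup(cache, 0);   /* find a free slot */
   if (!obj)
      return NULL;

   obj->handle = handle;
   obj->refcount = 1;
   return obj;
}

/* Drop one reference; the slot is recycled when the last one goes away. */
static void
cache_release(struct cached_obj *obj)
{
   assert(obj->refcount > 0);
   if (--obj->refcount == 0)
      obj->handle = 0;
}
```

Importing the same handle twice yields the same pointer with a reference count of two, which is the behaviour multiple-import needs.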

v2 (Jason Ekstrand):
 - Reject BO imports if the size doesn't match the prime fd size as
   reported by lseek().

v3 (Jason Ekstrand):
 - Get rid of the alloc parameter to all of the calls and just use
   device->alloc instead.
 - Allocate the correct amount of memory for the anv_cached_bo structure.
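
The size check mentioned in the v2 note can be sketched with plain POSIX calls. This is an assumption-laden illustration: `claimed_size_matches()` is a hypothetical helper, and it compares the raw sizes, whereas the patch first rounds the requested size up to a page multiple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Return true only if the size the caller claims for `fd` matches what the
 * kernel reports via lseek(SEEK_END). */
static bool
claimed_size_matches(int fd, uint64_t claimed_size)
{
   off_t real_size = lseek(fd, 0, SEEK_END);
   if (real_size < 0)
      return false;   /* not seekable, or a bad descriptor */

   return (uint64_t)real_size == claimed_size;
}
```

A descriptor that cannot report its size is rejected outright, which errs on the side of refusing the import.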

Cc: Chad Versace 
---
 src/intel/vulkan/anv_allocator.c | 263 +++
 src/intel/vulkan/anv_private.h   |  21 
 2 files changed, 284 insertions(+)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 45c663b..2753f44 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -34,6 +34,8 @@
 
 #include "anv_private.h"
 
+#include "util/hash_table.h"
+
 #ifdef HAVE_VALGRIND
 #define VG_NOACCESS_READ(__ptr) ({   \
VALGRIND_MAKE_MEM_DEFINED((__ptr), sizeof(*(__ptr))); \
@@ -976,3 +978,264 @@ anv_scratch_pool_alloc(struct anv_device *device, struct anv_scratch_pool *pool,
 
return >bo;
 }
+
+struct anv_cached_bo {
+   struct anv_bo bo;
+
+   uint32_t refcount;
+};
+
+static uint32_t
+hash_uint32_t(const void *key)
+{
+   return (uint32_t)(uintptr_t)key;
+}
+
+static bool
+uint32_t_equal(const void *a, const void *b)
+{
+   return a == b;
+}
+
+VkResult
+anv_bo_cache_init(struct anv_bo_cache *cache)
+{
+   cache->bo_map = _mesa_hash_table_create(NULL, hash_uint32_t, uint32_t_equal);
+   if (!cache->bo_map)
+  return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
+
+   if (pthread_mutex_init(>mutex, NULL)) {
+  _mesa_hash_table_destroy(cache->bo_map, NULL);
+  return vk_errorf(VK_ERROR_OUT_OF_HOST_MEMORY,
+   "pthread_mutex_inti failed: %m");
+   }
+
+   return VK_SUCCESS;
+}
+
+void
+anv_bo_cache_finish(struct anv_bo_cache *cache)
+{
+   _mesa_hash_table_destroy(cache->bo_map, NULL);
+   pthread_mutex_destroy(>mutex);
+}
+
+static struct anv_cached_bo *
+anv_bo_cache_lookup_locked(struct anv_bo_cache *cache, uint32_t gem_handle)
+{
+   struct hash_entry *entry =
+  _mesa_hash_table_search(cache->bo_map,
+  (const void *)(uintptr_t)gem_handle);
+   if (!entry)
+  return NULL;
+
+   struct anv_cached_bo *bo = (struct anv_cached_bo *)entry->data;
+   assert(bo->bo.gem_handle == gem_handle);
+
+   return bo;
+}
+
+static struct anv_bo *
+anv_bo_cache_lookup(struct anv_bo_cache *cache, uint32_t gem_handle)
+{
+   pthread_mutex_lock(>mutex);
+
+   struct anv_cached_bo *bo = anv_bo_cache_lookup_locked(cache, gem_handle);
+
+   pthread_mutex_unlock(>mutex);
+
+   return >bo;
+}
+
+VkResult
+anv_bo_cache_alloc(struct anv_device *device,
+   struct anv_bo_cache *cache,
+   uint64_t size, struct anv_bo **bo_out)
+{
+   struct anv_cached_bo *bo =
+  vk_alloc(>alloc, sizeof(struct anv_cached_bo), 8,
+   VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+   if (!bo)
+  return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
+
+   bo->refcount = 1;
+
+   /* The kernel is going to give us whole pages anyway */
+   size = align_u64(size, 4096);
+
+   VkResult result = anv_bo_init_new(>bo, device, size);
+   if (result != VK_SUCCESS) {
+  vk_free(>alloc, bo);
+  return result;
+   }
+
+   assert(bo->bo.gem_handle);
+
+   pthread_mutex_lock(>mutex);
+
+   _mesa_hash_table_insert(cache->bo_map,
+   (void *)(uintptr_t)bo->bo.gem_handle, bo);
+
+   pthread_mutex_unlock(>mutex);
+
+   *bo_out = >bo;
+
+   return VK_SUCCESS;
+}
+
+VkResult
+anv_bo_cache_import(struct anv_device *device,
+struct anv_bo_cache *cache,
+int fd, uint64_t size, struct anv_bo **bo_out)
+{
+   pthread_mutex_lock(>mutex);
+
+   /* The kernel is going to give us whole pages anyway */
+   size = align_u64(size, 4096);
+
+   uint32_t gem_handle = anv_gem_fd_to_handle(device, fd);
+   if (!gem_handle) {
+  pthread_mutex_unlock(>mutex);
+  return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX);
+   }
+
+   struct anv_cached_bo *bo = anv_bo_cache_lookup_locked(cache, gem_handle);
+   if (bo) {
+  if (bo->bo.size != size) {
+ pthread_mutex_unlock(>mutex);
+ return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX);
+  }
+  __sync_fetch_and_add(>refcount, 1);
+   } else {
+  /* For security purposes, we reject BO imports where the size does not
+   * match exactly.  This prevents a malicious client from passing a
+   * buffer to a trusted client, lying about the size, and telling the
+   * trusted client to try and texture from an image that goes
+   * out-of-bounds.  This sort of thing could lead to GPU hangs or worse
+   * in the trusted client.  The trusted client can protect itself against
+   * this sort of attack but only if 

Re: [Mesa-dev] [PATCH] i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic.

2017-04-10 Thread Jason Ekstrand
On Mon, Apr 10, 2017 at 5:23 PM, Francisco Jerez 
wrote:

> The individual branches of an if/else/endif construct will be executed
> some unknown number of times between 0 and 1 relative to the parent
> block.  Use some factor in between as weight while approximating the
> cost of spill/fill instructions within a conditional if-else branch.
> This favors spilling registers used within conditional branches which
> are likely to be executed less frequently than registers used at the
> top level.
>
> Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
> my SKL GT4e.  Should have a comparable effect on other platforms.  No
> significant regressions.
>

Nice!


> ---
>  src/intel/compiler/brw_fs_reg_allocate.cpp | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp
> b/src/intel/compiler/brw_fs_reg_allocate.cpp
> index 5c6f3d4..c981d72 100644
> --- a/src/intel/compiler/brw_fs_reg_allocate.cpp
> +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
> @@ -806,7 +806,7 @@ emit_spill(const fs_builder , fs_reg src,
>  int
>  fs_visitor::choose_spill_reg(struct ra_graph *g)
>  {
> -   float loop_scale = 1.0;
> +   float block_scale = 1.0;
> float spill_costs[this->alloc.count];
> bool no_spill[this->alloc.count];
>
> @@ -822,23 +822,32 @@ fs_visitor::choose_spill_reg(struct ra_graph *g)
> foreach_block_and_inst(block, fs_inst, inst, cfg) {
>for (unsigned int i = 0; i < inst->sources; i++) {
>  if (inst->src[i].file == VGRF)
> -spill_costs[inst->src[i].nr] += loop_scale;
> +spill_costs[inst->src[i].nr] += block_scale;
>}
>
>if (inst->dst.file == VGRF)
>   spill_costs[inst->dst.nr] += DIV_ROUND_UP(inst->size_written,
> REG_SIZE)
> -  * loop_scale;
> +  * block_scale;
>
>switch (inst->opcode) {
>
>case BRW_OPCODE_DO:
> -loop_scale *= 10;
> +block_scale *= 10;
>  break;
>
>case BRW_OPCODE_WHILE:
> -loop_scale /= 10;
> +block_scale /= 10;
>  break;
>
> +  case BRW_OPCODE_IF:
> +  case BRW_OPCODE_IFF:
> + block_scale *= 0.5;
>

Maybe 0.75 since it may or may not be uniform?  Or 0.9 for that matter.
It's all arbitrary (see also 10).  My only concern is that, if we set it
too low, the compiler may decide to spill something with 4 spills over
something with 2 just because it's in control-flow (or 8 vs. 2 if it's
nested in a level).  In this particular shader, the spills are so deep into
an if-ladder that it doesn't really matter because the compounding will
make it look free regardless.

I'd like Matt or Ken's opinion in here as well but, for my part,

Reviewed-by: Jason Ekstrand 
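
The trade-off raised above can be put into numbers with a small sketch. `weighted_cost()` is a hypothetical helper, not part of the patch; it only models the compounding of the per-conditional factor.

```c
#include <assert.h>

/* Cost estimate for `uses` register accesses nested `depth` conditionals
 * deep, when each nesting level multiplies the weight by `if_factor`. */
static float
weighted_cost(unsigned uses, float if_factor, unsigned depth)
{
   float scale = 1.0f;

   for (unsigned d = 0; d < depth; d++)
      scale *= if_factor;

   return (float)uses * scale;
}
```

With a factor of 0.5, four uses inside one conditional (or eight uses two levels deep) cost exactly as much as two uses at the top level, so the allocator has no reason to prefer the cheaper spill. With 0.75, the four conditional uses cost 3.0 and the top-level register still wins.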


> + break;
> +
> +  case BRW_OPCODE_ENDIF:
> + block_scale /= 0.5;
> + break;
> +
>case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
>  if (inst->src[0].file == VGRF)
>  no_spill[inst->src[0].nr] = true;
> --
> 2.10.2
>
>


[Mesa-dev] [PATCH] mesa/st: remove _mesa_get_fallback_texture() calls

2017-04-10 Thread Timothy Arceri
These calls look like leftovers from fallback texture support first
being added to the st in 8f6d9e12be0be and then later being added
to core mesa in 00e203fe17cbf21.

The piglit test fp-incomplete-tex continues to work with this
change.
---
 src/mesa/state_tracker/st_atom_sampler.c | 8 ++--
 src/mesa/state_tracker/st_atom_texture.c | 5 +
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom_sampler.c b/src/mesa/state_tracker/st_atom_sampler.c
index 9ddc704..820a57d 100644
--- a/src/mesa/state_tracker/st_atom_sampler.c
+++ b/src/mesa/state_tracker/st_atom_sampler.c
@@ -130,27 +130,23 @@ static void
 convert_sampler(struct st_context *st,
 struct pipe_sampler_state *sampler,
 GLuint texUnit)
 {
const struct gl_texture_object *texobj;
struct gl_context *ctx = st->ctx;
const struct gl_sampler_object *msamp;
GLenum texBaseFormat;
 
texobj = ctx->Texture.Unit[texUnit]._Current;
-   if (!texobj) {
-  texobj = _mesa_get_fallback_texture(ctx, TEXTURE_2D_INDEX);
-  msamp = >Sampler;
-   } else {
-  msamp = _mesa_get_samplerobj(ctx, texUnit);
-   }
+   assert(texobj);
 
+   msamp = _mesa_get_samplerobj(ctx, texUnit);
texBaseFormat = _mesa_texture_base_format(texobj);
 
memset(sampler, 0, sizeof(*sampler));
sampler->wrap_s = gl_wrap_xlate(msamp->WrapS);
sampler->wrap_t = gl_wrap_xlate(msamp->WrapT);
sampler->wrap_r = gl_wrap_xlate(msamp->WrapR);
 
sampler->min_img_filter = gl_filter_to_img_filter(msamp->MinFilter);
sampler->min_mip_filter = gl_filter_to_mip_filter(msamp->MinFilter);
sampler->mag_img_filter = gl_filter_to_img_filter(msamp->MagFilter);
diff --git a/src/mesa/state_tracker/st_atom_texture.c b/src/mesa/state_tracker/st_atom_texture.c
index 5b481ec..fa4b644 100644
--- a/src/mesa/state_tracker/st_atom_texture.c
+++ b/src/mesa/state_tracker/st_atom_texture.c
@@ -59,25 +59,22 @@ update_single_texture(struct st_context *st,
 {
struct gl_context *ctx = st->ctx;
const struct gl_sampler_object *samp;
struct gl_texture_object *texObj;
struct st_texture_object *stObj;
GLboolean retval;
 
samp = _mesa_get_samplerobj(ctx, texUnit);
 
texObj = ctx->Texture.Unit[texUnit]._Current;
+   assert(texObj);
 
-   if (!texObj) {
-  texObj = _mesa_get_fallback_texture(ctx, TEXTURE_2D_INDEX);
-  samp = >Sampler;
-   }
stObj = st_texture_object(texObj);
 
retval = st_finalize_texture(ctx, st->pipe, texObj, 0);
if (!retval) {
   /* out of mem */
   return GL_FALSE;
}
 
/* Check a few pieces of state outside the texture object to see if we
 * need to force revalidation.
-- 
2.9.3



Re: [Mesa-dev] [PATCH V3 1/9] mesa: create _mesa_attach_renderbuffer_without_ref() helper

2017-04-10 Thread Brian Paul

On 04/10/2017 06:09 PM, Timothy Arceri wrote:

On 11/04/17 03:11, Brian Paul wrote:

On 04/07/2017 09:21 PM, Timothy Arceri wrote:

This will be used to take ownership of freshly created renderbuffers,
avoiding the need to call the reference function which requires
locking.

V2: dereference any existing fb attachments and actually attach the
 new rb.

v3: split out validation and attachment type/complete setting into
 a shared static function.
---
  src/mesa/main/renderbuffer.c | 43
+++
  src/mesa/main/renderbuffer.h |  5 +
  2 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/src/mesa/main/renderbuffer.c b/src/mesa/main/renderbuffer.c
index 4375b5b..627bdca 100644
--- a/src/mesa/main/renderbuffer.c
+++ b/src/mesa/main/renderbuffer.c
@@ -99,28 +99,24 @@ _mesa_new_renderbuffer(struct gl_context *ctx,
GLuint name)
   * free the object in the end.
   */
  void
  _mesa_delete_renderbuffer(struct gl_context *ctx, struct
gl_renderbuffer *rb)
  {
 mtx_destroy(>Mutex);
 free(rb->Label);
 free(rb);
  }

-
-/**
- * Attach a renderbuffer to a framebuffer.
- * \param bufferName  one of the BUFFER_x tokens
- */
-void
-_mesa_add_renderbuffer(struct gl_framebuffer *fb,
-   gl_buffer_index bufferName, struct
gl_renderbuffer *rb)
+static void
+validate_and_init_renderbuffer_attachment(struct gl_framebuffer *fb,
+  gl_buffer_index bufferName,
+  struct gl_renderbuffer *rb)
  {
 assert(fb);
 assert(rb);
 assert(bufferName < BUFFER_COUNT);

 /* There should be no previous renderbuffer on this attachment
point,
  * with the exception of depth/stencil since the same
renderbuffer may
  * be used for both.
  */
 assert(bufferName == BUFFER_DEPTH ||
@@ -130,20 +126,51 @@ _mesa_add_renderbuffer(struct gl_framebuffer *fb,
 /* winsys vs. user-created buffer cross check */
 if (_mesa_is_user_fbo(fb)) {
assert(rb->Name);
 }
 else {
assert(!rb->Name);
 }

 fb->Attachment[bufferName].Type = GL_RENDERBUFFER_EXT;
 fb->Attachment[bufferName].Complete = GL_TRUE;
+}
+
+
+/**
+ * Attach a renderbuffer to a framebuffer.
+ * \param bufferName  one of the BUFFER_x tokens
+ *
+ * This function avoids adding a reference and is therefore intended
to be
+ * used with a freashly created renderbuffer.


"freshly"



+ */
+void
+_mesa_add_renderbuffer_without_ref(struct gl_framebuffer *fb,
+   gl_buffer_index bufferName,
+   struct gl_renderbuffer *rb)


I see you've already pushed this.


Yes, the previous leak was causing issues in GNOME, so I wanted to push
these in case they were also causing problems.


Still, I'd like to suggest a
different name such as _mesa_own_renderbuffer() that stresses the
transfer of ownership of the renderbuffer.


I pushed the other two trivial suggestions, as for the name I struggled
a bit with this because we still want it to be obvious that it is a
variant of _mesa_add_renderbuffer().

How about _mesa_add_and_own_renderbuffer() ??


IMO these would make more sense being called _mesa_attach_* rather than
add. I can change that also if you agree?

Maybe:

_mesa_attach_and_own_rb()
_mesa_attach_and_reference_rb()

??


That sounds OK.

-Brian
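
The difference between the two variants discussed in this thread can be sketched as below. This is a toy model under stated assumptions: `struct rb`, `struct fb`, and the function names are hypothetical, there is a single attachment point, no locking, and nothing is ever freed.

```c
#include <assert.h>
#include <stddef.h>

struct rb {
   int refcount;
};

struct fb {
   struct rb *attachment;
};

static void
rb_unref(struct rb *rb)
{
   assert(rb->refcount > 0);
   rb->refcount--;   /* a real implementation would free at zero */
}

/* Attach and take a new reference: the caller keeps its own reference. */
static void
fb_attach_and_reference(struct fb *fb, struct rb *rb)
{
   rb->refcount++;
   if (fb->attachment)
      rb_unref(fb->attachment);
   fb->attachment = rb;
}

/* Attach and take over the caller's reference: the count is unchanged, so
 * the caller must not release `rb` itself afterwards. */
static void
fb_attach_and_own(struct fb *fb, struct rb *rb)
{
   if (fb->attachment)
      rb_unref(fb->attachment);
   fb->attachment = rb;
}
```

The owning variant skips the count update entirely, which is why it suits a renderbuffer that was just created and is not yet visible to anyone else.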




Re: [Mesa-dev] [PATCH] mesa: use pre_hashed version of search for the mesa hash table

2017-04-10 Thread Eric Anholt
Timothy Arceri  writes:

> The key is just an unsigned int so there is never any real hashing
> done.
> ---
>  src/mesa/main/hash.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/main/hash.c b/src/mesa/main/hash.c
> index 670438a..eb25d88 100644
> --- a/src/mesa/main/hash.c
> +++ b/src/mesa/main/hash.c
> @@ -176,21 +176,22 @@ static inline void *
>  _mesa_HashLookup_unlocked(struct _mesa_HashTable *table, GLuint key)
>  {
> const struct hash_entry *entry;
>  
> assert(table);
> assert(key);
>  
> if (key == DELETED_KEY_VALUE)
>return table->deleted_key_data;
>  
> -   entry = _mesa_hash_table_search(table->ht, uint_key(key));
> +   uint32_t hash = uint_hash(key);
> +   entry = _mesa_hash_table_search_pre_hashed(table->ht, hash, 
> uint_key(key));
> if (!entry)
>return NULL;
>  
> return entry->data;
>  }

So this cuts out the no-op function call back from the HT code.  Seems
like a win that's worth the bit of complexity in this very hot path.  I
would also be happy with not having the temp and just doing:

entry = _mesa_hash_table_search_pre_hashed(table->ht,
   uint_hash(key),
   uint_key(key));

which I think looks pretty nice.  Either way,

Reviewed-by: Eric Anholt 
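
To make the saving concrete, here is a minimal sketch of a table with both entry points. It is not the Mesa hash table: `struct table`, `identity_hash()`, and the functions below are hypothetical, the table has a fixed size, uses linear probing, and supports no deletion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16   /* arbitrary capacity for the sketch */

struct entry {
   uint32_t key;   /* 0 marks an empty slot */
   void *data;
};

struct table {
   uint32_t (*hash_fn)(uint32_t key);
   struct entry slots[TABLE_SIZE];
};

/* The key is already an integer, so "hashing" it is a no-op. */
static uint32_t
identity_hash(uint32_t key)
{
   return key;
}

/* Linear probing; the caller supplies the hash. */
static struct entry *
table_search_pre_hashed(struct table *t, uint32_t hash, uint32_t key)
{
   assert(key != 0);
   for (uint32_t i = 0; i < TABLE_SIZE; i++) {
      struct entry *e = &t->slots[(hash + i) % TABLE_SIZE];
      if (e->key == key)
         return e;
      if (e->key == 0)
         return NULL;
   }
   return NULL;
}

/* Generic path: one indirect call through hash_fn, then the same probe. */
static struct entry *
table_search(struct table *t, uint32_t key)
{
   return table_search_pre_hashed(t, t->hash_fn(key), key);
}

static void
table_insert(struct table *t, uint32_t key, void *data)
{
   assert(key != 0);
   uint32_t hash = t->hash_fn(key);
   for (uint32_t i = 0; i < TABLE_SIZE; i++) {
      struct entry *e = &t->slots[(hash + i) % TABLE_SIZE];
      if (e->key == 0 || e->key == key) {
         e->key = key;
         e->data = data;
         return;
      }
   }
   assert(!"table full");
}
```

A caller that knows the hash is trivial can call the pre-hashed variant directly and avoid the indirect call that the generic search makes on every lookup.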




[Mesa-dev] [PATCH] mesa/st: only update samplers for stages that have changed

2017-04-10 Thread Timothy Arceri
Might help reduce CPU overhead for some apps that use SSO.
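
The general idea, one dirty bit per shader stage instead of one bit for all render stages, can be sketched as follows. This is an illustration only: the `stage` enum, the `DIRTY_*` macros, and `update_dirty_samplers()` are hypothetical names, and only three stages are modelled.

```c
#include <assert.h>
#include <stdint.h>

enum stage { STAGE_VS, STAGE_FS, STAGE_CS, STAGE_COUNT };

#define DIRTY_VS_SAMPLERS  (UINT64_C(1) << STAGE_VS)
#define DIRTY_FS_SAMPLERS  (UINT64_C(1) << STAGE_FS)
#define DIRTY_CS_SAMPLERS  (UINT64_C(1) << STAGE_CS)
#define DIRTY_ALL_SAMPLERS (DIRTY_VS_SAMPLERS | \
                            DIRTY_FS_SAMPLERS | \
                            DIRTY_CS_SAMPLERS)

/* Re-validate samplers only for stages whose bit is set in `dirty`.
 * `update_count[]` records how often each stage was processed; the return
 * value is the number of stages touched by this call. */
static unsigned
update_dirty_samplers(uint64_t dirty, unsigned update_count[STAGE_COUNT])
{
   unsigned touched = 0;

   for (int s = 0; s < STAGE_COUNT; s++) {
      if (dirty & (UINT64_C(1) << s)) {
         update_count[s]++;
         touched++;
      }
   }
   return touched;
}
```

When only the fragment program changes, only the fragment stage is revisited; the combined mask keeps the old "update everything" behaviour available to callers that need it.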
---
 src/mesa/state_tracker/st_atom.h |  6 +-
 src/mesa/state_tracker/st_atom_list.h|  8 ++-
 src/mesa/state_tracker/st_atom_sampler.c | 94 ++--
 src/mesa/state_tracker/st_program.c  | 14 ++---
 4 files changed, 94 insertions(+), 28 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom.h b/src/mesa/state_tracker/st_atom.h
index 45c3e48..0145cef 100644
--- a/src/mesa/state_tracker/st_atom.h
+++ b/src/mesa/state_tracker/st_atom.h
@@ -76,21 +76,25 @@ enum {
 #define ST_STATE(FLAG, st_update) static const uint64_t FLAG = 1llu << FLAG##_INDEX;
 #include "st_atom_list.h"
 #undef ST_STATE
 
 /* Add extern struct declarations. */
 #define ST_STATE(FLAG, st_update) extern const struct st_tracked_state st_update;
 #include "st_atom_list.h"
 #undef ST_STATE
 
 /* Combined state flags. */
-#define ST_NEW_SAMPLERS (ST_NEW_RENDER_SAMPLERS | \
+#define ST_NEW_SAMPLERS (ST_NEW_VS_SAMPLERS | \
+ ST_NEW_TCS_SAMPLERS | \
+ ST_NEW_TES_SAMPLERS | \
+ ST_NEW_GS_SAMPLERS | \
+ ST_NEW_FS_SAMPLERS | \
  ST_NEW_CS_SAMPLERS)
 
 #define ST_NEW_FRAMEBUFFER  (ST_NEW_FB_STATE | \
  ST_NEW_SAMPLE_MASK | \
  ST_NEW_SAMPLE_SHADING)
 
 #define ST_NEW_VERTEX_PROGRAM(st, p) (p->affected_states | \
   (st_user_clip_planes_enabled(st->ctx) ? \
ST_NEW_CLIP_STATE : 0))
 
diff --git a/src/mesa/state_tracker/st_atom_list.h b/src/mesa/state_tracker/st_atom_list.h
index d0d5a05..4212dac 100644
--- a/src/mesa/state_tracker/st_atom_list.h
+++ b/src/mesa/state_tracker/st_atom_list.h
@@ -15,21 +15,25 @@ ST_STATE(ST_NEW_SCISSOR, st_update_scissor)
 ST_STATE(ST_NEW_WINDOW_RECTANGLES, st_update_window_rectangles)
 ST_STATE(ST_NEW_BLEND, st_update_blend)
 
 ST_STATE(ST_NEW_VS_SAMPLER_VIEWS, st_update_vertex_texture)
 ST_STATE(ST_NEW_FS_SAMPLER_VIEWS, st_update_fragment_texture)
 ST_STATE(ST_NEW_GS_SAMPLER_VIEWS, st_update_geometry_texture)
 ST_STATE(ST_NEW_TCS_SAMPLER_VIEWS, st_update_tessctrl_texture)
 ST_STATE(ST_NEW_TES_SAMPLER_VIEWS, st_update_tesseval_texture)
 
 /* Non-compute samplers. */
-ST_STATE(ST_NEW_RENDER_SAMPLERS, st_update_sampler) /* depends on update_*_texture for swizzle */
+ST_STATE(ST_NEW_VS_SAMPLERS, st_update_vertex_sampler) /* depends on update_*_texture for swizzle */
+ST_STATE(ST_NEW_TCS_SAMPLERS, st_update_tessctrl_sampler) /* depends on update_*_texture for swizzle */
+ST_STATE(ST_NEW_TES_SAMPLERS, st_update_tesseval_sampler) /* depends on update_*_texture for swizzle */
+ST_STATE(ST_NEW_GS_SAMPLERS, st_update_geometry_sampler) /* depends on update_*_texture for swizzle */
+ST_STATE(ST_NEW_FS_SAMPLERS, st_update_fragment_sampler) /* depends on update_*_texture for swizzle */
 
 ST_STATE(ST_NEW_VS_IMAGES, st_bind_vs_images)
 ST_STATE(ST_NEW_TCS_IMAGES, st_bind_tcs_images)
 ST_STATE(ST_NEW_TES_IMAGES, st_bind_tes_images)
 ST_STATE(ST_NEW_GS_IMAGES, st_bind_gs_images)
 ST_STATE(ST_NEW_FS_IMAGES, st_bind_fs_images)
 
 ST_STATE(ST_NEW_FB_STATE, st_update_framebuffer) /* depends on update_*_texture and bind_*_images */
 ST_STATE(ST_NEW_SAMPLE_MASK, st_update_msaa)
 ST_STATE(ST_NEW_SAMPLE_SHADING, st_update_sample_shading)
@@ -60,16 +64,16 @@ ST_STATE(ST_NEW_GS_SSBOS, st_bind_gs_ssbos)
 
 ST_STATE(ST_NEW_PIXEL_TRANSFER, st_update_pixel_transfer)
 ST_STATE(ST_NEW_TESS_STATE, st_update_tess)
 
 /* this must be done after the vertex program update */
 ST_STATE(ST_NEW_VERTEX_ARRAYS, st_update_array)
 
 /* Compute states must be last. */
 ST_STATE(ST_NEW_CS_STATE, st_update_cp)
 ST_STATE(ST_NEW_CS_SAMPLER_VIEWS, st_update_compute_texture)
-ST_STATE(ST_NEW_CS_SAMPLERS, st_update_sampler) /* depends on update_compute_texture for swizzle */
+ST_STATE(ST_NEW_CS_SAMPLERS, st_update_compute_sampler) /* depends on update_compute_texture for swizzle */
 ST_STATE(ST_NEW_CS_CONSTANTS, st_update_cs_constants)
 ST_STATE(ST_NEW_CS_UBOS, st_bind_cs_ubos)
 ST_STATE(ST_NEW_CS_ATOMICS, st_bind_cs_atomics)
 ST_STATE(ST_NEW_CS_SSBOS, st_bind_cs_ssbos)
 ST_STATE(ST_NEW_CS_IMAGES, st_bind_cs_images)
diff --git a/src/mesa/state_tracker/st_atom_sampler.c b/src/mesa/state_tracker/st_atom_sampler.c
index 661e0f2..9ddc704 100644
--- a/src/mesa/state_tracker/st_atom_sampler.c
+++ b/src/mesa/state_tracker/st_atom_sampler.c
@@ -314,66 +314,124 @@ update_shader_samplers(struct st_context *st,
   }
 
   *num_samplers = MAX2(*num_samplers, extra + 1);
}
 
cso_set_samplers(st->cso_context, shader_stage, *num_samplers, states);
 }
 
 
 static void
-update_samplers(struct st_context *st)
+update_vertex_samplers(struct st_context *st)
 {
const struct gl_context *ctx = st->ctx;
 
update_shader_samplers(st,
-   

[Mesa-dev] [PATCH] mesa: use pre_hashed version of search for the mesa hash table

2017-04-10 Thread Timothy Arceri
The key is just an unsigned int so there is never any real hashing
done.
---
 src/mesa/main/hash.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/hash.c b/src/mesa/main/hash.c
index 670438a..eb25d88 100644
--- a/src/mesa/main/hash.c
+++ b/src/mesa/main/hash.c
@@ -176,21 +176,22 @@ static inline void *
 _mesa_HashLookup_unlocked(struct _mesa_HashTable *table, GLuint key)
 {
const struct hash_entry *entry;
 
assert(table);
assert(key);
 
if (key == DELETED_KEY_VALUE)
   return table->deleted_key_data;
 
-   entry = _mesa_hash_table_search(table->ht, uint_key(key));
+   uint32_t hash = uint_hash(key);
+   entry = _mesa_hash_table_search_pre_hashed(table->ht, hash, uint_key(key));
if (!entry)
   return NULL;
 
return entry->data;
 }
 
 
 /**
  * Lookup an entry in the hash table.
  * 
@@ -340,21 +341,23 @@ _mesa_HashRemove_unlocked(struct _mesa_HashTable *table, GLuint key)
/* have to check this outside of mutex lock */
if (table->InDeleteAll) {
   _mesa_problem(NULL, "_mesa_HashRemove illegally called from "
 "_mesa_HashDeleteAll callback function");
   return;
}
 
if (key == DELETED_KEY_VALUE) {
   table->deleted_key_data = NULL;
} else {
-  entry = _mesa_hash_table_search(table->ht, uint_key(key));
+  uint32_t hash = uint_hash(key);
+  entry = _mesa_hash_table_search_pre_hashed(table->ht, hash,
+ uint_key(key));
   _mesa_hash_table_remove(table->ht, entry);
}
 }
 
 
 void
 _mesa_HashRemoveLocked(struct _mesa_HashTable *table, GLuint key)
 {
_mesa_HashRemove_unlocked(table, key);
 }
-- 
2.9.3



Re: [Mesa-dev] Meson mesademos (Was: [RFC libdrm 0/2] Replace the build system with meson)

2017-04-10 Thread Dylan Baker
Quoting Dylan Baker (2017-04-10 11:50:36)
> Quoting Nirbheek Chauhan (2017-04-10 06:59:02)
> > Hello Jose,
> > 
> > On Mon, Apr 10, 2017 at 5:41 PM, Jose Fonseca  wrote:
> > > I've been trying to get native mingw to build.  (It's still important to
> > > prototype mesademos with MSVC to ensure meson is up to the task, but long
> > > term, I think I'll push for dropping MSVC support from mesademos and
> > > piglit, since MinGW is fine for this sort of sample/test program.)
> > >
> > > However native MinGW fails poorly:
> > >
> > > [78/1058] Static linking library src/util/libutil.a
> > > FAILED: src/util/libutil.a
> > > cmd /c del /f /s /q src/util/libutil.a && ar @src/util/libutil.a.rsp
> > > Invalid switch - "util".
> > >
> > > So the problem here is that meson is passing `/` separator to the cmd.exe
> > > del command, instead of `\`.
> > >
> > > Full log
> > > https://ci.appveyor.com/project/jrfonseca/mesademos/build/job/6rpen94u7yq3q69n
> > >
> > 
> > This was a regression with 0.39, and is already fixed in git master:
> > https://github.com/mesonbuild/meson/pull/1527
> > 
> > It will be in the next release, which is scheduled for April 22. In
> > the meantime, please test with git master.
> > 
> > >
> > > TBH, this is basic windows functionality, and if it can't get it right
> > > then it shakes my belief that it's getting proper windows testing...
> > >
> > 
> > I'm sorry to hear that.
> > 
> > >
> > > I think part of the problem is that per
> > > https://github.com/mesonbuild/meson/blob/master/.appveyor.yml Meson is
> > > only being tested with MSYS (which provides a full-blown POSIX
> > > environment on Windows), and not with plain MinGW.
> > >
> > 
> > Actually, this slipped through the cracks (I broke it!) because we
> > didn't have our CI testing MinGW. Now we do, specifically to catch
> > this sort of stuff: https://github.com/mesonbuild/meson/pull/1346.
> > 
> > All our pull requests are required to pass all CI before they can be
> > merged, and every bug fixed and feature added is required to have a
> > new test case for it, so I expect the situation will not regress
> > again.
> > 
> > Our CI is fairly comprehensive -- MSVC 2010, 2015, 2017, MinGW, Cygwin
> > on just Windows and getting better every day. The biggest hole in it
> > right now is BSD, and we would be extremely grateful if someone could
> > help us with that too!
> > 
> > > IMHO, MSYS is a hack to get packages that use autotools to build with
> > > MinGW.  Packages that use Windows aware build systems (like Meson is
> > > trying to be) should stay as _far_ as possible from MSYS
> > >
> > 
> > Yes, I agree. MSYS2 in particular is especially broken (the toolchain
> > is buggy and even the python3 shipped with it is crap) and we do not
> > recommend using it at all (although a surprisingly large number of
> > people use its toolchain, so we do support it). If you look closely,
> > we do not use MSYS itself, only MinGW:
> > 
> > https://github.com/mesonbuild/meson/blob/master/.appveyor.yml#L61
> > 
> > The MSYS paths are C:\msys64\usr\bin and the MinGW (toolchain) paths
> > are C:\msys64\mingw??\bin.
> > 
> > And in any case our codepaths for building something with the Ninja
> > backend on MSVC and MinGW are almost identical, and our MSVC CI does
> > not have any POSIX binaries in their path.
> > 
> > I even have all of Glib + dependencies building out of the box with
> > just Meson git + MSVC [https://github.com/centricular/glib/], and my
> > next step is to have all of GStreamer building that way.
> > 
> > Hope this clarifies things!
> > 
> > Cheers,
> > Nirbheek
> 
> Jose,
> 
> I installed meson from git as Nirbheek suggested, and it got the mingw build
> working, and fixed the appveyor build to actually start, although I ran into
> some problems with freeglut I'm not sure if I'll have time to fix today
> (although I'd like to get them fixed). If you pull my branch both the travis
> build will turn completely green, and the MinGW build turns green on appveyor,
> though MSVC still doesn't. My meson branch is based on yours and you should be
> able to apply the changes cleanly.

I have freeglut building, but we're waiting for a patch to land in meson for
getting vs_modules_defs to take generated files (which we need for either glew
or freeglut, I can't remember which off the top of my head) (and nirbheek was so
kind as to review). Assuming that my patch lands today you may be able to get
started on msvc for mesa-demos itself, or we may need to do a little more work
to get freeglut and/or glew building on msvc.

You'll probably want to either pull my meson branch, or at least look at it to
get these fixes. Without the updated freeglut patch, meson will fail to build
when it fails the sha256 check.

Thank you for setting up appveyor support, btw. In lieu of a real windows
install with visual studio it's been very helpful, but I've never figured out
how to configure it for C/C++ 

Re: [Mesa-dev] [PATCH] addrlib: don't use linear aligned when pow2Pad is selected.

2017-04-10 Thread Dave Airlie
On 4 April 2017 at 19:11, Marek Olšák  wrote:
> Why don't you set disableLinearOpt instead?

That seems like the wrong answer.

Can the hardware do mipmaps with the base level in linear aligned format,
but the other levels 1D tiled? If not why does addrlib give me that as a
result?

I'm not sure the user should be setting some other flag to avoid bad addrlib
behaviour.

Surely the linear opt is good for something?

Dave.


[Mesa-dev] [PATCH] i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic.

2017-04-10 Thread Francisco Jerez
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block.  Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.

Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e.  Should have a comparable effect on other platforms.  No
significant regressions.
---
 src/intel/compiler/brw_fs_reg_allocate.cpp | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp b/src/intel/compiler/brw_fs_reg_allocate.cpp
index 5c6f3d4..c981d72 100644
--- a/src/intel/compiler/brw_fs_reg_allocate.cpp
+++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
@@ -806,7 +806,7 @@ emit_spill(const fs_builder &bld, fs_reg src,
 int
 fs_visitor::choose_spill_reg(struct ra_graph *g)
 {
-   float loop_scale = 1.0;
+   float block_scale = 1.0;
float spill_costs[this->alloc.count];
bool no_spill[this->alloc.count];
 
@@ -822,23 +822,32 @@ fs_visitor::choose_spill_reg(struct ra_graph *g)
foreach_block_and_inst(block, fs_inst, inst, cfg) {
   for (unsigned int i = 0; i < inst->sources; i++) {
 if (inst->src[i].file == VGRF)
-spill_costs[inst->src[i].nr] += loop_scale;
+spill_costs[inst->src[i].nr] += block_scale;
   }
 
   if (inst->dst.file == VGRF)
  spill_costs[inst->dst.nr] += DIV_ROUND_UP(inst->size_written, REG_SIZE)
-  * loop_scale;
+  * block_scale;
 
   switch (inst->opcode) {
 
   case BRW_OPCODE_DO:
-loop_scale *= 10;
+block_scale *= 10;
 break;
 
   case BRW_OPCODE_WHILE:
-loop_scale /= 10;
+block_scale /= 10;
 break;
 
+  case BRW_OPCODE_IF:
+  case BRW_OPCODE_IFF:
+ block_scale *= 0.5;
+ break;
+
+  case BRW_OPCODE_ENDIF:
+ block_scale /= 0.5;
+ break;
+
   case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
 if (inst->src[0].file == VGRF)
 no_spill[inst->src[0].nr] = true;
-- 
2.10.2
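
As a rough illustration of the weighting described in the commit message above (not the actual Mesa code), the sketch below walks a flat instruction list and accumulates per-register spill costs, scaling each use by 10 inside a loop and by 0.5 inside a conditional. The `toy_inst` type, the `TOY_OP_*` opcodes and `accumulate_spill_costs()` are hypothetical names made up for this example; the real pass operates on `fs_inst` and `BRW_OPCODE_*`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified opcodes; the real driver uses BRW_OPCODE_*. */
enum toy_opcode { TOY_OP_ALU, TOY_OP_DO, TOY_OP_WHILE, TOY_OP_IF, TOY_OP_ENDIF };

struct toy_inst {
   enum toy_opcode opcode;
   int reg; /* virtual register read by an ALU instruction, or -1 */
};

/* Accumulate a cost per register, weighting each use by an estimate of how
 * often its enclosing block runs relative to the top level. */
static void
accumulate_spill_costs(const struct toy_inst *insts, size_t count,
                       float *spill_costs, size_t num_regs)
{
   float block_scale = 1.0f;

   for (size_t i = 0; i < num_regs; i++)
      spill_costs[i] = 0.0f;

   for (size_t i = 0; i < count; i++) {
      const struct toy_inst *inst = &insts[i];

      if (inst->opcode == TOY_OP_ALU && inst->reg >= 0) {
         assert((size_t)inst->reg < num_regs);
         spill_costs[inst->reg] += block_scale;
      }

      switch (inst->opcode) {
      case TOY_OP_DO:    block_scale *= 10.0f; break; /* loop body: assume ~10 iterations */
      case TOY_OP_WHILE: block_scale /= 10.0f; break;
      case TOY_OP_IF:    block_scale *= 0.5f;  break; /* branch: assume taken half the time */
      case TOY_OP_ENDIF: block_scale /= 0.5f;  break;
      default:           break;
      }
   }
}
```

With this weighting, a register used only inside an if block at the top level ends up with half the cost of one used once at the top level, so it becomes the cheaper spill candidate.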



Re: [Mesa-dev] [PATCH V3 1/9] mesa: create _mesa_attach_renderbuffer_without_ref() helper

2017-04-10 Thread Timothy Arceri

On 11/04/17 03:11, Brian Paul wrote:

On 04/07/2017 09:21 PM, Timothy Arceri wrote:

This will be used to take ownership of freashly created renderbuffers,
avoiding the need to call the reference function which requires
locking.

V2: dereference any existing fb attachments and actually attach the
 new rb.

v3: split out validation and attachment type/complete setting into
 a shared static function.
---
  src/mesa/main/renderbuffer.c | 43
+++
  src/mesa/main/renderbuffer.h |  5 +
  2 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/src/mesa/main/renderbuffer.c b/src/mesa/main/renderbuffer.c
index 4375b5b..627bdca 100644
--- a/src/mesa/main/renderbuffer.c
+++ b/src/mesa/main/renderbuffer.c
@@ -99,28 +99,24 @@ _mesa_new_renderbuffer(struct gl_context *ctx,
GLuint name)
   * free the object in the end.
   */
  void
  _mesa_delete_renderbuffer(struct gl_context *ctx, struct
gl_renderbuffer *rb)
  {
 mtx_destroy(&rb->Mutex);
 free(rb->Label);
 free(rb);
  }

-
-/**
- * Attach a renderbuffer to a framebuffer.
- * \param bufferName  one of the BUFFER_x tokens
- */
-void
-_mesa_add_renderbuffer(struct gl_framebuffer *fb,
-   gl_buffer_index bufferName, struct
gl_renderbuffer *rb)
+static void
+validate_and_init_renderbuffer_attachment(struct gl_framebuffer *fb,
+  gl_buffer_index bufferName,
+  struct gl_renderbuffer *rb)
  {
 assert(fb);
 assert(rb);
 assert(bufferName < BUFFER_COUNT);

 /* There should be no previous renderbuffer on this attachment
point,
  * with the exception of depth/stencil since the same
renderbuffer may
  * be used for both.
  */
 assert(bufferName == BUFFER_DEPTH ||
@@ -130,20 +126,51 @@ _mesa_add_renderbuffer(struct gl_framebuffer *fb,
 /* winsys vs. user-created buffer cross check */
 if (_mesa_is_user_fbo(fb)) {
assert(rb->Name);
 }
 else {
assert(!rb->Name);
 }

 fb->Attachment[bufferName].Type = GL_RENDERBUFFER_EXT;
 fb->Attachment[bufferName].Complete = GL_TRUE;
+}
+
+
+/**
+ * Attach a renderbuffer to a framebuffer.
+ * \param bufferName  one of the BUFFER_x tokens
+ *
+ * This function avoids adding a reference and is therefore intended
to be
+ * used with a freashly created renderbuffer.


"freshly"



+ */
+void
+_mesa_add_renderbuffer_without_ref(struct gl_framebuffer *fb,
+   gl_buffer_index bufferName,
+   struct gl_renderbuffer *rb)


I see you've already pushed this.


Yes, the previous leak was causing issues in Gnome so I wanted to push 
these in case they were causing problems also.



Still, I'd like to suggest a
different name such as _mesa_own_renderbuffer() that stresses the
transfer of ownership of the renderbuffer.


I pushed the other two trivial suggestions, as for the name I struggled 
a bit with this because we still want it to be obvious that it is a 
variant of _mesa_add_renderbuffer().


How about _mesa_add_and_own_renderbuffer() ??


IMO these would make more sense being called _mesa_attach_* rather than 
add. I can change that also if you agree?


Maybe:

_mesa_attach_and_own_rb()
_mesa_attach_and_reference_rb()

??






+{


If this function should only be used with a "freshly created"
renderbuffer, can we assert that its RefCount is one here?

-Brian


+   validate_and_init_renderbuffer_attachment(fb, bufferName, rb);
+
+   _mesa_reference_renderbuffer(&fb->Attachment[bufferName].Renderbuffer,
+NULL);
+   fb->Attachment[bufferName].Renderbuffer = rb;
+}
+
+/**
+ * Attach a renderbuffer to a framebuffer.
+ * \param bufferName  one of the BUFFER_x tokens
+ */
+void
+_mesa_add_renderbuffer(struct gl_framebuffer *fb,
+   gl_buffer_index bufferName, struct
gl_renderbuffer *rb)
+{
+   validate_and_init_renderbuffer_attachment(fb, bufferName, rb);

_mesa_reference_renderbuffer(&fb->Attachment[bufferName].Renderbuffer,
rb);
  }


  /**
   * Remove the named renderbuffer from the given framebuffer.
   * \param bufferName  one of the BUFFER_x tokens
   */
  void
  _mesa_remove_renderbuffer(struct gl_framebuffer *fb,
diff --git a/src/mesa/main/renderbuffer.h b/src/mesa/main/renderbuffer.h
index aa83120..a6f1439 100644
--- a/src/mesa/main/renderbuffer.h
+++ b/src/mesa/main/renderbuffer.h
@@ -40,20 +40,25 @@ struct gl_renderbuffer;
  extern void
  _mesa_init_renderbuffer(struct gl_renderbuffer *rb, GLuint name);

  extern struct gl_renderbuffer *
  _mesa_new_renderbuffer(struct gl_context *ctx, GLuint name);

  extern void
  _mesa_delete_renderbuffer(struct gl_context *ctx, struct
gl_renderbuffer *rb);

  extern void
+_mesa_add_renderbuffer_without_ref(struct gl_framebuffer *fb,
+   gl_buffer_index bufferName,
+   struct gl_renderbuffer *rb);
+
+extern 

[Mesa-dev] [PATCH 2/4] radv: Rename query pipeline/set layout.

2017-04-10 Thread Bas Nieuwenhuizen
For using them with both occlusion and pipeline statistics queries.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/amd/vulkan/radv_private.h |  4 ++--
 src/amd/vulkan/radv_query.c   | 22 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index a03c24c24ac..b54a2537c8a 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -440,8 +440,8 @@ struct radv_meta_state {
} buffer;
 
struct {
-   VkDescriptorSetLayout occlusion_query_ds_layout;
-   VkPipelineLayout occlusion_query_p_layout;
+   VkDescriptorSetLayout ds_layout;
+   VkPipelineLayout p_layout;
VkPipeline occlusion_query_pipeline;
} query;
 };
diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
index 97b1ae6ac4e..cfe16a9d0e2 100644
--- a/src/amd/vulkan/radv_query.c
+++ b/src/amd/vulkan/radv_query.c
@@ -302,14 +302,14 @@ VkResult radv_device_init_meta_query_state(struct radv_device *device)
result = radv_CreateDescriptorSetLayout(radv_device_to_handle(device),
&occlusion_ds_create_info,
&device->meta_state.alloc,
-   &device->meta_state.query.occlusion_query_ds_layout);
+   &device->meta_state.query.ds_layout);
if (result != VK_SUCCESS)
goto fail;
 
VkPipelineLayoutCreateInfo occlusion_pl_create_info = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.setLayoutCount = 1,
-   .pSetLayouts = &device->meta_state.query.occlusion_query_ds_layout,
+   .pSetLayouts = &device->meta_state.query.ds_layout,
.pushConstantRangeCount = 1,
.pPushConstantRanges = &(VkPushConstantRange){VK_SHADER_STAGE_COMPUTE_BIT, 0, 8},
};
@@ -317,7 +317,7 @@ VkResult radv_device_init_meta_query_state(struct radv_device *device)
result = radv_CreatePipelineLayout(radv_device_to_handle(device),
  &occlusion_pl_create_info,
  &device->meta_state.alloc,
- &device->meta_state.query.occlusion_query_p_layout);
+ &device->meta_state.query.p_layout);
if (result != VK_SUCCESS)
goto fail;
 
@@ -333,7 +333,7 @@ VkResult radv_device_init_meta_query_state(struct radv_device *device)
.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO,
.stage = occlusion_pipeline_shader_stage,
.flags = 0,
-   .layout = device->meta_state.query.occlusion_query_p_layout,
+   .layout = device->meta_state.query.p_layout,
};
 
result = radv_CreateComputePipelines(radv_device_to_handle(device),
@@ -357,14 +357,14 @@ void radv_device_finish_meta_query_state(struct radv_device *device)
 
device->meta_state.query.occlusion_query_pipeline,
 &device->meta_state.alloc);
 
-   if (device->meta_state.query.occlusion_query_p_layout)
+   if (device->meta_state.query.p_layout)
radv_DestroyPipelineLayout(radv_device_to_handle(device),
-  device->meta_state.query.occlusion_query_p_layout,
+  device->meta_state.query.p_layout,
   &device->meta_state.alloc);
 
-   if (device->meta_state.query.occlusion_query_ds_layout)
+   if (device->meta_state.query.ds_layout)
radv_DestroyDescriptorSetLayout(radv_device_to_handle(device),
-   device->meta_state.query.occlusion_query_ds_layout,
+   device->meta_state.query.ds_layout,
&device->meta_state.alloc);
 }
 
@@ -383,7 +383,7 @@ static void occlusion_query_shader(struct radv_cmd_buffer *cmd_buffer,
radv_meta_save_compute(_state, cmd_buffer, 4);
 
radv_temp_descriptor_set_create(device, cmd_buffer,
-   device->meta_state.query.occlusion_query_ds_layout,
+   device->meta_state.query.ds_layout,
);
 
struct radv_buffer dst_buffer = {
@@ -435,7 +435,7 @@ static void occlusion_query_shader(struct radv_cmd_buffer *cmd_buffer,
 
radv_CmdBindDescriptorSets(radv_cmd_buffer_to_handle(cmd_buffer),
   VK_PIPELINE_BIND_POINT_COMPUTE,
-  device->meta_state.query.occlusion_query_p_layout, 0, 1,
+  device->meta_state.query.p_layout, 0, 1,
 

[Mesa-dev] [PATCH 3/4] radv: Let count be dynamic in radv_break_on_count.

2017-04-10 Thread Bas Nieuwenhuizen
Signed-off-by: Bas Nieuwenhuizen 
---
 src/amd/vulkan/radv_query.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
index cfe16a9d0e2..dc1844adb51 100644
--- a/src/amd/vulkan/radv_query.c
+++ b/src/amd/vulkan/radv_query.c
@@ -51,12 +51,12 @@ static unsigned get_max_db(struct radv_device *device)
return num_db;
 }
 
-static void radv_break_on_count(nir_builder *b, nir_variable *var, int count)
+static void radv_break_on_count(nir_builder *b, nir_variable *var, nir_ssa_def *count)
 {
nir_ssa_def *counter = nir_load_var(b, var);
 
nir_if *if_stmt = nir_if_create(b->shader);
-   if_stmt->condition = nir_src_for_ssa(nir_uge(b, counter, nir_imm_int(b, count)));
+   if_stmt->condition = nir_src_for_ssa(nir_uge(b, counter, count));
nir_cf_node_insert(b->cursor, &if_stmt->cf_node);
 
b->cursor = nir_after_cf_list(&if_stmt->then_list);
@@ -175,7 +175,7 @@ build_occlusion_query_shader(struct radv_device *device) {
b.cursor = nir_after_cf_list(_loop->body);
 
nir_ssa_def *current_outer_count = nir_load_var(&b, outer_counter);
-   radv_break_on_count(&b, outer_counter, db_count);
+   radv_break_on_count(&b, outer_counter, nir_imm_int(&b, db_count));
 
nir_ssa_def *load_offset = nir_imul(&b, current_outer_count, nir_imm_int(&b, 16));
load_offset = nir_iadd(&b, input_base, load_offset);
-- 
2.12.2



[Mesa-dev] [PATCH 4/4] radv: Implement pipeline statistics queries.

2017-04-10 Thread Bas Nieuwenhuizen
The devil is in the shader again, otherwise this is
fairly straightforward.

The CTS contains no pipeline statistics copy to buffer
testcases, so I did a basic smoketest.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/amd/vulkan/radv_device.c  |   2 +-
 src/amd/vulkan/radv_private.h |   2 +
 src/amd/vulkan/radv_query.c   | 414 +++---
 3 files changed, 392 insertions(+), 26 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 9e8faa3da9a..5f14394196a 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -483,7 +483,7 @@ void radv_GetPhysicalDeviceFeatures(
.textureCompressionASTC_LDR   = false,
.textureCompressionBC = true,
.occlusionQueryPrecise= true,
-   .pipelineStatisticsQuery  = false,
+   .pipelineStatisticsQuery  = true,
.vertexPipelineStoresAndAtomics   = true,
.fragmentStoresAndAtomics = true,
.shaderTessellationAndGeometryPointSize   = true,
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index b54a2537c8a..2cb8cdd8d84 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -443,6 +443,7 @@ struct radv_meta_state {
VkDescriptorSetLayout ds_layout;
VkPipelineLayout p_layout;
VkPipeline occlusion_query_pipeline;
+   VkPipeline pipeline_statistics_query_pipeline;
} query;
 };
 
@@ -1379,6 +1380,7 @@ struct radv_query_pool {
uint32_t availability_offset;
char *ptr;
VkQueryType type;
+   uint32_t pipeline_stats_mask;
 };
 
 VkResult
diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
index dc1844adb51..2de484224bc 100644
--- a/src/amd/vulkan/radv_query.c
+++ b/src/amd/vulkan/radv_query.c
@@ -35,6 +35,9 @@
 #include "radv_cs.h"
 #include "sid.h"
 
+
+static const unsigned pipeline_statistics_indices[] = {7, 6, 3, 4, 5, 2, 1, 0, 8, 9, 10};
+
 static unsigned get_max_db(struct radv_device *device)
 {
unsigned num_db = device->physical_device->rad_info.num_render_backends;
@@ -269,14 +272,259 @@ build_occlusion_query_shader(struct radv_device *device) {
return b.shader;
 }
 
+static nir_shader *
+build_pipeline_statistics_query_shader(struct radv_device *device) {
+   /* the shader this builds is roughly
+*
+* push constants {
+*  uint32_t flags;
+*  uint32_t dst_stride;
+*  uint32_t stats_mask;
+*  uint32_t avail_offset;
+* };
+*
+* uint32_t src_stride = 11 * 16;
+*
+* location(binding = 0) buffer dst_buf;
+* location(binding = 1) buffer src_buf;
+*
+* void main() {
+*  uint64_t src_offset = src_stride * global_id.x;
+*  uint64_t dst_base = dst_stride * global_id.x;
+*  uint64_t dst_offset = dst_base;
+*  uint32_t elem_size = flags & VK_QUERY_RESULT_64_BIT ? 8 : 4;
+*  uint32_t elem_count = stats_mask >> 16;
+*  uint32_t available = src_buf[avail_offset + 4 * global_id.x];
+*  if (flags & VK_QUERY_RESULT_WITH_AVAILABILITY_BIT) {
+*  dst_buf[dst_offset + elem_count * elem_size] = available;
+*  }
+*  if (available) {
+*  // repeat 11 times:
+*  if (stats_mask & (1 << 0)) {
+*  uint64_t start = src_buf[src_offset + 8 * indices[0]];
+*  uint64_t end = src_buf[src_offset + 8 * indices[0] + 0x58];
+*  uint64_t result = end - start;
+*  if (flags & VK_QUERY_RESULT_64_BIT)
+*  dst_buf[dst_offset] = result;
+*  else
+*  dst_buf[dst_offset] = (uint32_t)result.
+*  dst_offset += elem_size;
+*  }
+*  } else if (flags & VK_QUERY_RESULT_PARTIAL_BIT) {
+*  // Set everything to 0 as we don't know what is valid.
+*  for (int i = 0; i < elem_count; ++i)
+*  dst_buf[dst_base + elem_size * i] = 0;
+*  }
+* }
+*/
+   nir_builder b;
+   nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
+   b.shader->info->name = ralloc_strdup(b.shader, "pipeline_statistics_query");
+   b.shader->info->cs.local_size[0] = 64;
+   b.shader->info->cs.local_size[1] = 1;
+   b.shader->info->cs.local_size[2] = 1;
+
+   nir_variable *output_offset = nir_local_variable_create(b.impl, glsl_int_type(), "output_offset");
+
+
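
For readers who find the pseudocode in the shader comment above easier to follow as plain CPU code, here is a minimal C sketch of what one invocation does for a single query slot. It is only an approximation under stated assumptions: the `RESULT_*` constants are local copies of the Vulkan `VkQueryResultFlagBits` values, `resolve_pipeline_stats()` and `store_elem()` are hypothetical helper names, the availability word is passed in rather than read from the buffer, and it is written at the same width as the results. The sample layout (11 begin values followed by 11 end values, 0x58 bytes apart) and the index table are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the VkQueryResultFlagBits values. */
#define RESULT_64_BIT                0x1u
#define RESULT_WITH_AVAILABILITY_BIT 0x4u
#define RESULT_PARTIAL_BIT           0x8u

#define NUM_STATS  11
#define END_OFFSET 0x58 /* 11 * 8: end samples follow the begin samples */

static const unsigned stat_indices[NUM_STATS] = {7, 6, 3, 4, 5, 2, 1, 0, 8, 9, 10};

/* Store one element of elem_size (4 or 8) bytes at dst. */
static void
store_elem(uint8_t *dst, uint64_t value, uint32_t elem_size)
{
   if (elem_size == 8) {
      memcpy(dst, &value, 8);
   } else {
      uint32_t v32 = (uint32_t)value;
      memcpy(dst, &v32, 4);
   }
}

/* Resolve one query slot. "src" points at the slot's 11 begin samples
 * followed by its 11 end samples; "dst" points at the slot's output.
 * The upper 16 bits of stats_mask carry the number of enabled statistics. */
static void
resolve_pipeline_stats(const uint8_t *src, uint32_t available, uint8_t *dst,
                       uint32_t flags, uint32_t stats_mask)
{
   uint32_t elem_size = (flags & RESULT_64_BIT) ? 8 : 4;
   uint32_t elem_count = stats_mask >> 16;
   uint8_t *out = dst;

   /* The availability word goes after the last result element. */
   if (flags & RESULT_WITH_AVAILABILITY_BIT)
      store_elem(dst + elem_count * elem_size, available, elem_size);

   if (available) {
      for (unsigned i = 0; i < NUM_STATS; i++) {
         uint64_t start, end;

         if (!(stats_mask & (1u << i)))
            continue;

         memcpy(&start, src + 8 * stat_indices[i], 8);
         memcpy(&end, src + 8 * stat_indices[i] + END_OFFSET, 8);
         store_elem(out, end - start, elem_size);
         out += elem_size;
      }
   } else if (flags & RESULT_PARTIAL_BIT) {
      /* Nothing is known to be valid, so zero every requested element. */
      for (uint32_t i = 0; i < elem_count; i++)
         store_elem(dst + elem_size * i, 0, elem_size);
   }
}
```

The results are packed densely in the order of the enabled bits, which is why the output pointer only advances when a statistic is actually selected.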

[Mesa-dev] [PATCH 1/4] radv: Use VK_WHOLE_SIZE for the query buffer bindings.

2017-04-10 Thread Bas Nieuwenhuizen
The buffer sizes are specified just a few lines earlier, so don't
repeat ourselves.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/amd/vulkan/radv_query.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
index 86be85a5369..97b1ae6ac4e 100644
--- a/src/amd/vulkan/radv_query.c
+++ b/src/amd/vulkan/radv_query.c
@@ -411,7 +411,7 @@ static void occlusion_query_shader(struct radv_cmd_buffer *cmd_buffer,
  .pBufferInfo = 
&(VkDescriptorBufferInfo) {
.buffer = 
radv_buffer_to_handle(_buffer),
.offset = 0,
-   .range = dst_stride * count
+   .range = VK_WHOLE_SIZE
  }
  },
  {
@@ -424,7 +424,7 @@ static void occlusion_query_shader(struct radv_cmd_buffer *cmd_buffer,
  .pBufferInfo = 
&(VkDescriptorBufferInfo) {
.buffer = 
radv_buffer_to_handle(_buffer),
.offset = 0,
-   .range = stride * count
+   .range = VK_WHOLE_SIZE
  }
  }
  }, 0, NULL);
-- 
2.12.2



Re: [Mesa-dev] [PATCH 1/3] glsl: use the BA1 macro for textureCubeArrayShadow()

2017-04-10 Thread Timothy Arceri

Series:

Reviewed-by: Timothy Arceri 

On 11/04/17 03:23, Samuel Pitoiset wrote:

For both consistency and new bindless sampler types.

Signed-off-by: Samuel Pitoiset 
---
 src/compiler/glsl/builtin_functions.cpp | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/compiler/glsl/builtin_functions.cpp b/src/compiler/glsl/builtin_functions.cpp
index d902a91a77..0ab7875295 100644
--- a/src/compiler/glsl/builtin_functions.cpp
+++ b/src/compiler/glsl/builtin_functions.cpp
@@ -847,7 +847,7 @@ private:
const glsl_type *sampler_type,
const glsl_type *coord_type,
int flags = 0);
-   B0(textureCubeArrayShadow);
+   BA1(textureCubeArrayShadow);
ir_function_signature *_texelFetch(builtin_available_predicate avail,
   const glsl_type *return_type,
   const glsl_type *sampler_type,
@@ -1839,7 +1839,7 @@ builtin_builder::create_builtins()
 /* samplerCubeArrayShadow is special; it has an extra parameter
  * for the shadow comparator since there is no vec5 type.
  */
-_textureCubeArrayShadow(),
+_textureCubeArrayShadow(texture_cube_map_array, glsl_type::samplerCubeArrayShadow_type),

 _texture(ir_tex, v130, glsl_type::vec4_type,  glsl_type::sampler2DRect_type,  glsl_type::vec2_type),
 _texture(ir_tex, v130, glsl_type::ivec4_type, glsl_type::isampler2DRect_type, glsl_type::vec2_type),
@@ -5064,12 +5064,13 @@ builtin_builder::_texture(ir_texture_opcode opcode,
 }

 ir_function_signature *
-builtin_builder::_textureCubeArrayShadow()
+builtin_builder::_textureCubeArrayShadow(builtin_available_predicate avail,
+ const glsl_type *sampler_type)
 {
-   ir_variable *s = in_var(glsl_type::samplerCubeArrayShadow_type, "sampler");
+   ir_variable *s = in_var(sampler_type, "sampler");
ir_variable *P = in_var(glsl_type::vec4_type, "P");
ir_variable *compare = in_var(glsl_type::float_type, "compare");
-   MAKE_SIG(glsl_type::float_type, texture_cube_map_array, 3, s, P, compare);
+   MAKE_SIG(glsl_type::float_type, avail, 3, s, P, compare);

ir_texture *tex = new(mem_ctx) ir_texture(ir_tex);
tex->set_sampler(var_ref(s), glsl_type::float_type);




Re: [Mesa-dev] [PATCH] nv50/ir: Change chipset constants to ISA constants.

2017-04-10 Thread Samuel Pitoiset

Karol told me that over IRC. Introducing ->getIsa() looks good to me.

On 04/11/2017 01:01 AM, Ilia Mirkin wrote:

I wanted to flip things over and use smxx notation...

On Apr 10, 2017 6:20 PM, "Samuel Pitoiset" wrote:


Not sure why you get confused here. The chipset names are globally
consistent inside the codegen part and we never use SMxx. Maybe add
a comment like:

#define NVISA_GK104_CHIPSET    0xe0 /* SM30 */

If you really need this?

On 04/10/2017 11:41 PM, Matthew Mondazzi wrote:

Define references to chipset did not actually use chipset,
leading to confusion. More relevant ISA constants put in place
of chipset compares.

Signed-off-by: Matthew Mondazzi
---
   .../drivers/nouveau/codegen/nv50_ir_driver.h   |  7 ++--
   .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 24
+--
   .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 46
+++---
   .../nouveau/codegen/nv50_ir_target_nvc0.cpp|  6 +--
   4 files changed, 42 insertions(+), 41 deletions(-)

diff --git
a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index e7d840d..76c815e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -75,9 +75,10 @@ struct nv50_ir_prog_symbol
  uint32_t offset;
   };
   -#define NVISA_GK104_CHIPSET    0xe0
-#define NVISA_GK20A_CHIPSET    0xea
-#define NVISA_GM107_CHIPSET    0x110
+#define NVISA_SM30   0xe0
+#define NVISA_SM35   0xea
+#define NVISA_SM50   0x110
+
 struct nv50_ir_prog_info
   {
diff --git
a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5467447..ed29661 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -806,7 +806,7 @@ CodeEmitterNVC0::emitSHLADD(const
Instruction *i)
   void
   CodeEmitterNVC0::emitMADSP(const Instruction *i)
   {
-   assert(targ->getChipset() >= NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() >= NVISA_SM30);
emitForm_A(i, HEX64(, 0003));
   @@ -1852,7 +1852,7 @@ CodeEmitterNVC0::emitSTORE(const
Instruction *i)
  case FILE_MEMORY_LOCAL:  opc = 0xc800; break;
  case FILE_MEMORY_SHARED:
 if (i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
- if (targ->getChipset() >= NVISA_GK104_CHIPSET)
+ if (targ->getChipset() >= NVISA_SM30)
   opc = 0xb800;
else
   opc = 0xcc00;
@@ -1868,7 +1868,7 @@ CodeEmitterNVC0::emitSTORE(const
Instruction *i)
  code[0] = 0x0005;
  code[1] = opc;
   -   if (targ->getChipset() >= NVISA_GK104_CHIPSET) {
+   if (targ->getChipset() >= NVISA_SM30) {
 // Unlocked store on shared memory can fail.
 if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
 i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
@@ -1901,7 +1901,7 @@ CodeEmitterNVC0::emitLOAD(const
Instruction *i)
  case FILE_MEMORY_LOCAL:  opc = 0xc000; break;
  case FILE_MEMORY_SHARED:
 if (i->subOp == NV50_IR_SUBOP_LOAD_LOCKED) {
- if (targ->getChipset() >= NVISA_GK104_CHIPSET)
+ if (targ->getChipset() >= NVISA_SM30)
   opc = 0xa800;
else
   opc = 0xc400;
@@ -1944,7 +1944,7 @@ CodeEmitterNVC0::emitLOAD(const
Instruction *i)
 code[0] |= 63 << 14;
if (p >= 0) {
-  if (targ->getChipset() >= NVISA_GK104_CHIPSET)
+  if (targ->getChipset() >= NVISA_SM30)
defId(i->def(p), 8);
 else
defId(i->def(p), 32 + 18);
@@ -2362,7 +2362,7 @@ CodeEmitterNVC0::emitSUSTGx(const
TexInstruction *i)
   void
   CodeEmitterNVC0::emitSUAddr(const TexInstruction *i)
   {
-   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() < NVISA_SM30);
if (i->tex.rIndirectSrc < 0) {
 code[1] |= 0x4000;
@@ -2375,7 +2375,7 @@ CodeEmitterNVC0::emitSUAddr(const
TexInstruction *i)
   void
   

Re: [Mesa-dev] [PATCH] nv50/ir: Change chipset constants to ISA constants.

2017-04-10 Thread Ilia Mirkin
I wanted to flip things over and use smxx notation...

On Apr 10, 2017 6:20 PM, "Samuel Pitoiset" 
wrote:

> Not sure why you get confused here. The chipset names are globally
> consistent inside the codegen part and we never use SMxx. Maybe add a
> comment like:
>
> #define NVISA_GK104_CHIPSET    0xe0 /* SM30 */
>
> If you really need this?
>
> On 04/10/2017 11:41 PM, Matthew Mondazzi wrote:
>
>> Define references to chipset did not actually use chipset, leading to
>> confusion. More relevant ISA constants put in place of chipset compares.
>>
>> Signed-off-by: Matthew Mondazzi 
>> ---
>>   .../drivers/nouveau/codegen/nv50_ir_driver.h   |  7 ++--
>>   .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 24 +--
>>   .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 46
>> +++---
>>   .../nouveau/codegen/nv50_ir_target_nvc0.cpp|  6 +--
>>   4 files changed, 42 insertions(+), 41 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
>> index e7d840d..76c815e 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
>> @@ -75,9 +75,10 @@ struct nv50_ir_prog_symbol
>>  uint32_t offset;
>>   };
>>   -#define NVISA_GK104_CHIPSET    0xe0
>> -#define NVISA_GK20A_CHIPSET    0xea
>> -#define NVISA_GM107_CHIPSET    0x110
>> +#define NVISA_SM30   0xe0
>> +#define NVISA_SM35   0xea
>> +#define NVISA_SM50   0x110
>> +
>> struct nv50_ir_prog_info
>>   {
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>> index 5467447..ed29661 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>> @@ -806,7 +806,7 @@ CodeEmitterNVC0::emitSHLADD(const Instruction *i)
>>   void
>>   CodeEmitterNVC0::emitMADSP(const Instruction *i)
>>   {
>> -   assert(targ->getChipset() >= NVISA_GK104_CHIPSET);
>> +   assert(targ->getChipset() >= NVISA_SM30);
>>emitForm_A(i, HEX64(, 0003));
>>   @@ -1852,7 +1852,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
>>  case FILE_MEMORY_LOCAL:  opc = 0xc800; break;
>>  case FILE_MEMORY_SHARED:
>> if (i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
>> - if (targ->getChipset() >= NVISA_GK104_CHIPSET)
>> + if (targ->getChipset() >= NVISA_SM30)
>>   opc = 0xb800;
>>else
>>   opc = 0xcc00;
>> @@ -1868,7 +1868,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
>>  code[0] = 0x0005;
>>  code[1] = opc;
>>   -   if (targ->getChipset() >= NVISA_GK104_CHIPSET) {
>> +   if (targ->getChipset() >= NVISA_SM30) {
>> // Unlocked store on shared memory can fail.
>> if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
>> i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
>> @@ -1901,7 +1901,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
>>  case FILE_MEMORY_LOCAL:  opc = 0xc000; break;
>>  case FILE_MEMORY_SHARED:
>> if (i->subOp == NV50_IR_SUBOP_LOAD_LOCKED) {
>> - if (targ->getChipset() >= NVISA_GK104_CHIPSET)
>> + if (targ->getChipset() >= NVISA_SM30)
>>   opc = 0xa800;
>>else
>>   opc = 0xc400;
>> @@ -1944,7 +1944,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
>> code[0] |= 63 << 14;
>>if (p >= 0) {
>> -  if (targ->getChipset() >= NVISA_GK104_CHIPSET)
>> +  if (targ->getChipset() >= NVISA_SM30)
>>defId(i->def(p), 8);
>> else
>>defId(i->def(p), 32 + 18);
>> @@ -2362,7 +2362,7 @@ CodeEmitterNVC0::emitSUSTGx(const TexInstruction
>> *i)
>>   void
>>   CodeEmitterNVC0::emitSUAddr(const TexInstruction *i)
>>   {
>> -   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
>> +   assert(targ->getChipset() < NVISA_SM30);
>>if (i->tex.rIndirectSrc < 0) {
>> code[1] |= 0x4000;
>> @@ -2375,7 +2375,7 @@ CodeEmitterNVC0::emitSUAddr(const TexInstruction
>> *i)
>>   void
>>   CodeEmitterNVC0::emitSUDim(const TexInstruction *i)
>>   {
>> -   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
>> +   assert(targ->getChipset() < NVISA_SM30);
>>code[1] |= (i->tex.target.getDim() - 1) << 12;
>>  if (i->tex.target.isArray() || i->tex.target.isCube() ||
>> @@ -2390,7 +2390,7 @@ CodeEmitterNVC0::emitSUDim(const TexInstruction *i)
>>   void
>>   CodeEmitterNVC0::emitSULEA(const TexInstruction *i)
>>   {
>> -   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
>> +   assert(targ->getChipset() < NVISA_SM30);
>>code[0] = 0x5;
>>  code[1] = 0xf000;
>> @@ -2413,7 +2413,7 @@ CodeEmitterNVC0::emitSULEA(const TexInstruction *i)
>>   void
>>   CodeEmitterNVC0::emitSULDB(const TexInstruction *i)
>>   {
>> -   

Re: [Mesa-dev] [PATCH] anv/pass: Initialize anv_pass::subpass_attachments

2017-04-10 Thread Jason Ekstrand
On Mon, Apr 10, 2017 at 3:13 PM, Nanley Chery  wrote:

> On Mon, Apr 10, 2017 at 01:31:52PM -0700, Nanley Chery wrote:
> > Fixes 0039d0cf278 "anv/pass: Use anv_multialloc for allocating the
> anv_pass"
> >
> > Signed-off-by: Nanley Chery 
> > ---
> >  src/intel/vulkan/anv_pass.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
>
> I rescind my patch submission. This field has no users, so we should
> probably just delete it.
>

I thought I already did. :(  Yes, please delete unused fields.


Re: [Mesa-dev] [PATCH 0/5] use atomics for reference counting

2017-04-10 Thread Timothy Arceri

Hi,

I've been looking into this recently also. Unfortunately I don't think 
these will get applied as is.


These changes have been submitted before but were rejected because they make 
existing race conditions worse. We really need to fix those first, and I 
think we are going to need some multi-threaded piglit tests to 
test some of this.


Also I think some of the locking (e.g. arrayobj, pipelineobj) can be 
dropped if we drop support for the GLX_MESA_multithread_makecurrent 
extension (which I believe we are planning to drop).


Tim

On 11/04/17 06:08, Bartosz Tomczyk wrote:

Bartosz Tomczyk (5):
  mesa/arrayobj: use atomics for reference counting
  mesa/pipelineobj: use atomics for reference counting
  mesa/renderbuffer: use atomics for reference counting
  mesa/samplerobj: use atomics for reference counting
  mesa/texobj: use atomics for reference counting

 src/mesa/main/arrayobj.c | 16 
 src/mesa/main/fbobject.c |  1 -
 src/mesa/main/mtypes.h   |  7 ---
 src/mesa/main/pipelineobj.c  | 16 
 src/mesa/main/renderbuffer.c | 15 +++
 src/mesa/main/samplerobj.c   | 16 
 src/mesa/main/shaderapi.c|  2 --
 src/mesa/main/texobj.c   | 19 ---
 8 files changed, 19 insertions(+), 73 deletions(-)
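
For context, the general pattern the series is after (replacing a mutex-protected counter with an atomic one) looks roughly like the sketch below. This is a generic C11 illustration, not the code from the series: `toy_object` and its helpers are hypothetical names, and C11 `<stdatomic.h>` stands in for whatever atomic helpers Mesa itself would use. It also does nothing about the existing races mentioned above, such as one thread looking an object up while another drops the last reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct toy_object {
   atomic_int ref_count;
   int payload;
};

/* A freshly created object starts with one reference, owned by the caller. */
static struct toy_object *
toy_object_create(int payload)
{
   struct toy_object *obj = malloc(sizeof(*obj));
   if (!obj)
      return NULL;
   atomic_init(&obj->ref_count, 1);
   obj->payload = payload;
   return obj;
}

/* Taking a reference needs no ordering beyond atomicity of the increment. */
static void
toy_object_ref(struct toy_object *obj)
{
   atomic_fetch_add_explicit(&obj->ref_count, 1, memory_order_relaxed);
}

/* Returns true when the last reference was dropped and obj was freed. */
static bool
toy_object_unref(struct toy_object *obj)
{
   if (atomic_fetch_sub_explicit(&obj->ref_count, 1,
                                 memory_order_acq_rel) == 1) {
      free(obj);
      return true;
   }
   return false;
}
```

The counter itself becomes safe without a lock, but any caller that reads a shared pointer and then calls `toy_object_ref()` still needs its own protection against the object being freed in between.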




Re: [Mesa-dev] [PATCH 00/53] i965: Eat libdrm_intel for breakfast

2017-04-10 Thread Chad Versace
On Tue 04 Apr 2017, Kenneth Graunke wrote:

> This series imports libdrm_intel into the i965 driver, hacks and
> slashes it down to size, and greatly simplifies our relocation
> handling.

You did it! IT'S FINALLY HAPPENING!!! Thanks for taking the leap.

> https://cgit.freedesktop.org/~kwg/mesa/log/?h=bacondrm

Bacon. Yum...

> This series begins making incremental progress towards a better future
> by importing libdrm_intel, and adjusting it to fit our needs.  libdrm
> provides some fairly foundational pieces of the driver, so it's not
> easy to move away from it in one swoop.  The series does not yet solve
> most of the problems, but it does cut 85% of the code out, and removes
> ABI-guarantee problems, which should make it much easier to work with.

This is a great start.

> I apologize that it may be difficult to review: most people aren't
> familiar with this code (I learned a lot myself), and it's kind of
> huge.  I tried.

I've poked inside libdrm enough times that I volunteer to do some
reviewing.


Re: [Mesa-dev] [PATCH] nv50/ir: Change chipset constants to ISA constants.

2017-04-10 Thread Samuel Pitoiset
Not sure why you get confused here. The chipset names are globally 
consistent inside the codegen part and we never use SMxx. Maybe add a 
comment like:


#define NVISA_GK104_CHIPSET 0xe0 /* SM30 */

If you really need this?

On 04/10/2017 11:41 PM, Matthew Mondazzi wrote:

The defines that referenced chipsets did not actually use the chipset,
leading to confusion. More relevant ISA constants are put in place of the
chipset compares.

Signed-off-by: Matthew Mondazzi 
---
  .../drivers/nouveau/codegen/nv50_ir_driver.h   |  7 ++--
  .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 24 +--
  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 46 +++---
  .../nouveau/codegen/nv50_ir_target_nvc0.cpp|  6 +--
  4 files changed, 42 insertions(+), 41 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index e7d840d..76c815e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -75,9 +75,10 @@ struct nv50_ir_prog_symbol
 uint32_t offset;
  };
  
-#define NVISA_GK104_CHIPSET    0xe0
-#define NVISA_GK20A_CHIPSET    0xea
-#define NVISA_GM107_CHIPSET    0x110
+#define NVISA_SM30   0xe0
+#define NVISA_SM35   0xea
+#define NVISA_SM50   0x110
+
  
  struct nv50_ir_prog_info

  {
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5467447..ed29661 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -806,7 +806,7 @@ CodeEmitterNVC0::emitSHLADD(const Instruction *i)
  void
  CodeEmitterNVC0::emitMADSP(const Instruction *i)
  {
-   assert(targ->getChipset() >= NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() >= NVISA_SM30);
  
 emitForm_A(i, HEX64(, 0003));
  
@@ -1852,7 +1852,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)

 case FILE_MEMORY_LOCAL:  opc = 0xc800; break;
 case FILE_MEMORY_SHARED:
if (i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
- if (targ->getChipset() >= NVISA_GK104_CHIPSET)
+ if (targ->getChipset() >= NVISA_SM30)
  opc = 0xb800;
   else
  opc = 0xcc00;
@@ -1868,7 +1868,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
 code[0] = 0x0005;
 code[1] = opc;
  
-   if (targ->getChipset() >= NVISA_GK104_CHIPSET) {

+   if (targ->getChipset() >= NVISA_SM30) {
// Unlocked store on shared memory can fail.
if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
@@ -1901,7 +1901,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
 case FILE_MEMORY_LOCAL:  opc = 0xc000; break;
 case FILE_MEMORY_SHARED:
if (i->subOp == NV50_IR_SUBOP_LOAD_LOCKED) {
- if (targ->getChipset() >= NVISA_GK104_CHIPSET)
+ if (targ->getChipset() >= NVISA_SM30)
  opc = 0xa800;
   else
  opc = 0xc400;
@@ -1944,7 +1944,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
code[0] |= 63 << 14;
  
 if (p >= 0) {

-  if (targ->getChipset() >= NVISA_GK104_CHIPSET)
+  if (targ->getChipset() >= NVISA_SM30)
   defId(i->def(p), 8);
else
   defId(i->def(p), 32 + 18);
@@ -2362,7 +2362,7 @@ CodeEmitterNVC0::emitSUSTGx(const TexInstruction *i)
  void
  CodeEmitterNVC0::emitSUAddr(const TexInstruction *i)
  {
-   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() < NVISA_SM30);
  
 if (i->tex.rIndirectSrc < 0) {

code[1] |= 0x4000;
@@ -2375,7 +2375,7 @@ CodeEmitterNVC0::emitSUAddr(const TexInstruction *i)
  void
  CodeEmitterNVC0::emitSUDim(const TexInstruction *i)
  {
-   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() < NVISA_SM30);
  
 code[1] |= (i->tex.target.getDim() - 1) << 12;

 if (i->tex.target.isArray() || i->tex.target.isCube() ||
@@ -2390,7 +2390,7 @@ CodeEmitterNVC0::emitSUDim(const TexInstruction *i)
  void
  CodeEmitterNVC0::emitSULEA(const TexInstruction *i)
  {
-   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() < NVISA_SM30);
  
 code[0] = 0x5;

 code[1] = 0xf000;
@@ -2413,7 +2413,7 @@ CodeEmitterNVC0::emitSULEA(const TexInstruction *i)
  void
  CodeEmitterNVC0::emitSULDB(const TexInstruction *i)
  {
-   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() < NVISA_SM30);
  
 code[0] = 0x5;

 code[1] = 0xd400 | (i->subOp << 15);
@@ -2431,7 +2431,7 @@ CodeEmitterNVC0::emitSULDB(const TexInstruction *i)
  void
  CodeEmitterNVC0::emitSUSTx(const TexInstruction *i)
  {
-   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() < NVISA_SM30);
  
 code[0] = 0x5;

 code[1] = 0xdc00 

Re: [Mesa-dev] [PATCH 0/9] nvc0: ARB_shader_ballot for Kepler+ (v3)

2017-04-10 Thread Samuel Pitoiset

Series is:

Reviewed-by: Samuel Pitoiset 

Thanks!

On 04/10/2017 04:55 PM, Boyan Ding wrote:

This is the third, and hopefully the last, revision of the ballot series.
This series mainly incorporates Ilia's feedback, with some fixes, more
checks and code cleanup.

Please review.

Boyan Ding (9):
   gm107/ir: Emit third src 'bound' and optional predicate output of SHFL
   nvc0/ir: Properly handle a "split form" of predicate destination
   nvc0/ir: Emit OP_SHFL
   gk110/ir: Emit OP_SHFL
   nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE
   nvc0/ir: Add SV_LANEMASK_* system values.
   nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*
   nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*
   nvc0: Enable ARB_shader_ballot on Kepler+

  docs/features.txt  |  2 +-
  docs/relnotes/17.1.0.html  |  2 +-
  src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  5 ++
  .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 85 ++-
  .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 51 ++--
  .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 97 --
  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 58 +
  .../nouveau/codegen/nv50_ir_lowering_gm107.cpp | 15 ++--
  .../drivers/nouveau/codegen/nv50_ir_print.cpp  |  5 ++
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |  3 +-
  10 files changed, 298 insertions(+), 25 deletions(-)




Re: [Mesa-dev] [PATCH] anv/pass: Initialize anv_pass::subpass_attachments

2017-04-10 Thread Nanley Chery
On Mon, Apr 10, 2017 at 01:31:52PM -0700, Nanley Chery wrote:
> Fixes 0039d0cf278 "anv/pass: Use anv_multialloc for allocating the anv_pass"
> 
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_pass.c | 1 +
>  1 file changed, 1 insertion(+)
> 

I rescind my patch submission. This field has no users, so we should
probably just delete it.


[Mesa-dev] [PATCH] nv50/ir: Change chipset constants to ISA constants.

2017-04-10 Thread Matthew Mondazzi
The defines that referenced chipsets did not actually use the chipset,
leading to confusion. More relevant ISA constants are put in place of the
chipset compares.

Signed-off-by: Matthew Mondazzi 
---
 .../drivers/nouveau/codegen/nv50_ir_driver.h   |  7 ++--
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 24 +--
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 46 +++---
 .../nouveau/codegen/nv50_ir_target_nvc0.cpp|  6 +--
 4 files changed, 42 insertions(+), 41 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index e7d840d..76c815e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -75,9 +75,10 @@ struct nv50_ir_prog_symbol
uint32_t offset;
 };
 
-#define NVISA_GK104_CHIPSET    0xe0
-#define NVISA_GK20A_CHIPSET    0xea
-#define NVISA_GM107_CHIPSET    0x110
+#define NVISA_SM30   0xe0
+#define NVISA_SM35   0xea
+#define NVISA_SM50   0x110
+
 
 struct nv50_ir_prog_info
 {
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5467447..ed29661 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -806,7 +806,7 @@ CodeEmitterNVC0::emitSHLADD(const Instruction *i)
 void
 CodeEmitterNVC0::emitMADSP(const Instruction *i)
 {
-   assert(targ->getChipset() >= NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() >= NVISA_SM30);
 
emitForm_A(i, HEX64(, 0003));
 
@@ -1852,7 +1852,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
case FILE_MEMORY_LOCAL:  opc = 0xc800; break;
case FILE_MEMORY_SHARED:
   if (i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
- if (targ->getChipset() >= NVISA_GK104_CHIPSET)
+ if (targ->getChipset() >= NVISA_SM30)
 opc = 0xb800;
  else
 opc = 0xcc00;
@@ -1868,7 +1868,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
code[0] = 0x0005;
code[1] = opc;
 
-   if (targ->getChipset() >= NVISA_GK104_CHIPSET) {
+   if (targ->getChipset() >= NVISA_SM30) {
   // Unlocked store on shared memory can fail.
   if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
   i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
@@ -1901,7 +1901,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
case FILE_MEMORY_LOCAL:  opc = 0xc000; break;
case FILE_MEMORY_SHARED:
   if (i->subOp == NV50_IR_SUBOP_LOAD_LOCKED) {
- if (targ->getChipset() >= NVISA_GK104_CHIPSET)
+ if (targ->getChipset() >= NVISA_SM30)
 opc = 0xa800;
  else
 opc = 0xc400;
@@ -1944,7 +1944,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
   code[0] |= 63 << 14;
 
if (p >= 0) {
-  if (targ->getChipset() >= NVISA_GK104_CHIPSET)
+  if (targ->getChipset() >= NVISA_SM30)
  defId(i->def(p), 8);
   else
  defId(i->def(p), 32 + 18);
@@ -2362,7 +2362,7 @@ CodeEmitterNVC0::emitSUSTGx(const TexInstruction *i)
 void
 CodeEmitterNVC0::emitSUAddr(const TexInstruction *i)
 {
-   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() < NVISA_SM30);
 
if (i->tex.rIndirectSrc < 0) {
   code[1] |= 0x4000;
@@ -2375,7 +2375,7 @@ CodeEmitterNVC0::emitSUAddr(const TexInstruction *i)
 void
 CodeEmitterNVC0::emitSUDim(const TexInstruction *i)
 {
-   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() < NVISA_SM30);
 
code[1] |= (i->tex.target.getDim() - 1) << 12;
if (i->tex.target.isArray() || i->tex.target.isCube() ||
@@ -2390,7 +2390,7 @@ CodeEmitterNVC0::emitSUDim(const TexInstruction *i)
 void
 CodeEmitterNVC0::emitSULEA(const TexInstruction *i)
 {
-   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() < NVISA_SM30);
 
code[0] = 0x5;
code[1] = 0xf000;
@@ -2413,7 +2413,7 @@ CodeEmitterNVC0::emitSULEA(const TexInstruction *i)
 void
 CodeEmitterNVC0::emitSULDB(const TexInstruction *i)
 {
-   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() < NVISA_SM30);
 
code[0] = 0x5;
code[1] = 0xd400 | (i->subOp << 15);
@@ -2431,7 +2431,7 @@ CodeEmitterNVC0::emitSULDB(const TexInstruction *i)
 void
 CodeEmitterNVC0::emitSUSTx(const TexInstruction *i)
 {
-   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+   assert(targ->getChipset() < NVISA_SM30);
 
code[0] = 0x5;
code[1] = 0xdc00 | (i->subOp << 15);
@@ -2751,14 +2751,14 @@ CodeEmitterNVC0::emitInstruction(Instruction *insn)
   emitMADSP(insn);
   break;
case OP_SULDB:
-  if (targ->getChipset() >= NVISA_GK104_CHIPSET)
+  if (targ->getChipset() >= NVISA_SM30)
  emitSULDGB(insn->asTex());
   else
  emitSULDB(insn->asTex());
   break;
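
A note for readers of the archive: every use touched by this patch is a
range check (>= or <) against the constant, so the value acts as a threshold
rather than as the identity of one chip. A minimal sketch of that usage,
assuming the three values from the header hunk; isa_at_least() is a made-up
helper for illustration and does not exist in nv50_ir:

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the nv50_ir_driver.h hunk; names are the proposed ones. */
#define NVISA_SM30   0xe0
#define NVISA_SM35   0xea
#define NVISA_SM50   0x110

/* Hypothetical helper, not part of nv50_ir: true if the value reported by
 * the target is at or above the given ISA threshold. */
static bool
isa_at_least(unsigned chipset, unsigned isa_level)
{
   /* Only the three thresholds defined above are meaningful here. */
   assert(isa_level == NVISA_SM30 || isa_level == NVISA_SM35 ||
          isa_level == NVISA_SM50);
   return chipset >= isa_level;
}
```

A caller would write isa_at_least(targ->getChipset(), NVISA_SM30) where the
patch writes targ->getChipset() >= NVISA_SM30; the two are equivalent.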

Re: [Mesa-dev] [PATCH 3/3 v2] r600g: get rid of dummy pixel shader

2017-04-10 Thread Marek Olšák
Pushed the series, thanks!

Marek

On Mon, Apr 10, 2017 at 10:04 PM, Constantine Kharlamov
 wrote:
> The idea is taken from radeonsi. The code was already mostly checking for a
> null pixel shader, so few checks had to be added.
>
> Interestingly, according to testing with GTAⅣ, a null shader is bound a lot
> at the start (then it just stops), but draw_vbo() never actually sees a null
> ps.
>
> v2: added a check I missed because a macro uses a prefix to choose
> a shader.
>
> Signed-off-by: Constantine Kharlamov 
> Reviewed-by: Marek Olšák 
> ---
>  src/gallium/drivers/r600/r600_pipe.c |  9 -
>  src/gallium/drivers/r600/r600_pipe.h |  3 --
>  src/gallium/drivers/r600/r600_state_common.c | 58 
> ++--
>  3 files changed, 30 insertions(+), 40 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_pipe.c 
> b/src/gallium/drivers/r600/r600_pipe.c
> index 5014f2525c..7d8efd2c9b 100644
> --- a/src/gallium/drivers/r600/r600_pipe.c
> +++ b/src/gallium/drivers/r600/r600_pipe.c
> @@ -82,9 +82,6 @@ static void r600_destroy_context(struct pipe_context 
> *context)
> if (rctx->fixed_func_tcs_shader)
> rctx->b.b.delete_tcs_state(&rctx->b.b,
> rctx->fixed_func_tcs_shader);
>
> -   if (rctx->dummy_pixel_shader) {
> -   rctx->b.b.delete_fs_state(&rctx->b.b,
> rctx->dummy_pixel_shader);
> -   }
> if (rctx->custom_dsa_flush) {
> rctx->b.b.delete_depth_stencil_alpha_state(&rctx->b.b,
> rctx->custom_dsa_flush);
> }
> @@ -209,12 +206,6 @@ static struct pipe_context *r600_create_context(struct 
> pipe_screen *screen,
>
> r600_begin_new_cs(rctx);
>
> -   rctx->dummy_pixel_shader =
> -   util_make_fragment_cloneinput_shader(&rctx->b.b, 0,
> -TGSI_SEMANTIC_GENERIC,
> -
> TGSI_INTERPOLATE_CONSTANT);
> -   rctx->b.b.bind_fs_state(&rctx->b.b, rctx->dummy_pixel_shader);
> -
> return &rctx->b.b;
>
>  fail:
> diff --git a/src/gallium/drivers/r600/r600_pipe.h 
> b/src/gallium/drivers/r600/r600_pipe.h
> index 7f1ecc278b..e636ef0024 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -432,9 +432,6 @@ struct r600_context {
> void*custom_blend_resolve;
> void*custom_blend_decompress;
> void*custom_blend_fastclear;
> -   /* With rasterizer discard, there doesn't have to be a pixel shader.
> -* In that case, we bind this one: */
> -   void*dummy_pixel_shader;
> /* These dummy CMASK and FMASK buffers are used to get around the 
> R6xx hardware
>  * bug where valid CMASK and FMASK are required to be present to avoid
>  * a hardlock in certain operations but aren't actually used
> diff --git a/src/gallium/drivers/r600/r600_state_common.c 
> b/src/gallium/drivers/r600/r600_state_common.c
> index 5be49dcdfe..0131ea80d2 100644
> --- a/src/gallium/drivers/r600/r600_state_common.c
> +++ b/src/gallium/drivers/r600/r600_state_common.c
> @@ -725,7 +725,8 @@ static inline void r600_shader_selector_key(const struct 
> pipe_context *ctx,
> if (!key->vs.as_ls)
> key->vs.as_es = (rctx->gs_shader != NULL);
>
> -   if (rctx->ps_shader->current->shader.gs_prim_id_input && 
> !rctx->gs_shader) {
> +   if (rctx->ps_shader && 
> rctx->ps_shader->current->shader.gs_prim_id_input &&
> +   !rctx->gs_shader) {
> key->vs.as_gs_a = true;
> key->vs.prim_id_out = 
> rctx->ps_shader->current->shader.input[rctx->ps_shader->current->shader.ps_prim_id_input].spi_sid;
> }
> @@ -909,9 +910,6 @@ static void r600_bind_ps_state(struct pipe_context *ctx, 
> void *state)
>  {
> struct r600_context *rctx = (struct r600_context *)ctx;
>
> -   if (!state)
> -   state = rctx->dummy_pixel_shader;
> -
> rctx->ps_shader = (struct r600_pipe_shader_selector *)state;
>  }
>
> @@ -1478,7 +1476,8 @@ static bool r600_update_derived_state(struct 
> r600_context *rctx)
> }
> }
>
> -   SELECT_SHADER_OR_FAIL(ps);
> +   if (rctx->ps_shader)
> +   SELECT_SHADER_OR_FAIL(ps);
>
> r600_mark_atom_dirty(rctx, &rctx->shader_stages.atom);
>
> @@ -1555,37 +1554,40 @@ static bool r600_update_derived_state(struct 
> r600_context *rctx)
> rctx->b.streamout.enabled_stream_buffers_mask = 
> clip_so_current->enabled_stream_buffers_mask;
> }
>
> -   if (unlikely(ps_dirty || 
> rctx->hw_shader_stages[R600_HW_STAGE_PS].shader != rctx->ps_shader->current ||
> -   rctx->rasterizer->sprite_coord_enable != 
> 

Re: [Mesa-dev] [PATCH] mesa: use single memcpy when strides match

2017-04-10 Thread Brian Paul

Pushed, with slightly more descriptive commit msg.

-Brian

On 04/10/2017 12:31 PM, Bartosz Tomczyk wrote:

v2: fix indentation
---
  src/mesa/main/readpix.c  | 15 ++-
  src/mesa/main/texstore.c | 15 +++
  2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 25823230d6..606d1e58e5 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -220,7 +220,7 @@ readpixels_memcpy(struct gl_context *ctx,
 struct gl_renderbuffer *rb =
   _mesa_get_read_renderbuffer_for_format(ctx, format);
 GLubyte *dst, *map;
-   int dstStride, stride, j, texelBytes;
+   int dstStride, stride, j, texelBytes, bytesPerRow;

 /* Fail if memcpy cannot be used. */
 if (!readpixels_can_use_memcpy(ctx, format, type, packing)) {
@@ -239,12 +239,17 @@ readpixels_memcpy(struct gl_context *ctx,
 }

 texelBytes = _mesa_get_format_bytes(rb->Format);
+   bytesPerRow = texelBytes * width;

 /* memcpy*/
-   for (j = 0; j < height; j++) {
-  memcpy(dst, map, width * texelBytes);
-  dst += dstStride;
-  map += stride;
+   if (dstStride == stride && dstStride == bytesPerRow) {
+  memcpy(dst, map, bytesPerRow * height);
+   } else {
+  for (j = 0; j < height; j++) {
+ memcpy(dst, map, bytesPerRow);
+ dst += dstStride;
+ map += stride;
+  }
 }

 ctx->Driver.UnmapRenderbuffer(ctx, rb);
diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index 615ba63362..3314e557c0 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -1360,10 +1360,17 @@ _mesa_store_compressed_texsubimage(struct gl_context 
*ctx, GLuint dims,
if (dstMap) {

   /* copy rows of blocks */
- for (i = 0; i < store.CopyRowsPerSlice; i++) {
-memcpy(dstMap, src, store.CopyBytesPerRow);
-dstMap += dstRowStride;
-src += store.TotalBytesPerRow;
+ if (dstRowStride == store.TotalBytesPerRow &&
+ dstRowStride == store.CopyBytesPerRow) {
+memcpy(dstMap, src, store.CopyBytesPerRow * 
store.CopyRowsPerSlice);
+src += store.CopyBytesPerRow * store.CopyRowsPerSlice;
+ }
+ else {
+for (i = 0; i < store.CopyRowsPerSlice; i++) {
+   memcpy(dstMap, src, store.CopyBytesPerRow);
+   dstMap += dstRowStride;
+   src += store.TotalBytesPerRow;
+}
   }

   ctx->Driver.UnmapTextureImage(ctx, texImage, slice + zoffset);
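
For illustration, the fast path in this patch can be sketched outside of Mesa
roughly as below. copy_rows() and its parameter names are invented for this
sketch and are not Mesa functions; it assumes strides are given in bytes and
that the two buffers do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a Mesa function: copy `height` rows of
 * `bytes_per_row` bytes each from src to dst. Strides are in bytes. */
static void
copy_rows(unsigned char *dst, ptrdiff_t dst_stride,
          const unsigned char *src, ptrdiff_t src_stride,
          size_t bytes_per_row, size_t height)
{
   assert(dst != NULL && src != NULL);

   if (dst_stride == src_stride && dst_stride == (ptrdiff_t)bytes_per_row) {
      /* Rows are contiguous on both sides, so one copy covers them all. */
      memcpy(dst, src, bytes_per_row * height);
   } else {
      /* Padding between rows, or differing strides: copy row by row. */
      for (size_t j = 0; j < height; j++) {
         memcpy(dst, src, bytes_per_row);
         dst += dst_stride;
         src += src_stride;
      }
   }
}
```

The single memcpy is only valid when both strides equal the row size, that
is, when there is no padding between rows on either side; otherwise the
per-row loop is still needed.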





[Mesa-dev] [PATCH] anv/pass: Initialize anv_pass::subpass_attachments

2017-04-10 Thread Nanley Chery
Fixes 0039d0cf278 "anv/pass: Use anv_multialloc for allocating the anv_pass"

Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_pass.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c
index dcd9aafc64..c9473f07ed 100644
--- a/src/intel/vulkan/anv_pass.c
+++ b/src/intel/vulkan/anv_pass.c
@@ -77,6 +77,7 @@ VkResult anv_CreateRenderPass(
pass->subpass_count = pCreateInfo->subpassCount;
pass->attachments = attachments;
pass->subpass_flushes = subpass_flushes;
+   pass->subpass_attachments = subpass_attachments;
 
for (uint32_t i = 0; i < pCreateInfo->attachmentCount; i++) {
       struct anv_render_pass_attachment *att = &pass->attachments[i];
-- 
2.12.2



[Mesa-dev] [PATCH 1/5] mesa/arrayobj: use atomics for reference counting

2017-04-10 Thread Bartosz Tomczyk
---
 src/mesa/main/arrayobj.c | 16 
 src/mesa/main/mtypes.h   |  2 --
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/src/mesa/main/arrayobj.c b/src/mesa/main/arrayobj.c
index ab1b834b6d..39bdb2e715 100644
--- a/src/mesa/main/arrayobj.c
+++ b/src/mesa/main/arrayobj.c
@@ -53,6 +53,7 @@
 #include "varray.h"
 #include "main/dispatch.h"
 #include "util/bitscan.h"
+#include "util/u_atomic.h"
 
 
 /**
@@ -169,7 +170,6 @@ _mesa_delete_vao(struct gl_context *ctx, struct 
gl_vertex_array_object *obj)
 {
unbind_array_object_vbos(ctx, obj);
    _mesa_reference_buffer_object(ctx, &obj->IndexBufferObj, NULL);
-   mtx_destroy(&obj->Mutex);
free(obj->Label);
free(obj);
 }
@@ -189,16 +189,11 @@ _mesa_reference_vao_(struct gl_context *ctx,
 
if (*ptr) {
   /* Unreference the old array object */
-  GLboolean deleteFlag = GL_FALSE;
       struct gl_vertex_array_object *oldObj = *ptr;
 
-      mtx_lock(&oldObj->Mutex);
       assert(oldObj->RefCount > 0);
-      oldObj->RefCount--;
-      deleteFlag = (oldObj->RefCount == 0);
-      mtx_unlock(&oldObj->Mutex);
 
-      if (deleteFlag)
+      if (p_atomic_dec_zero(&oldObj->RefCount))
  _mesa_delete_vao(ctx, oldObj);
 
   *ptr = NULL;
@@ -207,18 +202,16 @@ _mesa_reference_vao_(struct gl_context *ctx,
 
if (vao) {
       /* reference new array object */
-      mtx_lock(&vao->Mutex);
-      if (vao->RefCount == 0) {
+      if (p_atomic_read(&vao->RefCount) == 0) {
          /* this array's being deleted (look just above) */
          /* Not sure this can every really happen.  Warn if it does. */
          _mesa_problem(NULL, "referencing deleted array object");
          *ptr = NULL;
       }
       else {
-         vao->RefCount++;
+         p_atomic_inc(&vao->RefCount);
          *ptr = vao;
       }
-      mtx_unlock(&vao->Mutex);
}
 }
 
@@ -274,7 +267,6 @@ _mesa_initialize_vao(struct gl_context *ctx,
 
    vao->Name = name;
 
-   mtx_init(&vao->Mutex, mtx_plain);
vao->RefCount = 1;
 
/* Init the individual arrays */
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index e5f7cbaa5b..5de464cc1b 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -1509,8 +1509,6 @@ struct gl_vertex_array_object
 
GLchar *Label;   /**< GL_KHR_debug */
 
-   mtx_t Mutex;
-
/**
 * Does the VAO use ARB semantics or Apple semantics?
 *
-- 
2.12.2



[Mesa-dev] [PATCH 3/5] mesa/renderbuffer: use atomics for reference counting

2017-04-10 Thread Bartosz Tomczyk
---
 src/mesa/main/fbobject.c |  1 -
 src/mesa/main/mtypes.h   |  1 -
 src/mesa/main/renderbuffer.c | 15 +++
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index d486d01195..f85f26674d 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -90,7 +90,6 @@ void
 _mesa_init_fbobjects(struct gl_context *ctx)
 {
mtx_init(, mtx_plain);
-   mtx_init(, mtx_plain);
mtx_init(, mtx_plain);
DummyFramebuffer.Delete = delete_dummy_framebuffer;
DummyRenderbuffer.Delete = delete_dummy_renderbuffer;
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 8b1577dd3f..d37a60d61c 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3233,7 +3233,6 @@ struct gl_shared_state
  */
 struct gl_renderbuffer
 {
-   mtx_t Mutex; /**< for thread safety */
GLuint ClassID;/**< Useful for drivers */
GLuint Name;
GLchar *Label; /**< GL_KHR_debug */
diff --git a/src/mesa/main/renderbuffer.c b/src/mesa/main/renderbuffer.c
index 627bdca66c..ce4f0f229a 100644
--- a/src/mesa/main/renderbuffer.c
+++ b/src/mesa/main/renderbuffer.c
@@ -30,6 +30,7 @@
 #include "formats.h"
 #include "mtypes.h"
 #include "renderbuffer.h"
+#include "util/u_atomic.h"
 
 
 /**
@@ -40,8 +41,6 @@ _mesa_init_renderbuffer(struct gl_renderbuffer *rb, GLuint 
name)
 {
    GET_CURRENT_CONTEXT(ctx);
 
-   mtx_init(&rb->Mutex, mtx_plain);
-
rb->ClassID = 0;
rb->Name = name;
rb->RefCount = 1;
@@ -101,7 +100,6 @@ _mesa_new_renderbuffer(struct gl_context *ctx, GLuint name)
 void
 _mesa_delete_renderbuffer(struct gl_context *ctx, struct gl_renderbuffer *rb)
 {
-   mtx_destroy(&rb->Mutex);
free(rb->Label);
free(rb);
 }
@@ -195,16 +193,11 @@ _mesa_reference_renderbuffer_(struct gl_renderbuffer 
**ptr,
 {
if (*ptr) {
   /* Unreference the old renderbuffer */
-  GLboolean deleteFlag = GL_FALSE;
       struct gl_renderbuffer *oldRb = *ptr;
 
-      mtx_lock(&oldRb->Mutex);
       assert(oldRb->RefCount > 0);
-      oldRb->RefCount--;
-      deleteFlag = (oldRb->RefCount == 0);
-      mtx_unlock(&oldRb->Mutex);
 
-      if (deleteFlag) {
+      if (p_atomic_dec_zero(&oldRb->RefCount)) {
  GET_CURRENT_CONTEXT(ctx);
  oldRb->Delete(ctx, oldRb);
   }
@@ -215,9 +208,7 @@ _mesa_reference_renderbuffer_(struct gl_renderbuffer **ptr,
 
if (rb) {
       /* reference new renderbuffer */
-      mtx_lock(&rb->Mutex);
-      rb->RefCount++;
-      mtx_unlock(&rb->Mutex);
+      p_atomic_inc(&rb->RefCount);
   *ptr = rb;
}
 }
-- 
2.12.2



[Mesa-dev] [PATCH 4/5] mesa/samplerobj: use atomics for reference counting

2017-04-10 Thread Bartosz Tomczyk
---
 src/mesa/main/mtypes.h |  1 -
 src/mesa/main/samplerobj.c | 16 
 2 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index d37a60d61c..5a1be17a92 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -968,7 +968,6 @@ typedef enum
  */
 struct gl_sampler_object
 {
-   mtx_t Mutex;
GLuint Name;
GLint RefCount;
GLchar *Label;   /**< GL_KHR_debug */
diff --git a/src/mesa/main/samplerobj.c b/src/mesa/main/samplerobj.c
index 183f1d2a86..fa826b5bef 100644
--- a/src/mesa/main/samplerobj.c
+++ b/src/mesa/main/samplerobj.c
@@ -38,6 +38,7 @@
 #include "main/macros.h"
 #include "main/mtypes.h"
 #include "main/samplerobj.h"
+#include "util/u_atomic.h"
 
 
 struct gl_sampler_object *
@@ -61,7 +62,6 @@ static void
 delete_sampler_object(struct gl_context *ctx,
   struct gl_sampler_object *sampObj)
 {
-   mtx_destroy(&sampObj->Mutex);
free(sampObj->Label);
free(sampObj);
 }
@@ -78,16 +78,11 @@ _mesa_reference_sampler_object_(struct gl_context *ctx,
 
if (*ptr) {
   /* Unreference the old sampler */
-  GLboolean deleteFlag = GL_FALSE;
       struct gl_sampler_object *oldSamp = *ptr;
 
-      mtx_lock(&oldSamp->Mutex);
       assert(oldSamp->RefCount > 0);
-      oldSamp->RefCount--;
-      deleteFlag = (oldSamp->RefCount == 0);
-      mtx_unlock(&oldSamp->Mutex);
 
-      if (deleteFlag)
+      if (p_atomic_dec_zero(&oldSamp->RefCount))
  delete_sampler_object(ctx, oldSamp);
 
   *ptr = NULL;
@@ -96,18 +91,16 @@ _mesa_reference_sampler_object_(struct gl_context *ctx,
 
if (samp) {
       /* reference new sampler */
-      mtx_lock(&samp->Mutex);
-      if (samp->RefCount == 0) {
+      if (p_atomic_read(&samp->RefCount) == 0) {
          /* this sampler's being deleted (look just above) */
          /* Not sure this can every really happen.  Warn if it does. */
          _mesa_problem(NULL, "referencing deleted sampler object");
          *ptr = NULL;
       }
       else {
-         samp->RefCount++;
+         p_atomic_inc(&samp->RefCount);
          *ptr = samp;
       }
-      mtx_unlock(&samp->Mutex);
}
 }
 
@@ -118,7 +111,6 @@ _mesa_reference_sampler_object_(struct gl_context *ctx,
 static void
 _mesa_init_sampler_object(struct gl_sampler_object *sampObj, GLuint name)
 {
-   mtx_init(&sampObj->Mutex, mtx_plain);
sampObj->Name = name;
sampObj->RefCount = 1;
sampObj->WrapS = GL_REPEAT;
-- 
2.12.2



[Mesa-dev] [PATCH 5/5] mesa/texobj: use atomics for reference counting

2017-04-10 Thread Bartosz Tomczyk
---
 src/mesa/main/mtypes.h |  1 -
 src/mesa/main/texobj.c | 19 ---
 2 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 5a1be17a92..a1eabc8bf1 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -995,7 +995,6 @@ struct gl_sampler_object
  */
 struct gl_texture_object
 {
-   mtx_t Mutex;  /**< for thread safety */
GLint RefCount; /**< reference count */
GLuint Name;/**< the user-visible texture object ID */
GLchar *Label;   /**< GL_KHR_debug */
diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
index ad644ca1ca..4afa7d8fb2 100644
--- a/src/mesa/main/texobj.c
+++ b/src/mesa/main/texobj.c
@@ -43,6 +43,7 @@
 #include "texstate.h"
 #include "mtypes.h"
 #include "program/prog_instruction.h"
+#include "util/u_atomic.h"
 
 
 
@@ -270,7 +271,6 @@ _mesa_initialize_texture_object( struct gl_context *ctx,
 
memset(obj, 0, sizeof(*obj));
    /* init the non-zero fields */
-   mtx_init(&obj->Mutex, mtx_plain);
obj->RefCount = 1;
obj->Name = name;
obj->Target = target;
@@ -399,9 +399,6 @@ _mesa_delete_texture_object(struct gl_context *ctx,
 
    _mesa_reference_buffer_object(ctx, &texObj->BufferObject, NULL);
 
-   /* destroy the mutex -- it may have allocated memory (eg on bsd) */
-   mtx_destroy(&texObj->Mutex);
-
free(texObj->Label);
 
/* free this object */
@@ -534,20 +531,14 @@ _mesa_reference_texobj_(struct gl_texture_object **ptr,
 
if (*ptr) {
   /* Unreference the old texture */
-  GLboolean deleteFlag = GL_FALSE;
   struct gl_texture_object *oldTex = *ptr;
 
       assert(valid_texture_object(oldTex));
       (void) valid_texture_object; /* silence warning in release builds */
 
-      mtx_lock(&oldTex->Mutex);
       assert(oldTex->RefCount > 0);
-      oldTex->RefCount--;
-
-      deleteFlag = (oldTex->RefCount == 0);
-      mtx_unlock(&oldTex->Mutex);
 
-      if (deleteFlag) {
+      if (p_atomic_dec_zero(&oldTex->RefCount)) {
  /* Passing in the context drastically changes the driver code for
   * framebuffer deletion.
   */
@@ -565,18 +556,16 @@ _mesa_reference_texobj_(struct gl_texture_object **ptr,
if (tex) {
   /* reference new texture */
       assert(valid_texture_object(tex));
-      mtx_lock(&tex->Mutex);
-      if (tex->RefCount == 0) {
+      if (p_atomic_read(&tex->RefCount) == 0) {
          /* this texture's being deleted (look just above) */
          /* Not sure this can every really happen.  Warn if it does. */
          _mesa_problem(NULL, "referencing deleted texture object");
          *ptr = NULL;
       }
       else {
-         tex->RefCount++;
+         p_atomic_inc(&tex->RefCount);
          *ptr = tex;
       }
-      mtx_unlock(&tex->Mutex);
}
 }
 
-- 
2.12.2



[Mesa-dev] [PATCH 2/5] mesa/pipelineobj: use atomics for reference counting

2017-04-10 Thread Bartosz Tomczyk
---
 src/mesa/main/mtypes.h  |  2 --
 src/mesa/main/pipelineobj.c | 16 
 src/mesa/main/shaderapi.c   |  2 --
 3 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 5de464cc1b..8b1577dd3f 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2998,8 +2998,6 @@ struct gl_pipeline_object
 
GLint RefCount;
 
-   mtx_t Mutex;
-
GLchar *Label;   /**< GL_KHR_debug */
 
/**
diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
index c1dd8d75c7..38fa9dcdbf 100644
--- a/src/mesa/main/pipelineobj.c
+++ b/src/mesa/main/pipelineobj.c
@@ -48,6 +48,7 @@
 #include "program/program.h"
 #include "program/prog_parameter.h"
 #include "util/ralloc.h"
+#include "util/u_atomic.h"
 
 /**
  * Delete a pipeline object.
@@ -66,7 +67,6 @@ _mesa_delete_pipeline_object(struct gl_context *ctx,
}
 
    _mesa_reference_shader_program(ctx, &obj->ActiveProgram, NULL);
-   mtx_destroy(&obj->Mutex);
free(obj->Label);
ralloc_free(obj);
 }
@@ -80,7 +80,6 @@ _mesa_new_pipeline_object(struct gl_context *ctx, GLuint name)
struct gl_pipeline_object *obj = rzalloc(NULL, struct gl_pipeline_object);
if (obj) {
       obj->Name = name;
-      mtx_init(&obj->Mutex, mtx_plain);
   obj->RefCount = 1;
   obj->Flags = _mesa_get_shader_flags();
   obj->InfoLog = NULL;
@@ -186,16 +185,11 @@ _mesa_reference_pipeline_object_(struct gl_context *ctx,
 
if (*ptr) {
   /* Unreference the old pipeline object */
-  GLboolean deleteFlag = GL_FALSE;
       struct gl_pipeline_object *oldObj = *ptr;
 
-      mtx_lock(&oldObj->Mutex);
       assert(oldObj->RefCount > 0);
-      oldObj->RefCount--;
-      deleteFlag = (oldObj->RefCount == 0);
-      mtx_unlock(&oldObj->Mutex);
 
-      if (deleteFlag) {
+      if (p_atomic_dec_zero(&oldObj->RefCount)) {
  _mesa_delete_pipeline_object(ctx, oldObj);
   }
 
@@ -205,18 +199,16 @@ _mesa_reference_pipeline_object_(struct gl_context *ctx,
 
if (obj) {
       /* reference new pipeline object */
-      mtx_lock(&obj->Mutex);
-      if (obj->RefCount == 0) {
+      if (p_atomic_read(&obj->RefCount) == 0) {
          /* this pipeline's being deleted (look just above) */
          /* Not sure this can ever really happen.  Warn if it does. */
          _mesa_problem(NULL, "referencing deleted pipeline object");
          *ptr = NULL;
       }
       else {
-         obj->RefCount++;
+         p_atomic_inc(&obj->RefCount);
          *ptr = obj;
       }
-      mtx_unlock(&obj->Mutex);
}
 }
 
diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 187475f127..0815ce36ff 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -138,7 +138,6 @@ _mesa_init_shader_state(struct gl_context *ctx)
 
/* Extended for ARB_separate_shader_objects */
    ctx->Shader.RefCount = 1;
-   mtx_init(&ctx->Shader.Mutex, mtx_plain);
 
ctx->TessCtrlProgram.patch_vertices = 3;
for (i = 0; i < 4; ++i)
@@ -164,7 +163,6 @@ _mesa_free_shader_state(struct gl_context *ctx)
_mesa_reference_pipeline_object(ctx, &ctx->_Shader, NULL);
 
assert(ctx->Shader.RefCount == 1);
-   mtx_destroy(&ctx->Shader.Mutex);
 }
 
 
-- 
2.12.2
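
The mutex-to-atomics change in the patch above can be sketched roughly as
follows. This is only an illustration under stated assumptions: it uses C11
<stdatomic.h> in place of Mesa's p_atomic_* helpers from util/u_atomic.h, and
struct refcounted, refcounted_new(), refcounted_delete(),
refcounted_reference() and num_deleted are hypothetical names that do not
exist in Mesa. It also leaves out the "referencing deleted object" warning
path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted GL object. */
struct refcounted {
   atomic_int RefCount;
};

/* Counts freed objects; only here so the sketch has observable behavior. */
static int num_deleted = 0;

static struct refcounted *
refcounted_new(void)
{
   struct refcounted *obj = calloc(1, sizeof(*obj));
   if (obj)
      atomic_init(&obj->RefCount, 1);
   return obj;
}

static void
refcounted_delete(struct refcounted *obj)
{
   num_deleted++;
   free(obj);
}

/* Make *ptr point at obj, dropping the reference *ptr held before. */
static void
refcounted_reference(struct refcounted **ptr, struct refcounted *obj)
{
   if (*ptr == obj)
      return;

   if (*ptr) {
      struct refcounted *old = *ptr;
      /* atomic_fetch_sub returns the previous value, so 1 means the
       * count just dropped to zero and this caller must free the object.
       */
      if (atomic_fetch_sub(&old->RefCount, 1) == 1)
         refcounted_delete(old);
      *ptr = NULL;
   }

   if (obj) {
      atomic_fetch_add(&obj->RefCount, 1);
      *ptr = obj;
   }
}
```

The point of the design is that the decrement and the "did it reach zero"
test are a single atomic operation, so no per-object mutex is needed to make
exactly one thread perform the delete.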



[Mesa-dev] [PATCH 0/5] use atomics for reference counting

2017-04-10 Thread Bartosz Tomczyk
Bartosz Tomczyk (5):
  mesa/arrayobj: use atomics for reference counting
  mesa/pipelineobj: use atomics for reference counting
  mesa/renderbuffer: use atomics for reference counting
  mesa/samplerobj: use atomics for reference counting
  mesa/texobj: use atomics for reference counting

 src/mesa/main/arrayobj.c | 16 
 src/mesa/main/fbobject.c |  1 -
 src/mesa/main/mtypes.h   |  7 ---
 src/mesa/main/pipelineobj.c  | 16 
 src/mesa/main/renderbuffer.c | 15 +++
 src/mesa/main/samplerobj.c   | 16 
 src/mesa/main/shaderapi.c|  2 --
 src/mesa/main/texobj.c   | 19 ---
 8 files changed, 19 insertions(+), 73 deletions(-)

-- 
2.12.2



[Mesa-dev] [PATCH 0/3 v3] r600g: shader logic improvements

2017-04-10 Thread Constantine Kharlamov
Although I didn't see a statistically significant change in the GTAⅣ benchmark,
it seems to have reduced the stall when opening the door from a house to the
outside world at the first savepoint.

No changes in gpu.py tests of piglit in gbm mode.

v2: The 1st patch accidentally removed an empty line. Don't do that.

Added a check to the 3rd patch that I missed because of a macro using a prefix.
Tbh I'd rather split the ps-related logic out of
r600_update_derived_state(), but after more than an hour of looking into it,
and understanding only half of the logic, I gave up.

v3: 1st patch: bring back the check for null tes and gs, since I haven't yet
figured out the best way to move the stride assignment into
r600_update_derived_state() (as it is in radeonsi).

Patches 2 and 3 are unchanged, already reviewed, and rebased on top of the 1st.

Constantine Kharlamov (3):
  r600g: skip repeating vs, gs, and tes shader binds
  r600g: add draw_vbo check for a NULL pixel shader
  r600g: get rid of dummy pixel shader

 src/gallium/drivers/r600/evergreen_state.c   |  1 +
 src/gallium/drivers/r600/r600_pipe.c |  9 
 src/gallium/drivers/r600/r600_pipe.h |  4 +-
 src/gallium/drivers/r600/r600_state.c|  3 +-
 src/gallium/drivers/r600/r600_state_common.c | 73 
 5 files changed, 47 insertions(+), 43 deletions(-)

-- 
2.12.2



[Mesa-dev] [PATCH 3/3 v2] r600g: get rid of dummy pixel shader

2017-04-10 Thread Constantine Kharlamov
The idea is taken from radeonsi. The code was mostly already checking for a null
pixel shader, so few checks had to be added.

Interestingly, according to testing with GTAⅣ, binding a null shader happens
a lot at the start (then just stops), but draw_vbo() never actually sees a null
ps.

v2: added a check I missed because of a macro that uses a prefix to choose
a shader.

Signed-off-by: Constantine Kharlamov 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/r600/r600_pipe.c |  9 -
 src/gallium/drivers/r600/r600_pipe.h |  3 --
 src/gallium/drivers/r600/r600_state_common.c | 58 ++--
 3 files changed, 30 insertions(+), 40 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 5014f2525c..7d8efd2c9b 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -82,9 +82,6 @@ static void r600_destroy_context(struct pipe_context *context)
if (rctx->fixed_func_tcs_shader)
rctx->b.b.delete_tcs_state(&rctx->b.b, rctx->fixed_func_tcs_shader);
 
-   if (rctx->dummy_pixel_shader) {
-   rctx->b.b.delete_fs_state(&rctx->b.b, rctx->dummy_pixel_shader);
-   }
if (rctx->custom_dsa_flush) {
rctx->b.b.delete_depth_stencil_alpha_state(&rctx->b.b, rctx->custom_dsa_flush);
}
@@ -209,12 +206,6 @@ static struct pipe_context *r600_create_context(struct 
pipe_screen *screen,
 
r600_begin_new_cs(rctx);
 
-   rctx->dummy_pixel_shader =
-   util_make_fragment_cloneinput_shader(&rctx->b.b, 0,
-TGSI_SEMANTIC_GENERIC,
-TGSI_INTERPOLATE_CONSTANT);
-   rctx->b.b.bind_fs_state(&rctx->b.b, rctx->dummy_pixel_shader);
-
return &rctx->b.b;
 
 fail:
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 7f1ecc278b..e636ef0024 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -432,9 +432,6 @@ struct r600_context {
void*custom_blend_resolve;
void*custom_blend_decompress;
void*custom_blend_fastclear;
-   /* With rasterizer discard, there doesn't have to be a pixel shader.
-* In that case, we bind this one: */
-   void*dummy_pixel_shader;
/* These dummy CMASK and FMASK buffers are used to get around the R6xx 
hardware
 * bug where valid CMASK and FMASK are required to be present to avoid
 * a hardlock in certain operations but aren't actually used
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 5be49dcdfe..0131ea80d2 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -725,7 +725,8 @@ static inline void r600_shader_selector_key(const struct 
pipe_context *ctx,
if (!key->vs.as_ls)
key->vs.as_es = (rctx->gs_shader != NULL);
 
-   if (rctx->ps_shader->current->shader.gs_prim_id_input && 
!rctx->gs_shader) {
+   if (rctx->ps_shader && 
rctx->ps_shader->current->shader.gs_prim_id_input &&
+   !rctx->gs_shader) {
key->vs.as_gs_a = true;
key->vs.prim_id_out = 
rctx->ps_shader->current->shader.input[rctx->ps_shader->current->shader.ps_prim_id_input].spi_sid;
}
@@ -909,9 +910,6 @@ static void r600_bind_ps_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
-   if (!state)
-   state = rctx->dummy_pixel_shader;
-
rctx->ps_shader = (struct r600_pipe_shader_selector *)state;
 }
 
@@ -1478,7 +1476,8 @@ static bool r600_update_derived_state(struct r600_context 
*rctx)
}
}
 
-   SELECT_SHADER_OR_FAIL(ps);
+   if (rctx->ps_shader)
+   SELECT_SHADER_OR_FAIL(ps);
 
r600_mark_atom_dirty(rctx, &rctx->shader_stages.atom);
 
@@ -1555,37 +1554,40 @@ static bool r600_update_derived_state(struct 
r600_context *rctx)
rctx->b.streamout.enabled_stream_buffers_mask = 
clip_so_current->enabled_stream_buffers_mask;
}
 
-   if (unlikely(ps_dirty || 
rctx->hw_shader_stages[R600_HW_STAGE_PS].shader != rctx->ps_shader->current ||
-   rctx->rasterizer->sprite_coord_enable != 
rctx->ps_shader->current->sprite_coord_enable ||
-   rctx->rasterizer->flatshade != 
rctx->ps_shader->current->flatshade)) {
+   if (rctx->ps_shader) {
+   if (unlikely((ps_dirty || 
rctx->hw_shader_stages[R600_HW_STAGE_PS].shader != rctx->ps_shader->current ||
+ rctx->rasterizer->sprite_coord_enable != 

[Mesa-dev] [PATCH 2/3] r600g: add draw_vbo check for a NULL pixel shader

2017-04-10 Thread Constantine Kharlamov
Taken from radeonsi; required to remove the dummy pixel shader in the next patch.

Signed-off-by: Constantine Kharlamov 
Reviewed-by: Marek Olšák 
---
 src/gallium/drivers/r600/evergreen_state.c   | 1 +
 src/gallium/drivers/r600/r600_pipe.h | 1 +
 src/gallium/drivers/r600/r600_state.c| 3 ++-
 src/gallium/drivers/r600/r600_state_common.c | 7 ++-
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 371e7ce212..5697da4af9 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -471,6 +471,7 @@ static void *evergreen_create_rs_state(struct pipe_context 
*ctx,
rs->clip_halfz = state->clip_halfz;
rs->flatshade = state->flatshade;
rs->sprite_coord_enable = state->sprite_coord_enable;
+   rs->rasterizer_discard = state->rasterizer_discard;
rs->two_side = state->light_twoside;
rs->clip_plane_enable = state->clip_plane_enable;
rs->pa_sc_line_stipple = state->line_stipple_enable ?
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 86634b8681..7f1ecc278b 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -279,6 +279,7 @@ struct r600_rasterizer_state {
boolscissor_enable;
boolmultisample_enable;
boolclip_halfz;
+   boolrasterizer_discard;
 };
 
 struct r600_poly_offset_state {
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 1f7e9b3aa5..06100abc4a 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -470,6 +470,7 @@ static void *r600_create_rs_state(struct pipe_context *ctx,
rs->clip_halfz = state->clip_halfz;
rs->flatshade = state->flatshade;
rs->sprite_coord_enable = state->sprite_coord_enable;
+   rs->rasterizer_discard = state->rasterizer_discard;
rs->two_side = state->light_twoside;
rs->clip_plane_enable = state->clip_plane_enable;
rs->pa_sc_line_stipple = state->line_stipple_enable ?
@@ -622,7 +623,7 @@ static void *r600_create_sampler_state(struct pipe_context 
*ctx,
 static struct pipe_sampler_view *
 texture_buffer_sampler_view(struct r600_pipe_sampler_view *view,
unsigned width0, unsigned height0)
-   
+
 {
struct r600_texture *tmp = (struct r600_texture*)view->base.texture;
int stride = util_format_get_blocksize(view->base.format);
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 922030a1ed..5be49dcdfe 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1712,7 +1712,12 @@ static void r600_draw_vbo(struct pipe_context *ctx, 
const struct pipe_draw_info
return;
}
 
-   if (unlikely(!rctx->vs_shader || !rctx->ps_shader)) {
+   if (unlikely(!rctx->vs_shader)) {
+   assert(0);
+   return;
+   }
+   if (unlikely(!rctx->ps_shader &&
+(!rctx->rasterizer || 
!rctx->rasterizer->rasterizer_discard))) {
assert(0);
return;
}
-- 
2.12.2
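
The draw-time condition this patch introduces can be read as a small
predicate. The sketch below is only an illustration: struct fake_rasterizer,
struct fake_draw_state and fake_can_draw() are hypothetical stand-ins, not
r600g code. It shows that a missing pixel shader is acceptable only when a
rasterizer state is bound and that state has rasterizer_discard set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the driver state involved. */
struct fake_rasterizer {
   bool rasterizer_discard;
};

struct fake_draw_state {
   const void *vs_shader;
   const void *ps_shader;
   const struct fake_rasterizer *rasterizer;
};

/* Returns true when a draw may proceed, mirroring the two checks in the
 * patch: a vertex shader is always required, and a pixel shader is
 * required unless rasterization is discarded.
 */
static bool
fake_can_draw(const struct fake_draw_state *s)
{
   if (!s->vs_shader)
      return false;

   if (!s->ps_shader &&
       (!s->rasterizer || !s->rasterizer->rasterizer_discard))
      return false;

   return true;
}
```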



[Mesa-dev] [PATCH 1/3 v3] r600g: skip repeating vs, gs, and tes shader binds

2017-04-10 Thread Constantine Kharlamov
The idea is taken from radeonsi. The code lacks some checks for null vs,
and I'm unsure about some changes against that, so I left it in place.

Some statistics for GTAⅣ:
Average tessellation shader bind skips per frame: ≈350
Average geometry shader bind skips per frame: ≈260
Skipping a vertex shader bind occurs rarely enough that it doesn't show up in
the per-frame counter at all, so I'm just going to say: it happens.

v2: I accidentally removed an empty line, don't do this.
v3: bring back the check for null tes and gs, since I haven't figured out
the way to move the stride assignment to r600_update_derived_state() (as it
is in radeonsi).

Signed-off-by: Constantine Kharlamov 
---
 src/gallium/drivers/r600/r600_state_common.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 4de2a7344b..922030a1ed 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -931,7 +931,7 @@ static void r600_bind_vs_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
-   if (!state)
+   if (!state || rctx->vs_shader == state)
return;
 
rctx->vs_shader = (struct r600_pipe_shader_selector *)state;
@@ -943,6 +943,9 @@ static void r600_bind_gs_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
+   if (state == rctx->gs_shader)
+   return;
+
rctx->gs_shader = (struct r600_pipe_shader_selector *)state;
r600_update_vs_writes_viewport_index(&rctx->b, r600_get_vs_info(rctx));
 
@@ -962,6 +965,9 @@ static void r600_bind_tes_state(struct pipe_context *ctx, 
void *state)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
+   if (state == rctx->tes_shader)
+   return;
+
rctx->tes_shader = (struct r600_pipe_shader_selector *)state;
r600_update_vs_writes_viewport_index(&rctx->b, r600_get_vs_info(rctx));
 
-- 
2.12.2
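
The early return added to the bind functions in this patch can be
illustrated in isolation. This is a minimal sketch, not driver code:
struct fake_context, fake_bind_gs_state() and the derived_updates counter
are hypothetical and only model the idea that rebinding the shader that is
already bound should skip the follow-up state work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context holding one bound shader and a counter for how
 * often the (expensive) derived-state update ran.
 */
struct fake_context {
   void *gs_shader;
   unsigned derived_updates;
};

static void
fake_bind_gs_state(struct fake_context *ctx, void *state)
{
   /* Same shader (including NULL to NULL): nothing to do. */
   if (state == ctx->gs_shader)
      return;

   ctx->gs_shader = state;
   ctx->derived_updates++;   /* stands in for the dependent state updates */
}
```

Note that the comparison also lets a NULL bind through when something was
bound before, which matches the v3 note about keeping the null tes and gs
handling in place.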



Re: [Mesa-dev] [PATCH] mesa: use single memcpy when strides matches

2017-04-10 Thread Bartosz Tomczyk

Please do, I don't have commit rights.


On 10.04.2017 20:44, Brian Paul wrote:

On 04/10/2017 12:35 PM, Bartosz Tomczyk wrote:

Yes, I tested with Piglit, there is no regression.



Do you need me to push this for you?

-Brian



On 10.04.2017 19:16, Brian Paul wrote:

On 04/09/2017 07:58 AM, Bartosz Tomczyk wrote:

---
  src/mesa/main/readpix.c  | 15 ++-
  src/mesa/main/texstore.c | 15 +++
  2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 25823230d6..14568de497 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -220,7 +220,7 @@ readpixels_memcpy(struct gl_context *ctx,
 struct gl_renderbuffer *rb =
   _mesa_get_read_renderbuffer_for_format(ctx, format);
 GLubyte *dst, *map;
-   int dstStride, stride, j, texelBytes;
+   int dstStride, stride, j, texelBytes, bytesPerRow;

 /* Fail if memcpy cannot be used. */
 if (!readpixels_can_use_memcpy(ctx, format, type, packing)) {
@@ -239,12 +239,17 @@ readpixels_memcpy(struct gl_context *ctx,
 }

 texelBytes = _mesa_get_format_bytes(rb->Format);
+   bytesPerRow = texelBytes * width;

 /* memcpy*/
-   for (j = 0; j < height; j++) {
-  memcpy(dst, map, width * texelBytes);
-  dst += dstStride;
-  map += stride;
+   if (dstStride == stride && dstStride == bytesPerRow) {
+ memcpy(dst, map, bytesPerRow * height);


Too much indentation there.

Looks OK otherwise.  I assume you tested with Piglit too.

Reviewed-by: Brian Paul 



+   } else {
+  for (j = 0; j < height; j++) {
+ memcpy(dst, map, bytesPerRow);
+ dst += dstStride;
+ map += stride;
+  }
 }

 ctx->Driver.UnmapRenderbuffer(ctx, rb);
diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index 615ba63362..3314e557c0 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -1360,10 +1360,17 @@ _mesa_store_compressed_texsubimage(struct
gl_context *ctx, GLuint dims,
if (dstMap) {

   /* copy rows of blocks */
- for (i = 0; i < store.CopyRowsPerSlice; i++) {
-memcpy(dstMap, src, store.CopyBytesPerRow);
-dstMap += dstRowStride;
-src += store.TotalBytesPerRow;
+ if (dstRowStride == store.TotalBytesPerRow &&
+ dstRowStride == store.CopyBytesPerRow) {
+memcpy(dstMap, src, store.CopyBytesPerRow *
store.CopyRowsPerSlice);
+src += store.CopyBytesPerRow * store.CopyRowsPerSlice;
+ }
+ else {
+for (i = 0; i < store.CopyRowsPerSlice; i++) {
+   memcpy(dstMap, src, store.CopyBytesPerRow);
+   dstMap += dstRowStride;
+   src += store.TotalBytesPerRow;
+}
   }

   ctx->Driver.UnmapTextureImage(ctx, texImage, slice +
zoffset);











Re: [Mesa-dev] Meson mesademos (Was: [RFC libdrm 0/2] Replace the build system with meson)

2017-04-10 Thread Dylan Baker
Quoting Nirbheek Chauhan (2017-04-10 06:59:02)
> Hello Jose,
> 
> On Mon, Apr 10, 2017 at 5:41 PM, Jose Fonseca  wrote:
> > I've been trying to get native mingw to build.  (It's still important to
> > prototype mesademos with MSVC to ensure meson is up to the task, but long
> > term, I think I'll push for dropping MSVC support from mesademos and piglit,
> > since MinGW is fine for this sort of samples/tests programs.)
> >
> > However native MinGW fails poorly:
> >
> > [78/1058] Static linking library src/util/libutil.a
> > FAILED: src/util/libutil.a
> > cmd /c del /f /s /q src/util/libutil.a && ar @src/util/libutil.a.rsp
> > Invalid switch - "util".
> >
> > So the problem here is that meson is passing `/` separator to the cmd.exe
> > del command, instead of `\`.
> >
> > Full log
> > https://ci.appveyor.com/project/jrfonseca/mesademos/build/job/6rpen94u7yq3q69n
> >
> 
> This was a regression with 0.39, and is already fixed in git master:
> https://github.com/mesonbuild/meson/pull/1527
> 
> It will be in the next release, which is scheduled for April 22. In
> the meantime, please test with git master.
> 
> >
> > TBH, this is basic windows functionality, and if it can't get it right then
> > it shakes my belief that it's getting proper windows testing...
> >
> 
> I'm sorry to hear that.
> 
> >
> > I think part of the problem is that per
> > https://github.com/mesonbuild/meson/blob/master/.appveyor.yml Meson is only
> > being tested with MSYS (which provides a full-blown POSIX environment on
> > Windows), and not with plain MinGW.
> >
> 
> Actually, this slipped through the cracks (I broke it!) because we
> didn't have our CI testing MinGW. Now we do, specifically to catch
> this sort of stuff: https://github.com/mesonbuild/meson/pull/1346.
> 
> All our pull requests are required to pass all CI before they can be
> merged, and every bug fixed and feature added is required to have a
> new test case for it, so I expect the situation will not regress
> again.
> 
> Our CI is fairly comprehensive -- MSVC 2010, 2015, 2017, MinGW, Cygwin
> on just Windows and getting better every day. The biggest hole in it
> right now is BSD, and we would be extremely grateful if someone could
> help us with that too!
> 
> > IMHO, MSYS is a hack to get packages that use autotools to build with MinGW.
> > Packages that use Windows aware build systems (like Meson is trying to be)
> > should stay as _far_ as possible from MSYS
> >
> 
> Yes, I agree. MSYS2 in particular is especially broken (the toolchain
> is buggy and even the python3 shipped with it is crap) and we do not
> recommend using it at all (although a surprisingly large number of
> people use its toolchain, so we do support it). If you look closely,
> we do not use MSYS itself, only MinGW:
> 
> https://github.com/mesonbuild/meson/blob/master/.appveyor.yml#L61
> 
> The MSYS paths are C:\msys64\usr\bin and the MinGW (toolchain) paths
> are C:\msys64\mingw??\bin.
> 
> And in any case our codepaths for building something with the Ninja
> backend on MSVC and MinGW are almost identical, and our MSVC CI does
> not have any POSIX binaries in their path.
> 
> I even have all of Glib + dependencies building out of the box with
> just Meson git + MSVC [https://github.com/centricular/glib/], and my
> next step is to have all of GStreamer building that way.
> 
> Hope this clarifies things!
> 
> Cheers,
> Nirbheek

Jose,

I installed meson from git as Nirbheek suggested, and that got the mingw build
working and fixed the appveyor build so it actually starts. I did run into
some problems with freeglut that I'm not sure I'll have time to fix today
(although I'd like to get them fixed). If you pull my branch, the travis
build will turn completely green and the MinGW build will turn green on
appveyor, though MSVC still doesn't. My meson branch is based on yours, and you
should be able to apply the changes cleanly.




Re: [Mesa-dev] [PATCH] mesa: use single memcpy when strides matches

2017-04-10 Thread Brian Paul

On 04/10/2017 12:35 PM, Bartosz Tomczyk wrote:

Yes, I tested with Piglit, there is no regression.



Do you need me to push this for you?

-Brian



On 10.04.2017 19:16, Brian Paul wrote:

On 04/09/2017 07:58 AM, Bartosz Tomczyk wrote:

---
  src/mesa/main/readpix.c  | 15 ++-
  src/mesa/main/texstore.c | 15 +++
  2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 25823230d6..14568de497 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -220,7 +220,7 @@ readpixels_memcpy(struct gl_context *ctx,
 struct gl_renderbuffer *rb =
   _mesa_get_read_renderbuffer_for_format(ctx, format);
 GLubyte *dst, *map;
-   int dstStride, stride, j, texelBytes;
+   int dstStride, stride, j, texelBytes, bytesPerRow;

 /* Fail if memcpy cannot be used. */
 if (!readpixels_can_use_memcpy(ctx, format, type, packing)) {
@@ -239,12 +239,17 @@ readpixels_memcpy(struct gl_context *ctx,
 }

 texelBytes = _mesa_get_format_bytes(rb->Format);
+   bytesPerRow = texelBytes * width;

 /* memcpy*/
-   for (j = 0; j < height; j++) {
-  memcpy(dst, map, width * texelBytes);
-  dst += dstStride;
-  map += stride;
+   if (dstStride == stride && dstStride == bytesPerRow) {
+ memcpy(dst, map, bytesPerRow * height);


Too much indentation there.

Looks OK otherwise.  I assume you tested with Piglit too.

Reviewed-by: Brian Paul 



+   } else {
+  for (j = 0; j < height; j++) {
+ memcpy(dst, map, bytesPerRow);
+ dst += dstStride;
+ map += stride;
+  }
 }

 ctx->Driver.UnmapRenderbuffer(ctx, rb);
diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index 615ba63362..3314e557c0 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -1360,10 +1360,17 @@ _mesa_store_compressed_texsubimage(struct
gl_context *ctx, GLuint dims,
if (dstMap) {

   /* copy rows of blocks */
- for (i = 0; i < store.CopyRowsPerSlice; i++) {
-memcpy(dstMap, src, store.CopyBytesPerRow);
-dstMap += dstRowStride;
-src += store.TotalBytesPerRow;
+ if (dstRowStride == store.TotalBytesPerRow &&
+ dstRowStride == store.CopyBytesPerRow) {
+memcpy(dstMap, src, store.CopyBytesPerRow *
store.CopyRowsPerSlice);
+src += store.CopyBytesPerRow * store.CopyRowsPerSlice;
+ }
+ else {
+for (i = 0; i < store.CopyRowsPerSlice; i++) {
+   memcpy(dstMap, src, store.CopyBytesPerRow);
+   dstMap += dstRowStride;
+   src += store.TotalBytesPerRow;
+}
   }

   ctx->Driver.UnmapTextureImage(ctx, texImage, slice +
zoffset);









Re: [Mesa-dev] [PATCH] mesa: use single memcpy when strides matches

2017-04-10 Thread Bartosz Tomczyk

Yes, I tested with Piglit, there is no regression.


On 10.04.2017 19:16, Brian Paul wrote:

On 04/09/2017 07:58 AM, Bartosz Tomczyk wrote:

---
  src/mesa/main/readpix.c  | 15 ++-
  src/mesa/main/texstore.c | 15 +++
  2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 25823230d6..14568de497 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -220,7 +220,7 @@ readpixels_memcpy(struct gl_context *ctx,
 struct gl_renderbuffer *rb =
   _mesa_get_read_renderbuffer_for_format(ctx, format);
 GLubyte *dst, *map;
-   int dstStride, stride, j, texelBytes;
+   int dstStride, stride, j, texelBytes, bytesPerRow;

 /* Fail if memcpy cannot be used. */
 if (!readpixels_can_use_memcpy(ctx, format, type, packing)) {
@@ -239,12 +239,17 @@ readpixels_memcpy(struct gl_context *ctx,
 }

 texelBytes = _mesa_get_format_bytes(rb->Format);
+   bytesPerRow = texelBytes * width;

 /* memcpy*/
-   for (j = 0; j < height; j++) {
-  memcpy(dst, map, width * texelBytes);
-  dst += dstStride;
-  map += stride;
+   if (dstStride == stride && dstStride == bytesPerRow) {
+ memcpy(dst, map, bytesPerRow * height);


Too much indentation there.

Looks OK otherwise.  I assume you tested with Piglit too.

Reviewed-by: Brian Paul 



+   } else {
+  for (j = 0; j < height; j++) {
+ memcpy(dst, map, bytesPerRow);
+ dst += dstStride;
+ map += stride;
+  }
 }

 ctx->Driver.UnmapRenderbuffer(ctx, rb);
diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index 615ba63362..3314e557c0 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -1360,10 +1360,17 @@ _mesa_store_compressed_texsubimage(struct 
gl_context *ctx, GLuint dims,

if (dstMap) {

   /* copy rows of blocks */
- for (i = 0; i < store.CopyRowsPerSlice; i++) {
-memcpy(dstMap, src, store.CopyBytesPerRow);
-dstMap += dstRowStride;
-src += store.TotalBytesPerRow;
+ if (dstRowStride == store.TotalBytesPerRow &&
+ dstRowStride == store.CopyBytesPerRow) {
+memcpy(dstMap, src, store.CopyBytesPerRow * 
store.CopyRowsPerSlice);

+src += store.CopyBytesPerRow * store.CopyRowsPerSlice;
+ }
+ else {
+for (i = 0; i < store.CopyRowsPerSlice; i++) {
+   memcpy(dstMap, src, store.CopyBytesPerRow);
+   dstMap += dstRowStride;
+   src += store.TotalBytesPerRow;
+}
   }

   ctx->Driver.UnmapTextureImage(ctx, texImage, slice + 
zoffset);








[Mesa-dev] [PATCH] mesa: use single memcpy when strides match

2017-04-10 Thread Bartosz Tomczyk
v2: fix indentation
---
 src/mesa/main/readpix.c  | 15 ++-
 src/mesa/main/texstore.c | 15 +++
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 25823230d6..606d1e58e5 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -220,7 +220,7 @@ readpixels_memcpy(struct gl_context *ctx,
struct gl_renderbuffer *rb =
  _mesa_get_read_renderbuffer_for_format(ctx, format);
GLubyte *dst, *map;
-   int dstStride, stride, j, texelBytes;
+   int dstStride, stride, j, texelBytes, bytesPerRow;
 
/* Fail if memcpy cannot be used. */
if (!readpixels_can_use_memcpy(ctx, format, type, packing)) {
@@ -239,12 +239,17 @@ readpixels_memcpy(struct gl_context *ctx,
}
 
texelBytes = _mesa_get_format_bytes(rb->Format);
+   bytesPerRow = texelBytes * width;
 
/* memcpy*/
-   for (j = 0; j < height; j++) {
-  memcpy(dst, map, width * texelBytes);
-  dst += dstStride;
-  map += stride;
+   if (dstStride == stride && dstStride == bytesPerRow) {
+  memcpy(dst, map, bytesPerRow * height);
+   } else {
+  for (j = 0; j < height; j++) {
+ memcpy(dst, map, bytesPerRow);
+ dst += dstStride;
+ map += stride;
+  }
}
 
ctx->Driver.UnmapRenderbuffer(ctx, rb);
diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index 615ba63362..3314e557c0 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -1360,10 +1360,17 @@ _mesa_store_compressed_texsubimage(struct gl_context 
*ctx, GLuint dims,
   if (dstMap) {
 
  /* copy rows of blocks */
- for (i = 0; i < store.CopyRowsPerSlice; i++) {
-memcpy(dstMap, src, store.CopyBytesPerRow);
-dstMap += dstRowStride;
-src += store.TotalBytesPerRow;
+ if (dstRowStride == store.TotalBytesPerRow &&
+ dstRowStride == store.CopyBytesPerRow) {
+memcpy(dstMap, src, store.CopyBytesPerRow * 
store.CopyRowsPerSlice);
+src += store.CopyBytesPerRow * store.CopyRowsPerSlice;
+ }
+ else {
+for (i = 0; i < store.CopyRowsPerSlice; i++) {
+   memcpy(dstMap, src, store.CopyBytesPerRow);
+   dstMap += dstRowStride;
+   src += store.TotalBytesPerRow;
+}
  }
 
  ctx->Driver.UnmapTextureImage(ctx, texImage, slice + zoffset);
-- 
2.12.2
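
The fast path in this patch can be shown as a standalone helper. This is a
sketch under assumptions, not the Mesa code: copy_rows() and its parameter
names are hypothetical. It shows that one memcpy over the whole image is
equivalent to the row loop exactly when both strides equal the number of
bytes copied per row, meaning there is no padding between rows on either
side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `height` rows of `bytes_per_row` bytes from src to dst. The strides
 * are the distances in bytes between the starts of consecutive rows.
 */
static void
copy_rows(unsigned char *dst, size_t dst_stride,
          const unsigned char *src, size_t src_stride,
          size_t bytes_per_row, size_t height)
{
   if (dst_stride == src_stride && dst_stride == bytes_per_row) {
      /* Rows are tightly packed on both sides: one contiguous block. */
      memcpy(dst, src, bytes_per_row * height);
   } else {
      /* Padding on at least one side: copy row by row. */
      for (size_t j = 0; j < height; j++) {
         memcpy(dst, src, bytes_per_row);
         dst += dst_stride;
         src += src_stride;
      }
   }
}
```

Checking only dst_stride == src_stride would not be enough, since equal
strides with padding would make the single memcpy overwrite the padding
bytes in the destination.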



Re: [Mesa-dev] [PATCH] vc4: Optimizing vc4_load_utile/vc4_store_utile with sse for x86 build

2017-04-10 Thread Eric Anholt
mas...@eltechs.com writes:

> From: Maxim Maslov 

The commit message needs some explanation of why we would want that
(given that 2835 is an ARM) and some performance data justifying the
change.

>
> ---
>  src/gallium/drivers/vc4/vc4_tiling_lt.c | 93 +++--
>  1 file changed, 90 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/vc4/vc4_tiling_lt.c 
> b/src/gallium/drivers/vc4/vc4_tiling_lt.c
> index c9cbc65..d291262 100644
> --- a/src/gallium/drivers/vc4/vc4_tiling_lt.c
> +++ b/src/gallium/drivers/vc4/vc4_tiling_lt.c
> @@ -105,6 +105,49 @@ vc4_load_utile(void *cpu, void *gpu, uint32_t 
> cpu_stride, uint32_t cpp)
>  : "r"(gpu), "r"(cpu), "r"(cpu + 8), "r"(cpu_stride)
>  : "q0", "q1", "q2", "q3");
>  }
> +#elif defined(USE_SSE_ASM)
> +if (gpu_stride == 8) {
> +__asm__ volatile (
> +"movdqu 0(%1), %%xmm0;"
> +"movdqu 0x10(%1), %%xmm1;"
> +"movdqu 0x20(%1), %%xmm2;"
> +"movdqu 0x30(%1), %%xmm3;"
> +"movlpd %%xmm0, 0(%0);"
> +"mov %2, %%ecx;"
> +"movhpd %%xmm0, 0(%0,%%ecx,1);"
> +"add %2, %%ecx;"
> +"movlpd %%xmm1, 0(%0,%%ecx,1);"
> +"add %2, %%ecx;"
> +"movhpd %%xmm1, 0(%0,%%ecx,1);"
> +"add %2, %%ecx;"
> +"movlpd %%xmm2, 0(%0,%%ecx,1);"
> +"add %2, %%ecx;"
> +"movhpd %%xmm2, 0(%0,%%ecx,1);"
> +"add %2, %%ecx;"
> +"movlpd %%xmm3, 0(%0,%%ecx,1);"
> +"add %2, %%ecx;"
> +"movhpd %%xmm3, 0(%0,%%ecx,1);"
> +:
> +: "r"(cpu), "r"(gpu), "r"(cpu_stride)
> +: "%xmm0",  "%xmm1",  "%xmm2",  "%xmm3", "%ecx");
> +} else {
> +assert(gpu_stride == 16);
> +__asm__ volatile (
> +"movdqu 0(%1), %%xmm0;"
> +"movdqu 0x10(%1), %%xmm1;"
> +"movdqu 0x20(%1), %%xmm2;"
> +"movdqu 0x30(%1), %%xmm3;"
> +"movdqu %%xmm0, 0(%0);"
> +"mov %2, %%ecx;"
> +"movdqu %%xmm1, 0(%0,%%ecx,1);"
> +"add %2, %%ecx;"
> +"movdqu %%xmm2, 0(%0,%%ecx,1);"
> +"add %2, %%ecx;"
> +"movdqu %%xmm3, 0(%0,%%ecx,1);"
> +:
> +: "r"(cpu), "r"(gpu), "r"(cpu_stride)
> +: "%xmm0",  "%xmm1",  "%xmm2",  "%xmm3", "%ecx");
> +}

Using SSE in Mesa requires runtime detection if SSE is actually present.
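
As a rough illustration of the runtime detection mentioned above, a
dispatcher could choose an implementation once, based on what the CPU
reports. This sketch is an assumption, not how Mesa does it: it uses the
GCC/Clang builtin __builtin_cpu_supports() rather than Mesa's own CPU
detection helpers, and cpu_has_sse2(), copy_utile_generic(),
copy_utile_sse2_placeholder() and pick_copy_utile() are hypothetical names.
The "SSE2" variant is a memcpy placeholder marking where the real routine
would go.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when the running CPU reports SSE2 support. On compilers or
 * architectures without the builtin this conservatively returns false.
 */
static bool
cpu_has_sse2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
   return __builtin_cpu_supports("sse2") != 0;
#else
   return false;
#endif
}

typedef void (*copy_utile_fn)(void *dst, const void *src);

/* Portable fallback; the patch treats a microtile as 64 bytes. */
static void
copy_utile_generic(void *dst, const void *src)
{
   memcpy(dst, src, 64);
}

/* Placeholder for where an SSE2 implementation would live. */
static void
copy_utile_sse2_placeholder(void *dst, const void *src)
{
   memcpy(dst, src, 64);
}

/* Pick the implementation at runtime instead of at compile time. */
static copy_utile_fn
pick_copy_utile(void)
{
   return cpu_has_sse2() ? copy_utile_sse2_placeholder : copy_utile_generic;
}
```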


>  #endif
> -
>  }
>  
>  void
> @@ -175,6 +260,7 @@ NEON_TAG(vc4_load_lt_image)(void *dst, uint32_t 
> dst_stride,
>  int cpp, const struct pipe_box *box)
>  {
>  uint32_t utile_w = vc4_utile_width(cpp);
> +uint32_t xfactor = 64 / utile_w;
>  uint32_t utile_h = vc4_utile_height(cpp);
>  uint32_t xstart = box->x;
>  uint32_t ystart = box->y;
> @@ -184,7 +270,7 @@ NEON_TAG(vc4_load_lt_image)(void *dst, uint32_t 
> dst_stride,
>  vc4_load_utile(dst + (dst_stride * y +
>x * cpp),
> src + ((ystart + y) * src_stride +
> -  (xstart + x) * 64 / utile_w),
> +  (xstart + x) * xfactor),
> dst_stride, cpp);
>  }
>  }
> @@ -196,6 +282,7 @@ NEON_TAG(vc4_store_lt_image)(void *dst, uint32_t 
> dst_stride,
>   int cpp, const struct pipe_box *box)
>  {
>  uint32_t utile_w = vc4_utile_width(cpp);
> +uint32_t xfactor = 64 / utile_w;
>  uint32_t utile_h = vc4_utile_height(cpp);
>  uint32_t xstart = box->x;
>  uint32_t ystart = box->y;
> @@ -203,7 +290,7 @@ NEON_TAG(vc4_store_lt_image)(void *dst, uint32_t 
> dst_stride,
>  for (uint32_t y = 0; y < box->height; y += utile_h) {
>  for (int x = 0; x < box->width; x += utile_w) {
>  vc4_store_utile(dst + ((ystart + y) * dst_stride +
> -   (xstart + x) * 64 / utile_w),
> +   (xstart + x) * xfactor),
>  src + (src_stride * y +
> x * cpp),
>  src_stride, cpp);
> -- 
> 

[Mesa-dev] [PATCH] vc4: Optimizing vc4_load_utile/vc4_store_utile with sse for x86 build

2017-04-10 Thread maslov
From: Maxim Maslov 

---
 src/gallium/drivers/vc4/vc4_tiling_lt.c | 93 +++--
 1 file changed, 90 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/vc4/vc4_tiling_lt.c 
b/src/gallium/drivers/vc4/vc4_tiling_lt.c
index c9cbc65..d291262 100644
--- a/src/gallium/drivers/vc4/vc4_tiling_lt.c
+++ b/src/gallium/drivers/vc4/vc4_tiling_lt.c
@@ -105,6 +105,49 @@ vc4_load_utile(void *cpu, void *gpu, uint32_t cpu_stride, 
uint32_t cpp)
 : "r"(gpu), "r"(cpu), "r"(cpu + 8), "r"(cpu_stride)
 : "q0", "q1", "q2", "q3");
 }
+#elif defined(USE_SSE_ASM)
+if (gpu_stride == 8) {
+__asm__ volatile (
+"movdqu 0(%1), %%xmm0;"
+"movdqu 0x10(%1), %%xmm1;"
+"movdqu 0x20(%1), %%xmm2;"
+"movdqu 0x30(%1), %%xmm3;"
+"movlpd %%xmm0, 0(%0);"
+"mov %2, %%ecx;"
+"movhpd %%xmm0, 0(%0,%%ecx,1);"
+"add %2, %%ecx;"
+"movlpd %%xmm1, 0(%0,%%ecx,1);"
+"add %2, %%ecx;"
+"movhpd %%xmm1, 0(%0,%%ecx,1);"
+"add %2, %%ecx;"
+"movlpd %%xmm2, 0(%0,%%ecx,1);"
+"add %2, %%ecx;"
+"movhpd %%xmm2, 0(%0,%%ecx,1);"
+"add %2, %%ecx;"
+"movlpd %%xmm3, 0(%0,%%ecx,1);"
+"add %2, %%ecx;"
+"movhpd %%xmm3, 0(%0,%%ecx,1);"
+:
+: "r"(cpu), "r"(gpu), "r"(cpu_stride)
+: "%xmm0",  "%xmm1",  "%xmm2",  "%xmm3", "%ecx");
+} else {
+assert(gpu_stride == 16);
+__asm__ volatile (
+"movdqu 0(%1), %%xmm0;"
+"movdqu 0x10(%1), %%xmm1;"
+"movdqu 0x20(%1), %%xmm2;"
+"movdqu 0x30(%1), %%xmm3;"
+"movdqu %%xmm0, 0(%0);"
+"mov %2, %%ecx;"
+"movdqu %%xmm1, 0(%0,%%ecx,1);"
+"add %2, %%ecx;"
+"movdqu %%xmm2, 0(%0,%%ecx,1);"
+"add %2, %%ecx;"
+"movdqu %%xmm3, 0(%0,%%ecx,1);"
+:
+: "r"(cpu), "r"(gpu), "r"(cpu_stride)
+: "%xmm0",  "%xmm1",  "%xmm2",  "%xmm3", "%ecx");
+}
 #else
 for (uint32_t gpu_offset = 0; gpu_offset < 64; gpu_offset += 
gpu_stride) {
 memcpy(cpu, gpu + gpu_offset, gpu_stride);
@@ -160,13 +203,55 @@ vc4_store_utile(void *gpu, void *cpu, uint32_t 
cpu_stride, uint32_t cpp)
 : "r"(gpu), "r"(cpu), "r"(cpu + 8), "r"(cpu_stride)
 : "q0", "q1", "q2", "q3");
 }
+#elif defined(USE_SSE_ASM)
+if (gpu_stride == 8) {
+__asm__ volatile (
+"movlpd 0(%1), %%xmm0;"
+"mov %2, %%ecx;"
+"movhpd 0(%1,%%ecx,1), %%xmm0;"
+"add %2, %%ecx;"
+"movlpd 0(%1,%%ecx,1), %%xmm1;"
+"add %2, %%ecx;"
+"movhpd 0(%1,%%ecx,1), %%xmm1;"
+"add %2, %%ecx;"
+"movlpd 0(%1,%%ecx,1), %%xmm2;"
+"add %2, %%ecx;"
+"movhpd 0(%1,%%ecx,1), %%xmm2;"
+"add %2, %%ecx;"
+"movlpd 0(%1,%%ecx,1), %%xmm3;"
+"add %2, %%ecx;"
+"movhpd 0(%1,%%ecx,1), %%xmm3;"
+"movdqu %%xmm0, 0(%0);"
+"movdqu %%xmm1, 0x10(%0);"
+"movdqu %%xmm2, 0x20(%0);"
+"movdqu %%xmm3, 0x30(%0);"
+:
+: "r"(gpu), "r"(cpu), "r"(cpu_stride)
+: "%xmm0",  "%xmm1",  "%xmm2",  "%xmm3", "%ecx");
+} else {
+assert(gpu_stride == 16);
+__asm__ volatile (
+   "movdqu 0(%1), %%xmm0;"
+   "mov %2, %%ecx;"
+   "movdqu 0(%1,%%ecx,1), %%xmm1;"
+   "add %2, %%ecx;"
+   "movdqu 0(%1,%%ecx,1), %%xmm2;"
+   "add %2, %%ecx;"
+   "movdqu 0(%1,%%ecx,1), %%xmm3;"
+   "movdqu %%xmm0, 0(%0);"
+   "movdqu %%xmm1, 0x10(%0);"
+   "movdqu %%xmm2, 0x20(%0);"
+   "movdqu %%xmm3, 0x30(%0);"
+   :
+   : "r"(gpu), "r"(cpu), "r"(cpu_stride)
+ 
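
For comparison, the same load pattern can be sketched with SSE2 intrinsics instead of inline asm, which leaves register allocation and addressing to the compiler. This is only an illustration under stated assumptions, not the patch author's code: load_utile_sketch is a hypothetical name, gpu_stride is passed in as a parameter rather than derived from cpp, and a plain memcpy loop is used when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy one 64-byte utile from linear "gpu" memory
 * into a strided "cpu" image.  gpu_stride is the number of bytes per
 * utile row (8 or 16 in the patch above). */
static void
load_utile_sketch(void *cpu, const void *gpu,
                  uint32_t cpu_stride, uint32_t gpu_stride)
{
    uint8_t *dst = cpu;
    const uint8_t *src = gpu;

#if defined(__SSE2__)
    if (gpu_stride == 8) {
        /* Each 16-byte load holds two 8-byte rows: low half, high half. */
        for (int i = 0; i < 4; i++) {
            __m128i v = _mm_loadu_si128((const __m128i *)(src + 16 * i));
            size_t row = (size_t)(2 * i) * cpu_stride;

            _mm_storel_epi64((__m128i *)(dst + row), v);
            _mm_storeh_pd((double *)(dst + row + cpu_stride),
                          _mm_castsi128_pd(v));
        }
        return;
    }
    if (gpu_stride == 16) {
        /* Each 16-byte load is exactly one row. */
        for (int i = 0; i < 4; i++) {
            __m128i v = _mm_loadu_si128((const __m128i *)(src + 16 * i));

            _mm_storeu_si128((__m128i *)(dst + (size_t)i * cpu_stride), v);
        }
        return;
    }
#endif
    /* Portable fallback: one memcpy per utile row. */
    for (uint32_t off = 0; off < 64; off += gpu_stride) {
        memcpy(dst, src + off, gpu_stride);
        dst += cpu_stride;
    }
}
```

The store direction would mirror this, gathering rows with strided loads and writing four contiguous 16-byte stores.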

Re: [Mesa-dev] [PATCH 9/9] i965/drm: Add stall warnings when mapping or waiting on BOs.

2017-04-10 Thread Chris Wilson
On Mon, Apr 10, 2017 at 10:29:50AM -0700, Kenneth Graunke wrote:
> On Monday, April 10, 2017 1:31:11 AM PDT Chris Wilson wrote:
> > In general, does 10us resolution require compensation for clock_gettime()
> > overhead and checking against clock_getres()?
> 
> FWIW, I copied the 10us threshold from your brw-batch series.  I'm happy
> to adjust it.

I can honestly say there wasn't any thought behind it. :)
I think it's about the right sort of threshold between "free" and "don't
do this". A hundred "don't do this" really eats into your frame budget!
 
> On my system, clock_getres(CLOCK_MONOTONIC[_RAW], ) reports a
> resolution of 1 nanosecond, so given a 10us (10,000ns) threshold, I
> doubt we need to consider it.

Best to check on an atom - though I doubt any are as bad as Pineview
(which was ~4ms iirc). MONOTONIC_RAW should be close to tsc resolution,
so yes if we have RAW, we probably don't need to worry. Just do a check
and warn if the resolution is greater than the alarm threshold, and see
if anybody ever files a bug.
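
A minimal sketch of such a check, assuming a POSIX system. The helper names and the STALL_THRESHOLD_NS constant are made up for illustration and are not taken from the i965 code:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Assumed alarm threshold: 10us, expressed in nanoseconds. */
#define STALL_THRESHOLD_NS 10000L

/* True when the reported resolution is coarser than the threshold. */
static bool
resolution_too_coarse(const struct timespec *res, long threshold_ns)
{
    return res->tv_sec > 0 || res->tv_nsec > threshold_ns;
}

/* Query the clock once and warn if it cannot resolve the threshold.
 * Returns true when the clock is fine enough to trust the warnings. */
static bool
check_clock_resolution(clockid_t clock, long threshold_ns)
{
    struct timespec res;

    if (clock_getres(clock, &res) != 0) {
        perror("clock_getres");
        return false;
    }
    if (resolution_too_coarse(&res, threshold_ns)) {
        fprintf(stderr,
                "warning: clock resolution %ld.%09ld s is coarser than "
                "the %ld ns stall threshold\n",
                (long)res.tv_sec, res.tv_nsec, threshold_ns);
        return false;
    }
    return true;
}
```

With a 1ns resolution the check passes silently; a clock that only resolves ~4ms would trip the warning.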
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Mesa-dev] [PATCH 9/9] i965/drm: Add stall warnings when mapping or waiting on BOs.

2017-04-10 Thread Kenneth Graunke
On Monday, April 10, 2017 1:31:11 AM PDT Chris Wilson wrote:
> On Mon, Apr 10, 2017 at 10:09:17AM +0200, Daniel Vetter wrote:
> > On Mon, Apr 10, 2017 at 12:18:54AM -0700, Kenneth Graunke wrote:
> > > diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
> > > b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > > index 8ccc5a276b9..6e4b55cf9ec 100644
> > > --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > > +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > > @@ -100,7 +100,7 @@ intel_batchbuffer_reset(struct intel_batchbuffer 
> > > *batch,
> > >  
> > > batch->bo = brw_bo_alloc(bufmgr, "batchbuffer", BATCH_SZ, 4096);
> > > if (has_llc) {
> > > -  brw_bo_map(batch->bo, true);
> > > +  brw_bo_map(NULL, batch->bo, true);
> > 
> > Why NULL here? Mapping a fresh buffer might incur a clflush, which isn't
> > cheap. I think for atom tuning you want to hear about those.
> 
> I thought it was because there is no brw pointer at this point.

Chris is right - there's no brw pointer so we can't report anything.
We could easily plumb one through, but I was lazy.  I figured this
already gives a ton more coverage than we used to have, and it wasn't
that interesting of a case.

> For !llc, please do a WB mapping of the batch on first use, then a WC
> mapping thereafter. The clflush at execbuf is "free" - or rather it is
> done asynchronously, after taking advantage of the WB for any fixups
> required. Afterwards, you want to avoid clflushing which is where the
> pwrite was useful but now you can use the WC mmap to avoid the penalty
> of performing a copy and avoiding the WB/clflush 2-pass.
> 
> In general, does 10us resolution require compensation for clock_gettime()
> overhead and checking against clock_getres()?

FWIW, I copied the 10us threshold from your brw-batch series.  I'm happy
to adjust it.

On my system, clock_getres(CLOCK_MONOTONIC[_RAW], ) reports a
resolution of 1 nanosecond, so given a 10us (10,000ns) threshold, I
doubt we need to consider it.

> (I hope getime is using MONOTONIC_RAW!)

It isn't.  It should.  I'll send a patch.
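
For illustration, a rough sketch of timing an operation against CLOCK_MONOTONIC_RAW and reporting it when it exceeds a threshold. The function names are hypothetical and this is not the i965 implementation. CLOCK_MONOTONIC_RAW is Linux-specific, so the sketch falls back to CLOCK_MONOTONIC where it is not defined:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#ifdef CLOCK_MONOTONIC_RAW
#define STALL_CLOCK CLOCK_MONOTONIC_RAW
#else
#define STALL_CLOCK CLOCK_MONOTONIC
#endif

/* Difference between two timestamps in milliseconds. */
static double
timespec_diff_ms(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1000.0 +
           (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Print a warning and return true if the elapsed time is over threshold. */
static bool
report_stall(const char *what, double elapsed_ms, double threshold_ms)
{
    if (elapsed_ms <= threshold_ms)
        return false;
    fprintf(stderr, "%s stalled for %.3f ms\n", what, elapsed_ms);
    return true;
}

/* Run op(data), measuring how long it blocks the caller. */
static bool
timed_op(const char *what, void (*op)(void *), void *data,
         double threshold_ms)
{
    struct timespec start, end;

    clock_gettime(STALL_CLOCK, &start);
    op(data);
    clock_gettime(STALL_CLOCK, &end);
    return report_stall(what, timespec_diff_ms(&start, &end), threshold_ms);
}
```

A 10us threshold corresponds to threshold_ms = 0.01 here.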

> Longer term feeding the callsite down to set-domain is
> useful to make diagnosing the problem easier. I tried to give the name
> as being the closest GL entry point along with the function/line of the
> culprit.

Yeah.  bo->name is often enough to find the offender, but it'd
definitely be nicer to pass all that through.  Lots of "miptree" BOs
and lots of ways those can go wrong.




[Mesa-dev] [PATCH 3/3] glsl: use the BA1 macro for textureQueryLevels()

2017-04-10 Thread Samuel Pitoiset
For both consistency and new bindless sampler types.

Signed-off-by: Samuel Pitoiset 
---
 src/compiler/glsl/builtin_functions.cpp | 65 +
 1 file changed, 33 insertions(+), 32 deletions(-)

diff --git a/src/compiler/glsl/builtin_functions.cpp 
b/src/compiler/glsl/builtin_functions.cpp
index 769799595f..5d62d9f8ee 100644
--- a/src/compiler/glsl/builtin_functions.cpp
+++ b/src/compiler/glsl/builtin_functions.cpp
@@ -863,7 +863,7 @@ private:
B0(barrier)
 
BA2(textureQueryLod);
-   B1(textureQueryLevels);
+   BA1(textureQueryLevels);
BA2(textureSamplesIdentical);
B1(dFdx);
B1(dFdy);
@@ -2454,35 +2454,35 @@ builtin_builder::create_builtins()
 NULL);
 
add_function("textureQueryLevels",
-_textureQueryLevels(glsl_type::sampler1D_type),
-_textureQueryLevels(glsl_type::sampler2D_type),
-_textureQueryLevels(glsl_type::sampler3D_type),
-_textureQueryLevels(glsl_type::samplerCube_type),
-_textureQueryLevels(glsl_type::sampler1DArray_type),
-_textureQueryLevels(glsl_type::sampler2DArray_type),
-_textureQueryLevels(glsl_type::samplerCubeArray_type),
-_textureQueryLevels(glsl_type::sampler1DShadow_type),
-_textureQueryLevels(glsl_type::sampler2DShadow_type),
-_textureQueryLevels(glsl_type::samplerCubeShadow_type),
-_textureQueryLevels(glsl_type::sampler1DArrayShadow_type),
-_textureQueryLevels(glsl_type::sampler2DArrayShadow_type),
-_textureQueryLevels(glsl_type::samplerCubeArrayShadow_type),
-
-_textureQueryLevels(glsl_type::isampler1D_type),
-_textureQueryLevels(glsl_type::isampler2D_type),
-_textureQueryLevels(glsl_type::isampler3D_type),
-_textureQueryLevels(glsl_type::isamplerCube_type),
-_textureQueryLevels(glsl_type::isampler1DArray_type),
-_textureQueryLevels(glsl_type::isampler2DArray_type),
-_textureQueryLevels(glsl_type::isamplerCubeArray_type),
-
-_textureQueryLevels(glsl_type::usampler1D_type),
-_textureQueryLevels(glsl_type::usampler2D_type),
-_textureQueryLevels(glsl_type::usampler3D_type),
-_textureQueryLevels(glsl_type::usamplerCube_type),
-_textureQueryLevels(glsl_type::usampler1DArray_type),
-_textureQueryLevels(glsl_type::usampler2DArray_type),
-_textureQueryLevels(glsl_type::usamplerCubeArray_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::sampler1D_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::sampler2D_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::sampler3D_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::samplerCube_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::sampler1DArray_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::sampler2DArray_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::samplerCubeArray_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::sampler1DShadow_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::sampler2DShadow_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::samplerCubeShadow_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::sampler1DArrayShadow_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::sampler2DArrayShadow_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::samplerCubeArrayShadow_type),
+
+_textureQueryLevels(texture_query_levels, 
glsl_type::isampler1D_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::isampler2D_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::isampler3D_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::isamplerCube_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::isampler1DArray_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::isampler2DArray_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::isamplerCubeArray_type),
+
+_textureQueryLevels(texture_query_levels, 
glsl_type::usampler1D_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::usampler2D_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::usampler3D_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::usamplerCube_type),
+_textureQueryLevels(texture_query_levels, 
glsl_type::usampler1DArray_type),
+

[Mesa-dev] [PATCH 2/3] glsl: use the BA1 macro for textureSamples()

2017-04-10 Thread Samuel Pitoiset
For both consistency and new bindless sampler types.

Signed-off-by: Samuel Pitoiset 
---
 src/compiler/glsl/builtin_functions.cpp | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/compiler/glsl/builtin_functions.cpp 
b/src/compiler/glsl/builtin_functions.cpp
index 0ab7875295..769799595f 100644
--- a/src/compiler/glsl/builtin_functions.cpp
+++ b/src/compiler/glsl/builtin_functions.cpp
@@ -832,7 +832,7 @@ private:
B1(all);
B1(not);
BA2(textureSize);
-   B1(textureSamples);
+   BA1(textureSamples);
 
 /** Flags to _texture() */
 #define TEX_PROJECT 1
@@ -1792,13 +1792,13 @@ builtin_builder::create_builtins()
 NULL);
 
add_function("textureSamples",
-_textureSamples(glsl_type::sampler2DMS_type),
-_textureSamples(glsl_type::isampler2DMS_type),
-_textureSamples(glsl_type::usampler2DMS_type),
+_textureSamples(shader_samples, glsl_type::sampler2DMS_type),
+_textureSamples(shader_samples, glsl_type::isampler2DMS_type),
+_textureSamples(shader_samples, glsl_type::usampler2DMS_type),
 
-_textureSamples(glsl_type::sampler2DMSArray_type),
-_textureSamples(glsl_type::isampler2DMSArray_type),
-_textureSamples(glsl_type::usampler2DMSArray_type),
+_textureSamples(shader_samples, 
glsl_type::sampler2DMSArray_type),
+_textureSamples(shader_samples, 
glsl_type::isampler2DMSArray_type),
+_textureSamples(shader_samples, 
glsl_type::usampler2DMSArray_type),
 NULL);
 
add_function("texture",
@@ -4947,10 +4947,11 @@ 
builtin_builder::_textureSize(builtin_available_predicate avail,
 }
 
 ir_function_signature *
-builtin_builder::_textureSamples(const glsl_type *sampler_type)
+builtin_builder::_textureSamples(builtin_available_predicate avail,
+ const glsl_type *sampler_type)
 {
ir_variable *s = in_var(sampler_type, "sampler");
-   MAKE_SIG(glsl_type::int_type, shader_samples, 1, s);
+   MAKE_SIG(glsl_type::int_type, avail, 1, s);
 
ir_texture *tex = new(mem_ctx) ir_texture(ir_texture_samples);
tex->set_sampler(new(mem_ctx) ir_dereference_variable(s), 
glsl_type::int_type);
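
The shape of this change, turning a hard-coded availability predicate into a parameter, can be sketched in plain C with a function pointer. Everything below is a simplified stand-in: struct parse_state, the version check inside shader_samples, and bindless_samples are assumptions for illustration and do not reproduce Mesa's actual predicates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser state that predicates inspect. */
struct parse_state {
    int  glsl_version;
    bool bindless_enabled;
};

typedef bool (*available_predicate)(const struct parse_state *);

struct signature {
    const char         *sampler_type;
    available_predicate avail;
};

/* Placeholder condition, not Mesa's real check. */
static bool
shader_samples(const struct parse_state *st)
{
    return st->glsl_version >= 450;
}

/* A second, stricter predicate that a new sampler family might need. */
static bool
bindless_samples(const struct parse_state *st)
{
    return st->bindless_enabled && shader_samples(st);
}

/* After the refactor the caller chooses the predicate per overload,
 * instead of the builder hard-coding a single one. */
static struct signature
texture_samples_sig(available_predicate avail, const char *sampler_type)
{
    struct signature sig = { sampler_type, avail };
    return sig;
}
```

The same builder can then register one overload gated on shader_samples and another gated on a different predicate without duplicating its body.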
-- 
2.12.2



[Mesa-dev] [PATCH 1/3] glsl: use the BA1 macro for textureCubeArrayShadow()

2017-04-10 Thread Samuel Pitoiset
For both consistency and new bindless sampler types.

Signed-off-by: Samuel Pitoiset 
---
 src/compiler/glsl/builtin_functions.cpp | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/compiler/glsl/builtin_functions.cpp 
b/src/compiler/glsl/builtin_functions.cpp
index d902a91a77..0ab7875295 100644
--- a/src/compiler/glsl/builtin_functions.cpp
+++ b/src/compiler/glsl/builtin_functions.cpp
@@ -847,7 +847,7 @@ private:
const glsl_type *sampler_type,
const glsl_type *coord_type,
int flags = 0);
-   B0(textureCubeArrayShadow);
+   BA1(textureCubeArrayShadow);
ir_function_signature *_texelFetch(builtin_available_predicate avail,
   const glsl_type *return_type,
   const glsl_type *sampler_type,
@@ -1839,7 +1839,7 @@ builtin_builder::create_builtins()
 /* samplerCubeArrayShadow is special; it has an extra parameter
  * for the shadow comparator since there is no vec5 type.
  */
-_textureCubeArrayShadow(),
+_textureCubeArrayShadow(texture_cube_map_array, 
glsl_type::samplerCubeArrayShadow_type),
 
 _texture(ir_tex, v130, glsl_type::vec4_type,  
glsl_type::sampler2DRect_type,  glsl_type::vec2_type),
 _texture(ir_tex, v130, glsl_type::ivec4_type, 
glsl_type::isampler2DRect_type, glsl_type::vec2_type),
@@ -5064,12 +5064,13 @@ builtin_builder::_texture(ir_texture_opcode opcode,
 }
 
 ir_function_signature *
-builtin_builder::_textureCubeArrayShadow()
+builtin_builder::_textureCubeArrayShadow(builtin_available_predicate avail,
+ const glsl_type *sampler_type)
 {
-   ir_variable *s = in_var(glsl_type::samplerCubeArrayShadow_type, "sampler");
+   ir_variable *s = in_var(sampler_type, "sampler");
ir_variable *P = in_var(glsl_type::vec4_type, "P");
ir_variable *compare = in_var(glsl_type::float_type, "compare");
-   MAKE_SIG(glsl_type::float_type, texture_cube_map_array, 3, s, P, compare);
+   MAKE_SIG(glsl_type::float_type, avail, 3, s, P, compare);
 
ir_texture *tex = new(mem_ctx) ir_texture(ir_tex);
tex->set_sampler(var_ref(s), glsl_type::float_type);
-- 
2.12.2



Re: [Mesa-dev] [PATCH 00/53] i965: Eat libdrm_intel for breakfast

2017-04-10 Thread Charles, Daniel
On Wed, Apr 5, 2017 at 11:27 AM, Kristian Høgsberg  wrote:
> On Wed, Apr 5, 2017 at 11:11 AM, Jason Ekstrand  wrote:
>> On Wed, Apr 5, 2017 at 11:03 AM, Emil Velikov 
>> wrote:
>>>
>>> On 5 April 2017 at 18:55, Daniel Vetter  wrote:
>>> > On Wed, Apr 05, 2017 at 04:38:25PM +0100, Emil Velikov wrote:
>>> >> Hi Ken,
>>> >>
>>> >> On 5 April 2017 at 01:09, Kenneth Graunke 
>>> >> wrote:
>>> >> > Hello,
>>> >> >
>>> >> > This series imports libdrm_intel into the i965 driver, hacks and
>>> >> > slashes it down to size, and greatly simplifies our relocation
>>> >> > handling.
>>> >> >
>>> >> > Some of the patches may be held for moderation.  You can find the
>>> >> > series in git here:
>>> >> >
>>> >> > https://cgit.freedesktop.org/~kwg/mesa/log/?h=bacondrm
>>> >> >
>>> >> > A couple of us have been talking about this in person and IRC for
>>> >> > a while, but I realize I haven't mentioned anything about it on the
>>> >> > mailing list yet, so this may come as a bit of a surprise.
>>> >> >
>>> >> > libdrm_intel is about 15 source files and almost 13,000 lines of
>>> >> > code.
>>> >> > This series adds 3 files (one .c, two .h) and only 2,137 lines of
>>> >> > code:
>>> >> >
>>> >> > 60 files changed, 2784 insertions(+), 647 deletions(-)
>>> >> >
>>> >> > The rest of the library is basically useless to us.  It contains a
>>> >> > lot
>>> >> > of legacy cruft from the pre-GEM, DRI1, or 8xx/9xx era.  But even the
>>> >> > parts we do use are in bad shape.  BO offset tracking is
>>> >> > non-threadsafe.
>>> >> > Relocation handling is way too complicated.  These things waste
>>> >> > memory,
>>> >> > burn CPU time, and make it difficult for us to take advantage of new
>>> >> > kernel features like I915_EXEC_NO_RELOC which would reduce overhead
>>> >> > further.  The unsynchronized mapping API performs a synchronized
>>> >> > mapping
>>> >> > on non-LLC platforms, which can massively hurt performance on Atoms.
>>> >> > Mesa is also using uncached GTT mappings for almost everything on
>>> >> > Atoms,
>>> >> > rather than fast CPU or WC maps where possible.
>>> >> >
>>> >> > Evolving this code in libdrm is very painful, as we aren't allowed to
>>> >> > break the ABI.  All the legacy cruft and design mistakes (in
>>> >> > hindsight)
>>> >> > make it difficult to follow what's going on.  We could keep piling
>>> >> > new
>>> >> > layers on top, but that only makes it worse.  Furthermore, there's a
>>> >> > bunch of complexity that comes from defending against or supporting
>>> >> > broken or badly designed callers.
>>> >> >
>>> >> I believe I mentioned it a few days ago - there is no need to worry
>>> >> about API or ABI stability.
>>> >>
>>> >> Need new API - add it. Things getting fragile or too many layers - sed
>>> >> /libdrm_intel$(N)/libdrm_intel$(N+1)/ and rework as needed.
>>> >>
>>> >> I fear that Importing libdrm_intel will be detrimental to libva's
>>> >> intel-driver, Beignet and xf86-video-intel
>>
>>
>> I wouldn't worry about xf86-video-intel.  Chris has already copy+pasted half
>> of the X server, what's libdrm? :-)
>>
>> The others, yeah, they could possibly benefit from drm_intel3.  That said, I
>> think you significantly over-estimate how much a driver actually gets from
>> libdrm.  We chose to not use libdrm in Vulkan and it really hasn't caused us
>> all that much pain.
>>
>>>
>>> >> development.
>>> >> Those teams seem to be more resource constrained than Mesa, thus they
>>> >> will trail behind even more.
>>> >>
>>> >> As an example - the intel-driver is missing some trivial winsys
>>> >> optimisations that landed in Mesa 3+ years ago. That could have been
>>> >> avoided if the helpers were shared with the help of
>>> >> libdrm_intel/other.
>>
>>
>> libdrm should *never* touch winsys.  Please, no.
>>
>>>
>>> >
>>> > That is kinda the longer-term goal with this. There's a lot more that
>>> > needs to be done besides Ken's series here, this is just the first step,
>>> > but in the end we'll probably move brw_batch back into libdrm_intel2 or
>>> > so, for consumption by beignet and libva.
>>> >
>>> > But for rewriting the world and getting rid of 10+ years of compat
>>> > garbage, having a split between libdrm and mesa isn't great.
>>> >
>>> So the goal is to have the code in mesa as a form of incubator until
>>> it reaches maturity.
>>> This way one will have a more rapid development and greater
>>> flexibility during that stage.
>>
>>
>> Yes, I think we'd eventually like to have some shared code again.  However,
>> at the moment, that code sharing is costing us dearly and it's time for a
>> step back and a complete re-evaluation of how we do things.  Once we've
>> settled on something we like then maybe we can consider sharing again.
>> Ideally, I'd like the Vulkan driver to be able to share at least some bits
>> with i965.  At the moment, however, we don't know what the new API should
>> 

Re: [Mesa-dev] [PATCH] mesa: use single memcpy when strides matches

2017-04-10 Thread Brian Paul

On 04/09/2017 07:58 AM, Bartosz Tomczyk wrote:

---
  src/mesa/main/readpix.c  | 15 ++-
  src/mesa/main/texstore.c | 15 +++
  2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 25823230d6..14568de497 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -220,7 +220,7 @@ readpixels_memcpy(struct gl_context *ctx,
 struct gl_renderbuffer *rb =
   _mesa_get_read_renderbuffer_for_format(ctx, format);
 GLubyte *dst, *map;
-   int dstStride, stride, j, texelBytes;
+   int dstStride, stride, j, texelBytes, bytesPerRow;

 /* Fail if memcpy cannot be used. */
 if (!readpixels_can_use_memcpy(ctx, format, type, packing)) {
@@ -239,12 +239,17 @@ readpixels_memcpy(struct gl_context *ctx,
 }

 texelBytes = _mesa_get_format_bytes(rb->Format);
+   bytesPerRow = texelBytes * width;

 /* memcpy*/
-   for (j = 0; j < height; j++) {
-  memcpy(dst, map, width * texelBytes);
-  dst += dstStride;
-  map += stride;
+   if (dstStride == stride && dstStride == bytesPerRow) {
+ memcpy(dst, map, bytesPerRow * height);


Too much indentation there.

Looks OK otherwise.  I assume you tested with Piglit too.

Reviewed-by: Brian Paul 



+   } else {
+  for (j = 0; j < height; j++) {
+ memcpy(dst, map, bytesPerRow);
+ dst += dstStride;
+ map += stride;
+  }
 }

 ctx->Driver.UnmapRenderbuffer(ctx, rb);
diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index 615ba63362..3314e557c0 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -1360,10 +1360,17 @@ _mesa_store_compressed_texsubimage(struct gl_context 
*ctx, GLuint dims,
if (dstMap) {

   /* copy rows of blocks */
- for (i = 0; i < store.CopyRowsPerSlice; i++) {
-memcpy(dstMap, src, store.CopyBytesPerRow);
-dstMap += dstRowStride;
-src += store.TotalBytesPerRow;
+ if (dstRowStride == store.TotalBytesPerRow &&
+ dstRowStride == store.CopyBytesPerRow) {
+memcpy(dstMap, src, store.CopyBytesPerRow * 
store.CopyRowsPerSlice);
+src += store.CopyBytesPerRow * store.CopyRowsPerSlice;
+ }
+ else {
+for (i = 0; i < store.CopyRowsPerSlice; i++) {
+   memcpy(dstMap, src, store.CopyBytesPerRow);
+   dstMap += dstRowStride;
+   src += store.TotalBytesPerRow;
+}
   }

   ctx->Driver.UnmapTextureImage(ctx, texImage, slice + zoffset);
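
The fast path in the patch above can be sketched in isolation as follows. copy_rows is a hypothetical standalone helper, not a Mesa function. The single memcpy is only valid when both strides equal the number of bytes copied per row, so there is no padding between rows on either side:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy "height" rows of "bytes_per_row" bytes from src to dst. */
static void
copy_rows(uint8_t *dst, ptrdiff_t dst_stride,
          const uint8_t *src, ptrdiff_t src_stride,
          size_t bytes_per_row, size_t height)
{
    if (dst_stride == src_stride &&
        dst_stride == (ptrdiff_t)bytes_per_row) {
        /* Rows are contiguous in both buffers: one large copy. */
        memcpy(dst, src, bytes_per_row * height);
        return;
    }

    /* Padded rows: copy row by row, stepping by each buffer's stride. */
    for (size_t j = 0; j < height; j++) {
        memcpy(dst, src, bytes_per_row);
        dst += dst_stride;
        src += src_stride;
    }
}
```

Checking only that the two strides match would not be enough: if both buffers had identical padding, a single copy would also overwrite the destination's padding bytes.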





Re: [Mesa-dev] [PATCH V3 1/9] mesa: create _mesa_attach_renderbuffer_without_ref() helper

2017-04-10 Thread Brian Paul

On 04/07/2017 09:21 PM, Timothy Arceri wrote:

This will be used to take ownership of freashly created renderbuffers,
avoiding the need to call the reference function which requires
locking.

V2: dereference any existing fb attachments and actually attach the
 new rb.

v3: split out validation and attachment type/complete setting into
 a shared static function.
---
  src/mesa/main/renderbuffer.c | 43 +++
  src/mesa/main/renderbuffer.h |  5 +
  2 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/src/mesa/main/renderbuffer.c b/src/mesa/main/renderbuffer.c
index 4375b5b..627bdca 100644
--- a/src/mesa/main/renderbuffer.c
+++ b/src/mesa/main/renderbuffer.c
@@ -99,28 +99,24 @@ _mesa_new_renderbuffer(struct gl_context *ctx, GLuint name)
   * free the object in the end.
   */
  void
  _mesa_delete_renderbuffer(struct gl_context *ctx, struct gl_renderbuffer *rb)
  {
 mtx_destroy(&rb->Mutex);
 free(rb->Label);
 free(rb);
  }

-
-/**
- * Attach a renderbuffer to a framebuffer.
- * \param bufferName  one of the BUFFER_x tokens
- */
-void
-_mesa_add_renderbuffer(struct gl_framebuffer *fb,
-   gl_buffer_index bufferName, struct gl_renderbuffer *rb)
+static void
+validate_and_init_renderbuffer_attachment(struct gl_framebuffer *fb,
+  gl_buffer_index bufferName,
+  struct gl_renderbuffer *rb)
  {
 assert(fb);
 assert(rb);
 assert(bufferName < BUFFER_COUNT);

 /* There should be no previous renderbuffer on this attachment point,
  * with the exception of depth/stencil since the same renderbuffer may
  * be used for both.
  */
 assert(bufferName == BUFFER_DEPTH ||
@@ -130,20 +126,51 @@ _mesa_add_renderbuffer(struct gl_framebuffer *fb,
 /* winsys vs. user-created buffer cross check */
 if (_mesa_is_user_fbo(fb)) {
assert(rb->Name);
 }
 else {
assert(!rb->Name);
 }

 fb->Attachment[bufferName].Type = GL_RENDERBUFFER_EXT;
 fb->Attachment[bufferName].Complete = GL_TRUE;
+}
+
+
+/**
+ * Attach a renderbuffer to a framebuffer.
+ * \param bufferName  one of the BUFFER_x tokens
+ *
+ * This function avoids adding a reference and is therefore intended to be
+ * used with a freashly created renderbuffer.


"freshly"



+ */
+void
+_mesa_add_renderbuffer_without_ref(struct gl_framebuffer *fb,
+   gl_buffer_index bufferName,
+   struct gl_renderbuffer *rb)


I see you've already pushed this.  Still, I'd like to suggest a 
different name such as _mesa_own_renderbuffer() that stresses the 
transfer of ownership of the renderbuffer.




+{


If this function should only be used with a "freshly created" 
renderbuffer, can we assert that its RefCount is one here?
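
A rough sketch of what that could look like, using simplified stand-in types rather than the real gl_framebuffer and gl_renderbuffer structs. The names own_renderbuffer and unreference are hypothetical, and freeing a renderbuffer whose count drops to zero is left out:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the Mesa structures. */
struct renderbuffer {
    int ref_count;
};

struct framebuffer {
    struct renderbuffer *attachment[4];
};

/* Drop the reference held in *ptr, if any, and clear the pointer. */
static void
unreference(struct renderbuffer **ptr)
{
    if (*ptr) {
        (*ptr)->ref_count--;
        *ptr = NULL;
    }
}

/* Take over the caller's only reference to rb without touching the
 * reference count (and hence without locking). */
static void
own_renderbuffer(struct framebuffer *fb, unsigned slot,
                 struct renderbuffer *rb)
{
    /* Only valid for a freshly created renderbuffer. */
    assert(rb->ref_count == 1);

    unreference(&fb->attachment[slot]);
    fb->attachment[slot] = rb;
}
```

The assertion documents the ownership contract: after the call, the framebuffer holds the single reference and the caller must not unreference rb itself.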


-Brian


+   validate_and_init_renderbuffer_attachment(fb, bufferName, rb);
+
+   _mesa_reference_renderbuffer(&fb->Attachment[bufferName].Renderbuffer,
+NULL);
+   fb->Attachment[bufferName].Renderbuffer = rb;
+}
+
+/**
+ * Attach a renderbuffer to a framebuffer.
+ * \param bufferName  one of the BUFFER_x tokens
+ */
+void
+_mesa_add_renderbuffer(struct gl_framebuffer *fb,
+   gl_buffer_index bufferName, struct gl_renderbuffer *rb)
+{
+   validate_and_init_renderbuffer_attachment(fb, bufferName, rb);
 _mesa_reference_renderbuffer(&fb->Attachment[bufferName].Renderbuffer, rb);
  }


  /**
   * Remove the named renderbuffer from the given framebuffer.
   * \param bufferName  one of the BUFFER_x tokens
   */
  void
  _mesa_remove_renderbuffer(struct gl_framebuffer *fb,
diff --git a/src/mesa/main/renderbuffer.h b/src/mesa/main/renderbuffer.h
index aa83120..a6f1439 100644
--- a/src/mesa/main/renderbuffer.h
+++ b/src/mesa/main/renderbuffer.h
@@ -40,20 +40,25 @@ struct gl_renderbuffer;
  extern void
  _mesa_init_renderbuffer(struct gl_renderbuffer *rb, GLuint name);

  extern struct gl_renderbuffer *
  _mesa_new_renderbuffer(struct gl_context *ctx, GLuint name);

  extern void
  _mesa_delete_renderbuffer(struct gl_context *ctx, struct gl_renderbuffer *rb);

  extern void
+_mesa_add_renderbuffer_without_ref(struct gl_framebuffer *fb,
+   gl_buffer_index bufferName,
+   struct gl_renderbuffer *rb);
+
+extern void
  _mesa_add_renderbuffer(struct gl_framebuffer *fb,
 gl_buffer_index bufferName, struct gl_renderbuffer 
*rb);

  extern void
  _mesa_remove_renderbuffer(struct gl_framebuffer *fb,
gl_buffer_index bufferName);

  extern void
  _mesa_reference_renderbuffer_(struct gl_renderbuffer **ptr,
struct gl_renderbuffer *rb);





[Mesa-dev] [PATCH 07/12] swr: [rasterizer common/core] Fix 32-bit windows build

2017-04-10 Thread Tim Rowley
---
 .../drivers/swr/rasterizer/common/simd16intrin.h   | 198 +++--
 src/gallium/drivers/swr/rasterizer/core/clip.h |   6 +-
 src/gallium/drivers/swr/rasterizer/core/context.h  |   2 +-
 .../swr/rasterizer/core/format_conversion.h|   8 +-
 .../drivers/swr/rasterizer/core/format_types.h |  22 +--
 src/gallium/drivers/swr/rasterizer/core/frontend.h |   4 +-
 6 files changed, 123 insertions(+), 117 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h 
b/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h
index fee50d0..aa47574 100644
--- a/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h
+++ b/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h
@@ -60,6 +60,12 @@ typedef __mmask16 simd16mask;
 #define _simd16_maskhi(mask) (((mask) >> 8) & 0xFF)
 #define _simd16_setmask(hi, lo) (((hi) << 8) | (lo))
 
+#if defined(_WIN32)
+#define SIMDAPI __vectorcall
+#else
+#define SIMDAPI
+#endif
+
 OSALIGN(union, KNOB_SIMD16_BYTES) simd16vector
 {
 simd16scalar  v[4];
@@ -75,7 +81,7 @@ OSALIGN(union, KNOB_SIMD16_BYTES) simd16vector
 #if ENABLE_AVX512_EMULATION
 
 #define SIMD16_EMU_AVX512_0(type, func, intrin) \
-INLINE type func()\
+INLINE type SIMDAPI func()\
 {\
 type result;\
 \
@@ -86,7 +92,7 @@ INLINE type func()\
 }
 
 #define SIMD16_EMU_AVX512_1(type, func, intrin) \
-INLINE type func(type a)\
+INLINE type SIMDAPI func(type a)\
 {\
 type result;\
 \
@@ -97,7 +103,7 @@ INLINE type func(type a)\
 }
 
 #define SIMD16_EMU_AVX512_2(type, func, intrin) \
-INLINE type func(type a, type b)\
+INLINE type SIMDAPI func(type a, type b)\
 {\
 type result;\
 \
@@ -108,7 +114,7 @@ INLINE type func(type a, type b)\
 }
 
 #define SIMD16_EMU_AVX512_3(type, func, intrin) \
-INLINE type func(type a, type b, type c)\
+INLINE type SIMDAPI func(type a, type b, type c)\
 {\
 type result;\
 \
@@ -121,7 +127,7 @@ INLINE type func(type a, type b, type c)\
 SIMD16_EMU_AVX512_0(simd16scalar, _simd16_setzero_ps, _mm256_setzero_ps)
 SIMD16_EMU_AVX512_0(simd16scalari, _simd16_setzero_si, _mm256_setzero_si256)
 
-INLINE simd16scalar _simd16_set1_ps(float a)
+INLINE simd16scalar SIMDAPI _simd16_set1_ps(float a)
 {
 simd16scalar result;
 
@@ -131,7 +137,7 @@ INLINE simd16scalar _simd16_set1_ps(float a)
 return result;
 }
 
-INLINE simd16scalari _simd16_set1_epi8(char a)
+INLINE simd16scalari SIMDAPI _simd16_set1_epi8(char a)
 {
 simd16scalari result;
 
@@ -141,7 +147,7 @@ INLINE simd16scalari _simd16_set1_epi8(char a)
 return result;
 }
 
-INLINE simd16scalari _simd16_set1_epi32(int a)
+INLINE simd16scalari SIMDAPI _simd16_set1_epi32(int a)
 {
 simd16scalari result;
 
@@ -151,7 +157,7 @@ INLINE simd16scalari _simd16_set1_epi32(int a)
 return result;
 }
 
-INLINE simd16scalar _simd16_set_ps(float e15, float e14, float e13, float e12, 
float e11, float e10, float e9, float e8, float e7, float e6, float e5, float 
e4, float e3, float e2, float e1, float e0)
+INLINE simd16scalar SIMDAPI _simd16_set_ps(float e15, float e14, float e13, 
float e12, float e11, float e10, float e9, float e8, float e7, float e6, float 
e5, float e4, float e3, float e2, float e1, float e0)
 {
 simd16scalar result;
 
@@ -161,7 +167,7 @@ INLINE simd16scalar _simd16_set_ps(float e15, float e14, 
float e13, float e12, f
 return result;
 }
 
-INLINE simd16scalari _simd16_set_epi32(int e15, int e14, int e13, int e12, int 
e11, int e10, int e9, int e8, int e7, int e6, int e5, int e4, int e3, int e2, 
int e1, int e0)
+INLINE simd16scalari SIMDAPI _simd16_set_epi32(int e15, int e14, int e13, int 
e12, int e11, int e10, int e9, int e8, int e7, int e6, int e5, int e4, int e3, 
int e2, int e1, int e0)
 {
 simd16scalari result;
 
@@ -171,7 +177,7 @@ INLINE simd16scalari _simd16_set_epi32(int e15, int e14, 
int e13, int e12, int e
 return result;
 }
 
-INLINE simd16scalar _simd16_set_ps(float e7, float e6, float e5, float e4, 
float e3, float e2, float e1, float e0)
+INLINE simd16scalar SIMDAPI _simd16_set_ps(float e7, float e6, float e5, float 
e4, float e3, float e2, float e1, float e0)
 {
 simd16scalar result;
 
@@ -181,7 +187,7 @@ INLINE simd16scalar _simd16_set_ps(float e7, float e6, 
float e5, float e4, float
 return result;
 }
 
-INLINE simd16scalari _simd16_set_epi32(int e7, int e6, int e5, int e4, int e3, 
int e2, int e1, int e0)
+INLINE simd16scalari SIMDAPI _simd16_set_epi32(int e7, int e6, int e5, int e4, 
int e3, int e2, int e1, int e0)
 {
 simd16scalari result;
 
@@ -191,7 +197,7 @@ INLINE simd16scalari _simd16_set_epi32(int e7, int e6, int 
e5, int e4, int e3, i
 return result;
 }
 
-INLINE simd16scalar _simd16_load_ps(float const *m)
+INLINE simd16scalar SIMDAPI _simd16_load_ps(float const *m)
 {
 simd16scalar result;
 
@@ -203,7 +209,7 @@ INLINE simd16scalar _simd16_load_ps(float const *m)
 return result;
 }
 
-INLINE simd16scalar _simd16_loadu_ps(float const *m)
+INLINE simd16scalar SIMDAPI 

[Mesa-dev] [PATCH 08/12] swr: [rasterizer jitter] Remove HAVE_LLVM tests supporting llvm < 3.8

2017-04-10 Thread Tim Rowley
---
 .../drivers/swr/rasterizer/jitter/JitManager.cpp   | 10 ---
 .../drivers/swr/rasterizer/jitter/JitManager.h |  6 -
 .../drivers/swr/rasterizer/jitter/builder_misc.cpp | 31 --
 .../drivers/swr/rasterizer/jitter/builder_misc.h   |  5 
 4 files changed, 52 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp 
b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
index bdb8a52..8d1d259 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
@@ -206,12 +206,7 @@ bool JitManager::SetupModuleFromIR(const uint8_t *pIR, size_t length)
 return false;
 }
 
-#if HAVE_LLVM == 0x307
-// llvm-3.7 has mismatched setDataLyout/getDataLayout APIs
-newModule->setDataLayout(*mpExec->getDataLayout());
-#else
 newModule->setDataLayout(mpExec->getDataLayout());
-#endif
 
 mpCurrentModule = newModule.get();
 #if defined(_WIN32)
@@ -256,12 +251,7 @@ void JitManager::DumpAsm(Function* pFunction, const char* fileName)
 sprintf(fName, "%s.%s.asm", funcName, fileName);
 #endif
 
-#if HAVE_LLVM == 0x306
-raw_fd_ostream fd(fName, EC, llvm::sys::fs::F_None);
-formatted_raw_ostream filestream(fd);
-#else
 raw_fd_ostream filestream(fName, EC, llvm::sys::fs::F_None);
-#endif
 
 legacy::PassManager* pMPasses = new legacy::PassManager();
 auto* pTarget = mpExec->getTargetMachine();
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
index 170bdde..d97ae87 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
+++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
@@ -61,15 +61,9 @@
 
 #include "llvm/Analysis/Passes.h"
 
-#if HAVE_LLVM == 0x306
-#include "llvm/PassManager.h"
-using FunctionPassManager = llvm::FunctionPassManager;
-using PassManager = llvm::PassManager;
-#else
 #include "llvm/IR/LegacyPassManager.h"
 using FunctionPassManager = llvm::legacy::FunctionPassManager;
 using PassManager = llvm::legacy::PassManager;
-#endif
 
 #include "llvm/CodeGen/Passes.h"
 #include "llvm/ExecutionEngine/ExecutionEngine.h"
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
index c28d2ed..09b69c7 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
@@ -236,13 +236,6 @@ namespace SwrJit
 return UndefValue::get(VectorType::get(t, mVWidth));
 }
 
-#if HAVE_LLVM == 0x306
-Value *Builder::VINSERT(Value *vec, Value *val, uint64_t index)
-{
-return VINSERT(vec, val, C((int64_t)index));
-}
-#endif
-
 Value *Builder::VBROADCAST(Value *src)
 {
 // check if src is already a vector
@@ -324,7 +317,6 @@ namespace SwrJit
 return CALLA(Callee, args);
 }
 
-#if HAVE_LLVM > 0x306
 CallInst *Builder::CALL(Value *Callee, Value* arg)
 {
 std::vector args;
@@ -348,7 +340,6 @@ namespace SwrJit
 args.push_back(arg3);
 return CALLA(Callee, args);
 }
-#endif
 
 //
 Value *Builder::DEBUGTRAP()
@@ -504,11 +495,7 @@ namespace SwrJit
 
 // get a pointer to the first character in the constant string array
 std::vector geplist{C(0),C(0)};
-#if HAVE_LLVM == 0x306
-Constant *strGEP = ConstantExpr::getGetElementPtr(gvPtr,geplist,false);
-#else
 Constant *strGEP = ConstantExpr::getGetElementPtr(nullptr, gvPtr,geplist,false);
-#endif
 
 // insert the pointer to the format string in the argument vector
 printCallArgs[0] = strGEP;
@@ -1536,11 +1523,7 @@ namespace SwrJit
 Value* Builder::STACKSAVE()
 {
 Function* pfnStackSave = Intrinsic::getDeclaration(JM()->mpCurrentModule, Intrinsic::stacksave);
-#if HAVE_LLVM == 0x306
-return CALL(pfnStackSave);
-#else
 return CALLA(pfnStackSave);
-#endif
 }
 
 void Builder::STACKRESTORE(Value* pSaved)
@@ -1594,29 +1577,16 @@ namespace SwrJit
 
 Value *Builder::VEXTRACTI128(Value* a, Constant* imm8)
 {
-#if HAVE_LLVM == 0x306
-Function *func =
-Intrinsic::getDeclaration(JM()->mpCurrentModule,
-  Intrinsic::x86_avx_vextractf128_si_256);
-return CALL(func, {a, imm8});
-#else
 bool flag = !imm8->isZeroValue();
 SmallVector idx;
 for (unsigned i = 0; i < mVWidth / 2; i++) {
 idx.push_back(C(flag ? i + mVWidth / 2 : i));
 }
 return VSHUFFLE(a, VUNDEF_I(), ConstantVector::get(idx));
-#endif
 }
 
 Value *Builder::VINSERTI128(Value* a, Value* b, Constant* imm8)
 {
-#if HAVE_LLVM == 0x306
-

[Mesa-dev] [PATCH 10/12] swr: [rasterizer archrast] Fix archrast for MSVC 2017 compiler

2017-04-10 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/archrast/archrast.cpp  | 2 +-
 src/gallium/drivers/swr/rasterizer/archrast/archrast.h| 2 +-
 src/gallium/drivers/swr/rasterizer/archrast/eventmanager.h| 2 +-
 src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_event.cpp | 2 +-
 src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_event.hpp | 4 ++--
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/archrast/archrast.cpp b/src/gallium/drivers/swr/rasterizer/archrast/archrast.cpp
index a7d41e2..cda1612 100644
--- a/src/gallium/drivers/swr/rasterizer/archrast/archrast.cpp
+++ b/src/gallium/drivers/swr/rasterizer/archrast/archrast.cpp
@@ -298,7 +298,7 @@ namespace ArchRast
 }
 
 // Dispatch event for this thread.
-void Dispatch(HANDLE hThreadContext, Event& event)
+void Dispatch(HANDLE hThreadContext, const Event& event)
 {
 EventManager* pManager = FromHandle(hThreadContext);
 SWR_ASSERT(pManager != nullptr);
diff --git a/src/gallium/drivers/swr/rasterizer/archrast/archrast.h b/src/gallium/drivers/swr/rasterizer/archrast/archrast.h
index 1b81e6e..fa88a49 100644
--- a/src/gallium/drivers/swr/rasterizer/archrast/archrast.h
+++ b/src/gallium/drivers/swr/rasterizer/archrast/archrast.h
@@ -42,7 +42,7 @@ namespace ArchRast
 void DestroyThreadContext(HANDLE hThreadContext);
 
 // Dispatch event for this thread.
-void Dispatch(HANDLE hThreadContext, Event& event);
+void Dispatch(HANDLE hThreadContext, const Event& event);
 void FlushDraw(HANDLE hThreadContext, uint32_t drawId);
 };
 
diff --git a/src/gallium/drivers/swr/rasterizer/archrast/eventmanager.h b/src/gallium/drivers/swr/rasterizer/archrast/eventmanager.h
index 44f75e4..c251daf 100644
--- a/src/gallium/drivers/swr/rasterizer/archrast/eventmanager.h
+++ b/src/gallium/drivers/swr/rasterizer/archrast/eventmanager.h
@@ -60,7 +60,7 @@ namespace ArchRast
 mHandlers.push_back(pHandler);
 }
 
-void Dispatch(Event& event)
+void Dispatch(const Event& event)
 {
 ///@todo Add event filter check here.
 
diff --git a/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_event.cpp b/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_event.cpp
index d48fda6..1ecb455 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_event.cpp
+++ b/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_event.cpp
@@ -37,7 +37,7 @@
 using namespace ArchRast;
 % for name in protos['event_names']:
 
-void ${name}::Accept(EventHandler* pHandler)
+void ${name}::Accept(EventHandler* pHandler) const
 {
 pHandler->Handle(*this);
 }
diff --git a/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_event.hpp b/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_event.hpp
index e792f5f..685a10b 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_event.hpp
+++ b/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_event.hpp
@@ -57,7 +57,7 @@ namespace ArchRast
 Event() {}
 virtual ~Event() {}
 
-virtual void Accept(EventHandler* pHandler) = 0;
+virtual void Accept(EventHandler* pHandler) const = 0;
 };
 % for name in protos['event_names']:
 
@@ -102,7 +102,7 @@ namespace ArchRast
 % endfor
 }
 
-virtual void Accept(EventHandler* pHandler);
+virtual void Accept(EventHandler* pHandler) const;
 };
 % endfor
 }
\ No newline at end of file
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
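
Note on the const change above: the patch does not say why MSVC 2017 needed it. One plausible reason, offered here as an assumption, is that events are built as temporaries at the call site (for example FrameEndEvent(...) passed to AR_API_EVENT elsewhere in this series). Standard C++ does not let a temporary bind to a non-const lvalue reference, and older MSVC versions accepted that only as an extension. The sketch below is a hypothetical miniature of the Event/EventHandler pair, not the SWR code; all names in it are illustrative.

```cpp
#include <cassert>
#include <cstdint>

struct FrameEndEvent;

// Hypothetical handler: records the last frame id it was shown.
struct EventHandler {
    uint32_t lastFrame = 0;
    uint32_t handled = 0;
    void Handle(const FrameEndEvent& e);
};

struct Event {
    virtual ~Event() {}
    // const-qualified so it can be called through a const Event&.
    virtual void Accept(EventHandler* pHandler) const = 0;
};

struct FrameEndEvent : Event {
    uint32_t frameId;
    explicit FrameEndEvent(uint32_t id) : frameId(id) {}
    void Accept(EventHandler* pHandler) const override { pHandler->Handle(*this); }
};

void EventHandler::Handle(const FrameEndEvent& e) {
    lastFrame = e.frameId;
    ++handled;
}

// Taking const Event& lets callers pass a temporary object.
void Dispatch(EventHandler* pHandler, const Event& event) {
    event.Accept(pHandler);
}

// Dispatches a temporary event and returns the frame id the handler saw.
uint32_t DispatchTemporary(uint32_t frameId) {
    EventHandler handler;
    Dispatch(&handler, FrameEndEvent(frameId)); // temporary binds to const&
    assert(handler.handled == 1);
    return handler.lastFrame;
}
```

With Dispatch declared as taking Event& instead, the call in DispatchTemporary would be rejected by a conforming compiler, which is why Accept has to become const as well.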


[Mesa-dev] [PATCH 12/12] swr: [rasterizer core] Disable 8x2 tile backend

2017-04-10 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/core/knobs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/knobs.h b/src/gallium/drivers/swr/rasterizer/core/knobs.h
index e347558..7928f5d 100644
--- a/src/gallium/drivers/swr/rasterizer/core/knobs.h
+++ b/src/gallium/drivers/swr/rasterizer/core/knobs.h
@@ -39,7 +39,7 @@
 ///
 
 #define ENABLE_AVX512_SIMD16    1
-#define USE_8x2_TILE_BACKEND    1
+#define USE_8x2_TILE_BACKEND    0
 #define USE_SIMD16_FRONTEND 0
 
 ///
-- 
2.7.4



[Mesa-dev] [PATCH 11/12] swr: [rasterizer common] Add _simd_testz_si alias

2017-04-10 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/common/simdintrin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/swr/rasterizer/common/simdintrin.h b/src/gallium/drivers/swr/rasterizer/common/simdintrin.h
index 1e3f14c..61c0c54 100644
--- a/src/gallium/drivers/swr/rasterizer/common/simdintrin.h
+++ b/src/gallium/drivers/swr/rasterizer/common/simdintrin.h
@@ -618,6 +618,7 @@ __m256i _simd_packs_epi32(__m256i a, __m256i b)
 #define _simd_loadu_si _mm256_loadu_si256
 #define _simd_sub_ps _mm256_sub_ps
 #define _simd_testz_ps _mm256_testz_ps
+#define _simd_testz_si _mm256_testz_si256
 #define _simd_xor_ps _mm256_xor_ps
 
 INLINE
-- 
2.7.4
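
Note on the alias above: _mm256_testz_si256 returns 1 when the bitwise AND of its two operands is all zero, and 0 otherwise. The sketch below is a portable scalar model of that result, written so it builds without AVX support. Bits256, testz_si256_model and is_all_zero are hypothetical names used only for illustration; they are not part of simdintrin.h.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable stand-in for a 256-bit integer register: four 64-bit lanes.
using Bits256 = std::array<uint64_t, 4>;

// Scalar model of the value returned by _mm256_testz_si256:
// 1 when (a & b) has no bit set, 0 otherwise.
int testz_si256_model(const Bits256& a, const Bits256& b) {
    uint64_t any = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        any |= (a[i] & b[i]);
    }
    return any == 0 ? 1 : 0;
}

// Passing the same value twice asks "is this register entirely zero?".
bool is_all_zero(const Bits256& v) {
    return testz_si256_model(v, v) == 1;
}
```

A common use of the real intrinsic is the same-operand form shown in is_all_zero, for example to check whether an integer mask has any lane left active.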



[Mesa-dev] [PATCH 09/12] swr: [rasterizer jitter] Remove unused function

2017-04-10 Thread Tim Rowley
---
 .../drivers/swr/rasterizer/jitter/JitManager.cpp   | 34 --
 .../drivers/swr/rasterizer/jitter/JitManager.h |  1 -
 2 files changed, 35 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
index 8d1d259..5d8ad27 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
@@ -187,40 +187,6 @@ void JitManager::SetupNewModule()
 mIsModuleFinalized = false;
 }
 
-//
-/// @brief Create new LLVM module from IR.
-bool JitManager::SetupModuleFromIR(const uint8_t *pIR, size_t length)
-{
-std::unique_ptr<MemoryBuffer> pMem = MemoryBuffer::getMemBuffer(StringRef((const char*)pIR, length), "");
-
-SMDiagnostic Err;
-std::unique_ptr<Module> newModule = parseIR(pMem.get()->getMemBufferRef(), Err, mContext);
-
-
-SWR_REL_ASSERT(
-!(newModule == nullptr),
-"Parse failed!\n"
-"%s", Err.getMessage().data());
-if (newModule == nullptr)
-{
-return false;
-}
-
-newModule->setDataLayout(mpExec->getDataLayout());
-
-mpCurrentModule = newModule.get();
-#if defined(_WIN32)
-// Needed for MCJIT on windows
-Triple hostTriple(sys::getProcessTriple());
-hostTriple.setObjectFormat(Triple::ELF);
-newModule->setTargetTriple(hostTriple.getTriple());
-#endif // _WIN32
-
-mpExec->addModule(std::move(newModule));
-mIsModuleFinalized = false;
-
-return true;
-}
 
 //
 /// @brief Dump function x86 assembly to file.
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
index d97ae87..97d9312 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
+++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
@@ -172,7 +172,6 @@ struct JitManager
 std::string mCore;
 
 void SetupNewModule();
-bool SetupModuleFromIR(const uint8_t *pIR, size_t length);
 
 void DumpAsm(llvm::Function* pFunction, const char* fileName);
 static void DumpToFile(llvm::Function *f, const char *fileName);
-- 
2.7.4



[Mesa-dev] [PATCH 06/12] swr: [rasterizer core] Fix unused variable warnings

2017-04-10 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/core/api.cpp | 2 +-
 src/gallium/drivers/swr/rasterizer/core/backend.cpp | 1 -
 src/gallium/drivers/swr/rasterizer/core/binner.cpp  | 8 
 3 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp b/src/gallium/drivers/swr/rasterizer/core/api.cpp
index 1710cc6..5c3225d 100644
--- a/src/gallium/drivers/swr/rasterizer/core/api.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp
@@ -788,7 +788,6 @@ extern PFN_BACKEND_FUNC gBackendSingleSample[SWR_INPUT_COVERAGE_COUNT][2][2];
 extern PFN_BACKEND_FUNC gBackendSampleRateTable[SWR_MULTISAMPLE_TYPE_COUNT][SWR_INPUT_COVERAGE_COUNT][2][2];
 void SetupPipeline(DRAW_CONTEXT *pDC)
 {
-SWR_CONTEXT* pContext = pDC->pContext;
 DRAW_STATE* pState = pDC->pState;
 const SWR_RASTSTATE &rastState = pState->state.rastState;
 const SWR_PS_STATE &psState = pState->state.psState;
@@ -1630,6 +1629,7 @@ void SWR_API SwrEndFrame(
 {
 SWR_CONTEXT *pContext = GetContext(hContext);
 DRAW_CONTEXT* pDC = GetDrawContext(pContext);
+(void)pDC; // var used
 
 RDTSC_ENDFRAME();
 AR_API_EVENT(FrameEndEvent(pContext->frameCount, pDC->drawId));
diff --git a/src/gallium/drivers/swr/rasterizer/core/backend.cpp b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
index e3ed524..39f4802 100644
--- a/src/gallium/drivers/swr/rasterizer/core/backend.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
@@ -872,7 +872,6 @@ void BackendNullPS(DRAW_CONTEXT *pDC, uint32_t workerId, uint32_t x, uint32_t y,
 
 AR_BEGIN(BENullBackend, pDC->drawId);
 ///@todo: handle center multisample pattern
-typedef SwrBackendTraits T;
 AR_BEGIN(BESetup, pDC->drawId);
 
 const API_STATE &state = GetApiState(pDC);
diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 239c497..9d36f21 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -2209,7 +2209,6 @@ void BinPostSetupLines(
 
 const API_STATE& state = GetApiState(pDC);
 const SWR_RASTSTATE& rastState = state.rastState;
-const SWR_FRONTEND_STATE& feState = state.frontendState;
 const SWR_GS_STATE& gsState = state.gsState;
 
 // Select attribute processor
@@ -2640,16 +2639,9 @@ void BinLines(
 simdscalari primID,
 simdscalari viewportIdx)
 {
-SWR_CONTEXT *pContext = pDC->pContext;
-
 const API_STATE& state = GetApiState(pDC);
 const SWR_RASTSTATE& rastState = state.rastState;
 const SWR_FRONTEND_STATE& feState = state.frontendState;
-const SWR_GS_STATE& gsState = state.gsState;
-
-// Select attribute processor
-PFN_PROCESS_ATTRIBUTES pfnProcessAttribs = GetProcessAttributesFunc(2,
-state.backendState.swizzleEnable, state.backendState.constantInterpolationMask);
 
 simdscalar vRecipW[2] = { _simd_set1_ps(1.0f), _simd_set1_ps(1.0f) };
 
-- 
2.7.4
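
Note on the "(void)pDC;" line above: the definition of AR_API_EVENT is not shown in this patch, so the following rests on the assumption that it can expand to nothing in some build configurations. In that case a variable referenced only inside the macro argument becomes unused and draws a warning, and a cast to void is the usual way to mark it as intentionally kept. TRACE_EVENT, DrawContext and EndFrame below are hypothetical stand-ins, not SWR names.

```cpp
#include <cassert>

// Hypothetical instrumentation macro that is compiled out:
// its argument is discarded by the preprocessor.
#define TRACE_EVENT(expr) ((void)0)

struct DrawContext {
    int drawId;
};

int EndFrame(const DrawContext* ctx) {
    assert(ctx != nullptr);
    const DrawContext* pDC = ctx;
    (void)pDC; // keeps pDC "used" even though TRACE_EVENT drops its argument
    TRACE_EVENT(pDC->drawId);
    return ctx->drawId;
}
```

Without the cast, compiling this with -Wall on GCC or Clang reports pDC as an unused variable, because the only other reference disappears during preprocessing.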



[Mesa-dev] [PATCH 02/12] swr: [rasterizer core] Multisample sample position setup change

2017-04-10 Thread Tim Rowley
---
 .../drivers/swr/rasterizer/codegen/gen_backends.py | 25 --
 .../drivers/swr/rasterizer/core/multisample.cpp| 44 +-
 .../drivers/swr/rasterizer/core/multisample.h  | 98 --
 3 files changed, 92 insertions(+), 75 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py 
b/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py
index 242ab7a..d9e938a 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py
+++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py
@@ -38,14 +38,24 @@ def main(args=sys.argv[1:]):
 parser.add_argument('--cpp', help="Generate cpp file(s)", 
action='store_true', default=False)
 parser.add_argument('--cmake', help="Generate cmake file", 
action='store_true', default=False)
 
-
 args = parser.parse_args(args);
 
+class backendStrs :
+def __init__(self) :
+self.outFileName = 'gen_BackendPixelRate%s.cpp'
+self.functionTableName = 'gBackendPixelRateTable'
+self.funcInstanceHeader = ' = BackendPixelRate>;'
+tempStr += backend.funcInstanceHeader + ','.join(map(str, 
output_combinations[x])) + '>>;'
 #append the line of c++ code in the list of output lines
 output_list.append(tempStr)
 
@@ -72,8 +82,8 @@ def main(args=sys.argv[1:]):
 
 # generate .cpp files
 if args.cpp:
-baseCppName = os.path.join(args.outdir, 'gen_BackendPixelRate%s.cpp')
-templateCpp = os.path.join(thisDir, 'templates', 'gen_backend.cpp')
+baseCppName = os.path.join(args.outdir, backend.outFileName)
+templateCpp = os.path.join(thisDir, 'templates', backend.template)
 
 for fileNum in range(numFiles):
 filename = baseCppName % str(fileNum)
@@ -88,12 +98,13 @@ def main(args=sys.argv[1:]):
 # generate gen_backend.cmake file
 if args.cmake:
 templateCmake = os.path.join(thisDir, 'templates', 'gen_backend.cmake')
-cmakeFile = os.path.join(args.outdir, 'gen_backends.cmake')
+cmakeFile = os.path.join(args.outdir, backend.cmakeFileName)
 #print('Generating', cmakeFile)
 MakoTemplateWriter.to_file(
 templateCmake,
 cmakeFile,
 cmdline=sys.argv,
+srcVar=backend.cmakeSrcVar,
 numFiles=numFiles,
 baseCppName='${RASTY_GEN_SRC_DIR}/backends/' + 
os.path.basename(baseCppName))
 
diff --git a/src/gallium/drivers/swr/rasterizer/core/multisample.cpp 
b/src/gallium/drivers/swr/rasterizer/core/multisample.cpp
index 88a0ef7..8b20f7a 100644
--- a/src/gallium/drivers/swr/rasterizer/core/multisample.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/multisample.cpp
@@ -25,28 +25,24 @@
 **/
 
 #include "multisample.h"
-const uint32_t MultisampleTraits::samplePosXi {0x80};
-const uint32_t MultisampleTraits::samplePosYi {0x80};
-const uint32_t MultisampleTraits::samplePosXi[2] {0xC0, 
0x40};
-const uint32_t MultisampleTraits::samplePosYi[2] {0xC0, 
0x40};
-const uint32_t MultisampleTraits::samplePosXi[4] {0x60, 
0xE0, 0x20, 0xA0};
-const uint32_t MultisampleTraits::samplePosYi[4] {0x20, 
0x60, 0xA0, 0xE0};
-const uint32_t MultisampleTraits::samplePosXi[8] {0x90, 
0x70, 0xD0, 0x50, 0x30, 0x10, 0xB0, 0xF0};
-const uint32_t MultisampleTraits::samplePosYi[8] {0x50, 
0xB0, 0x90, 0x30, 0xD0, 0x70, 0xF0, 0x10};
-const uint32_t MultisampleTraits::samplePosXi[16] 
-{0x90, 0x70, 0x50, 0xC0, 0x30, 0xA0, 0xD0, 0xB0, 0x60, 0x80, 0x40, 0x20, 0x00, 
0xF0, 0xE0, 0x10};
-const uint32_t MultisampleTraits::samplePosYi[16]
-{0x90, 0x50, 0xA0, 0x70, 0x60, 0xD0, 0xB0, 0x30, 0xE0, 0x10, 0x20, 0xC0, 0x80, 
0x40, 0xF0, 

[Mesa-dev] [PATCH 01/12] swr: [rasterizer core] Reduce templates to speed compile

2017-04-10 Thread Tim Rowley
Quick patch to remove some unused template params to cut down
rasterizer compile time.
---
 src/gallium/drivers/swr/rasterizer/core/binner.cpp |  8 +--
 .../drivers/swr/rasterizer/core/rasterizer.cpp |  6 +-
 .../drivers/swr/rasterizer/core/rasterizer.h   | 67 +-
 3 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp 
b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 9ec5bea..eb1f20b 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -795,7 +795,7 @@ void BinTriangles(
 {
 // degenerate triangles won't be sent to rasterizer; just enable all 
edges
 pfnWork = GetRasterizerFunc(rastState.sampleCount, 
rastState.bIsCenterPattern, (rastState.conservativeRast > 0), 
-(SWR_INPUT_COVERAGE)pDC->pState->state.psState.inputCoverage, 
ALL_EDGES_VALID, (state.scissorsTileAligned == false));
+(SWR_INPUT_COVERAGE)pDC->pState->state.psState.inputCoverage, 
EdgeValToEdgeState(ALL_EDGES_VALID), (state.scissorsTileAligned == false));
 }
 
 if (!triMask)
@@ -941,7 +941,7 @@ endBinTriangles:
 // only rasterize valid edges if we have a degenerate primitive
 int32_t triEdgeEnable = (edgeEnable >> (triIndex * 3)) & 
ALL_EDGES_VALID;
 work.pfnWork = GetRasterizerFunc(rastState.sampleCount, 
rastState.bIsCenterPattern, (rastState.conservativeRast > 0), 
-(SWR_INPUT_COVERAGE)pDC->pState->state.psState.inputCoverage, 
triEdgeEnable, (state.scissorsTileAligned == false));
+(SWR_INPUT_COVERAGE)pDC->pState->state.psState.inputCoverage, 
EdgeValToEdgeState(triEdgeEnable), (state.scissorsTileAligned == false));
 
 // Degenerate triangles are required to be constant interpolated
 isDegenerate = (triEdgeEnable != ALL_EDGES_VALID) ? true : false;
@@ -1236,7 +1236,7 @@ void BinTriangles_simd16(
 {
 // degenerate triangles won't be sent to rasterizer; just enable all 
edges
 pfnWork = GetRasterizerFunc(rastState.sampleCount, 
rastState.bIsCenterPattern, (rastState.conservativeRast > 0),
-(SWR_INPUT_COVERAGE)pDC->pState->state.psState.inputCoverage, 
ALL_EDGES_VALID, (state.scissorsTileAligned == false));
+(SWR_INPUT_COVERAGE)pDC->pState->state.psState.inputCoverage, 
EdgeValToEdgeState(ALL_EDGES_VALID), (state.scissorsTileAligned == false));
 }
 
 if (!triMask)
@@ -1396,7 +1396,7 @@ endBinTriangles:
 // only rasterize valid edges if we have a degenerate primitive
 int32_t triEdgeEnable = (edgeEnable >> (triIndex * 3)) & 
ALL_EDGES_VALID;
 work.pfnWork = GetRasterizerFunc(rastState.sampleCount, 
rastState.bIsCenterPattern, (rastState.conservativeRast > 0),
-(SWR_INPUT_COVERAGE)pDC->pState->state.psState.inputCoverage, 
triEdgeEnable, (state.scissorsTileAligned == false));
+(SWR_INPUT_COVERAGE)pDC->pState->state.psState.inputCoverage, 
EdgeValToEdgeState(triEdgeEnable), (state.scissorsTileAligned == false));
 
 // Degenerate triangles are required to be constant interpolated
 isDegenerate = (triEdgeEnable != ALL_EDGES_VALID) ? true : false;
diff --git a/src/gallium/drivers/swr/rasterizer/core/rasterizer.cpp 
b/src/gallium/drivers/swr/rasterizer/core/rasterizer.cpp
index 0837841..af54779 100644
--- a/src/gallium/drivers/swr/rasterizer/core/rasterizer.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/rasterizer.cpp
@@ -1343,7 +1343,7 @@ void RasterizeTriPoint(DRAW_CONTEXT *pDC, uint32_t 
workerId, uint32_t macroTile,
 PFN_WORK_FUNC pfnTriRast;
 // conservative rast not supported for points/lines
 pfnTriRast = GetRasterizerFunc(rastState.sampleCount, 
rastState.bIsCenterPattern, false, 
-   SWR_INPUT_COVERAGE_NONE, ALL_EDGES_VALID, 
(pDC->pState->state.scissorsTileAligned == false));
+   SWR_INPUT_COVERAGE_NONE, 
EdgeValToEdgeState(ALL_EDGES_VALID), (pDC->pState->state.scissorsTileAligned == 
false));
 
 // overwrite texcoords for point sprites
 if (isPointSpriteTexCoordEnabled)
@@ -1676,7 +1676,7 @@ void RasterizeLine(DRAW_CONTEXT *pDC, uint32_t workerId, 
uint32_t macroTile, voi
 PFN_WORK_FUNC pfnTriRast;
 // conservative rast not supported for points/lines
 pfnTriRast = GetRasterizerFunc(rastState.sampleCount, 
rastState.bIsCenterPattern, false, 
-   SWR_INPUT_COVERAGE_NONE, ALL_EDGES_VALID, 
(pDC->pState->state.scissorsTileAligned == false));
+   SWR_INPUT_COVERAGE_NONE, 
EdgeValToEdgeState(ALL_EDGES_VALID), (pDC->pState->state.scissorsTileAligned == 
false));
 
 // make sure this macrotile intersects the triangle
 __m128i vXai = fpToFixedPoint(vXa);
@@ -1798,6 +1798,6 @@ PFN_WORK_FUNC GetRasterizerFunc(
 IsCenter,

[Mesa-dev] [PATCH 05/12] swr: [rasterizer core] Code formating change

2017-04-10 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/core/state.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/state.h b/src/gallium/drivers/swr/rasterizer/core/state.h
index eec68cd..535b85e 100644
--- a/src/gallium/drivers/swr/rasterizer/core/state.h
+++ b/src/gallium/drivers/swr/rasterizer/core/state.h
@@ -1131,16 +1131,16 @@ struct SWR_PS_STATE
 PFN_PIXEL_KERNEL pfnPixelShader;  // @llvm_pfn
 
 // dword 2
-uint32_t killsPixel : 1;// pixel shader can kill pixels
-uint32_t inputCoverage  : 2;// ps uses input coverage
-uint32_t writesODepth   : 1;// pixel shader writes to depth
-uint32_t usesSourceDepth: 1;// pixel shader reads depth
-uint32_t shadingRate: 2;// shading per pixel / sample / coarse pixel
-uint32_t numRenderTargets   : 4;// number of render target outputs in use (0-8)
-uint32_t posOffset  : 2;// type of offset (none, sample, centroid) to add to pixel position
-uint32_t barycentricsMask   : 3;// which type(s) of barycentric coords does the PS interpolate attributes with
-uint32_t usesUAV: 1;// pixel shader accesses UAV 
-uint32_t forceEarlyZ: 1;// force execution of early depth/stencil test
+uint32_t killsPixel : 1;// pixel shader can kill pixels
+uint32_t inputCoverage  : 2;// ps uses input coverage
+uint32_t writesODepth   : 1;// pixel shader writes to depth
+uint32_t usesSourceDepth: 1;// pixel shader reads depth
+uint32_t shadingRate: 2;// shading per pixel / sample / coarse pixel
+uint32_t numRenderTargets   : 4;// number of render target outputs in use (0-8)
+uint32_t posOffset  : 2;// type of offset (none, sample, centroid) to add to pixel position
+uint32_t barycentricsMask   : 3;// which type(s) of barycentric coords does the PS interpolate attributes with
+uint32_t usesUAV: 1;// pixel shader accesses UAV 
+uint32_t forceEarlyZ: 1;// force execution of early depth/stencil test
 };
 
 // depth bounds state
-- 
2.7.4
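
Note on the "dword 2" bit-fields above: the listed widths add up to 18 bits, so they fit in a single 32-bit word. The sketch below copies the field names and widths into a hypothetical standalone struct (PsFlags is not an SWR type) and checks that claim at compile time. Bit-field layout is implementation-defined, so the sizeof check is an assumption that holds on the common GCC, Clang and MSVC ABIs rather than a guarantee.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone copy of the bit-field block, for illustration only.
struct PsFlags {
    uint32_t killsPixel       : 1;
    uint32_t inputCoverage    : 2;
    uint32_t writesODepth     : 1;
    uint32_t usesSourceDepth  : 1;
    uint32_t shadingRate      : 2;
    uint32_t numRenderTargets : 4;
    uint32_t posOffset        : 2;
    uint32_t barycentricsMask : 3;
    uint32_t usesUAV          : 1;
    uint32_t forceEarlyZ      : 1;
};

// Sum of the declared widths.
constexpr unsigned kBitsUsed = 1 + 2 + 1 + 1 + 2 + 4 + 2 + 3 + 1 + 1;
static_assert(kBitsUsed <= 32, "fields must fit in one dword");
static_assert(sizeof(PsFlags) == sizeof(uint32_t),
              "expected a single dword on this ABI");

// Round-trips a render-target count through the 4-bit field.
unsigned StoreRenderTargets(unsigned n) {
    assert(n <= 8); // the source comment documents a 0-8 range
    PsFlags f{};
    f.numRenderTargets = n;
    return f.numRenderTargets;
}
```

The 4-bit numRenderTargets field can represent 0 through 15, which comfortably covers the documented 0-8 range.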



[Mesa-dev] [PATCH 03/12] swr: [rasterizer core] SIMD16 Frontend WIP - Clipper

2017-04-10 Thread Tim Rowley
Implement widened clipper for SIMD16.
---
 .../drivers/swr/rasterizer/common/simd16intrin.h   |   41 +-
 src/gallium/drivers/swr/rasterizer/core/binner.cpp |   17 +-
 src/gallium/drivers/swr/rasterizer/core/clip.cpp   |   91 +-
 src/gallium/drivers/swr/rasterizer/core/clip.h | 1027 ++--
 src/gallium/drivers/swr/rasterizer/core/frontend.h |   29 +-
 5 files changed, 1011 insertions(+), 194 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h 
b/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h
index e5c34c2..fee50d0 100644
--- a/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h
+++ b/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h
@@ -436,7 +436,7 @@ INLINE simd16scalar _simd16_cvtepi32_ps(simd16scalari a)
 }
 
 template <int comp>
-INLINE simd16scalar _simd16_cmp_ps(simd16scalar a, simd16scalar b)
+INLINE simd16scalar _simd16_cmp_ps_temp(simd16scalar a, simd16scalar b)
 {
 simd16scalar result;
 
@@ -446,12 +446,14 @@ INLINE simd16scalar _simd16_cmp_ps(simd16scalar a, 
simd16scalar b)
 return result;
 }
 
-#define _simd16_cmplt_ps(a, b) _simd16_cmp_ps<_CMP_LT_OQ>(a, b)
-#define _simd16_cmpgt_ps(a, b) _simd16_cmp_ps<_CMP_GT_OQ>(a, b)
-#define _simd16_cmpneq_ps(a, b) _simd16_cmp_ps<_CMP_NEQ_OQ>(a, b)
-#define _simd16_cmpeq_ps(a, b) _simd16_cmp_ps<_CMP_EQ_OQ>(a, b)
-#define _simd16_cmpge_ps(a, b) _simd16_cmp_ps<_CMP_GE_OQ>(a, b)
-#define _simd16_cmple_ps(a, b) _simd16_cmp_ps<_CMP_LE_OQ>(a, b)
+#define _simd16_cmp_ps(a, b, comp)  _simd16_cmp_ps_temp<comp>(a, b)
+
+#define _simd16_cmplt_ps(a, b)  _simd16_cmp_ps(a, b, _CMP_LT_OQ)
+#define _simd16_cmpgt_ps(a, b)  _simd16_cmp_ps(a, b, _CMP_GT_OQ)
+#define _simd16_cmpneq_ps(a, b) _simd16_cmp_ps(a, b, _CMP_NEQ_OQ)
+#define _simd16_cmpeq_ps(a, b)  _simd16_cmp_ps(a, b, _CMP_EQ_OQ)
+#define _simd16_cmpge_ps(a, b)  _simd16_cmp_ps(a, b, _CMP_GE_OQ)
+#define _simd16_cmple_ps(a, b)  _simd16_cmp_ps(a, b, _CMP_LE_OQ)
 
 SIMD16_EMU_AVX512_2(simd16scalar, _simd16_and_ps, _simd_and_ps)
 SIMD16_EMU_AVX512_2(simd16scalar, _simd16_andnot_ps, _simd_andnot_ps)
@@ -525,8 +527,8 @@ SIMD16_EMU_AVX512_2(simd16scalari, _simd16_cmplt_epi32, 
_simd_cmplt_epi32)
 
 INLINE int _simd16_testz_ps(simd16scalar a, simd16scalar b)
 {
-int lo = _mm256_testz_ps(a.lo, b.lo);
-int hi = _mm256_testz_ps(a.hi, b.hi);
+int lo = _simd_testz_ps(a.lo, b.lo);
+int hi = _simd_testz_ps(a.hi, b.hi);
 
 return lo & hi;
 }
@@ -912,19 +914,19 @@ INLINE int _simd16_movemask_epi8(simd16scalari a)
 template <int comp>
 INLINE simd16scalar _simd16_cmp_ps_temp(simd16scalar a, simd16scalar b)
 {
-simd16mask k = _mm512_cmpeq_ps_mask(a, b);
+simd16mask k = _mm512_cmp_ps_mask(a, b, comp);
 
 return _mm512_castsi512_ps(_mm512_mask_blend_epi32(k, 
_mm512_setzero_epi32(), _mm512_set1_epi32(0x)));
 }
 
 #define _simd16_cmp_ps(a, b, comp)  _simd16_cmp_ps_temp<comp>(a, b)
 
-#define _simd16_cmplt_ps(a, b)  _simd16_cmp_ps<_CMP_LT_OQ>(a, b)
-#define _simd16_cmpgt_ps(a, b)  _simd16_cmp_ps<_CMP_GT_OQ>(a, b)
-#define _simd16_cmpneq_ps(a, b) _simd16_cmp_ps<_CMP_NEQ_OQ>(a, b)
-#define _simd16_cmpeq_ps(a, b)  _simd16_cmp_ps<_CMP_EQ_OQ>(a, b)
-#define _simd16_cmpge_ps(a, b)  _simd16_cmp_ps<_CMP_GE_OQ>(a, b)
-#define _simd16_cmple_ps(a, b)  _simd16_cmp_ps<_CMP_LE_OQ>(a, b)
+#define _simd16_cmplt_ps(a, b)  _simd16_cmp_ps(a, b, _CMP_LT_OQ)
+#define _simd16_cmpgt_ps(a, b)  _simd16_cmp_ps(a, b, _CMP_GT_OQ)
+#define _simd16_cmpneq_ps(a, b) _simd16_cmp_ps(a, b, _CMP_NEQ_OQ)
+#define _simd16_cmpeq_ps(a, b)  _simd16_cmp_ps(a, b, _CMP_EQ_OQ)
+#define _simd16_cmpge_ps(a, b)  _simd16_cmp_ps(a, b, _CMP_GE_OQ)
+#define _simd16_cmple_ps(a, b)  _simd16_cmp_ps(a, b, _CMP_LE_OQ)
 
 #define _simd16_castsi_ps   _mm512_castsi512_ps
 #define _simd16_castps_si   _mm512_castps_si512
@@ -982,17 +984,14 @@ INLINE simd16scalari _simd16_cmplt_epi32(simd16scalari a, 
simd16scalari b)
 return _mm512_mask_blend_epi32(k, _mm512_setzero_epi32(), 
_mm512_set1_epi32(0x));
 }
 
-#if 0
 INLINE int _simd16_testz_ps(simd16scalar a, simd16scalar b)
 {
-int lo = _mm256_testz_ps(a.lo, b.lo);
-int hi = _mm256_testz_ps(a.hi, b.hi);
+int lo = _simd_testz_ps(_simd16_extract_ps(a, 0), _simd16_extract_ps(b, 
0));
+int hi = _simd_testz_ps(_simd16_extract_ps(a, 1), _simd16_extract_ps(b, 
1));
 
 return lo & hi;
 }
 
-#endif
-
 #define _simd16_unpacklo_ps   _mm512_unpacklo_ps
 #define _simd16_unpackhi_ps   _mm512_unpackhi_ps
 #define _simd16_unpacklo_pd   _mm512_unpacklo_pd
diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp 
b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index eb1f20b..239c497 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -1007,16 +1007,6 @@ endBinTriangles:
 }
 
 #if USE_SIMD16_FRONTEND
-inline uint32_t GetPrimMaskLo(uint32_t primMask)
-{
-return 

[Mesa-dev] [PATCH 04/12] swr: [rasterizer core] SIMD16 Frontend WIP - PA

2017-04-10 Thread Tim Rowley
Fix PA NextPrim for SIMD8 on SIMD16.
---
 src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp | 44 +++---
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp 
b/src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp
index 3e3b7ab..6a24963 100644
--- a/src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp
@@ -456,7 +456,7 @@ static bool PaPatchListTerm(PA_STATE_OPT& pa, uint32_t 
slot, simdvector verts[])
 PaPatchList,
 PaPatchListSingle,
 0,
-KNOB_SIMD_WIDTH,
+PA_STATE_OPT::SIMD_WIDTH,
 true);
 
 return true;
@@ -509,7 +509,7 @@ static bool PaPatchListTerm_simd16(PA_STATE_OPT& pa, 
uint32_t slot, simd16vector
 PaPatchList,
 PaPatchListSingle,
 0,
-KNOB_SIMD16_WIDTH,
+PA_STATE_OPT::SIMD_WIDTH,
 true);
 
 return true;
@@ -736,7 +736,7 @@ bool PaTriList2(PA_STATE_OPT& pa, uint32_t slot, simdvector 
verts[])
 }
 
 #endif
-SetNextPaState(pa, PaTriList0, PaTriListSingle0, 0, KNOB_SIMD_WIDTH, true);
+SetNextPaState(pa, PaTriList0, PaTriListSingle0, 0, 
PA_STATE_OPT::SIMD_WIDTH, true);
 return true;
 }
 
@@ -783,7 +783,7 @@ bool PaTriList2_simd16(PA_STATE_OPT& pa, uint32_t slot, 
simd16vector verts[])
 v2[i] = _simd16_permute_ps(temp2, perm2);
 }
 
-SetNextPaState_simd16(pa, PaTriList0_simd16, PaTriList0, PaTriListSingle0, 
0, KNOB_SIMD16_WIDTH, true);
+SetNextPaState_simd16(pa, PaTriList0_simd16, PaTriList0, PaTriListSingle0, 
0, PA_STATE_OPT::SIMD_WIDTH, true);
 return true;
 }
 
@@ -1014,7 +1014,7 @@ bool PaTriStrip1(PA_STATE_OPT& pa, uint32_t slot, 
simdvector verts[])
 v2[i] = _simd_shuffle_ps(a0, s, _MM_SHUFFLE(2, 2, 2, 2));
 }
 
-SetNextPaState(pa, PaTriStrip1, PaTriStripSingle0, 0, KNOB_SIMD_WIDTH);
+SetNextPaState(pa, PaTriStrip1, PaTriStripSingle0, 0, 
PA_STATE_OPT::SIMD_WIDTH);
 return true;
 }
 
@@ -1052,7 +1052,7 @@ bool PaTriStrip1_simd16(PA_STATE_OPT& pa, uint32_t slot, 
simd16vector verts[])
 v2[i] = _simd16_shuffle_ps(a[i], shuff, _MM_SHUFFLE(2, 2, 2, 2));  
 // a2 a2 a4 a4 a6 a6 a8 a8 aA aA aC aC aE aE b0 b0
 }
 
-SetNextPaState_simd16(pa, PaTriStrip1_simd16, PaTriStrip1, 
PaTriStripSingle0, 0, KNOB_SIMD16_WIDTH);
+SetNextPaState_simd16(pa, PaTriStrip1_simd16, PaTriStrip1, 
PaTriStripSingle0, 0, PA_STATE_OPT::SIMD_WIDTH);
 return true;
 }
 
@@ -1288,7 +1288,7 @@ bool PaTriFan1(PA_STATE_OPT& pa, uint32_t slot, 
simdvector verts[])
 v1[i] = _simd_shuffle_ps(a0, v2[i], _MM_SHUFFLE(2, 1, 2, 1));
 }
 
-SetNextPaState(pa, PaTriFan1, PaTriFanSingle0, 0, KNOB_SIMD_WIDTH);
+SetNextPaState(pa, PaTriFan1, PaTriFanSingle0, 0, 
PA_STATE_OPT::SIMD_WIDTH);
 return true;
 }
 
@@ -1345,7 +1345,7 @@ bool PaTriFan1_simd16(PA_STATE_OPT& pa, uint32_t slot, 
simd16vector verts[])
 v1[i] = _simd16_shuffle_ps(b[i], v2[i], _MM_SHUFFLE(2, 1, 2, 1));  
 // b1 b2 b3 b4 b5 b6 b7 b8 b9 bA bB bC bD bE bF c0
 }
 
-SetNextPaState_simd16(pa, PaTriFan1_simd16, PaTriFan1, PaTriFanSingle0, 0, 
KNOB_SIMD16_WIDTH);
+SetNextPaState_simd16(pa, PaTriFan1_simd16, PaTriFan1, PaTriFanSingle0, 0, 
PA_STATE_OPT::SIMD_WIDTH);
 return true;
 }
 
@@ -1480,7 +1480,7 @@ bool PaQuadList1(PA_STATE_OPT& pa, uint32_t slot, 
simdvector verts[])
 v2[i] = _simd_shuffle_ps(s1, s2, _MM_SHUFFLE(3, 2, 3, 2));
 }
 
-SetNextPaState(pa, PaQuadList0, PaQuadListSingle0, 0, KNOB_SIMD_WIDTH, 
true);
+SetNextPaState(pa, PaQuadList0, PaQuadListSingle0, 0, 
PA_STATE_OPT::SIMD_WIDTH, true);
 return true;
 }
 
@@ -1515,7 +1515,7 @@ bool PaQuadList1_simd16(PA_STATE_OPT& pa, uint32_t slot, 
simd16vector verts[])
 v2[i] = _simd16_shuffle_ps(temp0, temp1, _MM_SHUFFLE(3, 2, 3, 2)); 
 // a2 a3 a6 a7 aA aB aE aF b2 b3 b6 b7 bA bB bE bF
 }
 
-SetNextPaState_simd16(pa, PaQuadList0_simd16, PaQuadList0, 
PaQuadListSingle0, 0, KNOB_SIMD16_WIDTH, true);
+SetNextPaState_simd16(pa, PaQuadList0_simd16, PaQuadList0, 
PaQuadListSingle0, 0, PA_STATE_OPT::SIMD_WIDTH, true);
 return true;
 }
 
@@ -1735,7 +1735,7 @@ bool PaLineLoop1(PA_STATE_OPT& pa, uint32_t slot, 
simdvector verts[])
 }
 }
 
-SetNextPaState(pa, PaLineLoop1, PaLineLoopSingle0, 0, KNOB_SIMD_WIDTH);
+SetNextPaState(pa, PaLineLoop1, PaLineLoopSingle0, 0, 
PA_STATE_OPT::SIMD_WIDTH);
 return true;
 }
 
@@ -1765,7 +1765,7 @@ bool PaLineLoop1_simd16(PA_STATE_OPT& pa, uint32_t slot, 
simd16vector verts[])
 }
 }
 
-SetNextPaState_simd16(pa, PaLineLoop1_simd16, PaLineLoop1, 
PaLineLoopSingle0, 0, KNOB_SIMD16_WIDTH);
+SetNextPaState_simd16(pa, PaLineLoop1_simd16, PaLineLoop1, 
PaLineLoopSingle0, 0, PA_STATE_OPT::SIMD_WIDTH);
 return true;
 }
 
@@ -1847,7 +1847,7 @@ bool PaLineList1(PA_STATE_OPT& pa, 

[Mesa-dev] [PATCH 00/12] swr: update rasterizer

2017-04-10 Thread Tim Rowley
Highlights: compile-time fix, SIMD16 work, code cleanup.

Tim Rowley (12):
  swr: [rasterizer core] Reduce templates to speed compile
  swr: [rasterizer core] Multisample sample position setup change
  swr: [rasterizer core] SIMD16 Frontend WIP - Clipper
  swr: [rasterizer core] SIMD16 Frontend WIP - PA
  swr: [rasterizer core] Code formating change
  swr: [rasterizer core] Fix unused variable warnings
  swr: [rasterizer common/core] Fix 32-bit windows build
  swr: [rasterizer jitter] Remove HAVE_LLVM tests supporting llvm < 3.8
  swr: [rasterizer jitter] Remove unused function
  swr: [rasterizer archrast] Fix archrast for MSVC 2017 compiler
  swr: [rasterizer common] Add _simd_testz_si alias
  swr: [rasterizer core] Disable 8x2 tile backend

 .../drivers/swr/rasterizer/archrast/archrast.cpp   |2 +-
 .../drivers/swr/rasterizer/archrast/archrast.h |2 +-
 .../drivers/swr/rasterizer/archrast/eventmanager.h |2 +-
 .../drivers/swr/rasterizer/codegen/gen_backends.py |   25 +-
 .../rasterizer/codegen/templates/gen_ar_event.cpp  |2 +-
 .../rasterizer/codegen/templates/gen_ar_event.hpp  |4 +-
 .../drivers/swr/rasterizer/common/simd16intrin.h   |  237 ++---
 .../drivers/swr/rasterizer/common/simdintrin.h |1 +
 src/gallium/drivers/swr/rasterizer/core/api.cpp|2 +-
 .../drivers/swr/rasterizer/core/backend.cpp|1 -
 src/gallium/drivers/swr/rasterizer/core/binner.cpp |   33 +-
 src/gallium/drivers/swr/rasterizer/core/clip.cpp   |   91 +-
 src/gallium/drivers/swr/rasterizer/core/clip.h | 1033 ++--
 src/gallium/drivers/swr/rasterizer/core/context.h  |2 +-
 .../swr/rasterizer/core/format_conversion.h|8 +-
 .../drivers/swr/rasterizer/core/format_types.h |   22 +-
 src/gallium/drivers/swr/rasterizer/core/frontend.h |   33 +-
 src/gallium/drivers/swr/rasterizer/core/knobs.h|2 +-
 .../drivers/swr/rasterizer/core/multisample.cpp|   44 +-
 .../drivers/swr/rasterizer/core/multisample.h  |   98 +-
 src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp |   44 +-
 .../drivers/swr/rasterizer/core/rasterizer.cpp |6 +-
 .../drivers/swr/rasterizer/core/rasterizer.h   |   67 +-
 src/gallium/drivers/swr/rasterizer/core/state.h|   20 +-
 .../drivers/swr/rasterizer/jitter/JitManager.cpp   |   44 -
 .../drivers/swr/rasterizer/jitter/JitManager.h |7 -
 .../drivers/swr/rasterizer/jitter/builder_misc.cpp |   31 -
 .../drivers/swr/rasterizer/jitter/builder_misc.h   |5 -
 28 files changed, 1337 insertions(+), 531 deletions(-)

-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3 v2] r600g: get rid of dummy pixel shader

2017-04-10 Thread Marek Olšák
For patches 2-3:

Reviewed-by: Marek Olšák 

Marek

On Mon, Apr 10, 2017 at 11:44 AM, Constantine Kharlamov
 wrote:
> If it helps, I can split this patch in two: α) adding checks for a null ps,
> and β) removing the dummy ps. I didn't do that originally because the patch
> was small anyway; it was only in the second version that I had to re-indent a
> block of code, and now it looks bigger.
>
> On 10.04.2017 00:09, Constantine Kharlamov wrote:
>> The idea is taken from radeonsi. The code was already mostly checking for a
>> null pixel shader, so only a few checks had to be added.
>>
>> Interestingly, according to testing with GTAⅣ, binding of a null shader
>> happens a lot at the start (then just stops), but draw_vbo() never actually
>> sees a null ps.
>>
>> v2: added a check I missed because of a macro that uses a prefix to choose
>> a shader.
>>
>> Signed-off-by: Constantine Kharlamov 
>> ---
>>  src/gallium/drivers/r600/r600_pipe.c |  9 -
>>  src/gallium/drivers/r600/r600_pipe.h |  3 --
>>  src/gallium/drivers/r600/r600_state_common.c | 58 
>> ++--
>>  3 files changed, 30 insertions(+), 40 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_pipe.c 
>> b/src/gallium/drivers/r600/r600_pipe.c
>> index 5014f2525c..7d8efd2c9b 100644
>> --- a/src/gallium/drivers/r600/r600_pipe.c
>> +++ b/src/gallium/drivers/r600/r600_pipe.c
>> @@ -82,9 +82,6 @@ static void r600_destroy_context(struct pipe_context 
>> *context)
>>   if (rctx->fixed_func_tcs_shader)
>>   rctx->b.b.delete_tcs_state(&rctx->b.b, 
>> rctx->fixed_func_tcs_shader);
>>
>> - if (rctx->dummy_pixel_shader) {
>> - rctx->b.b.delete_fs_state(&rctx->b.b, 
>> rctx->dummy_pixel_shader);
>> - }
>>   if (rctx->custom_dsa_flush) {
>>   rctx->b.b.delete_depth_stencil_alpha_state(&rctx->b.b, 
>> rctx->custom_dsa_flush);
>>   }
>> @@ -209,12 +206,6 @@ static struct pipe_context *r600_create_context(struct 
>> pipe_screen *screen,
>>
>>   r600_begin_new_cs(rctx);
>>
>> - rctx->dummy_pixel_shader =
>> - util_make_fragment_cloneinput_shader(&rctx->b.b, 0,
>> -  TGSI_SEMANTIC_GENERIC,
>> -  
>> TGSI_INTERPOLATE_CONSTANT);
>> - rctx->b.b.bind_fs_state(&rctx->b.b, rctx->dummy_pixel_shader);
>> -
>>   return &rctx->b.b;
>>
>>  fail:
>> diff --git a/src/gallium/drivers/r600/r600_pipe.h 
>> b/src/gallium/drivers/r600/r600_pipe.h
>> index 7f1ecc278b..e636ef0024 100644
>> --- a/src/gallium/drivers/r600/r600_pipe.h
>> +++ b/src/gallium/drivers/r600/r600_pipe.h
>> @@ -432,9 +432,6 @@ struct r600_context {
>>   void*custom_blend_resolve;
>>   void*custom_blend_decompress;
>>   void*custom_blend_fastclear;
>> - /* With rasterizer discard, there doesn't have to be a pixel shader.
>> -  * In that case, we bind this one: */
>> - void*dummy_pixel_shader;
>>   /* These dummy CMASK and FMASK buffers are used to get around the R6xx 
>> hardware
>>* bug where valid CMASK and FMASK are required to be present to avoid
>>* a hardlock in certain operations but aren't actually used
>> diff --git a/src/gallium/drivers/r600/r600_state_common.c 
>> b/src/gallium/drivers/r600/r600_state_common.c
>> index c9b41517cc..8d1193360b 100644
>> --- a/src/gallium/drivers/r600/r600_state_common.c
>> +++ b/src/gallium/drivers/r600/r600_state_common.c
>> @@ -725,7 +725,8 @@ static inline void r600_shader_selector_key(const struct 
>> pipe_context *ctx,
>>   if (!key->vs.as_ls)
>>   key->vs.as_es = (rctx->gs_shader != NULL);
>>
>> - if (rctx->ps_shader->current->shader.gs_prim_id_input && 
>> !rctx->gs_shader) {
>> + if (rctx->ps_shader && 
>> rctx->ps_shader->current->shader.gs_prim_id_input &&
>> + !rctx->gs_shader) {
>>   key->vs.as_gs_a = true;
>>   key->vs.prim_id_out = 
>> rctx->ps_shader->current->shader.input[rctx->ps_shader->current->shader.ps_prim_id_input].spi_sid;
>>   }
>> @@ -909,9 +910,6 @@ static void r600_bind_ps_state(struct pipe_context *ctx, 
>> void *state)
>>  {
>>   struct r600_context *rctx = (struct r600_context *)ctx;
>>
>> - if (!state)
>> - state = rctx->dummy_pixel_shader;
>> -
>>   rctx->ps_shader = (struct r600_pipe_shader_selector *)state;
>>  }
>>
>> @@ -1474,7 +1472,8 @@ static bool r600_update_derived_state(struct 
>> r600_context *rctx)
>>   }
>>   }
>>
>> - SELECT_SHADER_OR_FAIL(ps);
>> + if (rctx->ps_shader)
>> + SELECT_SHADER_OR_FAIL(ps);
>>
>>   r600_mark_atom_dirty(rctx, &rctx->shader_stages.atom);
>>
>> @@ -1551,37 +1550,40 @@ static bool r600_update_derived_state(struct 
>> r600_context 

Re: [Mesa-dev] [PATCH 1/3 v2] r600g: skip repeating vs, gs, and tes shader binds

2017-04-10 Thread Marek Olšák
On Sun, Apr 9, 2017 at 11:09 PM, Constantine Kharlamov
 wrote:
> The idea is taken from radeonsi. The code lacks some checks for null vs,
> and I'm unsure about some changes against that, so I left it in place.
>
> Some statistics for GTAⅣ:
> Average tessellation shader binds skipped per frame: ≈350
> Average geometry shader binds skipped per frame: ≈260
> Skipped vertex shader binds occur too rarely to show up in the per-frame
> counter at all, so I'll just say: it happens.
>
> v2: I had accidentally removed an empty line; v2 no longer does that.
>
> Signed-off-by: Constantine Kharlamov 
> ---
>  src/gallium/drivers/r600/r600_state_common.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_state_common.c 
> b/src/gallium/drivers/r600/r600_state_common.c
> index 4de2a7344b..94f85e6dd3 100644
> --- a/src/gallium/drivers/r600/r600_state_common.c
> +++ b/src/gallium/drivers/r600/r600_state_common.c
> @@ -931,7 +931,7 @@ static void r600_bind_vs_state(struct pipe_context *ctx, 
> void *state)
>  {
> struct r600_context *rctx = (struct r600_context *)ctx;
>
> -   if (!state)
> +   if (!state || rctx->vs_shader == state)
> return;
>
> rctx->vs_shader = (struct r600_pipe_shader_selector *)state;
> @@ -943,11 +943,12 @@ static void r600_bind_gs_state(struct pipe_context 
> *ctx, void *state)
>  {
> struct r600_context *rctx = (struct r600_context *)ctx;
>
> +   if (state == rctx->gs_shader)
> +   return;
> +
> rctx->gs_shader = (struct r600_pipe_shader_selector *)state;
> r600_update_vs_writes_viewport_index(&rctx->b, 
> r600_get_vs_info(rctx));
>
> -   if (!state)
> -   return;
> rctx->b.streamout.stride_in_dw = rctx->gs_shader->so.stride;

This will crash if state == NULL.

>  }
>
> @@ -962,11 +963,12 @@ static void r600_bind_tes_state(struct pipe_context 
> *ctx, void *state)
>  {
> struct r600_context *rctx = (struct r600_context *)ctx;
>
> +   if (state == rctx->tes_shader)
> +   return;
> +
> rctx->tes_shader = (struct r600_pipe_shader_selector *)state;
> r600_update_vs_writes_viewport_index(&rctx->b, 
> r600_get_vs_info(rctx));
>
> -   if (!state)
> -   return;
> rctx->b.streamout.stride_in_dw = rctx->tes_shader->so.stride;

Same here.
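
A minimal sketch of the ordering problem in both comments above, assuming hypothetical stand-in types (`selector`, `context` and `bind_gs_state` are illustrative names, not the r600 structures): the redundant-bind check can come first, but the NULL check has to stay ahead of the dereference, because unbinding passes NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real r600 types. */
struct selector {
    unsigned so_stride;
};

struct context {
    struct selector *gs_shader;
    unsigned stride_in_dw;
    unsigned bind_count;   /* counts binds that were not skipped */
};

static void bind_gs_state(struct context *ctx, void *state)
{
    assert(ctx != NULL);

    /* Skip redundant binds, which is what the patch is after. */
    if (state == ctx->gs_shader)
        return;

    ctx->gs_shader = (struct selector *)state;
    ctx->bind_count++;

    /* Unbinding passes NULL: this check must precede the dereference below. */
    if (!state)
        return;

    ctx->stride_in_dw = ctx->gs_shader->so_stride;
}
```

With this ordering, binding the same selector twice is counted once, and binding NULL clears the pointer without touching `so_stride`.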

Marek


Re: [Mesa-dev] [PATCH 2/2] gallium/radeon: add HUD queries for GPU temperature and clocks

2017-04-10 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Mon, Apr 10, 2017 at 11:49 AM, Samuel Pitoiset
 wrote:
> Only the Radeon kernel driver exposed the GPU temperature and
> the shader/memory clocks; this implements the same functionality
> for the AMDGPU kernel driver.
>
> These queries will return 0 if the DRM version is less than 3.10.
> I don't explicitly check the version here because the query
> codepath is already a bit messy.
>
> v2: - rebase on top of master
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/radeon/r600_query.c   | 12 ++--
>  src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c |  7 ++-
>  2 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_query.c 
> b/src/gallium/drivers/radeon/r600_query.c
> index cb90850a50..0980eca788 100644
> --- a/src/gallium/drivers/radeon/r600_query.c
> +++ b/src/gallium/drivers/radeon/r600_query.c
> @@ -1799,6 +1799,10 @@ static struct pipe_driver_query_info 
> r600_driver_query_list[] = {
> XG(GPIN, "GPIN_003",GPIN_NUM_SPI,   UINT, 
> AVERAGE),
> XG(GPIN, "GPIN_004",GPIN_NUM_SE,UINT, 
> AVERAGE),
>
> +   X("temperature",GPU_TEMPERATURE,UINT64, 
> AVERAGE),
> +   X("shader-clock",   CURRENT_GPU_SCLK,   HZ, AVERAGE),
> +   X("memory-clock",   CURRENT_GPU_MCLK,   HZ, AVERAGE),
> +
> /* The following queries must be at the end of the list because their
>  * availability is adjusted dynamically based on the DRM version. */
> X("GPU-load",   GPU_LOAD,   UINT64, 
> AVERAGE),
> @@ -1823,10 +1827,6 @@ static struct pipe_driver_query_info 
> r600_driver_query_list[] = {
> X("GPU-dma-busy",   GPU_DMA_BUSY,   UINT64, 
> AVERAGE),
> X("GPU-scratch-ram-busy",   GPU_SCRATCH_RAM_BUSY,   UINT64, 
> AVERAGE),
> X("GPU-ce-busy",GPU_CE_BUSY,UINT64, 
> AVERAGE),
> -
> -   X("temperature",GPU_TEMPERATURE,UINT64, 
> AVERAGE),
> -   X("shader-clock",   CURRENT_GPU_SCLK,   HZ, AVERAGE),
> -   X("memory-clock",   CURRENT_GPU_MCLK,   HZ, AVERAGE),
>  };
>
>  #undef X
> @@ -1839,9 +1839,9 @@ static unsigned r600_get_num_queries(struct 
> r600_common_screen *rscreen)
> return ARRAY_SIZE(r600_driver_query_list);
> else if (rscreen->info.drm_major == 3) {
> if (rscreen->chip_class >= VI)
> -   return ARRAY_SIZE(r600_driver_query_list) - 3;
> +   return ARRAY_SIZE(r600_driver_query_list);
> else
> -   return ARRAY_SIZE(r600_driver_query_list) - 10;
> +   return ARRAY_SIZE(r600_driver_query_list) - 7;
> }
> else
> return ARRAY_SIZE(r600_driver_query_list) - 25;
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c 
> b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
> index bb7e545ed6..f3a0c958ed 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
> @@ -471,9 +471,14 @@ static uint64_t amdgpu_query_value(struct radeon_winsys 
> *rws,
>amdgpu_query_heap_info(ws->dev, AMDGPU_GEM_DOMAIN_GTT, 0, &heap);
>return heap.heap_usage;
> case RADEON_GPU_TEMPERATURE:
> +  amdgpu_query_sensor_info(ws->dev, AMDGPU_INFO_SENSOR_GPU_TEMP, 4, 
> &retval);
> +  return retval;
> case RADEON_CURRENT_SCLK:
> +  amdgpu_query_sensor_info(ws->dev, AMDGPU_INFO_SENSOR_GFX_SCLK, 4, 
> &retval);
> +  return retval;
> case RADEON_CURRENT_MCLK:
> -  return 0;
> +  amdgpu_query_sensor_info(ws->dev, AMDGPU_INFO_SENSOR_GFX_MCLK, 4, 
> &retval);
> +  return retval;
> case RADEON_GPU_RESET_COUNTER:
>assert(0);
>return 0;
> --
> 2.12.2
>


Re: [Mesa-dev] [PATCH 2/2] vbo: fix gl_DrawID handling in glMultiDrawArrays

2017-04-10 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Fri, Apr 7, 2017 at 6:30 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> Fixes a bug in
> KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.
> ---
>  src/mesa/vbo/vbo_exec_array.c | 21 +++--
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
> index bc4a18f..85d9b4b 100644
> --- a/src/mesa/vbo/vbo_exec_array.c
> +++ b/src/mesa/vbo/vbo_exec_array.c
> @@ -397,36 +397,38 @@ vbo_bind_arrays(struct gl_context *ctx)
>
>
>  /**
>   * Helper function called by the other DrawArrays() functions below.
>   * This is where we handle primitive restart for drawing non-indexed
>   * arrays.  If primitive restart is enabled, it typically means
>   * splitting one DrawArrays() into two.
>   */
>  static void
>  vbo_draw_arrays(struct gl_context *ctx, GLenum mode, GLint start,
> -GLsizei count, GLuint numInstances, GLuint baseInstance)
> +GLsizei count, GLuint numInstances, GLuint baseInstance,
> +GLuint drawID)
>  {
> struct vbo_context *vbo = vbo_context(ctx);
> struct _mesa_prim prim[2];
>
> vbo_bind_arrays(ctx);
>
> /* OpenGL 4.5 says that primitive restart is ignored with non-indexed
>  * draws.
>  */
> memset(prim, 0, sizeof(prim));
> prim[0].begin = 1;
> prim[0].end = 1;
> prim[0].mode = mode;
> prim[0].num_instances = numInstances;
> prim[0].base_instance = baseInstance;
> +   prim[0].draw_id = drawID;
> prim[0].is_indirect = 0;
> prim[0].start = start;
> prim[0].count = count;
>
> vbo->draw_prims(ctx, prim, 1, NULL,
> GL_TRUE, start, start + count - 1, NULL, 0, NULL);
>
> if (MESA_DEBUG_FLAGS & DEBUG_ALWAYS_FLUSH) {
>_mesa_flush(ctx);
> }
> @@ -565,21 +567,21 @@ vbo_exec_DrawArrays(GLenum mode, GLint start, GLsizei 
> count)
> if (MESA_VERBOSE & VERBOSE_DRAW)
>_mesa_debug(ctx, "glDrawArrays(%s, %d, %d)\n",
>_mesa_enum_to_string(mode), start, count);
>
> if (!_mesa_validate_DrawArrays(ctx, mode, count))
>return;
>
> if (0)
>check_draw_arrays_data(ctx, start, count);
>
> -   vbo_draw_arrays(ctx, mode, start, count, 1, 0);
> +   vbo_draw_arrays(ctx, mode, start, count, 1, 0, 0);
>
> if (0)
>print_draw_arrays(ctx, mode, start, count);
>  }
>
>
>  /**
>   * Called from glDrawArraysInstanced when in immediate mode (not
>   * display list mode).
>   */
> @@ -593,21 +595,21 @@ vbo_exec_DrawArraysInstanced(GLenum mode, GLint start, 
> GLsizei count,
>_mesa_debug(ctx, "glDrawArraysInstanced(%s, %d, %d, %d)\n",
>_mesa_enum_to_string(mode), start, count, numInstances);
>
> if (!_mesa_validate_DrawArraysInstanced(ctx, mode, start, count,
> numInstances))
>return;
>
> if (0)
>check_draw_arrays_data(ctx, start, count);
>
> -   vbo_draw_arrays(ctx, mode, start, count, numInstances, 0);
> +   vbo_draw_arrays(ctx, mode, start, count, numInstances, 0, 0);
>
> if (0)
>print_draw_arrays(ctx, mode, start, count);
>  }
>
>
>  /**
>   * Called from glDrawArraysInstancedBaseInstance when in immediate mode.
>   */
>  static void GLAPIENTRY
> @@ -623,21 +625,21 @@ vbo_exec_DrawArraysInstancedBaseInstance(GLenum mode, 
> GLint first,
>_mesa_enum_to_string(mode), first, count,
>numInstances, baseInstance);
>
> if (!_mesa_validate_DrawArraysInstanced(ctx, mode, first, count,
> numInstances))
>return;
>
> if (0)
>check_draw_arrays_data(ctx, first, count);
>
> -   vbo_draw_arrays(ctx, mode, first, count, numInstances, baseInstance);
> +   vbo_draw_arrays(ctx, mode, first, count, numInstances, baseInstance, 0);
>
> if (0)
>print_draw_arrays(ctx, mode, first, count);
>  }
>
>
>  /**
>   * Called from glMultiDrawArrays when in immediate mode.
>   */
>  static void GLAPIENTRY
> @@ -653,21 +655,28 @@ vbo_exec_MultiDrawArrays(GLenum mode, const GLint 
> *first,
>_mesa_enum_to_string(mode), first, count, primcount);
>
> if (!_mesa_validate_MultiDrawArrays(ctx, mode, count, primcount))
>return;
>
> for (i = 0; i < primcount; i++) {
>if (count[i] > 0) {
>   if (0)
>  check_draw_arrays_data(ctx, first[i], count[i]);
>
> - vbo_draw_arrays(ctx, mode, first[i], count[i], 1, 0);
> + /* The GL_ARB_shader_draw_parameters spec adds the following after 
> the
> +  * pseudo-code describing glMultiDrawArrays:
> +  *
> +  *"The index of the draw (<i> in the above pseudo-code) may be
> +  * read by a vertex shader as <gl_DrawIDARB>, as described in
> +  * Section 11.1.3.9."
> +  */
> + 
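
The patch text is cut off at this point in the archive. Independently of the missing hunk, the spec sentence quoted above can be sketched as follows, assuming a hypothetical `draw_record` type and `expand_multi_draw` helper (neither is Mesa code): each sub-draw carries its index in the multi-draw call as its draw ID, and an empty sub-draw is skipped without shifting the IDs of the ones after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one expanded draw; not a Mesa structure. */
struct draw_record {
    int first;
    int count;
    unsigned draw_id;
};

/* Expands a multi-draw into per-draw records and returns how many were
 * written. The draw ID is the loop index i, not the output position. */
static size_t expand_multi_draw(const int *first, const int *count,
                                size_t primcount,
                                struct draw_record *out, size_t out_cap)
{
    size_t n = 0;

    assert(first != NULL && count != NULL && out != NULL);

    for (size_t i = 0; i < primcount; i++) {
        if (count[i] <= 0)
            continue;   /* empty draw: skipped, but it still uses up index i */
        if (n == out_cap)
            break;      /* output array is full */

        out[n].first = first[i];
        out[n].count = count[i];
        out[n].draw_id = (unsigned)i;
        n++;
    }
    return n;
}
```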

Re: [Mesa-dev] [PATCH 5/5] radeonsi: add new si_check_render_feedback_texture() helper

2017-04-10 Thread Marek Olšák
Other than my comment on patch 3, the series is:

Reviewed-by: Marek Olšák 

Marek

On Thu, Apr 6, 2017 at 12:07 AM, Samuel Pitoiset
 wrote:
> For bindless.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/radeonsi/si_blit.c | 89 
> +-
>  1 file changed, 44 insertions(+), 45 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_blit.c 
> b/src/gallium/drivers/radeonsi/si_blit.c
> index f690f3e2f3..998288dba2 100644
> --- a/src/gallium/drivers/radeonsi/si_blit.c
> +++ b/src/gallium/drivers/radeonsi/si_blit.c
> @@ -527,6 +527,40 @@ si_decompress_image_color_textures(struct si_context 
> *sctx,
> }
>  }
>
> +static void si_check_render_feedback_texture(struct si_context *sctx,
> +struct r600_texture *tex,
> +unsigned first_level,
> +unsigned last_level,
> +unsigned first_layer,
> +unsigned last_layer)
> +{
> +   bool render_feedback = false;
> +
> +   if (!tex->dcc_offset)
> +   return;
> +
> +   for (unsigned j = 0; j < sctx->framebuffer.state.nr_cbufs; ++j) {
> +   struct r600_surface * surf;
> +
> +   if (!sctx->framebuffer.state.cbufs[j])
> +   continue;
> +
> +   surf = (struct r600_surface*)sctx->framebuffer.state.cbufs[j];
> +
> +   if (tex == (struct r600_texture *)surf->base.texture &&
> +   surf->base.u.tex.level >= first_level &&
> +   surf->base.u.tex.level <= last_level &&
> +   surf->base.u.tex.first_layer <= last_layer &&
> +   surf->base.u.tex.last_layer >= first_layer) {
> +   render_feedback = true;
> +   break;
> +   }
> +   }
> +
> +   if (render_feedback)
> +   r600_texture_disable_dcc(&sctx->b, tex);
> +}
> +
>  static void si_check_render_feedback_textures(struct si_context *sctx,
>struct si_textures_info 
> *textures)
>  {
> @@ -535,7 +569,6 @@ static void si_check_render_feedback_textures(struct 
> si_context *sctx,
> while (mask) {
> const struct pipe_sampler_view *view;
> struct r600_texture *tex;
> -   bool render_feedback = false;
>
> unsigned i = u_bit_scan(&mask);
>
> @@ -544,29 +577,12 @@ static void si_check_render_feedback_textures(struct 
> si_context *sctx,
> continue;
>
> tex = (struct r600_texture *)view->texture;
> -   if (!tex->dcc_offset)
> -   continue;
>
> -   for (unsigned j = 0; j < sctx->framebuffer.state.nr_cbufs; 
> ++j) {
> -   struct r600_surface * surf;
> -
> -   if (!sctx->framebuffer.state.cbufs[j])
> -   continue;
> -
> -   surf = (struct 
> r600_surface*)sctx->framebuffer.state.cbufs[j];
> -
> -   if (tex == (struct r600_texture*)surf->base.texture &&
> -   surf->base.u.tex.level >= view->u.tex.first_level 
> &&
> -   surf->base.u.tex.level <= view->u.tex.last_level 
> &&
> -   surf->base.u.tex.first_layer <= 
> view->u.tex.last_layer &&
> -   surf->base.u.tex.last_layer >= 
> view->u.tex.first_layer) {
> -   render_feedback = true;
> -   break;
> -   }
> -   }
> -
> -   if (render_feedback)
> -   r600_texture_disable_dcc(&sctx->b, tex);
> +   si_check_render_feedback_texture(sctx, tex,
> +view->u.tex.first_level,
> +view->u.tex.last_level,
> +view->u.tex.first_layer,
> +view->u.tex.last_layer);
> }
>  }
>
> @@ -578,7 +594,6 @@ static void si_check_render_feedback_images(struct 
> si_context *sctx,
> while (mask) {
> const struct pipe_image_view *view;
> struct r600_texture *tex;
> -   bool render_feedback = false;
>
> unsigned i = u_bit_scan(&mask);
>
> @@ -587,28 +602,12 @@ static void si_check_render_feedback_images(struct 
> si_context *sctx,
> continue;
>
> tex = (struct r600_texture *)view->resource;
> -   if (!tex->dcc_offset)
> -   continue;
> -
> -   for (unsigned j = 0; j < sctx->framebuffer.state.nr_cbufs; 
> ++j) {
> 

Re: [Mesa-dev] [PATCH 3/5] radeonsi: add new is_depth_texture() helper

2017-04-10 Thread Marek Olšák
On Thu, Apr 6, 2017 at 12:07 AM, Samuel Pitoiset
 wrote:
> For bindless.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/radeonsi/si_descriptors.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
> b/src/gallium/drivers/radeonsi/si_descriptors.c
> index 703a7cb1fa..524277462f 100644
> --- a/src/gallium/drivers/radeonsi/si_descriptors.c
> +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
> @@ -559,6 +559,13 @@ static bool is_compressed_colortex(struct r600_texture 
> *rtex)
>(rtex->dcc_offset && rtex->dirty_level_mask);
>  }
>
> +static bool is_depth_texture(struct r600_texture *rtex,
> +struct si_sampler_view *sview)

Please rename this to depth_needs_decompression.

Similarly, is_compressed_colortex can be renamed to
color_needs_decompression, but you don't have to do that.
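
As a rough sketch of the suggested name, assuming stand-in structs that carry only the fields the predicate reads (`tex_info` and `view_info` are hypothetical, not the radeonsi types), the helper would read:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in structs with only the fields the predicate reads. */
struct tex_info {
    bool db_compatible;
    bool tc_compatible_htile;
};

struct view_info {
    bool is_stencil_sampler;
};

/* Same condition as in the quoted patch, under the name proposed above. */
static bool depth_needs_decompression(const struct tex_info *tex,
                                      const struct view_info *view)
{
    assert(tex != NULL && view != NULL);
    return tex->db_compatible &&
           (!tex->tc_compatible_htile || view->is_stencil_sampler);
}
```

The name states what the caller does with the answer, which `is_depth_texture` does not: a depth texture with TC-compatible HTILE returns false unless it is sampled as stencil.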

Thanks,
Marek

> +{
> +   return rtex->db_compatible &&
> +  (!rtex->tc_compatible_htile || sview->is_stencil_sampler);
> +}
> +
>  static void si_update_compressed_tex_shader_mask(struct si_context *sctx,
>  unsigned shader)
>  {
> @@ -602,8 +609,7 @@ static void si_set_sampler_views(struct pipe_context *ctx,
> (struct r600_texture*)views[i]->texture;
> struct si_sampler_view *rview = (struct 
> si_sampler_view *)views[i];
>
> -   if (rtex->db_compatible &&
> -   (!rtex->tc_compatible_htile || 
> rview->is_stencil_sampler)) {
> +   if (is_depth_texture(rtex, rview)) {
> samplers->depth_texture_mask |= 1u << slot;
> } else {
> samplers->depth_texture_mask &= ~(1u << slot);
> --
> 2.12.2
>


[Mesa-dev] [PATCH] intel/blorp: Add a blorp_emit_dynamic macro

2017-04-10 Thread Jason Ekstrand
This makes it much easier to throw together a bit of dynamic state.  It
also automatically handles flushing so you don't accidentally forget.
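
The pattern can be sketched in portable C, without the GNU statement expression and `__builtin_expect` the real macro uses. Everything named `demo_*` here is hypothetical and only stands in for the blorp allocation, pack and flush helpers: a `for` loop declares the zeroed struct, runs the attached block once, and does the pack and flush in its increment expression.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state and helpers, for illustration only. */
struct demo_state {
    unsigned a, b;
};

static unsigned char demo_buffer[sizeof(struct demo_state)];
static int demo_flush_count;

static void *demo_alloc(void)
{
    return demo_buffer;
}

static void demo_pack(void *dst, const struct demo_state *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(*src));
}

static void demo_flush(void *dst)
{
    (void)dst;
    demo_flush_count++;
}

/* Runs the attached block exactly once with `name` zero-initialized, then
 * packs and flushes in the increment expression, so callers cannot forget. */
#define demo_emit(name)                                          \
    for (struct demo_state name = { 0, 0 },                      \
         *_dst = demo_alloc();                                   \
         _dst != NULL;                                           \
         demo_pack(_dst, &name), demo_flush(_dst), _dst = NULL)

static void demo_use(void)
{
    demo_emit(s) {
        s.a = 1;
        s.b = 2;
    }
}
```

One caveat of this idiom: a `break` inside the block leaves the loop before the increment expression runs, so the pack and flush would be skipped.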
---
 src/intel/blorp/blorp_genX_exec.h | 114 +-
 1 file changed, 50 insertions(+), 64 deletions(-)

diff --git a/src/intel/blorp/blorp_genX_exec.h 
b/src/intel/blorp/blorp_genX_exec.h
index 3791462..b462c52 100644
--- a/src/intel/blorp/blorp_genX_exec.h
+++ b/src/intel/blorp/blorp_genX_exec.h
@@ -120,6 +120,18 @@ _blorp_combine_address(struct blorp_batch *batch, void 
*location,
   _dw ? _dw + 1 : NULL; /* Array starts at dw[1] */ \
})
 
+#define STRUCT_ZERO(S) ({ struct S t; memset(&t, 0, sizeof(t)); t; })
+
+#define blorp_emit_dynamic(batch, state, name, align, offset)  \
+   for (struct state name = STRUCT_ZERO(state), \
+*_dst = blorp_alloc_dynamic_state(batch,   \
+  _blorp_cmd_length(state) * 4, \
+  align, offset);   \
+__builtin_expect(_dst != NULL, 1);  \
+_blorp_cmd_pack(state)(batch, (void *)_dst, &name), \
+blorp_flush_range(batch, _dst, _blorp_cmd_length(state) * 4),   \
+_dst = NULL)
+
 /* 3DSTATE_URB
  * 3DSTATE_URB_VS
  * 3DSTATE_URB_HS
@@ -899,26 +911,19 @@ static uint32_t
 blorp_emit_blend_state(struct blorp_batch *batch,
const struct blorp_params *params)
 {
-   struct GENX(BLEND_STATE) blend;
-   memset(&blend, 0, sizeof(blend));
-
-   for (unsigned i = 0; i < params->num_draw_buffers; ++i) {
-  blend.Entry[i].PreBlendColorClampEnable = true;
-  blend.Entry[i].PostBlendColorClampEnable = true;
-  blend.Entry[i].ColorClampRange = COLORCLAMP_RTFORMAT;
-
-  blend.Entry[i].WriteDisableRed = params->color_write_disable[0];
-  blend.Entry[i].WriteDisableGreen = params->color_write_disable[1];
-  blend.Entry[i].WriteDisableBlue = params->color_write_disable[2];
-  blend.Entry[i].WriteDisableAlpha = params->color_write_disable[3];
-   }
-
uint32_t offset;
-   void *state = blorp_alloc_dynamic_state(batch,
-   GENX(BLEND_STATE_length) * 4,
-   64, &offset);
-   GENX(BLEND_STATE_pack)(NULL, state, &blend);
-   blorp_flush_range(batch, state, GENX(BLEND_STATE_length) * 4);
+   blorp_emit_dynamic(batch, GENX(BLEND_STATE), blend, 64, &offset) {
+  for (unsigned i = 0; i < params->num_draw_buffers; ++i) {
+ blend.Entry[i].PreBlendColorClampEnable = true;
+ blend.Entry[i].PostBlendColorClampEnable = true;
+ blend.Entry[i].ColorClampRange = COLORCLAMP_RTFORMAT;
+
+ blend.Entry[i].WriteDisableRed = params->color_write_disable[0];
+ blend.Entry[i].WriteDisableGreen = params->color_write_disable[1];
+ blend.Entry[i].WriteDisableBlue = params->color_write_disable[2];
+ blend.Entry[i].WriteDisableAlpha = params->color_write_disable[3];
+  }
+   }
 
 #if GEN_GEN >= 7
blorp_emit(batch, GENX(3DSTATE_BLEND_STATE_POINTERS), sp) {
@@ -942,18 +947,12 @@ static uint32_t
 blorp_emit_color_calc_state(struct blorp_batch *batch,
 const struct blorp_params *params)
 {
-   struct GENX(COLOR_CALC_STATE) cc = { 0 };
-
+   uint32_t offset;
+   blorp_emit_dynamic(batch, GENX(COLOR_CALC_STATE), cc, 64, &offset) {
 #if GEN_GEN <= 8
-   cc.StencilReferenceValue = params->stencil_ref;
+  cc.StencilReferenceValue = params->stencil_ref;
 #endif
-
-   uint32_t offset;
-   void *state = blorp_alloc_dynamic_state(batch,
-   GENX(COLOR_CALC_STATE_length) * 4,
-   64, &offset);
-   GENX(COLOR_CALC_STATE_pack)(NULL, state, &cc);
-   blorp_flush_range(batch, state, GENX(COLOR_CALC_STATE_length) * 4);
+   }
 
 #if GEN_GEN >= 7
blorp_emit(batch, GENX(3DSTATE_CC_STATE_POINTERS), sp) {
@@ -1179,31 +1178,25 @@ static void
 blorp_emit_sampler_state(struct blorp_batch *batch,
  const struct blorp_params *params)
 {
-   struct GENX(SAMPLER_STATE) sampler = {
-  .MipModeFilter = MIPFILTER_NONE,
-  .MagModeFilter = MAPFILTER_LINEAR,
-  .MinModeFilter = MAPFILTER_LINEAR,
-  .MinLOD = 0,
-  .MaxLOD = 0,
-  .TCXAddressControlMode = TCM_CLAMP,
-  .TCYAddressControlMode = TCM_CLAMP,
-  .TCZAddressControlMode = TCM_CLAMP,
-  .MaximumAnisotropy = RATIO21,
-  .RAddressMinFilterRoundingEnable = true,
-  .RAddressMagFilterRoundingEnable = true,
-  .VAddressMinFilterRoundingEnable = true,
-  .VAddressMagFilterRoundingEnable = true,
-  .UAddressMinFilterRoundingEnable = true,
-  .UAddressMagFilterRoundingEnable = true,
-  .NonnormalizedCoordinateEnable = true,
-   };
-
uint32_t offset;
-   void *state = blorp_alloc_dynamic_state(batch,
-   GENX(SAMPLER_STATE_length) 

Re: [Mesa-dev] [PATCH] ac: add unreachable() in ac_build_image_opcode()

2017-04-10 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Fri, Apr 7, 2017 at 6:44 PM, Samuel Pitoiset
 wrote:
> To silence the following compiler warning:
>
> common/ac_llvm_build.c: In function ‘ac_build_image_opcode’:
> common/ac_llvm_build.c:1080:3: warning: ‘name’ may be used uninitialized in 
> this function [-Wmaybe-uninitialized]
>snprintf(intr_name, sizeof(intr_name), "%s%s%s%s.v4f32.%s.v8i32",
>^
> name,
> ~
> a->compare ? ".c" : "",
> ~~~
> a->bias ? ".b" :
> 
> a->lod ? ".l" :
> ~~~
> a->deriv ? ".d" :
> ~
> a->level_zero ? ".lz" : "",
> ~~~
> a->offset ? ".o" : "",
> ~~
> type);
> ~
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/common/ac_llvm_build.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
> index 5745fab05f..d45094c862 100644
> --- a/src/amd/common/ac_llvm_build.c
> +++ b/src/amd/common/ac_llvm_build.c
> @@ -1072,6 +1072,8 @@ LLVMValueRef ac_build_image_opcode(struct 
> ac_llvm_context *ctx,
> case ac_image_get_resinfo:
> name = "llvm.amdgcn.image.getresinfo";
> break;
> +   default:
> +   unreachable("invalid image opcode");
> }
>
> ac_build_type_name_for_intr(LLVMTypeOf(args[0]), type,
> --
> 2.12.2
>
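
A small sketch of why the added `default:` case quiets the warning, assuming a local `UNREACHABLE` macro and a made-up `image_op` enum (both hypothetical; Mesa has its own `unreachable()` macro): once every path through the switch either assigns `name` or never returns, the compiler can no longer find a path on which `name` is read uninitialized.

```c
#include <assert.h>
#include <stdlib.h>

/* Made-up opcodes and names, for illustration only. */
enum image_op {
    IMAGE_SAMPLE,
    IMAGE_LOAD,
    IMAGE_GET_RESINFO
};

/* Hypothetical local macro: asserts in debug builds and never returns. */
#define UNREACHABLE(msg) do { assert(!msg); abort(); } while (0)

static const char *image_op_name(enum image_op op)
{
    const char *name;

    switch (op) {
    case IMAGE_SAMPLE:
        name = "sample";
        break;
    case IMAGE_LOAD:
        name = "load";
        break;
    case IMAGE_GET_RESINFO:
        name = "getresinfo";
        break;
    default:
        /* Without this case, an out-of-range value would fall through
         * the switch and leave `name` unset. */
        UNREACHABLE("invalid image opcode");
    }

    return name;
}
```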


Re: [Mesa-dev] [PATCH] radeonsi: fix gl_BaseVertex value in non-indexed draws

2017-04-10 Thread Marek Olšák
Hi Nicolai,

I think there is a simpler way to do this. Instead of going through
update_shaders, we can just set some bit in a user data SGPR e.g.
SI_SGPR_VS_STATE_BITS[1] and the vertex shader can clear gl_BaseVertex
based on that bit. There is no performance concern due to additional
instructions, because gl_BaseVertex is unlikely to be used.
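
A rough sketch of that idea in plain C, assuming a hypothetical bit position and helper names (`VS_STATE_INDEXED_BIT`, `set_indexed_bit` and `shader_base_vertex` are not radeonsi code, and the bit could equally be defined with the opposite meaning): the driver records the draw type in a state word, and the shader-side logic selects either the hardware start offset or 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position; the mail only suggests "some bit",
 * e.g. bit 1 of the VS state word. */
#define VS_STATE_INDEXED_BIT (1u << 1)

/* Driver side: record whether the current draw is indexed, leaving the
 * other bits of the state word untouched. */
static void set_indexed_bit(uint32_t *state_bits, bool indexed)
{
    assert(state_bits != NULL);

    if (indexed)
        *state_bits |= VS_STATE_INDEXED_BIT;
    else
        *state_bits &= ~VS_STATE_INDEXED_BIT;
}

/* Shader side, written as plain C: the value exposed as gl_BaseVertex is
 * the hardware start offset for indexed draws and 0 otherwise. */
static int32_t shader_base_vertex(uint32_t state_bits, int32_t hw_base_vertex)
{
    return (state_bits & VS_STATE_INDEXED_BIT) ? hw_base_vertex : 0;
}
```

The selection costs one mask and one select, and only in shaders that actually read gl_BaseVertex.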

Marek


On Sat, Apr 8, 2017 at 12:41 AM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> gl_BaseVertex is supposed to be 0 in non-indexed draws. Unfortunately, the
> way they're implemented, the VGT always generates indices starting at 0,
> and the VS prolog adds the start index.
>
> There's a VGT_INDX_OFFSET register which causes the VGT to start at a
> driver-defined index. However, this register cannot be written from
> indirect draws.
>
> So fix this unlikely case in the VS prolog.
>
> Fixes a bug in
> KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.*
> ---
>  src/gallium/drivers/radeonsi/si_pipe.h  |  1 +
>  src/gallium/drivers/radeonsi/si_shader.c| 17 +
>  src/gallium/drivers/radeonsi/si_shader.h|  1 +
>  src/gallium/drivers/radeonsi/si_state_draw.c|  5 +
>  src/gallium/drivers/radeonsi/si_state_shaders.c |  2 ++
>  5 files changed, 26 insertions(+)
>
> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
> b/src/gallium/drivers/radeonsi/si_pipe.h
> index daf2932..ecf0f41 100644
> --- a/src/gallium/drivers/radeonsi/si_pipe.h
> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
> @@ -343,20 +343,21 @@ struct si_context {
> int last_sh_base_reg;
> int last_primitive_restart_en;
> int last_restart_index;
> int last_gs_out_prim;
> int last_prim;
> int last_multi_vgt_param;
> int last_rast_prim;
> unsignedlast_sc_line_stipple;
> enum pipe_prim_type current_rast_prim; /* primitive type after 
> TES, GS */
> boolgs_tri_strip_adj_fix;
> +   boolcurrent_indexed;
>
> /* Scratch buffer */
> struct r600_atomscratch_state;
> struct r600_resource*scratch_buffer;
> unsignedscratch_waves;
> unsignedspi_tmpring_size;
>
> struct r600_resource*compute_scratch_buffer;
>
> /* Emitted derived tessellation state. */
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
> b/src/gallium/drivers/radeonsi/si_shader.c
> index f5f86f9..e76ee05 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -7871,20 +7871,37 @@ static void si_build_vs_prolog_function(struct 
> si_shader_context *ctx,
> index = LLVMBuildAdd(gallivm->builder,
>  LLVMGetParam(func, 
> ctx->param_vertex_id),
>  LLVMGetParam(func, 
> SI_SGPR_BASE_VERTEX), "");
> }
>
> index = LLVMBuildBitCast(gallivm->builder, index, ctx->f32, 
> "");
> ret = LLVMBuildInsertValue(gallivm->builder, ret, index,
>num_params++, "");
> }
>
> +   /* For DrawArrays(Indirect) and variants, the basevertex loaded into
> +* the SGPR is the 'first' parameter of the draw call. However, the
> +* value returned as gl_BaseVertex to the VS should be 0.
> +*/
> +   if (key->vs_prolog.states.clear_basevertex) {
> +   LLVMValueRef index;
> +
> +   index = LLVMBuildAdd(gallivm->builder,
> +LLVMGetParam(func, ctx->param_vertex_id),
> +LLVMGetParam(func, SI_SGPR_BASE_VERTEX), 
> "");
> +   index = LLVMBuildBitCast(gallivm->builder, index, ctx->f32, 
> "");
> +   ret = LLVMBuildInsertValue(gallivm->builder, ret, index,
> +  ctx->param_vertex_id, "");
> +   ret = LLVMBuildInsertValue(gallivm->builder, ret, ctx->i32_0,
> +  SI_SGPR_BASE_VERTEX, "");
> +   }
> +
> si_llvm_build_ret(ctx, ret);
>  }
>
>  /**
>   * Build the vertex shader epilog function. This is also used by the 
> tessellation
>   * evaluation shader compiled as VS.
>   *
>   * The input is PrimitiveID.
>   *
>   * If PrimitiveID is required by the pixel shader, export it.
> diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
> b/src/gallium/drivers/radeonsi/si_shader.h
> index 17ffc5d..a3fcb42 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.h
> +++ b/src/gallium/drivers/radeonsi/si_shader.h
> @@ -334,20 +334,21 @@ struct si_shader_selector {
>   *  | | 

[Mesa-dev] [PATCH v3 8/9] nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*

2017-04-10 Thread Boyan Ding
v2: Check if each channel is masked in TGSI_OPCODE_BALLOT (Ilia Mirkin)

Signed-off-by: Boyan Ding 
---
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 31 ++
 1 file changed, 31 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 1bd01a9a32..92cc13d611 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -978,6 +978,10 @@ static nv50_ir::operation translateOpcode(uint opcode)
NV50_IR_OPCODE_CASE(VOTE_ANY, VOTE);
NV50_IR_OPCODE_CASE(VOTE_EQ, VOTE);
 
+   NV50_IR_OPCODE_CASE(BALLOT, VOTE);
+   NV50_IR_OPCODE_CASE(READ_INVOC, SHFL);
+   NV50_IR_OPCODE_CASE(READ_FIRST, SHFL);
+
NV50_IR_OPCODE_CASE(END, EXIT);
 
default:
@@ -3431,6 +3435,33 @@ Converter::handleInstruction(const struct 
tgsi_full_instruction *insn)
  mkCvt(OP_CVT, TYPE_U32, dst0[c], TYPE_U8, val0);
   }
   break;
+   case TGSI_OPCODE_BALLOT:
+  if (!tgsi.getDst(0).isMasked(0)) {
+ val0 = new_LValue(func, FILE_PREDICATE);
+ mkCmp(OP_SET, CC_NE, TYPE_U32, val0, TYPE_U32, fetchSrc(0, 0), zero);
+ mkOp1(op, TYPE_U32, dst0[0], val0)->subOp = NV50_IR_SUBOP_VOTE_ANY;
+  }
+  if (!tgsi.getDst(0).isMasked(1))
+ mkMov(dst0[1], zero, TYPE_U32);
+  break;
+   case TGSI_OPCODE_READ_FIRST:
+  // ReadFirstInvocationARB(src) is implemented as
+  // ReadInvocationARB(src, findLSB(ballot(true)))
+  val0 = getScratch();
+  mkOp1(OP_VOTE, TYPE_U32, val0, mkImm(1))->subOp = NV50_IR_SUBOP_VOTE_ANY;
+  mkOp2(OP_EXTBF, TYPE_U32, val0, val0, mkImm(0x2000))
+ ->subOp = NV50_IR_SUBOP_EXTBF_REV;
+  mkOp1(OP_BFIND, TYPE_U32, val0, val0)->subOp = NV50_IR_SUBOP_BFIND_SAMT;
+  src1 = val0;
+  /* fallthrough */
+   case TGSI_OPCODE_READ_INVOC:
+  if (tgsi.getOpcode() == TGSI_OPCODE_READ_INVOC)
+ src1 = fetchSrc(1, 0);
+  FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
+ geni = mkOp3(op, dstTy, dst0[c], fetchSrc(0, c), src1, mkImm(0x1f));
+ geni->subOp = NV50_IR_SUBOP_SHFL_IDX;
+  }
+  break;
case TGSI_OPCODE_CLOCK:
   // Stick the 32-bit clock into the high dword of the logical result.
   if (!tgsi.getDst(0).isMasked(0))
-- 
2.12.1



[Mesa-dev] [PATCH v3 9/9] nvc0: Enable ARB_shader_ballot on Kepler+

2017-04-10 Thread Boyan Ding
readInvocationARB() and readFirstInvocationARB() need the SHFL.IDX
instruction, which was introduced in Kepler.
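
For readers unfamiliar with the two built-ins, here is a rough host-side
model of what they compute over a 32-lane subgroup. It follows the
decomposition used elsewhere in this series, readFirstInvocationARB(src) =
readInvocationARB(src, findLSB(ballot(true))). All function names below are
hypothetical, and the model ignores how SHFL.IDX is actually encoded.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 32;  // subgroup size assumed by this model

// ballot(true): every active lane contributes a set bit, so the result is
// simply the mask of active lanes.
inline uint32_t ballot_true(uint32_t active_mask)
{
   return active_mask;
}

// Index of the lowest set bit; the mask must be non-zero.
inline int find_lsb(uint32_t v)
{
   assert(v != 0);
   int i = 0;
   while (!(v & 1u)) {
      v >>= 1;
      ++i;
   }
   return i;
}

// readInvocationARB: every lane reads the value held by lane 'lane'.
inline uint32_t read_invocation(const std::array<uint32_t, kLanes> &values,
                                int lane)
{
   assert(lane >= 0 && lane < kLanes);
   return values[lane];
}

// readFirstInvocationARB: read from the lowest-numbered active lane.
inline uint32_t read_first_invocation(
   const std::array<uint32_t, kLanes> &values, uint32_t active_mask)
{
   return read_invocation(values, find_lsb(ballot_true(active_mask)));
}
```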

Reviewed-by: Ilia Mirkin 

Signed-off-by: Boyan Ding 
---
 docs/features.txt  | 2 +-
 docs/relnotes/17.1.0.html  | 2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index edc56842b9..a2d7785827 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -292,7 +292,7 @@ Khronos, ARB, and OES extensions that are not part of any 
OpenGL or OpenGL ES ve
   GL_ARB_sample_locations   not started
   GL_ARB_seamless_cubemap_per_texture   DONE (i965, nvc0, 
radeonsi, r600, softpipe, swr)
   GL_ARB_shader_atomic_counter_ops  DONE (i965/gen7+, 
nvc0, radeonsi, softpipe)
-  GL_ARB_shader_ballot  DONE (radeonsi)
+  GL_ARB_shader_ballot  DONE (nvc0, radeonsi)
   GL_ARB_shader_clock   DONE (i965/gen7+, 
nv50, nvc0, radeonsi)
   GL_ARB_shader_draw_parameters DONE (i965, nvc0, 
radeonsi)
   GL_ARB_shader_group_vote  DONE (nvc0, radeonsi)
diff --git a/docs/relnotes/17.1.0.html b/docs/relnotes/17.1.0.html
index 0a5cabe4f1..8f237ed527 100644
--- a/docs/relnotes/17.1.0.html
+++ b/docs/relnotes/17.1.0.html
@@ -45,7 +45,7 @@ Note: some of the new features are only available with 
certain drivers.
 
 
 GL_ARB_gpu_shader_int64 on i965/gen8+, nvc0, radeonsi, softpipe, 
llvmpipe
-GL_ARB_shader_ballot on radeonsi
+GL_ARB_shader_ballot on nvc0, radeonsi
 GL_ARB_shader_clock on nv50, nvc0, radeonsi
 GL_ARB_shader_group_vote on radeonsi
 GL_ARB_sparse_buffer on radeonsi/CIK+
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 7ef9bf9c9c..8c6712a121 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -259,6 +259,8 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
   return class_3d >= NVE4_3D_CLASS; /* needs testing on fermi */
case PIPE_CAP_POLYGON_MODE_FILL_RECTANGLE:
   return class_3d >= GM200_3D_CLASS;
+   case PIPE_CAP_TGSI_BALLOT:
+  return class_3d >= NVE4_3D_CLASS;
 
/* unsupported caps */
case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
@@ -289,7 +291,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY:
case PIPE_CAP_INT64_DIVMOD:
case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE:
-   case PIPE_CAP_TGSI_BALLOT:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
-- 
2.12.1



[Mesa-dev] [PATCH v3 5/9] nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE

2017-04-10 Thread Boyan Ding
Implementing readFirstInvocationARB() on NVIDIA hardware needs a
ballotARB(true) to determine the first active thread. This is expressed
in gm107 asm as (supposing the output is $r0):
vote any $r0 0x1 0x1

To model the always-true input, which corresponds to the second 0x1
above, we make OP_VOTE accept an immediate value of 1 or 0 and emit "0x1"
or "not 0x1" in the src field, respectively.

v2: Make sure that asImm() is not NULL (Samuel Pitoiset)

v3: (Ilia Mirkin)
Make the handling more symmetric with predicate version in gm107
Use i->getSrc(s)
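
The immediate handling described in this commit message can be sketched
as follows. The sketch assumes the source operand is a 4-bit field made of
a 3-bit predicate id plus a NOT bit, and that predicate 7 is the
always-true predicate; that reading is inferred from the 0x7 / 0xf
constants in the nvc0 and gk110 hunks below, and the names are
hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kPredTrue = 7u;      // assumed "always true" predicate id
constexpr uint32_t kPredNot  = 1u << 3; // assumed NOT modifier bit

// Map an OP_VOTE immediate source (0 or 1) to the 4-bit source field:
//   1 -> PT     (0x7)
//   0 -> not PT (0xf)
inline uint32_t vote_src_field(uint32_t imm)
{
   assert(imm == 0 || imm == 1);
   return kPredTrue | (imm == 0 ? kPredNot : 0u);
}
```

Each emitter then shifts this value to its own bit position, for example
bit 20 of code[0] in the nvc0 hunk.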

Signed-off-by: Boyan Ding 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 24 ++
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 23 ++---
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 24 ++
 3 files changed, 60 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 2a6c773ba2..f2efb0c60b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -1621,7 +1621,8 @@ CodeEmitterGK110::emitSHFL(const Instruction *i)
 void
 CodeEmitterGK110::emitVOTE(const Instruction *i)
 {
-   assert(i->src(0).getFile() == FILE_PREDICATE);
+   const ImmediateValue *imm;
+   uint32_t u32;
 
code[0] = 0x0002;
code[1] = 0x86c0 | (i->subOp << 19);
@@ -1646,9 +1647,24 @@ CodeEmitterGK110::emitVOTE(const Instruction *i)
   code[0] |= 255 << 2;
if (!(rp & 2))
   code[1] |= 7 << 16;
-   if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
-  code[1] |= 1 << 13;
-   srcId(i->src(0), 42);
+
+   switch (i->src(0).getFile()) {
+   case FILE_PREDICATE:
+  if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
+ code[0] |= 1 << 13;
+  srcId(i->src(0), 42);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->getSrc(0)->asImm();
+  assert(imm);
+  u32 = imm->reg.data.u32;
+  assert(u32 == 0 || u32 == 1);
+  code[1] |= (u32 == 1 ? 0x7 : 0xf) << 10;
+  break;
+   default:
+  assert(!"Unhandled src");
+  break;
+   }
 }
 
 void
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index 944563c93c..b164526556 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -2931,7 +2931,8 @@ CodeEmitterGM107::emitMEMBAR()
 void
 CodeEmitterGM107::emitVOTE()
 {
-   assert(insn->src(0).getFile() == FILE_PREDICATE);
+   const ImmediateValue *imm;
+   uint32_t u32;
 
int r = -1, p = -1;
for (int i = 0; insn->defExists(i); i++) {
@@ -2951,8 +2952,24 @@ CodeEmitterGM107::emitVOTE()
   emitPRED (0x2d, insn->def(p));
else
   emitPRED (0x2d);
-   emitField(0x2a, 1, insn->src(0).mod == Modifier(NV50_IR_MOD_NOT));
-   emitPRED (0x27, insn->src(0));
+
+   switch (insn->src(0).getFile()) {
+   case FILE_PREDICATE:
+  emitField(0x2a, 1, insn->src(0).mod == Modifier(NV50_IR_MOD_NOT));
+  emitPRED (0x27, insn->src(0));
+  break;
+   case FILE_IMMEDIATE:
+  imm = insn->getSrc(0)->asImm();
+  assert(imm);
+  u32 = imm->reg.data.u32;
+  assert(u32 == 0 || u32 == 1);
+  emitPRED(0x27);
+  emitField(0x2a, 1, u32 == 0);
+  break;
+   default:
+  assert(!"Unhandled src");
+  break;
+   }
 }
 
 void
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index f4c39a168b..5ca8672054 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -2583,7 +2583,8 @@ CodeEmitterNVC0::emitSHFL(const Instruction *i)
 void
 CodeEmitterNVC0::emitVOTE(const Instruction *i)
 {
-   assert(i->src(0).getFile() == FILE_PREDICATE);
+   const ImmediateValue *imm;
+   uint32_t u32;
 
code[0] = 0x0004 | (i->subOp << 5);
code[1] = 0x4800;
@@ -2608,9 +2609,24 @@ CodeEmitterNVC0::emitVOTE(const Instruction *i)
   code[0] |= 63 << 14;
if (!(rp & 2))
   code[1] |= 7 << 22;
-   if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
-  code[0] |= 1 << 23;
-   srcId(i->src(0), 20);
+
+   switch (i->src(0).getFile()) {
+   case FILE_PREDICATE:
+  if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
+ code[0] |= 1 << 23;
+  srcId(i->src(0), 20);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->getSrc(0)->asImm();
+  assert(imm);
+  u32 = imm->reg.data.u32;
+  assert(u32 == 0 || u32 == 1);
+  code[0] |= (u32 == 1 ? 0x7 : 0xf) << 20;
+  break;
+   default:
+  assert(!"Unhandled src");
+  break;
+   }
 }
 
 bool
-- 
2.12.1


[Mesa-dev] [PATCH v3 2/9] nvc0/ir: Properly handle a "split form" of predicate destination

2017-04-10 Thread Boyan Ding
GF100's ISA encoding has a weird form of predicate destination where its
3 bits are split across the whole instruction. Use a dedicated setPDSTL
function instead of the original defId, which is incorrect in this case.
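
As an illustration of the split encoding, the following standalone sketch
packs and unpacks a 3-bit predicate id using the same bit positions as the
setPDSTL() hunk below (low two bits at bits 8-9 of code[0], high bit at
bit 26 of code[1]). The helper names are hypothetical and the unpack
routine is only there to check the round trip.

```cpp
#include <cassert>
#include <cstdint>

// Scatter a 3-bit predicate id across the two instruction words.
inline void pack_split_pdst(uint32_t code[2], uint32_t pred)
{
   assert(pred < 8);
   code[0] |= (pred & 3) << 8;         // bits 0-1 -> code[0] bits 8-9
   code[1] |= (pred & 4) << (26 - 2);  // bit 2    -> code[1] bit 26
}

// Gather the three bits back into a predicate id.
inline uint32_t unpack_split_pdst(const uint32_t code[2])
{
   return ((code[0] >> 8) & 3) | (((code[1] >> 26) & 1) << 2);
}
```

Passing 7 corresponds to the "no predicate output" case that setPDSTL()
selects when d is negative.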

v2: (Ilia Mirkin)
Change API of setPDSTL() to handle cases of no output
Fix setting of the highest bit in setPDSTL()

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Boyan Ding 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5467447e35..a578e947ec 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -58,6 +58,7 @@ private:
void setImmediateS8(const ValueRef&);
void setSUConst16(const Instruction *, const int s);
void setSUPred(const Instruction *, const int s);
+   void setPDSTL(const Instruction *, const int d);
 
void emitCondCode(CondCode cc, int pos);
void emitInterpMode(const Instruction *);
@@ -375,6 +376,16 @@ void CodeEmitterNVC0::setImmediateS8(const ValueRef )
code[0] |= (s8 >> 6) << 8;
 }
 
+void CodeEmitterNVC0::setPDSTL(const Instruction *i, const int d)
+{
+   assert(d < 0 || (i->defExists(d) && i->def(d).getFile() == FILE_PREDICATE));
+
+   uint32_t pred = d >= 0 ? DDATA(i->def(d)).id : 7;
+
+   code[0] |= (pred & 3) << 8;
+   code[1] |= (pred & 4) << (26 - 2);
+}
+
 void
 CodeEmitterNVC0::emitForm_A(const Instruction *i, uint64_t opc)
 {
@@ -1873,7 +1884,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
   if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
   i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
  assert(i->defExists(0));
- defId(i->def(0), 8);
+ setPDSTL(i, 0);
   }
}
 
@@ -1945,7 +1956,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
 
if (p >= 0) {
   if (targ->getChipset() >= NVISA_GK104_CHIPSET)
- defId(i->def(p), 8);
+ setPDSTL(i, p);
   else
  defId(i->def(p), 32 + 18);
}
-- 
2.12.1



[Mesa-dev] [PATCH v3 6/9] nvc0/ir: Add SV_LANEMASK_* system values.

2017-04-10 Thread Boyan Ding
v2: Add name strings in nv50_ir_print.cpp (Ilia Mirkin)

Signed-off-by: Boyan Ding 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  | 5 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 5 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 5 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 5 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp  | 5 +
 5 files changed, 25 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 6e5ffa525d..de6c110536 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -470,6 +470,11 @@ enum SVSemantic
SV_BASEINSTANCE,
SV_DRAWID,
SV_WORK_DIM,
+   SV_LANEMASK_EQ,
+   SV_LANEMASK_LT,
+   SV_LANEMASK_LE,
+   SV_LANEMASK_GT,
+   SV_LANEMASK_GE,
SV_UNDEFINED,
SV_LAST
 };
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index f2efb0c60b..370427d0d1 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -2300,6 +2300,11 @@ CodeEmitterGK110::getSRegEncoding(const ValueRef& ref)
case SV_NCTAID:return 0x2d + SDATA(ref).sv.index;
case SV_LBASE: return 0x34;
case SV_SBASE: return 0x30;
+   case SV_LANEMASK_EQ:   return 0x38;
+   case SV_LANEMASK_LT:   return 0x39;
+   case SV_LANEMASK_LE:   return 0x3a;
+   case SV_LANEMASK_GT:   return 0x3b;
+   case SV_LANEMASK_GE:   return 0x3c;
case SV_CLOCK: return 0x50 + SDATA(ref).sv.index;
default:
   assert(!"no sreg for system value");
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index b164526556..8b58df49c2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -269,6 +269,11 @@ CodeEmitterGM107::emitSYS(int pos, const Value *val)
case SV_INVOCATION_INFO: id = 0x1d; break;
case SV_TID: id = 0x21 + val->reg.data.sv.index; break;
case SV_CTAID  : id = 0x25 + val->reg.data.sv.index; break;
+   case SV_LANEMASK_EQ: id = 0x38; break;
+   case SV_LANEMASK_LT: id = 0x39; break;
+   case SV_LANEMASK_LE: id = 0x3a; break;
+   case SV_LANEMASK_GT: id = 0x3b; break;
+   case SV_LANEMASK_GE: id = 0x3c; break;
case SV_CLOCK  : id = 0x50 + val->reg.data.sv.index; break;
default:
   assert(!"invalid system value");
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5ca8672054..14c00bd187 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1991,6 +1991,11 @@ CodeEmitterNVC0::getSRegEncoding(const ValueRef& ref)
case SV_NCTAID:return 0x2d + SDATA(ref).sv.index;
case SV_LBASE: return 0x34;
case SV_SBASE: return 0x30;
+   case SV_LANEMASK_EQ:   return 0x38;
+   case SV_LANEMASK_LT:   return 0x39;
+   case SV_LANEMASK_LE:   return 0x3a;
+   case SV_LANEMASK_GT:   return 0x3b;
+   case SV_LANEMASK_GE:   return 0x3c;
case SV_CLOCK: return 0x50 + SDATA(ref).sv.index;
default:
   assert(!"no sreg for system value");
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index 19b11642b5..f5253b3745 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -323,6 +323,11 @@ static const char *SemanticStr[SV_LAST + 1] =
"BASEINSTANCE",
"DRAWID",
"WORK_DIM",
+   "LANEMASK_EQ",
+   "LANEMASK_LT",
+   "LANEMASK_LE",
+   "LANEMASK_GT",
+   "LANEMASK_GE",
"?",
"(INVALID)"
 };
-- 
2.12.1



[Mesa-dev] [PATCH v3 7/9] nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*

2017-04-10 Thread Boyan Ding
Reviewed-by: Ilia Mirkin 

Signed-off-by: Boyan Ding 
---
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 27 ++
 1 file changed, 27 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 3ed7d345c4..1bd01a9a32 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -450,6 +450,12 @@ static nv50_ir::SVSemantic translateSysVal(uint sysval)
case TGSI_SEMANTIC_BASEINSTANCE: return nv50_ir::SV_BASEINSTANCE;
case TGSI_SEMANTIC_DRAWID: return nv50_ir::SV_DRAWID;
case TGSI_SEMANTIC_WORK_DIM:   return nv50_ir::SV_WORK_DIM;
+   case TGSI_SEMANTIC_SUBGROUP_INVOCATION: return nv50_ir::SV_LANEID;
+   case TGSI_SEMANTIC_SUBGROUP_EQ_MASK: return nv50_ir::SV_LANEMASK_EQ;
+   case TGSI_SEMANTIC_SUBGROUP_LT_MASK: return nv50_ir::SV_LANEMASK_LT;
+   case TGSI_SEMANTIC_SUBGROUP_LE_MASK: return nv50_ir::SV_LANEMASK_LE;
+   case TGSI_SEMANTIC_SUBGROUP_GT_MASK: return nv50_ir::SV_LANEMASK_GT;
+   case TGSI_SEMANTIC_SUBGROUP_GE_MASK: return nv50_ir::SV_LANEMASK_GE;
default:
   assert(0);
   return nv50_ir::SV_CLOCK;
@@ -1667,6 +1673,8 @@ private:
Symbol *srcToSym(tgsi::Instruction::SrcRegister, int c);
Symbol *dstToSym(tgsi::Instruction::DstRegister, int c);
 
+   bool isSubGroupMask(uint8_t semantic);
+
bool handleInstruction(const struct tgsi_full_instruction *);
void exportOutputs();
inline Subroutine *getSubroutine(unsigned ip);
@@ -1996,6 +2004,21 @@ Converter::adjustTempIndex(int arrayId, int , int 
) const
idx += it->second;
 }
 
+bool
+Converter::isSubGroupMask(uint8_t semantic)
+{
+   switch (semantic) {
+  case TGSI_SEMANTIC_SUBGROUP_EQ_MASK:
+  case TGSI_SEMANTIC_SUBGROUP_LT_MASK:
+  case TGSI_SEMANTIC_SUBGROUP_LE_MASK:
+  case TGSI_SEMANTIC_SUBGROUP_GT_MASK:
+  case TGSI_SEMANTIC_SUBGROUP_GE_MASK:
+ return true;
+  default:
+ return false;
+   }
+}
+
 Value *
 Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
 {
@@ -2041,6 +2064,10 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, 
int c, Value *ptr)
   if (info->sv[idx].sn == TGSI_SEMANTIC_THREAD_ID &&
   info->prop.cp.numThreads[swz] == 1)
  return loadImm(NULL, 0u);
+  if (isSubGroupMask(info->sv[idx].sn) && swz > 0)
+ return loadImm(NULL, 0u);
+  if (info->sv[idx].sn == TGSI_SEMANTIC_SUBGROUP_SIZE)
+ return loadImm(NULL, 32u);
   ld = mkOp1(OP_RDSV, TYPE_U32, getSSA(), srcToSym(src, c));
   ld->perPatch = info->sv[idx].patch;
   return ld->getDef(0);
-- 
2.12.1



[Mesa-dev] [PATCH v3 3/9] nvc0/ir: Emit OP_SHFL

2017-04-10 Thread Boyan Ding
v2: (Samuel Pitoiset)
Add an assertion to check if the target is Kepler
Make sure that asImm() is not NULL

v3: (Ilia Mirkin)
Check the range of immediate value of OP_SHFL
Use the new setPDSTL API

Signed-off-by: Boyan Ding 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 53 ++
 1 file changed, 53 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index a578e947ec..f4c39a168b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -150,6 +150,8 @@ private:
 
void emitPIXLD(const Instruction *);
 
+   void emitSHFL(const Instruction *);
+
void emitVOTE(const Instruction *);
 
inline void defId(const ValueDef&, const int pos);
@@ -2531,6 +2533,54 @@ CodeEmitterNVC0::emitPIXLD(const Instruction *i)
 }
 
 void
+CodeEmitterNVC0::emitSHFL(const Instruction *i)
+{
+   const ImmediateValue *imm;
+
+   assert(targ->getChipset() >= NVISA_GK104_CHIPSET);
+
+   code[0] = 0x0005;
+   code[1] = 0x8800 | (i->subOp << 23);
+
+   emitPredicate(i);
+
+   defId(i->def(0), 14);
+   srcId(i->src(0), 20);
+
+   switch (i->src(1).getFile()) {
+   case FILE_GPR:
+  srcId(i->src(1), 26);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->getSrc(1)->asImm();
+  assert(imm && imm->reg.data.u32 < 0x20);
+  code[0] |= imm->reg.data.u32 << 26;
+  code[0] |= 1 << 5;
+  break;
+   default:
+  assert(!"invalid src1 file");
+  break;
+   }
+
+   switch (i->src(2).getFile()) {
+   case FILE_GPR:
+  srcId(i->src(2), 49);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->getSrc(2)->asImm();
+  assert(imm && imm->reg.data.u32 < 0x2000);
+  code[1] |= imm->reg.data.u32 << 10;
+  code[0] |= 1 << 6;
+  break;
+   default:
+  assert(!"invalid src2 file");
+  break;
+   }
+
+   setPDSTL(i, i->defExists(1) ? 1 : -1);
+}
+
+void
 CodeEmitterNVC0::emitVOTE(const Instruction *i)
 {
assert(i->src(0).getFile() == FILE_PREDICATE);
@@ -2839,6 +2889,9 @@ CodeEmitterNVC0::emitInstruction(Instruction *insn)
case OP_PIXLD:
   emitPIXLD(insn);
   break;
+   case OP_SHFL:
+  emitSHFL(insn);
+  break;
case OP_VOTE:
   emitVOTE(insn);
   break;
-- 
2.12.1



[Mesa-dev] [PATCH v3 4/9] gk110/ir: Emit OP_SHFL

2017-04-10 Thread Boyan Ding
v2: Make sure that asImm() is not NULL (Samuel Pitoiset)

v3: Check the range of immediate in OP_SHFL (Ilia Mirkin)

Signed-off-by: Boyan Ding 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 56 ++
 1 file changed, 56 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 1121ae0912..2a6c773ba2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -135,6 +135,8 @@ private:
 
void emitFlow(const Instruction *);
 
+   void emitSHFL(const Instruction *);
+
void emitVOTE(const Instruction *);
 
void emitSULDGB(const TexInstruction *);
@@ -1566,6 +1568,57 @@ CodeEmitterGK110::emitFlow(const Instruction *i)
 }
 
 void
+CodeEmitterGK110::emitSHFL(const Instruction *i)
+{
+   const ImmediateValue *imm;
+
+   code[0] = 0x0002;
+   code[1] = 0x7880 | (i->subOp << 1);
+
+   emitPredicate(i);
+
+   defId(i->def(0), 2);
+   srcId(i->src(0), 10);
+
+   switch (i->src(1).getFile()) {
+   case FILE_GPR:
+  srcId(i->src(1), 23);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->getSrc(1)->asImm();
+  assert(imm && imm->reg.data.u32 < 0x20);
+  code[0] |= imm->reg.data.u32 << 23;
+  code[0] |= 1 << 31;
+  break;
+   default:
+  assert(!"invalid src1 file");
+  break;
+   }
+
+   switch (i->src(2).getFile()) {
+   case FILE_GPR:
+  srcId(i->src(2), 42);
+  break;
+   case FILE_IMMEDIATE:
+  imm = i->getSrc(2)->asImm();
+  assert(imm && imm->reg.data.u32 < 0x2000);
+  code[1] |= imm->reg.data.u32 << 5;
+  code[1] |= 1;
+  break;
+   default:
+  assert(!"invalid src2 file");
+  break;
+   }
+
+   if (!i->defExists(1))
+  code[1] |= 7 << 19;
+   else {
+  assert(i->def(1).getFile() == FILE_PREDICATE);
+  defId(i->def(1), 51);
+   }
+}
+
+void
 CodeEmitterGK110::emitVOTE(const Instruction *i)
 {
assert(i->src(0).getFile() == FILE_PREDICATE);
@@ -2642,6 +2695,9 @@ CodeEmitterGK110::emitInstruction(Instruction *insn)
case OP_CCTL:
   emitCCTL(insn);
   break;
+   case OP_SHFL:
+  emitSHFL(insn);
+  break;
case OP_VOTE:
   emitVOTE(insn);
   break;
-- 
2.12.1



[Mesa-dev] [PATCH v3 1/9] gm107/ir: Emit third src 'bound' and optional predicate output of SHFL

2017-04-10 Thread Boyan Ding
v2: Emit the original hard-coded 0x1c03 when OP_SHFL is used in gm107's
lowering (Samuel Pitoiset)

Signed-off-by: Boyan Ding 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 23 ++
 .../nouveau/codegen/nv50_ir_lowering_gm107.cpp | 15 +-
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index c3c0dcd9fc..944563c93c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -967,11 +967,26 @@ CodeEmitterGM107::emitSHFL()
   break;
}
 
-   /*XXX: what is this arg? hardcode immediate for now */
-   emitField(0x22, 13, 0x1c03);
-   type |= 2;
+   switch (insn->src(2).getFile()) {
+   case FILE_GPR:
+  emitGPR(0x27, insn->src(2));
+  break;
+   case FILE_IMMEDIATE:
+  emitIMMD(0x22, 13, insn->src(2));
+  type |= 2;
+  break;
+   default:
+  assert(!"invalid src2 file");
+  break;
+   }
+
+   if (!insn->defExists(1))
+  emitPRED(0x30);
+   else {
+  assert(insn->def(1).getFile() == FILE_PREDICATE);
+  emitPRED(0x30, insn->def(1));
+   }
 
-   emitPRED (0x30);
emitField(0x1e, 2, insn->subOp);
emitField(0x1c, 2, type);
emitGPR  (0x08, insn->src(0));
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp
index 371ebae40c..6b9edd4864 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp
@@ -41,6 +41,8 @@ namespace nv50_ir {
((QOP_##q << 6) | (QOP_##r << 4) |   \
 (QOP_##s << 2) | (QOP_##t << 0))
 
+#define SHFL_BOUND_QUAD 0x1c03
+
 void
 GM107LegalizeSSA::handlePFETCH(Instruction *i)
 {
@@ -120,7 +122,8 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i)
   // mov coordinates from lane l to all lanes
   bld.mkOp(OP_QUADON, TYPE_NONE, NULL);
   for (c = 0; c < dim; ++c) {
- bld.mkOp2(OP_SHFL, TYPE_F32, crd[c], i->getSrc(c + array), 
bld.mkImm(l));
+ bld.mkOp3(OP_SHFL, TYPE_F32, crd[c], i->getSrc(c + array),
+   bld.mkImm(l), bld.mkImm(SHFL_BOUND_QUAD));
  add = bld.mkOp2(OP_QUADOP, TYPE_F32, crd[c], crd[c], zero);
  add->subOp = 0x00;
  add->lanes = 1; /* abused for .ndv */
@@ -128,7 +131,8 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i)
 
   // add dPdx from lane l to lanes dx
   for (c = 0; c < dim; ++c) {
- bld.mkOp2(OP_SHFL, TYPE_F32, tmp, i->dPdx[c].get(), bld.mkImm(l));
+ bld.mkOp3(OP_SHFL, TYPE_F32, tmp, i->dPdx[c].get(), bld.mkImm(l),
+   bld.mkImm(SHFL_BOUND_QUAD));
  add = bld.mkOp2(OP_QUADOP, TYPE_F32, crd[c], tmp, crd[c]);
  add->subOp = qOps[l][0];
  add->lanes = 1; /* abused for .ndv */
@@ -136,7 +140,8 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i)
 
   // add dPdy from lane l to lanes dy
   for (c = 0; c < dim; ++c) {
- bld.mkOp2(OP_SHFL, TYPE_F32, tmp, i->dPdy[c].get(), bld.mkImm(l));
+ bld.mkOp3(OP_SHFL, TYPE_F32, tmp, i->dPdy[c].get(), bld.mkImm(l),
+   bld.mkImm(SHFL_BOUND_QUAD));
  add = bld.mkOp2(OP_QUADOP, TYPE_F32, crd[c], tmp, crd[c]);
  add->subOp = qOps[l][1];
  add->lanes = 1; /* abused for .ndv */
@@ -203,8 +208,8 @@ GM107LoweringPass::handleDFDX(Instruction *insn)
   break;
}
 
-   shfl = bld.mkOp2(OP_SHFL, TYPE_F32, bld.getScratch(),
-insn->getSrc(0), bld.mkImm(xid));
+   shfl = bld.mkOp3(OP_SHFL, TYPE_F32, bld.getScratch(), insn->getSrc(0),
+bld.mkImm(xid), bld.mkImm(SHFL_BOUND_QUAD));
shfl->subOp = NV50_IR_SUBOP_SHFL_BFLY;
insn->op = OP_QUADOP;
insn->subOp = qop;
-- 
2.12.1



[Mesa-dev] [PATCH 0/9] nvc0: ARB_shader_ballot for Kepler+ (v3)

2017-04-10 Thread Boyan Ding
This is the third, and hopefully the last, revision of the ballot series.
This series mainly incorporates Ilia's feedback, with some fixes, more
checks and code cleanup.

Please review.

Boyan Ding (9):
  gm107/ir: Emit third src 'bound' and optional predicate output of SHFL
  nvc0/ir: Properly handle a "split form" of predicate destination
  nvc0/ir: Emit OP_SHFL
  gk110/ir: Emit OP_SHFL
  nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE
  nvc0/ir: Add SV_LANEMASK_* system values.
  nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*
  nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*
  nvc0: Enable ARB_shader_ballot on Kepler+

 docs/features.txt  |  2 +-
 docs/relnotes/17.1.0.html  |  2 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  5 ++
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 85 ++-
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 51 ++--
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 97 --
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 58 +
 .../nouveau/codegen/nv50_ir_lowering_gm107.cpp | 15 ++--
 .../drivers/nouveau/codegen/nv50_ir_print.cpp  |  5 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |  3 +-
 10 files changed, 298 insertions(+), 25 deletions(-)

-- 
2.12.1



[Mesa-dev] Backporting bufmgr fixes to libdrm_intel (Was Re: [PATCH 6/9] i965/bufmgr: Garbage-collect vma cache/pruning)

2017-04-10 Thread Emil Velikov
Hi all,

On 10 April 2017 at 08:18, Kenneth Graunke  wrote:
> From: Daniel Vetter 
>
> This was done because the kernel has 1 global address space, shared
> with all render clients, for gtt mmap offsets, and that address space
> was only 32bit on 32bit kernels.
>
> This was fixed  in
>
> commit 440fd5283a87345cdd4237bdf45fb01130ea0056
> Author: Thierry Reding 
> Date:   Fri Jan 23 09:05:06 2015 +0100
>
> drm/mm: Support 4 GiB and larger ranges
>
> which shipped in 4.0. Of course you still want to limit the bo cache
> to a reasonable size on 32bit apps to avoid ENOMEM, but that's better
> solved by tuning the cache a bit. On 64bit, this was never an issue.
>
While this patch is _not_ a bugfix, it inspired an interesting question/topic:

Do we want to backport fixes from mesa's bufmgr to libdrm_intel?

Or in general what's the plan about the library - leave it as-is, sync
fixes, remove it, other.
Can we have the decision documented somewhere, please?

After all: good science/engineering is good documentation.

Thanks
Emil


[Mesa-dev] [Bug 100613] Regression in Mesa 17 on s390x (zSystems)

2017-04-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=100613

Vedran Miletić  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ved...@miletic.net

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [PATCH kmscube 1/2] gst-decoder.c: Only gst_is_dmabuf_memory() once

2017-04-10 Thread Carlos Rafael Giani
This prevents potential segfaults in case the buffer was merged and the
mem pointer is then no longer valid

Signed-off-by: Carlos Rafael Giani 
---
 gst-decoder.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gst-decoder.c b/gst-decoder.c
index e59148a..cc5c3b2 100644
--- a/gst-decoder.c
+++ b/gst-decoder.c
@@ -360,6 +360,7 @@ buffer_to_image(struct decoder *dec, GstBuffer *buf)
guint nplanes = GST_VIDEO_INFO_N_PLANES(&(dec->info));
guint i;
guint width, height;
+   gboolean is_dmabuf_mem;
GstMemory *mem;
int dmabuf_fd = -1;
 
@@ -379,10 +380,14 @@ buffer_to_image(struct decoder *dec, GstBuffer *buf)
EGL_DMA_BUF_PLANE2_PITCH_EXT,
};
 
+   /* Query gst_is_dmabuf_memory() here, since the gstmemory
+* block might get merged below by gst_buffer_map(), meaning
+* that the mem pointer would become invalid */
mem = gst_buffer_peek_memory(buf, 0);
+   is_dmabuf_mem = gst_is_dmabuf_memory(mem);
 
if (nmems > 1) {
-   if (gst_is_dmabuf_memory(mem)) {
+   if (is_dmabuf_mem) {
/* this case currently is not defined */
 
GST_FIXME("gstbuffers with multiple memory blocks and DMABUF "
@@ -395,7 +400,7 @@ buffer_to_image(struct decoder *dec, GstBuffer *buf)
 */
}
 
-   if (gst_is_dmabuf_memory(mem)) {
+   if (is_dmabuf_mem) {
dmabuf_fd = dup(gst_dmabuf_memory_get_fd(mem));
} else {
GstMapInfo map_info;
@@ -447,7 +452,7 @@ buffer_to_image(struct decoder *dec, GstBuffer *buf)
printf("GStreamer video stream information:\n");
printf("  size: %u x %u pixel\n", width, height);
printf("  pixel format: %s  number of planes: %u\n", pixfmt_str, nplanes);
-   printf("  can use zero-copy: %s\n", yesno(gst_is_dmabuf_memory(mem)));
+   printf("  can use zero-copy: %s\n", yesno(is_dmabuf_mem));
printf("  video meta found: %s\n", yesno(meta != NULL));
printf("===\n");
}
-- 
2.7.4
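
The pattern in this patch, reading a property through a pointer once before a later call may invalidate that pointer, can be sketched without GStreamer. Every name below is hypothetical, and `merge_blocks` is only a stand-in for the merging that the commit message attributes to gst_buffer_map():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct block {
    bool is_dmabuf;
};

struct buffer {
    struct block *blocks;
    size_t n;
};

/* Stand-in for an operation that merges all blocks into one. The old
 * array is freed, so any pointer peeked earlier dangles afterwards. */
static bool merge_blocks(struct buffer *buf)
{
    struct block *merged = malloc(sizeof(*merged));

    if (merged == NULL)
        return false;
    merged->is_dmabuf = false;
    free(buf->blocks);
    buf->blocks = merged;
    buf->n = 1;
    return true;
}

static bool first_block_was_dmabuf(struct buffer *buf)
{
    const struct block *mem = &buf->blocks[0]; /* peek at the first block */
    bool is_dmabuf = mem->is_dmabuf;           /* query once, while mem is valid */

    merge_blocks(buf);                         /* mem may dangle from here on */
    return is_dmabuf;                          /* use the cached answer only */
}
```

The cached boolean stays meaningful after the merge, whereas dereferencing `mem` again would be a use-after-free.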



[Mesa-dev] [PATCH kmscube 2/2] gst-video-appsink: Cleanup & add max-lateness & enable QoS

2017-04-10 Thread Carlos Rafael Giani
The QoS and max-lateness settings are copied from GstVideoSink, since here,
the appsink subclass specifically handles video

Signed-off-by: Carlos Rafael Giani 
---
 gst-video-appsink.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gst-video-appsink.c b/gst-video-appsink.c
index 34e5931..2c5c15b 100644
--- a/gst-video-appsink.c
+++ b/gst-video-appsink.c
@@ -81,7 +81,9 @@ gst_video_appsink_class_init(GstVideoAppsinkClass *klass)
 static void
 gst_video_appsink_init(GstVideoAppsink *video_appsink)
 {
-   (void)video_appsink;
+   /* QoS and max-lateness lines taken from gstvideosink.c */
+   gst_base_sink_set_max_lateness(GST_BASE_SINK(video_appsink), 20 * GST_MSECOND);
+   gst_base_sink_set_qos_enabled(GST_BASE_SINK(video_appsink), TRUE);
 }
 
 
@@ -90,8 +92,6 @@ gst_video_appsink_sink_propose_allocation (GstBaseSink *bsink, GstQuery *query)
 {
(void)bsink;
 
-   gst_query_parse_allocation(query, NULL, NULL);
-
gst_query_add_allocation_meta(query, GST_VIDEO_META_API_TYPE, NULL);
 
return TRUE;
-- 
2.7.4



Re: [Mesa-dev] Meson mesademos (Was: [RFC libdrm 0/2] Replace the build system with meson)

2017-04-10 Thread Nirbheek Chauhan
Hello Jose,

On Mon, Apr 10, 2017 at 5:41 PM, Jose Fonseca  wrote:
> I've been trying to get native mingw to build.  (It's still important to
> prototype mesademos with MSVC to ensure meson is up to the task, but long
> term, I think I'll push for dropping MSVC support from mesademos and piglit,
> since MinGW is fine for this sort of samples/tests programs.)
>
> However native MinGW fails poorly:
>
> [78/1058] Static linking library src/util/libutil.a
> FAILED: src/util/libutil.a
> cmd /c del /f /s /q src/util/libutil.a && ar @src/util/libutil.a.rsp
> Invalid switch - "util".
>
> So the problem here is that meson is passing `/` separator to the cmd.exe
> del command, instead of `\`.
>
> Full log
> https://ci.appveyor.com/project/jrfonseca/mesademos/build/job/6rpen94u7yq3q69n
>

This was a regression with 0.39, and is already fixed in git master:
https://github.com/mesonbuild/meson/pull/1527

It will be in the next release, which is scheduled for April 22. In
the meantime, please test with git master.

>
> TBH, this is basic Windows functionality, and if it can't get it right then
> it shakes my belief that it's getting proper Windows testing...
>

I'm sorry to hear that.

>
> I think part of the problem is that, per
> https://github.com/mesonbuild/meson/blob/master/.appveyor.yml, Meson is only
> being tested with MSYS (which provides a full-blown POSIX environment on
> Windows), and not with plain MinGW.
>

Actually, this slipped through the cracks (I broke it!) because we
didn't have our CI testing MinGW. Now we do, specifically to catch
this sort of stuff: https://github.com/mesonbuild/meson/pull/1346.

All our pull requests are required to pass all CI before they can be
merged, and every bug fixed and feature added is required to have a
new test case for it, so I expect the situation will not regress
again.

Our CI is fairly comprehensive (MSVC 2010, 2015, 2017, MinGW, and Cygwin
on Windows alone) and is getting better every day. The biggest hole in it
right now is BSD, and we would be extremely grateful if someone could
help us with that too!

> IMHO, MSYS is a hack to get packages that use autotools to build with MinGW.
> Packages that use Windows aware build systems (like Meson is trying to be)
> should stay as _far_ as possible from MSYS
>

Yes, I agree. MSYS2 in particular is especially broken (the toolchain
is buggy and even the python3 shipped with it is crap) and we do not
recommend using it at all (although a surprisingly large number of
people use its toolchain, so we do support it). If you look closely,
we do not use MSYS itself, only MinGW:

https://github.com/mesonbuild/meson/blob/master/.appveyor.yml#L61

The MSYS paths are C:\msys64\usr\bin and the MinGW (toolchain) paths
are C:\msys64\mingw??\bin.

And in any case our codepaths for building something with the Ninja
backend on MSVC and MinGW are almost identical, and our MSVC CI does
not have any POSIX binaries in their path.

I even have all of Glib + dependencies building out of the box with
just Meson git + MSVC [https://github.com/centricular/glib/], and my
next step is to have all of GStreamer building that way.

Hope this clarifies things!

Cheers,
Nirbheek


Re: [Mesa-dev] [PATCH] nvc0: increase texture buffer object alignment to 256 for pre-GM107

2017-04-10 Thread Samuel Pitoiset



On 04/10/2017 02:33 PM, Ilia Mirkin wrote:
> I assume Pascal is the same as Maxwell. Using tic, it gets 16...

Makes sense.

Reviewed-by: Samuel Pitoiset

> On Apr 10, 2017 5:32 AM, "Samuel Pitoiset" wrote:
>
>> How about Pascal?
>>
>> On 04/08/2017 09:10 PM, Ilia Mirkin wrote:
>>
>>> We currently don't pass the low byte of the address via the surface
>>> info, so in order to work with images, these have to implicitly be
>>> aligned to 256. The proprietary driver also doesn't go out of its way to
>>> provide lower alignment.
>>>
>>> Fixes GL45-CTS.texture_buffer.texture_buffer_texture_buffer_range
>>>
>>> Signed-off-by: Ilia Mirkin
>>> Cc: mesa-sta...@lists.freedesktop.org
>>> ---
>>>   src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>>> index 543857a..fc44d32 100644
>>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>>> @@ -147,7 +147,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
>>>   case PIPE_CAP_CONSTANT_BUFFER_OFFSET_ALIGNMENT:
>>>  return 256;
>>>   case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT:
>>> -  if (class_3d < NVE4_3D_CLASS)
>>> +  if (class_3d < GM107_3D_CLASS)
>>> return 256; /* IMAGE bindings require alignment to 256 */
>>>  return 16;
>>>   case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT:




Re: [Mesa-dev] [PATCH] nvc0: increase texture buffer object alignment to 256 for pre-GM107

2017-04-10 Thread Ilia Mirkin
I assume Pascal is the same as Maxwell. Using tic, it gets 16...

On Apr 10, 2017 5:32 AM, "Samuel Pitoiset" wrote:

> How about Pascal?
>
> On 04/08/2017 09:10 PM, Ilia Mirkin wrote:
>
>> We currently don't pass the low byte of the address via the surface
>> info, so in order to work with images, these have to implicitly be
>> aligned to 256. The proprietary driver also doesn't go out of its way to
>> provide lower alignment.
>>
>> Fixes GL45-CTS.texture_buffer.texture_buffer_texture_buffer_range
>>
>> Signed-off-by: Ilia Mirkin 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>   src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> index 543857a..fc44d32 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> @@ -147,7 +147,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
>>  case PIPE_CAP_CONSTANT_BUFFER_OFFSET_ALIGNMENT:
>> return 256;
>>  case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT:
>> -  if (class_3d < NVE4_3D_CLASS)
>> +  if (class_3d < GM107_3D_CLASS)
>>return 256; /* IMAGE bindings require alignment to 256 */
>> return 16;
>>  case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT:
>>
>>
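
The alignment requirement discussed in this thread can be illustrated with a small sketch. Assuming, as the commit message says, that the low byte of the address is not passed along, an offset is only usable if its low 8 bits are zero. `is_aligned` and `align_up` are hypothetical helpers, not driver code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if offset is a multiple of alignment (which must be a power of two). */
static bool is_aligned(uint64_t offset, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset & (alignment - 1)) == 0;
}

/* Round offset up to the next multiple of alignment (a power of two). */
static uint64_t align_up(uint64_t offset, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, an offset of 272 satisfies a 16-byte alignment but not a 256-byte one, and would have to be rounded up to 512.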


Re: [Mesa-dev] [PATCH v2 1/2] bin/get-{extra, fixes}-pick-list.sh: add support for ignore list

2017-04-10 Thread Emil Velikov
On 10 April 2017 at 11:15, Juan A. Suarez Romero  wrote:
> On Fri, 2017-04-07 at 19:38 +0100, Emil Velikov wrote:
>> On 7 April 2017 at 12:30, Juan A. Suarez Romero  wrote:
>> > Neither script uses a file with the commits to ignore. So if we
>> > have handled one of the suggested commits and decided we won't pick it,
>> > the scripts will continue suggesting it.
>> >
>> > This commit adds support for a bin/.cherry-ignore-extra where we can
>> > put the commits not explicitly rejected (those would be in the
>> > bin/.cherry-ignore) but that we don't want the scripts to suggest because
>> > we know they won't be picked for stable.
>> >
>>
>> Don't see much value in having the extra file. The patch is not
>> suitable, regardless of how it was flagged.
>>
>
> Ok. I'll send a patch to use .cherry-ignore for all the cases.
>
>> > v2:
>> > - Mark the candidates in bin/get-extra-pick-list.sh (Juan A. Suarez)
>> > ---
>> >  bin/get-extra-pick-list.sh | 12 
>> >  bin/get-fixes-pick-list.sh | 14 ++
>>
>> bin/get-typod-pick-list.sh could use the .cherry-ignore fix right ?
>
> This script is using .cherry-ignore. Which fix do you mean?
>
Got confused there. Please ignore.

-Emil


Re: [Mesa-dev] [PATCH v2 0/4] intel/isl: Add support for emitting depth/stencil

2017-04-10 Thread Pohjolainen, Topi
On Fri, Apr 07, 2017 at 10:42:21PM -0700, Jason Ekstrand wrote:
> This is mostly a re-send of previous patches.  Two things have changed
> since the last version.  The first is that the first patch is now actually
> correct for gen6.  Prior to sending the original version, I tested it only
> with Vulkan which doesn't run on gen6 so a few fields were missed.  This
> version passes both GL and Vulkan.
> 
> The second change is that we now pass the result of the relocation function
> calls into the address fields.  Previously, the addresses weren't properly
> getting filled out so we didn't have a valid address if a relocation didn't
> happen.  I have no idea how the Vulkan CTS managed to *not* catch this.

I'm equally surprised how version one worked at all. I remember comparing
the explicit reloc calls to the combine_address macros but somehow also missed
that addresses in the packets were left uninitialised.

These are:

Reviewed-by: Topi Pohjolainen 

> 
> Eventually, I think I'd like to use some sort of a relocation function
> pointer hook in ISL for doing these things but using the address fields
> works for now.  Unfortunately, we can't use the address fields for regular
> surface states because, thanks to the bottom bits of
> AuxiliarySurfaceBaseAddress being used for other things, you need the
> result of at least some of the packing in order to generate the reloc.  A
> function pointer mechanism would solve this because it would get called
> during the packing process.
> 
> Jason Ekstrand (4):
>   intel/isl: Add support for emitting depth/stencil/hiz
>   anv: Use ISL for emitting depth/stencil/hiz
>   intel/blorp: Emit 3DSTATE_STENCIL_BUFFER before HIER_DEPTH
>   intel/blorp: Use ISL for emitting depth/stencil/hiz
> 
>  src/intel/Makefile.sources |   7 ++
>  src/intel/blorp/blorp_genX_exec.h  | 119 +-
>  src/intel/isl/isl.c|  93 ++
>  src/intel/isl/isl.h|  74 +++
>  src/intel/isl/isl_emit_depth_stencil.c | 199 ++
>  src/intel/isl/isl_priv.h   |  28 +
>  src/intel/vulkan/genX_cmd_buffer.c | 218 
> ++---
>  7 files changed, 473 insertions(+), 265 deletions(-)
>  create mode 100644 src/intel/isl/isl_emit_depth_stencil.c
> 
> -- 
> 2.5.0.400.gff86faf
> 


Re: [Mesa-dev] Meson mesademos (Was: [RFC libdrm 0/2] Replace the build system with meson)

2017-04-10 Thread Jose Fonseca

On 08/04/17 23:07, Jose Fonseca wrote:
> On 08/04/17 00:24, Dylan Baker wrote:
>> Quoting Jose Fonseca (2017-03-30 15:19:31)
>>>
>>> Cool.  BTW, another alternative (for things like LLVM) would be to
>>> chain build systems (ie, have a wrap that builds LLVM invoking CMake)
>>>
>>> Jose
>>
>> I have no idea whether chaining would work or not, that would be an
>> interesting thing to try.
>>
>> I have force pushed to the meson branch. Things are building on Linux
>> with mingw, gcc, and clang. I've wrapped freeglut and glew, and pulled
>> out the epoxy stuff. The mingw cross build does rely on an unmerged
>> patch (that is approved, just awaiting merge) for mingw windres support
>> in the cross file. That shouldn't be a problem for msvc or building
>> natively with mingw.
>>
>> I have not done any of the msvc work for either mesa-demos or for
>> freeglut or glew. Hopefully this gets things far enough along that you
>> can get msvc going when you have some time.
>>
>> Dylan
>
> Thanks.  I hit errors early on with MSVC.
>
> I fixed a few, but I didn't spend much time on it.  Instead I added
> AppVeyor integration so anybody can experiment.
>
> https://ci.appveyor.com/project/jrfonseca/mesademos/build/job/qysf73s4975i2w36
>
> https://cgit.freedesktop.org/~jrfonseca/mesademos/log/?h=meson-appveyor
>
> I had to push to a private git repo since I'd need an FDO admin to
> install the AppVeyor hook on the official mesa demos repo.  I'll get that
> going but it'll probably take time.
>
> Given you use GitHub, it should be trivial to hook AppVeyor on your
> mesademos repo on GitHub.
>
> Jose


I've been trying to get native mingw to build.  (It's still important to 
prototype mesademos with MSVC to ensure meson is up to the task, but 
long term, I think I'll push for dropping MSVC support from mesademos 
and piglit, since MinGW is fine for this sort of samples/tests programs.)


However native MinGW fails poorly:

[78/1058] Static linking library src/util/libutil.a
FAILED: src/util/libutil.a
cmd /c del /f /s /q src/util/libutil.a && ar @src/util/libutil.a.rsp
Invalid switch - "util".

So the problem here is that meson is passing `/` separator to the 
cmd.exe del command, instead of `\`.
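
The failure comes down to cmd.exe builtins such as del treating a `/` inside an argument as the start of a switch. Below is a minimal sketch of the kind of normalization a build tool could apply before handing a path to such a builtin. `to_native_seps` is a hypothetical helper for illustration and is not Meson's actual fix:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite '/' separators to '\\' in place so that
 * cmd.exe builtins do not parse path components as switches.
 * Returns the same pointer for convenient chaining. */
static char *to_native_seps(char *path)
{
    assert(path != NULL);

    for (char *p = path; *p != '\0'; ++p) {
        if (*p == '/')
            *p = '\\';
    }
    return path;
}
```

Applied to `src/util/libutil.a`, this yields `src\util\libutil.a`, which del accepts as a single file argument.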


Full log 
https://ci.appveyor.com/project/jrfonseca/mesademos/build/job/6rpen94u7yq3q69n



TBH, this is basic Windows functionality, and if it can't get it right
then it shakes my belief that it's getting proper Windows testing...

I think part of the problem is that, per
https://github.com/mesonbuild/meson/blob/master/.appveyor.yml, Meson is
only being tested with MSYS (which provides a full-blown POSIX
environment on Windows), and not with plain MinGW.

IMHO, MSYS is a hack to get packages that use autotools to build with
MinGW.  Packages that use Windows-aware build systems (like Meson is
trying to be) should stay as _far_ as possible from MSYS.



Jose


[Mesa-dev] [Bug 100613] Regression in Mesa 17 on s390x (zSystems)

2017-04-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=100613

İsmail Dönmez  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ism...@i10z.com

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH v2 1/2] bin/get-{extra, fixes}-pick-list.sh: add support for ignore list

2017-04-10 Thread Juan A. Suarez Romero
On Fri, 2017-04-07 at 19:38 +0100, Emil Velikov wrote:
> On 7 April 2017 at 12:30, Juan A. Suarez Romero  wrote:
> > Neither script uses a file with the commits to ignore. So if we
> > have handled one of the suggested commits and decided we won't pick it,
> > the scripts will continue suggesting it.
> > 
> > This commit adds support for a bin/.cherry-ignore-extra where we can
> > put the commits not explicitly rejected (those would be in the
> > bin/.cherry-ignore) but that we don't want the scripts to suggest because
> > we know they won't be picked for stable.
> > 
> 
> Don't see much value in having the extra file. The patch is not
> suitable, regardless of how it was flagged.
> 

Ok. I'll send a patch to use .cherry-ignore for all the cases.

> > v2:
> > - Mark the candidates in bin/get-extra-pick-list.sh (Juan A. Suarez)
> > ---
> >  bin/get-extra-pick-list.sh | 12 
> >  bin/get-fixes-pick-list.sh | 14 ++
> 
> bin/get-typod-pick-list.sh could use the .cherry-ignore fix right ?

This script is using .cherry-ignore. Which fix do you mean?


> 
> Thanks
> Emil
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] gallium/radeon: add HUD queries for GPU temperature and clocks

2017-04-10 Thread Samuel Pitoiset
Only the Radeon kernel driver exposed the GPU temperature and
the shader/memory clocks; this implements the same functionality
for the AMDGPU kernel driver.

These queries will return 0 if the DRM version is less than 3.10.
I don't explicitly check the version here because the query
codepath is already a bit messy.

v2: - rebase on top of master

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/radeon/r600_query.c   | 12 ++--
 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c |  7 ++-
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_query.c b/src/gallium/drivers/radeon/r600_query.c
index cb90850a50..0980eca788 100644
--- a/src/gallium/drivers/radeon/r600_query.c
+++ b/src/gallium/drivers/radeon/r600_query.c
@@ -1799,6 +1799,10 @@ static struct pipe_driver_query_info r600_driver_query_list[] = {
XG(GPIN, "GPIN_003",GPIN_NUM_SPI,   UINT, AVERAGE),
XG(GPIN, "GPIN_004",GPIN_NUM_SE,UINT, AVERAGE),
 
+   X("temperature",GPU_TEMPERATURE,UINT64, AVERAGE),
+   X("shader-clock",   CURRENT_GPU_SCLK,   HZ, AVERAGE),
+   X("memory-clock",   CURRENT_GPU_MCLK,   HZ, AVERAGE),
+
/* The following queries must be at the end of the list because their
 * availability is adjusted dynamically based on the DRM version. */
X("GPU-load",   GPU_LOAD,   UINT64, AVERAGE),
@@ -1823,10 +1827,6 @@ static struct pipe_driver_query_info r600_driver_query_list[] = {
X("GPU-dma-busy",   GPU_DMA_BUSY,   UINT64, AVERAGE),
X("GPU-scratch-ram-busy",   GPU_SCRATCH_RAM_BUSY,   UINT64, AVERAGE),
X("GPU-ce-busy",GPU_CE_BUSY,UINT64, AVERAGE),
-
-   X("temperature",GPU_TEMPERATURE,UINT64, AVERAGE),
-   X("shader-clock",   CURRENT_GPU_SCLK,   HZ, AVERAGE),
-   X("memory-clock",   CURRENT_GPU_MCLK,   HZ, AVERAGE),
 };
 
 #undef X
@@ -1839,9 +1839,9 @@ static unsigned r600_get_num_queries(struct r600_common_screen *rscreen)
return ARRAY_SIZE(r600_driver_query_list);
else if (rscreen->info.drm_major == 3) {
if (rscreen->chip_class >= VI)
-   return ARRAY_SIZE(r600_driver_query_list) - 3;
+   return ARRAY_SIZE(r600_driver_query_list);
else
-   return ARRAY_SIZE(r600_driver_query_list) - 10;
+   return ARRAY_SIZE(r600_driver_query_list) - 7;
}
else
return ARRAY_SIZE(r600_driver_query_list) - 25;
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
index bb7e545ed6..f3a0c958ed 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
@@ -471,9 +471,14 @@ static uint64_t amdgpu_query_value(struct radeon_winsys *rws,
   amdgpu_query_heap_info(ws->dev, AMDGPU_GEM_DOMAIN_GTT, 0, &heap);
   return heap.heap_usage;
case RADEON_GPU_TEMPERATURE:
+  amdgpu_query_sensor_info(ws->dev, AMDGPU_INFO_SENSOR_GPU_TEMP, 4, &retval);
+  return retval;
case RADEON_CURRENT_SCLK:
+  amdgpu_query_sensor_info(ws->dev, AMDGPU_INFO_SENSOR_GFX_SCLK, 4, &retval);
+  return retval;
case RADEON_CURRENT_MCLK:
-  return 0;
+  amdgpu_query_sensor_info(ws->dev, AMDGPU_INFO_SENSOR_GFX_MCLK, 4, &retval);
+  return retval;
case RADEON_GPU_RESET_COUNTER:
   assert(0);
   return 0;
-- 
2.12.2
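
The r600_get_num_queries() change relies on version-dependent queries sitting at the tail of the list, so that they can be hidden simply by reporting a shorter count. A minimal sketch of that pattern follows. The table contents, `NUM_TAIL_QUERIES` and `get_num_queries` are hypothetical and not taken from the driver:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical query table: entries whose availability depends on the
 * kernel interface version are kept at the tail. */
static const char *const query_list[] = {
    "draw-calls",    /* always available */
    "temperature",   /* always available */
    "GPU-load",      /* tail: only with a newer kernel interface */
    "GPU-dma-busy",  /* tail: only with a newer kernel interface */
};

/* Number of version-dependent entries at the end of query_list. */
#define NUM_TAIL_QUERIES 2

/* Report the full list only when the newer interface is present;
 * otherwise drop the version-dependent tail. */
static size_t get_num_queries(int has_new_interface)
{
    assert(NUM_TAIL_QUERIES <= ARRAY_SIZE(query_list));

    if (has_new_interface)
        return ARRAY_SIZE(query_list);
    return ARRAY_SIZE(query_list) - NUM_TAIL_QUERIES;
}
```

This is why moving an entry out of the tail, as the patch does for the temperature and clock queries, also requires adjusting the subtracted counts.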


