Re: [Mesa-dev] [PATCH] i965: Drop brw_bo_alloc in ARB_indirect_parameters implementation.

2017-10-10 Thread Kenneth Graunke
On Tuesday, October 10, 2017 6:55:14 PM PDT Ilia Mirkin wrote:
> Do you need to initialize to null? Haven't looked at context, just guessing
> based on your description.

Yikes, yes, fixed locally.  Thanks!


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Jason Ekstrand
On Tue, Oct 10, 2017 at 1:10 PM, Jason Ekstrand 
wrote:

> On Tue, Oct 10, 2017 at 9:16 AM, Connor Abbott 
> wrote:
>
>> I'm a little nervous about this, because really, the only solution to
>> this problem is to ignore all non-WE_all definitions of all variables
>> in liveness analysis. For example, in something like:
>>
>> vec4 color2 = ...
>> if (...) {
>>color2 = texture();
>> }
>>
>> texture() can also overwrite inactive channels of color2. We happen to
>> get this right because we turn live ranges into live intervals without
>> holes, but I can't come up with a good reason why that would save us
>> in all cases except the one in this patch -- which makes me worry that
>> we'll find yet another case where there's a similar problem. I think
>> it would be clearer if we what I said above, i.e. ignore all
>> non-WE_all definitions, which will make things much worse, but then
>> apply Curro's patch which will return things to pretty much how they
>> were before, except this case will be fixed and maybe some other cases
>> we haven't thought of.
>>
>
> What you're suggesting may actually be less code and is arguably better in
> terms of being more straightforward.  However, I think intervals plus this
> patch is equivalent.  Curro's patch + always-partial will cause us to start
> the live range at the IP where it first *may* be defined and we keep the
> behavior of ending the live range at the last IP where some reachable
> instruction may use it.  With my patch + Curro's, we start the live range
> at the IP where it is first defined which will always all places it *may*
> be defined unless there is a back edge.  If there is a back-edge, I pull
> the live range up across the back edge.
>
> That said, I think I agree with you that my solution treats it as a
> special case and not a general problem.  However, I'm relucant to just
> change liveness analysis to assume partial writes because we use it for
> more than computing variable interference.  In particular, we use it for
> dead code elimination and the concept of partial/complete writes is crucial
> there.  I've got another patch cooking which I'll send out soon which
> should make you happier with it.
>

Forget the other patch I have cooking.  It came out burned. :-(  At the
moment, I have no better plan than the one in this patch.  Every "more
general" thing I've tried today has ended up extending the live ranges far
further than needed and making hash of shader-db.  The more I think about
it, the more I only see two solutions:

 1) Take advantage of structure like this patch does.  I really don't think
there's anything wrong, given that we have a structured IR, in thinking of
it in a structured way.

 2) Go fully for the dual-liveness model. Have what amounts to a virtual
"physical" CFG which is just the logical CFG with some edges added and have
"physical" livein/out sets which model the conflicting version.  The tricky
part would be to get the physical CFG right.  You would need at least:

a) Edges from each break to the block after the break like the first
attempt

b) Same thing for the continues

c) Edges from the then block into the else block (this shouldn't
actually matter thanks to intervals but we may as well)

d) An edge from the top of every loop to the bottom to simulate going
through the entire loop as a disabled channel.  You can think of this as
putting the entire contents of the loop in a virtual "if" statement.

I believe what was missing from the patch I sent out first was actually
edge d.  It's all well and good to get to the point where the def is
available at the top of the loop (which a) does) but it's another thing
entirely to say that there's a path from the top to the bottom where
nothing in the middle has a chance to kill it.  The thing I don't like
about option 2 is that it's still taking advantage of structure, it's just
adds enough compiler theory language on top to confuse things to the point
where you almost can't see it.  We may be able to come up with a model
that's truly unstructured but I'd need to think long and hard about how
unstructured SIMD control flow works.

Thoughts?  Curro, I'm happy for you to chime in too now that you're back.

--Jason


> --Jason
>
>
>>
>> On Thu, Oct 5, 2017 at 2:52 PM, Jason Ekstrand 
>> wrote:
>> > No Shader-db changes.
>> >
>> > Cc: mesa-sta...@lists.freedesktop.org
>> > ---
>> >  src/intel/compiler/brw_fs_live_variables.cpp | 55
>> 
>> >  1 file changed, 55 insertions(+)
>> >
>> > diff --git a/src/intel/compiler/brw_fs_live_variables.cpp
>> b/src/intel/compiler/brw_fs_live_variables.cpp
>> > index c449672..380060d 100644
>> > --- a/src/intel/compiler/brw_fs_live_variables.cpp
>> > +++ b/src/intel/compiler/brw_fs_live_variables.cpp
>> > @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>> >   }
>> >}
>> > }
>> > +
>> > +   /* Due to the explicit way the SIMD data is handled on GEN, we need
>> to be a
>> > 

[Mesa-dev] [PATCH 4/4] radv: remove duplicate line of code

2017-10-10 Thread Timothy Arceri
The same line of code is a few lines above.
---
 src/amd/vulkan/radv_pipeline_cache.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/amd/vulkan/radv_pipeline_cache.c 
b/src/amd/vulkan/radv_pipeline_cache.c
index bced425966..714edf03a4 100644
--- a/src/amd/vulkan/radv_pipeline_cache.c
+++ b/src/amd/vulkan/radv_pipeline_cache.c
@@ -184,21 +184,20 @@ radv_create_shader_variant_from_pipeline_cache(struct 
radv_device *device,
 
variant = calloc(1, sizeof(struct radv_shader_variant));
if (!variant)
return NULL;
 
variant->code_size = entry->code_size;
variant->config = entry->config;
variant->info = entry->variant_info;
variant->rsrc1 = entry->rsrc1;
variant->rsrc2 = entry->rsrc2;
-   variant->code_size = entry->code_size;
variant->ref_count = 1;
 
void *ptr = radv_alloc_shader_memory(device, variant);
memcpy(ptr, entry->code, entry->code_size);
 
entry->variant = variant;
}
 
p_atomic_inc(&entry->variant->ref_count);
return entry->variant;
-- 
2.13.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] radv: create on-disk shader cache

2017-10-10 Thread Timothy Arceri
This is the drivers on-disk cache intended to be used as a
fallback as opposed to the pipeline cache provided by apps.
---
 src/amd/vulkan/radv_device.c  | 13 +
 src/amd/vulkan/radv_private.h |  6 ++
 src/util/disk_cache.h | 15 +++
 3 files changed, 34 insertions(+)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index d414d8ca85..23f5e70321 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -345,20 +345,32 @@ radv_physical_device_init(struct radv_physical_device 
*device,
}
 
if (radv_device_get_cache_uuid(device->rad_info.family, 
device->cache_uuid)) {
radv_finish_wsi(device);
device->ws->destroy(device->ws);
result = vk_errorf(VK_ERROR_INITIALIZATION_FAILED,
   "cannot generate UUID");
goto fail;
}
 
+   /* These flags affect shader compilation. */
+   uint64_t shader_env_flags =
+   (device->instance->perftest_flags & RADV_PERFTEST_SISCHED ? 0x1 
: 0) |
+   (device->instance->debug_flags & RADV_DEBUG_UNSAFE_MATH ? 0x2 : 
0);
+
+   /* The gpu id is already embeded in the uuid so we just pass "radv"
+* when creating the cache.
+*/
+   char buf[VK_UUID_SIZE + 1];
+   disk_cache_format_hex_id(buf, device->cache_uuid, VK_UUID_SIZE);
+   device->disk_cache = disk_cache_create("radv", buf, shader_env_flags);
+
result = radv_extensions_register(instance,
&device->extensions,
common_device_extensions,
ARRAY_SIZE(common_device_extensions));
if (result != VK_SUCCESS)
goto fail;
 
if (device->rad_info.chip_class >= VI && device->rad_info.max_se >= 2) {
result = radv_extensions_register(instance,
&device->extensions,
@@ -395,20 +407,21 @@ fail:
close(fd);
return result;
 }
 
 static void
 radv_physical_device_finish(struct radv_physical_device *device)
 {
radv_extensions_finish(device->instance, &device->extensions);
radv_finish_wsi(device);
device->ws->destroy(device->ws);
+   disk_cache_destroy(device->disk_cache);
close(device->local_fd);
 }
 
 static void *
 default_alloc_func(void *pUserData, size_t size, size_t align,
VkSystemAllocationScope allocationScope)
 {
return malloc(size);
 }
 
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index 548952faa9..e673527811 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -270,20 +270,26 @@ struct radv_physical_device {
uint8_t driver_uuid[VK_UUID_SIZE];
uint8_t device_uuid[VK_UUID_SIZE];
uint8_t cache_uuid[VK_UUID_SIZE];
 
int local_fd;
struct wsi_device   wsi_device;
struct radv_extensions  extensions;
 
bool has_rbplus; /* if RB+ register exist */
bool rbplus_allowed; /* if RB+ is allowed */
+
+
+   /* This is the drivers on-disk cache used as a fallback as opposed to
+* the pipeline cache defined by apps.
+*/
+   struct disk_cache *  disk_cache;
 };
 
 struct radv_instance {
VK_LOADER_DATA  _loader_data;
 
VkAllocationCallbacks   alloc;
 
uint32_tapiVersion;
int physicalDeviceCount;
struct radv_physical_device 
physicalDevices[RADV_MAX_DRM_DEVICES];
diff --git a/src/util/disk_cache.h b/src/util/disk_cache.h
index d2e4d9a69c..488b297ead 100644
--- a/src/util/disk_cache.h
+++ b/src/util/disk_cache.h
@@ -58,20 +58,35 @@ struct cache_item_metadata {
 */
uint32_t type;
 
/** GLSL cache item metadata */
cache_key *keys;   /* sha1 list of shaders that make up the cache item */
uint32_t num_keys;
 };
 
 struct disk_cache;
 
+static inline char *
+disk_cache_format_hex_id(char *buf, const uint8_t *hex_id, unsigned size)
+{
+   static const char hex_digits[] = "0123456789abcdef";
+   unsigned i;
+
+   for (i = 0; i < size; i += 2) {
+  buf[i] = hex_digits[hex_id[i >> 1] >> 4];
+  buf[i + 1] = hex_digits[hex_id[i >> 1] & 0x0f];
+   }
+   buf[i] = '\0';
+
+   return buf;
+}
+
 static inline bool
 disk_cache_get_function_timestamp(void *ptr, uint32_t* timestamp)
 {
 #ifdef ENABLE_SHADER_CACHE
Dl_info info;
struct stat st;
if (!dladdr(ptr, &info) || !info.dli_fname) {
   return false;
}
if (stat(info.dli_fname, &st)) {
-- 
2.13.6


[Mesa-dev] [PATCH 1/4] radv: remove duplicate debug_flags field

2017-10-10 Thread Timothy Arceri
---
 src/amd/vulkan/radv_cmd_buffer.c | 2 +-
 src/amd/vulkan/radv_debug.c  | 2 +-
 src/amd/vulkan/radv_device.c | 4 +---
 src/amd/vulkan/radv_image.c  | 4 ++--
 src/amd/vulkan/radv_meta_clear.c | 4 ++--
 src/amd/vulkan/radv_pipeline.c   | 4 ++--
 src/amd/vulkan/radv_pipeline_cache.c | 2 +-
 src/amd/vulkan/radv_private.h| 1 -
 src/amd/vulkan/radv_shader.c | 8 
 9 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 67e038a152..495fd67dbb 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -366,21 +366,21 @@ void radv_cmd_buffer_trace_emit(struct radv_cmd_buffer 
*cmd_buffer)
++cmd_buffer->state.trace_id;
device->ws->cs_add_buffer(cs, device->trace_bo, 8);
radv_emit_write_data_packet(cs, va, 1, &cmd_buffer->state.trace_id);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, AC_ENCODE_TRACE_POINT(cmd_buffer->state.trace_id));
 }
 
 static void
 radv_cmd_buffer_after_draw(struct radv_cmd_buffer *cmd_buffer)
 {
-   if (cmd_buffer->device->debug_flags & RADV_DEBUG_SYNC_SHADERS) {
+   if (cmd_buffer->device->instance->debug_flags & 
RADV_DEBUG_SYNC_SHADERS) {
enum radv_cmd_flush_bits flags;
 
/* Force wait for graphics/compute engines to be idle. */
flags = RADV_CMD_FLAG_PS_PARTIAL_FLUSH |
RADV_CMD_FLAG_CS_PARTIAL_FLUSH;
 
si_cs_emit_cache_flush(cmd_buffer->cs, false,
   
cmd_buffer->device->physical_device->rad_info.chip_class,
   NULL, 0,
   radv_cmd_buffer_uses_mec(cmd_buffer),
diff --git a/src/amd/vulkan/radv_debug.c b/src/amd/vulkan/radv_debug.c
index cb9509117e..b69c05b64f 100644
--- a/src/amd/vulkan/radv_debug.c
+++ b/src/amd/vulkan/radv_debug.c
@@ -592,21 +592,21 @@ radv_dump_dmesg(FILE *f)
pclose(p);
 }
 
 static void
 radv_dump_enabled_options(struct radv_device *device, FILE *f)
 {
uint64_t mask;
 
fprintf(f, "Enabled debug options: ");
 
-   mask = device->debug_flags;
+   mask = device->instance->debug_flags;
while (mask) {
int i = u_bit_scan64(&mask);
fprintf(f, "%s, ", radv_get_debug_option_name(i));
}
fprintf(f, "\n");
 
fprintf(f, "Enabled perftest options: ");
 
mask = device->instance->perftest_flags;
while (mask) {
diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index e07a573819..d414d8ca85 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -1147,22 +1147,20 @@ VkResult radv_CreateDevice(
 VK_SYSTEM_ALLOCATION_SCOPE_DEVICE);
if (!device)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
 
memset(device, 0, sizeof(*device));
 
device->_loader_data.loaderMagic = ICD_LOADER_MAGIC;
device->instance = physical_device->instance;
device->physical_device = physical_device;
 
-   device->debug_flags = device->instance->debug_flags;
-
device->ws = physical_device->ws;
if (pAllocator)
device->alloc = *pAllocator;
else
device->alloc = physical_device->instance->alloc;
 
mtx_init(&device->shader_slab_mutex, mtx_plain);
list_inithead(&device->shader_slabs);
 
for (unsigned i = 0; i < pCreateInfo->queueCreateInfoCount; i++) {
@@ -3144,21 +3142,21 @@ radv_initialise_color_surface(struct radv_device 
*device,
S_028C70_ENDIAN(endian);
if ((iview->image->info.samples > 1) && iview->image->fmask.size) {
cb->cb_color_info |= S_028C70_COMPRESSION(1);
if (device->physical_device->rad_info.chip_class == SI) {
unsigned fmask_bankh = 
util_logbase2(iview->image->fmask.bank_height);
cb->cb_color_attrib |= 
S_028C74_FMASK_BANK_HEIGHT(fmask_bankh);
}
}
 
if (iview->image->cmask.size &&
-   !(device->debug_flags & RADV_DEBUG_NO_FAST_CLEARS))
+   !(device->instance->debug_flags & RADV_DEBUG_NO_FAST_CLEARS))
cb->cb_color_info |= S_028C70_FAST_CLEAR(1);
 
if (radv_vi_dcc_enabled(iview->image, iview->base_mip))
cb->cb_color_info |= S_028C70_DCC_ENABLE(1);
 
if (device->physical_device->rad_info.chip_class >= VI) {
unsigned max_uncompressed_block_size = 2;
if (iview->image->info.samples > 1) {
if (iview->image->surface.bpe == 1)
max_uncompressed_block_size = 0;
diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index f0645279aa..7c3e55b1b8 100644
--- a/src/amd/vulkan/radv_image.c
++

[Mesa-dev] [PATCH 3/4] radv: make use of on-disk cache

2017-10-10 Thread Timothy Arceri
If the app provided in-memory pipeline cache doesn't yet contain
what we are looking for, or it doesn't provide one at all then we
fallback to the on-disk cache.
---
 src/amd/vulkan/radv_pipeline_cache.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_pipeline_cache.c 
b/src/amd/vulkan/radv_pipeline_cache.c
index 625c58e66c..bced425966 100644
--- a/src/amd/vulkan/radv_pipeline_cache.c
+++ b/src/amd/vulkan/radv_pipeline_cache.c
@@ -16,20 +16,21 @@
  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
  * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
  * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
  * IN THE SOFTWARE.
  */
 
 #include "util/mesa-sha1.h"
 #include "util/debug.h"
+#include "util/disk_cache.h"
 #include "util/u_atomic.h"
 #include "radv_debug.h"
 #include "radv_private.h"
 #include "radv_shader.h"
 
 #include "ac_nir_to_llvm.h"
 
 struct cache_entry {
union {
unsigned char sha1[20];
@@ -158,22 +159,32 @@ radv_create_shader_variant_from_pipeline_cache(struct 
radv_device *device,
   struct radv_pipeline_cache 
*cache,
   const unsigned char *sha1)
 {
struct cache_entry *entry = NULL;
 
if (cache)
entry = radv_pipeline_cache_search(cache, sha1);
else
entry = radv_pipeline_cache_search(device->mem_cache, sha1);
 
-   if (!entry)
-   return NULL;
+   if (!entry) {
+   uint8_t disk_sha1[20];
+   disk_cache_compute_key(device->physical_device->disk_cache,
+  sha1, 20, disk_sha1);
+   entry = (struct cache_entry *)
+   disk_cache_get(device->physical_device->disk_cache,
+  disk_sha1, NULL);
+   if (!entry)
+   return NULL;
+
+   entry->variant = NULL;
+   }
 
if (!entry->variant) {
struct radv_shader_variant *variant;
 
variant = calloc(1, sizeof(struct radv_shader_variant));
if (!variant)
return NULL;
 
variant->code_size = entry->code_size;
variant->config = entry->config;
@@ -299,20 +310,30 @@ radv_pipeline_cache_insert_shader(struct radv_device 
*device,
entry->config = variant->config;
entry->variant_info = variant->info;
entry->rsrc1 = variant->rsrc1;
entry->rsrc2 = variant->rsrc2;
entry->code_size = code_size;
entry->variant = variant;
p_atomic_inc(&variant->ref_count);
 
radv_pipeline_cache_add_entry(cache, entry);
 
+   /* Always add cache items to disk. This will allow collection of
+* compiled shaders by third parties such as steam, even if the app
+* implements its own pipeline cache.
+*/
+   uint8_t disk_sha1[20];
+   disk_cache_compute_key(device->physical_device->disk_cache, sha1, 20,
+  disk_sha1);
+   disk_cache_put(device->physical_device->disk_cache,
+  disk_sha1, entry, entry_size(entry), NULL);
+
cache->modified = true;
pthread_mutex_unlock(&cache->mutex);
return variant;
 }
 
 struct cache_header {
uint32_t header_size;
uint32_t header_version;
uint32_t vendor_id;
uint32_t device_id;
-- 
2.13.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] nv50, nvc0: fix push hint logic in presence of a start offset

2017-10-10 Thread Ilia Mirkin
Previously buffer offsets were passed in explicitly as an offset, which
had to be added to the resource address. Now they are passed in via an
increased 'start' parameter. As a result, we were double-adding the
start offset in this kind of situation.

This condition was triggered by piglit's draw-elements test which has a
requisite glMultiDrawElements in combination with a small enough number
of vertices to go through the immediate push path.

Reported-by: Karol Herbst 
Fixes: 330d0607ed6 ("gallium: remove pipe_index_buffer and set_index_buffer")
Cc: mesa-sta...@lists.freedesktop.org
---

v1 -> v2: also do it for nv50

 src/gallium/drivers/nouveau/nv50/nv50_push.c  | 3 +--
 src/gallium/drivers/nouveau/nvc0/nvc0_vbo_translate.c | 9 -
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_push.c 
b/src/gallium/drivers/nouveau/nv50/nv50_push.c
index 9ee9a8eed19..bec2d42e037 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_push.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_push.c
@@ -279,8 +279,7 @@ nv50_push_vbo(struct nv50_context *nv50, const struct 
pipe_draw_info *info)
if (info->index_size) {
   if (!info->has_user_indices) {
  ctx.idxbuf = nouveau_resource_map_offset(&nv50->base,
-nv04_resource(info->index.resource), info->start * 
info->index_size,
-NOUVEAU_BO_RD);
+nv04_resource(info->index.resource), 0, NOUVEAU_BO_RD);
   } else {
  ctx.idxbuf = info->index.user;
   }
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_vbo_translate.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_vbo_translate.c
index f05618f6596..256e20df2e4 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_vbo_translate.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_vbo_translate.c
@@ -84,13 +84,12 @@ nvc0_vertex_configure_translate(struct nvc0_context *nvc0, 
int32_t index_bias)
 
 static inline void
 nvc0_push_map_idxbuf(struct push_context *ctx, struct nvc0_context *nvc0,
- const struct pipe_draw_info *info,
- unsigned offset)
+ const struct pipe_draw_info *info)
 {
if (!info->has_user_indices) {
   struct nv04_resource *buf = nv04_resource(info->index.resource);
-  ctx->idxbuf = nouveau_resource_map_offset(&nvc0->base,
- buf, offset, NOUVEAU_BO_RD);
+  ctx->idxbuf = nouveau_resource_map_offset(
+&nvc0->base, buf, 0, NOUVEAU_BO_RD);
} else {
   ctx->idxbuf = info->index.user;
}
@@ -509,7 +508,7 @@ nvc0_push_vbo(struct nvc0_context *nvc0, const struct 
pipe_draw_info *info)
nvc0->state.prim_restart = info->primitive_restart;
 
if (info->index_size) {
-  nvc0_push_map_idxbuf(&ctx, nvc0, info, info->start * info->index_size);
+  nvc0_push_map_idxbuf(&ctx, nvc0, info);
   index_size = info->index_size;
} else {
   if (unlikely(info->count_from_stream_output)) {
-- 
2.13.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nvc0: fix push hint logic in presence of a start offset

2017-10-10 Thread Ilia Mirkin
Previously buffer offsets were passed in explicitly as an offset, which
had to be added to the resource address. Now they are passed in via an
increased 'start' parameter. As a result, we were double-adding the
start offset in this kind of situation.

This condition was triggered by piglit's draw-elements test which has a
requisite glMultiDrawElements in combination with a small enough number
of vertices to go through the immediate push path.

Reported-by: Karol Herbst 
Fixes: 330d0607ed6 ("gallium: remove pipe_index_buffer and set_index_buffer")
Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/nouveau/nvc0/nvc0_vbo_translate.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_vbo_translate.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_vbo_translate.c
index f05618f6596..256e20df2e4 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_vbo_translate.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_vbo_translate.c
@@ -84,13 +84,12 @@ nvc0_vertex_configure_translate(struct nvc0_context *nvc0, 
int32_t index_bias)
 
 static inline void
 nvc0_push_map_idxbuf(struct push_context *ctx, struct nvc0_context *nvc0,
- const struct pipe_draw_info *info,
- unsigned offset)
+ const struct pipe_draw_info *info)
 {
if (!info->has_user_indices) {
   struct nv04_resource *buf = nv04_resource(info->index.resource);
-  ctx->idxbuf = nouveau_resource_map_offset(&nvc0->base,
- buf, offset, NOUVEAU_BO_RD);
+  ctx->idxbuf = nouveau_resource_map_offset(
+&nvc0->base, buf, 0, NOUVEAU_BO_RD);
} else {
   ctx->idxbuf = info->index.user;
}
@@ -509,7 +508,7 @@ nvc0_push_vbo(struct nvc0_context *nvc0, const struct 
pipe_draw_info *info)
nvc0->state.prim_restart = info->primitive_restart;
 
if (info->index_size) {
-  nvc0_push_map_idxbuf(&ctx, nvc0, info, info->start * info->index_size);
+  nvc0_push_map_idxbuf(&ctx, nvc0, info);
   index_size = info->index_size;
} else {
   if (unlikely(info->count_from_stream_output)) {
-- 
2.13.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: fix component packing of array with single slot type

2017-10-10 Thread Timothy Arceri

On 11/10/17 11:30, Kenneth Graunke wrote:

On Tuesday, October 10, 2017 4:57:08 PM PDT Timothy Arceri wrote:

From: Kenneth Graunke 

ARB_enhanced_layouts enables us to pack array varyings with
non-arrays types e.g.

layout(location = 0) in vec3 a[6];
layout(location = 0, component = 3) in float b;

With this change we calculate the size of output registers in a
separate pass, before allocating them.

Reviewed-by: Timothy Arceri 
---
  src/intel/compiler/brw_fs_nir.cpp | 23 ++-
  1 file changed, 18 insertions(+), 5 deletions(-)


I actually sent this out last night, so I pulled your R-b from this
email and pushed the patch.  Thanks!



I looked for it but didn't see it :( All good.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] include/drm-uapi: clarify when headers can be updated.

2017-10-10 Thread Gurchetan Singh
Reviewed-by: Gurchetan Singh

On Tue, Oct 10, 2017 at 5:46 PM, Dave Airlie  wrote:

> From: Dave Airlie 
>
> Clarify when headers can be updated here.
>
> Signed-off-by: Dave Airlie 
> ---
>  include/drm-uapi/README | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/include/drm-uapi/README b/include/drm-uapi/README
> index d67fb562..27de91c 100644
> --- a/include/drm-uapi/README
> +++ b/include/drm-uapi/README
> @@ -3,6 +3,9 @@ required by the anv & i965 drivers to communicate with the
> kernel.
>  Whenever either of those driver needs new definitions for new kernel
>  APIs, these files should be updated.
>
> +These files in master should only be updated once the changes have landed
> +in the drm-next tree.
> +
>  You can copy files installed after running this from the kernel
>  repository, at version the drivers require :
>
> --
> 2.9.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] include/drm-uapi: clarify when headers can be updated.

2017-10-10 Thread Dave Airlie
From: Dave Airlie 

Clarify when headers can be updated here.

Signed-off-by: Dave Airlie 
---
 include/drm-uapi/README | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/drm-uapi/README b/include/drm-uapi/README
index d67fb562..27de91c 100644
--- a/include/drm-uapi/README
+++ b/include/drm-uapi/README
@@ -3,6 +3,9 @@ required by the anv & i965 drivers to communicate with the 
kernel.
 Whenever either of those driver needs new definitions for new kernel
 APIs, these files should be updated.
 
+These files in master should only be updated once the changes have landed
+in the drm-next tree.
+
 You can copy files installed after running this from the kernel
 repository, at version the drivers require :
 
-- 
2.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: fix component packing of array with single slot type

2017-10-10 Thread Kenneth Graunke
On Tuesday, October 10, 2017 4:57:08 PM PDT Timothy Arceri wrote:
> From: Kenneth Graunke 
> 
> ARB_enhanced_layouts enables us to pack array varyings with
> non-arrays types e.g.
> 
> layout(location = 0) in vec3 a[6];
> layout(location = 0, component = 3) in float b;
> 
> With this change we calculate the size of output registers in a
> separate pass, before allocating them.
> 
> Reviewed-by: Timothy Arceri 
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)

I actually sent this out last night, so I pulled your R-b from this
email and pushed the patch.  Thanks!


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Drop brw_bo_alloc in ARB_indirect_parameters implementation.

2017-10-10 Thread Kenneth Graunke
The original implementation allocated a new BO here, but we decided to
switch to intel_upload_space, which returns a reference to the current
upload BO.  We accidentally kept the brw_bo_alloc, even though it's no
longer necessary - intel_upload_space will immediately unreference it,
causing us to allocate and immediately free a buffer.

Cc: Plamena Manolova 
---
 src/mesa/drivers/dri/i965/brw_draw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index d6aa95b6f60..8f616e76c0d 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -903,7 +903,7 @@ brw_draw_prims(struct gl_context *ctx,
for (i = 0; i < nr_prims; i++) {
   /* Implementation of ARB_indirect_parameters via predicates */
   if (brw->draw.draw_params_count_bo) {
- struct brw_bo *draw_id_bo = brw_bo_alloc(brw->bufmgr, "draw_id", 4, 
4);
+ struct brw_bo *draw_id_bo;
  uint32_t draw_id_offset;
 
  intel_upload_data(brw, &prims[i].draw_id, 4, 4, &draw_id_bo,
-- 
2.14.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: bump loop unroll limit to 96.

2017-10-10 Thread Timothy Arceri

On 11/10/17 10:50, Dave Airlie wrote:

From: Dave Airlie 

With the ssao demo from Vulkan demos:
radv/rx480: 440->440fps
anv/haswell: 24->34 fps

The demo does a 0->32 loop across a ubo with 32 members.

Signed-off-by: Dave Airlie 


Reviewed-by: Timothy Arceri 


---
  src/compiler/nir/nir_opt_loop_unroll.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir_opt_loop_unroll.c 
b/src/compiler/nir/nir_opt_loop_unroll.c
index 79d04f9..dae5bfc 100644
--- a/src/compiler/nir/nir_opt_loop_unroll.c
+++ b/src/compiler/nir/nir_opt_loop_unroll.c
@@ -33,8 +33,10 @@
   * to give about the same results. Around 5 instructions per node.  But some
   * loops that would unroll with GLSL IR fail to unroll if we set this to 25 so
   * we set it to 26.
+ * This was bumped to 96 because it unrolled more loops with a positive
+ * effect (vulkan ssao demo).
   */
-#define LOOP_UNROLL_LIMIT 26
+#define LOOP_UNROLL_LIMIT 96
  
  /* Prepare this loop for unrolling by first converting to lcssa and then

   * converting the phis from the loops first block and the block that follows


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: fix component packing of array with single slot type

2017-10-10 Thread Timothy Arceri
From: Kenneth Graunke 

ARB_enhanced_layouts enables us to pack array varyings with
non-arrays types e.g.

layout(location = 0) in vec3 a[6];
layout(location = 0, component = 3) in float b;

With this change we calculate the size of output registers in a
separate pass, before allocating them.

Reviewed-by: Timothy Arceri 
---
 src/intel/compiler/brw_fs_nir.cpp | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 425c52c9917..7ed44f534c0 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -53,14 +53,27 @@ fs_visitor::nir_setup_outputs()
if (stage == MESA_SHADER_TESS_CTRL || stage == MESA_SHADER_FRAGMENT)
   return;
 
+   unsigned vec4s[VARYING_SLOT_TESS_MAX] = { 0, };
+
+   /* Calculate the size of output registers in a separate pass, before
+* allocating them.  With ARB_enhanced_layouts, multiple output variables
+* may occupy the same slot, but have different type sizes.
+*/
nir_foreach_variable(var, &nir->outputs) {
-  const unsigned vec4s =
+  const int loc = var->data.driver_location;
+  const unsigned var_vec4s =
  var->data.compact ? DIV_ROUND_UP(glsl_get_length(var->type), 4)
: type_size_vec4(var->type);
-  fs_reg reg = bld.vgrf(BRW_REGISTER_TYPE_F, 4 * vec4s);
-  for (unsigned i = 0; i < vec4s; i++) {
- if (outputs[var->data.driver_location + i].file == BAD_FILE)
-outputs[var->data.driver_location + i] = offset(reg, bld, 4 * i);
+  vec4s[loc] = MAX2(vec4s[loc], var_vec4s);
+   }
+
+   nir_foreach_variable(var, &nir->outputs) {
+  const int loc = var->data.driver_location;
+  if (outputs[loc].file == BAD_FILE) {
+ fs_reg reg = bld.vgrf(BRW_REGISTER_TYPE_F, 4 * vec4s[loc]);
+ for (unsigned i = 0; i < vec4s[loc]; i++) {
+outputs[loc + i] = offset(reg, bld, 4 * i);
+ }
   }
}
 }
-- 
2.13.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv: fix assert in wsi image code.

2017-10-10 Thread Jason Ekstrand
Bah...

Reviewed-by: Jason Ekstrand 

On Tue, Oct 10, 2017 at 4:47 PM, Dave Airlie  wrote:

> From: Dave Airlie 
>
> This assert was firing just running demos.
>
> Jason said it should be this.
>
> Fixes: 6c7720ed78 (anv/wsi: Allocate enough memory for the entire image)
> Signed-off-by: Dave Airlie 
> ---
>  src/intel/vulkan/anv_wsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/intel/vulkan/anv_wsi.c b/src/intel/vulkan/anv_wsi.c
> index fc0c288..08d83cd 100644
> --- a/src/intel/vulkan/anv_wsi.c
> +++ b/src/intel/vulkan/anv_wsi.c
> @@ -239,7 +239,7 @@ anv_wsi_image_create(VkDevice device_h,
> memory->bo->flags |= EXEC_OBJECT_WRITE;
>
> anv_BindImageMemory(device_h, image_h, memory_h, 0);
> -   assert(image->size == 0);
> +   assert(image->planes[0].offset == 0);
>
> struct anv_surface *surface = &image->planes[0].surface;
> assert(surface->isl.tiling == ISL_TILING_X);
> --
> 2.9.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir: bump loop unroll limit to 96.

2017-10-10 Thread Dave Airlie
From: Dave Airlie 

With the ssao demo from Vulkan demos:
radv/rx480: 440->440fps
anv/haswell: 24->34 fps

The demo does a 0->32 loop across a ubo with 32 members.

Signed-off-by: Dave Airlie 
---
 src/compiler/nir/nir_opt_loop_unroll.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir_opt_loop_unroll.c 
b/src/compiler/nir/nir_opt_loop_unroll.c
index 79d04f9..dae5bfc 100644
--- a/src/compiler/nir/nir_opt_loop_unroll.c
+++ b/src/compiler/nir/nir_opt_loop_unroll.c
@@ -33,8 +33,10 @@
  * to give about the same results. Around 5 instructions per node.  But some
  * loops that would unroll with GLSL IR fail to unroll if we set this to 25 so
  * we set it to 26.
+ * This was bumped to 96 because it unrolled more loops with a positive
+ * effect (vulkan ssao demo).
  */
-#define LOOP_UNROLL_LIMIT 26
+#define LOOP_UNROLL_LIMIT 96
 
 /* Prepare this loop for unrolling by first converting to lcssa and then
  * converting the phis from the loops first block and the block that follows
-- 
2.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] anv: fix assert in wsi image code.

2017-10-10 Thread Dave Airlie
From: Dave Airlie 

This assert was firing just running demos.

Jason said it should be this.

Fixes: 6c7720ed78 (anv/wsi: Allocate enough memory for the entire image)
Signed-off-by: Dave Airlie 
---
 src/intel/vulkan/anv_wsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_wsi.c b/src/intel/vulkan/anv_wsi.c
index fc0c288..08d83cd 100644
--- a/src/intel/vulkan/anv_wsi.c
+++ b/src/intel/vulkan/anv_wsi.c
@@ -239,7 +239,7 @@ anv_wsi_image_create(VkDevice device_h,
memory->bo->flags |= EXEC_OBJECT_WRITE;
 
anv_BindImageMemory(device_h, image_h, memory_h, 0);
-   assert(image->size == 0);
+   assert(image->planes[0].offset == 0);
 
struct anv_surface *surface = &image->planes[0].surface;
assert(surface->isl.tiling == ISL_TILING_X);
-- 
2.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Android: fix build break from r600/radeon split

2017-10-10 Thread Marek Olšák
Acked-by: Marek Olšák 

Marek

On Wed, Oct 11, 2017 at 12:07 AM, Rob Herring  wrote:
> Commit 06bfb2d28f7a ("r600: fork and import gallium/radeon") broke the
> Android build:
>
> external/mesa3d/src/gallium/drivers/radeon/r600_pipe_common.c:43:10: fatal 
> error: 'llvm-c/TargetMachine.h' file not found
>  ^~~~
>
> Update the Android makefiles so that drivers/radeon is only built when
> radeonsi (and therefore LLVM) is enabled.
>
> Fixes: 06bfb2d28f7a (r600: fork and import gallium/radeon)
> Cc: Marek Olšák 
> Signed-off-by: Rob Herring 
> ---
>  src/gallium/Android.mk| 2 +-
>  src/gallium/drivers/r600/Android.mk   | 4 
>  src/gallium/drivers/radeon/Android.mk | 2 +-
>  3 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/Android.mk b/src/gallium/Android.mk
> index 97cb61a356ef..64e1e4513451 100644
> --- a/src/gallium/Android.mk
> +++ b/src/gallium/Android.mk
> @@ -41,7 +41,7 @@ SUBDIRS += winsys/i915/drm drivers/i915
>  SUBDIRS += winsys/nouveau/drm drivers/nouveau
>  SUBDIRS += winsys/pl111/drm drivers/pl111
>  SUBDIRS += winsys/radeon/drm drivers/r300
> -SUBDIRS += winsys/radeon/drm drivers/r600 drivers/radeon
> +SUBDIRS += winsys/radeon/drm drivers/r600
>  SUBDIRS += winsys/radeon/drm winsys/amdgpu/drm drivers/radeonsi 
> drivers/radeon
>  SUBDIRS += winsys/vc4/drm drivers/vc4
>  SUBDIRS += winsys/virgl/drm winsys/virgl/vtest drivers/virgl
> diff --git a/src/gallium/drivers/r600/Android.mk 
> b/src/gallium/drivers/r600/Android.mk
> index 1683cfa09c9e..9f684cf2445e 100644
> --- a/src/gallium/drivers/r600/Android.mk
> +++ b/src/gallium/drivers/r600/Android.mk
> @@ -45,6 +45,10 @@ $(intermediates)/egd_tables.h: 
> $(MESA_TOP)/src/gallium/drivers/r600/egd_tables.p
> @echo "Gen Header: $(PRIVATE_MODULE) <= $(notdir $(@))"
> $(hide) $(MESA_PYTHON2) 
> $(MESA_TOP)/src/gallium/drivers/r600/egd_tables.py 
> $(MESA_TOP)/src/gallium/drivers/r600/evergreend.h > $@
>
> +ifeq ($(MESA_ENABLE_LLVM),true)
> +$(call mesa-build-with-llvm)
> +endif
> +
>  include $(GALLIUM_COMMON_MK)
>  include $(BUILD_STATIC_LIBRARY)
>
> diff --git a/src/gallium/drivers/radeon/Android.mk 
> b/src/gallium/drivers/radeon/Android.mk
> index c2d3a1cbce60..578ab0be91f9 100644
> --- a/src/gallium/drivers/radeon/Android.mk
> +++ b/src/gallium/drivers/radeon/Android.mk
> @@ -41,7 +41,7 @@ endif
>  include $(GALLIUM_COMMON_MK)
>  include $(BUILD_STATIC_LIBRARY)
>
> -ifneq ($(HAVE_GALLIUM_R600)$(HAVE_GALLIUM_RADEONSI),)
> +ifneq ($(HAVE_GALLIUM_RADEONSI),)
>  $(eval GALLIUM_LIBS += $(LOCAL_MODULE))
>  $(eval GALLIUM_SHARED_LIBS += $(LOCAL_SHARED_LIBRARIES))
>  endif
> --
> 2.11.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] gallium: clarify the constraints on sampler_view_destroy

2017-10-10 Thread Gurchetan Singh
"The GL state tracker, which is the only one that runs into the
multi-context subtleties (due to share groups), already guarantees that
sampler views are destroyed before their context of creation is destroyed."

How does the GL state tracker guarantee this?  Does this guarantee also
apply to pipe_surfaces?  Context: ChromiumOS previously ran into an issue
where sampler_views and pipe_surfaces outlived the context that created
them [1], and we have technical debt [2][3][4] emanating from this.  It
would be great to remove this, if things have changed in the Gallium code.

[1]
https://lists.freedesktop.org/archives/mesa-dev/2011-September/011578.html
[2]
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/master/media-libs/mesa/files/10.3-state_tracker-gallium-fix-crash-with-st_renderbuffer.patch
[3]
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/master/media-libs/mesa/files/10.3-state_tracker-gallium-fix-crash-with-st_renderbuffer-freedreno.patch
[4]
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/master/media-libs/mesa/files/12.1-radeonsi-sampler_view_destroy.patch

On Tue, Oct 10, 2017 at 2:42 PM, Nicolai Hähnle  wrote:

> Same as on IRC:
>
> On 10.10.2017 04:06, Marek Olšák wrote:
>
>> Is there any difference with piglit/drawoverhead?
>>
>
> Yes, there is.
>
>
> If yes, would this be useful?
>> https://patchwork.freedesktop.org/patch/41241/
>>
>
> Surprisingly, not that much. I'm going to think though a couple of other
> options, but want to already push patch #3; I sent a version that is
> rebased on master and disentangled from the locking stuff.
>
> Cheers,
> Nicolai
>
>
>
>> Marek
>>
>>
>> On Fri, Oct 6, 2017 at 10:38 PM, Nicolai Hähnle 
>> wrote:
>>
>>> From: Nicolai Hähnle 
>>>
>>> r600 expects the context that created the sampler view to still be alive
>>> (there is a per-context list of sampler views).
>>>
>>> svga currently bails when the context of destruction is not the same as
>>> creation.
>>>
>>> The GL state tracker, which is the only one that runs into the
>>> multi-context subtleties (due to share groups), already guarantees that
>>> sampler views are destroyed before their context of creation is
>>> destroyed.
>>>
>>> Most drivers are context-agnostic, so the warning message in
>>> pipe_sampler_view_release doesn't really make sense.
>>> ---
>>>   src/gallium/auxiliary/util/u_inlines.h   | 16 ++--
>>>   src/gallium/include/pipe/p_context.h | 10 ++
>>>   src/mesa/state_tracker/st_sampler_view.c |  1 -
>>>   3 files changed, 20 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/util/u_inlines.h
>>> b/src/gallium/auxiliary/util/u_inlines.h
>>> index 79f62c32266..790352d7800 100644
>>> --- a/src/gallium/auxiliary/util/u_inlines.h
>>> +++ b/src/gallium/auxiliary/util/u_inlines.h
>>> @@ -142,45 +142,49 @@ pipe_resource_reference(struct pipe_resource
>>> **ptr, struct pipe_resource *tex)
>>>struct pipe_resource *next = old_tex->next;
>>>
>>>old_tex->screen->resource_destroy(old_tex->screen, old_tex);
>>>old_tex = next;
>>> } while (pipe_reference_described(&old_tex->reference, NULL,
>>>   (debug_reference_descriptor)de
>>> bug_describe_resource));
>>>  }
>>>  *ptr = tex;
>>>   }
>>>
>>> +/**
>>> + * Set *ptr to \p view with proper reference counting.
>>> + *
>>> + * The caller must guarantee that \p view and *ptr must have been
>>> created in
>>> + * the same context (if they exist), and that this must be the current
>>> context.
>>> + */
>>>   static inline void
>>>   pipe_sampler_view_reference(struct pipe_sampler_view **ptr, struct
>>> pipe_sampler_view *view)
>>>   {
>>>  struct pipe_sampler_view *old_view = *ptr;
>>>
>>>  if (pipe_reference_described(&(*ptr)->reference, &view->reference,
>>>   (debug_reference_descriptor)de
>>> bug_describe_sampler_view))
>>> old_view->context->sampler_view_destroy(old_view->context,
>>> old_view);
>>>  *ptr = view;
>>>   }
>>>
>>>   /**
>>>* Similar to pipe_sampler_view_reference() but always set the pointer
>>> to
>>> - * NULL and pass in an explicit context.  Passing an explicit context
>>> is a
>>> - * work-around for fixing a dangling context pointer problem when
>>> textures
>>> - * are shared by multiple contexts.  XXX fix this someday.
>>> + * NULL and pass in the current context explicitly.
>>> + *
>>> + * If *ptr is non-NULL, it may refer to a view that was created in a
>>> different
>>> + * context (however, that context must still be alive).
>>>*/
>>>   static inline void
>>>   pipe_sampler_view_release(struct pipe_context *ctx,
>>> struct pipe_sampler_view **ptr)
>>>   {
>>>  struct pipe_sampler_view *old_view = *ptr;
>>> -   if (*ptr && (*ptr)->context != ctx) {
>>> -  debug_printf_once(("context mis-match in
>>> pipe_sampler_view_re

[Mesa-dev] [PATCH] meson: fix glx test

2017-10-10 Thread Dylan Baker
That requires a generated header that was rolled into a loop.

fixes: a47c525f3281a27 ("meson: build glx")
Signed-off-by: Dylan Baker 
---
 src/mapi/glapi/gen/meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mapi/glapi/gen/meson.build b/src/mapi/glapi/gen/meson.build
index 79aa2accc2a..4360346edad 100644
--- a/src/mapi/glapi/gen/meson.build
+++ b/src/mapi/glapi/gen/meson.build
@@ -239,6 +239,7 @@ foreach x : [['indirect_size.h', ['-m', 'size_h', 
'--header-tag', '_INDIRECT_SIZ
 capture : true,
   )
 endforeach
+glx_indirect_size_h = glx_generated[3]
 
 glapi_x86_s = custom_target(
   'glapi_x86.S',
-- 
2.14.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/6] i965/tex: Use blorp texture upload for all CCS_E textures

2017-10-10 Thread Kenneth Graunke
From: Jason Ekstrand 

This improves the FillTex benchmark in GLBench 2.7 by 30% on my Broxton.
On Ken's Broxton which only has single-channel ram, it improves by 210%.

v2 (Ken): Check mt->aux_usage == ISL_AUX_USAGE_CCS_E rather than using
  intel_miptree_is_lossless_compressed().
---
 src/mesa/drivers/dri/i965/intel_tex_image.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 9ae27c70280..ab3a4fb0615 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -329,7 +329,8 @@ intel_upload_tex(struct gl_context * ctx,
if (mt && mt->format == MESA_FORMAT_S_UINT8)
   mt->r8stencil_needs_update = true;
 
-   if (_mesa_is_bufferobj(packing->BufferObj) || tex_busy) {
+   if (_mesa_is_bufferobj(packing->BufferObj) || tex_busy ||
+   mt->aux_usage == ISL_AUX_USAGE_CCS_E) {
   ok = intel_texsubimage_blorp(brw, dims, texImage,
xoffset, yoffset, zoffset,
width, height, depth, format, type,
-- 
2.14.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/6] i965: Use blorp instead of meta for PBO texture uploads

2017-10-10 Thread Kenneth Graunke
From: Jason Ekstrand 

---
 src/mesa/drivers/dri/i965/intel_tex_image.c | 34 +
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 69860e28e3b..9ae27c70280 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -26,6 +26,7 @@
 #include "intel_image.h"
 #include "intel_tiled_memcpy.h"
 #include "brw_context.h"
+#include "brw_blorp.h"
 
 #define FILE_DEBUG_FLAG DEBUG_TEXTURE
 
@@ -127,6 +128,28 @@ intel_miptree_create_for_teximage(struct brw_context *brw,
flags);
 }
 
+static bool
+intel_texsubimage_blorp(struct brw_context *brw, GLuint dims,
+struct gl_texture_image *tex_image,
+unsigned x, unsigned y, unsigned z,
+unsigned width, unsigned height, unsigned depth,
+GLenum format, GLenum type, const void *pixels,
+const struct gl_pixelstore_attrib *packing)
+{
+   struct intel_texture_image *intel_image = intel_texture_image(tex_image);
+   const unsigned mt_level = tex_image->Level + tex_image->TexObject->MinLevel;
+   const unsigned mt_z = tex_image->TexObject->MinLayer + tex_image->Face + z;
+
+   /* The blorp path can't understand crazy format hackery */
+   if (_mesa_base_tex_format(&brw->ctx, tex_image->InternalFormat) !=
+   _mesa_get_format_base_format(tex_image->TexFormat))
+  return false;
+
+   return brw_blorp_upload_miptree(brw, intel_image->mt, tex_image->TexFormat,
+   mt_level, x, y, mt_z, width, height, depth,
+   tex_image->TexObject->Target, format, type,
+   pixels, packing);
+}
 
 /**
  * \brief A fast path for glTexImage and glTexSubImage.
@@ -293,6 +316,7 @@ intel_upload_tex(struct gl_context * ctx,
  const GLvoid * pixels,
  const struct gl_pixelstore_attrib *packing)
 {
+   struct brw_context *brw = brw_context(ctx);
struct intel_mipmap_tree *mt = intel_texture_image(texImage)->mt;
bool ok;
 
@@ -305,12 +329,14 @@ intel_upload_tex(struct gl_context * ctx,
if (mt && mt->format == MESA_FORMAT_S_UINT8)
   mt->r8stencil_needs_update = true;
 
-   ok = _mesa_meta_pbo_TexSubImage(ctx, dims, texImage,
+   if (_mesa_is_bufferobj(packing->BufferObj) || tex_busy) {
+  ok = intel_texsubimage_blorp(brw, dims, texImage,
xoffset, yoffset, zoffset,
width, height, depth, format, type,
-   pixels, tex_busy, packing);
-   if (ok)
-  return;
+   pixels, packing);
+  if (ok)
+ return;
+   }
 
ok = intel_texsubimage_tiled_memcpy(ctx, dims, texImage,
xoffset, yoffset, zoffset,
-- 
2.14.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/6] i965/barrier: Do the correct flushes for framebuffer access

2017-10-10 Thread Kenneth Graunke
From: Jason Ekstrand 

Framebuffer access includes framebuffer reads so we need to invalidate
the texture cache.  We do not, however, need to flush the depth cache
because you cannot do bind a depth texture as an image.
---
 src/mesa/drivers/dri/i965/brw_program.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
b/src/mesa/drivers/dri/i965/brw_program.c
index 0b2622a2320..82d38f12f67 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -274,7 +274,7 @@ brw_memory_barrier(struct gl_context *ctx, GLbitfield 
barriers)
PIPE_CONTROL_RENDER_TARGET_FLUSH);
 
if (barriers & GL_FRAMEBUFFER_BARRIER_BIT)
-  bits |= (PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+  bits |= (PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
PIPE_CONTROL_RENDER_TARGET_FLUSH);
 
/* Typed surface messages are handled by the render cache on IVB, so we
-- 
2.14.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/6] i965/barrier: Do the correct flushes for texture updates

2017-10-10 Thread Kenneth Graunke
From: Jason Ekstrand 

Texture uploads and downloads may go through the render pipe which may
result in texturing from or rendering to the texture or the PBO.  We
need to flush accordingly.
---
 src/mesa/drivers/dri/i965/brw_program.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
b/src/mesa/drivers/dri/i965/brw_program.c
index 9ec2917c90e..0b2622a2320 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -268,8 +268,10 @@ brw_memory_barrier(struct gl_context *ctx, GLbitfield 
barriers)
if (barriers & GL_TEXTURE_FETCH_BARRIER_BIT)
   bits |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
 
-   if (barriers & GL_TEXTURE_UPDATE_BARRIER_BIT)
-  bits |= PIPE_CONTROL_RENDER_TARGET_FLUSH;
+   if (barriers & (GL_TEXTURE_UPDATE_BARRIER_BIT|
+   GL_PIXEL_BUFFER_BARRIER_BIT))
+  bits |= (PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
+   PIPE_CONTROL_RENDER_TARGET_FLUSH);
 
if (barriers & GL_FRAMEBUFFER_BARRIER_BIT)
   bits |= (PIPE_CONTROL_DEPTH_CACHE_FLUSH |
-- 
2.14.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/6] i965/tex: Check if there is data to upload up-front

2017-10-10 Thread Kenneth Graunke
From: Jason Ekstrand 

---
 src/mesa/drivers/dri/i965/intel_tex_image.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 7396597d9f9..69860e28e3b 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -296,6 +296,10 @@ intel_upload_tex(struct gl_context * ctx,
struct intel_mipmap_tree *mt = intel_texture_image(texImage)->mt;
bool ok;
 
+   /* Check that there is actually data to store. */
+   if (pixels == NULL && !_mesa_is_bufferobj(packing->BufferObj))
+  return;
+
bool tex_busy = mt && brw_bo_busy(mt->bo);
 
if (mt && mt->format == MESA_FORMAT_S_UINT8)
-- 
2.14.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/6] i965: Add blorp-based texture upload path

2017-10-10 Thread Kenneth Graunke
From: Topi Pohjolainen 

v2:
   - Fix return value (s/MESA_FORMAT_NONE/false/) (Anuj)
   - Move _mesa_tex_format_from_format_and_type() just
 in the end avoiding additional if-block (Anuj)
   - Explain better the array alignment restriction (Anuj)
   - Do not bail out in case of gl_pixelstore_attrib::ImageHeight,
 it is handled by _mesa_image_offset() automatically (Ken).
   - Support 1D_ARRAY by flipping depth, width and y, z (Ken).

v3:
   - Contrary to v2, do not try to handle
 gl_pixelstore_attrib::ImageHeight. Currently there are no
 tests in piglit or cts for it. One could possibly copy or
 modify tests/texturing/texsubimage.c. There, however, seems
 to be number of corner cases to consider. Moreover, current
 meta path applies the packing height for both source and
 targets when determining the offset. This would probably
 require re-visiting also.

v4: Rebased on top of merged drm-bacon

v5 (Jason Ekstrand):
   - Move to brw_blorp.c
   - Significant refactoring
   - Fixed 1-D array textures
   - Simplified handling of PBOs vs. CPU data.
   - Handle gl_pixelstore_attrib::ImageHeight.  It turns out there are
 piglit tests that cover this. The original version was failing them
 because of an error in the way it handled 1-D array textures.
   - Add support for texture download

v6 (Ken): Rebase fixes:
   - Use intel_miptree_check_level_layer instead of deleted fields
   - Update for mesa_format_supports_render[] rename.
   - Pass 'false' (read-only) to intel_bufferobj_buffer

Cc: Topi Pohjolainen 
Cc: Kenneth Graunke 
Signed-off-by: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/brw_blorp.c | 331 ++
 src/mesa/drivers/dri/i965/brw_blorp.h |  20 ++
 2 files changed, 351 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
b/src/mesa/drivers/dri/i965/brw_blorp.c
index 835dbd2af36..5f3e437be3c 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -24,7 +24,10 @@
 #include "main/context.h"
 #include "main/teximage.h"
 #include "main/blend.h"
+#include "main/bufferobj.h"
+#include "main/enums.h"
 #include "main/fbobject.h"
+#include "main/image.h"
 #include "main/renderbuffer.h"
 #include "main/glformats.h"
 
@@ -33,6 +36,7 @@
 #include "brw_defines.h"
 #include "brw_meta_util.h"
 #include "brw_state.h"
+#include "intel_buffer_objects.h"
 #include "intel_fbo.h"
 #include "common/gen_debug.h"
 
@@ -754,6 +758,333 @@ brw_blorp_framebuffer(struct brw_context *brw,
return mask;
 }
 
+static struct brw_bo *
+blorp_get_client_bo(struct brw_context *brw,
+unsigned w, unsigned h, unsigned d,
+GLenum target, GLenum format, GLenum type,
+const void *pixels,
+const struct gl_pixelstore_attrib *packing,
+uint32_t *offset_out, uint32_t *row_stride_out,
+uint32_t *image_stride_out, bool read_only)
+{
+   /* Account for SKIP_PIXELS, SKIP_ROWS, ALIGNMENT, and SKIP_IMAGES */
+   const GLuint dims = _mesa_get_texture_dimensions(target);
+   const uint32_t first_pixel = _mesa_image_offset(dims, packing, w, h,
+   format, type, 0, 0, 0);
+   const uint32_t last_pixel =  _mesa_image_offset(dims, packing, w, h,
+   format, type,
+   d - 1, h - 1, w);
+   const uint32_t stride = _mesa_image_row_stride(packing, w, format, type);
+   const uint32_t cpp = _mesa_bytes_per_pixel(format, type);
+   const uint32_t size = last_pixel - first_pixel;
+
+   *row_stride_out = stride;
+   *image_stride_out = _mesa_image_image_stride(packing, w, h, format, type);
+
+   if (_mesa_is_bufferobj(packing->BufferObj)) {
+  const uint32_t offset = first_pixel + (intptr_t)pixels;
+  if (!read_only && ((offset % cpp) || (stride % cpp))) {
+ perf_debug("Bad PBO alignment; fallback to CPU mapping\n");
+ return false;
+  }
+
+  /* This is a user-provided PBO. We just need to get the BO out */
+  struct intel_buffer_object *intel_pbo =
+ intel_buffer_object(packing->BufferObj);
+  struct brw_bo *bo =
+ intel_bufferobj_buffer(brw, intel_pbo, offset, size, false);
+
+  /* We take a reference to the BO so that the caller can just always
+   * unref without having to worry about whether it's a user PBO or one
+   * we created.
+   */
+  brw_bo_reference(bo);
+
+  *offset_out = offset;
+  return bo;
+   } else {
+  /* Someone should have already checked that there is data to upload. */
+  assert(pixels);
+
+  /* Creating a temp buffer currently only works for upload */
+  assert(read_only);
+
+  /* This is not a user-provided PBO.  Instead, pixels is a pointer to CPU
+   * data which we need to copy into a BO.
+   */
+  struct brw_bo *bo =
+ brw_b

[Mesa-dev] [PATCH] Android: fix build break from r600/radeon split

2017-10-10 Thread Rob Herring
Commit 06bfb2d28f7a ("r600: fork and import gallium/radeon") broke the
Android build:

external/mesa3d/src/gallium/drivers/radeon/r600_pipe_common.c:43:10: fatal 
error: 'llvm-c/TargetMachine.h' file not found
 ^~~~

Update the Android makefiles so that drivers/radeon is only built when
radeonsi (and therefore LLVM) is enabled.

Fixes: 06bfb2d28f7a (r600: fork and import gallium/radeon)
Cc: Marek Olšák 
Signed-off-by: Rob Herring 
---
 src/gallium/Android.mk| 2 +-
 src/gallium/drivers/r600/Android.mk   | 4 
 src/gallium/drivers/radeon/Android.mk | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/gallium/Android.mk b/src/gallium/Android.mk
index 97cb61a356ef..64e1e4513451 100644
--- a/src/gallium/Android.mk
+++ b/src/gallium/Android.mk
@@ -41,7 +41,7 @@ SUBDIRS += winsys/i915/drm drivers/i915
 SUBDIRS += winsys/nouveau/drm drivers/nouveau
 SUBDIRS += winsys/pl111/drm drivers/pl111
 SUBDIRS += winsys/radeon/drm drivers/r300
-SUBDIRS += winsys/radeon/drm drivers/r600 drivers/radeon
+SUBDIRS += winsys/radeon/drm drivers/r600
 SUBDIRS += winsys/radeon/drm winsys/amdgpu/drm drivers/radeonsi drivers/radeon
 SUBDIRS += winsys/vc4/drm drivers/vc4
 SUBDIRS += winsys/virgl/drm winsys/virgl/vtest drivers/virgl
diff --git a/src/gallium/drivers/r600/Android.mk 
b/src/gallium/drivers/r600/Android.mk
index 1683cfa09c9e..9f684cf2445e 100644
--- a/src/gallium/drivers/r600/Android.mk
+++ b/src/gallium/drivers/r600/Android.mk
@@ -45,6 +45,10 @@ $(intermediates)/egd_tables.h: 
$(MESA_TOP)/src/gallium/drivers/r600/egd_tables.p
@echo "Gen Header: $(PRIVATE_MODULE) <= $(notdir $(@))"
$(hide) $(MESA_PYTHON2) 
$(MESA_TOP)/src/gallium/drivers/r600/egd_tables.py 
$(MESA_TOP)/src/gallium/drivers/r600/evergreend.h > $@
 
+ifeq ($(MESA_ENABLE_LLVM),true)
+$(call mesa-build-with-llvm)
+endif
+
 include $(GALLIUM_COMMON_MK)
 include $(BUILD_STATIC_LIBRARY)
 
diff --git a/src/gallium/drivers/radeon/Android.mk 
b/src/gallium/drivers/radeon/Android.mk
index c2d3a1cbce60..578ab0be91f9 100644
--- a/src/gallium/drivers/radeon/Android.mk
+++ b/src/gallium/drivers/radeon/Android.mk
@@ -41,7 +41,7 @@ endif
 include $(GALLIUM_COMMON_MK)
 include $(BUILD_STATIC_LIBRARY)
 
-ifneq ($(HAVE_GALLIUM_R600)$(HAVE_GALLIUM_RADEONSI),)
+ifneq ($(HAVE_GALLIUM_RADEONSI),)
 $(eval GALLIUM_LIBS += $(LOCAL_MODULE))
 $(eval GALLIUM_SHARED_LIBS += $(LOCAL_SHARED_LIBRARIES))
 endif
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/mesa: store state that affects sampler views per context

2017-10-10 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Tue, Oct 10, 2017 at 11:40 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> This fixes sequences like:
>
> 1. Context 1 samples from texture with sRGB decode enabled
> 2. Context 2 samples from texture with sRGB decode disabled
> 3. Context 1 samples from texture with sRGB decode disabled
>
> Previously, step 3 would see the prev_sRGBDecode value from context 2
> and would incorrectly use the old sampler view with sRGB decode enabled.
> ---
> This patch is originally from my texture locking series, which I'm delaying
> to take a closer look at performance.
>
> It is independent from the locking stuff, so I rebased it on master as I'd
> like to push it earlier and would appreciate a separate review.
>
> Cheers,
> Nicolai
> ---
>  src/mesa/state_tracker/st_atom_sampler.c |  4 +--
>  src/mesa/state_tracker/st_atom_texture.c | 12 
>  src/mesa/state_tracker/st_sampler_view.c | 49 
> ++--
>  src/mesa/state_tracker/st_texture.h  | 20 +
>  4 files changed, 44 insertions(+), 41 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_atom_sampler.c 
> b/src/mesa/state_tracker/st_atom_sampler.c
> index d9e8de3c9e0..ff3f49fa4e4 100644
> --- a/src/mesa/state_tracker/st_atom_sampler.c
> +++ b/src/mesa/state_tracker/st_atom_sampler.c
> @@ -163,22 +163,22 @@ st_convert_sampler(const struct st_context *st,
>GLenum texBaseFormat = _mesa_base_tex_image(texobj)->_BaseFormat;
>
>if (st->apply_texture_swizzle_to_border_color) {
>   const struct st_texture_object *stobj = 
> st_texture_object_const(texobj);
>   const struct pipe_sampler_view *sv = NULL;
>
>   /* Just search for the first used view. We can do this because the
>  swizzle is per-texture, not per context. */
>   /* XXX: clean that up to not use the sampler view at all */
>   for (unsigned i = 0; i < stobj->num_sampler_views; ++i) {
> -if (stobj->sampler_views[i]) {
> -   sv = stobj->sampler_views[i];
> +if (stobj->sampler_views[i].view) {
> +   sv = stobj->sampler_views[i].view;
> break;
>  }
>   }
>
>   if (sv) {
>  union pipe_color_union tmp;
>  const unsigned char swz[4] =
>  {
> sv->swizzle_r,
> sv->swizzle_g,
> diff --git a/src/mesa/state_tracker/st_atom_texture.c 
> b/src/mesa/state_tracker/st_atom_texture.c
> index 81bf62908f1..90828bb4cf9 100644
> --- a/src/mesa/state_tracker/st_atom_texture.c
> +++ b/src/mesa/state_tracker/st_atom_texture.c
> @@ -77,32 +77,20 @@ st_update_single_texture(struct st_context *st,
>return;
> }
>
> if (!st_finalize_texture(ctx, st->pipe, texObj, 0) ||
> !stObj->pt) {
>/* out of mem */
>*sampler_view = NULL;
>return;
> }
>
> -   /* Check a few pieces of state outside the texture object to see if we
> -* need to force revalidation.
> -*/
> -   if (stObj->prev_glsl130_or_later != glsl130_or_later ||
> -   stObj->prev_sRGBDecode != samp->sRGBDecode) {
> -
> -  st_texture_release_all_sampler_views(st, stObj);
> -
> -  stObj->prev_glsl130_or_later = glsl130_or_later;
> -  stObj->prev_sRGBDecode = samp->sRGBDecode;
> -   }
> -
> if (texObj->TargetIndex == TEXTURE_EXTERNAL_INDEX &&
> stObj->pt->screen->resource_changed)
>   stObj->pt->screen->resource_changed(stObj->pt->screen, stObj->pt);
>
> *sampler_view =
>st_get_texture_sampler_view_from_stobj(st, stObj, samp,
>   glsl130_or_later);
>  }
>
>
> diff --git a/src/mesa/state_tracker/st_sampler_view.c 
> b/src/mesa/state_tracker/st_sampler_view.c
> index 014b4d26784..99c4f74ae08 100644
> --- a/src/mesa/state_tracker/st_sampler_view.c
> +++ b/src/mesa/state_tracker/st_sampler_view.c
> @@ -40,70 +40,70 @@
>  #include "st_format.h"
>  #include "st_cb_bufferobjects.h"
>  #include "st_cb_texture.h"
>
>
>  /**
>   * Try to find a matching sampler view for the given context.
>   * If none is found an empty slot is initialized with a
>   * template and returned instead.
>   */
> -static struct pipe_sampler_view **
> +static struct st_sampler_view *
>  st_texture_get_sampler_view(struct st_context *st,
>  struct st_texture_object *stObj)
>  {
> -   struct pipe_sampler_view **free = NULL;
> +   struct st_sampler_view *free = NULL;
> GLuint i;
>
> for (i = 0; i < stObj->num_sampler_views; ++i) {
> -  struct pipe_sampler_view **sv = &stObj->sampler_views[i];
> +  struct st_sampler_view *sv = &stObj->sampler_views[i];
>/* Is the array entry used ? */
> -  if (*sv) {
> +  if (sv->view) {
>   /* check if the context matches */
> - if ((*sv)->context == st->pipe) {
> + if (sv->view->context == st->pipe) {
>  return sv;
>   }
>}

Re: [Mesa-dev] [PATCH 1/6] gallium: clarify the constraints on sampler_view_destroy

2017-10-10 Thread Nicolai Hähnle

Same as on IRC:

On 10.10.2017 04:06, Marek Olšák wrote:

Is there any difference with piglit/drawoverhead?


Yes, there is.



If yes, would this be useful?
https://patchwork.freedesktop.org/patch/41241/


Surprisingly, not that much. I'm going to think though a couple of other 
options, but want to already push patch #3; I sent a version that is 
rebased on master and disentangled from the locking stuff.


Cheers,
Nicolai



Marek


On Fri, Oct 6, 2017 at 10:38 PM, Nicolai Hähnle  wrote:

From: Nicolai Hähnle 

r600 expects the context that created the sampler view to still be alive
(there is a per-context list of sampler views).

svga currently bails when the context of destruction is not the same as
creation.

The GL state tracker, which is the only one that runs into the
multi-context subtleties (due to share groups), already guarantees that
sampler views are destroyed before their context of creation is destroyed.

Most drivers are context-agnostic, so the warning message in
pipe_sampler_view_release doesn't really make sense.
---
  src/gallium/auxiliary/util/u_inlines.h   | 16 ++--
  src/gallium/include/pipe/p_context.h | 10 ++
  src/mesa/state_tracker/st_sampler_view.c |  1 -
  3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_inlines.h 
b/src/gallium/auxiliary/util/u_inlines.h
index 79f62c32266..790352d7800 100644
--- a/src/gallium/auxiliary/util/u_inlines.h
+++ b/src/gallium/auxiliary/util/u_inlines.h
@@ -142,45 +142,49 @@ pipe_resource_reference(struct pipe_resource **ptr, 
struct pipe_resource *tex)
   struct pipe_resource *next = old_tex->next;

   old_tex->screen->resource_destroy(old_tex->screen, old_tex);
   old_tex = next;
} while (pipe_reference_described(&old_tex->reference, NULL,
  
(debug_reference_descriptor)debug_describe_resource));
 }
 *ptr = tex;
  }

+/**
+ * Set *ptr to \p view with proper reference counting.
+ *
+ * The caller must guarantee that \p view and *ptr must have been created in
+ * the same context (if they exist), and that this must be the current context.
+ */
  static inline void
  pipe_sampler_view_reference(struct pipe_sampler_view **ptr, struct 
pipe_sampler_view *view)
  {
 struct pipe_sampler_view *old_view = *ptr;

 if (pipe_reference_described(&(*ptr)->reference, &view->reference,
  
(debug_reference_descriptor)debug_describe_sampler_view))
old_view->context->sampler_view_destroy(old_view->context, old_view);
 *ptr = view;
  }

  /**
   * Similar to pipe_sampler_view_reference() but always set the pointer to
- * NULL and pass in an explicit context.  Passing an explicit context is a
- * work-around for fixing a dangling context pointer problem when textures
- * are shared by multiple contexts.  XXX fix this someday.
+ * NULL and pass in the current context explicitly.
+ *
+ * If *ptr is non-NULL, it may refer to a view that was created in a different
+ * context (however, that context must still be alive).
   */
  static inline void
  pipe_sampler_view_release(struct pipe_context *ctx,
struct pipe_sampler_view **ptr)
  {
 struct pipe_sampler_view *old_view = *ptr;
-   if (*ptr && (*ptr)->context != ctx) {
-  debug_printf_once(("context mis-match in 
pipe_sampler_view_release()\n"));
-   }
 if (pipe_reference_described(&(*ptr)->reference, NULL,
  (debug_reference_descriptor)debug_describe_sampler_view)) 
{
ctx->sampler_view_destroy(ctx, old_view);
 }
 *ptr = NULL;
  }

  static inline void
  pipe_so_target_reference(struct pipe_stream_output_target **ptr,
   struct pipe_stream_output_target *target)
diff --git a/src/gallium/include/pipe/p_context.h 
b/src/gallium/include/pipe/p_context.h
index 4609d4dbf23..087836d1c0c 100644
--- a/src/gallium/include/pipe/p_context.h
+++ b/src/gallium/include/pipe/p_context.h
@@ -501,20 +501,30 @@ struct pipe_context {
 void (*fence_server_sync)(struct pipe_context *pipe,
   struct pipe_fence_handle *fence);

 /**
  * Create a view on a texture to be used by a shader stage.
  */
 struct pipe_sampler_view * (*create_sampler_view)(struct pipe_context *ctx,
   struct pipe_resource 
*texture,
   const struct 
pipe_sampler_view *templat);

+   /**
+* Destroy a view on a texture.
+*
+* \param ctx the current context
+* \param view the view to be destroyed
+*
+* \note The current context may not be the context in which the view was
+*   created (view->context). However, the caller must guarantee that
+*   the context which created the view is still alive.
+*/
 void (*sampler_view_destroy)(struct pipe_context *ctx,
  

[Mesa-dev] [PATCH] st/mesa: store state that affects sampler views per context

2017-10-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This fixes sequences like:

1. Context 1 samples from texture with sRGB decode enabled
2. Context 2 samples from texture with sRGB decode disabled
3. Context 1 samples from texture with sRGB decode disabled

Previously, step 3 would see the prev_sRGBDecode value from context 2
and would incorrectly use the old sampler view with sRGB decode enabled.
---
This patch is originally from my texture locking series, which I'm delaying
to take a closer look at performance.

It is independent from the locking stuff, so I rebased it on master as I'd
like to push it earlier and would appreciate a separate review.

Cheers,
Nicolai
---
 src/mesa/state_tracker/st_atom_sampler.c |  4 +--
 src/mesa/state_tracker/st_atom_texture.c | 12 
 src/mesa/state_tracker/st_sampler_view.c | 49 ++--
 src/mesa/state_tracker/st_texture.h  | 20 +
 4 files changed, 44 insertions(+), 41 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom_sampler.c 
b/src/mesa/state_tracker/st_atom_sampler.c
index d9e8de3c9e0..ff3f49fa4e4 100644
--- a/src/mesa/state_tracker/st_atom_sampler.c
+++ b/src/mesa/state_tracker/st_atom_sampler.c
@@ -163,22 +163,22 @@ st_convert_sampler(const struct st_context *st,
   GLenum texBaseFormat = _mesa_base_tex_image(texobj)->_BaseFormat;
 
   if (st->apply_texture_swizzle_to_border_color) {
  const struct st_texture_object *stobj = 
st_texture_object_const(texobj);
  const struct pipe_sampler_view *sv = NULL;
 
  /* Just search for the first used view. We can do this because the
 swizzle is per-texture, not per context. */
  /* XXX: clean that up to not use the sampler view at all */
  for (unsigned i = 0; i < stobj->num_sampler_views; ++i) {
-if (stobj->sampler_views[i]) {
-   sv = stobj->sampler_views[i];
+if (stobj->sampler_views[i].view) {
+   sv = stobj->sampler_views[i].view;
break;
 }
  }
 
  if (sv) {
 union pipe_color_union tmp;
 const unsigned char swz[4] =
 {
sv->swizzle_r,
sv->swizzle_g,
diff --git a/src/mesa/state_tracker/st_atom_texture.c 
b/src/mesa/state_tracker/st_atom_texture.c
index 81bf62908f1..90828bb4cf9 100644
--- a/src/mesa/state_tracker/st_atom_texture.c
+++ b/src/mesa/state_tracker/st_atom_texture.c
@@ -77,32 +77,20 @@ st_update_single_texture(struct st_context *st,
   return;
}
 
if (!st_finalize_texture(ctx, st->pipe, texObj, 0) ||
!stObj->pt) {
   /* out of mem */
   *sampler_view = NULL;
   return;
}
 
-   /* Check a few pieces of state outside the texture object to see if we
-* need to force revalidation.
-*/
-   if (stObj->prev_glsl130_or_later != glsl130_or_later ||
-   stObj->prev_sRGBDecode != samp->sRGBDecode) {
-
-  st_texture_release_all_sampler_views(st, stObj);
-
-  stObj->prev_glsl130_or_later = glsl130_or_later;
-  stObj->prev_sRGBDecode = samp->sRGBDecode;
-   }
-
if (texObj->TargetIndex == TEXTURE_EXTERNAL_INDEX &&
stObj->pt->screen->resource_changed)
  stObj->pt->screen->resource_changed(stObj->pt->screen, stObj->pt);
 
*sampler_view =
   st_get_texture_sampler_view_from_stobj(st, stObj, samp,
  glsl130_or_later);
 }
 
 
diff --git a/src/mesa/state_tracker/st_sampler_view.c 
b/src/mesa/state_tracker/st_sampler_view.c
index 014b4d26784..99c4f74ae08 100644
--- a/src/mesa/state_tracker/st_sampler_view.c
+++ b/src/mesa/state_tracker/st_sampler_view.c
@@ -40,70 +40,70 @@
 #include "st_format.h"
 #include "st_cb_bufferobjects.h"
 #include "st_cb_texture.h"
 
 
 /**
  * Try to find a matching sampler view for the given context.
  * If none is found an empty slot is initialized with a
  * template and returned instead.
  */
-static struct pipe_sampler_view **
+static struct st_sampler_view *
 st_texture_get_sampler_view(struct st_context *st,
 struct st_texture_object *stObj)
 {
-   struct pipe_sampler_view **free = NULL;
+   struct st_sampler_view *free = NULL;
GLuint i;
 
for (i = 0; i < stObj->num_sampler_views; ++i) {
-  struct pipe_sampler_view **sv = &stObj->sampler_views[i];
+  struct st_sampler_view *sv = &stObj->sampler_views[i];
   /* Is the array entry used ? */
-  if (*sv) {
+  if (sv->view) {
  /* check if the context matches */
- if ((*sv)->context == st->pipe) {
+ if (sv->view->context == st->pipe) {
 return sv;
  }
   } else {
  /* Found a free slot, remember that */
  free = sv;
   }
}
 
/* Couldn't find a slot for our context, create a new one */
 
if (!free) {
   /* Haven't even found a free one, resize the array */
   unsigned new_size = (stObj->num_sampler_views + 1) *
- sizeof(struct pipe_s

Re: [Mesa-dev] [PATCH] radv: Add R16G16B16A16_SNORM fast clear support

2017-10-10 Thread Dave Airlie
On 11 October 2017 at 02:00, Alex Smith  wrote:
> Signed-off-by: Alex Smith 

Thanks Alex,

pushed.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] st/va: Fix config entrypoint handling

2017-10-10 Thread Mark Thompson
Consistently use it as a PIPE_VIDEO_ENTRYPOINT.

v2: Return an error if the entrypoint is not set (Christian).

Signed-off-by: Mark Thompson 
---
On 10/10/17 08:32, Christian König wrote:
> Am 09.10.2017 um 22:45 schrieb Mark Thompson:
>> Consistently use it as a PIPE_VIDEO_ENTRYPOINT.
>>
>> Signed-off-by: Mark Thompson 
>> ---
>>   src/gallium/state_trackers/va/config.c | 15 +--
>>   1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/gallium/state_trackers/va/config.c 
>> b/src/gallium/state_trackers/va/config.c
>> index 1484fcacce..f3000be2fd 100644
>> --- a/src/gallium/state_trackers/va/config.c
>> +++ b/src/gallium/state_trackers/va/config.c
>> @@ -195,7 +195,7 @@ vlVaCreateConfig(VADriverContextP ctx, VAProfile 
>> profile, VAEntrypoint entrypoin
>>     return VA_STATUS_ERROR_ALLOCATION_FAILED;
>>    if (profile == VAProfileNone && entrypoint == VAEntrypointVideoProc) {
>> -  config->entrypoint = VAEntrypointVideoProc;
>> +  config->entrypoint = PIPE_VIDEO_ENTRYPOINT_UNKNOWN;
>>     config->profile = PIPE_VIDEO_PROFILE_UNKNOWN;
>>     supported_rt_formats = VA_RT_FORMAT_YUV420 |
>>    VA_RT_FORMAT_YUV420_10BPP |
>> @@ -342,14 +342,17 @@ vlVaQueryConfigAttributes(VADriverContextP ctx, 
>> VAConfigID config_id, VAProfile
>>    *profile = PipeToProfile(config->profile);
>>   -   if (config->profile == PIPE_VIDEO_PROFILE_UNKNOWN) {
>> +   switch (config->entrypoint) {
>> +   case PIPE_VIDEO_ENTRYPOINT_BITSTREAM:
>> +  *entrypoint = VAEntrypointVLD;
>> +  break;
>> +   case PIPE_VIDEO_ENTRYPOINT_ENCODE:
>> +  *entrypoint = VAEntrypointEncSlice;
>> +  break;
>> +   default:
> 
> I prefer an explicit case for PIPE_VIDEO_ENTRYPOINT_UNKNOWN and returning an 
> error in the default case here.
> 
> Apart from that the change looks good to me and seems to be a rather nice 
> cleanup/fix.

Sure!  Here's a new version (2/2 unchanged).

Thanks,

- Mark


 src/gallium/state_trackers/va/config.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/gallium/state_trackers/va/config.c 
b/src/gallium/state_trackers/va/config.c
index 1484fcacce..25043d6374 100644
--- a/src/gallium/state_trackers/va/config.c
+++ b/src/gallium/state_trackers/va/config.c
@@ -195,7 +195,7 @@ vlVaCreateConfig(VADriverContextP ctx, VAProfile profile, 
VAEntrypoint entrypoin
   return VA_STATUS_ERROR_ALLOCATION_FAILED;
 
if (profile == VAProfileNone && entrypoint == VAEntrypointVideoProc) {
-  config->entrypoint = VAEntrypointVideoProc;
+  config->entrypoint = PIPE_VIDEO_ENTRYPOINT_UNKNOWN;
   config->profile = PIPE_VIDEO_PROFILE_UNKNOWN;
   supported_rt_formats = VA_RT_FORMAT_YUV420 |
  VA_RT_FORMAT_YUV420_10BPP |
@@ -342,14 +342,20 @@ vlVaQueryConfigAttributes(VADriverContextP ctx, 
VAConfigID config_id, VAProfile
 
*profile = PipeToProfile(config->profile);
 
-   if (config->profile == PIPE_VIDEO_PROFILE_UNKNOWN) {
+   switch (config->entrypoint) {
+   case PIPE_VIDEO_ENTRYPOINT_BITSTREAM:
+  *entrypoint = VAEntrypointVLD;
+  break;
+   case PIPE_VIDEO_ENTRYPOINT_ENCODE:
+  *entrypoint = VAEntrypointEncSlice;
+  break;
+   case PIPE_VIDEO_ENTRYPOINT_UNKNOWN:
   *entrypoint = VAEntrypointVideoProc;
-  *num_attribs = 0;
-  return VA_STATUS_SUCCESS;
+  break;
+   default:
+  return VA_STATUS_ERROR_INVALID_CONFIG;
}
 
-   *entrypoint = config->entrypoint;
-
*num_attribs = 1;
attrib_list[0].type = VAConfigAttribRTFormat;
attrib_list[0].value = config->rt_format;
-- 
2.11.0
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Piglit on windows

2017-10-10 Thread Dylan Baker
Quoting Kyriazis, George (2017-10-10 12:14:00)
> 
> > On Oct 10, 2017, at 1:23 PM, Dylan Baker  wrote:
> > 
> > Quoting Kyriazis, George (2017-10-10 11:02:26)
> >> 
> >>> On Oct 10, 2017, at 12:46 PM, Dylan Baker  wrote:
> >>> 
> >>> Quoting Jose Fonseca (2017-10-10 08:41:49)
>  On 10/10/17 16:31, Kyriazis, George wrote:
> > Hello…
> > 
> > Piglit on windows prints out a message saying “Timeout are not 
> > implemented on Windows.”.  These timeouts are the test timeouts in case 
> > a test hangs.
> > 
> > What do people do when running piglit on windows and they hit a 
> > timeout?  I would imagine there would be a non-zero number of people 
> > running piglit on windows on a regular basis, as a regression tool...
> > 
> > Thank you!
> > 
> > George
>  
>  I haven't been involved into piglit Windows testing lately, so my
>  understanding might be dated.
>  
>  I believe that we have timeouts when we test piglit on Windows. It's not
>  implemented on piglit python framework itself, but rather on VMware
>  testing framework (that driver piglit, and a bunch of other tests.)
>  
>  That said, I believe it would be better long term if piglit framework
>  had timeouts on Windows, as it can probably track that with finer
>  granularity than we do now by putting a timeout on whole piglit or
>  subsets of piglit tests.
>  
>  python3's subprocess module has timeout options, so it should be
>  relatively easy to implement on top of it, in a cross-platform manner.
>  
>  Jose
>  ___
>  mesa-dev mailing list
>  mesa-dev@lists.freedesktop.org
>  https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>> 
> >>> Since I added that warning message...
> >>> 
> >>> There are no timeouts in python 2 on windows, even with 3rd party 
> >>> packages.
> >>> 
> >>> For python 3 to properly handle timeouts you need to kill the process that
> >>> exceeds the expected timeout. Basically when timeout expires the 
> >>> sub-process
> >>> communicate call stops blocking, but doesn't actually kill the process. 
> >>> On Linux
> >>> we ask the process to terminate, wait 3 seconds and then SIGKILL it.
> >>> 
> >>> I don't know how to do that on windows, so I didn't implement it and 
> >>> instead
> >>> windows users get a warning. If someone with a basic grasp of how to kill 
> >>> a
> >>> process on windows wanted that functionality it probably wouldn't be too 
> >>> hard to
> >>> implement.
> >>> 
> >>> Alternatively there are constructs that are only in python 3 that do the 
> >>> killing
> >>> for you, on both Windows and Linux, but they don't have a python 2 
> >>> equivalent.
> >>> 
> >> Hmm..  Since the current code has special cases between python 2/3 and 
> >> windows/linux, finding a cross-platform method for both OSes seems like a 
> >> “nice to have” at this point, although desirable.
> >> 
> >> I’ll see if I can find a quick fix for this.
> >> 
> >> Thank you!
> >> 
> >> George
> >> 
> >>> Dylan
> >> 
> > 
> > You could have a look at framework/test/base.py, the logic is in there. I 
> > don't
> > think it would really be that much code to add to that mess, and we could 
> > always
> > delete it whenever we finally decide that we don't care about python 2 
> > anymore.
> > 
> > Dylan
> 
> The interest is for python3, I don’t think windows piglit works with python2 
> anyway.  The README file lists python3 as a requirement for windows.
> 
> So, I did some simple tests with Popen() on windows, and it looks like 
> commnicate() understands the timeout argument, and proc.kill() seems to work 
> (at least for simple things, like “sleep”).
> 
> Maybe python3 support changed between the time you wrote the code and now?  
> I’m using Python 3.6.3.
> 
> Thanks,
> 
> george
> 

I don't remember why I didn't use proc.kill when I implemented it, but it's been
present since at least 3.2 and subprocess32 supports it, so it would probably be
fine to to drop the Linux specific code and use proc.kill instead.

I think to test it I had written a small python script that caught and ignored
SIGTERM to ensure that the SIGKILL functionality actually worked, but it was a
while ago so I could be misremembering.

Dylan


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 3/4] intel/cfg: Always add both successors to a break

2017-10-10 Thread Jason Ekstrand
On Tue, Oct 10, 2017 at 12:52 PM, Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott 
> wrote:
> >
> >> This won't completely solve the problem. For example, what if you
> >> hoist the assignment to color2 outside the loop?
> >>
> >> vec4 color2;
> >> while (1) {
> >>vec4 color = texture();
> >>color2 = color * 2;
> >>if (...) {
> >>   break;
> >>}
> >> }
> >> gl_FragColor = color2;
> >>
> >>
> >> Now the definition still dominates the use, even with the modified
> >> control-flow graph, and you have the same problem
> >
> >
> > Curro had me convinced that some detail of the liveness analysis pass
> saved
> > us here but now I can't remember what. :-(
> >
>
> I think it's a shame that my name didn't come up on this discussion
> until it was time to deflect Connor's blame away from you.  Not that I
> generally care a lot about other people putting their names on the
> result of my research while I'm out on vacation, but at least face the
> fact that the patch you put your name on is incomplete, on your own.
>

I'm sorry.  I did not intend to deflect blame there.  I was simply
recalling some discussions we had in which I became convinced (wrongly, as
it turns out) that this was sufficient.  I couldn't quite remember the
details at the time of my reply.  Later, I thought about it more and
realized what I was missing and sent the second patch.  I knew I was
pulling something called WIP off your Jenkins branch and that you had not
in any way shape or form signed off on it as being correct which is why I
didn't put your name on the patch I sent out.


> As it turns out there's a minimal solution for the problem Connor
> brought up that doesn't involve ad-hoc post-dataflow-analysis fix-ups,
> nor disabling dataflow analysis for non-writemask-all assignments, but
> why on earth would I get involved in this discussion now?
>

If you've got a minimum solution, I'd love to hear about it!  I'm just
trying to solve the problem because it's blocking other stuff.  And, to be
honest, my attempts aren't going so well ATM.

--Jason


> >
> >> The real problem is
> >> that the assignment to color2 is really a conditional assignment: if
> >> we're going channel-by-channel, it's not, but if you consider the
> >> *whole* register at the same time, it is. To really fix the problem,
> >> you need to model exactly what the machine actually does: you need to
> >> insert "fake" edges like these, that model the jumps that the machine
> >> can take, and you need to make every assignment a conditional
> >> assignment (i.e. it doesn't kill the register). It's probably not as
> >> bad with Curro's patch on top, though. Also, once you do this you can
> >> make register allocation more accurate by generating interferences
> >> from the liveness information directly instead of from the intervals.
> >>
> >> One thing I've thought about is, in addition to maintaining this
> >> "whole-vector" view of things, is to maintain a "per-channel" liveness
> >> that doesn't use the extra edges, partial definitions etc. and then
> >> use the "per-channel view" to calculate interference when the channels
> >> always line up.
> >>
> >
> > Yes, we've considered that and it's a good idea.  However, I'm trying to
> > fix bugs right now, not write the world's best liveness analysis pass.
> :-)
> >
> >
> >> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand 
> >> wrote:
> >> > Shader-db results on Sky Lake:
> >> >
> >> > total instructions in shared programs: 12955125 -> 12953698
> (-0.01%)
> >> > instructions in affected programs: 55956 -> 54529 (-2.55%)
> >> > helped: 6
> >> > HURT: 38
> >> >
> >> > All of the hurt programs were hurt by exactly one instruction because
> >> > this patch affects copy propagation.  Most of the helped instructions
> >> > came from a single orbital explorer shader that was helped by 14.26%
> >> >
> >> > Cc: mesa-sta...@lists.freedesktop.org
> >> > ---
> >> >  src/intel/compiler/brw_cfg.cpp | 37 ++
> >> +--
> >> >  1 file changed, 35 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/src/intel/compiler/brw_cfg.cpp
> >> b/src/intel/compiler/brw_cfg.cpp
> >> > index fad12ee..d8bf725 100644
> >> > --- a/src/intel/compiler/brw_cfg.cpp
> >> > +++ b/src/intel/compiler/brw_cfg.cpp
> >> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
> >> >   assert(cur_while != NULL);
> >> >  cur->add_successor(mem_ctx, cur_while);
> >> >
> >> > + /* We also add the next block as a successor of the break.
> If
> >> the
> >> > +  * break is predicated, we need to do this because the break
> >> may not
> >> > +  * be taken.  If the break is not predicated, we add it
> anyway
> >> so
> >> > +  * that our live intervals computations will operate as if
> the
> >> break
> >> > +  * may or may not be taken.  Consider the following example:
> >> > +  *
> >> > +  

Re: [Mesa-dev] [PATCH] configure.ac: bump Clover LLVM requirement to 3.9

2017-10-10 Thread Francisco Jerez
Emil Velikov  writes:

> On 4 October 2017 at 15:10, Jan Vesely  wrote:
>> On Wed, 2017-10-04 at 14:59 +0100, Emil Velikov wrote:
>>> On 3 October 2017 at 19:19, Jan Vesely  wrote:
>>> > On Tue, 2017-10-03 at 17:51 +0100, Emil Velikov wrote:
>>> > > From: Emil Velikov 
>>> > >
>>> > > The only driver that utilises Clover already depends on LLVM 3.9.
>>> > > Additionally close to every supported distribution has said version.
>>> > >
>>> > > Additionally libclc requires LLVM 4.0 these days.
>>> >
>>> > support for llvm-3.9 has been restored to libclc since our discussion.
>>> > sorry, I should have mentioned that.
>>> >
>>>
>>> Right, I'll update the commit message as follows and push it in a few hours.
>>
>> Thanks.
>> Acked-by: Jan Vesely 
>>
>> you might want to get the maintainer's (Francisco) ack as well.
>>
> I would love some input from him, that's why he's been in CC chain
> from the first email ;-)
> Guess ^^ is not urgent yet I'd love to trim down the Travis combinations a 
> bit.
>

Acked-by: Francisco Jerez 

> -Emil
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Jason Ekstrand
On Tue, Oct 10, 2017 at 9:16 AM, Connor Abbott  wrote:

> I'm a little nervous about this, because really, the only solution to
> this problem is to ignore all non-WE_all definitions of all variables
> in liveness analysis. For example, in something like:
>
> vec4 color2 = ...
> if (...) {
>color2 = texture();
> }
>
> texture() can also overwrite inactive channels of color2. We happen to
> get this right because we turn live ranges into live intervals without
> holes, but I can't come up with a good reason why that would save us
> in all cases except the one in this patch -- which makes me worry that
> we'll find yet another case where there's a similar problem. I think
> it would be clearer if we what I said above, i.e. ignore all
> non-WE_all definitions, which will make things much worse, but then
> apply Curro's patch which will return things to pretty much how they
> were before, except this case will be fixed and maybe some other cases
> we haven't thought of.
>

What you're suggesting may actually be less code and is arguably better in
terms of being more straightforward.  However, I think intervals plus this
patch is equivalent.  Curro's patch + always-partial will cause us to start
the live range at the IP where it first *may* be defined and we keep the
behavior of ending the live range at the last IP where some reachable
instruction may use it.  With my patch + Curro's, we start the live range
at the IP where it is first defined which will always all places it *may*
be defined unless there is a back edge.  If there is a back-edge, I pull
the live range up across the back edge.

That said, I think I agree with you that my solution treats it as a special
case and not a general problem.  However, I'm relucant to just change
liveness analysis to assume partial writes because we use it for more than
computing variable interference.  In particular, we use it for dead code
elimination and the concept of partial/complete writes is crucial there.
I've got another patch cooking which I'll send out soon which should make
you happier with it.

--Jason


>
> On Thu, Oct 5, 2017 at 2:52 PM, Jason Ekstrand 
> wrote:
> > No Shader-db changes.
> >
> > Cc: mesa-sta...@lists.freedesktop.org
> > ---
> >  src/intel/compiler/brw_fs_live_variables.cpp | 55
> 
> >  1 file changed, 55 insertions(+)
> >
> > diff --git a/src/intel/compiler/brw_fs_live_variables.cpp
> b/src/intel/compiler/brw_fs_live_variables.cpp
> > index c449672..380060d 100644
> > --- a/src/intel/compiler/brw_fs_live_variables.cpp
> > +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> > @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
> >   }
> >}
> > }
> > +
> > +   /* Due to the explicit way the SIMD data is handled on GEN, we need
> to be a
> > +* bit more careful with live ranges and loops.  Consider the
> following
> > +* example:
> > +*
> > +*vec4 color2;
> > +*while (1) {
> > +*   vec4 color = texture();
> > +*   if (...) {
> > +*  color2 = color * 2;
> > +*  break;
> > +*   }
> > +*}
> > +*gl_FragColor = color2;
> > +*
> > +* In this case, the definition of color2 dominates the use because
> the
> > +* loop only has the one exit.  This means that the live range
> interval for
> > +* color2 goes from the statement in the if to it's use below the
> loop.
> > +* Now suppose that the texture operation has a header register that
> gets
> > +* assigned one of the registers used for color2.  If the loop
> condition is
> > +* non-uniform and some of the threads will take the and others will
> > +* continue.  In this case, the next pass through the loop, the
> WE_all
> > +* setup of the header register will stomp the disabled channels of
> color2
> > +* and corrupt the value.
> > +*
> > +* This same problem can occur if you have a mix of 64, 32, and
> 16-bit
> > +* registers because the channels do not line up or if you have a
> SIMD16
> > +* program and the first half of one value overlaps the second half
> of the
> > +* other.
> > +*
> > +* To solve this problem, we take any VGRFs whose live ranges cross
> the
> > +* while instruction of a loop and extend their live ranges to the
> top of
> > +* the loop.  This more accurately models the hardware because the
> value in
> > +* the VGRF needs to be carried through subsequent loop iterations
> in order
> > +* to remain valid when we finally do break.
> > +*/
> > +   foreach_block (block, cfg) {
> > +  if (block->end()->opcode != BRW_OPCODE_WHILE)
> > + continue;
> > +
> > +  /* This is a WHILE instrution. Find the DO block. */
> > +  bblock_t *do_block = NULL;
> > +  foreach_list_typed(bblock_link, child_link, link,
> &block->children) {
> > + if (child_link->block->start_ip < block->end_ip) {
> > +assert(do_block == NULL

Re: [Mesa-dev] [Mesa-stable] [PATCH 3/4] intel/cfg: Always add both successors to a break

2017-10-10 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott  wrote:
>
>> This won't completely solve the problem. For example, what if you
>> hoist the assignment to color2 outside the loop?
>>
>> vec4 color2;
>> while (1) {
>>vec4 color = texture();
>>color2 = color * 2;
>>if (...) {
>>   break;
>>}
>> }
>> gl_FragColor = color2;
>>
>>
>> Now the definition still dominates the use, even with the modified
>> control-flow graph, and you have the same problem
>
>
> Curro had me convinced that some detail of the liveness analysis pass saved
> us here but now I can't remember what. :-(
>

I think it's a shame that my name didn't come up on this discussion
until it was time to deflect Connor's blame away from you.  Not that I
generally care a lot about other people putting their names on the
result of my research while I'm out on vacation, but at least face the
fact that the patch you put your name on is incomplete, on your own.

As it turns out there's a minimal solution for the problem Connor
brought up that doesn't involve ad-hoc post-dataflow-analysis fix-ups,
nor disabling dataflow analysis for non-writemask-all assignments, but
why on earth would I get involved in this discussion now?

>
>> The real problem is
>> that the assignment to color2 is really a conditional assignment: if
>> we're going channel-by-channel, it's not, but if you consider the
>> *whole* register at the same time, it is. To really fix the problem,
>> you need to model exactly what the machine actually does: you need to
>> insert "fake" edges like these, that model the jumps that the machine
>> can take, and you need to make every assignment a conditional
>> assignment (i.e. it doesn't kill the register). It's probably not as
>> bad with Curro's patch on top, though. Also, once you do this you can
>> make register allocation more accurate by generating interferences
>> from the liveness information directly instead of from the intervals.
>>
>> One thing I've thought about is, in addition to maintaining this
>> "whole-vector" view of things, is to maintain a "per-channel" liveness
>> that doesn't use the extra edges, partial definitions etc. and then
>> use the "per-channel view" to calculate interference when the channels
>> always line up.
>>
>
> Yes, we've considered that and it's a good idea.  However, I'm trying to
> fix bugs right now, not write the world's best liveness analysis pass. :-)
>
>
>> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand 
>> wrote:
>> > Shader-db results on Sky Lake:
>> >
>> > total instructions in shared programs: 12955125 -> 12953698 (-0.01%)
>> > instructions in affected programs: 55956 -> 54529 (-2.55%)
>> > helped: 6
>> > HURT: 38
>> >
>> > All of the hurt programs were hurt by exactly one instruction because
>> > this patch affects copy propagation.  Most of the helped instructions
>> > came from a single orbital explorer shader that was helped by 14.26%
>> >
>> > Cc: mesa-sta...@lists.freedesktop.org
>> > ---
>> >  src/intel/compiler/brw_cfg.cpp | 37 ++
>> +--
>> >  1 file changed, 35 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/src/intel/compiler/brw_cfg.cpp
>> b/src/intel/compiler/brw_cfg.cpp
>> > index fad12ee..d8bf725 100644
>> > --- a/src/intel/compiler/brw_cfg.cpp
>> > +++ b/src/intel/compiler/brw_cfg.cpp
>> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
>> >   assert(cur_while != NULL);
>> >  cur->add_successor(mem_ctx, cur_while);
>> >
>> > + /* We also add the next block as a successor of the break.  If
>> the
>> > +  * break is predicated, we need to do this because the break
>> may not
>> > +  * be taken.  If the break is not predicated, we add it anyway
>> so
>> > +  * that our live intervals computations will operate as if the
>> break
>> > +  * may or may not be taken.  Consider the following example:
>> > +  *
>> > +  *vec4 color2;
>> > +  *while (1) {
>> > +  *   vec4 color = texture();
>> > +  *   if (...) {
>> > +  *  color2 = color * 2;
>> > +  *  break;
>> > +  *   }
>> > +  *}
>> > +  *gl_FragColor = color2;
>> > +  *
>> > +  * In this case, the definition of color2 dominates the use
>> because
>> > +  * the loop only has the one exit.  This means that the live
>> range
>> > +  * interval for color2 goes from the statement in the if to
>> it's use
>> > +  * below the loop.  Now suppose that the texture operation has
>> a
>> > +  * header register that gets assigned one of the registers
>> used for
>> > +  * color2.  If the loop condition is non-uniform and some of
>> the
>> > +  * threads will take the break and others will continue.  In
>> this
>> > +  * case, the next pass through the loop, the WE_all setup of
>> the
>> > +   

Re: [Mesa-dev] Piglit on windows

2017-10-10 Thread Kyriazis, George

> On Oct 10, 2017, at 1:23 PM, Dylan Baker  wrote:
> 
> Quoting Kyriazis, George (2017-10-10 11:02:26)
>> 
>>> On Oct 10, 2017, at 12:46 PM, Dylan Baker  wrote:
>>> 
>>> Quoting Jose Fonseca (2017-10-10 08:41:49)
 On 10/10/17 16:31, Kyriazis, George wrote:
> Hello…
> 
> Piglit on windows prints out a message saying “Timeout are not 
> implemented on Windows.”.  These timeouts are the test timeouts in case a 
> test hangs.
> 
> What do people do when running piglit on windows and they hit a timeout?  
> I would imagine there would be a non-zero number of people running piglit 
> on windows on a regular basis, as a regression tool...
> 
> Thank you!
> 
> George
 
 I haven't been involved into piglit Windows testing lately, so my
 understanding might be dated.
 
 I believe that we have timeouts when we test piglit on Windows. It's not
 implemented on piglit python framework itself, but rather on VMware
 testing framework (that driver piglit, and a bunch of other tests.)
 
 That said, I believe it would be better long term if piglit framework
 had timeouts on Windows, as it can probably track that with finer
 granularity than we do now by putting a timeout on whole piglit or
 subsets of piglit tests.
 
 python3's subprocess module has timeout options, so it should be
 relatively easy to implement on top of it, in a cross-platform manner.
 
 Jose
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>> 
>>> Since I added that warning message...
>>> 
>>> There are no timeouts in python 2 on windows, even with 3rd party packages.
>>> 
>>> For python 3 to properly handle timeouts you need to kill the process that
>>> exceeds the expected timeout. Basically when timeout expires the sub-process
>>> communicate call stops blocking, but doesn't actually kill the process. On 
>>> Linux
>>> we ask the process to terminate, wait 3 seconds and then SIGKILL it.
>>> 
>>> I don't know how to do that on windows, so I didn't implement it and instead
>>> windows users get a warning. If someone with a basic grasp of how to kill a
>>> process on windows wanted that functionality it probably wouldn't be too 
>>> hard to
>>> implement.
>>> 
>>> Alternatively there are constructs that are only in python 3 that do the 
>>> killing
>>> for you, on both Windows and Linux, but they don't have a python 2 
>>> equivalent.
>>> 
>> Hmm..  Since the current code has special cases between python 2/3 and 
>> windows/linux, finding a cross-platform method for both OSes seems like a 
>> “nice to have” at this point, although desirable.
>> 
>> I’ll see if I can find a quick fix for this.
>> 
>> Thank you!
>> 
>> George
>> 
>>> Dylan
>> 
> 
> You could have a look at framework/test/base.py, the logic is in there. I 
> don't
> think it would really be that much code to add to that mess, and we could 
> always
> delete it whenever we finally decide that we don't care about python 2 
> anymore.
> 
> Dylan

The interest is for python3, I don’t think windows piglit works with python2 
anyway.  The README file lists python3 as a requirement for windows.

So, I did some simple tests with Popen() on windows, and it looks like 
commnicate() understands the timeout argument, and proc.kill() seems to work 
(at least for simple things, like “sleep”).

Maybe python3 support changed between the time you wrote the code and now?  I’m 
using Python 3.6.3.

Thanks,

george

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/4] nir: set default lod to texture opcodes that needed it but don't provide it

2017-10-10 Thread Eric Anholt
Samuel Iglesias Gonsálvez  writes:

> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  src/compiler/nir/nir_lower_tex.c | 68 
> 
>  1 file changed, 68 insertions(+)
>
> diff --git a/src/compiler/nir/nir_lower_tex.c 
> b/src/compiler/nir/nir_lower_tex.c
> index 65681decb1c..d3380710405 100644
> --- a/src/compiler/nir/nir_lower_tex.c
> +++ b/src/compiler/nir/nir_lower_tex.c
> @@ -717,6 +717,52 @@ linearize_srgb_result(nir_builder *b, nir_tex_instr *tex)
>result->parent_instr);
>  }
>  
> +static void
> +set_default_lod(nir_builder *b, nir_tex_instr *tex)
> +{
> +   b->cursor = nir_before_instr(&tex->instr);
> +
> +   /* We are going to emit the same texture but adding a default LOD.
> +*/
> +   int num_srcs = tex->num_srcs + 1;
> +   nir_tex_instr *new = nir_tex_instr_create(b->shader, num_srcs);
> +
> +   new->op = tex->op;
> +   new->sampler_dim = tex->sampler_dim;
> +   new->texture_index = tex->texture_index;
> +   new->dest_type = tex->dest_type;
> +   new->is_array = tex->is_array;
> +   new->is_shadow = tex->is_shadow;
> +   new->is_new_style_shadow = tex->is_new_style_shadow;
> +   new->sampler_index = tex->sampler_index;
> +   new->texture = nir_deref_var_clone(tex->texture, new);
> +   new->sampler = nir_deref_var_clone(tex->sampler, new);
> +   new->coord_components = tex->coord_components;

Couldn't we just make a new srcs array of num_srcs+1, memcpy the old
srcs/types over, add our new use of the immediate 0 lod to it by
manipulating the immediate's uses list, and free the old list?  It seems
less fragile to me than needing to update this path if we add a new
texture flag.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Piglit on windows

2017-10-10 Thread Dylan Baker
Quoting Kyriazis, George (2017-10-10 11:02:26)
> 
> > On Oct 10, 2017, at 12:46 PM, Dylan Baker  wrote:
> > 
> > Quoting Jose Fonseca (2017-10-10 08:41:49)
> >> On 10/10/17 16:31, Kyriazis, George wrote:
> >>> Hello…
> >>> 
> >>> Piglit on windows prints out a message saying “Timeout are not 
> >>> implemented on Windows.”.  These timeouts are the test timeouts in case a 
> >>> test hangs.
> >>> 
> >>> What do people do when running piglit on windows and they hit a timeout?  
> >>> I would imagine there would be a non-zero number of people running piglit 
> >>> on windows on a regular basis, as a regression tool...
> >>> 
> >>> Thank you!
> >>> 
> >>> George
> >> 
> >> I haven't been involved into piglit Windows testing lately, so my
> >> understanding might be dated.
> >> 
> >> I believe that we have timeouts when we test piglit on Windows. It's not
> >> implemented on piglit python framework itself, but rather on VMware
> >> testing framework (that driver piglit, and a bunch of other tests.)
> >> 
> >> That said, I believe it would be better long term if piglit framework
> >> had timeouts on Windows, as it can probably track that with finer
> >> granularity than we do now by putting a timeout on whole piglit or
> >> subsets of piglit tests.
> >> 
> >> python3's subprocess module has timeout options, so it should be
> >> relatively easy to implement on top of it, in a cross-platform manner.
> >> 
> >> Jose
> >> ___
> >> mesa-dev mailing list
> >> mesa-dev@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 
> > Since I added that warning message...
> > 
> > There are no timeouts in python 2 on windows, even with 3rd party packages.
> > 
> > For python 3 to properly handle timeouts you need to kill the process that
> > exceeds the expected timeout. Basically when timeout expires the sub-process
> > communicate call stops blocking, but doesn't actually kill the process. On 
> > Linux
> > we ask the process to terminate, wait 3 seconds and then SIGKILL it.
> > 
> > I don't know how to do that on windows, so I didn't implement it and instead
> > windows users get a warning. If someone with a basic grasp of how to kill a
> > process on windows wanted that functionality it probably wouldn't be too 
> > hard to
> > implement.
> > 
> > Alternatively there are constructs that are only in python 3 that do the 
> > killing
> > for you, on both Windows and Linux, but they don't have a python 2 
> > equivalent.
> > 
> Hmm..  Since the current code has special cases between python 2/3 and 
> windows/linux, finding a cross-platform method for both OSes seems like a 
> “nice to have” at this point, although desirable.
> 
> I’ll see if I can find a quick fix for this.
> 
> Thank you!
> 
> George
> 
> > Dylan
> 

You could have a look at framework/test/base.py, the logic is in there. I don't
think it would really be that much code to add to that mess, and we could always
delete it whenever we finally decide that we don't care about python 2 anymore.

Dylan


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/dri: don't crash when createImageFromRenderbuffer2 is NULL

2017-10-10 Thread Eric Anholt
Emil Velikov  writes:

> From: Emil Velikov 
>
> The __DRI_IMAGE version can be 17 or over, while the function pointer is
> NULL. Guard for that instead of crashing.
>
> Fixes: bad24395d91 ("egl/dri: use createImageFromRenderbuffer2 when
> available")
> Cc: Nicolai Hähnle 
> Signed-off-by: Emil Velikov 

Reviewed-by: Eric Anholt 


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 3/3] st/glsl_to_tgsi: the second destination doesn't support relative addressing

2017-10-10 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Tue, Oct 10, 2017 at 2:11 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> It's not used -- DFRACEXP gets array indexes of its exponent out-parameter
> lowered earlier -- and it wouldn't have worked correctly anyway when both
> dst and dst1 use relative addressing.
> ---
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 235690510b9..394d39ade63 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -382,42 +382,39 @@ glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, 
> unsigned op,
> int num_reladdr = 0, i, j;
> bool dst_is_64bit[2];
>
> op = get_opcode(op, dst, src0, src1);
>
> /* If we have to do relative addressing, we want to load the ARL
>  * reg directly for one of the regs, and preload the other reladdr
>  * sources into temps.
>  */
> num_reladdr += dst.reladdr != NULL || dst.reladdr2;
> -   num_reladdr += dst1.reladdr != NULL || dst1.reladdr2;
> +   assert(!dst1.reladdr); /* should be lowered in earlier passes */
> num_reladdr += src0.reladdr != NULL || src0.reladdr2 != NULL;
> num_reladdr += src1.reladdr != NULL || src1.reladdr2 != NULL;
> num_reladdr += src2.reladdr != NULL || src2.reladdr2 != NULL;
> num_reladdr += src3.reladdr != NULL || src3.reladdr2 != NULL;
>
> reladdr_to_temp(ir, &src3, &num_reladdr);
> reladdr_to_temp(ir, &src2, &num_reladdr);
> reladdr_to_temp(ir, &src1, &num_reladdr);
> reladdr_to_temp(ir, &src0, &num_reladdr);
>
> if (dst.reladdr || dst.reladdr2) {
>if (dst.reladdr)
>   emit_arl(ir, address_reg, *dst.reladdr);
>if (dst.reladdr2)
>   emit_arl(ir, address_reg2, *dst.reladdr2);
>num_reladdr--;
> }
> -   if (dst1.reladdr) {
> -  emit_arl(ir, address_reg, *dst1.reladdr);
> -  num_reladdr--;
> -   }
> +
> assert(num_reladdr == 0);
>
> /* inst->op has only 8 bits. */
> STATIC_ASSERT(TGSI_OPCODE_LAST <= 255);
>
> inst->op = op;
> inst->precise = this->precise;
> inst->info = tgsi_get_opcode_info(op);
> inst->dst[0] = dst;
> inst->dst[1] = dst1;
> --
> 2.11.0
>
> ___
> mesa-stable mailing list
> mesa-sta...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-stable
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 1/3] mesa: Add new fast mtx_t mutex type for basic use cases

2017-10-10 Thread Dylan Baker
Quoting Nicolai Hähnle (2017-10-10 09:30:31)
> On 10.10.2017 04:45, Timothy Arceri wrote:
> > While modern pthread mutexes are very fast, they still incur a call to an
> > external DSO and overhead of the generality and features of pthread mutexes.
> > Most mutexes in mesa only needs lock/unlock, and the idea here is that we 
> > can
> > inline the atomic operation and make the fast case just two intructions.
> > Mutexes are subtle and finicky to implement, so we carefully copy the
> > implementation from Ulrich Dreppers well-written and well-reviewed paper:
> > 
> >"Futexes Are Tricky"
> >http://www.akkadia.org/drepper/futex.pdf
> > 
> > We implement "mutex3", which gives us a mutex that has no syscalls on
> > uncontended lock or unlock.  Further, the uncontended case boils down to a
> > cmpxchg and an untaken branch and the uncontended unlock is just a locked 
> > decr
> > and an untaken branch.  We use __builtin_expect() to indicate that 
> > contention
> > is unlikely so that gcc will put the contention code out of the main code
> > flow.
> > 
> > A fast mutex only supports lock/unlock, can't be recursive or used with
> > condition variables.  We keep the pthread mutex implementation around as
> > full_mtx_t for the few places where we use condition variables or recursive
> > locking.  For platforms or compilers where futex and atomics aren't 
> > available,
> > mtx_t falls back to the pthread mutex.
> > 
> > The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound
> > applications.  Most CPU bound cases are helped and some of our internal
> > bind_buffer_object heavy benchmarks gain up to 10%.
> > 
> > Signed-off-by: Kristian Høgsberg 
> > Signed-off-by: Timothy Arceri 
> > ---
> >   configure.ac  |   3 ++
> >   src/util/simple_mtx.h | 146 
> > ++
> 
> Add to util/Makefile.sources

Also add to util/meson.build

Dylan


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Piglit on windows

2017-10-10 Thread Kyriazis, George

> On Oct 10, 2017, at 12:46 PM, Dylan Baker  wrote:
> 
> Quoting Jose Fonseca (2017-10-10 08:41:49)
>> On 10/10/17 16:31, Kyriazis, George wrote:
>>> Hello…
>>> 
>>> Piglit on windows prints out a message saying “Timeout are not implemented 
>>> on Windows.”.  These timeouts are the test timeouts in case a test hangs.
>>> 
>>> What do people do when running piglit on windows and they hit a timeout?  I 
>>> would imagine there would be a non-zero number of people running piglit on 
>>> windows on a regular basis, as a regression tool...
>>> 
>>> Thank you!
>>> 
>>> George
>> 
>> I haven't been involved into piglit Windows testing lately, so my
>> understanding might be dated.
>> 
>> I believe that we have timeouts when we test piglit on Windows. It's not
>> implemented on piglit python framework itself, but rather on VMware
>> testing framework (that driver piglit, and a bunch of other tests.)
>> 
>> That said, I believe it would be better long term if piglit framework
>> had timeouts on Windows, as it can probably track that with finer
>> granularity than we do now by putting a timeout on whole piglit or
>> subsets of piglit tests.
>> 
>> python3's subprocess module has timeout options, so it should be
>> relatively easy to implement on top of it, in a cross-platform manner.
>> 
>> Jose
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> Since I added that warning message...
> 
> There are no timeouts in python 2 on windows, even with 3rd party packages.
> 
> For python 3 to properly handle timeouts you need to kill the process that
> exceeds the expected timeout. Basically when timeout expires the sub-process
> communicate call stops blocking, but doesn't actually kill the process. On 
> Linux
> we ask the process to terminate, wait 3 seconds and then SIGKILL it.
> 
> I don't know how to do that on windows, so I didn't implement it and instead
> windows users get a warning. If someone with a basic grasp of how to kill a
> process on windows wanted that functionality it probably wouldn't be too hard 
> to
> implement.
> 
> Alternatively there are constructs that are only in python 3 that do the 
> killing
> for you, on both Windows and Linux, but they don't have a python 2 equivalent.
> 
Hmm..  Since the current code has special cases between python 2/3 and 
windows/linux, finding a cross-platform method for both OSes seems like a “nice 
to have” at this point, although desirable.

I’ll see if I can find a quick fix for this.

Thank you!

George

> Dylan



signature.asc
Description: Message signed with OpenPGP
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 3/3] amdgpu: use simple mtx

2017-10-10 Thread Marek Olšák
For patches 2-3:

Reviewed-by: Marek Olšák 

Marek

On Tue, Oct 10, 2017 at 4:45 AM, Timothy Arceri  wrote:
> ---
>  src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 34 
> +--
>  src/gallium/winsys/amdgpu/drm/amdgpu_bo.h |  2 +-
>  src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 20 
>  src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 28 +++---
>  src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h |  5 ++--
>  5 files changed, 45 insertions(+), 44 deletions(-)
>
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c 
> b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> index 897b4f0596..8c591110bb 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> @@ -87,70 +87,70 @@ static bool amdgpu_bo_wait(struct pb_buffer *_buf, 
> uint64_t timeout,
>if (r)
>   fprintf(stderr, "%s: amdgpu_bo_wait_for_idle failed %i\n", __func__,
>   r);
>return !buffer_busy;
> }
>
> if (timeout == 0) {
>unsigned idle_fences;
>bool buffer_idle;
>
> -  mtx_lock(&ws->bo_fence_lock);
> +  simple_mtx_lock(&ws->bo_fence_lock);
>
>for (idle_fences = 0; idle_fences < bo->num_fences; ++idle_fences) {
>   if (!amdgpu_fence_wait(bo->fences[idle_fences], 0, false))
>  break;
>}
>
>/* Release the idle fences to avoid checking them again later. */
>for (unsigned i = 0; i < idle_fences; ++i)
>   amdgpu_fence_reference(&bo->fences[i], NULL);
>
>memmove(&bo->fences[0], &bo->fences[idle_fences],
>(bo->num_fences - idle_fences) * sizeof(*bo->fences));
>bo->num_fences -= idle_fences;
>
>buffer_idle = !bo->num_fences;
> -  mtx_unlock(&ws->bo_fence_lock);
> +  simple_mtx_unlock(&ws->bo_fence_lock);
>
>return buffer_idle;
> } else {
>bool buffer_idle = true;
>
> -  mtx_lock(&ws->bo_fence_lock);
> +  simple_mtx_lock(&ws->bo_fence_lock);
>while (bo->num_fences && buffer_idle) {
>   struct pipe_fence_handle *fence = NULL;
>   bool fence_idle = false;
>
>   amdgpu_fence_reference(&fence, bo->fences[0]);
>
>   /* Wait for the fence. */
> - mtx_unlock(&ws->bo_fence_lock);
> + simple_mtx_unlock(&ws->bo_fence_lock);
>   if (amdgpu_fence_wait(fence, abs_timeout, true))
>  fence_idle = true;
>   else
>  buffer_idle = false;
> - mtx_lock(&ws->bo_fence_lock);
> + simple_mtx_lock(&ws->bo_fence_lock);
>
>   /* Release an idle fence to avoid checking it again later, keeping 
> in
>* mind that the fence array may have been modified by other 
> threads.
>*/
>   if (fence_idle && bo->num_fences && bo->fences[0] == fence) {
>  amdgpu_fence_reference(&bo->fences[0], NULL);
>  memmove(&bo->fences[0], &bo->fences[1],
>  (bo->num_fences - 1) * sizeof(*bo->fences));
>  bo->num_fences--;
>   }
>
>   amdgpu_fence_reference(&fence, NULL);
>}
> -  mtx_unlock(&ws->bo_fence_lock);
> +  simple_mtx_unlock(&ws->bo_fence_lock);
>
>return buffer_idle;
> }
>  }
>
>  static enum radeon_bo_domain amdgpu_bo_get_initial_domain(
>struct pb_buffer *buf)
>  {
> return ((struct amdgpu_winsys_bo*)buf)->initial_domain;
>  }
> @@ -165,24 +165,24 @@ static void amdgpu_bo_remove_fences(struct 
> amdgpu_winsys_bo *bo)
> bo->max_fences = 0;
>  }
>
>  void amdgpu_bo_destroy(struct pb_buffer *_buf)
>  {
> struct amdgpu_winsys_bo *bo = amdgpu_winsys_bo(_buf);
>
> assert(bo->bo && "must not be called for slab entries");
>
> if (bo->ws->debug_all_bos) {
> -  mtx_lock(&bo->ws->global_bo_list_lock);
> +  simple_mtx_lock(&bo->ws->global_bo_list_lock);
>LIST_DEL(&bo->u.real.global_list_item);
>bo->ws->num_buffers--;
> -  mtx_unlock(&bo->ws->global_bo_list_lock);
> +  simple_mtx_unlock(&bo->ws->global_bo_list_lock);
> }
>
> amdgpu_bo_va_op(bo->bo, 0, bo->base.size, bo->va, 0, AMDGPU_VA_OP_UNMAP);
> amdgpu_va_range_free(bo->u.real.va_handle);
> amdgpu_bo_free(bo->bo);
>
> amdgpu_bo_remove_fences(bo);
>
> if (bo->initial_domain & RADEON_DOMAIN_VRAM)
>bo->ws->allocated_vram -= align64(bo->base.size, 
> bo->ws->info.gart_page_size);
> @@ -360,24 +360,24 @@ static const struct pb_vtbl amdgpu_winsys_bo_vtbl = {
> /* other functions are never called */
>  };
>
>  static void amdgpu_add_buffer_to_global_list(struct amdgpu_winsys_bo *bo)
>  {
> struct amdgpu_winsys *ws = bo->ws;
>
> assert(bo->bo);
>
> if (ws->debug_all_bos) {
> -  mtx_lock(&ws->global_bo_list_lock);
> +  simple_mtx_lock(&ws->global_bo_list_lock);
>LIST_ADDTAIL(&bo->u.real.global_list_item, &ws->global_bo_list);
>ws->num_buffers++;
> -  mtx_unlock(&ws->global_bo_list_lo

[Mesa-dev] [PATCH] egl/dri: don't crash when createImageFromRenderbuffer2 is NULL

2017-10-10 Thread Emil Velikov
From: Emil Velikov 

The __DRI_IMAGE version can be 17 or over, while the function pointer is
NULL. Guard for that instead of crashing.

Fixes: bad24395d91 ("egl/dri: use createImageFromRenderbuffer2 when
available")
Cc: Nicolai Hähnle 
Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/egl_dri2.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 171858bbcd9..d65ced0c22f 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -1890,7 +1890,8 @@ dri2_create_image_khr_renderbuffer(_EGLDisplay *disp, 
_EGLContext *ctx,
   return EGL_NO_IMAGE_KHR;
}
 
-   if (dri2_dpy->image->base.version >= 17) {
+   if (dri2_dpy->image->base.version >= 17 &&
+   dri2_dpy->image->createImageFromRenderbuffer2) {
   unsigned error = ~0;
 
   dri_image = dri2_dpy->image->createImageFromRenderbuffer2(
-- 
2.14.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] ac/surface: add ac_surface::is_displayable

2017-10-10 Thread Marek Olšák
On Tue, Oct 10, 2017 at 1:11 PM, Daniel Stone  wrote:
> Hi Marek,
>
> On 9 October 2017 at 18:05, Marek Olšák  wrote:
>> surf->is_linear = surf->u.legacy.level[0].mode == 
>> RADEON_SURF_MODE_LINEAR_ALIGNED;
>> +   surf->is_displayable = surf->micro_tile_mode == 
>> RADEON_MICRO_MODE_DISPLAY ||
>> +  surf->micro_tile_mode == 
>> RADEON_MICRO_MODE_ROTATED;
>
> Are linear surfaces not displayable on gfx6 parts?

Yes, they are.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Piglit on windows

2017-10-10 Thread Dylan Baker
Quoting Jose Fonseca (2017-10-10 08:41:49)
> On 10/10/17 16:31, Kyriazis, George wrote:
> > Hello…
> > 
> > Piglit on windows prints out a message saying “Timeout are not implemented 
> > on Windows.”.  These timeouts are the test timeouts in case a test hangs.
> > 
> > What do people do when running piglit on windows and they hit a timeout?  I 
> > would imagine there would be a non-zero number of people running piglit on 
> > windows on a regular basis, as a regression tool...
> > 
> > Thank you!
> > 
> > George
> 
> I haven't been involved into piglit Windows testing lately, so my 
> understanding might be dated.
> 
> I believe that we have timeouts when we test piglit on Windows. It's not 
> implemented on piglit python framework itself, but rather on VMware 
> testing framework (that driver piglit, and a bunch of other tests.)
> 
> That said, I believe it would be better long term if piglit framework 
> had timeouts on Windows, as it can probably track that with finer 
> granularity than we do now by putting a timeout on whole piglit or 
> subsets of piglit tests.
> 
> python3's subprocess module has timeout options, so it should be 
> relatively easy to implement on top of it, in a cross-platform manner.
> 
> Jose
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Since I added that warning message...

There are no timeouts in python 2 on windows, even with 3rd party packages.

For python 3 to properly handle timeouts you need to kill the process that
exceeds the expected timeout. Basically when timeout expires the sub-process
communicate call stops blocking, but doesn't actually kill the process. On Linux
we ask the process to terminate, wait 3 seconds and then SIGKILL it.

I don't know how to do that on windows, so I didn't implement it and instead
windows users get a warning. If someone with a basic grasp of how to kill a
process on windows wanted that functionality it probably wouldn't be too hard to
implement.

Alternatively there are constructs that are only in python 3 that do the killing
for you, on both Windows and Linux, but they don't have a python 2 equivalent.

Dylan


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/5] configure: Add the new "vc5" driver to the list, requiring a simulator.

2017-10-10 Thread Emil Velikov
On 19 September 2017 at 19:06, Eric Anholt  wrote:
> My intent is to develop the vc5 driver in-tree for some time to build the
> CL generation and shader compiler code, and keep out-of-tree patches for
> talking to an actual kernel driver until the kernel driver can be
> stabilized on the hardware.
>
> v2: Define a HAVE_BROADCOM_DRIVERS, like HAVE_INTEL or HAVE_AMD.
> ---
>  configure.ac| 15 ++-
>  src/Makefile.am |  2 +-
>  2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index 605c9b4e992f..5c5aa984fc78 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1267,7 +1267,7 @@ GALLIUM_DRIVERS_DEFAULT="r300,r600,svga,swrast"
>  AC_ARG_WITH([gallium-drivers],
>  [AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@],
>  [comma delimited Gallium drivers list, e.g.
> -
> "i915,nouveau,r300,r600,radeonsi,freedreno,pl111,svga,swrast,swr,vc4,virgl,etnaviv,imx"
> +
> "i915,nouveau,r300,r600,radeonsi,freedreno,pl111,svga,swrast,swr,vc4,vc5,virgl,etnaviv,imx"
>  @<:@default=r300,r600,svga,swrast@:>@])],
>  [with_gallium_drivers="$withval"],
>  [with_gallium_drivers="$GALLIUM_DRIVERS_DEFAULT"])
> @@ -2579,6 +2579,14 @@ if test -n "$with_gallium_drivers"; then
> DEFINES="$DEFINES -DUSE_VC4_SIMULATOR"],
>[USE_VC4_SIMULATOR=no])
>  ;;
> +xvc5)
> +HAVE_GALLIUM_VC5=yes
> +
> +PKG_CHECK_MODULES([VC5_SIMULATOR], [v3dv3],
> +  [USE_VC5_SIMULATOR=yes;
> +   DEFINES="$DEFINES -DUSE_VC5_SIMULATOR"],
> +  [AC_MSG_ERROR([vc5 requires the simulator])])
> +;;
Sounds reasonable.

Reviewed-by: Emil Velikov 

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 2/2] egl/wayland: Don't use dmabuf with no modifiers

2017-10-10 Thread Emil Velikov
Hi Dan,

Small question, which is somewhat orthogonal to the patch. Sorry :-\

On 2 October 2017 at 17:31, Daniel Stone  wrote:

>
> -   if (dri2_dpy->wl_dmabuf && dri2_dpy->image->base.version >= 15) {
> -  struct zwp_linux_buffer_params_v1 *params;
> +   if (dri2_dpy->image->base.version >= 15) {
Why are we using 15 here?

AFAICT the __DRI_IMAGE_ATTRIB_MODIFIER_{UPPER,LOWER} attributes were
introduced with v14.
Even then there should be no need for a check, since older versions
will just bail and we'll end up with DRM_FORMAT_MOD_INVALID.

Worth adding a small comment or dropping it all together?

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] r600: drop tc_L2_dirty bit, this was SI only.

2017-10-10 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Mon, Oct 9, 2017 at 10:28 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> Signed-off-by: Dave Airlie 
> ---
>  src/gallium/drivers/r600/r600_buffer_common.c |  2 --
>  src/gallium/drivers/r600/r600_pipe_common.h   | 12 
>  src/gallium/drivers/r600/r600_query.c |  1 -
>  3 files changed, 15 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_buffer_common.c 
> b/src/gallium/drivers/r600/r600_buffer_common.c
> index f35bc2c..a6e3b7f 100644
> --- a/src/gallium/drivers/r600/r600_buffer_common.c
> +++ b/src/gallium/drivers/r600/r600_buffer_common.c
> @@ -234,7 +234,6 @@ bool r600_alloc_resource(struct r600_common_screen 
> *rscreen,
> pb_reference(&old_buf, NULL);
>
> util_range_set_empty(&res->valid_buffer_range);
> -   res->TC_L2_dirty = false;
>
> /* Print debug information. */
> if (rscreen->debug_flags & DBG_VM && res->b.b.target == PIPE_BUFFER) {
> @@ -607,7 +606,6 @@ r600_alloc_buffer_struct(struct pipe_screen *screen,
>
> rbuffer->buf = NULL;
> rbuffer->bind_history = 0;
> -   rbuffer->TC_L2_dirty = false;
> util_range_init(&rbuffer->valid_buffer_range);
> return rbuffer;
>  }
> diff --git a/src/gallium/drivers/r600/r600_pipe_common.h 
> b/src/gallium/drivers/r600/r600_pipe_common.h
> index 87c48e3..a6406cf 100644
> --- a/src/gallium/drivers/r600/r600_pipe_common.h
> +++ b/src/gallium/drivers/r600/r600_pipe_common.h
> @@ -165,18 +165,6 @@ struct r600_resource {
>   */
> struct util_range   valid_buffer_range;
>
> -   /* For buffers only. This indicates that a write operation has been
> -* performed by TC L2, but the cache hasn't been flushed.
> -* Any hw block which doesn't use or bypasses TC L2 should check this
> -* flag and flush the cache before using the buffer.
> -*
> -* For example, TC L2 must be flushed if a buffer which has been
> -* modified by a shader store instruction is about to be used as
> -* an index buffer. The reason is that VGT DMA index fetching doesn't
> -* use TC L2.
> -*/
> -   boolTC_L2_dirty;
> -
> /* Whether the resource has been exported via resource_get_handle. */
> unsignedexternal_usage; /* 
> PIPE_HANDLE_USAGE_* */
>
> diff --git a/src/gallium/drivers/r600/r600_query.c 
> b/src/gallium/drivers/r600/r600_query.c
> index 86e6d09..aa3e36f 100644
> --- a/src/gallium/drivers/r600/r600_query.c
> +++ b/src/gallium/drivers/r600/r600_query.c
> @@ -1739,7 +1739,6 @@ static void r600_query_hw_get_result_resource(struct 
> r600_common_context *rctx,
> ssbo[2].buffer_offset = offset;
> ssbo[2].buffer_size = 8;
>
> -   ((struct r600_resource *)resource)->TC_L2_dirty = 
> true;
> }
>
> rctx->b.set_shader_buffers(&rctx->b, PIPE_SHADER_COMPUTE, 0, 
> 3, ssbo);
> --
> 2.9.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] st/mesa: don't assign prog->ShadowSamplers

2017-10-10 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Fri, Oct 6, 2017 at 10:39 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> It's not used, and the assignment for the TGSI case was incorrect
> for sampler arrays.
> ---
>  src/mesa/state_tracker/st_glsl_to_nir.cpp  | 1 -
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 4 
>  2 files changed, 5 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
> b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> index 06a8ee8c612..5a439aaf92a 100644
> --- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> @@ -428,21 +428,20 @@ st_nir_get_mesa_program(struct gl_context *ctx,
>
> if (ctx->_Shader->Flags & GLSL_DUMP) {
>_mesa_log("\n");
>_mesa_log("GLSL IR for linked %s program %d:\n",
>   _mesa_shader_stage_to_string(shader->Stage),
>   shader_program->Name);
>_mesa_print_ir(_mesa_get_log_file(), shader->ir, NULL);
>_mesa_log("\n\n");
> }
>
> -   prog->ShadowSamplers = shader->shadow_samplers;
> prog->ExternalSamplersUsed = gl_external_samplers(prog);
> _mesa_update_shader_textures_used(shader_program, prog);
>
> /* Avoid reallocation of the program parameter list, because the uniform
>  * storage is only associated with the original parameter list.
>  * This should be enough for Bitmap and DrawPixels constants.
>  */
> _mesa_reserve_parameter_storage(prog->Parameters, 8);
>
> /* This has to be done last.  Any operation the can cause
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 4b365c84817..1cfc9d963dc 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -4468,24 +4468,20 @@ count_resources(glsl_to_tgsi_visitor *v, gl_program 
> *prog)
>if (inst->info->is_tex) {
>   for (int i = 0; i < inst->sampler_array_size; i++) {
>  unsigned idx = inst->sampler_base + i;
>  v->samplers_used |= 1u << idx;
>
>  debug_assert(idx < (int)ARRAY_SIZE(v->sampler_types));
>  v->sampler_types[idx] = inst->tex_type;
>  v->sampler_targets[idx] =
> st_translate_texture_target(inst->tex_target, 
> inst->tex_shadow);
>
> -if (inst->tex_shadow) {
> -   prog->ShadowSamplers |= 1 << (inst->resource.index + i);
> -}
> -
>  if (inst->op == TGSI_OPCODE_TXF || inst->op == 
> TGSI_OPCODE_TXF_LZ) {
> prog->TexelFetchSamplers |= 1u << idx;
>  }
>   }
>}
>
>if (inst->tex_target == TEXTURE_EXTERNAL_INDEX)
>   prog->ExternalSamplersUsed |= 1 << inst->resource.index;
>
>if (inst->resource.file != PROGRAM_UNDEFINED && (
> --
> 2.11.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] st/glsl_to_tgsi: ignore GL_TEXTURE_SRGB_DECODE_EXT for samplers used with texelFetch*()

2017-10-10 Thread Marek Olšák
Reviewed-by: Marek Olšák 

BTW, desktop OpenGL does have GL_TEXTURE_SRGB_DECODE_EXT, so the
comments are incorrect.

Marke

On Fri, Oct 6, 2017 at 10:39 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> See the comment for the relevant spec quote.
>
> Fixes 
> dEQP-GLES31.functional.srgb_texture_decode.skip_decode.srgba8.texel_fetch
> --
> Note that this is on top of the texture locking series which I have
> sent out a minute ago.
> ---
>  src/mesa/main/mtypes.h |  1 +
>  src/mesa/state_tracker/st_atom_texture.c   | 39 
> +++---
>  src/mesa/state_tracker/st_cb_texture.c |  8 +-
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  4 +++
>  src/mesa/state_tracker/st_sampler_view.c   | 19 +--
>  src/mesa/state_tracker/st_sampler_view.h   |  3 ++-
>  src/mesa/state_tracker/st_texture.c|  7 +-
>  src/mesa/state_tracker/st_texture.h|  7 +++---
>  8 files changed, 71 insertions(+), 17 deletions(-)
>
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index abda1a36e46..a45409ed2d9 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -2081,20 +2081,21 @@ struct gl_program
>
> bool is_arb_asm; /** Is this an ARB assembly-style program */
>
> /** Is this program written to on disk shader cache */
> bool program_written_to_cache;
>
> GLbitfield64 SecondaryOutputsWritten; /**< Subset of OutputsWritten 
> outputs written with non-zero index. */
> GLbitfield TexturesUsed[MAX_COMBINED_TEXTURE_IMAGE_UNITS];  /**< 
> TEXTURE_x_BIT bitmask */
> GLbitfield SamplersUsed;   /**< Bitfield of which samplers are used */
> GLbitfield ShadowSamplers; /**< Texture units used for shadow sampling. */
> +   GLbitfield TexelFetchSamplers; /**< Texture units used for texelFetch*(). 
> */
> GLbitfield ExternalSamplersUsed; /**< Texture units used for 
> samplerExternalOES */
>
> /* Fragement shader only fields */
> GLboolean OriginUpperLeft;
> GLboolean PixelCenterInteger;
>
> /** Named parameters, constants, etc. from program text */
> struct gl_program_parameter_list *Parameters;
>
> /** Map from sampler unit to texture unit (set by glUniform1i()) */
> diff --git a/src/mesa/state_tracker/st_atom_texture.c 
> b/src/mesa/state_tracker/st_atom_texture.c
> index 90828bb4cf9..c350a098097 100644
> --- a/src/mesa/state_tracker/st_atom_texture.c
> +++ b/src/mesa/state_tracker/st_atom_texture.c
> @@ -51,21 +51,22 @@
>  #include "util/u_inlines.h"
>  #include "cso_cache/cso_context.h"
>
>
>  /**
>   * Get a pipe_sampler_view object from a texture unit.
>   */
>  void
>  st_update_single_texture(struct st_context *st,
>   struct pipe_sampler_view **sampler_view,
> - GLuint texUnit, bool glsl130_or_later)
> + GLuint texUnit, bool glsl130_or_later,
> + bool ignore_srgb_decode)
>  {
> struct gl_context *ctx = st->ctx;
> const struct gl_sampler_object *samp;
> struct gl_texture_object *texObj;
> struct st_texture_object *stObj;
>
> samp = _mesa_get_samplerobj(ctx, texUnit);
>
> texObj = ctx->Texture.Unit[texUnit]._Current;
> assert(texObj);
> @@ -83,55 +84,85 @@ st_update_single_texture(struct st_context *st,
>*sampler_view = NULL;
>return;
> }
>
> if (texObj->TargetIndex == TEXTURE_EXTERNAL_INDEX &&
> stObj->pt->screen->resource_changed)
>   stObj->pt->screen->resource_changed(stObj->pt->screen, stObj->pt);
>
> *sampler_view =
>st_get_texture_sampler_view_from_stobj(st, stObj, samp,
> - glsl130_or_later);
> + glsl130_or_later,
> + ignore_srgb_decode);
>  }
>
>
>
>  static void
>  update_textures(struct st_context *st,
>  enum pipe_shader_type shader_stage,
>  const struct gl_program *prog,
>  struct pipe_sampler_view **sampler_views,
>  unsigned *out_num_textures)
>  {
> const GLuint old_max = *out_num_textures;
> GLbitfield samplers_used = prog->SamplersUsed;
> +   GLbitfield texel_fetch_samplers = prog->TexelFetchSamplers;
> GLbitfield free_slots = ~prog->SamplersUsed;
> GLbitfield external_samplers_used = prog->ExternalSamplersUsed;
> GLuint unit;
>
> if (samplers_used == 0x0 && old_max == 0)
>return;
>
> unsigned num_textures = 0;
>
> /* prog->sh.data is NULL if it's ARB_fragment_program */
> bool glsl130 = (prog->sh.data ? prog->sh.data->Version : 0) >= 130;
>
> /* loop over sampler units (aka tex image units) */
> for (unit = 0; samplers_used || unit < old_max;
> -unit++, samplers_used >>= 1) {
> +unit++, samplers_used >>= 1, texel_fetch_samplers >>= 1) {
>struct pipe_sampler_view *sampler_view = NULL;
>
>if (samplers_used & 

Re: [Mesa-dev] [PATCH] git_sha1_gen: accept MESA_GIT_SHA1_OVERRIDE env var

2017-10-10 Thread Dylan Baker
Thanks for making those changes.

Reviewed-by: Dylan Baker 

Quoting Brian Paul (2017-10-09 19:26:22)
> If one uses a parent build script to download/build Mesa we may not
> have a full git repository (maybe a tar archive) so the 'git rev-parse'
> command will fail.
> 
> This updates the script to look for a MESA_GIT_SHA1_OVERRIDE env var.
> If it's set, use that sha1 instead of using git rev-parse.  With this
> change we can put a git hash in the GL_VERSION string even when we
> don't have a git repo.
> 
> v2: incorporate Dylan's suggestions to simplify the code
> ---
>  bin/git_sha1_gen.py | 39 ++-
>  1 file changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/bin/git_sha1_gen.py b/bin/git_sha1_gen.py
> index 97981fb..87e06a8 100755
> --- a/bin/git_sha1_gen.py
> +++ b/bin/git_sha1_gen.py
> @@ -1,20 +1,33 @@
>  #!/usr/bin/env python
>  
> +"""
> +Generate the contents of the git_sha1.h file.
> +The output of this script goes to stdout.
> +"""
> +
> +
> +import os
>  import os.path
>  import subprocess
>  import sys
>  
> -git_dir = os.path.join(os.path.dirname(sys.argv[0]), '..', '.git')
> -try:
> -git_sha1 = subprocess.check_output([
> -'git',
> -'--git-dir=' + git_dir,
> -'rev-parse',
> -'--short=10',
> -'HEAD',
> -], stderr=open(os.devnull, 'w')).decode("ascii")
> -except:
> -# don't print anything if it fails
> -pass
> -else:
> +
> +def get_git_sha1():
> +"""Try to get the git SHA1 with git rev-parse."""
> +git_dir = os.path.join(os.path.dirname(sys.argv[0]), '..', '.git')
> +try:
> +git_sha1 = subprocess.check_output([
> +'git',
> +'--git-dir=' + git_dir,
> +'rev-parse',
> +'HEAD',
> +], stderr=open(os.devnull, 'w')).decode("ascii")
> +except:
> +# don't print anything if it fails
> +git_sha1 = ''
> +return git_sha1
> +
> +
> +git_sha1 = os.environ.get('MESA_GIT_SHA1_OVERRIDE', get_git_sha1())[:10]
> +if git_sha1:
>  sys.stdout.write('#define MESA_GIT_SHA1 "git-%s"\n' % git_sha1.rstrip())
> -- 
> 1.9.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/6] st/dri: implement createImageFromRenderbuffer(2)

2017-10-10 Thread Nicolai Hähnle

On 10.10.2017 18:30, Emil Velikov wrote:

On 6 October 2017 at 21:16, Nicolai Hähnle  wrote:

From: Nicolai Hähnle 

Tested with dEQP-EGL tests.



+.createImageWithModifiers = NULL,
+.createImageFromDmaBufs2  = NULL,
+.queryDmaBufFormats   = NULL,
+.queryDmaBufModifiers = NULL,
+.queryDmaBufFormatModifierAttribs = NULL,


One of the nice things about C99 initializers is that you don't need
these lines ;-)


I added them because it highlights things that are not supported. I'll 
admit that that's a matter of taste.


Cheers,
Nicolai


Eric's got a good point through - please mention the set of tests and
flip the order of 5 and 6.

With the small nitpick with in 4/6 patches 1-4 are
Reviewed-by: Emil Velikov 

I'll send some patches for the i965/!Gallium drivers as a follow-up.

Thanks
Emil



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Testing out Kristian's fast mtx patch

2017-10-10 Thread Kristian Høgsberg
On Mon, Oct 9, 2017 at 7:45 PM, Timothy Arceri  wrote:
> After a recent discussion about this code from 2015 I was curious
> to give it a try. The outstanding review item was that we shouldn't
> be replacing the C11 mtx type/functions with our own, so I've renamed
> the fast path to simple_mtx* and added a couple of patches to make use
> of it.
>
> The idea is this fast mtx can be used in place of the full mtx
> implementation when its of type mtx_plain.
>
> I though if anywhere we might see a change in the drawoverhead piglit
> test but I didn't see any real change.
>
> Anyway since I've made the updates I thought I'd send it out. Maybe
> someone else might find some better results. Kristian reported a 10%
> increase in some internal Intel benchmarks, I wonder if thats still the
> case.

Hi Tim, thanks for reviving this. The one case where I saw that big
win was one of Intels internal micro-benchmarks (SynMark). It was
bottlenecking on bind/unbind of various objects and much of the
overhead was in the locking there. Any real world usecase without
excessive bind/unbind probably won't see any improvement.

Kristian
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/5] mesa: Implement a new GL_MESA_tile_raster_order extension.

2017-10-10 Thread Eric Anholt
Eric Anholt  writes:

> [ Unknown signature status ]
> Nicolai Hähnle  writes:
>
>> Patches 1 & 2:
>>
>> Acked-by: Nicolai Hähnle 
>>
>> Patches 4 & 5:
>>
>> Reviewed-by: Nicolai Hähnle 
>> (for the gallium parts; it would have been nice to split patch 4 up into 
>> gallium and driver parts, but I won't insist)
>
> Thanks!  I've pinged on the registry PR, so hopefully that gets merged
> soon.

And they've merged it.  Pushing to Mesa now.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Connor Abbott
No, this is a different situation. On i965, the hardware keeps track
of that stuff automatically. The problem is that the header, which is
shared across all the threads in a wavefront and specifies stuff like
LOD, LOD bias, texture array offset, etc., uses the same register
space as normal, vector registers. So, to set it up before we do the
texture, we have to disable the usual exec masking stuff and just
write to certain channels of some random register indiscriminately,
regardless of whether the execution mask is enabled for them. This
then leads to the problems described.

I'm not sure how nvidia handles it, but they probably do something
much more sane like AMD. Although it turns out that you probably have
to deal with this problem anyways when implementing the new subgroup
reduction stuff, since the implementation usually involves some kind
of exec-mask-ignoring write.


On Tue, Oct 10, 2017 at 12:29 PM, Ilia Mirkin  wrote:
> I hope I'm not butting in too much with irrelevant info, but I think
> we had a similar issue in nouveau. On Kepler, texture instructions
> take an arbitrary amount of time to complete, and only write into
> destination registers on completion, while other instructions are
> executing after that tex dispatch. [You have to insert a barrier to
> force a wait for the tex to complete.]
>
> We had a clever thing that figured things out based on texture result
> uses and worked for reasonable cases. However the tricky case to
> handle turned out to be
>
> color = texture();
> if (a) {
>   use color
> } else {
>   dont use color
> }
>
> In that situation, the texture call would randomly overwrite registers
> when we went into the else case, since nothing used the texture
> results and that wasn't properly tracked.
>
> I know what you're going to say - just code-motion the texture into
> the if. But that's not always possible -- the actual original
> situation also added a loops with conditional texture calls to
> complicate matters.
>
> Not sure if this is exactly your situation, but thought I'd point it
> out as it may be relevant.
>
> Cheers,
>
>   -ilia
>
> On Tue, Oct 10, 2017 at 12:16 PM, Connor Abbott  wrote:
>> I'm a little nervous about this, because really, the only solution to
>> this problem is to ignore all non-WE_all definitions of all variables
>> in liveness analysis. For example, in something like:
>>
>> vec4 color2 = ...
>> if (...) {
>>color2 = texture();
>> }
>>
>> texture() can also overwrite inactive channels of color2. We happen to
>> get this right because we turn live ranges into live intervals without
>> holes, but I can't come up with a good reason why that would save us
>> in all cases except the one in this patch -- which makes me worry that
>> we'll find yet another case where there's a similar problem. I think
>> it would be clearer if we what I said above, i.e. ignore all
>> non-WE_all definitions, which will make things much worse, but then
>> apply Curro's patch which will return things to pretty much how they
>> were before, except this case will be fixed and maybe some other cases
>> we haven't thought of.
>>
>>
>>
>> On Thu, Oct 5, 2017 at 2:52 PM, Jason Ekstrand  wrote:
>>> No Shader-db changes.
>>>
>>> Cc: mesa-sta...@lists.freedesktop.org
>>> ---
>>>  src/intel/compiler/brw_fs_live_variables.cpp | 55 
>>> 
>>>  1 file changed, 55 insertions(+)
>>>
>>> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
>>> b/src/intel/compiler/brw_fs_live_variables.cpp
>>> index c449672..380060d 100644
>>> --- a/src/intel/compiler/brw_fs_live_variables.cpp
>>> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
>>> @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>>>   }
>>>}
>>> }
>>> +
>>> +   /* Due to the explicit way the SIMD data is handled on GEN, we need to 
>>> be a
>>> +* bit more careful with live ranges and loops.  Consider the following
>>> +* example:
>>> +*
>>> +*vec4 color2;
>>> +*while (1) {
>>> +*   vec4 color = texture();
>>> +*   if (...) {
>>> +*  color2 = color * 2;
>>> +*  break;
>>> +*   }
>>> +*}
>>> +*gl_FragColor = color2;
>>> +*
>>> +* In this case, the definition of color2 dominates the use because the
>>> +* loop only has the one exit.  This means that the live range interval 
>>> for
>>> +* color2 goes from the statement in the if to it's use below the loop.
>>> +* Now suppose that the texture operation has a header register that 
>>> gets
>>> +* assigned one of the registers used for color2.  If the loop 
>>> condition is
>>> +* non-uniform and some of the threads will take the and others will
>>> +* continue.  In this case, the next pass through the loop, the WE_all
>>> +* setup of the header register will stomp the disabled channels of 
>>> color2
>>> +* and corrupt the value.
>>> +*
>>> +* This same problem can occur if you 

Re: [Mesa-dev] [PATCH 6/6] st/dri: implement createImageFromRenderbuffer(2)

2017-10-10 Thread Emil Velikov
On 6 October 2017 at 21:16, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> Tested with dEQP-EGL tests.

> +.createImageWithModifiers = NULL,
> +.createImageFromDmaBufs2  = NULL,
> +.queryDmaBufFormats   = NULL,
> +.queryDmaBufModifiers = NULL,
> +.queryDmaBufFormatModifierAttribs = NULL,

One of the nice things about C99 initializers is that you don't need
these lines ;-)
Eric's got a good point through - please mention the set of tests and
flip the order of 5 and 6.

With the small nitpick with in 4/6 patches 1-4 are
Reviewed-by: Emil Velikov 

I'll send some patches for the i965/!Gallium drivers as a follow-up.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 1/3] mesa: Add new fast mtx_t mutex type for basic use cases

2017-10-10 Thread Nicolai Hähnle

On 10.10.2017 04:45, Timothy Arceri wrote:

While modern pthread mutexes are very fast, they still incur a call to an
external DSO and overhead of the generality and features of pthread mutexes.
Most mutexes in mesa only needs lock/unlock, and the idea here is that we can
inline the atomic operation and make the fast case just two intructions.
Mutexes are subtle and finicky to implement, so we carefully copy the
implementation from Ulrich Dreppers well-written and well-reviewed paper:

   "Futexes Are Tricky"
   http://www.akkadia.org/drepper/futex.pdf

We implement "mutex3", which gives us a mutex that has no syscalls on
uncontended lock or unlock.  Further, the uncontended case boils down to a
cmpxchg and an untaken branch and the uncontended unlock is just a locked decr
and an untaken branch.  We use __builtin_expect() to indicate that contention
is unlikely so that gcc will put the contention code out of the main code
flow.

A fast mutex only supports lock/unlock, can't be recursive or used with
condition variables.  We keep the pthread mutex implementation around as
full_mtx_t for the few places where we use condition variables or recursive
locking.  For platforms or compilers where futex and atomics aren't available,
mtx_t falls back to the pthread mutex.

The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound
applications.  Most CPU bound cases are helped and some of our internal
bind_buffer_object heavy benchmarks gain up to 10%.

Signed-off-by: Kristian Høgsberg 
Signed-off-by: Timothy Arceri 
---
  configure.ac  |   3 ++
  src/util/simple_mtx.h | 146 ++


Add to util/Makefile.sources


[snip]

+static inline void
+simple_mtx_init(simple_mtx_t *mtx, int type)
+{
+   assert(type == mtx_plain);


mtx_plain is undefined here.

Cheers,
Nicolai


+
+   mtx->val = 0;
+}
+
+static inline void
+simple_mtx_destroy(simple_mtx_t *mtx)
+{
+}
+
+static inline void
+simple_mtx_lock(simple_mtx_t *mtx)
+{
+   uint32_t c;
+
+   c = __sync_val_compare_and_swap(&mtx->val, 0, 1);
+   if (__builtin_expect(c != 0, 0)) {
+  if (c != 2)
+ c = __sync_lock_test_and_set(&mtx->val, 2);
+  while (c != 0) {
+ futex_wait(&mtx->val, 2);
+ c = __sync_lock_test_and_set(&mtx->val, 2);
+  }
+   }
+}
+
+static inline void
+simple_mtx_unlock(simple_mtx_t *mtx)
+{
+   uint32_t c;
+
+   c = __sync_fetch_and_sub(&mtx->val, 1);
+   if (__builtin_expect(c != 1, 0)) {
+  mtx->val = 0;
+  futex_wake(&mtx->val);
+   }
+}
+
+#else
+
+typedef simple_mtx_t mtx_t;
+
+#define _SIMPLE_MTX_INITIALIZER_NP _MTX_INITIALIZER_NP
+
+static inline void
+simple_mtx_init(simple_mtx_t *mtx, int type)
+{
+   mtx_init(mtx, type);
+}
+
+static inline void
+simple_mtx_destroy(simple_mtx_t *mtx)
+{
+   mtx_destroy(mtx);
+}
+
+static inline void
+simple_mtx_lock(simple_mtx_t *mtx)
+{
+   mtx_lock(mtx);
+}
+
+static inline void
+simple_mtx_unlock(simple_mtx_t *mtx)
+{
+   mtx_unlock(mtx);
+}
+
+#endif




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Ilia Mirkin
I hope I'm not butting in too much with irrelevant info, but I think
we had a similar issue in nouveau. On Kepler, texture instructions
take an arbitrary amount of time to complete, and only write into
destination registers on completion, while other instructions are
executing after that tex dispatch. [You have to insert a barrier to
force a wait for the tex to complete.]

We had a clever thing that figured things out based on texture result
uses and worked for reasonable cases. However the tricky case to
handle turned out to be

color = texture();
if (a) {
  use color
} else {
  dont use color
}

In that situation, the texture call would randomly overwrite registers
when we went into the else case, since nothing used the texture
results and that wasn't properly tracked.

I know what you're going to say - just code-motion the texture into
the if. But that's not always possible -- the actual original
situation also added a loops with conditional texture calls to
complicate matters.

Not sure if this is exactly your situation, but thought I'd point it
out as it may be relevant.

Cheers,

  -ilia

On Tue, Oct 10, 2017 at 12:16 PM, Connor Abbott  wrote:
> I'm a little nervous about this, because really, the only solution to
> this problem is to ignore all non-WE_all definitions of all variables
> in liveness analysis. For example, in something like:
>
> vec4 color2 = ...
> if (...) {
>color2 = texture();
> }
>
> texture() can also overwrite inactive channels of color2. We happen to
> get this right because we turn live ranges into live intervals without
> holes, but I can't come up with a good reason why that would save us
> in all cases except the one in this patch -- which makes me worry that
> we'll find yet another case where there's a similar problem. I think
> it would be clearer if we what I said above, i.e. ignore all
> non-WE_all definitions, which will make things much worse, but then
> apply Curro's patch which will return things to pretty much how they
> were before, except this case will be fixed and maybe some other cases
> we haven't thought of.
>
>
>
> On Thu, Oct 5, 2017 at 2:52 PM, Jason Ekstrand  wrote:
>> No Shader-db changes.
>>
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>  src/intel/compiler/brw_fs_live_variables.cpp | 55 
>> 
>>  1 file changed, 55 insertions(+)
>>
>> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
>> b/src/intel/compiler/brw_fs_live_variables.cpp
>> index c449672..380060d 100644
>> --- a/src/intel/compiler/brw_fs_live_variables.cpp
>> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
>> @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>>   }
>>}
>> }
>> +
>> +   /* Due to the explicit way the SIMD data is handled on GEN, we need to 
>> be a
>> +* bit more careful with live ranges and loops.  Consider the following
>> +* example:
>> +*
>> +*vec4 color2;
>> +*while (1) {
>> +*   vec4 color = texture();
>> +*   if (...) {
>> +*  color2 = color * 2;
>> +*  break;
>> +*   }
>> +*}
>> +*gl_FragColor = color2;
>> +*
>> +* In this case, the definition of color2 dominates the use because the
>> +* loop only has the one exit.  This means that the live range interval 
>> for
>> +* color2 goes from the statement in the if to it's use below the loop.
>> +* Now suppose that the texture operation has a header register that gets
>> +* assigned one of the registers used for color2.  If the loop condition 
>> is
>> +* non-uniform and some of the threads will take the and others will
>> +* continue.  In this case, the next pass through the loop, the WE_all
>> +* setup of the header register will stomp the disabled channels of 
>> color2
>> +* and corrupt the value.
>> +*
>> +* This same problem can occur if you have a mix of 64, 32, and 16-bit
>> +* registers because the channels do not line up or if you have a SIMD16
>> +* program and the first half of one value overlaps the second half of 
>> the
>> +* other.
>> +*
>> +* To solve this problem, we take any VGRFs whose live ranges cross the
>> +* while instruction of a loop and extend their live ranges to the top of
>> +* the loop.  This more accurately models the hardware because the value 
>> in
>> +* the VGRF needs to be carried through subsequent loop iterations in 
>> order
>> +* to remain valid when we finally do break.
>> +*/
>> +   foreach_block (block, cfg) {
>> +  if (block->end()->opcode != BRW_OPCODE_WHILE)
>> + continue;
>> +
>> +  /* This is a WHILE instrution. Find the DO block. */
>> +  bblock_t *do_block = NULL;
>> +  foreach_list_typed(bblock_link, child_link, link, &block->children) {
>> + if (child_link->block->start_ip < block->end_ip) {
>> +assert(do_block == NULL);
>> +do_block = child_link->block;
>> +  

Re: [Mesa-dev] [PATCH 3/6] egl/dri: use createImageFromRenderbuffer2 when available

2017-10-10 Thread Nicolai Hähnle

On 10.10.2017 18:24, Emil Velikov wrote:

On 6 October 2017 at 21:16, Nicolai Hähnle  wrote:


+   if (dri2_dpy->image->base.version >= 17) {

Don't forget to check for the function pointer:

if (dri2_dpy->image->base.version >= 17 &&
dri2_dpy->image->createImageFromRenderbuffer2) {


Too late :/




-Emil



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/6] egl/dri: use createImageFromRenderbuffer2 when available

2017-10-10 Thread Emil Velikov
On 6 October 2017 at 21:16, Nicolai Hähnle  wrote:

> +   if (dri2_dpy->image->base.version >= 17) {
Don't forget to check for the function pointer:

if (dri2_dpy->image->base.version >= 17 &&
dri2_dpy->image->createImageFromRenderbuffer2) {

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/6] egl/dri: remove old left-overs

2017-10-10 Thread Emil Velikov
On 6 October 2017 at 21:16, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> ---
>  src/egl/drivers/dri2/platform_x11_dri3.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/src/egl/drivers/dri2/platform_x11_dri3.c 
> b/src/egl/drivers/dri2/platform_x11_dri3.c
> index 45bb56ca17e..eadd37141e0 100644
> --- a/src/egl/drivers/dri2/platform_x11_dri3.c
> +++ b/src/egl/drivers/dri2/platform_x11_dri3.c
> @@ -297,22 +297,20 @@ dri3_create_image_khr_pixmap(_EGLDisplay *disp, 
> _EGLContext *ctx,
> free(bp_reply);
>
> return &dri2_img->base;
>  }
>
>  static _EGLImage *
>  dri3_create_image_khr(_EGLDriver *drv, _EGLDisplay *disp,
>_EGLContext *ctx, EGLenum target,
>EGLClientBuffer buffer, const EGLint *attr_list)
>  {
> -   (void) drv;
> -
I wouldn't bother with this for now - there's a ton more left.
We should drop the drv argument through egl, but rather boring and mechanical.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Connor Abbott
I'm a little nervous about this, because really, the only solution to
this problem is to ignore all non-WE_all definitions of all variables
in liveness analysis. For example, in something like:

vec4 color2 = ...
if (...) {
   color2 = texture();
}

texture() can also overwrite inactive channels of color2. We happen to
get this right because we turn live ranges into live intervals without
holes, but I can't come up with a good reason why that would save us
in all cases except the one in this patch -- which makes me worry that
we'll find yet another case where there's a similar problem. I think
it would be clearer if we what I said above, i.e. ignore all
non-WE_all definitions, which will make things much worse, but then
apply Curro's patch which will return things to pretty much how they
were before, except this case will be fixed and maybe some other cases
we haven't thought of.



On Thu, Oct 5, 2017 at 2:52 PM, Jason Ekstrand  wrote:
> No Shader-db changes.
>
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/compiler/brw_fs_live_variables.cpp | 55 
> 
>  1 file changed, 55 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
> b/src/intel/compiler/brw_fs_live_variables.cpp
> index c449672..380060d 100644
> --- a/src/intel/compiler/brw_fs_live_variables.cpp
> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>   }
>}
> }
> +
> +   /* Due to the explicit way the SIMD data is handled on GEN, we need to be 
> a
> +* bit more careful with live ranges and loops.  Consider the following
> +* example:
> +*
> +*vec4 color2;
> +*while (1) {
> +*   vec4 color = texture();
> +*   if (...) {
> +*  color2 = color * 2;
> +*  break;
> +*   }
> +*}
> +*gl_FragColor = color2;
> +*
> +* In this case, the definition of color2 dominates the use because the
> +* loop only has the one exit.  This means that the live range interval 
> for
> +* color2 goes from the statement in the if to it's use below the loop.
> +* Now suppose that the texture operation has a header register that gets
> +* assigned one of the registers used for color2.  If the loop condition 
> is
> +* non-uniform and some of the threads will take the and others will
> +* continue.  In this case, the next pass through the loop, the WE_all
> +* setup of the header register will stomp the disabled channels of color2
> +* and corrupt the value.
> +*
> +* This same problem can occur if you have a mix of 64, 32, and 16-bit
> +* registers because the channels do not line up or if you have a SIMD16
> +* program and the first half of one value overlaps the second half of the
> +* other.
> +*
> +* To solve this problem, we take any VGRFs whose live ranges cross the
> +* while instruction of a loop and extend their live ranges to the top of
> +* the loop.  This more accurately models the hardware because the value 
> in
> +* the VGRF needs to be carried through subsequent loop iterations in 
> order
> +* to remain valid when we finally do break.
> +*/
> +   foreach_block (block, cfg) {
> +  if (block->end()->opcode != BRW_OPCODE_WHILE)
> + continue;
> +
> +  /* This is a WHILE instrution. Find the DO block. */
> +  bblock_t *do_block = NULL;
> +  foreach_list_typed(bblock_link, child_link, link, &block->children) {
> + if (child_link->block->start_ip < block->end_ip) {
> +assert(do_block == NULL);
> +do_block = child_link->block;
> + }
> +  }
> +  assert(do_block);
> +
> +  for (int i = 0; i < num_vars; i++) {
> + if (start[i] < block->end_ip && end[i] > block->end_ip)
> +start[i] = MIN2(start[i], do_block->start_ip);
> +  }
> +   }
>  }
>
>  fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg)
> --
> 2.5.0.400.gff86faf
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/6] nir: add variant of lower_io_to_scalar to be called earlier

2017-10-10 Thread Eric Anholt
Timothy Arceri  writes:

> On 10/10/17 09:31, Timothy Arceri wrote:
>> On 10/10/17 09:06, Eric Anholt wrote:
>>> Timothy Arceri  writes:
>>>
 This is intended to be called before nir_lower_io() so that we
 can do some linking optimisations with the results. It can also
 be used with drivers that don't use nir_lower_io() at all such
 as RADV.
>>>
 +static void
 +lower_load_to_scalar_early(nir_builder *b, nir_intrinsic_instr *intr,
 +   nir_variable *var, struct hash_table 
 *split_inputs,
 +   struct hash_table *split_outputs)
 +{
 +   b->cursor = nir_before_instr(&intr->instr);
 +
 +   assert(intr->dest.is_ssa);
 +
 +   nir_ssa_def *loads[4];
 +
 +   nir_variable **chan_vars;
 +   if (var->data.mode == nir_var_shader_in) {
 +  chan_vars = get_channel_variables(split_inputs, var);
 +   } else {
 +  chan_vars = get_channel_variables(split_outputs, var);
 +   }
 +
 +   for (unsigned i = 0; i < intr->num_components; i++) {
 +  nir_variable *chan_var = chan_vars[var->data.location_frac + i];
 +  if (!chan_vars[var->data.location_frac + i]) {
 + chan_var = nir_variable_clone(var, b->shader);
 + chan_var->data.location_frac =  var->data.location_frac + i;
 + chan_var->type = glsl_channel_type(chan_var->type);
 +
 + chan_vars[var->data.location_frac + i] = chan_var;
 +
 + nir_shader_add_variable(b->shader, chan_var);
 +  }
 +
 +  nir_intrinsic_instr *chan_intr =
 + nir_intrinsic_instr_create(b->shader, intr->intrinsic);
 +  nir_ssa_dest_init(&chan_intr->instr, &chan_intr->dest,
 +    1, intr->dest.ssa.bit_size, NULL);
 +  chan_intr->num_components = 1;
 +  chan_intr->variables[0] = nir_deref_var_create(chan_intr, 
 chan_var);
 +
 +  if (intr->variables[0]->deref.child) {
 + chan_intr->variables[0]->deref.child =
 +
 &clone_deref_array(nir_deref_as_array(intr->variables[0]->deref.child),
 +   &chan_intr->variables[0]->deref)->deref;
 +  }
>>>
>>> Weird.  I would have expected that we would turn arrays into separate
>>> vectors (if possible) before this pass, so there wouldn't be any deref
>>> chains.  Has scalarizing arryas, specifically, been useful?
>> 
>> I don't have any shader-db numbers for this specifically (I guess I 
>> could check it out) but I was definitely ending up here via the test 
>> suites.
>
> Seems I've looked at it before but forgot. The GLSL IR array splitting 
> pass has a comment:
>
> " * This skips uniform/varying arrays, which would need careful
>   * handling due to their ir->location fields tying them to the GL API
>   * and other shader stages."
>
> So we don't split varying arrays at all there, and there is nothing in 
> nir that does it either. I seem to have created a file to do the 
> splitting in nir but its empty, I think I wanted to get this pass done 
> first.

7 years later, handling ir location fields doesn't sound so bad any more
:)

> To answer your question I've spotted shaders in shader-db where removing 
> components of an array helps, and also where splitting the array so we 
> can remove dead elements would help.

Awesome.  Let's continue on with this, then!


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] radv: Add R16G16B16A16_SNORM fast clear support

2017-10-10 Thread Alex Smith
Signed-off-by: Alex Smith 
---
 src/amd/vulkan/radv_formats.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/amd/vulkan/radv_formats.c b/src/amd/vulkan/radv_formats.c
index 88305abd04..5c79ea7406 100644
--- a/src/amd/vulkan/radv_formats.c
+++ b/src/amd/vulkan/radv_formats.c
@@ -961,6 +961,12 @@ bool radv_format_pack_clear_color(VkFormat format,
clear_vals[1] = ((uint16_t)util_iround(CLAMP(value->float32[2], 
0.0f, 1.0f) * 0x)) & 0x;
clear_vals[1] |= 
((uint16_t)util_iround(CLAMP(value->float32[3], 0.0f, 1.0f) * 0x)) << 16;
break;
+   case VK_FORMAT_R16G16B16A16_SNORM:
+   clear_vals[0] = ((uint16_t)util_iround(CLAMP(value->float32[0], 
-1.0f, 1.0f) * 0x7fff)) & 0x;
+   clear_vals[0] |= 
((uint16_t)util_iround(CLAMP(value->float32[1], -1.0f, 1.0f) * 0x7fff)) << 16;
+   clear_vals[1] = ((uint16_t)util_iround(CLAMP(value->float32[2], 
-1.0f, 1.0f) * 0x7fff)) & 0x;
+   clear_vals[1] |= 
((uint16_t)util_iround(CLAMP(value->float32[3], -1.0f, 1.0f) * 0x7fff)) << 16;
+   break;
case VK_FORMAT_A2B10G10R10_UNORM_PACK32:
clear_vals[0] = ((uint16_t)util_iround(CLAMP(value->float32[0], 
0.0f, 1.0f) * 0x3ff)) & 0x3ff;
clear_vals[0] |= 
(((uint16_t)util_iround(CLAMP(value->float32[1], 0.0f, 1.0f) * 0x3ff)) & 0x3ff) 
<< 10;
-- 
2.13.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Piglit on windows

2017-10-10 Thread Kyriazis, George

On Oct 10, 2017, at 10:41 AM, Jose Fonseca 
mailto:jfons...@vmware.com>> wrote:

On 10/10/17 16:31, Kyriazis, George wrote:
Hello…
Piglit on windows prints out a message saying “Timeout are not implemented on 
Windows.”.  These timeouts are the test timeouts in case a test hangs.
What do people do when running piglit on windows and they hit a timeout?  I 
would imagine there would be a non-zero number of people running piglit on 
windows on a regular basis, as a regression tool...
Thank you!
George

I haven't been involved into piglit Windows testing lately, so my understanding 
might be dated.

I believe that we have timeouts when we test piglit on Windows. It's not 
implemented on piglit python framework itself, but rather on VMware testing 
framework (that driver piglit, and a bunch of other tests.)

That said, I believe it would be better long term if piglit framework had 
timeouts on Windows, as it can probably track that with finer granularity than 
we do now by putting a timeout on whole piglit or subsets of piglit tests.

Yes, definitely, that’s the way to go.

The code that I see in framework/test/base.py says:

if sys.platform == 'win32':
# There is no implementation in piglit to make timeouts work in
# windows, this uses the same Popen snippet as python 2 without
# subprocess32 to mask it. Patches are welcome.
# XXX: Should this also include cygwin?
warnings.warn('Timeouts are not implemented on Windows.')



python3's subprocess module has timeout options, so it should be relatively 
easy to implement on top of it, in a cross-platform manner.

Not being a python expert it looks like subprocess does not work on windows.

George


Jose

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Piglit on windows

2017-10-10 Thread Jose Fonseca

On 10/10/17 16:31, Kyriazis, George wrote:

Hello…

Piglit on windows prints out a message saying “Timeout are not implemented on 
Windows.”.  These timeouts are the test timeouts in case a test hangs.

What do people do when running piglit on windows and they hit a timeout?  I 
would imagine there would be a non-zero number of people running piglit on 
windows on a regular basis, as a regression tool...

Thank you!

George


I haven't been involved into piglit Windows testing lately, so my 
understanding might be dated.


I believe that we have timeouts when we test piglit on Windows. It's not 
implemented on piglit python framework itself, but rather on VMware 
testing framework (that driver piglit, and a bunch of other tests.)


That said, I believe it would be better long term if piglit framework 
had timeouts on Windows, as it can probably track that with finer 
granularity than we do now by putting a timeout on whole piglit or 
subsets of piglit tests.


python3's subprocess module has timeout options, so it should be 
relatively easy to implement on top of it, in a cross-platform manner.


Jose
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/4] nir: set default lod to texture opcodes that needed it but don't provide it

2017-10-10 Thread Lionel Landwerlin

On 10/10/17 14:35, Samuel Iglesias Gonsálvez wrote:

Signed-off-by: Samuel Iglesias Gonsálvez 
---
  src/compiler/nir/nir_lower_tex.c | 68 
  1 file changed, 68 insertions(+)

diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c
index 65681decb1c..d3380710405 100644
--- a/src/compiler/nir/nir_lower_tex.c
+++ b/src/compiler/nir/nir_lower_tex.c
@@ -717,6 +717,52 @@ linearize_srgb_result(nir_builder *b, nir_tex_instr *tex)
result->parent_instr);
  }
  
+static void

+set_default_lod(nir_builder *b, nir_tex_instr *tex)
+{
+   b->cursor = nir_before_instr(&tex->instr);
+
+   /* We are going to emit the same texture but adding a default LOD.
+*/
+   int num_srcs = tex->num_srcs + 1;
+   nir_tex_instr *new = nir_tex_instr_create(b->shader, num_srcs);
+
+   new->op = tex->op;
+   new->sampler_dim = tex->sampler_dim;
+   new->texture_index = tex->texture_index;
+   new->dest_type = tex->dest_type;
+   new->is_array = tex->is_array;
+   new->is_shadow = tex->is_shadow;
+   new->is_new_style_shadow = tex->is_new_style_shadow;
+   new->sampler_index = tex->sampler_index;
+   new->texture = nir_deref_var_clone(tex->texture, new);
+   new->sampler = nir_deref_var_clone(tex->sampler, new);
+   new->coord_components = tex->coord_components;


There are a couple of fields you're not copying : component & 
texture_array_size

Not 100% sure whether they need to be.


+
+   nir_ssa_dest_init(&new->instr, &new->dest, 4, 32, NULL);
+
+   int src_num = 0;
+   for (int i = 0; i < tex->num_srcs; i++) {
+  nir_src_copy(&new->src[src_num].src, &tex->src[i].src, new);
+  new->src[src_num].src_type = tex->src[i].src_type;
+  src_num++;
+   }
+
+   new->src[src_num].src = nir_src_for_ssa(nir_imm_int(b, 0));
+   new->src[src_num].src_type = nir_tex_src_lod;


I think you could get rid of the src_num variable and just use 
(new->num_srcs - 1) to set the default lod src.



+   src_num++;
+
+   assert(src_num == num_srcs);
+
+   nir_ssa_dest_init(&new->instr, &new->dest,
+ tex->dest.ssa.num_components, 32, NULL);
+   nir_builder_instr_insert(b, &new->instr);
+
+   nir_ssa_def_rewrite_uses(&tex->dest.ssa, nir_src_for_ssa(&new->dest.ssa));
+
+   nir_instr_remove(&tex->instr);
+}
+
  static bool
  nir_lower_tex_block(nir_block *block, nir_builder *b,
  const nir_lower_tex_options *options)
@@ -813,6 +859,28 @@ nir_lower_tex_block(nir_block *block, nir_builder *b,
   progress = true;
   continue;
}
+
+  /* TXF, TXS and TXL require a LOD but not everything we implement using 
those
+   * three opcodes provides one.  Provide a default LOD of 0.
+   */
+  if (tex->op == nir_texop_txf || tex->op == nir_texop_txs ||
+  tex->op == nir_texop_txl || tex->op == nir_texop_query_levels ||
+  (tex->op == nir_texop_tex && b->shader->stage != 
MESA_SHADER_FRAGMENT)) {
+ int i;
+ bool has_lod = false;
+ for (i = 0; i < tex->num_srcs; i++) {
+if (tex->src[i].src_type == nir_tex_src_lod) {
+   has_lod = true;
+   break;
+}
+ }
+
+ if (!has_lod) {
+set_default_lod(b, tex);
+progress = true;
+continue;
+ }
+  }
 }
  
 return progress;



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 2/2] auxiliary: use vl_drm_screen_create method for surfaceless

2017-10-10 Thread Emil Velikov
Hi Suresh,

On 10 October 2017 at 07:20, sguttula  wrote:
> when display type is VA_DISPLAY_DRM_RENDERNODES and if
> drm screen create method is not enabled then vscreen is
> NULL. To fix this issue,this patch will enable
> vl_drm_screen_create() for surfaceless platforms
>
Repeating some earlier questions:

 - why do we need "surfaceless" support
 - does upstream VAAPI has surfaceless platform
 - why is the surfaceless implementation identical to the DRM one

Without any additional information, I think you should be using the
drm platform.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Piglit on windows

2017-10-10 Thread Kyriazis, George
Hello…

Piglit on windows prints out a message saying “Timeout are not implemented on 
Windows.”.  These timeouts are the test timeouts in case a test hangs.

What do people do when running piglit on windows and they hit a timeout?  I 
would imagine there would be a non-zero number of people running piglit on 
windows on a regular basis, as a regression tool...

Thank you!

George

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Testing out Kristian's fast mtx patch

2017-10-10 Thread Emil Velikov
On 10 October 2017 at 13:38, Marek Olšák  wrote:

>
> I think all our mutexes are mtx_plain, but the simple mutexes can't be used
> with cond vars.
>
Sadly we have a few mtx_recursive instances - 3 in mesa/main and 1 in svga.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] mesa: move _mesa_half_is_negative() to half_float.h

2017-10-10 Thread Roland Scheidegger
For the series:
Reviewed-by: Roland Scheidegger 

(I don't have any preference for !! or != albeit I do think doing it
explicit either way makes it more obvious indeed.)

Roland

Am 10.10.2017 um 04:29 schrieb Brian Paul:
> v2: use !! in the function to be explicit about type conversion.  Though,
> gcc generates the same code with or without the logical !!.
> ---
>  src/mesa/main/imports.h | 6 --
>  src/util/half_float.h   | 8 
>  2 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
> index a4964a3..51fa72c 100644
> --- a/src/mesa/main/imports.h
> +++ b/src/mesa/main/imports.h
> @@ -333,12 +333,6 @@ _mesa_bitcount_64(uint64_t n);
>  #endif
>  
>  
> -static inline bool
> -_mesa_half_is_negative(GLhalfARB h)
> -{
> -   return h & 0x8000;
> -}
> -
>  extern int
>  _mesa_snprintf( char *str, size_t size, const char *fmt, ... ) PRINTFLIKE(3, 
> 4);
>  
> diff --git a/src/util/half_float.h b/src/util/half_float.h
> index 64f2042..b3bc3f6 100644
> --- a/src/util/half_float.h
> +++ b/src/util/half_float.h
> @@ -25,6 +25,7 @@
>  #ifndef _HALF_FLOAT_H_
>  #define _HALF_FLOAT_H_
>  
> +#include 
>  #include 
>  
>  #ifdef __cplusplus
> @@ -34,6 +35,13 @@ extern "C" {
>  uint16_t _mesa_float_to_half(float val);
>  float _mesa_half_to_float(uint16_t val);
>  
> +static inline bool
> +_mesa_half_is_negative(uint16_t h)
> +{
> +   return !!(h & 0x8000);
> +}
> +
> +
>  #ifdef __cplusplus
>  } /* extern C */
>  #endif
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 1/4] nir: set default lod to texture opcodes that needed it but don't provide it

2017-10-10 Thread Samuel Iglesias Gonsálvez
Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/compiler/nir/nir_lower_tex.c | 68 
 1 file changed, 68 insertions(+)

diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c
index 65681decb1c..d3380710405 100644
--- a/src/compiler/nir/nir_lower_tex.c
+++ b/src/compiler/nir/nir_lower_tex.c
@@ -717,6 +717,52 @@ linearize_srgb_result(nir_builder *b, nir_tex_instr *tex)
   result->parent_instr);
 }
 
+static void
+set_default_lod(nir_builder *b, nir_tex_instr *tex)
+{
+   b->cursor = nir_before_instr(&tex->instr);
+
+   /* We are going to emit the same texture but adding a default LOD.
+*/
+   int num_srcs = tex->num_srcs + 1;
+   nir_tex_instr *new = nir_tex_instr_create(b->shader, num_srcs);
+
+   new->op = tex->op;
+   new->sampler_dim = tex->sampler_dim;
+   new->texture_index = tex->texture_index;
+   new->dest_type = tex->dest_type;
+   new->is_array = tex->is_array;
+   new->is_shadow = tex->is_shadow;
+   new->is_new_style_shadow = tex->is_new_style_shadow;
+   new->sampler_index = tex->sampler_index;
+   new->texture = nir_deref_var_clone(tex->texture, new);
+   new->sampler = nir_deref_var_clone(tex->sampler, new);
+   new->coord_components = tex->coord_components;
+
+   nir_ssa_dest_init(&new->instr, &new->dest, 4, 32, NULL);
+
+   int src_num = 0;
+   for (int i = 0; i < tex->num_srcs; i++) {
+  nir_src_copy(&new->src[src_num].src, &tex->src[i].src, new);
+  new->src[src_num].src_type = tex->src[i].src_type;
+  src_num++;
+   }
+
+   new->src[src_num].src = nir_src_for_ssa(nir_imm_int(b, 0));
+   new->src[src_num].src_type = nir_tex_src_lod;
+   src_num++;
+
+   assert(src_num == num_srcs);
+
+   nir_ssa_dest_init(&new->instr, &new->dest,
+ tex->dest.ssa.num_components, 32, NULL);
+   nir_builder_instr_insert(b, &new->instr);
+
+   nir_ssa_def_rewrite_uses(&tex->dest.ssa, nir_src_for_ssa(&new->dest.ssa));
+
+   nir_instr_remove(&tex->instr);
+}
+
 static bool
 nir_lower_tex_block(nir_block *block, nir_builder *b,
 const nir_lower_tex_options *options)
@@ -813,6 +859,28 @@ nir_lower_tex_block(nir_block *block, nir_builder *b,
  progress = true;
  continue;
   }
+
+  /* TXF, TXS and TXL require a LOD but not everything we implement using 
those
+   * three opcodes provides one.  Provide a default LOD of 0.
+   */
+  if (tex->op == nir_texop_txf || tex->op == nir_texop_txs ||
+  tex->op == nir_texop_txl || tex->op == nir_texop_query_levels ||
+  (tex->op == nir_texop_tex && b->shader->stage != 
MESA_SHADER_FRAGMENT)) {
+ int i;
+ bool has_lod = false;
+ for (i = 0; i < tex->num_srcs; i++) {
+if (tex->src[i].src_type == nir_tex_src_lod) {
+   has_lod = true;
+   break;
+}
+ }
+
+ if (!has_lod) {
+set_default_lod(b, tex);
+progress = true;
+continue;
+ }
+  }
}
 
return progress;
-- 
2.14.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 2/4] i965/fs: remove setting default LOD in the backend

2017-10-10 Thread Samuel Iglesias Gonsálvez
It is already done in NIR.

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/intel/compiler/brw_fs_nir.cpp | 9 -
 1 file changed, 9 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 5b8ccd50bff..5c2f04ea268 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4518,15 +4518,6 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, 
nir_tex_instr *instr)
   unreachable("unknown texture opcode");
}
 
-   /* TXS and TXL require a LOD but not everything we implement using those
-* two opcodes provides one.  Provide a default LOD of 0.
-*/
-   if ((opcode == SHADER_OPCODE_TXS_LOGICAL ||
-opcode == SHADER_OPCODE_TXL_LOGICAL) &&
-   srcs[TEX_LOGICAL_SRC_LOD].file == BAD_FILE) {
-  srcs[TEX_LOGICAL_SRC_LOD] = brw_imm_ud(0u);
-   }
-
if (instr->op == nir_texop_tg4) {
   if (instr->component == 1 &&
   key_tex->gather_channel_quirk_mask & (1 << texture)) {
-- 
2.14.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 3/4] i965/vec4: remove setting default LOD in the backend

2017-10-10 Thread Samuel Iglesias Gonsálvez
It is already done in NIR.

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/intel/compiler/brw_vec4_nir.cpp |  9 -
 src/intel/compiler/brw_vec4_visitor.cpp | 12 
 2 files changed, 21 deletions(-)

diff --git a/src/intel/compiler/brw_vec4_nir.cpp 
b/src/intel/compiler/brw_vec4_nir.cpp
index 9200ffa0ed7..0a1caa9fad8 100644
--- a/src/intel/compiler/brw_vec4_nir.cpp
+++ b/src/intel/compiler/brw_vec4_nir.cpp
@@ -2228,15 +2228,6 @@ vec4_visitor::nir_emit_texture(nir_tex_instr *instr)
   }
}
 
-   /* TXS and TXL require a LOD but not everything we implement using those
-* two opcodes provides one.  Provide a default LOD of 0.
-*/
-   if ((instr->op == nir_texop_txs ||
-instr->op == nir_texop_txl) &&
-   lod.file == BAD_FILE) {
-  lod = brw_imm_ud(0u);
-   }
-
if (instr->op == nir_texop_txf_ms ||
instr->op == nir_texop_samples_identical) {
   assert(coord_type != NULL);
diff --git a/src/intel/compiler/brw_vec4_visitor.cpp 
b/src/intel/compiler/brw_vec4_visitor.cpp
index 88e80aaa3af..52d1dfc8fdd 100644
--- a/src/intel/compiler/brw_vec4_visitor.cpp
+++ b/src/intel/compiler/brw_vec4_visitor.cpp
@@ -915,18 +915,6 @@ vec4_visitor::emit_texture(ir_texture_opcode op,
src_reg surface_reg,
src_reg sampler_reg)
 {
-   /* The sampler can only meaningfully compute LOD for fragment shader
-* messages. For all other stages, we change the opcode to TXL and hardcode
-* the LOD to 0.
-*
-* textureQueryLevels() is implemented in terms of TXS so we need to pass a
-* valid LOD argument.
-*/
-   if (op == ir_tex || op == ir_query_levels) {
-  assert(lod.file == BAD_FILE);
-  lod = brw_imm_f(0.0f);
-   }
-
enum opcode opcode;
switch (op) {
case ir_tex: opcode = SHADER_OPCODE_TXL; break;
-- 
2.14.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 4/4] spirv: add support for images and samplers as function arguments

2017-10-10 Thread Samuel Iglesias Gonsálvez
Fixes:

dEQP-VK.spirv_assembly.instruction.*.image_sampler.*

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/compiler/spirv/vtn_cfg.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/compiler/spirv/vtn_cfg.c b/src/compiler/spirv/vtn_cfg.c
index 25ff254bcec..8b139068fb0 100644
--- a/src/compiler/spirv/vtn_cfg.c
+++ b/src/compiler/spirv/vtn_cfg.c
@@ -111,6 +111,25 @@ vtn_cfg_handle_prepass_instruction(struct vtn_builder *b, 
SpvOp opcode,
  param->name = ralloc_strdup(param, val->name);
 
  val->pointer = vtn_pointer_for_variable(b, vtn_var, type);
+  } else if (type->base_type == vtn_base_type_image || type->base_type == 
vtn_base_type_sampler) {
+ struct vtn_variable *vtn_var = rzalloc(b, struct vtn_variable);
+ struct vtn_type *ptr_type = rzalloc(b, struct vtn_type);
+ ptr_type->deref = type;
+ ptr_type->base_type = vtn_base_type_pointer;
+
+ vtn_var->type = type;
+ vtn_var->var = param;
+ vtn_var->mode = (type->base_type == vtn_base_type_image) ?
+vtn_variable_mode_image : vtn_variable_mode_sampler;
+ param->interface_type = type->type;
+
+ struct vtn_value *val =
+vtn_push_value(b, w[2], vtn_value_type_pointer);
+
+ /* Name the parameter so it shows up nicely in NIR */
+ param->name = ralloc_strdup(param, val->name);
+
+ val->pointer = vtn_pointer_for_variable(b, vtn_var, ptr_type);
   } else {
  /* We're a regular SSA value. */
  struct vtn_ssa_value *param_ssa =
-- 
2.14.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/6] Prehash all the things

2017-10-10 Thread Thomas Helland
Hi!

Thanks for keeping up with the long wait =)
I revisited this not too long ago, and found that with the new
pointer hashing function the benefits are zero to negative
from this series. I've reduced it to only the instruction set and
the string_to_uint_map patch but it's not convincing.
I suspect we are seeing cache miss vs hashing tradeoffs.
So I've basically put it to rest for now. Might give it another
go sometime, but I think right now the effort is better spent
elsewhere in the codebase.

Greetings,
Thomas

2017-10-09 14:02 GMT+02:00 Dieter Nützel :
> Hello Thomas,
>
> now, that you have write commit 'only' this one is missing.
> Maybe you have time for this.
>
> Latest version do not apply any longer.
>
> Wende an: util: Avoid computing hash twice in string_to_uint_map
> error: src/util/string_to_uint_map.h ist nicht im Index
> Anwendung des Patches fehlgeschlagen bei 0006 util: Avoid computing hash
> twice in string_to_uint_map
>
> Greetings,
> Dieter
>
>
> Am 19.06.2017 18:09, schrieb Dieter Nützel:
>>
>> Ping!
>>
>> Any news, reviews --- anyone?
>>
>> I'm running this all day without a hitch.
>>
>> Cheers,
>> Dieter
>>
>> Am 23.05.2017 05:40, schrieb Dieter Nützel:
>>>
>>> For the series:
>>>
>>> Tested-by: Dieter Nützel 
>>>
>>> on radeonsi/RX580
>>>
>>> Unigine_Heaven-4.0, Unigine_Valley-1.0, Unigine_Superposition-1.0,
>>> LS2015 (Wine-staging), Mesa-demos (objviewer)
>>>
>>> Dieter
>>>
>>> Am 22.05.2017 20:55, schrieb Thomas Helland:

 While this doesn't prehash all the things, it does switch quite a lot
 of places from doing a search and then a subsequent insert to first
 hash the key, and then use this hash when searching / inserting.
 While our new pointer hashing function remedied much of our overhead
 hashing pointers, there is still more to gain here. This cuts executed
 instructions / task-clock by about 0.5% on a shader-db run on my i965
 running machine. While that's not a lot, it is still a nice little
 improvement on the way to less overhead. The changes should also be
 fairly trivial, so it's not much of a burden.

 Thomas Helland (6):
   glsl: Prehash in refcount hash table to reduce hashing
   nir: Prehash in instr_set to avoid hashing twice
   glsl: Prehash in constant propagation
   glsl: Prehash in constant variable pass to avoid hashing twice
   glsl: Prehash to avoid computing the hash twice
   util: Avoid computing hash twice in string_to_uint_map

  src/compiler/glsl/ir_variable_refcount.cpp  | 7 +--
  src/compiler/glsl/opt_constant_propagation.cpp  | 8 +---
  src/compiler/glsl/opt_constant_variable.cpp | 6 --
  src/compiler/glsl/opt_copy_propagation_elements.cpp | 7 +--
  src/compiler/nir/nir_instr_set.c| 7 +--
  src/util/string_to_uint_map.h   | 9 ++---
  6 files changed, 30 insertions(+), 14 deletions(-)
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] st/va: Implement vaExportSurfaceHandle()

2017-10-10 Thread Leo Liu



On 10/07/2017 01:22 PM, Mark Thompson wrote:

On 07/10/17 15:23, Leo Liu wrote:

On 2017-10-01 01:40 PM, Mark Thompson wrote:

This is a new interface in libva2 to support wider use-cases of passing
surfaces to external APIs.  In particular, this allows export of NV12 and
P010 surfaces.

v2: Convert surfaces to progressive before exporting them (Christian).

v3: Set destination rectangle to match source when converting (Leo).
  Add guards to allow building with libva1.

Signed-off-by: Mark Thompson 

This patch is Acked-and-Tested-by: Leo Liu< leo@amd.com>, also pushed.

Thank you!

Can you look at (or direct me to who I should be pointing at) patch 1/2 as well?




  (Needed for the H.265 10-bit interop on the EGL side.)

Right.
The patch is Acked-by: Leo Liu 

I will commit it if there is no objection in a couple of days.

Thanks,
Leo




- Mark


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Testing out Kristian's fast mtx patch

2017-10-10 Thread Marek Olšák
On Oct 10, 2017 4:46 AM, "Timothy Arceri"  wrote:

After a recent discussion about this code from 2015 I was curious
to give it a try. The outstanding review item was that we shouldn't
be replacing the C11 mtx type/functions with our own, so I've renamed
the fast path to simple_mtx* and added a couple of patches to make use
of it.

The idea is this fast mtx can be used in place of the full mtx
implementation when its of type mtx_plain.


Thanks for picking this up!

I think all our mutexes are mtx_plain, but the simple mutexes can't be used
with cond vars.

Marek


I though if anywhere we might see a change in the drawoverhead piglit
test but I didn't see any real change.

Anyway since I've made the updates I thought I'd send it out. Maybe
someone else might find some better results. Kristian reported a 10%
increase in some internal Intel benchmarks, I wonder if thats still the
case.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] radeonsi: lower ffma in nir to mad.

2017-10-10 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Oct 10, 2017 1:56 PM, "Nicolai Hähnle"  wrote:

> Both patches:
>
> Reviewed-by: Nicolai Hähnle 
>
> On 10.10.2017 05:25, Dave Airlie wrote:
>
>> From: Dave Airlie 
>>
>> This lowers ffma to a * b + c.
>>
>> This seems like it should keep Marek happiest, so
>> we'd never get to the fma instruction emission code.
>>
>> Signed-off-by: Dave Airlie 
>> ---
>>   src/gallium/drivers/radeonsi/si_pipe.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.c
>> b/src/gallium/drivers/radeonsi/si_pipe.c
>> index 4384987..b02247d 100644
>> --- a/src/gallium/drivers/radeonsi/si_pipe.c
>> +++ b/src/gallium/drivers/radeonsi/si_pipe.c
>> @@ -794,6 +794,7 @@ static const struct nir_shader_compiler_options
>> nir_options = {
>> .lower_fsat = true,
>> .lower_fdiv = true,
>> .lower_sub = true,
>> +   .lower_ffma = true,
>> .lower_pack_snorm_2x16 = true,
>> .lower_pack_snorm_4x8 = true,
>> .lower_pack_unorm_2x16 = true,
>>
>>
>
> --
> Lerne, wie die Welt wirklich ist,
> Aber vergiss niemals, wie sie sein sollte.
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] ac/surface: add ac_surface::is_displayable

2017-10-10 Thread Nicolai Hähnle

On 10.10.2017 13:11, Daniel Stone wrote:

Hi Marek,

On 9 October 2017 at 18:05, Marek Olšák  wrote:

 surf->is_linear = surf->u.legacy.level[0].mode == 
RADEON_SURF_MODE_LINEAR_ALIGNED;
+   surf->is_displayable = surf->micro_tile_mode == 
RADEON_MICRO_MODE_DISPLAY ||
+  surf->micro_tile_mode == 
RADEON_MICRO_MODE_ROTATED;


Are linear surfaces not displayable on gfx6 parts?


With from Daniel's comment fixed, the series is

Reviewed-by: Nicolai Hähnle 




Cheers,
Daniel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] st/glsl_to_tgsi: fix DFRACEXP with only one destination

2017-10-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Replace the undefined destination by a new temporary register.

Cleanup merge_two_dsts while we're at it.
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 38 ++
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 168719b33c9..235690510b9 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -5199,54 +5199,62 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
return removed;
 }
 
 /* merge DFRACEXP instructions into one. */
 void
 glsl_to_tgsi_visitor::merge_two_dsts(void)
 {
/* We never delete inst, but we may delete its successor. */
foreach_in_list(glsl_to_tgsi_instruction, inst, &this->instructions) {
   glsl_to_tgsi_instruction *inst2;
-  bool merged;
+  unsigned defined;
+
   if (num_inst_dst_regs(inst) != 2)
  continue;
 
   if (inst->dst[0].file != PROGRAM_UNDEFINED &&
   inst->dst[1].file != PROGRAM_UNDEFINED)
  continue;
 
+  assert(inst->dst[0].file != PROGRAM_UNDEFINED ||
+ inst->dst[1].file != PROGRAM_UNDEFINED);
+
+  if (inst->dst[0].file == PROGRAM_UNDEFINED)
+ defined = 1;
+  else
+ defined = 0;
+
   inst2 = (glsl_to_tgsi_instruction *) inst->next;
   do {
-
- if (inst->src[0].file == inst2->src[0].file &&
+ if (inst->op == inst2->op &&
+ inst2->dst[defined].file == PROGRAM_UNDEFINED &&
+ inst->src[0].file == inst2->src[0].file &&
  inst->src[0].index == inst2->src[0].index &&
  inst->src[0].type == inst2->src[0].type &&
  inst->src[0].swizzle == inst2->src[0].swizzle)
 break;
  inst2 = (glsl_to_tgsi_instruction *) inst2->next;
   } while (inst2);
 
-  if (!inst2)
+  if (!inst2) {
+ /* Undefined destinations are not allowed, substitute with an unused
+  * temporary register.
+  */
+ st_src_reg tmp = get_temp(glsl_type::vec4_type);
+ inst->dst[defined ^ 1] = st_dst_reg(tmp);
+ inst->dst[defined ^ 1].writemask = 0;
  continue;
-  merged = false;
-  if (inst->dst[0].file == PROGRAM_UNDEFINED) {
- merged = true;
- inst->dst[0] = inst2->dst[0];
-  } else if (inst->dst[1].file == PROGRAM_UNDEFINED) {
- inst->dst[1] = inst2->dst[1];
- merged = true;
   }
 
-  if (merged) {
- inst2->remove();
- delete inst2;
-  }
+  inst->dst[defined ^ 1] = inst2->dst[defined ^ 1];
+  inst2->remove();
+  delete inst2;
}
 }
 
 /* Merges temporary registers together where possible to reduce the number of
  * registers needed to run a program.
  *
  * Produces optimal code only after copy propagation and dead code elimination
  * have been run. */
 void
 glsl_to_tgsi_visitor::merge_registers(void)
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] st/glsl_to_tgsi: the second destination doesn't support relative addressing

2017-10-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

It's not used -- DFRACEXP gets array indexes of its exponent out-parameter
lowered earlier -- and it wouldn't have worked correctly anyway when both
dst and dst1 use relative addressing.
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 235690510b9..394d39ade63 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -382,42 +382,39 @@ glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, 
unsigned op,
int num_reladdr = 0, i, j;
bool dst_is_64bit[2];
 
op = get_opcode(op, dst, src0, src1);
 
/* If we have to do relative addressing, we want to load the ARL
 * reg directly for one of the regs, and preload the other reladdr
 * sources into temps.
 */
num_reladdr += dst.reladdr != NULL || dst.reladdr2;
-   num_reladdr += dst1.reladdr != NULL || dst1.reladdr2;
+   assert(!dst1.reladdr); /* should be lowered in earlier passes */
num_reladdr += src0.reladdr != NULL || src0.reladdr2 != NULL;
num_reladdr += src1.reladdr != NULL || src1.reladdr2 != NULL;
num_reladdr += src2.reladdr != NULL || src2.reladdr2 != NULL;
num_reladdr += src3.reladdr != NULL || src3.reladdr2 != NULL;
 
reladdr_to_temp(ir, &src3, &num_reladdr);
reladdr_to_temp(ir, &src2, &num_reladdr);
reladdr_to_temp(ir, &src1, &num_reladdr);
reladdr_to_temp(ir, &src0, &num_reladdr);
 
if (dst.reladdr || dst.reladdr2) {
   if (dst.reladdr)
  emit_arl(ir, address_reg, *dst.reladdr);
   if (dst.reladdr2)
  emit_arl(ir, address_reg2, *dst.reladdr2);
   num_reladdr--;
}
-   if (dst1.reladdr) {
-  emit_arl(ir, address_reg, *dst1.reladdr);
-  num_reladdr--;
-   }
+
assert(num_reladdr == 0);
 
/* inst->op has only 8 bits. */
STATIC_ASSERT(TGSI_OPCODE_LAST <= 255);
 
inst->op = op;
inst->precise = this->precise;
inst->info = tgsi_get_opcode_info(op);
inst->dst[0] = dst;
inst->dst[1] = dst1;
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] st/glsl_to_tgsi: fix indirect access to 64-bit integer

2017-10-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Make sure we actually allocate two adjacent TGSI temporaries. The
current code fails e.g. when an arithmetic operation has two
operands with indirect accesses.

I will send out a new piglit test
(arb_gpu_shader_int64/execution/indirect-array-two-accesses.shader_test)

Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index e01268bbbea..168719b33c9 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -1262,21 +1262,21 @@ void
 glsl_to_tgsi_visitor::reladdr_to_temp(ir_instruction *ir,
   st_src_reg *reg, int *num_reladdr)
 {
if (!reg->reladdr && !reg->reladdr2)
   return;
 
if (reg->reladdr) emit_arl(ir, address_reg, *reg->reladdr);
if (reg->reladdr2) emit_arl(ir, address_reg2, *reg->reladdr2);
 
if (*num_reladdr != 1) {
-  st_src_reg temp = get_temp(reg->type == GLSL_TYPE_DOUBLE ? 
glsl_type::dvec4_type : glsl_type::vec4_type);
+  st_src_reg temp = get_temp(glsl_type::get_instance(reg->type, 4, 1));
 
   emit_asm(ir, TGSI_OPCODE_MOV, st_dst_reg(temp), *reg);
   *reg = temp;
}
 
(*num_reladdr)--;
 }
 
 void
 glsl_to_tgsi_visitor::visit(ir_expression *ir)
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 2/2] glsl: fix interpolateAtXxx(some_vec[idx], ...) with dynamic idx

2017-10-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The dynamic index of a vector (not array!) is lowered to a sequence of
conditional assignments. However, the interpolate_at_* expressions
require that the interpolant is an l-value of a shader input.

So instead of doing conditional assignments of parts of the shader input
and then interpolating that (which is nonsensical), we interpolate the
entire shader input and then do conditional assignments of the interpolated
result.
---
 .../glsl/lower_vec_index_to_cond_assign.cpp| 31 +-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/src/compiler/glsl/lower_vec_index_to_cond_assign.cpp 
b/src/compiler/glsl/lower_vec_index_to_cond_assign.cpp
index a26253998e0..89244266602 100644
--- a/src/compiler/glsl/lower_vec_index_to_cond_assign.cpp
+++ b/src/compiler/glsl/lower_vec_index_to_cond_assign.cpp
@@ -121,21 +121,50 @@ 
ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(void *mem_
 
this->progress = true;
return deref(var).val;
 }
 
 ir_rvalue *
 
ir_vec_index_to_cond_assign_visitor::convert_vector_extract_to_cond_assign(ir_rvalue
 *ir)
 {
ir_expression *const expr = ir->as_expression();
 
-   if (expr == NULL || expr->operation != ir_binop_vector_extract)
+   if (expr == NULL)
+  return ir;
+
+   if (expr->operation == ir_unop_interpolate_at_centroid ||
+   expr->operation == ir_binop_interpolate_at_offset ||
+   expr->operation == ir_binop_interpolate_at_sample) {
+  /* Lower interpolateAtXxx(some_vec[idx], ...) to
+   * interpolateAtXxx(some_vec, ...)[idx] before lowering to conditional
+   * assignments, to maintain the rule that the interpolant is an l-value
+   * referring to a (part of a) shader input.
+   *
+   * This is required when idx is dynamic (otherwise it gets lowered to
+   * a swizzle).
+   */
+  ir_expression *const interpolant = expr->operands[0]->as_expression();
+  if (!interpolant || interpolant->operation != ir_binop_vector_extract)
+ return ir;
+
+  ir_rvalue *vec_input = interpolant->operands[0];
+  ir_expression *const vec_interpolate =
+ new(base_ir) ir_expression(expr->operation, vec_input->type,
+vec_input, expr->operands[1]);
+
+  return convert_vec_index_to_cond_assign(ralloc_parent(ir),
+  vec_interpolate,
+  interpolant->operands[1],
+  ir->type);
+   }
+
+   if (expr->operation != ir_binop_vector_extract)
   return ir;
 
return convert_vec_index_to_cond_assign(ralloc_parent(ir),
expr->operands[0],
expr->operands[1],
ir->type);
 }
 
 ir_visitor_status
 ir_vec_index_to_cond_assign_visitor::visit_enter(ir_expression *ir)
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 1/2] glsl: allow any l-value of an input variable as interpolant in interpolateAt*

2017-10-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The intended rule has been clarified in GLSL 4.60, Section 8.13.2
(Interpolation Functions):

   "For all of the interpolation functions, interpolant must be an l-value
from an in declaration; this can include a variable, a block or
structure member, an array element, or some combination of these.
Component selection operators (e.g., .xy) may be used when specifying
interpolant."

For members of interface blocks, var->data.must_be_shader_input must be
determined on-the-fly after lowering interface blocks, since we don't want
to disable varying packing for an entire block just because one input in it
is used in interpolateAt*.

v2: keep setting must_be_shader_input in ast_function (Ian)
v3: follow the relaxed rule of GLSL 4.60
v4: only apply the relaxed rules to desktop GL
(the ES WG decided that the relaxed rules may apply in a future version
 but not retroactively; see also
 
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.*)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101378
Reviewed-by: Ian Romanick  (v1)
---
 src/compiler/glsl/ast_function.cpp | 19 ++-
 src/compiler/glsl/lower_named_interface_blocks.cpp | 18 ++
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/src/compiler/glsl/ast_function.cpp 
b/src/compiler/glsl/ast_function.cpp
index 46a61e46fd5..d1596c272e6 100644
--- a/src/compiler/glsl/ast_function.cpp
+++ b/src/compiler/glsl/ast_function.cpp
@@ -220,33 +220,42 @@ verify_parameter_modes(_mesa_glsl_parse_state *state,
  if (val->ir_type == ir_type_swizzle) {
 if (!state->is_version(440, 0)) {
_mesa_glsl_error(&loc, state,
 "parameter `%s` must not be swizzled",
 formal->name);
return false;
 }
 val = ((ir_swizzle *)val)->val;
  }
 
- while (val->ir_type == ir_type_dereference_array) {
-val = ((ir_dereference_array *)val)->array;
+ for (;;) {
+if (val->ir_type == ir_type_dereference_array) {
+   val = ((ir_dereference_array *)val)->array;
+} else if (val->ir_type == ir_type_dereference_record &&
+   !state->es_shader) {
+   val = ((ir_dereference_record *)val)->record;
+} else
+   break;
  }
 
- if (!val->as_dereference_variable() ||
- val->variable_referenced()->data.mode != ir_var_shader_in) {
+ ir_variable *var = NULL;
+ if (const ir_dereference_variable *deref_var = 
val->as_dereference_variable())
+var = deref_var->variable_referenced();
+
+ if (!var || var->data.mode != ir_var_shader_in) {
 _mesa_glsl_error(&loc, state,
  "parameter `%s` must be a shader input",
  formal->name);
 return false;
  }
 
- val->variable_referenced()->data.must_be_shader_input = 1;
+ var->data.must_be_shader_input = 1;
   }
 
   /* Verify that 'out' and 'inout' actual parameters are lvalues. */
   if (formal->data.mode == ir_var_function_out
   || formal->data.mode == ir_var_function_inout) {
  const char *mode = NULL;
  switch (formal->data.mode) {
  case ir_var_function_out:   mode = "out";   break;
  case ir_var_function_inout: mode = "inout"; break;
  default:assert(false);  break;
diff --git a/src/compiler/glsl/lower_named_interface_blocks.cpp 
b/src/compiler/glsl/lower_named_interface_blocks.cpp
index 064694128bf..136352a131b 100644
--- a/src/compiler/glsl/lower_named_interface_blocks.cpp
+++ b/src/compiler/glsl/lower_named_interface_blocks.cpp
@@ -108,20 +108,21 @@ public:
 
flatten_named_interface_blocks_declarations(void *mem_ctx)
   : mem_ctx(mem_ctx),
 interface_namespace(NULL)
{
}
 
void run(exec_list *instructions);
 
virtual ir_visitor_status visit_leave(ir_assignment *);
+   virtual ir_visitor_status visit_leave(ir_expression *);
virtual void handle_rvalue(ir_rvalue **rvalue);
 };
 
 } /* anonymous namespace */
 
 void
 flatten_named_interface_blocks_declarations::run(exec_list *instructions)
 {
interface_namespace = _mesa_hash_table_create(NULL, _mesa_key_hash_string,
  _mesa_key_string_equal);
@@ -231,20 +232,37 @@ 
flatten_named_interface_blocks_declarations::visit_leave(ir_assignment *ir)
   }
 
   ir_variable *lhs_var =  lhs_rec_tmp->variable_referenced();
   if (lhs_var) {
  lhs_var->data.assigned = 1;
   }
}
return rvalue_visit(ir);
 }
 
+ir_visitor_status
+flatten_named_interface_blocks_declarations::visit_leave(ir_expression *ir)
+{
+   ir_visitor_status status = rvalue_visit(ir);
+
+   if (ir->operati

Re: [Mesa-dev] [PATCH 2/6] mesa/st/glsl_to_tgsi: Correct debug output for indirect access

2017-10-10 Thread Nicolai Hähnle

On 10.10.2017 13:15, Gert Wollny wrote:



+   os << "[";
+   if (reg.reladdr)
+  os << *reg.reladdr << "+";
+   if (reg.reladdr2)
+  os << *reg.reladdr2 << "+";


This isn't how that works. reladdr2 is the relative address for 2D
register accesses; see has_index2 and index2D.


Actually, I don't like this part of the dump at all: it is not known
from the register information I can access here whether an address
which address register index goes where and if an address register is
used at all. - and documentation on the issue is also very sparse. From
the code accessing has_index2 I figured that the dump should look
something like this: ?

template 
void dump_reg_access(std::ostream& os, const st_reg& reg)
{
    os << tgsi_file_names[reg.file];
    if (reg.file == PROGRAM_ARRAY)
   os << "(" << reg.array_id << ")";

    os << "[";
    if (reg.reladdr)
   os << *reg.reladdr << "+";
    os << reg.index << "]";

    if (reg.has_index2) {
   os << "[";
   if (reg.reladdr2)
  os << *reg.reladdr2 << "+";
   os << reg.index2D << "]";
    }


They should be the other way around, actually. index2D/reladdr2 are used 
for determining which constant buffer to use or which input/output 
vertex to reference in geometry/tessellation shaders. Those indices come 
first.


About the fact that reladdr can be present but it's not clear which 
register is actually used for the reladdr: totally agreed that this is 
pretty horrible code.


Perhaps a separate cleanup patch is in order where reladdr and reladdr2 
are appropriately modified when ARL instructions are generated.


Cheers,
Nicolai



}




Besides, you're not actually printing reg.index.

Same for when you're printing the destination.


This I was already about to fix.

Many thanks for you comments (also the other ones).

Best,
Gert





--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Use a 565 format for GL_RGB and GL_UNSIGNED_SHORT_5_6_5 textures.

2017-10-10 Thread Eero Tamminen

Hi,

On 05.10.2017 18:15, Eero Tamminen wrote:

On 05.10.2017 01:16, Eric Anholt wrote:

Kenneth Graunke  writes:


Found while trying to optimize an application.

Not observed to help performance on i965, but should at least reduce
the memory usage of such textures a bit.


I run benchmarks on it on few platforms and didn't see any performance 
changes either.


Tested-by: Eero Tamminen 


After looking at the longer time perf trends, there's actually an impact 
in one test-case, GLB v2.7 Egypt (when using 1/2 HD screen size).  On 
half the platforms, the change is still within variance, but not on all 
platforms, and ezBench bisected this commit.


It has visible:
- Positive impact on SNB GT2
- Negative impact on BYT, BSW, BXT, SKL GT2, KBL GT2

Impact is most visible on (non-TDP limited) SKL GT2, about 4% drop.

Because of the nature of the test (e.g. easily impacted by power 
management because it's partly CPU bound), I don't think this enough of 
a reason to revert the change, results might change again when textures 
get render buffer compression.




Reviewed-by: Eric Anholt 



- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] etnaviv: Do GC3000 resolve-in-place when possible

2017-10-10 Thread Lucas Stach
Am Samstag, den 30.09.2017, 10:11 +0200 schrieb Wladimir J. van der Laan:
> If an RS blit is done with source exactly the same as destination, and
> the hardware supports this, do an in-place resolve. This only fills in
> tiles that have not been rendered to using information from the TS.
> 
> This is the same as the blob does and potentially saves significant
> bandwidth when doing i.MX6qp scanout using PRE, and when rendering to
> textures (though here using sampler TS would be even better).
> 
> Signed-off-by: Wladimir J. van der Laan 

Reviewed-by: Lucas Stach 

> ---
>  src/gallium/drivers/etnaviv/etnaviv_clear_blit.c |  1 +
>  src/gallium/drivers/etnaviv/etnaviv_emit.c   |  9 -
>  src/gallium/drivers/etnaviv/etnaviv_rs.c | 17 ++---
>  src/gallium/drivers/etnaviv/etnaviv_rs.h |  2 ++
>  4 files changed, 25 insertions(+), 4 deletions(-)
> 
> This depends (for updating the rnndb headers) on Lucas Stach's patch
> "etnaviv: update HW headers and fix provoking vertex".
> 
> diff --git a/src/gallium/drivers/etnaviv/etnaviv_clear_blit.c 
> b/src/gallium/drivers/etnaviv/etnaviv_clear_blit.c
> index c85ada9..c62287b 100644
> --- a/src/gallium/drivers/etnaviv/etnaviv_clear_blit.c
> +++ b/src/gallium/drivers/etnaviv/etnaviv_clear_blit.c
> @@ -591,6 +591,7 @@ etna_try_rs_blit(struct pipe_context *pctx,
>    .source = src->bo,
>    .source_offset = src_offset,
>    .source_stride = src_lev->stride,
> +  .source_padded_width = src_lev->padded_width,
>    .source_padded_height = src_lev->padded_height,
>    .dest_format = translate_rs_format(dst_format),
>    .dest_tiling = dst->layout,
> diff --git a/src/gallium/drivers/etnaviv/etnaviv_emit.c 
> b/src/gallium/drivers/etnaviv/etnaviv_emit.c
> index c2117d5..707b1e7 100644
> --- a/src/gallium/drivers/etnaviv/etnaviv_emit.c
> +++ b/src/gallium/drivers/etnaviv/etnaviv_emit.c
> @@ -173,7 +173,14 @@ etna_submit_rs_state(struct etna_context *ctx,
>  
> ctx->stats.rs_operations++;
>  
> -   if (screen->specs.pixel_pipes == 1) {
> +   if (cs->RS_KICKER_INPLACE) {
> +  etna_cmd_stream_reserve(stream, 6);
> +  etna_coalesce_start(stream, &coalesce);
> +  /* 0/1 */ EMIT_STATE(RS_EXTRA_CONFIG, cs->RS_EXTRA_CONFIG);
> +  /* 2/3 */ EMIT_STATE(RS_SOURCE_STRIDE, cs->RS_SOURCE_STRIDE);
> +  /* 4/5 */ EMIT_STATE(RS_KICKER_INPLACE, cs->RS_KICKER_INPLACE);
> +  etna_coalesce_end(stream, &coalesce);
> +   } else if (screen->specs.pixel_pipes == 1) {
>    etna_cmd_stream_reserve(stream, 22);
>    etna_coalesce_start(stream, &coalesce);
>    /* 0/1 */ EMIT_STATE(RS_CONFIG, cs->RS_CONFIG);
> diff --git a/src/gallium/drivers/etnaviv/etnaviv_rs.c 
> b/src/gallium/drivers/etnaviv/etnaviv_rs.c
> index 5c108a6..c9072c2 100644
> --- a/src/gallium/drivers/etnaviv/etnaviv_rs.c
> +++ b/src/gallium/drivers/etnaviv/etnaviv_rs.c
> @@ -118,10 +118,21 @@ etna_compile_rs_state(struct etna_context *ctx, struct 
> compiled_rs_state *cs,
> cs->RS_FILL_VALUE[3] = rs->clear_value[3];
> cs->RS_EXTRA_CONFIG = VIVS_RS_EXTRA_CONFIG_AA(rs->aa) |
>   VIVS_RS_EXTRA_CONFIG_ENDIAN(rs->endian_mode);
> -   /* TODO: cs->RS_UNK016B0 = s->size / 64 ?
> -* The blob does this consistently but there seems to be no currently 
> supported
> -* model that needs it.
> +
> +   /* If source the same as destination, and the hardware supports this,
> +* do an in-place resolve to fill in unrendered tiles.
>  */
> +   if (ctx->specs.single_buffer && rs->source == rs->dest &&
> + rs->source_offset == rs->dest_offset &&
> + rs->source_format == rs->dest_format &&
> + rs->source_tiling == rs->dest_tiling &&
> + rs->source_stride == rs->dest_stride &&
> + !rs->downsample_x && !rs->downsample_y &&
> + !rs->swap_rb && !rs->flip &&
> + !rs->clear_mode && rs->source_padded_width) {
> +  /* Total number of tiles (same as for autodisable) */
> +  cs->RS_KICKER_INPLACE = rs->source_padded_width * 
> rs->source_padded_height / 16;
> +   }
>  }
>  
>  void
> diff --git a/src/gallium/drivers/etnaviv/etnaviv_rs.h 
> b/src/gallium/drivers/etnaviv/etnaviv_rs.h
> index ec5b659..171d3fa 100644
> --- a/src/gallium/drivers/etnaviv/etnaviv_rs.h
> +++ b/src/gallium/drivers/etnaviv/etnaviv_rs.h
> @@ -43,6 +43,7 @@ struct rs_state {
> struct etna_bo *source;
> uint32_t source_offset;
> uint32_t source_stride;
> +   uint32_t source_padded_width; /* total padded width (only needed for 
> source) */
> uint32_t source_padded_height; /* total padded height */
> struct etna_bo *dest;
> uint32_t dest_offset;
> @@ -69,6 +70,7 @@ struct compiled_rs_state {
> uint32_t RS_FILL_VALUE[4];
> uint32_t RS_EXTRA_CONFIG;
> uint32_t RS_PIPE_OFFSET[2];
> +   uint32_t RS_KICKER_INPLACE; /* Set if source is destination */
>  
> struct etna_reloc source[2];
> struct etna_reloc dest[2];
__

Re: [Mesa-dev] [PATCH 2/2] radeonsi: lower ffma in nir to mad.

2017-10-10 Thread Nicolai Hähnle

Both patches:

Reviewed-by: Nicolai Hähnle 

On 10.10.2017 05:25, Dave Airlie wrote:

From: Dave Airlie 

This lowers ffma to a * b + c.

This seems like it should keep Marek happiest, so
we'd never get to the fma instruction emission code.

Signed-off-by: Dave Airlie 
---
  src/gallium/drivers/radeonsi/si_pipe.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 4384987..b02247d 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -794,6 +794,7 @@ static const struct nir_shader_compiler_options nir_options 
= {
.lower_fsat = true,
.lower_fdiv = true,
.lower_sub = true,
+   .lower_ffma = true,
.lower_pack_snorm_2x16 = true,
.lower_pack_snorm_4x8 = true,
.lower_pack_unorm_2x16 = true,




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] r600: cleanup llvm ir target selection.

2017-10-10 Thread Nicolai Hähnle

Both patches:

Reviewed-by: Nicolai Hähnle 

On 09.10.2017 22:28, Dave Airlie wrote:

From: Dave Airlie 

Only r600 target used now for compute IR.

Signed-off-by: Dave Airlie 
---
  src/gallium/drivers/r600/r600_pipe_common.c | 20 ++--
  1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe_common.c 
b/src/gallium/drivers/r600/r600_pipe_common.c
index 066d10a..acad670 100644
--- a/src/gallium/drivers/r600/r600_pipe_common.c
+++ b/src/gallium/drivers/r600/r600_pipe_common.c
@@ -1012,24 +1012,8 @@ static int r600_get_compute_param(struct pipe_screen 
*screen,
switch (param) {
case PIPE_COMPUTE_CAP_IR_TARGET: {
const char *gpu;
-   const char *triple;
-   if (rscreen->family <= CHIP_ARUBA) {
-   triple = "r600--";
-   } else {
-   if (HAVE_LLVM < 0x0400) {
-   triple = "amdgcn--";
-   } else {
-   triple = "amdgcn-mesa-mesa3d";
-   }
-   }
-   switch(rscreen->family) {
-   /* Clang < 3.6 is missing Hainan in its list of
-* GPUs, so we need to use the name of a similar GPU.
-*/
-   default:
-   gpu = r600_get_llvm_processor_name(rscreen->family);
-   break;
-   }
+   const char *triple = "r600--";
+   gpu = r600_get_llvm_processor_name(rscreen->family);
if (ret) {
sprintf(ret, "%s-%s", gpu, triple);
}




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Fix src0 vs src1 typo

2017-10-10 Thread Eero Tamminen

Hi,

On 03.10.2017 08:20, Matt Turner wrote:

A typo caused us to copy src0's reg file to src1 rather than reading
src1's as intended. This caused us to fail to compact instructions like

mov(8)   g4<1>D0D  { align1 1Q };

because src1 was set to immediate rather than architecture file. Fixing
this reenables compaction (after the precompact() pass changes the data
types):

mov(8)   g4<1>UD   0xUD{ align1 1Q compacted };

Fixes: 1cb0a7941b27 ("i965: Switch to using the logical register types")


FYI: the original commit regressed SynMark v7 CSDof test performance by 
1-2% on GEN9+, and this fixes that performance regression.



- Eero


---
  src/intel/compiler/brw_eu_compact.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_eu_compact.c 
b/src/intel/compiler/brw_eu_compact.c
index 7674aa8b85..7b32270957 100644
--- a/src/intel/compiler/brw_eu_compact.c
+++ b/src/intel/compiler/brw_eu_compact.c
@@ -998,7 +998,7 @@ precompact(const struct gen_device_info *devinfo, brw_inst 
inst)
   (brw_inst_src0_type(devinfo, &inst) == BRW_REGISTER_TYPE_DF ||
brw_inst_src0_type(devinfo, &inst) == BRW_REGISTER_TYPE_UQ ||
brw_inst_src0_type(devinfo, &inst) == BRW_REGISTER_TYPE_Q))) {
-  enum brw_reg_file file = brw_inst_src0_reg_file(devinfo, &inst);
+  enum brw_reg_file file = brw_inst_src1_reg_file(devinfo, &inst);
brw_inst_set_src1_file_type(devinfo, &inst, file, BRW_REGISTER_TYPE_UD);
 }
  



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 1/3] mesa: Add new fast mtx_t mutex type for basic use cases

2017-10-10 Thread Tapani Pälli

ok another one too ...

On 10/10/2017 01:03 PM, Tapani Pälli wrote:

one error found below ..

On 10/10/2017 05:45 AM, Timothy Arceri wrote:

While modern pthread mutexes are very fast, they still incur a call to an
external DSO and overhead of the generality and features of pthread 
mutexes.
Most mutexes in mesa only needs lock/unlock, and the idea here is that 
we can

inline the atomic operation and make the fast case just two intructions.
Mutexes are subtle and finicky to implement, so we carefully copy the
implementation from Ulrich Dreppers well-written and well-reviewed paper:

   "Futexes Are Tricky"
   http://www.akkadia.org/drepper/futex.pdf

We implement "mutex3", which gives us a mutex that has no syscalls on
uncontended lock or unlock.  Further, the uncontended case boils down 
to a
cmpxchg and an untaken branch and the uncontended unlock is just a 
locked decr
and an untaken branch.  We use __builtin_expect() to indicate that 
contention

is unlikely so that gcc will put the contention code out of the main code
flow.

A fast mutex only supports lock/unlock, can't be recursive or used with
condition variables.  We keep the pthread mutex implementation around as
full_mtx_t for the few places where we use condition variables or 
recursive
locking.  For platforms or compilers where futex and atomics aren't 
available,

mtx_t falls back to the pthread mutex.

The pthread mutex lock/unlock overhead shows up on benchmarks for CPU 
bound

applications.  Most CPU bound cases are helped and some of our internal
bind_buffer_object heavy benchmarks gain up to 10%.

Signed-off-by: Kristian Høgsberg 
Signed-off-by: Timothy Arceri 
---
  configure.ac  |   3 ++
  src/util/simple_mtx.h | 146 
++

  2 files changed, 149 insertions(+)
  create mode 100644 src/util/simple_mtx.h

diff --git a/configure.ac b/configure.ac
index 903a3979d4..47fe427d19 100644
--- a/configure.ac
+++ b/configure.ac
@@ -887,20 +887,23 @@ linux* | cygwin* | darwin* | solaris* | *-gnu* | 
gnu*)

  ;;
  * )
  pthread_stubs_possible="yes"
  ;;
  esac
  if test "x$pthread_stubs_possible" = xyes; then
  PKG_CHECK_MODULES(PTHREADSTUBS, pthread-stubs >= 0.4)
  fi
+dnl Check for futex for fast inline simple_mtx_t.
+AC_CHECK_HEADER([linux/futex.h], [DEFINES="$DEFINES -DHAVE_FUTEX"])
+
  dnl SELinux awareness.
  AC_ARG_ENABLE([selinux],
  [AS_HELP_STRING([--enable-selinux],
  [Build SELinux-aware Mesa @<:@default=disabled@:>@])],
  [MESA_SELINUX="$enableval"],
  [MESA_SELINUX=no])
  if test "x$enable_selinux" = "xyes"; then
  PKG_CHECK_MODULES([SELINUX], [libselinux], [],
  [AC_CHECK_HEADER([selinux/selinux.h],[],
   [AC_MSG_ERROR([SELinux headers not found])])
diff --git a/src/util/simple_mtx.h b/src/util/simple_mtx.h
new file mode 100644
index 00..bd365ac237
--- /dev/null
+++ b/src/util/simple_mtx.h
@@ -0,0 +1,146 @@
+/*
+ * Copyright © 2015 Intel
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,

+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including 
the next
+ * paragraph) shall be included in all copies or substantial portions 
of the

+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT 
SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES 
OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 
OTHER DEALINGS

+ * IN THE SOFTWARE.
+ */
+
+#if defined(__GNUC__) && defined(HAVE_FUTEX)
+
+/* mtx_t - Fast, simple mutex
+ *
+ * While modern pthread mutexes are very fast (implemented using 
futex), they
+ * still incur a call to an external DSO and overhead of the 
generality and
+ * features of pthread mutexes.  Most mutexes in mesa only needs 
lock/unlock,
+ * and the idea here is that we can inline the atomic operation and 
make the

+ * fast case just two intructions.  Mutexes are subtle and finicky to
+ * implement, so we carefully copy the implementation from Ulrich 
Dreppers

+ * well-written and well-reviewed paper:
+ *
+ *   "Futexes Are Tricky"
+ *   http://www.akkadia.org/drepper/futex.pdf
+ *
+ * We implement "mutex3", which gives us a mutex that has no syscalls on
+ * uncontended lock or unlock.  Further, the uncontended case boils 
down to a
+ * locked cmpxchg and 

Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Chema Casanova
With this patch applied I can not reproduce anymore the regression
related to cross-channel variable interference in non-uniformly executed
loops exposed at
dEQP-VK.glsl.return.return_in_dynamic_loop_dynamic_vertex when applying
Curro's liveness patch

Tested-by: Jose Maria Casanova Crespo 

On 05/10/17 20:52, Jason Ekstrand wrote:
> No Shader-db changes.
> 
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/compiler/brw_fs_live_variables.cpp | 55 
> 
>  1 file changed, 55 insertions(+)
> 
> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
> b/src/intel/compiler/brw_fs_live_variables.cpp
> index c449672..380060d 100644
> --- a/src/intel/compiler/brw_fs_live_variables.cpp
> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>   }
>}
> }
> +
> +   /* Due to the explicit way the SIMD data is handled on GEN, we need to be 
> a
> +* bit more careful with live ranges and loops.  Consider the following
> +* example:
> +*
> +*vec4 color2;
> +*while (1) {
> +*   vec4 color = texture();
> +*   if (...) {
> +*  color2 = color * 2;
> +*  break;
> +*   }
> +*}
> +*gl_FragColor = color2;
> +*
> +* In this case, the definition of color2 dominates the use because the
> +* loop only has the one exit.  This means that the live range interval 
> for
> +* color2 goes from the statement in the if to it's use below the loop.
> +* Now suppose that the texture operation has a header register that gets
> +* assigned one of the registers used for color2.  If the loop condition 
> is
> +* non-uniform and some of the threads will take the and others will
> +* continue.  In this case, the next pass through the loop, the WE_all
> +* setup of the header register will stomp the disabled channels of color2
> +* and corrupt the value.
> +*
> +* This same problem can occur if you have a mix of 64, 32, and 16-bit
> +* registers because the channels do not line up or if you have a SIMD16
> +* program and the first half of one value overlaps the second half of the
> +* other.
> +*
> +* To solve this problem, we take any VGRFs whose live ranges cross the
> +* while instruction of a loop and extend their live ranges to the top of
> +* the loop.  This more accurately models the hardware because the value 
> in
> +* the VGRF needs to be carried through subsequent loop iterations in 
> order
> +* to remain valid when we finally do break.
> +*/
> +   foreach_block (block, cfg) {
> +  if (block->end()->opcode != BRW_OPCODE_WHILE)
> + continue;
> +
> +  /* This is a WHILE instrution. Find the DO block. */
> +  bblock_t *do_block = NULL;
> +  foreach_list_typed(bblock_link, child_link, link, &block->children) {
> + if (child_link->block->start_ip < block->end_ip) {
> +assert(do_block == NULL);
> +do_block = child_link->block;
> + }
> +  }
> +  assert(do_block);
> +
> +  for (int i = 0; i < num_vars; i++) {
> + if (start[i] < block->end_ip && end[i] > block->end_ip)
> +start[i] = MIN2(start[i], do_block->start_ip);
> +  }
> +   }
>  }
>  
>  fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg)
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] Gallium+RadeonSI: implement __DRIimageExtension::validateUsage properly

2017-10-10 Thread Daniel Stone
Hi Marek,

On 9 October 2017 at 18:05, Marek Olšák  wrote:
> This implements __DRIimageExtension::validateUsage properly.
>
> It also modifies Addrlib and adds a new pipe_screen function.

Thanks for looking into this! I don't know the AMD bits well enough to
say much about them, but patches 3 and 4 are:
Reviewed-by: Daniel Stone 

Cheers,
Daniel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] mesa/st/glsl_to_tgsi: Correct debug output for indirect access

2017-10-10 Thread Gert Wollny

> > +   os << "[";
> > +   if (reg.reladdr)
> > +  os << *reg.reladdr << "+";
> > +   if (reg.reladdr2)
> > +  os << *reg.reladdr2 << "+";
> 
> This isn't how that works. reladdr2 is the relative address for 2D 
> register accesses; see has_index2 and index2D.

Actually, I don't like this part of the dump at all: it is not known
from the register information I can access here whether an address 
which address register index goes where and if an address register is
used at all. - and documentation on the issue is also very sparse. From
the code accessing has_index2 I figured that the dump should look
something like this: ? 

template 
void dump_reg_access(std::ostream& os, const st_reg& reg)
{
   os << tgsi_file_names[reg.file];
   if (reg.file == PROGRAM_ARRAY)
  os << "(" << reg.array_id << ")";

   os << "[";
   if (reg.reladdr)
  os << *reg.reladdr << "+";
   os << reg.index << "]";

   if (reg.has_index2) {
  os << "[";
  if (reg.reladdr2)
 os << *reg.reladdr2 << "+";
  os << reg.index2D << "]";
   }
}


> 
> Besides, you're not actually printing reg.index.
> 
> Same for when you're printing the destination.

This I was already about to fix. 

Many thanks for you comments (also the other ones). 

Best, 
Gert


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] ac/surface: add ac_surface::is_displayable

2017-10-10 Thread Daniel Stone
Hi Marek,

On 9 October 2017 at 18:05, Marek Olšák  wrote:
> surf->is_linear = surf->u.legacy.level[0].mode == 
> RADEON_SURF_MODE_LINEAR_ALIGNED;
> +   surf->is_displayable = surf->micro_tile_mode == 
> RADEON_MICRO_MODE_DISPLAY ||
> +  surf->micro_tile_mode == 
> RADEON_MICRO_MODE_ROTATED;

Are linear surfaces not displayable on gfx6 parts?

Cheers,
Daniel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


  1   2   >