date:20170710

[Mesa-dev] [Bug 101334] Any vulkan app seems to freeze the system

2017-07-10 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=101334

--- Comment #23 from John  ---
Some apps worked, others froze the system.

I'm still hopeful to find a fix here :)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 101334] Any vulkan app seems to freeze the system

2017-07-10 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=101334

--- Comment #22 from Marko  ---
(In reply to John from comment #21)
> I believe that's a same generation card, so it would make sense to behave
> similarly.

Yeah. Did it ever work for you? RADV was a no-go on my card from day one.


[Mesa-dev] [PATCH 2/2] i965: Fix asynchronous mappings on !LLC platforms.

2017-07-10 Thread Kenneth Graunke

When using a read-only CPU mapping, we may encounter stale buffer
contents.  For example, the Piglit primitive-restart test offers the
following scenario:

   1. Read data via a CPU map.
   2. Destroy that buffer.
   3. Create a new buffer - obtaining the same one via the BO cache.
   4. Call BufferSubData, which does a GTT map with MAP_WRITE | MAP_ASYNC.
      (We avoid set_domain for async mappings, so no flushing occurs.)
   5. Read data via a CPU map.
      (Without explicit clflushing, this will contain data from step 1!)

Otherwise, everything ought to work, keeping in mind that we never use
CPU maps for writing - just read-only CPU maps.
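Step 5 is where the stale read comes from: on a !LLC part, a CPU read can hit cachelines left over from step 1, so the patch invalidates the mapping's cachelines with gen_invalidate_range() before returning it. The sketch below only illustrates that idea and is not Mesa's gen_clflush.h. The helper names, the 64-byte line size, and the use of the SSE2 intrinsics _mm_clflush()/_mm_mfence() are all assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_clflush(), _mm_mfence() */
#define SKETCH_HAVE_CLFLUSH 1
#endif

/* Assumption: 64-byte cachelines, as on current Intel CPUs. */
#define SKETCH_CACHELINE_SIZE 64u

/* Number of cachelines overlapped by [start, start + size). */
static size_t
cachelines_in_range(const void *start, size_t size)
{
   if (size == 0)
      return 0;
   const uintptr_t mask = ~(uintptr_t) (SKETCH_CACHELINE_SIZE - 1);
   uintptr_t first = (uintptr_t) start & mask;
   uintptr_t last = ((uintptr_t) start + size - 1) & mask;
   return (size_t) ((last - first) / SKETCH_CACHELINE_SIZE) + 1;
}

/* Evict every cacheline covering the range so that later CPU reads go
 * to memory instead of hitting stale lines.  Returns the number of lines
 * flushed.  Without SSE2 (e.g. non-x86) this sketch only counts them. */
static size_t
invalidate_range_sketch(const void *start, size_t size)
{
   size_t n = cachelines_in_range(start, size);
#ifdef SKETCH_HAVE_CLFLUSH
   const char *line = (const char *) ((uintptr_t) start &
                                      ~(uintptr_t) (SKETCH_CACHELINE_SIZE - 1));
   for (size_t i = 0; i < n; i++)
      _mm_clflush(line + i * SKETCH_CACHELINE_SIZE);
   /* clflush is ordered by mfence: fence before any subsequent reads. */
   _mm_mfence();
#endif
   return n;
}
```

clflush writes back dirty lines before evicting them, so the buffer contents are preserved. Only the cached copies are dropped.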

This restores the performance gains after Matt's revert in commit
71651b3139c501f50e6547c21a1cdb816b0a9dde.

v2: Do the invalidate later, and even when asking for a brand new map.
---
 src/mesa/drivers/dri/i965/brw_bufmgr.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c
index 30e4b28b9e0..af6524b8100 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
@@ -56,6 +56,7 @@
 #ifndef ETIME
 #define ETIME ETIMEDOUT
 #endif
+#include "common/gen_clflush.h"
 #include "common/gen_debug.h"
 #include "common/gen_device_info.h"
 #include "libdrm_macros.h"
@@ -703,11 +704,24 @@ brw_bo_map_cpu(struct brw_context *brw, struct brw_bo *bo, unsigned flags)
        bo->map_cpu);
    print_flags(flags);
 
-   if (!(flags & MAP_ASYNC) || !bufmgr->has_llc) {
+   if (!(flags & MAP_ASYNC)) {
       set_domain(brw, "CPU mapping", bo, I915_GEM_DOMAIN_CPU,
                  flags & MAP_WRITE ? I915_GEM_DOMAIN_CPU : 0);
    }
 
+   if (!bo->cache_coherent) {
+      /* If we're reusing an existing CPU mapping, the CPU caches may
+       * contain stale data from the last time we read from that mapping.
+       * (With the BO cache, it might even be data from a previous buffer!)
+       * Even if it's a brand new mapping, the kernel may have zeroed the
+       * buffer via CPU writes.
+       *
+       * We need to invalidate those cachelines so that we see the latest
+       * contents.
+       */
+      gen_invalidate_range(bo->map_cpu, bo->size);
+   }
+
    return bo->map_cpu;
 }
 
@@ -754,7 +768,7 @@ brw_bo_map_gtt(struct brw_context *brw, struct brw_bo *bo, unsigned flags)
    DBG("bo_map_gtt: %d (%s) -> %p, ", bo->gem_handle, bo->name, bo->map_gtt);
    print_flags(flags);
 
-   if (!(flags & MAP_ASYNC) || !bufmgr->has_llc) {
+   if (!(flags & MAP_ASYNC)) {
       set_domain(brw, "GTT mapping", bo,
                  I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
    }
-- 
2.13.2
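To make the behavior change easier to see, here is a hedged model of the decisions the patched brw_bo_map_cpu() makes, next to the old set_domain condition. The struct, the function names and the flag values are hypothetical. Only the two conditions are taken from the diff.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch only; the real MAP_* bits
 * are defined in brw_bufmgr.h. */
#define SKETCH_MAP_READ  (1u << 0)
#define SKETCH_MAP_WRITE (1u << 1)
#define SKETCH_MAP_ASYNC (1u << 2)

struct cpu_map_plan {
   bool set_domain; /* call set_domain(): sync with the GPU, move to CPU domain */
   bool invalidate; /* clflush the mapping's cachelines before returning it */
};

/* Conditions from the patched brw_bo_map_cpu(). */
static struct cpu_map_plan
plan_cpu_map(unsigned flags, bool cache_coherent)
{
   struct cpu_map_plan plan;
   plan.set_domain = !(flags & SKETCH_MAP_ASYNC); /* no longer forced on !LLC */
   plan.invalidate = !cache_coherent;             /* new in this patch */
   return plan;
}

/* The pre-patch set_domain condition, for comparison. */
static bool
old_needs_set_domain(unsigned flags, bool has_llc)
{
   return !(flags & SKETCH_MAP_ASYNC) || !has_llc;
}
```

On a !LLC machine (has_llc false), an async read map used to go through set_domain every time. After the patch it skips set_domain and relies on the explicit invalidation instead.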


[Mesa-dev] [PATCH 1/2] i965: Don't use PREAD for glGetBufferSubData().

2017-07-10 Thread Kenneth Graunke

Just map the buffer and memcpy.  This will do a CPU mmap, which should
be reasonably efficient, and doing this gives us full control over the
domains and caching instead of leaving it to the kernel.

This prevents regressions on Braswell in the next commit, specifically
GL45-CTS.shader_atomic_counters.basic-buffer-operations.  With async
maps skipping set-domain, the pread believed everything was still in
the CPU domain and returned stale data.

v2: Use _mesa_error_no_memory() if the map fails instead of crashing.
---
 src/mesa/drivers/dri/i965/brw_bufmgr.c   | 24 
 src/mesa/drivers/dri/i965/brw_bufmgr.h   |  3 ---
 src/mesa/drivers/dri/i965/intel_buffer_objects.c | 11 ++-
 3 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c
index 11251f15edc..30e4b28b9e0 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
@@ -820,30 +820,6 @@ brw_bo_subdata(struct brw_bo *bo, uint64_t offset,
return ret;
 }
 
-int
-brw_bo_get_subdata(struct brw_bo *bo, uint64_t offset,
-                   uint64_t size, void *data)
-{
-   struct brw_bufmgr *bufmgr = bo->bufmgr;
-   struct drm_i915_gem_pread pread;
-   int ret;
-
-   memclear(pread);
-   pread.handle = bo->gem_handle;
-   pread.offset = offset;
-   pread.size = size;
-   pread.data_ptr = (uint64_t) (uintptr_t) data;
-   ret = drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_PREAD, &pread);
-   if (ret != 0) {
-      ret = -errno;
-      DBG("%s:%d: Error reading data from buffer %d: "
-          "(%"PRIu64" %"PRIu64") %s .\n",
-          __FILE__, __LINE__, bo->gem_handle, offset, size, strerror(errno));
-   }
-
-   return ret;
-}
-
 /** Waits for all GPU rendering with the object to have completed. */
 void
 brw_bo_wait_rendering(struct brw_bo *bo)
diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h
index d388e5ad150..01a540f5315 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.h
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h
@@ -222,9 +222,6 @@ static inline int brw_bo_unmap(struct brw_bo *bo) { return 0; }
 /** Write data into an object. */
 int brw_bo_subdata(struct brw_bo *bo, uint64_t offset,
uint64_t size, const void *data);
-/** Read data from an object. */
-int brw_bo_get_subdata(struct brw_bo *bo, uint64_t offset,
-   uint64_t size, void *data);
 /**
  * Waits for rendering to an object by the GPU to have completed.
  *
diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
index a9ac29a6a81..85cc1a694bf 100644
--- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
+++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
@@ -289,7 +289,16 @@ brw_get_buffer_subdata(struct gl_context *ctx,
    if (brw_batch_references(&brw->batch, intel_obj->buffer)) {
       intel_batchbuffer_flush(brw);
    }
-   brw_bo_get_subdata(intel_obj->buffer, offset, size, data);
+
+   void *map = brw_bo_map(brw, intel_obj->buffer, MAP_READ);
+
+   if (unlikely(!map)) {
+      _mesa_error_no_memory(__func__);
+      return;
+   }
+
+   memcpy(data, map + offset, size);
+   brw_bo_unmap(intel_obj->buffer);
 
    mark_buffer_inactive(intel_obj);
 }
-- 
2.13.2


[Mesa-dev] XCOM: Enemy Unknown vs. NaN texture unit LOD bias

2017-07-10 Thread Kenneth Graunke

Hello,

Mesa master has been hitting assert failures when running "XCOM: Enemy
Unknown" since commit f8d69beed49c64f883bb8ffb28d4960306baf575, where we
started asserting that the SAMPLER_STATE LOD Bias value actually fits in
the correct number of bits.

Apparently, XCOM calls

   glTexEnv(GL_TEXTURE_FILTER_CONTROL_EXT, GL_TEXTURE_LOD_BIAS_EXT, val);

to set the texture unit LOD bias...but according to gdb, the value is:

   -nan(0x73)

In i965, we do CLAMP(lod bias, -16, 15)...but NaN ends up failing both
the < min and > max comparisons, so it slips through.  But, that raises
the question...what value *should* we be using?  0?  Min?  Max?

I couldn't find any immediately applicable GL spec text.  Anyone know of
any?  If not, does DirectX mandate something?

I wrote a hack to check isnan and replace it with 0, which gets the game
working again, but...it seems like we could have this problem in a lot of
other places too...and I'm not sure what the right answer is.

https://cgit.freedesktop.org/~kwg/mesa/commit/?h=xcom&id=6a1c0515b760c943eb547cced754b465aa3bd4ca
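The NaN slip-through is easy to reproduce outside the driver. This sketch uses a hypothetical two-comparison clamp shaped like the one described above, the isnan-to-0 workaround from the hack, and an alternative ordering that maps NaN to the minimum. Which value is actually correct is the open question in this mail, so 0 and MIN here are illustrative choices, not spec-mandated behavior.

```c
#include <math.h>

/* Two ordered comparisons.  Every ordered comparison involving NaN is
 * false, so a NaN input fails both tests and is returned unchanged. */
static float
clamp_naive(float x, float min, float max)
{
   return x < min ? min : (x > max ? max : x);
}

/* Hypothetical workaround mirroring the hack: replace NaN with 0 before
 * clamping the LOD bias to [-16, 15]. */
static float
clamp_lod_bias_sketch(float bias)
{
   if (isnan(bias))
      bias = 0.0f;
   return clamp_naive(bias, -16.0f, 15.0f);
}

/* Alternative ordering: test "x > min" first.  NaN fails that test,
 * so it becomes min instead of passing through. */
static float
clamp_nan_to_min(float x, float min, float max)
{
   return x > min ? (x > max ? max : x) : min;
}
```

The reordered clamp avoids an explicit isnan check at each call site, which may matter given the concern that the same problem could show up in many other places.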

Thanks for any advice :)

--Ken


Re: [Mesa-dev] [PATCH] Android: Fix vc4 build since XML changes.

2017-07-10 Thread Rob Herring

On Wed, Jul 5, 2017 at 1:19 PM, Eric Anholt  wrote:
> For the automake build, -Isrc/ is implied from the gallium cflags, while
> Android gallium driver builds don't get that by default.  I think it'll be
> better for vc4 to have broadcom includes appear as "#include
> " to make it more clear where to look in the
> tree than "#include " does.
> ---
>
> Rob: The patch *had* changed from what I submitted -- I replaced
> "intel" with "broadcom".  I wonder if maybe when you tested, you'd
> just dropped it, like this patch does.

Sorry, just getting around to testing this.

> I still wish we had a public docker image we could build Android Mesa
> from.  I looked at your scripts, but without access to an image, I'm
> not looking to build one from scratch.

I'll bug John Stultz about that. I know he is using docker for his build env.

>
>  src/broadcom/Android.genxml.mk | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/broadcom/Android.genxml.mk b/src/broadcom/Android.genxml.mk
> index df44b2ec0b79..dfd00a93fbcf 100644
> --- a/src/broadcom/Android.genxml.mk
> +++ b/src/broadcom/Android.genxml.mk
> @@ -51,7 +51,7 @@ $(intermediates)/cle/v3d_packet_v21_pack.h: $(LOCAL_PATH)/cle/v3d_packet_v21.xml
> $(call header-gen)
>
>  LOCAL_EXPORT_C_INCLUDE_DIRS := \
> -   $(MESA_TOP)/src/broadcom \
> +   $(MESA_TOP)/src \

This doesn't fix things because we build out of tree. Here are the fixes
I need. I haven't checked how closely this aligns with the automake files,
but ideally we'd keep things similar to some extent (e.g. same include
paths). I'll make a proper patch if you are fine with these changes.

diff --git a/src/broadcom/Android.genxml.mk b/src/broadcom/Android.genxml.mk
index dfd00a93fbcf..a504326135c5 100644
--- a/src/broadcom/Android.genxml.mk
+++ b/src/broadcom/Android.genxml.mk
@@ -37,7 +37,7 @@ $(intermediates)/dummy.c:
$(hide) touch $@

 # This is the list of auto-generated files headers
-LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/, $(BROADCOM_GENXML_GENERATED_FILES))
+LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/broadcom/, $(BROADCOM_GENXML_GENERATED_FILES))

 define header-gen
@mkdir -p $(dir $@)
@@ -45,13 +45,13 @@ define header-gen
$(hide) $(PRIVATE_SCRIPT) $(PRIVATE_SCRIPT_FLAGS) $(PRIVATE_XML) > $@
 endef

-$(intermediates)/cle/v3d_packet_v21_pack.h: PRIVATE_SCRIPT := $(MESA_PYTHON2) $(LOCAL_PATH)/cle/gen_pack_header.py
-$(intermediates)/cle/v3d_packet_v21_pack.h: PRIVATE_XML := $(LOCAL_PATH)/cle/v3d_packet_v21.xml
-$(intermediates)/cle/v3d_packet_v21_pack.h: $(LOCAL_PATH)/cle/v3d_packet_v21.xml $(LOCAL_PATH)/cle/gen_pack_header.py
+$(intermediates)/broadcom/cle/v3d_packet_v21_pack.h: PRIVATE_SCRIPT := $(MESA_PYTHON2) $(LOCAL_PATH)/cle/gen_pack_header.py
+$(intermediates)/broadcom/cle/v3d_packet_v21_pack.h: PRIVATE_XML := $(LOCAL_PATH)/cle/v3d_packet_v21.xml
+$(intermediates)/broadcom/cle/v3d_packet_v21_pack.h: $(LOCAL_PATH)/cle/v3d_packet_v21.xml $(LOCAL_PATH)/cle/gen_pack_header.py
$(call header-gen)

 LOCAL_EXPORT_C_INCLUDE_DIRS := \
-   $(MESA_TOP)/src \
+   $(MESA_TOP)/src/broadcom/cle \
$(intermediates)

 include $(MESA_COMMON_MK)
diff --git a/src/gallium/drivers/vc4/vc4_cl_dump.c b/src/gallium/drivers/vc4/vc4_cl_dump.c
index cbe35b0208e7..b14cf387d1e4 100644
--- a/src/gallium/drivers/vc4/vc4_cl_dump.c
+++ b/src/gallium/drivers/vc4/vc4_cl_dump.c
@@ -25,7 +25,7 @@
 #include "util/u_prim.h"
 #include "util/macros.h"
 #include "vc4_cl_dump.h"
-#include "vc4_packet.h"
+#include "kernel/vc4_packet.h"

 #define __gen_user_data void
 #define __gen_address_type uint32_t

Re: [Mesa-dev] [EGL android: accquire fence implementation] i965: Queue the buffer with a sync fence for Android OS

2017-07-10 Thread Wu, Zhongmin


Add Gao, Shuo
-Original Message-
From: Wu, Zhongmin 
Sent: Tuesday, July 11, 2017 10:03 
To: 'Emil Velikov' ; Marathe, Yogesh 

Cc: Widawsky, Benjamin ; Liu, Zhiquan 
; 'Eric Engestrom' ; 'Rob Clark' 
; 'Tomasz Figa' ; 'Kenneth 
Graunke' ; Kondapally, Kalyan 
; 'ML mesa-dev' ; 
'Timothy Arceri' ; 'Chuanbo Weng' 

Subject: RE: [Mesa-dev] [EGL android: accquire fence implementation] i965: 
Queue the buffer with a sync fence for Android OS

By the way, 

Regarding cancelBuffer: sorry, I forgot about that function, and thanks for
pointing it out. It should also pass the same fence fd as queueBuffer.

And Yogesh, you mentioned gallium. Is that another platform supported by
Mesa? Sorry, I'm not familiar with it; could you please help check this?

I think we can work with the Mesa team on an acceptable fix that meets
Android's requirements without breaking other platforms.

-Original Message-
From: Wu, Zhongmin
Sent: Tuesday, July 11, 2017 8:40
To: 'Emil Velikov' ; Marathe, Yogesh 

Cc: Widawsky, Benjamin ; Liu, Zhiquan 
; Eric Engestrom ; Rob Clark 
; Tomasz Figa ; Kenneth Graunke 
; Kondapally, Kalyan ; ML 
mesa-dev ; Timothy Arceri 
; Chuanbo Weng 
Subject: RE: [Mesa-dev] [EGL android: accquire fence implementation] i965: 
Queue the buffer with a sync fence for Android OS

Hi Emil and Yogesh,
Thank you for your comments, and thanks, Yogesh, for the detailed
explanations.


According to the Android documentation below
(https://source.android.com/devices/graphics/arch-bq-gralloc):

Recent Android devices support the sync framework, which enables the system to 
do nifty things when combined with hardware components that can manipulate 
graphics data asynchronously. For example, a producer can submit a series of 
OpenGL ES drawing commands and then enqueue the output buffer before rendering 
completes. The buffer is accompanied by a fence that signals when the contents 
are ready.


I think this is very clear: the question is whether rendering has already
completed when we call queueBuffer() in Mesa. If not, we should queue the
buffer with a fence that will be signaled when the buffer is ready.
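The "fence that signals when the contents are ready" can be sketched from the consumer side. On Linux, a sync_file fence fd becomes readable (POLLIN) once it signals, and a fence fd of -1 passed with a queued buffer means there is nothing to wait for. The helper name and its error conventions are assumptions for illustration only.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h> /* pipe(), for local testing with a stand-in fence */

/* Wait until a fence fd signals.  A negative fd means "no fence": the
 * buffer contents are already complete.  Returns 0 when signaled,
 * -ETIMEDOUT on timeout, or a negative errno value on error. */
static int
wait_fence_fd_sketch(int fence_fd, int timeout_ms)
{
   if (fence_fd < 0)
      return 0;

   struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
   for (;;) {
      int ret = poll(&pfd, 1, timeout_ms);
      if (ret > 0)
         return (pfd.revents & (POLLERR | POLLNVAL)) ? -EINVAL : 0;
      if (ret == 0)
         return -ETIMEDOUT;
      /* Interrupted: retry (this sketch restarts the full timeout). */
      if (errno != EINTR && errno != EAGAIN)
         return -errno;
   }
}
```

For a quick local test, the read end of a pipe(2) can stand in for a fence: it reports POLLIN once a byte has been written to the other end.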



-Original Message-
From: Emil Velikov [mailto:emil.l.veli...@gmail.com]
Sent: Tuesday, July 11, 2017 1:18
To: Marathe, Yogesh 
Cc: Wu, Zhongmin ; Widawsky, Benjamin 
; Liu, Zhiquan ; Eric 
Engestrom ; Rob Clark ; Tomasz 
Figa ; Kenneth Graunke ; Kondapally, 
Kalyan ; ML mesa-dev 
; Timothy Arceri ; 
Chuanbo Weng 
Subject: Re: [Mesa-dev] [EGL android: accquire fence implementation] i965: 
Queue the buffer with a sync fence for Android OS

On 10 July 2017 at 15:26, Marathe, Yogesh  wrote:
> Hello Emil, My two cents since I too spent some time on this.
>
>> -Original Message-
>> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On 
>> Behalf Of Emil Velikov
>> Sent: Monday, July 10, 2017 4:41 PM
>> To: Wu, Zhongmin 
>> Cc: Widawsky, Benjamin ; Liu, Zhiquan 
>> ; Eric Engestrom ; Rob 
>> Clark ; Tomasz Figa ; 
>> Kenneth Graunke ; Kondapally, Kalyan 
>> ; ML mesa-dev > d...@lists.freedesktop.org>; Timothy Arceri ; 
>> Chuanbo Weng 
>> Subject: Re: [Mesa-dev] [EGL android: accquire fence implementation] i965:
>> Queue the buffer with a sync fence for Android OS
>>
>> Hi Zhongmin Wu,
>>
>> Above all, a bit of a disclaimer: I'm by no means an expert on the 
>> topic so take the following with a pinch of salt.
>>
>> On 10 July 2017 at 03:11, Zhongmin Wu  wrote:
>> > Before we queued the buffer with a invalid fence (-1), it will make 
>> > some benchmarks failed to test such as flatland.
>> >
>> > Now we get the out fence during the flushing buffer and then pass 
>> > it to SurfaceFlinger in eglSwapbuffer function.
>> >
>> Having a closer look it seems that the issue can be summarised as follows:
>>  - flatland intercepts/interacts ANativeWindow::{de,}queueBuffer (how about
>>    ANativeWindow::cancelBuffer?)
>>  - the program expects that a sync fd is available for both dequeue and queue
>>
>> At the same time:
>>  - the ANativeWindow documentation does _not_ state such requirement
>>  - even if it did, that will be somewhat wrong, since
>>    ANativeWindow::queueBuffer is called by eglSwapBuffers(), where the
>>    latter documentation clearly states - "... performs an implicit flush ...
>>    glFlush ... vgFlush"
>>
>> My take is that if flatland/Android framework does want an explicit 
>> sync point it should insert one with the EGL API.
>> There could be alternative solutions, but the proposed patch seems 
>> wrong IMHO.
>
> In fact, I could work this around in producer (Surface::queueBuffer)
> by ignoring the (-1) passed and by creating a sync using egl APIs. I see
> two problems with that.
>
> - Before getting a fd using eglDupNativeFenceFDANDROID(), you need a
>   glFlush(), this costs additional cycles for each queueBuffer transaction
>   on each BufferItem and I believe fd is also signaled due to this. (so I
>   don’t know what we'll get

[Mesa-dev] [PATCH v2 0/2] swr: drastically reduce compiled size

2017-07-10 Thread Tim Rowley

These two patches allow us to change how we build and link the swr
driver; details are in the second patch commit message.

Change in disk space:

libGL.so        6464 Kb ->  7000 Kb
libswrAVX.so   10068 Kb ->  5432 Kb
libswrAVX2.so   9828 Kb ->  5200 Kb

Total          26360 Kb -> 17632 Kb
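Working through the table (plain arithmetic on the quoted sizes; reading the libGL.so growth as the driver now being linked in once is an interpretation of the second patch's description, not a measurement):

$$
\begin{aligned}
\text{total saved} &= 26360 - 17632 = 8728\ \text{Kb} \approx 33.1\%\ \text{of}\ 26360\ \text{Kb} \\
\text{libGL.so growth} &= 7000 - 6464 = 536\ \text{Kb} \\
\text{libswrAVX.so saved} &= 10068 - 5432 = 4636\ \text{Kb} \approx 46.0\% \\
\text{libswrAVX2.so saved} &= 9828 - 5200 = 4628\ \text{Kb} \approx 47.1\%
\end{aligned}
$$

As a check, 4636 + 4628 - 536 = 8728 Kb, matching the total.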

v2:
  add scons changes for the new swr build setup
  reduce rules/flags and add comments to Makefile.am

Tim Rowley (2):
  swr: switch to using SwrGetInterface api table
  swr: build driver proper separate from rasterizer

 src/gallium/drivers/swr/Makefile.am | 31 ++
 src/gallium/drivers/swr/SConscript  | 26 +--
 src/gallium/drivers/swr/swr_clear.cpp   |  6 ++---
 src/gallium/drivers/swr/swr_context.cpp | 19 --
 src/gallium/drivers/swr/swr_context.h   |  5 +++-
 src/gallium/drivers/swr/swr_draw.cpp| 46 -
 src/gallium/drivers/swr/swr_fence.cpp   |  2 +-
 src/gallium/drivers/swr/swr_loader.cpp  | 14 +-
 src/gallium/drivers/swr/swr_memory.h|  6 ++---
 src/gallium/drivers/swr/swr_query.cpp   |  8 +++---
 src/gallium/drivers/swr/swr_scratch.cpp |  2 +-
 src/gallium/drivers/swr/swr_screen.cpp  |  3 ++-
 src/gallium/drivers/swr/swr_screen.h|  2 ++
 src/gallium/drivers/swr/swr_state.cpp   | 40 ++--
 14 files changed, 107 insertions(+), 103 deletions(-)

-- 
2.7.4


[Mesa-dev] [PATCH v2 1/2] swr: switch to using SwrGetInterface api table

2017-07-10 Thread Tim Rowley

Use the SWR rasterizer API through the table returned from
SwrGetInterface rather than referencing the functions directly.
This will allow us to move to a model of having the driver dynamically
load the appropriate swr architecture library.
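Routing every call through ctx->api means the driver depends only on a table of function pointers, which can be filled from whichever architecture library ends up loaded. Below is a minimal C sketch of that pattern. The table, its entries and get_interface_sketch() are hypothetical and heavily trimmed; the real SWR_INTERFACE is C++, is filled by SwrGetInterface(), and has many more entries.

```c
#include <stdlib.h>

typedef struct raster_context { int arch; } raster_context;

/* Hypothetical, trimmed-down analogue of an API table such as SWR_INTERFACE. */
typedef struct {
   raster_context *(*pfnCreateContext)(void);
   int (*pfnGetArch)(const raster_context *ctx);
   void (*pfnDestroyContext)(raster_context *ctx);
} raster_interface;

static raster_context *
create_with_arch(int arch)
{
   raster_context *ctx = malloc(sizeof(*ctx));
   if (ctx)
      ctx->arch = arch;
   return ctx;
}

/* Two "architecture builds" of the same entry point. */
static raster_context *create_avx(void)  { return create_with_arch(1); }
static raster_context *create_avx2(void) { return create_with_arch(2); }

static int get_arch(const raster_context *ctx) { return ctx->arch; }
static void destroy_context(raster_context *ctx) { free(ctx); }

/* Stand-in for the getter each architecture library would provide; in
 * the model described above, the driver would obtain it from the
 * dynamically loaded library rather than calling it directly. */
static void
get_interface_sketch(int use_avx2, raster_interface *api)
{
   api->pfnCreateContext = use_avx2 ? create_avx2 : create_avx;
   api->pfnGetArch = get_arch;
   api->pfnDestroyContext = destroy_context;
}
```

Driver code written against raster_interface never names create_avx() or create_avx2(), so it can be built once and paired with either implementation.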
---
 src/gallium/drivers/swr/swr_clear.cpp   |  6 ++---
 src/gallium/drivers/swr/swr_context.cpp | 19 --
 src/gallium/drivers/swr/swr_context.h   |  5 +++-
 src/gallium/drivers/swr/swr_draw.cpp| 46 -
 src/gallium/drivers/swr/swr_fence.cpp   |  2 +-
 src/gallium/drivers/swr/swr_memory.h|  6 ++---
 src/gallium/drivers/swr/swr_query.cpp   |  8 +++---
 src/gallium/drivers/swr/swr_scratch.cpp |  2 +-
 src/gallium/drivers/swr/swr_screen.cpp  |  3 ++-
 src/gallium/drivers/swr/swr_state.cpp   | 40 ++--
 10 files changed, 72 insertions(+), 65 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_clear.cpp b/src/gallium/drivers/swr/swr_clear.cpp
index 3a35805..233432e 100644
--- a/src/gallium/drivers/swr/swr_clear.cpp
+++ b/src/gallium/drivers/swr/swr_clear.cpp
@@ -78,9 +78,9 @@ swr_clear(struct pipe_context *pipe,
 
for (unsigned i = 0; i < layers; ++i) {
   swr_update_draw_context(ctx);
-  SwrClearRenderTarget(ctx->swrContext, clearMask, i,
-   color->f, depth, stencil,
-   clear_rect);
+  ctx->api.pfnSwrClearRenderTarget(ctx->swrContext, clearMask, i,
+   color->f, depth, stencil,
+   clear_rect);
 
   // Mask out the attachments that are out of layers.
   if (fb->zsbuf &&
diff --git a/src/gallium/drivers/swr/swr_context.cpp b/src/gallium/drivers/swr/swr_context.cpp
index f2d971a..9648278 100644
--- a/src/gallium/drivers/swr/swr_context.cpp
+++ b/src/gallium/drivers/swr/swr_context.cpp
@@ -311,8 +311,8 @@ swr_blit(struct pipe_context *pipe, const struct pipe_blit_info *blit_info)
}
 
if (ctx->active_queries) {
-  SwrEnableStatsFE(ctx->swrContext, FALSE);
-  SwrEnableStatsBE(ctx->swrContext, FALSE);
+  ctx->api.pfnSwrEnableStatsFE(ctx->swrContext, FALSE);
+  ctx->api.pfnSwrEnableStatsBE(ctx->swrContext, FALSE);
}
 
util_blitter_save_vertex_buffer_slot(ctx->blitter, ctx->vertex_buffer);
@@ -349,8 +349,8 @@ swr_blit(struct pipe_context *pipe, const struct pipe_blit_info *blit_info)
util_blitter_blit(ctx->blitter, &info);
 
if (ctx->active_queries) {
-  SwrEnableStatsFE(ctx->swrContext, TRUE);
-  SwrEnableStatsBE(ctx->swrContext, TRUE);
+  ctx->api.pfnSwrEnableStatsFE(ctx->swrContext, TRUE);
+  ctx->api.pfnSwrEnableStatsBE(ctx->swrContext, TRUE);
}
 }
 
@@ -383,10 +383,10 @@ swr_destroy(struct pipe_context *pipe)
 
/* Idle core after destroying buffer resources, but before deleting
 * context.  Destroying resources has potentially called StoreTiles.*/
-   SwrWaitForIdle(ctx->swrContext);
+   ctx->api.pfnSwrWaitForIdle(ctx->swrContext);
 
if (ctx->swrContext)
-  SwrDestroyContext(ctx->swrContext);
+  ctx->api.pfnSwrDestroyContext(ctx->swrContext);
 
delete ctx->blendJIT;
 
@@ -467,6 +467,9 @@ swr_create_context(struct pipe_screen *p_screen, void *priv, unsigned flags)
   AlignedMalloc(sizeof(struct swr_context), KNOB_SIMD_BYTES);
memset(ctx, 0, sizeof(struct swr_context));
 
+   SwrGetInterface(ctx->api);
+   ctx->swrDC.pAPI = &ctx->api;
+
ctx->blendJIT =
   new std::unordered_map;
 
@@ -478,9 +481,9 @@ swr_create_context(struct pipe_screen *p_screen, void *priv, unsigned flags)
createInfo.pfnClearTile = swr_StoreHotTileClear;
createInfo.pfnUpdateStats = swr_UpdateStats;
createInfo.pfnUpdateStatsFE = swr_UpdateStatsFE;
-   ctx->swrContext = SwrCreateContext(&createInfo);
+   ctx->swrContext = ctx->api.pfnSwrCreateContext(&createInfo);
 
-   SwrInit();
+   ctx->api.pfnSwrInit();
 
if (ctx->swrContext == NULL)
   goto fail;
diff --git a/src/gallium/drivers/swr/swr_context.h b/src/gallium/drivers/swr/swr_context.h
index 3ff4bf3..753cbf3 100644
--- a/src/gallium/drivers/swr/swr_context.h
+++ b/src/gallium/drivers/swr/swr_context.h
@@ -102,6 +102,7 @@ struct swr_draw_context {
 
SWR_SURFACE_STATE renderTargets[SWR_NUM_ATTACHMENTS];
struct swr_query_result *pStats; // @llvm_struct
+   SWR_INTERFACE *pAPI; // @llvm_struct - Needed for the swr_memory callbacks
 };
 
 /* gen_llvm_types FINI */
@@ -169,6 +170,8 @@ struct swr_context {
struct swr_draw_context swrDC;
 
unsigned dirty; /**< Mask of SWR_NEW_x flags */
+
+   SWR_INTERFACE api;
 };
 
 static INLINE struct swr_context *
@@ -182,7 +185,7 @@ swr_update_draw_context(struct swr_context *ctx,
   struct swr_query_result *pqr = nullptr)
 {
swr_draw_context *pDC =
-  (swr_draw_context *)SwrGetPrivateContextState(ctx->swrContext);
+  (swr_draw_context *)ctx->api.pfnSwrGetPrivateContextState(ctx->swrContext);
if (pqr)
   ctx->swrDC.pStats = pqr;
memcpy(pDC

[Mesa-dev] [PATCH v2 2/2] swr: build driver proper separate from rasterizer

2017-07-10 Thread Tim Rowley

swr used to build and link the rasterizer to the driver, and to support
multiple architectures we needed to have multiple versions of the
driver/rasterizer combination, which needed to link in much of mesa.

Switching to a single instance of the driver and building only
architecture-specific versions of the rasterizer gives a large reduction
in disk space.

libGL.so        6464 Kb ->  7000 Kb
libswrAVX.so   10068 Kb ->  5432 Kb
libswrAVX2.so   9828 Kb ->  5200 Kb

Total          26360 Kb -> 17632 Kb
---
 src/gallium/drivers/swr/Makefile.am | 31 ++-
 src/gallium/drivers/swr/SConscript  | 26 +-
 src/gallium/drivers/swr/swr_context.cpp |  2 +-
 src/gallium/drivers/swr/swr_loader.cpp  | 14 ++
 src/gallium/drivers/swr/swr_screen.h|  2 ++
 5 files changed, 36 insertions(+), 39 deletions(-)

diff --git a/src/gallium/drivers/swr/Makefile.am b/src/gallium/drivers/swr/Makefile.am
index 4b4bd37..7461228 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -26,7 +26,14 @@ AM_CXXFLAGS = $(GALLIUM_DRIVER_CFLAGS) $(SWR_CXX11_CXXFLAGS)
 
 noinst_LTLIBRARIES = libmesaswr.la
 
-libmesaswr_la_SOURCES = $(LOADER_SOURCES)
+# gen_knobs.* included here to provide driver access to swr configuration
+libmesaswr_la_SOURCES = \
+   $(CXX_SOURCES) \
+   $(COMMON_CXX_SOURCES) \
+   $(JITTER_CXX_SOURCES) \
+   rasterizer/codegen/gen_knobs.cpp \
+   rasterizer/codegen/gen_knobs.h \
+   $(LOADER_SOURCES)
 
 COMMON_CXXFLAGS = \
-fno-strict-aliasing \
@@ -43,12 +50,15 @@ COMMON_CXXFLAGS = \
-I$(srcdir)/rasterizer/jitter \
-I$(srcdir)/rasterizer/archrast
 
+# SWR_AVX_CXXFLAGS needed for intrinsic usage in swr api headers
+libmesaswr_la_CXXFLAGS = \
+   $(SWR_AVX_CXXFLAGS) \
+   $(COMMON_CXXFLAGS)
+
 COMMON_SOURCES = \
-   $(CXX_SOURCES) \
$(ARCHRAST_CXX_SOURCES) \
$(COMMON_CXX_SOURCES) \
$(CORE_CXX_SOURCES) \
-   $(JITTER_CXX_SOURCES) \
$(MEMORY_CXX_SOURCES) \
$(BUILT_SOURCES)
 
@@ -207,19 +217,12 @@ rasterizer.intermediate: rasterizer/codegen/gen_backends.py rasterizer/codegen/t
--cpp \
--hpp
 
-COMMON_LIBADD = \
-   $(top_builddir)/src/gallium/auxiliary/libgallium.la \
-   $(top_builddir)/src/mesa/libmesagallium.la \
-   $(LLVM_LIBS)
-
 COMMON_LDFLAGS = \
-shared \
-module \
-no-undefined \
$(GC_SECTIONS) \
-   $(NO_UNDEFINED) \
-   $(LLVM_LDFLAGS)
-
+   $(NO_UNDEFINED)
 
 lib_LTLIBRARIES = libswrAVX.la libswrAVX2.la
 
@@ -231,9 +234,6 @@ libswrAVX_la_CXXFLAGS = \
 libswrAVX_la_SOURCES = \
$(COMMON_SOURCES)
 
-libswrAVX_la_LIBADD = \
-   $(COMMON_LIBADD)
-
 libswrAVX_la_LDFLAGS = \
$(COMMON_LDFLAGS)
 
@@ -245,9 +245,6 @@ libswrAVX2_la_CXXFLAGS = \
 libswrAVX2_la_SOURCES = \
$(COMMON_SOURCES)
 
-libswrAVX2_la_LIBADD = \
-   $(COMMON_LIBADD)
-
 libswrAVX2_la_LDFLAGS = \
$(COMMON_LDFLAGS)
 
diff --git a/src/gallium/drivers/swr/SConscript b/src/gallium/drivers/swr/SConscript
index 512269a..cdfb91a 100644
--- a/src/gallium/drivers/swr/SConscript
+++ b/src/gallium/drivers/swr/SConscript
@@ -30,12 +30,6 @@ else:
 llvm_includedir = env.backtick('%s --includedir' % llvm_config).rstrip()
 print "llvm include dir %s" % llvm_includedir
 
-# the loader is included in the mesa lib itself
-# All the remaining files are in loadable modules
-loadersource = env.ParseSourceList('Makefile.sources', [
-'LOADER_SOURCES'
-])
-
 if not env['msvc'] :
 env.Append(CCFLAGS = [
 '-std=c++11',
@@ -191,16 +185,12 @@ built_sources += [backendPixelRateFiles, genRasterizerFiles]
 
 source = built_sources
 source += env.ParseSourceList(swrroot + 'Makefile.sources', [
-'CXX_SOURCES',
 'ARCHRAST_CXX_SOURCES',
 'COMMON_CXX_SOURCES',
 'CORE_CXX_SOURCES',
-'JITTER_CXX_SOURCES',
 'MEMORY_CXX_SOURCES'
 ])
 
-env.Prepend(LIBS = [ mesautil, mesa, gallium ])
-
 env.Prepend(CPPPATH = [
 '.',
 'rasterizer',
@@ -242,14 +232,24 @@ swrAVX2 = envavx2.SharedLibrary(
 )
 env.Alias('swrAVX2', swrAVX2)
 
+source = env.ParseSourceList(swrroot + 'Makefile.sources', [
+'CXX_SOURCES',
+'COMMON_CXX_SOURCES',
+'JITTER_CXX_SOURCES',
+'LOADER_SOURCES'
+])
+source += [
+'rasterizer/codegen/gen_knobs.cpp',
+'rasterizer/archrast/gen_ar_event.cpp',
+]
 
 # main SWR lib
-swr = env.ConvenienceLibrary(
+envSWR = envavx.Clone() # pick up the arch flag for intrinsic usage
+swr = envSWR.ConvenienceLibrary(
 target = 'swr',
-source = loadersource,
+source = source,
 )
 
-
 # treat arch libs as dependencies, even though they are not linked
 # into swr, so we don't have to build them separately
 Depends(swr, ['swrAVX', 'swrAVX2'])
diff --git a/src/gallium/drivers/swr/swr_context.cpp b/src/gallium/drivers/swr/swr_context.cpp
index 9648278..c058870 100644
--- a/src

[Mesa-dev] [PATCH] radv: allow clear merging for depth/stencil with no care stencil

2017-07-10 Thread Dave Airlie

From: Dave Airlie 

Some of the Sascha Willems demos pick a D32/S8 format for the depth
buffer, then do a LOAD_OP_CLEAR/LOAD_OP_DONT_CARE on it, which means
we don't get to merge the undefined->depth and clear htile transitions.

This adds the stencil aspect to the pending clears if there is a depth
clear pending and the stencil aspect is don't care.

Signed-off-by: Dave Airlie 
---
 src/amd/vulkan/radv_cmd_buffer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index dd83fd0..d0271c8 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -1853,6 +1853,9 @@ radv_cmd_state_setup_attachments(struct radv_cmd_buffer *cmd_buffer,
 		if ((att_aspects & VK_IMAGE_ASPECT_DEPTH_BIT) &&
 		    att->load_op == VK_ATTACHMENT_LOAD_OP_CLEAR) {
 			clear_aspects |= VK_IMAGE_ASPECT_DEPTH_BIT;
+			if (att_aspects & VK_IMAGE_ASPECT_STENCIL_BIT &&
+			    att->stencil_load_op == VK_ATTACHMENT_LOAD_OP_DONT_CARE)
+				clear_aspects |= VK_IMAGE_ASPECT_STENCIL_BIT;
 		}
 		if ((att_aspects & VK_IMAGE_ASPECT_STENCIL_BIT) &&
 		    att->stencil_load_op == VK_ATTACHMENT_LOAD_OP_CLEAR) {
-- 
2.9.4
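The aspect logic in the hunk can be modeled in isolation. This sketch covers only the depth/stencil part of the function. The function name is hypothetical, and the constants mirror the values in the Vulkan headers so that it compiles without them.

```c
/* Values mirrored from the Vulkan headers so this compiles standalone. */
enum {
   ASPECT_DEPTH   = 0x2, /* VK_IMAGE_ASPECT_DEPTH_BIT */
   ASPECT_STENCIL = 0x4, /* VK_IMAGE_ASPECT_STENCIL_BIT */
};
enum {
   LOAD_OP_LOAD      = 0, /* VK_ATTACHMENT_LOAD_OP_LOAD */
   LOAD_OP_CLEAR     = 1, /* VK_ATTACHMENT_LOAD_OP_CLEAR */
   LOAD_OP_DONT_CARE = 2, /* VK_ATTACHMENT_LOAD_OP_DONT_CARE */
};

/* Which depth/stencil aspects become pending clears for an attachment. */
static unsigned
pending_ds_clear_aspects(unsigned att_aspects, int load_op, int stencil_load_op)
{
   unsigned clear_aspects = 0;

   if ((att_aspects & ASPECT_DEPTH) && load_op == LOAD_OP_CLEAR) {
      clear_aspects |= ASPECT_DEPTH;
      /* The patch's addition: stencil contents are don't-care, so the
       * stencil aspect is cleared along with depth, turning a D32/S8
       * CLEAR + DONT_CARE attachment into a full depth+stencil clear. */
      if ((att_aspects & ASPECT_STENCIL) && stencil_load_op == LOAD_OP_DONT_CARE)
         clear_aspects |= ASPECT_STENCIL;
   }
   if ((att_aspects & ASPECT_STENCIL) && stencil_load_op == LOAD_OP_CLEAR)
      clear_aspects |= ASPECT_STENCIL;

   return clear_aspects;
}
```

Because stencil contents are undefined under LOAD_OP_DONT_CARE, clearing them as well does not change what the application may observe.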


Re: [Mesa-dev] [EGL android: accquire fence implementation] i965: Queue the buffer with a sync fence for Android OS

2017-07-10 Thread Wu, Zhongmin

By the way, 

For cancelBuffer, sorry I forget such function, thanks for notice. It should 
also pass the same fence fd as the queuebuffer.

And Yogesh, you mentioned the gallium,   is it another platform supported by 
mesa ?  I am sorry I have no idea about this,  could you please help to check 
this ?

I think we can co-work with mesa team to work out an acceptable fix which can 
meet the requirement of Android without any break on other platforms.

-Original Message-
From: Wu, Zhongmin 
Sent: Tuesday, July 11, 2017 8:40 
To: 'Emil Velikov' ; Marathe, Yogesh 

Cc: Widawsky, Benjamin ; Liu, Zhiquan 
; Eric Engestrom ; Rob Clark 
; Tomasz Figa ; Kenneth Graunke 
; Kondapally, Kalyan ; ML 
mesa-dev ; Timothy Arceri 
; Chuanbo Weng 
Subject: RE: [Mesa-dev] [EGL android: accquire fence implementation] i965: 
Queue the buffer with a sync fence for Android OS

Hi Emil and Yogesh
Thank you for your comments,  and thanks Yogesh for giving the detailed 
explanations 


And according to the document of Android below 
(https://source.android.com/devices/graphics/arch-bq-gralloc):

Recent Android devices support the sync framework, which enables the system to 
do nifty things when combined with hardware components that can manipulate 
graphics data asynchronously. For example, a producer can submit a series of 
OpenGL ES drawing commands and then enqueue the output buffer before rendering 
completes. The buffer is accompanied by a fence that signals when the contents 
are ready.


I think the things is very clear, that is if the rendering is completed already 
when we call queueBuffer() in mesa ?   If not, we should queue the buffer with 
a fence which will be signaled when the buffer is ready.



-Original Message-
From: Emil Velikov [mailto:emil.l.veli...@gmail.com]
Sent: Tuesday, July 11, 2017 1:18
To: Marathe, Yogesh 
Cc: Wu, Zhongmin ; Widawsky, Benjamin 
; Liu, Zhiquan ; Eric 
Engestrom ; Rob Clark ; Tomasz 
Figa ; Kenneth Graunke ; Kondapally, 
Kalyan ; ML mesa-dev 
; Timothy Arceri ; 
Chuanbo Weng 
Subject: Re: [Mesa-dev] [EGL android: accquire fence implementation] i965: 
Queue the buffer with a sync fence for Android OS

On 10 July 2017 at 15:26, Marathe, Yogesh  wrote:
> Hello Emil, My two cents since I too spent some time on this.
>
>> -Original Message-
>> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On 
>> Behalf Of Emil Velikov
>> Sent: Monday, July 10, 2017 4:41 PM
>> To: Wu, Zhongmin 
>> Cc: Widawsky, Benjamin ; Liu, Zhiquan 
>> ; Eric Engestrom ; Rob 
>> Clark ; Tomasz Figa ; 
>> Kenneth Graunke ; Kondapally, Kalyan 
>> ; ML mesa-dev > d...@lists.freedesktop.org>; Timothy Arceri ; 
>> Chuanbo Weng 
>> Subject: Re: [Mesa-dev] [EGL android: accquire fence implementation] i965:
>> Queue the buffer with a sync fence for Android OS
>>
>> Hi Zhongmin Wu,
>>
>> Above all, a bit of a disclaimer: I'm by no means an expert on the 
>> topic so take the following with a pinch of salt.
>>
>> On 10 July 2017 at 03:11, Zhongmin Wu  wrote:
>> > Previously we queued the buffer with an invalid fence (-1), which caused
>> > some benchmarks, such as flatland, to fail.
>> >
>> > Now we get the out fence while flushing the buffer and then pass
>> > it to SurfaceFlinger in eglSwapBuffers().
>> >
>> Having a closer look it seems that the issue can be summarised as follows:
>>  - flatland intercepts/interacts with ANativeWindow::{de,}queueBuffer (how about
>> ANativeWindow::cancelBuffer?)
>>  - the program expects that a sync fd is available for both dequeue and queue
>>
>> At the same time:
>>  - the ANativeWindow documentation does _not_ state such a requirement
>>  - even if it did, that would be somewhat wrong, since
>> ANativeWindow::queueBuffer is called by eglSwapBuffers(), whose
>> documentation clearly states - "... performs an implicit flush ...
>> glFlush ... vgFlush"
>>
>> My take is that if flatland/Android framework does want an explicit 
>> sync point it should insert one with the EGL API.
>> There could be alternative solutions, but the proposed patch seems 
>> wrong IMHO.
>
> In fact, I could work around this in the producer (Surface::queueBuffer)
> by ignoring the (-1) passed and creating a sync using EGL APIs. I see two
> problems with that.
>
> - Before getting an fd using eglDupNativeFenceFDANDROID(), you need a
>   glFlush(). This costs additional cycles for each queueBuffer transaction
>   on each BufferItem, and I believe the fd is also signaled due to this
>   (so I don't know what we'll get by waiting on that fd on the consumer side).
> - AFAIK, the whole idea of explicit sync revolves around being able to pass
>   fds created by the driver between processes, and this breaks that chain.
>   If we work around this in upper layers, the explicit sync feature will
>   have to be fixed for every other lib that may use Mesa underneath.
>
> For these reasons, I still believe we should fix it here. Of course, 
> you and Rob hav

[Mesa-dev] [PATCH] nv50/ir: fix threads calculation for non-compute shaders

2017-07-10 Thread Ilia Mirkin

We were using the "cp" union fields, which are only valid for compute
shaders. The threads calculation affects the available GPRs, so just
pick a small number for other shader types to avoid limiting available
registers.

Signed-off-by: Ilia Mirkin 
Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.h | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h
index e9d10574835..afeca14d7d1 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h
@@ -174,11 +174,15 @@ public:
virtual void getBuiltinCode(const uint32_t **code, uint32_t *size) const = 0;
 
virtual void parseDriverInfo(const struct nv50_ir_prog_info *info) {
-  threads = info->prop.cp.numThreads[0] *
- info->prop.cp.numThreads[1] *
- info->prop.cp.numThreads[2];
-  if (threads == 0)
- threads = info->target >= NVISA_GK104_CHIPSET ? 1024 : 512;
+  if (info->type == PIPE_SHADER_COMPUTE) {
+ threads = info->prop.cp.numThreads[0] *
+info->prop.cp.numThreads[1] *
+info->prop.cp.numThreads[2];
+ if (threads == 0)
+threads = info->target >= NVISA_GK104_CHIPSET ? 1024 : 512;
+  } else {
+ threads = 32; // doesn't matter, just not too big.
+  }
}
 
virtual bool runLegalizePass(Program *, CGStage stage) const = 0;
-- 
2.13.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [EGL android: accquire fence implementation] i965: Queue the buffer with a sync fence for Android OS

2017-07-10 Thread Wu, Zhongmin

Hi Emil and Yogesh,
Thank you for your comments, and thanks to Yogesh for the detailed
explanations.


According to the Android documentation below
(https://source.android.com/devices/graphics/arch-bq-gralloc):

Recent Android devices support the sync framework, which enables the system to 
do nifty things when combined with hardware components that can manipulate 
graphics data asynchronously. For example, a producer can submit a series of 
OpenGL ES drawing commands and then enqueue the output buffer before rendering 
completes. The buffer is accompanied by a fence that signals when the contents 
are ready.


I think the question is very clear: has rendering already completed
when we call queueBuffer() in Mesa? If not, we should queue the buffer with
a fence that will be signaled when the buffer is ready.
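To make the semantics concrete, here is a minimal consumer-side sketch. It is a model under stated assumptions, not Android or Mesa code: a real sync-file fence fd reports POLLIN once it signals, and -1 conventionally means "no fence, contents already ready". A pipe stands in for the fence so the sketch runs anywhere; `buffer_ready()` and `mock_signal_fence()` are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical consumer-side check: may the buffer contents be read? */
static bool buffer_ready(int fence_fd, int timeout_ms)
{
   if (fence_fd < 0)
      return true; /* -1: the producer claims rendering already finished */

   /* A signalled fence fd becomes readable (POLLIN). */
   struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
   int ret = poll(&pfd, 1, timeout_ms);
   return ret > 0 && (pfd.revents & POLLIN) != 0;
}

/* Stand-in for the GPU finishing the rendering: signal the mock fence. */
static bool mock_signal_fence(int write_fd)
{
   return write(write_fd, "x", 1) == 1;
}
```

With a real fence the producer can queue the buffer immediately and only the consumer waits; passing -1 is only correct if rendering has in fact already completed.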



-Original Message-
From: Emil Velikov [mailto:emil.l.veli...@gmail.com] 
Sent: Tuesday, July 11, 2017 1:18 
To: Marathe, Yogesh 
Cc: Wu, Zhongmin ; Widawsky, Benjamin 
; Liu, Zhiquan ; Eric 
Engestrom ; Rob Clark ; Tomasz 
Figa ; Kenneth Graunke ; Kondapally, 
Kalyan ; ML mesa-dev 
; Timothy Arceri ; 
Chuanbo Weng 
Subject: Re: [Mesa-dev] [EGL android: accquire fence implementation] i965: 
Queue the buffer with a sync fence for Android OS

On 10 July 2017 at 15:26, Marathe, Yogesh  wrote:
> Hello Emil, My two cents since I too spent some time on this.
>
>> -Original Message-
>> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On 
>> Behalf Of Emil Velikov
>> Sent: Monday, July 10, 2017 4:41 PM
>> To: Wu, Zhongmin 
>> Cc: Widawsky, Benjamin ; Liu, Zhiquan 
>> ; Eric Engestrom ; Rob 
>> Clark ; Tomasz Figa ; 
>> Kenneth Graunke ; Kondapally, Kalyan 
>> ; ML mesa-dev > d...@lists.freedesktop.org>; Timothy Arceri ; 
>> Chuanbo Weng 
>> Subject: Re: [Mesa-dev] [EGL android: accquire fence implementation] i965:
>> Queue the buffer with a sync fence for Android OS
>>
>> Hi Zhongmin Wu,
>>
>> Above all, a bit of a disclaimer: I'm by no means an expert on the 
>> topic so take the following with a pinch of salt.
>>
>> On 10 July 2017 at 03:11, Zhongmin Wu  wrote:
>> > Previously we queued the buffer with an invalid fence (-1), which caused
>> > some benchmarks, such as flatland, to fail.
>> >
>> > Now we get the out fence while flushing the buffer and then pass
>> > it to SurfaceFlinger in eglSwapBuffers().
>> >
>> Having a closer look it seems that the issue can be summarised as follows:
>>  - flatland intercepts/interacts with ANativeWindow::{de,}queueBuffer (how about
>> ANativeWindow::cancelBuffer?)
>>  - the program expects that a sync fd is available for both dequeue and queue
>>
>> At the same time:
>>  - the ANativeWindow documentation does _not_ state such a requirement
>>  - even if it did, that would be somewhat wrong, since
>> ANativeWindow::queueBuffer is called by eglSwapBuffers(), whose
>> documentation clearly states - "... performs an implicit flush ...
>> glFlush ... vgFlush"
>>
>> My take is that if flatland/Android framework does want an explicit 
>> sync point it should insert one with the EGL API.
>> There could be alternative solutions, but the proposed patch seems 
>> wrong IMHO.
>
> In fact, I could work around this in the producer (Surface::queueBuffer)
> by ignoring the (-1) passed and creating a sync using EGL APIs. I see two
> problems with that.
>
> - Before getting an fd using eglDupNativeFenceFDANDROID(), you need a
>   glFlush(). This costs additional cycles for each queueBuffer transaction
>   on each BufferItem, and I believe the fd is also signaled due to this
>   (so I don't know what we'll get by waiting on that fd on the consumer side).
> - AFAIK, the whole idea of explicit sync revolves around being able to pass
>   fds created by the driver between processes, and this breaks that chain.
>   If we work around this in upper layers, the explicit sync feature will
>   have to be fixed for every other lib that may use Mesa underneath.
>
> For these reasons, I still believe we should fix it here. Of course, 
> you and Rob have very valid points on cancelBuffer and about not 
> breaking gallium respectively, those need to be taken care of.
>
What I'm saying is: it seems like the app/framework does something silly, or at
least undocumented.
Fixing things in Mesa may be the right thing to do, but without more
information, it's anyone's guess who's got it wrong.

As Rob asked earlier - can we get a simple test case or some pseudo code
illustrating the whole thing?

Thanks
Emil

Re: [Mesa-dev] [PATCH] i965/blorp: Use the renderbuffer format for clears

2017-07-10 Thread Jason Ekstrand

On Mon, Jul 10, 2017 at 1:34 PM, Andres Gomez  wrote:

> Jason, what is the status of this patch? Has it been superseded or
> discarded?
>

Still awaiting review.


> On Mon, 2017-06-26 at 09:01 -0700, Jason Ekstrand wrote:
> > This fixes the Piglit ARB_texture_views rendering-formats test.
> >
> > Cc: "17.1" 
> > ---
> >  src/mesa/drivers/dri/i965/brw_blorp.c | 10 +-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
> > index 87c9dd4..96dc657 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> > @@ -746,9 +746,9 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
> >  {
> > struct gl_context *ctx = &brw->ctx;
> > struct intel_renderbuffer *irb = intel_renderbuffer(rb);
> > -   mesa_format format = irb->mt->format;
> > uint32_t x0, x1, y0, y1;
> >
> > +   mesa_format format = irb->Base.Base.Format;
> > if (!encode_srgb && _mesa_get_format_color_encoding(format) == GL_SRGB)
> >format = _mesa_get_srgb_format_linear(format);
> >
> > @@ -772,6 +772,14 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
> > if (set_write_disables(irb, ctx->Color.ColorMask[buf], color_write_disable))
> >can_fast_clear = false;
> >
> > +   /* We store clear colors as floats or uints as needed.  If there are
> > +* texture views in play, the formats will not properly be respected
> > +* during resolves because the resolve operations only know about the
> > +* miptree and not the renderbuffer.
> > +*/
> > +   if (irb->Base.Base.Format != irb->mt->format)
> > +  can_fast_clear = false;
> > +
> > if (!irb->mt->supports_fast_clear ||
> > !brw_is_color_fast_clear_compatible(brw, irb->mt, &ctx->Color.ClearColor))
> >can_fast_clear = false;
> --
> Br,
>
> Andres
>
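The decision this patch makes can be modelled in a few lines of C. This is a hedged sketch: the format enum and the helpers are hypothetical stand-ins for mesa_format, _mesa_get_format_color_encoding() and _mesa_get_srgb_format_linear(); only the control flow follows the patch (clear with the renderbuffer/view format, and refuse a fast clear when the view format differs from the miptree format, because resolves only know about the miptree).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical format enum; the real code uses mesa_format. */
enum fmt { FMT_RGBA8_UNORM, FMT_RGBA8_SRGB, FMT_RGBA8_UINT };

static bool fmt_is_srgb(enum fmt f) { return f == FMT_RGBA8_SRGB; }

static enum fmt fmt_srgb_to_linear(enum fmt f)
{
   return f == FMT_RGBA8_SRGB ? FMT_RGBA8_UNORM : f;
}

struct clear_decision {
   enum fmt format;      /* format used to encode the clear color */
   bool can_fast_clear;
};

static struct clear_decision
decide_clear(enum fmt rb_format, enum fmt mt_format, bool encode_srgb,
             bool mt_supports_fast_clear)
{
   struct clear_decision d;

   /* Use the renderbuffer (view) format rather than the miptree's. */
   d.format = rb_format;
   if (!encode_srgb && fmt_is_srgb(d.format))
      d.format = fmt_srgb_to_linear(d.format);

   /* Resolves only see the miptree format, so a differing view
    * format cannot be fast-cleared safely. */
   d.can_fast_clear = mt_supports_fast_clear && rb_format == mt_format;
   return d;
}
```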

Re: [Mesa-dev] [PATCH 5/5] svga: s/unsigned/enum tgsi_texture_type/

2017-07-10 Thread Neha Bhende

For the series,


Reviewed-by: Neha Bhende 


Regards,

Neha


From: Brian Paul 
Sent: Monday, July 10, 2017 2:50:25 PM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee; Neha Bhende
Subject: [PATCH 5/5] svga: s/unsigned/enum tgsi_texture_type/

---
 src/gallium/drivers/svga/svga_tgsi_vgpu10.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
index bbaad20..d29ac28 100644
--- a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
+++ b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
@@ -2955,7 +2955,8 @@ emit_sampler_declarations(struct svga_shader_emitter_v10 *emit)
  * Translate TGSI_TEXTURE_x to VGAPU10_RESOURCE_DIMENSION_x.
  */
 static unsigned
-tgsi_texture_to_resource_dimension(unsigned target, boolean is_array)
+tgsi_texture_to_resource_dimension(enum tgsi_texture_type target,
+   boolean is_array)
 {
switch (target) {
case TGSI_TEXTURE_BUFFER:
@@ -4867,7 +4868,7 @@ setup_texcoord(struct svga_shader_emitter_v10 *emit,
  */
 static void
 emit_tex_compare_refcoord(struct svga_shader_emitter_v10 *emit,
-  unsigned target,
+  enum tgsi_texture_type target,
   const struct tgsi_full_src_register *coord)
 {
struct tgsi_full_src_register coord_src_ref;
@@ -4901,7 +4902,7 @@ struct tex_swizzle_info
boolean swizzled;
boolean shadow_compare;
unsigned unit;
-   unsigned texture_target;  /**< TGSI_TEXTURE_x */
+   enum tgsi_texture_type texture_target;  /**< TGSI_TEXTURE_x */
struct tgsi_full_src_register tmp_src;
struct tgsi_full_dst_register tmp_dst;
const struct tgsi_full_dst_register *inst_dst;
--
1.9.1


Re: [Mesa-dev] [PATCH v3 06/16] anv/gpu_memcpy: Add a lighter-weight GPU memcpy function

2017-07-10 Thread Nanley Chery

On Mon, Jul 10, 2017 at 09:35:25AM -0700, Jason Ekstrand wrote:
> On Wed, Jun 28, 2017 at 2:14 PM, Nanley Chery  wrote:
> 
> > We'll be performing a GPU memcpy in more places to copy small amounts of
> > data. Add an alternate function that thrashes less state.
> >
> > v2:
> > - Make a new function (Jason Ekstrand).
> > - Move the #define into the function.
> > v3:
> > - Update the function name (Jason).
> > - Update comments.
> >
> > Signed-off-by: Nanley Chery 
> > ---
> >  src/intel/vulkan/anv_genX.h|  5 +
> >  src/intel/vulkan/genX_gpu_memcpy.c | 40 ++
> >  2 files changed, 45 insertions(+)
> >
> > diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h
> > index 8da5e075dc..0b7322e281 100644
> > --- a/src/intel/vulkan/anv_genX.h
> > +++ b/src/intel/vulkan/anv_genX.h
> > @@ -69,5 +69,10 @@ void genX(cmd_buffer_so_memcpy)(struct anv_cmd_buffer *cmd_buffer,
> >  struct anv_bo *src, uint32_t src_offset,
> >  uint32_t size);
> >
> > +void genX(cmd_buffer_mi_memcpy)(struct anv_cmd_buffer *cmd_buffer,
> > +struct anv_bo *dst, uint32_t dst_offset,
> > +struct anv_bo *src, uint32_t src_offset,
> > +uint32_t size);
> > +
> >  void genX(blorp_exec)(struct blorp_batch *batch,
> >const struct blorp_params *params);
> > diff --git a/src/intel/vulkan/genX_gpu_memcpy.c b/src/intel/vulkan/genX_gpu_memcpy.c
> > index 5ef35e6283..9c6b46de94 100644
> > --- a/src/intel/vulkan/genX_gpu_memcpy.c
> > +++ b/src/intel/vulkan/genX_gpu_memcpy.c
> > @@ -52,6 +52,46 @@ gcd_pow2_u64(uint64_t a, uint64_t b)
> >  }
> >
> >  void
> > +genX(cmd_buffer_mi_memcpy)(struct anv_cmd_buffer *cmd_buffer,
> > +   struct anv_bo *dst, uint32_t dst_offset,
> > +   struct anv_bo *src, uint32_t src_offset,
> > +   uint32_t size)
> > +{
> > +   /* This memcpy operates in units of dwords. */
> > +   assert(size % 4 == 0);
> > +   assert(dst_offset % 4 == 0);
> > +   assert(src_offset % 4 == 0);
> > +
> > +   for (uint32_t i = 0; i < size; i += 4) {
> > +  const struct anv_address src_addr =
> > + (struct anv_address) { src, src_offset + i};
> > +  const struct anv_address dst_addr =
> > + (struct anv_address) { dst, dst_offset + i};
> > +#if GEN_GEN >= 8
> > +  anv_batch_emit(&cmd_buffer->batch, GENX(MI_COPY_MEM_MEM), cp) {
> > + cp.DestinationMemoryAddress = dst_addr;
> > + cp.SourceMemoryAddress = src_addr;
> > +  }
> > +#else
> > +  /* IVB does not have a general purpose register for command streamer
> > +   * commands. Therefore, we use an alternate temporary register.
> > +   */
> > +#define TEMP_REG 0x2400 /* MI_PREDICATE_SRC0 */
> >
> 
> Using the predicate register seems a bit sketchy.  Vulkan doesn't support
> predication today so it's probably safe but I don't know what form
> predication will take in the future (there's a decent chance it'll get
> added) so I have no idea if this will end up being safe.  Why not use one
> of the indirect dispatch/draw registers?  Those will be safe because we
> only ever set them immediately before 3DPRIMITIVE or GPGPU_WALKER.
> 
> --Jason
> 
> 

I don't mind using any alternate register, so long as it doesn't lose
any bits (like the SO_WRITE_OFFSET registers). The register, Load
Indirect Base Vertex (0x2440), looks like it will work just fine.

Thanks,
Nanley

> > +  anv_batch_emit(&cmd_buffer->batch, GENX(MI_LOAD_REGISTER_MEM), load) {
> > + load.RegisterAddress = TEMP_REG;
> > + load.MemoryAddress = src_addr;
> > +  }
> > +  anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_REGISTER_MEM), store) {
> > + store.RegisterAddress = TEMP_REG;
> > + store.MemoryAddress = dst_addr;
> > +  }
> > +#undef TEMP_REG
> > +#endif
> > +   }
> > +   return;
> > +}
> > +
> > +void
> >  genX(cmd_buffer_so_memcpy)(struct anv_cmd_buffer *cmd_buffer,
> > struct anv_bo *dst, uint32_t dst_offset,
> > struct anv_bo *src, uint32_t src_offset,
> > --
> > 2.13.1
> >
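The IVB fallback in this patch (load each dword into a scratch register, then store it) can be modelled on the CPU. This is a hedged sketch, not driver code: `struct mock_cs`, `load_register_mem()`, `store_register_mem()` and `mi_memcpy()` are hypothetical stand-ins for MI_LOAD_REGISTER_MEM, MI_STORE_REGISTER_MEM and the MMIO register being discussed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the command streamer's scratch register. */
struct mock_cs {
   uint32_t temp_reg;
};

/* Stand-in for MI_LOAD_REGISTER_MEM: memory -> register. */
static void load_register_mem(struct mock_cs *cs, const uint8_t *src)
{
   memcpy(&cs->temp_reg, src, 4);
}

/* Stand-in for MI_STORE_REGISTER_MEM: register -> memory. */
static void store_register_mem(const struct mock_cs *cs, uint8_t *dst)
{
   memcpy(dst, &cs->temp_reg, 4);
}

static void mi_memcpy(struct mock_cs *cs,
                      uint8_t *dst, uint32_t dst_offset,
                      const uint8_t *src, uint32_t src_offset,
                      uint32_t size)
{
   /* Like the patch, this copy operates in units of dwords. */
   assert(size % 4 == 0);
   assert(dst_offset % 4 == 0);
   assert(src_offset % 4 == 0);

   for (uint32_t i = 0; i < size; i += 4) {
      load_register_mem(cs, src + src_offset + i);
      store_register_mem(cs, dst + dst_offset + i);
   }
}
```

After the copy the scratch register holds the last dword moved. That is the crux of the review: whatever the register held before is lost, so it must be one that nothing else relies on across the copy.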

Re: [Mesa-dev] [PATCH 09/20] nir: Add system values from ARB_shader_ballot

2017-07-10 Thread Connor Abbott

On Mon, Jul 10, 2017 at 3:50 PM, Matt Turner  wrote:
> On Mon, Jul 10, 2017 at 1:10 PM, Connor Abbott  wrote:
>> On Thu, Jul 6, 2017 at 4:48 PM, Matt Turner  wrote:
>>> We already had a channel_num system value, which I'm renaming to
>>> subgroup_invocation to match the rest of the new system values.
>>>
>>> Note that while ballotARB(true) will return zeros in the high 32-bits on
>>> systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB
>>> variables do not consider whether channels are enabled. See issue (1) of
>>> ARB_shader_ballot.
>>> ---
>>>  src/compiler/nir/nir.c |  4 
>>>  src/compiler/nir/nir_intrinsics.h  |  8 +++-
>>>  src/compiler/nir/nir_lower_system_values.c | 28 
>>>  src/intel/compiler/brw_fs_nir.cpp  |  2 +-
>>>  src/intel/compiler/brw_nir_intrinsics.c|  4 ++--
>>>  5 files changed, 42 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
>>> index 491b908396..9827e129ca 100644
>>> --- a/src/compiler/nir/nir.c
>>> +++ b/src/compiler/nir/nir.c
>>> @@ -1908,6 +1908,10 @@ nir_intrinsic_from_system_value(gl_system_value val)
>>>return nir_intrinsic_load_helper_invocation;
>>> case SYSTEM_VALUE_VIEW_INDEX:
>>>return nir_intrinsic_load_view_index;
>>> +   case SYSTEM_VALUE_SUBGROUP_SIZE:
>>> +  return nir_intrinsic_load_subgroup_size;
>>> +   case SYSTEM_VALUE_SUBGROUP_INVOCATION:
>>> +  return nir_intrinsic_load_subgroup_invocation;
>>> default:
>>>unreachable("system value does not directly correspond to intrinsic");
>>> }
>>> diff --git a/src/compiler/nir/nir_intrinsics.h b/src/compiler/nir/nir_intrinsics.h
>>> index 6c6ba4cf59..96ecfbc338 100644
>>> --- a/src/compiler/nir/nir_intrinsics.h
>>> +++ b/src/compiler/nir/nir_intrinsics.h
>>> @@ -344,10 +344,16 @@ SYSTEM_VALUE(work_group_id, 3, 0, xx, xx, xx)
>>>  SYSTEM_VALUE(user_clip_plane, 4, 1, UCP_ID, xx, xx)
>>>  SYSTEM_VALUE(num_work_groups, 3, 0, xx, xx, xx)
>>>  SYSTEM_VALUE(helper_invocation, 1, 0, xx, xx, xx)
>>> -SYSTEM_VALUE(channel_num, 1, 0, xx, xx, xx)
>>>  SYSTEM_VALUE(alpha_ref_float, 1, 0, xx, xx, xx)
>>>  SYSTEM_VALUE(layer_id, 1, 0, xx, xx, xx)
>>>  SYSTEM_VALUE(view_index, 1, 0, xx, xx, xx)
>>> +SYSTEM_VALUE(subgroup_size, 1, 0, xx, xx, xx)
>>> +SYSTEM_VALUE(subgroup_invocation, 1, 0, xx, xx, xx)
>>> +SYSTEM_VALUE(subgroup_eq_mask, 1, 0, xx, xx, xx)
>>> +SYSTEM_VALUE(subgroup_ge_mask, 1, 0, xx, xx, xx)
>>> +SYSTEM_VALUE(subgroup_gt_mask, 1, 0, xx, xx, xx)
>>> +SYSTEM_VALUE(subgroup_le_mask, 1, 0, xx, xx, xx)
>>> +SYSTEM_VALUE(subgroup_lt_mask, 1, 0, xx, xx, xx)
>>>
>>>  /* Blend constant color values.  Float values are clamped. */
>>>  SYSTEM_VALUE(blend_const_color_r_float, 1, 0, xx, xx, xx)
>>> diff --git a/src/compiler/nir/nir_lower_system_values.c b/src/compiler/nir/nir_lower_system_values.c
>>> index 810100a081..faf0c3c9da 100644
>>> --- a/src/compiler/nir/nir_lower_system_values.c
>>> +++ b/src/compiler/nir/nir_lower_system_values.c
>>> @@ -116,6 +116,34 @@ convert_block(nir_block *block, nir_builder *b)
>>> nir_load_base_instance(b));
>>>   break;
>>>
>>> +  case SYSTEM_VALUE_SUBGROUP_EQ_MASK:
>>> +  case SYSTEM_VALUE_SUBGROUP_GE_MASK:
>>> +  case SYSTEM_VALUE_SUBGROUP_GT_MASK:
>>> +  case SYSTEM_VALUE_SUBGROUP_LE_MASK:
>>> +  case SYSTEM_VALUE_SUBGROUP_LT_MASK: {
>>> + nir_ssa_def *count = nir_load_subgroup_invocation(b);
>>> +
>>> + switch (var->data.location) {
>>> + case SYSTEM_VALUE_SUBGROUP_EQ_MASK:
>>> +sysval = nir_ishl(b, nir_imm_int64(b, 1ull), count);
>>> +break;
>>> + case SYSTEM_VALUE_SUBGROUP_GE_MASK:
>>> +sysval = nir_ishl(b, nir_imm_int64(b, ~0ull), count);
>>> +break;
>>> + case SYSTEM_VALUE_SUBGROUP_GT_MASK:
>>> +sysval = nir_ishl(b, nir_imm_int64(b, ~1ull), count);
>>> +break;
>>> + case SYSTEM_VALUE_SUBGROUP_LE_MASK:
>>> +sysval = nir_inot(b, nir_ishl(b, nir_imm_int64(b, ~1ull), count));
>>> +break;
>>> + case SYSTEM_VALUE_SUBGROUP_LT_MASK:
>>> +sysval = nir_inot(b, nir_ishl(b, nir_imm_int64(b, ~0ull), count));
>>> +break;
>>> + default:
>>> +unreachable("you seriously can't tell this is unreachable?");
>>> + }
>>> +  }
>>> +
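The shift identities used in this lowering can be checked on the CPU. A minimal sketch with hypothetical helper names: `id` plays the role of the subgroup invocation index and is assumed to be below 64, since shifting a 64-bit value by 64 or more is undefined in C. As the commit message notes, these masks do not depend on which channels are enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Each helper mirrors one case of the NIR lowering above. */
static uint64_t eq_mask(unsigned id) { assert(id < 64); return 1ull << id; }
static uint64_t ge_mask(unsigned id) { assert(id < 64); return ~0ull << id; }
static uint64_t gt_mask(unsigned id) { assert(id < 64); return ~1ull << id; }
static uint64_t le_mask(unsigned id) { assert(id < 64); return ~(~1ull << id); }
static uint64_t lt_mask(unsigned id) { assert(id < 64); return ~(~0ull << id); }
```

Two quick sanity checks follow from the definitions: `ge == eq | gt` and `lt == ~ge` for every valid `id`.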
>>
>> While this is fine to do for both Intel and AMD, Nvidia actually has
>> special system values for these, and AMD has special instructions for
>> bitCount(foo & gl_SubGroupLtMask), so I think we should have actual
>
> So, just add this to the above switch statement?
>
>if (!b->shader->options->lower_subgroup_masks)
>   break;
>
> I'll also add the missing cases to nir_intrinsic_from_system_value()
> and nir_system_value_from_intrinsic().

Well, it gets a little more complicated... with SPIR-V, you also have

Re: [Mesa-dev] [PATCH v3 16/16] anv: Predicate fast-clear resolves

2017-07-10 Thread Jason Ekstrand

On Wed, Jun 28, 2017 at 2:14 PM, Nanley Chery  wrote:

> Image layouts only let us know that an image *may* be fast-cleared. For
> this reason we can end up with redundant resolves. Testing has shown
> that such resolves can measurably hurt performance and that predicating
> them can avoid the penalty.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_blorp.c   |  3 +-
>  src/intel/vulkan/anv_private.h | 13 --
>  src/intel/vulkan/genX_cmd_buffer.c | 87 ++
> ++--
>  3 files changed, 95 insertions(+), 8 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index 35317ba6be..d06d7e2cc3 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -1619,7 +1619,8 @@ anv_ccs_resolve(struct anv_cmd_buffer * const cmd_buffer,
>return;
>
> struct blorp_batch batch;
> -   blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer, 0);
> +   blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer,
> +BLORP_BATCH_PREDICATE_ENABLE);
>
> struct blorp_surf surf;
> get_blorp_surf_for_anv_image(image, VK_IMAGE_ASPECT_COLOR_BIT,
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
> index be1623f3c3..951cf50842 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -2118,11 +2118,16 @@ anv_fast_clear_state_entry_size(const struct anv_device *device)
>  {
> assert(device);
> /* Entry contents:
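> -*   +----------------------+
> -*   | clear value dword(s) |
> -*   +----------------------+
> +*   +--------------------------------------------+
> +*   | clear value dword(s) | needs resolve dword |
> +*   +--------------------------------------------+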
>  */
> -   return device->isl_dev.ss.clear_value_size;
> +
> +   /* Ensure that the needs resolve dword is in fact dword-aligned to enable
> +* GPU memcpy operations.
> +*/
> +   assert(device->isl_dev.ss.clear_value_size % 4 == 0);
> +   return device->isl_dev.ss.clear_value_size + 4;
>  }
>
>  /* Returns true if a HiZ-enabled depth buffer can be sampled from. */
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> index 62a2f22782..65d9c92783 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -421,6 +421,59 @@ get_fast_clear_state_entry_offset(const struct anv_device *device,
> return offset;
>  }
>
> +#define MI_PREDICATE_SRC0  0x2400
> +#define MI_PREDICATE_SRC1  0x2408
> +
> +enum ccs_resolve_state {
> +   CCS_RESOLVE_NOT_NEEDED,
> +   CCS_RESOLVE_NEEDED,
>

Are these two values sufficient?  Do we ever have a scenario where we do a
partial resolve and then a full resolve?  Do we need to be able to track
that?


> +   CCS_RESOLVE_STARTING,
> +};
> +
> +/* Manages the state of a color image subresource to ensure resolves are
> + * performed properly.
> + */
> +static void
> +genX(set_resolve_state)(struct anv_cmd_buffer *cmd_buffer,
> +const struct anv_image *image,
> +unsigned level,
> +enum ccs_resolve_state state)
> +{
> +   assert(cmd_buffer && image);
> +   assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> +   assert(level < anv_image_aux_levels(image));
> +
> +   const uint32_t resolve_flag_offset =
> +  get_fast_clear_state_entry_offset(cmd_buffer->device, image, level) +
> +  cmd_buffer->device->isl_dev.ss.clear_value_size;
> +
> +   if (state != CCS_RESOLVE_STARTING) {
> +  assert(state == CCS_RESOLVE_NEEDED || state == CCS_RESOLVE_NOT_NEEDED);
> +  /* The HW docs say that there is no way to guarantee the completion of
> +   * the following command. We use it nevertheless because it shows no
> +   * issues in testing and is currently being used in the GL driver.
> +   */
> +  anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_DATA_IMM), sdi) {
> + sdi.Address = (struct anv_address) { image->bo, resolve_flag_offset };
> + sdi.ImmediateData = state == CCS_RESOLVE_NEEDED;
> +  }
> +   } else {
> +  /* Make the pending predicated resolve a no-op if one is not needed.
> +   * predicate = do_resolve = resolve_flag != 0;
> +   */
> +  emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1, 0);
> +  emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 + 4, 0);
> +  emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0, 0);
> +  emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4,
> +   image->bo, resolve_flag_offset);
> +  anv_batch_emit(&cmd_buffer->batch, GENX(MI_PREDICATE), mip) {
> + mip.LoadOperation= LOAD_LOADINV;
> + mip.CombineOperation = COMBINE_SET;
> + mip.CompareOperation = COMPARE_SRCS_EQUAL;
> +  }
> +   }
>
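The predicate setup can be modelled in C to check the `predicate = do_resolve = resolve_flag != 0` claim in the patch comment. This is a hedged sketch: the register struct and helper names are hypothetical, and MI_PREDICATE is reduced to COMPARE_SRCS_EQUAL followed by the LOAD_LOADINV inversion. It also models the new entry layout (clear value dwords followed by one needs-resolve dword).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two 64-bit predicate source registers. */
struct mock_predicate_regs {
   uint64_t src0, src1;
};

/* SRC1 = 0; SRC0 low dword = 0, high dword = needs-resolve flag. */
static void load_sources(struct mock_predicate_regs *r, uint32_t resolve_flag)
{
   r->src1 = 0;
   r->src0 = (uint64_t)resolve_flag << 32;
}

/* COMPARE_SRCS_EQUAL, then LOAD_LOADINV stores the inverted result. */
static bool predicate_after_loadinv(const struct mock_predicate_regs *r)
{
   return !(r->src0 == r->src1);
}

/* Entry layout: clear value dword(s) followed by a needs-resolve dword. */
static uint32_t entry_size(uint32_t clear_value_size)
{
   assert(clear_value_size % 4 == 0);
   return clear_value_size + 4;
}

static uint32_t resolve_flag_offset(uint32_t entry_offset,
                                    uint32_t clear_value_size)
{
   return entry_offset + clear_value_size;
}
```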

This function does two very different things hidden behind an enum that, in
my view, only cloud

Re: [Mesa-dev] [PATCH 09/20] nir: Add system values from ARB_shader_ballot

2017-07-10 Thread Matt Turner

On Mon, Jul 10, 2017 at 1:10 PM, Connor Abbott  wrote:
> On Thu, Jul 6, 2017 at 4:48 PM, Matt Turner  wrote:
>> We already had a channel_num system value, which I'm renaming to
>> subgroup_invocation to match the rest of the new system values.
>>
>> Note that while ballotARB(true) will return zeros in the high 32-bits on
>> systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB
>> variables do not consider whether channels are enabled. See issue (1) of
>> ARB_shader_ballot.
>> ---
>>  src/compiler/nir/nir.c |  4 
>>  src/compiler/nir/nir_intrinsics.h  |  8 +++-
>>  src/compiler/nir/nir_lower_system_values.c | 28 
>>  src/intel/compiler/brw_fs_nir.cpp  |  2 +-
>>  src/intel/compiler/brw_nir_intrinsics.c|  4 ++--
>>  5 files changed, 42 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
>> index 491b908396..9827e129ca 100644
>> --- a/src/compiler/nir/nir.c
>> +++ b/src/compiler/nir/nir.c
>> @@ -1908,6 +1908,10 @@ nir_intrinsic_from_system_value(gl_system_value val)
>>return nir_intrinsic_load_helper_invocation;
>> case SYSTEM_VALUE_VIEW_INDEX:
>>return nir_intrinsic_load_view_index;
>> +   case SYSTEM_VALUE_SUBGROUP_SIZE:
>> +  return nir_intrinsic_load_subgroup_size;
>> +   case SYSTEM_VALUE_SUBGROUP_INVOCATION:
>> +  return nir_intrinsic_load_subgroup_invocation;
>> default:
>>unreachable("system value does not directly correspond to intrinsic");
>> }
>> diff --git a/src/compiler/nir/nir_intrinsics.h b/src/compiler/nir/nir_intrinsics.h
>> index 6c6ba4cf59..96ecfbc338 100644
>> --- a/src/compiler/nir/nir_intrinsics.h
>> +++ b/src/compiler/nir/nir_intrinsics.h
>> @@ -344,10 +344,16 @@ SYSTEM_VALUE(work_group_id, 3, 0, xx, xx, xx)
>>  SYSTEM_VALUE(user_clip_plane, 4, 1, UCP_ID, xx, xx)
>>  SYSTEM_VALUE(num_work_groups, 3, 0, xx, xx, xx)
>>  SYSTEM_VALUE(helper_invocation, 1, 0, xx, xx, xx)
>> -SYSTEM_VALUE(channel_num, 1, 0, xx, xx, xx)
>>  SYSTEM_VALUE(alpha_ref_float, 1, 0, xx, xx, xx)
>>  SYSTEM_VALUE(layer_id, 1, 0, xx, xx, xx)
>>  SYSTEM_VALUE(view_index, 1, 0, xx, xx, xx)
>> +SYSTEM_VALUE(subgroup_size, 1, 0, xx, xx, xx)
>> +SYSTEM_VALUE(subgroup_invocation, 1, 0, xx, xx, xx)
>> +SYSTEM_VALUE(subgroup_eq_mask, 1, 0, xx, xx, xx)
>> +SYSTEM_VALUE(subgroup_ge_mask, 1, 0, xx, xx, xx)
>> +SYSTEM_VALUE(subgroup_gt_mask, 1, 0, xx, xx, xx)
>> +SYSTEM_VALUE(subgroup_le_mask, 1, 0, xx, xx, xx)
>> +SYSTEM_VALUE(subgroup_lt_mask, 1, 0, xx, xx, xx)
>>
>>  /* Blend constant color values.  Float values are clamped. */
>>  SYSTEM_VALUE(blend_const_color_r_float, 1, 0, xx, xx, xx)
>> diff --git a/src/compiler/nir/nir_lower_system_values.c b/src/compiler/nir/nir_lower_system_values.c
>> index 810100a081..faf0c3c9da 100644
>> --- a/src/compiler/nir/nir_lower_system_values.c
>> +++ b/src/compiler/nir/nir_lower_system_values.c
>> @@ -116,6 +116,34 @@ convert_block(nir_block *block, nir_builder *b)
>> nir_load_base_instance(b));
>>   break;
>>
>> +  case SYSTEM_VALUE_SUBGROUP_EQ_MASK:
>> +  case SYSTEM_VALUE_SUBGROUP_GE_MASK:
>> +  case SYSTEM_VALUE_SUBGROUP_GT_MASK:
>> +  case SYSTEM_VALUE_SUBGROUP_LE_MASK:
>> +  case SYSTEM_VALUE_SUBGROUP_LT_MASK: {
>> + nir_ssa_def *count = nir_load_subgroup_invocation(b);
>> +
>> + switch (var->data.location) {
>> + case SYSTEM_VALUE_SUBGROUP_EQ_MASK:
>> +sysval = nir_ishl(b, nir_imm_int64(b, 1ull), count);
>> +break;
>> + case SYSTEM_VALUE_SUBGROUP_GE_MASK:
>> +sysval = nir_ishl(b, nir_imm_int64(b, ~0ull), count);
>> +break;
>> + case SYSTEM_VALUE_SUBGROUP_GT_MASK:
>> +sysval = nir_ishl(b, nir_imm_int64(b, ~1ull), count);
>> +break;
>> + case SYSTEM_VALUE_SUBGROUP_LE_MASK:
>> +sysval = nir_inot(b, nir_ishl(b, nir_imm_int64(b, ~1ull), count));
>> +break;
>> + case SYSTEM_VALUE_SUBGROUP_LT_MASK:
>> +sysval = nir_inot(b, nir_ishl(b, nir_imm_int64(b, ~0ull), count));
>> +break;
>> + default:
>> +unreachable("you seriously can't tell this is unreachable?");
>> + }
>> +  }
>> +
>
> While this is fine to do for both Intel and AMD, Nvidia actually has
> special system values for these, and AMD has special instructions for
> bitCount(foo & gl_SubGroupLtMask), so I think we should have actual

So, just add this to the above switch statement?

   if (!b->shader->options->lower_subgroup_masks)
  break;

I'll also add the missing cases to nir_intrinsic_from_system_value()
and nir_system_value_from_intrinsic().

> nir_load_subgroup_*_mask intrinsics for these. Also, that way you can
> use the same shrinking logic to turn these into 32-bit shifts on
> Intel.

The channel liveness doesn't affect SubGroup*Mask. See Issue (1) of
ARB_shader_ballot (note Not

Re: [Mesa-dev] [PATCH 5/5] svga: s/unsigned/enum tgsi_texture_type/

2017-07-10 Thread Charmaine Lee


Series looks good.

Reviewed-by: Charmaine Lee 

From: Brian Paul 
Sent: Monday, July 10, 2017 2:50:25 PM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee; Neha Bhende
Subject: [PATCH 5/5] svga: s/unsigned/enum tgsi_texture_type/



[Mesa-dev] [Bug 101668] Counter-Strike: Global Offense - Everything is purple

2017-07-10 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=101668

Ernst Sjöstrand  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ern...@gmail.com


[Mesa-dev] [Bug 101668] Counter-Strike: Global Offense - Everything is purple

2017-07-10 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=101668

--- Comment #4 from Ernst Sjöstrand  ---
I saw this too. That fix should land the next time Padoka updates LLVM;
let's see if it solves this.


Re: [Mesa-dev] [PATCH 1/3] mesa: GL_TEXTURE_BORDER_COLOR exists in OpenGL 1.0, so don't depend on GL_ARB_texture_border_clamp

2017-07-10 Thread Francisco Jerez

Ian Romanick  writes:

> From: Ian Romanick 
>
> On NV20 (and probably also on earlier NV GPUs that lack
> GL_ARB_texture_border_clamp) fixes the following piglit tests:
>
> gl-1.0-beginend-coverage gltexparameter[if]{v,}
> push-pop-texture-state
> texwrap 1d
> texwrap 1d proj
> texwrap 2d proj
> texwrap formats
>
> All told, 49 more tests pass on NV20 (10de:0201).
>
> No changes on Intel CI run or RV250 (1002:4c66).
>
> Signed-off-by: Ian Romanick 

Thanks -- Series is:

Reviewed-by: Francisco Jerez 

> ---
>  src/mesa/main/texparam.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/texparam.c b/src/mesa/main/texparam.c
> index 3c110de..857faf6 100644
> --- a/src/mesa/main/texparam.c
> +++ b/src/mesa/main/texparam.c
> @@ -736,8 +736,16 @@ set_tex_parameterf(struct gl_context *ctx,
>break;
>  
> case GL_TEXTURE_BORDER_COLOR:
> +  /* Border color exists in desktop OpenGL since 1.0 for GL_CLAMP.  In
> +   * OpenGL ES 2.0+, it only exists when GL_OES_texture_border_clamp is
> +   * enabled.  It is never available in OpenGL ES 1.x.
> +   *
> +   * FIXME: Every driver that supports GLES2 has this extension.  Elide
> +   * the check?
> +   */
>if (ctx->API == API_OPENGLES ||
> -  !ctx->Extensions.ARB_texture_border_clamp)
> +  (ctx->API == API_OPENGLES2 &&
> +   !ctx->Extensions.ARB_texture_border_clamp))
>   goto invalid_pname;
>  
>if (!_mesa_target_allows_setting_sampler_parameters(texObj->Target))
> -- 
> 2.9.4
>

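The patched condition in set_tex_parameterf() boils down to a small per-API rule. The predicate below restates it outside Mesa as a sketch: demo_api is a hypothetical stand-in for Mesa's API enum, and has_border_clamp_ext stands for whichever extension flag the driver reports (the patch tests ctx->Extensions.ARB_texture_border_clamp).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for Mesa's API enum (not the real gl_api). */
enum demo_api {
   DEMO_API_OPENGL_COMPAT,
   DEMO_API_OPENGLES,      /* OpenGL ES 1.x */
   DEMO_API_OPENGLES2,     /* OpenGL ES 2.0 and later */
   DEMO_API_OPENGL_CORE,
};

/* Mirrors the patched condition: returns true when GL_TEXTURE_BORDER_COLOR
 * should be accepted instead of jumping to invalid_pname. */
static bool
border_color_allowed(enum demo_api api, bool has_border_clamp_ext)
{
   assert(api <= DEMO_API_OPENGL_CORE);

   /* ES 1.x never has a border color. */
   if (api == DEMO_API_OPENGLES)
      return false;

   /* ES 2.0+ needs the border-clamp extension. */
   if (api == DEMO_API_OPENGLES2 && !has_border_clamp_ext)
      return false;

   /* Desktop GL has had it since 1.0, extension or not. */
   return true;
}
```

On desktop GL the extension flag no longer matters, which is the behaviour change that lets GPUs without GL_ARB_texture_border_clamp, such as the NV20, accept GL_TEXTURE_BORDER_COLOR.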
Re: [Mesa-dev] [PATCH v3 05/16] anv/cmd_buffer: Restrict fast clears in the GENERAL layout

2017-07-10 Thread Nanley Chery

On Mon, Jul 10, 2017 at 09:28:05AM -0700, Jason Ekstrand wrote:
> On Wed, Jun 28, 2017 at 2:14 PM, Nanley Chery  wrote:
> 
> > v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
> > v3: Allow some fast clears in the GENERAL layout.
> >
> > Signed-off-by: Nanley Chery 
> > ---
> >  src/intel/vulkan/anv_pass.c        | 22 ++++++++++++++++++++++
> >  src/intel/vulkan/anv_private.h     |  2 ++
> >  src/intel/vulkan/genX_cmd_buffer.c | 17 ++++++++++++++++-
> >  3 files changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c
> > index 1b30c1409d..ab0733fc10 100644
> > --- a/src/intel/vulkan/anv_pass.c
> > +++ b/src/intel/vulkan/anv_pass.c
> > @@ -34,6 +34,16 @@ num_subpass_attachments(const VkSubpassDescription *desc)
> >(desc->pDepthStencilAttachment != NULL);
> >  }
> >
> > +static void
> > +init_first_subpass_layout(struct anv_render_pass_attachment * const att,
> > +  const VkAttachmentReference att_ref)
> > +{
> > +   if (att->first_subpass_layout == VK_IMAGE_LAYOUT_UNDEFINED) {
> > +  att->first_subpass_layout = att_ref.layout;
> > +  assert(att->first_subpass_layout != VK_IMAGE_LAYOUT_UNDEFINED);
> > +   }
> > +}
> > +
> >  VkResult anv_CreateRenderPass(
> >  VkDevice                                    _device,
> >  const VkRenderPassCreateInfo*   pCreateInfo,
> > @@ -91,6 +101,7 @@ VkResult anv_CreateRenderPass(
> >att->stencil_load_op = pCreateInfo->pAttachments[i].stencilLoadOp;
> >att->initial_layout = pCreateInfo->pAttachments[i].initialLayout;
> >att->final_layout = pCreateInfo->pAttachments[i].finalLayout;
> > +  att->first_subpass_layout = VK_IMAGE_LAYOUT_UNDEFINED;
> >att->subpass_usage = subpass_usages;
> >subpass_usages += pass->subpass_count;
> > }
> > @@ -119,6 +130,8 @@ VkResult anv_CreateRenderPass(
> > pass->attachments[a].subpass_usage[i] |=
> > ANV_SUBPASS_USAGE_INPUT;
> > pass->attachments[a].last_subpass_idx = i;
> >
> > +   init_first_subpass_layout(&pass->attachments[a],
> > + desc->pInputAttachments[j]);
> > if (desc->pDepthStencilAttachment &&
> > a == desc->pDepthStencilAttachment->attachment)
> >subpass->has_ds_self_dep = true;
> > @@ -138,6 +151,9 @@ VkResult anv_CreateRenderPass(
> > pass->attachments[a].usage |= VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
> > pass->attachments[a].subpass_usage[i] |=
> > ANV_SUBPASS_USAGE_DRAW;
> > pass->attachments[a].last_subpass_idx = i;
> > +
> > +   init_first_subpass_layout(&pass->attachments[a],
> > + desc->pColorAttachments[j]);
> >  }
> >   }
> >}
> > @@ -162,6 +178,9 @@ VkResult anv_CreateRenderPass(
> > pass->attachments[a].subpass_usage[i] |=
> >ANV_SUBPASS_USAGE_RESOLVE_DST;
> > pass->attachments[a].last_subpass_idx = i;
> > +
> > +   init_first_subpass_layout(&pass->attachments[a],
> > + desc->pResolveAttachments[j]);
> >  }
> >   }
> >}
> > @@ -176,6 +195,9 @@ VkResult anv_CreateRenderPass(
> > VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;
> >  pass->attachments[a].subpass_usage[i] |=
> > ANV_SUBPASS_USAGE_DRAW;
> >  pass->attachments[a].last_subpass_idx = i;
> > +
> > +init_first_subpass_layout(&pass->attachments[a],
> > +  *desc->pDepthStencilAttachment);
> >   }
> >} else {
> >   subpass->depth_stencil_attachment.attachment =
> > VK_ATTACHMENT_UNUSED;
> > diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
> > index a95188ac30..c5a2ba0888 100644
> > --- a/src/intel/vulkan/anv_private.h
> > +++ b/src/intel/vulkan/anv_private.h
> > @@ -1518,6 +1518,7 @@ struct anv_attachment_state {
> > bool fast_clear;
> > VkClearValue clear_value;
> > bool clear_color_is_zero_one;
> > +   bool clear_color_is_zero;
> >  };
> >
> >  /** State required while building cmd buffer */
> > @@ -2336,6 +2337,7 @@ struct anv_render_pass_attachment {
> > VkAttachmentLoadOp   stencil_load_op;
> > VkImageLayout        initial_layout;
> > VkImageLayout        final_layout;
> > +   VkImageLayout        first_subpass_layout;
> >
> > /* An array, indexed by subpass id, of how the attachment will be used. */
> > enum anv_subpass_usage * subpass_usa

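The init_first_subpass_layout() helper in the patch above records, for each attachment, the layout of the first subpass reference that touches it; later references leave it unchanged. The sketch below shows that rule in isolation. demo_layout, demo_attachment, record_first_layout() and first_layout_of() are hypothetical; only the "UNDEFINED means not yet seen" convention comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts; as in Vulkan, UNDEFINED is 0 and doubles as
 * "no subpass has referenced this attachment yet". */
enum demo_layout {
   DEMO_LAYOUT_UNDEFINED = 0,
   DEMO_LAYOUT_GENERAL,
   DEMO_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
   DEMO_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
};

struct demo_attachment {
   enum demo_layout first_subpass_layout;
};

/* Same rule as init_first_subpass_layout(): only the first reference
 * fills in the layout, and that reference may not itself be UNDEFINED. */
static void
record_first_layout(struct demo_attachment *att, enum demo_layout ref_layout)
{
   if (att->first_subpass_layout == DEMO_LAYOUT_UNDEFINED) {
      att->first_subpass_layout = ref_layout;
      assert(att->first_subpass_layout != DEMO_LAYOUT_UNDEFINED);
   }
}

/* Feed one attachment's references, in visiting order, through the rule. */
static enum demo_layout
first_layout_of(const enum demo_layout *refs, size_t count)
{
   struct demo_attachment att = { DEMO_LAYOUT_UNDEFINED };

   for (size_t i = 0; i < count; i++)
      record_first_layout(&att, refs[i]);
   return att.first_subpass_layout;
}
```

Judging by the hunks, anv_CreateRenderPass() visits each subpass's input, color, resolve and then depth/stencil references, so within one subpass that order decides which layout is recorded.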
[Mesa-dev] [PATCH 5/5] svga: s/unsigned/enum tgsi_texture_type/

2017-07-10 Thread Brian Paul

---
 src/gallium/drivers/svga/svga_tgsi_vgpu10.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
index bbaad20..d29ac28 100644
--- a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
+++ b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
@@ -2955,7 +2955,8 @@ emit_sampler_declarations(struct svga_shader_emitter_v10 *emit)
  * Translate TGSI_TEXTURE_x to VGAPU10_RESOURCE_DIMENSION_x.
  */
 static unsigned
-tgsi_texture_to_resource_dimension(unsigned target, boolean is_array)
+tgsi_texture_to_resource_dimension(enum tgsi_texture_type target,
+   boolean is_array)
 {
switch (target) {
case TGSI_TEXTURE_BUFFER:
@@ -4867,7 +4868,7 @@ setup_texcoord(struct svga_shader_emitter_v10 *emit,
  */
 static void
 emit_tex_compare_refcoord(struct svga_shader_emitter_v10 *emit,
-  unsigned target,
+  enum tgsi_texture_type target,
   const struct tgsi_full_src_register *coord)
 {
struct tgsi_full_src_register coord_src_ref;
@@ -4901,7 +4902,7 @@ struct tex_swizzle_info
boolean swizzled;
boolean shadow_compare;
unsigned unit;
-   unsigned texture_target;  /**< TGSI_TEXTURE_x */
+   enum tgsi_texture_type texture_target;  /**< TGSI_TEXTURE_x */
struct tgsi_full_src_register tmp_src;
struct tgsi_full_dst_register tmp_dst;
const struct tgsi_full_dst_register *inst_dst;
-- 
1.9.1


[Mesa-dev] [PATCH 3/5] svga: s/unsigned/enum tgsi_interpolate_mode/

2017-07-10 Thread Brian Paul

And s/unsigned/enum tgsi_interpolate_loc/
---
 src/gallium/drivers/svga/svga_tgsi_vgpu10.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
index cd4cab4..d02dbb6 100644
--- a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
+++ b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
@@ -1843,7 +1843,8 @@ emit_vgpu10_immediates_block(struct svga_shader_emitter_v10 *emit)
  */
 static unsigned
 translate_interpolation(const struct svga_shader_emitter_v10 *emit,
-unsigned interp, unsigned interpolate_loc)
+enum tgsi_interpolate_mode interp,
+enum tgsi_interpolate_loc interpolate_loc)
 {
if (interp == TGSI_INTERPOLATE_COLOR) {
   interp = emit->key.fs.flatshade ?
-- 
1.9.1


[Mesa-dev] [PATCH 4/5] svga: s/unsigned/enum tgsi_swizzle/

2017-07-10 Thread Brian Paul

---
 src/gallium/drivers/svga/svga_tgsi_vgpu10.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
index d02dbb6..bbaad20 100644
--- a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
+++ b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
@@ -1470,7 +1470,7 @@ absolute_src(const struct tgsi_full_src_register *reg)
 
 /** Return the named swizzle term from the src register */
 static inline unsigned
-get_swizzle(const struct tgsi_full_src_register *reg, unsigned term)
+get_swizzle(const struct tgsi_full_src_register *reg, enum tgsi_swizzle term)
 {
switch (term) {
case TGSI_SWIZZLE_X:
@@ -1493,8 +1493,8 @@ get_swizzle(const struct tgsi_full_src_register *reg, unsigned term)
  */
 static struct tgsi_full_src_register
 swizzle_src(const struct tgsi_full_src_register *reg,
-unsigned swizzleX, unsigned swizzleY,
-unsigned swizzleZ, unsigned swizzleW)
+enum tgsi_swizzle swizzleX, enum tgsi_swizzle swizzleY,
+enum tgsi_swizzle swizzleZ, enum tgsi_swizzle swizzleW)
 {
struct tgsi_full_src_register swizzled = *reg;
/* Note: we swizzle the current swizzle */
@@ -1511,7 +1511,7 @@ swizzle_src(const struct tgsi_full_src_register *reg,
  * terms are the same.
  */
 static struct tgsi_full_src_register
-scalar_src(const struct tgsi_full_src_register *reg, unsigned swizzle)
+scalar_src(const struct tgsi_full_src_register *reg, enum tgsi_swizzle swizzle)
 {
struct tgsi_full_src_register swizzled = *reg;
/* Note: we swizzle the current swizzle */
-- 
1.9.1


[Mesa-dev] [PATCH 1/5] svga: s/unsigned/enum tgsi_semantic/

2017-07-10 Thread Brian Paul

Makes gdb debugging a little nicer.
---
 src/gallium/drivers/svga/svga_link.c   |  2 +-
 src/gallium/drivers/svga/svga_pipe_streamout.c |  3 ++-
 src/gallium/drivers/svga/svga_swtnl_state.c|  2 +-
 src/gallium/drivers/svga/svga_tgsi_vgpu10.c| 11 ++++++-----
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_link.c b/src/gallium/drivers/svga/svga_link.c
index 5bc7f61..9c1df0c 100644
--- a/src/gallium/drivers/svga/svga_link.c
+++ b/src/gallium/drivers/svga/svga_link.c
@@ -62,7 +62,7 @@ svga_link_shaders(const struct tgsi_shader_info *outshader_info,
free_slot = outshader_info->num_outputs + 1;
 
for (i = 0; i < inshader_info->num_inputs; i++) {
-  unsigned sem_name = inshader_info->input_semantic_name[i];
+  enum tgsi_semantic sem_name = inshader_info->input_semantic_name[i];
   unsigned sem_index = inshader_info->input_semantic_index[i];
   unsigned j;
   /**
diff --git a/src/gallium/drivers/svga/svga_pipe_streamout.c b/src/gallium/drivers/svga/svga_pipe_streamout.c
index 3f30e64..0c6c034 100644
--- a/src/gallium/drivers/svga/svga_pipe_streamout.c
+++ b/src/gallium/drivers/svga/svga_pipe_streamout.c
@@ -92,7 +92,8 @@ svga_create_stream_output(struct svga_context *svga,
for (i = 0; i < info->num_outputs; i++) {
   unsigned reg_idx = info->output[i].register_index;
   unsigned buf_idx = info->output[i].output_buffer;
-  const unsigned sem_name = shader->info.output_semantic_name[reg_idx];
+  const enum tgsi_semantic sem_name =
+ shader->info.output_semantic_name[reg_idx];
 
   assert(buf_idx <= PIPE_MAX_SO_BUFFERS);
 
diff --git a/src/gallium/drivers/svga/svga_swtnl_state.c b/src/gallium/drivers/svga/svga_swtnl_state.c
index 71faf3a..8b7a8e7 100644
--- a/src/gallium/drivers/svga/svga_swtnl_state.c
+++ b/src/gallium/drivers/svga/svga_swtnl_state.c
@@ -253,7 +253,7 @@ svga_swtnl_update_vdecl( struct svga_context *svga )
nr_decls++;
 
for (i = 0; i < fs->base.info.num_inputs; i++) {
-  const unsigned sem_name = fs->base.info.input_semantic_name[i];
+  const enum tgsi_semantic sem_name = fs->base.info.input_semantic_name[i];
   const unsigned sem_index = fs->base.info.input_semantic_index[i];
 
   src = draw_find_shader_output(draw, sem_name, sem_index);
diff --git a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
index 1dd76cc..070d67f 100644
--- a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
+++ b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
@@ -851,7 +851,7 @@ emit_dst_register(struct svga_shader_emitter_v10 *emit,
 {
unsigned file = reg->Register.File;
unsigned index = reg->Register.Index;
-   const unsigned sem_name = emit->info.output_semantic_name[index];
+   const enum tgsi_semantic sem_name = emit->info.output_semantic_name[index];
const unsigned sem_index = emit->info.output_semantic_index[index];
unsigned writemask = reg->Register.WriteMask;
const unsigned indirect = reg->Register.Indirect;
@@ -2178,7 +2178,7 @@ emit_fragdepth_output_declaration(struct svga_shader_emitter_v10 *emit)
  */
 static void
 emit_system_value_declaration(struct svga_shader_emitter_v10 *emit,
-  unsigned semantic_name, unsigned index)
+  enum tgsi_semantic semantic_name, unsigned index)
 {
switch (semantic_name) {
case TGSI_SEMANTIC_INSTANCEID:
@@ -2345,7 +2345,7 @@ emit_input_declarations(struct svga_shader_emitter_v10 *emit)
if (emit->unit == PIPE_SHADER_FRAGMENT) {
 
   for (i = 0; i < emit->linkage.num_inputs; i++) {
- unsigned semantic_name = emit->info.input_semantic_name[i];
+ enum tgsi_semantic semantic_name = emit->info.input_semantic_name[i];
  unsigned usage_mask = emit->info.input_usage_mask[i];
  unsigned index = emit->linkage.input_map[i];
  unsigned type, interpolationMode, name;
@@ -2404,7 +2404,7 @@ emit_input_declarations(struct svga_shader_emitter_v10 *emit)
else if (emit->unit == PIPE_SHADER_GEOMETRY) {
 
   for (i = 0; i < emit->info.num_inputs; i++) {
- unsigned semantic_name = emit->info.input_semantic_name[i];
+ enum tgsi_semantic semantic_name = emit->info.input_semantic_name[i];
  unsigned usage_mask = emit->info.input_usage_mask[i];
  unsigned index = emit->linkage.input_map[i];
  unsigned opcodeType, operandType;
@@ -2487,7 +2487,8 @@ emit_output_declarations(struct svga_shader_emitter_v10 *emit)
 
for (i = 0; i < emit->info.num_outputs; i++) {
   /*const unsigned usage_mask = emit->info.output_usage_mask[i];*/
-  const unsigned semantic_name = emit->info.output_semantic_name[i];
+  const enum tgsi_semantic semantic_name =
+ emit->info.output_semantic_name[i];
   const unsigned semantic_index = emit->info.output_semantic_index[i];
   unsigned index = i;
 
-- 
1.9.1
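The series' stated motivation, "Makes gdb debugging a little nicer", comes from giving these variables and parameters an enum type instead of unsigned: gdb then prints the enumerator name rather than a bare number, and GCC's -Wswitch (enabled by -Wall) can flag enumerators missing from a switch that has no default case. A small illustration, using a hypothetical demo_semantic enum rather than the real enum tgsi_semantic:

```c
#include <assert.h>

/* Hypothetical enum standing in for enum tgsi_semantic. */
enum demo_semantic {
   DEMO_SEMANTIC_POSITION,
   DEMO_SEMANTIC_COLOR,
   DEMO_SEMANTIC_GENERIC,
};

/* Because `name` has an enum type, gdb's "print name" shows e.g.
 * DEMO_SEMANTIC_COLOR instead of 1, and -Wswitch warns if an enumerator
 * is added to demo_semantic but not handled here (there is deliberately
 * no default case). */
static const char *
demo_semantic_name(enum demo_semantic name)
{
   switch (name) {
   case DEMO_SEMANTIC_POSITION:
      return "POSITION";
   case DEMO_SEMANTIC_COLOR:
      return "COLOR";
   case DEMO_SEMANTIC_GENERIC:
      return "GENERIC";
   }
   assert(!"invalid demo_semantic value");
   return "UNKNOWN";
}
```

With `name` declared as unsigned instead, neither the symbolic print in gdb nor the -Wswitch diagnostic would be available.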

[Mesa-dev] [PATCH 2/5] svga: s/unsigned/enum tgsi_file_type/

2017-07-10 Thread Brian Paul

---
 src/gallium/drivers/svga/svga_tgsi_vgpu10.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
index 070d67f..cd4cab4 100644
--- a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
+++ b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
@@ -706,7 +706,7 @@ emit_null_dst_register(struct svga_shader_emitter_v10 *emit)
  */
 static unsigned
 get_temp_array_id(const struct svga_shader_emitter_v10 *emit,
-  unsigned file, unsigned index)
+  enum tgsi_file_type file, unsigned index)
 {
if (file == TGSI_FILE_TEMPORARY) {
   return emit->temp_map[index].arrayId;
@@ -723,7 +723,7 @@ get_temp_array_id(const struct svga_shader_emitter_v10 *emit,
  */
 static unsigned
 remap_temp_index(const struct svga_shader_emitter_v10 *emit,
- unsigned file, unsigned index)
+ enum tgsi_file_type file, unsigned index)
 {
if (file == TGSI_FILE_TEMPORARY) {
   return emit->temp_map[index].index;
@@ -741,7 +741,7 @@ remap_temp_index(const struct svga_shader_emitter_v10 *emit,
 static VGPU10OperandToken0
 setup_operand0_indexing(struct svga_shader_emitter_v10 *emit,
 VGPU10OperandToken0 operand0,
-unsigned file,
+enum tgsi_file_type file,
 boolean indirect, boolean index2D,
 unsigned tempArrayID)
 {
@@ -849,7 +849,7 @@ static void
 emit_dst_register(struct svga_shader_emitter_v10 *emit,
   const struct tgsi_full_dst_register *reg)
 {
-   unsigned file = reg->Register.File;
+   enum tgsi_file_type file = reg->Register.File;
unsigned index = reg->Register.Index;
const enum tgsi_semantic sem_name = emit->info.output_semantic_name[index];
const unsigned sem_index = emit->info.output_semantic_index[index];
@@ -967,7 +967,7 @@ static void
 emit_src_register(struct svga_shader_emitter_v10 *emit,
   const struct tgsi_full_src_register *reg)
 {
-   unsigned file = reg->Register.File;
+   enum tgsi_file_type file = reg->Register.File;
unsigned index = reg->Register.Index;
const unsigned indirect = reg->Register.Indirect;
const unsigned tempArrayId = get_temp_array_id(emit, file, index);
@@ -1364,7 +1364,7 @@ free_temp_indexes(struct svga_shader_emitter_v10 *emit)
  * Create a tgsi_full_src_register.
  */
 static struct tgsi_full_src_register
-make_src_reg(unsigned file, unsigned index)
+make_src_reg(enum tgsi_file_type file, unsigned index)
 {
struct tgsi_full_src_register reg;
 
@@ -1413,7 +1413,7 @@ make_src_immediate_reg(unsigned index)
  * Create a tgsi_full_dst_register.
  */
 static struct tgsi_full_dst_register
-make_dst_reg(unsigned file, unsigned index)
+make_dst_reg(enum tgsi_file_type file, unsigned index)
 {
struct tgsi_full_dst_register reg;
 
-- 
1.9.1


Re: [Mesa-dev] [PATCH v3 03/16] anv/cmd_buffer: Initialize the clear values buffer

2017-07-10 Thread Nanley Chery

On Mon, Jul 10, 2017 at 09:18:56AM -0700, Jason Ekstrand wrote:
> On Wed, Jun 28, 2017 at 2:14 PM, Nanley Chery  wrote:
> 
> > v2: Rewrite functions.
> >
> > Signed-off-by: Nanley Chery 
> > ---
> >  src/intel/vulkan/genX_cmd_buffer.c | 93 ++
> > 
> >  1 file changed, 84 insertions(+), 9 deletions(-)
> >
> > diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> > index 53c58ca5b3..8601d706d1 100644
> > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > @@ -384,6 +384,70 @@ transition_depth_buffer(struct anv_cmd_buffer *cmd_buffer,
> >anv_gen8_hiz_op_resolve(cmd_buffer, image, hiz_op);
> >  }
> >
> > +static inline uint32_t
> > +get_fast_clear_state_entry_offset(const struct anv_device *device,
> > +  const struct anv_image *image,
> > +  unsigned level)
> > +{
> > +   assert(device && image);
> > +   assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> > +   assert(level < anv_image_aux_levels(image));
> > +   const uint32_t offset = image->offset + image->aux_surface.offset +
> > +   image->aux_surface.isl.size +
> > +   anv_fast_clear_state_entry_size(device) * level;
> > +   assert(offset < image->offset + image->size);
> > +   return offset;
> > +}
> > +
> > +static void
> > +init_fast_clear_state_entry(struct anv_cmd_buffer *cmd_buffer,
> > +const struct anv_image *image,
> > +unsigned level)
> > +{
> > +   assert(cmd_buffer && image);
> > +   assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> > +   assert(level < anv_image_aux_levels(image));
> > +
> > +   /* The fast clear value dword(s) will be copied into a surface state object.
> > +* Ensure that the restrictions of the fields in the dword(s) are followed.
> > +*
> > +* CCS buffers on SKL+ can have any value set for the clear colors.
> > +*/
> > +   if (image->samples == 1 && GEN_GEN >= 9)
> > +  return;
> > +
> > +   /* Other combinations of auxiliary buffers and platforms require specific
> > +* values in the clear value dword(s).
> > +*/
> > +   unsigned i = 0;
> > +   for (; i < cmd_buffer->device->isl_dev.ss.clear_value_size; i += 4) {
> > +  anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_DATA_IMM), sdi) {
> > + const uint32_t entry_offset =
> > +get_fast_clear_state_entry_offset(cmd_buffer->device, image, level);
> > + sdi.Address = (struct anv_address) { image->bo, entry_offset + i };
> > +
> > + if (GEN_GEN >= 9) {
> > +/* MCS buffers on SKL+ can only have 1/0 clear colors. */
> > +assert(image->aux_usage == ISL_AUX_USAGE_MCS);
> > +sdi.ImmediateData = 0;
> > + } else {
> > +/* Pre-SKL, the dword containing the clear values also contains
> > + * other fields, so we need to initialize those fields to match the
> > + * values that would be in a color attachment.
> > + */
> > +assert(i == 0);
> > +sdi.ImmediateData = level << 8;
> >
> 
> From the Broadwell PRM, RENDER_SURFACE_STATE::Resource Min LOD:
> 
> For Sampling Engine Surfaces:
> This field indicates the most detailed LOD that is present in the resource
> underlying the surface.
> Refer to the "LOD Computation Pseudocode" section for the use of this field.
> 
> For Other Surfaces:
> This field is ignored.
> 
> I think we can safely leave this field zero since this will only ever be
> ORed into render target surfaces.

I agree that we can leave this as zero, but I should mention that we
don't perform any OR operations with this dword and that this field will
be used with input attachments, which are sampling engine surfaces.

> Grepping through isl_surface_state.c also indicates that we never set
> this field in either GL or Vulkan so it's always zero.
> 
> --Jason
> 
> 

Good find. Additionally, going by the description of the field in the HW
docs, setting it to a non-zero value seems incorrect.

Thanks,
Nanley

> > +if (GEN_VERSIONx10 >= 75) {
> > +   sdi.ImmediateData |= ISL_CHANNEL_SELECT_RED   << 25 |
> > +ISL_CHANNEL_SELECT_GREEN << 22 |
> > +ISL_CHANNEL_SELECT_BLUE  << 19 |
> > +ISL_CHANNEL_SELECT_ALPHA << 16;
> >
> 
> These, however, are needed. :-)
> 
> 
> > +}
> > + }
> > +  }
> > +   }
> > +}
> > +
> >  static void
> >  transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
> >  const struct anv_image *image,
> > @@ -392,7 +456,9 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
> >  VkImageLayout initial_layout,
> >  VkImage

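For reference, the two values discussed in this thread can be computed outside the driver: the byte offset of the per-level clear-state entry (placed right after the aux surface, as in get_fast_clear_state_entry_offset()) and the initial pre-SKL clear dword with the Min LOD field left at zero, as agreed above. This is a sketch under assumptions: plain parameters replace the anv_image fields, the entry size used below is an arbitrary example value, and the channel-select encodings (RED = 4 through ALPHA = 7) are assumed to match ISL's isl_channel_select values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shader-channel-select encodings (matching ISL's
 * isl_channel_select values; ZERO = 0 and ONE = 1 are unused here). */
enum demo_channel_select {
   DEMO_SCS_RED   = 4,
   DEMO_SCS_GREEN = 5,
   DEMO_SCS_BLUE  = 6,
   DEMO_SCS_ALPHA = 7,
};

/* Byte offset of the clear-state entry for `level`: the entries follow the
 * aux surface, one entry_size-sized slot per miplevel. */
static uint32_t
clear_entry_offset(uint32_t image_offset, uint32_t aux_offset,
                   uint32_t aux_size, uint32_t entry_size,
                   unsigned level, uint32_t image_size)
{
   const uint32_t offset = image_offset + aux_offset + aux_size +
                           entry_size * level;
   assert(offset < image_offset + image_size);
   return offset;
}

/* Initial pre-SKL clear dword: the Min LOD field (which the patch filled
 * with level << 8) stays zero, and on HSW+ (GEN_VERSIONx10 >= 75) the
 * identity shader channel selects are set. */
static uint32_t
initial_clear_dword(unsigned gen_x10)
{
   uint32_t dw = 0;

   if (gen_x10 >= 75) {
      dw |= (uint32_t)DEMO_SCS_RED   << 25 |
            (uint32_t)DEMO_SCS_GREEN << 22 |
            (uint32_t)DEMO_SCS_BLUE  << 19 |
            (uint32_t)DEMO_SCS_ALPHA << 16;
   }
   return dw;
}
```

With those encodings the HSW+ dword comes out as 0x09770000, while on IVB (GEN_VERSIONx10 == 70) it stays 0.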
Re: [Mesa-dev] [PATCH v3 11/16] anv/cmd_buffer: Move aux_usage assignment up

2017-07-10 Thread Jason Ekstrand

On Wed, Jun 28, 2017 at 2:14 PM, Nanley Chery  wrote:

> For readability, bring the assignment of CCS closer to the assignment of
> NONE and MCS.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/genX_cmd_buffer.c | 62 ++
> 
>  1 file changed, 30 insertions(+), 32 deletions(-)
>
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> index 49ad41edbd..1aa79c8e7b 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -253,6 +253,36 @@ color_attachment_compute_aux_usage(struct anv_device * device,
>att_state->input_aux_usage = ISL_AUX_USAGE_MCS;
>att_state->fast_clear = false;
>return;
> +   } else if (iview->image->aux_usage == ISL_AUX_USAGE_CCS_E) {
> +  att_state->aux_usage = ISL_AUX_USAGE_CCS_E;
> +  att_state->input_aux_usage = ISL_AUX_USAGE_CCS_E;
>

I'm not sure if this actually improves readability as-is.  The no aux case
and MCS cases are early returns.  Maybe what we want is something like this:

if (aux_surface.isl.size == 0) {
   /* set to none */
   return;
}

switch (aux_usage) {
case MCS:
case CCS_E:
   att_state->aux_usage = iview->image->aux_usage;
   att_state->input_aux_usage = iview->image->aux_usage;
   break;

case NONE:
   assert(samples == 1);
   /* stuff below */
   break;

default:
   unreachable();
}

/* Now we determine whether or not we want to fast-clear */

if (samples > 1) {
   perf_debug();
   fast_clear = false;
   return;
}

/* Other fast clear determination. */

Incidentally, it may be cleaner in the long run if we split this into two
functions: compute_ccs_usage and compute_mcs_usage.

Just thoughts BTW.  I'm not 100% sure how to make this the most readable.
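A compilable version of the control flow suggested above might look like the following. It is only a sketch: the enum and struct are simplified, hypothetical stand-ins for the isl/anv types, has_aux_surface stands for aux_surface.isl.size != 0, an image aux usage of NONE with an aux surface is taken to mean CCS_D (as in the patch's final else branch), and the remaining fast-clear heuristics are replaced by a placeholder.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for isl_aux_usage and anv_attachment_state. */
enum demo_aux_usage { DEMO_AUX_NONE, DEMO_AUX_MCS, DEMO_AUX_CCS_D, DEMO_AUX_CCS_E };

struct demo_att_state {
   enum demo_aux_usage aux_usage;
   enum demo_aux_usage input_aux_usage;
   bool fast_clear;
};

static struct demo_att_state
compute_aux_usage(enum demo_aux_usage image_aux, bool has_aux_surface,
                  unsigned samples, bool format_supports_ccs_e)
{
   struct demo_att_state att = { DEMO_AUX_NONE, DEMO_AUX_NONE, false };

   /* No aux surface: no compression and no fast clears. */
   if (!has_aux_surface)
      return att;

   switch (image_aux) {
   case DEMO_AUX_MCS:
   case DEMO_AUX_CCS_E:
      att.aux_usage = image_aux;
      att.input_aux_usage = image_aux;
      break;
   case DEMO_AUX_NONE:
      /* Single-sampled image with an aux surface but no CCS_E: use CCS_D,
       * and only sample through it if the format supports CCS_E. */
      assert(samples == 1);
      att.aux_usage = DEMO_AUX_CCS_D;
      att.input_aux_usage = format_supports_ccs_e ? DEMO_AUX_CCS_D
                                                  : DEMO_AUX_NONE;
      break;
   default:
      assert(!"unexpected image aux usage");
      return att;
   }

   /* Now decide whether or not to fast-clear. */
   if (samples > 1)
      return att;               /* fast_clear stays false */

   att.fast_clear = true;       /* placeholder for the remaining checks */
   return att;
}
```

Returning early only for the no-aux case keeps the MCS and CCS_E assignments in one place, which is the readability gain being discussed; splitting into compute_ccs_usage() and compute_mcs_usage() would instead move those switch cases into separate functions.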


> +   } else {
> +  att_state->aux_usage = ISL_AUX_USAGE_CCS_D;
> +  /* From the Sky Lake PRM, RENDER_SURFACE_STATE::AuxiliarySurfaceMode:
> +   *
> +   *"If Number of Multisamples is MULTISAMPLECOUNT_1, AUX_CCS_D
> +   *setting is only allowed if Surface Format supported for Fast
> +   *Clear. In addition, if the surface is bound to the sampling
> +   *engine, Surface Format must be supported for Render Target
> +   *Compression for surfaces bound to the sampling engine."
> +   *
> +   * In other words, we can only sample from a fast-cleared image if it
> +   * also supports color compression.
> +   */
> +  if (isl_format_supports_ccs_e(&device->info, iview->isl.format)) {
> + /* TODO: Consider using a heuristic to determine if temporarily enabling
> +  * CCS_E for this image view would be beneficial.
> +  *
> +  * While fast-clear resolves and partial resolves are fairly cheap in the
> +  * case where you render to most of the pixels, full resolves are not
> +  * because they potentially involve reading and writing the entire
> +  * framebuffer.  If we can't texture with CCS_E, we should leave it off and
> +  * limit ourselves to fast clears.
> +  */
> + att_state->input_aux_usage = ISL_AUX_USAGE_CCS_D;
> +  } else {
> + att_state->input_aux_usage = ISL_AUX_USAGE_NONE;
> +  }
> }
>
> assert(iview->image->aux_surface.isl.usage & ISL_SURF_USAGE_CCS_BIT);
> @@ -315,38 +345,6 @@ color_attachment_compute_aux_usage(struct anv_device * device,
> } else {
>att_state->fast_clear = false;
> }
> -
> -   /**
> -* TODO: Consider using a heuristic to determine if temporarily enabling
> -* CCS_E for this image view would be beneficial.
> -*
> -* While fast-clear resolves and partial resolves are fairly cheap in the
> -* case where you render to most of the pixels, full resolves are not
> -* because they potentially involve reading and writing the entire
> -* framebuffer.  If we can't texture with CCS_E, we should leave it off and
> -* limit ourselves to fast clears.
> -*/
> -   if (iview->image->aux_usage == ISL_AUX_USAGE_CCS_E) {
> -  att_state->aux_usage = ISL_AUX_USAGE_CCS_E;
> -  att_state->input_aux_usage = ISL_AUX_USAGE_CCS_E;
> -   } else {
> -  att_state->aux_usage = ISL_AUX_USAGE_CCS_D;
> -  /* From the Sky Lake PRM, RENDER_SURFACE_STATE::AuxiliarySurfaceMode:
> -   *
> -   *"If Number of Multisamples is MULTISAMPLECOUNT_1, AUX_CCS_D
> -   *setting is only allowed if Surface Format supported for Fast
> -   *Clear. In addition, if the surface is bound to the sampling
> -   *engine, Surface Format must be supported for Render Target
> -   *Compression for surfaces bound to the sampling engine."
> -   *
> -   * In other words, we can only sample from a fast-cleared image if it
> -   * also supports color compression.
> -   */
> -  if (isl_format_supports_ccs_e(&device->info, iview->isl.format))
> - att_state->input_aux_usage = ISL_

Re: [Mesa-dev] [PATCH] st/dri: add 32-bit RGBX/RGBA formats

2017-07-10 Thread Chad Versace

On Mon 10 Jul 2017, Chad Versace wrote:
> On Fri 07 Jul 2017, Rob Herring wrote:
> > On Wed, Jul 5, 2017 at 5:14 PM, Chad Versace  wrote:
> > > On Fri 30 Jun 2017, Rob Herring wrote:
> > >> Add support for 32-bit RGBX/RGBA formats which are required for Android.
> > >>
> > >> The original patch (commit ccdcf91104a5) was reverted (commit
> > >> c0c6ca40a25e) in mesa as it broke GLX resulting in swapped colors. Based
> > >> on further investigation by Chad Versace, moving the RGBX/RGBA configs
> > >> to the end is enough to prevent breaking GLX.
> > >>
> > >> Cc: Marek Olšák 
> > >> Cc: Eric Anholt 
> > >> Cc: Chad Versace 
> > >> Cc: Mauro Rossi 
> > >> Signed-off-by: Rob Herring 
> > >> ---
> > >> I've tested only on Android and could use help testing with KDE which
> > >> broke last time. This has been done on the Intel driver and *should* be
> > >> okay, but maybe not.
> > >
> > > Should this patch also update the switch statement in
> > > dri2.c:dri2_drawable_get_buffers()? I think so, but am not certain.
> > 
> > I don't know. At least for Android, I think we'd always take the
> > dri_image_drawable_get_buffers path which already has the formats.
> 
> True, I think Android always takes the dri_image_drawable_get_buffers()
> path. It wouldn't hurt to also add the formats to
> dri2_drawable_get_buffers(), but I doubt that function will ever see the
> new formats.
> 
> Reviewed-by: Chad Versace 

Oops. I retract my r-b.

dri_create_context() and dri_create_buffer() call dri_fill_st_visual(),
but dri_fill_st_visual() hasn't yet been taught about the new formats.
But I don't understand Gallium well enough to know when those functions
get called.

Re: [Mesa-dev] [PATCH v2 05/73] nir: add nir_lower_uniforms_to_ubo pass

2017-07-10 Thread Eric Anholt

Nicolai Hähnle  writes:

> From: Nicolai Hähnle 
>
> This is a further lowering of default-block uniform loads that transforms
> load_uniform intrinsics into load_ubo intrinsics. This simplies the rest

"simplifies"

> of the backend.

I don't think I'll be able to use it, but this seems like a reasonable
tool to have around.

Reviewed-by: Eric Anholt 



Re: [Mesa-dev] [RFC 01/22] RFC: egl/x11: Support DRI3 v1.1

2017-07-10 Thread Louis-Francis Ratté-Boulianne

Hi,

On Tue, 2017-06-20 at 15:19 +0100, Emil Velikov wrote:
> > +for (i = 0; i < count; i++) {
> > +   modifiers[i] = (uint64_t) mod_parts[i * 2] << 32;
> > +   modifiers[i] |= (uint64_t) mod_parts[i * 2 + 1] & 0xff;
> > +}
> > + }
> > +
> > + free(mod_reply);
> > +
> > + buffer->image = draw->ext->image->createImageWithModifiers(draw->dri_screen,
> > +   width, height,
> > +   format,
> > +   modifiers, count,
> > +   buffer);
> > + free(modifiers);
> > +  }
> > +#endif
> > +
> > +  if (!buffer->image)
> 
> Does not align with all the error paths above. There we bail out, yet
> here we fall back to the old extension.
> Perhaps change the former to come here? One could even move that whole
> hunk into a separate function.

The rationale is that we only try to create a buffer with the supported
modifiers. If it doesn't work, it's still sensible to try the old path
as it's better to have a DRI image without any optimization than none.
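A minimal sketch of the fallback order described above, assuming two hypothetical creation functions, `create_with_modifiers()` and `create_plain()`, that stand in for the modifier-aware and legacy DRI image paths. These names are illustrative only; they are not the loader's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image type; has_modifiers records which path produced it. */
struct image {
   int has_modifiers;
};

static struct image modifier_image = { 1 };
static struct image plain_image = { 0 };

/* Hypothetical modifier-aware path: fails when no modifiers are given,
 * modelling "no usable modifier". */
static struct image *create_with_modifiers(const uint64_t *modifiers,
                                           unsigned count)
{
   (void) modifiers;
   return count ? &modifier_image : NULL;
}

/* Hypothetical legacy path without modifiers. */
static struct image *create_plain(void)
{
   return &plain_image;
}

/* Try the optimized path first; an unoptimized image beats no image. */
static struct image *create_buffer_image(const uint64_t *modifiers,
                                         unsigned count)
{
   struct image *img = create_with_modifiers(modifiers, count);

   if (!img)
      img = create_plain();
   return img;
}
```

A failure on the modifier path is not treated as an error here; it only selects the slower path, which matches the rationale given in the reply.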

> +   }
> > 
> > -   xcb_dri3_pixmap_from_buffer(draw->conn,
> > -   (pixmap = xcb_generate_id(draw->conn)),
> > -   draw->drawable,
> > -   buffer->size,
> > -   width, height, buffer->pitch,
> > -   depth, buffer->cpp * 8,
> > -   buffer_fd);
> > +#if XCB_DRI3_MAJOR_VERSION > 1 || XCB_DRI3_MINOR_VERSION >= 1
> > +   if (draw->multiplanes_available) {
> 
> This else looks a bit odd. If we fail to manage multiple buffers
> above, multiplanes_available will still be true, yet we could have a
> DRIImage.
> We should track that (modify multiplanes_available/other) and act
> accordingly here.

What can fail above (and not be critical) is creating an image with
modifiers. I grant you that it means the resulting image will only have
one plane, but that doesn't make it "bad" to use
xcb_dri3_pixmap_from_buffers. multiplanes_available simply means that
the X server actually supports using multiple planes.

The rest of your comments have been addressed and I will post the new
RFC (v2) soon. Thank you for the review.

--
Louis-Francis

Re: [Mesa-dev] [PATCH v2 07/73] st/glsl_to_nir: fix the case where NIR clone testing is enabled

2017-07-10 Thread Eric Anholt

Nicolai Hähnle  writes:

> From: Nicolai Hähnle 
>
> In that case, prog->nir must be assigned at the end.

Reviewed-by: Eric Anholt 



Re: [Mesa-dev] [PATCH] st/dri: add 32-bit RGBX/RGBA formats

2017-07-10 Thread Chad Versace

On Fri 07 Jul 2017, Rob Herring wrote:
> On Wed, Jul 5, 2017 at 5:14 PM, Chad Versace  wrote:
> > On Fri 30 Jun 2017, Rob Herring wrote:
> >> Add support for 32-bit RGBX/RGBA formats which are required for Android.
> >>
> >> The original patch (commit ccdcf91104a5) was reverted (commit
> >> c0c6ca40a25e) in mesa as it broke GLX resulting in swapped colors. Based
> >> on further investigation by Chad Versace, moving the RGBX/RGBA configs
> >> to the end is enough to prevent breaking GLX.
> >>
> >> Cc: Marek Olšák 
> >> Cc: Eric Anholt 
> >> Cc: Chad Versace 
> >> Cc: Mauro Rossi 
> >> Signed-off-by: Rob Herring 
> >> ---
> >> I've tested only on Android and could use help testing with KDE which
> >> broke last time. This has been done on the Intel driver and *should* be
> >> okay, but maybe not.
> >
> > Should this patch also update the switch statement in
> > dri2.c:dri2_drawable_get_buffers()? I think so, but am not certain.
> 
> I don't know. At least for Android, I think we'd always take the
> dri_image_drawable_get_buffers path which already has the formats.

True, I think Android always takes the dri_image_drawable_get_buffers()
path. It wouldn't hurt to also add the formats to
dri2_drawable_get_buffers(), but I doubt that function will ever see the
new formats.

Reviewed-by: Chad Versace 


[Mesa-dev] [PATCH 06/10] radeonsi: expose ARB_timer_query unconditionally

2017-07-10 Thread Marek Olšák

From: Marek Olšák 

clock_crystal_freq is always non-zero now.
---
 src/gallium/drivers/radeonsi/si_pipe.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index 371d337..e2ec377 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -502,20 +502,22 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
case PIPE_CAP_CULL_DISTANCE:
case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
case PIPE_CAP_TGSI_CAN_READ_OUTPUTS:
case PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY:
case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
case PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS:
case PIPE_CAP_DOUBLES:
case PIPE_CAP_TGSI_TEX_TXF_LZ:
case PIPE_CAP_TGSI_TES_LAYER_VIEWPORT:
case PIPE_CAP_BINDLESS_TEXTURE:
+   case PIPE_CAP_QUERY_TIMESTAMP:
+   case PIPE_CAP_QUERY_TIME_ELAPSED:
return 1;
 
case PIPE_CAP_INT64:
case PIPE_CAP_INT64_DIVMOD:
case PIPE_CAP_TGSI_CLOCK:
case PIPE_CAP_CAN_BIND_CONST_BUFFER_AS_VERTEX:
case PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION:
return 1;
 
case PIPE_CAP_TGSI_VOTE:
@@ -638,25 +640,20 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
/* textures support 8192, but layered rendering supports 2048 */
return 2048;
 
/* Viewports and render targets. */
case PIPE_CAP_MAX_VIEWPORTS:
return R600_MAX_VIEWPORTS;
case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
case PIPE_CAP_MAX_RENDER_TARGETS:
return 8;
 
-   /* Timer queries, present when the clock frequency is non zero. */
-   case PIPE_CAP_QUERY_TIMESTAMP:
-   case PIPE_CAP_QUERY_TIME_ELAPSED:
-   return sscreen->b.info.clock_crystal_freq != 0;
-
case PIPE_CAP_MIN_TEXTURE_GATHER_OFFSET:
case PIPE_CAP_MIN_TEXEL_OFFSET:
return -32;
 
case PIPE_CAP_MAX_TEXTURE_GATHER_OFFSET:
case PIPE_CAP_MAX_TEXEL_OFFSET:
return 31;
 
case PIPE_CAP_ENDIANNESS:
return PIPE_ENDIAN_LITTLE;
-- 
2.7.4


[Mesa-dev] [PATCH 10/10] radeonsi/gfx9: add VM fault dmesg parser support

2017-07-10 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_debug.c | 29 +++--
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_debug.c b/src/gallium/drivers/radeonsi/si_debug.c
index 0d26ce5..06dea61 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -859,21 +859,21 @@ static void si_dump_dma(struct si_context *sctx,
for (i = 0; i < saved->num_dw; ++i) {
fprintf(f, " %08x\n", saved->ib[i]);
}
 
fprintf(f, "--- %s end ---\n", ib_name);
fprintf(f, "\n");
 
fprintf(f, "SDMA Dump Done.\n");
 }
 
-static bool si_vm_fault_occured(struct si_context *sctx, uint32_t *out_addr)
+static bool si_vm_fault_occured(struct si_context *sctx, uint64_t *out_addr)
 {
char line[2000];
unsigned sec, usec;
int progress = 0;
uint64_t timestamp = 0;
bool fault = false;
 
FILE *p = popen("dmesg", "r");
if (!p)
return false;
@@ -914,32 +914,49 @@ static bool si_vm_fault_occured(struct si_context *sctx, uint32_t *out_addr)
line[len-1] = 0;
 
/* Get the message part. */
msg = strchr(line, ']');
if (!msg) {
assert(0);
continue;
}
msg++;
 
+   const char *header_line, *addr_line_prefix, *addr_line_format;
+
+   if (sctx->b.chip_class >= GFX9) {
+   /* Match this:
+* ..: [gfxhub] VMC page fault (src_id:0 ring:158 vm_id:2 pas_id:0)
+* ..:   at page 0x000219f8f000 from 27
+* ..: VM_L2_PROTECTION_FAULT_STATUS:0x0020113C
+*/
+   header_line = "VMC page fault";
+   addr_line_prefix = "   at page";
+   addr_line_format = "%"PRIx64;
+   } else {
+   header_line = "GPU fault detected:";
+   addr_line_prefix = "VM_CONTEXT1_PROTECTION_FAULT_ADDR";
+   addr_line_format = "%"PRIX64;
+   }
+
switch (progress) {
case 0:
-   if (strstr(msg, "GPU fault detected:"))
+   if (strstr(msg, header_line))
progress = 1;
break;
case 1:
-   msg = strstr(msg, "VM_CONTEXT1_PROTECTION_FAULT_ADDR");
+   msg = strstr(msg, addr_line_prefix);
if (msg) {
msg = strstr(msg, "0x");
if (msg) {
msg += 2;
-   if (sscanf(msg, "%X", out_addr) == 1)
+   if (sscanf(msg, addr_line_format, out_addr) == 1)
fault = true;
}
}
progress = 0;
break;
default:
progress = 0;
}
}
pclose(p);
@@ -948,37 +965,37 @@ static bool si_vm_fault_occured(struct si_context *sctx, uint32_t *out_addr)
sctx->dmesg_timestamp = timestamp;
return fault;
 }
 
 void si_check_vm_faults(struct r600_common_context *ctx,
struct radeon_saved_cs *saved, enum ring_type ring)
 {
struct si_context *sctx = (struct si_context *)ctx;
struct pipe_screen *screen = sctx->b.b.screen;
FILE *f;
-   uint32_t addr;
+   uint64_t addr;
char cmd_line[4096];
 
if (!si_vm_fault_occured(sctx, &addr))
return;
 
f = dd_get_debug_file(false);
if (!f)
return;
 
fprintf(f, "VM fault report.\n\n");
if (os_get_command_line(cmd_line, sizeof(cmd_line)))
fprintf(f, "Command: %s\n", cmd_line);
fprintf(f, "Driver vendor: %s\n", screen->get_vendor(screen));
fprintf(f, "Device vendor: %s\n", screen->get_device_vendor(screen));
fprintf(f, "Device name: %s\n\n", screen->get_name(screen));
-   fprintf(f, "Failing VM page: 0x%08x\n\n", addr);
+   fprintf(f, "Failing VM page: 0x%08"PRIx64"\n\n", addr);
 
if (sctx->apitrace_call_number)
fprintf(f, "Last apitrace call: %u\n\n",
sctx->apitrace_call_number);
 
switch (ring) {
case RING_GFX:
si_dump_debug_state(&sctx->b.b, f,
PIPE_DUMP_CURRENT_STATES |
PIPE_DUMP_CURRENT_SHADERS |
-- 
2.7.4
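The address-parsing step can be tried in isolation. This is a simplified stand-alone sketch, not the driver's function: `parse_fault_addr()` is a hypothetical helper, and it uses `SCNx64`, the `<inttypes.h>` macro intended for the scanf family. Because scanf's `%x` also accepts upper-case hex digits, one format covers both the GFX9 and the pre-GFX9 line. The sample GFX9 address 0x000219f8f000 needs more than 32 bits, which is why the out-parameter becomes `uint64_t`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find `prefix` in `msg`, then parse the hex number that follows "0x".
 * Returns true and stores the value in *out_addr on success. */
static bool parse_fault_addr(const char *msg, const char *prefix,
                             uint64_t *out_addr)
{
   assert(msg && prefix && out_addr);

   const char *p = strstr(msg, prefix);
   if (!p)
      return false;
   p = strstr(p, "0x");
   if (!p)
      return false;
   p += 2; /* skip "0x" */

   /* %x in scanf accepts both "abc" and "ABC" digits. */
   return sscanf(p, "%" SCNx64, out_addr) == 1;
}
```

For example, the GFX9 line `..:   at page 0x000219f8f000 from 27` with prefix `"   at page"` yields 0x219f8f000.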


[Mesa-dev] [PATCH 08/10] radeonsi: prevent a deadlock in util_queue_add_job with too many GL contexts

2017-07-10 Thread Marek Olšák

From: Marek Olšák 

If the queue is full, util_queue_add_job will wait while bo_fence_lock is
held.

If pb_slab wants to reuse a buffer, it will lock the pb_slab mutex and
try to check BO fence busyness, but it has to wait for bo_fence_lock to get
released. Both bo_fence_lock and pb_slab mutex are locked now.

When the CS thread unreferences and releases a suballocated buffer,
it will try to lock the pb_slab mutex and has to wait. The CS thread
can't finish its job in order to free a queue slot and unblock
util_queue_add_job ==> deadlock.
---
 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
index 30f4dfb..837c1e2 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
@@ -309,21 +309,22 @@ amdgpu_winsys_create(int fd, unsigned flags,
ws->base.get_chip_name = amdgpu_get_chip_name;
 
amdgpu_bo_init_functions(ws);
amdgpu_cs_init_functions(ws);
amdgpu_surface_init_functions(ws);
 
LIST_INITHEAD(&ws->global_bo_list);
(void) mtx_init(&ws->global_bo_list_lock, mtx_plain);
(void) mtx_init(&ws->bo_fence_lock, mtx_plain);
 
-   if (!util_queue_init(&ws->cs_queue, "amdgpu_cs", 8, 1, 0)) {
+   if (!util_queue_init(&ws->cs_queue, "amdgpu_cs", 8, 1,
+UTIL_QUEUE_INIT_RESIZE_IF_FULL)) {
   amdgpu_winsys_destroy(&ws->base);
   mtx_unlock(&dev_tab_mutex);
   return NULL;
}
 
/* Create the screen at the end. The winsys must be initialized
 * completely.
 *
 * Alternatively, we could create the screen based on "ws->gen"
 * and link all drivers into one binary blob. */
-- 
2.7.4


[Mesa-dev] [PATCH 03/10] radeonsi: prevent a crash with DBG_CHECK_VM and u_threaded_context

2017-07-10 Thread Marek Olšák

From: Marek Olšák 

Fix it by setting PIPE_CONTEXT_DEBUG in the caller, si_pipe_create_context(),
instead of in si_create_context().
---
 src/gallium/drivers/radeonsi/si_pipe.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index 8a4bc41..371d337 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -158,23 +158,20 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen,
   unsigned flags)
 {
struct si_context *sctx = CALLOC_STRUCT(si_context);
struct si_screen* sscreen = (struct si_screen *)screen;
struct radeon_winsys *ws = sscreen->b.ws;
int shader, i;
 
if (!sctx)
return NULL;
 
-   if (sscreen->b.debug_flags & DBG_CHECK_VM)
-   flags |= PIPE_CONTEXT_DEBUG;
-
if (flags & PIPE_CONTEXT_DEBUG)
sscreen->record_llvm_ir = true; /* racy but not critical */
 
sctx->b.b.screen = screen; /* this must be set first */
sctx->b.b.priv = NULL;
sctx->b.b.destroy = si_destroy_context;
sctx->b.b.emit_string_marker = si_emit_string_marker;
sctx->b.set_atom_dirty = (void *)si_set_atom_dirty;
sctx->screen = sscreen; /* Easy accessing of screen/winsys. */
sctx->is_debug = (flags & PIPE_CONTEXT_DEBUG) != 0;
@@ -371,21 +368,26 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen,
 fail:
fprintf(stderr, "radeonsi: Failed to create a context.\n");
si_destroy_context(&sctx->b.b);
return NULL;
 }
 
 static struct pipe_context *si_pipe_create_context(struct pipe_screen *screen,
   void *priv, unsigned flags)
 {
struct si_screen *sscreen = (struct si_screen *)screen;
-   struct pipe_context *ctx = si_create_context(screen, flags);
+   struct pipe_context *ctx;
+
+   if (sscreen->b.debug_flags & DBG_CHECK_VM)
+   flags |= PIPE_CONTEXT_DEBUG;
+
+   ctx = si_create_context(screen, flags);
 
if (!(flags & PIPE_CONTEXT_PREFER_THREADED))
return ctx;
 
/* Clover (compute-only) is unsupported.
 *
 * Since the threaded context creates shader states from the non-driver
 * thread, asynchronous compilation is required for create_{shader}_-
 * state not to use pipe_context. Debug contexts (ddebug) disable
 * asynchronous compilation, so don't use the threaded context with
-- 
2.7.4


[Mesa-dev] [PATCH 09/10] radeonsi: automatically resize shader compiler thread queues when they are full

2017-07-10 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_pipe.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index e2ec377..4df60b6 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -954,35 +954,31 @@ struct pipe_screen *radeonsi_screen_create(struct radeon_winsys *ws,
/* Only enable as many threads as we have target machines, but at most
 * the number of CPUs - 1 if there is more than one.
 */
num_threads = sysconf(_SC_NPROCESSORS_ONLN);
num_threads = MAX2(1, num_threads - 1);
num_compiler_threads = MIN2(num_threads, ARRAY_SIZE(sscreen->tm));
num_compiler_threads_lowprio =
MIN2(num_threads, ARRAY_SIZE(sscreen->tm_low_priority));
 
if (!util_queue_init(&sscreen->shader_compiler_queue, "si_shader",
-32, num_compiler_threads, 0)) {
+32, num_compiler_threads,
+UTIL_QUEUE_INIT_RESIZE_IF_FULL)) {
si_destroy_shader_cache(sscreen);
FREE(sscreen);
return NULL;
}
 
-   /* The queue must be large enough so that adding optimized shaders
-* doesn't stall draw calls when the queue is full. Especially varying
-* packing generates a very high volume of optimized shader compilation
-* jobs.
-*/
if (!util_queue_init(&sscreen->shader_compiler_queue_low_priority,
 "si_shader_low",
-1024, num_compiler_threads,
-UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY)) {
+32, num_compiler_threads,
+UTIL_QUEUE_INIT_RESIZE_IF_FULL)) {
   si_destroy_shader_cache(sscreen);
   FREE(sscreen);
   return NULL;
}
 
si_handle_env_var_force_family(sscreen);
 
if (!debug_get_bool_option("RADEON_DISABLE_PERFCOUNTERS", false))
si_init_perfcounters(sscreen);
 
-- 
2.7.4


[Mesa-dev] [PATCH 05/10] ac/gpu_info: if clock crystal frequency is 0, print an error and set 1

2017-07-10 Thread Marek Olšák

From: Marek Olšák 

During bring-up, this is often 0. Prevent automatic disablement of
ARB_timer_query and demotion of the OpenGL version to 3.2 by setting
a non-zero frequency. Print an error message instead.
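To see why a zero frequency matters, here is a minimal sketch of converting GPU ticks to nanoseconds. It assumes the frequency is reported in kHz, so ns = ticks * 1000000 / freq; that unit and both helper names are assumptions for illustration, not code from this patch. A zero frequency would make such a conversion divide by zero; clamping it to 1 keeps the math defined, but, as the error message says, the timestamps will be wrong.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the clamp in the patch: never return 0. */
static uint32_t sanitize_clock_freq(uint32_t freq)
{
   if (!freq) {
      fprintf(stderr, "clock crystal frequency is 0, timestamps will be wrong\n");
      return 1;
   }
   return freq;
}

/* Assumption: freq_khz is in kHz, so one tick is 1e6 / freq_khz ns. */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t freq_khz)
{
   assert(freq_khz != 0); /* guaranteed by sanitize_clock_freq() */
   return ticks * 1000000u / freq_khz;
}
```

With a 27000 kHz counter, 27000 ticks convert to 1000000 ns (1 ms).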
---
 src/amd/common/ac_gpu_info.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/amd/common/ac_gpu_info.c b/src/amd/common/ac_gpu_info.c
index 3f39a08..ced7183 100644
--- a/src/amd/common/ac_gpu_info.c
+++ b/src/amd/common/ac_gpu_info.c
@@ -253,20 +253,24 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
info->max_sh_per_se = amdinfo->num_shader_arrays_per_engine;
info->has_hw_decode =
(uvd.available_rings != 0) || (vcn_dec.available_rings != 0);
info->uvd_fw_version =
uvd.available_rings ? uvd_version : 0;
info->vce_fw_version =
vce.available_rings ? vce_version : 0;
info->has_userptr = true;
info->num_render_backends = amdinfo->rb_pipes;
info->clock_crystal_freq = amdinfo->gpu_counter_freq;
+   if (!info->clock_crystal_freq) {
+   fprintf(stderr, "amdgpu: clock crystal frequency is 0, timestamps will be wrong\n");
+   info->clock_crystal_freq = 1;
+   }
info->tcc_cache_line_size = 64; /* TC L2 line size on GCN */
if (info->chip_class == GFX9) {
info->num_tile_pipes = 1 << G_0098F8_NUM_PIPES(amdinfo->gb_addr_cfg);
info->pipe_interleave_bytes =
256 << G_0098F8_PIPE_INTERLEAVE_SIZE_GFX9(amdinfo->gb_addr_cfg);
} else {
info->num_tile_pipes = cik_get_num_tile_pipes(amdinfo);
info->pipe_interleave_bytes =
256 << G_0098F8_PIPE_INTERLEAVE_SIZE_GFX6(amdinfo->gb_addr_cfg);
}
-- 
2.7.4


[Mesa-dev] [PATCH 07/10] util/u_queue: add an option to resize the queue when it's full

2017-07-10 Thread Marek Olšák

From: Marek Olšák 

Consider the following situation:
  mtx_lock(mutex);
  do_something();
  util_queue_add_job(...);
  mtx_unlock(mutex);

If the queue is full, util_queue_add_job will wait for a free slot.
If the job which is currently being executed tries to lock the mutex,
it will be stuck forever, because util_queue_add_job is stuck.

The deadlock can be trivially resolved by increasing the queue size
(reallocating the queue) in util_queue_add_job if the queue is full.
Then util_queue_add_job becomes wait-free.

radeonsi will use it.
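The grow-instead-of-wait step can be illustrated with a simplified, single-threaded ring buffer. This is a sketch, not u_queue itself: it stores ints instead of `struct util_queue_job`, omits the lock (in `util_queue_add_job()` the caller already holds `queue->lock`), and the `ring_*` names are hypothetical. Like the patch, it grows by 8 slots and copies the queued entries, starting at `read_idx`, to the front of the new array in FIFO order.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical FIFO of ints; any locking is the caller's job. */
struct ring {
   int *items;
   unsigned max, count, read_idx, write_idx;
};

static int ring_init(struct ring *r, unsigned max)
{
   assert(max > 0);
   r->items = calloc(max, sizeof(*r->items));
   r->max = max;
   r->count = r->read_idx = r->write_idx = 0;
   return r->items != NULL;
}

/* Called only when full, so read_idx == write_idx and the do-while
 * visits every slot exactly once, oldest entry first. */
static int ring_grow(struct ring *r)
{
   unsigned new_max = r->max + 8;
   int *items = calloc(new_max, sizeof(*items));
   unsigned n = 0, i = r->read_idx;

   if (!items)
      return 0;
   do {
      items[n++] = r->items[i];
      i = (i + 1) % r->max;
   } while (i != r->write_idx);
   assert(n == r->count);

   free(r->items);
   r->items = items;
   r->read_idx = 0;
   r->write_idx = n;
   r->max = new_max;
   return 1;
}

/* Never waits for a free slot: grows the ring when it is full. */
static int ring_push(struct ring *r, int value)
{
   if (r->count == r->max && !ring_grow(r))
      return 0;
   r->items[r->write_idx] = value;
   r->write_idx = (r->write_idx + 1) % r->max;
   r->count++;
   return 1;
}

static int ring_pop(struct ring *r, int *out)
{
   if (r->count == 0)
      return 0;
   *out = r->items[r->read_idx];
   r->read_idx = (r->read_idx + 1) % r->max;
   r->count--;
   return 1;
}
```

Unlike the patch, which only asserts that calloc succeeded, the sketch reports an allocation failure to the caller.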
---
 src/util/u_queue.c | 37 ++---
 src/util/u_queue.h |  2 ++
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/src/util/u_queue.c b/src/util/u_queue.c
index cb59030..49361c3 100644
--- a/src/util/u_queue.c
+++ b/src/util/u_queue.c
@@ -197,20 +197,21 @@ bool
 util_queue_init(struct util_queue *queue,
 const char *name,
 unsigned max_jobs,
 unsigned num_threads,
 unsigned flags)
 {
unsigned i;
 
memset(queue, 0, sizeof(*queue));
queue->name = name;
+   queue->flags = flags;
queue->num_threads = num_threads;
queue->max_jobs = max_jobs;
 
queue->jobs = (struct util_queue_job*)
  calloc(max_jobs, sizeof(struct util_queue_job));
if (!queue->jobs)
   goto fail;
 
(void) mtx_init(&queue->lock, mtx_plain);
 
@@ -322,23 +323,53 @@ util_queue_add_job(struct util_queue *queue,
   /* well no good option here, but any leaks will be
* short-lived as things are shutting down..
*/
   return;
}
 
fence->signalled = false;
 
assert(queue->num_queued >= 0 && queue->num_queued <= queue->max_jobs);
 
-   /* if the queue is full, wait until there is space */
-   while (queue->num_queued == queue->max_jobs)
-  cnd_wait(&queue->has_space_cond, &queue->lock);
+   if (queue->num_queued == queue->max_jobs) {
+  if (queue->flags & UTIL_QUEUE_INIT_RESIZE_IF_FULL) {
+ /* If the queue is full, make it larger to avoid waiting for a free
+  * slot.
+  */
+ unsigned new_max_jobs = queue->max_jobs + 8;
+ struct util_queue_job *jobs =
+(struct util_queue_job*)calloc(new_max_jobs,
+   sizeof(struct util_queue_job));
+ assert(jobs);
+
+ /* Copy all queued jobs into the new list. */
+ unsigned num_jobs = 0;
+ unsigned i = queue->read_idx;
+
+ do {
+jobs[num_jobs++] = queue->jobs[i];
+i = (i + 1) % queue->max_jobs;
+ } while (i != queue->write_idx);
+
+ assert(num_jobs == queue->num_queued);
+
+ free(queue->jobs);
+ queue->jobs = jobs;
+ queue->read_idx = 0;
+ queue->write_idx = num_jobs;
+ queue->max_jobs = new_max_jobs;
+  } else {
+ /* Wait until there is a free slot. */
+ while (queue->num_queued == queue->max_jobs)
+cnd_wait(&queue->has_space_cond, &queue->lock);
+  }
+   }
 
ptr = &queue->jobs[queue->write_idx];
assert(ptr->job == NULL);
ptr->job = job;
ptr->fence = fence;
ptr->execute = execute;
ptr->cleanup = cleanup;
queue->write_idx = (queue->write_idx + 1) % queue->max_jobs;
 
queue->num_queued++;
diff --git a/src/util/u_queue.h b/src/util/u_queue.h
index edd6bab..ff713ae 100644
--- a/src/util/u_queue.h
+++ b/src/util/u_queue.h
@@ -36,20 +36,21 @@
 #include 
 
 #include "util/list.h"
 #include "util/u_thread.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
 #define UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY  (1 << 0)
+#define UTIL_QUEUE_INIT_RESIZE_IF_FULL        (1 << 1)
 
 /* Job completion fence.
  * Put this into your job structure.
  */
 struct util_queue_fence {
mtx_t mutex;
cnd_t cond;
int signalled;
 };
 
@@ -62,20 +63,21 @@ struct util_queue_job {
util_queue_execute_func cleanup;
 };
 
 /* Put this into your context. */
 struct util_queue {
const char *name;
mtx_t lock;
cnd_t has_queued_cond;
cnd_t has_space_cond;
thrd_t *threads;
+   unsigned flags;
int num_queued;
unsigned num_threads;
int kill_threads;
int max_jobs;
int write_idx, read_idx; /* ring buffer pointers */
struct util_queue_job *jobs;
 
/* for cleanup at exit(), protected by exit_mutex */
struct list_head head;
 };
-- 
2.7.4


[Mesa-dev] [PATCH 02/10] ac/surface/gfx9: flags.texture currently refers to TC-compatible HTILE

2017-07-10 Thread Marek Olšák

From: Marek Olšák 

This should lead to better MSAA performance on GFX9.
---
 src/amd/common/ac_surface.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c
index a4df595..5f38205 100644
--- a/src/amd/common/ac_surface.c
+++ b/src/amd/common/ac_surface.c
@@ -940,21 +940,23 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
default:
assert(0);
}
} else {
AddrSurfInfoIn.bpp = surf->bpe * 8;
}
 
AddrSurfInfoIn.flags.color = !(surf->flags & RADEON_SURF_Z_OR_SBUFFER);
AddrSurfInfoIn.flags.depth = (surf->flags & RADEON_SURF_ZBUFFER) != 0;
AddrSurfInfoIn.flags.display = (surf->flags & RADEON_SURF_SCANOUT) != 0;
-   AddrSurfInfoIn.flags.texture = 1;
+   /* flags.texture currently refers to TC-compatible HTILE */
+   AddrSurfInfoIn.flags.texture = AddrSurfInfoIn.flags.color ||
+  surf->flags & RADEON_SURF_TC_COMPATIBLE_HTILE;
AddrSurfInfoIn.flags.opt4space = 1;
 
AddrSurfInfoIn.numMipLevels = config->info.levels;
AddrSurfInfoIn.numSamples = config->info.samples ? config->info.samples : 1;
AddrSurfInfoIn.numFrags = AddrSurfInfoIn.numSamples;
 
/* GFX9 doesn't support 1D depth textures, so allocate all 1D textures
 * as 2D to avoid having shader variants for 1D vs 2D, so all shaders
 * must sample 1D textures as 2D. */
if (config->is_3d)
-- 
2.7.4


[Mesa-dev] [PATCH 04/10] radeonsi/gfx9: don't read back non-existent register SRBM_STATUS2

2017-07-10 Thread Marek Olšák

From: Marek Olšák 

It looks like there is no way to monitor SDMA busyness on GFX9.
---
 src/gallium/drivers/radeon/r600_gpu_load.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/r600_gpu_load.c b/src/gallium/drivers/radeon/r600_gpu_load.c
index 3b45545..d8f7c3d 100644
--- a/src/gallium/drivers/radeon/r600_gpu_load.c
+++ b/src/gallium/drivers/radeon/r600_gpu_load.c
@@ -98,21 +98,21 @@ static void r600_update_mmio_counters(struct r600_common_screen *rscreen,
UPDATE_COUNTER(spi, SPI_BUSY);
UPDATE_COUNTER(bci, BCI_BUSY);
UPDATE_COUNTER(sc, SC_BUSY);
UPDATE_COUNTER(pa, PA_BUSY);
UPDATE_COUNTER(db, DB_BUSY);
UPDATE_COUNTER(cp, CP_BUSY);
UPDATE_COUNTER(cb, CB_BUSY);
UPDATE_COUNTER(gui, GUI_ACTIVE);
gui_busy = GUI_ACTIVE(value);
 
-   if (rscreen->chip_class >= CIK) {
+   if (rscreen->chip_class == CIK || rscreen->chip_class == VI) {
/* SRBM_STATUS2 */
rscreen->ws->read_registers(rscreen->ws, SRBM_STATUS2, 1, &value);
 
UPDATE_COUNTER(sdma, SDMA_BUSY);
sdma_busy = SDMA_BUSY(value);
}
 
if (rscreen->chip_class >= VI) {
/* CP_STAT */
rscreen->ws->read_registers(rscreen->ws, CP_STAT, 1, &value);
-- 
2.7.4


[Mesa-dev] [PATCH 01/10] radeonsi: simplify computation of tessellation offchip buffers

2017-07-10 Thread Marek Olšák

From: Marek Olšák 

This is overly cautious, but better safe than sorry.
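The arithmetic behind the change can be checked with a small model of the old and new formulas. The function names are hypothetical, and `is_si`/`double_buffers` stand in for the `chip_class` and family checks in the quoted code (SI never gets double offchip buffers there, since `chip_class >= CIK` is false). With 2 SEs on SI and 4 SEs on CIK and later, the new per-SE count reproduces the old caps of 126 and 508 exactly; whenever the old value was not clamped, the new one is lower by one buffer per SE.

```c
#include <assert.h>
#include <stdbool.h>

/* Before the patch: 128 or 64 per SE, then clamped per generation. */
static unsigned old_max_offchip(bool is_si, bool double_buffers,
                                unsigned max_se)
{
   unsigned n = (double_buffers ? 128 : 64) * max_se;
   unsigned cap = is_si ? 126 : 508;

   return n < cap ? n : cap;
}

/* After the patch: one less per SE, no generation-specific clamp. */
static unsigned new_max_offchip(bool double_buffers, unsigned max_se)
{
   return (double_buffers ? 127 : 63) * max_se;
}
```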
---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index f1170be..619ad9f 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -2935,51 +2935,38 @@ static bool si_update_spi_tmpring_size(struct si_context *sctx)
si_mark_atom_dirty(sctx, &sctx->scratch_state);
}
return true;
 }
 
 static void si_init_tess_factor_ring(struct si_context *sctx)
 {
bool double_offchip_buffers = sctx->b.chip_class >= CIK &&
  sctx->b.family != CHIP_CARRIZO &&
  sctx->b.family != CHIP_STONEY;
-   unsigned max_offchip_buffers_per_se = double_offchip_buffers ? 128 : 64;
+   /* This must be one less than the maximum number due to a hw limitation. */
+   unsigned max_offchip_buffers_per_se = double_offchip_buffers ? 127 : 63;
unsigned max_offchip_buffers = max_offchip_buffers_per_se *
   sctx->screen->b.info.max_se;
unsigned offchip_granularity;
 
switch (sctx->screen->tess_offchip_block_dw_size) {
default:
assert(0);
/* fall through */
case 8192:
offchip_granularity = V_03093C_X_8K_DWORDS;
break;
case 4096:
offchip_granularity = V_03093C_X_4K_DWORDS;
break;
}
 
-   switch (sctx->b.chip_class) {
-   case SI:
-   max_offchip_buffers = MIN2(max_offchip_buffers, 126);
-   break;
-   case CIK:
-   case VI:
-   case GFX9:
-   max_offchip_buffers = MIN2(max_offchip_buffers, 508);
-   break;
-   default:
-   assert(0);
-   return;
-   }
-
assert(!sctx->tf_ring);
/* Use 64K alignment for both rings, so that we can pass the address
 * to shaders as one SGPR containing bits [16:47].
 */
sctx->tf_ring = r600_aligned_buffer_create(sctx->b.b.screen,
   R600_RESOURCE_FLAG_UNMAPPABLE,
   PIPE_USAGE_DEFAULT,
   32768 * sctx->screen->b.info.max_se,
   64 * 1024);
if (!sctx->tf_ring)
-- 
2.7.4


Re: [Mesa-dev] [PATCH v2 09/73] st/mesa: get rid of st_glsl_types

2017-07-10 Thread Eric Anholt

Nicolai Hähnle  writes:

> From: Nicolai Hähnle 
>
> It's a duplicate of glsl_type::count_attribute_slots.

Reviewed-by: Eric Anholt 

It's a bit unfortunate to duplicate the little wrapper function
everywhere, but I know st_glsl_type_size() has been a pain for linking,
and deleting this much code is great.


Re: [Mesa-dev] [PATCH 11/11] i965: Use pushed UBO data in the scalar backend.

2017-07-10 Thread Kenneth Graunke

On Monday, July 10, 2017 11:59:45 AM PDT Matt Turner wrote:
> On Thu, Jul 6, 2017 at 5:22 PM, Kenneth Graunke  wrote:
> > This actually takes advantage of the newly pushed UBO data, avoiding
> > pull loads.
> >
> > XXX: quote performance numbers
> > ---
> >  src/intel/compiler/brw_fs.cpp | 35 ++-
> >  src/intel/compiler/brw_fs.h   |  2 ++
> >  src/intel/compiler/brw_fs_nir.cpp | 28 
> >  3 files changed, 64 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> > index 49e714f1c1f..68648eda64d 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -1386,7 +1386,9 @@ fs_visitor::assign_curb_setup()
> > unsigned uniform_push_length = DIV_ROUND_UP(stage_prog_data->nr_params, 8);
> >
> > unsigned ubo_push_length = 0;
> > +   unsigned ubo_push_start[4];
> > for (int i = 0; i < 4; i++) {
> > +  ubo_push_start[i] = 8 * (ubo_push_length + uniform_push_length);
> >ubo_push_length += stage_prog_data->ubo_ranges[i].length;
> > }
> >
> > @@ -1398,7 +1400,11 @@ fs_visitor::assign_curb_setup()
> >  if (inst->src[i].file == UNIFORM) {
> >  int uniform_nr = inst->src[i].nr + inst->src[i].offset / 4;
> >  int constant_nr;
> > -if (uniform_nr >= 0 && uniform_nr < (int) uniforms) {
> > +if (inst->src[i].nr >= UBO_START) {
> > +   /* constant_nr is in 32-bit units, the rest are in bytes */
> > +   constant_nr = ubo_push_start[inst->src[i].nr - UBO_START] +
> > + inst->src[i].offset / 4;
> > +} else if (uniform_nr >= 0 && uniform_nr < (int) uniforms) {
> > constant_nr = push_constant_loc[uniform_nr];
> >  } else {
> > /* Section 5.11 of the OpenGL 4.1 spec says:
> > @@ -2069,6 +2075,20 @@ fs_visitor::assign_constant_locations()
> > stage_prog_data->nr_params = num_push_constants;
> > stage_prog_data->nr_pull_params = num_pull_constants;
> >
> > +   /* Now that we know how many regular uniforms we'll push, reduce the
> > +* UBO push ranges so we don't exceed the 3DSTATE_CONSTANT limits.
> > +*/
> > +   unsigned push_length = DIV_ROUND_UP(stage_prog_data->nr_params, 8);
> > +   for (int i = 0; i < 4; i++) {
> > +  struct brw_ubo_range *range = &prog_data->ubo_ranges[i];
> > +
> > +  if (push_length + range->length > 64)
> > + range->length = 64 - push_length;
> > +
> > +  push_length += range->length;
> > +   }
> > +   assert(push_length <= 64);
> > +
> > /* Up until now, the param[] array has been indexed by reg + offset
> >  * of UNIFORM registers.  Move pull constants into pull_param[] and
> >  * condense param[] to only contain the uniforms we chose to push.
> > @@ -2103,6 +2123,19 @@ fs_visitor::get_pull_locs(const fs_reg &src,
> >  {
> > assert(src.file == UNIFORM);
> >
> > +   if (src.nr >= UBO_START) {
> > +  const struct brw_ubo_range *range =
> > + &prog_data->ubo_ranges[src.nr - UBO_START];
> > +
> > +  /* If this access is in our (reduced) range, use the push data. */
> > +  if (src.offset / 32 < range->length && !getenv("PULL"))
> 
> If the environment variable is useful, make it part of INTEL_DEBUG?

Oops.  I put that in so I could quickly benchmark pushing vs. pulling
performance (though not exactly - because you still have the overhead
of uploading those ranges...just not the benefit of using them).

I meant to delete it before sending.  If we wanted an option for this,
we should just skip the analysis pass so there's nothing to push, not
hack around things here.  I'll just get rid of this.
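The range-trimming loop in the quoted assign_constant_locations() hunk, together with the ubo_push_start[] offsets in assign_curb_setup(), can be modelled in isolation. This is a sketch under stated assumptions: lengths are counted in 32-byte registers (matching `src.offset / 32 < range->length`), start offsets are in 32-bit units (8 per register), and the regular uniforms alone are assumed to fit in the 64-register budget. `struct ubo_range` and both function names are hypothetical.

```c
#include <assert.h>

#define MAX_PUSH_REGS 64 /* 3DSTATE_CONSTANT limit used in the hunk */
#define NUM_UBO_RANGES 4

/* Hypothetical stand-in for brw_ubo_range: length in 32-byte registers. */
struct ubo_range {
   unsigned length;
};

/* Shrink UBO ranges so uniforms + UBO data fit; returns registers used. */
static unsigned trim_ubo_ranges(unsigned nr_params,
                                struct ubo_range ranges[NUM_UBO_RANGES])
{
   unsigned push_length = (nr_params + 7) / 8; /* DIV_ROUND_UP(nr_params, 8) */

   assert(push_length <= MAX_PUSH_REGS); /* assumption: uniforms fit */
   for (int i = 0; i < NUM_UBO_RANGES; i++) {
      if (push_length + ranges[i].length > MAX_PUSH_REGS)
         ranges[i].length = MAX_PUSH_REGS - push_length;
      push_length += ranges[i].length;
   }
   return push_length;
}

/* First pushed constant of each range, in 32-bit units (8 per register). */
static void ubo_push_starts(unsigned nr_params,
                            const struct ubo_range ranges[NUM_UBO_RANGES],
                            unsigned starts[NUM_UBO_RANGES])
{
   unsigned uniform_regs = (nr_params + 7) / 8;
   unsigned ubo_regs = 0;

   for (int i = 0; i < NUM_UBO_RANGES; i++) {
      starts[i] = 8 * (ubo_regs + uniform_regs);
      ubo_regs += ranges[i].length;
   }
}
```

With 20 uniform dwords (3 registers) and ranges of 30, 40, 0 and 5 registers, the second range is cut to 31 and the fourth to 0, filling exactly 64 registers.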

--Ken


Re: [Mesa-dev] [PATCH v3 1/3] spirv: Fix reaching unreachable for compare exchange on images

2017-07-10 Thread Jason Ekstrand

On Mon, Jul 10, 2017 at 1:46 PM, Andres Gomez  wrote:

> James, it doesn't seem like this patch has landed in master. Are you in
> need of review or is it that this has been superseded?
>

Sorry.  My fault.

Reviewed-by: Jason Ekstrand 

and pushed.  Thanks!


> Thanks!
>
> On Mon, 2017-06-26 at 10:46 +0100, James Legg wrote:
> > We were hitting the
> >   unreachable("Invalid image opcode")
> > near the end of vtn_handle_image when parsing the
> > SpvOpAtomicCompareExchange opcode.
> >
> > v2: Add stable CC.
> > v3: Ignore SpvOpAtomicCompareExchangeWeak. It requires the Kernel
> > capability which is not exposed in Vulkan, and spirv_to_nir is not used
> > for OpenCL which does support it.
> >
> > CC: 
> > ---
> >  src/compiler/spirv/spirv_to_nir.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
> > index 0a5eb0e..0e6229b 100644
> > --- a/src/compiler/spirv/spirv_to_nir.c
> > +++ b/src/compiler/spirv/spirv_to_nir.c
> > @@ -1977,6 +1977,7 @@ vtn_handle_image(struct vtn_builder *b, SpvOp opcode,
> >intrin->src[2] = nir_src_for_ssa(vtn_ssa_value(b, w[3])->def);
> >break;
> >
> > +   case SpvOpAtomicCompareExchange:
> > case SpvOpAtomicIIncrement:
> > case SpvOpAtomicIDecrement:
> > case SpvOpAtomicExchange:
> --
> Br,
>
> Andres
>

Re: [Mesa-dev] [PATCH v3 1/3] spirv: Fix reaching unreachable for compare exchange on images

2017-07-10 Thread Andres Gomez

James, it doesn't seem like this patch has landed in master. Do you need
a review, or has it been superseded?

Thanks!

On Mon, 2017-06-26 at 10:46 +0100, James Legg wrote:
> We were hitting the
>   unreachable("Invalid image opcode")
> near the end of vtn_handle_image when parsing the
> SpvOpAtomicCompareExchange opcode.
> 
> v2: Add stable CC.
> v3: Ignore SpvOpAtomicCompareExchangeWeak. It requires the Kernel
> capability which is not exposed in Vulkan, and spirv_to_nir is not used
> for OpenCL which does support it.
> 
> CC: 
> ---
>  src/compiler/spirv/spirv_to_nir.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
> index 0a5eb0e..0e6229b 100644
> --- a/src/compiler/spirv/spirv_to_nir.c
> +++ b/src/compiler/spirv/spirv_to_nir.c
> @@ -1977,6 +1977,7 @@ vtn_handle_image(struct vtn_builder *b, SpvOp opcode,
>intrin->src[2] = nir_src_for_ssa(vtn_ssa_value(b, w[3])->def);
>break;
>  
> +   case SpvOpAtomicCompareExchange:
> case SpvOpAtomicIIncrement:
> case SpvOpAtomicIDecrement:
> case SpvOpAtomicExchange:
-- 
Br,

Andres
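The one-line fix works because compare-exchange is an image atomic like the others in that case list. It differs only in its operands. Per the SPIR-V specification, OpAtomicCompareExchange carries both a Value and a Comparator. OpAtomicIIncrement and OpAtomicIDecrement carry neither, and OpAtomicExchange and OpAtomicIAdd carry one Value. The following is a minimal sketch of that operand count. It uses a hypothetical local enum rather than the real SpvOp values from spirv.h:

```c
#include <assert.h>

/* Hypothetical stand-ins for SpvOp values; the real ones live in spirv.h. */
enum atomic_op_sketch {
   ATOMIC_I_INCREMENT,
   ATOMIC_I_DECREMENT,
   ATOMIC_EXCHANGE,
   ATOMIC_I_ADD,
   ATOMIC_COMPARE_EXCHANGE,
   ATOMIC_COMPARE_EXCHANGE_WEAK,
};

/* Number of data operands that follow the image pointer, or -1 for opcodes
 * this path does not handle.  Mirrors the v3 decision: the Weak variant
 * needs the Kernel capability, which Vulkan does not expose. */
static int
atomic_data_operand_count(enum atomic_op_sketch op)
{
   switch (op) {
   case ATOMIC_I_INCREMENT:
   case ATOMIC_I_DECREMENT:
      return 0; /* implicit +1 / -1 */
   case ATOMIC_EXCHANGE:
   case ATOMIC_I_ADD:
      return 1; /* Value */
   case ATOMIC_COMPARE_EXCHANGE:
      return 2; /* Value and Comparator */
   default:
      return -1; /* e.g. ATOMIC_COMPARE_EXCHANGE_WEAK */
   }
}
```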

Re: [Mesa-dev] [PATCH] meta: Fix BlitFramebuffer temp texture setup

2017-07-10 Thread Andres Gomez

Ville, has this patch fallen through the cracks?

On Fri, 2017-06-23 at 14:58 +0300, ville.syrj...@linux.intel.com wrote:
> From: Ville Syrjälä 
> 
> Pass the correct src coordinates to CopyTexSubImage()
> when creating the temporary texture, and also take care to adjust
> flipX/Y if the original src coordinates were flipped compared to
> the new temporary texture src coordinates.
> 
> This fixes all the flip_src_x/y tests in
> piglit.spec.arb_framebuffer_object.fbo-blit-stretch on i915, but
> we're still left with some failures in the stretch tests.
> 
> It looks to me like commit b702233f53d6 ("meta: Refactor the
> BlitFramebuffer color CopyTexImage fallback.") most likely
> broke this codepath.
> 
> Cc: mesa-sta...@lists.freedesktop.org
> Cc: Eric Anholt 
> Cc: Kenneth Graunke 
> Cc: Ian Romanick 
> Cc: Anuj Phogat 
> Fixes: b702233f53d6 ("meta: Refactor the BlitFramebuffer color CopyTexImage fallback.")
> References: https://bugs.freedesktop.org/show_bug.cgi?id=101414
> Signed-off-by: Ville Syrjälä 
> ---
>  src/mesa/drivers/common/meta_blit.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c
> index 7adad469aceb..7262ecdfaf13 100644
> --- a/src/mesa/drivers/common/meta_blit.c
> +++ b/src/mesa/drivers/common/meta_blit.c
> @@ -680,12 +680,16 @@ blitframebuffer_texture(struct gl_context *ctx,
>}
>  
>_mesa_meta_setup_copypix_texture(ctx, meta_temp_texture,
> -   srcX0, srcY0,
> +   MIN2(srcX0, srcX1),
> +   MIN2(srcY0, srcY1),
> srcW, srcH,
> tex_base_format,
> filter);
>  
> -
> +  if (srcX0 > srcX1)
> + flipX = -flipX;
> +  if (srcY0 > srcY1)
> + flipY = -flipY;
>srcX0 = 0;
>srcY0 = 0;
>srcX1 = srcW;
-- 
Br,

Andres
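The coordinate fix can be shown in one dimension. This is a minimal sketch. It assumes `flipX`/`flipY` are ±1 signs, as `flipX = -flipX` in the hunk suggests, and that `srcW` is the absolute span width. `normalize_src_span()` is a hypothetical helper, not part of meta_blit.c:

```c
#include <assert.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Copy the source span starting at its lower edge.  If the caller gave the
 * span reversed (src0 > src1), fold that reversal into the flip sign,
 * because the temporary texture is always addressed as [0, width]. */
static void
normalize_src_span(int src0, int src1, int *copy_origin, int *width, int *flip)
{
   *copy_origin = MIN2(src0, src1);
   *width = src0 > src1 ? src0 - src1 : src1 - src0;
   if (src0 > src1)
      *flip = -*flip;
}
```

Without the sign toggle, resetting the source to `[0, srcW]` for the temporary texture would discard the original reversal. That is consistent with the flip_src_x/y failures described in the patch.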

Re: [Mesa-dev] [PATCH] i965/blorp: Use the renderbuffer format for clears

2017-07-10 Thread Andres Gomez

Jason, what is the status of this patch? Has it been superseded or
discarded?


On Mon, 2017-06-26 at 09:01 -0700, Jason Ekstrand wrote:
> This fixes the Piglit ARB_texture_views rendering-formats test.
> 
> Cc: "17.1" 
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
> index 87c9dd4..96dc657 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> @@ -746,9 +746,9 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
>  {
> struct gl_context *ctx = &brw->ctx;
> struct intel_renderbuffer *irb = intel_renderbuffer(rb);
> -   mesa_format format = irb->mt->format;
> uint32_t x0, x1, y0, y1;
>  
> +   mesa_format format = irb->Base.Base.Format;
> if (!encode_srgb && _mesa_get_format_color_encoding(format) == GL_SRGB)
>format = _mesa_get_srgb_format_linear(format);
>  
> @@ -772,6 +772,14 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
> if (set_write_disables(irb, ctx->Color.ColorMask[buf], 
> color_write_disable))
>can_fast_clear = false;
>  
> +   /* We store clear colors as floats or uints as needed.  If there are
> +* texture views in play, the formats will not properly be respected
> +* during resolves because the resolve operations only know about the
> +* miptree and not the renderbuffer.
> +*/
> +   if (irb->Base.Base.Format != irb->mt->format)
> +  can_fast_clear = false;
> +
> if (!irb->mt->supports_fast_clear ||
> !brw_is_color_fast_clear_compatible(brw, irb->mt, 
> &ctx->Color.ClearColor))
>can_fast_clear = false;
-- 
Br,

Andres
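The new early-out in the blorp hunk adds one more veto on the fast-clear path. The following is a condensed sketch of only the conditions visible in the hunk. `can_fast_clear_sketch()` and `format_sketch` are hypothetical, and the real function checks more than this:

```c
#include <assert.h>
#include <stdbool.h>

typedef int format_sketch; /* hypothetical stand-in for mesa_format */

static bool
can_fast_clear_sketch(format_sketch rb_format, format_sketch mt_format,
                      bool write_disables_set, bool mt_supports_fast_clear,
                      bool clear_color_compatible)
{
   /* Per the hunk, any write-disabled channel rules out a fast clear. */
   if (write_disables_set)
      return false;

   /* Resolves only know the miptree format, so a texture view with a
    * different renderbuffer format could misinterpret the stored clear color. */
   if (rb_format != mt_format)
      return false;

   return mt_supports_fast_clear && clear_color_compatible;
}
```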

Re: [Mesa-dev] [PATCH] mesa/marshal: fix glNamedBufferData with NULLdata

2017-07-10 Thread Marc Dietrich

Am Montag, 10. Juli 2017, 16:28:28 CEST schrieb Grigori Goronzy:
> The semantics are similar to glBufferData. Fixes a crash with VMWare
> Player.
> 
> Signed-off-by: Grigori Goronzy 

Tested-by: Marc Dietrich 

> ---
>  src/mesa/main/marshal.c | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/main/marshal.c b/src/mesa/main/marshal.c
> index 8db4531..b801bdc 100644
> --- a/src/mesa/main/marshal.c
> +++ b/src/mesa/main/marshal.c
> @@ -415,6 +415,7 @@ struct marshal_cmd_NamedBufferData
> GLuint name;
> GLsizei size;
> GLenum usage;
> +   bool data_null; /* If set, no data follows for "data" */
> /* Next size bytes are GLubyte data[size] */
>  };
> 
> @@ -425,7 +426,12 @@ _mesa_unmarshal_NamedBufferData(struct gl_context *ctx,
> const GLuint name = cmd->name;
> const GLsizei size = cmd->size;
> const GLenum usage = cmd->usage;
> -   const void *data = (const void *) (cmd + 1);
> +   const void *data;
> +
> +   if (cmd->data_null)
> +  data = NULL;
> +   else
> +  data = (const void *) (cmd + 1);
> 
> CALL_NamedBufferData(ctx->CurrentServerDispatch,
>  (name, size, data, usage));
> @@ -436,7 +442,7 @@ _mesa_marshal_NamedBufferData(GLuint buffer, GLsizeiptr size, const GLvoid * data, GLenum usage)
>  {
> GET_CURRENT_CONTEXT(ctx);
> -   size_t cmd_size = sizeof(struct marshal_cmd_NamedBufferData) + size;
> +   size_t cmd_size = sizeof(struct marshal_cmd_NamedBufferData) + (data ? size : 0);
> 
> debug_print_marshal("NamedBufferData");
> if (unlikely(size < 0)) {
> @@ -452,8 +458,11 @@ _mesa_marshal_NamedBufferData(GLuint buffer, GLsizeiptr
> size, cmd->name = buffer;
>cmd->size = size;
>cmd->usage = usage;
> -  char *variable_data = (char *) (cmd + 1);
> -  memcpy(variable_data, data, size);
> +  cmd->data_null = !data;
> +  if (data) {
> + char *variable_data = (char *) (cmd + 1);
> + memcpy(variable_data, data, size);
> +  }
>_mesa_post_marshal_hook(ctx);
> } else {
>_mesa_glthread_finish(ctx);
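Before this fix, a NULL `data` pointer would have been passed to `memcpy()` on the marshalling side, which is a likely source of the crash. On the unmarshalling side, the driver always received a pointer into the batch instead of NULL, so it could not tell that no data was supplied. The following is a minimal sketch of the flag-plus-optional-payload layout the patch adopts. `struct cmd_sketch`, `marshal_sketch()` and `unmarshal_data_sketch()` are hypothetical, and the sketch uses `malloc()` where glthread allocates from its batch buffer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Fixed header, followed by `size` payload bytes only when data != NULL. */
struct cmd_sketch {
   size_t size;
   bool data_null; /* if set, no payload follows */
};

/* Build a command; *cmd_bytes receives its total size.  NULL on OOM. */
static struct cmd_sketch *
marshal_sketch(const void *data, size_t size, size_t *cmd_bytes)
{
   *cmd_bytes = sizeof(struct cmd_sketch) + (data ? size : 0);
   struct cmd_sketch *cmd = malloc(*cmd_bytes);
   if (!cmd)
      return NULL;
   cmd->size = size;
   cmd->data_null = !data;
   if (data)
      memcpy(cmd + 1, data, size); /* never memcpy from NULL */
   return cmd;
}

/* Recover the pointer the driver should see: NULL means "allocate only". */
static const void *
unmarshal_data_sketch(const struct cmd_sketch *cmd)
{
   return cmd->data_null ? NULL : (const void *)(cmd + 1);
}
```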




Re: [Mesa-dev] [PATCH 09/20] nir: Add system values from ARB_shader_ballot

2017-07-10 Thread Connor Abbott

On Thu, Jul 6, 2017 at 4:48 PM, Matt Turner  wrote:
> We already had a channel_num system value, which I'm renaming to
> subgroup_invocation to match the rest of the new system values.
>
> Note that while ballotARB(true) will return zeros in the high 32-bits on
> systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB
> variables do not consider whether channels are enabled. See issue (1) of
> ARB_shader_ballot.
> ---
>  src/compiler/nir/nir.c |  4 
>  src/compiler/nir/nir_intrinsics.h  |  8 +++-
>  src/compiler/nir/nir_lower_system_values.c | 28 
>  src/intel/compiler/brw_fs_nir.cpp  |  2 +-
>  src/intel/compiler/brw_nir_intrinsics.c|  4 ++--
>  5 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
> index 491b908396..9827e129ca 100644
> --- a/src/compiler/nir/nir.c
> +++ b/src/compiler/nir/nir.c
> @@ -1908,6 +1908,10 @@ nir_intrinsic_from_system_value(gl_system_value val)
>return nir_intrinsic_load_helper_invocation;
> case SYSTEM_VALUE_VIEW_INDEX:
>return nir_intrinsic_load_view_index;
> +   case SYSTEM_VALUE_SUBGROUP_SIZE:
> +  return nir_intrinsic_load_subgroup_size;
> +   case SYSTEM_VALUE_SUBGROUP_INVOCATION:
> +  return nir_intrinsic_load_subgroup_invocation;
> default:
>unreachable("system value does not directly correspond to intrinsic");
> }
> diff --git a/src/compiler/nir/nir_intrinsics.h b/src/compiler/nir/nir_intrinsics.h
> index 6c6ba4cf59..96ecfbc338 100644
> --- a/src/compiler/nir/nir_intrinsics.h
> +++ b/src/compiler/nir/nir_intrinsics.h
> @@ -344,10 +344,16 @@ SYSTEM_VALUE(work_group_id, 3, 0, xx, xx, xx)
>  SYSTEM_VALUE(user_clip_plane, 4, 1, UCP_ID, xx, xx)
>  SYSTEM_VALUE(num_work_groups, 3, 0, xx, xx, xx)
>  SYSTEM_VALUE(helper_invocation, 1, 0, xx, xx, xx)
> -SYSTEM_VALUE(channel_num, 1, 0, xx, xx, xx)
>  SYSTEM_VALUE(alpha_ref_float, 1, 0, xx, xx, xx)
>  SYSTEM_VALUE(layer_id, 1, 0, xx, xx, xx)
>  SYSTEM_VALUE(view_index, 1, 0, xx, xx, xx)
> +SYSTEM_VALUE(subgroup_size, 1, 0, xx, xx, xx)
> +SYSTEM_VALUE(subgroup_invocation, 1, 0, xx, xx, xx)
> +SYSTEM_VALUE(subgroup_eq_mask, 1, 0, xx, xx, xx)
> +SYSTEM_VALUE(subgroup_ge_mask, 1, 0, xx, xx, xx)
> +SYSTEM_VALUE(subgroup_gt_mask, 1, 0, xx, xx, xx)
> +SYSTEM_VALUE(subgroup_le_mask, 1, 0, xx, xx, xx)
> +SYSTEM_VALUE(subgroup_lt_mask, 1, 0, xx, xx, xx)
>
>  /* Blend constant color values.  Float values are clamped. */
>  SYSTEM_VALUE(blend_const_color_r_float, 1, 0, xx, xx, xx)
> diff --git a/src/compiler/nir/nir_lower_system_values.c b/src/compiler/nir/nir_lower_system_values.c
> index 810100a081..faf0c3c9da 100644
> --- a/src/compiler/nir/nir_lower_system_values.c
> +++ b/src/compiler/nir/nir_lower_system_values.c
> @@ -116,6 +116,34 @@ convert_block(nir_block *block, nir_builder *b)
> nir_load_base_instance(b));
>   break;
>
> +  case SYSTEM_VALUE_SUBGROUP_EQ_MASK:
> +  case SYSTEM_VALUE_SUBGROUP_GE_MASK:
> +  case SYSTEM_VALUE_SUBGROUP_GT_MASK:
> +  case SYSTEM_VALUE_SUBGROUP_LE_MASK:
> +  case SYSTEM_VALUE_SUBGROUP_LT_MASK: {
> + nir_ssa_def *count = nir_load_subgroup_invocation(b);
> +
> + switch (var->data.location) {
> + case SYSTEM_VALUE_SUBGROUP_EQ_MASK:
> +sysval = nir_ishl(b, nir_imm_int64(b, 1ull), count);
> +break;
> + case SYSTEM_VALUE_SUBGROUP_GE_MASK:
> +sysval = nir_ishl(b, nir_imm_int64(b, ~0ull), count);
> +break;
> + case SYSTEM_VALUE_SUBGROUP_GT_MASK:
> +sysval = nir_ishl(b, nir_imm_int64(b, ~1ull), count);
> +break;
> + case SYSTEM_VALUE_SUBGROUP_LE_MASK:
> +sysval = nir_inot(b, nir_ishl(b, nir_imm_int64(b, ~1ull), count));
> +break;
> + case SYSTEM_VALUE_SUBGROUP_LT_MASK:
> +sysval = nir_inot(b, nir_ishl(b, nir_imm_int64(b, ~0ull), count));
> +break;
> + default:
> +unreachable("you seriously can't tell this is unreachable?");
> + }
> +  }
> +

While this is fine to do for both Intel and AMD, Nvidia actually has
special system values for these, and AMD has special instructions for
bitCount(foo & gl_SubGroupLtMask), so I think we should have actual
nir_load_subgroup_*_mask intrinsics for these. Also, that way you can
use the same shrinking logic to turn these into 32-bit shifts on
Intel.
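The shift identities used by the lowering in the quoted hunk can be checked on the CPU for every invocation index. This is a minimal sketch with hypothetical helper names. It is valid for 0 <= i < 64, because shifting a 64-bit value by 64 or more is undefined in C:

```c
#include <assert.h>
#include <stdint.h>

/* Same constants and shifts as the NIR lowering, evaluated on the CPU. */
static uint64_t eq_mask(unsigned i) { assert(i < 64); return 1ull << i; }
static uint64_t ge_mask(unsigned i) { assert(i < 64); return ~0ull << i; }
static uint64_t gt_mask(unsigned i) { assert(i < 64); return ~1ull << i; }
static uint64_t le_mask(unsigned i) { assert(i < 64); return ~(~1ull << i); }
static uint64_t lt_mask(unsigned i) { assert(i < 64); return ~(~0ull << i); }
```

The masks must satisfy these identities whether they end up as dedicated `nir_load_subgroup_*_mask` intrinsics, as suggested above, or stay as shifts.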

>default:
>   break;
>}
> diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
> index 264398f38e..17f35e081d 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -4075,7 +4075,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>break;
> }
>
> -   case nir_intrinsic_load_channel_num: {
> +   case nir_intrinsic_load_subgroup_invoc

Re: [Mesa-dev] [PATCH 1/2] swr: switch to using SwrGetInterface api table

2017-07-10 Thread Cherniak, Bruce

Reviewed-by: Bruce Cherniak  

> On Jul 7, 2017, at 4:25 PM, Tim Rowley  wrote:
> 
> Use the SWR rasterizer API through the table returned from
> SwrGetInterface rather than referencing the functions directly.
> This will allow us to move to a model of having the driver dynamically
> load the appropriate swr architecture library.
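The patch replaces direct calls such as `SwrWaitForIdle()` with calls through a table of function pointers filled in by `SwrGetInterface()`. The following is a minimal C sketch of that pattern. All names in it are hypothetical. The real `SWR_INTERFACE` has many more entries, and the patch passes the table by C++ reference rather than by pointer:

```c
#include <assert.h>

/* A table of function pointers standing in for SWR_INTERFACE. */
typedef struct {
   int (*pfnCreateContext)(int arg);
   void (*pfnWaitForIdle)(int handle);
} api_sketch;

/* One hypothetical architecture-specific implementation. */
static int avx2_create_context(int arg) { return arg + 100; }
static void avx2_wait_for_idle(int handle) { (void)handle; }

/* Statically linked here; a dynamically loaded library could fill it later. */
static void
get_interface_sketch(api_sketch *api)
{
   api->pfnCreateContext = avx2_create_context;
   api->pfnWaitForIdle = avx2_wait_for_idle;
}

struct context_sketch {
   api_sketch api;
   int handle;
};

/* Callers go only through ctx->api, never through the functions directly. */
static void
context_init_sketch(struct context_sketch *ctx)
{
   get_interface_sketch(&ctx->api);
   ctx->handle = ctx->api.pfnCreateContext(1);
   ctx->api.pfnWaitForIdle(ctx->handle);
}
```

Callers only ever go through `ctx->api`. Later replacing the statically linked functions with ones looked up from a dynamically loaded library would therefore touch only the getter.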
> ---
> src/gallium/drivers/swr/swr_clear.cpp   |  6 ++---
> src/gallium/drivers/swr/swr_context.cpp | 19 --
> src/gallium/drivers/swr/swr_context.h   |  5 +++-
> src/gallium/drivers/swr/swr_draw.cpp| 46 -
> src/gallium/drivers/swr/swr_fence.cpp   |  2 +-
> src/gallium/drivers/swr/swr_memory.h|  6 ++---
> src/gallium/drivers/swr/swr_query.cpp   |  8 +++---
> src/gallium/drivers/swr/swr_scratch.cpp |  2 +-
> src/gallium/drivers/swr/swr_screen.cpp  |  3 ++-
> src/gallium/drivers/swr/swr_state.cpp   | 40 ++--
> 10 files changed, 72 insertions(+), 65 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/swr_clear.cpp b/src/gallium/drivers/swr/swr_clear.cpp
> index 3a35805..233432e 100644
> --- a/src/gallium/drivers/swr/swr_clear.cpp
> +++ b/src/gallium/drivers/swr/swr_clear.cpp
> @@ -78,9 +78,9 @@ swr_clear(struct pipe_context *pipe,
> 
>for (unsigned i = 0; i < layers; ++i) {
>   swr_update_draw_context(ctx);
> -  SwrClearRenderTarget(ctx->swrContext, clearMask, i,
> -   color->f, depth, stencil,
> -   clear_rect);
> +  ctx->api.pfnSwrClearRenderTarget(ctx->swrContext, clearMask, i,
> +   color->f, depth, stencil,
> +   clear_rect);
> 
>   // Mask out the attachments that are out of layers.
>   if (fb->zsbuf &&
> diff --git a/src/gallium/drivers/swr/swr_context.cpp b/src/gallium/drivers/swr/swr_context.cpp
> index f2d971a..9648278 100644
> --- a/src/gallium/drivers/swr/swr_context.cpp
> +++ b/src/gallium/drivers/swr/swr_context.cpp
> @@ -311,8 +311,8 @@ swr_blit(struct pipe_context *pipe, const struct pipe_blit_info *blit_info)
>}
> 
>if (ctx->active_queries) {
> -  SwrEnableStatsFE(ctx->swrContext, FALSE);
> -  SwrEnableStatsBE(ctx->swrContext, FALSE);
> +  ctx->api.pfnSwrEnableStatsFE(ctx->swrContext, FALSE);
> +  ctx->api.pfnSwrEnableStatsBE(ctx->swrContext, FALSE);
>}
> 
>util_blitter_save_vertex_buffer_slot(ctx->blitter, ctx->vertex_buffer);
> @@ -349,8 +349,8 @@ swr_blit(struct pipe_context *pipe, const struct pipe_blit_info *blit_info)
>util_blitter_blit(ctx->blitter, &info);
> 
>if (ctx->active_queries) {
> -  SwrEnableStatsFE(ctx->swrContext, TRUE);
> -  SwrEnableStatsBE(ctx->swrContext, TRUE);
> +  ctx->api.pfnSwrEnableStatsFE(ctx->swrContext, TRUE);
> +  ctx->api.pfnSwrEnableStatsBE(ctx->swrContext, TRUE);
>}
> }
> 
> @@ -383,10 +383,10 @@ swr_destroy(struct pipe_context *pipe)
> 
>/* Idle core after destroying buffer resources, but before deleting
> * context.  Destroying resources has potentially called StoreTiles.*/
> -   SwrWaitForIdle(ctx->swrContext);
> +   ctx->api.pfnSwrWaitForIdle(ctx->swrContext);
> 
>if (ctx->swrContext)
> -  SwrDestroyContext(ctx->swrContext);
> +  ctx->api.pfnSwrDestroyContext(ctx->swrContext);
> 
>delete ctx->blendJIT;
> 
> @@ -467,6 +467,9 @@ swr_create_context(struct pipe_screen *p_screen, void *priv, unsigned flags)
>   AlignedMalloc(sizeof(struct swr_context), KNOB_SIMD_BYTES);
>memset(ctx, 0, sizeof(struct swr_context));
> 
> +   SwrGetInterface(ctx->api);
> +   ctx->swrDC.pAPI = &ctx->api;
> +
>ctx->blendJIT =
>   new std::unordered_map;
> 
> @@ -478,9 +481,9 @@ swr_create_context(struct pipe_screen *p_screen, void *priv, unsigned flags)
>createInfo.pfnClearTile = swr_StoreHotTileClear;
>createInfo.pfnUpdateStats = swr_UpdateStats;
>createInfo.pfnUpdateStatsFE = swr_UpdateStatsFE;
> -   ctx->swrContext = SwrCreateContext(&createInfo);
> +   ctx->swrContext = ctx->api.pfnSwrCreateContext(&createInfo);
> 
> -   SwrInit();
> +   ctx->api.pfnSwrInit();
> 
>if (ctx->swrContext == NULL)
>   goto fail;
> diff --git a/src/gallium/drivers/swr/swr_context.h b/src/gallium/drivers/swr/swr_context.h
> index 3ff4bf3..753cbf3 100644
> --- a/src/gallium/drivers/swr/swr_context.h
> +++ b/src/gallium/drivers/swr/swr_context.h
> @@ -102,6 +102,7 @@ struct swr_draw_context {
> 
>SWR_SURFACE_STATE renderTargets[SWR_NUM_ATTACHMENTS];
>struct swr_query_result *pStats; // @llvm_struct
> +   SWR_INTERFACE *pAPI; // @llvm_struct - Needed for the swr_memory callbacks
> };
> 
> /* gen_llvm_types FINI */
> @@ -169,6 +170,8 @@ struct swr_context {
>struct swr_draw_context swrDC;
> 
>unsigned dirty; /**< Mask of SWR_NEW_x flags */
> +
> +   SWR_INTERFACE api;
> };
> 
> static INLINE struct swr_context *
> @@ -182,7 +185,7 @@ swr_update_draw_context(struct swr_context *ctx,
>   struct swr_quer

Re: [Mesa-dev] [PATCH 00/11] i965: UBO pushing for fun and profit?

2017-07-10 Thread Matt Turner

On Thu, Jul 6, 2017 at 5:22 PM, Kenneth Graunke  wrote:
> Hello,
>
> This series begins pushing UBOs (rather than resorting to pull loads)
> for scalar shaders on Gen7.5+, for the OpenGL driver.  Future work is
> to hook it up for Vulkan (haven't started), for the vec4 shader stages
> (I have about 75% of the code written), and for Gen7 (I have a plan).
>
> Note that compute shaders unfortunately still resort to pull messages,
> because I haven't found a way to make the constant commands absolute
> addresses instead of being relative to dynamic state base address.
>
> This has long been a gap in our UBO support - we pushed regular
> uniform data, but always resorted to pulls for UBOs, making them
> slower than regular uniforms.
>
> I started this project a year and a half ago, and it initially looked
> very promising - up to 30% faster in Tomb Raider, for example.  However,
> Curro improved the performance of pull messages significantly since then.
> Now, it doesn't seem to have as large of an impact.  Jason thinks this
> would help close the GL/Vulkan gap in Talos Principle, when we finally
> hook it up in Vulkan.  One place where it does help is GLBenchmark 3.1
> Manhattan, which improves 3-4% on most platforms, and 6-7% on SKL GT4.
> This is primarily because it avoids doing a pull load in a loop, though,
> which could be solved by using the global code motion pass...
>
> I figured I'd at least send it out for an initial review, and we can
> continue collecting benchmark data...

The series looks good to me. I'm not the ideal person to review state
upload changes, so take that review for what it's worth.

Reviewed-by: Matt Turner 

I guess we just need benchmarking data. Also, a set of TODOs I noticed:

- give higher weight to UBO accesses in loops (and less if in an if?)
- consider combining ranges, but ugh, heuristics

I'm definitely okay with leaving those until a time when we find they're useful.

[Mesa-dev] [PATCH] drirc: whitelist glthread for The Witcher 2

2017-07-10 Thread Edmondo Tommasina

Performance delta on AMD Phenom II X3 720 / RX 470

The Witcher 2: +18%
---
 src/mesa/drivers/dri/common/drirc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/common/drirc b/src/mesa/drivers/dri/common/drirc
index 69b735ce70..3108451090 100644
--- a/src/mesa/drivers/dri/common/drirc
+++ b/src/mesa/drivers/dri/common/drirc
@@ -175,6 +175,9 @@ TODO: document the other workarounds.
 
 
 
+
+
+
 
 
 
-- 
2.11.0


[Mesa-dev] [PATCH] squash: Add more comments to bitfield manipulations in UBO analysis pass

2017-07-10 Thread Kenneth Graunke

---
 src/intel/compiler/brw_nir_analyze_ubo_ranges.c | 28 -
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
index 3535e67758c..b365728e77b 100644
--- a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
+++ b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
@@ -83,6 +83,10 @@ cmp_ubo_range_entry(const void *va, const void *vb)
 
 struct ubo_block_info
 {
+   /* Each bit in the offsets bitfield represents a 32-byte section of data.
+* If it's set to one, there is interesting UBO data at that offset.  If
+* not, there's a "hole" - padding between data - or just nothing at all.
+*/
uint64_t offsets;
uint8_t uses[64];
 };
@@ -189,7 +193,7 @@ brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler,
   }
}
 
-   /* Find ranges. */
+   /* Find ranges: a block, starting 32-byte offset, and length. */
struct util_dynarray ranges;
util_dynarray_init(&ranges, mem_ctx);
 
@@ -199,13 +203,34 @@ brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler,
   const struct ubo_block_info *info = entry->data;
   uint64_t offsets = info->offsets;
 
+  /* Walk through the offsets bitfield, finding contiguous regions of
+   * set bits:
+   *
+   *   01111100
+   *    ^^^^^
+   *
+   * Each of these will become a UBO range.
+   */
   while (offsets != 0) {
+ /* Find the first 1 in the offsets bitfield.  This represents the
+  * start of a range of interesting UBO data.  Make it zero-indexed.
+  */
  int first_bit = ffsll(offsets) - 1;
+
+ /* Find the first 0 bit in offsets beyond first_bit.  To find the
+  * first zero bit, we find the first 1 bit in the complement.  In
+  * order to ignore bits before first_bit, we mask off those bits.
+  */
  int first_hole = ffsll(~offsets & ~((1ull << first_bit) - 1)) - 1;
+
  if (first_hole == -1) {
+/* If we didn't find a hole, then set it to the end of the
+ * bitfield.  There are no more ranges to process.
+ */
 first_hole = 64;
 offsets = 0;
  } else {
+/* We've processed all bits before first_hole.  Mask them off. */
 offsets &= ~((1ull << first_hole) - 1);
  }
 
@@ -214,6 +239,7 @@ brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler,
 
  entry->range.block = b;
  entry->range.start = first_bit;
+ /* first_hole is one beyond the end, so we don't need to add 1 */
  entry->range.length = first_hole - first_bit;
  entry->benefit = 0;
 
-- 
2.13.2
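The range-finding loop documented above can be run outside the compiler. This is a minimal sketch that mirrors it. `lowest_set_bit()` is a portable stand-in for `ffsll()`, which is a glibc/BSD extension. `find_runs()` and `struct bit_run` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for ffsll(): 1-based index of lowest set bit, 0 if none. */
static int
lowest_set_bit(uint64_t v)
{
   if (v == 0)
      return 0;
   int i = 1;
   while (!(v & 1)) {
      v >>= 1;
      i++;
   }
   return i;
}

struct bit_run {
   int start;  /* first 32-byte chunk of the run */
   int length; /* number of consecutive set bits */
};

/* Split `offsets` into maximal runs of set bits, as in the squashed loop.
 * Returns the number of runs written to `runs`. */
static int
find_runs(uint64_t offsets, struct bit_run *runs, int max_runs)
{
   int n = 0;
   while (offsets != 0 && n < max_runs) {
      int first_bit = lowest_set_bit(offsets) - 1;
      /* First zero at or above first_bit: lowest one in the complement,
       * with the bits below first_bit masked off. */
      int first_hole = lowest_set_bit(~offsets & ~((1ull << first_bit) - 1)) - 1;
      if (first_hole == -1) {
         first_hole = 64; /* run reaches the top bit; nothing left */
         offsets = 0;
      } else {
         offsets &= ~((1ull << first_hole) - 1);
      }
      runs[n].start = first_bit;
      runs[n].length = first_hole - first_bit;
      n++;
   }
   return n;
}
```

For the 01111100 example in the comment (0x7C), this yields one run that starts at chunk 2 and has length 5.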


Re: [Mesa-dev] [PATCH 11/11] i965: Use pushed UBO data in the scalar backend.

2017-07-10 Thread Matt Turner

On Thu, Jul 6, 2017 at 5:22 PM, Kenneth Graunke  wrote:
> This actually takes advantage of the newly pushed UBO data, avoiding
> pull loads.
>
> XXX: quote performance numbers
> ---
>  src/intel/compiler/brw_fs.cpp | 35 ++-
>  src/intel/compiler/brw_fs.h   |  2 ++
>  src/intel/compiler/brw_fs_nir.cpp | 28 
>  3 files changed, 64 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 49e714f1c1f..68648eda64d 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -1386,7 +1386,9 @@ fs_visitor::assign_curb_setup()
> unsigned uniform_push_length = DIV_ROUND_UP(stage_prog_data->nr_params, 8);
>
> unsigned ubo_push_length = 0;
> +   unsigned ubo_push_start[4];
> for (int i = 0; i < 4; i++) {
> +  ubo_push_start[i] = 8 * (ubo_push_length + uniform_push_length);
>ubo_push_length += stage_prog_data->ubo_ranges[i].length;
> }
>
> @@ -1398,7 +1400,11 @@ fs_visitor::assign_curb_setup()
>  if (inst->src[i].file == UNIFORM) {
>  int uniform_nr = inst->src[i].nr + inst->src[i].offset / 4;
>  int constant_nr;
> -if (uniform_nr >= 0 && uniform_nr < (int) uniforms) {
> +if (inst->src[i].nr >= UBO_START) {
> +   /* constant_nr is in 32-bit units, the rest are in bytes */
> +   constant_nr = ubo_push_start[inst->src[i].nr - UBO_START] +
> + inst->src[i].offset / 4;
> +} else if (uniform_nr >= 0 && uniform_nr < (int) uniforms) {
> constant_nr = push_constant_loc[uniform_nr];
>  } else {
> /* Section 5.11 of the OpenGL 4.1 spec says:
> @@ -2069,6 +2075,20 @@ fs_visitor::assign_constant_locations()
> stage_prog_data->nr_params = num_push_constants;
> stage_prog_data->nr_pull_params = num_pull_constants;
>
> +   /* Now that we know how many regular uniforms we'll push, reduce the
> +* UBO push ranges so we don't exceed the 3DSTATE_CONSTANT limits.
> +*/
> +   unsigned push_length = DIV_ROUND_UP(stage_prog_data->nr_params, 8);
> +   for (int i = 0; i < 4; i++) {
> +  struct brw_ubo_range *range = &prog_data->ubo_ranges[i];
> +
> +  if (push_length + range->length > 64)
> + range->length = 64 - push_length;
> +
> +  push_length += range->length;
> +   }
> +   assert(push_length <= 64);
> +
> /* Up until now, the param[] array has been indexed by reg + offset
>  * of UNIFORM registers.  Move pull constants into pull_param[] and
>  * condense param[] to only contain the uniforms we chose to push.
> @@ -2103,6 +2123,19 @@ fs_visitor::get_pull_locs(const fs_reg &src,
>  {
> assert(src.file == UNIFORM);
>
> +   if (src.nr >= UBO_START) {
> +  const struct brw_ubo_range *range =
> + &prog_data->ubo_ranges[src.nr - UBO_START];
> +
> +  /* If this access is in our (reduced) range, use the push data. */
> +  if (src.offset / 32 < range->length && !getenv("PULL"))

If the environment variable is useful, make it part of INTEL_DEBUG?
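The two hunks agree on units. Range lengths are in 32-byte registers, and a pushed constant slot holds one 32-bit value. Judging by the `src.offset / 32 < range->length` check, `src.offset` is a byte offset relative to the start of the pushed range. The following is a minimal sketch of that bookkeeping, with hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Slot (in 32-bit units) where each UBO range begins: regular uniforms come
 * first, then the ranges in order.  One 32-byte register holds 8 slots. */
static void
compute_ubo_push_start(unsigned nr_params, const unsigned range_length[4],
                       unsigned ubo_push_start[4])
{
   unsigned uniform_push_length = DIV_ROUND_UP(nr_params, 8);
   unsigned ubo_push_length = 0;
   for (int i = 0; i < 4; i++) {
      ubo_push_start[i] = 8 * (ubo_push_length + uniform_push_length);
      ubo_push_length += range_length[i];
   }
}

/* Same test as get_pull_locs(): is this byte inside the (reduced) range? */
static bool
ubo_access_is_pushed(unsigned byte_offset, unsigned range_length)
{
   return byte_offset / 32 < range_length;
}

/* Map a pushed UBO access to its constant slot, as in assign_curb_setup(). */
static unsigned
ubo_constant_nr(const unsigned ubo_push_start[4], unsigned range,
                unsigned byte_offset)
{
   assert(range < 4);
   return ubo_push_start[range] + byte_offset / 4;
}
```

Take 20 regular uniforms (3 registers) and UBO ranges of 2 and 1 registers. Range 1 then starts at slot 40, so a read at byte offset 12 within it maps to slot 43.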

Re: [Mesa-dev] [PATCH] svga: fix PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE value

2017-07-10 Thread Charmaine Lee


Reviewed-by: Charmaine Lee

From: Brian Paul 
Sent: Monday, July 10, 2017 7:40 AM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee; mesa-sta...@lists.freedesktop.org
Subject: [PATCH] svga: fix PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE value

This query is supposed to return the max texture buffer size/width in
texels, not size in bytes.  Divide by 16 (the largest format size) to
return texels.

Fixes Piglit arb_texture_buffer_object-max-size test.

Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/svga/svga_screen.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_screen.c b/src/gallium/drivers/svga/svga_screen.c
index 0b63525..f40d151 100644
--- a/src/gallium/drivers/svga/svga_screen.c
+++ b/src/gallium/drivers/svga/svga_screen.c
@@ -312,7 +312,10 @@ svga_get_param(struct pipe_screen *screen, enum pipe_cap param)
   return svgascreen->ms_samples ? 1 : 0;

case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
-  return SVGA3D_DX_MAX_RESOURCE_SIZE;
+  /* convert bytes to texels for the case of the largest texel
+   * size: float[4].
+   */
+  return SVGA3D_DX_MAX_RESOURCE_SIZE / (4 * sizeof(float));

case PIPE_CAP_MIN_TEXEL_OFFSET:
   return sws->have_vgpu10 ? VGPU10_MIN_TEXEL_FETCH_OFFSET : 0;
--
1.9.1
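The svga fix is a unit conversion. The cap is expressed in texels. Dividing a byte limit by the largest texel size (four 32-bit floats, 16 bytes) gives a count that is safe for every format. This is a minimal sketch with a hypothetical helper name, and it does not assume any particular value for SVGA3D_DX_MAX_RESOURCE_SIZE:

```c
#include <assert.h>
#include <stddef.h>

/* Largest texel the query must account for: float[4], i.e. 16 bytes. */
#define LARGEST_TEXEL_BYTES (4 * sizeof(float))

/* Convert a byte limit into the texel count PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE
 * expects, using the worst-case texel size so every format fits. */
static size_t
max_texture_buffer_texels(size_t max_resource_bytes)
{
   return max_resource_bytes / LARGEST_TEXEL_BYTES;
}
```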


Re: [Mesa-dev] [PATCH 03/11] i965: Select ranges of UBO data to be uploaded as push constants.

2017-07-10 Thread Matt Turner

On Thu, Jul 6, 2017 at 5:22 PM, Kenneth Graunke  wrote:
> This adds a NIR pass that decides which portions of UBOs we should
> upload as push constants, rather than pull constants.
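The squash patch elsewhere in this thread documents how the pass tracks each UBO block. It uses a 64-bit `offsets` bitfield in which bit n stands for 32-byte chunk n, plus a `uses[64]` counter array. The following is a minimal sketch of how one load might be recorded. The struct is a hypothetical copy. Its handling of loads that span chunks, saturation at 255, and loads beyond the first 2 KiB are all assumptions, not taken from the pass:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of struct ubo_block_info. */
struct ubo_block_info_sketch {
   uint64_t offsets;  /* bit n set: 32-byte chunk n is read */
   uint8_t uses[64];  /* number of recorded loads touching chunk n */
};

/* Record a load of `bytes` bytes at `byte_offset` within the block. */
static void
record_ubo_load(struct ubo_block_info_sketch *info, unsigned byte_offset,
                unsigned bytes)
{
   assert(bytes > 0);
   unsigned first = byte_offset / 32;
   unsigned last = (byte_offset + bytes - 1) / 32;

   /* Beyond the 64 chunks the bitfield can describe: leave it as a pull load. */
   if (last >= 64)
      return;

   for (unsigned c = first; c <= last; c++) {
      info->offsets |= 1ull << c;
      if (info->uses[c] < UINT8_MAX)
         info->uses[c]++;
   }
}
```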
> ---
>  src/intel/Makefile.sources  |   1 +
>  src/intel/compiler/brw_compiler.h   |  11 +
>  src/intel/compiler/brw_nir.h|   4 +
>  src/intel/compiler/brw_nir_analyze_ubo_ranges.c | 271 
> 
>  src/mesa/drivers/dri/i965/brw_gs.c  |   2 +
>  src/mesa/drivers/dri/i965/brw_tcs.c |   2 +
>  src/mesa/drivers/dri/i965/brw_tes.c |   2 +
>  src/mesa/drivers/dri/i965/brw_vs.c  |   2 +
>  src/mesa/drivers/dri/i965/brw_wm.c  |   2 +
>  9 files changed, 297 insertions(+)
>  create mode 100644 src/intel/compiler/brw_nir_analyze_ubo_ranges.c
>
> diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
> index b672e615c52..f0a8bf517a1 100644
> --- a/src/intel/Makefile.sources
> +++ b/src/intel/Makefile.sources
> @@ -73,6 +73,7 @@ COMPILER_FILES = \
> compiler/brw_nir.h \
> compiler/brw_nir.c \
> compiler/brw_nir_analyze_boolean_resolves.c \
> +   compiler/brw_nir_analyze_ubo_ranges.c \
> compiler/brw_nir_attribute_workarounds.c \
> compiler/brw_nir_intrinsics.c \
> compiler/brw_nir_opt_peephole_ffma.c \
> diff --git a/src/intel/compiler/brw_compiler.h b/src/intel/compiler/brw_compiler.h
> index e4c22e31177..d8e7717e867 100644
> --- a/src/intel/compiler/brw_compiler.h
> +++ b/src/intel/compiler/brw_compiler.h
> @@ -468,6 +468,15 @@ struct brw_image_param {
>   */
>  #define BRW_SHADER_TIME_STRIDE 64
>
> +struct brw_ubo_range
> +{
> +   // XXX: jason says that 255 won't be enough for vulkan - we may have
> +   // large amounts of UBOs in the future.  use uint16_t.
> +   uint8_t block;
> +   uint8_t start;
> +   uint8_t length;
> +};
> +
>  struct brw_stage_prog_data {
> struct {
>/** size of our binding table. */
> @@ -488,6 +497,8 @@ struct brw_stage_prog_data {
>/** @} */
> } binding_table;
>
> +   struct brw_ubo_range ubo_ranges[4];
> +
> GLuint nr_params;   /**< number of float params/constants */
> GLuint nr_pull_params;
> unsigned nr_image_params;
> diff --git a/src/intel/compiler/brw_nir.h b/src/intel/compiler/brw_nir.h
> index 5d866b86ac8..560027c3662 100644
> --- a/src/intel/compiler/brw_nir.h
> +++ b/src/intel/compiler/brw_nir.h
> @@ -142,6 +142,10 @@ void brw_nir_setup_glsl_uniforms(nir_shader *shader,
>  void brw_nir_setup_arb_uniforms(nir_shader *shader, struct gl_program *prog,
>  struct brw_stage_prog_data *stage_prog_data);
>
> +void brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler,
> +nir_shader *nir,
> +struct brw_ubo_range out_ranges[4]);
> +
>  bool brw_nir_opt_peephole_ffma(nir_shader *shader);
>
>  #define BRW_NIR_FRAG_OUTPUT_INDEX_SHIFT 0
> diff --git a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
> new file mode 100644
> index 000..3535e67758c
> --- /dev/null
> +++ b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
> @@ -0,0 +1,271 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "brw_nir.h"
> +#include "compiler/nir/nir.h"
> +#include "util/u_dynarray.h"
> +
> +/**
> + * \file brw_nir_analyze_ubo_ranges.c
> + *
> + * This pass decides which portions of UBOs to upload as push constants,
> + * so shaders can access them as part of the thread payload, rather than
> + * having to issue expensive memory reads to pull the data.
> + *
> + * The 3DSTATE_CONSTANT_* mechanism can push data from up to 4 different
> + * buffers, in GRF (256-bit/32-byte)

Re: [Mesa-dev] [PATCH] i965: Use brw_bo_wait() for brw_bo_wait_rendering()

2017-07-10 Thread Kenneth Graunke

On Friday, July 7, 2017 5:12:54 AM PDT Chris Wilson wrote:
> Currently, we use set_domain() to cause a stall on rendering. But the
> set-domain ioctl has the side-effect of changing the kernel's cache
> domain underneath the struct_mutex, which may perturb state if there was
> no rendering to wait upon and in general is much heavier than the
> lockless wait-ioctl. Historically libdrm used set-domain as we did not
> have an explicit wait-ioctl (and the patches to teach it to use wait if
> available were lost in the mists). Since mesa already depends upon a
> kernel that supports the wait-ioctl, we do not need to supply a fallback.
> 
> Signed-off-by: Chris Wilson 
> Cc: Daniel Vetter 
> Cc: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_bufmgr.c            | 8 +++++---
>  src/mesa/drivers/dri/i965/brw_bufmgr.h            | 2 +-
>  src/mesa/drivers/dri/i965/brw_context.c           | 2 +-
>  src/mesa/drivers/dri/i965/brw_performance_query.c | 2 +-
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c     | 4 ++--
>  5 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> index da12a13152..ee4a5cfa2c 100644
> --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> @@ -831,10 +831,12 @@ brw_bo_get_subdata(struct brw_bo *bo, uint64_t offset,
>  
>  /** Waits for all GPU rendering with the object to have completed. */
>  void
> -brw_bo_wait_rendering(struct brw_context *brw, struct brw_bo *bo)
> +brw_bo_wait_rendering(struct brw_bo *bo)
>  {
> -   set_domain(brw, "waiting for",
> -  bo, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
> +   /* We require a kernel recent enough for WAIT_IOCTL support.
> +    * See intel_init_bufmgr()
> +    */
> +   brw_bo_wait(bo, -1);
>  }
>  
>  /**
> diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h
> index 4d671b6aae..80c71825e8 100644
> --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h
> +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h
> @@ -227,7 +227,7 @@ int brw_bo_get_subdata(struct brw_bo *bo, uint64_t offset,
>   * bo_subdata, etc.  It is merely a way for the driver to implement
>   * glFinish.
>   */
> -void brw_bo_wait_rendering(struct brw_context *brw, struct brw_bo *bo);
> +void brw_bo_wait_rendering(struct brw_bo *bo);
>  
>  /**
>   * Tears down the buffer manager instance.
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
> index 0b3fdc6842..8a3ffab443 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -256,7 +256,7 @@ intel_finish(struct gl_context * ctx)
> intel_glFlush(ctx);
>  
> if (brw->batch.last_bo)
> -  brw_bo_wait_rendering(brw, brw->batch.last_bo);
> +  brw_bo_wait_rendering(brw->batch.last_bo);
>  }
>  
>  static void
> diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c b/src/mesa/drivers/dri/i965/brw_performance_query.c
> index 81389dbd3e..e4e1854bf2 100644
> --- a/src/mesa/drivers/dri/i965/brw_performance_query.c
> +++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
> @@ -1350,7 +1350,7 @@ brw_wait_perf_query(struct gl_context *ctx, struct gl_perf_query_object *o)
> if (brw_batch_references(&brw->batch, bo))
>intel_batchbuffer_flush(brw);
>  
> -   brw_bo_wait_rendering(brw, bo);
> +   brw_bo_wait_rendering(bo);
>  
> /* Due to a race condition between the OA unit signaling report
>  * availability and the report actually being written into memory,
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> index 62d2fe8ef3..28c2f474c0 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> @@ -497,7 +497,7 @@ throttle(struct brw_context *brw)
>  /* Pass NULL rather than brw so we avoid perf_debug warnings;
>   * stalling is common and expected here...
>   */
> -brw_bo_wait_rendering(NULL, brw->throttle_batch[1]);
> +brw_bo_wait_rendering(brw->throttle_batch[1]);
>   }
>   brw_bo_unreference(brw->throttle_batch[1]);
>}
> @@ -723,7 +723,7 @@ _intel_batchbuffer_flush_fence(struct brw_context *brw,
>  
> if (unlikely(INTEL_DEBUG & DEBUG_SYNC)) {
>fprintf(stderr, "waiting for idle\n");
> -  brw_bo_wait_rendering(brw, brw->batch.bo);
> +  brw_bo_wait_rendering(brw->batch.bo);
> }
>  
> /* Start a new batch buffer. */
> 

Reviewed-by: Kenneth Graunke 

and pushed:

To ssh://git.freedesktop.org/git/mesa/mesa
   3b28eaabf60..833108ac14a  master -> master


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] [PATCH] nir: copy front interpolation when creating fake back color input

2017-07-10 Thread Andres Gomez

I'll leave it out, then.

Thanks for the feedback! ☺

On Mon, 2017-07-10 at 13:35 -0400, Ilia Mirkin wrote:
> I wouldn't object to it being in stable, but it's also not
> super-important. It does fix some piglits for freedreno though. (I
> don't think vc4 exposes GL 3.0, so the problematic condition can't
> happen there.)
> 
> On Mon, Jul 10, 2017 at 1:30 PM, Andres Gomez  wrote:
> > Ilia, would we want this patch in -stable ?
> > 
> > On Fri, 2017-07-07 at 20:34 -0400, Ilia Mirkin wrote:
> > > Fixes a bunch of gl_BackColor interpolation tests that had explicit
> > > interpolation specified on the fragment shader gl_Color.
> > > 
> > > Signed-off-by: Ilia Mirkin 
> > > ---
> > >  src/compiler/nir/nir_lower_two_sided_color.c | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/src/compiler/nir/nir_lower_two_sided_color.c b/src/compiler/nir/nir_lower_two_sided_color.c
> > > index 7d1a3bd236d..90da1013ec8 100644
> > > --- a/src/compiler/nir/nir_lower_two_sided_color.c
> > > +++ b/src/compiler/nir/nir_lower_two_sided_color.c
> > > @@ -46,7 +46,8 @@ typedef struct {
> > >   */
> > > 
> > >  static nir_variable *
> > > -create_input(nir_shader *shader, unsigned drvloc, gl_varying_slot slot)
> > > +create_input(nir_shader *shader, unsigned drvloc, gl_varying_slot slot,
> > > + enum glsl_interp_mode interpolation)
> > >  {
> > > nir_variable *var = rzalloc(shader, nir_variable);
> > > 
> > > @@ -56,6 +57,7 @@ create_input(nir_shader *shader, unsigned drvloc, gl_varying_slot slot)
> > > var->name = ralloc_asprintf(var, "in_%d", drvloc);
> > > var->data.index = 0;
> > > var->data.location = slot;
> > > +   var->data.interpolation = interpolation;
> > > 
> > > exec_list_push_tail(&shader->inputs, &var->node);
> > > 
> > > @@ -116,7 +118,9 @@ setup_inputs(lower_2side_state *state)
> > >else
> > >   slot = VARYING_SLOT_BFC1;
> > > 
> > > -  state->colors[i].back = create_input(state->shader, ++maxloc, slot);
> > > +  state->colors[i].back = create_input(
> > > +state->shader, ++maxloc, slot,
> > > +state->colors[i].front->data.interpolation);
> > > }
> > > 
> > > return 0;
> > 
> > --
> > Br,
> > 
> > Andres
> 
> ___
> mesa-stable mailing list
> mesa-sta...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-stable
-- 
Br,

Andres

Re: [Mesa-dev] [Mesa-stable] [PATCH] gallium: improve selection of pixel format

2017-07-10 Thread Andres Gomez

Great!

Thanks for the feedback! ☺

On Mon, 2017-07-10 at 12:18 +0200, Olivier Lauffenburger wrote:
> Technically, a correctly written application should not rely on
> ChoosePixelFormat but should enumerate by itself through all the
> pixel formats and select the best one with its own algorithm.
> 
> However, for the (numerous) applications that use ChoosePixelFormat,
> color depth is more a question of quality of rendering, whereas depth
> and stencil buffers are a question of correct/incorrect rendering.
> 
> The current method gives a higher priority to color depth than to
> depth and stencil buffer depth, to the point that a different color
> depth disables the stencil buffer.
> 
> To make things even worse, many applications (including GLUT)
> incorrectly set ppfd->cColorBits to 32 instead of 24, although the
> documentation clearly states that "For RGBA pixel types, it is the
> size of the color buffer, EXCLUDING THE ALPHA BITPLANES" (emphasis is
> mine).
> 
> As a result, those applications never get a stencil buffer, which
> leads to incorrect rendering. I stumbled on this problem while trying
> out OpenCSG, and it took me a while to discover that the wrong
> results were caused by the absence of a stencil buffer requested by
> GLUT.
> 
> Although there is no universal selection algorithm, the one I suggest
> tries to enforce the following policy:
> 
> - Most important is to enable all the buffers (depth, stencil,
> accumulation...) that are requested (correctness).
> - Then, try to allocate at least as many bits as requested (quality +
> performance).
> - Least important, try not to allocate more bits than requested
> (economy).
> 
> This algorithm seems to be in line with the behavior of most Windows
> drivers (Microsoft, NVIDIA, AMD) and, more importantly, I can't imagine
> a sensible scenario where this change would break an existing
> application.
> 
> -Olivier
> 
> -----Original Message-----
> From: Andres Gomez [mailto:ago...@igalia.com]
> Sent: Saturday, July 8, 2017 22:08
> To: Olivier Lauffenburger; mesa-dev@lists.freedesktop.org
> Cc: mesa-sta...@lists.freedesktop.org; Brian Paul
> Subject: Re: [Mesa-dev] [PATCH] gallium: improve selection of pixel format
> 
> Olivier, Brian, do we want this into -stable?
> 
> On Thu, 2017-07-06 at 17:08 +0200, Olivier Lauffenburger wrote:
> > Current selection of pixel format does not enforce the request for a
> > stencil or depth buffer if the color depth is not the same as
> > requested.
> > For instance, GLUT requests a 32-bit color buffer with an 8-bit
> > stencil buffer, but because color buffers are only 24-bit, no
> > priority is given to creating a stencil buffer.
> >
> > This patch gives more priority to the creation of requested
> > buffers and less priority to the difference in bit depth.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101703
> > 
> > Signed-off-by: Olivier Lauffenburger 
> > ---
> >  src/gallium/state_trackers/wgl/stw_pixelformat.c | 36 
> > +++-
> >  1 file changed, 29 insertions(+), 7 deletions(-)
> > 
> > diff --git a/src/gallium/state_trackers/wgl/stw_pixelformat.c b/src/gallium/state_trackers/wgl/stw_pixelformat.c
> > index 7763f71cbc..833308d964 100644
> > --- a/src/gallium/state_trackers/wgl/stw_pixelformat.c
> > +++ b/src/gallium/state_trackers/wgl/stw_pixelformat.c
> > @@ -432,17 +432,39 @@ stw_pixelformat_choose(HDC hdc, CONST PIXELFORMATDESCRIPTOR *ppfd)
> >    !!(pfi->pfd.dwFlags & PFD_DOUBLEBUFFER))
> >   continue;
> >  
> > -  /* FIXME: Take in account individual channel bits */
> > -  if (ppfd->cColorBits != pfi->pfd.cColorBits)
> > - delta += 8;
> > +  /* Selection logic:
> > +  * - Enabling a feature (depth, stencil...) is given highest priority.
> > +  * - Giving as many bits as requested is given medium priority.
> > +  * - Giving no more bits than requested is given lowest priority.
> > +  */
> >  
> > -  if (ppfd->cDepthBits != pfi->pfd.cDepthBits)
> > - delta += 4;
> > +  /* FIXME: Take in account individual channel bits */
> > +  if (ppfd->cColorBits && !pfi->pfd.cColorBits)
> > + delta += 1;
> > +  else if (ppfd->cColorBits > pfi->pfd.cColorBits)
> > + delta += 100;
> > +  else if (ppfd->cColorBits < pfi->pfd.cColorBits)
> > + delta++;
> >  
> > -  if (ppfd->cStencilBits != pfi->pfd.cStencilBits)
> > +  if (ppfd->cDepthBits && !pfi->pfd.cDepthBits)
> > + delta += 1;
> > +  else if (ppfd->cDepthBits > pfi->pfd.cDepthBits)
> > + delta += 200;
> > +  else if (ppfd->cDepthBits < pfi->pfd.cDepthBits)
> >   delta += 2;
> >  
> > -  if (ppfd->cAlphaBits != pfi->pfd.cAlphaBits)
> > +  if (ppfd->cStencilBits && !pfi->pfd.cStencilBits)
> > + delta += 1;
> > +  else if (ppfd->cStencilBits > pfi->pfd.cStencilBits)
> > + delta += 400;
> > +  else if (
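
The three-tier policy Olivier describes (enable every requested buffer, then give at least the requested bits, then avoid giving more) can be sketched as a penalty score in which a missing buffer outweighs any combination of lower-tier penalties. This is a simplified model, not the stw_pixelformat.c code: `struct pf_bits`, the helper names and the weights are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified pixel format: only the bit counts the policy
 * looks at. Not the real PIXELFORMATDESCRIPTOR. */
struct pf_bits {
   unsigned color;
   unsigned depth;
   unsigned stencil;
};

/* Illustrative weights (assumed, not the patch's values). A single missing
 * buffer must outweigh all lower-tier penalties combined (3 * 100 here). */
enum {
   PENALTY_MISSING = 10000, /* requested buffer absent: incorrect rendering */
   PENALTY_FEWER   = 100,   /* fewer bits than requested: lower quality */
   PENALTY_MORE    = 1,     /* more bits than requested: wasted memory */
};

static unsigned
channel_penalty(unsigned requested, unsigned available)
{
   if (requested && !available)
      return PENALTY_MISSING;
   if (requested > available)
      return PENALTY_FEWER;
   if (requested < available)
      return PENALTY_MORE;
   return 0;
}

static unsigned
pf_score(const struct pf_bits *req, const struct pf_bits *cand)
{
   return channel_penalty(req->color, cand->color) +
          channel_penalty(req->depth, cand->depth) +
          channel_penalty(req->stencil, cand->stencil);
}

/* Returns the index of the lowest-scoring candidate (the first one on ties). */
static size_t
pf_choose(const struct pf_bits *req, const struct pf_bits *cands, size_t n)
{
   assert(n > 0);
   size_t best = 0;
   unsigned best_score = pf_score(req, &cands[0]);

   for (size_t i = 1; i < n; i++) {
      unsigned s = pf_score(req, &cands[i]);
      if (s < best_score) {
         best = i;
         best_score = s;
      }
   }
   return best;
}
```

With these weights, a GLUT-style request for 32 color bits and 8 stencil bits picks a 24-bit format that has a stencil buffer over one that lacks it, even though neither matches the color depth exactly.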

Re: [Mesa-dev] [PATCH] nir: copy front interpolation when creating fake back color input

2017-07-10 Thread Ilia Mirkin

I wouldn't object to it being in stable, but it's also not
super-important. It does fix some piglits for freedreno though. (I
don't think vc4 exposes GL 3.0, so the problematic condition can't
happen there.)

On Mon, Jul 10, 2017 at 1:30 PM, Andres Gomez  wrote:
> Ilia, would we want this patch in -stable ?
>
> On Fri, 2017-07-07 at 20:34 -0400, Ilia Mirkin wrote:
>> Fixes a bunch of gl_BackColor interpolation tests that had explicit
>> interpolation specified on the fragment shader gl_Color.
>>
>> Signed-off-by: Ilia Mirkin 
>> ---
>>  src/compiler/nir/nir_lower_two_sided_color.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/compiler/nir/nir_lower_two_sided_color.c b/src/compiler/nir/nir_lower_two_sided_color.c
>> index 7d1a3bd236d..90da1013ec8 100644
>> --- a/src/compiler/nir/nir_lower_two_sided_color.c
>> +++ b/src/compiler/nir/nir_lower_two_sided_color.c
>> @@ -46,7 +46,8 @@ typedef struct {
>>   */
>>
>>  static nir_variable *
>> -create_input(nir_shader *shader, unsigned drvloc, gl_varying_slot slot)
>> +create_input(nir_shader *shader, unsigned drvloc, gl_varying_slot slot,
>> + enum glsl_interp_mode interpolation)
>>  {
>> nir_variable *var = rzalloc(shader, nir_variable);
>>
>> @@ -56,6 +57,7 @@ create_input(nir_shader *shader, unsigned drvloc, gl_varying_slot slot)
>> var->name = ralloc_asprintf(var, "in_%d", drvloc);
>> var->data.index = 0;
>> var->data.location = slot;
>> +   var->data.interpolation = interpolation;
>>
>> exec_list_push_tail(&shader->inputs, &var->node);
>>
>> @@ -116,7 +118,9 @@ setup_inputs(lower_2side_state *state)
>>else
>>   slot = VARYING_SLOT_BFC1;
>>
>> -  state->colors[i].back = create_input(state->shader, ++maxloc, slot);
>> +  state->colors[i].back = create_input(
>> +state->shader, ++maxloc, slot,
>> +state->colors[i].front->data.interpolation);
>> }
>>
>> return 0;
> --
> Br,
>
> Andres

Re: [Mesa-dev] [PATCH] nir: copy front interpolation when creating fake back color input

2017-07-10 Thread Andres Gomez

Ilia, would we want this patch in -stable?

On Fri, 2017-07-07 at 20:34 -0400, Ilia Mirkin wrote:
> Fixes a bunch of gl_BackColor interpolation tests that had explicit
> interpolation specified on the fragment shader gl_Color.
> 
> Signed-off-by: Ilia Mirkin 
> ---
>  src/compiler/nir/nir_lower_two_sided_color.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/src/compiler/nir/nir_lower_two_sided_color.c b/src/compiler/nir/nir_lower_two_sided_color.c
> index 7d1a3bd236d..90da1013ec8 100644
> --- a/src/compiler/nir/nir_lower_two_sided_color.c
> +++ b/src/compiler/nir/nir_lower_two_sided_color.c
> @@ -46,7 +46,8 @@ typedef struct {
>   */
>  
>  static nir_variable *
> -create_input(nir_shader *shader, unsigned drvloc, gl_varying_slot slot)
> +create_input(nir_shader *shader, unsigned drvloc, gl_varying_slot slot,
> + enum glsl_interp_mode interpolation)
>  {
> nir_variable *var = rzalloc(shader, nir_variable);
>  
> @@ -56,6 +57,7 @@ create_input(nir_shader *shader, unsigned drvloc, gl_varying_slot slot)
> var->name = ralloc_asprintf(var, "in_%d", drvloc);
> var->data.index = 0;
> var->data.location = slot;
> +   var->data.interpolation = interpolation;
>  
> exec_list_push_tail(&shader->inputs, &var->node);
>  
> @@ -116,7 +118,9 @@ setup_inputs(lower_2side_state *state)
>else
>   slot = VARYING_SLOT_BFC1;
>  
> -  state->colors[i].back = create_input(state->shader, ++maxloc, slot);
> +  state->colors[i].back = create_input(
> +state->shader, ++maxloc, slot,
> +state->colors[i].front->data.interpolation);
> }
>  
> return 0;
-- 
Br,

Andres

Re: [Mesa-dev] [Mesa-stable] [Fwd: Re: [PATCH 1/2] intel/isl: Use uint64_t to store total surface size]

2017-07-10 Thread Andres Gomez

OK, thanks for the feedback! ☺

On Mon, 2017-07-10 at 09:46 -0700, Anuj Phogat wrote:
> These patches are not fixing any known bug. But they are adding the
> previously missing surface size limits for the hardware. It's really hard
> to hit these limits. But, let's pick them to stable for the sake of
> completeness.
> Thanks for marking them for mesa-stable.
> 
> 
> On Fri, Jul 7, 2017 at 7:25 AM, Andres Gomez  wrote:
> > On Thu, 2017-07-06 at 18:21 +0100, Emil Velikov wrote:
> > > On 3 July 2017 at 21:14, Andres Gomez  wrote:
> > > > Actually, forgot to add -stable into CC.
> > > > 
> > > >  Forwarded Message 
> > > > From: Andres Gomez 
> > > > To: Anuj Phogat , mesa-dev@lists.freedesktop.org
> > > > Subject: Re: [Mesa-dev] [PATCH 1/2] intel/isl: Use uint64_t to store
> > > > total surface size
> > > > Date: Mon, 03 Jul 2017 23:02:31 +0300
> > > > 
> > > > It looks like we could want these 2 into -stable (?)
> > > > 
> > > 
> > > Shouldn't hurt, even though most of the
> > > isl_surf_init/isl_surf_get_[a-z]_surf handling is a simple assert(ok).
> > > I'll leave the call to the experts, but my take is "don't bother".
> > 
> > OK, I'll wait to see if Anuj has anything to say before picking this
> > one, then.
> > 
> > Thanks for the feedback!
> > 
> > --
> > Br,
> > 
> > Andres
> 
-- 
Br,

Andres
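
As a concrete illustration of why a 32-bit total size can be too small, here is a minimal sketch with illustrative dimensions. It uses a simplified linear-layout calculation (no padding, tiling or alignment), not ISL's actual size computation, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: size in bytes of a simple linear 2D surface,
 * width * height * bytes per pixel, ignoring padding and alignment. */
static uint32_t
surface_size_u32(uint32_t width, uint32_t height, uint32_t cpp)
{
   /* Unsigned 32-bit arithmetic silently wraps modulo 2^32. */
   return width * height * cpp;
}

static uint64_t
surface_size_u64(uint32_t width, uint32_t height, uint32_t cpp)
{
   /* Widen before multiplying so the whole product is computed in 64 bits. */
   return (uint64_t)width * height * cpp;
}

/* 16384 x 16384 pixels at 16 bytes per pixel is exactly 4 GiB (2^32 bytes):
 * it wraps to 0 in 32 bits but is represented correctly in 64 bits. */
static void
check_4gib_example(void)
{
   assert(surface_size_u32(16384, 16384, 16) == 0);
   assert(surface_size_u64(16384, 16384, 16) == UINT64_C(4294967296));
}
```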

[Mesa-dev] [PATCH 18/20] i965/fs: Match destination type to size for ballot

2017-07-10 Thread Matt Turner

No use in taking a 64-bit value when we know the high 32 bits are zero.
---
v2: Update for v2 of 16/20 (Connor)

 src/intel/compiler/brw_compiler.c | 2 +-
 src/intel/compiler/brw_fs_nir.cpp | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_compiler.c b/src/intel/compiler/brw_compiler.c
index b910fcbc3d..195dbf2a02 100644
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -57,7 +57,7 @@ static const struct nir_shader_compiler_options scalar_nir_options = {
.lower_unpack_snorm_4x8 = true,
.lower_unpack_unorm_2x16 = true,
.lower_unpack_unorm_4x8 = true,
-   .max_subgroup_size = 64, /* FIXME */
+   .max_subgroup_size = 32,
.max_unroll_iterations = 32,
 };
 
diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index 25e9b703eb..9c69ade6e5 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4147,7 +4147,11 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
   bld.exec_all().MOV(flag, brw_imm_ud(0u));
   bld.CMP(bld.null_reg_ud(), value, brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
 
-  dest.type = BRW_REGISTER_TYPE_UQ;
+  if (instr->dest.ssa.bit_size > 32) {
+ dest.type = BRW_REGISTER_TYPE_UQ;
+  } else {
+ dest.type = BRW_REGISTER_TYPE_UD;
+  }
   bld.MOV(dest, flag);
   break;
}
-- 
2.13.0


[Mesa-dev] [PATCH 16/20] nir: Reduce destination size of ballot intrinsic when possible

2017-07-10 Thread Matt Turner

Some hardware, like i965, doesn't support subgroup sizes greater than 32.
In that case, we can reduce the destination size of the ballot
intrinsic, which will simplify our code generation.
---
v2: Just change the intrinsic size, and don't add a new intrinsic (Connor)

 src/compiler/nir/nir.h                |  2 ++
 src/compiler/nir/nir_opt_intrinsics.c | 18 ++++++++++++++++++
 src/intel/compiler/brw_compiler.c     |  1 +
 3 files changed, 21 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 1e2d7d3cf6..5518807b0b 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1842,6 +1842,8 @@ typedef struct nir_shader_compiler_options {
 */
bool use_interpolated_input_intrinsics;
 
+   unsigned max_subgroup_size;
+
unsigned max_unroll_iterations;
 } nir_shader_compiler_options;
 
diff --git a/src/compiler/nir/nir_opt_intrinsics.c b/src/compiler/nir/nir_opt_intrinsics.c
index 0358680aae..d30c1cf6bb 100644
--- a/src/compiler/nir/nir_opt_intrinsics.c
+++ b/src/compiler/nir/nir_opt_intrinsics.c
@@ -62,6 +62,24 @@ opt_intrinsics_impl(nir_function_impl *impl)
 replacement = nir_imm_int(&b, NIR_TRUE);
 break;
  }
+ case nir_intrinsic_ballot: {
+assert(b.shader->options->max_subgroup_size != 0);
+if (b.shader->options->max_subgroup_size > 32 ||
+intrin->dest.ssa.bit_size <= 32)
+   continue;
+
+nir_intrinsic_instr *ballot =
+   nir_intrinsic_instr_create(b.shader, nir_intrinsic_ballot);
+nir_ssa_dest_init(&ballot->instr, &ballot->dest, 1, 32, NULL);
+ballot->src[0] = intrin->src[0];
+
+nir_builder_instr_insert(&b, &ballot->instr);
+
+replacement = nir_pack_64_2x32_split(&b,
+ &ballot->dest.ssa,
+ nir_imm_int(&b, 0));
+break;
+ }
  default:
 break;
  }
diff --git a/src/intel/compiler/brw_compiler.c b/src/intel/compiler/brw_compiler.c
index 397c8cccf9..b910fcbc3d 100644
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -57,6 +57,7 @@ static const struct nir_shader_compiler_options scalar_nir_options = {
.lower_unpack_snorm_4x8 = true,
.lower_unpack_unorm_2x16 = true,
.lower_unpack_unorm_4x8 = true,
+   .max_subgroup_size = 64, /* FIXME */
.max_unroll_iterations = 32,
 };
 
-- 
2.13.0
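
The lowering in this patch rests on a simple fact: if no subgroup has more than 32 invocations, bits 32..63 of the 64-bit ballot are always zero, so the value can be computed in 32 bits and zero-extended, which is what `nir_pack_64_2x32_split(ballot32, 0)` expresses (low half first, high half second). A CPU-side sketch of that reasoning, using a hypothetical per-lane boolean array as a stand-in for the subgroup (this is not NIR or backend code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of ballot: bit i is set when lane i's condition is
 * true. Valid only for subgroups of at most 32 lanes. */
static uint32_t
ballot32(const bool *cond, unsigned subgroup_size)
{
   assert(subgroup_size <= 32);
   uint32_t mask = 0;
   for (unsigned i = 0; i < subgroup_size; i++) {
      if (cond[i])
         mask |= UINT32_C(1) << i;
   }
   return mask;
}

/* CPU equivalent of nir_pack_64_2x32_split(lo, hi): lo goes in bits 0..31,
 * hi in bits 32..63. */
static uint64_t
pack_64_2x32_split(uint32_t lo, uint32_t hi)
{
   return ((uint64_t)hi << 32) | lo;
}

/* 64-bit ballot for a subgroup of at most 32 lanes: the high half is zero. */
static uint64_t
ballot64_from_32(const bool *cond, unsigned subgroup_size)
{
   return pack_64_2x32_split(ballot32(cond, subgroup_size), 0);
}
```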


Re: [Mesa-dev] [PATCH 16/20] nir: Add a ballot32 intrinsic

2017-07-10 Thread Matt Turner

On Thu, Jul 6, 2017 at 7:39 PM, Connor Abbott  wrote:
> I've thought about this a little bit, and I think we'd rather just
> decrease the bitsize of the intrinsic rather than add a whole new one.
> The separate intrinsic isn't really buying you anything, I don't think
> it's going to make anything simpler.

Thanks. That's a good idea. I've fixed that locally.

I think I tried doing that during development and ran into some other
problem. Regardless, it all works now. I'll resend 16/20 and 18/20.

Re: [Mesa-dev] [PATCH 03/20] nir: Support lowering vote intrinsics

2017-07-10 Thread Matt Turner

On Thu, Jul 6, 2017 at 8:04 PM, Connor Abbott  wrote:
> On Thu, Jul 6, 2017 at 4:48 PM, Matt Turner  wrote:
>> ... trivially (as allowed by the spec!) by reusing the existing
>> nir_opt_intrinsics code.
>> ---
>>  src/compiler/nir/nir.h| 4 
>>  src/compiler/nir/nir_opt_intrinsics.c | 6 +++---
>>  2 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index 44a1d0887e..401c41f155 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -1821,6 +1821,10 @@ typedef struct nir_shader_compiler_options {
>> bool lower_extract_byte;
>> bool lower_extract_word;
>>
>> +   bool lower_vote_any;
>> +   bool lower_vote_all;
>> +   bool lower_vote_eq;
>
> Since there are potentially multiple ways to lower these (voteAny(x)
> -> !voteAll(!x), using ballotARB(), etc.), and the way they're lowered
> is a little... unexpected (although admittedly legal!), why don't we
> use a more descriptive name, like lower_vote_*_trivial? While we're at
> it, I highly doubt that an implementation would want this kind of
> lowering for just one of the intrinsics, so we can merge this into a
> single flag, say lower_vote_trivial.

Thanks, both good ideas. I've replaced all three fields with a
lower_vote_trivial field.
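
Connor's point that there are several legal lowerings can be checked with a small CPU-side model. The sketch below treats a subgroup as an array of per-lane booleans (a hypothetical stand-in, not NIR code). It shows the `voteAny(x) == !voteAll(!x)` identity and, as I read the patch description, the trivial lowering: if each invocation is its own subgroup of size 1, voteAny(x) and voteAll(x) both reduce to x and voteEq(x) to true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU model of the vote intrinsics over one subgroup. */
static bool
vote_any(const bool *x, unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      if (x[i])
         return true;
   return false;
}

static bool
vote_all(const bool *x, unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      if (!x[i])
         return false;
   return true;
}

/* True when every lane holds the same value. */
static bool
vote_eq(const bool *x, unsigned n)
{
   for (unsigned i = 1; i < n; i++)
      if (x[i] != x[0])
         return false;
   return true;
}

/* Alternative lowering from the review: any(x) == !all(!x).
 * Assumes at most 64 lanes. */
static bool
vote_any_via_all(const bool *x, unsigned n)
{
   bool not_x[64];
   assert(n <= 64);
   for (unsigned i = 0; i < n; i++)
      not_x[i] = !x[i];
   return !vote_all(not_x, n);
}
```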

Re: [Mesa-dev] [EGL android: accquire fence implementation] i965: Queue the buffer with a sync fence for Android OS

2017-07-10 Thread Emil Velikov

On 10 July 2017 at 15:26, Marathe, Yogesh  wrote:
> Hello Emil, My two cents since I too spent some time on this.
>
>> -Original Message-
>> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf
>> Of Emil Velikov
>> Sent: Monday, July 10, 2017 4:41 PM
>> To: Wu, Zhongmin 
>> Cc: Widawsky, Benjamin ; Liu, Zhiquan
>> ; Eric Engestrom ; Rob Clark
>> ; Tomasz Figa ; Kenneth
>> Graunke ; Kondapally, Kalyan
>> ; ML mesa-dev > d...@lists.freedesktop.org>; Timothy Arceri ; Chuanbo
>> Weng 
>> Subject: Re: [Mesa-dev] [EGL android: accquire fence implementation] i965:
>> Queue the buffer with a sync fence for Android OS
>>
>> Hi Zhongmin Wu,
>>
>> Above all, a bit of a disclaimer: I'm by no means an expert on the topic, so take
>> the following with a pinch of salt.
>>
>> On 10 July 2017 at 03:11, Zhongmin Wu  wrote:
>> > Previously we queued the buffer with an invalid fence (-1), which
>> > caused some benchmarks, such as flatland, to fail.
>> >
>> > Now we get the out fence while flushing the buffer and then pass it
>> > to SurfaceFlinger in the eglSwapBuffers function.
>> >
>> Having a closer look it seems that the issue can be summarised as follows:
>>  - flatland intercepts/interacts ANativeWindow::{de,}queueBuffer (how about
>> ANativeWindow::cancelBuffer?)
>>  - the program expects that a sync fd is available for both dequeue and queue
>>
>> At the same time:
>>  - the ANativeWindow documentation does _not_ state such a requirement
>>  - even if it did, that would be somewhat wrong, since
>> ANativeWindow::queueBuffer is called by eglSwapBuffers(), whose
>> documentation clearly states "... performs an implicit flush ... glFlush
>> ... vgFlush"
>>
>> My take is that if flatland/the Android framework does want an explicit sync
>> point, it should insert one with the EGL API.
>> There could be alternative solutions, but the proposed patch seems wrong
>> IMHO.
>
> In fact, I could work around this in the producer (Surface::queueBuffer) by
> ignoring the -1 passed in and creating a sync object using the EGL APIs. I
> see two problems with that.
>
> - Before getting an fd using eglDupNativeFenceFDANDROID(), you need a
>   glFlush(). This costs additional cycles for each queueBuffer transaction
>   on each BufferItem, and I believe the fd is also signaled because of it
>   (so I don't know what we would gain by waiting on that fd on the
>   consumer side).
> - AFAIK, the whole idea of explicit sync revolves around being able to pass
>   fds created by the driver between processes, and this breaks that chain.
>   If we work around this in upper layers, the explicit sync feature will
>   have to be fixed for every other library that may use Mesa underneath.
>
> For these reasons, I still believe we should fix it here. Of course, you and
> Rob have very valid points about cancelBuffer and about not breaking
> gallium, respectively; those need to be taken care of.
>
What I'm saying is that the app/framework seems to do something silly,
or at least undocumented.
Fixing things in Mesa may be the right thing to do, but without more
information, it's anyone's guess who got it wrong.

As Rob asked earlier, can we get a simple test case or some pseudo
code illustrating the whole thing?

Thanks
Emil

Re: [Mesa-dev] [PATCH] svga: fix breakage in create_backed_surface_view()

2017-07-10 Thread Charmaine Lee


Reviewed-by: Charmaine Lee 

From: Brian Paul 
Sent: Sunday, July 9, 2017 1:00 PM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee
Subject: [PATCH] svga: fix breakage in create_backed_surface_view()

This fixes a regression in some piglit tests since commit 5e5d5f1a2eb.
I think I mis-resolved the merge conflict when cherry-picking that
commit to master.
---
 src/gallium/drivers/svga/svga_surface.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_surface.c b/src/gallium/drivers/svga/svga_surface.c
index 1f50b9c..d7c9850 100644
--- a/src/gallium/drivers/svga/svga_surface.c
+++ b/src/gallium/drivers/svga/svga_surface.c
@@ -446,6 +446,8 @@ create_backed_surface_view(struct svga_context *svga, struct svga_surface *s)
  goto done;

   s->backed = svga_surface(backed_view);
+
+  SVGA_STATS_TIME_POP(svga_sws(svga));
}
else if (s->backed->age < tex->age) {
   /*
@@ -474,12 +476,9 @@ create_backed_surface_view(struct svga_context *svga, struct svga_surface *s)
 bs->key.numMipLevels,
 bs->key.numFaces * bs->key.arraySize,
 zslice, s->base.u.tex.level, layer);
-
-  svga_mark_surface_dirty(&s->backed->base);
-
-  SVGA_STATS_TIME_POP(svga_sws(svga));
}

+   svga_mark_surface_dirty(&s->backed->base);
s->backed->age = tex->age;

 done:
--
1.9.1


Re: [Mesa-dev] [PATCH v3 08/16] anv: Transition more color buffer layouts

2017-07-10 Thread Jason Ekstrand

On Wed, Jun 28, 2017 at 2:14 PM, Nanley Chery  wrote:

> v2: Expound on comment for the pipe controls (Jason Ekstrand).
>
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_blorp.c   |   4 +-
>  src/intel/vulkan/genX_cmd_buffer.c | 183 ++
> +++
>  2 files changed, 167 insertions(+), 20 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index 459d57ec57..84b01e8792 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -1451,7 +1451,9 @@ anv_image_ccs_clear(struct anv_cmd_buffer *cmd_buffer,
>
> struct blorp_surf surf;
> get_blorp_surf_for_anv_image(image, VK_IMAGE_ASPECT_COLOR_BIT,
> -image->aux_usage, &surf);
> +image->aux_usage == ISL_AUX_USAGE_CCS_E ?
> +ISL_AUX_USAGE_CCS_E : ISL_AUX_USAGE_CCS_D,
> +&surf);
>
> /* From the Sky Lake PRM Vol. 7, "Render Target Fast Clear":
>  *
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> index decf0b28d6..1a9b841c7c 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -524,6 +524,17 @@ genX(copy_fast_clear_dwords)(struct anv_cmd_buffer *cmd_buffer,
> }
>  }
>
> +/**
> + * @brief Transitions a color buffer from one layout to another.
> + *
> + * See section 6.1.1. Image Layout Transitions of the Vulkan 1.0.50 spec for
> + * more information.
> + *
> + * @param level_count VK_REMAINING_MIP_LEVELS isn't supported.
> + * @param layer_count VK_REMAINING_ARRAY_LAYERS isn't supported. For 3D images,
> + *                    this represents the maximum layers to transition at each
> + *                    specified miplevel.
> + */
>  static void
>  transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
>  const struct anv_image *image,
> @@ -532,13 +543,27 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
>  VkImageLayout initial_layout,
>  VkImageLayout final_layout)
>  {
> -   assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> -
> -   if (image->aux_surface.isl.size == 0)
> -  return;
> -
> -   if (initial_layout != VK_IMAGE_LAYOUT_UNDEFINED &&
> -   initial_layout != VK_IMAGE_LAYOUT_PREINITIALIZED)
> +   /* Validate the inputs. */
> +   assert(cmd_buffer);
> +   assert(image && image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> +   /* These values aren't supported for simplicity's sake. */
> +   assert(level_count != VK_REMAINING_MIP_LEVELS &&
> +  layer_count != VK_REMAINING_ARRAY_LAYERS);
> +   /* Ensure the subresource range is valid. */
> +   uint64_t last_level_num = base_level + level_count;
> +   const uint32_t max_depth = anv_minify(image->extent.depth, base_level);
> +   const uint32_t image_layers = MAX2(image->array_size, max_depth);
> +   assert(base_layer + layer_count  <= image_layers);
> +   assert(last_level_num <= image->levels);
> +   /* The spec disallows these final layouts. */
> +   assert(final_layout != VK_IMAGE_LAYOUT_UNDEFINED &&
> +  final_layout != VK_IMAGE_LAYOUT_PREINITIALIZED);
> +
> +   /* No work is necessary if the layout stays the same or if this subresource
> +* range lacks auxiliary data.
> +*/
> +   if (initial_layout == final_layout ||
> +   base_layer >= anv_image_aux_layers(image, base_level))
>return;
>
> /* A transition of a 3D subresource works on all slices at a time. */
> @@ -549,22 +574,142 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
>
> /* We're interested in the subresource range subset that has aux data. */
> level_count = MIN2(level_count, anv_image_aux_levels(image));
> +   layer_count = MIN2(layer_count, anv_image_aux_layers(image, base_level));
>

Is this correct? I think we want MIN2(layer_count, anv_image_aux_layers() - base_layer),
don't we? This would also mean there's a bug in the current
level_count.
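Jason's suggested clamp can be checked with a little arithmetic. Assuming aux data covers layers [0, aux_layers) at base_level, a range that starts at base_layer can include at most aux_layers - base_layer of them, so clamping to aux_layers alone can overshoot. The sketch below is illustrative only: clamp_as_in_patch and clamp_aux_layer_count are hypothetical helpers, MIN2 is redefined locally in the style of Mesa's macro, and it assumes the early return for base_layer >= aux_layers has already happened. The same reasoning would apply to level_count and base_level.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for Mesa's MIN2() macro. */
#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* The clamp as written in the patch: ignores where the range starts. */
static uint32_t
clamp_as_in_patch(uint32_t layer_count, uint32_t aux_layers)
{
   return MIN2(layer_count, aux_layers);
}

/* Hypothetical helper for the suggested clamp: count only the layers of
 * [base_layer, base_layer + layer_count) that fall inside [0, aux_layers).
 * Assumes the caller already returned early when base_layer >= aux_layers,
 * so the subtraction cannot underflow. */
static uint32_t
clamp_aux_layer_count(uint32_t base_layer, uint32_t layer_count,
                      uint32_t aux_layers)
{
   assert(base_layer < aux_layers);
   return MIN2(layer_count, aux_layers - base_layer);
}
```

For example, with 4 aux layers, base_layer = 2 and layer_count = 4, the patch's clamp keeps all 4 layers (2 through 5), while the suggested clamp keeps only the 2 that actually have aux data.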


> +   last_level_num = base_level + level_count;
> +
> +   /* Record whether or not the layout is undefined. Pre-initialized images
> +* with auxiliary buffers have a non-linear layout and are thus undefined.
> +*/
> +   assert(image->tiling == VK_IMAGE_TILING_OPTIMAL);
> +   const bool undef_layout = initial_layout == VK_IMAGE_LAYOUT_UNDEFINED ||
> + initial_layout == VK_IMAGE_LAYOUT_PREINITIALIZED;
>
> -   /* We're transitioning from an undefined layout. We must ensure that the
> -* clear values buffer is filled with valid data.
> +   /* Do preparatory work before the resolve operation or return early if no
> +* resolve is actually needed.
>  */
> -   for (unsigned l = 0; l < level_count; l++)
> -  init_fast_clear_state_entry(cmd_buffer, image, base_level + l);
> -
> -   if (image->aux_usage == ISL_AUX_USAGE_CCS_E) {
>

Re: [Mesa-dev] [PATCH 2/2] swr: build driver proper separate from rasterizer

2017-07-10 Thread Rowley, Timothy O


On Jul 10, 2017, at 8:24 AM, Emil Velikov <emil.l.veli...@gmail.com> wrote:

Hi Tim,

On 7 July 2017 at 22:25, Tim Rowley <timothy.o.row...@intel.com> wrote:
swr used to build and link the rasterizer to the driver, and to support
multiple architectures we needed to have multiple versions of the
driver/rasterizer combination, which needed to link in much of mesa.

Changing to having one instance of the driver and just building
architecture specific versions of the rasterizer gives a large reduction
in disk space.

libGL.so        6464 Kb ->  7000 Kb
libswrAVX.so   10068 Kb ->  5432 Kb
libswrAVX2.so   9828 Kb ->  5200 Kb

If one considers the other binaries which include libmesaswr.la
(swr_dri.so, osmesa, etc.), the savings might be a bit smaller ;-)
Regardless, thank you for working on this.

I do have an ulterior motive in mind for reducing our footprint: there are a
couple of follow-up patches to come, one of which will make the swr architectures
we build configurable, and another which will add a KNL architecture (disabled
by default).


Total  26360 Kb -> 17632 Kb
---
src/gallium/drivers/swr/Makefile.am | 24 +---
src/gallium/drivers/swr/swr_context.cpp |  2 +-
src/gallium/drivers/swr/swr_loader.cpp  | 14 ++
src/gallium/drivers/swr/swr_screen.h|  2 ++
4 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/swr/Makefile.am b/src/gallium/drivers/swr/Makefile.am
index 4b4bd37..e764e0d 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -26,7 +26,13 @@ AM_CXXFLAGS = $(GALLIUM_DRIVER_CFLAGS) $(SWR_CXX11_CXXFLAGS)

noinst_LTLIBRARIES = libmesaswr.la

-libmesaswr_la_SOURCES = $(LOADER_SOURCES)
+libmesaswr_la_SOURCES = \
+   $(COMMON_CXX_SOURCES) \
+   rasterizer/codegen/gen_knobs.cpp \
+   rasterizer/codegen/gen_knobs.h \
These three now seem to be duplicated across the frontend and the
AVX/AVX2 backends. Is that intentional?
Worth adding a note?

Yes, that was intentional - our driver looks at a handful of knobs (primarily 
the hardcoded ones in knobs.h, but also one out of gen_knobs.h).  Adding a knob 
query to the api table didn’t really fit the rest of the api, so I decided to 
take the small hit of duplication.  That does mean the driver can no longer 
override knobs for the core, which is why a previous patch moved the tuning of 
the frontend draw split from the driver to the core.

+libmesaswr_la_CXXFLAGS = \
+   $(SWR_AVX_CXXFLAGS) \
+   -DKNOB_ARCH=KNOB_ARCH_AVX \
With this KNOB, the frontend will be built for AVX. What about AVX2?

This is an artifact of api.h including state.h, which contains both api-exposed 
state structures and internal ones which depend on the simd size.  The former 
are simd-size safe, but as our intrinsics layer is included we need to specify 
some architecture to allow a compile.  I chose the lowest common denominator 
architecture in case some simd-using helper function got called.  I’ll look 
into splitting state.h into internal/external in a future commit.

-COMMON_LIBADD = \
-   $(top_builddir)/src/gallium/auxiliary/libgallium.la \
-   $(top_builddir)/src/mesa/libmesagallium.la \
-   $(LLVM_LIBS)
-
With this gone, libswrAVX{,2}_la_LIBADD become empty, so we can drop them.

Will remove.


Can you check that configure --with-gallium-drivers=swr
--enable-gallium-osmesa --disable-dri --enable-glx=gallium-xlib builds
fine (it needs a second run with the latter two options dropped)? I cannot
spot anything obvious; it's just a gut feeling. You might want to sort out the
SCons build as well.


gallium-xlib is the configuration we normally build and test with.  A dri 
version builds, but I don’t have a machine that I can actually run it on.

SCons - the build system I keep forgetting.  Working on getting this updated 
and tested for v2 of the patch.

-Tim


Re: [Mesa-dev] [Mesa-stable] [Fwd: Re: [PATCH 1/2] intel/isl: Use uint64_t to store total surface size]

2017-07-10 Thread Anuj Phogat

These patches don't fix any known bug, but they do add the previously missing
hardware surface size limits. These limits are really hard to hit, but let's
pick the patches for stable for the sake of completeness.
Thanks for marking them for mesa-stable.


On Fri, Jul 7, 2017 at 7:25 AM, Andres Gomez  wrote:
> On Thu, 2017-07-06 at 18:21 +0100, Emil Velikov wrote:
>> On 3 July 2017 at 21:14, Andres Gomez  wrote:
>> > Actually, forgot to add -stable into CC.
>> >
>> >  Forwarded Message 
>> > From: Andres Gomez 
>> > To: Anuj Phogat , mesa-dev@lists.freedesktop.org
>> > Subject: Re: [Mesa-dev] [PATCH 1/2] intel/isl: Use uint64_t to store
>> > total surface size
>> > Date: Mon, 03 Jul 2017 23:02:31 +0300
>> >
>> > It looks like we could want these 2 into -stable (?)
>> >
>>
>> Shouldn't hurt, despite that most of the
>> isl_surf_init/isl_surf_get_[a-z]_surf handling is a simple assert(ok).
>> I'll leave the call to the experts, but my take is "don't bother".
>
> OK, I'll wait to see if Anuj has anything to say before picking this
> one, then.
>
> Thanks for the feedback!
>
> --
> Br,
>
> Andres

Re: [Mesa-dev] [PATCH v3 07/16] anv/cmd_buffer: Ensure fast-clear values are current

2017-07-10 Thread Jason Ekstrand

On Wed, Jun 28, 2017 at 2:14 PM, Nanley Chery  wrote:

> v2: Rewrite functions, change location of synchronization.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/genX_cmd_buffer.c | 114 +++++
>  1 file changed, 114 insertions(+)
>
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> index 253e68cd1f..decf0b28d6 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -479,6 +479,51 @@ init_fast_clear_state_entry(struct anv_cmd_buffer *cmd_buffer,
> }
>  }
>
> +/* Copy the fast-clear value dword(s) between a surface state object and an
> + * image's fast clear state buffer.
> + */
> +static void
> +genX(copy_fast_clear_dwords)(struct anv_cmd_buffer *cmd_buffer,
> + struct anv_state surface_state,
> + const struct anv_image *image,
> + unsigned level,
> + bool copy_from_surface_state)
> +{
> +   assert(cmd_buffer && image);
> +   assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> +   assert(level < anv_image_aux_levels(image));
> +
> +   struct anv_bo *ss_bo =
> +  &cmd_buffer->device->surface_state_pool.block_pool.bo;
> +   uint32_t ss_clear_offset = surface_state.offset +
> +  cmd_buffer->device->isl_dev.ss.clear_value_offset;
> +   uint32_t entry_offset =
> +  get_fast_clear_state_entry_offset(cmd_buffer->device, image, level);
> +   unsigned copy_size = cmd_buffer->device->isl_dev.ss.clear_value_size;
> +
> +   if (copy_from_surface_state) {
> +  genX(cmd_buffer_mi_memcpy)(cmd_buffer, image->bo, entry_offset,
> + ss_bo, ss_clear_offset, copy_size);
> +   } else {
> +  genX(cmd_buffer_mi_memcpy)(cmd_buffer, ss_bo, ss_clear_offset,
> + image->bo, entry_offset, copy_size);
> +
> +  /* Updating a surface state object may require that the state cache be
> +   * invalidated. From the SKL PRM, Shared Functions -> State -> State
> +   * Caching:
> +   *
> +   *    Whenever the RENDER_SURFACE_STATE object in memory pointed to by
> +   *    the Binding Table Pointer (BTP) and Binding Table Index (BTI) is
> +   *    modified [...], the L1 state cache must be invalidated to ensure
> +   *    the new surface or sampler state is fetched from system memory.
> +   *
> +   * In testing, SKL doesn't actually seem to need this, but HSW does.
> +   */
> +  cmd_buffer->state.pending_pipe_bits |=
> + ANV_PIPE_STATE_CACHE_INVALIDATE_BIT;
> +   }
> +}
> +
>  static void
>  transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
>  const struct anv_image *image,
> @@ -2615,6 +2660,66 @@ cmd_buffer_subpass_transition_layouts(struct anv_cmd_buffer * const cmd_buffer,
> }
>  }
>
> +/* Update the clear value dword(s) in surface state objects or the fast clear
> + * state buffer entry for the color attachments used in this subpass.
> + */
> +static void
> +cmd_buffer_subpass_sync_fast_clear_values(struct anv_cmd_buffer *cmd_buffer)
> +{
> +   assert(cmd_buffer && cmd_buffer->state.subpass);
> +
> +   const struct anv_cmd_state *state = &cmd_buffer->state;
> +
> +   /* Iterate through every color attachment used in this subpass. */
> +   for (uint32_t i = 0; i < state->subpass->color_count; ++i) {
> +
> +  /* The attachment should be one of the attachments described in the
> +   * render pass and used in the subpass.
> +   */
> +  const uint32_t a = state->subpass->color_attachments[i].attachment;
> +  assert(a < state->pass->attachment_count);
> +  if (a == VK_ATTACHMENT_UNUSED)
> + continue;
> +
> +  /* Store some information regarding this attachment. */
> +  const struct anv_attachment_state *att_state = &state->attachments[a];
> +  const struct anv_image_view *iview = state->framebuffer->attachments[a];
> +  const struct anv_render_pass_attachment *rp_att =
> + &state->pass->attachments[a];
> +
> +  if (att_state->aux_usage == ISL_AUX_USAGE_NONE)
> + continue;
> +
> +  /* The fast clear state entry must be updated if a fast clear is going to
> +   * happen. The surface state must be updated if the clear value from a
> +   * prior fast clear may be needed.
> +   */
> +  if (att_state->pending_clear_aspects && att_state->fast_clear) {
> + /* Update the fast clear state entry. */
> + genX(copy_fast_clear_dwords)(cmd_buffer, att_state->color_rt_state,
> +  iview->image, iview->isl.base_level,
> +  true /* copy from ss */);
>

In the future, I think we will want to do this as part of the fast-clear
operation rather than as a "synchronization" step.  Why?  Because we're
going to want to store the fast-clear color in two

[Mesa-dev] [Bug 101614] OSMesa 17.1.3 simd16intrin build FAIL on Win/MinGW - 'expected initializer before _simd16_setzero_ps ...'

2017-07-10 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=101614

--- Comment #14 from Emil Velikov  ---
George, I'm only build-testing those; I have no setup to actually test them.

No idea on the symbols part. For scons or mingw (the w64 one, of course) in
general, Brian and Jose are the people to check with.


[Mesa-dev] [Bug 101614] OSMesa 17.1.3 simd16intrin build FAIL on Win/MinGW - 'expected initializer before _simd16_setzero_ps ...'

2017-07-10 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=101614

--- Comment #13 from Emil Velikov  ---
(In reply to Trevor SANDY from comment #10)
> Ok, Thanks to George's patch (0001-mingw-fixes.patch), the behaviour
> reported in this ticket is fixed.
> 
> I applied the set below (excluding the conflict) to revision
> 89d4008ac85714bab8c49974377fd37970f6d66a of the master branch and the build
> is now running smoothly. 
> 
> # add_pi.patch \
> # gallium-once-flag.patch \
> # gallium-osmesa-threadsafe.patch \
> # glapi-getproc-mangled.patch \
> # install-GL-headers.patch \
> # lp_scene-safe.patch \
> # mesa-glversion-override.patch \
> # osmesa-gallium-driver.patch \
> # redefinition-of-typedef-nirshader.patch \
> # scons25.patch \
> # scons-llvm-3-9-libs.patch \
> # swr-sched.patch \
> #   scons-swr-cc-arch.patch \ (conflict)
> # msys2_scons_fix.patch \
> # 0001-mingw-fixes.patch \
> 
You really want to get these sorted and pushed upstream. While most devs will
be happy to pull misc trees and play around, you're not doing yourself or them
a favour.

That's enough harping from me on the topic.


Re: [Mesa-dev] [PATCH v3 06/16] anv/gpu_memcpy: Add a lighter-weight GPU memcpy function

2017-07-10 Thread Jason Ekstrand

On Wed, Jun 28, 2017 at 2:14 PM, Nanley Chery  wrote:

> We'll be performing a GPU memcpy in more places to copy small amounts of
> data. Add an alternate function that thrashes less state.
>
> v2:
> - Make a new function (Jason Ekstrand).
> - Move the #define into the function.
> v3:
> - Update the function name (Jason).
> - Update comments.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_genX.h        |  5 +
>  src/intel/vulkan/genX_gpu_memcpy.c | 40 ++
>  2 files changed, 45 insertions(+)
>
> diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h
> index 8da5e075dc..0b7322e281 100644
> --- a/src/intel/vulkan/anv_genX.h
> +++ b/src/intel/vulkan/anv_genX.h
> @@ -69,5 +69,10 @@ void genX(cmd_buffer_so_memcpy)(struct anv_cmd_buffer *cmd_buffer,
>  struct anv_bo *src, uint32_t src_offset,
>  uint32_t size);
>
> +void genX(cmd_buffer_mi_memcpy)(struct anv_cmd_buffer *cmd_buffer,
> +struct anv_bo *dst, uint32_t dst_offset,
> +struct anv_bo *src, uint32_t src_offset,
> +uint32_t size);
> +
>  void genX(blorp_exec)(struct blorp_batch *batch,
>const struct blorp_params *params);
> diff --git a/src/intel/vulkan/genX_gpu_memcpy.c b/src/intel/vulkan/genX_gpu_memcpy.c
> index 5ef35e6283..9c6b46de94 100644
> --- a/src/intel/vulkan/genX_gpu_memcpy.c
> +++ b/src/intel/vulkan/genX_gpu_memcpy.c
> @@ -52,6 +52,46 @@ gcd_pow2_u64(uint64_t a, uint64_t b)
>  }
>
>  void
> +genX(cmd_buffer_mi_memcpy)(struct anv_cmd_buffer *cmd_buffer,
> +   struct anv_bo *dst, uint32_t dst_offset,
> +   struct anv_bo *src, uint32_t src_offset,
> +   uint32_t size)
> +{
> +   /* This memcpy operates in units of dwords. */
> +   assert(size % 4 == 0);
> +   assert(dst_offset % 4 == 0);
> +   assert(src_offset % 4 == 0);
> +
> +   for (uint32_t i = 0; i < size; i += 4) {
> +  const struct anv_address src_addr =
> + (struct anv_address) { src, src_offset + i};
> +  const struct anv_address dst_addr =
> + (struct anv_address) { dst, dst_offset + i};
> +#if GEN_GEN >= 8
> +  anv_batch_emit(&cmd_buffer->batch, GENX(MI_COPY_MEM_MEM), cp) {
> + cp.DestinationMemoryAddress = dst_addr;
> + cp.SourceMemoryAddress = src_addr;
> +  }
> +#else
> +  /* IVB does not have a general purpose register for command streamer
> +   * commands. Therefore, we use an alternate temporary register.
> +   */
> +#define TEMP_REG 0x2400 /* MI_PREDICATE_SRC0 */
>

Using the predicate register seems a bit sketchy. Vulkan doesn't support
predication today, so it's probably safe, but I don't know what form
predication will take in the future (there's a decent chance it'll get
added), so I have no idea whether this will end up being safe. Why not use one
of the indirect dispatch/draw registers? Those will be safe because we
only ever set them immediately before 3DPRIMITIVE or GPGPU_WALKER.

--Jason


> +  anv_batch_emit(&cmd_buffer->batch, GENX(MI_LOAD_REGISTER_MEM), load) {
> + load.RegisterAddress = TEMP_REG;
> + load.MemoryAddress = src_addr;
> +  }
> +  anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_REGISTER_MEM), store) {
> + store.RegisterAddress = TEMP_REG;
> + store.MemoryAddress = dst_addr;
> +  }
> +#undef TEMP_REG
> +#endif
> +   }
> +   return;
> +}
> +
> +void
>  genX(cmd_buffer_so_memcpy)(struct anv_cmd_buffer *cmd_buffer,
> struct anv_bo *dst, uint32_t dst_offset,
> struct anv_bo *src, uint32_t src_offset,
> --
> 2.13.1
>
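The loop in the cmd_buffer_mi_memcpy hunk quoted above emits one copy command per dword, which is why size and both offsets must be multiples of 4 and why the function suits only small copies. As a CPU-side model of that iteration pattern only, the sketch below uses a hypothetical emit_dword_copy() in place of MI_COPY_MEM_MEM (Gen8+) or the MI_LOAD_REGISTER_MEM/MI_STORE_REGISTER_MEM pair (Gen7); it copies between plain byte arrays, touches no GPU state, and counts how many commands would be emitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one MI_COPY_MEM_MEM (or LRM + SRM pair):
 * copies a single 32-bit dword from src + src_off to dst + dst_off. */
static void
emit_dword_copy(uint8_t *dst, uint32_t dst_off,
                const uint8_t *src, uint32_t src_off)
{
   uint32_t dword;
   memcpy(&dword, src + src_off, sizeof(dword));
   memcpy(dst + dst_off, &dword, sizeof(dword));
}

/* Models the loop structure of cmd_buffer_mi_memcpy on the CPU.
 * Returns the number of copy "commands" that would be emitted. */
static uint32_t
model_mi_memcpy(uint8_t *dst, uint32_t dst_offset,
                const uint8_t *src, uint32_t src_offset, uint32_t size)
{
   /* Like the real function, operate in units of dwords. */
   assert(size % 4 == 0);
   assert(dst_offset % 4 == 0);
   assert(src_offset % 4 == 0);

   uint32_t commands = 0;
   for (uint32_t i = 0; i < size; i += 4) {
      emit_dword_copy(dst, dst_offset + i, src, src_offset + i);
      commands++;
   }
   return commands;
}
```

Copying 8 bytes therefore costs two commands; larger copies scale linearly, which is the trade-off against the heavier streamout-based cmd_buffer_so_memcpy path.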

Re: [Mesa-dev] [PATCH v3 05/16] anv/cmd_buffer: Restrict fast clears in the GENERAL layout

2017-07-10 Thread Jason Ekstrand

On Wed, Jun 28, 2017 at 2:14 PM, Nanley Chery  wrote:

> v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
> v3: Allow some fast clears in the GENERAL layout.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_pass.c        | 22 ++
>  src/intel/vulkan/anv_private.h     |  2 ++
>  src/intel/vulkan/genX_cmd_buffer.c | 17 -
>  3 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c
> index 1b30c1409d..ab0733fc10 100644
> --- a/src/intel/vulkan/anv_pass.c
> +++ b/src/intel/vulkan/anv_pass.c
> @@ -34,6 +34,16 @@ num_subpass_attachments(const VkSubpassDescription *desc)
>(desc->pDepthStencilAttachment != NULL);
>  }
>
> +static void
> +init_first_subpass_layout(struct anv_render_pass_attachment * const att,
> +  const VkAttachmentReference att_ref)
> +{
> +   if (att->first_subpass_layout == VK_IMAGE_LAYOUT_UNDEFINED) {
> +  att->first_subpass_layout = att_ref.layout;
> +  assert(att->first_subpass_layout != VK_IMAGE_LAYOUT_UNDEFINED);
> +   }
> +}
> +
>  VkResult anv_CreateRenderPass(
>  VkDevice_device,
>  const VkRenderPassCreateInfo*   pCreateInfo,
> @@ -91,6 +101,7 @@ VkResult anv_CreateRenderPass(
>att->stencil_load_op = pCreateInfo->pAttachments[i].stencilLoadOp;
>att->initial_layout = pCreateInfo->pAttachments[i].initialLayout;
>att->final_layout = pCreateInfo->pAttachments[i].finalLayout;
> +  att->first_subpass_layout = VK_IMAGE_LAYOUT_UNDEFINED;
>att->subpass_usage = subpass_usages;
>subpass_usages += pass->subpass_count;
> }
> @@ -119,6 +130,8 @@ VkResult anv_CreateRenderPass(
> pass->attachments[a].subpass_usage[i] |=
> ANV_SUBPASS_USAGE_INPUT;
> pass->attachments[a].last_subpass_idx = i;
>
> +   init_first_subpass_layout(&pass->attachments[a],
> + desc->pInputAttachments[j]);
> if (desc->pDepthStencilAttachment &&
> a == desc->pDepthStencilAttachment->attachment)
>subpass->has_ds_self_dep = true;
> @@ -138,6 +151,9 @@ VkResult anv_CreateRenderPass(
> pass->attachments[a].usage |= VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
> pass->attachments[a].subpass_usage[i] |=
> ANV_SUBPASS_USAGE_DRAW;
> pass->attachments[a].last_subpass_idx = i;
> +
> +   init_first_subpass_layout(&pass->attachments[a],
> + desc->pColorAttachments[j]);
>  }
>   }
>}
> @@ -162,6 +178,9 @@ VkResult anv_CreateRenderPass(
> pass->attachments[a].subpass_usage[i] |=
>ANV_SUBPASS_USAGE_RESOLVE_DST;
> pass->attachments[a].last_subpass_idx = i;
> +
> +   init_first_subpass_layout(&pass->attachments[a],
> + desc->pResolveAttachments[j]);
>  }
>   }
>}
> @@ -176,6 +195,9 @@ VkResult anv_CreateRenderPass(
> VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;
>  pass->attachments[a].subpass_usage[i] |=
> ANV_SUBPASS_USAGE_DRAW;
>  pass->attachments[a].last_subpass_idx = i;
> +
> +init_first_subpass_layout(&pass->attachments[a],
> +  *desc->pDepthStencilAttachment);
>   }
>} else {
>   subpass->depth_stencil_attachment.attachment =
> VK_ATTACHMENT_UNUSED;
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
> index a95188ac30..c5a2ba0888 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -1518,6 +1518,7 @@ struct anv_attachment_state {
> bool fast_clear;
> VkClearValue clear_value;
> bool clear_color_is_zero_one;
> +   bool clear_color_is_zero;
>  };
>
>  /** State required while building cmd buffer */
> @@ -2336,6 +2337,7 @@ struct anv_render_pass_attachment {
> VkAttachmentLoadOp   stencil_load_op;
> VkImageLayoutinitial_layout;
> VkImageLayoutfinal_layout;
> +   VkImageLayoutfirst_subpass_layout;
>
> /* An array, indexed by subpass id, of how the attachment will be used. */
> enum anv_subpass_usage * subpass_usage;
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> index 15927d32ad..253e68cd1f 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -253,7 +253,12 @@ color_attachment_compute_aux_usage(struc
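The init_first_subpass_layout() helper in the anv_pass.c hunk above keeps only the layout of the first reference it sees for each attachment, using VK_IMAGE_LAYOUT_UNDEFINED as the "not set yet" marker while anv_CreateRenderPass walks the references subpass by subpass. The sketch below reproduces that pattern with a hypothetical local enum standing in for VkImageLayout, so it compiles without the Vulkan headers; the enum values are illustrative and do not match Vulkan's numbering.

```c
#include <assert.h>

/* Hypothetical stand-ins for a few VkImageLayout values (not Vulkan's numbering). */
enum sketch_layout {
   SKETCH_LAYOUT_UNDEFINED = 0,
   SKETCH_LAYOUT_GENERAL,
   SKETCH_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
   SKETCH_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
};

struct sketch_attachment {
   enum sketch_layout first_subpass_layout;
};

/* Same shape as init_first_subpass_layout(): only the first reference wins. */
static void
sketch_init_first_subpass_layout(struct sketch_attachment *att,
                                 enum sketch_layout ref_layout)
{
   if (att->first_subpass_layout == SKETCH_LAYOUT_UNDEFINED) {
      att->first_subpass_layout = ref_layout;
      /* An attachment reference must never use the UNDEFINED layout. */
      assert(att->first_subpass_layout != SKETCH_LAYOUT_UNDEFINED);
   }
}
```

Later references leave the recorded layout alone, which is presumably what lets the command buffer code tell whether an attachment starts out in the GENERAL layout; the genX_cmd_buffer.c hunk that would confirm this is cut off above.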

Re: [Mesa-dev] [PATCH] gallium: improve selection of pixel format

2017-07-10 Thread Olivier Lauffenburger

Technically, a correctly written application should not rely on 
ChoosePixelFormat but should enumerate by itself through all the pixel formats 
and select the best one with its own algorithm.

However, for the (numerous) applications that use ChoosePixelFormat, color 
depth is more a question of rendering quality, whereas depth and stencil 
buffers are a question of correct versus incorrect rendering.

The current method gives color depth a higher priority than depth and stencil 
buffer depth, to the point that a color depth different from the requested one 
disables the stencil buffer.

To make things even worse, many applications (including GLUT) incorrectly set 
ppfd->cColorBits to 32 instead of 24, although the documentation clearly states 
that "For RGBA pixel types, it is the size of the color buffer, EXCLUDING THE 
ALPHA BITPLANES" (emphasis is mine).

As a result, those applications never get a stencil buffer, which leads to 
incorrect rendering. I stumbled on this problem while trying out OpenCSG, 
and it took me a while to discover that the wrong results were caused by the 
absence of a stencil buffer requested by GLUT.

Although there is no universal selection algorithm, the one I suggest tries to 
enforce the following policy:

- Most important is to enable all the buffers (depth, stencil, accumulation...) 
that are requested (correctness).
- Then, try to allocate at least as many bits as requested (quality + 
performance).
- Least important, try not to allocate more bits than requested (economy).

This algorithm seems to be in line with the behavior of most Windows drivers 
(Microsoft, NVIDIA, AMD) and, more importantly, I can't imagine a sensible 
scenario where this change would break an existing application.
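The three-level policy above can be expressed as a penalty score where lower is better. The sketch below is a simplified model, not the stw_pixelformat.c code: struct pf_bits, pf_term and pf_score are hypothetical names, the "too few bits" weights (100/200/400/100) and "too many bits" weights (1/2/1/1) are taken from the quoted patch, and the 10000 penalty for a missing requested buffer is chosen here so that it outweighs every other term combined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the PIXELFORMATDESCRIPTOR bit counts. */
struct pf_bits {
   uint8_t color, alpha, depth, stencil;
};

/* Penalty for one buffer type; lower is better. */
static int
pf_term(unsigned requested, unsigned provided, int fewer_penalty, int more_penalty)
{
   if (requested && !provided)
      return 10000;            /* requested buffer is missing: worst case */
   if (requested > provided)
      return fewer_penalty;    /* buffer present, but with too few bits */
   if (requested < provided)
      return more_penalty;     /* more bits than asked for: mild penalty */
   return 0;
}

static int
pf_score(const struct pf_bits *req, const struct pf_bits *fmt)
{
   return pf_term(req->color,   fmt->color,   100, 1) +
          pf_term(req->depth,   fmt->depth,   200, 2) +
          pf_term(req->stencil, fmt->stencil, 400, 1) +
          pf_term(req->alpha,   fmt->alpha,   100, 1);
}
```

With a request like the one described for GLUT (32 color bits, no alpha, 24 depth, 8 stencil) and two candidates with 24 color, 8 alpha and 24 depth bits, the one with an 8-bit stencil buffer scores 101 and the one without scores 10101, so the stencil-capable format wins even though neither matches the requested color depth.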

-Olivier

-----Original Message-----
From: Andres Gomez [mailto:ago...@igalia.com]
Sent: Saturday, July 8, 2017 22:08
To: Olivier Lauffenburger; mesa-dev@lists.freedesktop.org
Cc: mesa-sta...@lists.freedesktop.org; Brian Paul
Subject: Re: [Mesa-dev] [PATCH] gallium: improve selection of pixel format

Olivier, Brian, do we want this into -stable?

On Thu, 2017-07-06 at 17:08 +0200, Olivier Lauffenburger wrote:
> Current selection of pixel format does not enforce the request of 
> stencil or depth buffer if the color depth is not the same as 
> requested.
> For instance, GLUT requests a 32-bit color buffer with an 8-bit 
> stencil buffer, but because color buffers are only 24-bit, no priority 
> is given to creating a stencil buffer.
> 
> This patch gives more priority to the creation of requested buffers 
> and less priority to the difference in bit depth.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101703
> 
> Signed-off-by: Olivier Lauffenburger 
> ---
>  src/gallium/state_trackers/wgl/stw_pixelformat.c | 36 +++-
>  1 file changed, 29 insertions(+), 7 deletions(-)
> 
> diff --git a/src/gallium/state_trackers/wgl/stw_pixelformat.c b/src/gallium/state_trackers/wgl/stw_pixelformat.c
> index 7763f71cbc..833308d964 100644
> --- a/src/gallium/state_trackers/wgl/stw_pixelformat.c
> +++ b/src/gallium/state_trackers/wgl/stw_pixelformat.c
> @@ -432,17 +432,39 @@ stw_pixelformat_choose(HDC hdc, CONST PIXELFORMATDESCRIPTOR *ppfd)
>!!(pfi->pfd.dwFlags & PFD_DOUBLEBUFFER))
>   continue;
>  
> -  /* FIXME: Take in account individual channel bits */
> -  if (ppfd->cColorBits != pfi->pfd.cColorBits)
> - delta += 8;
> +  /* Selection logic:
> +  * - Enabling a feature (depth, stencil...) is given highest priority.
> +  * - Giving as many bits as requested is given medium priority.
> +  * - Giving no more bits than requested is given lowest priority.
> +  */
>  
> -  if (ppfd->cDepthBits != pfi->pfd.cDepthBits)
> - delta += 4;
> +  /* FIXME: Take in account individual channel bits */
> +  if (ppfd->cColorBits && !pfi->pfd.cColorBits)
> + delta += 10000;
> +  else if (ppfd->cColorBits > pfi->pfd.cColorBits)
> + delta += 100;
> +  else if (ppfd->cColorBits < pfi->pfd.cColorBits)
> + delta++;
>  
> -  if (ppfd->cStencilBits != pfi->pfd.cStencilBits)
> +  if (ppfd->cDepthBits && !pfi->pfd.cDepthBits)
> + delta += 10000;
> +  else if (ppfd->cDepthBits > pfi->pfd.cDepthBits)
> + delta += 200;
> +  else if (ppfd->cDepthBits < pfi->pfd.cDepthBits)
>   delta += 2;
>  
> -  if (ppfd->cAlphaBits != pfi->pfd.cAlphaBits)
> +  if (ppfd->cStencilBits && !pfi->pfd.cStencilBits)
> + delta += 10000;
> +  else if (ppfd->cStencilBits > pfi->pfd.cStencilBits)
> + delta += 400;
> +  else if (ppfd->cStencilBits < pfi->pfd.cStencilBits)
> + delta++;
> +
> +  if (ppfd->cAlphaBits && !pfi->pfd.cAlphaBits)
> + delta += 10000;
> +  else if (ppfd->cAlphaBits > pfi->pfd.cAlphaBits)
> + delta += 100;
> +  else if (ppfd->cAlphaBits < pfi->pfd.cAlphaBits)
>   delta++;
>  
>if (

Re: [Mesa-dev] [PATCH v3 03/16] anv/cmd_buffer: Initialize the clear values buffer

2017-07-10 Thread Jason Ekstrand

On Wed, Jun 28, 2017 at 2:14 PM, Nanley Chery  wrote:

> v2: Rewrite functions.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/genX_cmd_buffer.c | 93 ++
>  1 file changed, 84 insertions(+), 9 deletions(-)
>
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> index 53c58ca5b3..8601d706d1 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -384,6 +384,70 @@ transition_depth_buffer(struct anv_cmd_buffer *cmd_buffer,
>anv_gen8_hiz_op_resolve(cmd_buffer, image, hiz_op);
>  }
>
> +static inline uint32_t
> +get_fast_clear_state_entry_offset(const struct anv_device *device,
> +  const struct anv_image *image,
> +  unsigned level)
> +{
> +   assert(device && image);
> +   assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> +   assert(level < anv_image_aux_levels(image));
> +   const uint32_t offset = image->offset + image->aux_surface.offset +
> +   image->aux_surface.isl.size +
> +   anv_fast_clear_state_entry_size(device) * level;
> +   assert(offset < image->offset + image->size);
> +   return offset;
> +}
> +
> +static void
> +init_fast_clear_state_entry(struct anv_cmd_buffer *cmd_buffer,
> +const struct anv_image *image,
> +unsigned level)
> +{
> +   assert(cmd_buffer && image);
> +   assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> +   assert(level < anv_image_aux_levels(image));
> +
> +   /* The fast clear value dword(s) will be copied into a surface state object.
> +* Ensure that the restrictions of the fields in the dword(s) are followed.
> +*
> +* CCS buffers on SKL+ can have any value set for the clear colors.
> +*/
> +   if (image->samples == 1 && GEN_GEN >= 9)
> +  return;
> +
> +   /* Other combinations of auxiliary buffers and platforms require specific
> +* values in the clear value dword(s).
> +*/
> +   unsigned i = 0;
> +   for (; i < cmd_buffer->device->isl_dev.ss.clear_value_size; i += 4) {
> +  anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_DATA_IMM), sdi) {
> + const uint32_t entry_offset =
> +get_fast_clear_state_entry_offset(cmd_buffer->device, image, level);
> + sdi.Address = (struct anv_address) { image->bo, entry_offset + i };
> +
> + if (GEN_GEN >= 9) {
> +/* MCS buffers on SKL+ can only have 1/0 clear colors. */
> +assert(image->aux_usage == ISL_AUX_USAGE_MCS);
> +sdi.ImmediateData = 0;
> + } else {
> +/* Pre-SKL, the dword containing the clear values also contains
> + * other fields, so we need to initialize those fields to match the
> + * values that would be in a color attachment.
> + */
> +assert(i == 0);
> +sdi.ImmediateData = level << 8;
>

From the Broadwell PRM, RENDER_SURFACE_STATE::Resource Min LOD:

For Sampling Engine Surfaces:
This field indicates the most detailed LOD that is present in the resource
underlying the surface.
Refer to the "LOD Computation Pseudocode" section for the use of this field.

For Other Surfaces:
This field is ignored.

I think we can safely leave this field zero since this will only ever be
ORed into render target surfaces.  Grepping through isl_surface_state.c
also indicates that we never set this field in either GL or Vulkan so it's
always zero.

--Jason


> +if (GEN_VERSIONx10 >= 75) {
> +   sdi.ImmediateData |= ISL_CHANNEL_SELECT_RED   << 25 |
> +ISL_CHANNEL_SELECT_GREEN << 22 |
> +ISL_CHANNEL_SELECT_BLUE  << 19 |
> +ISL_CHANNEL_SELECT_ALPHA << 16;
>

These, however, are needed. :-)
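Taken together, these two comments mean the pre-SKL clear dword can start with Resource Min LOD left at zero and only the Haswell+ shader channel selects set. The sketch below computes that value; the channel-select encodings (RED = 4, GREEN = 5, BLUE = 6, ALPHA = 7) are assumed to match isl's enum isl_channel_select, the shifts are the ones in the quoted patch, and sketch_initial_clear_dword is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror isl's enum isl_channel_select encodings. */
enum {
   SKETCH_CHANNEL_SELECT_RED   = 4,
   SKETCH_CHANNEL_SELECT_GREEN = 5,
   SKETCH_CHANNEL_SELECT_BLUE  = 6,
   SKETCH_CHANNEL_SELECT_ALPHA = 7,
};

/* Hypothetical helper: initial value for the pre-SKL clear-color dword.
 * Resource Min LOD is left at zero, per the review; only gen 7.5+ needs
 * the shader channel selects for an identity swizzle. */
static uint32_t
sketch_initial_clear_dword(int gen_versionx10)
{
   uint32_t dword = 0;
   if (gen_versionx10 >= 75) {
      dword |= (uint32_t)SKETCH_CHANNEL_SELECT_RED   << 25 |
               (uint32_t)SKETCH_CHANNEL_SELECT_GREEN << 22 |
               (uint32_t)SKETCH_CHANNEL_SELECT_BLUE  << 19 |
               (uint32_t)SKETCH_CHANNEL_SELECT_ALPHA << 16;
   }
   return dword;
}
```

Under those assumptions the dword is 0x09770000 on Haswell and Broadwell, and 0 on Ivy Bridge (GEN_VERSIONx10 == 70).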


> +}
> + }
> +  }
> +   }
> +}
> +
>  static void
>  transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
>  const struct anv_image *image,
> @@ -392,7 +456,9 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
>  VkImageLayout initial_layout,
>  VkImageLayout final_layout)
>  {
> -   if (image->aux_usage != ISL_AUX_USAGE_CCS_E)
> +   assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> +
> +   if (image->aux_surface.isl.size == 0)
>return;
>
> if (initial_layout != VK_IMAGE_LAYOUT_UNDEFINED &&
> @@ -405,15 +471,24 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
>layer_count = anv_minify(image->extent.depth, base_level);
> }
>
> -#if GEN_GEN >= 9
> -   /* We're transitioning from an undefined layout so it doesn't really matter
> -* what data ends up in the color buffer.  We do, however, need to ensure
> -* that the CCS has valid data in it.  One easy way

Re: [Mesa-dev] [PATCH mesa] util: enforce unreachable() semantic

2017-07-10 Thread Eric Engestrom

On Monday, 2017-07-10 17:02:48 +0100, Emil Velikov wrote:
> On 10 July 2017 at 16:08, Eric Engestrom  wrote:
> > No implementation of unreachable() should allow code execution to
> > keep going past it.
> >
> > We can discuss whether we should have a dead loop, abort(), or do
> > something else, but the current "meh, let's just keep going" is
> > just wrong.
> >
> > Cc: mesa-sta...@lists.freedesktop.org
> > Signed-off-by: Eric Engestrom 
> > ---
> >  src/util/macros.h | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/util/macros.h b/src/util/macros.h
> > index a10f1de814..16682bf6e8 100644
> > --- a/src/util/macros.h
> > +++ b/src/util/macros.h
> > @@ -84,7 +84,11 @@ do {\
> > __assume(0); \
> >  } while (0)
> >  #else
> > -#define unreachable(str) assert(!str)
> > +#define unreachable(str)\
> > +do {\
> > +   assert(!str);\
> > +   while(1);\
> > +} while (0)
> >  #endif
> Strictly speaking the current solution follows what the builtin does -
> pretty much anything is possible.

Hmm, you're right actually, I misremembered that.
Withdrawing the patch, as no such guarantee exists in other
implementations either. "Anything can happen" is the point here.

> In release builds the assert gets purged and we end up at the mercy of
> the compiler.
> 
> Simple test showed varying behaviour - wrong case statement picked
> (-O0), straight crash (-O1), semi-dead loop followed by a crash (-O2,
> -O3).
> 
> I think we'd want to stick with the current solution, but either way
> is fine with me.
> -Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 101614] OSMesa 17.1.3 simd16intrin build FAIL on Win/MinGW - 'expected initializer before _simd16_setzero_ps ...'

2017-07-10 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=101614

--- Comment #12 from George Kyriazis  ---
Emil,

you said that cross-compiling software drivers on Arch works for you.

Have you tried running the generated binaries?  I've created a patch for
Trevor, but for some reason the swr driver is not loading for me (llvmpipe
works fine).

In addition, debug scons builds with mingw/msys2 don't include symbols.  Do you
know if that is a known issue?

I don't want to submit the patch until swr runs on mingw/msys2.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH mesa] util: enforce unreachable() semantic

2017-07-10 Thread Emil Velikov

On 10 July 2017 at 16:08, Eric Engestrom  wrote:
> No implementation of unreachable() should allow code execution to
> keep going past it.
>
> We can discuss whether we should have a dead loop, abort(), or do
> something else, but the current "meh, let's just keep going" is
> just wrong.
>
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Eric Engestrom 
> ---
>  src/util/macros.h | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/util/macros.h b/src/util/macros.h
> index a10f1de814..16682bf6e8 100644
> --- a/src/util/macros.h
> +++ b/src/util/macros.h
> @@ -84,7 +84,11 @@ do {\
> __assume(0); \
>  } while (0)
>  #else
> -#define unreachable(str) assert(!str)
> +#define unreachable(str)\
> +do {\
> +   assert(!str);\
> +   while(1);\
> +} while (0)
>  #endif
Strictly speaking the current solution follows what the builtin does -
pretty much anything is possible.
In release builds the assert gets purged and we end up at the mercy of
the compiler.

Simple test showed varying behaviour - wrong case statement picked
(-O0), straight crash (-O1), semi-dead loop followed by a crash (-O2,
-O3).

I think we'd want to stick with the current solution, but either way
is fine with me.
-Emil
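To make the point about release builds concrete, here is a minimal sketch of the two shapes under discussion. It uses a hypothetical name (SKETCH_UNREACHABLE) and is not Mesa's src/util/macros.h. With GCC or Clang the macro ends in __builtin_unreachable(), so in an NDEBUG build, reaching it is undefined behaviour, as Emil describes. The fallback uses abort() instead of the patch's dead loop. In debug builds the assert(!str) idiom always fires: a string literal is a non-null pointer, so !str is 0, and the literal appears in the assertion message.

```c
#include <assert.h>
#include <stdlib.h>

#if defined(__GNUC__) || defined(__clang__)
/* Debug: the assert fires with the message. Release: the compiler may
 * assume this point is never reached, so anything can happen if it is. */
#define SKETCH_UNREACHABLE(str)   \
   do {                           \
      assert(!str);               \
      __builtin_unreachable();    \
   } while (0)
#else
/* Fallback with a defined outcome: terminate instead of running on. */
#define SKETCH_UNREACHABLE(str)   \
   do {                           \
      assert(!str);               \
      abort();                    \
   } while (0)
#endif

/* Hypothetical caller: component count for a few formats. */
enum sketch_format { SKETCH_R8, SKETCH_RG8, SKETCH_RGBA8 };

static int
sketch_component_count(enum sketch_format fmt)
{
   switch (fmt) {
   case SKETCH_R8:    return 1;
   case SKETCH_RG8:   return 2;
   case SKETCH_RGBA8: return 4;
   }
   SKETCH_UNREACHABLE("invalid format");
}
```

The builtin variant gives the optimizer the most freedom. The abort() variant trades that freedom for a predictable crash when an "impossible" value does show up.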

[Mesa-dev] [Bug 101614] OSMesa 17.1.3 simd16intrin build FAIL on Win/MinGW - 'expected initializer before _simd16_setzero_ps ...'

2017-07-10 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=101614

George Kyriazis  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #11 from George Kyriazis  ---
I wouldn't mark it as resolved yet.

Since it is filed as a Mesa bug, it should reflect the status of Mesa, not the
status of any user's solution.

Re-opening until the fix is checked in to Mesa master and due testing has been
performed.


[Mesa-dev] [Bug 101614] OSMesa 17.1.3 simd16intrin build FAIL on Win/MinGW - 'expected initializer before _simd16_setzero_ps ...'

2017-07-10 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=101614

Trevor SANDY  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #10 from Trevor SANDY  ---
OK, thanks to George's patch (0001-mingw-fixes.patch), the behaviour reported
in this ticket is fixed.

I applied the set below (excluding the conflict) to revision
89d4008ac85714bab8c49974377fd37970f6d66a of the master branch and the build is
now running smoothly. 

# add_pi.patch \
# gallium-once-flag.patch \
# gallium-osmesa-threadsafe.patch \
# glapi-getproc-mangled.patch \
# install-GL-headers.patch \
# lp_scene-safe.patch \
# mesa-glversion-override.patch \
# osmesa-gallium-driver.patch \
# redefinition-of-typedef-nirshader.patch \
# scons25.patch \
# scons-llvm-3-9-libs.patch \
# swr-sched.patch \
#   scons-swr-cc-arch.patch \ (conflict)
# msys2_scons_fix.patch \
# 0001-mingw-fixes.patch \

You can see all the patches except George's at my github repo. I will add
George's patch once he confirms it is released.

So for me, this issue is now resolved.

Cheers,


Re: [Mesa-dev] [PATCH] anv: Stop setting domains to RENDER on EXEC_OBJECT_WRITE

2017-07-10 Thread Jason Ekstrand

On Sat, Jul 8, 2017 at 12:18 PM, Matt Turner  wrote:

> On Sat, Jul 8, 2017 at 11:05 AM, Jason Ekstrand 
> wrote:
> > On July 7, 2017 1:52:54 PM Chris Wilson 
> wrote:
> >
> >> Quoting Jason Ekstrand (2017-07-07 21:37:29)
> >>>
> >>> The reason we were doing this was to ensure that the kernel did the
> >>> appropriate cross-ring synchronization and flushing.  However, the
> >>> kernel only looks at EXEC_OBJECT_WRITE to determine whether or not to
> >>> insert a fence.  It only cares about the domain for determining whether
> >>> or not it needs to clflush the BO before using it for scanout but the
> >>> domain automatically gets set to RENDER internally by the kernel if
> >>> EXEC_OBJECT_WRITE is set.
> >>
> >>
> >> Once upon a time we also depended upon EXEC_OBJECT_WRITE for correct
> >> swapout. That was until I saw what you were planning to do for anv. Hmm,
> >> that puts the oldest kernel that might support anv as
> >>
> >> commit 51bc140431e233284660b1d22c47dec9ecdb521e [v4.3]
> >> Author: Chris Wilson 
> >> Date:   Mon Aug 31 15:10:39 2015 +0100
> >>
> >> drm/i915: Always mark the object as dirty when used by the GPU
> >
> >
> > I think we're probably ok there.  We have a hard requirement on memfd,
> > which I think landed in 4.6 though I could be wrong about that.
>
> No. memfd_create was added in 3.17.
>

Bah.  I don't know why 4.6 is stuck in my brain as being important but it
is. :-/  In any case, is there some way we can check for that commit?
Otherwise, I think the only real thing we can do is just hope you don't
swap and accept corruption if you do. :-(

--Jason
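A minimal sketch of what the patch implies for userspace. It assumes a simplified local mirror of an execbuffer object entry; the structs and helper are hypothetical and are not the kernel's struct drm_i915_gem_exec_object2. A BO that the batch writes is marked only with EXEC_OBJECT_WRITE, and no RENDER domain is set. Per the patch description, the kernel decides whether to insert a fence based on that flag and sets the domain to RENDER internally. The value 1 << 2 is assumed to match i915_drm.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EXEC_OBJECT_WRITE
#define EXEC_OBJECT_WRITE (1 << 2) /* assumed to match i915_drm.h */
#endif

/* Simplified stand-in for one execbuffer object entry. */
struct sketch_exec_object {
   uint32_t handle;
   uint64_t flags;
};

/* Simplified stand-in for a BO referenced by the batch. */
struct sketch_bo {
   uint32_t handle;
   bool written_by_gpu;
};

/* Fill exec_objs from bos. GPU writes are expressed only through the
 * WRITE flag; no domains are touched. Returns the number of entries. */
static size_t
sketch_fill_exec_objects(const struct sketch_bo *bos, size_t count,
                         struct sketch_exec_object *exec_objs)
{
   assert(count == 0 || (bos != NULL && exec_objs != NULL));

   for (size_t i = 0; i < count; i++) {
      exec_objs[i].handle = bos[i].handle;
      exec_objs[i].flags = bos[i].written_by_gpu ? EXEC_OBJECT_WRITE : 0;
   }
   return count;
}
```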

[Mesa-dev] [PATCH mesa] util: enforce unreachable() semantic

2017-07-10 Thread Eric Engestrom

No implementation of unreachable() should allow code execution to
keep going past it.

We can discuss whether we should have a dead loop, abort(), or do
something else, but the current "meh, let's just keep going" is
just wrong.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Eric Engestrom 
---
 src/util/macros.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/util/macros.h b/src/util/macros.h
index a10f1de814..16682bf6e8 100644
--- a/src/util/macros.h
+++ b/src/util/macros.h
@@ -84,7 +84,11 @@ do {\
__assume(0); \
 } while (0)
 #else
-#define unreachable(str) assert(!str)
+#define unreachable(str)\
+do {\
+   assert(!str);\
+   while(1);\
+} while (0)
 #endif
 
 /**
-- 
Cheers,
  Eric


[Mesa-dev] [PATCH mesa] scons: split out check_header() helper

2017-07-10 Thread Eric Engestrom

Signed-off-by: Eric Engestrom 
---
 scons/gallium.py | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/scons/gallium.py b/scons/gallium.py
index 61643a6d4f..c8e47a39db 100755
--- a/scons/gallium.py
+++ b/scons/gallium.py
@@ -145,6 +145,17 @@ def check_cc(env, cc, expr, cpp_opt = '-E'):
     sys.stdout.write(' %s\n' % ['no', 'yes'][int(bool(result))])
     return result
 
+def check_header(env, header):
+    '''Check if the header exists'''
+
+    conf = SCons.Script.Configure(env)
+    have_header = False
+
+    if conf.CheckHeader(header):
+        have_header = True
+
+    env = conf.Finish()
+    return have_header
 
 def check_prog(env, prog):
     """Check whether this program exists."""
@@ -325,10 +336,8 @@ def generate(env):
 'GLX_INDIRECT_RENDERING',
 ]
 
-conf = SCons.Script.Configure(env)
-if conf.CheckHeader('xlocale.h'):
+if check_header(env, 'xlocale.h'):
 cppdefines += ['HAVE_XLOCALE_H']
-env = conf.Finish()
 
 if platform == 'windows':
 cppdefines += [
-- 
Cheers,
  Eric


[Mesa-dev] [PATCH] svga: fix PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE value

2017-07-10 Thread Brian Paul

This query is supposed to return the max texture buffer size/width in
texels, not size in bytes.  Divide by 16 (the largest format size) to
return texels.

Fixes Piglit arb_texture_buffer_object-max-size test.

Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/svga/svga_screen.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_screen.c b/src/gallium/drivers/svga/svga_screen.c
index 0b63525..f40d151 100644
--- a/src/gallium/drivers/svga/svga_screen.c
+++ b/src/gallium/drivers/svga/svga_screen.c
@@ -312,7 +312,10 @@ svga_get_param(struct pipe_screen *screen, enum pipe_cap param)
   return svgascreen->ms_samples ? 1 : 0;
 
case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
-  return SVGA3D_DX_MAX_RESOURCE_SIZE;
+  /* convert bytes to texels for the case of the largest texel
+   * size: float[4].
+   */
+  return SVGA3D_DX_MAX_RESOURCE_SIZE / (4 * sizeof(float));
 
case PIPE_CAP_MIN_TEXEL_OFFSET:
   return sws->have_vgpu10 ? VGPU10_MIN_TEXEL_FETCH_OFFSET : 0;
-- 
1.9.1
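The conversion in this patch can be sketched on its own. The helper name is hypothetical, and the byte limit is passed in as a parameter: the 128 MiB figure in the checks is only an example input, not necessarily the value of SVGA3D_DX_MAX_RESOURCE_SIZE. The byte limit is divided by the size of the largest texel format, float[4] (16 bytes), so the reported texel count is safe for every format.

```c
#include <assert.h>
#include <stdint.h>

/* Largest texel format considered: float[4], i.e. 16 bytes. */
#define SKETCH_LARGEST_TEXEL_BYTES (4 * sizeof(float))

/* Hypothetical helper: convert a resource size limit in bytes into the
 * texel count reported for PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE. */
static uint64_t
sketch_max_texture_buffer_texels(uint64_t max_resource_bytes)
{
   assert(sizeof(float) == 4);
   return max_resource_bytes / SKETCH_LARGEST_TEXEL_BYTES;
}
```

Smaller formats could fit more texels in the same number of bytes, but the cap is a single value for all formats, so it has to be computed for the worst case.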


[Mesa-dev] [PATCH] mesa/marshal: fix glNamedBufferData with NULL data

2017-07-10 Thread Grigori Goronzy

The semantics are similar to glBufferData. Fixes a crash with VMware
Player.

Signed-off-by: Grigori Goronzy 
---
 src/mesa/main/marshal.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/marshal.c b/src/mesa/main/marshal.c
index 8db4531..b801bdc 100644
--- a/src/mesa/main/marshal.c
+++ b/src/mesa/main/marshal.c
@@ -415,6 +415,7 @@ struct marshal_cmd_NamedBufferData
GLuint name;
GLsizei size;
GLenum usage;
+   bool data_null; /* If set, no data follows for "data" */
/* Next size bytes are GLubyte data[size] */
 };
 
@@ -425,7 +426,12 @@ _mesa_unmarshal_NamedBufferData(struct gl_context *ctx,
const GLuint name = cmd->name;
const GLsizei size = cmd->size;
const GLenum usage = cmd->usage;
-   const void *data = (const void *) (cmd + 1);
+   const void *data;
+
+   if (cmd->data_null)
+  data = NULL;
+   else
+  data = (const void *) (cmd + 1);
 
CALL_NamedBufferData(ctx->CurrentServerDispatch,
 (name, size, data, usage));
@@ -436,7 +442,7 @@ _mesa_marshal_NamedBufferData(GLuint buffer, GLsizeiptr size,
   const GLvoid * data, GLenum usage)
 {
GET_CURRENT_CONTEXT(ctx);
-   size_t cmd_size = sizeof(struct marshal_cmd_NamedBufferData) + size;
+   size_t cmd_size = sizeof(struct marshal_cmd_NamedBufferData) + (data ? size : 0);
 
debug_print_marshal("NamedBufferData");
if (unlikely(size < 0)) {
@@ -452,8 +458,11 @@ _mesa_marshal_NamedBufferData(GLuint buffer, GLsizeiptr size,
   cmd->name = buffer;
   cmd->size = size;
   cmd->usage = usage;
-  char *variable_data = (char *) (cmd + 1);
-  memcpy(variable_data, data, size);
+  cmd->data_null = !data;
+  if (data) {
+ char *variable_data = (char *) (cmd + 1);
+ memcpy(variable_data, data, size);
+  }
   _mesa_post_marshal_hook(ctx);
} else {
   _mesa_glthread_finish(ctx);
-- 
2.7.4
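The NULL-data handling in this patch can be sketched in isolation. This assumes a simplified command layout; struct sketch_cmd and both helpers are hypothetical and are not Mesa's glthread code. Payload bytes are reserved and copied only when data is non-NULL. The consumer hands NULL back out when data_null is set, so the replayed call keeps glBufferData's meaning for NULL: allocate a store of the given size and leave it uninitialized. Without the flag, the producer would memcpy() from a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for marshal_cmd_NamedBufferData: a fixed header,
 * optionally followed by `size` bytes of payload. */
struct sketch_cmd {
   uint32_t name;
   int32_t size;
   bool data_null; /* if set, no payload follows the header */
};

/* Producer side: pack a command into a newly allocated buffer. */
static struct sketch_cmd *
sketch_marshal(uint32_t name, int32_t size, const void *data,
               size_t *cmd_size_out)
{
   assert(size >= 0);
   size_t cmd_size = sizeof(struct sketch_cmd) + (data ? (size_t)size : 0);
   struct sketch_cmd *cmd = malloc(cmd_size);
   if (!cmd)
      return NULL;

   cmd->name = name;
   cmd->size = size;
   cmd->data_null = !data;
   if (data)
      memcpy(cmd + 1, data, (size_t)size); /* payload follows the header */

   *cmd_size_out = cmd_size;
   return cmd;
}

/* Consumer side: recover the data pointer, keeping NULL as NULL. */
static const void *
sketch_unmarshal_data(const struct sketch_cmd *cmd)
{
   return cmd->data_null ? NULL : (const void *)(cmd + 1);
}
```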


Re: [Mesa-dev] [EGL android: accquire fence implementation] i965: Queue the buffer with a sync fence for Android OS

2017-07-10 Thread Marathe, Yogesh

Hello Emil, my two cents, since I too spent some time on this.

> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf
> Of Emil Velikov
> Sent: Monday, July 10, 2017 4:41 PM
> To: Wu, Zhongmin 
> Cc: Widawsky, Benjamin ; Liu, Zhiquan
> ; Eric Engestrom ; Rob Clark
> ; Tomasz Figa ; Kenneth
> Graunke ; Kondapally, Kalyan
> ; ML mesa-dev  d...@lists.freedesktop.org>; Timothy Arceri ; Chuanbo
> Weng 
> Subject: Re: [Mesa-dev] [EGL android: accquire fence implementation] i965:
> Queue the buffer with a sync fence for Android OS
> 
> Hi Zhongmin Wu,
> 
> Above all, a bit of a disclaimer: I'm by no means an expert on the topic so take
> the following with a pinch of salt.
> 
> On 10 July 2017 at 03:11, Zhongmin Wu  wrote:
> > Before we queued the buffer with a invalid fence (-1), it will make
> > some benchmarks failed to test such as flatland.
> >
> > Now we get the out fence during the flushing buffer and then pass it
> > to SurfaceFlinger in eglSwapbuffer function.
> >
> Having a closer look it seems that the issue can be summarised as follows:
>  - flatland intercepts/interacts ANativeWindow::{de,}queueBuffer (how about
> ANativeWindow::cancelBuffer?)
>  - the program expects that a sync fd is available for both dequeue and queue
> 
> At the same time:
>  - the ANativeWindow documentation does _not_ state such requirement
>  - even if it did, that will be somewhat wrong, since
> ANativeWindow::queueBuffer is called by eglSwapBuffers() Where the latter
> documentation clearly states - "... performs an implicit flush ... glFlush ...
> vgFlush"
> 
> My take is that if flatland/Android framework does want an explicit sync 
> point it
> should insert one with the EGL API.
> There could be alternative solutions, but the proposed patch seems wrong
> IMHO.

In fact, I could work around this in the producer (Surface::queueBuffer) by ignoring
the -1 passed in and creating a sync fence with the EGL APIs. I see two problems with that.

- Before getting an fd using eglDupNativeFenceFDANDROID(), you need a glFlush().
   This costs additional cycles for each queueBuffer transaction on each BufferItem,
   and I believe the fd is also signaled as a result (so I don't know what we would
   gain by waiting on that fd on the consumer side).
- AFAIK, the whole idea of explicit sync revolves around being able to pass fds
  created by the driver between processes, and this workaround breaks that chain. If
  we work around it in upper layers, the explicit sync feature will have to be fixed
  for every other library that may use Mesa underneath.

For these reasons, I still believe we should fix it here. Of course, you and Rob have
very valid points about cancelBuffer and about not breaking gallium, respectively;
those need to be taken care of.

> 
> Regards,
> Emil

Re: [Mesa-dev] [PATCH 2/2] swr: build driver proper separate from rasterizer

2017-07-10 Thread Emil Velikov

Hi Tim,

On 7 July 2017 at 22:25, Tim Rowley  wrote:
> swr used to build and link the rasterizer to the driver, and to support
> multiple architectures we needed to have multiple versions of the
> driver/rasterizer combination, which needed to link in much of mesa.
>
> Changing to having one instance of the driver and just building
> architecture specific versions of the rasterizer gives a large reduction
> in disk space.
>
> libGL.so         6464 Kb ->  7000 Kb
> libswrAVX.so    10068 Kb ->  5432 Kb
> libswrAVX2.so    9828 Kb ->  5200 Kb
>
If one considers the other binaries which include libmesaswr.la
(swr_dri.so, osmesa, etc.), the savings might be a bit smaller ;-)
Regardless, thank you for working on this.

> Total           26360 Kb -> 17632 Kb
> ---
>  src/gallium/drivers/swr/Makefile.am | 24 +---
>  src/gallium/drivers/swr/swr_context.cpp |  2 +-
>  src/gallium/drivers/swr/swr_loader.cpp  | 14 ++
>  src/gallium/drivers/swr/swr_screen.h|  2 ++
>  4 files changed, 22 insertions(+), 20 deletions(-)
>
> diff --git a/src/gallium/drivers/swr/Makefile.am b/src/gallium/drivers/swr/Makefile.am
> index 4b4bd37..e764e0d 100644
> --- a/src/gallium/drivers/swr/Makefile.am
> +++ b/src/gallium/drivers/swr/Makefile.am
> @@ -26,7 +26,13 @@ AM_CXXFLAGS = $(GALLIUM_DRIVER_CFLAGS) $(SWR_CXX11_CXXFLAGS)
>
>  noinst_LTLIBRARIES = libmesaswr.la
>
> -libmesaswr_la_SOURCES = $(LOADER_SOURCES)
> +libmesaswr_la_SOURCES = \

> +   $(COMMON_CXX_SOURCES) \
> +   rasterizer/codegen/gen_knobs.cpp \
> +   rasterizer/codegen/gen_knobs.h \
These three now seem to be duplicated across the frontend and the
AVX/AVX2 backends. Is that intentional?
Worth adding a note?


> +libmesaswr_la_CXXFLAGS = \
> +   $(SWR_AVX_CXXFLAGS) \
> +   -DKNOB_ARCH=KNOB_ARCH_AVX \
With this KNOB, the frontend will be built for AVX. What about AVX2?


> -COMMON_LIBADD = \
> -   $(top_builddir)/src/gallium/auxiliary/libgallium.la \
> -   $(top_builddir)/src/mesa/libmesagallium.la \
> -   $(LLVM_LIBS)
> -
With this gone, libswrAVX{,2}_la_LIBADD become empty, so we can drop them.

Can you check that configure --with-gallium-drivers=swr
--enable-gallium-osmesa --disable-dri --enable-glx=gallium-xlib builds
fine (it needs a second run dropping the latter two options)? I cannot
spot anything obvious - just a gut feeling. You might want to sort out
the SCons build as well?

Thanks
Emil

Re: [Mesa-dev] [PATCH 1/3] vbo: simplify vbo_save_NotifyBegin()

2017-07-10 Thread Erik Faye-Lund

Reviewed-by: Erik Faye-Lund 

On Fri, Jul 7, 2017 at 4:11 PM, Brian Paul  wrote:
> This function always returned GL_TRUE.  Just make it a void function.
> Remove unreachable code following the call to vbo_save_NotifyBegin()
> in save_Begin() in dlist.c
>
> There were some stale comments that no longer applied since an earlier
> code refactoring.
>
> No Piglit regressions.
> ---
>  src/mesa/main/dlist.c   | 18 +-
>  src/mesa/vbo/vbo.h  |  2 +-
>  src/mesa/vbo/vbo_save_api.c |  7 +--
>  3 files changed, 3 insertions(+), 24 deletions(-)
>
> diff --git a/src/mesa/main/dlist.c b/src/mesa/main/dlist.c
> index 7e44054..9e817be 100644
> --- a/src/mesa/main/dlist.c
> +++ b/src/mesa/main/dlist.c
> @@ -5766,25 +5766,9 @@ save_Begin(GLenum mode)
>_mesa_compile_error(ctx, GL_INVALID_OPERATION, "recursive glBegin");
> }
> else {
> -  Node *n;
> -
>ctx->Driver.CurrentSavePrimitive = mode;
>
> -  /* Give the driver an opportunity to hook in an optimized
> -   * display list compiler.
> -   */
> -  if (vbo_save_NotifyBegin(ctx, mode))
> - return;
> -
> -  SAVE_FLUSH_VERTICES(ctx);
> -  n = alloc_instruction(ctx, OPCODE_BEGIN, 1);
> -  if (n) {
> - n[1].e = mode;
> -  }
> -
> -  if (ctx->ExecuteFlag) {
> - CALL_Begin(ctx->Exec, (mode));
> -  }
> +  vbo_save_NotifyBegin(ctx, mode);
> }
>  }
>
> diff --git a/src/mesa/vbo/vbo.h b/src/mesa/vbo/vbo.h
> index eec484b..c8e87d3 100644
> --- a/src/mesa/vbo/vbo.h
> +++ b/src/mesa/vbo/vbo.h
> @@ -90,7 +90,7 @@ vbo_initialize_save_dispatch(const struct gl_context *ctx,
>
>  void vbo_exec_FlushVertices(struct gl_context *ctx, GLuint flags);
>  void vbo_save_SaveFlushVertices(struct gl_context *ctx);
> -GLboolean vbo_save_NotifyBegin(struct gl_context *ctx, GLenum mode);
> +void vbo_save_NotifyBegin(struct gl_context *ctx, GLenum mode);
>  void vbo_save_NewList(struct gl_context *ctx, GLuint list, GLenum mode);
>  void vbo_save_EndList(struct gl_context *ctx);
>  void vbo_save_BeginCallList(struct gl_context *ctx, struct gl_display_list *list);
> diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c
> index a0735f6..a42a3c3 100644
> --- a/src/mesa/vbo/vbo_save_api.c
> +++ b/src/mesa/vbo/vbo_save_api.c
> @@ -1035,7 +1035,7 @@ _save_CallLists(GLsizei n, GLenum type, const GLvoid * v)
>   * Called when a glBegin is getting compiled into a display list.
>   * Updating of ctx->Driver.CurrentSavePrimitive is already taken care of.
>   */
> -GLboolean
> +void
>  vbo_save_NotifyBegin(struct gl_context *ctx, GLenum mode)
>  {
> struct vbo_save_context *save = &vbo_context(ctx)->save;
> @@ -1064,11 +1064,6 @@ vbo_save_NotifyBegin(struct gl_context *ctx, GLenum mode)
>
> /* We need to call vbo_save_SaveFlushVertices() if there's state change */
> ctx->Driver.SaveNeedFlush = GL_TRUE;
> -
> -   /* GL_TRUE means we've handled this glBegin here; don't compile a BEGIN
> -* opcode into the display list.
> -*/
> -   return GL_TRUE;
>  }
>
>
> --
> 1.9.1
>

[Mesa-dev] [Bug 101614] OSMesa 17.1.3 simd16intrin build FAIL on Win/MinGW - 'expected initializer before _simd16_setzero_ps ...'

2017-07-10 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=101614

--- Comment #9 from Emil Velikov  ---
Trevor, please try and get [ideally] all of the patches upstreamed. Be that in
mesa[1] or the respective project. Frédéric Devernay may be able to lend a
hand?

Any of your local patches may be changing Mesa in subtle ways that people
cannot foresee.

On the Travis/AppVeyor part - yes, not everything is covered. Do send a patch
when you get bored. Note that by default both libgl-gdi and osmesa are built,
even if they are not explicitly listed on the command line.

On the issue in question - I suspect that it's MINGW64 version/Windows
specific, since cross-compiling llvmpipe/swr/osmesa on my Arch box has been
fine for a while. I've been doing that as part of the release process,
even before 17.1.3 was out ;-)

Thanks
Emil
[1] https://www.mesa3d.org/submittingpatches.html


Re: [Mesa-dev] [EGL android: accquire fence implementation] i965: Queue the buffer with a sync fence for Android OS

2017-07-10 Thread Rob Clark

On Sun, Jul 9, 2017 at 10:11 PM, Zhongmin Wu  wrote:
> Before we queued the buffer with a invalid fence (-1), it will
> make some benchmarks failed to test such as flatland.
>
> Now we get the out fence during the flushing buffer and then pass
> it to SurfaceFlinger in eglSwapbuffer function.
>
> Change-Id: Ic0773c19788d612a98d1402f5b5619dab64c1bc2
> Tracked-On: https://jira01.devtools.intel.com/browse/OAM-43936
> Reported-On: https://bugs.freedesktop.org/show_bug.cgi?id=101655
> Signed-off-by: Zhongmin Wu 
> Reported-by: Li, Guangli 
> Tested-by: Marathe, Yogesh 
> ---
>  include/GL/internal/dri_interface.h   |2 ++
>  src/egl/drivers/dri2/platform_android.c   |   10 +++---
>  src/mesa/drivers/dri/i965/brw_context.c   |7 ++-
>  src/mesa/drivers/dri/i965/brw_context.h   |1 +
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c |   15 ++-
>  src/mesa/drivers/dri/i965/intel_screen.c  |7 +++
>  6 files changed, 37 insertions(+), 5 deletions(-)
>
> diff --git a/include/GL/internal/dri_interface.h b/include/GL/internal/dri_interface.h
> index fc2d4bb..8760aec 100644
> --- a/include/GL/internal/dri_interface.h
> +++ b/include/GL/internal/dri_interface.h
> @@ -316,6 +316,8 @@ struct __DRI2flushExtensionRec {
>   __DRIdrawable *drawable,
>   unsigned flags,
>   enum __DRI2throttleReason throttle_reason);
> +
> +int (*get_retrive_fd)(__DRIcontext *ctx);
>  };
>
>
> diff --git a/src/egl/drivers/dri2/platform_android.c b/src/egl/drivers/dri2/platform_android.c
> index bfa20f8..844bb8d 100644
> --- a/src/egl/drivers/dri2/platform_android.c
> +++ b/src/egl/drivers/dri2/platform_android.c
> @@ -289,10 +289,14 @@ droid_window_enqueue_buffer(_EGLDisplay *disp, struct dri2_egl_surface *dri2_sur
>  *is passed to queueBuffer, and the ANativeWindow implementation
>  *is responsible for closing it.
>  */
> -   int fence_fd = -1;
> -   dri2_surf->window->queueBuffer(dri2_surf->window, dri2_surf->buffer,
> -  fence_fd);
>
> +   _EGLContext *ctx = _eglGetCurrentContext();
> +   struct dri2_egl_context *dri2_ctx = dri2_egl_context(ctx);
> +
> +   int fd = -1;
> +   fd = dri2_dpy->flush->get_retrive_fd(dri2_ctx->dri_context);

So from a quick look at this patch, I suspect this will cause gallium
drivers to start crashing unless this new function is also implemented in
mesa/st.  (Or is someone already working on that?)

Possibly an if(dri2_dpy->flush->get_retrive_fd) check is sufficient.

BR,
-R
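The guard suggested above can be sketched as follows. The struct, its field names and the version number 4 are hypothetical. DRI extension structs carry a version field, and callers usually check it before touching a newer member, because an older driver's struct may not contain that member at all. The NULL check then covers drivers that have the slot but leave it unimplemented. Falling back to -1 keeps the pre-patch behaviour of queueing the buffer without a fence.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a versioned DRI-style extension struct. */
struct sketch_flush_ext {
   int version;
   void (*flush)(void *ctx);
   int (*get_fence_fd)(void *ctx); /* newer, optional member */
};

#define SKETCH_FENCE_FD_MIN_VERSION 4 /* hypothetical */

/* Return the driver's out-fence fd, or -1 ("no fence") when the driver
 * predates the hook or does not implement it. */
static int
sketch_get_fence_fd(const struct sketch_flush_ext *ext, void *ctx)
{
   assert(ext != NULL);
   if (ext->version >= SKETCH_FENCE_FD_MIN_VERSION && ext->get_fence_fd)
      return ext->get_fence_fd(ctx);
   return -1;
}

/* Example driver-side hook, for illustration only. */
static int
sketch_example_get_fence_fd(void *ctx)
{
   (void)ctx;
   return 42; /* stands in for a real sync-file fd */
}
```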


> +   dri2_surf->window->queueBuffer(dri2_surf->window, dri2_surf->buffer,
> +  fd);
> dri2_surf->buffer->common.decRef(&dri2_surf->buffer->common);
> dri2_surf->buffer = NULL;
> dri2_surf->back = NULL;
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
> index 5433f90..f74ae91 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -940,6 +940,7 @@ brwCreateContext(gl_api api,
> brw->screen = screen;
> brw->bufmgr = screen->bufmgr;
>
> +   brw->retrive_fd = -1;
> brw->gen = devinfo->gen;
> brw->gt = devinfo->gt;
> brw->is_g4x = devinfo->is_g4x;
> @@ -1176,8 +1177,12 @@ intelDestroyContext(__DRIcontext * driContextPriv)
>
> ralloc_free(brw);
> driContextPriv->driverPrivate = NULL;
> -}
>
> +   if(brw->retrive_fd != -1) {
> +   close(brw->retrive_fd);
> +   brw->retrive_fd = -1;
> +   }
> +}
>  GLboolean
>  intelUnbindContext(__DRIcontext * driContextPriv)
>  {
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index dc4bc8f..8f277c3 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1217,6 +1217,7 @@ struct brw_context
>
> __DRIcontext *driContext;
> struct intel_screen *screen;
> +   int retrive_fd;
>  };
>
>  /* brw_clear.c */
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> index 62d2fe8..31515b2 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> @@ -648,9 +648,22 @@ do_flush_locked(struct brw_context *brw, int in_fence_fd, int *out_fence_fd)
>   /* Add the batch itself to the end of the validation list */
>   add_exec_bo(batch, batch->bo);
>
> + if(brw->retrive_fd != -1) {
> + close(brw->retrive_fd);
> + brw->retrive_fd = -1;
> + }
> +
> + int fd = -1;
>   ret = execbuffer(dri_screen->fd, batch, hw_ctx,
>4 * USED_BATCH(*batch),
> -  in_fence_fd, out_fence_fd, flags);
> +  in_fence_fd, &fd, flags);
> +
> + if(out_fence_fd != NULL) {
> + *out_fence_fd = fd;
> +
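In the quoted intelDestroyContext hunk, brw->retrive_fd is read and closed after ralloc_free(brw), which is a use-after-free. A minimal sketch of a teardown order that avoids it follows. It uses a hypothetical context struct, with plain malloc()/free() standing in for ralloc. The idea is to close the fd while the struct that owns it is still alive, and only then free the struct.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical context that owns an out-fence fd (-1 when none). */
struct sketch_context {
   int out_fence_fd;
};

static struct sketch_context *
sketch_context_create(void)
{
   struct sketch_context *ctx = malloc(sizeof(*ctx));
   if (ctx)
      ctx->out_fence_fd = -1;
   return ctx;
}

/* Release what the context owns before releasing the context itself;
 * reading ctx->out_fence_fd after free(ctx) would be a use-after-free. */
static void
sketch_context_destroy(struct sketch_context *ctx)
{
   if (!ctx)
      return;
   assert(ctx->out_fence_fd >= -1);
   if (ctx->out_fence_fd != -1) {
      close(ctx->out_fence_fd);
      ctx->out_fence_fd = -1;
   }
   free(ctx);
}
```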

Re: [Mesa-dev] [PATCH 4/4] docs: update master's release notes, news and calendar commit

2017-07-10 Thread Emil Velikov

On 8 July 2017 at 20:59, Andres Gomez  wrote:
> This reflects closer what we are actually doing.
>
> Signed-off-by: Andres Gomez 
> ---
>  docs/releasing.html | 15 ---
>  1 file changed, 4 insertions(+), 11 deletions(-)
>
> diff --git a/docs/releasing.html b/docs/releasing.html
> index 99235d8412..152f5cea73 100644
> --- a/docs/releasing.html
> +++ b/docs/releasing.html
> @@ -24,7 +24,6 @@
>  Making a branchpoint
>  Pre-release announcement
>  Making a new release
> -Update the calendar
>  Announce the release
>  Update the mesa3d.org website
>  Update Bugzilla
> @@ -574,23 +573,17 @@ Something like the following steps will do the trick:
>  
>
>  
> -Also, edit docs/relnotes.html to add a link to the new release notes, and edit
> -docs/index.html to add a news entry. Then commit and push:
> +Also, edit docs/relnotes.html to add a link to the new release notes,
> +edit docs/index.html to add a news entry, and remove the version from
> +docs/release-calendar.html. Then commit and push:
>  
>
>  
> -   git commit -as -m "docs: add news item and link release notes for X.Y.Z"
> +   git commit -as -m "docs: update calendar, add news item and link release notes for X.Y.Z"
> git push origin master X.Y
>  
>
>
> -Update the calendar
> -
> -
> -Remove the version from the  target="_parent">calendar.
> -
> -
> -
Originally I thought about keeping it as a separate commit, although
squashing seems reasonable.

With some info why LD_LIBRARY_PATH is set and the trivial nits
addressed (for 3/4)

1, 3 and 4 are
Reviewed-by: Emil Velikov 

Thanks
Emil
