Re: [Mesa-dev] [PATCH] radv: add scratch support for spilling.

2017-01-24 Thread Bas Nieuwenhuizen
I'm not sure if using a scratch buffer per command buffer is correct.
AFAIU each ring has a separate counter for the scratch offsets, and if a
command buffer is used in multiple compute rings at the same time, these
separate counters could conflict.

I'd think we need a preamble IB per queue that sets SGPR0/1 for all
relevant stages, and modify the winsys so that that is called in the
same submit ioctl as the application command buffers.

- Bas

On Tue, Jan 24, 2017, at 18:32, Dave Airlie wrote:
> From: Dave Airlie 
> 
> Currently LLVM 5.0 has support for spilling to a place
> pointed to by the user sgprs instead of using relocations.
> 
> This is enabled by using the amdgcn-mesa-mesa3d triple.
> 
> For compute gfx shaders we spill to a buffer pointed to
> by 64-bit address stored in sgprs 0/1.
> For other gfx shaders we spill to a buffer pointed to by
> the first two dwords of the buffer pointed to in sgprs 0/1.
> 
> This patch enables radv to use the llvm support when present.
> 
> This fixes Sascha Willems computeshader demo first screen,
> and a bunch of CTS tests now pass.
> 
> This patch is likely to be in LLVM 4.0 release as well
> (fingers crossed) in which case we need to adjust the detection
> logic.
> 
> Signed-off-by: Dave Airlie
> ---
>  src/amd/common/ac_binary.c   |  30 +
>  src/amd/common/ac_binary.h   |   4 +-
>  src/amd/common/ac_llvm_util.c|   4 +-
>  src/amd/common/ac_llvm_util.h|   2 +-
>  src/amd/common/ac_nir_to_llvm.c  |  14 ++--
>  src/amd/common/ac_nir_to_llvm.h  |   6 +-
>  src/amd/vulkan/radv_cmd_buffer.c | 137
>  ++-
>  src/amd/vulkan/radv_device.c |  22 +++
>  src/amd/vulkan/radv_pipeline.c   |  10 +--
>  src/amd/vulkan/radv_private.h|  13 
>  10 files changed, 215 insertions(+), 27 deletions(-)
> 
> diff --git a/src/amd/common/ac_binary.c b/src/amd/common/ac_binary.c
> index 01cf000..9c66a82 100644
> --- a/src/amd/common/ac_binary.c
> +++ b/src/amd/common/ac_binary.c
> @@ -212,23 +212,28 @@ static const char *scratch_rsrc_dword1_symbol =
>  
>  void ac_shader_binary_read_config(struct ac_shader_binary *binary,
> struct ac_shader_config *conf,
> - unsigned symbol_offset)
> + unsigned symbol_offset,
> + bool supports_spill)
>  {
>   unsigned i;
>   const unsigned char *config =
>   ac_shader_binary_config_start(binary, symbol_offset);
>   bool really_needs_scratch = false;
> -
> +   uint32_t wavesize = 0;
>   /* LLVM adds SGPR spills to the scratch size.
>* Find out if we really need the scratch buffer.
>*/
> -   for (i = 0; i < binary->reloc_count; i++) {
> -   const struct ac_shader_reloc *reloc = &binary->relocs[i];
> +   if (supports_spill) {
> +   really_needs_scratch = true;
> +   } else {
> +   for (i = 0; i < binary->reloc_count; i++) {
> +   const struct ac_shader_reloc *reloc =
> &binary->relocs[i];
>  
> -   if (!strcmp(scratch_rsrc_dword0_symbol, reloc->name) ||
> -   !strcmp(scratch_rsrc_dword1_symbol, reloc->name)) {
> -   really_needs_scratch = true;
> -   break;
> +   if (!strcmp(scratch_rsrc_dword0_symbol,
> reloc->name) ||
> +   !strcmp(scratch_rsrc_dword1_symbol,
> reloc->name)) {
> +   really_needs_scratch = true;
> +   break;
> +   }
>   }
>   }
>  
> @@ -259,9 +264,7 @@ void ac_shader_binary_read_config(struct
> ac_shader_binary *binary,
>   case R_0286E8_SPI_TMPRING_SIZE:
>   case R_00B860_COMPUTE_TMPRING_SIZE:
>   /* WAVESIZE is in units of 256 dwords. */
> -   if (really_needs_scratch)
> -   conf->scratch_bytes_per_wave =
> -   G_00B860_WAVESIZE(value) * 256 *
> 4;
> +   wavesize = value;
>   break;
>   case SPILLED_SGPRS:
>   conf->spilled_sgprs = value;
> @@ -285,4 +288,9 @@ void ac_shader_binary_read_config(struct
> ac_shader_binary *binary,
>   if (!conf->spi_ps_input_addr)
>   conf->spi_ps_input_addr = conf->spi_ps_input_ena;
>   }
> +
> +   if (really_needs_scratch) {
> +   /* sgprs spills aren't spilling */
> +   conf->scratch_bytes_per_wave =
> G_00B860_WAVESIZE(wavesize) * 256 * 4;
> +   }
>  }
> diff --git a/src/amd/common/ac_binary.h b/src/amd/common/ac_binary.h
> index 282f33d..06fd855 100644
> --- a/src/amd/common/ac_binary.h
> +++ b/src/amd/common/ac_binary.h
> @@ -27,6 +27,7 @@
>  #pragma once
>  
>  #include <stdint.h>
> +#include <stdbool.h>
>  
>  struct 

Re: [Mesa-dev] [PATCH 9/9] i965: Drop _mesa_meta_pbo_TexSubImage() even for gen < 6

2017-01-24 Thread Tapani Pälli

I tested dropping meta here separately in the context of this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=99209

No regressions seen there.

Tested-by: Tapani Pälli 


On 12/20/2016 04:45 PM, Topi Pohjolainen wrote:

Signed-off-by: Topi Pohjolainen 
---
  src/mesa/drivers/dri/i965/intel_tex_image.c| 24 +++-
  src/mesa/drivers/dri/i965/intel_tex_subimage.c | 19 +--
  2 files changed, 12 insertions(+), 31 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 67f83db..e503043 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -127,7 +127,6 @@ intelTexImage(struct gl_context * ctx,
  {
 struct brw_context *brw = brw_context(ctx);
 struct intel_texture_image *intelImage = intel_texture_image(texImage);
-   bool ok;
  
 bool tex_busy = intelImage->mt && drm_intel_bo_busy(intelImage->mt->bo);
  
@@ -156,22 +155,13 @@ intelTexImage(struct gl_context * ctx,

format, type, pixels, unpack))
return;
  
-   if (brw->gen < 6 &&

-   _mesa_meta_pbo_TexSubImage(ctx, dims, texImage, 0, 0, 0,
-  texImage->Width, texImage->Height,
-  texImage->Depth,
-  format, type, pixels,
-  tex_busy, unpack))
-  return;
-
-   ok = intel_texsubimage_tiled_memcpy(ctx, dims, texImage,
-   0, 0, 0, /*x,y,z offsets*/
-   texImage->Width,
-   texImage->Height,
-   texImage->Depth,
-   format, type, pixels, unpack,
-   false /*allocate_storage*/);
-   if (ok)
+   if (intel_texsubimage_tiled_memcpy(ctx, dims, texImage,
+  0, 0, 0, /*x,y,z offsets*/
+  texImage->Width,
+  texImage->Height,
+  texImage->Depth,
+  format, type, pixels, unpack,
+  false /*allocate_storage*/))
return;
  
 DBG("%s: upload image %dx%dx%d pixels %p\n",

diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
index 741637a..60dc862 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
@@ -395,7 +395,6 @@ intelTexSubImage(struct gl_context * ctx,
  {
 struct brw_context *brw = brw_context(ctx);
 struct intel_mipmap_tree *mt = intel_texture_image(texImage)->mt;
-   bool ok;
  
 bool tex_busy = mt && drm_intel_bo_busy(mt->bo);
  
@@ -416,19 +415,11 @@ intelTexSubImage(struct gl_context * ctx,

format, type, pixels, packing))
return;
  
-   ok = _mesa_meta_pbo_TexSubImage(ctx, dims, texImage,

-   xoffset, yoffset, zoffset,
-   width, height, depth, format, type,
-   pixels, tex_busy, packing);
-   if (ok)
-  return;
-
-   ok = intel_texsubimage_tiled_memcpy(ctx, dims, texImage,
-   xoffset, yoffset, zoffset,
-   width, height, depth,
-   format, type, pixels, packing,
-   false /*for_glTexImage*/);
-   if (ok)
+   if (intel_texsubimage_tiled_memcpy(ctx, dims, texImage,
+  xoffset, yoffset, zoffset,
+  width, height, depth,
+  format, type, pixels, packing,
+  false /*for_glTexImage*/))
   return;
  
 _mesa_store_texsubimage(ctx, dims, texImage,



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] configure.ac: Remove redundant libglvnd stanza

2017-01-24 Thread Boyan Ding
There were two "libglvnd configuration" sections in the squashed commit
that added libglvnd support, but only one in the original libglvnd
branch. A following commit moved one of them downwards. Now remove the
upper, older one and move the GL_LIB name decision downwards, after the
new libglvnd configuration section.

Signed-off-by: Boyan Ding 
---
 configure.ac | 81 
 1 file changed, 32 insertions(+), 49 deletions(-)

diff --git a/configure.ac b/configure.ac
index 64ace9dbcb..687ad9f99b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -528,8 +528,6 @@ else
DEFINES="$DEFINES -DNDEBUG"
 fi
 
-DEFAULT_GL_LIB_NAME=GL
-
 dnl
 dnl Check if linker supports -Bsymbolic
 dnl
@@ -627,23 +625,6 @@ esac
 
 AM_CONDITIONAL(HAVE_COMPAT_SYMLINKS, test "x$HAVE_COMPAT_SYMLINKS" = xyes)
 
-DEFAULT_GL_LIB_NAME=GL
-
-dnl
-dnl Libglvnd configuration
-dnl
-AC_ARG_ENABLE([libglvnd],
-[AS_HELP_STRING([--enable-libglvnd],
-[Build for libglvnd @<:@default=disabled@:>@])],
-[enable_libglvnd="$enableval"],
-[enable_libglvnd=no])
-AM_CONDITIONAL(USE_LIBGLVND_GLX, test "x$enable_libglvnd" = xyes)
-#AM_COND_IF([USE_LIBGLVND_GLX], [DEFINES="${DEFINES} -DUSE_LIBGLVND_GLX=1"])
-if test "x$enable_libglvnd" = xyes ; then
-DEFINES="${DEFINES} -DUSE_LIBGLVND_GLX=1"
-DEFAULT_GL_LIB_NAME=GLX_mesa
-fi
-
 dnl
 dnl library names
 dnl
@@ -677,36 +658,6 @@ esac
 
 AC_SUBST([LIB_EXT])
 
-AC_ARG_WITH([gl-lib-name],
-  [AS_HELP_STRING([--with-gl-lib-name@<:@=NAME@:>@],
-[specify GL library name @<:@default=GL@:>@])],
-  [GL_LIB=$withval],
-  [GL_LIB="$DEFAULT_GL_LIB_NAME"])
-AC_ARG_WITH([osmesa-lib-name],
-  [AS_HELP_STRING([--with-osmesa-lib-name@<:@=NAME@:>@],
-[specify OSMesa library name @<:@default=OSMesa@:>@])],
-  [OSMESA_LIB=$withval],
-  [OSMESA_LIB=OSMesa])
-AS_IF([test "x$GL_LIB" = xyes], [GL_LIB="$DEFAULT_GL_LIB_NAME"])
-AS_IF([test "x$OSMESA_LIB" = xyes], [OSMESA_LIB=OSMesa])
-
-dnl
-dnl Mangled Mesa support
-dnl
-AC_ARG_ENABLE([mangling],
-  [AS_HELP_STRING([--enable-mangling],
-[enable mangled symbols and library name @<:@default=disabled@:>@])],
-  [enable_mangling="${enableval}"],
-  [enable_mangling=no]
-)
-if test "x${enable_mangling}" = "xyes" ; then
-  DEFINES="${DEFINES} -DUSE_MGL_NAMESPACE"
-  GL_LIB="Mangled${GL_LIB}"
-  OSMESA_LIB="Mangled${OSMESA_LIB}"
-fi
-AC_SUBST([GL_LIB])
-AC_SUBST([OSMESA_LIB])
-
 dnl
 dnl potentially-infringing-but-nobody-knows-for-sure stuff
 dnl
@@ -1332,6 +1283,8 @@ AM_CONDITIONAL(HAVE_DRI_GLX, test "x$enable_glx" = xdri)
 AM_CONDITIONAL(HAVE_XLIB_GLX, test "x$enable_glx" = xxlib)
 AM_CONDITIONAL(HAVE_GALLIUM_XLIB_GLX, test "x$enable_glx" = xgallium-xlib)
 
+DEFAULT_GL_LIB_NAME=GL
+
 dnl
 dnl Libglvnd configuration
 dnl
@@ -1361,6 +1314,36 @@ if test "x$enable_libglvnd" = xyes ; then
 DEFAULT_GL_LIB_NAME=GLX_mesa
 fi
 
+AC_ARG_WITH([gl-lib-name],
+  [AS_HELP_STRING([--with-gl-lib-name@<:@=NAME@:>@],
+[specify GL library name @<:@default=GL@:>@])],
+  [GL_LIB=$withval],
+  [GL_LIB="$DEFAULT_GL_LIB_NAME"])
+AC_ARG_WITH([osmesa-lib-name],
+  [AS_HELP_STRING([--with-osmesa-lib-name@<:@=NAME@:>@],
+[specify OSMesa library name @<:@default=OSMesa@:>@])],
+  [OSMESA_LIB=$withval],
+  [OSMESA_LIB=OSMesa])
+AS_IF([test "x$GL_LIB" = xyes], [GL_LIB="$DEFAULT_GL_LIB_NAME"])
+AS_IF([test "x$OSMESA_LIB" = xyes], [OSMESA_LIB=OSMesa])
+
+dnl
+dnl Mangled Mesa support
+dnl
+AC_ARG_ENABLE([mangling],
+  [AS_HELP_STRING([--enable-mangling],
+[enable mangled symbols and library name @<:@default=disabled@:>@])],
+  [enable_mangling="${enableval}"],
+  [enable_mangling=no]
+)
+if test "x${enable_mangling}" = "xyes" ; then
+  DEFINES="${DEFINES} -DUSE_MGL_NAMESPACE"
+  GL_LIB="Mangled${GL_LIB}"
+  OSMESA_LIB="Mangled${OSMESA_LIB}"
+fi
+AC_SUBST([GL_LIB])
+AC_SUBST([OSMESA_LIB])
+
 # Check for libdrm
 PKG_CHECK_MODULES([LIBDRM], [libdrm >= $LIBDRM_REQUIRED],
   [have_libdrm=yes], [have_libdrm=no])
-- 
2.11.0



[Mesa-dev] [Bug 98002] Mud rendering bug in Portal 2

2017-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98002

--- Comment #15 from Clément Guérin  ---
Today's Portal 2 update fixed the bug.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH] gallium/radeon: add a new HUD query for the number of mapped buffers

2017-01-24 Thread Michel Dänzer
On 24/01/17 07:38 PM, Nicolai Hähnle wrote:
> On 24.01.2017 11:34, Samuel Pitoiset wrote:
>> On 01/24/2017 11:31 AM, Nicolai Hähnle wrote:
>>> On 24.01.2017 11:25, Samuel Pitoiset wrote:
>>>> On 01/24/2017 07:39 AM, Michel Dänzer wrote:
>>>>> On 24/01/17 05:44 AM, Samuel Pitoiset wrote:
>>>>>> Useful when debugging applications which map too much VRAM.
>>>>>
>>>>> Is the number of mapped buffers really useful, as opposed to the total
>>>>> size of buffer mappings? Even if it was the latter though, it doesn't
>>>>> show which mappings are for BOs in VRAM vs GTT, does it? Also, even
>>>>> the
>>>>> total size of mappings of BOs currently in VRAM doesn't directly
>>>>> reflect
>>>>> the pressure on the CPU visible part of VRAM — only the BOs which are
>>>>> actively being accessed by the CPU contribute to that.
>>>>
>>>> It's actually useful to know the number of mapped buffers, but maybe it
>>>> would be better to have two separate counters for GTT and VRAM.
>>>> Although
>>>> the number of mapped buffers in VRAM is most of the time very high
>>>> compared to GTT AFAIK.
>>>>
>>>> I will submit in a follow-up patch, something which reduces the number
>>>> of mapped buffers in VRAM (when a BO has been mapped only once). And
>>>> this new counter helped me.
>>>
>>> Michel's point probably means that reducing the number/size of mapped
>>> VRAM buffers isn't actually that important though.
>>
>> It seems useful for apps which map more than 256MB of VRAM.
> 
> True, if all of that range is actually used by the CPU (which may well
> happen, of course). If I understand Michel correctly (and this was news
> to me as well), if 1GB of VRAM is mapped, but only 64MB of that are
> regularly accessed by the CPU, then the kernel will migrate all of the
> rest into non-visible VRAM.

Some caveats:

While what you're describing should certainly be possible, I'm not sure
it's what currently happens with the amdgpu kernel driver. It's possible
that BOs are evicted from CPU visible VRAM to GTT instead of to CPU
invisible VRAM. Also, if a BO is currently in CPU invisible VRAM when
the CPU tries accessing it, and it can't be moved into CPU visible VRAM
(e.g. due to fragmentation caused by BOs which are pinned, either
permanently for scanout or temporarily for command stream execution),
it's migrated to GTT instead.

Anyway, the point is that the existence or absence of mappings per se
shouldn't affect the BO migration; only actual CPU access does.


Also note that BOs can currently only be migrated into CPU visible VRAM
as a whole for CPU access, i.e. the whole BO has to fit into a single
physically contiguous range of VRAM.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH] gallium/radeon: add a new HUD query for the number of mapped buffers

2017-01-24 Thread Michel Dänzer
On 25/01/17 12:05 AM, Marek Olšák wrote:
> On Tue, Jan 24, 2017 at 2:17 PM, Christian König
>  wrote:
>> On 24.01.2017 at 11:44, Samuel Pitoiset wrote:
>>> On 01/24/2017 11:38 AM, Nicolai Hähnle wrote:
>>>> On 24.01.2017 11:34, Samuel Pitoiset wrote:
>>>>> On 01/24/2017 11:31 AM, Nicolai Hähnle wrote:
>>>>>> On 24.01.2017 11:25, Samuel Pitoiset wrote:
>>>>>>> On 01/24/2017 07:39 AM, Michel Dänzer wrote:
>>>>>>>> On 24/01/17 05:44 AM, Samuel Pitoiset wrote:
>>>>>>>>>
>>>>>>>>> Useful when debugging applications which map too much VRAM.
>>>>>>>>
>>>>>>>>
>>>>>>>> Is the number of mapped buffers really useful, as opposed to the
>>>>>>>> total
>>>>>>>> size of buffer mappings? Even if it was the latter though, it doesn't
>>>>>>>> show which mappings are for BOs in VRAM vs GTT, does it? Also, even
>>>>>>>> the
>>>>>>>> total size of mappings of BOs currently in VRAM doesn't directly
>>>>>>>> reflect
>>>>>>>> the pressure on the CPU visible part of VRAM — only the BOs which are
>>>>>>>> actively being accessed by the CPU contribute to that.
>>>>>>>
>>>>>>>
>>>>>>> It's actually useful to know the number of mapped buffers, but maybe
>>>>>>> it
>>>>>>> would be better to have two separate counters for GTT and VRAM.
>>>>>>> Although
>>>>>>> the number of mapped buffers in VRAM is most of the time very high
>>>>>>> compared to GTT AFAIK.
>>>>>>>
>>>>>>> I will submit in a follow-up patch, something which reduces the number
>>>>>>> of mapped buffers in VRAM (when a BO has been mapped only once). And
>>>>>>> this new counter helped me.
>>>>>>
>>>>>>
>>>>>> Michel's point probably means that reducing the number/size of mapped
>>>>>> VRAM buffers isn't actually that important though.
>>>>>
>>>>>
>>>>> It seems useful for apps which map more than 256MB of VRAM.
>>>>
>>>>
>>>> True, if all of that range is actually used by the CPU (which may well
>>>> happen, of course). If I understand Michel correctly (and this was news
>>>> to me as well), if 1GB of VRAM is mapped, but only 64MB of that are
>>>> regularly accessed by the CPU, then the kernel will migrate all of the
>>>> rest into non-visible VRAM.
>>>
>>>
>>> And this can hurt us, for example DXMD maps over 500MB of VRAM. And a
>>> bunch of BOs are only mapped once.
>>
>>
>> But when they are mapped once that won't be a problem.
>>
>> Again as Michel noted when a VRAM buffer is mapped it is migrated into the
>> visible parts of VRAM on access, not on mapping.
>>
>> In other words you can map all your VRAM buffers and keep them mapped and
>> that won't hurt anybody.
> 
> Are you saying that I can map 2 GB of VRAM and it will all stay in
> VRAM and I'll get maximum performance if it's not accessed by the CPU
> too much?

Yes, that's how it's supposed to work.


> Are you sure it won't have any adverse effects on anything?

That's a pretty big statement. :) Bugs happen.


> Having useless memory mappings certainly must have some negative
> effect on something. It doesn't seem like a good idea to have a lot of
> mapped memory that doesn't have to be mapped.

I guess e.g. the bookkeeping overhead might become significant with
large numbers of mappings. Maybe the issue Sam has been looking into is
actually related to something like that, not to VRAM?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH] gallium/radeon: add a new HUD query for the number of mapped buffers

2017-01-24 Thread Michel Dänzer
On 24/01/17 07:18 PM, Nicolai Hähnle wrote:
> On 24.01.2017 07:39, Michel Dänzer wrote:
>> On 24/01/17 05:44 AM, Samuel Pitoiset wrote:
>>> Useful when debugging applications which map too much VRAM.
>>
>> Is the number of mapped buffers really useful, as opposed to the total
>> size of buffer mappings? Even if it was the latter though, it doesn't
>> show which mappings are for BOs in VRAM vs GTT, does it? Also, even the
>> total size of mappings of BOs currently in VRAM doesn't directly reflect
>> the pressure on the CPU visible part of VRAM — only the BOs which are
>> actively being accessed by the CPU contribute to that.
> 
> Thanks, I didn't know that.
> 
> However, the number of mapped buffers is still useful information
> because we used to run into Linux's limit on the number of simultaneous
> mmap()ings before :)

Makes sense, but then the commit log should be changed to better reflect
what it's useful for.
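
For reference, the mmap() limit Nicolai mentions is presumably the per-process mapping count that Linux exposes as the vm.max_map_count sysctl; that reading is an assumption, since the thread does not name it. A minimal sketch of querying it, with read_max_map_count() being a hypothetical helper and not anything in Mesa:

```c
#include <stdio.h>

/* Hypothetical helper: returns the per-process limit on memory
 * mappings, or -1 if it cannot be read (e.g. on a non-Linux system). */
static long read_max_map_count(void)
{
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    long limit = -1;

    if (!f)
        return -1;
    /* The file holds a single decimal integer. */
    if (fscanf(f, "%ld", &limit) != 1)
        limit = -1;
    fclose(f);
    return limit;
}
```

Comparing a mapped-buffer counter against this value would show how close an application is to the limit, which is a different question from how much VRAM it maps.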


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH 06/37] util: add a disk_cache_remove() function

2017-01-24 Thread Timothy Arceri
On Tue, 2017-01-24 at 17:38 -0800, Eric Anholt wrote:
> Timothy Arceri  writes:
> 
> > On Tue, 2017-01-24 at 15:54 -0800, Eric Anholt wrote:
> > > Timothy Arceri  writes:
> > > 
> > > > From: Timothy Arceri 
> > > > 
> > > > This will be used to remove cache items created with old
> > > > versions
> > > > of Mesa or other invalid cache items from the cache.
> > > 
> > > I'm not convinced that removing the item from cache when we get a
> > > hit
> > > on
> > > everything in the key except for Mesa version is the right way to
> > > go.  I
> > > think we should just be hashing the Mesa version in the key so
> > > that
> > > we
> > > don't hit on mismatched versions.  Then we wouldn't thrash our
> > > cache
> > > when we're, say, checking out around different versions of Mesa
> > > and
> > > re-pigliting things.
> > 
> > I agree. I mention this problem in the cover letter, it's going to
> > take
> > some reworking so I was hoping to fix it in a follow-up.
> > 
> > The plan is to create directory structures like so:
> > 
> > Mesa-17.0.0/i965-BDW/
> > Mesa-17.1.0/i965-BDW/
> > 
> > This will allow us to just delete an entire directory if we are
> > hitting the cache limit and also easily allows third parties to
> > install
> > precompiled shaders in those dirs.
> 
> I don't get how Mesa-17.0.0 identifies a specific compile of Mesa, so
> that doesn't seem to solve versioning.  Are you going to have the
> Mesa
> build date or something under that?

It will be the Mesa version string which for stable would be something
like Mesa-17.0.0 and for git based packages it would be something like
Mesa 17.1.0 (git-38a67f0).

> 
> I'm pretty skeptical of anybody ever actually installing precompiled
> shaders and their users successfully getting cache hits off of them,
> so
> architecting for that seems strange to me.

Don't make Plagman sad. It's in the pipeline :)
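
The directory layout proposed above could be sketched roughly as follows. This only illustrates the naming scheme from the thread; build_cache_dir() and its parameters are hypothetical and are not part of the disk cache code in this series.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats "<root>/<mesa_version>/<driver>-<gpu>"
 * into buf, e.g. ".../Mesa-17.0.0/i965-BDW".
 * Returns 0 on success, -1 if the result does not fit in buf. */
static int build_cache_dir(char *buf, size_t size, const char *root,
                           const char *mesa_version,
                           const char *driver, const char *gpu)
{
    int n = snprintf(buf, size, "%s/%s/%s-%s",
                     root, mesa_version, driver, gpu);

    /* snprintf returns the length it wanted to write, so a value
     * >= size means the output was truncated. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Because the version string is a path component, entries written by a different Mesa build are never looked up, and a whole version directory can be removed when the cache limit is hit.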



[Mesa-dev] [PATCH] radv: add scratch support for spilling.

2017-01-24 Thread Dave Airlie
From: Dave Airlie 

Currently LLVM 5.0 has support for spilling to a place
pointed to by the user sgprs instead of using relocations.

This is enabled by using the amdgcn-mesa-mesa3d triple.

For compute gfx shaders we spill to a buffer pointed to
by 64-bit address stored in sgprs 0/1.
For other gfx shaders we spill to a buffer pointed to by
the first two dwords of the buffer pointed to in sgprs 0/1.

This patch enables radv to use the llvm support when present.

This fixes Sascha Willems computeshader demo first screen,
and a bunch of CTS tests now pass.

This patch is likely to be in LLVM 4.0 release as well
(fingers crossed) in which case we need to adjust the detection
logic.

Signed-off-by: Dave Airlie
---
 src/amd/common/ac_binary.c   |  30 +
 src/amd/common/ac_binary.h   |   4 +-
 src/amd/common/ac_llvm_util.c|   4 +-
 src/amd/common/ac_llvm_util.h|   2 +-
 src/amd/common/ac_nir_to_llvm.c  |  14 ++--
 src/amd/common/ac_nir_to_llvm.h  |   6 +-
 src/amd/vulkan/radv_cmd_buffer.c | 137 ++-
 src/amd/vulkan/radv_device.c |  22 +++
 src/amd/vulkan/radv_pipeline.c   |  10 +--
 src/amd/vulkan/radv_private.h|  13 
 10 files changed, 215 insertions(+), 27 deletions(-)

diff --git a/src/amd/common/ac_binary.c b/src/amd/common/ac_binary.c
index 01cf000..9c66a82 100644
--- a/src/amd/common/ac_binary.c
+++ b/src/amd/common/ac_binary.c
@@ -212,23 +212,28 @@ static const char *scratch_rsrc_dword1_symbol =
 
 void ac_shader_binary_read_config(struct ac_shader_binary *binary,
  struct ac_shader_config *conf,
- unsigned symbol_offset)
+ unsigned symbol_offset,
+ bool supports_spill)
 {
unsigned i;
const unsigned char *config =
ac_shader_binary_config_start(binary, symbol_offset);
bool really_needs_scratch = false;
-
+   uint32_t wavesize = 0;
/* LLVM adds SGPR spills to the scratch size.
 * Find out if we really need the scratch buffer.
 */
-   for (i = 0; i < binary->reloc_count; i++) {
-   const struct ac_shader_reloc *reloc = &binary->relocs[i];
+   if (supports_spill) {
+   really_needs_scratch = true;
+   } else {
+   for (i = 0; i < binary->reloc_count; i++) {
+   const struct ac_shader_reloc *reloc = 
&binary->relocs[i];
 
-   if (!strcmp(scratch_rsrc_dword0_symbol, reloc->name) ||
-   !strcmp(scratch_rsrc_dword1_symbol, reloc->name)) {
-   really_needs_scratch = true;
-   break;
+   if (!strcmp(scratch_rsrc_dword0_symbol, reloc->name) ||
+   !strcmp(scratch_rsrc_dword1_symbol, reloc->name)) {
+   really_needs_scratch = true;
+   break;
+   }
}
}
 
@@ -259,9 +264,7 @@ void ac_shader_binary_read_config(struct ac_shader_binary 
*binary,
case R_0286E8_SPI_TMPRING_SIZE:
case R_00B860_COMPUTE_TMPRING_SIZE:
/* WAVESIZE is in units of 256 dwords. */
-   if (really_needs_scratch)
-   conf->scratch_bytes_per_wave =
-   G_00B860_WAVESIZE(value) * 256 * 4;
+   wavesize = value;
break;
case SPILLED_SGPRS:
conf->spilled_sgprs = value;
@@ -285,4 +288,9 @@ void ac_shader_binary_read_config(struct ac_shader_binary 
*binary,
if (!conf->spi_ps_input_addr)
conf->spi_ps_input_addr = conf->spi_ps_input_ena;
}
+
+   if (really_needs_scratch) {
+   /* sgprs spills aren't spilling */
+   conf->scratch_bytes_per_wave = G_00B860_WAVESIZE(wavesize) * 
256 * 4;
+   }
 }
diff --git a/src/amd/common/ac_binary.h b/src/amd/common/ac_binary.h
index 282f33d..06fd855 100644
--- a/src/amd/common/ac_binary.h
+++ b/src/amd/common/ac_binary.h
@@ -27,6 +27,7 @@
 #pragma once
 
 #include <stdint.h>
+#include <stdbool.h>
 
 struct ac_shader_reloc {
char name[32];
@@ -85,4 +86,5 @@ void ac_elf_read(const char *elf_data, unsigned elf_size,
 
 void ac_shader_binary_read_config(struct ac_shader_binary *binary,
  struct ac_shader_config *conf,
- unsigned symbol_offset);
+ unsigned symbol_offset,
+ bool supports_spill);
diff --git a/src/amd/common/ac_llvm_util.c b/src/amd/common/ac_llvm_util.c
index 770e3bd..3ba5281 100644
--- a/src/amd/common/ac_llvm_util.c
+++ b/src/amd/common/ac_llvm_util.c
@@ -126,11 +126,11 @@ static const char *ac_get_llvm_processor_name(enum 
radeon_family 
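
The scratch-size computation in ac_shader_binary_read_config() above can be sketched standalone as follows. The WAVESIZE_FIELD macro stands in for G_00B860_WAVESIZE, and its bit layout (bits 12..24) is an assumption that the patch itself does not show; the helper name is likewise hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout, not shown in the patch: WAVESIZE occupies
 * bits 12..24 of the TMPRING_SIZE register value. */
#define WAVESIZE_FIELD(reg) (((reg) >> 12) & 0x1FFF)

/* Hypothetical helper mirroring the deferred computation in the patch:
 * the register value is remembered while parsing, and the byte count
 * is derived only if scratch is really needed. */
static uint32_t scratch_bytes_per_wave(uint32_t tmpring_size,
                                       bool really_needs_scratch)
{
    if (!really_needs_scratch)
        return 0;
    /* WAVESIZE is in units of 256 dwords; one dword is 4 bytes. */
    return WAVESIZE_FIELD(tmpring_size) * 256u * 4u;
}
```

With supports_spill set, the patch treats scratch as always needed, so the result depends only on the WAVESIZE field that LLVM wrote into the config.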

[Mesa-dev] [Bug 99527] Provide option for llvmpipe JIT code to run cleanly under valgrind

2017-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99527

--- Comment #1 from Roland Scheidegger  ---
I agree it would be really nice if we wouldn't get valgrind errors.
If you figure out how to fix it, patches welcome...
I tried to look into it at some point but couldn't really figure it out (didn't
invest all that much time though). I'm not even sure this isn't a valgrind bug
(last I checked there could still be some problems with simd instructions).

Tracking this stuff down in jit code isn't exactly easy, and having these
harmless errors makes it more difficult to debug real issues (I've seen invalid
reads and writes which needed to be fixed, and they got kinda buried in the
valgrind output).



Re: [Mesa-dev] [PATCH 8/8] nir/spirv/glsl450: Implement IEEE-compliant handling of atan2(±∞, ±∞).

2017-01-24 Thread Ian Romanick
This appears to do the same thing as the GLSL change.  This patch is

Reviewed-by: Ian Romanick 


On 01/24/2017 03:26 PM, Francisco Jerez wrote:
> ---
>  src/compiler/spirv/vtn_glsl450.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/src/compiler/spirv/vtn_glsl450.c 
> b/src/compiler/spirv/vtn_glsl450.c
> index 508f218..7af2dad 100644
> --- a/src/compiler/spirv/vtn_glsl450.c
> +++ b/src/compiler/spirv/vtn_glsl450.c
> @@ -325,12 +325,32 @@ build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def 
> *x)
> nir_ssa_def *rcp_scaled_t = nir_frcp(b, nir_fmul(b, t, scale));
> nir_ssa_def *s_over_t = nir_fmul(b, nir_fmul(b, s, scale), rcp_scaled_t);
>  
> +   /* For |x| = |y| assume tan = 1 even if infinite (i.e. pretend momentarily
> +* that ∞/∞ = 1) in order to comply with the rather artificial rules
> +* inherited from IEEE 754-2008, namely:
> +*
> +*  "atan2(±∞, −∞) is ±3π/4
> +*   atan2(±∞, +∞) is ±π/4"
> +*
> +* Note that this is inconsistent with the rules for the neighborhood of
> +* zero that are based on iterated limits:
> +*
> +*  "atan2(±0, −0) is ±π
> +*   atan2(±0, +0) is ±0"
> +*
> +* but GLSL specifically allows implementations to deviate from IEEE rules
> +* at (0,0), so we take that license (i.e. pretend that 0/0 = 1 here as
> +* well).
> +*/
> +   nir_ssa_def *tan = nir_bcsel(b, nir_feq(b, nir_fabs(b, x), nir_fabs(b, 
> y)),
> +one, nir_fabs(b, s_over_t));
> +
> /* Calculate the arctangent and fix up the result if we had flipped the
>  * coordinate system.
>  */
> nir_ssa_def *arc = nir_fadd(b, nir_fmul(b, nir_b2f(b, flip),
> nir_imm_float(b, M_PI_2f)),
> -   build_atan(b, nir_fabs(b, s_over_t)));
> +   build_atan(b, tan));
>  
> /* Rather convoluted calculation of the sign of the result.  When x < 0 we
>  * cannot use fsign because we need to be able to distinguish between
> 



Re: [Mesa-dev] [PATCH 7/8] glsl: Implement IEEE-compliant handling of atan2(±∞, ±∞).

2017-01-24 Thread Ian Romanick
This patch is

Reviewed-by: Ian Romanick 

On 01/24/2017 03:26 PM, Francisco Jerez wrote:
> ---
>  src/compiler/glsl/builtin_functions.cpp | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/src/compiler/glsl/builtin_functions.cpp 
> b/src/compiler/glsl/builtin_functions.cpp
> index fd59381..9d6ab80 100644
> --- a/src/compiler/glsl/builtin_functions.cpp
> +++ b/src/compiler/glsl/builtin_functions.cpp
> @@ -3590,11 +3590,31 @@ builtin_builder::_atan2(const glsl_type *type)
> body.emit(assign(rcp_scaled_t, rcp(mul(t, scale))));
> ir_expression *s_over_t = mul(mul(s, scale), rcp_scaled_t);
>  
> +   /* For |x| = |y| assume tan = 1 even if infinite (i.e. pretend momentarily
> +* that ∞/∞ = 1) in order to comply with the rather artificial rules
> +* inherited from IEEE 754-2008, namely:
> +*
> +*  "atan2(±∞, −∞) is ±3π/4
> +*   atan2(±∞, +∞) is ±π/4"
> +*
> +* Note that this is inconsistent with the rules for the neighborhood of
> +* zero that are based on iterated limits:
> +*
> +*  "atan2(±0, −0) is ±π
> +*   atan2(±0, +0) is ±0"
> +*
> +* but GLSL specifically allows implementations to deviate from IEEE rules
> +* at (0,0), so we take that license (i.e. pretend that 0/0 = 1 here as
> +* well).
> +*/
> +   ir_expression *tan = csel(equal(abs(x), abs(y)),
> + imm(1.0f, n), abs(s_over_t));
> +
> /* Calculate the arctangent and fix up the result if we had flipped the
>  * coordinate system.
>  */
> ir_variable *arc = body.make_temp(type, "arc");
> -   do_atan(body, type, arc, abs(s_over_t));
> +   do_atan(body, type, arc, tan);
> body.emit(assign(arc, add(arc, mul(b2f(flip), imm(M_PI_2f)))));
>  
> /* Rather convoluted calculation of the sign of the result.  When x < 0 we
> 



Re: [Mesa-dev] [PATCH 5/8] glsl: Rewrite atan2 implementation to fix accuracy and handling of zero/infinity.

2017-01-24 Thread Ian Romanick
It's a real bummer that we have two implementations of this function
that are basically written in assembly... I'm not sure what else you'd
call generating IR by hand.  The code review and maintenance costs are
of the same magnitude for sure.

We could move this to GLSL and let the standalone compiler generate the
builder code.  I don't think that is currently helpful.  However, for
future "soft" int64 and fp64 work the standalone compiler will need to
be extended to also generate NIR builder.  Once that is done, I think
the cost-benefit analysis changes.

On 01/24/2017 03:26 PM, Francisco Jerez wrote:
> This addresses several issues of the current atan2 implementation:
> 
>  - Negative zero (and negative denorms which end up getting flushed to
>zero) isn't handled correctly by the current implementation.  The
>reason is that it does 'y >= 0' and 'x < 0' comparisons to decide
>on which side of the branch cut the argument is, which causes us to
>return incorrect results (off by up to 2π) for very small negative
>values.
> 
>  - There is a serious precision problem for x values of large enough
>magnitude introduced by the floating point division operation being
>implemented as a mul+rcp sequence.  This can lead to the quotient
>getting flushed to zero in some cases introducing an error of over
>8e6 ULP in the result -- Or in the most catastrophic case will
>cause us to return NaN instead of the correct value ±π/2 for y=±∞
>and x very large.  We can fix this easily by scaling down both
>arguments when the absolute value of the denominator goes above
>certain threshold.  The error of this atan2 implementation remains
>below 25 ULP in most of its domain except for a neighborhood of y=0
>where it reaches a maximum error of about 180 ULP.
> 
>  - It emits a bunch of instructions including no less than three
>if-else branches per scalar component that don't seem to get
>optimized out later on.  This implementation uses about 13% fewer
>instructions on Intel SKL hardware and doesn't emit any control
>flow instructions.
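
The precision problem in the second point can be reproduced on a CPU with a small model. This is only a sketch: flush_denorm() is a made-up helper that imitates hardware flushing denormals to zero, and HUGE_DENOM and DOWNSCALE are hypothetical values chosen for illustration, since the threshold the patch actually uses is not visible in this excerpt.

```c
#include <float.h>

/* Model of hardware that flushes denormal values to zero.  This helper is
 * an assumption made for illustration; plain C on a CPU keeps denormals.
 */
static float flush_denorm(float v)
{
   const float mag = v < 0.0f ? -v : v;
   return (mag < FLT_MIN) ? 0.0f : v;
}

/* Division implemented as a mul+rcp sequence, as the commit message
 * describes.  For a huge x the reciprocal is denormal and gets flushed.
 */
static float div_mul_rcp(float y, float x)
{
   const float rcp = flush_denorm(1.0f / x);
   return flush_denorm(y * rcp);
}

/* Hypothetical threshold and scale factor (powers of two, so the scaling
 * itself is exact).  The patch's real values are not shown here.
 */
#define HUGE_DENOM 0x1p+96f
#define DOWNSCALE  0x1p-32f

/* Scale both arguments down when the denominator is huge, then divide. */
static float div_scaled(float y, float x)
{
   const float ax = x < 0.0f ? -x : x;
   const float k = ax >= HUGE_DENOM ? DOWNSCALE : 1.0f;
   return div_mul_rcp(y * k, x * k);
}
```

With y = 1.0e38f and x = 2.0e38f the unscaled version loses the quotient entirely, while the scaled version stays close to 0.5. Scaling both arguments by the same factor leaves the quotient unchanged, which is why it is safe to do before the arctangent.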
> ---
>  src/compiler/glsl/builtin_functions.cpp | 82 
> ++---
>  1 file changed, 46 insertions(+), 36 deletions(-)
> 
> diff --git a/src/compiler/glsl/builtin_functions.cpp 
> b/src/compiler/glsl/builtin_functions.cpp
> index 4a6c5af..fd59381 100644
> --- a/src/compiler/glsl/builtin_functions.cpp
> +++ b/src/compiler/glsl/builtin_functions.cpp
> @@ -3560,44 +3560,54 @@ builtin_builder::_acos(const glsl_type *type)
>  ir_function_signature *
>  builtin_builder::_atan2(const glsl_type *type)
>  {
> -   ir_variable *vec_y = in_var(type, "vec_y");
> -   ir_variable *vec_x = in_var(type, "vec_x");
> -   MAKE_SIG(type, always_available, 2, vec_y, vec_x);
> -
> -   ir_variable *vec_result = body.make_temp(type, "vec_result");
> -   ir_variable *r = body.make_temp(glsl_type::float_type, "r");
> -   for (int i = 0; i < type->vector_elements; i++) {
> -  ir_variable *y = body.make_temp(glsl_type::float_type, "y");
> -  ir_variable *x = body.make_temp(glsl_type::float_type, "x");
> -  body.emit(assign(y, swizzle(vec_y, i, 1)));
> -  body.emit(assign(x, swizzle(vec_x, i, 1)));
> -
> -  /* If |x| >= 1.0e-8 * |y|: */
> -  ir_if *outer_if =
> - new(mem_ctx) ir_if(greater(abs(x), mul(imm(1.0e-8f), abs(y))));
> -
> -  ir_factory outer_then(&outer_if->then_instructions, mem_ctx);
> -
> -  /* Then...call atan(y/x) */
> -  do_atan(outer_then, glsl_type::float_type, r, div(y, x));
> -
> -  /* ...and fix it up: */
> -  ir_if *inner_if = new(mem_ctx) ir_if(less(x, imm(0.0f)));
> -  inner_if->then_instructions.push_tail(
> - if_tree(gequal(y, imm(0.0f)),
> - assign(r, add(r, imm(M_PIf))),
> - assign(r, sub(r, imm(M_PIf)))));
> -  outer_then.emit(inner_if);
> -
> -  /* Else... */
> -  outer_if->else_instructions.push_tail(
> - assign(r, mul(sign(y), imm(M_PI_2f))));
> +   const unsigned n = type->vector_elements;
> +   ir_variable *y = in_var(type, "y");
> +   ir_variable *x = in_var(type, "x");
> +   MAKE_SIG(type, always_available, 2, y, x);
>  
> -  body.emit(outer_if);
> +   /* If we're on the left half-plane rotate the coordinates π/2 clock-wise
> +* for the y=0 discontinuity to end up aligned with the vertical
> +* discontinuity of atan(s/t) along t=0.
> +*/
> +   ir_variable *flip = body.make_temp(glsl_type::bvec(n), "flip");
> +   body.emit(assign(flip, less(x, imm(0.0f, n))));
> +   ir_variable *s = body.make_temp(type, "s");
> +   body.emit(assign(s, csel(flip, abs(x), y)));
> +   ir_variable *t = body.make_temp(type, "t");
> +   body.emit(assign(t, csel(flip, y, abs(x))));
>  
> -  body.emit(assign(vec_result, r, 1 << i));
> -   }
> -   body.emit(ret(vec_result));
> +   /* If the magnitude of the denominator exceeds some huge value, scale down
> +* the arguments in order to 

Re: [Mesa-dev] [PATCH 06/37] util: add a disk_cache_remove() function

2017-01-24 Thread Eric Anholt
Timothy Arceri  writes:

> On Tue, 2017-01-24 at 15:54 -0800, Eric Anholt wrote:
>> Timothy Arceri  writes:
>> 
>> > From: Timothy Arceri 
>> > 
>> > This will be used to remove cache items created with old versions
>> > of Mesa or other invalid cache items from the cache.
>> 
>> I'm not convinced that removing the item from cache when we get a hit
>> on
>> everything in the key except for Mesa version is the right way to
>> go.  I
>> think we should just be hashing the Mesa version in the key so that
>> we
>> don't hit on mismatched versions.  Then we wouldn't thrash our cache
>> when we're, say, checking out around different versions of Mesa and
>> re-pigliting things.
>
> I agree. I mention this problem in the cover letter, it's going to take
> some reworking so I was hoping to fix it in a follow-up.
>
> The plan is to create directory structures like so:
>
> Mesa-17.0.0/i965-BDW/
> Mesa-17.1.0/i965-BDW/
>
> This will allow us to just delete an entire directory if we are
> hitting the cache limit and also easily allows third parties to install
> precompiled shaders in those dirs.

I don't get how Mesa-17.0.0 identifies a specific compile of Mesa, so
that doesn't seem to solve versioning.  Are you going to have the Mesa
build date or something under that?

I'm pretty skeptical of anybody ever actually installing precompiled
shaders and their users successfully getting cache hits off of them, so
architecting for that seems strange to me.
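
As a rough sketch of what hashing the build into the key could look like: the function names and the build_id string below are hypothetical, and FNV-1a is only a stand-in for whatever hash the cache really uses. The point is just that two different compiles of Mesa can never hit each other's entries.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, used here only as a stand-in for the cache's real hash. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
   const unsigned char *p = data;
   for (size_t i = 0; i < len; i++) {
      h ^= p[i];
      h *= 0x100000001b3ULL;   /* 64-bit FNV prime */
   }
   return h;
}

/* Hypothetical key builder.  build_id is assumed to identify one specific
 * compile of Mesa (a build timestamp, for instance), not just a release.
 */
static uint64_t shader_cache_key(const char *build_id, const char *source)
{
   uint64_t h = 0xcbf29ce484222325ULL;   /* 64-bit FNV offset basis */

   /* Hash the terminating NUL too, so the two fields stay delimited. */
   h = fnv1a(h, build_id, strlen(build_id) + 1);
   h = fnv1a(h, source, strlen(source) + 1);
   return h;
}
```

With a key like this, switching between Mesa builds leaves the old entries in place, so they are still there when you switch back.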




[Mesa-dev] [Bug 99527] Provide option for llvmpipe JIT code to run cleanly under valgrind

2017-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99527

Bug ID: 99527
   Summary: Provide option for llvmpipe JIT code to run cleanly
under valgrind
   Product: Mesa
   Version: 13.0
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: Other
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: john.fireba...@gmail.com
QA Contact: mesa-dev@lists.freedesktop.org

Currently llvmpipe JIT code is known to trigger errors when run under valgrind.
For example, bug #29922 reports the following, which I also observe:

==17795== Conditional jump or move depends on uninitialised value(s)
==17795==    at 0x573F792: ???
==17795==    by 0x4171342: lp_rast_shade_quads_mask (lp_rast.c:473)
==17795==    by 0x4173EE9: do_block_4_3 (lp_rast_tri_tmp.h:61)
==17795==    by 0x4178087: lp_rast_triangle_3_16 (lp_rast_tri.c:229)
==17795==    by 0x4171913: rasterize_bin (lp_rast.c:667)
==17795==    by 0x4171ACE: rasterize_scene (lp_rast.c:766)
==17795==    by 0x4171BA4: lp_rast_queue_scene (lp_rast.c:791)
==17795==    by 0x4178EB4: lp_scene_rasterize (lp_scene.c:405)
==17795==    by 0x4179DF4: lp_setup_rasterize_scene (lp_setup.c:158)
==17795==    by 0x417A296: set_scene_state (lp_setup.c:260)
==17795==    by 0x417A39C: lp_setup_flush (lp_setup.c:295)
==17795==    by 0x416E756: llvmpipe_flush (lp_flush.c:56)

That bug is closed as RESOLVED WONTFIX but I would like to ask that this be
reconsidered. Conscientious downstream developers want to make sure their code
runs cleanly under valgrind. If libraries they use trigger lots of errors, it
makes this task more difficult. For instance, I first had to determine whether
or not this error represented a misuse of OpenGL by my own code. In this case,
it's possible to search for "valgrind lp_rast_shade_quads_mask" and find the
above bug report, so I was able to reasonably conclude that this was not a bug
I was responsible for. In many of the other errors in JIT code that valgrind
reports, that's not the case, and I'm still not 100% sure of the status --
whether it's a bug in my code, a bug in llvm, a supposedly harmless use of an
uninitialized value, or a true false positive.

I'm not the only one dissatisfied with the status quo. For a more strongly
worded opinion, see
http://www.americanteeth.org/2013/08/14/valgrind-is-not-optional/.

If you believe that fixing these errors would harm performance of production
builds, please consider using the `--enable-valgrind` configure flag as an
explicit opt-in mechanism.

For reference, here are some of the other errors I have received:

==9337== Conditional jump or move depends on uninitialised value(s)
==9337==    at 0x402E63D: ???
==9337==    by 0xD32C84D: lp_rast_shade_quads_all (lp_rast_priv.h:271)
==9337==    by 0xD32C368: block_full_4 (lp_rast_tri.c:46)
==9337==    by 0xD329222: do_block_16_32_3 (lp_rast_tri_tmp.h:167)
==9337==    by 0xD328E52: lp_rast_triangle_32_3 (lp_rast_tri_tmp.h:305)
==9337==    by 0xD32073C: do_rasterize_bin (lp_rast.c:609)
==9337==    by 0xD3203EB: rasterize_bin (lp_rast.c:628)
==9337==    by 0xD31FBD1: rasterize_scene (lp_rast.c:688)
==9337==    by 0xD321823: thread_function (lp_rast.c:828)
==9337==    by 0xD321A61: impl_thrd_routine (threads_posix.h:87)
==9337==    by 0x4E42183: start_thread (pthread_create.c:312)
==9337==    by 0x6A6E37C: clone (clone.S:111)
==9337==  Uninitialised value was created by a heap allocation
==9337==    at 0x4C2B221: operator new(unsigned long) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/valgrind/3.12.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9337==    by 0xDB14217: llvm::User::operator new(unsigned long, unsigned int) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xDA60CDA: llvm::ConstantFP::get(llvm::LLVMContext&, llvm::APFloat const&) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xDA629BD: llvm::ConstantFP::get(llvm::Type*, double) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xD29993E: lp_build_const_elem (lp_bld_const.c:309)
==9337==    by 0xD2999F0: lp_build_const_vec (lp_bld_const.c:333)
==9337==    by 0xD29B902: lp_build_conv (lp_bld_conv.c:654)
==9337==    by 0xD29B08E: lp_build_conv_auto (lp_bld_conv.c:491)
==9337==    by 0xD344C3C: generate_unswizzled_blend (lp_state_fs.c:1884)
==9337==    by 0xD342505: generate_fragment (lp_state_fs.c:2452)
==9337==    by 0xD340947: generate_variant (lp_state_fs.c:2637)
==9337==    by 0xD33FC79: llvmpipe_update_fs (lp_state_fs.c:3204)
==9337==

==9337== Thread 3 llvmpipe-1:
==9337== Use of uninitialised value of size 8
==9337==    at 0x4035AEE: ???
==9337==    by 0x40354D4: ???

[Mesa-dev] [PATCH 1/4] mesa: Trivial clean-ups in uniform_query.cpp

2017-01-24 Thread Ian Romanick
From: Ian Romanick 

This is C++, so we can mix code and declarations.  Doing so allows
constification.

Signed-off-by: Ian Romanick 
---
 src/mesa/main/uniform_query.cpp | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/mesa/main/uniform_query.cpp b/src/mesa/main/uniform_query.cpp
index d5a2d0f..c2429c1 100644
--- a/src/mesa/main/uniform_query.cpp
+++ b/src/mesa/main/uniform_query.cpp
@@ -992,10 +992,6 @@ _mesa_uniform_matrix(struct gl_context *ctx, struct 
gl_shader_program *shProg,
  const GLvoid *values, enum glsl_base_type basicType)
 {
unsigned offset;
-   unsigned vectors;
-   unsigned components;
-   unsigned elements;
-   int size_mul;
struct gl_uniform_storage *const uni =
   validate_uniform_parameters(ctx, shProg, location, count,
   &offset, "glUniformMatrix");
@@ -1009,11 +1005,11 @@ _mesa_uniform_matrix(struct gl_context *ctx, struct 
gl_shader_program *shProg,
}
 
assert(basicType == GLSL_TYPE_FLOAT || basicType == GLSL_TYPE_DOUBLE);
-   size_mul = basicType == GLSL_TYPE_DOUBLE ? 2 : 1;
+   const unsigned size_mul = basicType == GLSL_TYPE_DOUBLE ? 2 : 1;
 
assert(!uni->type->is_sampler());
-   vectors = uni->type->matrix_columns;
-   components = uni->type->vector_elements;
+   const unsigned vectors = uni->type->matrix_columns;
+   const unsigned components = uni->type->vector_elements;
 
/* Verify that the types are compatible.  This is greatly simplified for
 * matrices because they can only have a float base type.
@@ -1084,7 +1080,7 @@ _mesa_uniform_matrix(struct gl_context *ctx, struct 
gl_shader_program *shProg,
 
/* Store the data in the "actual type" backing storage for the uniform.
 */
-   elements = components * vectors;
+   const unsigned elements = components * vectors;
 
if (!transpose) {
   memcpy(&uni->storage[size_mul * elements * offset], values,
-- 
2.7.4



[Mesa-dev] [PATCH 3/4] mesa: Arrange _mesa_uniform parameters to match the call sites

2017-01-24 Thread Ian Romanick
From: Ian Romanick 

By putting the parameters first that match the parameters to the call
site, 4 (of 14) instructions are saved at _mesa_Uniform4fv on x64.  On
IA32, the details of the instructions change, but it is the same count
and mix of instructions.

Before:

0000000000000830 <_mesa_Uniform4fv>:
 830:   48 83 ec 10             sub    $0x10,%rsp
 834:   49 89 d0                mov    %rdx,%r8
 837:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 83e <_mesa_Uniform4fv+0xe>
 83e:   89 f8                   mov    %edi,%eax
 840:   89 f1                   mov    %esi,%ecx
 842:   41 b9 02 00 00 00       mov    $0x2,%r9d
 848:   64 48 8b 3a             mov    %fs:(%rdx),%rdi
 84c:   48 8b 97 c8 01 02 00    mov    0x201c8(%rdi),%rdx
 853:   48 8b 72 70             mov    0x70(%rdx),%rsi
 857:   6a 04                   pushq  $0x4
 859:   89 c2                   mov    %eax,%edx
 85b:   e8 00 00 00 00          callq  860 <_mesa_Uniform4fv+0x30>
 860:   48 83 c4 18             add    $0x18,%rsp
 864:   c3                      retq

After:

00000000000007f0 <_mesa_Uniform4fv>:
 7f0:   48 83 ec 10             sub    $0x10,%rsp
 7f4:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 7fb <_mesa_Uniform4fv+0xb>
 7fb:   41 b9 02 00 00 00       mov    $0x2,%r9d
 801:   64 48 8b 08             mov    %fs:(%rax),%rcx
 805:   48 8b 81 c8 01 02 00    mov    0x201c8(%rcx),%rax
 80c:   6a 04                   pushq  $0x4
 80e:   4c 8b 40 70             mov    0x70(%rax),%r8
 812:   e8 00 00 00 00          callq  817 <_mesa_Uniform4fv+0x27>
 817:   48 83 c4 18             add    $0x18,%rsp
 81b:   c3                      retq

Saves a measly 416 bytes of text on x64.  Depending on exactly when this
is applied, a lot of variation is possible due to function alignment.

   text    data     bss     dec     hex filename
6670131  228340   22552 6921023  699b3f lib/i965_dri.so before
6670131  228340   22552 6921023  699b3f lib/i965_dri.so after
6343348  293872   29880 6667100  65bb5c lib64/i965_dri.so before
6342932  293872   29880 6666684  65b9bc lib64/i965_dri.so after

There is likely to be no performance change with just this patch.
_mesa_uniform immediately calls validate_uniform_parameters with
parameters in the "wrong" (different from the call site) order.
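
For readers who want the idea without the disassembly, here is a minimal sketch. All names (demo_context, demo_uniform, demo_Uniform4fv) are hypothetical stand-ins, not Mesa code. Under the x86-64 System V calling convention the first integer and pointer arguments travel in rdi, rsi, rdx, rcx, r8 and r9, in that order, so a helper whose leading parameters match the entry point's parameters lets the entry point leave those registers untouched.

```c
/* Hypothetical stand-in for struct gl_context. */
struct demo_context {
   int last_location;
   int last_count;
   const float *last_values;
   unsigned last_components;
};

static struct demo_context current_ctx;

/* Internal helper: the API-supplied arguments come first, in API order,
 * and the values looked up by the wrapper come afterwards.
 */
static void
demo_uniform(int location, int count, const float *values,
             struct demo_context *ctx, unsigned components)
{
   ctx->last_location = location;
   ctx->last_count = count;
   ctx->last_values = values;
   ctx->last_components = components;
}

/* API entry point: location, count and values already sit in the first
 * three argument registers, so only ctx and components need to be set up.
 */
void
demo_Uniform4fv(int location, int count, const float *values)
{
   demo_uniform(location, count, values, &current_ctx, 4);
}
```

Had demo_uniform taken ctx first, the wrapper would have to shuffle all three incoming arguments into later registers before the call, which is the extra mov traffic visible in the "Before" listing.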

v2: Rebase on GL_ARB_gpu_shader_fp64.

v3: Rebase on GL_ARB_gpu_shader_int64.

Signed-off-by: Ian Romanick 
---
 src/mesa/main/uniform_query.cpp |   8 +-
 src/mesa/main/uniforms.c| 192 
 src/mesa/main/uniforms.h|   8 +-
 3 files changed, 102 insertions(+), 106 deletions(-)

diff --git a/src/mesa/main/uniform_query.cpp b/src/mesa/main/uniform_query.cpp
index 0275e4f..ef51571 100644
--- a/src/mesa/main/uniform_query.cpp
+++ b/src/mesa/main/uniform_query.cpp
@@ -771,11 +771,9 @@ glsl_type_name(enum glsl_base_type type)
  * Called via glUniform*() functions.
  */
 extern "C" void
-_mesa_uniform(struct gl_context *ctx, struct gl_shader_program *shProg,
- GLint location, GLsizei count,
-  const GLvoid *values,
-  enum glsl_base_type basicType,
-  unsigned src_components)
+_mesa_uniform(GLint location, GLsizei count, const GLvoid *values,
+  struct gl_context *ctx, struct gl_shader_program *shProg,
+  enum glsl_base_type basicType, unsigned src_components)
 {
unsigned offset;
int size_mul = glsl_base_type_is_64bit(basicType) ? 2 : 1;
diff --git a/src/mesa/main/uniforms.c b/src/mesa/main/uniforms.c
index c1d951a..a954055 100644
--- a/src/mesa/main/uniforms.c
+++ b/src/mesa/main/uniforms.c
@@ -150,7 +150,7 @@ void GLAPIENTRY
 _mesa_Uniform1f(GLint location, GLfloat v0)
 {
GET_CURRENT_CONTEXT(ctx);
-   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, 1, &v0, 
GLSL_TYPE_FLOAT, 1);
+   _mesa_uniform(location, 1, &v0, ctx, ctx->_Shader->ActiveProgram, 
GLSL_TYPE_FLOAT, 1);
 }
 
 void GLAPIENTRY
@@ -160,7 +160,7 @@ _mesa_Uniform2f(GLint location, GLfloat v0, GLfloat v1)
GLfloat v[2];
v[0] = v0;
v[1] = v1;
-   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, 1, v, 
GLSL_TYPE_FLOAT, 2);
+   _mesa_uniform(location, 1, v, ctx, ctx->_Shader->ActiveProgram, 
GLSL_TYPE_FLOAT, 2);
 }
 
 void GLAPIENTRY
@@ -171,7 +171,7 @@ _mesa_Uniform3f(GLint location, GLfloat v0, GLfloat v1, 
GLfloat v2)
v[0] = v0;
v[1] = v1;
v[2] = v2;
-   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, 1, v, 
GLSL_TYPE_FLOAT, 3);
+   _mesa_uniform(location, 1, v, ctx, ctx->_Shader->ActiveProgram, 
GLSL_TYPE_FLOAT, 3);
 }
 
 void GLAPIENTRY
@@ -184,14 +184,14 @@ _mesa_Uniform4f(GLint location, GLfloat v0, GLfloat v1, 
GLfloat v2,
v[1] = v1;
v[2] = v2;
v[3] = v3;
-   

[Mesa-dev] [PATCH 0/4] Micro optimizations for glUniform and glUniformMatrix

2017-01-24 Thread Ian Romanick
These are some patches that I wrote ages ago... the initial versions
pre-date Mesa's ARB_gpu_shader_fp64 support.  This was part of a larger
effort that got bogged down and eventually abandoned.  The problem with
the larger series was trying to measure the performance impact.  Random
changes in function alignment had more impact on CPU-bound tests than
anything else I did.

I believe that these changes are good without collecting performance
data.



[Mesa-dev] [PATCH 4/4] mesa: Arrange validate_uniform_parameters parameters to match call sites

2017-01-24 Thread Ian Romanick
From: Ian Romanick 

Saves a measly 20 bytes on IA32 and nothing on x64.  Depending on
exactly when this is applied, a lot of variation is possible due to
function alignment.

   text    data     bss     dec     hex filename
6670131  228340   22552 6921023  699b3f lib/i965_dri.so before
6670111  228340   22552 6921003  699b2b lib/i965_dri.so after
6342932  293872   29880 6666684  65b9bc lib64/i965_dri.so before
6342932  293872   29880 6666684  65b9bc lib64/i965_dri.so after

Signed-off-by: Ian Romanick 
---
 src/mesa/main/uniform_query.cpp | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/mesa/main/uniform_query.cpp b/src/mesa/main/uniform_query.cpp
index ef51571..418cfc9 100644
--- a/src/mesa/main/uniform_query.cpp
+++ b/src/mesa/main/uniform_query.cpp
@@ -156,11 +156,11 @@ _mesa_GetActiveUniformsiv(GLuint program,
 }
 
 static struct gl_uniform_storage *
-validate_uniform_parameters(struct gl_context *ctx,
-   struct gl_shader_program *shProg,
-   GLint location, GLsizei count,
-   unsigned *array_index,
-   const char *caller)
+validate_uniform_parameters(GLint location, GLsizei count,
+unsigned *array_index,
+struct gl_context *ctx,
+struct gl_shader_program *shProg,
+const char *caller)
 {
if (shProg == NULL) {
   _mesa_error(ctx, GL_INVALID_OPERATION, "%s(program not linked)", caller);
@@ -284,8 +284,8 @@ _mesa_get_uniform(struct gl_context *ctx, GLuint program, 
GLint location,
unsigned offset;
 
struct gl_uniform_storage *const uni =
-  validate_uniform_parameters(ctx, shProg, location, 1,
-  &offset, "glGetUniform");
+  validate_uniform_parameters(location, 1, &offset,
+  ctx, shProg, "glGetUniform");
if (uni == NULL) {
   /* For glGetUniform, page 264 (page 278 of the PDF) of the OpenGL 2.1
* spec says:
@@ -779,8 +779,8 @@ _mesa_uniform(GLint location, GLsizei count, const GLvoid 
*values,
int size_mul = glsl_base_type_is_64bit(basicType) ? 2 : 1;
 
struct gl_uniform_storage *const uni =
-  validate_uniform_parameters(ctx, shProg, location, count,
-  &offset, "glUniform");
+  validate_uniform_parameters(location, count, &offset,
+  ctx, shProg, "glUniform");
if (uni == NULL)
   return;
 
@@ -990,8 +990,8 @@ _mesa_uniform_matrix(GLint location, GLsizei count,
 {
unsigned offset;
struct gl_uniform_storage *const uni =
-  validate_uniform_parameters(ctx, shProg, location, count,
-  &offset, "glUniformMatrix");
+  validate_uniform_parameters(location, count, &offset,
+  ctx, shProg, "glUniformMatrix");
if (uni == NULL)
   return;
 
-- 
2.7.4



[Mesa-dev] [PATCH 2/4] mesa: Arrange _mesa_uniform_matrix parameters to match the call sites

2017-01-24 Thread Ian Romanick
From: Ian Romanick 

By putting the parameters first that match the parameters to the call
site, 4 (of 16) instructions are saved at _mesa_UniformMatrix4fv on
x64.  On IA32, the details of the instructions change, but it is the
same count and mix of instructions.

Before:

0000000000001380 <_mesa_UniformMatrix4fv>:
1380:   48 83 ec 10             sub    $0x10,%rsp
1384:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 138b <_mesa_UniformMatrix4fv+0xb>
138b:   41 89 f8                mov    %edi,%r8d
138e:   41 89 f1                mov    %esi,%r9d
1391:   0f b6 d2                movzbl %dl,%edx
1394:   64 48 8b 38             mov    %fs:(%rax),%rdi
1398:   48 8b b7 c8 01 02 00    mov    0x201c8(%rdi),%rsi
139f:   48 8b 76 70             mov    0x70(%rsi),%rsi
13a3:   68 06 14 00 00          pushq  $0x1406
13a8:   51                      push   %rcx
13a9:   52                      push   %rdx
13aa:   b9 04 00 00 00          mov    $0x4,%ecx
13af:   ba 04 00 00 00          mov    $0x4,%edx
13b4:   e8 00 00 00 00          callq  13b9 <_mesa_UniformMatrix4fv+0x39>
13b9:   48 83 c4 28             add    $0x28,%rsp
13bd:   c3                      retq

After:

0000000000001360 <_mesa_UniformMatrix4fv>:
1360:   48 83 ec 10             sub    $0x10,%rsp
1364:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 136b <_mesa_UniformMatrix4fv+0xb>
136b:   0f b6 d2                movzbl %dl,%edx
136e:   64 4c 8b 00             mov    %fs:(%rax),%r8
1372:   49 8b 80 c8 01 02 00    mov    0x201c8(%r8),%rax
1379:   68 06 14 00 00          pushq  $0x1406
137e:   6a 04                   pushq  $0x4
1380:   6a 04                   pushq  $0x4
1382:   4c 8b 48 70             mov    0x70(%rax),%r9
1386:   e8 00 00 00 00          callq  138b <_mesa_UniformMatrix4fv+0x2b>
138b:   48 83 c4 28             add    $0x28,%rsp
138f:   c3                      retq

Saves a measly 576 bytes of text on x64.

   text    data     bss     dec     hex filename
6670131  228340   22552 6921023  699b3f lib/i965_dri.so before
6670131  228340   22552 6921023  699b3f lib/i965_dri.so after
6343924  293872   29880 6667676  65bd9c lib64/i965_dri.so before
6343348  293872   29880 6667100  65bb5c lib64/i965_dri.so after

v2: Rebase on GL_ARB_gpu_shader_fp64.

Signed-off-by: Ian Romanick 
---
 src/mesa/main/uniform_query.cpp |   9 ++--
 src/mesa/main/uniforms.c| 117 +---
 src/mesa/main/uniforms.h|   9 ++--
 3 files changed, 71 insertions(+), 64 deletions(-)

diff --git a/src/mesa/main/uniform_query.cpp b/src/mesa/main/uniform_query.cpp
index c2429c1..0275e4f 100644
--- a/src/mesa/main/uniform_query.cpp
+++ b/src/mesa/main/uniform_query.cpp
@@ -985,11 +985,10 @@ _mesa_uniform(struct gl_context *ctx, struct 
gl_shader_program *shProg,
  * Note: cols=2, rows=4  ==>  array[2] of vec4
  */
 extern "C" void
-_mesa_uniform_matrix(struct gl_context *ctx, struct gl_shader_program *shProg,
-GLuint cols, GLuint rows,
- GLint location, GLsizei count,
- GLboolean transpose,
- const GLvoid *values, enum glsl_base_type basicType)
+_mesa_uniform_matrix(GLint location, GLsizei count,
+ GLboolean transpose, const void *values,
+ struct gl_context *ctx, struct gl_shader_program *shProg,
+ GLuint cols, GLuint rows, enum glsl_base_type basicType)
 {
unsigned offset;
struct gl_uniform_storage *const uni =
diff --git a/src/mesa/main/uniforms.c b/src/mesa/main/uniforms.c
index 3b645cb..c1d951a 100644
--- a/src/mesa/main/uniforms.c
+++ b/src/mesa/main/uniforms.c
@@ -551,8 +551,8 @@ _mesa_UniformMatrix2fv(GLint location, GLsizei count, 
GLboolean transpose,
   const GLfloat * value)
 {
GET_CURRENT_CONTEXT(ctx);
-   _mesa_uniform_matrix(ctx, ctx->_Shader->ActiveProgram,
-   2, 2, location, count, transpose, value, 
GLSL_TYPE_FLOAT);
+   _mesa_uniform_matrix(location, count, transpose, value,
+ctx, ctx->_Shader->ActiveProgram, 2, 2, 
GLSL_TYPE_FLOAT);
 }
 
 void GLAPIENTRY
@@ -560,8 +560,8 @@ _mesa_UniformMatrix3fv(GLint location, GLsizei count, 
GLboolean transpose,
   const GLfloat * value)
 {
GET_CURRENT_CONTEXT(ctx);
-   _mesa_uniform_matrix(ctx, ctx->_Shader->ActiveProgram,
-   3, 3, location, count, transpose, value, 
GLSL_TYPE_FLOAT);
+   _mesa_uniform_matrix(location, count, transpose, value,
+ctx, ctx->_Shader->ActiveProgram, 3, 3, 
GLSL_TYPE_FLOAT);
 }
 
 void GLAPIENTRY
@@ -569,8 +569,8 @@ _mesa_UniformMatrix4fv(GLint location, GLsizei count, 

Re: [Mesa-dev] [PATCH 08/37] glsl: add initial implementation of shader cache

2017-01-24 Thread Timothy Arceri
On Tue, 2017-01-24 at 16:33 -0800, Eric Anholt wrote:
> Timothy Arceri  writes:
> 
> > From: Timothy Arceri 
> > 
> > This uses disk_cache.c to write out a serialization of various
> > state that's required in order to successfully load and use a
> > binary written out by a driver's backend. This state is referred
> > to as "metadata" throughout the implementation.
> > 
> > This initial version is intended to work with vertex and fragment
> > shader stages only.
> 
> This is really interesting.  I was definitely expecting that the
> cache
> at this level would be a map from ([sha1s of shader source], mesa
> version, compiler options, other linker inputs) -> ([compiled GLSL IR
> shaders], linker metadata output).  The advantage you seem to be
> going
> for is to not have GLSL IR ever present in memory, which would be
> pretty
> cool.

That's the plan. It does mean we need some special handling for when we
must fall back to a recompile (i965 shader variants, corrupt cache
items, etc.) but it's not so bad. It's certainly simpler than caching the
IR. In the i965 patchset I add an environment variable that forces this
fallback path, for debugging.

>   I'm really curious to see how this would work out for a gallium
> driver.

Yeah I really haven't looked at this very hard yet. I'll start looking
at it next week, but my assumption was we might need 3 levels of cache
for a gallium driver. glsl, gallium and backend caches.
 
> 
> Could you extend the file's doxygen comment to cover some of these
> design decisions?

Sure.

> 
> Also, I think in this series you've missed having the
> gl_shader_compiler_options options in the shader key, which I believe
> might affect the compiled metadata output.  Other than that, will
> gallium vs i965 have different GLSL IR passes being run at the
> CompileShader or LinkShader stages before we write to disk?  Will we
> need the driver's name to be in the key, maybe?

See my reply to patch 6 I think that should cover all of these issues. 

I'd really like it if that didn't hold this up from landing however as
I'd really like to start working on improvements rather than constantly
wasting time rebasing things :)


[Mesa-dev] [PATCH] vulkan/wsi: Lower the maximum image sizes

2017-01-24 Thread Jason Ekstrand
---
 src/vulkan/wsi/wsi_common_wayland.c | 3 ++-
 src/vulkan/wsi/wsi_common_x11.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common_wayland.c 
b/src/vulkan/wsi/wsi_common_wayland.c
index c9c476e..bdb80a7 100644
--- a/src/vulkan/wsi/wsi_common_wayland.c
+++ b/src/vulkan/wsi/wsi_common_wayland.c
@@ -379,7 +379,8 @@ wsi_wl_surface_get_capabilities(VkIcdSurfaceBase *surface,
 
caps->currentExtent = (VkExtent2D) { -1, -1 };
caps->minImageExtent = (VkExtent2D) { 1, 1 };
-   caps->maxImageExtent = (VkExtent2D) { INT16_MAX, INT16_MAX };
+   /* This is the maximum supported size on Intel */
+   caps->maxImageExtent = (VkExtent2D) { 1 << 14, 1 << 14 };
caps->supportedTransforms = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
caps->currentTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
caps->maxImageArrayLayers = 1;
diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c
index 5e3c910..851932d 100644
--- a/src/vulkan/wsi/wsi_common_x11.c
+++ b/src/vulkan/wsi/wsi_common_x11.c
@@ -370,7 +370,8 @@ x11_surface_get_capabilities(VkIcdSurfaceBase *icd_surface,
*/
   caps->currentExtent = (VkExtent2D) { -1, -1 };
   caps->minImageExtent = (VkExtent2D) { 1, 1 };
-  caps->maxImageExtent = (VkExtent2D) { INT16_MAX, INT16_MAX };
+  /* This is the maximum supported size on Intel */
+  caps->maxImageExtent = (VkExtent2D) { 1 << 14, 1 << 14 };
}
free(err);
free(geom);
-- 
2.5.0.400.gff86faf



Re: [Mesa-dev] [PATCH 4/8] i965/fs: Fix nir_op_fsign of absolute value.

2017-01-24 Thread Ian Romanick
On 01/24/2017 03:26 PM, Francisco Jerez wrote:
> This does point at the front-end emitting silly code that could have
> been optimized out, but the current fsign implementation would emit
> bogus IR if abs was set for the argument (because it would apply the
> abs modifier on an unsigned integer type), and we shouldn't rely on
> the upper layer's optimization passes for correctness.

Other than the atan2 code you emit later in the series, is there a test
for this?

> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index e1ab598..e0c2fa0 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -701,7 +701,14 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
> nir_alu_instr *instr)
>break;
>  
> case nir_op_fsign: {
> -  if (type_sz(op[0].type) < 8) {
> +  if (op[0].abs) {
> + /* Straightforward since the source can be assumed to be
> +  * non-negative.
> +  */
> + set_condmod(BRW_CONDITIONAL_NZ, bld.MOV(result, op[0]));
> + set_predicate(BRW_PREDICATE_NORMAL, bld.MOV(result, 
> brw_imm_f(1.0f)));

Does this work for DF source?

If we had an optimization pass for this, it would probably map
fsign(abs(a)) to float(a != 0) or double(a != 0).  This is different
from what we would generate for that, but I don't know which is better.
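
A quick way to see why that mapping would be valid, sketched in plain C. The helper names are made up for illustration; fsign_ref follows the constant-folding expression GLSL IR uses for sign, namely float((x > 0) - (x < 0)).

```c
/* sign(x), written the way the GLSL IR constant folder expresses it. */
static float fsign_ref(float x)
{
   return (float)((x > 0.0f) - (x < 0.0f));
}

/* abs(x) without pulling in libm; note that -0.0 is returned unchanged,
 * which does not matter below because sign(-0.0) is 0 either way.
 */
static float fabs_ref(float x)
{
   return x < 0.0f ? -x : x;
}

/* The candidate replacement: float(a != 0).
 *
 * NaN is the one case where the two forms disagree: a != 0 is true for
 * NaN, while both comparisons in fsign_ref() are false.
 */
static float fsign_of_abs(float a)
{
   return (float)(a != 0.0f);
}
```

For every non-NaN input the two agree: once the argument is known to be non-negative, sign can only return 0 or 1, so it reduces to a single "not equal to zero" test.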

> +
> +  } else if (type_sz(op[0].type) < 8) {
> >   /* AND(val, 0x80000000) gives the sign bit.
> >*
> >* Predicated OR ORs 1.0 (0x3f800000) with the sign bit if val is 
> not
> 



[Mesa-dev] [Bug 97879] [amdgpu] Rocket League: long hangs (several seconds) when loading assets (models/textures/shaders?)

2017-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97879

--- Comment #54 from Michel Dänzer  ---
(In reply to Marek Olšák from comment #52)
> 2) Make a screenshot of the sysprof window and send it to the game developer.

Please save the profile in sysprof and send the saved data instead of a
screenshot. Then the recipient can peruse the profile any way they like.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH 2/8] glsl: Fix constant evaluation of the rcp op.

2017-01-24 Thread Ian Romanick
On 01/24/2017 03:26 PM, Francisco Jerez wrote:
> Will avoid a regression in a future commit that introduces some
> additional rcp operations.

When I converted GLSL IR to ir_expression_operation.py, I was careful to
keep all the expressions the same.  rcp and div had these weird guards.
GLSL doesn't require that NaN be generated, and quite a few old GPUs
don't.  If the atan2 implementation depends on NaN being generated by
rcp, it may have problems on i915, r300, and similar GPUs.  I don't know
what they generate, but it's not NaN and it's probably not 0.0.

That said, this matches NIR, and it's probably fine.
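
To make the difference concrete, here are the two constant-folding expressions from the patch written as C functions. The function names are mine, and the sketch assumes IEEE 754 float arithmetic on the host, which is what the constant folder would be running on.

```c
#include <float.h>

/* Old folding rule: guard against division by zero and return 0. */
static float rcp_guarded(float x)
{
   return x != 0.0f ? 1.0f / x : 0.0f;
}

/* New folding rule: plain IEEE division, so rcp(+0) is +inf and
 * rcp(-0) is -inf.
 */
static float rcp_plain(float x)
{
   return 1.0f / x;
}
```

The two only differ at zero, where the guarded form hides the sign of the zero and the infinity that code downstream may rely on. Whether real hardware produces the same value as the folded constant is a separate question, as noted above.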

> ---
>  src/compiler/glsl/ir_expression_operation.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/compiler/glsl/ir_expression_operation.py 
> b/src/compiler/glsl/ir_expression_operation.py
> index f91ac9b..4ac1ffb 100644
> --- a/src/compiler/glsl/ir_expression_operation.py
> +++ b/src/compiler/glsl/ir_expression_operation.py
> @@ -422,7 +422,7 @@ ir_expression_operation = [
> operation("neg", 1, source_types=numeric_types, c_expression={'u': 
> "-((int) {src0})", 'default': "-{src0}"}),
> operation("abs", 1, source_types=signed_numeric_types, c_expression={'i': 
> "{src0} < 0 ? -{src0} : {src0}", 'f': "fabsf({src0})", 'd': "fabs({src0})", 
> 'i64': "{src0} < 0 ? -{src0} : {src0}"}),
> operation("sign", 1, source_types=signed_numeric_types, 
> c_expression={'i': "({src0} > 0) - ({src0} < 0)", 'f': "float(({src0} > 0.0F) 
> - ({src0} < 0.0F))", 'd': "double(({src0} > 0.0) - ({src0} < 0.0))", 'i64': 
> "({src0} > 0) - ({src0} < 0)"}),
> -   operation("rcp", 1, source_types=real_types, c_expression={'f': "{src0} 
> != 0.0F ? 1.0F / {src0} : 0.0F", 'd': "{src0} != 0.0 ? 1.0 / {src0} : 0.0"}),
> +   operation("rcp", 1, source_types=real_types, c_expression={'f': "1.0F / 
> {src0}", 'd': "1.0 / {src0}"}),
> operation("rsq", 1, source_types=real_types, c_expression={'f': "1.0F / 
> sqrtf({src0})", 'd': "1.0 / sqrt({src0})"}),
> operation("sqrt", 1, source_types=real_types, c_expression={'f': 
> "sqrtf({src0})", 'd': "sqrt({src0})"}),
> operation("exp", 1, source_types=(float_type,), 
> c_expression="expf({src0})"), # Log base e on gentype
> 



Re: [Mesa-dev] [PATCH] i965: Fix fast depth clears for surfaces with a dimension of 16384.

2017-01-24 Thread Nanley Chery
On Tue, Jan 24, 2017 at 03:32:28PM -0800, Kenneth Graunke wrote:
> I hadn't bothered to set this bit because I figured it would just
> paper over us getting the rectangle wrong.  But it turns out that
> there is a legitimate reason to use it, so let's do so.
> 
> The alternative would be to chop up 16k clears to multiple 8k clears,
> which is pointlessly painful.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/gen8_depth_state.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c 
> b/src/mesa/drivers/dri/i965/gen8_depth_state.c
> index ec296698267..de5a16e91bf 100644
> --- a/src/mesa/drivers/dri/i965/gen8_depth_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c
> @@ -477,6 +477,17 @@ gen8_hiz_exec(struct brw_context *brw, struct 
> intel_mipmap_tree *mt,
>break;
> case BLORP_HIZ_OP_DEPTH_CLEAR:
>dw1 |= GEN8_WM_HZ_DEPTH_CLEAR;
> +
> +  /* The "Clear Rectangle X Max" (and Y Max) fields are exclusive,
> +   * rather than inclusive, and limited to 16383.  This means that
> +   * for a 16384x16384 render target, we would miss the last pixel.

Perhaps you meant to say that we'd miss the last pixels (plural) on the
far edges? The comment gets the point across nonetheless.

This patch is,
Reviewed-by: Nanley Chery 

> +   *
> +   * To work around this, we have to set the "Full Surface Depth
> +   * and Stencil Clear" bit.  We can do this in all cases because
> +   * we always clear the full rectangle anyway.  We'll need to
> +   * change this if we ever add scissored clear support.
> +   */
> +  dw1 |= GEN8_WM_HZ_FULL_SURFACE_DEPTH_CLEAR;
>break;
> case BLORP_HIZ_OP_NONE:
>unreachable("Should not get here.");
> -- 
> 2.11.0
> 


Re: [Mesa-dev] [PATCH] i965: Fix fast depth clears for surfaces with a dimension of 16384.

2017-01-24 Thread Anuj Phogat
On Tue, Jan 24, 2017 at 3:32 PM, Kenneth Graunke  wrote:
> I hadn't bothered to set this bit because I figured it would just
> paper over us getting the rectangle wrong.  But it turns out that
> there is a legitimate reason to use it, so let's do so.
>
> The alternative would be to chop up 16k clears to multiple 8k clears,
> which is pointlessly painful.
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/gen8_depth_state.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c 
> b/src/mesa/drivers/dri/i965/gen8_depth_state.c
> index ec296698267..de5a16e91bf 100644
> --- a/src/mesa/drivers/dri/i965/gen8_depth_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c
> @@ -477,6 +477,17 @@ gen8_hiz_exec(struct brw_context *brw, struct 
> intel_mipmap_tree *mt,
>break;
> case BLORP_HIZ_OP_DEPTH_CLEAR:
>dw1 |= GEN8_WM_HZ_DEPTH_CLEAR;
> +
> +  /* The "Clear Rectangle X Max" (and Y Max) fields are exclusive,
> +   * rather than inclusive, and limited to 16383.  This means that
> +   * for a 16384x16384 render target, we would miss the last pixel.
> +   *
> +   * To work around this, we have to set the "Full Surface Depth
> +   * and Stencil Clear" bit.  We can do this in all cases because
> +   * we always clear the full rectangle anyway.  We'll need to
> +   * change this if we ever add scissored clear support.
> +   */
> +  dw1 |= GEN8_WM_HZ_FULL_SURFACE_DEPTH_CLEAR;
>break;
> case BLORP_HIZ_OP_NONE:
>unreachable("Should not get here.");
> --
> 2.11.0
>

Verified the restriction from PRM. Patch looks good to me.
Reviewed-by: Anuj Phogat 
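
The restriction can be illustrated with a small C sketch. It assumes only what the code comment in the quoted patch states: the max fields are exclusive and cannot hold a value above 16383. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest value the "Clear Rectangle X/Y Max" fields can hold,
 * according to the code comment in the quoted patch.
 */
#define CLEAR_RECT_MAX_LIMIT 16383u

/* The max fields are exclusive, so covering a whole surface needs
 * max == width (or height).  Returns true when that value does not
 * fit in the field, i.e. when a plain rectangle clear would miss the
 * last row or column.
 */
static bool
clear_rect_overflows(uint32_t width, uint32_t height)
{
   return width > CLEAR_RECT_MAX_LIMIT || height > CLEAR_RECT_MAX_LIMIT;
}
```

With these assumptions, a 16384x16384 surface is the only supported size that overflows, which matches the case the patch targets.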


Re: [Mesa-dev] [PATCH 3/8] glsl/ir_builder: Add rcp builder.

2017-01-24 Thread Ian Romanick
This patch is

Reviewed-by: Ian Romanick 

Next time someone asks for a newbie task, we should have ir_builder be
generated from ir_expression_operation.py.

On 01/24/2017 03:26 PM, Francisco Jerez wrote:
> ---
>  src/compiler/glsl/ir_builder.cpp | 6 ++
>  src/compiler/glsl/ir_builder.h   | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/src/compiler/glsl/ir_builder.cpp 
> b/src/compiler/glsl/ir_builder.cpp
> index 0cee856..8d61533 100644
> --- a/src/compiler/glsl/ir_builder.cpp
> +++ b/src/compiler/glsl/ir_builder.cpp
> @@ -315,6 +315,12 @@ exp(operand a)
>  }
>  
>  ir_expression *
> +rcp(operand a)
> +{
> +   return expr(ir_unop_rcp, a);
> +}
> +
> +ir_expression *
>  rsq(operand a)
>  {
> return expr(ir_unop_rsq, a);
> diff --git a/src/compiler/glsl/ir_builder.h b/src/compiler/glsl/ir_builder.h
> index 5ee9412..ff1ff70 100644
> --- a/src/compiler/glsl/ir_builder.h
> +++ b/src/compiler/glsl/ir_builder.h
> @@ -148,6 +148,7 @@ ir_expression *neg(operand a);
>  ir_expression *sin(operand a);
>  ir_expression *cos(operand a);
>  ir_expression *exp(operand a);
> +ir_expression *rcp(operand a);
>  ir_expression *rsq(operand a);
>  ir_expression *sqrt(operand a);
>  ir_expression *log(operand a);
> 



Re: [Mesa-dev] [PATCH] [swr] Update fs texture & sampler state logic

2017-01-24 Thread Cherniak, Bruce
Reviewed-by: Bruce Cherniak 

> On Jan 24, 2017, at 5:27 PM, George Kyriazis  
> wrote:
> 
> In swr_update_derived() update texture and sampler state on a new fragment
> shader.  GALLIUM_HUD can update fs using a previously bound texture and
> sampler.
> ---
> src/gallium/drivers/swr/swr_state.cpp | 7 +--
> 1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/swr_state.cpp 
> b/src/gallium/drivers/swr/swr_state.cpp
> index 41e0356..f1f4963 100644
> --- a/src/gallium/drivers/swr/swr_state.cpp
> +++ b/src/gallium/drivers/swr/swr_state.cpp
> @@ -1283,7 +1283,8 @@ swr_update_derived(struct pipe_context *pipe,
>   SwrSetPixelShaderState(ctx->swrContext, );
> 
>   /* JIT sampler state */
> -  if (ctx->dirty & SWR_NEW_SAMPLER) {
> +  if (ctx->dirty & (SWR_NEW_SAMPLER |
> +SWR_NEW_FS)) {
>  swr_update_sampler_state(ctx,
>   PIPE_SHADER_FRAGMENT,
>   key.nr_samplers,
> @@ -1291,7 +1292,9 @@ swr_update_derived(struct pipe_context *pipe,
>   }
> 
>   /* JIT sampler view state */
> -  if (ctx->dirty & (SWR_NEW_SAMPLER_VIEW | SWR_NEW_FRAMEBUFFER)) {
> +  if (ctx->dirty & (SWR_NEW_SAMPLER_VIEW |
> +SWR_NEW_FRAMEBUFFER |
> +SWR_NEW_FS)) {
>  swr_update_texture_state(ctx,
>   PIPE_SHADER_FRAGMENT,
>   key.nr_sampler_views,
> -- 
> 2.10.0.windows.1
> 


[Mesa-dev] [PATCH 2/2] vulkan/wsi/wayland: Handle VK_INCOMPLETE for GetPresentModes

2017-01-24 Thread Jason Ekstrand
---
 src/vulkan/wsi/wsi_common_wayland.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common_wayland.c 
b/src/vulkan/wsi/wsi_common_wayland.c
index d745413..04cea97 100644
--- a/src/vulkan/wsi/wsi_common_wayland.c
+++ b/src/vulkan/wsi/wsi_common_wayland.c
@@ -443,11 +443,13 @@ wsi_wl_surface_get_present_modes(VkIcdSurfaceBase 
*surface,
   return VK_SUCCESS;
}
 
-   assert(*pPresentModeCount >= ARRAY_SIZE(present_modes));
+   *pPresentModeCount = MIN2(*pPresentModeCount, ARRAY_SIZE(present_modes));
typed_memcpy(pPresentModes, present_modes, *pPresentModeCount);
-   *pPresentModeCount = ARRAY_SIZE(present_modes);
 
-   return VK_SUCCESS;
+   if (*pPresentModeCount < ARRAY_SIZE(present_modes))
+  return VK_INCOMPLETE;
+   else
+  return VK_SUCCESS;
 }
 
 VkResult wsi_create_wl_surface(const VkAllocationCallbacks *pAllocator,
-- 
2.5.0.400.gff86faf
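
Since the patch has no commit message, here is a standalone C sketch of the enumeration convention it implements: clamp the caller's count, copy that many entries, and report an incomplete result if the array was too small. The enum, the mode values and `get_modes` are stand-ins for illustration, not Vulkan or Mesa definitions.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for the Vulkan result codes; illustrative values only. */
enum sketch_result { SKETCH_SUCCESS, SKETCH_INCOMPLETE };

static const int available_modes[] = { 10, 20 };  /* hypothetical mode IDs */
#define NUM_MODES (sizeof(available_modes) / sizeof(available_modes[0]))

static enum sketch_result
get_modes(uint32_t *count, int *modes)
{
   if (modes == NULL) {
      /* Size query: report how many entries exist. */
      *count = NUM_MODES;
      return SKETCH_SUCCESS;
   }

   /* Copy only as many entries as the caller has room for. */
   if (*count > NUM_MODES)
      *count = NUM_MODES;
   memcpy(modes, available_modes, *count * sizeof(modes[0]));

   return *count < NUM_MODES ? SKETCH_INCOMPLETE : SKETCH_SUCCESS;
}
```

A caller would normally call once with a NULL array to learn the count, allocate, and call again; the incomplete result only shows up when the second call passes a smaller count.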



[Mesa-dev] [PATCH 1/2] vulkan/wsi/wayland: Handle VK_INCOMPLETE for GetFormats

2017-01-24 Thread Jason Ekstrand
---
 src/vulkan/wsi/wsi_common_wayland.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common_wayland.c 
b/src/vulkan/wsi/wsi_common_wayland.c
index 687ac9c..d745413 100644
--- a/src/vulkan/wsi/wsi_common_wayland.c
+++ b/src/vulkan/wsi/wsi_common_wayland.c
@@ -409,25 +409,27 @@ wsi_wl_surface_get_formats(VkIcdSurfaceBase *icd_surface,
if (!display)
   return VK_ERROR_OUT_OF_HOST_MEMORY;
 
-   uint32_t count = u_vector_length(>formats);
-
if (pSurfaceFormats == NULL) {
-  *pSurfaceFormatCount = count;
+  *pSurfaceFormatCount = u_vector_length(>formats);
   return VK_SUCCESS;
}
 
-   assert(*pSurfaceFormatCount >= count);
-   *pSurfaceFormatCount = count;
-
+   uint32_t count = 0;
VkFormat *f;
u_vector_foreach(f, >formats) {
-  *(pSurfaceFormats++) = (VkSurfaceFormatKHR) {
+  if (count == *pSurfaceFormatCount)
+ return VK_INCOMPLETE;
+
+  pSurfaceFormats[count++] = (VkSurfaceFormatKHR) {
  .format = *f,
  /* TODO: We should get this from the compositor somehow */
  .colorSpace = VK_COLORSPACE_SRGB_NONLINEAR_KHR,
   };
}
 
+   assert(*pSurfaceFormatCount <= count);
+   *pSurfaceFormatCount = count;
+
return VK_SUCCESS;
 }
 
-- 
2.5.0.400.gff86faf



Re: [Mesa-dev] [PATCH 1/8] mesa/program: Translate csel operation from GLSL IR.

2017-01-24 Thread Ian Romanick
I'd swear that I wrote a nearly identical patch almost 2 years ago.
The work that depended on it fizzled, so I never sent it out.  The one
difference is I had the following comment:

  /* We assume that Boolean true and false are 1.0 and 0.0.  OPCODE_CMP
   * selects src1 if src0 is < 0, src2 otherwise.
   */

Either way, this patch is

Reviewed-by: Ian Romanick 
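
The comment above explains why the patch negates op[0]. A short C sketch of that reasoning, assuming the 1.0/0.0 Boolean encoding the comment describes (the function names are illustrative, not Mesa's):

```c
/* OPCODE_CMP as described above: pick src1 when src0 < 0, else src2. */
static float
opcode_cmp(float src0, float src1, float src2)
{
   return src0 < 0.0f ? src1 : src2;
}

/* csel(cond, a, b) with cond encoded as 1.0 (true) or 0.0 (false).
 * Negating cond maps true to -1.0, which is < 0 and selects a; false
 * becomes -0.0, which is not < 0 and selects b.
 */
static float
csel_via_cmp(float cond, float a, float b)
{
   return opcode_cmp(-cond, a, b);
}
```

Without the negation, a true condition (1.0) would fail the `< 0` test and the wrong operand would be selected.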

On 01/24/2017 03:26 PM, Francisco Jerez wrote:
> This will be used internally by the GLSL front-end in order to
> implement some built-in functions. Plumb it through MESA IR for
> back-ends that rely on this translation pass.
> ---
>  src/mesa/program/ir_to_mesa.cpp | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp
> index 0ae797f..5ff7304 100644
> --- a/src/mesa/program/ir_to_mesa.cpp
> +++ b/src/mesa/program/ir_to_mesa.cpp
> @@ -1360,13 +1360,17 @@ ir_to_mesa_visitor::visit(ir_expression *ir)
>emit(ir, OPCODE_LRP, result_dst, op[2], op[1], op[0]);
>break;
>  
> +   case ir_triop_csel:
> +  op[0].negate = ~op[0].negate;
> +  emit(ir, OPCODE_CMP, result_dst, op[0], op[1], op[2]);
> +  break;
> +
> case ir_binop_vector_extract:
> case ir_triop_fma:
> case ir_triop_bitfield_extract:
> case ir_triop_vector_insert:
> case ir_quadop_bitfield_insert:
> case ir_binop_ldexp:
> -   case ir_triop_csel:
> case ir_binop_carry:
> case ir_binop_borrow:
> case ir_binop_imul_high:
> 



Re: [Mesa-dev] [PATCH 08/37] glsl: add initial implementation of shader cache

2017-01-24 Thread Eric Anholt
Timothy Arceri  writes:

> From: Timothy Arceri 
>
> This uses disk_cache.c to write out a serialization of various
> state that's required in order to successfully load and use a
> binary written out by a driver's backend. This state is referred to as
> "metadata" throughout the implementation.
>
> This initial version is intended to work with vertex and fragment
> shader stages only.

This is really interesting.  I was definitely expecting that the cache
at this level would be a map from ([sha1s of shader source], mesa
version, compiler options, other linker inputs) -> ([compiled GLSL IR
shaders], linker metadata output).  The advantage you seem to be going
for is to not have GLSL IR ever present in memory, which would be pretty
cool.  I'm really curious to see how this would work out for a gallium
driver.

Could you extend the file's doxygen comment to cover some of these
design decisions?

Also, I think in this series you've missed having the
gl_shader_compiler_options options in the shader key, which I believe
might affect the compiled metadata output.  Other than that, will
gallium vs i965 have different GLSL IR passes being run at the
CompileShader or LinkShader stages before we write to disk?  Will we
need the driver's name to be in the key, maybe?
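
To make the question about key contents concrete, here is a hedged C sketch of a key that mixes in every input mentioned above. FNV-1a is used only to keep the example short; the series itself uses SHA-1. `key_update`, `shader_cache_key` and the single `compiler_options` word are hypothetical simplifications.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a stands in for SHA-1 here purely to keep the sketch short. */
static uint64_t
key_update(uint64_t key, const void *data, size_t size)
{
   const unsigned char *bytes = data;
   for (size_t i = 0; i < size; i++) {
      key ^= bytes[i];
      key *= 1099511628211ull;   /* FNV-1a 64-bit prime */
   }
   return key;
}

/* Hypothetical key builder: every input that can change the compiled
 * result is mixed into the key, including each string's terminating
 * NUL so that adjacent fields cannot run together.
 */
static uint64_t
shader_cache_key(const char *source, const char *mesa_version,
                 const char *driver_name, uint32_t compiler_options)
{
   uint64_t key = 14695981039346656037ull;   /* FNV-1a offset basis */
   key = key_update(key, source, strlen(source) + 1);
   key = key_update(key, mesa_version, strlen(mesa_version) + 1);
   key = key_update(key, driver_name, strlen(driver_name) + 1);
   key = key_update(key, &compiler_options, sizeof(compiler_options));
   return key;
}
```

With a key built this way, two drivers or two option sets never share an entry, so no entry has to be invalidated when one of those inputs changes.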




Re: [Mesa-dev] [PATCH 06/37] util: add a disk_cache_remove() function

2017-01-24 Thread Timothy Arceri
On Tue, 2017-01-24 at 15:54 -0800, Eric Anholt wrote:
> Timothy Arceri  writes:
> 
> > From: Timothy Arceri 
> > 
> > This will be used to remove cache items created with old versions
> > of Mesa or other invalid cache items from the cache.
> 
> I'm not convinced that removing the item from cache when we get a hit
> on
> everything in the key except for Mesa version is the right way to
> go.  I
> think we should just be hashing the Mesa version in the key so that
> we
> don't hit on mismatched versions.  Then we wouldn't thrash our cache
> when we're, say, checking out around different versions of Mesa and
> re-pigliting things.

I agree. I mention this problem in the cover letter; it's going to take
some reworking, so I was hoping to fix it in a follow-up.

The plan is to create directory structures like so:

Mesa-17.0.0/i965-BDW/
Mesa-17.1.0/i965-BDW/

This will allow us to just delete an entire directory if we are
hitting the cache limit, and it also easily allows third parties to install
precompiled shaders in those dirs.
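
A minimal sketch of how such a per-version, per-driver path could be assembled, following the two example directories above. `cache_subdir` and its parameters are hypothetical; nothing like it exists in the series yet.

```c
#include <stdio.h>

/* Hypothetical helper: writes "Mesa-<version>/<driver>-<gpu>/" into buf.
 * Returns 0 on success, -1 if buf is too small or formatting fails.
 */
static int
cache_subdir(char *buf, size_t size, const char *version,
             const char *driver, const char *gpu)
{
   int n = snprintf(buf, size, "Mesa-%s/%s-%s/", version, driver, gpu);
   return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, `cache_subdir(buf, sizeof(buf), "17.0.0", "i965", "BDW")` would fill `buf` with the first path listed above.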




[Mesa-dev] [PATCH] mesa: Fix copy-and-paste bug in _mesa_(Program|)Uniform[1234](i|ui)64vARB functions

2017-01-24 Thread Ian Romanick
From: Ian Romanick 

All of the functions were passing 1 to _mesa_uniform instead of passing
count.

Fixes 16 unused parameter warnings like:

main/uniforms.c: In function ‘_mesa_Uniform1i64vARB’:
main/uniforms.c:1692:47: warning: unused parameter ‘count’ [-Wunused-parameter]
 _mesa_Uniform1i64vARB(GLint location, GLsizei count, const GLint64 *value)
   ^

This is why I build with extra warnings enabled.  Unfortunately, there
are so many unused parameter warnings in Mesa that I didn't notice these
added warnings for over 6 months. :(

Signed-off-by: Ian Romanick 
---
 src/mesa/main/uniforms.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/src/mesa/main/uniforms.c b/src/mesa/main/uniforms.c
index 29a1155..3b645cb 100644
--- a/src/mesa/main/uniforms.c
+++ b/src/mesa/main/uniforms.c
@@ -1683,28 +1683,28 @@ void GLAPIENTRY
 _mesa_Uniform1i64vARB(GLint location, GLsizei count, const GLint64 *value)
 {
GET_CURRENT_CONTEXT(ctx);
-   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, 1, value, 
GLSL_TYPE_INT64, 1);
+   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, count, value, 
GLSL_TYPE_INT64, 1);
 }
 
 void GLAPIENTRY
 _mesa_Uniform2i64vARB(GLint location,  GLsizei count, const GLint64 *value)
 {
GET_CURRENT_CONTEXT(ctx);
-   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, 1, value, 
GLSL_TYPE_INT64, 2);
+   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, count, value, 
GLSL_TYPE_INT64, 2);
 }
 
 void GLAPIENTRY
 _mesa_Uniform3i64vARB(GLint location,  GLsizei count, const GLint64 *value)
 {
GET_CURRENT_CONTEXT(ctx);
-   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, 1, value, 
GLSL_TYPE_INT64, 3);
+   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, count, value, 
GLSL_TYPE_INT64, 3);
 }
 
 void GLAPIENTRY
 _mesa_Uniform4i64vARB(GLint location,  GLsizei count, const GLint64 *value)
 {
GET_CURRENT_CONTEXT(ctx);
-   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, 1, value, 
GLSL_TYPE_INT64, 4);
+   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, count, value, 
GLSL_TYPE_INT64, 4);
 }
 
 void GLAPIENTRY
@@ -1751,28 +1751,28 @@ void GLAPIENTRY
 _mesa_Uniform1ui64vARB(GLint location,  GLsizei count, const GLuint64 *value)
 {
GET_CURRENT_CONTEXT(ctx);
-   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, 1, value, 
GLSL_TYPE_UINT64, 1);
+   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, count, value, 
GLSL_TYPE_UINT64, 1);
 }
 
 void GLAPIENTRY
 _mesa_Uniform2ui64vARB(GLint location,  GLsizei count, const GLuint64 *value)
 {
GET_CURRENT_CONTEXT(ctx);
-   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, 1, value, 
GLSL_TYPE_UINT64, 2);
+   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, count, value, 
GLSL_TYPE_UINT64, 2);
 }
 
 void GLAPIENTRY
 _mesa_Uniform3ui64vARB(GLint location,  GLsizei count, const GLuint64 *value)
 {
GET_CURRENT_CONTEXT(ctx);
-   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, 1, value, 
GLSL_TYPE_UINT64, 3);
+   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, count, value, 
GLSL_TYPE_UINT64, 3);
 }
 
 void GLAPIENTRY
 _mesa_Uniform4ui64vARB(GLint location,  GLsizei count, const GLuint64 *value)
 {
GET_CURRENT_CONTEXT(ctx);
-   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, 1, value, 
GLSL_TYPE_UINT64, 4);
+   _mesa_uniform(ctx, ctx->_Shader->ActiveProgram, location, count, value, 
GLSL_TYPE_UINT64, 4);
 }
 
 /* DSA entrypoints */
@@ -1835,7 +1835,7 @@ _mesa_ProgramUniform1i64vARB(GLuint program, GLint 
location, GLsizei count, cons
struct gl_shader_program *shProg =
   _mesa_lookup_shader_program_err(ctx, program,
   "glProgramUniform1i64vARB");
-   _mesa_uniform(ctx, shProg, location, 1, value, GLSL_TYPE_INT64, 1);
+   _mesa_uniform(ctx, shProg, location, count, value, GLSL_TYPE_INT64, 1);
 }
 
 void GLAPIENTRY
@@ -1845,7 +1845,7 @@ _mesa_ProgramUniform2i64vARB(GLuint program, GLint 
location,  GLsizei count, con
struct gl_shader_program *shProg =
   _mesa_lookup_shader_program_err(ctx, program,
   "glProgramUniform2i64vARB");
-   _mesa_uniform(ctx, shProg, location, 1, value, GLSL_TYPE_INT64, 2);
+   _mesa_uniform(ctx, shProg, location, count, value, GLSL_TYPE_INT64, 2);
 }
 
 void GLAPIENTRY
@@ -1855,7 +1855,7 @@ _mesa_ProgramUniform3i64vARB(GLuint program, GLint 
location,  GLsizei count, con
struct gl_shader_program *shProg =
   _mesa_lookup_shader_program_err(ctx, program,
   "glProgramUniform3i64vARB");
-   _mesa_uniform(ctx, shProg, location, 1, value, GLSL_TYPE_INT64, 3);
+   _mesa_uniform(ctx, shProg, location, count, value, GLSL_TYPE_INT64, 3);
 }
 
 void GLAPIENTRY
@@ -1865,7 +1865,7 @@ _mesa_ProgramUniform4i64vARB(GLuint 

Re: [Mesa-dev] [PATCH] i965: Use a UW source type for CS_OPCODE_CS_TERMINATE.

2017-01-24 Thread Francisco Jerez
Matt Turner  writes:

> On Tue, Jan 24, 2017 at 2:18 PM, Kenneth Graunke  
> wrote:
>> SIMD16 compute shaders use a send(16) with mlen 1 for the EOT message,
>> using a source of g127 for the single register.  With a UD type, this
>> supposedly could read g128, which doesn't exist, causing the simulator
>> to get cranky.  Use a UW type to avoid this.
>
> Bizarre. Is the hardware this stupid, or just the simulator?

I doubt the hardware would care, but I guess it wouldn't hurt to do this
in order to make the simulator happy.  How about we fix this in the
generator instead for consistency with the other send-message UW
register retyping workarounds?  Assuming you apply the same fix in
fs_generator::generate_cs_terminate instead, the patch is:

Reviewed-by: Francisco Jerez 
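
The register arithmetic behind the quoted explanation can be sketched in C. It assumes 32-byte GRF registers and a read that starts at the beginning of the first register; `last_reg_read` is an illustrative helper, not driver code.

```c
#define GRF_SIZE_BYTES 32   /* assumed size of one general register */

/* Index of the last register touched when reading exec_size channels
 * of type_size bytes each, starting at the beginning of first_reg.
 */
static unsigned
last_reg_read(unsigned first_reg, unsigned exec_size, unsigned type_size)
{
   unsigned bytes = exec_size * type_size;
   return first_reg + (bytes - 1) / GRF_SIZE_BYTES;
}
```

Under these assumptions a SIMD16 read of a 4-byte UD source starting at g127 spans g127 and g128, while a 2-byte UW source stays within g127.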




Re: [Mesa-dev] [PATCH 02/37] glsl: Switch to disable-by-default for the GLSL shader cache

2017-01-24 Thread Eric Anholt
Timothy Arceri  writes:

> From: Carl Worth 
>
> The shader cache is expected to be developed incrementally over a
> fairly long series of commits. For that period of instability, we
> require users to opt into the shader cache by setting:
>
>   MESA_GLSL_CACHE_ENABLE=1
>
> In the future, when the shader cache is complete, we can revert this
> commit so that the cache will be on by default.
>
> The user can always disable the cache with
> MESA_GLSL_CACHE_DISABLE=1. That functionality is not affected by this
> commit, (nor will it be affected by the future revert).
> ---
>  src/compiler/glsl/tests/cache_test.c | 5 +
>  src/util/disk_cache.c| 7 +++
>  2 files changed, 12 insertions(+)
>
> diff --git a/src/compiler/glsl/tests/cache_test.c 
> b/src/compiler/glsl/tests/cache_test.c
> index 0ef05aa..8547141 100644
> --- a/src/compiler/glsl/tests/cache_test.c
> +++ b/src/compiler/glsl/tests/cache_test.c
> @@ -388,6 +388,11 @@ main(void)
>  #ifdef ENABLE_SHADER_CACHE
> int err;
>  
> +   /* While the shader cache is still experimental, this variable must
> +* be set or the cache does nothing.
> +*/
> +   setenv("MESA_GLSL_CACHE_ENABLE", "1", 1);
> +
> test_disk_cache_create();
>  
> test_put_and_get();
> diff --git a/src/util/disk_cache.c b/src/util/disk_cache.c
> index 6de608c..dec09e0 100644
> --- a/src/util/disk_cache.c
> +++ b/src/util/disk_cache.c
> @@ -151,6 +151,13 @@ disk_cache_create(void)
> if (getenv("MESA_GLSL_CACHE_DISABLE"))
>goto fail;
>  
> +   /* As a temporary measure, (while the shader cache is under
> +* development, and known to not be fully function), also require

"functional"

> +* the MESA_GLSL_CACHE_ENABLE variable to be set.
> +*/
> +   if (! getenv ("MESA_GLSL_CACHE_ENABLE"))
> +  goto fail;

cworth-style whitespace to be fixed here.

Other than that, 1-5 are:

Reviewed-by: Eric Anholt 
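
The resulting opt-in/opt-out logic, as far as the quoted hunk shows it, can be summarised in a small C sketch. The function name is hypothetical, and the two arguments stand for the results of getenv() on the two variables.

```c
#include <stdbool.h>
#include <stddef.h>

/* disable_var and enable_var are the results of getenv() for
 * MESA_GLSL_CACHE_DISABLE and MESA_GLSL_CACHE_ENABLE; NULL means unset.
 * As in the quoted hunk, only presence is checked, not the value.
 */
static bool
shader_cache_allowed(const char *disable_var, const char *enable_var)
{
   if (disable_var != NULL)
      return false;           /* explicit opt-out always wins */

   return enable_var != NULL; /* opt-in required while experimental */
}
```

Note that because only presence is tested, setting either variable to "0" still counts as set.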




Re: [Mesa-dev] [PATCH 06/37] util: add a disk_cache_remove() function

2017-01-24 Thread Eric Anholt
Timothy Arceri  writes:

> From: Timothy Arceri 
>
> This will be used to remove cache items created with old versions
> of Mesa or other invalid cache items from the cache.

I'm not convinced that removing the item from cache when we get a hit on
everything in the key except for Mesa version is the right way to go.  I
think we should just be hashing the Mesa version in the key so that we
don't hit on mismatched versions.  Then we wouldn't thrash our cache
when we're, say, checking out around different versions of Mesa and
re-pigliting things.




Re: [Mesa-dev] [PATCH] i965: Use a UW source type for CS_OPCODE_CS_TERMINATE.

2017-01-24 Thread Matt Turner
On Tue, Jan 24, 2017 at 2:18 PM, Kenneth Graunke  wrote:
> SIMD16 compute shaders use a send(16) with mlen 1 for the EOT message,
> using a source of g127 for the single register.  With a UD type, this
> supposedly could read g128, which doesn't exist, causing the simulator
> to get cranky.  Use a UW type to avoid this.

Bizarre. Is the hardware this stupid, or just the simulator?


[Mesa-dev] [PATCH 7/7] i965/blorp: Remove a pile of blorp_blit restrictions

2017-01-24 Thread Jason Ekstrand
Previously, blorp could only blit into something that was renderable.
Thanks to recent additions to blorp, it can now blit into basically
anything so long as it isn't compressed.
---
 src/mesa/drivers/dri/i965/brw_blorp.c | 64 +--
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
b/src/mesa/drivers/dri/i965/brw_blorp.c
index 3a7cf84..624b5e8 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -274,6 +274,26 @@ blorp_surf_for_miptree(struct brw_context *brw,
   (surf->aux_addr.buffer == NULL));
 }
 
+static bool
+brw_blorp_supports_dst_format(struct brw_context *brw, mesa_format format)
+{
+   /* If it's renderable, it's definitely supported. */
+   if (brw->format_supported_as_render_target[format])
+  return true;
+
+   /* BLORP can't compress anything */
+   if (_mesa_is_format_compressed(format))
+  return false;
+
+   /* No exotic formats such as GL_LUMINANCE_ALPHA */
+   if (_mesa_get_format_bits(format, GL_RED_BITS) == 0 &&
+   _mesa_get_format_bits(format, GL_DEPTH_BITS) == 0 &&
+   _mesa_get_format_bits(format, GL_STENCIL_BITS) == 0)
+  return false;
+
+   return true;
+}
+
 static enum isl_format
 brw_blorp_to_isl_format(struct brw_context *brw, mesa_format format,
 bool is_render_target)
@@ -291,15 +311,20 @@ brw_blorp_to_isl_format(struct brw_context *brw, 
mesa_format format,
   return ISL_FORMAT_R32_FLOAT;
case MESA_FORMAT_Z_UNORM16:
   return ISL_FORMAT_R16_UNORM;
-   default: {
+   default:
   if (is_render_target) {
- assert(brw->format_supported_as_render_target[format]);
- return brw->render_target_format[format];
+ assert(brw_blorp_supports_dst_format(brw, format));
+ if (brw->format_supported_as_render_target[format]) {
+return brw->render_target_format[format];
+ } else {
+return brw_format_for_mesa_format(format);
+ }
   } else {
+ /* Some destinations (is_render_target == true) are supported by
+  * blorp even though we technically can't render to them.
+  */
  return brw_format_for_mesa_format(format);
   }
-  break;
-   }
}
 }
 
@@ -540,8 +565,6 @@ try_blorp_blit(struct brw_context *brw,
/* Find buffers */
struct intel_renderbuffer *src_irb;
struct intel_renderbuffer *dst_irb;
-   struct intel_mipmap_tree *src_mt;
-   struct intel_mipmap_tree *dst_mt;
switch (buffer_bit) {
case GL_COLOR_BUFFER_BIT:
   src_irb = intel_renderbuffer(read_fb->_ColorReadBuffer);
@@ -561,16 +584,6 @@ try_blorp_blit(struct brw_context *brw,
  intel_renderbuffer(read_fb->Attachment[BUFFER_DEPTH].Renderbuffer);
   dst_irb =
  intel_renderbuffer(draw_fb->Attachment[BUFFER_DEPTH].Renderbuffer);
-  src_mt = find_miptree(buffer_bit, src_irb);
-  dst_mt = find_miptree(buffer_bit, dst_irb);
-
-  /* We can't handle format conversions between Z24 and other formats
-   * since we have to lie about the surface format. See the comments in
-   * brw_blorp_surface_info::set().
-   */
-  if ((src_mt->format == MESA_FORMAT_Z24_UNORM_X8_UINT) !=
-  (dst_mt->format == MESA_FORMAT_Z24_UNORM_X8_UINT))
- return false;
 
   do_blorp_blit(brw, buffer_bit, src_irb, MESA_FORMAT_NONE,
 dst_irb, MESA_FORMAT_NONE, srcX0, srcY0,
@@ -627,21 +640,8 @@ brw_blorp_copytexsubimage(struct brw_context *brw,
if (brw->gen < 6)
   return false;
 
-   if (_mesa_get_format_base_format(src_rb->Format) !=
-   _mesa_get_format_base_format(dst_image->TexFormat)) {
-  return false;
-   }
-
-   /* We can't handle format conversions between Z24 and other formats since
-* we have to lie about the surface format.  See the comments in
-* brw_blorp_surface_info::set().
-*/
-   if ((src_mt->format == MESA_FORMAT_Z24_UNORM_X8_UINT) !=
-   (dst_mt->format == MESA_FORMAT_Z24_UNORM_X8_UINT)) {
-  return false;
-   }
-
-   if (!brw->format_supported_as_render_target[dst_image->TexFormat])
+   /* BLORP can't compress anything */
+   if (!brw_blorp_supports_dst_format(brw, dst_image->TexFormat))
   return false;
 
/* Source clipping shouldn't be necessary, since copytexsubimage (in
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 4/7] intel/blorp: Silently convert RGBX destination formats to RGBA

2017-01-24 Thread Jason Ekstrand
---
 src/intel/blorp/blorp_blit.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
index b964224..4d8942e 100644
--- a/src/intel/blorp/blorp_blit.c
+++ b/src/intel/blorp/blorp_blit.c
@@ -1883,6 +1883,10 @@ try_blorp_blit(struct blorp_batch *batch,
 
   wm_prog_key->dst_rgb = true;
   wm_prog_key->need_dst_offset = true;
+   } else if (isl_format_is_rgbx(params->dst.view.format)) {
+  /* We can handle RGBX formats easily enough by treating them as RGBA */
+  params->dst.view.format =
+ isl_format_rgbx_to_rgba(params->dst.view.format);
} else if (params->dst.view.format == ISL_FORMAT_R24_UNORM_X8_TYPELESS) {
   wm_prog_key->dst_format = params->dst.view.format;
   params->dst.view.format = ISL_FORMAT_R32_UNORM;
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 6/7] anv: Allow blitting to/from any supported format

2017-01-24 Thread Jason Ekstrand
Now that blorp handles all the cases, why not?
---
 src/intel/vulkan/anv_formats.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/anv_formats.c b/src/intel/vulkan/anv_formats.c
index f4183f0..2a924d5 100644
--- a/src/intel/vulkan/anv_formats.c
+++ b/src/intel/vulkan/anv_formats.c
@@ -319,8 +319,7 @@ get_image_format_properties(const struct gen_device_info 
*devinfo,
 
VkFormatFeatureFlags flags = 0;
if (isl_format_supports_sampling(devinfo, format.isl_format)) {
-  flags |= VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT |
-   VK_FORMAT_FEATURE_BLIT_SRC_BIT;
+  flags |= VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT;
 
   if (isl_format_supports_filtering(devinfo, format.isl_format))
  flags |= VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT;
@@ -332,8 +331,7 @@ get_image_format_properties(const struct gen_device_info 
*devinfo,
 */
if (isl_format_supports_rendering(devinfo, format.isl_format) &&
format.swizzle.a == ISL_CHANNEL_SELECT_ALPHA) {
-  flags |= VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT |
-   VK_FORMAT_FEATURE_BLIT_DST_BIT;
+  flags |= VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT;
 
   if (isl_format_supports_alpha_blending(devinfo, format.isl_format))
  flags |= VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BLEND_BIT;
@@ -349,7 +347,9 @@ get_image_format_properties(const struct gen_device_info 
*devinfo,
   flags |= VK_FORMAT_FEATURE_STORAGE_IMAGE_ATOMIC_BIT;
 
if (flags) {
-  flags |= VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR |
+  flags |= VK_FORMAT_FEATURE_BLIT_SRC_BIT |
+   VK_FORMAT_FEATURE_BLIT_DST_BIT |
+   VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR |
VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR;
}
 
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 2/7] intel/blorp: Handle more exotic destination formats

2017-01-24 Thread Jason Ekstrand
This commit adds support for using both R24_UNORM_X8_TYPELESS and
R9G9B9E5_SHAREDEXP as destination formats even though the hardware does
not support rendering to them.  This is done by using a different format
and emitting shader code to fake it the rest of the way.
---
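Not part of the patch: a CPU-side sketch of the R9G9B9E5 packing that the
shader code in convert_color() mirrors, following the step comments in the
diff below.  The function names, the clamp helper and the literal value of
MAX_RGB9E5 (511/512 * 2^16) are assumptions made for illustration; the real
reference is float3_to_rgb9e5 in util/format_rgb9e5.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RGB9E5_EXP_BIAS      15
#define RGB9E5_MANTISSA_BITS 9
/* Assumed largest representable value: (511 / 512) * 2^16. */
#define MAX_RGB9E5           65408.0f

static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Clamp to [0, MAX_RGB9E5]; negatives and NaN become 0. */
static float clamp_rgb9e5(float v)
{
   if (!(v > 0.0f))
      return 0.0f;
   return v > MAX_RGB9E5 ? MAX_RGB9E5 : v;
}

static uint32_t pack_rgb9e5(float r, float g, float b)
{
   float c[3] = { clamp_rgb9e5(r), clamp_rgb9e5(g), clamp_rgb9e5(b) };

   /* Non-negative floats order the same way as their bit patterns. */
   uint32_t maxu = float_bits(c[0]);
   for (int i = 1; i < 3; i++)
      if (float_bits(c[i]) > maxu)
         maxu = float_bits(c[i]);

   /* maxrgb.u += maxrgb.u & (1 << (23-9)): a round-up carry spills into
    * the exponent field. */
   maxu += maxu & (1u << (23 - RGB9E5_MANTISSA_BITS));

   int32_t biased = (int32_t)(maxu >> 23);
   int32_t floor_exp = -RGB9E5_EXP_BIAS - 1 + 127;
   int32_t exp_shared =
      (biased > floor_exp ? biased : floor_exp) + 1 + RGB9E5_EXP_BIAS - 127;
   assert(exp_shared >= 0 && exp_shared <= 31);

   /* revdenom = 2^(RGB9E5_EXP_BIAS + RGB9E5_MANTISSA_BITS + 1 - exp_shared) */
   uint32_t revdenom_biasedexp =
      127 + RGB9E5_EXP_BIAS + RGB9E5_MANTISSA_BITS + 1 - exp_shared;
   float revdenom = bits_float(revdenom_biasedexp << 23);

   uint32_t packed = (uint32_t)exp_shared << 27;
   for (int i = 0; i < 3; i++) {
      int32_t m = (int32_t)(c[i] * revdenom);
      m = (m & 1) + (m >> 1);          /* round to nearest, drop the extra bit */
      packed |= (uint32_t)m << (9 * i);
   }
   return packed;
}
```

For (1.0, 1.0, 1.0) this yields a shared exponent of 16 and a mantissa of
256 per channel, which decodes back to 256 * 2^(16 - 15 - 9) = 1.0.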
 src/intel/blorp/blorp_blit.c | 92 
 src/intel/blorp/blorp_priv.h |  6 +++
 2 files changed, 98 insertions(+)
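
Not part of the patch: a small check of the R24_UNORM_X8_TYPELESS scale
factor used below.  float_to_unorm32() is a hypothetical model of the
float-to-R32_UNORM conversion (multiply by 2^32 - 1, round to nearest); the
actual hardware rounding is not stated in the patch, so treat this as a
sketch of the arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/* Same factor as the patch: (2^24 - 1) / UINT32_MAX, evaluated in float. */
static float r24_scale_factor(void)
{
   return (float)((1 << 24) - 1) / (float)UINT32_MAX;
}

/* Hypothetical model of the render target's float -> UNORM32 conversion. */
static uint32_t float_to_unorm32(float v)
{
   assert(v >= 0.0f && v <= 1.0f);
   return (uint32_t)((double)v * 4294967295.0 + 0.5);
}

/* Saturate, scale down, then convert: the top 8 bits stay zero. */
static uint32_t depth_to_r24x8(float depth)
{
   float sat = depth < 0.0f ? 0.0f : (depth > 1.0f ? 1.0f : depth);
   return float_to_unorm32(sat * r24_scale_factor());
}
```

Under that model a saturated input of 1.0 lands on 0x00ffffff, so the low
24 bits carry the data and the X8 byte is left at zero.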

diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
index fc76fd4..b964224 100644
--- a/src/intel/blorp/blorp_blit.c
+++ b/src/intel/blorp/blorp_blit.c
@@ -26,6 +26,7 @@
 #include "blorp_priv.h"
 #include "brw_meta_util.h"
 
+#include "util/format_rgb9e5.h"
 /* header-only include needed for _mesa_unorm_to_float and friends. */
 #include "mesa/main/format_utils.h"
 
@@ -916,6 +917,88 @@ bit_cast_color(struct nir_builder *b, nir_ssa_def *color,
}
 }
 
+static nir_ssa_def *
+convert_color(struct nir_builder *b, nir_ssa_def *color,
+  const struct brw_blorp_blit_prog_key *key)
+{
+   /* All of our color conversions end up generating a single-channel color
+* value that we need to write out.
+*/
+   nir_ssa_def *value;
+
+   if (key->dst_format == ISL_FORMAT_R24_UNORM_X8_TYPELESS) {
+  /* The destination image is bound as R32_UNORM but the data needs to be
+   * in R24_UNORM_X8_TYPELESS.  The bottom 24 are the actual data and the
+   * top 8 need to be zero.  We can accomplish this by simply multiplying
+   * by a factor to scale things down.
+   */
+  float factor = (float)((1 << 24) - 1) / (float)UINT32_MAX;
+  value = nir_fmul(b, nir_fsat(b, nir_channel(b, color, 0)),
+  nir_imm_float(b, factor));
+   } else if (key->dst_format == ISL_FORMAT_R9G9B9E5_SHAREDEXP) {
+  /* See also float3_to_rgb9e5 */
+
+  /* First, we need to clamp it to range. */
+  nir_ssa_def *clamped = nir_fmin(b, color, nir_imm_float(b, MAX_RGB9E5));
+
+  /* Get rid of negatives and NaN */
+  clamped = nir_bcsel(b, nir_ult(b, nir_imm_int(b, 0x7f800000), color),
+ nir_imm_float(b, 0), clamped);
+
+  /* maxrgb.u = MAX3(rc.u, gc.u, bc.u); */
+  nir_ssa_def *maxu = nir_umax(b, nir_channel(b, clamped, 0),
+  nir_umax(b, nir_channel(b, clamped, 1),
+  nir_channel(b, clamped, 2)));
+
+  /* maxrgb.u += maxrgb.u & (1 << (23-9)); */
+  maxu = nir_iadd(b, maxu, nir_iand(b, maxu, nir_imm_int(b, 1 << 14)));
+
+  /* exp_shared = MAX2((maxrgb.u >> 23), -RGB9E5_EXP_BIAS - 1 + 127) +
+   *  1 + RGB9E5_EXP_BIAS - 127;
+   */
+  nir_ssa_def *exp_shared =
+ nir_iadd(b, nir_umax(b, nir_ushr(b, maxu, nir_imm_int(b, 23)),
+ nir_imm_int(b, -RGB9E5_EXP_BIAS - 1 + 127)),
+ nir_imm_int(b, 1 + RGB9E5_EXP_BIAS - 127));
+
+  /* revdenom_biasedexp = 127 - (exp_shared - RGB9E5_EXP_BIAS -
+   * RGB9E5_MANTISSA_BITS) + 1;
+   */
+  nir_ssa_def *revdenom_biasedexp =
+ nir_isub(b, nir_imm_int(b, 127 + RGB9E5_EXP_BIAS +
+RGB9E5_MANTISSA_BITS + 1),
+ exp_shared);
+
+  /* revdenom.u = revdenom_biasedexp << 23; */
+  nir_ssa_def *revdenom =
+ nir_ishl(b, revdenom_biasedexp, nir_imm_int(b, 23));
+
+  /* rm = (int) (rc.f * revdenom.f);
+   * gm = (int) (gc.f * revdenom.f);
+   * bm = (int) (bc.f * revdenom.f);
+   */
+  nir_ssa_def *mantissa =
+ nir_f2i(b, nir_fmul(b, clamped, revdenom));
+
+  /* rm = (rm & 1) + (rm >> 1);
+   * gm = (gm & 1) + (gm >> 1);
+   * bm = (bm & 1) + (bm >> 1);
+   */
+  mantissa = nir_iadd(b, nir_iand(b, mantissa, nir_imm_int(b, 1)),
+ nir_ushr(b, mantissa, nir_imm_int(b, 1)));
+
+  value = nir_channel(b, mantissa, 0);
+  value = nir_mask_shift_or(b, value, nir_channel(b, mantissa, 1), ~0, 9);
+  value = nir_mask_shift_or(b, value, nir_channel(b, mantissa, 2), ~0, 18);
+  value = nir_mask_shift_or(b, value, exp_shared, ~0, 27);
+   } else {
+  unreachable("Unsupported format conversion");
+   }
+
+   nir_ssa_def *u = nir_ssa_undef(b, 1, 32);
+   return nir_vec4(b, value, u, u, u);
+}
+
 /**
  * Generator for WM programs used in BLORP blits.
  *
@@ -1274,6 +1357,9 @@ brw_blorp_build_nir_shader(struct blorp_context *blorp, void *mem_ctx,
if (key->dst_bpc != key->src_bpc)
   color = bit_cast_color(&b, color, key);
 
+   if (key->dst_format)
+  color = convert_color(&b, color, key);
+
if (key->dst_rgb) {
   /* The destination image is bound as a red texture three times as wide
* as the actual image.  Our shader is effectively running one color
@@ -1797,6 +1883,12 @@ try_blorp_blit(struct blorp_batch *batch,
 
   wm_prog_key->dst_rgb = true;
   wm_prog_key->need_dst_offset = true;
+   } 

[Mesa-dev] [PATCH 5/7] intel/blorp: Support the RGB workaround on more formats

2017-01-24 Thread Jason Ekstrand
Previously we only supported UINT formats because that's what blorp_copy
required.  If we want to use it in blorp_blit, however, we need to
support everything.
---
 src/intel/blorp/blorp_blit.c | 73 
 1 file changed, 53 insertions(+), 20 deletions(-)

diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
index 4d8942e..ff8352d 100644
--- a/src/intel/blorp/blorp_blit.c
+++ b/src/intel/blorp/blorp_blit.c
@@ -1639,6 +1639,56 @@ struct blt_coords {
struct blt_axis x, y;
 };
 
+static enum isl_format
+get_red_format_for_rgb_format(enum isl_format format)
+{
+   const struct isl_format_layout *fmtl = isl_format_get_layout(format);
+
+   switch (fmtl->channels.r.bits) {
+   case 8:
+  switch (fmtl->channels.r.type) {
+  case ISL_UNORM:
+ return ISL_FORMAT_R8_UNORM;
+  case ISL_SNORM:
+ return ISL_FORMAT_R8_SNORM;
+  case ISL_UINT:
+ return ISL_FORMAT_R8_UINT;
+  case ISL_SINT:
+ return ISL_FORMAT_R8_SINT;
+  default:
+ unreachable("Invalid 8-bit RGB channel type");
+  }
+   case 16:
+  switch (fmtl->channels.r.type) {
+  case ISL_UNORM:
+ return ISL_FORMAT_R16_UNORM;
+  case ISL_SNORM:
+ return ISL_FORMAT_R16_SNORM;
+  case ISL_SFLOAT:
+ return ISL_FORMAT_R16_FLOAT;
+  case ISL_UINT:
+ return ISL_FORMAT_R16_UINT;
+  case ISL_SINT:
+ return ISL_FORMAT_R16_SINT;
+  default:
+ unreachable("Invalid 16-bit RGB channel type");
+  }
+   case 32:
+  switch (fmtl->channels.r.type) {
+  case ISL_SFLOAT:
+ return ISL_FORMAT_R32_FLOAT;
+  case ISL_UINT:
+ return ISL_FORMAT_R32_UINT;
+  case ISL_SINT:
+ return ISL_FORMAT_R32_SINT;
+  default:
+ unreachable("Invalid 32-bit RGB channel type");
+  }
+   default:
+  unreachable("Invalid number of red channel bits");
+   }
+}
+
 static void
 surf_fake_rgb_with_red(const struct isl_device *isl_dev,
struct brw_blorp_surface_info *info)
@@ -1648,26 +1698,9 @@ surf_fake_rgb_with_red(const struct isl_device *isl_dev,
info->surf.logical_level0_px.width *= 3;
info->surf.phys_level0_sa.width *= 3;
 
-   enum isl_format red_format;
-   switch (info->view.format) {
-   case ISL_FORMAT_R8G8B8_UNORM:
-  red_format = ISL_FORMAT_R8_UNORM;
-  break;
-   case ISL_FORMAT_R8G8B8_UINT:
-  red_format = ISL_FORMAT_R8_UINT;
-  break;
-   case ISL_FORMAT_R16G16B16_UNORM:
-  red_format = ISL_FORMAT_R16_UNORM;
-  break;
-   case ISL_FORMAT_R16G16B16_UINT:
-  red_format = ISL_FORMAT_R16_UINT;
-  break;
-   case ISL_FORMAT_R32G32B32_UINT:
-  red_format = ISL_FORMAT_R32_UINT;
-  break;
-   default:
-  unreachable("Invalid RGB copy destination format");
-   }
+   enum isl_format red_format =
+  get_red_format_for_rgb_format(info->view.format);
+
assert(isl_format_get_layout(red_format)->channels.r.type ==
   isl_format_get_layout(info->view.format)->channels.r.type);
assert(isl_format_get_layout(red_format)->channels.r.bits ==
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 1/7] blorp: Handle the RGB workaround more like other workarounds

2017-01-24 Thread Jason Ekstrand
The previous version was sort-of strapped on in that it just adjusted
the blit rectangle and trusted in the fact that we would use texelFetch
and round to the nearest integer to ensure that the component positions
matched.  This new version, while slightly more complicated, is more
accurate because all three components end up with exactly the same
dst_pos and so they will get interpolated and sampled at the same
texture coordinate.  This makes the workaround suitable for using with
scaled blits.
---
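Not part of the patch: a sketch of the coordinate arithmetic behind the
workaround.  The destination is bound as a red surface three times as wide,
so each x in that surface maps to one component of one RGB pixel.  The
struct and function names here are made up for illustration.

```c
#include <stdint.h>

struct rgb_texel {
   uint32_t x;      /* pixel x in the real RGB image */
   uint32_t comp;   /* 0 = R, 1 = G, 2 = B */
};

/* Map an x coordinate in the widened red surface back to the RGB pixel
 * it belongs to and the component it stores (the umod/idiv pair in the
 * shader). */
static struct rgb_texel red_x_to_rgb(uint32_t red_x)
{
   struct rgb_texel t = { red_x / 3, red_x % 3 };
   return t;
}

/* Scale a destination x bound (x0 or x1) into the widened surface. */
static uint32_t rgb_x_to_red_x(uint32_t x)
{
   return x * 3;
}
```

Because all three invocations for one pixel recover the same dst_pos, they
sample the source at the same coordinate, which is what makes the
workaround usable for scaled blits.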
 src/intel/blorp/blorp_blit.c | 60 ++--
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
index 111f1c1..fc76fd4 100644
--- a/src/intel/blorp/blorp_blit.c
+++ b/src/intel/blorp/blorp_blit.c
@@ -1138,6 +1138,20 @@ brw_blorp_build_nir_shader(struct blorp_context *blorp, void *mem_ctx,
   key->dst_layout);
}
 
+   nir_ssa_def *comp = NULL;
+   if (key->dst_rgb) {
+  /* The destination image is bound as a red texture three times as wide
+   * as the actual image.  Our shader is effectively running one color
+   * component at a time.  We need to save off the component and adjust
+   * the destination position.
+   */
+  assert(dst_pos->num_components == 2);
+  nir_ssa_def *dst_x = nir_channel(&b, dst_pos, 0);
+  comp = nir_umod(&b, dst_x, nir_imm_int(&b, 3));
+  dst_pos = nir_vec2(&b, nir_idiv(&b, dst_x, nir_imm_int(&b, 3)),
+ nir_channel(&b, dst_pos, 1));
+   }
+
/* Now (X, Y, S) = decode_msaa(dst_samples, detile(dst_tiling, offset)).
 *
 * That is: X, Y and S now contain the true coordinates and sample index of
@@ -1267,8 +1281,6 @@ brw_blorp_build_nir_shader(struct blorp_context *blorp, void *mem_ctx,
* from the source color and write that to destination red.
*/
   assert(dst_pos->num_components == 2);
-  nir_ssa_def *comp =
- nir_umod(&b, nir_channel(&b, dst_pos, 0), nir_imm_int(&b, 3));
 
   nir_ssa_def *color_component =
  nir_bcsel(&b, nir_ieq(&b, comp, nir_imm_int(&b, 0)),
@@ -1543,15 +1555,12 @@ struct blt_coords {
 
 static void
 surf_fake_rgb_with_red(const struct isl_device *isl_dev,
-   struct brw_blorp_surface_info *info,
-   uint32_t *x, uint32_t *width)
+   struct brw_blorp_surface_info *info)
 {
surf_convert_to_single_slice(isl_dev, info);
 
info->surf.logical_level0_px.width *= 3;
info->surf.phys_level0_sa.width *= 3;
-   *x *= 3;
-   *width *= 3;
 
enum isl_format red_format;
switch (info->view.format) {
@@ -1581,28 +1590,6 @@ surf_fake_rgb_with_red(const struct isl_device *isl_dev,
info->surf.format = info->view.format = red_format;
 }
 
-static void
-fake_dest_rgb_with_red(const struct isl_device *dev,
-   struct blorp_params *params,
-   struct brw_blorp_blit_prog_key *wm_prog_key,
-   struct blt_coords *coords)
-{
-   /* Handle RGB destinations for blorp_copy */
-   const struct isl_format_layout *dst_fmtl =
-  isl_format_get_layout(params->dst.surf.format);
-
-   if (dst_fmtl->bpb % 3 == 0) {
-  uint32_t dst_x = coords->x.dst0;
-  uint32_t dst_width = coords->x.dst1 - dst_x;
-  surf_fake_rgb_with_red(dev, &params->dst,
- &dst_x, &dst_width);
-  coords->x.dst0 = dst_x;
-  coords->x.dst1 = dst_x + dst_width;
-  wm_prog_key->dst_rgb = true;
-  wm_prog_key->need_dst_offset = true;
-   }
-}
-
 enum blit_shrink_status {
BLIT_NO_SHRINK = 0,
BLIT_WIDTH_SHRINK = 1,
@@ -1621,8 +1608,6 @@ try_blorp_blit(struct blorp_batch *batch,
 {
const struct gen_device_info *devinfo = batch->blorp->isl_dev->info;
 
-   fake_dest_rgb_with_red(batch->blorp->isl_dev, params, wm_prog_key, coords);
-
if (isl_format_has_sint_channel(params->src.view.format)) {
   wm_prog_key->texture_data_type = nir_type_int;
} else if (isl_format_has_uint_channel(params->src.view.format)) {
@@ -1799,6 +1784,21 @@ try_blorp_blit(struct blorp_batch *batch,
 
params->num_samples = params->dst.surf.samples;
 
+   if (isl_format_get_layout(params->dst.view.format)->bpb % 3 == 0) {
+  /* We can't render to RGB formats natively because they aren't a
+   * power-of-two size.  Instead, we fake them by using a red format
+   * with the same channel type and size and emitting shader code to
+   * only write one channel at a time.
+   */
+  params->x0 *= 3;
+  params->x1 *= 3;
+
+  surf_fake_rgb_with_red(batch->blorp->isl_dev, &params->dst);
+
+  wm_prog_key->dst_rgb = true;
+  wm_prog_key->need_dst_offset = true;
+   }
+
if (params->src.tile_x_sa || params->src.tile_y_sa) {
   assert(wm_prog_key->need_src_offset);
   surf_get_intratile_offset_px(&params->src,
-- 
2.5.0.400.gff86faf


[Mesa-dev] [PATCH 3/7] intel/isl: Add some helpers for working with RGBX formats

2017-01-24 Thread Jason Ekstrand
---
 src/intel/isl/isl.h| 11 +++
 src/intel/isl/isl_format.c | 32 
 2 files changed, 43 insertions(+)

diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h
index 07368f9..9d5b372 100644
--- a/src/intel/isl/isl.h
+++ b/src/intel/isl/isl.h
@@ -1138,8 +1138,19 @@ isl_format_is_rgb(enum isl_format fmt)
   isl_format_layouts[fmt].channels.a.bits == 0;
 }
 
+static inline bool
+isl_format_is_rgbx(enum isl_format fmt)
+{
+   return isl_format_layouts[fmt].channels.r.bits > 0 &&
+  isl_format_layouts[fmt].channels.g.bits > 0 &&
+  isl_format_layouts[fmt].channels.b.bits > 0 &&
+  isl_format_layouts[fmt].channels.a.bits > 0 &&
+  isl_format_layouts[fmt].channels.a.type == ISL_VOID;
+}
+
 enum isl_format isl_format_rgb_to_rgba(enum isl_format rgb) ATTRIBUTE_CONST;
 enum isl_format isl_format_rgb_to_rgbx(enum isl_format rgb) ATTRIBUTE_CONST;
+enum isl_format isl_format_rgbx_to_rgba(enum isl_format rgb) ATTRIBUTE_CONST;
 
 bool isl_is_storage_image_format(enum isl_format fmt);
 
diff --git a/src/intel/isl/isl_format.c b/src/intel/isl/isl_format.c
index c8daece..8473285 100644
--- a/src/intel/isl/isl_format.c
+++ b/src/intel/isl/isl_format.c
@@ -623,3 +623,35 @@ isl_format_rgb_to_rgbx(enum isl_format rgb)
   return ISL_FORMAT_UNSUPPORTED;
}
 }
+
+enum isl_format
+isl_format_rgbx_to_rgba(enum isl_format rgbx)
+{
+   assert(isl_format_is_rgbx(rgbx));
+
+   switch (rgbx) {
+   case ISL_FORMAT_R32G32B32X32_FLOAT:
+  return ISL_FORMAT_R32G32B32A32_FLOAT;
+   case ISL_FORMAT_R16G16B16X16_UNORM:
+  return ISL_FORMAT_R16G16B16A16_UNORM;
+   case ISL_FORMAT_R16G16B16X16_FLOAT:
+  return ISL_FORMAT_R16G16B16A16_FLOAT;
+   case ISL_FORMAT_B8G8R8X8_UNORM:
+  return ISL_FORMAT_B8G8R8A8_UNORM;
+   case ISL_FORMAT_B8G8R8X8_UNORM_SRGB:
+  return ISL_FORMAT_B8G8R8A8_UNORM_SRGB;
+   case ISL_FORMAT_R8G8B8X8_UNORM:
+  return ISL_FORMAT_R8G8B8A8_UNORM;
+   case ISL_FORMAT_R8G8B8X8_UNORM_SRGB:
+  return ISL_FORMAT_R8G8B8A8_UNORM_SRGB;
+   case ISL_FORMAT_B10G10R10X2_UNORM:
+  return ISL_FORMAT_B10G10R10A2_UNORM;
+   case ISL_FORMAT_B5G5R5X1_UNORM:
+  return ISL_FORMAT_B5G5R5A1_UNORM;
+   case ISL_FORMAT_B5G5R5X1_UNORM_SRGB:
+  return ISL_FORMAT_B5G5R5A1_UNORM_SRGB;
+   default:
+  assert(!"Invalid RGBX format");
+  return rgbx;
+   }
+}
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 5/8] glsl: Rewrite atan2 implementation to fix accuracy and handling of zero/infinity.

2017-01-24 Thread Francisco Jerez
This addresses several issues of the current atan2 implementation:

 - Negative zero (and negative denorms which end up getting flushed to
   zero) isn't handled correctly by the current implementation.  The
   reason is that it does 'y >= 0' and 'x < 0' comparisons to decide
   on which side of the branch cut the argument is, which causes us to
   return incorrect results (off by up to 2π) for very small negative
   values.

 - There is a serious precision problem for x values of large enough
   magnitude introduced by the floating point division operation being
   implemented as a mul+rcp sequence.  This can lead to the quotient
   getting flushed to zero in some cases introducing an error of over
   8e6 ULP in the result -- Or in the most catastrophic case will
   cause us to return NaN instead of the correct value ±π/2 for y=±∞
   and x very large.  We can fix this easily by scaling down both
   arguments when the absolute value of the denominator goes above
   a certain threshold.  The error of this atan2 implementation remains
   below 25 ULP in most of its domain except for a neighborhood of y=0
   where it reaches a maximum error of about 180 ULP.

 - It emits a bunch of instructions including no less than three
   if-else branches per scalar component that don't seem to get
   optimized out later on.  This implementation uses about 13% fewer
   instructions on Intel SKL hardware and doesn't emit any control
   flow instructions.
---
 src/compiler/glsl/builtin_functions.cpp | 82 ++---
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/src/compiler/glsl/builtin_functions.cpp b/src/compiler/glsl/builtin_functions.cpp
index 4a6c5af..fd59381 100644
--- a/src/compiler/glsl/builtin_functions.cpp
+++ b/src/compiler/glsl/builtin_functions.cpp
@@ -3560,44 +3560,54 @@ builtin_builder::_acos(const glsl_type *type)
 ir_function_signature *
 builtin_builder::_atan2(const glsl_type *type)
 {
-   ir_variable *vec_y = in_var(type, "vec_y");
-   ir_variable *vec_x = in_var(type, "vec_x");
-   MAKE_SIG(type, always_available, 2, vec_y, vec_x);
-
-   ir_variable *vec_result = body.make_temp(type, "vec_result");
-   ir_variable *r = body.make_temp(glsl_type::float_type, "r");
-   for (int i = 0; i < type->vector_elements; i++) {
-  ir_variable *y = body.make_temp(glsl_type::float_type, "y");
-  ir_variable *x = body.make_temp(glsl_type::float_type, "x");
-  body.emit(assign(y, swizzle(vec_y, i, 1)));
-  body.emit(assign(x, swizzle(vec_x, i, 1)));
-
-  /* If |x| >= 1.0e-8 * |y|: */
-  ir_if *outer_if =
- new(mem_ctx) ir_if(greater(abs(x), mul(imm(1.0e-8f), abs(y))));
-
-  ir_factory outer_then(&outer_if->then_instructions, mem_ctx);
-
-  /* Then...call atan(y/x) */
-  do_atan(outer_then, glsl_type::float_type, r, div(y, x));
-
-  /* ...and fix it up: */
-  ir_if *inner_if = new(mem_ctx) ir_if(less(x, imm(0.0f)));
-  inner_if->then_instructions.push_tail(
- if_tree(gequal(y, imm(0.0f)),
- assign(r, add(r, imm(M_PIf))),
- assign(r, sub(r, imm(M_PIf)))));
-  outer_then.emit(inner_if);
-
-  /* Else... */
-  outer_if->else_instructions.push_tail(
- assign(r, mul(sign(y), imm(M_PI_2f))));
+   const unsigned n = type->vector_elements;
+   ir_variable *y = in_var(type, "y");
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, always_available, 2, y, x);
 
-  body.emit(outer_if);
+   /* If we're on the left half-plane rotate the coordinates π/2 clock-wise
+* for the y=0 discontinuity to end up aligned with the vertical
+* discontinuity of atan(s/t) along t=0.
+*/
+   ir_variable *flip = body.make_temp(glsl_type::bvec(n), "flip");
+   body.emit(assign(flip, less(x, imm(0.0f, n))));
+   ir_variable *s = body.make_temp(type, "s");
+   body.emit(assign(s, csel(flip, abs(x), y)));
+   ir_variable *t = body.make_temp(type, "t");
+   body.emit(assign(t, csel(flip, y, abs(x))));
 
-  body.emit(assign(vec_result, r, 1 << i));
-   }
-   body.emit(ret(vec_result));
+   /* If the magnitude of the denominator exceeds some huge value, scale down
+* the arguments in order to prevent the reciprocal operation from flushing
+* its result to zero, which would cause precision problems, and for s
+* infinite would cause us to return a NaN instead of the correct finite
+* value.
+*/
+   ir_constant *huge = imm(1e37f, n);
+   ir_variable *scale = body.make_temp(type, "scale");
+   body.emit(assign(scale, csel(gequal(abs(t), huge),
+imm(0.0625f, n), imm(1.0f, n))));
+   ir_variable *rcp_scaled_t = body.make_temp(type, "rcp_scaled_t");
+   body.emit(assign(rcp_scaled_t, rcp(mul(t, scale))));
+   ir_expression *s_over_t = mul(mul(s, scale), rcp_scaled_t);
+
+   /* Calculate the arctangent and fix up the result if we had flipped the
+* coordinate system.
+*/
+   ir_variable *arc = body.make_temp(type, "arc");
+   

[Mesa-dev] [PATCH 0/7] intel/blorp: Be able to blit to ANYTHING!!!

2017-01-24 Thread Jason Ekstrand
This somewhat tongue-in-cheek series adds support to BLORP for blitting to
a lot more different destination formats.  We now even support the crazy
R9G9B9E5_SHAREDEXP format by emitting shader code to do the conversion.
The result of this is that we can now use blorp for almost all blit
operations in GL, and every Vulkan format we support in any way, shape, or
form is now supported for vkCmdBlitImage.  Why?  Because we can!

Jason Ekstrand (7):
  blorp: Handle the RGB workaround more like other workarounds
  intel/blorp: Handle more exotic destination formats
  intel/isl: Add some helpers for working with RGBX formats
  intel/blorp: Silently convert RGBX destination formats to RGBA
  intel/blorp: Support the RGB workaround on more formats
  anv: Allow blitting to/from any supported format
  i965/blorp: Remove a pile of blorp_blit restrictions

 src/intel/blorp/blorp_blit.c  | 229 ++
 src/intel/blorp/blorp_priv.h  |   6 +
 src/intel/isl/isl.h   |  11 ++
 src/intel/isl/isl_format.c|  32 +
 src/intel/vulkan/anv_formats.c|  10 +-
 src/mesa/drivers/dri/i965/brw_blorp.c |  64 +-
 6 files changed, 265 insertions(+), 87 deletions(-)

-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 4/8] i965/fs: Fix nir_op_fsign of absolute value.

2017-01-24 Thread Francisco Jerez
This does point at the front-end emitting silly code that could have
been optimized out, but the current fsign implementation would emit
bogus IR if abs was set for the argument (because it would apply the
abs modifier on an unsigned integer type), and we shouldn't rely on
the upper layer's optimization passes for correctness.
---
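Not part of the patch: a CPU sketch of the two fsign paths the hunk below
deals with.  The function names are made up; the first mirrors the new
"source has abs set" path (MOV with a nonzero condition, then a predicated
MOV of 1.0), the second mirrors the existing 32-bit path that works on the
sign bit.  NaN handling on the hardware is not covered here.

```c
#include <stdint.h>
#include <string.h>

/* sign(|x|): the source is non-negative, so the result is 0 or 1. */
static float fsign_of_abs(float x)
{
   float result = x < 0.0f ? -x : x;   /* abs source modifier */
   if (result != 0.0f)                 /* conditional mod NZ */
      result = 1.0f;                   /* predicated MOV of 1.0 */
   return result;
}

/* General 32-bit path: keep the sign bit, OR in 1.0 when val != 0. */
static float fsign_bits(float val)
{
   uint32_t u;
   float result;

   memcpy(&u, &val, sizeof u);
   u &= 0x80000000u;
   if (val != 0.0f)
      u |= 0x3f800000u;
   memcpy(&result, &u, sizeof result);
   return result;
}
```

The second function is why abs is a problem on that path: the AND and OR
operate on the value as an unsigned integer, where an abs source modifier
makes no sense.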
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index e1ab598..e0c2fa0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -701,7 +701,14 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
   break;
 
case nir_op_fsign: {
-  if (type_sz(op[0].type) < 8) {
+  if (op[0].abs) {
+ /* Straightforward since the source can be assumed to be
+  * non-negative.
+  */
+ set_condmod(BRW_CONDITIONAL_NZ, bld.MOV(result, op[0]));
+ set_predicate(BRW_PREDICATE_NORMAL, bld.MOV(result, brw_imm_f(1.0f)));
+
+  } else if (type_sz(op[0].type) < 8) {
  /* AND(val, 0x80000000) gives the sign bit.
   *
   * Predicated OR ORs 1.0 (0x3f800000) with the sign bit if val is not
-- 
2.10.2



[Mesa-dev] [PATCH 2/8] glsl: Fix constant evaluation of the rcp op.

2017-01-24 Thread Francisco Jerez
Will avoid a regression in a future commit that introduces some
additional rcp operations.
---
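Not part of the patch: the two constant-folding rules side by side in plain
C, assuming IEEE 754 float division (so 1/0 gives infinity).  The function
names are made up.

```c
/* Old rule: rcp(0) folds to 0. */
static float rcp_old(float x)
{
   return x != 0.0f ? 1.0f / x : 0.0f;
}

/* New rule: plain division, so rcp(+0) is +inf and rcp(-0) is -inf. */
static float rcp_new(float x)
{
   return 1.0f / x;
}
```

The two only differ at x = ±0, which is exactly the input where folding to
0 instead of ±inf changes what code built on top of rcp computes.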
 src/compiler/glsl/ir_expression_operation.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/glsl/ir_expression_operation.py b/src/compiler/glsl/ir_expression_operation.py
index f91ac9b..4ac1ffb 100644
--- a/src/compiler/glsl/ir_expression_operation.py
+++ b/src/compiler/glsl/ir_expression_operation.py
@@ -422,7 +422,7 @@ ir_expression_operation = [
operation("neg", 1, source_types=numeric_types, c_expression={'u': "-((int) {src0})", 'default': "-{src0}"}),
operation("abs", 1, source_types=signed_numeric_types, c_expression={'i': "{src0} < 0 ? -{src0} : {src0}", 'f': "fabsf({src0})", 'd': "fabs({src0})", 'i64': "{src0} < 0 ? -{src0} : {src0}"}),
operation("sign", 1, source_types=signed_numeric_types, c_expression={'i': "({src0} > 0) - ({src0} < 0)", 'f': "float(({src0} > 0.0F) - ({src0} < 0.0F))", 'd': "double(({src0} > 0.0) - ({src0} < 0.0))", 'i64': "({src0} > 0) - ({src0} < 0)"}),
-   operation("rcp", 1, source_types=real_types, c_expression={'f': "{src0} != 0.0F ? 1.0F / {src0} : 0.0F", 'd': "{src0} != 0.0 ? 1.0 / {src0} : 0.0"}),
+   operation("rcp", 1, source_types=real_types, c_expression={'f': "1.0F / {src0}", 'd': "1.0 / {src0}"}),
operation("rsq", 1, source_types=real_types, c_expression={'f': "1.0F / sqrtf({src0})", 'd': "1.0 / sqrt({src0})"}),
operation("sqrt", 1, source_types=real_types, c_expression={'f': "sqrtf({src0})", 'd': "sqrt({src0})"}),
operation("exp", 1, source_types=(float_type,), c_expression="expf({src0})"), # Log base e on gentype
-- 
2.10.2



[Mesa-dev] [PATCH 3/8] glsl/ir_builder: Add rcp builder.

2017-01-24 Thread Francisco Jerez
---
 src/compiler/glsl/ir_builder.cpp | 6 ++
 src/compiler/glsl/ir_builder.h   | 1 +
 2 files changed, 7 insertions(+)

diff --git a/src/compiler/glsl/ir_builder.cpp b/src/compiler/glsl/ir_builder.cpp
index 0cee856..8d61533 100644
--- a/src/compiler/glsl/ir_builder.cpp
+++ b/src/compiler/glsl/ir_builder.cpp
@@ -315,6 +315,12 @@ exp(operand a)
 }
 
 ir_expression *
+rcp(operand a)
+{
+   return expr(ir_unop_rcp, a);
+}
+
+ir_expression *
 rsq(operand a)
 {
return expr(ir_unop_rsq, a);
diff --git a/src/compiler/glsl/ir_builder.h b/src/compiler/glsl/ir_builder.h
index 5ee9412..ff1ff70 100644
--- a/src/compiler/glsl/ir_builder.h
+++ b/src/compiler/glsl/ir_builder.h
@@ -148,6 +148,7 @@ ir_expression *neg(operand a);
 ir_expression *sin(operand a);
 ir_expression *cos(operand a);
 ir_expression *exp(operand a);
+ir_expression *rcp(operand a);
 ir_expression *rsq(operand a);
 ir_expression *sqrt(operand a);
 ir_expression *log(operand a);
-- 
2.10.2



[Mesa-dev] [PATCH 6/8] nir/spirv/glsl450: Rewrite atan2 implementation to fix accuracy and handling of zero/infinity.

2017-01-24 Thread Francisco Jerez
See "glsl: Rewrite atan2 implementation to fix accuracy and handling
of zero/infinity." for the rationale, but note that the instruction
count benefit discussed there is somewhat less important for the SPIRV
implementation, because the current code already emitted no control
flow instructions -- Still this saves us one hardware instruction per
scalar component on Intel SKL hardware.

Fixes the following Vulkan CTS tests on Intel hardware:

dEQP-VK.glsl.builtin.precision.atan2.highp_compute.scalar
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec2
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec3
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec4
dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec2
dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec4

Note that most of the test-cases above expect IEEE-compliant handling
of atan2(±∞, ±∞), which this patch doesn't explicitly handle, so
except for the last two the test-cases above weren't expected to pass
yet.  The reason they do is that the i965 back-end implementation of
the NIR fmin and fmax instructions is not quite GLSL-compliant (it
complies with IEEE 754 recommendations though), because fmin/fmax of a
NaN and a non-NaN argument currently always return the non-NaN
argument, which causes atan() to flush NaN to one and return the
expected value.  The front-end should probably not be relying on this
behavior for correctness though because other back-ends are likely to
behave differently -- A follow-up patch will handle the atan2(±∞, ±∞)
corner cases explicitly.
---
 src/compiler/spirv/vtn_glsl450.c | 61 ++--
 1 file changed, 40 insertions(+), 21 deletions(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index 0d32fdd..508f218 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -302,28 +302,47 @@ build_atan(nir_builder *b, nir_ssa_def *y_over_x)
 static nir_ssa_def *
 build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)
 {
-   nir_ssa_def *zero = nir_imm_float(b, 0.0f);
+   nir_ssa_def *zero = nir_imm_float(b, 0);
+   nir_ssa_def *one = nir_imm_float(b, 1);
 
-   /* If |x| >= 1.0e-8 * |y|: */
-   nir_ssa_def *condition =
-  nir_fge(b, nir_fabs(b, x),
-  nir_fmul(b, nir_imm_float(b, 1.0e-8f), nir_fabs(b, y)));
-
-   /* Then...call atan(y/x) and fix it up: */
-   nir_ssa_def *atan1 = build_atan(b, nir_fdiv(b, y, x));
-   nir_ssa_def *r_then =
-  nir_bcsel(b, nir_flt(b, x, zero),
-   nir_fadd(b, atan1,
-   nir_bcsel(b, nir_fge(b, y, zero),
-nir_imm_float(b, M_PIf),
-nir_imm_float(b, -M_PIf))),
-   atan1);
-
-   /* Else... */
-   nir_ssa_def *r_else =
-  nir_fmul(b, nir_fsign(b, y), nir_imm_float(b, M_PI_2f));
-
-   return nir_bcsel(b, condition, r_then, r_else);
+   /* If we're on the left half-plane rotate the coordinates π/2 clock-wise
+* for the y=0 discontinuity to end up aligned with the vertical
+* discontinuity of atan(s/t) along t=0.
+*/
+   nir_ssa_def *flip = nir_flt(b, x, zero);
+   nir_ssa_def *s = nir_bcsel(b, flip, nir_fabs(b, x), y);
+   nir_ssa_def *t = nir_bcsel(b, flip, y, nir_fabs(b, x));
+
+   /* If the magnitude of the denominator exceeds some huge value, scale down
+* the arguments in order to prevent the reciprocal operation from flushing
+* its result to zero, which would cause precision problems, and for s
+* infinite would cause us to return a NaN instead of the correct finite
+* value.
+*/
+   nir_ssa_def *huge = nir_imm_float(b, 1e37f);
+   nir_ssa_def *scale = nir_bcsel(b, nir_fge(b, nir_fabs(b, t), huge),
+  nir_imm_float(b, 0.0625), one);
+   nir_ssa_def *rcp_scaled_t = nir_frcp(b, nir_fmul(b, t, scale));
+   nir_ssa_def *s_over_t = nir_fmul(b, nir_fmul(b, s, scale), rcp_scaled_t);
+
+   /* Calculate the arctangent and fix up the result if we had flipped the
+* coordinate system.
+*/
+   nir_ssa_def *arc = nir_fadd(b, nir_fmul(b, nir_b2f(b, flip),
+   nir_imm_float(b, M_PI_2f)),
+   build_atan(b, nir_fabs(b, s_over_t)));
+
+   /* Rather convoluted calculation of the sign of the result.  When x < 0 we
+* cannot use fsign because we need to be able to distinguish between
+* negative and positive zero.  We don't use bitwise arithmetic tricks for
+* consistency with the GLSL front-end.  When x >= 0 rcp_scaled_t will
+* always be non-negative so this won't be able to distinguish between
+* negative and positive zero, but we don't care because atan2 is
+* continuous along the whole positive y = 0 half-line, so it won't affect
+* the result.
+*/
+   return nir_bcsel(b, nir_flt(b, nir_fmin(b, y, rcp_scaled_t), zero),
+

[Mesa-dev] [PATCH] [swr] Update fs texture & sampler state logic

2017-01-24 Thread George Kyriazis
In swr_update_derived() update texture and sampler state on a new fragment
shader.  GALLIUM_HUD can update fs using a previously bound texture and
sampler.
---
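Not part of the patch: the dirty-bit tests before and after this change,
reduced to plain C.  The flag names are the ones in the diff, but the bit
values below are invented for the sketch and do not match swr_context.h.

```c
#include <stdbool.h>

/* Hypothetical bit assignments, for illustration only. */
enum {
   SWR_NEW_SAMPLER      = 1u << 0,
   SWR_NEW_SAMPLER_VIEW = 1u << 1,
   SWR_NEW_FRAMEBUFFER  = 1u << 2,
   SWR_NEW_FS           = 1u << 3,
};

/* Before: a new fragment shader alone did not refresh sampler state. */
static bool sampler_update_old(unsigned dirty)
{
   return (dirty & SWR_NEW_SAMPLER) != 0;
}

/* After: a new fragment shader also triggers the refresh. */
static bool sampler_update_new(unsigned dirty)
{
   return (dirty & (SWR_NEW_SAMPLER | SWR_NEW_FS)) != 0;
}

static bool texture_update_new(unsigned dirty)
{
   return (dirty & (SWR_NEW_SAMPLER_VIEW |
                    SWR_NEW_FRAMEBUFFER |
                    SWR_NEW_FS)) != 0;
}
```

The GALLIUM_HUD case is dirty == SWR_NEW_FS: the old test skips the update
and the new one performs it.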
 src/gallium/drivers/swr/swr_state.cpp | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp b/src/gallium/drivers/swr/swr_state.cpp
index 41e0356..f1f4963 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1283,7 +1283,8 @@ swr_update_derived(struct pipe_context *pipe,
   SwrSetPixelShaderState(ctx->swrContext, &psState);
 
   /* JIT sampler state */
-  if (ctx->dirty & SWR_NEW_SAMPLER) {
+  if (ctx->dirty & (SWR_NEW_SAMPLER |
+SWR_NEW_FS)) {
  swr_update_sampler_state(ctx,
   PIPE_SHADER_FRAGMENT,
   key.nr_samplers,
@@ -1291,7 +1292,9 @@ swr_update_derived(struct pipe_context *pipe,
   }
 
   /* JIT sampler view state */
-  if (ctx->dirty & (SWR_NEW_SAMPLER_VIEW | SWR_NEW_FRAMEBUFFER)) {
+  if (ctx->dirty & (SWR_NEW_SAMPLER_VIEW |
+SWR_NEW_FRAMEBUFFER |
+SWR_NEW_FS)) {
  swr_update_texture_state(ctx,
   PIPE_SHADER_FRAGMENT,
   key.nr_sampler_views,
-- 
2.10.0.windows.1



[Mesa-dev] [PATCH] i965: Fix fast depth clears for surfaces with a dimension of 16384.

2017-01-24 Thread Kenneth Graunke
I hadn't bothered to set this bit because I figured it would just
paper over us getting the rectangle wrong.  But it turns out that
there is a legitimate reason to use it, so let's do so.

The alternative would be to chop up 16k clears into multiple 8k clears,
which is pointlessly painful.

Signed-off-by: Kenneth Graunke 
---
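Not part of the patch: the arithmetic behind the 16384 problem, as a sketch.
It takes the comment in the diff at face value (the X/Y Max fields are
exclusive and cannot hold more than 16383); the helper names are made up,
and the patch itself simply sets the full-surface bit unconditionally
rather than testing for this case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest value the "Clear Rectangle X/Y Max" field can hold, per the patch. */
#define CLEAR_RECT_MAX_FIELD 16383u

/* Pixels along one axis covered by a clear rectangle starting at 0 with
 * an exclusive max field: [0, max_field). */
static uint32_t clear_rect_coverage(uint32_t size)
{
   return size < CLEAR_RECT_MAX_FIELD ? size : CLEAR_RECT_MAX_FIELD;
}

/* True when the rectangle alone cannot reach the last row or column. */
static bool rect_misses_pixels(uint32_t width, uint32_t height)
{
   return clear_rect_coverage(width) < width ||
          clear_rect_coverage(height) < height;
}
```

For a 16384-wide surface the rectangle covers 16383 pixels, one short,
while anything up to 16383 is covered exactly.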
 src/mesa/drivers/dri/i965/gen8_depth_state.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c b/src/mesa/drivers/dri/i965/gen8_depth_state.c
index ec296698267..de5a16e91bf 100644
--- a/src/mesa/drivers/dri/i965/gen8_depth_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c
@@ -477,6 +477,17 @@ gen8_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt,
   break;
case BLORP_HIZ_OP_DEPTH_CLEAR:
   dw1 |= GEN8_WM_HZ_DEPTH_CLEAR;
+
+  /* The "Clear Rectangle X Max" (and Y Max) fields are exclusive,
+   * rather than inclusive, and limited to 16383.  This means that
+   * for a 16384x16384 render target, we would miss the last pixel.
+   *
+   * To work around this, we have to set the "Full Surface Depth
+   * and Stencil Clear" bit.  We can do this in all cases because
+   * we always clear the full rectangle anyway.  We'll need to
+   * change this if we ever add scissored clear support.
+   */
+  dw1 |= GEN8_WM_HZ_FULL_SURFACE_DEPTH_CLEAR;
   break;
case BLORP_HIZ_OP_NONE:
   unreachable("Should not get here.");
-- 
2.11.0
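
To make the arithmetic in the comment above concrete, here is a minimal sketch of why an exclusive bound capped at 16383 cannot cover a 16384-wide surface. The helper names and the `RECT_MAX_FIELD_LIMIT` constant are made up for illustration and are not part of the i965 driver; only the 16383 limit comes from the patch comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest value the "Clear Rectangle X Max" field can hold, taken from
 * the comment in the patch.  The macro name is hypothetical. */
#define RECT_MAX_FIELD_LIMIT 16383u

/* Number of pixel columns covered by an exclusive [0, x_max) rectangle
 * once x_max has been clamped to what the field can encode. */
static uint32_t covered_columns(uint32_t surface_width)
{
   uint32_t x_max = surface_width;      /* exclusive bound we would like */
   if (x_max > RECT_MAX_FIELD_LIMIT)
      x_max = RECT_MAX_FIELD_LIMIT;     /* all the field can express */
   return x_max;
}

/* True when the rectangle alone cannot cover the whole surface, which is
 * the case the "full surface" bit is meant to handle. */
static bool needs_full_surface_bit(uint32_t surface_width)
{
   assert(surface_width >= 1 && surface_width <= 16384);
   return covered_columns(surface_width) < surface_width;
}
```

With these helpers, a width of 16384 yields 16383 covered columns, so the last column is missed unless the full-surface bit is set; any smaller width is covered by the rectangle alone.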



[Mesa-dev] [PATCH 7/8] glsl: Implement IEEE-compliant handling of atan2(±∞, ±∞).

2017-01-24 Thread Francisco Jerez
---
 src/compiler/glsl/builtin_functions.cpp | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/compiler/glsl/builtin_functions.cpp 
b/src/compiler/glsl/builtin_functions.cpp
index fd59381..9d6ab80 100644
--- a/src/compiler/glsl/builtin_functions.cpp
+++ b/src/compiler/glsl/builtin_functions.cpp
@@ -3590,11 +3590,31 @@ builtin_builder::_atan2(const glsl_type *type)
body.emit(assign(rcp_scaled_t, rcp(mul(t, scale))));
ir_expression *s_over_t = mul(mul(s, scale), rcp_scaled_t);
 
+   /* For |x| = |y| assume tan = 1 even if infinite (i.e. pretend momentarily
+* that ∞/∞ = 1) in order to comply with the rather artificial rules
+* inherited from IEEE 754-2008, namely:
+*
+*  "atan2(±∞, −∞) is ±3π/4
+*   atan2(±∞, +∞) is ±π/4"
+*
+* Note that this is inconsistent with the rules for the neighborhood of
+* zero that are based on iterated limits:
+*
+*  "atan2(±0, −0) is ±π
+*   atan2(±0, +0) is ±0"
+*
+* but GLSL specifically allows implementations to deviate from IEEE rules
+* at (0,0), so we take that license (i.e. pretend that 0/0 = 1 here as
+* well).
+*/
+   ir_expression *tan = csel(equal(abs(x), abs(y)),
+ imm(1.0f, n), abs(s_over_t));
+
/* Calculate the arctangent and fix up the result if we had flipped the
 * coordinate system.
 */
ir_variable *arc = body.make_temp(type, "arc");
-   do_atan(body, type, arc, abs(s_over_t));
+   do_atan(body, type, arc, tan);
body.emit(assign(arc, add(arc, mul(b2f(flip), imm(M_PI_2f)))));
 
/* Rather convoluted calculation of the sign of the result.  When x < 0 we
-- 
2.10.2
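
The effect of the added csel can be shown on plain scalars. This is only a sketch: `abs_tan` is a hypothetical stand-in that divides y by x directly, whereas the real built-in works on the rescaled s and t values, so it illustrates the |x| = |y| special case and nothing else.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical scalar stand-in for the csel added by the patch: when
 * |x| == |y| the tangent is taken to be exactly 1, so inf/inf (and 0/0)
 * never reach the division. */
static float abs_tan(float y, float x)
{
   return fabsf(x) == fabsf(y) ? 1.0f : fabsf(y / x);
}

/* A few spot checks of the behaviour described in the patch comment. */
static void abs_tan_examples(void)
{
   assert(isnan(INFINITY / INFINITY));            /* the plain ratio is NaN */
   assert(abs_tan(INFINITY, -INFINITY) == 1.0f);  /* pretend inf/inf = 1 */
   assert(abs_tan(0.0f, -0.0f) == 1.0f);          /* the (0,0) license */
   assert(abs_tan(1.0f, 2.0f) == 0.5f);           /* ordinary inputs unchanged */
}
```

Feeding a tangent of 1 into the arctangent gives π/4, and the later flip and sign fix-ups in the built-in then turn that into the ±π/4 or ±3π/4 results quoted from IEEE 754-2008.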



[Mesa-dev] [PATCH 8/8] nir/spirv/glsl450: Implement IEEE-compliant handling of atan2(±∞, ±∞).

2017-01-24 Thread Francisco Jerez
---
 src/compiler/spirv/vtn_glsl450.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index 508f218..7af2dad 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -325,12 +325,32 @@ build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def 
*x)
nir_ssa_def *rcp_scaled_t = nir_frcp(b, nir_fmul(b, t, scale));
nir_ssa_def *s_over_t = nir_fmul(b, nir_fmul(b, s, scale), rcp_scaled_t);
 
+   /* For |x| = |y| assume tan = 1 even if infinite (i.e. pretend momentarily
+* that ∞/∞ = 1) in order to comply with the rather artificial rules
+* inherited from IEEE 754-2008, namely:
+*
+*  "atan2(±∞, −∞) is ±3π/4
+*   atan2(±∞, +∞) is ±π/4"
+*
+* Note that this is inconsistent with the rules for the neighborhood of
+* zero that are based on iterated limits:
+*
+*  "atan2(±0, −0) is ±π
+*   atan2(±0, +0) is ±0"
+*
+* but GLSL specifically allows implementations to deviate from IEEE rules
+* at (0,0), so we take that license (i.e. pretend that 0/0 = 1 here as
+* well).
+*/
+   nir_ssa_def *tan = nir_bcsel(b, nir_feq(b, nir_fabs(b, x), nir_fabs(b, y)),
+one, nir_fabs(b, s_over_t));
+
/* Calculate the arctangent and fix up the result if we had flipped the
 * coordinate system.
 */
nir_ssa_def *arc = nir_fadd(b, nir_fmul(b, nir_b2f(b, flip),
nir_imm_float(b, M_PI_2f)),
-   build_atan(b, nir_fabs(b, s_over_t)));
+   build_atan(b, tan));
 
/* Rather convoluted calculation of the sign of the result.  When x < 0 we
 * cannot use fsign because we need to be able to distinguish between
-- 
2.10.2



[Mesa-dev] [PATCH 1/8] mesa/program: Translate csel operation from GLSL IR.

2017-01-24 Thread Francisco Jerez
This will be used internally by the GLSL front-end in order to
implement some built-in functions. Plumb it through MESA IR for
back-ends that rely on this translation pass.
---
 src/mesa/program/ir_to_mesa.cpp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp
index 0ae797f..5ff7304 100644
--- a/src/mesa/program/ir_to_mesa.cpp
+++ b/src/mesa/program/ir_to_mesa.cpp
@@ -1360,13 +1360,17 @@ ir_to_mesa_visitor::visit(ir_expression *ir)
   emit(ir, OPCODE_LRP, result_dst, op[2], op[1], op[0]);
   break;
 
+   case ir_triop_csel:
+  op[0].negate = ~op[0].negate;
+  emit(ir, OPCODE_CMP, result_dst, op[0], op[1], op[2]);
+  break;
+
case ir_binop_vector_extract:
case ir_triop_fma:
case ir_triop_bitfield_extract:
case ir_triop_vector_insert:
case ir_quadop_bitfield_insert:
case ir_binop_ldexp:
-   case ir_triop_csel:
case ir_binop_carry:
case ir_binop_borrow:
case ir_binop_imul_high:
-- 
2.10.2
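
For readers unfamiliar with the CMP opcode, the negate trick in the patch can be sketched on scalars. This assumes, as I understand Mesa IR, that booleans are stored as 0.0 (false) or 1.0 (true); the function names are hypothetical and only model the semantics.

```c
#include <assert.h>

/* ARB-program style CMP: picks a where the condition operand is negative,
 * b otherwise. */
static float op_cmp(float cond, float a, float b)
{
   return cond < 0.0f ? a : b;
}

/* csel(c, a, b) with c stored as 0.0 or 1.0 (assumed representation):
 * negating c makes "true" the only negative value, so CMP selects a for
 * true and b for false. */
static float csel_via_cmp(float c, float a, float b)
{
   assert(c == 0.0f || c == 1.0f);
   return op_cmp(-c, a, b);
}
```

Note that a false condition becomes -0.0 after negation, which does not compare less than zero, so the second value is still selected.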



Re: [Mesa-dev] [PATCH 01/31] radv: program a default point size.

2017-01-24 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On Fri, Jan 20, 2017 at 4:02 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> Along the lines of what
> 3b804819 anv: Default PointSize to 1.0 if not written by the shader
> does for anv, program a default point size in the hw of 1.0.
>
> This preemptively fixes a bunch of geom shader tests.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c 
> b/src/amd/vulkan/radv_cmd_buffer.c
> index c6f238b..c62d275 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -438,7 +438,8 @@ radv_emit_graphics_raster_state(struct radv_cmd_buffer 
> *cmd_buffer,
>raster->spi_interp_control);
>
> radeon_set_context_reg_seq(cmd_buffer->cs, R_028A00_PA_SU_POINT_SIZE, 
> 2);
> -   radeon_emit(cmd_buffer->cs, 0);
> +   unsigned tmp = (unsigned)(1.0 * 8.0);
> +   radeon_emit(cmd_buffer->cs, S_028A00_HEIGHT(tmp) | 
> S_028A00_WIDTH(tmp));
> radeon_emit(cmd_buffer->cs, 
> S_028A04_MIN_SIZE(radv_pack_float_12p4(0)) |
> S_028A04_MAX_SIZE(radv_pack_float_12p4(8192/2))); /* 
> R_028A04_PA_SU_POINT_MINMAX */
>
> --
> 2.9.3
>
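
The `1.0 * 8.0` in the quoted patch reads like a fixed-point conversion. A minimal sketch, assuming (this is not stated in the thread) that the HEIGHT and WIDTH fields hold half the point size in 12.4 fixed point, so that size / 2 * 16 = size * 8. `point_half_size_12p4` is a hypothetical name, not a radv function.

```c
#include <assert.h>

/* Hypothetical helper: convert a point size in pixels to a half-size in
 * 1/16-pixel units (12.4 fixed point), i.e. size / 2 * 16 = size * 8.
 * The register interpretation is an assumption, see the note above. */
static unsigned point_half_size_12p4(float size)
{
   assert(size >= 0.0f);
   return (unsigned)(size * 8.0f);
}
```

Under that assumption a default size of 1.0 becomes 8, which matches the value the patch writes into both fields.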


Re: [Mesa-dev] [PATCH 23/31] radv/ac: handle case of swizzle with single components in get_alu_src.

2017-01-24 Thread Bas Nieuwenhuizen
How are you hitting this? The enclosing if is (need_swizzle ||
num_components != src_components) and if src_components =
num_components = 1, then need_swizzle should be false?

On Fri, Jan 20, 2017 at 4:03 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This gets hit with some geom shaders.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/common/ac_nir_to_llvm.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 92e2b44..97e352b 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -844,7 +844,10 @@ static LLVMValueRef get_alu_src(struct 
> nir_to_llvm_context *ctx,
> LLVMConstInt(ctx->i32, src.swizzle[2], false),
> LLVMConstInt(ctx->i32, src.swizzle[3], false)};
>
> -   if (src_components > 1 && num_components == 1) {
> +   if (src_components == 1 && num_components == 1) {
> +   value = LLVMBuildExtractElement(ctx->builder, value,
> +   masks[0], "");
> +   } else if (src_components > 1 && num_components == 1) {
> value = LLVMBuildExtractElement(ctx->builder, value,
> masks[0], "");
> } else if (src_components == 1 && num_components > 1) {
> --
> 2.9.3
>


Re: [Mesa-dev] [PATCH] radeonsi Add disk shader cache

2017-01-24 Thread kdj0c

On 24/01/2017 22:59, Timothy Arceri wrote:

On Tue, 2017-01-24 at 18:10 +0100, kdj0c wrote:

On 24/01/2017 17:40, Nicolai Hähnle wrote:

On 24.01.2017 17:08, kdj0c wrote:

use the util/disk_cache.c interface to cache some radeonsi
shaders on disk

missing features :

- add #if ENABLE_SHADER_CACHE where needed.
- when loading from disk cache, also insert it to RAM cache.

must be built with --enable-shader-cache to have the cache
working.
---
Hi, This is my first mail to the list.

I'm not sure this is the right way to do this, it's my first
attempt to patch mesa.
I've tested on a radeon HD7950 with glxgears and quake3. I have
some binary shaders in ~/.cache/mesa after running them, and they
are re-used when re-launching them.
I wanted to test more recent games, but the LD_LIBRARY_PATH trick
didn't work with steam games, and I don't want to install mesa
master system-wide.


Unfortunately, I'd say that this is a pretty wrong approach. A
radeonsi-level cache is nice, but the GLSL-level compilation and
linking has overhead as well, which we want to avoid with the
cache.

We really want to detect a re-used shader already at the GLSL
level, to be able to go straight to binaries (and TGSI I guess, for
optimized monolithic variants).



ok This is what I was wondering, it's not the right place to put it.
(but it was easy because there was already a RAM cache).

Thanks


Hi,

Welcome to contributing to Mesa :)

I'm not sure how much time you have to work on this feature, but just
letting you know it was my intention to start work on shader cache
support for radeonsi next week.


ok, I was following the advice and looking at glsl and tgsi code,
but I'm a bit lost. That looks too complex for a first contribution.
I can still help by testing patches (on radeon HD7950 only).

Thanks,

--

Jocelyn



Tim


Re: [Mesa-dev] [PATCH] anv: implement pipeline statistics queries

2017-01-24 Thread Robert Bragg
On Tue, Jan 24, 2017 at 2:37 PM, Ilia Mirkin  wrote:
> On Tue, Jan 24, 2017 at 5:27 PM, Robert Bragg  wrote:
>>> +/*
>>> + * GPR0 = GPR0 >> 2;
>>> + *
>>> + * Note that the upper 30 bits of GPR are lost!
>>> + */
>>> +static void
>>> +shr_gpr0_by_2_bits(struct anv_batch *batch)
>>> +{
>>> +   shl_gpr0_by_30_bits(batch);
>>> +   emit_load_alu_reg_reg32(batch, CS_GPR(0) + 4, CS_GPR(0));
>>> +   emit_load_alu_reg_imm32(batch, CS_GPR(0) + 4, 0);
>>
>>
>> I recently noticed from inspecting the original hsw_queryobj.c code
>> that this looks suspicious.
>>
>> Conceptually it makes sense to implement a right shift as lshift by
>> 32-n and then keeping the upper 32bits, but the emit_load_ functions
>> take a destination followed by a source and so it looks like after the
>> left shift it's copying the least significant 32bits of R0 over the
>> most significant and then setting the most significant 32bits of R0 to
>> zero. It looks like the first load_alu is redundant if the second one
>> just writes zero to the same location.
>>
>> Maybe I'm misreading something here though, this comment is just based
>> on inspection.
>
> What you're missing, I think, is that
>
> emit_load_alu_reg_reg32(batch, CS_GPR(0) + 4, CS_GPR(0));
>
> does CS_GPR(0) = CS_GPR(0) + 4, and not the inverse as one logically
> might have thought. I copied the semantics from the hsw_queryobj.c
> file, but I think they stink. But it stinks even more to have 2
> functions with inverted argument meanings.
>
> Does that make sense?

oh yeah sorry, not sure how I convinced myself it took dst then src.

>
> [So we have GPR0 which is a 64-bit entity, and do GPR0 <<= 30; GPR0_LO
> = GPR0_HI; GPR0_HI = 0; and then we can store GPR0 somewhere.]
>
> As for re-using your generalized shifter, I don't think that'd make
> sense to introduce in this change. It feels like a component on its
> own, which should be integrated (or not) separately. When/if it is,
> this and hsw_queryobj.c could migrate to using it.

Yup definitely, this code works for the current need so no need to
mess around with it here - thanks for clarifying my misreading.

- Robert

>
> Cheers,
>
>   -ilia


[Mesa-dev] [Bug 99517] [TRACKER] Mesa 17.0 release tracker

2017-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99517

Mark Janes  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |96907


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=96907
[Bug 96907] piglit.spec.arb_gpu_shader5.arb_gpu_shader5-emitstreamvertex_nodraw
intermittent
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 99517] [TRACKER] Mesa 17.0 release tracker

2017-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99517

Mark Janes  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |98892


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=98892
[Bug 98892] [BDW] dEQP-VK.ubo.single_nested_struct_array tests intermittent
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 99517] [TRACKER] Mesa 17.0 release tracker

2017-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99517

Mark Janes  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |99099


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=99099
[Bug 99099] [SNB] intermittent gpu hang in
piglit.spec.ext_framebuffer_multisample.accuracy
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 99517] [TRACKER] Mesa 17.0 release tracker

2017-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99517

Mark Janes  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |99266


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=99266
[Bug 99266] piglit.spec.ext_framebuffer_object.getteximage-formats
init-by-clear-and-render
-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH] anv: implement pipeline statistics queries

2017-01-24 Thread Ilia Mirkin
On Tue, Jan 24, 2017 at 5:27 PM, Robert Bragg  wrote:
>> +/*
>> + * GPR0 = GPR0 >> 2;
>> + *
>> + * Note that the upper 30 bits of GPR are lost!
>> + */
>> +static void
>> +shr_gpr0_by_2_bits(struct anv_batch *batch)
>> +{
>> +   shl_gpr0_by_30_bits(batch);
>> +   emit_load_alu_reg_reg32(batch, CS_GPR(0) + 4, CS_GPR(0));
>> +   emit_load_alu_reg_imm32(batch, CS_GPR(0) + 4, 0);
>
>
> I recently noticed from inspecting the original hsw_queryobj.c code
> that this looks suspicious.
>
> Conceptually it makes sense to implement a right shift as lshift by
> 32-n and then keeping the upper 32bits, but the emit_load_ functions
> take a destination followed by a source and so it looks like after the
> left shift it's copying the least significant 32bits of R0 over the
> most significant and then setting the most significant 32bits of R0 to
> zero. It looks like the first load_alu is redundant if the second one
> just writes zero to the same location.
>
> Maybe I'm misreading something here though, this comment is just based
> on inspection.

What you're missing, I think, is that

emit_load_alu_reg_reg32(batch, CS_GPR(0) + 4, CS_GPR(0));

does CS_GPR(0) = CS_GPR(0) + 4, and not the inverse as one logically
might have thought. I copied the semantics from the hsw_queryobj.c
file, but I think they stink. But it stinks even more to have 2
functions with inverted argument meanings.

Does that make sense?

[So we have GPR0 which is a 64-bit entity, and do GPR0 <<= 30; GPR0_LO
= GPR0_HI; GPR0_HI = 0; and then we can store GPR0 somewhere.]

As for re-using your generalized shifter, I don't think that'd make
sense to introduce in this change. It feels like a component on its
own, which should be integrated (or not) separately. When/if it is,
this and hsw_queryobj.c could migrate to using it.

Cheers,

  -ilia
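
The bracketed summary above can be checked with ordinary 64-bit arithmetic. This is a sketch on a plain integer, not on command-streamer registers; `shr_by_2_via_shl` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate "GPR0 = GPR0 >> 2" using only a left shift and 32-bit moves,
 * following the sequence GPR0 <<= 30; GPR0_LO = GPR0_HI; GPR0_HI = 0. */
static uint64_t shr_by_2_via_shl(uint64_t gpr0)
{
   const uint64_t original = gpr0;

   gpr0 <<= 30;                           /* GPR0 <<= 30 */
   uint32_t lo = (uint32_t)(gpr0 >> 32);  /* GPR0_LO = GPR0_HI */

   /* Sanity check: this equals the low dword of a real right shift. */
   assert(lo == (uint32_t)(original >> 2));

   return (uint64_t)lo;                   /* GPR0_HI = 0 */
}
```

Bits 34 and above of the input are shifted out by the left shift, which is the loss of the upper 30 bits that the comment in the patch warns about.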


Re: [Mesa-dev] [PATCH] anv: implement pipeline statistics queries

2017-01-24 Thread Robert Bragg
Sorry for the delay responding here; some comments below...


On Tue, Jan 24, 2017 at 11:48 AM, Ilia Mirkin  wrote:
> 2-month ping. [ok, it hasn't been 2 months on the dot, but ... close.]
>
> On Tue, Jan 10, 2017 at 5:49 PM, Ilia Mirkin  wrote:
>> ping.
>>
>> On Thu, Dec 22, 2016 at 11:14 AM, Ilia Mirkin  wrote:
>>> Ping? Any further comments/feedback/reviews?
>>>
>>>
>>> On Dec 5, 2016 11:22 AM, "Ilia Mirkin"  wrote:
>>>
>>> On Mon, Dec 5, 2016 at 11:11 AM, Robert Bragg  wrote:


 On Sun, Nov 27, 2016 at 7:23 PM, Ilia Mirkin  wrote:
>
> The strategy is to just keep n anv_query_pool_slot entries per query
> instead of one. The available bit is only valid in the last one.
>
> Signed-off-by: Ilia Mirkin 
> ---
>
> I think this is in a pretty good state now. I've tested both the direct
> and
> buffer paths with a hacked up cube application, and I'm seeing
> non-ridiculous
> values for the various counters, although I haven't 100% verified them
> for
> accuracy.
>
> This also implements the hsw/bdw workaround for dividing frag invocations
> by 4,
> copied from hsw_queryobj. I tested this on SKL and it seems to divide the
> values
> as expected.
>
> The cube patch I've been testing with is at
> http://paste.debian.net/899374/
> You can flip between copying to a buffer and explicit retrieval by
> commenting
> out the relevant function calls.
>
>  src/intel/vulkan/anv_device.c  |   2 +-
>  src/intel/vulkan/anv_private.h |   4 +
>  src/intel/vulkan/anv_query.c   |  99 ++
>  src/intel/vulkan/genX_cmd_buffer.c | 260
> -
>  4 files changed, 308 insertions(+), 57 deletions(-)
>
>
> diff --git a/src/intel/vulkan/anv_device.c
> b/src/intel/vulkan/anv_device.c
> index 99eb73c..7ad1970 100644
> --- a/src/intel/vulkan/anv_device.c
> +++ b/src/intel/vulkan/anv_device.c
> @@ -427,7 +427,7 @@ void anv_GetPhysicalDeviceFeatures(
>.textureCompressionASTC_LDR   = pdevice->info.gen >=
> 9,
> /* FINISHME CHV */
>.textureCompressionBC = true,
>.occlusionQueryPrecise= true,
> -  .pipelineStatisticsQuery  = false,
> +  .pipelineStatisticsQuery  = true,
>.fragmentStoresAndAtomics = true,
>.shaderTessellationAndGeometryPointSize   = true,
>.shaderImageGatherExtended= false,
> diff --git a/src/intel/vulkan/anv_private.h
> b/src/intel/vulkan/anv_private.h
> index 2fc543d..7271609 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -1763,6 +1763,8 @@ struct anv_render_pass {
> struct anv_subpass   subpasses[0];
>  };
>
> +#define ANV_PIPELINE_STATISTICS_COUNT 11
> +
>  struct anv_query_pool_slot {
> uint64_t begin;
> uint64_t end;
> @@ -1772,6 +1774,8 @@ struct anv_query_pool_slot {
>  struct anv_query_pool {
> VkQueryType  type;
> uint32_t slots;
> +   uint32_t pipeline_statistics;
> +   uint32_t slot_stride;
> struct anv_bo bo;
>  };
>
> diff --git a/src/intel/vulkan/anv_query.c b/src/intel/vulkan/anv_query.c
> index 293257b..dc00859 100644
> --- a/src/intel/vulkan/anv_query.c
> +++ b/src/intel/vulkan/anv_query.c
> @@ -38,8 +38,10 @@ VkResult anv_CreateQueryPool(
> ANV_FROM_HANDLE(anv_device, device, _device);
> struct anv_query_pool *pool;
> VkResult result;
> -   uint32_t slot_size;
> -   uint64_t size;
> +   uint32_t slot_size = sizeof(struct anv_query_pool_slot);
> +   uint32_t slot_stride = 1;
> +   uint64_t size = pCreateInfo->queryCount * slot_size;
> +   uint32_t pipeline_statistics = 0;
>
> assert(pCreateInfo->sType ==
> VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO);
>
> @@ -48,12 +50,16 @@ VkResult anv_CreateQueryPool(
> case VK_QUERY_TYPE_TIMESTAMP:
>break;
> case VK_QUERY_TYPE_PIPELINE_STATISTICS:
> -  return VK_ERROR_INCOMPATIBLE_DRIVER;
> +  pipeline_statistics = pCreateInfo->pipelineStatistics &
> + ((1 << ANV_PIPELINE_STATISTICS_COUNT) - 1);
> +  slot_stride = _mesa_bitcount(pipeline_statistics);
> +  size *= slot_stride;
> +  break;
> default:
>assert(!"Invalid query type");
> +  return 

[Mesa-dev] [PATCH] i965: Use a UW source type for CS_OPCODE_CS_TERMINATE.

2017-01-24 Thread Kenneth Graunke
SIMD16 compute shaders use a send(16) with mlen 1 for the EOT message,
using a source of g127 for the single register.  With a UD type, this
supposedly could read g128, which doesn't exist, causing the simulator
to get cranky.  Use a UW type to avoid this.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index cea38d86237..97420586d71 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -814,7 +814,8 @@ fs_visitor::emit_cs_terminate()
 
/* Send a message to the thread spawner to terminate the thread. */
fs_inst *inst = bld.exec_all()
-  .emit(CS_OPCODE_CS_TERMINATE, reg_undef, payload);
+  .emit(CS_OPCODE_CS_TERMINATE, reg_undef,
+retype(payload, BRW_REGISTER_TYPE_UW));
inst->eot = true;
 }
 
-- 
2.11.0
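
The register arithmetic behind the commit message can be sketched as follows. It assumes 32-byte general registers and a densely packed source region, and ignores strides and other region details; `regs_read` and `GRF_SIZE_BYTES` are illustrative names, not i965 identifiers.

```c
#include <assert.h>

/* Size of one general register in bytes (assumed for this hardware). */
#define GRF_SIZE_BYTES 32u

/* Number of consecutive registers a source region touches when each of
 * exec_size channels reads one element of type_size bytes. */
static unsigned regs_read(unsigned exec_size, unsigned type_size)
{
   assert(exec_size > 0 && type_size > 0);
   unsigned bytes = exec_size * type_size;
   return (bytes + GRF_SIZE_BYTES - 1) / GRF_SIZE_BYTES;
}
```

A SIMD16 read of a 4-byte UD source spans two registers (g127 and the nonexistent g128), while a 2-byte UW source fits in g127 alone.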



Re: [Mesa-dev] [PATCH] radeonsi Add disk shader cache

2017-01-24 Thread Timothy Arceri
On Tue, 2017-01-24 at 18:10 +0100, kdj0c wrote:
> On 24/01/2017 17:40, Nicolai Hähnle wrote:
> > On 24.01.2017 17:08, kdj0c wrote:
> > > use the util/disk_cache.c interface to cache some radeonsi
> > > shaders on disk
> > > 
> > > missing features :
> > > 
> > > - add #if ENABLE_SHADER_CACHE where needed.
> > > - when loading from disk cache, also insert it to RAM cache.
> > > 
> > > must be built with --enable-shader-cache to have the cache
> > > working.
> > > ---
> > > Hi, This is my first mail to the list.
> > > 
> > > I'm not sure this is the right way to do this, it's my first
> > > attempt to patch mesa.
> > > I've tested on a radeon HD7950 with glxgears and quake3. I have
> > > some binary shaders in ~/.cache/mesa after running them, and they
> > > are re-used when re-launching them.
> > > I wanted to test more recent games, but the LD_LIBRARY_PATH trick
> > > didn't work with steam games, and I don't want to install mesa
> > > master system-wide.
> > 
> > Unfortunately, I'd say that this is a pretty wrong approach. A
> > radeonsi-level cache is nice, but the GLSL-level compilation and
> > linking has overhead as well, which we want to avoid with the
> > cache.
> > 
> > We really want to detect a re-used shader already at the GLSL
> > level, to be able to go straight to binaries (and TGSI I guess, for
> > optimized monolithic variants).
> > 
> 
> ok This is what I was wondering, it's not the right place to put it.
> (but it was easy because there was already a RAM cache).
> 
> Thanks

Hi,

Welcome to contributing to Mesa :)

I'm not sure how much time you have to work on this feature, but just
letting you know it was my intention to start work on shader cache
support for radeonsi next week. 

Tim


Re: [Mesa-dev] [PATCH 1/2] r600g: use ieee variants of multiplication instructions

2017-01-24 Thread Axel Davy

On 24/01/2017 20:11, Matteo Bruni wrote:

2017-01-24 19:15 GMT+01:00 Ilia Mirkin :

On Tue, Jan 24, 2017 at 1:11 PM, Matteo Bruni  wrote:

2017-01-24 3:18 GMT+01:00 Ilia Mirkin :

This matches the behavior of most other drivers, including nouveau.

Doesn't this break all the applications depending on d3d9 NaN behavior
(including, but not limited to, d3d9 games in Wine) on r600g?

If I got this right, flipping around the two patches in this series
and enabling the TGSI_PROPERTY_MUL_ZERO_WINS flag for OpenGL
non-compute shaders (if that's not the case already) should avoid
regressions.

This patch normalizes r600g wrt multiply handling with the other
DX10/11 hardware drivers. nv50, nvc0, si, and i965 all use the IEEE
behavior. I don't know for sure, but assume that nv30 and r300 have
the DX9 behavior natively without IEEE support.

The next patch allows for the MUL_ZERO_WINS property to be used to get
the DX9 behavior, which st/nine will make use of.

That doesn't help Wine or any "native" OpenGL application which
happens to depend on the old behavior.
Even if there are none of them (which doesn't sound right to me)
applying this patch before 2/2 means that you are changing behavior
for nine in this one patch and changing it back again with the next,
which looks to me as something generally better avoided.


Bad apps that depend on the behaviour could be listed in drirc with a 
workaround to force them to use the gl extension associated with the feature.



Yours,


Axel Davy
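
The two multiply behaviours being discussed differ only when one operand is zero. A minimal scalar sketch, with hypothetical function names that model the semantics rather than any driver code:

```c
#include <assert.h>
#include <math.h>

/* IEEE multiply: 0 * inf and 0 * NaN both produce NaN. */
static float mul_ieee(float a, float b)
{
   return a * b;
}

/* Legacy (D3D9-style) multiply, the behaviour MUL_ZERO_WINS asks for:
 * a zero operand forces a zero result regardless of the other operand. */
static float mul_zero_wins(float a, float b)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f;
   return a * b;
}

/* Spot checks showing where the two variants disagree. */
static void mul_examples(void)
{
   assert(isnan(mul_ieee(0.0f, INFINITY)));
   assert(mul_zero_wins(0.0f, INFINITY) == 0.0f);
   assert(mul_zero_wins(NAN, 0.0f) == 0.0f);
   assert(mul_zero_wins(2.0f, 3.0f) == mul_ieee(2.0f, 3.0f));
}
```

A shader that relies on 0 * x being 0 to mask out a NaN or infinite term would therefore change its output when a driver moves from the second variant to the first.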



Re: [Mesa-dev] [PATCH V3] glsl: lower constant arrays to uniform arrays before optimisation loop

2017-01-24 Thread Timothy Arceri
On Tue, 2017-01-24 at 09:57 -0800, Eric Anholt wrote:
> Timothy Arceri  writes:
> 
> > From: Timothy Arceri 
> > 
> > Previously the constant array would not get copy propagated until
> > the backend
> > did its GLSL IR opt loop. I plan on removing that from i965 shortly
> > which
> > caused huge regressions in Deus-ex and Tomb Raider which have large
> > constant arrays. Moving lowering before the opt loop in the GLSL
> > linker
> > fixes this and unexpectedly improves some compute shaders also.
> 
> It seems like we should figure out what's missing in NIR that the
> lack
> of GLSL copy propagation hurt, but this is a pretty easy fix for now:
> 
> Reviewed-by: Eric Anholt 

Thanks.

The problem in NIR is that we end up with IR that looks like this.

vec4 32 ssa_496 = intrinsic load_var () (constarray_0_4[264]) ()
intrinsic store_var (ssa_496) (icb[264]) (15) /* wrmask=xyzw */

But NIR's variable-based copy propagation pass needs there to be a
copy_var in order to progress. We certainly need to improve this but
there are so many bits that need to be improved I'm trying not to get
sidetracked, for now my goal is to remove all GLSL IR opts from the
i965 linker.

Also since this actually improved some shaders it makes sense to make
the change now so that we can try to carry over the improvement when
fixing the NIR pass.


[Mesa-dev] [Bug 99517] [TRACKER] Mesa 17.0 release tracker

2017-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99517

Mark Janes  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |99509


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=99509
[Bug 99509] [SKLGT4e] piglit.spec.arb_shader_image_load_store.qualifiers
intermittent
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH] glsl: fix compile errors with mingw due to missing PRIx64 definitions

2017-01-24 Thread Ian Romanick
On 01/23/2017 11:21 AM, srol...@vmware.com wrote:
> From: Roland Scheidegger 
> 
> define __STDC_FORMAT_MACROS and include  (same as
> ir_builder_print_visitor.cpp already does).
> 
> Otherwise, some mingw build errors out (since
> 8e7e1ae0365ddc7edb0d4d98250ab46728e6c14a and
> bbce1c538dc0cb8bf3769510283d11847dc07540 presumably) with:
> src/compiler/glsl/ir_print_visitor.cpp:479:40: error: expected ‘)’ before 
> ‘PRIu64’
>case GLSL_TYPE_UINT64:fprintf(f, "%" PRIu64, ir->value.u64[i]); break;
> 
> (Note even with that fix I get other format specifier warnings:
> src/compiler/glsl/ir_print_visitor.cpp:473:47:
> warning: unknown conversion type character ‘a’ in format [-Wformat=]
> fprintf(f, "%a", ir->value.f[i]);
>^
> src/compiler/glsl/ir_print_visitor.cpp:473:47:
> warning: too many arguments for format [-Wformat-extra-args]
> but it still compiles at least)

Ouch.  That was added over 3 years ago.

commit 1ecfdba98a346c8bb05ad9403e3a6412574215f4
Author: Matt Turner 
Date:   Sun Aug 4 14:01:30 2013 -0700

glsl: Add heuristics to print floating-point numbers better.

v2: Fix *.expected files to match.
Reviewed-by: Paul Berry 

> ---
>  src/compiler/glsl/glsl_parser_extras.cpp | 2 ++
>  src/compiler/glsl/ir_print_visitor.cpp   | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/src/compiler/glsl/glsl_parser_extras.cpp 
> b/src/compiler/glsl/glsl_parser_extras.cpp
> index e888090..3d2fc14 100644
> --- a/src/compiler/glsl/glsl_parser_extras.cpp
> +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> @@ -20,6 +20,8 @@
>   * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>   * DEALINGS IN THE SOFTWARE.
>   */
> +#define __STDC_FORMAT_MACROS 1
> +#include  /* for PRIx64 macro */
>  #include 
>  #include 
>  #include 
> diff --git a/src/compiler/glsl/ir_print_visitor.cpp 
> b/src/compiler/glsl/ir_print_visitor.cpp
> index 0763277..debbdad 100644
> --- a/src/compiler/glsl/ir_print_visitor.cpp
> +++ b/src/compiler/glsl/ir_print_visitor.cpp
> @@ -21,6 +21,8 @@
>   * DEALINGS IN THE SOFTWARE.
>   */
>  
> +#define __STDC_FORMAT_MACROS 1
> +#include  /* for PRIx64 macro */
>  #include "ir_print_visitor.h"
>  #include "compiler/glsl_types.h"
>  #include "glsl_parser_extras.h"
> 
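
For reference, the PRI* macros come from the standard `<inttypes.h>` header. In C they are always available; older C++ toolchains only expose them when `__STDC_FORMAT_MACROS` is defined before the header is first included, which is what the quoted patch does. A minimal C sketch of their use, with `format_u64` as a hypothetical helper:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit unsigned value portably.  PRIu64 expands to the right
 * conversion specifier for the platform, so the same format string works
 * whether uint64_t is unsigned long or unsigned long long. */
static int format_u64(char *buf, size_t len, uint64_t value)
{
   assert(buf != NULL && len > 0);
   return snprintf(buf, len, "%" PRIu64, value);
}
```

The string literal concatenation (`"%" PRIu64`) is why the compiler reports "expected ')' before 'PRIu64'" when the macro is undefined: the bare identifier is left sitting next to a string literal.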



[Mesa-dev] [PATCH 5/5] utils/sha1: make _mesa_sha1_final a simple define around SHA1Final

2017-01-24 Thread Emil Velikov
From: Emil Velikov 

Swap the argument order as applicable.

Signed-off-by: Emil Velikov 
---
Similar patch for _mesa_sha1_update will require a bunch of casting due
to the data type, which imho makes things uglier.
---
 src/amd/vulkan/radv_descriptor_set.c  | 2 +-
 src/amd/vulkan/radv_pipeline_cache.c  | 2 +-
 src/intel/vulkan/anv_descriptor_set.c | 2 +-
 src/intel/vulkan/anv_pipeline_cache.c | 2 +-
 src/util/mesa-sha1.h  | 8 ++--
 5 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/src/amd/vulkan/radv_descriptor_set.c 
b/src/amd/vulkan/radv_descriptor_set.c
index 435b7394a3..e35ed99d71 100644
--- a/src/amd/vulkan/radv_descriptor_set.c
+++ b/src/amd/vulkan/radv_descriptor_set.c
@@ -219,7 +219,7 @@ VkResult radv_CreatePipelineLayout(
layout->push_constant_size = align(layout->push_constant_size, 16);
_mesa_sha1_update(&ctx, &layout->push_constant_size,
  sizeof(layout->push_constant_size));
-   _mesa_sha1_final(&ctx, layout->sha1);
+   _mesa_sha1_final(layout->sha1, &ctx);
*pPipelineLayout = radv_pipeline_layout_to_handle(layout);
 
return VK_SUCCESS;
diff --git a/src/amd/vulkan/radv_pipeline_cache.c 
b/src/amd/vulkan/radv_pipeline_cache.c
index 1bfdbe804c..164d38fc96 100644
--- a/src/amd/vulkan/radv_pipeline_cache.c
+++ b/src/amd/vulkan/radv_pipeline_cache.c
@@ -104,7 +104,7 @@ radv_hash_shader(unsigned char *hash, struct 
radv_shader_module *module,
  spec_info->mapEntryCount * sizeof 
spec_info->pMapEntries[0]);
_mesa_sha1_update(&ctx, spec_info->pData, spec_info->dataSize);
}
-   _mesa_sha1_final(&ctx, hash);
+   _mesa_sha1_final(hash, &ctx);
 }
 
 
diff --git a/src/intel/vulkan/anv_descriptor_set.c 
b/src/intel/vulkan/anv_descriptor_set.c
index 29bb67c5c3..05a9828aab 100644
--- a/src/intel/vulkan/anv_descriptor_set.c
+++ b/src/intel/vulkan/anv_descriptor_set.c
@@ -271,7 +271,7 @@ VkResult anv_CreatePipelineLayout(
   _mesa_sha1_update(&ctx, &layout->stage[s].has_dynamic_offsets,
 sizeof(layout->stage[s].has_dynamic_offsets));
}
-   _mesa_sha1_final(&ctx, layout->sha1);
+   _mesa_sha1_final(layout->sha1, &ctx);
 
*pPipelineLayout = anv_pipeline_layout_to_handle(layout);
 
diff --git a/src/intel/vulkan/anv_pipeline_cache.c 
b/src/intel/vulkan/anv_pipeline_cache.c
index 0b677a49f3..b34bffaca4 100644
--- a/src/intel/vulkan/anv_pipeline_cache.c
+++ b/src/intel/vulkan/anv_pipeline_cache.c
@@ -221,7 +221,7 @@ anv_hash_shader(unsigned char *hash, const void *key, 
size_t key_size,
 spec_info->mapEntryCount * sizeof 
spec_info->pMapEntries[0]);
   _mesa_sha1_update(&ctx, spec_info->pData, spec_info->dataSize);
}
-   _mesa_sha1_final(&ctx, hash);
+   _mesa_sha1_final(hash, &ctx);
 }
 
 static struct anv_shader_bin *
diff --git a/src/util/mesa-sha1.h b/src/util/mesa-sha1.h
index 02dd5f81bf..bab81299c6 100644
--- a/src/util/mesa-sha1.h
+++ b/src/util/mesa-sha1.h
@@ -40,11 +40,7 @@ _mesa_sha1_update(struct mesa_sha1 *ctx, const void *data, 
int size)
SHA1Update(ctx, data, size);
 }
 
-static inline void
-_mesa_sha1_final(struct mesa_sha1 *ctx, unsigned char result[20])
-{
-   SHA1Final(result, ctx);
-}
+#define _mesa_sha1_final SHA1Final
 
 static inline void
 _mesa_sha1_format(char *buf, const unsigned char *sha1)
@@ -66,7 +62,7 @@ _mesa_sha1_compute(const void *data, size_t size, unsigned 
char result[20])
 
_mesa_sha1_init(&ctx);
_mesa_sha1_update(&ctx, data, size);
-   _mesa_sha1_final(&ctx, result);
+   _mesa_sha1_final(result, &ctx);
 }
 
 
-- 
2.11.0
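
A rough sketch of why the call sites above have to swap their arguments: once the wrapper is a bare #define alias, arguments pass straight through, so callers must use the underlying function's order (result first, context second). Everything named toy_* or my_* here is a hypothetical stand-in for SHA1_CTX, SHA1Final and _mesa_sha1_final, and the "digest" is only a byte sum, not SHA-1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for SHA1_CTX; it only keeps a running byte sum. */
struct toy_ctx {
   uint32_t sum;
};

static void
toy_init(struct toy_ctx *ctx)
{
   ctx->sum = 0;
}

static void
toy_update(struct toy_ctx *ctx, const void *data, size_t size)
{
   const uint8_t *bytes = data;

   assert(data != NULL || size == 0);
   for (size_t i = 0; i < size; i++)
      ctx->sum += bytes[i];
}

/* Underlying API: result first, context second (like SHA1Final). */
static void
toy_final(uint32_t *result, struct toy_ctx *ctx)
{
   *result = ctx->sum;
}

/* Bare aliases: no argument reordering happens here. */
#define my_init   toy_init
#define my_update toy_update
#define my_final  toy_final

static uint32_t
toy_digest(const void *data, size_t size)
{
   struct toy_ctx ctx;
   uint32_t result;

   my_init(&ctx);
   my_update(&ctx, data, size);
   my_final(&result, &ctx);   /* result first, matching toy_final */
   return result;
}
```

Because the two parameters have different pointer types, a caller that keeps the old order should still get a compiler diagnostic rather than silently wrong behaviour.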



[Mesa-dev] [PATCH 4/5] util/sha1: inline the final _mesa_sha1 wrappers inside the header

2017-01-24 Thread Emil Velikov
From: Emil Velikov 

Signed-off-by: Emil Velikov 
---
 src/util/Makefile.sources |  1 -
 src/util/mesa-sha1.c  | 57 ---
 src/util/mesa-sha1.h  | 33 ++-
 3 files changed, 27 insertions(+), 64 deletions(-)
 delete mode 100644 src/util/mesa-sha1.c

diff --git a/src/util/Makefile.sources b/src/util/Makefile.sources
index a68a5fe22f..aeedbffdf5 100644
--- a/src/util/Makefile.sources
+++ b/src/util/Makefile.sources
@@ -17,7 +17,6 @@ MESA_UTIL_FILES :=\
hash_table.h \
list.h \
macros.h \
-   mesa-sha1.c \
mesa-sha1.h \
sha1/sha1.c \
sha1/sha1.h \
diff --git a/src/util/mesa-sha1.c b/src/util/mesa-sha1.c
deleted file mode 100644
index a14fec97e7..00
--- a/src/util/mesa-sha1.c
+++ /dev/null
@@ -1,57 +0,0 @@
-/* Copyright © 2007 Carl Worth
- * Copyright © 2009 Jeremy Huddleston, Julien Cristau, and Matthieu Herrb
- * Copyright © 2009-2010 Mikhail Gusarov
- * Copyright © 2012 Yaakov Selkowitz and Keith Packard
- * Copyright © 2014 Intel Corporation
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS IN THE SOFTWARE.
- */
-
-#include "sha1/sha1.h"
-#include "mesa-sha1.h"
-
-void
-_mesa_sha1_update(struct mesa_sha1 *ctx, const void *data, int size)
-{
-   SHA1Update(ctx, data, size);
-}
-
-void
-_mesa_sha1_compute(const void *data, size_t size, unsigned char result[20])
-{
-   struct mesa_sha1 ctx;
-
-   _mesa_sha1_init(&ctx);
-   _mesa_sha1_update(&ctx, data, size);
-   _mesa_sha1_final(&ctx, result);
-}
-
-void
-_mesa_sha1_format(char *buf, const unsigned char *sha1)
-{
-   static const char hex_digits[] = "0123456789abcdef";
-   int i;
-
-   for (i = 0; i < 40; i += 2) {
-  buf[i] = hex_digits[sha1[i >> 1] >> 4];
-  buf[i + 1] = hex_digits[sha1[i >> 1] & 0x0f];
-   }
-   buf[i] = '\0';
-}
diff --git a/src/util/mesa-sha1.h b/src/util/mesa-sha1.h
index ecbc708b5e..02dd5f81bf 100644
--- a/src/util/mesa-sha1.h
+++ b/src/util/mesa-sha1.h
@@ -34,8 +34,11 @@ extern "C" {
 
 #define _mesa_sha1_init SHA1Init
 
-void
-_mesa_sha1_update(struct mesa_sha1 *ctx, const void *data, int size);
+static inline void
+_mesa_sha1_update(struct mesa_sha1 *ctx, const void *data, int size)
+{
+   SHA1Update(ctx, data, size);
+}
 
 static inline void
 _mesa_sha1_final(struct mesa_sha1 *ctx, unsigned char result[20])
@@ -43,11 +46,29 @@ _mesa_sha1_final(struct mesa_sha1 *ctx, unsigned char 
result[20])
SHA1Final(result, ctx);
 }
 
-void
-_mesa_sha1_format(char *buf, const unsigned char *sha1);
+static inline void
+_mesa_sha1_format(char *buf, const unsigned char *sha1)
+{
+   static const char hex_digits[] = "0123456789abcdef";
+   int i;
+
+   for (i = 0; i < 40; i += 2) {
+  buf[i] = hex_digits[sha1[i >> 1] >> 4];
+  buf[i + 1] = hex_digits[sha1[i >> 1] & 0x0f];
+   }
+   buf[i] = '\0';
+}
+
+static inline void
+_mesa_sha1_compute(const void *data, size_t size, unsigned char result[20])
+{
+   struct mesa_sha1 ctx;
+
+   _mesa_sha1_init(&ctx);
+   _mesa_sha1_update(&ctx, data, size);
+   _mesa_sha1_final(&ctx, result);
+}
 
-void
-_mesa_sha1_compute(const void *data, size_t size, unsigned char result[20]);
 
 #ifdef __cplusplus
 } /* extern C */
-- 
2.11.0
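
For reference, the hex formatter moved into the header by this patch can be tried in isolation. The sketch below copies the loop from the diff under a hypothetical name (sha1_format_hex); the only addition is an assert on the pointers. The caller is assumed to supply a 20-byte digest and a 41-byte output buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Same loop as _mesa_sha1_format in the patch: two hex digits per
 * digest byte, then a terminating NUL, so buf needs 41 bytes. */
static void
sha1_format_hex(char *buf, const unsigned char *sha1)
{
   static const char hex_digits[] = "0123456789abcdef";
   int i;

   assert(buf != NULL && sha1 != NULL);
   for (i = 0; i < 40; i += 2) {
      buf[i] = hex_digits[sha1[i >> 1] >> 4];       /* high nibble */
      buf[i + 1] = hex_digits[sha1[i >> 1] & 0x0f]; /* low nibble */
   }
   buf[i] = '\0';
}
```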



[Mesa-dev] [PATCH 3/5] util/sha1: drop _mesa_sha1_{update, format} return type

2017-01-24 Thread Emil Velikov
From: Emil Velikov 

Unused/unchecked by any of the callers.

Signed-off-by: Emil Velikov 
---
 src/util/mesa-sha1.c | 7 ++-
 src/util/mesa-sha1.h | 4 ++--
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/src/util/mesa-sha1.c b/src/util/mesa-sha1.c
index eb882e8bd0..a14fec97e7 100644
--- a/src/util/mesa-sha1.c
+++ b/src/util/mesa-sha1.c
@@ -27,11 +27,10 @@
 #include "sha1/sha1.h"
 #include "mesa-sha1.h"
 
-int
+void
 _mesa_sha1_update(struct mesa_sha1 *ctx, const void *data, int size)
 {
SHA1Update(ctx, data, size);
-   return 1;
 }
 
 void
@@ -44,7 +43,7 @@ _mesa_sha1_compute(const void *data, size_t size, unsigned 
char result[20])
_mesa_sha1_final(&ctx, result);
 }
 
-char *
+void
 _mesa_sha1_format(char *buf, const unsigned char *sha1)
 {
static const char hex_digits[] = "0123456789abcdef";
@@ -55,6 +54,4 @@ _mesa_sha1_format(char *buf, const unsigned char *sha1)
   buf[i + 1] = hex_digits[sha1[i >> 1] & 0x0f];
}
buf[i] = '\0';
-
-   return buf;
 }
diff --git a/src/util/mesa-sha1.h b/src/util/mesa-sha1.h
index f927d5772d..ecbc708b5e 100644
--- a/src/util/mesa-sha1.h
+++ b/src/util/mesa-sha1.h
@@ -34,7 +34,7 @@ extern "C" {
 
 #define _mesa_sha1_init SHA1Init
 
-int
+void
 _mesa_sha1_update(struct mesa_sha1 *ctx, const void *data, int size);
 
 static inline void
@@ -43,7 +43,7 @@ _mesa_sha1_final(struct mesa_sha1 *ctx, unsigned char 
result[20])
SHA1Final(result, ctx);
 }
 
-char *
+void
 _mesa_sha1_format(char *buf, const unsigned char *sha1);
 
 void
-- 
2.11.0



[Mesa-dev] [PATCH 2/5] util/sha1: rework _mesa_sha1_{init,final}

2017-01-24 Thread Emil Velikov
From: Emil Velikov 

Rather than having an extra memory allocation [whose failure we currently do
not check for and handle accordingly], just make the API take a pointer to a
stack-allocated instance.

This and follow-up steps will effectively make the _mesa_sha1_foo functions
simple defines/inlines around their SHA1 counterparts.

Signed-off-by: Emil Velikov 
---
 src/amd/vulkan/radv_descriptor_set.c  | 10 +-
 src/amd/vulkan/radv_pipeline_cache.c  | 18 +-
 src/intel/vulkan/anv_descriptor_set.c | 13 +++--
 src/intel/vulkan/anv_pipeline_cache.c | 18 +-
 src/util/mesa-sha1.c  | 34 +-
 src/util/mesa-sha1.h  | 13 -
 6 files changed, 43 insertions(+), 63 deletions(-)

diff --git a/src/amd/vulkan/radv_descriptor_set.c 
b/src/amd/vulkan/radv_descriptor_set.c
index eb8b5d6e3a..435b7394a3 100644
--- a/src/amd/vulkan/radv_descriptor_set.c
+++ b/src/amd/vulkan/radv_descriptor_set.c
@@ -180,7 +180,7 @@ VkResult radv_CreatePipelineLayout(
 {
RADV_FROM_HANDLE(radv_device, device, _device);
struct radv_pipeline_layout *layout;
-   struct mesa_sha1 *ctx;
+   struct mesa_sha1 ctx;
 
assert(pCreateInfo->sType == 
VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO);
 
@@ -194,7 +194,7 @@ VkResult radv_CreatePipelineLayout(
unsigned dynamic_offset_count = 0;
 
 
-   ctx = _mesa_sha1_init();
+   _mesa_sha1_init(&ctx);
for (uint32_t set = 0; set < pCreateInfo->setLayoutCount; set++) {
RADV_FROM_HANDLE(radv_descriptor_set_layout, set_layout,
 pCreateInfo->pSetLayouts[set]);
@@ -204,7 +204,7 @@ VkResult radv_CreatePipelineLayout(
for (uint32_t b = 0; b < set_layout->binding_count; b++) {
dynamic_offset_count += 
set_layout->binding[b].array_size * set_layout->binding[b].dynamic_offset_count;
}
-   _mesa_sha1_update(ctx, set_layout->binding,
+   _mesa_sha1_update(&ctx, set_layout->binding,
  sizeof(set_layout->binding[0]) * 
set_layout->binding_count);
}
 
@@ -217,9 +217,9 @@ VkResult radv_CreatePipelineLayout(
}
 
layout->push_constant_size = align(layout->push_constant_size, 16);
-   _mesa_sha1_update(ctx, &layout->push_constant_size,
+   _mesa_sha1_update(&ctx, &layout->push_constant_size,
  sizeof(layout->push_constant_size));
-   _mesa_sha1_final(ctx, layout->sha1);
+   _mesa_sha1_final(&ctx, layout->sha1);
*pPipelineLayout = radv_pipeline_layout_to_handle(layout);
 
return VK_SUCCESS;
diff --git a/src/amd/vulkan/radv_pipeline_cache.c 
b/src/amd/vulkan/radv_pipeline_cache.c
index 2cb1dfb6eb..1bfdbe804c 100644
--- a/src/amd/vulkan/radv_pipeline_cache.c
+++ b/src/amd/vulkan/radv_pipeline_cache.c
@@ -90,21 +90,21 @@ radv_hash_shader(unsigned char *hash, struct 
radv_shader_module *module,
 const struct radv_pipeline_layout *layout,
 const union ac_shader_variant_key *key)
 {
-   struct mesa_sha1 *ctx;
+   struct mesa_sha1 ctx;
 
-   ctx = _mesa_sha1_init();
+   _mesa_sha1_init(&ctx);
if (key)
-   _mesa_sha1_update(ctx, key, sizeof(*key));
-   _mesa_sha1_update(ctx, module->sha1, sizeof(module->sha1));
-   _mesa_sha1_update(ctx, entrypoint, strlen(entrypoint));
+   _mesa_sha1_update(&ctx, key, sizeof(*key));
+   _mesa_sha1_update(&ctx, module->sha1, sizeof(module->sha1));
+   _mesa_sha1_update(&ctx, entrypoint, strlen(entrypoint));
if (layout)
-   _mesa_sha1_update(ctx, layout->sha1, sizeof(layout->sha1));
+   _mesa_sha1_update(&ctx, layout->sha1, sizeof(layout->sha1));
if (spec_info) {
-   _mesa_sha1_update(ctx, spec_info->pMapEntries,
+   _mesa_sha1_update(&ctx, spec_info->pMapEntries,
  spec_info->mapEntryCount * sizeof 
spec_info->pMapEntries[0]);
-   _mesa_sha1_update(ctx, spec_info->pData, spec_info->dataSize);
+   _mesa_sha1_update(&ctx, spec_info->pData, spec_info->dataSize);
}
-   _mesa_sha1_final(ctx, hash);
+   _mesa_sha1_final(&ctx, hash);
 }
 
 
diff --git a/src/intel/vulkan/anv_descriptor_set.c 
b/src/intel/vulkan/anv_descriptor_set.c
index a5e65afc48..29bb67c5c3 100644
--- a/src/intel/vulkan/anv_descriptor_set.c
+++ b/src/intel/vulkan/anv_descriptor_set.c
@@ -259,18 +259,19 @@ VkResult anv_CreatePipelineLayout(
   }
}
 
-   struct mesa_sha1 *ctx = _mesa_sha1_init();
+   struct mesa_sha1 ctx;
+   _mesa_sha1_init(&ctx);
for (unsigned s = 0; s < layout->num_sets; s++) {
-  sha1_update_descriptor_set_layout(ctx, layout->set[s].layout);
-  _mesa_sha1_update(ctx, &layout->set[s].dynamic_offset_start,
+  sha1_update_descriptor_set_layout(&ctx, layout->set[s].layout);
+  _mesa_sha1_update(&ctx, 

[Mesa-dev] [PATCH 1/5] util/sha1: add non-typedef name for the SHA1_CTX struct

2017-01-24 Thread Emil Velikov
From: Emil Velikov 

Using typedef(s) is not always the answer and makes it harder for people
to do clever (or one might call nasty) things with the code.

Add a struct name, which we will use in a follow-up commit.

Signed-off-by: Emil Velikov 
---
 src/util/sha1/README | 3 +++
 src/util/sha1/sha1.h | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/util/sha1/README b/src/util/sha1/README
index f13baf9d1a..f30acf984e 100644
--- a/src/util/sha1/README
+++ b/src/util/sha1/README
@@ -57,3 +57,6 @@ Upstream status: TBD (N/A ?)
  - Manually expand __BEGIN_DECLS/__END_DECLS and make sure that they include
 the struct declaration.
 Upstream status: TBD
+
+ - Add non-typedef struct name.
+Upstream status: TBD
diff --git a/src/util/sha1/sha1.h b/src/util/sha1/sha1.h
index 243481a98e..029a0ae87f 100644
--- a/src/util/sha1/sha1.h
+++ b/src/util/sha1/sha1.h
@@ -20,7 +20,7 @@
 extern "C" {
 #endif
 
-typedef struct {
+typedef struct _SHA1_CTX {
 uint32_t state[5];
 uint64_t count;
 uint8_t buffer[SHA1_BLOCK_LENGTH];
-- 
2.11.0



Re: [Mesa-dev] [PATCH] anv: implement pipeline statistics queries

2017-01-24 Thread Ilia Mirkin
2-month ping. [ok, it hasn't been 2 months on the dot, but ... close.]

On Tue, Jan 10, 2017 at 5:49 PM, Ilia Mirkin  wrote:
> ping.
>
> On Thu, Dec 22, 2016 at 11:14 AM, Ilia Mirkin  wrote:
>> Ping? Any further comments/feedback/reviews?
>>
>>
>> On Dec 5, 2016 11:22 AM, "Ilia Mirkin"  wrote:
>>
>> On Mon, Dec 5, 2016 at 11:11 AM, Robert Bragg  wrote:
>>>
>>>
>>> On Sun, Nov 27, 2016 at 7:23 PM, Ilia Mirkin  wrote:

 The strategy is to just keep n anv_query_pool_slot entries per query
 instead of one. The available bit is only valid in the last one.

 Signed-off-by: Ilia Mirkin 
 ---

 I think this is in a pretty good state now. I've tested both the direct and
 buffer paths with a hacked up cube application, and I'm seeing non-ridiculous
 values for the various counters, although I haven't 100% verified them for
 accuracy.

 This also implements the hsw/bdw workaround for dividing frag invocations by
 4, copied from hsw_queryobj. I tested this on SKL and it seems to divide the
 values as expected.

 The cube patch I've been testing with is at http://paste.debian.net/899374/
 You can flip between copying to a buffer and explicit retrieval by commenting
 out the relevant function calls.

  src/intel/vulkan/anv_device.c  |   2 +-
  src/intel/vulkan/anv_private.h |   4 +
  src/intel/vulkan/anv_query.c   |  99 ++
  src/intel/vulkan/genX_cmd_buffer.c | 260
 -
  4 files changed, 308 insertions(+), 57 deletions(-)


 diff --git a/src/intel/vulkan/anv_device.c
 b/src/intel/vulkan/anv_device.c
 index 99eb73c..7ad1970 100644
 --- a/src/intel/vulkan/anv_device.c
 +++ b/src/intel/vulkan/anv_device.c
 @@ -427,7 +427,7 @@ void anv_GetPhysicalDeviceFeatures(
.textureCompressionASTC_LDR   = pdevice->info.gen >=
 9,
 /* FINISHME CHV */
.textureCompressionBC = true,
.occlusionQueryPrecise= true,
 -  .pipelineStatisticsQuery  = false,
 +  .pipelineStatisticsQuery  = true,
.fragmentStoresAndAtomics = true,
.shaderTessellationAndGeometryPointSize   = true,
.shaderImageGatherExtended= false,
 diff --git a/src/intel/vulkan/anv_private.h
 b/src/intel/vulkan/anv_private.h
 index 2fc543d..7271609 100644
 --- a/src/intel/vulkan/anv_private.h
 +++ b/src/intel/vulkan/anv_private.h
 @@ -1763,6 +1763,8 @@ struct anv_render_pass {
 struct anv_subpass   subpasses[0];
  };

 +#define ANV_PIPELINE_STATISTICS_COUNT 11
 +
  struct anv_query_pool_slot {
 uint64_t begin;
 uint64_t end;
 @@ -1772,6 +1774,8 @@ struct anv_query_pool_slot {
  struct anv_query_pool {
 VkQueryType  type;
 uint32_t slots;
 +   uint32_t pipeline_statistics;
 +   uint32_t slot_stride;
 struct anv_bo bo;
  };

 diff --git a/src/intel/vulkan/anv_query.c b/src/intel/vulkan/anv_query.c
 index 293257b..dc00859 100644
 --- a/src/intel/vulkan/anv_query.c
 +++ b/src/intel/vulkan/anv_query.c
 @@ -38,8 +38,10 @@ VkResult anv_CreateQueryPool(
 ANV_FROM_HANDLE(anv_device, device, _device);
 struct anv_query_pool *pool;
 VkResult result;
 -   uint32_t slot_size;
 -   uint64_t size;
 +   uint32_t slot_size = sizeof(struct anv_query_pool_slot);
 +   uint32_t slot_stride = 1;
 +   uint64_t size = pCreateInfo->queryCount * slot_size;
 +   uint32_t pipeline_statistics = 0;

 assert(pCreateInfo->sType ==
 VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO);

 @@ -48,12 +50,16 @@ VkResult anv_CreateQueryPool(
 case VK_QUERY_TYPE_TIMESTAMP:
break;
 case VK_QUERY_TYPE_PIPELINE_STATISTICS:
 -  return VK_ERROR_INCOMPATIBLE_DRIVER;
 +  pipeline_statistics = pCreateInfo->pipelineStatistics &
 + ((1 << ANV_PIPELINE_STATISTICS_COUNT) - 1);
 +  slot_stride = _mesa_bitcount(pipeline_statistics);
 +  size *= slot_stride;
 +  break;
 default:
assert(!"Invalid query type");
 +  return VK_ERROR_INCOMPATIBLE_DRIVER;
 }

 -   slot_size = sizeof(struct anv_query_pool_slot);
 pool = vk_alloc2(>alloc, pAllocator, sizeof(*pool), 8,
   VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
 if (pool == NULL)
 @@ 
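
The sizing arithmetic in the quoted anv_CreateQueryPool hunk can be sketched on its own: mask the requested statistics to the 11 supported bits, count the set bits to get the slot stride, and multiply. This is only an illustration under assumptions: bitcount32 stands in for _mesa_bitcount, the third slot field is assumed (only begin and end are visible above), and statistics_pool_size is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define STATISTICS_COUNT 11   /* mirrors ANV_PIPELINE_STATISTICS_COUNT */

/* Assumed slot layout; only begin/end are visible in the quoted hunk. */
struct query_pool_slot {
   uint64_t begin;
   uint64_t end;
   uint64_t available;
};

/* Portable bit count, standing in for _mesa_bitcount. */
static uint32_t
bitcount32(uint32_t v)
{
   uint32_t n = 0;

   for (; v != 0; v &= v - 1)   /* clear the lowest set bit each round */
      n++;
   return n;
}

/* Bytes needed for a statistics pool holding query_count queries. */
static uint64_t
statistics_pool_size(uint32_t query_count, uint32_t requested_stats)
{
   uint32_t stats = requested_stats & ((1u << STATISTICS_COUNT) - 1);
   uint32_t slot_stride = bitcount32(stats);

   assert(slot_stride <= STATISTICS_COUNT);
   return (uint64_t)query_count * sizeof(struct query_pool_slot) * slot_stride;
}
```

The sketch widens to 64 bits before multiplying; whether the driver needs the same care depends on the query counts it accepts.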

Re: [Mesa-dev] [PATCH 1/2] r600g: use ieee variants of multiplication instructions

2017-01-24 Thread Ilia Mirkin
On Tue, Jan 24, 2017 at 2:11 PM, Matteo Bruni  wrote:
> That doesn't help Wine or any "native" OpenGL application which
> happens to depend on the old behavior.

Oh, and another note on that - I *do* think it helps those
applications. Because now they will no longer inexplicably work on
r600 and not work on radeonsi, i965, and nouveau. It will now be
consistent, which will eliminate the "oh, that driver is broken"
suspicion.

  -ilia


Re: [Mesa-dev] [PATCH 2/2] r600g: add support for optionally using non-IEEE mul ops

2017-01-24 Thread Ilia Mirkin
I think of the first patch as a fix to the driver, and the second
patch as a new feature.

On Tue, Jan 24, 2017 at 2:27 PM, Nicolai Hähnle  wrote:
> No piglit regressions on Redwood with these two patches. Matteo's point
> about switching the order of the patches around seems reasonable.
>
> Cheers,
> Nicolai
>
>
> On 24.01.2017 10:20, Nicolai Hähnle wrote:
>>
>> The series looks reasonable to me, so
>>
>> Reviewed-by: Nicolai Hähnle 
>>
>> Please hold off on pushing this for a day or so, to give me or someone
>> else a chance to test this.
>>
>> On 24.01.2017 03:18, Ilia Mirkin wrote:
>>>
>>> Signed-off-by: Ilia Mirkin 
>>> ---
>>>
>>> Untested. Can be verified with Xnine. It should pass before 1/2 of
>>> this series,
>>> start failing with it, and pass again with 2/2 in place.
>>>
>>>  src/gallium/drivers/r600/r600_pipe.c   |  2 +-
>>>  src/gallium/drivers/r600/r600_shader.c | 20 +---
>>>  2 files changed, 18 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/r600/r600_pipe.c
>>> b/src/gallium/drivers/r600/r600_pipe.c
>>> index 98ceebf..d126d37 100644
>>> --- a/src/gallium/drivers/r600/r600_pipe.c
>>> +++ b/src/gallium/drivers/r600/r600_pipe.c
>>> @@ -286,6 +286,7 @@ static int r600_get_param(struct pipe_screen*
>>> pscreen, enum pipe_cap param)
>>>  case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
>>>  case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
>>>  case PIPE_CAP_CLEAR_TEXTURE:
>>> +case PIPE_CAP_TGSI_MUL_ZERO_WINS:
>>>  return 1;
>>>
>>>  case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
>>> @@ -378,7 +379,6 @@ static int r600_get_param(struct pipe_screen*
>>> pscreen, enum pipe_cap param)
>>>  case PIPE_CAP_NATIVE_FENCE_FD:
>>>  case PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY:
>>>  case PIPE_CAP_TGSI_FS_FBFETCH:
>>> -case PIPE_CAP_TGSI_MUL_ZERO_WINS:
>>>  return 0;
>>>
>>>  case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
>>> diff --git a/src/gallium/drivers/r600/r600_shader.c
>>> b/src/gallium/drivers/r600/r600_shader.c
>>> index 0114f8f..b692e7f 100644
>>> --- a/src/gallium/drivers/r600/r600_shader.c
>>> +++ b/src/gallium/drivers/r600/r600_shader.c
>>> @@ -3906,6 +3906,11 @@ static int tgsi_op2_s(struct r600_shader_ctx
>>> *ctx, int swap, int trans_only)
>>>  int i, j, r, lasti = tgsi_last_instruction(write_mask);
>>>  /* use temp register if trans_only and more than one dst
>>> component */
>>>  int use_tmp = trans_only && (write_mask ^ (1 << lasti));
>>> +unsigned op = ctx->inst_info->op;
>>> +
>>> +if (op == ALU_OP2_MUL_IEEE &&
>>> +ctx->info.properties[TGSI_PROPERTY_MUL_ZERO_WINS])
>>> +op = ALU_OP2_MUL;
>>>
>>>  for (i = 0; i <= lasti; i++) {
>>>  if (!(write_mask & (1 << i)))
>>> @@ -3919,7 +3924,7 @@ static int tgsi_op2_s(struct r600_shader_ctx
>>> *ctx, int swap, int trans_only)
>>>  } else
>>>  tgsi_dst(ctx, >Dst[0], i, );
>>>
>>> -alu.op = ctx->inst_info->op;
>>> +alu.op = op;
>>>  if (!swap) {
>>>  for (j = 0; j < inst->Instruction.NumSrcRegs; j++) {
>>>  r600_bytecode_src([j], >src[j], i);
>>> @@ -6543,6 +6548,11 @@ static int tgsi_op3(struct r600_shader_ctx *ctx)
>>>  int i, j, r;
>>>  int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
>>>  int temp_regs[4];
>>> +unsigned op = ctx->inst_info->op;
>>> +
>>> +if (op == ALU_OP3_MULADD_IEEE &&
>>> +ctx->info.properties[TGSI_PROPERTY_MUL_ZERO_WINS])
>>> +op = ALU_OP3_MULADD;
>>>
>>>  for (j = 0; j < inst->Instruction.NumSrcRegs; j++) {
>>>  temp_regs[j] = 0;
>>> @@ -6554,7 +6564,7 @@ static int tgsi_op3(struct r600_shader_ctx *ctx)
>>>  continue;
>>>
>>>  memset(&alu, 0, sizeof(struct r600_bytecode_alu));
>>> -alu.op = ctx->inst_info->op;
>>> +alu.op = op;
>>>  for (j = 0; j < inst->Instruction.NumSrcRegs; j++) {
>>>  r = tgsi_make_src_for_op3(ctx, temp_regs[j], i,
>>> [j], >src[j]);
>>>  if (r)
>>> @@ -6580,10 +6590,14 @@ static int tgsi_dp(struct r600_shader_ctx *ctx)
>>>  struct tgsi_full_instruction *inst =
>>> >parse.FullToken.FullInstruction;
>>>  struct r600_bytecode_alu alu;
>>>  int i, j, r;
>>> +unsigned op = ctx->inst_info->op;
>>> +if (op == ALU_OP2_DOT4_IEEE &&
>>> +ctx->info.properties[TGSI_PROPERTY_MUL_ZERO_WINS])
>>> +op = ALU_OP2_DOT4;
>>>
>>>  for (i = 0; i < 4; i++) {
>>>  memset(&alu, 0, sizeof(struct r600_bytecode_alu));
>>> -alu.op = ctx->inst_info->op;
>>> +alu.op = op;
>>>  for (j = 0; j < inst->Instruction.NumSrcRegs; j++) {
>>>  r600_bytecode_src([j], >src[j], i);
>>>  }
>>>
>


Re: [Mesa-dev] [PATCH 1/2] r600g: use ieee variants of multiplication instructions

2017-01-24 Thread Ilia Mirkin
On Tue, Jan 24, 2017 at 2:11 PM, Matteo Bruni  wrote:
> 2017-01-24 19:15 GMT+01:00 Ilia Mirkin :
>> On Tue, Jan 24, 2017 at 1:11 PM, Matteo Bruni  
>> wrote:
>>> 2017-01-24 3:18 GMT+01:00 Ilia Mirkin :
 This matches the behavior of most other drivers, including nouveau.
>>>
>>> Doesn't this break all the applications depending on d3d9 NaN behavior
>>> (including, but not limited to, d3d9 games in Wine) on r600g?
>>>
>>> If I got this right, flipping around the two patches in this series
>>> and enabling the TGSI_PROPERTY_MUL_ZERO_WINS flag for OpenGL
>>> non-compute shaders (if that's not the case already) should avoid
>>> regressions.
>>
>> This patch normalizes r600g wrt multiply handling with the other
>> DX10/11 hardware drivers. nv50, nvc0, si, and i965 all use the IEEE
>> behavior. I don't know for sure, but assume that nv30 and r300 have
>> the DX9 behavior natively without IEEE support.
>>
>> The next patch allows for the MUL_ZERO_WINS property to be used to get
>> the DX9 behavior, which st/nine will make use of.
>
> That doesn't help Wine or any "native" OpenGL application which
> happens to depend on the old behavior.
> Even if there are none of them (which doesn't sound right to me)
> applying this patch before 2/2 means that you are changing behavior
> for nine in this one patch and changing it back again with the next,
> which looks to me as something generally better avoided.

IMHO this patch should go in irrespective of the second patch. The
IEEE behavior on multiplies is what all the other hw drivers do.
Having one driver do one thing and every other driver do another thing
is not a great situation to be in. The second patch is a nicety for
st/nine and any future GL extensions that can make use of the
functionality.

  -ilia
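
To make the behavioural difference under discussion concrete, here is a CPU-side sketch of the two multiply flavours as I understand them: the IEEE variant propagates NaN for 0 * Inf and 0 * NaN, while the legacy (D3D9-style, "zero wins") variant returns 0 whenever either operand is 0. The function names are hypothetical, and this models the semantics only, not the hardware instructions.

```c
#include <assert.h>
#include <math.h>

/* IEEE multiply: 0 * Inf and 0 * NaN both produce NaN. */
static float
mul_ieee(float a, float b)
{
   return a * b;
}

/* D3D9-style multiply: a zero operand forces a zero result. */
static float
mul_zero_wins(float a, float b)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f;
   return a * b;
}

/* Usage: the two variants differ only when a zero meets Inf or NaN. */
void
check_mul_variants(void)
{
   assert(isnan(mul_ieee(0.0f, INFINITY)));
   assert(mul_zero_wins(0.0f, INFINITY) == 0.0f);
   assert(mul_ieee(2.0f, 3.0f) == mul_zero_wins(2.0f, 3.0f));
}
```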


Re: [Mesa-dev] [PATCH] vulkan: Don't install vk_platform.h or vulkan.h.

2017-01-24 Thread Jason Ekstrand
On Tue, Jan 24, 2017 at 11:25 AM, Emil Velikov 
wrote:

> On 24 January 2017 at 18:02, Jason Ekstrand  wrote:
> > On Tue, Jan 24, 2017 at 9:03 AM, Matt Turner  wrote:
> >>
> >> On Tue, Jan 24, 2017 at 8:41 AM, Emil Velikov wrote:
> >> > On 24 January 2017 at 00:54, Matt Turner  wrote:
> >> >> These files belong to the vulkan loader.
> >> > Fully agreed, patch is
> >> > Reviewed-by: Emil Velikov 
> >>
> >> Thanks!
> >>
> >> > Related question:
> >> > I was wondering about getting this a step further:
> >> >  - having the loader provide a .pc file
> >> >  - tracking required version at configure time and dropping our local
> >> > copies of the headers/xml.
> >> >
> >> > Would you be in favour, against, neutral of such an approach ?
> >>
> >> I'd be in favor of that, but let's see what Jason thinks.
> >
> >
> > I'd rather not.  That would make sense if we all lived in the open-source
> > world where everything is upstream all the time.  Unfortunately, not all of
> > us have that luxury and we need to be able to work on experimental branches
> > of the spec that may have more extensions than are provided by any loader
> > version we can install.  I'd be ok with a check for a particular loader
> > version just to force distros to update their loader but I would like to be
> > able to build with arbitrary XML branches without having to install a branch
> > of the loader.
> What if I tell you that you wouldn't need to install the loader ;-)
> More on that once we get the .pc patches in.
>

A lot of extensions don't require explicit loader support.  I don't want to
have to update my loader (or put it in some folder and point pkg-config at
it) just to hack on them.


Re: [Mesa-dev] [PATCH 2/2] r600g: add support for optionally using non-IEEE mul ops

2017-01-24 Thread Nicolai Hähnle
No piglit regressions on Redwood with these two patches. Matteo's point 
about switching the order of the patches around seems reasonable.


Cheers,
Nicolai

On 24.01.2017 10:20, Nicolai Hähnle wrote:

The series looks reasonable to me, so

Reviewed-by: Nicolai Hähnle 

Please hold off on pushing this for a day or so, to give me or someone
else a chance to test this.

On 24.01.2017 03:18, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin 
---

Untested. Can be verified with Xnine. It should pass before 1/2 of
this series,
start failing with it, and pass again with 2/2 in place.

 src/gallium/drivers/r600/r600_pipe.c   |  2 +-
 src/gallium/drivers/r600/r600_shader.c | 20 +---
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c
b/src/gallium/drivers/r600/r600_pipe.c
index 98ceebf..d126d37 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -286,6 +286,7 @@ static int r600_get_param(struct pipe_screen*
pscreen, enum pipe_cap param)
 case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
 case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
 case PIPE_CAP_CLEAR_TEXTURE:
+case PIPE_CAP_TGSI_MUL_ZERO_WINS:
 return 1;

 case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
@@ -378,7 +379,6 @@ static int r600_get_param(struct pipe_screen*
pscreen, enum pipe_cap param)
 case PIPE_CAP_NATIVE_FENCE_FD:
 case PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY:
 case PIPE_CAP_TGSI_FS_FBFETCH:
-case PIPE_CAP_TGSI_MUL_ZERO_WINS:
 return 0;

 case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
diff --git a/src/gallium/drivers/r600/r600_shader.c
b/src/gallium/drivers/r600/r600_shader.c
index 0114f8f..b692e7f 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -3906,6 +3906,11 @@ static int tgsi_op2_s(struct r600_shader_ctx
*ctx, int swap, int trans_only)
 int i, j, r, lasti = tgsi_last_instruction(write_mask);
 /* use temp register if trans_only and more than one dst
component */
 int use_tmp = trans_only && (write_mask ^ (1 << lasti));
+unsigned op = ctx->inst_info->op;
+
+if (op == ALU_OP2_MUL_IEEE &&
+ctx->info.properties[TGSI_PROPERTY_MUL_ZERO_WINS])
+op = ALU_OP2_MUL;

 for (i = 0; i <= lasti; i++) {
 if (!(write_mask & (1 << i)))
@@ -3919,7 +3924,7 @@ static int tgsi_op2_s(struct r600_shader_ctx
*ctx, int swap, int trans_only)
 } else
 tgsi_dst(ctx, >Dst[0], i, );

-alu.op = ctx->inst_info->op;
+alu.op = op;
 if (!swap) {
 for (j = 0; j < inst->Instruction.NumSrcRegs; j++) {
 r600_bytecode_src([j], >src[j], i);
@@ -6543,6 +6548,11 @@ static int tgsi_op3(struct r600_shader_ctx *ctx)
 int i, j, r;
 int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
 int temp_regs[4];
+unsigned op = ctx->inst_info->op;
+
+if (op == ALU_OP3_MULADD_IEEE &&
+ctx->info.properties[TGSI_PROPERTY_MUL_ZERO_WINS])
+op = ALU_OP3_MULADD;

 for (j = 0; j < inst->Instruction.NumSrcRegs; j++) {
 temp_regs[j] = 0;
@@ -6554,7 +6564,7 @@ static int tgsi_op3(struct r600_shader_ctx *ctx)
 continue;

 memset(&alu, 0, sizeof(struct r600_bytecode_alu));
-alu.op = ctx->inst_info->op;
+alu.op = op;
 for (j = 0; j < inst->Instruction.NumSrcRegs; j++) {
 r = tgsi_make_src_for_op3(ctx, temp_regs[j], i,
[j], >src[j]);
 if (r)
@@ -6580,10 +6590,14 @@ static int tgsi_dp(struct r600_shader_ctx *ctx)
 struct tgsi_full_instruction *inst =
>parse.FullToken.FullInstruction;
 struct r600_bytecode_alu alu;
 int i, j, r;
+unsigned op = ctx->inst_info->op;
+if (op == ALU_OP2_DOT4_IEEE &&
+ctx->info.properties[TGSI_PROPERTY_MUL_ZERO_WINS])
+op = ALU_OP2_DOT4;

 for (i = 0; i < 4; i++) {
 memset(&alu, 0, sizeof(struct r600_bytecode_alu));
-alu.op = ctx->inst_info->op;
+alu.op = op;
 for (j = 0; j < inst->Instruction.NumSrcRegs; j++) {
 r600_bytecode_src([j], >src[j], i);
 }
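
The opcode selection repeated in the three hunks above (tgsi_op2_s, tgsi_op3, tgsi_dp) follows one pattern: if the shader sets the MUL_ZERO_WINS property and the opcode is an IEEE multiply flavour, swap in the legacy twin. The sketch below folds that into a single helper purely for illustration; the patch itself keeps three separate checks, and the enum values are hypothetical stand-ins for the ALU_OP* names in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode names; the real ones are the ALU_OP* values in the diff. */
enum toy_alu_op {
   TOY_MUL,
   TOY_MUL_IEEE,
   TOY_MULADD,
   TOY_MULADD_IEEE,
   TOY_DOT4,
   TOY_DOT4_IEEE,
   TOY_ADD
};

/* Map an IEEE multiply-type opcode to its legacy twin when the shader asks for it. */
static enum toy_alu_op
select_mul_op(enum toy_alu_op op, bool mul_zero_wins)
{
   if (!mul_zero_wins)
      return op;

   switch (op) {
   case TOY_MUL_IEEE:    return TOY_MUL;
   case TOY_MULADD_IEEE: return TOY_MULADD;
   case TOY_DOT4_IEEE:   return TOY_DOT4;
   default:              return op;   /* non-multiply ops are untouched */
   }
}

/* Usage: only IEEE multiply flavours change, and only when requested. */
void
check_select_mul_op(void)
{
   assert(select_mul_op(TOY_MUL_IEEE, true) == TOY_MUL);
   assert(select_mul_op(TOY_MUL_IEEE, false) == TOY_MUL_IEEE);
   assert(select_mul_op(TOY_ADD, true) == TOY_ADD);
}
```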




Re: [Mesa-dev] [PATCH 3/7] gallium/radeon: adjust the rule for using the LINEAR_ALIGNED layout

2017-01-24 Thread Nicolai Hähnle

This patch breaks piglit

./bin/ext_image_dma_buf_import-refcount -auto -fbo

at least on Redwood. VI seems to be fine.

Nicolai

On 20.01.2017 20:07, Marek Olšák wrote:

From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_texture.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index cba4e7d..0b77c82 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -1177,21 +1177,23 @@ r600_choose_tiling(struct r600_common_screen *rscreen,
if (rscreen->chip_class >= SI &&
(templ->bind & PIPE_BIND_CURSOR))
return RADEON_SURF_MODE_LINEAR_ALIGNED;

if (templ->bind & PIPE_BIND_LINEAR)
return RADEON_SURF_MODE_LINEAR_ALIGNED;

/* Textures with a very small height are recommended to be 
linear. */
if (templ->target == PIPE_TEXTURE_1D ||
templ->target == PIPE_TEXTURE_1D_ARRAY ||
-   templ->height0 <= 4)
+   /* Only very thin and long 2D textures should benefit from
+* linear_aligned. */
+   (templ->width0 > 8 && templ->height0 <= 2))
return RADEON_SURF_MODE_LINEAR_ALIGNED;

/* Textures likely to be mapped often. */
if (templ->usage == PIPE_USAGE_STAGING ||
templ->usage == PIPE_USAGE_STREAM)
return RADEON_SURF_MODE_LINEAR_ALIGNED;
}

/* Make small textures 1D tiled. */
if (templ->width0 <= 16 || templ->height0 <= 16 ||
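
To make the behavioural change in the hunk above easier to see, here is a rough sketch of just the "small height" clause before and after the patch. It assumes stand-in types (the enum and function names are hypothetical) and ignores every other condition in r600_choose_tiling:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the gallium texture targets. */
enum sketch_target { SKETCH_TEX_1D, SKETCH_TEX_1D_ARRAY, SKETCH_TEX_2D };

/* Old clause: any texture with height <= 4 was made linear. */
static bool
sketch_linear_old(enum sketch_target target, unsigned width0, unsigned height0)
{
    (void)width0; /* the old rule did not look at the width */
    return target == SKETCH_TEX_1D ||
           target == SKETCH_TEX_1D_ARRAY ||
           height0 <= 4;
}

/* New clause: only very thin and long 2D textures are made linear. */
static bool
sketch_linear_new(enum sketch_target target, unsigned width0, unsigned height0)
{
    return target == SKETCH_TEX_1D ||
           target == SKETCH_TEX_1D_ARRAY ||
           (width0 > 8 && height0 <= 2);
}
```

For example, a 64x4 2D texture was linear under the old clause and is not under the new one, while a 64x2 texture is linear under both.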




Re: [Mesa-dev] [PATCH] vulkan: Don't install vk_platform.h or vulkan.h.

2017-01-24 Thread Emil Velikov
On 24 January 2017 at 18:02, Jason Ekstrand  wrote:
> On Tue, Jan 24, 2017 at 9:03 AM, Matt Turner  wrote:
>>
>> On Tue, Jan 24, 2017 at 8:41 AM, Emil Velikov 
>> wrote:
>> > On 24 January 2017 at 00:54, Matt Turner  wrote:
>> >> These files belong to the vulkan loader.
>> > Fully agreed, patch is
>> > Reviewed-by: Emil Velikov 
>>
>> Thanks!
>>
>> > Related question:
>> > I was wondering about getting this a step further:
>> >  - having the loader provide a .pc file
>> >  - tracking required version at configure time and dropping our local
>> > copies of the headers/xml.
>> >
>> > Would you be in favour, against, neutral of such an approach ?
>>
>> I'd be in favor of that, but let's see what Jason thinks.
>
>
> I'd rather not.  That would make sense if we all lived in the open-source
> world where everything is upstream all the time.  Unfortunately, not all of
> us have that luxury and we need to be able to work on experimental branches
> of the spec that may have more extensions than are provided by any loader
> version we can install.  I'd be ok with a check for a particular loader
> version just to force distros to update their loader but I would like to be
> able to build with arbitrary XML branches without having to install a branch
> of the loader.
What if I tell you that you wouldn't need to install the loader ;-)
More once we get the .pc patches in.

Emil


Re: [Mesa-dev] [PATCH 1/2] r600g: use ieee variants of multiplication instructions

2017-01-24 Thread Matteo Bruni
2017-01-24 19:15 GMT+01:00 Ilia Mirkin :
> On Tue, Jan 24, 2017 at 1:11 PM, Matteo Bruni  
> wrote:
>> 2017-01-24 3:18 GMT+01:00 Ilia Mirkin :
>>> This matches the behavior of most other drivers, including nouveau.
>>
>> Doesn't this break all the applications depending on d3d9 NaN behavior
>> (including, but not limited to, d3d9 games in Wine) on r600g?
>>
>> If I got this right, flipping around the two patches in this series
>> and enabling the TGSI_PROPERTY_MUL_ZERO_WINS flag for OpenGL
>> non-compute shaders (if that's not the case already) should avoid
>> regressions.
>
> This patch normalizes r600g wrt multiply handling with the other
> DX10/11 hardware drivers. nv50, nvc0, si, and i965 all use the IEEE
> behavior. I don't know for sure, but assume that nv30 and r300 have
> the DX9 behavior natively without IEEE support.
>
> The next patch allows for the MUL_ZERO_WINS property to be used to get
> the DX9 behavior, which st/nine will make use of.

That doesn't help Wine or any "native" OpenGL application which
happens to depend on the old behavior.
Even if there are none of them (which doesn't sound right to me)
applying this patch before 2/2 means that you are changing behavior
for nine in this one patch and changing it back again with the next,
which looks to me as something generally better avoided.
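
For anyone not familiar with the two multiply behaviours being argued about, a minimal sketch in plain C (the function names are made up for illustration, and this models the semantics only, not what the hardware instructions do internally):

```c
#include <math.h>

/* IEEE multiply: 0 * Inf and 0 * NaN both produce NaN. */
static float sketch_mul_ieee(float a, float b)
{
    return a * b;
}

/* "Zero wins" multiply (the D3D9-style rule): if either operand is
 * zero, the result is zero, even when the other one is Inf or NaN. */
static float sketch_mul_zero_wins(float a, float b)
{
    if (a == 0.0f || b == 0.0f)
        return 0.0f;
    return a * b;
}
```

A shader that relies on 0 * Inf being 0 gets a NaN with the first variant, which is the regression being worried about here.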


Re: [Mesa-dev] [PATCH 1/2] anv: bail out if using loader interface prior to v3

2017-01-24 Thread Jason Ekstrand
On Tue, Jan 24, 2017 at 8:17 AM, Emil Velikov 
wrote:

> On 24 January 2017 at 15:41, Chad Versace 
> wrote:
> > On Tue 24 Jan 2017, Emil Velikov wrote:
> >> From: Emil Velikov 
> >>
> >> Strictly speaking we could add support for v2 and earlier. At the same
> >> time, those tend to be buggy and as such there's limited testing done.
> >
> > I'm confused by the claim of "limited testing". Before my patch landed
> > that upgraded anvil to loader interface v3, the driver only supported
> > loader interface v1. And any differences between v1 and v2 are
> > negligible enough to not be the cause of any crash.
> >
> > So... is the real problem
> > a. anvil doesn't support loader interface v2, or
> > b. Fedora 25 ships a buggy loader, and this patch effectively forces
> >the user to upgrade the loader to a version in which the bug is
> >fixed.
> >
> > I have difficulty understanding how (a) could possibly be the problem.
> > Did some patches land in src/vulkan/wsi that broke the v2 interface? If
> > so, then this patch is probably justified.
> >
> > If the actual problem is (b), then I believe this patch is the wrong way
> > to fix it. The real fix should go into the loader. And this patch
> > prevents the driver working on systems where it should work.
> >
> I fully agree with your reasoning.
>
> B is the one to blame here. I may have been overzealous with the
> wording/approach, but the idea is there - how do we deal with issues
> reported against Mesa (ANV/RADV) where the problem seems to be in the
> loader?
> We don't want to have the behaviour we had with OpenGL where people
> jump to assumptions that ANV/RADV is broken because "it works" with
> binary driver FOO. Even when the crash/issue is outside Mesa.
>
> Looking at git log (as per the bugreport) I wonder if encouraging
> people to use updated loader (as this patch does) isn't that bad of a
> thing. Esp. since distros might not always see a reason otherwise.
>

For what it's worth, Fedora is in the process of updating their loader...

Also, I think we will want to do this eventually but not yet.  One of these
days, I'm going to rewrite the WSI implementation *again* to make it do
something useful in CreateFooSurface.


> > More comments below.
> >
> >> Cc: Jason Ekstrand 
> >> Cc: Shawn Starr 
> >> Cc: Chad Versace 
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99446
> >> Signed-off-by: Emil Velikov 
> >> ---
> >> Slightly pedantic, yet explicitly mentioned in the spec as a way to
> >> detect/manage older loader versions. Would have saved us a crash, so I'm
> >> wondering if we want it for stable ?
> >>
> >> Shawn considering you still have the old libvulkan.so around can you
> >> give this and/or 2/2 a test ?
> >> ---
> >>  src/intel/vulkan/anv_device.c | 8 
> >>  1 file changed, 8 insertions(+)
> >>
> >> diff --git a/src/intel/vulkan/anv_device.c
> b/src/intel/vulkan/anv_device.c
> >> index f80a36a940..e7aa81883a 100644
> >> --- a/src/intel/vulkan/anv_device.c
> >> +++ b/src/intel/vulkan/anv_device.c
> >> @@ -36,6 +36,8 @@
> >>
> >>  #include "genxml/gen7_pack.h"
> >>
> >> +static uint32_t loader_version;
> >> +
> >>  struct anv_dispatch_table dtable;
> >>
> >>  static void
> >> @@ -739,6 +741,11 @@ VKAPI_ATTR PFN_vkVoidFunction VKAPI_CALL
> vk_icdGetInstanceProcAddr(
> >>  VkInstance  instance,
> >>  const char* pName)
> >>  {
> >> +   if (loader_version < 3u) {
> >> +  fprintf(stderr, "WARNING: ANV supports Loader interface v3 or
> newer, v%u "
> >> +  "detected. Update your libvulkan.so.\n",
> loader_version);
> >> +  return NULL;
> >> +   }
> >> return anv_GetInstanceProcAddr(instance, pName);
> >>  }
> >>
> >> @@ -2075,6 +2082,7 @@ vk_icdNegotiateLoaderICDInterfaceVersion(uint32_t*
> pSupportedVersion)
> >>  *  vkDestroySurfaceKHR(), and other API which uses
> VKSurfaceKHR,
> >>  *  because the loader no longer does so.
> >>  */
> >> +   loader_version = *pSupportedVersion;
> >> *pSupportedVersion = MIN2(*pSupportedVersion, 3u);
> >> return VK_SUCCESS;
> >>  }
> >
> > If this patch does land, then this hunk needs fixing. If the driver
> > doesn't support loader interface version 2, then the loader spec
> > requires that we return VK_ERROR_INCOMPATIBLE_DRIVER here if
> > *pSupportedVersion < 3.
> >
> > The loader spec says:
> >
> > If the ICD receiving the call no longer supports the interface
> > version provided  by the loader (due to deprecation), then it should
> > report VK_ERROR_INCOMPATIBLE_DRIVER error.  Otherwise it sets the
> > value pointed by "pSupportedVersion" to the latest interface version
> > supported by both the ICD and the loader and returns 
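
A rough sketch of the negotiation behaviour described in the review comment above, assuming the driver really wants to refuse anything older than v3. The types and names are stand-ins so this builds without vulkan.h; only the numeric values of VK_SUCCESS and VK_ERROR_INCOMPATIBLE_DRIVER are taken from the Vulkan headers:

```c
#include <stdint.h>

/* Stand-ins for VkResult so the sketch is self-contained. */
typedef int32_t sketch_result;
#define SKETCH_SUCCESS              0     /* value of VK_SUCCESS */
#define SKETCH_INCOMPATIBLE_DRIVER  (-9)  /* value of VK_ERROR_INCOMPATIBLE_DRIVER */

/* Hypothetical limits: oldest and newest loader interface we accept. */
#define SKETCH_MIN_INTERFACE 3u
#define SKETCH_MAX_INTERFACE 3u

static sketch_result
sketch_negotiate_interface_version(uint32_t *supported_version)
{
    /* Loader is too old for us: report it instead of limping along. */
    if (*supported_version < SKETCH_MIN_INTERFACE)
        return SKETCH_INCOMPATIBLE_DRIVER;

    /* Otherwise settle on the newest version both sides understand. */
    if (*supported_version > SKETCH_MAX_INTERFACE)
        *supported_version = SKETCH_MAX_INTERFACE;

    return SKETCH_SUCCESS;
}
```

With this shape there is no need for a separate loader_version global checked later in vk_icdGetInstanceProcAddr; the refusal happens once, at negotiation time.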

Re: [Mesa-dev] [PATCH 1/6] anv: Set viewport extents correctly when height is negative

2017-01-24 Thread Lionel Landwerlin

On 24/01/17 17:40, Jason Ekstrand wrote:
On Tue, Jan 24, 2017 at 12:49 AM, Iago Toral wrote:


On Mon, 2017-01-23 at 14:12 -0800, Jason Ekstrand wrote:
> As per VK_KHR_maintenance1, setting a negative height in the viewport
> can be used to get flipped coordinates.  This is, apparently, very useful
> when porting D3D apps to Vulkan.  All we need to do to support this is
> to make sure we actually set the min and max correctly.
> ---
>  src/intel/vulkan/gen8_cmd_buffer.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/vulkan/gen8_cmd_buffer.c
> b/src/intel/vulkan/gen8_cmd_buffer.c
> index f22037b..ab68872 100644
> --- a/src/intel/vulkan/gen8_cmd_buffer.c
> +++ b/src/intel/vulkan/gen8_cmd_buffer.c
> @@ -59,8 +59,8 @@ gen8_cmd_buffer_emit_viewport(struct anv_cmd_buffer *cmd_buffer)
>   .YMaxClipGuardband = 1.0f,
>   .XMinViewPort = vp->x,
>   .XMaxViewPort = vp->x + vp->width - 1,
> - .YMinViewPort = vp->y,
> - .YMaxViewPort = vp->y + vp->height - 1,
> + .YMinViewPort = MIN2(vp->y, vp->y + vp->height),
> + .YMaxViewPort = MAX2(vp->y, vp->y + vp->height) - 1,
>};

If we have y = 0 and height = -100, shouldn't we use YMinVP = -99 and
YMaxVP = 0 instead of (-100, -1)?


No, I think we still want -100, -1.  In the case mentioned, the Y 
region, in floating-point, is [-100, 0]. However, it appears that, 
even though it's float, we're expected to provide max-1 in the max fields.


Thanks for the explanation!

Reviewed-by: Lionel Landwerlin 


>GENX(SF_CLIP_VIEWPORT_pack)(NULL, sf_clip_state.map + i * 64,
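
To check the numbers discussed in this thread, here is a small sketch of the Y extent computation from the patch. The struct and function names are hypothetical; only the MIN2/MAX2 expressions mirror the diff:

```c
#define SKETCH_MIN2(a, b) ((a) < (b) ? (a) : (b))
#define SKETCH_MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical stand-in for the relevant VkViewport fields. */
struct sketch_viewport {
    float y;
    float height;
};

struct sketch_y_extents {
    float y_min;
    float y_max;
};

/* Works for both positive and negative (flipped) heights. */
static struct sketch_y_extents
sketch_viewport_y_extents(struct sketch_viewport vp)
{
    struct sketch_y_extents e;
    e.y_min = SKETCH_MIN2(vp.y, vp.y + vp.height);
    e.y_max = SKETCH_MAX2(vp.y, vp.y + vp.height) - 1;
    return e;
}
```

With y = 0 and height = -100 this gives (-100, -1), matching Jason's answer; with height = 100 it gives (0, 99), the same as the old code.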






Re: [Mesa-dev] [PATCH] spirv: handle gl_SampleMask

2017-01-24 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Tue, Jan 24, 2017 at 4:48 AM, Iago Toral Quiroga 
wrote:

> SPIR-V maps both gl_SampleMask and gl_SampleMaskIn to the same
> builtin (SampleMask). The only way to tell which one we are dealing with
> is to check if it is an input or an output.
>
> Fixes:
> dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.write.*
> ---
> I am still waiting on Jenkins to report results from this patch, but for
> some reason it is taking surprisingly long so I figured I'd send it for
> review ahead of the results, I don't expect regressions, but I'll verify
> there aren't any when I get them in any case.
>
>  src/compiler/spirv/vtn_variables.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/spirv/vtn_variables.c b/src/compiler/spirv/vtn_
> variables.c
> index d55f81e..4d1ec78 100644
> --- a/src/compiler/spirv/vtn_variables.c
> +++ b/src/compiler/spirv/vtn_variables.c
> @@ -975,8 +975,12 @@ vtn_get_builtin_location(struct vtn_builder *b,
>set_mode_system_value(mode);
>break;
> case SpvBuiltInSampleMask:
> -  *location = SYSTEM_VALUE_SAMPLE_MASK_IN; /* XXX out? */
> -  set_mode_system_value(mode);
> +  if (*mode == nir_var_shader_out) {
> + *location = FRAG_RESULT_SAMPLE_MASK;
> +  } else {
> + *location = SYSTEM_VALUE_SAMPLE_MASK_IN;
> + set_mode_system_value(mode);
> +  }
>break;
> case SpvBuiltInFragDepth:
>*location = FRAG_RESULT_DEPTH;
> --
> 2.7.4
>
>


[Mesa-dev] [Bug 97879] [amdgpu] Rocket League: long hangs (several seconds) when loading assets (models/textures/shaders?)

2017-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97879

--- Comment #53 from Timothee Besset  ---
Hello! I have started working on this. I haven't found the root cause yet but I
will update here when I have something.

(For context, I did the initial port work for Psyonix. I just recently got a
radeonsi setup together so I can look at this now.)

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH 1/2] r600g: use ieee variants of multiplication instructions

2017-01-24 Thread Ilia Mirkin
On Tue, Jan 24, 2017 at 1:11 PM, Matteo Bruni  wrote:
> 2017-01-24 3:18 GMT+01:00 Ilia Mirkin :
>> This matches the behavior of most other drivers, including nouveau.
>
> Doesn't this break all the applications depending on d3d9 NaN behavior
> (including, but not limited to, d3d9 games in Wine) on r600g?
>
> If I got this right, flipping around the two patches in this series
> and enabling the TGSI_PROPERTY_MUL_ZERO_WINS flag for OpenGL
> non-compute shaders (if that's not the case already) should avoid
> regressions.

This patch normalizes r600g wrt multiply handling with the other
DX10/11 hardware drivers. nv50, nvc0, si, and i965 all use the IEEE
behavior. I don't know for sure, but assume that nv30 and r300 have
the DX9 behavior natively without IEEE support.

The next patch allows for the MUL_ZERO_WINS property to be used to get
the DX9 behavior, which st/nine will make use of.

Cheers,

  -ilia


Re: [Mesa-dev] [PATCH] glsl: fix compile errors with mingw due to missing PRIx64 definitions

2017-01-24 Thread Roland Scheidegger
Am 24.01.2017 um 14:23 schrieb Jose Fonseca:
> On 23/01/17 19:21, srol...@vmware.com wrote:
>> From: Roland Scheidegger 
>>
>> define __STDC_FORMAT_MACROS and include <inttypes.h> (same as
>> ir_builder_print_visitor.cpp already does).
>>
>> Otherwise, some mingw build errors out (since
>> 8e7e1ae0365ddc7edb0d4d98250ab46728e6c14a and
>> bbce1c538dc0cb8bf3769510283d11847dc07540 presumably) with:
>> src/compiler/glsl/ir_print_visitor.cpp:479:40: error: expected ‘)’
>> before ‘PRIu64’
>>case GLSL_TYPE_UINT64:fprintf(f, "%" PRIu64, ir->value.u64[i]); break;
>>
>> (Note even with that fix I get other format specifier warnings:
>> src/compiler/glsl/ir_print_visitor.cpp:473:47:
>> warning: unknown conversion type character ‘a’ in format [-Wformat=]
>> fprintf(f, "%a", ir->value.f[i]);
>>^
>> src/compiler/glsl/ir_print_visitor.cpp:473:47:
>> warning: too many arguments for format [-Wformat-extra-args]
>> but it still compiles at least)
>> ---
>>  src/compiler/glsl/glsl_parser_extras.cpp | 2 ++
>>  src/compiler/glsl/ir_print_visitor.cpp   | 2 ++
>>  2 files changed, 4 insertions(+)
>>
>> diff --git a/src/compiler/glsl/glsl_parser_extras.cpp
>> b/src/compiler/glsl/glsl_parser_extras.cpp
>> index e888090..3d2fc14 100644
>> --- a/src/compiler/glsl/glsl_parser_extras.cpp
>> +++ b/src/compiler/glsl/glsl_parser_extras.cpp
>> @@ -20,6 +20,8 @@
>>   * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>>   * DEALINGS IN THE SOFTWARE.
>>   */
>> +#define __STDC_FORMAT_MACROS 1
>> +#include <inttypes.h> /* for PRIx64 macro */
>>  #include 
>>  #include 
>>  #include 
>> diff --git a/src/compiler/glsl/ir_print_visitor.cpp
>> b/src/compiler/glsl/ir_print_visitor.cpp
>> index 0763277..debbdad 100644
>> --- a/src/compiler/glsl/ir_print_visitor.cpp
>> +++ b/src/compiler/glsl/ir_print_visitor.cpp
>> @@ -21,6 +21,8 @@
>>   * DEALINGS IN THE SOFTWARE.
>>   */
>>
>> +#define __STDC_FORMAT_MACROS 1
>> +#include <inttypes.h> /* for PRIx64 macro */
>>  #include "ir_print_visitor.h"
>>  #include "compiler/glsl_types.h"
>>  #include "glsl_parser_extras.h"
>>
> 
> Reviewed-by: Jose Fonseca 
> 
> But I think it might be more efficient to define this in configure.ac
> and scons/gallium.py like we already do for other __STDC__MACROS.
> 
> Jose

Sounds reasonable, but I'll leave that to someone else...

Roland
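
For reference, the pattern the patch relies on looks roughly like this. It is a minimal sketch, not code from the tree; the helper names are invented. The define has to come before the first include of <inttypes.h>, and it only matters for older C++ toolchains (in C the PRI* macros are always available):

```c
/* Must be defined before <inttypes.h> is first included. */
#define __STDC_FORMAT_MACROS 1
#include <inttypes.h> /* PRIu64, PRIx64 */
#include <stdio.h>

/* Format a 64-bit unsigned value in decimal, portably. */
static int sketch_format_u64(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%" PRIu64, value);
}

/* Same value in lowercase hex. */
static int sketch_format_x64(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%" PRIx64, value);
}
```

Defining the macro on the compiler command line, as Jose suggests, would have the same effect without touching each source file.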



Re: [Mesa-dev] [PATCH] spirv: handle gl_SampleMask

2017-01-24 Thread Anuj Phogat
On Tue, Jan 24, 2017 at 4:48 AM, Iago Toral Quiroga  wrote:
> SPIR-V maps both gl_SampleMask and gl_SampleMaskIn to the same
> builtin (SampleMask). The only way to tell which one we are dealing with
> is to check if it is an input or an output.
>
> Fixes:
> dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.write.*
> ---
> I am still waiting on Jenkins to report results from this patch, but for
> some reason it is taking surprisingly long so I figured I'd send it for
> review ahead of the results, I don't expect regressions, but I'll verify
> there aren't any when I get them in any case.
>
>  src/compiler/spirv/vtn_variables.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/spirv/vtn_variables.c 
> b/src/compiler/spirv/vtn_variables.c
> index d55f81e..4d1ec78 100644
> --- a/src/compiler/spirv/vtn_variables.c
> +++ b/src/compiler/spirv/vtn_variables.c
> @@ -975,8 +975,12 @@ vtn_get_builtin_location(struct vtn_builder *b,
>set_mode_system_value(mode);
>break;
> case SpvBuiltInSampleMask:
> -  *location = SYSTEM_VALUE_SAMPLE_MASK_IN; /* XXX out? */
> -  set_mode_system_value(mode);
> +  if (*mode == nir_var_shader_out) {
> + *location = FRAG_RESULT_SAMPLE_MASK;
> +  } else {
> + *location = SYSTEM_VALUE_SAMPLE_MASK_IN;
> + set_mode_system_value(mode);
> +  }
>break;
> case SpvBuiltInFragDepth:
>*location = FRAG_RESULT_DEPTH;
> --
> 2.7.4
>


Reviewed-by: Anuj Phogat 
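
As a summary of the logic in the quoted hunk, a small sketch with stand-in enums (the names are hypothetical, not the real NIR/SPIR-V definitions, and the effect of set_mode_system_value is assumed to be "switch the variable mode to system value"):

```c
/* Hypothetical stand-ins for the NIR variable modes. */
enum sketch_mode {
    SKETCH_SHADER_IN,
    SKETCH_SHADER_OUT,
    SKETCH_SYSTEM_VALUE
};

/* Hypothetical stand-ins for the two possible locations. */
enum sketch_location {
    SKETCH_SYSTEM_VALUE_SAMPLE_MASK_IN,
    SKETCH_FRAG_RESULT_SAMPLE_MASK
};

/* One SPIR-V builtin, two meanings: outputs are the written sample
 * mask, everything else is the incoming coverage mask. */
static enum sketch_location
sketch_sample_mask_location(enum sketch_mode *mode)
{
    if (*mode == SKETCH_SHADER_OUT)
        return SKETCH_FRAG_RESULT_SAMPLE_MASK;

    *mode = SKETCH_SYSTEM_VALUE;
    return SKETCH_SYSTEM_VALUE_SAMPLE_MASK_IN;
}
```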


Re: [Mesa-dev] [PATCH 1/2] r600g: use ieee variants of multiplication instructions

2017-01-24 Thread Matteo Bruni
2017-01-24 3:18 GMT+01:00 Ilia Mirkin :
> This matches the behavior of most other drivers, including nouveau.

Doesn't this break all the applications depending on d3d9 NaN behavior
(including, but not limited to, d3d9 games in Wine) on r600g?

If I got this right, flipping around the two patches in this series
and enabling the TGSI_PROPERTY_MUL_ZERO_WINS flag for OpenGL
non-compute shaders (if that's not the case already) should avoid
regressions.


[Mesa-dev] [PATCH mesa 1/2] egl: update headers from registry

2017-01-24 Thread Eric Engestrom
Khronos introduced a new macro (suggested by Google) to avoid using
C-style casts in C++ code, as those generate warnings.

Khronos Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16113
Signed-off-by: Eric Engestrom 
---
 include/EGL/egl.h |  24 +++---
 include/EGL/eglext.h  | 197 +++---
 include/EGL/eglplatform.h |  10 ++-
 3 files changed, 206 insertions(+), 25 deletions(-)

diff --git a/include/EGL/egl.h b/include/EGL/egl.h
index 0d514e4def..29f30d94de 100644
--- a/include/EGL/egl.h
+++ b/include/EGL/egl.h
@@ -6,7 +6,7 @@ extern "C" {
 #endif
 
 /*
-** Copyright (c) 2013-2014 The Khronos Group Inc.
+** Copyright (c) 2013-2017 The Khronos Group Inc.
 **
 ** Permission is hereby granted, free of charge, to any person obtaining a
 ** copy of this software and/or associated documentation files (the
@@ -31,14 +31,14 @@ extern "C" {
 ** This header is generated from the Khronos OpenGL / OpenGL ES XML
 ** API Registry. The current version of the Registry, generator scripts
 ** used to make the header, and the header can be found at
-**   http://www.opengl.org/registry/
+**   http://www.opengl.org/registry/egl
 **
-** Khronos $Revision: 31039 $ on $Date: 2015-05-04 17:01:57 -0700 (Mon, 04 May 2015) $
+** Khronos $Revision$ on $Date$
 */
 
 #include 
 
-/* Generated on date 20150504 */
+/* Generated on date 20161230 */
 
 /* Generated C header for:
  * API: egl
@@ -78,7 +78,7 @@ typedef void (*__eglMustCastToProperFunctionPointerType)(void);
 #define EGL_CONFIG_ID                     0x3028
 #define EGL_CORE_NATIVE_ENGINE            0x305B
 #define EGL_DEPTH_SIZE                    0x3025
-#define EGL_DONT_CARE                     ((EGLint)-1)
+#define EGL_DONT_CARE                     EGL_CAST(EGLint,-1)
 #define EGL_DRAW                          0x3059
 #define EGL_EXTENSIONS                    0x3055
 #define EGL_FALSE                         0
@@ -95,9 +95,9 @@ typedef void (*__eglMustCastToProperFunctionPointerType)(void);
 #define EGL_NONE                          0x3038
 #define EGL_NON_CONFORMANT_CONFIG         0x3051
 #define EGL_NOT_INITIALIZED               0x3001
-#define EGL_NO_CONTEXT                    ((EGLContext)0)
-#define EGL_NO_DISPLAY                    ((EGLDisplay)0)
-#define EGL_NO_SURFACE                    ((EGLSurface)0)
+#define EGL_NO_CONTEXT                    EGL_CAST(EGLContext,0)
+#define EGL_NO_DISPLAY                    EGL_CAST(EGLDisplay,0)
+#define EGL_NO_SURFACE                    EGL_CAST(EGLSurface,0)
 #define EGL_PBUFFER_BIT                   0x0001
 #define EGL_PIXMAP_BIT                    0x0002
 #define EGL_READ                          0x305A
@@ -197,7 +197,7 @@ typedef void *EGLClientBuffer;
 #define EGL_RGB_BUFFER                    0x308E
 #define EGL_SINGLE_BUFFER                 0x3085
 #define EGL_SWAP_BEHAVIOR                 0x3093
-#define EGL_UNKNOWN                       ((EGLint)-1)
+#define EGL_UNKNOWN                       EGL_CAST(EGLint,-1)
 #define EGL_VERTICAL_RESOLUTION           0x3091
 EGLAPI EGLBoolean EGLAPIENTRY eglBindAPI (EGLenum api);
 EGLAPI EGLenum EGLAPIENTRY eglQueryAPI (void);
@@ -224,7 +224,7 @@ EGLAPI EGLBoolean EGLAPIENTRY eglWaitClient (void);
 
 #ifndef EGL_VERSION_1_4
 #define EGL_VERSION_1_4 1
-#define EGL_DEFAULT_DISPLAY               ((EGLNativeDisplayType)0)
+#define EGL_DEFAULT_DISPLAY               EGL_CAST(EGLNativeDisplayType,0)
 #define EGL_MULTISAMPLE_RESOLVE_BOX_BIT   0x0200
 #define EGL_MULTISAMPLE_RESOLVE           0x3099
 #define EGL_MULTISAMPLE_RESOLVE_DEFAULT   0x309A
@@ -266,7 +266,7 @@ typedef void *EGLImage;
 #define EGL_FOREVER                       0xFFFFFFFFFFFFFFFFull
 #define EGL_TIMEOUT_EXPIRED               0x30F5
 #define EGL_CONDITION_SATISFIED           0x30F6
-#define EGL_NO_SYNC                       ((EGLSync)0)
+#define EGL_NO_SYNC                       EGL_CAST(EGLSync,0)
 #define EGL_SYNC_FENCE                    0x30F9
 #define EGL_GL_COLORSPACE                 0x309D
 #define EGL_GL_COLORSPACE_SRGB            0x3089
@@ -283,7 +283,7 @@ typedef void *EGLImage;
 #define EGL_GL_TEXTURE_CUBE_MAP_POSITIVE_Z 0x30B7
 #define EGL_GL_TEXTURE_CUBE_MAP_NEGATIVE_Z 0x30B8
 #define EGL_IMAGE_PRESERVED               0x30D2
-#define EGL_NO_IMAGE                      ((EGLImage)0)
+#define EGL_NO_IMAGE                      EGL_CAST(EGLImage,0)
 EGLAPI EGLSync EGLAPIENTRY eglCreateSync (EGLDisplay dpy, EGLenum type, const EGLAttrib *attrib_list);
 EGLAPI EGLBoolean EGLAPIENTRY eglDestroySync (EGLDisplay dpy, EGLSync sync);
 EGLAPI EGLint EGLAPIENTRY eglClientWaitSync (EGLDisplay dpy, EGLSync sync, EGLint flags, EGLTime timeout);
diff --git a/include/EGL/eglext.h b/include/EGL/eglext.h
index 4ccbab8927..bc8f0bab23 100644
--- a/include/EGL/eglext.h
+++ b/include/EGL/eglext.h
@@ -6,7 +6,7 @@ extern "C" {
 #endif
 
 /*
-** Copyright (c) 2013-2016 The Khronos Group Inc.
+** Copyright (c) 2013-2017 The Khronos Group Inc.
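
The patch text is cut off before the eglplatform.h hunk, so the actual macro definition is not visible here. As an illustration only, a cast macro of this kind can be written as below; the names are deliberately suffixed with SKETCH to make clear they are assumptions, not the Khronos definitions:

```c
#include <stdint.h>

/* In C++ use a named cast (no old-style-cast warning); in C fall back
 * to the ordinary cast. */
#if defined(__cplusplus)
#define EGL_CAST_SKETCH(type, value) (static_cast<type>(value))
#else
#define EGL_CAST_SKETCH(type, value) ((type) (value))
#endif

/* Hypothetical stand-ins for EGLint and EGLContext. */
typedef int32_t sketch_int_t;
typedef void *sketch_context_t;

#define SKETCH_DONT_CARE  EGL_CAST_SKETCH(sketch_int_t, -1)
#define SKETCH_NO_CONTEXT EGL_CAST_SKETCH(sketch_context_t, 0)

/* Callers compare against the constant exactly as before. */
static int sketch_context_is_valid(sketch_context_t ctx)
{
    return ctx != SKETCH_NO_CONTEXT;
}
```

The constants keep the same values in both languages; only the spelling of the cast changes.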

[Mesa-dev] [PATCH mesa 2/2] egl: EGL_PLATFORM_SURFACELESS_MESA is now upstream

2017-01-24 Thread Eric Engestrom
EGL_PLATFORM_SURFACELESS_MESA is in eglext.h as of last commit.

Signed-off-by: Eric Engestrom 
---
 include/EGL/eglmesaext.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/include/EGL/eglmesaext.h b/include/EGL/eglmesaext.h
index 405d0e9ee4..3a1b88e3d1 100644
--- a/include/EGL/eglmesaext.h
+++ b/include/EGL/eglmesaext.h
@@ -85,11 +85,6 @@
 #define EGL_NO_CONFIG_MESA ((EGLConfig)0)
 #endif
 
-#ifndef EGL_MESA_platform_surfaceless
-#define EGL_MESA_platform_surfaceless 1
-#define EGL_PLATFORM_SURFACELESS_MESA   0x31DD
-#endif /* EGL_MESA_platform_surfaceless */
-
 #ifdef __cplusplus
 }
 #endif
-- 
Cheers,
  Eric



Re: [Mesa-dev] [PATCH] vulkan: Don't install vk_platform.h or vulkan.h.

2017-01-24 Thread Jason Ekstrand
On Tue, Jan 24, 2017 at 9:03 AM, Matt Turner  wrote:

> On Tue, Jan 24, 2017 at 8:41 AM, Emil Velikov 
> wrote:
> > On 24 January 2017 at 00:54, Matt Turner  wrote:
> >> These files belong to the vulkan loader.
> > Fully agreed, patch is
> > Reviewed-by: Emil Velikov 
>
> Thanks!
>
> > Related question:
> > I was wondering about getting this a step further:
> >  - having the loader provide a .pc file
> >  - tracking required version at configure time and dropping our local
> > copies of the headers/xml.
> >
> > Would you be in favour, against, neutral of such an approach ?
>
> I'd be in favor of that, but let's see what Jason thinks.
>

I'd rather not.  That would make sense if we all lived in the open-source
world where everything is upstream all the time.  Unfortunately, not all of
us have that luxury and we need to be able to work on experimental branches
of the spec that may have more extensions than are provided by any loader
version we can install.  I'd be ok with a check for a particular loader
version just to force distros to update their loader but I would like to be
able to build with arbitrary XML branches without having to install a
branch of the loader.


[Mesa-dev] [Bug 97879] [amdgpu] Rocket League: long hangs (several seconds) when loading assets (models/textures/shaders?)

2017-01-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97879

--- Comment #52 from Marek Olšák  ---
We don't need a debug build. We just need:
1) One person to run the debug build and use sysprof to capture where the CPU
is spending time during the freeze.
2) Make a screenshot of the sysprof window and send it to the game developer.
3) The game developer should look at it and decide what to do next.

sysprof is a very easy-to-use standalone CPU profiler GUI that you run as
root. It observes all processes and also the kernel. For apps built with -g
(but also keep -O2 at least), it will show the functions and % of CPU time
spent in them. For apps also built with -fno-omit-frame-pointer, it will show
whole call stacks.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH V3] glsl: lower constant arrays to uniform arrays before optimisation loop

2017-01-24 Thread Eric Anholt
Timothy Arceri  writes:

> From: Timothy Arceri 
>
> Previously the constant array would not get copy propagated until the backend
> did its GLSL IR opt loop. I plan on removing that from i965 shortly which
> caused huge regressions in Deus-ex and Tomb Raider which have large
> constant arrays. Moving lowering before the opt loop in the GLSL linker
> fixes this and unexpectedly improves some compute shaders also.

It seems like we should figure out what's missing in NIR that the lack
of GLSL copy propagation hurt, but this is a pretty easy fix for now:

Reviewed-by: Eric Anholt 



