[Mesa-dev] [PATCH] nvc0: don't try to go through the push path for indirect draws

2016-05-14 Thread Ilia Mirkin
This fixes

dEQP-GLES31.functional.draw_indirect.draw_elements_indirect.*.default_attribute

These tests were causing a const vbo to be set up, and were small enough
draws that the logic was trying to go via the push path (which emits
data directly into the cmd stream rather than uploading a user vbo).

Signed-off-by: Ilia Mirkin 
Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c b/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
index 4d9cd57..888c094 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
@@ -948,7 +948,8 @@ nvc0_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
 * if index count is larger and we expect repeated vertices, suggest upload.
 */
nvc0->vbo_push_hint =
-  info->indexed && (nvc0->vb_elt_limit >= (info->count * 2));
+  !info->indirect && info->indexed &&
+  (nvc0->vb_elt_limit >= (info->count * 2));
 
/* Check whether we want to switch vertex-submission mode. */
if (nvc0->vbo_user && !(nvc0->dirty_3d & (NVC0_NEW_3D_ARRAYS | NVC0_NEW_3D_VERTEX))) {
-- 
2.7.3
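
For readers skimming the diff, the hint after this change can be sketched in isolation. This is only an illustration: struct draw_params and vbo_push_hint() are made-up stand-ins, not the real pipe_draw_info or nvc0 context; only the three conditions are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the draw-info fields that the
 * hint looks at; not the real gallium struct. */
struct draw_params {
   bool indexed;    /* draw uses an index buffer */
   bool indirect;   /* draw parameters are read from a buffer */
   uint32_t count;  /* index count supplied with the draw call */
};

/* Mirrors the condition in the patch: only suggest the push path for
 * direct, indexed draws whose index count is small relative to the
 * vertex buffer element limit. */
static bool
vbo_push_hint(const struct draw_params *info, uint32_t vb_elt_limit)
{
   assert(info);
   return !info->indirect && info->indexed &&
          vb_elt_limit >= info->count * 2;
}
```

With these stand-ins, a small indexed draw still gets the hint, while the same draw marked indirect never does, whatever its count field says.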

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 95395] glsl: NULL type value in add_uniform() leads to SIGSEGV

2016-05-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95395

--- Comment #2 from Jonathan Gray  ---
None of the builds were with LLVM enabled.  Interestingly I can't reproduce
this on sparc64 which requires strict alignment.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] [RFC PATCH] clover: add LLVM version to device and platform version

2016-05-14 Thread Francisco Jerez
Giuseppe Bilotta  writes:

> Code generation (kernel compilation) may sometimes hit LLVM-specific
> bugs. Adding the used LLVM version to the version string may make bug
> triaging easier.
>
> Signed-off-by: Giuseppe Bilotta 

Acked-by: Francisco Jerez 

> ---
>  configure.ac   | 2 +-
>  src/gallium/state_trackers/clover/api/device.cpp   | 2 +-
>  src/gallium/state_trackers/clover/api/platform.cpp | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
>
> I believe similar additions could be made for OpenGL version strings as well.
>
> diff --git a/configure.ac b/configure.ac
> index 023110e..4fcadcf 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2116,7 +2116,7 @@ if test "x$enable_gallium_llvm" = xyes; then
>  LLVM_COMPONENTS="${LLVM_COMPONENTS} all-targets ipo linker instrumentation"
>  LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader option objcarcopts profiledata"
>  fi
> -DEFINES="${DEFINES} -DHAVE_LLVM=0x0$LLVM_VERSION_INT -DMESA_LLVM_VERSION_PATCH=$LLVM_VERSION_PATCH"
> +DEFINES="${DEFINES} -DHAVE_LLVM=0x0$LLVM_VERSION_INT -DMESA_LLVM_VERSION_PATCH=$LLVM_VERSION_PATCH '-DMESA_LLVM_VERSION_STRING=\"$LLVM_VERSION_MAJOR.$LLVM_VERSION_MINOR.$LLVM_VERSION_PATCH\"'"
>  MESA_LLVM=1
>  
>  dnl Check for Clang internal headers
> diff --git a/src/gallium/state_trackers/clover/api/device.cpp 
> b/src/gallium/state_trackers/clover/api/device.cpp
> index bc93f91..0d0f77b 100644
> --- a/src/gallium/state_trackers/clover/api/device.cpp
> +++ b/src/gallium/state_trackers/clover/api/device.cpp
> @@ -300,7 +300,7 @@ clGetDeviceInfo(cl_device_id d_dev, cl_device_info param,
>break;
>  
> case CL_DEVICE_VERSION:
> -  buf.as_string() = "OpenCL 1.1 MESA " PACKAGE_VERSION;
> +  buf.as_string() = "OpenCL 1.1 MESA " PACKAGE_VERSION " LLVM " 
> MESA_LLVM_VERSION_STRING;
>break;
>  
> case CL_DEVICE_EXTENSIONS:
> diff --git a/src/gallium/state_trackers/clover/api/platform.cpp 
> b/src/gallium/state_trackers/clover/api/platform.cpp
> index cf71593..06eb4ec 100644
> --- a/src/gallium/state_trackers/clover/api/platform.cpp
> +++ b/src/gallium/state_trackers/clover/api/platform.cpp
> @@ -57,7 +57,7 @@ clover::GetPlatformInfo(cl_platform_id d_platform, 
> cl_platform_info param,
>break;
>  
> case CL_PLATFORM_VERSION:
> -  buf.as_string() = "OpenCL 1.1 MESA " PACKAGE_VERSION;
> +  buf.as_string() = "OpenCL 1.1 MESA " PACKAGE_VERSION " LLVM " 
> MESA_LLVM_VERSION_STRING;
>break;
>  
> case CL_PLATFORM_NAME:
> -- 
> 2.8.1.372.g9612035
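
The quoted patch relies on the compiler concatenating adjacent string literals, with the version pieces supplied as -D defines. A minimal sketch of that mechanism follows; the fallback version numbers are invented for illustration and device_version_string() is a hypothetical helper, not clover code.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for values the build system would normally pass with -D;
 * the concrete version numbers here are made up. */
#ifndef PACKAGE_VERSION
#define PACKAGE_VERSION "0.0.0-example"
#endif
#ifndef MESA_LLVM_VERSION_STRING
#define MESA_LLVM_VERSION_STRING "0.0.0"
#endif

/* Adjacent string literals are joined at compile time, so no runtime
 * formatting is needed to build the version string. */
static const char *
device_version_string(void)
{
   /* An empty define would point at a broken build configuration. */
   assert(sizeof(MESA_LLVM_VERSION_STRING) > 1);
   return "OpenCL 1.1 MESA " PACKAGE_VERSION " LLVM " MESA_LLVM_VERSION_STRING;
}
```

This is also why the configure.ac hunk needs the escaped quotes: the define has to expand to a string literal, not a bare token.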
>




Re: [Mesa-dev] [PATCH 1/3] st/dri: don't call close(-1) in dri{2, kms_}_init_screen error path

2016-05-14 Thread Leo Liu

Series is:
Reviewed-by: Leo Liu 

On 05/14/2016 11:33 AM, Emil Velikov wrote:

Add separate labels and jump to the correct one as needed.

Signed-off-by: Emil Velikov 
---
  src/gallium/state_trackers/dri/dri2.c | 30 --
  1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/src/gallium/state_trackers/dri/dri2.c b/src/gallium/state_trackers/dri/dri2.c
index 675a9bb..2330530 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -1714,7 +1714,7 @@ dri2_init_screen(__DRIscreen * sPriv)
 struct pipe_screen *pscreen = NULL;
 const struct drm_conf_ret *throttle_ret;
 const struct drm_conf_ret *dmabuf_ret;
-   int fd = -1;
+   int fd;
  
 screen = CALLOC_STRUCT(dri_screen);

 if (!screen)
@@ -1727,13 +1727,13 @@ dri2_init_screen(__DRIscreen * sPriv)
 sPriv->driverPrivate = (void *)screen;
  
 if (screen->fd < 0 || (fd = dup(screen->fd)) < 0)

-  goto fail;
+  goto free_screen;
  
 if (pipe_loader_drm_probe_fd(&screen->dev, fd))

pscreen = pipe_loader_create_screen(screen->dev);
  
 if (!pscreen)

-   goto fail;
+   goto release_pipe;
  
 throttle_ret = pipe_loader_configuration(screen->dev, DRM_CONF_THROTTLE);

 dmabuf_ret = pipe_loader_configuration(screen->dev, DRM_CONF_SHARE_FD);
@@ -1762,7 +1762,7 @@ dri2_init_screen(__DRIscreen * sPriv)
  
 configs = dri_init_screen_helper(screen, pscreen, screen->dev->driver_name);

 if (!configs)
-  goto fail;
+  goto destroy_screen;
  
 screen->can_share_buffer = true;

 screen->auto_fake_front = dri_with_format(sPriv);
@@ -1770,12 +1770,17 @@ dri2_init_screen(__DRIscreen * sPriv)
 screen->lookup_egl_image = dri2_lookup_egl_image;
  
 return configs;

-fail:
+
+destroy_screen:
 dri_destroy_screen_helper(screen);
+
+release_pipe:
 if (screen->dev)
pipe_loader_release(&screen->dev, 1);
 else
close(fd);
+
+free_screen:
 FREE(screen);
 return NULL;
  }
@@ -1793,7 +1798,7 @@ dri_kms_init_screen(__DRIscreen * sPriv)
 struct dri_screen *screen;
 struct pipe_screen *pscreen = NULL;
 uint64_t cap;
-   int fd = -1;
+   int fd;
  
 screen = CALLOC_STRUCT(dri_screen);

 if (!screen)
@@ -1805,13 +1810,13 @@ dri_kms_init_screen(__DRIscreen * sPriv)
 sPriv->driverPrivate = (void *)screen;
  
 if (screen->fd < 0 || (fd = dup(screen->fd)) < 0)

-  goto fail;
+  goto free_screen;
  
 if (pipe_loader_sw_probe_kms(&screen->dev, fd))

pscreen = pipe_loader_create_screen(screen->dev);
  
 if (!pscreen)

-   goto fail;
+   goto release_pipe;
  
 if (drmGetCap(sPriv->fd, DRM_CAP_PRIME, &cap) == 0 &&

(cap & DRM_PRIME_CAP_IMPORT)) {
@@ -1823,7 +1828,7 @@ dri_kms_init_screen(__DRIscreen * sPriv)
  
 configs = dri_init_screen_helper(screen, pscreen, "swrast");

 if (!configs)
-  goto fail;
+  goto destroy_screen;
  
 screen->can_share_buffer = false;

 screen->auto_fake_front = dri_with_format(sPriv);
@@ -1831,12 +1836,17 @@ dri_kms_init_screen(__DRIscreen * sPriv)
 screen->lookup_egl_image = dri2_lookup_egl_image;
  
 return configs;

-fail:
+
+destroy_screen:
 dri_destroy_screen_helper(screen);
+
+release_pipe:
 if (screen->dev)
pipe_loader_release(&screen->dev, 1);
 else
close(fd);
+
+free_screen:
 FREE(screen);
  #endif // GALLIUM_SOFTPIPE
 return NULL;
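
The label layout in the patch is the usual staged-unwind idiom: each failure jumps to the label that releases only what has been acquired so far, and the labels fall through in reverse order of acquisition. A self-contained sketch under invented names (struct ctx, init_screen() and the fail_at switch are hypothetical and only make the control flow observable):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical resources standing in for the screen, the dup()'ed fd and
 * the pipe loader device. */
struct ctx {
   bool fd_open;
   bool dev_open;
};

static void
release_fd(struct ctx *c)
{
   /* Releasing an fd that was never opened is the close(-1) mistake. */
   assert(c->fd_open);
   c->fd_open = false;
}

/* fail_at selects the failing step: 1 = dup, 2 = device probe,
 * 3 = config setup, anything else = success. */
static bool
init_screen(int fail_at)
{
   struct ctx *c = calloc(1, sizeof(*c));
   if (!c)
      return false;

   if (fail_at == 1)
      goto free_ctx;      /* nothing but c has been acquired */
   c->fd_open = true;

   if (fail_at == 2)
      goto close_fd;      /* only the fd needs releasing */
   c->dev_open = true;

   if (fail_at == 3)
      goto destroy_dev;

   /* Success: real code would hand ownership to the caller; the sketch
    * simply tears everything down again. */
   c->dev_open = false;
   release_fd(c);
   free(c);
   return true;

destroy_dev:
   c->dev_open = false;
close_fd:
   release_fd(c);
free_ctx:
   free(c);
   return false;
}
```

With a single fail label, the fail_at == 1 case would reach release_fd() with no fd open, which is the situation the patch title describes.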




Re: [Mesa-dev] [PATCH 2/2] clover: Error on incomplete switch statements

2016-05-14 Thread Francisco Jerez
Jan Vesely  writes:

> Signed-off-by: Jan Vesely 
> ---
>  src/gallium/state_trackers/clover/Makefile.am | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/src/gallium/state_trackers/clover/Makefile.am 
> b/src/gallium/state_trackers/clover/Makefile.am
> index 4c9d7d9..26ebd3b 100644
> --- a/src/gallium/state_trackers/clover/Makefile.am
> +++ b/src/gallium/state_trackers/clover/Makefile.am
> @@ -1,5 +1,9 @@
>  include Makefile.sources
>  
> +AM_CXXFLAGS = -Werror=switch
> +
> +CXXFLAGS += $(AM_CXXFLAGS)
> +

I'm not much into build systems, but I don't think this is the way
you're supposed to add flags to the compiler command line, because the
user can easily override your definition inadvertently.  AFAIK the usual
idiom is to add AM_CXXFLAGS explicitly to each of the per-target
CXXFLAGS variables.  Once you do that it should be easy to remove some
redundancy between per-target CXXFLAGS (e.g. -std=c++11 and
$(VISIBILITY_CXXFLAGS)).

>  AM_CPPFLAGS = \
>   -I$(top_srcdir)/include \
>   -I$(top_srcdir)/src \
> -- 
> 2.5.5




Re: [Mesa-dev] [PATCH 1/2] clover: Handle PIPE_SHADER_IR_NIR in switch

2016-05-14 Thread Francisco Jerez
Jan Vesely  writes:

> Signed-off-by: Jan Vesely 
> ---
>  src/gallium/state_trackers/clover/llvm/invocation.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
> b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> index 96f6a48..e2cadda 100644
> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> @@ -893,8 +893,9 @@ clover::compile_program_llvm(const std::string &source,
> module m;
> // Build the clover::module
> switch (ir) {
> +  case PIPE_SHADER_IR_NIR:
>case PIPE_SHADER_IR_TGSI:
> - //XXX: Handle TGSI
> + //XXX: Handle TGSI, NIR

Heh, I doubt that writing a NIR LLVM back-end would be particularly
rewarding or useful, but sure:

Reviewed-by: Francisco Jerez 

>   assert(0);
>   m = module();
>   break;
> -- 
> 2.5.5
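
To see why the two patches in this series go together: with -Werror=switch, a switch over an enum that has no default label must name every enumerator. The sketch below uses a hypothetical enum modelled on the PIPE_SHADER_IR_* names in the patch; it is not the gallium definition.

```c
#include <assert.h>

/* Hypothetical IR enum, loosely modelled on the names in the patch. */
enum shader_ir { IR_TGSI, IR_LLVM, IR_NATIVE, IR_NIR };

/* Returns 1 if this sketch "supports" the IR, 0 otherwise. Removing the
 * IR_NIR label makes gcc and clang warn under -Wswitch, and fail the
 * build under -Werror=switch, because there is no default label. */
static int
ir_supported(enum shader_ir ir)
{
   switch (ir) {
   case IR_NIR:      /* shares the unhandled path with TGSI */
   case IR_TGSI:
      return 0;
   case IR_LLVM:
   case IR_NATIVE:
      return 1;
   }
   assert(!"invalid enum value");
   return 0;
}
```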




[Mesa-dev] [PATCH] nvc0/ir: make sure to align the second arg of TXD to 4, as we do for TEX

2016-05-14 Thread Ilia Mirkin
This was handled in handleTEX(), however the way the logic works, those
extra arguments aren't added on by then, so it did nothing. Instead we
must duplicate that bit here. GK110 appears to complain about
MISALIGNED_GPR, however it's reasonable to believe that GK104 has the
same requirements.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95403
Signed-off-by: Ilia Mirkin 
Cc: mesa-sta...@lists.freedesktop.org
---
 .../drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index 1068c21..869b06c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -993,6 +993,20 @@ NVC0LoweringPass::handleTXD(TexInstruction *txd)
   txd->dPdx[c].set(NULL);
   txd->dPdy[c].set(NULL);
}
+
+   // In this case we have fewer than 4 "real" arguments, which means that
+   // handleTEX didn't apply any padding. However we have to make sure that
+   // the second "group" of arguments still gets padded up to 4.
+   if (chipset >= NVISA_GK104_CHIPSET) {
+  int s = arg + 2 * dim;
+  if (s >= 4 && s < 7) {
+ if (txd->srcExists(s)) // move potential predicate out of the way
+txd->moveSources(s, 7 - s);
+ while (s < 7)
+txd->setSrc(s++, bld.loadImm(NULL, 0));
+  }
+   }
+
return true;
 }
 
-- 
2.7.3
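
The index arithmetic in the added hunk can be checked on its own. The helper below is a hypothetical reduction to plain integers (no nv50_ir types); it only reproduces how many zero-filled sources the loop in the patch would append for a given arg and dim.

```c
#include <assert.h>

/* s = arg + 2 * dim is the first source index after the derivative
 * arguments, as in the patch. When s falls in [4, 7), the patch pads
 * with immediates up to index 7; otherwise it adds nothing. */
static int
txd_padding_slots(int arg, int dim)
{
   assert(arg >= 0 && dim >= 0);
   int s = arg + 2 * dim;
   return (s >= 4 && s < 7) ? 7 - s : 0;
}
```

For example, arg = 1 and dim = 2 give s = 5, so two padding sources are appended; arg = 1 and dim = 1 give s = 3, which the patch leaves untouched.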



Re: [Mesa-dev] [PATCH v2 15/30] i965/fs: support doubles with UBO loads

2016-05-14 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> On 14/05/16 01:16, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez  writes:
>> 
>>> From: Iago Toral Quiroga 
>>>
>>> UBO loads with constant offset use the UNIFORM_PULL_CONSTANT_LOAD
>>> instruction, which reads 16 bytes (a vec4) of data from memory. For dvec
>>> types this only provides components x and y. Thus, if we are reading
>>> more than 2 components we need to issue a second load at offset+16 to
>>> read the next 16-byte chunk with components z and w.
>>>
>>> UBO loads with non-constant offset emit a load for each component
>>> in the vector (and rely on CSE to remove redundant loads), so we only
>>> need to consider the size of the data type when computing the offset
>>> of each element in a vector.
>>>
>>> v2 (Sam):
>>> - Adapt the code to use component() (Curro).
>>>
>>> Signed-off-by: Samuel Iglesias Gonsálvez 
>>> Reviewed-by: Kenneth Graunke 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 52 +++-
>>>  1 file changed, 45 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> index 2d57fd3..02f1e81 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> @@ -3362,6 +3362,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
>>> nir_intrinsic_instr *instr
>>> nir->info.num_ubos - 1);
>>>}
>>>  
>>> +  /* Number of 32-bit slots in the type */
>>> +  unsigned type_slots = MAX2(1, type_sz(dest.type) / 4);
>>> +
>>>nir_const_value *const_offset = 
>>> nir_src_as_const_value(instr->src[1]);
>>>if (const_offset == NULL) {
>>>   fs_reg base_offset = retype(get_nir_src(instr->src[1]),
>>> @@ -3369,19 +3372,54 @@ fs_visitor::nir_emit_intrinsic(const fs_builder 
>>> &bld, nir_intrinsic_instr *instr
>>>  
>>>   for (int i = 0; i < instr->num_components; i++)
>>>  VARYING_PULL_CONSTANT_LOAD(bld, offset(dest, bld, i), 
>>> surf_index,
>>> -   base_offset, i * 4);
>>> +   base_offset, i * 4 * type_slots);
>> 
>> Why not 'i * type_sz(...)'?  As before it seems like type_slots is just
>> going to introduce rounding errors here for no benefit?
>> 
>
> Right, I will fix it.
>
>>>} else {
>>> + /* Even if we are loading doubles, a pull constant load will load
>>> +  * a 32-bit vec4, so should only reserve vgrf space for that. If 
>>> we
>>> +  * need to load a full dvec4 we will have to emit 2 loads. This is
>>> +  * similar to demote_pull_constants(), except that in that case we
>>> +  * see individual accesses to each component of the vector and 
>>> then
>>> +  * we let CSE deal with duplicate loads. Here we see a vector 
>>> access
>>> +  * and we have to split it if necessary.
>>> +  */
>>>   fs_reg packed_consts = vgrf(glsl_type::float_type);
>>>   packed_consts.type = dest.type;
>>>  
>>> - struct brw_reg const_offset_reg = brw_imm_ud(const_offset->u32[0] 
>>> & ~15);
>>> - bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, packed_consts,
>>> -  surf_index, const_offset_reg);
>>> + unsigned const_offset_aligned = const_offset->u32[0] & ~15;
>>> +
>>> + /* A vec4 only contains half of a dvec4, if we need more than 2
>>> +  * components of a dvec4 we will have to issue another load for
>>> +  * components z and w
>>> +  */
>>> + int num_components;
>>> + if (type_slots == 1)
>>> +num_components = instr->num_components;
>>> + else
>>> +num_components = MIN2(2, instr->num_components);
>>>
>>> - const fs_reg consts = byte_offset(packed_consts, 
>>> const_offset->u32[0] % 16);
>>> + int remaining_components = instr->num_components;
>>> + while (remaining_components > 0) {
>>> +/* Read the vec4 from a 16-byte aligned offset */
>>> +struct brw_reg const_offset_reg = 
>>> brw_imm_ud(const_offset_aligned);
>>> +bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
>>> + retype(packed_consts, BRW_REGISTER_TYPE_F),
>>> + surf_index, const_offset_reg);
>>>  
>>> - for (unsigned i = 0; i < instr->num_components; i++)
>>> -bld.MOV(offset(dest, bld, i), component(consts, i));
>>> +const fs_reg consts = byte_offset(packed_consts, 
>>> (const_offset->u32[0] % 16));
>> 
>> This looks really fishy to me, if the initial offset is not 16B aligned
>> you'll apply the same sub-16B offset to the result from each one of the
>> subsequent pull constant loads.
>
> This cannot happen thanks to the layout alignment rules, see below.
>
>> Also you don't seem to take into
>> account whether the initial offset is misaligned in 

[Mesa-dev] [Bug 95374] ARK:survival of the fittest fails when GL4.3 is enabled.

2016-05-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95374

Vedran Miletić  changed:

   What|Removed |Added

 CC||riva...@gmail.com



Re: [Mesa-dev] [PATCH 04/17] nir: add lowering pass for y-transform

2016-05-14 Thread Rob Clark
On Sat, May 14, 2016 at 4:43 PM, Jason Ekstrand  wrote:
> On Sat, May 14, 2016 at 12:23 PM, Rob Clark  wrote:
>>
>> On Thu, May 12, 2016 at 10:17 PM, Jason Ekstrand 
>> wrote:
>> >
>> >
>> > On Mon, May 9, 2016 at 12:33 PM, Rob Clark  wrote:
>> >>
>> >> From: Rob Clark 
>> >>
>> >> Signed-off-by: Rob Clark 
>> >> Reviewed-by: Connor Abbott 
>> >> ---
>> >>  src/compiler/Makefile.sources|   1 +
>> >>  src/compiler/nir/nir.h   |  11 +
>> >>  src/compiler/nir/nir_lower_wpos_ytransform.c | 310
>> >> +++
>> >>  3 files changed, 322 insertions(+)
>> >>  create mode 100644 src/compiler/nir/nir_lower_wpos_ytransform.c
>> >>
>> >> diff --git a/src/compiler/Makefile.sources
>> >> b/src/compiler/Makefile.sources
>> >> index 2a52319..b542a1a 100644
>> >> --- a/src/compiler/Makefile.sources
>> >> +++ b/src/compiler/Makefile.sources
>> >> @@ -208,6 +208,7 @@ NIR_FILES = \
>> >> nir/nir_lower_vars_to_ssa.c \
>> >> nir/nir_lower_var_copies.c \
>> >> nir/nir_lower_vec_to_movs.c \
>> >> +   nir/nir_lower_wpos_ytransform.c \
>> >> nir/nir_metadata.c \
>> >> nir/nir_move_vec_src_uses_to_dest.c \
>> >> nir/nir_normalize_cubemap_coords.c \
>> >> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> >> index 8a616d4..474ba63 100644
>> >> --- a/src/compiler/nir/nir.h
>> >> +++ b/src/compiler/nir/nir.h
>> >> @@ -2374,6 +2374,17 @@ void nir_lower_two_sided_color(nir_shader
>> >> *shader);
>> >>
>> >>  void nir_lower_clamp_color_outputs(nir_shader *shader);
>> >>
>> >> +typedef struct nir_lower_wpos_ytransform_options {
>> >> +   int state_tokens[5];
>> >> +   bool fs_coord_origin_upper_left :1;
>> >> +   bool fs_coord_origin_lower_left :1;
>> >> +   bool fs_coord_pixel_center_integer :1;
>> >> +   bool fs_coord_pixel_center_half_integer :1;
>> >
>> >
>> > Drive-by commentary: Why are we using two booleans for one boolean here?
>> > All hardware should be either lower-left or upper-left and I'm going to
>> > hazard that the other two are mutually exclusive as well.  The pass
>> > certainly seems to assume so.
>>
>> mostly just because gallium splits it out into two caps, and this
>> matches the logic in the equiv tgsi lowering pass more closely..
>>
>> The way it is currently would, I think, work if there was some hw that
>> supported both cases (which is, I assume, why the gallium part of it
>> works the way it does)
>
>
> Yeah, I guess I could see that.  In that case, I suppose you could just not
> run the pass?  I guess you could have a case where HW supports both
> gl_FragCoord modes but only one pixel-center mode.  Whatever.  I guess I'm
> ok with 4 bools if it's useful.
>

Yeah, not entirely sure why you'd run the pass if hw supported both
cases..  at this point, the strongest argument for keeping it as-is is
probably just to keep the logic similar to equiv tgsi lowering pass.
Maybe someone with a longer history on the gallium/tgsi side of things
will pipe up.

BR,
-R
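
One way to picture the "hardware supports both" case discussed above: with one flag per convention, lowering is needed only when the mode the shader asks for is missing. The struct and helper below are hypothetical and cover just the origin flags, not the actual pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Capability flags in the shape of the options struct from the patch:
 * one bit per supported convention, so "supports both" is representable. */
struct wpos_caps {
   bool origin_upper_left :1;
   bool origin_lower_left :1;
};

/* Does the origin requested by the shader need a y-flip on this
 * (hypothetical) hardware? */
static bool
needs_origin_flip(const struct wpos_caps *caps, bool shader_wants_upper_left)
{
   /* At least one convention has to be supported. */
   assert(caps->origin_upper_left || caps->origin_lower_left);
   return shader_wants_upper_left ? !caps->origin_upper_left
                                  : !caps->origin_lower_left;
}
```

With a single boolean per property, the first case in the usage below (both origins supported, never flip) could not be expressed, which is the trade-off the thread is weighing.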


>>
>>
>> BR,
>> -R
>>
>> > Let's just make it two booleans.   If we come across hardware that puts
>> > the
>> > pixel center at 0.75, 0.25 then we can make fs_coord_pixel_center an
>> > enum.
>> > --Jason
>> >
>> >>
>> >> +} nir_lower_wpos_ytransform_options;
>> >> +
>> >> +bool nir_lower_wpos_ytransform(nir_shader *shader,
>> >> +   const nir_lower_wpos_ytransform_options
>> >> *options);
>> >> +
>> >>  void nir_lower_atomics(nir_shader *shader,
>> >> const struct gl_shader_program
>> >> *shader_program);
>> >>  void nir_lower_to_source_mods(nir_shader *shader);
>> >> diff --git a/src/compiler/nir/nir_lower_wpos_ytransform.c
>> >> b/src/compiler/nir/nir_lower_wpos_ytransform.c
>> >> new file mode 100644
>> >> index 000..1d53530
>> >> --- /dev/null
>> >> +++ b/src/compiler/nir/nir_lower_wpos_ytransform.c
>> >> @@ -0,0 +1,310 @@
>> >> +/*
>> >> + * Copyright © 2015 Red Hat
>> >> + *
>> >> + * Permission is hereby granted, free of charge, to any person
>> >> obtaining
>> >> a
>> >> + * copy of this software and associated documentation files (the
>> >> "Software"),
>> >> + * to deal in the Software without restriction, including without
>> >> limitation
>> >> + * the rights to use, copy, modify, merge, publish, distribute,
>> >> sublicense,
>> >> + * and/or sell copies of the Software, and to permit persons to whom
>> >> the
>> >> + * Software is furnished to do so, subject to the following
>> >> conditions:
>> >> + *
>> >> + * The above copyright notice and this permission notice (including
>> >> the
>> >> next
>> >> + * paragraph) shall be included in all copies or substantial portions
>> >> of
>> >> the
>> >> + * Software.
>> >> + *
>> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> >> EXPRESS OR
>> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> >> MERCHANTABILITY,
>> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVEN

Re: [Mesa-dev] [PATCH 04/17] nir: add lowering pass for y-transform

2016-05-14 Thread Jason Ekstrand
On Sat, May 14, 2016 at 12:23 PM, Rob Clark  wrote:

> On Thu, May 12, 2016 at 10:17 PM, Jason Ekstrand 
> wrote:
> >
> >
> > On Mon, May 9, 2016 at 12:33 PM, Rob Clark  wrote:
> >>
> >> From: Rob Clark 
> >>
> >> Signed-off-by: Rob Clark 
> >> Reviewed-by: Connor Abbott 
> >> ---
> >>  src/compiler/Makefile.sources|   1 +
> >>  src/compiler/nir/nir.h   |  11 +
> >>  src/compiler/nir/nir_lower_wpos_ytransform.c | 310
> >> +++
> >>  3 files changed, 322 insertions(+)
> >>  create mode 100644 src/compiler/nir/nir_lower_wpos_ytransform.c
> >>
> >> diff --git a/src/compiler/Makefile.sources
> b/src/compiler/Makefile.sources
> >> index 2a52319..b542a1a 100644
> >> --- a/src/compiler/Makefile.sources
> >> +++ b/src/compiler/Makefile.sources
> >> @@ -208,6 +208,7 @@ NIR_FILES = \
> >> nir/nir_lower_vars_to_ssa.c \
> >> nir/nir_lower_var_copies.c \
> >> nir/nir_lower_vec_to_movs.c \
> >> +   nir/nir_lower_wpos_ytransform.c \
> >> nir/nir_metadata.c \
> >> nir/nir_move_vec_src_uses_to_dest.c \
> >> nir/nir_normalize_cubemap_coords.c \
> >> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> >> index 8a616d4..474ba63 100644
> >> --- a/src/compiler/nir/nir.h
> >> +++ b/src/compiler/nir/nir.h
> >> @@ -2374,6 +2374,17 @@ void nir_lower_two_sided_color(nir_shader
> *shader);
> >>
> >>  void nir_lower_clamp_color_outputs(nir_shader *shader);
> >>
> >> +typedef struct nir_lower_wpos_ytransform_options {
> >> +   int state_tokens[5];
> >> +   bool fs_coord_origin_upper_left :1;
> >> +   bool fs_coord_origin_lower_left :1;
> >> +   bool fs_coord_pixel_center_integer :1;
> >> +   bool fs_coord_pixel_center_half_integer :1;
> >
> >
> > Drive-by commentary: Why are we using two booleans for one boolean here?
> > All hardware should be either lower-left or upper-left and I'm going to
> > hazard that the other two are mutually exclusive as well.  The pass
> > certainly seems to assume so.
>
> mostly just because gallium splits it out into two caps, and this
> matches the logic in the equiv tgsi lowering pass more closely..
>
> The way it is currently would, I think, work if there was some hw that
> supported both cases (which is, I assume, why the gallium part of it
> works the way it does)
>

Yeah, I guess I could see that.  In that case, I suppose you could just not
run the pass?  I guess you could have a case where HW supports both
gl_FragCoord modes but only one pixel-center mode.  Whatever.  I guess I'm
ok with 4 bools if it's useful.


>
> BR,
> -R
>
> > Let's just make it two booleans.   If we come across hardware that puts
> the
> > pixel center at 0.75, 0.25 then we can make fs_coord_pixel_center an
> enum.
> > --Jason
> >
> >>
> >> +} nir_lower_wpos_ytransform_options;
> >> +
> >> +bool nir_lower_wpos_ytransform(nir_shader *shader,
> >> +   const nir_lower_wpos_ytransform_options
> >> *options);
> >> +
> >>  void nir_lower_atomics(nir_shader *shader,
> >> const struct gl_shader_program *shader_program);
> >>  void nir_lower_to_source_mods(nir_shader *shader);
> >> diff --git a/src/compiler/nir/nir_lower_wpos_ytransform.c
> >> b/src/compiler/nir/nir_lower_wpos_ytransform.c
> >> new file mode 100644
> >> index 000..1d53530
> >> --- /dev/null
> >> +++ b/src/compiler/nir/nir_lower_wpos_ytransform.c
> >> @@ -0,0 +1,310 @@
> >> +/*
> >> + * Copyright © 2015 Red Hat
> >> + *
> >> + * Permission is hereby granted, free of charge, to any person
> obtaining
> >> a
> >> + * copy of this software and associated documentation files (the
> >> "Software"),
> >> + * to deal in the Software without restriction, including without
> >> limitation
> >> + * the rights to use, copy, modify, merge, publish, distribute,
> >> sublicense,
> >> + * and/or sell copies of the Software, and to permit persons to whom
> the
> >> + * Software is furnished to do so, subject to the following conditions:
> >> + *
> >> + * The above copyright notice and this permission notice (including the
> >> next
> >> + * paragraph) shall be included in all copies or substantial portions
> of
> >> the
> >> + * Software.
> >> + *
> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> >> EXPRESS OR
> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> >> MERCHANTABILITY,
> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> >> SHALL
> >> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> >> OTHER
> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> >> ARISING FROM,
> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> >> IN THE
> >> + * SOFTWARE.
> >> + */
> >> +
> >> +#include "nir.h"
> >> +#include "nir_builder.h"
> >> +
> >> +/* Lower gl_FragCoord (and fddy) to account for driver's requested
> >> coordinate-
> >> + * origin and pixel-center vs. shader.  If tra

Re: [Mesa-dev] [PATCH 00/28] i965/blorp: Use NIR for compiling shaders

2016-05-14 Thread Kenneth Graunke
On Tuesday, May 10, 2016 4:16:20 PM PDT Jason Ekstrand wrote:
> When Paul originally wrote blorp he hand-rolled a shader builder that
> builds i965 shaders directly.  This has caused headaches because every time
> we make a change to the back-end compiler, we have to update blorp.  NIR on
> the other hand tends to be more stable at this point since it has many
> different users all across mesa.
> 
> Using NIR also means that we get decent optimizations, register allocation,
> and scheduling.  The original blorp codegen code tried fairly hard to emit
> reasonably efficient code in that it didn't do more work than needed but it
> was fairly naive when it came to register allocation and scheduling.
> Using the full compiler stack also means that we get new features for free
> without having to re-implement them in blorp.  On Sky Lake, for instance,
> we are now generating shaders with sampler-EOT.
> 
> In spite of all this, this series shows no measurable performance
> difference on Haswell with every benchmark in sixonyx run 25 times.

Patches 1-13 are:
Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [PATCH 2/3] nir: forward-declare 'struct gl_shader_program'

2016-05-14 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Sat, May 14, 2016 at 1:10 PM, Rob Clark  wrote:

> From: Rob Clark 
>
> Drop extra #include which is otherwise unneeded (and makes this header
> difficult to include from outside of src/mesa).
>
> Signed-off-by: Rob Clark 
> ---
>  src/compiler/nir/glsl_to_nir.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/compiler/nir/glsl_to_nir.h
> b/src/compiler/nir/glsl_to_nir.h
> index e3fe9b0..14641fc 100644
> --- a/src/compiler/nir/glsl_to_nir.h
> +++ b/src/compiler/nir/glsl_to_nir.h
> @@ -26,12 +26,13 @@
>   */
>
>  #include "nir.h"
> -#include "compiler/glsl/glsl_parser_extras.h"
>
>  #ifdef __cplusplus
>  extern "C" {
>  #endif
>
> +struct gl_shader_program;
> +
>  nir_shader *glsl_to_nir(const struct gl_shader_program *shader_prog,
>  gl_shader_stage stage,
>  const nir_shader_compiler_options *options);
> --
> 2.5.5
>
>
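
The quoted change works because a prototype that only passes a pointer needs no more than an incomplete type. A small sketch with made-up names (struct shader_program, store_program() and program_is_null() are hypothetical, not Mesa API):

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: enough for code that only passes pointers around,
 * so the header that defines the struct does not have to be included. */
struct shader_program;

/* Stores a pointer to the incomplete type without ever dereferencing it. */
static void
store_program(const struct shader_program **slot,
              const struct shader_program *prog)
{
   assert(slot != NULL);
   *slot = prog;
}

static int
program_is_null(const struct shader_program *prog)
{
   /* Only the pointer value is inspected; accessing a member here would
    * not compile, since the type is incomplete. */
   return prog == NULL;
}
```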


Re: [Mesa-dev] [PATCH 09/11] tgsi: remove culldist semantic.

2016-05-14 Thread Ilia Mirkin
On Sat, May 14, 2016 at 2:58 PM, Roland Scheidegger  wrote:
> On 05/14/2016 04:24 PM, Ilia Mirkin wrote:
>>
>> On Sat, May 14, 2016 at 10:23 AM, Roland Scheidegger 
>> wrote:
>>>
>>> On 14.05.2016 at 14:55, Marek Olšák wrote:
>>>>
>>>> Dave,
>>>> It should be noted that clip distances can be disabled by
>>>> pipe_rasterizer_state::clip_plane_enable, but cull distances can't.
>>>> (same as GL)
>>>
>>>
>>> That only applies to user clip planes, not shader clip distances.
>>
>>
>> Actually, it applies to both.
>
>
> Yes, you are right. Ahh crap. draw, however, ignores the enable bits for
> clip distances (and we're probably relying on this even internally right
> now). Do blobs actually honor them? I'm wondering because some code changes
> I was recently doing at vmware shouldn't have worked if they did... Or maybe
> I got lucky...
> In any case honoring the enable bits should still be possible even with both
> clip and cull integrated into the same output.

What I do is compute a clip & cull mask separately in the shader and
then &= clip mask, then |= cull mask. (Onto the rast->clip_enable
mask.) Seems to work.
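
One reading of the mask combination described above, written out with plain bitmasks. All names are hypothetical, bit i stands for distance output i, and the assumption that a slot is never both a clip and a cull distance is mine.

```c
#include <assert.h>
#include <stdint.h>

/* Start from the rasterizer's enable bits, keep only the slots the
 * shader writes as clip distances, then force on every slot the shader
 * writes as a cull distance. */
static uint8_t
effective_distance_mask(uint8_t rast_clip_enable,
                        uint8_t shader_clip_mask,
                        uint8_t shader_cull_mask)
{
   /* Assumption: clip and cull distances occupy distinct slots. */
   assert((shader_clip_mask & shader_cull_mask) == 0);

   uint8_t mask = rast_clip_enable;
   mask &= shader_clip_mask;   /* clip distances honor the enable bits */
   mask |= shader_cull_mask;   /* cull distances cannot be disabled */
   return mask;
}
```

With all enable bits cleared, the result still contains the cull slots, matching the earlier remark in the thread that cull distances can't be disabled.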


[Mesa-dev] [PATCH 3/3] freedreno/ir3: cmdline compiler for glsl

2016-05-14 Thread Rob Clark
From: Rob Clark 

use glsl/libstandalone.la to add support for taking glsl src files (in
addition to .tgsi) as input.  Then glsl->nir and feed the result into
the ir3 backend as normal.

Signed-off-by: Rob Clark 
---
 src/gallium/drivers/freedreno/Makefile.am   |  2 +
 src/gallium/drivers/freedreno/ir3/ir3_cmdline.c | 89 +
 2 files changed, 77 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/freedreno/Makefile.am b/src/gallium/drivers/freedreno/Makefile.am
index 9c0ccdf..1af8dec 100644
--- a/src/gallium/drivers/freedreno/Makefile.am
+++ b/src/gallium/drivers/freedreno/Makefile.am
@@ -37,6 +37,8 @@ ir3_compiler_LDADD = \
libfreedreno.la \
$(top_builddir)/src/gallium/auxiliary/libgallium.la \
$(top_builddir)/src/compiler/nir/libnir.la \
+   $(top_builddir)/src/compiler/glsl/libstandalone.la \
$(top_builddir)/src/util/libmesautil.la \
+   $(top_builddir)/src/mesa/libmesagallium.la \
$(GALLIUM_COMMON_LIB_DEPS) \
$(FREEDRENO_LIBS)
diff --git a/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c b/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c
index 47bcec4..0e4827c 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c
@@ -44,6 +44,9 @@
 #include "instr-a3xx.h"
 #include "ir3.h"
 
+#include "compiler/glsl/standalone.h"
+#include "compiler/nir/glsl_to_nir.h"
+
 static void dump_info(struct ir3_shader_variant *so, const char *str)
 {
uint32_t *bin;
@@ -55,6 +58,47 @@ static void dump_info(struct ir3_shader_variant *so, const char *str)
free(bin);
 }
 
+int st_glsl_type_size(const struct glsl_type *type);
+
+static nir_shader *
+load_glsl(const char *filename, gl_shader_stage stage)
+{
+   static const struct standalone_options options = {
+   .glsl_version = 140,
+   .do_link = true,
+   };
+   struct gl_shader_program *prog;
+
+   prog = standalone_compile_shader(&options, 1, (char * const*)&filename);
+   if (!prog)
+   errx(1, "couldn't parse `%s'", filename);
+
+   nir_shader *nir = glsl_to_nir(prog, stage, ir3_get_compiler_options());
+
+   standalone_compiler_cleanup(prog);
+
+   /* required NIR passes: */
+   /* TODO cmdline args for some of the conditional lowering passes? */
+
+   NIR_PASS_V(nir, nir_lower_io_to_temporaries,
+   nir_shader_get_entrypoint(nir),
+   true, true);
+   NIR_PASS_V(nir, nir_lower_global_vars_to_local);
+   NIR_PASS_V(nir, nir_split_var_copies);
+   NIR_PASS_V(nir, nir_lower_var_copies);
+
+   NIR_PASS_V(nir, nir_split_var_copies);
+   NIR_PASS_V(nir, nir_lower_var_copies);
+   NIR_PASS_V(nir, nir_lower_io_types);
+
+   // TODO nir_assign_var_locations??
+
+   NIR_PASS_V(nir, nir_lower_system_values);
+   NIR_PASS_V(nir, nir_lower_io, nir_var_all, st_glsl_type_size);
+   NIR_PASS_V(nir, nir_lower_samplers, prog);
+
+   return nir;
+}
 
 static int
 read_file(const char *filename, void **ptr, size_t *size)
@@ -86,7 +130,7 @@ read_file(const char *filename, void **ptr, size_t *size)
 
 static void print_usage(void)
 {
-   printf("Usage: ir3_compiler [OPTIONS]... FILE\n");
+   printf("Usage: ir3_compiler [OPTIONS]... \n");
printf("--verbose - verbose compiler/debug messages\n");
printf("--binning-pass- generate binning pass shader (VERT)\n");
printf("--color-two-side  - emulate two-sided color (FRAG)\n");
@@ -105,8 +149,6 @@ int main(int argc, char **argv)
 {
int ret = 0, n = 1;
const char *filename;
-   struct tgsi_token toks[65536];
-   struct tgsi_parse_context parse;
struct ir3_shader_variant v;
struct ir3_shader s;
struct ir3_shader_key key = {};
@@ -234,31 +276,50 @@ int main(int argc, char **argv)
if (fd_mesa_debug & FD_DBG_OPTMSGS)
debug_printf("%s\n", (char *)ptr);
 
-   if (!tgsi_text_translate(ptr, toks, ARRAY_SIZE(toks)))
-   errx(1, "could not parse `%s'", filename);
+   nir_shader *nir;
 
-   if (fd_mesa_debug & FD_DBG_OPTMSGS)
-   tgsi_dump(toks, 0);
+   char *ext = rindex(filename, '.');
+
+   if (strcmp(ext, ".tgsi") == 0) {
+   struct tgsi_token toks[65536];
+
+   if (!tgsi_text_translate(ptr, toks, ARRAY_SIZE(toks)))
+   errx(1, "could not parse `%s'", filename);
+
+   if (fd_mesa_debug & FD_DBG_OPTMSGS)
+   tgsi_dump(toks, 0);
+
+   nir = ir3_tgsi_to_nir(toks);
+   s.from_tgsi = true;
+   } else if (strcmp(ext, ".frag") == 0) {
+   nir = load_glsl(filename, MESA_SHADER_FRAGMENT);
+   s.from_tgsi = false;
+   } else if (strcmp(ext, ".vert") == 0) {
+   nir = load_glsl(filename, MESA_SHADER_VERTEX);
+  

[Mesa-dev] [PATCH 1/3] glsl: split out libstandalone

2016-05-14 Thread Rob Clark
From: Rob Clark 

Split standalone glsl_compiler into a libstandalone.la and a thin
main.cpp.  This way drivers can re-use the glsl standalone frontend in
their own standalone compilers.

Signed-off-by: Rob Clark 
---
There is one kinda ugly hack (#including a .cpp file) to work around
an automake issue..  not sure if there is a better way to do that, or
if we should bother caring (since it isn't something that is installed
anyways)

 src/compiler/Makefile.glsl.am|  12 +-
 src/compiler/Makefile.sources|   5 +-
 src/compiler/glsl/main.cpp   | 380 ++
 src/compiler/glsl/standalone.cpp | 437 +++
 src/compiler/glsl/standalone.h   |  51 +
 5 files changed, 514 insertions(+), 371 deletions(-)
 create mode 100644 src/compiler/glsl/standalone.cpp
 create mode 100644 src/compiler/glsl/standalone.h

diff --git a/src/compiler/Makefile.glsl.am b/src/compiler/Makefile.glsl.am
index daf98f6..69def41 100644
--- a/src/compiler/Makefile.glsl.am
+++ b/src/compiler/Makefile.glsl.am
@@ -93,7 +93,7 @@ glsl_tests_sampler_types_test_LDADD = \
$(top_builddir)/src/libglsl_util.la \
$(PTHREAD_LIBS)
 
-noinst_LTLIBRARIES += glsl/libglsl.la glsl/libglcpp.la
+noinst_LTLIBRARIES += glsl/libglsl.la glsl/libglcpp.la glsl/libstandalone.la
 
 glsl_libglcpp_la_LIBADD =  \
$(top_builddir)/src/util/libmesautil.la
@@ -121,15 +121,21 @@ glsl_libglsl_la_SOURCES = \
$(LIBGLSL_FILES)
 
 
-glsl_compiler_SOURCES = \
+glsl_libstandalone_la_SOURCES = \
$(GLSL_COMPILER_CXX_FILES)
 
-glsl_compiler_LDADD =  \
+glsl_libstandalone_la_LIBADD = \
glsl/libglsl.la \
$(top_builddir)/src/libglsl_util.la \
$(top_builddir)/src/util/libmesautil.la \
$(PTHREAD_LIBS)
 
+glsl_compiler_SOURCES = \
+   glsl/main.cpp
+
+glsl_compiler_LDADD = \
+   glsl/libstandalone.la
+
 glsl_glsl_test_SOURCES = \
glsl/standalone_scaffolding.cpp \
glsl/test.cpp \
diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index 66fbd84..881a616 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -136,9 +136,8 @@ LIBGLSL_FILES = \
 # glsl_compiler
 
 GLSL_COMPILER_CXX_FILES = \
-   glsl/standalone_scaffolding.cpp \
-   glsl/standalone_scaffolding.h \
-   glsl/main.cpp
+   glsl/standalone.cpp \
+   glsl/standalone.h
 
 # libglsl generated sources
 LIBGLSL_GENERATED_CXX_FILES = \
diff --git a/src/compiler/glsl/main.cpp b/src/compiler/glsl/main.cpp
index d253575..f65b185 100644
--- a/src/compiler/glsl/main.cpp
+++ b/src/compiler/glsl/main.cpp
@@ -20,6 +20,8 @@
  * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  * DEALINGS IN THE SOFTWARE.
  */
+
+#include 
 #include 
 
 /** @file main.cpp
@@ -31,255 +33,16 @@
  * offline compile GLSL code and examine the resulting GLSL IR.
  */
 
-#include "ast.h"
-#include "glsl_parser_extras.h"
-#include "ir_optimization.h"
-#include "program.h"
-#include "program/hash_table.h"
-#include "loop_analysis.h"
-#include "standalone_scaffolding.h"
-
-static int glsl_version = 330;
-
-static void
-initialize_context(struct gl_context *ctx, gl_api api)
-{
-   initialize_context_to_defaults(ctx, api);
-
-   /* The standalone compiler needs to claim support for almost
-* everything in order to compile the built-in functions.
-*/
-   ctx->Const.GLSLVersion = glsl_version;
-   ctx->Extensions.ARB_ES3_compatibility = true;
-   ctx->Const.MaxComputeWorkGroupCount[0] = 65535;
-   ctx->Const.MaxComputeWorkGroupCount[1] = 65535;
-   ctx->Const.MaxComputeWorkGroupCount[2] = 65535;
-   ctx->Const.MaxComputeWorkGroupSize[0] = 1024;
-   ctx->Const.MaxComputeWorkGroupSize[1] = 1024;
-   ctx->Const.MaxComputeWorkGroupSize[2] = 64;
-   ctx->Const.MaxComputeWorkGroupInvocations = 1024;
-   ctx->Const.MaxComputeSharedMemorySize = 32768;
-   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 16;
-   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxUniformComponents = 1024;
-   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxCombinedUniformComponents = 1024;
-   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxInputComponents = 0; /* not used */
-   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxOutputComponents = 0; /* not used */
-   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxAtomicBuffers = 8;
-   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxAtomicCounters = 8;
-   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxImageUniforms = 8;
-   ctx->Const.Program[MESA_SHADER_COMPUTE].MaxUniformBlocks = 12;
-
-   switch (ctx->Const.GLSLVersion) {
-   case 100:
-  ctx->Const.MaxClipPlanes = 0;
-  ctx->Const.MaxCombinedTextureImageUnits = 8;
-  ctx->Const.MaxDrawBuffers = 2;
-  ctx->Const.MinProgramTexelOffset = 0;
-  ctx->Const.MaxP

[Mesa-dev] [PATCH 2/3] nir: forward-declare 'struct gl_shader_program'

2016-05-14 Thread Rob Clark
From: Rob Clark 

Drop extra #include which is otherwise unneeded (and makes this header
difficult to include from outside of src/mesa).

Signed-off-by: Rob Clark 
---
 src/compiler/nir/glsl_to_nir.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/glsl_to_nir.h b/src/compiler/nir/glsl_to_nir.h
index e3fe9b0..14641fc 100644
--- a/src/compiler/nir/glsl_to_nir.h
+++ b/src/compiler/nir/glsl_to_nir.h
@@ -26,12 +26,13 @@
  */
 
 #include "nir.h"
-#include "compiler/glsl/glsl_parser_extras.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+struct gl_shader_program;
+
 nir_shader *glsl_to_nir(const struct gl_shader_program *shader_prog,
 gl_shader_stage stage,
 const nir_shader_compiler_options *options);
-- 
2.5.5



Re: [Mesa-dev] [PATCH 2/3] nir/algebraic: support for power-of-two optimizations

2016-05-14 Thread Jason Ekstrand
On Sat, May 14, 2016 at 12:20 PM, Rob Clark  wrote:

> On Thu, May 12, 2016 at 10:55 PM, Jason Ekstrand 
> wrote:
> >
> >
> > On Tue, May 10, 2016 at 11:57 AM, Rob Clark  wrote:
> >>
> >> From: Rob Clark 
> >>
> >> Some optimizations, like converting integer multiply/divide into left/
> >> right shifts, have additional constraints on the search expression.
> >> Like requiring that a variable is a constant power of two.  Support
> >> these cases by allowing a fxn name to be appended to the search var
> >> expression (ie. "#a@32(is_power_of_two)").
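
To spell out the identities these rules rely on, here is a small C sketch. The helpers are stand-ins written for this note (the contents of nir_search_helpers.h are not shown in this thread), and they assume 32-bit unsigned operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in predicate: true if exactly one bit is set. */
static bool
is_power_of_two(uint32_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Index of the least significant set bit; v must be nonzero. */
static unsigned
find_lsb(uint32_t v)
{
   unsigned n = 0;

   while (!(v & 1u)) {
      v >>= 1;
      n++;
   }
   return n;
}

/* The three rewrites, valid only when b is a power of two. */
static uint32_t mul_pot(uint32_t a, uint32_t b)  { return a << find_lsb(b); }
static uint32_t udiv_pot(uint32_t a, uint32_t b) { return a >> find_lsb(b); }
static uint32_t umod_pot(uint32_t a, uint32_t b) { return a & (b - 1); }
```

The condition function matters because none of these rewrites hold for an arbitrary constant, for example a % 12 is not a & 11.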
> >>
> >> TODO update doc/comment explaining search var syntax
> >> TODO the eagle-eyed viewer might have noticed that this could also
> >> replace the existing const syntax (ie. "#a").  Not sure if we should
> >> keep that.. we could make it syntactic sugar (ie '#' automatically sets
> >> the cond fxn ptr to 'is_const') or just get rid of it entirely?  Maybe
> >> that is a follow-on clean-up patch?
> >>
> >> Signed-off-by: Rob Clark 
> >> ---
> >>  src/compiler/nir/nir_algebraic.py |  8 +++--
> >>  src/compiler/nir/nir_opt_algebraic.py |  5 +++
> >>  src/compiler/nir/nir_search.c |  3 ++
> >>  src/compiler/nir/nir_search.h | 10 ++
> >>  src/compiler/nir/nir_search_helpers.h | 66
> >> +++
> >>  5 files changed, 90 insertions(+), 2 deletions(-)
> >>  create mode 100644 src/compiler/nir/nir_search_helpers.h
> >>
> >> diff --git a/src/compiler/nir/nir_algebraic.py
> >> b/src/compiler/nir/nir_algebraic.py
> >> index 285f853..19ac6ee 100644
> >> --- a/src/compiler/nir/nir_algebraic.py
> >> +++ b/src/compiler/nir/nir_algebraic.py
> >> @@ -76,6 +76,7 @@ class Value(object):
> >>   return Constant(val, name_base)
> >>
> >> __template = mako.template.Template("""
> >> +#include "compiler/nir/nir_search_helpers.h"
> >>  static const ${val.c_type} ${val.name} = {
> >> { ${val.type_enum}, ${val.bit_size} },
> >>  % if isinstance(val, Constant):
> >> @@ -84,6 +85,7 @@ static const ${val.c_type} ${val.name} = {
> >> ${val.index}, /* ${val.var_name} */
> >> ${'true' if val.is_constant else 'false'},
> >> ${val.type() or 'nir_type_invalid' },
> >> +   ${val.cond if val.cond else 'NULL'},
> >>  % elif isinstance(val, Expression):
> >> ${'true' if val.inexact else 'false'},
> >> nir_op_${val.opcode},
> >> @@ -113,7 +115,7 @@ static const ${val.c_type} ${val.name} = {
> >>  Variable=Variable,
> >>  Expression=Expression)
> >>
> >> -_constant_re = re.compile(r"(?P<value>[^@]+)(?:@(?P<bits>\d+))?")
> >> +_constant_re = re.compile(r"(?P<value>[^@\(]+)(?:@(?P<bits>\d+))?")
> >
> >
> > Spurious change?
> >
>
> I thought it needed to avoid matching something like
> a(is_power_of_two).. but it seems to work with that hunk reverted so I
> guess I can drop it..
>
> >>
> >>
> >>  class Constant(Value):
> >> def __init__(self, val, name):
> >> @@ -150,7 +152,8 @@ class Constant(Value):
> >>   return "nir_type_float"
> >>
> >>  _var_name_re = re.compile(r"(?P<const>#)?(?P<name>\w+)"
> >> -                          r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?")
> >> +                          r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?"
> >> +                          r"(?P<cond>\([^\)]+\))?")
> >>
> >>  class Variable(Value):
> >> def __init__(self, val, name, varset):
> >> @@ -161,6 +164,7 @@ class Variable(Value):
> >>
> >>self.var_name = m.group('name')
> >>self.is_constant = m.group('const') is not None
> >> +  self.cond = m.group('cond')
> >>self.required_type = m.group('type')
> >>self.bit_size = int(m.group('bits')) if m.group('bits') else 0
> >>
> >> diff --git a/src/compiler/nir/nir_opt_algebraic.py
> >> b/src/compiler/nir/nir_opt_algebraic.py
> >> index 0a95725..952a91a 100644
> >> --- a/src/compiler/nir/nir_opt_algebraic.py
> >> +++ b/src/compiler/nir/nir_opt_algebraic.py
> >> @@ -62,6 +62,11 @@ d = 'd'
> >>  # constructed value should have that bit-size.
> >>
> >>  optimizations = [
> >> +
> >> +   (('imul', a, '#b@32(is_power_of_two)'), ('ishl', a, ('find_lsb', b))),
> >> +   (('udiv', a, '#b@32(is_power_of_two)'), ('ushr', a, ('find_lsb', b))),
> >> +   (('umod', a, '#b(is_power_of_two)'), ('iand', a, ('isub', b, 1))),
> >> +
> >> (('fneg', ('fneg', a)), a),
> >> (('ineg', ('ineg', a)), a),
> >> (('fabs', ('fabs', a)), ('fabs', a)),
> >> diff --git a/src/compiler/nir/nir_search.c
> b/src/compiler/nir/nir_search.c
> >> index 2c2fd92..b21fb2c 100644
> >> --- a/src/compiler/nir/nir_search.c
> >> +++ b/src/compiler/nir/nir_search.c
> >> @@ -127,6 +127,9 @@ match_value(const nir_search_value *value,
> >> nir_alu_instr *instr, unsigned src,
> >>   instr->src[src].src.ssa->parent_instr->type !=
> >> nir_instr_type_load_const)
> >>  return false;
> >>
> >> + if (var->cond && !var->cond(instr, src, num_components,
> >> new_swizzle))
> >> +return false;
> >> +
> >>   if (var->type != nir_type_inval

Re: [Mesa-dev] [PATCH v2] i965/blorp: Special-case the clear color in MSAA resolves

2016-05-14 Thread Jason Ekstrand
On Fri, May 13, 2016 at 10:49 AM, Jason Ekstrand 
wrote:

>
>
> On Wed, May 11, 2016 at 7:42 PM, Jason Ekstrand 
> wrote:
>
>> The current MSAA resolve code has a special case for when the MCS value is
>> 0.  In this case we only need to sample once because we know that all
>> values are in slice 0.  This commit adds a second optimization that detects
>> the magic MCS value that indicates the clear color, grabs the color from a
>> push constant, and avoids sampling altogether.  On a microbenchmark written
>> by Neil Roberts that tests resolving surfaces with just clear color, this
>> improves performance by 60% for 8x, 40% for 4x, and 28% for 2x MSAA on my
>> SKL gte3 laptop.  The benchmark can be found on the ML archive:
>>
>> https://lists.freedesktop.org/archives/mesa-dev/2016-February/108077.html
>>
>
More data:  It seems to help T-Rex on Haswell by maybe 0.5% and hurts some
of the cpu-bound synthetics just a bit.  Meh?
--Jason
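
For reference, the clear-color test from the quoted patch can be written as plain C on the CPU side. This is only a sketch: the function name is made up, mcs0 and mcs1 stand for the first two channels of the MCS fetch, and the 16x case uses AND, per the review note further down.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU-side mirror of the per-sample-count check. */
static bool
mcs_is_clear_color(unsigned samples, uint32_t mcs0, uint32_t mcs1)
{
   switch (samples) {
   case 2:
      /* the sampler does not always return exactly 0x3, so mask first */
      return (mcs0 & 0x3) == 0x3;
   case 4:
      return mcs0 == 0xff;
   case 8:
      return mcs0 == ~0u;
   case 16:
      /* both channels must be all ones */
      return mcs0 == ~0u && mcs1 == ~0u;
   default:
      return false;
   }
}
```

With OR instead of AND in the 16x case, a pixel whose first channel is all ones but whose second is not would wrongly be treated as clear.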


> ---
>>  src/mesa/drivers/dri/i965/brw_blorp.h|   4 +-
>>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 101
>> +--
>>  2 files changed, 100 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
>> b/src/mesa/drivers/dri/i965/brw_blorp.h
>> index 15114d0..9d71ca4 100644
>> --- a/src/mesa/drivers/dri/i965/brw_blorp.h
>> +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
>> @@ -197,7 +197,9 @@ struct brw_blorp_wm_push_constants
>> uint32_t src_z;
>>
>> /* Pad out to an integral number of registers */
>> -   uint32_t pad[5];
>> +   uint32_t pad;
>> +
>> +   union gl_color_union clear_color;
>>  };
>>
>>  #define BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS \
>> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
>> b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
>> index 514a316..45b696d 100644
>> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
>> @@ -346,6 +346,7 @@ struct brw_blorp_blit_vars {
>>nir_variable *offset;
>> } u_x_transform, u_y_transform;
>> nir_variable *u_src_z;
>> +   nir_variable *u_clear_color;
>>
>> /* gl_FragCoord */
>> nir_variable *frag_coord;
>> @@ -374,6 +375,7 @@ brw_blorp_blit_vars_init(nir_builder *b, struct
>> brw_blorp_blit_vars *v,
>> LOAD_UNIFORM(y_transform.multiplier, glsl_float_type())
>> LOAD_UNIFORM(y_transform.offset, glsl_float_type())
>> LOAD_UNIFORM(src_z, glsl_uint_type())
>> +   LOAD_UNIFORM(clear_color, glsl_vec4_type())
>>
>>  #undef DECL_UNIFORM
>>
>> @@ -858,7 +860,8 @@ static nir_ssa_def *
>>  blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def *pos,
>> unsigned tex_samples,
>> enum intel_msaa_layout tex_layout,
>> -   enum brw_reg_type dst_type)
>> +   enum brw_reg_type dst_type,
>> +   struct brw_blorp_blit_vars *v)
>>  {
>> /* If non-null, this is the outer-most if statement */
>> nir_if *outer_if = NULL;
>> @@ -867,9 +870,53 @@ blorp_nir_manual_blend_average(nir_builder *b,
>> nir_ssa_def *pos,
>>nir_local_variable_create(b->impl, glsl_vec4_type(), "color");
>>
>> nir_ssa_def *mcs = NULL;
>> -   if (tex_layout == INTEL_MSAA_LAYOUT_CMS)
>> +   if (tex_layout == INTEL_MSAA_LAYOUT_CMS) {
>>mcs = blorp_nir_txf_ms_mcs(b, pos);
>>
>> +  /* The MCS buffer stores a packed value that provides a mapping
>> from
>> +   * samples to array slices.  The magic value of all ones means
>> that all
>> +   * samples have the clear color.  In this case, we can
>> short-circuit the
>> +   * sampling process and just use the clear color that we pushed
>> into the
>> +   * shader.
>> +   */
>> +  nir_ssa_def *is_clear_color;
>> +  switch (tex_samples) {
>> +  case 2:
>> + /* Empirical evidence suggests that the value returned from the
>> +  * sampler is not always 0x3 for clear color so we need to mask
>> it.
>> +  */
>> + is_clear_color =
>> +nir_ieq(b, nir_iand(b, nir_channel(b, mcs, 0),
>> nir_imm_int(b, 0x3)),
>> +   nir_imm_int(b, 0x3));
>> + break;
>> +  case 4:
>> + is_clear_color =
>> +nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, 0xff));
>> + break;
>> +  case 8:
>> + is_clear_color =
>> +nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, ~0));
>> + break;
>> +  case 16:
>> + is_clear_color =
>> +nir_ior(b, nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b,
>> ~0)),
>>
>
> This needs to be nir_iand.  Fixed locally...
>
>
>> +   nir_ieq(b, nir_channel(b, mcs, 1), nir_imm_int(b,
>> ~0)));
>> + break;
>> +  default:
>> + unreachable("Invalid sample count");
>> +  }
>> +
>> +  nir_if *if_stmt = nir_if_create(b->shader);
>> +  if_stmt->condition = nir_src_for_ssa(is_cl

Re: [Mesa-dev] [PATCH] nir: fix comment typo about f2d/d2f

2016-05-14 Thread Kenneth Graunke
On Saturday, May 14, 2016 3:26:41 PM PDT Rob Clark wrote:
> From: Rob Clark 
> 
> Signed-off-by: Rob Clark 
> ---
>  src/compiler/nir/nir_opcodes.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index 24ffc31..9d05594 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -180,8 +180,8 @@ unop_convert("b2i", tint32, tbool, "src0 ? 1 : 0") # Boolean-to-int conversion
>  unop_convert("u2f", tfloat32, tuint32, "src0") # Unsigned-to-float conversion.
>  unop_convert("u2d", tfloat64, tuint32, "src0") # Unsigned-to-double conversion.
>  # double-to-float conversion
> -unop_convert("d2f", tfloat32, tfloat64, "src0") # Single to double precision
> -unop_convert("f2d", tfloat64, tfloat32, "src0") # Double to single precision
> +unop_convert("d2f", tfloat32, tfloat64, "src0") # Double to single precision
> +unop_convert("f2d", tfloat64, tfloat32, "src0") # Single to double precision
>  
>  # half/full conversion:
>  unop_convert("f2h", tfloat16, tfloat32, "src0")
> 

Reviewed-by: Kenneth Graunke 




[Mesa-dev] [PATCH 1/2] nir/print: add support for print annotations

2016-05-14 Thread Rob Clark
From: Rob Clark 

Caller can pass a hashtable mapping NIR object (currently instr or var,
but I guess others could be added as needed) to annotation msg to print
inline with the shader dump.  As the annotation msg is printed, it is
removed from the hashtable to give the caller a way to know about any
unassociated msgs.

This is used in the next patch, for nir_validate to try to associate
error msgs to nir_print dump.
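
The "print and remove" idea can be sketched without the Mesa hash table. Below, a small array stands in for the table and all names are hypothetical; note that the message pointer is read before the entry is cleared.

```c
#include <stddef.h>

/* Hypothetical stand-in for the annotation table: object -> message. */
struct note {
   const void *obj;   /* NULL means the slot is empty or already consumed */
   const char *msg;
};

/* Return the message attached to obj and drop it from the table. */
static const char *
take_note(struct note *notes, size_t n, const void *obj)
{
   if (!obj)
      return NULL;

   for (size_t i = 0; i < n; i++) {
      if (notes[i].obj == obj) {
         const char *msg = notes[i].msg;   /* read before removing */
         notes[i].obj = NULL;
         return msg;
      }
   }
   return NULL;
}

/* Count messages that were never matched to a printed object. */
static size_t
notes_left(const struct note *notes, size_t n)
{
   size_t left = 0;

   for (size_t i = 0; i < n; i++)
      if (notes[i].obj)
         left++;
   return left;
}
```

After the dump, anything still counted by notes_left() is a message that could not be tied to an object in the output, which is what the caller wants to know about.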

Signed-off-by: Rob Clark 
---
 src/compiler/nir/nir.h   |  1 +
 src/compiler/nir/nir_print.c | 41 -
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index ade584c..5f2cc8e 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2196,6 +2196,7 @@ unsigned nir_index_instrs(nir_function_impl *impl);
 void nir_index_blocks(nir_function_impl *impl);
 
 void nir_print_shader(nir_shader *shader, FILE *fp);
+void nir_print_shader_annotated(nir_shader *shader, FILE *fp, struct hash_table *errors);
 void nir_print_instr(const nir_instr *instr, FILE *fp);
 
 nir_shader *nir_shader_clone(void *mem_ctx, const nir_shader *s);
diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
index a36561e..70ed73f 100644
--- a/src/compiler/nir/nir_print.c
+++ b/src/compiler/nir/nir_print.c
@@ -53,8 +53,28 @@ typedef struct {
 
/* an index used to make new non-conflicting names */
unsigned index;
+
+   /**
+* Optional table of annotations mapping nir object
+* (such as instr or var) to message to print.
+*/
+   struct hash_table *annotations;
 } print_state;
 
+static const char *
+get_annotation(print_state *state, void *obj)
+{
+   if (!state->annotations)
+  return NULL;
+
+   struct hash_entry *entry = _mesa_hash_table_search(state->annotations, obj);
+   if (entry) {
+  _mesa_hash_table_remove(state->annotations, entry);
+  return entry->data;
+   }
+   return NULL;
+}
+
 static void
 print_register(nir_register *reg, print_state *state)
 {
@@ -413,6 +433,11 @@ print_var_decl(nir_variable *var, print_state *state)
}
 
fprintf(fp, "\n");
+
+   const char *note = get_annotation(state, var);
+   if (note) {
+  fprintf(fp, "%s\n", note);
+   }
 }
 
 static void
@@ -918,6 +943,11 @@ print_block(nir_block *block, print_state *state, unsigned tabs)
nir_foreach_instr(instr, block) {
   print_instr(instr, state, tabs);
   fprintf(fp, "\n");
+
+  const char *note = get_annotation(state, instr);
+  if (note) {
+ fprintf(fp, "%s\n", note);
+  }
}
 
print_tabs(tabs, fp);
@@ -1090,11 +1120,14 @@ destroy_print_state(print_state *state)
 }
 
 void
-nir_print_shader(nir_shader *shader, FILE *fp)
+nir_print_shader_annotated(nir_shader *shader, FILE *fp,
+   struct hash_table *annotations)
 {
print_state state;
init_print_state(&state, shader, fp);
 
+   state.annotations = annotations;
+
fprintf(fp, "shader: %s\n", gl_shader_stage_name(shader->stage));
 
if (shader->info.name)
@@ -1144,6 +1177,12 @@ nir_print_shader(nir_shader *shader, FILE *fp)
 }
 
 void
+nir_print_shader(nir_shader *shader, FILE *fp)
+{
+   nir_print_shader_annotated(shader, fp, NULL);
+}
+
+void
 nir_print_instr(const nir_instr *instr, FILE *fp)
 {
print_state state = {
-- 
2.5.5



[Mesa-dev] [PATCH 2/2] nir/validate: dump annotated shader with error msgs

2016-05-14 Thread Rob Clark
From: Rob Clark 

Log all the errors, and at the end dump the shader w/ error annotations
to make it easier to see where the problems are.
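
The core trick is an assert replacement that records the failure against whatever object is currently being validated instead of aborting. A minimal standalone sketch follows; the fixed-size array and all names are assumptions for illustration, whereas the patch uses a Mesa hash table keyed by the object.

```c
#include <stddef.h>

#define MAX_ERRORS 32

struct error_entry {
   const void *obj;   /* instruction, variable, or the message itself */
   const char *msg;
};

struct validate_state {
   const void *instr;   /* current instruction, if any */
   const void *var;     /* current variable, if any */
   struct error_entry errors[MAX_ERRORS];
   size_t num_errors;
};

static void
log_error(struct validate_state *state, const char *failed)
{
   const void *obj;

   if (state->instr)
      obj = state->instr;
   else if (state->var)
      obj = state->var;
   else
      obj = failed;   /* no object in scope: key on the message */

   if (state->num_errors < MAX_ERRORS) {
      state->errors[state->num_errors].obj = obj;
      state->errors[state->num_errors].msg = failed;
      state->num_errors++;
   }
}

/* Records the stringified condition and keeps going. */
#define validate_assert(state, x) \
   do { if (!(x)) log_error((state), "error: " #x); } while (0)
```

One difference from the hash-table version: keyed by object, a second failure on the same instruction would replace the first message, while this array keeps both.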

Signed-off-by: Rob Clark 
---
 src/compiler/nir/nir_validate.c | 58 +
 1 file changed, 58 insertions(+)

diff --git a/src/compiler/nir/nir_validate.c b/src/compiler/nir/nir_validate.c
index 84334d4..a26f480 100644
--- a/src/compiler/nir/nir_validate.c
+++ b/src/compiler/nir/nir_validate.c
@@ -69,6 +69,9 @@ typedef struct {
/* the current instruction being validated */
nir_instr *instr;
 
+   /* the current variable being validated */
+   nir_variable *var;
+
/* the current basic block being validated */
nir_block *block;
 
@@ -95,8 +98,29 @@ typedef struct {
 
/* map of local variable -> function implementation where it is defined */
struct hash_table *var_defs;
+
+   /* map of instruction/var/etc to failed assert string */
+   struct hash_table *errors;
 } validate_state;
 
+static void
+log_error(validate_state *state, const char *failed)
+{
+   const void *obj;
+
+   if (state->instr)
+  obj = state->instr;
+   else if (state->var)
+  obj = state->var;
+   else
+  obj = failed;
+
+   _mesa_hash_table_insert(state->errors, obj, (void *)failed);
+}
+
+#undef assert
+#define assert(x) do { if (!(x)) log_error(state, "error: "#x); } while (0)
+
 static void validate_src(nir_src *src, validate_state *state);
 
 static void
@@ -901,6 +925,8 @@ postvalidate_reg_decl(nir_register *reg, validate_state *state)
 static void
 validate_var_decl(nir_variable *var, bool is_global, validate_state *state)
 {
+   state->var = var;
+
assert(is_global == nir_variable_is_global(var));
 
/* Must have exactly one mode set */
@@ -914,6 +940,8 @@ validate_var_decl(nir_variable *var, bool is_global, validate_state *state)
if (!is_global) {
   _mesa_hash_table_insert(state->var_defs, var, state->impl);
}
+
+   state->var = NULL;
 }
 
 static bool
@@ -1042,7 +1070,12 @@ init_validate_state(validate_state *state)
state->regs_found = NULL;
state->var_defs = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
  _mesa_key_pointer_equal);
+   state->errors = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+
state->loop = NULL;
+   state->instr = NULL;
+   state->var = NULL;
 }
 
 static void
@@ -1053,6 +1086,28 @@ destroy_validate_state(validate_state *state)
free(state->ssa_defs_found);
free(state->regs_found);
_mesa_hash_table_destroy(state->var_defs, NULL);
+   _mesa_hash_table_destroy(state->errors, NULL);
+}
+
+static void
+dump_errors(validate_state *state)
+{
+   struct hash_table *errors = state->errors;
+
+   fprintf(stderr, "%d errors:\n", _mesa_hash_table_num_entries(errors));
+
+   nir_print_shader_annotated(state->shader, stderr, errors);
+
+   if (_mesa_hash_table_num_entries(errors) > 0) {
+  fprintf(stderr, "%d additional errors:\n",
+  _mesa_hash_table_num_entries(errors));
+  struct hash_entry *entry;
+  hash_table_foreach(errors, entry) {
+ fprintf(stderr, "%s\n", (char *)entry->data);
+  }
+   }
+
+   abort();
 }
 
 void
@@ -1112,6 +1167,9 @@ nir_validate_shader(nir_shader *shader)
   postvalidate_reg_decl(reg, &state);
}
 
+   if (_mesa_hash_table_num_entries(state.errors) > 0)
+  dump_errors(&state);
+
destroy_validate_state(&state);
 }
 
-- 
2.5.5



[Mesa-dev] [PATCH 1/2] clover: Handle PIPE_SHADER_IR_NIR in switch

2016-05-14 Thread Jan Vesely
Signed-off-by: Jan Vesely 
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index 96f6a48..e2cadda 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -893,8 +893,9 @@ clover::compile_program_llvm(const std::string &source,
module m;
// Build the clover::module
switch (ir) {
+  case PIPE_SHADER_IR_NIR:
   case PIPE_SHADER_IR_TGSI:
- //XXX: Handle TGSI
+ //XXX: Handle TGSI, NIR
  assert(0);
  m = module();
  break;
-- 
2.5.5



[Mesa-dev] [PATCH 2/2] clover: Error on incomplete switch statements

2016-05-14 Thread Jan Vesely
Signed-off-by: Jan Vesely 
---
 src/gallium/state_trackers/clover/Makefile.am | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/state_trackers/clover/Makefile.am b/src/gallium/state_trackers/clover/Makefile.am
index 4c9d7d9..26ebd3b 100644
--- a/src/gallium/state_trackers/clover/Makefile.am
+++ b/src/gallium/state_trackers/clover/Makefile.am
@@ -1,5 +1,9 @@
 include Makefile.sources
 
+AM_CXXFLAGS = -Werror=switch
+
+CXXFLAGS += $(AM_CXXFLAGS)
+
 AM_CPPFLAGS = \
-I$(top_srcdir)/include \
-I$(top_srcdir)/src \
-- 
2.5.5



[Mesa-dev] [PATCH] nir: fix comment typo about f2d/d2f

2016-05-14 Thread Rob Clark
From: Rob Clark 

Signed-off-by: Rob Clark 
---
 src/compiler/nir/nir_opcodes.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index 24ffc31..9d05594 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -180,8 +180,8 @@ unop_convert("b2i", tint32, tbool, "src0 ? 1 : 0") # Boolean-to-int conversion
 unop_convert("u2f", tfloat32, tuint32, "src0") # Unsigned-to-float conversion.
 unop_convert("u2d", tfloat64, tuint32, "src0") # Unsigned-to-double conversion.
 # double-to-float conversion
-unop_convert("d2f", tfloat32, tfloat64, "src0") # Single to double precision
-unop_convert("f2d", tfloat64, tfloat32, "src0") # Double to single precision
+unop_convert("d2f", tfloat32, tfloat64, "src0") # Double to single precision
+unop_convert("f2d", tfloat64, tfloat32, "src0") # Single to double precision
 
 # half/full conversion:
 unop_convert("f2h", tfloat16, tfloat32, "src0")
-- 
2.5.5



Re: [Mesa-dev] [PATCH 04/17] nir: add lowering pass for y-transform

2016-05-14 Thread Rob Clark
On Thu, May 12, 2016 at 10:17 PM, Jason Ekstrand  wrote:
>
>
> On Mon, May 9, 2016 at 12:33 PM, Rob Clark  wrote:
>>
>> From: Rob Clark 
>>
>> Signed-off-by: Rob Clark 
>> Reviewed-by: Connor Abbott 
>> ---
>>  src/compiler/Makefile.sources|   1 +
>>  src/compiler/nir/nir.h   |  11 +
>>  src/compiler/nir/nir_lower_wpos_ytransform.c | 310
>> +++
>>  3 files changed, 322 insertions(+)
>>  create mode 100644 src/compiler/nir/nir_lower_wpos_ytransform.c
>>
>> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
>> index 2a52319..b542a1a 100644
>> --- a/src/compiler/Makefile.sources
>> +++ b/src/compiler/Makefile.sources
>> @@ -208,6 +208,7 @@ NIR_FILES = \
>> nir/nir_lower_vars_to_ssa.c \
>> nir/nir_lower_var_copies.c \
>> nir/nir_lower_vec_to_movs.c \
>> +   nir/nir_lower_wpos_ytransform.c \
>> nir/nir_metadata.c \
>> nir/nir_move_vec_src_uses_to_dest.c \
>> nir/nir_normalize_cubemap_coords.c \
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index 8a616d4..474ba63 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -2374,6 +2374,17 @@ void nir_lower_two_sided_color(nir_shader *shader);
>>
>>  void nir_lower_clamp_color_outputs(nir_shader *shader);
>>
>> +typedef struct nir_lower_wpos_ytransform_options {
>> +   int state_tokens[5];
>> +   bool fs_coord_origin_upper_left :1;
>> +   bool fs_coord_origin_lower_left :1;
>> +   bool fs_coord_pixel_center_integer :1;
>> +   bool fs_coord_pixel_center_half_integer :1;
>
>
> Drive-by commentary: Why are we using two booleans for one boolean here?
> All hardware should be either lower-left or upper-left and I'm going to
> hazard that the other two are mutually exclusive as well.  The pass
> certainly seems to assume so.

mostly just because gallium splits it out into two caps, and this
matches the logic in the equiv tgsi lowering pass more closely..

The way it is currently would, I think, work if there was some hw that
supported both cases (which is, I assume, why the gallium part of it
works the way it does)

BR,
-R
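
To make the difference concrete, here is a small C sketch of the two ways the origin could be described. This is my reading of how the flags are meant to be used, not code from the pass, and the function names are invented for this note.

```c
#include <stdbool.h>

/* Capability form: each flag means "the hardware can do this natively",
 * so hardware that supports both origins never needs a flip. */
static bool
needs_y_flip_caps(bool hw_upper_left, bool hw_lower_left, bool want_upper_left)
{
   bool native = want_upper_left ? hw_upper_left : hw_lower_left;

   return !native;
}

/* Single-boolean form: the hardware has exactly one origin, so a flip is
 * needed whenever the shader asks for the other one. */
static bool
needs_y_flip_single(bool hw_is_upper_left, bool want_upper_left)
{
   return hw_is_upper_left != want_upper_left;
}
```

The two agree whenever exactly one capability flag is set; they only differ for hypothetical hardware that supports both origins.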

> Let's just make it two booleans.   If we come across hardware that puts the
> pixel center at 0.75, 0.25 then we can make fs_coord_pixel_center an enum.
> --Jason
>
>>
>> +} nir_lower_wpos_ytransform_options;
>> +
>> +bool nir_lower_wpos_ytransform(nir_shader *shader,
>> +   const nir_lower_wpos_ytransform_options
>> *options);
>> +
>>  void nir_lower_atomics(nir_shader *shader,
>> const struct gl_shader_program *shader_program);
>>  void nir_lower_to_source_mods(nir_shader *shader);
>> diff --git a/src/compiler/nir/nir_lower_wpos_ytransform.c
>> b/src/compiler/nir/nir_lower_wpos_ytransform.c
>> new file mode 100644
>> index 000..1d53530
>> --- /dev/null
>> +++ b/src/compiler/nir/nir_lower_wpos_ytransform.c
>> @@ -0,0 +1,310 @@
>> +/*
>> + * Copyright © 2015 Red Hat
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining
>> a
>> + * copy of this software and associated documentation files (the
>> "Software"),
>> + * to deal in the Software without restriction, including without
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute,
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the
>> next
>> + * paragraph) shall be included in all copies or substantial portions of
>> the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> IN THE
>> + * SOFTWARE.
>> + */
>> +
>> +#include "nir.h"
>> +#include "nir_builder.h"
>> +
>> +/* Lower gl_FragCoord (and fddy) to account for driver's requested
>> coordinate-
>> + * origin and pixel-center vs. shader.  If transformation is required, a
>> + * gl_FbWposYTransform uniform is inserted (with the specified
>> state-slots)
>> + * and additional instructions are inserted to transform gl_FragCoord
>> (and
>> + * fddy src arg).
>> + *
>> + * This is based on the logic in emit_wpos()/emit_wpos_adjustment() in
>> TGSI
>> + * compiler.
>> + *
>> + * Run before nir_lower_io.
>> + */
>> +
>> +typedef struct {
>> +   const nir_lower_wpos_ytransform_options *options;
>> +   nir_shader   *shader;
>> +   nir_builder   b;
>> +   nir_variable *transform;
>> +} lower_wpos_ytransform_stat

Re: [Mesa-dev] [PATCH 2/3] nir/algebraic: support for power-of-two optimizations

2016-05-14 Thread Rob Clark
On Thu, May 12, 2016 at 10:55 PM, Jason Ekstrand  wrote:
>
>
> On Tue, May 10, 2016 at 11:57 AM, Rob Clark  wrote:
>>
>> From: Rob Clark 
>>
>> Some optimizations, like converting integer multiply/divide into left/
>> right shifts, have additional constraints on the search expression.
>> Like requiring that a variable is a constant power of two.  Support
>> these cases by allowing a fxn name to be appended to the search var
>> expression (ie. "a#32(is_power_of_two)").
>>
>> TODO update doc/comment explaining search var syntax
>> TODO the eagle-eyed viewer might have noticed that this could also
>> replace the existing const syntax (ie. "#a").  Not sure if we should
>> keep that.. we could make it syntactic sugar (ie '#' automatically sets
>> the cond fxn ptr to 'is_const') or just get rid of it entirely?  Maybe
>> that is a follow-on clean-up patch?
>>
>> Signed-off-by: Rob Clark 
>> ---
>>  src/compiler/nir/nir_algebraic.py |  8 +++--
>>  src/compiler/nir/nir_opt_algebraic.py |  5 +++
>>  src/compiler/nir/nir_search.c |  3 ++
>>  src/compiler/nir/nir_search.h | 10 ++
>>  src/compiler/nir/nir_search_helpers.h | 66
>> +++
>>  5 files changed, 90 insertions(+), 2 deletions(-)
>>  create mode 100644 src/compiler/nir/nir_search_helpers.h
>>
>> diff --git a/src/compiler/nir/nir_algebraic.py
>> b/src/compiler/nir/nir_algebraic.py
>> index 285f853..19ac6ee 100644
>> --- a/src/compiler/nir/nir_algebraic.py
>> +++ b/src/compiler/nir/nir_algebraic.py
>> @@ -76,6 +76,7 @@ class Value(object):
>>   return Constant(val, name_base)
>>
>> __template = mako.template.Template("""
>> +#include "compiler/nir/nir_search_helpers.h"
>>  static const ${val.c_type} ${val.name} = {
>> { ${val.type_enum}, ${val.bit_size} },
>>  % if isinstance(val, Constant):
>> @@ -84,6 +85,7 @@ static const ${val.c_type} ${val.name} = {
>> ${val.index}, /* ${val.var_name} */
>> ${'true' if val.is_constant else 'false'},
>> ${val.type() or 'nir_type_invalid' },
>> +   ${val.cond if val.cond else 'NULL'},
>>  % elif isinstance(val, Expression):
>> ${'true' if val.inexact else 'false'},
>> nir_op_${val.opcode},
>> @@ -113,7 +115,7 @@ static const ${val.c_type} ${val.name} = {
>>  Variable=Variable,
>>  Expression=Expression)
>>
>> -_constant_re = re.compile(r"(?P<value>[^@]+)(?:@(?P<bits>\d+))?")
>> +_constant_re = re.compile(r"(?P<value>[^@\(]+)(?:@(?P<bits>\d+))?")
>
>
> Spurious change?
>

I thought it needed to avoid matching something like
a(is_power_of_two).. but it seems to work with that hunk reverted so I
guess I can drop it..
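
For reference, a minimal standalone sketch of the arithmetic identities the three new rules rely on. This is plain C, not NIR; the helper names are hypothetical and only loosely mirror the is_power_of_two condition and the find_lsb opcode used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit is set. */
bool
sketch_is_power_of_two(uint32_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Index of the least significant set bit; v must be non-zero. */
unsigned
sketch_find_lsb(uint32_t v)
{
   unsigned n = 0;

   assert(v != 0);
   while ((v & 1u) == 0) {
      v >>= 1;
      n++;
   }
   return n;
}

/* imul a, b  ->  ishl a, find_lsb(b) */
uint32_t
sketch_mul_pow2(uint32_t a, uint32_t b)
{
   assert(sketch_is_power_of_two(b));
   return a << sketch_find_lsb(b);
}

/* udiv a, b  ->  ushr a, find_lsb(b) */
uint32_t
sketch_udiv_pow2(uint32_t a, uint32_t b)
{
   assert(sketch_is_power_of_two(b));
   return a >> sketch_find_lsb(b);
}

/* umod a, b  ->  iand a, (b - 1) */
uint32_t
sketch_umod_pow2(uint32_t a, uint32_t b)
{
   assert(sketch_is_power_of_two(b));
   return a & (b - 1);
}
```

The rewrites only hold when b has a single bit set, which is why the search variable needs the extra condition and not just the existing constant check.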

>>
>>
>>  class Constant(Value):
>> def __init__(self, val, name):
>> @@ -150,7 +152,8 @@ class Constant(Value):
>>   return "nir_type_float"
>>
>>  _var_name_re = re.compile(r"(?P<const>#)?(?P<name>\w+)"
>> -                          r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?")
>> +                          r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?"
>> +                          r"(?P<cond>\([^\)]+\))?")
>>
>>  class Variable(Value):
>> def __init__(self, val, name, varset):
>> @@ -161,6 +164,7 @@ class Variable(Value):
>>
>>self.var_name = m.group('name')
>>self.is_constant = m.group('const') is not None
>> +  self.cond = m.group('cond')
>>self.required_type = m.group('type')
>>self.bit_size = int(m.group('bits')) if m.group('bits') else 0
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py
>> b/src/compiler/nir/nir_opt_algebraic.py
>> index 0a95725..952a91a 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -62,6 +62,11 @@ d = 'd'
>>  # constructed value should have that bit-size.
>>
>>  optimizations = [
>> +
>> +   (('imul', a, '#b@32(is_power_of_two)'), ('ishl', a, ('find_lsb', b))),
>> +   (('udiv', a, '#b@32(is_power_of_two)'), ('ushr', a, ('find_lsb', b))),
>> +   (('umod', a, '#b(is_power_of_two)'),('iand', a, ('isub', b, 1))),
>> +
>> (('fneg', ('fneg', a)), a),
>> (('ineg', ('ineg', a)), a),
>> (('fabs', ('fabs', a)), ('fabs', a)),
>> diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
>> index 2c2fd92..b21fb2c 100644
>> --- a/src/compiler/nir/nir_search.c
>> +++ b/src/compiler/nir/nir_search.c
>> @@ -127,6 +127,9 @@ match_value(const nir_search_value *value,
>> nir_alu_instr *instr, unsigned src,
>>   instr->src[src].src.ssa->parent_instr->type !=
>> nir_instr_type_load_const)
>>  return false;
>>
>> + if (var->cond && !var->cond(instr, src, num_components,
>> new_swizzle))
>> +return false;
>> +
>>   if (var->type != nir_type_invalid) {
>>  if (instr->src[src].src.ssa->parent_instr->type !=
>> nir_instr_type_alu)
>> return false;
>> diff --git a/src/compiler/nir/nir_search.h b/src/compiler/nir/nir_search.h
>> index a500feb..f55d797 100644
>> --- a/src/compiler/nir/nir_search.h
>> +++ b/src/compiler/nir/ni

Re: [Mesa-dev] [PATCH 09/11] tgsi: remove culldist semantic.

2016-05-14 Thread Roland Scheidegger

On 05/14/2016 04:24 PM, Ilia Mirkin wrote:
> On Sat, May 14, 2016 at 10:23 AM, Roland Scheidegger  wrote:
>> On 14.05.2016 at 14:55, Marek Olšák wrote:
>>> Dave,
>>> It should be noted that clip distances can be disabled by
>>> pipe_rasterizer_state::clip_plane_enable, but cull distances can't.
>>> (same as GL)
>>
>> That only applies to user clip planes, not shader clip distances.
>
> Actually, it applies to both.


Yes, you are right. Ahh crap. draw, however, ignores the enable bits for 
clip distances (and we're probably relying on this even internally right 
now). Do blobs actually honor them? I'm wondering because some code 
changes I was recently doing at vmware shouldn't have worked if they 
did... Or maybe I got lucky...
In any case honoring the enable bits should still be possible even with 
both clip and cull integrated into the same output.


Roland
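
A rough sketch of what honoring the enable bits with a merged clip/cull output might look like. The layout is an assumption for illustration only: clip distances occupy the first num_clip slots, cull distances follow, and bit i of the enable mask maps to clip distance i. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether slot `index` of a combined clip/cull distance output
 * takes part in clipping/culling.
 * Assumptions: at most 8 slots in total, clip distances first. */
bool
sketch_distance_is_active(unsigned index, unsigned num_clip,
                          unsigned clip_plane_enable)
{
   assert(index < 8);

   /* Cull distances cannot be disabled by the rasterizer state. */
   if (index >= num_clip)
      return true;

   /* Clip distances follow the per-plane enable bit. */
   return (clip_plane_enable >> index) & 1u;
}
```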


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] egl: android: drop dri2_create_image_android_native_buffer argument

2016-05-14 Thread Rob Herring
On Sun, May 1, 2016 at 6:42 AM, Emil Velikov  wrote:
> The drv is no longer used/needed as of last commit.
>
> Cc: Rob Herring 
> Signed-off-by: Emil Velikov 
> ---
>  src/egl/drivers/dri2/platform_android.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: Rob Herring 


Re: [Mesa-dev] [PATCH 1/3] egl: android: directly use dri2_create_image_dma_buf()

2016-05-14 Thread Rob Herring
On Sun, May 1, 2016 at 6:42 AM, Emil Velikov  wrote:
> Make the function non static so that we can use it directly from the
> android platform code.
>
> Cc: Rob Herring 
> Signed-off-by: Emil Velikov 
> ---
>  src/egl/drivers/dri2/egl_dri2.c | 2 +-
>  src/egl/drivers/dri2/egl_dri2.h | 4 
>  src/egl/drivers/dri2/platform_android.c | 3 +--
>  3 files changed, 6 insertions(+), 3 deletions(-)

Acked-by: Rob Herring 


[Mesa-dev] [Bug 95395] glsl: NULL type value in add_uniform() leads to SIGSEGV

2016-05-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95395

--- Comment #1 from Ilia Mirkin  ---
Another thing to check is whether you can reproduce on amd64 with
DRAW_USE_LLVM=0 - otherwise softpipe uses llvm for the vertex stages when
available.

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [Bug 95395] glsl: NULL type value in add_uniform() leads to SIGSEGV

2016-05-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95395

Kenneth Graunke  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
       Assignee    |i...@freedesktop.org        |mesa-dev@lists.freedesktop.org

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] Android: apps crashed on Intel Gen9 GPU

2016-05-14 Thread Chih-Wei Huang
2016-05-13 15:30 GMT+08:00 Pohjolainen, Topi :
> On Thu, May 12, 2016 at 12:25:25AM +0800, Chih-Wei Huang wrote:
>> Testing android-x86 with mesa 11.2.2,
>> I found that Google Play crashes every time on
>> a device with an Intel Gen9 GPU (e.g., Skylake).
>>
>> After analyzing, the i965 driver seems to assume
>> irb->mt is not null. For example in
>> brw_meta_fast_clear of brw_meta_fast_clear.c:
>>
>>   struct intel_renderbuffer *irb = intel_renderbuffer(rb);
>>   ...
>>   if (brw->gen >= 9 &&
>>   brw_format_for_mesa_format(irb->mt->format) !=
>> ^ => crashing
>>   brw->render_target_format[irb->mt->format])
>>  clear_type = REP_CLEAR;
>>
>> If I add a NULL check on irb->mt, this crash goes away.
>> However, the app still crashes in other places that
>> access irb->mt in the same way
>> (brw_draw.c line 399, gen8_surface_state.c line 432, etc.).
>>
>> Please comment on how to fix this correctly.
>> Why is irb->mt NULL when the code assumes it is not?
>
> As far as I understand something has gone wrong before - having an
> intel_renderbuffer without a miptree shouldn't be a reachable state at all.

Thank you for the reply.
When/where should the miptree be set?
How can I debug it?


-- 
Chih-Wei
Android-x86 project
http://www.android-x86.org


Re: [Mesa-dev] [PATCH v2 00/14] radeonsi: Offchip tessellation

2016-05-14 Thread
Hi, there are minor rendering glitches in Shadow of Mordor on R9 390.
git-59156b2 + this patch v2. I can send a screenshot if you are unable to
reproduce this.


Re: [Mesa-dev] Mesa 11.2.2 problems with Intel i965 graphics on Arch Linux

2016-05-14 Thread Kenneth Graunke
On Saturday, May 14, 2016 8:55:12 AM PDT Vanja Z wrote:
> Hi all,
> 
> I'm sorry if this is the wrong place to post this. Upgrading from mesa
> 11.2.1 to 11.2.2 on Arch Linux results in several programs not working. I am
> getting the following errors when launching Paraview for example,
> 
> libGL error: unable to load driver: i965_dri.so
> libGL error: driver pointer missing
> libGL error: failed to load driver: i965
> libGL error: unable to load driver: swrast_dri.so
> libGL error: failed to load driver: swrast
> 
> Both files exist on my system,
> 
> /usr/lib/xorg/modules/dri/i965_dri.so
> /usr/lib/xorg/modules/dri/swrast_dri.so
> 
> I am not sure if this is a problem with mesa, or with the Arch package or
> with my X configuration. I've tried asking on the Arch forums to no avail.
> 
> 
> Best regards,
> Vanja

This is likely an issue with your installation.
Setting LIBGL_DEBUG=verbose when running the application would give
you more information.

If some programs are working and not others, it could be a multilib
issue - maybe your lib32-mesa packages are messed up?

--Ken




Re: [Mesa-dev] [PATCH] c11/threads: create mutexattrs only when needed

2016-05-14 Thread Emil Velikov
Any comments on the patch and/or discussion after the --- line ?

On 24 April 2016 at 16:14, Emil Velikov  wrote:
> From: Emil Velikov 
>
> If the mutexattrs are the default one can just pass NULL to
> pthread_mutex_init. As the compiler does not know this detail it
> unnecessarily creates/destroys the attrs.
>
> Signed-off-by: Emil Velikov 
> ---
> While going through GLVND, I've noticed that it (sort of) breaks its
> assumptions/goals - 'we don't want the heavy locking/etc. brought by
> pthreads' [for single threaded uses]
>
> Thus I gave mesa a quick look and the following popped up:
>
> - pthread_once - libglapi, classic + dri
> Replace with an atomic test & set combo ?
>
> - pthread_mutexattr_* - all dri modules, libGL-apple
> Using a recursive lock in src/mesa/main/shared.c and
> src/glx/apple/apple_glx_drawable.c
>
> - pthread_key_* - EGL
> - pthread_.etspecific - EGL
> Extend pthread-stubs explicitly required it by mesa ?
> Note: the original code that pthread-stubs is based on (libX11) does
> have these ;-)
>
> - pthread_barrier_* - llvmpipe
> Fall-back to the mutex + cond implementation ?
>
> - pthread_setname_np - llvmpipe
> Do we need this ? Afaict the Windows build does not have an equivalent.
>
> - pthread_join - nine, llvmpipe, radeon(s), rbug, omx (thanks bellagio)
> - pthread_create - nine, llvmpipe, radeon(s), rbug
> - pthread_sigmask - nine, llvmpipe, radeon(s), rbug
>
> These four (five inc bellagio/omx) want more than one thread. How do we
> get others pthread free, while keeping these happy ?
>
> Please let me know how you feel on the topic. Do you see this as worthy
> goal ? Does the proposed solutions sound OK ? Can you think of any
> alternatives?
>
> -Emil
>
> P.S. For anyone who wonders, libc (GNU one only iirc) provides
> lightweight stubs, thus single-threaded apps work without the overhead.
> ---
>  include/c11/threads_posix.h | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/include/c11/threads_posix.h b/include/c11/threads_posix.h
> index ce9853b..11d36e4 100644
> --- a/include/c11/threads_posix.h
> +++ b/include/c11/threads_posix.h
> @@ -180,9 +180,14 @@ mtx_init(mtx_t *mtx, int type)
>&& type != (mtx_timed|mtx_recursive)
>&& type != (mtx_try|mtx_recursive))
>  return thrd_error;
> +
> +if ((type & mtx_recursive) == 0) {
> +pthread_mutex_init(mtx, NULL);
> +return thrd_success;
> +}
> +
>  pthread_mutexattr_init(&attr);
> -if ((type & mtx_recursive) != 0)
> -pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
> +pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
>  pthread_mutex_init(mtx, &attr);
>  pthread_mutexattr_destroy(&attr);
>  return thrd_success;
> --
> 2.8.0
>
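
For anyone skimming, a self-contained sketch of the control flow the patch aims for, written directly against pthreads. The enum values and the function name are hypothetical stand-ins for the C11 mtx_* constants and mtx_init; unlike the patch, the sketch also propagates the pthread return codes, which is an addition of mine.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the C11 mutex type flags. */
enum {
   sketch_mtx_plain = 0,
   sketch_mtx_recursive = 1
};

/* Returns 0 on success, a pthread error code otherwise. */
int
sketch_mtx_init(pthread_mutex_t *mtx, int type)
{
   pthread_mutexattr_t attr;
   int ret;

   assert(mtx != NULL);

   /* Default attributes: pass NULL and skip the attr object entirely. */
   if ((type & sketch_mtx_recursive) == 0)
      return pthread_mutex_init(mtx, NULL);

   /* Only the recursive case needs an explicit attribute object. */
   ret = pthread_mutexattr_init(&attr);
   if (ret != 0)
      return ret;

   ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
   if (ret == 0)
      ret = pthread_mutex_init(mtx, &attr);

   pthread_mutexattr_destroy(&attr);
   return ret;
}
```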


Re: [Mesa-dev] [PATCH 1/3] egl: android: directly use dri2_create_image_dma_buf()

2016-05-14 Thread Emil Velikov
On 1 May 2016 at 12:42, Emil Velikov  wrote:
> Make the function non static so that we can use it directly from the
> android platform code.
>
> Cc: Rob Herring 
> Signed-off-by: Emil Velikov 
> ---
>  src/egl/drivers/dri2/egl_dri2.c | 2 +-
>  src/egl/drivers/dri2/egl_dri2.h | 4 
>  src/egl/drivers/dri2/platform_android.c | 3 +--
>  3 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
> index d8448f4..95dc0d6 100644
> --- a/src/egl/drivers/dri2/egl_dri2.c
> +++ b/src/egl/drivers/dri2/egl_dri2.c
> @@ -1960,7 +1960,7 @@ dri2_check_dma_buf_format(const _EGLImageAttribs *attrs)
>   *
>   * Therefore we must never close or otherwise modify the file descriptors.
>   */
> -static _EGLImage *
> +_EGLImage *
>  dri2_create_image_dma_buf(_EGLDisplay *disp, _EGLContext *ctx,
>   EGLClientBuffer buffer, const EGLint *attr_list)
>  {
> diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h
> index ddb5f39..925294b 100644
> --- a/src/egl/drivers/dri2/egl_dri2.h
> +++ b/src/egl/drivers/dri2/egl_dri2.h
> @@ -361,6 +361,10 @@ dri2_create_image_khr(_EGLDriver *drv, _EGLDisplay *disp,
>   _EGLContext *ctx, EGLenum target,
>   EGLClientBuffer buffer, const EGLint *attr_list);
>
> +_EGLImage *
> +dri2_create_image_dma_buf(_EGLDisplay *disp, _EGLContext *ctx,
> + EGLClientBuffer buffer, const EGLint *attr_list);
> +
>  EGLBoolean
>  dri2_initialize_x11(_EGLDriver *drv, _EGLDisplay *disp);
>
> diff --git a/src/egl/drivers/dri2/platform_android.c 
> b/src/egl/drivers/dri2/platform_android.c
> index c837b35..9f0f133 100644
> --- a/src/egl/drivers/dri2/platform_android.c
> +++ b/src/egl/drivers/dri2/platform_android.c
> @@ -494,8 +494,7 @@ dri2_create_image_android_native_buffer(_EGLDriver *drv, 
> _EGLDisplay *disp,
>if (fourcc == -1 || pitch == 0)
>   return NULL;
>
> -  return dri2_create_image_khr(drv, disp, ctx, EGL_LINUX_DMA_BUF_EXT,
> - NULL, attr_list);
> +  return dri2_create_image_dma_buf(disp, ctx, NULL, attr_list);
Rob, care to ack this and 2/3 ? Using dri2_create_image_dma_buf over
the generic dri2_create_image_khr seems like it should make things a
bit more obvious.

Thanks
Emil


Re: [Mesa-dev] [PATCH 1/3] gbm: remove define _BSD_SOURCE

2016-05-14 Thread Emil Velikov
On 1 May 2016 at 13:48, Emil Velikov  wrote:
> The build systems already add this as applicable. There's no need to
> have this in the source file.
>
> Signed-off-by: Emil Velikov 
> ---
>  src/gbm/main/gbm.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/src/gbm/main/gbm.c b/src/gbm/main/gbm.c
> index c046b1a..a8da082 100644
> --- a/src/gbm/main/gbm.c
> +++ b/src/gbm/main/gbm.c
> @@ -25,7 +25,6 @@
>   *Benjamin Franzke 
>   */
>
> -#define _BSD_SOURCE
>  #define _DEFAULT_SOURCE
>
Do we have a brave soul to ack/r-b this and the other two "kill the
manual _{BSD,DEFAULT}_SOURCE defines" ?

Thanks
Emil


Re: [Mesa-dev] [PATCH 02/14] vl/dri3: implement dri3 screen create and destroy

2016-05-14 Thread Emil Velikov
On 12 May 2016 at 16:25, Leo Liu  wrote:
> On 05/12/2016 11:08 AM, Emil Velikov wrote:
>>
>> On 12 May 2016 at 15:01, Leo Liu  wrote:
>>>
>>>
>>> On 05/12/2016 09:47 AM, Emil Velikov wrote:
>>>>
>>>> Hi Leo,
>>>>
>>>> On 11 May 2016 at 22:14, Leo Liu  wrote:
>>>>>
>>>>> On 05/11/2016 04:20 PM, Axel Davy wrote:
>>>>>>
>>>>>> On 11/05/2016 17:06, Leo Liu wrote:
>>>>>>>
>>>>>>> Screen created with device fd returned from X server,
>>>>>>> also will bail out to DRI2 with certain conditions.
>>>>>>>
>>>>>>> Signed-off-by: Leo Liu 
>>>>>>> ---
>>>>>>> configure.ac  |  7 ++-
>>>>>>> src/gallium/auxiliary/vl/vl_winsys_dri3.c | 88
>>>>>>> ++-
>>>>>>> 2 files changed, 93 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/configure.ac b/configure.ac
>>>>>>> index 023110e..8c3960a 100644
>>>>>>> --- a/configure.ac
>>>>>>> +++ b/configure.ac
>>>>>>> @@ -1779,7 +1779,12 @@ if test "x$enable_xvmc" = xyes -o \
>>>>>>> "x$enable_vdpau" = xyes -o \
>>>>>>> "x$enable_omx" = xyes -o \
>>>>>>> "x$enable_va" = xyes; then
>>>>>>> -PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >=
>>>>>>> $XCBDRI2_REQUIRED])
>>>>>>> +if test x"$enable_dri3" = xyes; then
>>>>>>> +PKG_CHECK_MODULES([VL], [xcb-dri3 xcb-present xcb-sync
>>>>>>> xshmfence >= $XSHMFENCE_REQUIRED
>>>>>>> + x11-xcb xcb xcb-dri2 >=
>>>>>>> $XCBDRI2_REQUIRED])
>>>>
>>>> We don't need xcb-dri2 in the above do we ?
>>>
>>>
>>> Yes I think so. That's for all vl, includes building vl_winsys_dri.c.
>>>
>>>
>> Yes we need it, or yes we don't need it ? Afaict the vl_winsys_dri.c
>> case is handled in the else statement.
>
> We still need vl_winsys_dri.c even with "enable_dri3", because there's
> fallback case.
>
Indeed you are correct - had a PEBKAC moment.

Thanks for the patience
Emil


Re: [Mesa-dev] [PATCH] vl/dri: fix close fd error out

2016-05-14 Thread Emil Velikov
On 12 May 2016 at 16:19, Leo Liu  wrote:
> On 05/12/2016 11:10 AM, Emil Velikov wrote:
>>
>> On 12 May 2016 at 15:10, Leo Liu  wrote:
>>>
>>> fd should be set to -1 only if it got closed by pipe_loader_release.
>>>
>>> Signed-off-by: Leo Liu 
>>> ---
>>>   src/gallium/auxiliary/vl/vl_winsys_dri.c | 5 +++--
>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri.c
>>> b/src/gallium/auxiliary/vl/vl_winsys_dri.c
>>> index 0136526..4636feb 100644
>>> --- a/src/gallium/auxiliary/vl/vl_winsys_dri.c
>>> +++ b/src/gallium/auxiliary/vl/vl_winsys_dri.c
>>> @@ -427,9 +427,10 @@ vl_dri2_screen_create(Display *display, int screen)
>>>  return &scrn->base;
>>>
>>>   release_pipe:
>>> -   if (scrn->base.dev)
>>> +   if (scrn->base.dev) {
>>> pipe_loader_release(&scrn->base.dev, 1);
>>> -   fd = -1;
>>> +  fd = -1;
>>> +   }
>>>   free_authenticate:
>>>  free(authenticate);
>>>   close_fd:
>>
>> +if (fd != -1)
>>  close(fd)
>>
>> Please add a -1 check before the close.
>
>
> Sure I will add it, commit it to the repo later today.
>
In general it would be better to reply with the updated patch first.
Obviously it's not a big deal here.

Thanks again for squashing my silly mistake(s).
Emil


[Mesa-dev] [PATCH 2/3] st/xa: don't call close(-1) in xa_tracker_create error path

2016-05-14 Thread Emil Velikov
Analogous to previous commit.

Signed-off-by: Emil Velikov 
---
 src/gallium/state_trackers/xa/xa_tracker.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/xa/xa_tracker.c 
b/src/gallium/state_trackers/xa/xa_tracker.c
index f09baed..e091b083 100644
--- a/src/gallium/state_trackers/xa/xa_tracker.c
+++ b/src/gallium/state_trackers/xa/xa_tracker.c
@@ -152,7 +152,7 @@ xa_tracker_create(int drm_fd)
 struct xa_tracker *xa = calloc(1, sizeof(struct xa_tracker));
 enum xa_surface_type stype;
 unsigned int num_formats;
-int fd = -1;
+int fd;
 
 if (!xa)
return NULL;
@@ -212,9 +212,9 @@ xa_tracker_create(int drm_fd)
  out_no_screen:
 if (xa->dev)
pipe_loader_release(&xa->dev, 1);
-fd = -1;
+else
+   close(fd);
  out_no_fd:
-close(fd);
 free(xa);
 return NULL;
 }
-- 
2.8.0



[Mesa-dev] [PATCH 3/3] vl/drm: don't call close(-1) in vl_drm_screen_create error path

2016-05-14 Thread Emil Velikov
Analogous to previous commits.

Signed-off-by: Emil Velikov 
---
 src/gallium/auxiliary/vl/vl_winsys_drm.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_drm.c 
b/src/gallium/auxiliary/vl/vl_winsys_drm.c
index 6d9d947..6a759ae 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_drm.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_drm.c
@@ -41,20 +41,20 @@ struct vl_screen *
 vl_drm_screen_create(int fd)
 {
struct vl_screen *vscreen;
-   int new_fd = -1;
+   int new_fd;
 
vscreen = CALLOC_STRUCT(vl_screen);
if (!vscreen)
   return NULL;
 
if (fd < 0 || (new_fd = dup(fd)) < 0)
-  goto error;
+  goto free_screen;
 
if (pipe_loader_drm_probe_fd(&vscreen->dev, new_fd))
   vscreen->pscreen = pipe_loader_create_screen(vscreen->dev);
 
if (!vscreen->pscreen)
-  goto error;
+  goto release_pipe;
 
vscreen->destroy = vl_drm_screen_destroy;
vscreen->texture_from_drawable = NULL;
@@ -64,12 +64,13 @@ vl_drm_screen_create(int fd)
vscreen->get_private = NULL;
return vscreen;
 
-error:
+release_pipe:
if (vscreen->dev)
   pipe_loader_release(&vscreen->dev, 1);
else
   close(new_fd);
 
+free_screen:
FREE(vscreen);
return NULL;
 }
-- 
2.8.0



[Mesa-dev] [PATCH 1/3] st/dri: don't call close(-1) in dri{2, kms_}_init_screen error path

2016-05-14 Thread Emil Velikov
Add separate labels and jump to the correct one as needed.

Signed-off-by: Emil Velikov 
---
 src/gallium/state_trackers/dri/dri2.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/src/gallium/state_trackers/dri/dri2.c 
b/src/gallium/state_trackers/dri/dri2.c
index 675a9bb..2330530 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -1714,7 +1714,7 @@ dri2_init_screen(__DRIscreen * sPriv)
struct pipe_screen *pscreen = NULL;
const struct drm_conf_ret *throttle_ret;
const struct drm_conf_ret *dmabuf_ret;
-   int fd = -1;
+   int fd;
 
screen = CALLOC_STRUCT(dri_screen);
if (!screen)
@@ -1727,13 +1727,13 @@ dri2_init_screen(__DRIscreen * sPriv)
sPriv->driverPrivate = (void *)screen;
 
if (screen->fd < 0 || (fd = dup(screen->fd)) < 0)
-  goto fail;
+  goto free_screen;
 
if (pipe_loader_drm_probe_fd(&screen->dev, fd))
   pscreen = pipe_loader_create_screen(screen->dev);
 
if (!pscreen)
-   goto fail;
+   goto release_pipe;
 
throttle_ret = pipe_loader_configuration(screen->dev, DRM_CONF_THROTTLE);
dmabuf_ret = pipe_loader_configuration(screen->dev, DRM_CONF_SHARE_FD);
@@ -1762,7 +1762,7 @@ dri2_init_screen(__DRIscreen * sPriv)
 
configs = dri_init_screen_helper(screen, pscreen, screen->dev->driver_name);
if (!configs)
-  goto fail;
+  goto destroy_screen;
 
screen->can_share_buffer = true;
screen->auto_fake_front = dri_with_format(sPriv);
@@ -1770,12 +1770,17 @@ dri2_init_screen(__DRIscreen * sPriv)
screen->lookup_egl_image = dri2_lookup_egl_image;
 
return configs;
-fail:
+
+destroy_screen:
dri_destroy_screen_helper(screen);
+
+release_pipe:
if (screen->dev)
   pipe_loader_release(&screen->dev, 1);
else
   close(fd);
+
+free_screen:
FREE(screen);
return NULL;
 }
@@ -1793,7 +1798,7 @@ dri_kms_init_screen(__DRIscreen * sPriv)
struct dri_screen *screen;
struct pipe_screen *pscreen = NULL;
uint64_t cap;
-   int fd = -1;
+   int fd;
 
screen = CALLOC_STRUCT(dri_screen);
if (!screen)
@@ -1805,13 +1810,13 @@ dri_kms_init_screen(__DRIscreen * sPriv)
sPriv->driverPrivate = (void *)screen;
 
if (screen->fd < 0 || (fd = dup(screen->fd)) < 0)
-  goto fail;
+  goto free_screen;
 
if (pipe_loader_sw_probe_kms(&screen->dev, fd))
   pscreen = pipe_loader_create_screen(screen->dev);
 
if (!pscreen)
-   goto fail;
+   goto release_pipe;
 
if (drmGetCap(sPriv->fd, DRM_CAP_PRIME, &cap) == 0 &&
   (cap & DRM_PRIME_CAP_IMPORT)) {
@@ -1823,7 +1828,7 @@ dri_kms_init_screen(__DRIscreen * sPriv)
 
configs = dri_init_screen_helper(screen, pscreen, "swrast");
if (!configs)
-  goto fail;
+  goto destroy_screen;
 
screen->can_share_buffer = false;
screen->auto_fake_front = dri_with_format(sPriv);
@@ -1831,12 +1836,17 @@ dri_kms_init_screen(__DRIscreen * sPriv)
screen->lookup_egl_image = dri2_lookup_egl_image;
 
return configs;
-fail:
+
+destroy_screen:
dri_destroy_screen_helper(screen);
+
+release_pipe:
if (screen->dev)
   pipe_loader_release(&screen->dev, 1);
else
   close(fd);
+
+free_screen:
FREE(screen);
 #endif // GALLIUM_SOFTPIPE
return NULL;
-- 
2.8.0
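
The label layout used across this series can be shown in isolation. Below is a simplified sketch, assuming a screen object that simply owns a dup()'d descriptor. The struct, the function name and the probe_ok flag are hypothetical; in the real code the pipe loader takes over the descriptor once the probe succeeds, which the sketch does not model.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical screen object owning a duplicated file descriptor. */
struct sketch_screen {
   int fd;
};

/* probe_ok stands in for the driver probe succeeding or failing. */
struct sketch_screen *
sketch_screen_create(int fd, int probe_ok)
{
   struct sketch_screen *screen = calloc(1, sizeof(*screen));
   int new_fd;

   if (!screen)
      return NULL;

   /* No descriptor of ours exists yet, so skip the close() step. */
   if (fd < 0 || (new_fd = dup(fd)) < 0)
      goto free_screen;

   if (!probe_ok)
      goto close_fd;

   screen->fd = new_fd;
   return screen;

close_fd:
   /* Only reachable after a successful dup(). */
   assert(new_fd >= 0);
   close(new_fd);

free_screen:
   free(screen);
   return NULL;
}
```

Because each label undoes exactly one setup step, new_fd no longer needs a -1 initializer and close() is never reached with an invalid descriptor.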



Re: [Mesa-dev] [PATCH v2 7/9] gbm: rename gbm_dri_bo_{map, unmap} to gbm_dri_bo_{map, unmap}_dumb

2016-05-14 Thread Emil Velikov
On 4 May 2016 at 03:02, Rob Herring  wrote:
> In preparation to add public map/unmap functions, rename the existing
> gbm_dri_bo_{map,unmap} functions to indicate that they are only for dumb
> buffers.
>
> Signed-off-by: Rob Herring 
> ---
> v2:
> - moved into new patch
>
>  src/gbm/backends/dri/gbm_dri.c| 4 ++--
>  src/gbm/backends/dri/gbm_driint.h | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
I mentioned it before, guess I wasn't clear enough - there are cases
of these functions used outside of GBM (yes it is a bit of a 'nasty'
ABI)
Namely, there's two of each in origin/master:src/egl/drivers/dri2/platform_drm.c

Thanks
Emil


Re: [Mesa-dev] Mesa 11.3.0/12.0.0 release plan

2016-05-14 Thread Emil Velikov
On 29 April 2016 at 14:07, Iago Toral  wrote:
> On Fri, 2016-04-29 at 14:01 +0100, Emil Velikov wrote:
>> On 29 April 2016 at 13:19, Iago Toral  wrote:
>> > On Fri, 2016-04-29 at 00:55 -0700, Kenneth Graunke wrote:
>> >> On Thursday, April 28, 2016 4:06:49 PM PDT Emil Velikov wrote:
>> >> > Hi all,
>> >> >
>> >> > Here is the current tentative 11.3.0/12.0.0 release schedule.
>> >> >
>> >> > May 20th 2016 - Feature freeze/Release candidate 1
>> >> > May 27th 2016 - Release candidate 2
>> >> > June 03rd 2016 - Release candidate 3
>> >> > June 10th 2016 - Release candidate 4/final release
>> >> >
>> >> > With the above in mind we have three weeks to get new features.
>> >> >
>> >> > Is there serious work that we want to squeeze in for which the time
>> >> > is not enough? Do the proposed dates align with distributions'
>> >> > needs/expectations ?
>> >> >
>> >> > Kindly let me know.
>> >> >
>> >> > Thanks
>> >> > Emil
>> >>
>> >> I'd really love to get fp64/va64 for Broadwell+ landed - with that in
>> >> place, we'll jump forward to GL 4.2.  We'll try and pull it off, but we
>> >> might need a little bit more time...
>> >
>> > We have just sent the first bunch of i965 patches for review so it is
>> > all going to depend on the kind of review feedback we get. It is a large
>> > series that touches a lot of things so I imagine it might take some time
>> > to get it in a shape where everyone feels comfortable merging it, but
>> > let's see.
>> >
>> So should we keep the dates as-is and re-estimate in a week ? Afaict
>> it'd be nearly impossible to say how much extra time will be needed,
>> if any.
>
> Yes, I think that sounds reasonable.
>
Gents, am I missing something, or have not many of the fp64/va64 patches
landed yet ? The proposed branchpoint is a week away, so I'd like to
hear your thoughts on how long you think it is going to take to land the
work.

Thanks
Emil


Re: [Mesa-dev] [PATCH 8/8] nvc0: expose GLSL version 420 on GF100

2016-05-14 Thread Ilia Mirkin
Except for patches 1 and 6, this series is

Reviewed-by: Ilia Mirkin 

On Sat, May 14, 2016 at 9:54 AM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> index bd68ca9..40e5a9d 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> @@ -120,7 +120,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
> pipe_cap param)
> case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
>return 128 * 1024 * 1024;
> case PIPE_CAP_GLSL_FEATURE_LEVEL:
> -  if (class_3d == NVE4_3D_CLASS || class_3d == NVF0_3D_CLASS)
> +  if (class_3d <= NVF0_3D_CLASS)
>   return 420;
>return 410;
> case PIPE_CAP_MAX_RENDER_TARGETS:
> --
> 2.8.2
>


Re: [Mesa-dev] [PATCH 6/8] nvc0/ir: add a lowering pass for surfaces on Fermi

2016-05-14 Thread Ilia Mirkin
On Sat, May 14, 2016 at 9:54 AM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 117 
> +
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.h|   2 +
>  2 files changed, 119 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> index 1068c21..002f09d 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
> @@ -1982,6 +1982,121 @@ NVC0LoweringPass::handleSurfaceOpNVE4(TexInstruction 
> *su)
>su->sType = (su->tex.target == TEX_TARGET_BUFFER) ? TYPE_U32 : TYPE_U8;
>  }
>
> +void
> +NVC0LoweringPass::processSurfaceCoordsNVC0(TexInstruction *su)
> +{
> +   const int idx = su->tex.r;
> +   const int dim = su->tex.target.getDim();
> +   const int arg = dim + (su->tex.target.isArray() || 
> su->tex.target.isCube());
> +   const uint16_t base = idx * NVE4_SU_INFO__STRIDE;
> +   int c;
> +   Value *zero = bld.mkImm(0);
> +   Value *src[3];
> +   Value *v;
> +   Value *ind = NULL;
> +
> +   if (su->tex.rIndirectSrc >= 0) {
> +  // FIXME: out of bounds
> +  assert(su->tex.r == 0);
> +  ind = bld.mkOp2v(OP_SHL, TYPE_U32, bld.getSSA(),
> +   su->getIndirectR(), bld.mkImm(6));
> +   }
> +
> +   // get surface coordinates
> +   for (c = 0; c < arg; ++c)
> +  src[c] = su->getSrc(c);
> +   for (; c < 3; ++c)
> +  src[c] = zero;
> +
> +   // calculate pixel offset
> +   if (su->op == OP_SULDP || su->op == OP_SUREDP) {
> +  v = loadSuInfo32(ind, base + NVE4_SU_INFO_BSIZE);
> +  su->setSrc(0, bld.mkOp2v(OP_MUL, TYPE_U32, bld.getSSA(), src[0], v));
> +   }
> +
> +   // add array layer offset
> +   if (su->tex.target.isArray() || su->tex.target.isCube()) {
> +  v = loadSuInfo32(ind, base + NVE4_SU_INFO_ARRAY);
> +  assert(dim > 1);
> +  su->setSrc(2, bld.mkOp2v(OP_MUL, TYPE_U32, bld.getSSA(), src[2], v));
> +   }
> +
> +   // prevent read fault when the image is not actually bound
> +   CmpInstruction *pred =
> +  bld.mkCmp(OP_SET, CC_EQ, TYPE_U32, bld.getSSA(1, FILE_PREDICATE),
> +TYPE_U32, bld.mkImm(0),
> +loadSuInfo32(ind, base + NVE4_SU_INFO_ADDR));
> +   if (su->tex.format) {
> +  const TexInstruction::ImgFormatDesc *format = su->tex.format;
> +  int blockwidth = format->bits[0] + format->bits[1] +
> +   format->bits[2] + format->bits[3];
> +
> +  if (blockwidth >= 8) {

Why is the blockwidth so important here? Don't you just want to do
this for reads, since those use byte-type accesses as well as atomics?
i.e. do you need to do this for regular stores?

Even if you decide to stick with it, what you're really protecting
against here is a format of PIPE_FORMAT_NONE, which you should check
for explicitly here rather than creating an arbitrary 8-bit limit.

> + // make sure that the format doesn't mismatch
> + bld.mkCmp(OP_SET_OR, CC_NE, TYPE_U32, pred->getDef(0),
> +   TYPE_U32, bld.loadImm(NULL, blockwidth / 8),
> +   loadSuInfo32(ind, base + NVE4_SU_INFO_BSIZE),
> +   pred->getDef(0));
> +  }
> +   }
> +   su->setPredicate(CC_NOT_P, pred->getDef(0));
> +}
> +
> +void
> +NVC0LoweringPass::handleSurfaceOpNVC0(TexInstruction *su)
> +{
> +   if (su->tex.target == TEX_TARGET_1D_ARRAY) {
> +  /* As 1d arrays also need 3 coordinates, switching to 
> TEX_TARGET_2D_ARRAY
> +   * will simplify the lowering pass and the texture constraints. */
> +  su->moveSources(1, 1);
> +  su->setSrc(2, su->getSrc(1));

Is this line necessary? I thought that moveSources would take src(1)
and move it to src(2) [and so on].

> +  su->setSrc(1, bld.loadImm(NULL, 0));
> +  su->tex.target = TEX_TARGET_2D_ARRAY;
> +   }
> +
> +   processSurfaceCoordsNVC0(su);
> +
> +   if (su->op == OP_SULDP)
> +  convertSurfaceFormat(su);
> +
> +   if (su->op == OP_SUREDB || su->op == OP_SUREDP) {
> +  const int dim = su->tex.target.getDim();
> +  const int arg = dim + (su->tex.target.isArray() || 
> su->tex.target.isCube());
> +  LValue *addr = bld.getSSA(8);
> +  Value *def = su->getDef(0);
> +
> +  su->op = OP_SULEA;
> +
> +  // Set the destination to the address
> +  su->dType = TYPE_U64;
> +  su->setDef(0, addr);
> +  su->setDef(1, su->getPredicate());
> +
> +  bld.setPosition(su, true);
> +
> +  // Perform the atomic op
> +  Instruction *red = bld.mkOp(OP_ATOM, su->sType, bld.getSSA());
> +  red->subOp = su->subOp;
> +  red->setSrc(0, bld.mkSymbol(FILE_MEMORY_GLOBAL, 0, su->sType, 0));
> +  red->setSrc(1, su->getSrc(arg));
> +  if (red->subOp == NV50_IR_SUBOP_ATOM_CAS)
> + red->setSrc(2, su->getSrc(arg + 1));
> +  red->setIndirect(0, 0, addr);
> +
> +  // make sure to i

Re: [Mesa-dev] [PATCH] nv50/ir: avoid asserts when the state tracker feeds us bogus inputs

2016-05-14 Thread Samuel Pitoiset



On 05/13/2016 05:45 AM, Ilia Mirkin wrote:

INTERP is defined (by me) to require an INPUT source. However, the
state tracker does not always obey this. This happens because varying
packing logic introduces additional movs which can't always be undone.
Rather than giving up, we try harder to find the original
input. This won't always be possible, for example with indirect
accesses. There's not much we can (easily) do about that though.

This fixes a bunch of dEQP interpolateAt* tests that happen to hit this.
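
For readers following along, the mov-chain walk described above can be
sketched roughly as below. This is a standalone toy model, not the nv50_ir
code: struct insn, enum insn_op and find_origin() are made-up names for
illustration, and the real converter works on Value/Instruction objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified opcodes; only OP_MOV matters for the walk. */
enum insn_op { OP_MOV, OP_LINTERP, OP_PINTERP, OP_OTHER };

/* Hypothetical instruction: an opcode plus the instruction that defines
 * its first source (NULL when that definition is unknown, e.g. after the
 * value went through local memory). */
struct insn {
   enum insn_op op;
   const struct insn *src_def;
};

/* Follow a chain of movs back to the instruction that produced the value.
 * Returns NULL when the chain cannot be followed to the end. */
static const struct insn *
find_origin(const struct insn *insn)
{
   assert(insn != NULL);
   while (insn && insn->op == OP_MOV)
      insn = insn->src_def;
   return insn;
}
```

A caller would take the interpolation op and mode from the returned
instruction, and bail out with a warning when NULL comes back.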


Maybe you can just add:

dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.*
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.*

to be (a little) more precise?

Anyway, this patch is:

Reviewed-by: Samuel Pitoiset 



Signed-off-by: Ilia Mirkin 
---
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 59 ++
 1 file changed, 48 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 69e1a34..73c824c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -2733,24 +2733,61 @@ Converter::handleINTERP(Value *dst[4])
// Check whether the input is linear. All other attributes ignored.
Instruction *insn;
Value *offset = NULL, *ptr = NULL, *w = NULL;
+   Symbol *sym[4] = { NULL };
bool linear;
operation op;
int c, mode;

tgsi::Instruction::SrcRegister src = tgsi.getSrc(0);
-   assert(src.getFile() == TGSI_FILE_INPUT);

-   if (src.isIndirect(0))
+   // In some odd cases, in large part due to varying packing, the source
+   // might not actually be an input. This is illegal TGSI, but it's easier to
+   // account for it here than it is to fix it where the TGSI is being
+   // generated. In that case, it's going to be a straight up mov (or sequence
+   // of mov's) from the input in question. We follow the mov chain to see
+   // which input we need to use.
+   FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
+  if (src.getFile() == TGSI_FILE_INPUT) {
+ sym[c] = srcToSym(src, c);
+ continue;
+  }
+  Value *val = fetchSrc(0, c);
+  assert(val->defs.size() == 1);
+  insn = val->getInsn();
+  while (insn->op == OP_MOV) {
+ assert(insn->getSrc(0)->defs.size() == 1);
+ insn = insn->getSrc(0)->getInsn();
+ assert(insn);
+ if (!insn) {
+// This could happen if there's an indirect situation which caused
+// us to move this temp array into local memory. Just bail.
+WARN("Miscompiling shader due to unhandled INTERP\n");
+return;
+ }
+  }
+  sym[c] = insn->getSrc(0)->asSym();
+  op = insn->op;
+  mode = insn->ipa;
+   }
+
+   if (src.isIndirect(0)) {
+  // In the case where there's varying packing *and* indirect inputs going
+  // on, we're sunk.
+  assert(src.getFile() == TGSI_FILE_INPUT);
   ptr = fetchSrc(src.getIndirect(0), 0, NULL);
+   }

-   // XXX: no way to know interp mode if we don't know the index
-   linear = info->in[ptr ? 0 : src.getIndex(0)].linear;
-   if (linear) {
-  op = OP_LINTERP;
-  mode = NV50_IR_INTERP_LINEAR;
-   } else {
-  op = OP_PINTERP;
-  mode = NV50_IR_INTERP_PERSPECTIVE;
+   // We can assume that the fixed index will point to an input of the same
+   // interpolation type in case of an indirect.
+   if (src.getFile() == TGSI_FILE_INPUT) {
+  linear = info->in[src.getIndex(0)].linear;
+  if (linear) {
+ op = OP_LINTERP;
+ mode = NV50_IR_INTERP_LINEAR;
+  } else {
+ op = OP_PINTERP;
+ mode = NV50_IR_INTERP_PERSPECTIVE;
+  }
}

switch (tgsi.getOpcode()) {
@@ -2793,7 +2830,7 @@ Converter::handleINTERP(Value *dst[4])


FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
-  insn = mkOp1(op, TYPE_F32, dst[c], srcToSym(src, c));
+  insn = mkOp1(op, TYPE_F32, dst[c], sym[c]);
   if (op == OP_PINTERP)
  insn->setSrc(1, w);
   if (ptr)




Re: [Mesa-dev] [PATCH 1/8] nvc0: bind images on fragment and compute shaders for Fermi

2016-05-14 Thread Ilia Mirkin
On Sat, May 14, 2016 at 9:54 AM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_compute.c |  53 
>  src/gallium/drivers/nouveau/nvc0/nvc0_context.h |   1 +
>  src/gallium/drivers/nouveau/nvc0/nvc0_program.c |   8 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_tex.c | 154 
> +++-
>  4 files changed, 209 insertions(+), 7 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
> index bbc8edb..78ce000 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
> @@ -258,6 +258,45 @@ nvc0_compute_validate_globals(struct nvc0_context *nvc0)
> }
>  }
>
> +static inline void
> +nvc0_compute_invalidate_surfaces(struct nvc0_context *nvc0, const int s)
> +{
> +   struct nouveau_pushbuf *push = nvc0->base.pushbuf;
> +   int i;
> +
> +   for (i = 0; i < NVC0_MAX_IMAGES; ++i) {
> +  if (s == 5)
> + BEGIN_NVC0(push, NVC0_CP(IMAGE(i)), 6);
> +  else
> + BEGIN_NVC0(push, NVC0_3D(IMAGE(i)), 6);
> +  PUSH_DATA(push, 0);
> +  PUSH_DATA(push, 0);
> +  PUSH_DATA(push, 0);
> +  PUSH_DATA(push, 0);
> +  PUSH_DATA(push, 0x14000);
> +  PUSH_DATA(push, 0);
> +   }
> +}
> +
> +static void
> +nvc0_compute_validate_surfaces(struct nvc0_context *nvc0)
> +{
> +   /* TODO: Invalidating both 3D and CP surfaces before validating surfaces 
> for
> +* compute is probably not really necessary, but we didn't find any better
> +* solutions for now. This fixes some invalidation issues when compute and
> +* fragment shaders are used inside the same context. Anyway, we 
> definitely
> +* have invalidation issues between 3D and CP for other resources like 
> SSBO
> +* and atomic counters. */
> +   nvc0_compute_invalidate_surfaces(nvc0, 4);
> +   nvc0_compute_invalidate_surfaces(nvc0, 5);
> +
> +   nvc0_validate_suf(nvc0, 5);
> +
> +   /* Invalidate all FRAGMENT images because they are aliased with COMPUTE. 
> */
> +   nvc0->dirty_3d |= NVC0_NEW_3D_SURFACES;
> +   nvc0->images_dirty[4] |= nvc0->images_valid[4];
> +}
> +
>  static struct nvc0_state_validate
>  validate_list_cp[] = {
> { nvc0_compprog_validate,  NVC0_NEW_CP_PROGRAM },
> @@ -267,6 +306,7 @@ validate_list_cp[] = {
> { nvc0_compute_validate_textures,  NVC0_NEW_CP_TEXTURES},
> { nvc0_compute_validate_samplers,  NVC0_NEW_CP_SAMPLERS},
> { nvc0_compute_validate_globals,   NVC0_NEW_CP_GLOBALS },
> +   { nvc0_compute_validate_surfaces,  NVC0_NEW_CP_SURFACES},
>  };
>
>  static bool
> @@ -384,6 +424,19 @@ nvc0_launch_grid(struct pipe_context *pipe, const struct 
> pipe_grid_info *info)
>PUSH_DATA (push, 0x1);
> }
>
> +   for (int i = 0; i < NVC0_MAX_IMAGES; ++i) {
> +  BEGIN_NVC0(push, NVC0_CP(IMAGE(i)), 6);
> +  PUSH_DATA(push, 0);
> +  PUSH_DATA(push, 0);
> +  PUSH_DATA(push, 0);
> +  PUSH_DATA(push, 0);
> +  PUSH_DATA(push, 0x14000);
> +  PUSH_DATA(push, 0);
> +   }
> +
> +   /* TODO: Not sure if this is really necessary. */
> +   nvc0_compute_invalidate_surfaces(nvc0, 5);

Errr... so you're doing this 2x? Did you mean to get rid of the loop above?

> +
> /* Invalidate all 3D constbufs because they are aliased with COMPUTE. */
> nvc0->dirty_3d |= NVC0_NEW_3D_CONSTBUF;
> for (s = 0; s < 5; s++) {
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
> index 7fcbf4a..436e912 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
> @@ -323,6 +323,7 @@ extern void nvc0_init_surface_functions(struct 
> nvc0_context *);
>  bool nvc0_validate_tic(struct nvc0_context *nvc0, int s);
>  bool nvc0_validate_tsc(struct nvc0_context *nvc0, int s);
>  bool nve4_validate_tsc(struct nvc0_context *nvc0, int s);
> +void nvc0_validate_suf(struct nvc0_context *nvc0, int s);
>  void nvc0_validate_textures(struct nvc0_context *);
>  void nvc0_validate_samplers(struct nvc0_context *);
>  void nve4_set_tex_handles(struct nvc0_context *);
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
> index 9db45c0..9e214a5 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
> @@ -552,22 +552,18 @@ nvc0_program_translate(struct nvc0_program *prog, 
> uint16_t chipset,
>   info->io.texBindBase = NVC0_CB_AUX_TEX_INFO(0);
>   info->prop.cp.gridInfoBase = NVC0_CB_AUX_GRID_INFO;
>   info->io.uboInfoBase = NVC0_CB_AUX_UBO_INFO(0);
> - info->io.suInfoBase = NVC0_CB_AUX_SU_INFO(0);
> -  } else {
> - info->io.suInfoBase = 0; /* TODO */
>}
>info->io.msInfoCBSlot = 0;
>info->io.msInfoBase = NVC0_CB_AUX_MS_INFO;
>

Re: [Mesa-dev] [PATCH 09/11] tgsi: remove culldist semantic.

2016-05-14 Thread Ilia Mirkin
On Sat, May 14, 2016 at 10:23 AM, Roland Scheidegger  wrote:
> On 14.05.2016 at 14:55, Marek Olšák wrote:
>> Dave,
>> It should be noted that clip distances can be disabled by
>> pipe_rasterizer_state::clip_plane_enable, but cull distances can't.
>> (same as GL)
>
> That only applies to user clip planes, not shader clip distances.

Actually, it applies to both.

  -ilia


Re: [Mesa-dev] [PATCH 09/11] tgsi: remove culldist semantic.

2016-05-14 Thread Roland Scheidegger
On 14.05.2016 at 14:55, Marek Olšák wrote:
> Dave,
> It should be noted that clip distances can be disabled by
> pipe_rasterizer_state::clip_plane_enable, but cull distances can't.
> (same as GL)

That only applies to user clip planes, not shader clip distances.


> 
> Roland,
> Our hardware only has 2 vec4 outputs. Each component can be configured
> to be "clip distance", "cull distance", or "disabled" independently.
> 

Ok.

Roland
> 
> On Sat, May 14, 2016 at 12:43 AM, Roland Scheidegger  
> wrote:
>> On 13.05.2016 at 23:10, Dave Airlie wrote:
>>> From: Dave Airlie 
>>>
>>> This isn't used anymore in the tree; culldists
>>> are part of the clipdist semantic. We could in theory
>>> rename it, but I'm not sure there is much point, and
>>> I'd have to be careful with virgl.
>>>
>>> Signed-off-by: Dave Airlie 
>>> ---
>>>  src/gallium/auxiliary/tgsi/tgsi_strings.c  |  1 -
>>>  src/gallium/docs/source/tgsi.rst   | 22 ++
>>>  src/gallium/include/pipe/p_shader_tokens.h |  1 -
>>>  3 files changed, 18 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c 
>>> b/src/gallium/auxiliary/tgsi/tgsi_strings.c
>>> index 306ab4f..c13f7ea 100644
>>> --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
>>> @@ -85,7 +85,6 @@ const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] =
>>> "PCOORD",
>>> "VIEWPORT_INDEX",
>>> "LAYER",
>>> -   "CULLDIST",
>>> "SAMPLEID",
>>> "SAMPLEPOS",
>>> "SAMPLEMASK",
>>> diff --git a/src/gallium/docs/source/tgsi.rst 
>>> b/src/gallium/docs/source/tgsi.rst
>>> index 4315707..ab12490 100644
>>> --- a/src/gallium/docs/source/tgsi.rst
>>> +++ b/src/gallium/docs/source/tgsi.rst
>>> @@ -2876,18 +2876,32 @@ annotated with those semantics.
>>>  TGSI_SEMANTIC_CLIPDIST
>>>  ""
>>>
>>> +Note this covers clipping and culling distances.
>>> +
>>>  When components of vertex elements are identified this way, these
>>>  values are each assumed to be a float32 signed distance to a plane.
>>> +
>>> +For clip distances:
>>>  Primitive setup only invokes rasterization on pixels for which
>>> -the interpolated plane distances are >= 0. Multiple clip planes
>>> -can be implemented simultaneously, by annotating multiple
>>> -components of one or more vertex elements with the above specified
>>> -semantic. The limits on both clip and cull distances are bound
>>> +the interpolated plane distances are >= 0.
>>> +
>>> +For cull distances:
>>> +Primitives will be completely discarded if the plane distance
>>> +for all of the vertices in the primitive are < 0.
>>> +If a vertex has a cull distance of NaN, that vertex counts as "out"
>>> +(as if its < 0);
>>> +
>>> +Multiple clip/cull planes can be implemented simultaneously, by
>>> +annotating multiple components of one or more vertex elements with
>>> +the above specified semantic.
>>> +The limits on both clip and cull distances are bound
>>>  by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
>>>  the maximum number of components that can be used to hold the
>>>  distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
>>>  which specifies the maximum number of registers which can be
>>>  annotated with those semantics.
>>> +The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
>>> +are used to divide up the 2 x vec4 space between clipping and culling.
>> This should really say how it's determined which one is which (so clip
>> dists come first).
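
To make the split concrete, here is a small sketch of the rules as quoted
above. It assumes the clip distances occupy the first slots of the
2 x vec4 space and the cull distances follow directly after them; the
enum and function names are invented for this example and are not
gallium API.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of one float slot in the clip/cull space. */
enum dist_kind { DIST_CLIP, DIST_CULL, DIST_UNUSED };

/* Assumption: clip distances come first, cull distances right after. */
static enum dist_kind
dist_slot_kind(unsigned slot, unsigned num_clip, unsigned num_cull)
{
   if (slot < num_clip)
      return DIST_CLIP;
   if (slot < num_clip + num_cull)
      return DIST_CULL;
   return DIST_UNUSED;
}

/* A primitive is discarded by one cull plane when every vertex is "out".
 * A vertex is "in" only if its distance is >= 0; NaN fails that test and
 * therefore counts as out, matching the doc text. */
static bool
prim_culled_by_plane(const float *dist, unsigned num_verts)
{
   assert(dist != NULL);
   for (unsigned i = 0; i < num_verts; ++i) {
      if (dist[i] >= 0.0f)
         return false;
   }
   return true;
}
```

Clip distances differ in that they are interpolated and tested per pixel
rather than per primitive, so they are not modelled here.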
>>
>>
>> You should remove the TGSI_SEMANTIC_CULLDIST section.
>>
>> For patch 10, shouldn't this work with softpipe too?
>>
>> Honestly, I'm not a big fan of packed clip and cull dists in the same
>> regs (it's still not the same as what d3d10 does in any case), my
>> opinion is since we generally don't allow different semantics within the
>> same reg, I see no good reason why we allow it here (and clip dists and
>> cull dists, albeit somewhat similar, are still different). So, if some
>> drivers wanted it in different regs and some in the same regs, I'd
>> prefer it to be different regs in the interface, with drivers having to
>> merge it when required, just because it looks cleaner. But if really all
>> hw wants it like that, 6,8-11 are
>> Reviewed-by: Roland Scheidegger 
>> (But I'd like to hear from other driver's authors.)
>>
>> Roland
>>



[Mesa-dev] [PATCH] nv50,nvc0: add support for cull distances

2016-05-14 Thread Ilia Mirkin
From: Tobias Klausmann 

Cull distances are just a special case of clip distances as far as the
hardware is concerned. Make sure that the relevant "planes" are enabled,
and flip the clip mode to cull for those.

Signed-off-by: Tobias Klausmann 
[imirkin: add enables on nvc0, add nv50 support]
Signed-off-by: Ilia Mirkin 
---
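Note for reviewers: the mask computation added to nv50_program_translate()
can be tried in isolation with the sketch below. It mirrors the arithmetic
of that hunk, but the struct and function names are invented for the
example, and the reading of clip_mode as one 4-bit field per plane with the
low bit selecting "cull" is inferred from the shift by 4 in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three values the patch stores in vp. */
struct clip_cull_state {
   uint8_t  clip_enable; /* mask of planes used as clip distances */
   uint8_t  cull_enable; /* mask of planes used as cull distances */
   uint32_t clip_mode;   /* 4 bits per plane; low bit set means cull */
};

/* Clip distances take the low planes, cull distances the ones after. */
static struct clip_cull_state
compute_clip_cull(unsigned num_clip, unsigned num_cull)
{
   struct clip_cull_state s = { 0, 0, 0 };

   /* Assumption: at most 8 planes in total (2 x vec4). */
   assert(num_clip + num_cull <= 8);

   s.clip_enable = (uint8_t)((1u << num_clip) - 1);
   s.cull_enable = (uint8_t)(((1u << num_cull) - 1) << num_clip);
   for (unsigned i = 0; i < num_cull; ++i)
      s.clip_mode |= 1u << ((num_clip + i) * 4);

   return s;
}
```

With two clip distances and one cull distance this gives clip_enable 0x03,
cull_enable 0x04 and clip_mode 0x100.
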
 docs/GL3.txt   |  2 +-
 docs/relnotes/11.3.0.html  |  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_program.c|  9 -
 src/gallium/drivers/nouveau/nv50/nv50_program.h|  3 +++
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |  1 +
 src/gallium/drivers/nouveau/nv50/nv50_shader_state.c   |  5 +++--
 src/gallium/drivers/nouveau/nv50/nv50_state_validate.c | 15 +++
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c|  5 +++--
 src/gallium/drivers/nouveau/nvc0/nvc0_program.h|  1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |  2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c |  1 +
 12 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 5e49c57..b8b4361 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -211,7 +211,7 @@ GL 4.5, GLSL 4.50:
   GL_ARB_ES3_1_compatibility            not started
   GL_ARB_clip_control                   DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_conditional_render_inverted    DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
-  GL_ARB_cull_distance                  DONE (i965)
+  GL_ARB_cull_distance                  DONE (i965, nv50, nvc0)
   GL_ARB_derivative_control             DONE (i965, nv50, nvc0, r600, radeonsi)
   GL_ARB_direct_state_access            DONE (all drivers)
   GL_ARB_get_texture_sub_image          DONE (all drivers)
diff --git a/docs/relnotes/11.3.0.html b/docs/relnotes/11.3.0.html
index 6a964f2..f456c0e 100644
--- a/docs/relnotes/11.3.0.html
+++ b/docs/relnotes/11.3.0.html
@@ -46,7 +46,7 @@ Note: some of the new features are only available with 
certain drivers.
 
 OpenGL 4.2 on radeonsi
 GL_ARB_compute_shader on radeonsi, softpipe
-GL_ARB_cull_distance on i965/gen6+
+GL_ARB_cull_distance on i965/gen6+, nv50, nvc0
 GL_ARB_framebuffer_no_attachments on nvc0, r600, radeonsi, softpipe
 GL_ARB_internalformat_query2 on all drivers
 GL_ARB_query_buffer_object on i965/hsw+
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c 
b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 89db67f..648cb73 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -319,7 +319,7 @@ nv50_program_translate(struct nv50_program *prog, uint16_t 
chipset,
struct pipe_debug_callback *debug)
 {
struct nv50_ir_prog_info *info;
-   int ret;
+   int i, ret;
const uint8_t map_undef = (prog->type == PIPE_SHADER_VERTEX) ? 0x40 : 0x80;
 
info = CALLOC_STRUCT(nv50_ir_prog_info);
@@ -378,6 +378,13 @@ nv50_program_translate(struct nv50_program *prog, uint16_t 
chipset,
 
prog->vp.need_vertex_id = info->io.vertexId < PIPE_MAX_SHADER_INPUTS;
 
+   prog->vp.clip_enable = (1 << info->io.clipDistances) - 1;
+   prog->vp.cull_enable =
+  ((1 << info->io.cullDistances) - 1) << info->io.clipDistances;
+   prog->vp.clip_mode = 0;
+   for (i = 0; i < info->io.cullDistances; ++i)
+  prog->vp.clip_mode |= 1 << ((info->io.clipDistances + i) * 4);
+
if (prog->type == PIPE_SHADER_FRAGMENT) {
   if (info->prop.fp.writesDepth) {
  prog->fp.flags[0] |= NV50_3D_FP_CONTROL_EXPORTS_Z;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h 
b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index 1de5122..0a22e5b 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -79,6 +79,9 @@ struct nv50_program {
   ubyte clpd[2]; /* output slot of clip distance[i]'s 1st component */
   ubyte clpd_nr;
   bool need_vertex_id;
+  uint32_t clip_mode;
+  uint8_t clip_enable; /* mask of defined clip planes */
+  uint8_t cull_enable; /* mask of defined cull distances */
} vp;
 
struct {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 0912150..fa2493c 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -195,6 +195,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL:
case PIPE_CAP_INVALIDATE_BUFFER:
case PIPE_CAP_STRING_MARKER:
+   case PIPE_CAP_CULL_DISTANCE:
   return 1;
case PIPE_CAP_SEAMLESS_CUBE_MAP:
   return 1; /* class_3d >= NVA0_3D_CLASS; */
@

[Mesa-dev] [PATCH] st/mesa: disable cull distance for now

2016-05-14 Thread Ilia Mirkin
The pass that st/mesa relies on to combine clip and cull distances has
been reverted, so we can't expose ARB_cull_distance until that is
resolved.

Signed-off-by: Ilia Mirkin 
---
 src/mesa/state_tracker/st_extensions.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 4b9a3bd..ea60e41 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -574,7 +574,7 @@ void st_init_extensions(struct pipe_screen *screen,
   { o(ARB_color_buffer_float),           PIPE_CAP_VERTEX_COLOR_UNCLAMPED           },
   { o(ARB_conditional_render_inverted),  PIPE_CAP_CONDITIONAL_RENDER_INVERTED      },
   { o(ARB_copy_image),                   PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS },
-  { o(ARB_cull_distance),                PIPE_CAP_CULL_DISTANCE                    },
+  //{ o(ARB_cull_distance),                PIPE_CAP_CULL_DISTANCE                    },
   { o(ARB_depth_clamp),                  PIPE_CAP_DEPTH_CLIP_DISABLE               },
   { o(ARB_depth_texture),                PIPE_CAP_TEXTURE_SHADOW_MAP               },
   { o(ARB_derivative_control),           PIPE_CAP_TGSI_FS_FINE_DERIVATIVE          },
-- 
2.7.3



[Mesa-dev] [PATCH 4/8] nvc0/ir: add emission for OP_SULEA

2016-05-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 58 ++
 1 file changed, 58 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 14f4be4..f7bdc19 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -63,6 +63,8 @@ private:
void emitInterpMode(const Instruction *);
void emitLoadStoreType(DataType ty);
void emitSUGType(DataType);
+   void emitSUAddr(const TexInstruction *);
+   void emitSUDim(const TexInstruction *);
void emitCachingMode(CacheMode c);
 
void emitShortSrc2(const ValueRef&);
@@ -137,6 +139,8 @@ private:
void emitSULDGB(const TexInstruction *);
void emitSUSTGx(const TexInstruction *);
 
+   void emitSULEA(const TexInstruction *);
+
void emitVSHL(const Instruction *);
void emitVectorSubOp(const Instruction *);
 
@@ -2285,6 +2289,57 @@ CodeEmitterNVC0::emitSUSTGx(const TexInstruction *i)
 }
 
 void
+CodeEmitterNVC0::emitSUAddr(const TexInstruction *i)
+{
+   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+
+   if (i->tex.rIndirectSrc < 0) {
+  code[1] |= 0x4000;
+  code[0] |= i->tex.r << 26;
+   } else {
+  srcId(i, i->tex.rIndirectSrc, 26);
+   }
+}
+
+void
+CodeEmitterNVC0::emitSUDim(const TexInstruction *i)
+{
+   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+
+   code[1] |= (i->tex.target.getDim() - 1) << 12;
+   if (i->tex.target.isArray() || i->tex.target.isCube() ||
+   i->tex.target.getDim() == 3) {
+  // use e2d mode for 3-dim images, arrays and cubes.
+  code[1] |= 3 << 12;
+   }
+
+   srcId(i->src(0), 20);
+}
+
+void
+CodeEmitterNVC0::emitSULEA(const TexInstruction *i)
+{
+   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+
+   code[0] = 0x5;
+   code[1] = 0xf000;
+
+   emitPredicate(i);
+   emitLoadStoreType(i->sType);
+
+   defId(i->def(0), 14);
+
+   if (i->defExists(1)) {
+  defId(i->def(1), 32 + 22);
+   } else {
+  code[1] |= 7 << 22;
+   }
+
+   emitSUAddr(i);
+   emitSUDim(i);
+}
+
+void
 CodeEmitterNVC0::emitVectorSubOp(const Instruction *i)
 {
switch (NV50_IR_SUBOP_Vn(i->subOp)) {
@@ -2579,6 +2634,9 @@ CodeEmitterNVC0::emitInstruction(Instruction *insn)
   else
  ERROR("SUSTx not yet supported on < nve4\n");
   break;
+   case OP_SULEA:
+  emitSULEA(insn->asTex());
+  break;
case OP_ATOM:
   emitATOM(insn);
   break;
-- 
2.8.2



[Mesa-dev] [PATCH 1/8] nvc0: bind images on fragment and compute shaders for Fermi

2016-05-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c |  53 
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h |   1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c |   8 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_tex.c | 154 +++-
 4 files changed, 209 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index bbc8edb..78ce000 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -258,6 +258,45 @@ nvc0_compute_validate_globals(struct nvc0_context *nvc0)
}
 }
 
+static inline void
+nvc0_compute_invalidate_surfaces(struct nvc0_context *nvc0, const int s)
+{
+   struct nouveau_pushbuf *push = nvc0->base.pushbuf;
+   int i;
+
+   for (i = 0; i < NVC0_MAX_IMAGES; ++i) {
+  if (s == 5)
+ BEGIN_NVC0(push, NVC0_CP(IMAGE(i)), 6);
+  else
+ BEGIN_NVC0(push, NVC0_3D(IMAGE(i)), 6);
+  PUSH_DATA(push, 0);
+  PUSH_DATA(push, 0);
+  PUSH_DATA(push, 0);
+  PUSH_DATA(push, 0);
+  PUSH_DATA(push, 0x14000);
+  PUSH_DATA(push, 0);
+   }
+}
+
+static void
+nvc0_compute_validate_surfaces(struct nvc0_context *nvc0)
+{
+   /* TODO: Invalidating both 3D and CP surfaces before validating surfaces for
+* compute is probably not really necessary, but we didn't find any better
+* solutions for now. This fixes some invalidation issues when compute and
+* fragment shaders are used inside the same context. Anyway, we definitely
+* have invalidation issues between 3D and CP for other resources like SSBO
+* and atomic counters. */
+   nvc0_compute_invalidate_surfaces(nvc0, 4);
+   nvc0_compute_invalidate_surfaces(nvc0, 5);
+
+   nvc0_validate_suf(nvc0, 5);
+
+   /* Invalidate all FRAGMENT images because they are aliased with COMPUTE. */
+   nvc0->dirty_3d |= NVC0_NEW_3D_SURFACES;
+   nvc0->images_dirty[4] |= nvc0->images_valid[4];
+}
+
 static struct nvc0_state_validate
 validate_list_cp[] = {
{ nvc0_compprog_validate,  NVC0_NEW_CP_PROGRAM },
@@ -267,6 +306,7 @@ validate_list_cp[] = {
{ nvc0_compute_validate_textures,  NVC0_NEW_CP_TEXTURES},
{ nvc0_compute_validate_samplers,  NVC0_NEW_CP_SAMPLERS},
{ nvc0_compute_validate_globals,   NVC0_NEW_CP_GLOBALS },
+   { nvc0_compute_validate_surfaces,  NVC0_NEW_CP_SURFACES},
 };
 
 static bool
@@ -384,6 +424,19 @@ nvc0_launch_grid(struct pipe_context *pipe, const struct 
pipe_grid_info *info)
   PUSH_DATA (push, 0x1);
}
 
+   for (int i = 0; i < NVC0_MAX_IMAGES; ++i) {
+  BEGIN_NVC0(push, NVC0_CP(IMAGE(i)), 6);
+  PUSH_DATA(push, 0);
+  PUSH_DATA(push, 0);
+  PUSH_DATA(push, 0);
+  PUSH_DATA(push, 0);
+  PUSH_DATA(push, 0x14000);
+  PUSH_DATA(push, 0);
+   }
+
+   /* TODO: Not sure if this is really necessary. */
+   nvc0_compute_invalidate_surfaces(nvc0, 5);
+
/* Invalidate all 3D constbufs because they are aliased with COMPUTE. */
nvc0->dirty_3d |= NVC0_NEW_3D_CONSTBUF;
for (s = 0; s < 5; s++) {
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index 7fcbf4a..436e912 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -323,6 +323,7 @@ extern void nvc0_init_surface_functions(struct nvc0_context 
*);
 bool nvc0_validate_tic(struct nvc0_context *nvc0, int s);
 bool nvc0_validate_tsc(struct nvc0_context *nvc0, int s);
 bool nve4_validate_tsc(struct nvc0_context *nvc0, int s);
+void nvc0_validate_suf(struct nvc0_context *nvc0, int s);
 void nvc0_validate_textures(struct nvc0_context *);
 void nvc0_validate_samplers(struct nvc0_context *);
 void nve4_set_tex_handles(struct nvc0_context *);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index 9db45c0..9e214a5 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -552,22 +552,18 @@ nvc0_program_translate(struct nvc0_program *prog, 
uint16_t chipset,
  info->io.texBindBase = NVC0_CB_AUX_TEX_INFO(0);
  info->prop.cp.gridInfoBase = NVC0_CB_AUX_GRID_INFO;
  info->io.uboInfoBase = NVC0_CB_AUX_UBO_INFO(0);
- info->io.suInfoBase = NVC0_CB_AUX_SU_INFO(0);
-  } else {
- info->io.suInfoBase = 0; /* TODO */
   }
   info->io.msInfoCBSlot = 0;
   info->io.msInfoBase = NVC0_CB_AUX_MS_INFO;
   info->io.bufInfoBase = NVC0_CB_AUX_BUF_INFO(0);
+  info->io.suInfoBase = NVC0_CB_AUX_SU_INFO(0);
} else {
   if (chipset >= NVISA_GK104_CHIPSET) {
  info->io.texBindBase = NVC0_CB_AUX_TEX_INFO(0);
- info->io.suInfoBase = NVC0_CB_AUX_SU_INFO(0);
-  } else {
- info->io.suInfoBase = 0; /* TODO */
   }
   info->io.sampleInfoBase = 

[Mesa-dev] [PATCH 3/8] nv50/ir: fix tex constraints for surface coords on Fermi

2016-05-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index 27883a0..b893996 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -2154,6 +2154,9 @@ 
RegAlloc::InsertConstraintsPass::texConstraintNVC0(TexInstruction *tex)
if (tex->op == OP_TXQ) {
   s = tex->srcCount(0xff);
   n = 0;
+   } else if (isSurfaceOp(tex->op)) {
+  s = tex->tex.target.getDim() + (tex->tex.target.isArray() || 
tex->tex.target.isCube());
+  n = tex->srcCount(0xff) - s;
} else {
   s = tex->tex.target.getArgCount() - tex->tex.target.isMS();
   if (!tex->tex.target.isArray() &&
-- 
2.8.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/8] nvc0/ir: add a lowering pass for surfaces on Fermi

2016-05-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 117 +
 .../nouveau/codegen/nv50_ir_lowering_nvc0.h|   2 +
 2 files changed, 119 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index 1068c21..002f09d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1982,6 +1982,121 @@ NVC0LoweringPass::handleSurfaceOpNVE4(TexInstruction 
*su)
   su->sType = (su->tex.target == TEX_TARGET_BUFFER) ? TYPE_U32 : TYPE_U8;
 }
 
+void
+NVC0LoweringPass::processSurfaceCoordsNVC0(TexInstruction *su)
+{
+   const int idx = su->tex.r;
+   const int dim = su->tex.target.getDim();
+   const int arg = dim + (su->tex.target.isArray() || su->tex.target.isCube());
+   const uint16_t base = idx * NVE4_SU_INFO__STRIDE;
+   int c;
+   Value *zero = bld.mkImm(0);
+   Value *src[3];
+   Value *v;
+   Value *ind = NULL;
+
+   if (su->tex.rIndirectSrc >= 0) {
+  // FIXME: out of bounds
+  assert(su->tex.r == 0);
+  ind = bld.mkOp2v(OP_SHL, TYPE_U32, bld.getSSA(),
+   su->getIndirectR(), bld.mkImm(6));
+   }
+
+   // get surface coordinates
+   for (c = 0; c < arg; ++c)
+  src[c] = su->getSrc(c);
+   for (; c < 3; ++c)
+  src[c] = zero;
+
+   // calculate pixel offset
+   if (su->op == OP_SULDP || su->op == OP_SUREDP) {
+  v = loadSuInfo32(ind, base + NVE4_SU_INFO_BSIZE);
+  su->setSrc(0, bld.mkOp2v(OP_MUL, TYPE_U32, bld.getSSA(), src[0], v));
+   }
+
+   // add array layer offset
+   if (su->tex.target.isArray() || su->tex.target.isCube()) {
+  v = loadSuInfo32(ind, base + NVE4_SU_INFO_ARRAY);
+  assert(dim > 1);
+  su->setSrc(2, bld.mkOp2v(OP_MUL, TYPE_U32, bld.getSSA(), src[2], v));
+   }
+
+   // prevent read fault when the image is not actually bound
+   CmpInstruction *pred =
+  bld.mkCmp(OP_SET, CC_EQ, TYPE_U32, bld.getSSA(1, FILE_PREDICATE),
+TYPE_U32, bld.mkImm(0),
+loadSuInfo32(ind, base + NVE4_SU_INFO_ADDR));
+   if (su->tex.format) {
+  const TexInstruction::ImgFormatDesc *format = su->tex.format;
+  int blockwidth = format->bits[0] + format->bits[1] +
+   format->bits[2] + format->bits[3];
+
+  if (blockwidth >= 8) {
+ // make sure that the format doesn't mismatch
+ bld.mkCmp(OP_SET_OR, CC_NE, TYPE_U32, pred->getDef(0),
+   TYPE_U32, bld.loadImm(NULL, blockwidth / 8),
+   loadSuInfo32(ind, base + NVE4_SU_INFO_BSIZE),
+   pred->getDef(0));
+  }
+   }
+   su->setPredicate(CC_NOT_P, pred->getDef(0));
+}
+
+void
+NVC0LoweringPass::handleSurfaceOpNVC0(TexInstruction *su)
+{
+   if (su->tex.target == TEX_TARGET_1D_ARRAY) {
+  /* As 1d arrays also need 3 coordinates, switching to TEX_TARGET_2D_ARRAY
+   * will simplify the lowering pass and the texture constraints. */
+  su->moveSources(1, 1);
+  su->setSrc(2, su->getSrc(1));
+  su->setSrc(1, bld.loadImm(NULL, 0));
+  su->tex.target = TEX_TARGET_2D_ARRAY;
+   }
+
+   processSurfaceCoordsNVC0(su);
+
+   if (su->op == OP_SULDP)
+  convertSurfaceFormat(su);
+
+   if (su->op == OP_SUREDB || su->op == OP_SUREDP) {
+  const int dim = su->tex.target.getDim();
+  const int arg = dim + (su->tex.target.isArray() || 
su->tex.target.isCube());
+  LValue *addr = bld.getSSA(8);
+  Value *def = su->getDef(0);
+
+  su->op = OP_SULEA;
+
+  // Set the destination to the address
+  su->dType = TYPE_U64;
+  su->setDef(0, addr);
+  su->setDef(1, su->getPredicate());
+
+  bld.setPosition(su, true);
+
+  // Perform the atomic op
+  Instruction *red = bld.mkOp(OP_ATOM, su->sType, bld.getSSA());
+  red->subOp = su->subOp;
+  red->setSrc(0, bld.mkSymbol(FILE_MEMORY_GLOBAL, 0, su->sType, 0));
+  red->setSrc(1, su->getSrc(arg));
+  if (red->subOp == NV50_IR_SUBOP_ATOM_CAS)
+ red->setSrc(2, su->getSrc(arg + 1));
+  red->setIndirect(0, 0, addr);
+
+  // make sure to initialize dst value when the atomic operation is not
+  // performed
+  Instruction *mov = bld.mkMov(bld.getSSA(), bld.loadImm(NULL, 0));
+
+  assert(su->cc == CC_NOT_P);
+  red->setPredicate(su->cc, su->getPredicate());
+  mov->setPredicate(CC_P, su->getPredicate());
+
+  bld.mkOp2(OP_UNION, TYPE_U32, def, red->getDef(0), mov->getDef(0));
+
+  handleCasExch(red, false);
+   }
+}
+
 bool
 NVC0LoweringPass::handleWRSV(Instruction *i)
 {
@@ -2455,6 +2570,8 @@ NVC0LoweringPass::visit(Instruction *i)
case OP_SUREDP:
   if (targ->getChipset() >= NVISA_GK104_CHIPSET)
  handleSurfaceOpNVE4(i->asTex());
+  else
+ handleSurfaceOpNVC0(i->asTex());
   break;
case OP_SUQ:
   handleSUQ(i->asTex());
diff --git a/src/gallium/drivers/nou

[Mesa-dev] [PATCH 5/8] nvc0/ir: add emission for SULDB and SUSTx

2016-05-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 46 +-
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index f7bdc19..596293e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -139,6 +139,8 @@ private:
void emitSULDGB(const TexInstruction *);
void emitSUSTGx(const TexInstruction *);
 
+   void emitSULDB(const TexInstruction *);
+   void emitSUSTx(const TexInstruction *);
void emitSULEA(const TexInstruction *);
 
void emitVSHL(const Instruction *);
@@ -2340,6 +2342,46 @@ CodeEmitterNVC0::emitSULEA(const TexInstruction *i)
 }
 
 void
+CodeEmitterNVC0::emitSULDB(const TexInstruction *i)
+{
+   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+
+   code[0] = 0x5;
+   code[1] = 0xd400 | (i->subOp << 15);
+
+   emitPredicate(i);
+   emitLoadStoreType(i->dType);
+
+   defId(i->def(0), 14);
+
+   emitCachingMode(i->cache);
+   emitSUAddr(i);
+   emitSUDim(i);
+}
+
+void
+CodeEmitterNVC0::emitSUSTx(const TexInstruction *i)
+{
+   assert(targ->getChipset() < NVISA_GK104_CHIPSET);
+
+   code[0] = 0x5;
+   code[1] = 0xdc00 | (i->subOp << 15);
+
+   if (i->op == OP_SUSTP)
+  code[1] |= i->tex.mask << 17;
+   else
+  emitLoadStoreType(i->dType);
+
+   emitPredicate(i);
+
+   srcId(i->src(1), 14);
+
+   emitCachingMode(i->cache);
+   emitSUAddr(i);
+   emitSUDim(i);
+}
+
+void
 CodeEmitterNVC0::emitVectorSubOp(const Instruction *i)
 {
switch (NV50_IR_SUBOP_Vn(i->subOp)) {
@@ -2625,14 +2667,14 @@ CodeEmitterNVC0::emitInstruction(Instruction *insn)
   if (targ->getChipset() >= NVISA_GK104_CHIPSET)
  emitSULDGB(insn->asTex());
   else
- ERROR("SULDB not yet supported on < nve4\n");
+ emitSULDB(insn->asTex());
   break;
case OP_SUSTB:
case OP_SUSTP:
   if (targ->getChipset() >= NVISA_GK104_CHIPSET)
  emitSUSTGx(insn->asTex());
   else
- ERROR("SUSTx not yet supported on < nve4\n");
+ emitSUSTx(insn->asTex());
   break;
case OP_SULEA:
   emitSULEA(insn->asTex());
-- 
2.8.2



[Mesa-dev] [PATCH 7/8] nvc0: enable ARB_shader_image_load_store on GF100

2016-05-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index eaf9c78..bd68ca9 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -376,6 +376,9 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen, 
unsigned shader,
case PIPE_SHADER_CAP_MAX_SHADER_IMAGES:
   if (class_3d == NVE4_3D_CLASS || class_3d == NVF0_3D_CLASS)
  return NVC0_MAX_IMAGES;
+  if (class_3d < NVE4_3D_CLASS)
+ if (shader == PIPE_SHADER_FRAGMENT || shader == PIPE_SHADER_COMPUTE)
+return NVC0_MAX_IMAGES;
   return 0;
default:
   NOUVEAU_ERR("unknown PIPE_SHADER_CAP %d\n", param);
-- 
2.8.2



[Mesa-dev] [PATCH 8/8] nvc0: expose GLSL version 420 on GF100

2016-05-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index bd68ca9..40e5a9d 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -120,7 +120,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
   return 128 * 1024 * 1024;
case PIPE_CAP_GLSL_FEATURE_LEVEL:
-  if (class_3d == NVE4_3D_CLASS || class_3d == NVF0_3D_CLASS)
+  if (class_3d <= NVF0_3D_CLASS)
  return 420;
   return 410;
case PIPE_CAP_MAX_RENDER_TARGETS:
-- 
2.8.2



[Mesa-dev] [PATCH 0/8] nvc0: expose OpenGL 4.2 on Fermi

2016-05-14 Thread Samuel Pitoiset
Hi there,

This series implements both ARB_shader_image_load_store (GL 4.2) and
ARB_shader_image_size (GL 4.3), which allows us to enable OpenGL 4.2 on Fermi
GPUs. (GL3.txt won't be updated until images are also implemented on Maxwell.)

3D images are not supported at all because we don't think they are used in
real applications and because they are a bit tricky to implement. Anyway, this
could be done in a separate series later if we really need them.

Except for 3D images, we have exactly the same pass rate as Kepler.

The next step is to implement images on Maxwell GPUs, but this won't be ready
for the next release.

As usual, the dEQP/piglit failures are listed below.

Please review,
Thanks!

Ilia Mirkin (1):
  nv50/ir: use moveSources to condense sources

Samuel Pitoiset (7):
  nvc0: bind images on fragment and compute shaders for Fermi
  nv50/ir: fix tex constraints for surface coords on Fermi
  nvc0/ir: add emission for OP_SULEA
  nvc0/ir: add emission for SULDB and SUSTx
  nvc0/ir: add a lowering pass for surfaces on Fermi
  nvc0: enable ARB_shader_image_load_store on GF100
  nvc0: expose GLSL version 420 on GF100

 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 104 +-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 117 
 .../nouveau/codegen/nv50_ir_lowering_nvc0.h|   2 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp |  10 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c|  53 +++
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h|   1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c|   8 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |   5 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_tex.c| 154 -
 9 files changed, 438 insertions(+), 16 deletions(-)

-- 
2.8.2

** dEQP **

deqp-gles31/functional/image_load_store/3d/atomic/add_r32i_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/add_r32i_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/add_r32ui_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/add_r32ui_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/and_r32i_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/and_r32i_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/and_r32ui_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/and_r32ui_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/comp_swap_r32i_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/comp_swap_r32i_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/comp_swap_r32ui_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/comp_swap_r32ui_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/exchange_r32f_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/exchange_r32f_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/exchange_r32i_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/exchange_r32i_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/exchange_r32ui_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/exchange_r32ui_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/max_r32i_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/max_r32i_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/max_r32ui_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/max_r32ui_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/min_r32i_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/min_r32i_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/min_r32ui_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/min_r32ui_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/or_r32i_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/or_r32i_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/or_r32ui_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/or_r32ui_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/xor_r32i_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/xor_r32i_return_value: fail
deqp-gles31/functional/image_load_store/3d/atomic/xor_r32ui_result: fail
deqp-gles31/functional/image_load_store/3d/atomic/xor_r32ui_return_value: fail
deqp-gles31/functional/image_load_store/3d/format_reinterpret/r32f_r32i: fail
deqp-gles31/functional/image_load_store/3d/format_reinterpret/r32f_r32ui: fail
deqp-gles31/functional/image_load_store/3d/format_reinterpret/r32f_rgba8: fail
deqp-gles31/functional/image_load_store/3d/format_reinterpret/r32f_rgba8_snorm: fail
deqp-gles31/functional/image_load_store/3d/format_reinterpret/r32f_rgba8i: fail
deqp-gles31/functional/image_load_store/3d/format_reinterpret/r32f_rgba8ui: fail
deqp-gles31/functional/image_load_store/3d/format_reinterpret/r32i_r32f: fail
deqp-gles31/fu

[Mesa-dev] [PATCH 2/8] nv50/ir: use moveSources to condense sources

2016-05-14 Thread Samuel Pitoiset
From: Ilia Mirkin 

This makes sure that rIndirectSrc and other things stay updated.

Signed-off-by: Ilia Mirkin 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index 7e8bb17..27883a0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -2073,14 +2073,9 @@ 
RegAlloc::InsertConstraintsPass::condenseSrcs(Instruction *insn,
merge->setDef(0, lval);
for (int s = a, i = 0; s <= b; ++s, ++i) {
   merge->setSrc(i, insn->getSrc(s));
-  insn->setSrc(s, NULL);
}
+   insn->moveSources(b + 1, a - b);
insn->setSrc(a, lval);
-
-   for (int k = a + 1, s = b + 1; insn->srcExists(s); ++s, ++k) {
-  insn->setSrc(k, insn->getSrc(s));
-  insn->setSrc(s, NULL);
-   }
insn->bb->insertBefore(insn, merge);
 
insn->putExtraSources(0, save);
-- 
2.8.2



Re: [Mesa-dev] [PATCH] i965: Fix undefined df bits in brw_reg comparisons.

2016-05-14 Thread Emil Velikov
Hi Ken,

On 14 May 2016 at 01:44, Kenneth Graunke  wrote:
> Commit 5310bca024f77da40ea6f4c275455f9cb0528f9e added a new "double df"
> field to the brw_reg struct, adding an extra 4 bytes of data that isn't
> usually initialized (or may contain irrelevant garbage if the struct is
> mutated).  This means that it's no longer safe to memcmp().
>
> Instead, add a brw_regs_equal() function which ignores the extra df bits
> unless they matter.  To keep the implementation cheap, we wrap the first
> set of fields in a union/struct so that we can use a single DWord
> comparison.
>
This seems to be roughly what I did a while ago [1], as part of a
series [2] to remove the memclear/memcmp in {brw,fs,src,dst}_reg.
Shame the series never got much input, even after a few pings :-(

-Emil
[1] https://patchwork.freedesktop.org/patch/63840/
[2] https://patchwork.freedesktop.org/series/483/


Re: [Mesa-dev] [PATCH 09/11] tgsi: remove culldist semantic.

2016-05-14 Thread Marek Olšák
Dave,
It should be noted that clip distances can be disabled by
pipe_rasterizer_state::clip_plane_enable, but cull distances can't
(same as GL).

Roland,
Our hardware only has 2 vec4 outputs. Each component can be configured
to be "clip distance", "cull distance", or "disabled" independently.
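
To make the packing discussed in this thread concrete, here is a minimal
sketch in C. It assumes the 2 x vec4 space is split by the two counts with
clip distances first (the ordering suggested in the review quoted below), and
it applies the cull rule from the quoted documentation text: a primitive is
discarded when every vertex is "out", with NaN counting as out. The names
classify_component, vertex_is_out and culled_by_plane are hypothetical and do
not exist in Mesa.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

#define MAX_DISTANCES 8 /* 2 x vec4 components shared by clip and cull */

enum dist_kind { DIST_DISABLED, DIST_CLIP, DIST_CULL };

/* Classify component c (0..7), assuming clip distances are packed first. */
static enum dist_kind
classify_component(unsigned c, unsigned num_clip, unsigned num_cull)
{
   assert(c < MAX_DISTANCES);
   assert(num_clip + num_cull <= MAX_DISTANCES);

   if (c < num_clip)
      return DIST_CLIP;
   if (c < num_clip + num_cull)
      return DIST_CULL;
   return DIST_DISABLED;
}

/* A vertex is "out" for a cull plane if its distance is negative or NaN. */
static bool
vertex_is_out(float d)
{
   return isnan(d) || d < 0.0f;
}

/* The primitive is discarded for this plane only if every vertex is out. */
static bool
culled_by_plane(const float *dist, unsigned num_verts)
{
   assert(num_verts > 0);

   for (unsigned v = 0; v < num_verts; v++) {
      if (!vertex_is_out(dist[v]))
         return false;
   }
   return true;
}
```

With 2 clip and 3 cull distances, components 0-1 are clip, 2-4 are cull and
5-7 are disabled. Hardware that configures each component independently, as
described above, would not need the fixed ordering; it is only an assumption
of this sketch.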

Marek


On Sat, May 14, 2016 at 12:43 AM, Roland Scheidegger  wrote:
> Am 13.05.2016 um 23:10 schrieb Dave Airlie:
>> From: Dave Airlie 
>>
>> This isn't used anymore in the tree; culldists
>> are part of the clipdist semantic. We could in theory
>> rename it, but I'm not sure there is much point, and
>> I'd have to be careful with virgl.
>>
>> Signed-off-by: Dave Airlie 
>> ---
>>  src/gallium/auxiliary/tgsi/tgsi_strings.c  |  1 -
>>  src/gallium/docs/source/tgsi.rst   | 22 ++
>>  src/gallium/include/pipe/p_shader_tokens.h |  1 -
>>  3 files changed, 18 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c 
>> b/src/gallium/auxiliary/tgsi/tgsi_strings.c
>> index 306ab4f..c13f7ea 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
>> @@ -85,7 +85,6 @@ const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] =
>> "PCOORD",
>> "VIEWPORT_INDEX",
>> "LAYER",
>> -   "CULLDIST",
>> "SAMPLEID",
>> "SAMPLEPOS",
>> "SAMPLEMASK",
>> diff --git a/src/gallium/docs/source/tgsi.rst 
>> b/src/gallium/docs/source/tgsi.rst
>> index 4315707..ab12490 100644
>> --- a/src/gallium/docs/source/tgsi.rst
>> +++ b/src/gallium/docs/source/tgsi.rst
>> @@ -2876,18 +2876,32 @@ annotated with those semantics.
>>  TGSI_SEMANTIC_CLIPDIST
>>  ""
>>
>> +Note this covers clipping and culling distances.
>> +
>>  When components of vertex elements are identified this way, these
>>  values are each assumed to be a float32 signed distance to a plane.
>> +
>> +For clip distances:
>>  Primitive setup only invokes rasterization on pixels for which
>> -the interpolated plane distances are >= 0. Multiple clip planes
>> -can be implemented simultaneously, by annotating multiple
>> -components of one or more vertex elements with the above specified
>> -semantic. The limits on both clip and cull distances are bound
>> +the interpolated plane distances are >= 0.
>> +
>> +For cull distances:
>> +Primitives will be completely discarded if the plane distance
>> +for all of the vertices in the primitive are < 0.
>> +If a vertex has a cull distance of NaN, that vertex counts as "out"
>> +(as if its < 0);
>> +
>> +Multiple clip/cull planes can be implemented simultaneously, by
>> +annotating multiple components of one or more vertex elements with
>> +the above specified semantic.
>> +The limits on both clip and cull distances are bound
>>  by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
>>  the maximum number of components that can be used to hold the
>>  distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
>>  which specifies the maximum number of registers which can be
>>  annotated with those semantics.
>> +The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
>> +are used to divide up the 2 x vec4 space between clipping and culling.
> This should really say how it's determined which one is which (so clip
> dists come first).
>
>
> You should remove the TGSI_SEMANTIC_CULLDIST section.
>
> For patch 10, shouldn't this work with softpipe too?
>
> Honestly, I'm not a big fan of packed clip and cull dists in the same
> regs (it's still not the same as what d3d10 does in any case), my
> opinion is since we generally don't allow different semantics within the
> same reg, I see no good reason why we allow it here (and clip dists and
> cull dists, albeit somewhat similar, are still different). So, if some
> drivers wanted it in different regs and some in the same regs, I'd
> prefer it to be different regs in the interface, with drivers having to
> merge it when required, just because it looks cleaner. But if really all
> hw wants it like that, 6,8-11 are
> Reviewed-by: Roland Scheidegger 
> (But I'd like to hear from other driver's authors.)
>
> Roland
>


Re: [Mesa-dev] [PATCH v2 15/30] i965/fs: support doubles with UBO loads

2016-05-14 Thread Samuel Iglesias Gonsálvez


On 14/05/16 01:16, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
>> From: Iago Toral Quiroga 
>>
>> UBO loads with constant offset use the UNIFORM_PULL_CONSTANT_LOAD
>> instruction, which reads 16 bytes (a vec4) of data from memory. For dvec
>> types this only provides components x and y. Thus, if we are reading
>> more than 2 components we need to issue a second load at offset+16 to
>> read the next 16-byte chunk with components z and w.
>>
>> UBO loads with non-constant offset emit a load for each component
>> in the vector (and rely on CSE to fix redundant loads), so we only
>> need to consider the size of the data type when computing the offset
>> of each element in a vector.
>>
>> v2 (Sam):
>> - Adapt the code to use component() (Curro).
>>
>> Signed-off-by: Samuel Iglesias Gonsálvez 
>> Reviewed-by: Kenneth Graunke 
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 52 
>> +++-
>>  1 file changed, 45 insertions(+), 7 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index 2d57fd3..02f1e81 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -3362,6 +3362,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
>> nir_intrinsic_instr *instr
>> nir->info.num_ubos - 1);
>>}
>>  
>> +  /* Number of 32-bit slots in the type */
>> +  unsigned type_slots = MAX2(1, type_sz(dest.type) / 4);
>> +
>>nir_const_value *const_offset = nir_src_as_const_value(instr->src[1]);
>>if (const_offset == NULL) {
>>   fs_reg base_offset = retype(get_nir_src(instr->src[1]),
>> @@ -3369,19 +3372,54 @@ fs_visitor::nir_emit_intrinsic(const fs_builder 
>> &bld, nir_intrinsic_instr *instr
>>  
>>   for (int i = 0; i < instr->num_components; i++)
>>  VARYING_PULL_CONSTANT_LOAD(bld, offset(dest, bld, i), 
>> surf_index,
>> -   base_offset, i * 4);
>> +   base_offset, i * 4 * type_slots);
> 
> Why not 'i * type_sz(...)'?  As before it seems like type_slots is just
> going to introduce rounding errors here for no benefit?
> 

Right, I will fix it.
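
The difference between the two offset computations can be sketched in a few
lines of C. This is only an illustration: offset_via_slots and
offset_via_size are hypothetical helpers, type_size stands for the value
type_sz() would return, and the 2-byte case is a hypothetical type used to
show where the rounding appears.

```c
#include <assert.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Byte offset of component i as written in the patch:
 * the type size is first rounded to whole 32-bit slots. */
static unsigned
offset_via_slots(unsigned i, unsigned type_size)
{
   assert(type_size > 0);
   unsigned type_slots = MAX2(1u, type_size / 4);
   return i * 4 * type_slots;
}

/* Byte offset of component i as suggested in the review:
 * scale by the type size directly. */
static unsigned
offset_via_size(unsigned i, unsigned type_size)
{
   assert(type_size > 0);
   return i * type_size;
}
```

For 4-byte and 8-byte types both forms agree (component 3 lands at byte 12
and byte 24 respectively). For a 2-byte type the slot-based form still steps
by 4 bytes, so component 3 lands at byte 12 instead of byte 6.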

>>} else {
>> + /* Even if we are loading doubles, a pull constant load will load
>> +  * a 32-bit vec4, so should only reserve vgrf space for that. If we
>> +  * need to load a full dvec4 we will have to emit 2 loads. This is
>> +  * similar to demote_pull_constants(), except that in that case we
>> +  * see individual accesses to each component of the vector and then
>> +  * we let CSE deal with duplicate loads. Here we see a vector 
>> access
>> +  * and we have to split it if necessary.
>> +  */
>>   fs_reg packed_consts = vgrf(glsl_type::float_type);
>>   packed_consts.type = dest.type;
>>  
>> - struct brw_reg const_offset_reg = brw_imm_ud(const_offset->u32[0] 
>> & ~15);
>> - bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, packed_consts,
>> -  surf_index, const_offset_reg);
>> + unsigned const_offset_aligned = const_offset->u32[0] & ~15;
>> +
>> + /* A vec4 only contains half of a dvec4, if we need more than 2
>> +  * components of a dvec4 we will have to issue another load for
>> +  * components z and w
>> +  */
>> + int num_components;
>> + if (type_slots == 1)
>> +num_components = instr->num_components;
>> + else
>> +num_components = MIN2(2, instr->num_components);
>>
>> - const fs_reg consts = byte_offset(packed_consts, 
>> const_offset->u32[0] % 16);
>> + int remaining_components = instr->num_components;
>> + while (remaining_components > 0) {
>> +/* Read the vec4 from a 16-byte aligned offset */
>> +struct brw_reg const_offset_reg = 
>> brw_imm_ud(const_offset_aligned);
>> +bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
>> + retype(packed_consts, BRW_REGISTER_TYPE_F),
>> + surf_index, const_offset_reg);
>>  
>> - for (unsigned i = 0; i < instr->num_components; i++)
>> -bld.MOV(offset(dest, bld, i), component(consts, i));
>> +const fs_reg consts = byte_offset(packed_consts, 
>> (const_offset->u32[0] % 16));
> 
> This looks really fishy to me, if the initial offset is not 16B aligned
> you'll apply the same sub-16B offset to the result from each one of the
> subsequent pull constant loads.

This cannot happen thanks to the layout alignment rules, see below.

> Also you don't seem to take into
> account whether the initial offset is misaligned in the calculation of
> num_components -- If it is it looks like the first pull constant load
> could return less than "num_components" usable components and you wou

Re: [Mesa-dev] [PATCH 13/18] winsys/amdgpu: start with smaller IBs, growing as necessary

2016-05-14 Thread Marek Olšák
On Tue, May 10, 2016 at 1:21 AM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> This avoids allocating giant IBs from the outset, especially for CE and DMA.
>
> With this change, we also never flush prematurely due to the CE IB: as long
> as there is space in the buffer, we will use it.
> ---
>  src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 55 
> +--
>  src/gallium/winsys/amdgpu/drm/amdgpu_cs.h |  1 +
>  2 files changed, 46 insertions(+), 10 deletions(-)
>
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
> b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> index a318670..546f224 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> @@ -335,11 +335,31 @@ static unsigned amdgpu_cs_add_buffer(struct 
> radeon_winsys_cs *rcs,
> return index;
>  }
>
> -static bool amdgpu_ib_new_buffer(struct radeon_winsys *ws, struct amdgpu_ib 
> *ib,
> - unsigned buffer_size)
> +static bool amdgpu_ib_new_buffer(struct radeon_winsys *ws, struct amdgpu_ib 
> *ib)
>  {
> struct pb_buffer *pb;
> uint8_t *mapped;
> +   unsigned buffer_size;
> +
> +   /* Always create a buffer that is at least as large as the largest IB
> +* seen so far (multiplied by a factor to reduce internal fragmentation),
> +* but never more than the maximum IB size supported by the hardware.
> +*/
> +   buffer_size = 4 << MIN2(19, 2 + util_last_bit(ib->max_ib_size));

Would you please use something more readable? I think it's equal or
very similar to this expression:

MIN2(2 * 1024 * 1024, 4 * 4 * util_next_power_of_two(ib->max_ib_size))

And a comment explaining those numbers would be useful. For example,
"2MB is the maximum IB size allowed by the winsys" (I think the hw
limit is 4 MB actually) "and we always allocate 4 times more space
than the maximum seen IB size aligned to 2^n".
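
A quick way to check how close the two expressions are is to evaluate both
side by side. The sketch below is in C and uses simplified stand-ins for
util_last_bit() and util_next_power_of_two(); last_bit, next_pow2,
size_shift_form and size_pow2_form are hypothetical names, and the stand-ins
assume the usual semantics (1-based index of the highest set bit, and the
smallest power of two greater than or equal to the input).

```c
#include <assert.h>
#include <stdint.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Stand-in for util_last_bit(): 1-based index of the highest set bit,
 * 0 when no bit is set. */
static unsigned
last_bit(uint32_t x)
{
   unsigned n = 0;
   while (x) {
      n++;
      x >>= 1;
   }
   return n;
}

/* Stand-in for util_next_power_of_two(): smallest power of two >= x,
 * returning 1 for x <= 1. */
static uint32_t
next_pow2(uint32_t x)
{
   assert(x <= 0x80000000u); /* larger inputs have no 32-bit result */
   uint32_t p = 1;
   while (p < x)
      p <<= 1;
   return p;
}

/* Expression from the patch under review. */
static uint32_t
size_shift_form(uint32_t max_ib_size)
{
   return 4u << MIN2(19u, 2u + last_bit(max_ib_size));
}

/* Rewrite suggested in the review; 64-bit math avoids overflow
 * before the clamp to 2 MiB. */
static uint32_t
size_pow2_form(uint32_t max_ib_size)
{
   uint64_t wanted = (uint64_t)16 * next_pow2(max_ib_size);
   uint64_t cap = 2u * 1024 * 1024;
   return (uint32_t)MIN2(cap, wanted);
}
```

Under these assumptions both forms give 16384 for an input of 1000 and both
clamp to 2097152 for large inputs, but for an exact power of two below the
clamp the shift form is twice as large (32768 versus 16384 for 1024). That
matches the "equal or very similar" wording above.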


> +
> +   switch (ib->ib_type) {
> +   case IB_CONST_PREAMBLE:
> +  buffer_size = MAX2(buffer_size, 4 * 1024);
> +  break;
> +   case IB_CONST:
> +  buffer_size = MAX2(buffer_size, 16 * 1024 * 4);
> +  break;
> +   case IB_MAIN:
> +  buffer_size = MAX2(buffer_size, 8 * 1024 * 4);
> +  break;
> +   default:
> +  unreachable("unhandled IB type");
> +   }
>
> pb = ws->buffer_create(ws, buffer_size, 4096, RADEON_DOMAIN_GTT,
>RADEON_FLAG_CPU_ACCESS);
> @@ -370,35 +390,34 @@ static bool amdgpu_get_new_ib(struct radeon_winsys *ws, 
> struct amdgpu_cs *cs,
>  */
> struct amdgpu_ib *ib = NULL;
> struct amdgpu_cs_ib_info *info = &cs->csc->ib[ib_type];
> -   unsigned buffer_size, ib_size;
> +   unsigned ib_size = 0;
>
> switch (ib_type) {
> case IB_CONST_PREAMBLE:
>ib = &cs->const_preamble_ib;
> -  buffer_size = 4 * 1024 * 4;
> -  ib_size = 1024 * 4;
> +  ib_size = 256 * 4;
>break;
> case IB_CONST:
>ib = &cs->const_ib;
> -  buffer_size = 512 * 1024 * 4;
> -  ib_size = 128 * 1024 * 4;
> +  ib_size = 8 * 1024 * 4;
>break;
> case IB_MAIN:
>ib = &cs->main;
> -  buffer_size = 128 * 1024 * 4;
> -  ib_size = 20 * 1024 * 4;
> +  ib_size = 4 * 1024 * 4;
>break;
> default:
>unreachable("unhandled IB type");
> }
>
> +   ib_size = MAX2(ib_size, 4 << MIN2(19, util_last_bit(ib->max_ib_size)));

This is an expression similar to the one above. Some unification would be nice.

Marek


[Mesa-dev] Mesa 11.2.2 problems with Intel i965 graphics on Arch Linux

2016-05-14 Thread Vanja Z
Hi all,

I'm sorry if this is the wrong place to post this. Upgrading from mesa 11.2.1 
to 11.2.2 on Arch Linux results in several programs not working. I am getting 
the following errors when launching Paraview for example,

libGL error: unable to load driver: i965_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: i965
libGL error: unable to load driver: swrast_dri.so
libGL error: failed to load driver: swrast

Both files exist on my system,

/usr/lib/xorg/modules/dri/i965_dri.so
/usr/lib/xorg/modules/dri/swrast_dri.so

I am not sure if this is a problem with mesa, or with the Arch package or with 
my X configuration. I've tried asking on the Arch forums to no avail.


Best regards,
Vanja


Re: [Mesa-dev] [PATCH] glsl: Drop bad ASSERT_TRUE in gl_CullDistance link_varyings test.

2016-05-14 Thread Jordan Justen
On 2016-05-13 19:26:37, Kenneth Graunke wrote:
> I don't know what the intention was here, but this function returns
> void.  We can't assert anything about its return value.

c1bbaff1e83f901d67d78f9e1ddfe8291dd09bfa seems to be related, and
appears to have changed this file similarly for some other cases.
Maybe a rebase issue.

There is a linker::populate_consumer_input_sets prototype at the top
of the file that has bool rather than void for the return. Can you
update it too?

Reviewed-by: Jordan Justen 

> 
> Fixes "make check" failures.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/compiler/glsl/tests/varyings_test.cpp | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/src/compiler/glsl/tests/varyings_test.cpp b/src/compiler/glsl/tests/varyings_test.cpp
> index 936f495..09bf1eb 100644
> --- a/src/compiler/glsl/tests/varyings_test.cpp
> +++ b/src/compiler/glsl/tests/varyings_test.cpp
> @@ -210,11 +210,11 @@ TEST_F(link_varyings, gl_CullDistance)
>  
> ir.push_tail(culldistance);
>  
> -   ASSERT_TRUE(linker::populate_consumer_input_sets(mem_ctx,
> -&ir,
> -consumer_inputs,
> -consumer_interface_inputs,
> -junk));
> +   linker::populate_consumer_input_sets(mem_ctx,
> +&ir,
> +consumer_inputs,
> +consumer_interface_inputs,
> +junk);
>  
> EXPECT_EQ(culldistance, junk[VARYING_SLOT_CULL_DIST0]);
> EXPECT_TRUE(is_empty(consumer_inputs));
> -- 
> 2.8.2
> 


Re: [Mesa-dev] [PATCH] i965: Fix undefined df bits in brw_reg comparisons.

2016-05-14 Thread Samuel Iglesias Gonsálvez
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

With Curro's comment addressed,

Reviewed-by: Samuel Iglesias Gonsálvez 

On 14/05/16 02:44, Kenneth Graunke wrote:
> Commit 5310bca024f77da40ea6f4c275455f9cb0528f9e added a new "double
> df" field to the brw_reg struct, adding an extra 4 bytes of data
> that isn't usually initialized (or may contain irrelevant garbage
> if the struct is mutated).  This means that it's no longer safe to
> memcmp().
> 
> Instead, add a brw_regs_equal() function which ignores the extra df
> bits unless they matter.  To keep the implementation cheap, we wrap
> the first set of fields in a union/struct so that we can use a
> single DWord comparison.
> 
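
The problem described above can be illustrated with a reduced, self-contained sketch. This is not the driver code: struct reg, the TYPE_*/FILE_* enums and init_ud_reg() are hypothetical stand-ins, and only the comparison mirrors the brw_regs_equal() proposed in the patch. It assumes a C11 compiler (anonymous structs/unions) and an ABI that packs the 32 bits of bitfields into one uint32_t-sized unit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { TYPE_UD, TYPE_DF };    /* hypothetical register types */
enum { FILE_GRF, FILE_IMM };  /* hypothetical register files */

struct reg {
   union {
      struct {
         unsigned type:4;
         unsigned file:3;
         unsigned negate:1;
         unsigned abs:1;
         unsigned address_mode:1;
         unsigned pad0:1;
         unsigned subnr:5;
         unsigned nr:16;
      };
      uint32_t bits;   /* all of the fields above as a single DWord */
   };
   union {
      double df;       /* 8 bytes, only meaningful for DF immediates */
      uint32_t ud;     /* 4 bytes, overlaps only part of df */
   };
};

/* Same logic as the patch: compare the header DWord, then compare only
 * the part of the payload that is meaningful for this register. */
static inline bool regs_equal(const struct reg *a, const struct reg *b)
{
   const bool df = a->type == TYPE_DF && a->file == FILE_IMM;
   return a->bits == b->bits && (df ? a->df == b->df : a->ud == b->ud);
}

/* Fill the struct with 'fill' first to simulate stale bytes in the
 * half of df that a 32-bit store to ud does not overwrite. */
static void init_ud_reg(struct reg *r, unsigned nr, uint32_t ud, int fill)
{
   assert(nr <= 0xffff);
   memset(r, fill, sizeof(*r));
   r->bits = 0;
   r->type = TYPE_UD;
   r->file = FILE_GRF;
   r->nr = nr;
   r->ud = ud;
}
```

Two registers built with init_ud_reg() from the same nr and ud but different fill bytes describe the same register, yet a memcmp() over the whole struct may report a difference because of the leftover bytes behind ud. regs_equal() ignores those bytes unless the register is a DF immediate.
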
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
>  src/mesa/drivers/dri/i965/brw_reg.h              | 27 +---
>  src/mesa/drivers/dri/i965/brw_shader.cpp         |  2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
>  4 files changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 4f6f3a3..3b50a82 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -1010,7 +1010,7 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
>        brw_set_default_mask_control(p, BRW_MASK_DISABLE);
>        brw_set_default_access_mode(p, BRW_ALIGN_1);
>
> -      if (memcmp(&surface_reg, &sampler_reg, sizeof(surface_reg)) == 0) {
> +      if (brw_regs_equal(&surface_reg, &sampler_reg)) {
>           brw_MUL(p, addr, sampler_reg, brw_imm_uw(0x101));
>        } else {
>           brw_SHL(p, addr, sampler_reg, brw_imm_ud(8));
> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h
> index 6d51623..71e1024 100644
> --- a/src/mesa/drivers/dri/i965/brw_reg.h
> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
> @@ -234,14 +234,19 @@ uint32_t brw_swizzle_immediate(enum brw_reg_type type, uint32_t x, unsigned swz)
>   * or "structure of array" form:
>   */
>  struct brw_reg {
> -   enum brw_reg_type type:4;
> -   enum brw_reg_file file:3;  /* :2 hardware format */
> -   unsigned negate:1; /* source only */
> -   unsigned abs:1;/* source only */
> -   unsigned address_mode:1;   /* relative addressing, hopefully! */
> -   unsigned pad0:1;
> -   unsigned subnr:5;  /* :1 in align16 */
> -   unsigned nr:16;
> +   union {
> +      struct {
> +         enum brw_reg_type type:4;
> +         enum brw_reg_file file:3;  /* :2 hardware format */
> +         unsigned negate:1; /* source only */
> +         unsigned abs:1;/* source only */
> +         unsigned address_mode:1;   /* relative addressing, hopefully! */
> +         unsigned pad0:1;
> +         unsigned subnr:5;  /* :1 in align16 */
> +         unsigned nr:16;
> +      };
> +      uint32_t bits;
> +   };
>
>     union {
>        struct {
> @@ -261,6 +266,12 @@ struct brw_reg {
>        };
>     };
>
> +static inline bool
> +brw_regs_equal(const struct brw_reg *a, const struct brw_reg *b)
> +{
> +   const bool df = a->type == BRW_REGISTER_TYPE_DF && a->file == IMM;
> +   return a->bits == b->bits && (df ? a->df == b->df : a->ud == b->ud);
> +}
>
>  struct brw_indirect {
>     unsigned addr_subnr:4;
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index a23f14e..8d9e309 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -687,7 +687,7 @@ backend_shader::backend_shader(const struct brw_compiler *compiler,
>  bool
>  backend_reg::equals(const backend_reg &r) const
>  {
> -   return memcmp((brw_reg *)this, (brw_reg *)&r, sizeof(brw_reg)) == 0 &&
> +   return brw_regs_equal((brw_reg *)this, (brw_reg *)&r) &&
>            reg_offset == r.reg_offset;
>  }
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> index 4b44c3a..baf4422 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> @@ -295,7 +295,7 @@ generate_tex(struct brw_codegen *p,
>        brw_set_default_mask_control(p, BRW_MASK_DISABLE);
>        brw_set_default_access_mode(p, BRW_ALIGN_1);
>
> -      if (memcmp(&surface_reg, &sampler_reg, sizeof(surface_reg)) == 0) {
> +      if (brw_regs_equal(&surface_reg, &sampler_reg)) {
>           brw_MUL(p, addr, sampler_reg, brw_imm_uw(0x101));
>        } else {
>           brw_SHL(p, addr, sampler_reg, brw_imm_ud(8));
> 
-BEGIN PGP SIGNATURE-
Version: GnuPG v2

iQIcBAEBCAAGBQJXNs3JAAoJEH/0ujLxfcNDbNoP/A1sDKrs2iHW7rN/pCT59Qvy
xe4ZoPaQU++gUzQbizOvrdKaibIj5SgwY6Cs9gvWoOy8FsOjRrQs2ptnCKyzfEog
TDrwQ/4CDY6Kc0NykVQgxJHmw/363XHqWo3FF6mpVyl7MZHyA9ffBxzFO1OSEhx8
uwpY6mt0NfDtBh/R4Pju5UAV7WT9VnIYWh4Te7M098EsEgf6iqg2I833ct1FbNWu
LJYu6g46cIN3Mig4Bak5H495Ws4phP+vBcIPKe+wcWSS/p3bGG0OOk3fcm3fDsD4
h7pE8sYBh/t5TInAMFfjAm9SSnXHmxe1zqkvh5XwD8WQPIAx9E9HnzccGDAs35o6
gU3O43DAkYqXm53oOJi5qOWeisllxQcr