Re: [Mesa-dev] Problems with accuracy of coeffs_init + attribs_update

2015-11-02 Thread Oded Gabbay
On Thu, Oct 29, 2015 at 9:37 PM, Roland Scheidegger  wrote:
> Am 29.10.2015 um 19:44 schrieb Oded Gabbay:
>> On Thu, Oct 29, 2015 at 5:31 PM, Roland Scheidegger  
>> wrote:
>>> Am 29.10.2015 um 12:27 schrieb Oded Gabbay:
 Hi Roland, Jose

 I wanted to bring a problem I found to your attention, and discuss
 about it and ways to solve it.

 I'm working on regressions of piglit gpu.py between x86-64 and
 ppc64le, when running with llvmpipe.

 One of the regressions manifests itself in 2 tests,
 clip-distance-bulk-copy and clip-distance-itemized-copy
>>> There's actually a third one, interface-vs-unnamed-to-fs-unnamed, which
>>> fails the same (and it's easier to debug...).
>>>
>> Yes, you are correct of course. It was also fixed by my change.
>>
>>>

 What happens in ppc64le is that *some* of the interpolated per-vertex
 values (clip-distance values) that constitute the input into the
 fragment shader, are not calculated according to the required accuracy
 of the test (which is an error/distance of less than 1e-6)

 It took a while, but I believe I found the reason. In general, when
 running on ppc64le, the code path that is taken at
 lp_build_interp_soa_init() is doing  bld->simple_interp = FALSE, which
 means using coeffs_init() + attribs_update(). That's in contrast with
 x86-64, where bld->simple_interp = TRUE, which means the code will use
 coeffs_init_simple() + attribs_update_simple()
>>> Note this isn't really about x86-64 per se. If you don't have AVX on a
>>> x86-64 box, the other path will be taken too (and the test fail as well).
>>>
>> makes sense.
>>

 In specific, the problem lies in how the coeffs are calculated inside
 coeffs_init. To simply put it, they are only *actually* calculated
 correctly for the first quad. The coeffs are later *updated* for the
 other quads, using dadx/dady and dadq.

 --
 side-note:
 I believe you even documented it in coeffs_init():
   /*
* a = a0 + (x * dadx + y * dady)
* a0aos is the attrib value at top left corner of stamp
*/
 --

 The root cause for that, I believe, is because coeffs_init does *not*
 take into account the different x/y offsets for the other quads. In
 contrast, on x86-64, the code uses a function called calc_offset(),
 which does take the different x/y offsets into account.

 Please look at this definition (from lp_bld_interp.c) :

 static const unsigned char quad_offset_x[16] = {0, 1, 0, 1, 2, 3, 2,
 3, 0, 1, 0, 1, 2, 3, 2, 3};
 static const unsigned char quad_offset_y[16] = {0, 0, 1, 1, 0, 0, 1,
 1, 2, 2, 3, 3, 2, 2, 3, 3};

 Now, look at how coeffs_init() uses them:

for (i = 0; i < coeff_bld->type.length; i++) {
   LLVMValueRef nr = lp_build_const_int32(gallivm, i);
   LLVMValueRef pixxf = lp_build_const_float(gallivm, quad_offset_x[i]);
   LLVMValueRef pixyf = lp_build_const_float(gallivm, quad_offset_y[i]);
   pixoffx = LLVMBuildInsertElement(builder, pixoffx, pixxf, nr, "");
   pixoffy = LLVMBuildInsertElement(builder, pixoffy, pixyf, nr, "");
}

 So, in ppc64le, where coeff_bld->type.length == 4, we *always* take
 the first four values from the above arrays. The code *never*
 calculates the coeffs based on the other offsets.

 Admittedly, that's *almost* always works and I doubt you will see a
 difference in display. However, in the tests I mentioned above, it
 creates, for some pixels, a bigger error than what is allowed.
>>> Well, I'm not sure what error actually is allowed... GL of course is
>>> vague (the test just seems to be built so the error allowed actually
>>> passes on real hw).
>>>
>>> I don't think there's an actual bug with the code. What it does in
>>> coeffs_init, is simply calculate the dadx + dady values from one pixel
>>> to the next (within a quad).
>>> Then, in attrib_update, it simply uses these values (which are constant
>>> for all quads, as the position of the pixels within a quad doesn't
>>> change) to calculate the actual dadq value based on the actual pixel
>>> offsets, and add that to the (also done in coeffs_init) starting value
>>> of each quad.
>>>
>>> So, the code is correct. However, it still gives different results
>>> compared to the other method, and the reason is just float math. This
>>> method does two interpolations (one for the start of the quad, the other
>>> per pixel), whereas the other method does just one. And this is why the
>>> results are different, because the results just aren't exact.
>>>
>> Yeah, I agree. I also wouldn't say it is a bug per-se, but it is a
>> method of calculation that produces less accurate results then the
>> other method (the simple path).
>>
>>>

 The first thing that came into my mind was to 

[Mesa-dev] [PATCH] gallivm: exit emit_fetch_constat() when no constants

2015-11-02 Thread Oded Gabbay
If we don't have any constants, just exit emit_fetch_constat() and don't
call LLVM functions.

This also prevents a crash that happens when we emit the prologue of the
fragment shader when DEBUG_EXECUTION is set to 1 and we don't have
constants (e.g. arb_blend_func_extended-fbo-extended-blend test in
piglit).

Signed-off-by: Oded Gabbay 
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index fae604e..189d5da 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -1238,6 +1238,9 @@ emit_fetch_constant(
consts_ptr = bld->consts[dimension];
num_consts = bld->consts_sizes[dimension];
 
+   if (!consts_ptr)
+   return NULL;
+
if (reg->Register.Indirect) {
   LLVMValueRef indirect_index;
   LLVMValueRef swizzle_vec =
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic

2015-11-02 Thread Roland Scheidegger
I don't know much about ssbo, but since it looks like in glsl the
coherent etc. bits are on the variables, not the ops, it seems unnatural
to mark the op bits instead. So I'd guess it would be better if the
variables could be marked instead. If this isn't expressible in tgsi
maybe this needs to be fixed. Albeit I have to say it sounds odd to me
from a hw perspective if this variables with different bits can be
stuffed together and then the hw is expected to handle that efficiently...

Roland

Am 01.11.2015 um 23:45 schrieb Ilia Mirkin:
> Just wanted to note down some thoughts and get some feedback before
> going forward. I've already sent out a series which covered a lot of
> this, but in the end I realized it came up a bit short (available at
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_imirkin_mesa_commits_fd2=BQIGaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I=ZEO6K764MpKKCTrBFReM7jS6WlerLtMTWbj_OABE6K8=yJ3Ee990VBHMVTEQzdXBcPDd1ioo-BizrAGpP4kU-Cg=
>  ).
> 
> There are two separate buffer-related features --
> ARB_shader_atomic_counters(_ops) and
> ARB_shader_storage_buffer_objects. The former are implementable more
> efficiently on EG/NI hardware by performing the atomic ops on
> not-main-memory (GDS? LDS?). However I think that the gallium-side
> interface can be mostly identical for both cases, perhaps we can mark
> the buffer as atomic-only in the TGSI.
> 
> Just like there is a CONST tgsi file, I want to add a BUFFER file,
> which will map to ->set_shader_buffers() indices. The tricky bit comes
> in from the fact that individual variables inside of a buffer may have
> different access/store properties. I see two ways to resolve this:
> 
> 1. Declare each variable explicitly, much like UBO's still get
> individual decls per slot. These decls could contain the relevant
> caching property.
> 
> 2. Make each LOAD/STORE op declare what caching it wants explicitly.
> 
> The first option would work well for images, but for ssbo, it feels
> problematic, as with all the various packing options that exist, you
> could still specify odd per-variable cache rules, which would be
> difficult to express in the TGSI DECL. However I'm not sure how to
> implement the second option.
> 
> There is a precedent of a saturate flag, but looking at
> tgsi_instruction, there are only 2 free bits. Since there are only 4
> different caching values (none, coherent, volatile, restrict; I'm not
> counting readonly/writeonly), this fits. However that would leave no
> more bits in tgsi_instruction. I could add a texture-style bit, saying
> to expect an additional tgsi_instruction_buffer packet with more info
> but that seems wasteful.
> 
> Another option is to just pass an immediate directly to the LOAD/STORE
> ops which would specify this caching spec as an extra source. This
> seems much simpler, but a little dirtier. Opinions much appreciated.
> 
> I think that one this is worked out, I'll be able to resend my series
> adding SSBO/atomic support to freedreno, and partial SSBO (without
> atomic*) support for nvc0.
> 
> Cheers,
> 
>   -ilia
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev=BQIGaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I=ZEO6K764MpKKCTrBFReM7jS6WlerLtMTWbj_OABE6K8=OnyoWgHxyrDIN6esIAWVu0pQP5Mk8Iz3wNrzeeuTbvo=
>  
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: Silence GCC maybe-uninitialized warnings.

2015-11-02 Thread Matt Turner
On Mon, Nov 2, 2015 at 1:30 AM, Vinson Lee  wrote:
> nir/nir_control_flow.c: In function ‘split_block_cursor.isra.11’:
> nir/nir_control_flow.c:460:15: warning: ‘after’ may be used uninitialized in 
> this function [-Wmaybe-uninitialized]
>*_after = after;
>^
> nir/nir_control_flow.c:458:16: warning: ‘before’ may be used uninitialized in 
> this function [-Wmaybe-uninitialized]
>*_before = before;
> ^
>
> Signed-off-by: Vinson Lee 
> ---
>  src/glsl/nir/nir_control_flow.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
> index 7f51c4f..96395a4 100644
> --- a/src/glsl/nir/nir_control_flow.c
> +++ b/src/glsl/nir/nir_control_flow.c
> @@ -452,6 +452,9 @@ split_block_cursor(nir_cursor cursor,
>   before = split_block_before_instr(nir_instr_next(cursor.instr));
>}
>break;
> +
> +   default:
> +  unreachable("not reached");
> }

We've avoided adding this, because it seems like an absurd warning.
The enum that's the subject of the switch has four members, all of
which are handled in the switch.

Researching this, it appears that gcc doesn't consider this warning
indicate a bug because of some combination of (1) enums in C are just
ints and can contain values not associated with an enum member, and
(2) it would apparently require extra (generated) code to ensure that
the value is that of an enum member.

I don't imagine we'll want to add more members to the enum, so maybe
it's okay...
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92022] st/va: add initial support for Video Post Processing and Export/Import of VaSurface

2015-11-02 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92022

Julien Isorce  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Julien Isorce  ---
All the patches related to the title have now landed

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] util: update util_resource_copy_region() for GL_ARB_copy_image

2015-11-02 Thread Roland Scheidegger
Am 31.10.2015 um 14:37 schrieb Brian Paul:
> This primarily means added support for copying between compressed
> and uncompressed formats.
> ---
>  src/gallium/auxiliary/util/u_surface.c | 106 
> +++--
>  1 file changed, 88 insertions(+), 18 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_surface.c 
> b/src/gallium/auxiliary/util/u_surface.c
> index 70ed911..5c32876 100644
> --- a/src/gallium/auxiliary/util/u_surface.c
> +++ b/src/gallium/auxiliary/util/u_surface.c
> @@ -238,8 +238,21 @@ util_fill_box(ubyte * dst,
>  }
>  
>  
> +/** Mipmap level size computation, with minimum block size */
> +static inline unsigned
> +minify(unsigned value, unsigned levels, unsigned blocksize)
> +{
> +return MAX2(blocksize, value >> levels);
> +}
Won't this result in a warning of unused function in release builds?

> +
> +
>  /**
>   * Fallback function for pipe->resource_copy_region().
> + * We support copying between different formats (including compressed/
> + * uncompressed) if the bytes per block or pixel matches.  If copying
> + * compressed -> uncompressed, the dst region is reduced by the block
> + * width, height.  If copying uncompressed -> compressed, the dest region
> + * is expanded by the block width, height.  See GL_ARB_copy_image.
>   * Note: (X,Y)=(0,0) is always the upper-left corner.
>   */
>  void
> @@ -249,13 +262,14 @@ util_resource_copy_region(struct pipe_context *pipe,
>unsigned dst_x, unsigned dst_y, unsigned dst_z,
>struct pipe_resource *src,
>unsigned src_level,
> -  const struct pipe_box *src_box)
> +  const struct pipe_box *src_box_in)
>  {
> struct pipe_transfer *src_trans, *dst_trans;
> uint8_t *dst_map;
> const uint8_t *src_map;
> enum pipe_format src_format, dst_format;
> -   struct pipe_box dst_box;
> +   struct pipe_box src_box, dst_box;
> +   unsigned src_bs, dst_bs, src_bw, dst_bw, src_bh, dst_bh;
>  
> assert(src && dst);
> if (!src || !dst)
> @@ -267,27 +281,83 @@ util_resource_copy_region(struct pipe_context *pipe,
> src_format = src->format;
> dst_format = dst->format;
>  
> -   assert(util_format_get_blocksize(dst_format) == 
> util_format_get_blocksize(src_format));
> -   assert(util_format_get_blockwidth(dst_format) == 
> util_format_get_blockwidth(src_format));
> -   assert(util_format_get_blockheight(dst_format) == 
> util_format_get_blockheight(src_format));
> +   /* init src box */
> +   src_box = *src_box_in;
> +
> +   /* init dst box */
> +   dst_box.x = dst_x;
> +   dst_box.y = dst_y;
> +   dst_box.z = dst_z;
> +   dst_box.width  = src_box.width;
> +   dst_box.height = src_box.height;
> +   dst_box.depth  = src_box.depth;
> +
> +   src_bs = util_format_get_blocksize(src_format);
> +   src_bw = util_format_get_blockwidth(src_format);
> +   src_bh = util_format_get_blockheight(src_format);
> +   dst_bs = util_format_get_blocksize(dst_format);
> +   dst_bw = util_format_get_blockwidth(dst_format);
> +   dst_bh = util_format_get_blockheight(dst_format);
> +
> +   /* Note: all box positions and sizes are in pixels */
> +   if (src_bw > 1 && dst_bw == 1) {
> +  /* Copy from compressed to uncompressed.
> +   * Shrink dest box by the src block size.
> +   */
> +  dst_box.width /= src_bw;
> +  dst_box.height /= src_bh;
This doesn't quite look right to me. If width/height wasn't a full block
size (for small mips, or npot texture), it will round down instead of up.

> +   }
> +   else if (src_bw == 1 && dst_bw > 1) {
> +  /* Copy from uncompressed to compressed.
> +   * Expand dest box by the dest block size.
> +   */
> +  dst_box.width *= dst_bw;
> +  dst_box.height *= dst_bh;
> +   }
> +   else {
> +  /* compressed -> compressed or uncompressed -> uncompressed copy */
> +  assert(src_bw == dst_bw);
> +  assert(src_bh == dst_bh);
> +   }
> +
> +   assert(src_bs == dst_bs);
> +   if (src_bs != dst_bs) {
> +  /* This can happen if we fail to do format checking before hand.
> +   * Don't crash below.
> +   */
> +  return;
> +   }
> +
> +   /* check that region boxes are block aligned */
> +   assert(src_box.x % src_bw == 0);
> +   assert(src_box.y % src_bh == 0);
> +   assert(src_box.width % src_bw == 0 || src_box.width < src_bw);
> +   assert(src_box.height % src_bh == 0 || src_box.height < src_bh);
> +   assert(dst_box.x % dst_bw == 0);
> +   assert(dst_box.y % dst_bh == 0);
> +   assert(dst_box.width % dst_bw == 0 || dst_box.width < dst_bw);
> +   assert(dst_box.height % dst_bh == 0 || dst_box.height < src_bh);
the last term should be dst_bh.


> +
> +   /* check that region boxes are not out of bounds */
> +   assert(src_box.x + src_box.width <= minify(src->width0, src_level, 
> src_bw));
> +   assert(src_box.y + src_box.height <= minify(src->height0, src_level, 
> src_bh));
> +   assert(dst_box.x + 

Re: [Mesa-dev] [PATCH 2/2] llvmpipe: enable PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS

2015-11-02 Thread Roland Scheidegger
Am 31.10.2015 um 14:37 schrieb Brian Paul:
> We disable all 3-channel array formats since we don't support 3-channel
> UNORM8 formats.
> 
> Recall that if a 3-channel format is requested by the user, we might
> actually wind up using a 4-channel format instead.  In fact, llvmpipe
> does this frequently.  If we don't disable all 3-channel formats, we
> might be asked to copy from a true 3-channel format to a "3->4 upgraded"
> format and that'll fail.
> 
> This enables GL_ARB_copy_image for llvmpipe.
> Piglit's arb_copy_image-formats test passes 100%.
> ---
>  src/gallium/drivers/llvmpipe/lp_screen.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c 
> b/src/gallium/drivers/llvmpipe/lp_screen.c
> index d1c50ae..dc216d6 100644
> --- a/src/gallium/drivers/llvmpipe/lp_screen.c
> +++ b/src/gallium/drivers/llvmpipe/lp_screen.c
> @@ -299,8 +299,9 @@ llvmpipe_get_param(struct pipe_screen *screen, enum 
> pipe_cap param)
> case PIPE_CAP_TGSI_TXQS:
> case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
> case PIPE_CAP_SHAREABLE_SHADERS:
> -   case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
>return 0;
> +   case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
> +  return 1;
> }
> /* should only get here on unhandled cases */
> debug_printf("Unexpected PIPE_CAP %d query\n", param);
> @@ -441,6 +442,20 @@ llvmpipe_is_format_supported( struct pipe_screen 
> *_screen,
>}
> }
>  
> +   if ((bind & (PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW)) &&
> +   ((bind & PIPE_BIND_DISPLAY_TARGET) == 0) &&
> +   target != PIPE_BUFFER) {
> +  const struct util_format_description *desc =
> + util_format_description(format);
> +  if (desc->nr_channels == 3 && desc->is_array) {
> + /* Don't support any 3-component formats here since we don't support
> +  * the 3-component UNORM formats.  This allows GL_ARB_copy_image
> +  * to work.
I think this logic should be disabled for the 32bit per channel ones.
There are no such unorm formats, and those are also the ones which
actually work in llvmpipe (most 8 and 16 bit ones are disabled already
in _some_ cases, notably if they have rt flag, albeit it's something
which should be fixed).
Eventually could instead fix up the util copy instead (since some
formats just don't even have 3-channel ones even if they'd work in
llvmpipe, albeit I'm not really sure why not (wouldn't be that difficult
I guess, essentially just needed to copy each pixel separately, albeit
might not be worth it...).
I actually just realized the existing format check in llvmpipe to
disable certain 3-channel formats is bugged, since it will only disable
3-channel rgb16 if it's non-float to avoid crash in the backend, but of
course with a view we could end up with a non-float format there even if
the format was float (albeit that crash might have been fixed with newer
llvm version I guess...)...




> +  */
> + return FALSE;
> +  }
> +   }
> +
> if (bind & PIPE_BIND_DISPLAY_TARGET) {
>if(!winsys->is_displaytarget_format_supported(winsys, bind, format))
>   return FALSE;
> 

Roland

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] util: update util_resource_copy_region() for GL_ARB_copy_image

2015-11-02 Thread Brian Paul

On 11/02/2015 11:14 AM, Roland Scheidegger wrote:

Am 31.10.2015 um 14:37 schrieb Brian Paul:

This primarily means added support for copying between compressed
and uncompressed formats.
---
  src/gallium/auxiliary/util/u_surface.c | 106 +++--
  1 file changed, 88 insertions(+), 18 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_surface.c 
b/src/gallium/auxiliary/util/u_surface.c
index 70ed911..5c32876 100644
--- a/src/gallium/auxiliary/util/u_surface.c
+++ b/src/gallium/auxiliary/util/u_surface.c
@@ -238,8 +238,21 @@ util_fill_box(ubyte * dst,
  }


+/** Mipmap level size computation, with minimum block size */
+static inline unsigned
+minify(unsigned value, unsigned levels, unsigned blocksize)
+{
+return MAX2(blocksize, value >> levels);
+}

Won't this result in a warning of unused function in release builds?


No, unused inline functions never warn.





+
+
  /**
   * Fallback function for pipe->resource_copy_region().
+ * We support copying between different formats (including compressed/
+ * uncompressed) if the bytes per block or pixel matches.  If copying
+ * compressed -> uncompressed, the dst region is reduced by the block
+ * width, height.  If copying uncompressed -> compressed, the dest region
+ * is expanded by the block width, height.  See GL_ARB_copy_image.
   * Note: (X,Y)=(0,0) is always the upper-left corner.
   */
  void
@@ -249,13 +262,14 @@ util_resource_copy_region(struct pipe_context *pipe,
unsigned dst_x, unsigned dst_y, unsigned dst_z,
struct pipe_resource *src,
unsigned src_level,
-  const struct pipe_box *src_box)
+  const struct pipe_box *src_box_in)
  {
 struct pipe_transfer *src_trans, *dst_trans;
 uint8_t *dst_map;
 const uint8_t *src_map;
 enum pipe_format src_format, dst_format;
-   struct pipe_box dst_box;
+   struct pipe_box src_box, dst_box;
+   unsigned src_bs, dst_bs, src_bw, dst_bw, src_bh, dst_bh;

 assert(src && dst);
 if (!src || !dst)
@@ -267,27 +281,83 @@ util_resource_copy_region(struct pipe_context *pipe,
 src_format = src->format;
 dst_format = dst->format;

-   assert(util_format_get_blocksize(dst_format) == 
util_format_get_blocksize(src_format));
-   assert(util_format_get_blockwidth(dst_format) == 
util_format_get_blockwidth(src_format));
-   assert(util_format_get_blockheight(dst_format) == 
util_format_get_blockheight(src_format));
+   /* init src box */
+   src_box = *src_box_in;
+
+   /* init dst box */
+   dst_box.x = dst_x;
+   dst_box.y = dst_y;
+   dst_box.z = dst_z;
+   dst_box.width  = src_box.width;
+   dst_box.height = src_box.height;
+   dst_box.depth  = src_box.depth;
+
+   src_bs = util_format_get_blocksize(src_format);
+   src_bw = util_format_get_blockwidth(src_format);
+   src_bh = util_format_get_blockheight(src_format);
+   dst_bs = util_format_get_blocksize(dst_format);
+   dst_bw = util_format_get_blockwidth(dst_format);
+   dst_bh = util_format_get_blockheight(dst_format);
+
+   /* Note: all box positions and sizes are in pixels */
+   if (src_bw > 1 && dst_bw == 1) {
+  /* Copy from compressed to uncompressed.
+   * Shrink dest box by the src block size.
+   */
+  dst_box.width /= src_bw;
+  dst_box.height /= src_bh;

This doesn't quite look right to me. If width/height wasn't a full block
size (for small mips, or npot texture), it will round down instead of up.


I'll fix that.  Though, IIRC, I was assuming that the region size would 
be a multiple of the block size.  It looks like we check for that in 
_mesa_CopyImageSubData().  But that's not necessarily the case for small 
mipmap levels.  I think we'll need a new piglit test for that.






+   }
+   else if (src_bw == 1 && dst_bw > 1) {
+  /* Copy from uncompressed to compressed.
+   * Expand dest box by the dest block size.
+   */
+  dst_box.width *= dst_bw;
+  dst_box.height *= dst_bh;
+   }
+   else {
+  /* compressed -> compressed or uncompressed -> uncompressed copy */
+  assert(src_bw == dst_bw);
+  assert(src_bh == dst_bh);
+   }
+
+   assert(src_bs == dst_bs);
+   if (src_bs != dst_bs) {
+  /* This can happen if we fail to do format checking before hand.
+   * Don't crash below.
+   */
+  return;
+   }
+
+   /* check that region boxes are block aligned */
+   assert(src_box.x % src_bw == 0);
+   assert(src_box.y % src_bh == 0);
+   assert(src_box.width % src_bw == 0 || src_box.width < src_bw);
+   assert(src_box.height % src_bh == 0 || src_box.height < src_bh);
+   assert(dst_box.x % dst_bw == 0);
+   assert(dst_box.y % dst_bh == 0);
+   assert(dst_box.width % dst_bw == 0 || dst_box.width < dst_bw);
+   assert(dst_box.height % dst_bh == 0 || dst_box.height < src_bh);

the last term should be dst_bh.


Yes.






+
+   /* check that region boxes are not out of bounds */
+   assert(src_box.x + 

Re: [Mesa-dev] [PATCH] glsl: OpenGLES GLSL 3.1 precision qualifiers ordering rules

2015-11-02 Thread Iago Toral
On Sat, 2015-10-31 at 23:22 -0700, Jordan Justen wrote:
> On 2015-10-29 01:22:37, Iago Toral wrote:
> > On Thu, 2015-10-29 at 00:47 -0700, Jordan Justen wrote:
> > > The OpenGLES GLSL 3.1 specification uses the precision qualifier
> > > ordering rules from ARB_shading_language_420pack.
> > 
> > Maybe expand the commit log to make explicit that this is for GLES 3.1
> > and desktop GL since 4.2
> 
> has_420pack already checks for OpenGL 4.2, so for desktop GL, this
> patch doesn't change anything.

Oh right, I had missed that.

> I don't think we want to add OpenGLES 3.1 to has_420pack because the
> 3.1 spec doesn't appear to adopt all of 420pack.

Yeah, agreed.

Iago

> -Jordan
> 
> > Reviewed-by: Iago Toral Quiroga 
> > 
> > > Signed-off-by: Jordan Justen 
> > > ---
> > >  src/glsl/glsl_parser.yy | 6 --
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
> > > index 2f2e10d..4636435 100644
> > > --- a/src/glsl/glsl_parser.yy
> > > +++ b/src/glsl/glsl_parser.yy
> > > @@ -948,7 +948,8 @@ parameter_qualifier:
> > >if ($2.precision != ast_precision_none)
> > >   _mesa_glsl_error(&@1, state, "duplicate precision qualifier");
> > >  
> > > -  if (!state->has_420pack() && $2.flags.i != 0)
> > > +  if (!(state->has_420pack() || state->is_version(420, 310)) &&
> > > +  $2.flags.i != 0)
> > >   _mesa_glsl_error(&@1, state, "precision qualifiers must come 
> > > last");
> > >  
> > >$$ = $2;
> > > @@ -1847,7 +1848,8 @@ type_qualifier:
> > >if ($2.precision != ast_precision_none)
> > >   _mesa_glsl_error(&@1, state, "duplicate precision qualifier");
> > >  
> > > -  if (!state->has_420pack() && $2.flags.i != 0)
> > > +  if (!(state->has_420pack() || state->is_version(420, 310)) &&
> > > +  $2.flags.i != 0)
> > >   _mesa_glsl_error(&@1, state, "precision qualifiers must come 
> > > last");
> > >  
> > >$$ = $2;
> > 
> > 
> 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/2] mesa: Update DispatchComputeIndirect errors for indirect parameter

2015-11-02 Thread Iago Toral
the changes look reasonable. For both patches:

Reviewed-by: Iago Toral Quiroga 

On Mon, 2015-11-02 at 00:35 -0800, Jordan Justen wrote:
> There is some discrepancy between the return values for some error
> cases for the DispatchComputeIndirect call in the ARB_compute_shader
> specification. Regarding the indirect parameter, in one place the
> extension spec lists that the error returned for invalid values should
> be INVALID_OPERATION, while later it specifies INVALID_VALUE.
> 
> The OpenGL 4.3 and OpenGLES 3.1 specifications appear to be consistent
> in requiring the INVALID_VALUE error return in this case.
> 
> Here we update the code to match the main specifications, and update
> the citations use the main specification rather than the extension
> specification.
> 
> v2:
>  * Updates based on review from Iago
> 
> Signed-off-by: Jordan Justen 
> Cc: Iago Toral Quiroga 
> Cc: Marta Lofstedt 
> ---
>  src/mesa/main/api_validate.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index 1f729e8..a3ee8c0 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -960,20 +960,19 @@ valid_dispatch_indirect(struct gl_context *ctx,
> if (!check_valid_to_compute(ctx, name))
>return GL_FALSE;
>  
> -   /* From the ARB_compute_shader specification:
> +   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
>  *
> -* "An INVALID_OPERATION error is generated [...] if  is less
> -*  than zero or not a multiple of the size, in basic machine units, of
> -*  uint."
> +* "An INVALID_VALUE error is generated if indirect is negative or is not 
> a
> +*  multiple of four."
>  */
> if ((GLintptr)indirect & (sizeof(GLuint) - 1)) {
> -  _mesa_error(ctx, GL_INVALID_OPERATION,
> +  _mesa_error(ctx, GL_INVALID_VALUE,
>"%s(indirect is not aligned)", name);
>return GL_FALSE;
> }
>  
> if ((GLintptr)indirect < 0) {
> -  _mesa_error(ctx, GL_INVALID_OPERATION,
> +  _mesa_error(ctx, GL_INVALID_VALUE,
>"%s(indirect is less than zero)", name);
>return GL_FALSE;
> }


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic

2015-11-02 Thread Ilia Mirkin
FTR these are the various operators on nvidia hw:

http://docs.nvidia.com/cuda/parallel-thread-execution/#cache-operators

Most of these map directly to instruction things (ca/cg/cs/cv sound
familiar, dunno about lu, could just be an assembler helper).

How backwards-compatible is TGSI supposed to be? Can we change the
encoding willy-nilly, or are there separate systems that talk to each
other using TGSI that would need coordination?

  -ilia

On Mon, Nov 2, 2015 at 2:49 PM, Roland Scheidegger  wrote:
> Ok, I guess if it's really flagged on the instructions in hw, it seems
> reasonable to do it on the instructions in tgsi as well.
> Using the last two bits there doesn't sound nice indeed (in particular
> if maybe you'd wanted to encode the read/write bits as well at some
> point too), but it's not THAT bad I think. We can scrap some bits later
> if needed from it (token type is 4 bits but never larger than 3, NumSrcs
> could easily do with 3 instead of 4 bits too and at some point the
> predicate bit can go too). Albeit an extra token might be a good option
> too (if you decided to add those r/w bits...)
>
> Though I still don't quite understand how gpus can do that efficiently
> if you can do different flags with data which might be in the same cache
> line. But maybe it's less of a problem than I thought...
>
> Roland
>
>
> Am 02.11.2015 um 20:07 schrieb Ilia Mirkin:
>> I haven't the faintest idea about efficiently, but these things flags
>> on the ld/st instructions in the nvidia ISA for SM20+ (and I just
>> plain don't know about SM10). I'm moderately sure that's the case for
>> GCN as well.
>>
>> The difficulty with TGSI is that you might have something like
>>
>> layout (std430) buffer foo {
>>   coherent int a;
>>   int b;
>> }
>>
>> Now I don't remember if they get baked into the same vec4, but I think
>> they do. If they don't, then ARB_enhanced_layouts will fix that right
>> up. Since TGSI is vec4-oriented, it's really awkward to specify that
>> sort of thing... how would you do it?
>>
>> DECL BUFFER[0][0].x COHERENT
>> DECL BUFFER[0][0].y
>>
>> And then totally unrelated to the separate bits, you can end up with
>>
>> layout (std430) buffer foo {
>>   int foo[5];
>> }
>>
>> and I have no idea how to even express that in TGSI -- it'd want
>> things to be aligned to 16 bytes, but it'll be packed tightly here.
>> This worked OK for layout (std140), but won't work with more advanced
>> layouts. This will be a problem for UBOs too -- perhaps we need to
>> allow something like
>>
>> LOAD dst, CONST[1][0], offset
>>
>> to account for that. And lastly, ssbo allows for something like
>>
>> layout (std430) buffer foo {
>>   int foo[];
>> }
>>
>> And you can access foo[anything-you-want] -- difficult to declare that
>> in TGSI. I could invent stuff for all of these situations, but it
>> seems to be a lot easier to just feed the data to load and forget
>> about it. That's how it's all encoded in the GLSL IR as well.
>>
>>   -ilia
>>
>>
>> On Mon, Nov 2, 2015 at 1:56 PM, Roland Scheidegger  
>> wrote:
>>> I don't know much about ssbo, but since it looks like in glsl the
>>> coherent etc. bits are on the variables, not the ops, it seems unnatural
>>> to mark the op bits instead. So I'd guess it would be better if the
>>> variables could be marked instead. If this isn't expressible in tgsi
>>> maybe this needs to be fixed. Albeit I have to say it sounds odd to me
>>> from a hw perspective if this variables with different bits can be
>>> stuffed together and then the hw is expected to handle that efficiently...
>>>
>>> Roland
>>>
>>> Am 01.11.2015 um 23:45 schrieb Ilia Mirkin:
 Just wanted to note down some thoughts and get some feedback before
 going forward. I've already sent out a series which covered a lot of
 this, but in the end I realized it came up a bit short (available at
 https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_imirkin_mesa_commits_fd2=BQIGaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I=ZEO6K764MpKKCTrBFReM7jS6WlerLtMTWbj_OABE6K8=yJ3Ee990VBHMVTEQzdXBcPDd1ioo-BizrAGpP4kU-Cg=
  ).

 There are two separate buffer-related features --
 ARB_shader_atomic_counters(_ops) and
 ARB_shader_storage_buffer_objects. The former are implementable more
 efficiently on EG/NI hardware by performing the atomic ops on
 not-main-memory (GDS? LDS?). However I think that the gallium-side
 interface can be mostly identical for both cases, perhaps we can mark
 the buffer as atomic-only in the TGSI.

 Just like there is a CONST tgsi file, I want to add a BUFFER file,
 which will map to ->set_shader_buffers() indices. The tricky bit comes
 in from the fact that individual variables inside of a buffer may have
 different access/store properties. I see two ways to resolve this:

 1. Declare each variable explicitly, much like UBO's still get

[Mesa-dev] [PATCH] i965: Fix texture views of 2d array surfaces

2015-11-02 Thread Ben Widawsky
It is legal to have a texture view of a single layer from a 2D array texture;
you can sample from it, or render to it. Intel hardware needs to be made aware
when it is using a 2d array surface in the surface state. The texture view is
just a 2d surface with the backing miptree actually being a 2d array surface.
This caused the previous code would not set the right bit in the surface state
since it wasn't considered an array texture.

I spotted this early on in debug but brushed it off because it is clearly not
needed on other platforms (since they all pass). I have no idea how this works
properly on other platforms (I think gen7 introduced the bit in the state, but I
am too lazy to check). As such, I have opted not to modify gen7, though I
believe the current code is wrong there as well.

Thanks to Chris for helping me debug this.

Cc: Chris Forbes 
Reported-by: Mark Janes  (Jenkins)
References: https://www.opengl.org/registry/specs/ARB/texture_view.txt
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92609
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index 18b8665..19d449b 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -252,8 +252,9 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
 format == BRW_SURFACEFORMAT_BC7_UNORM))
   surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
 
-   if (_mesa_is_array_texture(target) || target == GL_TEXTURE_CUBE_MAP)
+   if (depth && target != GL_TEXTURE_3D) {
   surf[0] |= GEN8_SURFACE_IS_ARRAY;
+   }
 
surf[1] = SET_FIELD(mocs_wb, GEN8_SURFACE_MOCS) | mt->qpitch >> 2;
 
@@ -428,7 +429,6 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
case GL_TEXTURE_CUBE_MAP_ARRAY:
case GL_TEXTURE_CUBE_MAP:
   surf_type = BRW_SURFACE_2D;
-  is_array = true;
   depth *= 6;
   break;
case GL_TEXTURE_3D:
@@ -436,10 +436,11 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
   /* fallthrough */
default:
   surf_type = translate_tex_target(gl_target);
-  is_array = _mesa_tex_target_is_array(gl_target);
   break;
}
 
+   is_array = depth > 1 && gl_target != GL_TEXTURE_3D;
+
/* _NEW_BUFFERS */
/* Render targets can't use IMS layout. Stencil in turn gets configured as
 * single sampled and indexed manually by the program.
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic

2015-11-02 Thread Ilia Mirkin
I haven't the faintest idea about efficiently, but these things flags
on the ld/st instructions in the nvidia ISA for SM20+ (and I just
plain don't know about SM10). I'm moderately sure that's the case for
GCN as well.

The difficulty with TGSI is that you might have something like

layout (std430) buffer foo {
  coherent int a;
  int b;
}

Now I don't remember if they get baked into the same vec4, but I think
they do. If they don't, then ARB_enhanced_layouts will fix that right
up. Since TGSI is vec4-oriented, it's really awkward to specify that
sort of thing... how would you do it?

DECL BUFFER[0][0].x COHERENT
DECL BUFFER[0][0].y

And then totally unrelated to the separate bits, you can end up with

layout (std430) buffer foo {
  int foo[5];
}

and I have no idea how to even express that in TGSI -- it'd want
things to be aligned to 16 bytes, but it'll be packed tightly here.
This worked OK for layout (std140), but won't work with more advanced
layouts. This will be a problem for UBOs too -- perhaps we need to
allow something like

LOAD dst, CONST[1][0], offset

to account for that. And lastly, ssbo allows for something like

layout (std430) buffer foo {
  int foo[];
}

And you can access foo[anything-you-want] -- difficult to declare that
in TGSI. I could invent stuff for all of these situations, but it
seems to be a lot easier to just feed the data to load and forget
about it. That's how it's all encoded in the GLSL IR as well.

  -ilia


On Mon, Nov 2, 2015 at 1:56 PM, Roland Scheidegger  wrote:
> I don't know much about ssbo, but since it looks like in glsl the
> coherent etc. bits are on the variables, not the ops, it seems unnatural
> to mark the op bits instead. So I'd guess it would be better if the
> variables could be marked instead. If this isn't expressible in tgsi
> maybe this needs to be fixed. Albeit I have to say it sounds odd to me
> from a hw perspective if this variables with different bits can be
> stuffed together and then the hw is expected to handle that efficiently...
>
> Roland
>
> Am 01.11.2015 um 23:45 schrieb Ilia Mirkin:
>> Just wanted to note down some thoughts and get some feedback before
>> going forward. I've already sent out a series which covered a lot of
>> this, but in the end I realized it came up a bit short (available at
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_imirkin_mesa_commits_fd2=BQIGaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I=ZEO6K764MpKKCTrBFReM7jS6WlerLtMTWbj_OABE6K8=yJ3Ee990VBHMVTEQzdXBcPDd1ioo-BizrAGpP4kU-Cg=
>>  ).
>>
>> There are two separate buffer-related features --
>> ARB_shader_atomic_counters(_ops) and
>> ARB_shader_storage_buffer_objects. The former are implementable more
>> efficiently on EG/NI hardware by performing the atomic ops on
>> not-main-memory (GDS? LDS?). However I think that the gallium-side
>> interface can be mostly identical for both cases, perhaps we can mark
>> the buffer as atomic-only in the TGSI.
>>
>> Just like there is a CONST tgsi file, I want to add a BUFFER file,
>> which will map to ->set_shader_buffers() indices. The tricky bit comes
>> in from the fact that individual variables inside of a buffer may have
>> different access/store properties. I see two ways to resolve this:
>>
>> 1. Declare each variable explicitly, much like UBO's still get
>> individual decls per slot. These decls could contain the relevant
>> caching property.
>>
>> 2. Make each LOAD/STORE op declare what caching it wants explicitly.
>>
>> The first option would work well for images, but for ssbo, it feels
>> problematic, as with all the various packing options that exist, you
>> could still specify odd per-variable cache rules, which would be
>> difficult to express in the TGSI DECL. However I'm not sure how to
>> implement the second option.
>>
>> There is a precedent of a saturate flag, but looking at
>> tgsi_instruction, there are only 2 free bits. Since there are only 4
>> different caching values (none, coherent, volatile, restrict; I'm not
>> counting readonly/writeonly), this fits. However that would leave no
>> more bits in tgsi_instruction. I could add a texture-style bit, saying
>> to expect an additional tgsi_instruction_buffer packet with more info
>> but that seems wasteful.
>>
>> Another option is to just pass an immediate directly to the LOAD/STORE
>> ops which would specify this caching spec as an extra source. This
>> seems much simpler, but a little dirtier. Opinions much appreciated.
>>
>> I think that one this is worked out, I'll be able to resend my series
>> adding SSBO/atomic support to freedreno, and partial SSBO (without
>> atomic*) support for nvc0.
>>
>> Cheers,
>>
>>   -ilia
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> 

Re: [Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic

2015-11-02 Thread Marek Olšák
On Mon, Nov 2, 2015 at 8:07 PM, Ilia Mirkin  wrote:
> I haven't the faintest idea about efficiently, but these things flags
> on the ld/st instructions in the nvidia ISA for SM20+ (and I just
> plain don't know about SM10). I'm moderately sure that's the case for
> GCN as well.
>
> The difficulty with TGSI is that you might have something like
>
> layout (std430) buffer foo {
>   coherent int a;
>   int b;
> }
>
> Now I don't remember if they get baked into the same vec4, but I think
> they do. If they don't, then ARB_enhanced_layouts will fix that right
> up. Since TGSI is vec4-oriented, it's really awkward to specify that
> sort of thing... how would you do it?
>
> DECL BUFFER[0][0].x COHERENT
> DECL BUFFER[0][0].y
>
> And then totally unrelated to the separate bits, you can end up with
>
> layout (std430) buffer foo {
>   int foo[5];
> }
>
> and I have no idea how to even express that in TGSI -- it'd want
> things to be aligned to 16 bytes, but it'll be packed tightly here.
> This worked OK for layout (std140), but won't work with more advanced
> layouts. This will be a problem for UBOs too -- perhaps we need to
> allow something like
>
> LOAD dst, CONST[1][0], offset
>
> to account for that. And lastly, ssbo allows for something like
>
> layout (std430) buffer foo {
>   int foo[];
> }
>
> And you can access foo[anything-you-want] -- difficult to declare that
> in TGSI. I could invent stuff for all of these situations, but it
> seems to be a lot easier to just feed the data to load and forget
> about it. That's how it's all encoded in the GLSL IR as well.

That's even more fun than I thought. ;)

I guess declaring buffers and not variables is the way to go, e.g.:

BUFFER[0]
BUFFER[1]

Then you can put the offsets and flags into loads and stores:

LOAD dst.xyz, BUFFER[0], offset_in_dwords # coherent, volatile

A backend that cares about declarations can reconstruct them by
reading the instructions.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic

2015-11-02 Thread Roland Scheidegger
Ok, I guess if it's really flagged on the instructions in hw, it seems
reasonable to do it on the instructions in tgsi as well.
Using the last two bits there doesn't sound nice indeed (in particular
if maybe you'd wanted to encode the read/write bits as well at some
point too), but it's not THAT bad I think. We can scrap some bits later
if needed from it (token type is 4 bits but never larger than 3, NumSrcs
could easily do with 3 instead of 4 bits too and at some point the
predicate bit can go too). Albeit an extra token might be a good option
too (if you decided to add those r/w bits...)

Though I still don't quite understand how gpus can do that efficiently
if you can do different flags with data which might be in the same cache
line. But maybe it's less of a problem than I thought...

Roland


Am 02.11.2015 um 20:07 schrieb Ilia Mirkin:
> I haven't the faintest idea about efficiently, but these things flags
> on the ld/st instructions in the nvidia ISA for SM20+ (and I just
> plain don't know about SM10). I'm moderately sure that's the case for
> GCN as well.
> 
> The difficulty with TGSI is that you might have something like
> 
> layout (std430) buffer foo {
>   coherent int a;
>   int b;
> }
> 
> Now I don't remember if they get baked into the same vec4, but I think
> they do. If they don't, then ARB_enhanced_layouts will fix that right
> up. Since TGSI is vec4-oriented, it's really awkward to specify that
> sort of thing... how would you do it?
> 
> DECL BUFFER[0][0].x COHERENT
> DECL BUFFER[0][0].y
> 
> And then totally unrelated to the separate bits, you can end up with
> 
> layout (std430) buffer foo {
>   int foo[5];
> }
> 
> and I have no idea how to even express that in TGSI -- it'd want
> things to be aligned to 16 bytes, but it'll be packed tightly here.
> This worked OK for layout (std140), but won't work with more advanced
> layouts. This will be a problem for UBOs too -- perhaps we need to
> allow something like
> 
> LOAD dst, CONST[1][0], offset
> 
> to account for that. And lastly, ssbo allows for something like
> 
> layout (std430) buffer foo {
>   int foo[];
> }
> 
> And you can access foo[anything-you-want] -- difficult to declare that
> in TGSI. I could invent stuff for all of these situations, but it
> seems to be a lot easier to just feed the data to load and forget
> about it. That's how it's all encoded in the GLSL IR as well.
> 
>   -ilia
> 
> 
> On Mon, Nov 2, 2015 at 1:56 PM, Roland Scheidegger  wrote:
>> I don't know much about ssbo, but since it looks like in glsl the
>> coherent etc. bits are on the variables, not the ops, it seems unnatural
>> to mark the op bits instead. So I'd guess it would be better if the
>> variables could be marked instead. If this isn't expressible in tgsi
>> maybe this needs to be fixed. Albeit I have to say it sounds odd to me
>> from a hw perspective if this variables with different bits can be
>> stuffed together and then the hw is expected to handle that efficiently...
>>
>> Roland
>>
>> Am 01.11.2015 um 23:45 schrieb Ilia Mirkin:
>>> Just wanted to note down some thoughts and get some feedback before
>>> going forward. I've already sent out a series which covered a lot of
>>> this, but in the end I realized it came up a bit short (available at
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_imirkin_mesa_commits_fd2=BQIGaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I=ZEO6K764MpKKCTrBFReM7jS6WlerLtMTWbj_OABE6K8=yJ3Ee990VBHMVTEQzdXBcPDd1ioo-BizrAGpP4kU-Cg=
>>>  ).
>>>
>>> There are two separate buffer-related features --
>>> ARB_shader_atomic_counters(_ops) and
>>> ARB_shader_storage_buffer_objects. The former are implementable more
>>> efficiently on EG/NI hardware by performing the atomic ops on
>>> not-main-memory (GDS? LDS?). However I think that the gallium-side
>>> interface can be mostly identical for both cases, perhaps we can mark
>>> the buffer as atomic-only in the TGSI.
>>>
>>> Just like there is a CONST tgsi file, I want to add a BUFFER file,
>>> which will map to ->set_shader_buffers() indices. The tricky bit comes
>>> in from the fact that individual variables inside of a buffer may have
>>> different access/store properties. I see two ways to resolve this:
>>>
>>> 1. Declare each variable explicitly, much like UBO's still get
>>> individual decls per slot. These decls could contain the relevant
>>> caching property.
>>>
>>> 2. Make each LOAD/STORE op declare what caching it wants explicitly.
>>>
>>> The first option would work well for images, but for ssbo, it feels
>>> problematic, as with all the various packing options that exist, you
>>> could still specify odd per-variable cache rules, which would be
>>> difficult to express in the TGSI DECL. However I'm not sure how to
>>> implement the second option.
>>>
>>> There is a precedent of a saturate flag, but looking at
>>> tgsi_instruction, there are only 2 free bits. Since there are only 4
>>> 

Re: [Mesa-dev] [PATCH v3 1/6] gallium: expose a debug message callback settable by context owner

2015-11-02 Thread Ilia Mirkin
On Mon, Nov 2, 2015 at 3:07 PM, Marek Olšák  wrote:
>> diff --git a/src/gallium/include/pipe/p_state.h 
>> b/src/gallium/include/pipe/p_state.h
>> index 4bf8d46..2843bb6 100644
>> --- a/src/gallium/include/pipe/p_state.h
>> +++ b/src/gallium/include/pipe/p_state.h
>> @@ -684,6 +684,35 @@ struct pipe_compute_state
>> unsigned req_input_mem; /**< Required size of the INPUT resource. */
>>  };
>>
>> +/**
>> + * Structure that contains a callback for debug messages from the driver 
>> back
>> + * to the state tracker.
>> + */
>> +struct pipe_debug_info
>> +{
>
> I would prefer it if this structure were called pipe_debug_callback,
> which is what it really is.

To confirm -- just change the struct, leave the set_debug_info call alone?

Thanks,

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] gallivm: exit emit_fetch_constat() when no constants

2015-11-02 Thread Roland Scheidegger
Am 02.11.2015 um 15:56 schrieb Oded Gabbay:
> If we don't have any constants, just exit emit_fetch_constat() and don't
> call LLVM functions.
> 
> This also prevents a crash that happens when we emit the prologue of the
> fragment shader when DEBUG_EXECUTION is set to 1 and we don't have
> constants (e.g. arb_blend_func_extended-fbo-extended-blend test in
> piglit).
> 
> Signed-off-by: Oded Gabbay 
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
> index fae604e..189d5da 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
> @@ -1238,6 +1238,9 @@ emit_fetch_constant(
> consts_ptr = bld->consts[dimension];
> num_consts = bld->consts_sizes[dimension];
>  
> +   if (!consts_ptr)
> +   return NULL;
> +
> if (reg->Register.Indirect) {
>LLVMValueRef indirect_index;
>LLVMValueRef swizzle_vec =
> 


I'm not convinced that's the right solution. Clearly, a shader trying to
fetch a constant but without a const dcl is invalid.
Not sure what's the problem with the debug stuff actually. Some quick
look seems to say this should work, since it will output only up from 0
to file_max, and some other quick look at tgsi_scan_shader says this
should be initialized to -1, thus it should never try to output anything
for the const file. Might be worth trying to figure out why that's not
the case.

Roland

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 2/2] mesa: Add spec citations for DispatchCompute*

2015-11-02 Thread Jordan Justen
On 2015-11-02 00:35:36, Jordan Justen wrote:
> Note: The OpenGL 4.3 - 4.5 specification language for DispatchCompute
> appears to have an error regarding the max allowed values. When adding
> the specification citation, we note why the code does not match the
> specification language.
> 
> v2:
>  * Updates based on review from Iago
> 
> Signed-off-by: Jordan Justen 
> Cc: Iago Toral Quiroga 
> Cc: Marta Lofstedt 
> ---
>  src/mesa/main/api_validate.c | 34 +-
>  1 file changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index a3ee8c0..a490189 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -918,6 +918,11 @@ check_valid_to_compute(struct gl_context *ctx, const 
> char *function)
>return false;
> }
>  
> +   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
> +*
> +* "An INVALID_OPERATION error is generated if there is no active program
> +*  for the compute shader stage."
> +*/
> prog = ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE];
> if (prog == NULL || prog->_LinkedShaders[MESA_SHADER_COMPUTE] == NULL) {
>_mesa_error(ctx, GL_INVALID_OPERATION,
> @@ -940,6 +945,24 @@ _mesa_validate_DispatchCompute(struct gl_context *ctx,
>return GL_FALSE;
>  
> for (i = 0; i < 3; i++) {
> +  /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
> +   *
> +   * "An INVALID_VALUE error is generated if any of num_groups_x,
> +   *  num_groups_y and num_groups_z are greater than or equal to the
> +   *  maximum work group count for the corresponding dimension."
> +   *
> +   * However, the "or equal to" portions appears to be a specification
> +   * bug. In all other areas, the specification appears to indicate that
> +   * the number of workgroups can match the MAX_COMPUTE_WORK_GROUP_COUNT
> +   * value. For example, under DispatchComputeIndirect:
> +   *
> +   * "If any of num_groups_x, num_groups_y or num_groups_z is greater 
> than
> +   *  the value of MAX_COMPUTE_WORK_GROUP_COUNT for the corresponding
> +   *  dimension then the results are undefined."
> +   *
> +   * Additionally, the OpenGLES 3.1 specification does not contain "or
> +   * equal to" as an error condition.

I did get confirmation from the extension spec author that this
appears to be a bug in the OpenGL 4.3 - 4.5 specs, and I filed an
OpenGL spec bug.

-Jordan

> +   */
>if (num_groups[i] > ctx->Const.MaxComputeWorkGroupCount[i]) {
>   _mesa_error(ctx, GL_INVALID_VALUE,
>   "glDispatchCompute(num_groups_%c)", 'x' + i);
> @@ -977,6 +1000,12 @@ valid_dispatch_indirect(struct gl_context *ctx,
>return GL_FALSE;
> }
>  
> +   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
> +*
> +* "An INVALID_OPERATION error is generated if no buffer is bound to the
> +*  DRAW_INDIRECT_BUFFER binding, or if the command would source data
> +*  beyond the end of the buffer object."
> +*/
> if (!_mesa_is_bufferobj(ctx->DispatchIndirectBuffer)) {
>_mesa_error(ctx, GL_INVALID_OPERATION,
>"%s: no buffer bound to DISPATCH_INDIRECT_BUFFER", name);
> @@ -989,11 +1018,6 @@ valid_dispatch_indirect(struct gl_context *ctx,
>return GL_FALSE;
> }
>  
> -   /* From the ARB_compute_shader specification:
> -*
> -* "An INVALID_OPERATION error is generated if this command sources data
> -*  beyond the end of the buffer object [...]"
> -*/
> if (ctx->DispatchIndirectBuffer->Size < end) {
>_mesa_error(ctx, GL_INVALID_OPERATION,
>"%s(DISPATCH_INDIRECT_BUFFER too small)", name);
> -- 
> 2.6.2
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic

2015-11-02 Thread Ilia Mirkin
Another fun example to try to express properly in TGSI:

buffer foo {
  struct bar {
coherent int a;
int b;
  } asdf[10];
}

Now all of a sudden you have to worry about stride for the declarations.

  -ilia


On Mon, Nov 2, 2015 at 2:07 PM, Ilia Mirkin  wrote:
> I haven't the faintest idea about efficiently, but these things flags
> on the ld/st instructions in the nvidia ISA for SM20+ (and I just
> plain don't know about SM10). I'm moderately sure that's the case for
> GCN as well.
>
> The difficulty with TGSI is that you might have something like
>
> layout (std430) buffer foo {
>   coherent int a;
>   int b;
> }
>
> Now I don't remember if they get baked into the same vec4, but I think
> they do. If they don't, then ARB_enhanced_layouts will fix that right
> up. Since TGSI is vec4-oriented, it's really awkward to specify that
> sort of thing... how would you do it?
>
> DECL BUFFER[0][0].x COHERENT
> DECL BUFFER[0][0].y
>
> And then totally unrelated to the separate bits, you can end up with
>
> layout (std430) buffer foo {
>   int foo[5];
> }
>
> and I have no idea how to even express that in TGSI -- it'd want
> things to be aligned to 16 bytes, but it'll be packed tightly here.
> This worked OK for layout (std140), but won't work with more advanced
> layouts. This will be a problem for UBOs too -- perhaps we need to
> allow something like
>
> LOAD dst, CONST[1][0], offset
>
> to account for that. And lastly, ssbo allows for something like
>
> layout (std430) buffer foo {
>   int foo[];
> }
>
> And you can access foo[anything-you-want] -- difficult to declare that
> in TGSI. I could invent stuff for all of these situations, but it
> seems to be a lot easier to just feed the data to load and forget
> about it. That's how it's all encoded in the GLSL IR as well.
>
>   -ilia
>
>
> On Mon, Nov 2, 2015 at 1:56 PM, Roland Scheidegger  wrote:
>> I don't know much about ssbo, but since it looks like in glsl the
>> coherent etc. bits are on the variables, not the ops, it seems unnatural
>> to mark the op bits instead. So I'd guess it would be better if the
>> variables could be marked instead. If this isn't expressible in tgsi
>> maybe this needs to be fixed. Albeit I have to say it sounds odd to me
>> from a hw perspective if this variables with different bits can be
>> stuffed together and then the hw is expected to handle that efficiently...
>>
>> Roland
>>
>> Am 01.11.2015 um 23:45 schrieb Ilia Mirkin:
>>> Just wanted to note down some thoughts and get some feedback before
>>> going forward. I've already sent out a series which covered a lot of
>>> this, but in the end I realized it came up a bit short (available at
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_imirkin_mesa_commits_fd2=BQIGaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I=ZEO6K764MpKKCTrBFReM7jS6WlerLtMTWbj_OABE6K8=yJ3Ee990VBHMVTEQzdXBcPDd1ioo-BizrAGpP4kU-Cg=
>>>  ).
>>>
>>> There are two separate buffer-related features --
>>> ARB_shader_atomic_counters(_ops) and
>>> ARB_shader_storage_buffer_objects. The former are implementable more
>>> efficiently on EG/NI hardware by performing the atomic ops on
>>> not-main-memory (GDS? LDS?). However I think that the gallium-side
>>> interface can be mostly identical for both cases, perhaps we can mark
>>> the buffer as atomic-only in the TGSI.
>>>
>>> Just like there is a CONST tgsi file, I want to add a BUFFER file,
>>> which will map to ->set_shader_buffers() indices. The tricky bit comes
>>> in from the fact that individual variables inside of a buffer may have
>>> different access/store properties. I see two ways to resolve this:
>>>
>>> 1. Declare each variable explicitly, much like UBO's still get
>>> individual decls per slot. These decls could contain the relevant
>>> caching property.
>>>
>>> 2. Make each LOAD/STORE op declare what caching it wants explicitly.
>>>
>>> The first option would work well for images, but for ssbo, it feels
>>> problematic, as with all the various packing options that exist, you
>>> could still specify odd per-variable cache rules, which would be
>>> difficult to express in the TGSI DECL. However I'm not sure how to
>>> implement the second option.
>>>
>>> There is a precedent of a saturate flag, but looking at
>>> tgsi_instruction, there are only 2 free bits. Since there are only 4
>>> different caching values (none, coherent, volatile, restrict; I'm not
>>> counting readonly/writeonly), this fits. However that would leave no
>>> more bits in tgsi_instruction. I could add a texture-style bit, saying
>>> to expect an additional tgsi_instruction_buffer packet with more info
>>> but that seems wasteful.
>>>
>>> Another option is to just pass an immediate directly to the LOAD/STORE
>>> ops which would specify this caching spec as an extra source. This
>>> seems much simpler, but a little dirtier. Opinions much appreciated.
>>>
>>> I think that one this is 

Re: [Mesa-dev] Unused (?) duplicated GLSL IR state in NIR

2015-11-02 Thread Jason Ekstrand
On Mon, Nov 2, 2015 at 9:33 AM, Connor Abbott  wrote:
> On Mon, Nov 2, 2015 at 8:35 AM, Emil Velikov  wrote:
>> Hi all,
>>
>> From a quick look, it seems that NIR copies (almost ?) all the state
>> from GLSL IR even if it doesn't use it.
>>
>> The particular piece that I'm thinking about is nir_variable::data.
>> Afaict this is a remnant from the early days, when the intent was to
>> kill off the GLSL IR and use NIR directly. If so should we just nuke
>> it, or (if there is someone working on it) add some comments "Keepme:
>> WIP work by XXX to use this and kill the glsl IR one".
>>
>> The (not that distant) GLSL IR memory optimisations by Ian, seems to
>> have missed NIR. Was that intentional or were those worked upon before
>> NIR got merged ? Perhaps it's worth porting them over ?
>>
>>
>> Thanks
>> Emil
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
> Hi Emil,
>
> Indeed, nir_variable was copied from ir_variable before Ian's memory
> optimizations landed. Also, as you noticed, there are a number of
> fields only used by the GLSL linker, so they're unused until/unless we
> do linking in NIR. I wouldn't be opposed to nuking those, since we can
> always revert the commit if/when we get to that.

Agreed.  If you want to nuke some of them, feel free to send the
patch.  If you do, please CC me.  There are some things that we're
using for SPIR-V so I might request that you keep a field or two.
However, cleaning it up would definitely be a good idea.

One thing I will note though is that you should consider this a
cleanup and not an optimization.  The reason why Ian's stuff helped
GLSL IR so much is that it uses variables extensively.  The average
NIR shader only has about half a dozen variables by the time you get
done optimizing so it's not nearly as much of a problem.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 1/6] gallium: expose a debug message callback settable by context owner

2015-11-02 Thread Marek Olšák
> diff --git a/src/gallium/include/pipe/p_state.h 
> b/src/gallium/include/pipe/p_state.h
> index 4bf8d46..2843bb6 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -684,6 +684,35 @@ struct pipe_compute_state
> unsigned req_input_mem; /**< Required size of the INPUT resource. */
>  };
>
> +/**
> + * Structure that contains a callback for debug messages from the driver back
> + * to the state tracker.
> + */
> +struct pipe_debug_info
> +{

I would prefer it if this structure were called pipe_debug_callback,
which is what it really is.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/10] i965: always mark used surfaces in the visitors

2015-11-02 Thread Iago Toral
Hi Curro,

On Fri, 2015-10-30 at 16:19 +0200, Francisco Jerez wrote:
> Iago Toral Quiroga  writes:
> 
> > Right now some opcodes that only use constant surface indexing mark them as
> > used in the generator while others do it in the visitor. When the opcode can
> > handle both direct and indirect surface indexing then some opcodes handle
> > only the constant part in the generator and leave the indirect case to the
> > caller. It is all very inconsistent and leads to confusion, since one has to
> > go and look into the generator code in each case to check if it marks 
> > surfaces
> > as used or not, and in which cases.
> >
> > when I was working on SSBOs I was tempted to try and fix this but then I
> > forgot. Jordan bumped into this recently too when comparing visitor
> > code paths for similar opcodes (ubos and ssbos) that need to handle this
> > differently because they use different generator opcodes.
> >
> > Since the generator opcodes never handle marking of indirect surfaces, just
> > leave surface marking to the caller completely, since callers always have
> > all the information needed for this. It also makes things more consistent
> > and clear for everyone: marking surfaces as used is always on the side
> > of the visitor, never the generator.
> >
> > No piglit regressions observed in my IVB laptop. Would be nice to have
> > someone giving this a try with Jenkins though, to make sure I did not miss
> > anything in paths specific to other gens.
> 
> Jenkins seems to be mostly happy about the series except for three
> apparent regressions:
> 
> piglit.spec.arb_fragment_layer_viewport.layer-gs-writes-in-range.bdwm64
> 
> piglit.spec.arb_shader_texture_lod.compiler.tex_grad-texture2dproj-2d-vec4.frag.g965m64
> 
> piglit.spec.glsl-es-1_00.compiler.structure-and-array-operations.sampler-array-index.frag.g965m64
> 
> The latter two die with a crash so you may be able to look into them
> even if you don't have the original i965 by using INTEL_DEVID_OVERRIDE. ;)

thanks for testing and all the review comments. I'll look into these!

Iago

> >
> > Iago Toral Quiroga (10):
> >   i965/fs: Do not mark direct used surfaces in
> > VARYING_PULL_CONSTANT_LOAD
> >   i965/fs: Do not mark used direct surfaces in
> > UNIFORM_PULL_CONSTANT_LOAD
> >   i965/fs: Do not mark used direct surfaces in the generator for texture
> > opcodes
> >   i965/vec4: Do not mark used direct surfaces in
> > VS_OPCODE_PULL_CONSTANT_LOAD
> >   i965/vec4: Do not mark used direct surfaces in the generator for
> > texture opcodes
> >   i965/vec4: Do not mark used surfaces in SHADER_OPCODE_SHADER_TIME_ADD
> >   i965/fs: Do not mark used surfaces in SHADER_OPCODE_SHADER_TIME_ADD
> >   i965/vec4: Do not mark used surfaces in VS_OPCODE_GET_BUFFER_SIZE
> >   i965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE
> >   i965/fs: Do not mark used surfaces in
> > FS_OPCODE_FB_WRITE/FS_OPCODE_REP_FB_WRITE
> >
> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 12 ++-
> >  src/mesa/drivers/dri/i965/brw_fs.h   |  3 +-
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 31 -
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 35 +--
> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 24 -
> >  src/mesa/drivers/dri/i965/brw_vec4.cpp   |  3 ++
> >  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 19 --
> >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp   | 44 
> > +++-
> >  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   |  7 ++--
> >  9 files changed, 87 insertions(+), 91 deletions(-)
> >
> > -- 
> > 1.9.1
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: remove redundant function inout assignments

2015-11-02 Thread Lofstedt, Marta
Reviewed-by: Marta Lofstedt 


> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> Behalf Of Timothy Arceri
> Sent: Sunday, November 1, 2015 9:33 AM
> To: mesa-dev@lists.freedesktop.org
> Subject: [Mesa-dev] [PATCH] glsl: remove redundant function inout
> assignments
> 
> Handles the case with function inout params where array elements do an
> assignment to themselves e.g.
> 
>  void array_mod(inout int b[2])
>  {
> b[0] = int(2);
> b[1] = b[1];
>  }
> 
>  Fixes assert in nir for:
>  ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=92588
> ---
>  src/glsl/opt_copy_propagation_elements.cpp | 46
> ++
>  1 file changed, 46 insertions(+)
> 
>  Piglit test: https://patchwork.freedesktop.org/patch/63355/
> 
> diff --git a/src/glsl/opt_copy_propagation_elements.cpp
> b/src/glsl/opt_copy_propagation_elements.cpp
> index 353a5c6..a62b625 100644
> --- a/src/glsl/opt_copy_propagation_elements.cpp
> +++ b/src/glsl/opt_copy_propagation_elements.cpp
> @@ -439,6 +439,8 @@ ir_copy_propagation_elements_visitor::kill(kill_entry
> *k)
>  /**
>   * Adds directly-copied channels between vector variables to the available
>   * copy propagation list.
> + *
> + * Also tidy up redundant function inout assignments while we are here.
>   */
>  void
>  ir_copy_propagation_elements_visitor::add_copy(ir_assignment *ir) @@ -
> 450,6 +452,50 @@
> ir_copy_propagation_elements_visitor::add_copy(ir_assignment *ir)
> if (ir->condition)
>return;
> 
> +   /* Handle a corner case with function inout params where array elements
> +* do an assignment to themselves e.g.
> +*
> +* void array_mod(inout int b[2])
> +* {
> +*b[0] = int(2);
> +*b[1] = b[1];
> +* }
> +*/
> +   ir_rvalue *rhs_array = ir->rhs;
> +   ir_rvalue *lhs_array = ir->lhs;
> +   if (lhs_array->as_dereference_array() &&
> +   rhs_array->as_dereference_array()) {
> +  /* Check arrays are indexed by a const and match otherwise return */
> +  while (rhs_array->as_dereference_array() &&
> + lhs_array->as_dereference_array()) {
> +
> + ir_dereference_array *rhs_deref_array =
> +rhs_array->as_dereference_array();
> + ir_dereference_array *lhs_deref_array =
> +lhs_array->as_dereference_array();
> +
> + ir_constant *lhs_ai_const =
> +lhs_deref_array->array_index->as_constant();
> + ir_constant *rhs_ai_const =
> +rhs_deref_array->array_index->as_constant();
> + if (lhs_ai_const == NULL || rhs_ai_const == NULL ||
> + lhs_ai_const->value.i[0] != rhs_ai_const->value.i[0])
> +return;
> + lhs_array = lhs_deref_array->array;
> + rhs_array = rhs_deref_array->array;
> +  }
> +
> +  ir_dereference_variable *lhs_var = lhs_array-
> >as_dereference_variable();
> +  ir_dereference_variable *rhs_var = rhs_array-
> >as_dereference_variable();
> +  if(lhs_var && rhs_var && lhs_var->var == rhs_var->var){
> +  /* Removing the assignment now would mess up the loop
> iteration
> +   * calling us.  Just flag it to not execute, and someone else
> +   * will clean up the mess.
> +   */
> +  ir->condition = new(ralloc_parent(ir)) ir_constant(false);
> +  }
> +   }
> +
> ir_dereference_variable *lhs = ir->lhs->as_dereference_variable();
> if (!lhs || !(lhs->type->is_scalar() || lhs->type->is_vector()))
>return;
> --
> 2.4.3
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 1/2] mesa: Update DispatchComputeIndirect errors for indirect parameter

2015-11-02 Thread Jordan Justen
There is some discrepancy between the return values for some error
cases for the DispatchComputeIndirect call in the ARB_compute_shader
specification. Regarding the indirect parameter, in one place the
extension spec lists that the error returned for invalid values should
be INVALID_OPERATION, while later it specifies INVALID_VALUE.

The OpenGL 4.3 and OpenGLES 3.1 specifications appear to be consistent
in requiring the INVALID_VALUE error return in this case.

Here we update the code to match the main specifications, and update
the citations use the main specification rather than the extension
specification.

v2:
 * Updates based on review from Iago

Signed-off-by: Jordan Justen 
Cc: Iago Toral Quiroga 
Cc: Marta Lofstedt 
---
 src/mesa/main/api_validate.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index 1f729e8..a3ee8c0 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -960,20 +960,19 @@ valid_dispatch_indirect(struct gl_context *ctx,
if (!check_valid_to_compute(ctx, name))
   return GL_FALSE;
 
-   /* From the ARB_compute_shader specification:
+   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
 *
-* "An INVALID_OPERATION error is generated [...] if  is less
-*  than zero or not a multiple of the size, in basic machine units, of
-*  uint."
+* "An INVALID_VALUE error is generated if indirect is negative or is not a
+*  multiple of four."
 */
if ((GLintptr)indirect & (sizeof(GLuint) - 1)) {
-  _mesa_error(ctx, GL_INVALID_OPERATION,
+  _mesa_error(ctx, GL_INVALID_VALUE,
   "%s(indirect is not aligned)", name);
   return GL_FALSE;
}
 
if ((GLintptr)indirect < 0) {
-  _mesa_error(ctx, GL_INVALID_OPERATION,
+  _mesa_error(ctx, GL_INVALID_VALUE,
   "%s(indirect is less than zero)", name);
   return GL_FALSE;
}
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 2/2] mesa: Add spec citations for DispatchCompute*

2015-11-02 Thread Jordan Justen
Note: The OpenGL 4.3 - 4.5 specification language for DispatchCompute
appears to have an error regarding the max allowed values. When adding
the specification citation, we note why the code does not match the
specification language.

v2:
 * Updates based on review from Iago

Signed-off-by: Jordan Justen 
Cc: Iago Toral Quiroga 
Cc: Marta Lofstedt 
---
 src/mesa/main/api_validate.c | 34 +-
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index a3ee8c0..a490189 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -918,6 +918,11 @@ check_valid_to_compute(struct gl_context *ctx, const char 
*function)
   return false;
}
 
+   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
+*
+* "An INVALID_OPERATION error is generated if there is no active program
+*  for the compute shader stage."
+*/
prog = ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE];
if (prog == NULL || prog->_LinkedShaders[MESA_SHADER_COMPUTE] == NULL) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
@@ -940,6 +945,24 @@ _mesa_validate_DispatchCompute(struct gl_context *ctx,
   return GL_FALSE;
 
for (i = 0; i < 3; i++) {
+  /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
+   *
+   * "An INVALID_VALUE error is generated if any of num_groups_x,
+   *  num_groups_y and num_groups_z are greater than or equal to the
+   *  maximum work group count for the corresponding dimension."
+   *
+   * However, the "or equal to" portions appears to be a specification
+   * bug. In all other areas, the specification appears to indicate that
+   * the number of workgroups can match the MAX_COMPUTE_WORK_GROUP_COUNT
+   * value. For example, under DispatchComputeIndirect:
+   *
+   * "If any of num_groups_x, num_groups_y or num_groups_z is greater than
+   *  the value of MAX_COMPUTE_WORK_GROUP_COUNT for the corresponding
+   *  dimension then the results are undefined."
+   *
+   * Additionally, the OpenGLES 3.1 specification does not contain "or
+   * equal to" as an error condition.
+   */
   if (num_groups[i] > ctx->Const.MaxComputeWorkGroupCount[i]) {
  _mesa_error(ctx, GL_INVALID_VALUE,
  "glDispatchCompute(num_groups_%c)", 'x' + i);
@@ -977,6 +1000,12 @@ valid_dispatch_indirect(struct gl_context *ctx,
   return GL_FALSE;
}
 
+   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
+*
+* "An INVALID_OPERATION error is generated if no buffer is bound to the
+*  DRAW_INDIRECT_BUFFER binding, or if the command would source data
+*  beyond the end of the buffer object."
+*/
if (!_mesa_is_bufferobj(ctx->DispatchIndirectBuffer)) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
   "%s: no buffer bound to DISPATCH_INDIRECT_BUFFER", name);
@@ -989,11 +1018,6 @@ valid_dispatch_indirect(struct gl_context *ctx,
   return GL_FALSE;
}
 
-   /* From the ARB_compute_shader specification:
-*
-* "An INVALID_OPERATION error is generated if this command sources data
-*  beyond the end of the buffer object [...]"
-*/
if (ctx->DispatchIndirectBuffer->Size < end) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
   "%s(DISPATCH_INDIRECT_BUFFER too small)", name);
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/7] egl/x11: Implement dri3 support with loader's dri3 helper

2015-11-02 Thread Martin Peres

On 30/10/15 18:17, Matt Turner wrote:

On Fri, Oct 30, 2015 at 9:03 AM, Martin Peres
 wrote:

diff --git a/src/egl/Makefile.am b/src/egl/Makefile.am
index 5c2ba30..cdf2216 100644
--- a/src/egl/Makefile.am
+++ b/src/egl/Makefile.am
@@ -47,12 +47,18 @@ libEGL_la_LDFLAGS = \
 $(LD_NO_UNDEFINED)

  dri2_backend_FILES =
+dri3_backend_FILES =

  if HAVE_EGL_PLATFORM_X11
  AM_CFLAGS += -DHAVE_X11_PLATFORM
  AM_CFLAGS += $(XCB_DRI2_CFLAGS)
  libEGL_la_LIBADD += $(XCB_DRI2_LIBS)
  dri2_backend_FILES += drivers/dri2/platform_x11.c
+
+if HAVE_DRI3
+dri3_backend_FILES += \
+   drivers/dri2/platform_x11_dri3.c

Only one tab needed to indent.


Thanks, fixed!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: remove redundant function inout assignments

2015-11-02 Thread Ilia Mirkin
On Sun, Nov 1, 2015 at 3:33 AM, Timothy Arceri  wrote:
> Handles the case with function inout params where array elements
> do an assignment to themselves e.g.
>
>  void array_mod(inout int b[2])
>  {
> b[0] = int(2);
> b[1] = b[1];
>  }
>
>  Fixes assert in nir for:
>  ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2
>
> https://bugs.freedesktop.org/show_bug.cgi?id=92588
> ---
>  src/glsl/opt_copy_propagation_elements.cpp | 46 
> ++
>  1 file changed, 46 insertions(+)
>
>  Piglit test: https://patchwork.freedesktop.org/patch/63355/
>
> diff --git a/src/glsl/opt_copy_propagation_elements.cpp 
> b/src/glsl/opt_copy_propagation_elements.cpp
> index 353a5c6..a62b625 100644
> --- a/src/glsl/opt_copy_propagation_elements.cpp
> +++ b/src/glsl/opt_copy_propagation_elements.cpp
> @@ -439,6 +439,8 @@ ir_copy_propagation_elements_visitor::kill(kill_entry *k)
>  /**
>   * Adds directly-copied channels between vector variables to the available
>   * copy propagation list.
> + *
> + * Also tidy up redundant function inout assignments while we are here.
>   */
>  void
>  ir_copy_propagation_elements_visitor::add_copy(ir_assignment *ir)
> @@ -450,6 +452,50 @@ 
> ir_copy_propagation_elements_visitor::add_copy(ir_assignment *ir)
> if (ir->condition)
>return;
>
> +   /* Handle a corner case with function inout params where array elements
> +* do an assignment to themselves e.g.
> +*
> +* void array_mod(inout int b[2])
> +* {
> +*b[0] = int(2);
> +*b[1] = b[1];
> +* }

What if you have like

array_mod(inout vec2 b[2]) {
  b[1] = b[1].xy;
or
  b[1] = b[1].yx;
}

IOW, why is this case so special?

> +*/
> +   ir_rvalue *rhs_array = ir->rhs;
> +   ir_rvalue *lhs_array = ir->lhs;
> +   if (lhs_array->as_dereference_array() &&
> +   rhs_array->as_dereference_array()) {
> +  /* Check arrays are indexed by a const and match otherwise return */
> +  while (rhs_array->as_dereference_array() &&
> + lhs_array->as_dereference_array()) {
> +
> + ir_dereference_array *rhs_deref_array =
> +rhs_array->as_dereference_array();
> + ir_dereference_array *lhs_deref_array =
> +lhs_array->as_dereference_array();
> +
> + ir_constant *lhs_ai_const =
> +lhs_deref_array->array_index->as_constant();
> + ir_constant *rhs_ai_const =
> +rhs_deref_array->array_index->as_constant();
> + if (lhs_ai_const == NULL || rhs_ai_const == NULL ||
> + lhs_ai_const->value.i[0] != rhs_ai_const->value.i[0])
> +return;
> + lhs_array = lhs_deref_array->array;
> + rhs_array = rhs_deref_array->array;
> +  }
> +
> +  ir_dereference_variable *lhs_var = 
> lhs_array->as_dereference_variable();
> +  ir_dereference_variable *rhs_var = 
> rhs_array->as_dereference_variable();
> +  if(lhs_var && rhs_var && lhs_var->var == rhs_var->var){

spaces please

> +/* Removing the assignment now would mess up the loop iteration
> + * calling us.  Just flag it to not execute, and someone else
> + * will clean up the mess.
> + */
> +ir->condition = new(ralloc_parent(ir)) ir_constant(false);

I thought that the allocator was smart and you could just do new (ir)
ir_constant(false). e.g.

src/glsl/lower_jumps.cpp:new (ir) ir_constant(true)));

> +  }
> +   }
> +
> ir_dereference_variable *lhs = ir->lhs->as_dereference_variable();
> if (!lhs || !(lhs->type->is_scalar() || lhs->type->is_vector()))
>return;
> --
> 2.4.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 2/2] mesa: Add spec citations for DispatchCompute*

2015-11-02 Thread Lofstedt, Marta
Reviewed-by: Marta Lofstedt 


> -Original Message-
> From: Justen, Jordan L
> Sent: Monday, November 2, 2015 9:36 AM
> To: mesa-dev@lists.freedesktop.org
> Cc: Justen, Jordan L; Iago Toral Quiroga; Lofstedt, Marta
> Subject: [PATCH v2 2/2] mesa: Add spec citations for DispatchCompute*
> 
> Note: The OpenGL 4.3 - 4.5 specification language for DispatchCompute
> appears to have an error regarding the max allowed values. When adding the
> specification citation, we note why the code does not match the specification
> language.
> 
> v2:
>  * Updates based on review from Iago
> 
> Signed-off-by: Jordan Justen 
> Cc: Iago Toral Quiroga 
> Cc: Marta Lofstedt 
> ---
>  src/mesa/main/api_validate.c | 34 +
> -
>  1 file changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index a3ee8c0..a490189 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -918,6 +918,11 @@ check_valid_to_compute(struct gl_context *ctx,
> const char *function)
>return false;
> }
> 
> +   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
> +*
> +* "An INVALID_OPERATION error is generated if there is no active
> program
> +*  for the compute shader stage."
> +*/
> prog = ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE];
> if (prog == NULL || prog->_LinkedShaders[MESA_SHADER_COMPUTE] ==
> NULL) {
>_mesa_error(ctx, GL_INVALID_OPERATION, @@ -940,6 +945,24 @@
> _mesa_validate_DispatchCompute(struct gl_context *ctx,
>return GL_FALSE;
> 
> for (i = 0; i < 3; i++) {
> +  /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute
> Shaders:
> +   *
> +   * "An INVALID_VALUE error is generated if any of num_groups_x,
> +   *  num_groups_y and num_groups_z are greater than or equal to the
> +   *  maximum work group count for the corresponding dimension."
> +   *
> +   * However, the "or equal to" portions appears to be a specification
> +   * bug. In all other areas, the specification appears to indicate that
> +   * the number of workgroups can match the
> MAX_COMPUTE_WORK_GROUP_COUNT
> +   * value. For example, under DispatchComputeIndirect:
> +   *
> +   * "If any of num_groups_x, num_groups_y or num_groups_z is greater
> than
> +   *  the value of MAX_COMPUTE_WORK_GROUP_COUNT for the
> corresponding
> +   *  dimension then the results are undefined."
> +   *
> +   * Additionally, the OpenGLES 3.1 specification does not contain "or
> +   * equal to" as an error condition.
> +   */
>if (num_groups[i] > ctx->Const.MaxComputeWorkGroupCount[i]) {
>   _mesa_error(ctx, GL_INVALID_VALUE,
>   "glDispatchCompute(num_groups_%c)", 'x' + i); @@ -977,6
> +1000,12 @@ valid_dispatch_indirect(struct gl_context *ctx,
>return GL_FALSE;
> }
> 
> +   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
> +*
> +* "An INVALID_OPERATION error is generated if no buffer is bound to the
> +*  DRAW_INDIRECT_BUFFER binding, or if the command would source
> data
> +*  beyond the end of the buffer object."
> +*/
> if (!_mesa_is_bufferobj(ctx->DispatchIndirectBuffer)) {
>_mesa_error(ctx, GL_INVALID_OPERATION,
>"%s: no buffer bound to DISPATCH_INDIRECT_BUFFER", name);
> @@ -989,11 +1018,6 @@ valid_dispatch_indirect(struct gl_context *ctx,
>return GL_FALSE;
> }
> 
> -   /* From the ARB_compute_shader specification:
> -*
> -* "An INVALID_OPERATION error is generated if this command sources
> data
> -*  beyond the end of the buffer object [...]"
> -*/
> if (ctx->DispatchIndirectBuffer->Size < end) {
>_mesa_error(ctx, GL_INVALID_OPERATION,
>"%s(DISPATCH_INDIRECT_BUFFER too small)", name);
> --
> 2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/2] mesa: Update DispatchComputeIndirect errors for indirect parameter

2015-11-02 Thread Lofstedt, Marta
Reviewed-by: Marta Lofstedt 


> -Original Message-
> From: Justen, Jordan L
> Sent: Monday, November 2, 2015 9:36 AM
> To: mesa-dev@lists.freedesktop.org
> Cc: Justen, Jordan L; Iago Toral Quiroga; Lofstedt, Marta
> Subject: [PATCH v2 1/2] mesa: Update DispatchComputeIndirect errors for
> indirect parameter
> 
> There is some discrepancy between the return values for some error cases
> for the DispatchComputeIndirect call in the ARB_compute_shader
> specification. Regarding the indirect parameter, in one place the extension
> spec lists that the error returned for invalid values should be
> INVALID_OPERATION, while later it specifies INVALID_VALUE.
> 
> The OpenGL 4.3 and OpenGLES 3.1 specifications appear to be consistent in
> requiring the INVALID_VALUE error return in this case.
> 
> Here we update the code to match the main specifications, and update the
> citations use the main specification rather than the extension specification.
> 
> v2:
>  * Updates based on review from Iago
> 
> Signed-off-by: Jordan Justen 
> Cc: Iago Toral Quiroga 
> Cc: Marta Lofstedt 
> ---
>  src/mesa/main/api_validate.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index 1f729e8..a3ee8c0 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -960,20 +960,19 @@ valid_dispatch_indirect(struct gl_context *ctx,
> if (!check_valid_to_compute(ctx, name))
>return GL_FALSE;
> 
> -   /* From the ARB_compute_shader specification:
> +   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
>  *
> -* "An INVALID_OPERATION error is generated [...] if  is less
> -*  than zero or not a multiple of the size, in basic machine units, of
> -*  uint."
> +* "An INVALID_VALUE error is generated if indirect is negative or is not 
> a
> +*  multiple of four."
>  */
> if ((GLintptr)indirect & (sizeof(GLuint) - 1)) {
> -  _mesa_error(ctx, GL_INVALID_OPERATION,
> +  _mesa_error(ctx, GL_INVALID_VALUE,
>"%s(indirect is not aligned)", name);
>return GL_FALSE;
> }
> 
> if ((GLintptr)indirect < 0) {
> -  _mesa_error(ctx, GL_INVALID_OPERATION,
> +  _mesa_error(ctx, GL_INVALID_VALUE,
>"%s(indirect is less than zero)", name);
>return GL_FALSE;
> }
> --
> 2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Problems with accuracy of coeffs_init + attribs_update

2015-11-02 Thread Roland Scheidegger
Am 02.11.2015 um 14:46 schrieb Oded Gabbay:
> On Thu, Oct 29, 2015 at 9:37 PM, Roland Scheidegger  
> wrote:
>> Am 29.10.2015 um 19:44 schrieb Oded Gabbay:
>>> On Thu, Oct 29, 2015 at 5:31 PM, Roland Scheidegger  
>>> wrote:
 Am 29.10.2015 um 12:27 schrieb Oded Gabbay:
> Hi Roland, Jose
>
> I wanted to bring a problem I found to your attention, and discuss
> about it and ways to solve it.
>
> I'm working on regressions of piglit gpu.py between x86-64 and
> ppc64le, when running with llvmpipe.
>
> One of the regressions manifests itself in 2 tests,
> clip-distance-bulk-copy and clip-distance-itemized-copy
 There's actually a third one, interface-vs-unnamed-to-fs-unnamed, which
 fails the same (and it's easier to debug...).

>>> Yes, you are correct of course. It was also fixed by my change.
>>>

>
> What happens in ppc64le is that *some* of the interpolated per-vertex
> values (clip-distance values) that constitute the input into the
> fragment shader, are not calculated according to the required accuracy
> of the test (which is an error/distance of less than 1e-6)
>
> It took a while, but I believe I found the reason. In general, when
> running on ppc64le, the code path that is taken at
> lp_build_interp_soa_init() is doing  bld->simple_interp = FALSE, which
> means using coeffs_init() + attribs_update(). That's in contrast with
> x86-64, where bld->simple_interp = TRUE, which means the code will use
> coeffs_init_simple() + attribs_update_simple()
 Note this isn't really about x86-64 per se. If you don't have AVX on a
 x86-64 box, the other path will be taken too (and the test fail as well).

>>> makes sense.
>>>
>
> In specific, the problem lies in how the coeffs are calculated inside
> coeffs_init. To simply put it, they are only *actually* calculated
> correctly for the first quad. The coeffs are later *updated* for the
> other quads, using dadx/dady and dadq.
>
> --
> side-note:
> I believe you even documented it in coeffs_init():
>   /*
>* a = a0 + (x * dadx + y * dady)
>* a0aos is the attrib value at top left corner of stamp
>*/
> --
>
> The root cause for that, I believe, is because coeffs_init does *not*
> take into account the different x/y offsets for the other quads. In
> contrast, on x86-64, the code uses a function called calc_offset(),
> which does take the different x/y offsets into account.
>
> Please look at this definition (from lp_bld_interp.c) :
>
> static const unsigned char quad_offset_x[16] = {0, 1, 0, 1, 2, 3, 2,
> 3, 0, 1, 0, 1, 2, 3, 2, 3};
> static const unsigned char quad_offset_y[16] = {0, 0, 1, 1, 0, 0, 1,
> 1, 2, 2, 3, 3, 2, 2, 3, 3};
>
> Now, look at how coeffs_init() uses them:
>
>for (i = 0; i < coeff_bld->type.length; i++) {
>   LLVMValueRef nr = lp_build_const_int32(gallivm, i);
>   LLVMValueRef pixxf = lp_build_const_float(gallivm, 
> quad_offset_x[i]);
>   LLVMValueRef pixyf = lp_build_const_float(gallivm, 
> quad_offset_y[i]);
>   pixoffx = LLVMBuildInsertElement(builder, pixoffx, pixxf, nr, "");
>   pixoffy = LLVMBuildInsertElement(builder, pixoffy, pixyf, nr, "");
>}
>
> So, in ppc64le, where coeff_bld->type.length == 4, we *always* take
> the first four values from the above arrays. The code *never*
> calculates the coeffs based on the other offsets.
>
> Admittedly, that's *almost* always works and I doubt you will see a
> difference in display. However, in the tests I mentioned above, it
> creates, for some pixels, a bigger error than what is allowed.
 Well, I'm not sure what error actually is allowed... GL of course is
 vague (the test just seems to be built so the error allowed actually
 passes on real hw).

 I don't think there's an actual bug with the code. What it does in
 coeffs_init, is simply calculate the dadx + dady values from one pixel
 to the next (within a quad).
 Then, in attrib_update, it simply uses these values (which are constant
 for all quads, as the position of the pixels within a quad doesn't
 change) to calculate the actual dadq value based on the actual pixel
 offsets, and add that to the (also done in coeffs_init) starting value
 of each quad.

 So, the code is correct. However, it still gives different results
 compared to the other method, and the reason is just float math. This
 method does two interpolations (one for the start of the quad, the other
 per pixel), whereas the other method does just one. And this is why the
 results are different, because the results just aren't exact.

>>> Yeah, I agree. I also wouldn't say it is a bug per-se, but it is a
>>> 

Re: [Mesa-dev] [PATCH] nir: Silence GCC maybe-uninitialized warnings.

2015-11-02 Thread Connor Abbott
Reviewed-by: Connor Abbott 

On Mon, Nov 2, 2015 at 4:30 AM, Vinson Lee  wrote:
> nir/nir_control_flow.c: In function ‘split_block_cursor.isra.11’:
> nir/nir_control_flow.c:460:15: warning: ‘after’ may be used uninitialized in 
> this function [-Wmaybe-uninitialized]
>*_after = after;
>^
> nir/nir_control_flow.c:458:16: warning: ‘before’ may be used uninitialized in 
> this function [-Wmaybe-uninitialized]
>*_before = before;
> ^
>
> Signed-off-by: Vinson Lee 
> ---
>  src/glsl/nir/nir_control_flow.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
> index 7f51c4f..96395a4 100644
> --- a/src/glsl/nir/nir_control_flow.c
> +++ b/src/glsl/nir/nir_control_flow.c
> @@ -452,6 +452,9 @@ split_block_cursor(nir_cursor cursor,
>   before = split_block_before_instr(nir_instr_next(cursor.instr));
>}
>break;
> +
> +   default:
> +  unreachable("not reached");
> }
>
> if (_before)
> --
> 2.6.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Unused (?) duplicated GLSL IR state in NIR

2015-11-02 Thread Connor Abbott
On Mon, Nov 2, 2015 at 8:35 AM, Emil Velikov  wrote:
> Hi all,
>
> From a quick look, it seems that NIR copies (almost ?) all the state
> from GLSL IR even if it doesn't use it.
>
> The particular piece that I'm thinking about is nir_variable::data.
> Afaict this is a remnant from the early days, when the intent was to
> kill off the GLSL IR and use NIR directly. If so should we just nuke
> it, or (if there is someone working on it) add some comments "Keepme:
> WIP work by XXX to use this and kill the glsl IR one".
>
> The (not that distant) GLSL IR memory optimisations by Ian, seems to
> have missed NIR. Was that intentional or were those worked upon before
> NIR got merged ? Perhaps it's worth porting them over ?
>
>
> Thanks
> Emil
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Hi Emil,

Indeed, nir_variable was copied from ir_variable before Ian's memory
optimizations landed. Also, as you noticed, there are a number of
fields only used by the GLSL linker, so they're unused until/unless we
do linking in NIR. I wouldn't be opposed to nuking those, since we can
always revert the commit if/when we get to that.

Connor
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic

2015-11-02 Thread Roland Scheidegger
Am 02.11.2015 um 20:55 schrieb Ilia Mirkin:
> FTR these are the various operators on nvidia hw:
> 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__docs.nvidia.com_cuda_parallel-2Dthread-2Dexecution_-23cache-2Doperators=BQIFaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I=AffBQlVu5ht00Ignz67m4YHn6ePeNQFrDUYljvo28Vc=m5WnnOkcD2MS1JiJjK4XTWgUXjyEbZWhWSjq_V_O6oA=
>  
> 
> Most of these map directly to instruction things (ca/cg/cs/cv sound
> familiar, dunno about lu, could just be an assembler helper).
Ah I see, that's how it works. Makes sense I guess, though I guess there
could be some slight inefficiences if data is packed "strangely"
(because global coherent write will evict l1 cache lines, thus for
instance some non-coherent access to a different variable but same cache
line would have to re-fetch that from l2), but probably that's not too
bad...


> 
> How backwards-compatible is TGSI supposed to be? Can we change the
> encoding willy-nilly, or are there separate systems that talk to each
> other using TGSI that would need coordination?
I think it's not really different than the rest of gallium, so not
actually considered stable. Obviously it was written to be quite
extensible, but I don't think there's really anything preventing us from
changing it in binary-incompatible ways, IFF there's a really good
reason for it. If some driver relies on binary compatibility for tgsi
shaders (let's say for recognizing specific shaders to be able to do
shader replacements) it will need to be adapted to such changes (and I
know of some code which does that...). So reducing field width just
because you could do with less is not really a good idea, but doing it
because you actually need the now free bits should be ok.

Roland


> 
>   -ilia
> 
> On Mon, Nov 2, 2015 at 2:49 PM, Roland Scheidegger  wrote:
>> Ok, I guess if it's really flagged on the instructions in hw, it seems
>> reasonable to do it on the instructions in tgsi as well.
>> Using the last two bits there doesn't sound nice indeed (in particular
>> if maybe you'd wanted to encode the read/write bits as well at some
>> point too), but it's not THAT bad I think. We can scrap some bits later
>> if needed from it (token type is 4 bits but never larger than 3, NumSrcs
>> could easily do with 3 instead of 4 bits too and at some point the
>> predicate bit can go too). Albeit an extra token might be a good option
>> too (if you decided to add those r/w bits...)
>>
>> Though I still don't quite understand how gpus can do that efficiently
>> if you can do different flags with data which might be in the same cache
>> line. But maybe it's less of a problem than I thought...
>>
>> Roland
>>
>>
>> Am 02.11.2015 um 20:07 schrieb Ilia Mirkin:
>>> I haven't the faintest idea about efficiently, but these things flags
>>> on the ld/st instructions in the nvidia ISA for SM20+ (and I just
>>> plain don't know about SM10). I'm moderately sure that's the case for
>>> GCN as well.
>>>
>>> The difficulty with TGSI is that you might have something like
>>>
>>> layout (std430) buffer foo {
>>>   coherent int a;
>>>   int b;
>>> }
>>>
>>> Now I don't remember if they get baked into the same vec4, but I think
>>> they do. If they don't, then ARB_enhanced_layouts will fix that right
>>> up. Since TGSI is vec4-oriented, it's really awkward to specify that
>>> sort of thing... how would you do it?
>>>
>>> DECL BUFFER[0][0].x COHERENT
>>> DECL BUFFER[0][0].y
>>>
>>> And then totally unrelated to the separate bits, you can end up with
>>>
>>> layout (std430) buffer foo {
>>>   int foo[5];
>>> }
>>>
>>> and I have no idea how to even express that in TGSI -- it'd want
>>> things to be aligned to 16 bytes, but it'll be packed tightly here.
>>> This worked OK for layout (std140), but won't work with more advanced
>>> layouts. This will be a problem for UBOs too -- perhaps we need to
>>> allow something like
>>>
>>> LOAD dst, CONST[1][0], offset
>>>
>>> to account for that. And lastly, ssbo allows for something like
>>>
>>> layout (std430) buffer foo {
>>>   int foo[];
>>> }
>>>
>>> And you can access foo[anything-you-want] -- difficult to declare that
>>> in TGSI. I could invent stuff for all of these situations, but it
>>> seems to be a lot easier to just feed the data to load and forget
>>> about it. That's how it's all encoded in the GLSL IR as well.
>>>
>>>   -ilia
>>>
>>>
>>> On Mon, Nov 2, 2015 at 1:56 PM, Roland Scheidegger  
>>> wrote:
 I don't know much about ssbo, but since it looks like in glsl the
 coherent etc. bits are on the variables, not the ops, it seems unnatural
 to mark the op bits instead. So I'd guess it would be better if the
 variables could be marked instead. If this isn't expressible in tgsi
 maybe this needs to be fixed. Albeit I have to say it sounds odd to me
 from a hw perspective if this variables with different bits can be
 stuffed together and then 

Re: [Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic

2015-11-02 Thread Dave Airlie
On 3 November 2015 at 07:31, Roland Scheidegger  wrote:
> Am 02.11.2015 um 20:55 schrieb Ilia Mirkin:
>> FTR these are the various operators on nvidia hw:
>>
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__docs.nvidia.com_cuda_parallel-2Dthread-2Dexecution_-23cache-2Doperators=BQIFaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I=AffBQlVu5ht00Ignz67m4YHn6ePeNQFrDUYljvo28Vc=m5WnnOkcD2MS1JiJjK4XTWgUXjyEbZWhWSjq_V_O6oA=
>>
>> Most of these map directly to instruction things (ca/cg/cs/cv sound
>> familiar, dunno about lu, could just be an assembler helper).
> Ah I see, that's how it works. Makes sense I guess, though I guess there
> could be some slight inefficiences if data is packed "strangely"
> (because global coherent write will evict l1 cache lines, thus for
> instance some non-coherent access to a different variable but same cache
> line would have to re-fetch that from l2), but probably that's not too
> bad...
>
>
>>
>> How backwards-compatible is TGSI supposed to be? Can we change the
>> encoding willy-nilly, or are there separate systems that talk to each
>> other using TGSI that would need coordination?
> I think it's not really different than the rest of gallium, so not
> actually considered stable. Obviously it was written to be quite
> extensible, but I don't think there's really anything preventing us from
> changing it in binary-incompatible ways, IFF there's a really good
> reason for it. If some driver relies on binary compatibility for tgsi
> shaders (let's say for recognizing specific shaders to be able to do
> shader replacements) it will need to be adapted to such changes (and I
> know of some code which does that...). So reducing field width just
> because you could do with less is not really a good idea, but doing it
> because you actually need the now free bits should be ok.

For virgl I'd like the txt encoding's not to radically be redone, but the
binary I'm fine with.

Even if the txt does for virgl I can rewrite things anyways.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Problems with accuracy of coeffs_init + attribs_update

2015-11-02 Thread Oded Gabbay
On Mon, Nov 2, 2015 at 7:08 PM, Roland Scheidegger  wrote:
> Am 02.11.2015 um 14:46 schrieb Oded Gabbay:
>> On Thu, Oct 29, 2015 at 9:37 PM, Roland Scheidegger  
>> wrote:
>>> Am 29.10.2015 um 19:44 schrieb Oded Gabbay:
 On Thu, Oct 29, 2015 at 5:31 PM, Roland Scheidegger  
 wrote:
> Am 29.10.2015 um 12:27 schrieb Oded Gabbay:
>> Hi Roland, Jose
>>
>> I wanted to bring a problem I found to your attention, and discuss
>> about it and ways to solve it.
>>
>> I'm working on regressions of piglit gpu.py between x86-64 and
>> ppc64le, when running with llvmpipe.
>>
>> One of the regressions manifests itself in 2 tests,
>> clip-distance-bulk-copy and clip-distance-itemized-copy
> There's actually a third one, interface-vs-unnamed-to-fs-unnamed, which
> fails the same (and it's easier to debug...).
>
 Yes, you are correct of course. It was also fixed by my change.

>
>>
>> What happens in ppc64le is that *some* of the interpolated per-vertex
>> values (clip-distance values) that constitute the input into the
>> fragment shader, are not calculated according to the required accuracy
>> of the test (which is an error/distance of less than 1e-6)
>>
>> It took a while, but I believe I found the reason. In general, when
>> running on ppc64le, the code path that is taken at
>> lp_build_interp_soa_init() is doing  bld->simple_interp = FALSE, which
>> means using coeffs_init() + attribs_update(). That's in contrast with
>> x86-64, where bld->simple_interp = TRUE, which means the code will use
>> coeffs_init_simple() + attribs_update_simple()
> Note this isn't really about x86-64 per se. If you don't have AVX on a
> x86-64 box, the other path will be taken too (and the test fail as well).
>
 makes sense.

>>
>> In specific, the problem lies in how the coeffs are calculated inside
>> coeffs_init. To simply put it, they are only *actually* calculated
>> correctly for the first quad. The coeffs are later *updated* for the
>> other quads, using dadx/dady and dadq.
>>
>> --
>> side-note:
>> I believe you even documented it in coeffs_init():
>>   /*
>>* a = a0 + (x * dadx + y * dady)
>>* a0aos is the attrib value at top left corner of stamp
>>*/
>> --
>>
>> The root cause for that, I believe, is because coeffs_init does *not*
>> take into account the different x/y offsets for the other quads. In
>> contrast, on x86-64, the code uses a function called calc_offset(),
>> which does take the different x/y offsets into account.
>>
>> Please look at this definition (from lp_bld_interp.c) :
>>
>> static const unsigned char quad_offset_x[16] = {0, 1, 0, 1, 2, 3, 2,
>> 3, 0, 1, 0, 1, 2, 3, 2, 3};
>> static const unsigned char quad_offset_y[16] = {0, 0, 1, 1, 0, 0, 1,
>> 1, 2, 2, 3, 3, 2, 2, 3, 3};
>>
>> Now, look at how coeffs_init() uses them:
>>
>>for (i = 0; i < coeff_bld->type.length; i++) {
>>   LLVMValueRef nr = lp_build_const_int32(gallivm, i);
>>   LLVMValueRef pixxf = lp_build_const_float(gallivm, 
>> quad_offset_x[i]);
>>   LLVMValueRef pixyf = lp_build_const_float(gallivm, 
>> quad_offset_y[i]);
>>   pixoffx = LLVMBuildInsertElement(builder, pixoffx, pixxf, nr, "");
>>   pixoffy = LLVMBuildInsertElement(builder, pixoffy, pixyf, nr, "");
>>}
>>
>> So, in ppc64le, where coeff_bld->type.length == 4, we *always* take
>> the first four values from the above arrays. The code *never*
>> calculates the coeffs based on the other offsets.
>>
>> Admittedly, that's *almost* always works and I doubt you will see a
>> difference in display. However, in the tests I mentioned above, it
>> creates, for some pixels, a bigger error than what is allowed.
> Well, I'm not sure what error actually is allowed... GL of course is
> vague (the test just seems to be built so the error allowed actually
> passes on real hw).
>
> I don't think there's an actual bug with the code. What it does in
> coeffs_init, is simply calculate the dadx + dady values from one pixel
> to the next (within a quad).
> Then, in attrib_update, it simply uses these values (which are constant
> for all quads, as the position of the pixels within a quad doesn't
> change) to calculate the actual dadq value based on the actual pixel
> offsets, and add that to the (also done in coeffs_init) starting value
> of each quad.
>
> So, the code is correct. However, it still gives different results
> compared to the other method, and the reason is just float math. This
> method does two interpolations (one for the start of the quad, the other
> per pixel), whereas the other method 

Re: [Mesa-dev] [PATCH] i965: Fix texture views of 2d array surfaces

2015-11-02 Thread Ben Widawsky
On Mon, Nov 02, 2015 at 12:05:32PM -0800, Ben Widawsky wrote:
> It is legal to have a texture view of a single layer from a 2D array texture;
> you can sample from it, or render to it. Intel hardware needs to be made aware
> when it is using a 2d array surface in the surface state. The texture view is
> just a 2d surface with the backing miptree actually being a 2d array surface.
> This caused the previous code would not set the right bit in the surface state
> since it wasn't considered an array texture.
> 
> I spotted this early on in debug but brushed it off because it is clearly not
> needed on other platforms (since they all pass). I have no idea how this works
> properly on other platforms (I think gen7 introduced the bit in the state, 
> but I
> am too lazy to check). As such, I have opted not to modify gen7, though I
> believe the current code is wrong there as well.
> 
> Thanks to Chris for helping me debug this.
> 
> Cc: Chris Forbes 
> Reported-by: Mark Janes  (Jenkins)
> References: https://www.opengl.org/registry/specs/ARB/texture_view.txt
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92609
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/gen8_surface_state.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index 18b8665..19d449b 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -252,8 +252,9 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
>  format == BRW_SURFACEFORMAT_BC7_UNORM))
>surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
>  
> -   if (_mesa_is_array_texture(target) || target == GL_TEXTURE_CUBE_MAP)
> +   if (depth && target != GL_TEXTURE_3D) {
>surf[0] |= GEN8_SURFACE_IS_ARRAY;
> +   }

Something isn't right here. I think it should be depth > 1, but when I change
that the test fails. I need to think about this some more I guess.

>  
> surf[1] = SET_FIELD(mocs_wb, GEN8_SURFACE_MOCS) | mt->qpitch >> 2;
>  
> @@ -428,7 +429,6 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
> case GL_TEXTURE_CUBE_MAP_ARRAY:
> case GL_TEXTURE_CUBE_MAP:
>surf_type = BRW_SURFACE_2D;
> -  is_array = true;
>depth *= 6;
>break;
> case GL_TEXTURE_3D:
> @@ -436,10 +436,11 @@ gen8_update_renderbuffer_surface(struct brw_context 
> *brw,
>/* fallthrough */
> default:
>surf_type = translate_tex_target(gl_target);
> -  is_array = _mesa_tex_target_is_array(gl_target);
>break;
> }
>  
> +   is_array = depth > 1 && gl_target != GL_TEXTURE_3D;
> +
> /* _NEW_BUFFERS */
> /* Render targets can't use IMS layout. Stencil in turn gets configured as
>  * single sampled and indexed manually by the program.
> -- 
> 2.6.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 1/6] gallium: expose a debug message callback settable by context owner

2015-11-02 Thread Marek Olšák
On Mon, Nov 2, 2015 at 9:14 PM, Ilia Mirkin  wrote:
> On Mon, Nov 2, 2015 at 3:07 PM, Marek Olšák  wrote:
>>> diff --git a/src/gallium/include/pipe/p_state.h 
>>> b/src/gallium/include/pipe/p_state.h
>>> index 4bf8d46..2843bb6 100644
>>> --- a/src/gallium/include/pipe/p_state.h
>>> +++ b/src/gallium/include/pipe/p_state.h
>>> @@ -684,6 +684,35 @@ struct pipe_compute_state
>>> unsigned req_input_mem; /**< Required size of the INPUT resource. */
>>>  };
>>>
>>> +/**
>>> + * Structure that contains a callback for debug messages from the driver 
>>> back
>>> + * to the state tracker.
>>> + */
>>> +struct pipe_debug_info
>>> +{
>>
>> I would prefer it if this structure were called pipe_debug_callback,
>> which is what it really is.
>
> To confirm -- just change the struct, leave the set_debug_info call alone?

Ah, also set_debug_info -> set_debug_callback please :)

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/7] DRI3 support for EGL (v3)

2015-11-02 Thread Martin Peres

On 30/10/15 21:26, Kristian Høgsberg wrote:

On Fri, Oct 30, 2015 at 06:03:31PM +0200, Martin Peres wrote:

First of all, I would like to thank Boyan for his work here. I rebased his patch
series, fixed minor issues here and there and reviewed it. You can check the
changes in every patch but the biggest changes are related to the build system.

Speaking about the build system it seems like scons is not building the EGL X11
platform, so I did not bother adding support for the X11 DRI3 there.

I tested the code using GLES apps and ran a full piglit/cts run on all the
platforms. Nothing display-related failed.

Finally, I added a patch that improves the reporting of which DRI version is
being used. I tested that the reporting was accurate by checking which node
was being used by running the application with ezbench's env_dump
(/utils/env_dump).

I intend to push that series at the end of next week unless someone opposes.

On the whole, this looks ok. I was a little surprised to see the dri3
helpers added to libloader, but I guess that can work, as long as we
drop the libloader_la_LIBADD += $(XCB_DRI3_LIBS) and add the libs to
users of libloader.la that need them.


Yep, sounds good!



One change I'd like to see is that we embed loader_dri3_drawable in
dri3_drawable and change loader_dri3_create_drawable() to not allocate
memory and call it loader_dri3_init_drawable().


Yes, I started work on this direction on Friday but I felt un-easy with the
deallocation part. I guess a fini() function will do just fine.

Thanks for calling me on this.


Also, I think we could trim the number of vfuncs in
loader_dri3_vtable. We don't need virtual getters for stuff that
doesn't change like get_dri_screen, is_different_gpu. The
loader_dri3_drawable always has the most up to date width and height,
so we can just cache them in the struct and drop get_width and
get_height.


Sounds good.



Finally, we're getting to many vtables in this area. It's not a
problem with this patch, but the patch is kinda the final straw. I
think one thing we'll want to do is to see if we can kill
dri2_dyp->vtbl in favor of handling the dri2_drv->base.API callbacks
directly in the platform_*.c implementations. For example:

static _EGLSurface*
dri2_create_window_surface(_EGLDriver *drv, _EGLDisplay *dpy,
_EGLConfig *conf, void *native_window,
const EGLint *attrib_list)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(dpy);
return dri2_dpy->vtbl->create_window_surface(drv, dpy, conf, native_window,
 attrib_list);
}

doesn't add any value over just setting
dri2_drv->base.API.CreateWindowSurface in platform_x11.c.  Functions
like this one account for pretty much all of dri2_dpy->vtbl.


Right, this is something I have been wondering about when working
on the cpu usage. I am looking for a tool that could tell me where most
of the cache misses are located so as we can start optimizing the hotest
codepath. However, I am afraid we just have thousands of paper cuts...



Anyway, with just the LIBADD and loader_dri3_init_drawable() changes
I'd be ok with landing this;

Reviewed-by: Kristian Høgsberg 


Thanks for the review! I will try to change as little code as possible 
from Boyan
because I do not want him to take the blame for code that I added. I 
will instead

introduce new patches when possible.




Boyan Ding (6):
   loader: Add dri3 helper
   glx/dri3: Convert to use dri3 helper in loader library
   egl_dri2: Add a function to let platform code return dri drawable from
 _EGLSurface
   egl/x11: Implement dri3 support with loader's dri3 helper
   loader/dri3: Expose function to create __DRIimage from pixmap
   egl/x11_dri3: Implement EGL_KHR_image_pixmap

Martin Peres (1):
   egl: make it clear which platform x11 backend is being used (dri2 or
 3)

  configure.ac |   13 +-
  src/egl/Makefile.am  |9 +-
  src/egl/drivers/dri2/egl_dri2.c  |  118 ++-
  src/egl/drivers/dri2/egl_dri2.h  |   19 +-
  src/egl/drivers/dri2/platform_android.c  |1 +
  src/egl/drivers/dri2/platform_drm.c  |1 +
  src/egl/drivers/dri2/platform_wayland.c  |2 +
  src/egl/drivers/dri2/platform_x11.c  |  112 ++-
  src/egl/drivers/dri2/platform_x11_dri3.c |  591 
  src/egl/drivers/dri2/platform_x11_dri3.h |   44 +
  src/glx/dri3_glx.c   | 1446 --
  src/glx/dri3_priv.h  |   96 +-
  src/loader/Makefile.am   |9 +
  src/loader/loader_dri3_helper.c  | 1421 +
  src/loader/loader_dri3_helper.h  |  235 +
  15 files changed, 2713 insertions(+), 1404 deletions(-)
  create mode 100644 src/egl/drivers/dri2/platform_x11_dri3.c
  create mode 100644 src/egl/drivers/dri2/platform_x11_dri3.h
  create mode 100644 

Re: [Mesa-dev] [PATCH] mesa: use gl_shader_variable in program resource list

2015-11-02 Thread Tapani Pälli



On 11/02/2015 09:16 AM, Ilia Mirkin wrote:

On Mon, Nov 2, 2015 at 1:58 AM, Tapani Pälli  wrote:

Patch changes linker to allocate gl_shader_variable instead of using
ir_variable. This makes it possible to get rid of ir_variables and ir
in memory after linking.

v2: check that we do not create duplicate entries with
 packed varyings

Signed-off-by: Tapani Pälli 
---
  src/glsl/linker.cpp| 58 +++---
  src/mesa/main/mtypes.h | 56 
  src/mesa/main/shader_query.cpp | 36 +-
  3 files changed, 123 insertions(+), 27 deletions(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 48dd2d3..d0353b4 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -3341,6 +3341,27 @@ build_stageref(struct gl_shader_program *shProg, const 
char *name,
 return stages;
  }

+/**
+ * Create gl_shader_variable from ir_variable class.
+ */
+static gl_shader_variable *
+create_shader_variable(struct gl_shader_program *shProg, const ir_variable *in)
+{
+   gl_shader_variable *out = ralloc(shProg, struct gl_shader_variable);
+   if (!out)
+  return NULL;
+
+   out->type = in->type;
+   out->name = ralloc_strdup(shProg, in->name);


This can fail too, right? Might be nice to error-check.


Thanks, static analysis might complain about this, will fix.


+
+   out->location = in->data.location;
+   out->index = in->data.index;
+   out->patch = in->data.patch;
+   out->mode = in->data.mode;
+
+   return out;
+}
+
  static bool
  add_interface_variables(struct gl_shader_program *shProg,
  exec_list *ir, GLenum programInterface)
@@ -3392,9 +3413,13 @@ add_interface_variables(struct gl_shader_program *shProg,
if (strncmp(var->name, "gl_out_FragData", 15) == 0)
   continue;

-  if (!add_program_resource(shProg, programInterface, var,
-build_stageref(shProg, var->name,
-   var->data.mode) | mask))
+  gl_shader_variable *sha_v = create_shader_variable(shProg, var);
+  if (!sha_v)
+ return false;
+
+  if (!add_program_resource(shProg, programInterface, sha_v,
+build_stageref(shProg, sha_v->name,
+   sha_v->mode) | mask))
   return false;
 }
 return true;
@@ -3422,9 +3447,14 @@ add_packed_varyings(struct gl_shader_program *shProg, 
int stage)
   default:
  unreachable("unexpected type");
   }
- if (!add_program_resource(shProg, iface, var,
-   build_stageref(shProg, var->name,
-  var->data.mode)))
+
+ gl_shader_variable *sha_v = create_shader_variable(shProg, var);
+ if (!sha_v)
+return false;
+
+ if (!add_program_resource(shProg, iface, sha_v,
+   build_stageref(shProg, sha_v->name,
+  sha_v->mode)))
  return false;
}
 }
@@ -3443,7 +3473,12 @@ add_fragdata_arrays(struct gl_shader_program *shProg)
ir_variable *var = node->as_variable();
if (var) {
   assert(var->data.mode == ir_var_shader_out);
- if (!add_program_resource(shProg, GL_PROGRAM_OUTPUT, var,
+
+ gl_shader_variable *sha_v = create_shader_variable(shProg, var);
+ if (!sha_v)
+return false;
+
+ if (!add_program_resource(shProg, GL_PROGRAM_OUTPUT, sha_v,
 1 << MESA_SHADER_FRAGMENT))
  return false;
}
@@ -3723,8 +3758,13 @@ build_program_resource_list(struct gl_shader_program 
*shProg)
 if (shProg->SeparateShader) {
if (!add_packed_varyings(shProg, input_stage))
   return;
-  if (!add_packed_varyings(shProg, output_stage))
- return;
+  /* Only when dealing with multiple stages, otherwise we would have
+   * duplicate gl_shader_variable entries.
+   */
+  if (input_stage != output_stage) {
+ if (!add_packed_varyings(shProg, output_stage))
+return;
+  }
 }

 if (!add_fragdata_arrays(shProg))
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index d6c1eb8..0316769 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2519,6 +2519,62 @@ struct gl_active_atomic_buffer
  };

  /**
+ * Data container for shader queries. This holds only the minimal
+ * amount of required information for resource queries to work.
+ */
+struct gl_shader_variable
+{
+   /**
+* Declared type of the variable
+*/
+   const struct glsl_type *type;
+
+   /**
+* Declared name of the variable
+*/
+   char *name;
+
+   /**
+* Storage location of the base of this variable
+*
+* The precise meaning 

[Mesa-dev] [PATCH] nir: Silence GCC maybe-uninitialized warnings.

2015-11-02 Thread Vinson Lee
nir/nir_control_flow.c: In function ‘split_block_cursor.isra.11’:
nir/nir_control_flow.c:460:15: warning: ‘after’ may be used uninitialized in 
this function [-Wmaybe-uninitialized]
   *_after = after;
   ^
nir/nir_control_flow.c:458:16: warning: ‘before’ may be used uninitialized in 
this function [-Wmaybe-uninitialized]
   *_before = before;
^

Signed-off-by: Vinson Lee 
---
 src/glsl/nir/nir_control_flow.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
index 7f51c4f..96395a4 100644
--- a/src/glsl/nir/nir_control_flow.c
+++ b/src/glsl/nir/nir_control_flow.c
@@ -452,6 +452,9 @@ split_block_cursor(nir_cursor cursor,
  before = split_block_before_instr(nir_instr_next(cursor.instr));
   }
   break;
+
+   default:
+  unreachable("not reached");
}
 
if (_before)
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic

2015-11-02 Thread Marek Olšák
I'm okay with adding flags wherever you want, but please note that if
you add flags to declarations, you may need array support on the
declarations, so that instructions can tell which variable is being
used when indirect addressing is being used. CONSTs don't need array
support because the declarations have no flags. You can probably get
away with no array support if you assure that literal address offsets
are not constant-folded.

Marek

On Sun, Nov 1, 2015 at 11:45 PM, Ilia Mirkin  wrote:
> Just wanted to note down some thoughts and get some feedback before
> going forward. I've already sent out a series which covered a lot of
> this, but in the end I realized it came up a bit short (available at
> https://github.com/imirkin/mesa/commits/fd2).
>
> There are two separate buffer-related features --
> ARB_shader_atomic_counters(_ops) and
> ARB_shader_storage_buffer_objects. The former are implementable more
> efficiently on EG/NI hardware by performing the atomic ops on
> not-main-memory (GDS? LDS?). However I think that the gallium-side
> interface can be mostly identical for both cases, perhaps we can mark
> the buffer as atomic-only in the TGSI.
>
> Just like there is a CONST tgsi file, I want to add a BUFFER file,
> which will map to ->set_shader_buffers() indices. The tricky bit comes
> in from the fact that individual variables inside of a buffer may have
> different access/store properties. I see two ways to resolve this:
>
> 1. Declare each variable explicitly, much like UBO's still get
> individual decls per slot. These decls could contain the relevant
> caching property.
>
> 2. Make each LOAD/STORE op declare what caching it wants explicitly.
>
> The first option would work well for images, but for ssbo, it feels
> problematic, as with all the various packing options that exist, you
> could still specify odd per-variable cache rules, which would be
> difficult to express in the TGSI DECL. However I'm not sure how to
> implement the second option.
>
> There is a precedent of a saturate flag, but looking at
> tgsi_instruction, there are only 2 free bits. Since there are only 4
> different caching values (none, coherent, volatile, restrict; I'm not
> counting readonly/writeonly), this fits. However that would leave no
> more bits in tgsi_instruction. I could add a texture-style bit, saying
> to expect an additional tgsi_instruction_buffer packet with more info
> but that seems wasteful.
>
> Another option is to just pass an immediate directly to the LOAD/STORE
> ops which would specify this caching spec as an extra source. This
> seems much simpler, but a little dirtier. Opinions much appreciated.
>
> I think that one this is worked out, I'll be able to resend my series
> adding SSBO/atomic support to freedreno, and partial SSBO (without
> atomic*) support for nvc0.
>
> Cheers,
>
>   -ilia
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92366] [BSW SKL] Regression: glx@glx-swap-event_async failed

2015-11-02 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92366

--- Comment #4 from Tapani Pälli  ---
This test has been randomly failing for me on IVB and HSW as long as I can
remember, I don't think this is specific for BSW and SKL, it just fails on
anything but in a random fashion.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] [v2] i965: Fix texture views of 2d array surfaces

2015-11-02 Thread Ben Widawsky
It is legal to have a texture view of a single layer from a 2D array texture;
you can sample from it, or render to it. Intel hardware needs to be made aware
when it is using a 2d array surface in the surface state. The texture view is
just a 2d surface with the backing miptree actually being a 2d array surface.
This caused the previous code would not set the right bit in the surface state
since it wasn't considered an array texture.

I spotted this early on in debug but brushed it off because it is clearly not
needed on other platforms (since they all pass). I have no idea how this works
properly on other platforms (I think gen7 introduced the bit in the state, but I
am too lazy to check). As such, I have opted not to modify gen7, though I
believe the current code is wrong there as well.

Thanks to Chris for helping me debug this.

v2: Just use the underlying mt's target type to make the array determination.
This replaces a bug in the first patch which was incorrectly relying only
on non-zero depth (not sure how that had no failures). (Ilia)

Cc: Chris Forbes 
Reported-by: Mark Janes  (Jenkins)
References: https://www.opengl.org/registry/specs/ARB/texture_view.txt
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92609
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index 18b8665..ba97df8 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -252,7 +252,7 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
 format == BRW_SURFACEFORMAT_BC7_UNORM))
   surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
 
-   if (_mesa_is_array_texture(target) || target == GL_TEXTURE_CUBE_MAP)
+   if (_mesa_is_array_texture(mt->target) || mt->target == GL_TEXTURE_CUBE_MAP)
   surf[0] |= GEN8_SURFACE_IS_ARRAY;
 
surf[1] = SET_FIELD(mocs_wb, GEN8_SURFACE_MOCS) | mt->qpitch >> 2;
@@ -436,7 +436,7 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
   /* fallthrough */
default:
   surf_type = translate_tex_target(gl_target);
-  is_array = _mesa_tex_target_is_array(gl_target);
+  is_array = _mesa_tex_target_is_array(mt->target);
   break;
}
 
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] RFC: buffer support in TGSI for SSBO/atomic

2015-11-02 Thread Roland Scheidegger
Am 02.11.2015 um 22:39 schrieb Dave Airlie:
> On 3 November 2015 at 07:31, Roland Scheidegger  wrote:
>> Am 02.11.2015 um 20:55 schrieb Ilia Mirkin:
>>> FTR these are the various operators on nvidia hw:
>>>
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__docs.nvidia.com_cuda_parallel-2Dthread-2Dexecution_-23cache-2Doperators=BQIFaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I=AffBQlVu5ht00Ignz67m4YHn6ePeNQFrDUYljvo28Vc=m5WnnOkcD2MS1JiJjK4XTWgUXjyEbZWhWSjq_V_O6oA=
>>>
>>> Most of these map directly to instruction things (ca/cg/cs/cv sound
>>> familiar, dunno about lu, could just be an assembler helper).
>> Ah I see, that's how it works. Makes sense I guess, though I guess there
>> could be some slight inefficiences if data is packed "strangely"
>> (because global coherent write will evict l1 cache lines, thus for
>> instance some non-coherent access to a different variable but same cache
>> line would have to re-fetch that from l2), but probably that's not too
>> bad...
>>
>>
>>>
>>> How backwards-compatible is TGSI supposed to be? Can we change the
>>> encoding willy-nilly, or are there separate systems that talk to each
>>> other using TGSI that would need coordination?
>> I think it's not really different than the rest of gallium, so not
>> actually considered stable. Obviously it was written to be quite
>> extensible, but I don't think there's really anything preventing us from
>> changing it in binary-incompatible ways, IFF there's a really good
>> reason for it. If some driver relies on binary compatibility for tgsi
>> shaders (let's say for recognizing specific shaders to be able to do
>> shader replacements) it will need to be adapted to such changes (and I
>> know of some code which does that...). So reducing field width just
>> because you could do with less is not really a good idea, but doing it
>> because you actually need the now free bits should be ok.
> 
> For virgl I'd like the txt encoding's not to radically be redone, but the
> binary I'm fine with.
> 
> Even if the txt does for virgl I can rewrite things anyways.
> 

If you'd need binary compatibility after such a change it would be
possible to add version tokens. I don't think anyone really wants to
deal with things like multiple slightly changed structs in general, but
conversion code could keep all the old struct versions and be able to
convert from old to new (or vice versa). I guess something similar could
be done for text representation, if necessary. (Stable gallium abi which
would of course include tgsi was something which was talked about in the
past but never materialized. I can certainly see why you might be
interested in stable tgsi representation for virgl, albeit you'd
probably have to maintain that code yourself...)

Roland

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92783] MESA_DEBUG=incomplete_tex prints warnings from glClear which doesn't use the state

2015-11-02 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92783

Bug ID: 92783
   Summary: MESA_DEBUG=incomplete_tex prints warnings from glClear
which doesn't use the state
   Product: Mesa
   Version: git
  Hardware: All
OS: All
Status: NEW
  Severity: minor
  Priority: low
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: glenn.kenn...@gmail.com
QA Contact: mesa-dev@lists.freedesktop.org

Reproducible by running the  arb_texture_view-sampling-2d-array-as-2d-layer
piglit test case, the first call to glClear ends up checking the currently
bound shader's state for texture completeness and prints the following warning:
Mesa: Texture Obj 0 incomplete because: Image[baseLevel=0] == NULL

As far as I can tell this is due to _mesa_Clear calling _mesa_update_state
which eventually gets to _mesa_test_texobj_completeness where the warning is
printed.

Checking texture completeness in glClear strikes me as rather odd, in
particular since the check is on the api bound shader, not the internal state
used for the clear itself.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] mesa: check if resource backed by buffer in property queries

2015-11-02 Thread Tapani Pälli
Thanks Eduardo for taking a look at this. Today with a refreshed mind I 
noticed that I can fix this with smaller changes since it seems it's 
actually only the matrix_stride query that is broken for atomic counter 
buffers.


I'll send a new patch and will cc you.

On 10/30/2015 05:35 PM, Eduardo Lima Mitev wrote:

Patch is,

Reviewed-by: Eduardo Lima Mitev 

On 10/30/2015 01:30 PM, Tapani Pälli wrote:

Patch fixes broken behaviour of queries for following properties, from
ARB_program_interface_query specification:

GL_OFFSET:

"For active variables not backed by a buffer object, an offset of
-1 is written to ."

GL_ARRAY_STRIDE:

"For active variables not declared as an array of basic types, zero
is written to .  For active variables not backed by a buffer
object, -1 is written to , regardless of the variable type."

GL_MATRIX_STRIDE:

"For active variables not declared as a matrix or array of matrices,
zero is written to .  For active variables not backed by a
buffer object, -1 is written to , regardless of the variable
type."

These queries may come from GetActiveUniformsiv or GetProgramResourceiv.
Patch implements little helper to do 'backed by buffer' check and returns
appropriate values.

Fixes following CTS test:
ES31-CTS.shader_atomic_counters.basic-program-query

No Piglit regressions.

Signed-off-by: Tapani Pälli 
---
  src/mesa/main/shader_query.cpp | 40 +++-
  1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp
index dd51bba..b214691 100644
--- a/src/mesa/main/shader_query.cpp
+++ b/src/mesa/main/shader_query.cpp
@@ -1105,6 +1105,26 @@ invalid_operation:
 return 0;
  }

+/**
+ * Helper that returns if given resource is backed by a buffer object,
+ * either UBO, SSBO or atomic counter buffer.
+ */
+static bool
+backed_by_buffer_object(struct gl_program_resource *res)
+{
+   switch (res->Type) {
+   case GL_BUFFER_VARIABLE:
+  return true;
+   case GL_UNIFORM:
+  if (RESOURCE_UNI(res)->block_index != -1 ||
+  RESOURCE_UNI(res)->atomic_buffer_index != -1 ||
+  RESOURCE_UNI(res)->is_shader_storage)
+ return true;
+   default:
+  return false;
+   }
+}
+
  unsigned
  _mesa_program_resource_prop(struct gl_shader_program *shProg,
  struct gl_program_resource *res, GLuint index,
@@ -1171,20 +1191,30 @@ _mesa_program_resource_prop(struct gl_shader_program 
*shProg,
}
 case GL_OFFSET:
VALIDATE_TYPE_2(GL_UNIFORM, GL_BUFFER_VARIABLE);
-  *val = RESOURCE_UNI(res)->offset;
+  *val = backed_by_buffer_object(res) ? RESOURCE_UNI(res)->offset : -1;
return 1;
 case GL_BLOCK_INDEX:
VALIDATE_TYPE_2(GL_UNIFORM, GL_BUFFER_VARIABLE);
*val = RESOURCE_UNI(res)->block_index;
return 1;
-   case GL_ARRAY_STRIDE:
+   case GL_ARRAY_STRIDE: {
VALIDATE_TYPE_2(GL_UNIFORM, GL_BUFFER_VARIABLE);
-  *val = RESOURCE_UNI(res)->array_stride;
+  bool backed = backed_by_buffer_object(res);
+  if (backed && RESOURCE_UNI(res)->array_elements == 0)
+ *val = 0;
+  else
+ *val = backed ? RESOURCE_UNI(res)->array_stride : -1;
return 1;
-   case GL_MATRIX_STRIDE:
+   }
+   case GL_MATRIX_STRIDE: {
VALIDATE_TYPE_2(GL_UNIFORM, GL_BUFFER_VARIABLE);
-  *val = RESOURCE_UNI(res)->matrix_stride;
+ bool backed = backed_by_buffer_object(res);
+ if (backed && !RESOURCE_UNI(res)->type->is_matrix())
+*val = 0;
+ else
+*val = backed ? RESOURCE_UNI(res)->matrix_stride : -1;
return 1;
+   }
 case GL_IS_ROW_MAJOR:
VALIDATE_TYPE_2(GL_UNIFORM, GL_BUFFER_VARIABLE);
*val = RESOURCE_UNI(res)->row_major;




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH V2 5/7] i965: add support for image AoA

2015-11-02 Thread Francisco Jerez
Timothy Arceri  writes:

> From: Timothy Arceri 
>
> V2: avoid useless zero-initialization and addition for the first AoA level,
> avoid redundant temporary, make use of type_size_scalar(), rename aoa_size
> to element_size, assign the indirect indexing temporary directly to
> image.reladdr, and replace while loop with a for loop. All suggested
> by Francisco Jerez.
>
> Cc: Francisco Jerez 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 30 
> ++
>  src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp |  2 ++
>  2 files changed, 18 insertions(+), 14 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 24ff5af..5254c2d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1061,18 +1061,17 @@ fs_visitor::get_nir_image_deref(const nir_deref_var 
> *deref)
> fs_reg image(UNIFORM, deref->var->data.driver_location,
>  BRW_REGISTER_TYPE_UD);
>  
> -   if (deref->deref.child) {
> -  const nir_deref_array *deref_array =
> - nir_deref_as_array(deref->deref.child);
> -  assert(deref->deref.child->deref_type == nir_deref_type_array &&
> - deref_array->deref.child == NULL);
> -  const unsigned size = glsl_get_length(deref->var->type);
> +   for (const nir_deref *tail = deref->deref.child; tail;
> +tail = tail->child) {
> +  const nir_deref_array *deref_array = nir_deref_as_array(tail);
> +  assert(tail->deref_type == nir_deref_type_array);
> +  const unsigned size = glsl_get_length(tail->type);

IIUC tail->type is going to be the type of the *inner* contained array
which isn't the one you need to clamp to...  You probably want to
iterate on the containing array derefs instead so you have the size
bound available, like:

|   for (const nir_deref *tail = >deref; tail->child;
|tail = tail->child) {
|  const nir_deref_array *deref_array = nir_deref_as_array(tail->child);
|  assert(tail->child->deref_type == nir_deref_type_array);

(I suggested otherwise in my reply to your v1 because you were only
using tail->child from within the loop, but that was only due to this
mistake.)

> +  const unsigned element_size = type_size_scalar(tail->type);

Then change this to use "deref_array->deref.type".

>const unsigned base = MIN2(deref_array->base_offset, size - 1);
> -
> -  image = offset(image, bld, base * BRW_IMAGE_PARAM_SIZE);
> +  image = offset(image, bld, base * element_size);
>  
>if (deref_array->deref_array_type == nir_deref_array_type_indirect) {
> - fs_reg *tmp = new(mem_ctx) fs_reg(vgrf(glsl_type::int_type));
> + fs_reg tmp = vgrf(glsl_type::int_type);
>  
>   if (devinfo->gen == 7 && !devinfo->is_haswell) {
>  /* IVB hangs when trying to access an invalid surface index with
> @@ -1083,15 +1082,18 @@ fs_visitor::get_nir_image_deref(const nir_deref_var 
> *deref)
>   * of the possible outcomes of the hang.  Clamp the index to
>   * prevent access outside of the array bounds.
>   */
> -bld.emit_minmax(*tmp, retype(get_nir_src(deref_array->indirect),
> - BRW_REGISTER_TYPE_UD),
> +bld.emit_minmax(tmp, retype(get_nir_src(deref_array->indirect),
> +BRW_REGISTER_TYPE_UD),
>  fs_reg(size - base - 1), BRW_CONDITIONAL_L);
>   } else {
> -bld.MOV(*tmp, get_nir_src(deref_array->indirect));
> +bld.MOV(tmp, get_nir_src(deref_array->indirect));
>   }
>  
> - bld.MUL(*tmp, *tmp, fs_reg(BRW_IMAGE_PARAM_SIZE));
> - image.reladdr = tmp;
> + bld.MUL(tmp, tmp, fs_reg(element_size));
> + if (image.reladdr)
> +bld.ADD(*image.reladdr, *image.reladdr, tmp);
> + else
> +image.reladdr = new(mem_ctx) fs_reg(tmp);
>}
> }
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp 
> b/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp
> index d3326e9..87b3839 100644
> --- a/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp
> @@ -98,6 +98,8 @@ brw_nir_setup_glsl_uniform(gl_shader_stage stage, 
> nir_variable *var,
>if (storage->type->is_image()) {
>   brw_setup_image_uniform_values(stage, stage_prog_data,
>  uniform_index, storage);
> + uniform_index +=
> +BRW_IMAGE_PARAM_SIZE * MAX2(storage->array_elements, 1);
>} else {
>   gl_constant_value *components = storage->storage;
>   unsigned vector_count = (MAX2(storage->array_elements, 1) *
> -- 
> 2.4.3


signature.asc
Description: PGP signature
___

[Mesa-dev] [PATCH] glsl: set matrix_stride for non matrices with atomic counter buffers

2015-11-02 Thread Tapani Pälli
Patch sets matrix_stride as 0 for non matrix uniforms that are in a
atomic counter buffer. Matrix stride calculation for actual matrix
uniforms is done during link_assign_uniform_locations.

From ARB_program_interface_query specification:

GL_MATRIX_STRIDE:

   "For active variables not declared as a matrix or array of matrices,
   zero is written to .  For active variables not backed by a
   buffer object, -1 is written to , regardless of the variable
   type."

Signed-off-by: Tapani Pälli 
---
 src/glsl/link_atomics.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/glsl/link_atomics.cpp b/src/glsl/link_atomics.cpp
index cdcc06d..3aa52db 100644
--- a/src/glsl/link_atomics.cpp
+++ b/src/glsl/link_atomics.cpp
@@ -240,6 +240,8 @@ link_assign_atomic_counter_resources(struct gl_context *ctx,
  storage->offset = var->data.atomic.offset;
  storage->array_stride = (var->type->is_array() ?
   var->type->without_array()->atomic_size() : 
0);
+ if (!var->type->is_matrix())
+storage->matrix_stride = 0;
   }
 
   /* Assign stage-specific fields. */
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 22/24] i965/fs: Use brw_imm_uw().

2015-11-02 Thread Matt Turner
W/UW immediates are 16-bits, but those 16-bits must be replicated
in the high 16-bits of the 32-bit field.

Remove the useless W/UW immediate saturating code, since we'll now be
using the appropriate immediate (and W/UW immediates in the IR can now
no longer be larger than 16-bits).
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 3 +--
 src/mesa/drivers/dri/i965/brw_shader.cpp | 8 ++--
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 28dd6a3..9ddca6d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -437,8 +437,7 @@ fs_visitor::optimize_frontfacing_ternary(nir_alu_instr 
*instr,
   tmp.subreg_offset = 2;
   tmp.stride = 2;
 
-  fs_inst *or_inst = bld.OR(tmp, g0, brw_imm_d(0x3f80));
-  or_inst->src[1].type = BRW_REGISTER_TYPE_UW;
+  bld.OR(tmp, g0, brw_imm_uw(0x3f80));
 
   tmp.type = BRW_REGISTER_TYPE_D;
   tmp.subreg_offset = 0;
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index f2da164..45cd395 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -563,16 +563,12 @@ brw_saturate_immediate(enum brw_reg_type type, struct 
brw_reg *reg)
switch (type) {
case BRW_REGISTER_TYPE_UD:
case BRW_REGISTER_TYPE_D:
+   case BRW_REGISTER_TYPE_UW:
+   case BRW_REGISTER_TYPE_W:
case BRW_REGISTER_TYPE_UQ:
case BRW_REGISTER_TYPE_Q:
   /* Nothing to do. */
   return false;
-   case BRW_REGISTER_TYPE_UW:
-  sat_imm.ud = CLAMP(imm.ud, 0, USHRT_MAX);
-  break;
-   case BRW_REGISTER_TYPE_W:
-  sat_imm.d = CLAMP(imm.d, SHRT_MIN, SHRT_MAX);
-  break;
case BRW_REGISTER_TYPE_F:
   sat_imm.f = CLAMP(imm.f, 0.0f, 1.0f);
   break;
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 13/24] i965: Move BAD_FILE from the beginning of enum register_file.

2015-11-02 Thread Matt Turner
I'm going to begin using brw_reg's file field in backend_reg and its
derivatives, and in order to keep the hardware value for ARF as 0, we
have to do something different.
---
 src/mesa/drivers/dri/i965/brw_shader.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
b/src/mesa/drivers/dri/i965/brw_shader.h
index 5fb9913..1c0d0cb 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -39,13 +39,13 @@
 #define MAX_VGRF_SIZE 16
 
 enum PACKED register_file {
-   BAD_FILE,
GRF,
MRF,
IMM,
HW_REG, /* a struct brw_reg */
ATTR,
UNIFORM, /* prog_data->params[reg] */
+   BAD_FILE,
 };
 
 #ifdef __cplusplus
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/24] i965: Unwrap some lines.

2015-11-02 Thread Matt Turner
---
 src/mesa/drivers/dri/i965/brw_fs.cpp| 5 +
 src/mesa/drivers/dri/i965/brw_fs_combine_constants.cpp  | 3 +--
 src/mesa/drivers/dri/i965/brw_vec4.cpp  | 5 +
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 +--
 4 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 243a4ac..ef66796 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -416,10 +416,7 @@ fs_reg::fs_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, 
uint8_t vf3)
init();
this->file = IMM;
this->type = BRW_REGISTER_TYPE_VF;
-   this->ud = (vf0 <<  0) |
-   (vf1 <<  8) |
-   (vf2 << 16) |
-   (vf3 << 24);
+   this->ud = (vf0 <<  0) | (vf1 <<  8) | (vf2 << 16) | (vf3 << 24);
 }
 
 fs_reg::fs_reg(struct brw_reg reg) :
diff --git a/src/mesa/drivers/dri/i965/brw_fs_combine_constants.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_combine_constants.cpp
index c956459..234bbec 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_combine_constants.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_combine_constants.cpp
@@ -299,8 +299,7 @@ fs_visitor::opt_combine_constants()
  reg->reg = table.imm[i].reg;
  reg->subreg_offset = table.imm[i].subreg_offset;
  reg->stride = 0;
- reg->negate = signbit(reg->f) !=
-   signbit(table.imm[i].val);
+ reg->negate = signbit(reg->f) != signbit(table.imm[i].val);
  assert(fabsf(reg->f) == table.imm[i].val);
   }
}
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 5d63a28..3ea4b1d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -113,10 +113,7 @@ src_reg::src_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, 
uint8_t vf3)
 
this->file = IMM;
this->type = BRW_REGISTER_TYPE_VF;
-   this->ud = (vf0 <<  0) |
-   (vf1 <<  8) |
-   (vf2 << 16) |
-   (vf3 << 24);
+   this->ud = (vf0 <<  0) | (vf1 <<  8) | (vf2 << 16) | (vf3 << 24);
 }
 
 src_reg::src_reg(struct brw_reg reg) :
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 523866e..2be7b14 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -147,8 +147,7 @@ try_constant_propagate(const struct brw_device_info 
*devinfo,
}
 
if (value.type == BRW_REGISTER_TYPE_VF)
-  value.ud = swizzle_vf_imm(value.ud,
- inst->src[arg].swizzle);
+  value.ud = swizzle_vf_imm(value.ud, inst->src[arg].swizzle);
 
switch (inst->opcode) {
case BRW_OPCODE_MOV:
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 20/24] Revert "i965: Have brw_imm_vf4() take the vector components as integer values."

2015-11-02 Thread Matt Turner
This reverts commit bbf8239f92ecd79431dfa41402e1c85318e7267f.

I didn't like that commit to begin with -- computing things at compile
time is fine -- but for purposes of verifying that the resulting values
are correct, looking up 0x00 and 0x30 in a table is a lot better than
evaluating a recursive function.

Anyway, by making brw_imm_vf4() take the actual 8-bit restricted floats
directly (instead of only integral values that would be converted to
restricted float), we can use this function as a replacement for the
vector float src_reg/fs_reg constructors.
---
 src/mesa/drivers/dri/i965/brw_clip_util.c |  2 +-
 src/mesa/drivers/dri/i965/brw_reg.h   | 40 ---
 2 files changed, 11 insertions(+), 31 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_clip_util.c 
b/src/mesa/drivers/dri/i965/brw_clip_util.c
index 40ad144..0253d52 100644
--- a/src/mesa/drivers/dri/i965/brw_clip_util.c
+++ b/src/mesa/drivers/dri/i965/brw_clip_util.c
@@ -224,7 +224,7 @@ void brw_clip_interp_vertex( struct brw_clip_compile *c,
   vec1(t_nopersp),
   brw_imm_f(0));
   brw_IF(p, BRW_EXECUTE_1);
-  brw_MOV(p, t_nopersp, brw_imm_vf4(1, 0, 0, 0));
+  brw_MOV(p, t_nopersp, brw_imm_vf4(VF_ONE, VF_ZERO, VF_ZERO, VF_ZERO));
   brw_ENDIF(p);
 
   /* Now compute t_nopersp = t_nopersp.y/t_nopersp.x and broadcast it. */
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
b/src/mesa/drivers/dri/i965/brw_reg.h
index b906892..a4c1901 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -43,7 +43,6 @@
 #define BRW_REG_H
 
 #include 
-#include "main/imports.h"
 #include "main/compiler.h"
 #include "main/macros.h"
 #include "program/prog_instruction.h"
@@ -638,38 +637,19 @@ brw_imm_vf(unsigned v)
return imm;
 }
 
-/**
- * Convert an integer into a "restricted" 8-bit float, used in vector
- * immediates.  The 8-bit floating point format has a sign bit, an
- * excess-3 3-bit exponent, and a 4-bit mantissa.  All integer values
- * from -31 to 31 can be represented exactly.
- */
-static inline uint8_t
-int_to_float8(int x)
-{
-   if (x == 0) {
-  return 0;
-   } else if (x < 0) {
-  return 1 << 7 | int_to_float8(-x);
-   } else {
-  const unsigned exponent = _mesa_logbase2(x);
-  const unsigned mantissa = (x - (1 << exponent)) << (4 - exponent);
-  assert(exponent <= 4);
-  return (exponent + 3) << 4 | mantissa;
-   }
-}
+#define VF_ZERO 0x0
+#define VF_ONE  0x30
+#define VF_NEG  (1<<7)
 
-/**
- * Construct a floating-point packed vector immediate from its integer
- * values. \sa int_to_float8()
- */
 static inline struct brw_reg
-brw_imm_vf4(int v0, int v1, int v2, int v3)
+brw_imm_vf4(unsigned v0, unsigned v1, unsigned v2, unsigned v3)
 {
-   return brw_imm_vf((int_to_float8(v0) << 0) |
- (int_to_float8(v1) << 8) |
- (int_to_float8(v2) << 16) |
- (int_to_float8(v3) << 24));
+   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_VF);
+   imm.vstride = BRW_VERTICAL_STRIDE_0;
+   imm.width = BRW_WIDTH_4;
+   imm.hstride = BRW_HORIZONTAL_STRIDE_1;
+   imm.ud = ((v0 << 0) | (v1 << 8) | (v2 << 16) | (v3 << 24));
+   return imm;
 }
 
 
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 11/24] i965: Use brw_reg's nr field to store register number.

2015-11-02 Thread Matt Turner
In addition to combining another field, we get replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.

Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 163 ++---
 .../drivers/dri/i965/brw_fs_combine_constants.cpp  |   8 +-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |  18 +--
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |   2 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |   8 +-
 src/mesa/drivers/dri/i965/brw_fs_live_variables.h  |   2 +-
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |  34 ++---
 .../drivers/dri/i965/brw_fs_register_coalesce.cpp  |  22 +--
 .../dri/i965/brw_fs_saturate_propagation.cpp   |   2 +-
 src/mesa/drivers/dri/i965/brw_fs_validate.cpp  |   4 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |   2 +-
 src/mesa/drivers/dri/i965/brw_ir_fs.h  |   6 +-
 src/mesa/drivers/dri/i965/brw_ir_vec4.h|   8 +-
 src/mesa/drivers/dri/i965/brw_reg.h|  10 +-
 .../drivers/dri/i965/brw_schedule_instructions.cpp |  62 
 src/mesa/drivers/dri/i965/brw_shader.cpp   |   2 +-
 src/mesa/drivers/dri/i965/brw_shader.h |   9 --
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 100 ++---
 .../drivers/dri/i965/brw_vec4_copy_propagation.cpp |   6 +-
 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp |   2 +-
 .../drivers/dri/i965/brw_vec4_live_variables.h |  12 +-
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp |  36 ++---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  40 ++---
 .../dri/i965/test_vec4_copy_propagation.cpp|   4 +-
 .../dri/i965/test_vec4_register_coalesce.cpp   |   4 +-
 25 files changed, 276 insertions(+), 290 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index ef66796..fecac13 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -306,7 +306,7 @@ fs_inst::is_copy_payload(const brw::simple_allocator 
_alloc) const
if (reg.file != GRF || reg.reg_offset != 0 || reg.stride == 0)
   return false;
 
-   if (grf_alloc.sizes[reg.reg] != this->regs_written)
+   if (grf_alloc.sizes[reg.nr] != this->regs_written)
   return false;
 
for (int i = 0; i < this->sources; i++) {
@@ -423,7 +423,6 @@ fs_reg::fs_reg(struct brw_reg reg) :
backend_reg(reg)
 {
this->file = HW_REG;
-   this->reg = 0;
this->reg_offset = 0;
this->subreg_offset = 0;
this->reladdr = NULL;
@@ -434,7 +433,7 @@ bool
 fs_reg::equals(const fs_reg ) const
 {
return (file == r.file &&
-   reg == r.reg &&
+   nr == r.nr &&
reg_offset == r.reg_offset &&
subreg_offset == r.subreg_offset &&
type == r.type &&
@@ -940,20 +939,20 @@ fs_visitor::vgrf(const glsl_type *const type)
  brw_type_for_base_type(type));
 }
 
-fs_reg::fs_reg(enum register_file file, int reg)
+fs_reg::fs_reg(enum register_file file, int nr)
 {
init();
this->file = file;
-   this->reg = reg;
+   this->nr = nr;
this->type = BRW_REGISTER_TYPE_F;
this->stride = (file == UNIFORM ? 0 : 1);
 }
 
-fs_reg::fs_reg(enum register_file file, int reg, enum brw_reg_type type)
+fs_reg::fs_reg(enum register_file file, int nr, enum brw_reg_type type)
 {
init();
this->file = file;
-   this->reg = reg;
+   this->nr = nr;
this->type = type;
this->stride = (file == UNIFORM ? 0 : 1);
 }
@@ -1380,7 +1379,7 @@ fs_visitor::assign_curb_setup()
foreach_block_and_inst(block, fs_inst, inst, cfg) {
   for (unsigned int i = 0; i < inst->sources; i++) {
 if (inst->src[i].file == UNIFORM) {
-int uniform_nr = inst->src[i].reg + inst->src[i].reg_offset;
+int uniform_nr = inst->src[i].nr + inst->src[i].reg_offset;
 int constant_nr;
 if (uniform_nr >= 0 && uniform_nr < (int) uniforms) {
constant_nr = push_constant_loc[uniform_nr];
@@ -1550,7 +1549,7 @@ fs_visitor::assign_vs_urb_setup()
  if (inst->src[i].file == ATTR) {
 int grf = payload.num_regs +
   prog_data->curb_read_length +
-  inst->src[i].reg +
+  inst->src[i].nr +
   inst->src[i].reg_offset;
 
 struct brw_reg reg =
@@ -1610,15 +1609,15 @@ fs_visitor::split_virtual_grfs()
/* Mark all used registers as fully splittable */
foreach_block_and_inst(block, fs_inst, inst, cfg) {
   if (inst->dst.file == GRF) {
- int reg = vgrf_to_reg[inst->dst.reg];
- for (unsigned j = 1; j < this->alloc.sizes[inst->dst.reg]; j++)
+ int reg = vgrf_to_reg[inst->dst.nr];
+ for (unsigned j = 1; j < this->alloc.sizes[inst->dst.nr]; j++)
 split_points[reg + j] = 

[Mesa-dev] [PATCH 02/24] i965: Delete abs/negate fields from backend_reg.

2015-11-02 Thread Matt Turner
Instead use the ones provided by brw_reg. Also allows us to handle
HW_REGs in the negate() functions.
---
 src/mesa/drivers/dri/i965/brw_ir_fs.h   | 2 +-
 src/mesa/drivers/dri/i965/brw_ir_vec4.h | 2 +-
 src/mesa/drivers/dri/i965/brw_shader.h  | 3 ---
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
b/src/mesa/drivers/dri/i965/brw_ir_fs.h
index 4417555..c0e486e 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
@@ -72,7 +72,7 @@ public:
 static inline fs_reg
 negate(fs_reg reg)
 {
-   assert(reg.file != HW_REG && reg.file != IMM);
+   assert(reg.file != IMM);
reg.negate = !reg.negate;
return reg;
 }
diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h 
b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
index 29642c6..2fbb043 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
@@ -90,7 +90,7 @@ swizzle(src_reg reg, unsigned swizzle)
 static inline src_reg
 negate(src_reg reg)
 {
-   assert(reg.file != HW_REG && reg.file != IMM);
+   assert(reg.file != IMM);
reg.negate = !reg.negate;
return reg;
 }
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
b/src/mesa/drivers/dri/i965/brw_shader.h
index 9a516c3..c9614aa 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -82,9 +82,6 @@ struct backend_reg : public brw_reg
uint16_t reg_offset;
 
struct brw_reg fixed_hw_reg;
-
-   bool negate;
-   bool abs;
 };
 #endif
 
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 05/24] i965: Reorganize brw_reg fields.

2015-11-02 Thread Matt Turner
Put fields that are meaningless with an immediate in the same storage
with the immediate. This leaves fields type, file, nr, subnr in the
first dword where there's now extra room for expansion.
---
 src/mesa/drivers/dri/i965/brw_reg.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
b/src/mesa/drivers/dri/i965/brw_reg.h
index 48f1f2e..b34bfef 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -237,18 +237,18 @@ struct brw_reg {
unsigned subnr:5;  /* :1 in align16 */
unsigned negate:1; /* source only */
unsigned abs:1;/* source only */
-   unsigned vstride:4;/* source only */
-   unsigned width:3;  /* src only, align1 only */
-   unsigned hstride:2;/* align1 only */
unsigned address_mode:1;   /* relative addressing, hopefully! */
-   unsigned pad0:1;
+   unsigned pad0:10;
 
union {
   struct {
  unsigned swizzle:8;  /* src only, align16 only */
  unsigned writemask:4;/* dest only, align16 only */
  int  indirect_offset:10; /* relative addressing offset */
- unsigned pad1:10;/* two dwords total */
+ unsigned vstride:4;  /* source only */
+ unsigned width:3;/* src only, align1 only */
+ unsigned hstride:2;  /* align1 only */
+ unsigned pad1:1;
   };
 
   float f;
@@ -357,9 +357,6 @@ brw_reg(unsigned file,
reg.subnr = subnr * type_sz(type);
reg.negate = negate;
reg.abs = abs;
-   reg.vstride = vstride;
-   reg.width = width;
-   reg.hstride = hstride;
reg.address_mode = BRW_ADDRESS_DIRECT;
reg.pad0 = 0;
 
@@ -372,6 +369,9 @@ brw_reg(unsigned file,
reg.swizzle = swizzle;
reg.writemask = writemask;
reg.indirect_offset = 0;
+   reg.vstride = vstride;
+   reg.width = width;
+   reg.hstride = hstride;
reg.pad1 = 0;
return reg;
 }
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/24] i965: Use immediate storage in inherited brw_reg.

2015-11-02 Thread Matt Turner
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 74 +++---
 .../drivers/dri/i965/brw_fs_combine_constants.cpp  |  6 +-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 12 ++--
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   | 16 ++---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 12 ++--
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  2 +-
 src/mesa/drivers/dri/i965/brw_shader.cpp   | 10 +--
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 39 ++--
 .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 10 +--
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  2 +-
 10 files changed, 92 insertions(+), 91 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 8a2faeb..6b9e979 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -378,7 +378,7 @@ fs_reg::fs_reg(float f)
this->file = IMM;
this->type = BRW_REGISTER_TYPE_F;
this->stride = 0;
-   this->fixed_hw_reg.f = f;
+   this->f = f;
 }
 
 /** Immediate value constructor. */
@@ -388,7 +388,7 @@ fs_reg::fs_reg(int32_t i)
this->file = IMM;
this->type = BRW_REGISTER_TYPE_D;
this->stride = 0;
-   this->fixed_hw_reg.d = i;
+   this->d = i;
 }
 
 /** Immediate value constructor. */
@@ -398,7 +398,7 @@ fs_reg::fs_reg(uint32_t u)
this->file = IMM;
this->type = BRW_REGISTER_TYPE_UD;
this->stride = 0;
-   this->fixed_hw_reg.ud = u;
+   this->ud = u;
 }
 
 /** Vector float immediate value constructor. */
@@ -407,7 +407,7 @@ fs_reg::fs_reg(uint8_t vf[4])
init();
this->file = IMM;
this->type = BRW_REGISTER_TYPE_VF;
-   memcpy(>fixed_hw_reg.ud, vf, sizeof(unsigned));
+   memcpy(>ud, vf, sizeof(unsigned));
 }
 
 /** Vector float immediate value constructor. */
@@ -416,7 +416,7 @@ fs_reg::fs_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, 
uint8_t vf3)
init();
this->file = IMM;
this->type = BRW_REGISTER_TYPE_VF;
-   this->fixed_hw_reg.ud = (vf0 <<  0) |
+   this->ud = (vf0 <<  0) |
(vf1 <<  8) |
(vf2 << 16) |
(vf3 << 24);
@@ -442,9 +442,10 @@ fs_reg::equals(const fs_reg ) const
negate == r.negate &&
abs == r.abs &&
!reladdr && !r.reladdr &&
-   ((file != HW_REG && file != IMM) ||
+   (file != HW_REG ||
 memcmp(_hw_reg, _hw_reg,
sizeof(fixed_hw_reg)) == 0) &&
+   (file != IMM || d == r.d) &&
stride == r.stride);
 }
 
@@ -705,7 +706,7 @@ fs_inst::components_read(unsigned i) const
   assert(src[FB_WRITE_LOGICAL_SRC_COMPONENTS].file == IMM);
   /* First/second FB write color. */
   if (i < 2)
- return src[FB_WRITE_LOGICAL_SRC_COMPONENTS].fixed_hw_reg.ud;
+ return src[FB_WRITE_LOGICAL_SRC_COMPONENTS].ud;
   else
  return 1;
 
@@ -724,10 +725,10 @@ fs_inst::components_read(unsigned i) const
   assert(src[8].file == IMM && src[9].file == IMM);
   /* Texture coordinates. */
   if (i == 0)
- return src[8].fixed_hw_reg.ud;
+ return src[8].ud;
   /* Texture derivatives. */
   else if ((i == 2 || i == 3) && opcode == SHADER_OPCODE_TXD_LOGICAL)
- return src[9].fixed_hw_reg.ud;
+ return src[9].ud;
   /* Texture offset. */
   else if (i == 7)
  return 2;
@@ -739,7 +740,7 @@ fs_inst::components_read(unsigned i) const
   assert(src[3].file == IMM);
   /* Surface coordinates. */
   if (i == 0)
- return src[3].fixed_hw_reg.ud;
+ return src[3].ud;
   /* Surface operation source (ignored for reads). */
   else if (i == 1)
  return 0;
@@ -752,10 +753,10 @@ fs_inst::components_read(unsigned i) const
  src[4].file == IMM);
   /* Surface coordinates. */
   if (i == 0)
- return src[3].fixed_hw_reg.ud;
+ return src[3].ud;
   /* Surface operation source. */
   else if (i == 1)
- return src[4].fixed_hw_reg.ud;
+ return src[4].ud;
   else
  return 1;
 
@@ -763,10 +764,10 @@ fs_inst::components_read(unsigned i) const
case SHADER_OPCODE_TYPED_ATOMIC_LOGICAL: {
   assert(src[3].file == IMM &&
  src[4].file == IMM);
-  const unsigned op = src[4].fixed_hw_reg.ud;
+  const unsigned op = src[4].ud;
   /* Surface coordinates. */
   if (i == 0)
- return src[3].fixed_hw_reg.ud;
+ return src[3].ud;
   /* Surface operation source. */
   else if (i == 1 && op == BRW_AOP_CMPWR)
  return 2;
@@ -1954,8 +1955,7 @@ fs_visitor::opt_algebraic()
 if (inst->dst.type != inst->src[0].type)
assert(!"unimplemented: saturate mixed types");
 
-if (brw_saturate_immediate(inst->dst.type,
-   >src[0].fixed_hw_reg)) {
+if 

[Mesa-dev] [PATCH 04/24] i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.

2015-11-02 Thread Matt Turner
Generated by

   sed -i -e 's/\.bits\././g' *.c *.h *.cpp
   sed -i -e 's/dw1\.//g' *.c *.h *.cpp

and then reverting changes to comments in gen7_blorp.cpp and
brw_fs_generator.cpp.

There wasn't any utility offered by forcing the programmer to list these
to access their fields. Removing them will reduce churn in future
commits.

This is C11 (and gcc has apparently supported it for sometime
"compatibility with other compilers")

See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c| 52 -
 src/mesa/drivers/dri/i965/brw_ff_gs_emit.c |  2 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 68 +++---
 .../drivers/dri/i965/brw_fs_combine_constants.cpp  |  6 +-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |  8 +--
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   | 16 ++---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 42 ++---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  2 +-
 src/mesa/drivers/dri/i965/brw_reg.h| 38 ++--
 src/mesa/drivers/dri/i965/brw_shader.cpp   | 30 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 40 ++---
 .../drivers/dri/i965/brw_vec4_copy_propagation.cpp |  6 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp   | 40 ++---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  2 +-
 14 files changed, 176 insertions(+), 176 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index a6fbb54..775027d 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -169,10 +169,10 @@ brw_set_dest(struct brw_codegen *p, brw_inst *inst, 
struct brw_reg dest)
  brw_inst_set_dst_hstride(devinfo, inst, dest.hstride);
   } else {
  brw_inst_set_dst_da16_subreg_nr(devinfo, inst, dest.subnr / 16);
- brw_inst_set_da16_writemask(devinfo, inst, dest.dw1.bits.writemask);
+ brw_inst_set_da16_writemask(devinfo, inst, dest.writemask);
  if (dest.file == BRW_GENERAL_REGISTER_FILE ||
  dest.file == BRW_MESSAGE_REGISTER_FILE) {
-assert(dest.dw1.bits.writemask != 0);
+assert(dest.writemask != 0);
  }
 /* From the Ivybridge PRM, Vol 4, Part 3, Section 5.2.4.1:
  *Although Dst.HorzStride is a don't care for Align16, HW needs
@@ -187,13 +187,13 @@ brw_set_dest(struct brw_codegen *p, brw_inst *inst, 
struct brw_reg dest)
*/
   if (brw_inst_access_mode(devinfo, inst) == BRW_ALIGN_1) {
  brw_inst_set_dst_ia1_addr_imm(devinfo, inst,
-   dest.dw1.bits.indirect_offset);
+   dest.indirect_offset);
 if (dest.hstride == BRW_HORIZONTAL_STRIDE_0)
dest.hstride = BRW_HORIZONTAL_STRIDE_1;
  brw_inst_set_dst_hstride(devinfo, inst, dest.hstride);
   } else {
  brw_inst_set_dst_ia16_addr_imm(devinfo, inst,
-dest.dw1.bits.indirect_offset);
+dest.indirect_offset);
 /* even ignored in da16, still need to set as '01' */
  brw_inst_set_dst_hstride(devinfo, inst, 1);
   }
@@ -243,7 +243,7 @@ validate_reg(const struct brw_device_info *devinfo,
 */
if (reg.file == BRW_ARCHITECTURE_REGISTER_FILE &&
reg.nr == BRW_ARF_ACCUMULATOR)
-  assert(reg.dw1.bits.swizzle == BRW_SWIZZLE_XYZW);
+  assert(reg.swizzle == BRW_SWIZZLE_XYZW);
 
assert(reg.hstride >= 0 && reg.hstride < ARRAY_SIZE(hstride_for_reg));
hstride = hstride_for_reg[reg.hstride];
@@ -338,7 +338,7 @@ brw_set_src0(struct brw_codegen *p, brw_inst *inst, struct 
brw_reg reg)
brw_inst_set_src0_address_mode(devinfo, inst, reg.address_mode);
 
if (reg.file == BRW_IMMEDIATE_VALUE) {
-  brw_inst_set_imm_ud(devinfo, inst, reg.dw1.ud);
+  brw_inst_set_imm_ud(devinfo, inst, reg.ud);
 
   /* The Bspec's section titled "Non-present Operands" claims that if src0
* is an immediate that src1's type must be the same as that of src0.
@@ -408,9 +408,9 @@ brw_set_src0(struct brw_codegen *p, brw_inst *inst, struct 
brw_reg reg)
  brw_inst_set_src0_ia_subreg_nr(devinfo, inst, reg.subnr);
 
  if (brw_inst_access_mode(devinfo, inst) == BRW_ALIGN_1) {
-brw_inst_set_src0_ia1_addr_imm(devinfo, inst, 
reg.dw1.bits.indirect_offset);
+brw_inst_set_src0_ia1_addr_imm(devinfo, inst, reg.indirect_offset);
 } else {
-brw_inst_set_src0_ia16_addr_imm(devinfo, inst, 
reg.dw1.bits.indirect_offset);
+brw_inst_set_src0_ia16_addr_imm(devinfo, inst, 
reg.indirect_offset);
 }
   }
 
@@ -427,13 +427,13 @@ brw_set_src0(struct brw_codegen *p, brw_inst *inst, 
struct brw_reg reg)
 }
   } else {
  brw_inst_set_src0_da16_swiz_x(devinfo, inst,
- 

[Mesa-dev] [PATCH 09/24] i965/vec4: Remove swizzle/writemask fields from src/dst_reg.

2015-11-02 Thread Matt Turner
Also allows us to handle HW_REGs in the swizzle() and writemask()
functions.
---
 src/mesa/drivers/dri/i965/brw_ir_vec4.h | 7 +--
 src/mesa/drivers/dri/i965/brw_vec4.cpp  | 2 --
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h 
b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
index 0b2a925..a19a262 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
@@ -55,8 +55,6 @@ public:
 
explicit src_reg(const dst_reg );
 
-   unsigned swizzle; /**< BRW_SWIZZLE_XYZW macros from brw_reg.h. */
-
src_reg *reladdr;
 };
 
@@ -82,7 +80,6 @@ offset(src_reg reg, unsigned delta)
 static inline src_reg
 swizzle(src_reg reg, unsigned swizzle)
 {
-   assert(reg.file != HW_REG);
reg.swizzle = brw_compose_swizzle(swizzle, reg.swizzle);
return reg;
 }
@@ -122,8 +119,6 @@ public:
 
bool equals(const dst_reg ) const;
 
-   unsigned writemask; /**< Bitfield of WRITEMASK_[XYZW] */
-
src_reg *reladdr;
 };
 
@@ -145,7 +140,7 @@ offset(dst_reg reg, unsigned delta)
 static inline dst_reg
 writemask(dst_reg reg, unsigned mask)
 {
-   assert(reg.file != HW_REG && reg.file != IMM);
+   assert(reg.file != IMM);
assert((reg.writemask & mask) != 0);
reg.writemask &= mask;
return reg;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index ad52c9f..5d63a28 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -125,7 +125,6 @@ src_reg::src_reg(struct brw_reg reg) :
this->file = HW_REG;
this->reg = 0;
this->reg_offset = 0;
-   this->swizzle = BRW_SWIZZLE_;
this->reladdr = NULL;
 }
 
@@ -189,7 +188,6 @@ dst_reg::dst_reg(struct brw_reg reg) :
this->file = HW_REG;
this->reg = 0;
this->reg_offset = 0;
-   this->writemask = WRITEMASK_XYZW;
this->reladdr = NULL;
 }
 
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/24] i965: Add and use enum brw_reg_file.

2015-11-02 Thread Matt Turner
---
 src/mesa/drivers/dri/i965/brw_defines.h| 10 ++
 src/mesa/drivers/dri/i965/brw_eu_emit.c|  2 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  5 +++--
 src/mesa/drivers/dri/i965/brw_reg.h| 25 +
 4 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 6433cff..00464e2 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1371,10 +1371,12 @@ enum PACKED brw_predicate {
BRW_PREDICATE_ALIGN16_ALL4H   =  7,
 };
 
-#define BRW_ARCHITECTURE_REGISTER_FILE0
-#define BRW_GENERAL_REGISTER_FILE 1
-#define BRW_MESSAGE_REGISTER_FILE 2
-#define BRW_IMMEDIATE_VALUE   3
+enum PACKED brw_reg_file {
+   BRW_ARCHITECTURE_REGISTER_FILE = 0,
+   BRW_GENERAL_REGISTER_FILE  = 1,
+   BRW_MESSAGE_REGISTER_FILE  = 2,
+   BRW_IMMEDIATE_VALUE= 3,
+};
 
 #define BRW_HW_REG_TYPE_UD  0
 #define BRW_HW_REG_TYPE_D   1
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 775027d..ec04d7d 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -92,7 +92,7 @@ gen7_convert_mrf_to_grf(struct brw_codegen *p, struct brw_reg 
*reg)
  */
 unsigned
 brw_reg_type_to_hw_type(const struct brw_device_info *devinfo,
-enum brw_reg_type type, unsigned file)
+enum brw_reg_type type, enum brw_reg_file file)
 {
if (file == BRW_IMMEDIATE_VALUE) {
   static const int imm_hw_types[] = {
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index e980003..ed3e335 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -33,7 +33,8 @@
 #include "brw_fs.h"
 #include "brw_cfg.h"
 
-static uint32_t brw_file_from_reg(fs_reg *reg)
+static enum brw_reg_file
+brw_file_from_reg(fs_reg *reg)
 {
switch (reg->file) {
case GRF:
@@ -48,7 +49,7 @@ static uint32_t brw_file_from_reg(fs_reg *reg)
case UNIFORM:
   unreachable("not reached");
}
-   return 0;
+   return GRF;
 }
 
 static struct brw_reg
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
b/src/mesa/drivers/dri/i965/brw_reg.h
index b34bfef..ed65ed5 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -219,7 +219,7 @@ enum PACKED brw_reg_type {
 };
 
 unsigned brw_reg_type_to_hw_type(const struct brw_device_info *devinfo,
- enum brw_reg_type type, unsigned file);
+ enum brw_reg_type type, enum brw_reg_file 
file);
 const char *brw_reg_type_letters(unsigned brw_reg_type);
 
 #define REG_SIZE (8*4)
@@ -232,7 +232,7 @@ const char *brw_reg_type_letters(unsigned brw_reg_type);
  */
 struct brw_reg {
enum brw_reg_type type:4;
-   unsigned file:2;
+   enum brw_reg_file file:2;
unsigned nr:8;
unsigned subnr:5;  /* :1 in align16 */
unsigned negate:1; /* source only */
@@ -329,7 +329,7 @@ type_is_signed(unsigned type)
  * \param writemask WRITEMASK_X/Y/Z/W bitfield
  */
 static inline struct brw_reg
-brw_reg(unsigned file,
+brw_reg(enum brw_reg_file file,
 unsigned nr,
 unsigned subnr,
 unsigned negate,
@@ -378,7 +378,7 @@ brw_reg(unsigned file,
 
 /** Construct float[16] register */
 static inline struct brw_reg
-brw_vec16_reg(unsigned file, unsigned nr, unsigned subnr)
+brw_vec16_reg(enum brw_reg_file file, unsigned nr, unsigned subnr)
 {
return brw_reg(file,
   nr,
@@ -395,7 +395,7 @@ brw_vec16_reg(unsigned file, unsigned nr, unsigned subnr)
 
 /** Construct float[8] register */
 static inline struct brw_reg
-brw_vec8_reg(unsigned file, unsigned nr, unsigned subnr)
+brw_vec8_reg(enum brw_reg_file file, unsigned nr, unsigned subnr)
 {
return brw_reg(file,
   nr,
@@ -412,7 +412,7 @@ brw_vec8_reg(unsigned file, unsigned nr, unsigned subnr)
 
 /** Construct float[4] register */
 static inline struct brw_reg
-brw_vec4_reg(unsigned file, unsigned nr, unsigned subnr)
+brw_vec4_reg(enum brw_reg_file file, unsigned nr, unsigned subnr)
 {
return brw_reg(file,
   nr,
@@ -429,7 +429,7 @@ brw_vec4_reg(unsigned file, unsigned nr, unsigned subnr)
 
 /** Construct float[2] register */
 static inline struct brw_reg
-brw_vec2_reg(unsigned file, unsigned nr, unsigned subnr)
+brw_vec2_reg(enum brw_reg_file file, unsigned nr, unsigned subnr)
 {
return brw_reg(file,
   nr,
@@ -446,7 +446,7 @@ brw_vec2_reg(unsigned file, unsigned nr, unsigned subnr)
 
 /** Construct float[1] register */
 static inline struct brw_reg
-brw_vec1_reg(unsigned file, unsigned nr, unsigned subnr)
+brw_vec1_reg(enum brw_reg_file file, unsigned nr, unsigned subnr)
 {
return 

[Mesa-dev] [PATCH 03/24] i965: Delete type field from backend_reg.

2015-11-02 Thread Matt Turner
Switching from an implicitly-sized type field to field with an explicit
bit width is safe because we have fewer than 2^4 types, and gcc will
warn if you attempt to set a value that will not fit.
---
 src/mesa/drivers/dri/i965/brw_shader.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
b/src/mesa/drivers/dri/i965/brw_shader.h
index c9614aa..9a7a2d5 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -59,7 +59,6 @@ struct backend_reg : public brw_reg
bool in_range(const backend_reg , unsigned n) const;
 
enum register_file file; /**< Register file: GRF, MRF, IMM. */
-   enum brw_reg_type type;  /**< Register type: BRW_REGISTER_TYPE_* */
 
/**
 * Register number.
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 00/24] i965: Refactor register classes

2015-11-02 Thread Matt Turner
backend_reg (from which fs_reg, src_reg, and dst_reg inherit) includes a
brw_reg that's used for "hardware regs" -- precolored registers or architecture
registers. This leads to properties like source modifiers, the register type,
swizzles, and writemasks being duplicated between the derived classes and the
brw_reg and of course often being out of sync.

This series removes the "fixed_hw_reg" field from backend_reg by just making
backend_reg inherit from brw_reg, and then removes fields duplicated in the
derived classes. In the process, it gets rid of HW_REG.

This in turn simplifies a lot of code -- no longer do you have to check a
number of subfields if file == HW_REG.

The last few patches begin some clean ups -- since the base of our register
classes is now brw_reg we don't need to do as many conversions. I've only
handled immediates so far and more is planned, but the series is growing large
and is a lot of churn already.

The sizes of the register classes all shrink by 8 bytes:

   backend_reg   20 -> 12
   fs_reg40 -> 32
   src_reg   32 -> 24?
   dst_reg   32 -> 24?

The remaining fields in the classes are

   backend_reg: reg_offset
   fs_reg:  reladdr, subreg_offset, stride
   src_reg  reladdr
   dst_reg  reladdr

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 19/24] i965: Use BRW_MRF_COMPR4 macro in more places.

2015-11-02 Thread Matt Turner
---
 src/mesa/drivers/dri/i965/brw_disasm.c | 4 ++--
 src/mesa/drivers/dri/i965/brw_eu_emit.c| 4 ++--
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c 
b/src/mesa/drivers/dri/i965/brw_disasm.c
index df74710..4ff4fe2 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -720,7 +720,7 @@ reg(FILE *file, unsigned _reg_file, unsigned _reg_nr)
 
/* Clear the Compr4 instruction compression bit. */
if (_reg_file == BRW_MESSAGE_REGISTER_FILE)
-  _reg_nr &= ~(1 << 7);
+  _reg_nr &= ~BRW_MRF_COMPR4;
 
if (_reg_file == BRW_ARCHITECTURE_REGISTER_FILE) {
   switch (_reg_nr & 0xf0) {
@@ -1644,7 +1644,7 @@ brw_disassemble_inst(FILE *file, const struct 
brw_device_info *devinfo,
  if (brw_inst_qtr_control(devinfo, inst) == BRW_COMPRESSION_COMPRESSED 
&&
  opcode_descs[opcode].ndst > 0 &&
  brw_inst_dst_reg_file(devinfo, inst) == BRW_MESSAGE_REGISTER_FILE 
&&
- brw_inst_dst_da_reg_nr(devinfo, inst) & (1 << 7)) {
+ brw_inst_dst_da_reg_nr(devinfo, inst) & BRW_MRF_COMPR4) {
 format(file, " compr4");
  } else {
 err |= control(file, "compression control", compr_ctrl,
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index ec04d7d..da1ddfd 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -147,7 +147,7 @@ brw_set_dest(struct brw_codegen *p, brw_inst *inst, struct 
brw_reg dest)
const struct brw_device_info *devinfo = p->devinfo;
 
if (dest.file == BRW_MESSAGE_REGISTER_FILE)
-  assert((dest.nr & ~(1 << 7)) < BRW_MAX_MRF(devinfo->gen));
+  assert((dest.nr & ~BRW_MRF_COMPR4) < BRW_MAX_MRF(devinfo->gen));
else if (dest.file != BRW_ARCHITECTURE_REGISTER_FILE)
   assert(dest.nr < 128);
 
@@ -311,7 +311,7 @@ brw_set_src0(struct brw_codegen *p, brw_inst *inst, struct 
brw_reg reg)
const struct brw_device_info *devinfo = p->devinfo;
 
if (reg.file == BRW_MESSAGE_REGISTER_FILE)
-  assert((reg.nr & ~(1 << 7)) < BRW_MAX_MRF(devinfo->gen));
+  assert((reg.nr & ~BRW_MRF_COMPR4) < BRW_MAX_MRF(devinfo->gen));
else if (reg.file != BRW_ARCHITECTURE_REGISTER_FILE)
   assert(reg.nr < 128);
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index a59439f..977e15a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -61,7 +61,7 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned gen)
 
switch (reg->file) {
case MRF:
-  assert((reg->nr & ~(1 << 7)) < BRW_MAX_MRF(gen));
+  assert((reg->nr & ~BRW_MRF_COMPR4) < BRW_MAX_MRF(gen));
   /* Fallthrough */
case VGRF:
   if (reg->stride == 0) {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 928ba74..ff9ef81 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1810,7 +1810,7 @@ vec4_visitor::convert_to_hw_regs()
  break;
 
   case MRF:
- assert(((dst.nr + dst.reg_offset) & ~(1 << 7)) < 
BRW_MAX_MRF(devinfo->gen));
+ assert(((dst.nr + dst.reg_offset) & ~BRW_MRF_COMPR4) < 
BRW_MAX_MRF(devinfo->gen));
  reg = brw_message_reg(dst.nr + dst.reg_offset);
  reg.type = dst.type;
  reg.writemask = dst.writemask;
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 24/24] i965: Drop IMM fs_reg/src_reg -> brw_reg conversions.

2015-11-02 Thread Matt Turner
The previous two commits make this unnecessary.
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 32 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp |  6 +
 2 files changed, 2 insertions(+), 36 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 977e15a..9fd2d4f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -89,39 +89,9 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned gen)
   brw_reg.abs = reg->abs;
   brw_reg.negate = reg->negate;
   break;
-   case IMM:
-  assert(reg->stride == ((reg->type == BRW_REGISTER_TYPE_V ||
-  reg->type == BRW_REGISTER_TYPE_UV ||
-  reg->type == BRW_REGISTER_TYPE_VF) ? 1 : 0));
-
-  switch (reg->type) {
-  case BRW_REGISTER_TYPE_F:
-brw_reg = brw_imm_f(reg->f);
-break;
-  case BRW_REGISTER_TYPE_D:
-brw_reg = brw_imm_d(reg->d);
-break;
-  case BRW_REGISTER_TYPE_UD:
-brw_reg = brw_imm_ud(reg->ud);
-break;
-  case BRW_REGISTER_TYPE_W:
-brw_reg = brw_imm_w(reg->d);
-break;
-  case BRW_REGISTER_TYPE_UW:
-brw_reg = brw_imm_uw(reg->ud);
-break;
-  case BRW_REGISTER_TYPE_VF:
- brw_reg = brw_imm_vf(reg->ud);
- break;
-  case BRW_REGISTER_TYPE_V:
- brw_reg = brw_imm_v(reg->ud);
- break;
-  default:
-unreachable("not reached");
-  }
-  break;
case ARF:
case GRF:
+   case IMM:
   brw_reg = *static_cast(reg);
   break;
case BAD_FILE:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 53da42c..9300fd4 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1721,11 +1721,6 @@ vec4_visitor::convert_to_hw_regs()
 reg.negate = src.negate;
 break;
 
- case IMM:
-reg = brw_imm_reg(src.type);
-reg.ud = src.ud;
-break;
-
  case UNIFORM:
 reg = stride(brw_vec4_grf(prog_data->base.dispatch_grf_start_reg +
   (src.nr + src.reg_offset) / 2,
@@ -1742,6 +1737,7 @@ vec4_visitor::convert_to_hw_regs()
 
  case ARF:
  case GRF:
+ case IMM:
 continue;
 
  case BAD_FILE:
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/24] i965: Make backend_reg inherit from brw_reg.

2015-11-02 Thread Matt Turner
Some fields (file, type, abs, negate) in brw_reg are shadowed by
backend_reg.
---
 src/mesa/drivers/dri/i965/brw_shader.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
b/src/mesa/drivers/dri/i965/brw_shader.h
index 6a2dfc9..9a516c3 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -48,16 +48,15 @@ enum PACKED register_file {
UNIFORM, /* prog_data->params[reg] */
 };
 
-struct backend_reg
-{
 #ifdef __cplusplus
+struct backend_reg : public brw_reg
+{
bool is_zero() const;
bool is_one() const;
bool is_negative_one() const;
bool is_null() const;
bool is_accumulator() const;
bool in_range(const backend_reg , unsigned n) const;
-#endif
 
enum register_file file; /**< Register file: GRF, MRF, IMM. */
enum brw_reg_type type;  /**< Register type: BRW_REGISTER_TYPE_* */
@@ -87,6 +86,7 @@ struct backend_reg
bool negate;
bool abs;
 };
+#endif
 
 struct cfg_t;
 struct bblock_t;
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Expose support for OES/EXT_draw_elements_base_vertex to OpenGL ES.

2015-11-02 Thread Ilia Mirkin
On Mon, Nov 2, 2015 at 7:46 PM, Ian Romanick  wrote:
> On 11/01/2015 07:29 PM, Ilia Mirkin wrote:
>> On Sun, Nov 1, 2015 at 10:24 PM, Ryan Houdek  wrote:
>>> This has been tested with the piglits in the mailing list and
>>> on the Dolphin emulator.
>>> ---
>>>  docs/GL3.txt   |  2 +-
>>>  docs/relnotes/11.1.0.html  |  2 +
>>>  .../glapi/gen/ARB_draw_elements_base_vertex.xml|  6 +-
>>>  src/mapi/glapi/gen/es_EXT.xml  | 88 
>>> ++
>>>  src/mesa/main/extensions.c |  2 +
>>>  src/mesa/vbo/vbo_exec_array.c  | 12 ++-
>>>  6 files changed, 105 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/docs/GL3.txt b/docs/GL3.txt
>>> index 7964a5e..7f6b8c9 100644
>>> --- a/docs/GL3.txt
>>> +++ b/docs/GL3.txt
>>> @@ -243,7 +243,7 @@ GLES3.2, GLSL ES 3.2
>>>GL_KHR_texture_compression_astc_ldr  DONE (i965/gen9+)
>>>GL_OES_copy_imagenot started (based 
>>> on GL_ARB_copy_image, which is done for some drivers)
>>>GL_OES_draw_buffers_indexed  not started
>>> -  GL_OES_draw_elements_base_vertex not started (based 
>>> on GL_ARB_draw_elements_base_vertex, which is done for all drivers)
>>> +  GL_OES_draw_elements_base_vertex DONE (all drivers)
>>>GL_OES_geometry_shader   not started (based 
>>> on GL_ARB_geometry_shader4, which is done for all drivers)
>>>GL_OES_gpu_shader5   not started (based 
>>> on parts of GL_ARB_gpu_shader5, which is done for some drivers)
>>>GL_OES_primitive_bounding boxnot started
>>> diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
>>> index 972361f..7160244 100644
>>> --- a/docs/relnotes/11.1.0.html
>>> +++ b/docs/relnotes/11.1.0.html
>>> @@ -55,6 +55,8 @@ Note: some of the new features are only available with 
>>> certain drivers.
>>>  GL_ARB_texture_barrier / GL_NV_texture_barrier on i965
>>>  GL_ARB_texture_query_lod on softpipe
>>>  GL_ARB_texture_view on radeonsi
>>> +GL_EXT_draw_elements_base_vertex on all drivers
>>> +GL_OES_draw_elements_base_vertex on all drivers
>>>  EGL_KHR_create_context on softpipe, llvmpipe
>>>  EGL_KHR_gl_colorspace on softpipe, llvmpipe
>>>  new virgl gallium driver for qemu virtio-gpu
>>> diff --git a/src/mapi/glapi/gen/ARB_draw_elements_base_vertex.xml 
>>> b/src/mapi/glapi/gen/ARB_draw_elements_base_vertex.xml
>>> index 120bda1..72aa62c 100644
>>> --- a/src/mapi/glapi/gen/ARB_draw_elements_base_vertex.xml
>>> +++ b/src/mapi/glapi/gen/ARB_draw_elements_base_vertex.xml
>>> @@ -8,7 +8,7 @@
>>>
>>>  
>>>
>>> -
>>> +
>>>  
>>>  
>>>  
>>> @@ -16,7 +16,7 @@
>>>  
>>>  
>>>
>>> -
>>> +
>>>  
>>>  
>>>  
>>> @@ -35,7 +35,7 @@
>>>  
>>>  
>>>
>>> -
>>> +>> exec="dynamic">
>>>  
>>>  
>>>  
>>> diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
>>> index cfca5a9..9277285 100644
>>> --- a/src/mapi/glapi/gen/es_EXT.xml
>>> +++ b/src/mapi/glapi/gen/es_EXT.xml
>>> @@ -817,4 +817,92 @@
>>>  
>>>  
>>>
>>> +
>>> +
>>> +   >> alias="DrawElementsBaseVertex"
>>
>> indentation fail
>>
>>> +exec="dynamic">
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +>> alias="DrawRangeElementsBaseVertex"
>>> +exec="dynamic">
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +>> alias="MultiDrawElementsBaseVertex"
>>> +exec="dynamic">
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +>> alias="DrawElementsInstancedBaseVertex"
>>> +exec="dynamic">
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +   >> alias="DrawElementsBaseVertex"
>>
>> indentation fail. actually perhaps it's the rest of the files that are
>> fail... will check what the other bits of it look like.
>>
>>> +exec="dynamic">
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +>> alias="DrawRangeElementsBaseVertex"
>>> +es2="3.0" exec="dynamic">
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +>> alias="MultiDrawElementsBaseVertex"
>>> +exec="dynamic">
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +>> alias="DrawElementsInstancedBaseVertex"
>>> +es2="3.0" exec="dynamic">
>>> +
>>> +
>>> +
>>> +
>>> + 

Re: [Mesa-dev] [PATCH 1/3] mesa: restrict ES2 from 32-bit blending, add GL_EXT_float_blend

2015-11-02 Thread Ilia Mirkin
On Mon, Nov 2, 2015 at 8:07 PM, Ian Romanick  wrote:
> On 11/02/2015 04:50 PM, Ilia Mirkin wrote:
>> GL_EXT_color_buffer_float adds support for float buffers in ES3.0+, but
>> explicitly disallows 32-bit blending. However this restriction was never
>> implemented in mesa.
>>
>> Add the restriction, and also allow a driver to expose
>> GL_EXT_float_blend which re-enables the functionality.
>>
>> Signed-off-by: Ilia Mirkin 
>> ---
>>
>> Untested... looking for confirmation that this is the right thing to
>> do. Will write a piglit if it is.
>
> Some flavor of this probably is the right thing to do.  The question is
> whether or not any hardware supported by Mesa can do
> GL_EXT_color_buffer_float but not GL_EXT_float_blend... if everyone can
> do both or neither, this patch series could be even simpler. :)

Adreno A3xx can't do 32-bit blending, only 16-bit. Sorry!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] Expose support for GL_EXT_buffer_storage

2015-11-02 Thread Ryan Houdek
This extension requires ES 3.1 since it relies on glMemoryBarrier.
For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0
function.
This has been tested with the piglit in the ML and the Dolphin emulator.
---
 docs/relnotes/11.1.0.html | 1 +
 src/mapi/glapi/gen/es_EXT.xml | 9 +
 src/mesa/main/extensions.c| 1 +
 3 files changed, 11 insertions(+)

diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
index 7160244..0b044cf 100644
--- a/docs/relnotes/11.1.0.html
+++ b/docs/relnotes/11.1.0.html
@@ -55,6 +55,7 @@ Note: some of the new features are only available with 
certain drivers.
 GL_ARB_texture_barrier / GL_NV_texture_barrier on i965
 GL_ARB_texture_query_lod on softpipe
 GL_ARB_texture_view on radeonsi
+GL_EXT_buffer_storage implemented for when ES 3.1 support is gained
 GL_EXT_draw_elements_base_vertex on all drivers
 GL_OES_draw_elements_base_vertex on all drivers
 EGL_KHR_create_context on softpipe, llvmpipe
diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
index bf20e48..9a777a2 100644
--- a/src/mapi/glapi/gen/es_EXT.xml
+++ b/src/mapi/glapi/gen/es_EXT.xml
@@ -905,4 +905,13 @@
 
 
 
+
+
+
+
+
+
+
+
+
 
diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index d964f03..bdc6817 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -222,6 +222,7 @@ static const struct extension extension_table[] = {
{ "GL_EXT_blend_color", o(EXT_blend_color), 
GLL,1995 },
{ "GL_EXT_blend_equation_separate", 
o(EXT_blend_equation_separate), GL, 2003 },
{ "GL_EXT_blend_func_separate", o(EXT_blend_func_separate), 
GLL,1999 },
+   { "GL_EXT_buffer_storage",  o(ARB_buffer_storage),  
   ES31, 2015 },
{ "GL_EXT_discard_framebuffer", o(dummy_true),  
  ES1 | ES2, 2009 },
{ "GL_EXT_blend_minmax",o(EXT_blend_minmax),
GLL | ES1 | ES2, 1995 },
{ "GL_EXT_blend_subtract",  o(dummy_true),  
GLL,1995 },
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 21/24] i965/fs: Replace fs_reg(imm) constructors with brw_imm_*().

2015-11-02 Thread Matt Turner
Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor
implementations themselves.

   text data  bss  dec  hex  filename
5204535   21411227784  5446431   531b1f  i965_dri.so before
5193977   21411227784  5435873   52f1e1  i965_dri.so after
---
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp|   2 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp   |  97 +---
 src/mesa/drivers/dri/i965/brw_fs_builder.h |   4 +-
 .../drivers/dri/i965/brw_fs_combine_constants.cpp  |   2 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 123 +++--
 .../drivers/dri/i965/brw_fs_surface_builder.cpp|  48 
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  36 +++---
 src/mesa/drivers/dri/i965/brw_ir_fs.h  |   5 -
 .../drivers/dri/i965/test_fs_cmod_propagation.cpp  |  30 ++---
 9 files changed, 148 insertions(+), 199 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
index 5308d17..e684bdb 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
@@ -85,7 +85,7 @@ brw_blorp_eu_emitter::emit_texture_lookup(const struct 
brw_reg ,
   unsigned msg_length)
 {
fs_inst *inst = new (mem_ctx) fs_inst(op, 16, dst, 
brw_message_reg(base_mrf),
- fs_reg(0u));
+ brw_imm_ud(0u));
 
inst->base_mrf = base_mrf;
inst->mlen = msg_length;
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 3d2b051..9cd4f47 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -185,7 +185,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
,
 * the redundant ones.
 */
fs_reg vec4_offset = vgrf(glsl_type::int_type);
-   bld.ADD(vec4_offset, varying_offset, fs_reg(const_offset & ~3));
+   bld.ADD(vec4_offset, varying_offset, brw_imm_ud(const_offset & ~3));
 
int scale = 1;
if (devinfo->gen == 4 && bld.dispatch_width() == 8) {
@@ -372,54 +372,6 @@ fs_reg::fs_reg()
this->file = BAD_FILE;
 }
 
-/** Immediate value constructor. */
-fs_reg::fs_reg(float f)
-{
-   init();
-   this->file = IMM;
-   this->type = BRW_REGISTER_TYPE_F;
-   this->stride = 0;
-   this->f = f;
-}
-
-/** Immediate value constructor. */
-fs_reg::fs_reg(int32_t i)
-{
-   init();
-   this->file = IMM;
-   this->type = BRW_REGISTER_TYPE_D;
-   this->stride = 0;
-   this->d = i;
-}
-
-/** Immediate value constructor. */
-fs_reg::fs_reg(uint32_t u)
-{
-   init();
-   this->file = IMM;
-   this->type = BRW_REGISTER_TYPE_UD;
-   this->stride = 0;
-   this->ud = u;
-}
-
-/** Vector float immediate value constructor. */
-fs_reg::fs_reg(uint8_t vf[4])
-{
-   init();
-   this->file = IMM;
-   this->type = BRW_REGISTER_TYPE_VF;
-   memcpy(>ud, vf, sizeof(unsigned));
-}
-
-/** Vector float immediate value constructor. */
-fs_reg::fs_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, uint8_t vf3)
-{
-   init();
-   this->file = IMM;
-   this->type = BRW_REGISTER_TYPE_VF;
-   this->ud = (vf0 <<  0) | (vf1 <<  8) | (vf2 << 16) | (vf3 << 24);
-}
-
 fs_reg::fs_reg(struct brw_reg reg) :
backend_reg(reg)
 {
@@ -575,7 +527,7 @@ fs_visitor::emit_shader_time_end()
fs_reg reset = shader_end_time;
reset.set_smear(2);
set_condmod(BRW_CONDITIONAL_Z,
-   ibld.AND(ibld.null_reg_ud(), reset, fs_reg(1u)));
+   ibld.AND(ibld.null_reg_ud(), reset, brw_imm_ud(1u)));
ibld.IF(BRW_PREDICATE_NORMAL);
 
fs_reg start = shader_start_time;
@@ -590,11 +542,11 @@ fs_visitor::emit_shader_time_end()
 * is 2 cycles.  Remove that overhead, so I can forget about that when
 * trying to determine the time taken for single instructions.
 */
-   cbld.ADD(diff, diff, fs_reg(-2u));
+   cbld.ADD(diff, diff, brw_imm_ud(-2u));
SHADER_TIME_ADD(cbld, 0, diff);
-   SHADER_TIME_ADD(cbld, 1, fs_reg(1u));
+   SHADER_TIME_ADD(cbld, 1, brw_imm_ud(1u));
ibld.emit(BRW_OPCODE_ELSE);
-   SHADER_TIME_ADD(cbld, 2, fs_reg(1u));
+   SHADER_TIME_ADD(cbld, 2, brw_imm_ud(1u));
ibld.emit(BRW_OPCODE_ENDIF);
 }
 
@@ -604,7 +556,7 @@ fs_visitor::SHADER_TIME_ADD(const fs_builder ,
 fs_reg value)
 {
int index = shader_time_index * 3 + shader_time_subindex;
-   fs_reg offset = fs_reg(index * SHADER_TIME_STRIDE);
+   struct brw_reg offset = brw_imm_d(index * SHADER_TIME_STRIDE);
 
fs_reg payload;
if (dispatch_width == 8)
@@ -983,7 +935,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
pixel_center_integer,
if (pixel_center_integer) {
   bld.MOV(wpos, this->pixel_x);
} else {
-  bld.ADD(wpos, this->pixel_x, fs_reg(0.5f));
+  bld.ADD(wpos, this->pixel_x, brw_imm_f(0.5f));
}
wpos = offset(wpos, bld, 1);
 
@@ -999,7 +951,7 @@ 

Re: [Mesa-dev] [PATCH 0/7] More i965 scheduling improvements

2015-11-02 Thread Jason Ekstrand
So, I noticed while running benchmarks that there seem to be some
significant bugs somewhere.  In particular, Unigen Heaven misrenders
on Haswell.  One of the synmark tests also misrenders.  Initially I
thought it was a spilling bug, but I'm also seeing misrendering on the
sethi ullman patch.
--Jason

On Fri, Oct 30, 2015 at 6:02 PM, Connor Abbott  wrote:
> This series implements some more aggressive scheduler changes based on
> the original series that I sent out and now has been merged to master.
> In particular, it rewrites the scheduler to be bottom-up and top-down,
> and gives it a fancy new strategy involving a combination of limit
> scheduling and Sethi-Ullman numbering in order to tackle
> register-pressure-limited scenarios while still providing good
> parallelism otherwise. It also fixes some serious shortcomings in the
> low register pressure case, making us actually use the critical path
> information we were already computing before and using a better strategy
> to minimize stalls. Finally, it changes the heuristic for whether we
> should drop SIMD16 to something that, while probably not perfect, is
> probably still better than what we had previously.
>
> While each patch has shader-db numbers, it's a little hard to see the
> forest from the trees while looking at each change individually, so here
> are the shader-db numbers for the entire series on bdw, created using my
> shader-db patch [1]:
>
> total instructions in shared programs: 7392779 -> 7386851 (-0.08%)
> instructions in affected programs: 24443 -> 18515 (-24.25%)
> helped: 15
> HURT: 0
>
> total cycles in shared programs: 56128804 -> 48572820 (-13.46%)
> cycles in affected programs: 54357022 -> 46801038 (-13.90%)
> helped: 60142
> HURT: 801
>
> LOST:   392
> GAINED: 59
>
> But note that most of the SIMD16 shaders we lost were bad SIMD16 shaders
> that probably wouldn't have helped us; the remaining gained shaders went
> from spilling and thrown out to actually useful. And, of course, the
> intervening patches ensure that many more SIMD16 programs are considered
> "useful."
>
> Notably, after this series, there are no more SIMD8 programs in my
> shader-db that spill anymore!
>
> Note that this series is a little different from the one that some
> people have been looking at before. In particular, I dropped an attempt
> to replace the LIFO heuristic that turned out to not be useful at all by
> the end (instead, I just nuked it), and I fixed a slight issue with how
> the amount above the register pressure threshold was being computed in
> the Sethi-Ullman patch. As this is the first version I'm actually
> sending out for review, I didn't bother to mark those changes in the
> commit messages. It's probably worth re-doing the benchmarks, since now
> my shader-db shows no regressions or lost SIMD16 shaders in any SynMark
> benchmark for whatever reason.
>
> This series is also available at
> git://people.freedesktop.org/~cwabbott0/mesa i965-sched-v3
>
> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097431.html
>
> Connor Abbott (7):
>   i965: use real latencies in the pre-RA scheduler
>   i965/sched: use a critical path heuristic
>   i965/sched: get rid of the LIFO heuristic
>   i965/sched: switch to register pressure scheduling dynamically
>   i965/sched: switch to bottom-up scheduling
>   i965/sched: use Sethi-Ullman numbering
>   i965/fs: use a better heuristic for SIMD16
>
>  src/mesa/drivers/dri/i965/brw_fs.cpp   |  59 +--
>  src/mesa/drivers/dri/i965/brw_fs.h |   8 +-
>  .../drivers/dri/i965/brw_schedule_instructions.cpp | 443 
> +
>  src/mesa/drivers/dri/i965/brw_shader.h |   1 -
>  4 files changed, 313 insertions(+), 198 deletions(-)
>
> --
> 2.4.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Expose support for OES/EXT_draw_elements_base_vertex to OpenGL ES.

2015-11-02 Thread Ian Romanick
On 11/01/2015 07:29 PM, Ilia Mirkin wrote:
> On Sun, Nov 1, 2015 at 10:24 PM, Ryan Houdek  wrote:
>> This has been tested with the piglits in the mailing list and
>> on the Dolphin emulator.
>> ---
>>  docs/GL3.txt   |  2 +-
>>  docs/relnotes/11.1.0.html  |  2 +
>>  .../glapi/gen/ARB_draw_elements_base_vertex.xml|  6 +-
>>  src/mapi/glapi/gen/es_EXT.xml  | 88 
>> ++
>>  src/mesa/main/extensions.c |  2 +
>>  src/mesa/vbo/vbo_exec_array.c  | 12 ++-
>>  6 files changed, 105 insertions(+), 7 deletions(-)
>>
>> diff --git a/docs/GL3.txt b/docs/GL3.txt
>> index 7964a5e..7f6b8c9 100644
>> --- a/docs/GL3.txt
>> +++ b/docs/GL3.txt
>> @@ -243,7 +243,7 @@ GLES3.2, GLSL ES 3.2
>>GL_KHR_texture_compression_astc_ldr  DONE (i965/gen9+)
>>GL_OES_copy_imagenot started (based 
>> on GL_ARB_copy_image, which is done for some drivers)
>>GL_OES_draw_buffers_indexed  not started
>> -  GL_OES_draw_elements_base_vertex not started (based 
>> on GL_ARB_draw_elements_base_vertex, which is done for all drivers)
>> +  GL_OES_draw_elements_base_vertex DONE (all drivers)
>>GL_OES_geometry_shader   not started (based 
>> on GL_ARB_geometry_shader4, which is done for all drivers)
>>GL_OES_gpu_shader5   not started (based 
>> on parts of GL_ARB_gpu_shader5, which is done for some drivers)
>>GL_OES_primitive_bounding boxnot started
>> diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
>> index 972361f..7160244 100644
>> --- a/docs/relnotes/11.1.0.html
>> +++ b/docs/relnotes/11.1.0.html
>> @@ -55,6 +55,8 @@ Note: some of the new features are only available with 
>> certain drivers.
>>  GL_ARB_texture_barrier / GL_NV_texture_barrier on i965
>>  GL_ARB_texture_query_lod on softpipe
>>  GL_ARB_texture_view on radeonsi
>> +GL_EXT_draw_elements_base_vertex on all drivers
>> +GL_OES_draw_elements_base_vertex on all drivers
>>  EGL_KHR_create_context on softpipe, llvmpipe
>>  EGL_KHR_gl_colorspace on softpipe, llvmpipe
>>  new virgl gallium driver for qemu virtio-gpu
>> diff --git a/src/mapi/glapi/gen/ARB_draw_elements_base_vertex.xml 
>> b/src/mapi/glapi/gen/ARB_draw_elements_base_vertex.xml
>> index 120bda1..72aa62c 100644
>> --- a/src/mapi/glapi/gen/ARB_draw_elements_base_vertex.xml
>> +++ b/src/mapi/glapi/gen/ARB_draw_elements_base_vertex.xml
>> @@ -8,7 +8,7 @@
>>
>>  
>>
>> -
>> +
>>  
>>  
>>  
>> @@ -16,7 +16,7 @@
>>  
>>  
>>
>> -
>> +
>>  
>>  
>>  
>> @@ -35,7 +35,7 @@
>>  
>>  
>>
>> -
>> +> exec="dynamic">
>>  
>>  
>>  
>> diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
>> index cfca5a9..9277285 100644
>> --- a/src/mapi/glapi/gen/es_EXT.xml
>> +++ b/src/mapi/glapi/gen/es_EXT.xml
>> @@ -817,4 +817,92 @@
>>  
>>  
>>
>> +
>> +
>> +   > alias="DrawElementsBaseVertex"
> 
> indentation fail
> 
>> +exec="dynamic">
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +> alias="DrawRangeElementsBaseVertex"
>> +exec="dynamic">
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +> alias="MultiDrawElementsBaseVertex"
>> +exec="dynamic">
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +> alias="DrawElementsInstancedBaseVertex"
>> +exec="dynamic">
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +   > alias="DrawElementsBaseVertex"
> 
> indentation fail. actually perhaps it's the rest of the files that are
> fail... will check what the other bits of it look like.
> 
>> +exec="dynamic">
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +> alias="DrawRangeElementsBaseVertex"
>> +es2="3.0" exec="dynamic">
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +> alias="MultiDrawElementsBaseVertex"
>> +exec="dynamic">
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +> alias="DrawElementsInstancedBaseVertex"
>> +es2="3.0" exec="dynamic">
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>>  
>> diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
>> index 64972fa..d964f03 100644
>> --- a/src/mesa/main/extensions.c
>> +++ b/src/mesa/main/extensions.c
>> @@ -230,6 +230,7 @@ static 

[Mesa-dev] [PATCH 2/3] st/mesa: enable GL_EXT_blend_float

2015-11-02 Thread Ilia Mirkin
If the driver supports PIPE_BIND_BLENABLE on RGBA32F, flip
EXT_blend_float on (which will affect ES3 contexts).

Signed-off-by: Ilia Mirkin 
---
 src/mesa/state_tracker/st_extensions.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index bd7cbcc..e69cd3a 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -533,6 +533,12 @@ void st_init_extensions(struct pipe_screen *screen,
   PIPE_FORMAT_R8G8_UNORM } },
};
 
+   /* Required: render target, sampler, and blending */
+   static const struct st_extension_format_mapping rt_blendable[] = {
+  { { o(EXT_float_blend) },
+{ PIPE_FORMAT_R32G32B32A32_FLOAT } },
+   };
+
/* Required: depth stencil and sampler support */
static const struct st_extension_format_mapping depthstencil_mapping[] = {
   { { o(ARB_depth_buffer_float) },
@@ -677,6 +683,10 @@ void st_init_extensions(struct pipe_screen *screen,
init_format_extensions(screen, extensions, rendertarget_mapping,
   ARRAY_SIZE(rendertarget_mapping), PIPE_TEXTURE_2D,
   PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW);
+   init_format_extensions(screen, extensions, rt_blendable,
+  ARRAY_SIZE(rt_blendable), PIPE_TEXTURE_2D,
+  PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+  PIPE_BIND_BLENDABLE);
init_format_extensions(screen, extensions, depthstencil_mapping,
   ARRAY_SIZE(depthstencil_mapping), PIPE_TEXTURE_2D,
   PIPE_BIND_DEPTH_STENCIL | PIPE_BIND_SAMPLER_VIEW);
-- 
2.4.10

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] i965: always enable EXT_float_blend

2015-11-02 Thread Ilia Mirkin
From the table in brw_surface_formats.c, it appears that all generations
support blending on RGBA32F surfaces.

Signed-off-by: Ilia Mirkin 
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 4643ea3..26c8f28 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -224,6 +224,7 @@ intelInitExtensions(struct gl_context *ctx)
ctx->Extensions.EXT_blend_func_separate = true;
ctx->Extensions.EXT_blend_minmax = true;
ctx->Extensions.EXT_draw_buffers2 = true;
+   ctx->Extensions.EXT_float_blend = true;
ctx->Extensions.EXT_framebuffer_sRGB = true;
ctx->Extensions.EXT_gpu_program_parameters = true;
ctx->Extensions.EXT_packed_float = true;
-- 
2.4.10

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] mesa: restrict ES2 from 32-bit blending, add GL_EXT_float_blend

2015-11-02 Thread Ilia Mirkin
GL_EXT_color_buffer_float adds support for float buffers in ES3.0+, but
explicitly disallows 32-bit blending. However this restriction was never
implemented in mesa.

Add the restriction, and also allow a driver to expose
GL_EXT_float_blend which re-enables the functionality.

Signed-off-by: Ilia Mirkin 
---

Untested... looking for confirmation that this is the right thing to
do. Will write a piglit if it is.

 src/mesa/main/api_validate.c | 46 ++--
 src/mesa/main/extensions.c   |  3 ++-
 src/mesa/main/fbobject.c |  8 +++-
 src/mesa/main/mtypes.h   |  2 ++
 4 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index 06efe02..a44299d 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -46,10 +46,52 @@ check_valid_to_render(struct gl_context *ctx, const char 
*function)
}
 
switch (ctx->API) {
-   case API_OPENGLES2:
+   case API_OPENGLES2: {
+  struct gl_framebuffer *fb = ctx->DrawBuffer;
+  int i;
+
   /* For ES2, we can draw if we have a vertex program/shader). */
-  return ctx->VertexProgram._Current != NULL;
+  if (ctx->VertexProgram._Current == NULL)
+ return false;
 
+  /* From GL_EXT_color_buffer_float:
+   *
+   * "Blending applies only if the color buffer has a fixed-point or
+   * or floating-point format. If the color buffer has an integer
+   * format, proceed to the next operation.  Furthermore, an
+   * INVALID_OPERATION error is generated by DrawArrays and the other
+   * drawing commands defined in section 2.8.3 (10.5 in ES 3.1) if
+   * blending is enabled (see below) and any draw buffer has 32-bit
+   * floating-point format components."
+   *
+   * However GL_EXT_float_blend removes this text.
+   */
+  if (fb->_Has32bitFloatColorBuffer &&
+  ctx->Color.BlendEnabled &&
+  !ctx->Extensions.EXT_float_blend) {
+ bool per_buffer =
+ctx->Color._BlendFuncPerBuffer ||
+ctx->Color._BlendEquationPerBuffer ||
+(ctx->Color.BlendEnabled &&
+ (ctx->Color.BlendEnabled != ((1U << ctx->Const.MaxDrawBuffers) - 
1)));
+ if (!per_buffer && !(ctx->Color.BlendEnabled & 1))
+break;
+ for (i = 0; i < fb->_NumColorDrawBuffers; i++) {
+struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[i];
+if (!rb)
+   continue;
+if (!(ctx->Color.BlendEnabled & (1U << i)))
+   continue;
+if (_mesa_get_format_datatype(rb->Format) == GL_FLOAT &&
+_mesa_get_format_max_bits(rb->Format) > 16) {
+   _mesa_error(ctx, GL_INVALID_OPERATION,
+   "%s(32-bit float output + blending)", function);
+   return false;
+}
+ }
+  }
+  break;
+   }
case API_OPENGLES:
   /* For OpenGL ES, only draw if we have vertex positions
*/
diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 64972fa..5b9e538 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -225,6 +225,7 @@ static const struct extension extension_table[] = {
{ "GL_EXT_discard_framebuffer", o(dummy_true),  
  ES1 | ES2, 2009 },
{ "GL_EXT_blend_minmax",o(EXT_blend_minmax),
GLL | ES1 | ES2, 1995 },
{ "GL_EXT_blend_subtract",  o(dummy_true),  
GLL,1995 },
+   { "GL_EXT_color_buffer_float",  o(dummy_true),  
   ES3, 2013 },
{ "GL_EXT_compiled_vertex_array",   o(dummy_true),  
GLL,1996 },
{ "GL_EXT_copy_texture",o(dummy_true),  
GLL,1995 },
{ "GL_EXT_depth_bounds_test",   o(EXT_depth_bounds_test),   
GL, 2002 },
@@ -232,6 +233,7 @@ static const struct extension extension_table[] = {
{ "GL_EXT_draw_buffers2",   o(EXT_draw_buffers2),   
GL, 2006 },
{ "GL_EXT_draw_instanced",  o(ARB_draw_instanced),  
GL, 2006 },
{ "GL_EXT_draw_range_elements", o(dummy_true),  
GLL,1997 },
+   { "GL_EXT_float_blend", o(EXT_float_blend), 
   ES3, 2015 },
{ "GL_EXT_fog_coord",   o(dummy_true),  
GLL,1999 },
{ "GL_EXT_framebuffer_blit",o(dummy_true),  
GL, 2005 },
{ 

Re: [Mesa-dev] [PATCH 1/3] mesa: restrict ES2 from 32-bit blending, add GL_EXT_float_blend

2015-11-02 Thread Ian Romanick
On 11/02/2015 04:50 PM, Ilia Mirkin wrote:
> GL_EXT_color_buffer_float adds support for float buffers in ES3.0+, but
> explicitly disallows 32-bit blending. However this restriction was never
> implemented in mesa.
> 
> Add the restriction, and also allow a driver to expose
> GL_EXT_float_blend which re-enables the functionality.
> 
> Signed-off-by: Ilia Mirkin 
> ---
> 
> Untested... looking for confirmation that this is the right thing to
> do. Will write a piglit if it is.

Some flavor of this probably is the right thing to do.  The question is
whether or not any hardware supported by Mesa can do
GL_EXT_color_buffer_float but not GL_EXT_float_blend... if everyone can
do both or neither, this patch series could be even simpler. :)

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Expose support for GL_EXT_buffer_storage

2015-11-02 Thread Ilia Mirkin
On Mon, Nov 2, 2015 at 8:19 PM, Ryan Houdek  wrote:
> This extension requires ES 3.1 since it relies on glMemoryBarrier.
> For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0
> function.
> This has been tested with the piglit in the ML and the Dolphin emulator.
> ---
>  docs/relnotes/11.1.0.html | 1 +
>  src/mapi/glapi/gen/es_EXT.xml | 9 +
>  src/mesa/main/extensions.c| 1 +
>  3 files changed, 11 insertions(+)
>
> diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
> index 7160244..0b044cf 100644
> --- a/docs/relnotes/11.1.0.html
> +++ b/docs/relnotes/11.1.0.html
> @@ -55,6 +55,7 @@ Note: some of the new features are only available with 
> certain drivers.
>  GL_ARB_texture_barrier / GL_NV_texture_barrier on i965
>  GL_ARB_texture_query_lod on softpipe
>  GL_ARB_texture_view on radeonsi
> +GL_EXT_buffer_storage implemented for when ES 3.1 support is gained
>  GL_EXT_draw_elements_base_vertex on all drivers
>  GL_OES_draw_elements_base_vertex on all drivers
>  EGL_KHR_create_context on softpipe, llvmpipe
> diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
> index bf20e48..9a777a2 100644
> --- a/src/mapi/glapi/gen/es_EXT.xml
> +++ b/src/mapi/glapi/gen/es_EXT.xml
> @@ -905,4 +905,13 @@
>
>  
>
> +
> +
> +
> +
> +
> +
> +
> +
> +
>  
> diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
> index d964f03..6160bb3 100644
> --- a/src/mesa/main/extensions.c
> +++ b/src/mesa/main/extensions.c
> @@ -222,6 +222,7 @@ static const struct extension extension_table[] = {
> { "GL_EXT_blend_color", o(EXT_blend_color),   
>   GLL,1995 },
> { "GL_EXT_blend_equation_separate", 
> o(EXT_blend_equation_separate), GL, 2003 },
> { "GL_EXT_blend_func_separate", 
> o(EXT_blend_func_separate), GLL,1999 },
> +   { "GL_EXT_buffer_storage",  o(ARB_buffer_storage),
>   ES2, 2015 },

ES31 here. Unlike for desktop GL, we tend to be more careful about
where we expose exts... esp because a bunch of drivers support
ARB_buffer_storage but won't be getting ES3.1 anytime soon... or ever.

> { "GL_EXT_discard_framebuffer", o(dummy_true),
> ES1 | ES2, 2009 },
> { "GL_EXT_blend_minmax",o(EXT_blend_minmax),  
>   GLL | ES1 | ES2, 1995 },
> { "GL_EXT_blend_subtract",  o(dummy_true),
>   GLL,1995 },
> --
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 14/24] i965: Rename GRF to VGRF.

2015-11-02 Thread Matt Turner
The 2-bit hardware register file field is ARF, GRF, MRF, IMM.

Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 104 ++---
 src/mesa/drivers/dri/i965/brw_fs.h |   2 +-
 src/mesa/drivers/dri/i965/brw_fs_builder.h |   4 +-
 .../drivers/dri/i965/brw_fs_cmod_propagation.cpp   |   2 +-
 .../drivers/dri/i965/brw_fs_combine_constants.cpp  |   4 +-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |  26 +++---
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |   6 +-
 .../dri/i965/brw_fs_dead_code_eliminate.cpp|   6 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |   4 +-
 .../drivers/dri/i965/brw_fs_live_variables.cpp |   6 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |   4 +-
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |  24 ++---
 .../drivers/dri/i965/brw_fs_register_coalesce.cpp  |   8 +-
 .../dri/i965/brw_fs_saturate_propagation.cpp   |   6 +-
 src/mesa/drivers/dri/i965/brw_fs_validate.cpp  |   4 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  16 ++--
 src/mesa/drivers/dri/i965/brw_ir_fs.h  |   6 +-
 .../drivers/dri/i965/brw_schedule_instructions.cpp |  26 +++---
 src/mesa/drivers/dri/i965/brw_shader.h |   4 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp |  32 +++
 src/mesa/drivers/dri/i965/brw_vec4_builder.h   |   2 +-
 .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp |   2 +-
 .../drivers/dri/i965/brw_vec4_copy_propagation.cpp |  14 +--
 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp |   4 +-
 .../dri/i965/brw_vec4_dead_code_eliminate.cpp  |   8 +-
 .../drivers/dri/i965/brw_vec4_live_variables.cpp   |   8 +-
 .../drivers/dri/i965/brw_vec4_live_variables.h |   4 +-
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |   8 +-
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp |  26 +++---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  16 ++--
 30 files changed, 193 insertions(+), 193 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index fecac13..91eaf61 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -75,7 +75,7 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const 
fs_reg ,
 
/* This will be the case for almost all instructions. */
switch (dst.file) {
-   case GRF:
+   case VGRF:
case HW_REG:
case MRF:
case ATTR:
@@ -203,7 +203,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
,
   op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD;
 
int regs_written = 4 * (bld.dispatch_width() / 8) * scale;
-   fs_reg vec4_result = fs_reg(GRF, alloc.allocate(regs_written), dst.type);
+   fs_reg vec4_result = fs_reg(VGRF, alloc.allocate(regs_written), dst.type);
fs_inst *inst = bld.emit(op, vec4_result, surf_index, vec4_offset);
inst->regs_written = regs_written;
 
@@ -232,7 +232,7 @@ fs_visitor::DEP_RESOLVE_MOV(const fs_builder , int grf)
const fs_builder ubld = bld.annotate("send dependency resolve")
   .half(0);
 
-   ubld.MOV(ubld.null_reg_f(), fs_reg(GRF, grf, BRW_REGISTER_TYPE_F));
+   ubld.MOV(ubld.null_reg_f(), fs_reg(VGRF, grf, BRW_REGISTER_TYPE_F));
 }
 
 bool
@@ -285,12 +285,12 @@ fs_inst::is_send_from_grf() const
case SHADER_OPCODE_URB_READ_SIMD8:
   return true;
case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
-  return src[1].file == GRF;
+  return src[1].file == VGRF;
case FS_OPCODE_FB_WRITE:
-  return src[0].file == GRF;
+  return src[0].file == VGRF;
default:
   if (is_tex())
- return src[0].file == GRF;
+ return src[0].file == VGRF;
 
   return false;
}
@@ -303,7 +303,7 @@ fs_inst::is_copy_payload(const brw::simple_allocator 
_alloc) const
   return false;
 
fs_reg reg = this->src[0];
-   if (reg.file != GRF || reg.reg_offset != 0 || reg.stride == 0)
+   if (reg.file != VGRF || reg.reg_offset != 0 || reg.stride == 0)
   return false;
 
if (grf_alloc.sizes[reg.nr] != this->regs_written)
@@ -526,7 +526,7 @@ fs_visitor::get_timestamp(const fs_builder )
   0),
  BRW_REGISTER_TYPE_UD));
 
-   fs_reg dst = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD);
+   fs_reg dst = fs_reg(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD);
 
/* We want to read the 3 fields we care about even if it's not enabled in
 * the dispatch.
@@ -581,7 +581,7 @@ fs_visitor::emit_shader_time_end()
 
fs_reg start = shader_start_time;
start.negate = true;
-   fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD);
+   fs_reg diff = fs_reg(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD);
diff.set_smear(0);
 
const fs_builder cbld = ibld.group(1, 0);
@@ -822,7 +822,7 @@ fs_inst::regs_read(int arg) const
   

[Mesa-dev] [PATCH 23/24] i965/vec4: Replace src_reg(imm) constructors with brw_imm_*().

2015-11-02 Thread Matt Turner
Cuts 1.5k of .text.
---
 src/mesa/drivers/dri/i965/brw_ir_vec4.h|  5 --
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 67 +++
 src/mesa/drivers/dri/i965/brw_vec4_builder.h   |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp  | 38 +
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 82 +--
 .../drivers/dri/i965/brw_vec4_surface_builder.cpp  |  8 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 95 +++---
 src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp  | 20 ++---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp  | 78 +-
 .../dri/i965/test_vec4_cmod_propagation.cpp| 38 -
 .../dri/i965/test_vec4_copy_propagation.cpp|  2 +-
 .../dri/i965/test_vec4_register_coalesce.cpp   |  4 +-
 12 files changed, 198 insertions(+), 241 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h 
b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
index 096825d..9b5b10c 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
@@ -41,11 +41,6 @@ public:
 
src_reg(enum brw_reg_file file, int nr, const glsl_type *type);
src_reg();
-   src_reg(float f);
-   src_reg(uint32_t u);
-   src_reg(int32_t i);
-   src_reg(uint8_t vf[4]);
-   src_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, uint8_t vf3);
src_reg(struct brw_reg reg);
 
bool equals(const src_reg ) const;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index ff9ef81..53da42c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -71,51 +71,6 @@ src_reg::src_reg()
init();
 }
 
-src_reg::src_reg(float f)
-{
-   init();
-
-   this->file = IMM;
-   this->type = BRW_REGISTER_TYPE_F;
-   this->f = f;
-}
-
-src_reg::src_reg(uint32_t u)
-{
-   init();
-
-   this->file = IMM;
-   this->type = BRW_REGISTER_TYPE_UD;
-   this->ud = u;
-}
-
-src_reg::src_reg(int32_t i)
-{
-   init();
-
-   this->file = IMM;
-   this->type = BRW_REGISTER_TYPE_D;
-   this->d = i;
-}
-
-src_reg::src_reg(uint8_t vf[4])
-{
-   init();
-
-   this->file = IMM;
-   this->type = BRW_REGISTER_TYPE_VF;
-   memcpy(>ud, vf, sizeof(unsigned));
-}
-
-src_reg::src_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, uint8_t vf3)
-{
-   init();
-
-   this->file = IMM;
-   this->type = BRW_REGISTER_TYPE_VF;
-   this->ud = (vf0 <<  0) | (vf1 <<  8) | (vf2 << 16) | (vf3 << 24);
-}
-
 src_reg::src_reg(struct brw_reg reg) :
backend_reg(reg)
 {
@@ -387,7 +342,9 @@ vec4_visitor::opt_vector_float()
 
   remaining_channels &= ~inst->dst.writemask;
   if (remaining_channels == 0) {
- vec4_instruction *mov = MOV(inst->dst, imm);
+ unsigned vf;
+ memcpy(, imm, sizeof(vf));
+ vec4_instruction *mov = MOV(inst->dst, brw_imm_vf(vf));
  mov->dst.type = BRW_REGISTER_TYPE_F;
  mov->dst.writemask = WRITEMASK_XYZW;
  inst->insert_after(block, mov);
@@ -662,13 +619,13 @@ vec4_visitor::opt_algebraic()
inst->opcode = BRW_OPCODE_MOV;
switch (inst->src[0].type) {
case BRW_REGISTER_TYPE_F:
-  inst->src[0] = src_reg(0.0f);
+  inst->src[0] = brw_imm_f(0.0f);
   break;
case BRW_REGISTER_TYPE_D:
-  inst->src[0] = src_reg(0);
+  inst->src[0] = brw_imm_d(0);
   break;
case BRW_REGISTER_TYPE_UD:
-  inst->src[0] = src_reg(0u);
+  inst->src[0] = brw_imm_ud(0u);
   break;
default:
   unreachable("not reached");
@@ -1237,7 +1194,7 @@ vec4_visitor::eliminate_find_live_channel()
   case SHADER_OPCODE_FIND_LIVE_CHANNEL:
  if (depth == 0) {
 inst->opcode = BRW_OPCODE_MOV;
-inst->src[0] = src_reg(0);
+inst->src[0] = brw_imm_d(0);
 inst->force_writemask_all = true;
 progress = true;
  }
@@ -1703,7 +1660,7 @@ vec4_visitor::emit_shader_time_end()
 */
src_reg reset_end = shader_end_time;
reset_end.swizzle = BRW_SWIZZLE_;
-   vec4_instruction *test = emit(AND(dst_null_d(), reset_end, src_reg(1u)));
+   vec4_instruction *test = emit(AND(dst_null_ud(), reset_end, 
brw_imm_ud(1u)));
test->conditional_mod = BRW_CONDITIONAL_Z;
 
emit(IF(BRW_PREDICATE_NORMAL));
@@ -1717,12 +1674,12 @@ vec4_visitor::emit_shader_time_end()
 * is 2 cycles.  Remove that overhead, so I can forget about that when
 * trying to determine the time taken for single instructions.
 */
-   emit(ADD(diff, src_reg(diff), src_reg(-2u)));
+   emit(ADD(diff, src_reg(diff), brw_imm_ud(-2u)));
 
emit_shader_time_write(0, src_reg(diff));
-   emit_shader_time_write(1, src_reg(1u));
+   emit_shader_time_write(1, brw_imm_ud(1u));
emit(BRW_OPCODE_ELSE);
-   emit_shader_time_write(2, src_reg(1u));
+   emit_shader_time_write(2, brw_imm_ud(1u));

[Mesa-dev] [PATCH 18/24] i965: Combine register file field.

2015-11-02 Thread Matt Turner
The first four values (2-bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.
---
 src/mesa/drivers/dri/i965/brw_defines.h | 11 +++
 src/mesa/drivers/dri/i965/brw_fs.cpp|  5 ++---
 src/mesa/drivers/dri/i965/brw_ir_fs.h   |  4 ++--
 src/mesa/drivers/dri/i965/brw_ir_vec4.h |  8 
 src/mesa/drivers/dri/i965/brw_reg.h |  4 ++--
 src/mesa/drivers/dri/i965/brw_shader.h  | 13 -
 src/mesa/drivers/dri/i965/brw_vec4.cpp  | 10 +-
 7 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 00464e2..5ab77bc 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1376,6 +1376,17 @@ enum PACKED brw_reg_file {
BRW_GENERAL_REGISTER_FILE  = 1,
BRW_MESSAGE_REGISTER_FILE  = 2,
BRW_IMMEDIATE_VALUE= 3,
+
+   ARF = BRW_ARCHITECTURE_REGISTER_FILE,
+   GRF = BRW_GENERAL_REGISTER_FILE,
+   MRF = BRW_MESSAGE_REGISTER_FILE,
+   IMM = BRW_IMMEDIATE_VALUE,
+
+   /* These are not hardware values */
+   VGRF,
+   ATTR,
+   UNIFORM, /* prog_data->params[reg] */
+   BAD_FILE,
 };
 
 #define BRW_HW_REG_TYPE_UD  0
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 6eeafd5..3d2b051 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -423,7 +423,6 @@ fs_reg::fs_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, 
uint8_t vf3)
 fs_reg::fs_reg(struct brw_reg reg) :
backend_reg(reg)
 {
-   this->file = (enum register_file)reg.file;
this->reg_offset = 0;
this->subreg_offset = 0;
this->reladdr = NULL;
@@ -940,7 +939,7 @@ fs_visitor::vgrf(const glsl_type *const type)
  brw_type_for_base_type(type));
 }
 
-fs_reg::fs_reg(enum register_file file, int nr)
+fs_reg::fs_reg(enum brw_reg_file file, int nr)
 {
init();
this->file = file;
@@ -949,7 +948,7 @@ fs_reg::fs_reg(enum register_file file, int nr)
this->stride = (file == UNIFORM ? 0 : 1);
 }
 
-fs_reg::fs_reg(enum register_file file, int nr, enum brw_reg_type type)
+fs_reg::fs_reg(enum brw_reg_file file, int nr, enum brw_reg_type type)
 {
init();
this->file = file;
diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
b/src/mesa/drivers/dri/i965/brw_ir_fs.h
index 3d038f8..d38764e 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
@@ -42,8 +42,8 @@ public:
explicit fs_reg(uint8_t vf[4]);
explicit fs_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, uint8_t vf3);
fs_reg(struct brw_reg reg);
-   fs_reg(enum register_file file, int nr);
-   fs_reg(enum register_file file, int nr, enum brw_reg_type type);
+   fs_reg(enum brw_reg_file file, int nr);
+   fs_reg(enum brw_reg_file file, int nr, enum brw_reg_type type);
 
bool equals(const fs_reg ) const;
bool is_contiguous() const;
diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h 
b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
index f6c7595..096825d 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
@@ -39,7 +39,7 @@ public:
 
void init();
 
-   src_reg(register_file file, int nr, const glsl_type *type);
+   src_reg(enum brw_reg_file file, int nr, const glsl_type *type);
src_reg();
src_reg(float f);
src_reg(uint32_t u);
@@ -108,10 +108,10 @@ public:
void init();
 
dst_reg();
-   dst_reg(register_file file, int nr);
-   dst_reg(register_file file, int nr, const glsl_type *type,
+   dst_reg(enum brw_reg_file file, int nr);
+   dst_reg(enum brw_reg_file file, int nr, const glsl_type *type,
unsigned writemask);
-   dst_reg(register_file file, int nr, brw_reg_type type,
+   dst_reg(enum brw_reg_file file, int nr, brw_reg_type type,
unsigned writemask);
dst_reg(struct brw_reg reg);
dst_reg(class vec4_visitor *v, const struct glsl_type *type);
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
b/src/mesa/drivers/dri/i965/brw_reg.h
index d86c746..b906892 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -232,11 +232,11 @@ const char *brw_reg_type_letters(unsigned brw_reg_type);
  */
 struct brw_reg {
enum brw_reg_type type:4;
-   enum brw_reg_file file:2;
+   enum brw_reg_file file:3;  /* :2 hardware format */
unsigned negate:1; /* source only */
unsigned abs:1;/* source only */
unsigned address_mode:1;   /* relative addressing, hopefully! */
-   unsigned pad0:2;
+   unsigned pad0:1;
unsigned subnr:5;  /* :1 in align16 */
unsigned nr:16;
 
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
b/src/mesa/drivers/dri/i965/brw_shader.h
index a59c953..31172a8 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -38,17 +38,6 @@
 #define MAX_SAMPLER_MESSAGE_SIZE 11
 #define 

[Mesa-dev] [PATCH 17/24] i965: Replace HW_REG with ARF/GRF.

2015-11-02 Thread Matt Turner
HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.

Instead use ARF/GRF/MRF for fixed registers in those files.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 118 --
 src/mesa/drivers/dri/i965/brw_fs.h |   5 +-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |   3 +-
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |   3 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |   7 +-
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |   5 +-
 src/mesa/drivers/dri/i965/brw_ir_fs.h  |   9 +-
 src/mesa/drivers/dri/i965/brw_ir_vec4.h|   6 +-
 .../drivers/dri/i965/brw_schedule_instructions.cpp |  53 +++-
 src/mesa/drivers/dri/i965/brw_shader.cpp   |   8 +-
 src/mesa/drivers/dri/i965/brw_shader.h |   5 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 137 +
 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp |   3 +-
 13 files changed, 154 insertions(+), 208 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 92a9437..6eeafd5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -76,7 +76,8 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const 
fs_reg ,
/* This will be the case for almost all instructions. */
switch (dst.file) {
case VGRF:
-   case HW_REG:
+   case ARF:
+   case GRF:
case MRF:
case ATTR:
   this->regs_written = DIV_ROUND_UP(dst.component_size(exec_size),
@@ -422,7 +423,7 @@ fs_reg::fs_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, 
uint8_t vf3)
 fs_reg::fs_reg(struct brw_reg reg) :
backend_reg(reg)
 {
-   this->file = HW_REG;
+   this->file = (enum register_file)reg.file;
this->reg_offset = 0;
this->subreg_offset = 0;
this->reladdr = NULL;
@@ -438,24 +439,17 @@ fs_reg::fs_reg(struct brw_reg reg) :
 bool
 fs_reg::equals(const fs_reg ) const
 {
-   return (file == r.file &&
-   nr == r.nr &&
+   return (memcmp((brw_reg *)this, (brw_reg *), sizeof(brw_reg)) == 0 &&
reg_offset == r.reg_offset &&
subreg_offset == r.subreg_offset &&
-   type == r.type &&
-   negate == r.negate &&
-   abs == r.abs &&
!reladdr && !r.reladdr &&
-   (file != HW_REG ||
-memcmp((brw_reg *)this, (brw_reg *), sizeof(brw_reg)) == 0) &&
-   (file != IMM || d == r.d) &&
stride == r.stride);
 }
 
 fs_reg &
 fs_reg::set_smear(unsigned subreg)
 {
-   assert(file != HW_REG && file != IMM);
+   assert(file != ARF && file != GRF && file != IMM);
subreg_offset = subreg * type_sz(type);
stride = 0;
return *this;
@@ -470,7 +464,7 @@ fs_reg::is_contiguous() const
 unsigned
 fs_reg::component_size(unsigned width) const
 {
-   const unsigned stride = (file != HW_REG ? this->stride :
+   const unsigned stride = ((file != ARF && file != GRF) ? this->stride :
 hstride == 0 ? 0 :
 1 << (hstride - 1));
return MAX2(width * stride, 1) * type_sz(type);
@@ -839,9 +833,10 @@ fs_inst::regs_read(int arg) const
case UNIFORM:
case IMM:
   return 1;
+   case ARF:
+   case GRF:
case VGRF:
case ATTR:
-   case HW_REG:
   return DIV_ROUND_UP(components_read(arg) *
   src[arg].component_size(exec_size),
   REG_SIZE);
@@ -1520,12 +1515,12 @@ fs_visitor::assign_urb_setup()
 */
foreach_block_and_inst(block, fs_inst, inst, cfg) {
   if (inst->opcode == FS_OPCODE_LINTERP) {
-assert(inst->src[1].file == HW_REG);
+assert(inst->src[1].file == GRF);
  inst->src[1].nr += urb_start;
   }
 
   if (inst->opcode == FS_OPCODE_CINTERP) {
-assert(inst->src[0].file == HW_REG);
+assert(inst->src[0].file == GRF);
  inst->src[0].nr += urb_start;
   }
}
@@ -2680,7 +2675,7 @@ fs_visitor::emit_repclear_shader()
assign_curb_setup();
 
/* Now that we have the uniform assigned, go ahead and force it to a vec4. 
*/
-   assert(mov->src[0].file == HW_REG);
+   assert(mov->src[0].file == GRF);
mov->src[0] = brw_vec4_grf(mov->src[0].nr, 0);
 }
 
@@ -2759,10 +2754,7 @@ clear_deps_for_inst_src(fs_inst *inst, bool *deps, int 
first_grf, int grf_len)
/* Clear the flag for registers that actually got read (as expected). */
for (int i = 0; i < inst->sources; i++) {
   int grf;
-  if (inst->src[i].file == VGRF) {
- grf = inst->src[i].nr;
-  } else if (inst->src[i].file == HW_REG &&
- inst->src[i].brw_reg::file == BRW_GENERAL_REGISTER_FILE) {
+  if (inst->src[i].file == VGRF || inst->src[i].file == GRF) {
   

[Mesa-dev] [PATCH 08/24] i965: Remove fixed_hw_reg field from backend_reg.

2015-11-02 Thread Matt Turner
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   |  93 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |   9 +-
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |   4 +-
 src/mesa/drivers/dri/i965/brw_ir_fs.h  |   4 +-
 src/mesa/drivers/dri/i965/brw_ir_vec4.h|   4 +-
 .../drivers/dri/i965/brw_schedule_instructions.cpp |  54 +--
 src/mesa/drivers/dri/i965/brw_shader.cpp   |   8 +-
 src/mesa/drivers/dri/i965/brw_shader.h |   5 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 108 +
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp   |  12 +--
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |   2 -
 11 files changed, 141 insertions(+), 162 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 6b9e979..243a4ac 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -422,13 +422,15 @@ fs_reg::fs_reg(uint8_t vf0, uint8_t vf1, uint8_t vf2, 
uint8_t vf3)
(vf3 << 24);
 }
 
-/** Fixed brw_reg. */
-fs_reg::fs_reg(struct brw_reg fixed_hw_reg)
+fs_reg::fs_reg(struct brw_reg reg) :
+   backend_reg(reg)
 {
-   init();
this->file = HW_REG;
-   this->fixed_hw_reg = fixed_hw_reg;
-   this->type = fixed_hw_reg.type;
+   this->reg = 0;
+   this->reg_offset = 0;
+   this->subreg_offset = 0;
+   this->reladdr = NULL;
+   this->stride = 1;
 }
 
 bool
@@ -443,8 +445,7 @@ fs_reg::equals(const fs_reg ) const
abs == r.abs &&
!reladdr && !r.reladdr &&
(file != HW_REG ||
-memcmp(_hw_reg, _hw_reg,
-   sizeof(fixed_hw_reg)) == 0) &&
+memcmp((brw_reg *)this, (brw_reg *), sizeof(brw_reg)) == 0) &&
(file != IMM || d == r.d) &&
stride == r.stride);
 }
@@ -468,8 +469,8 @@ unsigned
 fs_reg::component_size(unsigned width) const
 {
const unsigned stride = (file != HW_REG ? this->stride :
-fixed_hw_reg.hstride == 0 ? 0 :
-1 << (fixed_hw_reg.hstride - 1));
+hstride == 0 ? 0 :
+1 << (hstride - 1));
return MAX2(width * stride, 1) * type_sz(type);
 }
 
@@ -942,7 +943,6 @@ fs_visitor::vgrf(const glsl_type *const type)
  brw_type_for_base_type(type));
 }
 
-/** Fixed HW reg constructor. */
 fs_reg::fs_reg(enum register_file file, int reg)
 {
init();
@@ -952,7 +952,6 @@ fs_reg::fs_reg(enum register_file file, int reg)
this->stride = (file == UNIFORM ? 0 : 1);
 }
 
-/** Fixed HW reg constructor. */
 fs_reg::fs_reg(enum register_file file, int reg, enum brw_reg_type type)
 {
init();
@@ -1400,10 +1399,11 @@ fs_visitor::assign_curb_setup()
struct brw_reg brw_reg = brw_vec1_grf(payload.num_regs +
  constant_nr / 8,
  constant_nr % 8);
+brw_reg.abs = inst->src[i].abs;
+brw_reg.negate = inst->src[i].negate;
 
 assert(inst->src[i].stride == 0);
-   inst->src[i].file = HW_REG;
-   inst->src[i].fixed_hw_reg = byte_offset(
+inst->src[i] = byte_offset(
retype(brw_reg, inst->src[i].type),
inst->src[i].subreg_offset);
 }
@@ -1519,12 +1519,12 @@ fs_visitor::assign_urb_setup()
foreach_block_and_inst(block, fs_inst, inst, cfg) {
   if (inst->opcode == FS_OPCODE_LINTERP) {
 assert(inst->src[1].file == HW_REG);
-inst->src[1].fixed_hw_reg.nr += urb_start;
+ inst->src[1].nr += urb_start;
   }
 
   if (inst->opcode == FS_OPCODE_CINTERP) {
 assert(inst->src[0].file == HW_REG);
-inst->src[0].fixed_hw_reg.nr += urb_start;
+ inst->src[0].nr += urb_start;
   }
}
 
@@ -1556,12 +1556,15 @@ fs_visitor::assign_vs_urb_setup()
   inst->src[i].reg +
   inst->src[i].reg_offset;
 
-inst->src[i].file = HW_REG;
-inst->src[i].fixed_hw_reg =
+struct brw_reg reg =
stride(byte_offset(retype(brw_vec8_grf(grf, 0), 
inst->src[i].type),
   inst->src[i].subreg_offset),
   inst->exec_size * inst->src[i].stride,
   inst->exec_size, inst->src[i].stride);
+reg.abs = inst->src[i].abs;
+reg.negate = inst->src[i].negate;
+
+inst->src[i] = reg;
  }
   }
}
@@ -2676,7 +2679,7 @@ fs_visitor::emit_repclear_shader()
 
/* Now that we have the uniform assigned, go ahead and force it to a vec4. 
*/
assert(mov->src[0].file == HW_REG);
-   mov->src[0] = brw_vec4_grf(mov->src[0].fixed_hw_reg.nr, 0);
+   mov->src[0] = brw_vec4_grf(mov->src[0].nr, 0);
 }
 
 /**
@@ -2757,8 +2760,8 @@ clear_deps_for_inst_src(fs_inst *inst, bool *deps, 

[Mesa-dev] [PATCH 12/24] i965: Initialize registers' file to BAD_FILE.

2015-11-02 Thread Matt Turner
The test (file == BAD_FILE) works on registers for which the constructor
has not run because BAD_FILE is zero.  The next commit will move
BAD_FILE in the enum so that it's no longer zero.
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 10 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  3 +++
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp   |  9 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 7eeff93..611347c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -260,6 +260,10 @@ void
 fs_visitor::nir_emit_system_values()
 {
nir_system_values = ralloc_array(mem_ctx, fs_reg, SYSTEM_VALUE_MAX);
+   for (unsigned i = 0; i < SYSTEM_VALUE_MAX; i++) {
+  nir_system_values[i].file = BAD_FILE;
+   }
+
nir_foreach_overload(nir, overload) {
   assert(strcmp(overload->function->name, "main") == 0);
   assert(overload->impl);
@@ -270,7 +274,11 @@ fs_visitor::nir_emit_system_values()
 void
 fs_visitor::nir_emit_impl(nir_function_impl *impl)
 {
-   nir_locals = reralloc(mem_ctx, nir_locals, fs_reg, impl->reg_alloc);
+   nir_locals = ralloc_array(mem_ctx, fs_reg, impl->reg_alloc);
+   for (unsigned i = 0; i < impl->reg_alloc; i++) {
+  nir_locals[i].file = BAD_FILE;
+   }
+
foreach_list_typed(nir_register, reg, node, >registers) {
   unsigned array_elems =
  reg->num_array_elems == 0 ? 1 : reg->num_array_elems;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 1e46f9a..ef6e19a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1157,6 +1157,9 @@ fs_visitor::init()
memset(>payload, 0, sizeof(this->payload));
memset(this->outputs, 0, sizeof(this->outputs));
memset(this->output_components, 0, sizeof(this->output_components));
+   for (unsigned i = 0; i < ARRAY_SIZE(this->outputs); i++) {
+  this->outputs[i].file = BAD_FILE;
+   }
this->source_depth_to_render_target = false;
this->runtime_check_aads_emit = false;
this->first_non_payload_grf = 0;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 8ca8ddb..bdb3d02 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -106,6 +106,9 @@ void
 vec4_visitor::nir_setup_system_values()
 {
nir_system_values = ralloc_array(mem_ctx, dst_reg, SYSTEM_VALUE_MAX);
+   for (unsigned i = 0; i < SYSTEM_VALUE_MAX; i++) {
+  nir_system_values[i].file = BAD_FILE;
+   }
 
nir_foreach_overload(nir, overload) {
   assert(strcmp(overload->function->name, "main") == 0);
@@ -118,6 +121,9 @@ void
 vec4_visitor::nir_setup_inputs()
 {
nir_inputs = ralloc_array(mem_ctx, src_reg, nir->num_inputs);
+   for (unsigned i = 0; i < nir->num_inputs; i++) {
+  nir_inputs[i].file = BAD_FILE;
+   }
 
nir_foreach_variable(var, >inputs) {
   int offset = var->data.driver_location;
@@ -148,6 +154,9 @@ void
 vec4_visitor::nir_emit_impl(nir_function_impl *impl)
 {
nir_locals = ralloc_array(mem_ctx, dst_reg, impl->reg_alloc);
+   for (unsigned i = 0; i < impl->reg_alloc; i++) {
+  nir_locals[i].file = BAD_FILE;
+   }
 
foreach_list_typed(nir_register, reg, node, >registers) {
   unsigned array_elems =
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 16/24] i965/fs: Handle type-V immediates in brw_reg_from_fs_reg().

2015-11-02 Thread Matt Turner
We use brw_imm_v() to produce type-V immediates, which generates a
brw_reg with fs_reg's .file set to HW_REG. The next commit will rid us
of HW_REGs, so we need to handle BRW_REGISTER_TYPE_V in the IMM case.
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index ddeb528..654bb1c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -111,6 +111,9 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned 
gen)
   case BRW_REGISTER_TYPE_VF:
  brw_reg = brw_imm_vf(reg->ud);
  break;
+  case BRW_REGISTER_TYPE_V:
+ brw_reg = brw_imm_v(reg->ud);
+ break;
   default:
 unreachable("not reached");
   }
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 15/24] i965/fs: Set stride = 1 for vector immediate types.

2015-11-02 Thread Matt Turner
The generator asserts that this is true (and presumably it's useful in
some optimization passes?) and the VF fs_reg constructors did this (by
virtue of the fact that it doesn't override what init() does).

In the next commit, calling this constructor with brw_imm_* will
generate an IMM file register rather than a HW_REG, making this change
necessary to avoid breakage with existing uses of brw_imm_v().
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 91eaf61..92a9437 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -427,6 +427,12 @@ fs_reg::fs_reg(struct brw_reg reg) :
this->subreg_offset = 0;
this->reladdr = NULL;
this->stride = 1;
+   if (this->file == IMM &&
+   (this->type != BRW_REGISTER_TYPE_V &&
+this->type != BRW_REGISTER_TYPE_UV &&
+this->type != BRW_REGISTER_TYPE_VF)) {
+  this->stride = 0;
+   }
 }
 
 bool
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] Expose support for GL_EXT_buffer_storage

2015-11-02 Thread Ryan Houdek
This extension requires ES 3.1 since it relies on glMemoryBarrier.
For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0
function.
This has been tested with the piglit in the ML and the Dolphin emulator.
---
 docs/relnotes/11.1.0.html | 1 +
 src/mapi/glapi/gen/es_EXT.xml | 9 +
 src/mesa/main/extensions.c| 1 +
 3 files changed, 11 insertions(+)

diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
index 7160244..0b044cf 100644
--- a/docs/relnotes/11.1.0.html
+++ b/docs/relnotes/11.1.0.html
@@ -55,6 +55,7 @@ Note: some of the new features are only available with 
certain drivers.
 GL_ARB_texture_barrier / GL_NV_texture_barrier on i965
 GL_ARB_texture_query_lod on softpipe
 GL_ARB_texture_view on radeonsi
+GL_EXT_buffer_storage implemented for when ES 3.1 support is gained
 GL_EXT_draw_elements_base_vertex on all drivers
 GL_OES_draw_elements_base_vertex on all drivers
 EGL_KHR_create_context on softpipe, llvmpipe
diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
index bf20e48..9a777a2 100644
--- a/src/mapi/glapi/gen/es_EXT.xml
+++ b/src/mapi/glapi/gen/es_EXT.xml
@@ -905,4 +905,13 @@
 
 
 
+
+
+
+
+
+
+
+
+
 
diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index d964f03..6160bb3 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -222,6 +222,7 @@ static const struct extension extension_table[] = {
{ "GL_EXT_blend_color", o(EXT_blend_color), 
GLL,1995 },
{ "GL_EXT_blend_equation_separate", 
o(EXT_blend_equation_separate), GL, 2003 },
{ "GL_EXT_blend_func_separate", o(EXT_blend_func_separate), 
GLL,1999 },
+   { "GL_EXT_buffer_storage",  o(ARB_buffer_storage),  
ES2, 2015 },
{ "GL_EXT_discard_framebuffer", o(dummy_true),  
  ES1 | ES2, 2009 },
{ "GL_EXT_blend_minmax",o(EXT_blend_minmax),
GLL | ES1 | ES2, 1995 },
{ "GL_EXT_blend_subtract",  o(dummy_true),  
GLL,1995 },
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] Expose support for GL_EXT_buffer_storage

2015-11-02 Thread Ilia Mirkin
On Mon, Nov 2, 2015 at 8:30 PM, Ryan Houdek  wrote:
> This extension requires ES 3.1 since it relies on glMemoryBarrier.
> For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0
> function.
> This has been tested with the piglit in the ML and the Dolphin emulator.

Reviewed-by: Ilia Mirkin 

> ---
>  docs/relnotes/11.1.0.html | 1 +
>  src/mapi/glapi/gen/es_EXT.xml | 9 +
>  src/mesa/main/extensions.c| 1 +
>  3 files changed, 11 insertions(+)
>
> diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
> index 7160244..0b044cf 100644
> --- a/docs/relnotes/11.1.0.html
> +++ b/docs/relnotes/11.1.0.html
> @@ -55,6 +55,7 @@ Note: some of the new features are only available with 
> certain drivers.
>  GL_ARB_texture_barrier / GL_NV_texture_barrier on i965
>  GL_ARB_texture_query_lod on softpipe
>  GL_ARB_texture_view on radeonsi
> +GL_EXT_buffer_storage implemented for when ES 3.1 support is gained
>  GL_EXT_draw_elements_base_vertex on all drivers
>  GL_OES_draw_elements_base_vertex on all drivers
>  EGL_KHR_create_context on softpipe, llvmpipe
> diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
> index bf20e48..9a777a2 100644
> --- a/src/mapi/glapi/gen/es_EXT.xml
> +++ b/src/mapi/glapi/gen/es_EXT.xml
> @@ -905,4 +905,13 @@
>
>  
>
> +
> +
> +
> +
> +
> +
> +
> +
> +
>  
> diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
> index d964f03..bdc6817 100644
> --- a/src/mesa/main/extensions.c
> +++ b/src/mesa/main/extensions.c
> @@ -222,6 +222,7 @@ static const struct extension extension_table[] = {
> { "GL_EXT_blend_color", o(EXT_blend_color),   
>   GLL,1995 },
> { "GL_EXT_blend_equation_separate", 
> o(EXT_blend_equation_separate), GL, 2003 },
> { "GL_EXT_blend_func_separate", 
> o(EXT_blend_func_separate), GLL,1999 },
> +   { "GL_EXT_buffer_storage",  o(ARB_buffer_storage),
>  ES31, 2015 },
> { "GL_EXT_discard_framebuffer", o(dummy_true),
> ES1 | ES2, 2009 },
> { "GL_EXT_blend_minmax",o(EXT_blend_minmax),  
>   GLL | ES1 | ES2, 1995 },
> { "GL_EXT_blend_subtract",  o(dummy_true),
>   GLL,1995 },
> --
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Unused (?) duplicated GLSL IR state in NIR

2015-11-02 Thread Emil Velikov
Hi all,

From a quick look, it seems that NIR copies (almost ?) all the state
from GLSL IR even if it doesn't use it.

The particular piece that I'm thinking about is nir_variable::data.
Afaict this is a remnant from the early days, when the intent was to
kill off the GLSL IR and use NIR directly. If so should we just nuke
it, or (if there is someone working on it) add some comments "Keepme:
WIP work by XXX to use this and kill the glsl IR one".

The (not that distant) GLSL IR memory optimisations by Ian, seems to
have missed NIR. Was that intentional or were those worked upon before
NIR got merged ? Perhaps it's worth porting them over ?


Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 02/11] nir: Add a pass-running infastructure

2015-11-02 Thread Ian Romanick
On 10/28/2015 04:43 PM, Connor Abbott wrote:
> On Wed, Oct 28, 2015 at 7:06 PM, Ian Romanick  wrote:
>> On 10/28/2015 02:32 PM, Jason Ekstrand wrote:
>>> ---
>>>  src/glsl/nir/nir.h  | 19 +++
>>>  src/glsl/nir/nir_pass.c | 64 
>>> -
>>>  2 files changed, 82 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>>> index e3777f9..069c7c1 100644
>>> --- a/src/glsl/nir/nir.h
>>> +++ b/src/glsl/nir/nir.h
>>> @@ -1582,6 +1582,25 @@ typedef struct nir_shader {
>>>foreach_list_typed(nir_function_overload, overload, node, \
>>>   &(func)->overload_list)
>>>
>>> +typedef struct nir_pass {
>>
>> A couple years ago Kristian, with the support of everyone, waged a war
>> on 'typedef struct foo { ... } foo;'  Has this awful idiom really risen
>> from the dead?  Can we please send it back to the grave?
> 
> (flamewar incoming!)
> 
> Yes, it has, and no.
> 
> In case you haven't read any NIR code, NIR uses this all over the
> place -- all of the core datastructures are typedef'd. The only
> argument I've heard against this practice, from Linus, is that it
> makes telling if a value is a lightweight thing, like an integer, or
> not. But this isn't the kernel; we don't use typedefs for integers at
> all, and the only place where we use nir_* and it *isn't* a structure
> is the enums. Pretty much the entire time, it's fairly obvious these
> are enums, either because they have "type" in the name (e.g.
> nir_instr_type), indicating C-style subclassing, or they're some kind
> of operation code (nir_op and nir_intrinsic_op). This is fairly
> subjective, but if you're not able to look at a nir_* type and know
> whether it's a lightweight thing or not, then you probably don't know
> the code well enough to understand it anyways. The amount of typing
> they save is just way too large compared to the potential
> (theoretical) inconvenience of not knowing how the type is declared.
> But if this bikeshed seriously disturbs you, then I'd much rather
> change the enums to not be typedef'd, since you have to type out their
> types a lot less often.

As I've been going through Meta today, I was reminded the other (much
more important IMO) reason for not doing the typedef business.  I
believe this is part of the reason we did the giant s/GLcontext/struct
gl_context/g a couple years ago.

Meta has a few structures, like meta_clear_state, that nothing outside
meta ever needs to know anything about.  However, instances of these
structures need to be tracked in the gl_context so that they can be used
the next time we enter meta, freed when the context is destroyed, etc.

You really don't want to put the definition of meta_clear_state in
mtypes.h both for encapsulation and avoiding rebuilding the world
whenever you change meta_clear_state.  You also don't want to stick a
'void *meta_clear;' in gl_context.  That sounds awful.

C language lets you use 'struct meta_clear_state *meta_clear;' without
knowing what a 'struct meta_clear_state' is.  If you want to use a
'meta_clear_state_t *' or similar, you at have to sprinkle the typedefs
in multiple places.  Then you have the same information in two places to
maintain.  That also sounds awful.

Or you mix uses of 'meta_clear_state_t *' and 'struct meta_clear_state
*'.  That also sounds awful.

Or you decide that "because reasons" some structures get a typedef and
some don't. That also sounds awful.

Once you eliminate the awful, whatever remains, no matter how much extra
typing, must be the truth.  Or something along those lines.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev