date:20160511

Re: [Mesa-dev] [PATCH 00/14] vl dri3 support for vaapi and vdpau

2016-05-11 Thread Axel Davy


Another comment:

Something else that would solve your DRI_PRIME issues is
dma-buf fences.

While I believe thread_submit should be a bit better (because
it avoids one card stalling while it waits for the other to finish),
dma-buf fences make the two cards synchronize rendering properly.

I don't know much about the kernel details, but as I understand it, radeon
and amdgpu support the kernel interface for dma-buf fences while intel does
not. You should ask the intel team to implement the feature.

Axel

On 12/05/2016 07:46, Axel Davy wrote:

On 12/05/2016 04:41, Mike Lothian wrote:

Hi Axel

Is the thread_submit=true only for nine or does it work with all of DRI3?

I'm keen to get rid of the tearing on my Skylake/Tonga setup

Thanks

Mike



It is gallium nine only for now.

In the case of gallium nine, it is a reasonable assumption that all 
accesses to the rendered content (read/write) are done through the API. 
Besides, the window messages are independent of whether (and when) 
presentation happens. Thus you can delay the presentation.

I believe these assumptions are too strong in the case of GL. The option 
could perhaps still be added so that users can test whether it works for 
their app, but adding support for it is not trivial work.



Axel



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev




[Mesa-dev] [PATCH] i965/blorp: Do not skip fast color clear with new color

2016-05-11 Thread Topi Pohjolainen

This hasn't been visible before. It showed up with lossless
compression with:

dEQP-GLES3.functional.fbo.color.repeated_clear.sample.tex2d.rgb8

The current fast clear logic triggers color resolves even for GPU sampling.
In the test case this trashes the fast color clear state between two
subsequent clears, so each clear is actually performed.
With lossless compression the resolves are unnecessary, so the clear
state still indicates that the buffer is already cleared. Unless we check
whether the new color differs from the previous one, clears that need to
be performed are skipped and the buffer ends up holding old pixel values.
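
To illustrate the skip condition, here is a minimal C sketch. The types and names (struct surface, set_clear_color(), fast_clear()) are hypothetical stand-ins, not the i965 structures, and the sketch models only the decision, not any hardware state. A clear may be skipped only when the surface is still in the clear state and the requested color matches the stored one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the miptree state; not the i965 structures. */
enum clear_state { STATE_RESOLVED, STATE_CLEAR };

struct surface {
   enum clear_state state;
   uint32_t clear_color;   /* last fast-clear color that was stored */
};

/* Store the new clear color and report whether it differs from the old one. */
static bool
set_clear_color(struct surface *s, uint32_t color)
{
   assert(s != NULL);
   const bool updated = (s->clear_color != color);
   s->clear_color = color;
   return updated;
}

/* Returns true if a clear was actually emitted, false if it was skipped. */
static bool
fast_clear(struct surface *s, uint32_t color)
{
   const bool color_updated = set_clear_color(s, color);

   /* Skip only when the surface is still clear AND the color is unchanged. */
   if (!color_updated && s->state == STATE_CLEAR)
      return false;

   s->state = STATE_CLEAR;
   return true;
}
```

With this shape, clearing twice with the same color skips the second clear, while a clear with a new color is always performed.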

Signed-off-by: Topi Pohjolainen 
CC: Kenneth Graunke 
CC: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp   |  6 ++++--
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 13 ++++++++++++-
 src/mesa/drivers/dri/i965/brw_meta_util.h       |  2 +-
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index ed537ba..2cde347 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -301,12 +301,14 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
* programmed in SURFACE_STATE by later rendering and resolve
* operations.
*/
-  brw_meta_set_fast_clear_color(brw, irb->mt, &ctx->Color.ClearColor);
+  const bool color_updated = brw_meta_set_fast_clear_color(
+brw, irb->mt, &ctx->Color.ClearColor);
 
   /* If the buffer is already in INTEL_FAST_CLEAR_STATE_CLEAR, the clear
* is redundant and can be skipped.
*/
-  if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_CLEAR)
+  if (!color_updated &&
+  irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_CLEAR)
  return true;
 
   /* If the MCS buffer hasn't been allocated yet, we need to allocate
diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 76988bf..d98f870 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -421,8 +421,10 @@ brw_is_color_fast_clear_compatible(struct brw_context *brw,
 /**
  * Convert the given color to a bitfield suitable for ORing into DWORD 7 of
  * SURFACE_STATE (DWORD 12-15 on SKL+).
+ *
+ * Returned boolean tells if the given color differs from the stored.
  */
-void
+bool
 brw_meta_set_fast_clear_color(struct brw_context *brw,
   struct intel_mipmap_tree *mt,
   const union gl_color_union *color)
@@ -466,9 +468,14 @@ brw_meta_set_fast_clear_color(struct brw_context *brw,
   }
}
 
+   bool updated;
if (brw->gen >= 9) {
+  updated = memcmp(&mt->gen9_fast_clear_color, &override_color,
+   sizeof(mt->gen9_fast_clear_color));
   mt->gen9_fast_clear_color = override_color;
} else {
+  const uint32_t old_color_value = mt->fast_clear_color_value;
+
   mt->fast_clear_color_value = 0;
   for (int i = 0; i < 4; i++) {
  /* Testing for non-0 works for integer and float colors */
@@ -477,7 +484,11 @@ brw_meta_set_fast_clear_color(struct brw_context *brw,
 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
  }
   }
+
+  updated = (old_color_value != mt->fast_clear_color_value);
}
+
+   return updated;
 }
 
 static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 };
diff --git a/src/mesa/drivers/dri/i965/brw_meta_util.h b/src/mesa/drivers/dri/i965/brw_meta_util.h
index 550a46a..ac051e2 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_util.h
+++ b/src/mesa/drivers/dri/i965/brw_meta_util.h
@@ -60,7 +60,7 @@ brw_meta_get_buffer_rect(const struct gl_framebuffer *fb,
  unsigned *x0, unsigned *y0,
  unsigned *x1, unsigned *y1);
 
-void
+bool
 brw_meta_set_fast_clear_color(struct brw_context *brw,
   struct intel_mipmap_tree *mt,
   const union gl_color_union *color);
-- 
2.5.5


Re: [Mesa-dev] [PATCH 00/14] vl dri3 support for vaapi and vdpau

2016-05-11 Thread Axel Davy


On 12/05/2016 04:41, Mike Lothian wrote:

Hi Axel

Is the thread_submit=true only for nine or does it work with all of DRI3?

I'm keen to get rid of the tearing on my Skylake/Tonga setup

Thanks

Mike



It is gallium nine only for now.

In the case of gallium nine, it is a reasonable assumption that all accesses 
to the rendered content (read/write) are done through the API. Besides, 
the window messages are independent of whether (and when) presentation 
happens. Thus you can delay the presentation.

I believe these assumptions are too strong in the case of GL. The option 
could perhaps still be added so that users can test whether it works for 
their app, but adding support for it is not trivial work.



Axel


Re: [Mesa-dev] [PATCH 12/23] i965/fs: fix pull constant load component selection for doubles

2016-05-11 Thread Samuel Iglesias Gonsálvez



On 11/05/16 22:46, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
>> On Tue, 2016-05-10 at 21:06 -0700, Francisco Jerez wrote:
>>> Samuel Iglesias Gonsálvez  writes:
>>>

 From: Iago Toral Quiroga 

 UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 starting at a
 constant offset that is 16-byte aligned. If we need to access an unaligned
 offset we emit a load with an aligned offset and use the remaining constant
 offset to select the component into the vec4 result that we are interested
 in. This component must be computed in units of the type size, since that
 is what fs_reg::set_smear expects.

 This patch does this change in the two places where we use this message:
 In demote_pull_constants when we lower uniform access with constant offset
 into the pull constant buffer and in UBO loads with constant offset.
 ---
  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++-
  2 files changed, 5 insertions(+), 2 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index 0e69be8..dff13ea 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads()
   inst->src[i].file = VGRF;
   inst->src[i].nr = dst.nr;
   inst->src[i].reg_offset = 0;
 - inst->src[i].set_smear(pull_index & 3);
 + unsigned type_slots = MAX2(1, type_sz(inst->dst.type) /
 4);
 + inst->src[i].set_smear((pull_index & 3) / type_slots);
  
>>> This cannot be right, why should we care what the destination type of
>>> the instruction is while lowering a uniform source?  Also I don't
>>> think
>>> the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 < 1
>>> you'll force type_slots to 1 and end up interpreting the pull_index
>>> in
>>> the wrong units.  How about:
>>>

   inst->src[i].set_smear((pull_index & 3) * 4 /
  type_sz(inst->src[i].type));

>>
>> OK
>>
   brw_mark_surface_used(prog_data, index);
}
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 index 4cd219a..532ca65 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const
 fs_builder &bld, nir_intrinsic_instr *instr
   bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
 packed_consts,
surf_index, const_offset_reg);
  
 + unsigned component_base =
 +(const_offset->u32[0] % 16) / MAX2(1,
 type_sz(dest.type));
>>> Rather than dividing by the type size only to let set_smear multiply
>>> by
>>> the type size again, I think it would be cleaner to do something
>>> like:
>>>

   const fs_reg consts = byte_offset(packed_consts,
 const_offset->u32[0] % 16);

   for (unsigned i = 0; i < instr->num_components; i++) {
>>> then here:
>>>

  bld.MOV(offset(dest, bld, i), component(consts, i));
>>> and then remove the rest of the loop.
>>>
>>
>> I am having trouble adapting patch 13/23 to this approach because the
>> following assert in component() fails for some tests:
>> 
>> assert(reg.subreg_offset == 0);
>>
> 
> Ouch, that seems pretty broken, let's fix it (see attachment).
> 

Oh thanks! That fixes my problems. I am going to pick your patch and add
it to our version 2 of the branch.

Sam

>> consts.subreg_offset is not zero because of the byte_offset() call.
>>
>> So I would prefer a mixed solution: keep the set_smear() usage, then:
>>
>>bld.MOV(offset(dest, bld, i), packed_consts);
>>
>> and remove the rest of the loop.
>>
>> Sam
>>

 -packed_consts.set_smear(const_offset->u32[0] % 16 / 4
 + i);
 +packed_consts.set_smear(component_base + i);
  
  /* The std140 packing rules don't allow vectors to
 cross 16-byte
   * boundaries, and a reg is 32 bytes.
> 
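
The unit conversion discussed above can be sketched in plain C. This is only an illustration of the arithmetic in the suggested set_smear() argument, assuming pull_index counts 32-bit dwords and the type is 4 or 8 bytes wide; smear_component() is a hypothetical helper, not a function in the driver.

```c
#include <assert.h>

/* pull_index counts 32-bit dwords; its low two bits select a dword inside
 * the 16-byte-aligned vec4 that the load returns.  The result is the same
 * position expressed in units of the element type. */
static unsigned
smear_component(unsigned pull_index, unsigned type_size_bytes)
{
   assert(type_size_bytes == 4 || type_size_bytes == 8);
   const unsigned byte_offset = (pull_index & 3) * 4;
   return byte_offset / type_size_bytes;
}
```

For example, dword 2 of the vec4 is component 2 for a float but component 1 for a double, which is why the division has to use the size of the source type.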

Re: [Mesa-dev] [PATCH 12/59] i965: add brw_imm_df

2016-05-11 Thread Samuel Iglesias Gonsálvez



On 11/05/16 22:30, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
>> On 11/05/16 05:56, Francisco Jerez wrote:
>>> Samuel Iglesias Gonsálvez  writes:
>>>
 From: Connor Abbott 

 v2 (Iago)
   - Fixup accessibility in backend_reg

 Signed-off-by: Iago Toral Quiroga 
>>>
>>> I've just noticed (while running valgrind) that this patch causes
>>> serious breakage in the back-end.  The reason is that the extra bits
>>> required to make room for the df field of the union don't get
>>> initialized in all codepaths, so backend_reg comparisons done using
>>> memcmp() can basically return random results now.  Can you please look
>>> into this?  Some ways to fix it would be to make sure we zero-initialize
>>> the whole brw_reg in all cases (or at least the union padding), or stop
>>> using memcmp() to compare registers -- I guess the latter might be
>>> somewhat less intrusive and increase the likelihood that we can get this
>>> sorted out timely.
>>>
>>
>> Attached is a patch for it; I initialized all union bits to zero before
>> setting them in brw_reg(). Can you test it? If it is not fixed, would
>> you mind sending me an example so I can run it with valgrind here?
>>
> I'm afraid it's not fixed, I still see plenty of "Conditional jump or
> move depends on uninitialised value(s)" errors while running pretty much
> any piglit test on valgrind with the patch below applied.
> 

:-(

OK, I will fix it.

Thanks,

Sam

>> I am thinking that maybe we want to change backend_reg::equals() if this
>> doesn't work.
>>
>> Sam
>>
 ---
  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
  src/mesa/drivers/dri/i965/brw_shader.h | 1 +
  2 files changed, 10 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
 b/src/mesa/drivers/dri/i965/brw_reg.h
 index b84c709..6d51623 100644
 --- a/src/mesa/drivers/dri/i965/brw_reg.h
 +++ b/src/mesa/drivers/dri/i965/brw_reg.h
 @@ -254,6 +254,7 @@ struct brw_reg {
   unsigned pad1:1;
};
  
 +  double df;
float f;
int   d;
unsigned ud;
 @@ -544,6 +545,14 @@ brw_imm_reg(enum brw_reg_type type)
  
  /** Construct float immediate register */
  static inline struct brw_reg
 +brw_imm_df(double df)
 +{
 +   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_DF);
 +   imm.df = df;
 +   return imm;
 +}
 +
 +static inline struct brw_reg
  brw_imm_f(float f)
  {
 struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_F);
 diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
 b/src/mesa/drivers/dri/i965/brw_shader.h
 index fc228f6..f6f6167 100644
 --- a/src/mesa/drivers/dri/i965/brw_shader.h
 +++ b/src/mesa/drivers/dri/i965/brw_shader.h
 @@ -90,6 +90,7 @@ struct backend_reg : private brw_reg
 using brw_reg::width;
 using brw_reg::hstride;
  
 +   using brw_reg::df;
 using brw_reg::f;
 using brw_reg::d;
 using brw_reg::ud;
 -- 
 2.5.0

>> From 35254624d63b77aa2024bc2b08612e28cae4bb98 Mon Sep 17 00:00:00 2001
>> From: Samuel Iglesias Gonsálvez
>> Date: Wed, 11 May 2016 07:44:10 +0200
>> Subject: [PATCH] i965: initialize struct brw_reg's union bits to zero.
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> Extra bits required to make room for the df field of the union don't get
>> initialized in all codepaths, so backend_reg comparisons done using
>> memcmp() can basically return random results.
>>
>> Initialize them to zero before setting the rest of union's fields.
>>
>> Signed-off-by: Samuel Iglesias Gonsálvez 
>> Reported-by: Francisco Jerez 
>> ---
>>  src/mesa/drivers/dri/i965/brw_reg.h | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
>> b/src/mesa/drivers/dri/i965/brw_reg.h
>> index 6d51623..3b76d7d 100644
>> --- a/src/mesa/drivers/dri/i965/brw_reg.h
>> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
>> @@ -338,6 +338,9 @@ brw_reg(enum brw_reg_file file,
>> reg.subnr = subnr * type_sz(type);
>> reg.nr = nr;
>>  
>> +   /* Initialize all union's bits to zero before setting them. */
>> +   reg.df = 0;
>> +
>> /* Could do better: If the reg is r5.3<0;1,0>, we probably want to
>>  * set swizzle and writemask to W, as the lower bits of subnr will
>>  * be lost when converted to align16.  This is probably too much to
>> -- 
>> 2.5.0
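
The two directions mentioned in this thread (zero the whole structure, or stop comparing with memcmp()) can be sketched as follows. struct imm_reg is a hypothetical, much-reduced stand-in for a register description, not the real brw_reg layout. Note also that C formally leaves the other union bytes unspecified after a member store; the sketch relies on common compilers keeping the zeroed bytes, which is one more argument for the field-wise comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for a register description. */
struct imm_reg {
   unsigned type;
   union {
      double df;   /* 8 bytes: widens the union */
      float f;     /* 4 bytes: leaves the other half untouched */
      int d;
   } u;
};

/* Option 1: zero the whole struct first, so that padding and the unused
 * half of the union hold a known value before the float is stored. */
static void
imm_f_init(struct imm_reg *reg, float f)
{
   assert(reg != NULL);
   memset(reg, 0, sizeof(*reg));
   reg->type = 1;   /* arbitrary tag meaning "float" in this sketch */
   reg->u.f = f;
}

/* Option 2: compare the meaningful fields instead of the raw bytes. */
static bool
imm_f_equals(const struct imm_reg *a, const struct imm_reg *b)
{
   return a->type == b->type && a->u.f == b->u.f;
}
```

Without the memset(), the upper bytes of the union and any padding after type are indeterminate, so a byte-wise comparison of two logically equal registers may fail.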

Re: [Mesa-dev] [PATCH 15/23] i965/fs: add SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA helper

2016-05-11 Thread Francisco Jerez

Francisco Jerez  writes:

> Iago Toral  writes:
>
>> On Tue, 2016-05-10 at 19:10 -0700, Francisco Jerez wrote:
>>> Samuel Iglesias Gonsálvez  writes:
>>> 
>>> > From: Iago Toral Quiroga 
>>> >
>>> > There are a few places where we need to shuffle the result of a 32-bit load
>>> > into valid 64-bit data, so extract this logic into a separate helper that we
>>> > can reuse.
>>> >
>>> > Also, the shuffling needs to operate with WE_all set, which we were missing
>>> > before, because we are changing the layout of the data across the various
>>> > channels. Otherwise we will run into problems in non-uniform control-flow
>>> > scenarios.
>>> > ---
>>> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 95 
>>> > +---
>>> >  src/mesa/drivers/dri/i965/brw_fs.h   |  5 ++
>>> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 46 ++--
>>> >  3 files changed, 73 insertions(+), 73 deletions(-)
>>> >
>>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
>>> > b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> > index dff13ea..709e4b8 100644
>>> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> > @@ -216,39 +216,8 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
>>> >  
>>> > vec4_result.type = dst.type;
>>> >  
>>> > -   /* Our VARYING_PULL_CONSTANT_LOAD reads a vector of 32-bit elements. 
>>> > If we
>>> > -* are reading doubles this means that we get this:
>>> > -*
>>> > -*  r0: x0 x0 x0 x0 x0 x0 x0 x0
>>> > -*  r1: x1 x1 x1 x1 x1 x1 x1 x1
>>> > -*  r2: y0 y0 y0 y0 y0 y0 y0 y0
>>> > -*  r3: y1 y1 y1 y1 y1 y1 y1 y1
>>> > -*
>>> > -* Fix this up so we return valid double elements:
>>> > -*
>>> > -*  r0: x0 x1 x0 x1 x0 x1 x0 x1
>>> > -*  r1: x0 x1 x0 x1 x0 x1 x0 x1
>>> > -*  r2: y0 y1 y0 y1 y0 y1 y0 y1
>>> > -*  r3: y0 y1 y0 y1 y0 y1 y0 y1
>>> > -*/
>>> > -   if (type_sz(dst.type) == 8) {
>>> > -  int multiplier = bld.dispatch_width() / 8;
>>> > -  fs_reg fixed_res =
>>> > - fs_reg(VGRF, alloc.allocate(2 * multiplier), 
>>> > BRW_REGISTER_TYPE_F);
>>> > -  /* We only have 2 doubles in a 32-bit vec4 */
>>> > -  for (int i = 0; i < 2; i++) {
>>> > - fs_reg vec4_float =
>>> > -horiz_offset(retype(vec4_result, BRW_REGISTER_TYPE_F),
>>> > - multiplier * 16 * i);
>>> > -
>>> > - bld.MOV(stride(fixed_res, 2), vec4_float);
>>> > - bld.MOV(stride(horiz_offset(fixed_res, 1), 2),
>>> > - horiz_offset(vec4_float, 8 * multiplier));
>>> > -
>>> > - bld.MOV(horiz_offset(vec4_result, multiplier * 8 * i),
>>> > - retype(fixed_res, BRW_REGISTER_TYPE_DF));
>>> > -  }
>>> > -   }
>>> > +   if (type_sz(dst.type) == 8)
>>> > +  SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(bld, vec4_result, 
>>> > vec4_result, 2);
>>> >  
>>> > int type_slots = MAX2(type_sz(dst.type) / 4, 1);
>>> > bld.MOV(dst, offset(vec4_result, bld,
>>> > @@ -256,6 +225,66 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
>>> >  }
>>> >  
>>> >  /**
>>> > + * This helper takes the result of a load operation that reads 32-bit 
>>> > elements
>>> > + * in this format:
>>> > + *
>>> > + * x x x x x x x x
>>> > + * y y y y y y y y
>>> > + * z z z z z z z z
>>> > + * w w w w w w w w
>>> > + *
>>> > + * and shuffles the data to get this:
>>> > + *
>>> > + * x y x y x y x y
>>> > + * x y x y x y x y
>>> > + * z w z w z w z w
>>> > + * z w z w z w z w
>>> > + *
>>> > + * Which is exactly what we want if the load is reading 64-bit components
>>> > + * like doubles, where x represents the low 32-bit of the x double 
>>> > component
>>> > + * and y represents the high 32-bit of the x double component (likewise 
>>> > with
>>> > + * z and w for double component y). The parameter @components represents
>>> > + * the number of 64-bit components present in @src. This would typically 
>>> > be
>>> > + * 2 at most, since we can only fit 2 double elements in the result of a
>>> > + * vec4 load.
>>> > + *
>>> > + * Notice that @dst and @src can be the same register.
>>> > + */
>>> > +void
>>> > +fs_visitor::SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(const fs_builder &bld,
>>> 
>>> I don't see any reason to make this an fs_visitor method.  Declare this
>>> as a static function local to brw_fs_nir.cpp, which should improve
>>> encapsulation and reduce the amount of boilerplate.  Also please don't
>>> write it in capitals unless you want people to shout the name of your
>>> function while discussing it out loud. ;)
>>> 
>>> > +const fs_reg dst,
>>> > +const fs_reg src,
>>> > +uint32_t components)
>>> > +{
>>> > +   int multiplier =
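
The data movement described in the helper's comment can be shown on the CPU with plain arrays. This is a sketch only: shuffle_32bit_to_64bit() and CHANNELS are hypothetical names, the rows of the comment are modelled as one row of low dwords followed by one row of high dwords, and nothing here reflects how the back-end emits the MOVs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHANNELS 8

/* src holds CHANNELS low dwords followed by CHANNELS high dwords, the
 * layout a 32-bit load produces.  dst receives lo0 hi0 lo1 hi1 ..., so each
 * adjacent pair forms one 64-bit element.  A temporary keeps the function
 * correct when dst and src are the same array. */
static void
shuffle_32bit_to_64bit(uint32_t *dst, const uint32_t *src)
{
   uint32_t tmp[2 * CHANNELS];

   assert(dst != NULL && src != NULL);
   for (unsigned i = 0; i < CHANNELS; i++) {
      tmp[2 * i + 0] = src[i];              /* low half of element i */
      tmp[2 * i + 1] = src[CHANNELS + i];   /* high half of element i */
   }
   memcpy(dst, tmp, sizeof(tmp));
}
```

The temporary plays the role that fixed_res plays in the removed code: it lets the destination alias the source, as the comment says is allowed.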

Re: [Mesa-dev] [PATCH 04/11] i965: Add new intel_set_texture_image_mt() helper

2016-05-11 Thread Pohjolainen, Topi

On Wed, May 11, 2016 at 12:22:37PM -0700, Kristian Høgsberg wrote:
> From: Kristian Høgsberg Kristensen 
> 
> This factors out the work of setting up a miptree as the backing for a
> texture image into a new helper.
> ---
>  src/mesa/drivers/dri/i965/intel_tex_image.c | 69 
> ++---
>  1 file changed, 42 insertions(+), 27 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
> b/src/mesa/drivers/dri/i965/intel_tex_image.c
> index 9a40476..b214937 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex_image.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
> @@ -135,6 +135,33 @@ intelTexImage(struct gl_context * ctx,
>  }
>  
>  
> +static void
> +intel_set_texture_image_mt(struct brw_context *brw,
> +   struct gl_texture_image *image,
> +   struct intel_mipmap_tree *mt)
> +
> +{
> +   const uint32_t internal_format = _mesa_get_format_base_format(mt->format);
> +   struct gl_texture_object *texobj = image->TexObject;
> +   struct intel_texture_object *intel_texobj = intel_texture_object(texobj);
> +   struct intel_texture_image *intel_image = intel_texture_image(image);
> +
> +   _mesa_init_teximage_fields(&brw->ctx, image,
> +   mt->logical_width0, mt->logical_height0, 1,
> +   0, internal_format, mt->format);

Indentation looks a little odd here.

> +
> +   brw->ctx.Driver.FreeTextureImageBuffer(&brw->ctx, image);
> +
> +   intel_texobj->needs_validate = true;
> +   intel_image->base.RowStride = mt->pitch / mt->cpp;
> +   assert(mt->pitch % mt->cpp == 0);
> +
> +   intel_miptree_reference(&intel_image->mt, mt);
> +
> +   /* Immediately validate the image to the object. */
> +   intel_miptree_reference(&intel_texobj->mt, mt);
> +}
> +
>  /**
>   * Binds a BO to a texture image, as if it was uploaded by glTexImage2D().
>   *
> @@ -154,29 +181,21 @@ intel_set_texture_image_bo(struct gl_context *ctx,
> uint32_t layout_flags)
>  {
> struct brw_context *brw = brw_context(ctx);
> -   struct intel_texture_image *intel_image = intel_texture_image(image);
> -   struct gl_texture_object *texobj = image->TexObject;
> -   struct intel_texture_object *intel_texobj = intel_texture_object(texobj);
> uint32_t draw_x, draw_y;
> +   struct intel_mipmap_tree *mt;
>  
> -   _mesa_init_teximage_fields(&brw->ctx, image,
> -   width, height, 1,
> -   0, internalFormat, format);
> -
> -   ctx->Driver.FreeTextureImageBuffer(ctx, image);
> -
> -   intel_image->mt = intel_miptree_create_for_bo(brw, bo, image->TexFormat,
> - 0, width, height, 1, pitch,
> - layout_flags);
> -   if (intel_image->mt == NULL)
> +   mt = intel_miptree_create_for_bo(brw, bo, image->TexFormat,
> +0, width, height, 1, pitch,
> +layout_flags);
> +   if (mt == NULL)
> return;
> -   intel_image->mt->target = target;
> -   intel_image->mt->total_width = width;
> -   intel_image->mt->total_height = height;
> -   intel_image->mt->level[0].slice[0].x_offset = tile_x;
> -   intel_image->mt->level[0].slice[0].y_offset = tile_y;
> +   mt->target = target;
> +   mt->total_width = width;
> +   mt->total_height = height;
> +   mt->level[0].slice[0].x_offset = tile_x;
> +   mt->level[0].slice[0].y_offset = tile_y;
>  
> -   intel_miptree_get_tile_offsets(intel_image->mt, 0, 0, &draw_x, &draw_y);
> +   intel_miptree_get_tile_offsets(mt, 0, 0, &draw_x, &draw_y);
>  
> /* From "OES_EGL_image" error reporting. We report GL_INVALID_OPERATION
>  * for EGL images from non-tile aligned sufaces in gen4 hw and earlier 
> which has
> @@ -185,18 +204,14 @@ intel_set_texture_image_bo(struct gl_context *ctx,
> if (!brw->has_surface_tile_offset &&
> (draw_x != 0 || draw_y != 0)) {
>_mesa_error(ctx, GL_INVALID_OPERATION, __func__);
> -  intel_miptree_release(&intel_image->mt);
> +  intel_miptree_release(&mt);
>return;
> }
>  
> -   intel_texobj->needs_validate = true;
> -
> -   intel_image->mt->offset = offset;
> -   assert(pitch % intel_image->mt->cpp == 0);
> -   intel_image->base.RowStride = pitch / intel_image->mt->cpp;
> +   mt->offset = offset;
>  
> -   /* Immediately validate the image to the object. */
> -   intel_miptree_reference(&intel_texobj->mt, intel_image->mt);
> +   intel_set_texture_image_mt(brw, image, mt);
> +   intel_miptree_release(&mt);

Can we drop the reference here because we explicitly associated the
miptree with intel_image and intel_texobj in intel_set_texture_image_mt()?
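
For reference, the ownership pattern in question looks like this in a generic reference-counting scheme. This is a sketch under the assumption that miptree references behave like a conventional refcount; struct tree and the tree_*() functions are hypothetical and are not the intel_miptree_*() implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; names are illustrative only. */
struct tree {
   int refcount;
};

static struct tree *
tree_create(void)
{
   struct tree *t = calloc(1, sizeof(*t));
   if (t)
      t->refcount = 1;   /* the creator holds the first reference */
   return t;
}

/* Drop the reference held in *ptr, freeing the object on the last one. */
static void
tree_release(struct tree **ptr)
{
   assert(ptr != NULL);
   struct tree *t = *ptr;
   if (t && --t->refcount == 0)
      free(t);
   *ptr = NULL;
}

/* Make *dst hold a reference to t, releasing whatever it held before. */
static void
tree_reference(struct tree **dst, struct tree *t)
{
   if (t)
      t->refcount++;
   tree_release(dst);
   *dst = t;
}
```

In this model the creator's reference can be released once the image and the texture object each hold their own, and the object stays alive with a count of two.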

>  }
>  
>  void
> -- 
> 2.5.0
> 

[Mesa-dev] GBM backend dynamic dispatch method

2016-05-11 Thread Yu, Qiang

Hi guys,


Let me introduce myself. My name is Qiang Yu, and I'm a developer of the
amdgpu-pro driver.

As you know, amdgpu-pro adopts some open source parts such as GBM, but
because its OGL part is closed source, we implement our own GBM backend.

Currently libgbm only supports static selection of the GBM backend through
GBM_BACKEND. In the hybrid GPU case, such as an Intel iGPU plus an AMD dGPU
driven by amdgpu-pro, it is inconvenient for a client to switch backends all
the time, and it is even impossible for applications that need to deal with
both GPUs, such as the X server.

So I'm thinking about a dynamic dispatch method and hope it can go upstream
into libgbm:

1. Create a /etc/gbm/xxx.conf for libgbm to read when a non-default backend
is needed.

2. The content should look like: <kernel driver name>:<backend library>

In the amdgpu-pro case, the content is: amdgpu:gbm_amdgpu.so

This method needs libgbm to first use libdrm to determine the kernel driver
behind the FD.
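
A minimal sketch of the lookup step, assuming one driver:library pair per line as in the amdgpu example above. match_backend() is a hypothetical helper and not part of libgbm; obtaining the kernel driver name for the FD (for instance through libdrm) and loading the library are left out.

```c
#include <assert.h>
#include <string.h>

/* Parse one "driver:library" line and, if the driver name matches, copy the
 * library name into out.  Returns 1 on a match, 0 otherwise. */
static int
match_backend(const char *line, const char *driver, char *out, size_t out_len)
{
   assert(line != NULL && driver != NULL && out != NULL);

   const char *colon = strchr(line, ':');
   if (!colon)
      return 0;

   const size_t name_len = (size_t)(colon - line);
   if (strlen(driver) != name_len || strncmp(line, driver, name_len) != 0)
      return 0;

   /* Copy everything after the colon, dropping a trailing newline. */
   const size_t lib_len = strcspn(colon + 1, "\r\n");
   if (lib_len == 0 || lib_len >= out_len)
      return 0;
   memcpy(out, colon + 1, lib_len);
   out[lib_len] = '\0';
   return 1;
}
```

A caller would read the file line by line and stop at the first match, falling back to the default backend when nothing matches.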

Any feedback on this method, and on its chances of going upstream?


Thanks,

Qiang


Re: [Mesa-dev] [PATCH v2] i965/blorp: Special-case the clear color in MSAA resolves

2016-05-11 Thread Ilia Mirkin

On Wed, May 11, 2016 at 10:42 PM, Jason Ekstrand  wrote:
> The current MSAA resolve code has a special case for when the MCS value is 0.
> In this case we only need to sample once because we know that all values are
> in slice 0.  This commit adds a second optimization that detects the magic
> MCS value that indicates the clear color, grabs the color from a push
> constant, and avoids sampling altogether.  On a microbenchmark written by
> Neil Roberts that tests resolving surfaces with just clear color, this
> improves performance by 60% for 8x, 40% for 4x, and 28% for 2x MSAA on my
> SKL gte3 laptop.  The benchmark can be found on the ML archive:
>
> https://lists.freedesktop.org/archives/mesa-dev/2016-February/108077.html
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.h|   4 +-
>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 101 
> +--
>  2 files changed, 100 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h 
> b/src/mesa/drivers/dri/i965/brw_blorp.h
> index 15114d0..9d71ca4 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.h
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
> @@ -197,7 +197,9 @@ struct brw_blorp_wm_push_constants
> uint32_t src_z;
>
> /* Pad out to an integral number of registers */
> -   uint32_t pad[5];
> +   uint32_t pad;
> +
> +   union gl_color_union clear_color;
>  };
>
>  #define BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS \
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
> b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> index 514a316..45b696d 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> @@ -346,6 +346,7 @@ struct brw_blorp_blit_vars {
>nir_variable *offset;
> } u_x_transform, u_y_transform;
> nir_variable *u_src_z;
> +   nir_variable *u_clear_color;
>
> /* gl_FragCoord */
> nir_variable *frag_coord;
> @@ -374,6 +375,7 @@ brw_blorp_blit_vars_init(nir_builder *b, struct 
> brw_blorp_blit_vars *v,
> LOAD_UNIFORM(y_transform.multiplier, glsl_float_type())
> LOAD_UNIFORM(y_transform.offset, glsl_float_type())
> LOAD_UNIFORM(src_z, glsl_uint_type())
> +   LOAD_UNIFORM(clear_color, glsl_vec4_type())
>
>  #undef DECL_UNIFORM
>
> @@ -858,7 +860,8 @@ static nir_ssa_def *
>  blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def *pos,
> unsigned tex_samples,
> enum intel_msaa_layout tex_layout,
> -   enum brw_reg_type dst_type)
> +   enum brw_reg_type dst_type,
> +   struct brw_blorp_blit_vars *v)
>  {
> /* If non-null, this is the outer-most if statement */
> nir_if *outer_if = NULL;
> @@ -867,9 +870,53 @@ blorp_nir_manual_blend_average(nir_builder *b, 
> nir_ssa_def *pos,
>nir_local_variable_create(b->impl, glsl_vec4_type(), "color");
>
> nir_ssa_def *mcs = NULL;
> -   if (tex_layout == INTEL_MSAA_LAYOUT_CMS)
> +   if (tex_layout == INTEL_MSAA_LAYOUT_CMS) {
>mcs = blorp_nir_txf_ms_mcs(b, pos);
>
> +  /* The MCS buffer stores a packed value that provides a mapping from
> +   * samples to array slices.  The magic value of all ones means that all
> +   * samples have the clear color.  In this case, we can short-circuit 
> the
> +   * sampling process and just use the clear color that we pushed into 
> the
> +   * shader.
> +   */
> +  nir_ssa_def *is_clear_color;
> +  switch (tex_samples) {
> +  case 2:
> + /* Empirical evidence suggests that the value returned from the
> +  * sampler is not always 0x3 for clear color so we need to mask it.
> +  */
> + is_clear_color =
> +nir_ieq(b, nir_iand(b, nir_channel(b, mcs, 0), nir_imm_int(b, 
> 0x3)),
> +   nir_imm_int(b, 0x3));
> + break;
> +  case 4:
> + is_clear_color =
> +nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, 0xff));
> + break;
> +  case 8:
> + is_clear_color =
> +nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, ~0));
> + break;
> +  case 16:
> + is_clear_color =
> +nir_ior(b, nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, 
> ~0)),
> +   nir_ieq(b, nir_channel(b, mcs, 1), nir_imm_int(b, 
> ~0)));
> + break;
> +  default:
> + unreachable("Invalid sample count");
> +  }
> +
> +  nir_if *if_stmt = nir_if_create(b->shader);
> +  if_stmt->condition = nir_src_for_ssa(is_clear_color);
> +  nir_cf_node_insert(b->cursor, &if_stmt->cf_node);
> +
> +  b->cursor = nir_after_cf_list(&if_stmt->then_list);
> +  nir_store_var(b, color, nir_load_var(b, v->u_clear_color), 0xf);
> +
> +  b->cursor = nir_after_cf_list(&if_stmt->else_list);
> +  outer_if = if_stmt;
> +   }
> +
> /* We add together samples using a

[Mesa-dev] [PATCH v2] i965/blorp: Special-case the clear color in MSAA resolves

2016-05-11 Thread Jason Ekstrand

The current MSAA resolve code has a special case for when the MCS value is 0.
In this case we only need to sample once because we know that all values are
in slice 0.  This commit adds a second optimization that detects the magic
MCS value that indicates the clear color, grabs the color from a push
constant, and avoids sampling altogether.  On a microbenchmark written by
Neil Roberts that tests resolving surfaces with just clear color, this
improves performance by 60% for 8x, 40% for 4x, and 28% for 2x MSAA on my
SKL gte3 laptop.  The benchmark can be found on the ML archive:

https://lists.freedesktop.org/archives/mesa-dev/2016-February/108077.html
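
As a CPU-side illustration of the per-sample-count test, here is a sketch in plain C. mcs_is_clear_color() is a hypothetical function that takes the two MCS dwords as integers. One deliberate difference is labelled as an assumption: for 16x it requires both dwords to be all ones, which is stricter than the OR in the shader code of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the MCS value marks every sample as "clear color". */
static bool
mcs_is_clear_color(unsigned samples, uint32_t mcs_lo, uint32_t mcs_hi)
{
   switch (samples) {
   case 2:
      /* Mask first: the upper bits are not reliably set for 2x. */
      return (mcs_lo & 0x3) == 0x3;
   case 4:
      return mcs_lo == 0xff;
   case 8:
      return mcs_lo == UINT32_MAX;
   case 16:
      /* Assumption of this sketch: both halves must be all ones. */
      return mcs_lo == UINT32_MAX && mcs_hi == UINT32_MAX;
   default:
      assert(!"Invalid sample count");
      return false;
   }
}
```

When the test succeeds, the resolve can use the pushed clear color directly instead of sampling.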
---
 src/mesa/drivers/dri/i965/brw_blorp.h|   4 +-
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 101 +--
 2 files changed, 100 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h 
b/src/mesa/drivers/dri/i965/brw_blorp.h
index 15114d0..9d71ca4 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -197,7 +197,9 @@ struct brw_blorp_wm_push_constants
uint32_t src_z;
 
/* Pad out to an integral number of registers */
-   uint32_t pad[5];
+   uint32_t pad;
+
+   union gl_color_union clear_color;
 };
 
 #define BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS \
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 514a316..45b696d 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -346,6 +346,7 @@ struct brw_blorp_blit_vars {
   nir_variable *offset;
} u_x_transform, u_y_transform;
nir_variable *u_src_z;
+   nir_variable *u_clear_color;
 
/* gl_FragCoord */
nir_variable *frag_coord;
@@ -374,6 +375,7 @@ brw_blorp_blit_vars_init(nir_builder *b, struct brw_blorp_blit_vars *v,
LOAD_UNIFORM(y_transform.multiplier, glsl_float_type())
LOAD_UNIFORM(y_transform.offset, glsl_float_type())
LOAD_UNIFORM(src_z, glsl_uint_type())
+   LOAD_UNIFORM(clear_color, glsl_vec4_type())
 
 #undef DECL_UNIFORM
 
@@ -858,7 +860,8 @@ static nir_ssa_def *
 blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def *pos,
unsigned tex_samples,
enum intel_msaa_layout tex_layout,
-   enum brw_reg_type dst_type)
+   enum brw_reg_type dst_type,
+   struct brw_blorp_blit_vars *v)
 {
/* If non-null, this is the outer-most if statement */
nir_if *outer_if = NULL;
@@ -867,9 +870,53 @@ blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def 
*pos,
   nir_local_variable_create(b->impl, glsl_vec4_type(), "color");
 
nir_ssa_def *mcs = NULL;
-   if (tex_layout == INTEL_MSAA_LAYOUT_CMS)
+   if (tex_layout == INTEL_MSAA_LAYOUT_CMS) {
   mcs = blorp_nir_txf_ms_mcs(b, pos);
 
+  /* The MCS buffer stores a packed value that provides a mapping from
+   * samples to array slices.  The magic value of all ones means that all
+   * samples have the clear color.  In this case, we can short-circuit the
+   * sampling process and just use the clear color that we pushed into the
+   * shader.
+   */
+  nir_ssa_def *is_clear_color;
+  switch (tex_samples) {
+  case 2:
+ /* Empirical evidence suggests that the value returned from the
+  * sampler is not always 0x3 for clear color so we need to mask it.
+  */
+ is_clear_color =
+nir_ieq(b, nir_iand(b, nir_channel(b, mcs, 0), nir_imm_int(b, 
0x3)),
+   nir_imm_int(b, 0x3));
+ break;
+  case 4:
+ is_clear_color =
+nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, 0xff));
+ break;
+  case 8:
+ is_clear_color =
+nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, ~0));
+ break;
+  case 16:
+ is_clear_color =
+nir_ior(b, nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, ~0)),
+   nir_ieq(b, nir_channel(b, mcs, 1), nir_imm_int(b, ~0)));
+ break;
+  default:
+ unreachable("Invalid sample count");
+  }
+
+  nir_if *if_stmt = nir_if_create(b->shader);
+  if_stmt->condition = nir_src_for_ssa(is_clear_color);
+  nir_cf_node_insert(b->cursor, &if_stmt->cf_node);
+
+  b->cursor = nir_after_cf_list(&if_stmt->then_list);
+  nir_store_var(b, color, nir_load_var(b, v->u_clear_color), 0xf);
+
+  b->cursor = nir_after_cf_list(&if_stmt->else_list);
+  outer_if = if_stmt;
+   }
+
/* We add together samples using a binary tree structure, e.g. for 4x MSAA:
 *
 *   result = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4
@@ -937,7 +984,8 @@ blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def 
*pos,
  nir_store_var(b, color, texture_data[0], 0xf);
 
  b->cursor =

Re: [Mesa-dev] [PATCH 00/14] vl dri3 support for vaapi and vdpau

2016-05-11 Thread Mike Lothian

Hi Axel

Is the thread_submit=true only for nine or does it work with all of DRI3?

I'm keen to get rid of the tearing on my Skylake/Tonga setup

Thanks

Mike

On Wed, 11 May 2016 at 16:57 Axel Davy  wrote:

> Hi,
>
> Do you have a branch where I can review everything at once (it is a bit
> hard to follow with the separate patches)?
>
> From a quick look, it seems you took inspiration from the loader dri3 code.
>
> There is also another implementation you can draw on:
> https://github.com/iXit/wine/blob/master/dlls/d3d9-nine/dri3.c
> though there is probably not much more you can get from it.
>
> I haven't checked the code yet, so I don't know if this applies, but
> something I have noticed on my Tonga with games is that (non-vsynced)
> apps that get around 45 fps feel like 15 fps (above 50 or below 35 is
> fine).
> I guess this is because the screen buffer swap waits until the buffer
> has finished rendering before executing the swap, combined with some bad
> timing when hitting 45 fps.
> In fact, for this specific case with gallium nine, I noticed the problem
> disappears when using thread_submit=true.
> thread_submit is an option that was designed with the DRI_PRIME case in
> mind: the driver spawns a thread that waits until the buffers we want to
> present have finished rendering before sending them. That solves all the
> sync issues a DRI_PRIME configuration can have.
> I think in the case of the problem described, sending buffers that have
> finished rendering saves the screen buffer swap from having to wait
> another vblank for the buffer to be rendered.
>
> I guess for video, you really don't want to hit the bad scenario
> described. I'm not sure whether you can hit the issue or not, but
> it may be something to consider. In any case, it seems a good thing
> to look at if you want to implement good DRI_PRIME support, provided it
> is possible: I don't know the user API, but if the user is guaranteed,
> for example, that the updated content will be copied to some pixmap after
> some call, you cannot delay the presentation in that case.
>
> Axel
>
>
> On 11/05/2016 17:06, Leo Liu wrote :
> > This series implements DRI3 support for VA-API and VDPAU. It implements
> > support for DRI3 Open, PixmapFromBuffer, BufferFromPixmap, and for
> > PRESENT, including PresentPixmap, PresentNotifyMSC, PresentIdleNotify,
> > PresentConfigureNotify and PresentCompleteNotify.
> >
> > It has been tested with the players mpv and vlc on various clips from
> > 480p to 4K with frame rates from 24 to 60, in both windowed and
> > fullscreen mode, with and without a compositing manager. Testing also
> > covered the VA-API glx extension.
> >
> > There is still some future work to be added, such as DRI_PRIME support
> > for a different GPU.
> >
> > Leo Liu (14):
> >vl: add DRI3 support infrastructure
> >vl/dri3: implement dri3 screen create and destroy
> >vl/dri3: set drawable geometry
> >vl/dri3: register present events
> >vl/dri3: implement flushing for queued events
> >vl/dri3: add back buffers support
> >vl/dri3: implement function for flush frontbuffer
> >vl/dri3: implement funciton for get dirty area
> >vl/dri3: add support for resizing
> >vl/dri3: implement DRI3 BufferFromPixmap
> >st/va: add dri3 support
> >vl/dri3: handle PresentCompleteNotify event
> >vl/dri3: implement functions for get and set timestamp
> >st/vdpau: add dri3 support
> >
> >   configure.ac  |   7 +-
> >   src/gallium/auxiliary/Makefile.sources|   5 +
> >   src/gallium/auxiliary/vl/vl_winsys.h  |   5 +
> >   src/gallium/auxiliary/vl/vl_winsys_dri3.c | 703
> ++
> >   src/gallium/state_trackers/va/context.c   |   6 +-
> >   src/gallium/state_trackers/vdpau/device.c |   6 +-
> >   6 files changed, 729 insertions(+), 3 deletions(-)
> >   create mode 100644 src/gallium/auxiliary/vl/vl_winsys_dri3.c
> >
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2016-05-11 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95346

--- Comment #8 from Ilia Mirkin  ---
First bad draw call in the referenced trace is 803513, I believe. That ends up
with the "bad" earth - the S3TC texture of the earth doesn't appear to be
making it on there.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH 13/14] vl/dri3: implement functions for get and set timestamp

2016-05-11 Thread Leo Liu

Hi Axel,

Just to clarify something you might have misunderstood from the vl
implementation perspective.

> The present extension has something exactly to set the target ust for
> the presentation: PresentOptionUST
>
> Unfortunately, while it is in the spec it looks like the option is
> totally ignored, and thus it will be totally buggy (you are supposed to
> pass ust instead of msc...).

We have to set a target MSC for presentation, not a target UST; otherwise
presentation will not line up with vsync and we get tearing.
last_ust is used here so that the player can get the timestamp of the last
presented frame and use it to set the new timestamp.
PresentOptionUST schedules the next presentation at a specified UST time,
so it is not suitable here.

> I think it may get issues if ns_frame is wrong. For example for some
> reason (app hidden for some frame, or monitor shut, or whatever), I
> think we could get two buffers getting complete event with same ust
> (one skipped, and one shown).

I don't think this is a realistic scenario. For each frame (except the
first one) we wait for the SBC reply or serial update to make sure that
different buffers/frames get different UST values; otherwise playback
would drop frames and stutter.

> DRI3/Present are designed to allow more liberty for the server. It is
> expected in future version the window managers will handle some of the
> events, and it may behave quite different than currently.
>
> Besides DRI3/Present will likely be plugged into Xwayland (currently
> it uses the emulation code, but the extension can be implemented with
> wayland calls, I had a patch for that), and there again it is different
> behaviour.
>
> Thus I advise against having code that 'works well in practice' but
> can fail with something that is allowed in the spec.

As said in the cover letter, this implements the spec as it applies to the
current X server and window managers, and it has been well tested in that
environment. Newer environments such as Xwayland are the next step; if
something fails there and it is covered by the spec, that part of the spec
will definitely get implemented.

Thanks a lot for your comments; they have been very helpful!

Leo

On 05/11/2016 05:43 PM, Leo Liu wrote:

On 05/11/2016 05:37 PM, Axel Davy wrote:

On 11/05/2016 23:31, Leo Liu wrote:

On 05/11/2016 05:18 PM, Axel Davy wrote:

On 11/05/2016 23:08, Leo Liu wrote:
scrn->next_msc = ((int64_t)stamp - scrn->last_ust + 
scrn->ns_frame/2) /
+   scrn->ns_frame + scrn->last_msc; 

Could you explain this calculation ?

ns_frame is the duration of one vsync interval in ns.
last_ust is the time at which the last frame finished presentation.
last_msc is the MSC at which the last frame finished presentation.
stamp is the time at which the player expects the next frame to be presented.
The ns_frame/2 term rounds the result to the nearest frame.
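
The calculation explained above can be sketched as a standalone C
function. This is only an illustration: next_target_msc is a hypothetical
helper, and it assumes that stamp, last_ust and ns_frame are all expressed
in the same unit and that stamp is not earlier than last_ust.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper; parameter names follow the quoted patch line. */
static int64_t
next_target_msc(int64_t stamp, int64_t last_ust, int64_t last_msc,
                int64_t ns_frame)
{
   assert(ns_frame > 0);

   /* Adding ns_frame / 2 before the integer division rounds the number
    * of elapsed frames to the nearest whole frame instead of truncating. */
   return (stamp - last_ust + ns_frame / 2) / ns_frame + last_msc;
}
```

With ns_frame = 10, last_ust = 100 and last_msc = 5, a stamp of 124 maps
to MSC 7 and a stamp of 125 maps to MSC 8.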

thanks

I think it may get issues if ns_frame is wrong. For example for 
some reason (app hidden for some frame, or monitor shut, or 
whatever), I think we could get two buffers getting complete event 
with same ust (one skipped, and one shown).

It has worked very well so far for DRI2.

DRI3/Present are designed to allow more liberty for the server. It is 
expected in future version
the window managers will handle some of the events, and it may behave 
quite different than currently.

Besides DRI3/Present will likely be plugged into Xwayland (currently 
it uses the emulation code, but the extension can be implemented with 
wayland calls, I had a patch for that), and there again it is 
different behaviour.

Thus I advise against having code that 'works well in practice' but 
can fail with something that is allowed in the spec.

Agreed, this is initial work. we will keep making progress.

Thanks,
Leo

Thanks,
Leo

I think the calculation should be made more robust to issues with 
ns_frame. Perhaps do some temporal averaging of ns_frame and ignore 
outliers ?
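
One possible shape for that suggestion, as a sketch only: an exponential
moving average that ignores samples far from the current estimate. Nothing
here comes from the patch series; frame_time_filter, the 25% outlier
threshold and the 1/8 smoothing weight are all assumptions picked for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical filter state; not part of the patch series. */
struct frame_time_filter {
   int64_t ns_frame; /* current estimate, 0 until the first sample */
};

/* Feeds one measured frame duration into the filter and returns the
 * updated estimate.  Samples more than 25% away from the current
 * estimate are treated as outliers and ignored. */
static int64_t
frame_time_filter_update(struct frame_time_filter *f, int64_t sample_ns)
{
   assert(sample_ns > 0);

   if (f->ns_frame == 0) {
      f->ns_frame = sample_ns; /* the first sample seeds the estimate */
      return f->ns_frame;
   }

   int64_t diff = sample_ns - f->ns_frame;
   int64_t limit = f->ns_frame / 4;
   if (diff > limit || diff < -limit)
      return f->ns_frame; /* outlier: keep the old estimate */

   /* Exponential moving average with weight 1/8 for the new sample. */
   f->ns_frame += diff / 8;
   return f->ns_frame;
}
```

A filter like this never follows a genuine refresh-rate change, because
the new samples would all be rejected as outliers; a real implementation
would need some way to reset the estimate.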

Axel


[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2016-05-11 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95346

--- Comment #7 from Ilia Mirkin  ---
Replaying on nvc0 also brings up the same issue. So the current status is

llvmpipe: fail
nvc0: fail
radeonsi: fail (i assume, otherwise we wouldn't have this bug)
i965/SKL: success

Feels like a Gallium issue.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2016-05-11 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95346

--- Comment #6 from Michel Dänzer  ---
(In reply to Christopher W. Carpenter from comment #4)
> I'm not sure if this changes anything as far as assigning it to mesa core,
> but my laptop running the i915 driver with mesa 11.1.3 does not exhibit the
> rendering issue.

It doesn't change anything, we're using the Mesa core component for
driver-independent Gallium issues as well.


> (would an API trace from that laptop matter at all?)

It would be interesting (in particular, whether replaying that trace on a
Gallium driver also reproduces the problem), yes.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

[Mesa-dev] [PATCH] docs: Mark GL_OES_shader_io_blocks as started

2016-05-11 Thread Ian Romanick

From: Ian Romanick 

Watch the oes_shader_io_blocks branch of my fd.o Mesa Git repo for progress.

Signed-off-by: Ian Romanick 
---
 docs/GL3.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index e2dabea..dce6421 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -265,7 +265,7 @@ GLES3.2, GLSL ES 3.2
   GL_OES_sample_shading DONE (nvc0, r600, 
radeonsi)
   GL_OES_sample_variables   DONE (nvc0, r600, 
radeonsi)
   GL_OES_shader_image_atomicDONE (all drivers that 
support GL_ARB_shader_image_load_store)
-  GL_OES_shader_io_blocks   not started (based on 
parts of GLSL 1.50, which is done)
+  GL_OES_shader_io_blocks   started (idr)
   GL_OES_shader_multisample_interpolation   DONE (nvc0, r600, 
radeonsi)
   GL_OES_tessellation_shadernot started (based on 
GL_ARB_tessellation_shader, which is done for some drivers)
   GL_OES_texture_border_clamp   DONE (all drivers)
-- 
2.5.5


[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2016-05-11 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95346

Médéric Boquien  changed:

   What|Removed |Added

 CC||mboqu...@free.fr

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [Mesa-dev] Add OpenSWR to GL3.txt?

2016-05-11 Thread Andrew J

In lieu of GL3.txt being updated, is there some way for me to get an
idea about what OpenSWR supports without digging through the code /
building tests?

FWIW, as an outsider I use mesamatrix to get a very nice overview on
what the different renderers support. I consider that valuable.

On Wed, May 11, 2016 at 9:37 AM, Ilia Mirkin  wrote:
> It is whatever you (i.e. driver maintainer) want it to be. GL3.txt is
> mainly for coordinating development and letting people know who's
> working on what (less so of late though). If you plan on exposing GL
> 4.0+, it can be a nice TODO list. Otherwise there's not an immense
> amount of value.
>
>   -ilia
>
> On Wed, May 11, 2016 at 12:28 PM, Rowley, Timothy O
>  wrote:
>> What are the criteria for marking an extension “done”?  Passing some
>> percentage (all?) of relevant piglit tests?
>>
>> -Tim
>>
>>> On May 10, 2016, at 10:31 PM, Andrew J  wrote:
>>>
>>> Is there any possibility that OpenSWR can be added to GL3.txt [1] so
>>> others can get an idea of what things OpenSWR supports?
>>>
>>> GL3.txt is what mesamatrix [2] uses, so adding OpenSWR to GL3.txt
>>> would add it there as well.
>>>
>>> [1] https://cgit.freedesktop.org/mesa/mesa/tree/docs/GL3.txt
>>> [2] http://mesamatrix.net
>>

Re: [Mesa-dev] [PATCH] i965/blorp: Special-case the clear color in MSAA resolves

2016-05-11 Thread Jason Ekstrand

I need to rescind this patch.  I thought it worked but it's far more
half-baked than I realized.  There's some issue with swizzles interacting
with the clear color. :-(
--Jason

On Tue, May 10, 2016 at 9:45 PM, Jason Ekstrand 
wrote:

> The current MSAA resolve code has a special-case for if the MCS value is 0.
> In this case we can only sample once because we know that all values are in
> slice 0.  This commit adds a second optimization that detects the magic MCS
> value that indicates the clear color and grabs the color from a push
> constant and avoids sampling altogether.  On a microbenchmark written by
> Neil Roberts that tests resolving surfaces with just clear color, this
> improves performance by 60% for 8x, 40% for 4x, and 28% for 2x MSAA on my
> SKL gte3 laptop.  The benchmark can be found on the ML archive:
>
> https://lists.freedesktop.org/archives/mesa-dev/2016-February/108077.html
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.h|  4 +-
>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 72
> ++--
>  2 files changed, 71 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
> b/src/mesa/drivers/dri/i965/brw_blorp.h
> index 5f7569c..550c6c5 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.h
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
> @@ -197,7 +197,9 @@ struct brw_blorp_wm_push_constants
> uint32_t src_z;
>
> /* Pad out to an integral number of registers */
> -   uint32_t pad[5];
> +   uint32_t pad;
> +
> +   union gl_color_union clear_color;
>  };
>
>  #define BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS \
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> index 97e3908..314034e 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> @@ -346,6 +346,7 @@ struct brw_blorp_blit_vars {
>nir_variable *offset;
> } u_x_transform, u_y_transform;
> nir_variable *u_src_z;
> +   nir_variable *u_clear_color;
>
> /* gl_FragCoord */
> nir_variable *frag_coord;
> @@ -374,6 +375,7 @@ brw_blorp_blit_vars_init(nir_builder *b, struct
> brw_blorp_blit_vars *v,
> LOAD_UNIFORM(y_transform.multiplier, glsl_float_type())
> LOAD_UNIFORM(y_transform.offset, glsl_float_type())
> LOAD_UNIFORM(src_z, glsl_uint_type())
> +   LOAD_UNIFORM(clear_color, glsl_vec4_type())
>
>  #undef DECL_UNIFORM
>
> @@ -858,7 +860,8 @@ static nir_ssa_def *
>  blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def *pos,
> unsigned tex_samples,
> enum intel_msaa_layout tex_layout,
> -   enum brw_reg_type dst_type)
> +   enum brw_reg_type dst_type,
> +   struct brw_blorp_blit_vars *v)
>  {
> /* If non-null, this is the outer-most if statement */
> nir_if *outer_if = NULL;
> @@ -867,9 +870,53 @@ blorp_nir_manual_blend_average(nir_builder *b,
> nir_ssa_def *pos,
>nir_local_variable_create(b->impl, glsl_vec4_type(), "color");
>
> nir_ssa_def *mcs = NULL;
> -   if (tex_layout == INTEL_MSAA_LAYOUT_CMS)
> +   if (tex_layout == INTEL_MSAA_LAYOUT_CMS) {
>mcs = blorp_nir_txf_ms_mcs(b, pos);
>
> +  /* The MCS buffer stores a packed value that provides a mapping from
> +   * samples to array slices.  The magic value of all ones means that
> all
> +   * samples have the clear color.  In this case, we can
> short-circuit the
> +   * sampling process and just use the clear color that we pushed
> into the
> +   * shader.
> +   */
> +  nir_ssa_def *is_clear_color;
> +  switch (tex_samples) {
> +  case 2:
> + /* Empirical evidence suggests that the value returned from the
> +  * sampler is not always 0x3 for clear color so we need to mask
> it.
> +  */
> + is_clear_color =
> +nir_ieq(b, nir_iand(b, nir_channel(b, mcs, 0), nir_imm_int(b,
> 0x3)),
> +   nir_imm_int(b, 0x3));
> + break;
> +  case 4:
> + is_clear_color =
> +nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, 0xff));
> + break;
> +  case 8:
> + is_clear_color =
> +nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, ~0));
> + break;
> +  case 16:
> + is_clear_color =
> +nir_ior(b, nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b,
> ~0)),
> +   nir_ieq(b, nir_channel(b, mcs, 1), nir_imm_int(b,
> ~0)));
> + break;
> +  default:
> + unreachable("Invalid sample count");
> +  }
> +
> +  nir_if *if_stmt = nir_if_create(b->shader);
> +  if_stmt->condition = nir_src_for_ssa(is_clear_color);
> +  nir_cf_node_insert(b->cursor, &if_stmt->cf_node);
> +
> +  b->cursor = nir_after_cf_list(&if_stmt->then_list);
> +  nir_store_var(b, color, nir_load_var(b,

Re: [Mesa-dev] Add OpenSWR to GL3.txt?

2016-05-11 Thread Ilia Mirkin

https://people.freedesktop.org/~imirkin/glxinfo/glxinfo.html

When swr makes it into a release, I'll be sure to add it in. This
shows what extensions each hardware group has. Note that this isn't
necessarily 1:1 with driver - a single driver might support several
iterations of hardware, and provide different levels of support on
different pieces of hardware.

On Wed, May 11, 2016 at 7:28 PM, Andrew J  wrote:
> In lieu of GL3.txt being updated, is there some way for me to get an
> idea about what OpenSWR supports without digging through the code /
> building tests?
>
> FWIW, as an outsider I use mesamatrix to get a very nice overview on
> what the different renderers support. I consider that valuable.
>
> On Wed, May 11, 2016 at 9:37 AM, Ilia Mirkin  wrote:
>> It is whatever you (i.e. driver maintainer) want it to be. GL3.txt is
>> mainly for coordinating development and letting people know who's
>> working on what (less so of late though). If you plan on exposing GL
>> 4.0+, it can be a nice TODO list. Otherwise there's not an immense
>> amount of value.
>>
>>   -ilia
>>
>> On Wed, May 11, 2016 at 12:28 PM, Rowley, Timothy O
>>  wrote:
>>> What are the criteria for marking an extension “done”?  Passing some
>>> percentage (all?) of relevant piglit tests?
>>>
>>> -Tim
>>>
>>>> On May 10, 2016, at 10:31 PM, Andrew J  wrote:
>>>>
>>>> Is there any possibility that OpenSWR can be added to GL3.txt [1] so
>>>> others can get an idea of what things OpenSWR supports?
>>>>
>>>> GL3.txt is what mesamatrix [2] uses, so adding OpenSWR to GL3.txt
>>>> would add it there as well.
>>>>
>>>> [1] https://cgit.freedesktop.org/mesa/mesa/tree/docs/GL3.txt
>>>> [2] http://mesamatrix.net
>>>

Re: [Mesa-dev] Add OpenSWR to GL3.txt?

2016-05-11 Thread Roland Scheidegger

Typically, if you expose the cap bits so that support for some extension
is announced you'd consider it done.
That does not necessarily mean that all the corresponding piglit tests
are passing, but of course you should generally not announce support for
features which don't really work (but you might find it acceptable if
only some corner cases don't when flipping the switch for some cap bit).

Roland

Am 11.05.2016 um 18:28 schrieb Rowley, Timothy O:
> What are the criteria for marking an extension “done”?  Passing some
> percentage (all?) of relevant piglit tests?
> 
> -Tim
> 
>> On May 10, 2016, at 10:31 PM, Andrew J  wrote:
>>
>> Is there any possibility that OpenSWR can be added to GL3.txt [1] so
>> others can get an idea of what things OpenSWR supports?
>>
>> GL3.txt is what mesamatrix [2] uses, so adding OpenSWR to GL3.txt
>> would add it there as well.
>>
>> [1] https://cgit.freedesktop.org/mesa/mesa/tree/docs/GL3.txt
>> [2] http://mesamatrix.net
> 

[Mesa-dev] [PATCH 4/5] anv: Port L3 cache programming from i965

2016-05-11 Thread Jordan Justen

Signed-off-by: Jordan Justen 
---
 src/intel/vulkan/Makefile.sources  |   4 +
 src/intel/vulkan/anv_genX.h|   4 +-
 src/intel/vulkan/anv_pipeline.c|  33 ++-
 src/intel/vulkan/anv_private.h |  12 +-
 src/intel/vulkan/gen7_cmd_buffer.c |  94 +--
 src/intel/vulkan/gen8_cmd_buffer.c |  74 +
 src/intel/vulkan/genX_cmd_buffer.c |   2 +-
 src/intel/vulkan/genX_l3.c | 541 +
 src/intel/vulkan/genX_pipeline.c   |   2 +
 9 files changed, 593 insertions(+), 173 deletions(-)
 create mode 100644 src/intel/vulkan/genX_l3.c

diff --git a/src/intel/vulkan/Makefile.sources 
b/src/intel/vulkan/Makefile.sources
index 182c1e1..6c6b29d 100644
--- a/src/intel/vulkan/Makefile.sources
+++ b/src/intel/vulkan/Makefile.sources
@@ -70,6 +70,7 @@ VULKAN_GENERATED_FILES := \
 
 GEN7_FILES := \
genX_cmd_buffer.c \
+   genX_l3.c \
genX_pipeline.c \
gen7_cmd_buffer.c \
gen7_pipeline.c \
@@ -77,6 +78,7 @@ GEN7_FILES := \
 
 GEN75_FILES := \
genX_cmd_buffer.c \
+   genX_l3.c \
genX_pipeline.c \
gen7_cmd_buffer.c \
gen7_pipeline.c \
@@ -84,6 +86,7 @@ GEN75_FILES := \
 
 GEN8_FILES := \
genX_cmd_buffer.c \
+   genX_l3.c \
genX_pipeline.c \
gen8_cmd_buffer.c \
gen8_pipeline.c \
@@ -91,6 +94,7 @@ GEN8_FILES := \
 
 GEN9_FILES := \
genX_cmd_buffer.c \
+   genX_l3.c \
genX_pipeline.c \
gen8_cmd_buffer.c \
gen8_pipeline.c \
diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h
index 908a9e0..a5ec27d 100644
--- a/src/intel/vulkan/anv_genX.h
+++ b/src/intel/vulkan/anv_genX.h
@@ -42,8 +42,10 @@ void genX(cmd_buffer_set_subpass)(struct anv_cmd_buffer 
*cmd_buffer,
 void genX(flush_pipeline_select_3d)(struct anv_cmd_buffer *cmd_buffer);
 void genX(flush_pipeline_select_gpgpu)(struct anv_cmd_buffer *cmd_buffer);
 
+void genX(setup_pipeline_l3_config)(struct anv_pipeline *pipeline);
+
 void genX(cmd_buffer_config_l3)(struct anv_cmd_buffer *cmd_buffer,
-bool enable_slm);
+const struct anv_pipeline *pipeline);
 
 void genX(cmd_buffer_flush_state)(struct anv_cmd_buffer *cmd_buffer);
 void genX(cmd_buffer_flush_dynamic_state)(struct anv_cmd_buffer *cmd_buffer);
diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index fcaa450..b774e0c 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -801,10 +801,34 @@ anv_pipeline_compile_cs(struct anv_pipeline *pipeline,
return VK_SUCCESS;
 }
 
-static void
-gen7_compute_urb_partition(struct anv_pipeline *pipeline)
+
+void
+anv_setup_pipeline_l3_config(struct anv_pipeline *pipeline)
+{
+   const struct brw_device_info *devinfo = &pipeline->device->info;
+   switch (devinfo->gen) {
+   case 7:
+  if (devinfo->is_haswell)
+ gen75_setup_pipeline_l3_config(pipeline);
+  else
+ gen7_setup_pipeline_l3_config(pipeline);
+  break;
+   case 8:
+  gen8_setup_pipeline_l3_config(pipeline);
+  break;
+   case 9:
+  gen9_setup_pipeline_l3_config(pipeline);
+  break;
+   default:
+  unreachable("unsupported gen\n");
+   }
+}
+
+void
+anv_compute_urb_partition(struct anv_pipeline *pipeline)
 {
const struct brw_device_info *devinfo = &pipeline->device->info;
+
bool vs_present = pipeline->active_stages & VK_SHADER_STAGE_VERTEX_BIT;
unsigned vs_size = vs_present ?
   get_vs_prog_data(pipeline)->base.urb_entry_size : 1;
@@ -828,7 +852,7 @@ gen7_compute_urb_partition(struct anv_pipeline *pipeline)
unsigned chunk_size_bytes = 8192;
 
/* Determine the size of the URB in chunks. */
-   unsigned urb_chunks = devinfo->urb.size * 1024 / chunk_size_bytes;
+   unsigned urb_chunks = pipeline->urb.total_size * 1024 / chunk_size_bytes;
 
/* Reserve space for push constants */
unsigned push_constant_kb;
@@ -1196,7 +1220,8 @@ anv_pipeline_init(struct anv_pipeline *pipeline,
   assert(extra->disable_vs);
}
 
-   gen7_compute_urb_partition(pipeline);
+   anv_setup_pipeline_l3_config(pipeline);
+   anv_compute_urb_partition(pipeline);
 
const VkPipelineVertexInputStateCreateInfo *vi_info =
   pCreateInfo->pVertexInputState;
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index fb308eb..8d7a5ae 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -52,6 +52,8 @@ typedef struct xcb_connection_t xcb_connection_t;
 typedef uint32_t xcb_visualid_t;
 typedef uint32_t xcb_window_t;
 
+struct anv_l3_config;
+
 #define VK_PROTOTYPES
 #include 
 #include 
@@ -1158,7 +1160,7 @@ struct anv_attachment_state {
 struct anv_cmd_state {
/* PIPELINE_SELECT.PipelineSelection */
uint32_t current_pipeline;
-   uint32_t current_l3_config;
+   const struct anv_l3_config *

[Mesa-dev] [PATCH 1/5] genxml/hsw: Add L3 cache control registers

2016-05-11 Thread Jordan Justen

These were added to the i965 driver in
5912da45a69923afa1b7f2eb5bb371d848813c41.

Signed-off-by: Jordan Justen 
---
 src/intel/genxml/gen75.xml | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml
index 698d93f..2258dee 100644
--- a/src/intel/genxml/gen75.xml
+++ b/src/intel/genxml/gen75.xml
@@ -2932,4 +2932,12 @@
 
   
 
+  
+
+  
+
+  
+
+  
+
 
-- 
2.8.1


[Mesa-dev] [PATCH 2/5] anv: Keep track of whether the data cache should be enabled in L3

2016-05-11 Thread Jordan Justen

If images or shader buffers are used, we will enable the data cache in
the L3 config.
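
The rule stated above can be sketched in isolation as follows. This is not
the driver code: shader_info_sketch is a hypothetical stand-in for the two
counters the patch reads from nir->info, and both helper functions are
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters the patch reads from nir->info. */
struct shader_info_sketch {
   unsigned num_images;
   unsigned num_ssbos;
};

/* A single stage needs the data cache if it uses images or SSBOs. */
static bool
stage_needs_data_cache(const struct shader_info_sketch *info)
{
   assert(info != NULL);
   return info->num_images > 0 || info->num_ssbos > 0;
}

/* The pipeline-wide flag starts out false and is OR-ed across stages,
 * mirroring the |= accumulation in the patch. */
static bool
pipeline_needs_data_cache(const struct shader_info_sketch *stages,
                          unsigned count)
{
   bool needs = false;
   for (unsigned i = 0; i < count; i++)
      needs |= stage_needs_data_cache(&stages[i]);
   return needs;
}
```

Because the flag is only ever OR-ed, a single stage that uses an image or
a shader buffer is enough to turn the data cache on for the whole pipeline.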

Signed-off-by: Jordan Justen 
---
 src/intel/vulkan/anv_pipeline.c  | 8 +++-
 src/intel/vulkan/anv_private.h   | 1 +
 src/intel/vulkan/genX_pipeline.c | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 5800e68..fcaa450 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -331,8 +331,12 @@ anv_pipeline_compile(struct anv_pipeline *pipeline,
if (pipeline->layout && pipeline->layout->stage[stage].has_dynamic_offsets)
   prog_data->nr_params += MAX_DYNAMIC_BUFFERS * 2;
 
-   if (nir->info.num_images > 0)
+   if (nir->info.num_images > 0) {
   prog_data->nr_params += nir->info.num_images * BRW_IMAGE_PARAM_SIZE;
+  pipeline->needs_data_cache |= true;
+   }
+
+   pipeline->needs_data_cache |= nir->info.num_ssbos > 0;
 
if (prog_data->nr_params > 0) {
   /* XXX: I think we're leaking this */
@@ -1138,6 +1142,8 @@ anv_pipeline_init(struct anv_pipeline *pipeline,
 
pipeline->use_repclear = extra && extra->use_repclear;
 
+   pipeline->needs_data_cache = false;
+
/* When we free the pipeline, we detect stages based on the NULL status
 * of various prog_data pointers.  Make them NULL by default.
 */
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index d8a2194..fb308eb 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1403,6 +1403,7 @@ struct anv_pipeline {
struct anv_pipeline_bind_map bindings[MESA_SHADER_STAGES];
 
bool use_repclear;
+   bool needs_data_cache;
 
const struct brw_stage_prog_data *   prog_data[MESA_SHADER_STAGES];
uint32_t 
scratch_start[MESA_SHADER_STAGES];
diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 2328920..2a41b2d 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -74,6 +74,8 @@ genX(compute_pipeline_create)(
pipeline->active_stages = 0;
pipeline->total_scratch = 0;
 
+   pipeline->needs_data_cache = false;
+
assert(pCreateInfo->stage.stage == VK_SHADER_STAGE_COMPUTE_BIT);
ANV_FROM_HANDLE(anv_shader_module, module,  pCreateInfo->stage.module);
anv_pipeline_compile_cs(pipeline, cache, pCreateInfo, module,
-- 
2.8.1


[Mesa-dev] [PATCH 3/5] anv/gen7: Add memory barrier to vkCmdWaitEvents call

2016-05-11 Thread Jordan Justen

We also have this barrier call for gen8 vkCmdWaitEvents.

We don't implement waiting on events for gen7 yet, but this barrier at
least helps to not regress CTS cases when data caching is enabled.
Without this, the tests would intermittently report a failure when the
data cache was enabled.

Signed-off-by: Jordan Justen 
---
 src/intel/vulkan/gen7_cmd_buffer.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/intel/vulkan/gen7_cmd_buffer.c 
b/src/intel/vulkan/gen7_cmd_buffer.c
index 03ce889..479790e 100644
--- a/src/intel/vulkan/gen7_cmd_buffer.c
+++ b/src/intel/vulkan/gen7_cmd_buffer.c
@@ -547,4 +547,10 @@ void genX(CmdWaitEvents)(
 const VkImageMemoryBarrier* pImageMemoryBarriers)
 {
stub();
+
+   genX(CmdPipelineBarrier)(commandBuffer, srcStageMask, destStageMask,
+false, /* byRegion */
+memoryBarrierCount, pMemoryBarriers,
+bufferMemoryBarrierCount, pBufferMemoryBarriers,
+imageMemoryBarrierCount, pImageMemoryBarriers);
 }
-- 
2.8.1


[Mesa-dev] [PATCH 0/5] anv: Port L3 cache programming from i965

2016-05-11 Thread Jordan Justen

git://people.freedesktop.org/~jljusten/mesa anv-l3-v1

This series is related to this bug:

https://bugs.freedesktop.org/show_bug.cgi?id=94468

Since we have a work-around for that bug currently, this doesn't fix
it. It does allow us to remove the work-around though.

Running through jenkins, I see 1 consistent regression only on Haswell
for an image load/store test that uses compute shaders. Even with the
regression, I think it is better to merge this series.

Jordan Justen (5):
  genxml/hsw: Add L3 cache control registers
  anv: Keep track of whether the data cache should be enabled in L3
  anv/gen7: Add memory barrier to vkCmdWaitEvents call
  anv: Port L3 cache programming from i965
  Revert "HACK: Don't re-configure L3$ in render stages pre-BDW"

 src/intel/genxml/gen75.xml |   8 +
 src/intel/vulkan/Makefile.sources  |   4 +
 src/intel/vulkan/anv_genX.h|   4 +-
 src/intel/vulkan/anv_pipeline.c|  41 ++-
 src/intel/vulkan/anv_private.h |  13 +-
 src/intel/vulkan/gen7_cmd_buffer.c | 100 +--
 src/intel/vulkan/gen8_cmd_buffer.c |  74 +
 src/intel/vulkan/genX_cmd_buffer.c |  13 +-
 src/intel/vulkan/genX_l3.c | 541 +
 src/intel/vulkan/genX_pipeline.c   |   4 +
 10 files changed, 617 insertions(+), 185 deletions(-)
 create mode 100644 src/intel/vulkan/genX_l3.c

-- 
2.8.1


[Mesa-dev] [PATCH 5/5] Revert "HACK: Don't re-configure L3$ in render stages pre-BDW"

2016-05-11 Thread Jordan Justen

From: Jordan Justen 

This reverts commit 41af9b2e517dd0c17e519490ca915b96f6898390.
---
 src/intel/vulkan/genX_cmd_buffer.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index daa1884..b7c93bd 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -330,18 +330,7 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer 
*cmd_buffer)
 
assert((pipeline->active_stages & VK_SHADER_STAGE_COMPUTE_BIT) == 0);
 
-#if GEN_GEN >= 8
-   /* FIXME (jason): Currently, the config_l3 function causes problems on
-* Haswell and prior if you have a kernel older than 4.4.  In order to
-* work, it requires a couple of registers be white-listed in the
-* command parser and they weren't added until 4.4.  What we should do
-* is check the command parser version and make it a no-op if your
-* command parser is either off or too old.  Compute won't work 100%,
-* but at least 3-D will.  In the mean time, I'm going to make this
-* gen8+ only so that we can get Haswell working again.
-*/
genX(cmd_buffer_config_l3)(cmd_buffer, pipeline);
-#endif
 
genX(flush_pipeline_select_3d)(cmd_buffer);
 
-- 
2.8.1


Re: [Mesa-dev] [PATCH 09/18] gallium/radeon: use cs_check_space throughout

2016-05-11 Thread Marek Olšák

For patches 1-9:

Reviewed-by: Marek Olšák 

Marek

On Tue, May 10, 2016 at 1:21 AM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> ---
>  src/gallium/drivers/r300/r300_blit.c  | 2 +-
>  src/gallium/drivers/r300/r300_render.c| 2 +-
>  src/gallium/drivers/r600/r600_hw_context.c| 6 ++
>  src/gallium/drivers/radeon/r600_pipe_common.c | 2 +-
>  src/gallium/drivers/radeonsi/si_hw_context.c  | 5 ++---
>  5 files changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/src/gallium/drivers/r300/r300_blit.c 
> b/src/gallium/drivers/r300/r300_blit.c
> index b8cc316..2ee9b54 100644
> --- a/src/gallium/drivers/r300/r300_blit.c
> +++ b/src/gallium/drivers/r300/r300_blit.c
> @@ -382,7 +382,7 @@ static void r300_clear(struct pipe_context* pipe,
>  r300_get_num_cs_end_dwords(r300);
>
>  /* Reserve CS space. */
> -if (dwords > (r300->cs->max_dw - r300->cs->cdw)) {
> +if (!r300->rws->cs_check_space(r300->cs, dwords)) {
>  r300_flush(&r300->context, RADEON_FLUSH_ASYNC, NULL);
>  }
>
> diff --git a/src/gallium/drivers/r300/r300_render.c 
> b/src/gallium/drivers/r300/r300_render.c
> index 43860f3..ad0f489 100644
> --- a/src/gallium/drivers/r300/r300_render.c
> +++ b/src/gallium/drivers/r300/r300_render.c
> @@ -215,7 +215,7 @@ static boolean r300_reserve_cs_dwords(struct r300_context 
> *r300,
>  cs_dwords += r300_get_num_cs_end_dwords(r300);
>
>  /* Reserve requested CS space. */
> -if (cs_dwords > (r300->cs->max_dw - r300->cs->cdw)) {
> +if (!r300->rws->cs_check_space(r300->cs, cs_dwords)) {
>  r300_flush(&r300->context, RADEON_FLUSH_ASYNC, NULL);
>  flushed = TRUE;
>  }
> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
> b/src/gallium/drivers/r600/r600_hw_context.c
> index 0b36494..425cda4 100644
> --- a/src/gallium/drivers/r600/r600_hw_context.c
> +++ b/src/gallium/drivers/r600/r600_hw_context.c
> @@ -47,9 +47,7 @@ void r600_need_cs_space(struct r600_context *ctx, unsigned 
> num_dw,
> ctx->b.gtt = 0;
> ctx->b.vram = 0;
>
> -   /* The number of dwords we already used in the CS so far. */
> -   num_dw += ctx->b.gfx.cs->cdw;
> -
> +   /* Check available space in CS. */
> if (count_draw_in) {
> uint64_t mask;
>
> @@ -82,7 +80,7 @@ void r600_need_cs_space(struct r600_context *ctx, unsigned 
> num_dw,
> num_dw += 10;
>
> /* Flush if there's not enough space. */
> -   if (num_dw > ctx->b.gfx.cs->max_dw) {
> +   if (!ctx->b.ws->cs_check_space(ctx->b.gfx.cs, num_dw)) {
> ctx->b.gfx.flush(ctx, RADEON_FLUSH_ASYNC, NULL);
> }
>  }
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
> b/src/gallium/drivers/radeon/r600_pipe_common.c
> index feddb5c..8b76be0 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> @@ -143,7 +143,7 @@ void r600_need_dma_space(struct r600_common_context *ctx, 
> unsigned num_dw)
> ctx->gfx.flush(ctx, RADEON_FLUSH_ASYNC, NULL);
>
> /* Flush if there's not enough space. */
> -   if ((num_dw + ctx->dma.cs->cdw) > ctx->dma.cs->max_dw) {
> +   if (!ctx->ws->cs_check_space(ctx->dma.cs, num_dw)) {
> ctx->dma.flush(ctx, RADEON_FLUSH_ASYNC, NULL);
> assert((num_dw + ctx->dma.cs->cdw) <= ctx->dma.cs->max_dw);
> }
> diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c 
> b/src/gallium/drivers/radeonsi/si_hw_context.c
> index dcf206d..2dac824 100644
> --- a/src/gallium/drivers/radeonsi/si_hw_context.c
> +++ b/src/gallium/drivers/radeonsi/si_hw_context.c
> @@ -84,9 +84,8 @@ void si_need_cs_space(struct si_context *ctx)
> /* If the CS is sufficiently large, don't count the space needed
>  * and just flush if there is not enough space left.
>  */
> -   if (unlikely(cs->cdw > cs->max_dw - 2048 ||
> - (ce_ib && ce_ib->max_dw - ce_ib->cdw <
> -  si_ce_needed_cs_space(
> +   if (!ctx->b.ws->cs_check_space(cs, 2048) ||
> +   (ce_ib && !ctx->b.ws->cs_check_space(ce_ib, 
> si_ce_needed_cs_space(
> ctx->b.gfx.flush(ctx, RADEON_FLUSH_ASYNC, NULL);
>  }
>
> --
> 2.7.4
>

Re: [Mesa-dev] [PATCH v5] Add .mailmap

2016-05-11 Thread Jason Ekstrand

On Wed, May 11, 2016 at 2:16 PM, Kenneth Graunke 
wrote:

> On Wednesday, May 11, 2016 12:16:41 PM PDT Jason Ekstrand wrote:
> > Is there a reason this never got merged?  I'm up for just landing it now
> > and letting people fix up names as needed.
> > --Jason
>
> Sounds good to me.  Jason, why don't you go ahead and push it then?
>

Already did :-)


> --Ken
>

Re: [Mesa-dev] [PATCH 13/14] vl/dri3: implement functions for get and set timestamp

2016-05-11 Thread Leo Liu




On 05/11/2016 05:37 PM, Axel Davy wrote:

On 11/05/2016 23:31, Leo Liu wrote:



On 05/11/2016 05:18 PM, Axel Davy wrote:

On 11/05/2016 23:08, Leo Liu wrote:
scrn->next_msc = ((int64_t)stamp - scrn->last_ust + scrn->ns_frame/2) /
+   scrn->ns_frame + scrn->last_msc;


Could you explain this calculation ?

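ns_frame is the duration of one vsync interval in ns.
last_ust is the time at which the last frame finished presentation.
last_msc is the msc at which the last frame finished presentation.
stamp is the time at which the player expects the next frame to be presented.
Adding ns_frame/2 before the division rounds the result to the nearest frame.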

thanks


I think it may run into issues if ns_frame is wrong. For example, for some 
reason (app hidden for some frames, monitor turned off, or whatever), I 
think we could get two buffers receiving a complete event with the same ust 
(one skipped, and one shown).



It has worked very well so far for DRI2.



DRI3/Present are designed to allow more liberty for the server. It is 
expected that in future versions the window managers will handle some of 
the events, and they may behave quite differently than they do currently.


Besides DRI3/Present will likely be plugged into Xwayland (currently 
it uses the emulation code, but the extension can be implemented with 
wayland calls, I had a patch for that), and there again it is 
different behaviour.


Thus I advise against having code that 'works well in practice' but 
can fail with something that is allowed in the spec.


Agreed, this is initial work. We will keep making progress.

Thanks,
Leo




Thanks,
Leo

I think the calculation should be made more robust to issues with 
ns_frame. Perhaps do some temporal averaging of ns_frame and ignore 
outliers ?



Axel








Re: [Mesa-dev] [PATCH 13/14] vl/dri3: implement functions for get and set timestamp

2016-05-11 Thread Axel Davy


On 11/05/2016 23:31, Leo Liu wrote:



On 05/11/2016 05:18 PM, Axel Davy wrote:

On 11/05/2016 23:08, Leo Liu wrote:

scrn->next_msc = ((int64_t)stamp - scrn->last_ust + scrn->ns_frame/2) /
+   scrn->ns_frame + scrn->last_msc; 


Could you explain this calculation ?

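ns_frame is the duration of one vsync interval in ns.
last_ust is the time at which the last frame finished presentation.
last_msc is the msc at which the last frame finished presentation.
stamp is the time at which the player expects the next frame to be presented.
Adding ns_frame/2 before the division rounds the result to the nearest frame.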

thanks


I think it may run into issues if ns_frame is wrong. For example, for some 
reason (app hidden for some frames, monitor turned off, or whatever), I 
think we could get two buffers receiving a complete event with the same ust 
(one skipped, and one shown).



It has worked very well so far for DRI2.



DRI3/Present are designed to allow more liberty for the server. It is 
expected that in future versions the window managers will handle some of 
the events, and they may behave quite differently than they do currently.


Besides DRI3/Present will likely be plugged into Xwayland (currently it 
uses the emulation code, but the extension can be implemented with 
wayland calls, I had a patch for that), and there again it is different 
behaviour.


Thus I advise against having code that 'works well in practice' but can 
fail with something that is allowed in the spec.



Thanks,
Leo

I think the calculation should be made more robust to issues with 
ns_frame. Perhaps do some temporal averaging of ns_frame and ignore 
outliers ?



Axel






Re: [Mesa-dev] [PATCH 13/14] vl/dri3: implement functions for get and set timestamp

2016-05-11 Thread Leo Liu




On 05/11/2016 05:18 PM, Axel Davy wrote:

On 11/05/2016 23:08, Leo Liu wrote:

scrn->next_msc = ((int64_t)stamp - scrn->last_ust + scrn->ns_frame/2) /
+   scrn->ns_frame + scrn->last_msc; 


Could you explain this calculation ?

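ns_frame is the duration of one vsync interval in ns.
last_ust is the time at which the last frame finished presentation.
last_msc is the msc at which the last frame finished presentation.
stamp is the time at which the player expects the next frame to be presented.
Adding ns_frame/2 before the division rounds the result to the nearest frame.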
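
The calculation described above can be sketched as a standalone helper. This is only an illustration: estimate_next_msc() is a hypothetical name, not a function from the patch, and it assumes stamp is not earlier than last_ust (C division truncates toward zero, so earlier stamps would not round to nearest).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the expression in the patch; the name and
 * the standalone form are assumptions made for illustration only. */
static int64_t
estimate_next_msc(int64_t stamp, int64_t last_ust, int64_t last_msc,
                  int64_t ns_frame)
{
   assert(ns_frame > 0);

   /* (stamp - last_ust) is how far in the future the target time lies.
    * Adding ns_frame / 2 before dividing rounds the frame count to the
    * nearest whole frame (for non-negative deltas) instead of truncating. */
   return (stamp - last_ust + ns_frame / 2) / ns_frame + last_msc;
}
```

With a frame time of 16666667 (60 Hz), a stamp 40000000 units after last_ust is about 2.4 frames away and maps to last_msc + 2, while 42000000 is about 2.5 frames away and maps to last_msc + 3.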


I think it may run into issues if ns_frame is wrong. For example, for some 
reason (app hidden for some frames, monitor turned off, or whatever), I 
think we could get two buffers receiving a complete event with the same ust 
(one skipped, and one shown).



It has worked very well so far for DRI2.

Thanks,
Leo

I think the calculation should be made more robust to issues with 
ns_frame. Perhaps do some temporal averaging of ns_frame and ignore 
outliers ?



Axel




Re: [Mesa-dev] [PATCH 05/18] winsys/amdgpu: add amdgpu_cs_has_user_fence

2016-05-11 Thread Marek Olšák

On Tue, May 10, 2016 at 1:21 AM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> ---
>  src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
> b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> index 075d791..0bd776d 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> @@ -215,6 +215,11 @@ amdgpu_ctx_query_reset_status(struct radeon_winsys_ctx 
> *rwctx)
>
>  /* COMMAND SUBMISSION */
>
> +static bool amdgpu_cs_has_user_fence(struct amdgpu_cs_context *cs)
> +{
> +   return cs->request.ip_type != AMDGPU_HW_IP_UVD && cs->request.ip_type != 
> AMDGPU_HW_IP_VCE;

Bikeshedding: This would be nicer if it was on two lines as below.
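
A sketch of the two-line layout presumably meant here. The enum and struct definitions are stand-ins so the fragment compiles on its own; the real ones come from the amdgpu headers and the winsys, and the names below are shortened on purpose to avoid suggesting otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in definitions (assumptions for illustration); the real driver
 * uses AMDGPU_HW_IP_* and struct amdgpu_cs_context. */
enum { HW_IP_GFX, HW_IP_COMPUTE, HW_IP_DMA, HW_IP_UVD, HW_IP_VCE };

struct cs_request { unsigned ip_type; };
struct cs_context { struct cs_request request; };

/* Same test as in the patch, with one condition per line. */
static bool
cs_has_user_fence(const struct cs_context *cs)
{
   assert(cs);

   return cs->request.ip_type != HW_IP_UVD &&
          cs->request.ip_type != HW_IP_VCE;
}
```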

Marek

> +}
> +
>  static bool amdgpu_get_new_ib(struct radeon_winsys *ws, struct amdgpu_ib *ib,
>struct amdgpu_cs_ib_info *info, unsigned 
> ib_type)
>  {
> @@ -677,8 +682,7 @@ void amdgpu_cs_submit_ib(struct amdgpu_cs *acs)
> int i, r;
>
> cs->request.fence_info.handle = NULL;
> -   if (cs->request.ip_type != AMDGPU_HW_IP_UVD &&
> -   cs->request.ip_type != AMDGPU_HW_IP_VCE) {
> +   if (amdgpu_cs_has_user_fence(cs)) {
> cs->request.fence_info.handle = acs->ctx->user_fence_bo;
> cs->request.fence_info.offset = acs->ring_type;
> }
> @@ -735,8 +739,7 @@ void amdgpu_cs_submit_ib(struct amdgpu_cs *acs)
> } else {
>/* Success. */
>uint64_t *user_fence = NULL;
> -  if (cs->request.ip_type != AMDGPU_HW_IP_UVD &&
> -  cs->request.ip_type != AMDGPU_HW_IP_VCE)
> +  if (amdgpu_cs_has_user_fence(cs))
>   user_fence = acs->ctx->user_fence_cpu_address_base +
>cs->request.fence_info.offset;
>amdgpu_fence_submitted(cs->fence, &cs->request, user_fence);
> --
> 2.7.4
>

Re: [Mesa-dev] [PATCH 13/14] vl/dri3: implement functions for get and set timestamp

2016-05-11 Thread Axel Davy


On 11/05/2016 23:08, Leo Liu wrote:

scrn->next_msc = ((int64_t)stamp - scrn->last_ust + scrn->ns_frame/2) /
+   scrn->ns_frame + scrn->last_msc; 


Could you explain this calculation ?

I think it may run into issues if ns_frame is wrong. For example, for some 
reason (app hidden for some frames, monitor turned off, or whatever), I 
think we could get two buffers receiving a complete event with the same ust 
(one skipped, and one shown).



I think the calculation should be made more robust to issues with 
ns_frame. Perhaps do some temporal averaging of ns_frame and ignore 
outliers ?
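
One way the averaging idea could look, as a rough sketch only. The struct, the function name, the 25% outlier threshold and the 1/8 smoothing weight are all assumptions picked for illustration; nothing here comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical running estimate of the frame duration. */
struct frame_estimator {
   int64_t ns_frame;   /* smoothed frame duration, 0 until the first sample */
};

/* Feed the ust and msc deltas between two consecutive complete events. */
static void
frame_estimator_update(struct frame_estimator *est,
                       int64_t ust_delta, int64_t msc_delta)
{
   int64_t sample, diff;

   assert(est);

   /* Two completions with the same ust or msc (e.g. one skipped and one
    * shown) carry no information about the frame period, so ignore them. */
   if (ust_delta <= 0 || msc_delta <= 0)
      return;

   sample = ust_delta / msc_delta;

   /* The first usable sample seeds the estimate. */
   if (est->ns_frame == 0) {
      est->ns_frame = sample;
      return;
   }

   /* Ignore outliers: samples more than 25% away from the estimate. */
   diff = sample - est->ns_frame;
   if (diff < 0)
      diff = -diff;
   if (diff * 4 > est->ns_frame)
      return;

   /* Exponential moving average with a weight of 1/8 for the new sample. */
   est->ns_frame += (sample - est->ns_frame) / 8;
}
```

A known weakness of this sketch is that a bad first sample seeds the estimate and later good samples are then rejected as outliers; a real implementation would need some way to recover from that.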



Axel


Re: [Mesa-dev] [PATCH v5] Add .mailmap

2016-05-11 Thread Kenneth Graunke

On Wednesday, May 11, 2016 12:16:41 PM PDT Jason Ekstrand wrote:
> Is there a reason this never got merged?  I'm up for just landing it now
> and letting people fix up names as needed.
> --Jason

Sounds good to me.  Jason, why don't you go ahead and push it then?

--Ken


Re: [Mesa-dev] [PATCH 02/14] vl/dri3: implement dri3 screen create and destroy

2016-05-11 Thread Leo Liu




On 05/11/2016 04:20 PM, Axel Davy wrote:

On 11/05/2016 17:06, Leo Liu wrote:

Create the screen with the device fd returned from the X server, and
bail out to DRI2 under certain conditions.

Signed-off-by: Leo Liu 
---
  configure.ac  |  7 ++-
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 88 
++-

  2 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index 023110e..8c3960a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1779,7 +1779,12 @@ if test "x$enable_xvmc" = xyes -o \
  "x$enable_vdpau" = xyes -o \
  "x$enable_omx" = xyes -o \
  "x$enable_va" = xyes; then
-PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= 
$XCBDRI2_REQUIRED])

+if test x"$enable_dri3" = xyes; then
+PKG_CHECK_MODULES([VL], [xcb-dri3 xcb-present xcb-sync 
xshmfence >= $XSHMFENCE_REQUIRED
+ x11-xcb xcb xcb-dri2 >= 
$XCBDRI2_REQUIRED])

+else
+PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= 
$XCBDRI2_REQUIRED])

+fi
  need_gallium_vl_winsys=yes
  fi
  AM_CONDITIONAL(NEED_GALLIUM_VL_WINSYS, test 
"x$need_gallium_vl_winsys" = xyes)
diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c

index 2c3d3ae..c018379 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -25,7 +25,16 @@
   *
**/
  +#include 
+
+#include 
+#include 
+#include 
+
+#include "loader.h"
+
  #include "pipe/p_screen.h"
+#include "pipe-loader/pipe_loader.h"
#include "util/u_memory.h"
  #include "vl/vl_winsys.h"
@@ -33,6 +42,8 @@
  struct vl_dri3_screen
  {
 struct vl_screen base;
+   xcb_connection_t *conn;
+   xcb_drawable_t drawable;
  };
static void
@@ -82,7 +93,14 @@ vl_dri3_screen_get_private(struct vl_screen *vscreen)
  static void
  vl_dri3_screen_destroy(struct vl_screen *vscreen)
  {
-   /* TODO */
+   struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)vscreen;
+
+   assert(vscreen);
+
+   scrn->base.pscreen->destroy(scrn->base.pscreen);
+   pipe_loader_release(&scrn->base.dev, 1);
+   FREE(scrn);
+
 return;
  }
  @@ -90,6 +108,13 @@ struct vl_screen *
  vl_dri3_screen_create(Display *display, int screen)
  {
 struct vl_dri3_screen *scrn;
+   const xcb_query_extension_reply_t *extension;
+   xcb_dri3_open_cookie_t open_cookie;
+   xcb_dri3_open_reply_t *open_reply;
+   xcb_get_geometry_cookie_t geom_cookie;
+   xcb_get_geometry_reply_t *geom_reply;
+   int is_different_gpu;
+   int fd;
   assert(display);
  @@ -97,6 +122,58 @@ vl_dri3_screen_create(Display *display, int 
screen)

 if (!scrn)
return NULL;
  +   scrn->conn = XGetXCBConnection(display);
+   if (!scrn->conn)
+  goto free_screen;
+
+   xcb_prefetch_extension_data(scrn->conn , &xcb_dri3_id);
+   xcb_prefetch_extension_data(scrn->conn, &xcb_present_id);
+   extension = xcb_get_extension_data(scrn->conn, &xcb_dri3_id);
+   if (!(extension && extension->present))
+  goto free_screen;
+   extension = xcb_get_extension_data(scrn->conn, &xcb_present_id);
+   if (!(extension && extension->present))
+  goto free_screen;
+
+   open_cookie = xcb_dri3_open(scrn->conn, RootWindow(display, 
screen), None);

+   open_reply = xcb_dri3_open_reply(scrn->conn, open_cookie, NULL);
+   if (!open_reply)
+  goto free_screen;
+   if (open_reply->nfd != 1) {
+  free(open_reply);
+  goto free_screen;
+   }
+
+   fd = xcb_dri3_open_reply_fds(scrn->conn, open_reply)[0];
+   if (fd < 0) {
+  free(open_reply);
+  goto free_screen;
+   }
+   fcntl(fd, F_SETFD, FD_CLOEXEC);
+   free(open_reply);
+
+   fd = loader_get_user_preferred_fd(fd, &is_different_gpu);
+   /* TODO support different GPU */
+   if (is_different_gpu)
+  goto free_screen;
+
+   geom_cookie = xcb_get_geometry(scrn->conn, RootWindow(display, 
screen));

+   geom_reply = xcb_get_geometry_reply(scrn->conn, geom_cookie, NULL);
+   if (!geom_reply)
+  goto free_screen;
+   /* TODO support depth other than 24 */
+   if (geom_reply->depth != 24) {
+  free(geom_reply);
+  goto free_screen;
+   }
+   free(geom_reply);
+
+   if (pipe_loader_drm_probe_fd(&scrn->base.dev, fd))
+  scrn->base.pscreen = pipe_loader_create_screen(scrn->base.dev);
+
+   if (!scrn->base.pscreen)
+  goto release_pipe;
+
 scrn->base.destroy = vl_dri3_screen_destroy;
 scrn->base.texture_from_drawable = 
vl_dri3_screen_texture_from_drawable;

 scrn->base.get_dirty_area = vl_dri3_screen_get_dirty_area;
@@ -106,4 +183,13 @@ vl_dri3_screen_create(Display *display, int screen)
 scrn->base.pscreen->flush_frontbuffer = vl_dri3_flush_frontbuffer;
   return &scrn->base;
+
+release_pipe:
+   if (scrn->base.dev)
+  pipe_loader_release(&scrn->base.dev, 1);
+   fd = -1;
+   close(fd);


I assume this is a mistake ... or is close(-1) supposed to do something 
specific?


Good
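
For reference, close(-1) only fails with EBADF, so assigning -1 before the call leaks the descriptor if it was still owned at that point. A minimal sketch of the usual order, assuming the error path still owns fd (if ownership has already passed to the pipe loader, the close() should be dropped instead). close_and_invalidate() is a hypothetical helper, not something in the tree:

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: close a descriptor once, then mark it invalid so a
 * later cleanup path cannot close it a second time. */
static int
close_and_invalidate(int *fd)
{
   int ret = 0;

   assert(fd);

   /* Close first, while the value still names the real descriptor. */
   if (*fd >= 0)
      ret = close(*fd);

   /* Only now overwrite it with the "no descriptor" marker. */
   *fd = -1;
   return ret;
}
```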

Re: [Mesa-dev] [PATCH 13/14] vl/dri3: implement functions for get and set timestamp

2016-05-11 Thread Leo Liu




On 05/11/2016 04:16 PM, Axel Davy wrote:

Hi,

The Present extension has an option precisely for setting the target ust 
of a presentation: PresentOptionUST


Unfortunately, while it is in the spec it looks like the option is 
totally ignored, and thus it will be totally buggy (you are supposed 
to pass ust instead of msc...).


Exactly. We use ust instead of msc except for the first frame; I will 
answer further under the next question.




However PresentNotifyMSC should work well (assuming recent enough 
Xserver) and give you the current screen ust (and msc).


I see you use it when last_ust hasn't been filled in yet.  But why 
not use it all the time?


The player calls get timestamp before rendering and presentation, so for 
the first frame we have to use msc instead; after that we always use ust.

Do some apps assume it's the ust of the last presented buffer? The 
documentation of the vdpau function doesn't seem to say you should assume 
that.

Could you add a comment to explain your next_msc calculation ?

The player calculates the timestamp from the get-time result and vsync and 
passes it to us; we calculate the next msc from this stamp, the last msc, 
and the vsync duration. Basically, this and using ust (msc for the first 
frame) are exactly the same idea as in the existing vl/dri code, which 
works well for different players.


Thanks,
Leo


Axel

On 11/05/2016 17:06, Leo Liu wrote:

Signed-off-by: Leo Liu 
---
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 59 
+++

  1 file changed, 53 insertions(+), 6 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c

index f917e4b..d8e8319 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -79,6 +79,8 @@ struct vl_dri3_screen
 uint32_t send_msc_serial, recv_msc_serial;
 uint64_t send_sbc, recv_sbc;
 int64_t last_ust, ns_frame, last_msc, next_msc;
+
+   bool flushed;
  };
static void
@@ -467,19 +469,30 @@ vl_dri3_flush_frontbuffer(struct pipe_screen 
*screen,

 if (!back)
 return;
  +   if (scrn->flushed) {
+  while (scrn->special_event && scrn->recv_sbc < scrn->send_sbc)
+ if (!dri3_wait_present_events(scrn))
+return;
+   }
+
 xshmfence_reset(back->shm_fence);
 back->busy = true;
   xcb_present_pixmap(scrn->conn,
scrn->drawable,
back->pixmap,
-  0, 0, 0, 0, 0,
+  (uint32_t)(++scrn->send_sbc),
+  0, 0, 0, 0,
None, None,
back->sync_fence,
-  options, 0, 0, 0, 0, NULL);
+  options,
+  scrn->next_msc,
+  0, 0, 0, NULL);
   xcb_flush(scrn->conn);
  +   scrn->flushed = true;
+
 return;
  }
  @@ -494,6 +507,13 @@ vl_dri3_screen_texture_from_drawable(struct 
vl_screen *vscreen, void *drawable)

 if (!dri3_set_drawable(scrn, (Drawable)drawable))
return NULL;
  +   if (scrn->flushed) {
+  while (scrn->special_event && scrn->recv_sbc < scrn->send_sbc)
+ if (!dri3_wait_present_events(scrn))
+return NULL;
+   }
+   scrn->flushed = false;
+
 buffer = (scrn->is_pixmap) ?
  dri3_get_front_buffer(scrn) :
  dri3_get_back_buffer(scrn);
@@ -516,15 +536,42 @@ vl_dri3_screen_get_dirty_area(struct vl_screen 
*vscreen)

  static uint64_t
  vl_dri3_screen_get_timestamp(struct vl_screen *vscreen, void 
*drawable)

  {
-   /* TODO */
-   return 0;
+   struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)vscreen;
+
+   assert(scrn);
+
+   if (!dri3_set_drawable(scrn, (Drawable)drawable))
+  return 0;
+
+   if (!scrn->last_ust) {
+  xcb_present_notify_msc(scrn->conn,
+ scrn->drawable,
+ ++scrn->send_msc_serial,
+ 0, 0, 0);
+  xcb_flush(scrn->conn);
+
+  while (scrn->special_event &&
+ scrn->send_msc_serial > scrn->recv_msc_serial) {
+ if (!dri3_wait_present_events(scrn))
+return 0;
+  }
+   }
+
+   return scrn->last_ust;
  }
static void
  vl_dri3_screen_set_next_timestamp(struct vl_screen *vscreen, 
uint64_t stamp)

  {
-   /* TODO */
-   return;
+   struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)vscreen;
+
+   assert(scrn);
+
+   if (stamp && scrn->last_ust && scrn->ns_frame && scrn->last_msc)
+  scrn->next_msc = ((int64_t)stamp - scrn->last_ust + 
scrn->ns_frame/2) /

+   scrn->ns_frame + scrn->last_msc;
+   else
+  scrn->next_msc = 0;
  }
static void *






Re: [Mesa-dev] [PATCH 12/23] i965/fs: fix pull constant load component selection for doubles

2016-05-11 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> On Tue, 2016-05-10 at 21:06 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez  writes:
>> 
>> > 
>> > From: Iago Toral Quiroga 
>> > 
>> > UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4
>> > starting at a
>> > constant offset that is 16-byte aligned. If we need to access an
>> > unaligned
>> > offset we emit a load with an aligned offset and use the remaining
>> > constant
>> > offset to select the component into the vec4 result that we are
>> > interested
>> > in. This component must be computed in units of the type size,
>> > since that
>> > is what fs_reg::set_smear expects.
>> > 
>> > This patch does this change in the two places where we use this
>> > message:
>> > In demote_pull_constants when we lower uniform access with constant
>> > offset
>> > into the pull constant buffer and in UBO loads with constant
>> > offset.
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
>> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++-
>> >  2 files changed, 5 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > index 0e69be8..dff13ea 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads()
>> >   inst->src[i].file = VGRF;
>> >   inst->src[i].nr = dst.nr;
>> >   inst->src[i].reg_offset = 0;
>> > - inst->src[i].set_smear(pull_index & 3);
>> > + unsigned type_slots = MAX2(1, type_sz(inst->dst.type) /
>> > 4);
>> > + inst->src[i].set_smear((pull_index & 3) / type_slots);
>> >  
>> This cannot be right, why should we care what the destination type of
>> the instruction is while lowering a uniform source?  Also I don't
>> think
>> the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 < 1
>> you'll force type_slots to 1 and end up interpreting the pull_index
>> in
>> the wrong units.  How about:
>> 
>> > 
>> >   inst->src[i].set_smear((pull_index & 3) * 4 /
>> >  type_sz(inst->src[i].type));
>> > 
>
> OK
>
>> >   brw_mark_surface_used(prog_data, index);
>> >    }
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > index 4cd219a..532ca65 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const
> >> > fs_builder &bld, nir_intrinsic_instr *instr
>> >   bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
>> > packed_consts,
>> >    surf_index, const_offset_reg);
>> >  
>> > + unsigned component_base =
>> > +(const_offset->u32[0] % 16) / MAX2(1,
>> > type_sz(dest.type));
>> Rather than dividing by the type size only to let set_smear multiply
>> by
>> the type size again, I think it would be cleaner to do something
>> like:
>> 
>> > 
>> >   const fs_reg consts = byte_offset(packed_consts,
>> > const_offset->u32[0] % 16);
>> > 
>> >   for (unsigned i = 0; i < instr->num_components; i++) {
>> then here:
>> 
>> > 
>> >  bld.MOV(offset(dest, bld, i), component(consts, i));
>> and then remove the rest of the loop.
>> 
>
> I am having trouble adapting patch 13/23 to this approach because the
> following assert in component() is failing for some tests:
>     
>     assert(reg.subreg_offset == 0);
>

Ouch, that seems pretty broken, let's fix it (see attachment).

> consts.subreg_offset is not zero because of the byte_offset() call.
>
> So I prefer a mixed solution: keep the set_smear() usage, then:
>
>    bld.MOV(offset(dest, bld, i), packed_consts);
>
> and remove the rest of the loop.
>
> Sam
>
>> > 
>> > -packed_consts.set_smear(const_offset->u32[0] % 16 / 4
>> > + i);
>> > +packed_consts.set_smear(component_base + i);
>> >  
>> >  /* The std140 packing rules don't allow vectors to
>> > cross 16-byte
>> >   * boundaries, and a reg is 32 bytes.
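
To make the unit conversion in the suggested set_smear() expression concrete, here is a small sketch. smear_component() is a hypothetical helper, and it assumes pull_index counts 32-bit slots and that the type size is given in bytes, as the expression in the review implies.

```c
#include <assert.h>

/* Hypothetical helper: component within the loaded vec4, expressed in
 * units of the source type size, as set_smear() expects. */
static unsigned
smear_component(unsigned pull_index, unsigned type_size)
{
   assert(type_size > 0);

   /* (pull_index & 3) is the 32-bit slot within the 16-byte vec4.
    * Multiplying by 4 converts it to a byte offset, and dividing by the
    * type size converts that to a component index in type-size units. */
   return (pull_index & 3) * 4 / type_size;
}
```

For a 32-bit type, slot 2 stays component 2; for a 64-bit type, slot 2 starts at byte 8 and is therefore component 1.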

From 242fa33e55396630a8385794ffeab5ea6cb6462c Mon Sep 17 00:00:00 2001
From: Francisco Jerez 
Date: Wed, 11 May 2016 12:54:26 -0700
Subject: [PATCH] i965/fs: Fix and document component().

This fixes a number of bugs of component() by reimplementing it in
terms of horiz_offset(): Handling of base registers starting at a
non-zero subreg_offset, handling of strided registers and overflow of
subreg_offset into reg_offset.
---
 src/mesa/drivers/dri/i965/brw_ir_fs.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h b/src/mesa/drivers/dri/i965/brw_ir_fs.h
index e4f20f4..57ee816 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
@@ -122,12 +122,14 @@

[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2016-05-11 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95346

--- Comment #5 from Ilia Mirkin  ---
The trace replays fine on i965 (SKL) but incorrectly on llvmpipe, at commit 
2655265 as well as on 11.2.2.

BTW, those "used uninitialized" warnings are not always accurate. They can
refer to dead code, etc.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

Re: [Mesa-dev] [PATCH 8/8] radeonsi/sid_tables: rename reg_table to sid_reg_table

2016-05-11 Thread Bas Nieuwenhuizen

Working my way through the python took some time as I'm not that
familiar with python either, but the series is

Reviewed-by: Bas Nieuwenhuizen 

Also thanks for removing the designated array initializers.

- Bas

On Wed, May 11, 2016 at 10:13 PM, Marek Olšák  wrote:
> For the series:
>
> Reviewed-by: Marek Olšák 
>
> Except patch 6, which is:
>
> Acked-by: Marek Olšák 
>
> It's just too much python for me and I don't consider myself a python guy.
>
> Marek
>
>
> On Mon, May 9, 2016 at 6:32 PM, Nicolai Hähnle  wrote:
>> From: Nicolai Hähnle 
>>
>> This is purely cosmetic, making it easier to assign blame for space used
>> in the binary in case somebody else makes a similar cleanup effort in the
>> future.
>> ---
>>  src/gallium/drivers/radeonsi/si_debug.c| 4 ++--
>>  src/gallium/drivers/radeonsi/sid_tables.py | 2 +-
>>  2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
>> b/src/gallium/drivers/radeonsi/si_debug.c
>> index f7393d6..783dee4 100644
>> --- a/src/gallium/drivers/radeonsi/si_debug.c
>> +++ b/src/gallium/drivers/radeonsi/si_debug.c
>> @@ -185,8 +185,8 @@ static void si_dump_reg(FILE *file, unsigned offset, 
>> uint32_t value,
>>  {
>> int r, f;
>>
>> -   for (r = 0; r < ARRAY_SIZE(reg_table); r++) {
>> -   const struct si_reg *reg = &reg_table[r];
>> +   for (r = 0; r < ARRAY_SIZE(sid_reg_table); r++) {
>> +   const struct si_reg *reg = &sid_reg_table[r];
>> const char *reg_name = sid_strings + reg->name_offset;
>>
>> if (reg->offset == offset) {
>> diff --git a/src/gallium/drivers/radeonsi/sid_tables.py 
>> b/src/gallium/drivers/radeonsi/sid_tables.py
>> index 0ca24ae..7ba0215 100755
>> --- a/src/gallium/drivers/radeonsi/sid_tables.py
>> +++ b/src/gallium/drivers/radeonsi/sid_tables.py
>> @@ -262,7 +262,7 @@ struct si_packet3 {
>>  print '};'
>>  print
>>
>> -print 'static const struct si_reg reg_table[] = {'
>> +print 'static const struct si_reg sid_reg_table[] = {'
>>  for reg in regs:
>>  if len(reg.fields):
>>  print '\t{%s, %s, %s, %s},' % (strings.add(reg.name), 
>> reg.r_name,
>> --
>> 2.7.4
>>

[Mesa-dev] [PATCH] Integrate precise trig into configuration infrastructure

2016-05-11 Thread Gurchetan Singh

With this change, to enable precise SIN and COS instructions
on Intel hardware, one can put

   <option name="precise_trig" value="true" />

in the proper drirc file.

V2: Make option name more generic
---
 src/mesa/drivers/dri/common/xmlpool/t_options.h | 5 +
 src/mesa/drivers/dri/i965/brw_compiler.c| 2 --
 src/mesa/drivers/dri/i965/brw_context.c | 3 +++
 src/mesa/drivers/dri/i965/intel_screen.c| 2 ++
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/common/xmlpool/t_options.h 
b/src/mesa/drivers/dri/common/xmlpool/t_options.h
index e5cbc46..4b298a4 100644
--- a/src/mesa/drivers/dri/common/xmlpool/t_options.h
+++ b/src/mesa/drivers/dri/common/xmlpool/t_options.h
@@ -158,6 +158,11 @@ DRI_CONF_OPT_BEGIN_B(force_s3tc_enable, def) \
 DRI_CONF_DESC(en,gettext("Enable S3TC texture compression even if 
software support is not available")) \
 DRI_CONF_OPT_END
 
+#define DRI_CONF_PRECISE_TRIG(def) \
+DRI_CONF_OPT_BEGIN_B(precise_trig, def) \
+DRI_CONF_DESC(en,gettext("Prefer accuracy over performance in trig 
functions")) \
+DRI_CONF_OPT_END
+
 #define DRI_CONF_COLOR_REDUCTION_ROUND 0
 #define DRI_CONF_COLOR_REDUCTION_DITHER 1
 #define DRI_CONF_COLOR_REDUCTION(def) \
diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c 
b/src/mesa/drivers/dri/i965/brw_compiler.c
index 7c1b7e4..9ef7357 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.c
+++ b/src/mesa/drivers/dri/i965/brw_compiler.c
@@ -145,8 +145,6 @@ brw_compiler_create(void *mem_ctx, const struct 
brw_device_info *devinfo)
brw_fs_alloc_reg_sets(compiler);
brw_vec4_alloc_reg_set(compiler);
 
-   compiler->precise_trig = env_var_as_boolean("INTEL_PRECISE_TRIG", false);
-
compiler->scalar_stage[MESA_SHADER_VERTEX] =
   devinfo->gen >= 8 && !(INTEL_DEBUG & DEBUG_VEC4VS);
compiler->scalar_stage[MESA_SHADER_TESS_CTRL] =
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 26514a0..160f232 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -776,6 +776,9 @@ brw_process_driconf_options(struct brw_context *brw)
 
   brw->precompile = driQueryOptionb(&brw->optionCache, "shader_precompile");
 
+   brw->intelScreen->compiler->precise_trig =
+      driQueryOptionb(&brw->optionCache, "precise_trig");
+
ctx->Const.ForceGLSLExtensionsWarn =
   driQueryOptionb(options, "force_glsl_extensions_warn");
 
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index f9b5484..af8c4f4 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -65,6 +65,8 @@ DRI_CONF_BEGIN
DRI_CONF_SECTION_QUALITY
   DRI_CONF_FORCE_S3TC_ENABLE("false")
 
+  DRI_CONF_PRECISE_TRIG("false")
+
   DRI_CONF_OPT_BEGIN(clamp_max_samples, int, -1)
   DRI_CONF_DESC(en, "Clamp the value of GL_MAX_SAMPLES to the "
 "given integer. If negative, then do not clamp.")
-- 
2.1.2


Re: [Mesa-dev] [PATCH 12/59] i965: add brw_imm_df

2016-05-11 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> On 11/05/16 05:56, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez  writes:
>> 
>>> From: Connor Abbott 
>>>
>>> v2 (Iago)
>>>   - Fixup accessibility in backend_reg
>>>
>>> Signed-off-by: Iago Toral Quiroga 
>> 
>> I've just noticed (while running valgrind) that this patch causes
>> serious breakage in the back-end.  The reason is that the extra bits
>> required to make room for the df field of the union don't get
>> initialized in all codepaths, so backend_reg comparisons done using
>> memcmp() can basically return random results now.  Can you please look
>> into this?  Some ways to fix it would be to make sure we zero-initialize
>> the whole brw_reg in all cases (or at least the union padding), or stop
>> using memcmp() to compare registers -- I guess the latter might be
>> somewhat less intrusive and increase the likelihood that we can get this
>> sorted out timely.
>> 
>
> Attached is a patch for it, I initialized all union bits to zero before
> setting them in brw_reg(). Can you test it? If it is not fixed, would
> you mind sending me an example to run with valgrind here?
>
I'm afraid it's not fixed, I still see plenty of "Conditional jump or
move depends on uninitialised value(s)" errors while running pretty much
any piglit test on valgrind with the patch below applied.
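
To make the failure mode discussed above concrete, here is a minimal sketch of why memcmp() on a register-like struct can disagree with a field-wise comparison once the union grows a wider member. Everything named toy_* is hypothetical and only stands in for brw_reg; the `fill` argument simulates stale stack contents so the effect is deterministic instead of relying on truly uninitialized memory. It also assumes the usual compiler behavior that storing to the 4-byte member leaves the other union bytes alone.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the immediate union in struct brw_reg. */
struct toy_reg {
   unsigned type;
   union {
      double df;   /* the wide member makes the union 8 bytes */
      float f;
      int d;
      unsigned ud;
   } imm;
};

enum { TOY_TYPE_F = 1 };

/* Build a float immediate. `fill` plays the role of whatever bytes were in
 * memory before the constructor ran (0 models a zero-initialized register). */
static void
toy_init_imm_f(struct toy_reg *reg, float f, int fill)
{
   memset(reg, fill, sizeof(*reg));
   reg->type = TOY_TYPE_F;
   reg->imm.f = f;   /* only 4 of the 8 union bytes are written here */
}

/* Bytewise comparison: also sees padding and the unused half of the union. */
static bool
toy_equals_memcmp(const struct toy_reg *a, const struct toy_reg *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}

/* Field-wise comparison: only looks at the bytes the float really occupies. */
static bool
toy_equals_fields(const struct toy_reg *a, const struct toy_reg *b)
{
   return a->type == b->type &&
          memcmp(&a->imm.f, &b->imm.f, sizeof(float)) == 0;
}
```

Two registers built with the same value but different stale bytes compare equal field-wise and unequal with memcmp(), which matches the two fixes suggested above: zero the whole struct in every constructor, or stop comparing with memcmp().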

> I am thinking that maybe we want to change backend_reg::equals() if this
> doesn't work.
>
> Sam
>
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
>>>  src/mesa/drivers/dri/i965/brw_shader.h | 1 +
>>>  2 files changed, 10 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
>>> b/src/mesa/drivers/dri/i965/brw_reg.h
>>> index b84c709..6d51623 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_reg.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
>>> @@ -254,6 +254,7 @@ struct brw_reg {
>>>   unsigned pad1:1;
>>>};
>>>  
>>> +  double df;
>>>float f;
>>>int   d;
>>>unsigned ud;
>>> @@ -544,6 +545,14 @@ brw_imm_reg(enum brw_reg_type type)
>>>  
>>>  /** Construct float immediate register */
>>>  static inline struct brw_reg
>>> +brw_imm_df(double df)
>>> +{
>>> +   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_DF);
>>> +   imm.df = df;
>>> +   return imm;
>>> +}
>>> +
>>> +static inline struct brw_reg
>>>  brw_imm_f(float f)
>>>  {
>>> struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_F);
>>> diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
>>> b/src/mesa/drivers/dri/i965/brw_shader.h
>>> index fc228f6..f6f6167 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_shader.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_shader.h
>>> @@ -90,6 +90,7 @@ struct backend_reg : private brw_reg
>>> using brw_reg::width;
>>> using brw_reg::hstride;
>>>  
>>> +   using brw_reg::df;
>>> using brw_reg::f;
>>> using brw_reg::d;
>>> using brw_reg::ud;
>>> -- 
>>> 2.5.0
>>>
> From 35254624d63b77aa2024bc2b08612e28cae4bb98 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Samuel=20Iglesias=20Gons=C3=A1lvez?= 
> Date: Wed, 11 May 2016 07:44:10 +0200
> Subject: [PATCH] i965: initialize struct brw_reg's union bits to zero.
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Extra bits required to make room for the df field of the union don't get
> initialized in all codepaths, so backend_reg comparisons done using
> memcmp() can basically return random results.
>
> Initialize them to zero before setting the rest of union's fields.
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> Reported-by: Francisco Jerez 
> ---
>  src/mesa/drivers/dri/i965/brw_reg.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
> b/src/mesa/drivers/dri/i965/brw_reg.h
> index 6d51623..3b76d7d 100644
> --- a/src/mesa/drivers/dri/i965/brw_reg.h
> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
> @@ -338,6 +338,9 @@ brw_reg(enum brw_reg_file file,
> reg.subnr = subnr * type_sz(type);
> reg.nr = nr;
>  
> +   /* Initialize all union's bits to zero before setting them. */
> +   reg.df = 0;
> +
> /* Could do better: If the reg is r5.3<0;1,0>, we probably want to
>  * set swizzle and writemask to W, as the lower bits of subnr will
>  * be lost when converted to align16.  This is probably too much to
> -- 
> 2.5.0



Re: [Mesa-dev] [PATCH 02/14] vl/dri3: implement dri3 screen create and destroy

2016-05-11 Thread Axel Davy


On 11/05/2016 17:06, Leo Liu wrote:

Screen created with device fd returned from X server,
also will bail out to DRI2 with certain conditions.

Signed-off-by: Leo Liu 
---
  configure.ac  |  7 ++-
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 88 ++-
  2 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index 023110e..8c3960a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1779,7 +1779,12 @@ if test "x$enable_xvmc" = xyes -o \
  "x$enable_vdpau" = xyes -o \
  "x$enable_omx" = xyes -o \
  "x$enable_va" = xyes; then
-PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
+if test x"$enable_dri3" = xyes; then
+PKG_CHECK_MODULES([VL], [xcb-dri3 xcb-present xcb-sync xshmfence >= 
$XSHMFENCE_REQUIRED
+ x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
+else
+PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
+fi
  need_gallium_vl_winsys=yes
  fi
  AM_CONDITIONAL(NEED_GALLIUM_VL_WINSYS, test "x$need_gallium_vl_winsys" = xyes)
diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index 2c3d3ae..c018379 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -25,7 +25,16 @@
   *
   **/
  
+#include 

+
+#include 
+#include 
+#include 
+
+#include "loader.h"
+
  #include "pipe/p_screen.h"
+#include "pipe-loader/pipe_loader.h"
  
  #include "util/u_memory.h"

  #include "vl/vl_winsys.h"
@@ -33,6 +42,8 @@
  struct vl_dri3_screen
  {
 struct vl_screen base;
+   xcb_connection_t *conn;
+   xcb_drawable_t drawable;
  };
  
  static void

@@ -82,7 +93,14 @@ vl_dri3_screen_get_private(struct vl_screen *vscreen)
  static void
  vl_dri3_screen_destroy(struct vl_screen *vscreen)
  {
-   /* TODO */
+   struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)vscreen;
+
+   assert(vscreen);
+
+   scrn->base.pscreen->destroy(scrn->base.pscreen);
+   pipe_loader_release(&scrn->base.dev, 1);
+   FREE(scrn);
+
 return;
  }
  
@@ -90,6 +108,13 @@ struct vl_screen *

  vl_dri3_screen_create(Display *display, int screen)
  {
 struct vl_dri3_screen *scrn;
+   const xcb_query_extension_reply_t *extension;
+   xcb_dri3_open_cookie_t open_cookie;
+   xcb_dri3_open_reply_t *open_reply;
+   xcb_get_geometry_cookie_t geom_cookie;
+   xcb_get_geometry_reply_t *geom_reply;
+   int is_different_gpu;
+   int fd;
  
 assert(display);
  
@@ -97,6 +122,58 @@ vl_dri3_screen_create(Display *display, int screen)

 if (!scrn)
return NULL;
  
+   scrn->conn = XGetXCBConnection(display);

+   if (!scrn->conn)
+  goto free_screen;
+
+   xcb_prefetch_extension_data(scrn->conn , &xcb_dri3_id);
+   xcb_prefetch_extension_data(scrn->conn, &xcb_present_id);
+   extension = xcb_get_extension_data(scrn->conn, &xcb_dri3_id);
+   if (!(extension && extension->present))
+      goto free_screen;
+   extension = xcb_get_extension_data(scrn->conn, &xcb_present_id);
+   if (!(extension && extension->present))
+  goto free_screen;
+
+   open_cookie = xcb_dri3_open(scrn->conn, RootWindow(display, screen), None);
+   open_reply = xcb_dri3_open_reply(scrn->conn, open_cookie, NULL);
+   if (!open_reply)
+  goto free_screen;
+   if (open_reply->nfd != 1) {
+  free(open_reply);
+  goto free_screen;
+   }
+
+   fd = xcb_dri3_open_reply_fds(scrn->conn, open_reply)[0];
+   if (fd < 0) {
+  free(open_reply);
+  goto free_screen;
+   }
+   fcntl(fd, F_SETFD, FD_CLOEXEC);
+   free(open_reply);
+
+   fd = loader_get_user_preferred_fd(fd, &is_different_gpu);
+   /* TODO support different GPU */
+   if (is_different_gpu)
+  goto free_screen;
+
+   geom_cookie = xcb_get_geometry(scrn->conn, RootWindow(display, screen));
+   geom_reply = xcb_get_geometry_reply(scrn->conn, geom_cookie, NULL);
+   if (!geom_reply)
+  goto free_screen;
+   /* TODO support depth other than 24 */
+   if (geom_reply->depth != 24) {
+  free(geom_reply);
+  goto free_screen;
+   }
+   free(geom_reply);
+
+   if (pipe_loader_drm_probe_fd(&scrn->base.dev, fd))
+  scrn->base.pscreen = pipe_loader_create_screen(scrn->base.dev);
+
+   if (!scrn->base.pscreen)
+  goto release_pipe;
+
 scrn->base.destroy = vl_dri3_screen_destroy;
 scrn->base.texture_from_drawable = vl_dri3_screen_texture_from_drawable;
 scrn->base.get_dirty_area = vl_dri3_screen_get_dirty_area;
@@ -106,4 +183,13 @@ vl_dri3_screen_create(Display *display, int screen)
 scrn->base.pscreen->flush_frontbuffer = vl_dri3_flush_frontbuffer;
  
 return &scrn->base;

+
+release_pipe:
+   if (scrn->base.dev)
+      pipe_loader_release(&scrn->base.dev, 1);
+   fd = -1;
+   close(fd);


I assume there is a mistake here... or is close(-1) supposed to do 
something specific?


Axel

+free_screen:
+   FREE(scrn);
+   return

Re: [Mesa-dev] [PATCH 13/14] vl/dri3: implement functions for get and set timestamp

2016-05-11 Thread Axel Davy


Hi,

The Present extension has an option exactly for setting the target UST 
of the presentation: PresentOptionUST.

Unfortunately, while it is in the spec, it looks like the option is 
totally ignored, and thus it will be totally buggy (you are supposed to 
pass ust instead of msc...).

However, PresentNotifyMSC should work well (assuming a recent enough 
X server) and give you the current screen ust (and msc).

I see you use it when last_ust hasn't been filled already. But why not 
use it all the time?
Do some apps assume it's the ust of the last presented buffer? The doc 
of the vdpau function doesn't seem to say you should assume that.

Could you add a comment to explain your next_msc calculation?
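
As far as I can tell from the patch, the calculation converts the requested timestamp into a whole number of frames after the last known vblank, rounding to the nearest frame. A commented sketch of my reading follows; the function name compute_next_msc is hypothetical, and I assume stamp, last_ust and ns_frame are all expressed in the same time unit.

```c
#include <stdint.h>

/* Hypothetical restatement of the next_msc computation in the patch.
 * Returns 0 ("present as soon as possible") when any input is unknown. */
static int64_t
compute_next_msc(uint64_t stamp, int64_t last_ust,
                 int64_t ns_frame, int64_t last_msc)
{
   if (!stamp || !last_ust || !ns_frame || !last_msc)
      return 0;

   /* Time between the last known vblank and the requested presentation. */
   int64_t delta = (int64_t)stamp - last_ust;

   /* Adding half a frame before dividing rounds to the nearest frame
    * instead of truncating. */
   int64_t frames = (delta + ns_frame / 2) / ns_frame;

   /* Target vblank counter = last known counter + frames to wait. */
   return last_msc + frames;
}
```

For example, with a frame duration of 1000 and the last vblank at 10000 with msc 50, a stamp of 12400 targets msc 52 and a stamp of 12500 targets msc 53.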

Axel

On 11/05/2016 17:06, Leo Liu wrote:

Signed-off-by: Leo Liu 
---
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 59 +++
  1 file changed, 53 insertions(+), 6 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index f917e4b..d8e8319 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -79,6 +79,8 @@ struct vl_dri3_screen
 uint32_t send_msc_serial, recv_msc_serial;
 uint64_t send_sbc, recv_sbc;
 int64_t last_ust, ns_frame, last_msc, next_msc;
+
+   bool flushed;
  };
  
  static void

@@ -467,19 +469,30 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
 if (!back)
 return;
  
+   if (scrn->flushed) {

+  while (scrn->special_event && scrn->recv_sbc < scrn->send_sbc)
+ if (!dri3_wait_present_events(scrn))
+return;
+   }
+
 xshmfence_reset(back->shm_fence);
 back->busy = true;
  
 xcb_present_pixmap(scrn->conn,

scrn->drawable,
back->pixmap,
-  0, 0, 0, 0, 0,
+  (uint32_t)(++scrn->send_sbc),
+  0, 0, 0, 0,
None, None,
back->sync_fence,
-  options, 0, 0, 0, 0, NULL);
+  options,
+  scrn->next_msc,
+  0, 0, 0, NULL);
  
 xcb_flush(scrn->conn);
  
+   scrn->flushed = true;

+
 return;
  }
  
@@ -494,6 +507,13 @@ vl_dri3_screen_texture_from_drawable(struct vl_screen *vscreen, void *drawable)

 if (!dri3_set_drawable(scrn, (Drawable)drawable))
return NULL;
  
+   if (scrn->flushed) {

+  while (scrn->special_event && scrn->recv_sbc < scrn->send_sbc)
+ if (!dri3_wait_present_events(scrn))
+return NULL;
+   }
+   scrn->flushed = false;
+
 buffer = (scrn->is_pixmap) ?
  dri3_get_front_buffer(scrn) :
  dri3_get_back_buffer(scrn);
@@ -516,15 +536,42 @@ vl_dri3_screen_get_dirty_area(struct vl_screen *vscreen)
  static uint64_t
  vl_dri3_screen_get_timestamp(struct vl_screen *vscreen, void *drawable)
  {
-   /* TODO */
-   return 0;
+   struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)vscreen;
+
+   assert(scrn);
+
+   if (!dri3_set_drawable(scrn, (Drawable)drawable))
+  return 0;
+
+   if (!scrn->last_ust) {
+  xcb_present_notify_msc(scrn->conn,
+ scrn->drawable,
+ ++scrn->send_msc_serial,
+ 0, 0, 0);
+  xcb_flush(scrn->conn);
+
+  while (scrn->special_event &&
+ scrn->send_msc_serial > scrn->recv_msc_serial) {
+ if (!dri3_wait_present_events(scrn))
+return 0;
+  }
+   }
+
+   return scrn->last_ust;
  }
  
  static void

  vl_dri3_screen_set_next_timestamp(struct vl_screen *vscreen, uint64_t stamp)
  {
-   /* TODO */
-   return;
+   struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)vscreen;
+
+   assert(scrn);
+
+   if (stamp && scrn->last_ust && scrn->ns_frame && scrn->last_msc)
+  scrn->next_msc = ((int64_t)stamp - scrn->last_ust + scrn->ns_frame/2) /
+   scrn->ns_frame + scrn->last_msc;
+   else
+  scrn->next_msc = 0;
  }
  
  static void *




Re: [Mesa-dev] [PATCH 8/8] radeonsi/sid_tables: rename reg_table to sid_reg_table

2016-05-11 Thread Marek Olšák

For the series:

Reviewed-by: Marek Olšák 

Except patch 6, which is:

Acked-by: Marek Olšák 

It's just too much python for me and I don't consider myself a python guy.

Marek


On Mon, May 9, 2016 at 6:32 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> This is purely cosmetic, making it easier to assign blame for space used
> in the binary in case somebody else makes a similar cleanup effort in the
> future.
> ---
>  src/gallium/drivers/radeonsi/si_debug.c| 4 ++--
>  src/gallium/drivers/radeonsi/sid_tables.py | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
> b/src/gallium/drivers/radeonsi/si_debug.c
> index f7393d6..783dee4 100644
> --- a/src/gallium/drivers/radeonsi/si_debug.c
> +++ b/src/gallium/drivers/radeonsi/si_debug.c
> @@ -185,8 +185,8 @@ static void si_dump_reg(FILE *file, unsigned offset, 
> uint32_t value,
>  {
> int r, f;
>
> -   for (r = 0; r < ARRAY_SIZE(reg_table); r++) {
> -   const struct si_reg *reg = &reg_table[r];
> +   for (r = 0; r < ARRAY_SIZE(sid_reg_table); r++) {
> +   const struct si_reg *reg = &sid_reg_table[r];
> const char *reg_name = sid_strings + reg->name_offset;
>
> if (reg->offset == offset) {
> diff --git a/src/gallium/drivers/radeonsi/sid_tables.py 
> b/src/gallium/drivers/radeonsi/sid_tables.py
> index 0ca24ae..7ba0215 100755
> --- a/src/gallium/drivers/radeonsi/sid_tables.py
> +++ b/src/gallium/drivers/radeonsi/sid_tables.py
> @@ -262,7 +262,7 @@ struct si_packet3 {
>  print '};'
>  print
>
> -print 'static const struct si_reg reg_table[] = {'
> +print 'static const struct si_reg sid_reg_table[] = {'
>  for reg in regs:
>  if len(reg.fields):
>  print '\t{%s, %s, %s, %s},' % (strings.add(reg.name), reg.r_name,
> --
> 2.7.4
>

Re: [Mesa-dev] [PATCH 12/23] i965/fs: fix pull constant load component selection for doubles

2016-05-11 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> On Wed, 2016-05-11 at 17:12 +0200, Samuel Iglesias Gonsálvez wrote:
>> On Tue, 2016-05-10 at 21:06 -0700, Francisco Jerez wrote:
>> > 
>> > Samuel Iglesias Gonsálvez  writes:
>> > 
>> > > 
>> > > 
>> > > From: Iago Toral Quiroga 
>> > > 
>> > > UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 starting
>> > > at a constant offset that is 16-byte aligned. If we need to access an
>> > > unaligned offset we emit a load with an aligned offset and use the
>> > > remaining constant offset to select the component into the vec4 result
>> > > that we are interested in. This component must be computed in units of
>> > > the type size, since that is what fs_reg::set_smear expects.
>> > > 
>> > > This patch does this change in the two places where we use this
>> > > message: In demote_pull_constants when we lower uniform access with
>> > > constant offset into the pull constant buffer and in UBO loads with
>> > > constant offset.
>> > > ---
>> > >  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
>> > >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++-
>> > >  2 files changed, 5 insertions(+), 2 deletions(-)
>> > > 
>> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > index 0e69be8..dff13ea 100644
>> > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads()
>> > >   inst->src[i].file = VGRF;
>> > >   inst->src[i].nr = dst.nr;
>> > >   inst->src[i].reg_offset = 0;
>> > > - inst->src[i].set_smear(pull_index & 3);
>> > > + unsigned type_slots = MAX2(1, type_sz(inst->dst.type) /
>> > > 4);
>> > > + inst->src[i].set_smear((pull_index & 3) / type_slots);
>> > >  
>> > This cannot be right, why should we care what the destination type of
>> > the instruction is while lowering a uniform source?  Also I don't think
>> > the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 < 1
>> > you'll force type_slots to 1 and end up interpreting the pull_index in
>> > the wrong units.  How about:
>> > 
>> > > 
>> > > 
>> > >   inst->src[i].set_smear((pull_index & 3) * 4 /
>> > >  type_sz(inst->src[i].type));
>> > > 
>> OK
>> 
>> > 
>> > > 
>> > >   brw_mark_surface_used(prog_data, index);
>> > >    }
>> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > > index 4cd219a..532ca65 100644
>> > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > > @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const
>> > > fs_builder , nir_intrinsic_instr *instr
>> > >   bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
>> > > packed_consts,
>> > >    surf_index, const_offset_reg);
>> > >  
>> > > + unsigned component_base =
>> > > +(const_offset->u32[0] % 16) / MAX2(1,
>> > > type_sz(dest.type));
>> > Rather than dividing by the type size only to let set_smear
>> > multiply
>> > by
>> > the type size again, I think it would be cleaner to do something
>> > like:
>> > 
>> > > 
>> > > 
>> > >   const fs_reg consts = byte_offset(packed_consts,
>> > > const_offset->u32[0] % 16);
>> > > 
>> > >   for (unsigned i = 0; i < instr->num_components; i++) {
>> > then here:
>> > 
>> > > 
>> > > 
>> > >  bld.MOV(offset(dest, bld, i), component(consts, i));
>> > and then remove the rest of the loop.
>> > 
>> I am having trouble adapting patch 13/23 to this approach because the
>> following assert in component() is failing for some tests:
>>     
>>     assert(reg.subreg_offset == 0);
>> 
>> consts.subreg is not zero thanks to byte_offset() call.
>> 
>> So I prefer to go to a mixed solution: keep set_smear() usage, then:
>> 
>>    bld.MOV(offset(dest, bld, i), packed_consts);
>> 
>
> Looking at patch 13, offset(dest, bld, i) needs to be adjusted to save
> the remaining components, so I think the MOV is clearer as it is now
> than the proposed change.
>
I won't pretend I understand everything going on in PATCH 13 :P, but
feel free to keep the induction on the destination register if it turns
out to be convenient later on.
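
To illustrate the unit mismatch I was referring to in the set_smear() comment, here is a small standalone sketch comparing the two formulas. The function names are hypothetical, pull_index is assumed to count 32-bit dwords as in lower_constant_loads(), and the 2-byte case is purely hypothetical (it is only there to show where the MAX2 clamp goes wrong).

```c
/* Formula from the original patch: clamp "dwords per element" to at least 1,
 * then divide the dword index by it. */
static unsigned
smear_from_type_slots(unsigned pull_index, unsigned type_size)
{
   unsigned type_slots = type_size / 4 > 1 ? type_size / 4 : 1; /* MAX2(1, x) */
   return (pull_index & 3) / type_slots;
}

/* Suggested formula: convert the dword index to bytes, then to units of the
 * type size, which is what set_smear() expects. */
static unsigned
smear_from_type_size(unsigned pull_index, unsigned type_size)
{
   return (pull_index & 3) * 4 / type_size;
}
```

Both agree for 4-byte and 8-byte types; they only diverge for types narrower than a dword, where the clamp silently keeps dword units.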

> Sam
>
>> and remove the rest of the loop.
>> 
>> Sam
>> 
>> > 
>> > > 
>> > > 
>> > > -packed_consts.set_smear(const_offset->u32[0] % 16 /
>> > > 4
>> > > + i);
>> > > +packed_consts.set_smear(component_base + i);
>> > >  
>> > >  /* The std140 packing rules don't allow vectors to
>> > > cross 16-byte
>> > >   * boundaries, and a reg is 32 bytes.

Re: [Mesa-dev] [PATCH 11/15] i965: abort linking if we exhaust the registers

2016-05-11 Thread Kenneth Graunke

On Wednesday, May 11, 2016 1:19:01 PM PDT Juan A. Suarez Romero wrote:
> On Fri, 2016-04-29 at 14:23 +0200, Juan A. Suarez Romero wrote:
> > On Fri, 2016-04-29 at 11:15 +0200, Ian Romanick wrote:
> > > 
> > > The driver supports up to 16 vertex attributes.
> > > > 
> > > > ARB_vertex_attrib_64bit
> > > > states that attribute variables of type dvec3, dvec4, dmat2x3,
> > > > dmat2x4,
> > > > dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice
> > > > as
> > > > many attributes as equivalent single-precision types.
> > > > 
> > > > 
> > > > I highlight the may, because it is not mandatory. If we count
> > > > those
> > > > types as consuming the same as a single-precision type (which is
> > > > what
> > > > is happening in Mesa), we are consuming 15 attributes, so we are
> > > > under 
> > > > the limit.
> > > This is the thing we need to fix.  Bailing from deep inside the
> > > driver
> > > code generation (which may happen long, long after linking) is not
> > > allowed.  If a shader is not going to work, we are required to
> > > generate
> > > the error in glLinkProgram.
> > > 
> > I'm not sure if I am following you. In which cases there can be code
> > generation after the linking?
> > 
> > On the other side, the error is generated when calling
> > glLinkProgram():
> > it happens inside the stack that follows the call, when the NIR code
> > is
> > transformed in the intermediate BRW code.
> > 
> > So from user pov, the error happens when calling glLinkProgram(), not
> > afterwards.
> > 
> > 
> > > 
> > > > 
> > > > > 
> > > > > But I see couple of drawbacks with this approach:
> > > > >  
> > > > >  
> > > > > - There are tests that under the same conditions (less than the
> > > > limit
> > > > > 
> > > > > if you count those types as occupying the same as single-
> > > > precision, but
> > > > > 
> > > > > beyond the limit if those types are considered as consuming
> > > > twice) they
> > > > > 
> > > > > still works. An example is the attached shader2 test: it
> > > > > requires
> > > > 13
> > > > > 
> > > > > attributes (or 19 counting as twice the mentioned types) and it
> > > > works
> > > > > 
> > > > > fine.
> > > > > 
> > > > - This check affects to all the backends. And there could be some
> > > > backend that works perfectly fine with the current
> > > > implementation,
> > > > which is less conservative. In fact, we have an example: the same
> > > > driver running in vec4 mode (SIMD4x2) works perfectly fine.
> > > I think we can handle this by having a per-type (double, dvec2,
> > > dvec3,
> > > and dvec4) flag to select the double or don't-double behavior.
> > > 
> > Do you mean only dvec3 and dvec4 (and types based on those: dmat2x3,
> > dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4)?
> > 
> > Because only restrict to those types to count them as twice when
> > checking if we bypass the max attributes limit.
> > 
> > Also, do we really need a flag per type? Wouldn't it be enough to
> > have
> > a single flag stating if all of those types are counted twice or not?
> > 
> > Finally, I understand the idea is that the flag would be like a
> > boolean
> > with true (count twice) or false (don't count twice), contained in
> > each
> > backend/driver (like the max attribs limit, for instance).
> > 
> > 
> > Anyway, this would fix the second problem, but not the former: if our
> > gen8 backend counts doubles as twice, it will prevent to run some
> > shaders that otherway would work fine (like the shader2). If this is
> > an
> > acceptable solution (shaders that work fine now will be rejected
> > because they use too much vertex attribs), then it's fine.
> > 
> > 
> > J.A.
> 
> 
> CCing Kenneth as this is also related to patch 12.
> 
> 
> Summing up, we can:
> 
> -  Check registers exhaustion and URB read length and abort linking if
> we reach the limit. This is what patches 11 and 12 do.
> 
> - Or, we can count dvec3, dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4,
> dmat4x3, and dmat4 vertex attributes as consuming twice the equivalent
> single-precision types. Right now we are counting them as if they were
> single-precision (spec allows both options[1]). This would require to
> add some kind of flags to allow drivers to decide if they count them as
> one or two. But if we follow this option, some shaders in i965 that
> currently work because they neither exhaust registers nor reach the
> URB read length limit will fail to link if we count those mentioned
> double types twice.
> 
> So what to do?

I feel like the best approach is the second one (double counting
dvec[34] and dmat*x[34] types).  Although they only take up a single
VERTEX_ELEMENT structure, they do take up twice as much URB space
(256 bits instead of 128 bits).
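
As a rough sketch of what the per-driver choice could look like (all toy_* names are hypothetical, this is not existing Mesa code), the linker-side count would simply charge two slots for the wide double types when the driver asks for it:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified set of attribute types. */
enum toy_attr_type { TOY_VEC4, TOY_DVEC2, TOY_DVEC3, TOY_DVEC4 };

/* Count vertex attribute slots. When count_wide_doubles_twice is set,
 * dvec3/dvec4 (which need 256 bits of URB space) consume two slots. */
static unsigned
toy_count_attrib_slots(const enum toy_attr_type *attrs, size_t n,
                       bool count_wide_doubles_twice)
{
   unsigned slots = 0;

   for (size_t i = 0; i < n; i++) {
      bool wide = attrs[i] == TOY_DVEC3 || attrs[i] == TOY_DVEC4;
      slots += (wide && count_wide_doubles_twice) ? 2 : 1;
   }
   return slots;
}
```

With a made-up shader using 7 vec4 and 6 dvec4 inputs, this yields 13 slots with single counting and 19 with double counting, so against a limit of 16 the second mode would reject it at link time.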

As you said in your earlier email, we should only get into trouble
for shaders that use a lot of double attributes.  I doubt that we'll
see many of those in the real world.  Plus, we are specifically allowed
to do that, so conformance tests can't assume things

Re: [Mesa-dev] [PATCH 15/23] i965/fs: add SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA helper

2016-05-11 Thread Francisco Jerez

Iago Toral  writes:

> On Tue, 2016-05-10 at 19:10 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez  writes:
>> 
>> > From: Iago Toral Quiroga 
>> >
>> > There are a few places where we need to shuffle the result of a 32-bit load
>> > into valid 64-bit data, so extract this logic into a separate helper that
>> > we can reuse.
>> >
>> > Also, the shuffling needs to operate with WE_all set, which we were missing
>> > before, because we are changing the layout of the data across the various
>> > channels. Otherwise we will run into problems in non-uniform control-flow
>> > scenarios.
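
The data movement described above can be modelled on the CPU as follows. This is only a sketch under the assumption of 8 channels per register row; the toy_* names are hypothetical, and it says nothing about how the real helper emits MOVs or handles WE_all.

```c
#include <stdint.h>

#define TOY_CHANNELS 8

/* For each 64-bit component c, src row 2*c holds the low dwords of all
 * channels and src row 2*c+1 holds the high dwords. The result interleaves
 * them so that each channel's [low, high] pair is contiguous, i.e. a valid
 * 64-bit element. dst may be the same array as src. */
static void
toy_shuffle_32bit_to_64bit(uint32_t dst[][TOY_CHANNELS],
                           uint32_t src[][TOY_CHANNELS],
                           unsigned components)
{
   for (unsigned c = 0; c < components; c++) {
      uint32_t tmp[2 * TOY_CHANNELS];

      /* Gather through a temporary so in-place operation is safe. */
      for (unsigned ch = 0; ch < TOY_CHANNELS; ch++) {
         tmp[2 * ch]     = src[2 * c][ch];       /* low 32 bits  */
         tmp[2 * ch + 1] = src[2 * c + 1][ch];   /* high 32 bits */
      }

      /* Write the 16 interleaved dwords back across the two rows. */
      for (unsigned i = 0; i < 2 * TOY_CHANNELS; i++)
         dst[2 * c + i / TOY_CHANNELS][i % TOY_CHANNELS] = tmp[i];
   }
}
```

Calling it with components = 2 on four rows (x, y, z, w) produces the "x y x y ... / z w z w ..." layout from the helper's documentation.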
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 95 
>> > +---
>> >  src/mesa/drivers/dri/i965/brw_fs.h   |  5 ++
>> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 46 ++--
>> >  3 files changed, 73 insertions(+), 73 deletions(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
>> > b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > index dff13ea..709e4b8 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > @@ -216,39 +216,8 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const 
>> > fs_builder ,
>> >  
>> > vec4_result.type = dst.type;
>> >  
>> > -   /* Our VARYING_PULL_CONSTANT_LOAD reads a vector of 32-bit elements. 
>> > If we
>> > -* are reading doubles this means that we get this:
>> > -*
>> > -*  r0: x0 x0 x0 x0 x0 x0 x0 x0
>> > -*  r1: x1 x1 x1 x1 x1 x1 x1 x1
>> > -*  r2: y0 y0 y0 y0 y0 y0 y0 y0
>> > -*  r3: y1 y1 y1 y1 y1 y1 y1 y1
>> > -*
>> > -* Fix this up so we return valid double elements:
>> > -*
>> > -*  r0: x0 x1 x0 x1 x0 x1 x0 x1
>> > -*  r1: x0 x1 x0 x1 x0 x1 x0 x1
>> > -*  r2: y0 y1 y0 y1 y0 y1 y0 y1
>> > -*  r3: y0 y1 y0 y1 y0 y1 y0 y1
>> > -*/
>> > -   if (type_sz(dst.type) == 8) {
>> > -  int multiplier = bld.dispatch_width() / 8;
>> > -  fs_reg fixed_res =
>> > - fs_reg(VGRF, alloc.allocate(2 * multiplier), 
>> > BRW_REGISTER_TYPE_F);
>> > -  /* We only have 2 doubles in a 32-bit vec4 */
>> > -  for (int i = 0; i < 2; i++) {
>> > - fs_reg vec4_float =
>> > -horiz_offset(retype(vec4_result, BRW_REGISTER_TYPE_F),
>> > - multiplier * 16 * i);
>> > -
>> > - bld.MOV(stride(fixed_res, 2), vec4_float);
>> > - bld.MOV(stride(horiz_offset(fixed_res, 1), 2),
>> > - horiz_offset(vec4_float, 8 * multiplier));
>> > -
>> > - bld.MOV(horiz_offset(vec4_result, multiplier * 8 * i),
>> > - retype(fixed_res, BRW_REGISTER_TYPE_DF));
>> > -  }
>> > -   }
>> > +   if (type_sz(dst.type) == 8)
>> > +  SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(bld, vec4_result, 
>> > vec4_result, 2);
>> >  
>> > int type_slots = MAX2(type_sz(dst.type) / 4, 1);
>> > bld.MOV(dst, offset(vec4_result, bld,
>> > @@ -256,6 +225,66 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
>> >  }
>> >  
>> >  /**
>> > + * This helper takes the result of a load operation that reads 32-bit 
>> > elements
>> > + * in this format:
>> > + *
>> > + * x x x x x x x x
>> > + * y y y y y y y y
>> > + * z z z z z z z z
>> > + * w w w w w w w w
>> > + *
>> > + * and shuffles the data to get this:
>> > + *
>> > + * x y x y x y x y
>> > + * x y x y x y x y
>> > + * z w z w z w z w
>> > + * z w z w z w z w
>> > + *
>> > + * Which is exactly what we want if the load is reading 64-bit components
>> > + * like doubles, where x represents the low 32-bit of the x double 
>> > component
>> > + * and y represents the high 32-bit of the x double component (likewise 
>> > with
>> > + * z and w for double component y). The parameter @components represents
>> > + * the number of 64-bit components present in @src. This would typically 
>> > be
>> > + * 2 at most, since we can only fit 2 double elements in the result of a
>> > + * vec4 load.
>> > + *
>> > + * Notice that @dst and @src can be the same register.
>> > + */
>> > +void
>> > +fs_visitor::SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(const fs_builder &bld,
>> 
>> I don't see any reason to make this an fs_visitor method.  Declare this
>> as a static function local to brw_fs_nir.cpp, which should improve
>> encapsulation and reduce the amount of boilerplate.  Also please don't
>> write it in capitals unless you want people to shout the name of your
>> function while discussing it out loud. ;)
>> 
>> > +const fs_reg dst,
>> > +const fs_reg src,
>> > +uint32_t components)
>> > +{
>> > +   int multiplier = bld.dispatch_width() / 8;
>> 
>> This definition is redundant with the changes below taken into account.
>> 
>> > +
>> > +   /* A temporary that we will use to shuffle the 32-bit data of each
>>
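
To make the layout change described in the helper's comment concrete, here is a minimal sketch in plain C that models each register row as an array of 32-bit values. It is not the i965 code: the name shuffle_32bit_to_64bit, the flat-array model and the MAX_WIDTH limit are assumptions made only for illustration. A temporary buffer is used so that dst and src may alias, which the patch's comment says is allowed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_WIDTH 32  /* hypothetical upper bound on the dispatch width */

/* src holds 2 * components rows of `width` 32-bit values each:
 * row 2*c carries the low dwords and row 2*c+1 the high dwords of
 * 64-bit component c. On return, dst holds the same data with the
 * low/high dwords interleaved per channel: lo0 hi0 lo1 hi1 ...
 */
static void
shuffle_32bit_to_64bit(uint32_t *dst, const uint32_t *src,
                       unsigned width, unsigned components)
{
   uint32_t tmp[2 * MAX_WIDTH];

   assert(width <= MAX_WIDTH);

   for (unsigned c = 0; c < components; c++) {
      const uint32_t *lo = src + (2 * c) * width;
      const uint32_t *hi = src + (2 * c + 1) * width;

      /* Build the interleaved pair of rows in the temporary first, so
       * that writing dst cannot clobber data we still need to read.
       */
      for (unsigned ch = 0; ch < width; ch++) {
         tmp[2 * ch] = lo[ch];
         tmp[2 * ch + 1] = hi[ch];
      }

      memcpy(dst + (2 * c) * width, tmp, 2 * width * sizeof(uint32_t));
   }
}
```

With width 8 and components 2 this turns the four rows x, y, z, w into the "x y x y ..." and "z w z w ..." rows shown in the comment. The real helper does the same thing with strided MOVs on GPU registers instead of array copies.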

[Mesa-dev] [PATCH 01/11] nir: Add new 'plane' texture source type

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

This will be used to select the plane to sample from for planar
textures.

Reviewed-by: Jason Ekstrand 

---
 src/compiler/nir/nir.h   | 1 +
 src/compiler/nir/nir_print.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 20927a2..daf91be 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1072,6 +1072,7 @@ typedef enum {
nir_tex_src_ddy,
nir_tex_src_texture_offset, /* < dynamically uniform indirect offset */
nir_tex_src_sampler_offset, /* < dynamically uniform indirect offset */
+   nir_tex_src_plane,  /* < selects plane for planar textures */
nir_num_tex_src_types
 } nir_tex_src_type;
 
diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
index a36561e..090070a 100644
--- a/src/compiler/nir/nir_print.c
+++ b/src/compiler/nir/nir_print.c
@@ -688,6 +688,9 @@ print_tex_instr(nir_tex_instr *instr, print_state *state)
   case nir_tex_src_sampler_offset:
  fprintf(fp, "(sampler_offset)");
  break;
+  case nir_tex_src_plane:
+ fprintf(fp, "(plane)");
+ break;
 
   default:
  unreachable("Invalid texture source type");
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 07/11] i965: Create multiple miptrees for planar YUV images

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c |  3 ++
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  5 +++
 src/mesa/drivers/dri/i965/intel_tex_image.c   | 49 ++-
 src/mesa/drivers/dri/i965/intel_tex_obj.h |  2 ++
 4 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 94f6333..e2405cb 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -990,6 +990,9 @@ intel_miptree_release(struct intel_mipmap_tree **mt)
   intel_miptree_release(&(*mt)->mcs_mt);
   intel_resolve_map_clear(&(*mt)->hiz_map);
 
+  intel_miptree_release(&(*mt)->plane[0]);
+  intel_miptree_release(&(*mt)->plane[1]);
+
   for (i = 0; i < MAX_TEXTURE_LEVELS; i++) {
 free((*mt)->level[i].slice);
   }
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 7862152..9ab4b23 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -625,6 +625,11 @@ struct intel_mipmap_tree
struct intel_mipmap_tree *mcs_mt;
 
/**
+* Planes 1 and 2 in case this is a planar surface.
+*/
+   struct intel_mipmap_tree *plane[2];
+
+   /**
 * Fast clear state for this buffer.
 */
enum intel_fast_clear_state fast_clear_state;
diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 4d20a86..a58edf2 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -162,6 +162,47 @@ intel_set_texture_image_mt(struct brw_context *brw,
intel_miptree_reference(&intel_texobj->mt, mt);
 }
 
+static struct intel_mipmap_tree *
+create_mt_for_planar_dri_image(struct brw_context *brw,
+   GLenum target, __DRIimage *image)
+{
+   struct intel_image_format *f = image->planar_format;
+   struct intel_mipmap_tree *planar_mt;
+
+   for (int i = 0; i < f->nplanes; i++) {
+  const int index = f->planes[i].buffer_index;
+  const uint32_t dri_format = f->planes[i].dri_format;
+  const mesa_format format = driImageFormatToGLFormat(dri_format);
+  const uint32_t width = image->width >> f->planes[i].width_shift;
+  const uint32_t height = image->height >> f->planes[i].height_shift;
+
+  /* Disable creation of the texture's aux buffers because the driver
+   * exposes no EGL API to manage them. That is, there is no API for
+   * resolving the aux buffer's content to the main buffer nor for
+   * invalidating the aux buffer's content.
+   */
+  struct intel_mipmap_tree *mt =
+ intel_miptree_create_for_bo(brw, image->bo, format,
+ image->offsets[index],
+ width, height, 1,
+ image->strides[index],
+ MIPTREE_LAYOUT_DISABLE_AUX);
+  if (mt == NULL)
+ return NULL;
+
+  mt->target = target;
+  mt->total_width = width;
+  mt->total_height = height;
+
+  if (i == 0)
+ planar_mt = mt;
+  else
+ planar_mt->plane[i - 1] = mt;
+   }
+
+   return planar_mt;
+}
+
 /**
  * Binds a BO to a texture image, as if it was uploaded by glTexImage2D().
  *
@@ -348,10 +389,16 @@ intel_image_target_texture_2d(struct gl_context *ctx, 
GLenum target,
   return;
}
 
-   mt = create_mt_for_dri_image(brw, target, image);
+   if (image->planar_format && image->planar_format->nplanes > 0)
+  mt = create_mt_for_planar_dri_image(brw, target, image);
+   else
+  mt = create_mt_for_dri_image(brw, target, image);
if (mt == NULL)
   return;
 
+   struct intel_texture_object *intel_texobj = intel_texture_object(texObj);
+   intel_texobj->dri_image = image;
+
intel_set_texture_image_mt(brw, texImage, mt);
intel_miptree_release(&mt);
 }
diff --git a/src/mesa/drivers/dri/i965/intel_tex_obj.h 
b/src/mesa/drivers/dri/i965/intel_tex_obj.h
index 750e4c3..ad78570 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_obj.h
+++ b/src/mesa/drivers/dri/i965/intel_tex_obj.h
@@ -58,6 +58,8 @@ struct intel_texture_object
 * since the mt is shared across views with differing formats.
 */
mesa_format _Format;
+
+   struct __DRIimageRec *dri_image;
 };
 
 
-- 
2.5.0
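
The per-plane size computation in create_mt_for_planar_dri_image() is a right shift of the image size by the plane's subsampling shift. A small stand-alone sketch of that arithmetic follows. The struct and function names are hypothetical and not part of the driver.

```c
#include <stdint.h>

struct plane_extent {
   uint32_t width;
   uint32_t height;
};

/* Size of one plane of a planar image, given the log2 subsampling
 * factors of that plane (0 means full resolution).
 */
static struct plane_extent
compute_plane_extent(uint32_t image_width, uint32_t image_height,
                     unsigned width_shift, unsigned height_shift)
{
   struct plane_extent e;

   e.width = image_width >> width_shift;
   e.height = image_height >> height_shift;
   return e;
}
```

For a 1920x1080 image whose chroma planes use shifts of 1 in both directions, this yields 960x540 chroma planes. Note that the shift rounds down, so odd image dimensions lose the last partial sample. Whether that matters for the formats handled here is not something the patch discusses.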

[Mesa-dev] [PATCH 03/11] nir: Add a lowering pass for YUV textures

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

This lowers sampling from YUV textures to 1) one or more texture
instructions to sample each plane and 2) color space conversion to RGB.

Reviewed-by: Jason Ekstrand 

---
 src/compiler/nir/nir.h   |   7 +++
 src/compiler/nir/nir_lower_tex.c | 119 +++
 2 files changed, 126 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index daf91be..b07d6be 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2328,6 +2328,13 @@ typedef struct nir_lower_tex_options {
bool lower_rect;
 
/**
+* If true, convert yuv to rgb.
+*/
+   unsigned lower_y_uv_external;
+   unsigned lower_y_u_v_external;
+   unsigned lower_yx_xuxv_external;
+
+   /**
 * To emulate certain texture wrap modes, this can be used
 * to saturate the specified tex coord to [0.0, 1.0].  The
 * bits are according to sampler #, ie. if, for example:
diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c
index a080475..18c2e70 100644
--- a/src/compiler/nir/nir_lower_tex.c
+++ b/src/compiler/nir/nir_lower_tex.c
@@ -161,6 +161,109 @@ lower_rect(nir_builder *b, nir_tex_instr *tex)
tex->sampler_dim = GLSL_SAMPLER_DIM_2D;
 }
 
+static nir_ssa_def *
+sample_plane(nir_builder *b, nir_tex_instr *tex, int plane)
+{
+   assert(tex->dest.is_ssa);
+   assert(nir_tex_instr_dest_size(tex) == 4);
+   assert(nir_alu_type_get_base_type(tex->dest_type) == nir_type_float);
+   assert(tex->op == nir_texop_tex);
+   assert(tex->coord_components == 2);
+
+   nir_tex_instr *plane_tex = nir_tex_instr_create(b->shader, 2);
+   nir_src_copy(&plane_tex->src[0].src, &tex->src[0].src, plane_tex);
+   plane_tex->src[0].src_type = nir_tex_src_coord;
+   plane_tex->src[1].src = nir_src_for_ssa(nir_imm_int(b, plane));
+   plane_tex->src[1].src_type = nir_tex_src_plane;
+   plane_tex->op = nir_texop_tex;
+   plane_tex->sampler_dim = 2;
+   plane_tex->dest_type = nir_type_float;
+   plane_tex->coord_components = 2;
+
+   plane_tex->texture_index = tex->texture_index;
+   plane_tex->texture = (nir_deref_var *)
+  nir_copy_deref(plane_tex, &tex->texture->deref);
+   plane_tex->sampler_index = tex->sampler_index;
+   plane_tex->sampler = (nir_deref_var *)
+  nir_copy_deref(plane_tex, &tex->sampler->deref);
+
+   nir_ssa_dest_init(&plane_tex->instr, &plane_tex->dest, 4, 32, NULL);
+
+   nir_builder_instr_insert(b, &plane_tex->instr);
+
+   return &plane_tex->dest.ssa;
+}
+
+static void
+convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex,
+   nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v)
+{
+   nir_const_value m[3] = {
+  { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
+  { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
+  { .f32 = { 1.0f,  2.01723214f,  0.0f,0.0f } }
+   };
+
+   nir_ssa_def *yuv =
+  nir_vec4(b,
+   nir_fmul(b, nir_imm_float(b, 1.16438356f),
+nir_fadd(b, y, nir_imm_float(b, -0.0625f))),
+   nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -0.5f)), 0),
+   nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -0.5f)), 0),
+   nir_imm_float(b, 0.0));
+
+   nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[0]));
+   nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[1]));
+   nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, 32, m[2]));
+
+   nir_ssa_def *result = nir_vec4(b, red, green, blue, nir_imm_float(b, 1.0f));
+
+   nir_ssa_def_rewrite_uses(&tex->dest.ssa, nir_src_for_ssa(result));
+}
+
+static void
+lower_y_uv_external(nir_builder *b, nir_tex_instr *tex)
+{
+   b->cursor = nir_after_instr(&tex->instr);
+
+   nir_ssa_def *y = sample_plane(b, tex, 0);
+   nir_ssa_def *uv = sample_plane(b, tex, 1);
+
+   convert_yuv_to_rgb(b, tex,
+  nir_channel(b, y, 0),
+  nir_channel(b, uv, 0),
+  nir_channel(b, uv, 1));
+}
+
+static void
+lower_y_u_v_external(nir_builder *b, nir_tex_instr *tex)
+{
+   b->cursor = nir_after_instr(&tex->instr);
+
+   nir_ssa_def *y = sample_plane(b, tex, 0);
+   nir_ssa_def *u = sample_plane(b, tex, 1);
+   nir_ssa_def *v = sample_plane(b, tex, 2);
+
+   convert_yuv_to_rgb(b, tex,
+  nir_channel(b, y, 0),
+  nir_channel(b, u, 0),
+  nir_channel(b, v, 0));
+}
+
+static void
+lower_yx_xuxv_external(nir_builder *b, nir_tex_instr *tex)
+{
+   b->cursor = nir_after_instr(&tex->instr);
+
+   nir_ssa_def *y = sample_plane(b, tex, 0);
+   nir_ssa_def *xuxv = sample_plane(b, tex, 1);
+
+   convert_yuv_to_rgb(b, tex,
+  nir_channel(b, y, 0),
+  nir_channel(b, xuxv, 1),
+  nir_channel(b, xuxv, 3));
+}
+
 static void
 saturate_src(nir_builder *b, nir_tex_instr *tex, unsigned sat_mask)
 {
@@ -344,6 +447,22 @@ nir_lower_tex_block(nir_block *block, nir_builder *b,
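
The arithmetic that convert_yuv_to_rgb() builds out of NIR instructions can be written as ordinary scalar C, which makes it easier to check the constants by hand. The sketch below reuses the coefficients from the patch. The struct and function names are hypothetical, and the remark that these look like the usual BT.601 limited-range constants is an observation, not something the patch states.

```c
struct rgb {
   float r, g, b;
};

/* y, u and v are normalized samples in [0, 1], as returned by sampling
 * the individual planes. No clamping is applied, matching the patch.
 */
static struct rgb
yuv_to_rgb(float y, float u, float v)
{
   /* Remove the luma offset and rescale, then center the chroma. */
   const float ys = 1.16438356f * (y - 0.0625f);
   const float us = u - 0.5f;
   const float vs = v - 0.5f;
   struct rgb out;

   /* Each line is one row of the 3x3 matrix m[] from the patch. */
   out.r = ys + 1.59602678f * vs;
   out.g = ys - 0.39176229f * us - 0.81296764f * vs;
   out.b = ys + 2.01723214f * us;
   return out;
}
```

A sample with y = 0.0625 and centered chroma maps to black. The patch does the same computation with three nir_fdot4 operations against a vec4 whose fourth component is zero, and writes 1.0 to alpha.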

[Mesa-dev] [PATCH 04/11] i965: Add new intel_set_texture_image_mt() helper

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

This factors out the work of setting up a miptree as the backing for a
texture image into a new helper.
---
 src/mesa/drivers/dri/i965/intel_tex_image.c | 69 ++---
 1 file changed, 42 insertions(+), 27 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 9a40476..b214937 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -135,6 +135,33 @@ intelTexImage(struct gl_context * ctx,
 }
 
 
+static void
+intel_set_texture_image_mt(struct brw_context *brw,
+   struct gl_texture_image *image,
+   struct intel_mipmap_tree *mt)
+
+{
+   const uint32_t internal_format = _mesa_get_format_base_format(mt->format);
+   struct gl_texture_object *texobj = image->TexObject;
+   struct intel_texture_object *intel_texobj = intel_texture_object(texobj);
+   struct intel_texture_image *intel_image = intel_texture_image(image);
+
+   _mesa_init_teximage_fields(&brw->ctx, image,
+ mt->logical_width0, mt->logical_height0, 1,
+ 0, internal_format, mt->format);
+
+   brw->ctx.Driver.FreeTextureImageBuffer(&brw->ctx, image);
+
+   intel_texobj->needs_validate = true;
+   intel_image->base.RowStride = mt->pitch / mt->cpp;
+   assert(mt->pitch % mt->cpp == 0);
+
+   intel_miptree_reference(&intel_image->mt, mt);
+
+   /* Immediately validate the image to the object. */
+   intel_miptree_reference(&intel_texobj->mt, mt);
+}
+
 /**
  * Binds a BO to a texture image, as if it was uploaded by glTexImage2D().
  *
@@ -154,29 +181,21 @@ intel_set_texture_image_bo(struct gl_context *ctx,
uint32_t layout_flags)
 {
struct brw_context *brw = brw_context(ctx);
-   struct intel_texture_image *intel_image = intel_texture_image(image);
-   struct gl_texture_object *texobj = image->TexObject;
-   struct intel_texture_object *intel_texobj = intel_texture_object(texobj);
uint32_t draw_x, draw_y;
+   struct intel_mipmap_tree *mt;
 
-   _mesa_init_teximage_fields(&brw->ctx, image,
- width, height, 1,
- 0, internalFormat, format);
-
-   ctx->Driver.FreeTextureImageBuffer(ctx, image);
-
-   intel_image->mt = intel_miptree_create_for_bo(brw, bo, image->TexFormat,
- 0, width, height, 1, pitch,
- layout_flags);
-   if (intel_image->mt == NULL)
+   mt = intel_miptree_create_for_bo(brw, bo, image->TexFormat,
+0, width, height, 1, pitch,
+layout_flags);
+   if (mt == NULL)
return;
-   intel_image->mt->target = target;
-   intel_image->mt->total_width = width;
-   intel_image->mt->total_height = height;
-   intel_image->mt->level[0].slice[0].x_offset = tile_x;
-   intel_image->mt->level[0].slice[0].y_offset = tile_y;
+   mt->target = target;
+   mt->total_width = width;
+   mt->total_height = height;
+   mt->level[0].slice[0].x_offset = tile_x;
+   mt->level[0].slice[0].y_offset = tile_y;
 
-   intel_miptree_get_tile_offsets(intel_image->mt, 0, 0, &draw_x, &draw_y);
+   intel_miptree_get_tile_offsets(mt, 0, 0, &draw_x, &draw_y);
 
/* From "OES_EGL_image" error reporting. We report GL_INVALID_OPERATION
 * for EGL images from non-tile aligned sufaces in gen4 hw and earlier 
which has
@@ -185,18 +204,14 @@ intel_set_texture_image_bo(struct gl_context *ctx,
if (!brw->has_surface_tile_offset &&
(draw_x != 0 || draw_y != 0)) {
   _mesa_error(ctx, GL_INVALID_OPERATION, __func__);
-  intel_miptree_release(&intel_image->mt);
+  intel_miptree_release(&mt);
   return;
}
 
-   intel_texobj->needs_validate = true;
-
-   intel_image->mt->offset = offset;
-   assert(pitch % intel_image->mt->cpp == 0);
-   intel_image->base.RowStride = pitch / intel_image->mt->cpp;
+   mt->offset = offset;
 
-   /* Immediately validate the image to the object. */
-   intel_miptree_reference(&intel_texobj->mt, intel_image->mt);
+   intel_set_texture_image_mt(brw, image, mt);
+   intel_miptree_release(&mt);
 }
 
 void
-- 
2.5.0
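
The new helper takes two references on the miptree (one for the image, one for the texture object), after which the caller drops its own. The following sketch shows that ownership pattern with a toy reference-counted object. The tree_* names are hypothetical, and the exact semantics are an assumption inferred from how the patch uses intel_miptree_reference() and intel_miptree_release(), not a copy of their implementation.

```c
#include <stdlib.h>

struct tree {
   int refcount;
};

/* A freshly created object is owned by the caller (refcount 1). */
static struct tree *
tree_create(void)
{
   struct tree *t = calloc(1, sizeof(*t));

   if (t)
      t->refcount = 1;
   return t;
}

/* Drop one reference and clear the caller's pointer. */
static void
tree_release(struct tree **t)
{
   if (*t == NULL)
      return;
   if (--(*t)->refcount == 0)
      free(*t);
   *t = NULL;
}

/* Make *dst point at src, releasing whatever *dst held before. */
static void
tree_reference(struct tree **dst, struct tree *src)
{
   if (*dst == src)
      return;
   tree_release(dst);
   if (src)
      src->refcount++;
   *dst = src;
}
```

Under these assumptions, a caller that creates the object, hands it to two holders and then releases its own pointer leaves the object alive with a count of two, which is the shape of intel_set_texture_image_bo() after this patch.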

[Mesa-dev] [PATCH 02/11] nir: Handle NULL in nir_copy_deref()

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

Reviewed-by: Jason Ekstrand 

---
 src/compiler/nir/nir.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
index 867a43c..f15c993 100644
--- a/src/compiler/nir/nir.c
+++ b/src/compiler/nir/nir.c
@@ -642,6 +642,9 @@ copy_deref_struct(void *mem_ctx, nir_deref_struct *deref)
 nir_deref *
 nir_copy_deref(void *mem_ctx, nir_deref *deref)
 {
+   if (deref == NULL)
+  return NULL;
+
switch (deref->deref_type) {
case nir_deref_type_var:
return &copy_deref_var(mem_ctx, nir_deref_as_var(deref))->deref;
-- 
2.5.0

[Mesa-dev] [PATCH 11/11] dri: Add YVU formats

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

---
 include/GL/internal/dri_interface.h  |  5 +
 src/mesa/drivers/dri/i965/intel_screen.c | 26 ++
 2 files changed, 31 insertions(+)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index 84731a0..4049be6 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1158,6 +1158,11 @@ struct __DRIdri2ExtensionRec {
 #define __DRI_IMAGE_FOURCC_NV16    0x3631564e
 #define __DRI_IMAGE_FOURCC_YUYV    0x56595559
 
+#define __DRI_IMAGE_FOURCC_YVU410  0x39555659
+#define __DRI_IMAGE_FOURCC_YVU411  0x31315659
+#define __DRI_IMAGE_FOURCC_YVU420  0x32315659
+#define __DRI_IMAGE_FOURCC_YVU422  0x36315659
+#define __DRI_IMAGE_FOURCC_YVU444  0x34325659
 
 /**
  * Queryable on images created by createImageFromNames.
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 599ec19..2a11d49 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -224,6 +224,7 @@ static struct intel_image_format intel_image_formats[] = {
{ __DRI_IMAGE_FOURCC_XBGR, __DRI_IMAGE_COMPONENTS_RGB, 1,
  { { 0, 0, 0, __DRI_IMAGE_FORMAT_XBGR, 4 }, } },
 
+
{ __DRI_IMAGE_FOURCC_RGB565, __DRI_IMAGE_COMPONENTS_RGB, 1,
  { { 0, 0, 0, __DRI_IMAGE_FORMAT_RGB565, 2 } } },
 
@@ -258,6 +259,31 @@ static struct intel_image_format intel_image_formats[] = {
{ 1, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
{ 2, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 } } },
 
+   { __DRI_IMAGE_FOURCC_YVU410, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+ { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 2, 2, 2, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 1, 2, 2, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YVU411, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+ { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 2, 2, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 1, 2, 0, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YVU420, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+ { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 2, 1, 1, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 1, 1, 1, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YVU422, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+ { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 2, 1, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 1, 1, 0, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YVU444, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+ { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 2, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 1, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
{ __DRI_IMAGE_FOURCC_NV12, __DRI_IMAGE_COMPONENTS_Y_UV, 2,
  { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
{ 1, 1, 1, __DRI_IMAGE_FORMAT_GR88, 2 } } },
-- 
2.5.0
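
The hexadecimal values added to dri_interface.h are four ASCII characters packed into a 32-bit word, least significant byte first. The sketch below shows that packing so the constants can be checked by hand. FOURCC is a hypothetical local macro written for this example, modeled on the usual fourcc convention.

```c
#include <stdint.h>

/* Pack four ASCII characters into a 32-bit code, first character in
 * the least significant byte.
 */
#define FOURCC(a, b, c, d)                           \
   ((uint32_t)(a) | ((uint32_t)(b) << 8) |           \
    ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* The five codes added by the patch, spelled as characters. */
static const uint32_t yvu_fourccs[] = {
   FOURCC('Y', 'V', 'U', '9'),   /* 0x39555659, YVU410 */
   FOURCC('Y', 'V', '1', '1'),   /* 0x31315659, YVU411 */
   FOURCC('Y', 'V', '1', '2'),   /* 0x32315659, YVU420 */
   FOURCC('Y', 'V', '1', '6'),   /* 0x36315659, YVU422 */
   FOURCC('Y', 'V', '2', '4'),   /* 0x34325659, YVU444 */
};
```

In the intel_image_formats[] entries the second and third planes use buffer index 2 and 1 respectively, presumably because the V plane comes before the U plane in memory for these formats while the lowering code still expects Y, U, V order.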

[Mesa-dev] [PATCH 09/11] i965: Invoke lowering pass for YUV textures

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

---
 src/mesa/drivers/dri/i965/brw_compiler.h |  7 +++
 src/mesa/drivers/dri/i965/brw_nir.c  |  4 
 src/mesa/drivers/dri/i965/brw_wm.c   | 29 +
 3 files changed, 40 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index 7d75202..e28711e 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -159,6 +159,13 @@ struct brw_sampler_prog_key_data {
 * For Sandybridge, which shader w/a we need for gather quirks.
 */
enum gen6_gather_sampler_wa gen6_gather_wa[MAX_SAMPLERS];
+
+   /**
+* Texture units that have a YUV image bound.
+*/
+   uint32_t y_u_v_image_mask;
+   uint32_t y_uv_image_mask;
+   uint32_t yx_xuxv_image_mask;
 };
 
 
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index c501bc1..a6837a9 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -616,6 +616,10 @@ brw_nir_apply_sampler_key(nir_shader *nir,
  tex_options.swizzles[s][c] = GET_SWZ(key_tex->swizzles[s], c);
}
 
+   tex_options.lower_y_uv_external = key_tex->y_uv_image_mask;
+   tex_options.lower_y_u_v_external = key_tex->y_u_v_image_mask;
+   tex_options.lower_yx_xuxv_external = key_tex->yx_xuxv_image_mask;
+
if (nir_lower_tex(nir, &tex_options)) {
   nir_validate_shader(nir);
   nir = nir_optimize(nir, is_scalar);
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c 
b/src/mesa/drivers/dri/i965/brw_wm.c
index dbc626c..09aecc6 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -35,6 +35,7 @@
 #include "program/prog_parameter.h"
 #include "program/program.h"
 #include "intel_mipmap_tree.h"
+#include "intel_image.h"
 #include "brw_nir.h"
 #include "brw_program.h"
 
@@ -206,6 +207,16 @@ brw_debug_recompile_sampler_key(struct brw_context *brw,
   old_key->msaa_16,
   key->msaa_16);
 
+   found |= key_debug(brw, "y_uv image bound",
+  old_key->y_uv_image_mask,
+  key->y_uv_image_mask);
+   found |= key_debug(brw, "y_u_v image bound",
+  old_key->y_u_v_image_mask,
+  key->y_u_v_image_mask);
+   found |= key_debug(brw, "yx_xuxv image bound",
+  old_key->yx_xuxv_image_mask,
+  key->yx_xuxv_image_mask);
+
for (unsigned int i = 0; i < MAX_SAMPLERS; i++) {
   found |= key_debug(brw, "textureGather workarounds",
  old_key->gen6_gather_wa[i], key->gen6_gather_wa[i]);
@@ -370,6 +381,24 @@ brw_populate_sampler_prog_key_data(struct gl_context *ctx,
key->msaa_16 |= 1 << s;
 }
  }
+
+ if (t->Target == GL_TEXTURE_EXTERNAL_OES && intel_tex->dri_image) {
+__DRIimage *image = intel_tex->dri_image;
+switch (image->planar_format->components) {
+case __DRI_IMAGE_COMPONENTS_Y_UV:
+   key->y_uv_image_mask |= 1 << s;
+   break;
+case __DRI_IMAGE_COMPONENTS_Y_U_V:
+   key->y_u_v_image_mask |= 1 << s;
+   break;
+case __DRI_IMAGE_COMPONENTS_Y_XUXV:
+   key->yx_xuxv_image_mask |= 1 << s;
+   break;
+default:
+   break;
+}
+ }
+
   }
}
 }
-- 
2.5.0
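
The sampler key records which texture units have a YUV image bound as one bitmask per plane layout, with bit s standing for unit s. A minimal sketch of that bookkeeping follows. All type and function names are hypothetical, and the unsigned shift is a choice made in this sketch so that unit 31 is well defined, not something taken from the patch.

```c
#include <stdint.h>

enum yuv_layout {
   LAYOUT_NONE,
   LAYOUT_Y_UV,
   LAYOUT_Y_U_V,
   LAYOUT_Y_XUXV
};

struct sampler_key {
   uint32_t y_uv_mask;
   uint32_t y_u_v_mask;
   uint32_t yx_xuxv_mask;
};

/* Record that texture unit `unit` (0..31) has an image of the given
 * layout bound.
 */
static void
key_mark_unit(struct sampler_key *key, unsigned unit, enum yuv_layout layout)
{
   const uint32_t bit = UINT32_C(1) << unit;

   switch (layout) {
   case LAYOUT_Y_UV:
      key->y_uv_mask |= bit;
      break;
   case LAYOUT_Y_U_V:
      key->y_u_v_mask |= bit;
      break;
   case LAYOUT_Y_XUXV:
      key->yx_xuxv_mask |= bit;
      break;
   default:
      break;
   }
}

/* Look up which lowering, if any, applies to a texture unit. */
static enum yuv_layout
key_layout_for_unit(const struct sampler_key *key, unsigned unit)
{
   const uint32_t bit = UINT32_C(1) << unit;

   if (key->y_uv_mask & bit)
      return LAYOUT_Y_UV;
   if (key->y_u_v_mask & bit)
      return LAYOUT_Y_U_V;
   if (key->yx_xuxv_mask & bit)
      return LAYOUT_Y_XUXV;
   return LAYOUT_NONE;
}
```

Because the masks are part of the program key, binding a different kind of image to a unit changes the key and so selects a differently lowered shader variant.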

[Mesa-dev] [PATCH 05/11] i965: Use intel_set_texture_image_mt() in intelSetTexBuffer2()

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

Create the mt for the drawable bo directly and call our new
intel_set_texture_image_mt() helper instead.
---
 src/mesa/drivers/dri/i965/intel_tex_image.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index b214937..7d71aa2 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -225,8 +225,8 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
struct intel_renderbuffer *rb;
struct gl_texture_object *texObj;
struct gl_texture_image *texImage;
-   int level = 0, internalFormat = 0;
mesa_format texFormat = MESA_FORMAT_NONE;
+   struct intel_mipmap_tree *mt;
 
texObj = _mesa_get_current_tex_object(ctx, target);
 
@@ -246,27 +246,30 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
 
if (rb->mt->cpp == 4) {
   if (texture_format == __DRI_TEXTURE_FORMAT_RGB) {
- internalFormat = GL_RGB;
  texFormat = MESA_FORMAT_B8G8R8X8_UNORM;
   }
   else {
- internalFormat = GL_RGBA;
  texFormat = MESA_FORMAT_B8G8R8A8_UNORM;
   }
} else if (rb->mt->cpp == 2) {
-  internalFormat = GL_RGB;
   texFormat = MESA_FORMAT_B5G6R5_UNORM;
}
 
-   _mesa_lock_texture(&brw->ctx, texObj);
-   texImage = _mesa_get_tex_image(ctx, texObj, target, level);
intel_miptree_make_shareable(brw, rb->mt);
-   intel_set_texture_image_bo(ctx, texImage, rb->mt->bo, target,
-  internalFormat, texFormat, 0,
-  rb->Base.Base.Width,
-  rb->Base.Base.Height,
-  rb->mt->pitch,
-  0, 0, 0);
+   mt = intel_miptree_create_for_bo(brw, rb->mt->bo, texFormat, 0,
+rb->Base.Base.Width,
+rb->Base.Base.Height,
+1, rb->mt->pitch, 0);
+   if (mt == NULL)
+   return;
+   mt->target = target;
+   mt->total_width = rb->Base.Base.Width;
+   mt->total_height = rb->Base.Base.Height;
+
+   _mesa_lock_texture(&brw->ctx, texObj);
+   texImage = _mesa_get_tex_image(ctx, texObj, target, 0);
+   intel_set_texture_image_mt(brw, texImage, mt);
+   intel_miptree_release(&mt);
_mesa_unlock_texture(&brw->ctx, texObj);
 }
 
-- 
2.5.0

[Mesa-dev] [PATCH 08/11] i965: Support textures with multiple planes

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

---
 src/mesa/drivers/dri/i965/brw_compiler.h  |  1 +
 src/mesa/drivers/dri/i965/brw_context.h   |  2 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp  | 13 
 src/mesa/drivers/dri/i965/brw_shader.cpp  |  9 ++
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 38 +--
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |  3 +-
 src/mesa/drivers/dri/i965/gen8_surface_state.c| 12 ++-
 7 files changed, 59 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index 5807305..7d75202 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -329,6 +329,7 @@ struct brw_stage_prog_data {
   uint32_t abo_start;
   uint32_t image_start;
   uint32_t shader_time_start;
+  uint32_t plane_start[3];
   /** @} */
} binding_table;
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 035cbe9..a923516 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -724,7 +724,7 @@ struct brw_context
   void (*update_texture_surface)(struct gl_context *ctx,
  unsigned unit,
  uint32_t *surf_offset,
- bool for_gather);
+ bool for_gather, uint32_t plane);
   uint32_t (*update_renderbuffer_surface)(struct brw_context *brw,
   struct gl_renderbuffer *rb,
   bool layered, unsigned unit,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index c2274ba..6192378 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -3745,6 +3745,19 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, 
nir_tex_instr *instr)
  break;
   }
 
+  case nir_tex_src_plane: {
+ nir_const_value *const_plane =
+nir_src_as_const_value(instr->src[i].src);
+ const uint32_t plane = const_plane->u32[0];
+ const uint32_t texture_index =
+instr->texture_index +
+stage_prog_data->binding_table.plane_start[plane] -
+stage_prog_data->binding_table.texture_start;
+
+ srcs[TEX_LOGICAL_SRC_SURFACE] = brw_imm_ud(texture_index);
+ break;
+  }
+
   default:
  unreachable("unknown texture source");
   }
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index a23f14e..681b170 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -1232,6 +1232,15 @@ brw_assign_common_binding_table_offsets(gl_shader_stage 
stage,
stage_prog_data->binding_table.pull_constants_start = 
next_binding_table_offset;
next_binding_table_offset++;
 
+   /* Plane 0 is just the regular texture section */
+   stage_prog_data->binding_table.plane_start[0] = 
stage_prog_data->binding_table.texture_start;
+
+   stage_prog_data->binding_table.plane_start[1] = next_binding_table_offset;
+   next_binding_table_offset += num_textures;
+
+   stage_prog_data->binding_table.plane_start[2] = next_binding_table_offset;
+   next_binding_table_offset += num_textures;
+
assert(next_binding_table_offset <= BRW_MAX_SURFACES);
 
/* prog_data->base.binding_table.size will be set by brw_mark_surface_used. 
*/
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index b00ebd1..b73d5d5 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -316,7 +316,8 @@ static void
 brw_update_texture_surface(struct gl_context *ctx,
unsigned unit,
uint32_t *surf_offset,
-   bool for_gather)
+   bool for_gather,
+   uint32_t plane)
 {
struct brw_context *brw = brw_context(ctx);
struct gl_texture_object *tObj = ctx->Texture.Unit[unit]._Current;
@@ -827,7 +828,7 @@ static void
 update_stage_texture_surfaces(struct brw_context *brw,
   const struct gl_program *prog,
   struct brw_stage_state *stage_state,
-  bool for_gather)
+  bool for_gather, uint32_t plane)
 {
if (!prog)
   return;
@@ -840,7 +841,7 @@ update_stage_texture_surfaces(struct brw_context *brw,
if (for_gather)
   surf_offset += 
stage_state->prog_data->binding_table.gather_texture_start;
else
-  surf_offset += stage_state->prog_data->binding_table.texture_start;
+
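
For readers following the binding table arithmetic in this patch, here is a
minimal sketch in C of the layout it sets up: plane 0 reuses the regular
texture section, and planes 1 and 2 each get their own run of num_textures
slots. The struct and function names below are hypothetical stand-ins, not
the driver's own, and the sketch assumes the index computed in the
nir_tex_src_plane case is relative to texture_start.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PLANES 3

/* Hypothetical, reduced stand-in for the binding_table struct in the patch. */
struct binding_table_sketch {
   uint32_t texture_start;
   uint32_t plane_start[MAX_PLANES];
};

/* Lay out the plane sections: plane 0 aliases the regular texture section,
 * planes 1 and 2 each get num_textures fresh slots starting at next_offset.
 * Returns the first binding table offset that is still unused. */
static uint32_t
assign_plane_sections(struct binding_table_sketch *bt,
                      uint32_t next_offset, uint32_t num_textures)
{
   bt->plane_start[0] = bt->texture_start;
   for (int plane = 1; plane < MAX_PLANES; plane++) {
      bt->plane_start[plane] = next_offset;
      next_offset += num_textures;
   }
   return next_offset;
}

/* Surface index of one plane of a texture, expressed relative to
 * texture_start, mirroring the arithmetic in the nir_tex_src_plane case. */
static uint32_t
plane_surface_index(const struct binding_table_sketch *bt,
                    uint32_t texture_index, uint32_t plane)
{
   assert(plane < MAX_PLANES);
   return texture_index + bt->plane_start[plane] - bt->texture_start;
}
```

With texture_start = 4, next_offset = 20 and num_textures = 8, plane 1 starts
at 20 and plane 2 at 28, so texture 3 resolves to relative indices 3, 19 and
27 for planes 0, 1 and 2.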

[Mesa-dev] [PATCH 00/11] YUV EGLImage sampling v2

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

Here's v2 of the series. Incorporates Topi's and Jason comments, but
also refactors the miptree creation a bit. The old series broke
support for certain EGLImages (images without a planar_format), but
they're now working as they did before.

Kristian

Kristian Høgsberg Kristensen (11):
  nir: Add new 'plane' texture source type
  nir: Handle NULL in nir_copy_deref()
  nir: Add a lowering pass for YUV textures
  i965: Add new intel_set_texture_image_mt() helper
  i965: Use intel_set_texture_image_mt() in intelSetTexBuffer2()
  i965: Refactor intel_set_texture_image_bo() to
create_mt_for_dri_image()
  i965: Create multiple miptrees for planar YUV images
  i965: Support textures with multiple planes
  i965: Invoke lowering pass for YUV textures
  i965: Allow creating planar YUV __DRIimages
  dri: Add YVU formats

 include/GL/internal/dri_interface.h   |   5 +
 src/compiler/nir/nir.c|   3 +
 src/compiler/nir/nir.h|   8 +
 src/compiler/nir/nir_lower_tex.c  | 119 ++
 src/compiler/nir/nir_print.c  |   3 +
 src/mesa/drivers/dri/i965/brw_compiler.h  |   8 +
 src/mesa/drivers/dri/i965/brw_context.h   |   2 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp  |  13 ++
 src/mesa/drivers/dri/i965/brw_nir.c   |   4 +
 src/mesa/drivers/dri/i965/brw_shader.cpp  |   9 ++
 src/mesa/drivers/dri/i965/brw_wm.c|  29 
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  |  38 +++--
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |   3 +-
 src/mesa/drivers/dri/i965/gen8_surface_state.c|  12 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c |   3 +
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |   5 +
 src/mesa/drivers/dri/i965/intel_screen.c  |  59 +--
 src/mesa/drivers/dri/i965/intel_tex_image.c   | 188 ++
 src/mesa/drivers/dri/i965/intel_tex_obj.h |   2 +
 19 files changed, 414 insertions(+), 99 deletions(-)

-- 
2.5.0


[Mesa-dev] [PATCH 10/11] i965: Allow creating planar YUV __DRIimages

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

Lift the restriction we had before and allow creation of images with
multiple planes. We still require all the planes to be within the same
bo.
---
 src/mesa/drivers/dri/i965/intel_screen.c | 33 ++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index f9b5484..599ec19 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -676,9 +676,14 @@ intel_create_image_from_fds(__DRIscreen *screen,
__DRIimage *image;
int i, index;
 
-   if (fds == NULL || num_fds != 1)
+   if (fds == NULL || num_fds < 1)
   return NULL;
 
+   /* We only support all planes from the same bo */
+   for (i = 0; i < num_fds; i++)
+  if (fds[0] != fds[i])
+ return NULL;
+
f = intel_image_format_lookup(fourcc);
if (f == NULL)
   return NULL;
@@ -691,22 +696,28 @@ intel_create_image_from_fds(__DRIscreen *screen,
if (image == NULL)
   return NULL;
 
-   image->bo = drm_intel_bo_gem_create_from_prime(intelScreen->bufmgr,
-  fds[0],
-  height * strides[0]);
-   if (image->bo == NULL) {
-  free(image);
-  return NULL;
-   }
image->width = width;
image->height = height;
image->pitch = strides[0];
 
image->planar_format = f;
+   int size = 0;
for (i = 0; i < f->nplanes; i++) {
   index = f->planes[i].buffer_index;
   image->offsets[index] = offsets[index];
   image->strides[index] = strides[index];
+
+  const int plane_height = height >> f->planes[i].height_shift;
+  const int end = offsets[index] + plane_height * strides[index];
+  if (size < end)
+ size = end;
+   }
+
+   image->bo = drm_intel_bo_gem_create_from_prime(intelScreen->bufmgr,
+  fds[0], size);
+   if (image->bo == NULL) {
+  free(image);
+  return NULL;
}
 
if (f->nplanes == 1) {
@@ -732,12 +743,6 @@ intel_create_image_from_dma_bufs(__DRIscreen *screen,
__DRIimage *image;
struct intel_image_format *f = intel_image_format_lookup(fourcc);
 
-   /* For now only packed formats that have native sampling are supported. */
-   if (!f || f->nplanes != 1) {
-  *error = __DRI_IMAGE_ERROR_BAD_MATCH;
-  return NULL;
-   }
-
image = intel_create_image_from_fds(screen, width, height, fourcc, fds,
num_fds, strides, offsets,
loaderPrivate);
-- 
2.5.0
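
The size computation in this patch can be illustrated with a small standalone
sketch in C. It is not the driver code: struct plane_layout and
planar_image_size() are hypothetical names, and the sketch only assumes what
the patch states, namely that each plane has an offset, a stride and a
vertical subsampling shift, and that all planes live in one buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-plane description; the field names are illustrative. */
struct plane_layout {
   int offset;        /* byte offset of the plane inside the buffer */
   int stride;        /* bytes per row */
   int height_shift;  /* vertical subsampling: rows = height >> shift */
};

/* Smallest buffer size, in bytes, that contains every plane. */
static int
planar_image_size(int height, const struct plane_layout *planes,
                  size_t nplanes)
{
   assert(height >= 0 && planes != NULL);

   int size = 0;
   for (size_t i = 0; i < nplanes; i++) {
      /* Use a distinct name so the image height is not shadowed. */
      const int plane_height = height >> planes[i].height_shift;
      const int end = planes[i].offset + plane_height * planes[i].stride;
      if (size < end)
         size = end;
   }
   return size;
}
```

For a 640x480 two-plane 4:2:0 layout (luma at offset 0, chroma at offset
307200, both with a 640-byte stride, chroma with height_shift 1) this gives
307200 + 240 * 640 = 460800 bytes.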


[Mesa-dev] [PATCH 06/11] i965: Refactor intel_set_texture_image_bo() to create_mt_for_dri_image()

2016-05-11 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

This function now only creates the mt and we then call
intel_set_texture_image_mt() in intel_image_target_texture_2d() to set
it for the texture image.
---
 src/mesa/drivers/dri/i965/intel_tex_image.c | 69 +
 1 file changed, 30 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 7d71aa2..4d20a86 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -167,33 +167,30 @@ intel_set_texture_image_mt(struct brw_context *brw,
  *
  * Used for GLX_EXT_texture_from_pixmap and EGL image extensions,
  */
-static void
-intel_set_texture_image_bo(struct gl_context *ctx,
-   struct gl_texture_image *image,
-   drm_intel_bo *bo,
-   GLenum target,
-   GLenum internalFormat,
-   mesa_format format,
-   uint32_t offset,
-   GLuint width, GLuint height,
-   GLuint pitch,
-   GLuint tile_x, GLuint tile_y,
-   uint32_t layout_flags)
+static struct intel_mipmap_tree *
+create_mt_for_dri_image(struct brw_context *brw,
+GLenum target, __DRIimage *image)
 {
-   struct brw_context *brw = brw_context(ctx);
-   uint32_t draw_x, draw_y;
struct intel_mipmap_tree *mt;
+   uint32_t draw_x, draw_y;
 
-   mt = intel_miptree_create_for_bo(brw, bo, image->TexFormat,
-0, width, height, 1, pitch,
-layout_flags);
+   /* Disable creation of the texture's aux buffers because the driver exposes
+* no EGL API to manage them. That is, there is no API for resolving the aux
+* buffer's content to the main buffer nor for invalidating the aux buffer's
+* content.
+*/
+   mt = intel_miptree_create_for_bo(brw, image->bo, image->format,
+0, image->width, image->height, 1,
+image->pitch,
+MIPTREE_LAYOUT_DISABLE_AUX);
if (mt == NULL)
-   return;
+  return NULL;
+
mt->target = target;
-   mt->total_width = width;
-   mt->total_height = height;
-   mt->level[0].slice[0].x_offset = tile_x;
-   mt->level[0].slice[0].y_offset = tile_y;
+   mt->total_width = image->width;
+   mt->total_height = image->height;
+   mt->level[0].slice[0].x_offset = image->tile_x;
+   mt->level[0].slice[0].y_offset = image->tile_y;
 
intel_miptree_get_tile_offsets(mt, 0, 0, &draw_x, &draw_y);
 
@@ -203,15 +200,14 @@ intel_set_texture_image_bo(struct gl_context *ctx,
 */
if (!brw->has_surface_tile_offset &&
(draw_x != 0 || draw_y != 0)) {
-  _mesa_error(ctx, GL_INVALID_OPERATION, __func__);
+  _mesa_error(&brw->ctx, GL_INVALID_OPERATION, __func__);
   intel_miptree_release(&mt);
-  return;
+  return NULL;
}
 
-   mt->offset = offset;
+   mt->offset = image->offset;
 
-   intel_set_texture_image_mt(brw, image, mt);
-   intel_miptree_release(&mt);
+   return mt;
 }
 
 void
@@ -324,6 +320,7 @@ intel_image_target_texture_2d(struct gl_context *ctx, 
GLenum target,
  GLeglImageOES image_handle)
 {
struct brw_context *brw = brw_context(ctx);
+   struct intel_mipmap_tree *mt;
__DRIscreen *screen;
__DRIimage *image;
 
@@ -351,18 +348,12 @@ intel_image_target_texture_2d(struct gl_context *ctx, 
GLenum target,
   return;
}
 
-   /* Disable creation of the texture's aux buffers because the driver exposes
-* no EGL API to manage them. That is, there is no API for resolving the aux
-* buffer's content to the main buffer nor for invalidating the aux buffer's
-* content.
-*/
-   intel_set_texture_image_bo(ctx, texImage, image->bo,
-  target, image->internal_format,
-  image->format, image->offset,
-  image->width,  image->height,
-  image->pitch,
-  image->tile_x, image->tile_y,
-  MIPTREE_LAYOUT_DISABLE_AUX);
+   mt = create_mt_for_dri_image(brw, target, image);
+   if (mt == NULL)
+  return;
+
+   intel_set_texture_image_mt(brw, texImage, mt);
+   intel_miptree_release(&mt);
 }
 
 /**
-- 
2.5.0


Re: [Mesa-dev] [PATCH 06/14] vl/dri3: add back buffers support

2016-05-11 Thread Leo Liu

>> +   whandle.type= DRM_API_HANDLE_TYPE_FD;
>> +   usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ;
> Here using PIPE_HANDLE_USAGE_EXPLICIT_FLUSH is wrong. Neither vaapi nor
> vdpau calls flush_resource.
>
> Perhaps vaapi and vdpau can be fixed to call it; I don't know which is best
> for the target usage of the resources.

vaapi/vdpau do flush the pipe: "pipe->flush(pipe, &surf->fence, 0)" gets 
called in both vaapi (surface.c) and vdpau (presentation.c).

> Another thing is that I think there is no guarantee the Xserver releases
> all the pixmaps, and that it could keep one indefinitely (until window
> destruction).
>
> Thus if the drawable is changed by the user, but the previous drawable
> isn't destroyed by the user, one buffer can stay busy forever. Change the
> drawable several times and you get stuck...

Yeah, although I haven't been hit by this, I'll absolutely put it on the 
future work list.
I haven't seen any related fix for glx and egl. Are you aware of any for 
them?

> If I understand correctly, dri3_get_back_buffer will pick an idle buffer
> from the buffer list.
>
> Is that really what is expected?

Yes. Indeed.

> Shouldn't vl_dri3_screen_texture_from_drawable return a texture on the
> last back buffer sent instead?

Not yet.

> I don't know what vl_dri3_screen_texture_from_drawable is supposed to
> do, so perhaps I'm wrong.

In vdpau, for example (presentation.c):

VdpStatus
vlVdpPresentationQueueDisplay(VdpPresentationQueue presentation_queue,
  VdpOutputSurface surface,
  uint32_t clip_width,
  uint32_t clip_height,
  VdpTime  earliest_presentation_time)
{
...
   tex = vscreen->texture_from_drawable(vscreen, (void *)pq->drawable);

   dirty_area = vscreen->get_dirty_area(vscreen);

   vl_compositor_render(cstate, compositor, surf_draw, dirty_area, true);

   vscreen->set_next_timestamp(vscreen, earliest_presentation_time);
   pipe->screen->flush_frontbuffer(pipe->screen, tex, 0, 0,
   vscreen->get_private(vscreen), NULL);

   pipe->screen->fence_reference(pipe->screen, &surf->fence, NULL);
   pipe->flush(pipe, &surf->fence, 0);
   pq->last_surf = surf;
...
}

So the texture returned by texture_from_drawable is the one that is going to be rendered to.

Thanks,
Leo

On 05/11/2016 02:53 PM, Axel Davy wrote:

Again another comment for the same patch:

vl_dri3_screen_texture_from_drawable seems to call dri3_get_back_buffer 
in the !is_pixmap case.

If I understand correctly, dri3_get_back_buffer will pick an idle buffer from the 
buffer list.

Is it really what is expected ?

Shouldn't vl_dri3_screen_texture_from_drawable return a texture on the 
last back buffer sent instead ?
I don't know what vl_dri3_screen_texture_from_drawable is supposed to 
do, so perhaps I'm wrong.

On 11/05/2016 20:42, Axel Davy wrote:
Another thing is that I think there is no guarantee the Xserver 
releases all the pixmaps,

and that it could keep one infinitely (until window destruction).

Thus if the drawable is changed by the user, but the previous 
drawable isn't destroyed by the user,
one buffer can stay busy forever. Change the drawable several times 
and you get stuck...

I think the loader dri3 code suffers the same issues.

For gallium nine, we use the following code:
https://github.com/iXit/wine/blob/master/dlls/d3d9-nine/dri3.c#L739
The code is a bit complicated because of thread safety, but you likely 
don't need the thread safety part.
Basically the idea is that when only one pixmap hasn't been released, 
you send it again with the copy flag, which guarantees it will get 
released.
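
The decision described above can be sketched in a few lines of C. This is
only an illustration of the "exactly one pixmap still held" check, not the
gallium nine code: find_sole_busy_buffer() and the busy array are
hypothetical, and the actual re-present request with the copy option is left
out because its exact call is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the index of the single buffer the server has not released yet,
 * or -1 when zero or several buffers are still busy. A caller would resend
 * the returned buffer with the copy flag so that it gets released. */
static int
find_sole_busy_buffer(const bool *busy, size_t count)
{
   assert(busy != NULL || count == 0);

   int found = -1;
   for (size_t i = 0; i < count; i++) {
      if (!busy[i])
         continue;
      if (found >= 0)
         return -1;   /* more than one busy buffer: nothing to force yet */
      found = (int)i;
   }
   return found;
}
```

With three back buffers where only the middle one is still busy, the function
returns 1; with two or more busy buffers it returns -1 and the caller keeps
waiting for release events as usual.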

Axel

On 11/05/2016 20:29, Axel Davy wrote:

On 11/05/2016 17:06, Leo Liu wrote:

This implements DRI3 PixmapFromBuffer. Create buffer objects,
associate each with a dma-buf fd, and then pass this fd with a pixmap
ID to the X server to create the pixmap object; also add a function
for waiting on events.

Signed-off-by: Leo Liu 
---
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 187 
+-

  1 file changed, 185 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c

index ef80730..e78ca07 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -28,17 +28,35 @@
  #include 
#include 
+#include 
  #include 
  #include 
#include "loader.h"
#include "pipe/p_screen.h"
+#include "pipe/p_state.h"
  #include "pipe-loader/pipe_loader.h"
#include "util/u_memory.h"
+#include "util/u_inlines.h"
+
  #include "vl/vl_winsys.h"
  +#define BACK_BUFFER_NUM 3
+
+struct vl_dri3_buffer
+{
+   struct pipe_resource *texture;
+
+   uint32_t pixmap;
+   uint32_t sync_fence;
+   struct xshmfence *shm_fence;
+
+   bool busy;
+   uint32_t width, height, pitch;
+};
+
  struct vl_dri3_screen
  {
 struct vl_screen base;
@@ -48,9 +66,23 @@ struct vl_dri3_screen
 uint32_t width, height,

Re: [Mesa-dev] [PATCH v5] Add .mailmap

2016-05-11 Thread Rob Clark

yes please..  scripts/get_reviewer.pl should in theory respect this
too, so useful to have

BR,
-R

On Wed, May 11, 2016 at 3:16 PM, Jason Ekstrand  wrote:
> Is there a reason this never got merged?  I'm up for just landing it now and
> letting people fix up names as needed.
> --Jason
>
> On Mon, Dec 28, 2015 at 1:50 AM, Giuseppe Bilotta
>  wrote:
>>
>> This adds a first tentative .mailmap file, to canonicize contributor
>> name/emails in shortlogs and other statistical endeavours.
>>
>> Signed-off-by: Giuseppe Bilotta 
>> ---
>> Hopefully the last time I need to submit this …
>>
>>  .mailmap | 460
>> +++
>>  1 file changed, 460 insertions(+)
>>  create mode 100644 .mailmap
>>
>> diff --git a/.mailmap b/.mailmap
>> new file mode 100644
>> index 000..10811c0
>> --- /dev/null
>> +++ b/.mailmap
>> @@ -0,0 +1,460 @@
>> +Aapo Tahkola  
>> +
>> +Adam Jackson  
>> +Adam Jackson  
>> +
>> +Adrian Marius Negreanu  Adrian Negreanu
>> 
>> +Adrian Marius Negreanu  Negreanu Marius
>> Adrian 
>> +
>> +Dave Airlie  
>> +Dave Airlie  airlied
>> 
>> +Dave Airlie  
>> +Dave Airlie  
>> +Dave Airlie  
>> +Dave Airlie  
>> +Dave Airlie  
>> +Dave Airlie  
>> +Dave Airlie  
>> +
>> +Alan Coopersmith  
>> +
>> +Alan Hourihane  
>> +Alan Hourihane  
>> +Alan Hourihane  
>> +
>> +Alexander Monakov  
>> +
>> +Alexander von Gluck IV  Alexander von Gluck
>> 
>> +
>> +Alex Corscadden  
>> +Alex Corscadden  
>> +
>> +Alex Deucher  
>> +Alex Deucher  
>> +Alex Deucher  
>> +Alex Deucher  
>> +Alex Deucher  
>> +Alex Deucher  
>> +
>> +Andreas Fänger  
>> +
>> +Andreas Hartmetz  
>> +
>> +Andre Heider 
>> +Andreas Heider 
>> +
>> +Andreas Pokorny 
>> 
>> +
>> +Andrew Randrianasulu  
>> +Andrew Randrianasulu  
>> +
>> +Arthur Huillet  Arthur HUILLET
>> 
>> +
>> +Benjamin Franzke  ben
>> 
>> +
>> +Ben Skeggs  
>> +Ben Skeggs  
>> +Ben Skeggs  
>> +Ben Skeggs  
>> +Ben Skeggs  
>> +Ben Skeggs  
>> +Ben Skeggs  
>> +
>> +Ben Widawsky  Ben Widawsky
>> 
>> +
>> +Blair Sadewitz  Blair Sadewitz
>> 
>> +
>> +Boris Peterbarg  reist 
>> +
>> +Brian Paul  Brian 
>> +Brian Paul  
>> +Brian Paul  
>> +Brian Paul  
>> +Brian Paul  brian 
>> +Brian Paul  Brian 
>> +Brian Paul  Brian 
>> +Brian Paul  Brian 
>> +Brian Paul  Brian 
>> +Brian Paul  Brian 
>> +Brian Paul  Brian 
>> +Brian Paul

Re: [Mesa-dev] [PATCH] nir: glsl_get_bit_size() should take glsl_type

2016-05-11 Thread Jason Ekstrand

On Wed, May 11, 2016 at 12:09 PM, Rob Clark  wrote:

> From: Rob Clark 
>
> It's what all the call-sites want, so gets rid of a bunch of inlined
> glsl_get_base_type() at the call-sites.
>

Thank you!  This has been bothering me for a while.  At some point in the
future, a glsl_get_base_type_bit_size() helper may be useful but there's no
need for all the wrapping.
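
A minimal sketch in C of the two helper shapes being discussed, for
illustration only. The enum, struct and function names are hypothetical and
are not the NIR API; the sketch also assumes that every base type is 32 bits
wide except double, which is an assumption made here rather than something
stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base types, standing in for the real glsl base type enum. */
enum sketch_base_type {
   SKETCH_TYPE_UINT,
   SKETCH_TYPE_INT,
   SKETCH_TYPE_FLOAT,
   SKETCH_TYPE_DOUBLE,
   SKETCH_TYPE_BOOL,
};

/* Hypothetical type object carrying its base type. */
struct sketch_type {
   enum sketch_base_type base_type;
   unsigned vector_elements;
};

/* The "base type" flavour: callers have to unwrap the type themselves. */
static unsigned
sketch_base_type_bit_size(enum sketch_base_type base_type)
{
   return base_type == SKETCH_TYPE_DOUBLE ? 64 : 32;
}

/* The flavour this patch moves to: take the type and unwrap it once here,
 * so call sites no longer repeat the base type lookup. */
static unsigned
sketch_type_bit_size(const struct sketch_type *type)
{
   assert(type != NULL);
   return sketch_base_type_bit_size(type->base_type);
}
```

A call site then shrinks from sketch_base_type_bit_size(type->base_type) to
sketch_type_bit_size(type), which is the shortening visible throughout the
diff.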

Reviewed-by: Jason Ekstrand 


> Signed-off-by: Rob Clark 
> ---
> Plus, for mediump, going to need glsl_get_base_type() to take a
> precision param.. this should make it less of a challenge to stay
> in 80 columns..
>
>  src/compiler/nir/glsl_to_nir.cpp| 16 
>  src/compiler/nir/nir.c  |  2 +-
>  src/compiler/nir/nir_builder.h  |  2 +-
>  src/compiler/nir/nir_lower_locals_to_regs.c |  2 +-
>  src/compiler/nir/nir_lower_var_copies.c |  3 +--
>  src/compiler/nir/nir_lower_vars_to_ssa.c|  2 +-
>  src/compiler/nir_types.h|  4 ++--
>  src/compiler/spirv/spirv_to_nir.c   |  6 +++---
>  src/compiler/spirv/vtn_variables.c  |  4 ++--
>  9 files changed, 20 insertions(+), 21 deletions(-)
>
> diff --git a/src/compiler/nir/glsl_to_nir.cpp
> b/src/compiler/nir/glsl_to_nir.cpp
> index ee39a3c..9e53e59 100644
> --- a/src/compiler/nir/glsl_to_nir.cpp
> +++ b/src/compiler/nir/glsl_to_nir.cpp
> @@ -857,7 +857,7 @@ nir_visitor::visit(ir_call *ir)
>   instr->num_components = type->vector_elements;
>
>   /* Setup destination register */
> - unsigned bit_size = glsl_get_bit_size(type->base_type);
> + unsigned bit_size = glsl_get_bit_size(type);
>   nir_ssa_dest_init(>instr, >dest,
> type->vector_elements, bit_size, NULL);
>
> @@ -943,7 +943,7 @@ nir_visitor::visit(ir_call *ir)
>   instr->num_components = type->vector_elements;
>
>   /* Setup destination register */
> - unsigned bit_size = glsl_get_bit_size(type->base_type);
> + unsigned bit_size = glsl_get_bit_size(type);
>   nir_ssa_dest_init(>instr, >dest,
> type->vector_elements, bit_size, NULL);
>
> @@ -1006,7 +1006,7 @@ nir_visitor::visit(ir_call *ir)
>
>   /* Atomic result */
>   assert(ir->return_deref);
> - unsigned bit_size =
> glsl_get_bit_size(ir->return_deref->type->base_type);
> + unsigned bit_size = glsl_get_bit_size(ir->return_deref->type);
>   nir_ssa_dest_init(>instr, >dest,
> ir->return_deref->type->vector_elements,
> bit_size, NULL);
> @@ -1187,7 +1187,7 @@ nir_visitor::evaluate_rvalue(ir_rvalue* ir)
>load_instr->num_components = ir->type->vector_elements;
>load_instr->variables[0] = this->deref_head;
>ralloc_steal(load_instr, load_instr->variables[0]);
> -  unsigned bit_size = glsl_get_bit_size(ir->type->base_type);
> +  unsigned bit_size = glsl_get_bit_size(ir->type);
>add_instr(_instr->instr, ir->type->vector_elements, bit_size);
> }
>
> @@ -1208,7 +1208,7 @@ nir_visitor::visit(ir_expression *ir)
> case ir_binop_ubo_load: {
>nir_intrinsic_instr *load =
>   nir_intrinsic_instr_create(this->shader, nir_intrinsic_load_ubo);
> -  unsigned bit_size = glsl_get_bit_size(ir->type->base_type);
> +  unsigned bit_size = glsl_get_bit_size(ir->type);
>load->num_components = ir->type->vector_elements;
>load->src[0] = nir_src_for_ssa(evaluate_rvalue(ir->operands[0]));
>load->src[1] = nir_src_for_ssa(evaluate_rvalue(ir->operands[1]));
> @@ -1277,7 +1277,7 @@ nir_visitor::visit(ir_expression *ir)
>intrin->intrinsic == nir_intrinsic_interp_var_at_sample)
>   intrin->src[0] =
> nir_src_for_ssa(evaluate_rvalue(ir->operands[1]));
>
> -  unsigned bit_size =  glsl_get_bit_size(deref->type->base_type);
> +  unsigned bit_size =  glsl_get_bit_size(deref->type);
>add_instr(>instr, deref->type->vector_elements, bit_size);
>
>if (swizzle) {
> @@ -1497,7 +1497,7 @@ nir_visitor::visit(ir_expression *ir)
>   nir_intrinsic_get_buffer_size);
>load->num_components = ir->type->vector_elements;
>load->src[0] = nir_src_for_ssa(evaluate_rvalue(ir->operands[0]));
> -  unsigned bit_size = glsl_get_bit_size(ir->type->base_type);
> +  unsigned bit_size = glsl_get_bit_size(ir->type);
>add_instr(>instr, ir->type->vector_elements, bit_size);
>return;
> }
> @@ -1935,7 +1935,7 @@ nir_visitor::visit(ir_texture *ir)
>
> assert(src_number == num_srcs);
>
> -   unsigned bit_size = glsl_get_bit_size(ir->type->base_type);
> +   unsigned bit_size = glsl_get_bit_size(ir->type);
> add_instr(>instr, nir_tex_instr_dest_size(instr), bit_size);
>  }
>
> diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
> index

Re: [Mesa-dev] [PATCH v5] Add .mailmap

2016-05-11 Thread Jason Ekstrand

Is there a reason this never got merged?  I'm up for just landing it now
and letting people fix up names as needed.
--Jason

On Mon, Dec 28, 2015 at 1:50 AM, Giuseppe Bilotta <
giuseppe.bilo...@gmail.com> wrote:

> This adds a first tentative .mailmap file, to canonicize contributor
> name/emails in shortlogs and other statistical endeavours.
>
> Signed-off-by: Giuseppe Bilotta 
> ---
> Hopefully the last time I need to submit this …
>
>  .mailmap | 460
> +++
>  1 file changed, 460 insertions(+)
>  create mode 100644 .mailmap
>
> diff --git a/.mailmap b/.mailmap
> new file mode 100644
> index 000..10811c0
> --- /dev/null
> +++ b/.mailmap
> @@ -0,0 +1,460 @@
> +Aapo Tahkola  
> +
> +Adam Jackson  
> +Adam Jackson  
> +
> +Adrian Marius Negreanu  Adrian Negreanu <
> adrian.m.negre...@intel.com>
> +Adrian Marius Negreanu  Negreanu Marius
> Adrian 
> +
> +Dave Airlie  
> +Dave Airlie  airlied <
> airl...@unused-12-215.bne.redhat.com>
> +Dave Airlie  
> +Dave Airlie  
> +Dave Airlie  
> +Dave Airlie  
> +Dave Airlie  
> +Dave Airlie  
> +Dave Airlie  
> +
> +Alan Coopersmith  
> +
> +Alan Hourihane  
> +Alan Hourihane  
> +Alan Hourihane  
> +
> +Alexander Monakov  
> +
> +Alexander von Gluck IV  Alexander von Gluck <
> kallis...@unixzen.com>
> +
> +Alex Corscadden  
> +Alex Corscadden  
> +
> +Alex Deucher  
> +Alex Deucher  
> +Alex Deucher  
> +Alex Deucher  
> +Alex Deucher  
> +Alex Deucher  
> +
> +Andreas Fänger  
> +
> +Andreas Hartmetz  
> +
> +Andre Heider 
> +Andreas Heider 
> +
> +Andreas Pokorny  <
> andreas.poko...@elektrobit.com>
> +
> +Andrew Randrianasulu  
> +Andrew Randrianasulu  
> +
> +Arthur Huillet  Arthur HUILLET <
> arthur.huil...@free.fr>
> +
> +Benjamin Franzke  ben <
> benjaminfran...@googlemail.com>
> +
> +Ben Skeggs  
> +Ben Skeggs  
> +Ben Skeggs  
> +Ben Skeggs  
> +Ben Skeggs  
> +Ben Skeggs  
> +Ben Skeggs  
> +
> +Ben Widawsky  Ben Widawsky  >
> +
> +Blair Sadewitz  Blair Sadewitz <
> blair.sadewitz.gmail.com>
> +
> +Boris Peterbarg  reist 
> +
> +Brian Paul  Brian 
> +Brian Paul  
> +Brian Paul  
> +Brian Paul  
> +Brian Paul  brian 
> +Brian Paul  Brian 
> +Brian Paul  Brian 
> +Brian Paul  Brian 
> +Brian Paul  Brian 
> +Brian Paul  Brian 
> +Brian Paul  Brian 
> +Brian Paul  root 
> +Brian Paul  root 
> +Brian Paul  root 
> +Brian Paul  root 
> +
> +Bruce

[Mesa-dev] [PATCH] nir: glsl_get_bit_size() should take glsl_type

2016-05-11 Thread Rob Clark

From: Rob Clark 

It's what all the call-sites want, so gets rid of a bunch of inlined
glsl_get_base_type() at the call-sites.

Signed-off-by: Rob Clark 
---
Plus, for mediump, going to need glsl_get_base_type() to take a
precision param.. this should make it less of a challenge to stay
in 80 columns..

 src/compiler/nir/glsl_to_nir.cpp| 16 
 src/compiler/nir/nir.c  |  2 +-
 src/compiler/nir/nir_builder.h  |  2 +-
 src/compiler/nir/nir_lower_locals_to_regs.c |  2 +-
 src/compiler/nir/nir_lower_var_copies.c |  3 +--
 src/compiler/nir/nir_lower_vars_to_ssa.c|  2 +-
 src/compiler/nir_types.h|  4 ++--
 src/compiler/spirv/spirv_to_nir.c   |  6 +++---
 src/compiler/spirv/vtn_variables.c  |  4 ++--
 9 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/compiler/nir/glsl_to_nir.cpp b/src/compiler/nir/glsl_to_nir.cpp
index ee39a3c..9e53e59 100644
--- a/src/compiler/nir/glsl_to_nir.cpp
+++ b/src/compiler/nir/glsl_to_nir.cpp
@@ -857,7 +857,7 @@ nir_visitor::visit(ir_call *ir)
  instr->num_components = type->vector_elements;
 
  /* Setup destination register */
- unsigned bit_size = glsl_get_bit_size(type->base_type);
+ unsigned bit_size = glsl_get_bit_size(type);
  nir_ssa_dest_init(&instr->instr, &instr->dest,
type->vector_elements, bit_size, NULL);
 
@@ -943,7 +943,7 @@ nir_visitor::visit(ir_call *ir)
  instr->num_components = type->vector_elements;
 
  /* Setup destination register */
- unsigned bit_size = glsl_get_bit_size(type->base_type);
+ unsigned bit_size = glsl_get_bit_size(type);
  nir_ssa_dest_init(&instr->instr, &instr->dest,
type->vector_elements, bit_size, NULL);
 
@@ -1006,7 +1006,7 @@ nir_visitor::visit(ir_call *ir)
 
  /* Atomic result */
  assert(ir->return_deref);
- unsigned bit_size = 
glsl_get_bit_size(ir->return_deref->type->base_type);
+ unsigned bit_size = glsl_get_bit_size(ir->return_deref->type);
  nir_ssa_dest_init(&instr->instr, &instr->dest,
ir->return_deref->type->vector_elements,
bit_size, NULL);
@@ -1187,7 +1187,7 @@ nir_visitor::evaluate_rvalue(ir_rvalue* ir)
   load_instr->num_components = ir->type->vector_elements;
   load_instr->variables[0] = this->deref_head;
   ralloc_steal(load_instr, load_instr->variables[0]);
-  unsigned bit_size = glsl_get_bit_size(ir->type->base_type);
+  unsigned bit_size = glsl_get_bit_size(ir->type);
   add_instr(&load_instr->instr, ir->type->vector_elements, bit_size);
}
 
@@ -1208,7 +1208,7 @@ nir_visitor::visit(ir_expression *ir)
case ir_binop_ubo_load: {
   nir_intrinsic_instr *load =
  nir_intrinsic_instr_create(this->shader, nir_intrinsic_load_ubo);
-  unsigned bit_size = glsl_get_bit_size(ir->type->base_type);
+  unsigned bit_size = glsl_get_bit_size(ir->type);
   load->num_components = ir->type->vector_elements;
   load->src[0] = nir_src_for_ssa(evaluate_rvalue(ir->operands[0]));
   load->src[1] = nir_src_for_ssa(evaluate_rvalue(ir->operands[1]));
@@ -1277,7 +1277,7 @@ nir_visitor::visit(ir_expression *ir)
   intrin->intrinsic == nir_intrinsic_interp_var_at_sample)
  intrin->src[0] = nir_src_for_ssa(evaluate_rvalue(ir->operands[1]));
 
-  unsigned bit_size =  glsl_get_bit_size(deref->type->base_type);
+  unsigned bit_size =  glsl_get_bit_size(deref->type);
   add_instr(&intrin->instr, deref->type->vector_elements, bit_size);
 
   if (swizzle) {
@@ -1497,7 +1497,7 @@ nir_visitor::visit(ir_expression *ir)
  nir_intrinsic_get_buffer_size);
   load->num_components = ir->type->vector_elements;
   load->src[0] = nir_src_for_ssa(evaluate_rvalue(ir->operands[0]));
-  unsigned bit_size = glsl_get_bit_size(ir->type->base_type);
+  unsigned bit_size = glsl_get_bit_size(ir->type);
   add_instr(&load->instr, ir->type->vector_elements, bit_size);
   return;
}
@@ -1935,7 +1935,7 @@ nir_visitor::visit(ir_texture *ir)
 
assert(src_number == num_srcs);
 
-   unsigned bit_size = glsl_get_bit_size(ir->type->base_type);
+   unsigned bit_size = glsl_get_bit_size(ir->type);
add_instr(&instr->instr, nir_tex_instr_dest_size(instr), bit_size);
 }
 
diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
index 867a43c..71adcb3 100644
--- a/src/compiler/nir/nir.c
+++ b/src/compiler/nir/nir.c
@@ -694,7 +694,7 @@ nir_deref_get_const_initializer_load(nir_shader *shader, 
nir_deref_var *deref)
   tail = tail->child;
}
 
-   unsigned bit_size = glsl_get_bit_size(glsl_get_base_type(tail->type));
+   unsigned bit_size = glsl_get_bit_size(tail->type);
nir_load_const_instr *load =
   nir_load_const_instr_create(shader, glsl_get_vector_elements(tail->type),

Re: [Mesa-dev] [PATCH 15/23] i965/fs: add SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA helper

2016-05-11 Thread Francisco Jerez

Iago Toral  writes:

> On Wed, 2016-05-11 at 12:49 +0200, Iago Toral wrote:
>> On Tue, 2016-05-10 at 19:10 -0700, Francisco Jerez wrote:
>> > Samuel Iglesias Gonsálvez  writes:
>> > 
>> > > From: Iago Toral Quiroga 
>> > >
>> > > There are a few places where we need to shuffle the result of a 32-bit 
>> > > load
>> > > into valid 64-bit data, so extract this logic into a separate helper 
>> > > that we
>> > > can reuse.
>> > >
>> > > Also, the shuffling needs to operate with WE_all set, which we were 
>> > > missing
>> > > before, because we are changing the layout of the data across the various
>> > > channels. Otherwise we will run into problems in non-uniform control-flow
>> > > scenarios.
>> > > ---
>> > >  src/mesa/drivers/dri/i965/brw_fs.cpp | 95 
>> > > +---
>> > >  src/mesa/drivers/dri/i965/brw_fs.h   |  5 ++
>> > >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 46 ++--
>> > >  3 files changed, 73 insertions(+), 73 deletions(-)
>> > >
>> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
>> > > b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > index dff13ea..709e4b8 100644
>> > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > @@ -216,39 +216,8 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const 
>> > > fs_builder &bld,
>> > >  
>> > > vec4_result.type = dst.type;
>> > >  
>> > > -   /* Our VARYING_PULL_CONSTANT_LOAD reads a vector of 32-bit elements. 
>> > > If we
>> > > -* are reading doubles this means that we get this:
>> > > -*
>> > > -*  r0: x0 x0 x0 x0 x0 x0 x0 x0
>> > > -*  r1: x1 x1 x1 x1 x1 x1 x1 x1
>> > > -*  r2: y0 y0 y0 y0 y0 y0 y0 y0
>> > > -*  r3: y1 y1 y1 y1 y1 y1 y1 y1
>> > > -*
>> > > -* Fix this up so we return valid double elements:
>> > > -*
>> > > -*  r0: x0 x1 x0 x1 x0 x1 x0 x1
>> > > -*  r1: x0 x1 x0 x1 x0 x1 x0 x1
>> > > -*  r2: y0 y1 y0 y1 y0 y1 y0 y1
>> > > -*  r3: y0 y1 y0 y1 y0 y1 y0 y1
>> > > -*/
>> > > -   if (type_sz(dst.type) == 8) {
>> > > -  int multiplier = bld.dispatch_width() / 8;
>> > > -  fs_reg fixed_res =
>> > > - fs_reg(VGRF, alloc.allocate(2 * multiplier), 
>> > > BRW_REGISTER_TYPE_F);
>> > > -  /* We only have 2 doubles in a 32-bit vec4 */
>> > > -  for (int i = 0; i < 2; i++) {
>> > > - fs_reg vec4_float =
>> > > -horiz_offset(retype(vec4_result, BRW_REGISTER_TYPE_F),
>> > > - multiplier * 16 * i);
>> > > -
>> > > - bld.MOV(stride(fixed_res, 2), vec4_float);
>> > > - bld.MOV(stride(horiz_offset(fixed_res, 1), 2),
>> > > - horiz_offset(vec4_float, 8 * multiplier));
>> > > -
>> > > - bld.MOV(horiz_offset(vec4_result, multiplier * 8 * i),
>> > > - retype(fixed_res, BRW_REGISTER_TYPE_DF));
>> > > -  }
>> > > -   }
>> > > +   if (type_sz(dst.type) == 8)
>> > > +  SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(bld, vec4_result, 
>> > > vec4_result, 2);
>> > >  
>> > > int type_slots = MAX2(type_sz(dst.type) / 4, 1);
>> > > bld.MOV(dst, offset(vec4_result, bld,
>> > > @@ -256,6 +225,66 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const 
>> > > fs_builder &bld,
>> > >  }
>> > >  
>> > >  /**
>> > > + * This helper takes the result of a load operation that reads 32-bit 
>> > > elements
>> > > + * in this format:
>> > > + *
>> > > + * x x x x x x x x
>> > > + * y y y y y y y y
>> > > + * z z z z z z z z
>> > > + * w w w w w w w w
>> > > + *
>> > > + * and shuffles the data to get this:
>> > > + *
>> > > + * x y x y x y x y
>> > > + * x y x y x y x y
>> > > + * z w z w z w z w
>> > > + * z w z w z w z w
>> > > + *
>> > > + * Which is exactly what we want if the load is reading 64-bit 
>> > > components
>> > > + * like doubles, where x represents the low 32-bit of the x double 
>> > > component
>> > > + * and y represents the high 32-bit of the x double component (likewise 
>> > > with
>> > > + * z and w for double component y). The parameter @components represents
>> > > + * the number of 64-bit components present in @src. This would 
>> > > typically be
>> > > + * 2 at most, since we can only fit 2 double elements in the result of a
>> > > + * vec4 load.
>> > > + *
>> > > + * Notice that @dst and @src can be the same register.
>> > > + */
>> > > +void
>> > > +fs_visitor::SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(const fs_builder 
>> > > &bld,
>> > 
>> > I don't see any reason to make this an fs_visitor method.  Declare this
>> > as a static function local to brw_fs_nir.cpp, which should improve
>> > encapsulation and reduce the amount of boilerplate.  Also please don't
>> > write it in capitals unless you want people to shout the name of your
>> > function while discussing out loud about it. ;)
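To make the layout described in the helper's comment concrete, here is a
minimal host-side sketch in plain C (it is not i965 code). It assumes SIMD8,
so one register row holds eight 32-bit channels, and a little-endian layout
where the low dword of each 64-bit element comes first. The names LANES,
shuffle_32bit_pair_to_64bit, read_u64 and demo_shuffled_element are made up
for illustration only.

```c
#include <stdint.h>

#define LANES 8 /* assumption: SIMD8, eight 32-bit channels per row */

/* Interleave a row of low dwords and a row of high dwords so that each
 * consecutive (lo, hi) pair forms one 64-bit element.  A temporary is used
 * so that dst may alias the inputs, as the quoted comment allows. */
static void
shuffle_32bit_pair_to_64bit(uint32_t dst[2 * LANES],
                            const uint32_t lo[LANES],
                            const uint32_t hi[LANES])
{
   uint32_t tmp[2 * LANES];

   for (int i = 0; i < LANES; i++) {
      tmp[2 * i]     = lo[i];   /* "x" row: low 32 bits of channel i  */
      tmp[2 * i + 1] = hi[i];   /* "y" row: high 32 bits of channel i */
   }
   for (int i = 0; i < 2 * LANES; i++)
      dst[i] = tmp[i];
}

/* Read channel i of the shuffled data back as a 64-bit value. */
static uint64_t
read_u64(const uint32_t data[2 * LANES], int i)
{
   return ((uint64_t)data[2 * i + 1] << 32) | data[2 * i];
}

/* Illustration: low dwords are 0..7, high dwords are 100..107. */
static uint64_t
demo_shuffled_element(int i)
{
   uint32_t lo[LANES], hi[LANES], dst[2 * LANES];

   for (int c = 0; c < LANES; c++) {
      lo[c] = (uint32_t)c;
      hi[c] = 100u + (uint32_t)c;
   }
   shuffle_32bit_pair_to_64bit(dst, lo, hi);
   return read_u64(dst, i);
}
```

The real helper does the same interleaving with strided MOVs on GRFs; this
sketch only shows where each dword ends up.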
>> 
>> I know, I saw that we also had VARYING_PULL_CONSTANT_LOAD and figured
>> that maybe that was a style thing for certain helpers in the

[Mesa-dev] [Bug 95354] anv_pipeline.c:164:7: error: implicit declaration of function ‘nir_lower_outputs_to_temporaries’ [-Werror=implicit-function-declaration]

2016-05-11 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95354

Mark Janes  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Mark Janes  ---
fixed by 5886d1bad13a1c0106b7f42191bbc399fff4a0d9

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-11 Thread Francisco Jerez

Iago Toral  writes:

> On Tue, 2016-05-10 at 16:53 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez  writes:
>> 
>> > From: Iago Toral Quiroga 
>> >
>> > We were not accounting for subreg_offset in the check for the start
>> > of the region. This meant that we would allow copy-propagation even if
>> > the dst wrote to subreg_offset 4 and our source read from
>> > subreg_offset 0, which is not correct. This was observed in fp64 code,
>> > since there we use subreg_offset to select the high 32 bits of a double.
>> >
>> I don't think this paragraph is accurate, copy instructions with
>> non-zero destination subreg offset are currently considered partial
>> writes and should never have been added to the ACP hash table in the
>> first place.
>
> Right, I think I wrote this patch before the one where I fixed
> is_partial_write() to consider any write to subreg_offset > 0 partial. 
>
>> > Also, fs_reg::regs_read() already takes the stride into account, so we
>> > should not multiply its result by the stride again. This was making
>> > copy-propagation fail to copy-propagate cases that would otherwise be
>> > safe to copy-propagate. Again, this was observed in fp64 code, since
>> > there we use stride > 1 often.
>> >
>> > Incidentally, these fixes open up more possibilities for copy propagation
>> > which uncovered new bugs in copy-propagation. The following patches address
>> > each of these new issues.
>> 
>> Oh man, that sucks...
>> 
>> > ---
>> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 
>> > +
>> >  1 file changed, 13 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
>> > b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > index 5fae10f..23df877 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
>> > stride,
>> > return true;
>> >  }
>> >  
>> > +static inline bool
>> > +region_match(fs_reg src, unsigned regs_read,
>> > + fs_reg dst, unsigned regs_written)
>> 
>> How about 'region_contained_in(dst, regs_write, src, regs_read)'? (I
>> personally wouldn't mind 'region_match' but
>> 'write_region_contains_read_region' sounds a bit too long for my taste).
>> 
>> > +{
>> > +   return src.reg_offset >= dst.reg_offset &&
>> > +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) 
>> > &&
>> > +  src.subreg_offset >= dst.subreg_offset;
>> 
>> This works under the assumption that src.subreg_offset is strictly less
>> than the reg_offset unit -- Which *should* be the case unless we've
>> messed up that restriction in some place (we have in the past :P).  To
>> be on the safe side you could do something like the following, if you like:
>> 
>> |   return (src.reg_offset * REG_SIZE + src.subreg_offset >=
>> |   dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
>> |  src.reg_offset + regs_read <= dst.reg_offset + regs_written;
>
> I understand that even if we discard writes with dst.subreg_offset > 0,
> you still want the subreg_offset check here to be safe exactly in that
> scenario (since we would not need this for the case I originally wrote
> it for).
>

Yeah, it's up to you but I guess it wouldn't hurt to be extra-paranoid
here, and it would probably be sensible to add 'src.file == dst.file &&
src.nr == dst.nr && ...' to the return expression in addition.
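
Put together, the check discussed above might look roughly like the sketch
below. It is plain C with a simplified stand-in for fs_reg (struct reg_region
and make_region are hypothetical, not the real backend types), and it assumes
REG_SIZE is 32 bytes. The function name follows the 'region_contained_in'
suggestion from earlier in the thread.

```c
#include <stdbool.h>

#define REG_SIZE 32 /* assumption: bytes per GRF */

/* Hypothetical, simplified stand-in for the fields of fs_reg used here. */
struct reg_region {
   int file;
   int nr;
   unsigned reg_offset;    /* in whole registers */
   unsigned subreg_offset; /* in bytes */
};

static struct reg_region
make_region(int file, int nr, unsigned reg_offset, unsigned subreg_offset)
{
   struct reg_region r = { file, nr, reg_offset, subreg_offset };
   return r;
}

/* True if the region read from src lies within the region written to dst.
 * The start is compared in bytes so a stray subreg_offset >= REG_SIZE
 * cannot fool the check; the end is compared in whole registers. */
static bool
region_contained_in(struct reg_region dst, unsigned regs_written,
                    struct reg_region src, unsigned regs_read)
{
   return src.file == dst.file &&
          src.nr == dst.nr &&
          (src.reg_offset * REG_SIZE + src.subreg_offset >=
           dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
          src.reg_offset + regs_read <= dst.reg_offset + regs_written;
}
```

With this, a read starting at byte 4 of a register that dst writes from byte
0 is accepted, while a read starting at byte 0 against a write that starts at
byte 4 is rejected.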

>> With the above taken into account:
>> 
>> Reviewed-by: Francisco Jerez 
>
> Thanks!
>
>> > +}
>> > +
>> >  bool
>> >  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>> >  {
>> > @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int 
>> > arg, acp_entry *entry)
>> > /* Bail if inst is reading a range that isn't contained in the range
>> >  * that entry is writing.
>> >  */
>> > -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
>> > -   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
>> > -inst->regs_read(arg) * inst->src[arg].stride * 32) >
>> > -   (entry->dst.reg_offset + entry->regs_written) * 32)
>> > +   if (!region_match(inst->src[arg], inst->regs_read(arg),
>> > + entry->dst, entry->regs_written))
>> >return false;
>> >  
>> > /* we can't generally copy-propagate UD negations because we
>> > @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
>> > acp_entry *entry)
>> >/* Bail if inst is reading a range that isn't contained in the range
>> > * that entry is writing.
>> > */
>> > -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
>> > -  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
>> > -   inst->regs_read(i) * inst->src[i].stride *

Re: [Mesa-dev] [PATCH 1/3] glx: Implement the libglvnd interface.

2016-05-11 Thread Kyle Brenneman

In the GLX dispatch functions, it should be safe to ignore a failed call 
to AddDrawableMapping. If it can't update the drawable-to-vendor 
hashtable at that point, then libGLX will just query the server when it 
needs to figure out the vendor.


In dispatch_ChooseFBConfigSGIX, if AddFBConfigsMapping fails, should it 
use free or XFree to free the memory?


-Kyle

On 05/11/2016 12:01 PM, Adam Jackson wrote:

From: Kyle Brenneman 

With reference to the libglvnd branch:

https://cgit.freedesktop.org/mesa/mesa/log/?h=libglvnd

This is a squashed commit containing all of Kyle's commits, all but two
of Emil's commits (to follow), and a small fixup from myself to mark the
rest of the glX* functions as _GLX_PUBLIC so they are not exported when
building for libglvnd. I (ajax) squashed them together both for ease of
review, and because most of the changes are un-useful intermediate
states representing the evolution of glvnd's internal API.

Co-author: Emil Velikov 
Reviewed-by: Adam Jackson 
---
  configure.ac|  49 +-
  src/glx/Makefile.am |  19 +-
  src/glx/dri_glx.c   |   4 +-
  src/glx/g_glxglvnddispatchfuncs.c   | 976 
  src/glx/g_glxglvnddispatchindices.h |  92 
  src/glx/glx_pbuffer.c   |  28 +-
  src/glx/glxclient.h |   5 +
  src/glx/glxcmds.c   |  78 +--
  src/glx/glxcurrent.c|  10 +-
  src/glx/glxglvnd.c  |  75 +++
  src/glx/glxglvnd.h  |  14 +
  src/glx/glxglvnddispatchfuncs.h |  70 +++
  12 files changed, 1356 insertions(+), 64 deletions(-)
  create mode 100644 src/glx/g_glxglvnddispatchfuncs.c
  create mode 100644 src/glx/g_glxglvnddispatchindices.h
  create mode 100644 src/glx/glxglvnd.c
  create mode 100644 src/glx/glxglvnd.h
  create mode 100644 src/glx/glxglvnddispatchfuncs.h

diff --git a/configure.ac b/configure.ac
index 023110e..7bf28f9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -514,6 +514,34 @@ else
 DEFINES="$DEFINES -DNDEBUG"
  fi
  
+DEFAULT_GL_LIB_NAME=GL

+
+dnl
+dnl Libglvnd configuration
+dnl
+AC_ARG_ENABLE([libglvnd],
+[AS_HELP_STRING([--enable-libglvnd],
+[Build for libglvnd @<:@default=disabled@:>@])],
+[enable_libglvnd="$enableval"],
+[enable_libglvnd=no])
+AM_CONDITIONAL(USE_LIBGLVND_GLX, test "x$enable_libglvnd" = xyes)
+if test "x$enable_libglvnd" = xyes ; then
+dnl XXX: update once we can handle more than libGL/glx.
+dnl Namely: we should error out if neither of the glvnd enabled libraries
+dnl are built
+if test "x$enable_glx" = xno; then
+AC_MSG_ERROR([cannot build libglvnd without GLX])
+fi
+
+if test "x$enable_xlib_glx" = xyes; then
+AC_MSG_ERROR([cannot build libglvnd when Xlib-GLX is enabled])
+fi
+
+PKG_CHECK_MODULES([GLVND], libglvnd >= 0.1.0)
+DEFINES="${DEFINES} -DUSE_LIBGLVND_GLX=1"
+DEFAULT_GL_LIB_NAME=GLX_mesa
+fi
+
  dnl
  dnl Check if linker supports -Bsymbolic
  dnl
@@ -611,6 +639,23 @@ esac
  
  AM_CONDITIONAL(HAVE_COMPAT_SYMLINKS, test "x$HAVE_COMPAT_SYMLINKS" = xyes)
  
+DEFAULT_GL_LIB_NAME=GL

+
+dnl
+dnl Libglvnd configuration
+dnl
+AC_ARG_ENABLE([libglvnd],
+[AS_HELP_STRING([--enable-libglvnd],
+[Build for libglvnd @<:@default=disabled@:>@])],
+[enable_libglvnd="$enableval"],
+[enable_libglvnd=no])
+AM_CONDITIONAL(USE_LIBGLVND_GLX, test "x$enable_libglvnd" = xyes)
+#AM_COND_IF([USE_LIBGLVND_GLX], [DEFINES="${DEFINES} -DUSE_LIBGLVND_GLX=1"])
+if test "x$enable_libglvnd" = xyes ; then
+DEFINES="${DEFINES} -DUSE_LIBGLVND_GLX=1"
+DEFAULT_GL_LIB_NAME=GLX_mesa
+fi
+
  dnl
  dnl library names
  dnl
@@ -648,13 +693,13 @@ AC_ARG_WITH([gl-lib-name],
[AS_HELP_STRING([--with-gl-lib-name@<:@=NAME@:>@],
  [specify GL library name @<:@default=GL@:>@])],
[GL_LIB=$withval],
-  [GL_LIB=GL])
+  [GL_LIB="$DEFAULT_GL_LIB_NAME"])
  AC_ARG_WITH([osmesa-lib-name],
[AS_HELP_STRING([--with-osmesa-lib-name@<:@=NAME@:>@],
  [specify OSMesa library name @<:@default=OSMesa@:>@])],
[OSMESA_LIB=$withval],
[OSMESA_LIB=OSMesa])
-AS_IF([test "x$GL_LIB" = xyes], [GL_LIB=GL])
+AS_IF([test "x$GL_LIB" = xyes], [GL_LIB="$DEFAULT_GL_LIB_NAME"])
  AS_IF([test "x$OSMESA_LIB" = xyes], [OSMESA_LIB=OSMesa])
  
  dnl

diff --git a/src/glx/Makefile.am b/src/glx/Makefile.am
index d65fb81..5154a23 100644
--- a/src/glx/Makefile.am
+++ b/src/glx/Makefile.am
@@ -46,7 +46,6 @@ AM_CFLAGS = \
$(EXTRA_DEFINES_XF86VIDMODE) \
-D_REENTRANT \
-DDEFAULT_DRIVER_DIR=\"$(DRI_DRIVER_SEARCH_DIR)\" \
-   -DGL_LIB_NAME=\"lib@GL_LIB@.so.1\" \
$(DEFINES) \
$(LIBDRM_CFLAGS) \
$(DRI2PROTO_CFLAGS) \
@@ -146,6 +145,22 @@ SUBDIRS += apple
  libglx_la_LIBADD += $(builddir)/apple/libappleglx.la
  endif
  
+if USE_LIBGLVND_GLX

+AM_CFLAGS += \
+   -DGL_LIB_NAME=\"lib@GL_LIB@.so.0\" \
+

Re: [Mesa-dev] [PATCH 06/14] vl/dri3: add back buffers support

2016-05-11 Thread Axel Davy


Yet another comment on the same patch:

vl_dri3_screen_texture_from_drawable seems to call dri3_get_back_buffer 
in the !is_pixmap case.


If I understand correctly, dri3_get_back_buffer will pick an idle buffer 
from the buffer list.

Is that really what is expected?

Shouldn't vl_dri3_screen_texture_from_drawable return a texture for the 
last back buffer sent instead?
I don't know what vl_dri3_screen_texture_from_drawable is supposed to 
do, so perhaps I'm wrong.
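
To illustrate the concern, here is a small standalone C model of the search
loop in the quoted dri3_find_back(). It is only a sketch: struct slot,
find_idle_back and demo_idle_after_present are made-up names, the model
returns -1 instead of waiting for Present events, and it assumes cur_back
still points at the buffer that was presented last.

```c
#include <stdbool.h>

#define BACK_BUFFER_NUM 3

/* Hypothetical model of one back buffer slot. */
struct slot {
   bool allocated;
   bool busy;
};

/* Round-robin search starting at cur_back, like the quoted loop, but
 * without blocking: returns -1 when every slot is busy. */
static int
find_idle_back(const struct slot slots[BACK_BUFFER_NUM], int cur_back)
{
   for (int b = 0; b < BACK_BUFFER_NUM; b++) {
      int id = (b + cur_back) % BACK_BUFFER_NUM;
      if (!slots[id].allocated || !slots[id].busy)
         return id;
   }
   return -1;
}

/* All slots allocated; only the last presented one is still busy. */
static int
demo_idle_after_present(int last_presented)
{
   struct slot slots[BACK_BUFFER_NUM];

   for (int i = 0; i < BACK_BUFFER_NUM; i++) {
      slots[i].allocated = true;
      slots[i].busy = (i == last_presented);
   }
   return find_idle_back(slots, last_presented);
}
```

Under these assumptions the search returns a different slot than the one
last sent, which is the behaviour being questioned above.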


On 11/05/2016 20:42, Axel Davy wrote:
Another thing: I think there is no guarantee that the X server 
releases all the pixmaps;

it could keep one indefinitely (until window destruction).

Thus if the user changes the drawable without destroying the previous 
one,
one buffer can stay busy forever. Change drawables several times and 
you get stuck...


I think the loader dri3 code suffers from the same issue.

For gallium nine, we use the following code:
https://github.com/iXit/wine/blob/master/dlls/d3d9-nine/dri3.c#L739
The code is a bit complicated because of thread safety, but you likely 
don't need the thread-safety part.
Basically the idea is that when only one pixmap hasn't been released, 
you send it again with the copy flag, which guarantees it will get 
released.


Axel

On 11/05/2016 20:29, Axel Davy wrote:

On 11/05/2016 17:06, Leo Liu wrote:

This implements DRI3 PixmapFromBuffer. Create buffer objects, and
associate it to a dma-buf fd, and then pass this fd with a pixmap
ID to X server for creating pixmap object; also add a function
for wait events.

Signed-off-by: Leo Liu 
---
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 187 
+-

  1 file changed, 185 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c

index ef80730..e78ca07 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -28,17 +28,35 @@
  #include 
#include 
+#include 
  #include 
  #include 
#include "loader.h"
#include "pipe/p_screen.h"
+#include "pipe/p_state.h"
  #include "pipe-loader/pipe_loader.h"
#include "util/u_memory.h"
+#include "util/u_inlines.h"
+
  #include "vl/vl_winsys.h"
  +#define BACK_BUFFER_NUM 3
+
+struct vl_dri3_buffer
+{
+   struct pipe_resource *texture;
+
+   uint32_t pixmap;
+   uint32_t sync_fence;
+   struct xshmfence *shm_fence;
+
+   bool busy;
+   uint32_t width, height, pitch;
+};
+
  struct vl_dri3_screen
  {
 struct vl_screen base;
@@ -48,9 +66,23 @@ struct vl_dri3_screen
 uint32_t width, height, depth;
   xcb_special_event_t *special_event;
+
+   struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
+   int cur_back;
  };
static void
+dri3_free_back_buffer(struct vl_dri3_screen *scrn,
+struct vl_dri3_buffer *buffer)
+{
+   xcb_free_pixmap(scrn->conn, buffer->pixmap);
+   xcb_sync_destroy_fence(scrn->conn, buffer->sync_fence);
+   xshmfence_unmap_shm(buffer->shm_fence);
+   pipe_resource_reference(&buffer->texture, NULL);
+   FREE(buffer);
+}
+
+static void
  dri3_handle_present_event(struct vl_dri3_screen *scrn,
xcb_present_generic_event_t *ge)
  {
@@ -83,6 +115,145 @@ dri3_flush_present_events(struct vl_dri3_screen 
*scrn)

  }
static bool
+dri3_wait_present_events(struct vl_dri3_screen *scrn)
+{
+   if (scrn->special_event) {
+  xcb_generic_event_t *ev;
+  ev = xcb_wait_for_special_event(scrn->conn, 
scrn->special_event);

+  if (!ev)
+ return false;
+  dri3_handle_present_event(scrn, (xcb_present_generic_event_t 
*)ev);

+  return true;
+   }
+   return false;
+}
+
+static int
+dri3_find_back(struct vl_dri3_screen *scrn)
+{
+   int b;
+
+   for (;;) {
+  for (b = 0; b < BACK_BUFFER_NUM; b++) {
+ int id = (b + scrn->cur_back) % BACK_BUFFER_NUM;
+ struct vl_dri3_buffer *buffer = scrn->back_buffers[id];
+ if (!buffer || !buffer->busy)
+return id;
+  }
+  xcb_flush(scrn->conn);
+  if (!dri3_wait_present_events(scrn))
+ return -1;
+   }
+}
+
+static struct vl_dri3_buffer *
+dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
+{
+   struct vl_dri3_buffer *buffer;
+   xcb_pixmap_t pixmap;
+   xcb_sync_fence_t sync_fence;
+   struct xshmfence *shm_fence;
+   int buffer_fd, fence_fd;
+   struct pipe_resource templ;
+   struct winsys_handle whandle;
+   unsigned usage;
+
+   buffer = CALLOC_STRUCT(vl_dri3_buffer);
+   if (!buffer)
+  return NULL;
+
+   fence_fd = xshmfence_alloc_shm();
+   if (fence_fd < 0)
+  goto free_buffer;
+
+   shm_fence = xshmfence_map_shm(fence_fd);
+   if (!shm_fence)
+  goto close_fd;
+
+   memset(&templ, 0, sizeof(templ));
+   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+   templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
+   templ.target = PIPE_TEXTURE_2D;
+

Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-11 Thread Francisco Jerez

Iago Toral  writes:

> On Tue, 2016-05-10 at 16:53 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez  writes:
>> 
>> > From: Iago Toral Quiroga 
>> >
>> > We were not accounting for subreg_offset in the check for the start
>> > of the region. This meant that we would allow copy-propagation even if
>> > the dst wrote to subreg_offset 4 and our source read from
>> > subreg_offset 0, which is not correct. This was observed in fp64 code,
>> > since there we use subreg_offset to select the high 32 bits of a double.
>> >
>> I don't think this paragraph is accurate, copy instructions with
>> non-zero destination subreg offset are currently considered partial
>> writes and should never have been added to the ACP hash table in the
>> first place.
>
> Right, I think I wrote this patch before the one where I fixed
> is_partial_write() to consider any write to subreg_offset > 0 partial. 
>
In practice they would be considered partial writes already, but the
reason is somewhat obscure -- Writes with subreg_offset != 0 would
necessarily fall into three categories:
 - Strided writes (which are considered partial explicitly).
 - Contiguous writes which write less than one GRF worth of data (which
   are considered partial explicitly)
 - Contiguous writes which straddle two registers (which have been
   severely limited by the hardware historically).

Even though it's unlikely to have fixed pre-existing bugs, for the sake
of sanity I'm glad that you fixed is_partial_write() to handle the
latter case explicitly (thanks!), especially since the hardware has
slowly been lifting restrictions on the ways you can straddle multiple
registers: On Gen7 the stupid decompression behaviour that simply shifts
the second decompressed portion of the instruction by one GRF would kill
you.  Gen8 is slightly less stupid and shifts the second decompressed
portion by ExecSize/2 components which allows it to handle a subset of
straddled writes (only the cases where the written components are
balanced between the two registers).  AFAIK Gen9 should be the first to
support fully unrestricted unbalanced writes.
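
As a rough illustration of the three categories above, here is a standalone
C sketch (not the actual is_partial_write() code). It assumes REG_SIZE is 32
bytes and that subreg_offset is always smaller than REG_SIZE; the enum, the
function names and the loop bounds are made up for the example.

```c
#include <stdbool.h>

#define REG_SIZE 32 /* assumption: bytes per GRF */

enum write_kind {
   WRITE_FULL,       /* contiguous, register-aligned, at least one GRF */
   WRITE_STRIDED,    /* stride != 1 */
   WRITE_SUB_GRF,    /* contiguous but less than one GRF of data */
   WRITE_STRADDLING  /* contiguous, >= one GRF, starting mid-register */
};

/* Classify a write by destination subreg offset (bytes, < REG_SIZE),
 * destination stride (elements) and number of bytes written. */
static enum write_kind
classify_write(unsigned subreg_offset, unsigned stride,
               unsigned bytes_written)
{
   if (stride != 1)
      return WRITE_STRIDED;
   if (bytes_written < REG_SIZE)
      return WRITE_SUB_GRF;
   if (subreg_offset != 0)
      return WRITE_STRADDLING; /* offset + size crosses a GRF boundary */
   return WRITE_FULL;
}

/* Check the claim that any write with subreg_offset != 0 lands in one of
 * the three partial categories, over a small range of parameters. */
static bool
all_offset_writes_are_partial(void)
{
   for (unsigned off = 1; off < REG_SIZE; off++)
      for (unsigned stride = 1; stride <= 4; stride++)
         for (unsigned bytes = 1; bytes <= 2 * REG_SIZE; bytes++)
            if (classify_write(off, stride, bytes) == WRITE_FULL)
               return false;
   return true;
}
```

The third category is the one that previously had no explicit check and
relied on the hardware restrictions described above.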

>> > Also, fs_reg::regs_read() already takes the stride into account, so we
>> > should not multiply its result by the stride again. This was making
>> > copy-propagation fail to copy-propagate cases that would otherwise be
>> > safe to copy-propagate. Again, this was observed in fp64 code, since
>> > there we use stride > 1 often.
>> >
>> > Incidentally, these fixes open up more possibilities for copy propagation
>> > which uncovered new bugs in copy-propagation. The folowing patches address
>> > each of these new issues.
>> 
>> Oh man, that sucks...
>> 
>> > ---
>> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 
>> > +
>> >  1 file changed, 13 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
>> > b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > index 5fae10f..23df877 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
>> > stride,
>> > return true;
>> >  }
>> >  
>> > +static inline bool
>> > +region_match(fs_reg src, unsigned regs_read,
>> > + fs_reg dst, unsigned regs_written)
>> 
>> How about 'region_contained_in(dst, regs_write, src, regs_read)'? (I
>> personally wouldn't mind 'region_match' but
>> 'write_region_contains_read_region' sounds a bit too long for my taste).
>> 
>> > +{
>> > +   return src.reg_offset >= dst.reg_offset &&
>> > +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) 
>> > &&
>> > +  src.subreg_offset >= dst.subreg_offset;
>> 
>> This works under the assumption that src.subreg_offset is strictly less
>> than the reg_offset unit -- Which *should* be the case unless we've
>> messed up that restriction in some place (we have in the past :P).  To
>> be on the safe side you could do something like following, if you like:
>> 
>> |   return (src.reg_offset * REG_SIZE + src.subreg_offset >=
>> |   dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
>> |  src.reg_offset + regs_read <= dst.reg_offset + regs_written;
>
> I understand that even if we discard writes with dst.subreg_offset > 0,
> you still want the subreg_offset check here to be safe exactly in that
> scenario (since we would not need this for the case I originally wrote
> it for).
>
>> With the above taken into account:
>> 
>> Reviewed-by: Francisco Jerez 
>
> Thanks!
>
>> > +}
>> > +
>> >  bool
>> >  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>> >  {
>> > @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int 
>> > arg, acp_entry *entry)
>> > /* Bail if inst is reading a range that isn't contained

Re: [Mesa-dev] [PATCH 06/14] vl/dri3: add back buffers support

2016-05-11 Thread Axel Davy

Another thing: I think there is no guarantee that the X server releases 
all the pixmaps;

it could keep one indefinitely (until window destruction).

Thus if the user changes the drawable without destroying the previous 
one,
one buffer can stay busy forever. Change drawables several times and 
you get stuck...


I think the loader dri3 code suffers from the same issue.

For gallium nine, we use the following code:
https://github.com/iXit/wine/blob/master/dlls/d3d9-nine/dri3.c#L739
The code is a bit complicated because of thread safety, but you likely 
don't need the thread-safety part.
Basically the idea is that when only one pixmap hasn't been released, 
you send it again with the copy flag, which guarantees it will get released.
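
A minimal sketch of the detection step, in standalone C with no X calls: it
only finds the case "exactly one buffer is still held by the server". The
struct and function names are hypothetical and this is not the gallium nine
code linked above. The caller would then present that pixmap again with the
Present copy option (XCB_PRESENT_OPTION_COPY in xcb/present.h).

```c
#include <stdbool.h>

#define BACK_BUFFER_NUM 3

/* Hypothetical model of one back buffer. */
struct back_buffer {
   bool allocated;
   bool busy; /* presented and not yet released by the server */
};

/* Return the index of the only buffer still held by the server, or -1 if
 * none or more than one buffer is busy. */
static int
sole_unreleased_buffer(const struct back_buffer bufs[BACK_BUFFER_NUM])
{
   int found = -1;

   for (int i = 0; i < BACK_BUFFER_NUM; i++) {
      if (bufs[i].allocated && bufs[i].busy) {
         if (found >= 0)
            return -1; /* more than one busy: nothing special to do */
         found = i;
      }
   }
   return found;
}

/* Illustration: bit i of busy_mask marks buffer i as busy. */
static int
demo_sole_unreleased(unsigned busy_mask)
{
   struct back_buffer bufs[BACK_BUFFER_NUM];

   for (int i = 0; i < BACK_BUFFER_NUM; i++) {
      bufs[i].allocated = true;
      bufs[i].busy = (busy_mask >> i) & 1u;
   }
   return sole_unreleased_buffer(bufs);
}
```

Whether the re-present should happen on a drawable change or whenever a
buffer has stayed busy for too long is a policy choice this sketch leaves
open.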


Axel

On 11/05/2016 20:29, Axel Davy wrote:

On 11/05/2016 17:06, Leo Liu wrote:

This implements DRI3 PixmapFromBuffer. Create buffer objects, and
associate it to a dma-buf fd, and then pass this fd with a pixmap
ID to X server for creating pixmap object; also add a function
for wait events.

Signed-off-by: Leo Liu 
---
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 187 
+-

  1 file changed, 185 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c

index ef80730..e78ca07 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -28,17 +28,35 @@
  #include 
#include 
+#include 
  #include 
  #include 
#include "loader.h"
#include "pipe/p_screen.h"
+#include "pipe/p_state.h"
  #include "pipe-loader/pipe_loader.h"
#include "util/u_memory.h"
+#include "util/u_inlines.h"
+
  #include "vl/vl_winsys.h"
  +#define BACK_BUFFER_NUM 3
+
+struct vl_dri3_buffer
+{
+   struct pipe_resource *texture;
+
+   uint32_t pixmap;
+   uint32_t sync_fence;
+   struct xshmfence *shm_fence;
+
+   bool busy;
+   uint32_t width, height, pitch;
+};
+
  struct vl_dri3_screen
  {
 struct vl_screen base;
@@ -48,9 +66,23 @@ struct vl_dri3_screen
 uint32_t width, height, depth;
   xcb_special_event_t *special_event;
+
+   struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
+   int cur_back;
  };
static void
+dri3_free_back_buffer(struct vl_dri3_screen *scrn,
+struct vl_dri3_buffer *buffer)
+{
+   xcb_free_pixmap(scrn->conn, buffer->pixmap);
+   xcb_sync_destroy_fence(scrn->conn, buffer->sync_fence);
+   xshmfence_unmap_shm(buffer->shm_fence);
+   pipe_resource_reference(&buffer->texture, NULL);
+   FREE(buffer);
+}
+
+static void
  dri3_handle_present_event(struct vl_dri3_screen *scrn,
xcb_present_generic_event_t *ge)
  {
@@ -83,6 +115,145 @@ dri3_flush_present_events(struct vl_dri3_screen 
*scrn)

  }
static bool
+dri3_wait_present_events(struct vl_dri3_screen *scrn)
+{
+   if (scrn->special_event) {
+  xcb_generic_event_t *ev;
+  ev = xcb_wait_for_special_event(scrn->conn, scrn->special_event);
+  if (!ev)
+ return false;
+  dri3_handle_present_event(scrn, (xcb_present_generic_event_t 
*)ev);

+  return true;
+   }
+   return false;
+}
+
+static int
+dri3_find_back(struct vl_dri3_screen *scrn)
+{
+   int b;
+
+   for (;;) {
+  for (b = 0; b < BACK_BUFFER_NUM; b++) {
+ int id = (b + scrn->cur_back) % BACK_BUFFER_NUM;
+ struct vl_dri3_buffer *buffer = scrn->back_buffers[id];
+ if (!buffer || !buffer->busy)
+return id;
+  }
+  xcb_flush(scrn->conn);
+  if (!dri3_wait_present_events(scrn))
+ return -1;
+   }
+}
+
+static struct vl_dri3_buffer *
+dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
+{
+   struct vl_dri3_buffer *buffer;
+   xcb_pixmap_t pixmap;
+   xcb_sync_fence_t sync_fence;
+   struct xshmfence *shm_fence;
+   int buffer_fd, fence_fd;
+   struct pipe_resource templ;
+   struct winsys_handle whandle;
+   unsigned usage;
+
+   buffer = CALLOC_STRUCT(vl_dri3_buffer);
+   if (!buffer)
+  return NULL;
+
+   fence_fd = xshmfence_alloc_shm();
+   if (fence_fd < 0)
+  goto free_buffer;
+
+   shm_fence = xshmfence_map_shm(fence_fd);
+   if (!shm_fence)
+  goto close_fd;
+
+   memset(&templ, 0, sizeof(templ));
+   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+   templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
+   templ.target = PIPE_TEXTURE_2D;
+   templ.last_level = 0;
+   templ.width0 = scrn->width;
+   templ.height0 = scrn->height;
+   templ.depth0 = 1;
+   templ.array_size = 1;
+   buffer->texture = 
scrn->base.pscreen->resource_create(scrn->base.pscreen,

+ &templ);
+   if (!buffer->texture)
+  goto unmap_shm;
+
+   memset(&whandle, 0, sizeof(whandle));
+   whandle.type= DRM_API_HANDLE_TYPE_FD;
+   usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ;

Here using

PIPE_HANDLE_USAGE_EXPLICIT_FLUSH

 is wrong. Both vaapi and vdpau don't call

Re: [Mesa-dev] [PATCH 00/28] i965/blorp: Use NIR for compiling shaders

2016-05-11 Thread Jason Ekstrand

On Wed, May 11, 2016 at 9:28 AM, Pohjolainen, Topi <
topi.pohjolai...@intel.com> wrote:

> On Tue, May 10, 2016 at 04:16:20PM -0700, Jason Ekstrand wrote:
> > When Paul originally wrote blorp he hand-rolled a shader builder that
> > builds i965 shaders directly.  This has caused headaches because every
> time
> > we make a change to the back-end compiler, we have to update blorp.  NIR
> on
> > the other hand tends to be more stable at this point since it has many
> > different users all across mesa.
> >
> > Using NIR also means that we get decent optimizations, register
> allocation,
> > and scheduling.  The original blorp codegen code tried fairly hard to
> emit
> > reasonably efficient code in that it didn't do more work than needed but
> it
> > was fairly naive when it came to register allocation and scheduling.
> > Using the full compiler stack also means that we get new features for
> free
> > without having to re-implement them in blorp.  On Sky Lake, for instance,
> > we are now generating shaders with sampler-EOT.
> >
> > In spite of all this, this series shows no measurable performance
> > difference on Haswell with every benchmark in sixonyx run 25 times.
> >
> > Jason Ekstrand (28):
> >   nir: Add an info bit for uses_sample_qualifier
> >   i965/fs: Rework the persample shading key/prog_data bits
> >   i965/state: Clean up WM/PS state to pull more things out of prog_data
> >   i965/fs: Clean up the logic in compile_fs a bit
> >   i965/fs: Stop setting dispatch_grf_start_reg from the visitor
> >   i965/gen7_wm: Move where we set the fast clear op
> >   i965/fs: Organize prog_data by ksp number rather than SIMD width
> >   i965/blorp: Simplify the sample layout calculation
> >   i965/fs: Use MRF0 for the repclear message
> >   nir/builder: Generate the alu helpers directly in python
> >   nir/builder: Add a helper for grabbing multiple channels from an ssa
> > def
> >   nir: Add texture opcodes and source types for multisample compression
> >   i965/fs: Implement the new NIR MCS texturing
> >   i965/blorp: Add a prog_data_init helper
> >   i965/blorp: Add a param array to prog_data
> >   blorp: Add initial state setup support for SIMD8 dispatch
> >   i965/blorp: Add a helper for compiling NIR shaders
> >   i965/blorp: Create the program key in get_clear_kernel
> >   i965/blorp: Use NIR for clear shaders
> >   i965/blorp: Refactor getting the blit kernel into a helper
>
> I had a few questions but 14-20 are:
>
> Reviewed-by: Topi Pohjolainen 
>

Thanks

[Mesa-dev] [PATCH 29/28] i965/blorp: Get rid of the brw_blorp_prog_data_init() helper

2016-05-11 Thread Jason Ekstrand

The helper was initially created to allow us to set reasonable defaults as
we mutated the brw_blorp_prog_data structure in preparation for NIR.  Now
that everything is going through brw_blorp_compile_nir_shader() which fully
fills out the brw_blorp_prog_data structure, we don't need the helper.
---
 src/mesa/drivers/dri/i965/brw_blorp.c | 16 
 src/mesa/drivers/dri/i965/brw_blorp.h |  2 --
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  |  8 ++--
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp |  2 --
 4 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
b/src/mesa/drivers/dri/i965/brw_blorp.c
index 161fb90..626a750 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -137,22 +137,6 @@ brw_blorp_compute_tile_offsets(const struct 
brw_blorp_surface_info *info,
 
 
 void
-brw_blorp_prog_data_init(struct brw_blorp_prog_data *prog_data)
-{
-   prog_data->dispatch_8 = false;
-   prog_data->dispatch_16 = true;
-   prog_data->first_curbe_grf_0 = 0;
-   prog_data->first_curbe_grf_2 = 0;
-   prog_data->ksp_offset_2 = 0;
-   prog_data->persample_msaa_dispatch = false;
-
-   prog_data->nr_params = BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS;
-   for (unsigned i = 0; i < BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS; i++)
-  prog_data->param[i] = i;
-}
-
-
-void
 brw_blorp_params_init(struct brw_blorp_params *params)
 {
memset(params, 0, sizeof(*params));
diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h 
b/src/mesa/drivers/dri/i965/brw_blorp.h
index 51e7975..9d71ca4 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -234,8 +234,6 @@ struct brw_blorp_prog_data
uint8_t param[BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS];
 };
 
-void brw_blorp_prog_data_init(struct brw_blorp_prog_data *prog_data);
-
 struct brw_blorp_params
 {
uint32_t x0;
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 314034e..455330f 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -1267,8 +1267,7 @@ blorp_nir_manual_blend_bilinear(nir_builder *b, 
nir_ssa_def *pos,
  */
 static nir_shader *
 brw_blorp_build_nir_shader(struct brw_context *brw,
-   const brw_blorp_blit_prog_key *key,
-   struct brw_blorp_prog_data *prog_data)
+   const brw_blorp_blit_prog_key *key)
 {
nir_ssa_def *src_pos, *dst_pos, *color;
 
@@ -1312,9 +1311,6 @@ brw_blorp_build_nir_shader(struct brw_context *brw,
assert((key->dst_layout == INTEL_MSAA_LAYOUT_NONE) ==
   (key->dst_samples == 0));
 
-   /* Set up prog_data */
-   brw_blorp_prog_data_init(prog_data);
-
nir_builder b;
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
 
@@ -1467,7 +1463,7 @@ brw_blorp_get_blit_kernel(struct brw_context *brw,
/* Try and compile with NIR first.  If that fails, fall back to the old
 * method of building shaders manually.
 */
-   nir_shader *nir = brw_blorp_build_nir_shader(brw, prog_key, &prog_data);
+   nir_shader *nir = brw_blorp_build_nir_shader(brw, prog_key);
struct brw_wm_prog_key wm_key;
brw_blorp_init_wm_prog_key(&wm_key);
wm_key.tex.compressed_multisample_layout_mask =
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index 3925d28..fe02301 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -82,8 +82,6 @@ brw_blorp_params_get_clear_kernel(struct brw_context *brw,
brw_blorp_init_wm_prog_key(&wm_key);
 
struct brw_blorp_prog_data prog_data;
-   brw_blorp_prog_data_init(&prog_data);
-
unsigned program_size;
const unsigned *program =
   brw_blorp_compile_nir_shader(brw, b.shader, &wm_key, use_replicated_data,
-- 
2.5.0.400.gff86faf


Re: [Mesa-dev] [PATCH 06/14] vl/dri3: add back buffers support

2016-05-11 Thread Axel Davy


On 11/05/2016 17:06, Leo Liu wrote:

This implements DRI3 PixmapFromBuffer. It creates buffer objects,
associates each one with a dma-buf fd, and passes that fd together with
a pixmap ID to the X server to create the pixmap object. It also adds a
function for waiting on events.

Signed-off-by: Leo Liu 
---
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 187 +-
  1 file changed, 185 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index ef80730..e78ca07 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -28,17 +28,35 @@
  #include 
  
  #include 

+#include 
  #include 
  #include 
  
  #include "loader.h"
  
  #include "pipe/p_screen.h"

+#include "pipe/p_state.h"
  #include "pipe-loader/pipe_loader.h"
  
  #include "util/u_memory.h"

+#include "util/u_inlines.h"
+
  #include "vl/vl_winsys.h"
  
+#define BACK_BUFFER_NUM 3

+
+struct vl_dri3_buffer
+{
+   struct pipe_resource *texture;
+
+   uint32_t pixmap;
+   uint32_t sync_fence;
+   struct xshmfence *shm_fence;
+
+   bool busy;
+   uint32_t width, height, pitch;
+};
+
  struct vl_dri3_screen
  {
 struct vl_screen base;
@@ -48,9 +66,23 @@ struct vl_dri3_screen
 uint32_t width, height, depth;
  
 xcb_special_event_t *special_event;

+
+   struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
+   int cur_back;
  };
  
  static void

+dri3_free_back_buffer(struct vl_dri3_screen *scrn,
+struct vl_dri3_buffer *buffer)
+{
+   xcb_free_pixmap(scrn->conn, buffer->pixmap);
+   xcb_sync_destroy_fence(scrn->conn, buffer->sync_fence);
+   xshmfence_unmap_shm(buffer->shm_fence);
+   pipe_resource_reference(&buffer->texture, NULL);
+   FREE(buffer);
+}
+
+static void
  dri3_handle_present_event(struct vl_dri3_screen *scrn,
xcb_present_generic_event_t *ge)
  {
@@ -83,6 +115,145 @@ dri3_flush_present_events(struct vl_dri3_screen *scrn)
  }
  
  static bool

+dri3_wait_present_events(struct vl_dri3_screen *scrn)
+{
+   if (scrn->special_event) {
+  xcb_generic_event_t *ev;
+  ev = xcb_wait_for_special_event(scrn->conn, scrn->special_event);
+  if (!ev)
+ return false;
+  dri3_handle_present_event(scrn, (xcb_present_generic_event_t *)ev);
+  return true;
+   }
+   return false;
+}
+
+static int
+dri3_find_back(struct vl_dri3_screen *scrn)
+{
+   int b;
+
+   for (;;) {
+  for (b = 0; b < BACK_BUFFER_NUM; b++) {
+ int id = (b + scrn->cur_back) % BACK_BUFFER_NUM;
+ struct vl_dri3_buffer *buffer = scrn->back_buffers[id];
+ if (!buffer || !buffer->busy)
+return id;
+  }
+  xcb_flush(scrn->conn);
+  if (!dri3_wait_present_events(scrn))
+ return -1;
+   }
+}
+
+static struct vl_dri3_buffer *
+dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
+{
+   struct vl_dri3_buffer *buffer;
+   xcb_pixmap_t pixmap;
+   xcb_sync_fence_t sync_fence;
+   struct xshmfence *shm_fence;
+   int buffer_fd, fence_fd;
+   struct pipe_resource templ;
+   struct winsys_handle whandle;
+   unsigned usage;
+
+   buffer = CALLOC_STRUCT(vl_dri3_buffer);
+   if (!buffer)
+  return NULL;
+
+   fence_fd = xshmfence_alloc_shm();
+   if (fence_fd < 0)
+  goto free_buffer;
+
+   shm_fence = xshmfence_map_shm(fence_fd);
+   if (!shm_fence)
+  goto close_fd;
+
+   memset(&templ, 0, sizeof(templ));
+   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+   templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
+   templ.target = PIPE_TEXTURE_2D;
+   templ.last_level = 0;
+   templ.width0 = scrn->width;
+   templ.height0 = scrn->height;
+   templ.depth0 = 1;
+   templ.array_size = 1;
+   buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+ &templ);
+   if (!buffer->texture)
+  goto unmap_shm;
+
+   memset(&whandle, 0, sizeof(whandle));
+   whandle.type= DRM_API_HANDLE_TYPE_FD;
+   usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ;

Using PIPE_HANDLE_USAGE_EXPLICIT_FLUSH here is wrong: neither vaapi nor
vdpau calls flush_resource.

Perhaps vaapi and vdpau could be fixed to call it; I don't know which is
best for the target usage of the resources.

+   scrn->base.pscreen->resource_get_handle(scrn->base.pscreen,
+   buffer->texture, &whandle,
+   usage);
+   buffer_fd = whandle.handle;
+   buffer->pitch = whandle.stride;
+   xcb_dri3_pixmap_from_buffer(scrn->conn,
+   (pixmap = xcb_generate_id(scrn->conn)),
+   scrn->drawable,
+   0,
+   scrn->width, scrn->height, buffer->pitch,
+   scrn->depth, 32,
+   buffer_fd);
+
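
For illustration, the slot selection done by dri3_find_back() in the hunk
above can be sketched as a standalone function. This is a minimal sketch
under stated assumptions, not the patch's code: every demo_* name is
hypothetical, and where the patch flushes the connection and waits for
present events when all buffers are busy, the sketch simply returns -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BACK_BUFFER_NUM 3

/* Hypothetical stand-in for vl_dri3_buffer; only the busy flag matters here. */
struct demo_buffer {
    bool busy;
};

/* Hypothetical stand-in for the back-buffer fields of vl_dri3_screen. */
struct demo_screen {
    struct demo_buffer *back_buffers[DEMO_BACK_BUFFER_NUM];
    int cur_back;
};

/*
 * Scan the ring of back buffers starting at cur_back and return the first
 * slot that is either unallocated (NULL) or idle. Returns -1 when every
 * slot is busy; the real code would wait for a present event and retry.
 */
static int demo_find_back(const struct demo_screen *scrn)
{
    assert(scrn != NULL);

    for (int b = 0; b < DEMO_BACK_BUFFER_NUM; b++) {
        int id = (b + scrn->cur_back) % DEMO_BACK_BUFFER_NUM;
        const struct demo_buffer *buffer = scrn->back_buffers[id];

        if (!buffer || !buffer->busy)
            return id;
    }
    return -1;
}
```

Starting the scan at cur_back rather than at slot 0 means the buffer used
most recently is tried first, and the modulo keeps the scan inside the
fixed-size ring.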

Re: [Mesa-dev] [PATCH 00/14] vl dri3 support for vaapi and vdpau

2016-05-11 Thread Christian König


Am 11.05.2016 um 17:06 schrieb Leo Liu:

This series implements DRI3 support for VA-API and VDPAU. It implements
support for DRI3 Open, PixmapFromBuffer, BufferFromPixmap, and for
PRESENT, including PresentPixmap, PresentNotifyMSC, PresentIdleNotify,
PresentConfigureNotify and PresentCompleteNotify.

It has been tested with the mpv and vlc players on various clips from
480p to 4K, at frame rates from 24 to 60, in both windowed and
fullscreen modes, with and without a compositing manager. Testing also
covered the VA-API glx extension.

Some future work remains, such as DRI_PRIME support for rendering on a
different GPU.

Leo Liu (14):
   vl: add DRI3 support infrastructure
   vl/dri3: implement dri3 screen create and destroy
   vl/dri3: set drawable geometry
   vl/dri3: register present events
   vl/dri3: implement flushing for queued events
   vl/dri3: add back buffers support
   vl/dri3: implement function for flush frontbuffer
   vl/dri3: implement function for get dirty area
   vl/dri3: add support for resizing
   vl/dri3: implement DRI3 BufferFromPixmap
   st/va: add dri3 support
   vl/dri3: handle PresentCompleteNotify event
   vl/dri3: implement functions for get and set timestamp
   st/vdpau: add dri3 support


Very nice work, for the series Reviewed-by: Christian König 



We could clean up the implementation in VDPAU a bit now as well, but 
that is something for a follow up patch set.


Regards,
Christian.



  configure.ac  |   7 +-
  src/gallium/auxiliary/Makefile.sources|   5 +
  src/gallium/auxiliary/vl/vl_winsys.h  |   5 +
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 703 ++
  src/gallium/state_trackers/va/context.c   |   6 +-
  src/gallium/state_trackers/vdpau/device.c |   6 +-
  6 files changed, 729 insertions(+), 3 deletions(-)
  create mode 100644 src/gallium/auxiliary/vl/vl_winsys_dri3.c




Re: [Mesa-dev] [PATCH 15/28] i965/blorp: Add a param array to prog_data

2016-05-11 Thread Jason Ekstrand

On Wed, May 11, 2016 at 8:39 AM, Pohjolainen, Topi <
topi.pohjolai...@intel.com> wrote:

> On Wed, May 11, 2016 at 07:46:33AM -0700, Jason Ekstrand wrote:
> >On May 11, 2016 7:45 AM, "Jason Ekstrand" <[1]ja...@jlekstrand.net>
> >wrote:
> >>
> >>
> >> On May 10, 2016 11:53 PM, "Pohjolainen, Topi"
> ><[2]topi.pohjolai...@intel.com> wrote:
> >> >
> >> > On Tue, May 10, 2016 at 04:16:35PM -0700, Jason Ekstrand wrote:
> >> > > This array allows the push constants to be re-arranged on
> >upload.  The
> >> > > actual arrangement will, eventually, come from the back-end
> >compiler.
> >> > > ---
> >> > >  src/mesa/drivers/dri/i965/brw_blorp.c  |  4 
> >> > >  src/mesa/drivers/dri/i965/brw_blorp.h  |  6 ++
> >> > >  src/mesa/drivers/dri/i965/gen6_blorp.c | 12 +++-
> >> > >  3 files changed, 17 insertions(+), 5 deletions(-)
> >> > >
> >> > > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c
> >b/src/mesa/drivers/dri/i965/brw_blorp.c
> >> > > index 4bbe45f..1379804 100644
> >> > > --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> >> > > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> >> > > @@ -139,6 +139,10 @@ brw_blorp_prog_data_init(struct
> >brw_blorp_prog_data *prog_data)
> >> > >  {
> >> > > prog_data->first_curbe_grf = 0;
> >> > > prog_data->persample_msaa_dispatch = false;
> >> > > +
> >> > > +   prog_data->nr_params = BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS;
> >> > > +   for (unsigned i = 0; i < BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS;
> >i++)
> >> > > +  prog_data->param[i] = i;
> >> > >  }
> >> > >
> >> > >
> >> > > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
> >b/src/mesa/drivers/dri/i965/brw_blorp.h
> >> > > index 4a0e46e..c2f33a1 100644
> >> > > --- a/src/mesa/drivers/dri/i965/brw_blorp.h
> >> > > +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
> >> > > @@ -199,6 +199,9 @@ struct brw_blorp_wm_push_constants
> >> > > uint32_t pad[5];
> >> > >  };
> >> > >
> >> > > +#define BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS \
> >> > > +   (sizeof(struct brw_blorp_wm_push_constants) / 4)
> >> > > +
> >> > >  /* Every 32 bytes of push constant data constitutes one GEN
> >register. */
> >> > >  static const unsigned int BRW_BLORP_NUM_PUSH_CONST_REGS =
> >> > > sizeof(struct brw_blorp_wm_push_constants) / 32;
> >> > > @@ -212,6 +215,9 @@ struct brw_blorp_prog_data
> >> > >  * than one sample per pixel.
> >> > >  */
> >> > > bool persample_msaa_dispatch;
> >> > > +
> >> > > +   uint8_t nr_params;
> >> > > +   uint8_t param[BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS];
> >> >
> >> > Do I read this correctly: this corresponds to push_constant_loc in
> >the scalar
> >> > backend?
> >>
> >> Sort-of.  The mapping actually goes in the other direction:  From
> >location to uniform number.
> >
> >Really, it's just a simplified version of prog_data->param.
>
> Right. Could we add some description, "param" doesn't tell much, does it?
> For example,
>
>  /* Compiler will re-arrange push constants and store the upload order
>   * here. Given an index 'i' in the final upload buffer, param[i] gives
>   * the index in the uniform store. In other words, the value to be
>   * uploaded can be found in
> brw_blorp_params::wm_push_consts[param[i]].
>

I added basically that exact comment.  Thanks!
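
The mapping described in that comment can be sketched in isolation. This
is a minimal sketch, assuming a hypothetical 8-dword push-constant struct;
all demo_* names are made up for illustration and are not the i965 types.
It shows the identity mapping set up at init time and the upload loop that
reads param[i] to pick the source dword for slot i.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical push-constant block: 8 dwords, i.e. one 32-byte register. */
struct demo_push_constants {
    uint32_t dst_x0, dst_x1, dst_y0, dst_y1;
    uint32_t pad[4];
};

#define DEMO_NUM_DWORDS (sizeof(struct demo_push_constants) / 4)

struct demo_prog_data {
    uint8_t nr_params;
    /* param[i] is the source dword index for slot i of the upload buffer. */
    uint8_t param[DEMO_NUM_DWORDS];
};

/* Default to the identity mapping: slot i is filled from dword i. */
static void demo_prog_data_init(struct demo_prog_data *pd)
{
    pd->nr_params = (uint8_t)DEMO_NUM_DWORDS;
    for (unsigned i = 0; i < DEMO_NUM_DWORDS; i++)
        pd->param[i] = (uint8_t)i;
}

/* Fill the first nr_params slots of dst in the order given by pd->param. */
static void demo_upload(uint32_t *dst,
                        const struct demo_push_constants *src,
                        const struct demo_prog_data *pd)
{
    uint32_t dwords[DEMO_NUM_DWORDS];

    assert(pd->nr_params <= DEMO_NUM_DWORDS);

    /* Copy the struct out as an array of dwords to index it uniformly. */
    memcpy(dwords, src, sizeof(dwords));

    for (unsigned i = 0; i < pd->nr_params; i++)
        dst[i] = dwords[pd->param[i]];
}
```

With the identity mapping the upload is equivalent to a plain memcpy; a
compiler that drops or reorders uniforms would only need to rewrite
nr_params and param.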


>   */
>  uint8_t param[BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS];
>
> >
> >> > >  };
> >> > >
> >> > >  void brw_blorp_prog_data_init(struct brw_blorp_prog_data
> >*prog_data);
> >> > > diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.c
> >b/src/mesa/drivers/dri/i965/gen6_blorp.c
> >> > > index 1955811..950e2b9 100644
> >> > > --- a/src/mesa/drivers/dri/i965/gen6_blorp.c
> >> > > +++ b/src/mesa/drivers/dri/i965/gen6_blorp.c
> >> > > @@ -308,11 +308,13 @@ gen6_blorp_emit_wm_constants(struct
> >brw_context *brw,
> >> > >  {
> >> > > uint32_t wm_push_const_offset;
> >> > >
> >> > > -   void *constants = brw_state_batch(brw,
> >AUB_TRACE_WM_CONSTANTS,
> >> > > -
> >sizeof(params->wm_push_consts),
> >> > > - 32,
> &wm_push_const_offset);
> >> > > -   memcpy(constants, &params->wm_push_consts,
> >> > > -  sizeof(params->wm_push_consts));
> >> > > +   uint32_t *constants = brw_state_batch(brw,
> >AUB_TRACE_WM_CONSTANTS,
> >> > > +
> >sizeof(params->wm_push_consts),
> >> > > + 32,
> >&wm_push_const_offset);
> >> > > +
> >> > > +   uint32_t *push_consts = (uint32_t *)&params->wm_push_consts;
>

I also made the const change you suggested here.


> >> > > +   for (unsigned i = 0; i < params->wm_prog_data->nr_params;
> >i++)
> >> > > +  constants[i] =
> >

Re: [Mesa-dev] [PATCH 17/28] i965/blorp: Add a helper for compiling NIR shaders

2016-05-11 Thread Pohjolainen, Topi

On Tue, May 10, 2016 at 04:16:37PM -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.c | 95 
> +++
>  src/mesa/drivers/dri/i965/brw_blorp.h | 10 
>  2 files changed, 105 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
> b/src/mesa/drivers/dri/i965/brw_blorp.c
> index 6c3b83a..161fb90 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> @@ -26,6 +26,8 @@
>  #include "intel_fbo.h"
>  
>  #include "brw_blorp.h"
> +#include "brw_compiler.h"
> +#include "brw_nir.h"
>  #include "brw_state.h"
>  
>  #define FILE_DEBUG_FLAG DEBUG_BLORP
> @@ -161,6 +163,99 @@ brw_blorp_params_init(struct brw_blorp_params *params)
> params->num_layers = 1;
>  }
>  
> +void
> +brw_blorp_init_wm_prog_key(struct brw_wm_prog_key *wm_key)
> +{
> +   memset(wm_key, 0, sizeof(*wm_key));
> +   wm_key->nr_color_regions = 1;
> +   for (int i = 0; i < MAX_SAMPLERS; i++)

Could be unsigned.

[Mesa-dev] [PATCH 1/3] glx: Implement the libglvnd interface.

2016-05-11 Thread Adam Jackson

From: Kyle Brenneman 

With reference to the libglvnd branch:

https://cgit.freedesktop.org/mesa/mesa/log/?h=libglvnd

This is a squashed commit containing all of Kyle's commits, all but two
of Emil's commits (to follow), and a small fixup from myself to mark the
rest of the glX* functions as _GLX_PUBLIC so they are not exported when
building for libglvnd. I (ajax) squashed them together both for ease of
review, and because most of the changes are un-useful intermediate
states representing the evolution of glvnd's internal API.

Co-author: Emil Velikov 
Reviewed-by: Adam Jackson 
---
 configure.ac|  49 +-
 src/glx/Makefile.am |  19 +-
 src/glx/dri_glx.c   |   4 +-
 src/glx/g_glxglvnddispatchfuncs.c   | 976 
 src/glx/g_glxglvnddispatchindices.h |  92 
 src/glx/glx_pbuffer.c   |  28 +-
 src/glx/glxclient.h |   5 +
 src/glx/glxcmds.c   |  78 +--
 src/glx/glxcurrent.c|  10 +-
 src/glx/glxglvnd.c  |  75 +++
 src/glx/glxglvnd.h  |  14 +
 src/glx/glxglvnddispatchfuncs.h |  70 +++
 12 files changed, 1356 insertions(+), 64 deletions(-)
 create mode 100644 src/glx/g_glxglvnddispatchfuncs.c
 create mode 100644 src/glx/g_glxglvnddispatchindices.h
 create mode 100644 src/glx/glxglvnd.c
 create mode 100644 src/glx/glxglvnd.h
 create mode 100644 src/glx/glxglvnddispatchfuncs.h

diff --git a/configure.ac b/configure.ac
index 023110e..7bf28f9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -514,6 +514,34 @@ else
DEFINES="$DEFINES -DNDEBUG"
 fi
 
+DEFAULT_GL_LIB_NAME=GL
+
+dnl
+dnl Libglvnd configuration
+dnl
+AC_ARG_ENABLE([libglvnd],
+[AS_HELP_STRING([--enable-libglvnd],
+[Build for libglvnd @<:@default=disabled@:>@])],
+[enable_libglvnd="$enableval"],
+[enable_libglvnd=no])
+AM_CONDITIONAL(USE_LIBGLVND_GLX, test "x$enable_libglvnd" = xyes)
+if test "x$enable_libglvnd" = xyes ; then
+dnl XXX: update once we can handle more than libGL/glx.
+dnl Namely: we should error out if neither of the glvnd enabled libraries
+dnl are built
+if test "x$enable_glx" = xno; then
+AC_MSG_ERROR([cannot build libglvnd without GLX])
+fi
+
+if test "x$enable_xlib_glx" = xyes; then
+AC_MSG_ERROR([cannot build libglvnd when Xlib-GLX is enabled])
+fi
+
+PKG_CHECK_MODULES([GLVND], libglvnd >= 0.1.0)
+DEFINES="${DEFINES} -DUSE_LIBGLVND_GLX=1"
+DEFAULT_GL_LIB_NAME=GLX_mesa
+fi
+
 dnl
 dnl Check if linker supports -Bsymbolic
 dnl
@@ -611,6 +639,23 @@ esac
 
 AM_CONDITIONAL(HAVE_COMPAT_SYMLINKS, test "x$HAVE_COMPAT_SYMLINKS" = xyes)
 
+DEFAULT_GL_LIB_NAME=GL
+
+dnl
+dnl Libglvnd configuration
+dnl
+AC_ARG_ENABLE([libglvnd],
+[AS_HELP_STRING([--enable-libglvnd],
+[Build for libglvnd @<:@default=disabled@:>@])],
+[enable_libglvnd="$enableval"],
+[enable_libglvnd=no])
+AM_CONDITIONAL(USE_LIBGLVND_GLX, test "x$enable_libglvnd" = xyes)
+#AM_COND_IF([USE_LIBGLVND_GLX], [DEFINES="${DEFINES} -DUSE_LIBGLVND_GLX=1"])
+if test "x$enable_libglvnd" = xyes ; then
+DEFINES="${DEFINES} -DUSE_LIBGLVND_GLX=1"
+DEFAULT_GL_LIB_NAME=GLX_mesa
+fi
+
 dnl
 dnl library names
 dnl
@@ -648,13 +693,13 @@ AC_ARG_WITH([gl-lib-name],
   [AS_HELP_STRING([--with-gl-lib-name@<:@=NAME@:>@],
 [specify GL library name @<:@default=GL@:>@])],
   [GL_LIB=$withval],
-  [GL_LIB=GL])
+  [GL_LIB="$DEFAULT_GL_LIB_NAME"])
 AC_ARG_WITH([osmesa-lib-name],
   [AS_HELP_STRING([--with-osmesa-lib-name@<:@=NAME@:>@],
 [specify OSMesa library name @<:@default=OSMesa@:>@])],
   [OSMESA_LIB=$withval],
   [OSMESA_LIB=OSMesa])
-AS_IF([test "x$GL_LIB" = xyes], [GL_LIB=GL])
+AS_IF([test "x$GL_LIB" = xyes], [GL_LIB="$DEFAULT_GL_LIB_NAME"])
 AS_IF([test "x$OSMESA_LIB" = xyes], [OSMESA_LIB=OSMesa])
 
 dnl
diff --git a/src/glx/Makefile.am b/src/glx/Makefile.am
index d65fb81..5154a23 100644
--- a/src/glx/Makefile.am
+++ b/src/glx/Makefile.am
@@ -46,7 +46,6 @@ AM_CFLAGS = \
$(EXTRA_DEFINES_XF86VIDMODE) \
-D_REENTRANT \
-DDEFAULT_DRIVER_DIR=\"$(DRI_DRIVER_SEARCH_DIR)\" \
-   -DGL_LIB_NAME=\"lib@GL_LIB@.so.1\" \
$(DEFINES) \
$(LIBDRM_CFLAGS) \
$(DRI2PROTO_CFLAGS) \
@@ -146,6 +145,22 @@ SUBDIRS += apple
 libglx_la_LIBADD += $(builddir)/apple/libappleglx.la
 endif
 
+if USE_LIBGLVND_GLX
+AM_CFLAGS += \
+   -DGL_LIB_NAME=\"lib@GL_LIB@.so.0\" \
+   $(GLVND_CFLAGS)
+
+libglx_la_SOURCES += \
+  glxglvnd.c \
+  g_glxglvnddispatchfuncs.c
+
+GL_LIB_VERSION=0
+else
+AM_CFLAGS += \
+   -DGL_LIB_NAME=\"lib@GL_LIB@.so.1\"
+GL_LIB_VERSION=1:2
+endif
+
 GL_LIBS = \
libglx.la \
$(top_builddir)/src/mapi/glapi/libglapi.la \
@@ -154,7 +169,7 @@ GL_LIBS = \
 
 GL_LDFLAGS = \
-no-undefined \
-   -version-number 1:2 \
+   -version-number $(GL_LIB_VERSION) \

[Mesa-dev] [PATCH 3/3] glx/glvnd: rework dispatch functions/indices tables lookup

2016-05-11 Thread Adam Jackson

From: Emil Velikov 

Rather than checking if the function name maps to a valid entry in the
respective table, just create a dummy entry at the end of each table.

This allows us to remove some unnecessary "index >= 0" checks, which get
executed quite often.

Reviewed-by: Adam Jackson 
Signed-off-by: Emil Velikov 
---
 src/glx/g_glxglvnddispatchfuncs.c |  7 +--
 src/glx/glxglvnd.c| 19 ---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/glx/g_glxglvnddispatchfuncs.c 
b/src/glx/g_glxglvnddispatchfuncs.c
index 13fbc5e..72f0f68 100644
--- a/src/glx/g_glxglvnddispatchfuncs.c
+++ b/src/glx/g_glxglvnddispatchfuncs.c
@@ -9,7 +9,8 @@
 #include "g_glxglvnddispatchindices.h"
 
 const int DI_FUNCTION_COUNT = DI_LAST_INDEX;
-int __glXDispatchTableIndices[DI_LAST_INDEX];
+/* Allocate an extra 'dummy' to ease lookup. See FindGLXFunction() */
+int __glXDispatchTableIndices[DI_LAST_INDEX + 1];
 const __GLXapiExports *__glXGLVNDAPIExports;
 
 const char * const __glXDispatchTableStrings[DI_LAST_INDEX] = {
@@ -922,7 +923,8 @@ static Bool dispatch_glXWaitForSbcOML(Display *dpy, 
GLXDrawable drawable,
 #undef __FETCH_FUNCTION_PTR
 
 
-const void * const __glXDispatchFunctions[DI_LAST_INDEX] = {
+/* Allocate an extra 'dummy' to ease lookup. See FindGLXFunction() */
+const void * const __glXDispatchFunctions[DI_LAST_INDEX + 1] = {
 #define __ATTRIB(field) \
 [DI_##field] = (void *)dispatch_##field
 
@@ -972,5 +974,6 @@ const void * const __glXDispatchFunctions[DI_LAST_INDEX] = {
 __ATTRIB(glXWaitForMscOML),
 __ATTRIB(glXWaitForSbcOML),
 
+[DI_LAST_INDEX] = NULL,
 #undef __ATTRIB
 };
diff --git a/src/glx/glxglvnd.c b/src/glx/glxglvnd.c
index 9475023..fa39ad4 100644
--- a/src/glx/glxglvnd.c
+++ b/src/glx/glxglvnd.c
@@ -17,7 +17,7 @@ static void *__glXGLVNDGetProcAddress(const GLubyte *procName)
 return glXGetProcAddressARB(procName);
 }
 
-static int FindGLXFunction(const GLubyte *name)
+static unsigned FindGLXFunction(const GLubyte *name)
 {
 unsigned first = 0;
 unsigned last = DI_FUNCTION_COUNT - 1;
@@ -34,26 +34,23 @@ static int FindGLXFunction(const GLubyte *name)
 else
 return middle;
 }
-return -1;
+
+/* Just point to the dummy entry at the end of the respective table */
+return DI_FUNCTION_COUNT;
 }
 
 static void *__glXGLVNDGetDispatchAddress(const GLubyte *procName)
 {
-int internalIndex = FindGLXFunction(procName);
+unsigned internalIndex = FindGLXFunction(procName);
 
-if (internalIndex >= 0) {
-return __glXDispatchFunctions[internalIndex];
-}
-
-return NULL;
+return __glXDispatchFunctions[internalIndex];
 }
 
 static void __glXGLVNDSetDispatchIndex(const GLubyte *procName, int index)
 {
-int internalIndex = FindGLXFunction(procName);
+unsigned internalIndex = FindGLXFunction(procName);
 
-if (internalIndex >= 0)
-__glXDispatchTableIndices[internalIndex] = index;
+__glXDispatchTableIndices[internalIndex] = index;
 }
 
 _X_EXPORT Bool __glx_Main(uint32_t version, const __GLXapiExports *exports,
-- 
2.7.4
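
The dummy-entry idea in this patch can be shown with a small standalone
example. This is a minimal sketch, not the glvnd code: the sentinel_*
names and the two-entry table are hypothetical, and the lookup is a plain
linear scan to keep the focus on the extra slot. A failed lookup returns
the index of the dummy slot, so callers can index both tables without a
range check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dispatch indices; SENTINEL_LAST_INDEX doubles as the dummy slot. */
enum {
    SENTINEL_ALPHA,
    SENTINEL_BETA,
    SENTINEL_LAST_INDEX
};

typedef int (*sentinel_fn)(void);

static int sentinel_alpha(void) { return 1; }
static int sentinel_beta(void)  { return 2; }

static const char *const sentinel_names[SENTINEL_LAST_INDEX] = {
    [SENTINEL_ALPHA] = "glXAlpha",
    [SENTINEL_BETA]  = "glXBeta",
};

/* One extra element each: the dummy entry at SENTINEL_LAST_INDEX. */
static const sentinel_fn sentinel_functions[SENTINEL_LAST_INDEX + 1] = {
    [SENTINEL_ALPHA]      = sentinel_alpha,
    [SENTINEL_BETA]       = sentinel_beta,
    [SENTINEL_LAST_INDEX] = NULL,
};
static int sentinel_indices[SENTINEL_LAST_INDEX + 1];

/* Returns the index of name, or SENTINEL_LAST_INDEX if it is unknown. */
static unsigned sentinel_find(const char *name)
{
    assert(name != NULL);

    for (unsigned i = 0; i < (unsigned)SENTINEL_LAST_INDEX; i++) {
        if (strcmp(name, sentinel_names[i]) == 0)
            return i;
    }
    return SENTINEL_LAST_INDEX;
}

/* Unknown names land on the dummy entry, which holds NULL. */
static sentinel_fn sentinel_get_dispatch_address(const char *name)
{
    return sentinel_functions[sentinel_find(name)];
}

/* Unknown names write into the dummy slot, which nothing else reads. */
static void sentinel_set_dispatch_index(const char *name, int index)
{
    sentinel_indices[sentinel_find(name)] = index;
}
```

The trade-off is one wasted element per table in exchange for removing a
branch from both callers; the write to the dummy slot for unknown names is
harmless because no valid index ever refers to it.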


[Mesa-dev] [PATCH 2/3] glx/glvnd: Use strcmp() based binary search in FindGLXFunction()

2016-05-11 Thread Adam Jackson

From: Emil Velikov 

It will allow us to find the function within 6 attempts, out of the ~80
entry long table.

Reviewed-by: Adam Jackson 
Signed-off-by: Emil Velikov 
---
 src/glx/glxglvnd.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/glx/glxglvnd.c b/src/glx/glxglvnd.c
index c7c35ca..9475023 100644
--- a/src/glx/glxglvnd.c
+++ b/src/glx/glxglvnd.c
@@ -19,11 +19,20 @@ static void *__glXGLVNDGetProcAddress(const GLubyte 
*procName)
 
 static int FindGLXFunction(const GLubyte *name)
 {
-int i;
+unsigned first = 0;
+unsigned last = DI_FUNCTION_COUNT - 1;
+unsigned middle = (first + last) / 2;
 
-for (i = 0; i < DI_FUNCTION_COUNT; i++) {
-if (strcmp((const char *) name, __glXDispatchTableStrings[i]) == 0)
-return i;
+while (first <= last) {
+int comp = strcmp((const char *) name,
+  __glXDispatchTableStrings[middle]);
+
+if (comp < 0)
+first = middle + 1;
+else if (comp > 0)
+last = middle;
+else
+return middle;
 }
 return -1;
 }
-- 
2.7.4
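
For reference, a strcmp()-based binary search over a sorted string table
can be sketched as below. This is a minimal standalone sketch, not the
patch's code: the search_* names and the five-entry table are hypothetical.
It is written in the conventional form, which recomputes the midpoint on
every iteration and moves `first` up when the name sorts after the middle
entry; the hunk above computes `middle` once before the loop and narrows in
the opposite direction, so it may be worth comparing the two.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table for illustration; it must be sorted in strcmp() order. */
static const char *const search_names[] = {
    "glXBindTexImageEXT",
    "glXChooseFBConfigSGIX",
    "glXCopySubBufferMESA",
    "glXCreateContextAttribsARB",
    "glXSwapIntervalSGI",
};
#define SEARCH_COUNT (sizeof(search_names) / sizeof(search_names[0]))

/* Returns the index of name in search_names, or -1 if it is not present. */
static int search_find(const char *name)
{
    int first = 0;
    int last = (int)SEARCH_COUNT - 1;

    assert(name != NULL);

    while (first <= last) {
        /* Recompute the midpoint of the remaining range each time. */
        int middle = first + (last - first) / 2;
        int comp = strcmp(name, search_names[middle]);

        if (comp > 0)
            first = middle + 1;   /* name sorts after the middle entry */
        else if (comp < 0)
            last = middle - 1;    /* name sorts before the middle entry */
        else
            return middle;
    }
    return -1;
}
```

Signed bounds are used so that `last = middle - 1` can drop below zero and
end the loop when the name sorts before the first entry.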


[Mesa-dev] [Bug 95354] anv_pipeline.c:164:7: error: implicit declaration of function ‘nir_lower_outputs_to_temporaries’ [-Werror=implicit-function-declaration]

2016-05-11 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95354

Bug ID: 95354
   Summary: anv_pipeline.c:164:7: error: implicit declaration of
function ‘nir_lower_outputs_to_temporaries’
[-Werror=implicit-function-declaration]
   Product: Mesa
   Version: git
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Keywords: regression
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: v...@freedesktop.org
QA Contact: mesa-dev@lists.freedesktop.org

mesa: 697382eb61a9091ea0fa8b5836c9e7d281e9e1c5 (master 11.3.0-devel)

  CC   anv_pipeline.lo
anv_pipeline.c: In function ‘anv_shader_compile_to_nir’:
anv_pipeline.c:164:7: error: implicit declaration of function
‘nir_lower_outputs_to_temporaries’ [-Werror=implicit-function-declaration]
   nir_lower_outputs_to_temporaries(entry_point->shader, entry_point);
   ^

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

Re: [Mesa-dev] [PATCH 16/28] blorp: Add initial state setup support for SIMD8 dispatch

2016-05-11 Thread Jason Ekstrand

On Wed, May 11, 2016 at 12:25 AM, Pohjolainen, Topi <
topi.pohjolai...@intel.com> wrote:

> On Tue, May 10, 2016 at 04:16:36PM -0700, Jason Ekstrand wrote:
> > ---
> >  src/mesa/drivers/dri/i965/brw_blorp.c |  6 +-
> >  src/mesa/drivers/dri/i965/brw_blorp.h |  8 +++-
> >  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  |  2 +-
> >  src/mesa/drivers/dri/i965/brw_blorp_clear.cpp |  4 ++--
> >  src/mesa/drivers/dri/i965/gen6_blorp.c| 23
> ---
> >  src/mesa/drivers/dri/i965/gen7_blorp.c| 27
> +--
> >  src/mesa/drivers/dri/i965/gen8_blorp.c| 23
> +++
> >  7 files changed, 67 insertions(+), 26 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c
> b/src/mesa/drivers/dri/i965/brw_blorp.c
> > index 1379804..6c3b83a 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> > @@ -137,7 +137,11 @@ brw_blorp_compute_tile_offsets(const struct
> brw_blorp_surface_info *info,
> >  void
> >  brw_blorp_prog_data_init(struct brw_blorp_prog_data *prog_data)
> >  {
> > -   prog_data->first_curbe_grf = 0;
> > +   prog_data->dispatch_8 = false;
> > +   prog_data->dispatch_16 = true;
> > +   prog_data->first_curbe_grf_0 = 0;
> > +   prog_data->first_curbe_grf_2 = 0;
> > +   prog_data->ksp_offset_2 = 0;
> > prog_data->persample_msaa_dispatch = false;
> >
> > prog_data->nr_params = BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS;
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
> b/src/mesa/drivers/dri/i965/brw_blorp.h
> > index c2f33a1..b38b689 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp.h
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
> > @@ -208,7 +208,13 @@ static const unsigned int
> BRW_BLORP_NUM_PUSH_CONST_REGS =
> >
> >  struct brw_blorp_prog_data
> >  {
> > -   unsigned int first_curbe_grf;
> > +   bool dispatch_8;
> > +   bool dispatch_16;
> > +
> > +   uint8_t first_curbe_grf_0;
> > +   uint8_t first_curbe_grf_2;
> > +
> > +   uint32_t ksp_offset_2;
> >
> > /**
> >  * True if the WM program should be run in MSDISPMODE_PERSAMPLE with
> more
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> > index ed43184..7067c06 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> > @@ -778,7 +778,7 @@ brw_blorp_blit_program::alloc_regs()
> > int reg = 0;
> > this->R0 = retype(brw_vec8_grf(reg++, 0), BRW_REGISTER_TYPE_UW);
> > this->R1 = retype(brw_vec8_grf(reg++, 0), BRW_REGISTER_TYPE_UW);
> > -   prog_data.first_curbe_grf = reg;
> > +   prog_data.first_curbe_grf_0 = reg;
> > alloc_push_const_regs(reg);
> > reg += BRW_BLORP_NUM_PUSH_CONST_REGS;
> > for (unsigned i = 0; i < ARRAY_SIZE(texture_data); ++i) {
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> > index 5ed46e1..c298889 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> > @@ -86,7 +86,7 @@
> brw_blorp_const_color_program::brw_blorp_const_color_program(
> >   clear_rgba(),
> >   base_mrf(0)
> >  {
> > -   prog_data.first_curbe_grf = 0;
> > +   prog_data.first_curbe_grf_0 = 0;
> > prog_data.persample_msaa_dispatch = false;
> > brw_init_codegen(brw->intelScreen->devinfo, &func, mem_ctx);
> >  }
> > @@ -145,7 +145,7 @@ brw_blorp_const_color_program::alloc_regs()
> > this->R0 = retype(brw_vec8_grf(reg++, 0), BRW_REGISTER_TYPE_UW);
> > this->R1 = retype(brw_vec8_grf(reg++, 0), BRW_REGISTER_TYPE_UW);
> >
> > -   prog_data.first_curbe_grf = reg;
> > +   prog_data.first_curbe_grf_0 = reg;
> > clear_rgba = retype(brw_vec4_grf(reg++, 0), BRW_REGISTER_TYPE_F);
> > reg += BRW_BLORP_NUM_PUSH_CONST_REGS;
> >
> > diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.c
> b/src/mesa/drivers/dri/i965/gen6_blorp.c
> > index 950e2b9..32049eb 100644
> > --- a/src/mesa/drivers/dri/i965/gen6_blorp.c
> > +++ b/src/mesa/drivers/dri/i965/gen6_blorp.c
> > @@ -619,7 +619,7 @@ gen6_blorp_emit_wm_config(struct brw_context *brw,
> >const struct brw_blorp_params *params)
> >  {
> > const struct brw_blorp_prog_data *prog_data = params->wm_prog_data;
> > -   uint32_t dw2, dw4, dw5, dw6;
> > +   uint32_t dw2, dw4, dw5, dw6, ksp0, ksp2;
> >
> > /* Even when thread dispatch is disabled, max threads (dw5.25:31)
> must be
> >  * nonzero to prevent the GPU from hanging.  While the documentation
> doesn't
> > @@ -630,7 +630,7 @@ gen6_blorp_emit_wm_config(struct brw_context *brw,
> >  * configure the WM state whether or not there is a WM program.
> >  */
> >
> > -   dw2 = dw4 = dw5 = dw6 = 0;
> > +   dw2 = dw4 = dw5 = dw6 = ksp0 = ksp2 = 0;
> > switch (params->hiz_op) {
> > case GEN6_HIZ_OP_DEPTH_CLEAR:
> >dw4 |= GEN6_WM_DEPTH_CLEAR;
> > @@ -652,9 +652,18 @@

Re: [Mesa-dev] [PATCH 15/28] i965/blorp: Add a param array to prog_data

2016-05-11 Thread Pohjolainen, Topi

On Tue, May 10, 2016 at 04:16:35PM -0700, Jason Ekstrand wrote:
> This array allows the push constants to be re-arranged on upload.  The
> actual arrangement will, eventually, come from the back-end compiler.
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.c  |  4 
>  src/mesa/drivers/dri/i965/brw_blorp.h  |  6 ++
>  src/mesa/drivers/dri/i965/gen6_blorp.c | 12 +++-
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
> b/src/mesa/drivers/dri/i965/brw_blorp.c
> index 4bbe45f..1379804 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> @@ -139,6 +139,10 @@ brw_blorp_prog_data_init(struct brw_blorp_prog_data 
> *prog_data)
>  {
> prog_data->first_curbe_grf = 0;
> prog_data->persample_msaa_dispatch = false;
> +
> +   prog_data->nr_params = BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS;
> +   for (unsigned i = 0; i < BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS; i++)
> +  prog_data->param[i] = i;
>  }
>  
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h 
> b/src/mesa/drivers/dri/i965/brw_blorp.h
> index 4a0e46e..c2f33a1 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.h
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
> @@ -199,6 +199,9 @@ struct brw_blorp_wm_push_constants
> uint32_t pad[5];
>  };
>  
> +#define BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS \
> +   (sizeof(struct brw_blorp_wm_push_constants) / 4)
> +
>  /* Every 32 bytes of push constant data constitutes one GEN register. */
>  static const unsigned int BRW_BLORP_NUM_PUSH_CONST_REGS =
> sizeof(struct brw_blorp_wm_push_constants) / 32;
> @@ -212,6 +215,9 @@ struct brw_blorp_prog_data
>  * than one sample per pixel.
>  */
> bool persample_msaa_dispatch;
> +
> +   uint8_t nr_params;
> +   uint8_t param[BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS];
>  };
>  
>  void brw_blorp_prog_data_init(struct brw_blorp_prog_data *prog_data);
> diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.c 
> b/src/mesa/drivers/dri/i965/gen6_blorp.c
> index 1955811..950e2b9 100644
> --- a/src/mesa/drivers/dri/i965/gen6_blorp.c
> +++ b/src/mesa/drivers/dri/i965/gen6_blorp.c
> @@ -308,11 +308,13 @@ gen6_blorp_emit_wm_constants(struct brw_context *brw,
>  {
> uint32_t wm_push_const_offset;
>  
> -   void *constants = brw_state_batch(brw, AUB_TRACE_WM_CONSTANTS,
> - sizeof(params->wm_push_consts),
> - 32, &wm_push_const_offset);
> -   memcpy(constants, &params->wm_push_consts,
> -  sizeof(params->wm_push_consts));
> +   uint32_t *constants = brw_state_batch(brw, AUB_TRACE_WM_CONSTANTS,
> + sizeof(params->wm_push_consts),
> + 32, &wm_push_const_offset);
> +
> +   uint32_t *push_consts = (uint32_t *)&params->wm_push_consts;

Could be:

  const uint32_t *push_consts = (const uint32_t *)&params->wm_push_consts;

> +   for (unsigned i = 0; i < params->wm_prog_data->nr_params; i++)
> +  constants[i] = push_consts[params->wm_prog_data->param[i]];
>  
> return wm_push_const_offset;
>  }
> -- 
> 2.5.0.400.gff86faf
> 

Re: [Mesa-dev] Add OpenSWR to GL3.txt?

2016-05-11 Thread Ilia Mirkin

It is whatever you (i.e. driver maintainer) want it to be. GL3.txt is
mainly for coordinating development and letting people know who's
working on what (less so of late though). If you plan on exposing GL
4.0+, it can be a nice TODO list. Otherwise there's not an immense
amount of value.

  -ilia

On Wed, May 11, 2016 at 12:28 PM, Rowley, Timothy O
 wrote:
> What are the criteria for marking an extension “done”?  Passing some 
> percentage (all?) of relevant piglit tests?
>
> -Tim
>
>> On May 10, 2016, at 10:31 PM, Andrew J  wrote:
>>
>> Is there any possibility that OpenSWR can be added to GL3.txt [1] so
>> others can get an idea of what things OpenSWR supports?
>>
>> GL3.txt is what mesamatrix [2] uses, so adding OpenSWR to GL3.txt
>> would add it there as well.
>>
>> [1] https://cgit.freedesktop.org/mesa/mesa/tree/docs/GL3.txt
>> [2] http://mesamatrix.net
>

Re: [Mesa-dev] [PATCH 00/14] vl dri3 support for vaapi and vdpau

2016-05-11 Thread Leo Liu


Hi Axel,

Thanks for the comments. Replies inline.

On 05/11/2016 11:57 AM, Axel Davy wrote:

Hi,

Do you have a local branch so that everything can be reviewed at once? It is 
a bit hard to follow with the patches.


The sequence of patches is based on the functions required by the existing 
vl/dri code, followed by vaapi and then vdpau.
I don't have a local branch, but I am going to attach the vl_winsys_dri3.c 
file, which might be easier.




From a quick look, it seems you took inspiration from the loader dri3 code.



That is quite inspiring. I did learn a lot from the loader dri3 code. I 
should credit it in the commit message. Thanks.



There is also another implementation you can take inspiration from:
https://github.com/iXit/wine/blob/master/dlls/d3d9-nine/dri3.c
There is probably not much more you can get from it.

I haven't checked the code yet, so I don't know whether this applies, but 
something I have noticed on my Tonga with games is that (non-vsynced) 
apps that get around 45 fps feel like 15 fps (above 50 or below 35 is 
fine).
I guess this is because the screen buffer swap waits until the buffer 
has finished rendering before executing the swap, combined with some bad 
timing when hitting 45 fps.


I am using Tonga as well for development. I haven't hit this, but it is 
definitely something to consider; I will try more video clips.


In fact, for this specific case with gallium nine, I noticed the 
problem disappears when using thread_submit=true.
thread_submit is an option that was designed with the DRI_PRIME case in 
mind: the driver spawns a thread that waits until the buffers we want to 
present have finished rendering before sending them. That solves all 
the sync issues a DRI_PRIME configuration can have.
I think in the case of the problem described, sending buffers that have 
finished rendering saves the screen buffer swap from having to wait 
another vblank for the buffer to be rendered.


I guess for video, you really don't want to hit the bad scenario 
described. I'm not sure whether you can hit the issue or not, but 
that may be something to consider. In any case, it seems a good 
thing to look at if you want to implement good DRI_PRIME support, 
provided it is possible: I don't know the user API, but if the user 
is guaranteed, for example, that the updated content will be copied to 
some pixmap after some call, you cannot delay the presentation in that case.


As I said, I will definitely keep this in mind. File attached.

Thanks,
Leo



Axel


On 11/05/2016 17:06, Leo Liu wrote :

This series implements DRI3 support for VA-API and VDPAU. It implements
support for DRI3 Open, PixmapFromBuffer, BufferFromPixmap, and for
PRESENT, including PresentPixmap, PresentNotifyMSC, PresentIdleNotify,
PresentConfigureNotify and PresentCompleteNotify.

It has been tested with the players mpv and vlc, with various clips from
480p to 4K at frame rates from 24 to 60. Testing also covered windowed mode
and fullscreen, with and without a compositing manager, and the VA-API
glx extension.

There is still some future work to be added, such as DRI_PRIME
(different GPU) support.

Leo Liu (14):
   vl: add DRI3 support infrastructure
   vl/dri3: implement dri3 screen create and destroy
   vl/dri3: set drawable geometry
   vl/dri3: register present events
   vl/dri3: implement flushing for queued events
   vl/dri3: add back buffers support
   vl/dri3: implement function for flush frontbuffer
   vl/dri3: implement funciton for get dirty area
   vl/dri3: add support for resizing
   vl/dri3: implement DRI3 BufferFromPixmap
   st/va: add dri3 support
   vl/dri3: handle PresentCompleteNotify event
   vl/dri3: implement functions for get and set timestamp
   st/vdpau: add dri3 support

  configure.ac  |   7 +-
  src/gallium/auxiliary/Makefile.sources|   5 +
  src/gallium/auxiliary/vl/vl_winsys.h  |   5 +
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 703 
++

  src/gallium/state_trackers/va/context.c   |   6 +-
  src/gallium/state_trackers/vdpau/device.c |   6 +-
  6 files changed, 729 insertions(+), 3 deletions(-)
  create mode 100644 src/gallium/auxiliary/vl/vl_winsys_dri3.c





/**
 *
 * Copyright 2016 Advanced Micro Devices, Inc.
 * All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sub license, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice (including the
 * next paragraph) shall be included in all copies or substantial portions
 * of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE

Re: [Mesa-dev] Add OpenSWR to GL3.txt?

2016-05-11 Thread Rowley, Timothy O

What are the criteria for marking an extension “done”?  Passing some percentage 
(all?) of relevant piglit tests?

-Tim

> On May 10, 2016, at 10:31 PM, Andrew J  wrote:
> 
> Is there any possibility that OpenSWR can be added to GL3.txt [1] so
> others can get an idea of what things OpenSWR supports?
> 
> GL3.txt is what mesamatrix [2] uses, so adding OpenSWR to GL3.txt
> would add it there as well.
> 
> [1] https://cgit.freedesktop.org/mesa/mesa/tree/docs/GL3.txt
> [2] http://mesamatrix.net


Re: [Mesa-dev] [PATCH 00/28] i965/blorp: Use NIR for compiling shaders

2016-05-11 Thread Pohjolainen, Topi

On Tue, May 10, 2016 at 04:16:20PM -0700, Jason Ekstrand wrote:
> When Paul originally wrote blorp he hand-rolled a shader builder that
> builds i965 shaders directly.  This has caused headaches because every time
> we make a change to the back-end compiler, we have to update blorp.  NIR on
> the other hand tends to be more stable at this point since it has many
> different users all across mesa.
> 
> Using NIR also means that we get decent optimizations, register allocation,
> and scheduling.  The original blorp codegen code tried fairly hard to emit
> reasonably efficient code in that it didn't do more work than needed but it
> was fairly naive when it came to register allocation and scheduling.
> Using the full compiler stack also means that we get new features for free
> without having to re-implement them in blorp.  On Skylake, for instance,
> we are now generating shaders with sampler-EOT.
> 
> In spite of all this, this series shows no measurable performance
> difference on Haswell with every benchmark in sixonyx run 25 times.
> 
> Jason Ekstrand (28):
>   nir: Add an info bit for uses_sample_qualifier
>   i965/fs: Rework the persample shading key/prog_data bits
>   i965/state: Clean up WM/PS state to pull more things out of prog_data
>   i965/fs: Clean up the logic in compile_fs a bit
>   i965/fs: Stop setting dispatch_grf_start_reg from the visitor
>   i965/gen7_wm: Move where we set the fast clear op
>   i965/fs: Organize prog_data by ksp number rather than SIMD width
>   i965/blorp: Simplify the sample layout calculation
>   i965/fs: Use MRF0 for the repclear message
>   nir/builder: Generate the alu helpers directly in python
>   nir/builder: Add a helper for grabbing multiple channels from an ssa
> def
>   nir: Add texture opcodes and source types for multisample compression
>   i965/fs: Implement the new NIR MCS texturing
>   i965/blorp: Add a prog_data_init helper
>   i965/blorp: Add a param array to prog_data
>   blorp: Add initial state setup support for SIMD8 dispatch
>   i965/blorp: Add a helper for compiling NIR shaders
>   i965/blorp: Create the program key in get_clear_kernel
>   i965/blorp: Use NIR for clear shaders
>   i965/blorp: Refactor getting the blit kernel into a helper

I had a few questions but 14-20 are:

Reviewed-by: Topi Pohjolainen 

[Mesa-dev] Android: apps crashed on Intel Gen9 GPU

2016-05-11 Thread Chih-Wei Huang

Testing android-x86 with mesa 11.2.2,
I found that Google Play keeps crashing on
a device with an Intel Gen9 GPU (e.g., Skylake).

After some analysis, the i965 driver seems to assume
that irb->mt is not null. For example, in
brw_meta_fast_clear() in brw_meta_fast_clear.c:

  struct intel_renderbuffer *irb = intel_renderbuffer(rb);
  ...
  if (brw->gen >= 9 &&
      brw_format_for_mesa_format(irb->mt->format) !=
                                 ^ => crashing
      brw->render_target_format[irb->mt->format])
     clear_type = REP_CLEAR;

If I add a null check for irb->mt, it fixes this crash.
However, the app then crashes in other places that
access irb->mt in a similar way
(brw_draw.c line 399, gen8_surface_state.c line 432, etc.).

Please comment on how to fix this correctly.
Why is irb->mt null when the code assumes it is not?
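
For illustration only, the kind of null check described above might look like the following sketch. The structs, NUM_FORMATS, the two format tables and needs_rep_clear() are hypothetical stand-ins, not the i965 types, and a guard like this only avoids the dereference; it does not explain why irb->mt is null in the first place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_FORMATS 4   /* hypothetical: a tiny format table for the sketch */

/* Hypothetical stand-ins for intel_mipmap_tree and intel_renderbuffer. */
struct miptree { unsigned format; };
struct renderbuffer { struct miptree *mt; };

/* Return true when the texture format and the render target format differ,
 * i.e. the condition under which the quoted code picks REP_CLEAR.
 * A renderbuffer without a miptree is reported as "no" instead of crashing.
 */
static bool
needs_rep_clear(const struct renderbuffer *irb,
                const unsigned tex_format[NUM_FORMATS],
                const unsigned render_target_format[NUM_FORMATS])
{
   if (irb == NULL || irb->mt == NULL)
      return false;   /* nothing to inspect; the caller must cope with this */

   assert(irb->mt->format < NUM_FORMATS);
   return tex_format[irb->mt->format] !=
          render_target_format[irb->mt->format];
}
```

Since the same assumption shows up in several files, a per-call-site guard like this is probably a workaround rather than the correct fix.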


-- 
Chih-Wei
Android-x86 project
http://www.android-x86.org

Re: [Mesa-dev] [PATCH 01/17] scons: Build NIR.

2016-05-11 Thread Rob Clark

On Tue, May 10, 2016 at 3:43 PM, Emil Velikov  wrote:
> On 9 May 2016 at 20:33, Rob Clark  wrote:
>> From: Jose Fonseca 
>>
>> Signed-off-by: Rob Clark 
>> ---
>>  src/compiler/SConscript | 57 
>> +++--
>>  1 file changed, 55 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/compiler/SConscript b/src/compiler/SConscript
>> index 10c79c4..dde4dfd 100644
>> --- a/src/compiler/SConscript
>> +++ b/src/compiler/SConscript
>> @@ -1,5 +1,7 @@
>>  Import('*')
>>
>> +from sys import executable as python_cmd
>> +
>>  env = env.Clone()
>>
>>  env.MSVC2013Compat()
>> @@ -11,13 +13,64 @@ env.Prepend(CPPPATH = [
>>  '#src/mesa',
>>  '#src/gallium/include',
>>  '#src/gallium/auxiliary',
>> +'#src/compiler',
>> +'#src/compiler/nir',
>> +])
>> +
>> +
>> +# Make generated headers reachable from the include path.
>> +env.Append(CPPPATH = [
>> +   Dir('nir').abspath
>>  ])
>>
>> -sources = env.ParseSourceList('Makefile.sources', 'LIBCOMPILER_FILES')
>> +# nir generated sources
>> +
>> +nir_builder_opcodes_h = env.CodeGenerate(
>> +target = 'nir/nir_builder_opcodes.h',
>> +script = 'nir/nir_builder_opcodes_h.py',
>> +source = [],
>> +command = python_cmd + ' $SCRIPT > $TARGET'
>> +)
>> +
>> +env.CodeGenerate(
>> +target = 'nir/nir_constant_expressions.c',
>> +script = 'nir/nir_constant_expressions.py',
>> +source = [],
>> +command = python_cmd + ' $SCRIPT > $TARGET'
>> +)
>> +
>> +env.CodeGenerate(
>> +target = 'nir/nir_opcodes.h',
>> +script = 'nir/nir_opcodes_h.py',
>> +source = [],
>> +command = python_cmd + ' $SCRIPT > $TARGET'
>> +)
>> +
>> +env.CodeGenerate(
>> +target = 'nir/nir_opcodes.c',
>> +script = 'nir/nir_opcodes_c.py',
>> +source = [],
>> +command = python_cmd + ' $SCRIPT > $TARGET'
>> +)
>> +
>> +env.CodeGenerate(
>> +target = 'nir/nir_opt_algebraic.c',
>> +script = 'nir/nir_algebraic.py',
>> +source = [],
>> +command = python_cmd + ' $SCRIPT > $TARGET'
>> +)
>> +
>> +# parse Makefile.sources
>> +source_lists = env.ParseSourceList('Makefile.sources')
>> +
>> +nir_sources = []
>> +nir_sources += source_lists['LIBCOMPILER_FILES']
>> +nir_sources += source_lists['NIR_FILES']
>> +nir_sources += source_lists['NIR_GENERATED_FILES']
>>
>>  compiler = env.ConvenienceLibrary(
>>  target = 'compiler',
>> -source = sources
>> +source = nir_sources
>>  )
>>  Export('compiler')
>>
> NIR already has scons build support. One just needs to add the static
> (convenience in scons speak) library 'nir' into the respective
> place(s). Something like the following untested hunk should do it. And
> yes, it is a bit nasty looking.
>
> -Emil
>
> diff --git a/src/compiler/SConscript.glsl b/src/compiler/SConscript.glsl
> index 43a11d1..4e5133b 100644
> --- a/src/compiler/SConscript.glsl
> +++ b/src/compiler/SConscript.glsl
> @@ -64,6 +64,8 @@ if env['msvc']:
>  env.Prepend(CPPPATH = ['#/src/getopt'])
>  env.PrependUnique(LIBS = [getopt])
>
> +env.Prepend(LIBS = [nir])
> +
>  # Copy these files to avoid generation object files into src/mesa/program
>  env.Prepend(CPPPATH = ['#src/mesa/main'])
>  env.Command('glsl/imports.c', '#src/mesa/main/imports.c',
> Copy('$TARGET', '$SOURCE'))

Hmm, it seems to take more than that... but I'm not planning to push the
parts that introduce the mesa/st dependency on NIR yet, so if you want to
handle adding nir to the scons build, I'm happy to drop the patch.

$ scons
scons: Reading SConscript files ...
Checking for GCC ...  yes
Checking for Clang ...  no
scons: Found LLVM version 3.7.0
Checking for X11 (x11 xext xdamage xfixes glproto >= 1.4.13)... yes
Checking for XCB (x11-xcb xcb-glx >= 1.8.1 xcb-dri2 >= 1.8)... yes
Checking for XF86VIDMODE (xxf86vm)... yes
Checking for DRM (libdrm >= 2.4.38)... yes
Checking for UDEV (libudev >= 151)... yes
NameError: name 'nir' is not defined:
  File "/home/robclark/tmp/mesa/SConstruct", line 143:
duplicate = 0 # http://www.scons.org/doc/0.97/HTML/scons-user/x2261.html
  File "/usr/lib/scons/SCons/Script/SConscript.py", line 614:
return method(*args, **kw)
  File "/usr/lib/scons/SCons/Script/SConscript.py", line 551:
return _SConscript(self.fs, *files, **subst_kw)
  File "/usr/lib/scons/SCons/Script/SConscript.py", line 260:
exec _file_ in call_stack[-1].globals
  File "/home/robclark/tmp/mesa/src/SConscript", line 8:
SConscript('compiler/SConscript')
  File "/usr/lib/scons/SCons/Script/SConscript.py", line 614:
return method(*args, **kw)
  File "/usr/lib/scons/SCons/Script/SConscript.py", line 551:
return _SConscript(self.fs, *files, **subst_kw)
  File "/usr/lib/scons/SCons/Script/SConscript.py", line 260:
exec _file_ in call_stack[-1].globals
  File "/home/robclark/tmp/mesa/src/compiler/SConscript", line 24:
SConscript('SConscript.glsl')
  File

Re: [Mesa-dev] [PATCH 00/14] vl dri3 support for vaapi and vdpau

2016-05-11 Thread Axel Davy


Hi,

Do you have a local branch so that everything can be reviewed at once? It is 
a bit hard to follow with the patches.


From a quick look, it seems you took inspiration from the loader dri3 code.

There is also another implementation you can take inspiration from:
https://github.com/iXit/wine/blob/master/dlls/d3d9-nine/dri3.c
There is probably not much more you can get from it.

I haven't checked the code yet, so I don't know whether this applies, but 
something I have noticed on my Tonga with games is that (non-vsynced) 
apps that get around 45 fps feel like 15 fps (above 50 or below 35 is fine).
I guess this is because the screen buffer swap waits until the buffer 
has finished rendering before executing the swap, combined with some bad 
timing when hitting 45 fps.
In fact, for this specific case with gallium nine, I noticed the problem 
disappears when using thread_submit=true.
thread_submit is an option that was designed with the DRI_PRIME case in mind: 
the driver spawns a thread that waits until the buffers we want to present 
have finished rendering before sending them. That solves all the sync 
issues a DRI_PRIME configuration can have.
I think in the case of the problem described, sending buffers that have 
finished rendering saves the screen buffer swap from having to wait 
another vblank for the buffer to be rendered.
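
The thread_submit idea described above can be sketched roughly as follows. This is only an illustration using POSIX threads, not gallium nine's actual code: struct frame, frame_init(), frame_signal_rendered() and present_async() are hypothetical names, and the "rendered" flag stands in for a real GPU fence.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame; 'rendered' stands in for a GPU fence. */
struct frame {
   pthread_mutex_t lock;
   pthread_cond_t done;
   bool rendered;
   bool presented;
};

static void
frame_init(struct frame *f)
{
   pthread_mutex_init(&f->lock, NULL);
   pthread_cond_init(&f->done, NULL);
   f->rendered = false;
   f->presented = false;
}

/* Called when rendering completes (a fence signal in a real driver). */
static void
frame_signal_rendered(struct frame *f)
{
   pthread_mutex_lock(&f->lock);
   f->rendered = true;
   pthread_cond_signal(&f->done);
   pthread_mutex_unlock(&f->lock);
}

/* Submit thread: block until the frame is rendered, then present it. */
static void *
submit_thread(void *arg)
{
   struct frame *f = arg;

   assert(f != NULL);
   pthread_mutex_lock(&f->lock);
   while (!f->rendered)
      pthread_cond_wait(&f->done, &f->lock);
   f->presented = true;   /* stand-in for sending the buffer to the server */
   pthread_mutex_unlock(&f->lock);
   return NULL;
}

/* Queue a frame for presentation without blocking the calling thread. */
static int
present_async(struct frame *f, pthread_t *thread)
{
   return pthread_create(thread, NULL, submit_thread, f);
}
```

The point of the design is that the server only ever receives buffers that are already complete, so the swap never has to wait on rendering from another card.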


I guess for video, you really don't want to hit the bad scenario 
described. I'm not sure whether you can hit the issue or not, but 
that may be something to consider. In any case, it seems a good thing 
to look at if you want to implement good DRI_PRIME support, provided it 
is possible: I don't know the user API, but if the user is guaranteed, 
for example, that the updated content will be copied to some pixmap after 
some call, you cannot delay the presentation in that case.


Axel


On 11/05/2016 17:06, Leo Liu wrote :

This series implements DRI3 support for VA-API and VDPAU. It implements
support for DRI3 Open, PixmapFromBuffer, BufferFromPixmap, and for
PRESENT, including PresentPixmap, PresentNotifyMSC, PresentIdleNotify,
PresentConfigureNotify and PresentCompleteNotify.

It has been tested with the players mpv and vlc, with various clips from
480p to 4K at frame rates from 24 to 60. Testing also covered windowed mode
and fullscreen, with and without a compositing manager, and the VA-API
glx extension.

There is still some future work to be added, such as DRI_PRIME
(different GPU) support.

Leo Liu (14):
   vl: add DRI3 support infrastructure
   vl/dri3: implement dri3 screen create and destroy
   vl/dri3: set drawable geometry
   vl/dri3: register present events
   vl/dri3: implement flushing for queued events
   vl/dri3: add back buffers support
   vl/dri3: implement function for flush frontbuffer
   vl/dri3: implement funciton for get dirty area
   vl/dri3: add support for resizing
   vl/dri3: implement DRI3 BufferFromPixmap
   st/va: add dri3 support
   vl/dri3: handle PresentCompleteNotify event
   vl/dri3: implement functions for get and set timestamp
   st/vdpau: add dri3 support

  configure.ac  |   7 +-
  src/gallium/auxiliary/Makefile.sources|   5 +
  src/gallium/auxiliary/vl/vl_winsys.h  |   5 +
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 703 ++
  src/gallium/state_trackers/va/context.c   |   6 +-
  src/gallium/state_trackers/vdpau/device.c |   6 +-
  6 files changed, 729 insertions(+), 3 deletions(-)
  create mode 100644 src/gallium/auxiliary/vl/vl_winsys_dri3.c




Re: [Mesa-dev] [PATCH 00/14] vl dri3 support for vaapi and vdpau

2016-05-11 Thread Alex Deucher

On Wed, May 11, 2016 at 11:06 AM, Leo Liu  wrote:
> This series implements DRI3 support for VA-API and VDPAU. It implements
> support for DRI3 Open, PixmapFromBuffer, BufferFromPixmap, and for
> PRESENT, including PresentPixmap, PresentNotifyMSC, PresentIdleNotify,
> PresentConfigureNotify and PresentCompleteNotify.
>
> It has been tested with the players mpv and vlc, with various clips from
> 480p to 4K at frame rates from 24 to 60. Testing also covered windowed mode
> and fullscreen, with and without a compositing manager, and the VA-API
> glx extension.
>
> There is still some future work to be added, such as DRI_PRIME
> (different GPU) support.
>
> Leo Liu (14):
>   vl: add DRI3 support infrastructure
>   vl/dri3: implement dri3 screen create and destroy
>   vl/dri3: set drawable geometry
>   vl/dri3: register present events
>   vl/dri3: implement flushing for queued events
>   vl/dri3: add back buffers support
>   vl/dri3: implement function for flush frontbuffer
>   vl/dri3: implement funciton for get dirty area
>   vl/dri3: add support for resizing
>   vl/dri3: implement DRI3 BufferFromPixmap
>   st/va: add dri3 support
>   vl/dri3: handle PresentCompleteNotify event
>   vl/dri3: implement functions for get and set timestamp
>   st/vdpau: add dri3 support

For the series:
Reviewed-by: Alex Deucher 

>
>  configure.ac  |   7 +-
>  src/gallium/auxiliary/Makefile.sources|   5 +
>  src/gallium/auxiliary/vl/vl_winsys.h  |   5 +
>  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 703 
> ++
>  src/gallium/state_trackers/va/context.c   |   6 +-
>  src/gallium/state_trackers/vdpau/device.c |   6 +-
>  6 files changed, 729 insertions(+), 3 deletions(-)
>  create mode 100644 src/gallium/auxiliary/vl/vl_winsys_dri3.c
>
> --
> 2.7.4
>

Re: [Mesa-dev] [PATCH 15/28] i965/blorp: Add a param array to prog_data

2016-05-11 Thread Pohjolainen, Topi

On Wed, May 11, 2016 at 07:46:33AM -0700, Jason Ekstrand wrote:
>On May 11, 2016 7:45 AM, "Jason Ekstrand" <[1]ja...@jlekstrand.net>
>wrote:
>>
>>
>> On May 10, 2016 11:53 PM, "Pohjolainen, Topi"
><[2]topi.pohjolai...@intel.com> wrote:
>> >
>> > On Tue, May 10, 2016 at 04:16:35PM -0700, Jason Ekstrand wrote:
>> > > This array allows the push constants to be re-arranged on
>upload.  The
>> > > actual arrangement will, eventually, come from the back-end
>compiler.
>> > > ---
>> > >  src/mesa/drivers/dri/i965/brw_blorp.c  |  4 
>> > >  src/mesa/drivers/dri/i965/brw_blorp.h  |  6 ++
>> > >  src/mesa/drivers/dri/i965/gen6_blorp.c | 12 +++-
>> > >  3 files changed, 17 insertions(+), 5 deletions(-)
>> > >
>> > > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c
>b/src/mesa/drivers/dri/i965/brw_blorp.c
>> > > index 4bbe45f..1379804 100644
>> > > --- a/src/mesa/drivers/dri/i965/brw_blorp.c
>> > > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
>> > > @@ -139,6 +139,10 @@ brw_blorp_prog_data_init(struct
>brw_blorp_prog_data *prog_data)
>> > >  {
>> > > prog_data->first_curbe_grf = 0;
>> > > prog_data->persample_msaa_dispatch = false;
>> > > +
>> > > +   prog_data->nr_params = BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS;
>> > > +   for (unsigned i = 0; i < BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS;
>i++)
>> > > +  prog_data->param[i] = i;
>> > >  }
>> > >
>> > >
>> > > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
>b/src/mesa/drivers/dri/i965/brw_blorp.h
>> > > index 4a0e46e..c2f33a1 100644
>> > > --- a/src/mesa/drivers/dri/i965/brw_blorp.h
>> > > +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
>> > > @@ -199,6 +199,9 @@ struct brw_blorp_wm_push_constants
>> > > uint32_t pad[5];
>> > >  };
>> > >
>> > > +#define BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS \
>> > > +   (sizeof(struct brw_blorp_wm_push_constants) / 4)
>> > > +
>> > >  /* Every 32 bytes of push constant data constitutes one GEN
>register. */
>> > >  static const unsigned int BRW_BLORP_NUM_PUSH_CONST_REGS =
>> > > sizeof(struct brw_blorp_wm_push_constants) / 32;
>> > > @@ -212,6 +215,9 @@ struct brw_blorp_prog_data
>> > >  * than one sample per pixel.
>> > >  */
>> > > bool persample_msaa_dispatch;
>> > > +
>> > > +   uint8_t nr_params;
>> > > +   uint8_t param[BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS];
>> >
>> > Do I read this correctly: this corresponds to push_constant_loc in
>the scalar
>> > backend?
>>
>> Sort-of.  The mapping actually goes in the other direction:  From
>location to uniform number.
> 
>Really, it's just a simplified version of prog_data->param.

Right. Could we add a description? "param" on its own doesn't tell much.
For example,

 /* Compiler will re-arrange push constants and store the upload order
  * here. Given an index 'i' in the final upload buffer, param[i] gives
  * the index in the uniform store. In other words, the value to be
  * uploaded can be found in brw_blorp_params::wm_push_consts[param[i]].
  */
 uint8_t param[BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS];
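
To make the direction of the mapping concrete, here is a small standalone sketch of the gather that the patch performs on upload. It is not the driver code: struct push_constants, struct prog_data, prog_data_init() and upload_push_constants() are simplified, hypothetical stand-ins for brw_blorp_wm_push_constants, brw_blorp_prog_data and the loop in gen6_blorp_emit_wm_constants().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for brw_blorp_wm_push_constants: eight dwords. */
struct push_constants {
   uint32_t dw[8];
};

#define NUM_PUSH_CONSTANT_DWORDS (sizeof(struct push_constants) / 4)

struct prog_data {
   uint8_t nr_params;
   /* param[i] is the index in the uniform store of the dword that ends up
    * in slot i of the upload buffer.
    */
   uint8_t param[NUM_PUSH_CONSTANT_DWORDS];
};

/* Identity mapping, mirroring what the patch sets up by default. */
static void
prog_data_init(struct prog_data *prog_data)
{
   prog_data->nr_params = (uint8_t)NUM_PUSH_CONSTANT_DWORDS;
   for (unsigned i = 0; i < NUM_PUSH_CONSTANT_DWORDS; i++)
      prog_data->param[i] = (uint8_t)i;
}

/* Gather the constants into upload order; returns the dwords written. */
static unsigned
upload_push_constants(uint32_t *dst, const struct push_constants *src,
                      const struct prog_data *prog_data)
{
   const uint32_t *push_consts = src->dw;

   for (unsigned i = 0; i < prog_data->nr_params; i++) {
      assert(prog_data->param[i] < NUM_PUSH_CONSTANT_DWORDS);
      dst[i] = push_consts[prog_data->param[i]];
   }
   return prog_data->nr_params;
}
```

With the identity mapping this behaves like the old memcpy; a compiler-provided mapping would only change the contents of param[] and nr_params.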

> 
>> > >  };
>> > >
>> > >  void brw_blorp_prog_data_init(struct brw_blorp_prog_data
>*prog_data);
>> > > diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.c
>b/src/mesa/drivers/dri/i965/gen6_blorp.c
>> > > index 1955811..950e2b9 100644
>> > > --- a/src/mesa/drivers/dri/i965/gen6_blorp.c
>> > > +++ b/src/mesa/drivers/dri/i965/gen6_blorp.c
>> > > @@ -308,11 +308,13 @@ gen6_blorp_emit_wm_constants(struct
>brw_context *brw,
>> > >  {
>> > > uint32_t wm_push_const_offset;
>> > >
>> > > -   void *constants = brw_state_batch(brw,
>AUB_TRACE_WM_CONSTANTS,
>> > > -
>sizeof(params->wm_push_consts),
>> > > - 32, &wm_push_const_offset);
>> > > -   memcpy(constants, &params->wm_push_consts,
>> > > -  sizeof(params->wm_push_consts));
>> > > +   uint32_t *constants = brw_state_batch(brw,
>AUB_TRACE_WM_CONSTANTS,
>> > > +
>sizeof(params->wm_push_consts),
>> > > + 32,
>&wm_push_const_offset);
>> > > +
>> > > +   uint32_t *push_consts = (uint32_t *)&params->wm_push_consts;
>> > > +   for (unsigned i = 0; i < params->wm_prog_data->nr_params;
>i++)
>> > > +  constants[i] =
>push_consts[params->wm_prog_data->param[i]];
>> > >
>> > > return wm_push_const_offset;
>> > >  }
>> > > --
>> > > 2.5.0.400.gff86faf
>> > >
>> > > ___
>> > > mesa-dev mailing list
>> > > [3]mesa-dev@lists.freedesktop.org
>> > > [4]https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> References
> 
>1.

Re: [Mesa-dev] [PATCH 12/23] i965/fs: fix pull constant load component selection for doubles

2016-05-11 Thread Samuel Iglesias Gonsálvez

On Wed, 2016-05-11 at 17:12 +0200, Samuel Iglesias Gonsálvez wrote:
> On Tue, 2016-05-10 at 21:06 -0700, Francisco Jerez wrote:
> > 
> > Samuel Iglesias Gonsálvez  writes:
> > 
> > > 
> > > 
> > > From: Iago Toral Quiroga 
> > > 
> > > UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4
> > > starting at a
> > > constant offset that is 16-byte aligned. If we need to access an
> > > unaligned
> > > offset we emit a load with an aligned offset and use the
> > > remaining
> > > constant
> > > offset to select the component into the vec4 result that we are
> > > interested
> > > in. This component must be computed in units of the type size,
> > > since that
> > > is what fs_reg::set_smear expects.
> > > 
> > > This patch does this change in the two places where we use this
> > > message:
> > > In demote_pull_constants when we lower uniform access with
> > > constant
> > > offset
> > > into the pull constant buffer and in UBO loads with constant
> > > offset.
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
> > >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++-
> > >  2 files changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > index 0e69be8..dff13ea 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads()
> > >   inst->src[i].file = VGRF;
> > >   inst->src[i].nr = dst.nr;
> > >   inst->src[i].reg_offset = 0;
> > > - inst->src[i].set_smear(pull_index & 3);
> > > + unsigned type_slots = MAX2(1, type_sz(inst->dst.type) /
> > > 4);
> > > + inst->src[i].set_smear((pull_index & 3) / type_slots);
> > >  
> > This cannot be right, why should we care what the destination type
> > of
> > the instruction is while lowering a uniform source?  Also I don't
> > think
> > the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 <
> > 1
> > you'll force type_slots to 1 and end up interpreting the pull_index
> > in
> > the wrong units.  How about:
> > 
> > > 
> > > 
> > >   inst->src[i].set_smear((pull_index & 3) * 4 /
> > >  type_sz(inst->src[i].type));
> > > 
> OK
> 
> > 
> > > 
> > >   brw_mark_surface_used(prog_data, index);
> > >    }
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > index 4cd219a..532ca65 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const
> > > fs_builder &bld, nir_intrinsic_instr *instr
> > >   bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
> > > packed_consts,
> > >    surf_index, const_offset_reg);
> > >  
> > > + unsigned component_base =
> > > +(const_offset->u32[0] % 16) / MAX2(1,
> > > type_sz(dest.type));
> > Rather than dividing by the type size only to let set_smear
> > multiply
> > by
> > the type size again, I think it would be cleaner to do something
> > like:
> > 
> > > 
> > > 
> > >   const fs_reg consts = byte_offset(packed_consts,
> > > const_offset->u32[0] % 16);
> > > 
> > >   for (unsigned i = 0; i < instr->num_components; i++) {
> > then here:
> > 
> > > 
> > > 
> > >  bld.MOV(offset(dest, bld, i), component(consts, i));
> > and then remove the rest of the loop.
> > 
> I am having trouble adapting patch 13/23 to this approach because
> the
> following assert in component() fails for some tests:
>     
>     assert(reg.subreg_offset == 0);
> 
> consts.subreg_offset is not zero because of the byte_offset() call.
> 
> So I prefer a mixed solution: keep the set_smear() usage, then:
> 
>    bld.MOV(offset(dest, bld, i), packed_consts);
> 

Looking at patch 13, offset(dest, bld, i) needs to be adjusted to save
the remaining components, so I think the MOV is clearer as it is now
than the proposed change.
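
For reference, the unit question raised earlier in this thread (the smear component has to be expressed in units of the type size) can be checked with a small standalone sketch. It assumes, as the "& 3" and "* 4" in the quoted code suggest, that pull_index counts 32-bit dwords; smear_component() and smear_component_max2() are hypothetical helper names, not functions from the driver.

```c
#include <assert.h>

/* Component to select from the loaded vec4, in units of the source type.
 * dword_index: offset of the uniform in 32-bit units (assumed meaning of
 *              pull_index in the patch)
 * type_size:   size of the source type in bytes
 */
static unsigned
smear_component(unsigned dword_index, unsigned type_size)
{
   assert(type_size > 0);

   /* Byte offset inside the 16-byte aligned vec4 returned by the load. */
   const unsigned byte_offset = (dword_index & 3) * 4;

   return byte_offset / type_size;
}

/* The MAX2-based variant from the original patch, for comparison. */
static unsigned
smear_component_max2(unsigned dword_index, unsigned type_size)
{
   unsigned type_slots = type_size / 4;

   if (type_slots < 1)
      type_slots = 1;
   return (dword_index & 3) / type_slots;
}
```

Both variants agree for 4-byte and 8-byte types; they only diverge for types smaller than a dword, which is the case the review comment about MAX2 points at.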

Sam

> and remove the rest of the loop.
> 
> Sam
> 
> > 
> > > 
> > > 
> > > -packed_consts.set_smear(const_offset->u32[0] % 16 /
> > > 4
> > > + i);
> > > +packed_consts.set_smear(component_base + i);
> > >  
> > >  /* The std140 packing rules don't allow vectors to
> > > cross 16-byte
> > >   * boundaries, and a reg is 32 bytes.

[Mesa-dev] [PATCH 08/14] vl/dri3: implement funciton for get dirty area

2016-05-11 Thread Leo Liu

This will clear the presentation area not covered by video content

Signed-off-by: Leo Liu 
---
 src/gallium/auxiliary/vl/vl_winsys_dri3.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index a6ac64a..8895663 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -41,6 +41,7 @@
 #include "util/u_memory.h"
 #include "util/u_inlines.h"
 
+#include "vl/vl_compositor.h"
 #include "vl/vl_winsys.h"
 
 #define BACK_BUFFER_NUM 3
@@ -69,6 +70,8 @@ struct vl_dri3_screen
 
struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
int cur_back;
+
+   struct u_rect dirty_areas[BACK_BUFFER_NUM];
 };
 
 static void
@@ -251,6 +254,7 @@ dri3_get_back_buffer(struct vl_dri3_screen *scrn)
   if (!buffer)
  return NULL;
 
+  vl_compositor_reset_dirty_area(&scrn->dirty_areas[scrn->cur_back]);
   scrn->back_buffers[scrn->cur_back] = buffer;
}
 
@@ -363,8 +367,11 @@ vl_dri3_screen_texture_from_drawable(struct vl_screen *vscreen, void *drawable)
 static struct u_rect *
 vl_dri3_screen_get_dirty_area(struct vl_screen *vscreen)
 {
-   /* TODO */
-   return NULL;
+   struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)vscreen;
+
+   assert(scrn);
+
+   return &scrn->dirty_areas[scrn->cur_back];
 }
 
 static uint64_t
-- 
2.7.4
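
The per-back-buffer dirty tracking in this patch can be illustrated in isolation. The following is only a sketch under stated assumptions: `struct rect`, `demo_screen`, `rect_reset_dirty()` and the `DIRTY_MIN`/`DIRTY_MAX` sentinels are hypothetical stand-ins, not the actual `u_rect` or `vl_compositor` definitions.

```c
#include <assert.h>

#define NUM_BACK  3
#define DIRTY_MIN 0          /* assumed sentinel, not taken from the patch */
#define DIRTY_MAX (1 << 15)  /* assumed sentinel, not taken from the patch */

/* Hypothetical stand-in for the rectangle type used by the compositor. */
struct rect {
   int x0, x1, y0, y1;
};

/* Hypothetical screen: one dirty rectangle per back buffer. */
struct demo_screen {
   struct rect dirty[NUM_BACK];
   int cur_back;
};

/* Mark the whole area as dirty so the next render clears all of it. */
static void
rect_reset_dirty(struct rect *r)
{
   assert(r);
   r->x0 = r->y0 = DIRTY_MIN;
   r->x1 = r->y1 = DIRTY_MAX;
}

/* Return the dirty rectangle that belongs to the current back buffer. */
static struct rect *
demo_get_dirty_area(struct demo_screen *scrn)
{
   assert(scrn);
   return &scrn->dirty[scrn->cur_back];
}
```

Keeping one rectangle per buffer means a freshly allocated back buffer can be flagged as fully dirty without disturbing the state recorded for the other buffers.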


[Mesa-dev] [PATCH 06/14] vl/dri3: add back buffers support

2016-05-11 Thread Leo Liu

This implements DRI3 PixmapFromBuffer. Create a buffer object,
associate it with a dma-buf fd, and pass this fd together with a pixmap
ID to the X server to create the pixmap object. Also add a function
that waits for present events.

Signed-off-by: Leo Liu 
---
 src/gallium/auxiliary/vl/vl_winsys_dri3.c | 187 +-
 1 file changed, 185 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index ef80730..e78ca07 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -28,17 +28,35 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 
 #include "loader.h"
 
 #include "pipe/p_screen.h"
+#include "pipe/p_state.h"
 #include "pipe-loader/pipe_loader.h"
 
 #include "util/u_memory.h"
+#include "util/u_inlines.h"
+
 #include "vl/vl_winsys.h"
 
+#define BACK_BUFFER_NUM 3
+
+struct vl_dri3_buffer
+{
+   struct pipe_resource *texture;
+
+   uint32_t pixmap;
+   uint32_t sync_fence;
+   struct xshmfence *shm_fence;
+
+   bool busy;
+   uint32_t width, height, pitch;
+};
+
 struct vl_dri3_screen
 {
struct vl_screen base;
@@ -48,9 +66,23 @@ struct vl_dri3_screen
uint32_t width, height, depth;
 
xcb_special_event_t *special_event;
+
+   struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
+   int cur_back;
 };
 
 static void
+dri3_free_back_buffer(struct vl_dri3_screen *scrn,
+struct vl_dri3_buffer *buffer)
+{
+   xcb_free_pixmap(scrn->conn, buffer->pixmap);
+   xcb_sync_destroy_fence(scrn->conn, buffer->sync_fence);
+   xshmfence_unmap_shm(buffer->shm_fence);
+   pipe_resource_reference(&buffer->texture, NULL);
+   FREE(buffer);
+}
+
+static void
 dri3_handle_present_event(struct vl_dri3_screen *scrn,
   xcb_present_generic_event_t *ge)
 {
@@ -83,6 +115,145 @@ dri3_flush_present_events(struct vl_dri3_screen *scrn)
 }
 
 static bool
+dri3_wait_present_events(struct vl_dri3_screen *scrn)
+{
+   if (scrn->special_event) {
+  xcb_generic_event_t *ev;
+  ev = xcb_wait_for_special_event(scrn->conn, scrn->special_event);
+  if (!ev)
+ return false;
+  dri3_handle_present_event(scrn, (xcb_present_generic_event_t *)ev);
+  return true;
+   }
+   return false;
+}
+
+static int
+dri3_find_back(struct vl_dri3_screen *scrn)
+{
+   int b;
+
+   for (;;) {
+  for (b = 0; b < BACK_BUFFER_NUM; b++) {
+ int id = (b + scrn->cur_back) % BACK_BUFFER_NUM;
+ struct vl_dri3_buffer *buffer = scrn->back_buffers[id];
+ if (!buffer || !buffer->busy)
+return id;
+  }
+  xcb_flush(scrn->conn);
+  if (!dri3_wait_present_events(scrn))
+ return -1;
+   }
+}
+
+static struct vl_dri3_buffer *
+dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
+{
+   struct vl_dri3_buffer *buffer;
+   xcb_pixmap_t pixmap;
+   xcb_sync_fence_t sync_fence;
+   struct xshmfence *shm_fence;
+   int buffer_fd, fence_fd;
+   struct pipe_resource templ;
+   struct winsys_handle whandle;
+   unsigned usage;
+
+   buffer = CALLOC_STRUCT(vl_dri3_buffer);
+   if (!buffer)
+  return NULL;
+
+   fence_fd = xshmfence_alloc_shm();
+   if (fence_fd < 0)
+  goto free_buffer;
+
+   shm_fence = xshmfence_map_shm(fence_fd);
+   if (!shm_fence)
+  goto close_fd;
+
+   memset(&templ, 0, sizeof(templ));
+   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+   templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
+   templ.target = PIPE_TEXTURE_2D;
+   templ.last_level = 0;
+   templ.width0 = scrn->width;
+   templ.height0 = scrn->height;
+   templ.depth0 = 1;
+   templ.array_size = 1;
+   buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+ &templ);
+   if (!buffer->texture)
+  goto unmap_shm;
+
+   memset(&whandle, 0, sizeof(whandle));
+   whandle.type= DRM_API_HANDLE_TYPE_FD;
+   usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ;
+   scrn->base.pscreen->resource_get_handle(scrn->base.pscreen,
+   buffer->texture, &whandle,
+   usage);
+   buffer_fd = whandle.handle;
+   buffer->pitch = whandle.stride;
+   xcb_dri3_pixmap_from_buffer(scrn->conn,
+   (pixmap = xcb_generate_id(scrn->conn)),
+   scrn->drawable,
+   0,
+   scrn->width, scrn->height, buffer->pitch,
+   scrn->depth, 32,
+   buffer_fd);
+   xcb_dri3_fence_from_fd(scrn->conn,
+  pixmap,
+  (sync_fence = xcb_generate_id(scrn->conn)),
+  false,
+  fence_fd);
+
+   buffer->pixmap = pixmap;
+   buffer->sync_fence = sync_fence;
+   buffer->shm_fence = shm_fence;
+
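
The search order used by dri3_find_back() above can be shown without any X dependencies. This is a minimal sketch, assuming a hypothetical `struct slot` and `find_free_slot()`; unlike the patch, it returns -1 as soon as every slot is busy instead of flushing the connection and waiting for present events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 3

/* Hypothetical stand-in for a back buffer; only the busy flag matters here. */
struct slot {
   bool busy;
};

/*
 * Scan the ring starting at `cur` and return the index of the first slot
 * that is either unallocated (NULL) or idle. Return -1 if all are busy.
 */
static int
find_free_slot(struct slot *slots[NUM_SLOTS], int cur)
{
   assert(cur >= 0 && cur < NUM_SLOTS);

   for (int b = 0; b < NUM_SLOTS; b++) {
      int id = (b + cur) % NUM_SLOTS;

      if (!slots[id] || !slots[id]->busy)
         return id;
   }
   return -1;
}
```

Starting the scan at the current index means the current buffer is reused when it is idle, and the others are tried in ring order only when it is not.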

[Mesa-dev] [PATCH 02/14] vl/dri3: implement dri3 screen create and destroy

2016-05-11 Thread Leo Liu

Create the screen with the device fd returned by the X server;
bail out to DRI2 under certain conditions.

Signed-off-by: Leo Liu 
---
 configure.ac  |  7 ++-
 src/gallium/auxiliary/vl/vl_winsys_dri3.c | 88 ++-
 2 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index 023110e..8c3960a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1779,7 +1779,12 @@ if test "x$enable_xvmc" = xyes -o \
 "x$enable_vdpau" = xyes -o \
 "x$enable_omx" = xyes -o \
 "x$enable_va" = xyes; then
-PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
+if test x"$enable_dri3" = xyes; then
+PKG_CHECK_MODULES([VL], [xcb-dri3 xcb-present xcb-sync xshmfence >= $XSHMFENCE_REQUIRED
+ x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
+else
+PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
+fi
 need_gallium_vl_winsys=yes
 fi
 AM_CONDITIONAL(NEED_GALLIUM_VL_WINSYS, test "x$need_gallium_vl_winsys" = xyes)
diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index 2c3d3ae..c018379 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -25,7 +25,16 @@
  *
  **/
 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "loader.h"
+
 #include "pipe/p_screen.h"
+#include "pipe-loader/pipe_loader.h"
 
 #include "util/u_memory.h"
 #include "vl/vl_winsys.h"
@@ -33,6 +42,8 @@
 struct vl_dri3_screen
 {
struct vl_screen base;
+   xcb_connection_t *conn;
+   xcb_drawable_t drawable;
 };
 
 static void
@@ -82,7 +93,14 @@ vl_dri3_screen_get_private(struct vl_screen *vscreen)
 static void
 vl_dri3_screen_destroy(struct vl_screen *vscreen)
 {
-   /* TODO */
+   struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)vscreen;
+
+   assert(vscreen);
+
+   scrn->base.pscreen->destroy(scrn->base.pscreen);
+   pipe_loader_release(&scrn->base.dev, 1);
+   FREE(scrn);
+
return;
 }
 
@@ -90,6 +108,13 @@ struct vl_screen *
 vl_dri3_screen_create(Display *display, int screen)
 {
struct vl_dri3_screen *scrn;
+   const xcb_query_extension_reply_t *extension;
+   xcb_dri3_open_cookie_t open_cookie;
+   xcb_dri3_open_reply_t *open_reply;
+   xcb_get_geometry_cookie_t geom_cookie;
+   xcb_get_geometry_reply_t *geom_reply;
+   int is_different_gpu;
+   int fd;
 
assert(display);
 
@@ -97,6 +122,58 @@ vl_dri3_screen_create(Display *display, int screen)
if (!scrn)
   return NULL;
 
+   scrn->conn = XGetXCBConnection(display);
+   if (!scrn->conn)
+  goto free_screen;
+
+   xcb_prefetch_extension_data(scrn->conn , &xcb_dri3_id);
+   xcb_prefetch_extension_data(scrn->conn, &xcb_present_id);
+   extension = xcb_get_extension_data(scrn->conn, &xcb_dri3_id);
+   if (!(extension && extension->present))
+  goto free_screen;
+   extension = xcb_get_extension_data(scrn->conn, &xcb_present_id);
+   if (!(extension && extension->present))
+  goto free_screen;
+
+   open_cookie = xcb_dri3_open(scrn->conn, RootWindow(display, screen), None);
+   open_reply = xcb_dri3_open_reply(scrn->conn, open_cookie, NULL);
+   if (!open_reply)
+  goto free_screen;
+   if (open_reply->nfd != 1) {
+  free(open_reply);
+  goto free_screen;
+   }
+
+   fd = xcb_dri3_open_reply_fds(scrn->conn, open_reply)[0];
+   if (fd < 0) {
+  free(open_reply);
+  goto free_screen;
+   }
+   fcntl(fd, F_SETFD, FD_CLOEXEC);
+   free(open_reply);
+
+   fd = loader_get_user_preferred_fd(fd, &is_different_gpu);
+   /* TODO support different GPU */
+   if (is_different_gpu)
+  goto free_screen;
+
+   geom_cookie = xcb_get_geometry(scrn->conn, RootWindow(display, screen));
+   geom_reply = xcb_get_geometry_reply(scrn->conn, geom_cookie, NULL);
+   if (!geom_reply)
+  goto free_screen;
+   /* TODO support depth other than 24 */
+   if (geom_reply->depth != 24) {
+  free(geom_reply);
+  goto free_screen;
+   }
+   free(geom_reply);
+
+   if (pipe_loader_drm_probe_fd(&scrn->base.dev, fd))
+  scrn->base.pscreen = pipe_loader_create_screen(scrn->base.dev);
+
+   if (!scrn->base.pscreen)
+  goto release_pipe;
+
scrn->base.destroy = vl_dri3_screen_destroy;
scrn->base.texture_from_drawable = vl_dri3_screen_texture_from_drawable;
scrn->base.get_dirty_area = vl_dri3_screen_get_dirty_area;
@@ -106,4 +183,13 @@ vl_dri3_screen_create(Display *display, int screen)
scrn->base.pscreen->flush_frontbuffer = vl_dri3_flush_frontbuffer;
 
return &scrn->base;
+
+release_pipe:
+   if (scrn->base.dev)
+  pipe_loader_release(&scrn->base.dev, 1);
+   fd = -1;
+   close(fd);
+free_screen:
+   FREE(scrn);
+   return NULL;
 }
-- 
2.7.4
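
The goto-based error handling in vl_dri3_screen_create() depends on who owns the device fd at each label; the posted error path sets fd to -1 before calling close(fd). As a hedged illustration only, here is a small sketch of the same cleanup shape in which the descriptor is closed before its variable is invalidated. `demo_screen`, `demo_screen_create()` and its parameters are hypothetical, and the sketch does not model whether pipe_loader_release() takes ownership of the fd.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical screen object that owns a file descriptor on success. */
struct demo_screen {
   int fd;
};

/*
 * Open `path` and build a screen around it. `fail_after_open` simulates a
 * later setup step failing; `fd_seen` (optional) reports the fd that was
 * opened so a caller can check that it was closed on the error path.
 */
static struct demo_screen *
demo_screen_create(const char *path, int fail_after_open, int *fd_seen)
{
   struct demo_screen *scrn = calloc(1, sizeof(*scrn));
   int fd;

   if (!scrn)
      return NULL;

   fd = open(path, O_RDONLY);
   if (fd < 0)
      goto free_screen;
   fcntl(fd, F_SETFD, FD_CLOEXEC);
   if (fd_seen)
      *fd_seen = fd;

   if (fail_after_open)
      goto close_fd;

   scrn->fd = fd;
   return scrn;

close_fd:
   /* Close first, then invalidate, so the descriptor is not leaked. */
   close(fd);
   fd = -1;
free_screen:
   free(scrn);
   return NULL;
}
```

Ordering the labels from the most recently acquired resource to the oldest lets each failure point jump to exactly the cleanup it needs.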


[Mesa-dev] [PATCH 12/14] vl/dri3: handle PresentCompleteNotify event

2016-05-11 Thread Leo Liu

and calculate the timestamp from the event's reply

Signed-off-by: Leo Liu 
---
 src/gallium/auxiliary/vl/vl_winsys_dri3.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index b1438b3..f917e4b 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -75,6 +75,10 @@ struct vl_dri3_screen
 
struct vl_dri3_buffer *front_buffer;
bool is_pixmap;
+
+   uint32_t send_msc_serial, recv_msc_serial;
+   uint64_t send_sbc, recv_sbc;
+   int64_t last_ust, ns_frame, last_msc, next_msc;
 };
 
 static void
@@ -98,6 +102,19 @@ dri3_free_back_buffer(struct vl_dri3_screen *scrn,
 }
 
 static void
+dri3_handle_stamps(struct vl_dri3_screen *scrn, uint64_t ust, uint64_t msc)
+{
+   int64_t ust_ns =  ust * 1000;
+
+   if (scrn->last_ust && (ust_ns > scrn->last_ust) &&
+   scrn->last_msc && (msc > scrn->last_msc))
+  scrn->ns_frame = (ust_ns - scrn->last_ust) / (msc - scrn->last_msc);
+
+   scrn->last_ust = ust_ns;
+   scrn->last_msc = msc;
+}
+
+static void
 dri3_handle_present_event(struct vl_dri3_screen *scrn,
   xcb_present_generic_event_t *ge)
 {
@@ -109,7 +126,16 @@ dri3_handle_present_event(struct vl_dri3_screen *scrn,
   break;
}
case XCB_PRESENT_COMPLETE_NOTIFY: {
-  /* TODO */
+  xcb_present_complete_notify_event_t *ce = (void *) ge;
+  if (ce->kind == XCB_PRESENT_COMPLETE_KIND_PIXMAP) {
+ scrn->recv_sbc = (scrn->send_sbc & 0xffffffff00000000LL) | ce->serial;
+ if (scrn->recv_sbc > scrn->send_sbc)
+scrn->recv_sbc -= 0x100000000;
+ dri3_handle_stamps(scrn, ce->ust, ce->msc);
+  } else if (ce->kind == XCB_PRESENT_COMPLETE_KIND_NOTIFY_MSC) {
+ scrn->recv_msc_serial = ce->serial;
+ dri3_handle_stamps(scrn, ce->ust, ce->msc);
+  }
   break;
}
case XCB_PRESENT_EVENT_IDLE_NOTIFY: {
-- 
2.7.4
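
The two calculations in this patch can be tried out on their own. The sketch below assumes that the 32-bit serial carried by the event is widened with the upper 32 bits of the last sent counter, and that ust is reported in microseconds; `widen_serial()` and `frame_duration_ns()` are hypothetical helper names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widen a 32-bit serial to 64 bits using the upper half of the last sent
 * counter. A completed swap cannot be newer than the last one sent, so if
 * the result looks newer, the low 32 bits wrapped in between.
 */
static uint64_t
widen_serial(uint64_t send_sbc, uint32_t serial)
{
   uint64_t recv_sbc = (send_sbc & 0xffffffff00000000ULL) | serial;

   if (recv_sbc > send_sbc)
      recv_sbc -= 0x100000000ULL;
   return recv_sbc;
}

/*
 * Estimate the frame duration in nanoseconds from the previous sample
 * (last_ust_ns, last_msc) and a new one (ust_us, msc). Returns 0 when no
 * estimate can be made, for example on the first sample.
 */
static int64_t
frame_duration_ns(int64_t last_ust_ns, int64_t last_msc,
                  uint64_t ust_us, uint64_t msc)
{
   int64_t ust_ns = (int64_t)ust_us * 1000;
   int64_t cur_msc = (int64_t)msc;

   if (last_ust_ns && ust_ns > last_ust_ns &&
       last_msc && cur_msc > last_msc)
      return (ust_ns - last_ust_ns) / (cur_msc - last_msc);
   return 0;
}
```

Dividing by the msc difference rather than assuming consecutive frames keeps the estimate valid when several vblanks pass between two completion events.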


[Mesa-dev] [PATCH 10/14] vl/dri3: implement DRI3 BufferFromPixmap

2016-05-11 Thread Leo Liu

We also need to render to the front buffer of a temporary X pixmap;
this is the case when OpenGL is used as the video output for VA-API.
The basic implementation passes the pixmap ID to the X server, which
returns a dma-buf fd; the buffer object is then obtained through
this dma-buf fd.

Signed-off-by: Leo Liu 
---
 src/gallium/auxiliary/vl/vl_winsys_dri3.c | 116 +-
 1 file changed, 113 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index c82da40..b1438b3 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -72,9 +72,21 @@ struct vl_dri3_screen
int cur_back;
 
struct u_rect dirty_areas[BACK_BUFFER_NUM];
+
+   struct vl_dri3_buffer *front_buffer;
+   bool is_pixmap;
 };
 
 static void
+dri3_free_front_buffer(struct vl_dri3_screen *scrn,
+struct vl_dri3_buffer *buffer)
+{
+   xcb_sync_destroy_fence(scrn->conn, buffer->sync_fence);
+   xshmfence_unmap_shm(buffer->shm_fence);
+   FREE(buffer);
+}
+
+static void
 dri3_free_back_buffer(struct vl_dri3_screen *scrn,
 struct vl_dri3_buffer *buffer)
 {
@@ -282,6 +294,7 @@ dri3_set_drawable(struct vl_dri3_screen *scrn, Drawable drawable)
xcb_void_cookie_t cookie;
xcb_generic_error_t *error;
xcb_present_event_t peid;
+   bool ret = true;
 
assert(drawable);
 
@@ -305,6 +318,7 @@ dri3_set_drawable(struct vl_dri3_screen *scrn, Drawable drawable)
   scrn->special_event = NULL;
}
 
+   scrn->is_pixmap = false;
peid = xcb_generate_id(scrn->conn);
cookie =
   xcb_present_select_input_checked(scrn->conn, peid, scrn->drawable,
@@ -314,15 +328,103 @@ dri3_set_drawable(struct vl_dri3_screen *scrn, Drawable drawable)
 
error = xcb_request_check(scrn->conn, cookie);
if (error) {
+  if (error->error_code != BadWindow)
+ ret = false;
+  else
+ scrn->is_pixmap = true;
   free(error);
-  return false;
} else
   scrn->special_event =
  xcb_register_for_special_xge(scrn->conn, &xcb_present_id, peid, 0);
 
dri3_flush_present_events(scrn);
 
-   return true;
+   return ret;
+}
+
+static struct vl_dri3_buffer *
+dri3_get_front_buffer(struct vl_dri3_screen *scrn)
+{
+   xcb_dri3_buffer_from_pixmap_cookie_t bp_cookie;
+   xcb_dri3_buffer_from_pixmap_reply_t *bp_reply;
+   xcb_sync_fence_t sync_fence;
+   struct xshmfence *shm_fence;
+   int fence_fd, *fds;
+   struct winsys_handle whandle;
+   struct pipe_resource templ, *texture = NULL;
+
+   if (scrn->front_buffer) {
+  pipe_resource_reference(&texture, scrn->front_buffer->texture);
+  return scrn->front_buffer;
+   }
+
+   scrn->front_buffer = CALLOC_STRUCT(vl_dri3_buffer);
+   if (!scrn->front_buffer)
+  return NULL;
+
+   fence_fd = xshmfence_alloc_shm();
+   if (fence_fd < 0)
+  goto free_buffer;
+
+   shm_fence = xshmfence_map_shm(fence_fd);
+   if (!shm_fence)
+  goto close_fd;
+
+   bp_cookie = xcb_dri3_buffer_from_pixmap(scrn->conn, scrn->drawable);
+   bp_reply = xcb_dri3_buffer_from_pixmap_reply(scrn->conn, bp_cookie, NULL);
+   if (!bp_reply)
+  goto unmap_shm;
+
+   fds = xcb_dri3_buffer_from_pixmap_reply_fds(scrn->conn, bp_reply);
+   if (fds[0] < 0)
+  goto free_reply;
+
+   memset(&whandle, 0, sizeof(whandle));
+   whandle.type = DRM_API_HANDLE_TYPE_FD;
+   whandle.handle = (unsigned)fds[0];
+   whandle.stride = bp_reply->stride;
+   memset(&templ, 0, sizeof(templ));
+   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW;
+   templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
+   templ.target = PIPE_TEXTURE_2D;
+   templ.last_level = 0;
+   templ.width0 = bp_reply->width;
+   templ.height0 = bp_reply->height;
+   templ.depth0 = 1;
+   templ.array_size = 1;
+   scrn->front_buffer->texture =
+  scrn->base.pscreen->resource_from_handle(scrn->base.pscreen,
+   &templ, &whandle,
+   PIPE_HANDLE_USAGE_READ_WRITE);
+   close(fds[0]);
+   if (!scrn->front_buffer->texture)
+  goto free_reply;
+
+   xcb_dri3_fence_from_fd(scrn->conn,
+  scrn->drawable,
+  (sync_fence = xcb_generate_id(scrn->conn)),
+  false,
+  fence_fd);
+
+   pipe_resource_reference(&texture, scrn->front_buffer->texture);
+   scrn->front_buffer->pixmap = scrn->drawable;
+   scrn->front_buffer->width = bp_reply->width;
+   scrn->front_buffer->height = bp_reply->height;
+   scrn->front_buffer->shm_fence = shm_fence;
+   scrn->front_buffer->sync_fence = sync_fence;
+   free(bp_reply);
+
+   return scrn->front_buffer;
+
+free_reply:
+   free(bp_reply);
+unmap_shm:
+   xshmfence_unmap_shm(shm_fence);
+close_fd:
+   close(fence_fd);
+free_buffer:
+   FREE(scrn->front_buffer);
+   return NULL;
 }
 
 static void
@@ -366,7 +468,9 @@ vl_dri3_screen_texture_from_drawable(struct
