Re: [Mesa-dev] [PATCH 2/9] i965/fs: print writemask_all when it's enabled

2015-11-19 Thread Iago Toral
On Thu, 2015-11-19 at 10:14 -0500, Connor Abbott wrote:
> I think Ken already pushed a similar patch so we can drop this.

I don't see that in master, but sure, we can hold this back if he is
planning to push the same thing.

Iago

> On Thu, Nov 19, 2015 at 5:05 AM, Iago Toral Quiroga  wrote:
> > From: Connor Abbott 
> >
> > Reviewed-by: Iago Toral Quiroga 
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> > b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 2d4ed6a..63dcba7 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -4787,6 +4787,9 @@ fs_visitor::dump_instruction(backend_instruction 
> > *be_inst, FILE *file)
> >   fprintf(file, "1sthalf ");
> > }
> >
> > +   if (inst->force_writemask_all)
> > +  fprintf(file, "WE_all ");
> > +
> > fprintf(file, "\n");
> >  }
> >
> > --
> > 1.9.1
> >
> 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] i965: fix 64-bit immediates in brw_inst(_set)_bits

2015-11-19 Thread Iago Toral
On Thu, 2015-11-19 at 12:08 -0800, Matt Turner wrote:
> On Thu, Nov 19, 2015 at 11:35 AM, Kristian Høgsberg  
> wrote:
> > On Thu, Nov 19, 2015 at 11:24 AM, Matt Turner  wrote:
> >> On Thu, Nov 19, 2015 at 9:30 AM, Kristian Høgsberg  
> >> wrote:
> >>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
> >>> wrote:
>  From: Connor Abbott 
> 
>  If we tried to get/set something that was exactly 64 bits, we would
>  try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
>  1's like we want.
> 
>  v2 (Iago)
>   - Replace ~0 by ~0ull
>   - Removed unnecessary parenthesis
> 
>  Reviewed-by: Iago Toral Quiroga 
>  ---
>   src/mesa/drivers/dri/i965/brw_inst.h | 6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
>  diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
>  b/src/mesa/drivers/dri/i965/brw_inst.h
>  index 4ed95c4..ec08194 100644
>  --- a/src/mesa/drivers/dri/i965/brw_inst.h
>  +++ b/src/mesa/drivers/dri/i965/brw_inst.h
>  @@ -694,7 +694,8 @@ brw_inst_bits(const brw_inst *inst, unsigned high, 
>  unsigned low)
>  high %= 64;
>  low %= 64;
> 
>  -   const uint64_t mask = (1ull << (high - low + 1)) - 1;
>  +   const uint64_t mask = (high - low == 63) ? ~0ull :
>  +  (1ull << (high - low + 1)) - 1;
> >>>
> >>> Can we do
> >>>
> >>> const uint64_t mask = (~0ul >> (64 - (high - low + 1)));
> >>>
> >>> instead?
> >>
> >> I don't think so, because ~0ul is of type unsigned, so right shifting
> >> it shifts in zeros. I was going to make a similar comment on the
> >> original patch -- "-1" is preferable over ~0u with an increasingly
> >> long sequence of l's because it's signed, so it's sign extended to
> >> fill whatever you assign it to. In your code though, since it's an
> >> operand we'd need -1ll, I think...
> >
> > No, shifting in zeros is the whole point. We start out with 64 1 bits,
> > then shift it down enough that we end up with (high - low + 1) 1 bits
> > at the bottom, which is what we're trying to compute.
> 
> Doh. Of course. :)
> 

Thanks Kristian, I'll use your version and add your Rb and Matt's to it.

Iago

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/9] Early fp64 fixes

2015-11-19 Thread Iago Toral
On Thu, 2015-11-19 at 12:19 -0800, Matt Turner wrote:
> For the next patches you send from Connor, please use
> --suppress-cc=author so that every reply doesn't generate a bounce
> message about his disabled @intel email. :)
> 

Yeah, will do :)

Iago

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/9] Early fp64 fixes

2015-11-19 Thread Iago Toral
On Thu, 2015-11-19 at 10:54 -0500, Connor Abbott wrote:
> On Thu, Nov 19, 2015 at 5:05 AM, Iago Toral Quiroga  wrote:
> > These patches are fixes extracted from Connor's fp64 branch that
> > I would like to review and land ahead of the rest. They are
> > independent of the rest of the series, some of them are even
> > general fixes unrelated to fp64 that could fix issues in master,
> > others are fairly trivial and we can easily land them now.
> >
> > It will help reduce the size of a rather big series (115 patches
> > and counting) that is still in development and ease future
> > rebases a little, specially since there is a patch that renames
> > one of the nir types, so we probably want to land that soon.
> >
> > Apparently, Connor sent most of these for review back in August but
> > they did not get reviewed back then. I went ahead and added my Rb to
> > all but one:
> >
> > i965/fs: add stride restrictions for copy propagation
> >
> > I could not verify all the conditions (particularly, the cases where we 
> > allow
> > 0 strides), so it would be nice if someone else could review that or point 
> > me
> > to the place in the docs where it says that 0 strides are safe for these
> > situations. Comments/Reviews to the other patches are also welcome :)
> 
> Well, the docs don't say that it's unsafe either :) But more
> seriously, we use 0 strides all the time -- in particular, we use them
> for uniforms when they get promoted to push constants, since when
> they're pushed to the EU they're packed into registers with, say, 8
> 32-bit things per SIMD8 register. So to get, say, the second uniform,
> we need to say "get the second channel of this register and splat it
> to all the channels of the source," which we can do with a source of
> stride 0 and offset 1.

Yes, that is what we do with set_smear(), right.

Only Haswell has an absolute statement saying that stride must be 1 for
Math instructions and we respect that in the patch, it makes sense.
Thanks for the clarification.

Iago

> >
> > No piglit regressions on IVB.
> >
> > Connor Abbott (8):
> >   i965/fs: print non-1 strides when dumping instructions
> >   i965/fs: print writemask_all when it's enabled
> >   i965: fix 64-bit immediates in brw_inst(_set)_bits
> >   i965/fs: respect force_sechalf/force_writemask_all in CSE
> >   i965/fs: don't propagate cmod when the exec sizes differ
> >   i965/fs: add stride restrictions for copy propagation
> >   i965/vec4: avoid dependency control around Align1 instructions
> >   nir/builder: only read meaningful channels in nir_swizzle()
> >
> > Jason Ekstrand (1):
> >   nir: s/nir_type_unsigned/nir_type_uint
> >
> >  src/gallium/auxiliary/nir/tgsi_to_nir.c|  2 +-
> >  src/glsl/nir/glsl_to_nir.cpp   |  2 +-
> >  src/glsl/nir/nir.h |  2 +-
> >  src/glsl/nir/nir_builder.h |  2 +-
> >  src/glsl/nir/nir_constant_expressions.py   |  2 +-
> >  src/glsl/nir/nir_opcodes.py| 78 
> > +++---
> >  src/glsl/nir/nir_search.c  |  4 +-
> >  src/mesa/drivers/dri/i965/brw_fs.cpp   | 15 +
> >  .../drivers/dri/i965/brw_fs_cmod_propagation.cpp   |  3 +
> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 58 +++-
> >  src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  2 +
> >  src/mesa/drivers/dri/i965/brw_inst.h   |  6 +-
> >  src/mesa/drivers/dri/i965/brw_nir.c|  4 +-
> >  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 +++--
> >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |  2 +-
> >  15 files changed, 142 insertions(+), 57 deletions(-)
> >
> > --
> > 1.9.1
> >
> 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] i965: Use ldexpf() in VF float test set up.

2015-11-19 Thread Matt Turner
---
 src/mesa/drivers/dri/i965/test_vf_float_conversions.cpp | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/test_vf_float_conversions.cpp 
b/src/mesa/drivers/dri/i965/test_vf_float_conversions.cpp
index 6a8bcea..7f03425 100644
--- a/src/mesa/drivers/dri/i965/test_vf_float_conversions.cpp
+++ b/src/mesa/drivers/dri/i965/test_vf_float_conversions.cpp
@@ -40,15 +40,10 @@ void vf_float_conversion_test::SetUp() {
   int ebits = (vf >> 4) & 0x7;
   int mbits = vf & 0xf;
 
-  int e = ebits - 3;
+  float x = 1.0f + mbits / 16.0f;
+  int exp = ebits - 3;
 
-  float value = 1.0f;
-
-  value += mbits / 16.0f;
-
-  value *= exp2f(e);
-
-  vf_to_float[vf] = value;
+  vf_to_float[vf] = ldexpf(x, exp);
}
 }
 
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] i965: Test that nonrepresentable floats cannot be converted to VF.

2015-11-19 Thread Matt Turner
---
 src/mesa/drivers/dri/i965/test_vf_float_conversions.cpp | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/test_vf_float_conversions.cpp 
b/src/mesa/drivers/dri/i965/test_vf_float_conversions.cpp
index 7f03425..7af97d0 100644
--- a/src/mesa/drivers/dri/i965/test_vf_float_conversions.cpp
+++ b/src/mesa/drivers/dri/i965/test_vf_float_conversions.cpp
@@ -93,3 +93,18 @@ TEST_F(vf_float_conversion_test, test_special_case_0)
EXPECT_EQ(f2u(brw_vf_to_float(brw_float_to_vf(+0.0f))), f2u(+0.0f));
EXPECT_EQ(f2u(brw_vf_to_float(brw_float_to_vf(-0.0f))), f2u(-0.0f));
 }
+
+TEST_F(vf_float_conversion_test, test_nonrepresentable_float_input)
+{
+   EXPECT_EQ(brw_float_to_vf(+32.0f), -1);
+   EXPECT_EQ(brw_float_to_vf(-32.0f), -1);
+
+   EXPECT_EQ(brw_float_to_vf(+16.5f), -1);
+   EXPECT_EQ(brw_float_to_vf(-16.5f), -1);
+
+   EXPECT_EQ(brw_float_to_vf(+8.25f), -1);
+   EXPECT_EQ(brw_float_to_vf(-8.25f), -1);
+
+   EXPECT_EQ(brw_float_to_vf(+4.125f), -1);
+   EXPECT_EQ(brw_float_to_vf(-4.125f), -1);
+}
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] i965/vec4: Initialize nir_inputs with src_reg().

2015-11-19 Thread Matt Turner
nir_locals, nir_ssa_values, and nir_system_values are all dst_reg (not
that that makes a whole lot of sense to me), and only nir_inputs is a
src_reg.
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 3d186b4..3b7c28d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -122,7 +122,7 @@ vec4_visitor::nir_setup_inputs()
 {
nir_inputs = ralloc_array(mem_ctx, src_reg, nir->num_inputs);
for (unsigned i = 0; i < nir->num_inputs; i++) {
-  nir_inputs[i] = dst_reg();
+  nir_inputs[i] = src_reg();
}
 
nir_foreach_variable(var, &nir->inputs) {
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] i965: Prevent implicit upcasts to brw_reg.

2015-11-19 Thread Matt Turner
Now that backend_reg inherits from brw_reg, we have to be careful to
avoid the object slicing problem.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   |  6 +++--
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |  4 +--
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  2 +-
 src/mesa/drivers/dri/i965/brw_shader.h | 31 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 15 ++-
 .../drivers/dri/i965/brw_vec4_copy_propagation.cpp |  4 +--
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp   | 10 +++
 7 files changed, 53 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e9c990d..a5f1d8f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -392,7 +392,8 @@ fs_reg::fs_reg(struct brw_reg reg) :
 bool
 fs_reg::equals(const fs_reg &r) const
 {
-   return (memcmp((brw_reg *)this, (brw_reg *)&r, sizeof(brw_reg)) == 0 &&
+   return (memcmp(&this->as_brw_reg(), &r.as_brw_reg(),
+  sizeof(struct brw_reg)) == 0 &&
reg_offset == r.reg_offset &&
subreg_offset == r.subreg_offset &&
!reladdr && !r.reladdr &&
@@ -2037,7 +2038,8 @@ fs_visitor::opt_algebraic()
 if (inst->dst.type != inst->src[0].type)
assert(!"unimplemented: saturate mixed types");
 
-if (brw_saturate_immediate(inst->dst.type, &inst->src[0])) {
+if (brw_saturate_immediate(inst->dst.type,
+   &inst->src[0].as_brw_reg())) {
inst->saturate = false;
progress = true;
 }
diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 426ea57..b72fe33 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -478,14 +478,14 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
acp_entry *entry)
 
   if (inst->src[i].abs) {
  if ((devinfo->gen >= 8 && is_logic_op(inst->opcode)) ||
- !brw_abs_immediate(val.type, &val)) {
+ !brw_abs_immediate(val.type, &val.as_brw_reg())) {
 continue;
  }
   }
 
   if (inst->src[i].negate) {
  if ((devinfo->gen >= 8 && is_logic_op(inst->opcode)) ||
- !brw_negate_immediate(val.type, &val)) {
+ !brw_negate_immediate(val.type, &val.as_brw_reg())) {
 continue;
  }
   }
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 77969c4..ea37ea4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -92,7 +92,7 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned gen)
case ARF:
case FIXED_GRF:
case IMM:
-  brw_reg = *static_cast(reg);
+  brw_reg = reg->as_brw_reg();
   break;
case BAD_FILE:
   /* Probably unused. */
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
b/src/mesa/drivers/dri/i965/brw_shader.h
index a4139cf..52b9aff 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -39,11 +39,21 @@
 #define MAX_VGRF_SIZE 16
 
 #ifdef __cplusplus
-struct backend_reg : public brw_reg
+struct backend_reg : private brw_reg
 {
backend_reg() {}
backend_reg(struct brw_reg reg) : brw_reg(reg) {}
 
+   const brw_reg &as_brw_reg() const
+   {
+  return static_cast(*this);
+   }
+
+   brw_reg &as_brw_reg()
+   {
+  return static_cast(*this);
+   }
+
bool is_zero() const;
bool is_one() const;
bool is_negative_one() const;
@@ -61,6 +71,25 @@ struct backend_reg : public brw_reg
 * For uniforms, this is in units of 1 float.
 */
uint16_t reg_offset;
+
+   using brw_reg::type;
+   using brw_reg::file;
+   using brw_reg::negate;
+   using brw_reg::abs;
+   using brw_reg::address_mode;
+   using brw_reg::subnr;
+   using brw_reg::nr;
+
+   using brw_reg::swizzle;
+   using brw_reg::writemask;
+   using brw_reg::indirect_offset;
+   using brw_reg::vstride;
+   using brw_reg::width;
+   using brw_reg::hstride;
+
+   using brw_reg::f;
+   using brw_reg::d;
+   using brw_reg::ud;
 };
 #endif
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 44893e3..f564ce1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -79,7 +79,7 @@ src_reg::src_reg(struct brw_reg reg) :
 }
 
 src_reg::src_reg(const dst_reg ®) :
-   backend_reg(static_cast(reg))
+   backend_reg(reg.as_brw_reg())
 {
this->reg_offset = reg.reg_offset;
this->reladdr = reg.reladdr;
@@ -137,7 +137,7 @@ dst_reg::dst_reg(struct brw_reg reg) :
 }
 
 dst_reg::dst_reg(const src_reg ®) :
-   backend_reg(static_cast(reg))
+   backend_reg(reg.as_brw_reg())
 {
 

Re: [Mesa-dev] [PATCH] i965/fs: Replace nested ternary with if ladder.

2015-11-19 Thread Matt Turner
On Wed, Nov 18, 2015 at 8:15 AM, Francisco Jerez  wrote:
> Matt Turner  writes:
>
>> Since the types of the expression were
>>
>>bool ? src_reg : (bool ? brw_reg : brw_reg)
>>
>> the result of the second (nested) ternary would be implicitly
>> converted to a src_reg by the src_reg(struct brw_reg) constructor. I.e.,
>>
>>bool ? src_reg : src_reg(bool ? brw_reg : brw_reg)
>>
>> In the next patch, I make backend_reg (the parent of src_reg) inherit
>> from brw_reg, which changes this expression to return brw_reg, which
>> throws away any fields that exist in the classes derived from brw_reg.
>> I.e.,
>>
>>src_reg(bool ? brw_reg(src_reg) : bool ? brw_reg : brw_reg)
>>
>
> The fundamental problem here has nothing to do with ternary operators,
> but with the fact that you made backend_reg inherit from brw_reg even
> though there is no is-a relationship -- More precisely, the Liskov
> substitution principle doesn't hold: Pass a valid program written in
> terms of brw_reg (e.g. brw_MOV()) an fs_reg (or similarly any of the
> vec4 register objects) and you've got a bug (AKA object slicing).  The
> reason is that fs_regs have additional structure and restrictions that a
> valid brw_reg program won't take into account (e.g. additional offset,
> stride, reladdr), and even the methods shared with the base brw_reg
> class have subtly different semantics.  A proper conversion from fs_reg
> to brw_reg amounts to more than just slicing off the additional
> structure of the fs_reg (c.f. brw_reg_from_fs_reg()), what points at
> there being an implemented-in-terms-of relationship rather than an is-a
> relationship.

I agree.

> The typical approach to solve this problem is use private inheritance
> and explicitly expose the fields from the base class you want to be
> public using "using" directives.  That will prevent implicit (and
> generally incorrect) conversion of register subclasses into brw_reg.

I have implemented this locally and I'll Cc you on the patch. The only
other bit I had to do other than what you suggested was add

+   const brw_reg &as_brw_reg() const
+   {
+  return static_cast(*this);
+   }
+
+   brw_reg &as_brw_reg()
+   {
+  return static_cast(*this);
+   }

to backend_reg to allow explicit upcasts, e.g., for use as an argument
of of brw_saturate_immediate(enum brw_reg_type type, struct brw_reg
*reg).

> Another somewhat more wordy approach is use aggregation and expose any
> interesting fields from the brw_reg using accessors, or use completely
> unrelated representations for backend_reg and brw_reg and perform the
> conversion manually (kind of like was done previously, but replacing the
> fixed_hw_reg thing with precisely the subset of brw_reg fields required
> to represent HW_REGs and immediates orthogonally with respect to the
> remaining backend_reg fields).
>
>> Generally this code was gross, and wasn't actually shorter or easier to
>> read than an if ladder.
>
> Generally I find it gross to split up a pure expression into separate
> statements even though there's no control-flow relationship intended.
> The widely used pattern:
>
> | condition0 ? expression0 :
> | condition1 ? expression1 :
> | /* ... */
> | conditionN ? expressionN :
> | default-expression
>
> is at least as readable as the equivalent if-ladder (although this point
> is highly subjective) and frequently saves you from introducing mutable
> variables that only ever need to be assigned once and after that point
> remain constant even though they're not explicitly marked as constants
> (what obscures the dataflow of the program -- This last point is

Right, I see the appeal. Part of the problem was that the ternary
arguments were mixed types, which I'm sure one could argue was fine in
the existing code because there was an fs_reg(brw_reg) constructor. In
any case, this is the problem with implicit conversions...

> probably not subjective at all, but it doesn't apply to this particular
> example).  In other cases chained ternary operators can simplify code by
> allowing the factorization of duplicated code like:
>
> | p ? f(x) :
> | q ? f(y) :
> | /* ... */ :
> | f(z)
>
>   ->
>
> | f(p ? x :
> |   q ? y :
> |   /* ... */ :
> |   z)
>
> In this particular case the duplicated "f" factor you added is just a
> trivial return statement so this benefit is not too important in this

Interesting. I wasn't aware of that.

> case either, and you're left with the purely subjective matter of which
> is the most readable -- In doubt I tend to prefer ternary operators
> instinctively when the alternatives are pure expressions simply for
> consistency and brevity.  You may feel otherwise and I'm not going to
> complain about this change because the difference is purely subjective
> -- But for that same reason calling this "gross" seems like a rather
> gross overstatement to me.

It was judgmental and I apologize for that. At the same time, I
haven't found another person who disagreed with that opinion. :)

Thanks for the email

Re: [Mesa-dev] [PATCH 4/4] automake: loader: don't create an empty dri3 helper

2015-11-19 Thread Vinson Lee
On Thu, Nov 19, 2015 at 9:36 PM, Vinson Lee  wrote:
> On Thu, Nov 19, 2015 at 8:32 AM, Emil Velikov  
> wrote:
>> On 19 November 2015 at 16:33, Emil Velikov  wrote:
>>> From: Emil Velikov 
>>>
>>> Seems that creating an empty one does not fair too well with MacOSX's
>>> ar. Considering that all the users of the helper include it only when
>>> needed, let's reshuffle the makefile.
>>>
>> Forgot to mention - the cause is guesswork, as I'm not that intimate
>> familiar with different AR implementations. Yet this seems the most
>> likely one.
>>
>> Vinson, can you test this ?
>>
>> -Emil
>
> This patch fixed my Mac OS X build error.
>
> Tested-by: Vinson Lee 

Fixed a typo in Tested-by line.

Tested-by: Vinson Lee 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] automake: loader: don't create an empty dri3 helper

2015-11-19 Thread Vinson Lee
On Thu, Nov 19, 2015 at 8:32 AM, Emil Velikov  wrote:
> On 19 November 2015 at 16:33, Emil Velikov  wrote:
>> From: Emil Velikov 
>>
>> Seems that creating an empty one does not fair too well with MacOSX's
>> ar. Considering that all the users of the helper include it only when
>> needed, let's reshuffle the makefile.
>>
> Forgot to mention - the cause is guesswork, as I'm not that intimate
> familiar with different AR implementations. Yet this seems the most
> likely one.
>
> Vinson, can you test this ?
>
> -Emil

This patch fixed my Mac OS X build error.

Tested-by: Vinson Lee 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8 v2] i965: Enable EXT_shader_samples_identical

2015-11-19 Thread Ian Romanick
On 11/19/2015 02:13 PM, Neil Roberts wrote:
> Ian Romanick  writes:
> 
>> Am I correct that nothing special is needed in the vec4 backend? It
>> seems like mcs should know the register size, and the CMP with 0
>> should just do the right thing.
> 
> I think you probably will have to do something for the vec4 backend.
> Currently it generates an instruction like this:
> 
> cmp.z.f0(8) g8<1>.xUD   g9<4,4,1>.xUD   0xUD
> 
> g9 contains the MCS data. If I understand correctly for 16x MSAA the
> second half of the MCS data is in the y component which is currently
> ignored by this instruction. Maybe something like this would work:
> 
>   if (mcs.file == BRW_IMMEDIATE_VALUE) {
>  emit(MOV(dest, src_reg(0u)));
>   } else if ((key_tex->msaa_16 & (1 << sampler))) {
>  dst_reg tmp(this, glsl_type::uint_type);
>  dst_reg swizzled_mcs = mcs;
>  swizzled_mcs.swizzle = BRW_SWIZZLE_;
>  emit(OR(tmp, mcs, swizzled_mcs));
>  emit(CMP(dest, tmp, src_reg(0u), BRW_CONDITIONAL_EQ));
>   } else {
>  emit(CMP(dest, mcs, src_reg(0u), BRW_CONDITIONAL_EQ));
>   }
> 
> Sadly the optimiser doesn't optimise out the extra instruction this
> time. There's probably a better way to do it. On the other hand I can't
> think why anyone would be using this function in the vec4 backend so
> it's probably not worth worrying about.
> 
> I haven't tested this at all. I guess to test it you would have to write
> a GS version of the Piglit test? I think that is the only place that
> uses the vec4 backend on SKL.
> 
> If we don't want to risk this we could always just make it return false
> when msaa_16 is set for the sampler. On the other hand if there is

As far as I could tell, there is no msaa_16 for vec4.  It looks like it
always takes the equivalent of the FS msaa_16 path when gen >= 9.

> currently no test for the 8x version either then we should probably
> assume that neither of them work… maybe it wouldn't be so bad to just
> always return false in the vec4 backend until we've tested it?

Hmm... yeah, I'll do that.  It's an optimization, and I don't think
anyone will miss it in vec4 on any platform.  That way I don't have to
worry about applying a bunch of fixes to the 11.1 branch after I create
tests.  We can either implement it ("fix") in 11.1 or 11.2.

> Regards,
> - Neil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/8] Implement EXT_shader_samples_identical

2015-11-19 Thread Ian Romanick
On 11/19/2015 06:47 AM, Emil Velikov wrote:
> Hi Ian,
> 
> Attempting to high-jack the thread :-P
> 
> On 18 November 2015 at 23:46, Ian Romanick  wrote:
> 
>> I really wanted to get this in the next Mesa release.  For some reason,
>> I thought the branch point was after Thanksgiving (which is next
>> Thursday).  Ken reminded me yesterday that the branch point is actually
>> this Friday. :( As a result, I'm sending it out today to get review as
>> soon as possible.
>>
> Any suggestions on how we make dates more prominent/obvious ? I was
> going to say "we even have it in #irc-devel channel topic" but you
> don't seem to be around these days.

VPN problems have made it basically impossible for me to be on IRC.
It's a long, annoying story.  It's one of those "too busy to fix it"
things too.  Ugh.

> Alternatively you can set a filter in your email client - the subject
> is always "Mesa #VERSION release plan"

All my mesa-dev email goes in one folder, and it's really easy for
single messages to get lost between the 40 patch series.  I like Ilia's
suggestion of using mesa-announce.  That just dumps in my main inbox,
and I'm less likely to miss it there.

One other thing... what time are you planning to make the branch
tomorrow? :)

> Thanks
> Emil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] gallivm: use sampler index 0 for texel fetches

2015-11-19 Thread sroland
From: Roland Scheidegger 

texel fetches don't use any samplers. Previously we just set the same
number for both texture and sampler unit (as per "ordinary" gl style
sampling where the numbers are always the same) however this would trigger
some assertions checking that the sampler index isn't over PIPE_MAX_SAMPLERS
limit elsewhere with d3d10, so just set to 0.
(Fixing the assertion instead isn't really an option, the sampler isn't
really used but might still pass an out-of-bound pointer around and even
copy some things from it.)
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 7d2cd9a..28c7a86 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -2608,7 +2608,12 @@ emit_fetch_texels( struct lp_build_tgsi_soa_context *bld,
params.type = bld->bld_base.base.type;
params.sample_key = sample_key;
params.texture_index = unit;
-   params.sampler_index = unit;
+   /*
+* sampler not actually used, set to 0 so it won't exceed PIPE_MAX_SAMPLERS
+* and trigger some assertions with d3d10 where the sampler view number
+* can exceed this.
+*/
+   params.sampler_index = 0;
params.context_ptr = bld->context_ptr;
params.thread_data_ptr = bld->thread_data_ptr;
params.coords = coords;
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Fix JIP to properly skip over unrelated control flow.

2015-11-19 Thread Kenneth Graunke
On Thursday, November 19, 2015 02:05:44 PM Kenneth Graunke wrote:
> We've apparently always been botching JIP for sequences such as:
> 
>do
>cmp.f0.0 ...
>(+f0.0) break
>...
>if
>   ...
>else
>   ...
>endif
>...
>while
> 
> Normally, UIP is supposed to point to the final destination of the jump,
> while in nested control flow, JIP is supposed to point to the end of the
> current nesting level.  It essentially bounces out of the current nested
> control flow, to an instruction that has a JIP which bounces out another
> level, and so on.
> 
> In the above example, when setting JIP for the BREAK, we call
> brw_find_next_block_end(), which begins a search after the BREAK for the
> next ENDIF, ELSE, WHILE, or HALT.  It ignores the IF and finds the ELSE,
> setting JIP there.
> 
> This makes no sense at all.  The break is supposed to skip over the
> whole if/else/endif block entirely.  They have a sibling relationship,
> not a nesting relationship.
> 
> This patch fixes brw_find_next_block_end() to track depth as it does
> its search, and ignore anything not at depth 0.  So when it sees the
> IF, it ignores everything until after the ENDIF.  That way, it finds
> the end of the right block.
> 
> Caught while debugging a tessellation shader - no apparent effect on
> Piglit.  I did look for actual applications that were affected, and
> found that GLBenchmark Manhattan had a BREAK with a bogus JIP.
> 
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Kenneth Graunke 

I tried pretty hard to produce a Piglit test that showed an actual
problem from doing this wrong - and I wasn't able to.

It seems it just steps through some extra instructions which do
nothing, and is pretty harmless.

So I don't think this should actually go to stable after all.




signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 13/14] glsl: add subroutine index qualifier support

2015-11-19 Thread Timothy Arceri
On Fri, 2015-11-20 at 14:25 +1100, Timothy Arceri wrote:
> Hi Dave/Tapani,
> 
> This is the last patch in this series unreviewed is either of you able to
> review is as you guys implemented the respective extensions.

Forgot to say that there are piglit compile tests in master under
 ARB_explicit_uniform_location

and an unreviewed patch that test the correct values are returned by progrm
query

http://patchwork.freedesktop.org/patch/63795/

> 
> Thanks,
> Tim
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 13/14] glsl: add subroutine index qualifier support

2015-11-19 Thread Timothy Arceri
Hi Dave/Tapani,

This is the last patch in this series unreviewed is either of you able to
review is as you guys implemented the respective extensions.

Thanks,
Tim
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 2/2] radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values

2015-11-19 Thread Michel Dänzer
On 20.11.2015 05:53, Emil Velikov wrote:
> On 19 November 2015 at 20:37, Tom Stellard  wrote:
>> On Wed, Nov 18, 2015 at 05:43:31PM +, Emil Velikov wrote:
>>> Hi Tom,
>>>
>>> Please flip the order of the patches and drop the now patch 1/2 from
>>> the stable queue.
>>>
>>
>> I'm confused about why I need to flip the order of the patches.
>>
> As mentioned on IRC - patch 1 is not a bugfix and despite how trivial
> I'd rather have only bugfixes in stable.
> Is flipping things too much to ask ?

It would make the current patch 2 a bit ugly, because it would use the
ls_rsrc field even for non-LS shaders.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now

2015-11-19 Thread Michel Dänzer
On 20.11.2015 02:11, Nicolai Hähnle wrote:
> On 19.11.2015 17:37, Tom Stellard wrote:
>> On Thu, Nov 19, 2015 at 11:17:12AM +0100, Nicolai Hähnle wrote:
>>> On 19.11.2015 03:55, Tom Stellard wrote:
 On Thu, Nov 19, 2015 at 11:31:55AM +0900, Michel Dänzer wrote:
> From: Michel Dänzer 
>
> llvm.exp2.f32 doesn't work in some cases yet.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709
> Signed-off-by: Michel Dänzer 
> ---
>
> Once the problem is fixed in the LLVM AMDGPU backend, we can re-enable
> llvm.exp2.f32 for the fixed LLVM version.
>
>   src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index ac99e73..c94f109 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -1539,7 +1539,7 @@ void radeon_llvm_context_init(struct
> radeon_llvm_context * ctx)
>   bld_base->op_actions[TGSI_OPCODE_ENDIF].emit = endif_emit;
>   bld_base->op_actions[TGSI_OPCODE_ENDLOOP].emit = endloop_emit;
>   bld_base->op_actions[TGSI_OPCODE_EX2].emit =
> build_tgsi_intrinsic_nomem;
> -bld_base->op_actions[TGSI_OPCODE_EX2].intr_name =
> "llvm.exp2.f32";
> +bld_base->op_actions[TGSI_OPCODE_EX2].intr_name =
> "llvm.AMDIL.exp.";

 Do we want a native instruction here, or do we want IEEE precise exp2?
 If it's the former then we shouldn't be using llvm.exp2.f32 anyway.

 I know that we need to use llvm.AMDIL.exp. for older LLVM, but for
 newer
 LLVM, I would really like to start doing intrinsics the correct
 way.  In
 this case, it means adding an llvm.amdgcn.exp.f32 intrinsic to
 include/llvm/IR/IntrinsicsAMDGPU.td.  In the section with the amdgcn
 TargetPrefix.
>>>
>>> Doesn't AMDGPU currently emit native instructions for the various
>>> math intrinsics anyway? If your plan is to change that and we add
>>> our own intrinsics, what's the plan to still benefit from existing
>>> LLVM optimization passes?
>>>
>>
>> AMDGPU does emit native instruction for the math intrinsics, but this is
>> wrong in most cases, because it assumes that a less precise
>> calculation is
>> acceptable when this is not always the case.
> 
> Okay, that makes sense.
> 
>> This a good point about optimizations.  I think the main benefits we
>> get from
>> using the LLVM intrinsic are constant folding and also speculative
>> execution.
>>
>> An alternate solution would be to use the llvm intrinsic, but then
>> communicate
>> to the backend via a function attribute that it is OK to use the less
>> precise
>> version of exp.
> 
> I like that alternate solution.
> 
>>> FWIW, the original bug that Michel addresses here is caused by LLVM
>>> libcall optimizations replacing the exp2 _intrinsic_ by a ldexp
>>> _libcall_, which AMDGPU codegen cannot deal with. I suggested to
>>> address this with http://reviews.llvm.org/D14327. Could you take a
>>> look at that?
>>>
>>
>> Please take a look at the comment I added to this review.
> 
> Done :)

Meanwhile, can I get an R-b for this patch to fix the immediate
regression in Mesa Git until LLVM is fixed (and for released versions of
LLVM)?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Ian Romanick
On 11/19/2015 05:40 PM, Nanley Chery wrote:
> 
> On Thu, Nov 19, 2015 at 12:13 PM, Brian Paul  > wrote:
> 
> Hi Nanley,
> 
>  
> Hi Brian,
>  
> 
> Maybe you can fix an issue I have with the new extension code.
> 
> Previously, I could do something like export
> MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no
> longer see it in the GL_EXTENSIONS string, even if it was an "always
> on" extension.
> 
> Now when I try that I get:
> 
> Mesa 11.1.0-devel implementation error: Trying to disable
> permanently enabled extensions: GL_ARB_get_texture_sub_image
> Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
> 
> The whole point of the "-GL_EXT_foobar" syntax was to hide an
> extension from the application when it queries the driver's extensions.
> 
> Can you please fix this so it works as before?
> 
> 
> I have two branches that provide the ability to disable permanently
> enabled extensions:
> 1. The first only modifies the extension strings and is located here:
> http://cgit.freedesktop.org/~nchery/mesa/commit/?h=mod_always_on
> 
> 2. The second modifies the extension strings and disables the extension
> within the driver (assuming appropriate the helper function is used). It
> also provides some performance benefits. :
> http://cgit.freedesktop.org/~nchery/mesa/commit/?h=init_ext_vals
> 
> 
> I'd appreciate any feedback on the two approaches as I work to get the
> feature upstreamed.

I think #2 might be better, but there's a lot of churn.  I don't know
that we want that much churn right around the time of the release branch
point, and I think it would be good to have this resolved in 11.1.  I
also have a few bits of feedback in #2, so it might take a couple
iterations before that could land.

Whatever we do, I think we should create some 'make check' tests to
verify that the functionality continues to work in the future.

> Regards,
> Nanley
>  
> 
> Thanks.
> 
> -Brian
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org 
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/5] i965/gen9: Don't allow the RGBX formats for texturing/rendering

2015-11-19 Thread Ben Widawsky
On Thu, Nov 19, 2015 at 04:25:21PM +0100, Neil Roberts wrote:
> The RGBX surface formats aren't renderable so we internally remap them
> to RGBA when rendering. They are retained as RGBX when used as
> textures. However since the previous patch fast clears are disabled
> for surfaces that use a different format for rendering than for
> texturing. To avoid this situation we can just pretend not to support
> RGBX formats at all. This will cause the upper layers of mesa to pick
> an RGBA format internally instead. This should be safe because we
> always override the alpha component to 1.0 for RGBX in the texture
> swizzle anyway. We could also do this for all gens except that it's a
> bit more difficult when the hardware doesn't support texture
> swizzling. Gens using the blorp have further problems because that
> doesn't implement this swizzle override.
> ---
>  src/mesa/drivers/dri/i965/brw_surface_formats.c | 28 
> +
>  1 file changed, 28 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
> b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> index 7c38431..7594aca 100644
> --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> @@ -733,6 +733,34 @@ brw_init_surface_formats(struct brw_context *brw)
> if (brw->gen >= 8)
>ctx->TextureFormatSupported[MESA_FORMAT_Z_UNORM16] = true;
>  
> +   /* The RGBX formats are not renderable. Normally these get mapped
> +* internally to RGBA formats when rendering. However on Gen9+ when this
> +* internal override is used fast clears don't work so they are disabled 
> in
> +* brw_meta_fast_clear. To avoid this problem we can just pretend not to
> +* support RGBX formats at all. This will cause the upper layers of Mesa 
> to
> +* pick the RGBA formats instead. This works fine because when it is used

A lot of these formats are already unsupported for fast clears. In fact, I
believe only MESA_FORMAT_R8G8B8X8_UNORM is a problem. Are you trying to
accomplish something else here as well?

> +* as a texture source the swizzle state is programmed to force the alpha
> +* channel to 1.0 anyway. We could also do this for all gens except that
> +* it's a bit more difficult when the hardware doesn't support texture
> +* swizzling. Gens using the blorp have further problems because that
> +* doesn't implement this swizzle override. We don't need to do this for
> +* BGRX because that actually is supported natively on Gen8+.
> +*/
> +   if (brw->gen >= 9) {
> +  static const mesa_format rgbx_formats[] = {
> + MESA_FORMAT_R8G8B8X8_UNORM,
> + MESA_FORMAT_R8G8B8X8_SRGB,
> + MESA_FORMAT_RGBX_UNORM16,
> + MESA_FORMAT_RGBX_FLOAT16,
> + MESA_FORMAT_RGBX_FLOAT32
> +  };
> +
> +  for (int i = 0; i < ARRAY_SIZE(rgbx_formats); i++) {
> + ctx->TextureFormatSupported[rgbx_formats[i]] = false;
> + brw->format_supported_as_render_target[rgbx_formats[i]] = false;
> +  }
> +   }
> +


> /* On hardware that lacks support for ETC1, we map ETC1 to RGBX
>  * during glCompressedTexImage2D(). See intel_mipmap_tree::wraps_etc1.
>  */
> -- 
> 1.9.3
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/5] i965: Check base format to determine whether to use tiled memcpy

2015-11-19 Thread Ben Widawsky
On Thu, Nov 19, 2015 at 04:25:19PM +0100, Neil Roberts wrote:
> The tiled memcpy doesn't work for copying from RGBX to RGBA because it
> doesn't override the alpha component to 1.0. Commit 2cebaac479d4 added
> a check to disable it for RGBX formats by looking at the TexFormat.
> However a lot of the rest of the code base is written with the
> assumption that an RGBA texture can be used internally to implement a
> GL_RGB texture. If that is done then this check breaks. This patch
> makes it instead check the base format of the texture which I think
> more directly matches the intention.
> 
> Cc: Jason Ekstrand 


I really don't know the corner cases well enough, but concept seems good to me.
In particular, I assume this is now going to return false for more cases than
previously - and that's okay, I guess?

> ---
>  src/mesa/drivers/dri/i965/intel_pixel_read.c | 7 ---
>  src/mesa/drivers/dri/i965/intel_tex_image.c  | 7 ---
>  2 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c 
> b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> index 9bcbbd1..c8aef65 100644
> --- a/src/mesa/drivers/dri/i965/intel_pixel_read.c
> +++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> @@ -135,10 +135,11 @@ intel_readpixels_tiled_memcpy(struct gl_context * ctx,
>return false;
>  
> /* We can't handle copying from RGBX or BGRX because the tiled_memcpy
> -* function doesn't set the last channel to 1.
> +* function doesn't set the last channel to 1. Note this checks BaseFormat
> +* rather than TexFormat in case the RGBX format is being simulated with 
> an
> +* RGBA format.
>  */
> -   if (rb->Format == MESA_FORMAT_B8G8R8X8_UNORM ||
> -   rb->Format == MESA_FORMAT_R8G8B8X8_UNORM)
> +   if (rb->_BaseFormat == GL_RGB)
>return false;
>  
> if (!intel_get_memcpy(rb->Format, format, type, &mem_copy, &cpp,
> diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
> b/src/mesa/drivers/dri/i965/intel_tex_image.c
> index 34b91e8..e3710da7 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex_image.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
> @@ -399,10 +399,11 @@ intel_gettexsubimage_tiled_memcpy(struct gl_context 
> *ctx,
>return false;
>  
> /* We can't handle copying from RGBX or BGRX because the tiled_memcpy
> -* function doesn't set the last channel to 1.
> +* function doesn't set the last channel to 1. Note this checks BaseFormat
> +* rather than TexFormat in case the RGBX format is being simulated with 
> an
> +* RGBA format.
>  */
> -   if (texImage->TexFormat == MESA_FORMAT_B8G8R8X8_UNORM ||
> -   texImage->TexFormat == MESA_FORMAT_R8G8B8X8_UNORM)
> +   if (texImage->_BaseFormat == GL_RGB)
>return false;
>  
> if (!intel_get_memcpy(texImage->TexFormat, format, type, &mem_copy, &cpp,
> -- 
> 1.9.3
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] i965/gen9: Don't disallow fast clear for MSRT formats matching render

2015-11-19 Thread Ben Widawsky
On Thu, Nov 19, 2015 at 04:25:17PM +0100, Neil Roberts wrote:
> Previously fast clear was disallowed on Gen9 for MSRTs with the claim
> that some formats don't work but we didn't understand why. On further
> investigation it seems the formats that don't work are the ones where
> the render surface format is being overriden to a different format
> than the one used for texturing. The one used for texturing is not
> actually a renderable format. It arguably makes sense that the sampler
> hardware doesn't handle the fast color correctly in these cases
> because it shouldn't be possible to end up with a fast cleared surface
> that is non-renderable.
> 
> This patch changes the limitation to prevent fast clear for surfaces
> where the format for rendering is overriden.
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index 1f8bfdf..f2e894a 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -48,6 +48,7 @@
>  #include "brw_defines.h"
>  #include "brw_context.h"
>  #include "brw_draw.h"
> +#include "brw_state.h"
>  #include "intel_fbo.h"
>  #include "intel_batchbuffer.h"
>  
> @@ -549,11 +550,17 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
> gl_framebuffer *fb,
>if (brw->gen < 7)
>   clear_type = REP_CLEAR;
>  
> -  /* Certain formats have unresolved issues with sampling from the MCS
> -   * buffer on Gen9. This disables fast clears altogether for MSRTs until
> -   * we can figure out what's going on.
> +  /* If we're mapping the render format to a different format than the
> +   * format we use for texturing then it is a bit questionable whether it
> +   * should be possible to use a fast clear. Although we only actually
> +   * render using a renderable format, without the override workaround it
> +   * wouldn't be possible to have a non-renderable surface in a fast 
> clear
> +   * state so the hardware probably legitimately doesn't need to support
> +   * this case. At least on Gen9 this really does seem to cause problems.
> */
> -  if (brw->gen >= 9 && irb->mt->num_samples > 1)
> +  if (brw->gen >= 9 &&
> +  brw_format_for_mesa_format(irb->mt->format) !=
> +  brw->render_target_format[irb->mt->format])

Could you just do !brw->format_supported_as_render_target[irb->mt->format]?

>   clear_type = REP_CLEAR;
>  
>if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)

I forget, did you find failures for this in the non-MSRT case? If not, maybe we
could skip this patch and just take the rest of the series? That way we can
avoid the "perf regression" of RGBX clears which we do hit on certain workloads.

Assuming we have a failure in non-MSRT case, and we can't just bypass this
patch, this is:
Reviewed-by: Ben Widawsky 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/gen8: Allow rendering to B8G8R8X8

2015-11-19 Thread Ben Widawsky
On Thu, Nov 19, 2015 at 04:25:18PM +0100, Neil Roberts wrote:
> Since Gen8 this is allowed as a rendering target so we don't need to
> override it to B8G8R8A8. This is helpful on Gen9+ where using this
> override causes fast clears not to work.
> ---
>  src/mesa/drivers/dri/i965/brw_surface_formats.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
> b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> index 55e7e64..7c38431 100644
> --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> @@ -167,8 +167,8 @@ const struct surface_format_info surface_formats[] = {
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   I32_FLOAT)
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   L32_FLOAT)
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   A32_FLOAT)
> -   SF( Y,  Y,  x,  Y,  x,  x,  x,  x, 60,   90,   B8G8R8X8_UNORM)
> -   SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
> +   SF( Y,  Y,  x,  Y, 80, 80,  x,  x, 60,   90,   B8G8R8X8_UNORM)
> +   SF( Y,  Y,  x,  x, 80, 80,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM_SRGB)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R9G9B9E5_SHAREDEXP)
> @@ -670,9 +670,10 @@ brw_init_surface_formats(struct brw_context *brw)
> * mask writes to alpha (ala glColorMask) and reconfigure the
> * alpha blending hardware to use GL_ONE (or GL_ZERO) for
> * cases where GL_DST_ALPHA (or GL_ONE_MINUS_DST_ALPHA) is
> -   * used.
> +   * used. On Gen8+ BGRX is actually allowed (but not RGBX).
> */
> -  render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
> + if (gen < tinfo->render_target)
> +render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
>break;
>case BRW_SURFACEFORMAT_R8G8B8X8_UNORM:
>   render = BRW_SURFACEFORMAT_R8G8B8A8_UNORM;

There's probably something I don't know about the format naming but at the top
of the of the comment you modified they refer to the formats with the components
reversed (XRGB and ARGB) whereas you named them as in order (BGRX).

BTW, I pointed this out to Nanley, but I didn't see a patch, I think we need to
add is_braswell (and maybe broxton)) to the gen += 5 at the top of this
function.

lgtm
Reviewed-by: Ben Widawsky 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Ian Romanick
On 11/19/2015 12:30 PM, Ilia Mirkin wrote:
> On Thu, Nov 19, 2015 at 3:25 PM, Brian Paul  wrote:
>> On 11/19/2015 01:19 PM, Ilia Mirkin wrote:
>>> On Thu, Nov 19, 2015 at 3:13 PM, Brian Paul  wrote:

 Hi Nanley,

 Maybe you can fix an issue I have with the new extension code.

 Previously, I could do something like export
 MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer
 see
 it in the GL_EXTENSIONS string, even if it was an "always on" extension.
>>>
>>> How sure are you that this worked?
>>
>> I'm pretty sure it worked that way when I first wrote it.
>>
>>> AFAIK there was no way to
>>> enable/disable an always-on ext.
>>
>> That's not the point.  The point is to change what
>> glGetString(GL_EXTENSIONS) (or the glGetStringi() method) returns to the
>> application.  I don't care if the underlying functionality is still working
>> in Mesa.  We're trying to tell the application to not use a feature.  I've
>> found this extremely helpful when debugging some apps.
>>
>>> That env var never affected the
>>> extension strings, but only the actual ctx->Extensions struct, which
>>> would indirectly modify the ext string. If the entry in that ext table
>>> was a dummy one, it had logic to not futz with it. (But it did so
>>> silently.)
>>
>> Yeah, I don't care about the table.  I only care about the return value of
>> glGetString(GL_EXTENSIONS).
> 
> I was pretty annoyed at that too -- having it affect exts returned by
> the driver seemed to be the important bit, but I think I investigated
> why it didn't work ~6 months ago, and came to the above conclusion
> [that it only futzed with the table]. In fact recently someone was
> unable to work around an ARB_dsa shortcoming in nouveau by disabling
> it with an override.

Ugh... How much should this work?  Extensions are used to determine GL
versions, and compute_version can't check that "always on" extensions
have been disabled.  If you try to disable GL_EXT_texture, you'll still
get at least OpenGL 1.1. :)  Is it okay that the version override is
still needed to disable newer versions?

> IMHO it makes sense for it to affect both the table and the ext
> string... i.e. if there's a ctx->Extensions.foo listed, it should turn
> that on/off, but if there isn't, it should still touch up the returned
> ext list.
> 
>   -ilia
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Nanley Chery
On Thu, Nov 19, 2015 at 12:13 PM, Brian Paul  wrote:

> Hi Nanley,
>
>
Hi Brian,


> Maybe you can fix an issue I have with the new extension code.
>
> Previously, I could do something like export
> MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer
> see it in the GL_EXTENSIONS string, even if it was an "always on" extension.
>
> Now when I try that I get:
>
> Mesa 11.1.0-devel implementation error: Trying to disable permanently
> enabled extensions: GL_ARB_get_texture_sub_image
> Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
>
> The whole point of the "-GL_EXT_foobar" syntax was to hide an extension
> from the application when it queries the driver's extensions.
>
> Can you please fix this so it works as before?
>
>
I have two branches that provide the ability to disable permanently enabled
extensions:
1. The first only modifies the extension strings and is located here:
http://cgit.freedesktop.org/~nchery/mesa/commit/?h=mod_always_on
2. The second modifies the extension strings and disables the extension
within the driver (assuming appropriate the helper function is used). It
also provides some performance benefits. :
http://cgit.freedesktop.org/~nchery/mesa/commit/?h=init_ext_vals

I'd appreciate any feedback on the two approaches as I work to get the
feature upstreamed.

Regards,
Nanley


> Thanks.
>
> -Brian
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Add src/dst interference for certain instructions with hazards.

2015-11-19 Thread Matt Turner
From: Kenneth Graunke 

When working on tessellation shaders, I created some vec4 virtual
opcodes for creating message headers through a sequence like:

   mov(8) g7<1>UD  0xUD{ align1 WE_all 1Q compacted };
   mov(1) g7.5<1>UD0x0100UD{ align1 WE_all };
   mov(1) g7<1>UD  g0<0,1,0>UD { align1 WE_all compacted };
   mov(1) g7.3<1>UDg8<0,1,0>UD { align1 WE_all };

This is done in the generator since the vec4 backend can't handle align1
regioning.  From the visitor's point of view, this is a single opcode:

   hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.:UD

Normally, there's no hazard between sources and destinations - an
instruction (naturally) reads its sources, then writes the result to the
destination.  However, when the virtual instruction generates multiple
hardware instructions, we can get into trouble.

In the above example, if the register allocator assigned vgrf7 and vgrf8
to the same hardware register, then we'd clobber the source with 0 in
the first instruction, and read back the wrong value in the last one.

It occured to me that this is exactly the same problem we have with
SIMD16 instructions that use W/UW or B/UB types with 0 stride.  The
hardware implicitly decodes them as two SIMD8 instructions, and with
the overlapping regions, the first would clobber the second.

Previously, we handled that by incrementing the live range end IP by 1,
which works, but is excessive: the next instruction doesn't actually
care about that.  It might also be the end of control flow.  This might
keep values alive too long.  What we really want is to say "my source
and destinations interfere".

This patch creates new infrastructure for doing just that, and teaches
the register allocator to add interference when there's a hazard.  For
my vec4 case, we can determine this by switching on opcodes.  For the
SIMD16 case, we just move the existing code there.

I audited our existing virtual opcodes that generate multiple
instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this
treatment as well, but no others.

Reviewed-by: Matt Turner 
Signed-off-by: Kenneth Graunke 
---
[mattst88] Rebased, applied R-b, and ran shader-db on ILK and HSW (no change)

 src/mesa/drivers/dri/i965/brw_fs.cpp   | 65 ++
 .../drivers/dri/i965/brw_fs_live_variables.cpp | 36 +---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  | 13 +
 src/mesa/drivers/dri/i965/brw_ir_fs.h  |  1 +
 src/mesa/drivers/dri/i965/brw_ir_vec4.h|  1 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 29 ++
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 13 +
 7 files changed, 123 insertions(+), 35 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e9c990d..b1f0164 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -299,6 +299,71 @@ fs_inst::is_send_from_grf() const
}
 }
 
+/**
+ * Returns true if this instruction's sources and destinations cannot
+ * safely be the same register.
+ *
+ * In most cases, a register can be written over safely by the same
+ * instruction that is its last use.  For a single instruction, the
+ * sources are dereferenced before writing of the destination starts
+ * (naturally).
+ *
+ * However, there are a few cases where this can be problematic:
+ *
+ * - Virtual opcodes that translate to multiple instructions in the
+ *   code generator: if src == dst and one instruction writes the
+ *   destination before a later instruction reads the source, then
+ *   src will have been clobbered.
+ *
+ * - SIMD16 compressed instructions with certain regioning (see below).
+ *
+ * The register allocator uses this information to set up conflicts between
+ * GRF sources and the destination.
+ */
+bool
+fs_inst::has_source_and_destination_hazard() const
+{
+   switch (opcode) {
+   case FS_OPCODE_PACK_HALF_2x16_SPLIT:
+  /* Multiple partial writes to the destination */
+  return true;
+   default:
+  /* The SIMD16 compressed instruction
+   *
+   * add(16)  g4<1>F  g4<8,8,1>F   g6<8,8,1>F
+   *
+   * is actually decoded in hardware as:
+   *
+   * add(8)   g4<1>F  g4<8,8,1>F   g6<8,8,1>F
+   * add(8)   g5<1>F  g5<8,8,1>F   g7<8,8,1>F
+   *
+   * Which is safe.  However, if we have uniform accesses
+   * happening, we get into trouble:
+   *
+   * add(8)   g4<1>F  g4<0,1,0>F   g6<8,8,1>F
+   * add(8)   g5<1>F  g4<0,1,0>F   g7<8,8,1>F
+   *
+   * Now our destination for the first instruction overwrote the
+   * second instruction's src0, and we get garbage for those 8
+   * pixels.  There's a similar issue for the pre-gen6
+   * pixel_x/pixel_y, which are registers of 16-bit values and thus
+   * would get stomped by the first decode as well.
+   */
+  if (exec_size =

Re: [Mesa-dev] [RFC 3/4] glsl: avoid linker and user varying location to overlap

2015-11-19 Thread Timothy Arceri
On Sun, 2015-10-25 at 15:01 +0100, Gregory Hainaut wrote:
> Current behavior on the interface matching:
> 
> layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
> out1; // Assigned to VARYING_SLOT_VAR0 by the linker
> 
> New behavior on the interface matching:
> 
> layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
> out1; // Assigned to VARYING_SLOT_VAR1 by the linker
> 
> piglit: arb_separate_shader_object-rendezvous_by_name
> 
> v4:
> * Fix variable name in assert
> 
> Signed-off-by: Gregory Hainaut 
> ---
>  src/glsl/link_varyings.cpp | 46 +++
> ---
>  1 file changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
> index 7e77a67..67d04cb 100644
> --- a/src/glsl/link_varyings.cpp
> +++ b/src/glsl/link_varyings.cpp
> @@ -766,7 +766,7 @@ public:
> gl_shader_stage consumer_stage);
> ~varying_matches();
> void record(ir_variable *producer_var, ir_variable *consumer_var);
> -   unsigned assign_locations();
> +   unsigned assign_locations(uint64_t reserved_slots);
> void store_locations() const;
>  
>  private:
> @@ -986,7 +986,7 @@ varying_matches::record(ir_variable *producer_var,
> ir_variable *consumer_var)
>   * passed to varying_matches::record().
>   */
>  unsigned
> -varying_matches::assign_locations()
> +varying_matches::assign_locations(uint64_t reserved_slots)
>  {
> /* Sort varying matches into an order that makes them easy to pack. */
> qsort(this->matches, this->num_matches, sizeof(*this->matches),
> @@ -1013,6 +1013,10 @@ varying_matches::assign_locations()
>!= this->matches[i].packing_class) {
>   *location = ALIGN(*location, 4);
>}
> +  while ((*location < MAX_VARYING * 4u) &&
> +(reserved_slots & (1u << *location / 4u))) {
> + *location = ALIGN(*location + 1, 4);
> +  }
>  
>this->matches[i].generic_location = *location;
>  
> @@ -1376,6 +1380,38 @@ canonicalize_shader_io(exec_list *ir, enum
> ir_variable_mode io_mode)
>  }
>  
>  /**
> + * Generate a bitfield map of the already reserved slots for a shader.

Maybe change this to:

Generate a bitfield map of the explicit locations for shader varyings.

Otherwise:

Reviewed-by: Timothy Arceri  + *
> + * In theory a 32 bits value will be enough but a 64 bits value is future
> proof.
> + */
> +uint64_t
> +reserved_varying_slot(struct gl_shader *stage, ir_variable_mode io_mode)
> +{
> +   assert(io_mode == ir_var_shader_in || io_mode == ir_var_shader_out);
> +   assert(MAX_VARYING <= 64); /* avoid an overflow of the returned value */
> +
> +   uint64_t slots = 0;
> +   int var_slot;
> +
> +   if (!stage)
> +  return slots;
> +
> +   foreach_in_list(ir_instruction, node, stage->ir) {
> +  ir_variable *const var = node->as_variable();
> +
> +  if (var == NULL || var->data.mode != io_mode || !var
> ->data.explicit_location)
> + continue;
> +
> +  var_slot = var->data.location - VARYING_SLOT_VAR0;
> +  if (var_slot >= 0 && var_slot < MAX_VARYING)
> + slots |= 1u << var_slot;
> +   }
> +
> +   return slots;
> +}
> +
> +
> +/**
>   * Assign locations for all variables that are produced in one pipeline
> stage
>   * (the "producer") and consumed in the next stage (the "consumer").
>   *
> @@ -1550,7 +1586,11 @@ assign_varying_locations(struct gl_context *ctx,
>   matches.record(matched_candidate->toplevel_var, NULL);
> }
>  
> -   const unsigned slots_used = matches.assign_locations();
> +   const uint64_t reserved_slots =
> +  reserved_varying_slot(producer, ir_var_shader_out) |
> +  reserved_varying_slot(consumer, ir_var_shader_in);
> +
> +   const unsigned slots_used = matches.assign_locations(reserved_slots);
> matches.store_locations();
>  
> for (unsigned i = 0; i < num_tfeedback_decls; ++i) {
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92706] glBlitFramebuffer refuses to blit RGBA to RGB with MSAA

2015-11-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92706

--- Comment #7 from EoD  ---
OpenGL renderer string: Gallium 0.4 on AMD BARTS (DRM 2.43.0, LLVM 3.7.0)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.1.0-devel
(git-0cfc130)

After applying both patches, the patched piglit test passes just fine:

http://patchwork.freedesktop.org/patch/63670/
http://patchwork.freedesktop.org/patch/63672/

PIGLIT: {"result": "pass" }

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Fix JIP to properly skip over unrelated control flow.

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 2:05 PM, Kenneth Graunke  wrote:
> We've apparently always been botching JIP for sequences such as:

Wrong since Dec 1 2015! Nice find.

>
>do
>cmp.f0.0 ...
>(+f0.0) break
>...
>if
>   ...
>else
>   ...
>endif
>...
>while
>
> Normally, UIP is supposed to point to the final destination of the jump,
> while in nested control flow, JIP is supposed to point to the end of the
> current nesting level.  It essentially bounces out of the current nested
> control flow, to an instruction that has a JIP which bounces out another
> level, and so on.
>
> In the above example, when setting JIP for the BREAK, we call
> brw_find_next_block_end(), which begins a search after the BREAK for the
> next ENDIF, ELSE, WHILE, or HALT.  It ignores the IF and finds the ELSE,
> setting JIP there.
>
> This makes no sense at all.  The break is supposed to skip over the
> whole if/else/endif block entirely.  They have a sibling relationship,
> not a nesting relationship.
>
> This patch fixes brw_find_next_block_end() to track depth as it does
> its search, and ignore anything not at depth 0.  So when it sees the
> IF, it ignores everything until after the ENDIF.  That way, it finds
> the end of the right block.
>
> Caught while debugging a tessellation shader - no apparent effect on
> Piglit.  I did look for actual applications that were affected, and
> found that GLBenchmark Manhattan had a BREAK with a bogus JIP.
>
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index da1ddfd..e457fd2 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -2604,17 +2604,27 @@ brw_find_next_block_end(struct brw_codegen *p, int 
> start_offset)
> void *store = p->store;
> const struct brw_device_info *devinfo = p->devinfo;
>
> +   int depth = 0;
> +
> for (offset = next_offset(devinfo, store, start_offset);
>  offset < p->next_insn_offset;
>  offset = next_offset(devinfo, store, offset)) {
>brw_inst *insn = store + offset;
>
>switch (brw_inst_opcode(devinfo, insn)) {
> +  case BRW_OPCODE_IF:
> + depth++;
> + break;
>case BRW_OPCODE_ENDIF:
> + if (depth == 0)
> +return offset;
> + depth--;
> + break;
>case BRW_OPCODE_ELSE:
>case BRW_OPCODE_WHILE:
>case BRW_OPCODE_HALT:
> -return offset;
> + if (depth == 0)
> +return offset;


Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8] i965: Enable EXT_shader_samples_identical

2015-11-19 Thread Neil Roberts
Jason Ekstrand  writes:

> Something else that came out of that conversation is that, for 2x
> MSAA, we may get bogus data in all but the bottom 4 bits.  In other
> words, just blindly checking for zero is probably a bad idea.  It'll
> work because the extension spec lets us return false negatives, but it
> isn't a good idea in general.  If we really want the implementation to
> be solid, we need to mask off all but the bottom n * log2(n) bits
> where n = number of samples.

I just noticed that this optimisation is actually suggested in the PRM.
I'll paste it here (sorry for the noise if everyone was already aware of
this):

IVB vol4 part 1, 2.7.1

“A simple optimization with probable large return in performance is to
 compare the MCS value to zero (indicating all samples are on sample
 slice 0), and sample only from sample slice 0 using ld2dss if MCS is
 zero. Sample slice 0 is the pixel color in this case. If MCS is not
 zero, each sample is then obtained using ld2dms messages and the
 results are averaged in the kernel after being returned. Refer to the
 multisample storage format in the GPU Overview volume for more
 details.”

This at least implies that this is an expected use case and maybe is a
good hint that it might do the right thing and clear the extra bits for
the 2x case? Of course it would be good to be more sure. I can't find
the more details that it's referring to. (Why don't the specs have
hyperlinks…?)

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/gen8: Allow rendering to B8G8R8X8

2015-11-19 Thread Anuj Phogat
On Thu, Nov 19, 2015 at 7:25 AM, Neil Roberts  wrote:
> Since Gen8 this is allowed as a rendering target so we don't need to
> override it to B8G8R8A8. This is helpful on Gen9+ where using this
> override causes fast clears not to work.
> ---
>  src/mesa/drivers/dri/i965/brw_surface_formats.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
> b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> index 55e7e64..7c38431 100644
> --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> @@ -167,8 +167,8 @@ const struct surface_format_info surface_formats[] = {
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   I32_FLOAT)
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   L32_FLOAT)
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   A32_FLOAT)
> -   SF( Y,  Y,  x,  Y,  x,  x,  x,  x, 60,   90,   B8G8R8X8_UNORM)
> -   SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
> +   SF( Y,  Y,  x,  Y, 80, 80,  x,  x, 60,   90,   B8G8R8X8_UNORM)
> +   SF( Y,  Y,  x,  x, 80, 80,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM_SRGB)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R9G9B9E5_SHAREDEXP)
> @@ -670,9 +670,10 @@ brw_init_surface_formats(struct brw_context *brw)
>   * mask writes to alpha (ala glColorMask) and reconfigure the
>   * alpha blending hardware to use GL_ONE (or GL_ZERO) for
>   * cases where GL_DST_ALPHA (or GL_ONE_MINUS_DST_ALPHA) is
> - * used.
> + * used. On Gen8+ BGRX is actually allowed (but not RGBX).
>   */
> -render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
> + if (gen < tinfo->render_target)
> +render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
>  break;
>case BRW_SURFACEFORMAT_R8G8B8X8_UNORM:
>   render = BRW_SURFACEFORMAT_R8G8B8A8_UNORM;
> --
> 1.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Verified in the hardware specs.
Reviewed-by: Anuj Phogat 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8 v2] i965: Enable EXT_shader_samples_identical

2015-11-19 Thread Neil Roberts
Ian Romanick  writes:

> Am I correct that nothing special is needed in the vec4 backend? It
> seems like mcs should know the register size, and the CMP with 0
> should just do the right thing.

I think you probably will have to do something for the vec4 backend.
Currently it generates an instruction like this:

cmp.z.f0(8) g8<1>.xUD   g9<4,4,1>.xUD   0xUD

g9 contains the MCS data. If I understand correctly for 16x MSAA the
second half of the MCS data is in the y component which is currently
ignored by this instruction. Maybe something like this would work:

  if (mcs.file == BRW_IMMEDIATE_VALUE) {
 emit(MOV(dest, src_reg(0u)));
  } else if ((key_tex->msaa_16 & (1 << sampler))) {
 dst_reg tmp(this, glsl_type::uint_type);
 dst_reg swizzled_mcs = mcs;
 swizzled_mcs.swizzle = BRW_SWIZZLE_;
 emit(OR(tmp, mcs, swizzled_mcs));
 emit(CMP(dest, tmp, src_reg(0u), BRW_CONDITIONAL_EQ));
  } else {
 emit(CMP(dest, mcs, src_reg(0u), BRW_CONDITIONAL_EQ));
  }

Sadly the optimiser doesn't optimise out the extra instruction this
time. There's probably a better way to do it. On the other hand I can't
think why anyone would be using this function in the vec4 backend so
it's probably not worth worrying about.

I haven't tested this at all. I guess to test it you would have to write
a GS version of the Piglit test? I think that is the only place that
uses the vec4 backend on SKL.

If we don't want to risk this we could always just make it return false
when msaa_16 is set for the sampler. On the other hand if there is
currently no test for the 8x version either then we should probably
assume that neither of them work… maybe it wouldn't be so bad to just
always return false in the vec4 backend until we've tested it?

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Fix JIP to properly skip over unrelated control flow.

2015-11-19 Thread Kenneth Graunke
We've apparently always been botching JIP for sequences such as:

   do
   cmp.f0.0 ...
   (+f0.0) break
   ...
   if
  ...
   else
  ...
   endif
   ...
   while

Normally, UIP is supposed to point to the final destination of the jump,
while in nested control flow, JIP is supposed to point to the end of the
current nesting level.  It essentially bounces out of the current nested
control flow, to an instruction that has a JIP which bounces out another
level, and so on.

In the above example, when setting JIP for the BREAK, we call
brw_find_next_block_end(), which begins a search after the BREAK for the
next ENDIF, ELSE, WHILE, or HALT.  It ignores the IF and finds the ELSE,
setting JIP there.

This makes no sense at all.  The break is supposed to skip over the
whole if/else/endif block entirely.  They have a sibling relationship,
not a nesting relationship.

This patch fixes brw_find_next_block_end() to track depth as it does
its search, and ignore anything not at depth 0.  So when it sees the
IF, it ignores everything until after the ENDIF.  That way, it finds
the end of the right block.

Caught while debugging a tessellation shader - no apparent effect on
Piglit.  I did look for actual applications that were affected, and
found that GLBenchmark Manhattan had a BREAK with a bogus JIP.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index da1ddfd..e457fd2 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2604,17 +2604,27 @@ brw_find_next_block_end(struct brw_codegen *p, int 
start_offset)
void *store = p->store;
const struct brw_device_info *devinfo = p->devinfo;
 
+   int depth = 0;
+
for (offset = next_offset(devinfo, store, start_offset);
 offset < p->next_insn_offset;
 offset = next_offset(devinfo, store, offset)) {
   brw_inst *insn = store + offset;
 
   switch (brw_inst_opcode(devinfo, insn)) {
+  case BRW_OPCODE_IF:
+ depth++;
+ break;
   case BRW_OPCODE_ENDIF:
+ if (depth == 0)
+return offset;
+ depth--;
+ break;
   case BRW_OPCODE_ELSE:
   case BRW_OPCODE_WHILE:
   case BRW_OPCODE_HALT:
-return offset;
+ if (depth == 0)
+return offset;
   }
}
 
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8] i965: Enable EXT_shader_samples_identical

2015-11-19 Thread Jason Ekstrand
On Wed, Nov 18, 2015 at 9:29 PM, Jason Ekstrand  wrote:
>
> On Nov 18, 2015 5:02 PM, "Jason Ekstrand"  wrote:
>>
>> On Wed, Nov 18, 2015 at 4:06 PM, Kenneth Graunke 
>> wrote:
>> > On Wednesday, November 18, 2015 03:46:54 PM Ian Romanick wrote:
>> >> From: Ian Romanick 
>> >>
>> >> Signed-off-by: Ian Romanick 
>> >> ---
>> >>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  1 +
>> >>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 16 
>> >>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |  1 +
>> >>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 11 +++
>> >>  src/mesa/drivers/dri/i965/intel_extensions.c   |  1 +
>> >>  5 files changed, 30 insertions(+)
>> >>
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> index 1f71f66..4af1234 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> @@ -2550,6 +2550,7 @@ fs_visitor::nir_emit_texture(const fs_builder
>> >> &bld, nir_tex_instr *instr)
>> >>   switch (instr->op) {
>> >>   case nir_texop_txf:
>> >>   case nir_texop_txf_ms:
>> >> + case nir_texop_samples_identical:
>> >>  coordinate = retype(src, BRW_REGISTER_TYPE_D);
>> >>  break;
>> >>   default:
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> index a7bd9ce..6688f6a 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> @@ -259,6 +259,22 @@ fs_visitor::emit_texture(ir_texture_opcode op,
>> >>lod = fs_reg(0u);
>> >> }
>> >>
>> >> +   if (op == ir_samples_identical) {
>> >> +  fs_reg dst = vgrf(glsl_type::get_instance(dest_type->base_type,
>> >> 1, 1));
>> >> +
>> >> +  if (mcs.file == BRW_IMMEDIATE_VALUE) {
>> >> + fs_reg tmp = vgrf(glsl_type::uint_type);
>> >> +
>> >> + bld.MOV(tmp, mcs);
>> >> + bld.CMP(dst, tmp, src_reg(0u), BRW_CONDITIONAL_EQ);
>> >
>> > Seems a little strange to emit assembly to do the comparison when
>> > you've already determined that the value is a compile time constant.
>> >
>> > Why not just:
>> >
>> >bld.MOV(dst, fs_reg(mcs.ud == 0u ? ~0u : 0u));
>>
>> Actually, getting an immediate here means we don't have an MCS and we
>> have no idea of the samples are identical, so we should return false
>> always.
>>
>> >> +  } else {
>> >> + bld.CMP(dst, mcs, src_reg(0u), BRW_CONDITIONAL_EQ);
>>
>> We should also consider handling the clear color case.  In this case,
>> we'll get 0xff for 2x and 0x for 4x or 8x.  Do we know the
>> number of samples in the shader?  We should be able to get that from
>> the sampler or something but then we would have to pass that through
>> the key and that would get gross.
>
> First off, I realized that the numbers I have there are wrong. It's 0xff for
> 2x and 4x and 0x for 8x.  However, I also just realized that the 8
> 8-bit values you get for 8x MSAA range from 0 to 7 but take up 4 bits each.
> This means that no valid 8x MSAA MCS value can have 0xff as its bottom 8
> bits unless it's the clear color.  This means that a simple and with 0xff
> will get us a clear color check on all but 16x.  Unfortunately, 16x has a
> 64-bit MCS value and, unless the hardware provides is with some extra
> guarantees, 0xff would be valid in the bottom 8 bits.
>
> Going off into the world of speculation just a bit, can we make some
> assumptions about the hardware?  Suppose for a moment that the used a fairly
> greedy algorithm for determining which plane to store a value in:
>
> 1) If all samples are affected, store in slice zero
> 2) If not, store in the first available empty or completely overwritten
> slice.
>
> Such an algorithm would make sense and have the nice property of tending to
> pack the samples in the earlier slices this decreasing the possibility of
> ever touching slice 15.  This is good for cache locality.  It also has
> another property that would be very useful for us, namely that it only
> touches slice 15 if all 16 samples have different colors.  In particular, it
> would mean that you can never have two samples that both lie in slice 15
> and, more specifically, 0xff would also be invalid for 16x.
>
> Unfortunately, that's entirely based on my speculation as to how the
> hardware works.  Given that we don't actually know, it's not documented, and
> that we're not liable to ever find anyone willing to give us those kinds of
> details, we're not likely to find out without a very clever experiment.

So, chad has convinced my that my speculation is quite possibly bogus
and *very* hard to actually test, so just checking the bottom 8 bits
probably won't work.

Something else that came out of that conversation is that, for 2x
MSAA, we may get bogus data in all but the bottom 4 bits.  In other
words, just blindly checking

Re: [Mesa-dev] [PATCH 0/8] Implement EXT_shader_samples_identical

2015-11-19 Thread Chris Forbes
Series (with the v2 changes) is:

Reviewed-by: Chris Forbes 

On Thu, Nov 19, 2015 at 12:46 PM, Ian Romanick  wrote:

> This patch series implements a new GL extension,
> EXT_shader_samples_identical.  This extension allows shaders to
> determine when all of the samples in a particular texel are the same.
> This takes advantage of the way compressed multisample surfaces are
> stored on modern Intel and AMD hardware.  This enables optimizations in
> application multisample resolve filters, etc.
>
> I really wanted to get this in the next Mesa release.  For some reason,
> I thought the branch point was after Thanksgiving (which is next
> Thursday).  Ken reminded me yesterday that the branch point is actually
> this Friday. :( As a result, I'm sending it out today to get review as
> soon as possible.
>
> I also wanted to get as much time as possible for other drivers to get
> implementations.  I worked with Graham Sellers on this extension, and he
> assures me that the implementation on modern Radeons is trivial.  My
> expectation is that it should be about the same as the Intel
> implementation.
>
> There will be some extra TGSI bits needed, but that should also be
> trivial.  For the NIR and i965 backend bits, I mostly copied and blended
> the implementations of txf_ms and query_samples.
>
> There are currently only trivial piglit tests, but I am working on more.
> I basically hacked up tests/spec/arb_texture_multisample/texelfetch.c to
> use the extension to render different colors based on whether
> textureSamplesIdenticalEXT returned true or false.  The resulting image
> and the generated assembly look good.  My plan is to get a set of real
> tests out by midday tomorrow.
>
> As soon as we're confident that the spec is good, I'll submit it to
> Khronos for publication in the registry.  I'm still waiting on feedback
> from another closed-source driver writer.
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8 v2] i965: Enable EXT_shader_samples_identical

2015-11-19 Thread Ian Romanick
On 11/19/2015 09:34 AM, Neil Roberts wrote:
> Jason Ekstrand  writes:
> 
>> On Nov 18, 2015 6:38 PM, "Ian Romanick"  wrote:
>>>
>>> From: Ian Romanick 
>>>
>>> v2: Handle immediate value for MCS smarter.  Rebase on changes to
>>> nir_texop_sampels_identical (missing second parameter).  Suggested by
>>> Jason.  This still doesn't handle the 16x MSAA case.
> 
> 16x MSAA has a field in the program sampler key because it needs a new
> instruction to fetch from the texture. For 16x MSAA it just uses two of
> the registers that are returned by the MCS fetch instruction instead of
> just one. Maybe you could do something like this:
> 
>   if (mcs.file == BRW_IMMEDIATE_VALUE) {
>  bld.MOV(dst, fs_reg(0));
>   } else if ((key_tex->msaa_16 & (1 << sampler))) {
>  fs_reg tmp = vgrf(glsl_type::uint_type);
>  bld.OR(tmp, mcs, offset(mcs, bld, 1));
>  bld.CMP(dst, tmp, src_reg(0u), BRW_CONDITIONAL_EQ);
>   } else {
>  bld.CMP(dst, mcs, src_reg(0u), BRW_CONDITIONAL_EQ);
>   }
> 
> I tested this on my SKL and it passes all of your Piglit tests. The
> optimiser quite neatly removes the second instruction and the temporary
> register in the 16x case and makes the OR directly update the flag
> registers, so it is possibly the same cost as the 8x case in practice.

I was looking at the msaa_16 flags last night after I sent out v2.  I
was thinking of something along these lines.  Am I correct that nothing
special is needed in the vec4 backend?  It seems like mcs should know
the register size, and the CMP with 0 should just do the right thing.

I'll add your code to the FS and your S-o-b to the commit message... and
hopefully this can land soon. :)

> I can confirm that I've already pushed the 16x MSAA patches to master.
> 
> Regards,
> - Neil


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/x11/dri3: do not expose the preserved swap behavior (to be squashed)

2015-11-19 Thread Eric Anholt
Martin Graesslin  writes:

> On Thursday, November 19, 2015 1:54:22 PM CET Martin Peres wrote:
>> On 11/11/15 00:44, Eric Anholt wrote:
>> > Martin Peres  writes:
>> >> The preserved swap behavior is currently untested in piglit and not
>> >> supported on the GLX side. Before working on implementing it for
>> >> EGL/DRI3, let's disable it until support comes.
>> >> 
>> >> This patch is trivial enough and will likely be squashed in the commit
>> >> egl/x11: Implement dri3 support with loader's dri3 helper
>> >> 
>> >> Signed-off-by: Martin Peres 
>> > 
>> > This looks good to me.  I don't think anybody really cares about
>> > SWAP_BEHAVIOR_PRESERVED -- the buffer_age stuff was what you really
>> > wanted all along.
>> 
>> Hey Eric and Martin,
>> 
>> It seems like KWin is relying on SWAP_BEHAVIOR_PRESERVED for its EGL
>> backend. Should we add proper support for it in mesa or should we
>> propose a patch for kwin to test for the extension to be present before
>> using it?
>> 
>> Martin, what is your opinion on this since buffer age is what you need
>> and you already have support for it for the glx and wayland backends?
>
> That's certainly a left over from before buffer age.  I don't know whether we 
> can remove it unconditionally. Not every GLES harware which does support X11 
> supports buffer age.
>
> I assume the smartest thing to do is not ask for PRESERVED if buffer age is 
> supported.

Yeah, you definitely want to not ask for PRESERVED if you can handle
buffer age, since PRESERVED means losing page flipping.  (Part of why
PRESERVED was a bad idea from day 1)


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 2/2] radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values

2015-11-19 Thread Emil Velikov
On 19 November 2015 at 20:37, Tom Stellard  wrote:
> On Wed, Nov 18, 2015 at 05:43:31PM +, Emil Velikov wrote:
>> Hi Tom,
>>
>> Please flip the order of the patches and drop the now patch 1/2 from
>> the stable queue.
>>
>
> I'm confused about why I need to flip the order of the patches.
>
As mentioned on IRC - patch 1 is not a bugfix and despite how trivial
I'd rather have only bugfixes in stable.
Is flipping things too much to ask ?

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Add test for sorted extension table

2015-11-19 Thread Ian Romanick
On 11/19/2015 10:44 AM, Nanley Chery wrote:
> From: Nanley Chery 
> 
> Enable developers to know if the table's alphabetical sorting
> is maintained or lost.
> 
> v2: Move "*" next to pointer name (Matt)
> Include extensions_table.h instead of extensions.h (Ian)

That actually does make it a lot nicer.

Reviewed-by: Ian Romanick 

> Remove extra " *" in comment (Ian)
> 
> Signed-off-by: Nanley Chery 
> ---
> 
> Does the original Rb still apply?
> 
>  src/mesa/main/tests/Makefile.am |  1 +
>  src/mesa/main/tests/mesa_extensions.cpp | 51 
> +
>  2 files changed, 52 insertions(+)
>  create mode 100644 src/mesa/main/tests/mesa_extensions.cpp
> 
> diff --git a/src/mesa/main/tests/Makefile.am b/src/mesa/main/tests/Makefile.am
> index bd7ab73..d6977e2 100644
> --- a/src/mesa/main/tests/Makefile.am
> +++ b/src/mesa/main/tests/Makefile.am
> @@ -27,6 +27,7 @@ AM_CPPFLAGS += -DHAVE_SHARED_GLAPI
>  main_test_SOURCES += \
>   dispatch_sanity.cpp \
>   mesa_formats.cpp\
> + mesa_extensions.cpp \
>   program_state_string.cpp
>  
>  main_test_LDADD += \
> diff --git a/src/mesa/main/tests/mesa_extensions.cpp 
> b/src/mesa/main/tests/mesa_extensions.cpp
> new file mode 100644
> index 000..0c7addd
> --- /dev/null
> +++ b/src/mesa/main/tests/mesa_extensions.cpp
> @@ -0,0 +1,51 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +/**
> + * \name mesa_extensions.cpp
> + *
> + * Verify that the extensions table is sorted.
> + */
> +
> +#include 
> +#include "util/macros.h"
> +
> +/**
> + * Debug/test: verify the extension table is alphabetically sorted.
> + */
> +TEST(MesaExtensionsTest, AlphabeticallySorted)
> +{
> +   const char *ext_names[] = {
> +   #define EXT(name_str, ...) #name_str,
> +   #include "main/extensions_table.h"
> +   #undef EXT
> +   };
> +
> +   for (unsigned i = 0; i < ARRAY_SIZE(ext_names) - 1; ++i) {
> +  const char *current_str = ext_names[i];
> +  const char *next_str = ext_names[i+1];
> +
> +  /* We expect the extension table to be alphabetically sorted */
> +  ASSERT_LT(strcmp(current_str, next_str), 0);
> +   }
> +}
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 2/2] radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values

2015-11-19 Thread Tom Stellard
On Wed, Nov 18, 2015 at 05:43:31PM +, Emil Velikov wrote:
> Hi Tom,
> 
> Please flip the order of the patches and drop the now patch 1/2 from
> the stable queue.
> 

I'm confused about why I need to flip the order of the patches.

-Tom

> On 16 November 2015 at 20:03, Tom Stellard  wrote:
> > The compiler has more information and is able to optimize the bits
> > it sets in these registers.
> >
> > CC: 
> > ---
> >  src/gallium/drivers/radeonsi/si_compute.c | 37 
> > ++-
> >  src/gallium/drivers/radeonsi/si_shader.c  |  2 ++
> >  2 files changed, 9 insertions(+), 30 deletions(-)
> >
> > diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
> > b/src/gallium/drivers/radeonsi/si_compute.c
> > index 2d551dd..a461b2c 100644
> > --- a/src/gallium/drivers/radeonsi/si_compute.c
> > +++ b/src/gallium/drivers/radeonsi/si_compute.c
> > @@ -34,11 +34,6 @@
> >
> >  #define MAX_GLOBAL_BUFFERS 20
> >
> > -/* XXX: Even though we don't pass the scratch buffer via user sgprs any 
> > more
> > - * LLVM still expects that we specify 4 USER_SGPRS so it can remain 
> > compatible
> > - * with older mesa. */
> > -#define NUM_USER_SGPRS 4
> > -
> >  struct si_compute {
> > struct si_context *ctx;
> >
> > @@ -238,7 +233,6 @@ static void si_launch_grid(
> > uint64_t kernel_args_va;
> > uint64_t scratch_buffer_va = 0;
> > uint64_t shader_va;
> > -   unsigned arg_user_sgpr_count = NUM_USER_SGPRS;
> > unsigned i;
> > struct si_shader *shader = &program->shader;
> > unsigned lds_blocks;
> > @@ -366,19 +360,7 @@ static void si_launch_grid(
> > si_pm4_set_reg(pm4, R_00B830_COMPUTE_PGM_LO, shader_va >> 8);
> > si_pm4_set_reg(pm4, R_00B834_COMPUTE_PGM_HI, shader_va >> 40);
> >
> > -   si_pm4_set_reg(pm4, R_00B848_COMPUTE_PGM_RSRC1,
> > -   /* We always use at least 3 VGPRS, these come from
> > -* TIDIG_COMP_CNT.
> > -* XXX: The compiler should account for this.
> > -*/
> > -   S_00B848_VGPRS((MAX2(3, shader->num_vgprs) - 1) / 4)
> > -   /* We always use at least 4 + arg_user_sgpr_count.  The 4 
> > extra
> > -* sgprs are from TGID_X_EN, TGID_Y_EN, TGID_Z_EN, 
> > TG_SIZE_EN
> > -* XXX: The compiler should account for this.
> > -*/
> > -   |  S_00B848_SGPRS(((MAX2(4 + arg_user_sgpr_count,
> > -   shader->num_sgprs)) - 1) / 8)
> > -   |  S_00B028_FLOAT_MODE(shader->float_mode))
> > +   si_pm4_set_reg(pm4, R_00B848_COMPUTE_PGM_RSRC1, shader->rsrc1);
> > ;
> The above semicolon should be nuked as well, shouldn't it ?
> 
> Thanks
> Emil
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1.9/2] i965: Add brw_imm_uv().

2015-11-19 Thread Jason Ekstrand
On Thu, Nov 19, 2015 at 11:33 AM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 9:53 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 8:00 AM, Jason Ekstrand  wrote:
>>> On Wed, Nov 18, 2015 at 2:25 PM, Matt Turner  wrote:
 ---
  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 +++
  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
  2 files changed, 12 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 index e5a286a..78c10e9 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 @@ -116,6 +116,9 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, 
 unsigned gen)
case BRW_REGISTER_TYPE_V:
   brw_reg = brw_imm_v(reg->ud);
   break;
 +  case BRW_REGISTER_TYPE_UV:
 + brw_reg = brw_imm_uv(reg->ud);
 + break;
default:
  unreachable("not reached");
}
 diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
 b/src/mesa/drivers/dri/i965/brw_reg.h
 index 759bd93..b77cdeb 100644
 --- a/src/mesa/drivers/dri/i965/brw_reg.h
 +++ b/src/mesa/drivers/dri/i965/brw_reg.h
 @@ -623,6 +623,15 @@ brw_imm_v(unsigned v)
 return imm;
  }

 +/** Construct vector of eight unsigned half-byte values */
 +static inline struct brw_reg
 +brw_imm_uv(unsigned uv)
 +{
>>>
>>> Please add a GEN assertion either here or in the generator.  This only
>>> becomes available on Haswell or Broadwell if I remember correctly.  I
>>> do know it's not universally available.
>>
>> Good idea. (It's Sandybridge+)
>
> On second thought, I can't (without adding a devinfo parameter to this
> function, which I'm not going to do)
>
> This is something the instruction validator can be tasked with verifying.

You could also do it in the generator, but the validator is fine too.
As long as it's somewhere.
--Jason
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Ilia Mirkin
On Thu, Nov 19, 2015 at 3:25 PM, Brian Paul  wrote:
> On 11/19/2015 01:19 PM, Ilia Mirkin wrote:
>>
>> On Thu, Nov 19, 2015 at 3:13 PM, Brian Paul  wrote:
>>>
>>> Hi Nanley,
>>>
>>> Maybe you can fix an issue I have with the new extension code.
>>>
>>> Previously, I could do something like export
>>> MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer
>>> see
>>> it in the GL_EXTENSIONS string, even if it was an "always on" extension.
>>
>>
>> How sure are you that this worked?
>
>
> I'm pretty sure it worked that way when I first wrote it.
>
>
>> AFAIK there was no way to
>> enable/disable an always-on ext.
>
>
> That's not the point.  The point is to change what
> glGetString(GL_EXTENSIONS) (or the glGetStringi() method) returns to the
> application.  I don't care if the underlying functionality is still working
> in Mesa.  We're trying to tell the application to not use a feature.  I've
> found this extremely helpful when debugging some apps.
>
>
>> That env var never affected the
>> extension strings, but only the actual ctx->Extensions struct, which
>> would indirectly modify the ext string. If the entry in that ext table
>> was a dummy one, it had logic to not futz with it. (But it did so
>> silently.)
>
>
> Yeah, I don't care about the table.  I only care about the return value of
> glGetString(GL_EXTENSIONS).

I was pretty annoyed at that too -- having it affect exts returned by
the driver seemed to be the important bit, but I think I investigated
why it didn't work ~6 months ago, and came to the above conclusion
[that it only futzed with the table]. In fact recently someone was
unable to work around an ARB_dsa shortcoming in nouveau by disabling
it with an override.

IMHO it makes sense for it to affect both the table and the ext
string... i.e. if there's a ctx->Extensions.foo listed, it should turn
that on/off, but if there isn't, it should still touch up the returned
ext list.

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] automake: loader: don't create an empty dri3 helper

2015-11-19 Thread Matt Turner
Patches 1, 3, and 4 are

Reviewed-by: Matt Turner 

(I gave an Ack on 2 since I didn't review the list of functions)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] automake: egl: add symbols test

2015-11-19 Thread Matt Turner
Seems like a good plan. I haven't reviewed the list, so have an

Acked-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Brian Paul

On 11/19/2015 01:19 PM, Ilia Mirkin wrote:

On Thu, Nov 19, 2015 at 3:13 PM, Brian Paul  wrote:

Hi Nanley,

Maybe you can fix an issue I have with the new extension code.

Previously, I could do something like export
MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer see
it in the GL_EXTENSIONS string, even if it was an "always on" extension.


How sure are you that this worked?


I'm pretty sure it worked that way when I first wrote it.



AFAIK there was no way to
enable/disable an always-on ext.


That's not the point.  The point is to change what 
glGetString(GL_EXTENSIONS) (or the glGetStringi() method) returns to the 
application.  I don't care if the underlying functionality is still 
working in Mesa.  We're trying to tell the application to not use a 
feature.  I've found this extremely helpful when debugging some apps.




That env var never affected the
extension strings, but only the actual ctx->Extensions struct, which
would indirectly modify the ext string. If the entry in that ext table
was a dummy one, it had logic to not futz with it. (But it did so
silently.)


Yeah, I don't care about the table.  I only care about the return value 
of glGetString(GL_EXTENSIONS).


-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/9] Early fp64 fixes

2015-11-19 Thread Matt Turner
For the next patches you send from Connor, please use
--suppress-cc=author so that every reply doesn't generate a bounce
message about his disabled @intel email. :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Ilia Mirkin
On Thu, Nov 19, 2015 at 3:13 PM, Brian Paul  wrote:
> Hi Nanley,
>
> Maybe you can fix an issue I have with the new extension code.
>
> Previously, I could do something like export
> MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer see
> it in the GL_EXTENSIONS string, even if it was an "always on" extension.

How sure are you that this worked? AFAIK there was no way to
enable/disable an always-on ext. That env var never affected the
extension strings, but only the actual ctx->Extensions struct, which
would indirectly modify the ext string. If the entry in that ext table
was a dummy one, it had logic to not futz with it. (But it did so
silently.)

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 3:11 PM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 11:35 AM, Connor Abbott  wrote:
>> On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott  wrote:
>>> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
 On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>> wrote:
>>> From: Connor Abbott 
>>>
>>> It appears that not only math instructions, but also MOV_BYTES or
>>> any instruction that uses Align1 mode cannot be in the middle
>>> of a dependency control sequence or the GPU will hang (at least on my
>>> BDW). This fixes GPU hangs in some fp64 tests.
>>
>> I'm pretty surprised by this assessment. Doubtful even.
>>
>>> Reviewed-by: Iago Toral Quiroga 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index 3bcd5cb..bc0a33b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>>> vec4_instruction *inst)
>>> }
>>>
>>> /*
>>> +* Instructions that use Align1 mode cause the GPU to hang when 
>>> inserted
>>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>>> empirically.
>>> +*/
>>> +
>>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>
>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>> added it I made a test that did
>>
>>   vec4 foo = vec4(packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...))
>>
>> and confirmed that it set depctrl properly on the whole sequence.
>> There could of course be bugs somewhere, but the "hardware doesn't
>> work if you mix align1 and align16 depctrl" seems unlikely.
>>
>> Do you know of a test that this affects?
>
> This only affects FP64 tests, since there we use an align1 mov to do
> double-to-float and float-to-double. However, I tried commenting out
> emit_nir_code() and just doing essentially:
>
> emit(MOV(...))->force_writemask_all = true;
> emit(VEC4_OPCODE_PACK_BYTES, ...);
> emit(MOV(...))->force_writemask_all = true;
>
> and on my BDW it hanged. In case it's not clear: this isn't about
> setting depctrl on the instruction itself, it just can't be inside of
> a depctrl sequence (which we were already disallowing for math
> instructions anyways).

 Very weird. I'll take a look. So I understand, are the MOV
 instructions writing different channels of the same register? And
 VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
 as the MOVs? (I saw your fixup reply)
>>>
>>> Actually, I had them writing the same thing so the second overwrote
>>> the first one.
>>
>> Err, just to be clear... in the actual test that led to me discovering
>> this, the instructions (not mov's but a bfi and a mov IIRC) were
>> writing different components of the same register. In my hacked-up
>> test to see if align1 in between a depctrl sequence was the problem, I
>> just had them both write to the same thing since it was easier. In any
>> case, I don't think it should matter too much.
>
> Interesting. If it was a BFI and a MOV writing different components of
> the same register, I can see that failing. Though I haven't measured,
> 3-src instructions usually have 2+ extra cycles of latency and with a
> 2-cycle issue time, a BFI followed immediately by a MOV would mean
> that both of them retired in the same cycle. That would seem bad if
> one of them was supposed to signal the scoreboard.
> ×

They weren't right next to each other, though -- there was a sequence
of 5-6 instructions between them, one of which happens to be an align1
mov which (according to my testing) hangs the GPU in between any
depctrl sequence.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 11:35 AM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott  wrote:
>> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
>>> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
 On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
> wrote:
>> From: Connor Abbott 
>>
>> It appears that not only math instructions, but also MOV_BYTES or
>> any instruction that uses Align1 mode cannot be in the middle
>> of a dependency control sequence or the GPU will hang (at least on my
>> BDW). This fixes GPU hangs in some fp64 tests.
>
> I'm pretty surprised by this assessment. Doubtful even.
>
>> Reviewed-by: Iago Toral Quiroga 
>> ---
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index 3bcd5cb..bc0a33b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>> vec4_instruction *inst)
>> }
>>
>> /*
>> +* Instructions that use Align1 mode cause the GPU to hang when 
>> inserted
>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>> empirically.
>> +*/
>> +
>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>
> PACK_BYTES sets depctrl itself in the generator, and at the time I
> added it I made a test that did
>
>   vec4 foo = vec4(packUnorm4x8(...),
>   packUnorm4x8(...),
>   packUnorm4x8(...),
>   packUnorm4x8(...))
>
> and confirmed that it set depctrl properly on the whole sequence.
> There could of course be bugs somewhere, but the "hardware doesn't
> work if you mix align1 and align16 depctrl" seems unlikely.
>
> Do you know of a test that this affects?

 This only affects FP64 tests, since there we use an align1 mov to do
 double-to-float and float-to-double. However, I tried commenting out
 emit_nir_code() and just doing essentially:

 emit(MOV(...))->force_writemask_all = true;
 emit(VEC4_OPCODE_PACK_BYTES, ...);
 emit(MOV(...))->force_writemask_all = true;

 and on my BDW it hanged. In case it's not clear: this isn't about
 setting depctrl on the instruction itself, it just can't be inside of
 a depctrl sequence (which we were already disallowing for math
 instructions anyways).
>>>
>>> Very weird. I'll take a look. So I understand, are the MOV
>>> instructions writing different channels of the same register? And
>>> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
>>> as the MOVs? (I saw your fixup reply)
>>
>> Actually, I had them writing the same thing so the second overwrote
>> the first one.
>
> Err, just to be clear... in the actual test that led to me discovering
> this, the instructions (not mov's but a bfi and a mov IIRC) were
> writing different components of the same register. In my hacked-up
> test to see if align1 in between a depctrl sequence was the problem, I
> just had them both write to the same thing since it was easier. In any
> case, I don't think it should matter too much.

Interesting. If it was a BFI and a MOV writing different components of
the same register, I can see that failing. Though I haven't measured,
3-src instructions usually have 2+ extra cycles of latency and with a
2-cycle issue time, a BFI followed immediately by a MOV would mean
that both of them retired in the same cycle. That would seem bad if
one of them was supposed to signal the scoreboard.
×
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Brian Paul

Hi Nanley,

Maybe you can fix an issue I have with the new extension code.

Previously, I could do something like export 
MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer 
see it in the GL_EXTENSIONS string, even if it was an "always on" extension.


Now when I try that I get:

Mesa 11.1.0-devel implementation error: Trying to disable permanently 
enabled extensions: GL_ARB_get_texture_sub_image

Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa

The whole point of the "-GL_EXT_foobar" syntax was to hide an extension 
from the application when it queries the driver's extensions.


Can you please fix this so it works as before?

Thanks.

-Brian
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] i965: fix 64-bit immediates in brw_inst(_set)_bits

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 11:35 AM, Kristian Høgsberg  wrote:
> On Thu, Nov 19, 2015 at 11:24 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 9:30 AM, Kristian Høgsberg  
>> wrote:
>>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>>> wrote:
 From: Connor Abbott 

 If we tried to get/set something that was exactly 64 bits, we would
 try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
 1's like we want.

 v2 (Iago)
  - Replace ~0 by ~0ull
  - Removed unnecessary parenthesis

 Reviewed-by: Iago Toral Quiroga 
 ---
  src/mesa/drivers/dri/i965/brw_inst.h | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
 b/src/mesa/drivers/dri/i965/brw_inst.h
 index 4ed95c4..ec08194 100644
 --- a/src/mesa/drivers/dri/i965/brw_inst.h
 +++ b/src/mesa/drivers/dri/i965/brw_inst.h
 @@ -694,7 +694,8 @@ brw_inst_bits(const brw_inst *inst, unsigned high, 
 unsigned low)
 high %= 64;
 low %= 64;

 -   const uint64_t mask = (1ull << (high - low + 1)) - 1;
 +   const uint64_t mask = (high - low == 63) ? ~0ull :
 +  (1ull << (high - low + 1)) - 1;
>>>
>>> Can we do
>>>
>>> const uint64_t mask = (~0ul >> (64 - (high - low + 1)));
>>>
>>> instead?
>>
>> I don't think so, because ~0ul is of type unsigned, so right shifting
>> it shifts in zeros. I was going to make a similar comment on the
>> original patch -- "-1" is preferable over ~0u with an increasingly
>> long sequence of l's because it's signed, so it's sign extended to
>> fill whatever you assign it to. In your code though, since it's an
>> operand we'd need -1ll, I think...
>
> No, shifting in zeros is the whole point. We start out with 64 1 bits,
> then shift it down enough that we end up with (high - low + 1) 1 bits
> at the bottom, which is what we're trying to compute.

Doh. Of course. :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFCv2 03/13] nir: allow pre-resolved sampler uniform locations

2015-11-19 Thread Ilia Mirkin
On Thu, Nov 19, 2015 at 2:54 PM, Rob Clark  wrote:
> On Mon, Nov 9, 2015 at 4:08 PM, Timothy Arceri  wrote:
>> On Mon, 2015-11-09 at 07:43 -0500, Rob Clark wrote:
>>> On Sun, Nov 8, 2015 at 7:58 PM, Timothy Arceri 
>>> wrote:
>>> > On Sun, 2015-11-08 at 15:12 -0500, Rob Clark wrote:
>>> > > From: Rob Clark 
>>> > >
>>> > > With TGSI, the ir_variable::data.location gets fixed up to be a stage
>>> > > local location (rather than program global).  In this case we need to
>>> > > skip the UniformStorage[location] lookup.
>>> > > ---
>>> > >  src/glsl/nir/nir_lower_samplers.c | 23 ---
>>> > >  1 file changed, 16 insertions(+), 7 deletions(-)
>>> > >
>>> > > diff --git a/src/glsl/nir/nir_lower_samplers.c
>>> > > b/src/glsl/nir/nir_lower_samplers.c
>>> > > index 5df79a6..d99ba4c 100644
>>> > > --- a/src/glsl/nir/nir_lower_samplers.c
>>> > > +++ b/src/glsl/nir/nir_lower_samplers.c
>>> > > @@ -130,14 +130,18 @@ lower_sampler(nir_tex_instr *instr, const struct
>>> > > gl_shader_program *shader_progr
>>> > >instr->sampler_array_size = array_elements;
>>> > > }
>>> > >
>>> > > -   if (location > shader_program->NumUniformStorage - 1 ||
>>> > > -   !shader_program->UniformStorage[location].opaque[stage].active)
>>> > > {
>>> > > -  assert(!"cannot return a sampler");
>>> > > -  return;
>>> > > -   }
>>> > > +   if (!shader_program) {
>>> > > +  instr->sampler_index = location;
>>> > > +   } else {
>>> > > +  if (location > shader_program->NumUniformStorage - 1 ||
>>> > > +  !shader_program
>>> > > ->UniformStorage[location].opaque[stage].active) {
>>> > > + assert(!"cannot return a sampler");
>>> > > + return;
>>> > > +  }
>>> > >
>>> > > -   instr->sampler_index +=
>>> > > -  shader_program->UniformStorage[location].opaque[stage].index;
>>> > > +  instr->sampler_index =
>>> > > + shader_program->UniformStorage[location].opaque[stage].index;
>>> >
>>> > Hi Rob,
>>> >
>>> > This will break arrays as instr->sampler_index is increamented inside
>>> >  calc_sampler_offsets()
>>>
>>> oh, whoops, I didn't notice that.. ok, that part is easy enough to fix..
>>>
>>> > calc_sampler_offsets() also modifies the value of location is this what
>>> > you
>>> > want? I would assume not as we are counting uniforms not just samplers
>>> > here.
>>>
>>> hmm, tbh I'm not entirely sure..  offhand, what piglit's should I
>>> check?
>>
>> tests/spec/arb_gpu_shader5/execution/sampler_array_indexing
>>
>> Contains the tests you probably want to try out.
>
> oh, hmm, looks like they all need at least gl3.2..

Not to mention ARB_gpu_shader5 :)

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFCv2 03/13] nir: allow pre-resolved sampler uniform locations

2015-11-19 Thread Rob Clark
On Mon, Nov 9, 2015 at 4:08 PM, Timothy Arceri  wrote:
> On Mon, 2015-11-09 at 07:43 -0500, Rob Clark wrote:
>> On Sun, Nov 8, 2015 at 7:58 PM, Timothy Arceri 
>> wrote:
>> > On Sun, 2015-11-08 at 15:12 -0500, Rob Clark wrote:
>> > > From: Rob Clark 
>> > >
>> > > With TGSI, the ir_variable::data.location gets fixed up to be a stage
>> > > local location (rather than program global).  In this case we need to
>> > > skip the UniformStorage[location] lookup.
>> > > ---
>> > >  src/glsl/nir/nir_lower_samplers.c | 23 ---
>> > >  1 file changed, 16 insertions(+), 7 deletions(-)
>> > >
>> > > diff --git a/src/glsl/nir/nir_lower_samplers.c
>> > > b/src/glsl/nir/nir_lower_samplers.c
>> > > index 5df79a6..d99ba4c 100644
>> > > --- a/src/glsl/nir/nir_lower_samplers.c
>> > > +++ b/src/glsl/nir/nir_lower_samplers.c
>> > > @@ -130,14 +130,18 @@ lower_sampler(nir_tex_instr *instr, const struct
>> > > gl_shader_program *shader_progr
>> > >instr->sampler_array_size = array_elements;
>> > > }
>> > >
>> > > -   if (location > shader_program->NumUniformStorage - 1 ||
>> > > -   !shader_program->UniformStorage[location].opaque[stage].active)
>> > > {
>> > > -  assert(!"cannot return a sampler");
>> > > -  return;
>> > > -   }
>> > > +   if (!shader_program) {
>> > > +  instr->sampler_index = location;
>> > > +   } else {
>> > > +  if (location > shader_program->NumUniformStorage - 1 ||
>> > > +  !shader_program
>> > > ->UniformStorage[location].opaque[stage].active) {
>> > > + assert(!"cannot return a sampler");
>> > > + return;
>> > > +  }
>> > >
>> > > -   instr->sampler_index +=
>> > > -  shader_program->UniformStorage[location].opaque[stage].index;
>> > > +  instr->sampler_index =
>> > > + shader_program->UniformStorage[location].opaque[stage].index;
>> >
>> > Hi Rob,
>> >
>> > This will break arrays as instr->sampler_index is increamented inside
>> >  calc_sampler_offsets()
>>
>> oh, whoops, I didn't notice that.. ok, that part is easy enough to fix..
>>
>> > calc_sampler_offsets() also modifies the value of location is this what
>> > you
>> > want? I would assume not as we are counting uniforms not just samplers
>> > here.
>>
>> hmm, tbh I'm not entirely sure..  offhand, what piglit's should I
>> check?
>
> tests/spec/arb_gpu_shader5/execution/sampler_array_indexing
>
> Contains the tests you probably want to try out.

oh, hmm, looks like they all need at least gl3.2..

BR,
-R

>>  I guess it would be easier to debug if it worked correctly
>> with glsl_to_tgsi, but I guess I could try to get the non-indirect
>> case working..
>>
>> BR,
>> -R
>>
>> > The other thing to note is that glsl to tgsi doesn't handle indirects on
>> > structs or arrays of arrays correctly (Ilia was trying to fix this).
>> >
>> > Tim
>> >
>> >
>> >
>> > > +   }
>> > >
>> > > instr->sampler = NULL;
>> > >  }
>> > > @@ -177,6 +181,11 @@ lower_impl(nir_function_impl *impl, const struct
>> > > gl_shader_program *shader_progr
>> > > nir_foreach_block(impl, lower_block_cb, &state);
>> > >  }
>> > >
>> > > +/* Call with a null 'shader_program' if uniform locations are
>> >
>> > uniform locations -> sampler indices?
>> >
>> > > + * already local to the shader, ie. skipping the
>> > > + * shader_program->UniformStorage[location].opaque[stage].index
>> > > + * lookup
>> > > + */
>> > >  void
>> > >  nir_lower_samplers(nir_shader *shader,
>> > > const struct gl_shader_program *shader_program)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 10/42] i965: Calculate appropriate L3 partition weights for the current pipeline state.

2015-11-19 Thread Kristian Høgsberg
On Thu, Nov 19, 2015 at 4:24 AM, Francisco Jerez  wrote:
> Kristian Høgsberg  writes:
>
>> On Tue, Nov 17, 2015 at 9:54 PM, Jordan Justen
>>  wrote:
>>> From: Francisco Jerez 
>>>
>>> This calculates a rather conservative partitioning of the L3 cache
>>> based on the shaders currently bound to the pipeline and whether they
>>> use SLM, atomics, images or scratch space.  The result is intended to
>>> be fine-tuned later on based on other pipeline state.
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_compiler.h  |  1 +
>>>  src/mesa/drivers/dri/i965/gen7_l3_state.c | 53 
>>> +++
>>>  2 files changed, 54 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
>>> b/src/mesa/drivers/dri/i965/brw_compiler.h
>>> index 8f147d3..ef8bddb 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
>>> @@ -300,6 +300,7 @@ struct brw_stage_prog_data {
>>>
>>> unsigned curb_read_length;
>>> unsigned total_scratch;
>>> +   unsigned total_shared;
>>>
>>> /**
>>>  * Register where the thread expects to find input data from the URB
>>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
>>> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> index 4d0cfcd..1a88261 100644
>>> --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> @@ -258,6 +258,59 @@ get_l3_config(const struct brw_device_info *devinfo, 
>>> struct brw_l3_weights w0)
>>>  }
>>>
>>>  /**
>>> + * Return a reasonable default L3 configuration for the specified device 
>>> based
>>> + * on whether SLM and DC are required.  In the non-SLM non-DC case the 
>>> result
>>> + * is intended to approximately resemble the hardware defaults.
>>> + */
>>> +static struct brw_l3_weights
>>> +get_default_l3_weights(const struct brw_device_info *devinfo,
>>> +   bool needs_dc, bool needs_slm)
>>> +{
>>> +   struct brw_l3_weights w = {{ 0 }};
>>> +
>>> +   w.w[L3P_SLM] = needs_slm;
>>> +   w.w[L3P_URB] = 1.0;
>>> +
>>> +   if (devinfo->gen >= 8) {
>>> +  w.w[L3P_ALL] = 1.0;
>>> +   } else {
>>> +  w.w[L3P_DC] = needs_dc ? 0.1 : 0;
>>> +  w.w[L3P_RO] = devinfo->is_baytrail ? 0.5 : 1.0;
>>> +   }
>>> +
>>> +   return norm_l3_weights(w);
>>> +}
>>> +
>>> +/**
>>> + * Calculate the desired L3 partitioning based on the current state of the
>>> + * pipeline.  For now this simply returns the conservative defaults 
>>> calculated
>>> + * by get_default_l3_weights(), but we could probably do better by 
>>> gathering
>>> + * more statistics from the pipeline state (e.g. guess of expected URB 
>>> usage
>>> + * and bound surfaces), or by using feed-back from performance counters.
>>> + */
>>> +static struct brw_l3_weights
>>> +get_pipeline_state_l3_weights(const struct brw_context *brw)
>>> +{
>>> +   const struct brw_stage_state *stage_states[] = {
>>> +  &brw->vs.base, &brw->gs.base, &brw->wm.base, &brw->cs.base
>>> +   };
>>> +   bool needs_dc = false, needs_slm = false;
>>
>> This doesn't seem optimal - we should evaluate the 3D pipe and the
>> compute pipe separately depending on which  one is active. For
>> example, if we have a current compute program that uses SLM, but are
>> using the 3D pipeline, we'll get a partition that includes SLM even
>> for the 3D pipe.
>>
> The intention of this patch is not to provide an optimal heuristic, but
> to implement a simple heuristic that calculates conservative defaults in
> order to guarantee functional correctness.  It would be possible to base
> the result on the currently active pipeline with minimal changes to this
> function (and making sure that the L3 config atom is invalidated while
> switching pipelines), but I don't think we want to switch back and forth
> between SLM and non-SLM configurations if the application interleaves
> draw and compute operations, because the L3 partitioning is global
> rather than per-pipeline and transitions are expensive -- Switching
> between L3 configuration requires a full pipeline stall and flushing and
> invalidation of all L3-backed caches, doing what you suggest would
> likely lead to cache-thrashing and might prevent pipelining of compute
> and render workloads [At least on BDW+ -- On Gen7 it might be impossible
> to achieve pipelining of render and compute workloads due to hardware
> issues].  If the result of the heuristic is based on the currently
> active pipeline some sort of hysteresis will likely be desirable, which
> in this patch I get for free by basing the calculation on the context
> state rather than on the currently active pipeline.  I agree with you
> that we'll probably need a more sophisticated heuristic in the future,
> but that falls outside the scope of this series -- If anything we need
> to be able to run benchmarks exercising CS and SLM before we can find
> out which kind of heuristic we want and what degree of hysteresis is
> desirable.

Switching between compute and 3d pipeline (P

Re: [Mesa-dev] [PATCH 3/9] i965: fix 64-bit immediates in brw_inst(_set)_bits

2015-11-19 Thread Kristian Høgsberg
On Thu, Nov 19, 2015 at 11:24 AM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 9:30 AM, Kristian Høgsberg  wrote:
>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>> wrote:
>>> From: Connor Abbott 
>>>
>>> If we tried to get/set something that was exactly 64 bits, we would
>>> try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
>>> 1's like we want.
>>>
>>> v2 (Iago)
>>>  - Replace ~0 by ~0ull
>>>  - Removed unnecessary parenthesis
>>>
>>> Reviewed-by: Iago Toral Quiroga 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_inst.h | 6 --
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
>>> b/src/mesa/drivers/dri/i965/brw_inst.h
>>> index 4ed95c4..ec08194 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_inst.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_inst.h
>>> @@ -694,7 +694,8 @@ brw_inst_bits(const brw_inst *inst, unsigned high, 
>>> unsigned low)
>>> high %= 64;
>>> low %= 64;
>>>
>>> -   const uint64_t mask = (1ull << (high - low + 1)) - 1;
>>> +   const uint64_t mask = (high - low == 63) ? ~0ull :
>>> +  (1ull << (high - low + 1)) - 1;
>>
>> Can we do
>>
>> const uint64_t mask = (~0ul >> (64 - (high - low + 1)));
>>
>> instead?
>
> I don't think so, because ~0ul is of type unsigned, so right shifting
> it shifts in zeros. I was going to make a similar comment on the
> original patch -- "-1" is preferable over ~0u with an increasingly
> long sequence of l's because it's signed, so it's sign extended to
> fill whatever you assign it to. In your code though, since it's an
> operand we'd need -1ll, I think...

No, shifting in zeros is the whole point. We start out with 64 1 bits,
then shift it down enough that we end up with (high - low + 1) 1 bits
at the bottom, which is what we're trying to compute.

Kristian

>
> So with s/~0ul/-1ll/, I think that'll work? It's all evaluated at
> compile-time in any case, so clarity is the only metric. I don't have
> a preference.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
>>> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
 On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
 wrote:
> From: Connor Abbott 
>
> It appears that not only math instructions, but also MOV_BYTES or
> any instruction that uses Align1 mode cannot be in the middle
> of a dependency control sequence or the GPU will hang (at least on my
> BDW). This fixes GPU hangs in some fp64 tests.

 I'm pretty surprised by this assessment. Doubtful even.

> Reviewed-by: Iago Toral Quiroga 
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 3bcd5cb..bc0a33b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
> vec4_instruction *inst)
> }
>
> /*
> +* Instructions that use Align1 mode cause the GPU to hang when 
> inserted
> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
> empirically.
> +*/
> +
> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||

 PACK_BYTES sets depctrl itself in the generator, and at the time I
 added it I made a test that did

   vec4 foo = vec4(packUnorm4x8(...),
   packUnorm4x8(...),
   packUnorm4x8(...),
   packUnorm4x8(...))

 and confirmed that it set depctrl properly on the whole sequence.
 There could of course be bugs somewhere, but the "hardware doesn't
 work if you mix align1 and align16 depctrl" seems unlikely.

 Do you know of a test that this affects?
>>>
>>> This only affects FP64 tests, since there we use an align1 mov to do
>>> double-to-float and float-to-double. However, I tried commenting out
>>> emit_nir_code() and just doing essentially:
>>>
>>> emit(MOV(...))->force_writemask_all = true;
>>> emit(VEC4_OPCODE_PACK_BYTES, ...);
>>> emit(MOV(...))->force_writemask_all = true;
>>>
>>> and on my BDW it hanged. In case it's not clear: this isn't about
>>> setting depctrl on the instruction itself, it just can't be inside of
>>> a depctrl sequence (which we were already disallowing for math
>>> instructions anyways).
>>
>> Very weird. I'll take a look. So I understand, are the MOV
>> instructions writing different channels of the same register? And
>> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
>> as the MOVs? (I saw your fixup reply)
>
> Actually, I had them writing the same thing so the second overwrote
> the first one.

Err, just to be clear... in the actual test that led to me discovering
this, the instructions (not mov's but a bfi and a mov IIRC) were
writing different components of the same register. In my hacked-up
test to see if align1 in between a depctrl sequence was the problem, I
just had them both write to the same thing since it was easier. In any
case, I don't think it should matter too much.

> The PACK_BYTES/MOV_BYTES/F2D/D2F (I think I tested all
> of them, not sure) were operating on completely different registers,
> and in the FP64 test that actually hung the GPU they were as well.
> Using d2f since it's simpler and I remember what the operands are
> (it's just an align1 mov with a dest stride of 2), the test code was
> something like:
>
> mov g50, g51 { no_dd_clear }
> d2f g52, g54
> mov g50, g53 { no_dd_check }
>
> and changing the d2f to a normal align16 mov or commenting it out
> prevented the hang. It would be interesting to see if a math
> instruction instead of d2f also hangs.
>
>>
>> By the way, the math code is too heavy handed as far as I know. The
>> BDW+ docs say that the MATH instruction itself cannot take dependency
>> control hints (and empirically earlier platforms seem to have problems
>> with this as well, see
>> tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a
>> math instruction being in the middle of a NoDDC* block. The person who
>> implemented the math did the minimal amount of work to fix the
>> problem.
>>
>> The PRM also says:
>>
>> """
>> Instructions other than send, may use this control as long as
>> operations that have different pipeline latencies are not mixed. The
>> operations that have longer latencies are:
>>
>> Opcodes pln, lrp, dp*.
>> Operations involving double precision computation.
>> Integer DW multiplication where both source operands are DWs.
>> """
>>
>> I would say that mixing a double-precision operation and something
>> else might cause problems, but that seems like we have a different
>> probl

Re: [Mesa-dev] [PATCH 1.9/2] i965: Add brw_imm_uv().

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 9:53 AM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 8:00 AM, Jason Ekstrand  wrote:
>> On Wed, Nov 18, 2015 at 2:25 PM, Matt Turner  wrote:
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 +++
>>>  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
>>>  2 files changed, 12 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> index e5a286a..78c10e9 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> @@ -116,6 +116,9 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, 
>>> unsigned gen)
>>>case BRW_REGISTER_TYPE_V:
>>>   brw_reg = brw_imm_v(reg->ud);
>>>   break;
>>> +  case BRW_REGISTER_TYPE_UV:
>>> + brw_reg = brw_imm_uv(reg->ud);
>>> + break;
>>>default:
>>>  unreachable("not reached");
>>>}
>>> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
>>> b/src/mesa/drivers/dri/i965/brw_reg.h
>>> index 759bd93..b77cdeb 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_reg.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
>>> @@ -623,6 +623,15 @@ brw_imm_v(unsigned v)
>>> return imm;
>>>  }
>>>
>>> +/** Construct vector of eight unsigned half-byte values */
>>> +static inline struct brw_reg
>>> +brw_imm_uv(unsigned uv)
>>> +{
>>
>> Please add a GEN assertion either here or in the generator.  This only
>> becomes available on Haswell or Broadwell if I remember correctly.  I
>> do know it's not universally available.
>
> Good idea. (It's Sandybridge+)

On second thought, I can't (without adding a devinfo parameter to this
function, which I'm not going to do)

This is something the instruction validator can be tasked with verifying.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1.9/2] i965: Add brw_imm_uv().

2015-11-19 Thread Matt Turner
On Wed, Nov 18, 2015 at 2:25 PM, Matt Turner  wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 +++
>  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
>  2 files changed, 12 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index e5a286a..78c10e9 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -116,6 +116,9 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned 
> gen)
>case BRW_REGISTER_TYPE_V:
>   brw_reg = brw_imm_v(reg->ud);
>   break;
> +  case BRW_REGISTER_TYPE_UV:
> + brw_reg = brw_imm_uv(reg->ud);
> + break;

This hunk of the patch is now unnecessary with a patch I pushed this morning.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] i965: fix 64-bit immediates in brw_inst(_set)_bits

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 9:30 AM, Kristian Høgsberg  wrote:
> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
>> From: Connor Abbott 
>>
>> If we tried to get/set something that was exactly 64 bits, we would
>> try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
>> 1's like we want.
>>
>> v2 (Iago)
>>  - Replace ~0 by ~0ull
>>  - Removed unnecessary parenthesis
>>
>> Reviewed-by: Iago Toral Quiroga 
>> ---
>>  src/mesa/drivers/dri/i965/brw_inst.h | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
>> b/src/mesa/drivers/dri/i965/brw_inst.h
>> index 4ed95c4..ec08194 100644
>> --- a/src/mesa/drivers/dri/i965/brw_inst.h
>> +++ b/src/mesa/drivers/dri/i965/brw_inst.h
>> @@ -694,7 +694,8 @@ brw_inst_bits(const brw_inst *inst, unsigned high, 
>> unsigned low)
>> high %= 64;
>> low %= 64;
>>
>> -   const uint64_t mask = (1ull << (high - low + 1)) - 1;
>> +   const uint64_t mask = (high - low == 63) ? ~0ull :
>> +  (1ull << (high - low + 1)) - 1;
>
> Can we do
>
> const uint64_t mask = (~0ul >> (64 - (high - low + 1)));
>
> instead?

I don't think so, because ~0ul is of type unsigned, so right shifting
it shifts in zeros. I was going to make a similar comment on the
original patch -- "-1" is preferable over ~0u with an increasingly
long sequence of l's because it's signed, so it's sign extended to
fill whatever you assign it to. In your code though, since it's an
operand we'd need -1ll, I think...

So with s/~0ul/-1ll/, I think that'll work? It's all evaluated at
compile-time in any case, so clarity is the only metric. I don't have
a preference.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
>> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
>>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>>> wrote:
 From: Connor Abbott 

 It appears that not only math instructions, but also MOV_BYTES or
 any instruction that uses Align1 mode cannot be in the middle
 of a dependency control sequence or the GPU will hang (at least on my
 BDW). This fixes GPU hangs in some fp64 tests.
>>>
>>> I'm pretty surprised by this assessment. Doubtful even.
>>>
 Reviewed-by: Iago Toral Quiroga 
 ---
  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
 b/src/mesa/drivers/dri/i965/brw_vec4.cpp
 index 3bcd5cb..bc0a33b 100644
 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
 @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
 vec4_instruction *inst)
 }

 /*
 +* Instructions that use Align1 mode cause the GPU to hang when 
 inserted
 +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
 empirically.
 +*/
 +
 +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
 +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>>
>>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>>> added it I made a test that did
>>>
>>>   vec4 foo = vec4(packUnorm4x8(...),
>>>   packUnorm4x8(...),
>>>   packUnorm4x8(...),
>>>   packUnorm4x8(...))
>>>
>>> and confirmed that it set depctrl properly on the whole sequence.
>>> There could of course be bugs somewhere, but the "hardware doesn't
>>> work if you mix align1 and align16 depctrl" seems unlikely.
>>>
>>> Do you know of a test that this affects?
>>
>> This only affects FP64 tests, since there we use an align1 mov to do
>> double-to-float and float-to-double. However, I tried commenting out
>> emit_nir_code() and just doing essentially:
>>
>> emit(MOV(...))->force_writemask_all = true;
>> emit(VEC4_OPCODE_PACK_BYTES, ...);
>> emit(MOV(...))->force_writemask_all = true;
>>
>> and on my BDW it hanged. In case it's not clear: this isn't about
>> setting depctrl on the instruction itself, it just can't be inside of
>> a depctrl sequence (which we were already disallowing for math
>> instructions anyways).
>
> Very weird. I'll take a look. So I understand, are the MOV
> instructions writing different channels of the same register? And
> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
> as the MOVs? (I saw your fixup reply)

Actually, I had them writing the same thing so the second overwrote
the first one. The PACK_BYTES/MOV_BYTES/F2D/D2F (I think I tested all
of them, not sure) were operating on completely different registers,
and in the FP64 test that actually hung the GPU they were as well.
Using d2f since it's simpler and I remember what the operands are
(it's just an align1 mov with a dest stride of 2), the test code was
something like:

mov g50, g51 { no_dd_clear }
d2f g52, g54
mov g50, g53 { no_dd_check }

and changing the d2f to a normal align16 mov or commenting it out
prevented the hang. It would be interesting to see if a math
instruction instead of d2f also hangs.

>
> By the way, the math code is too heavy handed as far as I know. The
> BDW+ docs say that the MATH instruction itself cannot take dependency
> control hints (and empirically earlier platforms seem to have problems
> with this as well, see
> tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a
> math instruction being in the middle of a NoDDC* block. The person who
> implemented the math did the minimal amount of work to fix the
> problem.
>
> The PRM also says:
>
> """
> Instructions other than send, may use this control as long as
> operations that have different pipeline latencies are not mixed. The
> operations that have longer latencies are:
>
> Opcodes pln, lrp, dp*.
> Operations involving double precision computation.
> Integer DW multiplication where both source operands are DWs.
> """
>
> I would say that mixing a double-precision operation and something
> else might cause problems, but that seems like we have a different
> problem thus far.

Yeah, these are all just mov's so I would expect that section to
apply. It still seems like we're not taking it into account, though...
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Add test for sorted extension table

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 10:44 AM, Nanley Chery  wrote:
> From: Nanley Chery 
>
> Enable developers to know if the table's alphabetical sorting
> is maintained or lost.
>
> v2: Move "*" next to pointer name (Matt)
> Include extensions_table.h instead of extensions.h (Ian)
> Remove extra " *" in comment (Ian)
>
> Signed-off-by: Nanley Chery 
> ---

Looks good.

Reviewed-by: Matt Turner 

Thanks Nanley!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 19/42] glsl ubo/ssbo: Move common code into lower_buffer_access::setup_buffer_access

2015-11-19 Thread Jordan Justen
On 2015-11-18 00:00:08, Iago Toral wrote:
> On Tue, 2015-11-17 at 21:54 -0800, Jordan Justen wrote:
> > This code will also be usable by the pass to lower shared variables.
> > 
> > Note, that *const_offset is adjusted by setup_buffer_access so it must
> > be initialized before calling setup_buffer_access.
> > 
> > v2:
> >  * Add comment for lower_buffer_access::setup_buffer_access
> > 
> > Signed-off-by: Jordan Justen 
> > Cc: Samuel Iglesias Gonsalvez 
> > Cc: Iago Toral Quiroga 
> > Reviewed-by: Iago Toral Quiroga 
> > ---
> >  src/glsl/lower_buffer_access.cpp | 177 
> > +++
> >  src/glsl/lower_buffer_access.h   |   5 ++
> >  src/glsl/lower_ubo_reference.cpp | 160 +--
> >  3 files changed, 185 insertions(+), 157 deletions(-)
> > 
> > diff --git a/src/glsl/lower_buffer_access.cpp 
> > b/src/glsl/lower_buffer_access.cpp
> > index b5fe6e3..297ed69 100644
> > --- a/src/glsl/lower_buffer_access.cpp
> > +++ b/src/glsl/lower_buffer_access.cpp
> > @@ -305,4 +305,181 @@ 
> > lower_buffer_access::is_dereferenced_thing_row_major(const ir_rvalue *deref)
> > return false;
> >  }
> >  
> > +/**
> > + * This function initializes various values that will be used later by
> > + * emit_access when actually emitting loads or stores.
> > + *
> > + * Note: const_offset is an input as well as an output. For UBO and SSBO, 
> > the
> > + * caller should initialize it to 0 to point to the start of the buffer
> > + * object. For compute shader shared variables it will be initialized to 
> > the
> > + * offset of variable in the shared variable storage block.
> 
> I think this is not true, your version changes UBO and SSBO to behave
> just like shader variables since you no longer ever update *const_offset
> when you find an ir_type_dereference_variable node. Instead, you always
> initialize *const_offset to the value of ubo_var->Offset in
> setup_for_load_or_store. I think you can just rewrite the note comment
> above as:
> 
> "const_offset is an input as well as an output, clients must initialize
> it to the offset of the variable in the underlying block, and this
> function will adjust it by adding the constant offset of the member
> being accessed into that variable"

Yeah, that sounds good. I'll use your version.

Thanks,

-Jordan

> 
> > + */
> > +void
> > +lower_buffer_access::setup_buffer_access(void *mem_ctx,
> > + ir_variable *var,
> > + ir_rvalue *deref,
> > + ir_rvalue **offset,
> > + unsigned *const_offset,
> > + bool *row_major,
> > + int *matrix_columns,
> > + unsigned packing)
> > +{
> > +   *offset = new(mem_ctx) ir_constant(0u);
> > +   *row_major = is_dereferenced_thing_row_major(deref);
> > +   *matrix_columns = 1;
> > +
> > +   /* Calculate the offset to the start of the region of the UBO
> > +* dereferenced by *rvalue.  This may be a variable offset if an
> > +* array dereference has a variable index.
> > +*/
> > +   while (deref) {
> > +  switch (deref->ir_type) {
> > +  case ir_type_dereference_variable: {
> > + deref = NULL;
> > + break;
> > +  }
> > +
> > +  case ir_type_dereference_array: {
> > + ir_dereference_array *deref_array = (ir_dereference_array *) 
> > deref;
> > + unsigned array_stride;
> > + if (deref_array->array->type->is_vector()) {
> > +/* We get this when storing or loading a component out of a 
> > vector
> > + * with a non-constant index. This happens for v[i] = f where 
> > v is
> > + * a vector (or m[i][j] = f where m is a matrix). If we don't
> > + * lower that here, it gets turned into v = vector_insert(v, i,
> > + * f), which loads the entire vector, modifies one component 
> > and
> > + * then write the entire thing back.  That breaks if another
> > + * thread or SIMD channel is modifying the same vector.
> > + */
> > +array_stride = 4;
> > +if (deref_array->array->type->is_double())
> > +   array_stride *= 2;
> > + } else if (deref_array->array->type->is_matrix() && *row_major) {
> > +/* When loading a vector out of a row major matrix, the
> > + * step between the columns (vectors) is the size of a
> > + * float, while the step between the rows (elements of a
> > + * vector) is handled below in emit_ubo_loads.
> > + */
> > +array_stride = 4;
> > +if (deref_array->array->type->is_double())
> > +   array_stride *= 2;
> > +*matrix_columns = deref_array->array->type->matrix_columns;
> > + } else if (deref_array->type->w

Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>> wrote:
>>> From: Connor Abbott 
>>>
>>> It appears that not only math instructions, but also MOV_BYTES or
>>> any instruction that uses Align1 mode cannot be in the middle
>>> of a dependency control sequence or the GPU will hang (at least on my
>>> BDW). This fixes GPU hangs in some fp64 tests.
>>
>> I'm pretty surprised by this assessment. Doubtful even.
>>
>>> Reviewed-by: Iago Toral Quiroga 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index 3bcd5cb..bc0a33b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>>> vec4_instruction *inst)
>>> }
>>>
>>> /*
>>> +* Instructions that use Align1 mode cause the GPU to hang when inserted
>>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>>> empirically.
>>> +*/
>>> +
>>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>
>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>> added it I made a test that did
>>
>>   vec4 foo = vec4(packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...))
>>
>> and confirmed that it set depctrl properly on the whole sequence.
>> There could of course be bugs somewhere, but the "hardware doesn't
>> work if you mix align1 and align16 depctrl" seems unlikely.
>>
>> Do you know of a test that this affects?
>
> This only affects FP64 tests, since there we use an align1 mov to do
> double-to-float and float-to-double. However, I tried commenting out
> emit_nir_code() and just doing essentially:
>
> emit(MOV(...))->force_writemask_all = true;
> emit(VEC4_OPCODE_PACK_BYTES, ...);
> emit(MOV(...))->force_writemask_all = true;
>
> and on my BDW it hanged. In case it's not clear: this isn't about
> setting depctrl on the instruction itself, it just can't be inside of
> a depctrl sequence (which we were already disallowing for math
> instructions anyways).

Very weird. I'll take a look. So I understand, are the MOV
instructions writing different channels of the same register? And
VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
as the MOVs? (I saw your fixup reply)

By the way, the math code is too heavy handed as far as I know. The
BDW+ docs say that the MATH instruction itself cannot take dependency
control hints (and empirically earlier platforms seem to have problems
with this as well, see
tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a
math instruction being in the middle of a NoDDC* block. The person who
implemented the math did the minimal amount of work to fix the
problem.

The PRM also says:

"""
Instructions other than send, may use this control as long as
operations that have different pipeline latencies are not mixed. The
operations that have longer latencies are:

Opcodes pln, lrp, dp*.
Operations involving double precision computation.
Integer DW multiplication where both source operands are DWs.
"""

I would say that mixing a double-precision operation and something
else might cause problems, but that seems like we have a different
problem thus far.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: Add test for sorted extension table

2015-11-19 Thread Nanley Chery
From: Nanley Chery 

Enable developers to know if the table's alphabetical sorting
is maintained or lost.

v2: Move "*" next to pointer name (Matt)
Include extensions_table.h instead of extensions.h (Ian)
Remove extra " *" in comment (Ian)

Signed-off-by: Nanley Chery 
---

Does the original Rb still apply?

 src/mesa/main/tests/Makefile.am |  1 +
 src/mesa/main/tests/mesa_extensions.cpp | 51 +
 2 files changed, 52 insertions(+)
 create mode 100644 src/mesa/main/tests/mesa_extensions.cpp

diff --git a/src/mesa/main/tests/Makefile.am b/src/mesa/main/tests/Makefile.am
index bd7ab73..d6977e2 100644
--- a/src/mesa/main/tests/Makefile.am
+++ b/src/mesa/main/tests/Makefile.am
@@ -27,6 +27,7 @@ AM_CPPFLAGS += -DHAVE_SHARED_GLAPI
 main_test_SOURCES +=   \
dispatch_sanity.cpp \
mesa_formats.cpp\
+   mesa_extensions.cpp \
program_state_string.cpp
 
 main_test_LDADD += \
diff --git a/src/mesa/main/tests/mesa_extensions.cpp 
b/src/mesa/main/tests/mesa_extensions.cpp
new file mode 100644
index 000..0c7addd
--- /dev/null
+++ b/src/mesa/main/tests/mesa_extensions.cpp
@@ -0,0 +1,51 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \name mesa_extensions.cpp
+ *
+ * Verify that the extensions table is sorted.
+ */
+
+#include 
+#include "util/macros.h"
+
+/**
+ * Debug/test: verify the extension table is alphabetically sorted.
+ */
+TEST(MesaExtensionsTest, AlphabeticallySorted)
+{
+   const char *ext_names[] = {
+   #define EXT(name_str, ...) #name_str,
+   #include "main/extensions_table.h"
+   #undef EXT
+   };
+
+   for (unsigned i = 0; i < ARRAY_SIZE(ext_names) - 1; ++i) {
+  const char *current_str = ext_names[i];
+  const char *next_str = ext_names[i+1];
+
+  /* We expect the extension table to be alphabetically sorted */
+  ASSERT_LT(strcmp(current_str, next_str), 0);
+   }
+}
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1.9/2] i965: Add brw_imm_uv().

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 8:00 AM, Jason Ekstrand  wrote:
> On Wed, Nov 18, 2015 at 2:25 PM, Matt Turner  wrote:
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 +++
>>  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
>>  2 files changed, 12 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> index e5a286a..78c10e9 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> @@ -116,6 +116,9 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned 
>> gen)
>>case BRW_REGISTER_TYPE_V:
>>   brw_reg = brw_imm_v(reg->ud);
>>   break;
>> +  case BRW_REGISTER_TYPE_UV:
>> + brw_reg = brw_imm_uv(reg->ud);
>> + break;
>>default:
>>  unreachable("not reached");
>>}
>> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
>> b/src/mesa/drivers/dri/i965/brw_reg.h
>> index 759bd93..b77cdeb 100644
>> --- a/src/mesa/drivers/dri/i965/brw_reg.h
>> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
>> @@ -623,6 +623,15 @@ brw_imm_v(unsigned v)
>> return imm;
>>  }
>>
>> +/** Construct vector of eight unsigned half-byte values */
>> +static inline struct brw_reg
>> +brw_imm_uv(unsigned uv)
>> +{
>
> Please add a GEN assertion either here or in the generator.  This only
> becomes available on Haswell or Broadwell if I remember correctly.  I
> do know it's not universally available.

Good idea. (It's Sandybridge+)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] nouveau: move interlaced assert down in nouveau_vp3_video_buffer_create

2015-11-19 Thread Ilia Mirkin
On Thu, Nov 19, 2015 at 4:37 AM, Julien Isorce  wrote:
> templat->interlaced is 0 if not NV12 which is the case currently
> when using VPP.
>
> Signed-off-by: Julien Isorce 
> ---
>  src/gallium/drivers/nouveau/nouveau_vp3_video.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/nouveau_vp3_video.c 
> b/src/gallium/drivers/nouveau/nouveau_vp3_video.c
> index f3a64b2..411b853 100644
> --- a/src/gallium/drivers/nouveau/nouveau_vp3_video.c
> +++ b/src/gallium/drivers/nouveau/nouveau_vp3_video.c
> @@ -83,10 +83,10 @@ nouveau_vp3_video_buffer_create(struct pipe_context *pipe,
> struct pipe_sampler_view sv_templ;
> struct pipe_surface surf_templ;
>
> -   assert(templat->interlaced);
> if (getenv("XVMC_VL") || templat->buffer_format != PIPE_FORMAT_NV12)
>return vl_video_buffer_create(pipe, templat);
>
> +   assert(templat->interlaced);
> assert(templat->chroma_format == PIPE_VIDEO_CHROMA_FORMAT_420);
>
> buffer = CALLOC_STRUCT(nouveau_vp3_video_buffer);
> --

Makes sense. Looks like this is (roughly) how I did it in
nv84_video_buffer_create too -- i.e. bail first if !NV12 and then
check interlaced.

Reviewed-by: Ilia Mirkin 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] i965: fix 64-bit immediates in brw_inst(_set)_bits

2015-11-19 Thread Ilia Mirkin
On Thu, Nov 19, 2015 at 12:30 PM, Kristian Høgsberg  wrote:
> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
>> From: Connor Abbott 
>>
>> If we tried to get/set something that was exactly 64 bits, we would
>> try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
>> 1's like we want.
>>
>> v2 (Iago)
>>  - Replace ~0 by ~0ull
>>  - Removed unnecessary parenthesis
>>
>> Reviewed-by: Iago Toral Quiroga 
>> ---
>>  src/mesa/drivers/dri/i965/brw_inst.h | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
>> b/src/mesa/drivers/dri/i965/brw_inst.h
>> index 4ed95c4..ec08194 100644
>> --- a/src/mesa/drivers/dri/i965/brw_inst.h
>> +++ b/src/mesa/drivers/dri/i965/brw_inst.h
>> @@ -694,7 +694,8 @@ brw_inst_bits(const brw_inst *inst, unsigned high, 
>> unsigned low)
>> high %= 64;
>> low %= 64;
>>
>> -   const uint64_t mask = (1ull << (high - low + 1)) - 1;
>> +   const uint64_t mask = (high - low == 63) ? ~0ull :
>> +  (1ull << (high - low + 1)) - 1;
>
> Can we do
>
> const uint64_t mask = (~0ul >> (64 - (high - low + 1)));
>
> instead?
>
> Kristian
>
>> return (inst->data[word] >> low) & mask;
>>  }
>> @@ -713,7 +714,8 @@ brw_inst_set_bits(brw_inst *inst, unsigned high, 
>> unsigned low, uint64_t value)
>> high %= 64;
>> low %= 64;
>>
>> -   const uint64_t mask = ((1ull << (high - low + 1)) - 1) << low;
>> +   const uint64_t mask = (high - low == 63) ? ~0ull :
>> +  ((1ull << (high - low + 1)) - 1) << low;

Or if you feel like reusing exising logic, there's
BITFIELD64_MASK/BITFIELD64_RANGE in mtypes.h [which does it the  ? :
way... could be fixed though].

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8 v2] i965: Enable EXT_shader_samples_identical

2015-11-19 Thread Neil Roberts
Jason Ekstrand  writes:

> On Nov 18, 2015 6:38 PM, "Ian Romanick"  wrote:
>>
>> From: Ian Romanick 
>>
>> v2: Handle immediate value for MCS smarter.  Rebase on changes to
>> nir_texop_sampels_identical (missing second parameter).  Suggested by
>> Jason.  This still doesn't handle the 16x MSAA case.

16x MSAA has a field in the program sampler key because it needs a new
instruction to fetch from the texture. For 16x MSAA it just uses two of
the registers that are returned by the MCS fetch instruction instead of
just one. Maybe you could do something like this:

  if (mcs.file == BRW_IMMEDIATE_VALUE) {
 bld.MOV(dst, fs_reg(0));
  } else if ((key_tex->msaa_16 & (1 << sampler))) {
 fs_reg tmp = vgrf(glsl_type::uint_type);
 bld.OR(tmp, mcs, offset(mcs, bld, 1));
 bld.CMP(dst, tmp, src_reg(0u), BRW_CONDITIONAL_EQ);
  } else {
 bld.CMP(dst, mcs, src_reg(0u), BRW_CONDITIONAL_EQ);
  }

I tested this on my SKL and it passes all of your Piglit tests. The
optimiser quite neatly removes the second instruction and the temporary
register in the 16x case and makes the OR directly update the flag
registers, so it is possibly the same cost as the 8x case in practice.

I can confirm that I've already pushed the 16x MSAA patches to master.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] automake: loader: rework the CPPFLAGS

2015-11-19 Thread Kai Wasserbäch
Hey Emil,
you can have my
  Tested-by: Kai Wasserbäch 
for the series.

I just did a build (amd64 and i386) and didn't see the additional symbols show
up again.

Thanks for taking care of this!

Cheers,
Kai


Emil Velikov wrote on 19.11.2015 17:33:
> From: Emil Velikov 
> 
> Rather than duplicating things, just use the generic AM_CPPFLAGS. This
> has the fortunate side-effect of adding VISIBILITY_CFLAGS for the dri3
> helper. The latter of which was erroneously exposing some internal
> symbols.
> 
> Cc: Kai Wasserbäch 
> Reported-by: Kai Wasserbäch 
> Signed-off-by: Emil Velikov 
> ---
>  src/loader/Makefile.am | 15 ---
>  1 file changed, 4 insertions(+), 11 deletions(-)
> 
> diff --git a/src/loader/Makefile.am b/src/loader/Makefile.am
> index c0f7947..67ed776 100644
> --- a/src/loader/Makefile.am
> +++ b/src/loader/Makefile.am
> @@ -25,18 +25,20 @@ EXTRA_DIST = SConscript
>  
>  noinst_LTLIBRARIES = libloader.la libloader_dri3_helper.la
>  
> -libloader_la_CPPFLAGS = \
> +AM_CPPFLAGS = \
>   $(DEFINES) \
>   -I$(top_srcdir)/include \
>   -I$(top_srcdir)/src \
>   $(VISIBILITY_CFLAGS) \
> + $(LIBDRM_CFLAGS) \
>   $(LIBUDEV_CFLAGS)
>  
>  libloader_la_SOURCES = $(LOADER_C_FILES)
>  libloader_la_LIBADD =
>  
>  if HAVE_DRICOMMON
> -libloader_la_CPPFLAGS += \
> +libloader_la_CPPFLAGS = \
> + $(AM_CPPFLAGS) \
>   -I$(top_srcdir)/src/mesa/drivers/dri/common/ \
>   -I$(top_builddir)/src/mesa/drivers/dri/common/ \
>   -I$(top_srcdir)/src/mesa/ \
> @@ -49,20 +51,11 @@ libloader_la_CPPFLAGS += \
>  endif
>  
>  if HAVE_LIBDRM
> -libloader_la_CPPFLAGS += \
> - $(LIBDRM_CFLAGS)
> -
>  libloader_la_LIBADD += \
>   $(LIBDRM_LIBS)
>  endif
>  
>  if HAVE_DRI3
> -libloader_dri3_helper_la_CPPFLAGS = \
> - $(DEFINES) \
> - -I$(top_srcdir)/include \
> - -I$(top_srcdir)/src \
> - $(LIBDRM_CFLAGS)
> -
>  libloader_dri3_helper_la_SOURCES = \
>   loader_dri3_helper.c \
>   loader_dri3_helper.h
> 

-- 

Kai Wasserbäch (Kai Wasserbaech)

E-Mail: k...@dev.carbon-project.org



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] i965: fix 64-bit immediates in brw_inst(_set)_bits

2015-11-19 Thread Kristian Høgsberg
On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
> From: Connor Abbott 
>
> If we tried to get/set something that was exactly 64 bits, we would
> try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
> 1's like we want.
>
> v2 (Iago)
>  - Replace ~0 by ~0ull
>  - Removed unnecessary parenthesis
>
> Reviewed-by: Iago Toral Quiroga 
> ---
>  src/mesa/drivers/dri/i965/brw_inst.h | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
> b/src/mesa/drivers/dri/i965/brw_inst.h
> index 4ed95c4..ec08194 100644
> --- a/src/mesa/drivers/dri/i965/brw_inst.h
> +++ b/src/mesa/drivers/dri/i965/brw_inst.h
> @@ -694,7 +694,8 @@ brw_inst_bits(const brw_inst *inst, unsigned high, 
> unsigned low)
> high %= 64;
> low %= 64;
>
> -   const uint64_t mask = (1ull << (high - low + 1)) - 1;
> +   const uint64_t mask = (high - low == 63) ? ~0ull :
> +  (1ull << (high - low + 1)) - 1;

Can we do

const uint64_t mask = (~0ul >> (64 - (high - low + 1)));

instead?

Kristian

> return (inst->data[word] >> low) & mask;
>  }
> @@ -713,7 +714,8 @@ brw_inst_set_bits(brw_inst *inst, unsigned high, unsigned 
> low, uint64_t value)
> high %= 64;
> low %= 64;
>
> -   const uint64_t mask = ((1ull << (high - low + 1)) - 1) << low;
> +   const uint64_t mask = (high - low == 63) ? ~0ull :
> +  ((1ull << (high - low + 1)) - 1) << low;
>
> /* Make sure the supplied value actually fits in the given bitfield. */
> assert((value & (mask >> low)) == value);
> --
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] automake: loader: don't create an empty dri3 helper

2015-11-19 Thread Kristian Høgsberg
On Thu, Nov 19, 2015 at 8:33 AM, Emil Velikov  wrote:
> From: Emil Velikov 
>
> Seems that creating an empty one does not fair too well with MacOSX's
> ar. Considering that all the users of the helper include it only when
> needed, let's reshuffle the makefile.

For the series,

Reviewed-by: Kristian Høgsberg 


> Cc: Vinson Lee 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92985
> Signed-off-by: Emil Velikov 
> ---
>  src/loader/Makefile.am | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/loader/Makefile.am b/src/loader/Makefile.am
> index 5daa42e..5021120 100644
> --- a/src/loader/Makefile.am
> +++ b/src/loader/Makefile.am
> @@ -23,7 +23,7 @@ include Makefile.sources
>
>  EXTRA_DIST = SConscript
>
> -noinst_LTLIBRARIES = libloader.la libloader_dri3_helper.la
> +noinst_LTLIBRARIES = libloader.la
>
>  AM_CPPFLAGS = \
> $(DEFINES) \
> @@ -57,6 +57,8 @@ libloader_la_LIBADD += \
>  endif
>
>  if HAVE_DRI3
> +noinst_LTLIBRARIES += libloader_dri3_helper.la
> +
>  libloader_dri3_helper_la_SOURCES = \
> loader_dri3_helper.c \
> loader_dri3_helper.h
> --
> 2.6.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now

2015-11-19 Thread Nicolai Hähnle

On 19.11.2015 17:37, Tom Stellard wrote:

On Thu, Nov 19, 2015 at 11:17:12AM +0100, Nicolai Hähnle wrote:

On 19.11.2015 03:55, Tom Stellard wrote:

On Thu, Nov 19, 2015 at 11:31:55AM +0900, Michel Dänzer wrote:

From: Michel Dänzer 

llvm.exp2.f32 doesn't work in some cases yet.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709
Signed-off-by: Michel Dänzer 
---

Once the problem is fixed in the LLVM AMDGPU backend, we can re-enable
llvm.exp2.f32 for the fixed LLVM version.

  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index ac99e73..c94f109 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1539,7 +1539,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_ENDIF].emit = endif_emit;
bld_base->op_actions[TGSI_OPCODE_ENDLOOP].emit = endloop_emit;
bld_base->op_actions[TGSI_OPCODE_EX2].emit = build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.exp2.f32";
+   bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.AMDIL.exp.";


Do we want a native instruction here, or do we want IEEE precise exp2?
If it's the former then we shouldn't be using llvm.exp2.f32 anyway.

I know that we need to use llvm.AMDIL.exp. for older LLVM, but for newer
LLVM, I would really like to start doing intrinsics the correct way.  In
this case, it means adding an llvm.amdgcn.exp.f32 intrinsic to
include/llvm/IR/IntrinsicsAMDGPU.td.  In the section with the amdgcn
TargetPrefix.


Doesn't AMDGPU currently emit native instructions for the various
math intrinsics anyway? If your plan is to change that and we add
our own intrinsics, what's the plan to still benefit from existing
LLVM optimization passes?



AMDGPU does emit native instruction for the math intrinsics, but this is
wrong in most cases, because it assumes that a less precise calculation is
acceptable when this is not always the case.


Okay, that makes sense.


This a good point about optimizations.  I think the main benefits we get from
using the LLVM intrinsic are constant folding and also speculative execution.

An alternate solution would be to use the llvm intrinsic, but then communicate
to the backend via a function attribute that it is OK to use the less precise
version of exp.


I like that alternate solution.


FWIW, the original bug that Michel addresses here is caused by LLVM
libcall optimizations replacing the exp2 _intrinsic_ by a ldexp
_libcall_, which AMDGPU codegen cannot deal with. I suggested to
address this with http://reviews.llvm.org/D14327. Could you take a
look at that?



Please take a look at the comment I added to this review.


Done :)

Cheers,
Nicolai



-Tom


Cheers,
Nicolai


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] nv50: allow using inline vertex data submit when gl_VertexID is used

2015-11-19 Thread Samuel Pitoiset



On 11/19/2015 06:02 PM, Ilia Mirkin wrote:

On Thu, Nov 19, 2015 at 12:01 PM, Samuel Pitoiset
 wrote:

On 11/19/2015 05:54 PM, Ilia Mirkin wrote:


This appears to all be correct. I assume you ran piglit on this and
tried a few (esp older) games? I don't know of any offhand that hit
the nv50_push logic, but I'd feel more at ease if you just tried a
couple.



Yes, I did a full piglit run, there are no regressions but I didn't test any
games. Which ones do you want me to test ?


Just one or two that you have on-hand. Nothing in particular. But when
making changes I'm not 100% sure of, it's nice to double-check that
you didn't accidentally the whole thing.


Sure, I can test with xonotic-glx. I should ask for a steam account to 
have more games... I'm not a gamer, but this is going to be very useful. ;)




   -ilia



--
-Samuel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] nv50: allow using inline vertex data submit when gl_VertexID is used

2015-11-19 Thread Ilia Mirkin
On Thu, Nov 19, 2015 at 12:01 PM, Samuel Pitoiset
 wrote:
> On 11/19/2015 05:54 PM, Ilia Mirkin wrote:
>>
>> This appears to all be correct. I assume you ran piglit on this and
>> tried a few (esp older) games? I don't know of any offhand that hit
>> the nv50_push logic, but I'd feel more at ease if you just tried a
>> couple.
>>
>
> Yes, I did a full piglit run, there are no regressions but I didn't test any
> games. Which ones do you want me to test ?

Just one or two that you have on-hand. Nothing in particular. But when
making changes I'm not 100% sure of, it's nice to double-check that
you didn't accidentally the whole thing.

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] nv50: allow using inline vertex data submit when gl_VertexID is used

2015-11-19 Thread Samuel Pitoiset



On 11/19/2015 05:54 PM, Ilia Mirkin wrote:

This appears to all be correct. I assume you ran piglit on this and
tried a few (esp older) games? I don't know of any offhand that hit
the nv50_push logic, but I'd feel more at ease if you just tried a
couple.



Yes, I did a full piglit run, there are no regressions but I didn't test 
any games. Which ones do you want me to test ?



Reviewed-by: Ilia Mirkin 

On Thu, Nov 19, 2015 at 3:51 AM, Samuel Pitoiset
 wrote:

The hardware can actually generates vertexid when vertices come from
a client-side buffer like when glDrawElements is used.

This doesn't fix (or break) any piglit tests but it improves the
previous attempt of Ilia (c830d19 "nv50: avoid using inline vertex
data submit when gl_VertexID is used")

The only disadvantage is that only works on G84+, but we don't really
care of that weird and old NV50 chipset.

Changes from v2:
  - use NV84_3D previously introduced
  - reset gl_VertexID after the draw is done

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/nouveau/nv50/nv50_program.c|  3 +-
  src/gallium/drivers/nouveau/nv50/nv50_program.h|  2 +-
  src/gallium/drivers/nouveau/nv50/nv50_push.c   | 42 +-
  .../drivers/nouveau/nv50/nv50_state_validate.c |  3 +-
  src/gallium/drivers/nouveau/nv50/nv50_vbo.c| 11 +-
  5 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c 
b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 48057d2..a4b8ddf 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -66,7 +66,6 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
case TGSI_SEMANTIC_VERTEXID:
   prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID;
   prog->vp.attrs[2] |= 
NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID_DRAW_ARRAYS_ADD_START;
- prog->vp.vertexid = 1;
   continue;
default:
   break;
@@ -383,6 +382,8 @@ nv50_program_translate(struct nv50_program *prog, uint16_t 
chipset,
 prog->max_gpr = MAX2(4, (info->bin.maxGPR >> 1) + 1);
 prog->tls_space = info->bin.tlsSpace;

+   prog->vp.need_vertex_id = info->io.vertexId < PIPE_MAX_SHADER_INPUTS;
+
 if (prog->type == PIPE_SHADER_FRAGMENT) {
if (info->prop.fp.writesDepth) {
   prog->fp.flags[0] |= NV50_3D_FP_CONTROL_EXPORTS_Z;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h 
b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index f001670..1de5122 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -76,9 +76,9 @@ struct nv50_program {
ubyte psiz;/* output slot of point size */
ubyte bfc[2];  /* indices into varying for FFC (FP) or BFC (VP) */
ubyte edgeflag;
-  ubyte vertexid;
ubyte clpd[2]; /* output slot of clip distance[i]'s 1st component */
ubyte clpd_nr;
+  bool need_vertex_id;
 } vp;

 struct {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_push.c 
b/src/gallium/drivers/nouveau/nv50/nv50_push.c
index f31eaa0..cbef95d 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_push.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_push.c
@@ -24,6 +24,10 @@ struct push_context {
 struct translate *translate;

 bool primitive_restart;
+
+   bool need_vertex_id;
+   int32_t index_bias;
+
 uint32_t prim;
 uint32_t restart_index;
 uint32_t instance_id;
@@ -74,6 +78,11 @@ emit_vertices_i08(struct push_context *ctx, unsigned start, 
unsigned count)

size = ctx->vertex_words * nr;

+  if (unlikely(ctx->need_vertex_id)) {
+ BEGIN_NV04(ctx->push, NV84_3D(VERTEX_ID_BASE), 1);
+ PUSH_DATA (ctx->push, *elts + ctx->index_bias);
+  }
+
BEGIN_NI04(ctx->push, NV50_3D(VERTEX_DATA), size);

ctx->translate->run_elts8(ctx->translate, elts, nr, 0, ctx->instance_id,
@@ -107,6 +116,11 @@ emit_vertices_i16(struct push_context *ctx, unsigned 
start, unsigned count)

size = ctx->vertex_words * nr;

+  if (unlikely(ctx->need_vertex_id)) {
+ BEGIN_NV04(ctx->push, NV84_3D(VERTEX_ID_BASE), 1);
+ PUSH_DATA (ctx->push, *elts + ctx->index_bias);
+  }
+
BEGIN_NI04(ctx->push, NV50_3D(VERTEX_DATA), size);

ctx->translate->run_elts16(ctx->translate, elts, nr, 0, 
ctx->instance_id,
@@ -140,6 +154,11 @@ emit_vertices_i32(struct push_context *ctx, unsigned 
start, unsigned count)

size = ctx->vertex_words * nr;

+  if (unlikely(ctx->need_vertex_id)) {
+ BEGIN_NV04(ctx->push, NV84_3D(VERTEX_ID_BASE), 1);
+ PUSH_DATA (ctx->push, *elts + ctx->index_bias);
+  }
+
BEGIN_NI04(ctx->push, NV50_3D(VERTEX_DATA), size);

ctx->translate->run_elts(ctx->translate, elts, nr, 0, ctx->instance_id,
@@ -161,10 +180,18 @@ emit_vertices_i32(struct push_context *ctx, unsigned 
start, unsigned co

Re: [Mesa-dev] [PATCH 1/2] nv50: add NV84_3D macro

2015-11-19 Thread Samuel Pitoiset



On 11/19/2015 05:44 PM, Ilia Mirkin wrote:

Reviewed-by: Ilia Mirkin 

Thanks for touching up the existing uses of this :) I was going to ask
you to do it, pleasantly surprised that you did it without asking.


I'm glad to hear that from you. :)



On Thu, Nov 19, 2015 at 3:51 AM, Samuel Pitoiset
 wrote:

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +-
  src/gallium/drivers/nouveau/nv50/nv50_vbo.c| 4 ++--
  src/gallium/drivers/nouveau/nv50/nv50_winsys.h | 1 +
  3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 4e7201d..cc7984d 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -686,7 +686,7 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
 BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
 PUSH_DATA (push, 0);
 if (screen->base.class_3d >= NV84_3D_CLASS) {
-  BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
+  BEGIN_NV04(push, NV84_3D(VERTEX_ID_BASE), 1);
PUSH_DATA (push, 0);
 }

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c 
b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
index 9aa593f..ac0c4d9 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
@@ -487,7 +487,7 @@ nv50_draw_arrays(struct nv50_context *nv50,
BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
PUSH_DATA (push, 0);
if (nv50->screen->base.class_3d >= NV84_3D_CLASS) {
- BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
+ BEGIN_NV04(push, NV84_3D(VERTEX_ID_BASE), 1);
   PUSH_DATA (push, 0);
}
nv50->state.index_bias = 0;
@@ -613,7 +613,7 @@ nv50_draw_elements(struct nv50_context *nv50, bool shorten,
BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
PUSH_DATA (push, index_bias);
if (nv50->screen->base.class_3d >= NV84_3D_CLASS) {
- BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
+ BEGIN_NV04(push, NV84_3D(VERTEX_ID_BASE), 1);
   PUSH_DATA (push, index_bias);
}
nv50->state.index_bias = index_bias;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_winsys.h 
b/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
index 76f1b41..6800230 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
@@ -49,6 +49,7 @@ PUSH_REFN(struct nouveau_pushbuf *push, struct nouveau_bo 
*bo, uint32_t flags)

  #define SUBC_3D(m) 3, (m)
  #define NV50_3D(n) SUBC_3D(NV50_3D_##n)
+#define NV84_3D(n) SUBC_3D(NV84_3D_##n)
  #define NVA0_3D(n) SUBC_3D(NVA0_3D_##n)

  #define SUBC_2D(m) 4, (m)
--
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


--
-Samuel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] nv50: allow using inline vertex data submit when gl_VertexID is used

2015-11-19 Thread Ilia Mirkin
This appears to all be correct. I assume you ran piglit on this and
tried a few (esp older) games? I don't know of any offhand that hit
the nv50_push logic, but I'd feel more at ease if you just tried a
couple.

Reviewed-by: Ilia Mirkin 

On Thu, Nov 19, 2015 at 3:51 AM, Samuel Pitoiset
 wrote:
> The hardware can actually generates vertexid when vertices come from
> a client-side buffer like when glDrawElements is used.
>
> This doesn't fix (or break) any piglit tests but it improves the
> previous attempt of Ilia (c830d19 "nv50: avoid using inline vertex
> data submit when gl_VertexID is used")
>
> The only disadvantage is that only works on G84+, but we don't really
> care of that weird and old NV50 chipset.
>
> Changes from v2:
>  - use NV84_3D previously introduced
>  - reset gl_VertexID after the draw is done
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/nv50/nv50_program.c|  3 +-
>  src/gallium/drivers/nouveau/nv50/nv50_program.h|  2 +-
>  src/gallium/drivers/nouveau/nv50/nv50_push.c   | 42 
> +-
>  .../drivers/nouveau/nv50/nv50_state_validate.c |  3 +-
>  src/gallium/drivers/nouveau/nv50/nv50_vbo.c| 11 +-
>  5 files changed, 46 insertions(+), 15 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c 
> b/src/gallium/drivers/nouveau/nv50/nv50_program.c
> index 48057d2..a4b8ddf 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
> @@ -66,7 +66,6 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
>case TGSI_SEMANTIC_VERTEXID:
>   prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID;
>   prog->vp.attrs[2] |= 
> NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID_DRAW_ARRAYS_ADD_START;
> - prog->vp.vertexid = 1;
>   continue;
>default:
>   break;
> @@ -383,6 +382,8 @@ nv50_program_translate(struct nv50_program *prog, 
> uint16_t chipset,
> prog->max_gpr = MAX2(4, (info->bin.maxGPR >> 1) + 1);
> prog->tls_space = info->bin.tlsSpace;
>
> +   prog->vp.need_vertex_id = info->io.vertexId < PIPE_MAX_SHADER_INPUTS;
> +
> if (prog->type == PIPE_SHADER_FRAGMENT) {
>if (info->prop.fp.writesDepth) {
>   prog->fp.flags[0] |= NV50_3D_FP_CONTROL_EXPORTS_Z;
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h 
> b/src/gallium/drivers/nouveau/nv50/nv50_program.h
> index f001670..1de5122 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
> @@ -76,9 +76,9 @@ struct nv50_program {
>ubyte psiz;/* output slot of point size */
>ubyte bfc[2];  /* indices into varying for FFC (FP) or BFC (VP) */
>ubyte edgeflag;
> -  ubyte vertexid;
>ubyte clpd[2]; /* output slot of clip distance[i]'s 1st component 
> */
>ubyte clpd_nr;
> +  bool need_vertex_id;
> } vp;
>
> struct {
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_push.c 
> b/src/gallium/drivers/nouveau/nv50/nv50_push.c
> index f31eaa0..cbef95d 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_push.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_push.c
> @@ -24,6 +24,10 @@ struct push_context {
> struct translate *translate;
>
> bool primitive_restart;
> +
> +   bool need_vertex_id;
> +   int32_t index_bias;
> +
> uint32_t prim;
> uint32_t restart_index;
> uint32_t instance_id;
> @@ -74,6 +78,11 @@ emit_vertices_i08(struct push_context *ctx, unsigned 
> start, unsigned count)
>
>size = ctx->vertex_words * nr;
>
> +  if (unlikely(ctx->need_vertex_id)) {
> + BEGIN_NV04(ctx->push, NV84_3D(VERTEX_ID_BASE), 1);
> + PUSH_DATA (ctx->push, *elts + ctx->index_bias);
> +  }
> +
>BEGIN_NI04(ctx->push, NV50_3D(VERTEX_DATA), size);
>
>ctx->translate->run_elts8(ctx->translate, elts, nr, 0, 
> ctx->instance_id,
> @@ -107,6 +116,11 @@ emit_vertices_i16(struct push_context *ctx, unsigned 
> start, unsigned count)
>
>size = ctx->vertex_words * nr;
>
> +  if (unlikely(ctx->need_vertex_id)) {
> + BEGIN_NV04(ctx->push, NV84_3D(VERTEX_ID_BASE), 1);
> + PUSH_DATA (ctx->push, *elts + ctx->index_bias);
> +  }
> +
>BEGIN_NI04(ctx->push, NV50_3D(VERTEX_DATA), size);
>
>ctx->translate->run_elts16(ctx->translate, elts, nr, 0, 
> ctx->instance_id,
> @@ -140,6 +154,11 @@ emit_vertices_i32(struct push_context *ctx, unsigned 
> start, unsigned count)
>
>size = ctx->vertex_words * nr;
>
> +  if (unlikely(ctx->need_vertex_id)) {
> + BEGIN_NV04(ctx->push, NV84_3D(VERTEX_ID_BASE), 1);
> + PUSH_DATA (ctx->push, *elts + ctx->index_bias);
> +  }
> +
>BEGIN_NI04(ctx->push, NV50_3D(VERTEX_DATA), size);
>
>ctx->translate->run_elts(ctx->translate, elts, nr, 0, ctx->instance_id,
> @@ -161,10 +180,18 @@ emit_vertices_i32(struct push_context *ctx, unsigned 

Re: [Mesa-dev] [PATCH 1/2] nv50: add NV84_3D macro

2015-11-19 Thread Ilia Mirkin
Reviewed-by: Ilia Mirkin 

Thanks for touching up the existing uses of this :) I was going to ask
you to do it, pleasantly surprised that you did it without asking.

On Thu, Nov 19, 2015 at 3:51 AM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +-
>  src/gallium/drivers/nouveau/nv50/nv50_vbo.c| 4 ++--
>  src/gallium/drivers/nouveau/nv50/nv50_winsys.h | 1 +
>  3 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
> b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
> index 4e7201d..cc7984d 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
> @@ -686,7 +686,7 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
> BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
> PUSH_DATA (push, 0);
> if (screen->base.class_3d >= NV84_3D_CLASS) {
> -  BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
> +  BEGIN_NV04(push, NV84_3D(VERTEX_ID_BASE), 1);
>PUSH_DATA (push, 0);
> }
>
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c 
> b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
> index 9aa593f..ac0c4d9 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
> @@ -487,7 +487,7 @@ nv50_draw_arrays(struct nv50_context *nv50,
>BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
>PUSH_DATA (push, 0);
>if (nv50->screen->base.class_3d >= NV84_3D_CLASS) {
> - BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
> + BEGIN_NV04(push, NV84_3D(VERTEX_ID_BASE), 1);
>   PUSH_DATA (push, 0);
>}
>nv50->state.index_bias = 0;
> @@ -613,7 +613,7 @@ nv50_draw_elements(struct nv50_context *nv50, bool 
> shorten,
>BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
>PUSH_DATA (push, index_bias);
>if (nv50->screen->base.class_3d >= NV84_3D_CLASS) {
> - BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
> + BEGIN_NV04(push, NV84_3D(VERTEX_ID_BASE), 1);
>   PUSH_DATA (push, index_bias);
>}
>nv50->state.index_bias = index_bias;
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_winsys.h 
> b/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
> index 76f1b41..6800230 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
> @@ -49,6 +49,7 @@ PUSH_REFN(struct nouveau_pushbuf *push, struct nouveau_bo 
> *bo, uint32_t flags)
>
>  #define SUBC_3D(m) 3, (m)
>  #define NV50_3D(n) SUBC_3D(NV50_3D_##n)
> +#define NV84_3D(n) SUBC_3D(NV84_3D_##n)
>  #define NVA0_3D(n) SUBC_3D(NVA0_3D_##n)
>
>  #define SUBC_2D(m) 4, (m)
> --
> 2.6.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now

2015-11-19 Thread Tom Stellard
On Thu, Nov 19, 2015 at 11:17:12AM +0100, Nicolai Hähnle wrote:
> On 19.11.2015 03:55, Tom Stellard wrote:
> >On Thu, Nov 19, 2015 at 11:31:55AM +0900, Michel Dänzer wrote:
> >>From: Michel Dänzer 
> >>
> >>llvm.exp2.f32 doesn't work in some cases yet.
> >>
> >>Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709
> >>Signed-off-by: Michel Dänzer 
> >>---
> >>
> >>Once the problem is fixed in the LLVM AMDGPU backend, we can re-enable
> >>llvm.exp2.f32 for the fixed LLVM version.
> >>
> >>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
> >>b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> >>index ac99e73..c94f109 100644
> >>--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> >>+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> >>@@ -1539,7 +1539,7 @@ void radeon_llvm_context_init(struct 
> >>radeon_llvm_context * ctx)
> >>bld_base->op_actions[TGSI_OPCODE_ENDIF].emit = endif_emit;
> >>bld_base->op_actions[TGSI_OPCODE_ENDLOOP].emit = endloop_emit;
> >>bld_base->op_actions[TGSI_OPCODE_EX2].emit = build_tgsi_intrinsic_nomem;
> >>-   bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.exp2.f32";
> >>+   bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.AMDIL.exp.";
> >
> >Do we want a native instruction here, or do we want IEEE precise exp2?
> >If it's the former then we shouldn't be using llvm.exp2.f32 anyway.
> >
> >I know that we need to use llvm.AMDIL.exp. for older LLVM, but for newer
> >LLVM, I would really like to start doing intrinsics the correct way.  In
> >this case, it means adding an llvm.amdgcn.exp.f32 intrinsic to
> >include/llvm/IR/IntrinsicsAMDGPU.td.  In the section with the amdgcn
> >TargetPrefix.
> 
> Doesn't AMDGPU currently emit native instructions for the various
> math intrinsics anyway? If your plan is to change that and we add
> our own intrinsics, what's the plan to still benefit from existing
> LLVM optimization passes?
> 

AMDGPU does emit native instruction for the math intrinsics, but this is
wrong in most cases, because it assumes that a less precise calculation is
acceptable when this is not always the case.

This a good point about optimizations.  I think the main benefits we get from
using the LLVM intrinsic are constant folding and also speculative execution.

An alternate solution would be to use the llvm intrinsic, but then communicate
to the backend via a function attribute that it is OK to use the less precise
version of exp.

> FWIW, the original bug that Michel addresses here is caused by LLVM
> libcall optimizations replacing the exp2 _intrinsic_ by a ldexp
> _libcall_, which AMDGPU codegen cannot deal with. I suggested to
> address this with http://reviews.llvm.org/D14327. Could you take a
> look at that?
> 

Please take a look at the comment I added to this review.

-Tom

> Cheers,
> Nicolai
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] automake: loader: don't create an empty dri3 helper

2015-11-19 Thread Emil Velikov
On 19 November 2015 at 16:33, Emil Velikov  wrote:
> From: Emil Velikov 
>
> Seems that creating an empty one does not fair too well with MacOSX's
> ar. Considering that all the users of the helper include it only when
> needed, let's reshuffle the makefile.
>
Forgot to mention - the cause is guesswork, as I'm not that intimate
familiar with different AR implementations. Yet this seems the most
likely one.

Vinson, can you test this ?

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/4] automake: loader: don't create an empty dri3 helper

2015-11-19 Thread Emil Velikov
From: Emil Velikov 

Seems that creating an empty one does not fair too well with MacOSX's
ar. Considering that all the users of the helper include it only when
needed, let's reshuffle the makefile.

Cc: Vinson Lee 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92985
Signed-off-by: Emil Velikov 
---
 src/loader/Makefile.am | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/loader/Makefile.am b/src/loader/Makefile.am
index 5daa42e..5021120 100644
--- a/src/loader/Makefile.am
+++ b/src/loader/Makefile.am
@@ -23,7 +23,7 @@ include Makefile.sources
 
 EXTRA_DIST = SConscript
 
-noinst_LTLIBRARIES = libloader.la libloader_dri3_helper.la
+noinst_LTLIBRARIES = libloader.la
 
 AM_CPPFLAGS = \
$(DEFINES) \
@@ -57,6 +57,8 @@ libloader_la_LIBADD += \
 endif
 
 if HAVE_DRI3
+noinst_LTLIBRARIES += libloader_dri3_helper.la
+
 libloader_dri3_helper_la_SOURCES = \
loader_dri3_helper.c \
loader_dri3_helper.h
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] automake: egl: add symbols test

2015-11-19 Thread Emil Velikov
From: Emil Velikov 

Should help us catch issues where we expose any extra symbols by
mistake. Just like the ones fixes with previous commit.

Signed-off-by: Emil Velikov 
---

Ideally we'll check all symbols (not only the T ones) and parse out the 
private symbols such as _init, _fini and others.

Not a big deal here and now as this approach is used elsewhere in mesa.

-Emil

 src/egl/Makefile.am   |  3 +++
 src/egl/egl-symbols-check | 55 +++
 2 files changed, 58 insertions(+)
 create mode 100755 src/egl/egl-symbols-check

diff --git a/src/egl/Makefile.am b/src/egl/Makefile.am
index 88fe13a..0b463c8 100644
--- a/src/egl/Makefile.am
+++ b/src/egl/Makefile.am
@@ -119,7 +119,10 @@ egl_HEADERS = \
$(top_srcdir)/include/EGL/eglmesaext.h \
$(top_srcdir)/include/EGL/eglplatform.h
 
+TESTS = egl-symbols-check
+
 EXTRA_DIST = \
+   egl-symbols-check \
SConscript \
drivers/haiku \
docs \
diff --git a/src/egl/egl-symbols-check b/src/egl/egl-symbols-check
new file mode 100755
index 000..5d46fed
--- /dev/null
+++ b/src/egl/egl-symbols-check
@@ -0,0 +1,55 @@
+#!/bin/bash
+
+FUNCS=$(nm -D --defined-only ${1-.libs/libEGL.so} | grep -o "T .*" | cut -c 3- 
| while read func; do
+( grep -q "^$func$" || echo $func )  

[Mesa-dev] [PATCH 1/4] automake: loader: rework the CPPFLAGS

2015-11-19 Thread Emil Velikov
From: Emil Velikov 

Rather than duplicating things, just use the generic AM_CPPFLAGS. This
has the fortunate side-effect of adding VISIBILITY_CFLAGS for the dri3
helper. The latter of which was erroneously exposing some internal
symbols.

Cc: Kai Wasserbäch 
Reported-by: Kai Wasserbäch 
Signed-off-by: Emil Velikov 
---
 src/loader/Makefile.am | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/src/loader/Makefile.am b/src/loader/Makefile.am
index c0f7947..67ed776 100644
--- a/src/loader/Makefile.am
+++ b/src/loader/Makefile.am
@@ -25,18 +25,20 @@ EXTRA_DIST = SConscript
 
 noinst_LTLIBRARIES = libloader.la libloader_dri3_helper.la
 
-libloader_la_CPPFLAGS = \
+AM_CPPFLAGS = \
$(DEFINES) \
-I$(top_srcdir)/include \
-I$(top_srcdir)/src \
$(VISIBILITY_CFLAGS) \
+   $(LIBDRM_CFLAGS) \
$(LIBUDEV_CFLAGS)
 
 libloader_la_SOURCES = $(LOADER_C_FILES)
 libloader_la_LIBADD =
 
 if HAVE_DRICOMMON
-libloader_la_CPPFLAGS += \
+libloader_la_CPPFLAGS = \
+   $(AM_CPPFLAGS) \
-I$(top_srcdir)/src/mesa/drivers/dri/common/ \
-I$(top_builddir)/src/mesa/drivers/dri/common/ \
-I$(top_srcdir)/src/mesa/ \
@@ -49,20 +51,11 @@ libloader_la_CPPFLAGS += \
 endif
 
 if HAVE_LIBDRM
-libloader_la_CPPFLAGS += \
-   $(LIBDRM_CFLAGS)
-
 libloader_la_LIBADD += \
$(LIBDRM_LIBS)
 endif
 
 if HAVE_DRI3
-libloader_dri3_helper_la_CPPFLAGS = \
-   $(DEFINES) \
-   -I$(top_srcdir)/include \
-   -I$(top_srcdir)/src \
-   $(LIBDRM_CFLAGS)
-
 libloader_dri3_helper_la_SOURCES = \
loader_dri3_helper.c \
loader_dri3_helper.h
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] automake: loader: honour the XCB_DRI3 cflags

2015-11-19 Thread Emil Velikov
From: Emil Velikov 

Without this the compilation will fail, as the headers are installed in
a non-default location.

Signed-off-by: Emil Velikov 
---
 src/loader/Makefile.am | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/loader/Makefile.am b/src/loader/Makefile.am
index 67ed776..5daa42e 100644
--- a/src/loader/Makefile.am
+++ b/src/loader/Makefile.am
@@ -30,6 +30,7 @@ AM_CPPFLAGS = \
-I$(top_srcdir)/include \
-I$(top_srcdir)/src \
$(VISIBILITY_CFLAGS) \
+   $(XCB_DRI3_CFLAGS) \
$(LIBDRM_CFLAGS) \
$(LIBUDEV_CFLAGS)
 
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeon: ensure that timing/profiling queries are suspended on flush

2015-11-19 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Wed, Nov 18, 2015 at 6:45 PM, Nicolai Hähnle  wrote:
> The queries_suspended_for_flush flag is redundant because suspended queries
> are not removed from their respective linked list.
> ---
>  src/gallium/drivers/radeon/r600_pipe_common.c | 13 ++---
>  src/gallium/drivers/radeon/r600_pipe_common.h |  2 --
>  2 files changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
> b/src/gallium/drivers/radeon/r600_pipe_common.c
> index 60be412..f03dcd9 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> @@ -27,6 +27,7 @@
>  #include "r600_pipe_common.h"
>  #include "r600_cs.h"
>  #include "tgsi/tgsi_parse.h"
> +#include "util/list.h"
>  #include "util/u_draw_quad.h"
>  #include "util/u_memory.h"
>  #include "util/u_format_s3tc.h"
> @@ -135,12 +136,10 @@ static void r600_memory_barrier(struct pipe_context 
> *ctx, unsigned flags)
>  void r600_preflush_suspend_features(struct r600_common_context *ctx)
>  {
> /* suspend queries */
> -   ctx->queries_suspended_for_flush = false;
> -   if (ctx->num_cs_dw_nontimer_queries_suspend) {
> +   if (!LIST_IS_EMPTY(&ctx->active_nontimer_queries))
> r600_suspend_nontimer_queries(ctx);
> +   if (!LIST_IS_EMPTY(&ctx->active_timer_queries))
> r600_suspend_timer_queries(ctx);
> -   ctx->queries_suspended_for_flush = true;
> -   }
>
> ctx->streamout.suspended = false;
> if (ctx->streamout.begin_emitted) {
> @@ -157,10 +156,10 @@ void r600_postflush_resume_features(struct 
> r600_common_context *ctx)
> }
>
> /* resume queries */
> -   if (ctx->queries_suspended_for_flush) {
> -   r600_resume_nontimer_queries(ctx);
> +   if (!LIST_IS_EMPTY(&ctx->active_timer_queries))
> r600_resume_timer_queries(ctx);
> -   }
> +   if (!LIST_IS_EMPTY(&ctx->active_nontimer_queries))
> +   r600_resume_nontimer_queries(ctx);
>  }
>
>  static void r600_flush_from_st(struct pipe_context *ctx,
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
> b/src/gallium/drivers/radeon/r600_pipe_common.h
> index f9fecdf..253d657 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> @@ -397,8 +397,6 @@ struct r600_common_context {
> struct list_headactive_timer_queries;
> unsignednum_cs_dw_nontimer_queries_suspend;
> unsignednum_cs_dw_timer_queries_suspend;
> -   /* If queries have been suspended. */
> -   boolqueries_suspended_for_flush;
> /* Additional hardware info. */
> unsignedbackend_mask;
> unsignedmax_db; /* for OQ */
> --
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] [v3] i965/skl: skip fast clears for certain surface formats

2015-11-19 Thread Chad Versace
On Wed 18 Nov 2015, Pohjolainen, Topi wrote:
> On Tue, Nov 17, 2015 at 05:31:12PM -0800, Ben Widawsky wrote:
> > Some of the information originally in this commit message is now in the 
> > patch
> > before this.
> >
> > SKL adds compressible render targets and as a result mutates some of the
> > programming for fast clears and resolves. There is a new internal surface 
> > type
> > called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "The Auxiliary 
> > surface is
> > a CCS (Color Control Surface) with compression disabled or an MCS with
> > compression enabled, depending on number of multisamples. MCS (Multisample
>
> I find this wording (even though it looks to be directly from the spec
> somewhat misleading. I read it suggesting that AUX can be CCS only when
> compression is disabled. I like another wording in "Auxiliary Surfaces For
> Sampled Tiled Resource" better: "CCS is used to indicate that the color
> surface is losslessly compressed."

The BSpec docs regarding CCS are universally confusing.

To clarify the subject, here is a summary how the BSpec usage of the
terms "MCS" and "CCS" differ between BDW and SKL.

BDW > RENDER_SURFACE_STATE > Auxiliary Surface Mode:

AUX_MCS, if samples > 1:

The aux surface is a "Multisample Control Surface".

Let N be the surfaces's sample count.  Each element of the MCS is
a packed array of N subelements, one subelement per sample. Each
subelement contains log2(N) bits, and therefore has the range [0,
N-1]. The total element size is byte-aligned. See the PRM section
titled "Compressed Multisampled Surfaces" for the full details of
the bit layout.

For each sample (x, y, s) of the "logical" surface, the MCS element
at location (x, y) defines which array slice of the primary surface
contains the color value for sample s. For example, if N=4 and the
MCS element at some pixel location (x0, y0) is 1101b, then it
defines the mapping:
(x0, y0, 0) => array slice 0
(x0, y0, 1) => array slice 0
(x0, y0, 2) => array slice 1
(x0, y0, 3) => array slice 3

A special case occurs if the MCS element at some location (x0, y0)
indicates that the color value of all samples (x0, y0, s) of pixel
(x0, y0) are stored in array slice N-1.  This special case occurs
when all bits are set in the MCS element.  In this case, the pixel
value stored at location (x0, y0) in each array slice of the primary
surface is undefined. The pixel value for all samples is instead the
surface's clear color, which is set in
RENDER_SURFACE_STATE.*ClearColor.

AUX_MCS, if samples == 1:
The aux suface is a degenerate form of the "Multisample Control
Surface".

The PRM calls this degenerate form of the MCS the "non-MSRT MCS"
(non-Mulisampled Render Target Multisample Control Surface). The
name is horrifically contradictory.

Each MCS element is 1 bit. The MCS is tightly packed (that is,
1 byte contains 8 elements). Each MCS element contains the clear
status of a cache line pair (128B) in the primary surface.  If the
MCS element's bit is set, then all color values stored in the
2 cache lines of the primary surface are invalid; the real color
value, for all pixels in the cache line pair, is the clear color set
in RENDER_SURFACE_STATE.*ClearColor.

SKL > RENDER_SURFACE_STATE > Auxiliary Surface Mode:

AUX_CCS_D, if samples > 1:
The aux surface is a Multisample Control Surface. It's identical to
the Broadwell MCS.

AUX_CCS_E, if samples > 1:
Same as AUX_CCS_D.  The aux surface is again a Multisample Control
Surface, identical to the Broadwell MCS.

AUX_CCS_D, if samples == 1:
The aux surface is a "Color Control Surface" with compression 
"Disabled".

Sometimes the BSpec refers to the compression as "Render
Compression", "color compression", and "lossless color
compression". I believe they all mean the same thing.

Skylake's AUX_CCS_D surface has the same layout as
Broadwell's non-MSRT MCS, except that each CCS element is
2 bits instead of 1. In other words, each 2bit element in
the CCS_D contains the clear status of a cache-line pair
(128B) in the primary surface. If all bits in the CCS_D element
is set, then all color values stored in the 2 cache lines of
the primary surface are invalid; the real color value, for
all pixels in the cache line pair, is the clear color set in
RENDER_SURFACE_STATE.*ClearColor.

AUX_CCS_

Re: [Mesa-dev] [PATCH 1.9/2] i965: Add brw_imm_uv().

2015-11-19 Thread Jason Ekstrand
On Wed, Nov 18, 2015 at 2:25 PM, Matt Turner  wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 +++
>  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
>  2 files changed, 12 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index e5a286a..78c10e9 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -116,6 +116,9 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned 
> gen)
>case BRW_REGISTER_TYPE_V:
>   brw_reg = brw_imm_v(reg->ud);
>   break;
> +  case BRW_REGISTER_TYPE_UV:
> + brw_reg = brw_imm_uv(reg->ud);
> + break;
>default:
>  unreachable("not reached");
>}
> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
> b/src/mesa/drivers/dri/i965/brw_reg.h
> index 759bd93..b77cdeb 100644
> --- a/src/mesa/drivers/dri/i965/brw_reg.h
> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
> @@ -623,6 +623,15 @@ brw_imm_v(unsigned v)
> return imm;
>  }
>
> +/** Construct vector of eight unsigned half-byte values */
> +static inline struct brw_reg
> +brw_imm_uv(unsigned uv)
> +{

Please add a GEN assertion either here or in the generator.  This only
becomes available on Haswell or Broadwell if I remember correctly.  I
do know it's not universally available.

> +   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_UV);
> +   imm.ud = uv;
> +   return imm;
> +}
> +
>  /** Construct vector of four 8-bit float values */
>  static inline struct brw_reg
>  brw_imm_vf(unsigned v)
> --
> 2.4.9
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/7] [v2] i965/gen9: Support fast clears for 32b float

2015-11-19 Thread Neil Roberts
Ben Widawsky  writes:

> Two formats are explicitly disabled because they fail piglit tests,
> LUMINANCE16F and INTENSITY16F. There is some question about the
> validity of sampling from these surfaces for all gens, however, there
> seem to be no other failures, so I'd prefer to leave tackling that for
> a separate series.

You might want to remove this paragraph now that it's no longer true.

The patch looks good to me.

Reviewed-by: Neil Roberts 

Just to note, I posted a patch which changes the check to allow fast
clears for certain MSRT formats. However it still disables LUMINANCE16F
and INTENSITY16F because those are part of the list of formats that get
a different format when used for rendering, so nothing needs to be
changed in this patch if we land that.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/9] Early fp64 fixes

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 5:05 AM, Iago Toral Quiroga  wrote:
> These patches are fixes extracted from Connor's fp64 branch that
> I would like to review and land ahead of the rest. They are
> independent of the rest of the series, some of them are even
> general fixes unrelated to fp64 that could fix issues in master,
> others are fairly trivial and we can easily land them now.
>
> It will help reduce the size of a rather big series (115 patches
> and counting) that is still in development and ease future
> rebases a little, specially since there is a patch that renames
> one of the nir types, so we probably want to land that soon.
>
> Apparently, Connor sent most of these for review back in August but
> they did not get reviewed back then. I went ahead and added my Rb to
> all but one:
>
> i965/fs: add stride restrictions for copy propagation
>
> I could not verify all the conditions (particularly, the cases where we allow
> 0 strides), so it would be nice if someone else could review that or point me
> to the place in the docs where it says that 0 strides are safe for these
> situations. Comments/Reviews to the other patches are also welcome :)

Well, the docs don't say that it's unsafe either :) But more
seriously, we use 0 strides all the time -- in particular, we use them
for uniforms when they get promoted to push constants, since when
they're pushed to the EU they're packed into registers with, say, 8
32-bit things per SIMD8 register. So to get, say, the second uniform,
we need to say "get the second channel of this register and splat it
to all the channels of the source," which we can do with a source of
stride 0 and offset 1.

>
> No piglit regressions on IVB.
>
> Connor Abbott (8):
>   i965/fs: print non-1 strides when dumping instructions
>   i965/fs: print writemask_all when it's enabled
>   i965: fix 64-bit immediates in brw_inst(_set)_bits
>   i965/fs: respect force_sechalf/force_writemask_all in CSE
>   i965/fs: don't propagate cmod when the exec sizes differ
>   i965/fs: add stride restrictions for copy propagation
>   i965/vec4: avoid dependency control around Align1 instructions
>   nir/builder: only read meaningful channels in nir_swizzle()
>
> Jason Ekstrand (1):
>   nir: s/nir_type_unsigned/nir_type_uint
>
>  src/gallium/auxiliary/nir/tgsi_to_nir.c|  2 +-
>  src/glsl/nir/glsl_to_nir.cpp   |  2 +-
>  src/glsl/nir/nir.h |  2 +-
>  src/glsl/nir/nir_builder.h |  2 +-
>  src/glsl/nir/nir_constant_expressions.py   |  2 +-
>  src/glsl/nir/nir_opcodes.py| 78 
> +++---
>  src/glsl/nir/nir_search.c  |  4 +-
>  src/mesa/drivers/dri/i965/brw_fs.cpp   | 15 +
>  .../drivers/dri/i965/brw_fs_cmod_propagation.cpp   |  3 +
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 58 +++-
>  src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  2 +
>  src/mesa/drivers/dri/i965/brw_inst.h   |  6 +-
>  src/mesa/drivers/dri/i965/brw_nir.c|  4 +-
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 +++--
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |  2 +-
>  15 files changed, 142 insertions(+), 57 deletions(-)
>
> --
> 1.9.1
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/5] blit: Don't take into account the Mesa format when checking MSRT blit

2015-11-19 Thread Marek Olšák
On Thu, Nov 19, 2015 at 4:25 PM, Neil Roberts  wrote:
> According to the GLES3 spec, blitting between multisample FBOs with
> different internal formats should not be allowed. The
> compatible_resolve_formats function implements this check. Previously
> it had a shortcut where if the Mesa formats of the two renderbuffers
> were the same then it would assume the blit is ok. However some
> drivers map different internal formats to the same Mesa format, for
> example it might implement both GL_RGB and GL_RGBA textures with
> MESA_FORMAT_R8G8B8A_UNORM. The function is used to generate a GL error
> according to what the GL spec requires so the blit should not be
> allowed in that case. This patch just removes the shortcut so that it
> only ever looks at the internal format.
>
> Note that I posted a related patch to disable this check altogether
> for desktop GL. However this function is still used on GLES3 because
> there are conformance tests that require this behaviour so this patch
> is still useful.
>
> Cc: Marek Olšák 
> ---
>  src/mesa/main/blit.c | 28 +++-
>  1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/src/mesa/main/blit.c b/src/mesa/main/blit.c
> index a32f1a4..deb4f5a 100644
> --- a/src/mesa/main/blit.c
> +++ b/src/mesa/main/blit.c
> @@ -129,20 +129,22 @@ compatible_resolve_formats(const struct gl_renderbuffer 
> *readRb,
>  {
> GLenum readFormat, drawFormat;
>
> -   /* The simple case where we know the backing Mesa formats are the same.
> -*/
> -   if (_mesa_get_srgb_format_linear(readRb->Format) ==
> -   _mesa_get_srgb_format_linear(drawRb->Format)) {
> -  return GL_TRUE;
> -   }
> -
> -   /* The Mesa formats are different, so we must check whether the internal
> -* formats are compatible.
> +   /* This checks whether the internal formats are compatible rather than the
> +* Mesa format for two reasons:
> +*
> +* • Under some circumstances, the user may request e.g. two GL_RGBA8
> +*   textures and get two entirely different Mesa formats like RGBA 
> and
> +*   ARGB. Drivers behaving like that should be able to cope with
> +*   non-matching formats by themselves, because it's not the user's 
> fault.
> +*
> +* • Picking two different internal formats can end up with the same Mesa
> +*   format. For example the driver might be simulating GL_RGB textures
> +*   with GL_RGBA internally and in that case both internal formats would
> +*   end up with RGBA.
>  *
> -* Under some circumstances, the user may request e.g. two GL_RGBA8
> -* textures and get two entirely different Mesa formats like RGBA and
> -* ARGB. Drivers behaving like that should be able to cope with
> -* non-matching formats by themselves, because it's not the user's fault.
> +* This function is used to generate a GL error according to the spec so 
> in
> +* both cases we want to be looking at the appliation-level format, which

typo: application

other than that:

Reviewed-by: Marek Olšák 

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 10:31 AM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>> wrote:
>>> From: Connor Abbott 
>>>
>>> It appears that not only math instructions, but also MOV_BYTES or
>>> any instruction that uses Align1 mode cannot be in the middle
>>> of a dependency control sequence or the GPU will hang (at least on my
>>> BDW). This fixes GPU hangs in some fp64 tests.
>>
>> I'm pretty surprised by this assessment. Doubtful even.
>>
>>> Reviewed-by: Iago Toral Quiroga 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index 3bcd5cb..bc0a33b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>>> vec4_instruction *inst)
>>> }
>>>
>>> /*
>>> +* Instructions that use Align1 mode cause the GPU to hang when inserted
>>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>>> empirically.
>>> +*/
>>> +
>>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>
>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>> added it I made a test that did
>>
>>   vec4 foo = vec4(packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...))
>>
>> and confirmed that it set depctrl properly on the whole sequence.
>> There could of course be bugs somewhere, but the "hardware doesn't
>> work if you mix align1 and align16 depctrl" seems unlikely.
>>
>> Do you know of a test that this affects?
>
> This only affects FP64 tests, since there we use an align1 mov to do
> double-to-float and float-to-double. However, I tried commenting out
> emit_nir_code() and just doing essentially:
>
> emit(MOV(...))->force_writemask_all = true;
> emit(VEC4_OPCODE_PACK_BYTES, ...);
> emit(MOV(...))->force_writemask_all = true;

Err, I meant:

emit(MOV(...))->no_dd_clear = true;
emit(VEC4_OPCODE_PACK_BYTES, ...);
emit(MOV(...))->no_dd_check = true;

(this also hangs with my DOUBLE_TO_FLOAT and FLOAT_TO_DOUBLE opcodes,
which don't use the data dependency bits at all).

>
> and on my BDW it hanged. In case it's not clear: this isn't about
> setting depctrl on the instruction itself, it just can't be inside of
> a depctrl sequence (which we were already disallowing for math
> instructions anyways).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
>> From: Connor Abbott 
>>
>> It appears that not only math instructions, but also MOV_BYTES or
>> any instruction that uses Align1 mode cannot be in the middle
>> of a dependency control sequence or the GPU will hang (at least on my
>> BDW). This fixes GPU hangs in some fp64 tests.
>
> I'm pretty surprised by this assessment. Doubtful even.
>
>> Reviewed-by: Iago Toral Quiroga 
>> ---
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index 3bcd5cb..bc0a33b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction 
>> *inst)
>> }
>>
>> /*
>> +* Instructions that use Align1 mode cause the GPU to hang when inserted
>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered empirically.
>> +*/
>> +
>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>
> PACK_BYTES sets depctrl itself in the generator, and at the time I
> added it I made a test that did
>
>   vec4 foo = vec4(packUnorm4x8(...),
>   packUnorm4x8(...),
>   packUnorm4x8(...),
>   packUnorm4x8(...))
>
> and confirmed that it set depctrl properly on the whole sequence.
> There could of course be bugs somewhere, but the "hardware doesn't
> work if you mix align1 and align16 depctrl" seems unlikely.
>
> Do you know of a test that this affects?

This only affects FP64 tests, since there we use an align1 mov to do
double-to-float and float-to-double. However, I tried commenting out
emit_nir_code() and just doing essentially:

emit(MOV(...))->force_writemask_all = true;
emit(VEC4_OPCODE_PACK_BYTES, ...);
emit(MOV(...))->force_writemask_all = true;

and on my BDW it hanged. In case it's not clear: this isn't about
setting depctrl on the instruction itself, it just can't be inside of
a depctrl sequence (which we were already disallowing for math
instructions anyways).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/5] i965: Check base format to determine whether to use tiled memcpy

2015-11-19 Thread Neil Roberts
The tiled memcpy doesn't work for copying from RGBX to RGBA because it
doesn't override the alpha component to 1.0. Commit 2cebaac479d4 added
a check to disable it for RGBX formats by looking at the TexFormat.
However a lot of the rest of the code base is written with the
assumption that an RGBA texture can be used internally to implement a
GL_RGB texture. If that is done then this check breaks. This patch
makes it instead check the base format of the texture which I think
more directly matches the intention.

Cc: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/intel_pixel_read.c | 7 ---
 src/mesa/drivers/dri/i965/intel_tex_image.c  | 7 ---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c 
b/src/mesa/drivers/dri/i965/intel_pixel_read.c
index 9bcbbd1..c8aef65 100644
--- a/src/mesa/drivers/dri/i965/intel_pixel_read.c
+++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c
@@ -135,10 +135,11 @@ intel_readpixels_tiled_memcpy(struct gl_context * ctx,
   return false;
 
/* We can't handle copying from RGBX or BGRX because the tiled_memcpy
-* function doesn't set the last channel to 1.
+* function doesn't set the last channel to 1. Note this checks BaseFormat
+* rather than TexFormat in case the RGBX format is being simulated with an
+* RGBA format.
 */
-   if (rb->Format == MESA_FORMAT_B8G8R8X8_UNORM ||
-   rb->Format == MESA_FORMAT_R8G8B8X8_UNORM)
+   if (rb->_BaseFormat == GL_RGB)
   return false;
 
if (!intel_get_memcpy(rb->Format, format, type, &mem_copy, &cpp,
diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 34b91e8..e3710da7 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -399,10 +399,11 @@ intel_gettexsubimage_tiled_memcpy(struct gl_context *ctx,
   return false;
 
/* We can't handle copying from RGBX or BGRX because the tiled_memcpy
-* function doesn't set the last channel to 1.
+* function doesn't set the last channel to 1. Note this checks BaseFormat
+* rather than TexFormat in case the RGBX format is being simulated with an
+* RGBA format.
 */
-   if (texImage->TexFormat == MESA_FORMAT_B8G8R8X8_UNORM ||
-   texImage->TexFormat == MESA_FORMAT_R8G8B8X8_UNORM)
+   if (texImage->_BaseFormat == GL_RGB)
   return false;
 
if (!intel_get_memcpy(texImage->TexFormat, format, type, &mem_copy, &cpp,
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/5] i965/gen9: Don't allow the RGBX formats for texturing/rendering

2015-11-19 Thread Neil Roberts
The RGBX surface formats aren't renderable so we internally remap them
to RGBA when rendering. They are retained as RGBX when used as
textures. However since the previous patch fast clears are disabled
for surfaces that use a different format for rendering than for
texturing. To avoid this situation we can just pretend not to support
RGBX formats at all. This will cause the upper layers of mesa to pick
an RGBA format internally instead. This should be safe because we
always override the alpha component to 1.0 for RGBX in the texture
swizzle anyway. We could also do this for all gens except that it's a
bit more difficult when the hardware doesn't support texture
swizzling. Gens using the blorp have further problems because that
doesn't implement this swizzle override.
---
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 28 +
 1 file changed, 28 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index 7c38431..7594aca 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -733,6 +733,34 @@ brw_init_surface_formats(struct brw_context *brw)
if (brw->gen >= 8)
   ctx->TextureFormatSupported[MESA_FORMAT_Z_UNORM16] = true;
 
+   /* The RGBX formats are not renderable. Normally these get mapped
+* internally to RGBA formats when rendering. However on Gen9+ when this
+* internal override is used fast clears don't work so they are disabled in
+* brw_meta_fast_clear. To avoid this problem we can just pretend not to
+* support RGBX formats at all. This will cause the upper layers of Mesa to
+* pick the RGBA formats instead. This works fine because when it is used
+* as a texture source the swizzle state is programmed to force the alpha
+* channel to 1.0 anyway. We could also do this for all gens except that
+* it's a bit more difficult when the hardware doesn't support texture
+* swizzling. Gens using the blorp have further problems because that
+* doesn't implement this swizzle override. We don't need to do this for
+* BGRX because that actually is supported natively on Gen8+.
+*/
+   if (brw->gen >= 9) {
+  static const mesa_format rgbx_formats[] = {
+ MESA_FORMAT_R8G8B8X8_UNORM,
+ MESA_FORMAT_R8G8B8X8_SRGB,
+ MESA_FORMAT_RGBX_UNORM16,
+ MESA_FORMAT_RGBX_FLOAT16,
+ MESA_FORMAT_RGBX_FLOAT32
+  };
+
+  for (int i = 0; i < ARRAY_SIZE(rgbx_formats); i++) {
+ ctx->TextureFormatSupported[rgbx_formats[i]] = false;
+ brw->format_supported_as_render_target[rgbx_formats[i]] = false;
+  }
+   }
+
/* On hardware that lacks support for ETC1, we map ETC1 to RGBX
 * during glCompressedTexImage2D(). See intel_mipmap_tree::wraps_etc1.
 */
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/5] i965/gen9: Don't disallow fast clear for MSRT formats matching render

2015-11-19 Thread Neil Roberts
Previously fast clear was disallowed on Gen9 for MSRTs with the claim
that some formats don't work but we didn't understand why. On further
investigation it seems the formats that don't work are the ones where
the render surface format is being overriden to a different format
than the one used for texturing. The one used for texturing is not
actually a renderable format. It arguably makes sense that the sampler
hardware doesn't handle the fast color correctly in these cases
because it shouldn't be possible to end up with a fast cleared surface
that is non-renderable.

This patch changes the limitation to prevent fast clear for surfaces
where the format for rendering is overriden.
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 1f8bfdf..f2e894a 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -48,6 +48,7 @@
 #include "brw_defines.h"
 #include "brw_context.h"
 #include "brw_draw.h"
+#include "brw_state.h"
 #include "intel_fbo.h"
 #include "intel_batchbuffer.h"
 
@@ -549,11 +550,17 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   if (brw->gen < 7)
  clear_type = REP_CLEAR;
 
-  /* Certain formats have unresolved issues with sampling from the MCS
-   * buffer on Gen9. This disables fast clears altogether for MSRTs until
-   * we can figure out what's going on.
+  /* If we're mapping the render format to a different format than the
+   * format we use for texturing then it is a bit questionable whether it
+   * should be possible to use a fast clear. Although we only actually
+   * render using a renderable format, without the override workaround it
+   * wouldn't be possible to have a non-renderable surface in a fast clear
+   * state so the hardware probably legitimately doesn't need to support
+   * this case. At least on Gen9 this really does seem to cause problems.
*/
-  if (brw->gen >= 9 && irb->mt->num_samples > 1)
+  if (brw->gen >= 9 &&
+  brw_format_for_mesa_format(irb->mt->format) !=
+  brw->render_target_format[irb->mt->format])
  clear_type = REP_CLEAR;
 
   if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/5] i965/gen8: Allow rendering to B8G8R8X8

2015-11-19 Thread Neil Roberts
Since Gen8 this is allowed as a rendering target so we don't need to
override it to B8G8R8A8. This is helpful on Gen9+ where using this
override causes fast clears not to work.
---
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index 55e7e64..7c38431 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -167,8 +167,8 @@ const struct surface_format_info surface_formats[] = {
SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   I32_FLOAT)
SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   L32_FLOAT)
SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   A32_FLOAT)
-   SF( Y,  Y,  x,  Y,  x,  x,  x,  x, 60,   90,   B8G8R8X8_UNORM)
-   SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
+   SF( Y,  Y,  x,  Y, 80, 80,  x,  x, 60,   90,   B8G8R8X8_UNORM)
+   SF( Y,  Y,  x,  x, 80, 80,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM)
SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM_SRGB)
SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R9G9B9E5_SHAREDEXP)
@@ -670,9 +670,10 @@ brw_init_surface_formats(struct brw_context *brw)
  * mask writes to alpha (ala glColorMask) and reconfigure the
  * alpha blending hardware to use GL_ONE (or GL_ZERO) for
  * cases where GL_DST_ALPHA (or GL_ONE_MINUS_DST_ALPHA) is
- * used.
+ * used. On Gen8+ BGRX is actually allowed (but not RGBX).
  */
-render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
+ if (gen < tinfo->render_target)
+render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
 break;
   case BRW_SURFACEFORMAT_R8G8B8X8_UNORM:
  render = BRW_SURFACEFORMAT_R8G8B8A8_UNORM;
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/5] blit: Don't take into account the Mesa format when checking MSRT blit

2015-11-19 Thread Neil Roberts
According to the GLES3 spec, blitting between multisample FBOs with
different internal formats should not be allowed. The
compatible_resolve_formats function implements this check. Previously
it had a shortcut where if the Mesa formats of the two renderbuffers
were the same then it would assume the blit is ok. However some
drivers map different internal formats to the same Mesa format, for
example it might implement both GL_RGB and GL_RGBA textures with
MESA_FORMAT_R8G8B8A_UNORM. The function is used to generate a GL error
according to what the GL spec requires so the blit should not be
allowed in that case. This patch just removes the shortcut so that it
only ever looks at the internal format.

Note that I posted a related patch to disable this check altogether
for desktop GL. However this function is still used on GLES3 because
there are conformance tests that require this behaviour so this patch
is still useful.

Cc: Marek Olšák 
---
 src/mesa/main/blit.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/src/mesa/main/blit.c b/src/mesa/main/blit.c
index a32f1a4..deb4f5a 100644
--- a/src/mesa/main/blit.c
+++ b/src/mesa/main/blit.c
@@ -129,20 +129,22 @@ compatible_resolve_formats(const struct gl_renderbuffer 
*readRb,
 {
GLenum readFormat, drawFormat;
 
-   /* The simple case where we know the backing Mesa formats are the same.
-*/
-   if (_mesa_get_srgb_format_linear(readRb->Format) ==
-   _mesa_get_srgb_format_linear(drawRb->Format)) {
-  return GL_TRUE;
-   }
-
-   /* The Mesa formats are different, so we must check whether the internal
-* formats are compatible.
+   /* This checks whether the internal formats are compatible rather than the
+* Mesa format for two reasons:
+*
+* • Under some circumstances, the user may request e.g. two GL_RGBA8
+*   textures and get two entirely different Mesa formats like RGBA and
+*   ARGB. Drivers behaving like that should be able to cope with
+*   non-matching formats by themselves, because it's not the user's fault.
+*
+* • Picking two different internal formats can end up with the same Mesa
+*   format. For example the driver might be simulating GL_RGB textures
+*   with GL_RGBA internally and in that case both internal formats would
+*   end up with RGBA.
 *
-* Under some circumstances, the user may request e.g. two GL_RGBA8
-* textures and get two entirely different Mesa formats like RGBA and
-* ARGB. Drivers behaving like that should be able to cope with
-* non-matching formats by themselves, because it's not the user's fault.
+* This function is used to generate a GL error according to the spec so in
+* both cases we want to be looking at the appliation-level format, which
+* is InternalFormat.
 *
 * Blits between linear and sRGB formats are also allowed.
 */
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/9] i965/fs: print writemask_all when it's enabled

2015-11-19 Thread Connor Abbott
I think Ken already pushed a similar patch so we can drop this.

On Thu, Nov 19, 2015 at 5:05 AM, Iago Toral Quiroga  wrote:
> From: Connor Abbott 
>
> Reviewed-by: Iago Toral Quiroga 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 2d4ed6a..63dcba7 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4787,6 +4787,9 @@ fs_visitor::dump_instruction(backend_instruction 
> *be_inst, FILE *file)
>   fprintf(file, "1sthalf ");
> }
>
> +   if (inst->force_writemask_all)
> +  fprintf(file, "WE_all ");
> +
> fprintf(file, "\n");
>  }
>
> --
> 1.9.1
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/8] Implement EXT_shader_samples_identical

2015-11-19 Thread Emil Velikov
On 19 November 2015 at 14:53, Ilia Mirkin  wrote:
> On Thu, Nov 19, 2015 at 9:47 AM, Emil Velikov  
> wrote:
>> Hi Ian,
>>
>> Attempting to high-jack the thread :-P
>>
>> On 18 November 2015 at 23:46, Ian Romanick  wrote:
>>
>>> I really wanted to get this in the next Mesa release.  For some reason,
>>> I thought the branch point was after Thanksgiving (which is next
>>> Thursday).  Ken reminded me yesterday that the branch point is actually
>>> this Friday. :( As a result, I'm sending it out today to get review as
>>> soon as possible.
>>>
>> Any suggestions on how we make dates more prominent/obvious ? I was
>> going to say "we even have it in #irc-devel channel topic" but you
>> don't seem to be around these days.
>>
>> Alternatively you can set a filter in your email client - the subject
>> is always "Mesa #VERSION release plan"
>
> A common thing is to have a $project-announce mailing list which would
> receive such emails along with actual release notifications.
>
Good idea - we have mesa-announce so might as well use it :P
It is kind of strange that we do not use it (be that myself or
Ian/Carl priorly) for these things.

Will do from now on. Thanks !
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


  1   2   >