Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Keith Packard
Jason Ekstrand  writes:

> Doing all of the CPU sampling on one side or the other of the GPU sampling
> would probably reduce our window.

True, although as I said, it's taking several µs to get through the
loop, and the gpu clock tick is far smaller than that, so even adding
the two values together to make it fit the current implementation won't
make the deviation that much larger.

> This leaves us with a delta of I + max(P(M), P(R), P(G)).  In
> particular, any two real-number valued times are, instantaneously,
> within that interval.

That, at least, would be easy to compute, and scale nicely if we added
more clocks in the future.

> Personally, I'm completely content to have the delta just be the first
> one: a bound on the difference between any two real-valued times.  At this
> point, I can guarantee you that far more thought has been put into this
> mesa-dev discussion than was put into the spec and I think we're rapidly
> getting to the point of diminishing returns. :-)

It seems likely. How about we do the above computation for the current
code and leave it at that?

-- 
-keith


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Keith Packard
Jason Ekstrand  writes:


> The result is that we're looking at something like "end - start +
> monotonic_raw_tick + max(gpu_tick, monotonic_tick)"  Have I just come
> full-circle?

Yes.  As monotonic_raw_tick and monotonic_tick are both 1...

-- 
-keith




[Mesa-dev] [PATCH] nir: remove some redundant bcsel instructions

2018-10-16 Thread Timothy Arceri
For example:

   vec1 32 ssa_386 = feq ssa_333.x, ssa_6
   vec1 32 ssa_387 = feq ssa_333.x, ssa_2
   vec1 32 ssa_391 = bcsel ssa_387, ssa_388, ssa_324
   vec1 32 ssa_396 = bcsel ssa_386, ssa_324, ssa_391

Can be simplified to:

   vec1 32 ssa_387 = feq ssa_333.x, ssa_2
   vec1 32 ssa_391 = bcsel ssa_387, ssa_388, ssa_324

There are a bunch of these in Rise of the Tomb Raider's Vulkan
shaders. There are also a handful of shaders helped in shader-db
but the changes there are smaller.
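
As a rough illustration of why the nested select collapses, here is a
minimal Python sketch. The helpers bcsel, nested and simplified are
hypothetical stand-ins written for this note, not NIR code, and A and D
are arbitrary values. The check assumes the two constants being
compared against are distinct; if they were equal, the two forms would
disagree.

```python
def bcsel(cond, x, y):
    """Model of a select: return x when cond is true, otherwise y."""
    return x if cond else y


def nested(a, d, b, c, e):
    # Shape matched by the new rule: bcsel(a == b, c, bcsel(d == b, e, c))
    return bcsel(a == b, c, bcsel(d == b, e, c))


def simplified(d, b, c, e):
    # Replacement shape: bcsel(d == b, e, c)
    return bcsel(d == b, e, c)


# Hypothetical distinct constants standing in for '#a' and '#d'.
A, D = 6, 2

# Exhaustive check over a small range of values for b.
for b in range(-8, 9):
    assert nested(A, D, b, "c", "e") == simplified(D, b, "c", "e")
```

When b equals A the outer select already yields c, and because A and D
differ the inner select would have yielded c as well, so the outer
comparison adds nothing.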

For RADV:

Totals from affected shaders:
SGPRS: 11184 -> 11168 (-0.14 %)
VGPRS: 11484 -> 11484 (0.00 %)
Spilled SGPRs: 1119 -> 1116 (-0.27 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1210856 -> 1210372 (-0.04 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 360 -> 360 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
---
 src/compiler/nir/nir_opt_algebraic.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index cc747250ba5..7530710cbe0 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -34,6 +34,7 @@ a = 'a'
 b = 'b'
 c = 'c'
 d = 'd'
+e = 'e'
 
 # Written in the form (<search>, <replace>) where <search> is an expression
 # and <replace> is either an expression or a value.  An expression is
@@ -525,6 +526,9 @@ optimizations = [
# The result of this should be hit by constant propagation and, in the
# next round of opt_algebraic, get picked up by one of the above two.
(('bcsel', '#a', b, c), ('bcsel', ('ine', 'a', 0), b, c)),
+   # Remove redundant bcsel
+   (('bcsel', ('ieq', '#a', b), c, ('bcsel', ('ieq', '#d', b), e, c)), ('bcsel', ('ieq', d, b), e, c)),
+   (('bcsel', ('feq', '#a', b), c, ('bcsel', ('feq', '#d', b), e, c)), ('bcsel', ('feq', d, b), e, c)),
 
(('bcsel', a, b, b), b),
(('fcsel', a, b, b), b),
-- 
2.17.1



Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Jason Ekstrand
On Tue, Oct 16, 2018 at 5:56 PM Keith Packard  wrote:

> Bas Nieuwenhuizen  writes:
>
> > You can make the monotonic case the same as the raw case if you make
> > sure to always sample the CPU first by e.g. splitting the loops into
> > two and doing CPU in the first and GPU in the second. That way you
> > make the case above impossible.
>
> Right, I had thought of that, and it probably solves the problem for
> us. If more time domains are added, things become 'more complicated'
> though.
>

Doing all of the CPU sampling on one side or the other of the GPU sampling
would probably reduce our window.


> > That said "start of the interval of the tick" is kinda arbitrary and
> > you could pick random other points on that interval, so depending on
> > what requirements you put on it (i.e. can the chosen position be
> > different per call, consistent but implicit or explicitly picked which
> > might be leaked through the interface) the max deviation changes. So
> > depending on interpretation this thing can be very moot ...
>
> It doesn't really matter what phase you use; the timer increments
> periodically, and what really matters is the time when that happens
> relative to other clocks changing.
>

Agreed.

Thinking about this a bit more, I think it helps to consider each clock to
be a real number that's changing continuously in time and what you actually
measure is floor(x / P(x)) where P(x) is the period of x in nanoseconds
(or ceil; it doesn't matter so long as you're consistent).  At any given
point, the clocks do have an exact value relative to each other.  When you
sample, you grab floor(M / P(M)), floor(G / P(G)), and floor(R / P(R)) all
in some interval of size I.  The delta between the real values sampled is
at most I but the sampling takes a floor operation, so the actual value of any
given clock C may be as much as P(C) greater than what was sampled but it
cannot be lower (assuming the floor convention).  This leaves us with a
delta of I + max(P(M), P(R), P(G)).  In particular, any two real-number
valued times are, instantaneously, within that interval.
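
To make the model concrete, here is a small Python sketch of the floor
sampling and the resulting bound. The function names and the example
periods (1 ns for the two CPU clocks, 52 ns for the GPU) are
assumptions picked for illustration, not values taken from any driver.

```python
def sampled(real_ns, period_ns):
    """Time a clock reports: the real time floored to a whole tick."""
    return (real_ns // period_ns) * period_ns


def max_delta(interval_ns, periods_ns):
    """Bound from the text: I + max(P(M), P(R), P(G))."""
    return interval_ns + max(periods_ns)


# Hypothetical periods: monotonic 1 ns, raw 1 ns, GPU 52 ns.
periods = [1, 1, 52]

# A real GPU time of 1039 ns is reported as 988 ns, 51 ns low,
# which is less than one GPU period.
error = 1039 - sampled(1039, 52)
assert 0 <= error < 52

# With a 3000 ns sampling window the bound is 3052 ns.
print(max_delta(3000, periods))
```

The sampled value never exceeds the real one under the floor
convention, which is why only a single period is added to the window.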

The next question becomes, if I sample again and assume zero clock drift,
what are the bounds on the next sampling.  Above, we calculated the maximum
delta between real-valued clocks.  However, because we're sampling again,
we may end up with more phase shift issues and any clock may, again, be off
by as much as P(C).  However, again assuming no drift, no clock is going to
be off with respect to itself; just sampled at a different phase so I think
the most delta you can see between two clocks in the two samplings is the
sum of their periods.  So if the delta we're looking for is a delta for a
theoretical second sampling, I think it's I plus the maximum of the sums of
all pairs of periods.
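
A sketch of that second-sampling bound, again in Python and again with
made-up periods (1 ns, 1 ns and 52 ns are assumptions for illustration
only). The function name is hypothetical.

```python
from itertools import combinations


def second_sampling_delta(interval_ns, periods_ns):
    """I plus the largest sum of any two clock periods.

    Needs at least two clocks, since the bound is about pairs.
    """
    worst_pair = max(p + q for p, q in combinations(periods_ns, 2))
    return interval_ns + worst_pair


# Hypothetical periods: monotonic 1 ns, raw 1 ns, GPU 52 ns.
print(second_sampling_delta(3000, [1, 1, 52]))  # 3000 + (52 + 1) = 3053
```

With one coarse clock and the rest fine-grained, this is only slightly
larger than the single-sampling bound of I plus the largest period.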

Personally, I'm completely content to have the delta just be the first
one: a bound on the difference between any two real-valued times.  At this
point, I can guarantee you that far more thought has been put into this
mesa-dev discussion than was put into the spec and I think we're rapidly
getting to the point of diminishing returns. :-)

--Jason


Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Jason Ekstrand
On Tue, Oct 16, 2018 at 5:06 PM Keith Packard  wrote:

> Bas Nieuwenhuizen  writes:
>
> > Well the complication here is that in the MONOTONIC (not
> > MONOTONIC_RAW) case the CPU measurement can happen at the end of the
> > MONOTONIC_RAW interval (as the order of measurements is based on
> > argument order), so you can get a tick that started `period` (5 in
> > this case) monotonic ticks before the start of the interval and a CPU
> > measurement at the end of the interval.
>
> Ah, that's an excellent point. Let's split out raw and monotonic and
> take a look. You want the GPU sampled at the start of the raw interval
> and monotonic sampled at the end, I think?
>
>  w x y z 0 1 2 3 4 5 6 7 8 9 a b c d e f
> Raw  -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
>
>   0 1 2 3
> GPU   -_-_-_-_
>
> x y z 0 1 2 3 4 5 6 7 8 9 a b c
> Monotonic   -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
>
> Interval <->
> Deviation   <-->
>
> start = read(raw)   2
> gpu   = read(GPU)   1
> mono  = read(monotonic) 2
> end   = read(raw)   b
>
> In this case, the error between the monotonic pulse and the GPU is
> interval + gpu_period (probably plus one to include the measurement
> error of the raw clock).
>

I'm very confused by this case.  Why is the monotonic timeline delayed?  It
seems to me like it's only the monotonic sampling that's delayed, and the
result is that mono ends up closer to end than start, so the sampled value
would be something like 9 or a rather than 2?

I think we can model this fairly simply as two components:

 1) The size of the sampling window; this is "end - start +
monotonic_raw_tick"
 2) The maximum phase shift of any sample.  The only issue here is that a
tick may have started before the sampling window so we need to add on the
maximum tick size.  The worst case bound for this is when the early sampled
clock is sampled at the end of a tick and the late sampled clock is sampled
at the beginning of a tick.

The result is that we're looking at something like "end - start +
monotonic_raw_tick + max(gpu_tick, monotonic_tick)"  Have I just come
full-circle?


Re: [Mesa-dev] [PATCH 2/7] nir/int64: Add some more lowering helpers

2018-10-16 Thread Jason Ekstrand
On Tue, Oct 16, 2018 at 7:51 PM Matt Turner  wrote:

> On Sun, Oct 14, 2018 at 3:58 PM Jason Ekstrand 
> wrote:
> >
> > On October 14, 2018 17:12:34 Matt Turner  wrote:
> >
> > > From: Jason Ekstrand 
> > >
> > > [mattst88]: Found in an old branch of Jason's.
> > >
> > > Jason implemented: inot, iand, ior, iadd, isub, ineg, iabs, compare,
> > >   imin, imax, umin, umax
> > > Matt implemented:  ixor, imov, bcsel
> > > ---
> > > src/compiler/nir/nir_lower_int64.c | 186
> +
> > > 1 file changed, 186 insertions(+)
> > >
> > > diff --git a/src/compiler/nir/nir_lower_int64.c
> > > b/src/compiler/nir/nir_lower_int64.c
> > > index 0d7f165b406..6b269830801 100644
> > > --- a/src/compiler/nir/nir_lower_int64.c
> > > +++ b/src/compiler/nir/nir_lower_int64.c
> > > @@ -24,6 +24,192 @@
> > > #include "nir.h"
> > > #include "nir_builder.h"
> > >
> > > +static nir_ssa_def *
> > > +lower_imov64(nir_builder *b, nir_ssa_def *x)
> > > +{
> > > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > > +
> > > +   return nir_pack_64_2x32_split(b, nir_imov(b, x_lo), nir_imov(b,
> x_hi));
> >
> > You don't really need the movs...
>
> Thanks. I think that was really a copy-and-paste-and-replace mistake.
>
> > > +}
> > > +
> > > +static nir_ssa_def *
> > > +lower_bcsel64(nir_builder *b, nir_ssa_def *cond, nir_ssa_def *x,
> > > nir_ssa_def *y)
> > > +{
> > > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > > +
> > > +   return nir_pack_64_2x32_split(b, nir_bcsel(b, cond, x_lo, y_lo),
> > > +nir_bcsel(b, cond, x_hi, y_hi));
> > > +}
> > > +
> > > +static nir_ssa_def *
> > > +lower_inot64(nir_builder *b, nir_ssa_def *x)
> > > +{
> > > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > > +
> > > +   return nir_pack_64_2x32_split(b, nir_inot(b, x_lo), nir_inot(b,
> x_hi));
> > > +}
> > > +
> > > +static nir_ssa_def *
> > > +lower_iand64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y)
> > > +{
> > > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > > +
> > > +   return nir_pack_64_2x32_split(b, nir_iand(b, x_lo, y_lo),
> > > +nir_iand(b, x_hi, y_hi));
> > > +}
> > > +
> > > +static nir_ssa_def *
> > > +lower_ior64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y)
> > > +{
> > > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > > +
> > > +   return nir_pack_64_2x32_split(b, nir_ior(b, x_lo, y_lo),
> > > +nir_ior(b, x_hi, y_hi));
> > > +}
> > > +
> > > +static nir_ssa_def *
> > > +lower_ixor64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y)
> > > +{
> > > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > > +
> > > +   return nir_pack_64_2x32_split(b, nir_ixor(b, x_lo, y_lo),
> > > +nir_ixor(b, x_hi, y_hi));
> > > +}
> > > +
> > > +static nir_ssa_def *
> > > +lower_iadd64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y)
> > > +{
> > > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > > +
> > > +   nir_ssa_def *res_lo = nir_iadd(b, x_lo, y_lo);
> > > +   nir_ssa_def *carry = nir_b2i(b, nir_ult(b, res_lo, x_lo));
> > > +   nir_ssa_def *res_hi = nir_iadd(b, carry, nir_iadd(b, x_hi, y_hi));
> > > +
> > > +   return nir_pack_64_2x32_split(b, res_lo, res_hi);
> > > +}
> > > +
> > > +static nir_ssa_def *
> > > +lower_isub64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y)
> > > +{
> > > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > > +
> > > +   nir_ssa_def *res_lo = nir_isub(b, x_lo, y_lo);
> > > +   /* In NIR, true is represented by ~0 which is -1 */
> >
> > We've had discussions (had some at XDC this year) 

Re: [Mesa-dev] [PATCH 2/7] nir/int64: Add some more lowering helpers

2018-10-16 Thread Matt Turner
On Sun, Oct 14, 2018 at 3:58 PM Jason Ekstrand  wrote:
>
> On October 14, 2018 17:12:34 Matt Turner  wrote:
>
> > From: Jason Ekstrand 
> >
> > [mattst88]: Found in an old branch of Jason's.
> >
> > Jason implemented: inot, iand, ior, iadd, isub, ineg, iabs, compare,
> >   imin, imax, umin, umax
> > Matt implemented:  ixor, imov, bcsel
> > ---
> > src/compiler/nir/nir_lower_int64.c | 186 
> > +
> > 1 file changed, 186 insertions(+)
> >
> > diff --git a/src/compiler/nir/nir_lower_int64.c
> > b/src/compiler/nir/nir_lower_int64.c
> > index 0d7f165b406..6b269830801 100644
> > --- a/src/compiler/nir/nir_lower_int64.c
> > +++ b/src/compiler/nir/nir_lower_int64.c
> > @@ -24,6 +24,192 @@
> > #include "nir.h"
> > #include "nir_builder.h"
> >
> > +static nir_ssa_def *
> > +lower_imov64(nir_builder *b, nir_ssa_def *x)
> > +{
> > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > +
> > +   return nir_pack_64_2x32_split(b, nir_imov(b, x_lo), nir_imov(b, x_hi));
>
> You don't really need the movs...

Thanks. I think that was really a copy-and-paste-and-replace mistake.

> > +}
> > +
> > +static nir_ssa_def *
> > +lower_bcsel64(nir_builder *b, nir_ssa_def *cond, nir_ssa_def *x,
> > nir_ssa_def *y)
> > +{
> > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > +
> > +   return nir_pack_64_2x32_split(b, nir_bcsel(b, cond, x_lo, y_lo),
> > +nir_bcsel(b, cond, x_hi, y_hi));
> > +}
> > +
> > +static nir_ssa_def *
> > +lower_inot64(nir_builder *b, nir_ssa_def *x)
> > +{
> > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > +
> > +   return nir_pack_64_2x32_split(b, nir_inot(b, x_lo), nir_inot(b, x_hi));
> > +}
> > +
> > +static nir_ssa_def *
> > +lower_iand64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y)
> > +{
> > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > +
> > +   return nir_pack_64_2x32_split(b, nir_iand(b, x_lo, y_lo),
> > +nir_iand(b, x_hi, y_hi));
> > +}
> > +
> > +static nir_ssa_def *
> > +lower_ior64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y)
> > +{
> > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > +
> > +   return nir_pack_64_2x32_split(b, nir_ior(b, x_lo, y_lo),
> > +nir_ior(b, x_hi, y_hi));
> > +}
> > +
> > +static nir_ssa_def *
> > +lower_ixor64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y)
> > +{
> > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > +
> > +   return nir_pack_64_2x32_split(b, nir_ixor(b, x_lo, y_lo),
> > +nir_ixor(b, x_hi, y_hi));
> > +}
> > +
> > +static nir_ssa_def *
> > +lower_iadd64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y)
> > +{
> > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > +
> > +   nir_ssa_def *res_lo = nir_iadd(b, x_lo, y_lo);
> > +   nir_ssa_def *carry = nir_b2i(b, nir_ult(b, res_lo, x_lo));
> > +   nir_ssa_def *res_hi = nir_iadd(b, carry, nir_iadd(b, x_hi, y_hi));
> > +
> > +   return nir_pack_64_2x32_split(b, res_lo, res_hi);
> > +}
> > +
> > +static nir_ssa_def *
> > +lower_isub64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y)
> > +{
> > +   nir_ssa_def *x_lo = nir_unpack_64_2x32_split_x(b, x);
> > +   nir_ssa_def *x_hi = nir_unpack_64_2x32_split_y(b, x);
> > +   nir_ssa_def *y_lo = nir_unpack_64_2x32_split_x(b, y);
> > +   nir_ssa_def *y_hi = nir_unpack_64_2x32_split_y(b, y);
> > +
> > +   nir_ssa_def *res_lo = nir_isub(b, x_lo, y_lo);
> > +   /* In NIR, true is represented by ~0 which is -1 */
>
> We've had discussions (had some at XDC this year) about changing booleans
> to one-bit which would break this.  Doing b2i would be safer but this does
> work for now.
>
> > +   nir_ssa_def *borrow = nir_ult(b, x_lo, y_lo);
> > +   nir_ssa_def *res_hi = nir_iadd(b, nir_isub(b, x_hi, y_hi), borrow);
> > +
> > +   return nir_pack_64_2x32_split(b, 

[Mesa-dev] [PATCH] intel/eu: Don't apply chansel when repctrl is set

2018-10-16 Thread Sagar Ghuge
Signed-off-by: Sagar Ghuge 
---
 src/intel/compiler/brw_eu_emit.c | 36 
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index 0cbc682ebc..a6b45fcb1a 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -765,31 +765,49 @@ brw_alu3(struct brw_codegen *p, unsigned opcode, struct brw_reg dest,
   brw_inst_set_3src_a16_dst_writemask(devinfo, inst, dest.writemask);
 
   assert(src0.file == BRW_GENERAL_REGISTER_FILE);
-  brw_inst_set_3src_a16_src0_swizzle(devinfo, inst, src0.swizzle);
   brw_inst_set_3src_a16_src0_subreg_nr(devinfo, inst, get_3src_subreg_nr(src0));
   brw_inst_set_3src_src0_reg_nr(devinfo, inst, src0.nr);
   brw_inst_set_3src_src0_abs(devinfo, inst, src0.abs);
   brw_inst_set_3src_src0_negate(devinfo, inst, src0.negate);
-  brw_inst_set_3src_a16_src0_rep_ctrl(devinfo, inst,
-  src0.vstride == BRW_VERTICAL_STRIDE_0);
+
+  /* RepCtrl field in Source or Destination Operand table in BDW Bspec
+   * says:
+   *"ChanSel does not apply when Replicate Control is set."
+   */
+  if (src0.vstride == BRW_VERTICAL_STRIDE_0)
+ brw_inst_set_3src_a16_src0_rep_ctrl(devinfo, inst, true);
+  else
+ brw_inst_set_3src_a16_src0_swizzle(devinfo, inst, src0.swizzle);
 
   assert(src1.file == BRW_GENERAL_REGISTER_FILE);
-  brw_inst_set_3src_a16_src1_swizzle(devinfo, inst, src1.swizzle);
   brw_inst_set_3src_a16_src1_subreg_nr(devinfo, inst, get_3src_subreg_nr(src1));
   brw_inst_set_3src_src1_reg_nr(devinfo, inst, src1.nr);
   brw_inst_set_3src_src1_abs(devinfo, inst, src1.abs);
   brw_inst_set_3src_src1_negate(devinfo, inst, src1.negate);
-  brw_inst_set_3src_a16_src1_rep_ctrl(devinfo, inst,
-  src1.vstride == BRW_VERTICAL_STRIDE_0);
+
+  /* RepCtrl field in Source or Destination Operand table in BDW Bspec
+   * says:
+   *"ChanSel does not apply when Replicate Control is set."
+   */
+  if (src1.vstride == BRW_VERTICAL_STRIDE_0)
+ brw_inst_set_3src_a16_src1_rep_ctrl(devinfo, inst, true);
+  else
+ brw_inst_set_3src_a16_src1_swizzle(devinfo, inst, src1.swizzle);
 
   assert(src2.file == BRW_GENERAL_REGISTER_FILE);
-  brw_inst_set_3src_a16_src2_swizzle(devinfo, inst, src2.swizzle);
   brw_inst_set_3src_a16_src2_subreg_nr(devinfo, inst, get_3src_subreg_nr(src2));
   brw_inst_set_3src_src2_reg_nr(devinfo, inst, src2.nr);
   brw_inst_set_3src_src2_abs(devinfo, inst, src2.abs);
   brw_inst_set_3src_src2_negate(devinfo, inst, src2.negate);
-  brw_inst_set_3src_a16_src2_rep_ctrl(devinfo, inst,
-  src2.vstride == BRW_VERTICAL_STRIDE_0);
+
+  /* RepCtrl field in Source or Destination Operand table in BDW Bspec
+   * says:
+   *"ChanSel does not apply when Replicate Control is set."
+   */
+  if (src2.vstride == BRW_VERTICAL_STRIDE_0)
+ brw_inst_set_3src_a16_src2_rep_ctrl(devinfo, inst, true);
+  else
+ brw_inst_set_3src_a16_src2_swizzle(devinfo, inst, src2.swizzle);
 
   if (devinfo->gen >= 7) {
  /* Set both the source and destination types based on dest.type,
-- 
2.17.1



Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Keith Packard
Bas Nieuwenhuizen  writes:

> You can make the monotonic case the same as the raw case if you make
> sure to always sample the CPU first by e.g. splitting the loops into
> two and doing CPU in the first and GPU in the second. That way you
> make the case above impossible.

Right, I had thought of that, and it probably solves the problem for
us. If more time domains are added, things become 'more complicated'
though.

> That said "start of the interval of the tick" is kinda arbitrary and
> you could pick random other points on that interval, so depending on
> what requirements you put on it (i.e. can the chosen position be
> different per call, consistent but implicit or explicitly picked which
> might be leaked through the interface) the max deviation changes. So
> depending on interpretation this thing can be very moot ...

It doesn't really matter what phase you use; the timer increments
periodically, and what really matters is the time when that happens
relative to other clocks changing.

-- 
-keith




[Mesa-dev] [Bug 107765] [regression] Batman Arkham City crashes with DXVK under wine

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107765

--- Comment #12 from Timothy Arceri  ---
(In reply to farmboy0+freedesktop from comment #11)
> Just start the game. After the intro videos it crashes, before the first
> game screen.
> Or if you want a faster crash do this:
> https://pcgamingwiki.com/wiki/Batman:_Arkham_City#Skip_intro_videos

I cannot reproduce this new crash; the game works issue-free for me now. I
updated to DXVK 0.90 to be sure but am still not seeing any problem.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Bas Nieuwenhuizen
On Wed, Oct 17, 2018 at 12:06 AM Keith Packard  wrote:
>
> Bas Nieuwenhuizen  writes:
>
> > Well the complication here is that in the MONOTONIC (not
> > MONOTONIC_RAW) case the CPU measurement can happen at the end of the
> > MONOTONIC_RAW interval (as the order of measurements is based on
> > argument order), so you can get a tick that started `period` (5 in
> > this case) monotonic ticks before the start of the interval and a CPU
> > measurement at the end of the interval.
>
> Ah, that's an excellent point. Let's split out raw and monotonic and
> take a look. You want the GPU sampled at the start of the raw interval
> and monotonic sampled at the end, I think?
>
>  w x y z 0 1 2 3 4 5 6 7 8 9 a b c d e f
> Raw  -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
>
>   0 1 2 3
> GPU   -_-_-_-_
>
> x y z 0 1 2 3 4 5 6 7 8 9 a b c
> Monotonic   -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
>
> Interval <->
> Deviation   <-->
>
> start = read(raw)   2
> gpu   = read(GPU)   1
> mono  = read(monotonic) 2
> end   = read(raw)   b
>
> In this case, the error between the monotonic pulse and the GPU is
> interval + gpu_period (probably plus one to include the measurement
> error of the raw clock).
>
> Thanks for finding this case.
>
> Now, I guess the question is whether we want to try and find the
> smallest maxDeviation possible for each query. For instance, if the
> application asks only for raw and gpu, the max_deviation could be
> max2(interval+1,gpu_period), but if it asks for monotonic and gpu, it
> would be interval+1+gpu_period. I'm not seeing a simple definition
> here...

You can make the monotonic case the same as the raw case if you make
sure to always sample the CPU first by e.g. splitting the loops into
two and doing CPU in the first and GPU in the second. That way you
make the case above impossible.

That said "start of the interval of the tick" is kinda arbitrary and
you could pick random other points on that interval, so depending on
what requirements you put on it (i.e. can the chosen position be
different per call, consistent but implicit or explicitly picked which
might be leaked through the interface) the max deviation changes. So
depending on interpretation this thing can be very moot ...


>
> --
> -keith


Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Keith Packard
Bas Nieuwenhuizen  writes:

> Well the complication here is that in the MONOTONIC (not
> MONOTONIC_RAW) case the CPU measurement can happen at the end of the
> MONOTONIC_RAW interval (as the order of measurements is based on
> argument order), so you can get a tick that started `period` (5 in
> this case) monotonic ticks before the start of the interval and a CPU
> measurement at the end of the interval.

Ah, that's an excellent point. Let's split out raw and monotonic and
take a look. You want the GPU sampled at the start of the raw interval
and monotonic sampled at the end, I think?

 w x y z 0 1 2 3 4 5 6 7 8 9 a b c d e f
Raw  -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-

  0 1 2 3
GPU   -_-_-_-_

x y z 0 1 2 3 4 5 6 7 8 9 a b c
Monotonic   -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-

Interval <->
Deviation   <-->

start = read(raw)   2
gpu   = read(GPU)   1
mono  = read(monotonic) 2
end   = read(raw)   b

In this case, the error between the monotonic pulse and the GPU is
interval + gpu_period (probably plus one to include the measurement
error of the raw clock).

Thanks for finding this case.

Now, I guess the question is whether we want to try and find the
smallest maxDeviation possible for each query. For instance, if the
application asks only for raw and gpu, the max_deviation could be
max2(interval+1,gpu_period), but if it asks for monotonic and gpu, it
would be interval+1+gpu_period. I'm not seeing a simple definition
here...
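
For reference, the two per-query candidates can be written out as a
short Python sketch. The function names are hypothetical, all values
are in ticks of the raw clock, and the numbers come from the diagram
above (start = 2, end = b, gpu_period = 5).

```python
def max2(a, b):
    """Larger of two values, mirroring the max2 used in the text."""
    return a if a > b else b


def deviation_raw_and_gpu(interval, gpu_period):
    # Application asked only for raw and GPU timestamps.
    return max2(interval + 1, gpu_period)


def deviation_monotonic_and_gpu(interval, gpu_period):
    # Application asked for monotonic and GPU timestamps.
    return interval + 1 + gpu_period


interval = 0xB - 0x2  # end - start from the diagram, i.e. 9 raw ticks
gpu_period = 5

print(deviation_raw_and_gpu(interval, gpu_period))        # 10
print(deviation_monotonic_and_gpu(interval, gpu_period))  # 15
```

The gap between the two results is what makes a single simple
definition awkward: the tighter bound only holds for some combinations
of requested time domains.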

-- 
-keith




Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Keith Packard
Jason Ekstrand  writes:


> You've got me almost convinced as well.  Thanks for the diagrams!  I think
> instead of adding 1 perhaps what we want is
>
> max2(sample_interval_ns, gpu_tick_ns + monotonic_tick_ns)
>
> Where monotonic_tick_ns is maybe as low as 1.  Am I following you correctly?

Not quite; I was thinking that because the sample_interval_ns is
measured by sampling the monotonic clock twice, the actual interval can
be up to 1 tick longer, so

max2(sample_interval_ns + monotonic_tick_ns, gpu_tick_ns)

The gpu_tick_ns is computed, not measured, and so accurately reflects
the maximum deviation in the measurements.
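
A small Python sketch of why the measured interval needs one extra
tick. It models the monotonic clock as an integer counter with a
hypothetical resolution of 10 sub-units per tick; TICK, the function
names and the sample times are all assumptions for illustration.

```python
TICK = 10  # sub-units per monotonic tick (hypothetical resolution)


def read_clock(t):
    """Clock reading in whole ticks for a real time t in sub-units."""
    return t // TICK


def measured_interval(t_start, t_end):
    """Interval in ticks as seen by sampling the clock twice."""
    return read_clock(t_end) - read_clock(t_start)


def max_deviation(sample_interval_ns, monotonic_tick_ns, gpu_tick_ns):
    """The formula from the text, with max2 spelled as max."""
    return max(sample_interval_ns + monotonic_tick_ns, gpu_tick_ns)


# Start just after a tick boundary, end just before one: the real
# interval is 38 sub-units, but only 3 whole ticks are measured.
t_start, t_end = 11, 49
measured = measured_interval(t_start, t_end)
assert measured == 3

# The real interval is always shorter than measured + 1 tick.
assert t_end - t_start < (measured + 1) * TICK
```

Flooring both endpoints can hide up to one tick of real elapsed time,
so adding monotonic_tick_ns to the measured interval keeps the bound
conservative.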

-- 
-keith




[Mesa-dev] [Bug 107765] [regression] Batman Arkham City crashes with DXVK under wine

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107765

--- Comment #11 from farmboy0+freedesk...@googlemail.com ---
Just start the game. After the intro videos it crashes, before the first game
screen.
Or if you want a faster crash do this:
https://pcgamingwiki.com/wiki/Batman:_Arkham_City#Skip_intro_videos



Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Bas Nieuwenhuizen
On Tue, Oct 16, 2018 at 11:07 PM Keith Packard  wrote:
>
> Jason Ekstrand  writes:
>
> > I think what Bas is getting at is that there are two problems:
> >
> >  1) We are not sampling at exactly the same time
> >  2) The two clocks may not tick at exactly the same time.
>
> Thanks for the clarification.
>
> > If we want to be conservative, I suspect Bas may be right that adding is
> > the safer thing to do.
>
> Yes, it's certainly safe to increase the value of
> maxDeviation. Currently, the time it takes to sample all of the clocks
> is far larger than the GPU tick, so adding that in would not have a huge
> impact on the value returned to the application.
>
> I'd like to dig in a little further and actually understand if the
> current computation (which is derived directly from the Vulkan spec) is
> wrong, and if so, whether the spec needs to be adjusted.
>
> I think the question is what 'maxDeviation' is supposed to
> represent. All the spec says is:
>
>  * pMaxDeviation is a pointer to a 64-bit unsigned integer value in
>which the strictly positive maximum deviation, in nanoseconds, of the
>calibrated timestamp values is returned.
>
> I interpret this as the maximum error in sampling the individual clocks,
> which is to say that the clock values are guaranteed to have been
> sampled within this interval of each other.

With this interpretation I don't think you need to account for period at all?

>
> So, if we have a monotonic clock and GPU clock:
>
>           0 1 2 3 4 5 6 7 8 9 a b c d e f
> Monotonic -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
>
>           0         1         2         3
> GPU       -----_____-----_____-----_____-----_____
>
>
> gpu_period in this case is 5 ticks of the monotonic clock.
>
> Now, I perform three operations:
>
> start = read(monotonic)
> gpu   = read(GPU)
> end   = read(monotonic)
>
> Let's say that:
>
> start = 2
> GPU = 1 * 5 = 5 monotonic equivalent ticks
> end = b
>
> So, the question is only how large the error between GPU and start could
> possibly be. Certainly the GPU clock was sampled some time between
> when monotonic tick 2 started and monotonic tick b ended. But, we have
> no idea what phase the GPU clock was in when sampled.
>
> Let's imagine we manage to sample the GPU clock immediately after the
> first monotonic sample. I'll shift the offset of the monotonic and GPU
> clocks to retain the same values (start = 2, GPU = 1), but now
> the GPU clock is being sampled immediately after monotonic time 2:
>
>             w x y z 0 1 2 3 4 5 6 7 8 9 a b c d e f
> Monotonic   -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
>
>       0         1         2         3
> GPU   -----_____-----_____-----_____-----_____
>
>
> In this case, the GPU tick started at monotonic time y, nearly 5
> monotonic ticks earlier than the measured monotonic time, so the
> deviation between GPU and monotonic would be 5 ticks.

Well the complication here is that in the MONOTONIC (not
MONOTONIC_RAW) case the CPU measurement can happen at the end of the
MONOTONIC_RAW interval (as the order of measurements is based on
argument order), so you can get a tick that started `period` (5 in
this case) monotonic ticks before the start of the interval and a CPU
measurement at the end of the interval.

>
> If we sample the GPU clock immediately before the second monotonic
> sample, then that GPU tick either starts earlier than the range, in
> which case the above evaluation holds, or the GPU tick is entirely
> contained within the range:
>
>           0 1 2 3 4 5 6 7 8 9 a b c d e f
> Monotonic -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
>
>           z 0         1         2         3
> GPU       __-----_____-----_____-----_____-----_____-
>
> In this case, the deviation between the first monotonic sample (that
> returned to the application as the monotonic time) and the GPU sample is
> the whole interval of measurement (b - 2).
>
> I think I've just managed to convince myself that Jason's first
> suggestion (max2(sample interval, gpu interval)) is correct, although I
> think we should add '1' to the interval to account for sampling phase
> errors in the monotonic clock. As that's measured in ns, and I'm
> currently getting values in the µs range, that's a small error in
> comparison.
>
> --
> -keith


Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Jason Ekstrand
On Tue, Oct 16, 2018 at 4:07 PM Keith Packard  wrote:

> Jason Ekstrand  writes:
>
> > I think what Bas is getting at is that there are two problems:
> >
> >  1) We are not sampling at exactly the same time
> >  2) The two clocks may not tick at exactly the same time.
>
> Thanks for the clarification.
>
> > If we want to be conservative, I suspect Bas may be right that adding is
> > the safer thing to do.
>
> Yes, it's certainly safe to increase the value of
> maxDeviation. Currently, the time it takes to sample all of the clocks
> is far larger than the GPU tick, so adding that in would not have a huge
> impact on the value returned to the application.
>
> I'd like to dig in a little further and actually understand if the
> current computation (which is derived directly from the Vulkan spec) is
> wrong, and if so, whether the spec needs to be adjusted.
>
> I think the question is what 'maxDeviation' is supposed to
> represent. All the spec says is:
>
>  * pMaxDeviation is a pointer to a 64-bit unsigned integer value in
>which the strictly positive maximum deviation, in nanoseconds, of the
>calibrated timestamp values is returned.
>
> I interpret this as the maximum error in sampling the individual clocks,
> which is to say that the clock values are guaranteed to have been
> sampled within this interval of each other.
>
> So, if we have a monotonic clock and GPU clock:
>
>           0 1 2 3 4 5 6 7 8 9 a b c d e f
> Monotonic -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
>
>           0         1         2         3
> GPU       -----_____-----_____-----_____-----_____
>
>
> gpu_period in this case is 5 ticks of the monotonic clock.
>
> Now, I perform three operations:
>
> start = read(monotonic)
> gpu   = read(GPU)
> end   = read(monotonic)
>
> Let's say that:
>
> start = 2
> GPU = 1 * 5 = 5 monotonic equivalent ticks
> end = b
>
> So, the question is only how large the error between GPU and start could
> possibly be. Certainly the GPU clock was sampled some time between
> when monotonic tick 2 started and monotonic tick b ended. But, we have
> no idea what phase the GPU clock was in when sampled.
>
> Let's imagine we manage to sample the GPU clock immediately after the
> first monotonic sample. I'll shift the offset of the monotonic and GPU
> clocks to retain the same values (start = 2, GPU = 1), but now
> the GPU clock is being sampled immediately after monotonic time 2:
>
>             w x y z 0 1 2 3 4 5 6 7 8 9 a b c d e f
> Monotonic   -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
>
>       0         1         2         3
> GPU   -----_____-----_____-----_____-----_____
>
>
> In this case, the GPU tick started at monotonic time y, nearly 5
> monotonic ticks earlier than the measured monotonic time, so the
> deviation between GPU and monotonic would be 5 ticks.
>
> If we sample the GPU clock immediately before the second monotonic
> sample, then that GPU tick either starts earlier than the range, in
> which case the above evaluation holds, or the GPU tick is entirely
> contained within the range:
>
>           0 1 2 3 4 5 6 7 8 9 a b c d e f
> Monotonic -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
>
>           z 0         1         2         3
> GPU       __-----_____-----_____-----_____-----_____-
>
> In this case, the deviation between the first monotonic sample (that
> returned to the application as the monotonic time) and the GPU sample is
> the whole interval of measurement (b - 2).
>
> I think I've just managed to convince myself that Jason's first
> suggestion (max2(sample interval, gpu interval)) is correct, although I
> think we should add '1' to the interval to account for sampling phase
> errors in the monotonic clock. As that's measured in ns, and I'm
> currently getting values in the µs range, that's a small error in
> comparison.
>

You've got me almost convinced as well.  Thanks for the diagrams!  I think
instead of adding 1 perhaps what we want is

max2(sample_interval_ns, gpu_tick_ns + monotonic_tick_ns)

Where monotonic_tick_ns is maybe as low as 1.  Am I following you correctly?
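
As a rough sketch of the formula above (the function and parameter names
are made up for illustration and are not taken from the patch), all values
in nanoseconds:

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: the bound proposed above,
 * max2(sample_interval_ns, gpu_tick_ns + monotonic_tick_ns). */
static uint64_t
max_deviation_ns(uint64_t sample_interval_ns,
                 uint64_t gpu_tick_ns,
                 uint64_t monotonic_tick_ns)
{
   uint64_t tick_sum_ns = gpu_tick_ns + monotonic_tick_ns;

   return sample_interval_ns > tick_sum_ns ? sample_interval_ns : tick_sum_ns;
}
```

With a 3000 ns sampling interval, an 80 ns GPU tick and a 1 ns monotonic
tick this returns 3000; if the sampling interval ever shrank to 50 ns it
would return 81 instead.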

--Jason


[Mesa-dev] [Bug 107765] [regression] Batman Arkham City crashes with DXVK under wine

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107765

--- Comment #10 from Samuel Pitoiset  ---
Okay, so apparently Batman needs itoi with R32G32B32 too. I thought only btoi
was needed by that game. I will implement it.

Can you explain how to reproduce the problem in-game?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 107765] [regression] Batman Arkham City crashes with DXVK under wine

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107765

--- Comment #9 from farmboy0+freedesk...@googlemail.com ---
Unhandled exception: assertion failed in 32-bit code (0xf7feeb09).
Register dump:
 CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
 EIP:f7feeb09 ESP:12aef0d0 EBP:12aef0fc EFLAGS:0246(   - --  I  Z- -P- )
 EAX: EBX:0002 ECX:12aef0fc EDX:
 ESI:0008 EDI:
Stack dump:
0x12aef0d0:  12aef0fc  12aef0fc f7bd956a
0x12aef0e0:  f7fe9fcc d4c00018  0006
0x12aef0f0:  f7baefb0 07188f72 12aef120 
0x12aef100:   f7038890 12aef2ac dd4d6f00
0x12aef110:  12aef17c 12aef2ac f7c2bd39 f7c2821b
0x12aef120:  e778c100 d4c00018 0001 d4c88d10
Backtrace:
=>0 0xf7feeb09 __kernel_vsyscall+0x9() in [vdso].so (0x12aef0fc)
  1 0xf7bd956a gsignal+0xc9() in libc.so.6 (0x12aef0fc)
  2 0xf7bdafcf abort+0x15e() in libc.so.6 (0xf7d337de)
  3 0xf7bd07ed in libc.so.6 (+0x257ec) (0xf7d337de)
  4 0xf7bd0887 __assert_fail+0x56() in libc.so.6 (0x01e6)
  5 0xe5b980d1 in libvulkan_radeon.so (+0x7f0d0) (0x0004)
  6 0xe5b9ed1d in libvulkan_radeon.so (+0x85d1c) (0x12aef8c8)
  7 0xe5b6598f in libvulkan_radeon.so (+0x4c98e) (0x12aef8c8)
  8 0xe5b82179 in libvulkan_radeon.so (+0x69178) (0xd4c63860)
  9 0xf0e55424 wine_vkCmdCopyImage+0x73(commandBuffer=, srcImage=, srcImageLayout=, dstImage=,
dstImageLayout=, regionCount=, pRegions=)
[/mnt/work/Repositories/wine/dlls/winevulkan/vulkan_thunks.c:1400] in
winevulkan (0x12aefce8)
  10 0x6a5afd9a in d3d11 (+0x6fd99) (0x12aefde8)
  11 0x6a545c4c in d3d11 (+0x5c4b) (0x12aefea8)
  12 0x6a6158c8 in d3d11 (+0xd58c7) (0x12aefeec)
  13 0x7bc852c9 call_thread_func+0xf8()
[/mnt/work/Repositories/wine/dlls/ntdll/signal_i386.c:2654] in ntdll
(0x12aeffdc)
  14 0x7bc81d56 call_thread_entry+0x9() in ntdll (0x12aeffec)
0xf7feeb09 __kernel_vsyscall+0x9 in [vdso].so: popl %ebp

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Keith Packard
Jason Ekstrand  writes:

> I think what Bas is getting at is that there are two problems:
>
>  1) We are not sampling at exactly the same time
>  2) The two clocks may not tick at exactly the same time.

Thanks for the clarification.

> If we want to be conservative, I suspect Bas may be right that adding is
> the safer thing to do.

Yes, it's certainly safe to increase the value of
maxDeviation. Currently, the time it takes to sample all of the clocks
is far larger than the GPU tick, so adding that in would not have a huge
impact on the value returned to the application.

I'd like to dig in a little further and actually understand if the
current computation (which is derived directly from the Vulkan spec) is
wrong, and if so, whether the spec needs to be adjusted.

I think the question is what 'maxDeviation' is supposed to
represent. All the spec says is:

 * pMaxDeviation is a pointer to a 64-bit unsigned integer value in
   which the strictly positive maximum deviation, in nanoseconds, of the
   calibrated timestamp values is returned.

I interpret this as the maximum error in sampling the individual clocks,
which is to say that the clock values are guaranteed to have been
sampled within this interval of each other.

So, if we have a monotonic clock and GPU clock:

          0 1 2 3 4 5 6 7 8 9 a b c d e f
Monotonic -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-

          0         1         2         3
GPU       -----_____-----_____-----_____-----_____


gpu_period in this case is 5 ticks of the monotonic clock.

Now, I perform three operations:

start = read(monotonic)
gpu   = read(GPU)
end   = read(monotonic)

Let's say that:

start = 2
GPU = 1 * 5 = 5 monotonic equivalent ticks
end = b

So, the question is only how large the error between GPU and start could
possibly be. Certainly the GPU clock was sampled some time between
when monotonic tick 2 started and monotonic tick b ended. But, we have
no idea what phase the GPU clock was in when sampled.

Let's imagine we manage to sample the GPU clock immediately after the
first monotonic sample. I'll shift the offset of the monotonic and GPU
clocks to retain the same values (start = 2, GPU = 1), but now
the GPU clock is being sampled immediately after monotonic time 2:

            w x y z 0 1 2 3 4 5 6 7 8 9 a b c d e f
Monotonic   -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-

      0         1         2         3
GPU   -----_____-----_____-----_____-----_____


In this case, the GPU tick started at monotonic time y, nearly 5
monotonic ticks earlier than the measured monotonic time, so the
deviation between GPU and monotonic would be 5 ticks.

If we sample the GPU clock immediately before the second monotonic
sample, then that GPU tick either starts earlier than the range, in
which case the above evaluation holds, or the GPU tick is entirely
contained within the range:

          0 1 2 3 4 5 6 7 8 9 a b c d e f
Monotonic -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-

          z 0         1         2         3
GPU       __-----_____-----_____-----_____-----_____-

In this case, the deviation between the first monotonic sample (that
returned to the application as the monotonic time) and the GPU sample is
the whole interval of measurement (b - 2).

I think I've just managed to convince myself that Jason's first
suggestion (max2(sample interval, gpu interval)) is correct, although I
think we should add '1' to the interval to account for sampling phase
errors in the monotonic clock. As that's measured in ns, and I'm
currently getting values in the µs range, that's a small error in
comparison.
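
The argument above can be checked by brute force under a simplified model.
This is only a sketch, and the model is my assumption rather than anything
in the patch: each clock reports the start of the tick containing the real
sample time, real time advances in quarter-tick steps, and the deviation is
the distance between the returned monotonic value and the real time at
which the sampled GPU tick began. All names are hypothetical.

```c
#include <stdint.h>

#define SUB        4   /* steps of "real" time per monotonic tick */
#define GPU_PERIOD 5   /* GPU tick length in monotonic ticks, as in the diagrams */

/* Counts sample combinations whose deviation exceeds
 * max2(sample interval + 1, gpu interval); 0 means the bound held. */
static unsigned
count_bound_violations(void)
{
   const int64_t gpu_len = GPU_PERIOD * SUB;
   unsigned violations = 0;

   for (int64_t phase = 0; phase < gpu_len; phase++) {
      for (int64_t t_start = 0; t_start < 2 * gpu_len; t_start++) {
         for (int64_t t_end = t_start; t_end <= t_start + 2 * gpu_len; t_end++) {
            for (int64_t t_gpu = t_start; t_gpu <= t_end; t_gpu++) {
               /* Values the two read(monotonic) calls would return. */
               int64_t start = t_start / SUB;
               int64_t end = t_end / SUB;

               /* Real time at which the sampled GPU tick began. */
               int64_t gpu_begin = t_gpu - (t_gpu + gpu_len - phase) % gpu_len;

               int64_t dev = gpu_begin - start * SUB;
               if (dev < 0)
                  dev = -dev;

               int64_t interval = end - start + 1;
               int64_t bound = (interval > GPU_PERIOD ? interval : GPU_PERIOD) * SUB;

               if (dev > bound)
                  violations++;
            }
         }
      }
   }

   return violations;
}
```

Under these assumptions count_bound_violations() returns 0 for every GPU
phase and every placement of the GPU sample between the two monotonic
samples.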

-- 
-keith




[Mesa-dev] [Bug 107765] [regression] Batman Arkham City crashes with DXVK under wine

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107765

--- Comment #8 from Samuel Pitoiset  ---
Can you paste a backtrace please?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [Mesa-stable] [PATCH] radv: adjust the VGT workaround for prim restart on GFX9

2018-10-16 Thread Marek Olšák
On Thu, Oct 11, 2018 at 4:43 AM Samuel Pitoiset
 wrote:
>
> WD_SWITCH_ON_EOP seems to be the only workaround that fixes
> the GPU hangs with Yakuza and The Evil Within on Vega. I don't
> like it, as it might decrease geometry performance as pointed out
> by Marek, but I don't know how to implement a better one.
>
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_pipeline.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
> index 426b417e172..2256b2c58e9 100644
> --- a/src/amd/vulkan/radv_pipeline.c
> +++ b/src/amd/vulkan/radv_pipeline.c
> @@ -3412,14 +3412,23 @@ radv_compute_ia_multi_vgt_param_helpers(struct 
> radv_pipeline *pipeline,
> }
>
> /* Workaround for a VGT hang when strip primitive types are used with
> -* primitive restart.
> +* primitive restart. This fixes GPU hangs with Yakuza and The Evil
> +* Within, at least. Not sure if we can implement a better workaround.
>  */
> if (pipeline->graphics.prim_restart_enable &&
> (prim == V_008958_DI_PT_LINESTRIP ||
>  prim == V_008958_DI_PT_TRISTRIP ||
>  prim == V_008958_DI_PT_LINESTRIP_ADJ ||
>  prim == V_008958_DI_PT_TRISTRIP_ADJ)) {

Adjacency primitive types should already have wd_switch_on_eop set to true.

The workaround I'm going to use is:

if (!wd_switch_on_eop && key->u.primitive_restart)
    partial_vs_wave = true;

Our docs say we should do this. I don't know why it still hangs with
this workaround.

Marek


[Mesa-dev] [Bug 108355] Civilization VI - Artifacts in mouse cursor

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108355

--- Comment #5 from Hadrien Nilsson  ---
I tried to run the game under a Wayland session instead of X11 and the mouse
cursor is normal.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [Bug 108355] Civilization VI - Artifacts in mouse cursor

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108355

--- Comment #4 from Hadrien Nilsson  ---
Created attachment 142059
  --> https://bugs.freedesktop.org/attachment.cgi?id=142059&action=edit
Xorg log after rebooting and briefly starting the game in windowed mode

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 108355] Civilization VI - Artifacts in mouse cursor

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108355

--- Comment #3 from Hadrien Nilsson  ---
Created attachment 142058
  --> https://bugs.freedesktop.org/attachment.cgi?id=142058&action=edit
dmesg after reboot

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 108353] Request: Control Center for AMD GPU

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108353

--- Comment #3 from Ahmed Elsayed  ---
I hope that will be done one day.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH 0/2] anv: More fixes for dispatch enable forcing

2018-10-16 Thread Jason Ekstrand
Due to some idiosyncrasies of the dispatch logic on gen8+ hardware, we
have to manually force the fragment shader to actually be dispatched when a
shader has side effects or uses discard and color writes are disabled.

Historically, we have done this with 3DSTATE_PS_EXTRA::PixelShaderHasUAV
which enforces thread dispatch and also enables some additional coherency
between shader stages that's demanded by D3D.  Unfortunately, this
occasionally causes hangs on gen8 for unknown reasons.  About four months
ago, we stopped setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV in favor of
using the 3DSTATE_WM::ForceThreadDispatchEnable flag.  This solved the issue
on Broadwell but caused problems on Skylake.  Today, I pushed 0fa9e6d7b304f
which makes us use 3DSTATE_PS_EXTRA::PixelShaderHasUAV on Skylake and above
and the 3DSTATE_WM::ForceThreadDispatchEnable flag on Broadwell only.  This
gets rid of all known hangs but isn't a terribly satisfactory solution.

While searching through my inbox looking for this stuff today, I came
across this e-mail from Francisco:

https://lists.freedesktop.org/archives/mesa-dev/2017-February/144269.html

He suggested in that mail that setting ForceThreadDispatchEnable may not be
safe because it forces WM thread dispatch even for HiZ ops where it is
normally disabled.  So today, I decided to give that a go (patch 1 below)
and it seems to fix the hangs we were seeing in Dota 2 on Skylake when using
3DSTATE_WM::ForceThreadDispatchEnable.  I suspect that this is the more
proper fix so I'd like to revert the fix which splits things across gens
and instead use 3DSTATE_WM::ForceThreadDispatchEnable everywhere and make
sure that it's set to normal for HiZ ops.
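
For illustration only, the condition under which dispatch has to be forced
can be sketched as below. The struct and function names are invented for
this sketch; only the boolean condition itself mirrors what the patches
test.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fragment shader properties the driver
 * consults. */
struct fs_info {
   bool has_side_effects;   /* shader writes memory (images, SSBOs, ...) */
   bool uses_kill;          /* shader uses discard */
};

/* True when the fragment shader must run even though the hardware would
 * otherwise skip it: there is observable work to do, but no color write
 * that would trigger thread dispatch on its own. */
static bool
needs_forced_dispatch(const struct fs_info *fs, bool color_writes_enabled)
{
   return (fs->has_side_effects || fs->uses_kill) && !color_writes_enabled;
}
```

A shader that only discards, drawn with color writes disabled, makes this
return true; the same shader with color writes enabled does not need the
override.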

Cc: Kenneth Graunke 

Jason Ekstrand (2):
  blorp: Emit a dummy 3DSTATE_WM prior to 3DSTATE_WM_HZ_OP
  Revert "anv/skylake: disable ForceThreadDispatchEnable"

 src/intel/blorp/blorp_genX_exec.h |  9 +++
 src/intel/vulkan/genX_pipeline.c  | 42 ++-
 2 files changed, 16 insertions(+), 35 deletions(-)

-- 
2.19.1



[Mesa-dev] [PATCH 1/2] blorp: Emit a dummy 3DSTATE_WM prior to 3DSTATE_WM_HZ_OP

2018-10-16 Thread Jason Ekstrand
Suggested-by: Francisco Jerez 
---
 src/intel/blorp/blorp_genX_exec.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/intel/blorp/blorp_genX_exec.h b/src/intel/blorp/blorp_genX_exec.h
index 50341ab0ecf..30025cf4deb 100644
--- a/src/intel/blorp/blorp_genX_exec.h
+++ b/src/intel/blorp/blorp_genX_exec.h
@@ -1628,6 +1628,15 @@ blorp_emit_gen8_hiz_op(struct blorp_batch *batch,
 */
blorp_emit_3dstate_multisample(batch, params);
 
+   /* According to the SKL PRM formula for WM_INT::ThreadDispatchEnable, the
+* 3DSTATE_WM::ForceThreadDispatchEnable field can force WM thread dispatch
+* even when WM_HZ_OP is active.  However, WM thread dispatch is normally
+* disabled for HiZ ops and it appears that force-enabling it can lead to
+* GPU hangs on at least Skylake.  Since we don't know the current state of
+* the 3DSTATE_WM packet, just emit a dummy one prior to 3DSTATE_WM_HZ_OP.
+*/
+   blorp_emit(batch, GENX(3DSTATE_WM), wm);
+
/* If we can't alter the depth stencil config and multiple layers are
 * involved, the HiZ op will fail. This is because the op requires that a
 * new config is emitted for each additional layer.
-- 
2.19.1



[Mesa-dev] [PATCH 2/2] Revert "anv/skylake: disable ForceThreadDispatchEnable"

2018-10-16 Thread Jason Ekstrand
This reverts commit 0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5.  The real
issue appears to have been that HiZ ops don't like having WM thread
dispatch force-enabled.  The previous commit fixes that problem so we
can go back to using the ForceThreadDispatchEnable bit even on SKL+.
---
 src/intel/vulkan/genX_pipeline.c | 42 ++--
 1 file changed, 7 insertions(+), 35 deletions(-)

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 33f1f7832ac..9595a7133ae 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -1445,12 +1445,12 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
 wm.EarlyDepthStencilControl = EDSC_NORMAL;
  }
 
-#if GEN_GEN == 8
- /* Gen8 and later hardware tries to compute ThreadDispatchEnable for
-  * us but doesn't take into account KillPixels when no depth or
-  * stencil writes are enabled.  In order for occlusion queries to
-  * work correctly with no attachments, we need to force-enable PS
-  * thread dispatch.
+#if GEN_GEN >= 8
+ /* Gen8 hardware tries to compute ThreadDispatchEnable for us but
+  * doesn't take into account KillPixels when no depth or stencil
+  * writes are enabled.  In order for occlusion queries to work
+  * correctly with no attachments, we need to force-enable PS thread
+  * dispatch.
   *
   * The BDW docs are pretty clear that that this bit isn't validated
   * and probably shouldn't be used in production:
@@ -1460,9 +1460,7 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
   *
   * Unfortunately, however, the other mechanism we have for doing this
   * is 3DSTATE_PS_EXTRA::PixelShaderHasUAV which causes hangs on BDW.
-  * Given two bad options, we choose the one which works.  On Skylake
-  * and later, setting ForceThreadDispatchEnable causes GPU hangs so
-  * we use the PixelShaderHasUAV mechanism there.
+  * Given two bad options, we choose the one which works.
   */
  if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
  !has_color_buffer_write_enabled(pipeline, blend))
@@ -1665,32 +1663,6 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
  wm_prog_data->uses_kill;
 
 #if GEN_GEN >= 9
-  /* Gen8 and later hardware tries to compute ThreadDispatchEnable for us
-   * but doesn't take into account KillPixels when no depth or stencil
-   * writes are enabled.  In order for occlusion queries to work correctly
-   * with no attachments, we need to force-enable PS thread dispatch.
-   *
-   * The stricter cross-primitive coherency guarantees that the hardware
-   * gives us with the "Accesses UAV" bit set for at least one shader stage
-   * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are
-   * redundant within the current image, atomic counter and SSBO GL and
-   * Vulkan APIs, which all have very loose ordering and coherency
-   * requirements and generally rely on the application to insert explicit
-   * barriers when a shader invocation is expected to see the memory
-   * writes performed by the invocations of some previous primitive.
-   * Regardless of the value of "UAV coherency required", the "Accesses
-   * UAV" bits will implicitly cause an in most cases useless DC flush
-   * when the lowermost stage with the bit set finishes execution.
-   *
-   * Unfortunately, however, the other mechanism we have for doing this is
-   * 3DSTATE_WM::ForceThreadDispatchEnable which causes GPU hangs on
-   * Skylake and later hardware.  On Broadwell, however, setting this bit
-   * causes GPU hangs so we use ForceThreadDispatchEnable there.
-   */
-  if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
-  !has_color_buffer_write_enabled(pipeline, blend))
- ps.PixelShaderHasUAV = true;
-
   ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
   ps.PixelShaderPullsBary= wm_prog_data->pulls_bary;
 
-- 
2.19.1



Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Jason Ekstrand
On Tue, Oct 16, 2018 at 2:35 PM Keith Packard  wrote:

> Bas Nieuwenhuizen  writes:
>
> >> +   end = radv_clock_gettime(CLOCK_MONOTONIC_RAW);
> >> +
> >> +   uint64_t clock_period = end - begin;
> >> +   uint64_t device_period = DIV_ROUND_UP(1000000, clock_crystal_freq);
> >> +
> >> +   *pMaxDeviation = MAX2(clock_period, device_period);
> >
> > Should this not be a sum? Those deviations can happen independently
> > from each other, so worst case both deviations happen in the same
> > direction which causes the magnitude to be combined.
>
> This use of MAX2 comes right from one of the issues raised during work
> on the extension:
>
>  8) Can the maximum deviation reported ever be zero?
>
>  RESOLVED: Unless the tick of each clock corresponding to the set of
>  time domains coincides and all clocks can literally be sampled
>  simultaneously, there isn’t really a possibility for the maximum
>  deviation to be zero, so by convention the maximum deviation is always
>  at least the maximum of the length of the ticks of the set of time
>  domains calibrated and thus can never be zero.
>
> I can't wrap my brain around this entirely, but I think that this says
> that the deviation reported is supposed to only reflect the fact that we
> aren't sampling the clocks at the same time, and so there may be a
> 'tick' of error for any sampled clock.
>
> If you look at the previous issue in the spec, that actually has the
> pseudo code I used in this implementation for computing maxDeviation
> which doesn't include anything about the time period of the GPU.
>
> Jason suggested using the GPU time period as the minimum value for
> maxDeviation to make sure we never accidentally return zero, as that
> is forbidden by the spec. We might be able to use 1 instead, but it
> won't matter in practice as the time it takes to actually sample all
> of the clocks is far longer than a GPU tick.
>

I think what Bas is getting at is that there are two problems:

 1) We are not sampling at exactly the same time
 2) The two clocks may not tick at exactly the same time.

Even if I can simultaneously sample the CPU and GPU clocks, their
oscillators are not aligned and my sample may be at the beginning of the
CPU tick and the end of the GPU tick.  If I had sampled 75ns earlier, I
could have gotten lower CPU time but the same GPU time (most Intel GPUs
have about an 80ns tick).

If we want to be conservative, I suspect Bas may be right that adding is
the safer thing to do.
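
To make the second problem concrete, here is a small sketch using an
idealized clock that reports the start of the tick containing the real
time. The function names are invented for illustration, and the 1 ns CPU
tick is an assumption.

```c
#include <stdint.h>

/* Value reported by an idealized clock with the given tick length:
 * the start of the tick containing real time t_ns. */
static uint64_t
clock_read_ns(uint64_t t_ns, uint64_t tick_ns)
{
   return t_ns - t_ns % tick_ns;
}

/* Difference between CPU and GPU readings taken at the same instant. */
static uint64_t
simultaneous_skew_ns(uint64_t t_ns, uint64_t cpu_tick_ns, uint64_t gpu_tick_ns)
{
   uint64_t cpu = clock_read_ns(t_ns, cpu_tick_ns);
   uint64_t gpu = clock_read_ns(t_ns, gpu_tick_ns);

   return cpu > gpu ? cpu - gpu : gpu - cpu;
}
```

Sampling both clocks at t = 159 ns with an 80 ns GPU tick gives a CPU
reading of 159 and a GPU reading of 80, a skew of 79 ns even though the
samples were simultaneous. Sampling 75 ns earlier, at t = 84 ns, lowers the
CPU reading to 84 while the GPU reading stays at 80.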

--Jason


Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Keith Packard
Bas Nieuwenhuizen  writes:

>> +   end = radv_clock_gettime(CLOCK_MONOTONIC_RAW);
>> +
>> +   uint64_t clock_period = end - begin;
>> +   uint64_t device_period = DIV_ROUND_UP(1000000, clock_crystal_freq);
>> +
>> +   *pMaxDeviation = MAX2(clock_period, device_period);
>
> Should this not be a sum? Those deviations can happen independently
> from each other, so worst case both deviations happen in the same
> direction which causes the magnitude to be combined.

This use of MAX2 comes right from one of the issues raised during work
on the extension:

 8) Can the maximum deviation reported ever be zero?

 RESOLVED: Unless the tick of each clock corresponding to the set of
 time domains coincides and all clocks can literally be sampled
 simultaneously, there isn’t really a possibility for the maximum
 deviation to be zero, so by convention the maximum deviation is always
 at least the maximum of the length of the ticks of the set of time
 domains calibrated and thus can never be zero.

I can't wrap my brain around this entirely, but I think that this says
that the deviation reported is supposed to only reflect the fact that we
aren't sampling the clocks at the same time, and so there may be a
'tick' of error for any sampled clock.

If you look at the previous issue in the spec, that actually has the
pseudo code I used in this implementation for computing maxDeviation
which doesn't include anything about the time period of the GPU.

Jason suggested using the GPU time period as the minimum value for
maxDeviation to make sure we never accidentally return zero, as that
is forbidden by the spec. We might be able to use 1 instead, but it
won't matter in practice as the time it takes to actually sample all
of the clocks is far longer than a GPU tick.
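
A minimal standalone sketch of that computation follows. It assumes the
crystal frequency is expressed in kHz, so that 1000000 divided by it yields
a tick length in nanoseconds; the helper names are hypothetical and the
timestamps are passed in rather than read from a real clock.

```c
#include <stdint.h>

/* Round-up integer division, in the spirit of the DIV_ROUND_UP macro. */
static uint64_t
div_round_up(uint64_t n, uint64_t d)
{
   return (n + d - 1) / d;
}

/* Sketch of the quoted hunk: the larger of the CPU-side sampling window
 * and one GPU tick. Assumes clock_crystal_freq_khz is in kHz. */
static uint64_t
calibrated_max_deviation_ns(uint64_t begin_ns, uint64_t end_ns,
                            uint64_t clock_crystal_freq_khz)
{
   uint64_t clock_period = end_ns - begin_ns;
   uint64_t device_period = div_round_up(1000000, clock_crystal_freq_khz);

   return clock_period > device_period ? clock_period : device_period;
}
```

With a 27 MHz crystal (27000 kHz) the GPU tick rounds up to 38 ns, so even
if begin and end came back identical the result would be 38 rather than
zero; a 3000 ns sampling window simply yields 3000.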

-- 
-keith




[Mesa-dev] [Bug 107765] [regression] Batman Arkham City crashes with DXVK under wine

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107765

farmboy0+freedesk...@googlemail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #7 from farmboy0+freedesk...@googlemail.com ---
I am still getting the same crash with mesa-git
0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH] intel/compiler/icl: Use invocation id bits 22:16 instead of 23:17

2018-10-16 Thread Anuj Phogat
On Tue, Oct 16, 2018 at 4:21 AM Topi Pohjolainen
 wrote:
>
> Identifier bits in the dispatch header have changed. See Bspec:
>
> SINGLE_PATCH Payload:
>
> 3D Pipeline Stages - 3D Pipeline Geometry -
> Hull Shader (HS) Stage IVB+ - Payloads IVB+
>
> Fixes: 
> KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls
>
> CC: Anuj Phogat 
> CC: Mark Janes 
> Signed-off-by: Topi Pohjolainen 
> ---
>  src/intel/compiler/brw_fs.cpp | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 23a25fedca5..757147b01ec 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -6593,14 +6593,18 @@ fs_visitor::run_tcs_single_patch()
> if (tcs_prog_data->instances == 1) {
>invocation_id = channels_ud;
> } else {
> +  const unsigned invocation_id_mask = devinfo->gen >= 11 ?
> + INTEL_MASK(22, 16) : INTEL_MASK(23, 17);
> +  const unsigned invocation_id_shift = devinfo->gen >= 11 ? 16 : 17;
> +
>invocation_id = bld.vgrf(BRW_REGISTER_TYPE_UD);
>
>/* Get instance number from g0.2 bits 23:17, and multiply it by 8. */
>fs_reg t = bld.vgrf(BRW_REGISTER_TYPE_UD);
>fs_reg instance_times_8 = bld.vgrf(BRW_REGISTER_TYPE_UD);
>bld.AND(t, fs_reg(retype(brw_vec1_grf(0, 2), BRW_REGISTER_TYPE_UD)),
> -  brw_imm_ud(INTEL_MASK(23, 17)));
> -  bld.SHR(instance_times_8, t, brw_imm_ud(17 - 3));
> +  brw_imm_ud(invocation_id_mask));
> +  bld.SHR(instance_times_8, t, brw_imm_ud(invocation_id_shift - 3));
>
>bld.ADD(invocation_id, instance_times_8, channels_ud);
> }
> --
> 2.17.1
>

Reviewed-by: Anuj Phogat 


Re: [Mesa-dev] [PATCH v2] anv/skylake: disable ForceThreadDispatchEnable

2018-10-16 Thread Jason Ekstrand
I've updated the comments a bit and pushed to master.  Thanks for all your
debugging!

On Wed, Sep 19, 2018 at 11:21 AM Sergii Romantsov <
sergii.romant...@gmail.com> wrote:

> On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang.
>
> -v2: enabling of  ForceThreadDispatchEnable is only for gen8, for
>  gen9 and higher reverted enabling of PixelShaderHasUAV.
>
> CC: Jason Ekstrand 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
> Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
> Signed-off-by: Sergii Romantsov 
> ---
>  src/intel/vulkan/genX_pipeline.c | 33 -
>  1 file changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/vulkan/genX_pipeline.c
> b/src/intel/vulkan/genX_pipeline.c
> index 9595a71..b469270 100644
> --- a/src/intel/vulkan/genX_pipeline.c
> +++ b/src/intel/vulkan/genX_pipeline.c
> @@ -1445,7 +1445,7 @@ emit_3dstate_wm(struct anv_pipeline *pipeline,
> struct anv_subpass *subpass,
>  wm.EarlyDepthStencilControl = EDSC_NORMAL;
>   }
>
> -#if GEN_GEN >= 8
> +#if GEN_GEN == 8
>   /* Gen8 hardware tries to compute ThreadDispatchEnable for us but
>* doesn't take into account KillPixels when no depth or stencil
>* writes are enabled.  In order for occlusion queries to work
> @@ -1663,6 +1663,37 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
>   wm_prog_data->uses_kill;
>
>  #if GEN_GEN >= 9
> +  /* The stricter cross-primitive coherency guarantees that the
> hardware
> +   * gives us with the "Accesses UAV" bit set for at least one shader
> stage
> +   * and the "UAV coherency required" bit set on the 3DPRIMITIVE
> command are
> +   * redundant within the current image, atomic counter and SSBO GL
> APIs,
> +   * which all have very loose ordering and coherency requirements and
> +   * generally rely on the application to insert explicit barriers
> when a
> +   * shader invocation is expected to see the memory writes performed
> by the
> +   * invocations of some previous primitive.  Regardless of the value
> of
> +   * "UAV coherency required", the "Accesses UAV" bits will
> implicitly cause
> +   * an in most cases useless DC flush when the lowermost stage with
> the bit
> +   * set finishes execution.
> +   *
> +   * It would be nice to disable it, but in some cases we can't
> because on
> +   * Gen8+ it also has an influence on rasterization via the PS
> UAV-only
> +   * signal (which could be set independently from the coherency
> mechanism
> +   * in the 3DSTATE_WM command on Gen7), and because in some cases it
> will
> +   * determine whether the hardware skips execution of the fragment
> shader
> +   * or not via the ThreadDispatchEnable signal.  However if we know
> that
> +   * GEN8_PS_BLEND_HAS_WRITEABLE_RT is going to be set and
> +   * GEN8_PSX_PIXEL_SHADER_NO_RT_WRITE is not set it shouldn't make
> any
> +   * difference so we may just disable it here.
> +   *
> +   * Gen8 hardware tries to compute ThreadDispatchEnable for us but
> doesn't
> +   * take into account KillPixels when no depth or stencil writes are
> +   * enabled. In order for occlusion queries to work correctly with no
> +   * attachments, we need to force-enable here.
> +   */
> +  if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
> +  !has_color_buffer_write_enabled(pipeline, blend))
> + ps.PixelShaderHasUAV = true;
> +
>ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
>ps.PixelShaderPullsBary= wm_prog_data->pulls_bary;
>
> --
> 2.7.4
>


Re: [Mesa-dev] [PATCH] swr/rast: fix intrinsic/function for LLVM 7 compatibility

2018-10-16 Thread Chuck Atkins
Tested-by: Chuck Atkins 

On Tue, Oct 16, 2018 at 8:51 AM Cherniak, Bruce 
wrote:

> Reviewed-by: Bruce Cherniak 
>
> > On Oct 15, 2018, at 9:53 AM, Alok Hota  wrote:
> >
> > Converted from x86 VFMADDPS intrinsic to generic LLVM intrinsic, and
> > removed createInstructionSimplifierPass, which were both removed in LLVM
> > 7.0.0
> >
> > These changes combine patches we received from the community and our own
> > internal patches
> > ---
> > .../swr/rasterizer/codegen/gen_llvm_ir_macros.py  |  2 +-
> > .../drivers/swr/rasterizer/jitter/blend_jit.cpp   |  1 -
> > .../drivers/swr/rasterizer/jitter/builder_misc.cpp| 11 ++-
> > .../drivers/swr/rasterizer/jitter/fetch_jit.cpp   |  1 -
> > .../rasterizer/jitter/functionpasses/lower_x86.cpp|  1 -
> > .../drivers/swr/rasterizer/jitter/streamout_jit.cpp   |  1 -
> > 6 files changed, 3 insertions(+), 14 deletions(-)
> >
> > diff --git
> a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
> b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
> > index 2e7f1a88a0..d34e88d1bc 100644
> > --- a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
> > +++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
> > @@ -57,7 +57,6 @@ intrinsics = [
> > ['VHSUBPS', ['a', 'b'], 'a'],
> > ['VPTESTC', ['a', 'b'], 'mInt32Ty'],
> > ['VPTESTZ', ['a', 'b'], 'mInt32Ty'],
> > -['VFMADDPS',['a', 'b', 'c'], 'a'],
> > ['VPHADDD', ['a', 'b'], 'a'],
> > ['PDEP32',  ['a', 'b'], 'a'],
> > ['RDTSC',   [], 'mInt64Ty'],
> > @@ -71,6 +70,7 @@ llvm_intrinsics = [
> > ['STACKRESTORE', 'stackrestore', ['a'], []],
> > ['VMINPS', 'minnum', ['a', 'b'], ['a']],
> > ['VMAXPS', 'maxnum', ['a', 'b'], ['a']],
> > +['VFMADDPS', 'fmuladd', ['a', 'b', 'c'], ['a']],
> > ['DEBUGTRAP', 'debugtrap', [], []],
> > ['POPCNT', 'ctpop', ['a'], ['a']],
> > ['LOG2', 'log2', ['a'], ['a']],
> > diff --git a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
> b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
> > index f89c502db7..d5328c8e4e 100644
> > --- a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
> > +++ b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
> > @@ -870,7 +870,6 @@ struct BlendJit : public Builder
> > passes.add(createCFGSimplificationPass());
> > passes.add(createEarlyCSEPass());
> > passes.add(createInstructionCombiningPass());
> > -passes.add(createInstructionSimplifierPass());
> > passes.add(createConstantPropagationPass());
> > passes.add(createSCCPPass());
> > passes.add(createAggressiveDCEPass());
> > diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
> b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
> > index 4116dad443..26d8688f5e 100644
> > --- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
> > +++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
> > @@ -755,15 +755,8 @@ namespace SwrJit
> > Value* Builder::FMADDPS(Value* a, Value* b, Value* c)
> > {
> > Value* vOut;
> > -// use FMADs if available
> > -if (JM()->mArch.AVX2())
> > -{
> > -vOut = VFMADDPS(a, b, c);
> > -}
> > -else
> > -{
> > -vOut = FADD(FMUL(a, b), c);
> > -}
> > +// This maps to LLVM fmuladd intrinsic
> > +vOut = VFMADDPS(a, b, c);
> > return vOut;
> > }
> >
> > diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> > index b4d326ebdc..3ad0fabe81 100644
> > --- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> > +++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> > @@ -294,7 +294,6 @@ Function* FetchJit::Create(const
> FETCH_COMPILE_STATE& fetchState)
> > optPasses.add(createCFGSimplificationPass());
> > optPasses.add(createEarlyCSEPass());
> > optPasses.add(createInstructionCombiningPass());
> > -optPasses.add(createInstructionSimplifierPass());
> > optPasses.add(createConstantPropagationPass());
> > optPasses.add(createSCCPPass());
> > optPasses.add(createAggressiveDCEPass());
> > diff --git
> a/src/gallium/drivers/swr/rasterizer/jitter/functionpasses/lower_x86.cpp
> b/src/gallium/drivers/swr/rasterizer/jitter/functionpasses/lower_x86.cpp
> > index 7605823c04..c34959d35e 100644
> > ---
> a/src/gallium/drivers/swr/rasterizer/jitter/functionpasses/lower_x86.cpp
> > +++
> b/src/gallium/drivers/swr/rasterizer/jitter/functionpasses/lower_x86.cpp
> > @@ -76,7 +76,6 @@ namespace SwrJit
> > {"meta.intrinsic.VCVTPS2PH", Intrinsic::x86_vcvtps2ph_256},
> > {"meta.intrinsic.VPTESTC", Intrinsic::x86_avx_ptestc_256},
> > {"meta.intrinsic.VPTESTZ", Intrinsic::x86_avx_ptestz_256},
> > -{"meta.intrinsic.VFMADDPS", Intrinsic::x86_fma_vfmadd_ps_256},
> >   

Re: [Mesa-dev] [PATCH] softpipe: dynamically allocate space for immediate constants

2018-10-16 Thread Roland Scheidegger
Looks reasonable to me.
Reviewed-by: Roland Scheidegger 


Am 16.10.18 um 10:07 schrieb Gert Wollny:
> From: Gert Wollny 
> 
> The number of immediate constants was fixed, and the size check was
> done only by means of an assertion. As a result, a shader that emits
> more immediate constants would cause memory corruption when Mesa is
> built in release mode.
> 
> Instead of using this fixed limit, allocate the space dynamically, let it
> grow as needed, and also remove the unused ImmArray.
> 
> Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1
> 
> Signed-off-by: Gert Wollny 
> ---
>  src/gallium/auxiliary/tgsi/tgsi_exec.c | 13 -
>  src/gallium/auxiliary/tgsi/tgsi_exec.h |  7 +++
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
> b/src/gallium/auxiliary/tgsi/tgsi_exec.c
> index 59194ebe31..5db515a075 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
> @@ -1223,7 +1223,17 @@ tgsi_exec_machine_bind_shader(
>   {
>  uint size = parse.FullToken.FullImmediate.Immediate.NrTokens - 1;
>  assert( size <= 4 );
> -assert( mach->ImmLimit + 1 <= TGSI_EXEC_NUM_IMMEDIATES );
> +if (mach->ImmLimit >= mach->ImmsReserved) {
> +   unsigned newReserved = mach->ImmsReserved ? 2 * 
> mach->ImmsReserved : 128;
> +   float4 *imms = REALLOC(mach->Imms, mach->ImmsReserved, 
> newReserved * sizeof(float4));
> +   if (imms) {
> +  mach->ImmsReserved = newReserved;
> +  mach->Imms = imms;
> +   } else {
> +  debug_printf("Unable to (re)allocate space for immidiate 
> constants\n");
> +  break;
> +   }
> +}
>  
>  for( i = 0; i < size; i++ ) {
> mach->Imms[mach->ImmLimit][i] = 
> @@ -1337,6 +1347,7 @@ tgsi_exec_machine_destroy(struct tgsi_exec_machine 
> *mach)
> if (mach) {
>FREE(mach->Instructions);
>FREE(mach->Declarations);
> +  FREE(mach->Imms);
>  
>align_free(mach->Inputs);
>align_free(mach->Outputs);
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h 
> b/src/gallium/auxiliary/tgsi/tgsi_exec.h
> index ed8b9e8869..6d4ac38142 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
> @@ -231,7 +231,6 @@ struct tgsi_sampler
>  };
>  
>  #define TGSI_EXEC_NUM_TEMPS   4096
> -#define TGSI_EXEC_NUM_IMMEDIATES  256
>  
>  /*
>   * Locations of various utility registers (_I = Index, _C = Channel)
> @@ -341,6 +340,7 @@ enum tgsi_break_type {
>  
>  #define TGSI_EXEC_MAX_BREAK_STACK (TGSI_EXEC_MAX_LOOP_NESTING + 
> TGSI_EXEC_MAX_SWITCH_NESTING)
>  
> +typedef float float4[4];
>  
>  /**
>   * Run-time virtual machine state for executing TGSI shader.
> @@ -352,9 +352,8 @@ struct tgsi_exec_machine
> struct tgsi_exec_vector   Temps[TGSI_EXEC_NUM_TEMPS +
> TGSI_EXEC_NUM_TEMP_EXTRAS];
>  
> -   float Imms[TGSI_EXEC_NUM_IMMEDIATES][4];
> -
> -   float ImmArray[TGSI_EXEC_NUM_IMMEDIATES][4];
> +   unsigned   ImmsReserved;
> +   float4 *Imms;
>  
> struct tgsi_exec_vector   *Inputs;
> struct tgsi_exec_vector   *Outputs;
> 



[Mesa-dev] [Bug 108457] [OpenGL CTS] KHR-GL46.tessellation_shader.single.xfb_captures_data_from_correct_stage fails

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108457

            Bug ID: 108457
           Summary: [OpenGL CTS] KHR-GL46.tessellation_shader.single.xfb_captures_data_from_correct_stage fails
           Product: Mesa
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: glsl-compiler
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: jmcasan...@igalia.com
        QA Contact: intel-3d-b...@lists.freedesktop.org

Since commit f75a5cccd800bb011b44c799415efdfe547d6076 landed in OpenGL CTS
4.6.0, KHR-GL46.tessellation_shader.single.xfb_captures_data_from_correct_stage
has been failing.

commit f75a5cccd800bb011b44c799415efdfe547d6076
Author: Piers Daniell 
Date:   Tue Oct 2 11:51:25 2018 -0600

Use non-arrayed varying name for TCS blocks

This is a partial revert of CL 2625 to restore naming the
value member of the BLOCK_INOUT interface block as
"BLOCK_INOUT.value" rather than "BLOCK_INOUT[0].value".

Affects:

KHR-GL46.tessellation_shader.single.xfb_captures_data_from_correct_stage

Components: OpenGL

VK-GL-CTS issue: 1388

Change-Id: I9ef6453ec5465a0fa5561220cc9d7bfe54298416

Mesa populates the transform feedback candidates with the following names
because the TCS interface block is arrayed:

BLOCK_INOUT[0].value
BLOCK_INOUT[1].value
BLOCK_INOUT[2].value
BLOCK_INOUT[3].value

Now that the test uses BLOCK_INOUT.value as the XFB name, the variable
isn't found and a link error is raised.

More details about the motivation for the test change are available on the
Khronos GitLab.

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [Bug 108382] st_framebuffer might leak on certain circumstances

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108382

--- Comment #3 from Emil Velikov  ---
From a quick skim of the report, it sounds similar to the issue described
here [1]. Feel free to try the patch.

If it doesn't help, please submit any patches as described here [2].


[1] https://patchwork.freedesktop.org/patch/218289/
[2] https://www.mesa3d.org/submittingpatches.html

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH] anv: Implement VK_EXT_pci_bus_info

2018-10-16 Thread Emil Velikov
On Sun, 14 Oct 2018 at 13:56, Jason Ekstrand  wrote:
>
> Here I was reveling in the triviality of my fixed-pci-path implementation
> and you had to show me up by implementing it properly. :-P
>
> Implementing it properly is a better plan because we know discrete is coming.
>
You're welcome, that's among the reasons why I introduced drmDevice.
Even though hearing libdrm does make us uneasy at times.

Fwiw:
Reviewed-by: Emil Velikov 

-Emil


Re: [Mesa-dev] [PATCH 2/3] anv: Stop generating weak references for instance entrypoints

2018-10-16 Thread Jason Ekstrand
FYI, patch 1 is required for this patch to build.  It also means this patch 
found a nice little bug.  I'll respond to patch 1 in more detail after the 
SI call tomorrow.


--Jason


On October 16, 2018 06:49:35 Lionel Landwerlin 
 wrote:



Reviewed-by: Lionel Landwerlin 

On 15/10/2018 04:47, Jason Ekstrand wrote:

We don't need weak references to instance entrypoints because we never
have more than one of each so we don't need the NULL fall-back.  This
also helps us avoid forgetting things because we now get link errors for
missing instance entrypoints.
---
  src/intel/vulkan/anv_entrypoints_gen.py | 13 -
  1 file changed, 13 deletions(-)

diff --git a/src/intel/vulkan/anv_entrypoints_gen.py 
b/src/intel/vulkan/anv_entrypoints_gen.py

index beb658b8660..25a532fd706 100644
--- a/src/intel/vulkan/anv_entrypoints_gen.py
+++ b/src/intel/vulkan/anv_entrypoints_gen.py
@@ -227,19 +227,6 @@ ${strmap(device_strmap, 'device')}
   * either pick the correct entry point.
   */

-% for e in instance_entrypoints:
-  % if e.alias:
-<% continue %>
-  % endif
-  % if e.guard is not None:
-#ifdef ${e.guard}
-  % endif
-  ${e.return_type} ${e.prefixed_name('anv')}(${e.decl_params()}) 
__attribute__ ((weak));

-  % if e.guard is not None:
-#endif // ${e.guard}
-  % endif
-% endfor
-
  const struct anv_instance_dispatch_table anv_instance_dispatch_table = {
  % for e in instance_entrypoints:
% if e.guard is not None:





Re: [Mesa-dev] [PATCH] swr/rast: fix intrinsic/function for LLVM 7 compatibility

2018-10-16 Thread Cherniak, Bruce
Reviewed-by: Bruce Cherniak 

> On Oct 15, 2018, at 9:53 AM, Alok Hota  wrote:
> 
> Converted from x86 VFMADDPS intrinsic to generic LLVM intrinsic, and
> removed createInstructionSimplifierPass, which were both removed in LLVM
> 7.0.0
> 
> These changes combine patches we received from the community and our own
> internal patches
> ---
> .../swr/rasterizer/codegen/gen_llvm_ir_macros.py  |  2 +-
> .../drivers/swr/rasterizer/jitter/blend_jit.cpp   |  1 -
> .../drivers/swr/rasterizer/jitter/builder_misc.cpp| 11 ++-
> .../drivers/swr/rasterizer/jitter/fetch_jit.cpp   |  1 -
> .../rasterizer/jitter/functionpasses/lower_x86.cpp|  1 -
> .../drivers/swr/rasterizer/jitter/streamout_jit.cpp   |  1 -
> 6 files changed, 3 insertions(+), 14 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py 
> b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
> index 2e7f1a88a0..d34e88d1bc 100644
> --- a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
> +++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
> @@ -57,7 +57,6 @@ intrinsics = [
> ['VHSUBPS', ['a', 'b'], 'a'],
> ['VPTESTC', ['a', 'b'], 'mInt32Ty'],
> ['VPTESTZ', ['a', 'b'], 'mInt32Ty'],
> -['VFMADDPS',['a', 'b', 'c'], 'a'],
> ['VPHADDD', ['a', 'b'], 'a'],
> ['PDEP32',  ['a', 'b'], 'a'],
> ['RDTSC',   [], 'mInt64Ty'],
> @@ -71,6 +70,7 @@ llvm_intrinsics = [
> ['STACKRESTORE', 'stackrestore', ['a'], []],
> ['VMINPS', 'minnum', ['a', 'b'], ['a']],
> ['VMAXPS', 'maxnum', ['a', 'b'], ['a']],
> +['VFMADDPS', 'fmuladd', ['a', 'b', 'c'], ['a']],
> ['DEBUGTRAP', 'debugtrap', [], []],
> ['POPCNT', 'ctpop', ['a'], ['a']],
> ['LOG2', 'log2', ['a'], ['a']],
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp 
> b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
> index f89c502db7..d5328c8e4e 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
> @@ -870,7 +870,6 @@ struct BlendJit : public Builder
> passes.add(createCFGSimplificationPass());
> passes.add(createEarlyCSEPass());
> passes.add(createInstructionCombiningPass());
> -passes.add(createInstructionSimplifierPass());
> passes.add(createConstantPropagationPass());
> passes.add(createSCCPPass());
> passes.add(createAggressiveDCEPass());
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp 
> b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
> index 4116dad443..26d8688f5e 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
> @@ -755,15 +755,8 @@ namespace SwrJit
> Value* Builder::FMADDPS(Value* a, Value* b, Value* c)
> {
> Value* vOut;
> -// use FMADs if available
> -if (JM()->mArch.AVX2())
> -{
> -vOut = VFMADDPS(a, b, c);
> -}
> -else
> -{
> -vOut = FADD(FMUL(a, b), c);
> -}
> +// This maps to LLVM fmuladd intrinsic
> +vOut = VFMADDPS(a, b, c);
> return vOut;
> }
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp 
> b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> index b4d326ebdc..3ad0fabe81 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> @@ -294,7 +294,6 @@ Function* FetchJit::Create(const FETCH_COMPILE_STATE& 
> fetchState)
> optPasses.add(createCFGSimplificationPass());
> optPasses.add(createEarlyCSEPass());
> optPasses.add(createInstructionCombiningPass());
> -optPasses.add(createInstructionSimplifierPass());
> optPasses.add(createConstantPropagationPass());
> optPasses.add(createSCCPPass());
> optPasses.add(createAggressiveDCEPass());
> diff --git 
> a/src/gallium/drivers/swr/rasterizer/jitter/functionpasses/lower_x86.cpp 
> b/src/gallium/drivers/swr/rasterizer/jitter/functionpasses/lower_x86.cpp
> index 7605823c04..c34959d35e 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/functionpasses/lower_x86.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/functionpasses/lower_x86.cpp
> @@ -76,7 +76,6 @@ namespace SwrJit
> {"meta.intrinsic.VCVTPS2PH", Intrinsic::x86_vcvtps2ph_256},
> {"meta.intrinsic.VPTESTC", Intrinsic::x86_avx_ptestc_256},
> {"meta.intrinsic.VPTESTZ", Intrinsic::x86_avx_ptestz_256},
> -{"meta.intrinsic.VFMADDPS", Intrinsic::x86_fma_vfmadd_ps_256},
> {"meta.intrinsic.VPHADDD", Intrinsic::x86_avx2_phadd_d},
> {"meta.intrinsic.PDEP32", Intrinsic::x86_bmi_pdep_32},
> {"meta.intrinsic.RDTSC", Intrinsic::x86_rdtsc},
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/streamout_jit.cpp 
> 

Re: [Mesa-dev] [PATCH 2/3] anv: Stop generating weak references for instance entrypoints

2018-10-16 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 15/10/2018 04:47, Jason Ekstrand wrote:

We don't need weak references to instance entrypoints because we never
have more than one of each so we don't need the NULL fall-back.  This
also helps us avoid forgetting things because we now get link errors for
missing instance entrypoints.
---
  src/intel/vulkan/anv_entrypoints_gen.py | 13 -
  1 file changed, 13 deletions(-)

diff --git a/src/intel/vulkan/anv_entrypoints_gen.py 
b/src/intel/vulkan/anv_entrypoints_gen.py
index beb658b8660..25a532fd706 100644
--- a/src/intel/vulkan/anv_entrypoints_gen.py
+++ b/src/intel/vulkan/anv_entrypoints_gen.py
@@ -227,19 +227,6 @@ ${strmap(device_strmap, 'device')}
   * either pick the correct entry point.
   */
  
-% for e in instance_entrypoints:

-  % if e.alias:
-<% continue %>
-  % endif
-  % if e.guard is not None:
-#ifdef ${e.guard}
-  % endif
-  ${e.return_type} ${e.prefixed_name('anv')}(${e.decl_params()}) __attribute__ 
((weak));
-  % if e.guard is not None:
-#endif // ${e.guard}
-  % endif
-% endfor
-
  const struct anv_instance_dispatch_table anv_instance_dispatch_table = {
  % for e in instance_entrypoints:
% if e.guard is not None:





[Mesa-dev] [PATCH] intel/compiler/icl: Use invocation id bits 22:16 instead of 23:17

2018-10-16 Thread Topi Pohjolainen
Identifier bits in the dispatch header have changed. See Bspec:

SINGLE_PATCH Payload:

3D Pipeline Stages - 3D Pipeline Geometry -
Hull Shader (HS) Stage IVB+ - Payloads IVB+

Fixes: 
KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls

CC: Anuj Phogat 
CC: Mark Janes 
Signed-off-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_fs.cpp | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 23a25fedca5..757147b01ec 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -6593,14 +6593,18 @@ fs_visitor::run_tcs_single_patch()
if (tcs_prog_data->instances == 1) {
   invocation_id = channels_ud;
} else {
+  const unsigned invocation_id_mask = devinfo->gen >= 11 ?
+ INTEL_MASK(22, 16) : INTEL_MASK(23, 17);
+  const unsigned invocation_id_shift = devinfo->gen >= 11 ? 16 : 17;
+
   invocation_id = bld.vgrf(BRW_REGISTER_TYPE_UD);
 
   /* Get instance number from g0.2 bits 23:17, and multiply it by 8. */
   fs_reg t = bld.vgrf(BRW_REGISTER_TYPE_UD);
   fs_reg instance_times_8 = bld.vgrf(BRW_REGISTER_TYPE_UD);
   bld.AND(t, fs_reg(retype(brw_vec1_grf(0, 2), BRW_REGISTER_TYPE_UD)),
-  brw_imm_ud(INTEL_MASK(23, 17)));
-  bld.SHR(instance_times_8, t, brw_imm_ud(17 - 3));
+  brw_imm_ud(invocation_id_mask));
+  bld.SHR(instance_times_8, t, brw_imm_ud(invocation_id_shift - 3));
 
   bld.ADD(invocation_id, instance_times_8, channels_ud);
}
-- 
2.17.1
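
For readers who want to see the arithmetic outside the builder calls, here
is a rough standalone sketch of the same mask-and-shift. FIELD_MASK and
instance_times_8() are hypothetical names used only for this illustration
(FIELD_MASK is assumed to behave like INTEL_MASK(high, low)); this is not
the driver code itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for INTEL_MASK(high, low): bits low..high set. */
#define FIELD_MASK(high, low) \
    ((uint32_t)(((1ull << ((high) - (low) + 1)) - 1) << (low)))

/* Extract the instance number from the g0.2 dword and multiply it by 8.
 * The field sits at bits 22:16 on gen11+ and at bits 23:17 before that. */
static uint32_t instance_times_8(uint32_t g0_2, int gen)
{
    const uint32_t mask  = gen >= 11 ? FIELD_MASK(22, 16) : FIELD_MASK(23, 17);
    const unsigned shift = gen >= 11 ? 16 : 17;

    /* Shifting right by (shift - 3) rather than shift leaves the
     * extracted field already multiplied by 8. */
    return (g0_2 & mask) >> (shift - 3);
}
```

Decoding a gen11 header with the old 23:17 layout drops the low bit of the
field and scales the rest wrongly, so instance 5 comes out as 16 rather
than 40.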



[Mesa-dev] [Bug 108355] Civilization VI - Artifacts in mouse cursor

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108355

--- Comment #2 from Michel Dänzer  ---
Please attach the corresponding Xorg log file and output of dmesg.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 108105] [DXVK] Dauntless Helmets rendering incorrectly on Vega, works in AMDVLK

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108105

--- Comment #12 from Samuel Pitoiset  ---
Well, I can launch the game, but I get a black screen after clicking on the
"Play" button. This is apparently a known issue with the tutorial. The only way
to solve it is to play the tutorial on a different machine [1] ...

If you want me to fix the rendering issue on Vega, please record an apitrace or
a renderdoc capture. If you need help, please ask! Thanks!

[1] https://www.youtube.com/watch?v=yQjbfXlU9So

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH] meson: Add -Werror=return-type when supported.

2018-10-16 Thread Eric Engestrom
On Monday, 2018-10-15 18:53:40 -0700, Kenneth Graunke wrote:
> This warning detects non-void functions with a missing return statement,
> return statements with a value in void functions, and functions with a
> bogus return type that ends up defaulting to int.  It's already enabled
> by default with -Wall.  Generally, these are fairly serious bugs in the
> code, which developers would like to notice and fix immediately.  This
> patch promotes it from a warning to an error, to help developers catch
> such mistakes early.

Agreed.

> 
> I would not expect this warning to change much based on the compiler
> version, so hopefully it won't become a problem for packagers/builders.

That's always my worry when adding `-Werror`s, but I think you're right.

> 
> See the GCC documentation or 'man gcc' for more details:
> https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc/Warning-Options.html#index-Wreturn-type
> ---
>  meson.build | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/meson.build b/meson.build
> index 002ce35a608..11e0ea2c08e 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -788,7 +788,8 @@ endif
>  # Check for generic C arguments
>  c_args = []
>  foreach a : ['-Wall', '-Werror=implicit-function-declaration',
> - '-Werror=missing-prototypes', '-fno-math-errno',
> + '-Werror=missing-prototypes', '-Werror=return-type',
> + '-fno-math-errno',

I think we should add this to C++ code as well (~20 lines below).

With C++ getting the same treatment:
Reviewed-by: Eric Engestrom 

(btw, we might want to add the same to configure.ac, but I'm not sure
(m)any devs still use it, I think it's mostly just old deployment/testing
systems, and packagers for some of the slow distros, so not the people
who would introduce these issues in the first place)

>   '-fno-trapping-math', '-Qunused-arguments']
>if cc.has_argument(a)
>  c_args += a
> -- 
> 2.19.0
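
As a small illustration of what the flag is meant to catch, here is a
hypothetical function (not from Mesa) in the shape that usually trips
-Wreturn-type. The version shown is the fixed one; the comment marks the
line whose absence triggers the diagnostic.

```c
enum channel { CHAN_X, CHAN_Y };

/* Map a channel to its index.  If the final return were missing, control
 * could fall off the end of a non-void function for values outside the
 * enum, which is what -Wreturn-type reports. */
static int channel_index(enum channel c)
{
    switch (c) {
    case CHAN_X:
        return 0;
    case CHAN_Y:
        return 1;
    }
    return -1; /* removing this line is the bug the flag turns into an error */
}
```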
> 


[Mesa-dev] [PATCH] softpipe: dynamically allocate space for immediate constants

2018-10-16 Thread Gert Wollny
From: Gert Wollny 

The number of immediate constants was fixed, and the size check was
done only by means of an assertion. As a result, a shader that emits
more immediate constants would cause memory corruption when Mesa is
built in release mode.

Instead of using this fixed limit, allocate the space dynamically, let it
grow as needed, and also remove the unused ImmArray.

Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1

Signed-off-by: Gert Wollny 
---
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 13 -
 src/gallium/auxiliary/tgsi/tgsi_exec.h |  7 +++
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 59194ebe31..5db515a075 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -1223,7 +1223,17 @@ tgsi_exec_machine_bind_shader(
  {
 uint size = parse.FullToken.FullImmediate.Immediate.NrTokens - 1;
 assert( size <= 4 );
-assert( mach->ImmLimit + 1 <= TGSI_EXEC_NUM_IMMEDIATES );
+if (mach->ImmLimit >= mach->ImmsReserved) {
+   unsigned newReserved = mach->ImmsReserved ? 2 * 
mach->ImmsReserved : 128;
+   float4 *imms = REALLOC(mach->Imms, mach->ImmsReserved, 
newReserved * sizeof(float4));
+   if (imms) {
+  mach->ImmsReserved = newReserved;
+  mach->Imms = imms;
+   } else {
+  debug_printf("Unable to (re)allocate space for immidiate 
constants\n");
+  break;
+   }
+}
 
 for( i = 0; i < size; i++ ) {
mach->Imms[mach->ImmLimit][i] = 
@@ -1337,6 +1347,7 @@ tgsi_exec_machine_destroy(struct tgsi_exec_machine *mach)
if (mach) {
   FREE(mach->Instructions);
   FREE(mach->Declarations);
+  FREE(mach->Imms);
 
   align_free(mach->Inputs);
   align_free(mach->Outputs);
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h 
b/src/gallium/auxiliary/tgsi/tgsi_exec.h
index ed8b9e8869..6d4ac38142 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
@@ -231,7 +231,6 @@ struct tgsi_sampler
 };
 
 #define TGSI_EXEC_NUM_TEMPS   4096
-#define TGSI_EXEC_NUM_IMMEDIATES  256
 
 /*
  * Locations of various utility registers (_I = Index, _C = Channel)
@@ -341,6 +340,7 @@ enum tgsi_break_type {
 
 #define TGSI_EXEC_MAX_BREAK_STACK (TGSI_EXEC_MAX_LOOP_NESTING + 
TGSI_EXEC_MAX_SWITCH_NESTING)
 
+typedef float float4[4];
 
 /**
  * Run-time virtual machine state for executing TGSI shader.
@@ -352,9 +352,8 @@ struct tgsi_exec_machine
struct tgsi_exec_vector   Temps[TGSI_EXEC_NUM_TEMPS +
TGSI_EXEC_NUM_TEMP_EXTRAS];
 
-   float Imms[TGSI_EXEC_NUM_IMMEDIATES][4];
-
-   float ImmArray[TGSI_EXEC_NUM_IMMEDIATES][4];
+   unsigned   ImmsReserved;
+   float4 *Imms;
 
struct tgsi_exec_vector   *Inputs;
struct tgsi_exec_vector   *Outputs;
-- 
2.18.0
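
The growth policy in the patch above (start at 128 slots, double whenever the array is full, keep the old array if allocation fails) can be sketched in isolation as follows. This is only an illustration: `struct imm_store`, `imm_store_push` and `imm_store_free` are hypothetical names, and plain `realloc`/`free` stand in for the REALLOC/FREE macros the patch uses.

```c
#include <assert.h>
#include <stdlib.h>

typedef float float4[4];

/* Hypothetical stand-in for the immediate-related fields of
 * tgsi_exec_machine (Imms, ImmsReserved, ImmLimit in the patch). */
struct imm_store {
   float4  *imms;      /* heap-allocated array of vec4 immediates */
   unsigned reserved;  /* number of float4 slots currently allocated */
   unsigned count;     /* number of slots in use */
};

/* Append one immediate, doubling the capacity when the array is full.
 * Returns 1 on success, 0 if the allocation failed; on failure the
 * store is left unchanged and still owns its old array. */
static int
imm_store_push(struct imm_store *s, const float value[4])
{
   if (s->count >= s->reserved) {
      unsigned new_reserved = s->reserved ? 2 * s->reserved : 128;
      float4 *grown = realloc(s->imms, new_reserved * sizeof(float4));
      if (!grown)
         return 0;
      s->imms = grown;
      s->reserved = new_reserved;
   }
   for (int i = 0; i < 4; i++)
      s->imms[s->count][i] = value[i];
   s->count++;
   return 1;
}

/* Release the array; mirrors the FREE(mach->Imms) added to destroy. */
static void
imm_store_free(struct imm_store *s)
{
   free(s->imms);
   s->imms = NULL;
   s->reserved = 0;
   s->count = 0;
}
```

Doubling keeps the number of reallocations logarithmic in the number of immediates, which is presumably why the patch prefers it over growing by a fixed step.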

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 108115] [vulkancts] dEQP-VK.subgroups.vote.graphics.subgroupallequal.* fails

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108115

Samuel Pitoiset  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #1 from Samuel Pitoiset  ---
See
https://cgit.freedesktop.org/mesa/mesa/commit/?id=647c2b90e96a9ab8571baf958a7c67c1e816911a

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH mesa] util: use *unsigned* ints for bit operations

2018-10-16 Thread Eric Engestrom
Fixes errors thrown by GCC's Undefined Behaviour sanitizer (ubsan) every
time this macro is used.

Signed-off-by: Eric Engestrom 
---
 src/util/bitset.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/bitset.h b/src/util/bitset.h
index adafc72a5f74d46e118a..3b18abac793a0694c611 100644
--- a/src/util/bitset.h
+++ b/src/util/bitset.h
@@ -54,7 +54,7 @@
 #define BITSET_ONES(x) memset( (x), 0xff, sizeof (x) )
 
 #define BITSET_BITWORD(b) ((b) / BITSET_WORDBITS)
-#define BITSET_BIT(b) (1 << ((b) % BITSET_WORDBITS))
+#define BITSET_BIT(b) (1u << ((b) % BITSET_WORDBITS))
 
 /* single bit operations
  */
-- 
Cheers,
  Eric
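
For context on the ubsan report: with a 32-bit int, `1 << 31` shifts a one into the sign bit of a signed type, which the C standard leaves undefined, whereas `1u << 31` is well defined. A minimal sketch of the unsigned form, assuming 32-bit words (`WORDBITS` and `bit_mask` are illustrative names, not the Mesa macros):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed word size for this sketch; Mesa's BITSET_WORDBITS plays
 * the same role in the real macro. */
#define WORDBITS 32u

/* Mask for bit b within its word. The unsigned literal keeps the
 * shift well defined even when the result is 0x80000000. */
static inline uint32_t
bit_mask(unsigned b)
{
   return 1u << (b % WORDBITS);
}
```

The modulo keeps the shift count below the word size, so the only remaining hazard in the original macro was the signed literal.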



Re: [Mesa-dev] [PATCH 0/5] anv: Add a NIR cache

2018-10-16 Thread Eero Tamminen

Hi,

On 10/13/18 3:08 AM, Jason Ekstrand wrote:

> This patch series adds a simple NIR shader cache that sits right after
> spirv_to_nir and brw_preprocess_nir and before linking.  This should help
> alleviate some of the added overhead of link-time optimization since most
> of the NIR-level optimization is now cached prior to linking.
>
> I have no numbers to back this series up; just intuition.


Fortunately, approximate numbers showing where the bottlenecks are are easy
to come by for shader compilation:

1. Start a use case that does a lot of shader compilation (shader-db?)
2. "sudo perf record -a"
3. ^C when shader compilation stops
4. "sudo perf report"

When I profiled GL shader compilation before Mesa switched to NIR, the
linking phase took 2/3 of the cycles of the whole shader compilation.

(That is why the shader cache needed to cache linking results to have a
significant impact.)



- Eero


Jason Ekstrand (5):
   anv/pipeline: Move wpos and input attachment lowering to lower_nir
   anv/pipeline: Hash shader modules and spec constants separately
   compiler/types: Serialize/deserialize subpass input types correctly
   anv/pipeline_cache: Add support for caching NIR
   anv/pipeline: Cache the pre-lowered NIR

  src/compiler/glsl_types.cpp   |   4 +-
  src/intel/vulkan/anv_pipeline.c   | 118 ++
  src/intel/vulkan/anv_pipeline_cache.c | 100 ++
  src/intel/vulkan/anv_private.h|  18 
  4 files changed, 204 insertions(+), 36 deletions(-)





Re: [Mesa-dev] [RFC 4/7] mesa: Helper functions for counting set bits in a mask

2018-10-16 Thread Toni Lönnberg
I looked for an existing bit-count helper but apparently missed it :)
Indeed, there is no need for a new one if one already exists.

-Toni

On Mon, Oct 15, 2018 at 07:59:58PM +0000, Roland Scheidegger wrote:
> Am 15.10.18 um 15:19 schrieb Toni Lönnberg:
> > ---
> >  src/util/bitscan.h | 25 +
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/src/util/bitscan.h b/src/util/bitscan.h
> > index dc89ac9..cdfecaf 100644
> > --- a/src/util/bitscan.h
> > +++ b/src/util/bitscan.h
> > @@ -112,6 +112,31 @@ u_bit_scan64(uint64_t *mask)
> > return i;
> >  }
> >  
> > +/* Count bits set in mask */
> > +static inline int
> > +u_count_bits(unsigned *mask)
> I don't think you'd want to pass a pointer.
> 
> Besides, I don't think we need another set of functions for this.
> src/util/u_math.h already has util_bitcount64 and util_bitcount which do
> the same thing.
> (Although I don't know which one is better, util_bitcount looks like it
> would be potentially faster with just very few bits set, but with
> "random" uint/uint64 it certainly would seem the new one is better. But
> in any case, can't beat the cpu popcount instruction...)
> 
> Roland
> 
> 
> > +{
> > +   unsigned v = *mask;
> > +   int c;
> > +   v = v - ((v >> 1) & 0x55555555);
> > +   v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
> > +   v = (v + (v >> 4)) & 0xF0F0F0F;
> > +   c = (int)((v * 0x1010101) >> 24);
> > +   return c;
> > +}
> > +
> > +static inline int
> > +u_count_bits64(uint64_t *mask)
> > +{
> > +   uint64_t v = *mask;
> > +   int c;
> > +   v = v - ((v >> 1) & 0x5555555555555555ull);
> > +   v = (v & 0x3333333333333333ull) + ((v >> 2) & 0x3333333333333333ull);
> > +   v = (v + (v >> 4)) & 0xF0F0F0F0F0F0F0Full;
> > +   c = (int)((v * 0x101010101010101ull) >> 56);
> > +   return c;
> > +}
> > +
> >  /* Determine if an unsigned value is a power of two.
> >   *
> >   * \note
> > 
> 
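
To make the comparison in Roland's reply concrete, here is a small sketch of two portable ways to count set bits, both taking the mask by value as he suggests. `popcount32_swar` follows the same parallel-add scheme as the quoted patch; `popcount32_loop` is a clear-lowest-set-bit loop, which is only an assumption about the kind of implementation that is "faster with very few bits set" and is not claimed to be what util_bitcount does. Both names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Parallel (SWAR) population count: sums bits in 2-, 4- and 8-bit
 * groups, then adds the four byte counts with one multiply.
 * Constant cost regardless of how many bits are set. */
static inline unsigned
popcount32_swar(uint32_t v)
{
   v = v - ((v >> 1) & 0x55555555u);
   v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
   v = (v + (v >> 4)) & 0x0F0F0F0Fu;
   return (v * 0x01010101u) >> 24;
}

/* Clears the lowest set bit each iteration, so the loop runs once
 * per set bit: cheap for sparse masks, slower for dense ones. */
static inline unsigned
popcount32_loop(uint32_t v)
{
   unsigned n = 0;
   while (v) {
      v &= v - 1;
      n++;
   }
   return n;
}
```

As the reply notes, where a hardware popcount instruction is available it will likely beat either of these.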


Re: [Mesa-dev] [PATCH] radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT

2018-10-16 Thread Samuel Pitoiset



On 10/16/18 9:54 AM, Bas Nieuwenhuizen wrote:

On Tue, Oct 16, 2018 at 9:40 AM Samuel Pitoiset
 wrote:


This feature isn't used for now, so disable it until
wwm is fixed in LLVM.


Is wwm actually the issue?


Just reporting what Daniel said.



Reviewed-by: Bas Nieuwenhuizen 


Fixes dEQP-VK.subgroups.vote.graphics.subgroupallequal*

https://bugs.freedesktop.org/show_bug.cgi?id=108115
Signed-off-by: Samuel Pitoiset 
---
  src/amd/vulkan/radv_device.c | 6 ++++--
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 174922780f..5fd5d48c42 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -1057,12 +1057,14 @@ void radv_GetPhysicalDeviceProperties2(
 (VkPhysicalDeviceSubgroupProperties*)ext;
 properties->subgroupSize = 64;
 properties->supportedStages = VK_SHADER_STAGE_ALL;
+   /* TODO: Enable VK_SUBGROUP_FEATURE_VOTE_BIT when wwm
+* is fixed in LLVM.
+*/
 properties->supportedOperations =
 
VK_SUBGROUP_FEATURE_ARITHMETIC_BIT |
 
VK_SUBGROUP_FEATURE_BASIC_BIT |
 
VK_SUBGROUP_FEATURE_BALLOT_BIT |
-   
VK_SUBGROUP_FEATURE_QUAD_BIT |
-   
VK_SUBGROUP_FEATURE_VOTE_BIT;
+   
VK_SUBGROUP_FEATURE_QUAD_BIT;
 if (pdevice->rad_info.chip_class >= VI) {
 properties->supportedOperations |=
 
VK_SUBGROUP_FEATURE_SHUFFLE_BIT |
--
2.19.1



Re: [Mesa-dev] [PATCH] radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT

2018-10-16 Thread Bas Nieuwenhuizen
On Tue, Oct 16, 2018 at 9:40 AM Samuel Pitoiset
 wrote:
>
> This feature isn't used for now, so disable it until
> wwm is fixed in LLVM.

Is wwm actually the issue?

Reviewed-by: Bas Nieuwenhuizen 
>
> Fixes dEQP-VK.subgroups.vote.graphics.subgroupallequal*
>
> https://bugs.freedesktop.org/show_bug.cgi?id=108115
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_device.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 174922780f..5fd5d48c42 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -1057,12 +1057,14 @@ void radv_GetPhysicalDeviceProperties2(
> (VkPhysicalDeviceSubgroupProperties*)ext;
> properties->subgroupSize = 64;
> properties->supportedStages = VK_SHADER_STAGE_ALL;
> +   /* TODO: Enable VK_SUBGROUP_FEATURE_VOTE_BIT when wwm
> +* is fixed in LLVM.
> +*/
> properties->supportedOperations =
> 
> VK_SUBGROUP_FEATURE_ARITHMETIC_BIT |
> 
> VK_SUBGROUP_FEATURE_BASIC_BIT |
> 
> VK_SUBGROUP_FEATURE_BALLOT_BIT |
> -   
> VK_SUBGROUP_FEATURE_QUAD_BIT |
> -   
> VK_SUBGROUP_FEATURE_VOTE_BIT;
> +   
> VK_SUBGROUP_FEATURE_QUAD_BIT;
> if (pdevice->rad_info.chip_class >= VI) {
> properties->supportedOperations |=
> 
> VK_SUBGROUP_FEATURE_SHUFFLE_BIT |
> --
> 2.19.1
>


[Mesa-dev] [PATCH] radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT

2018-10-16 Thread Samuel Pitoiset
This feature isn't used for now, so disable it until
wwm is fixed in LLVM.

Fixes dEQP-VK.subgroups.vote.graphics.subgroupallequal*

https://bugs.freedesktop.org/show_bug.cgi?id=108115
Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_device.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 174922780f..5fd5d48c42 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -1057,12 +1057,14 @@ void radv_GetPhysicalDeviceProperties2(
(VkPhysicalDeviceSubgroupProperties*)ext;
properties->subgroupSize = 64;
properties->supportedStages = VK_SHADER_STAGE_ALL;
+   /* TODO: Enable VK_SUBGROUP_FEATURE_VOTE_BIT when wwm
+* is fixed in LLVM.
+*/
properties->supportedOperations =

VK_SUBGROUP_FEATURE_ARITHMETIC_BIT |

VK_SUBGROUP_FEATURE_BASIC_BIT |

VK_SUBGROUP_FEATURE_BALLOT_BIT |
-   
VK_SUBGROUP_FEATURE_QUAD_BIT |
-   
VK_SUBGROUP_FEATURE_VOTE_BIT;
+   
VK_SUBGROUP_FEATURE_QUAD_BIT;
if (pdevice->rad_info.chip_class >= VI) {
properties->supportedOperations |=

VK_SUBGROUP_FEATURE_SHUFFLE_BIT |
-- 
2.19.1



Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Bas Nieuwenhuizen
On Tue, Oct 16, 2018 at 7:31 AM Keith Packard  wrote:
>
> Offers three clocks, device, clock monotonic and clock monotonic
> raw. Could use some kernel support to reduce the deviation between
> clock values.
>
> v2:
> Ensure deviation is at least as big as the GPU time interval.
>
> v3:
> Set device->lost when returning DEVICE_LOST.
> Use MAX2 and DIV_ROUND_UP instead of open coding these.
> Delete spurious TIMESTAMP in radv version.
> Suggested-by: Jason Ekstrand 
> Suggested-by: Lionel Landwerlin 
>
> v4:
> Add anv_gem_reg_read to anv_gem_stubs.c
> Suggested-by: Jason Ekstrand 
>
> Signed-off-by: Keith Packard 
> ---
>  src/amd/vulkan/radv_device.c   | 81 +++
>  src/amd/vulkan/radv_extensions.py  |  1 +
>  src/intel/vulkan/anv_device.c  | 89 ++
>  src/intel/vulkan/anv_extensions.py |  1 +
>  src/intel/vulkan/anv_gem.c | 13 +
>  src/intel/vulkan/anv_gem_stubs.c   |  7 +++
>  src/intel/vulkan/anv_private.h |  2 +
>  7 files changed, 194 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 174922780fc..80050485e54 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -4955,3 +4955,84 @@ radv_GetDeviceGroupPeerMemoryFeatures(
>VK_PEER_MEMORY_FEATURE_GENERIC_SRC_BIT |
>VK_PEER_MEMORY_FEATURE_GENERIC_DST_BIT;
>  }
> +
> +static const VkTimeDomainEXT radv_time_domains[] = {
> +   VK_TIME_DOMAIN_DEVICE_EXT,
> +   VK_TIME_DOMAIN_CLOCK_MONOTONIC_EXT,
> +   VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT,
> +};
> +
> +VkResult radv_GetPhysicalDeviceCalibrateableTimeDomainsEXT(
> +   VkPhysicalDevice physicalDevice,
> +   uint32_t *pTimeDomainCount,
> +   VkTimeDomainEXT  *pTimeDomains)
> +{
> +   int d;
> +   VK_OUTARRAY_MAKE(out, pTimeDomains, pTimeDomainCount);
> +
> +   for (d = 0; d < ARRAY_SIZE(radv_time_domains); d++) {
> +   vk_outarray_append(&out, i) {
> +   *i = radv_time_domains[d];
> +   }
> +   }
> +
> +   return vk_outarray_status(&out);
> +}
> +
> +static uint64_t
> +radv_clock_gettime(clockid_t clock_id)
> +{
> +   struct timespec current;
> +   int ret;
> +
> +   ret = clock_gettime(clock_id, &current);
> +   if (ret < 0 && clock_id == CLOCK_MONOTONIC_RAW)
> +   ret = clock_gettime(CLOCK_MONOTONIC, &current);
> +   if (ret < 0)
> +   return 0;
> +
> +   return (uint64_t) current.tv_sec * 1000000000ULL + current.tv_nsec;
> +}
> +
> +VkResult radv_GetCalibratedTimestampsEXT(
> +   VkDevice _device,
> +   uint32_t timestampCount,
> +   const VkCalibratedTimestampInfoEXT   *pTimestampInfos,
> +   uint64_t *pTimestamps,
> +   uint64_t *pMaxDeviation)
> +{
> +   RADV_FROM_HANDLE(radv_device, device, _device);
> +   uint32_t clock_crystal_freq = device->physical_device->rad_info.clock_crystal_freq;
> +   int d;
> +   uint64_t begin, end;
> +
> +   begin = radv_clock_gettime(CLOCK_MONOTONIC_RAW);
> +
> +   for (d = 0; d < timestampCount; d++) {
> +   switch (pTimestampInfos[d].timeDomain) {
> +   case VK_TIME_DOMAIN_DEVICE_EXT:
> +   pTimestamps[d] = device->ws->query_value(device->ws,
> +
> RADEON_TIMESTAMP);
> +   break;
> +   case VK_TIME_DOMAIN_CLOCK_MONOTONIC_EXT:
> +   pTimestamps[d] = radv_clock_gettime(CLOCK_MONOTONIC);
> +   break;
> +
> +   case VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT:
> +   pTimestamps[d] = begin;
> +   break;
> +   default:
> +   pTimestamps[d] = 0;
> +   break;
> +   }
> +   }
> +
> +   end = radv_clock_gettime(CLOCK_MONOTONIC_RAW);
> +
> +   uint64_t clock_period = end - begin;
> +   uint64_t device_period = DIV_ROUND_UP(1000000, clock_crystal_freq);
> +
> +   *pMaxDeviation = MAX2(clock_period, device_period);

Should this not be a sum? Those deviations can happen independently
from each other, so worst case both deviations happen in the same
direction which causes the magnitude to be combined.

With that change:

Reviewed-by: Bas Nieuwenhuizen 
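
A minimal sketch of the difference between the two bounds, with all values in nanoseconds. The function names are hypothetical and neither is taken from the patch; the second one simply restates the MAX2 computation from v4 for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Bound suggested above: the CPU sampling window and one device
 * clock period can err in the same direction, so add them. */
static inline uint64_t
max_deviation_sum(uint64_t begin_ns, uint64_t end_ns,
                  uint64_t device_period_ns)
{
   assert(end_ns >= begin_ns);
   return (end_ns - begin_ns) + device_period_ns;
}

/* Bound as computed in v4: the larger of the two terms. */
static inline uint64_t
max_deviation_max(uint64_t begin_ns, uint64_t end_ns,
                  uint64_t device_period_ns)
{
   assert(end_ns >= begin_ns);
   uint64_t clock_period = end_ns - begin_ns;
   return clock_period > device_period_ns ? clock_period : device_period_ns;
}
```

With a sampling window of a few microseconds and a device period of tens of nanoseconds, the two results differ only slightly, but the sum is the one that stays a valid bound in the worst case.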

> +
> +   return VK_SUCCESS;
> +}
> diff --git a/src/amd/vulkan/radv_extensions.py 
> b/src/amd/vulkan/radv_extensions.py
> index 5dcedae1c63..4c81d3f0068 100644
> --- a/src/amd/vulkan/radv_extensions.py
> +++ b/src/amd/vulkan/radv_extensions.py
> @@ -92,6 

Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]

2018-10-16 Thread Samuel Pitoiset

The RADV bits are:

Reviewed-by: Samuel Pitoiset 

Thanks!

On 10/16/18 7:31 AM, Keith Packard wrote:

Offers three clocks, device, clock monotonic and clock monotonic
raw. Could use some kernel support to reduce the deviation between
clock values.

v2:
Ensure deviation is at least as big as the GPU time interval.

v3:
Set device->lost when returning DEVICE_LOST.
Use MAX2 and DIV_ROUND_UP instead of open coding these.
Delete spurious TIMESTAMP in radv version.
Suggested-by: Jason Ekstrand 
Suggested-by: Lionel Landwerlin 

v4:
Add anv_gem_reg_read to anv_gem_stubs.c
Suggested-by: Jason Ekstrand 

Signed-off-by: Keith Packard 
---
  src/amd/vulkan/radv_device.c   | 81 +++
  src/amd/vulkan/radv_extensions.py  |  1 +
  src/intel/vulkan/anv_device.c  | 89 ++
  src/intel/vulkan/anv_extensions.py |  1 +
  src/intel/vulkan/anv_gem.c | 13 +
  src/intel/vulkan/anv_gem_stubs.c   |  7 +++
  src/intel/vulkan/anv_private.h |  2 +
  7 files changed, 194 insertions(+)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 174922780fc..80050485e54 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -4955,3 +4955,84 @@ radv_GetDeviceGroupPeerMemoryFeatures(
   VK_PEER_MEMORY_FEATURE_GENERIC_SRC_BIT |
   VK_PEER_MEMORY_FEATURE_GENERIC_DST_BIT;
  }
+
+static const VkTimeDomainEXT radv_time_domains[] = {
+   VK_TIME_DOMAIN_DEVICE_EXT,
+   VK_TIME_DOMAIN_CLOCK_MONOTONIC_EXT,
+   VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT,
+};
+
+VkResult radv_GetPhysicalDeviceCalibrateableTimeDomainsEXT(
+   VkPhysicalDevice physicalDevice,
+   uint32_t *pTimeDomainCount,
+   VkTimeDomainEXT  *pTimeDomains)
+{
+   int d;
+   VK_OUTARRAY_MAKE(out, pTimeDomains, pTimeDomainCount);
+
+   for (d = 0; d < ARRAY_SIZE(radv_time_domains); d++) {
+   vk_outarray_append(&out, i) {
+   *i = radv_time_domains[d];
+   }
+   }
+
+   return vk_outarray_status(&out);
+}
+
+static uint64_t
+radv_clock_gettime(clockid_t clock_id)
+{
+   struct timespec current;
+   int ret;
+
+   ret = clock_gettime(clock_id, &current);
+   if (ret < 0 && clock_id == CLOCK_MONOTONIC_RAW)
+   ret = clock_gettime(CLOCK_MONOTONIC, &current);
+   if (ret < 0)
+   return 0;
+
+   return (uint64_t) current.tv_sec * 1000000000ULL + current.tv_nsec;
+}
+
+VkResult radv_GetCalibratedTimestampsEXT(
+   VkDevice _device,
+   uint32_t timestampCount,
+   const VkCalibratedTimestampInfoEXT   *pTimestampInfos,
+   uint64_t *pTimestamps,
+   uint64_t *pMaxDeviation)
+{
+   RADV_FROM_HANDLE(radv_device, device, _device);
+   uint32_t clock_crystal_freq = device->physical_device->rad_info.clock_crystal_freq;
+   int d;
+   uint64_t begin, end;
+
+   begin = radv_clock_gettime(CLOCK_MONOTONIC_RAW);
+
+   for (d = 0; d < timestampCount; d++) {
+   switch (pTimestampInfos[d].timeDomain) {
+   case VK_TIME_DOMAIN_DEVICE_EXT:
+   pTimestamps[d] = device->ws->query_value(device->ws,
+
RADEON_TIMESTAMP);
+   break;
+   case VK_TIME_DOMAIN_CLOCK_MONOTONIC_EXT:
+   pTimestamps[d] = radv_clock_gettime(CLOCK_MONOTONIC);
+   break;
+
+   case VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT:
+   pTimestamps[d] = begin;
+   break;
+   default:
+   pTimestamps[d] = 0;
+   break;
+   }
+   }
+
+   end = radv_clock_gettime(CLOCK_MONOTONIC_RAW);
+
+   uint64_t clock_period = end - begin;
+   uint64_t device_period = DIV_ROUND_UP(1000000, clock_crystal_freq);
+
+   *pMaxDeviation = MAX2(clock_period, device_period);
+
+   return VK_SUCCESS;
+}
diff --git a/src/amd/vulkan/radv_extensions.py 
b/src/amd/vulkan/radv_extensions.py
index 5dcedae1c63..4c81d3f0068 100644
--- a/src/amd/vulkan/radv_extensions.py
+++ b/src/amd/vulkan/radv_extensions.py
@@ -92,6 +92,7 @@ EXTENSIONS = [
  Extension('VK_KHR_display',  23, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
  Extension('VK_EXT_direct_mode_display',   1, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
  Extension('VK_EXT_acquire_xlib_display',  1, 
'VK_USE_PLATFORM_XLIB_XRANDR_EXT'),
+Extension('VK_EXT_calibrated_timestamps', 1, True),
  

[Mesa-dev] [Bug 107765] [regression] Batman Arkham City crashes with DXVK under wine

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107765

Samuel Pitoiset  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Samuel Pitoiset  ---
Should be fixed with
https://cgit.freedesktop.org/mesa/mesa/commit/?id=593996bc026c9e383da9683ff30e784b0ea09015

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [RFC] Allow fd.o to join forces with X.Org

2018-10-16 Thread Peter Hutterer
On Mon, Oct 15, 2018 at 10:49:24AM -0400, Harry Wentland wrote:
> The leadership of freedesktop.org (fd.o) has recently expressed interest
> in having an elected governing body. Given the tight connection between
> fd.o and X.Org and the fact that X.Org has such a governing body it
> seemed obvious to consider extending X.Org's mandate to fd.o.
> 
> Quite a bit of background on fd.o leading up to this has been covered by
> Daniel Stone at XDC 2018 and was covered really well by Jake Edge of LWN [1].
> 
> One question that is briefly addressed in the LWN article and was
> thoroughly discussed by members of the X.Org boards, Daniel Stone, and
> others in hallway discussions is the question of whether to extend the
> X.Org membership to projects hosted on fd.o but outside the purpose of
> the X.Org foundation as enacted in its bylaws.
> 
> Most people I talked to would prefer not to dilute X.Org's mission and
> extend membership only to contributors of projects that follow X.Org's
> purpose as enacted in its bylaws. Other projects can continue to be
> hosted on fd.o but won't receive X.Org membership for the mere reason of
> being hosted on fd.o.
> 
> [1] https://lwn.net/Articles/767258/
> 
> v2:
>  - Subject line that better describes the intention
>  - Briefly describe reasons behind this change
>  - Drop expanding membership eligibility
> ---
> 
> We're looking for feedback and comments on this patch. If it's not
> widely controversial the final version of the patch will be put to a
> vote at the 2019 X.Org elections.
> 
> The patch applies to the X.Org bylaws git repo, which can be found at
> https://gitlab.freedesktop.org/xorgfoundation/bylaws
> 
> Happy commenting.
> 
> Harry
> 
> bylaws.tex | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/bylaws.tex b/bylaws.tex
> index 4ab35a4f7745..44ff4745963b 100644
> --- a/bylaws.tex
> +++ b/bylaws.tex
> @@ -14,7 +14,7 @@ BE IT ENACTED AND IT IS HEREBY ENACTED as a By-law of the 
> X.Org Foundation
>  
>  The purpose of the X.Org Foundation shall be to:
>  \begin{enumerate}[(i)\hspace{.2cm}]
> - \item Research, develop, support, organize, administrate, standardize,
> + \item \label{1} Research, develop, support, organize, administrate, 
> standardize,
>   promote, and defend a free and open accelerated graphics stack. This
>   includes, but is not limited to, the following projects: DRM, Mesa,
>   Wayland and the X Window System,
> @@ -24,6 +24,11 @@ The purpose of the X.Org Foundation shall be to:
>  
>   \item Support and educate the general community of users of this
>   graphics stack.
> +
> + \item Support free and open source projects through the freedesktop.org
> + infrastructure. For projects outside the scope of item (\ref{1}) support
> + extends to project hosting only.
> +

Yes to the idea but given that the remaining 11 pages cover all the legalese
for xorg I think we need to add at least a section of what "project hosting"
means. Even if it's just a "includes but is not limited to blah".  And some
addition to 4.1 Powers is needed to spell out what the BoD can do in regards
to fdo. 

Cheers,
   Peter


>  \end{enumerate}
>  
>  \article{INTERPRETATION}
> -- 
> 2.19.1


[Mesa-dev] [Bug 108382] st_framebuffer might leak on certain circumstances

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108382

--- Comment #2 from Yong Zhang  ---
Created attachment 142041
  --> https://bugs.freedesktop.org/attachment.cgi?id=142041&action=edit
proposed patch

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [Bug 108382] st_framebuffer might leak on certain circumstances

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108382

--- Comment #1 from Yong Zhang  ---
Proposed solution:
Do not use the st_framebuffer_iface address as the hash key; use
st_framebuffer_iface->ID instead.
According to st_api.h, ID is "Identifier that uniquely identifies the
framebuffer interface object.", so it is guaranteed to be unique.

Please refer to the attached patch.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 108382] st_framebuffer might leak on certain circumstances

2018-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108382

Bug ID: 108382
   Summary: st_framebuffer might leak on certain circumstances
   Product: Mesa
   Version: 18.2
  Hardware: All
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: zhangy...@lbesec.com
QA Contact: mesa-dev@lists.freedesktop.org

By design, a framebuffer can be marked as obsolete by GL calls (such as
eglDestroySurface) at any time, but it is released only when it is no longer
in use (by calls such as eglMakeCurrent).

st_manager uses two structures to achieve this. One is the hash table stfbi_ht,
which tracks the st_framebuffer_iface of every active (non-obsolete)
st_framebuffer; the other is the linked list winsys_buffers, which holds all
st_framebuffers currently in use.

Consider the following call sequence (assuming the dri backend):
EGLSurface surf1 = eglCreateWindowSurface(...);
eglMakeCurrent(..., surf1, surf1, ...);
// do rendering
eglDestroySurface(..., surf1);
EGLSurface surf2 = eglCreateWindowSurface(...);
eglMakeCurrent(..., surf2, surf2, ...);

When the first eglMakeCurrent is called, st_api_make_current runs: an
st_framebuffer is created and its iface points to the dri_drawable of surf1.
The st_framebuffer is added to winsys_buffers and the iface is added to
stfbi_ht. Then st_framebuffers_purge is called; it traverses winsys_buffers,
looks up each iface in stfbi_ht, and finds that all framebuffers are active.

When eglDestroySurface is called, st_api_destroy_drawable runs, and only the
st_framebuffer_iface (which is the dri_drawable of surf1) is removed from
stfbi_ht.

When the second eglMakeCurrent is called, st_api_make_current runs again: an
st_framebuffer is created and its iface points to the dri_drawable of surf2.
The st_framebuffer is added to winsys_buffers and the iface is added to
stfbi_ht. Then st_framebuffers_purge is called; it traverses winsys_buffers and
finds that the iface of surf1's st_framebuffer is not in stfbi_ht, which means
surf1 is obsolete, so that st_framebuffer is released and removed from
winsys_buffers.

This mechanism depends on an assumption: the ifaces (the dri_drawables of surf1
and surf2) are allocated at different memory locations, so their addresses can
be used as hash keys. The assumption holds in most cases, but on Android x86,
due to a different malloc implementation, the dri_drawables of surf1 and surf2
are very likely to be allocated at the same address. This breaks
st_framebuffers_purge, so the stale st_framebuffer is never freed.

Test environment:
Android x86 7.1.2-r36, AMD Radeon RX560
How to reproduce:
Keep switching any app between foreground and background.
Observing /sys/kernel/debug/dri/0/amdgpu_gem_info shows gem usage increasing.
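
The failure mode can be reproduced in miniature without Mesa. The sketch below is only an illustration under stated assumptions: `struct iface` and `struct active_set` are hypothetical stand-ins for st_framebuffer_iface and stfbi_ht, and a single reused object plays the role of the allocator handing out the same address twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of st_framebuffer_iface; only the ID
 * matters for this sketch. */
struct iface {
   uint32_t id;
};

/* Hypothetical stand-in for stfbi_ht: a tiny set holding the keys
 * of all active (non-obsolete) ifaces. */
#define MAX_ACTIVE 8
struct active_set {
   uintptr_t keys[MAX_ACTIVE];
   size_t    count;
};

static int
set_contains(const struct active_set *s, uintptr_t key)
{
   for (size_t i = 0; i < s->count; i++)
      if (s->keys[i] == key)
         return 1;
   return 0;
}

static void
set_add(struct active_set *s, uintptr_t key)
{
   if (set_contains(s, key))
      return;
   assert(s->count < MAX_ACTIVE);
   s->keys[s->count++] = key;
}

static void
set_remove(struct active_set *s, uintptr_t key)
{
   for (size_t i = 0; i < s->count; i++) {
      if (s->keys[i] == key) {
         s->keys[i] = s->keys[--s->count];
         return;
      }
   }
}

/* Keying by address: two ifaces that occupy the same memory at
 * different times are indistinguishable. */
static uintptr_t
key_by_address(const struct iface *i)
{
   return (uintptr_t)i;
}

/* Keying by ID: stays distinct as long as IDs are never reused. */
static uintptr_t
key_by_id(const struct iface *i)
{
   return i->id;
}
```

If surf1's iface is added and removed, and surf2's iface is then created at the same address with a new ID, a lookup of surf1's old address key succeeds again (so its framebuffer looks active and is never purged), while a lookup of surf1's old ID key fails as it should.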

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.