Re: [Mesa-dev] [RFC PATCH 0/6] r600: speed up tesselation shaders

2017-12-28 Thread Dave Airlie
On 29 December 2017 at 16:38, Dave Airlie  wrote:
> On 11 December 2017 at 22:49, Gert Wollny  wrote:
>> On Friday, 08.12.2017 at 16:30 +1000, Dave Airlie wrote:
>>> [snip]
>>>
>>> So I haven't committed these yet, because I wanted to see if I could
>>> get sb to work.
>> Well, it was very much a work in progress; I didn't expect it to be
>> committed as-is anyway.
>>
>>>
>>> https://cgit.freedesktop.org/~airlied/mesa/log/?h=r600-sb-lds-wip
>>>
>>> is my non-functional attempt so far, but it GPU-hangs on the nop
>>> shader.
>>
>> I've played around with it a bit and added some hacks to make it not hang:
>> sb schedules calls into any slot, but LDS reads/writes should go only
>> into SLOT_X, and not splitting up the fetch seemed to be important
>> (patch attached).
>>
>>
>> However, gcm moves the LDS_OQ* loads around, changing their order without
>> changing the order of the corresponding LDS_READ_RET calls. Because of
>> this, at least the nop shader still fails.
>>
>> I tried to persuade the optimizer not to reorder these move
>> instructions by adding the dst value of each node that reads from an
>> LDS_OQ as an extra "use" of the next node that reads from the same queue,
>> but to no avail. I guess I didn't figure out how to count these extra uses
>> properly when the instructions are scheduled.
>
> I thought I'd done this already; I must dig a bit more.
>
> I've pushed more stuff to the branch; nop still doesn't work.
>
> I've included your patch in one of the squashes; I think we should be
> pretty close.

I think the top patch in my tree fixes the LDS reordering, but nop still
doesn't work, which is annoying. Maybe you can spot the problem; I've been
staring at it too long.
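The ordering constraint can be made concrete with a small model. Assume the LDS output queue behaves as a FIFO: each LDS_READ_RET pushes one result, and each read of LDS_OQ* pops the oldest entry. Under that assumption, any schedule that swaps two pops relative to their reads gives each consumer the wrong value.

The C sketch below simulates that queue to check a schedule. The instruction model and the names `op_kind`, `instr` and `oq_order_is_valid()` are hypothetical. They are not sb's real IR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified instruction model (not sb's IR). */
enum op_kind { OP_LDS_READ_RET, OP_OQ_POP };

struct instr {
   enum op_kind kind;
   int read_id; /* READ_RET: id of this read.
                   OQ_POP: id of the read whose result it should receive. */
};

#define MAX_QUEUE 64

/* Simulate the output queue as a FIFO and check that every pop receives
 * the value pushed by the read it was meant to consume. */
bool oq_order_is_valid(const struct instr *sched, int n)
{
   int queue[MAX_QUEUE];
   int head = 0, tail = 0;

   assert(n >= 0);
   for (int i = 0; i < n; i++) {
      if (sched[i].kind == OP_LDS_READ_RET) {
         if (tail == MAX_QUEUE)
            return false;                 /* model overflow */
         queue[tail++] = sched[i].read_id;
      } else {
         if (head == tail)
            return false;                 /* pop from an empty queue */
         if (queue[head++] != sched[i].read_id)
            return false;                 /* got another read's result */
      }
   }
   return true;
}
```

For example, {READ_RET 0, READ_RET 1, POP 0, POP 1} is accepted. The same two reads followed by POP 1, POP 0 (the kind of reordering gcm was doing) are rejected.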

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH 0/6] r600: speed up tesselation shaders

2017-12-28 Thread Dave Airlie
On 11 December 2017 at 22:49, Gert Wollny  wrote:
> On Friday, 08.12.2017 at 16:30 +1000, Dave Airlie wrote:
>> [snip]
>>
>> So I haven't committed these yet, because I wanted to see if I could
>> get sb to work.
> Well, it was very much a work in progress; I didn't expect it to be
> committed as-is anyway.
>
>>
>> https://cgit.freedesktop.org/~airlied/mesa/log/?h=r600-sb-lds-wip
>>
>> is my non-functional attempt so far, but it GPU-hangs on the nop
>> shader.
>
> I've played around with it a bit and added some hacks to make it not hang:
> sb schedules calls into any slot, but LDS reads/writes should go only
> into SLOT_X, and not splitting up the fetch seemed to be important
> (patch attached).
>
>
> However, gcm moves the LDS_OQ* loads around, changing their order without
> changing the order of the corresponding LDS_READ_RET calls. Because of
> this, at least the nop shader still fails.
>
> I tried to persuade the optimizer not to reorder these move
> instructions by adding the dst value of each node that reads from an
> LDS_OQ as an extra "use" of the next node that reads from the same queue,
> but to no avail. I guess I didn't figure out how to count these extra uses
> properly when the instructions are scheduled.

I thought I'd done this already; I must dig a bit more.

I've pushed more stuff to the branch; nop still doesn't work.

I've included your patch in one of the squashes; I think we should be
pretty close.

Dave.


Re: [Mesa-dev] [PATCH 0/6] r600g/state_trackers small cleanups

2017-12-28 Thread Konstantin Kharlamov
I forgot to mention: there are actually lots of -Wsign-compare warnings, so if
anybody wants something to look at, there it is. I fixed only as much of the
code as I was in the mood for :Ь

On Friday, 29 December 2017 8:32:26 MSK, Konstantin Kharlamov
wrote:
> Mostly a quick run with -Wsign-compare. I didn't see any real problems,
> though, except for the changes in r600_isa.c, where an unsigned value was
> compared with -1.
> 
> No piglit changes, except for two tests that are unstable for me: dlist and
> multiple-texture-reading.
> 
> As for how I managed to run piglit given that it hangs r600g: it always
> hangs at the very end on the same test, and the results do get saved. So
> even though a piglit run requires a reboot at the end, it still gives
> most of the results, except perhaps for a few dozen tests at the end.
> 
> P.S. I don't have commit rights.
> 
> Konstantin Kharlamov (6):
>   r600g: do not use "fast-clear" for small textures
>   r600g: constify some variables
>   nine: constify some variables
>   st/glx: constify some variables
>   r600g: some -Wsign-compare fixes
>   r600g: fix unused variable warning
> 
>  src/gallium/drivers/r600/cayman_msaa.c|  2 +-
>  src/gallium/drivers/r600/eg_debug.c   |  6 +++---
>  src/gallium/drivers/r600/evergreen_state.c|  8 
>  src/gallium/drivers/r600/r600_isa.c   |  6 +++---
>  src/gallium/drivers/r600/r600_pipe.h  |  2 +-
>  src/gallium/drivers/r600/r600_query.c |  2 +-
>  src/gallium/drivers/r600/r600_state.c | 12 ++--
>  src/gallium/drivers/r600/r600_state_common.c  |  6 ++
>  src/gallium/drivers/r600/r600_test_dma.c  |  2 +-
>  src/gallium/drivers/r600/r600_texture.c   | 10 ++
>  src/gallium/drivers/r600/sb/sb_expr.cpp   |  2 +-
>  src/gallium/state_trackers/glx/xlib/glx_getproc.c |  2 +-
>  src/gallium/state_trackers/nine/nine_pipe.h   |  2 +-
>  src/gallium/state_trackers/nine/nine_shader.c | 10 +-
>  14 files changed, 40 insertions(+), 32 deletions(-)
> 
> 






[Mesa-dev] [PATCH 1/6] r600g: do not use "fast-clear" for small textures

2017-12-28 Thread Konstantin Kharlamov
Ported from radeonsi. Improves windowed glxgears, run as

vblank_mode=0 glxgears -info -geometry 0+0+512+512

from ≈2270 FPS to ≈2360 FPS. Tested with AMD TURKS.

Signed-off-by: Konstantin Kharlamov 
---
 src/gallium/drivers/r600/r600_texture.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c
index 03cdcd22ee..6739553f94 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -1793,6 +1793,16 @@ void evergreen_do_fast_color_clear(struct r600_common_context *rctx,
!(tex->resource.external_usage & PIPE_HANDLE_USAGE_EXPLICIT_FLUSH))
continue;
 
+   /* Use a slow clear for small surfaces where the cost of
+* the eliminate pass can be higher than the benefit of fast
+* clear. AMDGPU-pro does this, but the numbers may differ.
+*
+* This helps on both dGPUs and APUs, even small ones.
+*/
+   if (tex->resource.b.b.nr_samples <= 1 &&
+   tex->resource.b.b.width0 * tex->resource.b.b.height0 <= 512 * 512)
+   continue;
+
{
/* 128-bit formats are unusupported */
if (tex->surface.bpe > 8) {
-- 
2.15.1
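The condition this patch adds can be read as a standalone predicate. The sketch below is hypothetical: `prefer_slow_clear()` does not exist in r600. It also widens the area computation to 64 bits, which the patch itself does not need for realistic texture sizes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the check added to
 * evergreen_do_fast_color_clear(): single-sample surfaces (nr_samples 0 or 1)
 * of at most 512*512 pixels skip the fast clear and its eliminate pass. */
bool prefer_slow_clear(unsigned width, unsigned height, unsigned nr_samples)
{
   /* 64-bit product so that the comparison can never overflow. */
   unsigned long long area = (unsigned long long)width * height;

   return nr_samples <= 1 && area <= 512ull * 512ull;
}
```

With this threshold, a single-sample 512x512 surface sits exactly at the limit and takes the slow clear. Multisampled surfaces always keep the fast clear path.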



[Mesa-dev] [PATCH 6/6] r600g: fix unused variable warning

2017-12-28 Thread Konstantin Kharlamov
Signed-off-by: Konstantin Kharlamov 
---
 src/gallium/drivers/r600/r600_state_common.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index ec8945f084..dc5cc0ad2e 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -902,7 +902,6 @@ struct r600_pipe_shader_selector *r600_create_shader_state_tokens(struct pipe_co
  unsigned pipe_shader_type)
 {
struct r600_pipe_shader_selector *sel = CALLOC_STRUCT(r600_pipe_shader_selector);
-   int i;
 
sel->type = pipe_shader_type;
sel->tokens = tgsi_dup_tokens(tokens);
-- 
2.15.1



[Mesa-dev] [PATCH 2/6] r600g: constify some variables

2017-12-28 Thread Konstantin Kharlamov
Just a nice hint for both people and compilers.

Signed-off-by: Konstantin Kharlamov 
---
 src/gallium/drivers/r600/cayman_msaa.c |  2 +-
 src/gallium/drivers/r600/evergreen_state.c |  2 +-
 src/gallium/drivers/r600/r600_query.c  |  2 +-
 src/gallium/drivers/r600/r600_state.c  | 12 ++--
 src/gallium/drivers/r600/sb/sb_expr.cpp|  2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/r600/cayman_msaa.c b/src/gallium/drivers/r600/cayman_msaa.c
index 6bc307a4bc..c1294536d3 100644
--- a/src/gallium/drivers/r600/cayman_msaa.c
+++ b/src/gallium/drivers/r600/cayman_msaa.c
@@ -219,7 +219,7 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs *cs, int nr_samples,
 
if (setup_samples > 1) {
/* indexed by log2(nr_samples) */
-   unsigned max_dist[] = {
+   const unsigned max_dist[] = {
0,
eg_max_dist_2x,
eg_max_dist_4x,
diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
index 0da665f634..ecb9c598e3 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1584,7 +1584,7 @@ static void evergreen_set_min_samples(struct pipe_context *ctx, unsigned min_sam
 }
 
 /* 8xMSAA */
-static uint32_t sample_locs_8x[] = {
+static const uint32_t sample_locs_8x[] = {
FILL_SREG(-1,  1,  1,  5,  3, -5,  5,  3),
FILL_SREG(-7, -1, -3, -7,  7, -3, -5,  7),
FILL_SREG(-1,  1,  1,  5,  3, -5,  5,  3),
diff --git a/src/gallium/drivers/r600/r600_query.c b/src/gallium/drivers/r600/r600_query.c
index 987da9a806..8194d6e7c7 100644
--- a/src/gallium/drivers/r600/r600_query.c
+++ b/src/gallium/drivers/r600/r600_query.c
@@ -1914,7 +1914,7 @@ void r600_query_fix_enabled_rb_mask(struct r600_common_screen *rscreen)
 #define XG(group_, name_, query_type_, type_, result_type_) \
XFULL(name_, query_type_, type_, result_type_, R600_QUERY_GROUP_##group_)
 
-static struct pipe_driver_query_info r600_driver_query_list[] = {
+static const struct pipe_driver_query_info r600_driver_query_list[] = {
X("num-compilations",   NUM_COMPILATIONS,   UINT64, CUMULATIVE),
X("num-shaders-created",NUM_SHADERS_CREATED,UINT64, CUMULATIVE),
X("num-shader-cache-hits",  NUM_SHADER_CACHE_HITS,  UINT64, CUMULATIVE),
diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
index cbf860f45f..52adebe6cf 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -1220,22 +1220,22 @@ static void r600_set_framebuffer_state(struct pipe_context *ctx,
rctx->framebuffer.do_update_surf_dirtiness = true;
 }
 
-static uint32_t sample_locs_2x[] = {
+static const uint32_t sample_locs_2x[] = {
FILL_SREG(-4, 4, 4, -4, -4, 4, 4, -4),
FILL_SREG(-4, 4, 4, -4, -4, 4, 4, -4),
 };
-static unsigned max_dist_2x = 4;
+static const unsigned max_dist_2x = 4;
 
-static uint32_t sample_locs_4x[] = {
+static const uint32_t sample_locs_4x[] = {
FILL_SREG(-2, -2, 2, 2, -6, 6, 6, -6),
FILL_SREG(-2, -2, 2, 2, -6, 6, 6, -6),
 };
-static unsigned max_dist_4x = 6;
-static uint32_t sample_locs_8x[] = {
+static const unsigned max_dist_4x = 6;
+static const uint32_t sample_locs_8x[] = {
FILL_SREG(-1,  1,  1,  5,  3, -5,  5,  3),
FILL_SREG(-7, -1, -3, -7,  7, -3, -5,  7),
 };
-static unsigned max_dist_8x = 7;
+static const unsigned max_dist_8x = 7;
 
 static void r600_get_sample_position(struct pipe_context *ctx,
 unsigned sample_count,
diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp b/src/gallium/drivers/r600/sb/sb_expr.cpp
index 7a5d62c8e8..7ad48e8ad3 100644
--- a/src/gallium/drivers/r600/sb/sb_expr.cpp
+++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
@@ -330,7 +330,7 @@ void expr_handler::apply_alu_src_mod(const bc_alu , unsigned src,
 }
 
 void expr_handler::apply_alu_dst_mod(const bc_alu , literal ) {
-   float omod_coeff[] = {2.0f, 4.0, 0.5f};
+   const float omod_coeff[] = {2.0f, 4.0, 0.5f};
 
if (bc.omod)
v = v.f * omod_coeff[bc.omod - 1];
-- 
2.15.1
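The following standalone sketch of the omod lookup from apply_alu_dst_mod() illustrates what the `const` changes buy. It goes one step further than the patch by making the table `static const`. That lets the compiler emit the table once in read-only data instead of possibly initializing a local array on every call. `apply_omod()` is a hypothetical name.

```c
#include <assert.h>

/* Output-modifier coefficients indexed by omod - 1; omod == 0 means
 * "no modifier" (1 -> *2, 2 -> *4, 3 -> /2). */
static const float omod_coeff[] = { 2.0f, 4.0f, 0.5f };

/* Hypothetical standalone version of the omod handling. */
float apply_omod(float v, unsigned omod)
{
   assert(omod <= sizeof(omod_coeff) / sizeof(omod_coeff[0]));
   return omod ? v * omod_coeff[omod - 1] : v;
}
```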

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/6] r600g: some -Wsign-compare fixes

2017-12-28 Thread Konstantin Kharlamov
Signed-off-by: Konstantin Kharlamov 
---
 src/gallium/drivers/r600/eg_debug.c  | 6 +++---
 src/gallium/drivers/r600/evergreen_state.c   | 6 +++---
 src/gallium/drivers/r600/r600_isa.c  | 6 +++---
 src/gallium/drivers/r600/r600_pipe.h | 2 +-
 src/gallium/drivers/r600/r600_state_common.c | 5 ++---
 src/gallium/drivers/r600/r600_test_dma.c | 2 +-
 6 files changed, 13 insertions(+), 14 deletions(-)
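Most hunks in this patch follow one pattern. A loop counter declared as `int` is compared against `ARRAY_SIZE()`, which yields an unsigned `size_t`, so -Wsign-compare warns.

Below is a minimal sketch of the fixed form. It assumes the usual sizeof-based definition of ARRAY_SIZE, defined locally here rather than taken from Mesa's headers. The `table` and `sum_table()` names are hypothetical.

```c
#include <assert.h>

/* The usual idiom; sizeof yields size_t, so the result is unsigned. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const unsigned table[] = { 3, 1, 4, 1, 5 };

/* With "int i", the test "i < ARRAY_SIZE(table)" mixes signed and unsigned
 * operands and -Wsign-compare warns; an unsigned index avoids that. */
unsigned sum_table(void)
{
   unsigned sum = 0;

   for (unsigned i = 0; i < ARRAY_SIZE(table); i++)
      sum += table[i];
   return sum;
}
```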

diff --git a/src/gallium/drivers/r600/eg_debug.c b/src/gallium/drivers/r600/eg_debug.c
index ceb7c1672c..56195df296 100644
--- a/src/gallium/drivers/r600/eg_debug.c
+++ b/src/gallium/drivers/r600/eg_debug.c
@@ -78,7 +78,7 @@ static void print_named_value(FILE *file, const char *name, uint32_t value,
 static void eg_dump_reg(FILE *file, unsigned offset, uint32_t value,
uint32_t field_mask)
 {
-   int r, f;
+   unsigned r, f;
 
for (r = 0; r < ARRAY_SIZE(egd_reg_table); r++) {
const struct eg_reg *reg = _reg_table[r];
@@ -134,7 +134,7 @@ static void ac_parse_set_reg_packet(FILE *f, uint32_t *ib, unsigned count,
unsigned reg_offset)
 {
unsigned reg = (ib[1] << 2) + reg_offset;
-   int i;
+   unsigned i;
 
for (i = 0; i < count; i++)
eg_dump_reg(f, reg + i*4, ib[2+i], ~0);
@@ -149,7 +149,7 @@ static uint32_t *ac_parse_packet3(FILE *f, uint32_t *ib, int *num_dw,
unsigned op = PKT3_IT_OPCODE_G(ib[0]);
const char *predicate = PKT3_PREDICATE(ib[0]) ? "(predicate)" : "";
const char *compute_mode = (ib[0] & 0x2) ? "(C)" : "";
-   int i;
+   unsigned i;
 
/* Print the name first. */
for (i = 0; i < ARRAY_SIZE(packet3_table); i++)
diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
index ecb9c598e3..1aae9097f3 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3915,7 +3915,7 @@ static void evergreen_set_hw_atomic_buffers(struct pipe_context *ctx,
 {
struct r600_context *rctx = (struct r600_context *)ctx;
struct r600_atomic_buffer_state *astate;
-   int i, idx;
+   unsigned i, idx;
 
astate = >atomic_buffer_state;
 
@@ -3951,7 +3951,7 @@ static void evergreen_set_shader_buffers(struct pipe_context *ctx,
struct r600_tex_color_info color;
struct eg_buf_res_params buf_params;
struct r600_resource *resource;
-   int i, idx;
+   unsigned i, idx;
unsigned old_mask;
 
if (shader != PIPE_SHADER_FRAGMENT &&
@@ -4042,7 +4042,7 @@ static void evergreen_set_shader_images(struct pipe_context *ctx,
const struct pipe_image_view *images)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
-   int i;
+   unsigned i;
struct r600_image_view *rview;
struct pipe_resource *image;
struct r600_resource *resource;
diff --git a/src/gallium/drivers/r600/r600_isa.c b/src/gallium/drivers/r600/r600_isa.c
index 2633cdcdb9..0d3e93d141 100644
--- a/src/gallium/drivers/r600/r600_isa.c
+++ b/src/gallium/drivers/r600/r600_isa.c
@@ -557,7 +557,7 @@ int r600_isa_init(struct r600_context *ctx, struct r600_isa *isa) {
 
for (i = 0; i < ARRAY_SIZE(r600_alu_op_table); ++i) {
const struct alu_op_info *op = _alu_op_table[i];
-   unsigned opc;
+   int opc;
if (op->flags & AF_LDS || op->slots[isa->hw_class] == 0)
continue;
opc = op->opcode[isa->hw_class >> 1];
@@ -570,7 +570,7 @@ int r600_isa_init(struct r600_context *ctx, struct r600_isa *isa) {
 
for (i = 0; i < ARRAY_SIZE(fetch_op_table); ++i) {
const struct fetch_op_info *op = _op_table[i];
-   unsigned opc = op->opcode[isa->hw_class];
+   int opc = op->opcode[isa->hw_class];
if ((op->flags & FF_GDS) || ((opc & 0xFF) != opc))
continue; /* ignore GDS ops and INST_MOD versions for now */
isa->fetch_map[opc] = i + 1;
@@ -578,7 +578,7 @@ int r600_isa_init(struct r600_context *ctx, struct r600_isa *isa) {
 
for (i = 0; i < ARRAY_SIZE(cf_op_table); ++i) {
const struct cf_op_info *op = _op_table[i];
-   unsigned opc = op->opcode[isa->hw_class];
+   int opc = op->opcode[isa->hw_class];
if (opc == -1)
continue;
/* using offset for CF_ALU_xxx opcodes because they overlap with other
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index e042edf2b4..65206b023d 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -560,7 +560,7 @@ struct r600_context {
boolgs_tri_strip_adj_fix;
boolean  

[Mesa-dev] [PATCH 0/6] r600g/state_trackers small cleanups

2017-12-28 Thread Konstantin Kharlamov
Mostly a quick run with -Wsign-compare. I didn't see any real problems,
though, except for the changes in r600_isa.c, where an unsigned value was
compared with -1.
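The r600_isa.c case differs from the loop counters. There, the opcode tables use -1 as a "not available" marker, so the local copy has to stay signed.

Here is a hedged sketch of why, using a hypothetical table and hypothetical helper names. The `naive_less()` function is deliberately written the way -Wsign-compare complains about.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode table: -1 marks "not available on this chip". */
static const int opcode_table[] = { 0x10, -1, 0x2A };

/* Keeping the entry in an int makes "opc != -1" a plain signed comparison.
 * With "unsigned opc", -1 is converted to UINT_MAX first: equality happens
 * to still work, but -Wsign-compare flags it, and a test such as
 * "opc < 0" would silently be always false. */
bool opcode_available(unsigned idx)
{
   assert(idx < sizeof(opcode_table) / sizeof(opcode_table[0]));
   int opc = opcode_table[idx];
   return opc != -1;
}

/* Deliberately mixed comparison: s is converted to unsigned before '<'. */
bool naive_less(int s, unsigned u)
{
   return s < u;
}
```

`naive_less(-1, 1u)` returns false, because -1 becomes UINT_MAX before the comparison. This is exactly the class of surprise the warning exists to catch.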

No piglit changes, except for two tests that are unstable for me: dlist and
multiple-texture-reading.

As for how I managed to run piglit given that it hangs r600g: it always
hangs at the very end on the same test, and the results do get saved. So
even though a piglit run requires a reboot at the end, it still gives
most of the results, except perhaps for a few dozen tests at the end.

P.S. I don't have commit rights.

Konstantin Kharlamov (6):
  r600g: do not use "fast-clear" for small textures
  r600g: constify some variables
  nine: constify some variables
  st/glx: constify some variables
  r600g: some -Wsign-compare fixes
  r600g: fix unused variable warning

 src/gallium/drivers/r600/cayman_msaa.c|  2 +-
 src/gallium/drivers/r600/eg_debug.c   |  6 +++---
 src/gallium/drivers/r600/evergreen_state.c|  8 
 src/gallium/drivers/r600/r600_isa.c   |  6 +++---
 src/gallium/drivers/r600/r600_pipe.h  |  2 +-
 src/gallium/drivers/r600/r600_query.c |  2 +-
 src/gallium/drivers/r600/r600_state.c | 12 ++--
 src/gallium/drivers/r600/r600_state_common.c  |  6 ++
 src/gallium/drivers/r600/r600_test_dma.c  |  2 +-
 src/gallium/drivers/r600/r600_texture.c   | 10 ++
 src/gallium/drivers/r600/sb/sb_expr.cpp   |  2 +-
 src/gallium/state_trackers/glx/xlib/glx_getproc.c |  2 +-
 src/gallium/state_trackers/nine/nine_pipe.h   |  2 +-
 src/gallium/state_trackers/nine/nine_shader.c | 10 +-
 14 files changed, 40 insertions(+), 32 deletions(-)

-- 
2.15.1



[Mesa-dev] [PATCH 4/6] st/glx: constify some variables

2017-12-28 Thread Konstantin Kharlamov
Just a nice hint for both people and compilers.

Signed-off-by: Konstantin Kharlamov 
---
 src/gallium/state_trackers/glx/xlib/glx_getproc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/glx/xlib/glx_getproc.c b/src/gallium/state_trackers/glx/xlib/glx_getproc.c
index e7564ad9cd..b0f04ceebc 100644
--- a/src/gallium/state_trackers/glx/xlib/glx_getproc.c
+++ b/src/gallium/state_trackers/glx/xlib/glx_getproc.c
@@ -43,7 +43,7 @@ struct name_address_pair {
 };
 
 
-static struct name_address_pair GLX_functions[] = {
+static const struct name_address_pair GLX_functions[] = {
/*** GLX_VERSION_1_0 ***/
{ "glXChooseVisual", (__GLXextFuncPtr) glXChooseVisual },
{ "glXCopyContext", (__GLXextFuncPtr) glXCopyContext },
-- 
2.15.1
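The constified GLX_functions table is a name-to-function-pointer map that is only ever searched, never written. Below is a self-contained sketch of that pattern. It uses stand-in functions instead of the real GLX entry points and assumes a NULL terminator entry. `lookup_proc()` and the `fake_*` functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*func_ptr)(void);

/* Hypothetical stand-ins for real GLX entry points. */
static void fake_choose_visual(void) {}
static void fake_copy_context(void) {}

struct name_address_pair {
   const char *name;
   func_ptr address;
};

/* Never modified at run time, so it can be static const. */
static const struct name_address_pair functions[] = {
   { "glXChooseVisual", fake_choose_visual },
   { "glXCopyContext", fake_copy_context },
   { NULL, NULL } /* assumed terminator */
};

/* Linear search by name, in the spirit of a GetProcAddress lookup. */
func_ptr lookup_proc(const char *name)
{
   for (const struct name_address_pair *p = functions; p->name; p++) {
      if (strcmp(p->name, name) == 0)
         return p->address;
   }
   return NULL;
}
```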



[Mesa-dev] [PATCH 3/6] st/nine: constify some variables

2017-12-28 Thread Konstantin Kharlamov
Just a nice hint for both people and compilers.

Signed-off-by: Konstantin Kharlamov 
---
 src/gallium/state_trackers/nine/nine_pipe.h   |  2 +-
 src/gallium/state_trackers/nine/nine_shader.c | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/state_trackers/nine/nine_pipe.h b/src/gallium/state_trackers/nine/nine_pipe.h
index 6bd4a0c897..c8fef62827 100644
--- a/src/gallium/state_trackers/nine/nine_pipe.h
+++ b/src/gallium/state_trackers/nine/nine_pipe.h
@@ -201,7 +201,7 @@ compressed_format( D3DFORMAT fmt )
 static inline boolean
 depth_stencil_format( D3DFORMAT fmt )
 {
-static D3DFORMAT allowed[] = {
+static const D3DFORMAT allowed[] = {
 D3DFMT_D16_LOCKABLE,
 D3DFMT_D32,
 D3DFMT_D15S1,
diff --git a/src/gallium/state_trackers/nine/nine_shader.c b/src/gallium/state_trackers/nine/nine_shader.c
index cc667ebfbc..42f0566083 100644
--- a/src/gallium/state_trackers/nine/nine_shader.c
+++ b/src/gallium/state_trackers/nine/nine_shader.c
@@ -378,7 +378,7 @@ struct sm1_instruction
 struct sm1_src_param dst_rel[1];
 struct sm1_dst_param dst[1];
 
-struct sm1_op_info *info;
+const struct sm1_op_info *info;
 };
 
 static void
@@ -2901,7 +2901,7 @@ DECL_SPECIAL(COMMENT)
 #define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \
 { D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h }
 
-struct sm1_op_info inst_table[] =
+static const struct sm1_op_info inst_table[] =
 {
 _OPI(NOP, NOP, V(0,0), V(3,0), V(0,0), V(3,0), 0, 0, SPECIAL(NOP)), /* 0 */
 _OPI(MOV, MOV, V(0,0), V(3,0), V(0,0), V(3,0), 1, 1, NULL),
@@ -3008,10 +3008,10 @@ struct sm1_op_info inst_table[] =
 _OPI(BREAKP, BRK,  V(0,0), V(3,0), V(2,1), V(3,0), 0, 1, SPECIAL(BREAKP))
 };
 
-struct sm1_op_info inst_phase =
+static const struct sm1_op_info inst_phase =
 _OPI(PHASE, NOP, V(0,0), V(0,0), V(1,4), V(1,4), 0, 0, SPECIAL(PHASE));
 
-struct sm1_op_info inst_comment =
+static const struct sm1_op_info inst_comment =
 _OPI(COMMENT, NOP, V(0,0), V(3,0), V(0,0), V(3,0), 0, 0, SPECIAL(COMMENT));
 
 static void
@@ -3279,7 +3279,7 @@ sm1_parse_instruction(struct shader_translator *tx)
 struct sm1_instruction *insn = >insn;
 HRESULT hr;
 DWORD tok;
-struct sm1_op_info *info = NULL;
+const struct sm1_op_info *info = NULL;
 unsigned i;
 
 sm1_parse_comments(tx, TRUE);
-- 
2.15.1
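The `allowed[]` array in depth_stencil_format() is a fixed membership list, so nothing is lost by making it `static const`. Here is a sketch of that pattern. A hypothetical format enum stands in for D3DFORMAT, and the loop is an assumed, simplified version of how such a list is searched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical format codes standing in for D3DFORMAT values. */
enum fmt { FMT_D16, FMT_D24S8, FMT_D32, FMT_A8R8G8B8, FMT_R5G6B5 };

/* Membership test over a table that lives in read-only data. */
bool is_depth_stencil(enum fmt f)
{
   static const enum fmt allowed[] = { FMT_D16, FMT_D24S8, FMT_D32 };

   for (unsigned i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
      if (allowed[i] == f)
         return true;
   }
   return false;
}
```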



Re: [Mesa-dev] [PATCH 9/9] radv: Enable DCC with transfers.

2017-12-28 Thread Dieter Nützel

For the series:

Tested-by: Dieter Nützel 

on RX580 with

'smoketest': somewhat faster
'F1 2017': a little bit slower

BTW, you dropped all my Tested-by tags last time.

Dieter

On 29.12.2017 03:06, Bas Nieuwenhuizen wrote:

Before this, DCC was in practice disabled for most games. This
enables practical DCC use. Expect a 5-10% performance increase in a
number of games on Vega at 4K.
---
 src/amd/vulkan/radv_image.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index 6088928df80..d8cee6f6cba 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -148,8 +148,7 @@ radv_init_surface(struct radv_device *device,
}
}

-   if ((pCreateInfo->usage & (VK_IMAGE_USAGE_TRANSFER_SRC_BIT |
-  VK_IMAGE_USAGE_STORAGE_BIT)) ||
+   if ((pCreateInfo->usage & VK_IMAGE_USAGE_STORAGE_BIT) ||
(pCreateInfo->flags & VK_IMAGE_CREATE_EXTENDED_USAGE_BIT_KHR) ||
!dcc_compatible_formats ||
 (pCreateInfo->tiling == VK_IMAGE_TILING_LINEAR) ||



Re: [Mesa-dev] [PATCH 2/3] nir: Add a helper to get the uvec4 type.

2017-12-28 Thread Jason Ekstrand

Sure. Reviewed-by.


On December 28, 2017 19:56:55 Eric Anholt  wrote:


I needed this in the vc5 compiler.
---
 src/compiler/nir_types.cpp | 6 ++
 src/compiler/nir_types.h   | 1 +
 2 files changed, 7 insertions(+)

diff --git a/src/compiler/nir_types.cpp b/src/compiler/nir_types.cpp
index 377de0c9c7bd..cbdd452dc813 100644
--- a/src/compiler/nir_types.cpp
+++ b/src/compiler/nir_types.cpp
@@ -297,6 +297,12 @@ glsl_vec4_type(void)
return glsl_type::vec4_type;
 }

+const glsl_type *
+glsl_uvec4_type(void)
+{
+   return glsl_type::uvec4_type;
+}
+
 const glsl_type *
 glsl_int_type(void)
 {
diff --git a/src/compiler/nir_types.h b/src/compiler/nir_types.h
index daff97325093..4397c2406f9a 100644
--- a/src/compiler/nir_types.h
+++ b/src/compiler/nir_types.h
@@ -136,6 +136,7 @@ const struct glsl_type *glsl_double_type(void);
 const struct glsl_type *glsl_vec_type(unsigned n);
 const struct glsl_type *glsl_dvec_type(unsigned n);
 const struct glsl_type *glsl_vec4_type(void);
+const struct glsl_type *glsl_uvec4_type(void);
 const struct glsl_type *glsl_int_type(void);
 const struct glsl_type *glsl_uint_type(void);
 const struct glsl_type *glsl_int64_t_type(void);
--
2.15.0






[Mesa-dev] [PATCH 9/9] radv: Enable DCC with transfers.

2017-12-28 Thread Bas Nieuwenhuizen
Before this, DCC was in practice disabled for most games. This
enables practical DCC use. Expect a 5-10% performance increase in a
number of games on Vega at 4K.
---
 src/amd/vulkan/radv_image.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index 6088928df80..d8cee6f6cba 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -148,8 +148,7 @@ radv_init_surface(struct radv_device *device,
}
}
 
-   if ((pCreateInfo->usage & (VK_IMAGE_USAGE_TRANSFER_SRC_BIT |
-  VK_IMAGE_USAGE_STORAGE_BIT)) ||
+   if ((pCreateInfo->usage & VK_IMAGE_USAGE_STORAGE_BIT) ||
(pCreateInfo->flags & VK_IMAGE_CREATE_EXTENDED_USAGE_BIT_KHR) ||
!dcc_compatible_formats ||
 (pCreateInfo->tiling == VK_IMAGE_TILING_LINEAR) ||
-- 
2.15.1
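The following reduced model of the visible part of the DCC check in radv_init_surface() shows what dropping TRANSFER_SRC changes. Notes on the model:

- The flag values are redefined locally to match VkImageUsageFlagBits.
- The boolean parameters stand in for the other conditions.
- The real condition continues past the end of the quoted hunk.
- `dcc_disallowed()` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as VkImageUsageFlagBits, redefined so this compiles
 * without vulkan.h. */
#define USAGE_TRANSFER_SRC_BIT 0x00000001u
#define USAGE_TRANSFER_DST_BIT 0x00000002u
#define USAGE_STORAGE_BIT      0x00000008u

/* Returns true if DCC must be disabled. Before the patch the usage test
 * was (usage & (TRANSFER_SRC | STORAGE)); now only STORAGE disables it. */
bool dcc_disallowed(uint32_t usage, bool extended_usage,
                    bool dcc_compatible_formats, bool linear_tiling)
{
   return (usage & USAGE_STORAGE_BIT) ||
          extended_usage ||
          !dcc_compatible_formats ||
          linear_tiling;
}
```

An image used as a copy source, which many games create, no longer loses DCC. Storage images, mutable-format images and linear images still do.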



[Mesa-dev] [PATCH 1/9] radv: Don't enable DCC / TC compat HTILE for storage images.

2017-12-28 Thread Bas Nieuwenhuizen
We don't get a layout when binding to a descriptor set, but we can
assume that the layout is GENERAL.

For DCC, stores with the DCC bits set will result in a hang, so it is
better to be safe than sorry.
---
 src/amd/vulkan/radv_image.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index 40e6dfc3af1..aaf12bdcb16 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -240,7 +240,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
   const struct legacy_surf_level *base_level_info,
   unsigned base_level, unsigned first_level,
   unsigned block_width, bool is_stencil,
-  uint32_t *state)
+  bool is_storage_image, uint32_t *state)
 {
uint64_t gpu_address = image->bo ? radv_buffer_get_va(image->bo) + image->offset : 0;
uint64_t va = gpu_address;
@@ -264,11 +264,12 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
if (chip_class >= VI) {
state[6] &= C_008F28_COMPRESSION_EN;
state[7] = 0;
-   if (radv_vi_dcc_enabled(image, first_level)) {
+   if (!is_storage_image && radv_vi_dcc_enabled(image, first_level)) {
meta_va = gpu_address + image->dcc_offset;
if (chip_class <= VI)
meta_va += base_level_info->dcc_offset;
-   } else if(image->tc_compatible_htile && image->surface.htile_size) {
+   } else if(!is_storage_image && image->tc_compatible_htile &&
+ image->surface.htile_size) {
meta_va = gpu_address + image->htile_offset;
}
 
@@ -600,7 +601,7 @@ radv_query_opaque_metadata(struct radv_device *device,
   desc, NULL);
 
si_set_mutable_tex_desc_fields(device, image, >surface.u.legacy.level[0], 0, 0,
-  image->surface.blk_w, false, desc);
+  image->surface.blk_w, false, false, desc);
 
/* Clear the base address and set the relative DCC offset. */
desc[0] = 0;
@@ -1013,7 +1014,7 @@ radv_image_view_make_descriptor(struct radv_image_view *iview,
   base_level_info,
   iview->base_mip,
   iview->base_mip,
-  blk_w, is_stencil, descriptor);
+  blk_w, is_stencil, is_storage_image, descriptor);
 }
 
 void
-- 
2.15.1
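The metadata selection in si_set_mutable_tex_desc_fields() after this patch can be sketched as a pure function. Caveats:

- The struct and function names are hypothetical.
- The per-level DCC offset that chips up to VI add is left out.
- `dcc_enabled` stands in for radv_vi_dcc_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified image description. */
struct image_info {
   bool dcc_enabled;
   bool tc_compatible_htile;
   uint64_t htile_size;
   uint64_t dcc_offset;
   uint64_t htile_offset;
};

/* Returns the metadata address for the descriptor, or 0 for none.
 * Storage images never get DCC/HTILE metadata, since stores with the
 * DCC bits set can hang the GPU. */
uint64_t select_meta_va(const struct image_info *img, uint64_t gpu_address,
                        bool is_storage_image)
{
   if (!is_storage_image && img->dcc_enabled)
      return gpu_address + img->dcc_offset;
   if (!is_storage_image && img->tc_compatible_htile && img->htile_size)
      return gpu_address + img->htile_offset;
   return 0;
}
```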



[Mesa-dev] [PATCH 2/9] radv: Add GFX DCC decompress.

2017-12-28 Thread Bas Nieuwenhuizen
---
 src/amd/vulkan/radv_meta_fast_clear.c | 94 ++-
 src/amd/vulkan/radv_private.h |  1 +
 2 files changed, 83 insertions(+), 12 deletions(-)
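The create_pipeline() hunk extends a goto-based error-unwinding chain. Each newly created pipeline gets a label that destroys it. The labels are ordered so that a failure falls through and destroys everything created before it.

Below is a self-contained sketch of that shape. A hypothetical object API replaces radv_graphics_pipeline_create() and radv_DestroyPipeline().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object API: create_obj() fails on request, and
 * live_objects counts objects that have not been destroyed. */
static int live_objects;

static bool create_obj(int *obj, bool fail)
{
   if (fail)
      return false;
   *obj = 1;
   live_objects++;
   return true;
}

static void destroy_obj(int *obj)
{
   if (*obj) {
      *obj = 0;
      live_objects--;
   }
}

struct pipelines { int cmask, fmask, dcc; };

/* Create three objects; on failure, unwind only what was created,
 * in reverse order, by falling through the cleanup labels. */
bool create_all(struct pipelines *p, int fail_at)
{
   *p = (struct pipelines){ 0, 0, 0 };

   if (!create_obj(&p->cmask, fail_at == 0))
      goto fail;
   if (!create_obj(&p->fmask, fail_at == 1))
      goto cleanup_cmask;
   if (!create_obj(&p->dcc, fail_at == 2))
      goto cleanup_fmask;
   return true;

cleanup_fmask:
   destroy_obj(&p->fmask);
cleanup_cmask:
   destroy_obj(&p->cmask);
fail:
   return false;
}
```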

diff --git a/src/amd/vulkan/radv_meta_fast_clear.c b/src/amd/vulkan/radv_meta_fast_clear.c
index 1acf510359d..44c2ff52617 100644
--- a/src/amd/vulkan/radv_meta_fast_clear.c
+++ b/src/amd/vulkan/radv_meta_fast_clear.c
@@ -250,7 +250,55 @@ create_pipeline(struct radv_device *device,
if (result != VK_SUCCESS)
goto cleanup_cmask;
 
+   result = radv_graphics_pipeline_create(device_h,
+  radv_pipeline_cache_to_handle(>meta_state.cache),
+  &(VkGraphicsPipelineCreateInfo) {
+  .sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,
+  .stageCount = 2,
+  .pStages = stages,
+
+  .pVertexInputState = _state,
+  .pInputAssemblyState = _state,
+
+  .pViewportState = &(VkPipelineViewportStateCreateInfo) {
+  .sType = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO,
+  .viewportCount = 1,
+  .scissorCount = 1,
+  },
+  .pRasterizationState = _state,
+  .pMultisampleState = &(VkPipelineMultisampleStateCreateInfo) {
+  .sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,
+  .rasterizationSamples = 1,
+  .sampleShadingEnable = false,
+  .pSampleMask = NULL,
+  .alphaToCoverageEnable = false,
+  .alphaToOneEnable = false,
+  },
+   .pColorBlendState = _state,
+   .pDynamicState = &(VkPipelineDynamicStateCreateInfo) {
+   .sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,
+   .dynamicStateCount = 2,
+   .pDynamicStates = (VkDynamicState[]) {
+   VK_DYNAMIC_STATE_VIEWPORT,
+   VK_DYNAMIC_STATE_SCISSOR,
+   },
+   },
+   .layout = layout,
+   .renderPass = device->meta_state.fast_clear_flush.pass,
+   .subpass = 0,
+  },
+  &(struct radv_graphics_pipeline_create_info) {
+  .use_rectlist = true,
+  .custom_blend_mode = V_028808_CB_DCC_DECOMPRESS,
+  },
+  >meta_state.alloc,
+  >meta_state.fast_clear_flush.dcc_decompress_pipeline);
+   if (result != VK_SUCCESS)
+   goto cleanup_fmask;
+
goto cleanup;
+cleanup_fmask:
radv_DestroyPipeline(device_h, device->meta_state.fast_clear_flush.fmask_decompress_pipeline, >meta_state.alloc);
 cleanup_cmask:
radv_DestroyPipeline(device_h, device->meta_state.fast_clear_flush.cmask_eliminate_pipeline, >meta_state.alloc);
 cleanup:
@@ -263,17 +311,20 @@ radv_device_finish_meta_fast_clear_flush_state(struct radv_device *device)
 {
struct radv_meta_state *state = >meta_state;
 
-   radv_DestroyRenderPass(radv_device_to_handle(device),
-  state->fast_clear_flush.pass, >alloc);
-   radv_DestroyPipelineLayout(radv_device_to_handle(device),
-  state->fast_clear_flush.p_layout,
-  >alloc);
radv_DestroyPipeline(radv_device_to_handle(device),
-state->fast_clear_flush.cmask_eliminate_pipeline,
+

[Mesa-dev] [PATCH 6/9] radv: Don't init DCC metadata during FS resolve.

2017-12-28 Thread Bas Nieuwenhuizen
It should already be valid there, and the RB will update it during
rendering.
---
 src/amd/vulkan/radv_meta_resolve_fs.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/src/amd/vulkan/radv_meta_resolve_fs.c b/src/amd/vulkan/radv_meta_resolve_fs.c
index 798129ec854..99314d94e53 100644
--- a/src/amd/vulkan/radv_meta_resolve_fs.c
+++ b/src/amd/vulkan/radv_meta_resolve_fs.c
@@ -629,13 +629,8 @@ radv_cmd_buffer_resolve_subpass_fs(struct radv_cmd_buffer *cmd_buffer)
continue;
 
struct radv_image_view *dest_iview = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment;
-   struct radv_image *dst_img = dest_iview->image;
struct radv_image_view *src_iview = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment;
 
-   if (dst_img->surface.dcc_size) {
-   radv_initialize_dcc(cmd_buffer, dst_img, 0x);
-   cmd_buffer->state.attachments[dest_att.attachment].current_layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
-   }
{
VkImageSubresourceRange range;
range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
-- 
2.15.1



[Mesa-dev] [PATCH 4/9] radv: Add compute DCC decompress.

2017-12-28 Thread Bas Nieuwenhuizen
We do an in place copy where we read compressed and write decompressed.
By doing this in sizes that cover entire DCC blocks and waiting for all
reads in the block before starting to write we avoid corruption.

In the end we clear the DCC metadata to 0x.
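Why reading a whole block before writing matters can be shown with a small CPU-side toy in C. The encoding below is invented for illustration (a per-block running-sum code, not the real DCC format), and `TOY_BLOCK`, `toy_decode_texel()` and the other names are hypothetical. The point is only that once one texel of a still-compressed block is overwritten, later reads of that block decode garbage, whereas buffering the block first (the role of the barrier in the shader) is safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy block size. Real DCC blocks are 2D tiles, hence the 16x16x1
 * workgroup in the shader. */
#define TOY_BLOCK 4

/* Toy encoding, NOT the real DCC format: a compressed block stores deltas,
 * so texel i is the running sum of the block's first i+1 words.
 * meta[b] != 0 marks block b as compressed. */
static uint32_t toy_decode_texel(const uint32_t *data, const uint8_t *meta,
                                 size_t idx)
{
   size_t base = idx - idx % TOY_BLOCK;
   uint32_t v = 0;

   if (!meta[idx / TOY_BLOCK])
      return data[idx];
   for (size_t j = base; j <= idx; j++)
      v += data[j];
   return v;
}

/* Unsafe: writes each texel as soon as it is decoded, so later reads in the
 * same block see already-decompressed words. */
static void toy_decompress_naive(uint32_t *data, uint8_t *meta, size_t n)
{
   for (size_t i = 0; i < n; i++)
      data[i] = toy_decode_texel(data, meta, i);
   for (size_t b = 0; b < n / TOY_BLOCK; b++)
      meta[b] = 0;
}

/* Safe: read the whole block first (the barrier in the shader), then write
 * it back and mark it uncompressed. */
static void toy_decompress_in_place(uint32_t *data, uint8_t *meta, size_t n)
{
   assert(n % TOY_BLOCK == 0);
   for (size_t base = 0; base < n; base += TOY_BLOCK) {
      uint32_t tmp[TOY_BLOCK];

      for (size_t i = 0; i < TOY_BLOCK; i++)
         tmp[i] = toy_decode_texel(data, meta, base + i);
      /* All reads of this block are done; overwriting is now safe. */
      for (size_t i = 0; i < TOY_BLOCK; i++)
         data[base + i] = tmp[i];
      meta[base / TOY_BLOCK] = 0;
   }
}
```

Running both variants on the block {5, 1, 1, 1} shows the difference: the buffered version yields 5, 6, 7, 8, while the naive one goes wrong from the third texel onward.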
---
 src/amd/vulkan/radv_meta.h|   3 +
 src/amd/vulkan/radv_meta_fast_clear.c | 268 ++
 src/amd/vulkan/radv_private.h |   4 +
 3 files changed, 275 insertions(+)

diff --git a/src/amd/vulkan/radv_meta.h b/src/amd/vulkan/radv_meta.h
index 3edf5fa6461..9f3198e8797 100644
--- a/src/amd/vulkan/radv_meta.h
+++ b/src/amd/vulkan/radv_meta.h
@@ -171,6 +171,9 @@ void radv_resummarize_depth_image_inplace(struct radv_cmd_buffer *cmd_buffer,
 void radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
 struct radv_image *image,
 const VkImageSubresourceRange *subresourceRange);
+void radv_decompress_dcc(struct radv_cmd_buffer *cmd_buffer,
+   struct radv_image *image,
+const VkImageSubresourceRange *subresourceRange);
 
 void radv_meta_resolve_compute_image(struct radv_cmd_buffer *cmd_buffer,
 struct radv_image *src_image,
diff --git a/src/amd/vulkan/radv_meta_fast_clear.c b/src/amd/vulkan/radv_meta_fast_clear.c
index 2603229a1f7..98e8f6ac18a 100644
--- a/src/amd/vulkan/radv_meta_fast_clear.c
+++ b/src/amd/vulkan/radv_meta_fast_clear.c
@@ -28,6 +28,160 @@
 #include "radv_private.h"
 #include "sid.h"
 
+
+static nir_shader *
+build_dcc_decompress_compute_shader(struct radv_device *dev)
+{
+   nir_builder b;
+   const struct glsl_type *buf_type = glsl_sampler_type(GLSL_SAMPLER_DIM_2D,
+                                                        false,
+                                                        false,
+                                                        GLSL_TYPE_FLOAT);
+   const struct glsl_type *img_type = glsl_sampler_type(GLSL_SAMPLER_DIM_2D,
+                                                        false,
+                                                        false,
+                                                        GLSL_TYPE_FLOAT);
+   nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
+   b.shader->info.name = ralloc_strdup(b.shader, "dcc_decompress_compute");
+
+   /* We need at least 16/16/1 to cover an entire DCC block in a single workgroup. */
+   b.shader->info.cs.local_size[0] = 16;
+   b.shader->info.cs.local_size[1] = 16;
+   b.shader->info.cs.local_size[2] = 1;
+   nir_variable *input_img = nir_variable_create(b.shader, nir_var_uniform,
+                                                 buf_type, "s_tex");
+   input_img->data.descriptor_set = 0;
+   input_img->data.binding = 0;
+
+   nir_variable *output_img = nir_variable_create(b.shader, nir_var_uniform,
+                                                  img_type, "out_img");
+   output_img->data.descriptor_set = 0;
+   output_img->data.binding = 1;
+
+   nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
+   nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
+   nir_ssa_def *block_size = nir_imm_ivec4(&b,
+                                           b.shader->info.cs.local_size[0],
+                                           b.shader->info.cs.local_size[1],
+                                           b.shader->info.cs.local_size[2], 0);
+
+   nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
+
+   nir_tex_instr *tex = nir_tex_instr_create(b.shader, 2);
+   tex->sampler_dim = GLSL_SAMPLER_DIM_2D;
+   tex->op = nir_texop_txf;
+   tex->src[0].src_type = nir_tex_src_coord;
+   tex->src[0].src = nir_src_for_ssa(nir_channels(&b, global_id, 3));
+   tex->src[1].src_type = nir_tex_src_lod;
+   tex->src[1].src = nir_src_for_ssa(nir_imm_int(&b, 0));
+   tex->dest_type = nir_type_float;
+   tex->is_array = false;
+   tex->coord_components = 2;
+   tex->texture = nir_deref_var_create(tex, input_img);
+   tex->sampler = NULL;
+
+   nir_ssa_dest_init(&tex->instr, &tex->dest, 4, 32, "tex");
+   nir_builder_instr_insert(&b, &tex->instr);
+
+   nir_intrinsic_instr *membar = nir_intrinsic_instr_create(b.shader, nir_intrinsic_memory_barrier);
+   nir_builder_instr_insert(&b, &membar->instr);
+
+   nir_intrinsic_instr *bar = nir_intrinsic_instr_create(b.shader, nir_intrinsic_barrier);
+   nir_builder_instr_insert(&b, &bar->instr);
+
+   nir_ssa_def *outval = &tex->dest.ssa;
+   nir_intrinsic_instr *store = nir_intrinsic_instr_create(b.shader, nir_intrinsic_image_store);
+   store->src[0] = nir_src_for_ssa(global_id);
+   store->src[1] = nir_src_for_ssa(nir_ssa_undef(&b, 1, 32));
+   

[Mesa-dev] [PATCH 3/9] radv: Use the meta fast clear destructor on construction failure.

2017-12-28 Thread Bas Nieuwenhuizen
Simplifies failure paths. The caller already calls
radv_device_finish_meta_fast_clear_flush_state on failure.
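The simplification relies on the destructor tolerating partially initialised state. A minimal sketch of that pattern, with hypothetical names and `malloc()`/`free()` standing in for pipeline creation and destruction (Vulkan likewise allows destroying a `VK_NULL_HANDLE` pipeline):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fast-clear meta state: two "pipelines". */
struct toy_meta_state {
   int *cmask_eliminate;
   int *fmask_decompress;
};

/* Test knob: force the second creation to fail. */
static int toy_fail_second;

static int toy_create(int **out, int fail)
{
   if (fail)
      return -1;
   *out = malloc(sizeof(**out));
   return *out ? 0 : -1;
}

/* The destructor accepts partially built state: free(NULL) is a no-op. */
static void toy_finish(struct toy_meta_state *s)
{
   assert(s != NULL);
   free(s->cmask_eliminate);
   free(s->fmask_decompress);
   s->cmask_eliminate = NULL;
   s->fmask_decompress = NULL;
}

/* The constructor no longer unwinds what it created: every failure jumps to
 * one label that would only release temporaries, and the caller is expected
 * to call toy_finish() on error. */
static int toy_init(struct toy_meta_state *s)
{
   int result = toy_create(&s->cmask_eliminate, 0);
   if (result != 0)
      goto cleanup;

   result = toy_create(&s->fmask_decompress, toy_fail_second);
   if (result != 0)
      goto cleanup;

cleanup:
   return result;
}
```

The state has to start zeroed for this to work; the caller then runs `toy_finish()` on any failure, which is the role `radv_device_finish_meta_fast_clear_flush_state` plays here.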
---
 src/amd/vulkan/radv_meta_fast_clear.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/src/amd/vulkan/radv_meta_fast_clear.c b/src/amd/vulkan/radv_meta_fast_clear.c
index 44c2ff52617..2603229a1f7 100644
--- a/src/amd/vulkan/radv_meta_fast_clear.c
+++ b/src/amd/vulkan/radv_meta_fast_clear.c
@@ -248,7 +248,7 @@ create_pipeline(struct radv_device *device,
   &device->meta_state.alloc,
   &device->meta_state.fast_clear_flush.fmask_decompress_pipeline);
if (result != VK_SUCCESS)
-   goto cleanup_cmask;
+   goto cleanup;
 
result = radv_graphics_pipeline_create(device_h,
   radv_pipeline_cache_to_handle(&device->meta_state.cache),
@@ -294,13 +294,10 @@ create_pipeline(struct radv_device *device,
   &device->meta_state.alloc,
   &device->meta_state.fast_clear_flush.dcc_decompress_pipeline);
if (result != VK_SUCCESS)
-   goto cleanup_fmask;
+   goto cleanup;
 
goto cleanup;
-cleanup_fmask:
-   radv_DestroyPipeline(device_h, device->meta_state.fast_clear_flush.fmask_decompress_pipeline, &device->meta_state.alloc);
-cleanup_cmask:
-   radv_DestroyPipeline(device_h, device->meta_state.fast_clear_flush.cmask_eliminate_pipeline, &device->meta_state.alloc);
+
 cleanup:
ralloc_free(fs_module.nir);
return result;
-- 
2.15.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/9] radv: Disable DCC for GENERAL layout and compute transfer dest.

2017-12-28 Thread Bas Nieuwenhuizen
Apps can use this for render feedback loops, where the results are
defined as long as each pixel is rendered only once. However, DCC breaks
here because its coherence granularity is a block, not a pixel, so disable it.

This will also help with implementing other things.

Even if we later optimize this to only happen when there actually is
a loop (if that is possible at all ...), the machinery is still useful
for excluding images accessible by the SDMA queue once that is implemented.
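The per-draw check can be sketched in C as below. Everything here is a simplified stand-in: the enum, the bit position and the `compute_queue_may_access` flag are assumptions, and the real predicate in this series is `radv_layout_dcc_compressed()`, which takes a queue family mask instead of a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down layout enum (not the Vulkan numbering). */
enum toy_layout {
   TOY_LAYOUT_GENERAL,
   TOY_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
   TOY_LAYOUT_TRANSFER_DST_OPTIMAL,
};

/* Illustrative bit position only; the real field is DCC_ENABLE in
 * CB_COLORn_INFO. */
#define TOY_DCC_ENABLE (1u << 28)

/* Rough model of the predicate: no DCC in GENERAL (feedback loops need
 * per-pixel coherence) and none for transfer destinations that the compute
 * queue may write. */
static bool toy_layout_dcc_compressed(bool image_has_dcc, enum toy_layout layout,
                                      bool compute_queue_may_access)
{
   if (!image_has_dcc || layout == TOY_LAYOUT_GENERAL)
      return false;
   if (layout == TOY_LAYOUT_TRANSFER_DST_OPTIMAL && compute_queue_may_access)
      return false;
   return true;
}

/* Like radv_emit_fb_color_state(): start from the precomputed register value
 * and clear the enable bit when the layout forbids compression. */
static uint32_t toy_cb_color_info(uint32_t precomputed, bool image_has_dcc,
                                  enum toy_layout layout, bool compute_access)
{
   uint32_t info = precomputed;

   assert(!(info & TOY_DCC_ENABLE) || image_has_dcc);
   if (!toy_layout_dcc_compressed(image_has_dcc, layout, compute_access))
      info &= ~TOY_DCC_ENABLE;
   return info;
}
```

Keeping the precomputed value untouched and masking a local copy mirrors the diff's `cb_color_info` local, so the same attachment can be emitted with or without DCC depending on the subpass layout.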
---
 src/amd/vulkan/radv_cmd_buffer.c   | 29 +++--
 src/amd/vulkan/radv_image.c| 12 
 src/amd/vulkan/radv_meta_resolve.c | 10 --
 src/amd/vulkan/radv_private.h  |  4 
 4 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 42468bceed2..c735d201802 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -1184,10 +1184,20 @@ radv_emit_depth_biais(struct radv_cmd_buffer *cmd_buffer)
 static void
 radv_emit_fb_color_state(struct radv_cmd_buffer *cmd_buffer,
 int index,
-struct radv_attachment_info *att)
+struct radv_attachment_info *att,
+struct radv_image *image,
+VkImageLayout layout)
 {
bool is_vi = cmd_buffer->device->physical_device->rad_info.chip_class >= VI;
struct radv_color_buffer_info *cb = &att->cb;
+   uint32_t cb_color_info = cb->cb_color_info;
+
+   if (!radv_layout_dcc_compressed(image, layout,
+                                   radv_image_queue_family_mask(image,
+                                                                cmd_buffer->queue_family_index,
+                                                                cmd_buffer->queue_family_index))) {
+   cb_color_info &= C_028C70_DCC_ENABLE;
+   }
 
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) {
radeon_set_context_reg_seq(cmd_buffer->cs, R_028C60_CB_COLOR0_BASE + index * 0x3c, 11);
@@ -1195,7 +1205,7 @@ radv_emit_fb_color_state(struct radv_cmd_buffer *cmd_buffer,
radeon_emit(cmd_buffer->cs, cb->cb_color_base >> 32);
radeon_emit(cmd_buffer->cs, cb->cb_color_attrib2);
radeon_emit(cmd_buffer->cs, cb->cb_color_view);
-   radeon_emit(cmd_buffer->cs, cb->cb_color_info);
+   radeon_emit(cmd_buffer->cs, cb_color_info);
radeon_emit(cmd_buffer->cs, cb->cb_color_attrib);
radeon_emit(cmd_buffer->cs, cb->cb_dcc_control);
radeon_emit(cmd_buffer->cs, cb->cb_color_cmask);
@@ -1215,7 +1225,7 @@ radv_emit_fb_color_state(struct radv_cmd_buffer *cmd_buffer,
radeon_emit(cmd_buffer->cs, cb->cb_color_pitch);
radeon_emit(cmd_buffer->cs, cb->cb_color_slice);
radeon_emit(cmd_buffer->cs, cb->cb_color_view);
-   radeon_emit(cmd_buffer->cs, cb->cb_color_info);
+   radeon_emit(cmd_buffer->cs, cb_color_info);
radeon_emit(cmd_buffer->cs, cb->cb_color_attrib);
radeon_emit(cmd_buffer->cs, cb->cb_dcc_control);
radeon_emit(cmd_buffer->cs, cb->cb_color_cmask);
@@ -1461,13 +1471,15 @@ radv_emit_framebuffer_state(struct radv_cmd_buffer *cmd_buffer)
 
int idx = subpass->color_attachments[i].attachment;
struct radv_attachment_info *att = 
>attachments[idx];
+   struct radv_image *image = att->attachment->image;
+   VkImageLayout layout = subpass->color_attachments[i].layout;
 
radv_cs_add_buffer(cmd_buffer->device->ws, cmd_buffer->cs, att->attachment->bo, 8);
 
assert(att->attachment->aspect_mask & VK_IMAGE_ASPECT_COLOR_BIT);
-   radv_emit_fb_color_state(cmd_buffer, i, att);
+   radv_emit_fb_color_state(cmd_buffer, i, att, image, layout);
 
-   radv_load_color_clear_regs(cmd_buffer, att->attachment->image, i);
+   radv_load_color_clear_regs(cmd_buffer, image, i);
}
 
 if(subpass->depth_stencil_attachment.attachment != VK_ATTACHMENT_UNUSED) {
@@ -3878,7 +3890,12 @@ static void radv_handle_dcc_image_transition(struct radv_cmd_buffer *cmd_buffer,
 const VkImageSubresourceRange *range)
 {
if (src_layout == VK_IMAGE_LAYOUT_UNDEFINED) {
-   radv_initialize_dcc(cmd_buffer, image, 0x20202020u);
+   radv_initialize_dcc(cmd_buffer, image,
+                       radv_layout_dcc_compressed(image, dst_layout, dst_queue_mask) ?
+                       0x20202020u : 0xu);
+   } else if (radv_layout_dcc_compressed(image, src_layout, src_queue_mask) &&
+              !radv_layout_dcc_compressed(image, dst_layout, dst_queue_mask)) {
+  

[Mesa-dev] [PATCH 5/9] radv: Make color meta operations layout aware.

2017-12-28 Thread Bas Nieuwenhuizen
For fast clear eliminate and decompressions, we always use the most compressed
format.

For clears, the code already creates a renderpass on demand with the exact same
layout as specified.

Otherwise we start distinguishing between GENERAL and TRANSFER_DST_OPTIMAL.
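The render-pass selection this introduces can be sketched as a two-way bucketing of the destination layout. The helper names mirror `radv_meta_dst_layout_from_layout()`/`_to_layout()` from the diff, but the enums, the integer stand-ins for render-pass handles and the exact mapping (GENERAL versus everything else, with the compressed variant created in COLOR_ATTACHMENT_OPTIMAL) are assumptions.

```c
#include <assert.h>

/* Hypothetical subset of image layouts (not the Vulkan numbering). */
enum toy_image_layout {
   TOY_IMAGE_LAYOUT_GENERAL,
   TOY_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
   TOY_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
};

/* Two render-pass variants per format key: GENERAL (uncompressed) and
 * "optimal" (compressed). */
enum toy_meta_dst_layout {
   TOY_META_DST_LAYOUT_GENERAL,
   TOY_META_DST_LAYOUT_OPTIMAL,
   TOY_META_DST_LAYOUT_COUNT,
};

#define TOY_NUM_FS_KEYS 4

static enum toy_meta_dst_layout
toy_meta_dst_layout_from_layout(enum toy_image_layout layout)
{
   return layout == TOY_IMAGE_LAYOUT_GENERAL ? TOY_META_DST_LAYOUT_GENERAL
                                             : TOY_META_DST_LAYOUT_OPTIMAL;
}

/* Layout each render-pass variant is created with. */
static enum toy_image_layout
toy_meta_dst_layout_to_layout(enum toy_meta_dst_layout layout)
{
   assert(layout < TOY_META_DST_LAYOUT_COUNT);
   return layout == TOY_META_DST_LAYOUT_GENERAL ? TOY_IMAGE_LAYOUT_GENERAL
                                                : TOY_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
}

/* Stand-in for render_pass[fs_key][dst_layout]: ints instead of handles. */
static int toy_render_pass[TOY_NUM_FS_KEYS][TOY_META_DST_LAYOUT_COUNT];

static void toy_init_render_passes(void)
{
   for (int i = 0; i < TOY_NUM_FS_KEYS; i++)
      for (int j = 0; j < TOY_META_DST_LAYOUT_COUNT; j++)
         toy_render_pass[i][j] = i * TOY_META_DST_LAYOUT_COUNT + j;
}

static int toy_pick_render_pass(unsigned fs_key, enum toy_image_layout dst)
{
   assert(fs_key < TOY_NUM_FS_KEYS);
   return toy_render_pass[fs_key][toy_meta_dst_layout_from_layout(dst)];
}
```

Bucketing keeps the number of precreated render passes at two per format key instead of one per Vulkan layout.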
---
 src/amd/vulkan/radv_meta_blit.c   | 76 ++
 src/amd/vulkan/radv_meta_blit2d.c | 77 +++
 src/amd/vulkan/radv_meta_fast_clear.c |  6 +--
 src/amd/vulkan/radv_meta_resolve_fs.c | 74 +
 src/amd/vulkan/radv_private.h | 22 --
 5 files changed, 145 insertions(+), 110 deletions(-)

diff --git a/src/amd/vulkan/radv_meta_blit.c b/src/amd/vulkan/radv_meta_blit.c
index 1f5f6ff739d..3ff48498d80 100644
--- a/src/amd/vulkan/radv_meta_blit.c
+++ b/src/amd/vulkan/radv_meta_blit.c
@@ -325,11 +325,12 @@ meta_emit_blit(struct radv_cmd_buffer *cmd_buffer,
switch (src_iview->aspect_mask) {
case VK_IMAGE_ASPECT_COLOR_BIT: {
unsigned fs_key = radv_format_meta_fs_key(dest_image->vk_format);
+   unsigned dst_layout = radv_meta_dst_layout_from_layout(dest_image_layout);
 
radv_CmdBeginRenderPass(radv_cmd_buffer_to_handle(cmd_buffer),
  &(VkRenderPassBeginInfo) {
  .sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
- .renderPass = device->meta_state.blit.render_pass[fs_key],
+ .renderPass = device->meta_state.blit.render_pass[fs_key][dst_layout],
  .framebuffer = fb,
  .renderArea = {
  .offset = { dest_box.offset.x, dest_box.offset.y },
@@ -644,9 +645,11 @@ radv_device_finish_meta_blit_state(struct radv_device *device)
struct radv_meta_state *state = &device->meta_state;
 
for (unsigned i = 0; i < NUM_META_FS_KEYS; ++i) {
-   radv_DestroyRenderPass(radv_device_to_handle(device),
-  state->blit.render_pass[i],
-  >alloc);
+   for (unsigned j = 0; j < RADV_META_DST_LAYOUT_COUNT; ++j) {
+   radv_DestroyRenderPass(radv_device_to_handle(device),
+  state->blit.render_pass[i][j],
+  >alloc);
+   }
radv_DestroyPipeline(radv_device_to_handle(device),
 state->blit.pipeline_1d_src[i],
 >alloc);
@@ -717,38 +720,41 @@ radv_device_init_meta_blit_color(struct radv_device *device,
 
for (unsigned i = 0; i < ARRAY_SIZE(pipeline_formats); ++i) {
unsigned key = radv_format_meta_fs_key(pipeline_formats[i]);
-   result = radv_CreateRenderPass(radv_device_to_handle(device),
-   &(VkRenderPassCreateInfo) {
-   .sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,
-   .attachmentCount = 1,
-   .pAttachments = &(VkAttachmentDescription) {
-   .format = pipeline_formats[i],
-   .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
-   .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
-   .initialLayout = VK_IMAGE_LAYOUT_GENERAL,
-   .finalLayout = VK_IMAGE_LAYOUT_GENERAL,
-   },
-   .subpassCount = 1,
-   .pSubpasses = &(VkSubpassDescription) {
-   .pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
-   .inputAttachmentCount = 0,
-   .colorAttachmentCount = 1,
-   .pColorAttachments = &(VkAttachmentReference) {
-   .attachment = 0,
-   .layout = VK_IMAGE_LAYOUT_GENERAL,
+   for(unsigned j = 0; j < RADV_META_DST_LAYOUT_COUNT; ++j) {
+   VkImageLayout layout = radv_meta_dst_layout_to_layout(j);
+   result = 

[Mesa-dev] [PATCH 8/9] radv: Decompress copy destination if formats are incompatible.

2017-12-28 Thread Bas Nieuwenhuizen
If both source and destination are DCC compressed, and their formats
are not compatible, we need to decompress one of them to make
sure we can do reinterpretation (which needs src format == dst format).
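The format choice in this patch reduces to a three-way decision. A sketch of just that logic in C, with hypothetical names, where the booleans stand for the results of `radv_layout_dcc_compressed()` and `radv_dcc_formats_compatible()`:

```c
#include <assert.h>
#include <stdbool.h>

enum toy_copy_plan {
   TOY_COPY_AS_DST_FORMAT,  /* reinterpret the source as the dst format */
   TOY_COPY_AS_SRC_FORMAT,  /* reinterpret the destination instead */
   TOY_COPY_DECOMPRESS_DST, /* decompress dst, then use the src format */
};

/* Same decision order as the meta_copy_image() hunk. */
static enum toy_copy_plan toy_plan_copy(bool src_compressed, bool dst_compressed,
                                        bool formats_compatible)
{
   if (!src_compressed || formats_compatible)
      return TOY_COPY_AS_DST_FORMAT;
   if (!dst_compressed)
      return TOY_COPY_AS_SRC_FORMAT;
   return TOY_COPY_DECOMPRESS_DST;
}
```

Only the last case costs an extra pass; in the other two, whichever side is uncompressed (or a compatible format) can simply be viewed in the other side's format.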
---
 src/amd/vulkan/radv_meta_copy.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_meta_copy.c b/src/amd/vulkan/radv_meta_copy.c
index 29951f2ba44..7f7ef22efc8 100644
--- a/src/amd/vulkan/radv_meta_copy.c
+++ b/src/amd/vulkan/radv_meta_copy.c
@@ -369,8 +369,31 @@ meta_copy_image(struct radv_cmd_buffer *cmd_buffer,
dest_image_layout,

[r].dstSubresource);
 
-   /* for DCC */
-   b_src.format = b_dst.format;
+   uint32_t dst_queue_mask = radv_image_queue_family_mask(dest_image,
+                                                          cmd_buffer->queue_family_index,
+                                                          cmd_buffer->queue_family_index);
+   bool dst_compressed = radv_layout_dcc_compressed(dest_image, dest_image_layout, dst_queue_mask);
+   uint32_t src_queue_mask = radv_image_queue_family_mask(src_image,
+                                                          cmd_buffer->queue_family_index,
+                                                          cmd_buffer->queue_family_index);
+   bool src_compressed = radv_layout_dcc_compressed(src_image, src_image_layout, src_queue_mask);
+
+   if (!src_compressed || radv_dcc_formats_compatible(b_src.format, b_dst.format)) {
+   b_src.format = b_dst.format;
+   } else if (!dst_compressed) {
+   b_dst.format = b_src.format;
+   } else {
+   radv_decompress_dcc(cmd_buffer, dest_image, &(VkImageSubresourceRange) {
+   .aspectMask = pRegions[r].dstSubresource.aspectMask,
+   .baseMipLevel = pRegions[r].dstSubresource.mipLevel,
+   .levelCount = 1,
+   .baseArrayLayer = pRegions[r].dstSubresource.baseArrayLayer,
+   .layerCount = pRegions[r].dstSubresource.layerCount,
+   });
+   b_dst.format = b_src.format;
+   b_dst.current_layout = VK_IMAGE_LAYOUT_GENERAL;
+   }
+
 
/**
 * From the Vulkan 1.0.6 spec: 18.4 Copying Data Between Buffers and Images
-- 
2.15.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] nir: Add a helper to get the uvec4 type.

2017-12-28 Thread Eric Anholt
I needed this in the vc5 compiler.
---
 src/compiler/nir_types.cpp | 6 ++
 src/compiler/nir_types.h   | 1 +
 2 files changed, 7 insertions(+)

diff --git a/src/compiler/nir_types.cpp b/src/compiler/nir_types.cpp
index 377de0c9c7bd..cbdd452dc813 100644
--- a/src/compiler/nir_types.cpp
+++ b/src/compiler/nir_types.cpp
@@ -297,6 +297,12 @@ glsl_vec4_type(void)
return glsl_type::vec4_type;
 }
 
+const glsl_type *
+glsl_uvec4_type(void)
+{
+   return glsl_type::uvec4_type;
+}
+
 const glsl_type *
 glsl_int_type(void)
 {
diff --git a/src/compiler/nir_types.h b/src/compiler/nir_types.h
index daff97325093..4397c2406f9a 100644
--- a/src/compiler/nir_types.h
+++ b/src/compiler/nir_types.h
@@ -136,6 +136,7 @@ const struct glsl_type *glsl_double_type(void);
 const struct glsl_type *glsl_vec_type(unsigned n);
 const struct glsl_type *glsl_dvec_type(unsigned n);
 const struct glsl_type *glsl_vec4_type(void);
+const struct glsl_type *glsl_uvec4_type(void);
 const struct glsl_type *glsl_int_type(void);
 const struct glsl_type *glsl_uint_type(void);
 const struct glsl_type *glsl_int64_t_type(void);
-- 
2.15.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] util: use zlib's CRC32 implementation for larger buffers

2017-12-28 Thread Grazvydas Ignotas
zlib provides a faster slice-by-4 CRC32 implementation than the
traditional single byte lookup one used by mesa. As most supported
platforms now link zlib unconditionally, we can easily use it.
For small buffers the old implementation is still used as it's faster
with cold cache (first call), as indicated by some throughput
benchmarking (avg MB/s, n=100, zlib 1.2.8):

                 i5-6600K                         C2D E4500
size    mesa  zlib  change             mesa  zlib  change
4         66    43  -35% +/- 4.8%        43    22  -49% +/- 9.6%
32       193   171  -11% +/- 5.8%       129    49  -61% +/- 7.2%
64       256   267    4% +/- 4.1%       171    63  -63% +/- 5.4%
128      317   389   22% +/- 5.8%       253    89  -64% +/- 4.2%
256      364   596   63% +/- 5.6%       304   166  -45% +/- 2.8%
512      401   838  108% +/- 5.3%       338   296  -12% +/- 3.1%
1024     420  1036  146% +/- 7.6%       375   461   23% +/- 3.7%
1M       443  1443  225% +/- 2.1%       403  1175  191% +/- 0.9%
100M     448  1452  224% +/- 0.3%       406  1214  198% +/- 0.3%

With a hot cache (repeated calls), zlib almost always wins on both CPUs.
It has been verified that the calculated results stay the same after this
change.
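The `~` in the new fast path is needed because Mesa's loop starts from 0xffffffff but, unlike zlib's `crc32()`, never applies the final inversion. A self-contained way to check that relationship without linking zlib is to compare a bitwise reference CRC-32 (the standard reflected polynomial 0xEDB88320 with initial value and final XOR of 0xffffffff, which is what zlib computes) against a Mesa-style table loop. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference CRC-32, bit by bit: init 0xffffffff, final XOR 0xffffffff. */
static uint32_t crc32_reference(const void *data, size_t size)
{
   const uint8_t *p = data;
   uint32_t crc = 0xffffffffu;

   while (size--) {
      crc ^= *p++;
      for (int k = 0; k < 8; k++)
         crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
   }
   return crc ^ 0xffffffffu;
}

/* Mesa-style loop: same table-driven update as util_hash_crc32(), with the
 * table built at runtime here and no final XOR. */
static uint32_t crc32_mesa_style(const void *data, size_t size)
{
   static uint32_t table[256];
   static int table_ready;
   const uint8_t *p = data;
   uint32_t crc = 0xffffffffu;

   if (!table_ready) {
      for (uint32_t n = 0; n < 256; n++) {
         uint32_t c = n;
         for (int k = 0; k < 8; k++)
            c = (c & 1u) ? 0xedb88320u ^ (c >> 1) : c >> 1;
         table[n] = c;
      }
      table_ready = 1;
   }
   while (size--)
      crc = table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
   return crc;
}
```

For the standard check input "123456789" the reference yields 0xcbf43926 and the Mesa-style loop yields its complement, 0x340bc6d9. The `(uInt)size == size` guard in the patch exists because zlib's `crc32()` takes its length as `uInt`, which can be narrower than `size_t`.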

Signed-off-by: Grazvydas Ignotas 
---
 src/util/crc32.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/util/crc32.c b/src/util/crc32.c
index f2e01c6..0cffa49 100644
--- a/src/util/crc32.c
+++ b/src/util/crc32.c
@@ -31,12 +31,20 @@
  * 
  * @author Jose Fonseca
  */
 
 
+#ifdef HAVE_ZLIB
+#include <zlib.h>
+#endif
 #include "crc32.h"
 
+/* For small buffers it's faster to avoid the library call.
+ * The optimal threshold depends on CPU characteristics, it is hoped
+ * the choice below is reasonable for typical modern CPU.
+ */
+#define ZLIB_SIZE_THRESHOLD 64
 
 static const uint32_t 
 util_crc32_table[256] = {
0x, 0x77073096, 0xee0e612c, 0x990951ba, 
0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3, 
@@ -112,10 +120,15 @@ uint32_t
 util_hash_crc32(const void *data, size_t size)
 {
const uint8_t *p = data;
uint32_t crc = 0x;
  
+#ifdef HAVE_ZLIB
+   if (size >= ZLIB_SIZE_THRESHOLD && (uInt)size == size)
+  return ~crc32(0, data, size);
+#endif
+
while (size--)
   crc = util_crc32_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);

return crc;
 }
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] nir: Add a lowering pass for gl_FragColor to gl_FragData[] writes.

2017-12-28 Thread Eric Anholt
For VC5, the shader needs to have the appropriate base type for the
variable in the render target write, and gallium's
FS_COLOR0_WRITES_ALL_CBUFS (used for glClearBufferiv) doesn't give you
that information.  This pass lets the backend decide what types to explode
the gl_FragColor write out to.

This would also be a prerequisite of moving some of VC5's render target
format packing into NIR as well.
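Conceptually the pass turns one gl_FragColor store into one store per enabled render target, each tagged with the type the backend chose. The toy model below works on plain structs rather than NIR; all names are hypothetical, and the pass itself emits `nir_store_var()` writes to new gl_FragDataN variables instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum toy_base_type { TOY_BASE_NONE, TOY_BASE_FLOAT, TOY_BASE_UINT };

struct toy_rt_store {
   unsigned rt;             /* render target index (gl_FragDataN) */
   enum toy_base_type type; /* type the backend asked for */
   uint32_t bits[4];        /* the written vec4, as raw bits */
};

/* Replace one gl_FragColor store with one store per bit set in rt_mask.
 * types[] must have an entry for every set bit. Returns the store count. */
static unsigned toy_lower_fragcolor_store(const uint32_t color[4], uint32_t rt_mask,
                                          const enum toy_base_type *types,
                                          struct toy_rt_store out[32])
{
   unsigned n = 0;

   for (unsigned rt = 0; rt < 32; rt++) {
      if (!(rt_mask & (1u << rt)))
         continue;
      assert(types[rt] != TOY_BASE_NONE);
      out[n].rt = rt;
      out[n].type = types[rt];
      memcpy(out[n].bits, color, sizeof(out[n].bits));
      n++;
   }
   return n;
}
```

The stored bits are identical for every target; only the declared type differs, which is what lets a backend emit an integer rather than a float render-target write.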
---
 src/compiler/Makefile.sources |   1 +
 src/compiler/nir/meson.build  |   1 +
 src/compiler/nir/nir.h|   3 +
 src/compiler/nir/nir_lower_gl_fragcolor.c | 143 ++
 4 files changed, 148 insertions(+)
 create mode 100644 src/compiler/nir/nir_lower_gl_fragcolor.c

diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index d3f746f5f948..4afaa1a2146a 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -220,6 +220,7 @@ NIR_FILES = \
nir/nir_lower_constant_initializers.c \
nir/nir_lower_double_ops.c \
nir/nir_lower_drawpixels.c \
+   nir/nir_lower_gl_fragcolor.c \
nir/nir_lower_global_vars_to_local.c \
nir/nir_lower_gs_intrinsics.c \
nir/nir_lower_load_const_to_scalar.c \
diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index 5dd21e6652f0..9e11279118f6 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -114,6 +114,7 @@ files_libnir = files(
   'nir_lower_constant_initializers.c',
   'nir_lower_double_ops.c',
   'nir_lower_drawpixels.c',
+  'nir_lower_gl_fragcolor.c',
   'nir_lower_global_vars_to_local.c',
   'nir_lower_gs_intrinsics.c',
   'nir_lower_load_const_to_scalar.c',
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 440c3fe9974c..17bb8fc8de4c 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2680,6 +2680,9 @@ bool nir_lower_atomics(nir_shader *shader,
 bool nir_lower_atomics_to_ssbo(nir_shader *shader, unsigned ssbo_offset);
 bool nir_lower_uniforms_to_ubo(nir_shader *shader);
 bool nir_lower_to_source_mods(nir_shader *shader);
+bool nir_lower_gl_fragcolor(nir_shader *shader,
+uint32_t rt_mask,
+const struct glsl_type **types);
 
 bool nir_lower_gs_intrinsics(nir_shader *shader);
 
diff --git a/src/compiler/nir/nir_lower_gl_fragcolor.c b/src/compiler/nir/nir_lower_gl_fragcolor.c
new file mode 100644
index ..d4b39f00c233
--- /dev/null
+++ b/src/compiler/nir/nir_lower_gl_fragcolor.c
@@ -0,0 +1,143 @@
+/*
+ * Copyright © 2017 Broadcom
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "nir.h"
+#include "nir_builder.h"
+
+/**
+ * Lowers gl_FragColor to a per-render-target store.
+ *
+ * GLSL's gl_FragColor writes implicitly broadcast their store to every active
+ * render target.  This can be used by driver backends to implement
+ * gl_FragColor in the same way as other multiple-render-target shaders, and
+ * is particularly useful if the driver needs to do other per-render-target
+ * lowering in NIR.
+ *
+ * Run before nir_lower_io.
+ */
+
+typedef struct {
+   nir_shader *shader;
+   nir_builder b;
+
+   nir_variable *var; /* gl_FragColor */
+
+   int num_rt_vars;
+   nir_variable *rt_var[32]; /* gl_FragDataN */
+} lower_gl_fragcolor_state;
+
+static void
+lower_gl_fragcolor(lower_gl_fragcolor_state *state, nir_intrinsic_instr *intr)
+{
+   nir_builder *b = &state->b;
+
+   assert(intr->dest.is_ssa);
+
+   b->cursor = nir_before_instr(&intr->instr);
+
+   /* Generate a gl_FragDataN write per render target. */
+   nir_ssa_def *color = nir_ssa_for_src(b, intr->src[0], 4);
+   for (int i = 0; i < state->num_rt_vars; i++) {
+  nir_store_var(b, state->rt_var[i], color, 0xf);
+   }
+
+   /* Remove the gl_FragColor write. */
+   nir_instr_remove(&intr->instr);
+}
+
+static bool
+lower_gl_fragcolor_block(lower_gl_fragcolor_state 

[Mesa-dev] [PATCH 1/3] android,configure,meson: define HAVE_ZLIB

2017-12-28 Thread Grazvydas Ignotas
The next change wants to use some optional zlib functionality; however,
not all platforms currently use zlib. Based on Jordan Justen's earlier
patches and their review feedback.

Signed-off-by: Grazvydas Ignotas 
---
 Android.common.mk | 1 +
 configure.ac  | 1 +
 meson.build   | 1 +
 3 files changed, 3 insertions(+)

diff --git a/Android.common.mk b/Android.common.mk
index d9f871c..52dc7bf 100644
--- a/Android.common.mk
+++ b/Android.common.mk
@@ -68,10 +68,11 @@ LOCAL_CFLAGS += \
-DHAVE___BUILTIN_UNREACHABLE \
-DHAVE_PTHREAD=1 \
-DHAVE_DLADDR \
-DHAVE_DL_ITERATE_PHDR \
-DHAVE_LINUX_FUTEX_H \
+   -DHAVE_ZLIB \
-DMAJOR_IN_SYSMACROS \
-fvisibility=hidden \
-Wno-sign-compare
 
 LOCAL_CPPFLAGS += \
diff --git a/configure.ac b/configure.ac
index 79f275d..e236a3c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -904,10 +904,11 @@ esac
 dnl See if posix_memalign is available
 AC_CHECK_FUNC([posix_memalign], [DEFINES="$DEFINES -DHAVE_POSIX_MEMALIGN"])
 
 dnl Check for zlib
 PKG_CHECK_MODULES([ZLIB], [zlib >= $ZLIB_REQUIRED])
+DEFINES="$DEFINES -DHAVE_ZLIB"
 
 dnl Check for pthreads
 AX_PTHREAD
 if test "x$ax_pthread_ok" = xno; then
 AC_MSG_ERROR([Building mesa on this platform requires pthreads])
diff --git a/meson.build b/meson.build
index d9f7ea9..9d9d074 100644
--- a/meson.build
+++ b/meson.build
@@ -922,10 +922,11 @@ if dep_libdrm.found()
   endif
 endif
 
 # TODO: some of these may be conditional
 dep_zlib = dependency('zlib', version : '>= 1.2.3')
+pre_args += '-DHAVE_ZLIB'
 dep_thread = dependency('threads')
 if dep_thread.found() and host_machine.system() != 'windows'
   pre_args += '-DHAVE_PTHREAD'
 endif
 if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 # TODO: clover
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] util/crc32: don't drop the const qualifier

2017-12-28 Thread Grazvydas Ignotas
Signed-off-by: Grazvydas Ignotas 
---
 src/util/crc32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/crc32.c b/src/util/crc32.c
index 44d637c..f2e01c6 100644
--- a/src/util/crc32.c
+++ b/src/util/crc32.c
@@ -109,11 +109,11 @@ util_crc32_table[256] = {
  * @sa http://www.w3.org/TR/PNG/#D-CRCAppendix
  */
 uint32_t
 util_hash_crc32(const void *data, size_t size)
 {
-   uint8_t *p = (uint8_t *)data;
+   const uint8_t *p = data;
uint32_t crc = 0x;
  
while (size--)
   crc = util_crc32_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);

-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] broadcom/vc5: Use the new gl_FragColor lowering pass.

2017-12-28 Thread Eric Anholt
This fixes dEQP-GLES3.functional.fbo.color.clear.r16i and friends, by
making sure we do an integer TLB store instead of float.
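How the vc5 pieces fit together can be sketched outside Mesa: the driver fills two bit masks from the bound formats, and the compiler turns them into the per-render-target type array passed to the lowering pass. The struct and enum names are hypothetical, and the sketch assumes every color buffer up to `nr_cbufs` is bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the FS key fields added by this patch. */
struct toy_fs_key {
   uint8_t gl_fragcolor_lower_vec4;
   uint8_t gl_fragcolor_lower_uvec4;
};

enum toy_frag_type { TOY_FRAG_NONE, TOY_FRAG_VEC4, TOY_FRAG_UVEC4 };

/* Driver side: choose the output type per color buffer from whether its
 * format is pure integer (desc->channel[0].pure_integer in the diff). */
static void toy_fill_key(struct toy_fs_key *key, const bool *pure_integer,
                         unsigned nr_cbufs)
{
   assert(nr_cbufs <= 8);
   for (unsigned i = 0; i < nr_cbufs; i++) {
      if (pure_integer[i])
         key->gl_fragcolor_lower_uvec4 |= (uint8_t)(1u << i);
      else
         key->gl_fragcolor_lower_vec4 |= (uint8_t)(1u << i);
   }
}

/* Compiler side: turn the two masks into the per-RT type array handed to the
 * lowering pass, in the same order of checks as the vir.c hunk. */
static void toy_fragcolor_types(const struct toy_fs_key *key,
                                enum toy_frag_type types[4])
{
   for (unsigned i = 0; i < 4; i++) {
      if (key->gl_fragcolor_lower_vec4 & (1u << i))
         types[i] = TOY_FRAG_VEC4;
      else if (key->gl_fragcolor_lower_uvec4 & (1u << i))
         types[i] = TOY_FRAG_UVEC4;
      else
         types[i] = TOY_FRAG_NONE;
   }
}
```

The OR of the two masks is the render-target mask given to the pass, so an unbound target gets neither bit and no store.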
---
 src/broadcom/compiler/nir_to_vir.c|  5 +
 src/broadcom/compiler/v3d_compiler.h  |  6 ++
 src/broadcom/compiler/vir.c   | 13 +
 src/gallium/drivers/vc5/vc5_program.c |  5 +
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/src/broadcom/compiler/nir_to_vir.c b/src/broadcom/compiler/nir_to_vir.c
index 1cf8865bf0e1..4bd9ae2e9a74 100644
--- a/src/broadcom/compiler/nir_to_vir.c
+++ b/src/broadcom/compiler/nir_to_vir.c
@@ -1493,10 +1493,7 @@ ntq_setup_outputs(struct v3d_compile *c)
 if (c->s->info.stage == MESA_SHADER_FRAGMENT) {
 switch (var->data.location) {
 case FRAG_RESULT_COLOR:
-c->output_color_var[0] = var;
-c->output_color_var[1] = var;
-c->output_color_var[2] = var;
-c->output_color_var[3] = var;
+unreachable("Should have been lowered");
 break;
 case FRAG_RESULT_DATA0:
 case FRAG_RESULT_DATA1:
diff --git a/src/broadcom/compiler/v3d_compiler.h b/src/broadcom/compiler/v3d_compiler.h
index bbe7a57fa10e..d060af3c4169 100644
--- a/src/broadcom/compiler/v3d_compiler.h
+++ b/src/broadcom/compiler/v3d_compiler.h
@@ -322,6 +322,12 @@ struct v3d_fs_key {
 uint8_t swap_color_rb;
 /* Mask of which render targets need to be written as 32-bit floats */
 uint8_t f32_color_rb;
+
+/* Mask of which render targets need gl_FragColor output as a vec4. */
+uint8_t gl_fragcolor_lower_vec4;
+/* Mask of which render targets need gl_FragColor output as a uvec4. */
+uint8_t gl_fragcolor_lower_uvec4;
+
 uint8_t alpha_test_func;
 uint8_t logicop_func;
 uint32_t point_sprite_mask;
diff --git a/src/broadcom/compiler/vir.c b/src/broadcom/compiler/vir.c
index 4e78a477bd7d..abcb430c6e3b 100644
--- a/src/broadcom/compiler/vir.c
+++ b/src/broadcom/compiler/vir.c
@@ -750,6 +750,19 @@ uint64_t *v3d_compile_fs(const struct v3d_compiler *compiler,
 if (key->base.ucp_enables)
 NIR_PASS_V(c->s, nir_lower_clip_fs, key->base.ucp_enables);
 
+const struct glsl_type *gl_fragcolor_types[4] = {NULL, NULL, NULL, NULL};
+for (int i = 0; i < ARRAY_SIZE(gl_fragcolor_types); i++) {
+if (key->gl_fragcolor_lower_vec4 & (1 << i))
+gl_fragcolor_types[i] = glsl_vec4_type();
+else if (key->gl_fragcolor_lower_uvec4 & (1 << i))
+gl_fragcolor_types[i] = glsl_uvec4_type();
+}
+
+NIR_PASS_V(c->s, nir_lower_gl_fragcolor,
+   key->gl_fragcolor_lower_vec4 |
+   key->gl_fragcolor_lower_uvec4,
+   gl_fragcolor_types);
+
 /* Note: FS input scalarizing must happen after
  * nir_lower_two_sided_color, which only handles a vec4 at a time.
  */
diff --git a/src/gallium/drivers/vc5/vc5_program.c b/src/gallium/drivers/vc5/vc5_program.c
index 4f902fd4c65d..2fb50897730d 100644
--- a/src/gallium/drivers/vc5/vc5_program.c
+++ b/src/gallium/drivers/vc5/vc5_program.c
@@ -378,6 +378,11 @@ vc5_update_compiled_fs(struct vc5_context *vc5, uint8_t prim_mode)
 desc->channel[0].size == 32) {
 key->f32_color_rb |= 1 << i;
 }
+
+if (desc->channel[0].pure_integer)
+key->gl_fragcolor_lower_uvec4 |= 1 << i;
+else
+key->gl_fragcolor_lower_vec4 |= 1 << i;
 }
 
 if (key->is_points) {
-- 
2.15.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] radv/gfx9: use correct swizzle parameter to work out border swizzle.

2017-12-28 Thread Dave Airlie
From: Dave Airlie 

This should fix:
dEQP-VK.pipeline.sampler.view_type.*.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
and a few others in that area.
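The difference between `desc->swizzle` and the local `swizzle` matters as soon as a view has a non-identity component mapping. Assuming (this is not visible in the hunk) that the local `swizzle` is the format swizzle composed with the view's `VkComponentMapping`, the composition can be sketched as below; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical swizzle enum; the real code uses enum vk_swizzle. */
enum toy_swizzle { TOY_SWZ_X, TOY_SWZ_Y, TOY_SWZ_Z, TOY_SWZ_W, TOY_SWZ_0, TOY_SWZ_1 };

/* out[i] = fmt[view[i]]: the view selects one of the format's decoded
 * channels, and the constant selectors 0 and 1 pass straight through. */
static void toy_compose_swizzles(const enum toy_swizzle view[4],
                                 const enum toy_swizzle fmt[4],
                                 enum toy_swizzle out[4])
{
   for (int i = 0; i < 4; i++) {
      assert(view[i] <= TOY_SWZ_1);
      out[i] = view[i] <= TOY_SWZ_W ? fmt[view[i]] : view[i];
   }
}
```

With an identity view the result equals the format swizzle, which is why using `desc->swizzle` only went wrong for views that remap components.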

Fixes: b11c4a5546 (radv: add texture descriptor/fmask/cmask support for GFX9)
Signed-off-by: Dave Airlie 
---
 src/amd/vulkan/radv_image.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index a8b40909cf..3718606a91 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -345,7 +345,7 @@ static unsigned radv_tex_dim(VkImageType image_type, VkImageViewType view_type,
}
 }
 
-static unsigned gfx9_border_color_swizzle(const unsigned char swizzle[4])
+static unsigned gfx9_border_color_swizzle(const enum vk_swizzle swizzle[4])
 {
unsigned bc_swizzle = V_008F20_BC_SWIZZLE_XYZW;
 
@@ -459,7 +459,7 @@ si_make_texture_descriptor(struct radv_device *device,
state[7] = 0;
 
if (device->physical_device->rad_info.chip_class >= GFX9) {
-   unsigned bc_swizzle = gfx9_border_color_swizzle(desc->swizzle);
+   unsigned bc_swizzle = gfx9_border_color_swizzle(swizzle);
 
/* Depth is the the last accessible layer on Gfx9.
 * The hw doesn't need to know the total number of layers.
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] radv/gfx9: use a bigger hammer to flush cb/db caches.

2017-12-28 Thread Dave Airlie
From: Dave Airlie 

amdvlk is probably more subtle than this, but it never uses
the inv cb/db variants, and we fail some CTS tests without this.
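The effect of the `#if 0` is easiest to see as a function of the requested flush. A sketch with hypothetical flag and event names: the real ones are `RADV_CMD_FLAG_FLUSH_AND_INV_CB`/`_DB` and `V_028A90_*` events, and the CB-only and DB-only event names are not visible in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the RADV_CMD_FLAG_FLUSH_AND_INV_* flags. */
#define TOY_FLUSH_CB (1u << 0)
#define TOY_FLUSH_DB (1u << 1)

enum toy_event {
   TOY_EVENT_NONE,
   TOY_EVENT_FLUSH_INV_CB_DATA,
   TOY_EVENT_FLUSH_INV_DB_DATA,
   TOY_EVENT_CACHE_FLUSH_AND_INV_ALL,
};

/* With big_hammer set (what the patch hard-codes), any CB/DB flush uses the
 * full cache flush-and-invalidate event. */
static enum toy_event toy_cb_db_event(unsigned flush_cb_db, bool big_hammer)
{
   assert((flush_cb_db & ~(TOY_FLUSH_CB | TOY_FLUSH_DB)) == 0);
   if (!flush_cb_db)
      return TOY_EVENT_NONE;
   if (big_hammer)
      return TOY_EVENT_CACHE_FLUSH_AND_INV_ALL;
   switch (flush_cb_db) {
   case TOY_FLUSH_CB:
      return TOY_EVENT_FLUSH_INV_CB_DATA;
   case TOY_FLUSH_DB:
      return TOY_EVENT_FLUSH_INV_DB_DATA;
   default:
      return TOY_EVENT_CACHE_FLUSH_AND_INV_ALL;
   }
}
```

The trade-off is extra flushing work when only one of the two caches needed it, in exchange for the CTS failures going away.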

Fixes:
dEQP-VK.renderpass.dedicated_allocation.formats.d32_sfloat_s8_uint.input*.

Fixes: c2fbeb7ca05 (radv: add GFX9 cache flushing support.)
Signed-off-by: Dave Airlie 
---
 src/amd/vulkan/si_cmd_buffer.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c
index 972d37948a..a6981c136e 100644
--- a/src/amd/vulkan/si_cmd_buffer.c
+++ b/src/amd/vulkan/si_cmd_buffer.c
@@ -991,6 +991,11 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
if (chip_class >= GFX9 && flush_cb_db) {
unsigned cb_db_event, tc_flags;
 
+#if 0
+   /* This breaks a bunch of:
+  dEQP-VK.renderpass.dedicated_allocation.formats.d32_sfloat_s8_uint.input*.
+  use the big hammer always.
+   */
/* Set the CB/DB flush event. */
switch (flush_cb_db) {
case RADV_CMD_FLAG_FLUSH_AND_INV_CB:
@@ -1003,7 +1008,9 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
/* both CB & DB */
cb_db_event = V_028A90_CACHE_FLUSH_AND_INV_TS_EVENT;
}
-
+#else
+   cb_db_event = V_028A90_CACHE_FLUSH_AND_INV_TS_EVENT;
+#endif
/* TC| TC_WB = invalidate L2 data
 * TC_MD | TC_WB = invalidate L2 metadata
 * TC| TC_WB | TC_MD = invalidate L2 data & metadata
-- 
2.14.3
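A minimal C sketch of the event selection that the `#if 0` disables, next to the patched behaviour. The enum names are hypothetical stand-ins for the RADV_CMD_FLAG_FLUSH_AND_INV_* flags and V_028A90_* events, and the DB-only case is assumed to sit in the part of the switch the hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for RADV_CMD_FLAG_FLUSH_AND_INV_{CB,DB}. */
enum flush_flags {
   FLUSH_INV_CB = 1u << 0,
   FLUSH_INV_DB = 1u << 1,
};

/* Hypothetical stand-ins for the V_028A90_* flush events. */
enum cb_db_event {
   EVENT_FLUSH_INV_CB_ONLY,
   EVENT_FLUSH_INV_DB_ONLY,
   EVENT_FLUSH_INV_CB_AND_DB, /* the "big hammer" */
};

/* Pick the event for a non-zero set of flush flags.  With use_big_hammer
 * set (the patched behaviour) the combined event is always chosen. */
static enum cb_db_event select_cb_db_event(unsigned flush_cb_db, bool use_big_hammer)
{
   if (use_big_hammer)
      return EVENT_FLUSH_INV_CB_AND_DB;

   switch (flush_cb_db) {
   case FLUSH_INV_CB:
      return EVENT_FLUSH_INV_CB_ONLY;
   case FLUSH_INV_DB:
      return EVENT_FLUSH_INV_DB_ONLY;
   default: /* both CB & DB */
      return EVENT_FLUSH_INV_CB_AND_DB;
   }
}
```

The cost of the bigger hammer is that a CB-only or DB-only flush now flushes and invalidates both caches.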



[Mesa-dev] [PATCH] radv/gfx9: fix block compression texture views.

2017-12-28 Thread Dave Airlie
From: Dave Airlie 

This ports a fix from amdvlk that corrects the sizing of mip levels
when block-compressed images are viewed through uncompressed views.

Fixes:
dEQP-VK.image.texel_view_compatible.graphic.extended*bc*

Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."'
Signed-off-by: Dave Airlie 
---
 src/amd/vulkan/radv_image.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index 40e6dfc3af..a8b40909cf 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -1067,6 +1067,46 @@ radv_image_view_init(struct radv_image_view *iview,
                                              vk_format_get_blockwidth(image->vk_format));
 iview->extent.height = round_up_u32(iview->extent.height * vk_format_get_blockheight(iview->vk_format),
                                              vk_format_get_blockheight(image->vk_format));
+   /* from amdvlk -
+    * If we have the following image:
+    *  Uncompressed pixels   Compressed block sizes (4x4)
+    *  mip0:   22 x 22   6 x 6
+    *  mip1:   11 x 11   3 x 3
+    *  mip2:    5 x  5   2 x 2
+    *  mip3:    2 x  2   1 x 1
+    *  mip4:    1 x  1   1 x 1
+    *
+    * On GFX9 the SRD is always programmed with the WIDTH and HEIGHT of the base level and the HW is
+    * calculating the degradation of the block sizes down the mip-chain as follows (straight-up
+    * divide-by-two integer math):
+    *  mip0:  6x6
+    *  mip1:  3x3
+    *  mip2:  1x1
+    *  mip3:  1x1
+    *
+    * This means that mip2 will be missing texels.
+    *
+    * Fix this by calculating the start mip's ceil(texels/blocks) width and height and then go up the chain
+    * to pad the base mip's width and height to account for this.  A result lower than the base mip's
+    * indicates a non-power-of-two texture, and the result should be clamped to its extentElements.
+    * Otherwise, if the mip is aligned to block multiples, the result will be equal to extentElements.  If
+    * there is no suitable width or height, the actualExtentElements is chosen.  The application is in
+    * charge of making sure the math works out properly if they do this (allowed by Vulkan), otherwise we
+    * assume it's an internal view and the copy shaders will prevent accessing out-of-bounds pixels.
+    */
+   if (device->physical_device->rad_info.chip_class >= GFX9 &&
+       vk_format_is_compressed(image->vk_format)) {
+           unsigned lvl_width  = radv_minify(image->info.width , range->baseMipLevel);
+           unsigned lvl_height = radv_minify(image->info.height, range->baseMipLevel);
+
+           lvl_width = round_up_u32(lvl_width * vk_format_get_blockwidth(iview->vk_format),
+                                    vk_format_get_blockwidth(image->vk_format));
+           lvl_height = round_up_u32(lvl_height * vk_format_get_blockheight(iview->vk_format),
+                                     vk_format_get_blockheight(image->vk_format));
+
+           iview->extent.width = MAX2(iview->extent.width, lvl_width << range->baseMipLevel);
+           iview->extent.height = MAX2(iview->extent.height, lvl_height << range->baseMipLevel);
+   }
}
 
iview->base_layer = range->baseArrayLayer;
-- 
2.14.3
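The arithmetic in the amdvlk comment can be checked with a small C sketch. The helper names are hypothetical; minify() and div_round_up() are assumed to behave like radv_minify() and round_up_u32(), and "blocks" here are texels of the uncompressed view, one per compressed block.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to behave like radv_minify(): halve per level, clamp to 1. */
static uint32_t minify(uint32_t n, uint32_t levels)
{
   uint32_t r = n >> levels;
   return r ? r : 1;
}

/* Integer ceil(v / a), the same arithmetic as round_up_u32(). */
static uint32_t div_round_up(uint32_t v, uint32_t a)
{
   return (v + a - 1) / a;
}

/* Blocks the hardware assumes at `level`, derived from the base-level
 * block count with straight-up divide-by-two math. */
static uint32_t hw_level_blocks(uint32_t base_blocks, uint32_t level)
{
   return minify(base_blocks, level);
}

/* Blocks actually needed at `level` for an image `texels` wide that
 * uses `block`-texel-wide compressed blocks. */
static uint32_t needed_level_blocks(uint32_t texels, uint32_t block, uint32_t level)
{
   return div_round_up(minify(texels, level), block);
}

/* The padding from the patch: take the start level's block count, shift
 * it back up to the base level and keep whichever base size is larger. */
static uint32_t padded_base_blocks(uint32_t texels, uint32_t block, uint32_t start_level)
{
   uint32_t base = div_round_up(texels, block);
   uint32_t from_start = needed_level_blocks(texels, block, start_level) << start_level;
   return base > from_start ? base : from_start;
}
```

With the comment's 22 x 22 image and 4x4 blocks, the unpadded base is 6 blocks, so the hardware derives 1 block at mip2 where 2 are needed. A view starting at mip2 pads the base to 2 << 2 = 8 blocks, and 8 >> 2 gives back the required 2.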



[Mesa-dev] [PATCH 2/3] radv/gfx9: fix 3d image clears on compute queues

2017-12-28 Thread Dave Airlie
From: Dave Airlie 

This fixes some of the broken:
dEQP-VK.synchronization.op.multi_queue.*64x64x8* tests.

Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."'
Signed-off-by: Dave Airlie 
---
 src/amd/vulkan/radv_meta_bufimage.c | 73 -
 src/amd/vulkan/radv_private.h   |  1 +
 2 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/src/amd/vulkan/radv_meta_bufimage.c b/src/amd/vulkan/radv_meta_bufimage.c
index b2dca80ad5..1696f85ac1 100644
--- a/src/amd/vulkan/radv_meta_bufimage.c
+++ b/src/amd/vulkan/radv_meta_bufimage.c
@@ -667,15 +667,16 @@ radv_device_finish_meta_itoi_state(struct radv_device *device)
 }
 
 static nir_shader *
-build_nir_cleari_compute_shader(struct radv_device *dev)
+build_nir_cleari_compute_shader(struct radv_device *dev, bool is_3d)
 {
nir_builder b;
-   const struct glsl_type *img_type = glsl_sampler_type(GLSL_SAMPLER_DIM_2D,
+   enum glsl_sampler_dim dim = is_3d ? GLSL_SAMPLER_DIM_3D : GLSL_SAMPLER_DIM_2D;
+   const struct glsl_type *img_type = glsl_sampler_type(dim,
 false,
 false,
 GLSL_TYPE_FLOAT);
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
-   b.shader->info.name = ralloc_strdup(b.shader, "meta_cleari_cs");
+   b.shader->info.name = ralloc_strdup(b.shader, is_3d ? "meta_cleari_cs_3d" : "meta_cleari_cs");
b.shader->info.cs.local_size[0] = 16;
b.shader->info.cs.local_size[1] = 16;
b.shader->info.cs.local_size[2] = 1;
@@ -696,12 +697,29 @@ build_nir_cleari_compute_shader(struct radv_device *dev)
 
nir_intrinsic_instr *clear_val = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(clear_val, 0);
-   nir_intrinsic_set_range(clear_val, 16);
+   nir_intrinsic_set_range(clear_val, 20);
clear_val->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
clear_val->num_components = 4;
nir_ssa_dest_init(&clear_val->instr, &clear_val->dest, 4, 32, "clear_value");
nir_builder_instr_insert(&b, &clear_val->instr);
 
+   nir_intrinsic_instr *layer = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
+   nir_intrinsic_set_base(layer, 0);
+   nir_intrinsic_set_range(layer, 20);
+   layer->src[0] = nir_src_for_ssa(nir_imm_int(&b, 16));
+   layer->num_components = 1;
+   nir_ssa_dest_init(&layer->instr, &layer->dest, 1, 32, "layer");
+   nir_builder_instr_insert(&b, &layer->instr);
+
+   nir_ssa_def *global_z = nir_iadd(&b, nir_channel(&b, global_id, 2), &layer->dest.ssa);
+
+   nir_ssa_def *comps[4];
+   comps[0] = nir_channel(&b, global_id, 0);
+   comps[1] = nir_channel(&b, global_id, 1);
+   comps[2] = global_z;
+   comps[3] = nir_imm_int(&b, 0);
+   global_id = nir_vec(&b, comps, 4);
+
nir_intrinsic_instr *store = nir_intrinsic_instr_create(b.shader, nir_intrinsic_image_store);
store->src[0] = nir_src_for_ssa(global_id);
store->src[1] = nir_src_for_ssa(nir_ssa_undef(&b, 1, 32));
@@ -717,8 +735,10 @@ radv_device_init_meta_cleari_state(struct radv_device *device)
 {
VkResult result;
struct radv_shader_module cs = { .nir = NULL };
-
-   cs.nir = build_nir_cleari_compute_shader(device);
+   struct radv_shader_module cs_3d = { .nir = NULL };
+   cs.nir = build_nir_cleari_compute_shader(device, false);
+   if (device->physical_device->rad_info.chip_class >= GFX9)
+   cs_3d.nir = build_nir_cleari_compute_shader(device, true);
 
/*
 * two descriptors one for the image being sampled
@@ -752,7 +772,7 @@ radv_device_init_meta_cleari_state(struct radv_device *device)
.setLayoutCount = 1,
.pSetLayouts = &device->meta_state.cleari.img_ds_layout,
.pushConstantRangeCount = 1,
-   .pPushConstantRanges = &(VkPushConstantRange){VK_SHADER_STAGE_COMPUTE_BIT, 0, 16},
+   .pPushConstantRanges = &(VkPushConstantRange){VK_SHADER_STAGE_COMPUTE_BIT, 0, 20},
};
 
result = radv_CreatePipelineLayout(radv_device_to_handle(device),
@@ -786,10 +806,38 @@ radv_device_init_meta_cleari_state(struct radv_device *device)
if (result != VK_SUCCESS)
goto fail;
 
+
+   if (device->physical_device->rad_info.chip_class >= GFX9) {
+   /* compute shader */
+   VkPipelineShaderStageCreateInfo pipeline_shader_stage_3d = {
+   .sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
+   .stage = VK_SHADER_STAGE_COMPUTE_BIT,
+   .module = radv_shader_module_to_handle(&cs_3d),
+   .pName = "main",
+   .pSpecializationInfo = NULL,
+   };
+
+
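The patch grows the cleari push-constant range from 16 to 20 bytes: the clear value stays at byte 0, a layer offset is read from byte 16, and the shader adds that layer to the invocation's Z coordinate. A C sketch of that layout, assuming tightly packed 32-bit fields (struct and function names are hypothetical, not radv's):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 20-byte push-constant block the patched
 * cleari shader reads: a vec4 clear value at byte 0 and the first
 * layer (slice) to clear at byte 16. */
struct cleari_push_constants {
   float clear_value[4];
   uint32_t layer;
};

_Static_assert(offsetof(struct cleari_push_constants, layer) == 16,
               "layer must sit at byte 16");
_Static_assert(sizeof(struct cleari_push_constants) == 20,
               "size must match the 20-byte VkPushConstantRange");

/* Z coordinate used for the image store: invocation Z plus the layer. */
static uint32_t cleari_store_z(uint32_t global_id_z,
                               const struct cleari_push_constants *pc)
{
   return global_id_z + pc->layer;
}
```

How radv fills in the layer on the CPU side is in the part of the patch cut off above, so it is not modelled here.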

[Mesa-dev] [PATCH 1/3] radv/gfx9: fix 3d image to image transfers on compute queues.

2017-12-28 Thread Dave Airlie
From: Dave Airlie 

This fixes some of the broken:
dEQP-VK.synchronization.op.multi_queue.*64x64x8* tests.

Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."'
Signed-off-by: Dave Airlie 
---
 src/amd/vulkan/radv_meta_bufimage.c | 75 +++--
 src/amd/vulkan/radv_private.h   |  1 +
 2 files changed, 56 insertions(+), 20 deletions(-)

diff --git a/src/amd/vulkan/radv_meta_bufimage.c b/src/amd/vulkan/radv_meta_bufimage.c
index a1e67b6840..b2dca80ad5 100644
--- a/src/amd/vulkan/radv_meta_bufimage.c
+++ b/src/amd/vulkan/radv_meta_bufimage.c
@@ -451,19 +451,20 @@ radv_device_finish_meta_btoi_state(struct radv_device *device)
 }
 
 static nir_shader *
-build_nir_itoi_compute_shader(struct radv_device *dev)
+build_nir_itoi_compute_shader(struct radv_device *dev, bool is_3d)
 {
nir_builder b;
-   const struct glsl_type *buf_type = glsl_sampler_type(GLSL_SAMPLER_DIM_2D,
+   enum glsl_sampler_dim dim = is_3d ? GLSL_SAMPLER_DIM_3D : GLSL_SAMPLER_DIM_2D;
+   const struct glsl_type *buf_type = glsl_sampler_type(dim,
 false,
 false,
 GLSL_TYPE_FLOAT);
-   const struct glsl_type *img_type = glsl_sampler_type(GLSL_SAMPLER_DIM_2D,
+   const struct glsl_type *img_type = glsl_sampler_type(dim,
 false,
 false,
 GLSL_TYPE_FLOAT);
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
-   b.shader->info.name = ralloc_strdup(b.shader, "meta_itoi_cs");
+   b.shader->info.name = ralloc_strdup(b.shader, is_3d ? "meta_itoi_cs_3d" : "meta_itoi_cs");
b.shader->info.cs.local_size[0] = 16;
b.shader->info.cs.local_size[1] = 16;
b.shader->info.cs.local_size[2] = 1;
@@ -488,18 +489,18 @@ build_nir_itoi_compute_shader(struct radv_device *dev)
 
nir_intrinsic_instr *src_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(src_offset, 0);
-   nir_intrinsic_set_range(src_offset, 16);
+   nir_intrinsic_set_range(src_offset, 24);
src_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
-   src_offset->num_components = 2;
-   nir_ssa_dest_init(&src_offset->instr, &src_offset->dest, 2, 32, "src_offset");
+   src_offset->num_components = is_3d ? 3 : 2;
+   nir_ssa_dest_init(&src_offset->instr, &src_offset->dest, is_3d ? 3 : 2, 32, "src_offset");
nir_builder_instr_insert(&b, &src_offset->instr);
 
nir_intrinsic_instr *dst_offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(dst_offset, 0);
-   nir_intrinsic_set_range(dst_offset, 16);
-   dst_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
-   dst_offset->num_components = 2;
-   nir_ssa_dest_init(&dst_offset->instr, &dst_offset->dest, 2, 32, "dst_offset");
+   nir_intrinsic_set_range(dst_offset, 24);
+   dst_offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 12));
+   dst_offset->num_components = is_3d ? 3 : 2;
+   nir_ssa_dest_init(&dst_offset->instr, &dst_offset->dest, is_3d ? 3 : 2, 32, "dst_offset");
nir_builder_instr_insert(&b, &dst_offset->instr);
 
nir_ssa_def *src_coord = nir_iadd(&b, global_id, &src_offset->dest.ssa);
@@ -507,15 +508,15 @@ build_nir_itoi_compute_shader(struct radv_device *dev)
nir_ssa_def *dst_coord = nir_iadd(&b, global_id, &dst_offset->dest.ssa);
 
nir_tex_instr *tex = nir_tex_instr_create(b.shader, 2);
-   tex->sampler_dim = GLSL_SAMPLER_DIM_2D;
+   tex->sampler_dim = dim;
tex->op = nir_texop_txf;
tex->src[0].src_type = nir_tex_src_coord;
-   tex->src[0].src = nir_src_for_ssa(nir_channels(&b, src_coord, 3));
+   tex->src[0].src = nir_src_for_ssa(nir_channels(&b, src_coord, is_3d ? 0x7 : 0x3));
tex->src[1].src_type = nir_tex_src_lod;
tex->src[1].src = nir_src_for_ssa(nir_imm_int(&b, 0));
tex->dest_type = nir_type_float;
tex->is_array = false;
-   tex->coord_components = 2;
+   tex->coord_components = is_3d ? 3 : 2;
tex->texture = nir_deref_var_create(tex, input_img);
tex->sampler = NULL;
 
@@ -539,9 +540,10 @@ radv_device_init_meta_itoi_state(struct radv_device *device)
 {
VkResult result;
struct radv_shader_module cs = { .nir = NULL };
-
-   cs.nir = build_nir_itoi_compute_shader(device);
-
+   struct radv_shader_module cs_3d = { .nir = NULL };
+   cs.nir = build_nir_itoi_compute_shader(device, false);
+   if (device->physical_device->rad_info.chip_class >= GFX9)
+   cs_3d.nir = build_nir_itoi_compute_shader(device, true);
/*
  

[Mesa-dev] [PATCH 3/3] radv/gfx9: fix buffer to image for 3d images on compute queues

2017-12-28 Thread Dave Airlie
From: Dave Airlie 

This fixes some of the broken:
dEQP-VK.synchronization.op.multi_queue.*64x64x8* tests.

Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."'
Signed-off-by: Dave Airlie 
---
 src/amd/vulkan/radv_meta_bufimage.c | 62 -
 src/amd/vulkan/radv_private.h   |  1 +
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/src/amd/vulkan/radv_meta_bufimage.c b/src/amd/vulkan/radv_meta_bufimage.c
index 1696f85ac1..5bcc1e62db 100644
--- a/src/amd/vulkan/radv_meta_bufimage.c
+++ b/src/amd/vulkan/radv_meta_bufimage.c
@@ -259,19 +259,20 @@ radv_device_finish_meta_itob_state(struct radv_device *device)
 }
 
 static nir_shader *
-build_nir_btoi_compute_shader(struct radv_device *dev)
+build_nir_btoi_compute_shader(struct radv_device *dev, bool is_3d)
 {
nir_builder b;
+   enum glsl_sampler_dim dim = is_3d ? GLSL_SAMPLER_DIM_3D : GLSL_SAMPLER_DIM_2D;
const struct glsl_type *buf_type = glsl_sampler_type(GLSL_SAMPLER_DIM_BUF,
 false,
 false,
 GLSL_TYPE_FLOAT);
-   const struct glsl_type *img_type = glsl_sampler_type(GLSL_SAMPLER_DIM_2D,
+   const struct glsl_type *img_type = glsl_sampler_type(dim,
 false,
 false,
 GLSL_TYPE_FLOAT);
nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
-   b.shader->info.name = ralloc_strdup(b.shader, "meta_btoi_cs");
+   b.shader->info.name = ralloc_strdup(b.shader, is_3d ? "meta_btoi_cs_3d" : "meta_btoi_cs");
b.shader->info.cs.local_size[0] = 16;
b.shader->info.cs.local_size[1] = 16;
b.shader->info.cs.local_size[2] = 1;
@@ -296,16 +297,16 @@ build_nir_btoi_compute_shader(struct radv_device *dev)
 
nir_intrinsic_instr *offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(offset, 0);
-   nir_intrinsic_set_range(offset, 12);
+   nir_intrinsic_set_range(offset, 16);
offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
-   offset->num_components = 2;
-   nir_ssa_dest_init(&offset->instr, &offset->dest, 2, 32, "offset");
+   offset->num_components = is_3d ? 3 : 2;
+   nir_ssa_dest_init(&offset->instr, &offset->dest, is_3d ? 3 : 2, 32, "offset");
nir_builder_instr_insert(&b, &offset->instr);
 
nir_intrinsic_instr *stride = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
nir_intrinsic_set_base(stride, 0);
-   nir_intrinsic_set_range(stride, 12);
-   stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
+   nir_intrinsic_set_range(stride, 16);
+   stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 12));
stride->num_components = 1;
nir_ssa_dest_init(&stride->instr, &stride->dest, 1, 32, "stride");
nir_builder_instr_insert(&b, &stride->instr);
@@ -353,9 +354,10 @@ radv_device_init_meta_btoi_state(struct radv_device *device)
 {
VkResult result;
struct radv_shader_module cs = { .nir = NULL };
-
-   cs.nir = build_nir_btoi_compute_shader(device);
-
+   struct radv_shader_module cs_3d = { .nir = NULL };
+   cs.nir = build_nir_btoi_compute_shader(device, false);
+   if (device->physical_device->rad_info.chip_class >= GFX9)
+   cs_3d.nir = build_nir_btoi_compute_shader(device, true);
/*
 * two descriptors one for the image being sampled
 * one for the buffer being written.
@@ -395,7 +397,7 @@ radv_device_init_meta_btoi_state(struct radv_device *device)
.setLayoutCount = 1,
.pSetLayouts = &device->meta_state.btoi.img_ds_layout,
.pushConstantRangeCount = 1,
-   .pPushConstantRanges = &(VkPushConstantRange){VK_SHADER_STAGE_COMPUTE_BIT, 0, 12},
+   .pPushConstantRanges = &(VkPushConstantRange){VK_SHADER_STAGE_COMPUTE_BIT, 0, 16},
};
 
result = radv_CreatePipelineLayout(radv_device_to_handle(device),
@@ -429,9 +431,33 @@ radv_device_init_meta_btoi_state(struct radv_device *device)
if (result != VK_SUCCESS)
goto fail;
 
+   if (device->physical_device->rad_info.chip_class >= GFX9) {
+   VkPipelineShaderStageCreateInfo pipeline_shader_stage_3d = {
+   .sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
+   .stage = VK_SHADER_STAGE_COMPUTE_BIT,
+   .module = radv_shader_module_to_handle(&cs_3d),
+   .pName = "main",
+   .pSpecializationInfo = NULL,
+   };
+
+   VkComputePipelineCreateInfo vk_pipeline_info_3d = {
+   
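In patches 1/3 and 3/3 the offsets grow to three components, which moves the following field from byte 8 to byte 12 and grows the push-constant ranges to 24 (itoi) and 16 (btoi) bytes. The 2D shaders still read only two offset components but share the layout. A C sketch of both layouts, assuming 32-bit fields (struct names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the itoi layout (patch 1/3): 24 bytes,
 * source x/y/z offset at byte 0, destination x/y/z offset at byte 12. */
struct itoi_push_constants {
   int32_t src_offset[3];
   int32_t dst_offset[3];
};

/* Hypothetical mirror of the btoi layout (patch 3/3): 16 bytes,
 * image x/y/z offset at byte 0, buffer stride at byte 12. */
struct btoi_push_constants {
   int32_t offset[3];
   uint32_t stride;
};

_Static_assert(offsetof(struct itoi_push_constants, dst_offset) == 12,
               "itoi dst_offset is read from byte 12");
_Static_assert(sizeof(struct itoi_push_constants) == 24,
               "itoi push-constant range is 24 bytes");
_Static_assert(offsetof(struct btoi_push_constants, stride) == 12,
               "btoi stride is read from byte 12");
_Static_assert(sizeof(struct btoi_push_constants) == 16,
               "btoi push-constant range is 16 bytes");
```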

Re: [Mesa-dev] [PATCH] radv: enable denorms for 64-bit and 16-bit floats

2017-12-28 Thread Samuel Pitoiset

On 12/28/2017 11:08 PM, Matt Arsenault wrote:
>
>> On Dec 28, 2017, at 16:55, Samuel Pitoiset  wrote:
>>
>> Similar to RadeonSI.
>>
>> This fixes:
>> dEQP-VK.image.texel_view_compatible.graphic.basic.attachment_read.bc*r16g16b16a16_sfloat
>> dEQP-VK.image.extended_usage_bit.attachment_write.r16_sfloat
>>
>> Signed-off-by: Samuel Pitoiset
>> ---
>> src/amd/common/ac_nir_to_llvm.c | 14 ++
>> 1 file changed, 14 insertions(+)
>>
>> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
>> index d9f2cb408c..9d9a1f911b 100644
>> --- a/src/amd/common/ac_nir_to_llvm.c
>> +++ b/src/amd/common/ac_nir_to_llvm.c
>> @@ -6879,6 +6879,20 @@ static void ac_compile_llvm_module(LLVMTargetMachineRef tm,
>> /* +3 for scratch wave offset and VCC */
>> config->num_sgprs = MAX2(config->num_sgprs,
>>  shader_info->num_input_sgprs + 3);
>> +
>> +   /* Enable 64-bit and 16-bit denormals, because there is no performance
>> +* cost.
>> +*
>> +* If denormals are enabled, all floating-point output modifiers are
>> +* ignored.
>> +*
>> +* Don't enable denormals for 32-bit floats, because:
>> +* - Floating-point output modifiers would be ignored by the hw.
>> +* - Some opcodes don't support denormals, such as v_mad_f32. We would
>> +*   have to stop using those.
>> +* - SI & CI would be very slow.
>> +*/
>> +   config->float_mode |= V_00B028_FP_64_DENORMS;
>> }
>
> This is set in the program binary. You should use that directly rather than ignoring it
>

Ah, I didn't know.



Re: [Mesa-dev] [PATCH] radv: enable denorms for 64-bit and 16-bit floats

2017-12-28 Thread Matt Arsenault


> On Dec 28, 2017, at 16:55, Samuel Pitoiset  wrote:
> 
> Similar to RadeonSI.
> 
> This fixes:
> dEQP-VK.image.texel_view_compatible.graphic.basic.attachment_read.bc*r16g16b16a16_sfloat
> dEQP-VK.image.extended_usage_bit.attachment_write.r16_sfloat
> 
> Signed-off-by: Samuel Pitoiset 
> ---
> src/amd/common/ac_nir_to_llvm.c | 14 ++
> 1 file changed, 14 insertions(+)
> 
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index d9f2cb408c..9d9a1f911b 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -6879,6 +6879,20 @@ static void ac_compile_llvm_module(LLVMTargetMachineRef tm,
>   /* +3 for scratch wave offset and VCC */
>   config->num_sgprs = MAX2(config->num_sgprs,
>shader_info->num_input_sgprs + 3);
> +
> + /* Enable 64-bit and 16-bit denormals, because there is no performance
> +  * cost.
> +  *
> +  * If denormals are enabled, all floating-point output modifiers are
> +  * ignored.
> +  *
> +  * Don't enable denormals for 32-bit floats, because:
> +  * - Floating-point output modifiers would be ignored by the hw.
> +  * - Some opcodes don't support denormals, such as v_mad_f32. We would
> +  *   have to stop using those.
> +  * - SI & CI would be very slow.
> +  */
> + config->float_mode |= V_00B028_FP_64_DENORMS;
> }

This is set in the program binary. You should use that directly rather than ignoring it


[Mesa-dev] [PATCH] radv: enable denorms for 64-bit and 16-bit floats

2017-12-28 Thread Samuel Pitoiset
Similar to RadeonSI.

This fixes:
dEQP-VK.image.texel_view_compatible.graphic.basic.attachment_read.bc*r16g16b16a16_sfloat
dEQP-VK.image.extended_usage_bit.attachment_write.r16_sfloat

Signed-off-by: Samuel Pitoiset 
---
 src/amd/common/ac_nir_to_llvm.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index d9f2cb408c..9d9a1f911b 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -6879,6 +6879,20 @@ static void ac_compile_llvm_module(LLVMTargetMachineRef tm,
/* +3 for scratch wave offset and VCC */
config->num_sgprs = MAX2(config->num_sgprs,
 shader_info->num_input_sgprs + 3);
+
+   /* Enable 64-bit and 16-bit denormals, because there is no performance
+* cost.
+*
+* If denormals are enabled, all floating-point output modifiers are
+* ignored.
+*
+* Don't enable denormals for 32-bit floats, because:
+* - Floating-point output modifiers would be ignored by the hw.
+* - Some opcodes don't support denormals, such as v_mad_f32. We would
+*   have to stop using those.
+* - SI & CI would be very slow.
+*/
+   config->float_mode |= V_00B028_FP_64_DENORMS;
 }
 
 static void
-- 
2.15.1
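A C sketch of what the OR in the patch does to the FLOAT_MODE word. The bit positions are an assumption based on the GCN MODE register layout as we understand it (rounding in bits 3:0, FP32 denormals in bits 5:4, FP64/FP16 denormals in bits 7:6, with 3 meaning denormals are kept on input and output); the macro names are hypothetical, and the real values live in sid.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed FLOAT_MODE denormal fields (hypothetical names). */
#define SKETCH_FP_32_DENORMS (3u << 4) /* bits 5:4, single precision */
#define SKETCH_FP_64_DENORMS (3u << 6) /* bits 7:6, double and half precision */

/* Enable FP64/FP16 denormals while leaving FP32 flushing untouched. */
static uint32_t enable_fp64_fp16_denorms(uint32_t float_mode)
{
   return float_mode | SKETCH_FP_64_DENORMS;
}

/* True only if both FP32 denormal bits are set. */
static int fp32_denorms_enabled(uint32_t float_mode)
{
   return (float_mode & SKETCH_FP_32_DENORMS) == SKETCH_FP_32_DENORMS;
}
```

Matt Arsenault's review suggests using the float mode that is already set in the program binary instead of OR-ing a value in here; the sketch does not model that.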



[Mesa-dev] [Bug 104351] X Error of failed request: BadAlloc (insufficient resources for operation)

2017-12-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104351

Emil Velikov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #5 from Emil Velikov  ---
Let's mark this as a duplicate for now.
If the fix for bug 104306 does not help, feel free to reopen.

In the interim, use the Xorg workaround and keep a close eye on bug 104306.

*** This bug has been marked as a duplicate of bug 104306 ***

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] Buffer update assert with multiple contexts.

2017-12-28 Thread corngood

I'm hitting this assert in radeonsi: si_descriptors.c:1414:
si_desc_reset_buffer_offset: Assertion `old_buf_va <= old_desc_va' failed.

It seems to happen when a buffer is updated after being bound (as a
uniform buffer) on multiple contexts in a share group. If it's bound
on contexts A and B and then reallocated (BufferSubData) on context A,
the descriptor on context B is never updated, so the next time the
buffer is updated on B, you hit this assert.
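A heavily simplified C model of the failure described above may help. Everything in it is hypothetical (a toy allocator that only hands out increasing addresses, one cached descriptor address per context); it is not radeonsi code and only mimics the `old_buf_va <= old_desc_va` check from the assertion message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: a buffer's current GPU address, and the address each
 * context has cached in its uniform-buffer descriptor. */
struct buffer  { uint64_t va; };
struct context { uint64_t ubo_desc_va; };

/* Toy allocator: every new allocation gets a higher address. */
static uint64_t next_va = 0x1000;

static uint64_t alloc_va(void)
{
   uint64_t va = next_va;
   next_va += 0x1000;
   return va;
}

static struct buffer create_buffer(void)
{
   struct buffer buf = { alloc_va() };
   return buf;
}

static void bind_ubo(struct context *ctx, const struct buffer *buf)
{
   ctx->ubo_desc_va = buf->va;
}

/* Reallocate the storage from context `cur`.  Only `cur`'s descriptor is
 * patched; other contexts keep the old address.  Returns whether `cur`'s
 * descriptor passed a check modelled on old_buf_va <= old_desc_va. */
static bool realloc_on(struct context *cur, struct buffer *buf)
{
   uint64_t old_buf_va = buf->va;
   bool ok = old_buf_va <= cur->ubo_desc_va;

   buf->va = alloc_va();
   cur->ubo_desc_va = buf->va;
   return ok;
}
```

Following the test sequence, the "green" update on B passes the check, A's descriptor is left pointing at the first allocation (the red draw), and the "blue" update on A fails the check.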

I've attached a test for piglit, which shows the assert, and also shows
that the updated buffer contents aren't used when the buffer is updated
from another context.  The test does:

- create context A+B
- make A current
  - create buffer
  - set buffer content to red
  - bind buffer as uniform
  - draw quad with uniform colour
- make B current
  - bind buffer as uniform
  - draw quad with uniform colour
  - set buffer content to green
  - draw quad with uniform colour
- make A current
  - draw quad with uniform colour
  ** this one ends up red
  - set buffer content to blue
  ** this results in the assert
  - draw quad with uniform colour

Setting all resources as shared stops the assert and the incorrect (?)
output:

--- a/src/gallium/auxiliary/util/u_threaded_context.c
+++ b/src/gallium/auxiliary/util/u_threaded_context.c
@@ -265,7 +265,7 @@ threaded_resource_init(struct pipe_resource *res)
tres->latest = &tres->b;
util_range_init(&tres->valid_buffer_range);
tres->base_valid_buffer_range = &tres->valid_buffer_range;
-   tres->is_shared = false;
+   tres->is_shared = true;
tres->is_user_ptr = false;
 }
 
I've tested mesa 17.1, 17.2, 17.3, and master, and they all seem to
behave the same way.

Is this a bug, or am I misunderstanding what's allowed with shared
buffers?  I haven't found any clear documentation on sharing.

Assuming it's a bug, does anyone have any thoughts on fixing it?  As far
as I can tell there's currently no way for buffers to be marked as
'is_shared'.  Getting the descriptor rebinding working for multiple
contexts seems like it would be tricky.


From 4208b7b105364e1f5064de123a507247344c8694 Mon Sep 17 00:00:00 2001
From: David McFarland 
Date: Thu, 28 Dec 2017 14:47:43 -0400
Subject: [PATCH] Add glx buffer sharing test.

---
 tests/all.py   |   1 +
 tests/glx/CMakeLists.gl.txt|   1 +
 tests/glx/glx-buffer-sharing.c | 277 +
 3 files changed, 279 insertions(+)
 create mode 100644 tests/glx/glx-buffer-sharing.c

diff --git a/tests/all.py b/tests/all.py
index 5a1368a8e..6ee094e10 100644
--- a/tests/all.py
+++ b/tests/all.py
@@ -515,6 +515,7 @@ with profile.test_list.group_manager(PiglitGLTest, 'shaders') as g:
 with profile.test_list.group_manager(
 PiglitGLTest, 'glx',
 require_platforms=['glx', 'mixed_glx_egl']) as g:
+g(['glx-buffer-sharing'], run_concurrent=False)
 g(['glx-destroycontext-1'], run_concurrent=False)
 g(['glx-destroycontext-2'], run_concurrent=False)
 g(['glx-dont-care-mask'], run_concurrent=False)
diff --git a/tests/glx/CMakeLists.gl.txt b/tests/glx/CMakeLists.gl.txt
index bd3780b8d..ff1182bc2 100644
--- a/tests/glx/CMakeLists.gl.txt
+++ b/tests/glx/CMakeLists.gl.txt
@@ -21,6 +21,7 @@ IF(PIGLIT_BUILD_GLX_TESTS)
 	link_libraries (
 		${X11_X11_LIB}
 	)
+	piglit_add_executable (glx-buffer-sharing glx-buffer-sharing.c)
 	piglit_add_executable (glx-fbconfig-sanity glx-fbconfig-sanity.c)
 	piglit_add_executable (glx-fbconfig-compliance glx-fbconfig-compliance.c)
 	piglit_add_executable (glx-fbconfig-bad glx-fbconfig-bad.c)
diff --git a/tests/glx/glx-buffer-sharing.c b/tests/glx/glx-buffer-sharing.c
new file mode 100644
index 0..58cd09fe9
--- /dev/null
+++ b/tests/glx/glx-buffer-sharing.c
@@ -0,0 +1,277 @@
+/*
+ * Copyright (c) 2010 VMware, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "piglit-util-gl.h"
+#include 

Re: [Mesa-dev] [PATCH] svga: update SVGA_NEW_ flags for updating sampler state

2017-12-28 Thread Charmaine Lee

Looks good.

Reviewed-by: Charmaine Lee 


From: Brian Paul 
Sent: Thursday, December 28, 2017 11:09:34 AM
To: mesa-dev@lists.freedesktop.org
Cc: Neha Bhende; Charmaine Lee
Subject: [PATCH] svga: update SVGA_NEW_ flags for updating sampler state

The SVGA_NEW_FS flag is needed since we now examine the fragment
shader's fs_shadow_compare_units flags.  The SVGA_NEW_TEXTURE_FLAGS
flag is not needed since it's only for pre-VGPU10.

No piglit changes.  This doesn't fix any known issues but it could
pop up somewhere.  Suggested by Charmaine.
---
 src/gallium/drivers/svga/svga_state_sampler.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_state_sampler.c b/src/gallium/drivers/svga/svga_state_sampler.c
index 11f36e3..bcc055d 100644
--- a/src/gallium/drivers/svga/svga_state_sampler.c
+++ b/src/gallium/drivers/svga/svga_state_sampler.c
@@ -393,6 +393,7 @@ update_samplers(struct svga_context *svga, unsigned dirty )
   for (i = 0; i < count; i++) {
  bool fs_shadow = false;

+ /* _NEW_FS */
  if (shader == PIPE_SHADER_FRAGMENT) {
 struct svga_shader_variant *fs = svga->state.hw_draw.fs;
 /* If the fragment shader is doing the shadow comparison
@@ -469,8 +470,8 @@ update_samplers(struct svga_context *svga, unsigned dirty )

 struct svga_tracked_state svga_hw_sampler = {
"texture sampler emit",
-   (SVGA_NEW_SAMPLER |
-SVGA_NEW_STIPPLE |
-SVGA_NEW_TEXTURE_FLAGS),
+   (SVGA_NEW_FS |
+SVGA_NEW_SAMPLER |
+SVGA_NEW_STIPPLE),
update_samplers
 };
--
1.9.1



[Mesa-dev] [PATCH] svga: update SVGA_NEW_ flags for updating sampler state

2017-12-28 Thread Brian Paul
The SVGA_NEW_FS flag is needed since we now examine the fragment
shader's fs_shadow_compare_units flags.  The SVGA_NEW_TEXTURE_FLAGS
flag is not needed since it's only for pre-VGPU10.

No piglit changes.  This doesn't fix any known issues but it could
pop up somewhere.  Suggested by Charmaine.
---
 src/gallium/drivers/svga/svga_state_sampler.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_state_sampler.c b/src/gallium/drivers/svga/svga_state_sampler.c
index 11f36e3..bcc055d 100644
--- a/src/gallium/drivers/svga/svga_state_sampler.c
+++ b/src/gallium/drivers/svga/svga_state_sampler.c
@@ -393,6 +393,7 @@ update_samplers(struct svga_context *svga, unsigned dirty )
   for (i = 0; i < count; i++) {
  bool fs_shadow = false;
 
+ /* _NEW_FS */
  if (shader == PIPE_SHADER_FRAGMENT) {
 struct svga_shader_variant *fs = svga->state.hw_draw.fs;
 /* If the fragment shader is doing the shadow comparison
@@ -469,8 +470,8 @@ update_samplers(struct svga_context *svga, unsigned dirty )
 
 struct svga_tracked_state svga_hw_sampler = {
"texture sampler emit",
-   (SVGA_NEW_SAMPLER |
-SVGA_NEW_STIPPLE |
-SVGA_NEW_TEXTURE_FLAGS),
+   (SVGA_NEW_FS |
+SVGA_NEW_SAMPLER |
+SVGA_NEW_STIPPLE),
update_samplers
 };
-- 
1.9.1
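The patch only changes which dirty bits trigger the "texture sampler emit" atom. A minimal C sketch of how such a dirty mask gates an atom's update function, using hypothetical bit values and a simplified struct (not the real svga_tracked_state or SVGA_NEW_* definitions):

```c
#include <assert.h>

/* Hypothetical dirty bits; the real SVGA_NEW_* values differ. */
enum {
   NEW_FS            = 1u << 0,
   NEW_SAMPLER       = 1u << 1,
   NEW_STIPPLE       = 1u << 2,
   NEW_TEXTURE_FLAGS = 1u << 3,
};

/* Simplified stand-in for a tracked-state atom. */
struct tracked_state {
   const char *name;
   unsigned dirty;                  /* bits that force this atom to re-run */
   void (*update)(unsigned dirty);
};

static int sampler_emits;            /* counts how often the atom ran */

static void update_samplers_stub(unsigned dirty)
{
   (void)dirty;
   sampler_emits++;
}

static const struct tracked_state hw_sampler_before = {
   "texture sampler emit", NEW_SAMPLER | NEW_STIPPLE | NEW_TEXTURE_FLAGS,
   update_samplers_stub
};

static const struct tracked_state hw_sampler_after = {
   "texture sampler emit", NEW_FS | NEW_SAMPLER | NEW_STIPPLE,
   update_samplers_stub
};

/* Run the atom only if something it depends on changed. */
static void run_atom(const struct tracked_state *atom, unsigned dirty)
{
   if (dirty & atom->dirty)
      atom->update(dirty);
}
```

With the old mask, a fragment-shader change alone never re-runs the sampler atom, even though update_samplers() now reads the fragment shader's fs_shadow_compare_units flags.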



Re: [Mesa-dev] [PATCH 4/4] anv/device: Mark all state buffers as needing capture

2017-12-28 Thread Jason Ekstrand
Thanks!  I've pushed the last 3.  I'll let the debate continue on 1/4. :-)

On Thu, Dec 28, 2017 at 9:25 AM, Lionel Landwerlin <
lionel.g.landwer...@intel.com> wrote:

> Reviewed-by: Lionel Landwerlin 
>
>
> On 27/12/17 20:58, Jason Ekstrand wrote:
>
>> Previously, we were flagging the instruction state buffer for capture
>> but not surface state or dynamic state.  We want those captured too.
>> ---
>>   src/intel/vulkan/anv_device.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
>> index 4638f31..680f5a7 100644
>> --- a/src/intel/vulkan/anv_device.c
>> +++ b/src/intel/vulkan/anv_device.c
>> @@ -1251,7 +1251,8 @@ VkResult anv_CreateDevice(
>> goto fail_batch_bo_pool;
>>/* For the state pools we explicitly disable 48bit. */
>> -   bo_flags = physical_device->has_exec_async ? EXEC_OBJECT_ASYNC : 0;
>> +   bo_flags = (physical_device->has_exec_async ? EXEC_OBJECT_ASYNC : 0) |
>> +  (physical_device->has_exec_capture ? EXEC_OBJECT_CAPTURE : 0);
>>result = anv_state_pool_init(&device->dynamic_state_pool, device, 16384,
>>   bo_flags);
>> @@ -1259,8 +1260,7 @@ VkResult anv_CreateDevice(
>> goto fail_bo_cache;
>>result = anv_state_pool_init(&device->instruction_state_pool, device, 16384,
>> -bo_flags |
>> -(physical_device->has_exec_capture ? EXEC_OBJECT_CAPTURE : 0));
>> +bo_flags);
>>  if (result != VK_SUCCESS)
>> goto fail_dynamic_state_pool;
>>
>>
>
>
>
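The change boils down to computing one flag word and using it for every state pool initialized with bo_flags, instead of adding the capture bit only for the instruction pool. A C sketch with hypothetical bit values and struct (the real EXEC_OBJECT_* values come from the kernel's i915_drm.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits standing in for EXEC_OBJECT_ASYNC and
 * EXEC_OBJECT_CAPTURE. */
#define SKETCH_OBJECT_ASYNC   (1u << 0)
#define SKETCH_OBJECT_CAPTURE (1u << 1)

/* Hypothetical subset of the physical-device capabilities. */
struct sketch_physical_device {
   bool has_exec_async;
   bool has_exec_capture;
};

/* One flag word shared by all state pools, so that dynamic, surface and
 * instruction state are all flagged for capture when supported. */
static uint32_t state_pool_bo_flags(const struct sketch_physical_device *pd)
{
   return (pd->has_exec_async ? SKETCH_OBJECT_ASYNC : 0) |
          (pd->has_exec_capture ? SKETCH_OBJECT_CAPTURE : 0);
}
```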


Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

2017-12-28 Thread Miguel Angel Vico
(Adding dri-devel back, and trying to respond to some comments from
the different forks)

James Jones wrote:

> Your worst case analysis above isn't far off from our HW, give or take
> some bits and axes here and there.  We've started an internal discussion
> about how to lay out all the bits we need.  It's hard to even enumerate
> them all without having a complete understanding of what capability sets
> are going to include, a fully-optimized implementation of the mechanism
> on our HW, and lots of test scenarios though.

(thanks James for most of the info below)

To elaborate a bit, if we want to share an allocation across GPUs for 3D
rendering, it seems we would need 12 bits to express our
swizzling/tiling memory layouts for fermi+. In addition to that,
maxwell uses 3 more bits for this, and we need an extra bit to identify
pre-fermi representations.

We also need one bit to differentiate between Tegra and desktop, and
another one to indicate whether the layout is otherwise linear.

Then things like whether compression is used (one more bit), and we can
probably get by with 3 bits for the type of compression if we are
creative. However, it'd be way easier to just track arch + page kind,
which would be like 32 bits on its own.

Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
bits.

If device-local properties are included, we might need a couple more
bits for caching.

We may also need to express locality information, which may take at
least another 2 or 3 bits.

If we want to share array textures too, we also need to pass the array
pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
its own.

So yes, as James mentioned, with some effort, we could technically fit
our current allocation parameters in a modifier, but I'm still not
convinced this is as future proof as it could be as our hardware grows
in capabilities.


Daniel Stone wrote:

> So I reflexively
> get a bit itchy when I see the kernel being used to transit magic
> blobs of data which are supplied by userspace, and only interpreted by
> different userspace. Having tiling formats hidden away means that
> we've had real-world bugs in AMD hardware, where we end up displaying
> garbage because we cannot generically reason about the buffer
> attributes.  

I'm a bit confused. Can't modifiers be specified by vendors and only
interpreted by drivers? My understanding was that modifiers could
actually be treated as opaque 64-bit data, in which case they would
qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
scalable. What am I missing?


Daniel Vetter wrote:

> I think in the interim figuring out how to expose kms capabilities
> better (and necessarily standardizing at least some of them which
> matter at the compositor level, like size limits of framebuffers)
> feels like the place to push the ecosystem forward. In some way
> Miguel's proposal looks a bit backwards, since it adds the pitch
> capabilities to addfb, but at addfb time you've allocated everything
> already, so way too late to fix things up. With modifiers we've added
> a very simple per-plane property to list which modifiers can be
> combined with which pixel formats. Tiny start, but obviously very far
> from all that we'll need.  

Not sure whether I might be misunderstanding your statement, but one of
the allocator's main features is the negotiation of nearly optimal
allocation parameters, given a set of uses on different devices/engines,
through the capability merge operation. A client should have queried
what every device/engine is capable of for the given uses, found the
optimal set of capabilities, and used it to allocate a buffer. By the
time these parameters are given to KMS, they are expected to be good. If they
aren't, the client didn't do things right.


Rob Clark wrote:

> It does seem like, if possible, starting out with modifiers for now at
> the kernel interface would make life easier, vs trying to reinvent
> both kernel and userspace APIs at the same time.  Userspace APIs are
> easier to change or throw away.  Presumably by the time we get to the
> point of changing kernel uabi, we are already using, and pretty happy
> with, serialized liballoc data over the wire in userspace so it is
> only a matter of changing the kernel interface.  

I guess we can indeed start with modifiers for now, if that's what it
takes to get the allocator mechanisms rolling. However, it seems to me
that we won't be able to encode the same type of information included
in capability sets with modifiers in all cases. For instance, if we end
up encoding usage transition information in capability sets, how would
that translate to modifiers?

I assume display doesn't really care about a lot of the data capability
sets may encode, but is it correct to think of modifiers as things only
display needs? If we are to treat modifiers as a first-class citizen, I
would expect to use them beyond that.


Kristian Kristensen wrote:

> I agree and let me 

Re: [Mesa-dev] [PATCH 4/4] anv/device: Mark all state buffers as needing capture

2017-12-28 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 27/12/17 20:58, Jason Ekstrand wrote:

Previously, we were flagging the instruction state buffer for capture
but not surface state or dynamic state.  We want those captured too.
---
  src/intel/vulkan/anv_device.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 4638f31..680f5a7 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -1251,7 +1251,8 @@ VkResult anv_CreateDevice(
       goto fail_batch_bo_pool;
 
    /* For the state pools we explicitly disable 48bit. */
-   bo_flags = physical_device->has_exec_async ? EXEC_OBJECT_ASYNC : 0;
+   bo_flags = (physical_device->has_exec_async ? EXEC_OBJECT_ASYNC : 0) |
+              (physical_device->has_exec_capture ? EXEC_OBJECT_CAPTURE : 0);
 
    result = anv_state_pool_init(&device->dynamic_state_pool, device, 16384,
                                 bo_flags);
@@ -1259,8 +1260,7 @@ VkResult anv_CreateDevice(
       goto fail_bo_cache;
 
    result = anv_state_pool_init(&device->instruction_state_pool, device, 16384,
-                                bo_flags |
-                                (physical_device->has_exec_capture ? EXEC_OBJECT_CAPTURE : 0));
+                                bo_flags);
    if (result != VK_SUCCESS)
       goto fail_dynamic_state_pool;
 





Re: [Mesa-dev] [PATCH 3/4] intel/aubinator: Gracefully handle dynamic state not being available

2017-12-28 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 27/12/17 20:58, Jason Ekstrand wrote:

Some older versions of the Vulkan driver didn't properly tag dynamic
state as needing to be captured.  Also, this prevents crashes when
looking at dumps on older kernels.
---
  src/intel/tools/gen_batch_decoder.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/src/intel/tools/gen_batch_decoder.c b/src/intel/tools/gen_batch_decoder.c
index e8dd19b..f09b833 100644
--- a/src/intel/tools/gen_batch_decoder.c
+++ b/src/intel/tools/gen_batch_decoder.c
@@ -582,6 +582,11 @@ decode_dynamic_state_pointers(struct gen_batch_decode_ctx *ctx,
const char *struct_type, const uint32_t *p,
int count)
  {
+   if (ctx->dynamic_base.map == NULL) {
+  fprintf(ctx->fp, "  dynamic %s state unavailable\n", struct_type);
+  return;
+   }
+
 struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
 struct gen_group *state = gen_spec_find_struct(ctx->spec, struct_type);
  





Re: [Mesa-dev] [PATCH 2/4] intel/aubinator: Free section data last

2017-12-28 Thread Lionel Landwerlin

Good catch!

Reviewed-by: Lionel Landwerlin 

On 27/12/17 20:58, Jason Ekstrand wrote:

We were walking the sections, printing the batches, and then freeing
them in one pass.  If the batch happens to reference any earlier
sections (which it almost certainly will since it's at the end), we will
access freed memory.
---
  src/intel/tools/aubinator_error_decode.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/intel/tools/aubinator_error_decode.c b/src/intel/tools/aubinator_error_decode.c
index f0c5b5b..5f5b6af 100644
--- a/src/intel/tools/aubinator_error_decode.c
+++ b/src/intel/tools/aubinator_error_decode.c
@@ -523,12 +523,14 @@ read_data_file(FILE *file)
          gen_print_batch(&batch_ctx, sections[s].data, sections[s].count,
                          sections[s].gtt_offset);
       }
+   }
+
+   gen_batch_decode_ctx_finish(&batch_ctx);
 
+   for (int s = 0; s < sect_num; s++) {
       free(sections[s].ring_name);
       free(sections[s].data);
    }
-
-   gen_batch_decode_ctx_finish(&batch_ctx);
 }
  
  static void





Re: [Mesa-dev] [PATCH] svga: check for null fs pointer in update_samplers()

2017-12-28 Thread Neha Bhende
Looks good.

Reviewed-by: Neha Bhende

From: Brian Paul 
Sent: Thursday, December 28, 2017 8:19:24 AM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee; Neha Bhende
Subject: [PATCH] svga: check for null fs pointer in update_samplers()

This can happen when there's no active fragment shader, such as
when using transform feedback.  This wasn't hit by any Piglit test
but is hit by Daniel Rákos' Nature demo.  VMware bug 2026189.
---
 src/gallium/drivers/svga/svga_state_sampler.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_state_sampler.c b/src/gallium/drivers/svga/svga_state_sampler.c
index 9bd0d53..11f36e3 100644
--- a/src/gallium/drivers/svga/svga_state_sampler.c
+++ b/src/gallium/drivers/svga/svga_state_sampler.c
@@ -399,7 +399,7 @@ update_samplers(struct svga_context *svga, unsigned dirty )
  * for this texture unit, don't enable shadow compare in
  * the texture sampler state.
  */
-if (fs->fs_shadow_compare_units & (1 << i)) {
+if (fs && (fs->fs_shadow_compare_units & (1 << i))) {
fs_shadow = true;
 }
  }
--
1.9.1



Re: [Mesa-dev] [PATCH] svga: check for null fs pointer in update_samplers()

2017-12-28 Thread Charmaine Lee

Reviewed-by: Charmaine Lee 


From: Brian Paul 
Sent: Thursday, December 28, 2017 8:19:24 AM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee; Neha Bhende
Subject: [PATCH] svga: check for null fs pointer in update_samplers()

This can happen when there's no active fragment shader, such as
when using transform feedback.  This wasn't hit by any Piglit test
but is hit by Daniel Rákos' Nature demo.  VMware bug 2026189.
---
 src/gallium/drivers/svga/svga_state_sampler.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_state_sampler.c b/src/gallium/drivers/svga/svga_state_sampler.c
index 9bd0d53..11f36e3 100644
--- a/src/gallium/drivers/svga/svga_state_sampler.c
+++ b/src/gallium/drivers/svga/svga_state_sampler.c
@@ -399,7 +399,7 @@ update_samplers(struct svga_context *svga, unsigned dirty )
  * for this texture unit, don't enable shadow compare in
  * the texture sampler state.
  */
-if (fs->fs_shadow_compare_units & (1 << i)) {
+if (fs && (fs->fs_shadow_compare_units & (1 << i))) {
fs_shadow = true;
 }
  }
--
1.9.1



[Mesa-dev] [PATCH] svga: check for null fs pointer in update_samplers()

2017-12-28 Thread Brian Paul
This can happen when there's no active fragment shader, such as
when using transform feedback.  This wasn't hit by any Piglit test
but is hit by Daniel Rákos' Nature demo.  VMware bug 2026189.
---
 src/gallium/drivers/svga/svga_state_sampler.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_state_sampler.c b/src/gallium/drivers/svga/svga_state_sampler.c
index 9bd0d53..11f36e3 100644
--- a/src/gallium/drivers/svga/svga_state_sampler.c
+++ b/src/gallium/drivers/svga/svga_state_sampler.c
@@ -399,7 +399,7 @@ update_samplers(struct svga_context *svga, unsigned dirty )
  * for this texture unit, don't enable shadow compare in
  * the texture sampler state.
  */
-if (fs->fs_shadow_compare_units & (1 << i)) {
+if (fs && (fs->fs_shadow_compare_units & (1 << i))) {
fs_shadow = true;
 }
  }
-- 
1.9.1



[Mesa-dev] [Bug 93551] Divinity: Original Sin Enhanced Edition(Native) crash on start

2017-12-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93551

--- Comment #47 from Robert G. Brown  ---
Installed the shim for Fedora 25, XFCE, and Steam.  It worked.  Upgraded to F26
(it no longer worked) and then to F27 (and it still no longer works).  I've
played with it a fair bit, reinstalled the Divinity binaries, and no matter
what I try, I still get the instant crash.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH 2/3] radv/radeonsi: set dcc min uncompressed properly for APUs.

2017-12-28 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Tue, Dec 26, 2017 at 11:19 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This is ported from amdvlk.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/vulkan/radv_device.c| 10 ++
>  src/gallium/drivers/radeonsi/si_state.c |  9 +
>  2 files changed, 19 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 63be8c53a9..c87d858a17 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -3162,6 +3162,15 @@ radv_initialise_color_surface(struct radv_device *device,
>
> if (device->physical_device->rad_info.chip_class >= VI) {
> unsigned max_uncompressed_block_size = V_028C78_MAX_BLOCK_SIZE_256B;
> +   unsigned min_compressed_block_size = V_028C78_MIN_BLOCK_SIZE_32B;
> +
> +   /* amdvlk: [min-compressed-block-size] should be set to 32 for dGPU and
> +  64 for APU because all of our APUs to date use DIMMs which have
> +  a request granularity size of 64B while all other chips have a
> +  32B request size */
> +   if (!device->physical_device->rad_info.has_dedicated_vram)
> +   min_compressed_block_size = V_028C78_MIN_BLOCK_SIZE_64B;
> +
> if (iview->image->info.samples > 1) {
> if (iview->image->surface.bpe == 1)
> max_uncompressed_block_size = V_028C78_MAX_BLOCK_SIZE_64B;
> @@ -3170,6 +3179,7 @@ radv_initialise_color_surface(struct radv_device *device,
> }
>
> cb->cb_dcc_control = S_028C78_MAX_UNCOMPRESSED_BLOCK_SIZE(max_uncompressed_block_size) |
> +   S_028C78_MIN_COMPRESSED_BLOCK_SIZE(min_compressed_block_size) |
> S_028C78_INDEPENDENT_64B_BLOCKS(1);
> }
>
> diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
> index 544bf6aa2f..db31fae60f 100644
> --- a/src/gallium/drivers/radeonsi/si_state.c
> +++ b/src/gallium/drivers/radeonsi/si_state.c
> @@ -2451,6 +2451,14 @@ static void si_initialize_color_surface(struct si_context *sctx,
>
> if (sctx->b.chip_class >= VI) {
> unsigned max_uncompressed_block_size = V_028C78_MAX_BLOCK_SIZE_256B;
> +   unsigned min_compressed_block_size = V_028C78_MIN_BLOCK_SIZE_32B;
> +
> +   /* amdvlk: [min-compressed-block-size] should be set to 32 for dGPU and
> +  64 for APU because all of our APUs to date use DIMMs which have
> +  a request granularity size of 64B while all other chips have a
> +  32B request size */
> +   if (!sctx->screen->info.has_dedicated_vram)
> +   min_compressed_block_size = V_028C78_MIN_BLOCK_SIZE_64B;
>
> if (rtex->resource.b.b.nr_samples > 1) {
> if (rtex->surface.bpe == 1)
> @@ -2460,6 +2468,7 @@ static void si_initialize_color_surface(struct si_context *sctx,
> }
>
> surf->cb_dcc_control = S_028C78_MAX_UNCOMPRESSED_BLOCK_SIZE(max_uncompressed_block_size) |
> +  S_028C78_MIN_COMPRESSED_BLOCK_SIZE(min_compressed_block_size) |
> S_028C78_INDEPENDENT_64B_BLOCKS(1);
> }
>
> --
> 2.14.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeonsi: don't use fast color clear for small images even on APUs

2017-12-28 Thread Bas Nieuwenhuizen
On Thu, Dec 28, 2017 at 3:54 PM, Marek Olšák  wrote:
> On Thu, Dec 28, 2017 at 12:29 PM, Konstantin Kharlamov
>  wrote:
>> I'm wondering, how is r600g different in that regard? I tried wiring up the 
>> code into evergreen_do_fast_color_clear(), both in this state and by using 
>> 256*256 — however FPS for me always varies around the same 1420.
>>
>> That said, I'm seeing lots of CPU used by Xorg, glxgears, and compton — I'm 
>> wondering if CPU cap could be the reason?
>
> r600g might benefit in the same way. glxgears requires the limit to be
> at least 300*300.

As was discussed on #radeon, his default window was much larger due to
a tiling window manager (683x768) and hence his changes did not
trigger.

- Bas
>
> Marek
>
>>
>> On Wednesday, 13 December 2017 2:53:12 MSK, Marek Olšák wrote:
>>> From: Marek Olšák 
>>>
>>> Increase the limit and handle non-square images better.
>>>
>>> This makes glxgears 20% faster on APUs, and a little more on dGPUs.
>>> We all use and love glxgears.
>>> ---
>>>  src/gallium/drivers/radeonsi/si_clear.c | 9 -
>>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_clear.c b/src/gallium/drivers/radeonsi/si_clear.c
>>> index 0ac83f4..464b9d7 100644
>>> --- a/src/gallium/drivers/radeonsi/si_clear.c
>>> +++ b/src/gallium/drivers/radeonsi/si_clear.c
>>> @@ -418,26 +418,25 @@ static void si_do_fast_color_clear(struct si_context *sctx,
>>>   sctx->b.family == CHIP_STONEY)
>>>   tex->num_slow_clears++;
>>>   }
>>>
>>>   bool need_decompress_pass = false;
>>>
>>>   /* Use a slow clear for small surfaces where the cost of
>>>* the eliminate pass can be higher than the benefit of fast
>>>* clear. The closed driver does this, but the numbers may differ.
>>>*
>>> -  * Always use fast clear on APUs.
>>> +  * This helps on both dGPUs and APUs, even small APUs like Mullins.
>>>*/
>>> - bool too_small = sctx->screen->info.has_dedicated_vram &&
>>> -  tex->resource.b.b.nr_samples <= 1 &&
>>> -  tex->resource.b.b.width0 <= 256 &&
>>> -  tex->resource.b.b.height0 <= 256;
>>> + bool too_small = tex->resource.b.b.nr_samples <= 1 &&
>>> +  tex->resource.b.b.width0 *
>>> +  tex->resource.b.b.height0 <= 512 * 512;
>>>
>>>   /* Try to clear DCC first, otherwise try CMASK. */
>>>   if (vi_dcc_enabled(tex, 0)) {
>>>   uint32_t reset_value;
>>>   bool clear_words_needed;
>>>
>>>   if (sctx->screen->debug_flags & DBG(NO_DCC_CLEAR))
>>>   continue;
>>>
>>>   /* This can only occur with MSAA. */
>>>
>>
>>
>>
>>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/6] amd/common: Add detection of the syncobj wait/signal/reset ioctls.

2017-12-28 Thread Marek Olšák
OK. I was confused because the name has_syncobj_wait suggests that
it's about amdgpu_cs_syncobj_wait, not WAIT_FOR_SUBMIT.

Marek

On Wed, Dec 27, 2017 at 1:18 AM, Bas Nieuwenhuizen
 wrote:
> For vulkan, I wanted this because of
>
> drm/syncobj: Allow wait for submit and signal behavior (v5)
>
> Vulkan VkFence semantics require that the application be able to perform
> a CPU wait on work which may not yet have been submitted.  This is
> perfectly safe because the CPU wait has a timeout which will get
> triggered eventually if no work is ever submitted.  This behavior is
> advantageous for multi-threaded workloads because, so long as all of the
> threads agree on what fences to use up-front, you don't have the extra
> cross-thread synchronization cost of thread A telling thread B that it
> has submitted its dependent work and thread B is now free to wait.
>
> Within a single process, this can be implemented in the userspace driver
> by doing exactly the same kind of tracking the app would have to do
> using posix condition variables or similar.  However, in order for this
> to work cross-process (as is required by VK_KHR_external_fence), we need
> to handle this in the kernel.
>
> This commit adds a WAIT_FOR_SUBMIT flag to DRM_IOCTL_SYNCOBJ_WAIT which
> instructs the IOCTL to wait for the syncobj to have a non-null fence and
> then wait on the fence.  Combined with DRM_IOCTL_SYNCOBJ_RESET, you can
> easily get the Vulkan behavior.
>
>
> I suppose you could use an earlier DRM version if you don't need it.
> IMO we should keep them separate, as on radv semaphores don't need any
> wait functionality at all.
>
> On Tue, Dec 26, 2017 at 6:29 PM, Marek Olšák  wrote:
>> Does this mean that radeonsi shouldn't use amdgpu_cs_syncobj_wait on older 
>> DRM?
>>
>> Does it make sense to have separate has_syncobj and has_syncobj_wait flags?
>>
>> Marek
>>
>> On Sun, Dec 17, 2017 at 1:11 AM, Bas Nieuwenhuizen
>>  wrote:
>>> First amdgpu bump after inclusion was 20 (which was done for local BOs).
>>> ---
>>>  src/amd/common/ac_gpu_info.c | 1 +
>>>  src/amd/common/ac_gpu_info.h | 1 +
>>>  2 files changed, 2 insertions(+)
>>>
>>> diff --git a/src/amd/common/ac_gpu_info.c b/src/amd/common/ac_gpu_info.c
>>> index 0576dd369cf..c042bb229ce 100644
>>> --- a/src/amd/common/ac_gpu_info.c
>>> +++ b/src/amd/common/ac_gpu_info.c
>>> @@ -277,6 +277,7 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
>>> vce.available_rings ? vce_version : 0;
>>> info->has_userptr = true;
>>> info->has_syncobj = has_syncobj(fd);
>>> +   info->has_syncobj_wait = info->has_syncobj && info->drm_minor >= 20;
>>> info->has_sync_file = info->has_syncobj && info->drm_minor >= 21;
>>> info->has_ctx_priority = info->drm_minor >= 22;
>>> info->num_render_backends = amdinfo->rb_pipes;
>>> diff --git a/src/amd/common/ac_gpu_info.h b/src/amd/common/ac_gpu_info.h
>>> index 5b9e51658b0..04e17f91c59 100644
>>> --- a/src/amd/common/ac_gpu_info.h
>>> +++ b/src/amd/common/ac_gpu_info.h
>>> @@ -81,6 +81,7 @@ struct radeon_info {
>>> uint32_tdrm_patchlevel;
>>> boolhas_userptr;
>>> boolhas_syncobj;
>>> +   boolhas_syncobj_wait;
>>> boolhas_sync_file;
>>> boolhas_ctx_priority;
>>>
>>> --
>>> 2.15.1
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeonsi: don't use fast color clear for small images even on APUs

2017-12-28 Thread Marek Olšák
On Thu, Dec 28, 2017 at 12:29 PM, Konstantin Kharlamov
 wrote:
> I'm wondering, how is r600g different in that regard? I tried wiring up the 
> code into evergreen_do_fast_color_clear(), both in this state and by using 
> 256*256 — however FPS for me always varies around the same 1420.
>
> That said, I'm seeing lots of CPU used by Xorg, glxgears, and compton — I'm 
> wondering if CPU cap could be the reason?

r600g might benefit in the same way. glxgears requires the limit to be
at least 300*300.

Marek

>
> On Wednesday, 13 December 2017 2:53:12 MSK, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> Increase the limit and handle non-square images better.
>>
>> This makes glxgears 20% faster on APUs, and a little more on dGPUs.
>> We all use and love glxgears.
>> ---
>>  src/gallium/drivers/radeonsi/si_clear.c | 9 -
>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_clear.c b/src/gallium/drivers/radeonsi/si_clear.c
>> index 0ac83f4..464b9d7 100644
>> --- a/src/gallium/drivers/radeonsi/si_clear.c
>> +++ b/src/gallium/drivers/radeonsi/si_clear.c
>> @@ -418,26 +418,25 @@ static void si_do_fast_color_clear(struct si_context *sctx,
>>   sctx->b.family == CHIP_STONEY)
>>   tex->num_slow_clears++;
>>   }
>>
>>   bool need_decompress_pass = false;
>>
>>   /* Use a slow clear for small surfaces where the cost of
>>* the eliminate pass can be higher than the benefit of fast
>>* clear. The closed driver does this, but the numbers may differ.
>>*
>> -  * Always use fast clear on APUs.
>> +  * This helps on both dGPUs and APUs, even small APUs like Mullins.
>>*/
>> - bool too_small = sctx->screen->info.has_dedicated_vram &&
>> -  tex->resource.b.b.nr_samples <= 1 &&
>> -  tex->resource.b.b.width0 <= 256 &&
>> -  tex->resource.b.b.height0 <= 256;
>> + bool too_small = tex->resource.b.b.nr_samples <= 1 &&
>> +  tex->resource.b.b.width0 *
>> +  tex->resource.b.b.height0 <= 512 * 512;
>>
>>   /* Try to clear DCC first, otherwise try CMASK. */
>>   if (vi_dcc_enabled(tex, 0)) {
>>   uint32_t reset_value;
>>   bool clear_words_needed;
>>
>>   if (sctx->screen->debug_flags & DBG(NO_DCC_CLEAR))
>>   continue;
>>
>>   /* This can only occur with MSAA. */
>>
>
>
>
>


Re: [Mesa-dev] [PATCH 1/4] intel/aubinator: Allow for the case where the ascii85 decode fails

2017-12-28 Thread Jason Ekstrand

On December 28, 2017 08:24:03 Jason Ekstrand  wrote:


On December 28, 2017 01:30:11 Kenneth Graunke  wrote:


On Wednesday, December 27, 2017 3:13:42 PM PST Jason Ekstrand wrote:

On December 27, 2017 17:06:43 Kenneth Graunke  wrote:

> On Wednesday, December 27, 2017 12:58:12 PM PST Jason Ekstrand wrote:
>> ---
>>  src/intel/tools/aubinator_error_decode.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/intel/tools/aubinator_error_decode.c b/src/intel/tools/aubinator_error_decode.c
>> index d6fbfe0..f0c5b5b 100644
>> --- a/src/intel/tools/aubinator_error_decode.c
>> +++ b/src/intel/tools/aubinator_error_decode.c
>> @@ -350,8 +350,10 @@ read_data_file(FILE *file)
>>   uint32_t *data = NULL;
>>   int count = ascii85_decode(line+1, &data, line[0] == ':');
>>   if (count == 0) {
>> -fprintf(stderr, "ASCII85 decode failed.\n");
>> -exit(EXIT_FAILURE);
>> +fprintf(stderr, "ASCII85 decode of %s at 0x%08"PRIx64" failed.\n",
>> +sections[sect_num].buffer_name,
>> +sections[sect_num].gtt_offset);
>> +continue;
>>   }
>>   sections[sect_num].data = data;
>>   sections[sect_num].count = count;
>>
>
> What's the rationale, here?  At this point, you've got a malformed file.
> What do we gain by supporting invalid file formats?

Because there's a decent chance (I've got multiple files on my laptop that
are this way) that other BOs will decompress ok.  I still don't know why
aubinator is failing to decompress things.

--Jason


Are they just from old pre-ascii85 support kernels or something?


Nope, 4.14 or 4.13.


I should be more specific.  It's not ascii85 decode that fails, it's zlib
decompression.  I have no idea why.





Re: [Mesa-dev] [PATCH 1/4] intel/aubinator: Allow for the case where the ascii85 decode fails

2017-12-28 Thread Jason Ekstrand

On December 28, 2017 01:30:11 Kenneth Graunke  wrote:


On Wednesday, December 27, 2017 3:13:42 PM PST Jason Ekstrand wrote:

On December 27, 2017 17:06:43 Kenneth Graunke  wrote:

> On Wednesday, December 27, 2017 12:58:12 PM PST Jason Ekstrand wrote:
>> ---
>>  src/intel/tools/aubinator_error_decode.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/intel/tools/aubinator_error_decode.c b/src/intel/tools/aubinator_error_decode.c
>> index d6fbfe0..f0c5b5b 100644
>> --- a/src/intel/tools/aubinator_error_decode.c
>> +++ b/src/intel/tools/aubinator_error_decode.c
>> @@ -350,8 +350,10 @@ read_data_file(FILE *file)
>>   uint32_t *data = NULL;
>>   int count = ascii85_decode(line+1, &data, line[0] == ':');
>>   if (count == 0) {
>> -fprintf(stderr, "ASCII85 decode failed.\n");
>> -exit(EXIT_FAILURE);
>> +fprintf(stderr, "ASCII85 decode of %s at 0x%08"PRIx64" failed.\n",
>> +sections[sect_num].buffer_name,
>> +sections[sect_num].gtt_offset);
>> +continue;
>>   }
>>   sections[sect_num].data = data;
>>   sections[sect_num].count = count;
>>
>
> What's the rationale, here?  At this point, you've got a malformed file.
> What do we gain by supporting invalid file formats?

Because there's a decent chance (I've got multiple files on my laptop that
are this way) that other BOs will decompress ok.  I still don't know why
aubinator is failing to decompress things.

--Jason


Are they just from old pre-ascii85 support kernels or something?


Nope, 4.14 or 4.13.




[Mesa-dev] [PATCH] gallium/winsys/kms: Add support for multi-planes (v2)

2017-12-28 Thread Lepton Wu
v2: address comments from Tomasz Figa
   a) Add more checks for plane size.
   b) Avoid duplicated mapping and leaked mapping.
   c) Other minor changes.

Signed-off-by: Lepton Wu 

Change-Id: I0863f522976cc8863d6e95492d9346df35c066ec
---
 src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c | 179 +++---
 1 file changed, 126 insertions(+), 53 deletions(-)

diff --git a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
index 22e1c936ac5..69c05197081 100644
--- a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
+++ b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
@@ -59,20 +59,29 @@
 #define DEBUG_PRINT(msg, ...)
 #endif
 
+struct kms_sw_displaytarget;
 
-struct kms_sw_displaytarget
-{
-   enum pipe_format format;
+struct kms_sw_plane {
unsigned width;
unsigned height;
unsigned stride;
+   unsigned offset;
+   struct kms_sw_displaytarget* dt;
+   struct list_head link;
+};
+
+struct kms_sw_displaytarget
+{
+   enum pipe_format format;
unsigned size;
 
uint32_t handle;
void *mapped;
+   void *ro_mapped;
 
int ref_count;
struct list_head link;
+   struct list_head planes;
 };
 
 struct kms_sw_winsys
@@ -83,10 +92,10 @@ struct kms_sw_winsys
struct list_head bo_list;
 };
 
-static inline struct kms_sw_displaytarget *
-kms_sw_displaytarget( struct sw_displaytarget *dt )
+static inline struct kms_sw_plane *
+kms_sw_plane( struct sw_displaytarget *dt )
 {
-   return (struct kms_sw_displaytarget *)dt;
+   return (struct kms_sw_plane *)dt;
 }
 
 static inline struct kms_sw_winsys *
@@ -105,6 +114,42 @@ kms_sw_is_displaytarget_format_supported( struct sw_winsys *ws,
return TRUE;
 }
 
+static struct kms_sw_plane *get_plane(struct kms_sw_displaytarget *kms_sw_dt,
+  enum pipe_format format,
+  unsigned width, unsigned height,
+  unsigned stride, unsigned offset) {
+   struct kms_sw_plane * tmp, * plane = NULL;
+   if (offset + util_format_get_2d_size(format, stride, height) >
+   kms_sw_dt->size) {
+  DEBUG_PRINT("KMS-DEBUG: plane too big. format: %d stride: %d height: %d "
+  "offset: %d size:%d\n", format, stride, height, offset,
+  kms_sw_dt->size);
+  return NULL;
+   }
+   LIST_FOR_EACH_ENTRY(tmp, &kms_sw_dt->planes, link) {
+  if (tmp->offset == offset) {
+ plane = tmp;
+ break;
+  }
+   }
+   if (plane) {
+  assert(plane->width == width);
+  assert(plane->height == height);
+  assert(plane->stride == stride);
+  assert(plane->dt == kms_sw_dt);
+   } else {
+  plane = CALLOC_STRUCT(kms_sw_plane);
+  if (plane == NULL) return NULL;
+  plane->width = width;
+  plane->height = height;
+  plane->stride = stride;
+  plane->offset = offset;
+  plane->dt = kms_sw_dt;
+  list_add(&plane->link, &kms_sw_dt->planes);
+   }
+   return plane;
+}
+
 static struct sw_displaytarget *
 kms_sw_displaytarget_create(struct sw_winsys *ws,
 unsigned tex_usage,
@@ -124,11 +169,10 @@ kms_sw_displaytarget_create(struct sw_winsys *ws,
if (!kms_sw_dt)
   goto no_dt;
 
+   list_inithead(&kms_sw_dt->planes);
kms_sw_dt->ref_count = 1;
 
kms_sw_dt->format = format;
-   kms_sw_dt->width = width;
-   kms_sw_dt->height = height;
 
 memset(&create_req, 0, sizeof(create_req));
create_req.bpp = 32;
@@ -138,17 +182,19 @@ kms_sw_displaytarget_create(struct sw_winsys *ws,
if (ret)
   goto free_bo;
 
-   kms_sw_dt->stride = create_req.pitch;
kms_sw_dt->size = create_req.size;
kms_sw_dt->handle = create_req.handle;
+   struct kms_sw_plane* plane = get_plane(kms_sw_dt, format, width, height,
+  create_req.pitch, 0);
+   if (plane == NULL)
+  goto free_bo;
 
 list_add(&kms_sw_dt->link, &kms_sw->bo_list);
 
 DEBUG_PRINT("KMS-DEBUG: created buffer %u (size %u)\n", kms_sw_dt->handle, kms_sw_dt->size);
 
-   *stride = kms_sw_dt->stride;
-   return (struct sw_displaytarget *)kms_sw_dt;
-
+   *stride = create_req.pitch;
+   return (struct sw_displaytarget *) plane;
  free_bo:
 memset(&destroy_req, 0, sizeof destroy_req);
destroy_req.handle = create_req.handle;
@@ -163,13 +209,19 @@ kms_sw_displaytarget_destroy(struct sw_winsys *ws,
  struct sw_displaytarget *dt)
 {
struct kms_sw_winsys *kms_sw = kms_sw_winsys(ws);
-   struct kms_sw_displaytarget *kms_sw_dt = kms_sw_displaytarget(dt);
+   struct kms_sw_plane *plane = kms_sw_plane(dt);
+   struct kms_sw_displaytarget *kms_sw_dt = plane->dt;
struct drm_mode_destroy_dumb destroy_req;
 
kms_sw_dt->ref_count --;
if (kms_sw_dt->ref_count > 0)
   return;
 
+   if (kms_sw_dt->ro_mapped)
+ munmap(kms_sw_dt->ro_mapped, kms_sw_dt->size);
+   if (kms_sw_dt->mapped)
+ munmap(kms_sw_dt->mapped, kms_sw_dt->size);
+
 memset(&destroy_req, 0, sizeof 

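The `get_plane()` helper in the patch above is a lookup-or-create over a per-buffer list of planes keyed by offset, with a bounds check against the dumb buffer's size. Below is a simplified, self-contained sketch of that idea, not the patch's code: it swaps Mesa's `util/list.h` for a hypothetical singly linked list and approximates `util_format_get_2d_size()` as `stride * height`, which is an assumption that only holds for linear, non-compressed formats. All names (`sketch_dt`, `sketch_plane`, `sketch_get_plane`, `sketch_free_planes`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_dt;

/* Hypothetical stand-in for struct kms_sw_plane. */
struct sketch_plane {
   unsigned width, height, stride, offset;
   struct sketch_dt *dt;        /* back-pointer to the owning buffer */
   struct sketch_plane *next;   /* singly linked instead of list_head */
};

/* Hypothetical stand-in for struct kms_sw_displaytarget. */
struct sketch_dt {
   unsigned size;                /* size of the whole buffer in bytes */
   struct sketch_plane *planes;  /* every plane that aliases this buffer */
};

/* Return the plane starting at `offset`, creating it on first use.
 * Returns NULL if the plane would not fit in the buffer or on OOM. */
static struct sketch_plane *
sketch_get_plane(struct sketch_dt *dt, unsigned width, unsigned height,
                 unsigned stride, unsigned offset)
{
   /* Assumed plane size: stride * height, computed in 64 bits so the
    * sum cannot wrap around before the comparison. */
   unsigned long long end = (unsigned long long)offset +
                            (unsigned long long)stride * height;
   if (end > dt->size)
      return NULL;

   for (struct sketch_plane *p = dt->planes; p; p = p->next) {
      if (p->offset == offset) {
         /* A second lookup at the same offset must describe the same layout. */
         assert(p->width == width && p->height == height &&
                p->stride == stride);
         return p;
      }
   }

   struct sketch_plane *p = calloc(1, sizeof(*p));
   if (!p)
      return NULL;
   p->width = width;
   p->height = height;
   p->stride = stride;
   p->offset = offset;
   p->dt = dt;
   p->next = dt->planes;   /* prepend, as list_add() does */
   dt->planes = p;
   return p;
}

/* Free every plane of a buffer; the buffer itself is caller-owned here. */
static void
sketch_free_planes(struct sketch_dt *dt)
{
   struct sketch_plane *p = dt->planes;
   while (p) {
      struct sketch_plane *next = p->next;
      free(p);
      p = next;
   }
   dt->planes = NULL;
}
```

Because several `sw_displaytarget` handles can now point at planes of the same buffer, the patch keeps `ref_count` on the shared `kms_sw_displaytarget` and reaches it through `plane->dt` in `kms_sw_displaytarget_destroy()`.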
Re: [Mesa-dev] [PATCH] radeonsi: don't use fast color clear for small images even on APUs

2017-12-28 Thread Konstantin Kharlamov
I'm wondering how r600g differs in that regard. I tried wiring the code 
into evergreen_do_fast_color_clear(), both as it is here and with the 
256*256 limit, but my FPS always hovers around the same 1420.

That said, I'm seeing a lot of CPU used by Xorg, glxgears, and compton. 
Could a CPU bottleneck be the reason?

On Wednesday, 13 December 2017 2:53:12 MSK, Marek Olšák wrote:
> From: Marek Olšák 
> 
> Increase the limit and handle non-square images better.
> 
> This makes glxgears 20% faster on APUs, and a little more on dGPUs.
> We all use and love glxgears.
> ---
>  src/gallium/drivers/radeonsi/si_clear.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_clear.c b/src/gallium/drivers/radeonsi/si_clear.c
> index 0ac83f4..464b9d7 100644
> --- a/src/gallium/drivers/radeonsi/si_clear.c
> +++ b/src/gallium/drivers/radeonsi/si_clear.c
> @@ -418,26 +418,25 @@ static void si_do_fast_color_clear(struct si_context *sctx,
>   sctx->b.family == CHIP_STONEY)
>   tex->num_slow_clears++;
>   }
>  
>   bool need_decompress_pass = false;
>  
>   /* Use a slow clear for small surfaces where the cost of
>* the eliminate pass can be higher than the benefit of fast
>* clear. The closed driver does this, but the numbers may differ.
>*
> -  * Always use fast clear on APUs.
> +  * This helps on both dGPUs and APUs, even small APUs like Mullins.
>*/
> - bool too_small = sctx->screen->info.has_dedicated_vram &&
> -  tex->resource.b.b.nr_samples <= 1 &&
> -  tex->resource.b.b.width0 <= 256 &&
> -  tex->resource.b.b.height0 <= 256;
> + bool too_small = tex->resource.b.b.nr_samples <= 1 &&
> +  tex->resource.b.b.width0 *
> +  tex->resource.b.b.height0 <= 512 * 512;
>  
>   /* Try to clear DCC first, otherwise try CMASK. */
>   if (vi_dcc_enabled(tex, 0)) {
>   uint32_t reset_value;
>   bool clear_words_needed;
>  
>   if (sctx->screen->debug_flags & DBG(NO_DCC_CLEAR))
>   continue;
>  
>   /* This can only occur with MSAA. */
> 
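The hunk above turns `too_small` from a per-dimension limit that applied only to GPUs with dedicated VRAM into an area limit that applies to every GPU. The standalone functions below restate the old and new conditions so they can be compared on a non-square surface. The function names are hypothetical; the real code reads these values from `tex->resource.b.b` and `sctx->screen->info`, and the 64-bit product is a defensive choice of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Old rule: only on GPUs with dedicated VRAM, and only when both
 * dimensions are at most 256. */
static bool too_small_old(bool has_dedicated_vram, unsigned nr_samples,
                          unsigned width, unsigned height)
{
   return has_dedicated_vram && nr_samples <= 1 &&
          width <= 256 && height <= 256;
}

/* New rule: on all GPUs, compare the pixel count against 512 * 512,
 * so long, thin surfaces can qualify as well. */
static bool too_small_new(unsigned nr_samples, unsigned width, unsigned height)
{
   return nr_samples <= 1 &&
          (unsigned long long)width * height <= 512ull * 512ull;
}
```

For example, a single-sampled 1024x64 surface was never "too small" under the old rule because its width exceeds 256, but its 65,536 pixels are below 512 * 512 = 262,144, so under the new rule it becomes a candidate for the slow-clear path described in the comment.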




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: move local bos usage to a perftest flag.

2017-12-28 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On Thu, Dec 28, 2017 at 7:14 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> These seem mildly unstable on vega, crashing CTS in various fun ways,
> and appear to leak memory.
>
> Disable for now, but leave the option to enable them.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/vulkan/radv_debug.h   | 1 +
>  src/amd/vulkan/radv_device.c  | 1 +
>  src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c | 2 +-
>  src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c | 1 +
>  src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h | 1 +
>  5 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_debug.h b/src/amd/vulkan/radv_debug.h
> index 8e09c36528..af07564833 100644
> --- a/src/amd/vulkan/radv_debug.h
> +++ b/src/amd/vulkan/radv_debug.h
> @@ -47,6 +47,7 @@ enum {
>  enum {
> RADV_PERFTEST_NO_BATCHCHAIN  =   0x1,
> RADV_PERFTEST_SISCHED=   0x2,
> +   RADV_PERFTEST_LOCAL_BOS  =   0x4,
>  };
>
>  bool
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 788252c2c5..2a249b95e2 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -343,6 +343,7 @@ radv_get_debug_option_name(int id)
>  static const struct debug_control radv_perftest_options[] = {
> {"nobatchchain", RADV_PERFTEST_NO_BATCHCHAIN},
> {"sisched", RADV_PERFTEST_SISCHED},
> +   {"localbos", RADV_PERFTEST_LOCAL_BOS},
> {NULL, 0}
>  };
>
> diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c
> index ffcc1a2ad3..4b11823b0a 100644
> --- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c
> +++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c
> @@ -335,7 +335,7 @@ radv_amdgpu_winsys_bo_create(struct radeon_winsys *_ws,
> request.flags |= AMDGPU_GEM_CREATE_CPU_GTT_USWC;
> if (!(flags & RADEON_FLAG_IMPLICIT_SYNC) && ws->info.drm_minor >= 22)
> request.flags |= AMDGPU_GEM_CREATE_EXPLICIT_SYNC;
> -   if (flags & RADEON_FLAG_NO_INTERPROCESS_SHARING && ws->info.drm_minor >= 20) {
> +   if (flags & RADEON_FLAG_NO_INTERPROCESS_SHARING && ws->info.drm_minor >= 20 && ws->use_local_bos) {
> bo->base.is_local = true;
> request.flags |= AMDGPU_GEM_CREATE_VM_ALWAYS_VALID;
> }
> diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
> index 0c6374e71c..42e83f1482 100644
> --- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
> +++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
> @@ -177,6 +177,7 @@ radv_amdgpu_winsys_create(int fd, uint64_t debug_flags, uint64_t perftest_flags)
> if (debug_flags & RADV_DEBUG_NO_IBS)
> ws->use_ib_bos = false;
>
> +   ws->use_local_bos = perftest_flags & RADV_PERFTEST_LOCAL_BOS;
> ws->zero_all_vram_allocs = debug_flags & RADV_DEBUG_ZERO_VRAM;
> ws->batchchain = !(perftest_flags & RADV_PERFTEST_NO_BATCHCHAIN);
> LIST_INITHEAD(&ws->global_bo_list);
> diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h
> index 66c93475e5..d6af6052a6 100644
> --- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h
> +++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h
> @@ -46,6 +46,7 @@ struct radv_amdgpu_winsys {
> bool batchchain;
> bool use_ib_bos;
> bool zero_all_vram_allocs;
> +   bool use_local_bos;
> unsigned num_buffers;
>
> pthread_mutex_t global_bo_list_lock;
> --
> 2.14.3
>
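The patch adds a `localbos` entry to the perftest name table and gates `AMDGPU_GEM_CREATE_VM_ALWAYS_VALID` on the resulting bit. radv parses such tables with Mesa's own helper (read from the `RADV_PERFTEST` environment variable, so the old behaviour should be reachable with something like `RADV_PERFTEST=localbos`); the parser below is a simplified, hypothetical stand-in, not Mesa's implementation, and the enum, table, and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same shape as radv_perftest_options: a name and a bit. */
struct flag_name { const char *name; uint64_t flag; };

enum {
   PERFTEST_NO_BATCHCHAIN = 0x1,
   PERFTEST_SISCHED       = 0x2,
   PERFTEST_LOCAL_BOS     = 0x4,
};

static const struct flag_name perftest_options[] = {
   {"nobatchchain", PERFTEST_NO_BATCHCHAIN},
   {"sisched",      PERFTEST_SISCHED},
   {"localbos",     PERFTEST_LOCAL_BOS},
   {NULL, 0},
};

/* Parse a comma-separated list such as "sisched,localbos" into a bitmask.
 * Unknown or empty names are ignored; a NULL string yields 0. */
static uint64_t parse_flags(const char *str, const struct flag_name *table)
{
   uint64_t flags = 0;
   if (!str)
      return 0;
   while (*str) {
      size_t len = strcspn(str, ",");   /* length of the current name */
      for (const struct flag_name *e = table; e->name; e++) {
         if (strlen(e->name) == len && strncmp(e->name, str, len) == 0)
            flags |= e->flag;
      }
      str += len;
      if (*str == ',')
         str++;
   }
   return flags;
}

/* The BO-creation condition after the patch: a non-shared BO, kernel
 * DRM minor version >= 20, and the opt-in perftest bit. */
static bool use_local_bo(bool no_interprocess_sharing, int drm_minor,
                         uint64_t perftest_flags)
{
   return no_interprocess_sharing && drm_minor >= 20 &&
          (perftest_flags & PERFTEST_LOCAL_BOS);
}
```

Keeping the feature behind a perftest bit rather than deleting it means the code path stays compiled and testable while it is off by default.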


Re: [Mesa-dev] [PATCH] radv: fix pipeline statistics end query on compute queue

2017-12-28 Thread Bas Nieuwenhuizen
Please add a Fixes: tag.

Reviewed-by: Bas Nieuwenhuizen 

On Thu, Dec 28, 2017 at 7:33 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> It's legal to do a pipeline stat query on a compute queue,
> but we'd emit the wrong packet here. This should fix it to emit
> the correct packet.
>
> Noticed while inspecting the mpv hang.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/vulkan/radv_query.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
> index 5dc88af8f8..ace745e4e6 100644
> --- a/src/amd/vulkan/radv_query.c
> +++ b/src/amd/vulkan/radv_query.c
> @@ -1156,7 +1156,7 @@ void radv_CmdEndQuery(
> si_cs_emit_write_event_eop(cs,
>false,
> cmd_buffer->device->physical_device->rad_info.chip_class,
> -  false,
> +  radv_cmd_buffer_uses_mec(cmd_buffer),
>V_028A90_BOTTOM_OF_PIPE_TS, 0,
>1, avail_va, 0, 1);
> break;
> --
> 2.14.3
>


Re: [Mesa-dev] [PATCH] radv: fix events on compute queues.

2017-12-28 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On Thu, Dec 28, 2017 at 7:29 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> The event emission wasn't sending the correct packet for gfx8 compute
> queues, which explains why it works fine on vega.
>
> This fixes the mpv vulkan hang.
>
> Fixes: ad61eac250 (radv: factor out eop event writing code. (v2))
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
> index 6a89d4e568..42468bceed 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -4002,7 +4002,7 @@ static void write_event(struct radv_cmd_buffer *cmd_buffer,
> si_cs_emit_write_event_eop(cs,
>cmd_buffer->state.predicating,
> cmd_buffer->device->physical_device->rad_info.chip_class,
> -  false,
> +  radv_cmd_buffer_uses_mec(cmd_buffer),
>V_028A90_BOTTOM_OF_PIPE_TS, 0,
>1, va, 2, value);
>
> --
> 2.14.3
>
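Both compute-queue fixes replace a hard-coded `false` with `radv_cmd_buffer_uses_mec(cmd_buffer)` when calling `si_cs_emit_write_event_eop()`. The sketch below is a hypothetical, simplified model of why that flag matters, assuming (as in radv at the time) that the helper emits a RELEASE_MEM packet on GFX9+ or on compute (MEC) queues and an EVENT_WRITE_EOP packet otherwise. The enum and function names are illustrative, and the real helper also differs in packet details that this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative chip generations; not radv's enum chip_class. */
enum chip_gen { GEN_GFX8 = 8, GEN_GFX9 = 9 };

/* Illustrative packet kinds for a bottom-of-pipe EOP write. */
enum eop_packet {
   EOP_PACKET_EVENT_WRITE_EOP,  /* graphics (ME) queue before GFX9 */
   EOP_PACKET_RELEASE_MEM,      /* compute (MEC) queues, and GFX9+ */
};

/* Pick the packet kind from the chip generation and queue type. */
static enum eop_packet choose_eop_packet(enum chip_gen gen, bool is_mec)
{
   if (gen >= GEN_GFX9 || is_mec)
      return EOP_PACKET_RELEASE_MEM;
   return EOP_PACKET_EVENT_WRITE_EOP;
}
```

With `false` hard-coded, a GFX8 compute queue fell into the graphics branch. On GFX9 (Vega) the first condition selects the same packet kind regardless of queue type, which is consistent with the commit message's observation that Vega was unaffected.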