Re: [Mesa-dev] [PATCH 16/25] radeonsi: add PS prolog

2016-02-19 Thread Nicolai Hähnle

On 19.02.2016 06:39, Marek Olšák wrote:

On Tue, Feb 16, 2016 at 5:31 PM, Nicolai Hähnle  wrote:

So, patches 12-16 also look good to me except for the comments I've sent on
12-14.


Does this mean I can add your Rb on those?


Yes indeed.

Nicolai



Marek


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 16/25] radeonsi: add PS prolog

2016-02-19 Thread Marek Olšák
On Tue, Feb 16, 2016 at 5:31 PM, Nicolai Hähnle  wrote:
> So, patches 12-16 also look good to me except for the comments I've sent on
> 12-14.

Does this mean I can add your Rb on those?

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 16/25] radeonsi: add PS prolog

2016-02-16 Thread Marek Olšák
On Tue, Feb 16, 2016 at 5:31 PM, Nicolai Hähnle  wrote:
> So, patches 12-16 also look good to me except for the comments I've sent on
> 12-14.
>
> I'm a bit worried though that there is a lot of "almost code duplication"
> around the handling of input and output positions etc., and maintaining the
> two different code paths for monolithic and non-monolithic is brittle.
>
> Here's an approach that I think could work to clean this up: keep only the
> non-monolithic code for LLVM IR function generation. Then implement
> monolithic mode with a helper that takes a sequence of LLVM IR functions and
> generates a master function that pipes each function's output into the input
> of the next. Then set the functions as always inline and rely on LLVM's
> inliner to stitch everything together.
>
> This ends up with slightly higher overhead for the monolithic code path
> (although the unconditional inlining should be fast), but it would help
> clean the code up tremendously.

Yes, I had the same idea. However, the incremental approach was the
most bearable way to do it and made fixing regressions and hangs a lot
easier. Doing a complete rewrite from monolithic to non-monolithic
would be a lot more frustrating.

Cleaning this up is definitely a good idea, but we need to make it all
more useful first and add an on-disk shader cache to mesa/main. That
will be quite a project too.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 16/25] radeonsi: add PS prolog

2016-02-16 Thread Nicolai Hähnle
So, patches 12-16 also look good to me except for the comments I've sent 
on 12-14.


I'm a bit worried though that there is a lot of "almost code 
duplication" around the handling of input and output positions etc., and 
maintaining the two different code paths for monolithic and 
non-monolithic is brittle.


Here's an approach that I think could work to clean this up: keep only 
the non-monolithic code for LLVM IR function generation. Then implement 
monolithic mode with a helper that takes a sequence of LLVM IR functions 
and generates a master function that pipes each function's output into 
the input of the next. Then set the functions as always inline and rely 
on LLVM's inliner to stitch everything together.


This ends up with slightly higher overhead for the monolithic code path 
(although the unconditional inlining should be fast), but it would help 
clean the code up tremendously.


Cheers,
Nicolai

On 15.02.2016 18:59, Marek Olšák wrote:

From: Marek Olšák 

---
  src/gallium/drivers/radeonsi/si_pipe.c  |   1 +
  src/gallium/drivers/radeonsi/si_pipe.h  |   1 +
  src/gallium/drivers/radeonsi/si_shader.c| 324 +++-
  src/gallium/drivers/radeonsi/si_shader.h|  14 +-
  src/gallium/drivers/radeonsi/si_state_shaders.c |   7 +
  5 files changed, 345 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 02c430d..44f6047 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -541,6 +541,7 @@ static void si_destroy_screen(struct pipe_screen* pscreen)
sscreen->vs_prologs,
sscreen->vs_epilogs,
sscreen->tcs_epilogs,
+   sscreen->ps_prologs,
sscreen->ps_epilogs
};
unsigned i;
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 5d204ec..1ac7bc4 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -92,6 +92,7 @@ struct si_screen {
struct si_shader_part   *vs_prologs;
struct si_shader_part   *vs_epilogs;
struct si_shader_part   *tcs_epilogs;
+   struct si_shader_part   *ps_prologs;
struct si_shader_part   *ps_epilogs;
  };

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 915ac1d..c6d4cb5 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -875,7 +875,8 @@ static int lookup_interp_param_index(unsigned interpolate, 
unsigned location)
  static unsigned select_interp_param(struct si_shader_context *ctx,
unsigned param)
  {
-   if (!ctx->shader->key.ps.prolog.force_persample_interp)
+   if (!ctx->shader->key.ps.prolog.force_persample_interp ||
+   !ctx->is_monolithic)
return param;

/* If the shader doesn't use center/centroid, just return the parameter.
@@ -1019,6 +1020,7 @@ static void declare_input_fs(
unsigned input_index,
const struct tgsi_full_declaration *decl)
  {
+   struct lp_build_context *base = &radeon_bld->soa.bld_base.base;
struct si_shader_context *ctx =
si_shader_context(&radeon_bld->soa.bld_base);
struct si_shader *shader = ctx->shader;
@@ -1026,6 +1028,26 @@ static void declare_input_fs(
LLVMValueRef interp_param = NULL;
int interp_param_idx;

+   /* Get colors from input VGPRs (set by the prolog). */
+   if (!ctx->is_monolithic &&
+   decl->Semantic.Name == TGSI_SEMANTIC_COLOR) {
+   unsigned i = decl->Semantic.Index;
+   unsigned colors_read = shader->selector->info.colors_read;
+   unsigned mask = colors_read >> (i * 4);
+   unsigned offset = SI_PARAM_POS_FIXED_PT + 1 +
+ (i ? util_bitcount(colors_read & 0xf) : 0);
+
+   radeon_bld->inputs[radeon_llvm_reg_index_soa(input_index, 0)] =
+   mask & 0x1 ? LLVMGetParam(main_fn, offset++) : 
base->undef;
+   radeon_bld->inputs[radeon_llvm_reg_index_soa(input_index, 1)] =
+   mask & 0x2 ? LLVMGetParam(main_fn, offset++) : 
base->undef;
+   radeon_bld->inputs[radeon_llvm_reg_index_soa(input_index, 2)] =
+   mask & 0x4 ? LLVMGetParam(main_fn, offset++) : 
base->undef;
+   radeon_bld->inputs[radeon_llvm_reg_index_soa(input_index, 3)] =
+   mask & 0x8 ? LLVMGetParam(main_fn, offset++) : 
base->undef;
+   return;
+   }
+
interp_param_idx = lookup_interp_param_index(decl->Interp.Interpolate,
 decl->Interp.Location);
if (interp_param_idx == -1)
@@ -3966,6 +3988,16 @@ static void cr

[Mesa-dev] [PATCH 16/25] radeonsi: add PS prolog

2016-02-15 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_pipe.c  |   1 +
 src/gallium/drivers/radeonsi/si_pipe.h  |   1 +
 src/gallium/drivers/radeonsi/si_shader.c| 324 +++-
 src/gallium/drivers/radeonsi/si_shader.h|  14 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c |   7 +
 5 files changed, 345 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 02c430d..44f6047 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -541,6 +541,7 @@ static void si_destroy_screen(struct pipe_screen* pscreen)
sscreen->vs_prologs,
sscreen->vs_epilogs,
sscreen->tcs_epilogs,
+   sscreen->ps_prologs,
sscreen->ps_epilogs
};
unsigned i;
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 5d204ec..1ac7bc4 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -92,6 +92,7 @@ struct si_screen {
struct si_shader_part   *vs_prologs;
struct si_shader_part   *vs_epilogs;
struct si_shader_part   *tcs_epilogs;
+   struct si_shader_part   *ps_prologs;
struct si_shader_part   *ps_epilogs;
 };
 
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 915ac1d..c6d4cb5 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -875,7 +875,8 @@ static int lookup_interp_param_index(unsigned interpolate, 
unsigned location)
 static unsigned select_interp_param(struct si_shader_context *ctx,
unsigned param)
 {
-   if (!ctx->shader->key.ps.prolog.force_persample_interp)
+   if (!ctx->shader->key.ps.prolog.force_persample_interp ||
+   !ctx->is_monolithic)
return param;
 
/* If the shader doesn't use center/centroid, just return the parameter.
@@ -1019,6 +1020,7 @@ static void declare_input_fs(
unsigned input_index,
const struct tgsi_full_declaration *decl)
 {
+   struct lp_build_context *base = &radeon_bld->soa.bld_base.base;
struct si_shader_context *ctx =
si_shader_context(&radeon_bld->soa.bld_base);
struct si_shader *shader = ctx->shader;
@@ -1026,6 +1028,26 @@ static void declare_input_fs(
LLVMValueRef interp_param = NULL;
int interp_param_idx;
 
+   /* Get colors from input VGPRs (set by the prolog). */
+   if (!ctx->is_monolithic &&
+   decl->Semantic.Name == TGSI_SEMANTIC_COLOR) {
+   unsigned i = decl->Semantic.Index;
+   unsigned colors_read = shader->selector->info.colors_read;
+   unsigned mask = colors_read >> (i * 4);
+   unsigned offset = SI_PARAM_POS_FIXED_PT + 1 +
+ (i ? util_bitcount(colors_read & 0xf) : 0);
+
+   radeon_bld->inputs[radeon_llvm_reg_index_soa(input_index, 0)] =
+   mask & 0x1 ? LLVMGetParam(main_fn, offset++) : 
base->undef;
+   radeon_bld->inputs[radeon_llvm_reg_index_soa(input_index, 1)] =
+   mask & 0x2 ? LLVMGetParam(main_fn, offset++) : 
base->undef;
+   radeon_bld->inputs[radeon_llvm_reg_index_soa(input_index, 2)] =
+   mask & 0x4 ? LLVMGetParam(main_fn, offset++) : 
base->undef;
+   radeon_bld->inputs[radeon_llvm_reg_index_soa(input_index, 3)] =
+   mask & 0x8 ? LLVMGetParam(main_fn, offset++) : 
base->undef;
+   return;
+   }
+
interp_param_idx = lookup_interp_param_index(decl->Interp.Interpolate,
 decl->Interp.Location);
if (interp_param_idx == -1)
@@ -3966,6 +3988,16 @@ static void create_function(struct si_shader_context 
*ctx)
num_params = SI_PARAM_POS_FIXED_PT+1;
 
if (!ctx->is_monolithic) {
+   /* Color inputs from the prolog. */
+   if (shader->selector->info.colors_read) {
+   unsigned num_color_elements =
+   
util_bitcount(shader->selector->info.colors_read);
+
+   assert(num_params + num_color_elements <= 
ARRAY_SIZE(params));
+   for (i = 0; i < num_color_elements; i++)
+   params[num_params++] = ctx->f32;
+   }
+
/* Outputs for the epilog. */
num_return_sgprs = SI_SGPR_ALPHA_REF + 1;
num_returns =
@@ -3997,6 +4029,20 @@ static void create_function(struct si_shader_context 
*ctx)
si_create_function(ctx, re