On Tue, Feb 16, 2016 at 5:01 PM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
> On 15.02.2016 18:59, Marek Olšák wrote:
>>
>> From: Marek Olšák <marek.ol...@amd.com>
>>
>> ---
>> src/gallium/drivers/radeonsi/si_pipe.c | 1 +
>> src/gallium/drivers/radeonsi/si_pipe.h | 3 ++
>> src/gallium/drivers/radeonsi/si_shader.c | 53
>> ++++++++++++++++++++++++--------
>> src/gallium/drivers/radeonsi/si_shader.h | 2 +-
>> 4 files changed, 45 insertions(+), 14 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.c
>> b/src/gallium/drivers/radeonsi/si_pipe.c
>> index fa60732..448fe88 100644
>> --- a/src/gallium/drivers/radeonsi/si_pipe.c
>> +++ b/src/gallium/drivers/radeonsi/si_pipe.c
>> @@ -600,6 +600,7 @@ struct pipe_screen *radeonsi_screen_create(struct
>> radeon_winsys *ws)
>>
>> sscreen->b.has_cp_dma = true;
>> sscreen->b.has_streamout = true;
>> + sscreen->use_monolithic_shaders = true;
>>
>> if (debug_get_bool_option("RADEON_DUMP_SHADERS", FALSE))
>> sscreen->b.debug_flags |= DBG_FS | DBG_VS | DBG_GS |
>> DBG_PS | DBG_CS;
>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h
>> b/src/gallium/drivers/radeonsi/si_pipe.h
>> index b5790d6..2a2455c 100644
>> --- a/src/gallium/drivers/radeonsi/si_pipe.h
>> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
>> @@ -84,6 +84,9 @@ struct si_compute;
>> struct si_screen {
>> struct r600_common_screen b;
>> unsigned gs_table_depth;
>> +
>> + /* Whether shaders are monolithic (1-part) or separate (3-part).
>> */
>> + bool use_monolithic_shaders;
>> };
>>
>> struct si_blend_color {
>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c
>> b/src/gallium/drivers/radeonsi/si_shader.c
>> index b058019..b74ed1e 100644
>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>> @@ -70,6 +70,12 @@ struct si_shader_context
>>
>> unsigned type; /* TGSI_PROCESSOR_* specifies the type of shader.
>> */
>> bool is_gs_copy_shader;
>> +
>> + /* Whether to generate the optimized shader variant compiled as a
>> whole
>> + * (without a prolog and epilog)
>> + */
>> + bool is_monolithic;
>> +
>> int param_streamout_config;
>> int param_streamout_write_index;
>> int param_streamout_offset[4];
>> @@ -3657,8 +3663,10 @@ static void create_function(struct
>> si_shader_context *ctx)
>> struct lp_build_tgsi_context *bld_base =
>> &ctx->radeon_bld.soa.bld_base;
>> struct gallivm_state *gallivm = bld_base->base.gallivm;
>> struct si_shader *shader = ctx->shader;
>> - LLVMTypeRef params[SI_NUM_PARAMS], v2i32, v3i32;
>> + LLVMTypeRef params[SI_NUM_PARAMS + SI_NUM_VERTEX_BUFFERS], v2i32,
>> v3i32;
>> + LLVMTypeRef returns[16+32*4];
>
>
> This is a bit of a magic number, I guess something like max parameters plus
> attributes. Can you replace it by the appropriate defines?

No single definition expresses this clearly. The prolog has to return up to 16 input SGPRs and 4-20 input VGPRs. On top of that, the prolog returns other data in VGPRs, which brings the total to at most 4+16 VGPRs (16 vertex load addresses) for the VS and 20+8 VGPRs (2 vec4 colors) for the PS. The PS epilog returns one SGPR (but in s10 or so, so we need to allocate 11) and at most 9*4 VGPRs.

All of this can change in the future. 16+32*4 is much more than we'll ever need, so at least it shouldn't overflow. Assertions also check that we don't overflow.

Marek
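
For reference, the worst-case counts quoted in the reply can be tallied against the array size. The following is only a sketch: every enum name in it is hypothetical and does not exist in Mesa, and the figures are taken directly from the numbers stated above rather than from the driver source.

```c
#include <assert.h>

/* Hypothetical names for illustration only; none of these are Mesa defines.
 * The values are the upper bounds quoted in the reply. */
enum {
    MAX_INPUT_SGPRS   = 16,          /* input SGPRs returned by a prolog */
    VS_PROLOG_VGPRS   = 4 + 16,      /* input VGPRs + 16 vertex load addresses */
    PS_PROLOG_VGPRS   = 20 + 8,      /* input VGPRs + 2 vec4 colors */
    PS_EPILOG_SGPRS   = 11,          /* one value, but placed around s10 */
    PS_EPILOG_VGPRS   = 9 * 4,

    VS_PROLOG_RETURNS = MAX_INPUT_SGPRS + VS_PROLOG_VGPRS,
    PS_PROLOG_RETURNS = MAX_INPUT_SGPRS + PS_PROLOG_VGPRS,
    PS_EPILOG_RETURNS = PS_EPILOG_SGPRS + PS_EPILOG_VGPRS,

    RETURNS_CAPACITY  = 16 + 32 * 4  /* size of returns[] in the patch */
};

/* Compile-time checks that each quoted worst case fits in the array. */
static_assert(VS_PROLOG_RETURNS <= RETURNS_CAPACITY, "VS prolog overflows");
static_assert(PS_PROLOG_RETURNS <= RETURNS_CAPACITY, "PS prolog overflows");
static_assert(PS_EPILOG_RETURNS <= RETURNS_CAPACITY, "PS epilog overflows");
```

With these figures the largest case is the PS epilog, which still leaves most of the array unused. The checks here run at compile time only to illustrate the arithmetic; the patch itself relies on run-time assertions.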