[Mesa-dev] Pull request for 1.50 GS layout qualifiers
Hey Paul!

I got the layout qualifiers working. It's unblocked things so I can finish off a bunch of testcases I've been working on, so I'd like to get it in your gs branch so we can all enjoy testcases together. There's one not-for-upstream commit in here, and do note the TODO in the last commit (and we need tests for this feature still). Oh, and there's one little prep commit for UBOs, too.

I'm planning on sending out these commits:

glsl: Make _mesa_print_ir() available from anything including ir.h.
glsl: Remove ir_print_visitor.h includes and usage
mesa: Use shared code for converting shader targets to short strings.
mesa: Move the common _mesa_glsl_compile_shader() code to glsl/.

for review, plus a port of your "Make files buildable from C" to the list, since they seem like a good cleanup, together.

The following changes since commit 4e6d6dbfab79d9e7aff5d26c585d6e77b36db0f2:

  !UPSTREAM: Handle GS_OPCODE_THREAD_END in implied_mrf_writes() (2013-06-12 11:09:01 -0700)

are available in the git repository at:

  git://people.freedesktop.org/~anholt/mesa gs-qualifiers

for you to fetch changes up to dbe3e86de06813ea0619dd9035f328372c9caab2:

  glsl: Cross-validate GS layout qualifiers while intrastage linking. (2013-06-13 18:04:29 -0700)

Eric Anholt (11):
      mesa: Expose uniform buffers in geometry shaders.
      glsl: Make _mesa_print_ir() available from anything including ir.h.
      glsl: Remove ir_print_visitor.h includes and usage
      mesa: Use shared code for converting shader targets to short strings.
      mesa: Move the common _mesa_glsl_compile_shader() code to glsl/.
      glsl: Include EmitVertex() and EndPrimitive() prototypes for GLSL 1.50 GS.
      glsl: !UPSTREAM: Spam in builtin 1.30 variables for 1.50 GSes.
      glsl: Make sure that we don't put too many bitfields in ast_type_qualifier.
      glsl: Parse the GLSL 1.50 GS layout qualifiers.
      glsl: Export the compiler's GS layout qualifiers to the gl_shader.
      glsl: Cross-validate GS layout qualifiers while intrastage linking.
 src/glsl/ast.h                                     |  12 ++
 src/glsl/ast_to_hir.cpp                            |   2 +
 src/glsl/ast_type.cpp                              |  23
 src/glsl/builtin_variables.cpp                     |   6 +
 src/glsl/builtins/profiles/150.geom                |   3 +
 src/glsl/glsl_parser.yy                            |  69 +-
 src/glsl/glsl_parser_extras.cpp                    | 153 -
 src/glsl/glsl_parser_extras.h                      |  11 ++
 src/glsl/ir.h                                      |   8 ++
 src/glsl/ir_print_visitor.cpp                      |   3 +
 src/glsl/ir_print_visitor.h                        |   3 -
 src/glsl/ir_rvalue_visitor.cpp                     |   1 -
 src/glsl/link_varyings.cpp                         |  12 +-
 src/glsl/linker.cpp                                | 115 +---
 src/glsl/linker.h                                  |   3 -
 src/glsl/main.cpp                                  |  60 +---
 src/glsl/opt_array_splitting.cpp                   |   1 -
 src/glsl/opt_noop_swizzle.cpp                      |   1 -
 src/glsl/opt_structure_splitting.cpp               |   1 -
 src/glsl/program.h                                 |  16 ++-
 src/glsl/test_optpass.cpp                          |   1 -
 src/mesa/drivers/dri/i965/brw_fs.cpp               |   1 -
 src/mesa/drivers/dri/i965/brw_fs_emit.cpp          |   1 -
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |   1 -
 .../drivers/dri/i965/brw_fs_vector_splitting.cpp   |   1 -
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp       |   5 +-
 .../drivers/dri/i965/brw_schedule_instructions.cpp |   1 -
 src/mesa/drivers/dri/i965/brw_shader.cpp           |  10 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp             |   1 -
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp  |   1 -
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp     |   1 -
 src/mesa/main/ff_fragment_shader.cpp               |   1 -
 src/mesa/main/mtypes.h                             |  18 +++
 src/mesa/main/shaderapi.c                          |  74 ++
 src/mesa/main/uniform_query.cpp                    |   9 +-
 src/mesa/program/ir_to_mesa.cpp                    | 102 +-
 src/mesa/program/ir_to_mesa.h                      |   1 -
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp         |  12 +-
 38 files changed, 491 insertions(+), 253 deletions(-)
 create mode 100644 src/glsl/builtins/profiles/150.geom

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] draw: don't clear the so targets until we stream out
> > Though I find stream output very confusing...
>
> I agree. I was digging a bit more and I think I was correct the first time.
> The D3D spec is very clear that "a buffer cannot be bound as both an input
> and an output at the same time", so I think the current behavior is correct,
> or at least one of the correct options given that the behavior is simply
> undefined. So I think I'm going to skip this patch, especially since it is
> subtly wrong (because it will clear so target buffers on each invocation of
> the stream output stage which isn't correct behavior since the buffers
> should only be cleared when new so targets are set).

Actually I'd just like to commit the attached patch. All it does is move the clearing of the so targets from the drivers to the draw module. It fixes a bug in softpipe, because softpipe would never clear the buffers and would always append.

z

From a4a89e8f39a127474c668cb72a7db24038396731 Mon Sep 17 00:00:00 2001
From: Zack Rusin
Date: Thu, 13 Jun 2013 17:57:47 -0400
Subject: [PATCH] draw: clear the draw buffers in draw

Moves clearing of the draw so target buffers to the draw module. Previously they had to be cleared in the drivers, which was quite messy.

Signed-off-by: Zack Rusin
---
 src/gallium/auxiliary/draw/draw_context.c     |   12 ++--
 src/gallium/auxiliary/draw/draw_context.h     |    3 ++-
 src/gallium/drivers/llvmpipe/lp_context.h     |    1 +
 src/gallium/drivers/llvmpipe/lp_draw_arrays.c |    4 ++--
 src/gallium/drivers/llvmpipe/lp_state_so.c    |    8 ++--
 src/gallium/drivers/softpipe/sp_context.h     |    1 +
 src/gallium/drivers/softpipe/sp_draw_arrays.c |    4 ++--
 src/gallium/drivers/softpipe/sp_state_so.c    |    1 +
 8 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
index 22c0e9b..53f515e 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -809,12 +809,20 @@ draw_get_rasterizer_no_cull( struct draw_context *draw,
 void
 draw_set_mapped_so_targets(struct draw_context *draw,
                            int num_targets,
-                           struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS])
+                           struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS],
+                           unsigned append_bitmask)
 {
    int i;
 
-   for (i = 0; i < num_targets; i++)
+   for (i = 0; i < num_targets; i++) {
       draw->so.targets[i] = targets[i];
+      /* if we're not appending then lets reset the internal
+         data of our so target */
+      if (!(append_bitmask & (1 << i)) && draw->so.targets[i]) {
+         draw->so.targets[i]->internal_offset = 0;
+         draw->so.targets[i]->emitted_vertices = 0;
+      }
+   }
    for (i = num_targets; i < PIPE_MAX_SO_BUFFERS; i++)
       draw->so.targets[i] = NULL;
 
diff --git a/src/gallium/auxiliary/draw/draw_context.h b/src/gallium/auxiliary/draw/draw_context.h
index 4a1b27e..ae63068 100644
--- a/src/gallium/auxiliary/draw/draw_context.h
+++ b/src/gallium/auxiliary/draw/draw_context.h
@@ -231,7 +231,8 @@ draw_set_mapped_constant_buffer(struct draw_context *draw,
 void
 draw_set_mapped_so_targets(struct draw_context *draw,
                            int num_targets,
-                           struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS]);
+                           struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS],
+                           unsigned append_bitmask);
 
 
 /***
diff --git a/src/gallium/drivers/llvmpipe/lp_context.h b/src/gallium/drivers/llvmpipe/lp_context.h
index abfe852..0515968 100644
--- a/src/gallium/drivers/llvmpipe/lp_context.h
+++ b/src/gallium/drivers/llvmpipe/lp_context.h
@@ -91,6 +91,7 @@ struct llvmpipe_context {
 
    struct draw_so_target *so_targets[PIPE_MAX_SO_BUFFERS];
    int num_so_targets;
+   unsigned so_append_bitmask;
    struct pipe_query_data_so_statistics so_stats;
    unsigned num_primitives_generated;
 
diff --git a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
index 4e23904..11b665a 100644
--- a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
+++ b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
@@ -104,7 +104,7 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
       }
    }
    draw_set_mapped_so_targets(draw, lp->num_so_targets,
-                              lp->so_targets);
+                              lp->so_targets, lp->so_append_bitmask);
 
    llvmpipe_prepare_vertex_sampling(lp,
                                     lp->num_sampler_views[PIPE_SHADER_VERTEX],
@@ -134,7 +134,7 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
    if (mapped_indices) {
       draw_set_indexes(draw, NULL, 0, 0);
    }
-   draw_set_mapped_so_targets(draw, 0, NULL);
+   draw_set_mapped_so_targets(draw, 0, N
Re: [Mesa-dev] [PATCH] draw: don't clear the so targets until we stream out
> Though I find stream output very confusing...

I agree. I was digging a bit more and I think I was correct the first time. The D3D spec is very clear that "a buffer cannot be bound as both an input and an output at the same time", so I think the current behavior is correct, or at least one of the correct options given that the behavior is simply undefined. So I think I'm going to skip this patch, especially since it is subtly wrong (because it will clear so target buffers on each invocation of the stream output stage which isn't correct behavior since the buffers should only be cleared when new so targets are set).

z
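
The "clear only when new so targets are set" rule, keyed on a per-target append bit, can be sketched roughly as follows. This is a minimal illustration, not the Mesa code: `so_target`, `so_state`, `set_so_targets` and `MAX_SO_BUFFERS` are hypothetical stand-ins for `draw_so_target`, the draw context's `so` state, `draw_set_mapped_so_targets()` and `PIPE_MAX_SO_BUFFERS`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SO_BUFFERS 4   /* assumed stand-in for PIPE_MAX_SO_BUFFERS */

/* Simplified stand-in for struct draw_so_target. */
struct so_target {
   unsigned internal_offset;
   unsigned emitted_vertices;
};

/* Simplified stand-in for the stream output state of the draw context. */
struct so_state {
   struct so_target *targets[MAX_SO_BUFFERS];
   int num_targets;
};

/* Bind the targets. A target whose bit in append_bitmask is clear is
 * reset here, at bind time, and nowhere else. */
static void
set_so_targets(struct so_state *so, int num_targets,
               struct so_target **targets, unsigned append_bitmask)
{
   int i;

   assert(num_targets >= 0 && num_targets <= MAX_SO_BUFFERS);

   for (i = 0; i < num_targets; i++) {
      so->targets[i] = targets[i];
      if (!(append_bitmask & (1u << i)) && so->targets[i]) {
         so->targets[i]->internal_offset = 0;
         so->targets[i]->emitted_vertices = 0;
      }
   }
   for (i = num_targets; i < MAX_SO_BUFFERS; i++)
      so->targets[i] = NULL;

   so->num_targets = num_targets;
}
```

With a bitmask of 0x2, target 0 is reset and target 1 keeps its offset and vertex count, so later stream output appends to it.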
Re: [Mesa-dev] [PATCH] draw: don't clear the so targets until we stream out
On 14.06.2013 00:04, Zack Rusin wrote:
> Since draw auto fetches the count from the buffers, we can't
> just clear them on bind, we need to wait until the actual
> stream out is performed. Otherwise the count for draw auto
> will be zero. Plus it is cleaner to have draw do it rather
> than drivers having to mess with draw's internals.
>
> Signed-off-by: Zack Rusin
> ---
>  src/gallium/auxiliary/draw/draw_context.c     |    4 +++-
>  src/gallium/auxiliary/draw/draw_context.h     |    3 ++-
>  src/gallium/auxiliary/draw/draw_private.h     |    1 +
>  src/gallium/auxiliary/draw/draw_pt_so_emit.c  |   20
>  src/gallium/drivers/llvmpipe/lp_context.h     |    1 +
>  src/gallium/drivers/llvmpipe/lp_draw_arrays.c |    4 ++--
>  src/gallium/drivers/llvmpipe/lp_state_so.c    |    8 ++--
>  src/gallium/drivers/softpipe/sp_context.h     |    1 +
>  src/gallium/drivers/softpipe/sp_draw_arrays.c |    4 ++--
>  src/gallium/drivers/softpipe/sp_state_so.c    |    1 +
>  10 files changed, 35 insertions(+), 12 deletions(-)
>
> diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
> index 4a08765..f463739 100644
> --- a/src/gallium/auxiliary/draw/draw_context.c
> +++ b/src/gallium/auxiliary/draw/draw_context.c
> @@ -810,7 +810,8 @@ draw_get_rasterizer_no_cull( struct draw_context *draw,
>  void
>  draw_set_mapped_so_targets(struct draw_context *draw,
>                             int num_targets,
> -                           struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS])
> +                           struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS],
> +                           unsigned append_bitmask)
>  {
>     int i;
>
> @@ -820,6 +821,7 @@ draw_set_mapped_so_targets(struct draw_context *draw,
>        draw->so.targets[i] = NULL;
>
>     draw->so.num_targets = num_targets;
> +   draw->so.append_bitmask = append_bitmask;
>  }
>
>  void
> diff --git a/src/gallium/auxiliary/draw/draw_context.h b/src/gallium/auxiliary/draw/draw_context.h
> index 4a1b27e..ae63068 100644
> --- a/src/gallium/auxiliary/draw/draw_context.h
> +++ b/src/gallium/auxiliary/draw/draw_context.h
> @@ -231,7 +231,8 @@ draw_set_mapped_constant_buffer(struct draw_context *draw,
>  void
>  draw_set_mapped_so_targets(struct draw_context *draw,
>                             int num_targets,
> -                           struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS]);
> +                           struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS],
> +                           unsigned append_bitmask);
>
>
>  /***
> diff --git a/src/gallium/auxiliary/draw/draw_private.h b/src/gallium/auxiliary/draw/draw_private.h
> index fd52c2d..4dda90e 100644
> --- a/src/gallium/auxiliary/draw/draw_private.h
> +++ b/src/gallium/auxiliary/draw/draw_private.h
> @@ -290,6 +290,7 @@ struct draw_context
>     struct {
>        struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS];
>        uint num_targets;
> +      uint append_bitmask;
>     } so;
>
>     /* Clip derived state:
> diff --git a/src/gallium/auxiliary/draw/draw_pt_so_emit.c b/src/gallium/auxiliary/draw/draw_pt_so_emit.c
> index d624a99..785aa34 100644
> --- a/src/gallium/auxiliary/draw/draw_pt_so_emit.c
> +++ b/src/gallium/auxiliary/draw/draw_pt_so_emit.c
> @@ -77,6 +77,24 @@ draw_has_so(const struct draw_context *draw)
>     return FALSE;
>  }
>
> +static void
> +clean_so_buffers(struct pt_so_emit *emit)
> +{
> +   struct draw_context *draw = emit->draw;
> +   unsigned i;
> +
> +   debug_assert(emit->has_so);
> +
> +   for (i = 0; i < draw->so.num_targets; i++) {
> +      /* if we're not appending then lets reset the internal
> +         data of our so target */
> +      if (!(draw->so.append_bitmask & (1 << i)) && draw->so.targets[i]) {
> +         draw->so.targets[i]->internal_offset = 0;
> +         draw->so.targets[i]->emitted_vertices = 0;
> +      }
> +   }
> +}
> +
>  void draw_pt_so_emit_prepare(struct pt_so_emit *emit, boolean use_pre_clip_pos)
>  {
>     struct draw_context *draw = emit->draw;
> @@ -257,6 +275,8 @@ void draw_pt_so_emit( struct pt_so_emit *emit,
>     if (!draw->so.num_targets)
>        return;
>
> +   clean_so_buffers(emit);
> +
>     emit->emitted_vertices = 0;
>     emit->emitted_primitives = 0;
>     emit->generated_primitives = 0;
> diff --git a/src/gallium/drivers/llvmpipe/lp_context.h b/src/gallium/drivers/llvmpipe/lp_context.h
> index abfe852..0515968 100644
> --- a/src/gallium/drivers/llvmpipe/lp_context.h
> +++ b/src/gallium/drivers/llvmpipe/lp_context.h
> @@ -91,6 +91,7 @@ struct llvmpipe_context {
>
>     struct draw_so_target *so_targets[PIPE_MAX_SO_BUFFERS];
>     int num_so_targets;
> +   unsigned so_append_bitmask;
>     struct pipe_query_data_so_statistics so_stats;
>     unsigned num_primitives_generated;
>
> diff --git a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
>
Re: [Mesa-dev] [PATCH 2/2] r600g/compute: Accept LDS size from the LLVM backend
For both patches in this series, the original files use tabs for indentation, not the spaces that the patches introduce. Might want to fix that for consistency.

I'm not familiar enough with the register poking to give a qualified review, but everything else looks reasonable to me.

Tested-by: Aaron Watry

On Wed, Jun 12, 2013 at 7:34 PM, Tom Stellard wrote:
> From: Tom Stellard
>
> And allocate the correct amount before dispatching the kernel.
> ---
>  src/gallium/drivers/r600/evergreen_compute.c       | 53 +++---
>  .../drivers/r600/evergreen_compute_internal.h      |  1 +
>  src/gallium/drivers/r600/evergreen_state.c         |  6 +--
>  src/gallium/drivers/r600/r600_asm.h                |  1 +
>  src/gallium/drivers/r600/r600_llvm.c               |  3 ++
>  5 files changed, 44 insertions(+), 20 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
> index b16c9d9..226933b 100644
> --- a/src/gallium/drivers/r600/evergreen_compute.c
> +++ b/src/gallium/drivers/r600/evergreen_compute.c
> @@ -211,7 +211,8 @@ void *evergreen_create_compute_state(
>  #endif
>
>         shader->ctx = (struct r600_context*)ctx;
> -       shader->local_size = cso->req_local_mem; ///TODO: assert it
> +       /* XXX: We ignore cso->req_local_mem, because we compute this value
> +        * ourselves on a per-kernel basis. */
>         shader->private_size = cso->req_private_mem;
>         shader->input_size = cso->req_input_mem;
>
> @@ -327,13 +328,13 @@ static void evergreen_emit_direct_dispatch(
>  {
>         int i;
>         struct radeon_winsys_cs *cs = rctx->rings.gfx.cs;
> +       struct r600_pipe_compute *shader = rctx->cs_shader_state.shader;
>         unsigned num_waves;
>         unsigned num_pipes = rctx->screen->info.r600_max_pipes;
>         unsigned wave_divisor = (16 * num_pipes);
>         int group_size = 1;
>         int grid_size = 1;
> -       /* XXX: Enable lds and get size from cs_shader_state */
> -       unsigned lds_size = 0;
> +       unsigned lds_size = shader->active_kernel->bc.nlds_dw;
>
>         /* Calculate group_size/grid_size */
>         for (i = 0; i < 3; i++) {
> @@ -348,16 +349,10 @@ static void evergreen_emit_direct_dispatch(
>         num_waves = (block_layout[0] * block_layout[1] * block_layout[2] +
>                         wave_divisor - 1) / wave_divisor;
>
> -       COMPUTE_DBG(rctx->screen, "Using %u pipes, there are %u wavefronts per thread block\n",
> -                       num_pipes, num_waves);
> -
> -       /* XXX: Partition the LDS between PS/CS. By default half (4096 dwords
> -        * on Evergreen) oes to Pixel Shaders and half goes to Compute Shaders.
> -        * We may need to allocat the entire LDS space for Compute Shaders.
> -        *
> -        * EG: R_008E2C_SQ_LDS_RESOURCE_MGMT := S_008E2C_NUM_LS_LDS(lds_dwords)
> -        * CM: CM_R_0286FC_SPI_LDS_MGMT := S_0286FC_NUM_LS_LDS(lds_dwords)
> -        */
> +       COMPUTE_DBG(rctx->screen, "Using %u pipes, "
> +                       "%u wavefronts per thread block, "
> +                       "allocating %u dwords lds.\n",
> +                       num_pipes, num_waves, lds_size);
>
>         r600_write_config_reg(cs, R_008970_VGT_NUM_INDICES, group_size);
>
> @@ -374,6 +369,14 @@ static void evergreen_emit_direct_dispatch(
>         r600_write_value(cs, block_layout[1]); /* R_0286F0_SPI_COMPUTE_NUM_THREAD_Y */
>         r600_write_value(cs, block_layout[2]); /* R_0286F4_SPI_COMPUTE_NUM_THREAD_Z */
>
> +       if (rctx->chip_class < CAYMAN) {
> +               assert(lds_size <= 8192);
> +       } else {
> +               /* Cayman appears to have a slightly smaller limit, see the
> +                * value of CM_R_0286FC_SPI_LDS_MGMT.NUM_LS_LDS */
> +               assert(lds_size <= 8160);
> +       }
> +
>         r600_write_compute_context_reg(cs, CM_R_0288E8_SQ_LDS_ALLOC,
>                                         lds_size | (num_waves << 14));
>
> @@ -517,12 +520,14 @@ static void evergreen_launch_grid(
>         struct r600_context *ctx = (struct r600_context *)ctx_;
>
>  #ifdef HAVE_OPENCL
> -       COMPUTE_DBG(ctx->screen, "*** evergreen_launch_grid: pc = %u\n", pc);
>
>         struct r600_pipe_compute *shader = ctx->cs_shader_state.shader;
> -       if (!shader->kernels[pc].code_bo) {
> +       struct r600_kernel *kernel = &shader->kernels[pc];
> +
> +       COMPUTE_DBG(ctx->screen, "*** evergreen_launch_grid: pc = %u\n", pc);
> +
> +       if (!kernel->code_bo) {
>                 void *p;
> -               struct r600_kernel *kernel = &shader->kernels[pc];
>                 struct r600_bytecode *bc = &kernel->bc;
>                 LLVMModuleRef mod = kernel->llvm_module;
>                 boolean use_kill = false;
> @@ -551,7 +556,7 @@ static void evergreen_launch_grid(
>                 ctx->ws->buffer_unmap(kernel->code_bo->cs_buf);
>
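
The wavefront count and the value written to CM_R_0288E8_SQ_LDS_ALLOC in the quoted patch can be worked through in isolation. The following is only a sketch under assumptions: `chip_class`, `num_waves_per_block` and `lds_alloc_value` are hypothetical names for this illustration, and the limits (8192 dwords before Cayman, 8160 on Cayman) are taken from the assertions in the patch rather than from hardware documentation.

```c
#include <assert.h>

/* Simplified stand-in for the driver's chip class enum (assumption). */
enum chip_class { EVERGREEN, CAYMAN };

/* Wavefronts per thread block: total threads divided by (16 * num_pipes),
 * rounded up, as in evergreen_emit_direct_dispatch(). */
static unsigned
num_waves_per_block(const unsigned block_layout[3], unsigned num_pipes)
{
   unsigned wave_divisor = 16 * num_pipes;
   unsigned threads = block_layout[0] * block_layout[1] * block_layout[2];

   return (threads + wave_divisor - 1) / wave_divisor;
}

/* Pack the LDS size in dwords (low bits) and the wave count (from bit 14)
 * the way the patch does; the size limits mirror the patch's assertions. */
static unsigned
lds_alloc_value(enum chip_class chip, unsigned lds_size_dw, unsigned num_waves)
{
   unsigned limit = (chip < CAYMAN) ? 8192 : 8160;

   assert(lds_size_dw <= limit);
   return lds_size_dw | (num_waves << 14);
}
```

For a 64x1x1 block on 2 pipes the divisor is 32, giving 2 wavefronts; with 256 dwords of LDS the packed value is 256 | (2 << 14) = 33024.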
Re: [Mesa-dev] [PATCH 1/2] r600g/compute: Move compute_shader_create() function into evergreen_compute.c
On Thu, Jun 13, 2013 at 05:51:49PM -0500, Aaron Watry wrote:
> On Wed, Jun 12, 2013 at 7:34 PM, Tom Stellard wrote:
> > From: Tom Stellard
> >
> > ---
> >  src/gallium/drivers/r600/evergreen_compute.c | 23 +++-
> >  src/gallium/drivers/r600/r600_shader.c       | 32
> >  2 files changed, 22 insertions(+), 33 deletions(-)
> >
> > diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
> > index c993c09..b16c9d9 100644
> > --- a/src/gallium/drivers/r600/evergreen_compute.c
> > +++ b/src/gallium/drivers/r600/evergreen_compute.c
> > @@ -46,6 +46,7 @@
> >  #include "evergreen_compute.h"
> >  #include "evergreen_compute_internal.h"
> >  #include "compute_memory_pool.h"
> > +#include "sb/sb_public.h"
> >  #ifdef HAVE_OPENCL
> >  #include "radeon_llvm_util.h"
> >  #endif
> > @@ -522,7 +523,27 @@ static void evergreen_launch_grid(
> >         if (!shader->kernels[pc].code_bo) {
> >                 void *p;
> >                 struct r600_kernel *kernel = &shader->kernels[pc];
> > -               r600_compute_shader_create(ctx_, kernel->llvm_module, &kernel->bc);
> > +               struct r600_bytecode *bc = &kernel->bc;
> > +               LLVMModuleRef mod = kernel->llvm_module;
> > +               boolean use_kill = false;
> > +               bool dump = (ctx->screen->debug_flags & DBG_CS) != 0;
> > +               unsigned use_sb = ctx->screen->debug_flags & DBG_SB_CS;
> > +               unsigned sb_disasm = use_sb ||
> > +                       (ctx->screen->debug_flags & DBG_SB_DISASM);
> > +
> > +               r600_bytecode_init(bc, ctx->chip_class, ctx->family,
> > +                       ctx->screen->has_compressed_msaa_texturing);
> > +               bc->type = TGSI_PROCESSOR_COMPUTE;
> > +               bc->isa = ctx->isa;
> > +               r600_llvm_compile(mod, ctx->family, bc, &use_kill, dump);
> > +
> > +               if (dump && !sb_disasm) {
> > +                       r600_bytecode_disasm(bc);
> > +               } else if ((dump && sb_disasm) || use_sb) {
> > +                       if (r600_sb_bytecode_process(ctx, bc, NULL, dump, use_sb))
> > +                               R600_ERR("r600_sb_bytecode_process failed!\n");
> > +               }
> > +
> >                 kernel->code_bo = r600_compute_buffer_alloc_vram(ctx->screen,
> >                                                 kernel->bc.ndw * 4);
> >                 p = r600_buffer_mmap_sync_with_rings(ctx, kernel->code_bo, PIPE_TRANSFER_WRITE);
> > diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
> > index 81ed3ce..97c625c 100644
> > --- a/src/gallium/drivers/r600/r600_shader.c
> > +++ b/src/gallium/drivers/r600/r600_shader.c
> > @@ -291,38 +291,6 @@ static int tgsi_bgnloop(struct r600_shader_ctx *ctx);
> >  static int tgsi_endloop(struct r600_shader_ctx *ctx);
> >  static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx);
> >
> > -#ifdef HAVE_OPENCL
> > -int r600_compute_shader_create(struct pipe_context * ctx,
> > -       LLVMModuleRef mod, struct r600_bytecode * bytecode)
> > -{
>
> There's an associated declaration of this function in r600_pipe.h that
> is now unused... should this be removed? Otherwise, this looks good to
> me.
>

Yes, that should be removed. I'll take care of that before I push.

> FYI: Tested on CEDAR (HD5400).
>

Great, thanks.

-Tom

> > -       struct r600_context *r600_ctx = (struct r600_context *)ctx;
> > -       struct r600_shader_ctx shader_ctx;
> > -       boolean use_kill = false;
> > -       bool dump = (r600_ctx->screen->debug_flags & DBG_CS) != 0;
> > -       unsigned use_sb = r600_ctx->screen->debug_flags & DBG_SB_CS;
> > -       unsigned sb_disasm = use_sb ||
> > -               (r600_ctx->screen->debug_flags & DBG_SB_DISASM);
> > -
> > -       shader_ctx.bc = bytecode;
> > -       r600_bytecode_init(shader_ctx.bc, r600_ctx->chip_class, r600_ctx->family,
> > -               r600_ctx->screen->has_compressed_msaa_texturing);
> > -       shader_ctx.bc->type = TGSI_PROCESSOR_COMPUTE;
> > -       shader_ctx.bc->isa = r600_ctx->isa;
> > -       r600_llvm_compile(mod, r600_ctx->family,
> > -               shader_ctx.bc, &use_kill, dump);
> > -
> > -       if (dump && !sb_disasm) {
> > -               r600_bytecode_disasm(shader_ctx.bc);
> > -       } else if ((dump && sb_disasm) || use_sb) {
> > -               if (r600_sb_bytecode_process(r600_ctx, shader_ctx.bc, NULL, dump, use_sb))
> > -                       R600_ERR("r600_sb_bytecode_process failed!\n");
> > -       }
> > -
> > -       return 1;
> > -}
> > -
> > -#endif /* HAVE_OPENCL */
> > -
> >  static int tgsi_is_supported(struct r600_shader_ctx *ctx)
> >  {
> >         struct tgsi_full_instruction *i = &ctx->parse.FullToken.FullInstruction;
> > --
> > 1.7.11.4
> >
> > __
Re: [Mesa-dev] [PATCH 1/2] r600g/compute: Move compute_shader_create() function into evergreen_compute.c
On Wed, Jun 12, 2013 at 7:34 PM, Tom Stellard wrote:
> From: Tom Stellard
>
> ---
>  src/gallium/drivers/r600/evergreen_compute.c | 23 +++-
>  src/gallium/drivers/r600/r600_shader.c       | 32
>  2 files changed, 22 insertions(+), 33 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
> index c993c09..b16c9d9 100644
> --- a/src/gallium/drivers/r600/evergreen_compute.c
> +++ b/src/gallium/drivers/r600/evergreen_compute.c
> @@ -46,6 +46,7 @@
>  #include "evergreen_compute.h"
>  #include "evergreen_compute_internal.h"
>  #include "compute_memory_pool.h"
> +#include "sb/sb_public.h"
>  #ifdef HAVE_OPENCL
>  #include "radeon_llvm_util.h"
>  #endif
> @@ -522,7 +523,27 @@ static void evergreen_launch_grid(
>         if (!shader->kernels[pc].code_bo) {
>                 void *p;
>                 struct r600_kernel *kernel = &shader->kernels[pc];
> -               r600_compute_shader_create(ctx_, kernel->llvm_module, &kernel->bc);
> +               struct r600_bytecode *bc = &kernel->bc;
> +               LLVMModuleRef mod = kernel->llvm_module;
> +               boolean use_kill = false;
> +               bool dump = (ctx->screen->debug_flags & DBG_CS) != 0;
> +               unsigned use_sb = ctx->screen->debug_flags & DBG_SB_CS;
> +               unsigned sb_disasm = use_sb ||
> +                       (ctx->screen->debug_flags & DBG_SB_DISASM);
> +
> +               r600_bytecode_init(bc, ctx->chip_class, ctx->family,
> +                       ctx->screen->has_compressed_msaa_texturing);
> +               bc->type = TGSI_PROCESSOR_COMPUTE;
> +               bc->isa = ctx->isa;
> +               r600_llvm_compile(mod, ctx->family, bc, &use_kill, dump);
> +
> +               if (dump && !sb_disasm) {
> +                       r600_bytecode_disasm(bc);
> +               } else if ((dump && sb_disasm) || use_sb) {
> +                       if (r600_sb_bytecode_process(ctx, bc, NULL, dump, use_sb))
> +                               R600_ERR("r600_sb_bytecode_process failed!\n");
> +               }
> +
>                 kernel->code_bo = r600_compute_buffer_alloc_vram(ctx->screen,
>                                                 kernel->bc.ndw * 4);
>                 p = r600_buffer_mmap_sync_with_rings(ctx, kernel->code_bo, PIPE_TRANSFER_WRITE);
> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
> index 81ed3ce..97c625c 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -291,38 +291,6 @@ static int tgsi_bgnloop(struct r600_shader_ctx *ctx);
>  static int tgsi_endloop(struct r600_shader_ctx *ctx);
>  static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx);
>
> -#ifdef HAVE_OPENCL
> -int r600_compute_shader_create(struct pipe_context * ctx,
> -       LLVMModuleRef mod, struct r600_bytecode * bytecode)
> -{

There's an associated declaration of this function in r600_pipe.h that is now unused... should this be removed? Otherwise, this looks good to me.

FYI: Tested on CEDAR (HD5400).

--Aaron

> -       struct r600_context *r600_ctx = (struct r600_context *)ctx;
> -       struct r600_shader_ctx shader_ctx;
> -       boolean use_kill = false;
> -       bool dump = (r600_ctx->screen->debug_flags & DBG_CS) != 0;
> -       unsigned use_sb = r600_ctx->screen->debug_flags & DBG_SB_CS;
> -       unsigned sb_disasm = use_sb ||
> -               (r600_ctx->screen->debug_flags & DBG_SB_DISASM);
> -
> -       shader_ctx.bc = bytecode;
> -       r600_bytecode_init(shader_ctx.bc, r600_ctx->chip_class, r600_ctx->family,
> -               r600_ctx->screen->has_compressed_msaa_texturing);
> -       shader_ctx.bc->type = TGSI_PROCESSOR_COMPUTE;
> -       shader_ctx.bc->isa = r600_ctx->isa;
> -       r600_llvm_compile(mod, r600_ctx->family,
> -               shader_ctx.bc, &use_kill, dump);
> -
> -       if (dump && !sb_disasm) {
> -               r600_bytecode_disasm(shader_ctx.bc);
> -       } else if ((dump && sb_disasm) || use_sb) {
> -               if (r600_sb_bytecode_process(r600_ctx, shader_ctx.bc, NULL, dump, use_sb))
> -                       R600_ERR("r600_sb_bytecode_process failed!\n");
> -       }
> -
> -       return 1;
> -}
> -
> -#endif /* HAVE_OPENCL */
> -
>  static int tgsi_is_supported(struct r600_shader_ctx *ctx)
>  {
>         struct tgsi_full_instruction *i = &ctx->parse.FullToken.FullInstruction;
> --
> 1.7.11.4
Re: [Mesa-dev] [PATCH libclc] Implement barrier() builtin
FYI: I've applied your related piglit test and R600 back-end patches and tested this on a CEDAR (HD5400).

Note: I had some trouble applying patches 4 and 5 of the R600 patches, but after chopping out the unit tests and creating those files by hand (and using --ignore-whitespace), everything is there and functioning.

For the libclc change:
Reviewed-by: Aaron Watry

On Wed, Jun 12, 2013 at 7:31 PM, Tom Stellard wrote:
> From: Tom Stellard
>
> ---
>  r600/lib/SOURCES                         |  2 ++
>  r600/lib/synchronization/barrier.cl      | 15 +++
>  r600/lib/synchronization/barrier_impl.ll | 12
>  3 files changed, 29 insertions(+)
>  create mode 100644 r600/lib/synchronization/barrier.cl
>  create mode 100644 r600/lib/synchronization/barrier_impl.ll
>
> diff --git a/r600/lib/SOURCES b/r600/lib/SOURCES
> index af8c8c8..16ef3ac 100644
> --- a/r600/lib/SOURCES
> +++ b/r600/lib/SOURCES
> @@ -2,3 +2,5 @@ workitem/get_group_id.ll
>  workitem/get_local_size.ll
>  workitem/get_local_id.ll
>  workitem/get_global_size.ll
> +synchronization/barrier.cl
> +synchronization/barrier_impl.ll
> diff --git a/r600/lib/synchronization/barrier.cl b/r600/lib/synchronization/barrier.cl
> new file mode 100644
> index 000..ac0b4b3
> --- /dev/null
> +++ b/r600/lib/synchronization/barrier.cl
> @@ -0,0 +1,15 @@
> +
> +#include
> +
> +void barrier_local(void);
> +void barrier_global(void);
> +
> +void barrier(cl_mem_fence_flags flags) {
> +  if (flags & CLK_LOCAL_MEM_FENCE) {
> +    barrier_local();
> +  }
> +
> +  if (flags & CLK_GLOBAL_MEM_FENCE) {
> +    barrier_global();
> +  }
> +}
> diff --git a/r600/lib/synchronization/barrier_impl.ll b/r600/lib/synchronization/barrier_impl.ll
> new file mode 100644
> index 000..99ac018
> --- /dev/null
> +++ b/r600/lib/synchronization/barrier_impl.ll
> @@ -0,0 +1,12 @@
> +declare void @llvm.AMDGPU.barrier.local() nounwind
> +declare void @llvm.AMDGPU.barrier.global() nounwind
> +
> +define void @barrier_local() nounwind alwaysinline {
> +  call void @llvm.AMDGPU.barrier.local()
> +  ret void
> +}
> +
> +define void @barrier_global() nounwind alwaysinline {
> +  call void @llvm.AMDGPU.barrier.global()
> +  ret void
> +}
> --
> 1.7.11.4
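
The flag dispatch in the quoted barrier() can be exercised on the host with a small mock. This is only a sketch: the flag values below are assumptions chosen for illustration (the real ones come from the OpenCL headers), and the two counters are hypothetical stand-ins for the llvm.AMDGPU.barrier.* intrinsics.

```c
#include <assert.h>

/* Assumed flag values for this sketch only. */
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

typedef unsigned cl_mem_fence_flags;

/* Counters stand in for the hardware barrier intrinsics. */
static unsigned local_barriers;
static unsigned global_barriers;

static void barrier_local(void)  { local_barriers++; }
static void barrier_global(void) { global_barriers++; }

/* Same shape as the patch: each fence flag triggers its own barrier,
 * so passing both flags issues both barriers. */
static void barrier(cl_mem_fence_flags flags)
{
   if (flags & CLK_LOCAL_MEM_FENCE)
      barrier_local();

   if (flags & CLK_GLOBAL_MEM_FENCE)
      barrier_global();
}
```

Note that with this structure a call with no flags set issues no barrier at all.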
[Mesa-dev] [PATCH] draw: don't clear the so targets until we stream out
Since draw auto fetches the count from the buffers, we can't just clear them on bind, we need to wait until the actual stream out is performed. Otherwise the count for draw auto will be zero. Plus is cleaner to have draw do it rather than drivers having to mess with draw's internals. Signed-off-by: Zack Rusin --- src/gallium/auxiliary/draw/draw_context.c |4 +++- src/gallium/auxiliary/draw/draw_context.h |3 ++- src/gallium/auxiliary/draw/draw_private.h |1 + src/gallium/auxiliary/draw/draw_pt_so_emit.c | 20 src/gallium/drivers/llvmpipe/lp_context.h |1 + src/gallium/drivers/llvmpipe/lp_draw_arrays.c |4 ++-- src/gallium/drivers/llvmpipe/lp_state_so.c|8 ++-- src/gallium/drivers/softpipe/sp_context.h |1 + src/gallium/drivers/softpipe/sp_draw_arrays.c |4 ++-- src/gallium/drivers/softpipe/sp_state_so.c|1 + 10 files changed, 35 insertions(+), 12 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c index 4a08765..f463739 100644 --- a/src/gallium/auxiliary/draw/draw_context.c +++ b/src/gallium/auxiliary/draw/draw_context.c @@ -810,7 +810,8 @@ draw_get_rasterizer_no_cull( struct draw_context *draw, void draw_set_mapped_so_targets(struct draw_context *draw, int num_targets, - struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS]) + struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS], + unsigned append_bitmask) { int i; @@ -820,6 +821,7 @@ draw_set_mapped_so_targets(struct draw_context *draw, draw->so.targets[i] = NULL; draw->so.num_targets = num_targets; + draw->so.append_bitmask = append_bitmask; } void diff --git a/src/gallium/auxiliary/draw/draw_context.h b/src/gallium/auxiliary/draw/draw_context.h index 4a1b27e..ae63068 100644 --- a/src/gallium/auxiliary/draw/draw_context.h +++ b/src/gallium/auxiliary/draw/draw_context.h @@ -231,7 +231,8 @@ draw_set_mapped_constant_buffer(struct draw_context *draw, void draw_set_mapped_so_targets(struct draw_context *draw, int num_targets, - struct draw_so_target 
*targets[PIPE_MAX_SO_BUFFERS]); + struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS], + unsigned append_bitmask); /*** diff --git a/src/gallium/auxiliary/draw/draw_private.h b/src/gallium/auxiliary/draw/draw_private.h index fd52c2d..4dda90e 100644 --- a/src/gallium/auxiliary/draw/draw_private.h +++ b/src/gallium/auxiliary/draw/draw_private.h @@ -290,6 +290,7 @@ struct draw_context struct { struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS]; uint num_targets; + uint append_bitmask; } so; /* Clip derived state: diff --git a/src/gallium/auxiliary/draw/draw_pt_so_emit.c b/src/gallium/auxiliary/draw/draw_pt_so_emit.c index d624a99..785aa34 100644 --- a/src/gallium/auxiliary/draw/draw_pt_so_emit.c +++ b/src/gallium/auxiliary/draw/draw_pt_so_emit.c @@ -77,6 +77,24 @@ draw_has_so(const struct draw_context *draw) return FALSE; } +static void +clean_so_buffers(struct pt_so_emit *emit) +{ + struct draw_context *draw = emit->draw; + unsigned i; + + debug_assert(emit->has_so); + + for (i = 0; i < draw->so.num_targets; i++) { + /* if we're not appending then lets reset the internal + data of our so target */ + if (!(draw->so.append_bitmask & (1 << i)) && draw->so.targets[i]) { + draw->so.targets[i]->internal_offset = 0; + draw->so.targets[i]->emitted_vertices = 0; + } + } +} + void draw_pt_so_emit_prepare(struct pt_so_emit *emit, boolean use_pre_clip_pos) { struct draw_context *draw = emit->draw; @@ -257,6 +275,8 @@ void draw_pt_so_emit( struct pt_so_emit *emit, if (!draw->so.num_targets) return; + clean_so_buffers(emit); + emit->emitted_vertices = 0; emit->emitted_primitives = 0; emit->generated_primitives = 0; diff --git a/src/gallium/drivers/llvmpipe/lp_context.h b/src/gallium/drivers/llvmpipe/lp_context.h index abfe852..0515968 100644 --- a/src/gallium/drivers/llvmpipe/lp_context.h +++ b/src/gallium/drivers/llvmpipe/lp_context.h @@ -91,6 +91,7 @@ struct llvmpipe_context { struct draw_so_target *so_targets[PIPE_MAX_SO_BUFFERS]; int num_so_targets; + unsigned 
so_append_bitmask; struct pipe_query_data_so_statistics so_stats; unsigned num_primitives_generated; diff --git a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c index 4e23904..11b665a 100644 --- a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c +++ b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c @@ -104,7 +104,7 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info) } } dra
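The append-bitmask behaviour described in the commit message can be tried in isolation with a small sketch. The struct and function names below are hypothetical stand-ins, not the real `draw_so_target` or `clean_so_buffers()`; the only assumption carried over from the patch is that bit `i` set in the mask means "keep appending to target `i`".

```c
#include <stddef.h>

#define MAX_SO_BUFFERS 4   /* stand-in for PIPE_MAX_SO_BUFFERS */

/* Hypothetical, trimmed-down stand-in for struct draw_so_target. */
struct so_target {
   unsigned internal_offset;
   unsigned emitted_vertices;
};

/* Reset every bound target whose bit is NOT set in append_bitmask.
 * Targets with their bit set keep their offset and vertex count, so a
 * later draw-auto can still read the count. */
static void
reset_non_appending_targets(struct so_target *targets[MAX_SO_BUFFERS],
                            unsigned num_targets,
                            unsigned append_bitmask)
{
   unsigned i;

   for (i = 0; i < num_targets && i < MAX_SO_BUFFERS; i++) {
      if (!(append_bitmask & (1u << i)) && targets[i]) {
         targets[i]->internal_offset = 0;
         targets[i]->emitted_vertices = 0;
      }
   }
}
```

Calling this at stream-out time rather than at bind time is what keeps the count available to draw auto between the bind and the draw.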
[Mesa-dev] [Bug 47824] osmesa using --enable-shared-glapi depends on libgl
https://bugs.freedesktop.org/show_bug.cgi?id=47824

Anssi Hannula changed:

What |Removed |Added
CC   |        |an...@mageia.org

--
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965: Remove broken source type assertions from brw_alu3().
Commit 526ffdfc033ab01cf133cb7e8290c65d12ccc9be attempted to generalize the source register type assertions to allow D and UD. However, the src1 and src2 assertions actually checked src0.type against D and UD due to a copy and paste bug. It also began setting the source and destination register types based on dest.type, ignoring src0/src1/src2.type completely. BFE and BFI2 may actually pass mixed D/UD types and expect them to be ignored, which is arguably a bit sloppy, but not too crazy either. This patch simply removes the source register assertions as those values aren't used anyway. It also clarifies the comment above the block that sets the register types. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 19 +-- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 3d0db1b..f2cacd1 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -811,9 +811,6 @@ static struct brw_instruction *brw_alu3(struct brw_compile *p, assert(src0.file == BRW_GENERAL_REGISTER_FILE); assert(src0.address_mode == BRW_ADDRESS_DIRECT); assert(src0.nr < 128); - assert(src0.type == BRW_REGISTER_TYPE_F || - src0.type == BRW_REGISTER_TYPE_D || - src0.type == BRW_REGISTER_TYPE_UD); insn->bits2.da3src.src0_swizzle = src0.dw1.bits.swizzle; insn->bits2.da3src.src0_subreg_nr = get_3src_subreg_nr(src0); insn->bits2.da3src.src0_reg_nr = src0.nr; @@ -824,9 +821,6 @@ static struct brw_instruction *brw_alu3(struct brw_compile *p, assert(src1.file == BRW_GENERAL_REGISTER_FILE); assert(src1.address_mode == BRW_ADDRESS_DIRECT); assert(src1.nr < 128); - assert(src1.type == BRW_REGISTER_TYPE_F || - src0.type == BRW_REGISTER_TYPE_D || - src0.type == BRW_REGISTER_TYPE_UD); insn->bits2.da3src.src1_swizzle = src1.dw1.bits.swizzle; insn->bits2.da3src.src1_subreg_nr_low = get_3src_subreg_nr(src1) & 0x3; insn->bits3.da3src.src1_subreg_nr_high = 
get_3src_subreg_nr(src1) >> 2; @@ -838,9 +832,6 @@ static struct brw_instruction *brw_alu3(struct brw_compile *p, assert(src2.file == BRW_GENERAL_REGISTER_FILE); assert(src2.address_mode == BRW_ADDRESS_DIRECT); assert(src2.nr < 128); - assert(src2.type == BRW_REGISTER_TYPE_F || - src0.type == BRW_REGISTER_TYPE_D || - src0.type == BRW_REGISTER_TYPE_UD); insn->bits3.da3src.src2_swizzle = src2.dw1.bits.swizzle; insn->bits3.da3src.src2_subreg_nr = get_3src_subreg_nr(src2); insn->bits3.da3src.src2_rep_ctrl = src2.vstride == BRW_VERTICAL_STRIDE_0; @@ -849,12 +840,12 @@ static struct brw_instruction *brw_alu3(struct brw_compile *p, insn->bits1.da3src.src2_negate = src2.negate; if (intel->gen >= 7) { - /* For MAD and LRP, all incoming src types are float, but for BFE and - * BFI2, the three source types might not all be the same. src2, the - * primary argument, should match the type of the destination. + /* Set both the source and destination types based on dest.type, + * ignoring the source register types. The MAD and LRP emitters ensure + * that all four types are float. The BFE and BFI2 emitters, however, + * may send us mixed D and UD types and want us to ignore that and use + * the destination type. */ - assert(dest.type == src2.type); - switch (dest.type) { case BRW_REGISTER_TYPE_F: insn->bits1.da3src.src_type = BRW_3SRC_TYPE_F; -- 1.8.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
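The copy-and-paste bug described in the commit message is easy to reproduce outside the driver. The following is only an illustrative sketch: the enum, struct, and helper names are hypothetical stand-ins for the BRW register types, and the patch itself removes the assertions rather than rewriting them this way.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the BRW_REGISTER_TYPE_* values. */
enum reg_type { REG_TYPE_F, REG_TYPE_D, REG_TYPE_UD, REG_TYPE_W };

struct reg {
   enum reg_type type;
};

/* The shape of the buggy src1 check: only the first comparison looks at
 * src1; the other two were pasted from the src0 block and never updated. */
static bool
src1_check_buggy(struct reg src0, struct reg src1)
{
   return src1.type == REG_TYPE_F ||
          src0.type == REG_TYPE_D ||
          src0.type == REG_TYPE_UD;
}

/* Taking the type as a parameter leaves no register name to forget
 * to update when the check is repeated for src0, src1 and src2. */
static bool
is_3src_type(enum reg_type type)
{
   return type == REG_TYPE_F ||
          type == REG_TYPE_D ||
          type == REG_TYPE_UD;
}
```

With `src0` of type D and `src1` of an unsupported type, the buggy check passes; with `src0` of type F and `src1` of type D, it fails even though D is allowed.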
[Mesa-dev] [PATCH 1/2] i965: Add back strict type assertions for MAD and LRP.
Commit 526ffdfc033ab01cf133cb7e8290c65d12ccc9be relaxed the type assertions in brw_alu3 to allow D/UD types (required by BFE and BFI2). This lost us the strict type checking for MAD and LRP, which require all four types to be float. This patch adds a new ALU3F wrapper which checks these once again. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 31d97ca..3d0db1b 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -905,6 +905,20 @@ struct brw_instruction *brw_##OP(struct brw_compile *p, \ return brw_alu3(p, BRW_OPCODE_##OP, dest, src0, src1, src2);\ } +#define ALU3F(OP) \ +struct brw_instruction *brw_##OP(struct brw_compile *p, \ + struct brw_reg dest, \ + struct brw_reg src0, \ + struct brw_reg src1, \ + struct brw_reg src2) \ +{ \ + assert(dest.type == BRW_REGISTER_TYPE_F);\ + assert(src0.type == BRW_REGISTER_TYPE_F);\ + assert(src1.type == BRW_REGISTER_TYPE_F);\ + assert(src2.type == BRW_REGISTER_TYPE_F);\ + return brw_alu3(p, BRW_OPCODE_##OP, dest, src0, src1, src2); \ +} + /* Rounding operations (other than RNDD) require two instructions - the first * stores a rounded value (possibly the wrong way) in the dest register, but * also sets a per-channel "increment bit" in the flag register. A predicated @@ -955,8 +969,8 @@ ALU2(DP3) ALU2(DP2) ALU2(LINE) ALU2(PLN) -ALU3(MAD) -ALU3(LRP) +ALU3F(MAD) +ALU3F(LRP) ALU1(BFREV) ALU3(BFE) ALU2(BFI1) -- 1.8.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 52167] llvmpipe test programs link fails when ld --as-needed option is used
https://bugs.freedesktop.org/show_bug.cgi?id=52167

Olivier Blin changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |FIXED

--- Comment #11 from Olivier Blin ---
I can not reproduce the build issue anymore, so this patch is not needed. This has likely been fixed by the automake conversion.
[Mesa-dev] XDC2013 - Call for Proposals
# Call For Proposals

**2013 X.Org Developers Conference (XDC 2013)**
**23-25 September 2013**
**Portland, Oregon USA**

The [2013 X.Org Developers Conference](http://www.x.org/wiki/Events/XDC2013) is the annual technical meeting for [X Window System](http://x.org) and [Free Desktop](http://freedesktop.org) developers. The attendees will gather to discuss outstanding technical issues related to X and to plan the direction of the X Window System and its software ecosystem. The event is free of charge and open to the general public.

The XDC 2013 Technical Program Committee (TPC) is requesting proposals for papers and presentations at XDC 2013. While any serious proposal will be gratefully considered, topics of interest to X.org and FreeDesktop.org developers are encouraged. There are three particular types of proposal the TPC is seeking:

1. Technical talk abstracts: 250-1000 words describing a presentation to be made at XDC 2013. This can be anything: a work-in-progress talk, a proposal for change, analysis of trends, etc.

2. Informal white papers: 1000+ words on something of interest or relevance to X.org developers, FreeDesktop.org developers or the X community at large. These papers will appear in the online conference proceedings of XDC 2013, and are unrefereed (beyond basic checks for legitimacy and relevance). Papers can be refereed if requested in advance.

3. Technical research papers: 2500+ words in a format and style suitable for refereed technical publication. Papers that are judged acceptable by the TPC and its referees will be published in the printed conference proceedings of XDC 2013, available on a print-on-demand basis online.

XDC 2013 technical presenters will be chosen from the authors of any of these submissions (as well as other presenters invited by the TPC). Normally, there is time for everyone who wants to present to do so, but one can never tell. As much as possible, presenters will be selected from those who submit before the deadline. We also may be able to offer financial assistance for travel for presenters who could not otherwise afford to attend and who submit before the deadline. Please do submit your proposal in a timely fashion.

**Proposals due:** Thursday 1 August 2013 17:00 UTC

*Accepted formats: PDF and ASCII text.*

**Notification of acceptance:** Thursday 8 August 2013

**E-mail:** bo...@foundation.x.org

--
keith.pack...@intel.com
[Mesa-dev] XDC2013 - Announcement
Now that we have everything in place we can finally make it official and announce it: XDC2013 will take place from September 23rd to September 25th in Portland, Oregon at the University Place Hotel and Conference Center.

Ian Romanick, Bart Massey, and I will be organizing this event. The initial wiki page for this event has been put in place at: http://wiki.x.org/wiki/Events/XDC2013

This page will get updated regularly. We will also keep you up to date on the X.Org events mailing list, http://lists.x.org/mailman/listinfo/events, so if you plan to come and are not subscribed there already, please consider doing so!

To register, please add yourself to the attendees page: http://wiki.x.org/wiki/Events/XDC2013/Attendees

If you would like to give a talk during the event, please add it to the program page: http://wiki.x.org/wiki/Events/XDC2013/Program

We are looking forward to seeing you in Portland. So if you are corporate, please talk to your managers about funding your trip. If you aren't but you have something to present, please contact the X.Org Foundation Board of Directors at bo...@foundation.x.org for travel funding.

We have negotiated a special conference rate at the conference hotel. Please check the wiki page for more information.

--
keith.pack...@intel.com
Re: [Mesa-dev] [PATCH] mesa: per-texture locking
On 06/12/2013 04:08 PM, Dave Airlie wrote: On Thu, Jun 13, 2013 at 3:33 AM, Eric Anholt wrote: Frank Henigman writes: On Tue, Jun 11, 2013 at 1:10 PM, Eric Anholt wrote: Frank Henigman writes: Replace the one texture lock with a lock per texture. This allows uploading textures from one thread concurrently with drawing in another thread. _mesa_lock_context_textures() was used to check for texture updates from other contexts and also to acquire the texture lock. It's been replaced with _mesa_check_context_textures() which only does the checking. Code sections that were between _mesa_lock_context_textures() and _mesa_unlock_context_textures() have been updated to lock individual textures as needed. When someone's doing something like glCopyTexSubImage() from an FBO backed by a texture to another texture, how is the locking supposed to work? How about copies from one texture to the same texture? Right now glCopyTexSubImage locks the destination texture before copying to it, but doesn't lock the source texture. This was safe because locking any texture effectively locked them all. With my change that's no longer true so now we're copying from an unlocked texture. Is that your concern? So we just need to have it lock the source texture too? We'll have to check for source == destination so we don't try to lock twice. That's an example of my concern. I'm pretty sure that our current locking doesn't cover nearly as much as it needs to if one wants to make thread-per-context shareCtx support actually work, so I'm really concerned that this change may make the locking unfixable. I agree there probably are problems with locking currently, and there seems to be zero coverage for context sharing in piglit. But I don't understand how my change makes anything unfixable. Can you elaborate? Basic ABBA locking problems. Someone does a copyteximage from texture A to fbo-wrapped texture B, at the same time someone does copytexsubimage From texture B to fbo-wrapped texture A. 
The timeline for ABBA failure is:

thread 1:                   thread 2:
lock texture A              lock texture B
block locking texture B     block locking texture A

I suspect you'd need a reservation type scheme like the one Maarten is writing for the kernel, and based on the one TTM uses. Where you get a list of objects you want to lock and back off all locks when you hit a contended point.

Dave.

Other OpenGL drivers solve this a different way, and it has been something on my todo list since forever. Basically, you have N+1 sets of state per object: the global state and a mirror per-context. Once a texture is bound, all reads access the local mirror. Writes hit the local mirror and the global state. When glBindTexture is called, the mirror synchronizes from the global state. Accesses to the per-context state never need a lock (since they're implicitly locked by when the thread calls MakeCurrent).

This eliminates all of the ABBA problems I'm aware of.

- From Eric's example, thread 1 never needs to lock texture A, and thread 2 never needs to lock texture B.

- Other cases where multiple textures are involved together (e.g., MRT rendering) are only modifying texture contents, not texture state. No lock is needed in these cases.

The other follow-on is that we need to separately reference count the storage for the image data. There are probably other issues. Clients that have the texture bound will continue to access the same images even if another client has called glTexImage or glCopyTexImage. Once the last client unbinds (or rebinds) the texture the old images are freed.

It's a big pile of work that will touch things all over the place in Mesa. That, alas, is why I've never gotten around to doing it. I think adding the ref counting to the images and adding the per-context mirror state (with the single big lock) would be good first steps.
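For reference, the ABBA pattern in the timeline, and the source == destination case raised earlier in the thread, can be illustrated with a small sketch. This is neither the reservation scheme nor the per-context mirror proposed above; it is the simpler textbook technique of always taking a pair of locks in one global order (here, by address). The `struct texture` and both function names are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a texture object that carries its own mutex. */
struct texture {
   pthread_mutex_t mutex;
};

/* Lock a source/destination pair in a globally consistent order.
 * Because every thread takes the lower-addressed mutex first, thread 1
 * (A -> B) and thread 2 (B -> A) can no longer each hold one lock while
 * waiting for the other.  src == dst is locked only once. */
static void
lock_texture_pair(struct texture *src, struct texture *dst)
{
   if (src == dst) {
      pthread_mutex_lock(&src->mutex);
   } else if ((uintptr_t) src < (uintptr_t) dst) {
      pthread_mutex_lock(&src->mutex);
      pthread_mutex_lock(&dst->mutex);
   } else {
      pthread_mutex_lock(&dst->mutex);
      pthread_mutex_lock(&src->mutex);
   }
}

/* Release the pair; unlock order does not matter for deadlock avoidance. */
static void
unlock_texture_pair(struct texture *src, struct texture *dst)
{
   pthread_mutex_unlock(&src->mutex);
   if (src != dst)
      pthread_mutex_unlock(&dst->mutex);
}
```

Ordered locking only works when the full set of objects is known before the first lock is taken, which is one reason a back-off scheme or lock-free per-context state may still be preferable for the general case discussed here.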
Re: [Mesa-dev] [PATCH 7/8] util: Expand the comment above the channel[] array
On Thu, 2013-06-13 at 14:50 +0100, Richard Sandiford wrote: > The entirety of the comment looks pretty good to me. :-) One question, and this is mostly curiosity on my part, I'm not specifically asking for another revision. > * (This is the same as C bitfield layout on most ABIs.) Do we have a handle on what 'most ABIs' are? I.e. would this include X86* and PPC* ABIs as we know them today, or do we already clearly understand which ones would not match? Thanks, -Will ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/vs: Combine code generation's inst->opcode switch statements.
vec4_visitor::generate_code() switches on vec4_instruction::opcode and calls into the brw_eu_emit.c layer to generate code for some of them. It then has a default case which calls generate_vec4_instruction() to handle the rest...which switches on opcode and handles the rest of the cases. The split apparently is that generate_code() handles the actual hardware opcodes (BRW_OPCODE_*) while generate_vec4_instruction() handles the virtual opcodes (SHADER_OPCODE_* and VS_OPCODE_*). But this looks fairly arbitrary, and it makes more sense to combine the two switches. This patch moves the cases from generate_code() into the helper function so that generate_code() isn't as large. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_vec4_emit.cpp | 329 ++-- 1 file changed, 166 insertions(+), 163 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp index fbb93db..f15759f 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp @@ -621,14 +621,178 @@ vec4_generator::generate_pull_constant_load_gen7(vec4_instruction *inst, 0); } +/** + * Generate assembly for a Vec4 IR instruction. + * + * \param instruction The Vec4 IR instruction to generate code for. + * \param dst The destination register. + * \param src An array of up to three source registers. 
+ */ void vec4_generator::generate_vec4_instruction(vec4_instruction *instruction, struct brw_reg dst, struct brw_reg *src) { - vec4_instruction *inst = (vec4_instruction *)instruction; + vec4_instruction *inst = (vec4_instruction *) instruction; switch (inst->opcode) { + case BRW_OPCODE_MOV: + brw_MOV(p, dst, src[0]); + break; + case BRW_OPCODE_ADD: + brw_ADD(p, dst, src[0], src[1]); + break; + case BRW_OPCODE_MUL: + brw_MUL(p, dst, src[0], src[1]); + break; + case BRW_OPCODE_MACH: + brw_set_acc_write_control(p, 1); + brw_MACH(p, dst, src[0], src[1]); + brw_set_acc_write_control(p, 0); + break; + + case BRW_OPCODE_MAD: + brw_MAD(p, dst, src[0], src[1], src[2]); + break; + + case BRW_OPCODE_FRC: + brw_FRC(p, dst, src[0]); + break; + case BRW_OPCODE_RNDD: + brw_RNDD(p, dst, src[0]); + break; + case BRW_OPCODE_RNDE: + brw_RNDE(p, dst, src[0]); + break; + case BRW_OPCODE_RNDZ: + brw_RNDZ(p, dst, src[0]); + break; + + case BRW_OPCODE_AND: + brw_AND(p, dst, src[0], src[1]); + break; + case BRW_OPCODE_OR: + brw_OR(p, dst, src[0], src[1]); + break; + case BRW_OPCODE_XOR: + brw_XOR(p, dst, src[0], src[1]); + break; + case BRW_OPCODE_NOT: + brw_NOT(p, dst, src[0]); + break; + case BRW_OPCODE_ASR: + brw_ASR(p, dst, src[0], src[1]); + break; + case BRW_OPCODE_SHR: + brw_SHR(p, dst, src[0], src[1]); + break; + case BRW_OPCODE_SHL: + brw_SHL(p, dst, src[0], src[1]); + break; + + case BRW_OPCODE_CMP: + brw_CMP(p, dst, inst->conditional_mod, src[0], src[1]); + break; + case BRW_OPCODE_SEL: + brw_SEL(p, dst, src[0], src[1]); + break; + + case BRW_OPCODE_DPH: + brw_DPH(p, dst, src[0], src[1]); + break; + + case BRW_OPCODE_DP4: + brw_DP4(p, dst, src[0], src[1]); + break; + + case BRW_OPCODE_DP3: + brw_DP3(p, dst, src[0], src[1]); + break; + + case BRW_OPCODE_DP2: + brw_DP2(p, dst, src[0], src[1]); + break; + + case BRW_OPCODE_F32TO16: + brw_F32TO16(p, dst, src[0]); + break; + + case BRW_OPCODE_F16TO32: + brw_F16TO32(p, dst, src[0]); + break; + + case BRW_OPCODE_LRP: + brw_LRP(p, 
dst, src[0], src[1], src[2]); + break; + + case BRW_OPCODE_BFREV: + /* BFREV only supports UD type for src and dst. */ + brw_BFREV(p, retype(dst, BRW_REGISTER_TYPE_UD), + retype(src[0], BRW_REGISTER_TYPE_UD)); + break; + case BRW_OPCODE_FBH: + /* FBH only supports UD type for dst. */ + brw_FBH(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]); + break; + case BRW_OPCODE_FBL: + /* FBL only supports UD type for dst. */ + brw_FBL(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]); + break; + case BRW_OPCODE_CBIT: + /* CBIT only supports UD type for dst. */ + brw_CBIT(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]); + break; + + case BRW_OPCODE_BFE: + brw_BFE(p, dst, src[0], src[1], src[2]); + break; + + case BRW_OPCODE_BFI1: + brw_BFI1(p, dst, src[0], src[1]); + break; + case BRW_OPCODE_BFI2: + brw_BFI2(p, dst, src[0], src[1], src[2]); + break; + + case BRW_OPCODE_IF: + if (inst->src[0].file != BAD_FILE) { + /* The instruction has an embedded compare (only allowed on gen6) */ + assert
Re: [Mesa-dev] [PATCH] draw: fix a regression in computing max elt
Sounds good. Thanks for tracking this down! Jose - Original Message - > gl can use elts without setting indices, in which case > our eltMax was set to 0 and always invoking the overflow > condition. So by default set eltMax to maximum, it will > be curbed by draw_set_indexes (if it ever comes) and if > not then it will let gl's glVertexPointer/glDrawArrays > work correctly. Fixes piglit's > triangle-rasterization-overdraw test. > > Signed-off-by: Zack Rusin > --- > src/gallium/auxiliary/draw/draw_context.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/gallium/auxiliary/draw/draw_context.c > b/src/gallium/auxiliary/draw/draw_context.c > index 22c0e9b..4a08765 100644 > --- a/src/gallium/auxiliary/draw/draw_context.c > +++ b/src/gallium/auxiliary/draw/draw_context.c > @@ -138,6 +138,7 @@ boolean draw_init(struct draw_context *draw) > draw->clip_z = TRUE; > > draw->pt.user.planes = (float (*) [DRAW_TOTAL_CLIP_PLANES][4]) > &(draw->plane[0]); > + draw->pt.user.eltMax = ~0; > > if (!draw_pipeline_init( draw )) >return FALSE; > -- > 1.7.10.4 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] draw: fix a regression in computing max elt
gl can use elts without setting indices, in which case our eltMax was set to 0 and always invoking the overflow condition. So by default set eltMax to maximum, it will be curbed by draw_set_indexes (if it ever comes) and if not then it will let gl's glVertexPointer/glDrawArrays work correctly. Fixes piglit's triangle-rasterization-overdraw test. Signed-off-by: Zack Rusin --- src/gallium/auxiliary/draw/draw_context.c |1 + 1 file changed, 1 insertion(+) diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c index 22c0e9b..4a08765 100644 --- a/src/gallium/auxiliary/draw/draw_context.c +++ b/src/gallium/auxiliary/draw/draw_context.c @@ -138,6 +138,7 @@ boolean draw_init(struct draw_context *draw) draw->clip_z = TRUE; draw->pt.user.planes = (float (*) [DRAW_TOTAL_CLIP_PLANES][4]) &(draw->plane[0]); + draw->pt.user.eltMax = ~0; if (!draw_pipeline_init( draw )) return FALSE; -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
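Why a zero eltMax trips the overflow path on every fetch can be shown with a minimal sketch. The patch does not show the actual check inside draw, so the shape below (a slot is valid only when its position is below the maximum) is an assumption, and `fetch_elt` and `OVERFLOW_IDX` are hypothetical names.

```c
/* Hypothetical sentinel returned when an index slot is out of range. */
#define OVERFLOW_IDX 0xffffffffu

/* Hypothetical bounds-checked fetch: slot i is valid only if i < elt_max.
 * With elt_max == 0 no slot is ever valid; with elt_max == ~0u every
 * slot except the very last representable one is accepted. */
static unsigned
fetch_elt(const unsigned *elts, unsigned i, unsigned elt_max)
{
   return (i >= elt_max) ? OVERFLOW_IDX : elts[i];
}
```

Defaulting the limit to `~0` and letting a later, explicit index-buffer setup lower it matches the behaviour the commit message describes.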
Re: [Mesa-dev] [PATCH 1/6] mesa, gallium: renumber shader indices according to their placement in pipeline
On 06/13/2013 06:25 AM, Marek Olšák wrote: See my explanation in mtypes.h. --- src/gallium/include/pipe/p_defines.h |7 --- src/glsl/linker.cpp| 16 src/mesa/drivers/dri/i965/brw_shader.cpp |8 ++-- src/mesa/main/mtypes.h |8 ++-- src/mesa/main/shaderobj.h |4 ++-- src/mesa/main/uniform_query.cpp|2 +- src/mesa/program/ir_to_mesa.cpp| 10 +++--- src/mesa/program/program.h |2 +- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 10 +++--- 9 files changed, 30 insertions(+), 37 deletions(-) Reviewed-by: Brian Paul However, a change for the VMware svga driver is also needed: diff --git a/src/gallium/drivers/svga/svga_state_constants.c b/src/gallium/drive index 759c6c6..c03f38c 100644 --- a/src/gallium/drivers/svga/svga_state_constants.c +++ b/src/gallium/drivers/svga/svga_state_constants.c @@ -58,10 +58,15 @@ static int svga_shader_type(unsigned shader) { - assert(PIPE_SHADER_VERTEX + 1 == SVGA3D_SHADERTYPE_VS); - assert(PIPE_SHADER_FRAGMENT + 1 == SVGA3D_SHADERTYPE_PS); - assert(shader <= PIPE_SHADER_FRAGMENT); - return shader + 1; + switch (shader) { + case PIPE_SHADER_VERTEX: + return SVGA3D_SHADERTYPE_VS; + case PIPE_SHADER_FRAGMENT: + return SVGA3D_SHADERTYPE_PS; + default: + assert(!"Unexpected PIPE_SHADER_ type in svga_shader_type()"); + return SVGA3D_SHADERTYPE_VS; + } } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 65714] Champions of Regnum dont show characters!
https://bugs.freedesktop.org/show_bug.cgi?id=65714

Alex Deucher changed:

What      |Removed                        |Added
Assignee  |mesa-dev@lists.freedesktop.org |dri-devel@lists.freedesktop.org
Version   |9.1                            |git
Component |Mesa core                      |Drivers/Gallium/r600
[Mesa-dev] [Bug 65714] Champions of Regnum dont show characters!
https://bugs.freedesktop.org/show_bug.cgi?id=65714

--- Comment #1 from Alex Deucher ---
Does the PPA enable LLVM? If so, does setting the environment variable R600_LLVM=0 help?
[Mesa-dev] [Bug 65714] Champions of Regnum dont show characters!
https://bugs.freedesktop.org/show_bug.cgi?id=65714

Fabio Pedretti changed:

What                        |Removed    |Added
Attachment #80780 mime type |text/plain |image/png
Re: [Mesa-dev] R600 Patches: Add support for the local address space
On Wed, Jun 12, 2013 at 06:37:39PM -0700, Matt Arsenault wrote: > On 06/12/2013 05:42 PM, Tom Stellard wrote: > >Hi, > > > >The attached patches add support for local address space on > >Evergreen / Northern Islands GPUs. > > > >Please Review. > > > >-Tom > > + def int_AMDGPU_barrier_local : Intrinsic<[], [], []>; > You probably want to mark this as IntrReadMem to try to avoid > reordering stores around the barrier > I don't think the intrinsic as defined will have stores reordered around it. From include/llvm/IR/Intrinsics.td: // Intr*Mem - Memory properties. An intrinsic is allowed to have at most one of // these properties set. They are listed from the most aggressive (best to use // if correct) to the least aggressive. If no property is set, the worst case // is assumed (it may read and write any memory it can get access to and // it may have other side effects). -Tom ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 65714] New: Champions of Regnum dont show characters!
https://bugs.freedesktop.org/show_bug.cgi?id=65714

Priority: medium
Bug ID: 65714
Assignee: mesa-dev@lists.freedesktop.org
Summary: Champions of Regnum dont show characters!
Severity: normal
Classification: Unclassified
OS: Linux (All)
Reporter: wolfmen...@hotmail.com
Hardware: x86-64 (AMD64)
Status: NEW
Version: 9.1
Component: Mesa core
Product: Mesa

Created attachment 80780 --> https://bugs.freedesktop.org/attachment.cgi?id=80780&action=edit
main character selection screen with no Character

I am having some issues with the game Champions of Regnum. I am using Linux Mint 15, and out of the box everything was working fine, but as a Linux user I wanted to have the latest drivers. So I first installed the latest ATI graphics drivers from the xorg/edgers PPA. Every game runs fine, including those under PlayOnLinux, except Champions of Regnum, which doesn't display any characters in game; it displays them as invisible characters.

I then removed the xorg/edgers PPA and installed oibaf's PPA, but the result is the same as with xorg/edgers.

I have an ATI HIS 5970x2 2GB DDR5.

Any solution to this?
Re: [Mesa-dev] [PATCH 7/8] util: Expand the comment above the channel[] array
Michel Dänzer writes:

> On Die, 2013-06-11 at 16:26 +0100, Richard Sandiford wrote:
>> Signed-off-by: Richard Sandiford
>> ---
>>  src/gallium/auxiliary/util/u_format.h | 42
>>  ++-
>>  1 file changed, 41 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/auxiliary/util/u_format.h
>> b/src/gallium/auxiliary/util/u_format.h
>> index e4b9c36..db6c290 100644
>> --- a/src/gallium/auxiliary/util/u_format.h
>> +++ b/src/gallium/auxiliary/util/u_format.h
>> @@ -178,9 +178,49 @@ struct util_format_description
>>     unsigned is_mixed:1;
>>
>>     /**
>> -    * Input channel description.
>> +    * Input channel description, in the order XYZW.
>>      *
>>      * Only valid for UTIL_FORMAT_LAYOUT_PLAIN formats.
>> +    *
>> +    * The general rule is that the order and layout of the channels is the
>> +    * same as they would be in a C struct:
>> +    *
>> +    *    struct {
>> +    *       ...X...;
>> +    *       ...Y...;
>> +    *       ...Z...;
>> +    *       ...W...;
>> +    *    };
>> +    *
>> +    * with bitfields being used for all integer channels.
>
> I'd advise against using the term 'bitfield', as the semantics of C
> bitfields are mostly up to the specific C implementation, and it will
> lure people into implicitly thinking of the semantics of bitfields in
> the C implementation they're using.

I got the impression that the bit order was fairly consistent in practice, since there's usually a strong expectation that the first structure member should be in the first byte. But you're right of course. How does this look instead:

   /**
    * Input channel description, in the order XYZW.
    *
    * Only valid for UTIL_FORMAT_LAYOUT_PLAIN formats.
    *
    * Suppose the pixel value is treated as a single integer P.
    * The order of the channels within P depends on endianness:
    *
    * - On big-endian targets, the channels are ordered from the most
    *   significant end to the least significant end.  The most significant
    *   bit of P is the most significant bit of the first channel.  The least
    *   significant bit of P is the least significant bit of the last channel.
    *
    * - On little-endian targets, the channels are ordered from the least
    *   significant end to the most significant end.  The least significant
    *   bit of P is the least significant bit of the first channel.  The most
    *   significant bit of P is the most significant bit of the last channel.
    *
    * (This is the same as C bitfield layout on most ABIs.)
    *
    * This means that if some channels can be accessed as individual N-byte
    * values, the order of those channels in this array matches their order
    * in memory.  Each N-byte value has native endianness.
    *
    * If instead a group of channels is accessed as a single N-byte value,
    * the order of the channels within that value depends on endianness.
    * For big-endian targets, the first channel in the group will be
    * the most significant, otherwise it will be the least significant.
    *
    * For example, if X, Y, Z and W are all 8 bits, the memory order is:
    *
    *   0  1  2  3
    *   X  Y  Z  W
    *
    * regardless of endianness.  If instead the channels are 16 bits,
    * the memory order is:
    *
    *                  0  1  2  3  4  5  6  7
    *   little-endian: Xl Xu Yl Yu Zl Zu Wl Wu   (l = lower, u = upper)
    *   big-endian:    Xu Xl Yu Yl Zu Zl Wu Wl
    *
    * If X is 8 bits and Y is 24 bits, the memory order is:
    *
    *                  0  1  2  3
    *   little-endian: X  Yl Ym Yu   (l = lower, m = middle, u = upper)
    *   big-endian:    X  Yu Ym Yl
    *
    * If X is 5 bits, Y is 5 bits, Z is 5 bits and W is 1 bit, the layout is:
    *
    *                      0        1
    *                  msb  lsb msb  lsb
    *   little-endian: YYYXXXXX WZZZZZYY
    *   big-endian:    XXXXXYYY YYZZZZZW
    */

(Each version grows a new example :-))

Thanks,
Richard
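The 5/5/5/1 case from the proposed comment can be checked with a few lines of C. This is a sketch under the rules stated in the comment, not code from u_format.h; all three helper names are hypothetical.

```c
#include <stdint.h>

/* Little-endian rule: the first channel (X) sits at the least
 * significant end of the 16-bit pixel value P. */
static uint16_t
pack_x5y5z5w1_le(unsigned x, unsigned y, unsigned z, unsigned w)
{
   return (uint16_t) ((x & 0x1f) |
                      ((y & 0x1f) << 5) |
                      ((z & 0x1f) << 10) |
                      ((w & 0x1) << 15));
}

/* Big-endian rule: the first channel (X) sits at the most
 * significant end of P. */
static uint16_t
pack_x5y5z5w1_be(unsigned x, unsigned y, unsigned z, unsigned w)
{
   return (uint16_t) (((x & 0x1f) << 11) |
                      ((y & 0x1f) << 6) |
                      ((z & 0x1f) << 1) |
                      (w & 0x1));
}

/* Byte found at memory offset 0 or 1 when P is stored with the given
 * byte order (offset must be 0 or 1). */
static uint8_t
byte_in_memory(uint16_t p, unsigned offset, int big_endian)
{
   unsigned shift = (big_endian ? (1 - offset) : offset) * 8;
   return (uint8_t) (p >> shift);
}
```

Setting only Y to all ones gives bytes 0xe0, 0x03 in the little-endian case (YYY..... ......YY) and 0x07, 0xc0 in the big-endian case (.....YYY YY......), which is the split of Y across both bytes that the last diagram describes.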
[Mesa-dev] [PATCH 6/6] mesa, glsl, gallium: remove GLSLSkipStrictMaxVaryingLimitCheck and dependencies
Not needed with do_dead_builtin_varyings.
---
 src/gallium/drivers/freedreno/freedreno_screen.c |    1 -
 src/gallium/drivers/i915/i915_screen.c           |    1 -
 src/gallium/drivers/ilo/ilo_screen.c             |    1 -
 src/gallium/drivers/llvmpipe/lp_screen.c         |    1 -
 src/gallium/drivers/nv30/nv30_screen.c           |    1 -
 src/gallium/drivers/nv50/nv50_screen.c           |    1 -
 src/gallium/drivers/nvc0/nvc0_screen.c           |    1 -
 src/gallium/drivers/r300/r300_screen.c           |    1 -
 src/gallium/drivers/r600/r600_pipe.c             |    1 -
 src/gallium/drivers/radeonsi/radeonsi_pipe.c     |    1 -
 src/gallium/drivers/softpipe/sp_screen.c         |    1 -
 src/gallium/drivers/svga/svga_screen.c           |    1 -
 src/gallium/include/pipe/p_defines.h             |    1 -
 src/glsl/link_varyings.cpp                       |   32 ++
 src/mesa/main/mtypes.h                           |    5 ++--
 src/mesa/state_tracker/st_extensions.c           |    3 --
 16 files changed, 10 insertions(+), 43 deletions(-)

diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index f88fa08..ff45b3e 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -185,7 +185,6 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
    case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER:
    case PIPE_CAP_SCALED_RESOLVE:
-   case PIPE_CAP_TGSI_CAN_COMPACT_VARYINGS:
    case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
    case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
    case PIPE_CAP_VERTEX_COLOR_CLAMPED:
diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c
index 2d0cc78..3c751c5 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -204,7 +204,6 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap)
    case PIPE_CAP_MIXED_COLORBUFFER_FORMATS:
    case PIPE_CAP_CONDITIONAL_RENDER:
    case PIPE_CAP_TEXTURE_BARRIER:
-   case PIPE_CAP_TGSI_CAN_COMPACT_VARYINGS:
    case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
    case PIPE_CAP_VERTEX_COLOR_UNCLAMPED:
    case PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION:
diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c
index 9daf01e..7a4443e 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -372,7 +372,6 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap param)
          return is->dev.has_gen7_sol_reset;
       else
          return false; /* TODO */
-   case PIPE_CAP_TGSI_CAN_COMPACT_VARYINGS:
    case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
       return false;
    case PIPE_CAP_VERTEX_COLOR_UNCLAMPED:
diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c
index 562fb51..1fed537 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -192,7 +192,6 @@ llvmpipe_get_param(struct pipe_screen *screen, enum pipe_cap param)
       return 16*4;
    case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
       return 1;
-   case PIPE_CAP_TGSI_CAN_COMPACT_VARYINGS:
    case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
       return 0;
    case PIPE_CAP_VERTEX_COLOR_UNCLAMPED:
diff --git a/src/gallium/drivers/nv30/nv30_screen.c b/src/gallium/drivers/nv30/nv30_screen.c
index c9943e0..07ffc80 100644
--- a/src/gallium/drivers/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nv30/nv30_screen.c
@@ -109,7 +109,6 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_MAX_TEXEL_OFFSET:
    case PIPE_CAP_MAX_STREAM_OUTPUT_SEPARATE_COMPONENTS:
    case PIPE_CAP_MAX_STREAM_OUTPUT_INTERLEAVED_COMPONENTS:
-   case PIPE_CAP_TGSI_CAN_COMPACT_VARYINGS:
    case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
    case PIPE_CAP_TEXTURE_BARRIER:
    case PIPE_CAP_SEAMLESS_CUBE_MAP:
diff --git a/src/gallium/drivers/nv50/nv50_screen.c b/src/gallium/drivers/nv50/nv50_screen.c
index b6da303..5c57aa2 100644
--- a/src/gallium/drivers/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nv50/nv50_screen.c
@@ -165,7 +165,6 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION:
    case PIPE_CAP_START_INSTANCE:
       return 1;
-   case PIPE_CAP_TGSI_CAN_COMPACT_VARYINGS:
    case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
       return 0; /* state trackers will know better */
    case PIPE_CAP_USER_CONSTANT_BUFFERS:
diff --git a/src/gallium/drivers/nvc0/nvc0_screen.c b/src/gallium/drivers/nvc0/nvc0_screen.c
index 97ce82c..027fc11 100644
--- a/src/gallium/drivers/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nvc0/nvc0_screen.c
@@ -157,7 +157,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_QUAD
[Mesa-dev] [PATCH 5/6] st/mesa: disable EXT_separate_shader_objects
The extension disallows elimination of set-but-unused varyings.
---
 docs/relnotes/9.2.html                 |    3 +++
 src/mesa/state_tracker/st_extensions.c |    9 -
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/docs/relnotes/9.2.html b/docs/relnotes/9.2.html
index 0dcc960..99f6374 100644
--- a/docs/relnotes/9.2.html
+++ b/docs/relnotes/9.2.html
@@ -63,6 +63,9 @@ Note: some of the new features are only available with certain drivers.
 Removed d3d1x state tracker (unused, unmaintained and broken)
+GL_EXT_separate_shader_objects has been removed from all Gallium drivers,
+because it disallows critical GLSL shader optimizations.
+GL_ARB_separate_shader_objects doesn't have this issue.
diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 966722c..43111d6 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -559,7 +559,14 @@ void st_init_extensions(struct st_context *st)
    ctx->Extensions.EXT_point_parameters = GL_TRUE;
    ctx->Extensions.EXT_provoking_vertex = GL_TRUE;
    ctx->Extensions.EXT_secondary_color = GL_TRUE;
-   ctx->Extensions.EXT_separate_shader_objects = GL_TRUE;
+
+   /* IMPORTANT:
+    *    Don't enable EXT_separate_shader_objects. It disallows certain
+    *    optimizations in the GLSL compiler and therefore is considered
+    *    harmful.
+    */
+   ctx->Extensions.EXT_separate_shader_objects = GL_FALSE;
+
    ctx->Extensions.EXT_texture_env_dot3 = GL_TRUE;
    ctx->Extensions.EXT_vertex_array_bgra = GL_TRUE;

--
1.7.10.4
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] glsl/linker: eliminate unused and set-but-unused built-in varyings
This eliminates built-in varyings such as gl_Color, gl_SecondaryColor,
gl_TexCoord, and gl_FogFragCoord if they are unused by the next stage or
not written at all (e.g. gl_TexCoord elements).

The gl_TexCoord array is broken down into separate vec4s if needed.
---
 src/glsl/Makefile.sources              |    1 +
 src/glsl/ir_optimization.h             |    4 +
 src/glsl/link_varyings.h               |    4 +
 src/glsl/linker.cpp                    |   13 +-
 src/glsl/opt_dead_builtin_varyings.cpp |  468
 5 files changed, 488 insertions(+), 2 deletions(-)
 create mode 100644 src/glsl/opt_dead_builtin_varyings.cpp

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index 50bad85..cb17cf8 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -81,6 +81,7 @@ LIBGLSL_FILES = \
 	$(GLSL_SRCDIR)/opt_constant_variable.cpp \
 	$(GLSL_SRCDIR)/opt_copy_propagation.cpp \
 	$(GLSL_SRCDIR)/opt_copy_propagation_elements.cpp \
+	$(GLSL_SRCDIR)/opt_dead_builtin_varyings.cpp \
 	$(GLSL_SRCDIR)/opt_dead_code.cpp \
 	$(GLSL_SRCDIR)/opt_dead_code_local.cpp \
 	$(GLSL_SRCDIR)/opt_dead_functions.cpp \
diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index d38d5e3..fad6f1b 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -76,6 +76,10 @@ bool do_constant_variable_unlinked(exec_list *instructions);
 bool do_copy_propagation(exec_list *instructions);
 bool do_copy_propagation_elements(exec_list *instructions);
 bool do_constant_propagation(exec_list *instructions);
+void do_dead_builtin_varyings(struct gl_context *ctx,
+                              exec_list *producer, exec_list *consumer,
+                              unsigned num_tfeedback_decls,
+                              class tfeedback_decl *tfeedback_decls);
 bool do_dead_code(exec_list *instructions, bool uniform_locations_assigned);
 bool do_dead_code_local(exec_list *instructions);
 bool do_dead_code_unlinked(exec_list *instructions);
diff --git a/src/glsl/link_varyings.h b/src/glsl/link_varyings.h
index daa9d79..7f7be35 100644
--- a/src/glsl/link_varyings.h
+++ b/src/glsl/link_varyings.h
@@ -125,6 +125,10 @@ public:
       return this->vector_elements * this->matrix_columns * this->size;
    }

+   unsigned get_location() const {
+      return this->location;
+   }
+
 private:
    /**
     * The name that was supplied to glTransformFeedbackVaryings.  Used for
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index a8537cf..129b665 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -1887,6 +1887,9 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
             goto done;
       }

+      do_dead_builtin_varyings(ctx, sh->ir, NULL,
+                               num_tfeedback_decls, tfeedback_decls);
+
       demote_shader_inputs_and_outputs(sh, ir_var_shader_out);

       /* Eliminate code that is now dead due to unused outputs being demoted.
@@ -1895,11 +1898,13 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
          ;
    }
    else if (first == MESA_SHADER_FRAGMENT) {
-      /* If the program only contains a fragment shader, just demote
-       * user-defined varyings.
+      /* If the program only contains a fragment shader...
        */
       gl_shader *const sh = prog->_LinkedShaders[first];

+      do_dead_builtin_varyings(ctx, NULL, sh->ir,
+                               num_tfeedback_decls, tfeedback_decls);
+
       demote_shader_inputs_and_outputs(sh, ir_var_shader_in);

       while (do_dead_code(sh->ir, false))
@@ -1919,6 +1924,10 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
                 tfeedback_decls))
          goto done;

+      do_dead_builtin_varyings(ctx, sh_i->ir, sh_next->ir,
+                               next == MESA_SHADER_FRAGMENT ? num_tfeedback_decls : 0,
+                               tfeedback_decls);
+
       demote_shader_inputs_and_outputs(sh_i, ir_var_shader_out);
       demote_shader_inputs_and_outputs(sh_next, ir_var_shader_in);

diff --git a/src/glsl/opt_dead_builtin_varyings.cpp b/src/glsl/opt_dead_builtin_varyings.cpp
new file mode 100644
index 000..eb99d1e
--- /dev/null
+++ b/src/glsl/opt_dead_builtin_varyings.cpp
@@ -0,0 +1,468 @@
+/*
+ * Copyright © 2013 Marek Olšák
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Softwar
[Mesa-dev] [PATCH 3/6] glsl/linker: check against varying limit after unused varyings are eliminated
We counted even the varyings which were later eliminated, which was
suboptimal.
---
 src/glsl/link_varyings.cpp |   35 ---
 src/glsl/link_varyings.h   |    5 +
 src/glsl/linker.cpp        |    4
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 34e3440..25f27f0 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -,16 +,12 @@ assign_varying_locations(struct gl_context *ctx,
       }
    }

-   unsigned varying_vectors = 0;
-
    if (consumer) {
       foreach_list(node, consumer->ir) {
          ir_variable *const var = ((ir_instruction *) node)->as_variable();

-         if ((var == NULL) || (var->mode != ir_var_shader_in))
-            continue;
-
-         if (var->is_unmatched_generic_inout) {
+         if (var && var->mode == ir_var_shader_in &&
+             var->is_unmatched_generic_inout) {
             if (prog->Version <= 120) {
                /* On page 25 (page 31 of the PDF) of the GLSL 1.20 spec:
                 *
@@ -1143,15 +1139,32 @@ assign_varying_locations(struct gl_context *ctx,
              * value is written by the previous stage.
              */
             var->mode = ir_var_auto;
-         } else if (is_varying_var(consumer->Type, var)) {
-            /* The packing rules are used for vertex shader inputs are also
-             * used for fragment shader inputs.
-             */
-            varying_vectors += count_attribute_slots(var->type);
          }
       }
    }

+   return true;
+}
+
+bool
+check_against_varying_limit(struct gl_context *ctx,
+                            struct gl_shader_program *prog,
+                            gl_shader *consumer)
+{
+   unsigned varying_vectors = 0;
+
+   foreach_list(node, consumer->ir) {
+      ir_variable *const var = ((ir_instruction *) node)->as_variable();
+
+      if (var && var->mode == ir_var_shader_in &&
+          is_varying_var(consumer->Type, var)) {
+         /* The packing rules used for vertex shader inputs are also
+          * used for fragment shader inputs.
+          */
+         varying_vectors += count_attribute_slots(var->type);
+      }
+   }
+
    if (ctx->API == API_OPENGLES2 || prog->IsES) {
       if (varying_vectors > ctx->Const.MaxVarying) {
          if (ctx->Const.GLSLSkipStrictMaxVaryingLimitCheck) {
diff --git a/src/glsl/link_varyings.h b/src/glsl/link_varyings.h
index ee1010a..daa9d79 100644
--- a/src/glsl/link_varyings.h
+++ b/src/glsl/link_varyings.h
@@ -232,4 +232,9 @@ assign_varying_locations(struct gl_context *ctx,
                          unsigned num_tfeedback_decls,
                          tfeedback_decl *tfeedback_decls);

+bool
+check_against_varying_limit(struct gl_context *ctx,
+                            struct gl_shader_program *prog,
+                            gl_shader *consumer);
+
 #endif /* GLSL_LINK_VARYINGS_H */
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 9ef9cc7..a8537cf 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -1929,6 +1929,10 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
       while (do_dead_code(sh_next->ir, false))
          ;

+      /* This must be done after all dead varyings are eliminated. */
+      if (!check_against_varying_limit(ctx, prog, sh_next))
+         goto done;
+
       next = i;
    }

--
1.7.10.4
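To make the ordering issue concrete, here is a small standalone sketch (not Mesa code) of why the limit check belongs after elimination. InputVar, count_slots and within_varying_limit are hypothetical names invented for this illustration; the only idea taken from the patch is that slots of eliminated inputs should not count against the limit.

```cpp
#include <vector>

// Hypothetical stand-in for one input variable of the consumer stage.
struct InputVar {
    unsigned slots;   // number of vec4 slots the variable occupies
    bool eliminated;  // true once dead-varying elimination has demoted it
};

// Sums the slots of all inputs; eliminated ones are counted only on request.
unsigned count_slots(const std::vector<InputVar> &inputs, bool include_eliminated) {
    unsigned total = 0;
    for (const InputVar &v : inputs) {
        if (include_eliminated || !v.eliminated)
            total += v.slots;
    }
    return total;
}

// The check as it behaves after the patch: only surviving inputs count.
bool within_varying_limit(const std::vector<InputVar> &inputs, unsigned max_slots) {
    return count_slots(inputs, false) <= max_slots;
}
```

With inputs of 4, 6 (eliminated) and 2 slots and a limit of 8, counting everything gives 12 and would reject the program, while counting only the survivors gives 6 and accepts it.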
[Mesa-dev] [PATCH 2/6] glsl/linker: link shaders in the opposite order (from fragment to vertex)
This ensures that inter-shader outputs and inputs are properly eliminated
across 3 or more shader stages. The behavior is unchanged with 2 or fewer
shader stages.

For example, elimination of unused FS inputs causes elimination of matching
GS outputs, which causes elimination of the GS inputs that were needed for
evaluation of the eliminated GS outputs, which causes elimination of matching
VS outputs. An unused FS input is all that's needed to trigger this chain
reaction.

(It was too late when I realized this hadn't been needed with only 2 shader
stages. This can be considered a cleanup for now and hopefully geometry
shaders are close to completion.)
---
 src/glsl/linker.cpp |  108 +++
 1 file changed, 58 insertions(+), 50 deletions(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 0f167e6..9ef9cc7 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -1836,9 +1836,9 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
          goto done;
    }

-   unsigned prev;
-   for (prev = 0; prev < MESA_SHADER_TYPES; prev++) {
-      if (prog->_LinkedShaders[prev] != NULL)
+   unsigned first;
+   for (first = 0; first < MESA_SHADER_TYPES; first++) {
+      if (prog->_LinkedShaders[first] != NULL)
          break;
    }

@@ -1850,7 +1850,7 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
        * non-zero, but the program object has no vertex or geometry
        * shader;
        */
-      if (prev >= MESA_SHADER_FRAGMENT) {
+      if (first >= MESA_SHADER_FRAGMENT) {
          linker_error(prog, "Transform feedback varyings specified, but "
                       "no vertex or geometry shader is present.");
          goto done;
@@ -1864,69 +1864,77 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
          goto done;
    }

-   for (unsigned i = prev + 1; i < MESA_SHADER_TYPES; i++) {
-      if (prog->_LinkedShaders[i] == NULL)
-         continue;
-
-      if (!assign_varying_locations(
-             ctx, mem_ctx, prog, prog->_LinkedShaders[prev], prog->_LinkedShaders[i],
-             i == MESA_SHADER_FRAGMENT ? num_tfeedback_decls : 0,
-             tfeedback_decls))
-         goto done;
-
-      prev = i;
+   /* Linking the stages in the opposite order (from fragment to vertex)
+    * ensures that inter-shader outputs written to in an earlier stage are
+    * eliminated if they are (transitively) not used in a later stage.
+    */
+   int last, next;
+   for (last = MESA_SHADER_TYPES-1; last >= 0; last--) {
+      if (prog->_LinkedShaders[last] != NULL)
+         break;
    }

-   if (prev != MESA_SHADER_FRAGMENT && num_tfeedback_decls != 0) {
-      /* There was no fragment shader, but we still have to assign varying
-       * locations for use by transform feedback.
-       */
-      if (!assign_varying_locations(
-             ctx, mem_ctx, prog, prog->_LinkedShaders[prev], NULL, num_tfeedback_decls,
-             tfeedback_decls))
-         goto done;
-   }
+   if (last >= 0 && last < MESA_SHADER_FRAGMENT) {
+      gl_shader *const sh = prog->_LinkedShaders[last];

-   if (!store_tfeedback_info(ctx, prog, num_tfeedback_decls, tfeedback_decls))
-      goto done;
+      if (num_tfeedback_decls != 0) {
+         /* There was no fragment shader, but we still have to assign varying
+          * locations for use by transform feedback.
+          */
+         if (!assign_varying_locations(ctx, mem_ctx, prog,
+                                       sh, NULL,
+                                       num_tfeedback_decls, tfeedback_decls))
+            goto done;
+      }

-   if (prog->_LinkedShaders[MESA_SHADER_VERTEX] != NULL) {
-      demote_shader_inputs_and_outputs(prog->_LinkedShaders[MESA_SHADER_VERTEX],
-                                       ir_var_shader_out);
+      demote_shader_inputs_and_outputs(sh, ir_var_shader_out);

-      /* Eliminate code that is now dead due to unused vertex outputs being
-       * demoted.
+      /* Eliminate code that is now dead due to unused outputs being demoted.
        */
-      while (do_dead_code(prog->_LinkedShaders[MESA_SHADER_VERTEX]->ir, false))
-         ;
+      while (do_dead_code(sh->ir, false))
+         ;
    }
-
-   if (prog->_LinkedShaders[MESA_SHADER_GEOMETRY] != NULL) {
-      gl_shader *const sh = prog->_LinkedShaders[MESA_SHADER_GEOMETRY];
+   else if (first == MESA_SHADER_FRAGMENT) {
+      /* If the program only contains a fragment shader, just demote
+       * user-defined varyings.
+       */
+      gl_shader *const sh = prog->_LinkedShaders[first];

       demote_shader_inputs_and_outputs(sh, ir_var_shader_in);
-      demote_shader_inputs_and_outputs(sh, ir_var_shader_out);

-      /* Eliminate code that is now dead due to unused geometry outputs being
-       * demoted.
-       */
-      while (do_dead_code(prog->_LinkedShaders[MESA_SHADER_GEOMETRY]->ir,
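The chain reaction described in the commit message can be modelled with a toy liveness walk from the last stage back to the first. This is only a sketch under simplifying assumptions: Stage and live_outputs are invented names, each output simply lists the stage inputs it is computed from, and inputs are matched to the previous stage's outputs by name. The real linker works on GLSL IR and relies on demote_shader_inputs_and_outputs() plus do_dead_code() instead.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of a producer stage: output name -> inputs it is computed from.
struct Stage {
    std::map<std::string, std::set<std::string>> outputs;
};

// `stages` holds the producers in pipeline order (e.g. VS, then GS);
// `needed` holds the inputs actually read by the final consumer (e.g. the FS).
// Returns, per producer stage, the outputs that survive elimination.
std::vector<std::set<std::string>>
live_outputs(const std::vector<Stage> &stages, std::set<std::string> needed)
{
    std::vector<std::set<std::string>> live(stages.size());

    // Walk from the last producer to the first, as the patch does.
    for (std::size_t i = stages.size(); i-- > 0; ) {
        std::set<std::string> needed_inputs;
        for (const auto &out : stages[i].outputs) {
            if (needed.count(out.first) == 0)
                continue;  // nobody downstream reads it, so it is dead
            live[i].insert(out.first);
            needed_inputs.insert(out.second.begin(), out.second.end());
        }
        // Only inputs feeding live outputs are requested from the previous stage.
        needed = needed_inputs;
    }
    return live;
}
```

For a VS writing a and b, a GS computing x from a and y from b, and an FS reading only x, the walk keeps x in the GS and a in the VS; b dies even though the VS never sees the FS directly.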
[Mesa-dev] [PATCH 1/6] mesa, gallium: renumber shader indices according to their placement in pipeline
See my explanation in mtypes.h.
---
 src/gallium/include/pipe/p_defines.h       |    7 ---
 src/glsl/linker.cpp                        |   16
 src/mesa/drivers/dri/i965/brw_shader.cpp   |    8 ++--
 src/mesa/main/mtypes.h                     |    8 ++--
 src/mesa/main/shaderobj.h                  |    4 ++--
 src/mesa/main/uniform_query.cpp            |    2 +-
 src/mesa/program/ir_to_mesa.cpp            |   10 +++---
 src/mesa/program/program.h                 |    2 +-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp |   10 +++---
 9 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
index 8af1a84..216cd2f 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -352,11 +352,12 @@ enum pipe_flush_flags {

 /**
- * Shaders
+ * Shaders.
+ * These must have the same values as Mesa's MESA_SHADER_*.
  */
 #define PIPE_SHADER_VERTEX   0
-#define PIPE_SHADER_FRAGMENT 1
-#define PIPE_SHADER_GEOMETRY 2
+#define PIPE_SHADER_GEOMETRY 1
+#define PIPE_SHADER_FRAGMENT 2
 #define PIPE_SHADER_COMPUTE  3
 #define PIPE_SHADER_TYPES    4

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index cd8d680..0f167e6 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -1514,31 +1514,31 @@ static bool
 check_resources(struct gl_context *ctx, struct gl_shader_program *prog)
 {
    static const char *const shader_names[MESA_SHADER_TYPES] = {
-      "vertex", "fragment", "geometry"
+      "vertex", "geometry", "fragment"
    };

    const unsigned max_samplers[MESA_SHADER_TYPES] = {
       ctx->Const.VertexProgram.MaxTextureImageUnits,
-      ctx->Const.FragmentProgram.MaxTextureImageUnits,
-      ctx->Const.GeometryProgram.MaxTextureImageUnits
+      ctx->Const.GeometryProgram.MaxTextureImageUnits,
+      ctx->Const.FragmentProgram.MaxTextureImageUnits
    };

    const unsigned max_default_uniform_components[MESA_SHADER_TYPES] = {
       ctx->Const.VertexProgram.MaxUniformComponents,
-      ctx->Const.FragmentProgram.MaxUniformComponents,
-      ctx->Const.GeometryProgram.MaxUniformComponents
+      ctx->Const.GeometryProgram.MaxUniformComponents,
+      ctx->Const.FragmentProgram.MaxUniformComponents
    };

    const unsigned max_combined_uniform_components[MESA_SHADER_TYPES] = {
       ctx->Const.VertexProgram.MaxCombinedUniformComponents,
-      ctx->Const.FragmentProgram.MaxCombinedUniformComponents,
-      ctx->Const.GeometryProgram.MaxCombinedUniformComponents
+      ctx->Const.GeometryProgram.MaxCombinedUniformComponents,
+      ctx->Const.FragmentProgram.MaxCombinedUniformComponents
    };

    const unsigned max_uniform_blocks[MESA_SHADER_TYPES] = {
       ctx->Const.VertexProgram.MaxUniformBlocks,
-      ctx->Const.FragmentProgram.MaxUniformBlocks,
       ctx->Const.GeometryProgram.MaxUniformBlocks,
+      ctx->Const.FragmentProgram.MaxUniformBlocks
    };

    for (unsigned i = 0; i < MESA_SHADER_TYPES; i++) {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 03e4329..5c8f449 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -119,17 +119,13 @@ brw_link_shader(struct gl_context *ctx, struct gl_shader_program *shProg)
    for (stage = 0; stage < ARRAY_SIZE(shProg->_LinkedShaders); stage++) {
       struct brw_shader *shader =
          (struct brw_shader *)shProg->_LinkedShaders[stage];
-      static const GLenum targets[] = {
-         GL_VERTEX_PROGRAM_ARB,
-         GL_FRAGMENT_PROGRAM_ARB,
-         GL_GEOMETRY_PROGRAM_NV
-      };

       if (!shader)
          continue;

       struct gl_program *prog =
-         ctx->Driver.NewProgram(ctx, targets[stage], shader->base.Name);
+         ctx->Driver.NewProgram(ctx, _mesa_program_index_to_target(stage),
+                                shader->base.Name);
       if (!prog)
          return false;
       prog->Parameters = _mesa_new_parameter_list();
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index cd8650c..750e333 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2175,12 +2175,16 @@ struct gl_shader
 /**
  * Shader stages. Note that these will become 5 with tessellation.
  * These MUST have the same values as gallium's PIPE_SHADER_*
+ *
+ * The order must match how shaders are ordered in the pipeline.
+ * The GLSL linker assumes that if i= MESA_SHADER_TYPES) return 0;
diff --git a/src/mesa/main/uniform_query.cpp b/src/mesa/main/uniform_query.cpp
index 296f80f..be2f0e4 100644
--- a/src/mesa/main/uniform_query.cpp
+++ b/src/mesa/main/uniform_query.cpp
@@ -435,7 +435,7 @@ static void
 log_program_parameters(const struct gl_shader_program *shProg)
 {
    static const char *stages[] = {
-      "vertex", "fragment", "geometry"
+      "vertex", "geometry", "fragment"
    };

    assert(Elements(stages) == MESA_SHADER_TYPES);
diff --git a/src/mesa/prog
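As a small illustration of what the renumbering buys, here is a standalone sketch (not taken from the patch) in which stage indices follow pipeline order, so "which stage runs later" becomes an integer comparison and lookup tables only need to follow the same order. ShaderStage, stage_names, runs_after and next_present_stage are hypothetical names; only the vertex = 0, geometry = 1, fragment = 2 ordering comes from the diff above.

```cpp
#include <cassert>
#include <cstddef>

// Stage indices in pipeline order, mirroring the renumbering in the patch.
enum ShaderStage {
    STAGE_VERTEX = 0,
    STAGE_GEOMETRY = 1,
    STAGE_FRAGMENT = 2,
    STAGE_COUNT = 3
};

// Any table indexed by stage has to list its entries in the same order.
static const char *const stage_names[] = { "vertex", "geometry", "fragment" };
static_assert(sizeof(stage_names) / sizeof(stage_names[0]) ==
              static_cast<std::size_t>(STAGE_COUNT),
              "one name per stage");

// With pipeline-ordered indices, "a runs after b" is a plain comparison.
inline bool runs_after(ShaderStage a, ShaderStage b) { return a > b; }

// Index of the next present stage after `stage`, or -1 if there is none.
int next_present_stage(const bool present[STAGE_COUNT], int stage) {
    assert(stage >= 0 && stage < STAGE_COUNT);
    for (int i = stage + 1; i < STAGE_COUNT; ++i) {
        if (present[i])
            return i;
    }
    return -1;
}
```

With the old numbering (fragment = 1, geometry = 2), a loop like next_present_stage would have visited the fragment shader before the geometry shader, which is the kind of special-casing the renumbering avoids.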
[Mesa-dev] [PATCH 0/6] Eliminating unused built-in varyings
Hi everyone,

this series adds a new GLSL compiler optimization pass which eliminates
unused and set-but-unused built-in varyings and adds a few improvements to
the GLSL linker in the process.

Before I show you how it works, I wanna say that there are patches which are
related to and will most probably conflict with the geometry shader work, but
they are necessary because the linkage of varyings is largely suboptimal.

Also, the GL_EXT_separate_shader_objects extension must be disabled for this
optimization to be enabled. The reason is that a program object with both a
VS and FS can be bound partially, e.g. by
glUseShaderProgramEXT(GL_VERTEX_SHADER, prog), so the extension makes every
program object be just a set of "separate shaders". The extension is not
important anyway.

Now, to illustrate how the optimization works, consider these 2 shader IR
dumps:

Vertex shader (8 varyings):

  ...
  (declare (shader_out ) vec4 gl_FrontColor)
  (declare (shader_out ) vec4 gl_FrontSecondaryColor)
  (declare (shader_out ) (array vec4 6) gl_TexCoord)
  (function main
    (signature void
      (parameters
      )
      (
        ...
        (assign (xyzw) (var_ref gl_FrontColor) (var_ref gl_Color) )
        (assign (xyzw) (var_ref gl_FrontSecondaryColor) (var_ref gl_SecondaryColor) )
        (assign (xyzw) (array_ref (var_ref gl_TexCoord) (constant int (1)) ) (var_ref gl_MultiTexCoord1) )
        (assign (xyzw) (array_ref (var_ref gl_TexCoord) (constant int (4)) ) (var_ref gl_MultiTexCoord4) )
        (assign (xyzw) (array_ref (var_ref gl_TexCoord) (constant int (5)) ) (var_ref gl_MultiTexCoord5) )
      ))
  )

Fragment shader (6 varyings):

  ...
  (declare (shader_in ) vec4 gl_SecondaryColor)
  (declare (shader_in ) (array vec4 5) gl_TexCoord)
  (function main
    (signature void
      (parameters
      )
      (
        (declare () vec4 r)
        (assign (xyzw) (var_ref r) ... (var_ref gl_SecondaryColor) ) )
        (assign (xyzw) (var_ref r) ... (array_ref (var_ref gl_TexCoord) (constant int (1)) ) ) ) )
        (assign (xyzw) (var_ref r) ... (array_ref (var_ref gl_TexCoord) (constant int (2)) ) ) ) )
        (assign (xyzw) (var_ref r) ... (array_ref (var_ref gl_TexCoord) (constant int (3)) ) ) ) )
        (declare (temporary ) vec4 assignment_tmp)
        (assign (xyzw) (var_ref assignment_tmp) ... (array_ref (var_ref gl_TexCoord) (constant int (4)) ) ) ) )
        ...
      ))
  )

Note that only gl_TexCoord[1], gl_TexCoord[4], and gl_SecondaryColor are used
by both shaders. The optimization replaces all occurrences of varyings which
are unused by the other stage with temporary variables. It also breaks down
the gl_TexCoord array into separate vec4 variables if needed. Here's the
result:

Vertex shader (3 varyings instead of 8):

  ...
  (declare (shader_out ) vec4 gl_out_TexCoord1)
  (declare (shader_out ) vec4 gl_out_TexCoord4)
  (declare (temporary ) vec4 gl_out_TexCoord5_dummy)
  (declare (temporary ) vec4 gl_out_FrontColor0_dummy)
  (declare (shader_out ) vec4 gl_FrontSecondaryColor)
  (function main
    (signature void
      (parameters
      )
      (
        ...
        (assign (xyzw) (var_ref gl_out_FrontColor0_dummy) (var_ref gl_Color) )
        (assign (xyzw) (var_ref gl_FrontSecondaryColor) (var_ref gl_SecondaryColor) )
        (assign (xyzw) (var_ref gl_out_TexCoord1) (var_ref gl_MultiTexCoord1) )
        (assign (xyzw) (var_ref gl_out_TexCoord4) (var_ref gl_MultiTexCoord4) )
        (assign (xyzw) (var_ref gl_out_TexCoord5_dummy) (var_ref gl_MultiTexCoord5) )
      ))
  )

Fragment shader (3 varyings instead of 6):

  ...
  (declare (shader_in ) vec4 gl_in_TexCoord1)
  (declare (temporary ) vec4 gl_in_TexCoord2_dummy)
  (declare (temporary ) vec4 gl_in_TexCoord3_dummy)
  (declare (shader_in ) vec4 gl_in_TexCoord4)
  (declare (shader_in ) vec4 gl_SecondaryColor)
  (function main
    (signature void
      (parameters
      )
      (
        (declare () vec4 r)
        (assign (xyzw) (var_ref r) ... (var_ref gl_SecondaryColor) ) )
        (assign (xyzw) (var_ref r) ... (var_ref gl_in_TexCoord1) ) ) )
        (assign (xyzw) (var_ref r) ... (var_ref gl_in_TexCoord2_dummy) ) ) )
        (assign (xyzw) (var_ref r) ... (var_ref gl_in_TexCoord3_dummy) ) ) )
        (declare (temporary ) vec4 assignment_tmp)
        (assign (xyzw) (var_ref assignment_tmp) ... (var_ref gl_in_TexCoord4) ) ) )
        ...
      ))
  )

The locations of varyings remain the same.

That's all. Please review.

Marek
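The classification in the example above can be reproduced with plain set arithmetic on "written by the producer" and "read by the consumer". The following is only a sketch of that idea, not the code in opt_dead_builtin_varyings.cpp: the bit layout, texcoord_bit, VaryingUsage, classify and count_bits are all invented for this illustration, and transform feedback is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit layout for this sketch only: bits 0..7 stand for
// gl_TexCoord[0..7], bit 8 for the primary color, bit 9 for the secondary color.
inline uint32_t texcoord_bit(unsigned i) {
    assert(i < 8);
    return 1u << i;
}
const uint32_t COLOR_BIT = 1u << 8;
const uint32_t SECONDARY_COLOR_BIT = 1u << 9;

struct VaryingUsage {
    uint32_t kept;            // written and read: stays a real varying
    uint32_t producer_dummy;  // written but never read: becomes a temporary
    uint32_t consumer_dummy;  // read but never written: becomes a temporary
};

// Splits the built-in varyings into the three groups shown in the IR dumps.
VaryingUsage classify(uint32_t producer_writes, uint32_t consumer_reads) {
    VaryingUsage u;
    u.kept = producer_writes & consumer_reads;
    u.producer_dummy = producer_writes & ~consumer_reads;
    u.consumer_dummy = consumer_reads & ~producer_writes;
    return u;
}

// Counts the set bits, i.e. the number of vec4 varyings in a group.
unsigned count_bits(uint32_t mask) {
    unsigned n = 0;
    for (; mask != 0; mask &= mask - 1)
        ++n;
    return n;
}
```

Feeding in the example (VS writes the color, the secondary color and texcoords 1, 4, 5; FS reads the secondary color and texcoords 1 to 4) leaves three kept varyings, with the color and texcoord 5 as producer-side dummies and texcoords 2 and 3 as consumer-side dummies, matching the dumps.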
Re: [Mesa-dev] [PATCH] st/mesa: fix temp texture bindings in st_CopyPixels()
Reviewed-by: Marek Olšák

Marek

On Thu, Jun 13, 2013 at 11:11 AM, Chia-I Wu wrote:
> The temporary texture should have either PIPE_BIND_RENDER_TARGET or
> PIPE_BIND_DEPTH_STENCIL set in addition to PIPE_BIND_SAMPLER_VIEW.
>
> Signed-off-by: Chia-I Wu
[Mesa-dev] [PATCH] st/mesa: fix temp texture bindings in st_CopyPixels()
The temporary texture should have either PIPE_BIND_RENDER_TARGET or
PIPE_BIND_DEPTH_STENCIL set in addition to PIPE_BIND_SAMPLER_VIEW.

Signed-off-by: Chia-I Wu
---
 src/mesa/state_tracker/st_cb_drawpixels.c |   30 +
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c
index 1c26315..0200a62 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -460,12 +460,12 @@ internal_format(struct gl_context *ctx, GLenum format, GLenum type)
  */
 static struct pipe_resource *
 alloc_texture(struct st_context *st, GLsizei width, GLsizei height,
-              enum pipe_format texFormat)
+              enum pipe_format texFormat, unsigned bind)
 {
    struct pipe_resource *pt;

    pt = st_texture_create(st, st->internal_target, texFormat, 0,
-                          width, height, 1, 1, 0, PIPE_BIND_SAMPLER_VIEW);
+                          width, height, 1, 1, 0, bind);

    return pt;
 }
@@ -515,7 +515,7 @@ make_texture(struct st_context *st,
       return NULL;

    /* alloc temporary texture */
-   pt = alloc_texture(st, width, height, pipeFormat);
+   pt = alloc_texture(st, width, height, pipeFormat, PIPE_BIND_SAMPLER_VIEW);
    if (!pt) {
       _mesa_unmap_pbo_source(ctx, unpack);
       return NULL;
@@ -1475,6 +1475,7 @@ st_CopyPixels(struct gl_context *ctx, GLint srcx, GLint srcy,
    int num_sampler_view = 1;
    GLfloat *color;
    enum pipe_format srcFormat;
+   unsigned srcBind;
    GLboolean invertTex = GL_FALSE;
    GLint readX, readY, readW, readH;
    struct gl_pixelstore_attrib pack = ctx->DefaultPacking;
@@ -1540,16 +1541,15 @@ st_CopyPixels(struct gl_context *ctx, GLint srcx, GLint srcy,

    /* Choose the format for the temporary texture. */
    srcFormat = rbRead->texture->format;
+   srcBind = PIPE_BIND_SAMPLER_VIEW |
+      (type == GL_COLOR ? PIPE_BIND_RENDER_TARGET : PIPE_BIND_DEPTH_STENCIL);

    if (!screen->is_format_supported(screen, srcFormat, st->internal_target, 0,
-                                    PIPE_BIND_SAMPLER_VIEW |
-                                    (type == GL_COLOR ? PIPE_BIND_RENDER_TARGET
-                                     : PIPE_BIND_DEPTH_STENCIL))) {
+                                    srcBind)) {
       if (type == GL_DEPTH) {
          srcFormat = st_choose_format(st, GL_DEPTH_COMPONENT, GL_NONE,
                                       GL_NONE, st->internal_target, 0,
-                                      PIPE_BIND_SAMPLER_VIEW |
-                                      PIPE_BIND_DEPTH_STENCIL, FALSE);
+                                      srcBind, FALSE);
       }
       else {
          assert(type == GL_COLOR);
@@ -1557,26 +1557,22 @@ st_CopyPixels(struct gl_context *ctx, GLint srcx, GLint srcy,
          if (util_format_is_float(srcFormat)) {
             srcFormat = st_choose_format(st, GL_RGBA32F, GL_NONE,
                                          GL_NONE, st->internal_target, 0,
-                                         PIPE_BIND_SAMPLER_VIEW |
-                                         PIPE_BIND_RENDER_TARGET, FALSE);
+                                         srcBind, FALSE);
          }
          else if (util_format_is_pure_sint(srcFormat)) {
             srcFormat = st_choose_format(st, GL_RGBA32I, GL_NONE,
                                          GL_NONE, st->internal_target, 0,
-                                         PIPE_BIND_SAMPLER_VIEW |
-                                         PIPE_BIND_RENDER_TARGET, FALSE);
+                                         srcBind, FALSE);
          }
          else if (util_format_is_pure_uint(srcFormat)) {
             srcFormat = st_choose_format(st, GL_RGBA32UI, GL_NONE,
                                          GL_NONE, st->internal_target, 0,
-                                         PIPE_BIND_SAMPLER_VIEW |
-                                         PIPE_BIND_RENDER_TARGET, FALSE);
+                                         srcBind, FALSE);
          }
          else {
             srcFormat = st_choose_format(st, GL_RGBA, GL_NONE,
                                          GL_NONE, st->internal_target, 0,
-                                         PIPE_BIND_SAMPLER_VIEW |
-                                         PIPE_BIND_RENDER_TARGET, FALSE);
+                                         srcBind, FALSE);
          }
       }

@@ -1615,7 +1611,7 @@ st_CopyPixels(struct gl_context *ctx, GLint srcx, GLint srcy,
    readH = MAX2(0, readH);

    /* Allocate the temporary texture. */
-   pt = alloc_texture(st, width, height, srcFormat);
+   pt = alloc_texture(st, width, height, srcFormat, srcBind);
    if (!pt)
       return;

--
1.7.10.4
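The gist of the fix is that the bind flags are computed once and then used both for the format query and for the allocation, so the two can no longer disagree. A minimal standalone sketch of that computation follows; BindFlag, CopyType and temp_texture_bind are stand-in names for this illustration, and the flag values are placeholders rather than the real PIPE_BIND_* constants from p_defines.h.

```cpp
// Placeholder flag values for this sketch; not the real PIPE_BIND_* values.
enum BindFlag : unsigned {
    BIND_DEPTH_STENCIL = 1u << 0,
    BIND_RENDER_TARGET = 1u << 1,
    BIND_SAMPLER_VIEW  = 1u << 2
};

// Stand-in for the GL_COLOR / GL_DEPTH copy type handled by st_CopyPixels().
enum class CopyType { Color, Depth };

// The temporary texture is always sampled from; in addition it is written
// either as a color render target or as a depth/stencil surface.
unsigned temp_texture_bind(CopyType type) {
    return BIND_SAMPLER_VIEW |
           (type == CopyType::Color ? BIND_RENDER_TARGET : BIND_DEPTH_STENCIL);
}
```

A caller would pass the same value to both the "is this format supported" query and the texture allocation, which is what the srcBind variable does in the patch.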