Re: [Mesa-dev] [PATCH 00/59] Initial arb_gpu_shader_fp64 support to the i965 scalar backend
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On 04/05/16 21:28, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez writes: > > > > On 03/05/16 20:59, Kenneth Graunke wrote: Other than patches 37, 56, and ones you agreed to drop, the series is: Reviewed-by: Kenneth Graunke I think you can go ahead and land all except those, and we can land new solutions for those problems afterwards. We still need to fix the horiz_offset problem, and I think Curro's subscript() helper is a great way to do that. I would be fine with fixing that after this series lands - it would probably mean less rebasing :) > > We prepared a patch series without patches 11, 22 (both were > dropped), 37, 40 (it has a NACK from Curro) and 56. Our idea was to > push the patch series once we check there is no piglit > regressions. > > We got 0 piglit regressions on ILK, SNB, IVB, HSW and BDW, the > latter with the extension enabled with the environment variable. > > Sadly, on SKL we had regressions (including GPU hangs) only if we > *enable* the arb_gpu_shader_fp64 extension. Those regressions are > fixed by patch 40 but we need still to investigate why they are > happening. > >> That sounds like PATCH 40 was most likely working around >> regs_read() being interpreted with incorrect units in some other >> place. The use of regs_read() in PATCH 36 is obviously wrong >> which might explain the problem. Something along the lines of >> 'inst->components_read(i) * type_sz(inst->src[i]) / 4' would do >> what you want -- With that changed PATCH 36 is: > >> Reviewed-by: Francisco Jerez This change fixes the SKL regressions. Thanks Curro! Sam > > For that reason, we have not pushed anything yet. Tomorrow we will > investigate why we have those regressions only on SKL and which > tests are affected. > >> No worries, you still have a couple of weeks until the merge >> deadline, I don't see the need to rush in any changes that still >> have outstanding comments. > > JFTR, if we disable the extension on SKL (i.e. we don't set the > environment variable), we have 0 piglit regressions on that > generation. > > Thanks, > > Sam > > >> ___ mesa-dev mailing >> list mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev -BEGIN PGP SIGNATURE- Version: GnuPG v2 iQIcBAEBCAAGBQJXKuxFAAoJEH/0ujLxfcNDtx4QAMvFo5qXCyK9hoigUV38J8nL tTtsveHH/aIQzhFzhPwU8q2ABoIuGfhFJ0vuusFKVmWGXVt93/0S+QMLToGTXKQS 2m5ySc+w+K8bO7PD0/S4VDRcEck2MKM7rNP+ErdTM1F8elB+vqasgfIRVMlGUf6S pjT4wuiRamL5boIGBhLdch8eVZjap2t07DZ9TkqqPltkhHmtMeBJyACzLKNMH44w 2QD0J0OzJ268oRr6PPlVHisk/LbtoQSsCmtQciNwg1u5LPxF72pNPn3xpnPtZ3R2 PWCF14KedgxHfswes4qGqJ7hWm9B0Zd6PHSPXfzdqkIu4Nq8n4uJw2EAim092QVm OWyjhSGgUthoWwBJCf2yWqtufTp2kdwaTia+l1oRTNOzGnXRurTxu3CNrLJ42Cju 8JwXkKaNjkk47dM85vFftBzQh82lYfAuKceIOgv37BqrvYPbinYM53/hPYaRnQxW EViDfDTWrNBdTi5OxTP3HZ87w3LQQEngxK7kW/N8bLalKlCEMVQ2PPu9Nmg+TKHm YlO8z+lWoPgda2dEzJjoIu6J9GOX1WCKKrwHtwbPxBTrt/PtDlj4TiuD+nsA34Pk RXgOJHLGFBoxgOcIZBrFlwdkVv8avE80nH3fj6BMlERhloZv+KF6zSv0tx02lRq1 J5GVIh0plzqZDF+uVSLJ =G05c -END PGP SIGNATURE- ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa/ubo: add missing compute cases for ubo/atomic buffers
From: Dave Airlie This fixes: GL43-CTS.compute_shader.resource-ubo Signed-off-by: Dave Airlie --- src/mesa/main/uniforms.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/main/uniforms.c b/src/mesa/main/uniforms.c index a9308d0..acb2d06 100644 --- a/src/mesa/main/uniforms.c +++ b/src/mesa/main/uniforms.c @@ -1144,6 +1144,12 @@ mesa_bufferiv(struct gl_shader_program *shProg, GLenum type, GL_REFERENCED_BY_FRAGMENT_SHADER, params, caller); return; + case GL_UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER: + case GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER: + _mesa_program_resource_prop(shProg, res, index, + GL_REFERENCED_BY_COMPUTE_SHADER, params, + caller); + return; default: _mesa_error(ctx, GL_INVALID_ENUM, "%s(pname 0x%x (%s))", caller, pname, -- 2.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] mesa: remove null check before free
On Wed, May 4, 2016 at 7:54 AM, Eduardo Lima Mitev wrote: > Patch 2 and 3 are: > > Reviewed-by: Eduardo Lima Mitev > Thanks! I do not have commit access. Can I ask you to push (again)? > > I barely know about coccinelle scripts so I will let others look at > first patch. > > Thanks > Eduardo > > On 05/04/2016 06:15 AM, Thomas Hindoe Paaboel Andersen wrote: > > --- > > src/mesa/main/readpix.c | 3 +-- > > src/mesa/main/texgetimage.c | 3 +-- > > 2 files changed, 2 insertions(+), 4 deletions(-) > > > > diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c > > index 882d863..1cb06c7 100644 > > --- a/src/mesa/main/readpix.c > > +++ b/src/mesa/main/readpix.c > > @@ -608,8 +608,7 @@ read_rgba_pixels( struct gl_context *ctx, > > dst, format, type); > > } > > > > - if (rgba) > > - free(rgba); > > + free(rgba); > > > > done_swap: > > /* Handle byte swapping if required */ > > diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c > > index 4ac0ad4..fc3cc6b 100644 > > --- a/src/mesa/main/texgetimage.c > > +++ b/src/mesa/main/texgetimage.c > > @@ -557,8 +557,7 @@ get_tex_rgba_uncompressed(struct gl_context *ctx, > GLuint dimensions, > > } > > > > done: > > - if (rgba) > > - free(rgba); > > + free(rgba); > > } > > > > > > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] nir: fix assert for wildcard pairs
On Wed, May 4, 2016 at 7:46 AM, Eduardo Lima Mitev wrote: > Good catch! > > Reviewed-by: Eduardo Lima Mitev > Thanks! I do not have commit access. Can I ask you to push? > > On 05/04/2016 05:48 AM, Thomas Hindoe Paaboel Andersen wrote: > > The assert was null checking dest_arr_parent twice. The intention > > seems to be to check both dest_ and src_. > > > > Added in d3636da9 > > --- > > v2: > > Fix the assert rather than checking both in the if(). Hat tip to Ilia. > > src/compiler/nir/nir_lower_var_copies.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/src/compiler/nir/nir_lower_var_copies.c > b/src/compiler/nir/nir_lower_var_copies.c > > index 707d5af..1a7e2ee 100644 > > --- a/src/compiler/nir/nir_lower_var_copies.c > > +++ b/src/compiler/nir/nir_lower_var_copies.c > > @@ -85,7 +85,7 @@ emit_copy_load_store(nir_intrinsic_instr *copy_instr, > > > > if (src_arr_parent || dest_arr_parent) { > >/* Wildcards had better come in matched pairs */ > > - assert(dest_arr_parent && dest_arr_parent); > > + assert(src_arr_parent && dest_arr_parent); > > > >nir_deref_array *src_arr = > nir_deref_as_array(src_arr_parent->child); > >nir_deref_array *dest_arr = > nir_deref_as_array(dest_arr_parent->child); > > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] mesa/compute: drop pointless casts.
From: Dave Airlie We already are a GLintptr, casting won't help. Signed-off-by: Dave Airlie --- src/mesa/main/api_validate.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c index d455f19..421ae8b 100644 --- a/src/mesa/main/api_validate.c +++ b/src/mesa/main/api_validate.c @@ -1107,20 +1107,20 @@ valid_dispatch_indirect(struct gl_context *ctx, GLintptr indirect, GLsizei size, const char *name) { - GLintptr end = (GLintptr)indirect + size; + GLintptr end = indirect + size; /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders: * * "An INVALID_VALUE error is generated if indirect is negative or is not a * multiple of four." */ - if ((GLintptr)indirect & (sizeof(GLuint) - 1)) { + if (indirect & (sizeof(GLuint) - 1)) { _mesa_error(ctx, GL_INVALID_VALUE, "%s(indirect is not aligned)", name); return GL_FALSE; } - if ((GLintptr)indirect < 0) { + if (indirect < 0) { _mesa_error(ctx, GL_INVALID_VALUE, "%s(indirect is less than zero)", name); return GL_FALSE; -- 2.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965: Add infrastucture for sample lod-zero operations.
On Wednesday, May 4, 2016 3:54:13 PM PDT Matt Turner wrote: > --- > src/mesa/drivers/dri/i965/brw_defines.h | 5 + > src/mesa/drivers/dri/i965/brw_disasm.c | 3 +++ > src/mesa/drivers/dri/i965/brw_fs.cpp| 3 +++ > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 14 ++ > src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 2 ++ > src/mesa/drivers/dri/i965/brw_shader.cpp| 6 ++ > 6 files changed, 33 insertions(+) Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] mesa/compute: move compute checks around for tests.
From: Dave Airlie This fixes GL43-CTS.compute_shader.api-indirect which tests the length/4 before anything else. Signed-off-by: Dave Airlie --- src/mesa/main/api_validate.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c index 688408f..d455f19 100644 --- a/src/mesa/main/api_validate.c +++ b/src/mesa/main/api_validate.c @@ -1109,9 +1109,6 @@ valid_dispatch_indirect(struct gl_context *ctx, { GLintptr end = (GLintptr)indirect + size; - if (!check_valid_to_compute(ctx, name)) - return GL_FALSE; - /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders: * * "An INVALID_VALUE error is generated if indirect is negative or is not a @@ -1153,6 +1150,9 @@ valid_dispatch_indirect(struct gl_context *ctx, return GL_FALSE; } + if (!check_valid_to_compute(ctx, name)) + return GL_FALSE; + return GL_TRUE; } -- 2.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] i965/fs: Recognize and emit ld_lz, sample_lz, sample_c_lz.
On Wednesday, May 4, 2016 3:54:14 PM PDT Matt Turner wrote: > Ken suggested instead of a big and complicated optimization pass, to > just recognize the operations here. It's certainly less code and a lot > prettier, but it seems to actually perform worse for currently unknown > reasons. One potential pitfall is that is_zero() currently returns false for -0.0f. The fp64 series that's supposedly landing fixes this. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] i965/fs: Add and use get_nir_src_imm().
On Wednesday, May 4, 2016 3:54:12 PM PDT Matt Turner wrote: > Terrible name. Suggest something else, or even a better way to do this. > > Basically, the next patch wants to inspect the LOD argument and do > something different if it's 0.0f. But at that point we've emitted a MOV > for it and we just have a register to look at. > --- > This is an alternative approach to the 4 patch series I sent ast night. > > src/mesa/drivers/dri/i965/brw_fs.h | 1 + > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 36 ++ +- > 2 files changed, 32 insertions(+), 5 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/ brw_fs.h > index ba6bd3f..29fa593 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.h > +++ b/src/mesa/drivers/dri/i965/brw_fs.h > @@ -273,6 +273,7 @@ public: > void nir_emit_jump(const brw::fs_builder &bld, >nir_jump_instr *instr); > fs_reg get_nir_src(nir_src src); > + fs_reg get_nir_src_imm(nir_src src); > fs_reg get_nir_dest(nir_dest dest); > fs_reg get_nir_image_deref(const nir_deref_var *deref); > fs_reg get_indirect_offset(nir_intrinsic_instr *instr); > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/ dri/i965/brw_fs_nir.cpp > index 4d14fda..fb59800 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -1245,6 +1245,32 @@ fs_visitor::get_nir_src(nir_src src) > } > > fs_reg > +fs_visitor::get_nir_src_imm(nir_src src) > +{ > + fs_reg reg; > + if (src.is_ssa) { > + if (src.ssa->parent_instr->type == nir_instr_type_load_const) { > + nir_load_const_instr *load_const = > +nir_instr_as_load_const(src.ssa->parent_instr); > + return brw_imm_ud(load_const->value.u32[0]); > + } > + > + reg = nir_ssa_values[src.ssa->index]; > + } else { > + /* We don't handle indirects on locals */ > + assert(src.reg.indirect == NULL); > + reg = offset(nir_locals[src.reg.reg->index], bld, > + src.reg.base_offset * src.reg.reg->num_components); > + } > + > + /* to avoid floating-point denorm flushing problems, set the type by > +* default to D - instructions that need floating point semantics will set > +* this to F if they need to > +*/ > + return retype(reg, BRW_REGISTER_TYPE_D); > +} I actually wrote one of these in commit ce1511c627da2aa2a of my tree, and it's much simpler: /** * Return an IMM for constants; otherwise call get_nir_src() as normal. */ fs_reg fs_visitor::get_nir_src_imm(nir_src src) { nir_const_value *val = nir_src_as_const_value(src); return val ? fs_reg(val->i[0]) : get_nir_src(src); } I ended up dropping the patch because I just used nir_src_as_const_value() in the callers directly, but it's probably nice to have one of these. > + > +fs_reg > fs_visitor::get_nir_dest(nir_dest dest) > { > if (dest.is_ssa) { > @@ -3444,7 +3470,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr) >fs_reg src = get_nir_src(instr->src[i].src); >switch (instr->src[i].src_type) { >case nir_tex_src_bias: > - lod = retype(src, BRW_REGISTER_TYPE_F); > + lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_F); > break; >case nir_tex_src_comparitor: > shadow_comparitor = retype(src, BRW_REGISTER_TYPE_F); > @@ -3462,7 +3488,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr) > } > break; >case nir_tex_src_ddx: > - lod = retype(src, BRW_REGISTER_TYPE_F); > + lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_F); > lod_components = nir_tex_instr_src_size(instr, i); > break; >case nir_tex_src_ddy: > @@ -3471,13 +3497,13 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr) >case nir_tex_src_lod: > switch (instr->op) { > case nir_texop_txs: > -lod = retype(src, BRW_REGISTER_TYPE_UD); > +lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_UD); > break; > case nir_texop_txf: > -lod = retype(src, BRW_REGISTER_TYPE_D); > +lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_D); > break; > default: > -lod = retype(src, BRW_REGISTER_TYPE_F); > +lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_F); > break; > } > break; > signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] i965/fs: Merge nir_emit_texture and emit_texture
On Tuesday, May 3, 2016 3:00:24 PM PDT Jason Ekstrand wrote: > The fs_visitor::emit_texture helper originated when we still had both NIR > and IR visitors for the FS backend. Since the old visitor was removed, > emit_texture serves no real purpose beyond arbitrarily splitting > heavily-linked code across two functions. > --- > src/mesa/drivers/dri/i965/brw_fs.h | 18 --- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 223 ++ + > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 159 --- > 3 files changed, 162 insertions(+), 238 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/ brw_fs.h > index a5c3297..925e4b7 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.h > +++ b/src/mesa/drivers/dri/i965/brw_fs.h > @@ -200,21 +200,6 @@ public: > void emit_interpolation_setup_gen4(); > void emit_interpolation_setup_gen6(); > void compute_sample_position(fs_reg dst, fs_reg int_sample_pos); > - void emit_texture(ir_texture_opcode op, > - const glsl_type *dest_type, > - fs_reg coordinate, int components, > - fs_reg shadow_c, > - fs_reg lod, fs_reg dpdy, int grad_components, > - fs_reg sample_index, > - fs_reg offset, > - fs_reg mcs, > - int gather_component, > - bool is_cube_array, > - uint32_t surface, > - fs_reg surface_reg, > - uint32_t sampler, > - fs_reg sampler_reg, > - unsigned return_channels); > fs_reg emit_mcs_fetch(const fs_reg &coordinate, unsigned components, > const fs_reg &sampler); > void emit_gen6_gather_wa(uint8_t wa, fs_reg dst); > @@ -375,9 +360,6 @@ public: > bool simd16_unsupported; > char *no16_msg; > > - /* Result of last visit() method. Still used by emit_texture() */ > - fs_reg result; > - > /** Register numbers for thread payload fields. */ > struct thread_payload { >uint8_t source_depth_reg; > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/ dri/i965/brw_fs_nir.cpp > index 360e2c9..ebc54ad 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -3068,65 +3068,61 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr) > { > unsigned texture = instr->texture_index; > unsigned sampler = instr->sampler_index; > - fs_reg texture_reg(brw_imm_ud(texture)); > - fs_reg sampler_reg(brw_imm_ud(sampler)); > > - int gather_component = instr->component; > + fs_reg srcs[TEX_LOGICAL_NUM_SRCS]; > > - bool is_cube_array = instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE && > -instr->is_array; > + srcs[TEX_LOGICAL_SRC_SURFACE] = brw_imm_ud(texture); > + srcs[TEX_LOGICAL_SRC_SAMPLER] = brw_imm_ud(sampler); > > int lod_components = 0; > > - fs_reg coordinate, shadow_comparitor, lod, lod2, sample_index, mcs, tex_offset; > - > /* The hardware requires a LOD for buffer textures */ > if (instr->sampler_dim == GLSL_SAMPLER_DIM_BUF) > - lod = brw_imm_d(0); > + srcs[TEX_LOGICAL_SRC_LOD] = brw_imm_d(0); > > for (unsigned i = 0; i < instr->num_srcs; i++) { >fs_reg src = get_nir_src(instr->src[i].src); >switch (instr->src[i].src_type) { >case nir_tex_src_bias: > - lod = retype(src, BRW_REGISTER_TYPE_F); > + srcs[TEX_LOGICAL_SRC_LOD] = retype(src, BRW_REGISTER_TYPE_F); > break; >case nir_tex_src_comparitor: > - shadow_comparitor = retype(src, BRW_REGISTER_TYPE_F); > + srcs[TEX_LOGICAL_SRC_SHADOW_C] = retype(src, BRW_REGISTER_TYPE_F); > break; >case nir_tex_src_coord: > switch (instr->op) { > case nir_texop_txf: > case nir_texop_txf_ms: > case nir_texop_samples_identical: > -coordinate = retype(src, BRW_REGISTER_TYPE_D); > +srcs[TEX_LOGICAL_SRC_COORDINATE] = retype(src, BRW_REGISTER_TYPE_D); > break; > default: > -coordinate = retype(src, BRW_REGISTER_TYPE_F); > +srcs[TEX_LOGICAL_SRC_COORDINATE] = retype(src, BRW_REGISTER_TYPE_F); > break; > } > break; >case nir_tex_src_ddx: > - lod = retype(src, BRW_REGISTER_TYPE_F); > + srcs[TEX_LOGICAL_SRC_LOD] = retype(src, BRW_REGISTER_TYPE_F); > lod_components = nir_tex_instr_src_size(instr, i); > break; >case nir_tex_src_ddy: > - lod2 = retype(src, BRW_REGISTER_TYPE_F); > + srcs[TEX_LOGICAL_SRC_LOD2] = retype(src, BRW_REGISTER_TYPE_F); > break; >case nir_tex_src_lod: > switch (instr->op) { > case nir_texop_txs: > -
[Mesa-dev] [PATCH 13/14] radeonsi: consolidate radeon_add_to_buffer_list calls for DMA
From: Marek Olšák --- src/gallium/drivers/radeon/r600_pipe_common.c | 14 ++ src/gallium/drivers/radeonsi/cik_sdma.c | 23 --- src/gallium/drivers/radeonsi/si_dma.c | 10 -- 3 files changed, 14 insertions(+), 33 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index c0579da..3eb12f3 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -166,6 +166,20 @@ void r600_need_dma_space(struct r600_common_context *ctx, unsigned num_dw, ctx->dma.flush(ctx, RADEON_FLUSH_ASYNC, NULL); assert((num_dw + ctx->dma.cs->cdw) <= ctx->dma.cs->max_dw); } + + /* If GPUVM is not supported, the CS checker needs 2 entries +* in the buffer list per packet, which has to be done manually. +*/ + if (ctx->screen->info.has_virtual_memory) { + if (dst) + radeon_add_to_buffer_list(ctx, &ctx->dma, dst, + RADEON_USAGE_WRITE, + RADEON_PRIO_SDMA_BUFFER); + if (src) + radeon_add_to_buffer_list(ctx, &ctx->dma, src, + RADEON_USAGE_READ, + RADEON_PRIO_SDMA_BUFFER); + } } /* This is required to prevent read-after-write hazards. */ diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c b/src/gallium/drivers/radeonsi/cik_sdma.c index 29d9a00..17c3a83 100644 --- a/src/gallium/drivers/radeonsi/cik_sdma.c +++ b/src/gallium/drivers/radeonsi/cik_sdma.c @@ -49,11 +49,6 @@ static void cik_sdma_do_copy_buffer(struct si_context *ctx, ncopy = DIV_ROUND_UP(size, CIK_SDMA_COPY_MAX_SIZE); r600_need_dma_space(&ctx->b, ncopy * 7, rdst, rsrc); - radeon_add_to_buffer_list(&ctx->b, &ctx->b.dma, rsrc, RADEON_USAGE_READ, - RADEON_PRIO_SDMA_BUFFER); - radeon_add_to_buffer_list(&ctx->b, &ctx->b.dma, rdst, RADEON_USAGE_WRITE, - RADEON_PRIO_SDMA_BUFFER); - for (i = 0; i < ncopy; i++) { csize = MIN2(size, CIK_SDMA_COPY_MAX_SIZE); cs->buf[cs->cdw++] = CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY, @@ -213,12 +208,6 @@ static bool cik_sdma_copy_texture(struct si_context *sctx, struct radeon_winsys_cs *cs = sctx->b.dma.cs; r600_need_dma_space(&sctx->b, 13, &rdst->resource, &rsrc->resource); - radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, &rsrc->resource, - RADEON_USAGE_READ, - RADEON_PRIO_SDMA_TEXTURE); - radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, &rdst->resource, - RADEON_USAGE_WRITE, - RADEON_PRIO_SDMA_TEXTURE); radeon_emit(cs, CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY, CIK_SDMA_COPY_SUB_OPCODE_LINEAR_SUB_WINDOW, 0) | @@ -381,12 +370,6 @@ static bool cik_sdma_copy_texture(struct si_context *sctx, struct radeon_winsys_cs *cs = sctx->b.dma.cs; r600_need_dma_space(&sctx->b, 14, &rdst->resource, &rsrc->resource); - radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, &rsrc->resource, - RADEON_USAGE_READ, - RADEON_PRIO_SDMA_TEXTURE); - radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, &rdst->resource, - RADEON_USAGE_WRITE, - RADEON_PRIO_SDMA_TEXTURE); radeon_emit(cs, CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY, CIK_SDMA_COPY_SUB_OPCODE_TILED_SUB_WINDOW, 0) | @@ -483,12 +466,6 @@ static bool cik_sdma_copy_texture(struct si_context *sctx, struct radeon_winsys_cs *cs = sctx->b.dma.cs; r600_need_dma_space(&sctx->b, 15, &rdst->resource, &rsrc->resource); - radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, &rsrc->resource, - RADEON_USAGE_READ, - RADEON_PRIO_SDMA_TEXTURE); - radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, &rdst->resource, - RADEON_USAGE_WRITE, - RADEON_PRIO_SDMA_TEXTURE); radeon_emit(cs, CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY,
[Mesa-dev] [PATCH 14/14] gallium/radeon: don't flush the GFX IB if DMA doesn't depend on it
From: Marek Olšák --- src/gallium/drivers/radeon/r600_pipe_common.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 3eb12f3..4845587 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -154,8 +154,14 @@ void r600_need_dma_space(struct r600_common_context *ctx, unsigned num_dw, gtt += src->buf->size; } - /* Flush the GFX IB if it's not empty. */ - if (ctx->gfx.cs->cdw > ctx->initial_gfx_cs_size) + /* Flush the GFX IB if DMA depends on it. */ + if (ctx->gfx.cs->cdw > ctx->initial_gfx_cs_size && + ((dst && + ctx->ws->cs_is_buffer_referenced(ctx->gfx.cs, dst->buf, + RADEON_USAGE_READWRITE)) || +(src && + ctx->ws->cs_is_buffer_referenced(ctx->gfx.cs, src->buf, + RADEON_USAGE_WRITE ctx->gfx.flush(ctx, RADEON_FLUSH_ASYNC, NULL); /* Flush if there's not enough space, or if the memory usage per IB -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/14] gallium/radeon: flush if DMA IB memory usage is too high
From: Marek Olšák This prevents IB rejections due to insane memory usage from many concecutive texture uploads. --- src/gallium/drivers/r600/evergreen_hw_context.c | 2 +- src/gallium/drivers/r600/evergreen_state.c | 2 +- src/gallium/drivers/r600/r600_hw_context.c | 2 +- src/gallium/drivers/r600/r600_state.c | 2 +- src/gallium/drivers/radeon/r600_pipe_common.c | 27 + src/gallium/drivers/radeon/r600_pipe_common.h | 3 ++- src/gallium/drivers/radeonsi/cik_sdma.c | 8 src/gallium/drivers/radeonsi/si_dma.c | 4 ++-- 8 files changed, 35 insertions(+), 15 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c b/src/gallium/drivers/r600/evergreen_hw_context.c index c2dba8c..cd07319 100644 --- a/src/gallium/drivers/r600/evergreen_hw_context.c +++ b/src/gallium/drivers/r600/evergreen_hw_context.c @@ -60,7 +60,7 @@ void evergreen_dma_copy_buffer(struct r600_context *rctx, } ncopy = (size / EG_DMA_COPY_MAX_SIZE) + !!(size % EG_DMA_COPY_MAX_SIZE); - r600_need_dma_space(&rctx->b, ncopy * 5); + r600_need_dma_space(&rctx->b, ncopy * 5, rdst, rsrc); for (i = 0; i < ncopy; i++) { csize = size < EG_DMA_COPY_MAX_SIZE ? size : EG_DMA_COPY_MAX_SIZE; /* emit reloc before writing cs so that cs is always in consistent state */ diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index acf60c6..62152c0 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -3442,7 +3442,7 @@ static void evergreen_dma_copy_tile(struct r600_context *rctx, size = (copy_height * pitch) / 4; ncopy = (size / EG_DMA_COPY_MAX_SIZE) + !!(size % EG_DMA_COPY_MAX_SIZE); - r600_need_dma_space(&rctx->b, ncopy * 9); + r600_need_dma_space(&rctx->b, ncopy * 9, &rdst->resource, &rsrc->resource); for (i = 0; i < ncopy; i++) { cheight = copy_height; diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index fa1028b..857da7f 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -467,7 +467,7 @@ void r600_dma_copy_buffer(struct r600_context *rctx, size >>= 2; /* convert to dwords */ ncopy = (size / R600_DMA_COPY_MAX_SIZE_DW) + !!(size % R600_DMA_COPY_MAX_SIZE_DW); - r600_need_dma_space(&rctx->b, ncopy * 5); + r600_need_dma_space(&rctx->b, ncopy * 5, rdst, rsrc); for (i = 0; i < ncopy; i++) { csize = size < R600_DMA_COPY_MAX_SIZE_DW ? size : R600_DMA_COPY_MAX_SIZE_DW; /* emit reloc before writing cs so that cs is always in consistent state */ diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index 715c6f1..ab0cf5c 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -2918,7 +2918,7 @@ static boolean r600_dma_copy_tile(struct r600_context *rctx, */ cheight = ((R600_DMA_COPY_MAX_SIZE_DW * 4) / pitch) & 0xfff8; ncopy = (copy_height / cheight) + !!(copy_height % cheight); - r600_need_dma_space(&rctx->b, ncopy * 7); + r600_need_dma_space(&rctx->b, ncopy * 7, &rdst->resource, &rsrc->resource); for (i = 0; i < ncopy; i++) { cheight = cheight > copy_height ? copy_height : cheight; diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 02e17ba..eac7812 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -136,14 +136,33 @@ void r600_draw_rectangle(struct blitter_context *blitter, pipe_resource_reference(&buf, NULL); } -void r600_need_dma_space(struct r600_common_context *ctx, unsigned num_dw) +void r600_need_dma_space(struct r600_common_context *ctx, unsigned num_dw, + struct r600_resource *dst, struct r600_resource *src) { + uint64_t vram = 0, gtt = 0; + + if (dst) { + if (dst->domains & RADEON_DOMAIN_VRAM) + vram += dst->buf->size; + else if (dst->domains & RADEON_DOMAIN_GTT) + gtt += dst->buf->size; + } + if (src) { + if (src->domains & RADEON_DOMAIN_VRAM) + vram += src->buf->size; + else if (src->domains & RADEON_DOMAIN_GTT) + gtt += src->buf->size; + } + /* Flush the GFX IB if it's not empty. */ if (ctx->gfx.cs->cdw > ctx->initial_gfx_cs_size) ctx->gfx.flush(ctx, RADEON_FLUSH_ASYNC, NULL); - /* Flush if there's not enough space. */ - if ((num_dw + ctx->dma.cs->cdw) > ctx->dma.cs->max_dw) { + /* Flush if there's not enough space, or if the
[Mesa-dev] [PATCH 12/14] gallium/radeon: add a heuristic for better (S)DMA performance
From: Marek Olšák --- src/gallium/drivers/radeon/r600_pipe_common.c | 16 src/gallium/drivers/radeon/radeon_winsys.h| 2 ++ src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 8 src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 8 4 files changed, 34 insertions(+) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index eac7812..c0579da 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -176,6 +176,22 @@ void r600_dma_emit_wait_idle(struct r600_common_context *rctx) /* done at the end of DMA calls, so increment this. */ rctx->num_dma_calls++; + /* IBs using too little memory are limited by the IB submission overhead. +* IBs using too much memory are limited by the kernel/TTM overhead. +* Too long IBs create CPU-GPU pipeline bubbles and add latency. +* +* This heuristic makes sure that DMA requests are executed +* very soon or immediately after the call is made and lowers memory +* usage. It improves texture upload performance by keeping the DMA +* engine busy while uploads are being submitted. +* +* 512 KB of copied data (1 MB / 2) is enough to submit an IB. +*/ + if (rctx->ws->cs_query_memory_usage(rctx->dma.cs) > 1024 * 1024) { + rctx->dma.flush(rctx, RADEON_FLUSH_ASYNC, NULL); + return; + } + r600_need_dma_space(rctx, 1, NULL, NULL); if (cs->cdw == 0) /* empty queue */ diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h index 79c548c..2e1848e 100644 --- a/src/gallium/drivers/radeon/radeon_winsys.h +++ b/src/gallium/drivers/radeon/radeon_winsys.h @@ -682,6 +682,8 @@ struct radeon_winsys { */ boolean (*cs_memory_below_limit)(struct radeon_winsys_cs *cs, uint64_t vram, uint64_t gtt); +uint64_t (*cs_query_memory_usage)(struct radeon_winsys_cs *cs); + /** * Return the buffer list. * diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c index 03e45a9..a6156fc 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c @@ -562,6 +562,13 @@ static boolean amdgpu_cs_memory_below_limit(struct radeon_winsys_cs *rcs, uint64 return gtt < ws->info.gart_size * 0.7; } +static uint64_t amdgpu_cs_query_memory_usage(struct radeon_winsys_cs *rcs) +{ + struct amdgpu_cs *cs = amdgpu_cs(rcs); + + return cs->used_vram + cs->used_gart; +} + static unsigned amdgpu_cs_get_buffer_list(struct radeon_winsys_cs *rcs, struct radeon_bo_list_item *list) { @@ -827,6 +834,7 @@ void amdgpu_cs_init_functions(struct amdgpu_winsys *ws) ws->base.cs_lookup_buffer = amdgpu_cs_lookup_buffer; ws->base.cs_validate = amdgpu_cs_validate; ws->base.cs_memory_below_limit = amdgpu_cs_memory_below_limit; + ws->base.cs_query_memory_usage = amdgpu_cs_query_memory_usage; ws->base.cs_get_buffer_list = amdgpu_cs_get_buffer_list; ws->base.cs_flush = amdgpu_cs_flush; ws->base.cs_is_buffer_referenced = amdgpu_bo_is_referenced; diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c index 7a901a1..9ac5081 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c @@ -398,6 +398,13 @@ static boolean radeon_drm_cs_memory_below_limit(struct radeon_winsys_cs *rcs, ui return gtt < cs->ws->info.gart_size * 0.7; } +static uint64_t radeon_drm_cs_query_memory_usage(struct radeon_winsys_cs *rcs) +{ + struct radeon_drm_cs *cs = radeon_drm_cs(rcs); + + return cs->csc->used_vram + cs->csc->used_gart; +} + static unsigned radeon_drm_cs_get_buffer_list(struct radeon_winsys_cs *rcs, struct radeon_bo_list_item *list) { @@ -671,6 +678,7 @@ void radeon_drm_cs_init_functions(struct radeon_drm_winsys *ws) ws->base.cs_lookup_buffer = radeon_drm_cs_lookup_buffer; ws->base.cs_validate = radeon_drm_cs_validate; ws->base.cs_memory_below_limit = radeon_drm_cs_memory_below_limit; +ws->base.cs_query_memory_usage = radeon_drm_cs_query_memory_usage; ws->base.cs_get_buffer_list = radeon_drm_cs_get_buffer_list; ws->base.cs_flush = radeon_drm_cs_flush; ws->base.cs_is_buffer_referenced = radeon_bo_is_referenced; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/14] gallium/radeon: fix (S)DMA read-after-write hazards
From: Marek Olšák --- src/gallium/drivers/r600/evergreen_hw_context.c | 1 + src/gallium/drivers/r600/evergreen_state.c | 1 + src/gallium/drivers/r600/r600_hw_context.c | 1 + src/gallium/drivers/r600/r600_state.c | 1 + src/gallium/drivers/radeon/r600_pipe_common.c | 24 src/gallium/drivers/radeon/r600_pipe_common.h | 1 + src/gallium/drivers/radeonsi/cik_sdma.c | 1 + src/gallium/drivers/radeonsi/si_dma.c | 2 ++ 8 files changed, 32 insertions(+) diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c b/src/gallium/drivers/r600/evergreen_hw_context.c index a0f4680..c2dba8c 100644 --- a/src/gallium/drivers/r600/evergreen_hw_context.c +++ b/src/gallium/drivers/r600/evergreen_hw_context.c @@ -77,6 +77,7 @@ void evergreen_dma_copy_buffer(struct r600_context *rctx, src_offset += csize << shift; size -= csize; } + r600_dma_emit_wait_idle(&rctx->b); } /* The max number of bytes to copy per packet. */ diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index 672ddd4..acf60c6 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -3470,6 +3470,7 @@ static void evergreen_dma_copy_tile(struct r600_context *rctx, addr += cheight * pitch; y += cheight; } + r600_dma_emit_wait_idle(&rctx->b); } static void evergreen_dma_copy(struct pipe_context *ctx, diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index 2bc6d3f..fa1028b 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -484,4 +484,5 @@ void r600_dma_copy_buffer(struct r600_context *rctx, src_offset += csize << 2; size -= csize; } + r600_dma_emit_wait_idle(&rctx->b); } diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index 48e3663..715c6f1 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -2941,6 +2941,7 @@ static boolean r600_dma_copy_tile(struct r600_context *rctx, addr += cheight * pitch; y += cheight; } + r600_dma_emit_wait_idle(&rctx->b); return TRUE; } diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index e1b0b63..02e17ba 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -149,6 +149,30 @@ void r600_need_dma_space(struct r600_common_context *ctx, unsigned num_dw) } } +/* This is required to prevent read-after-write hazards. */ +void r600_dma_emit_wait_idle(struct r600_common_context *rctx) +{ + struct radeon_winsys_cs *cs = rctx->dma.cs; + + /* done at the end of DMA calls, so increment this. */ + rctx->num_dma_calls++; + + r600_need_dma_space(rctx, 1); + + if (cs->cdw == 0) /* empty queue */ + return; + + /* NOP waits for idle on Evergreen and later. */ + if (rctx->chip_class >= CIK) + radeon_emit(cs, 0x); /* NOP */ + else if (rctx->chip_class >= EVERGREEN) + radeon_emit(cs, 0xf000); /* NOP */ + else { + /* TODO: R600-R700 should use the FENCE packet. +* CS checker support is required. */ + } +} + static void r600_memory_barrier(struct pipe_context *ctx, unsigned flags) { } diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index cc2d33e..a6afc43 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -598,6 +598,7 @@ struct pipe_resource *r600_resource_create_common(struct pipe_screen *screen, const struct pipe_resource *templ); const char *r600_get_llvm_processor_name(enum radeon_family family); void r600_need_dma_space(struct r600_common_context *ctx, unsigned num_dw); +void r600_dma_emit_wait_idle(struct r600_common_context *rctx); /* r600_gpu_load.c */ void r600_gpu_load_kill_thread(struct r600_common_screen *rscreen); diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c b/src/gallium/drivers/radeonsi/cik_sdma.c index 88a994e..5021578 100644 --- a/src/gallium/drivers/radeonsi/cik_sdma.c +++ b/src/gallium/drivers/radeonsi/cik_sdma.c @@ -87,6 +87,7 @@ static void cik_sdma_copy_buffer(struct si_context *ctx, dst_offset + size); cik_sdma_do_copy_buffer(ctx, dst, src, dst_offset, src_offset, size); + r600_dma_emit_wait_idle(&ctx->b); } static void cik_sdma_copy(struct pipe_context *ctx, diff --git a/src/gallium/drivers/radeonsi/si_dma.c b/src/gallium/drivers/radeonsi/si_dma.c index 033eb
[Mesa-dev] [PATCH 01/14] gallium/radeon: rename r600_texture_disable_cmask -> discard_cmask
From: Marek Olšák because it doesn't decompress --- src/gallium/drivers/radeon/r600_texture.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_texture.c b/src/gallium/drivers/radeon/r600_texture.c index 67b2a34..6df013a 100644 --- a/src/gallium/drivers/radeon/r600_texture.c +++ b/src/gallium/drivers/radeon/r600_texture.c @@ -267,7 +267,7 @@ static void r600_eliminate_fast_color_clear(struct r600_common_screen *rscreen, pipe_mutex_unlock(rscreen->aux_context_lock); } -static void r600_texture_disable_cmask(struct r600_common_screen *rscreen, +static void r600_texture_discard_cmask(struct r600_common_screen *rscreen, struct r600_texture *rtex) { if (!rtex->cmask.size) @@ -351,7 +351,7 @@ static boolean r600_texture_get_handle(struct pipe_screen* screen, /* Disable CMASK if flush_resource isn't going * to be called. */ - r600_texture_disable_cmask(rscreen, rtex); + r600_texture_discard_cmask(rscreen, rtex); update_metadata = true; } -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/14] radeonsi: add new SDMA texture copy code
From: Marek Olšák This implements: - Linear-to-linear partial copies. (unaligned) - Tiled-to-linear and linear-to-tiled partial copies. (unaligned except 1-2 Bpp) - Tiled-to-tiled partial copies aligned to 8x8. --- src/gallium/drivers/radeonsi/cik_sdma.c | 436 1 file changed, 436 insertions(+) diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c b/src/gallium/drivers/radeonsi/cik_sdma.c index 5021578..ea6a122 100644 --- a/src/gallium/drivers/radeonsi/cik_sdma.c +++ b/src/gallium/drivers/radeonsi/cik_sdma.c @@ -90,6 +90,438 @@ static void cik_sdma_copy_buffer(struct si_context *ctx, r600_dma_emit_wait_idle(&ctx->b); } +static unsigned minify_as_blocks(unsigned width, unsigned level, unsigned blk_w) +{ + width = u_minify(width, level); + return DIV_ROUND_UP(width, blk_w); +} + +static unsigned encode_tile_info(struct si_context *sctx, +struct r600_texture *tex, unsigned level, +bool set_bpp) +{ + struct radeon_info *info = &sctx->screen->b.info; + unsigned tile_index = tex->surface.tiling_index[level]; + unsigned macro_tile_index = tex->surface.macro_tile_index; + unsigned tile_mode = info->si_tile_mode_array[tile_index]; + unsigned macro_tile_mode = info->cik_macrotile_mode_array[macro_tile_index]; + + return (set_bpp ? util_logbase2(tex->surface.bpe) : 0) | + (G_009910_ARRAY_MODE(tile_mode) << 3) | + (G_009910_MICRO_TILE_MODE_NEW(tile_mode) << 8) | + /* Non-depth modes don't have TILE_SPLIT set. */ + ((util_logbase2(tex->surface.tile_split >> 6)) << 11) | + (G_009990_BANK_WIDTH(macro_tile_mode) << 15) | + (G_009990_BANK_HEIGHT(macro_tile_mode) << 18) | + (G_009990_NUM_BANKS(macro_tile_mode) << 21) | + (G_009990_MACRO_TILE_ASPECT(macro_tile_mode) << 24) | + (G_009910_PIPE_CONFIG(tile_mode) << 26); +} + +static bool cik_sdma_copy_texture(struct si_context *sctx, + struct pipe_resource *dst, + unsigned dst_level, + unsigned dstx, unsigned dsty, unsigned dstz, + struct pipe_resource *src, + unsigned src_level, + const struct pipe_box *src_box) +{ + struct radeon_info *info = &sctx->screen->b.info; + struct r600_texture *rsrc = (struct r600_texture*)src; + struct r600_texture *rdst = (struct r600_texture*)dst; + unsigned bpp = rdst->surface.bpe; + uint64_t dst_address = rdst->resource.gpu_address + + rdst->surface.level[dst_level].offset; + uint64_t src_address = rsrc->resource.gpu_address + + rsrc->surface.level[src_level].offset; + unsigned dst_mode = rdst->surface.level[dst_level].mode; + unsigned src_mode = rsrc->surface.level[src_level].mode; + unsigned dst_tile_index = rdst->surface.tiling_index[dst_level]; + unsigned src_tile_index = rsrc->surface.tiling_index[src_level]; + unsigned dst_tile_mode = info->si_tile_mode_array[dst_tile_index]; + unsigned src_tile_mode = info->si_tile_mode_array[src_tile_index]; + unsigned dst_micro_mode = G_009910_MICRO_TILE_MODE_NEW(dst_tile_mode); + unsigned src_micro_mode = G_009910_MICRO_TILE_MODE_NEW(src_tile_mode); + unsigned dst_pitch = rdst->surface.level[dst_level].pitch_bytes / bpp; + unsigned src_pitch = rsrc->surface.level[src_level].pitch_bytes / bpp; + uint64_t dst_slice_pitch = rdst->surface.level[dst_level].slice_size / bpp; + uint64_t src_slice_pitch = rsrc->surface.level[src_level].slice_size / bpp; + unsigned dst_width = minify_as_blocks(rdst->resource.b.b.width0, + dst_level, rdst->surface.blk_w); + unsigned src_width = minify_as_blocks(rsrc->resource.b.b.width0, + src_level, rsrc->surface.blk_w); + unsigned dst_height = minify_as_blocks(rdst->resource.b.b.height0, + dst_level, rdst->surface.blk_h); + unsigned src_height = minify_as_blocks(rsrc->resource.b.b.height0, + src_level, rsrc->surface.blk_h); + unsigned srcx = src_box->x / rsrc->surface.blk_w; + unsigned srcy = src_box->y / rsrc->surface.blk_h; + unsigned srcz = src_box->z; + unsigned copy_width = DIV_ROUND_UP(src_box->width, rsrc->surface.blk_w); + unsigned copy_height = DIV_ROUND_UP(src_box->height, rsrc->surface.blk_h); + unsigned copy_depth = src_box->depth; + + assert(src_level <= src->last_level); + assert(dst_level <= dst->last_level); + assert(rdst->surface.level[dst_level].offset + +
[Mesa-dev] [PATCH 05/14] gallium/radeon: implement randomized SDMA texture copy testing (v2)
From: Marek Olšák v2: - adjustments for exercising all important SDMA code paths - decrease the probability of getting huge sizes (faster testing) - increase the probability of getting power-of-two dimensions - change the memory cap to 128MB (faster testing) - better detect which engine has been used --- src/gallium/drivers/r600/r600_pipe.c | 3 + src/gallium/drivers/radeon/Makefile.sources | 1 + src/gallium/drivers/radeon/r600_pipe_common.c | 2 + src/gallium/drivers/radeon/r600_pipe_common.h | 6 + src/gallium/drivers/radeon/r600_test_dma.c| 404 ++ src/gallium/drivers/radeonsi/si_pipe.c| 3 + 6 files changed, 419 insertions(+) create mode 100644 src/gallium/drivers/radeon/r600_test_dma.c diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index 52f0986..98c5a89 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -707,5 +707,8 @@ struct pipe_screen *r600_screen_create(struct radeon_winsys *ws) } #endif + if (rscreen->b.debug_flags & DBG_TEST_DMA) + r600_test_dma(&rscreen->b); + return &rscreen->b.b; } diff --git a/src/gallium/drivers/radeon/Makefile.sources b/src/gallium/drivers/radeon/Makefile.sources index f993d75..6fbed81 100644 --- a/src/gallium/drivers/radeon/Makefile.sources +++ b/src/gallium/drivers/radeon/Makefile.sources @@ -10,6 +10,7 @@ C_SOURCES := \ r600_query.c \ r600_query.h \ r600_streamout.c \ + r600_test_dma.c \ r600_texture.c \ r600_viewport.c \ radeon_uvd.c \ diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index dae2369..e1b0b63 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -383,6 +383,8 @@ static const struct debug_named_value common_debug_options[] = { { "noasm", DBG_NO_ASM, "Don't print disassembled shaders"}, { "preoptir", DBG_PREOPT_IR, "Print the LLVM IR before initial optimizations" }, + { "testdma", DBG_TEST_DMA, "Invoke SDMA tests and exit." }, + /* features */ { "nodma", DBG_NO_ASYNC_DMA, "Disable asynchronous DMA" }, { "nohyperz", DBG_NO_HYPERZ, "Disable Hyper-Z" }, diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index 803c89c..cc2d33e 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -77,6 +77,8 @@ #define DBG_NO_TGSI(1 << 13) #define DBG_NO_ASM (1 << 14) #define DBG_PREOPT_IR (1 << 15) +/* gaps */ +#define DBG_TEST_DMA (1 << 20) /* Bits 21-31 are reserved for the r600g driver. */ /* features */ #define DBG_NO_ASYNC_DMA (1llu << 32) @@ -484,6 +486,7 @@ struct r600_common_context { unsignedmax_db; /* for OQ */ /* Misc stats. */ unsignednum_draw_calls; + unsignednum_dma_calls; /* Render condition. */ struct r600_atomrender_cond_atom; @@ -622,6 +625,9 @@ void r600_update_prims_generated_query_state(struct r600_common_context *rctx, unsigned type, int diff); void r600_streamout_init(struct r600_common_context *rctx); +/* r600_test_dma.c */ +void r600_test_dma(struct r600_common_screen *rscreen); + /* r600_texture.c */ bool r600_prepare_for_dma_blit(struct r600_common_context *rctx, struct r600_texture *rdst, diff --git a/src/gallium/drivers/radeon/r600_test_dma.c b/src/gallium/drivers/radeon/r600_test_dma.c new file mode 100644 index 000..c203b4d --- /dev/null +++ b/src/gallium/drivers/radeon/r600_test_dma.c @@ -0,0 +1,404 @@ +/* + * Copyright 2016 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRAC
[Mesa-dev] [PATCH 03/14] gallium/radeon: use a common function for DMA blit preparation
From: Marek Olšák this is more robust and probably fixes some bugs already --- src/gallium/drivers/r600/evergreen_state.c| 10 ++--- src/gallium/drivers/r600/r600_state.c | 5 ++- src/gallium/drivers/radeon/r600_pipe_common.h | 7 src/gallium/drivers/radeon/r600_texture.c | 55 +++ src/gallium/drivers/radeonsi/cik_sdma.c | 14 +-- src/gallium/drivers/radeonsi/si_dma.c | 13 ++- 6 files changed, 73 insertions(+), 31 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index 5224e42..672ddd4 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -3497,14 +3497,10 @@ static void evergreen_dma_copy(struct pipe_context *ctx, return; } - if (src->format != dst->format || src_box->depth > 1 || - (rdst->dirty_level_mask | rdst->stencil_dirty_level_mask) & (1 << dst_level)) { + if (src_box->depth > 1 || + !r600_prepare_for_dma_blit(&rctx->b, rdst, dst_level, dstx, dsty, + dstz, rsrc, src_level, src_box)) goto fallback; - } - - if (rsrc->dirty_level_mask & (1 << src_level)) { - ctx->flush_resource(ctx, src); - } src_x = util_format_get_nblocksx(src->format, src_box->x); dst_x = util_format_get_nblocksx(src->format, dst_x); diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index 2291096..48e3663 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -2972,9 +2972,10 @@ static void r600_dma_copy(struct pipe_context *ctx, return; } - if (src->format != dst->format || src_box->depth > 1) { + if (src_box->depth > 1 || + !r600_prepare_for_dma_blit(&rctx->b, rdst, dst_level, dstx, dsty, + dstz, rsrc, src_level, src_box)) goto fallback; - } src_x = util_format_get_nblocksx(src->format, src_box->x); dst_x = util_format_get_nblocksx(src->format, dst_x); diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index 74eefbb..803c89c 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -623,6 +623,13 @@ void r600_update_prims_generated_query_state(struct r600_common_context *rctx, void r600_streamout_init(struct r600_common_context *rctx); /* r600_texture.c */ +bool r600_prepare_for_dma_blit(struct r600_common_context *rctx, + struct r600_texture *rdst, + unsigned dst_level, unsigned dstx, + unsigned dsty, unsigned dstz, + struct r600_texture *rsrc, + unsigned src_level, + const struct pipe_box *src_box); void r600_texture_get_fmask_info(struct r600_common_screen *rscreen, struct r600_texture *rtex, unsigned nr_samples, diff --git a/src/gallium/drivers/radeon/r600_texture.c b/src/gallium/drivers/radeon/r600_texture.c index ac11380..0dba045 100644 --- a/src/gallium/drivers/radeon/r600_texture.c +++ b/src/gallium/drivers/radeon/r600_texture.c @@ -32,6 +32,61 @@ #include #include +bool r600_prepare_for_dma_blit(struct r600_common_context *rctx, + struct r600_texture *rdst, + unsigned dst_level, unsigned dstx, + unsigned dsty, unsigned dstz, + struct r600_texture *rsrc, + unsigned src_level, + const struct pipe_box *src_box) +{ + if (!rctx->dma.cs) + return false; + + if (util_format_get_blocksizebits(rdst->resource.b.b.format) != + util_format_get_blocksizebits(rsrc->resource.b.b.format)) + return false; + + /* MSAA: Blits don't exist in the real world. */ + if (rsrc->resource.b.b.nr_samples > 1 || + rdst->resource.b.b.nr_samples > 1) + return false; + + /* Depth-stencil surfaces: +* When dst is linear, the DB->CB copy preserves HTILE. +* When dst is tiled, the 3D path must be used to update HTILE. +*/ + if (rsrc->is_depth || rdst->is_depth) + return false; + + /* DCC as: +* src: Use the 3D path. DCC decompression is expensive. +* dst: If overwriting the whole texture, disable DCC and use SDMA. +*Otherwise, use the 3D path. +* TODO: handle the case when the dst box covers the whole texture +*/ + if (rsrc->dcc_offset || rdst->dcc_offset) +
[Mesa-dev] [PATCH 02/14] gallium/radeon: split out code for discarding DCC
From: Marek Olšák --- src/gallium/drivers/radeon/r600_texture.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_texture.c b/src/gallium/drivers/radeon/r600_texture.c index 6df013a..ac11380 100644 --- a/src/gallium/drivers/radeon/r600_texture.c +++ b/src/gallium/drivers/radeon/r600_texture.c @@ -292,6 +292,17 @@ static void r600_texture_discard_cmask(struct r600_common_screen *rscreen, p_atomic_inc(&rscreen->compressed_colortex_counter); } +static void r600_texture_discard_dcc(struct r600_common_screen *rscreen, +struct r600_texture *rtex) +{ + /* Disable DCC. */ + rtex->dcc_offset = 0; + rtex->cb_color_info &= ~VI_S_028C70_DCC_ENABLE(1); + + /* Notify all contexts about the change. */ + r600_dirty_all_framebuffer_states(rscreen); +} + void r600_texture_disable_dcc(struct r600_common_screen *rscreen, struct r600_texture *rtex) { @@ -307,12 +318,7 @@ void r600_texture_disable_dcc(struct r600_common_screen *rscreen, rctx->b.flush(&rctx->b, NULL, 0); pipe_mutex_unlock(rscreen->aux_context_lock); - /* Disable DCC. */ - rtex->dcc_offset = 0; - rtex->cb_color_info &= ~VI_S_028C70_DCC_ENABLE(1); - - /* Notify all contexts about the change. */ - r600_dirty_all_framebuffer_states(rscreen); + r600_texture_discard_dcc(rscreen, rtex); } static boolean r600_texture_get_handle(struct pipe_screen* screen, -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/14] gallium/radeon: discard CMASK or DCC if overwriting a whole texture by DMA
From: Marek Olšák --- src/gallium/drivers/radeon/r600_texture.c | 46 ++- 1 file changed, 39 insertions(+), 7 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_texture.c b/src/gallium/drivers/radeon/r600_texture.c index 0dba045..d6d95af 100644 --- a/src/gallium/drivers/radeon/r600_texture.c +++ b/src/gallium/drivers/radeon/r600_texture.c @@ -32,6 +32,23 @@ #include #include +static void r600_texture_discard_dcc(struct r600_common_screen *rscreen, +struct r600_texture *rtex); +static void r600_texture_discard_cmask(struct r600_common_screen *rscreen, + struct r600_texture *rtex); + + +static bool range_covers_whole_texture(struct pipe_resource *tex, + unsigned level, unsigned x, unsigned y, + unsigned z, unsigned width, + unsigned height, unsigned depth) +{ + return x == 0 && y == 0 && z == 0 && + width == u_minify(tex->width0, level) && + height == u_minify(tex->height0, level) && + depth == util_max_layer(tex, level) + 1; +} + bool r600_prepare_for_dma_blit(struct r600_common_context *rctx, struct r600_texture *rdst, unsigned dst_level, unsigned dstx, @@ -61,21 +78,36 @@ bool r600_prepare_for_dma_blit(struct r600_common_context *rctx, /* DCC as: * src: Use the 3D path. DCC decompression is expensive. -* dst: If overwriting the whole texture, disable DCC and use SDMA. +* dst: If overwriting the whole texture, discard DCC and use SDMA. *Otherwise, use the 3D path. -* TODO: handle the case when the dst box covers the whole texture */ - if (rsrc->dcc_offset || rdst->dcc_offset) + if (rsrc->dcc_offset) return false; + if (rdst->dcc_offset) { + /* We can't discard DCC if the texture has been exported. */ + if (!rdst->resource.is_shared && + range_covers_whole_texture(&rdst->resource.b.b, dst_level, + dstx, dsty, dstz, src_box->width, + src_box->height, src_box->depth)) + r600_texture_discard_dcc(rctx->screen, rdst); + else + return false; + } + /* CMASK as: * src: Both texture and SDMA paths need decompression. Use SDMA. -* dst: If overwriting the whole texture, deallocate CMASK and use +* dst: If overwriting the whole texture, discard CMASK and use *SDMA. Otherwise, use the 3D path. -* TODO: handle the case when the dst box covers the whole texture */ - if (rdst->cmask.size && rdst->dirty_level_mask & (1 << dst_level)) - return false; + if (rdst->cmask.size && rdst->dirty_level_mask & (1 << dst_level)) { + if (range_covers_whole_texture(&rdst->resource.b.b, dst_level, + dstx, dsty, dstz, src_box->width, + src_box->height, src_box->depth)) + r600_texture_discard_cmask(rctx->screen, rdst); + else + return false; + } /* All requirements are met. Prepare textures for SDMA. */ if (rsrc->cmask.size && rsrc->dirty_level_mask & (1 << src_level)) -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/14] radeonsi: only expose *_init_*dma_functions from (S)DMA files
From: Marek Olšák just normalizing the interfaces --- src/gallium/drivers/radeonsi/cik_sdma.c | 19 --- src/gallium/drivers/radeonsi/si_dma.c | 19 --- src/gallium/drivers/radeonsi/si_pipe.c | 5 + src/gallium/drivers/radeonsi/si_pipe.h | 16 ++-- src/gallium/drivers/radeonsi/si_state.c | 6 -- 5 files changed, 31 insertions(+), 34 deletions(-) diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c b/src/gallium/drivers/radeonsi/cik_sdma.c index 7f7db91..a8caf04 100644 --- a/src/gallium/drivers/radeonsi/cik_sdma.c +++ b/src/gallium/drivers/radeonsi/cik_sdma.c @@ -193,13 +193,13 @@ static void cik_sdma_copy_tile(struct si_context *ctx, } } -void cik_sdma_copy(struct pipe_context *ctx, - struct pipe_resource *dst, - unsigned dst_level, - unsigned dstx, unsigned dsty, unsigned dstz, - struct pipe_resource *src, - unsigned src_level, - const struct pipe_box *src_box) +static void cik_sdma_copy(struct pipe_context *ctx, + struct pipe_resource *dst, + unsigned dst_level, + unsigned dstx, unsigned dsty, unsigned dstz, + struct pipe_resource *src, + unsigned src_level, + const struct pipe_box *src_box) { struct si_context *sctx = (struct si_context *)ctx; struct r600_texture *rsrc = (struct r600_texture*)src; @@ -324,3 +324,8 @@ fallback: si_resource_copy_region(ctx, dst, dst_level, dstx, dsty, dstz, src, src_level, src_box); } + +void cik_init_sdma_functions(struct si_context *sctx) +{ + sctx->b.dma_copy = cik_sdma_copy; +} diff --git a/src/gallium/drivers/radeonsi/si_dma.c b/src/gallium/drivers/radeonsi/si_dma.c index 84961d5..033eb7b 100644 --- a/src/gallium/drivers/radeonsi/si_dma.c +++ b/src/gallium/drivers/radeonsi/si_dma.c @@ -190,13 +190,13 @@ static void si_dma_copy_tile(struct si_context *ctx, } } -void si_dma_copy(struct pipe_context *ctx, -struct pipe_resource *dst, -unsigned dst_level, -unsigned dstx, unsigned dsty, unsigned dstz, -struct pipe_resource *src, -unsigned src_level, -const struct pipe_box *src_box) +static void si_dma_copy(struct pipe_context *ctx, + struct pipe_resource *dst, + unsigned dst_level, + unsigned dstx, unsigned dsty, unsigned dstz, + struct pipe_resource *src, + unsigned src_level, + const struct pipe_box *src_box) { struct si_context *sctx = (struct si_context *)ctx; struct r600_texture *rsrc = (struct r600_texture*)src; @@ -293,3 +293,8 @@ fallback: si_resource_copy_region(ctx, dst, dst_level, dstx, dsty, dstz, src, src_level, src_box); } + +void si_init_dma_functions(struct si_context *sctx) +{ + sctx->b.dma_copy = si_dma_copy; +} diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c index 1a69f43..f2692dc 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.c +++ b/src/gallium/drivers/radeonsi/si_pipe.c @@ -192,6 +192,11 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen, si_init_state_functions(sctx); si_init_shader_functions(sctx); + if (sctx->b.chip_class >= CIK) + cik_init_sdma_functions(sctx); + else + si_init_dma_functions(sctx); + if (sscreen->b.debug_flags & DBG_FORCE_DMA) sctx->b.b.resource_copy_region = sctx->b.dma_copy; diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h index d31e9a9..33d3d25 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.h +++ b/src/gallium/drivers/radeonsi/si_pipe.h @@ -325,13 +325,7 @@ struct si_context { }; /* cik_sdma.c */ -void cik_sdma_copy(struct pipe_context *ctx, - struct pipe_resource *dst, - unsigned dst_level, - unsigned dstx, unsigned dsty, unsigned dstz, - struct pipe_resource *src, - unsigned src_level, - const struct pipe_box *src_box); +void cik_init_sdma_functions(struct si_context *sctx); /* si_blit.c */ void si_init_blit_functions(struct si_context *sctx); @@ -357,13 +351,7 @@ void si_check_vm_faults(struct si_context *sctx); bool si_replace_shader(unsigned num, struct radeon_shader_binary *binary); /* si_dma.c */ -void si_dma_copy(struct pipe_context *ctx, -struct pipe_resource *dst, -unsigned dst_level, -unsigned dstx, unsigned dsty, unsigned dstz, -struct pipe_resource *src, -
[Mesa-dev] [PATCH 07/14] radeonsi: remove SDMA texture copy code
From: Marek Olšák Most of this has never worked according to the new test. The new code will be radically different. --- src/gallium/drivers/radeonsi/cik_sdma.c | 217 +--- 1 file changed, 2 insertions(+), 215 deletions(-) diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c b/src/gallium/drivers/radeonsi/cik_sdma.c index a8caf04..ec3446a 100644 --- a/src/gallium/drivers/radeonsi/cik_sdma.c +++ b/src/gallium/drivers/radeonsi/cik_sdma.c @@ -89,110 +89,6 @@ static void cik_sdma_copy_buffer(struct si_context *ctx, cik_sdma_do_copy_buffer(ctx, dst, src, dst_offset, src_offset, size); } -static void cik_sdma_copy_tile(struct si_context *ctx, - struct pipe_resource *dst, - unsigned dst_level, - struct pipe_resource *src, - unsigned src_level, - unsigned y, - unsigned copy_height, - unsigned y_align, - unsigned pitch, - unsigned bpe) -{ - struct radeon_winsys_cs *cs = ctx->b.dma.cs; - struct r600_texture *rsrc = (struct r600_texture*)src; - struct r600_texture *rdst = (struct r600_texture*)dst; - unsigned dst_mode = rdst->surface.level[dst_level].mode; - unsigned src_mode = rsrc->surface.level[src_level].mode; - bool detile = dst_mode == RADEON_SURF_MODE_LINEAR_ALIGNED; - struct r600_texture *rlinear = detile ? rdst : rsrc; - struct r600_texture *rtiled = detile ? rsrc : rdst; - unsigned linear_lvl = detile ? dst_level : src_level; - unsigned tiled_lvl = detile ? src_level : dst_level; - struct radeon_info *info = &ctx->screen->b.info; - unsigned index = rtiled->surface.tiling_index[tiled_lvl]; - unsigned macro_index = rtiled->surface.macro_tile_index; - unsigned tile_mode = info->si_tile_mode_array[index]; - unsigned macro_mode = info->cik_macrotile_mode_array[macro_index]; - unsigned array_mode, lbpe, pitch_tile_max, slice_tile_max, size; - unsigned ncopy, height, cheight, i; - unsigned sub_op, bank_h, bank_w, mt_aspect, nbanks, tile_split, mt; - uint64_t base, addr; - unsigned pipe_config; - - assert(dst_mode != src_mode); - assert(src_mode == RADEON_SURF_MODE_LINEAR_ALIGNED || dst_mode == RADEON_SURF_MODE_LINEAR_ALIGNED); - - sub_op = CIK_SDMA_COPY_SUB_OPCODE_TILED; - lbpe = util_logbase2(bpe); - pitch_tile_max = ((pitch / bpe) / 8) - 1; - - assert(!util_format_is_depth_and_stencil(rtiled->resource.b.b.format)); - - array_mode = G_009910_ARRAY_MODE(tile_mode); - slice_tile_max = (rtiled->surface.level[tiled_lvl].nblk_x * - rtiled->surface.level[tiled_lvl].nblk_y) / (8*8) - 1; - height = rlinear->surface.level[linear_lvl].nblk_y; - base = rtiled->surface.level[tiled_lvl].offset; - addr = rlinear->surface.level[linear_lvl].offset; - bank_h = G_009990_BANK_HEIGHT(macro_mode); - bank_w = G_009990_BANK_WIDTH(macro_mode); - mt_aspect = G_009990_MACRO_TILE_ASPECT(macro_mode); - /* Non-depth modes don't have TILE_SPLIT set. */ - tile_split = util_logbase2(rtiled->surface.tile_split >> 6); - nbanks = G_009990_NUM_BANKS(macro_mode); - base += rtiled->resource.gpu_address; - addr += rlinear->resource.gpu_address; - - pipe_config = G_009910_PIPE_CONFIG(tile_mode); - mt = G_009910_MICRO_TILE_MODE_NEW(tile_mode); - - size = (copy_height * pitch) / 4; - cheight = copy_height; - if (((cheight * pitch) / 4) > CIK_SDMA_COPY_MAX_SIZE) { - cheight = (CIK_SDMA_COPY_MAX_SIZE * 4) / pitch; - cheight &= ~(y_align - 1); - } - ncopy = (copy_height + cheight - 1) / cheight; - r600_need_dma_space(&ctx->b, ncopy * 12); - - radeon_add_to_buffer_list(&ctx->b, &ctx->b.dma, &rsrc->resource, - RADEON_USAGE_READ, RADEON_PRIO_SDMA_TEXTURE); - radeon_add_to_buffer_list(&ctx->b, &ctx->b.dma, &rdst->resource, - RADEON_USAGE_WRITE, RADEON_PRIO_SDMA_TEXTURE); - - copy_height = size * 4 / pitch; - for (i = 0; i < ncopy; i++) { - cheight = copy_height; - if (((cheight * pitch) / 4) > CIK_SDMA_COPY_MAX_SIZE) { - cheight = (CIK_SDMA_COPY_MAX_SIZE * 4) / pitch; - cheight &= ~(y_align - 1); - } - size = (cheight * pitch) / 4; - - cs->buf[cs->cdw++] = CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY, -sub_op, detile << 15); - cs->buf[cs->cdw++] = base; - cs->buf[cs->cdw++] = base >> 32; - cs->buf[cs->cdw++] = ((height - 1) << 16) | pitch_tile_max; -
[Mesa-dev] [PATCH 08/14] radeonsi: raise the max size for SDMA buffer copies
From: Marek Olšák --- src/gallium/drivers/radeonsi/cik_sdma.c | 4 ++-- src/gallium/drivers/radeonsi/sid.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c b/src/gallium/drivers/radeonsi/cik_sdma.c index ec3446a..88a994e 100644 --- a/src/gallium/drivers/radeonsi/cik_sdma.c +++ b/src/gallium/drivers/radeonsi/cik_sdma.c @@ -46,7 +46,7 @@ static void cik_sdma_do_copy_buffer(struct si_context *ctx, dst_offset += r600_resource(dst)->gpu_address; src_offset += r600_resource(src)->gpu_address; - ncopy = (size + CIK_SDMA_COPY_MAX_SIZE - 1) / CIK_SDMA_COPY_MAX_SIZE; + ncopy = DIV_ROUND_UP(size, CIK_SDMA_COPY_MAX_SIZE); r600_need_dma_space(&ctx->b, ncopy * 7); radeon_add_to_buffer_list(&ctx->b, &ctx->b.dma, rsrc, RADEON_USAGE_READ, @@ -55,7 +55,7 @@ static void cik_sdma_do_copy_buffer(struct si_context *ctx, RADEON_PRIO_SDMA_BUFFER); for (i = 0; i < ncopy; i++) { - csize = size < CIK_SDMA_COPY_MAX_SIZE ? size : CIK_SDMA_COPY_MAX_SIZE; + csize = MIN2(size, CIK_SDMA_COPY_MAX_SIZE); cs->buf[cs->cdw++] = CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY, CIK_SDMA_COPY_SUB_OPCODE_LINEAR, 0); diff --git a/src/gallium/drivers/radeonsi/sid.h b/src/gallium/drivers/radeonsi/sid.h index 9daefdb..a388c5b 100644 --- a/src/gallium/drivers/radeonsi/sid.h +++ b/src/gallium/drivers/radeonsi/sid.h @@ -9021,7 +9021,7 @@ #defineCIK_SDMA_PACKET_SEMAPHORE 0x7 #defineCIK_SDMA_PACKET_CONSTANT_FILL 0xb #defineCIK_SDMA_PACKET_SRBM_WRITE 0xe -#defineCIK_SDMA_COPY_MAX_SIZE 0x1f +#defineCIK_SDMA_COPY_MAX_SIZE 0x3fffe0 #endif /* _SID_H */ -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/14] RadeonSI: SDMA texture copy rewrite
Hi, This patch series completely rewrites texture copying with SDMA for CIK & VI. It only uses the "partial" copy packets, which makes it a lot simpler (one packet per layered/3D copy). Most of the complexity is in handling hw limitations, but luckily the SDMA path can be used in the majority of cases. On top of that, the DMA IB support is improved thanks to extensive testing and benchmarking. This is only enabled on VI for now. It can be enabled on CIK after enough testing is done. There is one more hardware limitation that can cause VM faults with T2L copies and needs a workaround. The exact workaround is still under discussion, but I think this is good enough for review already. On Tonga, it beats the 3D engine in texture upload performance by 0-80%. On APUs, the performance is mixed or slightly worse. The reason may be that the 3D engine uses less bandwidth thanks to DCC. The fact that SDMA can't do DCC compression can be detrimental to texturing performance on VI. The impact hasn't been measured yet. Texture uploads still use 2 copies (user memory -> linear, linear -> tiled). Merging those 2 copies into 1 is not done in this series. Please review and opinions on the DCC issue are welcome, Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Don't follow pow with an instruction with two dest regs.
Matt Turner writes: > Beginning with commit 7b208a73, Unigine Valley began hanging the GPU on > Gen >= 8 platforms. > > Evidently that commit allowed the scheduler to make different choices > that somehow finally ran afoul of a hardware bug in which POW and FDIV > instructions may not be followed by an instruction with two destination > registers (including compressed instructions). I presume the conditions > are more complex than that, but the internal hardware bug report (BDWGFX > bug_de 1696294) does not contain much more information. > > Cc: Francisco Jerez > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94924 > Reviewed-by: Topi Pohjolainen [v1] > Tested-by: Mark Janes [v1] > --- > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > index 8654ca4..ff8d37e 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > @@ -1726,6 +1726,24 @@ fs_generator::generate_code(const cfg_t *cfg, int > dispatch_width) >unsigned int last_insn_offset = p->next_insn_offset; >bool multiple_instructions_emitted = false; > > + /* From the Broadwell PRM, Volume 7, "3D-Media-GPGPU", in the > + * "Register Region Restrictions" section: for BDW, SKL: > + * > + *"A POW/FDIV operation must not be followed by an instruction > + * that requires two destination registers." > + * > + * The documentation is often lacking annotations for Atom parts, > + * and empirically this affects CHV as well. > + */ > + if (devinfo->gen >= 8 && > + p->nr_insn > 1 && > + brw_inst_opcode(devinfo, brw_last_inst) == BRW_OPCODE_MATH && > + brw_inst_math_function(devinfo, brw_last_inst) == > BRW_MATH_FUNCTION_POW && > + inst->dst.component_size(inst->exec_size) > REG_SIZE) { > + brw_NOP(p); > + last_insn_offset = p->next_insn_offset; > + } > + Looks good. This would likely be a good time to check whether the math function is FDIV in addition -- Just to be sure that this hardware bug doesn't bite us again in the future if we ever start using FDIV. Either way patch is: Reviewed-by: Francisco Jerez >if (unlikely(debug_flag)) > annotate(p->devinfo, &annotation, cfg, inst, p->next_insn_offset); > > -- > 2.7.3 signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Don't follow pow with an instruction with two dest regs.
Beginning with commit 7b208a73, Unigine Valley began hanging the GPU on Gen >= 8 platforms. Evidently that commit allowed the scheduler to make different choices that somehow finally ran afoul of a hardware bug in which POW and FDIV instructions may not be followed by an instruction with two destination registers (including compressed instructions). I presume the conditions are more complex than that, but the internal hardware bug report (BDWGFX bug_de 1696294) does not contain much more information. Cc: Francisco Jerez Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94924 Reviewed-by: Topi Pohjolainen [v1] Tested-by: Mark Janes [v1] --- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 18 ++ 1 file changed, 18 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 8654ca4..ff8d37e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -1726,6 +1726,24 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width) unsigned int last_insn_offset = p->next_insn_offset; bool multiple_instructions_emitted = false; + /* From the Broadwell PRM, Volume 7, "3D-Media-GPGPU", in the + * "Register Region Restrictions" section: for BDW, SKL: + * + *"A POW/FDIV operation must not be followed by an instruction + * that requires two destination registers." + * + * The documentation is often lacking annotations for Atom parts, + * and empirically this affects CHV as well. + */ + if (devinfo->gen >= 8 && + p->nr_insn > 1 && + brw_inst_opcode(devinfo, brw_last_inst) == BRW_OPCODE_MATH && + brw_inst_math_function(devinfo, brw_last_inst) == BRW_MATH_FUNCTION_POW && + inst->dst.component_size(inst->exec_size) > REG_SIZE) { + brw_NOP(p); + last_insn_offset = p->next_insn_offset; + } + if (unlikely(debug_flag)) annotate(p->devinfo, &annotation, cfg, inst, p->next_insn_offset); -- 2.7.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 37/59] i965/fs: generalize SIMD16 interference workaround
On Curro's suggestion, I tried reproducing the problem that the patch fixes. Not only could I not reproduce it, but it seems like the original workaround that Jason added back when he reworked the fs backend ages ago is unnecessary at least on my ivybridge and broadwell machines. Since reproducing it involves specific register-allocation choices, I hacked the fs visitor to skip emit_nir_code() and instead produce a specific program with some hardcoded register assignments, then picked a random piglit test (tests/shaders/glsl-fs-abs-neg.shader_test) and ran it with INTEL_DEBUG=no8. The resulting assembly looked like: Native code for unnamed fragment shader GLSL3 SIMD16 shader: 7 instructions. 0 loops. 58 cycles. 0:0 spills:fills. Promoted 0 constants. Compacted 112 to 96 bytes (14%) START B0 mov(16) g10<1>F -1F { align1 1H }; mov(16) g122<1>F1F { align1 1H }; mov(16) g124<1>F[0F, 0F, 0F, 0F]VF { align1 1H compacted }; mov(16) g126<1>F1F { align1 1H }; add(16) g11<1>F g10<8,8,1>F 1F { align1 1H }; mov(16) g120<1>Fg11<8,8,1>F { align1 1H compacted }; sendc(16) null<1>UW g120<8,8,1>F render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0 { align1 1H EOT }; END B0 If the comment Jason added is correct, then add(16) g11<1>F g10<8,8,1>F 1F { align1 1H }; will get expanded into add(8) g11<1>F g10<8,8,1>F 1F { align1 1Q }; add(8) g12<1>F g11<8,8,1>F 1F { align1 2Q }; And the add will happen twice for g11, making some of the pixels yellow instead of green. But on the machines I tested, this resulted in pure green. I also tried a similar thing for fp64, but there was still no problem. I Also tested the similar problem described in fs_inst::has_source_and_destination_hazard() with this program: Native code for unnamed fragment shader GLSL3 SIMD16 shader: 7 instructions. 0 loops. 56 cycles. 0:0 spills:fills. Promoted 0 constants. Compacted 112 to 96 bytes (14%) START B0 mov(1) g10<1>F -1F { align1 WE_all }; mov(16) g122<1>F1F { align1 1H }; mov(16) g124<1>F[0F, 0F, 0F, 0F]VF { align1 1H compacted }; mov(16) g126<1>F1F { align1 1H }; add(16) g10<1>F g10<0,1,0>F 1F { align1 1H }; mov(16) g120<1>Fg10<8,8,1>F { align1 1H compacted }; sendc(16) null<1>UW g120<8,8,1>F render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0 { align1 1H EOT }; END B0 and again the comment seems to be wrong. Also, there were no piglit regressions with the patch reverted, and the only tests broken with the original workarounds reverted are ones that use math instructions which we split to SIMD8 in the generator (but probably shouldn't be). So for now, we can probably drop this patch, and if we can do more testing, then we can probably drop the original workaround or restrict it to older gens. If anyone wants to test this on other platforms than broadwell and ivybridge, the hacks I made are available on my fdo repo on the test-compressed-wa and test-fp64-compressed-wa (the former has two commits, go back to the first to produce the first assembly). On Fri, Apr 29, 2016 at 7:29 AM, Samuel Iglesias Gonsálvez wrote: > From: Connor Abbott > > Work based on registers read/written instead of dispatch_width, so that > the interferences are added for 64-bit sources/destinations as well. > --- > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 67 > --- > 1 file changed, 48 insertions(+), 19 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > index 2347cd5..f0e96f9 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > @@ -541,6 +541,34 @@ setup_mrf_hack_interference(fs_visitor *v, struct > ra_graph *g, > } > } > > +static unsigned > +get_reg_node(const fs_reg& reg, unsigned first_payload_node) > +{ > + switch (reg.file) { > + case VGRF: > + return reg.nr; > + case ARF: > + case FIXED_GRF: > + return first_payload_node + reg.nr; > + case MRF: > + default: > + unreachable("unhandled register file"); > + } > + > + return 0; > +} > + > +static void > +add_reg_interference(struct ra_graph *g, const fs_reg& reg1, > + const fs_reg& reg2, unsigned first_payload_node) > +{ > + if ((reg1.file == VGRF || reg1.file == ARF || reg1.file == FIXED_GRF) && > + (reg2
[Mesa-dev] [PATCH 1/2] meta/blit: Don't blend integer values during MSAA resolves
--- src/mesa/drivers/common/meta_blit.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c index 6761238..bb79c46 100644 --- a/src/mesa/drivers/common/meta_blit.c +++ b/src/mesa/drivers/common/meta_blit.c @@ -458,8 +458,17 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, int step; if (src_datatype == GL_INT || src_datatype == GL_UNSIGNED_INT) { -merge_function = - "gvec4 merge(gvec4 a, gvec4 b) { return (a >> gvec4(1)) + (b >> gvec4(1)) + (a & b & gvec4(1)); }\n"; +/* From the OpenGL ES 3.2 spec section 16.2.1: + * + *"If the source formats are integer types or stencil values, + *a single sample’s value is selected for each pixel." + * + * The OpenGL 4.4 spec contains exactly the same language. + * + * We can accomplish this by making the merge function return just + * one of the two samples. The compiler should do the rest. + */ +merge_function = "gvec4 merge(gvec4 a, gvec4 b) { return a; }\n"; } else { /* The divide will happen at the end for floats. */ merge_function = -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965/borp: Don't blend integer values during MSAA resolves
--- src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 30 ++-- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp index 882f9ed..dc68eab 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp @@ -2160,6 +2160,8 @@ brw_blorp_blit_program::manual_blend_average(unsigned num_samples) if (key->tex_layout == INTEL_MSAA_LAYOUT_CMS) mcs_fetch(); + assert(key->texture_data_type == BRW_REGISTER_TYPE_F); + /* We add together samples using a binary tree structure, e.g. for 4x MSAA: * * result = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4 @@ -2228,8 +2230,7 @@ brw_blorp_blit_program::manual_blend_average(unsigned num_samples) /* TODO: should use a smaller loop bound for non_RGBA formats */ for (int k = 0; k < 4; ++k) { -emit_combine(key->texture_data_type == BRW_REGISTER_TYPE_F ? -BRW_OPCODE_ADD : BRW_OPCODE_AVG, +emit_combine(BRW_OPCODE_ADD, offset(texture_data[stack_depth - 1], 2*k), offset(vec8(texture_data[stack_depth - 1]), 2*k), offset(vec8(texture_data[stack_depth]), 2*k)); @@ -2240,14 +2241,12 @@ brw_blorp_blit_program::manual_blend_average(unsigned num_samples) /* We should have just 1 sample on the stack now. */ assert(stack_depth == 1); - if (key->texture_data_type == BRW_REGISTER_TYPE_F) { - /* Scale the result down by a factor of num_samples */ - /* TODO: should use a smaller loop bound for non-RGBA formats */ - for (int j = 0; j < 4; ++j) { - emit_mul(offset(texture_data[0], 2*j), - offset(vec8(texture_data[0]), 2*j), - brw_imm_f(1.0f / num_samples)); - } + /* Scale the result down by a factor of num_samples */ + /* TODO: should use a smaller loop bound for non-RGBA formats */ + for (int j = 0; j < 4; ++j) { + emit_mul(offset(texture_data[0], 2*j), + offset(vec8(texture_data[0]), 2*j), + brw_imm_f(1.0f / num_samples)); } if (key->tex_layout == INTEL_MSAA_LAYOUT_CMS) @@ -2869,8 +2868,17 @@ brw_blorp_blit_miptrees(struct brw_context *brw, GLenum base_format = _mesa_get_format_base_format(src_mt->format); if (base_format != GL_DEPTH_COMPONENT && /* TODO: what about depth/stencil? */ base_format != GL_STENCIL_INDEX && + !_mesa_is_format_integer(src_mt->format) && src_mt->num_samples > 1 && dst_mt->num_samples <= 1) { - /* We are downsampling a color buffer, so blend. */ + /* We are downsampling a non-integer color buffer, so blend. + * + * Regarding integer color buffers, the OpenGL ES 3.2 spec says: + * + *"If the source formats are integer types or stencil values, a + *single sample’s value is selected for each pixel." + * + * This implies we should not blend in that case. + */ wm_prog_key.blend = true; } -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] i965: Add infrastucture for sample lod-zero operations.
--- src/mesa/drivers/dri/i965/brw_defines.h | 5 + src/mesa/drivers/dri/i965/brw_disasm.c | 3 +++ src/mesa/drivers/dri/i965/brw_fs.cpp| 3 +++ src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 14 ++ src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 2 ++ src/mesa/drivers/dri/i965/brw_shader.cpp| 6 ++ 6 files changed, 33 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 60b696c..ba22363 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -977,8 +977,10 @@ enum opcode { SHADER_OPCODE_TXD_LOGICAL, SHADER_OPCODE_TXF, SHADER_OPCODE_TXF_LOGICAL, + SHADER_OPCODE_TXF_LZ, SHADER_OPCODE_TXL, SHADER_OPCODE_TXL_LOGICAL, + SHADER_OPCODE_TXL_LZ, SHADER_OPCODE_TXS, SHADER_OPCODE_TXS_LOGICAL, FS_OPCODE_TXB, @@ -1636,6 +1638,9 @@ enum brw_message_target { #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO 17 #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C 18 #define HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE 20 +#define GEN9_SAMPLER_MESSAGE_SAMPLE_LZ 24 +#define GEN9_SAMPLER_MESSAGE_SAMPLE_C_LZ 25 +#define GEN9_SAMPLER_MESSAGE_SAMPLE_LD_LZ26 #define GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W 28 #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS 29 #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS 30 diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c index 1778419..5d25774 100644 --- a/src/mesa/drivers/dri/i965/brw_disasm.c +++ b/src/mesa/drivers/dri/i965/brw_disasm.c @@ -550,6 +550,9 @@ static const char *const gen5_sampler_msg_type[] = { [GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO] = "gather4_po", [GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C] = "gather4_po_c", [HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE] = "sample_d_c", + [GEN9_SAMPLER_MESSAGE_SAMPLE_LZ] = "sample_lz", + [GEN9_SAMPLER_MESSAGE_SAMPLE_C_LZ] = "sample_c_lz", + [GEN9_SAMPLER_MESSAGE_SAMPLE_LD_LZ]= "ld_lz", [GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W] = "ld2dms_w", [GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS] = "ld_mcs", [GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS] = "ld2dms", diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 18760dd..d233265 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -949,12 +949,14 @@ fs_visitor::implied_mrf_writes(fs_inst *inst) case FS_OPCODE_TXB: case SHADER_OPCODE_TXD: case SHADER_OPCODE_TXF: + case SHADER_OPCODE_TXF_LZ: case SHADER_OPCODE_TXF_CMS: case SHADER_OPCODE_TXF_CMS_W: case SHADER_OPCODE_TXF_MCS: case SHADER_OPCODE_TG4: case SHADER_OPCODE_TG4_OFFSET: case SHADER_OPCODE_TXL: + case SHADER_OPCODE_TXL_LZ: case SHADER_OPCODE_TXS: case SHADER_OPCODE_LOD: case SHADER_OPCODE_SAMPLEINFO: @@ -4156,6 +4158,7 @@ lower_sampler_logical_send_gen7(const fs_builder &bld, fs_inst *inst, opcode op, coordinate_done = true; break; + case SHADER_OPCODE_TXF_CMS: case SHADER_OPCODE_TXF_CMS_W: case SHADER_OPCODE_TXF_UMS: diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 8654ca4..5867a48 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -792,6 +792,14 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LOD; } break; + case SHADER_OPCODE_TXL_LZ: + assert(devinfo->gen >= 9); +if (inst->shadow_compare) { +msg_type = GEN9_SAMPLER_MESSAGE_SAMPLE_C_LZ; + } else { +msg_type = GEN9_SAMPLER_MESSAGE_SAMPLE_LZ; + } + break; case SHADER_OPCODE_TXS: msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO; break; @@ -807,6 +815,10 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src case SHADER_OPCODE_TXF: msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD; break; + case SHADER_OPCODE_TXF_LZ: + assert(devinfo->gen >= 9); + msg_type = GEN9_SAMPLER_MESSAGE_SAMPLE_LD_LZ; + break; case SHADER_OPCODE_TXF_CMS_W: assert(devinfo->gen >= 9); msg_type = GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W; @@ -2115,11 +2127,13 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width) case FS_OPCODE_TXB: case SHADER_OPCODE_TXD: case SHADER_OPCODE_TXF: + case SHADER_OPCODE_TXF_LZ: case SHADER_OPCODE_TXF_CMS: case SHADER_OPCODE_TXF_CMS_W: case SHADER_OPCODE_TXF_UMS: case SHADER_OPCODE_TXF_MCS: case SHADER_OPCODE_TXL: + case SHADER_OPCODE_TXL_LZ: case
[Mesa-dev] [PATCH 3/3] i965/fs: Recognize and emit ld_lz, sample_lz, sample_c_lz.
Ken suggested instead of a big and complicated optimization pass, to just recognize the operations here. It's certainly less code and a lot prettier, but it seems to actually perform worse for currently unknown reasons. total instructions in shared programs: 8514403 -> 8495373 (-0.22%) instructions in affected programs: 809512 -> 790482 (-2.35%) helped: 3316 HURT: 10 total cycles in shared programs: 64259830 -> 63979404 (-0.44%) cycles in affected programs: 10511460 -> 10231034 (-2.67%) helped: 2339 HURT: 771 total spills in shared programs: 1707 -> 1695 (-0.70%) spills in affected programs: 90 -> 78 (-13.33%) helped: 4 total fills in shared programs: 2647 -> 2620 (-1.02%) fills in affected programs: 174 -> 147 (-15.52%) helped: 4 LOST: 12 GAINED: 36 --- Here's a word diff of the shader-db results from the previous series (before) compared with this series (after) diff --git a/before b/after index e856a62..e2de982 100644 --- a/before +++ b/after @@ -1,27 +1,27 @@ total instructions in shared programs: [-8522163-]{+8514403+} -> [-8502879 (-0.23%)-]{+8495373 (-0.22%)+} instructions in affected programs: [-801772-]{+809512+} -> [-782488 (-2.41%)-]{+790482 (-2.35%)+} helped: 3316 HURT: [-0-]{+10+} total cycles in shared programs: [-64374090-]{+64259830+} -> [-64070042 (-0.47%)-]{+63979404 (-0.44%)+} cycles in affected programs: [-10100336-]{+10511460+} -> [-9796288 (-3.01%)-]{+10231034 (-2.67%)+} helped: [-2351-]{+2339+} HURT: [-735-]{+771+} total loops in shared programs: [-2209-]{+2199+} -> [-2209-]{+2199+} (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 1707 -> [-1683 (-1.41%)-]{+1695 (-0.70%)+} spills in affected programs: 90 -> [-66 (-26.67%)-]{+78 (-13.33%)+} helped: 4 HURT: 0 total fills in shared programs: 2647 -> [-2606 (-1.55%)-]{+2620 (-1.02%)+} fills in affected programs: 174 -> [-133 (-23.56%)-]{+147 (-15.52%)+} helped: 4 HURT: 0 LOST: [-2-]{+12+} GAINED: 36 src/mesa/drivers/dri/i965/brw_fs.cpp | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index d233265..202e0f7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -4096,6 +4096,10 @@ lower_sampler_logical_send_gen7(const fs_builder &bld, fs_inst *inst, opcode op, switch (op) { case FS_OPCODE_TXB: case SHADER_OPCODE_TXL: + if (devinfo->gen >= 9 && op == SHADER_OPCODE_TXL && lod.is_zero()) { + op = SHADER_OPCODE_TXL_LZ; + break; + } bld.MOV(sources[length], lod); length++; break; @@ -4147,8 +4151,12 @@ lower_sampler_logical_send_gen7(const fs_builder &bld, fs_inst *inst, opcode op, length++; } - bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_D), lod); - length++; + if (devinfo->gen >= 9 && lod.is_zero()) { + op = SHADER_OPCODE_TXF_LZ; + } else { + bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_D), lod); + length++; + } for (unsigned i = devinfo->gen >= 9 ? 2 : 1; i < coord_components; i++) { bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_D), coordinate); -- 2.7.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] i965/fs: Add and use get_nir_src_imm().
Terrible name. Suggest something else, or even a better way to do this. Basically, the next patch wants to inspect the LOD argument and do something different if it's 0.0f. But at that point we've emitted a MOV for it and we just have a register to look at. --- This is an alternative approach to the 4 patch series I sent ast night. src/mesa/drivers/dri/i965/brw_fs.h | 1 + src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 36 +++- 2 files changed, 32 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index ba6bd3f..29fa593 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -273,6 +273,7 @@ public: void nir_emit_jump(const brw::fs_builder &bld, nir_jump_instr *instr); fs_reg get_nir_src(nir_src src); + fs_reg get_nir_src_imm(nir_src src); fs_reg get_nir_dest(nir_dest dest); fs_reg get_nir_image_deref(const nir_deref_var *deref); fs_reg get_indirect_offset(nir_intrinsic_instr *instr); diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 4d14fda..fb59800 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -1245,6 +1245,32 @@ fs_visitor::get_nir_src(nir_src src) } fs_reg +fs_visitor::get_nir_src_imm(nir_src src) +{ + fs_reg reg; + if (src.is_ssa) { + if (src.ssa->parent_instr->type == nir_instr_type_load_const) { + nir_load_const_instr *load_const = +nir_instr_as_load_const(src.ssa->parent_instr); + return brw_imm_ud(load_const->value.u32[0]); + } + + reg = nir_ssa_values[src.ssa->index]; + } else { + /* We don't handle indirects on locals */ + assert(src.reg.indirect == NULL); + reg = offset(nir_locals[src.reg.reg->index], bld, + src.reg.base_offset * src.reg.reg->num_components); + } + + /* to avoid floating-point denorm flushing problems, set the type by +* default to D - instructions that need floating point semantics will set +* this to F if they need to +*/ + return retype(reg, BRW_REGISTER_TYPE_D); +} + +fs_reg fs_visitor::get_nir_dest(nir_dest dest) { if (dest.is_ssa) { @@ -3444,7 +3470,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr) fs_reg src = get_nir_src(instr->src[i].src); switch (instr->src[i].src_type) { case nir_tex_src_bias: - lod = retype(src, BRW_REGISTER_TYPE_F); + lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_F); break; case nir_tex_src_comparitor: shadow_comparitor = retype(src, BRW_REGISTER_TYPE_F); @@ -3462,7 +3488,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr) } break; case nir_tex_src_ddx: - lod = retype(src, BRW_REGISTER_TYPE_F); + lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_F); lod_components = nir_tex_instr_src_size(instr, i); break; case nir_tex_src_ddy: @@ -3471,13 +3497,13 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr) case nir_tex_src_lod: switch (instr->op) { case nir_texop_txs: -lod = retype(src, BRW_REGISTER_TYPE_UD); +lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_UD); break; case nir_texop_txf: -lod = retype(src, BRW_REGISTER_TYPE_D); +lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_D); break; default: -lod = retype(src, BRW_REGISTER_TYPE_F); +lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_F); break; } break; -- 2.7.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeonsi: set DECOMPRESS_Z_ON_FLUSH if nr_samples >= 4
From: Marek Olšák Vulkan always sets this. It only affects in-place Z decompression. This is recommended for performance, but what app uses MSAA depth texturing? --- src/gallium/drivers/radeonsi/si_state.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index 81a4341..c6e10b7 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -1104,7 +1104,8 @@ static void si_emit_db_render_state(struct si_context *sctx, struct r600_atom *s /* DB_RENDER_OVERRIDE2 */ radeon_set_context_reg(cs, R_028010_DB_RENDER_OVERRIDE2, S_028010_DISABLE_ZMASK_EXPCLEAR_OPTIMIZATION(sctx->db_depth_disable_expclear) | - S_028010_DISABLE_SMEM_EXPCLEAR_OPTIMIZATION(sctx->db_stencil_disable_expclear)); + S_028010_DISABLE_SMEM_EXPCLEAR_OPTIMIZATION(sctx->db_stencil_disable_expclear) | + S_028010_DECOMPRESS_Z_ON_FLUSH(sctx->framebuffer.nr_samples >= 4)); db_shader_control = S_02880C_ALPHA_TO_MASK_DISABLE(sctx->framebuffer.cb0_is_integer) | sctx->ps_db_shader_control; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #16 from Jose Fonseca --- (In reply to David Lonie from comment #12) > Just tried that here, no such luck: > > $ ln -s libGL.so libGL.so.1 > $ LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH apitrace trace --api gl > ./vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling" > > apitrace: loaded into /usr/bin/apitrace apitrace is being injected into apitrace... This can only happen if you have stuff you shouldn't have on LD_PRELOAD and LD_LIBRARY_PATH variables... :-/ Please start from a clean terminal. Don't do anything other than I described. In particular don't set LD_PRELOAD, and only set LD_LIBRARY_PATH=$PWD. Honestly it's better not to fiddle with LD_PRELOAD directly yourself at all. It just makes everything unnecesarily harder. Just let the `apitrace trace` command do it. I repeat: - the right way of loading libGL.so is to set LD_LIBRARY_PATH=/path/to/where/libGL.so.1/lives/ - the right way of tracing is to `apitrace trace myapp` - and the right way of doing both is to export LD_LIBRARY_PATH=/path/to/where/libGL.so.1/lives/ apitrace trace myapp (Also note, the right name for libGL is not libGL.so as you had in the tarball. It needs to be libGL.so.1. I suspect that this mismatch is what lead you to resort to LD_PRELOAD to start with. But if the name is libGL.so.1 that should never be necessary.) -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #15 from Emil Velikov --- Just pointing out something that may not be that obvious - David is using the xlib powered gallium libGL. That one uses the (iirc) unsupported split shared LLVM libraries. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #14 from David Lonie --- (In reply to Ilia Mirkin from comment #13) > FWIW this is what it looks like on i965: http://i.imgur.com/LMmcGkM.png > > I'm guessing that's correct, but the "correct" screenshot is inside an > archive and I'm lazy. It is indeed totally messed up with llvmpipe. Yep, that's what it should look like. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCHv2 10/23] i965/fs: Stop using the LOAD_PAYLOAD instruction in lower_simd_width.
Michael Schellenberger Costa writes: > Hi Curro, > > Am 04.05.2016 um 06:26 schrieb Francisco Jerez: >> Instead of using the LOAD_PAYLOAD instruction (emitted through the >> emit_transpose() helper that is no longer useful and this commit >> removes) which had to be marked force_writemask_all in some cases, >> emit a series of moves to apply proper channel enable signals to the >> destination. Until now lower_simd_width() had mainly been used to >> lower things that invariably had a basic block-local temporary as >> destination so it didn't seem like a big deal, but I found it to be >> the reason for several Piglit regressions in my SIMD32 branch and >> Igalia discovered the same issue independently while working on FP64 >> support. >> --- >> This is taken from the following WIP series: >> >> https://cgit.freedesktop.org/~currojerez/mesa/log/?h=i965-late-simd-lowering >> >> See also: >> https://lists.freedesktop.org/archives/mesa-dev/2016-May/115596.html >> >> src/mesa/drivers/dri/i965/brw_fs.cpp | 58 >> +++- >> 1 file changed, 18 insertions(+), 40 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> b/src/mesa/drivers/dri/i965/brw_fs.cpp >> index 4b6aa67..34599cb 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> @@ -4625,29 +4625,6 @@ get_lowered_simd_width(const struct brw_device_info >> *devinfo, >> } >> } >> >> -/** >> - * The \p rows array of registers represents a \p num_rows by \p num_columns >> - * matrix in row-major order, write it in column-major order into the >> register >> - * passed as destination. \p stride gives the separation between matrix >> - * elements in the input in fs_builder::dispatch_width() units. >> - */ >> -static void >> -emit_transpose(const fs_builder &bld, >> - const fs_reg &dst, const fs_reg *rows, >> - unsigned num_rows, unsigned num_columns, unsigned stride) >> -{ >> - fs_reg *const components = new fs_reg[num_rows * num_columns]; >> - >> - for (unsigned i = 0; i < num_columns; ++i) { >> - for (unsigned j = 0; j < num_rows; ++j) >> - components[num_rows * i + j] = offset(rows[j], bld, stride * i); >> - } >> - >> - bld.LOAD_PAYLOAD(dst, components, num_rows * num_columns, 0); >> - >> - delete[] components; >> -} >> - >> bool >> fs_visitor::lower_simd_width() >> { >> @@ -4698,16 +4675,19 @@ fs_visitor::lower_simd_width() >> if (inst->src[j].file != BAD_FILE && >> !is_uniform(inst->src[j])) { >>/* Get the i-th copy_width-wide chunk of the source. */ >> - const fs_reg src = horiz_offset(inst->src[j], copy_width >> * i); >> + const fs_builder cbld = lbld.group(copy_width, 0); >> + const fs_reg src = offset(inst->src[j], cbld, i); >>const unsigned src_size = inst->components_read(j); >> >> - /* Use a trivial transposition to copy one every n >> - * copy_width-wide components of the register into a >> - * temporary passed as source to the lowered instruction. >> + /* Copy one every n copy_width-wide components of the > That first sentence is really unclear. Shouldnt it sound "Copy each > copy_width-wide component of..." > Nope, it really is copying one every n copy_width chunks of the register, any suggestions to make the wording more clear? > Michael >> + * register into a temporary passed as source to the >> lowered >> + * instruction. >> */ >>split_inst.src[j] = lbld.vgrf(inst->src[j].type, >> src_size); >> - emit_transpose(lbld.group(copy_width, 0), >> - split_inst.src[j], &src, 1, src_size, n); >> + >> + for (unsigned k = 0; k < src_size; ++k) >> + cbld.MOV(offset(split_inst.src[j], lbld, k), >> + offset(src, cbld, n * k)); >> } >> } >> >> @@ -4726,20 +4706,18 @@ fs_visitor::lower_simd_width() >> } >> >> if (inst->regs_written) { >> -/* Distance between useful channels in the temporaries, skipping >> - * garbage if the lowered instruction is wider than the >> original. >> - */ >> -const unsigned m = lower_width / copy_width; >> +const fs_builder lbld = ibld.group(lower_width, 0); >> >> /* Interleave the components of the result from the lowered >> - * instructions. We need to set exec_all() when copying more >> than >> - * one half per component, because LOAD_PAYLOAD (in terms of >> which >> - * emit_transpose is implemented) can only use the same channel >> - * enable signals for all of its non-header sources. >> + * ins
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #13 from Ilia Mirkin --- FWIW this is what it looks like on i965: http://i.imgur.com/LMmcGkM.png I'm guessing that's correct, but the "correct" screenshot is inside an archive and I'm lazy. It is indeed totally messed up with llvmpipe. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #12 from David Lonie --- (In reply to Jose Fonseca from comment #10) > The proper way of using a custom libGL.so is to simply set LD_LIBRARY_PATH > to the directory that has it: > > ln -sf libGL.so libGL.so,1 > LD_LIBRARY_PATH=$PWD ./vtkRenderingCoreCxxTests > "TestTranslucentLUTDepthPeeling" -I Just tried that here, no such luck: $ ln -s libGL.so libGL.so.1 $ LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH apitrace trace --api gl ./vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling" apitrace: loaded into /usr/bin/apitrace apitrace: unloaded from /usr/bin/apitrace apitrace: loaded into /ssd/src/VTK/build-tmp/mesa_bug/vtkRenderingCoreCxxTests apitrace: tracing to /ssd/src/VTK/build-tmp/mesa_bug/vtkRenderingCoreCxxTests.18.trace vtkRenderingCoreCxxTests: /build/apitrace/src/apitrace-7.1/build/wrappers/glxtrace.cpp:97195: void (* _wrapProcAddress(const GLubyte*, __GLXextFuncPtr))(): Assertion `procPtr != (__GLXextFuncPtr)&glXCreateContextAttribsARB' failed. Looks like you did get a trace from it, though, so I wonder what's different. Thanks for uploading that, hopefully that will make it easier to track down! -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #11 from Jose Fonseca --- Created attachment 123472 --> https://bugs.freedesktop.org/attachment.cgi?id=123472&action=edit TestTranslucentLUTDepthPeeling.trace Trace. -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 27/59] i965/fs: add a stride helper
Iago Toral writes: > On Wed, 2016-05-04 at 01:15 -0700, Francisco Jerez wrote: >> Iago Toral writes: >> >> > On Mon, 2016-05-02 at 18:48 -0700, Francisco Jerez wrote: >> >> Samuel Iglesias Gonsálvez writes: >> >> >> >> > From: Connor Abbott >> >> > >> >> > Similar to retype() and offset(). >> >> > --- >> >> > src/mesa/drivers/dri/i965/brw_ir_fs.h | 8 >> >> > 1 file changed, 8 insertions(+) >> >> > >> >> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h >> >> > b/src/mesa/drivers/dri/i965/brw_ir_fs.h >> >> > index e4f20f4..abda2c3 100644 >> >> > --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h >> >> > +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h >> >> > @@ -78,6 +78,14 @@ retype(fs_reg reg, enum brw_reg_type type) >> >> > } >> >> > >> >> > static inline fs_reg >> >> > +stride(fs_reg reg, unsigned stride) >> >> > +{ >> >> > + if (reg.stride != 0) >> >> > + reg.stride = stride; >> >> > + return reg; >> >> > +} >> >> > + >> >> >> >> This only works if reg.stride == 0 or 1, we need to honour the stride of >> >> the original register (e.g. by doing reg.stride *= stride) or you'll end >> >> up taking components not part of the region given as argument. It gets >> >> messy with ARF and HW registers because they represent the stride >> >> differently... I suggest that instead of fixing this you take the >> >> following patch from my SIMD32 branch that introduces a somewhat easier >> >> to use helper: Instead of doing 'stride(horiz_offset(retype(reg, t), i), >> >> type_sz(reg.type) / type_sz(t))' to take the i-th subcomponent of type t >> >> of the original register you would just do 'subscript(reg, t, i)'. >> > >> > Actually, I think this is not what we want in a number of places. >> > >> > Sometimes we just want to apply a stride to a register, no type changes >> > or anything. That is, we just want to do something like stride(reg, 2). >> > Of course, we can code that with subscript as subscript(reg, retype(reg, >> > type_that is_half_the_size), 0) but that is certainly forced. For >> > example, we have code such as: >> > >> > bld.MOV(tmp, stride(component_i, 2)); >> > bld.MOV(horiz_offset(tmp, 8 * multiplier), >> > stride(horiz_offset(component_i, 1), 2)); >> > >> Oh no, this is precisely the use-case I had in mind: component_i is a >> vector of 64-bit channels laid out contiguously, and you want to extract >> each 32-bit field from each channel separately. You happen to have >> declared component_i as a 32-bit type but that's kind of a lie because >> each component of "component_i" only represents half a channel, it would >> definitely make more sense to make it a DF because that's what it >> contains... > > I think that whether component_i is really 64-bit or 32-bit is not > relevant, in the sense that what matters is how the algorithm > needs/wants to interpret the data it has to manipulate. The way I > thought about this is that we have our 32-bit channels mixed up and we > need to re-arrange them, so from my perspective, this is all about > 32-bit data that we need to move around in a particular fashion. Keep in mind that channel masking is applied to component_i in multiples of 64 bits regardless of the type you declare it with, so if you forget the fact that component_i is a vector of elements 64 bits apart from each other and pretend it's a vector of 32 bit channels, you'll end up crossing channel boundaries illegally and miscompiling the program (or working it around with ->force_writemask_all hacks like you do in the SHUFFLE functions...). subscript() guarantees that you never have to break channel boundaries not even temporarily -- If the register you give it as argument was channel-aligned the result is guaranteed channel-aligned. stride() OTOH gives you a channel-misaligned result you need to compensate for with a matching retype(), so pretty much every use of the result from stride() other than as argument for retype() indicates a bug, that's why it makes sense to do both things in the same operation. Otherwise you need to deal with a temporary value which is basically meaningless, and you need to manually make sure that the value provided as stride multiplier matches the ratio of the type sizes exactly (the snippet you paste below from PACK lowering illustrates the problem since it would miscompile the program silently if the instruction didn't have the exact number of sources you expected). > That's why I find the stride() helper a lot more natural to use, where > I just tell the access type and the stride explicitly, while with > subscript() I have to think about a register type and an access type > instead and the stride is a consequence of that relation. I admit this > is very subjective though. > Sorry, but this wasn't just a codestyle suggestion, most of the uses of stride() and horiz_offset() you have in both series will be broken for a certain input of fully well-formed IR due to at least one out of three different reasons (horiz_offset() not b
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #10 from Jose Fonseca --- (In reply to Jose Fonseca from comment #8) > It looks like apitrace is about to enter an infinite loop: the address of > the supposedly "real" glXCreateContextAttribsARB function is actually > pointing to the fake glXCreateContextAttribsARB wrapper function. > > I'll take a look. Thanks for trying. > > > It looks like doing `MESA_GL_VERSION_OVERRIDE=4.5 glretrace -S frame > vtkRenderingOpenGL2CxxTests.4075.trim.trace` works. I suspect problem is that you're loading Mesa libGL.so via LD_PRELOAD. From attached mesa_bug.tar.bz2: $ cat run-mesa.sh #!/bin/bash echo "Running test against mesa libGL." LD_PRELOAD=libGL.so ./vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling" -I This will defintely interfere with apitrace since apitrace needs to use LD_PRELOAD to inject. The proper way of using a custom libGL.so is to simply set LD_LIBRARY_PATH to the directory that has it: ln -sf libGL.so libGL.so,1 LD_LIBRARY_PATH=$PWD ./vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling" -I and all should work fine. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #9 from Roland Scheidegger --- FWIW it was not the samplers returning NaN for the float texture, but rather the _coords_ being NaNs (thus the results from sampling could be anything, though it is possible some implementations would use 0.0 as coord instead). But those coords came from the vs, which seemed simple enough. So possibly the NaNs might have been the result of a previous render pass. (It is of course also possible some shader is simply bogus and relying on NaNs getting flushed to zero somewhere on output.) It would indeed be helpful to know if hw drivers are affected too. (Using vtk sources to reproduce is pretty easy.) -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #8 from Jose Fonseca --- (In reply to David Lonie from comment #7) > Is this issue familiar to anyone? I'll keep trying to get this working, but > would appreciate any tips if this is a known issue :) It looks like apitrace is about to enter an infinite loop: the address of the supposedly "real" glXCreateContextAttribsARB function is actually pointing to the fake glXCreateContextAttribsARB wrapper function. I'll take a look. Thanks for trying. It looks like doing `MESA_GL_VERSION_OVERRIDE=4.5 glretrace -S frame vtkRenderingOpenGL2CxxTests.4075.trim.trace` works. -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glapi: fix parameter type for GetSamplerParameterIuivEXT() in es_EXT.xml
Missed that. Will do. I should probably also cc this for 11.2 Thanks. -Brian On 05/04/2016 02:05 PM, Ilia Mirkin wrote: My bad (iirc)... copy/paste error. Also need to fix the OES variant. With that, Reviewed-by: Ilia Mirkin On Wed, May 4, 2016 at 4:03 PM, Brian Paul wrote: The function returns GLuint, not GLfloat values. --- src/mapi/glapi/gen/es_EXT.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml index bf67eae..e453401 100644 --- a/src/mapi/glapi/gen/es_EXT.xml +++ b/src/mapi/glapi/gen/es_EXT.xml @@ -901,7 +901,7 @@ - + -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=BQIBaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=T0t4QG7chq2ZwJo6wilkFznRSFy-8uDKartPGbomVj8&m=AXoK8Bn0E4KSNMa9kHkvvoyH6_V9nNSXmrZwVVtLrU4&s=a8BcTju1sst0Dp04U3N4IgadHfryV8dptFRg_ldFeek&e= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glapi: fix parameter type for GetSamplerParameterIuivEXT() in es_EXT.xml
Reviewed-by: Charmaine Lee From: Brian Paul Sent: Wednesday, May 4, 2016 1:03 PM To: mesa-dev@lists.freedesktop.org Cc: Charmaine Lee; Sinclair Yeh Subject: [PATCH] glapi: fix parameter type for GetSamplerParameterIuivEXT() in es_EXT.xml The function returns GLuint, not GLfloat values. --- src/mapi/glapi/gen/es_EXT.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml index bf67eae..e453401 100644 --- a/src/mapi/glapi/gen/es_EXT.xml +++ b/src/mapi/glapi/gen/es_EXT.xml @@ -901,7 +901,7 @@ - + -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glapi: fix parameter type for GetSamplerParameterIuivEXT() in es_EXT.xml
My bad (iirc)... copy/paste error. Also need to fix the OES variant. With that, Reviewed-by: Ilia Mirkin On Wed, May 4, 2016 at 4:03 PM, Brian Paul wrote: > The function returns GLuint, not GLfloat values. > --- > src/mapi/glapi/gen/es_EXT.xml | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml > index bf67eae..e453401 100644 > --- a/src/mapi/glapi/gen/es_EXT.xml > +++ b/src/mapi/glapi/gen/es_EXT.xml > @@ -901,7 +901,7 @@ > alias="GetSamplerParameterIuiv"> > > > - > + > > > > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: include texture format in glGenerateMipmap error message
Reviewed-by: Charmaine Lee From: Brian Paul Sent: Wednesday, May 4, 2016 1:03 PM To: mesa-dev@lists.freedesktop.org Cc: Charmaine Lee; Sinclair Yeh Subject: [PATCH] mesa: include texture format in glGenerateMipmap error message --- src/mesa/main/genmipmap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/main/genmipmap.c b/src/mesa/main/genmipmap.c index 1a6ae9a..d917220 100644 --- a/src/mesa/main/genmipmap.c +++ b/src/mesa/main/genmipmap.c @@ -139,7 +139,8 @@ _mesa_generate_texture_mipmap(struct gl_context *ctx, srcImage->InternalFormat)) { _mesa_unlock_texture(ctx, texObj); _mesa_error(ctx, GL_INVALID_OPERATION, - "glGenerate%sMipmap(invalid internal format)", suffix); + "glGenerate%sMipmap(invalid internal format %s)", suffix, + _mesa_enum_to_string(srcImage->InternalFormat)); return; } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glapi: fix parameter type for GetSamplerParameterIuivEXT() in es_EXT.xml
The function returns GLuint, not GLfloat values. --- src/mapi/glapi/gen/es_EXT.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml index bf67eae..e453401 100644 --- a/src/mapi/glapi/gen/es_EXT.xml +++ b/src/mapi/glapi/gen/es_EXT.xml @@ -901,7 +901,7 @@ - + -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: include texture format in glGenerateMipmap error message
--- src/mesa/main/genmipmap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/main/genmipmap.c b/src/mesa/main/genmipmap.c index 1a6ae9a..d917220 100644 --- a/src/mesa/main/genmipmap.c +++ b/src/mesa/main/genmipmap.c @@ -139,7 +139,8 @@ _mesa_generate_texture_mipmap(struct gl_context *ctx, srcImage->InternalFormat)) { _mesa_unlock_texture(ctx, texObj); _mesa_error(ctx, GL_INVALID_OPERATION, - "glGenerate%sMipmap(invalid internal format)", suffix); + "glGenerate%sMipmap(invalid internal format %s)", suffix, + _mesa_enum_to_string(srcImage->InternalFormat)); return; } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/59] Initial arb_gpu_shader_fp64 support to the i965 scalar backend
Samuel Iglesias Gonsálvez writes: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > > > On 03/05/16 20:59, Kenneth Graunke wrote: >> Other than patches 37, 56, and ones you agreed to drop, the series >> is: Reviewed-by: Kenneth Graunke >> >> I think you can go ahead and land all except those, and we can >> land new solutions for those problems afterwards. >> >> We still need to fix the horiz_offset problem, and I think Curro's >> subscript() helper is a great way to do that. I would be fine >> with fixing that after this series lands - it would probably mean >> less rebasing :) >> > > We prepared a patch series without patches 11, 22 (both were dropped), > 37, 40 (it has a NACK from Curro) and 56. Our idea was to push the > patch series once we check there is no piglit regressions. > > We got 0 piglit regressions on ILK, SNB, IVB, HSW and BDW, the latter > with the extension enabled with the environment variable. > > Sadly, on SKL we had regressions (including GPU hangs) only if we > *enable* the arb_gpu_shader_fp64 extension. Those regressions are > fixed by patch 40 but we need still to investigate why they are happening. > That sounds like PATCH 40 was most likely working around regs_read() being interpreted with incorrect units in some other place. The use of regs_read() in PATCH 36 is obviously wrong which might explain the problem. Something along the lines of 'inst->components_read(i) * type_sz(inst->src[i]) / 4' would do what you want -- With that changed PATCH 36 is: Reviewed-by: Francisco Jerez > For that reason, we have not pushed anything yet. Tomorrow we will > investigate why we have those regressions only on SKL and which tests > are affected. > No worries, you still have a couple of weeks until the merge deadline, I don't see the need to rush in any changes that still have outstanding comments. > JFTR, if we disable the extension on SKL (i.e. we don't set the > environment variable), we have 0 piglit regressions on that generation. > > Thanks, > > Sam > > > -BEGIN PGP SIGNATURE- > Version: GnuPG v2 > > iQIcBAEBCAAGBQJXKgPaAAoJEH/0ujLxfcNDtHgP/0JBd96UX2026Y0zBydrE1Ta > U7FV5e0SJXY5JnxWqdV4ZhtdiqQ0EDLj2G0NlFq0u29c+lQX//x9KHMANxbt2tdc > jStRm4NMNXds4rxFYaiRs+3p4XLquV4aZ33Px+0CT6m6Hjsm6sNJZViarB41SjAq > HVRuKAxBbVTjqqPXBKAoXXxp34OUym5uZydXFjavCs1lU70DVwZq9t/vzIAGUunT > NCbehScgtdVupTDEzORZ7QRXvGyqQoJa+HBeuEK6HvvhonfezwrB1hawIRGP34V/ > /k+UvL/DA3GEenpCqHfOIwnxVRYN6iXYNlHSNK+LolsjR/CUxccBs67tcQiGAPBA > g8V02KHbuIN2JyXLSwzJ0ovX5aA+vr5Fuc8e+0PZ7raa7GxIsDg3LbUNnyT8dT83 > fQAM6IMFYhZUOKKF8ynqw5tvRAaUhWJwageYqRkdHROSs8toero7RDAabEdBKWhG > scGkKEu3S7vsGLMBzeSch/UqyAXdaBcJzRSTKCi2UHyWT/5aPczqf4eacyLgWpa/ > 3r+qqzljZ2F8MUbnyP71CyTufx+UQzoBAlB4kM09UcF9RC95XTSQrMBMB/SqR2Om > +ebdE/d1e5ubExNQTbQPrb2OGCIiQhRGQo8MFcuFtgl2DabJFvW4QvsuJKinhQLR > zcfupbRUPDegesKRU+es > =phid > -END PGP SIGNATURE- > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #7 from David Lonie --- I'm trying to get a trace, but apitrace is failing an assertion: $ apitrace trace --api gl bin/vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling" vtkRenderingCoreCxxTests: /build/apitrace/src/apitrace-7.1/build/wrappers/glxtrace.cpp:97195: void (* _wrapProcAddress(const GLubyte*, __GLXextFuncPtr))(): Assertion `procPtr != (__GLXextFuncPtr)&glXCreateContextAttribsARB' failed. This is only happening on Mesa (I get a successful trace using my system nVidia drivers), and it also happens when following the "Trace manually" instructions here: https://github.com/apitrace/apitrace/blob/master/docs/USAGE.markdown#tracing-manually Is this issue familiar to anyone? I'll keep trying to get this working, but would appreciate any tips if this is a known issue :) -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6] glapi: Harden GLX request size processing (v2)
v2: Use == not is for equality testing (Dylan Baker) Signed-off-by: Adam Jackson --- src/mapi/glapi/gen/glX_XML.py| 2 +- src/mapi/glapi/gen/glX_proto_recv.py | 2 -- src/mapi/glapi/gen/glX_proto_send.py | 2 -- src/mapi/glapi/gen/glX_proto_size.py | 24 +++- src/mapi/glapi/gen/gl_XML.py | 2 +- 5 files changed, 13 insertions(+), 19 deletions(-) diff --git a/src/mapi/glapi/gen/glX_XML.py b/src/mapi/glapi/gen/glX_XML.py index 12ff291..c659e93 100644 --- a/src/mapi/glapi/gen/glX_XML.py +++ b/src/mapi/glapi/gen/glX_XML.py @@ -357,7 +357,7 @@ class glx_function(gl_XML.gl_function): # FIXME adds some extra diffs to the generated # FIXME code. -size_string = size_string + " + __GLX_PAD(%s)" % (p.size_string(1)) +size_string = size_string + " + safe_pad(%s)" % (p.size_string(1)) return size_string diff --git a/src/mapi/glapi/gen/glX_proto_recv.py b/src/mapi/glapi/gen/glX_proto_recv.py index afee388..09cf05d 100644 --- a/src/mapi/glapi/gen/glX_proto_recv.py +++ b/src/mapi/glapi/gen/glX_proto_recv.py @@ -89,8 +89,6 @@ class PrintGlxDispatchFunctions(glX_proto_common.glx_print_proto): print '#include "indirect_util.h"' print '#include "singlesize.h"' print '' -print '#define __GLX_PAD(x) (((x) + 3) & ~3)' -print '' print 'typedef struct {' print '__GLX_PIXEL_3D_HDR;' print '} __GLXpixel3DHeader;' diff --git a/src/mapi/glapi/gen/glX_proto_send.py b/src/mapi/glapi/gen/glX_proto_send.py index 8b3d8d7..10abcff 100644 --- a/src/mapi/glapi/gen/glX_proto_send.py +++ b/src/mapi/glapi/gen/glX_proto_send.py @@ -177,8 +177,6 @@ class PrintGlxProtoStubs(glX_proto_common.glx_print_proto): print '#include ' print '' -print '#define __GLX_PAD(n) (((n) + 3) & ~3)' -print '' self.printFastcall() self.printNoinline() print '' diff --git a/src/mapi/glapi/gen/glX_proto_size.py b/src/mapi/glapi/gen/glX_proto_size.py index 75fc26f..3a1c554 100644 --- a/src/mapi/glapi/gen/glX_proto_size.py +++ b/src/mapi/glapi/gen/glX_proto_size.py @@ -291,7 +291,7 @@ class glx_server_enum_function(glx_enum_function): print '' print 'compsize = __gl%s_size(%s);' % (f.name, string.join(f.count_parameter_list, ",")) p = f.variable_length_parameter() -print 'return __GLX_PAD(%s);' % (p.size_string()) +print 'return safe_pad(%s);' % (p.size_string()) print '}' print '' @@ -428,7 +428,7 @@ class PrintGlxReqSize_h(PrintGlxReqSize_common): def printBody(self, api): for func in api.functionIterateGlx(): if not func.ignore and func.has_variable_size_request(): -print 'extern PURE _X_HIDDEN int __glX%sReqSize(const GLbyte *pc, Bool swap);' % (func.name) +print 'extern PURE _X_HIDDEN int __glX%sReqSize(const GLbyte *pc, Bool swap, int reqlen);' % (func.name) class PrintGlxReqSize_c(PrintGlxReqSize_common): @@ -452,20 +452,18 @@ class PrintGlxReqSize_c(PrintGlxReqSize_common): print '#include "indirect_size.h"' print '#include "indirect_reqsize.h"' print '' -print '#define __GLX_PAD(x) (((x) + 3) & ~3)' -print '' print '#if defined(__CYGWIN__) || defined(__MINGW32__)' print '# undef HAVE_ALIAS' print '#endif' print '#ifdef HAVE_ALIAS' print '# define ALIAS2(from,to) \\' -print 'GLint __glX ## from ## ReqSize( const GLbyte * pc, Bool swap ) \\' +print 'GLint __glX ## from ## ReqSize( const GLbyte * pc, Bool swap, int reqlen ) \\' print '__attribute__ ((alias( # to )));' print '# define ALIAS(from,to) ALIAS2( from, __glX ## to ## ReqSize )' print '#else' print '# define ALIAS(from,to) \\' -print 'GLint __glX ## from ## ReqSize( const GLbyte * pc, Bool swap ) \\' -print '{ return __glX ## to ## ReqSize( pc, swap ); }' +print 'GLint __glX ## from ## ReqSize( const GLbyte * pc, Bool swap, int reqlen ) \\' +print '{ return __glX ## to ## ReqSize( pc, swap, reqlen ); }' print '#endif' print '' print '' @@ -547,7 +545,7 @@ class PrintGlxReqSize_c(PrintGlxReqSize_common): def common_func_print_just_header(self, f): print 'int' -print '__glX%sReqSize( const GLbyte * pc, Bool swap )' % (f.name) +print '__glX%sReqSize( const GLbyte * pc, Bool swap, int reqlen )' % (f.name) print '{' @@ -602,7 +600,6 @@ class PrintGlxReqSize_c(PrintGlxReqSize_common): offset = 0 fixup = [] params = [] -plus = '' size = '' param_offsets = {} @@ -620,9 +617,10 @@ class PrintGlxReqSize_c(PrintGlxReqSize_common): if s == 0: s = 1
[Mesa-dev] [PATCH 5/6] glapi: Define PURE for Sun Studio as well
Signed-off-by: Adam Jackson --- src/mapi/glapi/gen/gl_XML.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mapi/glapi/gen/gl_XML.py b/src/mapi/glapi/gen/gl_XML.py index 4f35343..8036a02 100644 --- a/src/mapi/glapi/gen/gl_XML.py +++ b/src/mapi/glapi/gen/gl_XML.py @@ -183,7 +183,7 @@ class gl_print_base(object): The name is also added to the file's undef_list. """ self.undef_list.append("PURE") -print """# if defined(__GNUC__) +print """# if defined(__GNUC__) || (defined(__SUNPRO_C) && (__SUNPRO_C >= 0x590)) #define PURE __attribute__((pure)) # else #define PURE -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] glapi/glx: Mark byteswap functions as _X_UNUSED (v2)
Squashes the one remaining warning in the xserver build. v2: Also clean up some non-standard whitespace (Ian Romanick) Reviewed-by: Ian Romanick Signed-off-by: Adam Jackson --- src/mapi/glapi/gen/glX_proto_recv.py | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/mapi/glapi/gen/glX_proto_recv.py b/src/mapi/glapi/gen/glX_proto_recv.py index 09cf05d..5433288 100644 --- a/src/mapi/glapi/gen/glX_proto_recv.py +++ b/src/mapi/glapi/gen/glX_proto_recv.py @@ -171,11 +171,11 @@ class PrintGlxDispatchFunctions(glX_proto_common.glx_print_proto): if t.glx_name not in already_done: real_name = self.real_types[t_size] -print 'static %s' % (t_name) -print 'bswap_%s( const void * src )' % (t.glx_name) +print 'static _X_UNUSED %s' % (t_name) +print 'bswap_%s(const void * src)' % (t.glx_name) print '{' print 'union { %s dst; %s ret; } x;' % (real_name, t_name) -print 'x.dst = bswap_%u( *(%s *) src );' % (t_size * 8, real_name) +print 'x.dst = bswap_%u(*(%s *) src);' % (t_size * 8, real_name) print 'return x.ret;' print '}' print '' @@ -183,12 +183,12 @@ class PrintGlxDispatchFunctions(glX_proto_common.glx_print_proto): for bits in [16, 32, 64]: print 'static void *' -print 'bswap_%u_array( uint%u_t * src, unsigned count )' % (bits, bits) +print 'bswap_%u_array(uint%u_t * src, unsigned count)' % (bits, bits) print '{' print 'unsigned i;' print '' -print 'for ( i = 0 ; i < count ; i++ ) {' -print 'uint%u_t temp = bswap_%u( src[i] );' % (bits, bits) +print 'for (i = 0 ; i < count ; i++) {' +print 'uint%u_t temp = bswap_%u(src[i]);' % (bits, bits) print 'src[i] = temp;' print '}' print '' -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/6] Update generated GLX server code
Another attempt at syncing the GLX generator scripts with xserver. Jon mentioned a couple of issues in the last series, namely that these two patches were still necessary: https://lists.x.org/archives/xorg-devel/2014-April/041597.html https://lists.x.org/archives/xorg-devel/2014-April/041919.html The former doesn't work at all: 'api' isn't a global, nor is it a parameter to emit_function_call(). So that's not included here, and anyone updating the xserver side will just need to be careful about the ARB_multitexture bits. It's also fairly ugly! Turns out there really isn't enough information attached to the entrypoint itself to be able to walk backward to the GL version or extension name directly. I've been fiddling with getting it working and not having a lot of fun. The latter is addressed in a different way, I've just dropped the "skip that which is marked for skipping" patch from the series. This means we will generate protocol code for NV_*_program again, which is harmless enough. But it does mean xserver will need to revert the commit that removes the open-coded size logic for GetProgramString. - ajax ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] glapi/gen: Copy some GL 1.0 enum details into ARB_viewport_array
Otherwise the instances in the extension XML override the core definitions, and we stop knowing their sizes in indirect_size_get.c Signed-off-by: Adam Jackson --- src/mapi/glapi/gen/ARB_viewport_array.xml | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_viewport_array.xml b/src/mapi/glapi/gen/ARB_viewport_array.xml index b20cf61..ebd5b99 100644 --- a/src/mapi/glapi/gen/ARB_viewport_array.xml +++ b/src/mapi/glapi/gen/ARB_viewport_array.xml @@ -12,10 +12,18 @@ - - - - + + + + + + + + + + + + -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] glapi: Add the safe_{add, mul, pad} functions from xserver
We're about to update the generator scripts to use these, easier not to vary between client and server. Signed-off-by: Adam Jackson --- src/mapi/glapi/gen/glX_proto_send.py | 24 1 file changed, 24 insertions(+) diff --git a/src/mapi/glapi/gen/glX_proto_send.py b/src/mapi/glapi/gen/glX_proto_send.py index 2b33030..8b3d8d7 100644 --- a/src/mapi/glapi/gen/glX_proto_send.py +++ b/src/mapi/glapi/gen/glX_proto_send.py @@ -174,6 +174,7 @@ class PrintGlxProtoStubs(glX_proto_common.glx_print_proto): print '#include ' print '#include ' print '#include ' +print '#include ' print '' print '#define __GLX_PAD(n) (((n) + 3) & ~3)' @@ -181,6 +182,29 @@ class PrintGlxProtoStubs(glX_proto_common.glx_print_proto): self.printFastcall() self.printNoinline() print '' + +print 'static _X_INLINE int safe_add(int a, int b)' +print '{' +print 'if (a < 0 || b < 0) return -1;' +print 'if (INT_MAX - a < b) return -1;' +print 'return a + b;' +print '}' +print 'static _X_INLINE int safe_mul(int a, int b)' +print '{' +print 'if (a < 0 || b < 0) return -1;' +print 'if (a == 0 || b == 0) return 0;' +print 'if (a > INT_MAX / b) return -1;' +print 'return a * b;' +print '}' +print 'static _X_INLINE int safe_pad(int a)' +print '{' +print 'int ret;' +print 'if (a < 0) return -1;' +print 'if ((ret = safe_add(a, 3)) < 0) return -1;' +print 'return ret & (GLuint)~3;' +print '}' +print '' + print '#ifndef __GNUC__' print '# define __builtin_expect(x, y) x' print '#endif' -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/6] glapi: Fix whitespace droppings when printing the license header
Signed-off-by: Adam Jackson --- src/mapi/glapi/gen/gl_XML.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mapi/glapi/gen/gl_XML.py b/src/mapi/glapi/gen/gl_XML.py index 2e7123e..e11f6fc 100644 --- a/src/mapi/glapi/gen/gl_XML.py +++ b/src/mapi/glapi/gen/gl_XML.py @@ -130,7 +130,7 @@ class gl_print_base(object): % (self.name) print '' print '/*' -print ' * ' + self.license.replace('\n', '\n * ') +print (' * ' + self.license.replace('\n', '\n * ')).replace(' \n', '\n') print ' */' print '' if self.header_tag: -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 02/15] i965: Configure how to store *64*PASSTHRU vertex components
On Wednesday, May 4, 2016 1:01:33 PM PDT Antía Puentes wrote: > Hi Kenneth, > > thanks for reviewing. > > On mié, 2016-05-04 at 03:36 -0700, Kenneth Graunke wrote: > > On Thursday, April 28, 2016 1:40:32 PM PDT Antia Puentes wrote: > > > > > > From the Broadwell specification, structure VERTEX_ELEMENT_STATE > > > description: > > > > > >"When SourceElementFormat is set to one of the *64*_PASSTHRU > > > formats, 64-bit components are stored in the URB without any > > > conversion. In this case, vertex elements must be written as 128 > > > or 256 bits, with VFCOMP_STORE_0 being used to pad the output > > > as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red > > component into > > > > > > the URB, Component 1 must be specified as VFCOMP_STORE_0 (with > > > Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit > > > vertex element, or Components 1-3 must be specified as VFCOMP_STORE_0 > > > in order to output a 256-bit vertex element. Likewise, use of > > > R64G64B64_PASSTHRU requires Component 3 to be specified as > > VFCOMP_STORE_0 > > > > > > in order to output a 256-bit vertex element." > > > > > > Uses 128-bits to write double and dvec2 vertex elements, and 256-bits for > > > dvec3 and dvec4 vertex elements. > > > > > > Signed-off-by: Juan A. Suarez Romero > > > Signed-off-by: Antia Puentes > > > --- > > > src/mesa/drivers/dri/i965/gen8_draw_upload.c | 34 + > > +++ > > > > > > 1 file changed, 34 insertions(+) > > > > > > diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/ > > drivers/dri/i965/gen8_draw_upload.c > > > > > > index fe5ed35..14f7091 100644 > > > --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c > > > +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c > > > @@ -217,6 +217,40 @@ gen8_emit_vertices(struct brw_context *brw) > > > break; > > >} > > > > > > + /* From the BDW PRM, Volume 2d, page 586 (VERTEX_ELEMENT_STATE): > > > + * > > > + * "When SourceElementFormat is set to one of the *64*_PASSTHRU > > > + * formats, 64-bit components are stored in the URB without any > > > + * conversion. In this case, vertex elements must be written as > > 128 > > > > > > + * or 256 bits, with VFCOMP_STORE_0 being used to pad the output > > > + * as required. E.g., if R64_PASSTHRU is used to copy a 64- bit > > Red > > > > > > + * component into the URB, Component 1 must be specified as > > > + * VFCOMP_STORE_0 (with Components 2,3 set to VFCOMP_NOSTORE) > > > + * in order to output a 128-bit vertex element, or Components 1-3 > > must > > > > > > + * be specified as VFCOMP_STORE_0 in order to output a 256- bit > > vertex > > > > > > + * element. Likewise, use of R64G64B64_PASSTHRU requires > > Component 3 > > > > > > + * to be specified as VFCOMP_STORE_0 in order to output a 256-bit > > vertex > > > > > > + * element." > > > + */ > > The above comment seems to indicate that R64 needs component 1 set to > > STORE_0, and that looks to be missing. I'm guessing you need to add: > > > > > > > > + if (input->glarray->Doubles) { > > > + switch (input->glarray->Size) { > > > + case 0: > >comp0 = BRW_VE1_COMPONENT_STORE_0; > >/* fallthrough */ > > > > > > > > + case 1: > >comp1 = BRW_VE1_COMPONENT_STORE_0; > >/* fallthrough */ > > > > I have not added them because comp0 and comp1 are initialized in the > code immediately above the switch I added. That already existing code > sets the default values for the components: > > switch (input->glarray->Size) { > case 0: comp0 = BRW_VE1_COMPONENT_STORE_0; > case 1: comp1 = BRW_VE1_COMPONENT_STORE_0; > case 2: comp2 = BRW_VE1_COMPONENT_STORE_0; > case 3: comp3 = input->glarray->Integer ? >BRW_VE1_COMPONENT_STORE_1_INT > : BRW_VE1_COMPONENT_STORE_1_FLT; > break; > } > > In my switch I just override the values that are not right for the > double-precision floating point cases. Right, sorry :) That seems totally fine then. Patches 1-3 are: Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/15] i965: get the proper vertex surface type for doubles on gen8+
On Wednesday, May 4, 2016 3:30:13 PM PDT Alejandro Piñeiro wrote: > > On 04/05/16 12:31, Kenneth Graunke wrote: > > On Thursday, April 28, 2016 1:40:31 PM PDT Antia Puentes wrote: > >> From: Alejandro Piñeiro > >> > >> This commit adds support for PASSTHRU format when pushing > >> double-precision attributes. > >> > >> Check glarray->Doubles in order to know if we should choose a format > >> that does a conversion to float, or just passthru the 64-bit double. > >> --- > >> src/mesa/drivers/dri/i965/brw_draw_upload.c | 30 +++ ++ > > +--- > >> 1 file changed, 27 insertions(+), 3 deletions(-) > > What's the motivation for using the PASSTHRU formats? Is there some > > advantage over just using the R64*FLOAT formats? > > We can't use R64*FLOAT because it does a double to float conversion, as > explained at the bdw PRM (vol 7, section 10.15, page 470, Table "Source > Element Formats Supported in VF Unit"). PASSTHRU ensures that the > 64-bits are passed without any conversion, that is what we want. If it > is not clear reading the commit message, I could expand it. Ahh...thanks. For some reason, I just assumed that R64_FLOAT meant double precision float, not "reads a 64 bit float and outputs a 32 bit float". Heh :) Thanks for the explanation! signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/nine: clean up WINAPI definition
On Sun, May 1, 2016 at 6:03 PM, Axel Davy wrote: > Do we need the #ifndef WINAPI part ? > > Axel > > > On 29/04/2016 20:53, Christian Schmidbauer wrote: >> >> As Emil pointed out, only gcc, clang and MSVC compatibility is required. >> Hence the check for GNUC can be skipped, as __i386__ and __x86_64__ are >> only defined for gcc/clang, not for MSVC. >> >> Remove the #undef which has been there for historic reasons, when wine >> dlls for nine have been built inside mesa. Instead use #ifndef in order >> to avoid redefining WINAPI from MSVC's headers. >> --- >> include/D3D9/d3d9types.h | 16 +--- >> 1 file changed, 5 insertions(+), 11 deletions(-) >> >> diff --git a/include/D3D9/d3d9types.h b/include/D3D9/d3d9types.h >> index e0b8652..88f22b9 100644 >> --- a/include/D3D9/d3d9types.h >> +++ b/include/D3D9/d3d9types.h >> @@ -173,22 +173,16 @@ typedef struct _RGNDATA { >> #define D3DPRESENTFLAG_RESTRICTED_CONTENT 0x0400 >> #define D3DPRESENTFLAG_RESTRICT_SHARED_RESOURCE_DRIVER 0x0800 >> - >> -#ifdef WINAPI >> -#undef WINAPI >> -#endif /* WINAPI*/ >> - >> -#ifdef __GNUC__ >> - #if (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) >> +/* Windows calling convention */ >> +#ifndef WINAPI >> + #if defined(__x86_64__) && !defined(__ILP32__) >> #define WINAPI __attribute__((ms_abi)) >> - #elif defined(__i386) || defined(_M_IX86) >> + #elif defined(__i386__) >> #define WINAPI __attribute__((__stdcall__)) >> #else /* neither amd64 nor i386 */ >> #define WINAPI >> #endif >> -#else /* __GNUC__ */ >> - #define WINAPI >> -#endif >> +#endif /* WINAPI */ >> /* Implementation caps */ >> #define D3DPRESENT_BACK_BUFFERS_MAX3 > If you do not care about compatibility with MSVC, you can skip it. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/nine: clean up WINAPI definition
On Sun, May 1, 2016 at 12:02 PM, Emil Velikov wrote: > Hi Christian, > > On 29 April 2016 at 19:53, Christian Schmidbauer > wrote: >> As Emil pointed out, only gcc, clang and MSVC compatibility is required. >> Hence the check for GNUC can be skipped, as __i386__ and __x86_64__ are >> only defined for gcc/clang, not for MSVC. >> >> Remove the #undef which has been there for historic reasons, when wine >> dlls for nine have been built inside mesa. Instead use #ifndef in order >> to avoid redefining WINAPI from MSVC's headers. > Reviewed-by: Emil Velikov > > Thanks for picking this up. Should we apply this for stable as well > (considering the patch it 'fixes' is already in there) ? > I'm slightly concerned about the removal of undef WINAPI - I take it > that you've tested the patch ? > > Humbly waiting for an ack/r-b/t-b from someone (else) involved in Nine ;-) > > Thanks > Emil I didn't CC stable as the patch just cleans up the code and does not really fix anything. I don't mind applying this to stable though. I also talked to zhasha on IRC and he also thinks it was only necessary in the past. I did test the patch with wine (on amd64) and Xnine (on x32, i386). ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #6 from Jose Fonseca --- (In reply to David Lonie from comment #3) > I included an apitrace on the last bug, and was told that it was not > suitable for reproducing bugs (something about apitrace forcing a 4.5 > context IIRC). I can make one for this issue if anyone needs it. I agree a trace would be useful. If the issue repros with Mesa + llvmpipe, then just record the trace with Mesa + llvmpipe. If you don't record the trace with the exact same OpenGL implementation, then all sort of things can go wrong. -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #4 from Ilia Mirkin --- (In reply to David Lonie from comment #3) > I included an apitrace on the last bug, and was told that it was not > suitable for reproducing bugs (something about apitrace forcing a 4.5 > context IIRC). I can make one for this issue if anyone needs it. Looking at it... 2 @0 glXCreateContextAttribsARB(dpy = 0x6c4f20, config = 0x72c1c0, share_context = NULL, direct = True, attrib_list = [GLX_CONTEXT_MAJOR_VERSION_ARB, 4, GLX_CONTEXT_MINOR_VERSION_ARB, 5, 0]) = 0x74cf10 don't do that? :) Mesa does not support GL 4.5, was that trace captured on some proprietary driver that does? Can you change VTK (even temporarily) to only request GL 3.0? Or alternatively flip to requesting a core context (which that trace isn't doing) and request GL 3.2? -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #5 from Ilia Mirkin --- FWIW I was able to run your trace from the other bug with MESA_GL_VERSION_OVERRIDE=4.5 MESA_GLSL_VERSION_OVERRIDE=450 so I suspect that a trace of this issue would be nice to have as well. -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] About tolerance calculation on specific (builtin) functions
On Wed, 2016-05-04 at 13:48 -0400, Ilia Mirkin wrote: > On Wed, May 4, 2016 at 1:41 PM, Connor Abbott > wrote: > > > > On Wed, May 4, 2016 at 1:05 PM, Andres Gomez > > wrote: > > > > > > Hi, > > > > > > as part of the work done to "Add FP64 support to the i965 shader > > > backends" at: > > > https://bugs.freedesktop.org/show_bug.cgi?id=92760 > > > > > > I've been working to add piglit tests that would check the new > > > features > > > added by this addition. > > > > > > Due to this, I've been checking and making modifications into the > > > builtin_functions*py module used by some generators. These > > > modules use > > > automatic calculation of the tolerance when distance checking the > > > result of a function. > > > > > > As already stated in the good documentation of the module, the > > > tolerance is computed following what it is stated in OpenGL's > > > specs: > > > > > > From the OpenGL 1.4 spec (2.1.1 "Floating-Point > > > Computation"): > > > > > > "We require simply that numbers' floating-point parts > > > contain > > > enough bits ... so that individual results of floating- > > > point > > > operations are accurate to about 1 part in 10^5." > > > > > > Although the text is open to interpretation, and for specific > > > operations we take a little bit more flexible approach, > > > basically, the > > > tolerance is calculated as: > > > > > > tolerance = / 10⁵ > > > > > > This makes sense since the precision of a floating point value > > > gets > > > reduced while the number gets bigger[1]. > > > > > > Following this approach, for a number in the order of 40*10⁵, the > > > tolerance used is ~40. While this should be OK for most of the > > > functions, it seems to me that such a high tolerance should not > > > be used > > > with specific functions, if any tolerance should be used at all. > > > > > > For example, when testing the "sign()" function, seems pretty > > > obvious > > > that using a value of 40 in the tolerance of a function that > > > should > > > return either 1.0, 0.0 or -1.0 doesn't make much sense. > > > > > > A similar case is the "trunc" function and probably others, like > > > "floor", "ceil", "abs", etc. > > > > > > My conclusion is that it should be safe to assume no tolerance in > > > this > > > functions and I could modify the algorithm used for them in the > > > python > > > module but I wanted to have some feedback in case I'm not taking > > > into > > > account something that advices against doing these modifications. > > > > > > Opinions? > > Hi, > > > > If you look at the GLSL 4.40 spec, in section 4.7.1 ("Range and > > Precision") you'll find a table listing the precision of various > > operations. Your intuition about floor(), ceil(), etc. needing the > > exact result is correct, as you can see. For doubles, it says "The > > precision of double-precision operations is at least that of single > > precision." Now, it's up for interpretation whether that means that > > they must have the same *absolute* precision or the same ULP's (if > > the > > operation is not exact). For example inversesqrt() is listed at 2 > > ULP > > for single precision, which means that there must be 24 - 2 = 22 > > bits > > of precision. For doubles, are there still 22 bits of precision > > required, or is the requirement really that there still be 2 ULP's > > of > > precision in which case there are 53 - 2 = 51 bits of precision. I > > wrote the original lowering pass for inversesqrt() and friends > > assuming the latter was correct, since it seems like the most sane > > to > > me (or else doubles would have no advantage over floats for > > anything > > except addition and multiplication). > All of this is added by ARB_shader_precision (part of GLSL 4.10). It > also punts on doubles, good to know that the latest specs have > maintained the status quo. One interpretation of it is "you can > implement doubles with float ops". Another is that the ULP's apply to > doubles as well. Just out of curiosity. Is OpenGL's ULP defined anywhere? The OpenCL's definition is "distance between the two non-equal finite floating-point numbers nearest x". the required values are same as the ones in ARB_shader_precision which would hint to use the same definition. defining ULP as digits in binary representation gives you exponentially higher errors. Jan > > Prior to that ext, the precision of everything was to 10^-5. > > -ilia > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev -- Jan Vesely signature.asc Description: This is a digitally signed message part ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #3 from David Lonie --- I included an apitrace on the last bug, and was told that it was not suitable for reproducing bugs (something about apitrace forcing a 4.5 context IIRC). I can make one for this issue if anyone needs it. I share concerns about running random binaries, and have included instructions for building VTK from official sources to reproduce the issue for the more wary developers among us :) I've only tested using the mesa configuration options provided in the report. If there are other configurations you'd like me to test, I'd be happy to do so. Just let me know the exact options to specify -- mesa's configure options are a bit of a black box to me. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] About tolerance calculation on specific (builtin) functions
Andres Gomez writes: > Hi, > > as part of the work done to "Add FP64 support to the i965 shader > backends" at: > https://bugs.freedesktop.org/show_bug.cgi?id=92760 > > I've been working to add piglit tests that would check the new features > added by this addition. > > Due to this, I've been checking and making modifications into the > builtin_functions*py module used by some generators. These modules use > automatic calculation of the tolerance when distance checking the > result of a function. > > As already stated in the good documentation of the module, the > tolerance is computed following what it is stated in OpenGL's specs: > > From the OpenGL 1.4 spec (2.1.1 "Floating-Point Computation"): > > "We require simply that numbers' floating-point parts contain > enough bits ... so that individual results of floating-point > operations are accurate to about 1 part in 10^5." > > Although the text is open to interpretation, and for specific > operations we take a little bit more flexible approach, basically, the > tolerance is calculated as: > > tolerance = / 10⁵ > > This makes sense since the precision of a floating point value gets > reduced while the number gets bigger[1]. > > Following this approach, for a number in the order of 40*10⁵, the > tolerance used is ~40. While this should be OK for most of the > functions, it seems to me that such a high tolerance should not be used > with specific functions, if any tolerance should be used at all. > > For example, when testing the "sign()" function, seems pretty obvious > that using a value of 40 in the tolerance of a function that should > return either 1.0, 0.0 or -1.0 doesn't make much sense. That's an interesting interpretation. I would assume that a tolerance of 1 part in 10^5 for any of these three values would be somewhat less than 40 :) Although that does bring up an interesting point: To get things correct, probably the tolerance has to apply to all values in the chain, both the inputs, outputs and any intermediates. Consider for example the case of a = 40 * 10^5 + 1 b = 40 * 10^5 - 1 c = a - b What is the correct tolerance for c? eirik > A similar case is the "trunc" function and probably others, like > "floor", "ceil", "abs", etc. > > My conclusion is that it should be safe to assume no tolerance in this > functions and I could modify the algorithm used for them in the python > module but I wanted to have some feedback in case I'm not taking into > account something that advices against doing these modifications. > > Opinions? > > [1] https://en.wikipedia.org/wiki/IEEE_floating_point#Basic_and_interch > ange_formats > -- > Br, > > Andres > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 Ilia Mirkin changed: What|Removed |Added Attachment #123465|text/plain |application/bzip2 mime type|| -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #2 from Ilia Mirkin --- You might consider including an apitrace that reproduces the issue. People are probably going to be less inclined to run some random binary off the interwebs. Also, is this only a bug with llvmpipe (and/or gallivm), or have you tested this with various hw drivers in mesa? (Sorry, your wrote a lot of text and I only skimmed it.) -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 94955] Uninitialized variables leads to random segfaults (valgrind log, apitrace attached)
https://bugs.freedesktop.org/show_bug.cgi?id=94955 --- Comment #23 from David Lonie --- Opened a new bug: https://bugs.freedesktop.org/show_bug.cgi?id=95266 Feel free to close this one. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 --- Comment #1 from David Lonie --- The "Reproduction of issue" attachment is a tar.bz2 file, btw. When I try to download it it just dumps the binary as character data to my browser...Let me know if I should re-upload it and do something differently. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.
https://bugs.freedesktop.org/show_bug.cgi?id=95266 Bug ID: 95266 Summary: Geometry missing from rendering, only when using Mesa. Product: Mesa Version: git Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: normal Priority: medium Component: Mesa core Assignee: mesa-dev@lists.freedesktop.org Reporter: david.lo...@kitware.com QA Contact: mesa-dev@lists.freedesktop.org Created attachment 123465 --> https://bugs.freedesktop.org/attachment.cgi?id=123465&action=edit Reproduction of issue VTK's new depth peeling implementation has been tested on numerous platforms (mac, windows, linux, iOS/android ES3) and various OpenGL implementations (nVidia, ATI, Intel, and Mesa), but only fails when using mesa. This issue was first described in another bug report: https://bugs.freedesktop.org/show_bug.cgi?id=94955 but the issue has moved away from the original topic of that report. From the discussion there, it appears to be some issue in the floating point textures used in the new peeling code, as the samplers are returning NaNs (If I understand the discussion correctly). Indeed, the old peeling implementation (which works on Mesa) only used 32-bit RGBA textures. I've provided an executable from VTK that reproduces the issue, a compiled mesa libGL.so, and sample outputs using nVidia and Mesa OpenGL implementations, along with details on how VTK and Mesa were configured. The README included in the tarball is pasted below for further details. - This directory contains a reproduction of a Mesa OpenGL bug that affects VTK's new depth peeling code. It contains the following files: libGL.so - A compiled mesa OpenGL implementation, built against 38fcf7cb. run-mesa.sh - A script to run the test executable with the included mesa library. run-sysGL.sh - A script to run the test executable using the system OpenGL library. TestTranslucentLUTDepthPeeling-valid.png - Example valid output for test. TestTranslucentLUTDepthPeeling-mesa.png - Incorrect output for test from mesa. vtkRenderingCoreCxxTests -- The test executable. # Reproducing the bug Execute run-sysGL.sh. This will run the TestTranslucentLUTDepthPeeling test in the vtkRenderingCoreCxxTests executable using the libGL.so found in LD_LIBRARY_PATH. This code has been successfully tested on windows, mac, iOS/android GL ES3, and linux, using nVidia, intel, and ATI OpenGL implementations. The output should resemble the included 'valid' png. Execute run-mesa.sh. This runs the test using the included libGL.so (change the LD_PRELOAD value in the script to test a different library). The output will resemble the included 'mesa' png -- just the background, with no visible geometry. # Notes: Building Mesa The included mesa libGL.so was compiled with the options: ./autogen.sh \ --enable-debug \ --prefix=/ssd/src/mesa-master-install-nomangle \ --disable-dri \ --disable-egl \ --disable-gles1 \ --disable-gles2 \ --disable-shared-glapi \ --enable-xlib-glx \ --enable-gallium-osmesa \ --with-gallium-drivers=swrast \ --enable-gallium-llvm=yes \ LLVM_CONFIG=/path/to/llvm-3.8.0/llvm-config \ --enable-llvm-shared-libs with debugging symbols stripped later to reduce file size. # Notes: Building VTK The VTK test can be built by downloading the VTK source code from our git repo: https://gitlab.kitware.com/vtk/vtk.git I built against HEAD acd85fe48d. There is currently a workaround that falls back to an older depth peeling implementation for Mesa 11.3.0(-devel). To reproduce the bug, edit Rendering/OpenGL2/vtkOpenGLRenderer.cxx and remove the version check in vtkOpenGLRenderer::DeviceRenderTranslucentPolygonalGeometry, around line 307: std::string glVersion = reinterpret_cast(glGetString(GL_VERSION)); if (glVersion.find("Mesa 11.3.0") != std::string::npos) { vtkDebugMacro("Disabling dual depth peeling -- mesa bug detected. " "GL_VERSION = " << glVersion); dualDepthPeelingSupported = false; } } Configure using cmake: mkdir vtk-build cd vtk-build cmake path/to/vtk/sourcecode/ \ -DVTK_RENDERING_BACKEND=OpenGL2 \ -DOPENGL_INCLUDE_DIR=/path/to/installed/mesa/include -DOPENGL_gl_LIBRARY=/path/to/installed/mesa/lib/libGL.so -DOPENGL_glu_LIBRARY="" Run 'make' to build, and run the depth peeling tests: cd vtk-build ctest -R DepthPeel (use "ctest -V -R 'SomeTestName'" to see the actual executables/arguments/output) -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] About tolerance calculation on specific (builtin) functions
On Wed, May 4, 2016 at 1:41 PM, Connor Abbott wrote: > On Wed, May 4, 2016 at 1:05 PM, Andres Gomez wrote: >> Hi, >> >> as part of the work done to "Add FP64 support to the i965 shader >> backends" at: >> https://bugs.freedesktop.org/show_bug.cgi?id=92760 >> >> I've been working to add piglit tests that would check the new features >> added by this addition. >> >> Due to this, I've been checking and making modifications into the >> builtin_functions*py module used by some generators. These modules use >> automatic calculation of the tolerance when distance checking the >> result of a function. >> >> As already stated in the good documentation of the module, the >> tolerance is computed following what it is stated in OpenGL's specs: >> >> From the OpenGL 1.4 spec (2.1.1 "Floating-Point Computation"): >> >> "We require simply that numbers' floating-point parts contain >> enough bits ... so that individual results of floating-point >> operations are accurate to about 1 part in 10^5." >> >> Although the text is open to interpretation, and for specific >> operations we take a little bit more flexible approach, basically, the >> tolerance is calculated as: >> >> tolerance = / 10⁵ >> >> This makes sense since the precision of a floating point value gets >> reduced while the number gets bigger[1]. >> >> Following this approach, for a number in the order of 40*10⁵, the >> tolerance used is ~40. While this should be OK for most of the >> functions, it seems to me that such a high tolerance should not be used >> with specific functions, if any tolerance should be used at all. >> >> For example, when testing the "sign()" function, seems pretty obvious >> that using a value of 40 in the tolerance of a function that should >> return either 1.0, 0.0 or -1.0 doesn't make much sense. >> >> A similar case is the "trunc" function and probably others, like >> "floor", "ceil", "abs", etc. >> >> My conclusion is that it should be safe to assume no tolerance in this >> functions and I could modify the algorithm used for them in the python >> module but I wanted to have some feedback in case I'm not taking into >> account something that advices against doing these modifications. >> >> Opinions? > > Hi, > > If you look at the GLSL 4.40 spec, in section 4.7.1 ("Range and > Precision") you'll find a table listing the precision of various > operations. Your intuition about floor(), ceil(), etc. needing the > exact result is correct, as you can see. For doubles, it says "The > precision of double-precision operations is at least that of single > precision." Now, it's up for interpretation whether that means that > they must have the same *absolute* precision or the same ULP's (if the > operation is not exact). For example inversesqrt() is listed at 2 ULP > for single precision, which means that there must be 24 - 2 = 22 bits > of precision. For doubles, are there still 22 bits of precision > required, or is the requirement really that there still be 2 ULP's of > precision in which case there are 53 - 2 = 51 bits of precision. I > wrote the original lowering pass for inversesqrt() and friends > assuming the latter was correct, since it seems like the most sane to > me (or else doubles would have no advantage over floats for anything > except addition and multiplication). All of this is added by ARB_shader_precision (part of GLSL 4.10). It also punts on doubles, good to know that the latest specs have maintained the status quo. One interpretation of it is "you can implement doubles with float ops". Another is that the ULP's apply to doubles as well. Prior to that ext, the precision of everything was to 10^-5. -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] About tolerance calculation on specific (builtin) functions
On Wed, May 4, 2016 at 1:05 PM, Andres Gomez wrote: > Hi, > > as part of the work done to "Add FP64 support to the i965 shader > backends" at: > https://bugs.freedesktop.org/show_bug.cgi?id=92760 > > I've been working to add piglit tests that would check the new features > added by this addition. > > Due to this, I've been checking and making modifications into the > builtin_functions*py module used by some generators. These modules use > automatic calculation of the tolerance when distance checking the > result of a function. > > As already stated in the good documentation of the module, the > tolerance is computed following what it is stated in OpenGL's specs: > > From the OpenGL 1.4 spec (2.1.1 "Floating-Point Computation"): > > "We require simply that numbers' floating-point parts contain > enough bits ... so that individual results of floating-point > operations are accurate to about 1 part in 10^5." > > Although the text is open to interpretation, and for specific > operations we take a little bit more flexible approach, basically, the > tolerance is calculated as: > > tolerance = / 10⁵ > > This makes sense since the precision of a floating point value gets > reduced while the number gets bigger[1]. > > Following this approach, for a number in the order of 40*10⁵, the > tolerance used is ~40. While this should be OK for most of the > functions, it seems to me that such a high tolerance should not be used > with specific functions, if any tolerance should be used at all. > > For example, when testing the "sign()" function, seems pretty obvious > that using a value of 40 in the tolerance of a function that should > return either 1.0, 0.0 or -1.0 doesn't make much sense. > > A similar case is the "trunc" function and probably others, like > "floor", "ceil", "abs", etc. > > My conclusion is that it should be safe to assume no tolerance in this > functions and I could modify the algorithm used for them in the python > module but I wanted to have some feedback in case I'm not taking into > account something that advices against doing these modifications. > > Opinions? Hi, If you look at the GLSL 4.40 spec, in section 4.7.1 ("Range and Precision") you'll find a table listing the precision of various operations. Your intuition about floor(), ceil(), etc. needing the exact result is correct, as you can see. For doubles, it says "The precision of double-precision operations is at least that of single precision." Now, it's up for interpretation whether that means that they must have the same *absolute* precision or the same ULP's (if the operation is not exact). For example inversesqrt() is listed at 2 ULP for single precision, which means that there must be 24 - 2 = 22 bits of precision. For doubles, are there still 22 bits of precision required, or is the requirement really that there still be 2 ULP's of precision in which case there are 53 - 2 = 51 bits of precision. I wrote the original lowering pass for inversesqrt() and friends assuming the latter was correct, since it seems like the most sane to me (or else doubles would have no advantage over floats for anything except addition and multiplication). Also, you might want to modify the FP64 tests to test the additional range that doubles provide. Last time I looked at them, they didn't do that. Connor > > [1] https://en.wikipedia.org/wiki/IEEE_floating_point#Basic_and_interch > ange_formats > -- > Br, > > Andres > > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] About tolerance calculation on specific (builtin) functions
On Wed, May 4, 2016 at 10:05 AM, Andres Gomez wrote: > Hi, > > as part of the work done to "Add FP64 support to the i965 shader > backends" at: > https://bugs.freedesktop.org/show_bug.cgi?id=92760 > > I've been working to add piglit tests that would check the new features > added by this addition. > > Due to this, I've been checking and making modifications into the > builtin_functions*py module used by some generators. These modules use > automatic calculation of the tolerance when distance checking the > result of a function. > > As already stated in the good documentation of the module, the > tolerance is computed following what it is stated in OpenGL's specs: > > From the OpenGL 1.4 spec (2.1.1 "Floating-Point Computation"): > > "We require simply that numbers' floating-point parts contain > enough bits ... so that individual results of floating-point > operations are accurate to about 1 part in 10^5." > > Although the text is open to interpretation, and for specific > operations we take a little bit more flexible approach, basically, the > tolerance is calculated as: > > tolerance = / 10⁵ > > This makes sense since the precision of a floating point value gets > reduced while the number gets bigger[1]. > > Following this approach, for a number in the order of 40*10⁵, the > tolerance used is ~40. While this should be OK for most of the > functions, it seems to me that such a high tolerance should not be used > with specific functions, if any tolerance should be used at all. > > For example, when testing the "sign()" function, seems pretty obvious > that using a value of 40 in the tolerance of a function that should > return either 1.0, 0.0 or -1.0 doesn't make much sense. > > A similar case is the "trunc" function and probably others, like > "floor", "ceil", "abs", etc. > > My conclusion is that it should be safe to assume no tolerance in this > functions and I could modify the algorithm used for them in the python > module but I wanted to have some feedback in case I'm not taking into > account something that advices against doing these modifications. > > Opinions? Yes, I agree. I noticed the "trunc" test provided some tolerance, and really... it shouldn't. We should be able to calculate trunc (and a bunch of other functions like you mention) exactly. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] add MAINTAINERS and get_maintainer.pl script
On Wed, May 4, 2016 at 1:04 PM, Eric Engestrom wrote: > On Tue, May 03, 2016 at 11:11:56AM -0400, Rob Clark wrote: >> >> Ok, I've renamed to REVIEWERS and scripts/get_reviewer.pl and updated >> the verbage appropriately. > > scripts/get_reviewer.pl still obeys .get_maintainer.conf and > .get_maintainer.ignore. Is that something we want to make more > consistent? maybe? tbh, I only tried to make minimal changes compared to linux get_maintainer.pl >> I think I'll go ahead and push it this afternoon if nobody screams. >> No formal ack-by's, but a handful of yeah-thats-a-good-idea-by's and >> no objections so far. > > I just tested it, and it barfed some UTF8-as-ASCII at me, along with > an error message and some text that shouldn't be there: > > Bad divisor in main::vcs_assign: 0 > (cc-cmd) Adding cc: "GitAuthor: Jason Ekstrand" > from: 'scripts/get_reviewer.pl' > (cc-cmd) Adding cc: =?UTF-8?q?GitAuthor=3A=20Kristian=20H=C3=B8gsberg?= > from: 'scripts/get_reviewer.pl' > (cc-cmd) Adding cc: Chad Versace from: > 'scripts/get_reviewer.pl' > (cc-cmd) Adding cc: =?UTF-8?q?Kristian=20H=C3=B8gsberg?= > from: 'scripts/get_reviewer.pl' > > The UTF-8 markup *might* be necessary if whatever consumes this doesn't > understand actual UTF-8, but the error and the extra "GitAuthor: " > definitely don't belong :] I think the UTF-8 is necessary when encoding email addresses. This has been how it has always worked for me in the past. No idea about the "GitAuthor:" bits.. in the end the CC's that it has added when used from git-send-email have seemed correct to me. BR, -R > Cheers, > Eric ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] About tolerance calculation on specific (builtin) functions
Hi, as part of the work done to "Add FP64 support to the i965 shader backends" at: https://bugs.freedesktop.org/show_bug.cgi?id=92760 I've been working to add piglit tests that would check the new features added by this addition. Due to this, I've been checking and making modifications into the builtin_functions*py module used by some generators. These modules use automatic calculation of the tolerance when distance checking the result of a function. As already stated in the good documentation of the module, the tolerance is computed following what it is stated in OpenGL's specs: From the OpenGL 1.4 spec (2.1.1 "Floating-Point Computation"): "We require simply that numbers' floating-point parts contain enough bits ... so that individual results of floating-point operations are accurate to about 1 part in 10^5." Although the text is open to interpretation, and for specific operations we take a little bit more flexible approach, basically, the tolerance is calculated as: tolerance = / 10⁵ This makes sense since the precision of a floating point value gets reduced while the number gets bigger[1]. Following this approach, for a number in the order of 40*10⁵, the tolerance used is ~40. While this should be OK for most of the functions, it seems to me that such a high tolerance should not be used with specific functions, if any tolerance should be used at all. For example, when testing the "sign()" function, seems pretty obvious that using a value of 40 in the tolerance of a function that should return either 1.0, 0.0 or -1.0 doesn't make much sense. A similar case is the "trunc" function and probably others, like "floor", "ceil", "abs", etc. My conclusion is that it should be safe to assume no tolerance in this functions and I could modify the algorithm used for them in the python module but I wanted to have some feedback in case I'm not taking into account something that advices against doing these modifications. Opinions? [1] https://en.wikipedia.org/wiki/IEEE_floating_point#Basic_and_interch ange_formats -- Br, Andres signature.asc Description: This is a digitally signed message part ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] add MAINTAINERS and get_maintainer.pl script
On Tue, May 03, 2016 at 11:11:56AM -0400, Rob Clark wrote: > > Ok, I've renamed to REVIEWERS and scripts/get_reviewer.pl and updated > the verbage appropriately. scripts/get_reviewer.pl still obeys .get_maintainer.conf and .get_maintainer.ignore. Is that something we want to make more consistent? > I think I'll go ahead and push it this afternoon if nobody screams. > No formal ack-by's, but a handful of yeah-thats-a-good-idea-by's and > no objections so far. I just tested it, and it barfed some UTF8-as-ASCII at me, along with an error message and some text that shouldn't be there: Bad divisor in main::vcs_assign: 0 (cc-cmd) Adding cc: "GitAuthor: Jason Ekstrand" from: 'scripts/get_reviewer.pl' (cc-cmd) Adding cc: =?UTF-8?q?GitAuthor=3A=20Kristian=20H=C3=B8gsberg?= from: 'scripts/get_reviewer.pl' (cc-cmd) Adding cc: Chad Versace from: 'scripts/get_reviewer.pl' (cc-cmd) Adding cc: =?UTF-8?q?Kristian=20H=C3=B8gsberg?= from: 'scripts/get_reviewer.pl' The UTF-8 markup *might* be necessary if whatever consumes this doesn't understand actual UTF-8, but the error and the extra "GitAuthor: " definitely don't belong :] Cheers, Eric ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] main: uses casts to silence some _mesa_debug() format warnings
Reviewed-by: Sinclair Yeh On Mon, May 02, 2016 at 07:15:11PM -0600, Brian Paul wrote: > Silences warnings with 32-bit Linux gcc builds and MinGW which doesn't > recognize the ‘t’ conversion character. > --- > src/mesa/main/bufferobj.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c > index 731b62e..e60a8ea 100644 > --- a/src/mesa/main/bufferobj.c > +++ b/src/mesa/main/bufferobj.c > @@ -2487,8 +2487,9 @@ _mesa_map_buffer_range(struct gl_context *ctx, > > if (offset + length > bufObj->Size) { >_mesa_error(ctx, GL_INVALID_VALUE, > - "%s(offset %td + length %td > buffer_size %td)", func, > - offset, length, bufObj->Size); > + "%s(offset %lu + length %lu > buffer_size %lu)", func, > + (unsigned long) offset, (unsigned long) length, > + (unsigned long) bufObj->Size); >return NULL; > } > > @@ -3749,8 +3750,9 @@ _mesa_BindBufferRange(GLenum target, GLuint index, > struct gl_buffer_object *bufObj; > > if (MESA_VERBOSE & VERBOSE_API) { > - _mesa_debug(ctx, "glBindBufferRange(%s, %u, %u, %ld, %ld)\n", > - _mesa_enum_to_string(target), index, buffer, offset, size); > + _mesa_debug(ctx, "glBindBufferRange(%s, %u, %u, %lu, %lu)\n", > + _mesa_enum_to_string(target), index, buffer, > + (unsigned long) offset, (unsigned long) size); > } > > if (buffer == 0) { > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] swr: Remove stall waiting for core query counters.
Reviewed-By: George Kyriazis > -Original Message- > From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On > Behalf Of BruceCherniak > Sent: Thursday, April 28, 2016 12:13 PM > To: mesa-dev@lists.freedesktop.org > Subject: [Mesa-dev] [PATCH] swr: Remove stall waiting for core query > counters. > > When gathering query results, swr_gather_stats was > unnecessarily stalling the entire pipeline. Results are now > collected asynchronously, with a fence marking completion. > --- > src/gallium/drivers/swr/swr_fence.cpp |6 - > src/gallium/drivers/swr/swr_fence.h |8 ++ > src/gallium/drivers/swr/swr_query.cpp | 180 -- > --- > src/gallium/drivers/swr/swr_query.h | 11 ++- > 4 files changed, 81 insertions(+), 124 deletions(-) > > diff --git a/src/gallium/drivers/swr/swr_fence.cpp > b/src/gallium/drivers/swr/swr_fence.cpp > index 2e95b39..8a8e864 100644 > --- a/src/gallium/drivers/swr/swr_fence.cpp > +++ b/src/gallium/drivers/swr/swr_fence.cpp > @@ -105,12 +105,6 @@ swr_fence_reference(struct pipe_screen *screen, >swr_fence_destroy(old); > } > > -static INLINE boolean > -swr_is_fence_done(struct pipe_fence_handle *fence_handle) > -{ > - struct swr_fence *fence = swr_fence(fence_handle); > - return (fence->read == fence->write); > -} > > /* > * Wait for the fence to finish. > diff --git a/src/gallium/drivers/swr/swr_fence.h > b/src/gallium/drivers/swr/swr_fence.h > index df3776e..47f4d2e 100644 > --- a/src/gallium/drivers/swr/swr_fence.h > +++ b/src/gallium/drivers/swr/swr_fence.h > @@ -45,6 +45,14 @@ swr_fence(struct pipe_fence_handle *fence) > return (struct swr_fence *)fence; > } > > + > +static INLINE boolean > +swr_is_fence_done(struct pipe_fence_handle *fence_handle) > +{ > + struct swr_fence *fence = swr_fence(fence_handle); > + return (fence->read == fence->write); > +} > + > static INLINE boolean > swr_is_fence_pending(struct pipe_fence_handle *fence_handle) > { > diff --git a/src/gallium/drivers/swr/swr_query.cpp > b/src/gallium/drivers/swr/swr_query.cpp > index f038a6e..5c59965 100644 > --- a/src/gallium/drivers/swr/swr_query.cpp > +++ b/src/gallium/drivers/swr/swr_query.cpp > @@ -62,10 +62,8 @@ swr_destroy_query(struct pipe_context *pipe, struct > pipe_query *q) > struct swr_query *pq = swr_query(q); > > if (pq->fence) { > - if (!swr_is_fence_pending(pq->fence)) { > - swr_fence_submit(swr_context(pipe), pq->fence); > + if (swr_is_fence_pending(pq->fence)) > swr_fence_finish(pipe->screen, pq->fence, 0); > - } >swr_fence_reference(pipe->screen, &pq->fence, NULL); > } > > @@ -73,100 +71,45 @@ swr_destroy_query(struct pipe_context *pipe, > struct pipe_query *q) > } > > > -// XXX Create a fence callback, rather than stalling SwrWaitForIdle > static void > swr_gather_stats(struct pipe_context *pipe, struct swr_query *pq) > { > struct swr_context *ctx = swr_context(pipe); > > assert(pq->result); > - union pipe_query_result *result = pq->result; > + struct swr_query_result *result = pq->result; > boolean enable_stats = pq->enable_stats; > - SWR_STATS swr_stats = {0}; > - > - if (pq->fence) { > - if (!swr_is_fence_pending(pq->fence)) { > - swr_fence_submit(ctx, pq->fence); > - swr_fence_finish(pipe->screen, pq->fence, 0); > - } > - swr_fence_reference(pipe->screen, &pq->fence, NULL); > - } > > - /* > -* These queries don't need SWR Stats enabled in the core > -* Set and return. > -*/ > + /* A few results don't require the core, so don't involve it */ > switch (pq->type) { > case PIPE_QUERY_TIMESTAMP: > case PIPE_QUERY_TIME_ELAPSED: > - result->u64 = swr_get_timestamp(pipe->screen); > - return; > + result->timestamp = swr_get_timestamp(pipe->screen); >break; > case PIPE_QUERY_TIMESTAMP_DISJOINT: > - /* nothing to do here */ > - return; > - break; > case PIPE_QUERY_GPU_FINISHED: > - result->b = TRUE; /* XXX TODO Add an api func to SWR to compare > drawId > - vs LastRetiredId? */ > - return; > + /* nothing to do here */ >break; > default: > - /* Any query that needs SwrCore stats */ > - break; > - } > - > - /* > -* All other results are collected from SwrCore counters > -*/ > + /* > + * All other results are collected from SwrCore counters via > + * SwrGetStats. This returns immediately, but results are later filled > + * in by the backend. Fence status is the only indication of > + * completion. */ > + SwrGetStats(ctx->swrContext, &result->core); > + > + if (!pq->fence) { > + struct swr_screen *screen = swr_screen(pipe->screen); > + swr_fence_reference(pipe->screen, &pq->fence, screen- > >flush_fence); > + } > + swr_fence_submit(ctx, pq->fence); > > - /* XXX, Should turn this into a fence
Re: [Mesa-dev] [PATCH 08/11] glapi: Harden GLX request size processing
On Mon, 2016-03-28 at 11:10 -0700, Ian Romanick wrote: > > @@ -428,7 +428,7 @@ class PrintGlxReqSize_h(PrintGlxReqSize_common): > def printBody(self, api): > > for func in api.functionIterateGlx(): > > if not func.ignore and func.has_variable_size_request(): > > -print 'extern PURE _X_HIDDEN int __glX%sReqSize(const > > GLbyte *pc, Bool swap);' % (func.name) > > +print 'extern PURE _X_HIDDEN int __glX%sReqSize(const > > GLbyte *pc, Bool swap, int reqlen);' % (func.name) > > I don't see where reqlen is used. Did I miss something? You didn't. It's only used (or needed) in __glXDrawArraysReqSize, which isn't generated. But since the prototype needs to match... -ajax ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC demos] a mesa equivalent to vulkaninfo
Hi, some weeks ago Dave Airlie made the review for the ARB_internalformat_query2 extension support (again, thanks a lot), and at the end he suggested to create a "mesa-demos glxinfo super query app. Something that does what vulkaninfo kinda does" [1]. As I mentioned at the moment, I had a heavily modified piglit test that was doing something similar when working on the extension. So after some cleaning I have a standalone version of that program. But before start to work to integrate it with mesa demos, and propose it to the list as a patch, I have some doubts, and I hope someone is interested in answering me: * Formatting: right now I decided to format the outcome as csv. So something like this: 64 bit, GL_SAMPLES, GL_RENDERBUFFER, GL_DEPTH_COMPONENT, "8,4" 64 bit, GL_SAMPLES, GL_RENDERBUFFER, GL_DEPTH_STENCIL, "8,4" 64 bit, GL_SAMPLES, GL_RENDERBUFFER, GL_RED, "8,4" 64 bit, GL_SAMPLES, GL_RENDERBUFFER, GL_RG, "8,4" The value between quotes, because the number of possible values is variable (although right now only SAMPLES can return more that one value). The reason is that with so many possible target/internalformat/pname combinations, there isn't a simple way to fit it on a table. I also tried to not repeat the information on any line, but then you need to go back and forth to know what you are seeing. I assume that this format is good enough to dump the outcome to a file if you want to check something in particular, and that any kind of pretty-print is out of the scope of this little demo. I also guess that csv would be good enough if someone wants to implement that pretty-printer as a something external. In any case I'm open to any other kind of formatting. * Getting the gl enum name: ok, now the hack. In order to get the name for all the gl enums I just copied a method from piglit-util-gl-enum-gen.c. That is a generated file. The correct thing would be do as piglit, and generate that source file from the khronos xml. But that sounded like an overkill to me. Additionally that khronos xml is like 2.5M. vulkaninfo try to do some define/# magic, but I don't think that the solution would be better for this case [2], where we have more enums to take into account. * For the initialization I used glut and glew, instead of something more low-level like glxinfo. There are already several mesa-demos using both, so I assumed that it would not be any problem to do the same here. * Right now the name is query2-info. It is not really impressive, but I can't think on anything better. Suggestion are welcome. If someone wants to take a look to the standalone program, you can find it here [3]. Thanks in advance. BR [1] https://lists.freedesktop.org/archives/mesa-dev/2016-March/108954.html [2] https://github.com/LunarG/VulkanTools/blob/master/demos/vulkaninfo.c#L204 [3] https://github.com/infapi00/query2-info ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/59] Initial arb_gpu_shader_fp64 support to the i965 scalar backend
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On 03/05/16 20:59, Kenneth Graunke wrote: > Other than patches 37, 56, and ones you agreed to drop, the series > is: Reviewed-by: Kenneth Graunke > > I think you can go ahead and land all except those, and we can > land new solutions for those problems afterwards. > > We still need to fix the horiz_offset problem, and I think Curro's > subscript() helper is a great way to do that. I would be fine > with fixing that after this series lands - it would probably mean > less rebasing :) > We prepared a patch series without patches 11, 22 (both were dropped), 37, 40 (it has a NACK from Curro) and 56. Our idea was to push the patch series once we check there is no piglit regressions. We got 0 piglit regressions on ILK, SNB, IVB, HSW and BDW, the latter with the extension enabled with the environment variable. Sadly, on SKL we had regressions (including GPU hangs) only if we *enable* the arb_gpu_shader_fp64 extension. Those regressions are fixed by patch 40 but we need still to investigate why they are happening. For that reason, we have not pushed anything yet. Tomorrow we will investigate why we have those regressions only on SKL and which tests are affected. JFTR, if we disable the extension on SKL (i.e. we don't set the environment variable), we have 0 piglit regressions on that generation. Thanks, Sam -BEGIN PGP SIGNATURE- Version: GnuPG v2 iQIcBAEBCAAGBQJXKgPaAAoJEH/0ujLxfcNDtHgP/0JBd96UX2026Y0zBydrE1Ta U7FV5e0SJXY5JnxWqdV4ZhtdiqQ0EDLj2G0NlFq0u29c+lQX//x9KHMANxbt2tdc jStRm4NMNXds4rxFYaiRs+3p4XLquV4aZ33Px+0CT6m6Hjsm6sNJZViarB41SjAq HVRuKAxBbVTjqqPXBKAoXXxp34OUym5uZydXFjavCs1lU70DVwZq9t/vzIAGUunT NCbehScgtdVupTDEzORZ7QRXvGyqQoJa+HBeuEK6HvvhonfezwrB1hawIRGP34V/ /k+UvL/DA3GEenpCqHfOIwnxVRYN6iXYNlHSNK+LolsjR/CUxccBs67tcQiGAPBA g8V02KHbuIN2JyXLSwzJ0ovX5aA+vr5Fuc8e+0PZ7raa7GxIsDg3LbUNnyT8dT83 fQAM6IMFYhZUOKKF8ynqw5tvRAaUhWJwageYqRkdHROSs8toero7RDAabEdBKWhG scGkKEu3S7vsGLMBzeSch/UqyAXdaBcJzRSTKCi2UHyWT/5aPczqf4eacyLgWpa/ 3r+qqzljZ2F8MUbnyP71CyTufx+UQzoBAlB4kM09UcF9RC95XTSQrMBMB/SqR2Om +ebdE/d1e5ubExNQTbQPrb2OGCIiQhRGQo8MFcuFtgl2DabJFvW4QvsuJKinhQLR zcfupbRUPDegesKRU+es =phid -END PGP SIGNATURE- ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 95211] scons TypeError: 'tuple' object is not callable
https://bugs.freedesktop.org/show_bug.cgi?id=95211 Jose Fonseca changed: What|Removed |Added Assignee|mesa-dev@lists.freedesktop. |jfons...@vmware.com |org | --- Comment #1 from Jose Fonseca --- It looks like SCons changed some of its internals. Most likely it won't be an easy fix. I'm not sure when I'll be able to look into this. My suggestion is to use SCons 2.4.x for now. -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCHv2 10/23] i965/fs: Stop using the LOAD_PAYLOAD instruction in lower_simd_width.
Hi Curro, Am 04.05.2016 um 06:26 schrieb Francisco Jerez: > Instead of using the LOAD_PAYLOAD instruction (emitted through the > emit_transpose() helper that is no longer useful and this commit > removes) which had to be marked force_writemask_all in some cases, > emit a series of moves to apply proper channel enable signals to the > destination. Until now lower_simd_width() had mainly been used to > lower things that invariably had a basic block-local temporary as > destination so it didn't seem like a big deal, but I found it to be > the reason for several Piglit regressions in my SIMD32 branch and > Igalia discovered the same issue independently while working on FP64 > support. > --- > This is taken from the following WIP series: > https://cgit.freedesktop.org/~currojerez/mesa/log/?h=i965-late-simd-lowering > > See also: > https://lists.freedesktop.org/archives/mesa-dev/2016-May/115596.html > > src/mesa/drivers/dri/i965/brw_fs.cpp | 58 > +++- > 1 file changed, 18 insertions(+), 40 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index 4b6aa67..34599cb 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -4625,29 +4625,6 @@ get_lowered_simd_width(const struct brw_device_info > *devinfo, > } > } > > -/** > - * The \p rows array of registers represents a \p num_rows by \p num_columns > - * matrix in row-major order, write it in column-major order into the > register > - * passed as destination. \p stride gives the separation between matrix > - * elements in the input in fs_builder::dispatch_width() units. > - */ > -static void > -emit_transpose(const fs_builder &bld, > - const fs_reg &dst, const fs_reg *rows, > - unsigned num_rows, unsigned num_columns, unsigned stride) > -{ > - fs_reg *const components = new fs_reg[num_rows * num_columns]; > - > - for (unsigned i = 0; i < num_columns; ++i) { > - for (unsigned j = 0; j < num_rows; ++j) > - components[num_rows * i + j] = offset(rows[j], bld, stride * i); > - } > - > - bld.LOAD_PAYLOAD(dst, components, num_rows * num_columns, 0); > - > - delete[] components; > -} > - > bool > fs_visitor::lower_simd_width() > { > @@ -4698,16 +4675,19 @@ fs_visitor::lower_simd_width() > if (inst->src[j].file != BAD_FILE && > !is_uniform(inst->src[j])) { >/* Get the i-th copy_width-wide chunk of the source. */ > - const fs_reg src = horiz_offset(inst->src[j], copy_width * > i); > + const fs_builder cbld = lbld.group(copy_width, 0); > + const fs_reg src = offset(inst->src[j], cbld, i); >const unsigned src_size = inst->components_read(j); > > - /* Use a trivial transposition to copy one every n > - * copy_width-wide components of the register into a > - * temporary passed as source to the lowered instruction. > + /* Copy one every n copy_width-wide components of the That first sentence is really unclear. Shouldnt it sound "Copy each copy_width-wide component of..." Michael > + * register into a temporary passed as source to the > lowered > + * instruction. > */ >split_inst.src[j] = lbld.vgrf(inst->src[j].type, src_size); > - emit_transpose(lbld.group(copy_width, 0), > - split_inst.src[j], &src, 1, src_size, n); > + > + for (unsigned k = 0; k < src_size; ++k) > + cbld.MOV(offset(split_inst.src[j], lbld, k), > + offset(src, cbld, n * k)); > } > } > > @@ -4726,20 +4706,18 @@ fs_visitor::lower_simd_width() > } > > if (inst->regs_written) { > -/* Distance between useful channels in the temporaries, skipping > - * garbage if the lowered instruction is wider than the original. > - */ > -const unsigned m = lower_width / copy_width; > +const fs_builder lbld = ibld.group(lower_width, 0); > > /* Interleave the components of the result from the lowered > - * instructions. We need to set exec_all() when copying more > than > - * one half per component, because LOAD_PAYLOAD (in terms of > which > - * emit_transpose is implemented) can only use the same channel > - * enable signals for all of its non-header sources. > + * instructions. > */ > -emit_transpose(ibld.exec_all(inst->exec_size > copy_width) > - .group(copy_width, 0), > - inst->dst, dsts, n, dst_size, m); > +for (unsigned i = 0; i < dst_size; ++i) { > +
Re: [Mesa-dev] [PATCH v2 03/25] anv: tweak the %.json rule
On fredag 22. april 2016 19.55.10 CEST Emil Velikov wrote: > From: Emil Velikov > > It's used only by dev_icd.json so just call it that way. While we're > here, manually expand $< (as it might cause issue on some systems) > and drop the unneeded install_libdir substitution. > > Cc: Jason Ekstrand > Signed-off-by: Emil Velikov > --- > This commit replaces "anv: remove no longer needed %.json rule" > --- > src/intel/vulkan/Makefile.am | 7 +++ > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/src/intel/vulkan/Makefile.am b/src/intel/vulkan/Makefile.am > index 916f171..6d93906 100644 > --- a/src/intel/vulkan/Makefile.am > +++ b/src/intel/vulkan/Makefile.am > @@ -190,11 +190,10 @@ icdconf_DATA = intel_icd.json > # The following is used for development purposes, by setting > VK_ICD_FILENAMES. noinst_DATA = dev_icd.json > > -%.json : %.json.in > +dev_icd.json : dev_icd.json.in > $(AM_V_GEN) $(SED) \ > - -e "s#@build_libdir@#${abs_top_builddir}/${LIB_DIR}#" \ > - -e "s#@install_libdir@#${libdir}#" < $< > $@ > - > + -e "s#@build_libdir@#${abs_top_builddir}/${LIB_DIR}#" > + < $(srcdir)/dev_icd.json.in > $@ > > # Libvulkan with dummy gem. Used for unit tests. Hello. This patch breaks the build for me, as it hangs on the sed line. I've sent a small patch that fixes the issue. -- Knut ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] anv: fix hang during generation of dev_icd.json.
--- src/intel/vulkan/Makefile.am | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/intel/vulkan/Makefile.am b/src/intel/vulkan/Makefile.am index fd8aae0..662d720 100644 --- a/src/intel/vulkan/Makefile.am +++ b/src/intel/vulkan/Makefile.am @@ -156,7 +156,7 @@ noinst_DATA = dev_icd.json dev_icd.json : dev_icd.json.in $(AM_V_GEN) $(SED) \ - -e "s#@build_libdir@#${abs_top_builddir}/${LIB_DIR}#" + -e "s#@build_libdir@#${abs_top_builddir}/${LIB_DIR}#" \ < $(srcdir)/dev_icd.json.in > $@ # Libvulkan with dummy gem. Used for unit tests. -- 2.8.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/15] i965: get the proper vertex surface type for doubles on gen8+
On 04/05/16 12:31, Kenneth Graunke wrote: > On Thursday, April 28, 2016 1:40:31 PM PDT Antia Puentes wrote: >> From: Alejandro Piñeiro >> >> This commit adds support for PASSTHRU format when pushing >> double-precision attributes. >> >> Check glarray->Doubles in order to know if we should choose a format >> that does a conversion to float, or just passthru the 64-bit double. >> --- >> src/mesa/drivers/dri/i965/brw_draw_upload.c | 30 + > +--- >> 1 file changed, 27 insertions(+), 3 deletions(-) > What's the motivation for using the PASSTHRU formats? Is there some > advantage over just using the R64*FLOAT formats? We can't use R64*FLOAT because it does a double to float conversion, as explained at the bdw PRM (vol 7, section 10.15, page 470, Table "Source Element Formats Supported in VF Unit"). PASSTHRU ensures that the 64-bits are passed without any conversion, that is what we want. If it is not clear reading the commit message, I could expand it. FWIW, the R64*FLOAT format is used when calling the previous API (without *L*). You can call glVertexAttribPointer using GL_DOUBLE as type, but it will feed the shaders with floats. That conversion is done using R64*FLOAT formats when uploading the data. When using glVertexAttrib with GL_DOUBLE, the casting to float is done on the call itself, at mesa/main. Finally, the alternative to R64*PASSTHRU formats would be using R32*FLOAT formats, and doing two uploads if needed (for dvec3/dvec4). We needed to do that for the in-progress gen7 support, and means some extra code compared to just use PASSTHRU. If you are curious: https://github.com/Igalia/mesa/commit/7d4b0c5cff33a3b555a9c5d38b1af0aeba697cac Best regards ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] swr: Fix resource leak in object.
On 05/04/2016 08:52 AM, Rowley, Timothy O wrote: After shader compilation, the gallivm_state is owned by ShaderVariant<> and destroyed by its destructor (swr_state.h) so that the generated JIT code and gallivm_state can be freed when no longer needed. Oh. Splendid. Thanks for looking into it. Rob. On May 3, 2016, at 5:45 PM, robert.f...@collabora.com wrote: From: Robert Foss Make sure that memory allocated is free'd. Previously only the contents of the variable galliumvm was free'd, not the actual memory it points to. Coverity: 1358907 Signed-off-by: Robert Foss --- src/gallium/drivers/swr/swr_shader.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/src/gallium/drivers/swr/swr_shader.cpp b/src/gallium/drivers/swr/swr_shader.cpp index f693f51..5b1b1ee 100644 --- a/src/gallium/drivers/swr/swr_shader.cpp +++ b/src/gallium/drivers/swr/swr_shader.cpp @@ -134,6 +134,7 @@ struct BuilderSWR : public Builder { ~BuilderSWR() { gallivm_free_ir(gallivm); + FREE(gallivm); } struct gallivm_state *gallivm; -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3] vulkan: Fix use of uninitialized variable.
From: Robert Foss The return variable was not set for failure paths. It has now been changed to VK_ERROR_INITIALIZATION_FAILED for failure paths. Coverity: 1358944 Reviewed-by: Eric Engestrom Signed-off-by: Robert Foss --- Changes since v1: - Started using vk_error(). Changes since v2: - Added Reviewed-by: Eric Engestrom src/intel/vulkan/anv_wsi_wayland.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/intel/vulkan/anv_wsi_wayland.c b/src/intel/vulkan/anv_wsi_wayland.c index 6f25eaf..43606f6 100644 --- a/src/intel/vulkan/anv_wsi_wayland.c +++ b/src/intel/vulkan/anv_wsi_wayland.c @@ -776,12 +776,16 @@ wsi_wl_surface_create_swapchain(VkIcdSurfaceBase *icd_surface, chain->queue = NULL; chain->display = wsi_wl_get_display(device->instance, surface->display); - if (!chain->display) + if (!chain->display) { + result = vk_error(VK_ERROR_INITIALIZATION_FAILED); goto fail; + } chain->queue = wl_display_create_queue(chain->display->display); - if (!chain->queue) + if (!chain->queue) { + result = vk_error(VK_ERROR_INITIALIZATION_FAILED); goto fail; + } for (uint32_t i = 0; i < chain->image_count; i++) { result = wsi_wl_image_init(chain, &chain->images[i], pAllocator); -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] vulkan: Fix use of uninitialized variable.
From: Robert Foss The return variable was not set for failure paths. It has now been changed to VK_ERROR_INITIALIZATION_FAILED for failure paths. Coverity: 1358944 Signed-off-by: Robert Foss --- Changes since v1: - Started using vk_error(). src/intel/vulkan/anv_wsi_wayland.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/intel/vulkan/anv_wsi_wayland.c b/src/intel/vulkan/anv_wsi_wayland.c index 6f25eaf..43606f6 100644 --- a/src/intel/vulkan/anv_wsi_wayland.c +++ b/src/intel/vulkan/anv_wsi_wayland.c @@ -776,12 +776,16 @@ wsi_wl_surface_create_swapchain(VkIcdSurfaceBase *icd_surface, chain->queue = NULL; chain->display = wsi_wl_get_display(device->instance, surface->display); - if (!chain->display) + if (!chain->display) { + result = vk_error(VK_ERROR_INITIALIZATION_FAILED); goto fail; + } chain->queue = wl_display_create_queue(chain->display->display); - if (!chain->queue) + if (!chain->queue) { + result = vk_error(VK_ERROR_INITIALIZATION_FAILED); goto fail; + } for (uint32_t i = 0; i < chain->image_count; i++) { result = wsi_wl_image_init(chain, &chain->images[i], pAllocator); -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] swr: Fix resource leak in object.
After shader compilation, the gallivm_state is owned by ShaderVariant<> and destroyed by its destructor (swr_state.h) so that the generated JIT code and gallivm_state can be freed when no longer needed. > On May 3, 2016, at 5:45 PM, robert.f...@collabora.com wrote: > > From: Robert Foss > > Make sure that memory allocated is free'd. > Previously only the contents of the variable > galliumvm was free'd, not the actual memory > it points to. > > Coverity: 1358907 > Signed-off-by: Robert Foss > --- > src/gallium/drivers/swr/swr_shader.cpp | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/gallium/drivers/swr/swr_shader.cpp > b/src/gallium/drivers/swr/swr_shader.cpp > index f693f51..5b1b1ee 100644 > --- a/src/gallium/drivers/swr/swr_shader.cpp > +++ b/src/gallium/drivers/swr/swr_shader.cpp > @@ -134,6 +134,7 @@ struct BuilderSWR : public Builder { > >~BuilderSWR() { > gallivm_free_ir(gallivm); > + FREE(gallivm); >} > >struct gallivm_state *gallivm; > -- > 2.5.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure.ac: error out when building from git without python3
On 4 May 2016 at 13:08, Ilia Mirkin wrote: > Only if building Intel vulkan, no? I believe the idea is to move from python2 so this will do no harm as is. > Could that script just be fixed to work > with python2 BTW? > Possibly. The above 'moving away from python2' topic still applies imho. If we have a brave soul to port it and (more importantly) others are happy I would love to drop this patch. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure.ac: error out when building from git without python3
Only if building Intel vulkan, no? Could that script just be fixed to work with python2 BTW? On May 4, 2016 6:44 AM, "Emil Velikov" wrote: > From: Emil Velikov > > Bail early, as opposed to later on during the build. > > Signed-off-by: Emil Velikov > --- > configure.ac | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/configure.ac b/configure.ac > index 5f75c60..0932871 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -142,6 +142,12 @@ else > fi > fi > > +if test -z "$PYTHON3"; then > +if test ! -f "$srcdir/src/intel/genxml/gen9_pack.h"; then > +AC_MSG_ERROR([Python3 not found - unable to generate sources]) > +fi > +fi > + > AC_PROG_INSTALL > > dnl We need a POSIX shell for parts of the build. Assume we have one > -- > 2.6.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/59] freedreno/ir3: lower lrp when operating with double operands
On Fri, Apr 29, 2016 at 7:29 AM, Samuel Iglesias Gonsálvez wrote: > Lower lrp when operating with double operands because float version of > lrp is also lowered. > > Signed-off-by: Samuel Iglesias Gonsálvez > CC: Rob Clark > --- > src/gallium/drivers/freedreno/ir3/ir3_nir.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/gallium/drivers/freedreno/ir3/ir3_nir.c > b/src/gallium/drivers/freedreno/ir3/ir3_nir.c > index 364e92b..0635dfb 100644 > --- a/src/gallium/drivers/freedreno/ir3/ir3_nir.c > +++ b/src/gallium/drivers/freedreno/ir3/ir3_nir.c > @@ -43,6 +43,7 @@ ir3_tgsi_to_nir(const struct tgsi_token *tokens) > .lower_fsat = true, > .lower_scmp = true, > .lower_flrp32 = true, > + .lower_flrp64 = true, > .lower_ffract = true, > .native_integers = true, > .vertex_id_zero_based = true, I guess there are probably other things missing for fp64 support, but I think we probably want to figure that out at some point. According to opencl level supported by a4xx+ it should support doubles, but I'm not sure yet if that is just via lowering. Anyways, I guess this makes sense for now, so: Reviewed-by: Rob Clark > -- > 2.5.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 12/15] i965: abort linking if URB read length greater than 15
On Wed, 2016-05-04 at 03:43 -0700, Kenneth Graunke wrote: > On Thursday, April 28, 2016 1:40:42 PM PDT Antia Puentes wrote: > > > > From: "Juan A. Suarez Romero" > > > > In scalar mode, URB read length limit is 15. Abort if we go beyond > > it. > > --- > > src/mesa/drivers/dri/i965/brw_vec4.cpp | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp > > b/src/mesa/drivers/dri/ > i965/brw_vec4.cpp > > > > index 9816f0d..04287fb 100644 > > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp > > @@ -2148,6 +2148,14 @@ brw_compile_vs(const struct brw_compiler > > *compiler, > void *log_data, > > > > prog_data->base.urb_read_length = > > DIV_ROUND_UP(MAX2(nr_attribute_slots, 1), 2); > > > > + if (is_scalar && prog_data->base.urb_read_length > 15) { > > + if (error_str) > > + *error_str = ralloc_strdup(mem_ctx, > > +"Too many attributes. Try to > > reduce the > " > > > > +"number of attributes or their > > size"); > > + return NULL; > > + } > > + > > prog_data->nr_attributes = nr_attributes; > > prog_data->nr_attribute_slots = nr_attribute_slots; > > > > > Have you hit this limit? I'm probably doing my math wrong, but it > doesn't seem like we should ever hit it... > Yes. We have a piglit test (not in master yet) that hits the limit. You can find it in http://pastebin.com/k96EyYun This links with a comment made by Ian[1] in a different patch, but that also applies here. https://lists.freedesktop.org/archives/mesa-dev/2016-April/114885.html dvec3/dmat*3/dvec4/dmat*4 can be counted as consuming the same attributes as the single precision version, or consuming twice. Current code follows the former option. If we switch to the second, then we won't hit this limit. But this has other cons, as explained in the thread. > I'm not sure we can legally just say no - we may need to resort to > pulling URB data... > Maybe. Another option could be to switch to vec4... if gen8+vec4 were working. As this seems an extreme case (when we use lot of attributes), I thought that dealing with it was more like a task to improve in the future. J.A. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 02/15] i965: Configure how to store *64*PASSTHRU vertex components
Hi Kenneth, thanks for reviewing. On mié, 2016-05-04 at 03:36 -0700, Kenneth Graunke wrote: > On Thursday, April 28, 2016 1:40:32 PM PDT Antia Puentes wrote: > > > > From the Broadwell specification, structure VERTEX_ELEMENT_STATE > > description: > > > > "When SourceElementFormat is set to one of the *64*_PASSTHRU > > formats, 64-bit components are stored in the URB without any > > conversion. In this case, vertex elements must be written as 128 > > or 256 bits, with VFCOMP_STORE_0 being used to pad the output > > as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red > component into > > > > the URB, Component 1 must be specified as VFCOMP_STORE_0 (with > > Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit > > vertex element, or Components 1-3 must be specified as VFCOMP_STORE_0 > > in order to output a 256-bit vertex element. Likewise, use of > > R64G64B64_PASSTHRU requires Component 3 to be specified as > VFCOMP_STORE_0 > > > > in order to output a 256-bit vertex element." > > > > Uses 128-bits to write double and dvec2 vertex elements, and 256-bits for > > dvec3 and dvec4 vertex elements. > > > > Signed-off-by: Juan A. Suarez Romero > > Signed-off-by: Antia Puentes > > --- > > src/mesa/drivers/dri/i965/gen8_draw_upload.c | 34 + > +++ > > > > 1 file changed, 34 insertions(+) > > > > diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/ > drivers/dri/i965/gen8_draw_upload.c > > > > index fe5ed35..14f7091 100644 > > --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c > > +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c > > @@ -217,6 +217,40 @@ gen8_emit_vertices(struct brw_context *brw) > > break; > > } > > > > + /* From the BDW PRM, Volume 2d, page 586 (VERTEX_ELEMENT_STATE): > > + * > > + * "When SourceElementFormat is set to one of the *64*_PASSTHRU > > + * formats, 64-bit components are stored in the URB without any > > + * conversion. In this case, vertex elements must be written as > 128 > > > > + * or 256 bits, with VFCOMP_STORE_0 being used to pad the output > > + * as required. E.g., if R64_PASSTHRU is used to copy a 64-bit > Red > > > > + * component into the URB, Component 1 must be specified as > > + * VFCOMP_STORE_0 (with Components 2,3 set to VFCOMP_NOSTORE) > > + * in order to output a 128-bit vertex element, or Components > > 1-3 > must > > > > + * be specified as VFCOMP_STORE_0 in order to output a 256-bit > vertex > > > > + * element. Likewise, use of R64G64B64_PASSTHRU requires > Component 3 > > > > + * to be specified as VFCOMP_STORE_0 in order to output a > > 256-bit > vertex > > > > + * element." > > + */ > The above comment seems to indicate that R64 needs component 1 set to > STORE_0, and that looks to be missing. I'm guessing you need to add: > > > > > + if (input->glarray->Doubles) { > > + switch (input->glarray->Size) { > > + case 0: > comp0 = BRW_VE1_COMPONENT_STORE_0; > /* fallthrough */ > > > > > + case 1: > comp1 = BRW_VE1_COMPONENT_STORE_0; > /* fallthrough */ > I have not added them because comp0 and comp1 are initialized in the code immediately above the switch I added. That already existing code sets the default values for the components: switch (input->glarray->Size) { case 0: comp0 = BRW_VE1_COMPONENT_STORE_0; case 1: comp1 = BRW_VE1_COMPONENT_STORE_0; case 2: comp2 = BRW_VE1_COMPONENT_STORE_0; case 3: comp3 = input->glarray->Integer ? BRW_VE1_COMPONENT_STORE_1_INT : BRW_VE1_COMPONENT_STORE_1_FLT; break; } In my switch I just override the values that are not right for the double-precision floating point cases. > > + case 2: > > +/* Use 128-bits instead of 256-bits to write double and dvec2 > > + * vertex elements. > > + */ > > +comp2 = BRW_VE1_COMPONENT_NOSTORE; > > +comp3 = BRW_VE1_COMPONENT_NOSTORE; > > +break; > > + case 3: > > +/* Pad the output using VFCOMP_STORE_0 as suggested > > + * by the BDW PRM. > > + */ > > +comp3 = BRW_VE1_COMPONENT_STORE_0; > > + } > > + } > > + > > OUT_BATCH((input->buffer << GEN6_VE0_INDEX_SHIFT) | > > GEN6_VE0_VALID | > > (format << BRW_VE0_FORMAT_SHIFT) | > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev