Re: [Mesa-dev] [PATCH 30/33] anv: Use blorp for CopyBuffer and UpdateBuffer
On Fri, Sep 9, 2016 at 5:47 PM, Nanley Cherywrote: > On Wed, Aug 31, 2016 at 02:22:49PM -0700, Jason Ekstrand wrote: > > --- > > src/intel/vulkan/Makefile.sources | 1 - > > src/intel/vulkan/anv_blorp.c | 184 ++ > > > src/intel/vulkan/anv_meta_copy.c | 180 -- > --- > > 3 files changed, 184 insertions(+), 181 deletions(-) > > delete mode 100644 src/intel/vulkan/anv_meta_copy.c > > > > diff --git a/src/intel/vulkan/Makefile.sources > b/src/intel/vulkan/Makefile.sources > > index 35e15f6..6c9853b 100644 > > --- a/src/intel/vulkan/Makefile.sources > > +++ b/src/intel/vulkan/Makefile.sources > > @@ -35,7 +35,6 @@ VULKAN_FILES := \ > > anv_meta.h \ > > anv_meta_blit2d.c \ > > anv_meta_clear.c \ > > - anv_meta_copy.c \ > > anv_meta_resolve.c \ > > anv_nir.h \ > > anv_nir_apply_dynamic_offsets.c \ > > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c > > index 89ff3b3..e2b6672 100644 > > --- a/src/intel/vulkan/anv_blorp.c > > +++ b/src/intel/vulkan/anv_blorp.c > > @@ -479,3 +479,187 @@ void anv_CmdBlitImage( > > > > blorp_batch_finish(); > > } > > + > > +static void > > +do_buffer_copy(struct blorp_batch *batch, > > + struct anv_bo *src, uint64_t src_offset, > > + struct anv_bo *dst, uint64_t dst_offset, > > + int width, int height, int block_size) > > +{ > > + struct anv_device *device = batch->blorp->driver_ctx; > > + > > + /* The actual format we pick doesn't matter as blorp will throw it > away. > > +* The only thing that actually matters is the size. > > +*/ > > + enum isl_format format; > > + switch (block_size) { > > + case 1: format = ISL_FORMAT_R8_UINT; break; > > + case 2: format = ISL_FORMAT_R8G8_UINT;break; > > + case 4: format = ISL_FORMAT_R8G8B8A8_UNORM; break; > > + case 8: format = ISL_FORMAT_R16G16B16A16_UNORM; break; > > + case 16: format = ISL_FORMAT_R32G32B32A32_UINT;break; > > + default: > > + unreachable("Not a power-of-two format size"); > > + } > > + > > + struct isl_surf surf; > > + isl_surf_init(>isl_dev, , > > + .dim = ISL_SURF_DIM_2D, > > + .format = format, > > + .width = width, > > + .height = height, > > + .depth = 1, > > + .levels = 1, > > + .array_len = 1, > > + .samples = 1, > > + .usage = ISL_SURF_USAGE_TEXTURE_BIT, > > Shouldn't we also OR in ISL_SURF_USAGE_RENDER_TARGET_BIT? > Yes we should. > > + .tiling_flags = ISL_TILING_LINEAR_BIT); > > + assert(surf.row_pitch == width * block_size); > > + > > + struct blorp_surf src_blorp_surf = { > > + .surf = , > > + .addr = { > > + .buffer = src, > > + .offset = src_offset, > > + }, > > + }; > > + > > + struct blorp_surf dst_blorp_surf = { > > + .surf = , > > + .addr = { > > + .buffer = dst, > > + .offset = dst_offset, > > + }, > > + }; > > + > > + blorp_copy(batch, _blorp_surf, 0, 0, _blorp_surf, 0, 0, > > + 0, 0, 0, 0, width, height); > > +} > > + > > +void anv_CmdCopyBuffer( > > +VkCommandBuffer commandBuffer, > > +VkBuffersrcBuffer, > > +VkBufferdstBuffer, > > +uint32_tregionCount, > > +const VkBufferCopy* pRegions) > > +{ > > + ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer); > > + ANV_FROM_HANDLE(anv_buffer, src_buffer, srcBuffer); > > + ANV_FROM_HANDLE(anv_buffer, dst_buffer, dstBuffer); > > + > > + struct blorp_batch batch; > > + blorp_batch_init(_buffer->device->blorp, , cmd_buffer); > > + > > + for (unsigned r = 0; r < regionCount; r++) { > > + uint64_t src_offset = src_buffer->offset + pRegions[r].srcOffset; > > + uint64_t dst_offset = dst_buffer->offset + pRegions[r].dstOffset; > > + uint64_t copy_size = pRegions[r].size; > > + > > + /* First, we compute the biggest format that can be used with the > > + * given offsets and size. > > + */ > > + int bs = 16; > > + > > + int fs = ffs(src_offset) - 1; > > + if (fs != -1) > > + bs = MIN2(bs, 1 << fs); > > + assert(src_offset % bs == 0); > > I think we could replace this math and some math in > anv_CmdUpdateBuffer() with logic that's a bit simpler: > > /* Get the greatest power-of-two multiple between a number and a > * power-of-two upper-bound. > */ > #define gcm_pow2(val, max_pow2m) (1 << (ffs((val)|(max_pow2m)) - 1)) > That's a good idea. I just pushed to the branch with two new patches for you to look at. One which uses gcm_pow2 and one which sets the usage bits as requested
Re: [Mesa-dev] [PATCH] mesa: Implement ARB_shader_viewport_layer_array for i965
On Fri, Sep 9, 2016 at 8:59 PM, Dylan Bakerwrote: >> Some piglit tests to check the TES behaviour would be nice. > > Yeah, I figured. I'll have to go read up on how tessellation works. While I would hate to discourage you from such a joyous learning experience, if you want the quickest way to success, mash up the AMD_* tests with the ARB_tessellation_shader nop.shader_test or sanity.shader_test. -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Implement ARB_shader_viewport_layer_array for i965
Quoting Ilia Mirkin (2016-09-09 16:40:15) > On Fri, Sep 9, 2016 at 7:14 PM, Dylan Bakerwrote: > > +EXT(ARB_shader_viewport_layer_array , > > ARB_shader_viewport_layer_array, GLL, GLC, x , x , 2015) > > This relies on GL 3.2 at least, so I think this should just be GLC, not GLL. The descriptions of what GLL, GLC, etc do are a bit... vague. I'll change that before I push anything. > > Also, you mention it requires GL 3.1, but really it just requires > GL_ARB_viewport_array (which became core in GL 4.1). I think it'd be > safe to enable this on SNB+. Your call. The spec explicitly calls out that it requires GL 4.1, so I guess the question is whether we want to to honor that requirement or not. > > Core bits are > > Reviewed-by: Ilia Mirkin > > Some piglit tests to check the TES behaviour would be nice. Yeah, I figured. I'll have to go read up on how tessellation works. signature.asc Description: signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: don't preload constants at the beginning of shaders
On 09/10/2016 10:02 AM, Bas Nieuwenhuizen wrote: > Reviewed-by: Bas NieuwenhuizenReviewed-by: Edward O'Callaghan > > On Sat, Sep 10, 2016 at 12:40 AM, Marek Olšák wrote: >> From: Marek Olšák >> >> LLVM can CSE the loads, thus we can always re-load constants before each >> use. The decrease in SGPR spilling is huge. >> >> The best improvements are the dumbest ones. >> >> 26011 shaders in 14651 tests >> Totals: >> SGPRS: 1453346 -> 1251920 (-13.86 %) >> VGPRS: 742576 -> 728421 (-1.91 %) >> Spilled SGPRs: 52298 -> 16644 (-68.17 %) >> Spilled VGPRs: 397 -> 369 (-7.05 %) >> Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread >> Code Size: 36136488 -> 36001064 (-0.37 %) bytes >> LDS: 767 -> 767 (0.00 %) blocks >> Max Waves: 219315 -> 21 (1.33 %) >> --- >> src/gallium/drivers/radeonsi/si_shader.c | 30 +++--- >> 1 file changed, 11 insertions(+), 19 deletions(-) >> >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c >> b/src/gallium/drivers/radeonsi/si_shader.c >> index 0b7de18..08e3cee 100644 >> --- a/src/gallium/drivers/radeonsi/si_shader.c >> +++ b/src/gallium/drivers/radeonsi/si_shader.c >> @@ -1874,26 +1874,33 @@ static LLVMValueRef fetch_constant( >> for (chan = 0; chan < TGSI_NUM_CHANNELS; ++chan) >> values[chan] = fetch_constant(bld_base, reg, type, >> chan); >> >> return lp_build_gather_values(bld_base->base.gallivm, >> values, 4); >> } >> >> buf = reg->Register.Dimension ? reg->Dimension.Index : 0; >> idx = reg->Register.Index * 4 + swizzle; >> >> if (!reg->Register.Indirect && !reg->Dimension.Indirect) { >> + LLVMValueRef c0, c1; >> + >> + c0 = buffer_load_const(ctx, ctx->const_buffers[buf], >> + LLVMConstInt(ctx->i32, idx * 4, 0)); >> + >> if (!tgsi_type_is_64bit(type)) >> - return bitcast(bld_base, type, >> ctx->constants[buf][idx]); >> + return bitcast(bld_base, type, c0); >> else { >> + c1 = buffer_load_const(ctx, ctx->const_buffers[buf], >> + LLVMConstInt(ctx->i32, >> + (idx + 1) * 4, >> 0)); >> return radeon_llvm_emit_fetch_64bit(bld_base, type, >> - >> ctx->constants[buf][idx], >> - >> ctx->constants[buf][idx + 1]); >> + c0, c1); >> } >> } >> >> if (reg->Register.Dimension && reg->Dimension.Indirect) { >> LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, >> SI_PARAM_CONST_BUFFERS); >> LLVMValueRef index; >> index = get_bounded_indirect_index(ctx, >DimIndirect, >>reg->Dimension.Index, >>SI_NUM_CONST_BUFFERS); >> bufp = build_indexed_load_const(ctx, ptr, index); >> @@ -5789,39 +5796,26 @@ static void create_function(struct si_shader_context >> *ctx) >> >> static void preload_constants(struct si_shader_context *ctx) >> { >> struct lp_build_tgsi_context *bld_base = >> >radeon_bld.soa.bld_base; >> struct gallivm_state *gallivm = bld_base->base.gallivm; >> const struct tgsi_shader_info *info = bld_base->info; >> unsigned buf; >> LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, >> SI_PARAM_CONST_BUFFERS); >> >> for (buf = 0; buf < SI_NUM_CONST_BUFFERS; buf++) { >> - unsigned i, num_const = info->const_file_max[buf] + 1; >> - >> - if (num_const == 0) >> + if (info->const_file_max[buf] == -1) >> continue; >> >> - /* Allocate space for the constant values */ >> - ctx->constants[buf] = CALLOC(num_const * 4, >> sizeof(LLVMValueRef)); >> - >> /* Load the resource descriptor */ >> ctx->const_buffers[buf] = >> build_indexed_load_const(ctx, ptr, >> lp_build_const_int32(gallivm, buf)); >> - >> - /* Load the constants, we rely on the code sinking to do the >> rest */ >> - for (i = 0; i < num_const * 4; ++i) { >> - ctx->constants[buf][i] = >> - buffer_load_const(ctx, >> - ctx->const_buffers[buf], >> - lp_build_const_int32(gallivm, i * >> 4)); >> - } >> } >> } >> >> static void preload_shader_buffers(struct si_shader_context *ctx) >> {
Re: [Mesa-dev] [PATCH 30/33] anv: Use blorp for CopyBuffer and UpdateBuffer
On Wed, Aug 31, 2016 at 02:22:49PM -0700, Jason Ekstrand wrote: > --- > src/intel/vulkan/Makefile.sources | 1 - > src/intel/vulkan/anv_blorp.c | 184 > ++ > src/intel/vulkan/anv_meta_copy.c | 180 - > 3 files changed, 184 insertions(+), 181 deletions(-) > delete mode 100644 src/intel/vulkan/anv_meta_copy.c > > diff --git a/src/intel/vulkan/Makefile.sources > b/src/intel/vulkan/Makefile.sources > index 35e15f6..6c9853b 100644 > --- a/src/intel/vulkan/Makefile.sources > +++ b/src/intel/vulkan/Makefile.sources > @@ -35,7 +35,6 @@ VULKAN_FILES := \ > anv_meta.h \ > anv_meta_blit2d.c \ > anv_meta_clear.c \ > - anv_meta_copy.c \ > anv_meta_resolve.c \ > anv_nir.h \ > anv_nir_apply_dynamic_offsets.c \ > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c > index 89ff3b3..e2b6672 100644 > --- a/src/intel/vulkan/anv_blorp.c > +++ b/src/intel/vulkan/anv_blorp.c > @@ -479,3 +479,187 @@ void anv_CmdBlitImage( > > blorp_batch_finish(); > } > + > +static void > +do_buffer_copy(struct blorp_batch *batch, > + struct anv_bo *src, uint64_t src_offset, > + struct anv_bo *dst, uint64_t dst_offset, > + int width, int height, int block_size) > +{ > + struct anv_device *device = batch->blorp->driver_ctx; > + > + /* The actual format we pick doesn't matter as blorp will throw it away. > +* The only thing that actually matters is the size. > +*/ > + enum isl_format format; > + switch (block_size) { > + case 1: format = ISL_FORMAT_R8_UINT; break; > + case 2: format = ISL_FORMAT_R8G8_UINT;break; > + case 4: format = ISL_FORMAT_R8G8B8A8_UNORM; break; > + case 8: format = ISL_FORMAT_R16G16B16A16_UNORM; break; > + case 16: format = ISL_FORMAT_R32G32B32A32_UINT;break; > + default: > + unreachable("Not a power-of-two format size"); > + } > + > + struct isl_surf surf; > + isl_surf_init(>isl_dev, , > + .dim = ISL_SURF_DIM_2D, > + .format = format, > + .width = width, > + .height = height, > + .depth = 1, > + .levels = 1, > + .array_len = 1, > + .samples = 1, > + .usage = ISL_SURF_USAGE_TEXTURE_BIT, Shouldn't we also OR in ISL_SURF_USAGE_RENDER_TARGET_BIT? > + .tiling_flags = ISL_TILING_LINEAR_BIT); > + assert(surf.row_pitch == width * block_size); > + > + struct blorp_surf src_blorp_surf = { > + .surf = , > + .addr = { > + .buffer = src, > + .offset = src_offset, > + }, > + }; > + > + struct blorp_surf dst_blorp_surf = { > + .surf = , > + .addr = { > + .buffer = dst, > + .offset = dst_offset, > + }, > + }; > + > + blorp_copy(batch, _blorp_surf, 0, 0, _blorp_surf, 0, 0, > + 0, 0, 0, 0, width, height); > +} > + > +void anv_CmdCopyBuffer( > +VkCommandBuffer commandBuffer, > +VkBuffersrcBuffer, > +VkBufferdstBuffer, > +uint32_tregionCount, > +const VkBufferCopy* pRegions) > +{ > + ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer); > + ANV_FROM_HANDLE(anv_buffer, src_buffer, srcBuffer); > + ANV_FROM_HANDLE(anv_buffer, dst_buffer, dstBuffer); > + > + struct blorp_batch batch; > + blorp_batch_init(_buffer->device->blorp, , cmd_buffer); > + > + for (unsigned r = 0; r < regionCount; r++) { > + uint64_t src_offset = src_buffer->offset + pRegions[r].srcOffset; > + uint64_t dst_offset = dst_buffer->offset + pRegions[r].dstOffset; > + uint64_t copy_size = pRegions[r].size; > + > + /* First, we compute the biggest format that can be used with the > + * given offsets and size. > + */ > + int bs = 16; > + > + int fs = ffs(src_offset) - 1; > + if (fs != -1) > + bs = MIN2(bs, 1 << fs); > + assert(src_offset % bs == 0); I think we could replace this math and some math in anv_CmdUpdateBuffer() with logic that's a bit simpler: /* Get the greatest power-of-two multiple between a number and a * power-of-two upper-bound. */ #define gcm_pow2(val, max_pow2m) (1 << (ffs((val)|(max_pow2m)) - 1)) bs = gcm_pow2(src_offset, bs); assert(src_offset % bs == 0); > + > + fs = ffs(dst_offset) - 1; > + if (fs != -1) > + bs = MIN2(bs, 1 << fs); > + assert(dst_offset % bs == 0); > + > + fs = ffs(pRegions[r].size) - 1; > + if (fs != -1) > + bs = MIN2(bs, 1 << fs); > + assert(pRegions[r].size % bs == 0); > + > + /* This is maximum possible width/height our HW can handle */ > + uint64_t max_surface_dim = 1 << 14; > + > + /* First,
Re: [Mesa-dev] [PATCH] radeonsi: don't preload constants at the beginning of shaders
Reviewed-by: Bas NieuwenhuizenOn Sat, Sep 10, 2016 at 12:40 AM, Marek Olšák wrote: > From: Marek Olšák > > LLVM can CSE the loads, thus we can always re-load constants before each > use. The decrease in SGPR spilling is huge. > > The best improvements are the dumbest ones. > > 26011 shaders in 14651 tests > Totals: > SGPRS: 1453346 -> 1251920 (-13.86 %) > VGPRS: 742576 -> 728421 (-1.91 %) > Spilled SGPRs: 52298 -> 16644 (-68.17 %) > Spilled VGPRs: 397 -> 369 (-7.05 %) > Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread > Code Size: 36136488 -> 36001064 (-0.37 %) bytes > LDS: 767 -> 767 (0.00 %) blocks > Max Waves: 219315 -> 21 (1.33 %) > --- > src/gallium/drivers/radeonsi/si_shader.c | 30 +++--- > 1 file changed, 11 insertions(+), 19 deletions(-) > > diff --git a/src/gallium/drivers/radeonsi/si_shader.c > b/src/gallium/drivers/radeonsi/si_shader.c > index 0b7de18..08e3cee 100644 > --- a/src/gallium/drivers/radeonsi/si_shader.c > +++ b/src/gallium/drivers/radeonsi/si_shader.c > @@ -1874,26 +1874,33 @@ static LLVMValueRef fetch_constant( > for (chan = 0; chan < TGSI_NUM_CHANNELS; ++chan) > values[chan] = fetch_constant(bld_base, reg, type, > chan); > > return lp_build_gather_values(bld_base->base.gallivm, values, > 4); > } > > buf = reg->Register.Dimension ? reg->Dimension.Index : 0; > idx = reg->Register.Index * 4 + swizzle; > > if (!reg->Register.Indirect && !reg->Dimension.Indirect) { > + LLVMValueRef c0, c1; > + > + c0 = buffer_load_const(ctx, ctx->const_buffers[buf], > + LLVMConstInt(ctx->i32, idx * 4, 0)); > + > if (!tgsi_type_is_64bit(type)) > - return bitcast(bld_base, type, > ctx->constants[buf][idx]); > + return bitcast(bld_base, type, c0); > else { > + c1 = buffer_load_const(ctx, ctx->const_buffers[buf], > + LLVMConstInt(ctx->i32, > + (idx + 1) * 4, > 0)); > return radeon_llvm_emit_fetch_64bit(bld_base, type, > - > ctx->constants[buf][idx], > - > ctx->constants[buf][idx + 1]); > + c0, c1); > } > } > > if (reg->Register.Dimension && reg->Dimension.Indirect) { > LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, > SI_PARAM_CONST_BUFFERS); > LLVMValueRef index; > index = get_bounded_indirect_index(ctx, >DimIndirect, >reg->Dimension.Index, >SI_NUM_CONST_BUFFERS); > bufp = build_indexed_load_const(ctx, ptr, index); > @@ -5789,39 +5796,26 @@ static void create_function(struct si_shader_context > *ctx) > > static void preload_constants(struct si_shader_context *ctx) > { > struct lp_build_tgsi_context *bld_base = > >radeon_bld.soa.bld_base; > struct gallivm_state *gallivm = bld_base->base.gallivm; > const struct tgsi_shader_info *info = bld_base->info; > unsigned buf; > LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, > SI_PARAM_CONST_BUFFERS); > > for (buf = 0; buf < SI_NUM_CONST_BUFFERS; buf++) { > - unsigned i, num_const = info->const_file_max[buf] + 1; > - > - if (num_const == 0) > + if (info->const_file_max[buf] == -1) > continue; > > - /* Allocate space for the constant values */ > - ctx->constants[buf] = CALLOC(num_const * 4, > sizeof(LLVMValueRef)); > - > /* Load the resource descriptor */ > ctx->const_buffers[buf] = > build_indexed_load_const(ctx, ptr, > lp_build_const_int32(gallivm, buf)); > - > - /* Load the constants, we rely on the code sinking to do the > rest */ > - for (i = 0; i < num_const * 4; ++i) { > - ctx->constants[buf][i] = > - buffer_load_const(ctx, > - ctx->const_buffers[buf], > - lp_build_const_int32(gallivm, i * 4)); > - } > } > } > > static void preload_shader_buffers(struct si_shader_context *ctx) > { > struct gallivm_state *gallivm = >radeon_bld.gallivm; > LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, > SI_PARAM_SHADER_BUFFERS); > int buf, maxbuf; > > maxbuf =
Re: [Mesa-dev] [PATCH] mesa: Implement ARB_shader_viewport_layer_array for i965
On Fri, Sep 9, 2016 at 7:14 PM, Dylan Bakerwrote: > +EXT(ARB_shader_viewport_layer_array , > ARB_shader_viewport_layer_array, GLL, GLC, x , x , 2015) This relies on GL 3.2 at least, so I think this should just be GLC, not GLL. Also, you mention it requires GL 3.1, but really it just requires GL_ARB_viewport_array (which became core in GL 4.1). I think it'd be safe to enable this on SNB+. Your call. Core bits are Reviewed-by: Ilia Mirkin Some piglit tests to check the TES behaviour would be nice. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Check about recent changes in amdgpu
Hi Dave, while checking mesa-dev with android build, I'm porting the necessary changes to to have amd/addrlib in Android build, but I have also encountered a building error related to commit https://cgit.freedesktop.org/mesa/mesa/commit/?id=1add3562e33f0234da50e54dda8cfa6dac613125 In the attachment the full build log, under my signatue short error log. If I revert 1add3562e33f0234da50e54dda8cfa6dac613125 and after that I modify src/amd/common/amdgpu_id.h in this way... -#include "util/u_endian.h" +#include "pipe/p_config.h" ...I can build mesa and complete Android 7.0, build, so there may be some part missing in the new util/u_endian.h header compared to the old one. Would you like to check when you have time, please? Pardon me if I may have wrote something wrong, beacuse I'm more involved android build system than in coding. After your feedback, I will submit the final version of patch to fix android build for review, while in the following brach I have the workaround I've mentioned: https://github.com/maurossi/mesa/commits/12.1.0devel_nougat-x86_10-sep-2016 Mauro Rossi issor.or...@gmail.com In file included from external/mesa/src/amd/addrlib/r800/ciaddrlib.cpp:36: external/mesa/src/amd/addrlib/inc/chip/r800/si_gb_reg.h:36:2: error: "BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined" #error "BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined" ^ external/mesa/src/amd/addrlib/inc/chip/r800/si_gb_reg.h:91:6: error: unknown type name 'GB_ADDR_CONFIG_T' GB_ADDR_CONFIG_T f; ^ external/mesa/src/amd/addrlib/inc/chip/r800/si_gb_reg.h:146:6: error: unknown type name 'GB_TILE_MODE_T' GB_TILE_MODE_T f; ^ external/mesa/src/amd/addrlib/inc/chip/r800/si_gb_reg.h:151:6: error: unknown type name 'GB_MACROTILE_MODE_T' GB_MACROTILE_MODE_T f; ^ 4 errors generated. [ 20% 391/1939] target C++: libmesa_amdgpu_addrlib <= external/mesa/src/amd/addrlib/r800/egbaddrlib.cpp ninja: build stopped: subcommand failed. build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed make: *** [ninja_wrapper] Error 1 make failed to build some targets (01:33 (mm:ss)) utente@utente-MS-7576:~/nougat-x86$ make -j5 iso_img TARGET_KERNEL_CONFIG=480_defconfig FOR_RELEASE=1 PLATFORM_VERSION_CODENAME=REL PLATFORM_VERSION=7.0 TARGET_PRODUCT=android_x86 TARGET_BUILD_VARIANT=eng TARGET_BUILD_TYPE=release TARGET_BUILD_APPS= TARGET_ARCH=x86 TARGET_ARCH_VARIANT=x86 TARGET_CPU_VARIANT= TARGET_2ND_ARCH= TARGET_2ND_ARCH_VARIANT= TARGET_2ND_CPU_VARIANT= HOST_ARCH=x86_64 HOST_2ND_ARCH=x86 HOST_OS=linux HOST_OS_EXTRA=Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial HOST_CROSS_OS=windows HOST_CROSS_ARCH=x86 HOST_CROSS_2ND_ARCH=x86_64 HOST_BUILD_TYPE=release BUILD_ID=NRD90M OUT_DIR=out Running kati to generate build-android_x86.ninja... external/mesa/src/gallium/winsys/amdgpu/drm/Android.mk was modified, regenerating... PLATFORM_VERSION_CODENAME=REL PLATFORM_VERSION=7.0 TARGET_PRODUCT=android_x86 TARGET_BUILD_VARIANT=eng TARGET_BUILD_TYPE=release TARGET_BUILD_APPS= TARGET_ARCH=x86 TARGET_ARCH_VARIANT=x86 TARGET_CPU_VARIANT= TARGET_2ND_ARCH= TARGET_2ND_ARCH_VARIANT= TARGET_2ND_CPU_VARIANT= HOST_ARCH=x86_64 HOST_2ND_ARCH=x86 HOST_OS=linux HOST_OS_EXTRA=Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial HOST_CROSS_OS=windows HOST_CROSS_ARCH=x86 HOST_CROSS_2ND_ARCH=x86_64 HOST_BUILD_TYPE=release BUILD_ID=NRD90M OUT_DIR=out including ./abi/cpp/Android.mk ... including ./art/Android.mk ... including ./bionic/Android.mk ... including ./bootable/newinstaller/Android.mk ... including ./bootable/recovery/Android.mk ... including ./build/libs/host/Android.mk ... including ./build/target/board/Android.mk ... including ./build/target/product/security/Android.mk ... including ./build/tools/Android.mk ... including ./dalvik/Android.mk ... including ./development/apps/BluetoothDebug/Android.mk ... including ./development/apps/BuildWidget/Android.mk ... including ./development/apps/CustomLocale/Android.mk ... including ./development/apps/Development/Android.mk ... including ./development/apps/DevelopmentSettings/Android.mk ... including ./development/apps/Fallback/Android.mk ... including ./development/apps/GestureBuilder/Android.mk ... including ./development/apps/NinePatchLab/Android.mk ... including ./development/apps/OBJViewer/Android.mk ... including ./development/apps/SdkSetup/Android.mk ... including ./development/apps/SettingInjectorSample/Android.mk ... including ./development/apps/WidgetPreview/Android.mk ... including ./development/apps/launchperf/Android.mk ... including ./development/build/Android.mk ... including ./development/cmds/monkey/Android.mk ... including ./development/host/windows/prebuilt/usb/Android.mk ... including ./development/ndk/Android.mk ... including ./development/perftests/panorama/Android.mk ...
[Mesa-dev] [PATCH] mesa: Implement ARB_shader_viewport_layer_array for i965
This extension is a combination of AMD_vertex_shader_viewport_index and AMD_vertex_shader_layer, making it rather trivial to implement. For gallium I *think* this needs a new cap because of the addition of support in tessellation evaluation shaders, and since I don't have any hardware to test it on, I've left that for someone else to wire up. Since this requires GL 4.1, this is only available on gen8+. Signed-off-by: Dylan Baker--- docs/features.txt| 2 +- docs/relnotes/12.1.0.html| 1 + src/compiler/glsl/builtin_variables.cpp | 14 -- src/compiler/glsl/glsl_parser_extras.cpp | 1 + src/compiler/glsl/glsl_parser_extras.h | 2 ++ src/mesa/drivers/dri/i965/intel_extensions.c | 1 + src/mesa/main/extensions_table.h | 1 + src/mesa/main/mtypes.h | 1 + 8 files changed, 20 insertions(+), 3 deletions(-) diff --git a/docs/features.txt b/docs/features.txt index d6c3240..4b39a1c 100644 --- a/docs/features.txt +++ b/docs/features.txt @@ -296,7 +296,7 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve GL_ARB_shader_draw_parameters DONE (i965, nvc0, radeonsi) GL_ARB_shader_group_vote DONE (nvc0) GL_ARB_shader_stencil_export DONE (i965/gen9+, radeonsi, softpipe, llvmpipe, swr) - GL_ARB_shader_viewport_layer_arraynot started + GL_ARB_shader_viewport_layer_arrayDONE (i965/gen8+) GL_ARB_sparse_buffer not started GL_ARB_sparse_texture not started GL_ARB_sparse_texture2not started diff --git a/docs/relnotes/12.1.0.html b/docs/relnotes/12.1.0.html index bb20e4f..1a0177c 100644 --- a/docs/relnotes/12.1.0.html +++ b/docs/relnotes/12.1.0.html @@ -52,6 +52,7 @@ Note: some of the new features are only available with certain drivers. GL_ARB_indirect_parameters on radeonsi GL_ARB_shader_draw_parameters on radeonsi GL_ARB_shader_group_vote on nvc0 +GL_ARB_shader_viewport_layer_array on i965/gen8+ GL_ARB_stencil_texturing on i965/hsw GL_ARB_texture_stencil8 on i965/hsw GL_EXT_window_rectangles on nv50, nvc0 diff --git a/src/compiler/glsl/builtin_variables.cpp b/src/compiler/glsl/builtin_variables.cpp index 90278d6..8d6413e 100644 --- a/src/compiler/glsl/builtin_variables.cpp +++ b/src/compiler/glsl/builtin_variables.cpp @@ -1000,11 +1000,13 @@ builtin_variable_generator::generate_vs_special_vars() add_system_value(SYSTEM_VALUE_BASE_INSTANCE, int_t, "gl_BaseInstanceARB"); add_system_value(SYSTEM_VALUE_DRAW_ID, int_t, "gl_DrawIDARB"); } - if (state->AMD_vertex_shader_layer_enable) { + if (state->AMD_vertex_shader_layer_enable || + state->ARB_shader_viewport_layer_array_enable) { var = add_output(VARYING_SLOT_LAYER, int_t, "gl_Layer"); var->data.interpolation = INTERP_MODE_FLAT; } - if (state->AMD_vertex_shader_viewport_index_enable) { + if (state->AMD_vertex_shader_viewport_index_enable || + state->ARB_shader_viewport_layer_array_enable) { var = add_output(VARYING_SLOT_VIEWPORT, int_t, "gl_ViewportIndex"); var->data.interpolation = INTERP_MODE_FLAT; } @@ -1066,6 +1068,8 @@ builtin_variable_generator::generate_tcs_special_vars() void builtin_variable_generator::generate_tes_special_vars() { + ir_variable *var; + add_system_value(SYSTEM_VALUE_PRIMITIVE_ID, int_t, "gl_PrimitiveID"); add_system_value(SYSTEM_VALUE_VERTICES_IN, int_t, "gl_PatchVerticesIn"); add_system_value(SYSTEM_VALUE_TESS_COORD, vec3_t, "gl_TessCoord"); @@ -1073,6 +1077,12 @@ builtin_variable_generator::generate_tes_special_vars() "gl_TessLevelOuter"); add_system_value(SYSTEM_VALUE_TESS_LEVEL_INNER, array(float_t, 2), "gl_TessLevelInner"); + if (state->ARB_shader_viewport_layer_array_enable) { + var = add_output(VARYING_SLOT_LAYER, int_t, "gl_Layer"); + var->data.interpolation = INTERP_MODE_FLAT; + var = add_output(VARYING_SLOT_VIEWPORT, int_t, "gl_ViewportIndex"); + var->data.interpolation = INTERP_MODE_FLAT; + } } diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp index 436ddd0..a21ce50 100644 --- a/src/compiler/glsl/glsl_parser_extras.cpp +++ b/src/compiler/glsl/glsl_parser_extras.cpp @@ -608,6 +608,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = { EXT(ARB_shader_subroutine), EXT(ARB_shader_texture_image_samples), EXT(ARB_shader_texture_lod), + EXT(ARB_shader_viewport_layer_array), EXT(ARB_shading_language_420pack), EXT(ARB_shading_language_packing), EXT(ARB_tessellation_shader), diff --git a/src/compiler/glsl/glsl_parser_extras.h b/src/compiler/glsl/glsl_parser_extras.h index
[Mesa-dev] [PATCH] radeonsi: don't preload constants at the beginning of shaders
From: Marek OlšákLLVM can CSE the loads, thus we can always re-load constants before each use. The decrease in SGPR spilling is huge. The best improvements are the dumbest ones. 26011 shaders in 14651 tests Totals: SGPRS: 1453346 -> 1251920 (-13.86 %) VGPRS: 742576 -> 728421 (-1.91 %) Spilled SGPRs: 52298 -> 16644 (-68.17 %) Spilled VGPRs: 397 -> 369 (-7.05 %) Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread Code Size: 36136488 -> 36001064 (-0.37 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 219315 -> 21 (1.33 %) --- src/gallium/drivers/radeonsi/si_shader.c | 30 +++--- 1 file changed, 11 insertions(+), 19 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 0b7de18..08e3cee 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -1874,26 +1874,33 @@ static LLVMValueRef fetch_constant( for (chan = 0; chan < TGSI_NUM_CHANNELS; ++chan) values[chan] = fetch_constant(bld_base, reg, type, chan); return lp_build_gather_values(bld_base->base.gallivm, values, 4); } buf = reg->Register.Dimension ? reg->Dimension.Index : 0; idx = reg->Register.Index * 4 + swizzle; if (!reg->Register.Indirect && !reg->Dimension.Indirect) { + LLVMValueRef c0, c1; + + c0 = buffer_load_const(ctx, ctx->const_buffers[buf], + LLVMConstInt(ctx->i32, idx * 4, 0)); + if (!tgsi_type_is_64bit(type)) - return bitcast(bld_base, type, ctx->constants[buf][idx]); + return bitcast(bld_base, type, c0); else { + c1 = buffer_load_const(ctx, ctx->const_buffers[buf], + LLVMConstInt(ctx->i32, + (idx + 1) * 4, 0)); return radeon_llvm_emit_fetch_64bit(bld_base, type, - ctx->constants[buf][idx], - ctx->constants[buf][idx + 1]); + c0, c1); } } if (reg->Register.Dimension && reg->Dimension.Indirect) { LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_CONST_BUFFERS); LLVMValueRef index; index = get_bounded_indirect_index(ctx, >DimIndirect, reg->Dimension.Index, SI_NUM_CONST_BUFFERS); bufp = build_indexed_load_const(ctx, ptr, index); @@ -5789,39 +5796,26 @@ static void create_function(struct si_shader_context *ctx) static void preload_constants(struct si_shader_context *ctx) { struct lp_build_tgsi_context *bld_base = >radeon_bld.soa.bld_base; struct gallivm_state *gallivm = bld_base->base.gallivm; const struct tgsi_shader_info *info = bld_base->info; unsigned buf; LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_CONST_BUFFERS); for (buf = 0; buf < SI_NUM_CONST_BUFFERS; buf++) { - unsigned i, num_const = info->const_file_max[buf] + 1; - - if (num_const == 0) + if (info->const_file_max[buf] == -1) continue; - /* Allocate space for the constant values */ - ctx->constants[buf] = CALLOC(num_const * 4, sizeof(LLVMValueRef)); - /* Load the resource descriptor */ ctx->const_buffers[buf] = build_indexed_load_const(ctx, ptr, lp_build_const_int32(gallivm, buf)); - - /* Load the constants, we rely on the code sinking to do the rest */ - for (i = 0; i < num_const * 4; ++i) { - ctx->constants[buf][i] = - buffer_load_const(ctx, - ctx->const_buffers[buf], - lp_build_const_int32(gallivm, i * 4)); - } } } static void preload_shader_buffers(struct si_shader_context *ctx) { struct gallivm_state *gallivm = >radeon_bld.gallivm; LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_SHADER_BUFFERS); int buf, maxbuf; maxbuf = MIN2(ctx->shader->selector->info.file_max[TGSI_FILE_BUFFER], @@ -6898,22 +6892,20 @@ int si_compile_tgsi_shader(struct si_screen *sscreen, ctx.shader = shader->gs_copy_shader; if ((r = si_generate_gs_copy_shader(sscreen, , shader, debug))) {
[Mesa-dev] [PATCH] nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.
VC4 was running into a major performance regression from enabling control flow in the glmark2 conditionals test, because of short if statements containing an ffract. This pass seems like it was was trying to ensure that we only flattened IFs that should be entirely a win by guaranteeing that there would be fewer bcsels than there were MOVs otherwise. However, if the number of ALU ops is small, we can avoid the overhead of branching (which itself costs cycles) and still get a win, even if it means moving real instructions out of the THEN/ELSE blocks. For now, just turn on aggressive flattening on vc4. i965 will need some tuning to avoid regressions. It does looks like this may be useful to replace freedreno code. Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47 fps to 95 fps on vc4. vc4 shader-db: total instructions in shared programs: 83220 -> 78887 (-5.21%) instructions in affected programs: 21095 -> 16762 (-20.54%) total uniforms in shared programs: 25977 -> 25606 (-1.43%) uniforms in affected programs: 4196 -> 3825 (-8.84%) total estimated cycles in shared programs: 195563 -> 192153 (-1.74%) estimated cycles in affected programs: 30171 -> 26761 (-11.30%) (This doesn't even count the test being optimized, due to a limitation in shader-db-2) --- src/compiler/nir/nir.h | 2 +- src/compiler/nir/nir_opt_peephole_select.c | 82 -- src/gallium/drivers/vc4/vc4_program.c | 2 +- src/mesa/drivers/dri/i965/brw_nir.c| 2 +- 4 files changed, 57 insertions(+), 31 deletions(-) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index c1cf94001f65..f56f67690d34 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -2589,7 +2589,7 @@ bool nir_opt_dead_cf(nir_shader *shader); void nir_opt_gcm(nir_shader *shader); -bool nir_opt_peephole_select(nir_shader *shader); +bool nir_opt_peephole_select(nir_shader *shader, unsigned limit); bool nir_opt_remove_phis(nir_shader *shader); diff --git a/src/compiler/nir/nir_opt_peephole_select.c b/src/compiler/nir/nir_opt_peephole_select.c index 633e9f486c08..6a73d737077c 100644 --- a/src/compiler/nir/nir_opt_peephole_select.c +++ b/src/compiler/nir/nir_opt_peephole_select.c @@ -32,23 +32,33 @@ * Implements a small peephole optimization that looks for * * if (cond) { - * + * * } else { - * + * * } * phi * ... * phi * - * and replaces it with a series of selects. It can also handle the case - * where, instead of being empty, the if may contain some move operations - * whose only use is one of the following phi nodes. This happens all the - * time when the SSA form comes from a conditional assignment with a - * swizzle. + * and replaces it with: + * + * + * + * bcsel + * ... + * bcsel + * + * where the SSA defs are ALU operations or other cheap instructions (not + * texturing, for example). + * + * If the number of ALU operations in the branches is greater than the limit + * parameter, then the optimization is skipped. In limit=0 mode, the SSA defs + * must only be MOVs which we expect to get copy-propagated away once they're + * out of the inner blocks. */ static bool -block_check_for_allowed_instrs(nir_block *block) +block_check_for_allowed_instrs(nir_block *block, unsigned *count, bool alu_ok) { nir_foreach_instr(instr, block) { switch (instr->type) { @@ -67,6 +77,11 @@ block_check_for_allowed_instrs(nir_block *block) } break; + case nir_intrinsic_load_uniform: +if (!alu_ok) + return false; +break; + default: return false; } @@ -89,29 +104,36 @@ block_check_for_allowed_instrs(nir_block *block) case nir_op_vec2: case nir_op_vec3: case nir_op_vec4: -/* It must be a move-like operation. */ break; default: -return false; +if (!alu_ok) { + /* It must be a move-like operation. */ + return false; +} +break; } - /* Can't handle saturate */ - if (mov->dest.saturate) -return false; - /* It must be SSA */ if (!mov->dest.dest.is_ssa) return false; - /* It cannot have any if-uses */ - if (!list_empty(>dest.dest.ssa.if_uses)) -return false; + if (alu_ok) { +(*count)++; + } else { +/* Can't handle saturate */ +if (mov->dest.saturate) + return false; - /* The only uses of this definition must be phi's in the successor */ - nir_foreach_use(use, >dest.dest.ssa) { -if (use->parent_instr->type != nir_instr_type_phi || -use->parent_instr->block != block->successors[0]) +/* It cannot have any if-uses */ +if
Re: [Mesa-dev] [PATCH 29/33] anv: Use blorp for CopyImage
On Fri, Sep 9, 2016 at 2:35 PM, Nanley Cherywrote: > On Wed, Aug 31, 2016 at 02:22:48PM -0700, Jason Ekstrand wrote: > > --- > > src/intel/vulkan/anv_blorp.c | 67 + > > src/intel/vulkan/anv_meta_copy.c | 158 -- > - > > 2 files changed, 67 insertions(+), 158 deletions(-) > > > > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c > > index 5fa6699..89ff3b3 100644 > > --- a/src/intel/vulkan/anv_blorp.c > > +++ b/src/intel/vulkan/anv_blorp.c > > @@ -168,6 +168,73 @@ get_blorp_surf_for_anv_image(const struct > anv_image *image, > > }; > > } > > > > +void anv_CmdCopyImage( > > +VkCommandBuffer commandBuffer, > > +VkImage srcImage, > > +VkImageLayout srcImageLayout, > > +VkImage dstImage, > > +VkImageLayout dstImageLayout, > > +uint32_tregionCount, > > +const VkImageCopy* pRegions) > > +{ > > + ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer); > > + ANV_FROM_HANDLE(anv_image, src_image, srcImage); > > + ANV_FROM_HANDLE(anv_image, dst_image, dstImage); > > + > > + struct blorp_batch batch; > > + blorp_batch_init(_buffer->device->blorp, , cmd_buffer); > > + > > + for (unsigned r = 0; r < regionCount; r++) { > > + VkOffset3D srcOffset = > > + anv_sanitize_image_offset(src_image->type, > pRegions[r].srcOffset); > > + VkOffset3D dstOffset = > > + anv_sanitize_image_offset(src_image->type, > pRegions[r].dstOffset); > > + VkExtent3D extent = > > + anv_sanitize_image_extent(src_image->type, > pRegions[r].extent); > > + > > + unsigned dst_base_layer, layer_count; > > + if (dst_image->type == VK_IMAGE_TYPE_3D) { > > + dst_base_layer = dstOffset.z; > > + layer_count = extent.depth; > > In the patch titled, "anv/blorp: Use the correct coordinates in > CmdCopyImage," > you replace extent.depth with pRegions[r].extent.depth. I think it also > makes > sense to change the assignment of src_base_layer and dest_base_layer with > the > offsets straight from user when we know the image type is VK_IMAGE_TYPE_3D. > When the type is 3D, the sanitize functions do nothing. > Sounds reasonable. > With or without that change, and assuming the fixup patch is amended, > this patch is, > > Reviewed-by: Nanley Chery > Thanks! > + } else { > > + dst_base_layer = pRegions[r].dstSubresource.baseArrayLayer; > > + layer_count = pRegions[r].dstSubresource.layerCount; > > + } > > + > > + unsigned src_base_layer; > > + if (src_image->type == VK_IMAGE_TYPE_3D) { > > + src_base_layer = srcOffset.z; > > + } else { > > + src_base_layer = pRegions[r].srcSubresource.baseArrayLayer; > > + assert(pRegions[r].srcSubresource.layerCount == layer_count); > > + } > > + > > + assert(pRegions[r].srcSubresource.aspectMask == > > + pRegions[r].dstSubresource.aspectMask); > > + > > + uint32_t a; > > + for_each_bit(a, pRegions[r].dstSubresource.aspectMask) { > > + VkImageAspectFlagBits aspect = (1 << a); > > + > > + struct blorp_surf src_surf, dst_surf; > > + get_blorp_surf_for_anv_image(src_image, aspect, _surf); > > + get_blorp_surf_for_anv_image(dst_image, aspect, _surf); > > + > > + for (unsigned i = 0; i < layer_count; i++) { > > +blorp_copy(, _surf, pRegions[r].srcSubresource. > mipLevel, > > + src_base_layer + i, > > + _surf, pRegions[r].dstSubresource.mipLevel, > > + dst_base_layer + i, > > + srcOffset.x, srcOffset.y, > > + dstOffset.x, dstOffset.y, > > + extent.width, extent.height); > > + } > > + } > > + } > > + > > + blorp_batch_finish(); > > +} > > + > > static void > > copy_buffer_to_image(struct anv_cmd_buffer *cmd_buffer, > > struct anv_buffer *anv_buffer, > > diff --git a/src/intel/vulkan/anv_meta_copy.c > b/src/intel/vulkan/anv_meta_copy.c > > index 5df04e6..b33273e 100644 > > --- a/src/intel/vulkan/anv_meta_copy.c > > +++ b/src/intel/vulkan/anv_meta_copy.c > > @@ -23,63 +23,6 @@ > > > > #include "anv_meta.h" > > > > -static VkExtent3D > > -meta_image_block_size(const struct anv_image *image) > > -{ > > - if (image->aspects == VK_IMAGE_ASPECT_COLOR_BIT) { > > - const struct isl_format_layout *isl_layout = > > - isl_format_get_layout(image->color_surface.isl.format); > > - return (VkExtent3D) { isl_layout->bw, isl_layout->bh, > isl_layout->bd }; > > - } else { > > - return (VkExtent3D) { 1, 1, 1 }; > > - } > > -} > > - > > -/* Returns
Re: [Mesa-dev] [PATCH 29/33] anv: Use blorp for CopyImage
On Wed, Aug 31, 2016 at 02:22:48PM -0700, Jason Ekstrand wrote: > --- > src/intel/vulkan/anv_blorp.c | 67 + > src/intel/vulkan/anv_meta_copy.c | 158 > --- > 2 files changed, 67 insertions(+), 158 deletions(-) > > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c > index 5fa6699..89ff3b3 100644 > --- a/src/intel/vulkan/anv_blorp.c > +++ b/src/intel/vulkan/anv_blorp.c > @@ -168,6 +168,73 @@ get_blorp_surf_for_anv_image(const struct anv_image > *image, > }; > } > > +void anv_CmdCopyImage( > +VkCommandBuffer commandBuffer, > +VkImage srcImage, > +VkImageLayout srcImageLayout, > +VkImage dstImage, > +VkImageLayout dstImageLayout, > +uint32_tregionCount, > +const VkImageCopy* pRegions) > +{ > + ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer); > + ANV_FROM_HANDLE(anv_image, src_image, srcImage); > + ANV_FROM_HANDLE(anv_image, dst_image, dstImage); > + > + struct blorp_batch batch; > + blorp_batch_init(_buffer->device->blorp, , cmd_buffer); > + > + for (unsigned r = 0; r < regionCount; r++) { > + VkOffset3D srcOffset = > + anv_sanitize_image_offset(src_image->type, pRegions[r].srcOffset); > + VkOffset3D dstOffset = > + anv_sanitize_image_offset(src_image->type, pRegions[r].dstOffset); > + VkExtent3D extent = > + anv_sanitize_image_extent(src_image->type, pRegions[r].extent); > + > + unsigned dst_base_layer, layer_count; > + if (dst_image->type == VK_IMAGE_TYPE_3D) { > + dst_base_layer = dstOffset.z; > + layer_count = extent.depth; In the patch titled, "anv/blorp: Use the correct coordinates in CmdCopyImage," you replace extent.depth with pRegions[r].extent.depth. I think it also makes sense to change the assignment of src_base_layer and dest_base_layer with the offsets straight from user when we know the image type is VK_IMAGE_TYPE_3D. When the type is 3D, the sanitize functions do nothing. With or without that change, and assuming the fixup patch is amended, this patch is, Reviewed-by: Nanley Chery> + } else { > + dst_base_layer = pRegions[r].dstSubresource.baseArrayLayer; > + layer_count = pRegions[r].dstSubresource.layerCount; > + } > + > + unsigned src_base_layer; > + if (src_image->type == VK_IMAGE_TYPE_3D) { > + src_base_layer = srcOffset.z; > + } else { > + src_base_layer = pRegions[r].srcSubresource.baseArrayLayer; > + assert(pRegions[r].srcSubresource.layerCount == layer_count); > + } > + > + assert(pRegions[r].srcSubresource.aspectMask == > + pRegions[r].dstSubresource.aspectMask); > + > + uint32_t a; > + for_each_bit(a, pRegions[r].dstSubresource.aspectMask) { > + VkImageAspectFlagBits aspect = (1 << a); > + > + struct blorp_surf src_surf, dst_surf; > + get_blorp_surf_for_anv_image(src_image, aspect, _surf); > + get_blorp_surf_for_anv_image(dst_image, aspect, _surf); > + > + for (unsigned i = 0; i < layer_count; i++) { > +blorp_copy(, _surf, > pRegions[r].srcSubresource.mipLevel, > + src_base_layer + i, > + _surf, pRegions[r].dstSubresource.mipLevel, > + dst_base_layer + i, > + srcOffset.x, srcOffset.y, > + dstOffset.x, dstOffset.y, > + extent.width, extent.height); > + } > + } > + } > + > + blorp_batch_finish(); > +} > + > static void > copy_buffer_to_image(struct anv_cmd_buffer *cmd_buffer, > struct anv_buffer *anv_buffer, > diff --git a/src/intel/vulkan/anv_meta_copy.c > b/src/intel/vulkan/anv_meta_copy.c > index 5df04e6..b33273e 100644 > --- a/src/intel/vulkan/anv_meta_copy.c > +++ b/src/intel/vulkan/anv_meta_copy.c > @@ -23,63 +23,6 @@ > > #include "anv_meta.h" > > -static VkExtent3D > -meta_image_block_size(const struct anv_image *image) > -{ > - if (image->aspects == VK_IMAGE_ASPECT_COLOR_BIT) { > - const struct isl_format_layout *isl_layout = > - isl_format_get_layout(image->color_surface.isl.format); > - return (VkExtent3D) { isl_layout->bw, isl_layout->bh, isl_layout->bd }; > - } else { > - return (VkExtent3D) { 1, 1, 1 }; > - } > -} > - > -/* Returns the user-provided VkBufferImageCopy::imageExtent in units of > - * elements rather than texels. One element equals one texel or one block > - * if Image is uncompressed or compressed, respectively. > - */ > -static struct VkExtent3D > -meta_region_extent_el(const struct anv_image *image, > - const struct VkExtent3D
Re: [Mesa-dev] [PATCH 00/12] radeon/winsyses: various cleanups
On Fri, Sep 9, 2016 at 7:34 PM, Nicolai Hähnlewrote: > Hi, > > this is a bunch of random cleanups in and related to radeon and amdgpu > winsyses. > > There is one significant behavioral change, which is that amdgpu now only > keeps a single fence around for each buffer object. See the explanation > at that commit. Please review! It's not really a behavioral change, is it? For the series: Reviewed-by: Marek Olšák Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 2/3] egl: return corresponding offset of EGLImage instead of 0.
I'm not sure calling queryImage with an unsupported attribute is legal, thus I think a small check doesn't hurt. It'd give if (offsets) { offsets[0] = 0; if (dri2_dpy->image->base.version >= 13) { EGLint img_offset = 0; bool ret = dri2_dpy->image->queryImage(dri2_img->dri_image, __DRI_IMAGE_ATTRIB_OFFSET, _offset); if (ret) offsets[0] = img_offset; } } return EGL_TRUE; or alternatively, if you think not being able to give the offset indicates an error, if (offsets) { offsets[0] = 0; if (dri2_dpy->image->base.version >= 13) { EGLint img_offset = 0; bool ret = dri2_dpy->image->queryImage(dri2_img->dri_image, __DRI_IMAGE_ATTRIB_OFFSET, _offset); if (!ret) return EGL_FALSE; offsets[0] = img_offset; } } return EGL_TRUE; Axel Davy On 09/09/2016 17:34, Weng, Chuanbo wrote: The comments from Emil can be summarized into the following code: if (offsets) { offsets[0] = 0; EGLint img_offset = 0; bool ret = dri2_dpy->image->queryImage(dri2_img->dri_image, __DRI_IMAGE_ATTRIB_OFFSET, _offset); if(ret == true) offsets[0] = img_offset; } return EGL_TRUE; Emil, is it right? -Original Message- From: Emil Velikov [mailto:emil.l.veli...@gmail.com] Sent: Friday, September 9, 2016 8:32 PM To: Weng, ChuanboCc: ML mesa-dev ; Axel Davy Subject: Re: [PATCH v2 2/3] egl: return corresponding offset of EGLImage instead of 0. On 9 September 2016 at 08:58, Chuanbo Weng wrote: The offset should not always be 0. For example, if EGLImage is created from a 2D texture with EGL_GL_TEXTURE_LEVEL=1, then the offset should be the actual start of miplevel 1 in bo. v2: Add version check of __DRIimageExtension implementation (Suggested by Axel Davy). Just to elaborate what Axel was saying from another perspective: With current master the offset is explicitly zeroed, while with the (v2) patch you'll _only_ set the value when you have v13 driver. Thus using the loader + v12 driver you'll get a regression since a) the offset will remain garbage and b) the function (dri2_export_dma_buf_image_mesa) will fail. Signed-off-by: Chuanbo Weng --- src/egl/drivers/dri2/egl_dri2.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index 859612f..c529e55 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -2249,6 +2249,7 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, _EGLDisplay *disp, _EGLImage *im struct dri2_egl_image *dri2_img = dri2_egl_image(img); (void) drv; + EGLint img_offset = 0; /* rework later to provide multiple fds/strides/offsets */ if (fds) @@ -2259,8 +2260,14 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, _EGLDisplay *disp, _EGLImage *im dri2_dpy->image->queryImage(dri2_img->dri_image, __DRI_IMAGE_ATTRIB_STRIDE, strides); - if (offsets) + if (offsets){ offsets[0] = 0; + if(dri2_dpy->image->base.version >= 13){ Please move the variable declaration in local scope and add space between ){ Functionality wise we don't need the version check, esp. since you want to set the offset only when queryImage() succeeds... + dri2_dpy->image->queryImage(dri2_img->dri_image, + __DRI_IMAGE_ATTRIB_OFFSET, _offset); ... which you don't seem to be checking here, so you'll effectively store/use garbage. So unless Alex feels strongly for the version check I'd omit it, and fix the rest of this patch. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 28/57] i965/fs: Take into account copy register offset during compute-to-mrf.
Iago Toralwrites: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> This was dropping 'inst->dst.offset' on the floor. Nothing in the >> code above seems to guarantee that it's zero and in that case the >> offset of the register being coalesced into wouldn't be taken into >> account while rewriting the generating instruction. >> --- >> src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> b/src/mesa/drivers/dri/i965/brw_fs.cpp >> index 98da06c..5b7ca98 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> @@ -2821,7 +2821,7 @@ fs_visitor::compute_to_mrf() >> } >> >> scan_inst->dst.file = MRF; >> -scan_inst->dst.offset %= REG_SIZE; >> +scan_inst->dst.offset = inst->dst.offset + scan_inst- >> >dst.offset % REG_SIZE; > > So if we had something like this: > > 0: mov(4) r1.4:F r0.0:F > 1: mov(4) m1:F r1.4:F > > This pass is trying to get us: > > mov(4) m1:F r0.0:F > Yeah, that's what it should be doing, but the current code (already before this series) would give you: | mov(4) m1.4:F r0.0:F which is pretty bogus. > In that case, the offset into the destination of the instruction we are > propagating from should not affect the offset into the MRF we are writing to. > Or maybe I am missing what is going on here :) Yeah, that's exactly the problem fixed in the next patch. Note that this commit doesn't actually introduce the bug, it's preserving the buggy behavior of preexisting code, which happens to become more obvious due to the explicit 'scan_inst->dst.offset % REG_SIZE' term instead of the '%=' operator. > >> scan_inst->saturate |= inst->saturate; >> if (!regs_left) >> break; signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 27/57] i965/vec4: Drop backend_reg::in_range() in favor of regions_overlap().
Iago Toralwrites: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> This makes sure that overlap checks are done correctly throughout the >> back-end when the '*this' register starts before the register/size >> pair provided as argument, and is actually less annoying to use than >> in_range() at this point since regions_overlap() takes its size >> arguments in bytes. >> --- >> src/mesa/drivers/dri/i965/brw_shader.cpp| 9 - >> >> src/mesa/drivers/dri/i965/brw_shader.h | 1 - >> src/mesa/drivers/dri/i965/brw_vec4.cpp | 14 >> -- >> src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 4 ++-- >> src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 4 ++-- >> 5 files changed, 12 insertions(+), 20 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp >> b/src/mesa/drivers/dri/i965/brw_shader.cpp >> index e599235..951e6b2 100644 >> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp >> @@ -745,15 +745,6 @@ backend_reg::is_accumulator() const >> } >> >> bool >> -backend_reg::in_range(const backend_reg , unsigned n) const >> -{ >> - return (file == r.file && >> - nr == r.nr && >> - offset >= r.offset && >> - offset < r.offset + n * REG_SIZE); >> -} >> - >> -bool >> backend_instruction::is_commutative() const >> { >> switch (opcode) { >> diff --git a/src/mesa/drivers/dri/i965/brw_shader.h >> b/src/mesa/drivers/dri/i965/brw_shader.h >> index 0de0808..ba2404a 100644 >> --- a/src/mesa/drivers/dri/i965/brw_shader.h >> +++ b/src/mesa/drivers/dri/i965/brw_shader.h >> @@ -62,7 +62,6 @@ struct backend_reg : private brw_reg >> bool is_negative_one() const; >> bool is_null() const; >> bool is_accumulator() const; >> - bool in_range(const backend_reg , unsigned n) const; >> >> /** Offset from the start of the (virtual) register in bytes. */ >> uint16_t offset; >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> index 561170c..eb2888f 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> @@ -1143,7 +1143,8 @@ vec4_visitor::opt_register_coalesce() >> inst) { >> _scan_inst = scan_inst; >> >> - if (inst->src[0].in_range(scan_inst->dst, >> DIV_ROUND_UP(scan_inst->size_written, REG_SIZE))) { >> + if (regions_overlap(inst->src[0], inst->size_read(0), >> + scan_inst->dst, scan_inst- >> >size_written)) { >> /* Found something writing to the reg we want to >> coalesce away. */ >> if (to_mrf) { >> /* SEND instructions can't have MRF as a destination. >> */ >> @@ -1197,8 +1198,8 @@ vec4_visitor::opt_register_coalesce() >> */ >> bool interfered = false; >> for (int i = 0; i < 3; i++) { >> -if (inst->src[0].in_range(scan_inst->src[i], >> - DIV_ROUND_UP(scan_inst- >> >size_read(i), REG_SIZE))) >> +if (regions_overlap(inst->src[0], inst->size_read(0), >> +scan_inst->src[i], scan_inst- >> >size_read(i))) >> interfered = true; >> } >> if (interfered) >> @@ -1207,7 +1208,8 @@ vec4_visitor::opt_register_coalesce() >> /* If somebody else writes the same channels of our >> destination here, >> * we can't coalesce before that. >> */ >> - if (inst->dst.in_range(scan_inst->dst, >> DIV_ROUND_UP(scan_inst->size_written, REG_SIZE)) && >> + if (regions_overlap(inst->dst, inst->size_written, >> + scan_inst->dst, scan_inst- >> >size_written) && >> (inst->dst.writemask & scan_inst->dst.writemask) != 0) >> { >> break; >> } >> @@ -1223,8 +1225,8 @@ vec4_visitor::opt_register_coalesce() >> } >> } else { >> for (int i = 0; i < 3; i++) { >> - if (inst->dst.in_range(scan_inst->src[i], >> - DIV_ROUND_UP(scan_inst- >> >size_read(i), REG_SIZE))) >> + if (regions_overlap(inst->dst, inst->size_written, >> + scan_inst->src[i], scan_inst- >> >size_read(i))) >> interfered = true; >> } >> if (interfered) >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp >> index e74bc15..9977317 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp >> @@ -68,8 +68,8 @@ opt_cmod_propagation_local(bblock_t *block) >> >> bool read_flag = false; >>
Re: [Mesa-dev] [PATCH 16/57] i965/fs: Take into account trailing padding in regs_written() and regs_read().
Iago Toralwrites: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> This fixes regs_written() and regs_read() to return a more accurate >> value when the padding left between components due to a stride value >> greater than one causes the region bounds given by size_written or >> size_read to overflow into the next register. This could become a >> problem in optimization passes that keep track of dataflow using >> fixed-size arrays with register granularity, because the overflow >> register (not actually accessed by the region) may not have been >> allocated at all which could lead to undefined memory access. > > ah, nice catch! I am curious: did you find any cases where this was > creating a real problem or just something you noticed by inspection? > Yeah, it definitely fixed real problems in a bunch of FP64 tests (I forgot which ones but I could dig it up if you're interested) which would have regressed from the next commits that fix regs_written and regs_read to take into account misalignment -- The only reason this wasn't a problem before is that we weren't rounding up the register count correctly in most places... >> An alternative to this would be to subtract the trailing padding >> already during the calculation of fs_inst::size_read and >> ::size_written, but that would break code that currently assumes that >> ::size_read and _written are whole multiples of the component size, >> and would be hard to maintain looking forward because size_written is >> assigned from a bunch of different places. > > Yeah, this would be a more problematic solution. > >> --- >> src/mesa/drivers/dri/i965/brw_ir_fs.h | 22 -- >> 1 file changed, 20 insertions(+), 2 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h >> b/src/mesa/drivers/dri/i965/brw_ir_fs.h >> index 4ac9baa..0be67b7 100644 >> --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h >> +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h >> @@ -192,6 +192,20 @@ reg_offset(const fs_reg ) >> } >> >> /** >> + * Return the amount of padding in bytes left unused between >> individual >> + * components of register \p r due to a (horizontal) stride value >> greater than >> + * one, or zero if components are tightly packed in the register >> file. >> + */ >> +static inline unsigned >> +reg_padding(const fs_reg ) >> +{ >> + const unsigned stride = ((r.file != ARF && r.file != FIXED_GRF) ? >> r.stride : >> +r.hstride == 0 ? 0 : >> +1 << (r.hstride - 1)); >> + return (MAX2(1, stride) - 1) * type_sz(r.type); >> +} >> + >> +/** >> * Return whether the register region starting at \p r and spanning >> \p dr >> * bytes could potentially overlap the register region starting at >> \p s and >> * spanning \p ds bytes. >> @@ -423,7 +437,9 @@ regs_written(const fs_inst *inst) >> { >> /* XXX - Take into account register-misaligned offsets correctly. >> */ >> assert(inst->dst.file != UNIFORM && inst->dst.file != IMM); >> - return DIV_ROUND_UP(inst->size_written, REG_SIZE); >> + return DIV_ROUND_UP(inst->size_written - >> + MIN2(inst->size_written, reg_padding(inst- >> >dst)), >> + REG_SIZE); >> } >> >> /** >> @@ -438,7 +454,9 @@ regs_read(const fs_inst *inst, unsigned i) >> /* XXX - Take into account register-misaligned offsets correctly. >> */ >> const unsigned reg_size = >> inst->src[i].file == UNIFORM || inst->src[i].file == IMM ? 4 : >> REG_SIZE; >> - return DIV_ROUND_UP(inst->size_read(i), reg_size); >> + return DIV_ROUND_UP(inst->size_read(i) - >> + MIN2(inst->size_read(i), reg_padding(inst- >> >src[i])), >> + reg_size); >> } >> >> #endif signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 15/57] i965/fs: Handle fixed HW GRF subnr in reg_offset().
Iago Toralwrites: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> This will be useful later on when we start using reg_offset() on >> fixed >> hardware registers. >> --- >> src/mesa/drivers/dri/i965/brw_ir_fs.h | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h >> b/src/mesa/drivers/dri/i965/brw_ir_fs.h >> index 2e5c8e5..4ac9baa 100644 >> --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h >> +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h >> @@ -187,7 +187,8 @@ static inline unsigned >> reg_offset(const fs_reg ) >> { >> return (r.file == VGRF || r.file == IMM ? 0 : r.nr) * >> - (r.file == UNIFORM ? 4 : REG_SIZE) + r.offset; >> + (r.file == UNIFORM ? 4 : REG_SIZE) + r.offset + >> + (r.file == ARF || r.file == FIXED_GRF ? r.subnr : 0); > > I guess that for ARF and FIXED_GRF r.offset is expected to be 0, right? It doesn't really matter as far as this function is concerned: If it's zero it won't affect the result of the function, if it's non-zero it will be taken into account correctly. Some other back-end code may break though if a fixed grf register has non-zero offset, but I'd call that code responsible for asserting "offset == 0" if it's unable to deal with it. > In that case, should we make it more explicit here by adding an assert, > and/or maybe separating the offset calculation for these register types > (so it only uses nr and subnr) from the others? > > Iago > >> } >> >> /** signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 25/57] i965/fs: Stop using fs_reg::in_range() in favor of regions_overlap().
Iago Toralwrites: > On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote: >> Its only use left in the FS back-end should be using >> regions_overlap() >> instead to avoid getting a false negative result in cases where >> source >> and destination overlap but the former starts before the latter in >> the >> VGRF file. >> --- >> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> index 42ed131..b06606b 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> @@ -4820,7 +4820,8 @@ shuffle_64bit_data_for_32bit_write(const >> fs_builder , >> assert(type_sz(src.type) == 8); >> assert(type_sz(dst.type) == 4); >> >> - assert(!src.in_range(dst, 2 * components * bld.dispatch_width() / >> 8)); >> + assert(!regions_overlap(dst, 2 * >> dst.component_size(bld.dispatch_width()), >> + src, >> src.component_size(bld.dispatch_width(; > > I think you need to multiply the sizes of the regions for dst and src > by the number of components being shuffled, like in the original > assert, otherwise we would only be testing if there is overlap for the > first component. > Good point, thanks! >> for (unsigned i = 0; i < components; i++) { >> const fs_reg component_i = offset(src, bld, i); signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] i965: Use blorp_copy for all copy_image operations on gen6+
On Fri 09 Sep 2016, Jason Ekstrand wrote: > > On Sep 9, 2016 12:23 PM, "Chad Versace"wrote: > > Acked-by: Chad Versace > > Sure, review the easy patch and ack the interesting ones... Yep! > > Just an ack, because I'm not very familiar with the new blorp code. > > The blorp_copy entrypoint does exactly what it looks like it does... Including > handling all the crazy edge-cases... If you're willing to trust it, 3 and 4 > are > trivial. :-) I was unsure exactly what blorp_copy did. Is it a bit-for-bit copy? Does it do format translation? I just read the blorp_copy code, and revisited the glCopyTexImage2D manpage, and yeah... it's not scary. I trust this code now. For patches 3 and 4, Reviewed-by: Chad Versace Patch 2, though, is still just an ack. I don't have time to review all the implications of patch 2. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] intel/isl: Add support for RGB formats in X and Y-tiled memory
On Sep 9, 2016 12:05 PM, "Chad Versace"wrote: > > On Fri 09 Sep 2016, Jason Ekstrand wrote: > > Normally, using a non-linear tiling format helps improve cache locality by > > ensuring that neighboring pixels are usually close-by in memory. For RGB > > formats, this still sort-of holds, but it can also lead to rather terrible > > memory access patterns where a single RGB pixel value crosses a tile > > boundary and gets split into two pieces in different 4K pages. It also > > makes for some rather awkward calculations because your tile size is no > > longer an even multiple of surface element size. For these reasons, we > > chose to simply never create tiled RGB images in the Vulkan driver. > > > > The GL driver, however, is not so kind so we need to support it somehow. I > > briefly toyed with a couple of different schemes but this is the best one I > > could come up with. The fundamental problem is that a tile no longer > > contains an integer number of surface elements. I briefly considered a > > couple other options but found them wanting: > > > > 1) Using floats for the logical tile size. This leads to potential > > rounding error problems. > > Let's stay far away from floats With floats, there is just too much risk > for weird bugs. > > Floats are for representing ℝ. Our calculations are in ℚ, so let's > restrict ourselves to ints. (Does GMail display the blackboard bold > UTF-8 correctly?) It does! Even on my phone. > > 2) When presented with a RGB format, just make the tile 3-times as wide. > > This isn't so nice because now our tiles are no longer power-of-two > > size. Also, it can force the row_pitch to be larger than needed which, > > while not strictly a problem for ISL, causes incompatibility problems > > with the way the GL driver chooses surface pitches. > > > > The chosen method requires that you pay attention and not just assume that > > your tile_info is in the units you think it is. However, it's nice because > > it provides a nice "these are the units" declaration in isl_tile_info > > itself. Previously, the tile_info wasn't usable as a stand-alone structure > > because you had to also know the format. It also forces figuring out how > > to deal with inconsistencies between tiling and format back to the caller > > which is good because the two different consumers of isl_tile_info really > > want to deal with it differently: Computation of the surface size wants > > the fewest number of horizontal tiles possible while get_intratile_offset > > is far more concerned with things aligning nicely. > > > > Signed-off-by: Jason Ekstrand > > Cc: Chad Versace > > --- > > src/intel/isl/isl.c | 49 - > > src/intel/isl/isl.h | 10 +- > > 2 files changed, 45 insertions(+), 14 deletions(-) > > > > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c > > index 34245f4..8c72186 100644 > > --- a/src/intel/isl/isl.c > > +++ b/src/intel/isl/isl.c > > @@ -113,7 +113,16 @@ isl_tiling_get_info(const struct isl_device *dev, > > const uint32_t bs = format_bpb / 8; > > struct isl_extent2d logical_el, phys_B; > > > > - assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(format_bpb)); > > + if (tiling != ISL_TILING_LINEAR && ! isl_is_pow2(format_bpb)) { > -^^^ > 1. Weirdo space after '!' > > > + /* It is possible to have non-power-of-two formats in a tiled buffer. > > + * The easiest way to handle this is to treat the tile as if it is three > > + * times as wide. This way no pixel will ever cross a tile boundary. > > + * This really only works on legacy X and Y tiling formats. > > + */ > > + assert(tiling == ISL_TILING_X || tiling == ISL_TILING_Y0); > > + assert(bs % 3 == 0 && isl_is_pow2(format_bpb / 3)); > > + return isl_tiling_get_info(dev, tiling, format_bpb / 3, tile_info); > > + } > > > > switch (tiling) { > > case ISL_TILING_LINEAR: > > @@ -209,6 +218,7 @@ isl_tiling_get_info(const struct isl_device *dev, > > > > *tile_info = (struct isl_tile_info) { > >.tiling = tiling, > > + .format_bpb = format_bpb, > >.logical_extent_el = logical_el, > >.phys_extent_B = phys_B, > > }; > > @@ -1215,14 +1225,20 @@ isl_surf_init_s(const struct isl_device *dev, > >} > >base_alignment = isl_round_up_to_power_of_two(base_alignment); > > } else { > > + assert(fmtl->bpb % tile_info.format_bpb == 0); > > + const uint32_t tile_el_scale = fmtl->bpb / tile_info.format_bpb; > > > + assert(tile_el_scale == 1 || tile_el_scale == 3); > > 2. Without the context of this patch, the assertion is very mysterious. > Future people will be afraid to touch it, as there's no clue to the > magic 3's justification. Please add a comment. Yes, it can be dropped. At one point I was trying harder to be clever
Re: [Mesa-dev] [PATCH 4/4] i965: Use blorp_copy for all copy_image operations on gen6+
On Sep 9, 2016 12:23 PM, "Chad Versace"wrote: > > Patches 3 and 4, despite their craziness, are They're not that crazy... > Acked-by: Chad Versace Sure, review the easy patch and ack the interesting ones... > Just an ack, because I'm not very familiar with the new blorp code. The blorp_copy entrypoint does exactly what it looks like it does... Including handling all the crazy edge-cases... If you're willing to trust it, 3 and 4 are trivial. :-) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] egl: Document why EGL_OPENGL{, _ES}_API are mostly identical
On Thu 08 Sep 2016, Alejandro Piñeiro wrote: > > > On 07/09/16 23:41, Adam Jackson wrote: > > Signed-off-by: Adam Jackson> > --- > > src/egl/main/eglcontext.c | 10 ++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/src/egl/main/eglcontext.c b/src/egl/main/eglcontext.c > > index ebc004d..057b60f 100644 > > --- a/src/egl/main/eglcontext.c > > +++ b/src/egl/main/eglcontext.c > > @@ -457,6 +457,16 @@ _eglParseContextAttribList(_EGLContext *ctx, > > _EGLDisplay *dpy, > > /** > > * Initialize the given _EGLContext object to defaults and/or the values > > * in the attrib_list. > > + * > > + * According to EGL 1.5 Section 3.7: > > + * > > + * "EGL_OPENGL_API and EGL_OPENGL_ES_API are interchangeable for all > > + * purposes except eglCreateContext." > > Usually spec quotations add some indentation. Take a look to line 109 on > the same file. > > > + * > > + * And since we only support GL and GLES, this is the only place where the > > + * bound API matters at all. We look up the current API from the current > > + * thread, and stash that in the context we're initializing. Our caller is > > + * responsible for determining whether that's an API it supports. > > */ > > EGLBoolean > > _eglInitContext(_EGLContext *ctx, _EGLDisplay *dpy, _EGLConfig *conf, > > > > > With that fixed: > Reviewed-by: Alejandro Piñeiro Same indentation request. Reviewed-by: Chad Versace ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] i965: Use blorp_copy for all copy_image operations on gen6+
Patches 3 and 4, despite their craziness, are Acked-by: Chad VersaceJust an ack, because I'm not very familiar with the new blorp code. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Add a citation for uniform precision matching.
Ian Romanickwrites: > On 09/06/2016 03:24 PM, Kenneth Graunke wrote: >> On Tuesday, September 6, 2016 1:04:43 PM PDT Eric Anholt wrote: >>> Kenneth Graunke writes: >>> Ian added this check in commit 259fc505454ea6a67aeacf6cdebf1398d9947759. While reviewing the rules, I found a citation which spells this out clearly, so I figured I'd send a patch to add it as a comment. Cc: i...@freedesktop.org Signed-off-by: Kenneth Graunke >>> >>> Reviewed-by: Eric Anholt >>> >>> I was wondering about this change, because glmark2 is failing to compile >>> its terrain shaders now. > > Out of curiosity... is the uniform used in all the stages? In glmark2 -b terrain, it's defined but not used in VS. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nvc0/ir: joins get a sched of 0x2f
slightly improves performance for GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0 /benchmark_duration_ms=6 /width=1024 /height=640 score: 1031 -> 1033 observed from the binary generated by nvidia Signed-off-by: Karol Herbst--- src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index d83028c..a8e9a33 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp @@ -3033,7 +3033,7 @@ SchedDataCalculator::setDelay(Instruction *insn, int delay, Instruction *next) insn->sched = 0xc2; } else if (insn->op == OP_JOIN || insn->join) { - insn->sched = 0x00; + insn->sched = 0x2f; } else if (delay >= 0 || prevData == 0x04 || !next || !targ->canDualIssue(insn, next)) { -- 2.10.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] intel/isl: Add support for RGB formats in X and Y-tiled memory
On Fri 09 Sep 2016, Jason Ekstrand wrote: > Normally, using a non-linear tiling format helps improve cache locality by > ensuring that neighboring pixels are usually close-by in memory. For RGB > formats, this still sort-of holds, but it can also lead to rather terrible > memory access patterns where a single RGB pixel value crosses a tile > boundary and gets split into two pieces in different 4K pages. It also > makes for some rather awkward calculations because your tile size is no > longer an even multiple of surface element size. For these reasons, we > chose to simply never create tiled RGB images in the Vulkan driver. > > The GL driver, however, is not so kind so we need to support it somehow. I > briefly toyed with a couple of different schemes but this is the best one I > could come up with. The fundamental problem is that a tile no longer > contains an integer number of surface elements. I briefly considered a > couple other options but found them wanting: > > 1) Using floats for the logical tile size. This leads to potential > rounding error problems. Let's stay far away from floats With floats, there is just too much risk for weird bugs. Floats are for representing ℝ. Our calculations are in ℚ, so let's restrict ourselves to ints. (Does GMail display the blackboard bold UTF-8 correctly?) > 2) When presented with a RGB format, just make the tile 3-times as wide. > This isn't so nice because now our tiles are no longer power-of-two > size. Also, it can force the row_pitch to be larger than needed which, > while not strictly a problem for ISL, causes incompatibility problems > with the way the GL driver chooses surface pitches. > > The chosen method requires that you pay attention and not just assume that > your tile_info is in the units you think it is. However, it's nice because > it provides a nice "these are the units" declaration in isl_tile_info > itself. Previously, the tile_info wasn't usable as a stand-alone structure > because you had to also know the format. It also forces figuring out how > to deal with inconsistencies between tiling and format back to the caller > which is good because the two different consumers of isl_tile_info really > want to deal with it differently: Computation of the surface size wants > the fewest number of horizontal tiles possible while get_intratile_offset > is far more concerned with things aligning nicely. > > Signed-off-by: Jason Ekstrand> Cc: Chad Versace > --- > src/intel/isl/isl.c | 49 - > src/intel/isl/isl.h | 10 +- > 2 files changed, 45 insertions(+), 14 deletions(-) > > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c > index 34245f4..8c72186 100644 > --- a/src/intel/isl/isl.c > +++ b/src/intel/isl/isl.c > @@ -113,7 +113,16 @@ isl_tiling_get_info(const struct isl_device *dev, > const uint32_t bs = format_bpb / 8; > struct isl_extent2d logical_el, phys_B; > > - assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(format_bpb)); > + if (tiling != ISL_TILING_LINEAR && ! isl_is_pow2(format_bpb)) { -^^^ 1. Weirdo space after '!' > + /* It is possible to have non-power-of-two formats in a tiled buffer. > + * The easiest way to handle this is to treat the tile as if it is > three > + * times as wide. This way no pixel will ever cross a tile boundary. > + * This really only works on legacy X and Y tiling formats. > + */ > + assert(tiling == ISL_TILING_X || tiling == ISL_TILING_Y0); > + assert(bs % 3 == 0 && isl_is_pow2(format_bpb / 3)); > + return isl_tiling_get_info(dev, tiling, format_bpb / 3, tile_info); > + } > > switch (tiling) { > case ISL_TILING_LINEAR: > @@ -209,6 +218,7 @@ isl_tiling_get_info(const struct isl_device *dev, > > *tile_info = (struct isl_tile_info) { >.tiling = tiling, > + .format_bpb = format_bpb, >.logical_extent_el = logical_el, >.phys_extent_B = phys_B, > }; > @@ -1215,14 +1225,20 @@ isl_surf_init_s(const struct isl_device *dev, >} >base_alignment = isl_round_up_to_power_of_two(base_alignment); > } else { > + assert(fmtl->bpb % tile_info.format_bpb == 0); > + const uint32_t tile_el_scale = fmtl->bpb / tile_info.format_bpb; > + assert(tile_el_scale == 1 || tile_el_scale == 3); 2. Without the context of this patch, the assertion is very mysterious. Future people will be afraid to touch it, as there's no clue to the magic 3's justification. Please add a comment. > + >assert(phys_slice0_sa.w % fmtl->bw == 0); >const uint32_t total_w_el = phys_slice0_sa.width / fmtl->bw; >const uint32_t total_w_tl = > - isl_align_div(total_w_el, tile_info.logical_extent_el.width); > + isl_align_div(total_w_el * tile_el_scale, > +
Re: [Mesa-dev] [PATCH 0/3] Implement EGL_KHR_no_config_context
Adam Jacksonwrites: > KHR_no_config_context is virtually identical to MESA_configless_context, > where they differ is only where the Mesa spec is silent, and for the > most part the code already did what the Khronos spec wants. Fix up the > one corner case, and mark the Mesa extension as superseded. This series is: Reviewed-by: Eric Anholt signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 07/11] glsl: add gl_LocalGroupSizeARB as a system value
On Thu, Sep 8, 2016 at 4:31 PM, Samuel Pitoisetwrote: > Signed-off-by: Samuel Pitoiset > --- > src/compiler/glsl/builtin_variables.cpp | 2 ++ > src/compiler/shader_enums.h | 1 + > 2 files changed, 3 insertions(+) > > diff --git a/src/compiler/glsl/builtin_variables.cpp > b/src/compiler/glsl/builtin_variables.cpp > index f47daab..a1768fc 100644 > --- a/src/compiler/glsl/builtin_variables.cpp > +++ b/src/compiler/glsl/builtin_variables.cpp > @@ -1236,6 +1236,8 @@ builtin_variable_generator::generate_cs_special_vars() > "gl_LocalInvocationID"); > add_system_value(SYSTEM_VALUE_WORK_GROUP_ID, uvec3_t, "gl_WorkGroupID"); > add_system_value(SYSTEM_VALUE_NUM_WORK_GROUPS, uvec3_t, > "gl_NumWorkGroups"); > + add_system_value(SYSTEM_VALUE_LOCAL_GROUP_SIZE, > +uvec3_t, "gl_LocalGroupSizeARB"); if (state->ARB_bla_bla_bla_enable) { } I think > if (state->ctx->Const.LowerCsDerivedVariables) { >add_variable("gl_GlobalInvocationID", uvec3_t, ir_var_auto, 0); >add_variable("gl_LocalInvocationIndex", uint_t, ir_var_auto, 0); > diff --git a/src/compiler/shader_enums.h b/src/compiler/shader_enums.h > index c3a62e0..b6e048e 100644 > --- a/src/compiler/shader_enums.h > +++ b/src/compiler/shader_enums.h > @@ -472,6 +472,7 @@ typedef enum > SYSTEM_VALUE_GLOBAL_INVOCATION_ID, > SYSTEM_VALUE_WORK_GROUP_ID, > SYSTEM_VALUE_NUM_WORK_GROUPS, > + SYSTEM_VALUE_LOCAL_GROUP_SIZE, > /*@}*/ > > /** > -- > 2.9.3 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 07/11] glsl: add gl_LocalGroupSizeARB as a system value
This patch is Reviewed-by: Ian RomanickOn 09/08/2016 01:31 PM, Samuel Pitoiset wrote: > Signed-off-by: Samuel Pitoiset > --- > src/compiler/glsl/builtin_variables.cpp | 2 ++ > src/compiler/shader_enums.h | 1 + > 2 files changed, 3 insertions(+) > > diff --git a/src/compiler/glsl/builtin_variables.cpp > b/src/compiler/glsl/builtin_variables.cpp > index f47daab..a1768fc 100644 > --- a/src/compiler/glsl/builtin_variables.cpp > +++ b/src/compiler/glsl/builtin_variables.cpp > @@ -1236,6 +1236,8 @@ builtin_variable_generator::generate_cs_special_vars() > "gl_LocalInvocationID"); > add_system_value(SYSTEM_VALUE_WORK_GROUP_ID, uvec3_t, "gl_WorkGroupID"); > add_system_value(SYSTEM_VALUE_NUM_WORK_GROUPS, uvec3_t, > "gl_NumWorkGroups"); > + add_system_value(SYSTEM_VALUE_LOCAL_GROUP_SIZE, > +uvec3_t, "gl_LocalGroupSizeARB"); > if (state->ctx->Const.LowerCsDerivedVariables) { >add_variable("gl_GlobalInvocationID", uvec3_t, ir_var_auto, 0); >add_variable("gl_LocalInvocationIndex", uint_t, ir_var_auto, 0); > diff --git a/src/compiler/shader_enums.h b/src/compiler/shader_enums.h > index c3a62e0..b6e048e 100644 > --- a/src/compiler/shader_enums.h > +++ b/src/compiler/shader_enums.h > @@ -472,6 +472,7 @@ typedef enum > SYSTEM_VALUE_GLOBAL_INVOCATION_ID, > SYSTEM_VALUE_WORK_GROUP_ID, > SYSTEM_VALUE_NUM_WORK_GROUPS, > + SYSTEM_VALUE_LOCAL_GROUP_SIZE, > /*@}*/ > > /** > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 06/11] glsl/linker: handle errors when a variable local size is used
On 09/08/2016 01:31 PM, Samuel Pitoiset wrote: > Compute shaders can now include a fixed local size as defined by > ARB_compute_shader or a variable size as defined by > ARB_compute_variable_group_size. > > Signed-off-by: Samuel Pitoiset> --- > src/compiler/glsl/linker.cpp | 23 +-- > 1 file changed, 21 insertions(+), 2 deletions(-) > > diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp > index c95edf3..e909455 100644 > --- a/src/compiler/glsl/linker.cpp > +++ b/src/compiler/glsl/linker.cpp > @@ -2074,6 +2074,7 @@ link_cs_input_layout_qualifiers(struct > gl_shader_program *prog, > { > for (int i = 0; i < 3; i++) >linked_shader->info.Comp.LocalSize[i] = 0; Blank line here. > + linked_shader->info.Comp.LocalSizeVariable = false; > > /* This function is called for all shader stages, but it only has an > effect > * for compute shaders. > @@ -2109,6 +2110,20 @@ link_cs_input_layout_qualifiers(struct > gl_shader_program *prog, > linked_shader->info.Comp.LocalSize[i] = > shader->info.Comp.LocalSize[i]; > } > + } else if (shader->info.Comp.LocalSizeVariable) { > + if (linked_shader->info.Comp.LocalSize[0] != 0) { > +/* From the ARB_compute_variable_group_size spec: Spec quote. > + * > + * If one compute shader attached to a program declares a > + * variable local group size and a second compute shader > + * attached to the same program declares a fixed local group > + * size, a link-time error results. > + */ > +linker_error(prog, "computer shader defined with both fixed and " compute :) > + "variable local group size\n"); > +return; > + } > + linked_shader->info.Comp.LocalSizeVariable = true; >} > } > > @@ -2116,12 +2131,16 @@ link_cs_input_layout_qualifiers(struct > gl_shader_program *prog, > * since we already know we're in the right type of shader program > * for doing it. > */ > - if (linked_shader->info.Comp.LocalSize[0] == 0) { > - linker_error(prog, "compute shader didn't declare local size\n"); > + if (linked_shader->info.Comp.LocalSize[0] == 0 && > + !linked_shader->info.Comp.LocalSizeVariable) { > + linker_error(prog, "compute shader must contain a fixed or a variable " > + "local group size\n"); >return; > } > for (int i = 0; i < 3; i++) >prog->Comp.LocalSize[i] = linked_shader->info.Comp.LocalSize[i]; Blank line here. With those 4 nits fixed, this patch is Reviewed-by: Ian Romanick > + prog->Comp.LocalSizeVariable = > + linked_shader->info.Comp.LocalSizeVariable; > } > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 05/11] glsl: reject compute shaders with fixed and variable local size
On 09/08/2016 01:31 PM, Samuel Pitoiset wrote: > The ARB_compute_variable_group_size specification explains that > when a compute shader includes both a fixed and a variable local > size, a compile-time error occurs. I probably would have squashed this in with the previous commit, but it's fine either way. > Signed-off-by: Samuel Pitoiset> --- > src/compiler/glsl/ast_to_hir.cpp | 14 ++ > 1 file changed, 14 insertions(+) > > diff --git a/src/compiler/glsl/ast_to_hir.cpp > b/src/compiler/glsl/ast_to_hir.cpp > index 4fc4c5c..a53a82e 100644 > --- a/src/compiler/glsl/ast_to_hir.cpp > +++ b/src/compiler/glsl/ast_to_hir.cpp > @@ -8013,6 +8013,20 @@ ast_cs_input_layout::hir(exec_list *instructions, >} > } > > + /* From the ARB_compute_variable_group_size specification: Same comment about spec quotes. It's hard to tell from just the patch... is this check order dependent? Does it correct handle all the cases: // case 1 layout(local_size_variable, local_size_x = 32, local_size_y = 32); // case 2 layout(local_size_x = 32, local_size_y = 32, local_size_variable); // case 3 layout(local_size_variable); layout(local_size_x = 32, local_size_y = 32); // case 4 layout(local_size_x = 32, local_size_y = 32); layout(local_size_variable); > +* > +* If a compute shader including a *local_size_variable* qualifier > also > +* declares a fixed local group size using the *local_size_x*, > +* *local_size_y*, or *local_size_z* qualifiers, a compile-time error > +* results > +*/ > + if (state->cs_input_local_size_variable_specified) { > + _mesa_glsl_error(, state, > + "compute shader can't include both a variable and a " > + "fixed local group size"); > + return NULL; > + } > + > state->cs_input_local_size_specified = true; > for (int i = 0; i < 3; i++) >state->cs_input_local_size[i] = qual_local_size[i]; > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/11] glsl: process local_size_variable input qualifier
On 09/08/2016 01:31 PM, Samuel Pitoiset wrote: > This is the new layout qualifier introduced by > ARB_compute_variable_group_size which allows to use a variable work > group size. > > Signed-off-by: Samuel Pitoiset> --- > src/compiler/glsl/ast.h | 5 + > src/compiler/glsl/ast_type.cpp | 6 ++ > src/compiler/glsl/glsl_parser.yy | 13 + > src/compiler/glsl/glsl_parser_extras.cpp | 5 + > src/compiler/glsl/glsl_parser_extras.h | 6 ++ > 5 files changed, 35 insertions(+) > > diff --git a/src/compiler/glsl/ast.h b/src/compiler/glsl/ast.h > index 4c648d0..55f009a 100644 > --- a/src/compiler/glsl/ast.h > +++ b/src/compiler/glsl/ast.h > @@ -553,6 +553,11 @@ struct ast_type_qualifier { >*/ > unsigned local_size:3; > > + /** \name Layout qualifiers for ARB_compute_variable_group_size. */ > + /** \{ */ > + unsigned local_size_variable:1; > + /** \} */ > + >/** \name Layout and memory qualifiers for > ARB_shader_image_load_store. */ >/** \{ */ >unsigned early_fragment_tests:1; > diff --git a/src/compiler/glsl/ast_type.cpp b/src/compiler/glsl/ast_type.cpp > index f3f6b29..3f19f1f 100644 > --- a/src/compiler/glsl/ast_type.cpp > +++ b/src/compiler/glsl/ast_type.cpp > @@ -503,6 +503,7 @@ ast_type_qualifier::merge_in_qualifier(YYLTYPE *loc, > state->in_qualifier->flags.q.local_size == 0; > >valid_in_mask.flags.q.local_size = 7; > + valid_in_mask.flags.q.local_size_variable = 1; >break; > default: >_mesa_glsl_error(loc, state, > @@ -580,6 +581,10 @@ ast_type_qualifier::merge_in_qualifier(YYLTYPE *loc, >this->point_mode = q.point_mode; > } > > + if (q.flags.q.local_size_variable) { > + state->cs_input_local_size_variable_specified = true; > + } > + > if (create_node) { >if (create_gs_ast) { > node = new(mem_ctx) ast_gs_input_layout(*loc, q.prim_type); > @@ -653,6 +658,7 @@ ast_type_qualifier::validate_flags(YYLTYPE *loc, > bad.flags.q.prim_type ? " prim_type" : "", > bad.flags.q.max_vertices ? " max_vertices" : "", > bad.flags.q.local_size ? " local_size" : "", > +bad.flags.q.local_size_variable ? " local_size_variable" > : "", > bad.flags.q.early_fragment_tests ? " > early_fragment_tests" : "", > bad.flags.q.explicit_image_format ? " image_format" : "", > bad.flags.q.coherent ? " coherent" : "", > diff --git a/src/compiler/glsl/glsl_parser.yy > b/src/compiler/glsl/glsl_parser.yy > index 9e1fd9e..38cbd3f 100644 > --- a/src/compiler/glsl/glsl_parser.yy > +++ b/src/compiler/glsl/glsl_parser.yy > @@ -1491,6 +1491,19 @@ layout_qualifier_id: > } >} > > + /* Layout qualifiers for ARB_compute_variable_group_size. */ > + if (!$$.flags.i) { > + if (match_layout_qualifier($1, "local_size_variable", state) == 0) { > +$$.flags.q.local_size_variable = 1; > + } > + > + if ($$.flags.i && !state->ARB_compute_variable_group_size_enable) { > +_mesa_glsl_error(& @1, state, > + "qualifier `local_size_variable` requires " > + "ARB_compute_variable_group_size"); > + } > + } > + >if (!$$.flags.i) { > _mesa_glsl_error(& @1, state, "unrecognized layout identifier " >"`%s'", $1); > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp > b/src/compiler/glsl/glsl_parser_extras.cpp > index bcbe623..5a7153f 100644 > --- a/src/compiler/glsl/glsl_parser_extras.cpp > +++ b/src/compiler/glsl/glsl_parser_extras.cpp > @@ -297,6 +297,8 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct > gl_context *_ctx, >sizeof(this->atomic_counter_offsets)); > this->allow_extension_directive_midshader = >ctx->Const.AllowGLSLExtensionDirectiveMidShader; > + > + this->cs_input_local_size_variable_specified = false; > } > > /** > @@ -1648,6 +1650,7 @@ set_shader_inout_layout(struct gl_shader *shader, > if (shader->Stage != MESA_SHADER_COMPUTE) { >/* Should have been prevented by the parser. */ >assert(!state->cs_input_local_size_specified); > + assert(!state->cs_input_local_size_variable_specified); > } > > if (shader->Stage != MESA_SHADER_FRAGMENT) { > @@ -1763,6 +1766,8 @@ set_shader_inout_layout(struct gl_shader *shader, > for (int i = 0; i < 3; i++) > shader->info.Comp.LocalSize[i] = 0; >} Blank line here. With that fixed, this patch is Reviewed-by: Ian Romanick > + shader->info.Comp.LocalSizeVariable = > + state->cs_input_local_size_variable_specified; >break; > > case MESA_SHADER_FRAGMENT: > diff --git
Re: [Mesa-dev] [PATCH 0/3] Implement EGL_KHR_no_config_context
On Fri, Sep 9, 2016 at 9:25 AM, Adam Jacksonwrote: > KHR_no_config_context is virtually identical to MESA_configless_context, > where they differ is only where the Mesa spec is silent, and for the > most part the code already did what the Khronos spec wants. Fix up the > one corner case, and mark the Mesa extension as superseded. Looks good to me: Reviewed-by: Kristian Høgsberg > - ajax > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] intel/isl: Allow valign2 for texture-only Y-tiled surfaces on gen7
On Fri, Sep 9, 2016 at 11:18 AM, Chad Versacewrote: > On Fri 09 Sep 2016, Jason Ekstrand wrote: > > The restriction that Y-tiled surfaces must have valign == 4 only aplies > to > > render targets but we were applying it universally. This causes problems > > if R32G32B32_SFLOAT is used because it requires valign == 2; this should > be > > That isl format doesn't exist. Did you mean R32B32B32_FLOAT (sans S)? > I did. > > > okay because you can't render to that format. > > > > Signed-off-by: Jason Ekstrand > > Cc: Chad Versace > > --- > > src/intel/isl/isl_gen7.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/src/intel/isl/isl_gen7.c b/src/intel/isl/isl_gen7.c > > index 02273f8..f3d8428 100644 > > --- a/src/intel/isl/isl_gen7.c > > +++ b/src/intel/isl/isl_gen7.c > > @@ -354,7 +354,8 @@ gen7_choose_valign_el(const struct isl_device *dev, > > */ > > if (isl_surf_usage_is_depth(info->usage) || > > info->samples > 1 || > > - tiling == ISL_TILING_Y0) { > > + ((info->usage & ISL_SURF_USAGE_RENDER_TARGET_BIT) && > > +tiling == ISL_TILING_Y0)) { > >require_valign4 = true; > > } > > Did the buggy code trigger the assertion on line 390? > assert(!require_valign2 || !require_valign4); > If not, then something is suspicious. > Yes, that's exactly what happened > Fix the commit message and this is > Reviewed-by: Chad Versace > Thanks! ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/11] glsl: add enable flags for ARB_compute_variable_group_size
This patch is Reviewed-by: Ian RomanickI'm going to try to work through the rest of the glsl: patches today, but I can't make any promises. On 09/08/2016 01:31 PM, Samuel Pitoiset wrote: > This also initializes the default values for the standalone compiler. > > Signed-off-by: Samuel Pitoiset > --- > src/compiler/glsl/glsl_parser_extras.cpp | 1 + > src/compiler/glsl/glsl_parser_extras.h | 2 ++ > src/compiler/glsl/standalone.cpp | 4 > src/compiler/glsl/standalone_scaffolding.cpp | 5 + > 4 files changed, 12 insertions(+) > > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp > b/src/compiler/glsl/glsl_parser_extras.cpp > index 436ddd0..bcbe623 100644 > --- a/src/compiler/glsl/glsl_parser_extras.cpp > +++ b/src/compiler/glsl/glsl_parser_extras.cpp > @@ -580,6 +580,7 @@ static const _mesa_glsl_extension > _mesa_glsl_supported_extensions[] = { > EXT(ARB_ES3_2_compatibility), > EXT(ARB_arrays_of_arrays), > EXT(ARB_compute_shader), > + EXT(ARB_compute_variable_group_size), > EXT(ARB_conservative_depth), > EXT(ARB_cull_distance), > EXT(ARB_derivative_control), > diff --git a/src/compiler/glsl/glsl_parser_extras.h > b/src/compiler/glsl/glsl_parser_extras.h > index e146fe1..8e0dafe 100644 > --- a/src/compiler/glsl/glsl_parser_extras.h > +++ b/src/compiler/glsl/glsl_parser_extras.h > @@ -576,6 +576,8 @@ struct _mesa_glsl_parse_state { > bool ARB_arrays_of_arrays_warn; > bool ARB_compute_shader_enable; > bool ARB_compute_shader_warn; > + bool ARB_compute_variable_group_size_enable; > + bool ARB_compute_variable_group_size_warn; > bool ARB_conservative_depth_enable; > bool ARB_conservative_depth_warn; > bool ARB_cull_distance_enable; > diff --git a/src/compiler/glsl/standalone.cpp > b/src/compiler/glsl/standalone.cpp > index 88fe5fd..cb2da03 100644 > --- a/src/compiler/glsl/standalone.cpp > +++ b/src/compiler/glsl/standalone.cpp > @@ -58,6 +58,10 @@ initialize_context(struct gl_context *ctx, gl_api api) > ctx->Const.MaxComputeWorkGroupSize[2] = 64; > ctx->Const.MaxComputeWorkGroupInvocations = 1024; > ctx->Const.MaxComputeSharedMemorySize = 32768; > + ctx->Const.MaxComputeVariableGroupSize[0] = 512; > + ctx->Const.MaxComputeVariableGroupSize[1] = 512; > + ctx->Const.MaxComputeVariableGroupSize[2] = 64; > + ctx->Const.MaxComputeVariableGroupInvocations = 512; > ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 16; > ctx->Const.Program[MESA_SHADER_COMPUTE].MaxUniformComponents = 1024; > ctx->Const.Program[MESA_SHADER_COMPUTE].MaxCombinedUniformComponents = > 1024; > diff --git a/src/compiler/glsl/standalone_scaffolding.cpp > b/src/compiler/glsl/standalone_scaffolding.cpp > index b0fb4b7..decff5f 100644 > --- a/src/compiler/glsl/standalone_scaffolding.cpp > +++ b/src/compiler/glsl/standalone_scaffolding.cpp > @@ -144,6 +144,7 @@ void initialize_context_to_defaults(struct gl_context > *ctx, gl_api api) > ctx->Extensions.dummy_false = false; > ctx->Extensions.dummy_true = true; > ctx->Extensions.ARB_compute_shader = true; > + ctx->Extensions.ARB_compute_variable_group_size = true; > ctx->Extensions.ARB_conservative_depth = true; > ctx->Extensions.ARB_draw_instanced = true; > ctx->Extensions.ARB_ES2_compatibility = true; > @@ -207,6 +208,10 @@ void initialize_context_to_defaults(struct gl_context > *ctx, gl_api api) > ctx->Const.MaxComputeWorkGroupSize[1] = 1024; > ctx->Const.MaxComputeWorkGroupSize[2] = 64; > ctx->Const.MaxComputeWorkGroupInvocations = 1024; > + ctx->Const.MaxComputeVariableGroupSize[0] = 512; > + ctx->Const.MaxComputeVariableGroupSize[1] = 512; > + ctx->Const.MaxComputeVariableGroupSize[2] = 64; > + ctx->Const.MaxComputeVariableGroupInvocations = 512; > ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 16; > ctx->Const.Program[MESA_SHADER_COMPUTE].MaxUniformComponents = 1024; > ctx->Const.Program[MESA_SHADER_COMPUTE].MaxInputComponents = 0; /* not > used */ > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: support TGSI compute shaders with variable block size
On 09/09/2016 05:29 AM, Marek Olšák wrote: > On Fri, Sep 9, 2016 at 10:12 AM, Nicolai Hähnlewrote: >> From: Nicolai Hähnle >> >> Not sure if it's possible to avoid programming the block size twice (once for >> the userdata and once for the dispatch). >> >> Since the shaders are compiled with a pessimistic upper limit on the number >> of >> registers, asynchronously compiling variants may be worth considering in the >> future if we observe the shaders to be dispatched with small block sizes. >> --- >> I think this is sufficient to support variable group sizes on radeonsi, but >> it's completely untested. Do you keep the latest version of your series in a >> public repository somewhere? >> >> src/gallium/drivers/radeonsi/si_compute.c | 10 +- >> src/gallium/drivers/radeonsi/si_shader.c | 29 - >> src/gallium/drivers/radeonsi/si_shader.h | 4 +++- >> 3 files changed, 32 insertions(+), 11 deletions(-) >> >> diff --git a/src/gallium/drivers/radeonsi/si_compute.c >> b/src/gallium/drivers/radeonsi/si_compute.c >> index 5041761..26e096c 100644 >> --- a/src/gallium/drivers/radeonsi/si_compute.c >> +++ b/src/gallium/drivers/radeonsi/si_compute.c >> @@ -379,25 +379,33 @@ static void si_setup_tgsi_grid(struct si_context *sctx, >> for (i = 0; i < 3; ++i) { >> radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0)); >> radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) | >> COPY_DATA_DST_SEL(COPY_DATA_REG)); >> radeon_emit(cs, (va + 4 * i)); >> radeon_emit(cs, (va + 4 * i) >> 32); >> radeon_emit(cs, (grid_size_reg >> 2) + i); >> radeon_emit(cs, 0); >> } >> } else { >> + struct si_compute *program = sctx->cs_shader_state.program; >> + bool variable_group_size = >> + >> program->shader.selector->info.properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] >> == 0; >> >> - radeon_set_sh_reg_seq(cs, grid_size_reg, 3); >> + radeon_set_sh_reg_seq(cs, grid_size_reg, variable_group_size >> ? 6 : 3); >> radeon_emit(cs, info->grid[0]); >> radeon_emit(cs, info->grid[1]); >> radeon_emit(cs, info->grid[2]); >> + if (variable_group_size) { >> + radeon_emit(cs, info->block[0]); >> + radeon_emit(cs, info->block[1]); >> + radeon_emit(cs, info->block[2]); >> + } >> } >> } >> >> static void si_emit_dispatch_packets(struct si_context *sctx, >> const struct pipe_grid_info *info) >> { >> struct radeon_winsys_cs *cs = sctx->b.gfx.cs; >> bool render_cond_bit = sctx->b.render_cond && >> !sctx->b.render_cond_force_off; >> unsigned waves_per_threadgroup = >> DIV_ROUND_UP(info->block[0] * info->block[1] * >> info->block[2], 64); >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c >> b/src/gallium/drivers/radeonsi/si_shader.c >> index 0b7de18..730ee21 100644 >> --- a/src/gallium/drivers/radeonsi/si_shader.c >> +++ b/src/gallium/drivers/radeonsi/si_shader.c >> @@ -1783,30 +1783,35 @@ static void declare_system_value( >> >> case TGSI_SEMANTIC_GRID_SIZE: >> value = LLVMGetParam(radeon_bld->main_fn, >> SI_PARAM_GRID_SIZE); >> break; >> >> case TGSI_SEMANTIC_BLOCK_SIZE: >> { >> LLVMValueRef values[3]; >> unsigned i; >> unsigned *properties = >> ctx->shader->selector->info.properties; >> - unsigned sizes[3] = { >> - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH], >> - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT], >> - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH] >> - }; >> >> - for (i = 0; i < 3; ++i) >> - values[i] = lp_build_const_int32(gallivm, sizes[i]); >> + if (properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] != 0) { >> + unsigned sizes[3] = { >> + >> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH], >> + >> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT], >> + >> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH] >> + }; >> + >> + for (i = 0; i < 3; ++i) >> + values[i] = lp_build_const_int32(gallivm, >> sizes[i]); >> >> - value = lp_build_gather_values(gallivm, values, 3); >> + value = lp_build_gather_values(gallivm, values, 3); >> + } else { >> +
Re: [Mesa-dev] [PATCH 01/11] glapi: add entry points for GL_ARB_compute_variable_group_size
On 09/09/2016 08:46 AM, Samuel Pitoiset wrote: > > > On 09/08/2016 10:58 PM, Ian Romanick wrote: >> On 09/08/2016 01:31 PM, Samuel Pitoiset wrote: >>> Signed-off-by: Samuel Pitoiset>>> --- >>> .../glapi/gen/ARB_compute_variable_group_size.xml | 25 >>> ++ >>> src/mapi/glapi/gen/Makefile.am | 1 + >>> src/mapi/glapi/gen/gl_API.xml | 2 ++ >>> src/mesa/main/compute.c| 8 +++ >>> src/mesa/main/compute.h| 5 + >>> src/mesa/main/tests/dispatch_sanity.cpp| 3 +++ >>> 6 files changed, 44 insertions(+) >>> create mode 100644 >>> src/mapi/glapi/gen/ARB_compute_variable_group_size.xml >>> >>> diff --git a/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml >>> b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml >>> new file mode 100644 >>> index 000..b21c52f >>> --- /dev/null >>> +++ b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml >>> @@ -0,0 +1,25 @@ >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >> value="0x9344"/> >>> + >> value="0x90EB"/> >>> + >> value="0x9345"/> >>> + >> value="0x91BF"/> >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> diff --git a/src/mapi/glapi/gen/Makefile.am >>> b/src/mapi/glapi/gen/Makefile.am >>> index 0d7c338..49fdfe3 100644 >>> --- a/src/mapi/glapi/gen/Makefile.am >>> +++ b/src/mapi/glapi/gen/Makefile.am >>> @@ -117,6 +117,7 @@ API_XML = \ >>> ARB_color_buffer_float.xml \ >>> ARB_compressed_texture_pixel_storage.xml \ >>> ARB_compute_shader.xml \ >>> +ARB_compute_variable_group_size.xml \ >>> ARB_copy_buffer.xml \ >>> ARB_copy_image.xml \ >>> ARB_debug_output.xml \ >>> diff --git a/src/mapi/glapi/gen/gl_API.xml >>> b/src/mapi/glapi/gen/gl_API.xml >>> index c39aa22..9ad3b60 100644 >>> --- a/src/mapi/glapi/gen/gl_API.xml >>> +++ b/src/mapi/glapi/gen/gl_API.xml >>> @@ -8258,6 +8258,8 @@ >>> >>> >> xmlns:xi="http://www.w3.org/2001/XInclude"/> >>> >>> +>> xmlns:xi="http://www.w3.org/2001/XInclude"/> >>> + >> >> This (extension #153) should go before ARB_indirect_parameters >> (extension #154), and the "ARB extensions 149 - 153" should be changed >> to "ARB extensions 149 - 152". > > Presumably, you meant "ARB extensions 149 - 154"? Because the next > section is "ARB extension 155 - 159". I meant what I said. The 'ARB extensions ...' comments are placeholders for extensions that are not there. Previously only 154 was there, hence the 149-153 and 155-159. You're adding 153, so those become 149-152 and 155-159. Looking at this more closely... there are some other errors here unrelated to your change. I know we support GL_ARB_texture_mirror_clamp_to_edge (extension #149), GL_ARB_texture_stencil8 (extension #150), and GL_ARB_vertex_type_10f_11f_11f_rev (extension #151). I think these are all extensions that just enable pre-existing enums in new places, so they don't add anything new for this file. A reasonable follow-up patch would be to add blocks for these extensions like the existing GL_ARB_texture_non_power_of_two block: It may not be worth the effort. >> With that fixed, this patch is >> >> Reviewed-by: Ian Romanick >> >>> >>> >>> >> xmlns:xi="http://www.w3.org/2001/XInclude"/> >>> diff --git a/src/mesa/main/compute.c b/src/mesa/main/compute.c >>> index b71430f..b052bae 100644 >>> --- a/src/mesa/main/compute.c >>> +++ b/src/mesa/main/compute.c >>> @@ -60,3 +60,11 @@ _mesa_DispatchComputeIndirect(GLintptr indirect) >>> >>> ctx->Driver.DispatchComputeIndirect(ctx, indirect); >>> } >>> + >>> +void GLAPIENTRY >>> +_mesa_DispatchComputeGroupSizeARB(GLuint num_groups_x, GLuint >>> num_groups_y, >>> + GLuint num_groups_z, GLuint >>> group_size_x, >>> + GLuint group_size_y, GLuint >>> group_size_z) >>> +{ >>> + >>> +} >>> diff --git a/src/mesa/main/compute.h b/src/mesa/main/compute.h >>> index 0cc034f..8018bbb 100644 >>> --- a/src/mesa/main/compute.h >>> +++ b/src/mesa/main/compute.h >>> @@ -35,4 +35,9 @@ _mesa_DispatchCompute(GLuint num_groups_x, >>> extern void GLAPIENTRY >>> _mesa_DispatchComputeIndirect(GLintptr indirect); >>> >>> +extern void GLAPIENTRY >>> +_mesa_DispatchComputeGroupSizeARB(GLuint num_groups_x, GLuint >>> num_groups_y, >>> + GLuint num_groups_z, GLuint >>> group_size_x, >>> + GLuint group_size_y, GLuint >>> group_size_z); >>> + >>> #endif >>> diff --git a/src/mesa/main/tests/dispatch_sanity.cpp >>> b/src/mesa/main/tests/dispatch_sanity.cpp >>> index 42fe61a..7faeabe 100644 >>> --- a/src/mesa/main/tests/dispatch_sanity.cpp >>> +++ b/src/mesa/main/tests/dispatch_sanity.cpp >>> @@ -942,6 +942,9 @@ const struct function >>> common_desktop_functions_possible[] = { >>> { "glDispatchCompute",
Re: [Mesa-dev] [PATCH 1/4] intel/isl: Allow valign2 for texture-only Y-tiled surfaces on gen7
On Fri 09 Sep 2016, Jason Ekstrand wrote: > The restriction that Y-tiled surfaces must have valign == 4 only aplies to > render targets but we were applying it universally. This causes problems > if R32G32B32_SFLOAT is used because it requires valign == 2; this should be That isl format doesn't exist. Did you mean R32B32B32_FLOAT (sans S)? > okay because you can't render to that format. > > Signed-off-by: Jason Ekstrand> Cc: Chad Versace > --- > src/intel/isl/isl_gen7.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/intel/isl/isl_gen7.c b/src/intel/isl/isl_gen7.c > index 02273f8..f3d8428 100644 > --- a/src/intel/isl/isl_gen7.c > +++ b/src/intel/isl/isl_gen7.c > @@ -354,7 +354,8 @@ gen7_choose_valign_el(const struct isl_device *dev, > */ > if (isl_surf_usage_is_depth(info->usage) || > info->samples > 1 || > - tiling == ISL_TILING_Y0) { > + ((info->usage & ISL_SURF_USAGE_RENDER_TARGET_BIT) && > +tiling == ISL_TILING_Y0)) { >require_valign4 = true; > } Did the buggy code trigger the assertion on line 390? assert(!require_valign2 || !require_valign4); If not, then something is suspicious. Fix the commit message and this is Reviewed-by: Chad Versace ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] anv: Link to libX11-xcb only when unneeded
On Thu 08 Sep 2016, Nanley Chery wrote: > Should the title the say needed instead of unneeded? Oops :/ It's already pushed. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v3 4/6] i965/rbc: Consult rb settings for texture surface setup
On Fri, Sep 09, 2016 at 07:11:16AM -0700, Jason Ekstrand wrote: >On Sep 8, 2016 11:13 PM, "Pohjolainen, Topi" ><[1]topi.pohjolai...@gmail.com> wrote: >> >> On Thu, Sep 08, 2016 at 08:49:56AM -0700, Jason Ekstrand wrote: >> >On Sep 7, 2016 9:30 PM, "Pohjolainen, Topi" >> ><[1][2]topi.pohjolai...@gmail.com> wrote: >> >> >> >> On Wed, Sep 07, 2016 at 03:25:30PM -0700, Jason Ekstrand >wrote: >> >> >On Sep 7, 2016 10:24 AM, "Topi Pohjolainen" >> >> ><[1][2][3]topi.pohjolai...@gmail.com> wrote: >> >> >> >> >> >> Once mcs buffer gets allocated without delay for >lossless >> >> >> compression (same as we do for msaa), one gets >regression in: >> >> >> >> >> >> GL45-CTS.texture_barrier_ARB.same-texel-rw >> >> >> >> >> >> Setting the auxiliary surface for both sampling engine >and >> >data >> >> >> port seems to fix this. I haven't found any hardware >> >documentation >> >> >> backing this though. >> >> >> >> >> >> v2 (Jason): Prepare also for the case where surface is >sampled >> >with >> >> >> non-compressible format forcing also >rendering >> >without >> >> >> compression. >> >> >> v3: Split asserts and decision making. >> >> >> >> >> >> Signed-off-by: Topi Pohjolainen >> ><[2][3][4]topi.pohjolai...@intel.com> >> >> >> --- >> >> >> src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 63 >> >> >+--- >> >> >> 1 file changed, 56 insertions(+), 7 deletions(-) >> >> >> >> >> >> diff --git >a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c >> >> >b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c >> >> >> index c1273c5..054c5c8 100644 >> >> >> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c >> >> >> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c >> >> >> @@ -140,9 +140,7 @@ brw_emit_surface_state(struct >brw_context >> >*brw, >> >> >> struct isl_surf *aux_surf = NULL, aux_surf_s; >> >> >> uint64_t aux_offset = 0; >> >> >> enum isl_aux_usage aux_usage = ISL_AUX_USAGE_NONE; >> >> >> - if (mt->mcs_mt && >> >> >> - ((view.usage & >ISL_SURF_USAGE_RENDER_TARGET_BIT) || >> >> >> -mt->fast_clear_state != >> >INTEL_FAST_CLEAR_STATE_RESOLVED)) { >> >> >> + if (mt->mcs_mt && !(flags & >INTEL_AUX_BUFFER_DISABLED)) { >> >> >>intel_miptree_get_aux_isl_surf(brw, mt, >_surf_s, >> >> >_usage); >> >> >>aux_surf = _surf_s; >> >> >>assert(mt->mcs_mt->offset == 0); >> >> >> @@ -425,6 +423,54 @@ swizzle_to_scs(GLenum swizzle, >bool >> >> >need_green_to_blue) >> >> >> return (need_green_to_blue && scs == HSW_SCS_GREEN) >? >> >> >HSW_SCS_BLUE : scs; >> >> >> } >> >> >> >> >> >> +static unsigned >> >> >> +brw_find_matching_rb(const struct gl_framebuffer *fb, >> >> >> + const struct intel_mipmap_tree >*mt) >> >> >> +{ >> >> >> + for (unsigned i = 0; i < fb->_NumColorDrawBuffers; >i++) { >> >> >> + const struct intel_renderbuffer *irb = >> >> >> + intel_renderbuffer(fb->_ColorDrawBuffers[i]); >> >> >> + >> >> >> + if (irb->mt == mt) >> >> >> + return i; >> >> >> + } >> >> >> + >> >> >> + return fb->_NumColorDrawBuffers; >> >> >> +} >> >> >> + >> >> >> +static bool >> >> >> +brw_disable_aux_surface(const struct brw_context *brw, >> >> >> +const struct intel_mipmap_tree >*mt) >> >> >> +{ >> >> >> + /* Nothing to disable. */ >> >> >> + if (!mt->mcs_mt) >> >> >> + return false; >> >> >> + >> >> >> + /* There are special cases only for lossless >compression. >> >*/ >> >> >> + if (!intel_miptree_is_lossless_compressed(brw, mt)) >> >> >> + return mt->fast_clear_state == >> >> >INTEL_FAST_CLEAR_STATE_RESOLVED; >> >> >> + >> >> >> + const struct gl_framebuffer *fb = >brw->ctx.DrawBuffer; >> >> >> + const unsigned rb_index = brw_find_matching_rb(fb, >mt); >> >> >> + >> >> >> + /* In practise it looks that setting the same >lossless >
Re: [Mesa-dev] [PATCH 2/7] EGL: Implement eglLabelObjectKHR
On Fri, 2016-09-09 at 11:27 +0100, Emil Velikov wrote: > > On 8 September 2016 at 18:46, Adam Jacksonwrote: > > From: Kyle Brenneman > > Added a label to the _EGLThreadInfo, _EGLDisplay, and EGLResource > structs. Implemented the function eglLabelObjectKHR. > > > Coding style of the new hunk follows the GLVND one, which is _not_ > what we use in mesa/egl. Please don't do that ? Fixed in next version. > b) eglTerminate > It detaches/unlinks only contexts and surfaces (bug?). Thus even when > the display is no longer initialized we will get get to this point and > _eglCheckResource() will return true. Almost certainly a bug, patch in next version of the series. Not sure if any tests cover this bug, but I guess I'll find out. > > +/** > > + * Returns the label set for the current thread. > > + */ > > +EGLLabelKHR _eglGetThreadLabel(void) > > +{ > > + _EGLThreadInfo *t = _eglGetCurrentThread(); > > + return t->Label; > > Shouldn't the label be cleared in eglReleaseThread ? > It isn't, not explicitly, but eglReleaseThread -> _eglDestroyCurrentThread -> _eglDestroyThreadInfo which frees the struct containing the label. - ajax ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/7] EGL: Implement eglLabelObjectKHR
On 09/09/2016 04:27 AM, Emil Velikov wrote: On 8 September 2016 at 18:46, Adam Jacksonwrote: From: Kyle Brenneman Added a label to the _EGLThreadInfo, _EGLDisplay, and EGLResource structs. Implemented the function eglLabelObjectKHR. Coding style of the new hunk follows the GLVND one, which is _not_ what we use in mesa/egl. Please don't do that ? Reviewed-by: Adam Jackson --- src/egl/main/eglapi.c | 63 +++ src/egl/main/eglcurrent.c | 9 +++ src/egl/main/eglcurrent.h | 4 +++ src/egl/main/egldisplay.h | 4 +++ 4 files changed, 80 insertions(+) diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c index df2dcd6..31b842f 100644 --- a/src/egl/main/eglapi.c +++ b/src/egl/main/eglapi.c @@ -1791,6 +1791,68 @@ eglExportDMABUFImageMESA(EGLDisplay dpy, EGLImage image, RETURN_EGL_EVAL(disp, ret); } +static EGLint EGLAPIENTRY +eglLabelObjectKHR( + EGLDisplay dpy, + EGLenum objectType, + EGLObjectKHR object, + EGLLabelKHR label) Incorrect alignment. +{ + if (objectType == EGL_OBJECT_THREAD_KHR) { + _EGLThreadInfo *t = _eglGetCurrentThread(); Add blank newline after variable declarations. + if (!_eglIsCurrentThreadDummy()) { + t->Label = label; + } Drop {} when they encapsulate a single line on code (unless the other side of the conditional has them). + return EGL_SUCCESS; + } else { This else can do. This and the above two apply for the rest of the patch. + _EGLDisplay *disp = _eglLookupDisplay(dpy); + if (disp == NULL) { + _eglError(EGL_BAD_DISPLAY, "eglLabelObjectKHR"); + return EGL_BAD_DISPLAY; + } + + if (objectType == EGL_OBJECT_DISPLAY_KHR) { + if (dpy != (EGLDisplay) object) { +_eglError(EGL_BAD_PARAMETER, "eglLabelObjectKHR"); +return EGL_BAD_PARAMETER; + } + disp->Label = label; + return EGL_SUCCESS; + } else { + _EGLResourceType type; + switch (objectType) + { +case EGL_OBJECT_CONTEXT_KHR: + type = _EGL_RESOURCE_CONTEXT; + break; +case EGL_OBJECT_SURFACE_KHR: + type = _EGL_RESOURCE_SURFACE; + break; +case EGL_OBJECT_IMAGE_KHR: + type = _EGL_RESOURCE_IMAGE; + break; +case EGL_OBJECT_SYNC_KHR: + type = _EGL_RESOURCE_SYNC; + break; +case EGL_OBJECT_STREAM_KHR: +default: +_eglError(EGL_BAD_PARAMETER, "eglLabelObjectKHR"); + return EGL_BAD_PARAMETER; + } + The does not need to be initialized for EGL_OBJECT_THREAD_KHR, or EGL_OBJECT_DISPLAY_KHR; However for all other types it must be initialized in order to validate the for attaching a label. + if (_eglCheckResource(object, type, disp)) { With the above said, I'm not sure of this will be sufficient. _eglCheckResource checks if the resource is linked, with unlinking happening via a) destroy Mildly related: we seem to have a bug for EGL_OBJECT_SYNC_KHR, where we unconditionally unblock and destroy, while we should flag for deletion. b) eglTerminate It detaches/unlinks only contexts and surfaces (bug?). Thus even when the display is no longer initialized we will get get to this point and _eglCheckResource() will return true. c) eglReleaseThread Takes care of the error, bound API and current ctx state. Would it be sufficient to call _eglLockDisplay instead of _eglLookupDisplay, and then check the disp->Initialized flag for the other object types? That way, it could just return EGL_NOT_INITIALIZED for an uninitialized display, even if the EGLObjectKHR handle points to some object that's still lingering after an eglTerminate call. +/** + * Returns the label set for the current thread. + */ +EGLLabelKHR _eglGetThreadLabel(void) +{ + _EGLThreadInfo *t = _eglGetCurrentThread(); + return t->Label; Shouldn't the label be cleared in eglReleaseThread ? It is, albeit indirectly. eglReleaseThread frees the whole _EGLThreadInfo struct, so the next call to _eglGetCurrentThread will return an _EGLThreadInfo struct without a thread label. Though, _eglGetThreadLabel probably should call _eglIsCurrentThreadDummy first so that it doesn't try to create create a new _EGLThreadInfo if it doesn't have to. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 12/12] winsys/radeon: rename nrelocs, crelocs to max_relocs, num_relocs
From: Nicolai Hähnle--- src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 48 +-- src/gallium/winsys/radeon/drm/radeon_drm_cs.h | 6 ++-- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c index f0dcac0..b277379 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c @@ -118,27 +118,27 @@ static bool radeon_init_cs_context(struct radeon_cs_context *csc, for (i = 0; i < ARRAY_SIZE(csc->reloc_indices_hashlist); i++) { csc->reloc_indices_hashlist[i] = -1; } return true; } static void radeon_cs_context_cleanup(struct radeon_cs_context *csc) { unsigned i; -for (i = 0; i < csc->crelocs; i++) { +for (i = 0; i < csc->num_relocs; i++) { p_atomic_dec(>relocs_bo[i].bo->num_cs_references); radeon_bo_reference(>relocs_bo[i].bo, NULL); } -csc->crelocs = 0; -csc->validated_crelocs = 0; +csc->num_relocs = 0; +csc->num_validated_relocs = 0; csc->chunks[0].length_dw = 0; csc->chunks[1].length_dw = 0; for (i = 0; i < ARRAY_SIZE(csc->reloc_indices_hashlist); i++) { csc->reloc_indices_hashlist[i] = -1; } } static void radeon_destroy_cs_context(struct radeon_cs_context *csc) { @@ -207,21 +207,21 @@ static inline void update_reloc(struct drm_radeon_cs_reloc *reloc, int radeon_lookup_buffer(struct radeon_cs_context *csc, struct radeon_bo *bo) { unsigned hash = bo->handle & (ARRAY_SIZE(csc->reloc_indices_hashlist)-1); int i = csc->reloc_indices_hashlist[hash]; /* not found or found */ if (i == -1 || csc->relocs_bo[i].bo == bo) return i; /* Hash collision, look for the BO in the list of relocs linearly. */ -for (i = csc->crelocs - 1; i >= 0; i--) { +for (i = csc->num_relocs - 1; i >= 0; i--) { if (csc->relocs_bo[i].bo == bo) { /* Put this reloc in the hash list. * This will prevent additional hash collisions if there are * several consecutive lookup_buffer calls for the same buffer. * * Example: Assuming buffers A,B,C collide in the hash list, * the following sequence of relocs: * AAABB * will collide here: ^ and here: ^, * meaning that we should get very few collisions in the end. */ @@ -265,50 +265,50 @@ static unsigned radeon_add_buffer(struct radeon_drm_cs *cs, * * This doesn't have to be done if virtual memory is enabled, * because there is no offset patching with virtual memory. */ if (cs->ring_type != RING_DMA || cs->ws->info.has_virtual_memory) { return i; } } /* New relocation, check if the backing array is large enough. */ -if (csc->crelocs >= csc->nrelocs) { +if (csc->num_relocs >= csc->max_relocs) { uint32_t size; -csc->nrelocs = MAX2(csc->nrelocs + 16, (unsigned)(csc->nrelocs * 1.3)); +csc->max_relocs = MAX2(csc->max_relocs + 16, (unsigned)(csc->max_relocs * 1.3)); -size = csc->nrelocs * sizeof(csc->relocs_bo[0]); +size = csc->max_relocs * sizeof(csc->relocs_bo[0]); csc->relocs_bo = realloc(csc->relocs_bo, size); -size = csc->nrelocs * sizeof(struct drm_radeon_cs_reloc); +size = csc->max_relocs * sizeof(struct drm_radeon_cs_reloc); csc->relocs = realloc(csc->relocs, size); csc->chunks[1].chunk_data = (uint64_t)(uintptr_t)csc->relocs; } /* Initialize the new relocation. */ -csc->relocs_bo[csc->crelocs].bo = NULL; -csc->relocs_bo[csc->crelocs].priority_usage = 1llu << priority; -radeon_bo_reference(>relocs_bo[csc->crelocs].bo, bo); +csc->relocs_bo[csc->num_relocs].bo = NULL; +csc->relocs_bo[csc->num_relocs].priority_usage = 1llu << priority; +radeon_bo_reference(>relocs_bo[csc->num_relocs].bo, bo); p_atomic_inc(>num_cs_references); -reloc = >relocs[csc->crelocs]; +reloc = >relocs[csc->num_relocs]; reloc->handle = bo->handle; reloc->read_domains = rd; reloc->write_domain = wd; reloc->flags = priority / 4; -csc->reloc_indices_hashlist[hash] = csc->crelocs; +csc->reloc_indices_hashlist[hash] = csc->num_relocs; csc->chunks[1].length_dw += RELOC_DWORDS; *added_domains = rd | wd; -return csc->crelocs++; +return csc->num_relocs++; } static unsigned radeon_drm_cs_add_buffer(struct radeon_winsys_cs *rcs, struct pb_buffer *buf, enum radeon_bo_usage usage, enum radeon_bo_domain domains, enum radeon_bo_priority priority) { struct
[Mesa-dev] [PATCH 11/12] winsys/radeon: don't pre-allocate the relocations array
From: Nicolai HähnleIt's really not necessary. Switch to an exponential resizing strategy. --- src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c index 612a876..f0dcac0 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c @@ -91,33 +91,20 @@ static void radeon_drm_ctx_destroy(struct radeon_winsys_ctx *ctx) { /* No context support here. */ } static bool radeon_init_cs_context(struct radeon_cs_context *csc, struct radeon_drm_winsys *ws) { int i; csc->fd = ws->fd; -csc->nrelocs = 512; -csc->relocs_bo = (struct radeon_bo_item*) - CALLOC(1, csc->nrelocs * sizeof(csc->relocs_bo[0])); -if (!csc->relocs_bo) { -return false; -} - -csc->relocs = (struct drm_radeon_cs_reloc*) - CALLOC(1, csc->nrelocs * sizeof(struct drm_radeon_cs_reloc)); -if (!csc->relocs) { -FREE(csc->relocs_bo); -return false; -} csc->chunks[0].chunk_id = RADEON_CHUNK_ID_IB; csc->chunks[0].length_dw = 0; csc->chunks[0].chunk_data = (uint64_t)(uintptr_t)csc->buf; csc->chunks[1].chunk_id = RADEON_CHUNK_ID_RELOCS; csc->chunks[1].length_dw = 0; csc->chunks[1].chunk_data = (uint64_t)(uintptr_t)csc->relocs; csc->chunks[2].chunk_id = RADEON_CHUNK_ID_FLAGS; csc->chunks[2].length_dw = 2; csc->chunks[2].chunk_data = (uint64_t)(uintptr_t)>flags; @@ -280,21 +267,21 @@ static unsigned radeon_add_buffer(struct radeon_drm_cs *cs, * because there is no offset patching with virtual memory. */ if (cs->ring_type != RING_DMA || cs->ws->info.has_virtual_memory) { return i; } } /* New relocation, check if the backing array is large enough. */ if (csc->crelocs >= csc->nrelocs) { uint32_t size; -csc->nrelocs += 10; +csc->nrelocs = MAX2(csc->nrelocs + 16, (unsigned)(csc->nrelocs * 1.3)); size = csc->nrelocs * sizeof(csc->relocs_bo[0]); csc->relocs_bo = realloc(csc->relocs_bo, size); size = csc->nrelocs * sizeof(struct drm_radeon_cs_reloc); csc->relocs = realloc(csc->relocs, size); csc->chunks[1].chunk_data = (uint64_t)(uintptr_t)csc->relocs; } -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/12] winsys/amdgpu: clean up error paths in amdgpu_winsys_create
From: Nicolai HähnleNo need to call pb_cache_deinit, because the cache hasn't been initialized at that point. --- src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c index 33aa492..521a78a 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c @@ -520,31 +520,29 @@ amdgpu_winsys_create(int fd, radeon_screen_create_t screen_create) /* Lookup a winsys if we have already created one for this device. */ ws = util_hash_table_get(dev_tab, dev); if (ws) { pipe_reference(NULL, >reference); pipe_mutex_unlock(dev_tab_mutex); return >base; } /* Create a new winsys. */ ws = CALLOC_STRUCT(amdgpu_winsys); - if (!ws) { - pipe_mutex_unlock(dev_tab_mutex); - return NULL; - } + if (!ws) + goto fail; ws->dev = dev; ws->info.drm_major = drm_major; ws->info.drm_minor = drm_minor; if (!do_winsys_init(ws, fd)) - goto fail; + goto fail_alloc; /* Create managers. */ pb_cache_init(>bo_cache, 50, ws->check_vm ? 1.0f : 2.0f, 0, (ws->info.vram_size + ws->info.gart_size) / 8, amdgpu_bo_destroy, amdgpu_bo_can_reclaim); /* init reference */ pipe_reference_init(>reference, 1); /* Set functions. */ @@ -580,16 +578,16 @@ amdgpu_winsys_create(int fd, radeon_screen_create_t screen_create) util_hash_table_set(dev_tab, dev, ws); /* We must unlock the mutex once the winsys is fully initialized, so that * other threads attempting to create the winsys from the same fd will * get a fully initialized winsys and not just half-way initialized. */ pipe_mutex_unlock(dev_tab_mutex); return >base; +fail_alloc: + FREE(ws); fail: pipe_mutex_unlock(dev_tab_mutex); - pb_cache_deinit(>bo_cache); - FREE(ws); return NULL; } -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/12] winsys/amdgpu: add do_winsys_deinit function
From: Nicolai HähnleThe idea is to have matching init/deinit functions so that deinit can be re-used for cleanup in the error path of amdgpu_winsys_create. --- src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c index 521a78a..3961ee3 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c @@ -361,32 +361,37 @@ static bool do_winsys_init(struct amdgpu_winsys *ws, int fd) return true; fail: if (ws->addrlib) AddrDestroy(ws->addrlib); amdgpu_device_deinitialize(ws->dev); ws->dev = NULL; return false; } +static void do_winsys_deinit(struct amdgpu_winsys *ws) +{ + AddrDestroy(ws->addrlib); + amdgpu_device_deinitialize(ws->dev); +} + static void amdgpu_winsys_destroy(struct radeon_winsys *rws) { struct amdgpu_winsys *ws = (struct amdgpu_winsys*)rws; if (util_queue_is_initialized(>cs_queue)) util_queue_destroy(>cs_queue); pipe_mutex_destroy(ws->bo_fence_lock); pb_cache_deinit(>bo_cache); pipe_mutex_destroy(ws->global_bo_list_lock); - AddrDestroy(ws->addrlib); - amdgpu_device_deinitialize(ws->dev); + do_winsys_deinit(ws); FREE(rws); } static void amdgpu_winsys_query_info(struct radeon_winsys *rws, struct radeon_info *info) { *info = ((struct amdgpu_winsys *)rws)->info; } static bool amdgpu_cs_request_feature(struct radeon_winsys_cs *rcs, -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/12] gallium/radeon: page alignment for buffers is unnecessary
From: Nicolai HähnleIn some places (e.g. shader program pointers) we require 256 bytes alignment. --- src/gallium/drivers/radeon/r600_pipe_common.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 6d7cc1b..6fe3183 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -1077,22 +1077,21 @@ static void r600_query_memory_info(struct pipe_screen *screen, /* Just return the number of evicted 64KB pages. */ info->nr_device_memory_evictions = info->device_memory_evicted / 64; } struct pipe_resource *r600_resource_create_common(struct pipe_screen *screen, const struct pipe_resource *templ) { struct r600_common_screen *rscreen = (struct r600_common_screen*)screen; if (templ->target == PIPE_BUFFER) { - return r600_buffer_create(screen, templ, - rscreen->info.gart_page_size); + return r600_buffer_create(screen, templ, 256); } else { return r600_texture_create(screen, templ); } } bool r600_common_screen_init(struct r600_common_screen *rscreen, struct radeon_winsys *ws) { char llvm_string[32] = {}, kernel_version[128] = {}; struct utsname uname_data; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/12] winsys/radeon: remove unused radeon_cs_context::priority_usage
From: Nicolai Hähnle--- src/gallium/winsys/radeon/drm/radeon_drm_cs.h | 1 - 1 file changed, 1 deletion(-) diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h index 76004c5..208452d 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h @@ -42,21 +42,20 @@ struct radeon_cs_context { struct drm_radeon_cs_chunk chunks[3]; uint64_tchunk_array[3]; uint32_tflags[2]; /* Buffers. */ unsignednrelocs; unsignedcrelocs; unsignedvalidated_crelocs; struct radeon_bo_item *relocs_bo; struct drm_radeon_cs_reloc *relocs; -uint64_t*priority_usage; int reloc_indices_hashlist[4096]; }; struct radeon_drm_cs { struct radeon_winsys_cs base; enum ring_type ring_type; /* We flip between these two CS. While one is being consumed * by the kernel in another thread, the other one is being filled -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/12] winsys/amdgpu: use only one fence per BO
From: Nicolai HähnleThe fence that is added to the BO during flush is guaranteed to be signaled after all the fences that were in the fences array of the BO before the flush, because those fences are added as dependencies for the submission (and all this happens atomically under the bo_fence_lock). Therefore, keeping only the last fence around is sufficient. --- src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 54 +++-- src/gallium/winsys/amdgpu/drm/amdgpu_bo.h | 4 +- src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 66 +++ 3 files changed, 56 insertions(+), 68 deletions(-) diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c index 32df0be..a6d4aa4 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c @@ -37,21 +37,20 @@ #include #include #include static bool amdgpu_bo_wait(struct pb_buffer *_buf, uint64_t timeout, enum radeon_bo_usage usage) { struct amdgpu_winsys_bo *bo = amdgpu_winsys_bo(_buf); struct amdgpu_winsys *ws = bo->ws; int64_t abs_timeout; - int i; if (timeout == 0) { if (p_atomic_read(>num_active_ioctls)) return false; } else { abs_timeout = os_time_get_absolute_timeout(timeout); /* Wait if any ioctl is being submitted with this buffer. */ if (!os_wait_until_zero_abs_timeout(>num_active_ioctls, abs_timeout)) @@ -68,91 +67,82 @@ static bool amdgpu_bo_wait(struct pb_buffer *_buf, uint64_t timeout, r = amdgpu_bo_wait_for_idle(bo->bo, timeout, _busy); if (r) fprintf(stderr, "%s: amdgpu_bo_wait_for_idle failed %i\n", __func__, r); return !buffer_busy; } if (timeout == 0) { pipe_mutex_lock(ws->bo_fence_lock); - for (i = 0; i < RING_LAST; i++) - if (bo->fence[i]) { -if (amdgpu_fence_wait(bo->fence[i], 0, false)) { - /* Release the idle fence to avoid checking it again later. */ - amdgpu_fence_reference(>fence[i], NULL); -} else { - pipe_mutex_unlock(ws->bo_fence_lock); - return false; -} + if (bo->fence) { + if (amdgpu_fence_wait(bo->fence, 0, false)) { +/* Release the idle fence to avoid checking it again later. */ +amdgpu_fence_reference(>fence, NULL); + } else { +pipe_mutex_unlock(ws->bo_fence_lock); +return false; } + } pipe_mutex_unlock(ws->bo_fence_lock); return true; } else { - struct pipe_fence_handle *fence[RING_LAST] = {}; - bool fence_idle[RING_LAST] = {}; + struct pipe_fence_handle *fence = NULL; + bool fence_idle = false; bool buffer_idle = true; - /* Take references to all fences, so that we can wait for them + /* Take a reference to the fences, so that we can wait for it * without the lock. */ pipe_mutex_lock(ws->bo_fence_lock); - for (i = 0; i < RING_LAST; i++) - amdgpu_fence_reference([i], bo->fence[i]); + amdgpu_fence_reference(, bo->fence); pipe_mutex_unlock(ws->bo_fence_lock); - /* Now wait for the fences. */ - for (i = 0; i < RING_LAST; i++) { - if (fence[i]) { -if (amdgpu_fence_wait(fence[i], abs_timeout, true)) - fence_idle[i] = true; -else - buffer_idle = false; - } + /* Now wait for the fence. */ + if (fence) { + if (amdgpu_fence_wait(fence, abs_timeout, true)) +fence_idle = true; + else +buffer_idle = false; } /* Release idle fences to avoid checking them again later. */ pipe_mutex_lock(ws->bo_fence_lock); - for (i = 0; i < RING_LAST; i++) { - if (fence[i] == bo->fence[i] && fence_idle[i]) -amdgpu_fence_reference(>fence[i], NULL); - - amdgpu_fence_reference([i], NULL); - } + if (fence == bo->fence && fence_idle) + amdgpu_fence_reference(>fence, NULL); + amdgpu_fence_reference(, NULL); pipe_mutex_unlock(ws->bo_fence_lock); return buffer_idle; } } static enum radeon_bo_domain amdgpu_bo_get_initial_domain( struct pb_buffer *buf) { return ((struct amdgpu_winsys_bo*)buf)->initial_domain; } void amdgpu_bo_destroy(struct pb_buffer *_buf) { struct amdgpu_winsys_bo *bo = amdgpu_winsys_bo(_buf); - int i; pipe_mutex_lock(bo->ws->global_bo_list_lock); LIST_DEL(>global_list_item); bo->ws->num_buffers--; pipe_mutex_unlock(bo->ws->global_bo_list_lock); amdgpu_bo_va_op(bo->bo, 0, bo->base.size, bo->va, 0, AMDGPU_VA_OP_UNMAP); amdgpu_va_range_free(bo->va_handle); amdgpu_bo_free(bo->bo); - for (i = 0; i < RING_LAST; i++) - amdgpu_fence_reference(>fence[i], NULL); +
[Mesa-dev] [PATCH 09/12] winsys/amdgpu: remove amdgpu_cs_lookup_buffer
From: Nicolai HähnleThe radeonsi driver doesn't and shouldn't care about the buffer index. Only the virtual addresses matter. --- src/gallium/drivers/radeon/radeon_winsys.h | 3 +++ src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 9 - 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h index 9693032..91f6e89 100644 --- a/src/gallium/drivers/radeon/radeon_winsys.h +++ b/src/gallium/drivers/radeon/radeon_winsys.h @@ -596,20 +596,23 @@ struct radeon_winsys { */ unsigned (*cs_add_buffer)(struct radeon_winsys_cs *cs, struct pb_buffer *buf, enum radeon_bo_usage usage, enum radeon_bo_domain domain, enum radeon_bo_priority priority); /** * Return the index of an already-added buffer. * + * Not supported on amdgpu. Drivers with GPUVM should not care about + * buffer indices. + * * \param csCommand stream * \param buf Buffer * \return The buffer index, or -1 if the buffer has not been added. */ int (*cs_lookup_buffer)(struct radeon_winsys_cs *cs, struct pb_buffer *buf); /** * Return true if there is enough memory in VRAM and GTT for the buffers * added so far. If the validation fails, all buffers which have diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c index df9859b..e2229ff 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c @@ -692,28 +692,20 @@ amdgpu_cs_add_const_preamble_ib(struct radeon_winsys_cs *rcs) cs->csc->request.ibs = >csc->ib[IB_CONST_PREAMBLE]; cs->cst->request.number_of_ibs = 3; cs->cst->request.ibs = >cst->ib[IB_CONST_PREAMBLE]; return >const_preamble_ib.base; } #define OUT_CS(cs, value) (cs)->current.buf[(cs)->current.cdw++] = (value) -static int amdgpu_cs_lookup_buffer(struct radeon_winsys_cs *rcs, - struct pb_buffer *buf) -{ - struct amdgpu_cs *cs = amdgpu_cs(rcs); - - return amdgpu_lookup_buffer(cs->csc, (struct amdgpu_winsys_bo*)buf); -} - static bool amdgpu_cs_validate(struct radeon_winsys_cs *rcs) { return true; } static bool amdgpu_cs_check_space(struct radeon_winsys_cs *rcs, unsigned dw) { struct amdgpu_ib *ib = amdgpu_ib(rcs); struct amdgpu_cs *cs = amdgpu_cs_from_ib(ib); unsigned requested_size = rcs->prev_dw + rcs->current.cdw + dw; @@ -,21 +1103,20 @@ static bool amdgpu_bo_is_referenced(struct radeon_winsys_cs *rcs, void amdgpu_cs_init_functions(struct amdgpu_winsys *ws) { ws->base.ctx_create = amdgpu_ctx_create; ws->base.ctx_destroy = amdgpu_ctx_destroy; ws->base.ctx_query_reset_status = amdgpu_ctx_query_reset_status; ws->base.cs_create = amdgpu_cs_create; ws->base.cs_add_const_ib = amdgpu_cs_add_const_ib; ws->base.cs_add_const_preamble_ib = amdgpu_cs_add_const_preamble_ib; ws->base.cs_destroy = amdgpu_cs_destroy; ws->base.cs_add_buffer = amdgpu_cs_add_buffer; - ws->base.cs_lookup_buffer = amdgpu_cs_lookup_buffer; ws->base.cs_validate = amdgpu_cs_validate; ws->base.cs_check_space = amdgpu_cs_check_space; ws->base.cs_get_buffer_list = amdgpu_cs_get_buffer_list; ws->base.cs_flush = amdgpu_cs_flush; ws->base.cs_get_next_fence = amdgpu_cs_get_next_fence; ws->base.cs_is_buffer_referenced = amdgpu_bo_is_referenced; ws->base.cs_sync_flush = amdgpu_cs_sync_flush; ws->base.fence_wait = amdgpu_fence_wait_rel_timeout; ws->base.fence_reference = amdgpu_fence_reference; } -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/12] winsys/amdgpu: extract adding a new buffer list entry into its own function
From: Nicolai HähnleWhile at it, try to be a little more robust in the face of memory allocation failure. --- src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 113 ++ 1 file changed, 70 insertions(+), 43 deletions(-) diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c index 7cbf19b..638c2d5 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c @@ -284,77 +284,104 @@ int amdgpu_lookup_buffer(struct amdgpu_cs_context *cs, struct amdgpu_winsys_bo * * AAABB * will collide here: ^ and here: ^, * meaning that we should get very few collisions in the end. */ cs->buffer_indices_hashlist[hash] = i; return i; } } return -1; } -static unsigned amdgpu_add_buffer(struct amdgpu_cs *acs, - struct amdgpu_winsys_bo *bo, - enum radeon_bo_usage usage, - enum radeon_bo_domain domains, - unsigned priority, - enum radeon_bo_domain *added_domains) +static int +amdgpu_lookup_or_add_buffer(struct amdgpu_cs_context *cs, struct amdgpu_winsys_bo *bo) { - struct amdgpu_cs_context *cs = acs->csc; struct amdgpu_cs_buffer *buffer; - unsigned hash = bo->unique_id & (ARRAY_SIZE(cs->buffer_indices_hashlist)-1); - int i = -1; + unsigned hash; + int idx = amdgpu_lookup_buffer(cs, bo); - assert(priority < 64); - *added_domains = 0; - - i = amdgpu_lookup_buffer(cs, bo); - - if (i >= 0) { - buffer = >buffers[i]; - buffer->priority_usage |= 1llu << priority; - buffer->usage |= usage; - *added_domains = domains & ~buffer->domains; - buffer->domains |= domains; - cs->flags[i] = MAX2(cs->flags[i], priority / 4); - return i; - } + if (idx >= 0) + return idx; /* New buffer, check if the backing array is large enough. */ if (cs->num_buffers >= cs->max_num_buffers) { - uint32_t size; - cs->max_num_buffers += 10; + unsigned new_max = + MAX2(cs->max_num_buffers + 16, (unsigned)(cs->max_num_buffers * 1.3)); + struct amdgpu_cs_buffer *new_buffers; + amdgpu_bo_handle *new_handles; + uint8_t *new_flags; + + new_buffers = MALLOC(new_max * sizeof(*new_buffers)); + new_handles = MALLOC(new_max * sizeof(*new_handles)); + new_flags = MALLOC(new_max * sizeof(*new_flags)); + + if (!new_buffers || !new_handles || !new_flags) { + fprintf(stderr, "amdgpu_lookup_or_add_buffer: allocation failed\n"); + FREE(new_buffers); + FREE(new_handles); + FREE(new_flags); + return -1; + } - size = cs->max_num_buffers * sizeof(struct amdgpu_cs_buffer); - cs->buffers = realloc(cs->buffers, size); + memcpy(new_buffers, cs->buffers, cs->num_buffers * sizeof(*new_buffers)); + memcpy(new_handles, cs->handles, cs->num_buffers * sizeof(*new_handles)); + memcpy(new_flags, cs->flags, cs->num_buffers * sizeof(*new_flags)); - size = cs->max_num_buffers * sizeof(amdgpu_bo_handle); - cs->handles = realloc(cs->handles, size); + FREE(cs->buffers); + FREE(cs->handles); + FREE(cs->flags); - cs->flags = realloc(cs->flags, cs->max_num_buffers); + cs->max_num_buffers = new_max; + cs->buffers = new_buffers; + cs->handles = new_handles; + cs->flags = new_flags; } - /* Initialize the new buffer. */ - cs->buffers[cs->num_buffers].bo = NULL; - amdgpu_winsys_bo_reference(>buffers[cs->num_buffers].bo, bo); - cs->handles[cs->num_buffers] = bo->bo; - cs->flags[cs->num_buffers] = priority / 4; + idx = cs->num_buffers; + buffer = >buffers[idx]; + memset(buffer, 0, sizeof(*buffer)); + amdgpu_winsys_bo_reference(>bo, bo); + cs->handles[idx] = bo->bo; + cs->flags[idx] = 0; p_atomic_inc(>num_cs_references); - buffer = >buffers[cs->num_buffers]; - buffer->bo = bo; - buffer->priority_usage = 1llu << priority; - buffer->usage = usage; - buffer->domains = domains; + cs->num_buffers++; - cs->buffer_indices_hashlist[hash] = cs->num_buffers; + hash = bo->unique_id & (ARRAY_SIZE(cs->buffer_indices_hashlist)-1); + cs->buffer_indices_hashlist[hash] = idx; + + return idx; +} + +static unsigned amdgpu_add_buffer(struct amdgpu_cs *acs, + struct amdgpu_winsys_bo *bo, + enum radeon_bo_usage usage, + enum radeon_bo_domain domains, + unsigned priority, + enum radeon_bo_domain *added_domains) +{ + struct amdgpu_cs_context *cs = acs->csc; + struct amdgpu_cs_buffer *buffer; + int i = amdgpu_lookup_or_add_buffer(cs, bo); + + assert(priority < 64); +
[Mesa-dev] [PATCH 08/12] winsys/amdgpu: remove unused field domains from amdgpu_cs_buffer
From: Nicolai Hähnle--- src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 53 ++- src/gallium/winsys/amdgpu/drm/amdgpu_cs.h | 1 - 2 files changed, 17 insertions(+), 37 deletions(-) diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c index 10a4416..df9859b 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c @@ -285,22 +285,23 @@ int amdgpu_lookup_buffer(struct amdgpu_cs_context *cs, struct amdgpu_winsys_bo * * will collide here: ^ and here: ^, * meaning that we should get very few collisions in the end. */ cs->buffer_indices_hashlist[hash] = i; return i; } } return -1; } static int -amdgpu_lookup_or_add_buffer(struct amdgpu_cs_context *cs, struct amdgpu_winsys_bo *bo) +amdgpu_lookup_or_add_buffer(struct amdgpu_cs *acs, struct amdgpu_winsys_bo *bo) { + struct amdgpu_cs_context *cs = acs->csc; struct amdgpu_cs_buffer *buffer; unsigned hash; int idx = amdgpu_lookup_buffer(cs, bo); if (idx >= 0) return idx; /* New buffer, check if the backing array is large enough. */ if (cs->num_buffers >= cs->max_num_buffers) { unsigned new_max = @@ -340,70 +341,50 @@ amdgpu_lookup_or_add_buffer(struct amdgpu_cs_context *cs, struct amdgpu_winsys_b memset(buffer, 0, sizeof(*buffer)); amdgpu_winsys_bo_reference(>bo, bo); cs->handles[idx] = bo->bo; cs->flags[idx] = 0; p_atomic_inc(>num_cs_references); cs->num_buffers++; hash = bo->unique_id & (ARRAY_SIZE(cs->buffer_indices_hashlist)-1); cs->buffer_indices_hashlist[hash] = idx; - return idx; -} - -static unsigned amdgpu_add_buffer(struct amdgpu_cs *acs, - struct amdgpu_winsys_bo *bo, - enum radeon_bo_usage usage, - enum radeon_bo_domain domains, - unsigned priority, - enum radeon_bo_domain *added_domains) -{ - struct amdgpu_cs_context *cs = acs->csc; - struct amdgpu_cs_buffer *buffer; - int i = amdgpu_lookup_or_add_buffer(cs, bo); - - assert(priority < 64); - - if (i < 0) { - *added_domains = 0; - return ~0; - } + if (bo->initial_domain & RADEON_DOMAIN_VRAM) + acs->main.base.used_vram += bo->base.size; + else if (bo->initial_domain & RADEON_DOMAIN_GTT) + acs->main.base.used_gart += bo->base.size; - buffer = >buffers[i]; - buffer->priority_usage |= 1llu << priority; - buffer->usage |= usage; - *added_domains = domains & ~buffer->domains; - buffer->domains |= domains; - cs->flags[i] = MAX2(cs->flags[i], priority / 4); - return i; + return idx; } static unsigned amdgpu_cs_add_buffer(struct radeon_winsys_cs *rcs, struct pb_buffer *buf, enum radeon_bo_usage usage, enum radeon_bo_domain domains, enum radeon_bo_priority priority) { /* Don't use the "domains" parameter. Amdgpu doesn't support changing * the buffer placement during command submission. */ - struct amdgpu_cs *cs = amdgpu_cs(rcs); + struct amdgpu_cs *acs = amdgpu_cs(rcs); + struct amdgpu_cs_context *cs = acs->csc; struct amdgpu_winsys_bo *bo = (struct amdgpu_winsys_bo*)buf; - enum radeon_bo_domain added_domains; - unsigned index = amdgpu_add_buffer(cs, bo, usage, bo->initial_domain, - priority, _domains); + struct amdgpu_cs_buffer *buffer; + int index = amdgpu_lookup_or_add_buffer(acs, bo); - if (added_domains & RADEON_DOMAIN_VRAM) - cs->main.base.used_vram += bo->base.size; - else if (added_domains & RADEON_DOMAIN_GTT) - cs->main.base.used_gart += bo->base.size; + if (index < 0) + return 0; + buffer = >buffers[index]; + buffer->priority_usage |= 1llu << priority; + buffer->usage |= usage; + cs->flags[index] = MAX2(cs->flags[index], priority / 4); return index; } static bool amdgpu_ib_new_buffer(struct amdgpu_winsys *ws, struct amdgpu_ib *ib) { struct pb_buffer *pb; uint8_t *mapped; unsigned buffer_size; /* Always create a buffer that is at least as large as the maximum seen IB diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h index 7455061..51753db 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h @@ -40,21 +40,20 @@ struct amdgpu_ctx { amdgpu_context_handle ctx; amdgpu_bo_handle user_fence_bo; uint64_t *user_fence_cpu_address_base; int refcount; }; struct amdgpu_cs_buffer { struct amdgpu_winsys_bo *bo; uint64_t priority_usage; enum radeon_bo_usage usage; - enum radeon_bo_domain domains; }; enum
[Mesa-dev] [PATCH 01/12] gallium/radeon/winsyses: remove #includes of pb_bufmgr.h
From: Nicolai Hähnle--- src/gallium/winsys/amdgpu/drm/amdgpu_bo.h | 1 - src/gallium/winsys/radeon/drm/radeon_drm_bo.h | 1 - src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 1 - 3 files changed, 3 deletions(-) diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h index e2ee049..70d9854 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h @@ -27,21 +27,20 @@ */ /* * Authors: * Marek Olšák */ #ifndef AMDGPU_BO_H #define AMDGPU_BO_H #include "amdgpu_winsys.h" -#include "pipebuffer/pb_bufmgr.h" struct amdgpu_winsys_bo { struct pb_buffer base; struct pb_cache_entry cache_entry; struct amdgpu_winsys *ws; void *user_ptr; /* from buffer_from_ptr */ amdgpu_bo_handle bo; int map_count; diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.h b/src/gallium/winsys/radeon/drm/radeon_drm_bo.h index f7f4ce3..a9f31c0 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.h +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.h @@ -26,21 +26,20 @@ */ /* * Authors: * Jérôme Glisse * Marek Olšák */ #ifndef RADEON_DRM_BO_H #define RADEON_DRM_BO_H #include "radeon_drm_winsys.h" -#include "pipebuffer/pb_bufmgr.h" #include "os/os_thread.h" struct radeon_bo { struct pb_buffer base; struct pb_cache_entry cache_entry; struct radeon_drm_winsys *rws; void *user_ptr; /* from buffer_from_ptr */ void *ptr; diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c index d73b7f4..aa4bf5f 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c @@ -28,21 +28,20 @@ * Authors: * Corbin Simpson * Joakim Sindholt * Marek Olšák */ #include "radeon_drm_bo.h" #include "radeon_drm_cs.h" #include "radeon_drm_public.h" -#include "pipebuffer/pb_bufmgr.h" #include "util/u_memory.h" #include "util/u_hash_table.h" #include #include #include #include #include #include -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/12] winsys/amdgpu: remove initial buffer list allocation
From: Nicolai HähnleIt's really not necessary. --- src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 20 1 file changed, 20 deletions(-) diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c index 638c2d5..10a4416 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c @@ -568,40 +568,20 @@ static bool amdgpu_init_cs_context(struct amdgpu_cs_context *cs, case RING_COMPUTE: cs->request.ip_type = AMDGPU_HW_IP_COMPUTE; break; default: case RING_GFX: cs->request.ip_type = AMDGPU_HW_IP_GFX; break; } - cs->max_num_buffers = 512; - cs->buffers = (struct amdgpu_cs_buffer*) - CALLOC(1, cs->max_num_buffers * sizeof(struct amdgpu_cs_buffer)); - if (!cs->buffers) { - return false; - } - - cs->handles = CALLOC(1, cs->max_num_buffers * sizeof(amdgpu_bo_handle)); - if (!cs->handles) { - FREE(cs->buffers); - return false; - } - - cs->flags = CALLOC(1, cs->max_num_buffers); - if (!cs->flags) { - FREE(cs->handles); - FREE(cs->buffers); - return false; - } - for (i = 0; i < ARRAY_SIZE(cs->buffer_indices_hashlist); i++) { cs->buffer_indices_hashlist[i] = -1; } cs->request.number_of_ibs = 1; cs->request.ibs = >ib[IB_MAIN]; cs->ib[IB_CONST].flags = AMDGPU_IB_FLAG_CE; cs->ib[IB_CONST_PREAMBLE].flags = AMDGPU_IB_FLAG_CE | AMDGPU_IB_FLAG_PREAMBLE; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/12] radeon/winsyses: various cleanups
Hi, this is a bunch of random cleanups in and related to radeon and amdgpu winsyses. There is one significant behavioral change, which is that amdgpu now only keeps a single fence around for each buffer object. See the explanation at that commit. Please review! Thanks, Nicolai -- .../drivers/radeon/r600_pipe_common.c| 3 +- src/gallium/drivers/radeon/radeon_winsys.h | 3 + src/gallium/winsys/amdgpu/drm/amdgpu_bo.c| 54 ++--- src/gallium/winsys/amdgpu/drm/amdgpu_bo.h| 5 +- src/gallium/winsys/amdgpu/drm/amdgpu_cs.c| 203 - src/gallium/winsys/amdgpu/drm/amdgpu_cs.h| 1 - .../winsys/amdgpu/drm/amdgpu_winsys.c| 21 +- .../winsys/radeon/drm/radeon_drm_bo.h| 1 - .../winsys/radeon/drm/radeon_drm_cs.c| 61 ++--- .../winsys/radeon/drm/radeon_drm_cs.h| 7 +- .../winsys/radeon/drm/radeon_drm_winsys.c| 1 - ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/winsys/kms: Close drm device filedescriptor on kms_dri_sw_winsys release
Hi, Can I do something to have it merged to the source tree? Fix was already tested, works fine :) regards Łukasz Spintzyk 2016-09-05 18:48 GMT+02:00 Lukasz Spintzyk: > This closes filedescriptor owned by kms_dri_sw_winsys struct. It fixes > issue > where removal of udl or evdi module used by DisplayLink devices was > impossible > due to not closed filedescriptors. > > When this file descriptor was not closed then command > rmmod udl was returning error "Module udl is in use". > By this fix xserver does not prevent module removal when usb device > is unplugged. > > Signed-off-by: Lukasz Spintzyk > --- > src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c > b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c > index 07eca99..f06ccef 100644 > --- a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c > +++ b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c > @@ -371,6 +371,9 @@ kms_sw_displaytarget_display(struct sw_winsys *ws, > static void > kms_destroy_sw_winsys(struct sw_winsys *winsys) > { > + struct kms_sw_winsys *kms_sw = kms_sw_winsys(winsys); > + > + close(kms_sw->fd); > FREE(winsys); > } > > -- > 2.7.4 > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/4] i965: Use blorp_copy to implement glCopyImageSubData
Now that we have the shiny new blorp_copy entrypoint that was added to implement VkCopy* in the Vulkan driver, it's almost trivial to hook the GL driver up to use it for glCopyImageSubData. What's nice about this new entrypoint is that it can handle everything that glCopyImageSubData can throw at it instead of being restricted to just uncompressed destinations. Also, the old blorp and meta-based approaches are broken for compressed textures with multiple miplevels and fixing them isn't really practical. Jason Ekstrand (4): intel/isl: Allow valign2 for texture-only Y-tiled surfaces on gen7 intel/isl: Add support for RGB formats in X and Y-tiled memory i965/blorp: Add a copy_miptrees helper i965: Use blorp_copy for all copy_image operations on gen6+ src/intel/isl/isl.c | 49 ++- src/intel/isl/isl.h | 10 +++- src/intel/isl/isl_gen7.c | 3 +- src/mesa/drivers/dri/i965/brw_blorp.c| 71 src/mesa/drivers/dri/i965/brw_blorp.h| 10 src/mesa/drivers/dri/i965/intel_copy_image.c | 28 +++ 6 files changed, 134 insertions(+), 37 deletions(-) -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] i965/blorp: Add a copy_miptrees helper
--- src/mesa/drivers/dri/i965/brw_blorp.c | 71 +++ src/mesa/drivers/dri/i965/brw_blorp.h | 10 + 2 files changed, 81 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c index 193c7a4..626c33c 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.c +++ b/src/mesa/drivers/dri/i965/brw_blorp.c @@ -288,6 +288,23 @@ physical_to_logical_layer(struct intel_mipmap_tree *mt, } } +static void +miptree_check_level_logical_layer(struct intel_mipmap_tree *mt, + unsigned level, + unsigned logical_layer) +{ + if (mt->msaa_layout == INTEL_MSAA_LAYOUT_UMS || + mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS) { + const unsigned num_samples = MAX2(1, mt->num_samples); + for (unsigned s = 0; s < num_samples; s++) { + const unsigned physical_layer = (logical_layer * num_samples) + s; + intel_miptree_check_level_layer(mt, level, physical_layer); + } + } else { + intel_miptree_check_level_layer(mt, level, logical_layer); + } +} + /** * Note: if the src (or dst) is a 2D multisample array texture on Gen7+ using * INTEL_MSAA_LAYOUT_UMS or INTEL_MSAA_LAYOUT_CMS, src_layer (dst_layer) is @@ -392,6 +409,60 @@ brw_blorp_blit_miptrees(struct brw_context *brw, dst_mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_UNRESOLVED; } +void +brw_blorp_copy_miptrees(struct brw_context *brw, +struct intel_mipmap_tree *src_mt, +unsigned src_level, unsigned src_layer, +struct intel_mipmap_tree *dst_mt, +unsigned dst_level, unsigned dst_layer, +unsigned src_x, unsigned src_y, +unsigned dst_x, unsigned dst_y, +unsigned src_width, unsigned src_height) +{ + /* Get ready to blit. This includes depth resolving the src and dst +* buffers if necessary. Note: it's not necessary to do a color resolve on +* the destination buffer because we use the standard render path to render +* to destination color buffers, and the standard render path is +* fast-color-aware. +*/ + intel_miptree_resolve_color(brw, src_mt, INTEL_MIPTREE_IGNORE_CCS_E); + intel_miptree_slice_resolve_depth(brw, src_mt, src_level, src_layer); + intel_miptree_slice_resolve_depth(brw, dst_mt, dst_level, dst_layer); + + intel_miptree_prepare_mcs(brw, dst_mt); + + DBG("%s from %dx %s mt %p %d %d (%d,%d) %dx%d" + "to %dx %s mt %p %d %d (%d,%d)\n", + __func__, + src_mt->num_samples, _mesa_get_format_name(src_mt->format), src_mt, + src_level, src_layer, src_x, src_y, src_width, src_height, + dst_mt->num_samples, _mesa_get_format_name(dst_mt->format), dst_mt, + dst_level, dst_layer, dst_x, dst_y); + + miptree_check_level_logical_layer(src_mt, src_level, src_layer); + miptree_check_level_logical_layer(dst_mt, dst_level, dst_layer); + intel_miptree_used_for_rendering(dst_mt); + + struct isl_surf tmp_surfs[4]; + struct blorp_surf src_surf, dst_surf; + blorp_surf_for_miptree(brw, _surf, src_mt, false, + _level, _surfs[0]); + blorp_surf_for_miptree(brw, _surf, dst_mt, true, + _level, _surfs[2]); + + struct blorp_batch batch; + blorp_batch_init(>blorp, , brw); + blorp_copy(, _surf, src_level, src_layer, + _surf, dst_level, dst_layer, + src_x, src_y, dst_x, dst_y, src_width, src_height); + blorp_batch_finish(); + + intel_miptree_slice_set_needs_hiz_resolve(dst_mt, dst_level, dst_layer); + + if (intel_miptree_is_lossless_compressed(brw, dst_mt)) + dst_mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_UNRESOLVED; +} + static struct intel_mipmap_tree * find_miptree(GLbitfield buffer_bit, struct intel_renderbuffer *irb) { diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h b/src/mesa/drivers/dri/i965/brw_blorp.h index afbf68f..abf3956 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.h +++ b/src/mesa/drivers/dri/i965/brw_blorp.h @@ -48,6 +48,16 @@ brw_blorp_blit_miptrees(struct brw_context *brw, GLenum filter, bool mirror_x, bool mirror_y, bool decode_srgb, bool encode_srgb); +void +brw_blorp_copy_miptrees(struct brw_context *brw, +struct intel_mipmap_tree *src_mt, +unsigned src_level, unsigned src_logical_layer, +struct intel_mipmap_tree *dst_mt, +unsigned dst_level, unsigned dst_logical_layer, +unsigned src_x, unsigned src_y, +unsigned dst_x, unsigned dst_y, +unsigned src_width, unsigned src_height); + bool brw_blorp_clear_color(struct brw_context *brw, struct gl_framebuffer *fb, GLbitfield mask, bool partial_clear, bool
[Mesa-dev] [PATCH 2/4] intel/isl: Add support for RGB formats in X and Y-tiled memory
Normally, using a non-linear tiling format helps improve cache locality by ensuring that neighboring pixels are usually close-by in memory. For RGB formats, this still sort-of holds, but it can also lead to rather terrible memory access patterns where a single RGB pixel value crosses a tile boundary and gets split into two pieces in different 4K pages. It also makes for some rather awkward calculations because your tile size is no longer an even multiple of surface element size. For these reasons, we chose to simply never create tiled RGB images in the Vulkan driver. The GL driver, however, is not so kind so we need to support it somehow. I briefly toyed with a couple of different schemes but this is the best one I could come up with. The fundamental problem is that a tile no longer contains an integer number of surface elements. I briefly considered a couple other options but found them wanting: 1) Using floats for the logical tile size. This leads to potential rounding error problems. 2) When presented with a RGB format, just make the tile 3-times as wide. This isn't so nice because now our tiles are no longer power-of-two size. Also, it can force the row_pitch to be larger than needed which, while not strictly a problem for ISL, causes incompatibility problems with the way the GL driver chooses surface pitches. The chosen method requires that you pay attention and not just assume that your tile_info is in the units you think it is. However, it's nice because it provides a nice "these are the units" declaration in isl_tile_info itself. Previously, the tile_info wasn't usable as a stand-alone structure because you had to also know the format. It also forces figuring out how to deal with inconsistencies between tiling and format back to the caller which is good because the two different consumers of isl_tile_info really want to deal with it differently: Computation of the surface size wants the fewest number of horizontal tiles possible while get_intratile_offset is far more concerned with things aligning nicely. Signed-off-by: Jason EkstrandCc: Chad Versace --- src/intel/isl/isl.c | 49 - src/intel/isl/isl.h | 10 +- 2 files changed, 45 insertions(+), 14 deletions(-) diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c index 34245f4..8c72186 100644 --- a/src/intel/isl/isl.c +++ b/src/intel/isl/isl.c @@ -113,7 +113,16 @@ isl_tiling_get_info(const struct isl_device *dev, const uint32_t bs = format_bpb / 8; struct isl_extent2d logical_el, phys_B; - assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(format_bpb)); + if (tiling != ISL_TILING_LINEAR && ! isl_is_pow2(format_bpb)) { + /* It is possible to have non-power-of-two formats in a tiled buffer. + * The easiest way to handle this is to treat the tile as if it is three + * times as wide. This way no pixel will ever cross a tile boundary. + * This really only works on legacy X and Y tiling formats. + */ + assert(tiling == ISL_TILING_X || tiling == ISL_TILING_Y0); + assert(bs % 3 == 0 && isl_is_pow2(format_bpb / 3)); + return isl_tiling_get_info(dev, tiling, format_bpb / 3, tile_info); + } switch (tiling) { case ISL_TILING_LINEAR: @@ -209,6 +218,7 @@ isl_tiling_get_info(const struct isl_device *dev, *tile_info = (struct isl_tile_info) { .tiling = tiling, + .format_bpb = format_bpb, .logical_extent_el = logical_el, .phys_extent_B = phys_B, }; @@ -1215,14 +1225,20 @@ isl_surf_init_s(const struct isl_device *dev, } base_alignment = isl_round_up_to_power_of_two(base_alignment); } else { + assert(fmtl->bpb % tile_info.format_bpb == 0); + const uint32_t tile_el_scale = fmtl->bpb / tile_info.format_bpb; + assert(tile_el_scale == 1 || tile_el_scale == 3); + assert(phys_slice0_sa.w % fmtl->bw == 0); const uint32_t total_w_el = phys_slice0_sa.width / fmtl->bw; const uint32_t total_w_tl = - isl_align_div(total_w_el, tile_info.logical_extent_el.width); + isl_align_div(total_w_el * tile_el_scale, + tile_info.logical_extent_el.width); row_pitch = total_w_tl * tile_info.phys_extent_B.width; if (row_pitch < info->min_pitch) { - row_pitch = isl_align(info->min_pitch, tile_info.phys_extent_B.width); + row_pitch = isl_align_npot(info->min_pitch, +tile_info.phys_extent_B.width); } total_h_el += isl_align_div_npot(pad_bytes, row_pitch); @@ -1678,14 +1694,6 @@ isl_tiling_get_intratile_offset_el(const struct isl_device *dev, uint32_t *x_offset_el, uint32_t *y_offset_el) { - /* This function only really works for power-of-two surfaces. In -* theory, we could make it work for
[Mesa-dev] [PATCH 4/4] i965: Use blorp_copy for all copy_image operations on gen6+
--- src/mesa/drivers/dri/i965/intel_copy_image.c | 28 ++-- 1 file changed, 6 insertions(+), 22 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_copy_image.c b/src/mesa/drivers/dri/i965/intel_copy_image.c index 1ca6003..7698d8e 100644 --- a/src/mesa/drivers/dri/i965/intel_copy_image.c +++ b/src/mesa/drivers/dri/i965/intel_copy_image.c @@ -210,30 +210,14 @@ copy_miptrees(struct brw_context *brw, int dst_x, int dst_y, int dst_z, unsigned dst_level, int src_width, int src_height) { - struct gl_context *ctx = >ctx; unsigned bw, bh; - if (brw->gen >= 6 && - brw->format_supported_as_render_target[dst_mt->format] && - !_mesa_is_format_compressed(src_mt->format)) { - - /* We'll use the destination format for both images */ - mesa_format format = dst_mt->format; - - brw_blorp_blit_miptrees(brw, - src_mt, src_level, src_z, format, SWIZZLE_XYZW, - dst_mt, dst_level, dst_z, format, - src_x, src_y, - src_x + src_width, src_y + src_height, - dst_x, dst_y, - dst_x + src_width, dst_y + src_height, - GL_NEAREST, false, false, /* mirror */ - false, false); - return; - } - - if (src_mt->num_samples > 0 || dst_mt->num_samples > 0) { - _mesa_problem(ctx, "Failed to copy multisampled texture with BLORP\n"); + if (brw->gen >= 6) { + brw_blorp_copy_miptrees(brw, + src_mt, src_level, src_z, + dst_mt, dst_level, dst_z, + src_x, src_y, dst_x, dst_y, + src_width, src_height); return; } -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] intel/isl: Allow valign2 for texture-only Y-tiled surfaces on gen7
The restriction that Y-tiled surfaces must have valign == 4 only aplies to render targets but we were applying it universally. This causes problems if R32G32B32_SFLOAT is used because it requires valign == 2; this should be okay because you can't render to that format. Signed-off-by: Jason EkstrandCc: Chad Versace --- src/intel/isl/isl_gen7.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/intel/isl/isl_gen7.c b/src/intel/isl/isl_gen7.c index 02273f8..f3d8428 100644 --- a/src/intel/isl/isl_gen7.c +++ b/src/intel/isl/isl_gen7.c @@ -354,7 +354,8 @@ gen7_choose_valign_el(const struct isl_device *dev, */ if (isl_surf_usage_is_depth(info->usage) || info->samples > 1 || - tiling == ISL_TILING_Y0) { + ((info->usage & ISL_SURF_USAGE_RENDER_TARGET_BIT) && +tiling == ISL_TILING_Y0)) { require_valign4 = true; } -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/5] radeonsi: flush TC L2 before using a compute indirect buffer
This series is Reviewed-by: Nicolai HähnleOn 09.09.2016 17:05, Marek Olšák wrote: From: Marek Olšák There is no known test for this. Cc: 12.0 --- src/gallium/drivers/radeonsi/si_compute.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index f43c616..d988214 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -457,24 +457,32 @@ static void si_launch_grid( info->block[0] * info->block[1] * info->block[2] > 256; if (cs_regalloc_hang) sctx->b.flags |= SI_CONTEXT_PS_PARTIAL_FLUSH | SI_CONTEXT_CS_PARTIAL_FLUSH; si_decompress_compute_textures(sctx); /* Add buffer sizes for memory checking in need_cs_space. */ r600_context_add_resource_size(ctx, >shader.bo->b.b); - if (info->indirect) - r600_context_add_resource_size(ctx, info->indirect); /* TODO: add the scratch buffer */ + if (info->indirect) { + r600_context_add_resource_size(ctx, info->indirect); + + /* The hw doesn't read the indirect buffer via TC L2. */ + if (r600_resource(info->indirect)->TC_L2_dirty) { + sctx->b.flags |= SI_CONTEXT_INV_GLOBAL_L2; + r600_resource(info->indirect)->TC_L2_dirty = false; + } + } + si_need_cs_space(sctx); if (!sctx->cs_shader_state.initialized) si_initialize_compute(sctx); if (sctx->b.flags) si_emit_cache_flush(sctx); if (!si_switch_compute_shader(sctx, program, >shader, info->pc)) return; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/11] glapi: add entry points for GL_ARB_compute_variable_group_size
On Fri, Sep 9, 2016 at 5:46 PM, Samuel Pitoisetwrote: > > > On 09/08/2016 10:58 PM, Ian Romanick wrote: >> >> On 09/08/2016 01:31 PM, Samuel Pitoiset wrote: >>> >>> Signed-off-by: Samuel Pitoiset >>> --- >>> .../glapi/gen/ARB_compute_variable_group_size.xml | 25 >>> ++ >>> src/mapi/glapi/gen/Makefile.am | 1 + >>> src/mapi/glapi/gen/gl_API.xml | 2 ++ >>> src/mesa/main/compute.c| 8 +++ >>> src/mesa/main/compute.h| 5 + >>> src/mesa/main/tests/dispatch_sanity.cpp| 3 +++ >>> 6 files changed, 44 insertions(+) >>> create mode 100644 >>> src/mapi/glapi/gen/ARB_compute_variable_group_size.xml >>> >>> diff --git a/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml >>> b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml >>> new file mode 100644 >>> index 000..b21c52f >>> --- /dev/null >>> +++ b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml >>> @@ -0,0 +1,25 @@ >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >> value="0x9344"/> >>> + >> value="0x90EB"/> >>> + >> value="0x9345"/> >>> + >> value="0x91BF"/> >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> diff --git a/src/mapi/glapi/gen/Makefile.am >>> b/src/mapi/glapi/gen/Makefile.am >>> index 0d7c338..49fdfe3 100644 >>> --- a/src/mapi/glapi/gen/Makefile.am >>> +++ b/src/mapi/glapi/gen/Makefile.am >>> @@ -117,6 +117,7 @@ API_XML = \ >>> ARB_color_buffer_float.xml \ >>> ARB_compressed_texture_pixel_storage.xml \ >>> ARB_compute_shader.xml \ >>> + ARB_compute_variable_group_size.xml \ >>> ARB_copy_buffer.xml \ >>> ARB_copy_image.xml \ >>> ARB_debug_output.xml \ >>> diff --git a/src/mapi/glapi/gen/gl_API.xml >>> b/src/mapi/glapi/gen/gl_API.xml >>> index c39aa22..9ad3b60 100644 >>> --- a/src/mapi/glapi/gen/gl_API.xml >>> +++ b/src/mapi/glapi/gen/gl_API.xml >>> @@ -8258,6 +8258,8 @@ >>> >>> >> xmlns:xi="http://www.w3.org/2001/XInclude"/> >>> >>> +>> xmlns:xi="http://www.w3.org/2001/XInclude"/> >>> + >> >> >> This (extension #153) should go before ARB_indirect_parameters >> (extension #154), and the "ARB extensions 149 - 153" should be changed >> to "ARB extensions 149 - 152". > > > Presumably, you meant "ARB extensions 149 - 154"? Because the next section > is "ARB extension 155 - 159". The range indicates which extensions are NOT defined by the file. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/11] glapi: add entry points for GL_ARB_compute_variable_group_size
On 09/09/2016 06:31 PM, Marek Olšák wrote: On Fri, Sep 9, 2016 at 5:46 PM, Samuel Pitoisetwrote: On 09/08/2016 10:58 PM, Ian Romanick wrote: On 09/08/2016 01:31 PM, Samuel Pitoiset wrote: Signed-off-by: Samuel Pitoiset --- .../glapi/gen/ARB_compute_variable_group_size.xml | 25 ++ src/mapi/glapi/gen/Makefile.am | 1 + src/mapi/glapi/gen/gl_API.xml | 2 ++ src/mesa/main/compute.c| 8 +++ src/mesa/main/compute.h| 5 + src/mesa/main/tests/dispatch_sanity.cpp| 3 +++ 6 files changed, 44 insertions(+) create mode 100644 src/mapi/glapi/gen/ARB_compute_variable_group_size.xml diff --git a/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml new file mode 100644 index 000..b21c52f --- /dev/null +++ b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml @@ -0,0 +1,25 @@ + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am index 0d7c338..49fdfe3 100644 --- a/src/mapi/glapi/gen/Makefile.am +++ b/src/mapi/glapi/gen/Makefile.am @@ -117,6 +117,7 @@ API_XML = \ ARB_color_buffer_float.xml \ ARB_compressed_texture_pixel_storage.xml \ ARB_compute_shader.xml \ + ARB_compute_variable_group_size.xml \ ARB_copy_buffer.xml \ ARB_copy_image.xml \ ARB_debug_output.xml \ diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml index c39aa22..9ad3b60 100644 --- a/src/mapi/glapi/gen/gl_API.xml +++ b/src/mapi/glapi/gen/gl_API.xml @@ -8258,6 +8258,8 @@ http://www.w3.org/2001/XInclude"/> +http://www.w3.org/2001/XInclude"/> + This (extension #153) should go before ARB_indirect_parameters (extension #154), and the "ARB extensions 149 - 153" should be changed to "ARB extensions 149 - 152". Presumably, you meant "ARB extensions 149 - 154"? Because the next section is "ARB extension 155 - 159". The range indicates which extensions are NOT defined by the file. Ah okay. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] docs: Note MESA_configless_context as superseded
Signed-off-by: Adam Jackson--- docs/specs/MESA_configless_context.spec | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/specs/MESA_configless_context.spec b/docs/specs/MESA_configless_context.spec index f2fafb3..d7ba62d 100644 --- a/docs/specs/MESA_configless_context.spec +++ b/docs/specs/MESA_configless_context.spec @@ -12,11 +12,12 @@ Contact Status -Proposal +Superseded by the functionally identical EGL_KHR_no_config_context +extension. Version -Version 1, February 28, 2014 +Version 2, September 9, 2016 Number @@ -121,5 +122,8 @@ Issues Revision History +Version 2, September 9, 2016 +Defer to EGL_KHR_no_config_context (Adam Jackson) + Version 1, February 28, 2014 Initial draft (Neil Roberts) -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] egl: Rename MESA_configless_context bit to KHR_no_config_context
Keep the old name in the extension string, but refer to the KHR extension internally. Signed-off-by: Adam Jackson--- src/egl/drivers/dri2/egl_dri2.c | 2 +- src/egl/main/eglapi.c | 6 -- src/egl/main/eglcontext.c | 4 ++-- src/egl/main/egldisplay.h | 2 +- 4 files changed, 8 insertions(+), 6 deletions(-) diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index 859612f..bbc457c 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -603,8 +603,8 @@ dri2_setup_screen(_EGLDisplay *disp) disp->ClientAPIs |= EGL_OPENGL_ES3_BIT_KHR; assert(dri2_dpy->image_driver || dri2_dpy->dri2 || dri2_dpy->swrast); + disp->Extensions.KHR_no_config_context = EGL_TRUE; disp->Extensions.KHR_surfaceless_context = EGL_TRUE; - disp->Extensions.MESA_configless_context = EGL_TRUE; if (dri2_renderer_query_integer(dri2_dpy, __DRI2_RENDERER_HAS_FRAMEBUFFER_SRGB)) diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c index df2dcd6..ba8305e 100644 --- a/src/egl/main/eglapi.c +++ b/src/egl/main/eglapi.c @@ -406,11 +406,13 @@ _eglCreateExtensionsString(_EGLDisplay *dpy) _eglAppendExtension(, "EGL_KHR_image"); _EGL_CHECK_EXTENSION(KHR_image_base); _EGL_CHECK_EXTENSION(KHR_image_pixmap); + _EGL_CHECK_EXTENSION(KHR_no_config_context); _EGL_CHECK_EXTENSION(KHR_reusable_sync); _EGL_CHECK_EXTENSION(KHR_surfaceless_context); _EGL_CHECK_EXTENSION(KHR_wait_sync); - _EGL_CHECK_EXTENSION(MESA_configless_context); + if (dpy->Extensions.KHR_no_config_context) + _eglAppendExtension(, "EGL_MESA_configless_context"); _EGL_CHECK_EXTENSION(MESA_drm_image); _EGL_CHECK_EXTENSION(MESA_image_dma_buf_export); @@ -627,7 +629,7 @@ eglCreateContext(EGLDisplay dpy, EGLConfig config, EGLContext share_list, _EGL_CHECK_DISPLAY(disp, EGL_NO_CONTEXT, drv); - if (!config && !disp->Extensions.MESA_configless_context) + if (!config && !disp->Extensions.KHR_no_config_context) RETURN_EGL_ERROR(disp, EGL_BAD_CONFIG, EGL_NO_CONTEXT); if (!share && share_list != EGL_NO_CONTEXT) diff --git a/src/egl/main/eglcontext.c b/src/egl/main/eglcontext.c index 7eac79a..60625f6 100644 --- a/src/egl/main/eglcontext.c +++ b/src/egl/main/eglcontext.c @@ -642,9 +642,9 @@ _eglCheckMakeCurrent(_EGLContext *ctx, _EGLSurface *draw, _EGLSurface *read) (read && read->Config != ctx->Config)) return _eglError(EGL_BAD_MATCH, "eglMakeCurrent"); } else { - /* Otherwise we must be using the EGL_MESA_configless_context + /* Otherwise we must be using the EGL_KHR_no_config_context * extension */ - assert(dpy->Extensions.MESA_configless_context); + assert(dpy->Extensions.KHR_no_config_context); /* The extension doesn't permit binding draw and read buffers with * differing contexts */ diff --git a/src/egl/main/egldisplay.h b/src/egl/main/egldisplay.h index 6bfc858..6f3340e 100644 --- a/src/egl/main/egldisplay.h +++ b/src/egl/main/egldisplay.h @@ -112,11 +112,11 @@ struct _egl_extensions EGLBoolean KHR_gl_texture_cubemap_image; EGLBoolean KHR_image_base; EGLBoolean KHR_image_pixmap; + EGLBoolean KHR_no_config_context; EGLBoolean KHR_reusable_sync; EGLBoolean KHR_surfaceless_context; EGLBoolean KHR_wait_sync; - EGLBoolean MESA_configless_context; EGLBoolean MESA_drm_image; EGLBoolean MESA_image_dma_buf_export; -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/3] Implement EGL_KHR_no_config_context
KHR_no_config_context is virtually identical to MESA_configless_context, where they differ is only where the Mesa spec is silent, and for the most part the code already did what the Khronos spec wants. Fix up the one corner case, and mark the Mesa extension as superseded. - ajax ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] egl: QueryContext on a configless context returns zero
MESA_configless_context does not specify the interaction with QueryContext at all, and the code to generate an error in this case predates the Mesa extension. Since EGL_NO_CONFIG_{KHR,MESA} are numerically identical there's no way to distinguish which one the application asked for, so use the KHR behaviour. Signed-off-by: Adam Jackson--- src/egl/main/eglcontext.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/src/egl/main/eglcontext.c b/src/egl/main/eglcontext.c index 58740c3..7eac79a 100644 --- a/src/egl/main/eglcontext.c +++ b/src/egl/main/eglcontext.c @@ -538,9 +538,14 @@ _eglQueryContext(_EGLDriver *drv, _EGLDisplay *dpy, _EGLContext *c, switch (attribute) { case EGL_CONFIG_ID: - if (!c->Config) - return _eglError(EGL_BAD_ATTRIBUTE, "eglQueryContext"); - *value = c->Config->ConfigID; + /* + * From EGL_KHR_no_config_context: + * + *"Querying EGL_CONFIG_ID returns the ID of the EGLConfig with + * respect to which the context was created, or zero if created + * without respect to an EGLConfig." + */ + *value = c->Config ? c->Config->ConfigID : 0; break; case EGL_CONTEXT_CLIENT_VERSION: *value = c->ClientMajorVersion; -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: support TGSI compute shaders with variable block size
On 09.09.2016 18:07, Samuel Pitoiset wrote: On 09/09/2016 02:37 PM, Ilia Mirkin wrote: On Fri, Sep 9, 2016 at 8:29 AM, Marek Olšákwrote: On Fri, Sep 9, 2016 at 10:12 AM, Nicolai Hähnle wrote: From: Nicolai Hähnle Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Since the shaders are compiled with a pessimistic upper limit on the number of registers, asynchronously compiling variants may be worth considering in the future if we observe the shaders to be dispatched with small block sizes. --- I think this is sufficient to support variable group sizes on radeonsi, but it's completely untested. Do you keep the latest version of your series in a public repository somewhere? src/gallium/drivers/radeonsi/si_compute.c | 10 +- src/gallium/drivers/radeonsi/si_shader.c | 29 - src/gallium/drivers/radeonsi/si_shader.h | 4 +++- 3 files changed, 32 insertions(+), 11 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index 5041761..26e096c 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -379,25 +379,33 @@ static void si_setup_tgsi_grid(struct si_context *sctx, for (i = 0; i < 3; ++i) { radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0)); radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) | COPY_DATA_DST_SEL(COPY_DATA_REG)); radeon_emit(cs, (va + 4 * i)); radeon_emit(cs, (va + 4 * i) >> 32); radeon_emit(cs, (grid_size_reg >> 2) + i); radeon_emit(cs, 0); } } else { + struct si_compute *program = sctx->cs_shader_state.program; + bool variable_group_size = + program->shader.selector->info.properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] == 0; - radeon_set_sh_reg_seq(cs, grid_size_reg, 3); + radeon_set_sh_reg_seq(cs, grid_size_reg, variable_group_size ? 6 : 3); radeon_emit(cs, info->grid[0]); radeon_emit(cs, info->grid[1]); radeon_emit(cs, info->grid[2]); + if (variable_group_size) { + radeon_emit(cs, info->block[0]); + radeon_emit(cs, info->block[1]); + radeon_emit(cs, info->block[2]); + } } } static void si_emit_dispatch_packets(struct si_context *sctx, const struct pipe_grid_info *info) { struct radeon_winsys_cs *cs = sctx->b.gfx.cs; bool render_cond_bit = sctx->b.render_cond && !sctx->b.render_cond_force_off; unsigned waves_per_threadgroup = DIV_ROUND_UP(info->block[0] * info->block[1] * info->block[2], 64); diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 0b7de18..730ee21 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -1783,30 +1783,35 @@ static void declare_system_value( case TGSI_SEMANTIC_GRID_SIZE: value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_GRID_SIZE); break; case TGSI_SEMANTIC_BLOCK_SIZE: { LLVMValueRef values[3]; unsigned i; unsigned *properties = ctx->shader->selector->info.properties; - unsigned sizes[3] = { - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH], - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT], - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH] - }; - for (i = 0; i < 3; ++i) - values[i] = lp_build_const_int32(gallivm, sizes[i]); + if (properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] != 0) { + unsigned sizes[3] = { + properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH], + properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT], + properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH] + }; + + for (i = 0; i < 3; ++i) + values[i] = lp_build_const_int32(gallivm, sizes[i]); - value = lp_build_gather_values(gallivm, values, 3); + value = lp_build_gather_values(gallivm, values, 3); + } else { + value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_BLOCK_SIZE); + } break; } case TGSI_SEMANTIC_BLOCK_ID: value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_BLOCK_ID); break; case TGSI_SEMANTIC_THREAD_ID: value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_THREAD_ID);
Re: [Mesa-dev] [PATCH] radeonsi: support TGSI compute shaders with variable block size
On 09/09/2016 02:37 PM, Ilia Mirkin wrote: On Fri, Sep 9, 2016 at 8:29 AM, Marek Olšákwrote: On Fri, Sep 9, 2016 at 10:12 AM, Nicolai Hähnle wrote: From: Nicolai Hähnle Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Since the shaders are compiled with a pessimistic upper limit on the number of registers, asynchronously compiling variants may be worth considering in the future if we observe the shaders to be dispatched with small block sizes. --- I think this is sufficient to support variable group sizes on radeonsi, but it's completely untested. Do you keep the latest version of your series in a public repository somewhere? src/gallium/drivers/radeonsi/si_compute.c | 10 +- src/gallium/drivers/radeonsi/si_shader.c | 29 - src/gallium/drivers/radeonsi/si_shader.h | 4 +++- 3 files changed, 32 insertions(+), 11 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index 5041761..26e096c 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -379,25 +379,33 @@ static void si_setup_tgsi_grid(struct si_context *sctx, for (i = 0; i < 3; ++i) { radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0)); radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) | COPY_DATA_DST_SEL(COPY_DATA_REG)); radeon_emit(cs, (va + 4 * i)); radeon_emit(cs, (va + 4 * i) >> 32); radeon_emit(cs, (grid_size_reg >> 2) + i); radeon_emit(cs, 0); } } else { + struct si_compute *program = sctx->cs_shader_state.program; + bool variable_group_size = + program->shader.selector->info.properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] == 0; - radeon_set_sh_reg_seq(cs, grid_size_reg, 3); + radeon_set_sh_reg_seq(cs, grid_size_reg, variable_group_size ? 6 : 3); radeon_emit(cs, info->grid[0]); radeon_emit(cs, info->grid[1]); radeon_emit(cs, info->grid[2]); + if (variable_group_size) { + radeon_emit(cs, info->block[0]); + radeon_emit(cs, info->block[1]); + radeon_emit(cs, info->block[2]); + } } } static void si_emit_dispatch_packets(struct si_context *sctx, const struct pipe_grid_info *info) { struct radeon_winsys_cs *cs = sctx->b.gfx.cs; bool render_cond_bit = sctx->b.render_cond && !sctx->b.render_cond_force_off; unsigned waves_per_threadgroup = DIV_ROUND_UP(info->block[0] * info->block[1] * info->block[2], 64); diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 0b7de18..730ee21 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -1783,30 +1783,35 @@ static void declare_system_value( case TGSI_SEMANTIC_GRID_SIZE: value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_GRID_SIZE); break; case TGSI_SEMANTIC_BLOCK_SIZE: { LLVMValueRef values[3]; unsigned i; unsigned *properties = ctx->shader->selector->info.properties; - unsigned sizes[3] = { - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH], - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT], - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH] - }; - for (i = 0; i < 3; ++i) - values[i] = lp_build_const_int32(gallivm, sizes[i]); + if (properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] != 0) { + unsigned sizes[3] = { + properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH], + properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT], + properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH] + }; + + for (i = 0; i < 3; ++i) + values[i] = lp_build_const_int32(gallivm, sizes[i]); - value = lp_build_gather_values(gallivm, values, 3); + value = lp_build_gather_values(gallivm, values, 3); + } else { + value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_BLOCK_SIZE); + } break; } case TGSI_SEMANTIC_BLOCK_ID: value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_BLOCK_ID);
Re: [Mesa-dev] [PATCH] radeonsi: support TGSI compute shaders with variable block size
On 09/09/2016 10:12 AM, Nicolai Hähnle wrote: From: Nicolai HähnleNot sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Since the shaders are compiled with a pessimistic upper limit on the number of registers, asynchronously compiling variants may be worth considering in the future if we observe the shaders to be dispatched with small block sizes. --- I think this is sufficient to support variable group sizes on radeonsi, but it's completely untested. Do you keep the latest version of your series in a public repository somewhere? Yes, you can find the current version here: https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_variable_group_size The next one will be in the arb_compute_variable_group_size_v1 branch (once I have fixed all the things locally). src/gallium/drivers/radeonsi/si_compute.c | 10 +- src/gallium/drivers/radeonsi/si_shader.c | 29 - src/gallium/drivers/radeonsi/si_shader.h | 4 +++- 3 files changed, 32 insertions(+), 11 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index 5041761..26e096c 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -379,25 +379,33 @@ static void si_setup_tgsi_grid(struct si_context *sctx, for (i = 0; i < 3; ++i) { radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0)); radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) | COPY_DATA_DST_SEL(COPY_DATA_REG)); radeon_emit(cs, (va + 4 * i)); radeon_emit(cs, (va + 4 * i) >> 32); radeon_emit(cs, (grid_size_reg >> 2) + i); radeon_emit(cs, 0); } } else { + struct si_compute *program = sctx->cs_shader_state.program; + bool variable_group_size = + program->shader.selector->info.properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] == 0; - radeon_set_sh_reg_seq(cs, grid_size_reg, 3); + radeon_set_sh_reg_seq(cs, grid_size_reg, variable_group_size ? 6 : 3); radeon_emit(cs, info->grid[0]); radeon_emit(cs, info->grid[1]); radeon_emit(cs, info->grid[2]); + if (variable_group_size) { + radeon_emit(cs, info->block[0]); + radeon_emit(cs, info->block[1]); + radeon_emit(cs, info->block[2]); + } } } static void si_emit_dispatch_packets(struct si_context *sctx, const struct pipe_grid_info *info) { struct radeon_winsys_cs *cs = sctx->b.gfx.cs; bool render_cond_bit = sctx->b.render_cond && !sctx->b.render_cond_force_off; unsigned waves_per_threadgroup = DIV_ROUND_UP(info->block[0] * info->block[1] * info->block[2], 64); diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 0b7de18..730ee21 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -1783,30 +1783,35 @@ static void declare_system_value( case TGSI_SEMANTIC_GRID_SIZE: value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_GRID_SIZE); break; case TGSI_SEMANTIC_BLOCK_SIZE: { LLVMValueRef values[3]; unsigned i; unsigned *properties = ctx->shader->selector->info.properties; - unsigned sizes[3] = { - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH], - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT], - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH] - }; - for (i = 0; i < 3; ++i) - values[i] = lp_build_const_int32(gallivm, sizes[i]); + if (properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] != 0) { + unsigned sizes[3] = { + properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH], + properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT], + properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH] + }; + + for (i = 0; i < 3; ++i) + values[i] = lp_build_const_int32(gallivm, sizes[i]); - value = lp_build_gather_values(gallivm, values, 3); + value = lp_build_gather_values(gallivm, values, 3); + } else { + value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_BLOCK_SIZE); + } break; } case
Re: [Mesa-dev] [PATCH 00/11] add support for ARB_compute_variable_group_size
On 09/09/2016 09:43 AM, Nicolai Hähnle wrote: Hi Samuel, Hi, thanks for working on this! I've only skimmed the series so far without looking at each patch in detail: For GCN, we need to pass the (maximum) work group size (aka block size / local group size) into the compiler, because it affects occupancy and hence register limits. And the lowering of TGSI_SEMANTIC_BLOCK_SIZE also needs to be changed. This means more work will definitely be needed for radeonsi. I will add a new cap for MaxComputeVariableGroupInvocations as suggested by Marek. And enable the extension based on that cap. That way, you will be able to fix up radeonsi in a separate patch. It would be great if you could add docs for TGSI_SEMANTIC_{GROUP,BLOCK}_SIZE to tgsi.rst. I know they were already missing before. That would be great yeah, but in a separate patch out of that series I think. Also, the nouveau enablement patch should come _before_ the patch that turns on the extension... Good catch, thanks. :) Cheers, Nicolai On 08.09.2016 22:31, Samuel Pitoiset wrote: Hi, This series implements ARB_compute_variable_group_size written against GL 4.3. This extension allows to dispatch variable work group size via a new function called glDispatchComputeGroupSizeARB(). Because this extension is pretty similar to ARB_compute_shader, all Gallium drivers which already support compute shaders will expose ARB_compute_variable_group_size with that series. I did write a bunch of piglit tests, have a look here if you want: https://lists.freedesktop.org/archives/piglit/2016-September/020755.html All tests pass on Fermi (GF119) as well as all previous compute shaders tests. Marek, Nicolai and other AMD folks, I don't know if radeonsi will need a fix somewhere for handling a variable work group size, but as I don't have the hardware, I can't test. Let me know if something needs to be slighty updated. Please review, Thanks! Samuel Pitoiset (11): glapi: add entry points for GL_ARB_compute_variable_group_size mesa/main: add support for ARB_compute_variable_groups_size glsl: add enable flags for ARB_compute_variable_group_size glsl: process local_size_variable input qualifier glsl: reject compute shaders with fixed and variable local size glsl/linker: handle errors when a variable local size is used glsl: add gl_LocalGroupSizeARB as a system value st/mesa: add mapping for SYSTEM_VALUE_LOCAL_GROUP_SIZE st/mesa: add support for dispatching a variable local size st/mesa: expose ARB_compute_variable_group_size nv50/ir: use 1024 threads/block for variable local size src/compiler/glsl/ast.h| 5 ++ src/compiler/glsl/ast_to_hir.cpp | 14 src/compiler/glsl/ast_type.cpp | 6 ++ src/compiler/glsl/builtin_variables.cpp| 2 + src/compiler/glsl/glsl_parser.yy | 13 +++ src/compiler/glsl/glsl_parser_extras.cpp | 6 ++ src/compiler/glsl/glsl_parser_extras.h | 8 ++ src/compiler/glsl/linker.cpp | 23 +- src/compiler/glsl/standalone.cpp | 4 + src/compiler/glsl/standalone_scaffolding.cpp | 5 ++ src/compiler/shader_enums.h| 1 + .../drivers/nouveau/codegen/nv50_ir_target.h | 3 +- .../glapi/gen/ARB_compute_variable_group_size.xml | 25 ++ src/mapi/glapi/gen/Makefile.am | 1 + src/mapi/glapi/gen/gl_API.xml | 2 + src/mesa/main/api_validate.c | 94 ++ src/mesa/main/api_validate.h | 4 + src/mesa/main/compute.c| 25 ++ src/mesa/main/compute.h| 5 ++ src/mesa/main/context.c| 6 ++ src/mesa/main/dd.h | 9 +++ src/mesa/main/extensions_table.h | 1 + src/mesa/main/get.c| 12 +++ src/mesa/main/get_hash_params.py | 3 + src/mesa/main/mtypes.h | 23 +- src/mesa/main/shaderapi.c | 1 + src/mesa/main/shaderobj.c | 2 + src/mesa/main/tests/dispatch_sanity.cpp| 3 + src/mesa/state_tracker/st_cb_compute.c | 15 +++- src/mesa/state_tracker/st_extensions.c | 13 +++ src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 + 31 files changed, 329 insertions(+), 7 deletions(-) create mode 100644 src/mapi/glapi/gen/ARB_compute_variable_group_size.xml ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/11] st/mesa: expose ARB_compute_variable_group_size
On 09/09/2016 04:02 PM, Marek Olšák wrote: For patches 8, 9: Reviewed-by: Marek OlšákPatch 10 won't work for us, because radeonsi (and presumably softpipe as well) don't support this feature at the moment. Also, I would prefer a PIPE_CAP for MaxComputeVariableGroupInvocations and the extension can be exposed based on that CAP. Fine by me, I will add this new cap. Thanks for reviewing. Marek On Thu, Sep 8, 2016 at 10:31 PM, Samuel Pitoiset wrote: This extension is only exposed if the underlying driver supports ARB_compute_shader. Signed-off-by: Samuel Pitoiset --- src/mesa/state_tracker/st_extensions.c | 13 + 1 file changed, 13 insertions(+) diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c index 807fbfb..dc2e60a 100644 --- a/src/mesa/state_tracker/st_extensions.c +++ b/src/mesa/state_tracker/st_extensions.c @@ -1196,6 +1196,19 @@ void st_init_extensions(struct pipe_screen *screen, extensions->ARB_compute_shader = extensions->ARB_shader_image_load_store && extensions->ARB_shader_atomic_counters; + + if (extensions->ARB_compute_shader) { +/* Because the minimum values required by + * ARB_compute_variable_group_size are less (or equal) than the + * ones defined by ARB_compute_shader we can re-use them. */ +for (i = 0; i < 3; i++) { + consts->MaxComputeVariableGroupSize[i] = + consts->MaxComputeWorkGroupSize[i]; +} +consts->MaxComputeVariableGroupInvocations = + consts->MaxComputeWorkGroupInvocations; +extensions->ARB_compute_variable_group_size = true; + } } } -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 02/11] mesa/main: add support for ARB_compute_variable_groups_size
On 09/08/2016 10:58 PM, Ian Romanick wrote: On 09/08/2016 01:31 PM, Samuel Pitoiset wrote: Signed-off-by: Samuel Pitoiset--- src/mesa/main/api_validate.c | 94 src/mesa/main/api_validate.h | 4 ++ src/mesa/main/compute.c | 17 src/mesa/main/context.c | 6 +++ src/mesa/main/dd.h | 9 src/mesa/main/extensions_table.h | 1 + src/mesa/main/get.c | 12 + src/mesa/main/get_hash_params.py | 3 ++ src/mesa/main/mtypes.h | 23 +- src/mesa/main/shaderapi.c| 1 + src/mesa/main/shaderobj.c| 2 + 11 files changed, 171 insertions(+), 1 deletion(-) diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c index b35751e..9379015 100644 --- a/src/mesa/main/api_validate.c +++ b/src/mesa/main/api_validate.c @@ -1096,6 +1096,7 @@ GLboolean _mesa_validate_DispatchCompute(struct gl_context *ctx, const GLuint *num_groups) { + struct gl_shader_program *prog; int i; FLUSH_CURRENT(ctx, 0); @@ -1128,6 +1129,86 @@ _mesa_validate_DispatchCompute(struct gl_context *ctx, } } + /* From the ARB_compute_variable_group_size specification: +* +* "An INVALID_OPERATION error is generated by DispatchCompute if the active +* program for the compute shader stage has a variable work group size." +*/ There has been a lot of debate about formatting spec quotations. The one thing where I think everyone agrees is formatting the first like. Please use The ARB_compute_variable_group_size spec says: That makes it much easier to grep the code for spec quotations. This comment applies to the spec quotations below too. Fine by me. + prog = ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE]; + if (prog->Comp.LocalSizeVariable) { + _mesa_error(ctx, GL_INVALID_OPERATION, + "glDispatchCompute(variable work group size forbidden)"); + return GL_FALSE; + } + + return GL_TRUE; +} + +GLboolean +_mesa_validate_DispatchComputeGroupSizeARB(struct gl_context *ctx, + const GLuint *num_groups, + const GLuint *group_size) +{ + struct gl_shader_program *prog; + GLuint64 total_invocations = 1; + int i; + + FLUSH_CURRENT(ctx, 0); + + if (!check_valid_to_compute(ctx, "glDispatchComputeGroupSizeARB")) + return GL_FALSE; + + for (i = 0; i < 3; i++) { + /* From the ARB_compute_variable_group_size specification: + * + * "An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if + * any of , , or is less than + * or equal to zero or greater than the maximum local work group size for + * compute shaders with variable group size + * (MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB) in the corresponding dimension." + * + * However, the "less than" is a spec bug because they are declared as + * unsigned integers. + */ + if (group_size[i] == 0 || + group_size[i] > ctx->Const.MaxComputeVariableGroupSize[i]) { + _mesa_error(ctx, GL_INVALID_VALUE, + "glDispatchComputeGroupSizeARB(group_size_%c)", 'x' + i); + return GL_FALSE; + } + + /* From the ARB_compute_variable_group_size specification: + * + * "An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if + * the product of , , and + * exceeds the implementation-dependent maximum local work group + * invocation count for compute shaders with variable group size + * (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB)." + */ + total_invocations *= group_size[i]; + if (total_invocations > ctx->Const.MaxComputeVariableGroupInvocations) { + _mesa_error(ctx, GL_INVALID_VALUE, + "glDispatchComputeGroupSizeARB(product of local_sizes " + "exceeds MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB (%d))", + ctx->Const.MaxComputeVariableGroupInvocations); + return GL_FALSE; + } This check should happen after the loop, and you should also log the value of total_invocations. Something like: _mesa_error(ctx, GL_INVALID_VALUE, "glDispatchComputeGroupSizeARB(product of local_sizes " "exceeds MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB (%d < %d))", total_invocations, ctx->Const.MaxComputeVariableGroupInvocations); Yes, the message is more informative. + + /* From the ARB_compute_variable_group_size specification: + * + * "An INVALID_OPERATION error is generated by + * DispatchComputeGroupSizeARB if the active program for the compute + * shader stage has a fixed work group size." + */ + prog = ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE]; + if
[Mesa-dev] [PATCH v3] vl/dri3: handle the case of different GPU
In case of prime when rendering is done on GPU other then the server GPU, use a seprate linear buffer for each back buffer which will be displayed using present extension. v2: Use a seprate linear buffer for each back buffer (Michel) v3: change variable names and fix coding style (Leo and Emil) Signed-off-by: Nayan Deshmukh--- src/gallium/auxiliary/vl/vl_winsys_dri3.c | 63 --- 1 file changed, 50 insertions(+), 13 deletions(-) diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c b/src/gallium/auxiliary/vl/vl_winsys_dri3.c index 3d596a6..3607f86 100644 --- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c +++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c @@ -49,6 +49,7 @@ struct vl_dri3_buffer { struct pipe_resource *texture; + struct pipe_resource *linear_texture; uint32_t pixmap; uint32_t sync_fence; @@ -69,6 +70,8 @@ struct vl_dri3_screen xcb_present_event_t eid; xcb_special_event_t *special_event; + struct pipe_context *pipe; + struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM]; int cur_back; @@ -82,6 +85,7 @@ struct vl_dri3_screen int64_t last_ust, ns_frame, last_msc, next_msc; bool flushed; + bool is_different_gpu; }; static void @@ -102,6 +106,8 @@ dri3_free_back_buffer(struct vl_dri3_screen *scrn, xcb_sync_destroy_fence(scrn->conn, buffer->sync_fence); xshmfence_unmap_shm(buffer->shm_fence); pipe_resource_reference(>texture, NULL); + if (buffer->linear_texture) + pipe_resource_reference(>linear_texture, NULL); FREE(buffer); } @@ -209,7 +215,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn) xcb_sync_fence_t sync_fence; struct xshmfence *shm_fence; int buffer_fd, fence_fd; - struct pipe_resource templ; + struct pipe_resource templ, *pixmap_buffer_texture; struct winsys_handle whandle; unsigned usage; @@ -226,8 +232,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn) goto close_fd; memset(, 0, sizeof(templ)); - templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW | -PIPE_BIND_SCANOUT | PIPE_BIND_SHARED; + templ.bind = PIPE_BIND_RENDER_TARGET; templ.format = PIPE_FORMAT_B8G8R8X8_UNORM; templ.target = PIPE_TEXTURE_2D; templ.last_level = 0; @@ -235,16 +240,36 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn) templ.height0 = scrn->height; templ.depth0 = 1; templ.array_size = 1; - buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen, - ); - if (!buffer->texture) - goto unmap_shm; + if (scrn->is_different_gpu) { + buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen, +); + if (!buffer->texture) + goto unmap_shm; + + templ.bind |= PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SCANOUT | +PIPE_BIND_SHARED | PIPE_BIND_LINEAR; + buffer->linear_texture = scrn->base.pscreen->resource_create(scrn->base.pscreen, + ); + pixmap_buffer_texture = buffer->linear_texture; + + if (!buffer->linear_texture) + goto no_linear_texture; + + } else { + templ.bind |= PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SCANOUT | +PIPE_BIND_SHARED; + buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen, +); + if (!buffer->texture) + goto unmap_shm; + pixmap_buffer_texture = buffer->texture; + } memset(, 0, sizeof(whandle)); whandle.type= DRM_API_HANDLE_TYPE_FD; usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ; scrn->base.pscreen->resource_get_handle(scrn->base.pscreen, NULL, - buffer->texture, , + pixmap_buffer_texture, , usage); buffer_fd = whandle.handle; buffer->pitch = whandle.stride; @@ -271,6 +296,8 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn) return buffer; +no_linear_texture: + pipe_resource_reference(>texture, NULL); unmap_shm: xshmfence_unmap_shm(shm_fence); close_fd: @@ -474,6 +501,7 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen, struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)context_private; uint32_t options = XCB_PRESENT_OPTION_NONE; struct vl_dri3_buffer *back; + struct pipe_box src_box; back = scrn->back_buffers[scrn->cur_back]; if (!back) @@ -485,6 +513,16 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen, return; } + if (scrn->is_different_gpu) { + u_box_origin_2d(scrn->width, scrn->height, _box); + scrn->pipe->resource_copy_region(scrn->pipe, + back->linear_texture,
Re: [Mesa-dev] [PATCH 01/11] glapi: add entry points for GL_ARB_compute_variable_group_size
On 09/08/2016 10:58 PM, Ian Romanick wrote: On 09/08/2016 01:31 PM, Samuel Pitoiset wrote: Signed-off-by: Samuel Pitoiset--- .../glapi/gen/ARB_compute_variable_group_size.xml | 25 ++ src/mapi/glapi/gen/Makefile.am | 1 + src/mapi/glapi/gen/gl_API.xml | 2 ++ src/mesa/main/compute.c| 8 +++ src/mesa/main/compute.h| 5 + src/mesa/main/tests/dispatch_sanity.cpp| 3 +++ 6 files changed, 44 insertions(+) create mode 100644 src/mapi/glapi/gen/ARB_compute_variable_group_size.xml diff --git a/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml new file mode 100644 index 000..b21c52f --- /dev/null +++ b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml @@ -0,0 +1,25 @@ + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am index 0d7c338..49fdfe3 100644 --- a/src/mapi/glapi/gen/Makefile.am +++ b/src/mapi/glapi/gen/Makefile.am @@ -117,6 +117,7 @@ API_XML = \ ARB_color_buffer_float.xml \ ARB_compressed_texture_pixel_storage.xml \ ARB_compute_shader.xml \ + ARB_compute_variable_group_size.xml \ ARB_copy_buffer.xml \ ARB_copy_image.xml \ ARB_debug_output.xml \ diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml index c39aa22..9ad3b60 100644 --- a/src/mapi/glapi/gen/gl_API.xml +++ b/src/mapi/glapi/gen/gl_API.xml @@ -8258,6 +8258,8 @@ http://www.w3.org/2001/XInclude"/> +http://www.w3.org/2001/XInclude"/> + This (extension #153) should go before ARB_indirect_parameters (extension #154), and the "ARB extensions 149 - 153" should be changed to "ARB extensions 149 - 152". Presumably, you meant "ARB extensions 149 - 154"? Because the next section is "ARB extension 155 - 159". With that fixed, this patch is Reviewed-by: Ian Romanick http://www.w3.org/2001/XInclude"/> diff --git a/src/mesa/main/compute.c b/src/mesa/main/compute.c index b71430f..b052bae 100644 --- a/src/mesa/main/compute.c +++ b/src/mesa/main/compute.c @@ -60,3 +60,11 @@ _mesa_DispatchComputeIndirect(GLintptr indirect) ctx->Driver.DispatchComputeIndirect(ctx, indirect); } + +void GLAPIENTRY +_mesa_DispatchComputeGroupSizeARB(GLuint num_groups_x, GLuint num_groups_y, + GLuint num_groups_z, GLuint group_size_x, + GLuint group_size_y, GLuint group_size_z) +{ + +} diff --git a/src/mesa/main/compute.h b/src/mesa/main/compute.h index 0cc034f..8018bbb 100644 --- a/src/mesa/main/compute.h +++ b/src/mesa/main/compute.h @@ -35,4 +35,9 @@ _mesa_DispatchCompute(GLuint num_groups_x, extern void GLAPIENTRY _mesa_DispatchComputeIndirect(GLintptr indirect); +extern void GLAPIENTRY +_mesa_DispatchComputeGroupSizeARB(GLuint num_groups_x, GLuint num_groups_y, + GLuint num_groups_z, GLuint group_size_x, + GLuint group_size_y, GLuint group_size_z); + #endif diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 42fe61a..7faeabe 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -942,6 +942,9 @@ const struct function common_desktop_functions_possible[] = { { "glDispatchCompute", 43, -1 }, { "glDispatchComputeIndirect", 43, -1 }, + /* GL_ARB_compute_variable_group_size */ + { "glDispatchComputeGroupSizeARB", 43, -1 }, + /* GL_EXT_polygon_offset_clamp */ { "glPolygonOffsetClampEXT", 11, -1 }, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 2/3] egl: return corresponding offset of EGLImage instead of 0.
The comments from Emil can be summarized into the following code: if (offsets) { offsets[0] = 0; EGLint img_offset = 0; bool ret = dri2_dpy->image->queryImage(dri2_img->dri_image, __DRI_IMAGE_ATTRIB_OFFSET, _offset); if(ret == true) offsets[0] = img_offset; } return EGL_TRUE; Emil, is it right? -Original Message- From: Emil Velikov [mailto:emil.l.veli...@gmail.com] Sent: Friday, September 9, 2016 8:32 PM To: Weng, ChuanboCc: ML mesa-dev ; Axel Davy Subject: Re: [PATCH v2 2/3] egl: return corresponding offset of EGLImage instead of 0. On 9 September 2016 at 08:58, Chuanbo Weng wrote: > The offset should not always be 0. For example, if EGLImage is created > from a 2D texture with EGL_GL_TEXTURE_LEVEL=1, then the offset should > be the actual start of miplevel 1 in bo. > > v2: Add version check of __DRIimageExtension implementation (Suggested > by Axel Davy). > Just to elaborate what Axel was saying from another perspective: With current master the offset is explicitly zeroed, while with the (v2) patch you'll _only_ set the value when you have v13 driver. Thus using the loader + v12 driver you'll get a regression since a) the offset will remain garbage and b) the function (dri2_export_dma_buf_image_mesa) will fail. > Signed-off-by: Chuanbo Weng > --- > src/egl/drivers/dri2/egl_dri2.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/src/egl/drivers/dri2/egl_dri2.c > b/src/egl/drivers/dri2/egl_dri2.c index 859612f..c529e55 100644 > --- a/src/egl/drivers/dri2/egl_dri2.c > +++ b/src/egl/drivers/dri2/egl_dri2.c > @@ -2249,6 +2249,7 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, > _EGLDisplay *disp, _EGLImage *im > struct dri2_egl_image *dri2_img = dri2_egl_image(img); > > (void) drv; > + EGLint img_offset = 0; > > /* rework later to provide multiple fds/strides/offsets */ > if (fds) > @@ -2259,8 +2260,14 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, > _EGLDisplay *disp, _EGLImage *im >dri2_dpy->image->queryImage(dri2_img->dri_image, > __DRI_IMAGE_ATTRIB_STRIDE, strides); > > - if (offsets) > + if (offsets){ >offsets[0] = 0; > + if(dri2_dpy->image->base.version >= 13){ Please move the variable declaration in local scope and add space between ){ Functionality wise we don't need the version check, esp. since you want to set the offset only when queryImage() succeeds... > + dri2_dpy->image->queryImage(dri2_img->dri_image, > + __DRI_IMAGE_ATTRIB_OFFSET, _offset); ... which you don't seem to be checking here, so you'll effectively store/use garbage. So unless Alex feels strongly for the version check I'd omit it, and fix the rest of this patch. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/5] radeonsi: remove the cache_flush atom
From: Marek Olšák--- src/gallium/drivers/radeonsi/si_compute.c| 2 +- src/gallium/drivers/radeonsi/si_cp_dma.c | 2 +- src/gallium/drivers/radeonsi/si_hw_context.c | 2 +- src/gallium/drivers/radeonsi/si_pipe.h | 1 - src/gallium/drivers/radeonsi/si_state.c | 1 - src/gallium/drivers/radeonsi/si_state.h | 3 +-- src/gallium/drivers/radeonsi/si_state_draw.c | 10 +- 7 files changed, 9 insertions(+), 12 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index 5041761..f43c616 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -467,21 +467,21 @@ static void si_launch_grid( if (info->indirect) r600_context_add_resource_size(ctx, info->indirect); /* TODO: add the scratch buffer */ si_need_cs_space(sctx); if (!sctx->cs_shader_state.initialized) si_initialize_compute(sctx); if (sctx->b.flags) - si_emit_cache_flush(sctx, NULL); + si_emit_cache_flush(sctx); if (!si_switch_compute_shader(sctx, program, >shader, info->pc)) return; si_upload_compute_shader_descriptors(sctx); si_emit_compute_shader_userdata(sctx); if (si_is_atom_dirty(sctx, sctx->atoms.s.render_cond)) { sctx->atoms.s.render_cond->emit(>b, sctx->atoms.s.render_cond); diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c b/src/gallium/drivers/radeonsi/si_cp_dma.c index 7d4edc0..08d3dfe 100644 --- a/src/gallium/drivers/radeonsi/si_cp_dma.c +++ b/src/gallium/drivers/radeonsi/si_cp_dma.c @@ -163,21 +163,21 @@ static void si_cp_dma_prepare(struct si_context *sctx, struct pipe_resource *dst RADEON_USAGE_WRITE, RADEON_PRIO_CP_DMA); if (src) radeon_add_to_buffer_list(>b, >b.gfx, (struct r600_resource*)src, RADEON_USAGE_READ, RADEON_PRIO_CP_DMA); /* Flush the caches for the first copy only. * Also wait for the previous CP DMA operations. */ if (sctx->b.flags) { - si_emit_cache_flush(sctx, NULL); + si_emit_cache_flush(sctx); *flags |= SI_CP_DMA_RAW_WAIT; } /* Do the synchronization after the last dma, so that all data * is written to memory. */ if (byte_count == remaining_size) *flags |= R600_CP_DMA_SYNC; } diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c b/src/gallium/drivers/radeonsi/si_hw_context.c index 24b0360..67e8352 100644 --- a/src/gallium/drivers/radeonsi/si_hw_context.c +++ b/src/gallium/drivers/radeonsi/si_hw_context.c @@ -108,21 +108,21 @@ void si_context_gfx_flush(void *context, unsigned flags, r600_preflush_suspend_features(>b); ctx->b.flags |= SI_CONTEXT_CS_PARTIAL_FLUSH | SI_CONTEXT_PS_PARTIAL_FLUSH; /* DRM 3.1.0 doesn't flush TC for VI correctly. */ if (ctx->b.chip_class == VI && ctx->b.screen->info.drm_minor <= 1) ctx->b.flags |= SI_CONTEXT_INV_GLOBAL_L2 | SI_CONTEXT_INV_VMEM_L1; - si_emit_cache_flush(ctx, NULL); + si_emit_cache_flush(ctx); if (ctx->trace_buf) si_trace_emit(ctx); if (ctx->is_debug) { /* Save the IB for debug contexts. */ radeon_clear_saved_cs(>last_gfx); radeon_save_cs(ws, cs, >last_gfx); r600_resource_reference(>last_trace_buf, ctx->trace_buf); r600_resource_reference(>trace_buf, NULL); diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h index a648d86..1080e72 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.h +++ b/src/gallium/drivers/radeonsi/si_pipe.h @@ -222,21 +222,20 @@ struct si_context { boolcompute_is_busy; /* Atoms (direct states). */ union si_state_atomsatoms; unsigneddirty_atoms; /* mask */ /* PM4 states (precomputed immutable states) */ union si_state queued; union si_state emitted; /* Atom declarations. */ - struct r600_atomcache_flush; struct si_framebuffer framebuffer; struct si_sample_locs msaa_sample_locs; struct r600_atomdb_render_state; struct r600_atommsaa_config; struct si_sample_mask sample_mask; struct r600_atomcb_render_state; struct si_blend_color blend_color; struct r600_atom
[Mesa-dev] [PATCH 1/5] winsys/radeon: replace OUT_CS with radeon_emit
From: Marek Olšák--- src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c index 612a876..9de00c2 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c @@ -195,22 +195,20 @@ radeon_drm_cs_create(struct radeon_winsys_ctx *ctx, cs->csc = >csc1; cs->cst = >csc2; cs->base.current.buf = cs->csc->buf; cs->base.current.max_dw = ARRAY_SIZE(cs->csc->buf); cs->ring_type = ring_type; p_atomic_inc(>num_cs); return >base; } -#define OUT_CS(cs, value) (cs)->current.buf[(cs)->current.cdw++] = (value) - static inline void update_reloc(struct drm_radeon_cs_reloc *reloc, enum radeon_bo_domain rd, enum radeon_bo_domain wd, unsigned priority, enum radeon_bo_domain *added_domains) { *added_domains = (rd | wd) & ~(reloc->read_domains | reloc->write_domain); reloc->read_domains |= rd; reloc->write_domain |= wd; @@ -454,41 +452,41 @@ static int radeon_drm_cs_flush(struct radeon_winsys_cs *rcs, struct pipe_fence_handle **fence) { struct radeon_drm_cs *cs = radeon_drm_cs(rcs); struct radeon_cs_context *tmp; switch (cs->ring_type) { case RING_DMA: /* pad DMA ring to 8 DWs */ if (cs->ws->info.chip_class <= SI) { while (rcs->current.cdw & 7) -OUT_CS(>base, 0xf000); /* NOP packet */ +radeon_emit(>base, 0xf000); /* NOP packet */ } else { while (rcs->current.cdw & 7) -OUT_CS(>base, 0x); /* NOP packet */ +radeon_emit(>base, 0x); /* NOP packet */ } break; case RING_GFX: /* pad GFX ring to 8 DWs to meet CP fetch alignment requirements * r6xx, requires at least 4 dw alignment to avoid a hw bug. */ if (cs->ws->info.gfx_ib_pad_with_type2) { while (rcs->current.cdw & 7) -OUT_CS(>base, 0x8000); /* type2 nop packet */ +radeon_emit(>base, 0x8000); /* type2 nop packet */ } else { while (rcs->current.cdw & 7) -OUT_CS(>base, 0x1000); /* type3 nop packet */ +radeon_emit(>base, 0x1000); /* type3 nop packet */ } break; case RING_UVD: while (rcs->current.cdw & 15) -OUT_CS(>base, 0x8000); /* type2 nop packet */ +radeon_emit(>base, 0x8000); /* type2 nop packet */ break; default: break; } if (rcs->current.cdw > rcs->current.max_dw) { fprintf(stderr, "radeon: command stream overflowed\n"); } if (fence) { -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/5] radeonsi: flush TC L2 before using a compute indirect buffer
From: Marek OlšákThere is no known test for this. Cc: 12.0 --- src/gallium/drivers/radeonsi/si_compute.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index f43c616..d988214 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -457,24 +457,32 @@ static void si_launch_grid( info->block[0] * info->block[1] * info->block[2] > 256; if (cs_regalloc_hang) sctx->b.flags |= SI_CONTEXT_PS_PARTIAL_FLUSH | SI_CONTEXT_CS_PARTIAL_FLUSH; si_decompress_compute_textures(sctx); /* Add buffer sizes for memory checking in need_cs_space. */ r600_context_add_resource_size(ctx, >shader.bo->b.b); - if (info->indirect) - r600_context_add_resource_size(ctx, info->indirect); /* TODO: add the scratch buffer */ + if (info->indirect) { + r600_context_add_resource_size(ctx, info->indirect); + + /* The hw doesn't read the indirect buffer via TC L2. */ + if (r600_resource(info->indirect)->TC_L2_dirty) { + sctx->b.flags |= SI_CONTEXT_INV_GLOBAL_L2; + r600_resource(info->indirect)->TC_L2_dirty = false; + } + } + si_need_cs_space(sctx); if (!sctx->cs_shader_state.initialized) si_initialize_compute(sctx); if (sctx->b.flags) si_emit_cache_flush(sctx); if (!si_switch_compute_shader(sctx, program, >shader, info->pc)) return; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/5] winsys/amdgpu: replace OUT_CS with radeon_emit
From: Marek Olšák--- src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 26 -- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c index 73c8a97..0bb916e 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c @@ -702,22 +702,20 @@ amdgpu_cs_add_const_preamble_ib(struct radeon_winsys_cs *rcs) cs->csc->request.number_of_ibs = 3; cs->csc->request.ibs = >csc->ib[IB_CONST_PREAMBLE]; cs->cst->request.number_of_ibs = 3; cs->cst->request.ibs = >cst->ib[IB_CONST_PREAMBLE]; return >const_preamble_ib.base; } -#define OUT_CS(cs, value) (cs)->current.buf[(cs)->current.cdw++] = (value) - static int amdgpu_cs_lookup_buffer(struct radeon_winsys_cs *rcs, struct pb_buffer *buf) { struct amdgpu_cs *cs = amdgpu_cs(rcs); return amdgpu_lookup_buffer(cs->csc, (struct amdgpu_winsys_bo*)buf); } static bool amdgpu_cs_validate(struct radeon_winsys_cs *rcs) { @@ -765,28 +763,28 @@ static bool amdgpu_cs_check_space(struct radeon_winsys_cs *rcs, unsigned dw) assert(ib->used_ib_space == 0); va = amdgpu_winsys_bo(ib->big_ib_buffer)->va; /* This space was originally reserved. */ rcs->current.max_dw += 4; assert(ib->used_ib_space + 4 * rcs->current.max_dw <= ib->big_ib_buffer->size); /* Pad with NOPs and add INDIRECT_BUFFER packet */ while ((rcs->current.cdw & 7) != 4) - OUT_CS(rcs, 0x1000); /* type3 nop packet */ + radeon_emit(rcs, 0x1000); /* type3 nop packet */ - OUT_CS(rcs, PKT3(ib->ib_type == IB_MAIN ? PKT3_INDIRECT_BUFFER_CIK + radeon_emit(rcs, PKT3(ib->ib_type == IB_MAIN ? PKT3_INDIRECT_BUFFER_CIK : PKT3_INDIRECT_BUFFER_CONST, 2, 0)); - OUT_CS(rcs, va); - OUT_CS(rcs, va >> 32); + radeon_emit(rcs, va); + radeon_emit(rcs, va >> 32); new_ptr_ib_size = >current.buf[rcs->current.cdw]; - OUT_CS(rcs, S_3F2_CHAIN(1) | S_3F2_VALID(1)); + radeon_emit(rcs, S_3F2_CHAIN(1) | S_3F2_VALID(1)); assert((rcs->current.cdw & 7) == 0); assert(rcs->current.cdw <= rcs->current.max_dw); *ib->ptr_ib_size |= rcs->current.cdw; ib->ptr_ib_size = new_ptr_ib_size; /* Hook up the new chunk */ rcs->prev[rcs->num_prev].buf = rcs->current.buf; rcs->prev[rcs->num_prev].cdw = rcs->current.cdw; @@ -976,48 +974,48 @@ static int amdgpu_cs_flush(struct radeon_winsys_cs *rcs, struct amdgpu_winsys *ws = cs->ctx->ws; int error_code = 0; rcs->current.max_dw += amdgpu_cs_epilog_dws(cs->ring_type); switch (cs->ring_type) { case RING_DMA: /* pad DMA ring to 8 DWs */ if (ws->info.chip_class <= SI) { while (rcs->current.cdw & 7) -OUT_CS(rcs, 0xf000); /* NOP packet */ +radeon_emit(rcs, 0xf000); /* NOP packet */ } else { while (rcs->current.cdw & 7) -OUT_CS(rcs, 0x); /* NOP packet */ +radeon_emit(rcs, 0x); /* NOP packet */ } break; case RING_GFX: /* pad GFX ring to 8 DWs to meet CP fetch alignment requirements */ if (ws->info.gfx_ib_pad_with_type2) { while (rcs->current.cdw & 7) -OUT_CS(rcs, 0x8000); /* type2 nop packet */ +radeon_emit(rcs, 0x8000); /* type2 nop packet */ } else { while (rcs->current.cdw & 7) -OUT_CS(rcs, 0x1000); /* type3 nop packet */ +radeon_emit(rcs, 0x1000); /* type3 nop packet */ } /* Also pad the const IB. */ if (cs->const_ib.ib_mapped) while (!cs->const_ib.base.current.cdw || (cs->const_ib.base.current.cdw & 7)) -OUT_CS(>const_ib.base, 0x1000); /* type3 nop packet */ +radeon_emit(>const_ib.base, 0x1000); /* type3 nop packet */ if (cs->const_preamble_ib.ib_mapped) while (!cs->const_preamble_ib.base.current.cdw || (cs->const_preamble_ib.base.current.cdw & 7)) -OUT_CS(>const_preamble_ib.base, 0x1000); +radeon_emit(>const_preamble_ib.base, 0x1000); break; case RING_UVD: while (rcs->current.cdw & 15) - OUT_CS(rcs, 0x8000); /* type2 nop packet */ + radeon_emit(rcs, 0x8000); /* type2 nop packet */ break; default: break; } if (rcs->current.cdw > rcs->current.max_dw) { fprintf(stderr, "amdgpu: command stream overflowed\n"); } /* If the CS is not empty or overflowed */ -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/5] radeonsi: fix the VGT performance tweak for small instances
From: Marek OlšákBased on the VGT spec. The Vulkan driver doesn't do it optimally and they plan to fix it. --- src/gallium/drivers/radeonsi/si_state_draw.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c index d3e6e1a..e44147f 100644 --- a/src/gallium/drivers/radeonsi/si_state_draw.c +++ b/src/gallium/drivers/radeonsi/si_state_draw.c @@ -311,28 +311,29 @@ static unsigned si_get_ia_multi_vgt_param(struct si_context *sctx, wd_switch_on_eop = true; /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP is 0. * We don't know that for indirect drawing, so treat it as * always problematic. */ if (sctx->b.family == CHIP_HAWAII && (info->indirect || info->instance_count > 1)) wd_switch_on_eop = true; /* Performance recommendation for 4 SE Gfx7-8 parts if -* instances are smaller than a primgroup. Ignore the fact -* primgroup_size is a primitive count, not vertex count. -* Don't do anything for indirect draws. +* instances are smaller than a primgroup. +* Assume indirect draws always use small instances. +* This is needed for good VS wave utilization. */ if (sctx->b.chip_class <= VI && sctx->b.screen->info.max_se >= 4 && - !info->indirect && - info->instance_count > 1 && info->count < primgroup_size) + (info->indirect || +(info->instance_count > 1 && + si_num_prims_for_vertices(info) < primgroup_size))) wd_switch_on_eop = true; /* Required on CIK and later. */ if (sctx->b.screen->info.max_se > 2 && !wd_switch_on_eop) ia_switch_on_eoi = true; /* Required by Hawaii and, for some special cases, by VI. */ if (ia_switch_on_eoi && (sctx->b.family == CHIP_HAWAII || (sctx->b.chip_class == VI && -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/7] EGL: Update eglext.h
On 9 September 2016 at 15:26, Kyle Brennemanwrote: > Would it be better to use the new upstream version of eglext.h and add the > #includes, or to keep Mesa's existing eglext.h and add the missing > declarations for EGL_KHR_debug? > Afaict we've been doing the former every(?) time. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] vl/dri3: handle the case of different GPU
Hi Nayan, Besides Emil's comments, here is some others in lines. On 09/09/2016 09:17 AM, Nayan Deshmukh wrote: use a linear buffer in case of back buffer v2: Use a seprate linear buffer for each back buffer Signed-off-by: Nayan Deshmukh--- src/gallium/auxiliary/vl/vl_winsys_dri3.c | 82 --- 1 file changed, 63 insertions(+), 19 deletions(-) diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c b/src/gallium/auxiliary/vl/vl_winsys_dri3.c index 3d596a6..e168e36 100644 --- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c +++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c @@ -49,6 +49,7 @@ struct vl_dri3_buffer { struct pipe_resource *texture; + struct pipe_resource *linear_buffer; Better name it as linear_texture so it matches above. uint32_t pixmap; uint32_t sync_fence; @@ -69,6 +70,8 @@ struct vl_dri3_screen xcb_present_event_t eid; xcb_special_event_t *special_event; + struct pipe_context *pipe; + struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM]; int cur_back; @@ -82,6 +85,7 @@ struct vl_dri3_screen int64_t last_ust, ns_frame, last_msc, next_msc; bool flushed; + bool is_different_gpu; }; static void @@ -209,7 +213,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn) xcb_sync_fence_t sync_fence; struct xshmfence *shm_fence; int buffer_fd, fence_fd; - struct pipe_resource templ; + struct pipe_resource templ, *pixmap_buffer; Here is same, maybe just name as texture for pixmap_buffer. struct winsys_handle whandle; unsigned usage; @@ -226,25 +230,49 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn) goto close_fd; memset(, 0, sizeof(templ)); - templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW | -PIPE_BIND_SCANOUT | PIPE_BIND_SHARED; - templ.format = PIPE_FORMAT_B8G8R8X8_UNORM; - templ.target = PIPE_TEXTURE_2D; - templ.last_level = 0; - templ.width0 = scrn->width; - templ.height0 = scrn->height; - templ.depth0 = 1; - templ.array_size = 1; - buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen, - ); - if (!buffer->texture) - goto unmap_shm; - + if (scrn->is_different_gpu) { + templ.format = PIPE_FORMAT_B8G8R8X8_UNORM; + templ.target = PIPE_TEXTURE_2D; + templ.last_level = 0; + templ.width0 = scrn->width; + templ.height0 = scrn->height; + templ.depth0 = 1; + templ.array_size = 1; + buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen, +); + if (!buffer->texture) + goto unmap_shm; + + templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW | + PIPE_BIND_SCANOUT | PIPE_BIND_SHARED | PIPE_BIND_LINEAR; + buffer->linear_buffer = scrn->base.pscreen->resource_create(scrn->base.pscreen, + ); You have to remember to free it later. + pixmap_buffer = buffer->linear_buffer; + + if (!buffer->linear_buffer) + goto no_linear_buffer; + + } else { + templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW | + PIPE_BIND_SCANOUT | PIPE_BIND_SHARED; + templ.format = PIPE_FORMAT_B8G8R8X8_UNORM; + templ.target = PIPE_TEXTURE_2D; + templ.last_level = 0; + templ.width0 = scrn->width; + templ.height0 = scrn->height; + templ.depth0 = 1; + templ.array_size = 1; + buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen, +); + if (!buffer->texture) + goto unmap_shm; + pixmap_buffer = buffer->texture; + } memset(, 0, sizeof(whandle)); whandle.type= DRM_API_HANDLE_TYPE_FD; usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ; scrn->base.pscreen->resource_get_handle(scrn->base.pscreen, NULL, - buffer->texture, , + pixmap_buffer, , usage); buffer_fd = whandle.handle; buffer->pitch = whandle.stride; @@ -271,6 +299,8 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn) return buffer; +no_linear_buffer: + pipe_resource_reference(>texture, NULL); unmap_shm: xshmfence_unmap_shm(shm_fence); close_fd: @@ -474,6 +504,7 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen, struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)context_private; uint32_t options = XCB_PRESENT_OPTION_NONE; struct vl_dri3_buffer *back; + struct pipe_box src_box; back = scrn->back_buffers[scrn->cur_back]; if (!back) @@ -485,6 +516,17 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
Re: [Mesa-dev] [PATCH 1/7] EGL: Update eglext.h
Would it be better to use the new upstream version of eglext.h and add the #includes, or to keep Mesa's existing eglext.h and add the missing declarations for EGL_KHR_debug? -Kyle On 09/09/2016 03:31 AM, Emil Velikov wrote: On 8 September 2016 at 18:46, Adam Jacksonwrote: From: Kyle Brenneman Updated eglext.h to revision 32074 from the Khronos repository. Added two #includes to egltypedefs.h. Both were in the previous version of eglext.h but not in the new one. -#include -#include - AFAICT neither of these was in the upstream eglext.h, ever. Then again we must upstreaming/standardise the remaining extensions, since removing this hunk will break applications. Most/all companies working on mesa are members of Khronos so it shouldn't be that hard to get things sorted :-) NACK on this hunk - please keep it as-is for the time being. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v3 4/6] i965/rbc: Consult rb settings for texture surface setup
On Sep 8, 2016 11:13 PM, "Pohjolainen, Topi"wrote: > > On Thu, Sep 08, 2016 at 08:49:56AM -0700, Jason Ekstrand wrote: > >On Sep 7, 2016 9:30 PM, "Pohjolainen, Topi" > ><[1]topi.pohjolai...@gmail.com> wrote: > >> > >> On Wed, Sep 07, 2016 at 03:25:30PM -0700, Jason Ekstrand wrote: > >> >On Sep 7, 2016 10:24 AM, "Topi Pohjolainen" > >> ><[1][2]topi.pohjolai...@gmail.com> wrote: > >> >> > >> >> Once mcs buffer gets allocated without delay for lossless > >> >> compression (same as we do for msaa), one gets regression in: > >> >> > >> >> GL45-CTS.texture_barrier_ARB.same-texel-rw > >> >> > >> >> Setting the auxiliary surface for both sampling engine and > >data > >> >> port seems to fix this. I haven't found any hardware > >documentation > >> >> backing this though. > >> >> > >> >> v2 (Jason): Prepare also for the case where surface is sampled > >with > >> >> non-compressible format forcing also rendering > >without > >> >> compression. > >> >> v3: Split asserts and decision making. > >> >> > >> >> Signed-off-by: Topi Pohjolainen > ><[2][3]topi.pohjolai...@intel.com> > >> >> --- > >> >> src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 63 > >> >+--- > >> >> 1 file changed, 56 insertions(+), 7 deletions(-) > >> >> > >> >> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c > >> >b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c > >> >> index c1273c5..054c5c8 100644 > >> >> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c > >> >> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c > >> >> @@ -140,9 +140,7 @@ brw_emit_surface_state(struct brw_context > >*brw, > >> >> struct isl_surf *aux_surf = NULL, aux_surf_s; > >> >> uint64_t aux_offset = 0; > >> >> enum isl_aux_usage aux_usage = ISL_AUX_USAGE_NONE; > >> >> - if (mt->mcs_mt && > >> >> - ((view.usage & ISL_SURF_USAGE_RENDER_TARGET_BIT) || > >> >> -mt->fast_clear_state != > >INTEL_FAST_CLEAR_STATE_RESOLVED)) { > >> >> + if (mt->mcs_mt && !(flags & INTEL_AUX_BUFFER_DISABLED)) { > >> >>intel_miptree_get_aux_isl_surf(brw, mt, _surf_s, > >> >_usage); > >> >>aux_surf = _surf_s; > >> >>assert(mt->mcs_mt->offset == 0); > >> >> @@ -425,6 +423,54 @@ swizzle_to_scs(GLenum swizzle, bool > >> >need_green_to_blue) > >> >> return (need_green_to_blue && scs == HSW_SCS_GREEN) ? > >> >HSW_SCS_BLUE : scs; > >> >> } > >> >> > >> >> +static unsigned > >> >> +brw_find_matching_rb(const struct gl_framebuffer *fb, > >> >> + const struct intel_mipmap_tree *mt) > >> >> +{ > >> >> + for (unsigned i = 0; i < fb->_NumColorDrawBuffers; i++) { > >> >> + const struct intel_renderbuffer *irb = > >> >> + intel_renderbuffer(fb->_ColorDrawBuffers[i]); > >> >> + > >> >> + if (irb->mt == mt) > >> >> + return i; > >> >> + } > >> >> + > >> >> + return fb->_NumColorDrawBuffers; > >> >> +} > >> >> + > >> >> +static bool > >> >> +brw_disable_aux_surface(const struct brw_context *brw, > >> >> +const struct intel_mipmap_tree *mt) > >> >> +{ > >> >> + /* Nothing to disable. */ > >> >> + if (!mt->mcs_mt) > >> >> + return false; > >> >> + > >> >> + /* There are special cases only for lossless compression. > >*/ > >> >> + if (!intel_miptree_is_lossless_compressed(brw, mt)) > >> >> + return mt->fast_clear_state == > >> >INTEL_FAST_CLEAR_STATE_RESOLVED; > >> >> + > >> >> + const struct gl_framebuffer *fb = brw->ctx.DrawBuffer; > >> >> + const unsigned rb_index = brw_find_matching_rb(fb, mt); > >> >> + > >> >> + /* In practise it looks that setting the same lossless > >compressed > >> >surface > >> >> +* to be sampled without auxiliary surface and to be > >written with > >> >auxiliary > >> >> +* surface confuses the hardware. Therefore sampler engine > >must > >> >be provided > >> >> +* with auxiliary buffer regardless of the fast clear > >state if > >> >the same > >> >> +* surface is also going to be written during the same > >rendering > >> >pass with > >> >> +* auxiliary buffer enabled. > >> >> +*/ > >> >> + if (rb_index < fb->_NumColorDrawBuffers) { > >>
Re: [Mesa-dev] [PATCH v2] vl/dri3: handle the case of different GPU
On Fri, Sep 9, 2016 at 7:03 PM, Emil Velikovwrote: > Hi Nayan, > > Just a couple of fly-by comments. As always don't read too much into them. > > On 9 September 2016 at 14:17, Nayan Deshmukh > wrote: > > use a linear buffer in case of back buffer > > > You might want to mention a bit more about the implementation and/or > why doing things like X won't work/is bad idea. > > Yeah I will add more detail to the commit message. > > + if (scrn->is_different_gpu) { > > + templ.format = PIPE_FORMAT_B8G8R8X8_UNORM; > > + templ.target = PIPE_TEXTURE_2D; > > + templ.last_level = 0; > > + templ.width0 = scrn->width; > > + templ.height0 = scrn->height; > > + templ.depth0 = 1; > > + templ.array_size = 1; > bind == 0 here, that doesn't seem right ? > > I was trying to replicate the code used in loader_dri3_helper.c. I was not sure which bind flags should be used, I was hoping that michel can have a look and suggest the necessary changes. I tested it on my hardware and it was working fine. On second thought it should have a PIPE_BIND_RENDER_TARGET flag. > + buffer->texture = scrn->base.pscreen->resource_ > create(scrn->base.pscreen, > > +); > > + if (!buffer->texture) > > + goto unmap_shm; > > + > With the bind fixed the above hunk is common so can be kept prior to > the conditional, no ? > > With bind flag fixed this common code could be taken out of the conditional. > > > @@ -485,6 +516,17 @@ vl_dri3_flush_frontbuffer(struct pipe_screen > *screen, > > return; > > } > > > > + if (scrn->is_different_gpu) { > > + u_box_origin_2d(scrn->width, > > + scrn->height, > > + _box); > > + scrn->pipe->resource_copy_region(scrn->pipe, > > + back->linear_buffer, > > + 0, 0, 0, 0, > > + back->texture, > > + 0, _box); > > + > Please try to use the full width of the line. > > The entire file uses this kind of wraping so I was trying to follow the style. It would be better if entire file has the same kind of wraping. Thanks for the review. I'll make the necessary changes in the next version once michel has a look and approves it. > Regards, > Emil > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] tgsi/scan: don't set interp flags for inputs only used by INTERP instructions
From: Marek Olšákradeonsi depends on the interp flags a little bit too much. This fixes 9 randomly failing tests: GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.* --- src/gallium/auxiliary/tgsi/tgsi_scan.c | 105 ++--- 1 file changed, 57 insertions(+), 48 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c b/src/gallium/auxiliary/tgsi/tgsi_scan.c index a3b0d9f..39e4dd9 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_scan.c +++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c @@ -95,20 +95,21 @@ computes_derivative(unsigned opcode) } static void scan_instruction(struct tgsi_shader_info *info, const struct tgsi_full_instruction *fullinst, unsigned *current_depth) { unsigned i; bool is_mem_inst = false; + bool is_interp_instruction = false; assert(fullinst->Instruction.Opcode < TGSI_OPCODE_LAST); info->opcode_count[fullinst->Instruction.Opcode]++; switch (fullinst->Instruction.Opcode) { case TGSI_OPCODE_IF: case TGSI_OPCODE_UIF: case TGSI_OPCODE_BGNLOOP: (*current_depth)++; info->max_depth = MAX2(info->max_depth, *current_depth); @@ -120,20 +121,22 @@ scan_instruction(struct tgsi_shader_info *info, default: break; } if (fullinst->Instruction.Opcode == TGSI_OPCODE_INTERP_CENTROID || fullinst->Instruction.Opcode == TGSI_OPCODE_INTERP_OFFSET || fullinst->Instruction.Opcode == TGSI_OPCODE_INTERP_SAMPLE) { const struct tgsi_full_src_register *src0 = >Src[0]; unsigned input; + is_interp_instruction = true; + if (src0->Register.Indirect && src0->Indirect.ArrayID) input = info->input_array_first[src0->Indirect.ArrayID]; else input = src0->Register.Index; /* For the INTERP opcodes, the interpolation is always * PERSPECTIVE unless LINEAR is specified. */ switch (info->input_interpolate[input]) { case TGSI_INTERPOLATE_COLOR: @@ -183,43 +186,91 @@ scan_instruction(struct tgsi_shader_info *info, if (src->Register.Indirect) { for (ind = 0; ind < info->num_inputs; ++ind) { info->input_usage_mask[ind] |= usage_mask; } } else { assert(ind >= 0); assert(ind < PIPE_MAX_SHADER_INPUTS); info->input_usage_mask[ind] |= usage_mask; } - if (info->processor == PIPE_SHADER_FRAGMENT && - !src->Register.Indirect) { -unsigned name = - info->input_semantic_name[src->Register.Index]; -unsigned index = - info->input_semantic_index[src->Register.Index]; + if (info->processor == PIPE_SHADER_FRAGMENT) { +unsigned name, index, input; + +if (src->Register.Indirect && src->Indirect.ArrayID) + input = info->input_array_first[src->Indirect.ArrayID]; +else + input = src->Register.Index; + +name = info->input_semantic_name[input]; +index = info->input_semantic_index[input]; if (name == TGSI_SEMANTIC_POSITION && (src->Register.SwizzleX == TGSI_SWIZZLE_Z || src->Register.SwizzleY == TGSI_SWIZZLE_Z || src->Register.SwizzleZ == TGSI_SWIZZLE_Z || src->Register.SwizzleW == TGSI_SWIZZLE_Z)) info->reads_z = TRUE; if (name == TGSI_SEMANTIC_COLOR) { unsigned mask = (1 << src->Register.SwizzleX) | (1 << src->Register.SwizzleY) | (1 << src->Register.SwizzleZ) | (1 << src->Register.SwizzleW); info->colors_read |= mask << (index * 4); } + +/* Process only interpolated varyings. Don't include POSITION. + * Don't include integer varyings, because they are not + * interpolated. Don't process inputs interpolated by INTERP + * opcodes. Those are tracked separately. + */ +if ((!is_interp_instruction || i != 0) && +(name == TGSI_SEMANTIC_GENERIC || + name == TGSI_SEMANTIC_TEXCOORD || + name == TGSI_SEMANTIC_COLOR || + name == TGSI_SEMANTIC_BCOLOR || + name == TGSI_SEMANTIC_FOG || + name == TGSI_SEMANTIC_CLIPDIST)) { + switch (info->input_interpolate[index]) { + case TGSI_INTERPOLATE_COLOR: + case TGSI_INTERPOLATE_PERSPECTIVE: + switch (info->input_interpolate_loc[index]) { + case TGSI_INTERPOLATE_LOC_CENTER: + info->uses_persp_center = TRUE; + break; + case TGSI_INTERPOLATE_LOC_CENTROID: + info->uses_persp_centroid = TRUE; +
[Mesa-dev] [PATCH] mesa: fix glGetFramebufferAttachmentParameteriv w/ on-demand FRONT_BACK alloc
From: Marek OlšákThis fixes 66 CTS tests on st/mesa. Cc: 12.0 --- src/mesa/main/fbobject.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c index 2c01526..09da6b7 100644 --- a/src/mesa/main/fbobject.c +++ b/src/mesa/main/fbobject.c @@ -296,23 +296,35 @@ _mesa_get_fb0_attachment(struct gl_context *ctx, struct gl_framebuffer *fb, return >Attachment[BUFFER_FRONT_LEFT]; case GL_DEPTH: return >Attachment[BUFFER_DEPTH]; case GL_STENCIL: return >Attachment[BUFFER_STENCIL]; } } switch (attachment) { case GL_FRONT_LEFT: - return >Attachment[BUFFER_FRONT_LEFT]; + /* Front buffers can be allocated on the first use, but + * glGetFramebufferAttachmentParameteriv must work even if that + * allocation hasn't happened yet. In such case, use the back buffer, + * which should be the same. + */ + if (fb->Attachment[BUFFER_FRONT_LEFT].Type == GL_NONE) + return >Attachment[BUFFER_BACK_LEFT]; + else + return >Attachment[BUFFER_FRONT_LEFT]; case GL_FRONT_RIGHT: - return >Attachment[BUFFER_FRONT_RIGHT]; + /* Same as above. */ + if (fb->Attachment[BUFFER_FRONT_RIGHT].Type == GL_NONE) + return >Attachment[BUFFER_BACK_RIGHT]; + else + return >Attachment[BUFFER_FRONT_RIGHT]; case GL_BACK_LEFT: return >Attachment[BUFFER_BACK_LEFT]; case GL_BACK_RIGHT: return >Attachment[BUFFER_BACK_RIGHT]; case GL_AUX0: if (fb->Visual.numAuxBuffers == 1) { return >Attachment[BUFFER_AUX0]; } return NULL; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: remove interpolateAt* instructions for demoted inputs
From: Marek OlšákThis fixes 8 fs-interpolateat* piglit crashes on radeonsi, because it can't handle non-input operands in interpolateAt*. --- src/compiler/glsl/link_varyings.cpp | 5 + src/compiler/glsl/opt_algebraic.cpp | 10 ++ 2 files changed, 15 insertions(+) diff --git a/src/compiler/glsl/link_varyings.cpp b/src/compiler/glsl/link_varyings.cpp index 1fad310..85ca044 100644 --- a/src/compiler/glsl/link_varyings.cpp +++ b/src/compiler/glsl/link_varyings.cpp @@ -586,20 +586,25 @@ remove_unused_shader_inputs_and_outputs(bool is_separate_shader_object, if (var == NULL || var->data.mode != int(mode)) continue; /* A shader 'in' or 'out' variable is only really an input or output if * its value is used by other shader stages. This will cause the * variable to have a location assigned. */ if (var->data.is_unmatched_generic_inout && !var->data.is_xfb_only) { assert(var->data.mode != ir_var_temporary); + + /* Assign zeros to demoted inputs to allow more optimizations. */ + if (var->data.mode == ir_var_shader_in && !var->constant_value) +var->constant_value = ir_constant::zero(var, var->type); + var->data.mode = ir_var_auto; } } /* Eliminate code that is now dead due to unused inputs/outputs being * demoted. */ while (do_dead_code(sh->ir, false)) ; diff --git a/src/compiler/glsl/opt_algebraic.cpp b/src/compiler/glsl/opt_algebraic.cpp index f5858c8..2829a78 100644 --- a/src/compiler/glsl/opt_algebraic.cpp +++ b/src/compiler/glsl/opt_algebraic.cpp @@ -954,20 +954,30 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) } break; case ir_triop_csel: if (is_vec_one(op_const[0])) return ir->operands[1]; if (is_vec_zero(op_const[0])) return ir->operands[2]; break; + /* Remove interpolateAt* instructions for demoted inputs. They are +* assigned a constant expression to facilitate this. +*/ + case ir_unop_interpolate_at_centroid: + case ir_binop_interpolate_at_offset: + case ir_binop_interpolate_at_sample: + if (op_const[0]) + return ir->operands[0]; + break; + default: break; } return ir; } void ir_algebraic_visitor::handle_rvalue(ir_rvalue **rvalue) { -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/11] st/mesa: expose ARB_compute_variable_group_size
For patches 8, 9: Reviewed-by: Marek OlšákPatch 10 won't work for us, because radeonsi (and presumably softpipe as well) don't support this feature at the moment. Also, I would prefer a PIPE_CAP for MaxComputeVariableGroupInvocations and the extension can be exposed based on that CAP. Marek On Thu, Sep 8, 2016 at 10:31 PM, Samuel Pitoiset wrote: > This extension is only exposed if the underlying driver supports > ARB_compute_shader. > > Signed-off-by: Samuel Pitoiset > --- > src/mesa/state_tracker/st_extensions.c | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/src/mesa/state_tracker/st_extensions.c > b/src/mesa/state_tracker/st_extensions.c > index 807fbfb..dc2e60a 100644 > --- a/src/mesa/state_tracker/st_extensions.c > +++ b/src/mesa/state_tracker/st_extensions.c > @@ -1196,6 +1196,19 @@ void st_init_extensions(struct pipe_screen *screen, > extensions->ARB_compute_shader = > > extensions->ARB_shader_image_load_store && >extensions->ARB_shader_atomic_counters; > + > + if (extensions->ARB_compute_shader) { > +/* Because the minimum values required by > + * ARB_compute_variable_group_size are less (or equal) than the > + * ones defined by ARB_compute_shader we can re-use them. */ > +for (i = 0; i < 3; i++) { > + consts->MaxComputeVariableGroupSize[i] = > + consts->MaxComputeWorkGroupSize[i]; > +} > +consts->MaxComputeVariableGroupInvocations = > + consts->MaxComputeWorkGroupInvocations; > +extensions->ARB_compute_variable_group_size = true; > + } >} > } > > -- > 2.9.3 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] vl/dri3: handle the case of different GPU
Hi Nayan, Just a couple of fly-by comments. As always don't read too much into them. On 9 September 2016 at 14:17, Nayan Deshmukhwrote: > use a linear buffer in case of back buffer > You might want to mention a bit more about the implementation and/or why doing things like X won't work/is bad idea. > + if (scrn->is_different_gpu) { > + templ.format = PIPE_FORMAT_B8G8R8X8_UNORM; > + templ.target = PIPE_TEXTURE_2D; > + templ.last_level = 0; > + templ.width0 = scrn->width; > + templ.height0 = scrn->height; > + templ.depth0 = 1; > + templ.array_size = 1; bind == 0 here, that doesn't seem right ? > + buffer->texture = > scrn->base.pscreen->resource_create(scrn->base.pscreen, > +); > + if (!buffer->texture) > + goto unmap_shm; > + With the bind fixed the above hunk is common so can be kept prior to the conditional, no ? > @@ -485,6 +516,17 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen, > return; > } > > + if (scrn->is_different_gpu) { > + u_box_origin_2d(scrn->width, > + scrn->height, > + _box); > + scrn->pipe->resource_copy_region(scrn->pipe, > + back->linear_buffer, > + 0, 0, 0, 0, > + back->texture, > + 0, _box); > + Please try to use the full width of the line. Regards, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] vl/dri3: handle the case of different GPU
use a linear buffer in case of back buffer v2: Use a seprate linear buffer for each back buffer Signed-off-by: Nayan Deshmukh--- src/gallium/auxiliary/vl/vl_winsys_dri3.c | 82 --- 1 file changed, 63 insertions(+), 19 deletions(-) diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c b/src/gallium/auxiliary/vl/vl_winsys_dri3.c index 3d596a6..e168e36 100644 --- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c +++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c @@ -49,6 +49,7 @@ struct vl_dri3_buffer { struct pipe_resource *texture; + struct pipe_resource *linear_buffer; uint32_t pixmap; uint32_t sync_fence; @@ -69,6 +70,8 @@ struct vl_dri3_screen xcb_present_event_t eid; xcb_special_event_t *special_event; + struct pipe_context *pipe; + struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM]; int cur_back; @@ -82,6 +85,7 @@ struct vl_dri3_screen int64_t last_ust, ns_frame, last_msc, next_msc; bool flushed; + bool is_different_gpu; }; static void @@ -209,7 +213,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn) xcb_sync_fence_t sync_fence; struct xshmfence *shm_fence; int buffer_fd, fence_fd; - struct pipe_resource templ; + struct pipe_resource templ, *pixmap_buffer; struct winsys_handle whandle; unsigned usage; @@ -226,25 +230,49 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn) goto close_fd; memset(, 0, sizeof(templ)); - templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW | -PIPE_BIND_SCANOUT | PIPE_BIND_SHARED; - templ.format = PIPE_FORMAT_B8G8R8X8_UNORM; - templ.target = PIPE_TEXTURE_2D; - templ.last_level = 0; - templ.width0 = scrn->width; - templ.height0 = scrn->height; - templ.depth0 = 1; - templ.array_size = 1; - buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen, - ); - if (!buffer->texture) - goto unmap_shm; - + if (scrn->is_different_gpu) { + templ.format = PIPE_FORMAT_B8G8R8X8_UNORM; + templ.target = PIPE_TEXTURE_2D; + templ.last_level = 0; + templ.width0 = scrn->width; + templ.height0 = scrn->height; + templ.depth0 = 1; + templ.array_size = 1; + buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen, +); + if (!buffer->texture) + goto unmap_shm; + + templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW | + PIPE_BIND_SCANOUT | PIPE_BIND_SHARED | PIPE_BIND_LINEAR; + buffer->linear_buffer = scrn->base.pscreen->resource_create(scrn->base.pscreen, + ); + pixmap_buffer = buffer->linear_buffer; + + if (!buffer->linear_buffer) + goto no_linear_buffer; + + } else { + templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW | + PIPE_BIND_SCANOUT | PIPE_BIND_SHARED; + templ.format = PIPE_FORMAT_B8G8R8X8_UNORM; + templ.target = PIPE_TEXTURE_2D; + templ.last_level = 0; + templ.width0 = scrn->width; + templ.height0 = scrn->height; + templ.depth0 = 1; + templ.array_size = 1; + buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen, +); + if (!buffer->texture) + goto unmap_shm; + pixmap_buffer = buffer->texture; + } memset(, 0, sizeof(whandle)); whandle.type= DRM_API_HANDLE_TYPE_FD; usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ; scrn->base.pscreen->resource_get_handle(scrn->base.pscreen, NULL, - buffer->texture, , + pixmap_buffer, , usage); buffer_fd = whandle.handle; buffer->pitch = whandle.stride; @@ -271,6 +299,8 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn) return buffer; +no_linear_buffer: + pipe_resource_reference(>texture, NULL); unmap_shm: xshmfence_unmap_shm(shm_fence); close_fd: @@ -474,6 +504,7 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen, struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)context_private; uint32_t options = XCB_PRESENT_OPTION_NONE; struct vl_dri3_buffer *back; + struct pipe_box src_box; back = scrn->back_buffers[scrn->cur_back]; if (!back) @@ -485,6 +516,17 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen, return; } + if (scrn->is_different_gpu) { + u_box_origin_2d(scrn->width, + scrn->height, + _box); + scrn->pipe->resource_copy_region(scrn->pipe, + back->linear_buffer, + 0, 0, 0, 0, +
Re: [Mesa-dev] [PATCH] Disable the code that allocates W|X memory on OpenBSD
On 8 September 2016 at 19:46, Mark Ketteniswrote: >> From: Emil Velikov >> Date: Thu, 8 Sep 2016 18:57:44 +0100 >> >> On 1 September 2016 at 18:23, Jonathan Gray wrote: >> > OpenBSD now has strict W^X enforcement. Processes that violate >> > the policy get killed by the kernel. Don't attempt to use >> > executable memory on OpenBSD to avoid this. >> > >> > Patch from Mark Kettenis. > > Jonathan, Emil, > > That diff should probably not land "upstream". We're still tweaking > the W^X policy, trying to find the code in ports that violates W^X. > We recently changed things around such that > > mmap(..., PROT_WRITE | PROT_EXEC) > > will simply fail instead of aborting the process. Since Mesa has > fallback code for that case, the changes I made (at least most of > them) are no longer necessary. But I didn't revert them yet in order > to reduce the amount of false positives. > Fwiw I'm for upstreaming things where they don't seem too hacky. When they do I try to elaborate why and how they could be addressed. Here... thinking about the mmap solution (workaround?) I'm wondering if it isn't the better option. Using it one won't need to patch X projects and at the same time everyone should handle mmap() failure. If they don't then that's a bug which should be fixed regardless of anything else. >> > --- a/src/gallium/auxiliary/rtasm/rtasm_execmem.c >> > +++ b/src/gallium/auxiliary/rtasm/rtasm_execmem.c >> > @@ -69,6 +69,16 @@ static struct mem_block *exec_heap = NULL; >> > static unsigned char *exec_mem = NULL; >> > >> > >> > +#ifdef __OpenBSD__ >> > + >> > +static int >> > +init_heap(void) >> > +{ >> > + return 0; >> > +} >> Afaict this is equivalent to using the #else path in translate_see.c. >> In general I'm wondering if we can/should not have a configure toggle >> for this. Then again please look below. > > Right. We basically prefer the slow code paths over the unsecure > codepaths. There are ways around W^X enforcement though. One > possibility is to generate the code in pages that are writable, but > not executable, and then flip the permissions to make the page > read-only but executable. Other techniques include using aliased > pages where one mapping is writable and the other mapping is > executable. Although we don't really want to encourage using aliasing > since it is a bit of a cheat. > >> > --- a/src/mapi/u_execmem.c >> > +++ b/src/mapi/u_execmem.c >> > @@ -45,8 +45,15 @@ static unsigned int head = 0; >> > >> > static unsigned char *exec_mem = (unsigned char *)0; >> > >> > +#if defined(__OpenBSD__) >> > >> > -#if defined(__linux__) || defined(__OpenBSD__) || defined(_NetBSD__) || >> > defined(__sun) || defined(__HAIKU__) >> > +static int >> > +init_map(void) >> > +{ >> > + return 0; >> > +} >> > + >> And this one to --disable-glx-tls and/or --disable-asm. Which reminds >> me of - have you guys tried enabling either/both of them. Has there >> been (m)any issues ? > > Pilip Guenther is working on enabling TLS support in OpenBSD. He is > getting close, but he was grumbling that Mesa was uncovering some > toolchain bugs. > Ack, looking forward to it. Hope that if there's problems on mesa side, he'll shout early as he finds them. >> For a long while the intent has been to use --enable-glx-tls by >> default and kill off the other codepaths. But with the write xor >> execute policy, it's going to be (close to) impossible. > > Not entirely sure what TLS has to do with W^X enforcement. > Things are slightly overloaded, but the gist is that the TLS/ASM codepaths constructs a jump table(?) which is used instead of calls into the respective GL functions. Linux has been explicitly using only the TLS codepath for 3+ years (if not a lot more), with the only obstacle for keeping the other two codepaths being BSD and maybe Solaris. To make it even better the code (all three codepaths) has close to zero fixes/updates since it was introduced and is quite dense. >> Have you guys considered a way to disable the restriction for usecases >> that need the behaviour ? > > "no comment", erh, yes, there is a way. But not for what we consider > the base of our OS. Which includes X. Mapping pages to be both > writable and executable is really bad from a security perspective. > And the goal is to eliminate that practice completely from the > ecosystem. Yes even though I'm not a security person, I'm aware that using W|X is a bad idea. The key point imho is that a) there must always be a contingency plan, which involves toggling off and/or other methods that you've mentioned and b) considering the limited amount of devs/interest in the area it will take a while to untangle things. > Please note that we're not alone here. iOS for example > does not allow you to create mappings like that either. > I see your point about iOS and/or other platforms using such policy and I'm not against it. Merely pointing out be lots of divergent,
Re: [Mesa-dev] [PATCH 7/7] egl: Fix DebugMessageControl(callback, NULL)
On 8 September 2016 at 18:47, Adam Jacksonwrote: > Treat a null attribute list as meaning "don't change attributes". This > is semantically equivalent to a list consisting of just EGL_NONE. > Same as before - please don't introduce known buggy code with one patch only to fix it with a latter one in the series. git commit --fixup && git rebase --autosquash is our friend. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/7] EGL: Fix some command names for EGL_KHR_debug
On 8 September 2016 at 18:47, Adam Jacksonwrote: > From: Kyle Brenneman > > Change a few EGL entrypoints to call a common internal function instead > of forwarding to another entrypoint. > > If one EGL entrypoint calls another, then the second entrypoint would > overwrite the current function name in the _EGLThreadInfo struct. That > would cause it to pass the wrong function name to the EGL_KHR_debug > callback. > This should _really_ be squashed with the respective patches. Ideally with the minor refactoring split out as separate patch(es). Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: support TGSI compute shaders with variable block size
On Fri, Sep 9, 2016 at 8:29 AM, Marek Olšákwrote: > On Fri, Sep 9, 2016 at 10:12 AM, Nicolai Hähnle wrote: >> From: Nicolai Hähnle >> >> Not sure if it's possible to avoid programming the block size twice (once for >> the userdata and once for the dispatch). >> >> Since the shaders are compiled with a pessimistic upper limit on the number >> of >> registers, asynchronously compiling variants may be worth considering in the >> future if we observe the shaders to be dispatched with small block sizes. >> --- >> I think this is sufficient to support variable group sizes on radeonsi, but >> it's completely untested. Do you keep the latest version of your series in a >> public repository somewhere? >> >> src/gallium/drivers/radeonsi/si_compute.c | 10 +- >> src/gallium/drivers/radeonsi/si_shader.c | 29 - >> src/gallium/drivers/radeonsi/si_shader.h | 4 +++- >> 3 files changed, 32 insertions(+), 11 deletions(-) >> >> diff --git a/src/gallium/drivers/radeonsi/si_compute.c >> b/src/gallium/drivers/radeonsi/si_compute.c >> index 5041761..26e096c 100644 >> --- a/src/gallium/drivers/radeonsi/si_compute.c >> +++ b/src/gallium/drivers/radeonsi/si_compute.c >> @@ -379,25 +379,33 @@ static void si_setup_tgsi_grid(struct si_context *sctx, >> for (i = 0; i < 3; ++i) { >> radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0)); >> radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) | >> COPY_DATA_DST_SEL(COPY_DATA_REG)); >> radeon_emit(cs, (va + 4 * i)); >> radeon_emit(cs, (va + 4 * i) >> 32); >> radeon_emit(cs, (grid_size_reg >> 2) + i); >> radeon_emit(cs, 0); >> } >> } else { >> + struct si_compute *program = sctx->cs_shader_state.program; >> + bool variable_group_size = >> + >> program->shader.selector->info.properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] >> == 0; >> >> - radeon_set_sh_reg_seq(cs, grid_size_reg, 3); >> + radeon_set_sh_reg_seq(cs, grid_size_reg, variable_group_size >> ? 6 : 3); >> radeon_emit(cs, info->grid[0]); >> radeon_emit(cs, info->grid[1]); >> radeon_emit(cs, info->grid[2]); >> + if (variable_group_size) { >> + radeon_emit(cs, info->block[0]); >> + radeon_emit(cs, info->block[1]); >> + radeon_emit(cs, info->block[2]); >> + } >> } >> } >> >> static void si_emit_dispatch_packets(struct si_context *sctx, >> const struct pipe_grid_info *info) >> { >> struct radeon_winsys_cs *cs = sctx->b.gfx.cs; >> bool render_cond_bit = sctx->b.render_cond && >> !sctx->b.render_cond_force_off; >> unsigned waves_per_threadgroup = >> DIV_ROUND_UP(info->block[0] * info->block[1] * >> info->block[2], 64); >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c >> b/src/gallium/drivers/radeonsi/si_shader.c >> index 0b7de18..730ee21 100644 >> --- a/src/gallium/drivers/radeonsi/si_shader.c >> +++ b/src/gallium/drivers/radeonsi/si_shader.c >> @@ -1783,30 +1783,35 @@ static void declare_system_value( >> >> case TGSI_SEMANTIC_GRID_SIZE: >> value = LLVMGetParam(radeon_bld->main_fn, >> SI_PARAM_GRID_SIZE); >> break; >> >> case TGSI_SEMANTIC_BLOCK_SIZE: >> { >> LLVMValueRef values[3]; >> unsigned i; >> unsigned *properties = >> ctx->shader->selector->info.properties; >> - unsigned sizes[3] = { >> - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH], >> - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT], >> - properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH] >> - }; >> >> - for (i = 0; i < 3; ++i) >> - values[i] = lp_build_const_int32(gallivm, sizes[i]); >> + if (properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] != 0) { >> + unsigned sizes[3] = { >> + >> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH], >> + >> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT], >> + >> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH] >> + }; >> + >> + for (i = 0; i < 3; ++i) >> + values[i] = lp_build_const_int32(gallivm, >> sizes[i]); >> >> - value = lp_build_gather_values(gallivm, values, 3); >> + value = lp_build_gather_values(gallivm, values, 3); >> + } else {
Re: [Mesa-dev] [PATCH v2 2/3] egl: return corresponding offset of EGLImage instead of 0.
On 9 September 2016 at 08:58, Chuanbo Wengwrote: > The offset should not always be 0. For example, if EGLImage is > created from a 2D texture with EGL_GL_TEXTURE_LEVEL=1, then the > offset should be the actual start of miplevel 1 in bo. > > v2: Add version check of __DRIimageExtension implementation > (Suggested by Axel Davy). > Just to elaborate what Axel was saying from another perspective: With current master the offset is explicitly zeroed, while with the (v2) patch you'll _only_ set the value when you have v13 driver. Thus using the loader + v12 driver you'll get a regression since a) the offset will remain garbage and b) the function (dri2_export_dma_buf_image_mesa) will fail. > Signed-off-by: Chuanbo Weng > --- > src/egl/drivers/dri2/egl_dri2.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c > index 859612f..c529e55 100644 > --- a/src/egl/drivers/dri2/egl_dri2.c > +++ b/src/egl/drivers/dri2/egl_dri2.c > @@ -2249,6 +2249,7 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, > _EGLDisplay *disp, _EGLImage *im > struct dri2_egl_image *dri2_img = dri2_egl_image(img); > > (void) drv; > + EGLint img_offset = 0; > > /* rework later to provide multiple fds/strides/offsets */ > if (fds) > @@ -2259,8 +2260,14 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, > _EGLDisplay *disp, _EGLImage *im >dri2_dpy->image->queryImage(dri2_img->dri_image, > __DRI_IMAGE_ATTRIB_STRIDE, strides); > > - if (offsets) > + if (offsets){ >offsets[0] = 0; > + if(dri2_dpy->image->base.version >= 13){ Please move the variable declaration in local scope and add space between ){ Functionality wise we don't need the version check, esp. since you want to set the offset only when queryImage() succeeds... > + dri2_dpy->image->queryImage(dri2_img->dri_image, > + __DRI_IMAGE_ATTRIB_OFFSET, _offset); ... which you don't seem to be checking here, so you'll effectively store/use garbage. So unless Alex feels strongly for the version check I'd omit it, and fix the rest of this patch. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev