On Tue, 4 Dec 2018 at 21:57, Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl> wrote:
> On Tue, Dec 4, 2018 at 4:52 PM Samuel Pitoiset > <samuel.pitoi...@gmail.com> wrote: > > > > Because WAIT_REG_MEM can only wait for a 32-bit value, it's not > > safe to use it for timestamp queries. If we only wait on the low > > 32 bits of a timestamp query we could be unlucky and the GPU > > might hang. > > > > One possible fix is to emit a full end of pipe event and wait > > on a 32-bit value which is actually an availability bit. This > > bit is allocated at creation time and always cleared before > > emitting the EOP event. > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108925 > > Fixes: 5d6a560a29 ("radv: do not use the availability bit for timestamp > queries") > > Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com> > > --- > > src/amd/vulkan/radv_query.c | 49 +++++++++++++++++++++++++++++++------ > > 1 file changed, 41 insertions(+), 8 deletions(-) > > > > diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c > > index 550abe307a1..9bb6b660add 100644 > > --- a/src/amd/vulkan/radv_query.c > > +++ b/src/amd/vulkan/radv_query.c > > @@ -1056,8 +1056,15 @@ VkResult radv_CreateQueryPool( > > pool->pipeline_stats_mask = pCreateInfo->pipelineStatistics; > > pool->availability_offset = pool->stride * > pCreateInfo->queryCount; > > pool->size = pool->availability_offset; > > - if (pCreateInfo->queryType == VK_QUERY_TYPE_PIPELINE_STATISTICS) > > + if (pCreateInfo->queryType == VK_QUERY_TYPE_PIPELINE_STATISTICS) > { > > pool->size += 4 * pCreateInfo->queryCount; > > + } else if (pCreateInfo->queryType == VK_QUERY_TYPE_TIMESTAMP) { > > + /* Allocate one DWORD for the availability bit which is > needed > > + * for vkCmdCopyQueryPoolResults() because we can't > perform a > > + * WAIT_REG_MEM on a 64-bit value. > > + */ > > + pool->size += 4; > > + } > > > > pool->bo = device->ws->buffer_create(device->ws, pool->size, > > 64, RADEON_DOMAIN_GTT, > RADEON_FLAG_NO_INTERPROCESS_SHARING); > > @@ -1328,19 +1335,45 @@ void radv_CmdCopyQueryPoolResults( > > pool->availability_offset + 4 * > firstQuery); > > break; > > case VK_QUERY_TYPE_TIMESTAMP: > > + if (flags & VK_QUERY_RESULT_WAIT_BIT) { > > + /* Emit a full end of pipe event because we can't > > + * perform a WAIT_REG_MEM on a 64-bit value. If > we only > > + * do a WAIT_REG_MEM on the low 32 bits of a > timestamp > > + * query we could be unlucky and the GPU might > hang. > > + */ > > + enum chip_class chip = > cmd_buffer->device->physical_device->rad_info.chip_class; > > + bool is_mec = > radv_cmd_buffer_uses_mec(cmd_buffer); > > + uint64_t avail_va = va + > pool->availability_offset; > > + > > + /* Clear the availability bit before waiting on > the end > > + * of pipe event. > > + */ > > + radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0)); > > + radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) | > > + S_370_WR_CONFIRM(1) | > > + S_370_ENGINE_SEL(V_370_ME)); > > + radeon_emit(cs, avail_va); > > + radeon_emit(cs, avail_va >> 32); > > + radeon_emit(cs, 0xdeadbeef); > > + > > + /* Wait for all prior GPU work. */ > > + si_cs_emit_write_event_eop(cs, chip, is_mec, > > + > V_028A90_BOTTOM_OF_PIPE_TS, 0, > > + > EOP_DATA_SEL_VALUE_32BIT, > > + avail_va, 0, 1, > > + > cmd_buffer->gfx9_eop_bug_va); > > + > > + /* Wait on the timestamp value. */ > > + radv_cp_wait_mem(cs, WAIT_REG_MEM_EQUAL, > avail_va, > > + 1, 0xffffffff); > > + } > > + > > Can we put this in a separate function? Also, you'll want to allocate > the availability bit in the upload buffer, in case there are multiple > concurrent command buffers using the same query pool. > > Alternative solution: look at the upper 32 bits, those definitely > should not be 0xfffffff until a far away point in the future. > I just looked into this a bit more, since if the cause of the hang is that the low 32 bits on a valid timestamp are 0xffffffff, it seemed a bit suspicious that it's 100% repro. What's actually happening is that some of the timestamps are being written before vkCmdResetQueryPool completes, so the reset ends up overwriting them back to TIMESTAMP_NOT_READY. I've updated the test case on the bug to map the buffer and check if any results have the low 32 bits as 0xffffffff. When there's a high enough query count to hang without this patch, the results for the first few queries in the pool are coming out as TIMESTAMP_NOT_READY. That doesn't happen with less than 512 queries, and that happens to be the threshold for using the shader path in radv_fill_buffer for the reset. That's fixed if I add a flush at the end of radv_CmdResetQueryPool. I think that's needed since as far as I can see from the spec the app shouldn't need to do any explicit sync between reset and later query commands. I'll send a patch for that as well. Thanks, Alex > > > for(unsigned i = 0; i < queryCount; ++i, dest_va += > stride) { > > unsigned query = firstQuery + i; > > uint64_t local_src_va = va + query * > pool->stride; > > > > MAYBE_UNUSED unsigned cdw_max = > radeon_check_space(cmd_buffer->device->ws, cs, 19); > > > > - > > - if (flags & VK_QUERY_RESULT_WAIT_BIT) { > > - radv_cp_wait_mem(cs, > WAIT_REG_MEM_NOT_EQUAL, > > - local_src_va, > > - TIMESTAMP_NOT_READY >> > 32, > > - 0xffffffff); > > - } > > if (flags & > VK_QUERY_RESULT_WITH_AVAILABILITY_BIT) { > > uint64_t avail_dest_va = dest_va + > elem_size; > > > > -- > > 2.19.2 > > > > _______________________________________________ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev