Re: [Mesa-dev] [PATCH] radv: fix vkCmdCopyQueryoolResults() for timestamp queries

Alex Smith Wed, 05 Dec 2018 01:38:37 -0800

On Tue, 4 Dec 2018 at 21:57, Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl>
wrote:


> On Tue, Dec 4, 2018 at 4:52 PM Samuel Pitoiset
> <samuel.pitoi...@gmail.com> wrote:
> >
> > Because WAIT_REG_MEM can only wait for a 32-bit value, it's not
> > safe to use it for timestamp queries. If we only wait on the low
> > 32 bits of a timestamp query we could be unlucky and the GPU
> > might hang.
> >
> > One possible fix is to emit a full end of pipe event and wait
> > on a 32-bit value which is actually an availability bit. This
> > bit is allocated at creation time and always cleared before
> > emitting the EOP event.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108925
> > Fixes: 5d6a560a29 ("radv: do not use the availability bit for timestamp
> queries")
> > Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
> > ---
> >  src/amd/vulkan/radv_query.c | 49 +++++++++++++++++++++++++++++++------
> >  1 file changed, 41 insertions(+), 8 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
> > index 550abe307a1..9bb6b660add 100644
> > --- a/src/amd/vulkan/radv_query.c
> > +++ b/src/amd/vulkan/radv_query.c
> > @@ -1056,8 +1056,15 @@ VkResult radv_CreateQueryPool(
> >         pool->pipeline_stats_mask = pCreateInfo->pipelineStatistics;
> >         pool->availability_offset = pool->stride *
> pCreateInfo->queryCount;
> >         pool->size = pool->availability_offset;
> > -       if (pCreateInfo->queryType == VK_QUERY_TYPE_PIPELINE_STATISTICS)
> > +       if (pCreateInfo->queryType == VK_QUERY_TYPE_PIPELINE_STATISTICS)
> {
> >                 pool->size += 4 * pCreateInfo->queryCount;
> > +       } else if (pCreateInfo->queryType == VK_QUERY_TYPE_TIMESTAMP) {
> > +               /* Allocate one DWORD for the availability bit which is
> needed
> > +                * for vkCmdCopyQueryPoolResults() because we can't
> perform a
> > +                * WAIT_REG_MEM on a 64-bit value.
> > +                */
> > +               pool->size += 4;
> > +       }
> >
> >         pool->bo = device->ws->buffer_create(device->ws, pool->size,
> >                                              64, RADEON_DOMAIN_GTT,
> RADEON_FLAG_NO_INTERPROCESS_SHARING);
> > @@ -1328,19 +1335,45 @@ void radv_CmdCopyQueryPoolResults(
> >                                   pool->availability_offset + 4 *
> firstQuery);
> >                 break;
> >         case VK_QUERY_TYPE_TIMESTAMP:
> > +               if (flags & VK_QUERY_RESULT_WAIT_BIT) {
> > +                       /* Emit a full end of pipe event because we can't
> > +                        * perform a WAIT_REG_MEM on a 64-bit value. If
> we only
> > +                        * do a WAIT_REG_MEM on the low 32 bits of a
> timestamp
> > +                        * query we could be unlucky and the GPU might
> hang.
> > +                        */
> > +                       enum chip_class chip =
> cmd_buffer->device->physical_device->rad_info.chip_class;
> > +                       bool is_mec =
> radv_cmd_buffer_uses_mec(cmd_buffer);
> > +                       uint64_t avail_va = va +
> pool->availability_offset;
> > +
> > +                       /* Clear the availability bit before waiting on
> the end
> > +                        * of pipe event.
> > +                        */
> > +                       radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
> > +                       radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
> > +                                       S_370_WR_CONFIRM(1) |
> > +                                       S_370_ENGINE_SEL(V_370_ME));
> > +                       radeon_emit(cs, avail_va);
> > +                       radeon_emit(cs, avail_va >> 32);
> > +                       radeon_emit(cs, 0xdeadbeef);
> > +
> > +                       /* Wait for all prior GPU work. */
> > +                       si_cs_emit_write_event_eop(cs, chip, is_mec,
> > +
> V_028A90_BOTTOM_OF_PIPE_TS, 0,
> > +
> EOP_DATA_SEL_VALUE_32BIT,
> > +                                                  avail_va, 0, 1,
> > +
> cmd_buffer->gfx9_eop_bug_va);
> > +
> > +                       /* Wait on the timestamp value. */
> > +                       radv_cp_wait_mem(cs, WAIT_REG_MEM_EQUAL,
> avail_va,
> > +                                        1, 0xffffffff);
> > +               }
> > +
>
> Can we put this in a separate function? Also, you'll want to allocate
> the availability bit in the upload buffer, in case there are multiple
> concurrent command buffers using the same query pool.
>
> Alternative solution: look at the upper 32 bits, those definitely
> should not be 0xfffffff until a far away point in the future.
>

I just looked into this a bit more, since if the cause of the hang is that
the low 32 bits on a valid timestamp are 0xffffffff, it seemed a bit
suspicious that it's 100% repro.

What's actually happening is that some of the timestamps are being written
before vkCmdResetQueryPool completes, so the reset ends up overwriting them
back to TIMESTAMP_NOT_READY. I've updated the test case on the bug to map
the buffer and check if any results have the low 32 bits as 0xffffffff.
When there's a high enough query count to hang without this patch, the
results for the first few queries in the pool are coming out as
TIMESTAMP_NOT_READY. That doesn't happen with less than 512 queries, and
that happens to be the threshold for using the shader path in
radv_fill_buffer for the reset.

That's fixed if I add a flush at the end of radv_CmdResetQueryPool. I think
that's needed since as far as I can see from the spec the app shouldn't
need to do any explicit sync between reset and later query commands. I'll
send a patch for that as well.

Thanks,
Alex


>
> >                 for(unsigned i = 0; i < queryCount; ++i, dest_va +=
> stride) {
> >                         unsigned query = firstQuery + i;
> >                         uint64_t local_src_va = va  + query *
> pool->stride;
> >
> >                         MAYBE_UNUSED unsigned cdw_max =
> radeon_check_space(cmd_buffer->device->ws, cs, 19);
> >
> > -
> > -                       if (flags & VK_QUERY_RESULT_WAIT_BIT) {
> > -                               radv_cp_wait_mem(cs,
> WAIT_REG_MEM_NOT_EQUAL,
> > -                                                local_src_va,
> > -                                                TIMESTAMP_NOT_READY >>
> 32,
> > -                                                0xffffffff);
> > -                       }
> >                         if (flags &
> VK_QUERY_RESULT_WITH_AVAILABILITY_BIT) {
> >                                 uint64_t avail_dest_va = dest_va +
> elem_size;
> >
> > --
> > 2.19.2
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radv: fix vkCmdCopyQueryoolResults() for timestamp queries

Reply via email to