On 06.07.2015 at 19:54, Ilia Mirkin wrote:
> That's right. Except really what might have happened was
>
> occl query;
> write X;
> more drawing;
> write X+1;
>
> and then on the CPU, you see X+1. So the tests are always for >= X.
> And if you have more than 2^32 submits, you cry, because I'm *sure*
> that nothing implements wraparound properly :)

That's why 64-bit counters are used, right? :-)
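
For illustration, the usual way to make the ">= X" test survive 2^32
wraparound is serial-number arithmetic: compare the *signed* difference
of the two unsigned counters. This is a generic sketch, not code from
any particular driver, and it only stays correct while the two values
are less than 2^31 apart:

#include <stdint.h>
#include <stdbool.h>

/* Wraparound-safe "has the fence counter reached target?" test.
 * (int32_t)(current - target) is >= 0 exactly when current is at most
 * 2^31 - 1 steps ahead of target, even across the 2^32 boundary. */
static bool seqno_passed(uint32_t current, uint32_t target)
{
    return (int32_t)(current - target) >= 0;
}

/* A busy-wait on the fence value the GPU writes to address Y, as
 * described further down in the thread; real drivers sleep on an
 * interrupt instead of spinning like this. */
static void wait_fence(volatile uint32_t *address, uint32_t target)
{
    while (!seqno_passed(*address, target))
        ;
}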
Roland

> On Mon, Jul 6, 2015 at 1:45 PM, Vyacheslav Gonakhchyan
> <ytri...@gmail.com> wrote:
>> Ilia, thanks a lot for the info.
>>
>> So basically, if I submit to the GPU's command stream:
>>   perform occlusion query,
>>   write X to Y,
>> then I know the query is complete when reading address Y returns X.
>>
>> Regards,
>> Vyacheslav
>>
>> On Mon, Jul 6, 2015 at 9:13 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>>
>>> I'm only really familiar with nouveau, but I think all GPU hardware
>>> works in roughly the same way. Basically you have some way of
>>> inserting "write X to address Y" into the command stream (aka a
>>> "fence"), after which you insert "write X+1 to address Y", and so on.
>>> If you want the CPU to wait on a given fence, you just do "while
>>> (*address < X);". If you have multiple GPU processing queues, you can
>>> usually also insert a "stall this queue until the value at address Y
>>> is at least X" command into the command stream.
>>>
>>> DRM uses implicit fences, so it knows which BOs are used by
>>> particular commands. The flow goes something like "submit a bunch of
>>> commands; submit a fence write and attach that fence id to the BOs in
>>> the previous bunch of commands". Then, to wait for a BO to become
>>> ready, you just wait until the GPU writes the appropriate number to
>>> memory address Y (from above).
>>>
>>> The Mesa drivers can sometimes use clever tricks that avoid this
>>> syncing, because they know exactly how the commands are emitted and
>>> may already have waited on something related earlier, from which they
>>> know the other thing will be ready. No idea if that's the case here.
>>>
>>> Hope this helps,
>>>
>>>   -ilia
>>>
>>>
>>> On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan
>>> <ytri...@gmail.com> wrote:
>>>> Ilia, thanks for the gallium link.
>>>> Do you know of any links to high-level information, in broad
>>>> strokes, about how this sync works? Frankly, I don't know driver
>>>> terminology and wanted to learn more about how this sync is
>>>> performed, for my research. I'm using Mesa as a reference because
>>>> its implementation is open. The occlusion query functionality
>>>> probably waits for the z-buffer to become ready. The problem is that
>>>> the usual synchronization techniques do not apply here. I'm guessing
>>>> that the driver code gets notifications about state changes. What
>>>> kinds of notifications are available? Can a query be performed in
>>>> parallel with another frame being processed, or does it need a
>>>> complete GPU pipeline flush?
>>>>
>>>> Thanks,
>>>> Vyacheslav
>>>>
>>>> On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin <imir...@alum.mit.edu>
>>>> wrote:
>>>>>
>>>>> On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan
>>>>> <ytri...@gmail.com> wrote:
>>>>>> Hi, everyone.
>>>>>>
>>>>>> I'm trying to understand the method radeonQueryGetResult (and,
>>>>>> more broadly, GPU-CPU sync).
>>>>>>
>>>>>> static void radeonQueryGetResult(struct gl_context *ctx,
>>>>>>                                  struct gl_query_object *q)
>>>>>> {
>>>>>>    struct radeon_query_object *query = (struct radeon_query_object *)q;
>>>>>>    uint32_t *result;
>>>>>>    int i;
>>>>>>
>>>>>>    radeon_print(RADEON_STATE, RADEON_VERBOSE,
>>>>>>                 "%s: query id %d, result %d\n",
>>>>>>                 __func__, query->Base.Id, (int) query->Base.Result);
>>>>>>
>>>>>>    radeon_bo_map(query->bo, GL_FALSE);
>>>>>>    result = query->bo->ptr;
>>>>>>
>>>>>>    query->Base.Result = 0;
>>>>>>    for (i = 0; i < query->curr_offset/sizeof(uint32_t); ++i) {
>>>>>>       query->Base.Result += LE32_TO_CPU(result[i]);
>>>>>>       radeon_print(RADEON_STATE, RADEON_TRACE, "result[%d] = %d\n",
>>>>>>                    i, LE32_TO_CPU(result[i]));
>>>>>>    }
>>>>>>
>>>>>>    radeon_bo_unmap(query->bo);
>>>>>> }
>>>>>>
>>>>>> I don't know which part is responsible for the blocking behavior
>>>>>> (waiting for a response from the GPU). I suspect that radeon_bo_map
>>>>>> does this magic. Can someone point me in the right direction?
>>>>>
>>>>> The radeon_bo_map defined in
>>>>> src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.
>>>>> However, the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c
>>>>> references the radeon_bo_map in libdrm, which does not appear to
>>>>> wait.
>>>>>
>>>>> FWIW, for nouveau, nouveau_bo_map will also implicitly do a
>>>>> nouveau_bo_wait, but that does not appear to be the case for radeon.
>>>>>
>>>>> Cheers,
>>>>>
>>>>>   -ilia
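
To make the difference Ilia points out concrete, an "implicitly
waiting" map along the lines of nouveau_bo_map + nouveau_bo_wait might
look roughly like the sketch below. The struct and function names here
are invented for illustration; they are not the real libdrm, radeon,
or nouveau API:

#include <stdint.h>

struct bo {
    void     *ptr;        /* CPU mapping of the buffer */
    uint32_t  fence_seq;  /* fence id attached when the BO was last submitted */
};

/* Spin until the GPU's fence write shows the buffer is idle, using the
 * wraparound-safe signed-difference comparison from above. */
static void bo_wait(struct bo *bo, volatile uint32_t *fence_page)
{
    while ((int32_t)(*fence_page - bo->fence_seq) < 0)
        ;
}

/* A map that implicitly syncs: by the time the caller gets the
 * pointer, the GPU has finished with the buffer, so reading query
 * results out of it (as radeonQueryGetResult does) would be safe. */
static void *bo_map_sync(struct bo *bo, volatile uint32_t *fence_page)
{
    bo_wait(bo, fence_page);
    return bo->ptr;
}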