On Mon, Oct 5, 2015 at 11:36 AM, Emil Velikov <emil.l.veli...@gmail.com> wrote:
> Hi all,
>
> I am looking at ARB_shader_clock with i965 in mind.
>
> So far I've got most of the infra/plumbing, and a fancy new intrinsic :)
>
> On the hardware side, I was thinking about using the Observability
> Architecture (OA) counters. The fun part is that those tend to vary
> quite a bit based on the hardware generation. So far I'm leaning
> towards:
>  - "Count of XXX threads dispatched to EUs" for BRW and later.
>  - "XXX Shader Active Time" for earlier (SNB-HSW/VLV) hardware.
>
> Do these sound appropriate, or should we opt for the various knobs in
> 'Flexible EU event counters'? Is there some alternative piece of
> hardware in i965 which I can use?
>
>
> Going for OA has a small catch. Reading through the PRM, it is not
> obvious if one can track the same source twice (the
> GL_AMD_performance_monitor implementation comes to mind). I'm about to
> take a closer look into brw_performance_monitor.[ch] shortly, but if
> any gotchas/fancy interactions come to mind let me know.
>
> Thanks
> Emil
>
> P.S. Does anyone recall the consensus wrt adding the 2015 extensions
> to GL3.txt?
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Hi Emil,

I don't think you want to use the OA counters to implement
ARB_shader_clock. They're not exposed to the shader directly, AFAIK,
and they only measure things at per-invocation granularity, whereas
the intent of ARB_shader_clock is to be able to measure the number of
cycles that individual operations take, with very low latency.
Instead, you should read from the ARF performance register; see page
822 of Vol 7 ("3D Media GPGPU") of the Broadwell PRM (page 858 of the
PDF) for more details.

Another interesting thing is that you can atomically read from that
register and also get a bit that says whether some event, such as a
context switch, has happened since the last time you read it and would
make your measurement invalid. It might be useful to expose this
through a GLSL extension as another set of overloads:

    uint64_t clockARB(out bool valid); // once we get int64 support
    uvec2 clock2x32ARB(out bool valid);

and a corresponding NIR intrinsic that outputs an extra component
that's a boolean (i.e. 0 or ~0). That would help with implementing
something like INTEL_DEBUG=shader_time generically, with fewer
outliers to throw away.

Connor
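
To make the "valid" idea a bit more concrete, here is a rough sketch in C of how the consumer side of something like shader_time might use such a flag. Everything in it is hypothetical: the clock_sample layout, the clock_stats struct and both function names are made up for illustration and are not existing Mesa code. It assumes the low 32 bits come first (as ARB_shader_clock specifies for clock2x32ARB), that the flag is 0 or ~0, and that the flag on the end sample covers the whole interval since the begin read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample layout: the two halves of a clock2x32ARB-style result
 * plus the proposed validity component (0 = invalid, ~0 = valid). */
struct clock_sample {
    uint32_t lo;    /* 32 least significant bits */
    uint32_t hi;    /* 32 most significant bits */
    uint32_t valid; /* 0 or ~0 */
};

/* Hypothetical running totals for a set of timed intervals. */
struct clock_stats {
    uint64_t total_ticks;
    size_t used;
    size_t discarded;
};

/* Combine the two 32-bit halves into one 64-bit tick count. */
static uint64_t clock_sample_to_u64(const struct clock_sample *s)
{
    return ((uint64_t)s->hi << 32) | s->lo;
}

/* Sum the begin/end intervals, skipping any whose end sample was flagged
 * invalid (assumption: that flag reports events since the begin read). */
static struct clock_stats accumulate_intervals(const struct clock_sample *begin,
                                               const struct clock_sample *end,
                                               size_t count)
{
    struct clock_stats stats = { 0, 0, 0 };

    for (size_t i = 0; i < count; i++) {
        if (end[i].valid == 0) {
            stats.discarded++;
            continue;
        }
        /* Unsigned subtraction stays correct across a 64-bit wraparound. */
        stats.total_ticks += clock_sample_to_u64(&end[i]) -
                             clock_sample_to_u64(&begin[i]);
        stats.used++;
    }
    return stats;
}
```

With this shape, an interval interrupted by a context switch is dropped explicitly rather than showing up as an outlier to be filtered out afterwards.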