[Intel-gfx] [PATCH 5/9] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-05-04 Thread Robert Bragg
On Wed, May 4, 2016 at 10:04 AM, Martin Peres 
wrote:

> On 03/05/16 22:34, Robert Bragg wrote:
>
>> Sorry for the delay replying to this, I missed it.
>>
>
> No worries!
>
>
>> On Sat, Apr 23, 2016 at 11:34 AM, Martin Peres > > wrote:
>>
>> On 20/04/16 17:23, Robert Bragg wrote:
>>
>> Gen graphics hardware can be set up to periodically write
>> snapshots of
>> performance counters into a circular buffer via its Observation
>> Architecture and this patch exposes that capability to userspace
>> via the
>> i915 perf interface.
>>
>> Cc: Chris Wilson > >
>> Signed-off-by: Robert Bragg > >
>> Signed-off-by: Zhenyu Wang > >
>>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h |  56 +-
>>   drivers/gpu/drm/i915/i915_gem_context.c |  24 +-
>>   drivers/gpu/drm/i915/i915_perf.c| 940
>> +++-
>>   drivers/gpu/drm/i915/i915_reg.h | 338 
>>   include/uapi/drm/i915_drm.h |  70 ++-
>>   5 files changed, 1408 insertions(+), 20 deletions(-)
>>
>> +
>> +
>> +   /* It takes a fairly long time for a new MUX
>> configuration to
>> +* be be applied after these register writes. This delay
>> +* duration was derived empirically based on the
>> render_basic
>> +* config but hopefully it covers the maximum
>> configuration
>> +* latency...
>> +*/
>> +   mdelay(100);
>>
>>
>> With such a HW and SW design, how can we ever expose hope to get any
>> kind of performance when we are trying to monitor different metrics
>> on each
>> draw call? This may be acceptable for system monitoring, but it is
>> problematic
>> for the GL extensions :s
>>
>>
>> Since it seems like we are going for a perf API, it means that for
>> every change
>> of metrics, we need to flush the commands, wait for the GPU to be
>> done, then
>> program the new set of metrics via an IOCTL, wait 100 ms, and then
>> we may
>> resume rendering ... until the next change. We are talking about a
>> latency of
>> 6-7 frames at 60 Hz here... this is non-negligeable...
>>
>>
>> I understand that we have a ton of counters and we may hide latency
>> by not
>> allowing using more than half of the counters for every draw call or
>> frame, but
>> even then, this 100ms delay is killing this approach altogether.
>>
>>
>>
>> Although I'm also really unhappy about introducing this delay recently,
>> the impact of the delay is typically amortized somewhat by keeping a
>> configuration open as long as possible.
>>
>> Even without this explicit delay here the OA unit isn't suited to being
>> reconfigured on a per draw call basis, though it is able to support per
>> draw call queries with the same config.
>>
>> The above assessment assumes wanting to change config between draw calls
>> which is not something this driver aims to support - as the HW isn't
>> really designed for that model.
>>
>> E.g. in the case of INTEL_performance_query, the backend keeps the i915
>> perf stream open until all OA based query objects are deleted - so you
>> have to be pretty explicit if you want to change config.
>>
>
> OK, I get your point. However, I still want to state that applications
> changing the set would see a disastrous effect as a 100 ms is enough to
> downclock both the CPU and GPU and that would dramatically alter the
> metrics. Should we make it clear somewhere, either in the
> INTEL_performance_query or as a warning in mesa_performance if changing the
> set while running? I would think the latter would be preferable as it could
> also cover the case of the AMD extension which, IIRC, does not talk about
> the performance cost of changing the metrics. With this caveat made clear,
> it seems reasonable.
>

Yeah a KHR_debug performance warning sounds like a good idea.


>
>
>> In case you aren't familiar with how the GL_INTEL_performance_query side
>> of things works for OA counters; one thing to be aware of is that
>> there's a separate MI_REPORT_PERF_COUNT command that Mesa writes either
>> side of a query which writes all the counters for the current OA config
>> (as configured via this i915 perf interface) to a buffer. In addition to
>> collecting reports via MI_REPORT_PERF_COUNT Mesa also configures the
>> unit for periodic sampling to be able to account for potential counter
>> overflow.
>>
>
> Oh, the overflow case is mean. Doesn't the spec mandate the application to
> read at least every second? This is the case for the timestamp queries.
>

For a Haswell GT3 system with 40EUs @ 1GHz some aggregate EU counters may

[Intel-gfx] [PATCH 5/9] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-05-04 Thread Martin Peres
On 03/05/16 22:34, Robert Bragg wrote:
> Sorry for the delay replying to this, I missed it.

No worries!

>
> On Sat, Apr 23, 2016 at 11:34 AM, Martin Peres  > wrote:
>
> On 20/04/16 17:23, Robert Bragg wrote:
>
> Gen graphics hardware can be set up to periodically write
> snapshots of
> performance counters into a circular buffer via its Observation
> Architecture and this patch exposes that capability to userspace
> via the
> i915 perf interface.
>
> Cc: Chris Wilson  >
> Signed-off-by: Robert Bragg  >
> Signed-off-by: Zhenyu Wang  >
> ---
>   drivers/gpu/drm/i915/i915_drv.h |  56 +-
>   drivers/gpu/drm/i915/i915_gem_context.c |  24 +-
>   drivers/gpu/drm/i915/i915_perf.c| 940
> +++-
>   drivers/gpu/drm/i915/i915_reg.h | 338 
>   include/uapi/drm/i915_drm.h |  70 ++-
>   5 files changed, 1408 insertions(+), 20 deletions(-)
>
> +
> +
> +   /* It takes a fairly long time for a new MUX
> configuration to
> +* be be applied after these register writes. This delay
> +* duration was derived empirically based on the
> render_basic
> +* config but hopefully it covers the maximum configuration
> +* latency...
> +*/
> +   mdelay(100);
>
>
> With such a HW and SW design, how can we ever expose hope to get any
> kind of performance when we are trying to monitor different metrics
> on each
> draw call? This may be acceptable for system monitoring, but it is
> problematic
> for the GL extensions :s
>
>
> Since it seems like we are going for a perf API, it means that for
> every change
> of metrics, we need to flush the commands, wait for the GPU to be
> done, then
> program the new set of metrics via an IOCTL, wait 100 ms, and then
> we may
> resume rendering ... until the next change. We are talking about a
> latency of
> 6-7 frames at 60 Hz here... this is non-negligeable...
>
>
> I understand that we have a ton of counters and we may hide latency
> by not
> allowing using more than half of the counters for every draw call or
> frame, but
> even then, this 100ms delay is killing this approach altogether.
>
>
>
> Although I'm also really unhappy about introducing this delay recently,
> the impact of the delay is typically amortized somewhat by keeping a
> configuration open as long as possible.
>
> Even without this explicit delay here the OA unit isn't suited to being
> reconfigured on a per draw call basis, though it is able to support per
> draw call queries with the same config.
>
> The above assessment assumes wanting to change config between draw calls
> which is not something this driver aims to support - as the HW isn't
> really designed for that model.
>
> E.g. in the case of INTEL_performance_query, the backend keeps the i915
> perf stream open until all OA based query objects are deleted - so you
> have to be pretty explicit if you want to change config.

OK, I get your point. However, I still want to state that applications 
changing the set would see a disastrous effect as a 100 ms is enough to 
downclock both the CPU and GPU and that would dramatically alter the
metrics. Should we make it clear somewhere, either in the 
INTEL_performance_query or as a warning in mesa_performance if changing 
the set while running? I would think the latter would be preferable as 
it could also cover the case of the AMD extension which, IIRC, does not 
talk about the performance cost of changing the metrics. With this 
caveat made clear, it seems reasonable.

>
> Considering the sets available on Haswell:
> * Render Metrics Basic
> * Compute Metrics Basic
> * Compute Metrics Extended
> * Memory Reads Distribution
> * Memory Writes Distribution
> * Metric set SamplerBalance
>
> Each of these configs can expose around 50 counters as a set.
>
> A GL application is most likely just going to use the render basic set,
> and In the case of a tool like gputop/GPA then changing config would
> usually be driven by some user interaction to select a set of metrics,
> where even a 100ms delay will go unnoticed.

100 ms is becoming visible, but I agree, it would not be a show stopper 
for sure.

On the APITRACE side, this should not be an issue, because we do not 
change the set of metrics while running.

>
> In case you aren't familiar with how the GL_INTEL_performance_query side
> of things works for OA counters; one thing to be aware of is that
> there's a separate MI_REPORT_PERF_COUNT command that Mesa writes either
> side of a query which writes all t

[Intel-gfx] [PATCH 5/9] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-04-21 Thread kbuild test robot
Hi,

[auto build test ERROR on drm-intel/for-linux-next]
[also build test ERROR on next-20160420]
[cannot apply to v4.6-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/Robert-Bragg/Enable-Gen-7-Observation-Architecture/20160420-222746
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-allmodconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

>> ERROR: "__udivdi3" [drivers/gpu/drm/i915/i915.ko] undefined!

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
-- next part --
A non-text attachment was scrubbed...
Name: .config.gz
Type: application/octet-stream
Size: 54467 bytes
Desc: not available
URL: 



[Intel-gfx] [PATCH 5/9] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-04-21 Thread kbuild test robot
Hi,

[auto build test ERROR on drm-intel/for-linux-next]
[also build test ERROR on next-20160420]
[cannot apply to v4.6-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/Robert-Bragg/Enable-Gen-7-Observation-Architecture/20160420-222746
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-randconfig-s1-201616 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `i915_perf_open_ioctl_locked':
>> (.text+0x2cadf4): undefined reference to `__udivdi3'
   drivers/built-in.o: In function `i915_perf_open_ioctl_locked':
   (.text+0x2cae0d): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
-- next part --
A non-text attachment was scrubbed...
Name: .config.gz
Type: application/octet-stream
Size: 25411 bytes
Desc: not available
URL: