[RFC v2] drm: Enable dynamic debug for DRM_[DEV]_DEBUG*

2016-12-08 Thread Robert Bragg
On Thu, Dec 8, 2016 at 12:17 AM, Daniel Vetter  wrote:
>
> On Wed, Dec 07, 2016 at 06:35:29PM +, Robert Bragg wrote:
> > This is still missing corresponding documentation changes, and I haven't
> > moved anything to drm_print.h yet, as suggested.
> >
> > Sending out with a few functional improvements first to get agreement
> > before documenting anything (changes summarised in v2: section below)
> >
> > In particular, affecting the output format, I stole an idea from Tvrtko
> > Ursulin to have the prefix for messages be based on the driver name,
> > such as "[i915]" instead of always being "[drm]".
> >
> > Depending on people's thoughts on compatibility, we could consider
> > removing the prefix given that the dynamic debug control interface has a
> > way of specifying that messages should include a module name, function
> > or line info like:
> >
> > echo "module i915 +mfp" > dynamic_debug/control
> >
> > That would enable all i915 debug messages with a module and function
> > prefix.
> >
> > A trade-off would be that anyone only using the drm.drm_debug interface
> > to control messages would lose some information. If we really wanted we
> > could have the best of both by adding a utility printing api that can
> > recognise when printing due to a dynamic debug control query vs
> > drm.drm_debug to conditionally add the prefix.
> >
> > --- >8 --- (git am --scissors)
> >
> > Dynamic debug messages (ref: Documentation/dynamic-debug-howto.txt)
> > allow fine grained control over which debug messages are enabled with
> > runtime control through /sys/kernel/debug/dynamic_debug/control
> >
> > This provides more control than the current drm.drm_debug parameter
> > which for some use cases is impractical to use given how chatty
> > some drm debug categories are.
> >
> > For example all debug messages in i915_perf.c can be enabled with:
> > echo "file i915_perf.c +p" > dynamic_debug/control
> >
> > This doesn't strictly maintain format compatibility with the previous
> > debug messages since the category is now added as part of the prefix
> > like "[drm][kms] No FB found". Adding the categories with a consistent
> > format makes it possible to enable categories with a dynamic debug
> > query like: echo "format [kms] +p" > dynamic_debug/control
> >
> > This maintains support for enabling debug messages using the drm_debug
> > parameter. If dynamic debug is not enabled via CONFIG_DYNAMIC_DEBUG the
> > debug messages essentially work as before, except with the inclusion of
> > categories in the format strings as described above.
> >
> > This removes the drm_[dev_]printk wrappers considering that the dynamic
> > debug macros are only useful if they can track the __FILE__, __func__
> > and __LINE__ where they are called. The wrapper didn't seem necessary in
> > the DRM_UT_NONE case with no category flag.
> >
> > The non _DEV macros are no longer defined in terms of passing NULL to a
> > _DEV variant to avoid having the core.c dev_printk implementation add
> > "(NULL device *)". The previous drm_[dev_]printk function used to handle
> > this as a special case.
> >
> > Instead of using DRM_NAME to add [drm] to the start of every message,
> > the prefix is now based on module_name(THIS_MODULE) so it will be [drm]
> > or e.g. [i915] for the Intel driver. Later we might consider removing
> > the prefix altogether considering that the dynamic debug control
> > interface has a way of optionally adding the module, function or line to
> > the formatting of messages.
> >
> > v2:
> > Add categories to format like "[drm][kms] No FB found"
> > Only single conditional call per message (macros expand to less code)
> > Uses __dynamic_pr_debug/dev_dbg for dynamic formatting features
> > Use module name for msg prefix like [drm] or [i915]
> >
> > Signed-off-by: Robert Bragg 
> > Cc: dri-devel at lists.freedesktop.org
> > Cc: Daniel Vetter 
> > Cc: Tvrtko Ursulin 
>
> So assuming I understand it correctly - I like this 3way cascade of
> dynamic debug, then printk and no_printk fallback if CONFIG_DEBUG=n for
> the space conscious. But I guess we do need to add a DRM Kconfig knob to
> set DEBUG, at least I'm not entirely sure how that's supposed to work. Or
> we might need to have our own #ifdef maze for this. Maybe we need to keep
> the old drm*printk stuff for that?

Right, I wasn't really sure who/what's respo

[RFC] drm: Enable dynamic debug for DRM_[DEV]_DEBUG*

2016-12-07 Thread Robert Bragg
On Mon, Dec 5, 2016 at 4:31 PM, Daniel Vetter  wrote:

> On Mon, Dec 05, 2016 at 11:24:44AM +0000, Robert Bragg wrote:
> > Forgot to send to dri-devel when I first sent this out...
> >
> > The few times I've looked at using DRM_DEBUG messages, I haven't found
> > them very helpful considering how noisy some of the categories are. More
> > than once now I've preferred to go in and modify individual files to
> > affect what messages I see and re-build.
> >
> > After recently converting some of the i915_perf.c messages to use
> > DRM_DEBUG, I thought I'd see if DRM_DEBUG could be updated to have a bit
> > more fine grained control than the current category flags.
> >
> > A few things to note with this first iteration:
> >
> > - I haven't looked to see what effect the change has on linked object
> >   sizes.
> >
> > - It seems like it could be nice if dynamic debug could effectively make
> >   the drm_debug parameter redundant but dynamic debug doesn't give us a
> >   way to categorise messages so maybe we'd want to consider including
> >   categories in messages something like:
> >
> >   "[drm][kms] No FB found"
> >
> >   This way all kms messages could be enabled via:
> >   echo "format [kms] +p" > dynamic_debug/control
> >
> >   Note with this simple scheme categories would no longer be mutually
> >   exclusive which could be a nice bonus.
>
> Really nice idea, and I agree that unifying drm.debug with dynamic debug
> in some way would be useful. We could implement your idea by reworking the
> existing debug helpers to auto-prepend the right string. That also opens
> up the door for much more fine-grained bucketing maybe, only challenge is
> that we should document things somewhere.
>

yup, I don't mind writing some doc updates for this if it looks worthwhile.


>
> >   Since it would involve changing the output format, I wonder how
> >   concerned others might be about breaking some userspace (maybe CI test
> >   runners) that for some reason grep for specific messages?
>
> I think the only thing we have to keep working (somehow) is drm.debug. The
> exact output format doesn't really matter at all. Getting drm.debug to
> work when dynamic debugging is enabled probably requires exporting some
> functions, so that we can set the right ddebug options from the drm.debug
> mod-option write handler. There's special mod-option macros that allow you
> to specify write handlers, so that part is ok.
>

dynamic_debug.h exposes a macro for declaring your own dynamic debug meta
data as well as a macro for testing whether the message has been enabled.

I'm handling compatibility by using those macros so I can still test the
drm.drm_debug flags.

Handling compatibility in terms of running control queries from the kernel
would be a bit more tricky since we'd need to export some api from
dynamic_debug.c as well as adding a write handler for drm_debug. Also the
enabledness of messages is boolean not refcounted so I suppose there could
be slightly annoying interactions if mixing both - though that could be
documented.

The only disadvantage I can think of currently for not handling
compatibility in terms of running control queries is that the dynamic debug
macros can normally avoid evaluating any conditions on the cpu while a
message is disabled, based on jump labels/static branches. We were already
evaluating a condition for disabled drm debug messages though, so it seems
reasonable to continue for now.
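
A minimal userspace sketch of that combined check, assuming a per-call-site
descriptor with a plain boolean enable flag. All sketch_ names and the
category bit values are hypothetical stand-ins for illustration, not the
kernel's dynamic debug API:

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative category bits; not the real DRM_UT_* values. */
#define SKETCH_UT_DRIVER 0x02u
#define SKETCH_UT_KMS    0x04u

/* Stand-in for the drm.drm_debug module parameter (a category bitmask). */
static unsigned int sketch_drm_debug;

/* Hypothetical per-call-site metadata, as dynamic debug would track it. */
struct sketch_callsite {
	const char *file;
	int line;
	bool enabled;	/* boolean, not refcounted */
};

/* A message prints if its call site is enabled OR its category bit is set. */
static bool sketch_should_print(const struct sketch_callsite *site,
				unsigned int category)
{
	return site->enabled || (sketch_drm_debug & category) != 0;
}

/* The condition is evaluated on every call, even for disabled messages. */
#define SKETCH_DEBUG(site, category, ...)				\
	do {								\
		if (sketch_should_print((site), (category)))		\
			fprintf(stderr, __VA_ARGS__);			\
	} while (0)
```

Because the test is a plain OR, the two interfaces stay independent: clearing
the category bit does not disable a call site that was enabled through the
control file, and vice versa.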


>
> The other bit of backwards compat we imo need is that by default we should
> still keep drm.debug working, even when dynamic debugging is disabled.
> Having a third option that uses no_printk or similar (to get rid of all
> the debug strings and dead-code-eliminate all the related output code)
>

Yeah, I think the current code already handles this, but sorry if it's not
clear.

This version is #ifdefed so that if dynamic debug isn't enabled the dynamic
debug path reduces to a no_printk.

I'm considering CONFIG_DYNAMIC_DEBUG being enabled or not and when enabled
I check drm_debug and the dynamic debug state, when disabled I'm just
checking the drm_debug flags and the dynamic debug bits boil out.

In my updated patch things were re-jigged a little so pr_debug and dev_dbg
are used when CONFIG_DYNAMIC_DEBUG is not enabled, and these internally
boil down to no_printk if DEBUG is disabled. Actually we might want to
consider if that's the desired behaviour - since DRM_DEBUG wasn't
previously affected by DEBUG being defined or not.
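
To make the three build-time cases concrete, here is a rough userspace sketch.
SKETCH_DYNAMIC_DEBUG and SKETCH_DEBUG are hypothetical switches standing in
for CONFIG_DYNAMIC_DEBUG and DEBUG, and fprintf stands in for printk; it is
not the code from the patch:

```c
#include <stdio.h>

/* Compiles the call away but keeps format checking, like a no_printk. */
#define sketch_no_printk(...)					\
	do {							\
		if (0)						\
			fprintf(stderr, __VA_ARGS__);		\
	} while (0)

#if defined(SKETCH_DYNAMIC_DEBUG)
/* Case 1: per-call-site runtime toggle, category flag still honoured. */
#define SKETCH_PATH 1
#define sketch_dbg(site_enabled, category_set, ...)		\
	do {							\
		if ((site_enabled) || (category_set))		\
			fprintf(stderr, __VA_ARGS__);		\
	} while (0)
#elif defined(SKETCH_DEBUG)
/* Case 2: no dynamic debug; only the category flag decides. */
#define SKETCH_PATH 2
#define sketch_dbg(site_enabled, category_set, ...)		\
	do {							\
		if (category_set)				\
			fprintf(stderr, __VA_ARGS__);		\
	} while (0)
#else
/* Case 3: neither switch set; the message and its arguments vanish. */
#define SKETCH_PATH 3
#define sketch_dbg(site_enabled, category_set, ...)		\
	sketch_no_printk(__VA_ARGS__)
#endif
```

Case 3 is the behaviour change in question: with neither switch defined the
category flag can no longer turn the message on.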


> > --- >8 --- (git am --scissors)
> >
> > Dynamic debug messages (ref: Documentation/dynamic-debug-howto.txt)

[RFC v2] drm: Enable dynamic debug for DRM_[DEV]_DEBUG*

2016-12-07 Thread Robert Bragg
This is still missing corresponding documentation changes, and I haven't
moved anything to drm_print.h yet, as suggested.

Sending out with a few functional improvements first to get agreement
before documenting anything (changes summarised in v2: section below)

In particular, affecting the output format, I stole an idea from Tvrtko
Ursulin to have the prefix for messages be based on the driver name,
such as "[i915]" instead of always being "[drm]".

Depending on people's thoughts on compatibility, we could consider
removing the prefix given that the dynamic debug control interface has a
way of specifying that messages should include a module name, function
or line info like:

echo "module i915 +mfp" > dynamic_debug/control

That would enable all i915 debug messages with a module and function
prefix.

A trade-off would be that anyone only using the drm.drm_debug interface
to control messages would lose some information. If we really wanted we
could have the best of both by adding a utility printing api that can
recognise when printing due to a dynamic debug control query vs
drm.drm_debug to conditionally add the prefix.

--- >8 --- (git am --scissors)

Dynamic debug messages (ref: Documentation/dynamic-debug-howto.txt)
allow fine grained control over which debug messages are enabled with
runtime control through /sys/kernel/debug/dynamic_debug/control

This provides more control than the current drm.drm_debug parameter
which for some use cases is impractical to use given how chatty
some drm debug categories are.

For example all debug messages in i915_perf.c can be enabled with:
echo "file i915_perf.c +p" > dynamic_debug/control

This doesn't strictly maintain format compatibility with the previous
debug messages since the category is now added as part of the prefix
like "[drm][kms] No FB found". Adding the categories with a consistent
format makes it possible to enable categories with a dynamic debug
query like: echo "format [kms] +p" > dynamic_debug/control
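
As a rough illustration of why a consistent prefix helps, the sketch below
builds the format string by literal concatenation and mimics a "format" query
as a substring search over that string. SKETCH_MODNAME, SKETCH_FMT and
sketch_format_matches are hypothetical names for this example only:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the module name used as the first prefix. */
#define SKETCH_MODNAME "drm"

/* Build "[module][category] message" at compile time. */
#define SKETCH_FMT(category, fmt) \
	"[" SKETCH_MODNAME "][" category "] " fmt

/* A "format" query enables call sites whose format contains the string. */
static bool sketch_format_matches(const char *callsite_fmt, const char *query)
{
	return strstr(callsite_fmt, query) != NULL;
}
```

With this, SKETCH_FMT("kms", "No FB found\n") expands to
"[drm][kms] No FB found\n", which a "[kms]" query matches while a "[driver]"
query does not.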

This maintains support for enabling debug messages using the drm_debug
parameter. If dynamic debug is not enabled via CONFIG_DYNAMIC_DEBUG the
debug messages essentially work as before, except with the inclusion of
categories in the format strings as described above.

This removes the drm_[dev_]printk wrappers considering that the dynamic
debug macros are only useful if they can track the __FILE__, __func__
and __LINE__ where they are called. The wrapper didn't seem necessary in
the DRM_UT_NONE case with no category flag.

The non _DEV macros are no longer defined in terms of passing NULL to a
_DEV variant to avoid having the core.c dev_printk implementation add
"(NULL device *)". The previous drm_[dev_]printk function used to handle
this as a special case.

Instead of using DRM_NAME to add [drm] to the start of every message,
the prefix is now based on module_name(THIS_MODULE) so it will be [drm]
or e.g. [i915] for the Intel driver. Later we might consider removing
the prefix altogether considering that the dynamic debug control
interface has a way of optionally adding the module, function or line to
the formatting of messages.

v2:
Add categories to format like "[drm][kms] No FB found"
Only single conditional call per message (macros expand to less code)
Uses __dynamic_pr_debug/dev_dbg for dynamic formatting features
Use module name for msg prefix like [drm] or [i915]

Signed-off-by: Robert Bragg 
Cc: dri-devel at lists.freedesktop.org
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/drm_drv.c |  47 ---
 include/drm/drmP.h        | 202 +-
 2 files changed, 127 insertions(+), 122 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index f74b7d0..25d00aa 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -65,53 +65,6 @@ static struct idr drm_minors_idr;

 static struct dentry *drm_debugfs_root;

-#define DRM_PRINTK_FMT "[" DRM_NAME ":%s]%s %pV"
-
-void drm_dev_printk(const struct device *dev, const char *level,
-   unsigned int category, const char *function_name,
-   const char *prefix, const char *format, ...)
-{
-   struct va_format vaf;
-   va_list args;
-
-   if (category != DRM_UT_NONE && !(drm_debug & category))
-   return;
-
-   va_start(args, format);
-   vaf.fmt = format;
-   vaf.va = &args;
-
-   if (dev)
-   dev_printk(level, dev, DRM_PRINTK_FMT, function_name, prefix,
-  &vaf);
-   else
-   printk("%s" DRM_PRINTK_FMT, level, function_name, prefix, &vaf);
-
-   va_end(args);
-}
-EXPORT_SYMBOL(drm_dev_printk);
-
-void drm_printk(const char *level, unsigned int category,
-   const char *format, ...)
-{
-   stru

[Intel-gfx] [RFC 0/5] DRM logging tidy

2016-12-07 Thread Robert Bragg
On Tue, Dec 6, 2016 at 6:57 PM, Tvrtko Ursulin  wrote:

> From: Tvrtko Ursulin 
>
> I wasn't here at the beginnings of DRM so I might have gotten this wrong,
> however the existence of DRM_NAME suggested to me that the intention was to
> allow individual drivers to override it and get appropriate prefixes in their
> log messages.
>
> I can't see that any driver is using it like that but I still thought it would
> be neat to do that. That way we could have our log messages look more
> obviously ours. For example after this series we have:
>
>  [i915] Memory usable by graphics device = 4096M
>  [i915] VT-d active for gfx access
>  [i915] Replacing VGA console driver
>  [i915] ACPI BIOS requests an excessive sleep of 2 ms, using 1500 ms instead
>  [i915] Finished loading DMC firmware i915/skl_dmc_ver1_26.bin (v1.26)
>  [i915] Disabling framebuffer compression (FBC) to prevent screen flicker with VT-d enabled
>  [i915] GuC firmware load skipped
>  [i915] Initialized i915 1.6.0 20161205 for :00:02.0 on minor 0
>  [i915] DRM_I915_DEBUG enabled
>  [i915] DRM_I915_DEBUG_GEM enabled
>  [i915] RC6 on
>
> Previously all that was prefixed with "[drm]" which was OK but I think the
> above is even better.
>
> Also to consider is that recent drm_printk work has removed (it hardcoded)
> DRM_NAME from DRM_ERROR and DRM_DEBUG macros, while leaving it with the rest
> (DRM_INFO, NOTE and WARNING) creating a bit of an inconsistency.
>

I wonder if I can maybe fold some of this idea into my related DRM_DEBUG
[RFC] sent out recently:
https://lists.freedesktop.org/archives/dri-devel/2016-December/126094.html

Instead of using DRM_NAME, I've experimented with updating my changes
adding support for dynamic debug to add a prefix based on
module_name(THIS_MODULE) for a similar result

One thing to consider here is that with the addition of dynamic debug
support this prefix arguably becomes redundant because the
dynamic_debug/control interface lets you choose to add a module name or
function prefix to messages, e.g. like:

echo "module i915 +mfp" > dynamic_debug/control

I've ignored the redundancy because my change still allows enabling
messages with the drm.drm_debug parameter and in that case the prefix is
still useful.

Br,
- Robert



> This series also makes all the logging macros use drm_printk, but also
> makes DRM_NAME passed in from the macro wrappers in all cases. So drivers
> can override it regardless of the log level.
>
> And finally, the series also removes a bit of redundant data from the debug
> messages effectively converting this:
>
>  [drm:edp_panel_off [i915]] Wait for panel power off time
>
> Into this:
>
>  [edp_panel_off [i915]] Wait for panel power off time
>
> Which still has all the data in it.
>
> Tvrtko Ursulin (5):
>   drm/i915: Give our log messages our name
>   drm: Respect driver set DRM_NAME in drm_printk
>   drm: Respect driver set DRM_NAME in drm_dev_printk
>   drm: Use drm_printk for all logging macros
>   drm: Do not log driver prefix in debug messages
>
>  drivers/gpu/drm/drm_drv.c       | 39 +++--
>  drivers/gpu/drm/i915/i915_drv.c |  3 +-
>  include/drm/drmP.h              | 94 -----
>  include/drm/drm_drv.h           | 11 ++---
>  include/uapi/drm/i915_drm.h     |  3 ++
>  5 files changed, 92 insertions(+), 58 deletions(-)
>
> --
> 2.7.4
>



[RFC] drm: Enable dynamic debug for DRM_[DEV]_DEBUG*

2016-12-05 Thread Robert Bragg
Forgot to send to dri-devel when I first sent this out...

The few times I've looked at using DRM_DEBUG messages, I haven't found
them very helpful considering how noisy some of the categories are. More
than once now I've preferred to go in and modify individual files to
affect what messages I see and re-build.

After recently converting some of the i915_perf.c messages to use
DRM_DEBUG, I thought I'd see if DRM_DEBUG could be updated to have a bit
more fine grained control than the current category flags.

A few things to note with this first iteration:

- I haven't looked to see what effect the change has on linked object
  sizes.

- It seems like it could be nice if dynamic debug could effectively make
  the drm_debug parameter redundant but dynamic debug doesn't give us a
  way to categorise messages so maybe we'd want to consider including
  categories in messages something like:

  "[drm][kms] No FB found"

  This way all kms messages could be enabled via:
  echo "format [kms] +p" > dynamic_debug/control

  Note with this simple scheme categories would no longer be mutually
  exclusive which could be a nice bonus.

  Since it would involve changing the output format, I wonder how
  concerned others might be about breaking some userspace (maybe CI test
  runners) that for some reason grep for specific messages?

--- >8 --- (git am --scissors)

Dynamic debug messages (ref: Documentation/dynamic-debug-howto.txt)
allow fine grained control over which debug messages are enabled with
runtime control through /sys/kernel/debug/dynamic_debug/control

This provides more control than the current drm.drm_debug parameter
which for some use cases is impractical to use given how chatty
some drm debug categories are.

For example all debug messages in i915_perf.c can be enabled with:
echo "file i915_perf.c +p" > dynamic_debug/control

This aims to maintain compatibility with controlling debug messages
using the drm_debug parameter. The new dynamic debug macros are called
by default but conditionally calling [dev_]printk if the category flag
is set (side stepping the dynamic debug condition in that case)

This removes the drm_[dev_]printk wrappers considering that the dynamic
debug macros are only useful if they can track the __FILE__, __func__
and __LINE__ where they are called. The wrapper didn't seem necessary in
the DRM_UT_NONE case with no category flag.

The output format should be compatible, unless the _DEV macros are
passed a NULL dev pointer considering how the core.c dev_printk
implementation adds "(NULL device *)" to the message in that case while
the drm wrapper would fallback to a plain printk in this case.
Previously some of the non-dev drm debug macros were defined in terms of
passing NULL to a dev version but that's avoided now due to this
difference.

Signed-off-by: Robert Bragg 
Cc: dri-devel at lists.freedesktop.org
Cc: Daniel Vetter 
Cc: Chris Wilson 
---
 drivers/gpu/drm/drm_drv.c |  47 -
 include/drm/drmP.h        | 168 +-
 2 files changed, 108 insertions(+), 107 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index f74b7d0..25d00aa 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -65,53 +65,6 @@ static struct idr drm_minors_idr;

 static struct dentry *drm_debugfs_root;

-#define DRM_PRINTK_FMT "[" DRM_NAME ":%s]%s %pV"
-
-void drm_dev_printk(const struct device *dev, const char *level,
-   unsigned int category, const char *function_name,
-   const char *prefix, const char *format, ...)
-{
-   struct va_format vaf;
-   va_list args;
-
-   if (category != DRM_UT_NONE && !(drm_debug & category))
-   return;
-
-   va_start(args, format);
-   vaf.fmt = format;
-   vaf.va = &args;
-
-   if (dev)
-   dev_printk(level, dev, DRM_PRINTK_FMT, function_name, prefix,
-  &vaf);
-   else
-   printk("%s" DRM_PRINTK_FMT, level, function_name, prefix, &vaf);
-
-   va_end(args);
-}
-EXPORT_SYMBOL(drm_dev_printk);
-
-void drm_printk(const char *level, unsigned int category,
-   const char *format, ...)
-{
-   struct va_format vaf;
-   va_list args;
-
-   if (category != DRM_UT_NONE && !(drm_debug & category))
-   return;
-
-   va_start(args, format);
-   vaf.fmt = format;
-   vaf.va = &args;
-
-   printk("%s" "[" DRM_NAME ":%ps]%s %pV",
-  level, __builtin_return_address(0),
-  strcmp(level, KERN_ERR) == 0 ? " *ERROR*" : "", &vaf);
-
-   va_end(args);
-}
-EXPORT_SYMBOL(drm_printk);
-
 /*
  * DRM Minors
  * A DRM device can provide several char-dev interfaces on the DRM-Major. Eac

[ANNOUNCE] libdrm 2.4.74

2016-11-29 Thread Robert Bragg
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Ben Widawsky (1):
  intel: Add Geminilake PCI IDs

Christian Gmeiner (4):
  etnaviv: add API to get drm fd from etna_device
  etnaviv: add API to create etna_device from private dup() fd
  etnaviv: change get_abs_timeout(..) to use ns.
  etnaviv: add etna_pipe_wait_ns(..)

Emil Velikov (2):
  automake: make the build less chatty
  xf86drm: introduce drmGetDeviceNameFromFd2

Eric Anholt (1):
  vc4: Add new GETPARAMs that have been merged to drm-next.

Grazvydas Ignotas (2):
  tests: kms: fix shadowed declaration warning
  libdrm: random typo fixes

Michel Dänzer (1):
  intel: Add drm_intel_gem_context_get_id to intel-symbols-check

Rob Clark (1):
  freedreno: 64bit support

Robert Bragg (2):
  intel: Add a getter for the intel_context ctx_id
  Bump version for release

git tag: libdrm-2.4.74

https://dri.freedesktop.org/libdrm/libdrm-2.4.74.tar.bz2
MD5:  31964aa15bdea1a40c5941d4ce0962ee  libdrm-2.4.74.tar.bz2
SHA1: 0d9c02d5d2c6c2fac862cb687bf45bc20d129017  libdrm-2.4.74.tar.bz2
SHA256: d80dd5a76c401f4c8756dcccd999c63d7e0a3bad258d96a829055cfd86ef840b  libdrm-2.4.74.tar.bz2
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.74.tar.bz2.sig

https://dri.freedesktop.org/libdrm/libdrm-2.4.74.tar.gz
MD5:  b661a54514109caad3de3b520680b98e  libdrm-2.4.74.tar.gz
SHA1: 7b5a80fbdd432e87934ef3b1256a58ed7b034574  libdrm-2.4.74.tar.gz
SHA256: 3c8fdf5a89826797a8060e6f3455ca22db9ae49576cfcda1c78e3e2ce59af0f1  libdrm-2.4.74.tar.gz
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.74.tar.gz.sig

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAlg9iT0ACgkQjNHfVSl1KXvN2QCfSj1H53aYHdMVMUN2B64FVF5E
n0QAn0Fn3jDlrl6lpdbTJO3Mclg9WFUZ
=+4Tx
-----END PGP SIGNATURE-----


[Intel-gfx] [PATCH v2] drm/i915: don't whitelist oacontrol in cmd parser

2016-11-22 Thread Robert Bragg
On Tue, Nov 22, 2016 at 1:34 PM, Daniel Vetter  wrote:

> On Tue, Nov 08, 2016 at 12:51:48PM +0000, Robert Bragg wrote:
> > This v2 patch bumps the command parser version so it can be referenced in
> > corresponding i-g-t gem_exec_parse changes.
> >
> > --- >8 ---
>
> Scissors cut everything below, not everything above, hence next time
> around pls switch around your comment and the commit message, as-is not
> much left ;-)
>

Hmm, they cut away what's above and keep what's below in my experience -
what command are you seeing the opposite with?

I just double checked this with git am --scissors

- Robert



>
> Fixed up while applying.
> -Daniel
>
> >
> > Being able to program OACONTROL from a non-privileged batch buffer is
> > not sufficient to be able to configure the OA unit. This was originally
> > allowed to help enable Mesa to expose OA counters via the
> > INTEL_performance_query extension, but the current implementation based
> > on programming OACONTROL via a batch buffer isn't able to report useable
> > data without a more complete OA unit configuration. Mesa handles the
> > possibility that writes to OACONTROL may not be allowed and so only
> > advertises the extension after explicitly testing that a write to
> > OACONTROL succeeds. Based on this; removing OACONTROL from the whitelist
> > should be ok for userspace.
> >
> > Removing this simplifies adding a new kernel api for configuring the OA
> > unit without needing to consider the possibility that userspace might
> > trample on OACONTROL state which we'd like to start managing within
> > the kernel instead. In particular running any Mesa based GL application
> > currently results in clearing OACONTROL when initializing which would
> > disable the capturing of metrics.
> >
> > v2:
> > This bumps the command parser version from 8 to 9, as the change is
> > visible to userspace.
> >
> > Signed-off-by: Robert Bragg 
> > Reviewed-by: Matthew Auld 
> > Reviewed-by: Sourab Gupta 
> > ---
> >  drivers/gpu/drm/i915/i915_cmd_parser.c | 42 --
> >  1 file changed, 5 insertions(+), 37 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > index c9d2ecd..f5762cd 100644
> > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -450,7 +450,6 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
> >   REG64(PS_INVOCATION_COUNT),
> >   REG64(PS_DEPTH_COUNT),
> >   REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
> > - REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
> >   REG64(MI_PREDICATE_SRC0),
> >   REG64(MI_PREDICATE_SRC1),
> >   REG32(GEN7_3DPRIM_END_OFFSET),
> > @@ -1060,8 +1059,7 @@ bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine)
> >  static bool check_cmd(const struct intel_engine_cs *engine,
> > const struct drm_i915_cmd_descriptor *desc,
> > const u32 *cmd, u32 length,
> > -   const bool is_master,
> > -   bool *oacontrol_set)
> > +   const bool is_master)
> >  {
> >   if (desc->flags & CMD_DESC_SKIP)
> >   return true;
> > @@ -1099,31 +1097,6 @@ static bool check_cmd(const struct intel_engine_cs *engine,
> >   }
> >
> >   /*
> > -  * OACONTROL requires some special handling for
> > -  * writes. We want to make sure that any batch which
> > -  * enables OA also disables it before the end of the
> > -  * batch. The goal is to prevent one process from
> > -  * snooping on the perf data from another process. To do
> > -  * that, we need to check the value that will be written
> > -  * to the register. Hence, limit OACONTROL writes to
> > -  * only MI_LOAD_REGISTER_IMM commands.
> > -  */
> > - if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
> > - if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
> > - DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
> > - return false;
> > -   

[Intel-gfx] [PATCH v9 01/11] drm/i915: Add i915 perf infrastructure

2016-11-22 Thread Robert Bragg
On Tue, Nov 22, 2016 at 1:31 PM, Daniel Vetter  wrote:

> On Tue, Nov 22, 2016 at 02:29:18PM +0100, Daniel Vetter wrote:
> > On Wed, Nov 09, 2016 at 08:00:06PM +, Matthew Auld wrote:
> > > On 7 November 2016 at 19:49, Robert Bragg wrote:
> > > > Adds base i915 perf infrastructure for Gen performance metrics.
> > > >
> > > > This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
> > > > properties to configure a stream of metrics and returns a new fd usable
> > > > with standard VFS system calls including read() to read typed and sized
> > > > records; ioctl() to enable or disable capture and poll() to wait for
> > > > data.
> > > >
> > > > A stream is opened something like:
> > > >
> > > >   uint64_t properties[] = {
> > > >   /* Single context sampling */
> > > >   DRM_I915_PERF_PROP_CTX_HANDLE,ctx_handle,
> > > >
> > > >   /* Include OA reports in samples */
> > > >   DRM_I915_PERF_PROP_SAMPLE_OA, true,
> > > >
> > > >   /* OA unit configuration */
> > > >   DRM_I915_PERF_PROP_OA_METRICS_SET,metrics_set_id,
> > > >   DRM_I915_PERF_PROP_OA_FORMAT, report_format,
> > > >   DRM_I915_PERF_PROP_OA_EXPONENT,   period_exponent,
> > > >};
> > > >struct drm_i915_perf_open_param param = {
> > > >   .flags = I915_PERF_FLAG_FD_CLOEXEC |
> > > >I915_PERF_FLAG_FD_NONBLOCK |
> > > >I915_PERF_FLAG_DISABLED,
> > > >   .properties_ptr = (uint64_t)properties,
> > > >   .num_properties = sizeof(properties) / 16,
> > > >};
> > > >int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);
> > > >
> > > > Records read all start with a common { type, size } header with
> > > > DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
> > > > contain an extensible number of fields and it's the
> > > > DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
> > > > determine what's included in every sample.
> > > >
> > > > No specific streams are supported yet so any attempt to open a stream
> > > > will return an error.
> > > >
> > > > v2:
> > > > use i915_gem_context_get() - Chris Wilson
> > > > v3:
> > > > update read() interface to avoid passing state struct - Chris Wilson
> > > > fix some rebase fallout, with i915-perf init/deinit
> > > > v4:
> > > > s/DRM_IORW/DRM_IOW/ - Emil Velikov
> > > >
> > > > Signed-off-by: Robert Bragg 
> > > > Reviewed-by: Matthew Auld 
> > > > Reviewed-by: Sourab Gupta 
> > > Minor nit, there are a fair few DRM_ERROR's missing a new line.
> >
> > Also, DRM_ERROR for userspace-triggerable failures is no good. igt
> > testcase are supposed to exercise all the invalid stuff, and would then
> > fail if you spam dmesg. Why was this not caught?
> >
> > Fixup patch totally fine, but if this wasn't caught due to missing igt
> > that needs to be fixed, too.
>
> Another nitpick for the future: Enabling new features first and then
> fixing up the fallout is the wrong way round, if someone bisects over this
> range mesa might blow up in really bad ways.
>
> Oh well, this has been out there for way too long, so meh.
>

Fwiw I'm aware of this, and think I've ordered the patches correctly to
avoid bisect problems in Mesa / userspace. This infrastructure patch should
have no fallout to fix for userspace. The command parser changes that
affect userspace were done before adding oacontrol usage to i915-perf and
the cmd parser's EINVAL reporting for access failures was changed *before*
removing oacontrol from the whitelist.

Did I overlook something in particular?

- Robert



> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>


[PATCH libdrm v2] intel: Add a getter for the intel_context ctx_id

2016-11-21 Thread Robert Bragg
Renamed to avoid the seemingly redundant 'context_' infix and note that it's
been reviewed by Matthew Auld.

--- >8 ---

Exposing the u32 context ID makes it possible to define new drm kernel
interfaces based on the same IDs that e.g. execbuf uses to identify a
gem context, that aren't themselves abstracted by libdrm but need to be
used by libdrm/drm_intel_context based clients such as (parts of) i-g-t
or Mesa.

For example this can be used to configure an i915-perf stream to collect
metrics for a specific context.

v2: s/drm_intel_gem_context_get_context_id/drm_intel_gem_context_get_id/

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 intel/intel_bufmgr.h |  2 ++
 intel/intel_bufmgr_gem.c | 11 +++
 2 files changed, 13 insertions(+)

diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index ce4e70d..85e4ff7 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -212,6 +212,8 @@ int drm_intel_bufmgr_gem_get_devid(drm_intel_bufmgr *bufmgr);
 int drm_intel_gem_bo_wait(drm_intel_bo *bo, int64_t timeout_ns);

 drm_intel_context *drm_intel_gem_context_create(drm_intel_bufmgr *bufmgr);
+int drm_intel_gem_context_get_id(drm_intel_context *ctx,
+ uint32_t *ctx_id);
 void drm_intel_gem_context_destroy(drm_intel_context *ctx);
 int drm_intel_gem_bo_context_exec(drm_intel_bo *bo, drm_intel_context *ctx,
  int used, unsigned int flags);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index 15c79b3..5fc022a 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -3184,6 +3184,17 @@ drm_intel_gem_context_create(drm_intel_bufmgr *bufmgr)
 	return context;
 }

+int
+drm_intel_gem_context_get_id(drm_intel_context *ctx, uint32_t *ctx_id)
+{
+	if (ctx == NULL)
+		return -EINVAL;
+
+	*ctx_id = ctx->ctx_id;
+
+	return 0;
+}
+
+
 void
 drm_intel_gem_context_destroy(drm_intel_context *ctx)
 {
-- 
2.10.1
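
A minimal sketch of how a client might use the getter above to feed the
u32 ID into a u64 property value (as an i915-perf stream would want).
fake_context, fake_context_get_id() and ctx_id_to_property() are local
stand-ins so the snippet builds without libdrm; they are assumptions for
illustration, not libdrm API. A real client would include intel_bufmgr.h
and call drm_intel_gem_context_get_id() on a context obtained from
drm_intel_gem_context_create().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for libdrm's opaque drm_intel_context; only the field the
 * getter reads is modelled here (assumption for illustration). */
typedef struct fake_context {
	uint32_t ctx_id;
} fake_context;

/* Same shape as the getter in the patch: the error code is returned
 * separately from the ID written through ctx_id. */
static int fake_context_get_id(fake_context *ctx, uint32_t *ctx_id)
{
	if (ctx == NULL)
		return -EINVAL;

	*ctx_id = ctx->ctx_id;

	return 0;
}

/* Hypothetical caller: widen the u32 ID to the u64 value slot of an
 * (id, value) property pair, propagating any getter error. */
static int ctx_id_to_property(fake_context *ctx, uint64_t *value)
{
	uint32_t id;
	int ret = fake_context_get_id(ctx, &id);

	if (ret)
		return ret;

	*value = id;
	return 0;
}
```

Returning the error separately keeps every u32 value available as a
valid ID, so callers don't need a reserved "invalid" ID to detect
failure.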



[PATCH libdrm] intel: Add a getter for the intel_context ctx_id

2016-11-17 Thread Robert Bragg
I forgot that my recently sent out i915-perf i-g-t tests depend on this utility
api (not just my Mesa / GL_INTEL_performance_query patches).

Not all tests in i-g-t use libdrm to create contexts, but the i915-perf tests
use render_copy (drm_intel_context based) while testing single context
filtering and so want to pluck out the u32 ctx_id for passing to i915-perf.

I made a last moment tweak to the utility to return an error value separate
from a uint32_t output ctx_id pointer, so there will need to be a corresponding
tweak of the perf.c tests I sent out.

Regards,
- Robert

--- >8 ---

Exposing the u32 context ID makes it possible to define new drm kernel
interfaces based on the same IDs that e.g. execbuf uses to identify a
gem context, that aren't themselves abstracted by libdrm but need to be
used by libdrm/drm_intel_context based clients such as (parts of) i-g-t
or Mesa.

For example this can be used to configure an i915-perf stream to collect
metrics for a specific context.

Signed-off-by: Robert Bragg 
---
 intel/intel_bufmgr.h |  2 ++
 intel/intel_bufmgr_gem.c | 11 +++
 2 files changed, 13 insertions(+)

diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index ce4e70d..7530fa5 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -212,6 +212,8 @@ int drm_intel_bufmgr_gem_get_devid(drm_intel_bufmgr *bufmgr);
 int drm_intel_gem_bo_wait(drm_intel_bo *bo, int64_t timeout_ns);

 drm_intel_context *drm_intel_gem_context_create(drm_intel_bufmgr *bufmgr);
+int drm_intel_gem_context_get_context_id(drm_intel_context *ctx,
+uint32_t *ctx_id);
 void drm_intel_gem_context_destroy(drm_intel_context *ctx);
 int drm_intel_gem_bo_context_exec(drm_intel_bo *bo, drm_intel_context *ctx,
  int used, unsigned int flags);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index 15c79b3..cefe4a7 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -3184,6 +3184,17 @@ drm_intel_gem_context_create(drm_intel_bufmgr *bufmgr)
 	return context;
 }

+int
+drm_intel_gem_context_get_context_id(drm_intel_context *ctx, uint32_t *ctx_id)
+{
+	if (ctx == NULL)
+		return -EINVAL;
+
+	*ctx_id = ctx->ctx_id;
+
+	return 0;
+}
+
+
 void
 drm_intel_gem_context_destroy(drm_intel_context *ctx)
 {
-- 
2.10.1



[PATCH] headers: add i915-perf interface to i915_drm.h

2016-11-10 Thread Robert Bragg
Ah, yup, I missed the [PATCH libdrm] convention, sorry.

I noticed some other updates based on make headers_install, though I also
saw a bunch of unrelated noise with e.g. mocs and C++ guard changes so
figured I'd avoid those for now, while I was pretty much expecting I'd need
to re-send this later. I just wanted to get a patch out in case it could be
convenient for folks looking to run my i915-perf igt tests.

I don't have a meaningful tree/sha to reference yet, since the order of
landing the i915-perf changes is a chicken/egg problem atm.

tbh I didn't read the README before sending and not sure I'd expect to find
this kind of info there. Maybe if there were a CONTRIBUTING doc or
something along those lines a note could go there. I probably should have
just thought to check for a format-patch infix since libdrm isn't the only
fd.o project with this, it was probably a bit of laziness on my part so
sorry about that.

thanks for the note though, I've updated my git [format] config now.

- Robert


On Thu, Nov 10, 2016 at 3:45 PM, Emil Velikov 
wrote:

> Hi Robert
>
> Just a couple of trivial suggestions:
>  - git config --local format.subjectPrefix "PATCH libdrm"
>  - make sure that the headers are generated via `make headers_install'
> and state the tree/sha that you've used.
> See 9af2ccdef cc9a53f07 or 89cdda3d5.
>
> Thanks
> Emil
> P.S. /me goes to add a README with the above instructions
>



[PATCH v9 09/11] drm/i915: add dev.i915.oa_max_sample_rate sysctl

2016-11-10 Thread Robert Bragg
On Wed, Nov 9, 2016 at 7:52 PM, Matthew Auld  wrote:

> On 7 November 2016 at 19:49, Robert Bragg  wrote:
> > The maximum OA sampling frequency is now configurable via a
> > dev.i915.oa_max_sample_rate sysctl parameter.
> >
> > Following the precedent set by perf's similar
> > kernel.perf_event_max_sample_rate the default maximum rate is 10Hz
> >
> > Signed-off-by: Robert Bragg 
> > ---
> >  drivers/gpu/drm/i915/i915_perf.c | 61 ++
> ++
> >  1 file changed, 50 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c
> b/drivers/gpu/drm/i915/i915_perf.c
> > index e51c1d8..1a87fe9 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -82,6 +82,21 @@ static u32 i915_perf_stream_paranoid = true;
> >  #define INVALID_CTX_ID 0x
> >
> >
> > +/* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
> > + *
> > + * 160ns is the smallest sampling period we can theoretically program
> the OA
> > + * unit with on Haswell, corresponding to 6.25MHz.
> > + */
> > +static int oa_sample_rate_hard_limit = 625;
> > +
> > +/* Theoretically we can program the OA unit to sample every 160ns but
> don't
> > + * allow that by default unless root...
> > + *
> > + * The default threshold of 10Hz is based on perf's similar
> > + * kernel.perf_event_max_sample_rate sysctl parameter.
> > + */
> > +static u32 i915_oa_max_sample_rate = 10;
> > +
> >  /* XXX: beware if future OA HW adds new report formats that the current
> >   * code assumes all reports have a power-of-two size and ~(size - 1) can
> >   * be used as a mask to align the OA tail pointer.
> > @@ -1314,6 +1329,7 @@ static int read_properties_unlocked(struct
> drm_i915_private *dev_priv,
> > }
> >
> > for (i = 0; i < n_props; i++) {
> > +   u64 oa_period, oa_freq_hz;
> > u64 id, value;
> > int ret;
> >
> > @@ -1359,21 +1375,35 @@ static int read_properties_unlocked(struct
> drm_i915_private *dev_priv,
> > return -EINVAL;
> > }
> >
> > -   /* NB: The exponent represents a period as
> follows:
> > -*
> > -*   80ns * 2^(period_exponent + 1)
> > -*
> > -* Theoretically we can program the OA unit to
> sample
> > +   /* Theoretically we can program the OA unit to
> sample
> >  * every 160ns but don't allow that by default
> unless
> >  * root.
> >  *
> > -* Referring to perf's
> > -* kernel.perf_event_max_sample_rate for a
> precedent
> > -* (10 by default); with an OA exponent of 6
> we get
> > -* a period of 10.240 microseconds -just under
> 10Hz
> > +* On Haswell the period is derived from the
> exponent
> > +* as:
> > +*
> > +*   period = 80ns * 2^(exponent + 1)
> > +*/
> > +   BUILD_BUG_ON(sizeof(oa_period) != 8);
> > +   oa_period = 80ull * (2ull << value);
> > +
> > +   /* This check is primarily to ensure that
> oa_period <=
> > +* UINT32_MAX (before passing to do_div which
> only
> > +* accepts a u32 denominator), but we can also
> skip
> > +* checking anything < 1Hz which implicitly
> can't be
> > +* limited via an integer oa_max_sample_rate.
> >  */
> > -   if (value < 6 && !capable(CAP_SYS_ADMIN)) {
> > -   DRM_ERROR("Minimum OA sampling exponent
> is 6 without root privileges\n");
> > +   if (oa_period <= NSEC_PER_SEC) {
> > +   u64 tmp = NSEC_PER_SEC;
> > +   do_div(tmp, oa_period);
> > +   oa_freq_hz = tmp;
> > +   } else
> > +   oa_freq_hz = 0;
> > +
> > + 

[PATCH] headers: add i915-perf interface to i915_drm.h

2016-11-10 Thread Robert Bragg
This interface gives access to Gen graphics Observability counters

Signed-off-by: Robert Bragg 
---
 include/drm/i915_drm.h | 134 +
 1 file changed, 134 insertions(+)

diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index eb611a7..9eb2b7b 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -230,6 +230,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_USERPTR   0x33
 #define DRM_I915_GEM_CONTEXT_GETPARAM  0x34
 #define DRM_I915_GEM_CONTEXT_SETPARAM  0x35
+#define DRM_I915_PERF_OPEN 0x36

 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -283,6 +284,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_USERPTR			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_USERPTR, struct drm_i915_gem_userptr)
 #define DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_GETPARAM, struct drm_i915_gem_context_param)
 #define DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_SETPARAM, struct drm_i915_gem_context_param)
+#define DRM_IOCTL_I915_PERF_OPEN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_OPEN, struct drm_i915_perf_open_param)

 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1172,4 +1174,136 @@ struct drm_i915_gem_context_param {
__u64 value;
 };

+enum drm_i915_oa_format {
+   I915_OA_FORMAT_A13 = 1,
+   I915_OA_FORMAT_A29,
+   I915_OA_FORMAT_A13_B8_C8,
+   I915_OA_FORMAT_B4_C8,
+   I915_OA_FORMAT_A45_B8_C8,
+   I915_OA_FORMAT_B4_C8_A16,
+   I915_OA_FORMAT_C4_B8,
+
+   I915_OA_FORMAT_MAX  /* non-ABI */
+};
+
+enum drm_i915_perf_property_id {
+   /**
+* Open the stream for a specific context handle (as used with
+* execbuffer2). A stream opened for a specific context this way
+* won't typically require root privileges.
+*/
+   DRM_I915_PERF_PROP_CTX_HANDLE = 1,
+
+   /**
+* A value of 1 requests the inclusion of raw OA unit reports as
+* part of stream samples.
+*/
+   DRM_I915_PERF_PROP_SAMPLE_OA,
+
+   /**
+* The value specifies which set of OA unit metrics should be
+* configured, defining the contents of any OA unit reports.
+*/
+   DRM_I915_PERF_PROP_OA_METRICS_SET,
+
+   /**
+* The value specifies the size and layout of OA unit reports.
+*/
+   DRM_I915_PERF_PROP_OA_FORMAT,
+
+   /**
+* Specifying this property implicitly requests periodic OA unit
+* sampling and (at least on Haswell) the sampling period is derived
+* from this exponent as follows:
+*
+*   80ns * 2^(period_exponent + 1)
+*/
+   DRM_I915_PERF_PROP_OA_EXPONENT,
+
+   DRM_I915_PERF_PROP_MAX /* non-ABI */
+};
+
+struct drm_i915_perf_open_param {
+   __u32 flags;
+#define I915_PERF_FLAG_FD_CLOEXEC  (1<<0)
+#define I915_PERF_FLAG_FD_NONBLOCK (1<<1)
+#define I915_PERF_FLAG_DISABLED	(1<<2)
+
+   /** The number of u64 (id, value) pairs */
+   __u32 num_properties;
+
+   /**
+* Pointer to array of u64 (id, value) pairs configuring the stream
+* to open.
+*/
+   __u64 properties_ptr;
+};
+
+/**
+ * Enable data capture for a stream that was either opened in a disabled state
+ * via I915_PERF_FLAG_DISABLED or was later disabled via
+ * I915_PERF_IOCTL_DISABLE.
+ *
+ * It is intended to be cheaper to disable and enable a stream than it may be
+ * to close and re-open a stream with the same configuration.
+ *
+ * It's undefined whether any pending data for the stream will be lost.
+ */
+#define I915_PERF_IOCTL_ENABLE _IO('i', 0x0)
+
+/**
+ * Disable data capture for a stream.
+ *
+ * It is an error to try and read a stream that is disabled.
+ */
+#define I915_PERF_IOCTL_DISABLE	_IO('i', 0x1)
+
+/**
+ * Common to all i915 perf records
+ */
+struct drm_i915_perf_record_header {
+   __u32 type;
+   __u16 pad;
+   __u16 size;
+};
+
+enum drm_i915_perf_record_type {
+
+   /**
+* Samples are the work horse record type whose contents are extensible
+* and defined when opening an i915 perf stream based on the given
+* properties.
+*
+* Boolean properties following the naming convention
+* DRM_I915_PERF_SAMPLE_xyz_PROP request the inclusion of 'xyz' data in
+* every sample.
+*
+* The order of these sample properties given by userspace has no
+* effect on the ordering of data within a sample. The order is
+* documented here.
+*
+* struct 
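
A rough sketch of how userspace might fill in the open parameters
described above, as a flat array of u64 (id, value) pairs. The enum
values and struct layout are copied from the patch; the names
perf_open_param, PROP_* and build_open_param() are local stand-ins
(assumptions) so that the snippet builds without the updated
i915_drm.h. The actual DRM_IOCTL_I915_PERF_OPEN call is left out.

```c
#include <stdint.h>

/* Property IDs, numbered as in enum drm_i915_perf_property_id above. */
enum {
	PROP_CTX_HANDLE = 1,
	PROP_SAMPLE_OA,
	PROP_OA_METRICS_SET,
	PROP_OA_FORMAT,
	PROP_OA_EXPONENT,
};

/* Same layout as struct drm_i915_perf_open_param in the patch. */
struct perf_open_param {
	uint32_t flags;
	uint32_t num_properties;
	uint64_t properties_ptr;
};

#define PERF_FLAG_FD_CLOEXEC (1u << 0)

/* Hypothetical helper: writes five (id, value) pairs into props[],
 * which must have room for 10 u64s, and points *param at them.
 * Returns the number of pairs written. */
static uint32_t build_open_param(struct perf_open_param *param,
				 uint64_t *props, uint32_t ctx_handle,
				 uint64_t metrics_set, uint64_t format,
				 uint64_t exponent)
{
	uint32_t n = 0;

	props[n * 2] = PROP_CTX_HANDLE;
	props[n * 2 + 1] = ctx_handle;
	n++;
	props[n * 2] = PROP_SAMPLE_OA;
	props[n * 2 + 1] = 1; /* request raw OA reports in samples */
	n++;
	props[n * 2] = PROP_OA_METRICS_SET;
	props[n * 2 + 1] = metrics_set;
	n++;
	props[n * 2] = PROP_OA_FORMAT;
	props[n * 2 + 1] = format;
	n++;
	props[n * 2] = PROP_OA_EXPONENT;
	props[n * 2 + 1] = exponent;
	n++;

	param->flags = PERF_FLAG_FD_CLOEXEC;
	param->num_properties = n; /* counts pairs, not u64s */
	param->properties_ptr = (uint64_t)(uintptr_t)props;

	return n;
}
```

Note that num_properties counts (id, value) pairs, so an array of ten
u64s describes five properties.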

[PATCH v2] drm/i915: don't whitelist oacontrol in cmd parser

2016-11-08 Thread Robert Bragg
This v2 patch bumps the command parser version so it can be referenced in
corresponding i-g-t gem_exec_parse changes.

--- >8 ---

Being able to program OACONTROL from a non-privileged batch buffer is
not sufficient to be able to configure the OA unit. This was originally
allowed to help enable Mesa to expose OA counters via the
INTEL_performance_query extension, but the current implementation based
on programming OACONTROL via a batch buffer isn't able to report useable
data without a more complete OA unit configuration. Mesa handles the
possibility that writes to OACONTROL may not be allowed and so only
advertises the extension after explicitly testing that a write to
OACONTROL succeeds. Based on this; removing OACONTROL from the whitelist
should be ok for userspace.

Removing this simplifies adding a new kernel api for configuring the OA
unit without needing to consider the possibility that userspace might
trample on OACONTROL state which we'd like to start managing within
the kernel instead. In particular running any Mesa based GL application
currently results in clearing OACONTROL when initializing which would
disable the capturing of metrics.

v2:
This bumps the command parser version from 8 to 9, as the change is
visible to userspace.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
Reviewed-by: Sourab Gupta 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 42 --
 1 file changed, 5 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index c9d2ecd..f5762cd 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -450,7 +450,6 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
REG64(PS_INVOCATION_COUNT),
REG64(PS_DEPTH_COUNT),
REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-   REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
REG64(MI_PREDICATE_SRC0),
REG64(MI_PREDICATE_SRC1),
REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1060,8 +1059,7 @@ bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine)
 static bool check_cmd(const struct intel_engine_cs *engine,
  const struct drm_i915_cmd_descriptor *desc,
  const u32 *cmd, u32 length,
- const bool is_master,
- bool *oacontrol_set)
+ const bool is_master)
 {
if (desc->flags & CMD_DESC_SKIP)
return true;
@@ -1099,31 +1097,6 @@ static bool check_cmd(const struct intel_engine_cs *engine,
}

/*
-* OACONTROL requires some special handling for
-* writes. We want to make sure that any batch which
-* enables OA also disables it before the end of the
-* batch. The goal is to prevent one process from
-* snooping on the perf data from another process. To do
-* that, we need to check the value that will be written
-* to the register. Hence, limit OACONTROL writes to
-* only MI_LOAD_REGISTER_IMM commands.
-*/
-   if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
-   if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRM to 
OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_REG) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRR to 
OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1))
-   *oacontrol_set = (cmd[offset + 1] != 0);
-   }
-
-   /*
 * Check the value written to the register against the
 * allowed mask/value pair given in the whitelist entry.
 */
@@ -1214,7 +1187,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
u32 *cmd, *batch_end;
struct drm_i915_cmd_descriptor default_desc = noop_desc;
const struct drm_i915_cmd_descriptor *desc = &default_desc;
-   bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
bool needs_clflush_after = false;
int ret = 0;

@@ -1270,8 +1242,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
break;
}

-   if (!check_cmd(engine, desc, cmd, length, is_

[PATCH v9 09/11] drm/i915: add dev.i915.oa_max_sample_rate sysctl

2016-11-08 Thread Robert Bragg
On Tue, Nov 8, 2016 at 6:19 AM, sourab gupta  wrote:

> On Mon, 2016-11-07 at 11:49 -0800, Robert Bragg wrote:
> > The maximum OA sampling frequency is now configurable via a
> > dev.i915.oa_max_sample_rate sysctl parameter.
> >
> > Following the precedent set by perf's similar
> > kernel.perf_event_max_sample_rate the default maximum rate is 10Hz
> >
> > Signed-off-by: Robert Bragg 
> > ---
> >  drivers/gpu/drm/i915/i915_perf.c | 61 ++
> ++
> >  1 file changed, 50 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c
> b/drivers/gpu/drm/i915/i915_perf.c
> > index e51c1d8..1a87fe9 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -82,6 +82,21 @@ static u32 i915_perf_stream_paranoid = true;
> >  #define INVALID_CTX_ID 0x
> >
> >
> > +/* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
> > + *
> > + * 160ns is the smallest sampling period we can theoretically program
> the OA
> > + * unit with on Haswell, corresponding to 6.25MHz.
> > + */
> > +static int oa_sample_rate_hard_limit = 625;
> There's no check for 'oa_sample_rate_hard_limit' anywhere below.
>

It's in the struct ctl_table oa_table[] declaration of the
"oa_max_sample_rate" parameter, assigned to .extra2 which is referenced by
the proc_dointvec_minmax validation handler for the parameter.
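
To illustrate the idea, here is a small userspace imitation of that
kind of range check. It is only a sketch, not the kernel's
implementation: struct int_range and store_if_in_range() are made-up
names, with min/max playing the role of .extra1/.extra2.

```c
#include <errno.h>
#include <stddef.h>

/* Optional bounds, analogous to a ctl_table's .extra1 / .extra2
 * pointers (hypothetical type for illustration). */
struct int_range {
	const int *min;
	const int *max;
};

/* Store new_value in *dst only if it lies within the given bounds;
 * otherwise leave *dst untouched and report -EINVAL. */
static int store_if_in_range(int *dst, int new_value,
			     const struct int_range *r)
{
	if ((r->min && new_value < *r->min) ||
	    (r->max && new_value > *r->max))
		return -EINVAL;

	*dst = new_value;
	return 0;
}
```

With min = 0 and max = 6250000 (the 6.25MHz that corresponds to a 160ns
period), a write of 6250001 is rejected and the previous limit stays in
place, so no separate check of the hard limit is needed where the OA
exponent is validated.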


>
> > +
> > +/* Theoretically we can program the OA unit to sample every 160ns but
> don't
> > + * allow that by default unless root...
> > + *
> > + * The default threshold of 10Hz is based on perf's similar
> > + * kernel.perf_event_max_sample_rate sysctl parameter.
> > + */
> > +static u32 i915_oa_max_sample_rate = 10;
> > +
> >  /* XXX: beware if future OA HW adds new report formats that the current
> >   * code assumes all reports have a power-of-two size and ~(size - 1) can
> >   * be used as a mask to align the OA tail pointer.
> > @@ -1314,6 +1329,7 @@ static int read_properties_unlocked(struct
> drm_i915_private *dev_priv,
> >   }
> >
> >   for (i = 0; i < n_props; i++) {
> > + u64 oa_period, oa_freq_hz;
> >   u64 id, value;
> >   int ret;
> >
> > @@ -1359,21 +1375,35 @@ static int read_properties_unlocked(struct
> drm_i915_private *dev_priv,
> >   return -EINVAL;
> >   }
> >
> > - /* NB: The exponent represents a period as follows:
> > -  *
> > -  *   80ns * 2^(period_exponent + 1)
> > -  *
> > -  * Theoretically we can program the OA unit to
> sample
> > + /* Theoretically we can program the OA unit to
> sample
> >* every 160ns but don't allow that by default
> unless
> >* root.
> >*
> > -  * Referring to perf's
> > -  * kernel.perf_event_max_sample_rate for a
> precedent
> > -  * (10 by default); with an OA exponent of 6
> we get
> > -  * a period of 10.240 microseconds -just under
> 10Hz
> > +  * On Haswell the period is derived from the
> exponent
> > +  * as:
> > +  *
> > +  *   period = 80ns * 2^(exponent + 1)
> > +  */
> > + BUILD_BUG_ON(sizeof(oa_period) != 8);
> > + oa_period = 80ull * (2ull << value);
> I assume now that there'll be a platform specific check for 80ull, while
> programming oa_period, for subquent Gen8+ platforms, which should be
> fine.
>

Yeah, this code will need adapting for gen9+. I guess we'll change it to
work in terms of ((2<
> > +
> > + /* This check is primarily to ensure that
> oa_period <=
> > +  * UINT32_MAX (before passing to do_div which only
> > +  * accepts a u32 denominator), but we can also skip
> > +  * checking anything < 1Hz which implicitly can't
> be
> > +  * limited via an integer oa_max_sample_rate.
> >*/
> > - if (value < 6 && !capable(CAP_SYS_ADMIN)) {
> > -  

[PATCH v9 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c

2016-11-07 Thread Robert Bragg
In particular this tries to capture for posterity some of the early
challenges we had with using the core perf infrastructure in case we
ever want to revisit adapting perf for device metrics.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_perf.c | 163 +++
 1 file changed, 163 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 1a87fe9..9551282 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -24,6 +24,169 @@
  *   Robert Bragg 
  */

+
+/**
+ * DOC: i915 Perf, streaming API for GPU metrics
+ *
+ * Gen graphics supports a large number of performance counters that can help
+ * driver and application developers understand and optimize their use of the
+ * GPU.
+ *
+ * This i915 perf interface enables userspace to configure and open a file
+ * descriptor representing a stream of GPU metrics which can then be read() as
+ * a stream of sample records.
+ *
+ * The interface is particularly suited to exposing buffered metrics that are
+ * captured by DMA from the GPU, unsynchronized with and unrelated to the CPU.
+ *
+ * Streams representing a single context are accessible to applications with a
+ * corresponding drm file descriptor, such that OpenGL can use the interface
+ * without special privileges. Access to system-wide metrics requires root
+ * privileges by default, unless changed via the dev.i915.perf_event_paranoid
+ * sysctl option.
+ *
+ *
+ * The interface was initially inspired by the core Perf infrastructure but
+ * some notable differences are:
+ *
+ * i915 perf file descriptors represent a "stream" instead of an "event"; where
+ * a perf event primarily corresponds to a single 64bit value, while a stream
+ * might sample sets of tightly-coupled counters, depending on the
+ * configuration.  For example the Gen OA unit isn't designed to support
+ * orthogonal configurations of individual counters; it's configured for a set
+ * of related counters. Samples for an i915 perf stream capturing OA metrics
+ * will include a set of counter values packed in a compact HW specific format.
+ * The OA unit supports a number of different packing formats which can be
+ * selected by the user opening the stream. Perf has support for grouping
+ * events, but each event in the group is configured, validated and
+ * authenticated individually with separate system calls.
+ *
+ * i915 perf stream configurations are provided as an array of u64 (key,value)
+ * pairs, instead of a fixed struct with multiple miscellaneous config members,
+ * interleaved with event-type specific members.
+ *
+ * i915 perf doesn't support exposing metrics via an mmap'd circular buffer.
+ * The supported metrics are being written to memory by the GPU unsynchronized
+ * with the CPU, using HW specific packing formats for counter sets. Sometimes
+ * the constraints on HW configuration require reports to be filtered before it
+ * would be acceptable to expose them to unprivileged applications - to hide
+ * the metrics of other processes/contexts. For these use cases a read() based
+ * interface is a good fit, and provides an opportunity to filter data as it
+ * gets copied from the GPU mapped buffers to userspace buffers.
+ *
+ *
+ * Some notes regarding Linux Perf:
+ * 
+ *
+ * The first prototype of this driver was based on the core perf
+ * infrastructure, and while we did make that mostly work, with some changes to
+ * perf, we found we were breaking or working around too many assumptions baked
+ * into perf's currently cpu centric design.
+ *
+ * In the end we didn't see a clear benefit to making perf's implementation and
+ * interface more complex by changing design assumptions while we knew we still
+ * wouldn't be able to use any existing perf based userspace tools.
+ *
+ * Also considering the Gen specific nature of the Observability hardware and
+ * how userspace will sometimes need to combine i915 perf OA metrics with
+ * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're
+ * expecting the interface to be used by a platform specific userspace such as
+ * OpenGL or tools. This is to say; we aren't inherently missing out on having
+ * a standard vendor/architecture agnostic interface by not using perf.
+ *
+ *
+ * For posterity, in case we might re-visit trying to adapt core perf to be
+ * better suited to exposing i915 metrics these were the main pain points we
+ * hit:
+ *
+ * - The perf based OA PMU driver broke some significant design assumptions:
+ *
+ *   Existing perf pmus are used for profiling work on a cpu and we were
+ *   introducing the idea of _IS_DEVICE pmus with different security
+ *   implications, the need to fake cpu-related data (such as user/kernel
+ *   registers) to fit with perf's current design, and addi

[PATCH v9 10/11] drm/i915: Add more Haswell OA metric sets

2016-11-07 Thread Robert Bragg
This adds 'compute', 'compute extended', 'memory reads', 'memory writes'
and 'sampler balance' metric sets for Haswell.

The code is auto generated from an XML description of metric sets,
currently maintained in gputop, ref:

 https://github.com/rib/gputop
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_oa_hsw.c | 559 -
 1 file changed, 558 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 6af25cf..4ddf756 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -31,9 +31,14 @@

 enum metric_set_id {
METRIC_SET_ID_RENDER_BASIC = 1,
+   METRIC_SET_ID_COMPUTE_BASIC,
+   METRIC_SET_ID_COMPUTE_EXTENDED,
+   METRIC_SET_ID_MEMORY_READS,
+   METRIC_SET_ID_MEMORY_WRITES,
+   METRIC_SET_ID_SAMPLER_BALANCE,
 };

-int i915_oa_n_builtin_metric_sets_hsw = 1;
+int i915_oa_n_builtin_metric_sets_hsw = 6;

 static const struct i915_oa_reg b_counter_config_render_basic[] = {
{ _MMIO(0x2724), 0x0080 },
@@ -112,6 +117,298 @@ get_render_basic_mux_config(struct drm_i915_private *dev_priv,
return mux_config_render_basic;
 }

+static const struct i915_oa_reg b_counter_config_compute_basic[] = {
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2718), 0x },
+   { _MMIO(0x271c), 0x },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2724), 0x0080 },
+   { _MMIO(0x2728), 0x },
+   { _MMIO(0x272c), 0x },
+   { _MMIO(0x2740), 0x },
+   { _MMIO(0x2744), 0x },
+   { _MMIO(0x2748), 0x },
+   { _MMIO(0x274c), 0x },
+   { _MMIO(0x2750), 0x },
+   { _MMIO(0x2754), 0x },
+   { _MMIO(0x2758), 0x },
+   { _MMIO(0x275c), 0x },
+   { _MMIO(0x236c), 0x },
+};
+
+static const struct i915_oa_reg mux_config_compute_basic[] = {
+   { _MMIO(0x253a4), 0x },
+   { _MMIO(0x2681c), 0x01f00800 },
+   { _MMIO(0x26820), 0x1000 },
+   { _MMIO(0x2781c), 0x01f00800 },
+   { _MMIO(0x26520), 0x0007 },
+   { _MMIO(0x265a0), 0x0007 },
+   { _MMIO(0x25380), 0x0010 },
+   { _MMIO(0x2538c), 0x0030 },
+   { _MMIO(0x25384), 0xaa8a },
+   { _MMIO(0x25404), 0x },
+   { _MMIO(0x26800), 0x4202 },
+   { _MMIO(0x26808), 0x00605817 },
+   { _MMIO(0x2680c), 0x10001005 },
+   { _MMIO(0x26804), 0x },
+   { _MMIO(0x27800), 0x0102 },
+   { _MMIO(0x27808), 0x0c0701e0 },
+   { _MMIO(0x2780c), 0x000200a0 },
+   { _MMIO(0x27804), 0x },
+   { _MMIO(0x26484), 0x4400 },
+   { _MMIO(0x26704), 0x4400 },
+   { _MMIO(0x26500), 0x0006 },
+   { _MMIO(0x26510), 0x0001 },
+   { _MMIO(0x26504), 0x8800 },
+   { _MMIO(0x26580), 0x0006 },
+   { _MMIO(0x26590), 0x0020 },
+   { _MMIO(0x26584), 0x },
+   { _MMIO(0x26104), 0x5582 },
+   { _MMIO(0x26184), 0xaa86 },
+   { _MMIO(0x25420), 0x08320c83 },
+   { _MMIO(0x25424), 0x06820c83 },
+   { _MMIO(0x2541c), 0x },
+   { _MMIO(0x25428), 0x0c03 },
+};
+
+static const struct i915_oa_reg *
+get_compute_basic_mux_config(struct drm_i915_private *dev_priv,
+int *len)
+{
+   *len = ARRAY_SIZE(mux_config_compute_basic);
+   return mux_config_compute_basic;
+}
+
+static const struct i915_oa_reg b_counter_config_compute_extended[] = {
+   { _MMIO(0x2724), 0xf080 },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2714), 0xf080 },
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2770), 0x0007fe2a },
+   { _MMIO(0x2774), 0xff00 },
+   { _MMIO(0x2778), 0x0007fe6a },
+   { _MMIO(0x277c), 0xff00 },
+   { _MMIO(0x2780), 0x0007fe92 },
+   { _MMIO(0x2784), 0xff00 },
+   { _MMIO(0x2788), 0x0007fea2 },
+   { _MMIO(0x278c), 0xff00 },
+   { _MMIO(0x2790), 0x0007fe32 },
+   { _MMIO(0x2794), 0xff00 },
+   { _MMIO(0x2798), 0x0007fe9a },
+   { _MMIO(0x279c), 0xff00 },
+   { _MMIO(0x27a0), 0x0007ff23 },
+   { _MMIO(0x27a4), 0xff00 },
+   { _MMIO(0x27a8), 0x0007fff3 },
+   { _MMIO(0x27ac), 0xfffe },
+};
+
+static const struct i915_oa_reg mux_config_compute_extended[] = {
+   { _MMIO(0x2681c), 0x3eb00800 },
+   { _MMIO(0x26820), 0x0090 },
+   { _MMIO(0x25384), 0x02aa },
+   { _MMIO(0x25404), 0x03ff },
+   { _MMIO(0x26800), 0x00142284 },
+   { _MMIO(0x26808), 0x0e629062 },
+   { _MMIO(0x2680c), 0x3f6f55cb },
+   { _MMIO(0x26810), 0x0014 },
+  

[PATCH v9 09/11] drm/i915: add dev.i915.oa_max_sample_rate sysctl

2016-11-07 Thread Robert Bragg
The maximum OA sampling frequency is now configurable via a
dev.i915.oa_max_sample_rate sysctl parameter.

Following the precedent set by perf's similar
kernel.perf_event_max_sample_rate the default maximum rate is 100000Hz

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_perf.c | 61 
 1 file changed, 50 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e51c1d8..1a87fe9 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -82,6 +82,21 @@ static u32 i915_perf_stream_paranoid = true;
 #define INVALID_CTX_ID 0x


+/* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
+ *
+ * 160ns is the smallest sampling period we can theoretically program the OA
+ * unit with on Haswell, corresponding to 6.25MHz.
+ */
+static int oa_sample_rate_hard_limit = 6250000;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The default threshold of 100000Hz is based on perf's similar
+ * kernel.perf_event_max_sample_rate sysctl parameter.
+ */
+static u32 i915_oa_max_sample_rate = 100000;
+
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
  * be used as a mask to align the OA tail pointer.
@@ -1314,6 +1329,7 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
}

for (i = 0; i < n_props; i++) {
+   u64 oa_period, oa_freq_hz;
u64 id, value;
int ret;

@@ -1359,21 +1375,35 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
return -EINVAL;
}

-   /* NB: The exponent represents a period as follows:
-*
-*   80ns * 2^(period_exponent + 1)
-*
-* Theoretically we can program the OA unit to sample
+   /* Theoretically we can program the OA unit to sample
 * every 160ns but don't allow that by default unless
 * root.
 *
-* Referring to perf's
-* kernel.perf_event_max_sample_rate for a precedent
-* (100000 by default); with an OA exponent of 6 we get
-* a period of 10.240 microseconds -just under 100000Hz
+* On Haswell the period is derived from the exponent
+* as:
+*
+*   period = 80ns * 2^(exponent + 1)
+*/
+   BUILD_BUG_ON(sizeof(oa_period) != 8);
+   oa_period = 80ull * (2ull << value);
+
+   /* This check is primarily to ensure that oa_period <=
+* UINT32_MAX (before passing to do_div which only
+* accepts a u32 denominator), but we can also skip
+* checking anything < 1Hz which implicitly can't be
+* limited via an integer oa_max_sample_rate.
 */
-   if (value < 6 && !capable(CAP_SYS_ADMIN)) {
-   DRM_ERROR("Minimum OA sampling exponent is 6 without root privileges\n");
+   if (oa_period <= NSEC_PER_SEC) {
+   u64 tmp = NSEC_PER_SEC;
+   do_div(tmp, oa_period);
+   oa_freq_hz = tmp;
+   } else
+   oa_freq_hz = 0;
+
+   if (oa_freq_hz > i915_oa_max_sample_rate &&
+   !capable(CAP_SYS_ADMIN)) {
+   DRM_ERROR("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root privileges\n",
+ i915_oa_max_sample_rate);
return -EACCES;
}

@@ -1481,6 +1511,15 @@ static struct ctl_table oa_table[] = {
 .extra1 = &zero,
 .extra2 = &one,
 },
+   {
+.procname = "oa_max_sample_rate",
+.data = &i915_oa_max_sample_rate,
+.maxlen = sizeof(i915_oa_max_sample_rate),
+.mode = 0644,
+.proc_handler = proc_dointvec_minmax,
+.extra1 = &zero,
+.extra2 = &oa_sample_rate_hard_limit,
+},
{}
 };

-- 
2.10.1
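
To make the arithmetic behind the limit concrete, here is a minimal
userspace sketch of the exponent-to-frequency conversion this patch
performs, assuming the Haswell formula quoted in the comment above
(period = 80ns * 2^(exponent + 1)). The helper names, the overflow
guard and the is_admin flag (standing in for capable(CAP_SYS_ADMIN))
are illustrative assumptions and not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Haswell: period = 80ns * 2^(exponent + 1).
 * Hypothetical guard: 80 * 2^(57 + 1) would overflow a u64, so saturate.
 */
static uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
	if (exponent > 56)
		return UINT64_MAX;
	return 80ull * (2ull << exponent);
}

/* Anything slower than 1Hz is reported as 0Hz, mirroring the patch. */
static uint64_t oa_period_to_freq_hz(uint64_t period_ns)
{
	if (period_ns == 0 || period_ns > NSEC_PER_SEC)
		return 0;
	return NSEC_PER_SEC / period_ns;
}

/* True if the requested exponent passes the sample-rate check. */
static bool oa_exponent_allowed(unsigned int exponent, uint64_t max_rate_hz,
				bool is_admin)
{
	uint64_t freq = oa_period_to_freq_hz(oa_exponent_to_period_ns(exponent));

	return is_admin || freq <= max_rate_hz;
}
```

For a limit of 100000Hz, exponent 6 gives a 10240ns period (97656Hz)
and would pass for an unprivileged caller, while exponent 5 (5120ns,
195312Hz) would not.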



[PATCH v9 08/11] drm/i915: Add dev.i915.perf_stream_paranoid sysctl option

2016-11-07 Thread Robert Bragg
Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
Reviewed-by: Sourab Gupta 
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 50 +++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 15fba6b..8962bfd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2188,6 +2188,7 @@ struct drm_i915_private {
bool initialized;

struct kobject *metrics_kobj;
+   struct ctl_table_header *sysctl_header;

struct mutex lock;
struct list_head streams;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c427cd8c..e51c1d8 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -64,6 +64,11 @@
 #define POLL_FREQUENCY 200
 #define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY)

+/* for sysctl proc_dointvec_minmax of dev.i915.perf_stream_paranoid */
+static int zero;
+static int one = 1;
+static u32 i915_perf_stream_paranoid = true;
+
 /* The maximum exponent the hardware accepts is 63 (essentially it selects one
  * of the 64bit timestamp bits to trigger reports from) but there's currently
  * no known use case for sampling as infrequently as once per 47 thousand years.
@@ -1207,7 +1212,13 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
}
}

-   if (!specific_ctx && !capable(CAP_SYS_ADMIN)) {
+   /* Similar to perf's kernel.perf_event_paranoid sysctl option
+* we check a dev.i915.perf_stream_paranoid sysctl option
+* to determine if it's ok to access system wide OA counters
+* without CAP_SYS_ADMIN privileges.
+*/
+   if (!specific_ctx &&
+   i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n");
ret = -EACCES;
goto err_ctx;
@@ -1460,6 +1471,39 @@ void i915_perf_unregister(struct drm_i915_private *dev_priv)
dev_priv->perf.metrics_kobj = NULL;
 }

+static struct ctl_table oa_table[] = {
+   {
+.procname = "perf_stream_paranoid",
+.data = &i915_perf_stream_paranoid,
+.maxlen = sizeof(i915_perf_stream_paranoid),
+.mode = 0644,
+.proc_handler = proc_dointvec_minmax,
+.extra1 = &zero,
+.extra2 = &one,
+},
+   {}
+};
+
+static struct ctl_table i915_root[] = {
+   {
+.procname = "i915",
+.maxlen = 0,
+.mode = 0555,
+.child = oa_table,
+},
+   {}
+};
+
+static struct ctl_table dev_root[] = {
+   {
+.procname = "dev",
+.maxlen = 0,
+.mode = 0555,
+.child = i915_root,
+},
+   {}
+};
+
 void i915_perf_init(struct drm_i915_private *dev_priv)
 {
if (!IS_HASWELL(dev_priv))
@@ -1490,6 +1534,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
dev_priv->perf.oa.n_builtin_sets =
i915_oa_n_builtin_metric_sets_hsw;

+   dev_priv->perf.sysctl_header = register_sysctl_table(dev_root);
+
dev_priv->perf.initialized = true;
 }

@@ -1498,6 +1544,8 @@ void i915_perf_fini(struct drm_i915_private *dev_priv)
if (!dev_priv->perf.initialized)
return;

+   unregister_sysctl_table(dev_priv->perf.sysctl_header);
+
memset(&dev_priv->perf.oa.ops, 0, sizeof(dev_priv->perf.oa.ops));
dev_priv->perf.initialized = false;
 }
-- 
2.10.1
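
As a rough illustration of the policy this patch introduces, the sketch
below models the privilege check from i915_perf_open_ioctl_locked() as
a pure function, together with a parser for the sysctl value as
userspace might read it (a sysctl named dev.i915.perf_stream_paranoid
is normally exposed as /proc/sys/dev/i915/perf_stream_paranoid). The
function names are hypothetical and is_admin stands in for
capable(CAP_SYS_ADMIN):

```c
#include <stdbool.h>
#include <stdlib.h>

/* True when opening the stream would be refused with -EACCES:
 * a system-wide stream (no specific context), paranoid mode on,
 * and a caller without admin privileges.
 */
static bool perf_open_denied(bool specific_ctx, bool paranoid, bool is_admin)
{
	return !specific_ctx && paranoid && !is_admin;
}

/* Parse the textual sysctl value ("0" or "1", optionally followed by
 * a newline). Returns 0 on success, -1 if the text is not 0 or 1.
 */
static int parse_paranoid(const char *text, bool *out)
{
	char *end;
	long v = strtol(text, &end, 10);

	if (end == text || (v != 0 && v != 1))
		return -1;
	*out = (v == 1);
	return 0;
}
```

With the patch's default of paranoid = true, only context-specific
streams can be opened without CAP_SYS_ADMIN; writing 0 to the sysctl
relaxes that for system-wide streams.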



[PATCH v9 07/11] drm/i915: advertise available metrics via sysfs

2016-11-07 Thread Robert Bragg
Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to look up corresponding counter metadata and
normalization equations.
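
A minimal userspace sketch of how such an entry might be consumed,
assuming the directory component under metrics/ is the metric set's
GUID: one helper builds the path of the 'id' file and another parses
its contents. Both helper names are hypothetical and the actual file
read is left to the caller:

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Build "/sys/class/drm/card<N>/metrics/<guid>/id" into buf.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int metric_set_id_path(char *buf, size_t len, int card,
			      const char *guid)
{
	int n = snprintf(buf, len, "/sys/class/drm/card%d/metrics/%s/id",
			 card, guid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Parse the unsigned integer found in an 'id' file (e.g. "1\n").
 * Returns the id, or -1 if the text is not a plain decimal number.
 */
static long parse_metric_set_id(const char *text)
{
	char *end;
	unsigned long id;

	if (!isdigit((unsigned char)text[0]))
		return -1;
	id = strtoul(text, &end, 10);
	if (*end != '\n' && *end != '\0')
		return -1;
	return (long)id;
}
```

The parsed value is what would be passed as the metrics set property
when opening a stream with DRM_IOCTL_I915_PERF_OPEN.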

The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:

 https://github.com/rib/gputop
 > gputop-data/guids.xml
 > scripts/update-guids.py
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
Reviewed-by: Sourab Gupta 
---
 drivers/gpu/drm/i915/i915_drv.c|  5 
 drivers/gpu/drm/i915/i915_drv.h|  4 +++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 51 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4 +++
 drivers/gpu/drm/i915/i915_perf.c   | 52 ++
 5 files changed, 116 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 22b4166..4fd8650 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1125,6 +1125,9 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
i915_debugfs_register(dev_priv);
i915_guc_register(dev_priv);
i915_setup_sysfs(dev_priv);
+
+   /* Depends on sysfs having been initialized */
+   i915_perf_register(dev_priv);
} else
DRM_ERROR("Failed to register driver for userspace access!\n");

@@ -1161,6 +1164,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
acpi_video_unregister();
intel_opregion_unregister(dev_priv);

+   i915_perf_unregister(dev_priv);
+
i915_teardown_sysfs(dev_priv);
i915_guc_unregister(dev_priv);
i915_debugfs_unregister(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8003120..15fba6b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2187,6 +2187,8 @@ struct drm_i915_private {
struct {
bool initialized;

+   struct kobject *metrics_kobj;
+
struct mutex lock;
struct list_head streams;

@@ -3884,6 +3886,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 /* i915_perf.c */
 extern void i915_perf_init(struct drm_i915_private *dev_priv);
 extern void i915_perf_fini(struct drm_i915_private *dev_priv);
+extern void i915_perf_register(struct drm_i915_private *dev_priv);
+extern void i915_perf_unregister(struct drm_i915_private *dev_priv);

 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 8906380..6af25cf 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -24,6 +24,8 @@
  *
  */

+#include 
+
 #include "i915_drv.h"
 #include "i915_oa_hsw.h"

@@ -142,3 +144,52 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv)
return -ENODEV;
}
 }
+
+static ssize_t
+show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC);
+}
+
+static struct device_attribute dev_attr_render_basic_id = {
+   .attr = { .name = "id", .mode = 0444 },
+   .show = show_render_basic_id,
+   .store = NULL,
+};
+
+static struct attribute *attrs_render_basic[] = {
+   &dev_attr_render_basic_id.attr,
+   NULL,
+};
+
+static struct attribute_group group_render_basic = {
+   .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212",
+   .attrs =  attrs_render_basic,
+};
+
+int
+i915_perf_register_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int mux_len;
+   int ret = 0;
+
+   if (get_render_basic_mux_config(dev_priv, &mux_len)) {
+   ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+   if (ret)
+   goto error_render_basic;
+   }
+
+   return 0;
+
+error_render_basic:
+   return ret;
+}
+
+void
+i915_perf_unregister_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int mux_len;
+
+   if (get_render_basic_mux_config(dev_priv, &mux_len))
+   sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+}
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h
ind

[PATCH v9 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-11-07 Thread Robert Bragg
Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.

v2:
   Make sure to initialize ->specific_ctx_id when opening, without
   relying on _pin_notify hook, in case ctx already pinned.
v3:
   Revert back to pinning ctx upfront when opening stream, removing
   need to hook in to pinning and to update OACONTROL on the fly.

Signed-off-by: Robert Bragg 
Signed-off-by: Zhenyu Wang 
Cc: Chris Wilson 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_drv.h  |   66 ++-
 drivers/gpu/drm/i915/i915_perf.c | 1036 +-
 drivers/gpu/drm/i915/i915_reg.h  |  338 +
 include/uapi/drm/i915_drm.h  |   71 ++-
 4 files changed, 1482 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bdebb66..8003120 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1785,6 +1785,11 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_oa_format {
+   u32 format;
+   int size;
+};
+
 struct i915_oa_reg {
i915_reg_t addr;
u32 value;
@@ -1805,11 +1810,6 @@ struct i915_perf_stream_ops {
 */
void (*disable)(struct i915_perf_stream *stream);

-   /* Return: true if any i915 perf records are ready to read()
-* for this stream.
-*/
-   bool (*can_read)(struct i915_perf_stream *stream);
-
/* Call poll_wait, passing a wait queue that will be woken
 * once there is something ready to read() for the stream
 */
@@ -1819,9 +1819,7 @@ struct i915_perf_stream_ops {

/* For handling a blocking read, wait until there is something
 * to ready to read() for the stream. E.g. wait on the same
-* wait queue that would be passed to poll_wait() until
-* ->can_read() returns true (if its safe to call ->can_read()
-* without the i915 perf lock held).
+* wait queue that would be passed to poll_wait().
 */
int (*wait_unlocked)(struct i915_perf_stream *stream);

@@ -1861,11 +1859,28 @@ struct i915_perf_stream {
struct list_head link;

u32 sample_flags;
+   int sample_size;

struct i915_gem_context *ctx;
bool enabled;

-   struct i915_perf_stream_ops *ops;
+   const struct i915_perf_stream_ops *ops;
+};
+
+struct i915_oa_ops {
+   void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
+   int (*enable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*disable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*oa_enable)(struct drm_i915_private *dev_priv);
+   void (*oa_disable)(struct drm_i915_private *dev_priv);
+   void (*update_oacontrol)(struct drm_i915_private *dev_priv);
+   void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv,
+   u32 ctx_id);
+   int (*read)(struct i915_perf_stream *stream,
+   char __user *buf,
+   size_t count,
+   size_t *offset);
+   bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
 };

 struct drm_i915_private {
@@ -2171,16 +2186,47 @@ struct drm_i915_private {

struct {
bool initialized;
+
struct mutex lock;
struct list_head streams;

+   spinlock_t hook_lock;
+
struct {
-   u32 metrics_set;
+   struct i915_perf_stream *exclusive_stream;
+
+   u32 specific_ctx_id;
+   struct i915_vma *pinned_rcs_vma;
+
+   struct hrtimer poll_check_timer;
+   wait_queue_head_t poll_wq;
+   bool pollin;
+
+   bool periodic;
+   int period_exponent;
+   int timestamp_frequency;
+
+   int tail_margin;
+
+   int metrics_set;

const struct i915_oa_reg *mux_regs;
int mux_regs_len;
const struct i915_oa_reg *b_counter_regs;
int b_counter_regs_len;
+
+   struct {
+   struct i915_vma *vma;
+   u8 *vaddr;
+   int format;
+   int format_size;
+   } oa_buffer;
+
+   u32 gen7_latched_oastatus1;
+
+   struct i915_oa_ops ops;
+   const struct i915_oa_format *oa_formats;
+   int n_builtin_sets;
} oa;
} perf;

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
inde

[PATCH v9 05/11] drm/i915: Add 'render basic' Haswell OA unit config

2016-11-07 Thread Robert Bragg
Adds a static OA unit, MUX + B Counter configuration for basic render
metrics on Haswell. This is auto generated from an XML
description of metric sets, currently maintained in gputop, ref:

  https://github.com/rib/gputop
  > gputop-data/oa-*.xml
  > scripts/i915-perf-kernelgen.py

  $ make -C gputop-data -f Makefile.xml SYSFS=0 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/Makefile  |   3 +-
 drivers/gpu/drm/i915/i915_drv.h|  14 
 drivers/gpu/drm/i915/i915_oa_hsw.c | 144 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  34 +
 4 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 08b43af..7e9e6d0 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -117,7 +117,8 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
 i915-y += i915_vgpu.o

 # perf code
-i915-y += i915_perf.o
+i915-y += i915_perf.o \
+ i915_oa_hsw.o

 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e4cd322..bdebb66 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1785,6 +1785,11 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_oa_reg {
+   i915_reg_t addr;
+   u32 value;
+};
+
 struct i915_perf_stream;

 struct i915_perf_stream_ops {
@@ -2168,6 +2173,15 @@ struct drm_i915_private {
bool initialized;
struct mutex lock;
struct list_head streams;
+
+   struct {
+   u32 metrics_set;
+
+   const struct i915_oa_reg *mux_regs;
+   int mux_regs_len;
+   const struct i915_oa_reg *b_counter_regs;
+   int b_counter_regs_len;
+   } oa;
} perf;

/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
new file mode 100644
index 000..8906380
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -0,0 +1,144 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+#include "i915_oa_hsw.h"
+
+enum metric_set_id {
+   METRIC_SET_ID_RENDER_BASIC = 1,
+};
+
+int i915_oa_n_builtin_metric_sets_hsw = 1;
+
+static const struct i915_oa_reg b_counter_config_render_basic[] = {
+   { _MMIO(0x2724), 0x0080 },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2710), 0x },
+};
+
+static const struct i915_oa_reg mux_config_render_basic[] = {
+   { _MMIO(0x253a4), 0x0160 },
+   { _MMIO(0x25440), 0x0010 },
+   { _MMIO(0x25128), 0x },
+   { _MMIO(0x2691c), 0x0800 },
+   { _MMIO(0x26aa0), 0x0150 },
+   { _MMIO(0x26b9c), 0x6000 },
+   { _MMIO(0x2791c), 0x0800 },
+   { _MMIO(0x27aa0), 0x0150 },
+   { _MMIO(0x27b9c), 0x6000 },
+   { _MMIO(0x2641c), 0x0400 },
+   { _MMIO(0x25380), 0x0010 },
+   { _MMIO(0x2538c), 0x },
+   { _MMIO(0x25384), 0x0800 },
+   { _MMIO(0x25400), 0x0004 },
+   { _MMIO(0x2540c), 0x06029000 },
+   { _MMIO(0x25410), 0x0002 },
+   { _MMIO(0x25404), 0x5c30 },
+   { _MMIO(0x25100), 0x0016 },
+   { _MMIO(0x25110), 0x0400 },
+   { _MMIO(0x25104), 0x },
+   { _MMIO(0x26804), 0x1211 },
+   { _MMIO(0

[PATCH v9 04/11] drm/i915: don't whitelist oacontrol in cmd parser

2016-11-07 Thread Robert Bragg
Being able to program OACONTROL from a non-privileged batch buffer is
not sufficient to be able to configure the OA unit. This was originally
allowed to help enable Mesa to expose OA counters via the
INTEL_performance_query extension, but the current implementation based
on programming OACONTROL via a batch buffer isn't able to report useable
data without a more complete OA unit configuration. Mesa handles the
possibility that writes to OACONTROL may not be allowed and so only
advertises the extension after explicitly testing that a write to
OACONTROL succeeds. Based on this, removing OACONTROL from the whitelist
should be ok for userspace.

Removing this simplifies adding a new kernel api for configuring the OA
unit without needing to consider the possibility that userspace might
trample on OACONTROL state which we'd like to start managing within
the kernel instead. In particular running any Mesa based GL application
currently results in clearing OACONTROL when initializing which would
disable the capturing of metrics.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
Reviewed-by: Sourab Gupta 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 38 ++
 1 file changed, 2 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index c9d2ecd..f3453dc 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -450,7 +450,6 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
REG64(PS_INVOCATION_COUNT),
REG64(PS_DEPTH_COUNT),
REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-   REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
REG64(MI_PREDICATE_SRC0),
REG64(MI_PREDICATE_SRC1),
REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1060,8 +1059,7 @@ bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine)
 static bool check_cmd(const struct intel_engine_cs *engine,
  const struct drm_i915_cmd_descriptor *desc,
  const u32 *cmd, u32 length,
- const bool is_master,
- bool *oacontrol_set)
+ const bool is_master)
 {
if (desc->flags & CMD_DESC_SKIP)
return true;
@@ -1099,31 +1097,6 @@ static bool check_cmd(const struct intel_engine_cs *engine,
}

/*
-* OACONTROL requires some special handling for
-* writes. We want to make sure that any batch which
-* enables OA also disables it before the end of the
-* batch. The goal is to prevent one process from
-* snooping on the perf data from another process. To do
-* that, we need to check the value that will be written
-* to the register. Hence, limit OACONTROL writes to
-* only MI_LOAD_REGISTER_IMM commands.
-*/
-   if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
-   if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_REG) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRR to OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1))
-   *oacontrol_set = (cmd[offset + 1] != 0);
-   }
-
-   /*
 * Check the value written to the register against the
 * allowed mask/value pair given in the whitelist entry.
 */
@@ -1214,7 +1187,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
u32 *cmd, *batch_end;
struct drm_i915_cmd_descriptor default_desc = noop_desc;
const struct drm_i915_cmd_descriptor *desc = &default_desc;
-   bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
bool needs_clflush_after = false;
int ret = 0;

@@ -1270,8 +1242,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
break;
}

-   if (!check_cmd(engine, desc, cmd, length, is_master,
-  &oacontrol_set)) {
+   if (!check_cmd(engine, desc, cmd, length, is_master)) {
ret = -EACCES;
break;
}
@@ -1279,11 +1250,6 @@ int i

[PATCH v9 03/11] drm/i915: return EACCES for check_cmd() failures

2016-11-07 Thread Robert Bragg
check_cmd() is checking whether a command adheres to certain
restrictions that ensure it's safe to execute within a privileged batch
buffer. Returning false implies a privilege problem, not that the
command is invalid.

The distinction makes the difference between allowing the buffer to be
executed as an unprivileged batch buffer or returning an EINVAL error to
userspace without executing anything.

In a case where userspace may want to test whether it can successfully
write to a register that needs privileges the distinction may be
important and an EINVAL error may be considered fatal.

In particular this is currently true for Mesa, which includes a test for
whether OACONTROL can be written to, but Mesa treats any error when
flushing a batch buffer as fatal, calling exit(1).

As things currently stand, Mesa can gracefully handle a failure to write to
OACONTROL if the command parser is disabled, but if we were to remove
OACONTROL from the parser's whitelist then the returned EINVAL would
break Mesa applications as they attempt an OACONTROL write.

This bumps the command parser version from 7 to 8, as the change is
visible to userspace.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
Reviewed-by: Sourab Gupta 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 7719aed..c9d2ecd 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1272,7 +1272,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,

if (!check_cmd(engine, desc, cmd, length, is_master,
   &oacontrol_set)) {
-   ret = -EINVAL;
+   ret = -EACCES;
break;
}

@@ -1333,6 +1333,9 @@ int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv)
 * 5. GPGPU dispatch compute indirect registers.
 * 6. TIMESTAMP register and Haswell CS GPR registers
 * 7. Allow MI_LOAD_REGISTER_REG between whitelisted registers.
+* 8. Don't report cmd_check() failures as EINVAL errors to userspace;
+*rely on the HW to NOOP disallowed commands as it would without
+*the parser enabled.
 */
-   return 7;
+   return 8;
 }
-- 
2.10.1



[PATCH v9 02/11] drm/i915: rename OACONTROL GEN7_OACONTROL

2016-11-07 Thread Robert Bragg
OACONTROL changes quite a bit for gen8, with some bits split out into a
per-context OACTXCONTROL register. Rename now before adding more gen7 OA
registers.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
Reviewed-by: Sourab Gupta 
---
 drivers/gpu/drm/i915/gvt/handlers.c| 2 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
index 9ab1f95..4527cb7 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -2177,7 +2177,7 @@ static int init_generic_mmio_info(struct intel_gvt *gvt)
MMIO_DFH(0x1217c, D_ALL, F_CMD_ACCESS, NULL, NULL);

MMIO_F(0x2290, 8, 0, 0, 0, D_HSW_PLUS, NULL, NULL);
-   MMIO_D(OACONTROL, D_HSW);
+   MMIO_D(GEN7_OACONTROL, D_HSW);
MMIO_D(0x2b00, D_BDW_PLUS);
MMIO_D(0x2360, D_BDW_PLUS);
MMIO_F(0x5200, 32, 0, 0, 0, D_ALL, NULL, NULL);
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index f5039f4..7719aed 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -450,7 +450,7 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
REG64(PS_INVOCATION_COUNT),
REG64(PS_DEPTH_COUNT),
REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-   REG32(OACONTROL), /* Only allowed for LRI and SRM. See below. */
+   REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
REG64(MI_PREDICATE_SRC0),
REG64(MI_PREDICATE_SRC1),
REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1108,7 +1108,7 @@ static bool check_cmd(const struct intel_engine_cs *engine,
 * to the register. Hence, limit OACONTROL writes to
 * only MI_LOAD_REGISTER_IMM commands.
 */
-   if (reg_addr == i915_mmio_reg_offset(OACONTROL)) {
+   if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
return false;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 3361d7f..4623dbb 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -615,7 +615,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define HSW_CS_GPR(n)   _MMIO(0x2600 + (n) * 8)
 #define HSW_CS_GPR_UDW(n)   _MMIO(0x2600 + (n) * 8 + 4)

-#define OACONTROL _MMIO(0x2360)
+#define GEN7_OACONTROL _MMIO(0x2360)

 #define _GEN7_PIPEA_DE_LOAD_SL 0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL 0x71068
-- 
2.10.1



[PATCH v9 01/11] drm/i915: Add i915 perf infrastructure

2016-11-07 Thread Robert Bragg
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for
data.

A stream is opened something like:

  uint64_t properties[] = {
      /* Single context sampling */
      DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

      /* Include OA reports in samples */
      DRM_I915_PERF_PROP_SAMPLE_OA,         true,

      /* OA unit configuration */
      DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
      DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
      DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
  };
  struct drm_i915_perf_open_param param = {
      .flags = I915_PERF_FLAG_FD_CLOEXEC |
               I915_PERF_FLAG_FD_NONBLOCK |
               I915_PERF_FLAG_DISABLED,
      .properties_ptr = (uint64_t)properties,
      /* Each property is a (key, value) pair of two 8-byte values */
      .num_properties = sizeof(properties) / 16,
  };
  int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
determine what's included in every sample.
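
As a rough sketch of how userspace might walk the records returned by
read(), the code below steps through a buffer using the size stored in
each header and counts the sample records. The struct layout, the
RECORD_SAMPLE value and the function name are hypothetical stand-ins
for illustration; real code should use the definitions from
include/uapi/drm/i915_drm.h instead:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the common { type, size } record header. */
struct record_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size;	/* total record size in bytes, header included */
};

/* Hypothetical stand-in for DRM_I915_PERF_RECORD_SAMPLE. */
enum { RECORD_SAMPLE = 1 };

/* Count sample records in buf[0..len). Returns -1 if a record is
 * truncated or declares a size smaller than its own header.
 */
static int count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0;
	int n = 0;

	while (len - offset >= sizeof(struct record_header)) {
		struct record_header hdr;

		/* memcpy avoids unaligned access into the raw buffer */
		memcpy(&hdr, buf + offset, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			return -1;
		if (hdr.type == RECORD_SAMPLE)
			n++;
		offset += hdr.size;	/* skip unknown types by size */
	}
	return offset == len ? n : -1;
}
```

Skipping by the header's size rather than by a known struct size is
what allows a reader to step over record types it does not understand.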

No specific streams are supported yet so any attempt to open a stream
will return an error.

v2:
use i915_gem_context_get() - Chris Wilson
v3:
update read() interface to avoid passing state struct - Chris Wilson
fix some rebase fallout, with i915-perf init/deinit
v4:
s/DRM_IORW/DRM_IOW/ - Emil Velikov

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
Reviewed-by: Sourab Gupta 
---
 drivers/gpu/drm/i915/Makefile|   3 +
 drivers/gpu/drm/i915/i915_drv.c  |   4 +
 drivers/gpu/drm/i915/i915_drv.h  |  91 
 drivers/gpu/drm/i915/i915_perf.c | 449 +++
 include/uapi/drm/i915_drm.h  |  67 ++
 5 files changed, 614 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 0857e50..08b43af 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -116,6 +116,9 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
 # virtual gpu code
 i915-y += i915_vgpu.o

+# perf code
+i915-y += i915_perf.o
+
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
 include $(src)/gvt/Makefile
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 0213a30..22b4166 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -844,6 +844,8 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,

intel_detect_preproduction_hw(dev_priv);

+   i915_perf_init(dev_priv);
+
return 0;

 err_gvt:
@@ -859,6 +861,7 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
  */
 static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv)
 {
+   i915_perf_fini(dev_priv);
i915_gem_load_cleanup(&dev_priv->drm);
i915_workqueues_cleanup(dev_priv);
 }
@@ -2561,6 +2564,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW),
 };

 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4735b417..e4cd322 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1785,6 +1785,84 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_perf_stream;
+
+struct i915_perf_stream_ops {
+   /* Enables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_ENABLE or implicitly called when stream is
+* opened without I915_PERF_FLAG_DISABLED.
+*/
+   void (*enable)(struct i915_perf_stream *stream);
+
+   /* Disables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_DISABLE or implicitly called before
+* destroying the stream.
+*/
+   void (*disable)(struct i915_perf_stream *stream);
+
+   /* Return: true if any i915 perf records are ready to read()
+* for this stream.
+*/
+   bool (*can_read)(struct i915_perf_stream *stream);
+
+   /* Call poll_wait, passing a wait queue that will be woken
+* once there is something ready to read() for the stream
+*/
+  

[PATCH v9 00/11] Enable i915 perf stream for Haswell OA unit

2016-11-07 Thread Robert Bragg
Rebased and updated with more feedback from Sourab and Matt.

In particular the patch that added the oa_min_timer_exponent sysctl parameter
has now been replaced with one adding an oa_max_sample_rate parameter in Hz.

This way userspace policy won't need to be tailored to different systems when
gen9 OA is enabled where the exponents don't represent the same periods as for
Haswell.

- Robert

Robert Bragg (11):
  drm/i915: Add i915 perf infrastructure
  drm/i915: rename OACONTROL GEN7_OACONTROL
  drm/i915: return EACCES for check_cmd() failures
  drm/i915: don't whitelist oacontrol in cmd parser
  drm/i915: Add 'render basic' Haswell OA unit config
  drm/i915: Enable i915 perf stream for Haswell OA unit
  drm/i915: advertise available metrics via sysfs
  drm/i915: Add dev.i915.perf_stream_paranoid sysctl option
  drm/i915: add dev.i915.oa_max_sample_rate sysctl
  drm/i915: Add more Haswell OA metric sets
  drm/i915: Add a kerneldoc summary for i915_perf.c

 drivers/gpu/drm/i915/Makefile  |4 +
 drivers/gpu/drm/i915/gvt/handlers.c|2 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c |   45 +-
 drivers/gpu/drm/i915/i915_drv.c|9 +
 drivers/gpu/drm/i915/i915_drv.h|  156 +++
 drivers/gpu/drm/i915/i915_oa_hsw.c |  752 ++
 drivers/gpu/drm/i915/i915_oa_hsw.h |   38 +
 drivers/gpu/drm/i915/i915_perf.c   | 1753 
 drivers/gpu/drm/i915/i915_reg.h|  340 ++-
 include/uapi/drm/i915_drm.h|  134 +++
 10 files changed, 3193 insertions(+), 40 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

-- 
2.10.1



[PATCH v8 02/12] drm/i915: Add i915 perf infrastructure

2016-11-04 Thread Robert Bragg
On Fri, Nov 4, 2016 at 8:59 AM, sourab gupta  wrote:

> On Thu, 2016-10-27 at 19:14 -0700, Robert Bragg wrote:
> > Adds base i915 perf infrastructure for Gen performance metrics.
> >
> > This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
> > properties to configure a stream of metrics and returns a new fd usable
> > with standard VFS system calls including read() to read typed and sized
> > records; ioctl() to enable or disable capture and poll() to wait for
> > data.
> >
> > A stream is opened something like:
> >
> >   uint64_t properties[] = {
> >   /* Single context sampling */
> >   DRM_I915_PERF_PROP_CTX_HANDLE,ctx_handle,
> >
> >   /* Include OA reports in samples */
> >   DRM_I915_PERF_PROP_SAMPLE_OA, true,
> >
> >   /* OA unit configuration */
> >   DRM_I915_PERF_PROP_OA_METRICS_SET,metrics_set_id,
> >   DRM_I915_PERF_PROP_OA_FORMAT, report_format,
> >   DRM_I915_PERF_PROP_OA_EXPONENT,   period_exponent,
> >};
> >struct drm_i915_perf_open_param parm = {
> >   .flags = I915_PERF_FLAG_FD_CLOEXEC |
> >I915_PERF_FLAG_FD_NONBLOCK |
> >I915_PERF_FLAG_DISABLED,
> >   .properties_ptr = (uint64_t)properties,
> >   .num_properties = sizeof(properties) / 16,
> >};
> >int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &parm);
> >
> > Records read all start with a common { type, size } header with
> > DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
> > contain an extensible number of fields and it's the
> > DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
> > determine what's included in every sample.
> >
> > No specific streams are supported yet so any attempt to open a stream
> > will return an error.
> >
> > v2:
> > use i915_gem_context_get() - Chris Wilson
> > v3:
> > update read() interface to avoid passing state struct - Chris Wilson
> > fix some rebase fallout, with i915-perf init/deinit
> > v4:
> > s/DRM_IORW/DRM_IOW/ - Emil Velikov
> >
> > Signed-off-by: Robert Bragg 
> > ---
> >  drivers/gpu/drm/i915/Makefile|   3 +
> >  drivers/gpu/drm/i915/i915_drv.c  |   4 +
> >  drivers/gpu/drm/i915/i915_drv.h  |  91 
> >  drivers/gpu/drm/i915/i915_perf.c | 443 ++
> +
> >  include/uapi/drm/i915_drm.h  |  67 ++
> >  5 files changed, 608 insertions(+)
> >  create mode 100644 drivers/gpu/drm/i915/i915_perf.c
> >
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/
> Makefile
> > index 6123400..8d4e25f 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -113,6 +113,9 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) +=
> i915_gpu_error.o
> >  # virtual gpu code
> >  i915-y += i915_vgpu.o
> >
> > +# perf code
> > +i915-y += i915_perf.o
> > +
> >  ifeq ($(CONFIG_DRM_I915_GVT),y)
> >  i915-y += intel_gvt.o
> >  include $(src)/gvt/Makefile
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c
> b/drivers/gpu/drm/i915/i915_drv.c
> > index af3559d..685c96e 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -836,6 +836,8 @@ static int i915_driver_init_early(struct
> drm_i915_private *dev_priv,
> >
> >   intel_detect_preproduction_hw(dev_priv);
> >
> > + i915_perf_init(dev_priv);
> > +
> >   return 0;
> >
> >  err_workqueues:
> > @@ -849,6 +851,7 @@ static int i915_driver_init_early(struct
> drm_i915_private *dev_priv,
> >   */
> >  static void i915_driver_cleanup_early(struct drm_i915_private
> *dev_priv)
> >  {
> > + i915_perf_fini(dev_priv);
> >   i915_gem_load_cleanup(&dev_priv->drm);
> >   i915_workqueues_cleanup(dev_priv);
> >  }
> > @@ -2556,6 +2559,7 @@ static const struct drm_ioctl_desc i915_ioctls[] =
> {
> >   DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl,
> DRM_RENDER_ALLOW),
> >   DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM,
> i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
> >   DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM,
> i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
> > + DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl,
> DRM_RENDER_ALLOW),
> >  };
> >
> >  static struct drm_driver driver = {
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
>
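
For reference, the { type, size } record framing described in the commit
message quoted above could be consumed along these lines. This is only a
minimal sketch: struct record_header, its field layout and
RECORD_TYPE_SAMPLE are assumptions made up for illustration (the real
definitions live in include/uapi/drm/i915_drm.h), and the buffer is taken
to be whatever a read() on the stream fd returned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the common record header. The field layout is
 * an assumption for this sketch, not the uapi definition.
 */
struct record_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size;	/* total record size in bytes, header included */
};

/* Hypothetical type value, for illustration only. */
#define RECORD_TYPE_SAMPLE 1

/* Count the sample records in buf[0..len). Walking stops at the first
 * truncated or malformed record, so a short read never overruns the buffer.
 */
static size_t count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0;
	size_t n = 0;

	while (len - offset >= sizeof(struct record_header)) {
		struct record_header hdr;

		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&hdr, buf + offset, sizeof(hdr));

		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			break;

		if (hdr.type == RECORD_TYPE_SAMPLE)
			n++;

		offset += hdr.size;
	}

	return n;
}
```

Because every record carries its own size, a reader can skip record types
it does not understand, which is what lets the sample layout stay
extensible.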

[PATCH v8 10/12] drm/i915: add oa_event_min_timer_exponent sysctl

2016-11-04 Thread Robert Bragg
On Wed, Nov 2, 2016 at 6:29 AM, sourab gupta  wrote:

> On Thu, 2016-10-27 at 19:14 -0700, Robert Bragg wrote:
> > The minimal sampling period is now configurable via a
> > dev.i915.oa_min_timer_exponent sysctl parameter.
> >
> > Following the precedent set by perf, the default is the minimum that
> > won't (on its own) exceed the default kernel.perf_event_max_sample_rate
> > default of 100000 samples/s.
> >
> > Signed-off-by: Robert Bragg 
> > Reviewed-by: Matthew Auld 
> > ---
> >  drivers/gpu/drm/i915/i915_perf.c | 42 --
> --
> >  1 file changed, 30 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c
> b/drivers/gpu/drm/i915/i915_perf.c
> > index 4e42073..e3c6f51 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -82,6 +82,22 @@ static u32 i915_perf_stream_paranoid = true;
> >  #define INVALID_CTX_ID 0x
> >
> >
> > +/* for sysctl proc_dointvec_minmax of i915_oa_min_timer_exponent */
> > +static int oa_exponent_max = OA_EXPONENT_MAX;
> > +
> > +/* Theoretically we can program the OA unit to sample every 160ns but
> don't
> > + * allow that by default unless root...
> > + *
> > + * The period is derived from the exponent as:
> > + *
> > + *   period = 80ns * 2^(exponent + 1)
> > + *
> > + * Referring to perf's kernel.perf_event_max_sample_rate for a
> precedent
> > + * (100000 by default); with an OA exponent of 6 we get a period of
> 10.240
> > + * microseconds - just under 100000Hz
> > + */
> > +static u32 i915_oa_min_timer_exponent = 6;
>
> For HSW, the timestamp period is 80ns, so the exponent of 6 translates
> to sampling rate of ~100000Hz. But the timestamp period may change for
> other platforms, leading to different values of oa_min_timer_exponent
> corresponding to sampling rate of ~100000Hz. Do we plan to have this
> value platform specific subsequently, or the guidance value of ~100000Hz
> min sampling rate needn't be strictly followed?
>

Actually, it has bothered me a bit that I've been lazy about not making this
adapt for gen9+ in later patches.

I think it would probably be better to make this a Hz-based threshold for
userspace; otherwise any userspace policy here needs to be adapted for each
system with a different timestamp frequency, which isn't great.

I've updated the patch locally to make this an oa_max_sample_rate parameter
in Hz, which I'll aim to test on haswell tomorrow and send out.
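
To make the idea concrete, a Hz-based check might look roughly like the
following. This is a sketch under stated assumptions, not the updated
patch: the helper names are hypothetical, the timestamp period is passed
in explicitly (80ns for Haswell, per the comment quoted above), and the
exponent is assumed to be no larger than 31 so the shift cannot overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* period = timestamp_period_ns * 2^(exponent + 1), as in the patch comment.
 * Assumes exponent <= 31 and a non-zero timestamp period.
 */
static uint64_t oa_exponent_to_period_ns(uint32_t timestamp_period_ns,
					 unsigned int exponent)
{
	return (uint64_t)timestamp_period_ns << (exponent + 1);
}

/* Sampling frequency in Hz; integer division rounds down. */
static uint64_t oa_exponent_to_hz(uint32_t timestamp_period_ns,
				  unsigned int exponent)
{
	return 1000000000ull /
	       oa_exponent_to_period_ns(timestamp_period_ns, exponent);
}

/* Hypothetical policy check: privileged users may pick any exponent,
 * everyone else is held to the configured maximum sample rate.
 */
static bool oa_rate_allowed(uint32_t timestamp_period_ns,
			    unsigned int exponent,
			    uint64_t max_sample_rate_hz,
			    bool privileged)
{
	if (privileged)
		return true;

	return oa_exponent_to_hz(timestamp_period_ns, exponent) <=
	       max_sample_rate_hz;
}
```

With the threshold expressed in Hz, only the timestamp period differs per
platform, so the same userspace policy value carries over unchanged.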

Thanks,
- Robert


[PATCH v8 11/12] drm/i915: Add more Haswell OA metric sets

2016-11-01 Thread Robert Bragg
On Tue, Nov 1, 2016 at 2:57 PM, Chris Wilson 
wrote:

> On Fri, Oct 28, 2016 at 03:14:29AM +0100, Robert Bragg wrote:
> > This adds 'compute', 'compute extended', 'memory reads', 'memory writes'
> > and 'sampler balance' metric sets for Haswell.
> >
> > The code is auto generated from an XML description of metric sets,
> > currently maintained in gputop, ref:
> >
> >  https://github.com/rib/gputop
> >  > gputop-data/oa-*.xml
> >  > scripts/i915-perf-kernelgen.py
> >
> >  $ make -C gputop-data -f Makefile.xml
> >
> > Signed-off-by: Robert Bragg 
> > Reviewed-by: Matthew Auld 
> > ---
> >  drivers/gpu/drm/i915/i915_oa_hsw.c | 559 ++
> ++-
> >  1 file changed, 558 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c
> b/drivers/gpu/drm/i915/i915_oa_hsw.c
> > index 6af25cf..4ddf756 100644
> > --- a/drivers/gpu/drm/i915/i915_oa_hsw.c
> > +++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
> > @@ -31,9 +31,14 @@
> >
> >  enum metric_set_id {
> >   METRIC_SET_ID_RENDER_BASIC = 1,
> > + METRIC_SET_ID_COMPUTE_BASIC,
> > + METRIC_SET_ID_COMPUTE_EXTENDED,
> > + METRIC_SET_ID_MEMORY_READS,
> > + METRIC_SET_ID_MEMORY_WRITES,
> > + METRIC_SET_ID_SAMPLER_BALANCE,
> >  };
> >
> > -int i915_oa_n_builtin_metric_sets_hsw = 1;
> > +int i915_oa_n_builtin_metric_sets_hsw = 6;
> >
> >  static const struct i915_oa_reg b_counter_config_render_basic[] = {
> >   { _MMIO(0x2724), 0x0080 },
> > @@ -112,6 +117,298 @@ get_render_basic_mux_config(struct
> drm_i915_private *dev_priv,
> >   return mux_config_render_basic;
> >  }
> >
> > +static const struct i915_oa_reg b_counter_config_compute_basic[] = {
> > + { _MMIO(0x2710), 0x },
> > + { _MMIO(0x2714), 0x0080 },
> > + { _MMIO(0x2718), 0x },
> > + { _MMIO(0x271c), 0x },
> > + { _MMIO(0x2720), 0x },
> > + { _MMIO(0x2724), 0x0080 },
> > + { _MMIO(0x2728), 0x },
> > + { _MMIO(0x272c), 0x },
> > + { _MMIO(0x2740), 0x },
> > + { _MMIO(0x2744), 0x },
> > + { _MMIO(0x2748), 0x },
> > + { _MMIO(0x274c), 0x },
> > + { _MMIO(0x2750), 0x },
> > + { _MMIO(0x2754), 0x },
> > + { _MMIO(0x2758), 0x },
> > + { _MMIO(0x275c), 0x },
> > + { _MMIO(0x236c), 0x },
> > +};
> > +
> > +static const struct i915_oa_reg mux_config_compute_basic[] = {
> > + { _MMIO(0x253a4), 0x },
> > + { _MMIO(0x2681c), 0x01f00800 },
> > + { _MMIO(0x26820), 0x1000 },
> > + { _MMIO(0x2781c), 0x01f00800 },
> > + { _MMIO(0x26520), 0x0007 },
> > + { _MMIO(0x265a0), 0x0007 },
> > + { _MMIO(0x25380), 0x0010 },
> > + { _MMIO(0x2538c), 0x0030 },
> > + { _MMIO(0x25384), 0xaa8a },
> > + { _MMIO(0x25404), 0x },
> > + { _MMIO(0x26800), 0x4202 },
> > + { _MMIO(0x26808), 0x00605817 },
> > + { _MMIO(0x2680c), 0x10001005 },
> > + { _MMIO(0x26804), 0x },
> > + { _MMIO(0x27800), 0x0102 },
> > + { _MMIO(0x27808), 0x0c0701e0 },
> > + { _MMIO(0x2780c), 0x000200a0 },
> > + { _MMIO(0x27804), 0x },
> > + { _MMIO(0x26484), 0x4400 },
> > + { _MMIO(0x26704), 0x4400 },
> > + { _MMIO(0x26500), 0x0006 },
> > + { _MMIO(0x26510), 0x0001 },
> > + { _MMIO(0x26504), 0x8800 },
> > + { _MMIO(0x26580), 0x0006 },
> > + { _MMIO(0x26590), 0x0020 },
> > + { _MMIO(0x26584), 0x },
> > + { _MMIO(0x26104), 0x5582 },
> > + { _MMIO(0x26184), 0xaa86 },
> > + { _MMIO(0x25420), 0x08320c83 },
> > + { _MMIO(0x25424), 0x06820c83 },
> > + { _MMIO(0x2541c), 0x },
> > + { _MMIO(0x25428), 0x0c03 },
> > +};
> > +
> > +static const struct i915_oa_reg *
> > +get_compute_basic_mux_config(struct drm_i915_private *dev_priv,
> > +  int *len)
> > +{
> > + *len = ARRAY_SIZE(mux_config_compute_basic);
> > + return mux_config_compute_basic;
> > +}
>
> > @@ -140,6 +437,106 @@ int i915_oa_select_metric_set_hsw(struct
> drm_i915_private *dev_priv)
> >   ARRAY_SIZE(b_counter_config_ren

[PATCH v8 02/12] drm/i915: Add i915 perf infrastructure

2016-10-31 Thread Robert Bragg
On Mon, Oct 31, 2016 at 5:13 PM, Matthew Auld <
matthew.william.auld at gmail.com> wrote:

> On 31 October 2016 at 16:27, Robert Bragg  wrote:
> >
> >
> > On Fri, Oct 28, 2016 at 3:27 PM, Matthew Auld
> >  wrote:
> >>
> >> > +/* Note we copy the properties from userspace outside of the i915
> perf
> >> > + * mutex to avoid an awkward lockdep with mmap_sem.
> >> > + *
> >> > + * Note this function only validates properties in isolation it
> doesn't
> >> > + * validate that the combination of properties makes sense or that
> all
> >> > + * properties necessary for a particular kind of stream have been
> set.
> >> > + */
> >> > +static int read_properties_unlocked(struct drm_i915_private
> *dev_priv,
> >> > +   u64 __user *uprops,
> >> > +   u32 n_props,
> >> > +   struct perf_open_properties
> *props)
> >> > +{
> >> > +   u64 __user *uprop = uprops;
> >> > +   int i;
> >> > +
> >> > +   memset(props, 0, sizeof(struct perf_open_properties));
> >> > +
> >> > +   if (!n_props) {
> >> > +   DRM_ERROR("No i915 perf properties given");
> >> > +   return -EINVAL;
> >> > +   }
> >> > +
> >> > +   if (n_props > DRM_I915_PERF_PROP_MAX) {
> >> Ah but DRM_I915_PERF_PROP_MAX is not a property itself.
> >
> >
> > I'm not sure I follow what your implied concern is?
> >
> > This is just a sanity check for the number of properties given by userspace,
> > based on the assumption that there's currently no reason for multiple
> values
> > with a particular property id.
> >
> All I meant was should it not be n_props >= DRM_I915_PERF_PROP_MAX ?


> So with that fixed, or if I'm completely mad:
> Reviewed-by: Matthew Auld 
>

Ah, I see. Actually tbh I think either is reasonable...

The check is mainly about ruling out the silly large values that could be
given, imposing an upper bound on the number of properties expected from
userspace. It might help catch userspace giving garbage/undefined data, or
block attempts to get the kernel parsing huge amounts of property data
which should never be necessary for configuring a stream. It doesn't, e.g.,
stop userspace specifying duplicate property IDs even if they supply fewer
than the maximum allowed. So even if it allowed, say, 2x the number of
properties I think it would still pretty much do its job.

I could imagine in the future the same check might become much more fuzzy
if we have a case where userspace might need to legitimately specify the
same property ID multiple times (where the sequential order is relevant).

_PERF_PROP_MAX is the last entry in the enum, so we can interpret it as an
upper bound on the number of properties while we don't currently expect to
see property IDs duplicated.

The detail here though is that ID 0 is reserved, so _PERF_PROP_MAX is more
like ('the maximum number of properties' + 1), and so this is what you're
essentially highlighting.

I can change this, maybe with a comment about ID 0 being reserved and
explaining the assumption that property ID duplicates aren't currently
expected.
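
Roughly what I have in mind, as a standalone sketch rather than the actual
patch: the enum below is a hypothetical stand-in with the same shape as
the uapi one (ID 0 reserved, _MAX last), and check_n_props() is an
illustrative helper name.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical property IDs. ID 0 is reserved, and the _MAX entry comes
 * last, so it is not itself a property.
 */
enum perf_prop_id {
	PERF_PROP_CTX_HANDLE = 1,
	PERF_PROP_SAMPLE_OA,
	PERF_PROP_OA_METRICS_SET,
	PERF_PROP_OA_FORMAT,
	PERF_PROP_OA_EXPONENT,
	PERF_PROP_MAX	/* one more than the number of valid IDs */
};

/* Sanity check the property count given by userspace.
 *
 * Since ID 0 is reserved, PERF_PROP_MAX is (number of valid IDs + 1), so
 * with no duplicate IDs expected the largest sensible count is
 * PERF_PROP_MAX - 1, hence the >= comparison.
 */
static int check_n_props(uint32_t n_props)
{
	if (n_props == 0)
		return -EINVAL;

	if (n_props >= PERF_PROP_MAX)
		return -EINVAL;

	return 0;
}
```

If duplicate property IDs ever become legitimate, this bound would need to
be loosened accordingly.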

Thanks for the review!


[PATCH v8 02/12] drm/i915: Add i915 perf infrastructure

2016-10-31 Thread Robert Bragg
On Fri, Oct 28, 2016 at 3:27 PM, Matthew Auld <
matthew.william.auld at gmail.com> wrote:

> > +/* Note we copy the properties from userspace outside of the i915 perf
> > + * mutex to avoid an awkward lockdep with mmap_sem.
> > + *
> > + * Note this function only validates properties in isolation it doesn't
> > + * validate that the combination of properties makes sense or that all
> > + * properties necessary for a particular kind of stream have been set.
> > + */
> > +static int read_properties_unlocked(struct drm_i915_private *dev_priv,
> > +   u64 __user *uprops,
> > +   u32 n_props,
> > +   struct perf_open_properties *props)
> > +{
> > +   u64 __user *uprop = uprops;
> > +   int i;
> > +
> > +   memset(props, 0, sizeof(struct perf_open_properties));
> > +
> > +   if (!n_props) {
> > +   DRM_ERROR("No i915 perf properties given");
> > +   return -EINVAL;
> > +   }
> > +
> > +   if (n_props > DRM_I915_PERF_PROP_MAX) {
> Ah but DRM_I915_PERF_PROP_MAX is not a property itself.
>

I'm not sure I follow what your implied concern is?

This is just a sanity check for the number of properties given by userspace,
based on the assumption that there's currently no reason for multiple
values with a particular property id.



[PATCH v8 12/12] drm/i915: Add a kerneldoc summary for i915_perf.c

2016-10-28 Thread Robert Bragg
In particular this tries to capture for posterity some of the early
challenges we had with using the core perf infrastructure in case we
ever want to revisit adapting perf for device metrics.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_perf.c | 163 +++
 1 file changed, 163 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e3c6f51..621b3aa 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -24,6 +24,169 @@
  *   Robert Bragg 
  */

+
+/**
+ * DOC: i915 Perf, streaming API for GPU metrics
+ *
+ * Gen graphics supports a large number of performance counters that can help
+ * driver and application developers understand and optimize their use of the
+ * GPU.
+ *
+ * This i915 perf interface enables userspace to configure and open a file
+ * descriptor representing a stream of GPU metrics which can then be read() as
+ * a stream of sample records.
+ *
+ * The interface is particularly suited to exposing buffered metrics that are
+ * captured by DMA from the GPU, unsynchronized with and unrelated to the CPU.
+ *
+ * Streams representing a single context are accessible to applications with a
+ * corresponding drm file descriptor, such that OpenGL can use the interface
+ * without special privileges. Access to system-wide metrics requires root
+ * privileges by default, unless changed via the dev.i915.perf_stream_paranoid
+ * sysctl option.
+ *
+ *
+ * The interface was initially inspired by the core Perf infrastructure but
+ * some notable differences are:
+ *
+ * i915 perf file descriptors represent a "stream" instead of an "event"; where
+ * a perf event primarily corresponds to a single 64bit value, while a stream
+ * might sample sets of tightly-coupled counters, depending on the
+ * configuration.  For example the Gen OA unit isn't designed to support
+ * orthogonal configurations of individual counters; it's configured for a set
+ * of related counters. Samples for an i915 perf stream capturing OA metrics
+ * will include a set of counter values packed in a compact HW specific format.
+ * The OA unit supports a number of different packing formats which can be
+ * selected by the user opening the stream. Perf has support for grouping
+ * events, but each event in the group is configured, validated and
+ * authenticated individually with separate system calls.
+ *
+ * i915 perf stream configurations are provided as an array of u64 (key,value)
+ * pairs, instead of a fixed struct with multiple miscellaneous config members,
+ * interleaved with event-type specific members.
+ *
+ * i915 perf doesn't support exposing metrics via an mmap'd circular buffer.
+ * The supported metrics are being written to memory by the GPU unsynchronized
+ * with the CPU, using HW specific packing formats for counter sets. Sometimes
+ * the constraints on HW configuration require reports to be filtered before it
+ * would be acceptable to expose them to unprivileged applications - to hide
+ * the metrics of other processes/contexts. For these use cases a read() based
+ * interface is a good fit, and provides an opportunity to filter data as it
+ * gets copied from the GPU mapped buffers to userspace buffers.
+ *
+ *
+ * Some notes regarding Linux Perf:
+ * 
+ *
+ * The first prototype of this driver was based on the core perf
+ * infrastructure, and while we did make that mostly work, with some changes to
+ * perf, we found we were breaking or working around too many assumptions baked
+ * into perf's currently cpu centric design.
+ *
+ * In the end we didn't see a clear benefit to making perf's implementation and
+ * interface more complex by changing design assumptions while we knew we still
+ * wouldn't be able to use any existing perf based userspace tools.
+ *
+ * Also considering the Gen specific nature of the Observability hardware and
+ * how userspace will sometimes need to combine i915 perf OA metrics with
+ * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're
+ * expecting the interface to be used by a platform specific userspace such as
+ * OpenGL or tools. This is to say; we aren't inherently missing out on having
+ * a standard vendor/architecture agnostic interface by not using perf.
+ *
+ *
+ * For posterity, in case we might re-visit trying to adapt core perf to be
+ * better suited to exposing i915 metrics these were the main pain points we
+ * hit:
+ *
+ * - The perf based OA PMU driver broke some significant design assumptions:
+ *
+ *   Existing perf pmus are used for profiling work on a cpu and we were
+ *   introducing the idea of _IS_DEVICE pmus with different security
+ *   implications, the need to fake cpu-related data (such as user/kernel
+ *   registers) to fit with perf's current design, and addi

[PATCH v8 11/12] drm/i915: Add more Haswell OA metric sets

2016-10-28 Thread Robert Bragg
This adds 'compute', 'compute extended', 'memory reads', 'memory writes'
and 'sampler balance' metric sets for Haswell.

The code is auto generated from an XML description of metric sets,
currently maintained in gputop, ref:

 https://github.com/rib/gputop
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_oa_hsw.c | 559 -
 1 file changed, 558 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c 
b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 6af25cf..4ddf756 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -31,9 +31,14 @@

 enum metric_set_id {
METRIC_SET_ID_RENDER_BASIC = 1,
+   METRIC_SET_ID_COMPUTE_BASIC,
+   METRIC_SET_ID_COMPUTE_EXTENDED,
+   METRIC_SET_ID_MEMORY_READS,
+   METRIC_SET_ID_MEMORY_WRITES,
+   METRIC_SET_ID_SAMPLER_BALANCE,
 };

-int i915_oa_n_builtin_metric_sets_hsw = 1;
+int i915_oa_n_builtin_metric_sets_hsw = 6;

 static const struct i915_oa_reg b_counter_config_render_basic[] = {
{ _MMIO(0x2724), 0x0080 },
@@ -112,6 +117,298 @@ get_render_basic_mux_config(struct drm_i915_private 
*dev_priv,
return mux_config_render_basic;
 }

+static const struct i915_oa_reg b_counter_config_compute_basic[] = {
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2718), 0x },
+   { _MMIO(0x271c), 0x },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2724), 0x0080 },
+   { _MMIO(0x2728), 0x },
+   { _MMIO(0x272c), 0x },
+   { _MMIO(0x2740), 0x },
+   { _MMIO(0x2744), 0x },
+   { _MMIO(0x2748), 0x },
+   { _MMIO(0x274c), 0x },
+   { _MMIO(0x2750), 0x },
+   { _MMIO(0x2754), 0x },
+   { _MMIO(0x2758), 0x },
+   { _MMIO(0x275c), 0x },
+   { _MMIO(0x236c), 0x },
+};
+
+static const struct i915_oa_reg mux_config_compute_basic[] = {
+   { _MMIO(0x253a4), 0x },
+   { _MMIO(0x2681c), 0x01f00800 },
+   { _MMIO(0x26820), 0x1000 },
+   { _MMIO(0x2781c), 0x01f00800 },
+   { _MMIO(0x26520), 0x0007 },
+   { _MMIO(0x265a0), 0x0007 },
+   { _MMIO(0x25380), 0x0010 },
+   { _MMIO(0x2538c), 0x0030 },
+   { _MMIO(0x25384), 0xaa8a },
+   { _MMIO(0x25404), 0x },
+   { _MMIO(0x26800), 0x4202 },
+   { _MMIO(0x26808), 0x00605817 },
+   { _MMIO(0x2680c), 0x10001005 },
+   { _MMIO(0x26804), 0x },
+   { _MMIO(0x27800), 0x0102 },
+   { _MMIO(0x27808), 0x0c0701e0 },
+   { _MMIO(0x2780c), 0x000200a0 },
+   { _MMIO(0x27804), 0x },
+   { _MMIO(0x26484), 0x4400 },
+   { _MMIO(0x26704), 0x4400 },
+   { _MMIO(0x26500), 0x0006 },
+   { _MMIO(0x26510), 0x0001 },
+   { _MMIO(0x26504), 0x8800 },
+   { _MMIO(0x26580), 0x0006 },
+   { _MMIO(0x26590), 0x0020 },
+   { _MMIO(0x26584), 0x },
+   { _MMIO(0x26104), 0x5582 },
+   { _MMIO(0x26184), 0xaa86 },
+   { _MMIO(0x25420), 0x08320c83 },
+   { _MMIO(0x25424), 0x06820c83 },
+   { _MMIO(0x2541c), 0x },
+   { _MMIO(0x25428), 0x0c03 },
+};
+
+static const struct i915_oa_reg *
+get_compute_basic_mux_config(struct drm_i915_private *dev_priv,
+int *len)
+{
+   *len = ARRAY_SIZE(mux_config_compute_basic);
+   return mux_config_compute_basic;
+}
+
+static const struct i915_oa_reg b_counter_config_compute_extended[] = {
+   { _MMIO(0x2724), 0xf080 },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2714), 0xf080 },
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2770), 0x0007fe2a },
+   { _MMIO(0x2774), 0xff00 },
+   { _MMIO(0x2778), 0x0007fe6a },
+   { _MMIO(0x277c), 0xff00 },
+   { _MMIO(0x2780), 0x0007fe92 },
+   { _MMIO(0x2784), 0xff00 },
+   { _MMIO(0x2788), 0x0007fea2 },
+   { _MMIO(0x278c), 0xff00 },
+   { _MMIO(0x2790), 0x0007fe32 },
+   { _MMIO(0x2794), 0xff00 },
+   { _MMIO(0x2798), 0x0007fe9a },
+   { _MMIO(0x279c), 0xff00 },
+   { _MMIO(0x27a0), 0x0007ff23 },
+   { _MMIO(0x27a4), 0xff00 },
+   { _MMIO(0x27a8), 0x0007fff3 },
+   { _MMIO(0x27ac), 0xfffe },
+};
+
+static const struct i915_oa_reg mux_config_compute_extended[] = {
+   { _MMIO(0x2681c), 0x3eb00800 },
+   { _MMIO(0x26820), 0x0090 },
+   { _MMIO(0x25384), 0x02aa },
+   { _MMIO(0x25404), 0x03ff },
+   { _MMIO(0x26800), 0x00142284 },
+   { _MMIO(0x26808), 0x0e629062 },
+   { _MMIO(0x2680c), 0x3f6f55cb },
+   { _MMIO(0x26810), 0x0014 },
+  

[PATCH v8 10/12] drm/i915: add oa_event_min_timer_exponent sysctl

2016-10-28 Thread Robert Bragg
The minimal sampling period is now configurable via a
dev.i915.oa_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the minimum that
won't (on its own) exceed kernel.perf_event_max_sample_rate's default
of 100000 samples/s.
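
As a worked check of where the default of 6 comes from, the smallest
exponent that stays within a given rate can be derived as below. This is
an illustrative sketch only: oa_min_exponent_for_rate() is a hypothetical
helper, the 80ns Haswell timestamp period and perf's 100000 samples/s
default are passed in by the caller, and the search is capped at an
assumed maximum exponent of 31.

```c
#include <stdint.h>

/* Return the smallest exponent whose sampling rate does not exceed
 * max_rate_hz, where
 *
 *   period_ns = timestamp_period_ns * 2^(exponent + 1)
 *   rate_hz   = 1000000000 / period_ns   (integer division, rounds down)
 *
 * The search is capped at 31, an assumed upper limit for this sketch.
 */
static unsigned int oa_min_exponent_for_rate(uint32_t timestamp_period_ns,
					     uint64_t max_rate_hz)
{
	unsigned int exponent = 0;

	while (exponent < 31) {
		uint64_t period_ns =
			(uint64_t)timestamp_period_ns << (exponent + 1);

		if (1000000000ull / period_ns <= max_rate_hz)
			break;

		exponent++;
	}

	return exponent;
}
```

For an 80ns timestamp period, exponent 5 gives a 5.12us period (about
195kHz) while exponent 6 gives 10.24us (about 97.6kHz), so 6 is the first
value under 100000 samples/s.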

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_perf.c | 42 
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 4e42073..e3c6f51 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -82,6 +82,22 @@ static u32 i915_perf_stream_paranoid = true;
 #define INVALID_CTX_ID 0x


+/* for sysctl proc_dointvec_minmax of i915_oa_min_timer_exponent */
+static int oa_exponent_max = OA_EXPONENT_MAX;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The period is derived from the exponent as:
+ *
+ *   period = 80ns * 2^(exponent + 1)
+ *
+ * Referring to perf's kernel.perf_event_max_sample_rate for a precedent
+ * (100000 by default); with an OA exponent of 6 we get a period of 10.240
+ * microseconds - just under 100000Hz
+ */
+static u32 i915_oa_min_timer_exponent = 6;
+
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
  * be used as a mask to align the OA tail pointer.
@@ -1353,21 +1369,14 @@ static int read_properties_unlocked(struct 
drm_i915_private *dev_priv,
return -EINVAL;
}

-   /* NB: The exponent represents a period as follows:
-*
-*   80ns * 2^(period_exponent + 1)
-*
-* Theoretically we can program the OA unit to sample
+   /* Theoretically we can program the OA unit to sample
 * every 160ns but don't allow that by default unless
 * root.
-*
-* Referring to perf's
-* kernel.perf_event_max_sample_rate for a precedent
-* (100000 by default); with an OA exponent of 6 we get
-* a period of 10.240 microseconds -just under 100000Hz
 */
-   if (value < 6 && !capable(CAP_SYS_ADMIN)) {
-   DRM_ERROR("Minimum OA sampling exponent is 6 
without root privileges\n");
+   if (value < i915_oa_min_timer_exponent &&
+   !capable(CAP_SYS_ADMIN)) {
+   DRM_ERROR("Minimum OA sampling exponent (sysctl 
dev.i915.oa_min_timer_exponent) is %u without root privileges\n",
+ i915_oa_min_timer_exponent);
return -EACCES;
}

@@ -1475,6 +1484,15 @@ static struct ctl_table oa_table[] = {
 .extra1 = &zero,
 .extra2 = &one,
 },
+   {
+.procname = "oa_min_timer_exponent",
+.data = &i915_oa_min_timer_exponent,
+.maxlen = sizeof(i915_oa_min_timer_exponent),
+.mode = 0644,
+.proc_handler = proc_dointvec_minmax,
+.extra1 = &zero,
+.extra2 = &oa_exponent_max,
+},
{}
 };

-- 
2.10.1



[PATCH v8 09/12] drm/i915: Add dev.i915.perf_stream_paranoid sysctl option

2016-10-28 Thread Robert Bragg
Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.
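
The resulting access rule can be summarised by the small sketch below. It
restates the condition from the diff as a pure function so it can be read
in isolation; check_stream_access() and its boolean parameters are
hypothetical names, with has_sys_admin standing in for the kernel's
capable(CAP_SYS_ADMIN) check.

```c
#include <errno.h>
#include <stdbool.h>

/* Decide whether a stream may be opened.
 *
 * specific_ctx:  the stream is limited to a single context
 * paranoid:      current value of the paranoid sysctl (true by default)
 * has_sys_admin: stand-in for capable(CAP_SYS_ADMIN)
 *
 * Only system-wide streams are restricted, and only while the sysctl is
 * set and the caller is unprivileged.
 */
static int check_stream_access(bool specific_ctx, bool paranoid,
			       bool has_sys_admin)
{
	if (!specific_ctx && paranoid && !has_sys_admin)
		return -EACCES;

	return 0;
}
```

Single-context streams are unaffected by the sysctl either way.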

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 50 +++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 01438fb..a138f86 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2171,6 +2171,7 @@ struct drm_i915_private {
bool initialized;

struct kobject *metrics_kobj;
+   struct ctl_table_header *sysctl_header;

struct mutex lock;
struct list_head streams;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 8d07c41..4e42073 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -64,6 +64,11 @@
 #define POLL_FREQUENCY 200
 #define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY)

+/* for sysctl proc_dointvec_minmax of dev.i915.perf_stream_paranoid */
+static int zero;
+static int one = 1;
+static u32 i915_perf_stream_paranoid = true;
+
 /* The maximum exponent the hardware accepts is 63 (essentially it selects one
  * of the 64bit timestamp bits to trigger reports from) but there's currently
  * no known use case for sampling as infrequently as once per 47 thousand 
years.
@@ -1207,7 +1212,13 @@ i915_perf_open_ioctl_locked(struct drm_i915_private 
*dev_priv,
}
}

-   if (!specific_ctx && !capable(CAP_SYS_ADMIN)) {
+   /* Similar to perf's kernel.perf_event_paranoid sysctl option
+* we check a dev.i915.perf_stream_paranoid sysctl option
+* to determine if it's ok to access system wide OA counters
+* without CAP_SYS_ADMIN privileges.
+*/
+   if (!specific_ctx &&
+   i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
DRM_ERROR("Insufficient privileges to open system-wide i915 
perf stream\n");
ret = -EACCES;
goto err_ctx;
@@ -1454,6 +1465,39 @@ void i915_perf_unregister(struct drm_i915_private 
*dev_priv)
dev_priv->perf.metrics_kobj = NULL;
 }

+static struct ctl_table oa_table[] = {
+   {
+.procname = "perf_stream_paranoid",
+.data = &i915_perf_stream_paranoid,
+.maxlen = sizeof(i915_perf_stream_paranoid),
+.mode = 0644,
+.proc_handler = proc_dointvec_minmax,
+.extra1 = &zero,
+.extra2 = &one,
+},
+   {}
+};
+
+static struct ctl_table i915_root[] = {
+   {
+.procname = "i915",
+.maxlen = 0,
+.mode = 0555,
+.child = oa_table,
+},
+   {}
+};
+
+static struct ctl_table dev_root[] = {
+   {
+.procname = "dev",
+.maxlen = 0,
+.mode = 0555,
+.child = i915_root,
+},
+   {}
+};
+
 void i915_perf_init(struct drm_i915_private *dev_priv)
 {
if (!IS_HASWELL(dev_priv))
@@ -1484,6 +1528,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
dev_priv->perf.oa.n_builtin_sets =
i915_oa_n_builtin_metric_sets_hsw;

+   dev_priv->perf.sysctl_header = register_sysctl_table(dev_root);
+
dev_priv->perf.initialized = true;
 }

@@ -1492,6 +1538,8 @@ void i915_perf_fini(struct drm_i915_private *dev_priv)
if (!dev_priv->perf.initialized)
return;

+   unregister_sysctl_table(dev_priv->perf.sysctl_header);
+
memset(&dev_priv->perf.oa.ops, 0, sizeof(dev_priv->perf.oa.ops));
dev_priv->perf.initialized = false;
 }
-- 
2.10.1



[PATCH v8 08/12] drm/i915: advertise available metrics via sysfs

2016-10-28 Thread Robert Bragg
Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to look up corresponding counter metadata and
normalization equations.
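
As a rough illustration of the userspace side, the sketch below builds the
path of an 'id' file and parses its contents. The helper names
metric_id_path() and parse_metric_id() are hypothetical and are not part of
this patch or of any library; the only details taken from the description
above are the path layout and the fact that the file holds an unsigned
integer.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "/sys/class/drm/card<N>/metrics/<guid>/id"
 * into buf. Returns 0 on success, -1 if the buffer is too small.
 */
static int metric_id_path(char *buf, size_t len, int card, const char *guid)
{
	int n;

	assert(buf && guid);
	n = snprintf(buf, len, "/sys/class/drm/card%d/metrics/%s/id",
		     card, guid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: parse the text of an 'id' file (for example "1\n")
 * into an unsigned value. Returns 0 on success, -1 on malformed input.
 */
static int parse_metric_id(const char *text, unsigned long *id)
{
	char *end;

	assert(text && id);
	errno = 0;
	*id = strtoul(text, &end, 10);
	if (errno || end == text)
		return -1;
	if (*end != '\n' && *end != '\0')
		return -1;
	return 0;
}
```

The parsed value would then be passed as the metrics set property when
opening a stream; enumerating the metrics directory itself (for example with
opendir()) is left out to keep the sketch short.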

The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:

 https://github.com/rib/gputop
 > gputop-data/guids.xml
 > scripts/update-guids.py
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_drv.c|  5 
 drivers/gpu/drm/i915/i915_drv.h|  4 +++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 51 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4 +++
 drivers/gpu/drm/i915/i915_perf.c   | 52 ++
 5 files changed, 116 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 685c96e..29bc83b 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1116,6 +1116,9 @@ static void i915_driver_register(struct drm_i915_private 
*dev_priv)
i915_debugfs_register(dev_priv);
i915_guc_register(dev_priv);
i915_setup_sysfs(dev_priv);
+
+   /* Depends on sysfs having been initialized */
+   i915_perf_register(dev_priv);
} else
DRM_ERROR("Failed to register driver for userspace access!\n");

@@ -1152,6 +1155,8 @@ static void i915_driver_unregister(struct 
drm_i915_private *dev_priv)
acpi_video_unregister();
intel_opregion_unregister(dev_priv);

+   i915_perf_unregister(dev_priv);
+
i915_teardown_sysfs(dev_priv);
i915_guc_unregister(dev_priv);
i915_debugfs_unregister(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dd2b4d3..01438fb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2170,6 +2170,8 @@ struct drm_i915_private {
struct {
bool initialized;

+   struct kobject *metrics_kobj;
+
struct mutex lock;
struct list_head streams;

@@ -3757,6 +3759,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine,
 /* i915_perf.c */
 extern void i915_perf_init(struct drm_i915_private *dev_priv);
 extern void i915_perf_fini(struct drm_i915_private *dev_priv);
+extern void i915_perf_register(struct drm_i915_private *dev_priv);
+extern void i915_perf_unregister(struct drm_i915_private *dev_priv);

 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c 
b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 8906380..6af25cf 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -24,6 +24,8 @@
  *
  */

+#include 
+
 #include "i915_drv.h"
 #include "i915_oa_hsw.h"

@@ -142,3 +144,52 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private 
*dev_priv)
return -ENODEV;
}
 }
+
+static ssize_t
+show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC);
+}
+
+static struct device_attribute dev_attr_render_basic_id = {
+   .attr = { .name = "id", .mode = 0444 },
+   .show = show_render_basic_id,
+   .store = NULL,
+};
+
+static struct attribute *attrs_render_basic[] = {
+   &dev_attr_render_basic_id.attr,
+   NULL,
+};
+
+static struct attribute_group group_render_basic = {
+   .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212",
+   .attrs =  attrs_render_basic,
+};
+
+int
+i915_perf_register_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int mux_len;
+   int ret = 0;
+
+   if (get_render_basic_mux_config(dev_priv, &mux_len)) {
+   ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+   if (ret)
+   goto error_render_basic;
+   }
+
+   return 0;
+
+error_render_basic:
+   return ret;
+}
+
+void
+i915_perf_unregister_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int mux_len;
+
+   if (get_render_basic_mux_config(dev_priv, &mux_len))
+   sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+}
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h 
b/drivers/gpu/drm/i915/i915_oa_hsw.h
index b618a1f..429a229 100644
--

[PATCH v8 07/12] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-28 Thread Robert Bragg
Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.
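
To give a feel for the sampling periods involved, here is a small sketch
that converts an OA period exponent into seconds. It rests on two
assumptions that are not stated in this message: that the Haswell OA
timestamp runs at 12.5 MHz, and that the period is 2^(exponent + 1)
timestamp ticks. The function name is hypothetical.

```c
#include <assert.h>

/* Assumption: Haswell OA timestamp frequency (80ns per tick). */
#define ASSUMED_TIMESTAMP_HZ 12500000.0

/* Hypothetical helper: sampling period in seconds for a given exponent,
 * assuming period = 2^(exponent + 1) timestamp ticks.
 */
static double oa_exponent_to_seconds(int exponent)
{
	double ticks = 2.0;
	int i;

	assert(exponent >= 0 && exponent <= 63);
	for (i = 0; i < exponent; i++)
		ticks *= 2.0;	/* exact in double up to 2^64 */

	return ticks / ASSUMED_TIMESTAMP_HZ;
}
```

Under these assumptions an exponent of 0 gives a 160ns period, 31 gives
roughly 344 seconds, and 63 gives a period in the region of 47 thousand
years, which matches the figure quoted in the i915_perf.c comment elsewhere
in this series.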

v2:
   Make sure to initialize ->specific_ctx_id when opening, without
   relying on _pin_notify hook, in case ctx already pinned.
v3:
   Revert back to pinning ctx upfront when opening stream, removing
   need to hook in to pinning and to update OACONTROL on the fly.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
Signed-off-by: Zhenyu Wang 
---
 drivers/gpu/drm/i915/i915_drv.h  |   66 ++-
 drivers/gpu/drm/i915/i915_perf.c | 1036 +-
 drivers/gpu/drm/i915/i915_reg.h  |  338 +
 include/uapi/drm/i915_drm.h  |   71 ++-
 4 files changed, 1482 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f22adc4..dd2b4d3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1767,6 +1767,11 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_oa_format {
+   u32 format;
+   int size;
+};
+
 struct i915_oa_reg {
i915_reg_t addr;
u32 value;
@@ -1787,11 +1792,6 @@ struct i915_perf_stream_ops {
 */
void (*disable)(struct i915_perf_stream *stream);

-   /* Return: true if any i915 perf records are ready to read()
-* for this stream.
-*/
-   bool (*can_read)(struct i915_perf_stream *stream);
-
/* Call poll_wait, passing a wait queue that will be woken
 * once there is something ready to read() for the stream
 */
@@ -1801,9 +1801,7 @@ struct i915_perf_stream_ops {

/* For handling a blocking read, wait until there is something
 * to ready to read() for the stream. E.g. wait on the same
-* wait queue that would be passed to poll_wait() until
-* ->can_read() returns true (if its safe to call ->can_read()
-* without the i915 perf lock held).
+* wait queue that would be passed to poll_wait().
 */
int (*wait_unlocked)(struct i915_perf_stream *stream);

@@ -1843,11 +1841,28 @@ struct i915_perf_stream {
struct list_head link;

u32 sample_flags;
+   int sample_size;

struct i915_gem_context *ctx;
bool enabled;

-   struct i915_perf_stream_ops *ops;
+   const struct i915_perf_stream_ops *ops;
+};
+
+struct i915_oa_ops {
+   void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
+   int (*enable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*disable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*oa_enable)(struct drm_i915_private *dev_priv);
+   void (*oa_disable)(struct drm_i915_private *dev_priv);
+   void (*update_oacontrol)(struct drm_i915_private *dev_priv);
+   void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv,
+   u32 ctx_id);
+   int (*read)(struct i915_perf_stream *stream,
+   char __user *buf,
+   size_t count,
+   size_t *offset);
+   bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
 };

 struct drm_i915_private {
@@ -2154,16 +2169,47 @@ struct drm_i915_private {

struct {
bool initialized;
+
struct mutex lock;
struct list_head streams;

+   spinlock_t hook_lock;
+
struct {
-   u32 metrics_set;
+   struct i915_perf_stream *exclusive_stream;
+
+   u32 specific_ctx_id;
+   struct i915_vma *pinned_rcs_vma;
+
+   struct hrtimer poll_check_timer;
+   wait_queue_head_t poll_wq;
+   bool pollin;
+
+   bool periodic;
+   int period_exponent;
+   int timestamp_frequency;
+
+   int tail_margin;
+
+   int metrics_set;

const struct i915_oa_reg *mux_regs;
int mux_regs_len;
const struct i915_oa_reg *b_counter_regs;
int b_counter_regs_len;
+
+   struct {
+   struct i915_vma *vma;
+   u8 *vaddr;
+   int format;
+   int format_size;
+   } oa_buffer;
+
+   u32 gen7_latched_oastatus1;
+
+   struct i915_oa_ops ops;
+   const struct i915_oa_format *oa_formats;
+   int n_builtin_sets;
} oa;
} perf;

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c45cf92..8b9cf0d 100644
--- 

[PATCH v8 06/12] drm/i915: Add 'render basic' Haswell OA unit config

2016-10-28 Thread Robert Bragg
Adds a static OA unit, MUX + B Counter configuration for basic render
metrics on Haswell. This is auto generated from an XML
description of metric sets, currently maintained in gputop, ref:

  https://github.com/rib/gputop
  > gputop-data/oa-*.xml
  > scripts/i915-perf-kernelgen.py

  $ make -C gputop-data -f Makefile.xml SYSFS=0 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/Makefile  |   3 +-
 drivers/gpu/drm/i915/i915_drv.h|  14 
 drivers/gpu/drm/i915/i915_oa_hsw.c | 144 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  34 +
 4 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 8d4e25f..ac0c3ad 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -114,7 +114,8 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
 i915-y += i915_vgpu.o

 # perf code
-i915-y += i915_perf.o
+i915-y += i915_perf.o \
+ i915_oa_hsw.o

 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7a65c0b..f22adc4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1767,6 +1767,11 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_oa_reg {
+   i915_reg_t addr;
+   u32 value;
+};
+
 struct i915_perf_stream;

 struct i915_perf_stream_ops {
@@ -2151,6 +2156,15 @@ struct drm_i915_private {
bool initialized;
struct mutex lock;
struct list_head streams;
+
+   struct {
+   u32 metrics_set;
+
+   const struct i915_oa_reg *mux_regs;
+   int mux_regs_len;
+   const struct i915_oa_reg *b_counter_regs;
+   int b_counter_regs_len;
+   } oa;
} perf;

/* Abstract the submission mechanism (legacy ringbuffer or execlists) 
away */
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c 
b/drivers/gpu/drm/i915/i915_oa_hsw.c
new file mode 100644
index 000..8906380
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -0,0 +1,144 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+#include "i915_oa_hsw.h"
+
+enum metric_set_id {
+   METRIC_SET_ID_RENDER_BASIC = 1,
+};
+
+int i915_oa_n_builtin_metric_sets_hsw = 1;
+
+static const struct i915_oa_reg b_counter_config_render_basic[] = {
+   { _MMIO(0x2724), 0x0080 },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2710), 0x },
+};
+
+static const struct i915_oa_reg mux_config_render_basic[] = {
+   { _MMIO(0x253a4), 0x0160 },
+   { _MMIO(0x25440), 0x0010 },
+   { _MMIO(0x25128), 0x },
+   { _MMIO(0x2691c), 0x0800 },
+   { _MMIO(0x26aa0), 0x0150 },
+   { _MMIO(0x26b9c), 0x6000 },
+   { _MMIO(0x2791c), 0x0800 },
+   { _MMIO(0x27aa0), 0x0150 },
+   { _MMIO(0x27b9c), 0x6000 },
+   { _MMIO(0x2641c), 0x0400 },
+   { _MMIO(0x25380), 0x0010 },
+   { _MMIO(0x2538c), 0x },
+   { _MMIO(0x25384), 0x0800 },
+   { _MMIO(0x25400), 0x0004 },
+   { _MMIO(0x2540c), 0x06029000 },
+   { _MMIO(0x25410), 0x0002 },
+   { _MMIO(0x25404), 0x5c30 },
+   { _MMIO(0x25100), 0x0016 },
+   { _MMIO(0x25110), 0x0400 },
+   { _MMIO(0x25104), 0x },
+   { _MMIO(0x26804), 0x1211 },
+   { _MMIO(0

[PATCH v8 05/12] drm/i915: don't whitelist oacontrol in cmd parser

2016-10-28 Thread Robert Bragg
Being able to program OACONTROL from a non-privileged batch buffer is
not sufficient to be able to configure the OA unit. This was originally
allowed to help enable Mesa to expose OA counters via the
INTEL_performance_query extension, but the current implementation based
on programming OACONTROL via a batch buffer isn't able to report usable
data without a more complete OA unit configuration. Mesa handles the
possibility that writes to OACONTROL may not be allowed and so only
advertises the extension after explicitly testing that a write to
OACONTROL succeeds. Based on this, removing OACONTROL from the whitelist
should be ok for userspace.

Removing this simplifies adding a new kernel API for configuring the OA
unit without needing to consider the possibility that userspace might
trample on OACONTROL state which we'd like to start managing within
the kernel instead. In particular, running any Mesa-based GL application
currently results in clearing OACONTROL when initializing, which would
disable the capturing of metrics.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 38 ++
 1 file changed, 2 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
b/drivers/gpu/drm/i915/i915_cmd_parser.c
index c45dd83..5152d6f 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -450,7 +450,6 @@ static const struct drm_i915_reg_descriptor 
gen7_render_regs[] = {
REG64(PS_INVOCATION_COUNT),
REG64(PS_DEPTH_COUNT),
REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-   REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
REG64(MI_PREDICATE_SRC0),
REG64(MI_PREDICATE_SRC1),
REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1060,8 +1059,7 @@ bool intel_engine_needs_cmd_parser(struct intel_engine_cs 
*engine)
 static bool check_cmd(const struct intel_engine_cs *engine,
  const struct drm_i915_cmd_descriptor *desc,
  const u32 *cmd, u32 length,
- const bool is_master,
- bool *oacontrol_set)
+ const bool is_master)
 {
if (desc->flags & CMD_DESC_SKIP)
return true;
@@ -1099,31 +1097,6 @@ static bool check_cmd(const struct intel_engine_cs 
*engine,
}

/*
-* OACONTROL requires some special handling for
-* writes. We want to make sure that any batch which
-* enables OA also disables it before the end of the
-* batch. The goal is to prevent one process from
-* snooping on the perf data from another process. To do
-* that, we need to check the value that will be written
-* to the register. Hence, limit OACONTROL writes to
-* only MI_LOAD_REGISTER_IMM commands.
-*/
-   if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
-   if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_REG) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRR to OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1))
-   *oacontrol_set = (cmd[offset + 1] != 0);
-   }
-
-   /*
 * Check the value written to the register against the
 * allowed mask/value pair given in the whitelist entry.
 */
@@ -1214,7 +1187,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine,
u32 *cmd, *batch_end;
struct drm_i915_cmd_descriptor default_desc = noop_desc;
const struct drm_i915_cmd_descriptor *desc = &default_desc;
-   bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
bool needs_clflush_after = false;
int ret = 0;

@@ -1270,8 +1242,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine,
break;
}

-   if (!check_cmd(engine, desc, cmd, length, is_master,
-  &oacontrol_set)) {
+   if (!check_cmd(engine, desc, cmd, length, is_master)) {
ret = -EACCES;
break;
}
@@ -1279,11 +1250,6 @@ int intel_engine_cmd_parse

[PATCH v8 04/12] drm/i915: return EACCES for check_cmd() failures

2016-10-28 Thread Robert Bragg
check_cmd() is checking whether a command adheres to certain
restrictions that ensure it's safe to execute within a privileged batch
buffer. Returning false implies a privilege problem, not that the
command is invalid.

The distinction makes the difference between allowing the buffer to be
executed as an unprivileged batch buffer and returning an EINVAL error to
userspace without executing anything.

In a case where userspace may want to test whether it can successfully
write to a register that needs privileges, the distinction may be
important and an EINVAL error may be considered fatal.

In particular this is currently true for Mesa, which includes a test for
whether OACONTROL can be written to, but Mesa treats any error when
flushing a batch buffer as fatal, calling exit(1).

As things currently stand, Mesa can gracefully handle a failure to write to
OACONTROL if the command parser is disabled, but if we were to remove
OACONTROL from the parser's whitelist then the returned EINVAL would
break Mesa applications as they attempt an OACONTROL write.

This bumps the command parser version from 7 to 8, as the change is
visible to userspace.
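
The intended effect on batch submission can be modelled roughly as below.
This is only a simplified sketch of the decision described above, not the
actual execbuffer code; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcomes for a batch after the command parser has run. */
enum submit_action {
	SUBMIT_PARSED_COPY,		/* parser accepted the batch */
	SUBMIT_ORIGINAL_UNPRIVILEGED,	/* let the HW NOOP disallowed commands */
	SUBMIT_REJECT,			/* report the error to userspace */
};

/* Map a parser return code (0 or a negative errno) to an action.
 * -EACCES means "not safe to run privileged", so the batch still runs,
 * just without privileges; any other error is treated as a real failure.
 */
static enum submit_action action_for_parser_result(int ret)
{
	assert(ret <= 0);

	if (ret == 0)
		return SUBMIT_PARSED_COPY;
	if (ret == -EACCES)
		return SUBMIT_ORIGINAL_UNPRIVILEGED;
	return SUBMIT_REJECT;
}
```

With check_cmd() failures mapped to -EACCES, a userspace probe such as
Mesa's OACONTROL test would see its write ignored by the hardware rather
than a failed submission.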

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
b/drivers/gpu/drm/i915/i915_cmd_parser.c
index fe34470..c45dd83 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1272,7 +1272,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine,

if (!check_cmd(engine, desc, cmd, length, is_master,
   &oacontrol_set)) {
-   ret = -EINVAL;
+   ret = -EACCES;
break;
}

@@ -1333,6 +1333,9 @@ int i915_cmd_parser_get_version(struct drm_i915_private 
*dev_priv)
 * 5. GPGPU dispatch compute indirect registers.
 * 6. TIMESTAMP register and Haswell CS GPR registers
 * 7. Allow MI_LOAD_REGISTER_REG between whitelisted registers.
+* 8. Don't report cmd_check() failures as EINVAL errors to userspace;
+*rely on the HW to NOOP disallowed commands as it would without
+*the parser enabled.
 */
-   return 7;
+   return 8;
 }
-- 
2.10.1



[PATCH v8 03/12] drm/i915: rename OACONTROL GEN7_OACONTROL

2016-10-28 Thread Robert Bragg
OACONTROL changes quite a bit for gen8, with some bits split out into a
per-context OACTXCONTROL register. Rename it now, before adding more gen7
OA registers.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gvt/handlers.c| 2 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/handlers.c 
b/drivers/gpu/drm/i915/gvt/handlers.c
index 9ab1f95..4527cb7 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -2177,7 +2177,7 @@ static int init_generic_mmio_info(struct intel_gvt *gvt)
MMIO_DFH(0x1217c, D_ALL, F_CMD_ACCESS, NULL, NULL);

MMIO_F(0x2290, 8, 0, 0, 0, D_HSW_PLUS, NULL, NULL);
-   MMIO_D(OACONTROL, D_HSW);
+   MMIO_D(GEN7_OACONTROL, D_HSW);
MMIO_D(0x2b00, D_BDW_PLUS);
MMIO_D(0x2360, D_BDW_PLUS);
MMIO_F(0x5200, 32, 0, 0, 0, D_ALL, NULL, NULL);
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
b/drivers/gpu/drm/i915/i915_cmd_parser.c
index f191d7b..fe34470 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -450,7 +450,7 @@ static const struct drm_i915_reg_descriptor 
gen7_render_regs[] = {
REG64(PS_INVOCATION_COUNT),
REG64(PS_DEPTH_COUNT),
REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-   REG32(OACONTROL), /* Only allowed for LRI and SRM. See below. */
+   REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
REG64(MI_PREDICATE_SRC0),
REG64(MI_PREDICATE_SRC1),
REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1108,7 +1108,7 @@ static bool check_cmd(const struct intel_engine_cs 
*engine,
 * to the register. Hence, limit OACONTROL writes to
 * only MI_LOAD_REGISTER_IMM commands.
 */
-   if (reg_addr == i915_mmio_reg_offset(OACONTROL)) {
+   if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
DRM_DEBUG_DRIVER("CMD: Rejected LRM to 
OACONTROL\n");
return false;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 542e570..59628d5 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -615,7 +615,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define HSW_CS_GPR(n)   _MMIO(0x2600 + (n) * 8)
 #define HSW_CS_GPR_UDW(n)   _MMIO(0x2600 + (n) * 8 + 4)

-#define OACONTROL _MMIO(0x2360)
+#define GEN7_OACONTROL _MMIO(0x2360)

 #define _GEN7_PIPEA_DE_LOAD_SL 0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL 0x71068
-- 
2.10.1



[PATCH v8 02/12] drm/i915: Add i915 perf infrastructure

2016-10-28 Thread Robert Bragg
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for
data.

A stream is opened something like:

  uint64_t properties[] = {
      /* Single context sampling */
      DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

      /* Include OA reports in samples */
      DRM_I915_PERF_PROP_SAMPLE_OA,         true,

      /* OA unit configuration */
      DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
      DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
      DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
  };
  struct drm_i915_perf_open_param param = {
      .flags = I915_PERF_FLAG_FD_CLOEXEC |
               I915_PERF_FLAG_FD_NONBLOCK |
               I915_PERF_FLAG_DISABLED,
      .properties_ptr = (uint64_t)properties,
      /* each property is a (key, value) pair of uint64s, i.e. 16 bytes */
      .num_properties = sizeof(properties) / 16,
  };
  int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
determine what's included in every sample.
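
A reader of such a stream might walk the records roughly as follows. This is
a sketch under stated assumptions: the header layout (u32 type, u16 size,
u16 pad), the convention that size includes the header itself, and the
numeric value of the sample record type are assumptions made for
illustration, and struct record_header and count_sample_records() are
hypothetical names rather than part of the uapi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the common { type, size } record header. */
struct record_header {
	uint32_t type;
	uint16_t size;	/* assumed to include the header itself */
	uint16_t pad;
};

#define RECORD_SAMPLE 1	/* assumed value, for illustration only */

/* Walk a buffer filled by read() and count the sample records in it.
 * Returns the count, or -1 if a record is truncated or malformed.
 */
static int count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0;
	int samples = 0;

	assert(buf || len == 0);

	while (len - offset >= sizeof(struct record_header)) {
		struct record_header hdr;

		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&hdr, buf + offset, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			return -1;
		if (hdr.type == RECORD_SAMPLE)
			samples++;
		offset += hdr.size;	/* skip unknown types by size */
	}

	return offset == len ? samples : -1;
}
```

Skipping by size is what lets a reader ignore record types it does not
understand, which is the point of giving every record the same sized header.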

No specific streams are supported yet so any attempt to open a stream
will return an error.

v2:
use i915_gem_context_get() - Chris Wilson
v3:
update read() interface to avoid passing state struct - Chris Wilson
fix some rebase fallout, with i915-perf init/deinit
v4:
s/DRM_IORW/DRM_IOW/ - Emil Velikov

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/Makefile|   3 +
 drivers/gpu/drm/i915/i915_drv.c  |   4 +
 drivers/gpu/drm/i915/i915_drv.h  |  91 
 drivers/gpu/drm/i915/i915_perf.c | 443 +++
 include/uapi/drm/i915_drm.h  |  67 ++
 5 files changed, 608 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 6123400..8d4e25f 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -113,6 +113,9 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
 # virtual gpu code
 i915-y += i915_vgpu.o

+# perf code
+i915-y += i915_perf.o
+
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
 include $(src)/gvt/Makefile
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index af3559d..685c96e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -836,6 +836,8 @@ static int i915_driver_init_early(struct drm_i915_private 
*dev_priv,

intel_detect_preproduction_hw(dev_priv);

+   i915_perf_init(dev_priv);
+
return 0;

 err_workqueues:
@@ -849,6 +851,7 @@ static int i915_driver_init_early(struct drm_i915_private 
*dev_priv,
  */
 static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv)
 {
+   i915_perf_fini(dev_priv);
i915_gem_load_cleanup(&dev_priv->drm);
i915_workqueues_cleanup(dev_priv);
 }
@@ -2556,6 +2559,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, 
DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, 
i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, 
i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, 
DRM_RENDER_ALLOW),
 };

 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5a260db..7a65c0b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1767,6 +1767,84 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_perf_stream;
+
+struct i915_perf_stream_ops {
+   /* Enables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_ENABLE or implicitly called when stream is
+* opened without I915_PERF_FLAG_DISABLED.
+*/
+   void (*enable)(struct i915_perf_stream *stream);
+
+   /* Disables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_DISABLE or implicitly called before
+* destroying the stream.
+*/
+   void (*disable)(struct i915_perf_stream *stream);
+
+   /* Return: true if any i915 perf records are ready to read()
+* for this stream.
+*/
+   bool (*can_read)(struct i915_perf_stream *stream);
+
+   /* Call poll_wait, passing a wait queue that will be woken
+* once there is something ready to read() for the stream
+*/
+   void (*poll_wait)(struct i915_pe

[PATCH v8 01/12] ctx-pin placeholder from chris

2016-10-28 Thread Robert Bragg
From: Chris Wilson 

---
 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_gem_context.c | 34 ++---
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 55afb66..5a260db 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3437,6 +3437,7 @@ struct drm_i915_gem_object *
 i915_gem_alloc_context_obj(struct drm_device *dev, size_t size);
 struct i915_gem_context *
 i915_gem_context_create_gvt(struct drm_device *dev);
+struct i915_vma *i915_gem_context_pin_legacy(struct i915_gem_context *ctx);

 static inline struct i915_gem_context *
 i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 5dca32a..a620e15b 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -751,12 +751,31 @@ needs_pd_load_post(struct i915_hw_ppgtt *ppgtt,
return false;
 }

+struct i915_vma *i915_gem_context_pin_legacy(struct i915_gem_context *ctx)
+{
+   struct i915_vma *vma = ctx->engine[RCS].state;
+   int ret;
+
+   /* Clear this page out of any CPU caches for coherent swap-in/out. */
+   if (!(vma->flags & I915_VMA_GLOBAL_BIND)) {
+   ret = i915_gem_object_set_to_gtt_domain(vma->obj, false);
+   if (ret)
+   return ERR_PTR(ret);
+   }
+
+   ret = i915_vma_pin(vma, 0, ctx->ggtt_alignment, PIN_GLOBAL);
+   if (ret)
+   return ERR_PTR(ret);
+
+   return vma;
+}
+
 static int do_rcs_switch(struct drm_i915_gem_request *req)
 {
struct i915_gem_context *to = req->ctx;
struct intel_engine_cs *engine = req->engine;
struct i915_hw_ppgtt *ppgtt = to->ppgtt ?: req->i915->mm.aliasing_ppgtt;
-   struct i915_vma *vma = to->engine[RCS].state;
+   struct i915_vma *vma;
struct i915_gem_context *from;
u32 hw_flags;
int ret, i;
@@ -764,17 +783,10 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
if (skip_rcs_switch(ppgtt, engine, to))
return 0;

-   /* Clear this page out of any CPU caches for coherent swap-in/out. */
-   if (!(vma->flags & I915_VMA_GLOBAL_BIND)) {
-   ret = i915_gem_object_set_to_gtt_domain(vma->obj, false);
-   if (ret)
-   return ret;
-   }
-
/* Trying to pin first makes error handling easier. */
-   ret = i915_vma_pin(vma, 0, to->ggtt_alignment, PIN_GLOBAL);
-   if (ret)
-   return ret;
+   vma = i915_gem_context_pin_legacy(to);
+   if (IS_ERR(vma))
+   return PTR_ERR(vma);

/*
 * Pin can switch back to the default context if we end up calling into
-- 
2.10.1



[PATCH v8 00/12] Enable i915 perf stream for Haswell OA unit

2016-10-28 Thread Robert Bragg
Rebased on nightly, and updated as per review from Matt and Chris

The first patch from Chris adds an i915_gem_context_pin_legacy() utility that
I'm depending on now - though it doesn't really form part of the i915-perf
series proper. I'm assuming Chris plans to send a version of this to the list
himself with a proper commit message.

- Robert

Chris Wilson (1):
  ctx-pin placeholder from chris

Robert Bragg (11):
  drm/i915: Add i915 perf infrastructure
  drm/i915: rename OACONTROL GEN7_OACONTROL
  drm/i915: return EACCES for check_cmd() failures
  drm/i915: don't whitelist oacontrol in cmd parser
  drm/i915: Add 'render basic' Haswell OA unit config
  drm/i915: Enable i915 perf stream for Haswell OA unit
  drm/i915: advertise available metrics via sysfs
  drm/i915: Add dev.i915.perf_stream_paranoid sysctl option
  drm/i915: add oa_event_min_timer_exponent sysctl
  drm/i915: Add more Haswell OA metric sets
  drm/i915: Add a kerneldoc summary for i915_perf.c

 drivers/gpu/drm/i915/Makefile   |4 +
 drivers/gpu/drm/i915/gvt/handlers.c |2 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c  |   45 +-
 drivers/gpu/drm/i915/i915_drv.c |9 +
 drivers/gpu/drm/i915/i915_drv.h |  157 +++
 drivers/gpu/drm/i915/i915_gem_context.c |   34 +-
 drivers/gpu/drm/i915/i915_oa_hsw.c  |  752 ++
 drivers/gpu/drm/i915/i915_oa_hsw.h  |   38 +
 drivers/gpu/drm/i915/i915_perf.c| 1726 +++
 drivers/gpu/drm/i915/i915_reg.h |  340 +-
 include/uapi/drm/i915_drm.h |  134 +++
 11 files changed, 3190 insertions(+), 51 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

-- 
2.10.1



[PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-26 Thread Robert Bragg
On Wed, Oct 26, 2016 at 4:03 PM, Robert Bragg 
wrote:

> On 26 Oct 2016 11:08 a.m., "Matthew Auld" 
> wrote:
> >
> > On 26 October 2016 at 00:51, Robert Bragg  wrote:
> > >
> > >
> > > On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld
> > >  wrote:
> > >>
> > > >> On 25 October 2016 at 00:19, Robert Bragg wrote:
> > >
> > >
> > >>
> > >>
> > >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > >> > b/drivers/gpu/drm/i915/i915_drv.h
> > >> > index 3448d05..ea24814 100644
> > >> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > >> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > >> > @@ -1764,6 +1764,11 @@ struct intel_wm_config {
> > >>
> > >> >
> > >> >  struct drm_i915_private {
> > >> > @@ -2149,16 +2164,46 @@ struct drm_i915_private {
> > >> >
> > >> > struct {
> > >> > bool initialized;
> > >> > +
> > >> > struct mutex lock;
> > >> > struct list_head streams;
> > >> >
> > >> > +   spinlock_t hook_lock;
> > >> > +
> > >> > struct {
> > >> > -   u32 metrics_set;
> > >> > +   struct i915_perf_stream *exclusive_stream;
> > >> > +
> > >> > +   u32 specific_ctx_id;
> > >> Can we just get rid of this, now that the vma remains pinned we can
> > >> simply get the ggtt address at the time of configuring the OA_CONTROL
> > >> register ?
> > >
> > >
> > > I considered that, but would ideally prefer to keep it considering the
> gen8+
> > > patches to come. For gen8+ (with execlists) the context ID isn't a gtt
> > > offset.
> > >
> > >>
> > >>
> > >> > +
> > >> > +   struct hrtimer poll_check_timer;
> > >> > +   wait_queue_head_t poll_wq;
> > >> > +   atomic_t pollin;
> > >> > +
> > >>
> > >
> > >>
> > >> > +/* The maximum exponent the hardware accepts is 63 (essentially it
> > >> > selects one
> > >> > + * of the 64bit timestamp bits to trigger reports from) but there's
> > >> > currently
> > >> > + * no known use case for sampling as infrequently as once per 47
> > >> > thousand years.
> > >> > + *
> > >> > + * Since the timestamps included in OA reports are only 32bits it
> seems
> > >> > + * reasonable to limit the OA exponent where it's still possible to
> > >> > account for
> > >> > + * overflow in OA report timestamps.
> > >> > + */
> > >> > +#define OA_EXPONENT_MAX 31
> > >> > +
> > >> > +#define INVALID_CTX_ID 0x
> > >> We shouldn't need this anymore.
> > >
> > >
> > > yeah I removed it and then added it back, just for the sake of
> explicitly
> > > setting the specific_ctx_id to an invalid ID when closing the exclusive
> > > stream - though resetting the value isn't strictly necessary.
> > Can we not make the specific_ctx_id per-stream, the gem context
> > already is, then we don't need to be concerned with resetting it ?
>
> Hmm, I'm not sure about that, conceptually to me it's global OA unit state.
>
> Currently the driver only supports a single exclusive stream, while Sourab
> later relaxes that to a per-engine stream and that could be relaxed further
> with non-oa metric stream types.
>
> With multiple streams we'll still only be able to program a single ctx
> id in OACONTROL.
>
> Conceptually to me, other stream types could be associated with different
> contexts (if they don't depend on the OA unit) so to me stream->ctx isn't
> necessarily OA unit state.
>
> It probably could be played around with, but right now we don't track OA
> specific state in the stream. For the ID it's just semantics to say it's OA
> state, and we could consider that it's maybe generally useful to track the
> ID, even for future non-oa streams. That might mean potentially redundantly
> pinning state for the sake of tracking the ID for streams that don't end up
> needing it.
>

I started to try out moving the specific_ctx_id and vma pointer (new) to
the stream, and also looked at initializing them together with the
stream->ctx reference, but I'm not really happy with how it's looking.

The specific_ctx_id and pinning are only for the render context, since the
OA unit is only well integrated with the render engine, which makes me more
inclined to consider them OA stream specific, not something we want/need
for all streams (considering that Sourab enables multiple streams in his
series).

Btw, for reference, my patches for gen8+ can also end up making use of the
INVALID_CTX_ID define (when overwriting the undefined ctx_id field in HW
reports when the report's ctx-id is flagged as invalid by the OA unit.) so
we maybe don't want to worry too much about removing the need for it here.

- Robert


[Intel-gfx] [PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-26 Thread Robert Bragg
On 26 Oct 2016 5:54 p.m., "Ville Syrjälä" 
wrote:
>
> On Wed, Oct 26, 2016 at 05:42:23PM +0100, Robert Bragg wrote:
> > On Wed, Oct 26, 2016 at 4:37 PM, Ville Syrjälä <
> > ville.syrjala at linux.intel.com> wrote:
> >
> > > On Wed, Oct 26, 2016 at 04:17:45PM +0100, Robert Bragg wrote:
> > > > On 26 Oct 2016 9:54 a.m., "Chris Wilson" 
> > > wrote:
> > > > >
> > > > > On Wed, Oct 26, 2016 at 12:51:58AM +0100, Robert Bragg wrote:
> > > > > >On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld
> > > > > ><[1]matthew.william.auld at gmail.com> wrote:
> > > > > >
> > > > > >  On 25 October 2016 at 00:19, Robert Bragg <[2]
> > > robert at sixbynine.org>
> > > > > >  wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > >  > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > > > > >  b/drivers/gpu/drm/i915/i915_drv.h
> > > > > >  > index 3448d05..ea24814 100644
> > > > > >  > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > > >  > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > > >  > @@ -1764,6 +1764,11 @@ struct intel_wm_config {
> > > > > >
> > > > > >  >
> > > > > >  >  struct drm_i915_private {
> > > > > >  > @@ -2149,16 +2164,46 @@ struct drm_i915_private {
> > > > > >  >
> > > > > >  > struct {
> > > > > >  > bool initialized;
> > > > > >  > +
> > > > > >  > struct mutex lock;
> > > > > >  > struct list_head streams;
> > > > > >  >
> > > > > >  > +   spinlock_t hook_lock;
> > > > > >  > +
> > > > > >  > struct {
> > > > > >  > -   u32 metrics_set;
> > > > > >  > +   struct i915_perf_stream
> > > > *exclusive_stream;
> > >
> > > OT:
> > > What kind of MUA are you using that mangles quoted mails like this?
I've
> > > not seen it on intel-gfx before. mesa-dev seems rife with it, but as I
> > > rarely read that in any great detail I've managed to ignore it there.
> > > Anyways, it makes it espesially hard to navigate long mails since
mutt's
> > > 'S' (skip quoted text) no longer works correctly.
> > >
> >
> > Not sure I want to say, and get booted out the door :-)
> >
> > I've heard that gmail has an annoying habit of forcibly wrapping plain
text
> > emails like this, and a lot of people have complained that there's no
way
> > to disable that 'feature' :-/
> >
> > I used to use Mutt, but I don't think I could really bare to go back to
it
> > any more. Last time I was using it I found myself spending too much time
> > patching it to try and make it work how I'd like, but can't say I got
much
> > enjoyment from that process.
>
> Isn't gmail just a pile of client side javascript or something? Maybe
> you'd enjoy patching that one more? ;)
>
> >
> > I've tried most MUA options available, and can't say any of them make me
> > very happy - I think these days it's just not something developers are
very
> > interesting in working on.
> >
> > I'm a sell out and just use Gmail... sorry. I can't really see myself
> > changing, though I do wish Google weren't so pedantic about forcing
> > wrapping without any option to change that behaviour. I suspect you
> > wouldn't be happy with me sending html emails, which has been Google's
> > default response to this complaint afik.
> >
> > Maybe it's gmail users causing trouble on the Mesa list too.
> >
> > - Robert
> >
> > P.S please don't think lesser of me due to my misguided MUA choices.
>
> I think I'll just reserve the right to ignore any mail with bad quoting.

Okay, fwiw, at least my patches sent out via git send-email should be fine,
so maybe just ignore my replies to feedback - which I promise not to
exploit to achieve 'consensus' through silence.

- Robert

--
Sent from Gmail on Android, in a spare moment at a VR for Immersive Theatre
meet up.

>
> --
> Ville Syrjälä
> Intel OTC


[Intel-gfx] [PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-26 Thread Robert Bragg
On Wed, Oct 26, 2016 at 4:37 PM, Ville Syrjälä <
ville.syrjala at linux.intel.com> wrote:

> On Wed, Oct 26, 2016 at 04:17:45PM +0100, Robert Bragg wrote:
> > On 26 Oct 2016 9:54 a.m., "Chris Wilson" 
> wrote:
> > >
> > > On Wed, Oct 26, 2016 at 12:51:58AM +0100, Robert Bragg wrote:
> > > >On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld
> > > ><[1]matthew.william.auld at gmail.com> wrote:
> > > >
> > > >  On 25 October 2016 at 00:19, Robert Bragg <[2]
> robert at sixbynine.org>
> > > >  wrote:
> > > >
> > > >
> > > >
> > > >  > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > > >  b/drivers/gpu/drm/i915/i915_drv.h
> > > >  > index 3448d05..ea24814 100644
> > > >  > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > >  > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > >  > @@ -1764,6 +1764,11 @@ struct intel_wm_config {
> > > >
> > > >  >
> > > >  >  struct drm_i915_private {
> > > >  > @@ -2149,16 +2164,46 @@ struct drm_i915_private {
> > > >  >
> > > >  > struct {
> > > >  > bool initialized;
> > > >  > +
> > > >  > struct mutex lock;
> > > >  > struct list_head streams;
> > > >  >
> > > >  > +   spinlock_t hook_lock;
> > > >  > +
> > > >  > struct {
> > > >  > -   u32 metrics_set;
> > > >  > +   struct i915_perf_stream
> > *exclusive_stream;
>
> OT:
> What kind of MUA are you using that mangles quoted mails like this? I've
> not seen it on intel-gfx before. mesa-dev seems rife with it, but as I
> rarely read that in any great detail I've managed to ignore it there.
> Anyways, it makes it especially hard to navigate long mails since mutt's
> 'S' (skip quoted text) no longer works correctly.
>

Not sure I want to say, and get booted out the door :-)

I've heard that gmail has an annoying habit of forcibly wrapping plain text
emails like this, and a lot of people have complained that there's no way
to disable that 'feature' :-/

I used to use Mutt, but I don't think I could really bear to go back to it
any more. Last time I was using it I found myself spending too much time
patching it to try and make it work how I'd like, but can't say I got much
enjoyment from that process.

I've tried most MUA options available, and can't say any of them make me
very happy - I think these days it's just not something developers are very
interested in working on.

I'm a sell out and just use Gmail... sorry. I can't really see myself
changing, though I do wish Google weren't so pedantic about forcing
wrapping without any option to change that behaviour. I suspect you
wouldn't be happy with me sending html emails, which has been Google's
default response to this complaint afaik.

Maybe it's gmail users causing trouble on the Mesa list too.

- Robert

P.S please don't think lesser of me due to my misguided MUA choices.



>
> > > >  > +
> > > >  > +   u32 specific_ctx_id;
> > > >  Can we just get rid of this, now that the vma remains pinned we
> can
> > > >  simply get the ggtt address at the time of configuring the
> > OA_CONTROL
> > > >  register ?
> > > >
> > > >I considered that, but would ideally prefer to keep it considering
> > the
> > > >gen8+ patches to come. For gen8+ (with execlists) the context ID
> > isn't a
> > > >gtt offset.
> > >
> > > In terms of symmetry, keeping the vma you pinned and unpinning the same
> > > later makes its ownership much clearer. (And I do want the owner of
> each
> > > pin to be clear, for when we start enabling debug to catch the VMA
> > > leaks.)
> >
> > Keeping our own pointer to the pinned vma could be a clarification.
> >
> > Considering Matt's comments too, I'm thinking I'll put the pinning and
> > specific_ctx_id initialization together with setting stream->ctx, keeping
> > the state together under the stream. It's going to potentially mean
> > redundantly pinning the ctx for the sake of the ID in the future for
> > streams that don't really need it, but I think it's probably not worth
> > worrying about that.
> >
> > - Robert
> >
> > > -Chris
> > >
> > > --
> > > Chris Wilson, Intel Open Source Technology Centre
>
> > ___
> > Intel-gfx mailing list
> > Intel-gfx at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
>
> --
> Ville Syrjälä
> Intel OTC
>


[PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-26 Thread Robert Bragg
On 26 Oct 2016 9:54 a.m., "Chris Wilson"  wrote:
>
> On Wed, Oct 26, 2016 at 12:51:58AM +0100, Robert Bragg wrote:
> >On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld
> ><[1]matthew.william.auld at gmail.com> wrote:
> >
> >  On 25 October 2016 at 00:19, Robert Bragg <[2]robert at sixbynine.org>
> >  wrote:
> >
> >
> >
> >  > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> >  b/drivers/gpu/drm/i915/i915_drv.h
> >  > index 3448d05..ea24814 100644
> >  > --- a/drivers/gpu/drm/i915/i915_drv.h
> >  > +++ b/drivers/gpu/drm/i915/i915_drv.h
> >  > @@ -1764,6 +1764,11 @@ struct intel_wm_config {
> >
> >  >
> >  >  struct drm_i915_private {
> >  > @@ -2149,16 +2164,46 @@ struct drm_i915_private {
> >  >
> >  > struct {
> >  > bool initialized;
> >  > +
> >  > struct mutex lock;
> >  > struct list_head streams;
> >  >
> >  > +   spinlock_t hook_lock;
> >  > +
> >  > struct {
> >  > -   u32 metrics_set;
> >  > +   struct i915_perf_stream
*exclusive_stream;
> >  > +
> >  > +   u32 specific_ctx_id;
> >  Can we just get rid of this, now that the vma remains pinned we can
> >  simply get the ggtt address at the time of configuring the
OA_CONTROL
> >  register ?
> >
> >I considered that, but would ideally prefer to keep it considering
the
> >gen8+ patches to come. For gen8+ (with execlists) the context ID
isn't a
> >gtt offset.
>
> In terms of symmetry, keeping the vma you pinned and unpinning the same
> later makes its ownership much clearer. (And I do want the owner of each
> pin to be clear, for when we start enabling debug to catch the VMA
> leaks.)

Keeping our own pointer to the pinned vma could be a clarification.

Considering Matt's comments too, I'm thinking I'll put the pinning and
specific_ctx_id initialization together with setting stream->ctx, keeping
the state together under the stream. It's going to potentially mean
redundantly pinning the ctx for the sake of the ID in the future for
streams that don't really need it, but I think it's probably not worth
worrying about that.

- Robert

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre


[PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-26 Thread Robert Bragg
On 26 Oct 2016 11:08 a.m., "Matthew Auld" 
wrote:
>
> On 26 October 2016 at 00:51, Robert Bragg  wrote:
> >
> >
> > On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld
> >  wrote:
> >>
> >> On 25 October 2016 at 00:19, Robert Bragg  wrote:
> >
> >
> >>
> >>
> >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> >> > b/drivers/gpu/drm/i915/i915_drv.h
> >> > index 3448d05..ea24814 100644
> >> > --- a/drivers/gpu/drm/i915/i915_drv.h
> >> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> >> > @@ -1764,6 +1764,11 @@ struct intel_wm_config {
> >>
> >> >
> >> >  struct drm_i915_private {
> >> > @@ -2149,16 +2164,46 @@ struct drm_i915_private {
> >> >
> >> > struct {
> >> > bool initialized;
> >> > +
> >> > struct mutex lock;
> >> > struct list_head streams;
> >> >
> >> > +   spinlock_t hook_lock;
> >> > +
> >> > struct {
> >> > -   u32 metrics_set;
> >> > +   struct i915_perf_stream *exclusive_stream;
> >> > +
> >> > +   u32 specific_ctx_id;
> >> Can we just get rid of this, now that the vma remains pinned we can
> >> simply get the ggtt address at the time of configuring the OA_CONTROL
> >> register ?
> >
> >
> > I considered that, but would ideally prefer to keep it considering the
gen8+
> > patches to come. For gen8+ (with execlists) the context ID isn't a gtt
> > offset.
> >
> >>
> >>
> >> > +
> >> > +   struct hrtimer poll_check_timer;
> >> > +   wait_queue_head_t poll_wq;
> >> > +   atomic_t pollin;
> >> > +
> >>
> >
> >>
> >> > +/* The maximum exponent the hardware accepts is 63 (essentially it
> >> > selects one
> >> > + * of the 64bit timestamp bits to trigger reports from) but there's
> >> > currently
> >> > + * no known use case for sampling as infrequently as once per 47
> >> > thousand years.
> >> > + *
> >> > + * Since the timestamps included in OA reports are only 32bits it
seems
> >> > + * reasonable to limit the OA exponent where it's still possible to
> >> > account for
> >> > + * overflow in OA report timestamps.
> >> > + */
> >> > +#define OA_EXPONENT_MAX 31
> >> > +
> >> > +#define INVALID_CTX_ID 0x
> >> We shouldn't need this anymore.
> >
> >
> > yeah I removed it and then added it back, just for the sake of
explicitly
> > setting the specific_ctx_id to an invalid ID when closing the exclusive
> > stream - though resetting the value isn't strictly necessary.
> Can we not make the specific_ctx_id per-stream, the gem context
> already is, then we don't need to be concerned with resetting it ?

Hmm, I'm not sure about that, conceptually to me it's global OA unit state.

Currently the driver only supports a single exclusive stream, while Sourab
later relaxes that to a per-engine stream and that could be relaxed further
with non-oa metric stream types.

With multiple streams we'll still only be able to program a single ctx
id in OACONTROL.

Conceptually to me, other stream types could be associated with different
contexts (if they don't depend on the OA unit) so to me stream->ctx isn't
necessarily OA unit state.

It probably could be played around with, but right now we don't track OA
specific state in the stream. For the ID it's just semantics to say it's OA
state, and we could consider that it's maybe generally useful to track the
ID, even for future non-oa streams. That might mean potentially redundantly
pinning state for the sake of tracking the ID for streams that don't end up
needing it.


[PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-26 Thread Robert Bragg
On Tue, Oct 25, 2016 at 10:35 PM, Matthew Auld <
matthew.william.auld at gmail.com> wrote:

> On 25 October 2016 at 00:19, Robert Bragg  wrote:



>
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> > index 3448d05..ea24814 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1764,6 +1764,11 @@ struct intel_wm_config {
>
> >
> >  struct drm_i915_private {
> > @@ -2149,16 +2164,46 @@ struct drm_i915_private {
> >
> > struct {
> > bool initialized;
> > +
> > struct mutex lock;
> > struct list_head streams;
> >
> > +   spinlock_t hook_lock;
> > +
> > struct {
> > -   u32 metrics_set;
> > +   struct i915_perf_stream *exclusive_stream;
> > +
> > +   u32 specific_ctx_id;
> Can we just get rid of this, now that the vma remains pinned we can
> simply get the ggtt address at the time of configuring the OA_CONTROL
> register ?
>

I considered that, but would ideally prefer to keep it considering the
gen8+ patches to come. For gen8+ (with execlists) the context ID isn't a
gtt offset.


>
> > +
> > +   struct hrtimer poll_check_timer;
> > +   wait_queue_head_t poll_wq;
> > +   atomic_t pollin;
> > +
>
>

> > +/* The maximum exponent the hardware accepts is 63 (essentially it
> selects one
> > + * of the 64bit timestamp bits to trigger reports from) but there's
> currently
> > + * no known use case for sampling as infrequently as once per 47
> thousand years.
> > + *
> > + * Since the timestamps included in OA reports are only 32bits it seems
> > + * reasonable to limit the OA exponent where it's still possible to
> account for
> > + * overflow in OA report timestamps.
> > + */
> > +#define OA_EXPONENT_MAX 31
> > +
> > +#define INVALID_CTX_ID 0x
> We shouldn't need this anymore.
>

yeah I removed it and then added it back, just for the sake of explicitly
setting the specific_ctx_id to an invalid ID when closing the exclusive
stream - though resetting the value isn't strictly necessary.

also maybe your comment is assuming specific_ctx_id can be removed, while
I'd prefer to keep it.


> > +
> > +static int claim_specific_ctx(struct i915_perf_stream *stream)
> > +{
> pin_oa_specific_ctx, or something? Also would it not make more sense
> to operate on the context, not the stream.
>

Yeah, I avoided a name like that mainly because it's also initializing
specific_ctx_id, which seemed to me like it would become an unexpected side
effect with that more specific name.

The other consideration is that in my gen8+ patches the pinning code is
conditional depending on whether execlists are enabled, while the function
still initializes specific_ctx_id.

Certainly not attached to the names though.

Chris has some feedback with the code, so maybe that will affect this too.


> > +   struct drm_i915_private *dev_priv = stream->dev_priv;
> > +   struct i915_vma *vma;
> > +   int ret;
> > +
> > +   ret = i915_mutex_lock_interruptible(&dev_priv->drm);
> > +   if (ret)
> > +   return ret;
> > +
> > +   /* So that we don't have to worry about updating the context ID
> > +* in OACONTROL on the fly we make sure to pin the context
> > +* upfront for the lifetime of the stream...
> > +*/
> > +   vma = stream->ctx->engine[RCS].state;
> > +   ret = i915_vma_pin(vma, 0, stream->ctx->ggtt_alignment,
> > +  PIN_GLOBAL | PIN_HIGH);
> > +   if (ret) {
> > +   mutex_unlock(&dev_priv->drm.struct_mutex);
> > +   return ret;
> > +   }
> > +
> > +   dev_priv->perf.oa.specific_ctx_id = i915_ggtt_offset(vma);
> > +
> > +   mutex_unlock(&dev_priv->drm.struct_mutex);
> > +
> > +   return 0;
> > +}
>


I'll also follow up on the other notes; thanks!

- Robert


[PATCH v7 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c

2016-10-25 Thread Robert Bragg
In particular this tries to capture for posterity some of the early
challenges we had with using the core perf infrastructure in case we
ever want to revisit adapting perf for device metrics.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_perf.c | 163 +++
 1 file changed, 163 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e46cd36..501d20a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -24,6 +24,169 @@
  *   Robert Bragg 
  */

+
+/**
+ * DOC: i915 Perf, streaming API for GPU metrics
+ *
+ * Gen graphics supports a large number of performance counters that can help
+ * driver and application developers understand and optimize their use of the
+ * GPU.
+ *
+ * This i915 perf interface enables userspace to configure and open a file
+ * descriptor representing a stream of GPU metrics which can then be read() as
+ * a stream of sample records.
+ *
+ * The interface is particularly suited to exposing buffered metrics that are
+ * captured by DMA from the GPU, unsynchronized with and unrelated to the CPU.
+ *
+ * Streams representing a single context are accessible to applications with a
+ * corresponding drm file descriptor, such that OpenGL can use the interface
+ * without special privileges. Access to system-wide metrics requires root
+ * privileges by default, unless changed via the dev.i915.perf_stream_paranoid
+ * sysctl option.
+ *
+ *
+ * The interface was initially inspired by the core Perf infrastructure but
+ * some notable differences are:
+ *
+ * i915 perf file descriptors represent a "stream" instead of an "event"; where
+ * a perf event primarily corresponds to a single 64bit value, while a stream
+ * might sample sets of tightly-coupled counters, depending on the
+ * configuration.  For example the Gen OA unit isn't designed to support
+ * orthogonal configurations of individual counters; it's configured for a set
+ * of related counters. Samples for an i915 perf stream capturing OA metrics
+ * will include a set of counter values packed in a compact HW specific format.
+ * The OA unit supports a number of different packing formats which can be
+ * selected by the user opening the stream. Perf has support for grouping
+ * events, but each event in the group is configured, validated and
+ * authenticated individually with separate system calls.
+ *
+ * i915 perf stream configurations are provided as an array of u64 (key,value)
+ * pairs, instead of a fixed struct with multiple miscellaneous config members,
+ * interleaved with event-type specific members.
+ *
+ * i915 perf doesn't support exposing metrics via an mmap'd circular buffer.
+ * The supported metrics are being written to memory by the GPU unsynchronized
+ * with the CPU, using HW specific packing formats for counter sets. Sometimes
+ * the constraints on HW configuration require reports to be filtered before it
+ * would be acceptable to expose them to unprivileged applications - to hide
+ * the metrics of other processes/contexts. For these use cases a read() based
+ * interface is a good fit, and provides an opportunity to filter data as it
+ * gets copied from the GPU mapped buffers to userspace buffers.
+ *
+ *
+ * Some notes regarding Linux Perf:
+ * 
+ *
+ * The first prototype of this driver was based on the core perf
+ * infrastructure, and while we did make that mostly work, with some changes to
+ * perf, we found we were breaking or working around too many assumptions baked
+ * into perf's currently cpu centric design.
+ *
+ * In the end we didn't see a clear benefit to making perf's implementation and
+ * interface more complex by changing design assumptions while we knew we still
+ * wouldn't be able to use any existing perf based userspace tools.
+ *
+ * Also considering the Gen specific nature of the Observability hardware and
+ * how userspace will sometimes need to combine i915 perf OA metrics with
+ * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're
+ * expecting the interface to be used by a platform specific userspace such as
+ * OpenGL or tools. This is to say; we aren't inherently missing out on having
+ * a standard vendor/architecture agnostic interface by not using perf.
+ *
+ *
+ * For posterity, in case we might re-visit trying to adapt core perf to be
+ * better suited to exposing i915 metrics these were the main pain points we
+ * hit:
+ *
+ * - The perf based OA PMU driver broke some significant design assumptions:
+ *
+ *   Existing perf pmus are used for profiling work on a cpu and we were
+ *   introducing the idea of _IS_DEVICE pmus with different security
+ *   implications, the need to fake cpu-related data (such as user/kernel
+ *   registers) to fit with perf's current design, and addi

[PATCH v7 10/11] drm/i915: Add more Haswell OA metric sets

2016-10-25 Thread Robert Bragg
This adds 'compute', 'compute extended', 'memory reads', 'memory writes'
and 'sampler balance' metric sets for Haswell.

The code is auto generated from an XML description of metric sets,
currently maintained in gputop, ref:

 https://github.com/rib/gputop
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_oa_hsw.c | 559 -
 1 file changed, 558 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 19f272b..cd2a23a 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -31,9 +31,14 @@

 enum metric_set_id {
METRIC_SET_ID_RENDER_BASIC = 1,
+   METRIC_SET_ID_COMPUTE_BASIC,
+   METRIC_SET_ID_COMPUTE_EXTENDED,
+   METRIC_SET_ID_MEMORY_READS,
+   METRIC_SET_ID_MEMORY_WRITES,
+   METRIC_SET_ID_SAMPLER_BALANCE,
 };

-int i915_oa_n_builtin_metric_sets_hsw = 1;
+int i915_oa_n_builtin_metric_sets_hsw = 6;

 static const struct i915_oa_reg b_counter_config_render_basic[] = {
{ _MMIO(0x2724), 0x0080 },
@@ -112,6 +117,298 @@ get_render_basic_mux_config(struct drm_i915_private *dev_priv,
return mux_config_render_basic;
 }

+static const struct i915_oa_reg b_counter_config_compute_basic[] = {
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2718), 0x },
+   { _MMIO(0x271c), 0x },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2724), 0x0080 },
+   { _MMIO(0x2728), 0x },
+   { _MMIO(0x272c), 0x },
+   { _MMIO(0x2740), 0x },
+   { _MMIO(0x2744), 0x },
+   { _MMIO(0x2748), 0x },
+   { _MMIO(0x274c), 0x },
+   { _MMIO(0x2750), 0x },
+   { _MMIO(0x2754), 0x },
+   { _MMIO(0x2758), 0x },
+   { _MMIO(0x275c), 0x },
+   { _MMIO(0x236c), 0x },
+};
+
+static const struct i915_oa_reg mux_config_compute_basic[] = {
+   { _MMIO(0x253a4), 0x },
+   { _MMIO(0x2681c), 0x01f00800 },
+   { _MMIO(0x26820), 0x1000 },
+   { _MMIO(0x2781c), 0x01f00800 },
+   { _MMIO(0x26520), 0x0007 },
+   { _MMIO(0x265a0), 0x0007 },
+   { _MMIO(0x25380), 0x0010 },
+   { _MMIO(0x2538c), 0x0030 },
+   { _MMIO(0x25384), 0xaa8a },
+   { _MMIO(0x25404), 0x },
+   { _MMIO(0x26800), 0x4202 },
+   { _MMIO(0x26808), 0x00605817 },
+   { _MMIO(0x2680c), 0x10001005 },
+   { _MMIO(0x26804), 0x },
+   { _MMIO(0x27800), 0x0102 },
+   { _MMIO(0x27808), 0x0c0701e0 },
+   { _MMIO(0x2780c), 0x000200a0 },
+   { _MMIO(0x27804), 0x },
+   { _MMIO(0x26484), 0x4400 },
+   { _MMIO(0x26704), 0x4400 },
+   { _MMIO(0x26500), 0x0006 },
+   { _MMIO(0x26510), 0x0001 },
+   { _MMIO(0x26504), 0x8800 },
+   { _MMIO(0x26580), 0x0006 },
+   { _MMIO(0x26590), 0x0020 },
+   { _MMIO(0x26584), 0x },
+   { _MMIO(0x26104), 0x5582 },
+   { _MMIO(0x26184), 0xaa86 },
+   { _MMIO(0x25420), 0x08320c83 },
+   { _MMIO(0x25424), 0x06820c83 },
+   { _MMIO(0x2541c), 0x },
+   { _MMIO(0x25428), 0x0c03 },
+};
+
+static const struct i915_oa_reg *
+get_compute_basic_mux_config(struct drm_i915_private *dev_priv,
+int *len)
+{
+   *len = ARRAY_SIZE(mux_config_compute_basic);
+   return mux_config_compute_basic;
+}
+
+static const struct i915_oa_reg b_counter_config_compute_extended[] = {
+   { _MMIO(0x2724), 0xf080 },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2714), 0xf080 },
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2770), 0x0007fe2a },
+   { _MMIO(0x2774), 0xff00 },
+   { _MMIO(0x2778), 0x0007fe6a },
+   { _MMIO(0x277c), 0xff00 },
+   { _MMIO(0x2780), 0x0007fe92 },
+   { _MMIO(0x2784), 0xff00 },
+   { _MMIO(0x2788), 0x0007fea2 },
+   { _MMIO(0x278c), 0xff00 },
+   { _MMIO(0x2790), 0x0007fe32 },
+   { _MMIO(0x2794), 0xff00 },
+   { _MMIO(0x2798), 0x0007fe9a },
+   { _MMIO(0x279c), 0xff00 },
+   { _MMIO(0x27a0), 0x0007ff23 },
+   { _MMIO(0x27a4), 0xff00 },
+   { _MMIO(0x27a8), 0x0007fff3 },
+   { _MMIO(0x27ac), 0xfffe },
+};
+
+static const struct i915_oa_reg mux_config_compute_extended[] = {
+   { _MMIO(0x2681c), 0x3eb00800 },
+   { _MMIO(0x26820), 0x0090 },
+   { _MMIO(0x25384), 0x02aa },
+   { _MMIO(0x25404), 0x03ff },
+   { _MMIO(0x26800), 0x00142284 },
+   { _MMIO(0x26808), 0x0e629062 },
+   { _MMIO(0x2680c), 0x3f6f55cb },
+   { _MMIO(0x26810), 0x0014 },
+  

[PATCH v7 09/11] drm/i915: add oa_event_min_timer_exponent sysctl

2016-10-25 Thread Robert Bragg
The minimal sampling period is now configurable via a
dev.i915.oa_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the minimum exponent
that won't (on its own) exceed the kernel.perf_event_max_sample_rate
default of 100000 samples/s.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_perf.c | 41 
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ab4c171..e46cd36 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -82,6 +82,22 @@ static u32 i915_perf_stream_paranoid = true;
 #define INVALID_CTX_ID 0x


+/* for sysctl proc_dointvec_minmax of i915_oa_min_timer_exponent */
+static int oa_exponent_max = OA_EXPONENT_MAX;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The period is derived from the exponent as:
+ *
+ *   period = 80ns * 2^(exponent + 1)
+ *
+ * Referring to perf's kernel.perf_event_max_sample_rate for a precedent
+ * (100000 by default); with an OA exponent of 6 we get a period of 10.240
+ * microseconds - just under 100000Hz
+ */
+static u32 i915_oa_min_timer_exponent = 6;
+
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
  * be used as a mask to align the OA tail pointer.
@@ -1317,21 +1333,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
return -EINVAL;
}

-   /* NB: The exponent represents a period as follows:
-*
-*   80ns * 2^(period_exponent + 1)
-*
-* Theoretically we can program the OA unit to sample
+   /* Theoretically we can program the OA unit to sample
 * every 160ns but don't allow that by default unless
 * root.
-*
-* Referring to perf's
-* kernel.perf_event_max_sample_rate for a precedent
-* (100000 by default); with an OA exponent of 6 we get
-* a period of 10.240 microseconds -just under 100000Hz
 */
-   if (value < 6 && !capable(CAP_SYS_ADMIN)) {
-   DRM_ERROR("Sampling period too high without root privileges\n");
+   if (value < i915_oa_min_timer_exponent &&
+   !capable(CAP_SYS_ADMIN)) {
+   DRM_ERROR("OA timer exponent too low without root privileges\n");
return -EACCES;
}

@@ -1439,6 +1447,15 @@ static struct ctl_table oa_table[] = {
 .extra1 = &zero,
 .extra2 = &one,
 },
+   {
+.procname = "oa_min_timer_exponent",
+.data = &i915_oa_min_timer_exponent,
+.maxlen = sizeof(i915_oa_min_timer_exponent),
+.mode = 0644,
+.proc_handler = proc_dointvec_minmax,
+.extra1 = &zero,
+.extra2 = &oa_exponent_max,
+},
{}
 };

-- 
2.10.1
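
As a sanity check on the numbers quoted in the comment above, here is a minimal user-space sketch of the exponent-to-period formula, period = 80ns * 2^(exponent + 1). It is not part of the patch; the helper names oa_exponent_to_period_ns() and oa_period_ns_to_hz() are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): sampling period in
 * nanoseconds for an OA timer exponent, using the formula from the
 * patch comment: 80ns * 2^(exponent + 1).
 */
static uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
	return 80ULL << (exponent + 1);
}

/* Hypothetical helper: whole samples per second for a given period,
 * rounded down.
 */
static uint64_t oa_period_ns_to_hz(uint64_t period_ns)
{
	return 1000000000ULL / period_ns;
}
```

With these, an exponent of 0 gives the 160ns theoretical minimum period, and the default exponent of 6 gives 10240ns, i.e. roughly 97.6kHz.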



[PATCH v7 08/11] drm/i915: Add dev.i915.perf_stream_paranoid sysctl option

2016-10-25 Thread Robert Bragg
Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 50 +++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a968212..7010c6e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2166,6 +2166,7 @@ struct drm_i915_private {
bool initialized;

struct kobject *metrics_kobj;
+   struct ctl_table_header *sysctl_header;

struct mutex lock;
struct list_head streams;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index aedefbc..ab4c171 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -64,6 +64,11 @@
 #define POLL_FREQUENCY 200
 #define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY)

+/* for sysctl proc_dointvec_minmax of dev.i915.perf_stream_paranoid */
+static int zero;
+static int one = 1;
+static u32 i915_perf_stream_paranoid = true;
+
 /* The maximum exponent the hardware accepts is 63 (essentially it selects one
  * of the 64bit timestamp bits to trigger reports from) but there's currently
  * no known use case for sampling as infrequently as once per 47 thousand years.
@@ -1174,7 +1179,13 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
}
}

-   if (!specific_ctx && !capable(CAP_SYS_ADMIN)) {
+   /* Similar to perf's kernel.perf_event_paranoid sysctl option
+* we check a dev.i915.perf_stream_paranoid sysctl option
+* to determine if it's ok to access system wide OA counters
+* without CAP_SYS_ADMIN privileges.
+*/
+   if (!specific_ctx &&
+   i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n");
ret = -EACCES;
goto err_ctx;
@@ -1418,6 +1429,39 @@ void i915_perf_unregister(struct drm_i915_private *dev_priv)
dev_priv->perf.metrics_kobj = NULL;
 }

+static struct ctl_table oa_table[] = {
+   {
+.procname = "perf_stream_paranoid",
+.data = &i915_perf_stream_paranoid,
+.maxlen = sizeof(i915_perf_stream_paranoid),
+.mode = 0644,
+.proc_handler = proc_dointvec_minmax,
+.extra1 = &zero,
+.extra2 = &one,
+},
+   {}
+};
+
+static struct ctl_table i915_root[] = {
+   {
+.procname = "i915",
+.maxlen = 0,
+.mode = 0555,
+.child = oa_table,
+},
+   {}
+};
+
+static struct ctl_table dev_root[] = {
+   {
+.procname = "dev",
+.maxlen = 0,
+.mode = 0555,
+.child = i915_root,
+},
+   {}
+};
+
 void i915_perf_init(struct drm_i915_private *dev_priv)
 {
if (!IS_HASWELL(dev_priv))
@@ -1448,6 +1492,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
dev_priv->perf.oa.n_builtin_sets =
i915_oa_n_builtin_metric_sets_hsw;

+   dev_priv->perf.sysctl_header = register_sysctl_table(dev_root);
+
dev_priv->perf.initialized = true;
 }

@@ -1456,6 +1502,8 @@ void i915_perf_fini(struct drm_i915_private *dev_priv)
if (!dev_priv->perf.initialized)
return;

+   unregister_sysctl_table(dev_priv->perf.sysctl_header);
+
memset(&dev_priv->perf.oa.ops, 0, sizeof(dev_priv->perf.oa.ops));
dev_priv->perf.initialized = false;
 }
-- 
2.10.1
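
To make the resulting policy easier to review, here is a small user-space model of the privilege check that this patch adds to i915_perf_open_ioctl_locked(). It is only a sketch: check_system_wide_open() and its boolean parameters are hypothetical stand-ins for specific_ctx, the sysctl value and capable(CAP_SYS_ADMIN).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the check (not the driver code itself):
 * a system-wide stream (no specific context) is refused only when
 * the paranoid sysctl is set and the caller lacks CAP_SYS_ADMIN.
 */
static int check_system_wide_open(bool specific_ctx, bool paranoid,
				  bool has_cap_sys_admin)
{
	if (!specific_ctx && paranoid && !has_cap_sys_admin)
		return -EACCES;

	return 0;
}
```

Given the dev -> i915 -> perf_stream_paranoid table nesting, the knob should show up as /proc/sys/dev/i915/perf_stream_paranoid, and writing 0 there corresponds to paranoid == false in the model.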



[PATCH v7 07/11] drm/i915: advertise available metrics via sysfs

2016-10-25 Thread Robert Bragg
Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to look up corresponding counter metadata and
normalization equations.
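
A rough sketch of how userspace might consume these entries, assuming the layout described above. Both helpers (metric_id_path() and read_metric_set_id()) are hypothetical and not part of any existing library; error handling is kept minimal.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build the sysfs path of a metric set's 'id'
 * file. Returns 0 on success, -1 if the buffer is too small.
 */
static int metric_id_path(char *buf, size_t len,
			  const char *card, const char *guid)
{
	int n = snprintf(buf, len, "/sys/class/drm/%s/metrics/%s/id",
			 card, guid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: read the unsigned integer stored in an 'id'
 * file. Returns 0 on success, -1 if the file is missing or malformed.
 */
static int read_metric_set_id(const char *path, unsigned long *id)
{
	char line[32];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	*id = strtoul(line, &end, 10);
	if (errno || end == line)
		return -1;

	return 0;
}
```

The value read this way is what would be passed as the metrics set id when opening a stream; a missing directory simply means that metric set isn't available on the running system.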

The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:

 https://github.com/rib/gputop
 > gputop-data/guids.xml
 > scripts/update-guids.py
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_drv.c|  5 
 drivers/gpu/drm/i915/i915_drv.h|  4 +++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 51 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4 +++
 drivers/gpu/drm/i915/i915_perf.c   | 52 ++
 5 files changed, 116 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index e99d14e..b887051 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1115,6 +1115,9 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
if (drm_dev_register(dev, 0) == 0) {
i915_debugfs_register(dev_priv);
i915_setup_sysfs(dev_priv);
+
+   /* Depends on sysfs having been initialized */
+   i915_perf_register(dev_priv);
} else
DRM_ERROR("Failed to register driver for userspace access!\n");

@@ -1151,6 +1154,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
acpi_video_unregister();
intel_opregion_unregister(dev_priv);

+   i915_perf_unregister(dev_priv);
+
i915_teardown_sysfs(dev_priv);
i915_debugfs_unregister(dev_priv);
drm_dev_unregister(&dev_priv->drm);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ea24814..a968212 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2165,6 +2165,8 @@ struct drm_i915_private {
struct {
bool initialized;

+   struct kobject *metrics_kobj;
+
struct mutex lock;
struct list_head streams;

@@ -3748,6 +3750,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 /* i915_perf.c */
 extern void i915_perf_init(struct drm_i915_private *dev_priv);
 extern void i915_perf_fini(struct drm_i915_private *dev_priv);
+extern void i915_perf_register(struct drm_i915_private *dev_priv);
+extern void i915_perf_unregister(struct drm_i915_private *dev_priv);

 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 8906380..19f272b 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -24,6 +24,8 @@
  *
  */

+#include 
+
 #include "i915_drv.h"
 #include "i915_oa_hsw.h"

@@ -142,3 +144,52 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv)
return -ENODEV;
}
 }
+
+static ssize_t
+show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC);
+}
+
+static struct device_attribute dev_attr_render_basic_id = {
+   .attr = { .name = "id", .mode = S_IRUGO },
+   .show = show_render_basic_id,
+   .store = NULL,
+};
+
+static struct attribute *attrs_render_basic[] = {
+   &dev_attr_render_basic_id.attr,
+   NULL,
+};
+
+static struct attribute_group group_render_basic = {
+   .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212",
+   .attrs =  attrs_render_basic,
+};
+
+int
+i915_perf_register_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int mux_len;
+   int ret = 0;
+
+   if (get_render_basic_mux_config(dev_priv, &mux_len)) {
+   ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+   if (ret)
+   goto error_render_basic;
+   }
+
+   return 0;
+
+error_render_basic:
+   return ret;
+}
+
+void
+i915_perf_unregister_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int mux_len;
+
+   if (get_render_basic_mux_config(dev_priv, &mux_len))
+   sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+}
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h
index b618a1f..429a22

[PATCH v7 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-25 Thread Robert Bragg
Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.

v2:
   Make sure to initialize ->specific_ctx_id when opening, without
   relying on _pin_notify hook, in case ctx already pinned.
v3:
   Revert back to pinning ctx upfront when opening stream, removing
   need to hook in to pinning and to update OACONTROL on the fly.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
Signed-off-by: Zhenyu Wang 

fix enable hsw
---
 drivers/gpu/drm/i915/i915_drv.h  |   65 ++-
 drivers/gpu/drm/i915/i915_perf.c | 1000 +-
 drivers/gpu/drm/i915/i915_reg.h  |  338 +
 include/uapi/drm/i915_drm.h  |   70 ++-
 4 files changed, 1444 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3448d05..ea24814 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1764,6 +1764,11 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_oa_format {
+   u32 format;
+   int size;
+};
+
 struct i915_oa_reg {
i915_reg_t addr;
u32 value;
@@ -1784,11 +1789,6 @@ struct i915_perf_stream_ops {
 */
void (*disable)(struct i915_perf_stream *stream);

-   /* Return: true if any i915 perf records are ready to read()
-* for this stream.
-*/
-   bool (*can_read)(struct i915_perf_stream *stream);
-
/* Call poll_wait, passing a wait queue that will be woken
 * once there is something ready to read() for the stream
 */
@@ -1798,9 +1798,7 @@ struct i915_perf_stream_ops {

/* For handling a blocking read, wait until there is something
 * to ready to read() for the stream. E.g. wait on the same
-* wait queue that would be passed to poll_wait() until
-* ->can_read() returns true (if its safe to call ->can_read()
-* without the i915 perf lock held).
+* wait queue that would be passed to poll_wait().
 */
int (*wait_unlocked)(struct i915_perf_stream *stream);

@@ -1840,11 +1838,28 @@ struct i915_perf_stream {
struct list_head link;

u32 sample_flags;
+   int sample_size;

struct i915_gem_context *ctx;
bool enabled;

-   struct i915_perf_stream_ops *ops;
+   const struct i915_perf_stream_ops *ops;
+};
+
+struct i915_oa_ops {
+   void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
+   int (*enable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*disable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*oa_enable)(struct drm_i915_private *dev_priv);
+   void (*oa_disable)(struct drm_i915_private *dev_priv);
+   void (*update_oacontrol)(struct drm_i915_private *dev_priv);
+   void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv,
+   u32 ctx_id);
+   int (*read)(struct i915_perf_stream *stream,
+   char __user *buf,
+   size_t count,
+   size_t *offset);
+   bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
 };

 struct drm_i915_private {
@@ -2149,16 +2164,46 @@ struct drm_i915_private {

struct {
bool initialized;
+
struct mutex lock;
struct list_head streams;

+   spinlock_t hook_lock;
+
struct {
-   u32 metrics_set;
+   struct i915_perf_stream *exclusive_stream;
+
+   u32 specific_ctx_id;
+
+   struct hrtimer poll_check_timer;
+   wait_queue_head_t poll_wq;
+   atomic_t pollin;
+
+   bool periodic;
+   int period_exponent;
+   int timestamp_frequency;
+
+   int tail_margin;
+
+   int metrics_set;

const struct i915_oa_reg *mux_regs;
int mux_regs_len;
const struct i915_oa_reg *b_counter_regs;
int b_counter_regs_len;
+
+   struct {
+   struct i915_vma *vma;
+   u8 *vaddr;
+   int format;
+   int format_size;
+   } oa_buffer;
+
+   u32 gen7_latched_oastatus1;
+
+   struct i915_oa_ops ops;
+   const struct i915_oa_format *oa_formats;
+   int n_builtin_sets;
} oa;
} perf;

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 4d51586..d7a4899 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ 

[PATCH v7 05/11] drm/i915: Add 'render basic' Haswell OA unit config

2016-10-25 Thread Robert Bragg
Adds a static OA unit MUX + B Counter configuration for basic render
metrics on Haswell. This is auto-generated from an XML
description of metric sets, currently maintained in gputop, ref:

  https://github.com/rib/gputop
  > gputop-data/oa-*.xml
  > scripts/i915-perf-kernelgen.py

  $ make -C gputop-data -f Makefile.xml SYSFS=0 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/Makefile  |   3 +-
 drivers/gpu/drm/i915/i915_drv.h|  14 
 drivers/gpu/drm/i915/i915_oa_hsw.c | 144 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  34 +
 4 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 8d4e25f..ac0c3ad 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -114,7 +114,8 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
 i915-y += i915_vgpu.o

 # perf code
-i915-y += i915_perf.o
+i915-y += i915_perf.o \
+ i915_oa_hsw.o

 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fcc5958..3448d05 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1764,6 +1764,11 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_oa_reg {
+   i915_reg_t addr;
+   u32 value;
+};
+
 struct i915_perf_stream;

 struct i915_perf_stream_ops {
@@ -2146,6 +2151,15 @@ struct drm_i915_private {
bool initialized;
struct mutex lock;
struct list_head streams;
+
+   struct {
+   u32 metrics_set;
+
+   const struct i915_oa_reg *mux_regs;
+   int mux_regs_len;
+   const struct i915_oa_reg *b_counter_regs;
+   int b_counter_regs_len;
+   } oa;
} perf;

/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
new file mode 100644
index 000..8906380
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -0,0 +1,144 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+#include "i915_oa_hsw.h"
+
+enum metric_set_id {
+   METRIC_SET_ID_RENDER_BASIC = 1,
+};
+
+int i915_oa_n_builtin_metric_sets_hsw = 1;
+
+static const struct i915_oa_reg b_counter_config_render_basic[] = {
+   { _MMIO(0x2724), 0x0080 },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2710), 0x },
+};
+
+static const struct i915_oa_reg mux_config_render_basic[] = {
+   { _MMIO(0x253a4), 0x0160 },
+   { _MMIO(0x25440), 0x0010 },
+   { _MMIO(0x25128), 0x },
+   { _MMIO(0x2691c), 0x0800 },
+   { _MMIO(0x26aa0), 0x0150 },
+   { _MMIO(0x26b9c), 0x6000 },
+   { _MMIO(0x2791c), 0x0800 },
+   { _MMIO(0x27aa0), 0x0150 },
+   { _MMIO(0x27b9c), 0x6000 },
+   { _MMIO(0x2641c), 0x0400 },
+   { _MMIO(0x25380), 0x0010 },
+   { _MMIO(0x2538c), 0x },
+   { _MMIO(0x25384), 0x0800 },
+   { _MMIO(0x25400), 0x0004 },
+   { _MMIO(0x2540c), 0x06029000 },
+   { _MMIO(0x25410), 0x0002 },
+   { _MMIO(0x25404), 0x5c30 },
+   { _MMIO(0x25100), 0x0016 },
+   { _MMIO(0x25110), 0x0400 },
+   { _MMIO(0x25104), 0x },
+   { _MMIO(0x26804), 0x1211 },
+   { _MMIO(0

[PATCH v7 04/11] drm/i915: don't whitelist oacontrol in cmd parser

2016-10-25 Thread Robert Bragg
Being able to program OACONTROL from a non-privileged batch buffer is
not sufficient to be able to configure the OA unit. This was originally
allowed to help enable Mesa to expose OA counters via the
INTEL_performance_query extension, but the current implementation based
on programming OACONTROL via a batch buffer isn't able to report usable
data without a more complete OA unit configuration. Mesa handles the
possibility that writes to OACONTROL may not be allowed and so only
advertises the extension after explicitly testing that a write to
OACONTROL succeeds. Based on this, removing OACONTROL from the whitelist
should be ok for userspace.

Removing this simplifies adding a new kernel api for configuring the OA
unit without needing to consider the possibility that userspace might
trample on OACONTROL state which we'd like to start managing within
the kernel instead. In particular, running any Mesa-based GL application
currently results in clearing OACONTROL when initializing, which would
disable the capturing of metrics.

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 38 ++
 1 file changed, 2 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index c45dd83..5152d6f 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -450,7 +450,6 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
REG64(PS_INVOCATION_COUNT),
REG64(PS_DEPTH_COUNT),
REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-   REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
REG64(MI_PREDICATE_SRC0),
REG64(MI_PREDICATE_SRC1),
REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1060,8 +1059,7 @@ bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine)
 static bool check_cmd(const struct intel_engine_cs *engine,
  const struct drm_i915_cmd_descriptor *desc,
  const u32 *cmd, u32 length,
- const bool is_master,
- bool *oacontrol_set)
+ const bool is_master)
 {
if (desc->flags & CMD_DESC_SKIP)
return true;
@@ -1099,31 +1097,6 @@ static bool check_cmd(const struct intel_engine_cs *engine,
}

/*
-* OACONTROL requires some special handling for
-* writes. We want to make sure that any batch which
-* enables OA also disables it before the end of the
-* batch. The goal is to prevent one process from
-* snooping on the perf data from another process. To do
-* that, we need to check the value that will be written
-* to the register. Hence, limit OACONTROL writes to
-* only MI_LOAD_REGISTER_IMM commands.
-*/
-   if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
-   if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_REG) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRR to OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1))
-   *oacontrol_set = (cmd[offset + 1] != 0);
-   }
-
-   /*
 * Check the value written to the register against the
 * allowed mask/value pair given in the whitelist entry.
 */
@@ -1214,7 +1187,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
u32 *cmd, *batch_end;
struct drm_i915_cmd_descriptor default_desc = noop_desc;
const struct drm_i915_cmd_descriptor *desc = &default_desc;
-   bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
bool needs_clflush_after = false;
int ret = 0;

@@ -1270,8 +1242,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
break;
}

-   if (!check_cmd(engine, desc, cmd, length, is_master,
-  &oacontrol_set)) {
+   if (!check_cmd(engine, desc, cmd, length, is_master)) {
ret = -EACCES;
break;
}
@@ -1279,11 +1250,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine

[PATCH v7 03/11] drm/i915: return EACCES for check_cmd() failures

2016-10-25 Thread Robert Bragg
check_cmd() is checking whether a command adheres to certain
restrictions that ensure it's safe to execute within a privileged batch
buffer. Returning false implies a privilege problem, not that the
command is invalid.

The distinction makes the difference between allowing the buffer to be
executed as an unprivileged batch buffer or returning an EINVAL error to
userspace without executing anything.

In a case where userspace may want to test whether it can successfully
write to a register that needs privileges the distinction may be
important and an EINVAL error may be considered fatal.

In particular this is currently true for Mesa, which includes a test for
whether OACONTROL can be written to, but Mesa treats any error when
flushing a batch buffer as fatal, calling exit(1).

As it currently stands, Mesa can gracefully handle a failure to write to
OACONTROL if the command parser is disabled, but if we were to remove
OACONTROL from the parser's whitelist then the returned EINVAL would
break Mesa applications as they attempt an OACONTROL write.

This bumps the command parser version from 7 to 8, as the change is
visible to userspace.

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index fe34470..c45dd83 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1272,7 +1272,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,

if (!check_cmd(engine, desc, cmd, length, is_master,
   &oacontrol_set)) {
-   ret = -EINVAL;
+   ret = -EACCES;
break;
}

@@ -1333,6 +1333,9 @@ int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv)
 * 5. GPGPU dispatch compute indirect registers.
 * 6. TIMESTAMP register and Haswell CS GPR registers
 * 7. Allow MI_LOAD_REGISTER_REG between whitelisted registers.
+* 8. Don't report check_cmd() failures as EINVAL errors to userspace;
+*rely on the HW to NOOP disallowed commands as it would without
+*the parser enabled.
 */
-   return 7;
+   return 8;
 }
-- 
2.10.1
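
To illustrate why the distinction matters to a client, here is a hedged sketch of how userspace could classify the result of a probing batch submission once EACCES is reported for privilege problems. This is not Mesa's actual code; the enum and classify_probe_result() are hypothetical names, and ret is assumed to be 0 or a negative errno value.

```c
#include <errno.h>

/* Hypothetical classification of a probe submission's result. */
enum probe_result {
	PROBE_OK,	/* the write was allowed */
	PROBE_DENIED,	/* privilege problem: feature unavailable, carry on */
	PROBE_FATAL,	/* anything else: treat as a real submission error */
};

/* Hypothetical helper: map a return code (0 or negative errno) to a
 * probe result, so that only non-privilege errors are treated as fatal.
 */
static enum probe_result classify_probe_result(int ret)
{
	if (ret == 0)
		return PROBE_OK;
	if (ret == -EACCES)
		return PROBE_DENIED;

	return PROBE_FATAL;
}
```

A client structured like this can skip advertising the feature on PROBE_DENIED rather than exiting, which is the behaviour the version bump lets userspace rely on.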



[PATCH v7 02/11] drm/i915: rename OACONTROL GEN7_OACONTROL

2016-10-25 Thread Robert Bragg
OACONTROL changes quite a bit for gen8, with some bits split out into a
per-context OACTXCONTROL register. Rename now before adding more gen7 OA
registers.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gvt/handlers.c| 2 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
index 3e74fb3..68e07a1 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -2159,7 +2159,7 @@ static int init_generic_mmio_info(struct intel_gvt *gvt)
MMIO_DFH(0x1217c, D_ALL, F_CMD_ACCESS, NULL, NULL);

MMIO_F(0x2290, 8, 0, 0, 0, D_HSW_PLUS, NULL, NULL);
-   MMIO_D(OACONTROL, D_HSW);
+   MMIO_D(GEN7_OACONTROL, D_HSW);
MMIO_D(0x2b00, D_BDW_PLUS);
MMIO_D(0x2360, D_BDW_PLUS);
MMIO_F(0x5200, 32, 0, 0, 0, D_ALL, NULL, NULL);
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index f191d7b..fe34470 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -450,7 +450,7 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
REG64(PS_INVOCATION_COUNT),
REG64(PS_DEPTH_COUNT),
REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-   REG32(OACONTROL), /* Only allowed for LRI and SRM. See below. */
+   REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
REG64(MI_PREDICATE_SRC0),
REG64(MI_PREDICATE_SRC1),
REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1108,7 +1108,7 @@ static bool check_cmd(const struct intel_engine_cs *engine,
 * to the register. Hence, limit OACONTROL writes to
 * only MI_LOAD_REGISTER_IMM commands.
 */
-   if (reg_addr == i915_mmio_reg_offset(OACONTROL)) {
+   if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
return false;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index a9be3f0..070d3297 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -615,7 +615,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define HSW_CS_GPR(n)   _MMIO(0x2600 + (n) * 8)
 #define HSW_CS_GPR_UDW(n)   _MMIO(0x2600 + (n) * 8 + 4)

-#define OACONTROL _MMIO(0x2360)
+#define GEN7_OACONTROL _MMIO(0x2360)

 #define _GEN7_PIPEA_DE_LOAD_SL 0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL 0x71068
-- 
2.10.1



[PATCH v7 01/11] drm/i915: Add i915 perf infrastructure

2016-10-25 Thread Robert Bragg
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for
data.

A stream is opened something like:

  uint64_t properties[] = {
      /* Single context sampling */
      DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

      /* Include OA reports in samples */
      DRM_I915_PERF_PROP_SAMPLE_OA,         true,

      /* OA unit configuration */
      DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
      DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
      DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
  };
  struct drm_i915_perf_open_param param = {
      .flags = I915_PERF_FLAG_FD_CLOEXEC |
               I915_PERF_FLAG_FD_NONBLOCK |
               I915_PERF_FLAG_DISABLED,
      .properties_ptr = (uintptr_t)properties,
      /* each property is a (key, value) pair of uint64_t, i.e. 16 bytes */
      .num_properties = sizeof(properties) / 16,
  };
  int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
determine what's included in every sample.
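
As an illustration of the { type, size } framing, the following sketch walks a buffer of records as returned by read() and counts the sample records. The struct layout and the RECORD_TYPE_SAMPLE value are assumptions made for this example (the authoritative definitions are in include/uapi/drm/i915_drm.h), and count_sample_records() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-in for the common record header; 'size' is taken to
 * be the total record size in bytes, header included.
 */
struct record_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size;
};

/* Assumed numeric value for DRM_I915_PERF_RECORD_SAMPLE. */
#define RECORD_TYPE_SAMPLE 1

/* Hypothetical helper: count sample records in 'len' bytes of data,
 * stopping at the first truncated or malformed record.
 */
static size_t count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0;
	size_t samples = 0;

	while (len - offset >= sizeof(struct record_header)) {
		struct record_header hdr;

		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&hdr, buf + offset, sizeof(hdr));

		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			break;

		if (hdr.type == RECORD_TYPE_SAMPLE)
			samples++;

		/* unknown record types are skipped using their size */
		offset += hdr.size;
	}

	return samples;
}
```

Skipping by size rather than by a per-type length table is what lets a reader tolerate record types it doesn't know about.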

No specific streams are supported yet so any attempt to open a stream
will return an error.

v4:
s/DRM_IORW/DRM_IOW/ - Emil Velikov
v3:
update read() interface to avoid passing state struct - Chris Wilson
fix some rebase fallout, with i915-perf init/deinit
v2:
use i915_gem_context_get() - Chris Wilson

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/Makefile|   3 +
 drivers/gpu/drm/i915/i915_drv.c  |   4 +
 drivers/gpu/drm/i915/i915_drv.h  |  91 
 drivers/gpu/drm/i915/i915_perf.c | 443 +++
 include/uapi/drm/i915_drm.h  |  67 ++
 5 files changed, 608 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 6123400..8d4e25f 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -113,6 +113,9 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
 # virtual gpu code
 i915-y += i915_vgpu.o

+# perf code
+i915-y += i915_perf.o
+
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
 include $(src)/gvt/Makefile
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 99e4e04..e99d14e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -836,6 +836,8 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,

intel_detect_preproduction_hw(dev_priv);

+   i915_perf_init(dev_priv);
+
return 0;

 err_workqueues:
@@ -849,6 +851,7 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
  */
 static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv)
 {
+   i915_perf_fini(dev_priv);
i915_gem_load_cleanup(&dev_priv->drm);
i915_workqueues_cleanup(dev_priv);
 }
@@ -2554,6 +2557,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW),
 };

 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dd3acab..fcc5958 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1764,6 +1764,84 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_perf_stream;
+
+struct i915_perf_stream_ops {
+   /* Enables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_ENABLE or implicitly called when stream is
+* opened without I915_PERF_FLAG_DISABLED.
+*/
+   void (*enable)(struct i915_perf_stream *stream);
+
+   /* Disables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_DISABLE or implicitly called before
+* destroying the stream.
+*/
+   void (*disable)(struct i915_perf_stream *stream);
+
+   /* Return: true if any i915 perf records are ready to read()
+* for this stream.
+*/
+   bool (*can_read)(struct i915_perf_stream *stream);
+
+   /* Call poll_wait, passing a wait queue that will be woken
+* once there is something ready to read() for the stream
+*/
+   void (*poll_wait)(struct i915_pe

[PATCH v7 00/11] Enable i915 perf stream for Haswell OA unit

2016-10-25 Thread Robert Bragg
Rebased on nightly, including recent review updates (CI wasn't happy picking up
the replies updating individual patches).

This also reverts back to pinning the context upfront when opening a stream for
a single context, instead of hooking into pinning and updating OACONTROL on the
fly.

Chris has repeatedly suggested he'd prefer to have the driver work with an
upfront pin, as it used to, instead of with the hook. It was changed last time
based on feedback considering some concern with the shrinker. At least from
inspection it does /seem/ safe to assume a pinned vma will reliably block the
shrinker from freeing ctx pages and the shrinker itself doesn't unpin things.
I'm not fully certain of the interaction with the _gem.c _context_lost() code
path which aims to unpin last_context. At least the code is a little simpler
this way, so maybe if Daniel is happy that his original concern was overly
cautious (or no longer an issue with the latest code), then this change is ok.

- Robert

Robert Bragg (11):
  drm/i915: Add i915 perf infrastructure
  drm/i915: rename OACONTROL GEN7_OACONTROL
  drm/i915: return EACCES for check_cmd() failures
  drm/i915: don't whitelist oacontrol in cmd parser
  drm/i915: Add 'render basic' Haswell OA unit config
  drm/i915: Enable i915 perf stream for Haswell OA unit
  drm/i915: advertise available metrics via sysfs
  drm/i915: Add dev.i915.perf_stream_paranoid sysctl option
  drm/i915: add oa_event_min_timer_exponent sysctl
  drm/i915: Add more Haswell OA metric sets
  drm/i915: Add a kerneldoc summary for i915_perf.c

 drivers/gpu/drm/i915/Makefile  |4 +
 drivers/gpu/drm/i915/gvt/handlers.c|2 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c |   45 +-
 drivers/gpu/drm/i915/i915_drv.c|9 +
 drivers/gpu/drm/i915/i915_drv.h|  155 +++
 drivers/gpu/drm/i915/i915_oa_hsw.c |  752 ++
 drivers/gpu/drm/i915/i915_oa_hsw.h |   38 +
 drivers/gpu/drm/i915/i915_perf.c   | 1689 
 drivers/gpu/drm/i915/i915_reg.h|  340 ++-
 include/uapi/drm/i915_drm.h|  133 +++
 10 files changed, 3127 insertions(+), 40 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

-- 
2.10.1



[PATCH] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-21 Thread Robert Bragg
Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.

v2:
   Make sure to initialize ->specific_ctx_id when opening, without
   relying on _pin_notify hook, in case ctx already pinned.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
Signed-off-by: Zhenyu Wang 
---
 drivers/gpu/drm/i915/i915_drv.h |   70 ++-
 drivers/gpu/drm/i915/i915_gem_context.c |   22 +-
 drivers/gpu/drm/i915/i915_perf.c| 1028 ++-
 drivers/gpu/drm/i915/i915_reg.h |  338 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c |   11 +-
 include/uapi/drm/i915_drm.h |   70 ++-
 6 files changed, 1507 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 28f3f77..b155ab0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1760,6 +1760,11 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_oa_format {
+   u32 format;
+   int size;
+};
+
 struct i915_oa_reg {
i915_reg_t addr;
u32 value;
@@ -1780,11 +1785,6 @@ struct i915_perf_stream_ops {
 */
void (*disable)(struct i915_perf_stream *stream);

-   /* Return: true if any i915 perf records are ready to read()
-* for this stream.
-*/
-   bool (*can_read)(struct i915_perf_stream *stream);
-
/* Call poll_wait, passing a wait queue that will be woken
 * once there is something ready to read() for the stream
 */
@@ -1794,9 +1794,7 @@ struct i915_perf_stream_ops {

/* For handling a blocking read, wait until there is something
 * to ready to read() for the stream. E.g. wait on the same
-* wait queue that would be passed to poll_wait() until
-* ->can_read() returns true (if its safe to call ->can_read()
-* without the i915 perf lock held).
+* wait queue that would be passed to poll_wait().
 */
int (*wait_unlocked)(struct i915_perf_stream *stream);

@@ -1836,11 +1834,28 @@ struct i915_perf_stream {
struct list_head link;

u32 sample_flags;
+   int sample_size;

struct i915_gem_context *ctx;
bool enabled;

-   struct i915_perf_stream_ops *ops;
+   const struct i915_perf_stream_ops *ops;
+};
+
+struct i915_oa_ops {
+   void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
+   int (*enable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*disable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*oa_enable)(struct drm_i915_private *dev_priv);
+   void (*oa_disable)(struct drm_i915_private *dev_priv);
+   void (*update_oacontrol)(struct drm_i915_private *dev_priv);
+   void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv,
+   u32 ctx_id);
+   int (*read)(struct i915_perf_stream *stream,
+   char __user *buf,
+   size_t count,
+   size_t *offset);
+   bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
 };

 struct drm_i915_private {
@@ -2145,16 +2160,46 @@ struct drm_i915_private {

struct {
bool initialized;
+
struct mutex lock;
struct list_head streams;

+   spinlock_t hook_lock;
+
struct {
-   u32 metrics_set;
+   struct i915_perf_stream *exclusive_stream;
+
+   u32 specific_ctx_id;
+
+   struct hrtimer poll_check_timer;
+   wait_queue_head_t poll_wq;
+   atomic_t pollin;
+
+   bool periodic;
+   int period_exponent;
+   int timestamp_frequency;
+
+   int tail_margin;
+
+   int metrics_set;

const struct i915_oa_reg *mux_regs;
int mux_regs_len;
const struct i915_oa_reg *b_counter_regs;
int b_counter_regs_len;
+
+   struct {
+   struct i915_vma *vma;
+   u8 *vaddr;
+   int format;
+   int format_size;
+   } oa_buffer;
+
+   u32 gen7_latched_oastatus1;
+
+   struct i915_oa_ops ops;
+   const struct i915_oa_format *oa_formats;
+   int n_builtin_sets;
} oa;
} perf;

@@ -3525,6 +3570,9 @@ struct drm_i915_gem_object *
 i915_gem_alloc_context_obj(struct drm_device *dev, size_t size);
 struct i915_gem_context *
 i915_gem_context_create_gvt(struct drm_de

[PATCH v6 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-21 Thread Robert Bragg
On Thu, Oct 20, 2016 at 11:10 PM, Chris Wilson 
wrote:

> On Thu, Oct 20, 2016 at 10:19:05PM +0100, Robert Bragg wrote:
> > +int i915_gem_context_pin_legacy_rcs_state(struct drm_i915_private
> *dev_priv,
> > +   struct i915_gem_context *ctx,
> > +   u64 flags)
>
> This is still no.
>
> > +static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
> > +{
> > + struct drm_i915_gem_object *bo;
> > + enum i915_map_type map;
> > + struct i915_vma *vma;
> > + int ret;
> > +
> > + BUG_ON(dev_priv->perf.oa.oa_buffer.obj);
> > +
> > + ret = i915_mutex_lock_interruptible(&dev_priv->drm);
> > + if (ret)
> > + return ret;
> > +
> > + BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE);
> > + BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M);
> > +
> > + bo = i915_gem_object_create(&dev_priv->drm, OA_BUFFER_SIZE);
> > + if (IS_ERR(bo)) {
> > + DRM_ERROR("Failed to allocate OA buffer\n");
> > + ret = PTR_ERR(bo);
> > + goto unlock;
> > + }
> > + dev_priv->perf.oa.oa_buffer.obj = bo;
> > +
> > + ret = i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
> > + if (ret)
> > + goto err_unref;
> > +
> > + /* PreHSW required 512K alignment, HSW requires 16M */
> > + vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, PIN_MAPPABLE);
> > + if (IS_ERR(vma)) {
> > + ret = PTR_ERR(vma);
> > + goto err_unref;
> > + }
> > + dev_priv->perf.oa.oa_buffer.vma = vma;
> > +
> > + map = HAS_LLC(dev_priv) ? I915_MAP_WB : I915_MAP_WC;
>
> You set the hw up to do coherent writes into the CPU cache, and then you
> request WC access to the pages? With set_cache_level(LLC) you can use
> MAP_WB on both llc and snoop based architectures. Fortunately this is
> only HSW!
>

hmm, yeah it looks like I unwittingly added this recently as part of a
rebase, I think from lazily copying some similar code from
intel_ringbuffer.c when I hit a conflict, without thinking more carefully,
sorry.


>
> > + dev_priv->perf.oa.oa_buffer.gtt_offset = i915_ggtt_offset(vma);
>
> I haven't spotted the advantage of storing both the ggtt_offset in
> addition to the vma (or the bo as well as the vma).
>

right, it looks like this can be cleaned up.


>
> > + dev_priv->perf.oa.oa_buffer.addr = i915_gem_object_pin_map(bo,
> map);
> > + if (IS_ERR(dev_priv->perf.oa.oa_buffer.addr)) {
> > + ret = PTR_ERR(dev_priv->perf.oa.oa_buffer.addr);
> > + goto err_unpin;
> > + }
>
> --
> Chris Wilson, Intel Open Source Technology Centre
>

Thanks,
- Robert


[PATCH v6 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-21 Thread Robert Bragg
On Thu, Oct 20, 2016 at 11:10 PM, Chris Wilson 
wrote:

> On Thu, Oct 20, 2016 at 10:19:05PM +0100, Robert Bragg wrote:
> > +int i915_gem_context_pin_legacy_rcs_state(struct drm_i915_private
> *dev_priv,
> > +   struct i915_gem_context *ctx,
> > +   u64 flags)
>
> This is still no.
>

Okay, but it's a little frustrating for me to go in circles here :-/

I didn't originally do it this way; I originally looked at pinning the
context when opening the stream so I didn't have to consider it being
relocated. The feedback from Daniel Vetter was to look at doing it this way,
I think because of some concern about shrinker corner cases.

... just dug up the archive:
https://lists.freedesktop.org/archives/intel-gfx/2014-November/055385.html

Can you maybe please explain what's wrong with the current approach and
provide some justification for a different approach with some reassurance
that Daniel's original concern with the shrinker unpinning contexts isn't
actually a problem? I don't currently understand the concern with this, and
this approach seems to have been working well for quite a long time now.

- Robert


[PATCH v6 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c

2016-10-20 Thread Robert Bragg
In particular this tries to capture for posterity some of the early
challenges we had with using the core perf infrastructure in case we
ever want to revisit adapting perf for device metrics.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_perf.c | 163 +++
 1 file changed, 163 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 4e985dd..1e29655 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -24,6 +24,169 @@
  *   Robert Bragg 
  */

+
+/**
+ * DOC: i915 Perf, streaming API for GPU metrics
+ *
+ * Gen graphics supports a large number of performance counters that can help
+ * driver and application developers understand and optimize their use of the
+ * GPU.
+ *
+ * This i915 perf interface enables userspace to configure and open a file
+ * descriptor representing a stream of GPU metrics which can then be read() as
+ * a stream of sample records.
+ *
+ * The interface is particularly suited to exposing buffered metrics that are
+ * captured by DMA from the GPU, unsynchronized with and unrelated to the CPU.
+ *
+ * Streams representing a single context are accessible to applications with a
+ * corresponding drm file descriptor, such that OpenGL can use the interface
+ * without special privileges. Access to system-wide metrics requires root
+ * privileges by default, unless changed via the dev.i915.perf_stream_paranoid
+ * sysctl option.
+ *
+ *
+ * The interface was initially inspired by the core Perf infrastructure but
+ * some notable differences are:
+ *
+ * i915 perf file descriptors represent a "stream" instead of an "event"; where
+ * a perf event primarily corresponds to a single 64bit value, while a stream
+ * might sample sets of tightly-coupled counters, depending on the
+ * configuration.  For example the Gen OA unit isn't designed to support
+ * orthogonal configurations of individual counters; it's configured for a set
+ * of related counters. Samples for an i915 perf stream capturing OA metrics
+ * will include a set of counter values packed in a compact HW specific format.
+ * The OA unit supports a number of different packing formats which can be
+ * selected by the user opening the stream. Perf has support for grouping
+ * events, but each event in the group is configured, validated and
+ * authenticated individually with separate system calls.
+ *
+ * i915 perf stream configurations are provided as an array of u64 (key,value)
+ * pairs, instead of a fixed struct with multiple miscellaneous config members,
+ * interleaved with event-type specific members.
+ *
+ * i915 perf doesn't support exposing metrics via an mmap'd circular buffer.
+ * The supported metrics are being written to memory by the GPU unsynchronized
+ * with the CPU, using HW specific packing formats for counter sets. Sometimes
+ * the constraints on HW configuration require reports to be filtered before it
+ * would be acceptable to expose them to unprivileged applications - to hide
+ * the metrics of other processes/contexts. For these use cases a read() based
+ * interface is a good fit, and provides an opportunity to filter data as it
+ * gets copied from the GPU mapped buffers to userspace buffers.
+ *
+ *
+ * Some notes regarding Linux Perf:
+ * 
+ *
+ * The first prototype of this driver was based on the core perf
+ * infrastructure, and while we did make that mostly work, with some changes to
+ * perf, we found we were breaking or working around too many assumptions baked
+ * into perf's currently cpu centric design.
+ *
+ * In the end we didn't see a clear benefit to making perf's implementation and
+ * interface more complex by changing design assumptions while we knew we still
+ * wouldn't be able to use any existing perf based userspace tools.
+ *
+ * Also considering the Gen specific nature of the Observability hardware and
+ * how userspace will sometimes need to combine i915 perf OA metrics with
+ * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're
+ * expecting the interface to be used by a platform specific userspace such as
+ * OpenGL or tools. This is to say; we aren't inherently missing out on having
+ * a standard vendor/architecture agnostic interface by not using perf.
+ *
+ *
+ * For posterity, in case we might re-visit trying to adapt core perf to be
+ * better suited to exposing i915 metrics these were the main pain points we
+ * hit:
+ *
+ * - The perf based OA PMU driver broke some significant design assumptions:
+ *
+ *   Existing perf pmus are used for profiling work on a cpu and we were
+ *   introducing the idea of _IS_DEVICE pmus with different security
+ *   implications, the need to fake cpu-related data (such as user/kernel
+ *   registers) to fit with perf's current design, and addi

[PATCH v6 10/11] drm/i915: Add more Haswell OA metric sets

2016-10-20 Thread Robert Bragg
This adds 'compute', 'compute extended', 'memory reads', 'memory writes'
and 'sampler balance' metric sets for Haswell.

The code is auto generated from an XML description of metric sets,
currently maintained in gputop, ref:

 https://github.com/rib/gputop
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_oa_hsw.c | 559 -
 1 file changed, 558 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c 
b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 19f272b..cd2a23a 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -31,9 +31,14 @@

 enum metric_set_id {
METRIC_SET_ID_RENDER_BASIC = 1,
+   METRIC_SET_ID_COMPUTE_BASIC,
+   METRIC_SET_ID_COMPUTE_EXTENDED,
+   METRIC_SET_ID_MEMORY_READS,
+   METRIC_SET_ID_MEMORY_WRITES,
+   METRIC_SET_ID_SAMPLER_BALANCE,
 };

-int i915_oa_n_builtin_metric_sets_hsw = 1;
+int i915_oa_n_builtin_metric_sets_hsw = 6;

 static const struct i915_oa_reg b_counter_config_render_basic[] = {
{ _MMIO(0x2724), 0x0080 },
@@ -112,6 +117,298 @@ get_render_basic_mux_config(struct drm_i915_private 
*dev_priv,
return mux_config_render_basic;
 }

+static const struct i915_oa_reg b_counter_config_compute_basic[] = {
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2718), 0x },
+   { _MMIO(0x271c), 0x },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2724), 0x0080 },
+   { _MMIO(0x2728), 0x },
+   { _MMIO(0x272c), 0x },
+   { _MMIO(0x2740), 0x },
+   { _MMIO(0x2744), 0x },
+   { _MMIO(0x2748), 0x },
+   { _MMIO(0x274c), 0x },
+   { _MMIO(0x2750), 0x },
+   { _MMIO(0x2754), 0x },
+   { _MMIO(0x2758), 0x },
+   { _MMIO(0x275c), 0x },
+   { _MMIO(0x236c), 0x },
+};
+
+static const struct i915_oa_reg mux_config_compute_basic[] = {
+   { _MMIO(0x253a4), 0x },
+   { _MMIO(0x2681c), 0x01f00800 },
+   { _MMIO(0x26820), 0x1000 },
+   { _MMIO(0x2781c), 0x01f00800 },
+   { _MMIO(0x26520), 0x0007 },
+   { _MMIO(0x265a0), 0x0007 },
+   { _MMIO(0x25380), 0x0010 },
+   { _MMIO(0x2538c), 0x0030 },
+   { _MMIO(0x25384), 0xaa8a },
+   { _MMIO(0x25404), 0x },
+   { _MMIO(0x26800), 0x4202 },
+   { _MMIO(0x26808), 0x00605817 },
+   { _MMIO(0x2680c), 0x10001005 },
+   { _MMIO(0x26804), 0x },
+   { _MMIO(0x27800), 0x0102 },
+   { _MMIO(0x27808), 0x0c0701e0 },
+   { _MMIO(0x2780c), 0x000200a0 },
+   { _MMIO(0x27804), 0x },
+   { _MMIO(0x26484), 0x4400 },
+   { _MMIO(0x26704), 0x4400 },
+   { _MMIO(0x26500), 0x0006 },
+   { _MMIO(0x26510), 0x0001 },
+   { _MMIO(0x26504), 0x8800 },
+   { _MMIO(0x26580), 0x0006 },
+   { _MMIO(0x26590), 0x0020 },
+   { _MMIO(0x26584), 0x },
+   { _MMIO(0x26104), 0x5582 },
+   { _MMIO(0x26184), 0xaa86 },
+   { _MMIO(0x25420), 0x08320c83 },
+   { _MMIO(0x25424), 0x06820c83 },
+   { _MMIO(0x2541c), 0x },
+   { _MMIO(0x25428), 0x0c03 },
+};
+
+static const struct i915_oa_reg *
+get_compute_basic_mux_config(struct drm_i915_private *dev_priv,
+int *len)
+{
+   *len = ARRAY_SIZE(mux_config_compute_basic);
+   return mux_config_compute_basic;
+}
+
+static const struct i915_oa_reg b_counter_config_compute_extended[] = {
+   { _MMIO(0x2724), 0xf080 },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2714), 0xf080 },
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2770), 0x0007fe2a },
+   { _MMIO(0x2774), 0xff00 },
+   { _MMIO(0x2778), 0x0007fe6a },
+   { _MMIO(0x277c), 0xff00 },
+   { _MMIO(0x2780), 0x0007fe92 },
+   { _MMIO(0x2784), 0xff00 },
+   { _MMIO(0x2788), 0x0007fea2 },
+   { _MMIO(0x278c), 0xff00 },
+   { _MMIO(0x2790), 0x0007fe32 },
+   { _MMIO(0x2794), 0xff00 },
+   { _MMIO(0x2798), 0x0007fe9a },
+   { _MMIO(0x279c), 0xff00 },
+   { _MMIO(0x27a0), 0x0007ff23 },
+   { _MMIO(0x27a4), 0xff00 },
+   { _MMIO(0x27a8), 0x0007fff3 },
+   { _MMIO(0x27ac), 0xfffe },
+};
+
+static const struct i915_oa_reg mux_config_compute_extended[] = {
+   { _MMIO(0x2681c), 0x3eb00800 },
+   { _MMIO(0x26820), 0x0090 },
+   { _MMIO(0x25384), 0x02aa },
+   { _MMIO(0x25404), 0x03ff },
+   { _MMIO(0x26800), 0x00142284 },
+   { _MMIO(0x26808), 0x0e629062 },
+   { _MMIO(0x2680c), 0x3f6f55cb },
+   { _MMIO(0x26810), 0x0014 },
+  

[PATCH v6 09/11] drm/i915: add oa_event_min_timer_exponent sysctl

2016-10-20 Thread Robert Bragg
The minimum sampling period is now configurable via a
dev.i915.oa_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the smallest exponent
that won't (on its own) exceed the kernel.perf_event_max_sample_rate
default of 100000 samples/s.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_perf.c | 41 
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 1d61731..4e985dd 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -82,6 +82,22 @@ static u32 i915_perf_stream_paranoid = true;
 #define INVALID_CTX_ID 0x


+/* for sysctl proc_dointvec_minmax of i915_oa_min_timer_exponent */
+static int oa_exponent_max = OA_EXPONENT_MAX;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The period is derived from the exponent as:
+ *
+ *   period = 80ns * 2^(exponent + 1)
+ *
+ * Referring to perf's kernel.perf_event_max_sample_rate for a precedent
+ * (100000 by default); with an OA exponent of 6 we get a period of 10.240
+ * microseconds - just under 100000Hz
+ */
+static u32 i915_oa_min_timer_exponent = 6;
+
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
  * be used as a mask to align the OA tail pointer.
@@ -1349,21 +1365,13 @@ static int read_properties_unlocked(struct 
drm_i915_private *dev_priv,
return -EINVAL;
}

-   /* NB: The exponent represents a period as follows:
-*
-*   80ns * 2^(period_exponent + 1)
-*
-* Theoretically we can program the OA unit to sample
+   /* Theoretically we can program the OA unit to sample
 * every 160ns but don't allow that by default unless
 * root.
-*
-* Referring to perf's
-* kernel.perf_event_max_sample_rate for a precedent
-* (100000 by default); with an OA exponent of 6 we get
-* a period of 10.240 microseconds -just under 100000Hz
 */
-   if (value < 6 && !capable(CAP_SYS_ADMIN)) {
-   DRM_ERROR("Sampling period too high without root privileges\n");
+   if (value < i915_oa_min_timer_exponent &&
+   !capable(CAP_SYS_ADMIN)) {
+   DRM_ERROR("OA timer exponent too low without root privileges\n");
return -EACCES;
}

@@ -1471,6 +1479,15 @@ static struct ctl_table oa_table[] = {
 .extra1 = &zero,
 .extra2 = &one,
 },
+   {
+.procname = "oa_min_timer_exponent",
+.data = &i915_oa_min_timer_exponent,
+.maxlen = sizeof(i915_oa_min_timer_exponent),
+.mode = 0644,
+.proc_handler = proc_dointvec_minmax,
+.extra1 = &zero,
+.extra2 = &oa_exponent_max,
+},
{}
 };

-- 
2.10.0
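
For anyone checking the arithmetic behind the default, here is a minimal
userspace sketch of the exponent-to-period relation quoted in the patch
(period = 80ns * 2^(exponent + 1)). The helper names are made up for
illustration and are not part of the driver; the 80ns tick is simply taken
from the patch comment.

```c
#include <assert.h>
#include <stdint.h>

/* Timestamp tick, as quoted in the patch comment (assumption: 80ns) */
#define OA_TICK_NS 80ULL

/* period = 80ns * 2^(exponent + 1) */
static uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
	assert(exponent < 56); /* keep the shifted value inside 64 bits */
	return OA_TICK_NS << (exponent + 1);
}

/* Whole samples per second for a given exponent, rounded down */
static uint64_t oa_exponent_to_rate_hz(unsigned int exponent)
{
	return 1000000000ULL / oa_exponent_to_period_ns(exponent);
}

/* Smallest exponent whose sample rate does not exceed max_rate_hz */
static unsigned int oa_min_exponent_for_rate(uint64_t max_rate_hz)
{
	unsigned int exponent = 0;

	/* rate <= max  <=>  period * max >= 1 second */
	while (exponent < 55 &&
	       oa_exponent_to_period_ns(exponent) * max_rate_hz < 1000000000ULL)
		exponent++;

	return exponent;
}
```

With these helpers, exponent 0 gives the 160ns period mentioned in the
comment, exponent 6 gives 10240ns (roughly 97656 samples/s), and 6 is the
smallest exponent that stays at or below a 100000 samples/s limit.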



[PATCH v6 08/11] drm/i915: Add dev.i915.perf_stream_paranoid sysctl option

2016-10-20 Thread Robert Bragg
Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 50 +++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3b86427..66629bc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2162,6 +2162,7 @@ struct drm_i915_private {
bool initialized;

struct kobject *metrics_kobj;
+   struct ctl_table_header *sysctl_header;

struct mutex lock;
struct list_head streams;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c45bba5..1d61731 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -64,6 +64,11 @@
 #define POLL_FREQUENCY 200
 #define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY)

+/* for sysctl proc_dointvec_minmax of dev.i915.perf_stream_paranoid */
+static int zero;
+static int one = 1;
+static u32 i915_perf_stream_paranoid = true;
+
 /* The maximum exponent the hardware accepts is 63 (essentially it selects one
  * of the 64bit timestamp bits to trigger reports from) but there's currently
  * no known use case for sampling as infrequently as once per 47 thousand years.
@@ -1206,7 +1211,13 @@ i915_perf_open_ioctl_locked(struct drm_i915_private 
*dev_priv,
}
}

-   if (!specific_ctx && !capable(CAP_SYS_ADMIN)) {
+   /* Similar to perf's kernel.perf_event_paranoid sysctl option
+* we check a dev.i915.perf_stream_paranoid sysctl option
+* to determine if it's ok to access system wide OA counters
+* without CAP_SYS_ADMIN privileges.
+*/
+   if (!specific_ctx &&
+   i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n");
ret = -EACCES;
goto err_ctx;
@@ -1450,6 +1461,39 @@ void i915_perf_unregister(struct drm_i915_private 
*dev_priv)
dev_priv->perf.metrics_kobj = NULL;
 }

+static struct ctl_table oa_table[] = {
+   {
+.procname = "perf_stream_paranoid",
+.data = &i915_perf_stream_paranoid,
+.maxlen = sizeof(i915_perf_stream_paranoid),
+.mode = 0644,
+.proc_handler = proc_dointvec_minmax,
+.extra1 = &zero,
+.extra2 = &one,
+},
+   {}
+};
+
+static struct ctl_table i915_root[] = {
+   {
+.procname = "i915",
+.maxlen = 0,
+.mode = 0555,
+.child = oa_table,
+},
+   {}
+};
+
+static struct ctl_table dev_root[] = {
+   {
+.procname = "dev",
+.maxlen = 0,
+.mode = 0555,
+.child = i915_root,
+},
+   {}
+};
+
 void i915_perf_init(struct drm_i915_private *dev_priv)
 {
if (!IS_HASWELL(dev_priv))
@@ -1482,6 +1526,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
dev_priv->perf.oa.n_builtin_sets =
i915_oa_n_builtin_metric_sets_hsw;

+   dev_priv->perf.sysctl_header = register_sysctl_table(dev_root);
+
dev_priv->perf.initialized = true;
 }

@@ -1490,6 +1536,8 @@ void i915_perf_fini(struct drm_i915_private *dev_priv)
if (!dev_priv->perf.initialized)
return;

+   unregister_sysctl_table(dev_priv->perf.sysctl_header);
+
memset(&dev_priv->perf.oa.ops, 0, sizeof(dev_priv->perf.oa.ops));
dev_priv->perf.initialized = false;
 }
-- 
2.10.0
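
To make the resulting policy easy to eyeball, the privilege check added in
i915_perf_open_ioctl_locked() can be modelled in userspace roughly as below.
This is only a sketch of the condition in the hunk above: perf_open_check()
is a hypothetical name, and is_admin stands in for capable(CAP_SYS_ADMIN).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * specific_ctx: the stream is limited to a single context
 * paranoid:     value of dev.i915.perf_stream_paranoid (0 or 1)
 * is_admin:     stand-in for capable(CAP_SYS_ADMIN)
 *
 * Returns 0 if the open would be allowed, -EACCES otherwise.
 */
static int perf_open_check(bool specific_ctx, int paranoid, bool is_admin)
{
	/* the sysctl handler clamps the value to the range 0..1 */
	assert(paranoid == 0 || paranoid == 1);

	/* only system-wide streams are subject to the paranoid check */
	if (!specific_ctx && paranoid && !is_admin)
		return -EACCES;

	return 0;
}
```

So single-context streams are never affected by the sysctl, and system-wide
streams are refused only when the option is left at 1 and the caller lacks
CAP_SYS_ADMIN.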



[PATCH v6 07/11] drm/i915: advertise available metrics via sysfs

2016-10-20 Thread Robert Bragg
Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to look up corresponding counter metadata and
normalization equations.
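
As a rough illustration of how userspace might consume these entries, the
sketch below builds the path of an 'id' file for a given card and GUID and
parses the integer it contains. The function names are hypothetical, and it
assumes the per-set directory is named by the set's GUID as described above.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/drm/card<N>/metrics/<guid>/id"; 0 on success, -1 on error */
static int metric_id_path(char *buf, size_t len, int card, const char *guid)
{
	int n;

	assert(buf && guid);

	/* a GUID is a single path component, so reject empty names and '/' */
	if (guid[0] == '\0' || strchr(guid, '/'))
		return -1;

	n = snprintf(buf, len, "/sys/class/drm/card%d/metrics/%s/id", card, guid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the unsigned integer stored in an 'id' file; -1 if unavailable */
static long read_metric_set_id(const char *path)
{
	unsigned long id;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	if (fscanf(f, "%lu", &id) != 1 || id > LONG_MAX) {
		fclose(f);
		return -1;
	}

	fclose(f);
	return (long)id;
}
```

The value returned by read_metric_set_id() would then be the metric set ID to
pass when opening a stream via DRM_IOCTL_I915_PERF_OPEN; a return of -1 just
means the set is not advertised on the current system.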

The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:

 https://github.com/rib/gputop
 > gputop-data/guids.xml
 > scripts/update-guids.py
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_drv.c|  5 
 drivers/gpu/drm/i915/i915_drv.h|  4 +++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 51 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4 +++
 drivers/gpu/drm/i915/i915_perf.c   | 52 ++
 5 files changed, 116 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5449579..3b6f586 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1115,6 +1115,9 @@ static void i915_driver_register(struct drm_i915_private 
*dev_priv)
if (drm_dev_register(dev, 0) == 0) {
i915_debugfs_register(dev_priv);
i915_setup_sysfs(dev_priv);
+
+   /* Depends on sysfs having been initialized */
+   i915_perf_register(dev_priv);
} else
DRM_ERROR("Failed to register driver for userspace access!\n");

@@ -1151,6 +1154,8 @@ static void i915_driver_unregister(struct 
drm_i915_private *dev_priv)
acpi_video_unregister();
intel_opregion_unregister(dev_priv);

+   i915_perf_unregister(dev_priv);
+
i915_teardown_sysfs(dev_priv);
i915_debugfs_unregister(dev_priv);
drm_dev_unregister(&dev_priv->drm);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b234412..3b86427 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2161,6 +2161,8 @@ struct drm_i915_private {
struct {
bool initialized;

+   struct kobject *metrics_kobj;
+
struct mutex lock;
struct list_head streams;

@@ -3752,6 +3754,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine,
 /* i915_perf.c */
 extern void i915_perf_init(struct drm_i915_private *dev_priv);
 extern void i915_perf_fini(struct drm_i915_private *dev_priv);
+extern void i915_perf_register(struct drm_i915_private *dev_priv);
+extern void i915_perf_unregister(struct drm_i915_private *dev_priv);

 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c 
b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 8906380..19f272b 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -24,6 +24,8 @@
  *
  */

+#include <linux/sysfs.h>
+
 #include "i915_drv.h"
 #include "i915_oa_hsw.h"

@@ -142,3 +144,52 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private 
*dev_priv)
return -ENODEV;
}
 }
+
+static ssize_t
+show_render_basic_id(struct device *kdev, struct device_attribute *attr, char 
*buf)
+{
+   return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC);
+}
+
+static struct device_attribute dev_attr_render_basic_id = {
+   .attr = { .name = "id", .mode = S_IRUGO },
+   .show = show_render_basic_id,
+   .store = NULL,
+};
+
+static struct attribute *attrs_render_basic[] = {
+   &dev_attr_render_basic_id.attr,
+   NULL,
+};
+
+static struct attribute_group group_render_basic = {
+   .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212",
+   .attrs =  attrs_render_basic,
+};
+
+int
+i915_perf_register_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int mux_len;
+   int ret = 0;
+
+   if (get_render_basic_mux_config(dev_priv, &mux_len)) {
+   ret = sysfs_create_group(dev_priv->perf.metrics_kobj, 
&group_render_basic);
+   if (ret)
+   goto error_render_basic;
+   }
+
+   return 0;
+
+error_render_basic:
+   return ret;
+}
+
+void
+i915_perf_unregister_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int mux_len;
+
+   if (get_render_basic_mux_config(dev_priv, &mux_len))
+   sysfs_remove_group(dev_priv->perf.metrics_kobj, 
&group_render_basic);
+}
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h 
b/drivers/gpu/drm/i915/i915_oa_hsw.h
index b618a1f..429a22

[PATCH v6 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-20 Thread Robert Bragg
Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.

v2:
   Make sure to initialize ->specific_ctx_id when opening, without
   relying on _pin_notify hook, in case ctx already pinned.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
Signed-off-by: Zhenyu Wang 

factor out init_specific_ctx_id func
---
 drivers/gpu/drm/i915/i915_drv.h |   72 ++-
 drivers/gpu/drm/i915/i915_gem_context.c |   22 +-
 drivers/gpu/drm/i915/i915_perf.c| 1034 ++-
 drivers/gpu/drm/i915/i915_reg.h |  338 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c |   11 +-
 include/uapi/drm/i915_drm.h |   70 ++-
 6 files changed, 1515 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 28f3f77..b234412 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1760,6 +1760,11 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_oa_format {
+   u32 format;
+   int size;
+};
+
 struct i915_oa_reg {
i915_reg_t addr;
u32 value;
@@ -1780,11 +1785,6 @@ struct i915_perf_stream_ops {
 */
void (*disable)(struct i915_perf_stream *stream);

-   /* Return: true if any i915 perf records are ready to read()
-* for this stream.
-*/
-   bool (*can_read)(struct i915_perf_stream *stream);
-
/* Call poll_wait, passing a wait queue that will be woken
 * once there is something ready to read() for the stream
 */
@@ -1794,9 +1794,7 @@ struct i915_perf_stream_ops {

/* For handling a blocking read, wait until there is something
 * to ready to read() for the stream. E.g. wait on the same
-* wait queue that would be passed to poll_wait() until
-* ->can_read() returns true (if its safe to call ->can_read()
-* without the i915 perf lock held).
+* wait queue that would be passed to poll_wait().
 */
int (*wait_unlocked)(struct i915_perf_stream *stream);

@@ -1836,11 +1834,28 @@ struct i915_perf_stream {
struct list_head link;

u32 sample_flags;
+   int sample_size;

struct i915_gem_context *ctx;
bool enabled;

-   struct i915_perf_stream_ops *ops;
+   const struct i915_perf_stream_ops *ops;
+};
+
+struct i915_oa_ops {
+   void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
+   int (*enable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*disable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*oa_enable)(struct drm_i915_private *dev_priv);
+   void (*oa_disable)(struct drm_i915_private *dev_priv);
+   void (*update_oacontrol)(struct drm_i915_private *dev_priv);
+   void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv,
+   u32 ctx_id);
+   int (*read)(struct i915_perf_stream *stream,
+   char __user *buf,
+   size_t count,
+   size_t *offset);
+   bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
 };

 struct drm_i915_private {
@@ -2145,16 +2160,48 @@ struct drm_i915_private {

struct {
bool initialized;
+
struct mutex lock;
struct list_head streams;

+   spinlock_t hook_lock;
+
struct {
-   u32 metrics_set;
+   struct i915_perf_stream *exclusive_stream;
+
+   u32 specific_ctx_id;
+
+   struct hrtimer poll_check_timer;
+   wait_queue_head_t poll_wq;
+   atomic_t pollin;
+
+   bool periodic;
+   int period_exponent;
+   int timestamp_frequency;
+
+   int tail_margin;
+
+   int metrics_set;

const struct i915_oa_reg *mux_regs;
int mux_regs_len;
const struct i915_oa_reg *b_counter_regs;
int b_counter_regs_len;
+
+   struct {
+   struct drm_i915_gem_object *obj;
+   struct i915_vma *vma;
+   u32 gtt_offset;
+   u8 *addr;
+   int format;
+   int format_size;
+   } oa_buffer;
+
+   u32 gen7_latched_oastatus1;
+
+   struct i915_oa_ops ops;
+   const struct i915_oa_format *oa_formats;
+   int n_builtin_sets;
} oa;
} perf;

@@ -3525,6 +3572,9 

[PATCH v6 05/11] drm/i915: Add 'render basic' Haswell OA unit config

2016-10-20 Thread Robert Bragg
Adds a static OA unit, MUX + B Counter configuration for basic render
metrics on Haswell. This is auto generated from an XML
description of metric sets, currently maintained in gputop, ref:

  https://github.com/rib/gputop
  > gputop-data/oa-*.xml
  > scripts/i915-perf-kernelgen.py

  $ make -C gputop-data -f Makefile.xml SYSFS=0 WHITELIST=RenderBasic
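
As a rough illustration of how a generated table of (address, value)
pairs might be consumed, here is a minimal sketch. It is not taken from
the patch: struct oa_reg, reg_write_fn, apply_oa_config() and the
fake_hw test double are hypothetical stand-ins, and the real driver would
use its own register types and MMIO helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct i915_oa_reg: one register address and
 * the value to program into it. */
struct oa_reg {
	uint32_t addr;
	uint32_t value;
};

/* Hypothetical write hook; a driver would use its own MMIO helper. */
typedef void (*reg_write_fn)(void *ctx, uint32_t addr, uint32_t value);

/* Program every entry of a generated table, in table order. */
static void apply_oa_config(const struct oa_reg *regs, size_t n_regs,
			    reg_write_fn write, void *ctx)
{
	for (size_t i = 0; i < n_regs; i++)
		write(ctx, regs[i].addr, regs[i].value);
}

/* Test double: remembers the last write and counts all writes. */
struct fake_hw {
	uint32_t last_addr;
	uint32_t last_value;
	size_t n_writes;
};

static void fake_write(void *ctx, uint32_t addr, uint32_t value)
{
	struct fake_hw *hw = ctx;

	hw->last_addr = addr;
	hw->last_value = value;
	hw->n_writes++;
}
```

Keeping the tables as plain const data means the generator only has to
emit arrays, and the code that walks them can stay hand-written.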

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/Makefile  |   3 +-
 drivers/gpu/drm/i915/i915_drv.h|  14 
 drivers/gpu/drm/i915/i915_oa_hsw.c | 144 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  34 +
 4 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 8d4e25f..ac0c3ad 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -114,7 +114,8 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
 i915-y += i915_vgpu.o

 # perf code
-i915-y += i915_perf.o
+i915-y += i915_perf.o \
+ i915_oa_hsw.o

 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d3737c6..28f3f77 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1760,6 +1760,11 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_oa_reg {
+   i915_reg_t addr;
+   u32 value;
+};
+
 struct i915_perf_stream;

 struct i915_perf_stream_ops {
@@ -2142,6 +2147,15 @@ struct drm_i915_private {
bool initialized;
struct mutex lock;
struct list_head streams;
+
+   struct {
+   u32 metrics_set;
+
+   const struct i915_oa_reg *mux_regs;
+   int mux_regs_len;
+   const struct i915_oa_reg *b_counter_regs;
+   int b_counter_regs_len;
+   } oa;
} perf;

/* Abstract the submission mechanism (legacy ringbuffer or execlists) 
away */
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c 
b/drivers/gpu/drm/i915/i915_oa_hsw.c
new file mode 100644
index 000..8906380
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -0,0 +1,144 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+#include "i915_oa_hsw.h"
+
+enum metric_set_id {
+   METRIC_SET_ID_RENDER_BASIC = 1,
+};
+
+int i915_oa_n_builtin_metric_sets_hsw = 1;
+
+static const struct i915_oa_reg b_counter_config_render_basic[] = {
+   { _MMIO(0x2724), 0x0080 },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2710), 0x },
+};
+
+static const struct i915_oa_reg mux_config_render_basic[] = {
+   { _MMIO(0x253a4), 0x0160 },
+   { _MMIO(0x25440), 0x0010 },
+   { _MMIO(0x25128), 0x },
+   { _MMIO(0x2691c), 0x0800 },
+   { _MMIO(0x26aa0), 0x0150 },
+   { _MMIO(0x26b9c), 0x6000 },
+   { _MMIO(0x2791c), 0x0800 },
+   { _MMIO(0x27aa0), 0x0150 },
+   { _MMIO(0x27b9c), 0x6000 },
+   { _MMIO(0x2641c), 0x0400 },
+   { _MMIO(0x25380), 0x0010 },
+   { _MMIO(0x2538c), 0x },
+   { _MMIO(0x25384), 0x0800 },
+   { _MMIO(0x25400), 0x0004 },
+   { _MMIO(0x2540c), 0x06029000 },
+   { _MMIO(0x25410), 0x0002 },
+   { _MMIO(0x25404), 0x5c30 },
+   { _MMIO(0x25100), 0x0016 },
+   { _MMIO(0x25110), 0x0400 },
+   { _MMIO(0x25104), 0x },
+   { _MMIO(0x26804), 0x1211 },
+   { _MMIO(0

[PATCH v6 04/11] drm/i915: don't whitelist oacontrol in cmd parser

2016-10-20 Thread Robert Bragg
Being able to program OACONTROL from a non-privileged batch buffer is
not sufficient to be able to configure the OA unit. This was originally
allowed to help enable Mesa to expose OA counters via the
INTEL_performance_query extension, but the current implementation based
on programming OACONTROL via a batch buffer isn't able to report useable
data without a more complete OA unit configuration. Mesa handles the
possibility that writes to OACONTROL may not be allowed and so only
advertises the extension after explicitly testing that a write to
OACONTROL succeeds. Based on this, removing OACONTROL from the whitelist
should be ok for userspace.

Removing this simplifies adding a new kernel api for configuring the OA
unit without needing to consider the possibility that userspace might
trample on OACONTROL state which we'd like to start managing within
the kernel instead. In particular running any Mesa based GL application
currently results in clearing OACONTROL when initializing which would
disable the capturing of metrics.

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 38 ++
 1 file changed, 2 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
b/drivers/gpu/drm/i915/i915_cmd_parser.c
index c45dd83..5152d6f 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -450,7 +450,6 @@ static const struct drm_i915_reg_descriptor 
gen7_render_regs[] = {
REG64(PS_INVOCATION_COUNT),
REG64(PS_DEPTH_COUNT),
REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-   REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
REG64(MI_PREDICATE_SRC0),
REG64(MI_PREDICATE_SRC1),
REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1060,8 +1059,7 @@ bool intel_engine_needs_cmd_parser(struct intel_engine_cs 
*engine)
 static bool check_cmd(const struct intel_engine_cs *engine,
  const struct drm_i915_cmd_descriptor *desc,
  const u32 *cmd, u32 length,
- const bool is_master,
- bool *oacontrol_set)
+ const bool is_master)
 {
if (desc->flags & CMD_DESC_SKIP)
return true;
@@ -1099,31 +1097,6 @@ static bool check_cmd(const struct intel_engine_cs 
*engine,
}

/*
-* OACONTROL requires some special handling for
-* writes. We want to make sure that any batch which
-* enables OA also disables it before the end of the
-* batch. The goal is to prevent one process from
-* snooping on the perf data from another process. To do
-* that, we need to check the value that will be written
-* to the register. Hence, limit OACONTROL writes to
-* only MI_LOAD_REGISTER_IMM commands.
-*/
-   if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
-   if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRM to 
OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_REG) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRR to 
OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1))
-   *oacontrol_set = (cmd[offset + 1] != 0);
-   }
-
-   /*
 * Check the value written to the register against the
 * allowed mask/value pair given in the whitelist entry.
 */
@@ -1214,7 +1187,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine,
u32 *cmd, *batch_end;
struct drm_i915_cmd_descriptor default_desc = noop_desc;
const struct drm_i915_cmd_descriptor *desc = &default_desc;
-   bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
bool needs_clflush_after = false;
int ret = 0;

@@ -1270,8 +1242,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine,
break;
}

-   if (!check_cmd(engine, desc, cmd, length, is_master,
-  &oacontrol_set)) {
+   if (!check_cmd(engine, desc, cmd, length, is_master)) {
ret = -EACCES;
break;
}
@@ -1279,11 +1250,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine

[PATCH v6 03/11] drm/i915: return EACCES for check_cmd() failures

2016-10-20 Thread Robert Bragg
check_cmd() is checking whether a command adheres to certain
restrictions that ensure it's safe to execute within a privileged batch
buffer. Returning false implies a privilege problem, not that the
command is invalid.

The distinction makes the difference between allowing the buffer to be
executed as an unprivileged batch buffer or returning an EINVAL error to
userspace without executing anything.

In a case where userspace may want to test whether it can successfully
write to a register that needs privileges the distinction may be
important and an EINVAL error may be considered fatal.

In particular this is currently true for Mesa, which includes a test for
whether OACONTROL can be written to, but Mesa treats any error when
flushing a batch buffer as fatal, calling exit(1).

As it is currently, Mesa can gracefully handle a failure to write to
OACONTROL if the command parser is disabled, but if we were to remove
OACONTROL from the parser's whitelist then the returned EINVAL would
break Mesa applications as they attempt an OACONTROL write.

This bumps the command parser version from 7 to 8, as the change is
visible to userspace.
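
The distinction described above can be sketched as a small decision
function. This is only an illustration of the commit message, not the
driver's code: enum batch_action and action_for_parser_result() are
hypothetical names, and the only assumption carried over is the kernel
convention of returning 0 or a negative errno.

```c
#include <assert.h>
#include <errno.h>

enum batch_action {
	BATCH_RUN_PRIVILEGED,   /* every command passed the checks */
	BATCH_RUN_UNPRIVILEGED, /* let the HW NOOP what is not allowed */
	BATCH_REJECT,           /* report the error to userspace */
};

/* parser_ret is 0 or a negative errno, following kernel convention. */
static enum batch_action action_for_parser_result(int parser_ret)
{
	if (parser_ret == 0)
		return BATCH_RUN_PRIVILEGED;
	if (parser_ret == -EACCES)
		return BATCH_RUN_UNPRIVILEGED;
	return BATCH_REJECT;
}
```

With this split, a privilege problem no longer surfaces as a fatal
-EINVAL; only genuinely malformed batches take the reject path.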

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
b/drivers/gpu/drm/i915/i915_cmd_parser.c
index fe34470..c45dd83 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1272,7 +1272,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine,

if (!check_cmd(engine, desc, cmd, length, is_master,
   &oacontrol_set)) {
-   ret = -EINVAL;
+   ret = -EACCES;
break;
}

@@ -1333,6 +1333,9 @@ int i915_cmd_parser_get_version(struct drm_i915_private 
*dev_priv)
 * 5. GPGPU dispatch compute indirect registers.
 * 6. TIMESTAMP register and Haswell CS GPR registers
 * 7. Allow MI_LOAD_REGISTER_REG between whitelisted registers.
+* 8. Don't report cmd_check() failures as EINVAL errors to userspace;
+*rely on the HW to NOOP disallowed commands as it would without
+*the parser enabled.
 */
-   return 7;
+   return 8;
 }
-- 
2.10.0



[PATCH v6 02/11] drm/i915: rename OACONTROL GEN7_OACONTROL

2016-10-20 Thread Robert Bragg
OACONTROL changes quite a bit for gen8, with some bits split out into a
per-context OACTXCONTROL register. Rename now before adding more gen7 OA
registers.

Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gvt/handlers.c| 2 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/handlers.c 
b/drivers/gpu/drm/i915/gvt/handlers.c
index 3e74fb3..68e07a1 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -2159,7 +2159,7 @@ static int init_generic_mmio_info(struct intel_gvt *gvt)
MMIO_DFH(0x1217c, D_ALL, F_CMD_ACCESS, NULL, NULL);

MMIO_F(0x2290, 8, 0, 0, 0, D_HSW_PLUS, NULL, NULL);
-   MMIO_D(OACONTROL, D_HSW);
+   MMIO_D(GEN7_OACONTROL, D_HSW);
MMIO_D(0x2b00, D_BDW_PLUS);
MMIO_D(0x2360, D_BDW_PLUS);
MMIO_F(0x5200, 32, 0, 0, 0, D_ALL, NULL, NULL);
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
b/drivers/gpu/drm/i915/i915_cmd_parser.c
index f191d7b..fe34470 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -450,7 +450,7 @@ static const struct drm_i915_reg_descriptor 
gen7_render_regs[] = {
REG64(PS_INVOCATION_COUNT),
REG64(PS_DEPTH_COUNT),
REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-   REG32(OACONTROL), /* Only allowed for LRI and SRM. See below. */
+   REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
REG64(MI_PREDICATE_SRC0),
REG64(MI_PREDICATE_SRC1),
REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1108,7 +1108,7 @@ static bool check_cmd(const struct intel_engine_cs 
*engine,
 * to the register. Hence, limit OACONTROL writes to
 * only MI_LOAD_REGISTER_IMM commands.
 */
-   if (reg_addr == i915_mmio_reg_offset(OACONTROL)) {
+   if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
DRM_DEBUG_DRIVER("CMD: Rejected LRM to 
OACONTROL\n");
return false;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 00efaa1..0ad7f03 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -615,7 +615,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define HSW_CS_GPR(n)   _MMIO(0x2600 + (n) * 8)
 #define HSW_CS_GPR_UDW(n)   _MMIO(0x2600 + (n) * 8 + 4)

-#define OACONTROL _MMIO(0x2360)
+#define GEN7_OACONTROL _MMIO(0x2360)

 #define _GEN7_PIPEA_DE_LOAD_SL 0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL 0x71068
-- 
2.10.0



[PATCH v6 01/11] drm/i915: Add i915 perf infrastructure

2016-10-20 Thread Robert Bragg
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for
data.

A stream is opened something like:

  uint64_t properties[] = {
      /* Single context sampling */
      DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

      /* Include OA reports in samples */
      DRM_I915_PERF_PROP_SAMPLE_OA,         true,

      /* OA unit configuration */
      DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
      DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
      DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
  };
  struct drm_i915_perf_open_param param = {
      .flags = I915_PERF_FLAG_FD_CLOEXEC |
               I915_PERF_FLAG_FD_NONBLOCK |
               I915_PERF_FLAG_DISABLED,
      .properties_ptr = (uint64_t)properties,
      .num_properties = sizeof(properties) / 16,
  };
  int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
determine what's included in every sample.
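
To illustrate how a reader might walk such records, here is a minimal
sketch. The header layout (a 32-bit type, 16 bits of padding and a 16-bit
size that covers header plus payload) and the RECORD_SAMPLE value are
assumptions made for the example, and count_sample_records() is a
hypothetical helper, not part of the proposed uapi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the common { type, size } header; size is taken to
 * cover the header plus its payload. */
struct record_header {
	uint32_t type;
	uint16_t pad;
	uint16_t size;
};

#define RECORD_SAMPLE 1 /* hypothetical stand-in for DRM_I915_PERF_RECORD_SAMPLE */

/* Count sample records in buf[0..len); returns -1 on a malformed record. */
static int count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0;
	int n_samples = 0;

	while (offset < len) {
		struct record_header hdr;

		if (len - offset < sizeof(hdr))
			return -1;
		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&hdr, buf + offset, sizeof(hdr));

		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			return -1;
		if (hdr.type == RECORD_SAMPLE)
			n_samples++;
		offset += hdr.size; /* unknown types are skipped by size */
	}
	return n_samples;
}
```

Because every record carries its own size, a reader can skip record
types it does not understand, which is what keeps the format extensible.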

No specific streams are supported yet so any attempt to open a stream
will return an error.

v4:
s/DRM_IORW/DRM_IOW/ - Emil Velikov
v3:
update read() interface to avoid passing state struct - Chris Wilson
fix some rebase fallout, with i915-perf init/deinit
v2:
use i915_gem_context_get() - Chris Wilson

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/Makefile|   3 +
 drivers/gpu/drm/i915/i915_drv.c  |   4 +
 drivers/gpu/drm/i915/i915_drv.h  |  91 
 drivers/gpu/drm/i915/i915_perf.c | 443 +++
 include/uapi/drm/i915_drm.h  |  67 ++
 5 files changed, 608 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 6123400..8d4e25f 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -113,6 +113,9 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
 # virtual gpu code
 i915-y += i915_vgpu.o

+# perf code
+i915-y += i915_perf.o
+
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
 include $(src)/gvt/Makefile
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 912d534..5449579 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -836,6 +836,8 @@ static int i915_driver_init_early(struct drm_i915_private 
*dev_priv,

intel_detect_preproduction_hw(dev_priv);

+   i915_perf_init(dev_priv);
+
return 0;

 err_workqueues:
@@ -849,6 +851,7 @@ static int i915_driver_init_early(struct drm_i915_private 
*dev_priv,
  */
 static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv)
 {
+   i915_perf_fini(dev_priv);
i915_gem_load_cleanup(&dev_priv->drm);
i915_workqueues_cleanup(dev_priv);
 }
@@ -2575,6 +2578,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, 
DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, 
i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, 
i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, 
DRM_RENDER_ALLOW),
 };

 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5b2b7f3..d3737c6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1760,6 +1760,84 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_perf_stream;
+
+struct i915_perf_stream_ops {
+   /* Enables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_ENABLE or implicitly called when stream is
+* opened without I915_PERF_FLAG_DISABLED.
+*/
+   void (*enable)(struct i915_perf_stream *stream);
+
+   /* Disables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_DISABLE or implicitly called before
+* destroying the stream.
+*/
+   void (*disable)(struct i915_perf_stream *stream);
+
+   /* Return: true if any i915 perf records are ready to read()
+* for this stream.
+*/
+   bool (*can_read)(struct i915_perf_stream *stream);
+
+   /* Call poll_wait, passing a wait queue that will be woken
+* once there is something ready to read() for the stream
+*/
+   void (*poll_wait)(struct i915_pe

[Intel-gfx] [PATCH] drm/i915: Add i915 perf infrastructure

2016-10-19 Thread Robert Bragg
On Wed, Oct 12, 2016 at 12:41 PM, Joonas Lahtinen <
joonas.lahtinen at linux.intel.com> wrote:

> On ti, 2016-10-11 at 12:03 -0700, Robert Bragg wrote:
> > > > +   case DRM_I915_PERF_PROP_MAX:
> > > > +   BUG();
> > >
> > > We already handle this case above, but I guess we still need this in
> > > order to silence gcc...
> >
> > right, and preferable to having a default: case, for the future compiler
> > warning to handle any new properties here.
>
> Please, do use MISSING_CASE instead. Daniel is known to get upset for
> far less ;)
>
> Generally consensus is that BUG() is used only when there're no other
> options to back out.
>

thanks for this pointer.

I'll add a default: with MISSING_CASE as that looks like an i915-specific
convention; though it seems like a real shame to defer missing case issues
to runtime errors instead of taking advantage of the compiler complaining
at build time that a case has been forgotten.
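
For readers unfamiliar with the trade-off, a minimal sketch of the
build-time approach follows. The enum and prop_name() are hypothetical
and only loosely modelled on the properties discussed in this thread; the
point is that omitting default: lets GCC's -Wswitch (enabled by -Wall)
flag an enumerator that has no case label.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical property ids, loosely modelled on the thread. */
enum perf_prop {
	PROP_CTX_HANDLE = 1,
	PROP_SAMPLE_OA,
	PROP_MAX, /* sentinel, not a valid property */
};

/* No default: label, so the compiler can warn when a new enumerator is
 * added to the enum but not handled here. */
static const char *prop_name(enum perf_prop prop)
{
	switch (prop) {
	case PROP_CTX_HANDLE:
		return "ctx_handle";
	case PROP_SAMPLE_OA:
		return "sample_oa";
	case PROP_MAX:
		break;
	}
	return NULL; /* sentinel or out-of-range value */
}
```

Adding a default: label would silence that warning, so a forgotten case
would only show up at runtime.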

Thanks,
- Robert



>
> Regards, Joonas
> --
> Joonas Lahtinen
> Open Source Technology Center
> Intel Corporation
>


[Intel-gfx] [PATCH v5 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-10-11 Thread Robert Bragg
On Fri, Oct 7, 2016 at 10:19 AM, Matthew Auld <
matthew.william.auld at gmail.com> wrote:

> On 14 September 2016 at 15:19, Robert Bragg  wrote:
>
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c
> b/drivers/gpu/drm/i915/i915_perf.c
> > index 87530f5..5305982 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -28,14 +28,860 @@
> >  #include 
> >
> >  #include "i915_drv.h"
> > +#include "intel_ringbuffer.h"
> > +#include "intel_lrc.h"
> Superfluous includes.
>

ah, yup, removed.


> > +#include "i915_oa_hsw.h"
> > +
> > +/* Must be a power of two */
> > +#define OA_BUFFER_SIZE SZ_16M
> It's a power of two between 128K and 16M, maybe add a build_bug_on and
> build_bug_on_not_power_of_2 to check this?
>

Okay, added assertions to init_oa_buffer()
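
For illustration, the kind of checks being discussed, and why the
power-of-two requirement matters for the buffer arithmetic, can be
sketched in plain C11. BUF_SIZE and buf_taken() are stand-ins chosen for
this example, and _Static_assert is used here in place of the kernel's
BUILD_BUG_ON() helpers.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_SIZE (16u * 1024 * 1024) /* stands in for OA_BUFFER_SIZE (SZ_16M) */

/* Compile-time checks, analogous in spirit to BUILD_BUG_ON(). */
_Static_assert((BUF_SIZE & (BUF_SIZE - 1)) == 0,
	       "size must be a power of two");
_Static_assert(BUF_SIZE >= 128u * 1024 && BUF_SIZE <= 16u * 1024 * 1024,
	       "size must be between 128K and 16M");

/* Bytes between head and tail in a circular buffer; unsigned wraparound
 * plus the mask handles the case where tail has wrapped past the end. */
static uint32_t buf_taken(uint32_t tail, uint32_t head)
{
	return (tail - head) & (BUF_SIZE - 1);
}
```

The mask only gives the right answer when the size is a power of two,
which is why it is worth asserting at build time.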



>
> > +#define OA_TAKEN(tail, head)   ((tail - head) & (OA_BUFFER_SIZE - 1))
> > +
> > +/* There's a HW race condition between OA unit tail pointer register updates and
> > + * writes to memory whereby the tail pointer can sometimes get ahead of what's
> > + * been written out to the OA buffer so far.
> > + *
> > + * Although this can be observed explicitly by checking for a zeroed report-id
> > + * field in tail reports, it seems preferable to account for this earlier e.g.
> > + * as part of the _oa_buffer_is_empty checks to minimize -EAGAIN polling cycles
> > + * in this situation.
> > + *
> > + * To give time for the most recent reports to land before they may be copied to
> > + * userspace, the driver operates as if the tail pointer effectively lags behind
> > + * the HW tail pointer by 'tail_margin' bytes. The margin in bytes is calculated
> > + * based on this constant in nanoseconds, the current OA sampling exponent
> > + * and current report size.
> > + *
> > + * There is also a fallback check while reading to simply skip over reports with
> > + * a zeroed report-id.
> > + */
> > +#define OA_TAIL_MARGIN_NSEC 100000ULL
> Yikes!
>

Yeah :-)

Although I've had some feedback from the hw side that there probably is a
race as described here, it's currently an assumption that 100
microseconds is always enough time for any internally buffered OA reports
to become coherent with the cpu's view. If a more detailed analysis is
ever done to quantify the maximum latency (which may not be best measured
as a unit of time), we can update this, but for now I've found this to
work. I'm not pushing for, or expecting, this to be investigated in
detail for Haswell.
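
As a worked illustration of how a margin in nanoseconds could turn into
a margin in bytes, here is a minimal sketch. The relation between the
sampling exponent and the period (2^(exponent + 1) timestamp ticks), the
helper names and the rounding are assumptions made for the example, not a
quote of the driver code.

```c
#include <assert.h>
#include <stdint.h>

#define TAIL_MARGIN_NSEC 100000ULL /* 100 microseconds, per the discussion */

/* Assumed relation: the sampling period is 2^(exponent + 1) timestamp
 * ticks. Assumes exponent <= 31 so the multiplication cannot overflow,
 * and a timestamp_hz low enough that the result is non-zero. */
static uint64_t exponent_to_ns(unsigned int exponent, uint64_t timestamp_hz)
{
	return 1000000000ULL * (2ULL << exponent) / timestamp_hz;
}

/* Round up to whole reports so the lag is never shorter than the margin. */
static uint64_t tail_margin_bytes(unsigned int exponent, uint64_t timestamp_hz,
				  uint32_t report_size)
{
	uint64_t period_ns = exponent_to_ns(exponent, timestamp_hz);

	return (TAIL_MARGIN_NSEC / period_ns + 1) * report_size;
}
```

For example, with an assumed 12.5 MHz timestamp clock, exponent 0 and
256-byte reports, the period is 160ns, so the margin covers 626 reports
or 160256 bytes. At long sampling periods it shrinks to a single report.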


>
>
> > +
> > +static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)
> > +{
> > +   /* Pre-DevBDW: OABUFFER must be set with counters off,
> > +* before OASTATUS1, but after OASTATUS2
> > +*/
> > +   I915_WRITE(GEN7_OASTATUS2, dev_priv->perf.oa.oa_buffer.gtt_offset |
> > +  OA_MEM_SELECT_GGTT); /* head */
> > +   I915_WRITE(GEN7_OABUFFER, dev_priv->perf.oa.oa_buffer.gtt_offset);
> > +   I915_WRITE(GEN7_OASTATUS1, dev_priv->perf.oa.oa_buffer.gtt_offset |
> > +  OABUFFER_SIZE_16M); /* tail */
> > +
> > +   /* On Haswell we have to track which OASTATUS1 flags we've
> > +* already seen since they can't be cleared while periodic
> > +* sampling is enabled.
> > +*/
> > +   dev_priv->perf.oa.gen7_latched_oastatus1 = 0;
> > +
> > +   /* NB: although the OA buffer will initially be allocated
> > +* zeroed via shmfs (and so this memset is redundant when
> > +* first allocating), we may re-init the OA buffer, either
> > +* when re-enabling a stream or in error/reset paths.
> > +*
> > +* The reason we clear the buffer for each re-init is for the
> > +* sanity check in gen7_append_oa_reports() that looks at the
> > +* report-id field to make sure it's non-zero which relies on
> > +* the assumption that new reports are being written to zeroed
> > +* memory...
> > +*/
> > +   memset(dev_priv->perf.oa.oa_buffer.addr, 0, SZ_16M);
> OA_BUFFER_SIZE
>

ah, yep.


>
> > +
> > +   /* Maybe make ->pollin per-stream state if we support multiple
> > +* concurrent streams in the future. */
> > +   atomic_set(&dev_priv->perf.oa.pollin, false);
> > +}
> > +
> > +static i

[Intel-gfx] [PATCH] drm/i915: Add i915 perf infrastructure

2016-10-11 Thread Robert Bragg
On Fri, Oct 7, 2016 at 10:10 AM, Matthew Auld <
matthew.william.auld at gmail.com> wrote:

> On 14 September 2016 at 16:32, Robert Bragg  wrote:
>
> > +
> > +int i915_perf_open_ioctl_locked(struct drm_device *dev,
> > +   struct drm_i915_perf_open_param *param,
> > +   struct perf_open_properties *props,
> > +   struct drm_file *file)
> > +{
> This should be static and also let's just make it take dev_priv directly.
>

Ah, yep, done.


> > +   case DRM_I915_PERF_PROP_MAX:
> > +   BUG();
> We already handle this case above, but I guess we still need this in
> order to silence gcc...
>

Right, and it's preferable to having a default: case, so that a future
compiler warning will flag any new properties that aren't handled here.



> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index 03725fe..77fe79b 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -258,6 +258,7 @@ typedef struct _drm_i915_sarea {
> >  #define DRM_I915_GEM_USERPTR   0x33
> >  #define DRM_I915_GEM_CONTEXT_GETPARAM  0x34
> >  #define DRM_I915_GEM_CONTEXT_SETPARAM  0x35
> > +#define DRM_I915_PERF_OPEN 0x36
> >
> >  #define DRM_IOCTL_I915_INIT DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
> >  #define DRM_IOCTL_I915_FLUSH DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
> > @@ -311,6 +312,7 @@ typedef struct _drm_i915_sarea {
> >  #define DRM_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_USERPTR, struct drm_i915_gem_userptr)
> >  #define DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_GETPARAM, struct drm_i915_gem_context_param)
> >  #define DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_SETPARAM, struct drm_i915_gem_context_param)
> > +#define DRM_IOCTL_I915_PERF_OPEN DRM_IOR(DRM_COMMAND_BASE + DRM_I915_PERF_OPEN, struct drm_i915_perf_open_param)
> As you already pointed out this will need to be IOW.
>

Yeah, changed locally after I realised the mistake here, just didn't get
around to posting the patch.


Also applied the notes about not redundantly initializing some vars, a
spurious new line and a redundant include.

Thanks,
- Robert


[PATCH] drm/i915: Add i915 perf infrastructure

2016-09-14 Thread Robert Bragg
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for
data.

A stream is opened something like:

  uint64_t properties[] = {
      /* Single context sampling */
      DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

      /* Include OA reports in samples */
      DRM_I915_PERF_PROP_SAMPLE_OA,         true,

      /* OA unit configuration */
      DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
      DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
      DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
  };
  struct drm_i915_perf_open_param param = {
      .flags = I915_PERF_FLAG_FD_CLOEXEC |
               I915_PERF_FLAG_FD_NONBLOCK |
               I915_PERF_FLAG_DISABLED,
      .properties_ptr = (uint64_t)properties,
      .num_properties = sizeof(properties) / 16,
  };
  int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);
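
The division by 16 in the snippet above reflects that each property is a
(key, value) pair of two 64-bit integers. A minimal sketch of walking
such a flat array follows; find_property() is a hypothetical helper
written for this illustration and is not part of the proposed interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Look up `key` in a flat array of n_pairs (key, value) pairs.
 * Returns 1 and stores the value on success, 0 if the key is absent. */
static int find_property(const uint64_t *props, size_t n_pairs,
			 uint64_t key, uint64_t *value)
{
	for (size_t i = 0; i < n_pairs; i++) {
		if (props[2 * i] == key) {
			*value = props[2 * i + 1];
			return 1;
		}
	}
	return 0;
}
```

A caller would derive n_pairs the same way as num_properties, by
dividing the array size in bytes by the 16 bytes that one pair occupies.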

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
determine what's included in every sample.

No specific streams are supported yet so any attempt to open a stream
will return an error.

v4:
s/DRM_IORW/DRM_IOR/ - Emil Velikov
v3:
update read() interface to avoid passing state struct - Chris Wilson
fix some rebase fallout, with i915-perf init/deinit
v2:
use i915_gem_context_get() - Chris Wilson

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/Makefile|   3 +
 drivers/gpu/drm/i915/i915_drv.c  |   4 +
 drivers/gpu/drm/i915/i915_drv.h  |  91 
 drivers/gpu/drm/i915/i915_perf.c | 448 +++
 include/uapi/drm/i915_drm.h  |  67 ++
 5 files changed, 613 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index a998c2b..d991781 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -110,6 +110,9 @@ i915-y += dvo_ch7017.o \
 # virtual gpu code
 i915-y += i915_vgpu.o

+# perf code
+i915-y += i915_perf.o
+
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
 include $(src)/gvt/Makefile
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 7f4e8ad..14f22fc 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -838,6 +838,8 @@ static int i915_driver_init_early(struct drm_i915_private 
*dev_priv,

intel_device_info_dump(dev_priv);

+   i915_perf_init(dev_priv);
+
/* Not all pre-production machines fall into this category, only the
 * very first ones. Almost everything should work, except for maybe
 * suspend/resume. And we don't implement workarounds that affect only
@@ -859,6 +861,7 @@ err_workqueues:
  */
 static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv)
 {
+   i915_perf_fini(dev_priv);
i915_gem_load_cleanup(&dev_priv->drm);
i915_workqueues_cleanup(dev_priv);
 }
@@ -2560,6 +2563,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, 
DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, 
i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, 
i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, 
DRM_RENDER_ALLOW),
 };

 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1e2dda8..0f5cd8f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1740,6 +1740,84 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_perf_stream;
+
+struct i915_perf_stream_ops {
+   /* Enables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_ENABLE or implicitly called when stream is
+* opened without I915_PERF_FLAG_DISABLED.
+*/
+   void (*enable)(struct i915_perf_stream *stream);
+
+   /* Disables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_DISABLE or implicitly called before
+* destroying the stream.
+*/
+   void (*disable)(struct i915_perf_stream *stream);
+
+   /* Return: true if any i915 perf records are ready to read()
+* for this stream.
+*/
+   bool (*can_read)(struct i915_perf_stream *stream);
+
+   /* Call poll_wait, passing a wait queue that will be woken
+* once there is s

[Intel-gfx] [PATCH v5 01/11] drm/i915: Add i915 perf infrastructure

2016-09-14 Thread Robert Bragg
On Wed, Sep 14, 2016 at 3:42 PM, Emil Velikov 
wrote:

> Hi Robert,
>
> I think I've spotted one interesting, yet trivial bit.
>
> On 14 September 2016 at 15:19, Robert Bragg  wrote:
> > Adds base i915 perf infrastructure for Gen performance metrics.
> >
> > This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
> > properties to configure a stream of metrics and returns a new fd usable
> > with standard VFS system calls including read() to read typed and sized
> > records; ioctl() to enable or disable capture and poll() to wait for
> > data.
> >
> > A stream is opened something like:
> >
> >   uint64_t properties[] = {
> >   /* Single context sampling */
> >   DRM_I915_PERF_PROP_CTX_HANDLE,ctx_handle,
> >
> >   /* Include OA reports in samples */
> >   DRM_I915_PERF_PROP_SAMPLE_OA, true,
> >
> >   /* OA unit configuration */
> >   DRM_I915_PERF_PROP_OA_METRICS_SET,metrics_set_id,
> >   DRM_I915_PERF_PROP_OA_FORMAT, report_format,
> >   DRM_I915_PERF_PROP_OA_EXPONENT,   period_exponent,
> >};
> >struct drm_i915_perf_open_param parm = {
> >   .flags = I915_PERF_FLAG_FD_CLOEXEC |
> >I915_PERF_FLAG_FD_NONBLOCK |
> >I915_PERF_FLAG_DISABLED,
> >   .properties_ptr = (uint64_t)properties,
> >   .num_properties = sizeof(properties) / 16,
> >};
> >int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);
> >
> > Records read all start with a common { type, size } header with
> > DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
> > contain an extensible number of fields and it's the
> > DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
> > determine what's included in every sample.
> >
> If I'm understanding the above correctly the ioctl can only read user
> data and does not write to params, correct ?
>
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
>
> > +#define DRM_IOCTL_I915_PERF_OPEN   DRM_IOWR(DRM_COMMAND_BASE +
> DRM_I915_PERF_OPEN, struct drm_i915_perf_open_param)
>
> If so, we seem to have one letter too many in DRM_IOWR - should one
> use DRM_IOW/DRM_IOR ? Then again I'm not sure how many ioctls bother,
> so please don't read too much into my suggestion :-)
>

Ah, yep, good catch, I don't write back to the param struct any more.

The first iteration of this interface was even more closely modeled on the
core Linux perf interface, where the param struct starts with a size member.
If userspace passes a structure that's smaller than expected, the kernel
returns an error but also writes back the expected size to inform userspace.

i915 perf moved to taking an array of u64 properties and no longer writes
back a size member in the param struct like perf.
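
To make the direction question concrete, here is a small sketch using the
generic Linux ioctl encoding macros from <linux/ioctl.h>. The struct, the
'x' type character and the request number are hypothetical and chosen only
for this example; they are not the i915 definitions. The assumption being
illustrated is that a request built with _IOW carries no read (kernel to
user) direction bit, while _IOWR carries both.

```c
#include <linux/ioctl.h>
#include <stdint.h>

/* Hypothetical parameter struct, standing in for the real uapi type. */
struct example_open_param {
	uint32_t flags;
	uint32_t num_properties;
	uint64_t properties_ptr;
};

/* Hypothetical ioctl type and number, chosen only for this sketch. */
#define EXAMPLE_IOCTL_TYPE 'x'
#define EXAMPLE_IOCTL_NR   0x01

/* Userspace fills in the struct; the kernel only reads it. */
#define EXAMPLE_OPEN_W \
	_IOW(EXAMPLE_IOCTL_TYPE, EXAMPLE_IOCTL_NR, struct example_open_param)

/* The kernel would also write results back into the struct. */
#define EXAMPLE_OPEN_WR \
	_IOWR(EXAMPLE_IOCTL_TYPE, EXAMPLE_IOCTL_NR, struct example_open_param)

/* Return non-zero if the request encodes a kernel-to-user copy. */
static int ioctl_copies_back(unsigned long request)
{
	return (_IOC_DIR(request) & _IOC_READ) != 0;
}
```

With these definitions ioctl_copies_back(EXAMPLE_OPEN_W) is zero and
ioctl_copies_back(EXAMPLE_OPEN_WR) is non-zero, so dropping the R only
matters if nothing is ever written back to the param struct.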

Thanks,
- Robert



>
> Regards,
> Emil
>


[PATCH v5 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c

2016-09-14 Thread Robert Bragg
In particular this tries to capture for posterity some of the early
challenges we had with using the core perf infrastructure in case we
ever want to revisit adapting perf for device metrics.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_perf.c | 163 +++
 1 file changed, 163 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index a7a248b..891efe6 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -24,6 +24,169 @@
  *   Robert Bragg 
  */

+
+/**
+ * DOC: i915 Perf, streaming API for GPU metrics
+ *
+ * Gen graphics supports a large number of performance counters that can help
+ * driver and application developers understand and optimize their use of the
+ * GPU.
+ *
+ * This i915 perf interface enables userspace to configure and open a file
+ * descriptor representing a stream of GPU metrics which can then be read() as
+ * a stream of sample records.
+ *
+ * The interface is particularly suited to exposing buffered metrics that are
+ * captured by DMA from the GPU, unsynchronized with and unrelated to the CPU.
+ *
+ * Streams representing a single context are accessible to applications with a
+ * corresponding drm file descriptor, such that OpenGL can use the interface
+ * without special privileges. Access to system-wide metrics requires root
+ * privileges by default, unless changed via the dev.i915.perf_event_paranoid
+ * sysctl option.
+ *
+ *
+ * The interface was initially inspired by the core Perf infrastructure but
+ * some notable differences are:
+ *
+ * i915 perf file descriptors represent a "stream" instead of an "event"; where
+ * a perf event primarily corresponds to a single 64bit value, while a stream
+ * might sample sets of tightly-coupled counters, depending on the
+ * configuration.  For example the Gen OA unit isn't designed to support
+ * orthogonal configurations of individual counters; it's configured for a set
+ * of related counters. Samples for an i915 perf stream capturing OA metrics
+ * will include a set of counter values packed in a compact HW specific format.
+ * The OA unit supports a number of different packing formats which can be
+ * selected by the user opening the stream. Perf has support for grouping
+ * events, but each event in the group is configured, validated and
+ * authenticated individually with separate system calls.
+ *
+ * i915 perf stream configurations are provided as an array of u64 (key,value)
+ * pairs, instead of a fixed struct with multiple miscellaneous config members,
+ * interleaved with event-type specific members.
+ *
+ * i915 perf doesn't support exposing metrics via an mmap'd circular buffer.
+ * The supported metrics are being written to memory by the GPU unsynchronized
+ * with the CPU, using HW specific packing formats for counter sets. Sometimes
+ * the constraints on HW configuration require reports to be filtered before it
+ * would be acceptable to expose them to unprivileged applications - to hide
+ * the metrics of other processes/contexts. For these use cases a read() based
+ * interface is a good fit, and provides an opportunity to filter data as it
+ * gets copied from the GPU mapped buffers to userspace buffers.
+ *
+ *
+ * Some notes regarding Linux Perf:
+ * 
+ *
+ * The first prototype of this driver was based on the core perf
+ * infrastructure, and while we did make that mostly work, with some changes to
+ * perf, we found we were breaking or working around too many assumptions baked
+ * into perf's currently cpu centric design.
+ *
+ * In the end we didn't see a clear benefit to making perf's implementation and
+ * interface more complex by changing design assumptions while we knew we still
+ * wouldn't be able to use any existing perf based userspace tools.
+ *
+ * Also considering the Gen specific nature of the Observability hardware and
+ * how userspace will sometimes need to combine i915 perf OA metrics with
+ * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're
+ * expecting the interface to be used by a platform specific userspace such as
+ * OpenGL or tools. This is to say; we aren't inherently missing out on having
+ * a standard vendor/architecture agnostic interface by not using perf.
+ *
+ *
+ * For posterity, in case we might re-visit trying to adapt core perf to be
+ * better suited to exposing i915 metrics these were the main pain points we
+ * hit:
+ *
+ * - The perf based OA PMU driver broke some significant design assumptions:
+ *
+ *   Existing perf pmus are used for profiling work on a cpu and we were
+ *   introducing the idea of _IS_DEVICE pmus with different security
+ *   implications, the need to fake cpu-related data (such as user/kernel
+ *   registers) to fit with perf's current design, and adding _DEVICE records
+ *   as a 

[PATCH v5 10/11] drm/i915: Add more Haswell OA metric sets

2016-09-14 Thread Robert Bragg
This adds 'compute', 'compute extended', 'memory reads', 'memory writes'
and 'sampler balance' metric sets for Haswell.

The code is auto generated from an XML description of metric sets,
currently maintained in gputop, ref:

 https://github.com/rib/gputop
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_oa_hsw.c | 559 -
 1 file changed, 558 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c 
b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 656334d..7906a26 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -30,9 +30,14 @@

 enum metric_set_id {
METRIC_SET_ID_RENDER_BASIC = 1,
+   METRIC_SET_ID_COMPUTE_BASIC,
+   METRIC_SET_ID_COMPUTE_EXTENDED,
+   METRIC_SET_ID_MEMORY_READS,
+   METRIC_SET_ID_MEMORY_WRITES,
+   METRIC_SET_ID_SAMPLER_BALANCE,
 };

-int i915_oa_n_builtin_metric_sets_hsw = 1;
+int i915_oa_n_builtin_metric_sets_hsw = 6;

 static const struct i915_oa_reg b_counter_config_render_basic[] = {
{ _MMIO(0x2724), 0x0080 },
@@ -111,6 +116,298 @@ get_render_basic_mux_config(struct drm_i915_private 
*dev_priv,
return mux_config_render_basic;
 }

+static const struct i915_oa_reg b_counter_config_compute_basic[] = {
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2718), 0x },
+   { _MMIO(0x271c), 0x },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2724), 0x0080 },
+   { _MMIO(0x2728), 0x },
+   { _MMIO(0x272c), 0x },
+   { _MMIO(0x2740), 0x },
+   { _MMIO(0x2744), 0x },
+   { _MMIO(0x2748), 0x },
+   { _MMIO(0x274c), 0x },
+   { _MMIO(0x2750), 0x },
+   { _MMIO(0x2754), 0x },
+   { _MMIO(0x2758), 0x },
+   { _MMIO(0x275c), 0x },
+   { _MMIO(0x236c), 0x },
+};
+
+static const struct i915_oa_reg mux_config_compute_basic[] = {
+   { _MMIO(0x253a4), 0x },
+   { _MMIO(0x2681c), 0x01f00800 },
+   { _MMIO(0x26820), 0x1000 },
+   { _MMIO(0x2781c), 0x01f00800 },
+   { _MMIO(0x26520), 0x0007 },
+   { _MMIO(0x265a0), 0x0007 },
+   { _MMIO(0x25380), 0x0010 },
+   { _MMIO(0x2538c), 0x0030 },
+   { _MMIO(0x25384), 0xaa8a },
+   { _MMIO(0x25404), 0x },
+   { _MMIO(0x26800), 0x4202 },
+   { _MMIO(0x26808), 0x00605817 },
+   { _MMIO(0x2680c), 0x10001005 },
+   { _MMIO(0x26804), 0x },
+   { _MMIO(0x27800), 0x0102 },
+   { _MMIO(0x27808), 0x0c0701e0 },
+   { _MMIO(0x2780c), 0x000200a0 },
+   { _MMIO(0x27804), 0x },
+   { _MMIO(0x26484), 0x4400 },
+   { _MMIO(0x26704), 0x4400 },
+   { _MMIO(0x26500), 0x0006 },
+   { _MMIO(0x26510), 0x0001 },
+   { _MMIO(0x26504), 0x8800 },
+   { _MMIO(0x26580), 0x0006 },
+   { _MMIO(0x26590), 0x0020 },
+   { _MMIO(0x26584), 0x },
+   { _MMIO(0x26104), 0x5582 },
+   { _MMIO(0x26184), 0xaa86 },
+   { _MMIO(0x25420), 0x08320c83 },
+   { _MMIO(0x25424), 0x06820c83 },
+   { _MMIO(0x2541c), 0x },
+   { _MMIO(0x25428), 0x0c03 },
+};
+
+static const struct i915_oa_reg *
+get_compute_basic_mux_config(struct drm_i915_private *dev_priv,
+int *len)
+{
+   *len = ARRAY_SIZE(mux_config_compute_basic);
+   return mux_config_compute_basic;
+}
+
+static const struct i915_oa_reg b_counter_config_compute_extended[] = {
+   { _MMIO(0x2724), 0xf080 },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2714), 0xf080 },
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2770), 0x0007fe2a },
+   { _MMIO(0x2774), 0xff00 },
+   { _MMIO(0x2778), 0x0007fe6a },
+   { _MMIO(0x277c), 0xff00 },
+   { _MMIO(0x2780), 0x0007fe92 },
+   { _MMIO(0x2784), 0xff00 },
+   { _MMIO(0x2788), 0x0007fea2 },
+   { _MMIO(0x278c), 0xff00 },
+   { _MMIO(0x2790), 0x0007fe32 },
+   { _MMIO(0x2794), 0xff00 },
+   { _MMIO(0x2798), 0x0007fe9a },
+   { _MMIO(0x279c), 0xff00 },
+   { _MMIO(0x27a0), 0x0007ff23 },
+   { _MMIO(0x27a4), 0xff00 },
+   { _MMIO(0x27a8), 0x0007fff3 },
+   { _MMIO(0x27ac), 0xfffe },
+};
+
+static const struct i915_oa_reg mux_config_compute_extended[] = {
+   { _MMIO(0x2681c), 0x3eb00800 },
+   { _MMIO(0x26820), 0x0090 },
+   { _MMIO(0x25384), 0x02aa },
+   { _MMIO(0x25404), 0x03ff },
+   { _MMIO(0x26800), 0x00142284 },
+   { _MMIO(0x26808), 0x0e629062 },
+   { _MMIO(0x2680c), 0x3f6f55cb },
+   { _MMIO(0x26810), 0x0014 },
+   { _MMIO(0x26804), 0x },

[PATCH v5 09/11] drm/i915: add oa_event_min_timer_exponent sysctl

2016-09-14 Thread Robert Bragg
The minimal sampling period is now configurable via a
dev.i915.oa_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the minimum exponent
that won't (on its own) exceed the kernel.perf_event_max_sample_rate
default of 100000 samples/s.
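
As a worked example of the arithmetic, the sketch below converts an OA
timer exponent into a sampling period and rate using the formula quoted
in the patch comment, period = 80ns * 2^(exponent + 1). The helper names
are hypothetical and the 80ns base tick is taken from that comment, not
verified against hardware documentation.

```c
#include <stdint.h>

/* Base tick assumed from the patch comment: 80ns * 2^(exponent + 1). */
#define OA_BASE_PERIOD_NS 80ULL

/* Sampling period in nanoseconds for a given OA timer exponent. */
static uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
	return OA_BASE_PERIOD_NS << (exponent + 1);
}

/* Sampling rate in whole samples per second (rounded down). */
static uint64_t oa_exponent_to_rate_hz(unsigned int exponent)
{
	return 1000000000ULL / oa_exponent_to_period_ns(exponent);
}
```

An exponent of 6 gives a period of 10240ns, a little under 100 kHz,
while an exponent of 5 halves the period and goes over that rate, which
is why 6 is the smallest exponent that stays within the limit.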

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_perf.c | 42 
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 38b13fa..a7a248b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -74,6 +74,23 @@ static u32 i915_perf_stream_paranoid = true;
  */
 #define OA_EXPONENT_MAX 31

+/* for sysctl proc_dointvec_minmax of i915_oa_min_timer_exponent */
+static int zero;
+static int oa_exponent_max = OA_EXPONENT_MAX;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The period is derived from the exponent as:
+ *
+ *   period = 80ns * 2^(exponent + 1)
+ *
+ * Referring to perf's kernel.perf_event_max_sample_rate for a precedent
+ * (100000 by default); with an OA exponent of 6 we get a period of 10.240
+ * microseconds - just under 100000Hz
+ */
+static u32 i915_oa_min_timer_exponent = 6;
+
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
  * be used as a mask to align the OA tail pointer.
@@ -1315,21 +1332,13 @@ static int read_properties_unlocked(struct 
drm_i915_private *dev_priv,
return -EINVAL;
}

-   /* NB: The exponent represents a period as follows:
-*
-*   80ns * 2^(period_exponent + 1)
-*
-* Theoretically we can program the OA unit to sample
+   /* Theoretically we can program the OA unit to sample
 * every 160ns but don't allow that by default unless
 * root.
-*
-* Referring to perf's
-* kernel.perf_event_max_sample_rate for a precedent
-* (100000 by default); with an OA exponent of 6 we get
-* a period of 10.240 microseconds -just under 100000Hz
 */
-   if (value < 6 && !capable(CAP_SYS_ADMIN)) {
-   DRM_ERROR("Sampling period too high without 
root privileges\n");
+   if (value < i915_oa_min_timer_exponent &&
+   !capable(CAP_SYS_ADMIN)) {
+   DRM_ERROR("OA timer exponent too low without 
root privileges\n");
return -EACCES;
}

@@ -1433,6 +1442,15 @@ static struct ctl_table oa_table[] = {
 .mode = 0644,
 .proc_handler = proc_dointvec,
 },
+   {
+.procname = "oa_min_timer_exponent",
+.data = &i915_oa_min_timer_exponent,
+.maxlen = sizeof(i915_oa_min_timer_exponent),
+.mode = 0644,
+.proc_handler = proc_dointvec_minmax,
+.extra1 = &zero,
+.extra2 = &oa_exponent_max,
+},
{}
 };

-- 
2.9.2



[PATCH v5 08/11] drm/i915: Add dev.i915.perf_event_paranoid sysctl option

2016-09-14 Thread Robert Bragg
Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.
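
The resulting access rule can be sketched as a pure function. This is an
illustration of the condition in the diff below, not the kernel code
itself: the function name and the is_admin flag (standing in for
capable(CAP_SYS_ADMIN)) are hypothetical.

```c
#include <stdbool.h>

/*
 * Decide whether a stream may be opened.
 *
 * specific_ctx: the stream is limited to a single context
 * paranoid:     value of the paranoid sysctl (non-zero means restricted)
 * is_admin:     hypothetical stand-in for capable(CAP_SYS_ADMIN)
 */
static bool perf_open_allowed(bool specific_ctx, bool paranoid, bool is_admin)
{
	/* Only system-wide streams are subject to the privilege check. */
	if (!specific_ctx && paranoid && !is_admin)
		return false;

	return true;
}
```

Only the combination of a system-wide stream, the paranoid option set
and no admin capability is rejected; single-context streams are never
affected by the sysctl.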

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 45 +++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f5ddf70..eaba7a9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2142,6 +2142,7 @@ struct drm_i915_private {
bool initialized;

struct kobject *metrics_kobj;
+   struct ctl_table_header *sysctl_header;

struct mutex lock;
struct list_head streams;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e890c38..38b13fa 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -62,6 +62,8 @@
 #define POLL_FREQUENCY 200
 #define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY)

+static u32 i915_perf_stream_paranoid = true;
+
 /* The maximum exponent the hardware accepts is 63 (essentially it selects one
  * of the 64bit timestamp bits to trigger reports from) but there's currently
  * no known use case for sampling as infrequently as once per 47 thousand 
years.
@@ -1170,7 +1172,13 @@ int i915_perf_open_ioctl_locked(struct drm_device *dev,
}
}

-   if (!specific_ctx && !capable(CAP_SYS_ADMIN)) {
+   /* Similar to perf's kernel.perf_paranoid_cpu sysctl option
+* we check a dev.i915.perf_stream_paranoid sysctl option
+* to determine if it's ok to access system wide OA counters
+* without CAP_SYS_ADMIN privileges.
+*/
+   if (!specific_ctx &&
+   i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
DRM_ERROR("Insufficient privileges to open system-wide i915 
perf stream\n");
ret = -EACCES;
goto err_ctx;
@@ -1417,6 +1425,37 @@ void i915_perf_unregister(struct drm_i915_private 
*dev_priv)
dev_priv->perf.metrics_kobj = NULL;
 }

+static struct ctl_table oa_table[] = {
+   {
+.procname = "perf_stream_paranoid",
+.data = &i915_perf_stream_paranoid,
+.maxlen = sizeof(i915_perf_stream_paranoid),
+.mode = 0644,
+.proc_handler = proc_dointvec,
+},
+   {}
+};
+
+static struct ctl_table i915_root[] = {
+   {
+.procname = "i915",
+.maxlen = 0,
+.mode = 0555,
+.child = oa_table,
+},
+   {}
+};
+
+static struct ctl_table dev_root[] = {
+   {
+.procname = "dev",
+.maxlen = 0,
+.mode = 0555,
+.child = i915_root,
+},
+   {}
+};
+
 void i915_perf_init(struct drm_i915_private *dev_priv)
 {
if (!IS_HASWELL(dev_priv))
@@ -1449,6 +1488,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
dev_priv->perf.oa.n_builtin_sets =
i915_oa_n_builtin_metric_sets_hsw;

+   dev_priv->perf.sysctl_header = register_sysctl_table(dev_root);
+
dev_priv->perf.initialized = true;
 }

@@ -1457,6 +1498,8 @@ void i915_perf_fini(struct drm_i915_private *dev_priv)
if (!dev_priv->perf.initialized)
return;

+   unregister_sysctl_table(dev_priv->perf.sysctl_header);
+
memset(&dev_priv->perf.oa.ops, 0, sizeof(dev_priv->perf.oa.ops));
dev_priv->perf.initialized = false;
 }
-- 
2.9.2



[PATCH v5 07/11] drm/i915: advertise available metrics via sysfs

2016-09-14 Thread Robert Bragg
Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to look up corresponding counter metadata and
normalization equations.
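
A minimal userspace sketch of the lookup, assuming the sysfs layout
described above and that the 'id' file holds a single decimal integer.
Both helper names are hypothetical, and error handling is reduced to a
-1 return for brevity.

```c
#include <stdio.h>

/*
 * Build the sysfs path of the 'id' file for a metric set.
 * Returns 0 on success, -1 if the path does not fit in 'out'.
 */
static int metric_set_id_path(char *out, size_t out_len,
			      int card, const char *guid)
{
	int n = snprintf(out, out_len,
			 "/sys/class/drm/card%d/metrics/%s/id", card, guid);

	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/*
 * Read the unsigned integer stored in an 'id' file.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
static int read_metric_set_id(const char *path, unsigned long long *id)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = fscanf(f, "%llu", id) == 1;
	fclose(f);

	return ok ? 0 : -1;
}
```

A missing file simply means that metric set isn't available on the
current system, so a caller would normally treat -1 from
read_metric_set_id() as "not supported" rather than as a hard error.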

The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:

 https://github.com/rib/gputop
 > gputop-data/guids.xml
 > scripts/update-guids.py
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_drv.c|  5 
 drivers/gpu/drm/i915/i915_drv.h|  4 +++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 51 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4 +++
 drivers/gpu/drm/i915/i915_perf.c   | 52 ++
 5 files changed, 116 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 14f22fc..8965df2 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1125,6 +1125,9 @@ static void i915_driver_register(struct drm_i915_private 
*dev_priv)
if (drm_dev_register(dev, 0) == 0) {
i915_debugfs_register(dev_priv);
i915_setup_sysfs(dev_priv);
+
+   /* Depends on sysfs having been initialized */
+   i915_perf_register(dev_priv);
} else
DRM_ERROR("Failed to register driver for userspace access!\n");

@@ -1161,6 +1164,8 @@ static void i915_driver_unregister(struct 
drm_i915_private *dev_priv)
acpi_video_unregister();
intel_opregion_unregister(dev_priv);

+   i915_perf_unregister(dev_priv);
+
i915_teardown_sysfs(dev_priv);
i915_debugfs_unregister(dev_priv);
drm_dev_unregister(&dev_priv->drm);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 551f078..f5ddf70 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2141,6 +2141,8 @@ struct drm_i915_private {
struct {
bool initialized;

+   struct kobject *metrics_kobj;
+
struct mutex lock;
struct list_head streams;

@@ -3711,6 +3713,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine,
 /* i915_perf.c */
 extern void i915_perf_init(struct drm_i915_private *dev_priv);
 extern void i915_perf_fini(struct drm_i915_private *dev_priv);
+extern void i915_perf_register(struct drm_i915_private *dev_priv);
+extern void i915_perf_unregister(struct drm_i915_private *dev_priv);

 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c 
b/drivers/gpu/drm/i915/i915_oa_hsw.c
index eb5ceca..656334d 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -24,6 +24,8 @@
  *
  */

+#include 
+
 #include "i915_drv.h"

 enum metric_set_id {
@@ -141,3 +143,52 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private 
*dev_priv)
return -ENODEV;
}
 }
+
+static ssize_t
+show_render_basic_id(struct device *kdev, struct device_attribute *attr, char 
*buf)
+{
+   return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC);
+}
+
+static struct device_attribute dev_attr_render_basic_id = {
+   .attr = { .name = "id", .mode = S_IRUGO },
+   .show = show_render_basic_id,
+   .store = NULL,
+};
+
+static struct attribute *attrs_render_basic[] = {
+   &dev_attr_render_basic_id.attr,
+   NULL,
+};
+
+static struct attribute_group group_render_basic = {
+   .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212",
+   .attrs =  attrs_render_basic,
+};
+
+int
+i915_perf_register_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int mux_len;
+   int ret = 0;
+
+   if (get_render_basic_mux_config(dev_priv, &mux_len)) {
+   ret = sysfs_create_group(dev_priv->perf.metrics_kobj, 
&group_render_basic);
+   if (ret)
+   goto error_render_basic;
+   }
+
+   return 0;
+
+error_render_basic:
+   return ret;
+}
+
+void
+i915_perf_unregister_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int mux_len;
+
+   if (get_render_basic_mux_config(dev_priv, &mux_len))
+   sysfs_remove_group(dev_priv->perf.metrics_kobj, 
&group_render_basic);
+}
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h 
b/drivers/gpu/drm/i915/i915_oa_hsw.h
index b618a1f..429a229 100644
--- a

[PATCH v5 06/11] drm/i915: Enable i915 perf stream for Haswell OA unit

2016-09-14 Thread Robert Bragg
Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
Signed-off-by: Zhenyu Wang 
---
 drivers/gpu/drm/i915/i915_drv.h |  72 ++-
 drivers/gpu/drm/i915/i915_gem_context.c |  22 +-
 drivers/gpu/drm/i915/i915_perf.c| 998 +++-
 drivers/gpu/drm/i915/i915_reg.h | 338 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c |  10 +-
 include/uapi/drm/i915_drm.h |  70 ++-
 6 files changed, 1477 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5fad018..551f078 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1740,6 +1740,11 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_oa_format {
+   u32 format;
+   int size;
+};
+
 struct i915_oa_reg {
i915_reg_t addr;
u32 value;
@@ -1760,11 +1765,6 @@ struct i915_perf_stream_ops {
 */
void (*disable)(struct i915_perf_stream *stream);

-   /* Return: true if any i915 perf records are ready to read()
-* for this stream.
-*/
-   bool (*can_read)(struct i915_perf_stream *stream);
-
/* Call poll_wait, passing a wait queue that will be woken
 * once there is something ready to read() for the stream
 */
@@ -1774,9 +1774,7 @@ struct i915_perf_stream_ops {

/* For handling a blocking read, wait until there is something
 * to ready to read() for the stream. E.g. wait on the same
-* wait queue that would be passed to poll_wait() until
-* ->can_read() returns true (if its safe to call ->can_read()
-* without the i915 perf lock held).
+* wait queue that would be passed to poll_wait().
 */
int (*wait_unlocked)(struct i915_perf_stream *stream);

@@ -1816,11 +1814,28 @@ struct i915_perf_stream {
struct list_head link;

u32 sample_flags;
+   int sample_size;

struct i915_gem_context *ctx;
bool enabled;

-   struct i915_perf_stream_ops *ops;
+   const struct i915_perf_stream_ops *ops;
+};
+
+struct i915_oa_ops {
+   void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
+   int (*enable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*disable_metric_set)(struct drm_i915_private *dev_priv);
+   void (*oa_enable)(struct drm_i915_private *dev_priv);
+   void (*oa_disable)(struct drm_i915_private *dev_priv);
+   void (*update_oacontrol)(struct drm_i915_private *dev_priv);
+   void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv,
+   u32 ctx_id);
+   int (*read)(struct i915_perf_stream *stream,
+   char __user *buf,
+   size_t count,
+   size_t *offset);
+   bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
 };

 struct drm_i915_private {
@@ -2125,16 +2140,48 @@ struct drm_i915_private {

struct {
bool initialized;
+
struct mutex lock;
struct list_head streams;

+   spinlock_t hook_lock;
+
struct {
-   u32 metrics_set;
+   struct i915_perf_stream *exclusive_stream;
+
+   u32 specific_ctx_id;
+
+   struct hrtimer poll_check_timer;
+   wait_queue_head_t poll_wq;
+   atomic_t pollin;
+
+   bool periodic;
+   int period_exponent;
+   int timestamp_frequency;
+
+   int tail_margin;
+
+   int metrics_set;

const struct i915_oa_reg *mux_regs;
int mux_regs_len;
const struct i915_oa_reg *b_counter_regs;
int b_counter_regs_len;
+
+   struct {
+   struct drm_i915_gem_object *obj;
+   struct i915_vma *vma;
+   u32 gtt_offset;
+   u8 *addr;
+   int format;
+   int format_size;
+   } oa_buffer;
+
+   u32 gen7_latched_oastatus1;
+
+   struct i915_oa_ops ops;
+   const struct i915_oa_format *oa_formats;
+   int n_builtin_sets;
} oa;
} perf;

@@ -3499,6 +3546,9 @@ struct drm_i915_gem_object *
 i915_gem_alloc_context_obj(struct drm_device *dev, size_t size);
 struct i915_gem_context *
 i915_gem_context_create_gvt(struct drm_device *dev)

[PATCH v5 05/11] drm/i915: Add 'render basic' Haswell OA unit config

2016-09-14 Thread Robert Bragg
Adds a static OA unit, MUX + B Counter configuration for basic render
metrics on Haswell. This is auto generated from an XML
description of metric sets, currently maintained in gputop, ref:

  https://github.com/rib/gputop
  > gputop-data/oa-*.xml
  > scripts/i915-perf-kernelgen.py

  $ make -C gputop-data -f Makefile.xml SYSFS=0 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/Makefile  |   3 +-
 drivers/gpu/drm/i915/i915_drv.h|  14 
 drivers/gpu/drm/i915/i915_oa_hsw.c | 143 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  34 +
 4 files changed, 193 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index d991781..6cb25dd 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -111,7 +111,8 @@ i915-y += dvo_ch7017.o \
 i915-y += i915_vgpu.o

 # perf code
-i915-y += i915_perf.o
+i915-y += i915_perf.o \
+ i915_oa_hsw.o

 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0f5cd8f..5fad018 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1740,6 +1740,11 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_oa_reg {
+   i915_reg_t addr;
+   u32 value;
+};
+
 struct i915_perf_stream;

 struct i915_perf_stream_ops {
@@ -2122,6 +2127,15 @@ struct drm_i915_private {
bool initialized;
struct mutex lock;
struct list_head streams;
+
+   struct {
+   u32 metrics_set;
+
+   const struct i915_oa_reg *mux_regs;
+   int mux_regs_len;
+   const struct i915_oa_reg *b_counter_regs;
+   int b_counter_regs_len;
+   } oa;
} perf;

/* Abstract the submission mechanism (legacy ringbuffer or execlists) 
away */
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c 
b/drivers/gpu/drm/i915/i915_oa_hsw.c
new file mode 100644
index 000..eb5ceca
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -0,0 +1,143 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+
+enum metric_set_id {
+   METRIC_SET_ID_RENDER_BASIC = 1,
+};
+
+int i915_oa_n_builtin_metric_sets_hsw = 1;
+
+static const struct i915_oa_reg b_counter_config_render_basic[] = {
+   { _MMIO(0x2724), 0x0080 },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2710), 0x },
+};
+
+static const struct i915_oa_reg mux_config_render_basic[] = {
+   { _MMIO(0x253a4), 0x0160 },
+   { _MMIO(0x25440), 0x0010 },
+   { _MMIO(0x25128), 0x },
+   { _MMIO(0x2691c), 0x0800 },
+   { _MMIO(0x26aa0), 0x0150 },
+   { _MMIO(0x26b9c), 0x6000 },
+   { _MMIO(0x2791c), 0x0800 },
+   { _MMIO(0x27aa0), 0x0150 },
+   { _MMIO(0x27b9c), 0x6000 },
+   { _MMIO(0x2641c), 0x0400 },
+   { _MMIO(0x25380), 0x0010 },
+   { _MMIO(0x2538c), 0x },
+   { _MMIO(0x25384), 0x0800 },
+   { _MMIO(0x25400), 0x0004 },
+   { _MMIO(0x2540c), 0x06029000 },
+   { _MMIO(0x25410), 0x0002 },
+   { _MMIO(0x25404), 0x5c30 },
+   { _MMIO(0x25100), 0x0016 },
+   { _MMIO(0x25110), 0x0400 },
+   { _MMIO(0x25104), 0x },
+   { _MMIO(0x26804), 0x1211 },
+   { _MMIO(0x26884), 0x0100 },
+   { _MMIO(0x26900), 0x0002 },
+   { _MMIO(0x26908), 0x0070 },

[PATCH v5 04/11] drm/i915: don't whitelist oacontrol in cmd parser

2016-09-14 Thread Robert Bragg
Being able to program OACONTROL from a non-privileged batch buffer is
not sufficient to be able to configure the OA unit. This was originally
allowed to help enable Mesa to expose OA counters via the
INTEL_performance_query extension, but the current implementation based
on programming OACONTROL via a batch buffer isn't able to report usable
data without a more complete OA unit configuration. Mesa handles the
possibility that writes to OACONTROL may not be allowed and so only
advertises the extension after explicitly testing that a write to
OACONTROL succeeds. Based on this, removing OACONTROL from the whitelist
should be OK for userspace.

Removing this simplifies adding a new kernel API for configuring the OA
unit without needing to consider the possibility that userspace might
trample on OACONTROL state, which we'd like to start managing within
the kernel instead. In particular, running any Mesa-based GL application
currently clears OACONTROL during initialization, which would disable
the capturing of metrics.

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 38 ++
 1 file changed, 2 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 5ad02dc..bdee590 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -450,7 +450,6 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
REG64(PS_INVOCATION_COUNT),
REG64(PS_DEPTH_COUNT),
REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-   REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
REG64(MI_PREDICATE_SRC0),
REG64(MI_PREDICATE_SRC1),
REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1060,8 +1059,7 @@ bool intel_engine_needs_cmd_parser(struct intel_engine_cs *engine)
 static bool check_cmd(const struct intel_engine_cs *engine,
  const struct drm_i915_cmd_descriptor *desc,
  const u32 *cmd, u32 length,
- const bool is_master,
- bool *oacontrol_set)
+ const bool is_master)
 {
if (desc->flags & CMD_DESC_SKIP)
return true;
@@ -1099,31 +1097,6 @@ static bool check_cmd(const struct intel_engine_cs *engine,
}

/*
-* OACONTROL requires some special handling for
-* writes. We want to make sure that any batch which
-* enables OA also disables it before the end of the
-* batch. The goal is to prevent one process from
-* snooping on the perf data from another process. To do
-* that, we need to check the value that will be written
-* to the register. Hence, limit OACONTROL writes to
-* only MI_LOAD_REGISTER_IMM commands.
-*/
-   if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
-   if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_REG) {
-   DRM_DEBUG_DRIVER("CMD: Rejected LRR to OACONTROL\n");
-   return false;
-   }
-
-   if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1))
-   *oacontrol_set = (cmd[offset + 1] != 0);
-   }
-
-   /*
 * Check the value written to the register against the
 * allowed mask/value pair given in the whitelist entry.
 */
@@ -1214,7 +1187,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
u32 *cmd, *batch_end;
struct drm_i915_cmd_descriptor default_desc = noop_desc;
const struct drm_i915_cmd_descriptor *desc = &default_desc;
-   bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
bool needs_clflush_after = false;
int ret = 0;

@@ -1270,8 +1242,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
break;
}

-   if (!check_cmd(engine, desc, cmd, length, is_master,
-  &oacontrol_set)) {
+   if (!check_cmd(engine, desc, cmd, length, is_master)) {
ret = -EACCES;
break;
}
@@ -1279,11 +1250,6 @@ int intel_engine_cmd_parser(struct intel_engine_cs 
*engine

[PATCH v5 03/11] drm/i915: return EACCES for check_cmd() failures

2016-09-14 Thread Robert Bragg
check_cmd() is checking whether a command adheres to certain
restrictions that ensure it's safe to execute within a privileged batch
buffer. Returning false implies a privilege problem, not that the
command is invalid.

The distinction makes the difference between allowing the buffer to be
executed as an unprivileged batch buffer or returning an EINVAL error to
userspace without executing anything.
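
To illustrate the distinction, here is a minimal sketch of how a caller of
the command parser might act on its return value. The enum and the function
name are hypothetical and are not taken from the driver; only the meaning
given to -EACCES and to other errors follows the description above.

```c
#include <errno.h>

/* Hypothetical outcomes for a batch buffer after it has been parsed. */
enum batch_action {
	BATCH_RUN_PRIVILEGED,	/* parser accepted every command */
	BATCH_RUN_UNPRIVILEGED,	/* privilege problem: still runnable, unprivileged */
	BATCH_REJECT,		/* any other error: report it, execute nothing */
};

/* Map a parser return value (0 or a negative errno) to an action. */
enum batch_action action_for_parser_result(int ret)
{
	if (ret == 0)
		return BATCH_RUN_PRIVILEGED;
	if (ret == -EACCES)
		return BATCH_RUN_UNPRIVILEGED;
	return BATCH_REJECT;
}
```

With -EINVAL the batch would be rejected outright, which is the case this
patch avoids for check_cmd() failures.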

In a case where userspace may want to test whether it can successfully
write to a register that needs privileges the distinction may be
important and an EINVAL error may be considered fatal.

In particular this is currently true for Mesa, which includes a test for
whether OACONTROL can be written to, but Mesa treats any error when
flushing a batch buffer as fatal, calling exit(1).

As it is currently Mesa can gracefully handle a failure to write to
OACONTROL if the command parser is disabled, but if we were to remove
OACONTROL from the parser's whitelist then the returned EINVAL would
break Mesa applications as they attempt an OACONTROL write.

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 7269fe8..5ad02dc 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1272,7 +1272,7 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,

if (!check_cmd(engine, desc, cmd, length, is_master,
   &oacontrol_set)) {
-   ret = -EINVAL;
+   ret = -EACCES;
break;
}

-- 
2.9.2



[PATCH v5 02/11] drm/i915: rename OACONTROL GEN7_OACONTROL

2016-09-14 Thread Robert Bragg
OACONTROL changes quite a bit for gen8, with some bits split out into a
per-context OACTXCONTROL register. Rename now before adding more gen7 OA
registers.

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 3c72b3b..7269fe8 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -450,7 +450,7 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
REG64(PS_INVOCATION_COUNT),
REG64(PS_DEPTH_COUNT),
REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
-   REG32(OACONTROL), /* Only allowed for LRI and SRM. See below. */
+   REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
REG64(MI_PREDICATE_SRC0),
REG64(MI_PREDICATE_SRC1),
REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1108,7 +1108,7 @@ static bool check_cmd(const struct intel_engine_cs *engine,
 * to the register. Hence, limit OACONTROL writes to
 * only MI_LOAD_REGISTER_IMM commands.
 */
-   if (reg_addr == i915_mmio_reg_offset(OACONTROL)) {
+   if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
return false;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index a29d707..90756b2 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -616,7 +616,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define HSW_CS_GPR(n)   _MMIO(0x2600 + (n) * 8)
 #define HSW_CS_GPR_UDW(n)   _MMIO(0x2600 + (n) * 8 + 4)

-#define OACONTROL _MMIO(0x2360)
+#define GEN7_OACONTROL _MMIO(0x2360)

 #define _GEN7_PIPEA_DE_LOAD_SL 0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL 0x71068
-- 
2.9.2



[PATCH v5 01/11] drm/i915: Add i915 perf infrastructure

2016-09-14 Thread Robert Bragg
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for
data.

A stream is opened something like:

  uint64_t properties[] = {
      /* Single context sampling */
      DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

      /* Include OA reports in samples */
      DRM_I915_PERF_PROP_SAMPLE_OA,         true,

      /* OA unit configuration */
      DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
      DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
      DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
  };
  struct drm_i915_perf_open_param param = {
      .flags = I915_PERF_FLAG_FD_CLOEXEC |
               I915_PERF_FLAG_FD_NONBLOCK |
               I915_PERF_FLAG_DISABLED,
      .properties_ptr = (uint64_t)properties,
      /* Each property is a (key, value) pair of two uint64_t: 16 bytes */
      .num_properties = sizeof(properties) / 16,
  };
  int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
determine what's included in every sample.
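
As a rough sketch of how userspace might walk a buffer filled by read(),
the function below steps from one record to the next using the size in each
header. The struct layout, the assumption that size covers the header as
well as the payload, and the name count_records() are illustrative
assumptions, not definitions taken from this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the common { type, size } record header (assumed layout). */
struct record_header {
	uint32_t type;
	uint16_t size;	/* assumed: bytes in the whole record, header included */
	uint16_t pad;
};

/*
 * Count the records of one type in a buffer of len bytes.
 * Returns -1 if a header describes a record that cannot fit.
 */
int count_records(const uint8_t *buf, size_t len, uint32_t wanted_type)
{
	size_t offset = 0;
	int count = 0;

	while (offset + sizeof(struct record_header) <= len) {
		struct record_header hdr;

		/* Copy out the header to avoid unaligned access. */
		memcpy(&hdr, buf + offset, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			return -1;
		if (hdr.type == wanted_type)
			count++;
		offset += hdr.size;
	}
	return count;
}
```

A real consumer would decode the payload of each sample record according to
the DRM_I915_PERF_PROP_SAMPLE_xyz properties it requested, rather than only
counting records.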

No specific streams are supported yet so any attempt to open a stream
will return an error.

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/Makefile|   3 +
 drivers/gpu/drm/i915/i915_drv.c  |   4 +
 drivers/gpu/drm/i915/i915_drv.h  |  91 
 drivers/gpu/drm/i915/i915_perf.c | 448 +++
 include/uapi/drm/i915_drm.h  |  67 ++
 5 files changed, 613 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index a998c2b..d991781 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -110,6 +110,9 @@ i915-y += dvo_ch7017.o \
 # virtual gpu code
 i915-y += i915_vgpu.o

+# perf code
+i915-y += i915_perf.o
+
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
 include $(src)/gvt/Makefile
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 7f4e8ad..14f22fc 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -838,6 +838,8 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,

intel_device_info_dump(dev_priv);

+   i915_perf_init(dev_priv);
+
/* Not all pre-production machines fall into this category, only the
 * very first ones. Almost everything should work, except for maybe
 * suspend/resume. And we don't implement workarounds that affect only
@@ -859,6 +861,7 @@ err_workqueues:
  */
 static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv)
 {
+   i915_perf_fini(dev_priv);
i915_gem_load_cleanup(&dev_priv->drm);
i915_workqueues_cleanup(dev_priv);
 }
@@ -2560,6 +2563,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW),
 };

 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1e2dda8..0f5cd8f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1740,6 +1740,84 @@ struct intel_wm_config {
bool sprites_scaled;
 };

+struct i915_perf_stream;
+
+struct i915_perf_stream_ops {
+   /* Enables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_ENABLE or implicitly called when stream is
+* opened without I915_PERF_FLAG_DISABLED.
+*/
+   void (*enable)(struct i915_perf_stream *stream);
+
+   /* Disables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_DISABLE or implicitly called before
+* destroying the stream.
+*/
+   void (*disable)(struct i915_perf_stream *stream);
+
+   /* Return: true if any i915 perf records are ready to read()
+* for this stream.
+*/
+   bool (*can_read)(struct i915_perf_stream *stream);
+
+   /* Call poll_wait, passing a wait queue that will be woken
+* once there is something ready to read() for the stream
+*/
+   void (*poll_wait)(struct i915_perf_stream *stream,
+ struct file *file,
+ poll_table *wait);
+
+   /* For handling a blo

[PATCH v5 00/11] Enable i915 perf stream for Haswell OA unit

2016-09-14 Thread Robert Bragg
This just rebases my i915 perf series on a recent drm-intel-nightly.

Now that this series has been reviewed a number of times by Chris, and I
think I've responded to his feedback, I wonder whether it is ready to be
added to drm-intel-nightly soon?

I think most of the effort for this series at the moment is just keeping up
with rebasing on nightlies.

Regards,
- Robert

Robert Bragg (11):
  drm/i915: Add i915 perf infrastructure
  drm/i915: rename OACONTROL GEN7_OACONTROL
  drm/i915: return EACCES for check_cmd() failures
  drm/i915: don't whitelist oacontrol in cmd parser
  drm/i915: Add 'render basic' Haswell OA unit config
  drm/i915: Enable i915 perf stream for Haswell OA unit
  drm/i915: advertise available metrics via sysfs
  drm/i915: Add dev.i915.perf_event_paranoid sysctl option
  drm/i915: add oa_event_min_timer_exponent sysctl
  drm/i915: Add more Haswell OA metric sets
  drm/i915: Add a kerneldoc summary for i915_perf.c

 drivers/gpu/drm/i915/Makefile   |4 +
 drivers/gpu/drm/i915/i915_cmd_parser.c  |   40 +-
 drivers/gpu/drm/i915/i915_drv.c |9 +
 drivers/gpu/drm/i915/i915_drv.h |  162 +++
 drivers/gpu/drm/i915/i915_gem_context.c |   22 +-
 drivers/gpu/drm/i915/i915_oa_hsw.c  |  751 ++
 drivers/gpu/drm/i915/i915_oa_hsw.h  |   38 +
 drivers/gpu/drm/i915/i915_perf.c| 1686 +++
 drivers/gpu/drm/i915/i915_reg.h |  340 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   10 +-
 include/uapi/drm/i915_drm.h |  133 +++
 11 files changed, 3154 insertions(+), 41 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

-- 
2.9.2



[PATCH v5 07/11] drm/i915: advertise available metrics via sysfs

2016-08-19 Thread Robert Bragg
Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to lookup corresponding counter meta data and
normalization equations.
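
As an illustration of the userspace side, the following sketch reads the
unsigned integer from one such 'id' file. The function name
read_metric_set_id() is hypothetical, and the caller is assumed to have
already built the full path for the metric set it is interested in.

```c
#include <stdio.h>

/*
 * Read the unsigned integer stored in a metric set's sysfs 'id' file.
 * Returns 0 and stores the value in *id on success, -1 on failure.
 */
int read_metric_set_id(const char *id_path, unsigned long long *id)
{
	FILE *f = fopen(id_path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%llu", id) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

The value read this way is what would be passed with
DRM_I915_PERF_PROP_OA_METRICS_SET when opening a stream.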

The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:

 https://github.com/rib/gputop
 > gputop-data/guids.xml
 > scripts/update-guids.py
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_drv.c|  5 
 drivers/gpu/drm/i915/i915_drv.h|  4 +++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 45 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4 +++
 drivers/gpu/drm/i915/i915_perf.c   | 52 ++
 5 files changed, 110 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 92f668e..53df16f 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1150,6 +1150,9 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
if (drm_dev_register(dev, 0) == 0) {
i915_debugfs_register(dev_priv);
i915_setup_sysfs(dev);
+
+   /* Depends on sysfs having been initialized */
+   i915_perf_register(dev_priv);
} else
DRM_ERROR("Failed to register driver for userspace access!\n");

@@ -1186,6 +1189,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
acpi_video_unregister();
intel_opregion_unregister(dev_priv);

+   i915_perf_unregister(dev_priv);
+
i915_teardown_sysfs(&dev_priv->drm);
i915_debugfs_unregister(dev_priv);
drm_dev_unregister(&dev_priv->drm);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4c302cd..dd88eb1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2115,6 +2115,8 @@ struct drm_i915_private {
struct {
bool initialized;

+   struct kobject *metrics_kobj;
+
struct mutex lock;
struct list_head streams;

@@ -3694,6 +3696,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 /* i915_perf.c */
 extern void i915_perf_init(struct drm_i915_private *dev_priv);
 extern void i915_perf_fini(struct drm_i915_private *dev_priv);
+extern void i915_perf_register(struct drm_i915_private *dev_priv);
+extern void i915_perf_unregister(struct drm_i915_private *dev_priv);

 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 3e6006ec..3f9dd80 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -24,6 +24,8 @@
  *
  */

+#include 
+
 #include "i915_drv.h"

 enum metric_set_id {
@@ -130,3 +132,46 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv)
return -ENODEV;
}
 }
+
+static ssize_t
+show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC);
+}
+
+static struct device_attribute dev_attr_render_basic_id = {
+   .attr = { .name = "id", .mode = S_IRUGO },
+   .show = show_render_basic_id,
+   .store = NULL,
+};
+
+static struct attribute *attrs_render_basic[] = {
+   &dev_attr_render_basic_id.attr,
+   NULL,
+};
+
+static struct attribute_group group_render_basic = {
+   .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212",
+   .attrs =  attrs_render_basic,
+};
+
+int
+i915_perf_register_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int ret;
+
+   ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+   if (ret)
+   goto error_render_basic;
+
+   return 0;
+
+error_render_basic:
+   return ret;
+}
+
+void
+i915_perf_unregister_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+}
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h
index b618a1f..429a229 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.h
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.h
@@ -31,4 +31,8 @@ extern int i915_oa_n_builtin_metric_sets_hsw;

 extern int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv);

+extern int i915_perf_reg

[PATCH v4 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c

2016-08-18 Thread Robert Bragg
In particular this tries to capture for posterity some of the early
challenges we had with using the core perf infrastructure in case we
ever want to revisit adapting perf for device metrics.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_perf.c | 163 +++
 1 file changed, 163 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 3d0ba09..2798d70 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -24,6 +24,169 @@
  *   Robert Bragg 
  */

+
+/**
+ * DOC: i915 Perf, streaming API for GPU metrics
+ *
+ * Gen graphics supports a large number of performance counters that can help
+ * driver and application developers understand and optimize their use of the
+ * GPU.
+ *
+ * This i915 perf interface enables userspace to configure and open a file
+ * descriptor representing a stream of GPU metrics which can then be read() as
+ * a stream of sample records.
+ *
+ * The interface is particularly suited to exposing buffered metrics that are
+ * captured by DMA from the GPU, unsynchronized with and unrelated to the CPU.
+ *
+ * Streams representing a single context are accessible to applications with a
+ * corresponding drm file descriptor, such that OpenGL can use the interface
+ * without special privileges. Access to system-wide metrics requires root
+ * privileges by default, unless changed via the dev.i915.perf_event_paranoid
+ * sysctl option.
+ *
+ *
+ * The interface was initially inspired by the core Perf infrastructure but
+ * some notable differences are:
+ *
+ * i915 perf file descriptors represent a "stream" instead of an "event"; where
+ * a perf event primarily corresponds to a single 64bit value, while a stream
+ * might sample sets of tightly-coupled counters, depending on the
+ * configuration.  For example the Gen OA unit isn't designed to support
+ * orthogonal configurations of individual counters; it's configured for a set
+ * of related counters. Samples for an i915 perf stream capturing OA metrics
+ * will include a set of counter values packed in a compact HW specific format.
+ * The OA unit supports a number of different packing formats which can be
+ * selected by the user opening the stream. Perf has support for grouping
+ * events, but each event in the group is configured, validated and
+ * authenticated individually with separate system calls.
+ *
+ * i915 perf stream configurations are provided as an array of u64 (key,value)
+ * pairs, instead of a fixed struct with multiple miscellaneous config members,
+ * interleaved with event-type specific members.
+ *
+ * i915 perf doesn't support exposing metrics via an mmap'd circular buffer.
+ * The supported metrics are being written to memory by the GPU unsynchronized
+ * with the CPU, using HW specific packing formats for counter sets. Sometimes
+ * the constraints on HW configuration require reports to be filtered before it
+ * would be acceptable to expose them to unprivileged applications - to hide
+ * the metrics of other processes/contexts. For these use cases a read() based
+ * interface is a good fit, and provides an opportunity to filter data as it
+ * gets copied from the GPU mapped buffers to userspace buffers.
+ *
+ *
+ * Some notes regarding Linux Perf:
+ * 
+ *
+ * The first prototype of this driver was based on the core perf
+ * infrastructure, and while we did make that mostly work, with some changes to
+ * perf, we found we were breaking or working around too many assumptions baked
+ * into perf's currently cpu centric design.
+ *
+ * In the end we didn't see a clear benefit to making perf's implementation and
+ * interface more complex by changing design assumptions while we knew we still
+ * wouldn't be able to use any existing perf based userspace tools.
+ *
+ * Also considering the Gen specific nature of the Observability hardware and
+ * how userspace will sometimes need to combine i915 perf OA metrics with
+ * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're
+ * expecting the interface to be used by a platform specific userspace such as
+ * OpenGL or tools. This is to say; we aren't inherently missing out on having
+ * a standard vendor/architecture agnostic interface by not using perf.
+ *
+ *
+ * For posterity, in case we might re-visit trying to adapt core perf to be
+ * better suited to exposing i915 metrics these were the main pain points we
+ * hit:
+ *
+ * - The perf based OA PMU driver broke some significant design assumptions:
+ *
+ *   Existing perf pmus are used for profiling work on a cpu and we were
+ *   introducing the idea of _IS_DEVICE pmus with different security
+ *   implications, the need to fake cpu-related data (such as user/kernel
+ *   registers) to fit with perf's current design, and adding _DEVICE records
+ *   as a 

[PATCH v4 10/11] drm/i915: Add more Haswell OA metric sets

2016-08-18 Thread Robert Bragg
This adds 'compute', 'compute extended', 'memory reads', 'memory writes'
and 'sampler balance' metric sets for Haswell.

The code is auto generated from an XML description of metric sets,
currently maintained in gputop, ref:

 https://github.com/rib/gputop
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_oa_hsw.c | 484 -
 1 file changed, 483 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
index c32b5f8..81e5628 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -30,9 +30,14 @@

 enum metric_set_id {
METRIC_SET_ID_RENDER_BASIC = 1,
+   METRIC_SET_ID_COMPUTE_BASIC,
+   METRIC_SET_ID_COMPUTE_EXTENDED,
+   METRIC_SET_ID_MEMORY_READS,
+   METRIC_SET_ID_MEMORY_WRITES,
+   METRIC_SET_ID_SAMPLER_BALANCE,
 };

-int i915_oa_n_builtin_metric_sets_hsw = 1;
+int i915_oa_n_builtin_metric_sets_hsw = 6;

 static const struct i915_oa_reg b_counter_config_render_basic[] = {
{ _MMIO(0x2724), 0x0080 },
@@ -118,6 +123,333 @@ static int select_render_basic_config(struct drm_i915_private *dev_priv)
return 0;
 }

+static const struct i915_oa_reg b_counter_config_compute_basic[] = {
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2718), 0x },
+   { _MMIO(0x271c), 0x },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2724), 0x0080 },
+   { _MMIO(0x2728), 0x },
+   { _MMIO(0x272c), 0x },
+   { _MMIO(0x2740), 0x },
+   { _MMIO(0x2744), 0x },
+   { _MMIO(0x2748), 0x },
+   { _MMIO(0x274c), 0x },
+   { _MMIO(0x2750), 0x },
+   { _MMIO(0x2754), 0x },
+   { _MMIO(0x2758), 0x },
+   { _MMIO(0x275c), 0x },
+   { _MMIO(0x236c), 0x },
+};
+
+static const struct i915_oa_reg mux_config_compute_basic[] = {
+   { _MMIO(0x253a4), 0x },
+   { _MMIO(0x2681c), 0x01f00800 },
+   { _MMIO(0x26820), 0x1000 },
+   { _MMIO(0x2781c), 0x01f00800 },
+   { _MMIO(0x26520), 0x0007 },
+   { _MMIO(0x265a0), 0x0007 },
+   { _MMIO(0x25380), 0x0010 },
+   { _MMIO(0x2538c), 0x0030 },
+   { _MMIO(0x25384), 0xaa8a },
+   { _MMIO(0x25404), 0x },
+   { _MMIO(0x26800), 0x4202 },
+   { _MMIO(0x26808), 0x00605817 },
+   { _MMIO(0x2680c), 0x10001005 },
+   { _MMIO(0x26804), 0x },
+   { _MMIO(0x27800), 0x0102 },
+   { _MMIO(0x27808), 0x0c0701e0 },
+   { _MMIO(0x2780c), 0x000200a0 },
+   { _MMIO(0x27804), 0x },
+   { _MMIO(0x26484), 0x4400 },
+   { _MMIO(0x26704), 0x4400 },
+   { _MMIO(0x26500), 0x0006 },
+   { _MMIO(0x26510), 0x0001 },
+   { _MMIO(0x26504), 0x8800 },
+   { _MMIO(0x26580), 0x0006 },
+   { _MMIO(0x26590), 0x0020 },
+   { _MMIO(0x26584), 0x },
+   { _MMIO(0x26104), 0x5582 },
+   { _MMIO(0x26184), 0xaa86 },
+   { _MMIO(0x25420), 0x08320c83 },
+   { _MMIO(0x25424), 0x06820c83 },
+   { _MMIO(0x2541c), 0x },
+   { _MMIO(0x25428), 0x0c03 },
+};
+
+static int select_compute_basic_config(struct drm_i915_private *dev_priv)
+{
+   dev_priv->perf.oa.mux_regs =
+   mux_config_compute_basic;
+   dev_priv->perf.oa.mux_regs_len =
+   ARRAY_SIZE(mux_config_compute_basic);
+
+   dev_priv->perf.oa.b_counter_regs =
+   b_counter_config_compute_basic;
+   dev_priv->perf.oa.b_counter_regs_len =
+   ARRAY_SIZE(b_counter_config_compute_basic);
+
+   return 0;
+}
+
+static const struct i915_oa_reg b_counter_config_compute_extended[] = {
+   { _MMIO(0x2724), 0xf080 },
+   { _MMIO(0x2720), 0x },
+   { _MMIO(0x2714), 0xf080 },
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2770), 0x0007fe2a },
+   { _MMIO(0x2774), 0xff00 },
+   { _MMIO(0x2778), 0x0007fe6a },
+   { _MMIO(0x277c), 0xff00 },
+   { _MMIO(0x2780), 0x0007fe92 },
+   { _MMIO(0x2784), 0xff00 },
+   { _MMIO(0x2788), 0x0007fea2 },
+   { _MMIO(0x278c), 0xff00 },
+   { _MMIO(0x2790), 0x0007fe32 },
+   { _MMIO(0x2794), 0xff00 },
+   { _MMIO(0x2798), 0x0007fe9a },
+   { _MMIO(0x279c), 0xff00 },
+   { _MMIO(0x27a0), 0x0007ff23 },
+   { _MMIO(0x27a4), 0xff00 },
+   { _MMIO(0x27a8), 0x0007fff3 },
+   { _MMIO(0x27ac), 0xfffe },
+};
+
+static const struct i915_oa_reg mux_config_compute_extended[] = {
+   { _MMIO(0x2681c), 0x3eb00800 },
+   { _MMIO(0x26820), 0x0090 },
+   { _MMIO(0x25384), 0x02aa },
+   

[PATCH v4 09/11] drm/i915: add oa_event_min_timer_exponent sysctl

2016-08-18 Thread Robert Bragg
The minimal sampling period is now configurable via a
dev.i915.oa_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the minimum that
won't (on its own) exceed the kernel.perf_event_max_sample_rate
default of 100000 samples/s.
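
The arithmetic behind that default can be sketched as follows, using the
period formula given in the patch (80ns * 2^(exponent + 1)). Both function
names are hypothetical helpers written for this illustration and do not
appear in the driver.

```c
#include <stdint.h>

/* Sampling period in nanoseconds: 80ns * 2^(exponent + 1). */
uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
	return 80ull << (exponent + 1);
}

/*
 * Smallest exponent (capped at 31) whose sampling frequency does not
 * exceed max_hz. A frequency of 1e9 / period is at most max_hz exactly
 * when period * max_hz >= 1e9, so no division is needed.
 */
unsigned int min_exponent_for_rate(uint64_t max_hz)
{
	unsigned int exponent = 0;

	while (exponent < 31 &&
	       oa_exponent_to_period_ns(exponent) * max_hz < 1000000000ull)
		exponent++;
	return exponent;
}
```

An exponent of 0 gives the 160ns hardware minimum, while an exponent of 6
gives 10240ns (roughly 97.7kHz), the first period that stays under a
100000 samples/s limit.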

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_perf.c | 42 
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ac1f600..3d0ba09 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -74,6 +74,23 @@ static u32 i915_perf_stream_paranoid = true;
  */
 #define OA_EXPONENT_MAX 31

+/* for sysctl proc_dointvec_minmax of i915_oa_min_timer_exponent */
+static int zero;
+static int oa_exponent_max = OA_EXPONENT_MAX;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The period is derived from the exponent as:
+ *
+ *   period = 80ns * 2^(exponent + 1)
+ *
+ * Referring to perf's kernel.perf_event_max_sample_rate for a precedent
+ * (100000 by default); with an OA exponent of 6 we get a period of 10.240
+ * microseconds - just under 100000Hz
+ */
+static u32 i915_oa_min_timer_exponent = 6;
+
 /* XXX: beware if future OA HW adds new report formats that the current
  * code assumes all reports have a power-of-two size and ~(size - 1) can
  * be used as a mask to align the OA tail pointer.
@@ -1303,21 +1320,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
return -EINVAL;
}

-   /* NB: The exponent represents a period as follows:
-*
-*   80ns * 2^(period_exponent + 1)
-*
-* Theoretically we can program the OA unit to sample
+   /* Theoretically we can program the OA unit to sample
 * every 160ns but don't allow that by default unless
 * root.
-*
-* Referring to perf's
-* kernel.perf_event_max_sample_rate for a precedent
-* (100000 by default); with an OA exponent of 6 we get
-* a period of 10.240 microseconds -just under 100000Hz
 */
-   if (value < 6 && !capable(CAP_SYS_ADMIN)) {
-   DRM_ERROR("Sampling period too high without root privileges\n");
+   if (value < i915_oa_min_timer_exponent &&
+   !capable(CAP_SYS_ADMIN)) {
+   DRM_ERROR("OA timer exponent too low without root privileges\n");
return -EACCES;
}

@@ -1415,6 +1424,15 @@ static struct ctl_table oa_table[] = {
 .mode = 0644,
 .proc_handler = proc_dointvec,
 },
+   {
+.procname = "oa_min_timer_exponent",
+.data = &i915_oa_min_timer_exponent,
+.maxlen = sizeof(i915_oa_min_timer_exponent),
+.mode = 0644,
+.proc_handler = proc_dointvec_minmax,
+.extra1 = &zero,
+.extra2 = &oa_exponent_max,
+},
{}
 };

-- 
2.9.2



[PATCH v4 08/11] drm/i915: Add dev.i915.perf_event_paranoid sysctl option

2016-08-18 Thread Robert Bragg
Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.
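
The resulting access rule can be summarised by the small predicate below.
It is only a sketch: the function name and the boolean parameters are
hypothetical stand-ins for the driver's state (whether a specific context
was requested, the sysctl value, and the result of a CAP_SYS_ADMIN check).

```c
#include <stdbool.h>

/*
 * A stream may be opened if it targets a specific context, or if the
 * paranoid sysctl is cleared, or if the caller has admin privileges.
 */
bool may_open_stream(bool specific_ctx, bool paranoid, bool is_admin)
{
	return specific_ctx || !paranoid || is_admin;
}
```

With the default (paranoid set), a system-wide stream is refused for an
unprivileged caller, which corresponds to the -EACCES path in the diff.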

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 45 +++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dd88eb1..558cc0b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2116,6 +2116,7 @@ struct drm_i915_private {
bool initialized;

struct kobject *metrics_kobj;
+   struct ctl_table_header *sysctl_header;

struct mutex lock;
struct list_head streams;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 7e1fc6b..ac1f600 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -62,6 +62,8 @@
 #define POLL_FREQUENCY 200
 #define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY)

+static u32 i915_perf_stream_paranoid = true;
+
 /* The maximum exponent the hardware accepts is 63 (essentially it selects one
  * of the 64bit timestamp bits to trigger reports from) but there's currently
  * no known use case for sampling as infrequently as once per 47 thousand years.
@@ -1158,7 +1160,13 @@ int i915_perf_open_ioctl_locked(struct drm_device *dev,
}
}

-   if (!specific_ctx && !capable(CAP_SYS_ADMIN)) {
+   /* Similar to perf's kernel.perf_event_paranoid sysctl option
+* we check a dev.i915.perf_stream_paranoid sysctl option
+* to determine if it's ok to access system wide OA counters
+* without CAP_SYS_ADMIN privileges.
+*/
+   if (!specific_ctx &&
+   i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n");
ret = -EACCES;
goto err_ctx;
@@ -1399,6 +1407,37 @@ void i915_perf_unregister(struct drm_i915_private *dev_priv)
dev_priv->perf.metrics_kobj = NULL;
 }

+static struct ctl_table oa_table[] = {
+   {
+.procname = "perf_stream_paranoid",
+.data = &i915_perf_stream_paranoid,
+.maxlen = sizeof(i915_perf_stream_paranoid),
+.mode = 0644,
+.proc_handler = proc_dointvec,
+},
+   {}
+};
+
+static struct ctl_table i915_root[] = {
+   {
+.procname = "i915",
+.maxlen = 0,
+.mode = 0555,
+.child = oa_table,
+},
+   {}
+};
+
+static struct ctl_table dev_root[] = {
+   {
+.procname = "dev",
+.maxlen = 0,
+.mode = 0555,
+.child = i915_root,
+},
+   {}
+};
+
 void i915_perf_init(struct drm_i915_private *dev_priv)
 {
if (!IS_HASWELL(dev_priv))
@@ -1431,6 +1470,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
dev_priv->perf.oa.n_builtin_sets =
i915_oa_n_builtin_metric_sets_hsw;

+   dev_priv->perf.sysctl_header = register_sysctl_table(dev_root);
+
dev_priv->perf.initialized = true;
 }

@@ -1439,6 +1480,8 @@ void i915_perf_fini(struct drm_i915_private *dev_priv)
if (!dev_priv->perf.initialized)
return;

+   unregister_sysctl_table(dev_priv->perf.sysctl_header);
+
memset(&dev_priv->perf.oa.ops, 0, sizeof(dev_priv->perf.oa.ops));
dev_priv->perf.initialized = false;
 }
-- 
2.9.2
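
The privilege check added to i915_perf_open_ioctl_locked() above reduces to a three-input decision. A minimal standalone sketch of that logic follows; the function and parameter names are hypothetical, and the has_cap_sys_admin flag stands in for the kernel's capable(CAP_SYS_ADMIN) call:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Returns 0 if the stream may be opened, -EACCES otherwise.
 * Only a system-wide open (no specific context) by an unprivileged
 * caller while the paranoid option is set is refused.
 */
int check_perf_open_privileges(bool specific_ctx, bool stream_paranoid,
			       bool has_cap_sys_admin)
{
	if (!specific_ctx && stream_paranoid && !has_cap_sys_admin)
		return -EACCES;

	return 0;
}
```

Given the dev -> i915 -> perf_stream_paranoid table nesting in the patch, the option would be expected to appear as /proc/sys/dev/i915/perf_stream_paranoid; writing 0 there would let unprivileged users open system-wide streams.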



[PATCH v4 07/11] drm/i915: advertise available metrics via sysfs

2016-08-18 Thread Robert Bragg
Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to look up corresponding counter metadata and
normalization equations.
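
A userspace consumer of this layout might build the path for a known guid and parse the integer from the 'id' file. The sketch below is illustrative only; the helper names are hypothetical and error handling is kept minimal:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Build "/sys/class/drm/card<N>/metrics/<guid>/id"; returns 0 on success, -1 if buf is too small. */
int metric_id_path(char *buf, size_t len, int card, const char *guid)
{
	int n = snprintf(buf, len, "/sys/class/drm/card%d/metrics/%s/id",
			 card, guid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Parse the decimal id as written by the sysfs show callback ("%d\n"). */
int parse_metric_set_id(const char *text, unsigned long *id)
{
	char *end;
	unsigned long value;

	errno = 0;
	value = strtoul(text, &end, 10);
	if (errno || end == text)
		return -1;
	if (*end != '\n' && *end != '\0')
		return -1;

	*id = value;
	return 0;
}

/* Read and parse the id from the given file; returns 0 on success, -1 on error. */
int read_metric_set_id(const char *path, unsigned long *id)
{
	char text[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(text, sizeof(text), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return parse_metric_set_id(text, id);
}
```

The value obtained this way is what would then be passed when opening the metric set via DRM_IOCTL_I915_PERF_OPEN.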

The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:

 https://github.com/rib/gputop
 > gputop-data/guids.xml
 > scripts/update-guids.py
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_drv.c|  5 +
 drivers/gpu/drm/i915/i915_drv.h|  4 
 drivers/gpu/drm/i915/i915_oa_hsw.c | 45 +
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4 
 drivers/gpu/drm/i915/i915_perf.c   | 46 ++
 5 files changed, 104 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 92f668e..0f5f51b 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1172,6 +1172,9 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
 * cannot run before the connectors are registered.
 */
intel_fbdev_initial_config_async(dev);
+
+   /* Depends on sysfs having been initialized */
+   i915_perf_register(dev_priv);
 }

 /**
@@ -1180,6 +1183,8 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
  */
 static void i915_driver_unregister(struct drm_i915_private *dev_priv)
 {
+   i915_perf_unregister(dev_priv);
+
i915_audio_component_cleanup(dev_priv);

intel_gpu_ips_teardown();
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4c302cd..dd88eb1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2115,6 +2115,8 @@ struct drm_i915_private {
struct {
bool initialized;

+   struct kobject *metrics_kobj;
+
struct mutex lock;
struct list_head streams;

@@ -3694,6 +3696,8 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 /* i915_perf.c */
 extern void i915_perf_init(struct drm_i915_private *dev_priv);
 extern void i915_perf_fini(struct drm_i915_private *dev_priv);
+extern void i915_perf_register(struct drm_i915_private *dev_priv);
+extern void i915_perf_unregister(struct drm_i915_private *dev_priv);

 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 3e6006ec..c32b5f8 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -24,6 +24,8 @@
  *
  */

+#include 
+
 #include "i915_drv.h"

 enum metric_set_id {
@@ -130,3 +132,46 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv)
return -ENODEV;
}
 }
+
+static ssize_t
+show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC);
+}
+
+static struct device_attribute dev_attr_render_basic_id = {
+   .attr = { .name = "id", .mode = S_IRUGO },
+   .show = show_render_basic_id,
+   .store = NULL,
+};
+
+static struct attribute *attrs_render_basic[] = {
+   &dev_attr_render_basic_id.attr,
+   NULL,
+};
+
+static struct attribute_group group_render_basic = {
+   .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212",
+   .attrs =  attrs_render_basic,
+};
+
+int
+i915_perf_init_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   int ret;
+
+   ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+   if (ret)
+   goto error_render_basic;
+
+   return 0;
+
+error_render_basic:
+   return ret;
+}
+
+void
+i915_perf_deinit_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+   sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+}
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h
index b618a1f..e4ba89d 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.h
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.h
@@ -31,4 +31,8 @@ extern int i915_oa_n_builtin_metric_sets_hsw;

 extern int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv);

+extern int i915_perf_init_sysfs_hsw(struct drm_i915_private *dev_priv);
+
+extern void i915_perf_deinit_sysfs_hsw(struct drm_i915_private *dev_priv);
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/g
