Re: [PATCH] drm/amdgpu: fix amdgpu_ras_block_late_init error handler

2022-02-22 Thread Kenny Ho
On Thu, Feb 17, 2022 at 2:06 PM Alex Deucher  wrote:
>
> On Thu, Feb 17, 2022 at 2:04 PM Nick Desaulniers
>  wrote:
> >
> >
> > Alex,
> > Has AMD been able to set up clang builds, yet?
>
> No.  I think some individual teams do, but it's never been integrated
> into our larger CI systems as of yet as far as I know.

I have just added a clang build to our CI last night, so hopefully we
should be catching these now.

Kenny

>
> Alex
>
>
> >
> > --
> > Thanks,
> > ~Nick Desaulniers


Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-05-07 Thread Kenny Ho
On Fri, May 7, 2021 at 12:54 PM Daniel Vetter  wrote:
>
> SRIOV is kinda by design vendor specific. You set up the VF endpoint, it
> shows up, it's all hw+fw magic. Nothing for cgroups to manage here at all.
Right, so in theory you just use the device cgroup with the VF endpoints.

> All I meant is that for the container/cgroups world starting out with
> time-sharing feels like the best fit, not least because your SRIOV designers
> also seem to think that's the best first cut for cloud-y computing.
> Whether it's virtualized or containerized is a distinction that's getting
> ever more blurry, with virtualization becoming a lot more dynamic and
> container runtimes also possibly using hw virtualization underneath.
I disagree.  By the same logic, the existence of the CU mask would imply
that it is the preferred way to do per-process sub-device control.

Kenny


Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-05-07 Thread Kenny Ho
On Fri, May 7, 2021 at 4:59 AM Daniel Vetter  wrote:
>
> Hm I missed that. I feel like time-sliced-of-a-whole gpu is the easier gpu
> cgroups controller to get started with, since it's much closer to other cgroups
> that control bandwidth of some kind. Whether it's i/o bandwidth or compute
> bandwidth is kinda a wash.
sriov/time-sliced-of-a-whole gpu does not really need a cgroup
interface since each slice appears as a stand-alone device.  This is
already in production (not using cgroup) with users.  The cgroup
proposal has always been parallel to that in many senses: 1) spatial
partitioning as an independent but equally valid use case as time
sharing, 2) sub-device resource control as opposed to full device
control, motivated by the workload characterization paper.  It was
never about time vs space in terms of use cases, but about having a new
API for users to be able to do spatial subdevice partitioning.

> CU mask feels a lot more like an isolation/guaranteed forward progress
> kind of thing, and I suspect that's always going to be a lot more gpu hw
> specific than anything we can reasonably put into a general cgroups
> controller.
The first half is correct but I disagree with the conclusion.  The
analogy I would use is multi-core CPU.  The capability of individual
CPU cores, core count and core arrangement may be hw specific but
there are general interfaces to support selection of these cores.  CU
mask may be hw specific but spatial partitioning as an idea is not.
Most gpu vendors have the concept of sub-device compute units (EU, SE,
etc.); OpenCL has the concept of subdevice in the language.  I don't
see any obstacle for vendors to implement spatial partitioning just
like many CPU vendors support the idea of multi-core.
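
To make the "hw-specific mechanism, general idea" point concrete, below
is a rough user-space sketch of how a process selects a subset of CUs on
AMD today, via the AMDKFD_IOC_SET_CU_MASK ioctl from linux/kfd_ioctl.h.
The queue id and the particular mask are made-up values for illustration
only:

/* Illustrative only: restrict an existing KFD queue to CUs 0-15.
 * Assumes the queue was already created via AMDKFD_IOC_CREATE_QUEUE;
 * queue_id below is a placeholder.
 */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kfd_ioctl.h>

int set_cu_mask_example(int kfd_fd, uint32_t queue_id)
{
        uint32_t mask[4];                       /* room for up to 128 CUs */
        struct kfd_ioctl_set_cu_mask_args args;

        memset(mask, 0, sizeof(mask));
        mask[0] = 0x0000ffff;                   /* enable CUs 0-15 only */

        memset(&args, 0, sizeof(args));
        args.queue_id = queue_id;
        args.num_cu_mask = 128;                 /* number of mask bits supplied */
        args.cu_mask_ptr = (uint64_t)(uintptr_t)mask;

        return ioctl(kfd_fd, AMDKFD_IOC_SET_CU_MASK, &args);
}

/* Usage: fd = open("/dev/kfd", O_RDWR); set_cu_mask_example(fd, qid); */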

> Also for the time slice cgroups thing, can you pls give me pointers to
> these old patches that had it, and how it's done? I very obviously missed
> that part.
I think you misunderstood what I wrote earlier.  The original proposal
was about spatial partitioning of subdevice resources, not time sharing
using cgroup (since time sharing is already supported elsewhere.)

Kenny


Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-05-06 Thread Kenny Ho
Sorry for the late reply (I have been working on other stuff.)

On Fri, Feb 5, 2021 at 8:49 AM Daniel Vetter  wrote:
>
> So I agree that on one side CU mask can be used for low-level quality
> of service guarantees (like the CLOS cache stuff on intel cpus as an
> example), and that's going to be rather hw specific no matter what.
>
> But my understanding of AMD's plans here is that CU mask is the only
> thing you'll have to partition gpu usage in a multi-tenant environment
> - whether that's cloud or also whether that's containing apps to make
> sure the compositor can still draw the desktop (except for fullscreen
> ofc) doesn't really matter I think.
This is not correct.  Even the original cgroup proposal supports both
mask and count as ways to define unit(s) of a sub-device.  For AMD, we
already have SRIOV, which supports GPU partitioning in a
time-sliced-of-a-whole-GPU fashion.

Kenny


Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-02-03 Thread Kenny Ho
Daniel,

I will have to get back to you later on the details of this because my
head is currently context-switched to some infrastructure and
Kubernetes/golang work, so I am having a hard time digesting what you
are saying.  I am new to the bpf stuff, so this is about my own
learning as well as a conversation starter.  The high-level goal here
is to have a path for flexibility via a bpf program: not just GPU or
DRM or CU mask, but devices making decisions via an operator-written
bpf-prog attached to a cgroup.  More inline.
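
For reference, this is roughly what an operator-written bpf-prog
attached to a cgroup looks like today with the existing cgroup device
program type (BPF_PROG_TYPE_CGROUP_DEVICE, attached at
BPF_CGROUP_DEVICE); the RFC is essentially asking for an analogous hook
at the DRM ioctl boundary.  The minor number below is just an example
value:

/* SPDX-License-Identifier: GPL-2.0 */
/* Sketch of an existing cgroup device filter: from this cgroup, allow
 * access to DRM devices (major 226) only through minor 128, e.g.
 * /dev/dri/renderD128.  Built with clang -target bpf and attached to a
 * cgroup with attach type BPF_CGROUP_DEVICE.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("cgroup/dev")
int drm_device_filter(struct bpf_cgroup_dev_ctx *ctx)
{
        /* Devices other than DRM are not this program's concern: allow. */
        if (ctx->major != 226)
                return 1;

        /* Allow only the single render node granted to this cgroup. */
        return ctx->minor == 128 ? 1 : 0;
}

char _license[] SEC("license") = "GPL";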

On Wed, Feb 3, 2021 at 6:09 AM Daniel Vetter  wrote:
>
> On Mon, Feb 01, 2021 at 11:51:07AM -0500, Kenny Ho wrote:
> > On Mon, Feb 1, 2021 at 9:49 AM Daniel Vetter  wrote:
> > > - there's been a pile of cgroups proposal to manage gpus at the drm
> > >   subsystem level, some by Kenny, and frankly this at least looks a bit
> > >   like a quick hack to sidestep the consensus process for that.
> > No Daniel, this is a quick *draft* to get a conversation going.  Bpf was
> > actually a path suggested by Tejun back in 2018 so I think you are
> > mischaracterizing this quite a bit.
> >
> > "2018-11-20 Kenny Ho:
> > To put the questions in more concrete terms, let's say a user wants to
> >  expose certain part of a gpu to a particular cgroup similar to the
> >  way selective cpu cores are exposed to a cgroup via cpuset, how
> >  should we go about enabling such functionality?
> >
> > 2018-11-20 Tejun Heo:
> > Do what the intel driver or bpf is doing?  It's not difficult to hook
> > into cgroup for identification purposes."
>
> Yeah, but if you go full amd specific for this, you might as well have a
> specific BPF hook which is called in amdgpu/kfd and returns you the CU
> mask for a given cgroups (and figures that out however it pleases).
>
> Not a generic framework which lets you build pretty much any possible
> cgroups controller for anything else using BPF. Trying to filter anything
> at the generic ioctl just doesn't feel like a great idea that's long term
> maintainable. E.g. what happens if there's new uapi for command
> submission/context creation and now your bpf filter isn't catching all
> access anymore? If it's an explicit hook that explicitly computes the CU
> mask, then we can add more checks as needed. With ioctl that's impossible.
>
> Plus I'm also not sure whether that's really a good idea still, since if
> cloud companies have to build their own bespoke container stuff for every
> gpu vendor, that's quite a bad platform we're building. And "I'd like to
> make sure my gpu is used fairly among multiple tenants" really isn't a
> use-case that's specific to amd.

I don't understand what you are saying about containers here, since
bpf-progs are not the same as containers, nor are they deployed from
inside a container (as far as I know; I am actually not sure how
bpf-cgroup works with higher-level cloud orchestration, since folks
like Docker only migrated to cgroup v2 very recently... I don't think
you can specify a bpf-prog to load as part of a k8s pod definition.)
That said, the bit I do understand ("not sure whether that's really a
good idea... cloud companies have to build their own bespoke container
stuff for every gpu vendor...") is in fact the current status quo.  If
you look into some of the popular ML/AI-oriented containers/apps, you
will likely see things are mostly hardcoded to CUDA.  Since I work for
AMD, I wouldn't say that's a good thing, but this is just the reality.
For Kubernetes at least (where my head is currently), the official
mechanisms are Device Plugins (I am the author of the one for AMD, but
there are a few from Intel too; you can confirm with your colleagues)
and Node Feature/Labels.  Kubernetes schedules the pods/containers
launched by users onto nodes/servers by matching the resources/labels
requested in the pod specifications against the nodes'
resources/labels.

> If this would be something very hw specific like cache assignment and
> quality of service stuff or things like that, then vendor specific imo
> makes sense. But for CU masks essentially we're cutting the compute
> resources up in some way, and I kinda expect everyone with a gpu who cares
> about isolating workloads with cgroups wants to do that.

Right, but isolating workloads is quality of service stuff, and *how*
compute resources are cut up is vendor specific.

Anyway, as I said at the beginning of this reply, this is about
flexibility in support of the diversity of devices and architectures.
CU mask is simply a concrete example of hw diversity that a
bpf-program can encapsulate.  I can see this framework (a custom
program making decisions in a specific cgroup and device context) being
used for other things as well.  It may even be useful within a vendor
to handle the diversity between SKUs.

Kenny


Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-02-01 Thread Kenny Ho
[Resent in plain text.]

On Mon, Feb 1, 2021 at 9:49 AM Daniel Vetter  wrote:
> - there's been a pile of cgroups proposal to manage gpus at the drm
>   subsystem level, some by Kenny, and frankly this at least looks a bit
>   like a quick hack to sidestep the consensus process for that.
No Daniel, this is a quick *draft* to get a conversation going.  Bpf was
actually a path suggested by Tejun back in 2018 so I think you are
mischaracterizing this quite a bit.

"2018-11-20 Kenny Ho:
To put the questions in more concrete terms, let's say a user wants to
 expose certain part of a gpu to a particular cgroup similar to the
 way selective cpu cores are exposed to a cgroup via cpuset, how
 should we go about enabling such functionality?

2018-11-20 Tejun Heo:
Do what the intel driver or bpf is doing?  It's not difficult to hook
into cgroup for identification purposes."

Kenny


Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-02-01 Thread Kenny Ho
On Mon, Feb 1, 2021 at 9:49 AM Daniel Vetter  wrote:

>
> - there's been a pile of cgroups proposal to manage gpus at the drm
>   subsystem level, some by Kenny, and frankly this at least looks a bit
>   like a quick hack to sidestep the consensus process for that.
>
No Daniel, this is a quick *draft* to get a conversation going.  Bpf was
actually a path suggested by Tejun back in 2018 so I think you are
mischaracterizing this quite a bit.

"2018-11-20 Kenny Ho:
To put the questions in more concrete terms, let's say a user wants to
 expose certain part of a gpu to a particular cgroup similar to the
 way selective cpu cores are exposed to a cgroup via cpuset, how
 should we go about enabling such functionality?

2018-11-20 Tejun Heo:
Do what the intel driver or bpf is doing?  It's not difficult to hook
into cgroup for identification purposes."

Kenny


Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Kenny Ho
On Tue, Apr 14, 2020 at 10:04 AM Daniel Vetter  wrote:
>
> This has _nothing_ to do with Intel (I think over the past 25 years or
> so intel has implemented all 4 versions of gpu splitting that I
> listed, but not entirely sure).
>
> So again pls less tribal fighting, more collaboration. If you can't do
> that, let's pick nouveau/nvidia as arbitrary neutral ground.

So are you saying Intel has implemented a form of masking before?  I
don't think we need to just pick a vendor as a neutral ground.  The
idea of spatial sharing vs time sharing is not vendor specific... it's
not even GPU specific.  This is why I asked the two questions below.

> > Perhaps the following questions can help keep the discussion technical:
> > 1)  Is it possible to implement non-work-conserving distribution of
> > GPU without spatial sharing?  (If yes, I'd love to hear a suggestion,
> > if not...question 2.)
> > 2)  If spatial sharing is required to support GPU HPC use cases, what
> > would you implement if you have the hardware support today?
>
> The thing we can currently do in upstream (from how I'm understanding
> hw) is assign entire PCI devices to containers, so essentially only
> the entire /dev/dri/* cdev. That works, and it works across all
> drivers we have in upstream right now.
>
> Anything more fine-grained I don't think is currently possible,
> because everyone has a different idea of how to split up gpus. It
> would be nice to have it, but in upstream, cross-vendor, I'm just not
> seeing it happen right now.

I understand the reality, but what would you implement to support the
concept (GPU in HPC, which you said you are not against) if you had the
hw support today?  How would you support low-jitter/low-latency
sharing of a single GPU if you had whatever hardware support you needed
today?

Regards,
Kenny


> > On Tue, Apr 14, 2020 at 9:26 AM Daniel Vetter  wrote:
> > >
> > > On Tue, Apr 14, 2020 at 3:14 PM Kenny Ho  wrote:
> > > >
> > > > Ok.  I was hoping you could clarify the contradiction between the
> > > > existence of the spec below and your "not something any other gpu can
> > > > reasonably support" statement.  I mean, OneAPI is Intel's spec and
> > > > doesn't that at least make SubDevice support "reasonable" for one more
> > > > vendor?
> > > >
> > > > Partisanship aside, as a drm co-maintainer, do you really not see the
> > > > need for non-work-conserving way of distributing GPU as a resource?
> > > > You recognized the latencies involved (although that's really just
> > > > part of the story... time sharing is never going to be good enough
> > > > even if your switching cost is zero.)  As a drm co-maintainer, are you
> > > > suggesting GPU has no place in the HPC use case?
> > >
> > >  So I did chat with people and my understanding for how this subdevice
> > > stuff works is roughly, from least to most fine grained support:
> > > - Not possible at all, hw doesn't have any such support
> > > - The hw is actually not a single gpu, but a bunch of chips behind a
> > > magic bridge/interconnect, and there's a scheduler load-balancing
> > > stuff and you can't actually run on all "cores" in parallel with one
> > > compute/3d job. So subdevices just give you some of these cores, but
> > > from client api pov they're exactly as powerful as the full device. So
> > > this kinda works like assigning an entire NUMA node, including all the
> > > cpu cores and memory bandwidth and everything.
> > > - Hw has multiple "engines" which share resources (like compute cores
> > > or whatever) behind the scenes. There's no control over how this
> > > sharing works really, and whether you have guarantees about minimal
> > > execution resources or not. This kinda works like hyperthreading.
> > > - Then finally we have the CU mask thing amdgpu has. Which works like
> > > what you're proposing, works on amd.
> > >
> > > So this isn't something that I think we should standardize in a
> > > resource management framework like cgroups. Because it's a complete
> > > mess. Note that _all_ the above things (including the "no subdevices"
> > > one) are valid implementations of "subdevices" in the various specs.
> > >
> > > Now on your question on "why was this added to various standards?"
> > > because opencl has that too (and the rocm thing, and everything else
> > > it seems). What I heard is that a few people pushed really hard, and
> > > no one objected hard enough

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Kenny Ho
Hi Daniel,

I appreciate much of your review so far, and I much prefer keeping
things technical, but that is very difficult to do when I get Intel
developers calling my implementation the "most AMD-specific solution
possible" and objecting to an implementation because their hardware
cannot support it.  Can you help me with a more charitable
interpretation of what has been happening?

Perhaps the following questions can help keep the discussion technical:
1)  Is it possible to implement non-work-conserving distribution of
GPU without spatial sharing?  (If yes, I'd love to hear a suggestion,
if not...question 2.)
2)  If spatial sharing is required to support GPU HPC use cases, what
would you implement if you have the hardware support today?

Regards,
Kenny

On Tue, Apr 14, 2020 at 9:26 AM Daniel Vetter  wrote:
>
> On Tue, Apr 14, 2020 at 3:14 PM Kenny Ho  wrote:
> >
> > Ok.  I was hoping you could clarify the contradiction between the
> > existence of the spec below and your "not something any other gpu can
> > reasonably support" statement.  I mean, OneAPI is Intel's spec and
> > doesn't that at least make SubDevice support "reasonable" for one more
> > vendor?
> >
> > Partisanship aside, as a drm co-maintainer, do you really not see the
> > need for non-work-conserving way of distributing GPU as a resource?
> > You recognized the latencies involved (although that's really just
> > part of the story... time sharing is never going to be good enough
> > even if your switching cost is zero.)  As a drm co-maintainer, are you
> > suggesting GPU has no place in the HPC use case?
>
>  So I did chat with people and my understanding for how this subdevice
> stuff works is roughly, from least to most fine grained support:
> - Not possible at all, hw doesn't have any such support
> - The hw is actually not a single gpu, but a bunch of chips behind a
> magic bridge/interconnect, and there's a scheduler load-balancing
> stuff and you can't actually run on all "cores" in parallel with one
> compute/3d job. So subdevices just give you some of these cores, but
> from client api pov they're exactly as powerful as the full device. So
> this kinda works like assigning an entire NUMA node, including all the
> cpu cores and memory bandwidth and everything.
> - Hw has multiple "engines" which share resources (like compute cores
> or whatever) behind the scenes. There's no control over how this
> sharing works really, and whether you have guarantees about minimal
> execution resources or not. This kinda works like hyperthreading.
> - Then finally we have the CU mask thing amdgpu has. Which works like
> what you're proposing, works on amd.
>
> So this isn't something that I think we should standardize in a
> resource management framework like cgroups. Because it's a complete
> mess. Note that _all_ the above things (including the "no subdevices"
> one) are valid implementations of "subdevices" in the various specs.
>
> Now on your question on "why was this added to various standards?"
> because opencl has that too (and the rocm thing, and everything else
> it seems). What I heard is that a few people pushed really hard, and
> no one objected hard enough (because not having subdevices is a
> standards compliant implementation), so that's why it happened. Just
> because it's in various standards doesn't mean that a) it's actually
> standardized in a useful fashion and b) something we should just
> blindly adopt.
>
> Also like where exactly did you understand that I'm against gpus in
> HPC uses cases. Approaching this in a slightly less tribal way would
> really, really help to get something landed (which I'd like to see
> happen, personally). Always spinning this as an Intel vs AMD thing
> like you do here with every reply really doesn't help moving this in.
>
> So yeah stricter isolation is something customers want, it's just not
> something we can really give out right now at a level below the
> device.
> -Daniel
>
> >
> > Regards,
> > Kenny
> >
> > On Tue, Apr 14, 2020 at 8:52 AM Daniel Vetter  wrote:
> > >
> > > On Tue, Apr 14, 2020 at 2:47 PM Kenny Ho  wrote:
> > > > On Tue, Apr 14, 2020 at 8:20 AM Daniel Vetter  wrote:
> > > > > My understanding from talking with a few other folks is that
> > > > > the cpumask-style CU-weight thing is not something any other gpu can
> > > > > reasonably support (and we have about 6+ of those in-tree)
> > > >
> > > > How does Intel plan to support the SubDevice API as described in your
> > > > own spec here:
> > > > https://spec.oneapi.com/versions/0.7/oneL0/cor

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Kenny Ho
Ok.  I was hoping you could clarify the contradiction between the
existence of the spec below and your "not something any other gpu can
reasonably support" statement.  I mean, OneAPI is Intel's spec and
doesn't that at least make SubDevice support "reasonable" for one more
vendor?

Partisanship aside, as a drm co-maintainer, do you really not see the
need for non-work-conserving way of distributing GPU as a resource?
You recognized the latencies involved (although that's really just
part of the story... time sharing is never going to be good enough
even if your switching cost is zero.)  As a drm co-maintainer, are you
suggesting GPU has no place in the HPC use case?

Regards,
Kenny

On Tue, Apr 14, 2020 at 8:52 AM Daniel Vetter  wrote:
>
> On Tue, Apr 14, 2020 at 2:47 PM Kenny Ho  wrote:
> > On Tue, Apr 14, 2020 at 8:20 AM Daniel Vetter  wrote:
> > > My understanding from talking with a few other folks is that
> > > the cpumask-style CU-weight thing is not something any other gpu can
> > > reasonably support (and we have about 6+ of those in-tree)
> >
> > How does Intel plan to support the SubDevice API as described in your
> > own spec here:
> > https://spec.oneapi.com/versions/0.7/oneL0/core/INTRO.html#subdevice-support
>
> I can't talk about whether future products might or might not support
> stuff and in what form exactly they might support stuff or not support
> stuff. Or why exactly that's even in the spec there or not.
>
> Geez
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Kenny Ho
Hi Daniel,

On Tue, Apr 14, 2020 at 8:20 AM Daniel Vetter  wrote:
> My understanding from talking with a few other folks is that
> the cpumask-style CU-weight thing is not something any other gpu can
> reasonably support (and we have about 6+ of those in-tree)

How does Intel plan to support the SubDevice API as described in your
own spec here:
https://spec.oneapi.com/versions/0.7/oneL0/core/INTRO.html#subdevice-support

Regards,
Kenny


Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-13 Thread Kenny Ho
Hi,

On Mon, Apr 13, 2020 at 4:54 PM Tejun Heo  wrote:
>
> Allocations definitely are acceptable and it's not a pre-requisite to have
> work-conserving control first either. Here, given the lack of consensus in
> terms of what even constitute resource units, I don't think it'd be a good
> idea to commit to the proposed interface and believe it'd be beneficial to
> work on interface-wise simpler work conserving controls.
>
...
> I hope the rationales are clear now. What I'm objecting is inclusion of
> premature interface, which is a lot easier and more tempting to do for
> hardware-specific limits and the proposals up until now have been showing
> ample signs of that. I don't think my position has changed much since the
> beginning - do the difficult-to-implement but easy-to-use weights first and
> then you and everyone would have a better idea of what hard-limit or
> allocation interfaces and mechanisms should look like, or even whether they're
> needed.

By lack of consensus, do you mean Intel's assertion that a standard is
not a standard until Intel implements it? (That was in the context of
the OpenCL language standard and its concept of SubDevice.)  I thought
the discussion so far had established that the concept of a compute
unit, while named differently (AMD's CUs, ARM's SCs, Intel's EUs,
Nvidia's SMs, Qualcomm's SPs), is cross-vendor.  While an AMD CU is
not the same as an Intel EU or Nvidia SM, the same can be said for CPU
cores.  If cpuset is acceptable for a diversity of CPU core designs
and arrangements, I don't understand why an interface derived from GPU
SubDevice is considered premature.

If a decade-old language standard is not considered a consensus, can
you elaborate on what might constitute a consensus?

Regards,
Kenny


Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-13 Thread Kenny Ho
(replying again in plain-text)

Hi Tejun,

Thanks for taking the time to reply.

Perhaps we can even narrow things down to just
gpu.weight/gpu.compute.weight as a start?  In this aspect, is the key
objection to the current implementation of gpu.compute.weight the
work-conserving bit?  This work-conserving requirement is probably
what I have missed for the last two years (and hence going in circles.)

If this is the case, can you clarify/confirm the followings?

1) Is the resource scheduling goal of cgroup purely throughput (at the
expense of other scheduling goals such as latency)?
2) If 1) is true, under what circumstances will the "Allocations"
resource distribution model (as defined in the cgroup-v2) be
acceptable?
3) If 1) is true, are things like cpuset from cgroup v1 no longer
acceptable going forward?

To be clear, while some have framed this (time sharing vs spatial
sharing) as a partisan issue, it is in fact a technical one.  I have
implemented the gpu cgroup support this way because we have a class of
users that value low latency/low jitter/predictability/synchronicity.
For example, they would like 4 tasks to share a GPU and they would
like the tasks to start and finish at the same time.

What is the rationale behind picking the Weight model over Allocations
as the first acceptable implementation?  Can't we have both
work-conserving and non-work-conserving ways of distributing GPU
resources?  If we can, why not allow a non-work-conserving
implementation first, especially when we have users asking for such
functionality?
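
To spell out the difference being asked about, here is a minimal
user-space sketch (not code from the patch series) contrasting a
work-conserving weight split with a non-work-conserving fixed allocation
of compute units; the weights and CU counts are illustrative only:

/* Illustrative only: two ways of dividing 64 CUs among three cgroups.
 * Weight model (work-conserving): shares scale with sibling weights,
 * and an idle sibling's share can be consumed by the others.
 * Allocation model (non-work-conserving): each cgroup gets a fixed,
 * exclusive slice whether or not its siblings are busy.
 */
#include <stdio.h>

static void distribute_by_weight(const int *weight, int n, int total_cu)
{
        int sum = 0;

        for (int i = 0; i < n; i++)
                sum += weight[i];
        for (int i = 0; i < n; i++)
                printf("cgroup %d: up to %d CUs (more if siblings are idle)\n",
                       i, total_cu * weight[i] / sum);
}

static void distribute_by_allocation(const int *cus, int n)
{
        for (int i = 0; i < n; i++)
                printf("cgroup %d: exactly %d CUs, reserved\n", i, cus[i]);
}

int main(void)
{
        int weights[] = { 100, 100, 200 };      /* io.weight-style weights */
        int slices[]  = { 16, 16, 32 };         /* fixed CU counts */

        distribute_by_weight(weights, 3, 64);
        distribute_by_allocation(slices, 3);
        return 0;
}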

Regards,
Kenny

On Mon, Apr 13, 2020 at 3:11 PM Tejun Heo  wrote:
>
> Hello, Kenny.
>
> On Tue, Mar 24, 2020 at 02:49:27PM -0400, Kenny Ho wrote:
> > Can you elaborate more on what the missing pieces are?
>
> Sorry about the long delay, but I think we've been going in circles for quite
> a while now. Let's try to make it really simple as the first step. How about
> something like the following?
>
> * gpu.weight (should it be gpu.compute.weight? idk) - A single number
>   per-device weight similar to io.weight, which distributes computation
>   resources in work-conserving way.
>
> * gpu.memory.high - A single number per-device on-device memory limit.
>
> The above two, if they work well, should already be plenty useful. And my guess is
> that getting the above working well will be plenty challenging already even
> though it's already excluding work-conserving memory distribution. So, let's
> please do that as the first step and see what more would be needed from there.
>
> Thanks.
>
> --
> tejun


Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-03-24 Thread Kenny Ho
Hi Tejun,

Can you elaborate more on what the missing pieces are?

Regards,
Kenny

On Tue, Mar 24, 2020 at 2:46 PM Tejun Heo  wrote:
>
> On Tue, Mar 17, 2020 at 12:03:20PM -0400, Kenny Ho wrote:
> > What's your thoughts on this latest series?
>
> My overall impression is that the feedback isn't being incorporated
> thoroughly / sufficiently.
>
> Thanks.
>
> --
> tejun


Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-03-17 Thread Kenny Ho
Hi Tejun,

What's your thoughts on this latest series?

Regards,
Kenny

On Wed, Feb 26, 2020 at 2:02 PM Kenny Ho  wrote:
>
> This is a submission for the introduction of a new cgroup controller for the 
> drm subsystem following a series of RFCs [v1, v2, v3, v4]
>
> Changes from PR v1
> * changed cgroup controller name from drm to gpu
> * removed lgpu
> * added compute.weight resources, clarified resources being distributed as 
> partitions of compute device
>
> PR v1: https://www.spinics.net/lists/cgroups/msg24479.html
>
> Changes from the RFC base on the feedbacks:
> * drop all drm.memory.* related implementation and focus only on buffer and 
> lgpu
> * add weight resource type for logical gpu (lgpu)
> * uncoupled drmcg device iteration from drm_minor
>
> I'd also like to highlight the fact that these patches are currently released
> under the MIT/X11 license, aligning with the norm of the drm subsystem, but I am
> working to have the cgroup parts released under GPLv2 to align with the norm
> of the cgroup subsystem.
>
> RFC:
> [v1]: 
> https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
> [v2]: https://www.spinics.net/lists/cgroups/msg22074.html
> [v3]: https://lists.freedesktop.org/archives/amd-gfx/2019-June/036026.html
> [v4]: https://patchwork.kernel.org/cover/11120371/
>
> Changes since the start of RFC are as follows:
>
> v4:
> Unchanged (no review needed)
> * drm.memory.*/ttm resources (Patch 9-13, I am still working on memory
> bandwidth and shrinker)
> Based on feedback on v3:
> * update nomenclature to drmcg
> * embed per device drmcg properties into drm_device
> * split GEM buffer related commits into stats and limit
> * rename function name to align with convention
> * combined buffer accounting and check into a try_charge function
> * support buffer stats without limit enforcement
> * removed GEM buffer sharing limitation
> * updated documentations
> New features:
> * introducing logical GPU concept
> * example implementation with AMD KFD
>
> v3:
> Based on feedback on v2:
> * removed .help type file from v2
> * conform to cgroup convention for default and max handling
> * conform to cgroup convention for addressing device specific limits (with 
> major:minor)
> New function:
> * adopted memparse for memory size related attributes
> * added macro to marshal drmcgrp cftype private (DRMCG_CTF_PRIV, etc.)
> * added ttm buffer usage stats (per cgroup, for system, tt, vram.)
> * added ttm buffer usage limit (per cgroup, for vram.)
> * added per cgroup bandwidth stats and limiting (burst and average bandwidth)
>
> v2:
> * Removed the vendoring concepts
> * Add limit to total buffer allocation
> * Add limit to the maximum size of a buffer allocation
>
> v1: cover letter
>
> The purpose of this patch series is to start a discussion for a generic cgroup
> controller for the drm subsystem.  The design proposed here is a very early
> one.  We are hoping to engage the community as we develop the idea.
>
> Background
> ===
> Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
> tasks, and all their future children, into hierarchical groups with 
> specialized
> behaviour, such as accounting/limiting the resources which processes in a
> cgroup can access[1].  Weights, limits, protections, allocations are the main
> resource distribution models.  Existing cgroup controllers includes cpu,
> memory, io, rdma, and more.  cgroup is one of the foundational technologies
> that enables the popular container application deployment and management 
> method.
>
> Direct Rendering Manager/drm contains code intended to support the needs of
> complex graphics devices. Graphics drivers in the kernel may make use of DRM
> functions to make tasks like memory management, interrupt handling and DMA
> easier, and provide a uniform interface to applications.  The DRM has also
> developed beyond traditional graphics applications to support compute/GPGPU
> applications.
>
> Motivations
> ===
> As GPUs grow beyond the realm of desktop/workstation graphics into areas like
> data center clusters and IoT, there are increasing needs to monitor and
> regulate GPU as a resource like cpu, memory and io.
>
> Matt Roper from Intel began working on a similar idea in early 2018 [2] for the
> purpose of managing GPU priority using the cgroup hierarchy.  While that
> particular use case may not warrant a standalone drm cgroup controller, there
> are other use cases where having one can be useful [3].  Monitoring GPU
> resources such as VRAM and buffers, CU (compute unit [AMD's nomenclature])/EU
> (execution unit [Intel's nomenclature]), GPU job scheduling [4] can

[PATCH v2 10/11] drm, cgroup: add update trigger after limit change

2020-02-26 Thread Kenny Ho
Before this commit, drmcg limits are updated but enforcement is delayed
until the next time the driver checks against the new limit.  While this
is sufficient for certain resources, a more proactive enforcement may be
needed for other resources.

This commit introduces an optional drmcg_limit_updated callback for DRM
drivers.  When defined, it will be called in two scenarios:
1) When limits are updated for a particular cgroup, the callback will be
triggered for each task in the updated cgroup.
2) When a task is migrated from one cgroup to another, the callback will
be triggered for each resource type for the migrated task.

Change-Id: I0ce7c4e5a04c31bd0f8d9853a383575d4bc9a3fa
Signed-off-by: Kenny Ho 
---
 include/drm/drm_drv.h | 10 
 kernel/cgroup/drm.c   | 58 +++
 2 files changed, 68 insertions(+)

diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 1f65ac4d9bbf..e7333143e722 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -724,6 +724,16 @@ struct drm_driver {
void (*drmcg_custom_init)(struct drm_device *dev,
struct drmcg_props *props);
 
+   /**
+* @drmcg_limit_updated
+*
+* Optional callback
+*/
+   void (*drmcg_limit_updated)(struct drm_device *dev,
+   struct task_struct *task,
+   struct drmcg_device_resource *ddr,
+   enum drmcg_res_type res_type);
+
/**
 * @gem_vm_ops: Driver private ops for this object
 *
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 2eadabebdfea..da439a351b07 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -127,6 +127,26 @@ static inline void drmcg_update_cg_tree(struct drm_device 
*dev)
mutex_unlock(_mutex);
 }
 
+static void drmcg_limit_updated(struct drm_device *dev, struct drmcg *drmcg,
+   enum drmcg_res_type res_type)
+{
+   struct drmcg_device_resource *ddr =
+   drmcg->dev_resources[dev->primary->index];
+   struct css_task_iter it;
+   struct task_struct *task;
+
+   if (dev->driver->drmcg_limit_updated == NULL)
+   return;
+
+   css_task_iter_start(&drmcg->css.cgroup->self,
+   CSS_TASK_ITER_PROCS, &it);
+   while ((task = css_task_iter_next(&it))) {
+   dev->driver->drmcg_limit_updated(dev, task,
+   ddr, res_type);
+   }
+   css_task_iter_end(&it);
+}
+
 static void drmcg_calculate_effective_compute(struct drm_device *dev,
const unsigned long *free_weighted,
struct drmcg *parent_drmcg)
@@ -208,6 +228,8 @@ static void drmcg_apply_effective_compute(struct drm_device 
*dev)
 capacity);
ddr->compute_count_eff =
bitmap_weight(ddr->compute_eff, capacity);
+
+   drmcg_limit_updated(dev, drmcg, DRMCG_TYPE_COMPUTE);
}
}
rcu_read_unlock();
@@ -732,10 +754,46 @@ static int drmcg_css_online(struct cgroup_subsys_state 
*css)
return drm_minor_for_each(_online_fn, css_to_drmcg(css));
 }
 
+static int drmcg_attach_fn(int id, void *ptr, void *data)
+{
+   struct drm_minor *minor = ptr;
+   struct task_struct *task = data;
+   struct drm_device *dev;
+
+   if (minor->type != DRM_MINOR_PRIMARY)
+   return 0;
+
+   dev = minor->dev;
+
+   if (dev->driver->drmcg_limit_updated) {
+   struct drmcg *drmcg = drmcg_get(task);
+   struct drmcg_device_resource *ddr =
+   drmcg->dev_resources[minor->index];
+   enum drmcg_res_type type;
+
+   for (type = 0; type < __DRMCG_TYPE_LAST; type++)
+   dev->driver->drmcg_limit_updated(dev, task, ddr, type);
+
+   drmcg_put(drmcg);
+   }
+
+   return 0;
+}
+
+static void drmcg_attach(struct cgroup_taskset *tset)
+{
+   struct task_struct *task;
+   struct cgroup_subsys_state *css;
+
+   cgroup_taskset_for_each(task, css, tset)
+   drm_minor_for_each(&drmcg_attach_fn, task);
+}
+
 struct cgroup_subsys gpu_cgrp_subsys = {
.css_alloc  = drmcg_css_alloc,
.css_free   = drmcg_css_free,
.css_online = drmcg_css_online,
+   .attach = drmcg_attach,
.early_init = false,
.legacy_cftypes = files,
.dfl_cftypes= files,
-- 
2.25.0



[PATCH v2 07/11] drm, cgroup: Add total GEM buffer allocation limit

2020-02-26 Thread Kenny Ho
The drm resources being limited here are the GEM buffer objects.  User
applications allocate and free these buffers.  In addition, a process
can allocate a buffer and share it with another process.  The consumer
of a shared buffer can also outlive the allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of the
buffer is deemed to be the cgroup to which the allocating process
belongs.  There is one cgroup limit per drm device.

The limiting functionality is added to the previous stats collection
function.  drm_gem_private_object_init is modified to have a return
value so it can fail when a cgroup limit is exceeded.

The try_chg function only fails if the DRM cgroup properties have
limit_enforced set to true for the DRM device.  This allows the DRM
cgroup controller to collect usage stats without enforcing the limits.

gpu.buffer.default
A read-only flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Default limits on the total GEM buffer allocation in bytes.

gpu.buffer.max
A read-write flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Per device limits on the total GEM buffer allocation in bytes.
This is a hard limit.  Attempts to allocate beyond the cgroup
limit will result in ENOMEM.  Shorthand understood by memparse
(such as k, m, g) can be used.

Set allocation limit for /dev/dri/card1 to 1GB
echo "226:1 1g" > gpu.buffer.total.max

Set allocation limit for /dev/dri/card0 to 512MB
echo "226:0 512m" > gpu.buffer.total.max

Change-Id: Id3265bbd0fafe84a16b59617df79bd32196160be
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst|  21 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  19 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   6 +-
 drivers/gpu/drm/drm_gem.c  |  11 +-
 include/drm/drm_cgroup.h   |   8 +-
 include/drm/drm_gem.h  |   2 +-
 include/linux/cgroup_drm.h |   1 +
 kernel/cgroup/drm.c| 227 -
 8 files changed, 278 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 065f2b52da57..f2d7abf5c783 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2087,6 +2087,27 @@ GPU Interface Files
 
Total number of GEM buffer allocated.
 
+  gpu.buffer.default
+   A read-only flat-keyed file which exists on the root cgroup.
+   Each entry is keyed by the drm device's major:minor.
+
+   Default limits on the total GEM buffer allocation in bytes.
+
+  gpu.buffer.max
+   A read-write flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Per device limits on the total GEM buffer allocation in bytes.
+   This is a hard limit.  Attempts to allocate beyond the cgroup
+   limit will result in ENOMEM.  Shorthand understood by memparse
+   (such as k, m, g) can be used.
+
+   Set allocation limit for /dev/dri/card1 to 1GB
+   echo "226:1 1g" > gpu.buffer.total.max
+
+   Set allocation limit for /dev/dri/card0 to 512MB
+   echo "226:0 512m" > gpu.buffer.total.max
+
 GEM Buffer Ownership
 
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6e1faf8a2bca..171397708855 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1413,6 +1413,23 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, 
unsigned int pipe,
  stime, etime, mode);
 }
 
+#ifdef CONFIG_CGROUP_DRM
+
+static void amdgpu_drmcg_custom_init(struct drm_device *dev,
+   struct drmcg_props *props)
+{
+   props->limit_enforced = true;
+}
+
+#else
+
+static void amdgpu_drmcg_custom_init(struct drm_device *dev,
+   struct drmcg_props *props)
+{
+}
+
+#endif /* CONFIG_CGROUP_DRM */
+
 static struct drm_driver kms_driver = {
.driver_features =
DRIVER_USE_AGP | DRIVER_ATOMIC |
@@ -1444,6 +1461,8 @@ static struct drm_driver kms_driver = {
.gem_prime_vunmap = amdgpu_gem_prime_vunmap,
.gem_prime_mmap = amdgpu_gem_prime_mmap,
 
+   .drmcg_custom_init = amdgpu_drmcg_custom_init,
+
.name = DRIVER_NAME,
.desc = DRIVER_DESC,
.date = DRIVER_DATE,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 5766d20f29d8..4d08ccbc541a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -34,6 +34,7 @@
 
 #include 
 #include 
+#include 
 #include "amdgpu.h"

[PATCH v2 08/11] drm, cgroup: Add peak GEM buffer allocation limit

2020-02-26 Thread Kenny Ho
gpu.buffer.peak.default
A read-only flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Default limits on the largest GEM buffer allocation in bytes.

gpu.buffer.peak.max
A read-write flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Per device limits on the largest GEM buffer allocation in bytes.
This is a hard limit.  Attempts to allocate beyond the cgroup
limit will result in ENOMEM.  Shorthand understood by memparse
(such as k, m, g) can be used.

Set largest allocation for /dev/dri/card1 to 4MB
echo "226:1 4m" > gpu.buffer.peak.max

Change-Id: I5ab3fb4a442b6cbd5db346be595897c90217da69
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst | 18 +++
 include/drm/drm_cgroup.h|  1 +
 include/linux/cgroup_drm.h  |  1 +
 kernel/cgroup/drm.c | 43 +
 4 files changed, 63 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index f2d7abf5c783..581343472651 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2108,6 +2108,24 @@ GPU Interface Files
Set allocation limit for /dev/dri/card0 to 512MB
echo "226:0 512m" > gpu.buffer.total.max
 
+  gpu.buffer.peak.default
+   A read-only flat-keyed file which exists on the root cgroup.
+   Each entry is keyed by the drm device's major:minor.
+
+   Default limits on the largest GEM buffer allocation in bytes.
+
+  gpu.buffer.peak.max
+   A read-write flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Per device limits on the largest GEM buffer allocation in bytes.
+   This is a hard limit.  Attempts to allocate beyond the cgroup
+   limit will result in ENOMEM.  Shorthand understood by memparse
+   (such as k, m, g) can be used.
+
+   Set largest allocation for /dev/dri/card1 to 4MB
+   echo "226:1 4m" > gpu.buffer.peak.max
+
 GEM Buffer Ownership
 
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 2783e56690db..2b41d4d22e33 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -16,6 +16,7 @@ struct drmcg_props {
bool limit_enforced;
 
s64 bo_limits_total_allocated_default;
+   s64 bo_limits_peak_allocated_default;
 };
 
 void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 71023654fb77..aba3b26718c0 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -29,6 +29,7 @@ struct drmcg_device_resource {
s64 bo_limits_total_allocated;
 
s64 bo_stats_peak_allocated;
+   s64 bo_limits_peak_allocated;
 
s64 bo_stats_count_allocated;
 };
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 4b19e533941d..62d2a9d33d0c 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -95,6 +95,9 @@ static inline int init_drmcg_single(struct drmcg *drmcg, 
struct drm_device *dev)
ddr->bo_limits_total_allocated =
dev->drmcg_props.bo_limits_total_allocated_default;
 
+   ddr->bo_limits_peak_allocated =
+   dev->drmcg_props.bo_limits_peak_allocated_default;
+
return 0;
 }
 
@@ -305,6 +308,9 @@ static void drmcg_print_limits(struct drmcg_device_resource 
*ddr,
case DRMCG_TYPE_BO_TOTAL:
seq_printf(sf, "%lld\n", ddr->bo_limits_total_allocated);
break;
+   case DRMCG_TYPE_BO_PEAK:
+   seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -319,6 +325,10 @@ static void drmcg_print_default(struct drmcg_props *props,
seq_printf(sf, "%lld\n",
props->bo_limits_total_allocated_default);
break;
+   case DRMCG_TYPE_BO_PEAK:
+   seq_printf(sf, "%lld\n",
+   props->bo_limits_peak_allocated_default);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -476,6 +486,19 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file 
*of, char *buf,
 
ddr->bo_limits_total_allocated = val;
break;
+   case DRMCG_TYPE_BO_PEAK:
+   rc = drmcg_process_limit_s64_val

[PATCH v2 09/11] drm, cgroup: Add compute as gpu cgroup resource

2020-02-26 Thread Kenny Ho
gpu.compute.weight
  A read-write flat-keyed file which exists on all cgroups.  The
  default weight is 100.  Each entry is keyed by the DRM device's
  major:minor (the primary minor).  The weights are in the range [1,
  10000] and specify the relative amount of physical partitions
  the cgroup can use in relation to its siblings.  The partition
  concept here is analogous to the subdevice of OpenCL.

gpu.compute.effective
  A read-only nested-keyed file which exists on all cgroups.  Each
  entry is keyed by the DRM device's major:minor.

  It lists the GPU subdevices that are actually granted to this
  cgroup by its parent.  These subdevices are allowed to be used by
  tasks within the current cgroup.

  = ==
  count The total number of granted subdevices
  list  Enumeration of the subdevices
  = ==

Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  21 +++
 include/drm/drm_cgroup.h|   3 +
 include/linux/cgroup_drm.h  |  16 +++
 kernel/cgroup/drm.c | 177 +++-
 4 files changed, 215 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 581343472651..f92f1f4a64d4 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2126,6 +2126,27 @@ GPU Interface Files
Set largest allocation for /dev/dri/card1 to 4MB
echo "226:1 4m" > gpu.buffer.peak.max
 
+  gpu.compute.weight
+   A read-write flat-keyed file which exists on all cgroups.  The
+   default weight is 100.  Each entry is keyed by the DRM device's
+   major:minor (the primary minor).  The weights are in the range
+   [1, 10000] and specify the relative amount of physical partitions
+   the cgroup can use in relation to its siblings.  The partition
+   concept here is analogous to the subdevice concept of OpenCL.
+
+  gpu.compute.effective
+   A read-only nested-keyed file which exists on all cgroups.
+   Each entry is keyed by the DRM device's major:minor.
+
+   It lists the GPU subdevices that are actually granted to this
+   cgroup by its parent.  These subdevices are allowed to be used
+   by tasks within the current cgroup.
+
+ = ==
+ count The total number of granted subdevices
+ list  Enumeration of the subdevices
+ = ==
+
 GEM Buffer Ownership
 
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 2b41d4d22e33..5aac47ca536f 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -17,6 +17,9 @@ struct drmcg_props {
 
s64 bo_limits_total_allocated_default;
s64 bo_limits_peak_allocated_default;
+
+   int compute_capacity;
+   DECLARE_BITMAP(compute_slots, MAX_DRMCG_COMPUTE_CAPACITY);
 };
 
 void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index aba3b26718c0..fd02f59cabab 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -11,10 +11,14 @@
 /* limit defined per the way drm_minor_alloc operates */
 #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
 
+#define MAX_DRMCG_COMPUTE_CAPACITY 256
+
 enum drmcg_res_type {
DRMCG_TYPE_BO_TOTAL,
DRMCG_TYPE_BO_PEAK,
DRMCG_TYPE_BO_COUNT,
+   DRMCG_TYPE_COMPUTE,
+   DRMCG_TYPE_COMPUTE_EFF,
__DRMCG_TYPE_LAST,
 };
 
@@ -32,6 +36,18 @@ struct drmcg_device_resource {
s64 bo_limits_peak_allocated;
 
s64 bo_stats_count_allocated;
+
+/* compute_stg is used to calculate _eff before applying to _eff
+* after considering the entire hierarchy
+*/
+   DECLARE_BITMAP(compute_stg, MAX_DRMCG_COMPUTE_CAPACITY);
+   /* user configurations */
+   s64 compute_weight;
+   /* effective compute for the cgroup after considering
+* relationship with other cgroup
+*/
+   s64 compute_count_eff;
+   DECLARE_BITMAP(compute_eff, MAX_DRMCG_COMPUTE_CAPACITY);
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 62d2a9d33d0c..2eadabebdfea 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -98,6 +99,11 @@ static inline int init_drmcg_single(struct drmcg *drmcg, 
struct drm_device *dev)
ddr->bo_limit

[PATCH v2 11/11] drm/amdgpu: Integrate with DRM cgroup

2020-02-26 Thread Kenny Ho
The number of compute units (CUs) for a device is used as the gpu cgroup
compute capacity.  The gpu cgroup compute allocation limit only applies
to compute workloads for the moment (enforced via kfd queue creation.)
Any cu_mask update is validated against the availability of the compute
units as defined by the drmcg the kfd process belongs to.

Change-Id: I2930e76ef9ac6d36d0feb81f604c89a4208e6614
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  29 
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   7 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +
 .../amd/amdkfd/kfd_process_queue_manager.c| 153 ++
 5 files changed, 196 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 0ee8aae6c519..1efbc0d3c03e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -199,6 +199,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev 
*dst, struct kgd_dev *s
valid;  \
})
 
+int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
+   struct amdgpu_device *adev, unsigned long *compute_bm,
+   unsigned int compute_bm_size);
+
 /* GPUVM API */
 int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int 
pasid,
void **vm, void **process_info,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 171397708855..595ad852080b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1418,9 +1418,31 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, 
unsigned int pipe,
 static void amdgpu_drmcg_custom_init(struct drm_device *dev,
struct drmcg_props *props)
 {
+   struct amdgpu_device *adev = dev->dev_private;
+
+   props->compute_capacity = adev->gfx.cu_info.number;
+   bitmap_zero(props->compute_slots, MAX_DRMCG_COMPUTE_CAPACITY);
+   bitmap_fill(props->compute_slots, props->compute_capacity);
+
props->limit_enforced = true;
 }
 
+static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
+   struct task_struct *task, struct drmcg_device_resource *ddr,
+   enum drmcg_res_type res_type)
+{
+   struct amdgpu_device *adev = dev->dev_private;
+
+   switch (res_type) {
+   case DRMCG_TYPE_COMPUTE:
+   amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
+ddr->compute_eff, dev->drmcg_props.compute_capacity);
+   break;
+   default:
+   break;
+   }
+}
+
 #else
 
 static void amdgpu_drmcg_custom_init(struct drm_device *dev,
@@ -1428,6 +1450,12 @@ static void amdgpu_drmcg_custom_init(struct drm_device 
*dev,
 {
 }
 
+static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
+   struct task_struct *task, struct drmcg_device_resource *ddr,
+   enum drmcg_res_type res_type)
+{
+}
+
 #endif /* CONFIG_CGROUP_DRM */
 
 static struct drm_driver kms_driver = {
@@ -1462,6 +1490,7 @@ static struct drm_driver kms_driver = {
.gem_prime_mmap = amdgpu_gem_prime_mmap,
 
.drmcg_custom_init = amdgpu_drmcg_custom_init,
+   .drmcg_limit_updated = amdgpu_drmcg_limit_updated,
 
.name = DRIVER_NAME,
.desc = DRIVER_DESC,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 675735b8243a..a35596f2dc4e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -451,6 +451,13 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct 
kfd_process *p,
return -EFAULT;
}
 
+   if (!pqm_drmcg_compute_validate(p, args->queue_id,
+properties.cu_mask, cu_mask_size)) {
+   pr_debug("CU mask not permitted by DRM Cgroup");
+   kfree(properties.cu_mask);
+   return -EACCES;
+   }
+
mutex_lock(>mutex);
 
retval = pqm_set_cu_mask(&p->pqm, args->queue_id, &properties);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 063096ec832d..0fb619586e24 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -929,6 +929,9 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
   u32 *ctl_stack_used_size,
   u32 *save_area_used_size);
 
+bool pqm_drmcg_compute_validate(struct kfd_process *p, int qid, u32 *cu_mask,
+   unsigned int cu_mask_size);
+
 int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
  unsigned int fence_value,
  unsign

[PATCH v2 01/11] cgroup: Introduce cgroup for drm subsystem

2020-02-26 Thread Kenny Ho
With the increased importance of machine learning, data science and
other cloud-based applications, GPUs are already in production use in
data centers today.  Existing GPU resource management is very
coarse-grained, however, as sysadmins are only able to distribute
workloads on a per-GPU basis.  An alternative is to use GPU
virtualization (with or without SRIOV), but it generally acts on the
entire GPU instead of the specific resources in a GPU.  With a drm
cgroup controller, we can enable alternate, fine-grained, sub-GPU
resource management (in addition to what may be available via GPU
virtualization.)

Change-Id: Ia90aed8c4cb89ff20d8216a903a765655b44fc9a
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst | 18 -
 Documentation/cgroup-v1/drm.rst |  1 +
 include/linux/cgroup_drm.h  | 92 +
 include/linux/cgroup_subsys.h   |  4 ++
 init/Kconfig|  5 ++
 kernel/cgroup/Makefile  |  1 +
 kernel/cgroup/drm.c | 42 +++
 7 files changed, 161 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 0636bcb60b5a..7deff912185e 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -61,8 +61,10 @@ v1 is available under Documentation/admin-guide/cgroup-v1/.
  5-6. Device
  5-7. RDMA
5-7-1. RDMA Interface Files
- 5-8. Misc
-   5-8-1. perf_event
+ 5-8. GPU
+   5-8-1. GPU Interface Files
+ 5-9. Misc
+   5-9-1. perf_event
  5-N. Non-normative information
5-N-1. CPU controller root cgroup process behaviour
5-N-2. IO controller root cgroup process behaviour
@@ -2057,6 +2059,18 @@ RDMA Interface Files
  ocrdma1 hca_handle=1 hca_object=23
 
 
+GPU
+---
+
+The "gpu" controller regulates the distribution and accounting of
+of GPU-related resources.
+
+GPU Interface Files
+
+
+TODO
+
+
 Misc
 
 
diff --git a/Documentation/cgroup-v1/drm.rst b/Documentation/cgroup-v1/drm.rst
new file mode 100644
index ..5f5658e1f5ed
--- /dev/null
+++ b/Documentation/cgroup-v1/drm.rst
@@ -0,0 +1 @@
+Please see ../cgroup-v2.rst for details
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
new file mode 100644
index ..345af54a5d41
--- /dev/null
+++ b/include/linux/cgroup_drm.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef _CGROUP_DRM_H
+#define _CGROUP_DRM_H
+
+#include 
+
+#ifdef CONFIG_CGROUP_DRM
+
+/**
+ * The DRM cgroup controller data structure.
+ */
+struct drmcg {
+   struct cgroup_subsys_state  css;
+};
+
+/**
+ * css_to_drmcg - get the corresponding drmcg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Return: DRM cgroup that contains the @css
+ */
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+   return css ? container_of(css, struct drmcg, css) : NULL;
+}
+
+/**
+ * drmcg_get - get the drmcg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to
+ *
+ * Return: reference to the DRM cgroup the task belongs to
+ */
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+   return css_to_drmcg(task_get_css(task, gpu_cgrp_id));
+}
+
+/**
+ * drmcg_put - put a drmcg reference
+ * @drmcg: the target drmcg
+ *
+ * Put a reference obtained via drmcg_get
+ */
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+   if (drmcg)
+   css_put(>css);
+}
+
+/**
+ * drmcg_parent - find the parent of a drm cgroup
+ * @cg: the target drmcg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Return: parent DRM cgroup of @cg
+ */
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+   return css_to_drmcg(cg->css.parent);
+}
+
+#else /* CONFIG_CGROUP_DRM */
+
+struct drmcg {
+};
+
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+   return NULL;
+}
+
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+   return NULL;
+}
+
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+}
+
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+   return NULL;
+}
+
+#endif /* CONFIG_CGROUP_DRM */
+#endif /* _CGROUP_DRM_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..f4e627942115 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,10 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_DRM)
+SUBSYS(gpu)
+#endif
+
 /*
  * The following subsystems are not supporte

[PATCH v2 06/11] drm, cgroup: Add GEM buffer allocation count stats

2020-02-26 Thread Kenny Ho
gpu.buffer.count.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Total number of GEM buffers allocated.
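
For example, reading this file on a system with a single DRM device
might show the following (the value is made up for illustration):

  226:0 4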

Change-Id: Iad29bdf44390dbcee07b1e72ea0ff811aa3b9dcd
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++
 include/linux/cgroup_drm.h  |  3 +++
 kernel/cgroup/drm.c | 22 +++---
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 6199cc9a978f..065f2b52da57 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2081,6 +2081,12 @@ GPU Interface Files
 
Largest (high water mark) GEM buffer allocated in bytes.
 
+  gpu.buffer.count.stats
+   A read-only flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Total number of GEM buffers allocated.
+
 GEM Buffer Ownership
 
 
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index d90807627213..103868d972d0 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -14,6 +14,7 @@
 enum drmcg_res_type {
DRMCG_TYPE_BO_TOTAL,
DRMCG_TYPE_BO_PEAK,
+   DRMCG_TYPE_BO_COUNT,
__DRMCG_TYPE_LAST,
 };
 
@@ -27,6 +28,8 @@ struct drmcg_device_resource {
s64 bo_stats_total_allocated;
 
s64 bo_stats_peak_allocated;
+
+   s64 bo_stats_count_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 68b23693418b..5a700833a304 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -280,6 +280,9 @@ static void drmcg_print_stats(struct drmcg_device_resource 
*ddr,
case DRMCG_TYPE_BO_PEAK:
seq_printf(sf, "%lld\n", ddr->bo_stats_peak_allocated);
break;
+   case DRMCG_TYPE_BO_COUNT:
+   seq_printf(sf, "%lld\n", ddr->bo_stats_count_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -334,6 +337,12 @@ struct cftype files[] = {
.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
DRMCG_FTYPE_STATS),
},
+   {
+   .name = "buffer.count.stats",
+   .seq_show = drmcg_seq_show,
+   .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
+   DRMCG_FTYPE_STATS),
+   },
{ } /* terminate */
 };
 
@@ -385,6 +394,8 @@ void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct 
drm_device *dev,
 
if (ddr->bo_stats_peak_allocated < (s64)size)
ddr->bo_stats_peak_allocated = (s64)size;
+
+   ddr->bo_stats_count_allocated++;
}
mutex_unlock(>drmcg_mutex);
 }
@@ -402,15 +413,20 @@ EXPORT_SYMBOL(drmcg_chg_bo_alloc);
 void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
size_t size)
 {
+   struct drmcg_device_resource *ddr;
int devIdx = dev->primary->index;
 
if (drmcg == NULL)
return;
 
mutex_lock(>drmcg_mutex);
-   for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg))
-   drmcg->dev_resources[devIdx]->bo_stats_total_allocated
-   -= (s64)size;
+   for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+   ddr = drmcg->dev_resources[devIdx];
+
+   ddr->bo_stats_total_allocated -= (s64)size;
+
+   ddr->bo_stats_count_allocated--;
+   }
mutex_unlock(>drmcg_mutex);
 }
 EXPORT_SYMBOL(drmcg_unchg_bo_alloc);
-- 
2.25.0



[PATCH v2 05/11] drm, cgroup: Add peak GEM buffer allocation stats

2020-02-26 Thread Kenny Ho
gpu.buffer.peak.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Largest (high water mark) GEM buffer allocated in bytes.

Change-Id: I40fe4c13c1cea8613b3e04b802f3e1f19eaab4fc
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++
 include/linux/cgroup_drm.h  |  3 +++
 kernel/cgroup/drm.c | 12 
 3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index c041e672cc10..6199cc9a978f 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2075,6 +2075,12 @@ GPU Interface Files
 
Total GEM buffer allocation in bytes.
 
+  gpu.buffer.peak.stats
+   A read-only flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Largest (high water mark) GEM buffer allocated in bytes.
+
 GEM Buffer Ownership
 
 
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 34b0aec7c964..d90807627213 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -13,6 +13,7 @@
 
 enum drmcg_res_type {
DRMCG_TYPE_BO_TOTAL,
+   DRMCG_TYPE_BO_PEAK,
__DRMCG_TYPE_LAST,
 };
 
@@ -24,6 +25,8 @@ enum drmcg_res_type {
 struct drmcg_device_resource {
/* for per device stats */
s64 bo_stats_total_allocated;
+
+   s64 bo_stats_peak_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index addb096edac5..68b23693418b 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -277,6 +277,9 @@ static void drmcg_print_stats(struct drmcg_device_resource 
*ddr,
case DRMCG_TYPE_BO_TOTAL:
seq_printf(sf, "%lld\n", ddr->bo_stats_total_allocated);
break;
+   case DRMCG_TYPE_BO_PEAK:
+   seq_printf(sf, "%lld\n", ddr->bo_stats_peak_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -325,6 +328,12 @@ struct cftype files[] = {
.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
DRMCG_FTYPE_STATS),
},
+   {
+   .name = "buffer.peak.stats",
+   .seq_show = drmcg_seq_show,
+   .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
+   DRMCG_FTYPE_STATS),
+   },
{ } /* terminate */
 };
 
@@ -373,6 +382,9 @@ void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct 
drm_device *dev,
ddr = drmcg->dev_resources[devIdx];
 
ddr->bo_stats_total_allocated += (s64)size;
+
+   if (ddr->bo_stats_peak_allocated < (s64)size)
+   ddr->bo_stats_peak_allocated = (s64)size;
}
mutex_unlock(>drmcg_mutex);
 }
-- 
2.25.0



[PATCH v2 03/11] drm, cgroup: Initialize drmcg properties

2020-02-26 Thread Kenny Ho
drmcg initialization involves allocating a per cgroup, per device data
structure and setting the defaults.  There are two entry points for
drmcg init:

1) When struct drmcg is created via css_alloc, initialization is done
for each device

2) When DRM devices are created after drmcgs are created
  a) A per-device drmcg data structure is allocated at the beginning of
  DRM device creation so that drmcg can begin tracking usage
  statistics
  b) At the end of DRM device creation, drmcg_register_dev updates the
  per-device properties in case device-specific defaults need to be applied.

Entry point #2 usually applies to the root cgroup since it can be
created before DRM devices are available.  The drmcg controller will go
through all existing drm cgroups and initialize them with the new device
accordingly.

Change-Id: I64e421d8dfcc22ee8282cc1305960e20c2704db7
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/drm_drv.c  |   4 ++
 include/drm/drm_cgroup.h   |  18 +++
 include/drm/drm_device.h   |   7 +++
 include/drm/drm_drv.h  |   9 
 include/linux/cgroup_drm.h |  12 +
 kernel/cgroup/drm.c| 105 +
 6 files changed, 155 insertions(+)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index e418a61f5c85..e10bd42ebdba 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -644,6 +644,7 @@ int drm_dev_init(struct drm_device *dev,
mutex_init(>filelist_mutex);
mutex_init(>clientlist_mutex);
mutex_init(>master_mutex);
+   mutex_init(>drmcg_mutex);
 
dev->anon_inode = drm_fs_inode_new();
if (IS_ERR(dev->anon_inode)) {
@@ -680,6 +681,7 @@ int drm_dev_init(struct drm_device *dev,
if (ret)
goto err_setunique;
 
+   drmcg_device_early_init(dev);
return 0;
 
 err_setunique:
@@ -694,6 +696,7 @@ int drm_dev_init(struct drm_device *dev,
drm_fs_inode_free(dev->anon_inode);
 err_free:
put_device(dev->dev);
+   mutex_destroy(>drmcg_mutex);
mutex_destroy(>master_mutex);
mutex_destroy(>clientlist_mutex);
mutex_destroy(>filelist_mutex);
@@ -770,6 +773,7 @@ void drm_dev_fini(struct drm_device *dev)
 
put_device(dev->dev);
 
+   mutex_destroy(>drmcg_mutex);
mutex_destroy(>master_mutex);
mutex_destroy(>clientlist_mutex);
mutex_destroy(>filelist_mutex);
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 530c9a0b3238..fda426fba035 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -4,8 +4,17 @@
 #ifndef __DRM_CGROUP_H__
 #define __DRM_CGROUP_H__
 
+#include 
+
 #ifdef CONFIG_CGROUP_DRM
 
+/**
+ * Per DRM device properties for DRM cgroup controller for the purpose
+ * of storing per device defaults
+ */
+struct drmcg_props {
+};
+
 void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
void (*put_ddev)(struct drm_device *dev));
 
@@ -15,8 +24,13 @@ void drmcg_register_dev(struct drm_device *dev);
 
 void drmcg_unregister_dev(struct drm_device *dev);
 
+void drmcg_device_early_init(struct drm_device *device);
+
 #else
 
+struct drmcg_props {
+};
+
 static inline void drmcg_bind(
struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
void (*put_ddev)(struct drm_device *dev))
@@ -35,5 +49,9 @@ static inline void drmcg_unregister_dev(struct drm_device 
*dev)
 {
 }
 
+static inline void drmcg_device_early_init(struct drm_device *device)
+{
+}
+
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 1acfc3bbd3fb..a94598b8f670 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 
 struct drm_driver;
 struct drm_minor;
@@ -308,6 +309,12 @@ struct drm_device {
 */
struct drm_fb_helper *fb_helper;
 
+/** \name DRM Cgroup */
+   /*@{ */
+   struct mutex drmcg_mutex;
+   struct drmcg_props drmcg_props;
+   /*@} */
+
/* Everything below here is for legacy driver, never use! */
/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index cf13470810a5..1f65ac4d9bbf 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -715,6 +715,15 @@ struct drm_driver {
struct drm_device *dev,
uint32_t handle);
 
+   /**
+* @drmcg_custom_init
+*
+* Optional callback used to initialize drm cgroup per device properties
+* such as resource limit defaults.
+*/
+   void (*drmcg_custom_init)(struct drm_device *dev,
+   struct drmcg_props *props);
+
/**
 * @gem_vm_ops: Driver private ops for this object
 *
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 30

[PATCH v2 04/11] drm, cgroup: Add total GEM buffer allocation stats

2020-02-26 Thread Kenny Ho
The drm resources being measured here are the GEM buffer objects.  User
applications allocate and free these buffers.  In addition, a process
can allocate a buffer and share it with another process.  The consumer
of a shared buffer can also outlive the allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of the
buffer is deemed to be the cgroup to which the allocating process
belongs.  There is one set of cgroup stats per drm device.  Each
allocation is charged to the owning cgroup as well as all its ancestors.

Similar to the memory cgroup, migrating a process to a different cgroup
does not move the GEM buffer usages that the process started while in
the previous cgroup, to the new cgroup.

The following is an example to illustrate some of the operations.  Given
the following cgroup hierarchy (The letters are cgroup names with R
being the root cgroup.  The numbers in brackets are processes.  The
processes are placed with cgroup's 'No Internal Process Constraint' in
mind, so no process is placed in cgroup B.)

R (4, 5) -- A (6)
 \
  B -- C (7,8)
   \
    D (9)

Here is a list of operations and the associated effect on the sizes
tracked by the cgroups (for simplicity, each buffer is 1 unit in size.)

==  ==  ==  ==  ==  ===
R   A   B   C   D   Ops
==  ==  ==  ==  ==  ===
1   0   0   0   0   4 allocated a buffer
1   0   0   0   0   4 shared a buffer with 5
1   0   0   0   0   4 shared a buffer with 9
2   0   1   0   1   9 allocated a buffer
3   0   2   1   1   7 allocated a buffer
3   0   2   1   1   7 shared a buffer with 8
3   0   2   1   1   7 sharing with 9
3   0   2   1   1   7 release a buffer
3   0   2   1   1   7 migrate to cgroup D
3   0   2   1   1   9 release a buffer from 7
2   0   1   0   1   8 release a buffer from 7 (last ref to shared buf)
==  ==  ==  ==  ==  ===

gpu.buffer.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Total GEM buffer allocation in bytes.

Change-Id: Ibc1f646ca7dbc588e2d11802b156b524696a23e7
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  50 +-
 drivers/gpu/drm/drm_gem.c   |   9 ++
 include/drm/drm_cgroup.h|  16 +++
 include/drm/drm_gem.h   |  10 ++
 include/linux/cgroup_drm.h  |   6 ++
 kernel/cgroup/drm.c | 126 
 6 files changed, 216 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 7deff912185e..c041e672cc10 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -63,6 +63,7 @@ v1 is available under Documentation/admin-guide/cgroup-v1/.
5-7-1. RDMA Interface Files
  5-8. GPU
5-8-1. GPU Interface Files
+   5-8-2. GEM Buffer Ownership
  5-9. Misc
5-9-1. perf_event
  5-N. Non-normative information
@@ -2068,7 +2069,54 @@ of GPU-related resources.
 GPU Interface Files
 
 
-TODO
+  gpu.buffer.stats
+   A read-only flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Total GEM buffer allocation in bytes.
+
+GEM Buffer Ownership
+
+
+For the purpose of cgroup accounting and limiting, ownership of the
+buffer is deemed to be the cgroup to which the allocating process
+belongs.  There is one set of cgroup stats per drm device.  Each
+allocation is charged to the owning cgroup as well as all its ancestors.
+
+Similar to the memory cgroup, migrating a process to a different cgroup
+does not move the GEM buffer usages that the process started while in
+the previous cgroup, to the new cgroup.
+
+The following is an example to illustrate some of the operations.  Given
+the following cgroup hierarchy (The letters are cgroup names with R
+being the root cgroup.  The numbers in brackets are processes.  The
+processes are placed with cgroup's 'No Internal Process Constraint' in
+mind, so no process is placed in cgroup B.)
+
+R (4, 5) -- A (6)
+ \
+  B -- C (7,8)
+   \
+    D (9)
+
+Here is a list of operations and the associated effect on the sizes
+tracked by the cgroups (for simplicity, each buffer is 1 unit in size.)
+
+==  ==  ==  ==  ==  ===
+R   A   B   C   D   Ops
+==  ==  ==  ==  ==  ===
+1   0   0   0   0   4 allocated a buffer
+1   0   0   0   0   4 shared a buffer with 5
+1   0   0   0   0   4 shared a buffer with 9
+2   0   1   0   1   9 allocated a buffer
+3   0   2   1   1   7 allocated a buffer
+3   0   2   1   1   7 shared a buffer with 8
+3   0   2   1   1   7 sharing with 9
+3   0   2   1   1   7 release a buffer
+3

[PATCH v2 02/11] drm, cgroup: Bind drm and cgroup subsystem

2020-02-26 Thread Kenny Ho
Since the drm subsystem can be compiled as a module and drm devices can
be added and removed during run time, add several functions to bind the
drm subsystem as well as drm devices with drmcg.

Two pairs of functions:
drmcg_bind/drmcg_unbind - used to bind/unbind the drm subsystem to the
cgroup subsystem as the drm core initialize/exit.

drmcg_register_dev/drmcg_unregister_dev - used to register/unregister
drm devices to the cgroup subsystem as the devices are presented/removed
from userspace.

Change-Id: I1cb6b2080fc7d27979d886ef23e784341efafb41
---
 drivers/gpu/drm/drm_drv.c  |   8 +++
 include/drm/drm_cgroup.h   |  39 +++
 include/linux/cgroup_drm.h |   4 ++
 kernel/cgroup/drm.c| 131 +
 4 files changed, 182 insertions(+)
 create mode 100644 include/drm/drm_cgroup.h

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 7c18a980cd4b..e418a61f5c85 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "drm_crtc_internal.h"
 #include "drm_internal.h"
@@ -973,6 +974,8 @@ int drm_dev_register(struct drm_device *dev, unsigned long 
flags)
 
ret = 0;
 
+   drmcg_register_dev(dev);
+
DRM_INFO("Initialized %s %d.%d.%d %s for %s on minor %d\n",
 driver->name, driver->major, driver->minor,
 driver->patchlevel, driver->date,
@@ -1007,6 +1010,8 @@ EXPORT_SYMBOL(drm_dev_register);
  */
 void drm_dev_unregister(struct drm_device *dev)
 {
+   drmcg_unregister_dev(dev);
+
if (drm_core_check_feature(dev, DRIVER_LEGACY))
drm_lastclose(dev);
 
@@ -1113,6 +1118,7 @@ static const struct file_operations drm_stub_fops = {
 
 static void drm_core_exit(void)
 {
+   drmcg_unbind();
unregister_chrdev(DRM_MAJOR, "drm");
debugfs_remove(drm_debugfs_root);
drm_sysfs_destroy();
@@ -1139,6 +1145,8 @@ static int __init drm_core_init(void)
if (ret < 0)
goto error;
 
+   drmcg_bind(_minor_acquire, _dev_put);
+
drm_core_init_complete = true;
 
DRM_DEBUG("Initialized\n");
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
new file mode 100644
index ..530c9a0b3238
--- /dev/null
+++ b/include/drm/drm_cgroup.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef __DRM_CGROUP_H__
+#define __DRM_CGROUP_H__
+
+#ifdef CONFIG_CGROUP_DRM
+
+void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
+   void (*put_ddev)(struct drm_device *dev));
+
+void drmcg_unbind(void);
+
+void drmcg_register_dev(struct drm_device *dev);
+
+void drmcg_unregister_dev(struct drm_device *dev);
+
+#else
+
+static inline void drmcg_bind(
+   struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
+   void (*put_ddev)(struct drm_device *dev))
+{
+}
+
+static inline void drmcg_unbind(void)
+{
+}
+
+static inline void drmcg_register_dev(struct drm_device *dev)
+{
+}
+
+static inline void drmcg_unregister_dev(struct drm_device *dev)
+{
+}
+
+#endif /* CONFIG_CGROUP_DRM */
+#endif /* __DRM_CGROUP_H__ */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 345af54a5d41..307bb75db248 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -5,6 +5,10 @@
 #define _CGROUP_DRM_H
 
 #include 
+#include 
+
+/* limit defined per the way drm_minor_alloc operates */
+#define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
 
 #ifdef CONFIG_CGROUP_DRM
 
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 5e38a8230922..061bb9c458e4 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -1,11 +1,142 @@
 // SPDX-License-Identifier: MIT
 // Copyright 2019 Advanced Micro Devices, Inc.
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 static struct drmcg *root_drmcg __read_mostly;
 
+/* global mutex for drmcg across all devices */
+static DEFINE_MUTEX(drmcg_mutex);
+
+static DECLARE_BITMAP(known_devs, MAX_DRM_DEV);
+
+static struct drm_minor (*(*acquire_drm_minor)(unsigned int minor_id));
+
+static void (*put_drm_dev)(struct drm_device *dev);
+
+/**
+ * drmcg_bind - Bind DRM subsystem to cgroup subsystem
+ * @acq_dm: function pointer to the drm_minor_acquire function
+ * @put_ddev: function pointer to the drm_dev_put function
+ *
+ * This function binds some functions from the DRM subsystem and makes
+ * them available to the drmcg subsystem.
+ *
+ * drmcg_unbind does the opposite of this function
+ */
+void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
+   void (*put_ddev)(struct drm_device *dev))
+{
+   mutex_lock(_mutex);
+   acquire_drm_minor = acq_dm;
+   put_drm_dev = put_ddev;
+   mutex_unlock(_mutex);
+}
+EXPORT_SYMBOL(drmcg_bind);
+
+/**
+ * drmcg_unbind - Unbind DRM subsystem from cgroup subsystem
+ *
+ * drmcg_bind 

[PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-02-26 Thread Kenny Ho
 instead of the specific resources in a GPU.
With a drm cgroup controller, we can enable alternate, fine-grained, sub-GPU
resource management (in addition to what may be available via GPU
virtualization.)

In addition to production use, the DRM cgroup can also help with testing
graphics application robustness by providing a means to artificially limit DRM
resources available to the applications.


Challenges
==
While there is common infrastructure in DRM that is shared across many vendors
(the scheduler [4] for example), there are also aspects of DRM that are vendor
specific.  To accommodate this, we borrowed the mechanism used by the cgroup to
handle different kinds of cgroup controllers.

Resources for DRM are also often device (GPU) specific instead of system
specific, and a system may contain more than one GPU.  For this, we borrowed
some of the ideas from the RDMA cgroup controller.

Approach

To experiment with the idea of a DRM cgroup, we would like to start with basic
accounting and statistics, then continue to iterate and add regulating
mechanisms into the driver.

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
[2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
[3] https://www.spinics.net/lists/cgroups/msg20720.html
[4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
[5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
[6] 
https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
[7] https://github.com/RadeonOpenCompute/k8s-device-plugin
[8] https://github.com/kubernetes/kubernetes/issues/52757

Kenny Ho (11):
  cgroup: Introduce cgroup for drm subsystem
  drm, cgroup: Bind drm and cgroup subsystem
  drm, cgroup: Initialize drmcg properties
  drm, cgroup: Add total GEM buffer allocation stats
  drm, cgroup: Add peak GEM buffer allocation stats
  drm, cgroup: Add GEM buffer allocation count stats
  drm, cgroup: Add total GEM buffer allocation limit
  drm, cgroup: Add peak GEM buffer allocation limit
  drm, cgroup: Add compute as gpu cgroup resource
  drm, cgroup: add update trigger after limit change
  drm/amdgpu: Integrate with DRM cgroup

 Documentation/admin-guide/cgroup-v2.rst   | 138 ++-
 Documentation/cgroup-v1/drm.rst   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  48 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|   6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   7 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +
 .../amd/amdkfd/kfd_process_queue_manager.c| 153 +++
 drivers/gpu/drm/drm_drv.c |  12 +
 drivers/gpu/drm/drm_gem.c |  16 +-
 include/drm/drm_cgroup.h  |  81 ++
 include/drm/drm_device.h  |   7 +
 include/drm/drm_drv.h |  19 +
 include/drm/drm_gem.h |  12 +-
 include/linux/cgroup_drm.h| 138 +++
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig  |   5 +
 kernel/cgroup/Makefile|   1 +
 kernel/cgroup/drm.c   | 913 ++
 19 files changed, 1563 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

-- 
2.25.0



Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource

2020-02-20 Thread Kenny Ho
Thanks, I will take a look.

Regards,
Kenny

On Wed, Feb 19, 2020 at 1:38 PM Johannes Weiner  wrote:
>
> On Wed, Feb 19, 2020 at 11:28:48AM -0500, Kenny Ho wrote:
> > On Wed, Feb 19, 2020 at 11:18 AM Johannes Weiner  wrote:
> > >
> > > Yes, I'd go with absolute units when it comes to memory, because it's
> > > not a renewable resource like CPU and IO, and so we do have cliff
> > > behavior around the edge where you transition from ok to not-enough.
> > >
> > > memory.low is a bit in flux right now, so if anything is unclear
> > > around its semantics, please feel free to reach out.
> >
> > I am not familiar with the discussion, would you point me to a
> > relevant thread please?
>
> Here is a cleanup patch, not yet merged, that documents the exact
> semantics and behavioral considerations:
>
> https://lore.kernel.org/linux-mm/20191213192158.188939-3-han...@cmpxchg.org/
>
> But the high-level idea is this: you assign each cgroup or cgroup
> subtree a chunk of the resource that it's guaranteed to be able to
> consume. It *can* consume beyond that threshold if available, but that
> overage may get reclaimed again if somebody else needs it instead.
>
> This allows you to do a ballpark distribution of the resource between
> different workloads, while the kernel retains the ability to optimize
> allocation of spare resources - because in practice, workload demand
> varies over time, workloads disappear and new ones start up etc.
>
> > In addition, is there some kind of order of preference for
> > implementing low vs high vs max?
>
> If you implement only one allocation model, the preference would be on
> memory.low. Limits are rigid and per definition waste resources, so in
> practice we're moving away from them.


Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource

2020-02-19 Thread Kenny Ho
On Wed, Feb 19, 2020 at 11:18 AM Johannes Weiner  wrote:
>
> Yes, I'd go with absolute units when it comes to memory, because it's
> not a renewable resource like CPU and IO, and so we do have cliff
> behavior around the edge where you transition from ok to not-enough.
>
> memory.low is a bit in flux right now, so if anything is unclear
> around its semantics, please feel free to reach out.

I am not familiar with the discussion, would you point me to a
relevant thread please?  In addition, is there some kind of order of
preference for implementing low vs high vs max?

Regards,
Kenny


Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource

2020-02-14 Thread Kenny Ho
Hi Tejun,

On Fri, Feb 14, 2020 at 2:17 PM Tejun Heo  wrote:
>
> I have to agree with Daniel here. My apologies if I weren't clear
> enough. Here's one interface I can think of:
>
>  * compute weight: The same format as io.weight. Proportional control
>of gpu compute.
>
>  * memory low: Please see how the system memory.low behaves. For gpus,
>it'll need per-device entries.
>
> Note that for both, there is one number to configure and conceptually
> it's pretty clear to everybody what that number means, which is not to
> say that it's clear to implement but it's much better to deal with
> that on this side of the interface than the other.

Can you elaborate, per your understanding, how the lgpu weight
attribute differs from the io.weight you suggested?  Is it merely a
formatting/naming issue or is it the implementation details that you
find troubling?  From my perspective, the weight attribute is implemented
as you suggested back in RFCv4 (proportional control on top of a unit
- either a physical or a time unit.)

Perhaps more explicit questions would help me understand what you
mean. If I remove the 'list' and 'count' attributes leaving just
weight, is that satisfactory?  Are you saying the idea of affinity or
named-resource is banned from cgroup entirely (even though it exists
in the form of cpuset already and users are interested in having such
options [i.e. userspace OpenCL] when needed?)

To be clear, I am not saying no proportional control.  I am saying
give the user the options, which is what has been implemented.

> cc'ing Johannes. Do you have anything on mind regarding how gpu memory
> configuration should look like? e.g. should it go w/ weights rather
> than absolute units (I don't think so given that it'll most likely
> need limits at some point too but still and there are benefits from
> staying consistent with system memory).
>
> Also, a rather trivial high level question. Is drm a good controller
> name given that other controller names are like cpu, memory, io?

There was a discussion about naming early in the RFC (I believe
RFCv2); the consensus then was to use drmcg to align with the drm
subsystem.  I have no problem renaming it to gpucg or something
similar if that is the last thing that's blocking acceptance.  For
now, I would like to get some clarity on the implementation before
having more code churn.

Regards,
Kenny


> Thanks.
>
> --
> tejun


Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource

2020-02-14 Thread Kenny Ho
On Fri, Feb 14, 2020 at 1:34 PM Daniel Vetter  wrote:
>
> I think guidance from Tejun in previos discussions was pretty clear that
> he expects cgroups to be both a) standardized and c) sufficient clear
> meaning that end-users have a clear understanding of what happens when
> they change the resource allocation.
>
> I'm not sure lgpu here, at least as specified, passes either.

I disagree (at least on the characterization of the feedback
provided.)  I believe this series has satisfied the spirit of Tejun's
guidance so far (the weight knob for lgpu, for example, was
specifically implemented based on his input.)  But, I will let Tejun
speak for himself after he considered the implementation in detail.

Regards,
Kenny


> But I also
> don't have much clue, so pulled Jason in - he understands how this all
> gets reflected to userspace apis a lot better than me.
> -Daniel
>
>
> >
> > > If it's carving up compute power, what's actually being carved up?  Time? 
> > >  Execution units/waves/threads?  Even if that's the case, what advantage 
> > > does it give to have it in terms of a fixed set of lgpus where each 
> > > cgroup gets to pick a fixed set.  Does affinity matter that much?  Why 
> > > not just say how many waves the GPU supports and that they have to be 
> > > allocated in chunks of 16 waves (pulling a number out of thin air) and 
> > > let the cgroup specify how many waves it wants.
> > >
> > > Don't get me wrong here.  I'm all for the notion of being able to use 
> > > cgroups to carve up GPU compute resources.  However, this sounds to me 
> > > like the most AMD-specific solution possible.  We (Intel) could probably 
> > > do some sort of carving up as well but we'd likely want to do it with 
> > > preemption and time-slicing rather than handing out specific EUs.
> >
> > This has been discussed in the RFC before
> > (https://www.spinics.net/lists/cgroups/msg23469.html.)  As mentioned
> > before, the idea of a compute unit is hardly an AMD specific thing as
> > it is in the OpenCL standard and part of the architecture of many
> > different vendors.  In addition, the interface presented here supports
> > Intel's use case.  What you described is what I considered as the
> > "anonymous resources" view of the lgpu.  What you/Intel can do, is to
> > register your device to drmcg to have 100 lgpu and users can specify
> > simply by count.  So if they want to allocate 5% for a cgroup, they
> > would set count=5.  Per the documentation in this patch: "Some DRM
> > devices may only support lgpu as anonymous resources.  In such case,
> > the significance of the position of the set bits in list will be
> > ignored."  What Intel does with the user expressed configuration of "5
> > out of 100" is entirely up to Intel (time slice if you like, change to
> > specific EUs later if you like, or make it driver configurable to
> > support both if you like.)
> >
> > Regards,
> > Kenny
> >
> > >
> > > On Fri, Feb 14, 2020 at 9:57 AM Kenny Ho  wrote:
> > >>
> > >> drm.lgpu
> > >>   A read-write nested-keyed file which exists on all cgroups.
> > >>   Each entry is keyed by the DRM device's major:minor.
> > >>
> > >>   lgpu stands for logical GPU, it is an abstraction used to
> > >>   subdivide a physical DRM device for the purpose of resource
> > >>   management.  This file stores user configuration while the
> > >>   drm.lgpu.effective reflects the actual allocation after
> > >>   considering the relationship between the cgroups and their
> > >>   configurations.
> > >>
> > >>   The lgpu is a discrete quantity that is device specific (i.e.
> > >>   some DRM devices may have 64 lgpus while others may have 100
> > >>   lgpus.)  The lgpu is a single quantity that can be allocated
> > >>   in three different ways denoted by the following nested keys.
> > >>
> > >> = ==
> > >> weightAllocate by proportion in relationship with
> > >>   active sibling cgroups
> > >> count Allocate by amount statically, treat lgpu as
> > >>   anonymous resources
> > >> list  Allocate statically, treat lgpu as named
> > >>   resource
> > >> = ==
> > >>
> > >

Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource

2020-02-14 Thread Kenny Ho
Hi Jason,

Thanks for the review.

On Fri, Feb 14, 2020 at 11:44 AM Jason Ekstrand  wrote:
>
> Pardon my ignorance but I'm a bit confused by this.  What is a "logical GPU"? 
>  What are we subdividing?  Are we carving up memory?  Compute power?  Both?

The intention is compute but it is up to the individual drm driver to decide.

> If it's carving up compute power, what's actually being carved up?  Time?  
> Execution units/waves/threads?  Even if that's the case, what advantage does 
> it give to have it in terms of a fixed set of lgpus where each cgroup gets to 
> pick a fixed set.  Does affinity matter that much?  Why not just say how many 
> waves the GPU supports and that they have to be allocated in chunks of 16 
> waves (pulling a number out of thin air) and let the cgroup specify how many 
> waves it wants.
>
> Don't get me wrong here.  I'm all for the notion of being able to use cgroups 
> to carve up GPU compute resources.  However, this sounds to me like the most 
> AMD-specific solution possible.  We (Intel) could probably do some sort of 
> carving up as well but we'd likely want to do it with preemption and 
> time-slicing rather than handing out specific EUs.

This has been discussed in the RFC before
(https://www.spinics.net/lists/cgroups/msg23469.html.)  As mentioned
before, the idea of a compute unit is hardly an AMD specific thing as
it is in the OpenCL standard and part of the architecture of many
different vendors.  In addition, the interface presented here supports
Intel's use case.  What you described is what I considered as the
"anonymous resources" view of the lgpu.  What you/Intel can do, is to
register your device to drmcg to have 100 lgpu and users can specify
simply by count.  So if they want to allocate 5% for a cgroup, they
would set count=5.  Per the documentation in this patch: "Some DRM
devices may only support lgpu as anonymous resources.  In such case,
the significance of the position of the set bits in list will be
ignored."  What Intel does with the user expressed configuration of "5
out of 100" is entirely up to Intel (time slice if you like, change to
specific EUs later if you like, or make it driver configurable to
support both if you like.)

Regards,
Kenny

>
> On Fri, Feb 14, 2020 at 9:57 AM Kenny Ho  wrote:
>>
>> drm.lgpu
>>   A read-write nested-keyed file which exists on all cgroups.
>>   Each entry is keyed by the DRM device's major:minor.
>>
>>   lgpu stands for logical GPU, it is an abstraction used to
>>   subdivide a physical DRM device for the purpose of resource
>>   management.  This file stores user configuration while the
>>   drm.lgpu.effective reflects the actual allocation after
>>   considering the relationship between the cgroups and their
>>   configurations.
>>
>>   The lgpu is a discrete quantity that is device specific (i.e.
>>   some DRM devices may have 64 lgpus while others may have 100
>>   lgpus.)  The lgpu is a single quantity that can be allocated
>>   in three different ways denoted by the following nested keys.
>>
>> = ==
>> weightAllocate by proportion in relationship with
>>   active sibling cgroups
>> count Allocate by amount statically, treat lgpu as
>>   anonymous resources
>> list  Allocate statically, treat lgpu as named
>>   resource
>> = ==
>>
>>   For example:
>>   226:0 weight=100 count=256 list=0-255
>>   226:1 weight=100 count=4 list=0,2,4,6
>>   226:2 weight=100 count=32 list=32-63
>>   226:3 weight=100 count=0 list=
>>   226:4 weight=500 count=0 list=
>>
>>   lgpu is represented by a bitmap and uses the bitmap_parselist
>>   kernel function so the list key input format is a
>>   comma-separated list of decimal numbers and ranges.
>>
>>   Consecutively set bits are shown as two hyphen-separated decimal
>>   numbers, the smallest and largest bit numbers set in the range.
>>   Optionally each range can be postfixed to denote that only parts
>>   of it should be set.  The range will divided to groups of
>>   specific size.
>>   Syntax: range:used_size/group_size
>>   Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
>>
>>   The count key is the hamming weight / hweight of the bitmap.
>>
>>   Weight, count and list accept the max and default keywords.
>>
>>   Some DRM devices may only support lgpu as anonym

[PATCH 01/11] cgroup: Introduce cgroup for drm subsystem

2020-02-14 Thread Kenny Ho
With the increased importance of machine learning, data science and
other cloud-based applications, GPUs are already in production use in
data centers today.  Existing GPU resource management is, however, very
coarse-grained, as sysadmins are only able to distribute workloads on a
per-GPU basis.  An alternative is to use GPU virtualization (with or
without SRIOV), but it generally acts on the entire GPU instead of the
specific resources in a GPU.  With a drm cgroup controller, we can
enable alternate, fine-grained, sub-GPU resource management (in addition
to what may be available via GPU virtualization.)

Change-Id: Ia90aed8c4cb89ff20d8216a903a765655b44fc9a
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst | 18 -
 Documentation/cgroup-v1/drm.rst |  1 +
 include/linux/cgroup_drm.h  | 92 +
 include/linux/cgroup_subsys.h   |  4 ++
 init/Kconfig|  5 ++
 kernel/cgroup/Makefile  |  1 +
 kernel/cgroup/drm.c | 42 +++
 7 files changed, 161 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 5361ebec3361..384db8df0f30 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -61,8 +61,10 @@ v1 is available under Documentation/admin-guide/cgroup-v1/.
  5-6. Device
  5-7. RDMA
5-7-1. RDMA Interface Files
- 5-8. Misc
-   5-8-1. perf_event
+ 5-8. DRM
+   5-8-1. DRM Interface Files
+ 5-9. Misc
+   5-9-1. perf_event
  5-N. Non-normative information
5-N-1. CPU controller root cgroup process behaviour
5-N-2. IO controller root cgroup process behaviour
@@ -2051,6 +2053,18 @@ RDMA Interface Files
  ocrdma1 hca_handle=1 hca_object=23
 
 
+DRM
+---
+
+The "drm" controller regulates the distribution and accounting of
+DRM (Direct Rendering Manager) and GPU-related resources.
+
+DRM Interface Files
+
+
+TODO
+
+
 Misc
 
 
diff --git a/Documentation/cgroup-v1/drm.rst b/Documentation/cgroup-v1/drm.rst
new file mode 100644
index ..5f5658e1f5ed
--- /dev/null
+++ b/Documentation/cgroup-v1/drm.rst
@@ -0,0 +1 @@
+Please see ../cgroup-v2.rst for details
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
new file mode 100644
index ..ba7981ac3afc
--- /dev/null
+++ b/include/linux/cgroup_drm.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef _CGROUP_DRM_H
+#define _CGROUP_DRM_H
+
+#include 
+
+#ifdef CONFIG_CGROUP_DRM
+
+/**
+ * The DRM cgroup controller data structure.
+ */
+struct drmcg {
+   struct cgroup_subsys_state  css;
+};
+
+/**
+ * css_to_drmcg - get the corresponding drmcg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Return: DRM cgroup that contains the @css
+ */
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+   return css ? container_of(css, struct drmcg, css) : NULL;
+}
+
+/**
+ * drmcg_get - get the drmcg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to
+ *
+ * Return: reference to the DRM cgroup the task belongs to
+ */
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+   return css_to_drmcg(task_get_css(task, drm_cgrp_id));
+}
+
+/**
+ * drmcg_put - put a drmcg reference
+ * @drmcg: the target drmcg
+ *
+ * Put a reference obtained via drmcg_get
+ */
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+   if (drmcg)
+   css_put(>css);
+}
+
+/**
+ * drmcg_parent - find the parent of a drm cgroup
+ * @cg: the target drmcg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Return: parent DRM cgroup of @cg
+ */
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+   return css_to_drmcg(cg->css.parent);
+}
+
+#else /* CONFIG_CGROUP_DRM */
+
+struct drmcg {
+};
+
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+   return NULL;
+}
+
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+   return NULL;
+}
+
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+}
+
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+   return NULL;
+}
+
+#endif /* CONFIG_CGROUP_DRM */
+#endif /* _CGROUP_DRM_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..ddedad809e8b 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,10 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_DRM)
+SUBSYS(drm)
+#endif
+
 /*
 

[PATCH 02/11] drm, cgroup: Bind drm and cgroup subsystem

2020-02-14 Thread Kenny Ho
Since the drm subsystem can be compiled as a module and drm devices can
be added and removed during run time, add several functions to bind the
drm subsystem as well as drm devices with drmcg.

Two pairs of functions:
drmcg_bind/drmcg_unbind - used to bind/unbind the drm subsystem to the
cgroup subsystem as the drm core initialize/exit.

drmcg_register_dev/drmcg_unregister_dev - used to register/unregister
drm devices to the cgroup subsystem as the devices are presented/removed
from userspace.

Change-Id: I1cb6b2080fc7d27979d886ef23e784341efafb41
---
 drivers/gpu/drm/drm_drv.c  |   8 +++
 include/drm/drm_cgroup.h   |  39 +++
 include/linux/cgroup_drm.h |   4 ++
 kernel/cgroup/drm.c| 131 +
 4 files changed, 182 insertions(+)
 create mode 100644 include/drm/drm_cgroup.h

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 1b9b40a1c7c9..8e59cc5a5bde 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "drm_crtc_internal.h"
 #include "drm_internal.h"
@@ -972,6 +973,8 @@ int drm_dev_register(struct drm_device *dev, unsigned long 
flags)
 
ret = 0;
 
+   drmcg_register_dev(dev);
+
DRM_INFO("Initialized %s %d.%d.%d %s for %s on minor %d\n",
 driver->name, driver->major, driver->minor,
 driver->patchlevel, driver->date,
@@ -1006,6 +1009,8 @@ EXPORT_SYMBOL(drm_dev_register);
  */
 void drm_dev_unregister(struct drm_device *dev)
 {
+   drmcg_unregister_dev(dev);
+
if (drm_core_check_feature(dev, DRIVER_LEGACY))
drm_lastclose(dev);
 
@@ -1112,6 +1117,7 @@ static const struct file_operations drm_stub_fops = {
 
 static void drm_core_exit(void)
 {
+   drmcg_unbind();
unregister_chrdev(DRM_MAJOR, "drm");
debugfs_remove(drm_debugfs_root);
drm_sysfs_destroy();
@@ -1138,6 +1144,8 @@ static int __init drm_core_init(void)
if (ret < 0)
goto error;
 
+   drmcg_bind(_minor_acquire, _dev_put);
+
drm_core_init_complete = true;
 
DRM_DEBUG("Initialized\n");
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
new file mode 100644
index ..530c9a0b3238
--- /dev/null
+++ b/include/drm/drm_cgroup.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef __DRM_CGROUP_H__
+#define __DRM_CGROUP_H__
+
+#ifdef CONFIG_CGROUP_DRM
+
+void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
+   void (*put_ddev)(struct drm_device *dev));
+
+void drmcg_unbind(void);
+
+void drmcg_register_dev(struct drm_device *dev);
+
+void drmcg_unregister_dev(struct drm_device *dev);
+
+#else
+
+static inline void drmcg_bind(
+   struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
+   void (*put_ddev)(struct drm_device *dev))
+{
+}
+
+static inline void drmcg_unbind(void)
+{
+}
+
+static inline void drmcg_register_dev(struct drm_device *dev)
+{
+}
+
+static inline void drmcg_unregister_dev(struct drm_device *dev)
+{
+}
+
+#endif /* CONFIG_CGROUP_DRM */
+#endif /* __DRM_CGROUP_H__ */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index ba7981ac3afc..854591bbb430 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -5,6 +5,10 @@
 #define _CGROUP_DRM_H
 
 #include 
+#include 
+
+/* limit defined per the way drm_minor_alloc operates */
+#define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
 
 #ifdef CONFIG_CGROUP_DRM
 
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index e97861b3cb30..37f98dc47268 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -1,11 +1,142 @@
 // SPDX-License-Identifier: MIT
 // Copyright 2019 Advanced Micro Devices, Inc.
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 static struct drmcg *root_drmcg __read_mostly;
 
+/* global mutex for drmcg across all devices */
+static DEFINE_MUTEX(drmcg_mutex);
+
+static DECLARE_BITMAP(known_devs, MAX_DRM_DEV);
+
+static struct drm_minor (*(*acquire_drm_minor)(unsigned int minor_id));
+
+static void (*put_drm_dev)(struct drm_device *dev);
+
+/**
+ * drmcg_bind - Bind DRM subsystem to cgroup subsystem
+ * @acq_dm: function pointer to the drm_minor_acquire function
+ * @put_ddev: function pointer to the drm_dev_put function
+ *
+ * This function binds some functions from the DRM subsystem and makes
+ * them available to the drmcg subsystem.
+ *
+ * drmcg_unbind does the opposite of this function
+ */
+void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
+   void (*put_ddev)(struct drm_device *dev))
+{
+   mutex_lock(_mutex);
+   acquire_drm_minor = acq_dm;
+   put_drm_dev = put_ddev;
+   mutex_unlock(_mutex);
+}
+EXPORT_SYMBOL(drmcg_bind);
+
+/**
+ * drmcg_unbind - Unbind DRM subsystem from cgroup subsystem
+ *
+ * drmcg_bind 

[PATCH 11/11] drm/amdgpu: Integrate with DRM cgroup

2020-02-14 Thread Kenny Ho
The number of logical gpus (lgpu) is defined to be the number of compute
units (CU) for a device.  The lgpu allocation limit only applies to
compute workloads for the moment (enforced via kfd queue creation.)  Any
cu_mask update is validated against the availability of compute units
as defined by the drmcg that the kfd process belongs to.
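
In essence, the validation is a bitmap subset test against the cgroup's
effective lgpu mask.  A simplified sketch of that check (the helper name
and parameters below are illustrative only; the actual entry point added
by this patch is pqm_drmcg_lgpu_validate, which also has to look up the
queue and the process's drmcg):

	/* Sketch: every CU the queue asks for must be granted by the cgroup. */
	static bool cu_mask_within_lgpu(const u32 *cu_mask, unsigned int nbits,
					const unsigned long *lgpu_eff)
	{
		DECLARE_BITMAP(requested, MAX_DRMCG_LGPU_CAPACITY);

		bitmap_zero(requested, MAX_DRMCG_LGPU_CAPACITY);
		bitmap_from_arr32(requested, cu_mask, nbits);

		return bitmap_subset(requested, lgpu_eff, MAX_DRMCG_LGPU_CAPACITY);
	}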

Change-Id: I2930e76ef9ac6d36d0feb81f604c89a4208e6614
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  29 
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   6 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +
 .../amd/amdkfd/kfd_process_queue_manager.c| 153 ++
 5 files changed, 195 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 47b0f2957d1f..a45c7b5d23b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev 
*dst, struct kgd_dev *s
valid;  \
})
 
+int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
+   struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
+   unsigned int nbits);
+
 /* GPUVM API */
 int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int 
pasid,
void **vm, void **process_info,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3ebef1d62346..dc31b9af2c72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1402,9 +1402,31 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, 
unsigned int pipe,
 static void amdgpu_drmcg_custom_init(struct drm_device *dev,
struct drmcg_props *props)
 {
+   struct amdgpu_device *adev = dev->dev_private;
+
+   props->lgpu_capacity = adev->gfx.cu_info.number;
+   bitmap_zero(props->lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
+   bitmap_fill(props->lgpu_slots, props->lgpu_capacity);
+
props->limit_enforced = true;
 }
 
+static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
+   struct task_struct *task, struct drmcg_device_resource *ddr,
+   enum drmcg_res_type res_type)
+{
+   struct amdgpu_device *adev = dev->dev_private;
+
+   switch (res_type) {
+   case DRMCG_TYPE_LGPU:
+   amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
+ddr->lgpu_eff, dev->drmcg_props.lgpu_capacity);
+   break;
+   default:
+   break;
+   }
+}
+
 #else
 
 static void amdgpu_drmcg_custom_init(struct drm_device *dev,
@@ -1412,6 +1434,12 @@ static void amdgpu_drmcg_custom_init(struct drm_device 
*dev,
 {
 }
 
+static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
+   struct task_struct *task, struct drmcg_device_resource *ddr,
+   enum drmcg_res_type res_type)
+{
+}
+
 #endif /* CONFIG_CGROUP_DRM */
 
 static struct drm_driver kms_driver = {
@@ -1448,6 +1476,7 @@ static struct drm_driver kms_driver = {
.gem_prime_mmap = amdgpu_gem_prime_mmap,
 
.drmcg_custom_init = amdgpu_drmcg_custom_init,
+   .drmcg_limit_updated = amdgpu_drmcg_limit_updated,
 
.name = DRIVER_NAME,
.desc = DRIVER_DESC,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 275f79ab0900..f39555c0f1d8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -449,6 +449,12 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct 
kfd_process *p,
return -EFAULT;
}
 
+   if (!pqm_drmcg_lgpu_validate(p, args->queue_id, properties.cu_mask, 
cu_mask_size)) {
+   pr_debug("CU mask not permitted by DRM Cgroup");
+   kfree(properties.cu_mask);
+   return -EACCES;
+   }
+
mutex_lock(>mutex);
 
retval = pqm_set_cu_mask(>pqm, args->queue_id, );
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index c0b0defc8f7a..9053b1b7fb10 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -921,6 +921,9 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
   u32 *ctl_stack_used_size,
   u32 *save_area_used_size);
 
+bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
+   unsigned int cu_mask_size);
+
 int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
  unsigned int fence_value,
  unsigned int timeout_ms);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_p

[PATCH 04/11] drm, cgroup: Add total GEM buffer allocation stats

2020-02-14 Thread Kenny Ho
The drm resources being measured here are the GEM buffer objects.  User
applications allocate and free these buffers.  In addition, a process
can allocate a buffer and share it with another process.  The consumer
of a shared buffer can also outlive the allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of the
buffer is deemed to be the cgroup to which the allocating process
belongs.  There is one set of cgroup stats per drm device.  Each
allocation is charged to the owning cgroup as well as all its ancestors.

Similar to the memory cgroup, migrating a process to a different cgroup
does not move the GEM buffer usages that the process started while in
the previous cgroup, to the new cgroup.

The following is an example to illustrate some of the operations.  Given
the following cgroup hierarchy (The letters are cgroup names with R
being the root cgroup.  The numbers in brackets are processes.  The
processes are placed with cgroup's 'No Internal Process Constraint' in
mind, so no process is placed in cgroup B.)

R (4, 5) -- A (6)
 \
  B -- C (7,8)
   \
    D (9)

Here is a list of operations and the associated effect on the sizes
tracked by the cgroups (for simplicity, each buffer is 1 unit in size.)

==  ==  ==  ==  ==  ===
R   A   B   C   D   Ops
==  ==  ==  ==  ==  ===
1   0   0   0   0   4 allocated a buffer
1   0   0   0   0   4 shared a buffer with 5
1   0   0   0   0   4 shared a buffer with 9
2   0   1   0   1   9 allocated a buffer
3   0   2   1   1   7 allocated a buffer
3   0   2   1   1   7 shared a buffer with 8
3   0   2   1   1   7 sharing with 9
3   0   2   1   1   7 release a buffer
3   0   2   1   1   7 migrate to cgroup D
3   0   2   1   1   9 release a buffer from 7
2   0   1   0   1   8 release a buffer from 7 (last ref to shared buf)
==  ==  ==  ==  ==  ===

drm.buffer.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Total GEM buffer allocation in bytes.

Change-Id: Ibc1f646ca7dbc588e2d11802b156b524696a23e7
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  50 +-
 drivers/gpu/drm/drm_gem.c   |   9 ++
 include/drm/drm_cgroup.h|  16 +++
 include/drm/drm_gem.h   |  10 ++
 include/linux/cgroup_drm.h  |   6 ++
 kernel/cgroup/drm.c | 126 
 6 files changed, 216 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 384db8df0f30..2d8162c109f3 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -63,6 +63,7 @@ v1 is available under Documentation/admin-guide/cgroup-v1/.
5-7-1. RDMA Interface Files
  5-8. DRM
5-8-1. DRM Interface Files
+   5-8-2. GEM Buffer Ownership
  5-9. Misc
5-9-1. perf_event
  5-N. Non-normative information
@@ -2062,7 +2063,54 @@ of DRM (Direct Rendering Manager) and GPU-related 
resources.
 DRM Interface Files
 
 
-TODO
+  drm.buffer.stats
+   A read-only flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Total GEM buffer allocation in bytes.
+
+GEM Buffer Ownership
+
+
+For the purpose of cgroup accounting and limiting, ownership of the
+buffer is deemed to be the cgroup for which the allocating process
+belongs to.  There is one cgroup stats per drm device.  Each allocation
+is charged to the owning cgroup as well as all its ancestors.
+
+Similar to the memory cgroup, migrating a process to a different cgroup
+does not move the GEM buffer usages that the process started while in
+previous cgroup, to the new cgroup.
+
+The following is an example to illustrate some of the operations.  Given
+the following cgroup hierarchy (The letters are cgroup names with R
+being the root cgroup.  The numbers in brackets are processes.  The
+processes are placed with cgroup's 'No Internal Process Constraint' in
+mind, so no process is placed in cgroup B.)
+
+R (4, 5) -- A (6)
+ \
+  B  C (7,8)
+   \
+D (9)
+
+Here is a list of operation and the associated effect on the size
+track by the cgroups (for simplicity, each buffer is 1 unit in size.)
+
+==  ==  ==  ==  ==  ===
+R   A   B   C   D   Ops
+==  ==  ==  ==  ==  ===
+1   0   0   0   0   4 allocated a buffer
+1   0   0   0   0   4 shared a buffer with 5
+1   0   0   0   0   4 shared a buffer with 9
+2   0   1   0   1   9 allocated a buffer
+3   0   2   1   1   7 allocated a buffer
+3   0   2   1   1   7 shared a buffer with 8
+3   0   2   1   1   7 sharing with 9
+3   0

[PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource

2020-02-14 Thread Kenny Ho
drm.lgpu
  A read-write nested-keyed file which exists on all cgroups.
  Each entry is keyed by the DRM device's major:minor.

  lgpu stands for logical GPU; it is an abstraction used to
  subdivide a physical DRM device for the purpose of resource
  management.  This file stores the user configuration, while
  drm.lgpu.effective reflects the actual allocation after
  considering the relationship between the cgroups and their
  configurations.

  The lgpu is a discrete quantity that is device specific (i.e.
  some DRM devices may have 64 lgpus while others may have 100
  lgpus.)  The lgpu is a single quantity that can be allocated
  in three different ways denoted by the following nested keys.

= ==
weight    Allocate by proportion in relationship with
  active sibling cgroups
count Allocate by amount statically, treat lgpu as
  anonymous resources
list  Allocate statically, treat lgpu as named
  resource
= ==

  For example:
  226:0 weight=100 count=256 list=0-255
  226:1 weight=100 count=4 list=0,2,4,6
  226:2 weight=100 count=32 list=32-63
  226:3 weight=100 count=0 list=
  226:4 weight=500 count=0 list=

  lgpu is represented by a bitmap and uses the bitmap_parselist
  kernel function so the list key input format is a
  comma-separated list of decimal numbers and ranges.

  Consecutively set bits are shown as two hyphen-separated decimal
  numbers, the smallest and largest bit numbers set in the range.
  Optionally each range can be postfixed to denote that only parts
  of it should be set.  The range will be divided into groups of
  a specific size.
  Syntax: range:used_size/group_size
  Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769

  The count key is the hamming weight / hweight of the bitmap.

  Weight, count and list accept the max and default keywords.

  Some DRM devices may only support lgpu as anonymous resources.
  In such case, the significance of the position of the set bits
  in list will be ignored.

  The weight quantity is only in effect when static allocation
  is not used (by setting count=0) for this cgroup.  The weight
  quantity distributes lgpus that are not statically allocated by
  the siblings.  For example, given siblings cgroupA, cgroupB and
  cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
  0-63, no lgpu is available to be distributed by weight.
  Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
  cgroupC will be starved if it tries to allocate by weight.

  On the other hand, if cgroupA has weight=100 count=0, cgroupB
  has list=16-47, and cgroupC has weight=100 count=0, then 32
  lgpus are available to be distributed evenly between cgroupA
  and cgroupC.  In drm.lgpu.effective, cgroupA will have
  list=0-15 and cgroupC will have list=48-63.

  This lgpu resource supports the 'allocation' and 'weight'
  resource distribution model.

drm.lgpu.effective
  A read-only nested-keyed file which exists on all cgroups.
  Each entry is keyed by the DRM device's major:minor.

  lgpu stands for logical GPU; it is an abstraction used to
  subdivide a physical DRM device for the purpose of resource
  management.  This file reflects the actual allocation after
  considering the relationship between the cgroups and their
  configurations in drm.lgpu.
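
(Illustration only: the list key follows the same syntax as the kernel's
bitmap_parselist() helper mentioned above, so parsing a user-supplied
string into an lgpu bitmap looks roughly like the snippet below.  The
1024-bit capacity is just an example value.)

#include <linux/bitmap.h>

static int parse_lgpu_list_example(const char *buf)
{
	DECLARE_BITMAP(lgpu, 1024);

	/* e.g. "0-1023:2/256" sets bits 0,1,256,257,512,513,768,769 */
	return bitmap_parselist(buf, lgpu, 1024);
}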

Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  80 ++
 include/drm/drm_cgroup.h|   3 +
 include/linux/cgroup_drm.h  |  22 ++
 kernel/cgroup/drm.c | 324 +++-
 4 files changed, 427 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index ce5dc027366a..d8a41956e5c7 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2120,6 +2120,86 @@ DRM Interface Files
Set largest allocation for /dev/dri/card1 to 4MB
echo "226:1 4m" > drm.buffer.peak.max
 
+  drm.lgpu
+   A read-write nested-keyed file which exists on all cgroups.
+   Each entry is keyed by the DRM device's major:minor.
+
+   lgpu stands for logical GPU, it is an abstraction used to
+   subdivide a physical DRM device for the purpose of resource
+   management.  This file stores user configuration while the
+drm.lgpu.effective reflects the actual allocation after
+considering the relationship between the cgroups and their
+configurations.
+
+   The lgpu is a discrete quantity that is devi

[PATCH 07/11] drm, cgroup: Add total GEM buffer allocation limit

2020-02-14 Thread Kenny Ho
The drm resources being limited here are the GEM buffer objects.  User
applications allocate and free these buffers.  In addition, a process
can allocate a buffer and share it with another process.  The consumer
of a shared buffer can also outlive the allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of the
buffer is deemed to be the cgroup to which the allocating process
belongs.  There is one cgroup limit per drm device.

The limiting functionality is added to the previous stats collection
function.  The drm_gem_private_object_init is modified to have a return
value to allow failure due to cgroup limit.

The try_chg function only fails if the DRM cgroup properties have
limit_enforced set to true for the DRM device.  This is to allow the DRM
cgroup controller to collect usage stats without enforcing the limits.

drm.buffer.default
A read-only flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Default limits on the total GEM buffer allocation in bytes.

drm.buffer.max
A read-write flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Per device limits on the total GEM buffer allocation in bytes.
This is a hard limit.  Attempts to allocate beyond the cgroup
limit will result in ENOMEM.  Shorthand understood by memparse
(such as k, m, g) can be used.

Set allocation limit for /dev/dri/card1 to 1GB
echo "226:1 1g" > drm.buffer.total.max

Set allocation limit for /dev/dri/card0 to 512MB
echo "226:0 512m" > drm.buffer.total.max

Change-Id: Id3265bbd0fafe84a16b59617df79bd32196160be
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst|  21 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  19 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   6 +-
 drivers/gpu/drm/drm_gem.c  |  11 +-
 include/drm/drm_cgroup.h   |   8 +-
 include/drm/drm_gem.h  |   2 +-
 include/linux/cgroup_drm.h |   1 +
 kernel/cgroup/drm.c| 227 -
 8 files changed, 278 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 19fcf54ace83..064172df63e2 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2081,6 +2081,27 @@ DRM Interface Files
 
Total number of GEM buffer allocated.
 
+  drm.buffer.default
+   A read-only flat-keyed file which exists on the root cgroup.
+   Each entry is keyed by the drm device's major:minor.
+
+   Default limits on the total GEM buffer allocation in bytes.
+
+  drm.buffer.max
+   A read-write flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Per device limits on the total GEM buffer allocation in byte.
+   This is a hard limit.  Attempts in allocating beyond the cgroup
+   limit will result in ENOMEM.  Shorthand understood by memparse
+   (such as k, m, g) can be used.
+
+   Set allocation limit for /dev/dri/card1 to 1GB
+   echo "226:1 1g" > drm.buffer.total.max
+
+   Set allocation limit for /dev/dri/card0 to 512MB
+   echo "226:0 512m" > drm.buffer.total.max
+
 GEM Buffer Ownership
 
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index f28d040de3ce..3ebef1d62346 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1397,6 +1397,23 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, 
unsigned int pipe,
  stime, etime, mode);
 }
 
+#ifdef CONFIG_CGROUP_DRM
+
+static void amdgpu_drmcg_custom_init(struct drm_device *dev,
+   struct drmcg_props *props)
+{
+   props->limit_enforced = true;
+}
+
+#else
+
+static void amdgpu_drmcg_custom_init(struct drm_device *dev,
+   struct drmcg_props *props)
+{
+}
+
+#endif /* CONFIG_CGROUP_DRM */
+
 static struct drm_driver kms_driver = {
.driver_features =
DRIVER_USE_AGP | DRIVER_ATOMIC |
@@ -1430,6 +1447,8 @@ static struct drm_driver kms_driver = {
.gem_prime_vunmap = amdgpu_gem_prime_vunmap,
.gem_prime_mmap = amdgpu_gem_prime_mmap,
 
+   .drmcg_custom_init = amdgpu_drmcg_custom_init,
+
.name = DRIVER_NAME,
.desc = DRIVER_DESC,
.date = DRIVER_DATE,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 46c76e2e1281..b81c608cb2cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -34,6 +34,7 @@
 
 #include 
 #include 
+#include 
 #include "amdgpu.h"

[PATCH 10/11] drm, cgroup: add update trigger after limit change

2020-02-14 Thread Kenny Ho
Before this commit, drmcg limits are updated but enforcement is delayed
until the next time the driver checks against the new limit.  While this
is sufficient for certain resources, a more proactive enforcement may be
needed for other resources.

Introducing an optional drmcg_limit_updated callback for the DRM
drivers.  When defined, it will be called in two scenarios:
1) When limits are updated for a particular cgroup, the callback will be
triggered for each task in the updated cgroup.
2) When a task is migrated from one cgroup to another, the callback will
be triggered for each resource type for the migrated task.
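
(Hypothetical driver-side hook, for illustration only; the mydrv_* names
are made up and not part of this series.  It shows how a driver might
react to an lgpu change by pushing the new effective mask to the queues
owned by the task.)

static void mydrv_drmcg_limit_updated(struct drm_device *dev,
				      struct task_struct *task,
				      struct drmcg_device_resource *ddr,
				      enum drmcg_res_type res_type)
{
	if (res_type != DRMCG_TYPE_LGPU)
		return;

	/* apply ddr->lgpu_eff to this task's queues (driver-specific code) */
	mydrv_apply_lgpu_mask(dev, task, ddr->lgpu_eff, ddr->lgpu_count_eff);
}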

Change-Id: I0ce7c4e5a04c31bd0f8d9853a383575d4bc9a3fa
Signed-off-by: Kenny Ho 
---
 include/drm/drm_drv.h | 10 
 kernel/cgroup/drm.c   | 59 ++-
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 1f65ac4d9bbf..e7333143e722 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -724,6 +724,16 @@ struct drm_driver {
void (*drmcg_custom_init)(struct drm_device *dev,
struct drmcg_props *props);
 
+   /**
+* @drmcg_limit_updated
+*
+* Optional callback
+*/
+   void (*drmcg_limit_updated)(struct drm_device *dev,
+   struct task_struct *task,
+   struct drmcg_device_resource *ddr,
+   enum drmcg_res_type res_type);
+
/**
 * @gem_vm_ops: Driver private ops for this object
 *
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index a4e88a3704bb..d3fa23b71f5f 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -133,6 +133,26 @@ static inline void drmcg_update_cg_tree(struct drm_device 
*dev)
mutex_unlock(_mutex);
 }
 
+static void drmcg_limit_updated(struct drm_device *dev, struct drmcg *drmcg,
+   enum drmcg_res_type res_type)
+{
+   struct drmcg_device_resource *ddr =
+   drmcg->dev_resources[dev->primary->index];
+   struct css_task_iter it;
+   struct task_struct *task;
+
+   if (dev->driver->drmcg_limit_updated == NULL)
+   return;
+
+   css_task_iter_start(&drmcg->css.cgroup->self,
+   CSS_TASK_ITER_PROCS, &it);
+   while ((task = css_task_iter_next(&it))) {
+   dev->driver->drmcg_limit_updated(dev, task,
+   ddr, res_type);
+   }
+   css_task_iter_end(&it);
+}
+
 static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
const unsigned long *free_static,
const unsigned long *free_weighted,
@@ -230,6 +250,8 @@ static void drmcg_apply_effective_lgpu(struct drm_device 
*dev)
bitmap_copy(ddr->lgpu_eff, ddr->lgpu_stg, capacity);
ddr->lgpu_count_eff =
bitmap_weight(ddr->lgpu_eff, capacity);
+
+   drmcg_limit_updated(dev, drmcg, DRMCG_TYPE_LGPU);
}
}
rcu_read_unlock();
@@ -686,7 +708,6 @@ static void drmcg_nested_limit_parse(struct 
kernfs_open_file *of,
}
 }
 
-
 /**
  * drmcg_limit_write - parse cgroup interface files to obtain user config
  *
@@ -879,10 +900,46 @@ static int drmcg_css_online(struct cgroup_subsys_state 
*css)
	return drm_minor_for_each(&drmcg_online_fn, css_to_drmcg(css));
 }
 
+static int drmcg_attach_fn(int id, void *ptr, void *data)
+{
+   struct drm_minor *minor = ptr;
+   struct task_struct *task = data;
+   struct drm_device *dev;
+
+   if (minor->type != DRM_MINOR_PRIMARY)
+   return 0;
+
+   dev = minor->dev;
+
+   if (dev->driver->drmcg_limit_updated) {
+   struct drmcg *drmcg = drmcg_get(task);
+   struct drmcg_device_resource *ddr =
+   drmcg->dev_resources[minor->index];
+   enum drmcg_res_type type;
+
+   for (type = 0; type < __DRMCG_TYPE_LAST; type++)
+   dev->driver->drmcg_limit_updated(dev, task, ddr, type);
+
+   drmcg_put(drmcg);
+   }
+
+   return 0;
+}
+
+static void drmcg_attach(struct cgroup_taskset *tset)
+{
+   struct task_struct *task;
+   struct cgroup_subsys_state *css;
+
+   cgroup_taskset_for_each(task, css, tset)
+   drm_minor_for_each(&drmcg_attach_fn, task);
+}
+
 struct cgroup_subsys drm_cgrp_subsys = {
.css_alloc  = drmcg_css_alloc,
.css_free   = drmcg_css_free,
.css_online = drmcg_css_online,
+   .attach = drmcg_attach,
.early_init = false,
.legacy_cftypes = files,
.dfl_cftypes= files,
-- 
2.25.0

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 05/11] drm, cgroup: Add peak GEM buffer allocation stats

2020-02-14 Thread Kenny Ho
drm.buffer.peak.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Largest (high water mark) GEM buffer allocated in bytes.

Change-Id: I40fe4c13c1cea8613b3e04b802f3e1f19eaab4fc
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++
 include/linux/cgroup_drm.h  |  3 +++
 kernel/cgroup/drm.c | 12 
 3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 2d8162c109f3..75b97962b127 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2069,6 +2069,12 @@ DRM Interface Files
 
Total GEM buffer allocation in bytes.
 
+  drm.buffer.peak.stats
+   A read-only flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Largest (high water mark) GEM buffer allocated in bytes.
+
 GEM Buffer Ownership
 
 
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 174ab50701ef..593ad12602cd 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -13,6 +13,7 @@
 
 enum drmcg_res_type {
DRMCG_TYPE_BO_TOTAL,
+   DRMCG_TYPE_BO_PEAK,
__DRMCG_TYPE_LAST,
 };
 
@@ -24,6 +25,8 @@ enum drmcg_res_type {
 struct drmcg_device_resource {
/* for per device stats */
s64 bo_stats_total_allocated;
+
+   s64 bo_stats_peak_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 425566753a5c..7a0da70c5a25 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -277,6 +277,9 @@ static void drmcg_print_stats(struct drmcg_device_resource 
*ddr,
case DRMCG_TYPE_BO_TOTAL:
seq_printf(sf, "%lld\n", ddr->bo_stats_total_allocated);
break;
+   case DRMCG_TYPE_BO_PEAK:
+   seq_printf(sf, "%lld\n", ddr->bo_stats_peak_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -325,6 +328,12 @@ struct cftype files[] = {
.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
DRMCG_FTYPE_STATS),
},
+   {
+   .name = "buffer.peak.stats",
+   .seq_show = drmcg_seq_show,
+   .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
+   DRMCG_FTYPE_STATS),
+   },
{ } /* terminate */
 };
 
@@ -373,6 +382,9 @@ void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct 
drm_device *dev,
ddr = drmcg->dev_resources[devIdx];
 
ddr->bo_stats_total_allocated += (s64)size;
+
+   if (ddr->bo_stats_peak_allocated < (s64)size)
+   ddr->bo_stats_peak_allocated = (s64)size;
}
	mutex_unlock(&dev->drmcg_mutex);
 }
-- 
2.25.0

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 08/11] drm, cgroup: Add peak GEM buffer allocation limit

2020-02-14 Thread Kenny Ho
drm.buffer.peak.default
A read-only flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Default limits on the largest GEM buffer allocation in bytes.

drm.buffer.peak.max
A read-write flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Per device limits on the largest GEM buffer allocation in bytes.
This is a hard limit.  Attempts to allocate beyond the cgroup
limit will result in ENOMEM.  Shorthand understood by memparse
(such as k, m, g) can be used.

Set largest allocation for /dev/dri/card1 to 4MB
echo "226:1 4m" > drm.buffer.peak.max

Change-Id: I5ab3fb4a442b6cbd5db346be595897c90217da69
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst | 18 +++
 include/drm/drm_cgroup.h|  1 +
 include/linux/cgroup_drm.h  |  1 +
 kernel/cgroup/drm.c | 43 +
 4 files changed, 63 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 064172df63e2..ce5dc027366a 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2102,6 +2102,24 @@ DRM Interface Files
Set allocation limit for /dev/dri/card0 to 512MB
echo "226:0 512m" > drm.buffer.total.max
 
+  drm.buffer.peak.default
+   A read-only flat-keyed file which exists on the root cgroup.
+   Each entry is keyed by the drm device's major:minor.
+
+   Default limits on the largest GEM buffer allocation in bytes.
+
+  drm.buffer.peak.max
+   A read-write flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Per device limits on the largest GEM buffer allocation in bytes.
+   This is a hard limit.  Attempts in allocating beyond the cgroup
+   limit will result in ENOMEM.  Shorthand understood by memparse
+   (such as k, m, g) can be used.
+
+   Set largest allocation for /dev/dri/card1 to 4MB
+   echo "226:1 4m" > drm.buffer.peak.max
+
 GEM Buffer Ownership
 
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 2783e56690db..2b41d4d22e33 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -16,6 +16,7 @@ struct drmcg_props {
	bool limit_enforced;
 
s64 bo_limits_total_allocated_default;
+   s64 bo_limits_peak_allocated_default;
 };
 
 void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index b03d90623763..eae400f3d9b4 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -29,6 +29,7 @@ struct drmcg_device_resource {
s64 bo_limits_total_allocated;
 
s64 bo_stats_peak_allocated;
+   s64 bo_limits_peak_allocated;
 
s64 bo_stats_count_allocated;
 };
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index ee85482edd90..5fcbbc13fa1c 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -95,6 +95,9 @@ static inline int init_drmcg_single(struct drmcg *drmcg, 
struct drm_device *dev)
ddr->bo_limits_total_allocated =
dev->drmcg_props.bo_limits_total_allocated_default;
 
+   ddr->bo_limits_peak_allocated =
+   dev->drmcg_props.bo_limits_peak_allocated_default;
+
return 0;
 }
 
@@ -305,6 +308,9 @@ static void drmcg_print_limits(struct drmcg_device_resource 
*ddr,
case DRMCG_TYPE_BO_TOTAL:
seq_printf(sf, "%lld\n", ddr->bo_limits_total_allocated);
break;
+   case DRMCG_TYPE_BO_PEAK:
+   seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -319,6 +325,10 @@ static void drmcg_print_default(struct drmcg_props *props,
seq_printf(sf, "%lld\n",
props->bo_limits_total_allocated_default);
break;
+   case DRMCG_TYPE_BO_PEAK:
+   seq_printf(sf, "%lld\n",
+   props->bo_limits_peak_allocated_default);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -476,6 +486,19 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file 
*of, char *buf,
 
ddr->bo_limits_total_allocated = val;
break;
+   case DRMCG_TYPE_BO_PEAK:
+   rc = drmcg_process_limit_s64_val

[PATCH 06/11] drm, cgroup: Add GEM buffer allocation count stats

2020-02-14 Thread Kenny Ho
drm.buffer.count.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Total number of GEM buffers allocated.

Change-Id: Iad29bdf44390dbcee07b1e72ea0ff811aa3b9dcd
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++
 include/linux/cgroup_drm.h  |  3 +++
 kernel/cgroup/drm.c | 22 +++---
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 75b97962b127..19fcf54ace83 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2075,6 +2075,12 @@ DRM Interface Files
 
Largest (high water mark) GEM buffer allocated in bytes.
 
+  drm.buffer.count.stats
+   A read-only flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Total number of GEM buffer allocated.
+
 GEM Buffer Ownership
 
 
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 593ad12602cd..51a0cd37da92 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -14,6 +14,7 @@
 enum drmcg_res_type {
DRMCG_TYPE_BO_TOTAL,
DRMCG_TYPE_BO_PEAK,
+   DRMCG_TYPE_BO_COUNT,
__DRMCG_TYPE_LAST,
 };
 
@@ -27,6 +28,8 @@ struct drmcg_device_resource {
s64 bo_stats_total_allocated;
 
s64 bo_stats_peak_allocated;
+
+   s64 bo_stats_count_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 7a0da70c5a25..bc162aa9971d 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -280,6 +280,9 @@ static void drmcg_print_stats(struct drmcg_device_resource 
*ddr,
case DRMCG_TYPE_BO_PEAK:
seq_printf(sf, "%lld\n", ddr->bo_stats_peak_allocated);
break;
+   case DRMCG_TYPE_BO_COUNT:
+   seq_printf(sf, "%lld\n", ddr->bo_stats_count_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -334,6 +337,12 @@ struct cftype files[] = {
.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
DRMCG_FTYPE_STATS),
},
+   {
+   .name = "buffer.count.stats",
+   .seq_show = drmcg_seq_show,
+   .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
+   DRMCG_FTYPE_STATS),
+   },
{ } /* terminate */
 };
 
@@ -385,6 +394,8 @@ void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct 
drm_device *dev,
 
if (ddr->bo_stats_peak_allocated < (s64)size)
ddr->bo_stats_peak_allocated = (s64)size;
+
+   ddr->bo_stats_count_allocated++;
}
	mutex_unlock(&dev->drmcg_mutex);
 }
@@ -402,15 +413,20 @@ EXPORT_SYMBOL(drmcg_chg_bo_alloc);
 void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
size_t size)
 {
+   struct drmcg_device_resource *ddr;
int devIdx = dev->primary->index;
 
if (drmcg == NULL)
return;
 
	mutex_lock(&dev->drmcg_mutex);
-   for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg))
-   drmcg->dev_resources[devIdx]->bo_stats_total_allocated
-   -= (s64)size;
+   for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+   ddr = drmcg->dev_resources[devIdx];
+
+   ddr->bo_stats_total_allocated -= (s64)size;
+
+   ddr->bo_stats_count_allocated--;
+   }
	mutex_unlock(&dev->drmcg_mutex);
 }
 EXPORT_SYMBOL(drmcg_unchg_bo_alloc);
-- 
2.25.0

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 03/11] drm, cgroup: Initialize drmcg properties

2020-02-14 Thread Kenny Ho
drmcg initialization involves allocating a per cgroup, per device data
structure and setting the defaults.  There are two entry points for
drmcg init:

1) When struct drmcg is created via css_alloc, initialization is done
for each device

2) When DRM devices are created after drmcgs are created
  a) Per device drmcg data structure is allocated at the beginning of
  DRM device creation such that drmcg can begin tracking usage
  statistics
  b) At the end of DRM device creation, drmcg_register_dev will update the
  properties in case device specific defaults need to be applied.

Entry point #2 usually applies to the root cgroup since it can be
created before DRM devices are available.  The drmcg controller will go
through all existing drm cgroups and initialize them with the new device
accordingly.
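
(Sketch for illustration only; "mydrv" is a made-up driver name and the
limit fields shown are introduced later in this series.  A driver can use
the drmcg_custom_init hook added below to override the per-device defaults
when its device registers, for example to turn on enforcement and set a
default buffer limit.)

static void mydrv_drmcg_custom_init(struct drm_device *dev,
				    struct drmcg_props *props)
{
	props->limit_enforced = true;
	/* hypothetical default: cap total GEM buffer allocation at 2 GiB */
	props->bo_limits_total_allocated_default = 2ULL << 30;
}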

Change-Id: I64e421d8dfcc22ee8282cc1305960e20c2704db7
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/drm_drv.c  |   4 ++
 include/drm/drm_cgroup.h   |  18 +++
 include/drm/drm_device.h   |   7 +++
 include/drm/drm_drv.h  |   9 
 include/linux/cgroup_drm.h |  12 +
 kernel/cgroup/drm.c| 105 +
 6 files changed, 155 insertions(+)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 8e59cc5a5bde..44a66edc81c2 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -643,6 +643,7 @@ int drm_dev_init(struct drm_device *dev,
	mutex_init(&dev->filelist_mutex);
	mutex_init(&dev->clientlist_mutex);
	mutex_init(&dev->master_mutex);
+   mutex_init(&dev->drmcg_mutex);
 
dev->anon_inode = drm_fs_inode_new();
if (IS_ERR(dev->anon_inode)) {
@@ -679,6 +680,7 @@ int drm_dev_init(struct drm_device *dev,
if (ret)
goto err_setunique;
 
+   drmcg_device_early_init(dev);
return 0;
 
 err_setunique:
@@ -693,6 +695,7 @@ int drm_dev_init(struct drm_device *dev,
drm_fs_inode_free(dev->anon_inode);
 err_free:
put_device(dev->dev);
+   mutex_destroy(&dev->drmcg_mutex);
	mutex_destroy(&dev->master_mutex);
	mutex_destroy(&dev->clientlist_mutex);
	mutex_destroy(&dev->filelist_mutex);
@@ -769,6 +772,7 @@ void drm_dev_fini(struct drm_device *dev)
 
put_device(dev->dev);
 
+   mutex_destroy(&dev->drmcg_mutex);
	mutex_destroy(&dev->master_mutex);
	mutex_destroy(&dev->clientlist_mutex);
	mutex_destroy(&dev->filelist_mutex);
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 530c9a0b3238..fda426fba035 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -4,8 +4,17 @@
 #ifndef __DRM_CGROUP_H__
 #define __DRM_CGROUP_H__
 
+#include 
+
 #ifdef CONFIG_CGROUP_DRM
 
+/**
+ * Per DRM device properties for DRM cgroup controller for the purpose
+ * of storing per device defaults
+ */
+struct drmcg_props {
+};
+
 void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
void (*put_ddev)(struct drm_device *dev));
 
@@ -15,8 +24,13 @@ void drmcg_register_dev(struct drm_device *dev);
 
 void drmcg_unregister_dev(struct drm_device *dev);
 
+void drmcg_device_early_init(struct drm_device *device);
+
 #else
 
+struct drmcg_props {
+};
+
 static inline void drmcg_bind(
struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
void (*put_ddev)(struct drm_device *dev))
@@ -35,5 +49,9 @@ static inline void drmcg_unregister_dev(struct drm_device 
*dev)
 {
 }
 
+static inline void drmcg_device_early_init(struct drm_device *device)
+{
+}
+
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 1acfc3bbd3fb..a94598b8f670 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 
 struct drm_driver;
 struct drm_minor;
@@ -308,6 +309,12 @@ struct drm_device {
 */
struct drm_fb_helper *fb_helper;
 
+/** \name DRM Cgroup */
+   /*@{ */
+   struct mutex drmcg_mutex;
+   struct drmcg_props drmcg_props;
+   /*@} */
+
/* Everything below here is for legacy driver, never use! */
/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index cf13470810a5..1f65ac4d9bbf 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -715,6 +715,15 @@ struct drm_driver {
struct drm_device *dev,
uint32_t handle);
 
+   /**
+* @drmcg_custom_init
+*
+* Optional callback used to initialize drm cgroup per device properties
+* such as resource limit defaults.
+*/
+   void (*drmcg_custom_init)(struct drm_device *dev,
+   struct drmcg_props *props);
+
/**
 * @gem_vm_ops: Driver private ops for this object
 *
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 85

[PATCH 00/11] new cgroup controller for gpu/drm subsystem

2020-02-14 Thread Kenny Ho
 help with testing
graphics application robustness by providing a means to artificially limit DRM
resources available to the applications.


Challenges
==
While there is common infrastructure in DRM that is shared across many vendors
(the scheduler [4] for example), there are also aspects of DRM that are vendor
specific.  To accommodate this, we borrowed the mechanism used by cgroup to
handle different kinds of cgroup controllers.

Resources for DRM are also often device (GPU) specific instead of system
specific and a system may contain more than one GPU.  For this, we borrowed
some of the ideas from RDMA cgroup controller.

Approach

To experiment with the idea of a DRM cgroup, we would like to start with basic
accounting and statistics, then continue to iterate and add regulating
mechanisms into the driver.

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
[2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
[3] https://www.spinics.net/lists/cgroups/msg20720.html
[4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
[5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
[6] 
https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
[7] https://github.com/RadeonOpenCompute/k8s-device-plugin
[8] https://github.com/kubernetes/kubernetes/issues/52757

Kenny Ho (11):
  cgroup: Introduce cgroup for drm subsystem
  drm, cgroup: Bind drm and cgroup subsystem
  drm, cgroup: Initialize drmcg properties
  drm, cgroup: Add total GEM buffer allocation stats
  drm, cgroup: Add peak GEM buffer allocation stats
  drm, cgroup: Add GEM buffer allocation count stats
  drm, cgroup: Add total GEM buffer allocation limit
  drm, cgroup: Add peak GEM buffer allocation limit
  drm, cgroup: Introduce lgpu as DRM cgroup resource
  drm, cgroup: add update trigger after limit change
  drm/amdgpu: Integrate with DRM cgroup

 Documentation/admin-guide/cgroup-v2.rst   |  197 ++-
 Documentation/cgroup-v1/drm.rst   |1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   48 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |6 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |3 +
 .../amd/amdkfd/kfd_process_queue_manager.c|  153 +++
 drivers/gpu/drm/drm_drv.c |   12 +
 drivers/gpu/drm/drm_gem.c |   16 +-
 include/drm/drm_cgroup.h  |   81 ++
 include/drm/drm_device.h  |7 +
 include/drm/drm_drv.h |   19 +
 include/drm/drm_gem.h |   12 +-
 include/linux/cgroup_drm.h|  144 +++
 include/linux/cgroup_subsys.h |4 +
 init/Kconfig  |5 +
 kernel/cgroup/Makefile|1 +
 kernel/cgroup/drm.c   | 1059 +
 19 files changed, 1773 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

-- 
2.25.0

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH RFC v4 07/16] drm, cgroup: Add total GEM buffer allocation limit

2019-11-28 Thread Kenny Ho
On Tue, Oct 1, 2019 at 10:30 AM Michal Koutný  wrote:
> On Thu, Aug 29, 2019 at 02:05:24AM -0400, Kenny Ho  wrote:
> > drm.buffer.default
> > A read-only flat-keyed file which exists on the root cgroup.
> > Each entry is keyed by the drm device's major:minor.
> >
> > Default limits on the total GEM buffer allocation in bytes.
> What is the purpose of this attribute (and alikes for other resources)?
> I can't see it being set differently but S64_MAX in
> drmcg_device_early_init.

cgroup has a number of conventions, one of which is the idea of a
default.  The idea here is to allow for device specific defaults.  For
this specific resource, I can probably not expose it since it's not
particularly useful, but for other resources (such as the lgpu
resource) the concept of a default is useful (for example, different
devices can have different numbers of lgpus.)


> > +static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> > [...]
> > + switch (type) {
> > + case DRMCG_TYPE_BO_TOTAL:
> > + p_max = parent == NULL ? S64_MAX :
> > + parent->dev_resources[minor]->
> > + bo_limits_total_allocated;
> > +
> > + rc = drmcg_process_limit_s64_val(sattr, true,
> > + props->bo_limits_total_allocated_default,
> > + p_max,
> > + );
> IIUC, this allows initiating the particular limit value based either on
> parent or the default per-device value. This is alas rather an
> antipattern. The most stringent limit on the path from a cgroup to the
> root should be applied at the charging time. However, the child should
> not inherit the verbatim value from the parent (may race with parent and
> it won't be updated upon parent change).
I think this was a mistake during one of my refactors and I shrunk the
critical section protected by a mutex a bit too much.  But you are
right in the sense that I don't propagate the limits downward to the
children when the parent's limit is updated.  But from the user
interface perspective, wouldn't this be confusing?  When a sysadmin
sets a limit using the 'max' keyword, the value would be a global one
even though the actual allowable maximum for the particular cgroup is
less in reality because of the ancestor cgroups?  (If this is the
established norm, I am ok to go along, but it seems confusing to me.)  I
am probably missing something because as I implemented this, the 'max'
and 'default' semantics have been confusing to me, especially for the
children cgroups due to the context of the ancestors.

> You already do the appropriate hierarchical check in
> drmcg_try_chb_bo_alloc, so the parent propagation could be simply
> dropped if I'm not mistaken.
I will need to double check.  But I think interaction between parent
and children (or perhaps between siblings) will be needed eventually
because there seems to be a desire to implement "weight" type of
resource.  Also, from a performance perspective, wouldn't it make more
sense to make sure the limits are set correctly during configuration
than to have to check all the cgroups up through the parents?  I don't
have comprehensive knowledge of the implementation of other cgroup
controllers, so if more experienced folks can comment that would be
great.  (Although, I probably should just do one approach instead of
doing both... or 1.5.)

>
> Also, I can't find how the read of
> parent->dev_resources[minor]->bo_limits_total_allocated and its
> concurrent update are synchronized (i.e. someone writing
> buffer.total.max for parent and child in parallel). (It may just my
> oversight.)
This is probably the refactor mistake I mentioned earlier.

Regards,
Kenny
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem

2019-11-28 Thread Kenny Ho
On Tue, Oct 1, 2019 at 10:31 AM Michal Koutný  wrote:
> On Thu, Aug 29, 2019 at 02:05:19AM -0400, Kenny Ho  wrote:
> > +struct cgroup_subsys drm_cgrp_subsys = {
> > + .css_alloc  = drmcg_css_alloc,
> > + .css_free   = drmcg_css_free,
> > + .early_init = false,
> > + .legacy_cftypes = files,
> Do you really want to expose the DRM controller on v1 hierarchies (where
> threads of one process can be in different cgroups, or children cgroups
> compete with their parents)?

(Sorry for the delay, I have been distracted by something else.)
Yes, I am hoping to make the functionality as widely available as
possible since the ecosystem is still transitioning to v2.  Do you see
an inherent problem with this approach?

Regards,
Kenny


>
> > + .dfl_cftypes= files,
> > +};
>
> Just asking,
> Michal
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Proposal to report GPU private memory allocations with sysfs nodes [plain text version]

2019-10-31 Thread Kenny Ho
Hi Yiwei,

This is the latest series:
https://patchwork.kernel.org/cover/11120371/

(I still need to reply some of the feedback.)

Regards,
Kenny

On Thu, Oct 31, 2019 at 12:59 PM Yiwei Zhang  wrote:
>
> Hi Kenny,
>
> Thanks for the info. Do you mind forwarding the existing discussion to me or 
> have me cc'ed in that thread?
>
> Best,
> Yiwei
>
> On Wed, Oct 30, 2019 at 10:23 PM Kenny Ho  wrote:
>>
>> Hi Yiwei,
>>
>> I am not sure if you are aware, there is an ongoing RFC on adding drm
>> support in cgroup for the purpose of resource tracking.  One of the
>> resource is GPU memory.  It's not exactly the same as what you are
>> proposing (it doesn't track API usage, but it tracks the type of GPU
>> memory from kmd perspective) but perhaps it would be of interest to
>> you.  There are no consensus on it at this point.
>>
>> (sorry for being late to the discussion.  I only noticed this thread
>> when one of the email got lucky and escape the spam folder.)
>>
>> Regards,
>> Kenny
>>
>> On Wed, Oct 30, 2019 at 4:14 AM Yiwei Zhang  wrote:
>> >
>> > Hi Jerome and all folks,
>> >
>> > In addition to my last reply, I just wanna get some more information 
>> > regarding this on the upstream side.
>> >
>> > 1. Do you think this(standardize a way to report GPU private allocations) 
>> > is going to be a useful thing on the upstream as well? It grants a lot 
>> > benefits for Android, but I'd like to get an idea for the non-Android 
>> > world.
>> >
>> > 2. There might be some worries that upstream kernel driver has no idea 
>> > regarding the API. However, to achieve good fidelity around memory 
>> > reporting, we'd have to pass down certain metadata which is known only by 
>> > the userland. Consider this use case: on the upstream side, freedreno for 
>> > example, some memory buffer object(BO) during its own lifecycle could 
>> > represent totally different things, and kmd is not aware of that. When 
>> > we'd like to take memory snapshots at certain granularity, we have to know 
>> > what that buffer represents so that the snapshot can be meaningful and 
>> > useful.
>> >
>> > If we just keep this Android specific, I'd worry some day the upstream has 
>> > standardized a way to report this and Android vendors have to take extra 
>> > efforts to migrate over. This is one of the main reasons we'd like to do 
>> > this on the upstream side.
>> >
>> > Timeline wise, Android has explicit deadlines for the next release and we 
>> > have to push hard towards those. Any prompt responses are very much 
>> > appreciated!
>> >
>> > Best regards,
>> > Yiwei
>> >
>> > On Mon, Oct 28, 2019 at 11:33 AM Yiwei Zhang  wrote:
>> >>
>> >> On Mon, Oct 28, 2019 at 8:26 AM Jerome Glisse  wrote:
>> >>>
>> >>> On Fri, Oct 25, 2019 at 11:35:32AM -0700, Yiwei Zhang wrote:
>> >>> > Hi folks,
>> >>> >
>> >>> > This is the plain text version of the previous email in case that was
>> >>> > considered as spam.
>> >>> >
>> >>> > --- Background ---
>> >>> > On the downstream Android, vendors used to report GPU private memory
>> >>> > allocations with debugfs nodes in their own formats. However, debugfs 
>> >>> > nodes
>> >>> > are getting deprecated in the next Android release.
>> >>>
>> >>> Maybe explain why it is useful first ?
>> >>
>> >>
>> >> Memory is precious on Android mobile platforms. Apps using a large amount 
>> >> of
>> >> memory, games, tend to maintain a table for the memory on different 
>> >> devices with
>> >> different prediction models. Private gpu memory allocations is currently 
>> >> semi-blind
>> >> to the apps and the platform as well.
>> >>
>> >> By having the data, the platform can do:
>> >> (1) GPU memory profiling as part of the huge Android profiler in progress.
>> >> (2) Android system health team can enrich the performance test coverage.
>> >> (3) We can collect filed metrics to detect any regression on the gpu 
>> >> private memory
>> >> allocations in the production population.
>> >> (4) Shell user can easily dump the allocations in a uniform way across 
>> >> vendors.
>> >> (5) Platform can feed 

Re: Proposal to report GPU private memory allocations with sysfs nodes [plain text version]

2019-10-30 Thread Kenny Ho
Hi Yiwei,

I am not sure if you are aware, there is an ongoing RFC on adding drm
support in cgroup for the purpose of resource tracking.  One of the
resources is GPU memory.  It's not exactly the same as what you are
proposing (it doesn't track API usage, but it tracks the type of GPU
memory from kmd perspective) but perhaps it would be of interest to
you.  There is no consensus on it at this point.

(sorry for being late to the discussion.  I only noticed this thread
when one of the emails got lucky and escaped the spam folder.)

Regards,
Kenny

On Wed, Oct 30, 2019 at 4:14 AM Yiwei Zhang  wrote:
>
> Hi Jerome and all folks,
>
> In addition to my last reply, I just wanna get some more information 
> regarding this on the upstream side.
>
> 1. Do you think this(standardize a way to report GPU private allocations) is 
> going to be a useful thing on the upstream as well? It grants a lot benefits 
> for Android, but I'd like to get an idea for the non-Android world.
>
> 2. There might be some worries that upstream kernel driver has no idea 
> regarding the API. However, to achieve good fidelity around memory reporting, 
> we'd have to pass down certain metadata which is known only by the userland. 
> Consider this use case: on the upstream side, freedreno for example, some 
> memory buffer object(BO) during its own lifecycle could represent totally 
> different things, and kmd is not aware of that. When we'd like to take memory 
> snapshots at certain granularity, we have to know what that buffer represents 
> so that the snapshot can be meaningful and useful.
>
> If we just keep this Android specific, I'd worry some day the upstream has 
> standardized a way to report this and Android vendors have to take extra 
> efforts to migrate over. This is one of the main reasons we'd like to do this 
> on the upstream side.
>
> Timeline wise, Android has explicit deadlines for the next release and we 
> have to push hard towards those. Any prompt responses are very much 
> appreciated!
>
> Best regards,
> Yiwei
>
> On Mon, Oct 28, 2019 at 11:33 AM Yiwei Zhang  wrote:
>>
>> On Mon, Oct 28, 2019 at 8:26 AM Jerome Glisse  wrote:
>>>
>>> On Fri, Oct 25, 2019 at 11:35:32AM -0700, Yiwei Zhang wrote:
>>> > Hi folks,
>>> >
>>> > This is the plain text version of the previous email in case that was
>>> > considered as spam.
>>> >
>>> > --- Background ---
>>> > On the downstream Android, vendors used to report GPU private memory
>>> > allocations with debugfs nodes in their own formats. However, debugfs 
>>> > nodes
>>> > are getting deprecated in the next Android release.
>>>
>>> Maybe explain why it is useful first ?
>>
>>
>> Memory is precious on Android mobile platforms. Apps using a large amount of
>> memory, games, tend to maintain a table for the memory on different devices 
>> with
>> different prediction models. Private gpu memory allocations is currently 
>> semi-blind
>> to the apps and the platform as well.
>>
>> By having the data, the platform can do:
>> (1) GPU memory profiling as part of the huge Android profiler in progress.
>> (2) Android system health team can enrich the performance test coverage.
>> (3) We can collect filed metrics to detect any regression on the gpu private 
>> memory
>> allocations in the production population.
>> (4) Shell user can easily dump the allocations in a uniform way across 
>> vendors.
>> (5) Platform can feed the data to the apps so that apps can do memory 
>> allocations
>> in a more predictable way.
>>
>>>
>>> >
>>> > --- Proposal ---
>>> > We are taking the chance to unify all the vendors to migrate their 
>>> > existing
>>> > debugfs nodes into a standardized sysfs node structure. Then the platform
>>> > is able to do a bunch of useful things: memory profiling, system health
>>> > coverage, field metrics, local shell dump, in-app api, etc. This proposal
>>> > is better served upstream as all GPU vendors can standardize a gpu memory
>>> > structure and reduce fragmentation across Android and Linux that clients
>>> > can rely on.
>>> >
>>> > --- Detailed design ---
>>> > The sysfs node structure looks like below:
>>> > /sys/devices///
>>> > e.g. "/sys/devices/mali0/gpu_mem/606/gl_buffer" and the gl_buffer is a 
>>> > node
>>> > having the comma separated size values: "4096,81920,...,4096".
>>>
>>> How does kernel knows what API the allocation is use for ? With the
>>> open source driver you never specify what API is creating a gem object
>>> (opengl, vulkan, ...) nor what purpose (transient, shader, ...).
>>
>>
>> Oh, is this a hard requirement for the open source drivers to not bookkeep 
>> any
>> data from userland? I think the API is just some additional metadata passed 
>> down.
>>
>>>
>>>
>>> > For the top level root, vendors can choose their own names based on the
>>> > value of ro.gfx.sysfs.0 the vendors set. (1) For the multiple gpu driver
>>> > cases, we can use ro.gfx.sysfs.1, ro.gfx.sysfs.2 for the 2nd and 3rd KMDs.
>>> > (2) It's also allowed to put some sub-dir 

Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource

2019-10-09 Thread Kenny Ho
Hi Daniel,

Can you elaborate what you mean in more details?  The goal of lgpu is
to provide the ability to subdivide a GPU device and give those slices
to different users as needed.  I don't think there is anything
controversial or vendor specific here as requests for this are well
documented.  The underlying representation is just a bitmap, which is
neither unprecedented nor vendor specific (bitmap is used in cpuset
for instance.)

An implementation of this abstraction is not hardware specific either.
For example, one can treat a virtual function in SRIOV as an lgpu.
Alternatively, a device can also declare to have 100 lgpus and treat
the lgpu quantity as a percentage representation of GPU subdivision.
The fact that an abstraction works well with a vendor implementation
does not make it a "prettification" of a vendor feature (by this
logic, I hope you are not implying an abstraction is only valid if it
does not work with amd CU masking because that seems fairly partisan.)

Did I misread your characterization of this patch?

Regards,
Kenny


On Wed, Oct 9, 2019 at 6:31 AM Daniel Vetter  wrote:
>
> On Tue, Oct 08, 2019 at 06:53:18PM +, Kuehling, Felix wrote:
> > On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> > > drm.lgpu
> > >  A read-write nested-keyed file which exists on all cgroups.
> > >  Each entry is keyed by the DRM device's major:minor.
> > >
> > >  lgpu stands for logical GPU, it is an abstraction used to
> > >  subdivide a physical DRM device for the purpose of resource
> > >  management.
> > >
> > >  The lgpu is a discrete quantity that is device specific (i.e.
> > >  some DRM devices may have 64 lgpus while others may have 100
> > >  lgpus.)  The lgpu is a single quantity with two representations
> > >  denoted by the following nested keys.
> > >
> > >= 
> > >count Representing lgpu as anonymous resource
> > >list  Representing lgpu as named resource
> > >= 
> > >
> > >  For example:
> > >  226:0 count=256 list=0-255
> > >  226:1 count=4 list=0,2,4,6
> > >  226:2 count=32 list=32-63
> > >
> > >  lgpu is represented by a bitmap and uses the bitmap_parselist
> > >  kernel function so the list key input format is a
> > >  comma-separated list of decimal numbers and ranges.
> > >
> > >  Consecutively set bits are shown as two hyphen-separated decimal
> > >  numbers, the smallest and largest bit numbers set in the range.
> > >  Optionally each range can be postfixed to denote that only parts
> > >  of it should be set.  The range will divided to groups of
> > >  specific size.
> > >  Syntax: range:used_size/group_size
> > >  Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> > >
> > >  The count key is the hamming weight / hweight of the bitmap.
> > >
> > >  Both count and list accept the max and default keywords.
> > >
> > >      Some DRM devices may only support lgpu as anonymous resources.
> > >  In such case, the significance of the position of the set bits
> > >  in list will be ignored.
> > >
> > >  This lgpu resource supports the 'allocation' resource
> > >  distribution model.
> > >
> > > Change-Id: I1afcacf356770930c7f925df043e51ad06ceb98e
> > > Signed-off-by: Kenny Ho 
> >
> > The description sounds reasonable to me and maps well to the CU masking
> > feature in our GPUs.
> >
> > It would also allow us to do more coarse-grained masking for example to
> > guarantee balanced allocation of CUs across shader engines or
> > partitioning of memory bandwidth or CP pipes (if that is supported by
> > the hardware/firmware).
>
> Hm, so this sounds like the definition for how this cgroup is supposed to
> work is "amd CU masking" (whatever that exactly is). And the abstract
> description is just prettification on top, but not actually the real
> definition you guys want.
>
> I think adding a cgroup which is that much depending upon the hw
> implementation of the first driver supporting it is not a good idea.
> -Daniel
>
> >
> > I can't comment on the code as I'm unfamiliar with the details of the
> > cgroup code.
> >
> > Acked-by: Felix Kuehling 

Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each

2019-09-05 Thread Kenny Ho
On Thu, Sep 5, 2019 at 4:32 PM Daniel Vetter  wrote:
>
*snip*
> drm_dev_unregister gets called on hotunplug, so your cgroup-internal
> tracking won't get out of sync any more than the drm_minor list gets
> out of sync with drm_devices. The trouble with drm_minor is just that
> cgroup doesn't track allocations on drm_minor (that's just the uapi
> flavour), but on the underlying drm_device. So really doesn't make
> much sense to attach cgroup tracking to the drm_minor.

Um... I think I get what you are saying, but isn't this a matter of
the cgroup controller doing a drm_dev_get when using the drm_minor?
Or that won't work because it's possible to have a valid drm_minor but
invalid drm_device in it? I understand it's an extra level of
indirection, but since the convention for addressing a device in cgroup
is using $major:$minor, I don't see a way to escape this.  (Tejun
actually already made a comment on my earlier RFC where I didn't
follow the major:minor convention strictly.)

Kenny

> > > Just doing a drm_cg_register/unregister pair that's called from
> > > drm_dev_register/unregister, and then if you want, looking up the
> > > right minor (I think always picking the render node makes sense for
> > > this, and skipping if there's no render node) would make most sense.
> > > At least for the basic cgroup controllers which are generic across
> > > drivers.
> >
> > Why do we want to skip drm devices that does not have a render node
> > and not just use the primary instead?
>
> I guess we could also take the primary node, but drivers with only
> primary node are generaly display-only drm drivers. Not sure we want
> cgroups on those (but I guess can't hurt, and more consistent). But
> then we'd always need to pick the primary node for cgroup
> identification purposes.
> -Daniel
>
> >
> > Kenny
> >
> >
> >
> > > -Daniel
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each

2019-09-05 Thread Kenny Ho
On Thu, Sep 5, 2019 at 4:06 PM Daniel Vetter  wrote:
>
> On Thu, Sep 5, 2019 at 8:28 PM Kenny Ho  wrote:
> >
> > (resent in plain text mode)
> >
> > Hi Daniel,
> >
> > This is the previous patch relevant to this discussion:
> > https://patchwork.freedesktop.org/patch/314343/
>
> Ah yes, thanks for finding that.
>
> > So before I refactored the code to leverage drm_minor, I kept my own
> > list of "known" drm_device inside the controller and have explicit
> > register and unregister function to init per device cgroup defaults.
> > For v4, I refactored the per device cgroup properties and embedded
> > them into the drm_device and continue to only use the primary minor as
> > a way to index the device as v3.
>
> I didn't really like the explicit registration step, at least for the
> basic cgroup controls (like gem buffer limits), and suggested that
> should happen automatically at drm_dev_register/unregister time. I
> also talked about picking a consistent minor (if we have to use
> minors, still would like Tejun to confirm what we should do here), but
> that was an unrelated comment. So doing auto-registration on drm_minor
> was one step too far.

How about your comments on embedding properties into drm_device?  I am
actually still not clear on the downside of using drm_minor this way.
With this implementation in v4, there isn't additional state that can
go out of sync with the ground truth of drm_device from the
perspective of drm_minor.  Wouldn't the issue with hotplugging drm
device you described earlier get worse if the cgroup controller keeps
its own list?

> Just doing a drm_cg_register/unregister pair that's called from
> drm_dev_register/unregister, and then if you want, looking up the
> right minor (I think always picking the render node makes sense for
> this, and skipping if there's no render node) would make most sense.
> At least for the basic cgroup controllers which are generic across
> drivers.

Why do we want to skip drm devices that do not have a render node
and not just use the primary instead?

Kenny



> -Daniel
>
>
>
> >
> > Regards,
> > Kenny
> >
> >
> > On Wed, Sep 4, 2019 at 4:54 AM Daniel Vetter  wrote:
> > >
> > > On Tue, Sep 03, 2019 at 04:43:45PM -0400, Kenny Ho wrote:
> > > > On Tue, Sep 3, 2019 at 4:12 PM Daniel Vetter  wrote:
> > > > > On Tue, Sep 3, 2019 at 9:45 PM Kenny Ho  wrote:
> > > > > > On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter  
> > > > > > wrote:
> > > > > > > Iterating over minors for cgroups sounds very, very wrong. Why do 
> > > > > > > we care
> > > > > > > whether a buffer was allocated through kms dumb vs render nodes?
> > > > > > >
> > > > > > > I'd expect all the cgroup stuff to only work on drm_device, if it 
> > > > > > > does
> > > > > > > care about devices.
> > > > > > >
> > > > > > > (I didn't look through the patch series to find out where exactly 
> > > > > > > you're
> > > > > > > using this, so maybe I'm off the rails here).
> > > > > >
> > > > > > I am exposing this to remove the need to keep track of a separate 
> > > > > > list
> > > > > > of available drm_device in the system (to remove the registering and
> > > > > > unregistering of drm_device to the cgroup subsystem and just use
> > > > > > drm_minor as the single source of truth.)  I am only filtering out 
> > > > > > the
> > > > > > render nodes minor because they point to the same drm_device and is
> > > > > > confusing.
> > > > > >
> > > > > > Perhaps I missed an obvious way to list the drm devices without
> > > > > > iterating through the drm_minors?  (I probably jumped to the minors
> > > > > > because $major:$minor is the convention to address devices in 
> > > > > > cgroup.)
> > > > >
> > > > > Create your own if there's nothing, because you need to anyway:
> > > > > - You need special locking anyway, we can't just block on the idr lock
> > > > > for everything.
> > > > > - This needs to refcount drm_device, not the minors.
> > > > >
> > > > > Iterating over stuff still feels kinda wrong still, because normally
> > > > > the way we register/unregister userspace api (and cgroups isn't
> > > > >

Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each

2019-09-05 Thread Kenny Ho
(resent in plain text mode)

Hi Daniel,

This is the previous patch relevant to this discussion:
https://patchwork.freedesktop.org/patch/314343/

So before I refactored the code to leverage drm_minor, I kept my own
list of "known" drm_device inside the controller and had explicit
register and unregister functions to init per device cgroup defaults.
For v4, I refactored the per device cgroup properties and embedded
them into the drm_device, and continued to use only the primary minor as
a way to index the device, as in v3.

Regards,
Kenny


On Wed, Sep 4, 2019 at 4:54 AM Daniel Vetter  wrote:
>
> On Tue, Sep 03, 2019 at 04:43:45PM -0400, Kenny Ho wrote:
> > On Tue, Sep 3, 2019 at 4:12 PM Daniel Vetter  wrote:
> > > On Tue, Sep 3, 2019 at 9:45 PM Kenny Ho  wrote:
> > > > On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter  wrote:
> > > > > Iterating over minors for cgroups sounds very, very wrong. Why do we 
> > > > > care
> > > > > whether a buffer was allocated through kms dumb vs render nodes?
> > > > >
> > > > > I'd expect all the cgroup stuff to only work on drm_device, if it does
> > > > > care about devices.
> > > > >
> > > > > (I didn't look through the patch series to find out where exactly 
> > > > > you're
> > > > > using this, so maybe I'm off the rails here).
> > > >
> > > > I am exposing this to remove the need to keep track of a separate list
> > > > of available drm_device in the system (to remove the registering and
> > > > unregistering of drm_device to the cgroup subsystem and just use
> > > > drm_minor as the single source of truth.)  I am only filtering out the
> > > > render nodes minor because they point to the same drm_device and is
> > > > confusing.
> > > >
> > > > Perhaps I missed an obvious way to list the drm devices without
> > > > iterating through the drm_minors?  (I probably jumped to the minors
> > > > because $major:$minor is the convention to address devices in cgroup.)
> > >
> > > Create your own if there's nothing, because you need to anyway:
> > > - You need special locking anyway, we can't just block on the idr lock
> > > for everything.
> > > - This needs to refcount drm_device, not the minors.
> > >
> > > Iterating over stuff still feels kinda wrong still, because normally
> > > the way we register/unregister userspace api (and cgroups isn't
> > > anything else from a drm driver pov) is by adding more calls to
> > > drm_dev_register/unregister. If you put a drm_cg_register/unregister
> > > call in there we have a clean separation, and you can track all the
> > > currently active devices however you want. Iterating over objects that
> > > can be hotunplugged any time tends to get really complicated really
> > > quickly.
> >
> > Um... I thought this is what I had previously.  Did I misunderstood
> > your feedback from v3?  Doesn't drm_minor already include all these
> > facilities so isn't creating my own kind of reinventing the wheel?
> > (as I did previously?)  drm_minor_register is called inside
> > drm_dev_register so isn't leveraging existing drm_minor facilities
> > much better solution?
>
> Hm the previous version already dropped out of my inbox, so hard to find
> it again. And I couldn't find this in archives. Do you have pointers?
>
> I thought the previous version did cgroup init separately from drm_device
> setup, and I guess I suggested that it should be moved into
> drm_dev_register/unregister?
>
> Anyway, I don't think reusing the drm_minor registration makes sense,
> since we want to be on the drm_device, not on the minor. Which is a bit
> awkward for cgroups, which wants to identify devices using major.minor
> pairs. But I guess drm is the first subsystem where 1 device can be
> exposed through multiple minors ...
>
> Tejun, any suggestions on this?
>
> Anyway, I think just leveraging existing code because it can be abused to
> make it fit for us doesn't make sense. E.g. for the kms side we also don't
> piggy-back on top of drm_minor_register (it would be technically
> possible), but instead we have drm_modeset_register_all().
> -Daniel
>
> >
> > Kenny
> >
> > >
> > >
> > > >
> > > > Kenny
> > > >
> > > > > -Daniel
> > > > >
> > > > > > ---
> > > > > >  drivers/gpu/drm/drm_drv.c  | 19 +++
> > > > > >  drivers/gpu/dr

Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each

2019-09-03 Thread Kenny Ho
On Tue, Sep 3, 2019 at 4:12 PM Daniel Vetter  wrote:
> On Tue, Sep 3, 2019 at 9:45 PM Kenny Ho  wrote:
> > On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter  wrote:
> > > Iterating over minors for cgroups sounds very, very wrong. Why do we care
> > > whether a buffer was allocated through kms dumb vs render nodes?
> > >
> > > I'd expect all the cgroup stuff to only work on drm_device, if it does
> > > care about devices.
> > >
> > > (I didn't look through the patch series to find out where exactly you're
> > > using this, so maybe I'm off the rails here).
> >
> > I am exposing this to remove the need to keep track of a separate list
> > of available drm_device in the system (to remove the registering and
> > unregistering of drm_device to the cgroup subsystem and just use
> > drm_minor as the single source of truth.)  I am only filtering out the
> > render nodes minor because they point to the same drm_device and is
> > confusing.
> >
> > Perhaps I missed an obvious way to list the drm devices without
> > iterating through the drm_minors?  (I probably jumped to the minors
> > because $major:$minor is the convention to address devices in cgroup.)
>
> Create your own if there's nothing, because you need to anyway:
> - You need special locking anyway, we can't just block on the idr lock
> for everything.
> - This needs to refcount drm_device, not the minors.
>
> Iterating over stuff still feels kinda wrong still, because normally
> the way we register/unregister userspace api (and cgroups isn't
> anything else from a drm driver pov) is by adding more calls to
> drm_dev_register/unregister. If you put a drm_cg_register/unregister
> call in there we have a clean separation, and you can track all the
> currently active devices however you want. Iterating over objects that
> can be hotunplugged any time tends to get really complicated really
> quickly.

Um... I thought this is what I had previously.  Did I misunderstand
your feedback from v3?  Doesn't drm_minor already include all these
facilities, so isn't creating my own just reinventing the wheel
(as I did previously)?  drm_minor_register is called inside
drm_dev_register, so isn't leveraging the existing drm_minor facilities
a much better solution?
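(For context, the existing call chain I am referring to looks roughly
like this, heavily trimmed from drm_drv.c:)

  int drm_dev_register(struct drm_device *dev, unsigned long flags)
  {
          int ret;

          ret = drm_minor_register(dev, DRM_MINOR_RENDER);
          if (ret)
                  goto err_minors;

          ret = drm_minor_register(dev, DRM_MINOR_PRIMARY);
          if (ret)
                  goto err_minors;

          /* ... driver load, drm_modeset_register_all(), etc. ... */
  }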

Kenny

>
>
> >
> > Kenny
> >
> > > -Daniel
> > >
> > > > ---
> > > >  drivers/gpu/drm/drm_drv.c  | 19 +++
> > > >  drivers/gpu/drm/drm_internal.h |  4 
> > > >  include/drm/drm_drv.h  |  4 
> > > >  3 files changed, 23 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > index 862621494a93..000cddabd970 100644
> > > > --- a/drivers/gpu/drm/drm_drv.c
> > > > +++ b/drivers/gpu/drm/drm_drv.c
> > > > @@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int 
> > > > minor_id)
> > > >
> > > >   return minor;
> > > >  }
> > > > +EXPORT_SYMBOL(drm_minor_acquire);
> > > >
> > > >  void drm_minor_release(struct drm_minor *minor)
> > > >  {
> > > >   drm_dev_put(minor->dev);
> > > >  }
> > > > +EXPORT_SYMBOL(drm_minor_release);
> > > >
> > > >  /**
> > > >   * DOC: driver instance overview
> > > > @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, 
> > > > const char *name)
> > > >  }
> > > >  EXPORT_SYMBOL(drm_dev_set_unique);
> > > >
> > > > +/**
> > > > + * drm_minor_for_each - Iterate through all stored DRM minors
> > > > + * @fn: Function to be called for each pointer.
> > > > + * @data: Data passed to callback function.
> > > > + *
> > > > + * The callback function will be called for each @drm_minor entry, 
> > > > passing
> > > > + * the minor, the entry and @data.
> > > > + *
> > > > + * If @fn returns anything other than %0, the iteration stops and that
> > > > + * value is returned from this function.
> > > > + */
> > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void 
> > > > *data)
> > > > +{
> > > > + return idr_for_each(&drm_minors_idr, fn, data);
> > > > +}
> > > > +EXPORT_SYMBOL(drm_minor_for_each);
> > > > +
> > > >  /*
> > > >   * DRM Core
> > > >   * The DRM core module initializes all global DRM obj

Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each

2019-09-03 Thread Kenny Ho
On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter  wrote:
>
> On Thu, Aug 29, 2019 at 02:05:18AM -0400, Kenny Ho wrote:
> > To allow other subsystems to iterate through all stored DRM minors and
> > act upon them.
> >
> > Also exposes drm_minor_acquire and drm_minor_release for other subsystem
> > to handle drm_minor.  DRM cgroup controller is the initial consumer of
> > this new features.
> >
> > Change-Id: I7c4b67ce6b31f06d1037b03435386ff5b8144ca5
> > Signed-off-by: Kenny Ho 
>
> Iterating over minors for cgroups sounds very, very wrong. Why do we care
> whether a buffer was allocated through kms dumb vs render nodes?
>
> I'd expect all the cgroup stuff to only work on drm_device, if it does
> care about devices.
>
> (I didn't look through the patch series to find out where exactly you're
> using this, so maybe I'm off the rails here).

I am exposing this to remove the need to keep track of a separate list
of available drm_device in the system (i.e. to remove the registering and
unregistering of drm_device with the cgroup subsystem and just use
drm_minor as the single source of truth.)  I am only filtering out the
render node minors because they point to the same drm_device and would
be confusing to list twice.

Perhaps I missed an obvious way to list the drm devices without
iterating through the drm_minors?  (I probably jumped to the minors
because $major:$minor is the convention to address devices in cgroup.)
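(Concretely, the intent is that the controller can resolve a major:minor
key from a cgroup file back to a drm_device using the helpers exported by
this patch.  An illustrative fragment, with error handling trimmed:)

  struct drm_minor *minor = drm_minor_acquire(minor_id);

  if (!IS_ERR(minor)) {
          struct drm_device *dev = minor->dev;

          /* ... look up / update the per-device cgroup state on dev ... */

          drm_minor_release(minor); /* drops the drm_dev reference */
  }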

Kenny

> -Daniel
>
> > ---
> >  drivers/gpu/drm/drm_drv.c  | 19 +++
> >  drivers/gpu/drm/drm_internal.h |  4 
> >  include/drm/drm_drv.h  |  4 
> >  3 files changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > index 862621494a93..000cddabd970 100644
> > --- a/drivers/gpu/drm/drm_drv.c
> > +++ b/drivers/gpu/drm/drm_drv.c
> > @@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int 
> > minor_id)
> >
> >   return minor;
> >  }
> > +EXPORT_SYMBOL(drm_minor_acquire);
> >
> >  void drm_minor_release(struct drm_minor *minor)
> >  {
> >   drm_dev_put(minor->dev);
> >  }
> > +EXPORT_SYMBOL(drm_minor_release);
> >
> >  /**
> >   * DOC: driver instance overview
> > @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const 
> > char *name)
> >  }
> >  EXPORT_SYMBOL(drm_dev_set_unique);
> >
> > +/**
> > + * drm_minor_for_each - Iterate through all stored DRM minors
> > + * @fn: Function to be called for each pointer.
> > + * @data: Data passed to callback function.
> > + *
> > + * The callback function will be called for each @drm_minor entry, passing
> > + * the minor, the entry and @data.
> > + *
> > + * If @fn returns anything other than %0, the iteration stops and that
> > + * value is returned from this function.
> > + */
> > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
> > +{
> > + return idr_for_each(&drm_minors_idr, fn, data);
> > +}
> > +EXPORT_SYMBOL(drm_minor_for_each);
> > +
> >  /*
> >   * DRM Core
> >   * The DRM core module initializes all global DRM objects and makes them
> > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> > index e19ac7ca602d..6bfad76f8e78 100644
> > --- a/drivers/gpu/drm/drm_internal.h
> > +++ b/drivers/gpu/drm/drm_internal.h
> > @@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct 
> > drm_prime_file_private *prime_fpriv);
> >  void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private 
> > *prime_fpriv,
> >   struct dma_buf *dma_buf);
> >
> > -/* drm_drv.c */
> > -struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > -void drm_minor_release(struct drm_minor *minor);
> > -
> >  /* drm_vblank.c */
> >  void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int 
> > pipe);
> >  void drm_vblank_cleanup(struct drm_device *dev);
> > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > index 68ca736c548d..24f8d054c570 100644
> > --- a/include/drm/drm_drv.h
> > +++ b/include/drm/drm_drv.h
> > @@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct 
> > drm_device *dev)
> >
> >  int drm_dev_set_unique(struct drm_device *dev, const char *name);
> >
> > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
> > +
> > +struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > +void drm_minor_release(struct drm_minor *minor);
> >
> >  #endif
> > --
> > 2.22.0
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-09-03 Thread Kenny Ho
On Tue, Sep 3, 2019 at 5:20 AM Daniel Vetter  wrote:
>
> On Tue, Sep 3, 2019 at 10:24 AM Koenig, Christian
>  wrote:
> >
> > Am 03.09.19 um 10:02 schrieb Daniel Vetter:
> > > On Thu, Aug 29, 2019 at 02:05:17AM -0400, Kenny Ho wrote:
> > >> With this RFC v4, I am hoping to have some consensus on a merge plan.  I 
> > >> believe
> > >> the GEM related resources (drm.buffer.*) introduced in previous RFC and,
> > >> hopefully, the logical GPU concept (drm.lgpu.*) introduced in this RFC 
> > >> are
> > >> uncontroversial and ready to move out of RFC and into a more formal 
> > >> review.  I
> > >> will continue to work on the memory backend resources (drm.memory.*).
> > >>
> > >> The cover letter from v1 is copied below for reference.
> > >>
> > >> [v1]: 
> > >> https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
> > >> [v2]: https://www.spinics.net/lists/cgroups/msg22074.html
> > >> [v3]: 
> > >> https://lists.freedesktop.org/archives/amd-gfx/2019-June/036026.html
> > > So looking at all this doesn't seem to have changed much, and the old
> > > discussion didn't really conclude anywhere (aside from some details).
> > >
> > > One more open though that crossed my mind, having read a ton of ttm again
> > > recently: How does this all interact with ttm global limits? I'd say the
> > > ttm global limits is the ur-cgroups we have in drm, and not looking at
> > > that seems kinda bad.
> >
> > At least my hope was to completely replace ttm globals with those
> > limitations here when it is ready.
>
> You need more, at least some kind of shrinker to cut down bo placed in
> system memory when we're under memory pressure. Which drags in a
> pretty epic amount of locking lols (see i915's shrinker fun, where we
> attempt that). Probably another good idea to share at least some
> concepts, maybe even code.

I am still looking into your shrinker suggestion, so the memory.*
resources are untouched from RFC v3.  The main change for the buffer.*
resources is the removal of buffer sharing restriction as you
suggested and additional documentation of that behaviour.  (I may have
neglected mentioning it in the cover.)  The other key part of RFC v4
is the "logical GPU/lgpu" concept.  I am hoping to get it out there
early for feedback while I continue to work on the memory.* parts.

Kenny

> -Daniel
>
> >
> > Christian.
> >
> > > -Daniel
> > >
> > >> v4:
> > >> Unchanged (no review needed)
> > >> * drm.memory.*/ttm resources (Patch 9-13, I am still working on memory 
> > >> bandwidth
> > >> and shrinker)
> > >> Base on feedbacks on v3:
> > >> * update nomenclature to drmcg
> > >> * embed per device drmcg properties into drm_device
> > >> * split GEM buffer related commits into stats and limit
> > >> * rename function name to align with convention
> > >> * combined buffer accounting and check into a try_charge function
> > >> * support buffer stats without limit enforcement
> > >> * removed GEM buffer sharing limitation
> > >> * updated documentations
> > >> New features:
> > >> * introducing logical GPU concept
> > >> * example implementation with AMD KFD
> > >>
> > >> v3:
> > >> Base on feedbacks on v2:
> > >> * removed .help type file from v2
> > >> * conform to cgroup convention for default and max handling
> > >> * conform to cgroup convention for addressing device specific limits 
> > >> (with major:minor)
> > >> New function:
> > >> * adopted memparse for memory size related attributes
> > >> * added macro to marshall drmcgrp cftype private  (DRMCG_CTF_PRIV, etc.)
> > >> * added ttm buffer usage stats (per cgroup, for system, tt, vram.)
> > >> * added ttm buffer usage limit (per cgroup, for vram.)
> > >> * added per cgroup bandwidth stats and limiting (burst and average 
> > >> bandwidth)
> > >>
> > >> v2:
> > >> * Removed the vendoring concepts
> > >> * Add limit to total buffer allocation
> > >> * Add limit to the maximum size of a buffer allocation
> > >>
> > >> v1: cover letter

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-09-03 Thread Kenny Ho
Hi Tejun,

Thanks for looking into this.  I can definitely help where I can and I
am sure other experts will jump in if I start misrepresenting the
reality :) (as Daniel already have done.)

Regarding your points, my understanding is that there isn't really a
TTM vs GEM situation anymore (there is an lwn.net article about that,
but it is more than a decade old.)  I believe GEM is the common
interface at this point and more and more features are being
refactored into it.  For example, AMD's driver uses TTM internally but
things are exposed via the GEM interface.

This GEM resource is actually the single number resource you just
referred to.  A GEM buffer (the drm.buffer.* resources) can be backed
by VRAM, system memory or other types of memory.  The finer-grained
control is provided by the drm.memory.* resources, which still need more
discussion.  (As some of the functionalities in TTM are being
refactored into the GEM level.  I have seen some patches that make TTM
a subclass of GEM.)

This RFC can be grouped into 3 areas and they are fairly independent
so they can be reviewed separately: high level device memory control
(buffer.*), fine grain memory control and bandwidth (memory.*) and
compute resources (lgpu.*)  I think the memory.* resources are the
most controversial part but I think it's still needed.
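(To make the split concrete, the charge points look roughly like the
fragment below.  The helper names are the ones introduced in this series;
the surrounding driver code is only illustrative:)

  /* GEM level: charged once per buffer object, reported under drm.buffer.* */
  if (!drmcg_try_chg_bo_alloc(drmcg, dev, size))
          return -ENOMEM; /* over the cgroup's buffer limit */

  /* TTM level: per-placement accounting, reported under drm.memory.*
   * (system/tt/vram); buffer moves also feed the bandwidth stats. */
  drmcg_chg_mem(tbo);
  drmcg_mem_track_move(bo, evict, new_mem);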

Perhaps an analogy may help.  For a system, we have CPUs and memory.
And within memory, it can be backed by RAM or swap.  For GPU, each
device can have LGPUs and buffers.  And within the buffers, it can be
backed by VRAM, or system RAM or even swap.

As for setting the right amount, I think that's where the profiling
aspect of the *.stats comes in.  And while one can't necessary buy
more VRAM, it is still a useful knob to adjust if the intention is to
pack more work into a GPU device with predictable performance.  This
research on various GPU workload may be of interest:

A Taxonomy of GPGPU Performance Scaling
http://www.computermachines.org/joe/posters/iiswc2015_taxonomy.pdf
http://www.computermachines.org/joe/publications/pdfs/iiswc2015_taxonomy.pdf

(summary: GPU workload can be memory bound or compute bound.  So it's
possible to pack different workload together to improve utilization.)

Regards,
Kenny

On Tue, Sep 3, 2019 at 2:50 PM Tejun Heo  wrote:
>
> Hello, Daniel.
>
> On Tue, Sep 03, 2019 at 09:55:50AM +0200, Daniel Vetter wrote:
> > > * While breaking up and applying control to different types of
> > >   internal objects may seem attractive to folks who work day in and
> > >   day out with the subsystem, they aren't all that useful to users and
> > >   the siloed controls are likely to make the whole mechanism a lot
> > >   less useful.  We had the same problem with cgroup1 memcg - putting
> > >   control of different uses of memory under separate knobs.  It made
> > >   the whole thing pretty useless.  e.g. if you constrain all knobs
> > >   tight enough to control the overall usage, overall utilization
> > >   suffers, but if you don't, you really don't have control over actual
> > >   usage.  For memcg, what has to be allocated and controlled is
> > >   physical memory, no matter how they're used.  It's not like you can
> > >   go buy more "socket" memory.  At least from the looks of it, I'm
> > >   afraid gpu controller is repeating the same mistakes.
> >
> > We do have quite a pile of different memories and ranges, so I don't
> > think we're making the same mistake here. But it is maybe a bit too
>
> I see.  One thing which caught my eyes was the system memory control.
> Shouldn't that be controlled by memcg?  Is there something special
> about system memory used by gpus?
>
> > complicated, and exposes stuff that most users really don't care about.
>
> Could be from me not knowing much about gpus but definitely looks too
> complex to me.  I don't see how users would be able to allocate vram,
> system memory and GART with reasonable accuracy.  memcg on cgroup2
> deals with just single number and that's already plenty challenging.
>
> Thanks.
>
> --
> tejun

Re: [PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim

2019-08-29 Thread Kenny Ho
Yes, and I think it has quite a lot of coupling with mm's page and
pressure mechanisms.  My current thought is to just copy the API but
have a separate implementation of "ttm_shrinker" and
"ttm_shrinker_control" or something like that.  I am certainly happy
to listen to additional feedback and suggestions.
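(For anyone not familiar with it, the shrinker API being referred to has
roughly the following shape; a "ttm_shrinker" would presumably mirror it
with a TTM/drmcg-aware control structure.  All names below are
illustrative and not part of this series:)

  static unsigned long ttm_cg_count_objects(struct shrinker *sh,
                                            struct shrink_control *sc)
  {
          /* report how many objects (e.g. evictable BOs) could be freed */
          return 0; /* placeholder */
  }

  static unsigned long ttm_cg_scan_objects(struct shrinker *sh,
                                           struct shrink_control *sc)
  {
          /* free up to sc->nr_to_scan objects, return the number freed */
          return SHRINK_STOP; /* placeholder */
  }

  static struct shrinker ttm_cg_shrinker = {
          .count_objects  = ttm_cg_count_objects,
          .scan_objects   = ttm_cg_scan_objects,
          .seeks          = DEFAULT_SEEKS,
  };

  /* registered once with register_shrinker(&ttm_cg_shrinker) */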

Regards,
Kenny


On Thu, Aug 29, 2019 at 10:12 AM Koenig, Christian
 wrote:
>
> Yeah, that's also a really good idea as well.
>
> The problem with the shrinker API is that it only applies to system memory 
> currently.
>
> So you won't have a distinction which domain you need to evict stuff from.
>
> Regards,
> Christian.
>
> Am 29.08.19 um 16:07 schrieb Kenny Ho:
>
> Thanks for the feedback Christian.  I am still digging into this one.  Daniel 
> suggested leveraging the Shrinker API for the functionality of this commit in 
> RFC v3 but I am still trying to figure it out how/if ttm fit with shrinker 
> (though the idea behind the shrinker API seems fairly straightforward as far 
> as I understand it currently.)
>
> Regards,
> Kenny
>
> On Thu, Aug 29, 2019 at 3:08 AM Koenig, Christian  
> wrote:
>>
>> Am 29.08.19 um 08:05 schrieb Kenny Ho:
>> > Allow DRM TTM memory manager to register a work_struct, such that, when
>> > a drmcgrp is under memory pressure, memory reclaiming can be triggered
>> > immediately.
>> >
>> > Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
>> > Signed-off-by: Kenny Ho 
>> > ---
>> >   drivers/gpu/drm/ttm/ttm_bo.c| 49 +
>> >   include/drm/drm_cgroup.h| 16 +++
>> >   include/drm/ttm/ttm_bo_driver.h |  2 ++
>> >   kernel/cgroup/drm.c | 30 
>> >   4 files changed, 97 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> > index d7e3d3128ebb..72efae694b7e 100644
>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> > @@ -1590,6 +1590,46 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, 
>> > unsigned mem_type)
>> >   }
>> >   EXPORT_SYMBOL(ttm_bo_evict_mm);
>> >
>> > +static void ttm_bo_reclaim_wq(struct work_struct *work)
>> > +{
>> > + struct ttm_operation_ctx ctx = {
>> > + .interruptible = false,
>> > + .no_wait_gpu = false,
>> > + .flags = TTM_OPT_FLAG_FORCE_ALLOC
>> > + };
>> > + struct ttm_mem_type_manager *man =
>> > + container_of(work, struct ttm_mem_type_manager, reclaim_wq);
>> > + struct ttm_bo_device *bdev = man->bdev;
>> > + struct dma_fence *fence;
>> > + int mem_type;
>> > + int ret;
>> > +
>> > + for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
>> > + if (&bdev->man[mem_type] == man)
>> > + break;
>> > +
>> > + WARN_ON(mem_type >= TTM_NUM_MEM_TYPES);
>> > + if (mem_type >= TTM_NUM_MEM_TYPES)
>> > + return;
>> > +
>> > + if (!drmcg_mem_pressure_scan(bdev, mem_type))
>> > + return;
>> > +
>> > + ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx, NULL);
>> > + if (ret)
>> > + return;
>> > +
>> > + spin_lock(&man->move_lock);
>> > + fence = dma_fence_get(man->move);
>> > + spin_unlock(&man->move_lock);
>> > +
>> > + if (fence) {
>> > + ret = dma_fence_wait(fence, false);
>> > + dma_fence_put(fence);
>> > + }
>>
>> Why do you want to block for the fence here? That is a rather bad idea
>> and would break pipe-lining.
>>
>> Apart from that I don't think we should put that into TTM.
>>
>> Instead drmcg_register_device_mm() should get a function pointer which
>> is called from a work item when the group is under pressure.
>>
>> TTM can then provides the function which can be called, but the actually
>> registration is job of the device and not TTM.
>>
>> Regards,
>> Christian.
>>
>> > +
>> > +}
>> > +
>> >   int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
>> >   unsigned long p_size)
>> >   {
>> > @@ -1624,6 +1664,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, 
>> > unsigned type,
>> >   INIT_LIST_HEAD(&man->lru[i]);

Re: [PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim

2019-08-29 Thread Kenny Ho
Thanks for the feedback Christian.  I am still digging into this one.
Daniel suggested leveraging the Shrinker API for the functionality of this
commit in RFC v3, but I am still trying to figure out how/if ttm fits with
the shrinker (though the idea behind the shrinker API seems fairly
straightforward as far as I understand it currently.)

Regards,
Kenny

On Thu, Aug 29, 2019 at 3:08 AM Koenig, Christian 
wrote:

> Am 29.08.19 um 08:05 schrieb Kenny Ho:
> > Allow DRM TTM memory manager to register a work_struct, such that, when
> > a drmcgrp is under memory pressure, memory reclaiming can be triggered
> > immediately.
> >
> > Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
> > Signed-off-by: Kenny Ho 
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c| 49 +
> >   include/drm/drm_cgroup.h| 16 +++
> >   include/drm/ttm/ttm_bo_driver.h |  2 ++
> >   kernel/cgroup/drm.c | 30 
> >   4 files changed, 97 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > index d7e3d3128ebb..72efae694b7e 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -1590,6 +1590,46 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev,
> unsigned mem_type)
> >   }
> >   EXPORT_SYMBOL(ttm_bo_evict_mm);
> >
> > +static void ttm_bo_reclaim_wq(struct work_struct *work)
> > +{
> > + struct ttm_operation_ctx ctx = {
> > + .interruptible = false,
> > + .no_wait_gpu = false,
> > + .flags = TTM_OPT_FLAG_FORCE_ALLOC
> > + };
> > + struct ttm_mem_type_manager *man =
> > + container_of(work, struct ttm_mem_type_manager, reclaim_wq);
> > + struct ttm_bo_device *bdev = man->bdev;
> > + struct dma_fence *fence;
> > + int mem_type;
> > + int ret;
> > +
> > + for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
> > + if (&bdev->man[mem_type] == man)
> > + break;
> > +
> > + WARN_ON(mem_type >= TTM_NUM_MEM_TYPES);
> > + if (mem_type >= TTM_NUM_MEM_TYPES)
> > + return;
> > +
> > + if (!drmcg_mem_pressure_scan(bdev, mem_type))
> > + return;
> > +
> > + ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx, NULL);
> > + if (ret)
> > + return;
> > +
> > + spin_lock(&man->move_lock);
> > + fence = dma_fence_get(man->move);
> > + spin_unlock(&man->move_lock);
> > +
> > + if (fence) {
> > + ret = dma_fence_wait(fence, false);
> > + dma_fence_put(fence);
> > + }
>
> Why do you want to block for the fence here? That is a rather bad idea
> and would break pipe-lining.
>
> Apart from that I don't think we should put that into TTM.
>
> Instead drmcg_register_device_mm() should get a function pointer which
> is called from a work item when the group is under pressure.
>
> TTM can then provides the function which can be called, but the actually
> registration is job of the device and not TTM.
>
> Regards,
> Christian.
>
> > +
> > +}
> > +
> >   int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
> >   unsigned long p_size)
> >   {
> > @@ -1624,6 +1664,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev,
> unsigned type,
> >   INIT_LIST_HEAD(&man->lru[i]);
> >   man->move = NULL;
> >
> > + pr_err("drmcg %p type %d\n", bdev->ddev, type);
> > +
> > + if (type <= TTM_PL_VRAM) {
> > + INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
> > + drmcg_register_device_mm(bdev->ddev, type,
> >    &man->reclaim_wq);
> > + }
> > +
> >   return 0;
> >   }
> >   EXPORT_SYMBOL(ttm_bo_init_mm);
> > @@ -1701,6 +1748,8 @@ int ttm_bo_device_release(struct ttm_bo_device
> *bdev)
> >   man = &bdev->man[i];
> >   if (man->has_type) {
> >   man->use_type = false;
> > + drmcg_unregister_device_mm(bdev->ddev, i);
> > + cancel_work_sync(&man->reclaim_wq);
> >   if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev,
> i)) {
> >   ret = -EBUSY;
> >   pr_err("DRM memory manager type %d is not
> clean\n",
> > diff --git a/include/drm/drm_cgroup.h b/include/drm/

[PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim

2019-08-29 Thread Kenny Ho
Allow DRM TTM memory manager to register a work_struct, such that, when
a drmcg is under memory pressure, memory reclaiming can be triggered
immediately.
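(The consuming side is not part of this diff.  Roughly, the idea is that
when the controller notices a cgroup crossing its memory high mark for a
given TTM memory type, it kicks the registered work item; over_high() is
a placeholder here, props is the device's drmcg_props:)

  if (over_high(drmcg, mem_type) && props->mem_reclaim_wq[mem_type])
          schedule_work(props->mem_reclaim_wq[mem_type]);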

Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/ttm/ttm_bo.c| 49 +
 include/drm/drm_cgroup.h| 16 +++
 include/drm/ttm/ttm_bo_driver.h |  2 ++
 kernel/cgroup/drm.c | 30 
 4 files changed, 97 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index d7e3d3128ebb..72efae694b7e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1590,6 +1590,46 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned 
mem_type)
 }
 EXPORT_SYMBOL(ttm_bo_evict_mm);
 
+static void ttm_bo_reclaim_wq(struct work_struct *work)
+{
+   struct ttm_operation_ctx ctx = {
+   .interruptible = false,
+   .no_wait_gpu = false,
+   .flags = TTM_OPT_FLAG_FORCE_ALLOC
+   };
+   struct ttm_mem_type_manager *man =
+   container_of(work, struct ttm_mem_type_manager, reclaim_wq);
+   struct ttm_bo_device *bdev = man->bdev;
+   struct dma_fence *fence;
+   int mem_type;
+   int ret;
+
+   for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
+   if (&bdev->man[mem_type] == man)
+   break;
+
+   WARN_ON(mem_type >= TTM_NUM_MEM_TYPES);
+   if (mem_type >= TTM_NUM_MEM_TYPES)
+   return;
+
+   if (!drmcg_mem_pressure_scan(bdev, mem_type))
+   return;
+
+   ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx, NULL);
+   if (ret)
+   return;
+
+   spin_lock(&man->move_lock);
+   fence = dma_fence_get(man->move);
+   spin_unlock(&man->move_lock);
+
+   if (fence) {
+   ret = dma_fence_wait(fence, false);
+   dma_fence_put(fence);
+   }
+
+}
+
 int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
unsigned long p_size)
 {
@@ -1624,6 +1664,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned 
type,
	INIT_LIST_HEAD(&man->lru[i]);
man->move = NULL;
 
+   pr_err("drmcg %p type %d\n", bdev->ddev, type);
+
+   if (type <= TTM_PL_VRAM) {
+   INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
+   drmcg_register_device_mm(bdev->ddev, type, &man->reclaim_wq);
+   }
+
return 0;
 }
 EXPORT_SYMBOL(ttm_bo_init_mm);
@@ -1701,6 +1748,8 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
	man = &bdev->man[i];
if (man->has_type) {
man->use_type = false;
+   drmcg_unregister_device_mm(bdev->ddev, i);
+   cancel_work_sync(&man->reclaim_wq);
if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, i)) {
ret = -EBUSY;
pr_err("DRM memory manager type %d is not 
clean\n",
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index c11df388fdf2..6d9707e1eb72 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -5,6 +5,7 @@
 #define __DRM_CGROUP_H__
 
 #include 
+#include 
 #include 
 #include 
 
@@ -25,12 +26,17 @@ struct drmcg_props {
s64 mem_bw_avg_bytes_per_us_default;
 
s64 mem_highs_default[TTM_PL_PRIV+1];
+
+   struct work_struct  *mem_reclaim_wq[TTM_PL_PRIV];
 };
 
 #ifdef CONFIG_CGROUP_DRM
 
 void drmcg_device_update(struct drm_device *device);
 void drmcg_device_early_init(struct drm_device *device);
+void drmcg_register_device_mm(struct drm_device *dev, unsigned int type,
+   struct work_struct *wq);
+void drmcg_unregister_device_mm(struct drm_device *dev, unsigned int type);
 bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
size_t size);
 void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
@@ -53,6 +59,16 @@ static inline void drmcg_device_early_init(struct drm_device 
*device)
 {
 }
 
+static inline void drmcg_register_device_mm(struct drm_device *dev,
+   unsigned int type, struct work_struct *wq)
+{
+}
+
+static inline void drmcg_unregister_device_mm(struct drm_device *dev,
+   unsigned int type)
+{
+}
+
 static inline void drmcg_try_chg_bo_alloc(struct drmcg *drmcg,
struct drm_device *dev, size_t size)
 {
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index e1a805d65b83..529cef92bcf6 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -205,6 +205,8 @@ struct ttm_mem_type_manager {
 * Protected by @move_lock.
 */
struct dma_fence *move;
+
+   struct work_struct reclaim_wq;
 };
 
 /**

[PATCH RFC v4 15/16] drm, cgroup: add update trigger after limit change

2019-08-29 Thread Kenny Ho
Before this commit, drmcg limits are updated but enforcement is delayed
until the next time the driver checks against the new limit.  While this
is sufficient for certain resources, a more proactive enforcement may be
needed for other resources.

Introducing an optional drmcg_limit_updated callback for the DRM
drivers.  When defined, it will be called in two scenarios:
1) When limits are updated for a particular cgroup, the callback will be
triggered for each task in the updated cgroup.
2) When a task is migrated from one cgroup to another, the callback will
be triggered for each resource type for the migrated task.

Change-Id: I68187a72818b855b5f295aefcb241cda8ab63b00
Signed-off-by: Kenny Ho 
---
 include/drm/drm_drv.h | 10 
 kernel/cgroup/drm.c   | 57 +++
 2 files changed, 67 insertions(+)

diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index c8a37a08d98d..7e588b874a27 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -669,6 +669,16 @@ struct drm_driver {
void (*drmcg_custom_init)(struct drm_device *dev,
struct drmcg_props *props);
 
+   /**
+* @drmcg_limit_updated
+*
+* Optional callback
+*/
+   void (*drmcg_limit_updated)(struct drm_device *dev,
+   struct task_struct *task,\
+   struct drmcg_device_resource *ddr,
+   enum drmcg_res_type res_type);
+
/**
 * @gem_vm_ops: Driver private ops for this object
 */
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 18c4368e2c29..99772e5d9ccc 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -621,6 +621,23 @@ static void drmcg_nested_limit_parse(struct 
kernfs_open_file *of,
}
 }
 
+static void drmcg_limit_updated(struct drm_device *dev, struct drmcg *drmcg,
+   enum drmcg_res_type res_type)
+{
+   struct drmcg_device_resource *ddr =
+   drmcg->dev_resources[dev->primary->index];
+   struct css_task_iter it;
+   struct task_struct *task;
+
+   css_task_iter_start(&drmcg->css.cgroup->self,
+   CSS_TASK_ITER_PROCS, &it);
+   while ((task = css_task_iter_next(&it))) {
+   dev->driver->drmcg_limit_updated(dev, task,
+   ddr, res_type);
+   }
+   css_task_iter_end(&it);
+}
+
 static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
size_t nbytes, loff_t off)
 {
@@ -726,6 +743,10 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file 
*of, char *buf,
default:
break;
}
+
+   if (dm->dev->driver->drmcg_limit_updated)
+   drmcg_limit_updated(dm->dev, drmcg, type);
+
drm_dev_put(dm->dev); /* release from drm_minor_acquire */
}
 
@@ -863,9 +884,45 @@ struct cftype files[] = {
{ } /* terminate */
 };
 
+static int drmcg_attach_fn(int id, void *ptr, void *data)
+{
+   struct drm_minor *minor = ptr;
+   struct task_struct *task = data;
+   struct drm_device *dev;
+
+   if (minor->type != DRM_MINOR_PRIMARY)
+   return 0;
+
+   dev = minor->dev;
+
+   if (dev->driver->drmcg_limit_updated) {
+   struct drmcg *drmcg = drmcg_get(task);
+   struct drmcg_device_resource *ddr =
+   drmcg->dev_resources[minor->index];
+   enum drmcg_res_type type;
+
+   for (type = 0; type < __DRMCG_TYPE_LAST; type++)
+   dev->driver->drmcg_limit_updated(dev, task, ddr, type);
+
+   drmcg_put(drmcg);
+   }
+
+   return 0;
+}
+
+static void drmcg_attach(struct cgroup_taskset *tset)
+{
+   struct task_struct *task;
+   struct cgroup_subsys_state *css;
+
+   cgroup_taskset_for_each(task, css, tset)
+   drm_minor_for_each(&drmcg_attach_fn, task);
+}
+
 struct cgroup_subsys drm_cgrp_subsys = {
.css_alloc  = drmcg_css_alloc,
.css_free   = drmcg_css_free,
+   .attach = drmcg_attach,
.early_init = false,
.legacy_cftypes = files,
.dfl_cftypes= files,
-- 
2.22.0


[PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup

2019-08-29 Thread Kenny Ho
The number of logical gpus (lgpu) is defined to be the number of compute
units (CU) for a device.  The lgpu allocation limit only applies to
compute workloads for the moment (enforced via kfd queue creation.)  Any
cu_mask update is validated against the availability of the compute units
as defined by the drmcg the kfd process belongs to.
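(Conceptually, the check amounts to verifying that a requested CU mask
stays within the CUs allocated to the process' cgroup, along the lines of
the bitmap test below.  This is an illustration of the idea only, not the
body of pqm_drmcg_lgpu_validate(); requested_cu_mask is a placeholder for
the bitmap built from the ioctl's u32 cu_mask array:)

  if (!bitmap_subset(requested_cu_mask, ddr->lgpu_allocated,
                     dev->drmcg_props.lgpu_capacity))
          return false; /* mask uses CUs outside the cgroup's share */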

Change-Id: I69a57452c549173a1cd623c30dc57195b3b6563e
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  21 +++
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   6 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +
 .../amd/amdkfd/kfd_process_queue_manager.c| 140 ++
 5 files changed, 174 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 55cb1b2094fd..369915337213 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev 
*dst, struct kgd_dev *s
valid;  \
})
 
+int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
+   struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
+   unsigned int nbits);
+
 /* GPUVM API */
 int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int 
pasid,
void **vm, void **process_info,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 163a4fbf0611..8abeffdd2e5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1398,9 +1398,29 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, 
unsigned int pipe,
 static void amdgpu_drmcg_custom_init(struct drm_device *dev,
struct drmcg_props *props)
 {
+   struct amdgpu_device *adev = dev->dev_private;
+
+   props->lgpu_capacity = adev->gfx.cu_info.number;
+
props->limit_enforced = true;
 }
 
+static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
+   struct task_struct *task, struct drmcg_device_resource *ddr,
+   enum drmcg_res_type res_type)
+{
+   struct amdgpu_device *adev = dev->dev_private;
+
+   switch (res_type) {
+   case DRMCG_TYPE_LGPU:
+   amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
+ddr->lgpu_allocated, dev->drmcg_props.lgpu_capacity);
+   break;
+   default:
+   break;
+   }
+}
+
 static struct drm_driver kms_driver = {
.driver_features =
DRIVER_USE_AGP | DRIVER_ATOMIC |
@@ -1438,6 +1458,7 @@ static struct drm_driver kms_driver = {
.gem_prime_mmap = amdgpu_gem_prime_mmap,
 
.drmcg_custom_init = amdgpu_drmcg_custom_init,
+   .drmcg_limit_updated = amdgpu_drmcg_limit_updated,
 
.name = DRIVER_NAME,
.desc = DRIVER_DESC,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 138c70454e2b..fa765b803f97 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -450,6 +450,12 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct 
kfd_process *p,
return -EFAULT;
}
 
+   if (!pqm_drmcg_lgpu_validate(p, args->queue_id, properties.cu_mask, 
cu_mask_size)) {
+   pr_debug("CU mask not permitted by DRM Cgroup");
+   kfree(properties.cu_mask);
+   return -EACCES;
+   }
+
	mutex_lock(&p->mutex);
 
	retval = pqm_set_cu_mask(&p->pqm, args->queue_id, &properties);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 8b0eee5b3521..1bec7550 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1038,6 +1038,9 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
   u32 *ctl_stack_used_size,
   u32 *save_area_used_size);
 
+bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
+   unsigned int cu_mask_size);
+
 int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
unsigned int fence_value,
unsigned int timeout_ms);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 7e6c3ee82f5b..a896de290307 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -23,9 +23,11 @@
 
 #include 
 #include 
+#include 
 #include "kfd_device_queue_manager.h"
 #include "kfd_priv.h"
 #include "kfd_kernel_queue.h"
+#include "am

[PATCH RFC v4 05/16] drm, cgroup: Add peak GEM buffer allocation stats

2019-08-29 Thread Kenny Ho
drm.buffer.peak.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Largest (high water mark) GEM buffer allocated in bytes.

Change-Id: I79e56222151a3d33a76a61ba0097fe93ebb3449f
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++
 include/linux/cgroup_drm.h  |  3 +++
 kernel/cgroup/drm.c | 12 
 3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 0e29d136e2f9..8588a0ffc69d 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1907,6 +1907,12 @@ DRM Interface Files
 
Total GEM buffer allocation in bytes.
 
+  drm.buffer.peak.stats
+   A read-only flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Largest (high water mark) GEM buffer allocated in bytes.
+
 GEM Buffer Ownership
 
 
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 1d8a7f2cdb4e..974d390cfa4f 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -15,6 +15,7 @@
 
 enum drmcg_res_type {
DRMCG_TYPE_BO_TOTAL,
+   DRMCG_TYPE_BO_PEAK,
__DRMCG_TYPE_LAST,
 };
 
@@ -24,6 +25,8 @@ enum drmcg_res_type {
 struct drmcg_device_resource {
/* for per device stats */
s64 bo_stats_total_allocated;
+
+   s64 bo_stats_peak_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 87ae9164d8d8..0bf5b95668c4 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -129,6 +129,9 @@ static void drmcg_print_stats(struct drmcg_device_resource 
*ddr,
case DRMCG_TYPE_BO_TOTAL:
seq_printf(sf, "%lld\n", ddr->bo_stats_total_allocated);
break;
+   case DRMCG_TYPE_BO_PEAK:
+   seq_printf(sf, "%lld\n", ddr->bo_stats_peak_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -177,6 +180,12 @@ struct cftype files[] = {
.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
DRMCG_FTYPE_STATS),
},
+   {
+   .name = "buffer.peak.stats",
+   .seq_show = drmcg_seq_show,
+   .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
+   DRMCG_FTYPE_STATS),
+   },
{ } /* terminate */
 };
 
@@ -260,6 +269,9 @@ void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct 
drm_device *dev,
ddr = drmcg->dev_resources[devIdx];
 
ddr->bo_stats_total_allocated += (s64)size;
+
+   if (ddr->bo_stats_peak_allocated < (s64)size)
+   ddr->bo_stats_peak_allocated = (s64)size;
}
	mutex_unlock(&dev->drmcg_mutex);
 }
-- 
2.22.0


[PATCH RFC v4 10/16] drm, cgroup: Add TTM buffer peak usage stats

2019-08-29 Thread Kenny Ho
drm.memory.peak.stats
A read-only nested-keyed file which exists on all cgroups.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

  == ==
  system Peak host memory used
  tt Peak host memory used by the device (GTT/GART)
  vram   Peak Video RAM used by the drm device
  priv   Other drm device specific memory peak usage
  == ==

Reading returns the following::

226:0 system=0 tt=0 vram=0 priv=0
226:1 system=0 tt=9035776 vram=17768448 priv=16809984
226:2 system=0 tt=9035776 vram=17768448 priv=16809984

Change-Id: I986e44533848f66411465bdd52105e78105a709a
Signed-off-by: Kenny Ho 
---
 include/linux/cgroup_drm.h |  2 ++
 kernel/cgroup/drm.c| 19 +++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 4c2794c9333d..9579e2a0b71d 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -20,6 +20,7 @@ enum drmcg_res_type {
DRMCG_TYPE_BO_COUNT,
DRMCG_TYPE_MEM,
DRMCG_TYPE_MEM_EVICT,
+   DRMCG_TYPE_MEM_PEAK,
__DRMCG_TYPE_LAST,
 };
 
@@ -37,6 +38,7 @@ struct drmcg_device_resource {
s64 bo_stats_count_allocated;
 
s64 mem_stats[TTM_PL_PRIV+1];
+   s64 mem_peaks[TTM_PL_PRIV+1];
s64 mem_stats_evict;
 };
 
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 4960a8d1e8f4..899dc44722c3 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -162,6 +162,13 @@ static void drmcg_print_stats(struct drmcg_device_resource 
*ddr,
case DRMCG_TYPE_MEM_EVICT:
seq_printf(sf, "%lld\n", ddr->mem_stats_evict);
break;
+   case DRMCG_TYPE_MEM_PEAK:
+   for (i = 0; i <= TTM_PL_PRIV; i++) {
+   seq_printf(sf, "%s=%lld ", ttm_placement_names[i],
+   ddr->mem_peaks[i]);
+   }
+   seq_puts(sf, "\n");
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -443,6 +450,12 @@ struct cftype files[] = {
.private = DRMCG_CTF_PRIV(DRMCG_TYPE_MEM_EVICT,
DRMCG_FTYPE_STATS),
},
+   {
+   .name = "memory.peaks.stats",
+   .seq_show = drmcg_seq_show,
+   .private = DRMCG_CTF_PRIV(DRMCG_TYPE_MEM_PEAK,
+   DRMCG_FTYPE_STATS),
+   },
{ } /* terminate */
 };
 
@@ -617,6 +630,8 @@ void drmcg_chg_mem(struct ttm_buffer_object *tbo)
for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
ddr = drmcg->dev_resources[devIdx];
ddr->mem_stats[mem_type] += size;
+   ddr->mem_peaks[mem_type] = max(ddr->mem_peaks[mem_type],
+   ddr->mem_stats[mem_type]);
}
	mutex_unlock(&dev->drmcg_mutex);
 }
@@ -668,6 +683,10 @@ void drmcg_mem_track_move(struct ttm_buffer_object 
*old_bo, bool evict,
ddr->mem_stats[old_mem_type] -= move_in_bytes;
ddr->mem_stats[new_mem_type] += move_in_bytes;
 
+   ddr->mem_peaks[new_mem_type] = max(
+   ddr->mem_peaks[new_mem_type],
+   ddr->mem_stats[new_mem_type]);
+
if (evict)
ddr->mem_stats_evict++;
}
-- 
2.22.0


[PATCH RFC v4 09/16] drm, cgroup: Add TTM buffer allocation stats

2019-08-29 Thread Kenny Ho
The drm resource being measured is the TTM (Translation Table Manager)
buffers.  TTM manages different types of memory that a GPU might access.
These memory types include dedicated Video RAM (VRAM) and host/system
memory accessible through IOMMU (GART/GTT).  TTM is currently used by
multiple drm drivers (amd, ast, bochs, cirrus, hisilicon, mgag200,
nouveau, qxl, virtio, vmwgfx.)

drm.memory.stats
A read-only nested-keyed file which exists on all cgroups.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

  == =
  system Host/system memory
  tt Host memory used by the drm device (GTT/GART)
  vram   Video RAM used by the drm device
  priv   Other drm device, vendor specific memory
  == =

Reading returns the following::

226:0 system=0 tt=0 vram=0 priv=0
226:1 system=0 tt=9035776 vram=17768448 priv=16809984
226:2 system=0 tt=9035776 vram=17768448 priv=16809984

drm.memory.evict.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Total number of evictions.

Change-Id: Ice2c4cc845051229549bebeb6aa2d7d6153bdf6a
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |   3 +-
 drivers/gpu/drm/ttm/ttm_bo.c|  30 +++
 drivers/gpu/drm/ttm/ttm_bo_util.c   |   4 +
 include/drm/drm_cgroup.h|  19 +
 include/drm/ttm/ttm_bo_api.h|   2 +
 include/drm/ttm/ttm_bo_driver.h |   8 ++
 include/linux/cgroup_drm.h  |   6 ++
 kernel/cgroup/drm.c | 108 
 8 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index cfcbbdc39656..463e015e8694 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1720,8 +1720,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
	mutex_init(&adev->mman.gtt_window_lock);
 
/* No others user of address space so set it to 0 */
-   r = ttm_bo_device_init(&adev->mman.bdev,
+   r = ttm_bo_device_init_tmp(&adev->mman.bdev,
   &amdgpu_bo_driver,
+  adev->ddev,
   adev->ddev->anon_inode->i_mapping,
   adev->need_dma32);
if (r) {
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 58c403eda04e..a0e9ce46baf3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -42,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static void ttm_bo_global_kobj_release(struct kobject *kobj);
 
@@ -151,6 +153,10 @@ static void ttm_bo_release_list(struct kref *list_kref)
struct ttm_bo_device *bdev = bo->bdev;
size_t acc_size = bo->acc_size;
 
+   if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+   drmcg_unchg_mem(bo);
+   drmcg_put(bo->drmcg);
+
	BUG_ON(kref_read(&bo->list_kref));
	BUG_ON(kref_read(&bo->kref));
	BUG_ON(atomic_read(&bo->cpu_writers));
@@ -360,6 +366,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object 
*bo,
if (bo->mem.mem_type == TTM_PL_SYSTEM) {
if (bdev->driver->move_notify)
bdev->driver->move_notify(bo, evict, mem);
+   if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+   drmcg_mem_track_move(bo, evict, mem);
bo->mem = *mem;
mem->mm_node = NULL;
goto moved;
@@ -368,6 +376,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object 
*bo,
 
if (bdev->driver->move_notify)
bdev->driver->move_notify(bo, evict, mem);
+   if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+   drmcg_mem_track_move(bo, evict, mem);
 
if (!(old_man->flags & TTM_MEMTYPE_FLAG_FIXED) &&
!(new_man->flags & TTM_MEMTYPE_FLAG_FIXED))
@@ -381,6 +391,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object 
*bo,
if (bdev->driver->move_notify) {
swap(*mem, bo->mem);
bdev->driver->move_notify(bo, false, mem);
+   if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+

[PATCH RFC v4 11/16] drm, cgroup: Add per cgroup bw measure and control

2019-08-29 Thread Kenny Ho
The bandwidth is measured by keeping track of the amount of bytes moved
by ttm within a time period.  We define two types of bandwidth: burst
and average.  Average bandwidth is calculated by dividing the total
amount of bytes moved within a cgroup by the lifetime of the cgroup.
Burst bandwidth is similar except that the byte and time measurement is
reset after a user configurable period.
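(As a worked example using the sample output further down: for device
226:2, 65518026752 total bytes moved over a cgroup lifetime of 298337721
us works out to roughly 219 bytes per us, which is the avg_bytes_per_us
value reported.)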

The bandwidth control is best effort since it is done on a per-move
basis instead of per-byte.  The bandwidth is limited by delaying the
move of a buffer.  The bandwidth limit can be exceeded when the next
move is larger than the remaining allowance.

drm.burst_bw_period_in_us
A read-write flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Length of the period used to measure burst bandwidth, in us.
One period per device.

drm.burst_bw_period_in_us.default
A read-only flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Default length of a period in us (one per device.)

drm.bandwidth.stats
A read-only nested-keyed file which exists on all cgroups.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

  = ==
  burst_byte_per_us Burst bandwidth
  avg_bytes_per_us  Average bandwidth
  moved_byte    Amount of bytes moved within a period
  accum_us  Amount of time accumulated in a period
  total_moved_byte  Byte moved within the cgroup lifetime
  total_accum_usCgroup lifetime in us
  byte_credit   Available byte credit to limit avg bw
  = ==

Reading returns the following::
226:1 burst_byte_per_us=23 avg_bytes_per_us=0 moved_byte=2244608
accum_us=95575 total_moved_byte=45899776 total_accum_us=201634590
byte_credit=13214278590464
226:2 burst_byte_per_us=10 avg_bytes_per_us=219 moved_byte=430080
accum_us=39350 total_moved_byte=65518026752 total_accum_us=298337721
byte_credit=9223372036854644735
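
For example, using the 226:2 sample above, the average bandwidth is
simply the lifetime byte count divided by the lifetime in us:

    65518026752 bytes / 298337721 us ~= 219 bytes/us

which matches the avg_bytes_per_us value reported.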

drm.bandwidth.high
A read-write nested-keyed file which exists on all cgroups.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

    ===
  bytes_in_period   Burst limit per period in byte
  avg_bytes_per_us  Average bandwidth limit in bytes per us
    ===

Reading returns the following::

226:1 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
226:2 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536

drm.bandwidth.default
A read-only nested-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

    
  bytes_in_period   Default burst limit per period in byte
  avg_bytes_per_us  Default average bw limit in bytes per us
    

Reading returns the following::

226:1 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
226:2 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536

Change-Id: Ie573491325ccc16535bb943e7857f43bd0962add
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/ttm/ttm_bo.c |   7 +
 include/drm/drm_cgroup.h |  19 +++
 include/linux/cgroup_drm.h   |  16 ++
 kernel/cgroup/drm.c  | 319 ++-
 4 files changed, 359 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index a0e9ce46baf3..32eee85f3641 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1256,6 +1257,12 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
 * Check whether we need to move buffer.
 */
if (!ttm_bo_mem_compat(placement, &bo->mem, &new_flags)) {
+   unsigned int move_delay = drmcg_get_mem_bw_period_in_us(bo);
+
+   move_delay /= 2000; /* check every half period in ms*/
+   while (bo->bdev->ddev != NULL && !drmcg_mem_can_move(bo))
+   msleep(move_delay);
+
ret = ttm_bo_move_buffer(bo, placement, ctx);
if (ret)
return ret;
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 7d63f73a5375..9ce0d54e6bd8 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.

[PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource

2019-08-29 Thread Kenny Ho
drm.lgpu
A read-write nested-keyed file which exists on all cgroups.
Each entry is keyed by the DRM device's major:minor.

lgpu stands for logical GPU; it is an abstraction used to
subdivide a physical DRM device for the purpose of resource
management.

The lgpu is a discrete quantity that is device specific (i.e.
some DRM devices may have 64 lgpus while others may have 100
lgpus.)  The lgpu is a single quantity with two representations
denoted by the following nested keys.

  = 
  count Representing lgpu as anonymous resource
  list  Representing lgpu as named resource
  = 

For example:
226:0 count=256 list=0-255
226:1 count=4 list=0,2,4,6
226:2 count=32 list=32-63

lgpu is represented by a bitmap and uses the bitmap_parselist
kernel function so the list key input format is a
comma-separated list of decimal numbers and ranges.

Consecutively set bits are shown as two hyphen-separated decimal
numbers, the smallest and largest bit numbers set in the range.
Optionally each range can be postfixed to denote that only parts
of it should be set.  The range will be divided into groups of a
specific size.
Syntax: range:used_size/group_size
Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769

The count key is the hamming weight / hweight of the bitmap.

Both count and list accept the max and default keywords.

Some DRM devices may only support lgpu as anonymous resources.
In such case, the significance of the position of the set bits
in list will be ignored.

This lgpu resource supports the 'allocation' resource
distribution model.
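
Since the list key goes through bitmap_parselist, a consumer can turn
the string back into a mask with the existing kernel helpers.  A minimal
sketch (illustrative only; the function and variable names are made up
and error handling is elided):

static void lgpu_list_example(void)
{
	/* uses helpers from <linux/bitmap.h> */
	DECLARE_BITMAP(lgpu_allowed, 256);	/* a device with 256 lgpus */

	if (!bitmap_parselist("0,2,4,6", lgpu_allowed, 256))
		pr_info("count=%d\n", bitmap_weight(lgpu_allowed, 256));
	/* prints count=4, matching the hweight reported by the count key */
}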

Change-Id: I1afcacf356770930c7f925df043e51ad06ceb98e
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  46 
 include/drm/drm_cgroup.h|   4 +
 include/linux/cgroup_drm.h  |   6 ++
 kernel/cgroup/drm.c | 135 
 4 files changed, 191 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 87a195133eaa..57f18469bd76 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1958,6 +1958,52 @@ DRM Interface Files
Set largest allocation for /dev/dri/card1 to 4MB
echo "226:1 4m" > drm.buffer.peak.max
 
+  drm.lgpu
+   A read-write nested-keyed file which exists on all cgroups.
+   Each entry is keyed by the DRM device's major:minor.
+
+   lgpu stands for logical GPU, it is an abstraction used to
+   subdivide a physical DRM device for the purpose of resource
+   management.
+
+   The lgpu is a discrete quantity that is device specific (i.e.
+   some DRM devices may have 64 lgpus while others may have 100
+   lgpus.)  The lgpu is a single quantity with two representations
+   denoted by the following nested keys.
+
+ = 
+ count Representing lgpu as anonymous resource
+ list  Representing lgpu as named resource
+ = 
+
+   For example:
+   226:0 count=256 list=0-255
+   226:1 count=4 list=0,2,4,6
+   226:2 count=32 list=32-63
+
+   lgpu is represented by a bitmap and uses the bitmap_parselist
+   kernel function so the list key input format is a
+   comma-separated list of decimal numbers and ranges.
+
+   Consecutively set bits are shown as two hyphen-separated decimal
+   numbers, the smallest and largest bit numbers set in the range.
+   Optionally each range can be postfixed to denote that only parts
+   of it should be set.  The range will be divided into groups of a
+   specific size.
+   Syntax: range:used_size/group_size
+   Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
+
+   The count key is the hamming weight / hweight of the bitmap.
+
+   Both count and list accept the max and default keywords.
+
+   Some DRM devices may only support lgpu as anonymous resources.
+   In such case, the significance of the position of the set bits
+   in list will be ignored.
+
+   This lgpu resource supports the 'allocation' resource
+   distribution model.
+
 GEM Buffer Ownership
 
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 6d9707e1eb72..a8d6be0b075b 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -6,6 +6,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -28,6 +29,9 @@ struct drmcg_props {
s64 mem_highs_default[TTM_PL_PRIV+1

[PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem

2019-08-29 Thread Kenny Ho
With the increased importance of machine learning, data science and
other cloud-based applications, GPUs are already in production use in
data centers today.  Existing GPU resource management is very
coarse-grained, however, as sysadmins are only able to distribute
workloads on a per-GPU basis.  An alternative is to use GPU
virtualization (with or without SRIOV), but it generally acts on the
entire GPU instead of the specific resources in a GPU.  With a drm
cgroup controller, we can enable alternate, fine-grained, sub-GPU
resource management (in addition to what may be available via GPU
virtualization.)
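
With the subsystem registered under the name "drm", the controller is
enabled like any other cgroup v2 controller.  A minimal usage sketch,
assuming cgroup2 is mounted at /sys/fs/cgroup and with a made-up cgroup
name:

    cat /sys/fs/cgroup/cgroup.controllers        # now lists "drm"
    echo "+drm" > /sys/fs/cgroup/cgroup.subtree_control
    mkdir /sys/fs/cgroup/gpu-tenant              # drm.* files appear here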

Change-Id: I6830d3990f63f0c13abeba29b1d330cf28882831
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst | 18 -
 Documentation/cgroup-v1/drm.rst |  1 +
 include/linux/cgroup_drm.h  | 92 +
 include/linux/cgroup_subsys.h   |  4 ++
 init/Kconfig|  5 ++
 kernel/cgroup/Makefile  |  1 +
 kernel/cgroup/drm.c | 42 +++
 7 files changed, 161 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 88e746074252..2936423a3fd5 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -61,8 +61,10 @@ v1 is available under Documentation/cgroup-v1/.
  5-6. Device
  5-7. RDMA
5-7-1. RDMA Interface Files
- 5-8. Misc
-   5-8-1. perf_event
+ 5-8. DRM
+   5-8-1. DRM Interface Files
+ 5-9. Misc
+   5-9-1. perf_event
  5-N. Non-normative information
5-N-1. CPU controller root cgroup process behaviour
5-N-2. IO controller root cgroup process behaviour
@@ -1889,6 +1891,18 @@ RDMA Interface Files
  ocrdma1 hca_handle=1 hca_object=23
 
 
+DRM
+---
+
+The "drm" controller regulates the distribution and accounting of
+DRM (Direct Rendering Manager) and GPU-related resources.
+
+DRM Interface Files
+
+
+TODO
+
+
 Misc
 
 
diff --git a/Documentation/cgroup-v1/drm.rst b/Documentation/cgroup-v1/drm.rst
new file mode 100644
index ..5f5658e1f5ed
--- /dev/null
+++ b/Documentation/cgroup-v1/drm.rst
@@ -0,0 +1 @@
+Please see ../cgroup-v2.rst for details
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
new file mode 100644
index ..971166f9dd78
--- /dev/null
+++ b/include/linux/cgroup_drm.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef _CGROUP_DRM_H
+#define _CGROUP_DRM_H
+
+#ifdef CONFIG_CGROUP_DRM
+
+#include 
+
+/**
+ * The DRM cgroup controller data structure.
+ */
+struct drmcg {
+   struct cgroup_subsys_state  css;
+};
+
+/**
+ * css_to_drmcg - get the corresponding drmcg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Return: DRM cgroup that contains the @css
+ */
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+   return css ? container_of(css, struct drmcg, css) : NULL;
+}
+
+/**
+ * drmcg_get - get the drmcg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increase the reference count of the css that the @task belongs to
+ *
+ * Return: reference to the DRM cgroup the task belongs to
+ */
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+   return css_to_drmcg(task_get_css(task, drm_cgrp_id));
+}
+
+/**
+ * drmcg_put - put a drmcg reference
+ * @drmcg: the target drmcg
+ *
+ * Put a reference obtained via drmcg_get
+ */
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+   if (drmcg)
+   css_put(&drmcg->css);
+}
+
+/**
+ * drmcg_parent - find the parent of a drm cgroup
+ * @cg: the target drmcg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Return: parent DRM cgroup of @cg
+ */
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+   return css_to_drmcg(cg->css.parent);
+}
+
+#else /* CONFIG_CGROUP_DRM */
+
+struct drmcg {
+};
+
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+   return NULL;
+}
+
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+   return NULL;
+}
+
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+}
+
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+   return NULL;
+}
+
+#endif /* CONFIG_CGROUP_DRM */
+#endif /* _CGROUP_DRM_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..ddedad809e8b 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,10 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_DRM)
+SUBSYS(drm)
+#endif
+
 /*
  * The foll

[PATCH RFC v4 07/16] drm, cgroup: Add total GEM buffer allocation limit

2019-08-29 Thread Kenny Ho
The drm resources being limited here are GEM buffer objects.  User
applications allocate and free these buffers.  In addition, a process
can allocate a buffer and share it with another process.  The consumer
of a shared buffer can also outlive the allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of the
buffer is deemed to be the cgroup to which the allocating process
belongs.  There is one cgroup limit per drm device.

The limiting functionality is added to the previous stats collection
function.  The drm_gem_private_object_init is modified to have a return
value to allow failure due to cgroup limit.

The try_chg function only fails if the DRM cgroup properties have
limit_enforced set to true for the DRM device.  This is to allow the DRM
cgroup controller to collect usage stats without enforcing the limits.

drm.buffer.default
A read-only flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Default limits on the total GEM buffer allocation in bytes.

drm.buffer.max
A read-write flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Per device limits on the total GEM buffer allocation in bytes.
This is a hard limit.  Attempts to allocate beyond the cgroup
limit will result in ENOMEM.  Shorthand understood by memparse
(such as k, m, g) can be used.

Set allocation limit for /dev/dri/card1 to 1GB
echo "226:1 1g" > drm.buffer.total.max

Set allocation limit for /dev/dri/card0 to 512MB
echo "226:0 512m" > drm.buffer.total.max
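
Putting it together, a minimal end-to-end usage sketch (assuming cgroup
v2 is mounted at /sys/fs/cgroup and using the file name from the
examples above; the cgroup name is made up):

    mkdir /sys/fs/cgroup/gpu-job
    echo "226:0 512m" > /sys/fs/cgroup/gpu-job/drm.buffer.total.max
    echo $$ > /sys/fs/cgroup/gpu-job/cgroup.procs
    # GEM allocations against 226:0 from this shell now fail with ENOMEM
    # once the cgroup's total would exceed 512 MB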

Change-Id: I96e0b7add4d331ed8bb267b3c9243d360c6e9903
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst|  21 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   8 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   6 +-
 drivers/gpu/drm/drm_gem.c  |  11 +-
 include/drm/drm_cgroup.h   |   7 +-
 include/drm/drm_gem.h  |   2 +-
 include/linux/cgroup_drm.h |   1 +
 kernel/cgroup/drm.c| 221 -
 8 files changed, 260 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 4dc72339a9b6..e8fac2684179 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1919,6 +1919,27 @@ DRM Interface Files
 
Total number of GEM buffer allocated.
 
+  drm.buffer.default
+   A read-only flat-keyed file which exists on the root cgroup.
+   Each entry is keyed by the drm device's major:minor.
+
+   Default limits on the total GEM buffer allocation in bytes.
+
+  drm.buffer.max
+   A read-write flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Per device limits on the total GEM buffer allocation in byte.
+   This is a hard limit.  Attempts in allocating beyond the cgroup
+   limit will result in ENOMEM.  Shorthand understood by memparse
+   (such as k, m, g) can be used.
+
+   Set allocation limit for /dev/dri/card1 to 1GB
+   echo "226:1 1g" > drm.buffer.total.max
+
+   Set allocation limit for /dev/dri/card0 to 512MB
+   echo "226:0 512m" > drm.buffer.total.max
+
 GEM Buffer Ownership
 
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index c0bbd3aa0558..163a4fbf0611 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1395,6 +1395,12 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, 
unsigned int pipe,
  stime, etime, mode);
 }
 
+static void amdgpu_drmcg_custom_init(struct drm_device *dev,
+   struct drmcg_props *props)
+{
+   props->limit_enforced = true;
+}
+
 static struct drm_driver kms_driver = {
.driver_features =
DRIVER_USE_AGP | DRIVER_ATOMIC |
@@ -1431,6 +1437,8 @@ static struct drm_driver kms_driver = {
.gem_prime_vunmap = amdgpu_gem_prime_vunmap,
.gem_prime_mmap = amdgpu_gem_prime_mmap,
 
+   .drmcg_custom_init = amdgpu_drmcg_custom_init,
+
.name = DRIVER_NAME,
.desc = DRIVER_DESC,
.date = DRIVER_DATE,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 989b7b55cb2e..b1bd66be3e1a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
@@ -454,7 +455,10 @@ static int amdgpu_bo_do_create(struct amdgpu_device *ade

[PATCH RFC v4 12/16] drm, cgroup: Add soft VRAM limit

2019-08-29 Thread Kenny Ho
The drm resources being limited are the TTM (Translation Table Manager)
buffers.  TTM manages different types of memory that a GPU might access.
These memory types include dedicated Video RAM (VRAM) and host/system
memory accessible through IOMMU (GART/GTT).  TTM is currently used by
multiple drm drivers (amd, ast, bochs, cirrus, hisilicon, mgag200,
nouveau, qxl, virtio, vmwgfx.)

TTM buffers belonging to drm cgroups under memory pressure will be
selected to be evicted first.

drm.memory.high
A read-write nested-keyed file which exists on all cgroups.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

   =
  vram Video RAM soft limit for a drm device in byte
   =

Reading returns the following::

226:0 vram=0
226:1 vram=17768448
226:2 vram=17768448
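
Writing presumably follows the same nested-key form as the read output
above; a hedged example of capping the VRAM soft limit for 226:1 at
256 MiB (the exact write syntax here is an assumption):

    echo "226:1 vram=268435456" > drm.memory.high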

drm.memory.default
A read-only nested-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

   ===
  vram Video RAM default limit in byte
   ===

Reading returns the following::

226:0 vram=0
226:1 vram=17768448
226:2 vram=17768448

Change-Id: I7988e28a453b53140b40a28c176239acbc81d491
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/ttm/ttm_bo.c |   7 ++
 include/drm/drm_cgroup.h |  17 +
 include/linux/cgroup_drm.h   |   2 +
 kernel/cgroup/drm.c  | 135 +++
 4 files changed, 161 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 32eee85f3641..d7e3d3128ebb 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -853,14 +853,21 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
struct ttm_bo_global *glob = bdev->glob;
struct ttm_mem_type_manager *man = &bdev->man[mem_type];
bool locked = false;
+   bool check_drmcg;
unsigned i;
int ret;
 
+   check_drmcg = drmcg_mem_pressure_scan(bdev, mem_type);
+
spin_lock(&glob->lru_lock);
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
list_for_each_entry(bo, &man->lru[i], lru) {
bool busy;
 
+   if (check_drmcg &&
+   !drmcg_mem_should_evict(bo, mem_type))
+   continue;
+
if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
&busy)) {
if (busy && !busy_bo &&
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 9ce0d54e6bd8..c11df388fdf2 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -6,6 +6,7 @@
 
 #include 
 #include 
+#include 
 
 /**
  * Per DRM device properties for DRM cgroup controller for the purpose
@@ -22,6 +23,8 @@ struct drmcg_props {
 
s64 mem_bw_bytes_in_period_default;
s64 mem_bw_avg_bytes_per_us_default;
+
+   s64 mem_highs_default[TTM_PL_PRIV+1];
 };
 
 #ifdef CONFIG_CGROUP_DRM
@@ -38,6 +41,8 @@ void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, 
bool evict,
struct ttm_mem_reg *new_mem);
 unsigned int drmcg_get_mem_bw_period_in_us(struct ttm_buffer_object *tbo);
 bool drmcg_mem_can_move(struct ttm_buffer_object *tbo);
+bool drmcg_mem_pressure_scan(struct ttm_bo_device *bdev, unsigned int type);
+bool drmcg_mem_should_evict(struct ttm_buffer_object *tbo, unsigned int type);
 
 #else
 static inline void drmcg_device_update(struct drm_device *device)
@@ -81,5 +86,17 @@ static inline bool drmcg_mem_can_move(struct 
ttm_buffer_object *tbo)
 {
return true;
 }
+
+static inline bool drmcg_mem_pressure_scan(struct ttm_bo_device *bdev,
+   unsigned int type)
+{
+   return false;
+}
+
+static inline bool drmcg_mem_should_evict(struct ttm_buffer_object *tbo,
+   unsigned int type)
+{
+   return true;
+}
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 27809a583bf2..c56cfe74d1a6 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -50,6 +50,8 @@ struct drmcg_device_resource {
 
s64 mem_stats[TTM_PL_PRIV+1];
s64 mem_peaks[TTM_PL_PRIV+1];
+   s64 mem_highs[TTM_PL_PRIV+1];
+   boolmem_pressure[TTM_PL_PRIV+1];
s64 mem_stats_evict;
 
s64 mem_bw_stats_last_update_us;
diff --gi

[PATCH RFC v4 08/16] drm, cgroup: Add peak GEM buffer allocation limit

2019-08-29 Thread Kenny Ho
drm.buffer.peak.default
A read-only flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Default limits on the largest GEM buffer allocation in bytes.

drm.buffer.peak.max
A read-write flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Per device limits on the largest GEM buffer allocation in bytes.
This is a hard limit.  Attempts to allocate beyond the cgroup
limit will result in ENOMEM.  Shorthand understood by memparse
(such as k, m, g) can be used.

Set largest allocation for /dev/dri/card1 to 4MB
echo "226:1 4m" > drm.buffer.peak.max

Change-Id: I0830d56775568e1cf215b56cc892d5e7945e9f25
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst | 18 ++
 include/drm/drm_cgroup.h|  1 +
 include/linux/cgroup_drm.h  |  1 +
 kernel/cgroup/drm.c | 48 +
 4 files changed, 68 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index e8fac2684179..87a195133eaa 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1940,6 +1940,24 @@ DRM Interface Files
Set allocation limit for /dev/dri/card0 to 512MB
echo "226:0 512m" > drm.buffer.total.max
 
+  drm.buffer.peak.default
+   A read-only flat-keyed file which exists on the root cgroup.
+   Each entry is keyed by the drm device's major:minor.
+
+   Default limits on the largest GEM buffer allocation in bytes.
+
+  drm.buffer.peak.max
+   A read-write flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Per device limits on the largest GEM buffer allocation in bytes.
+   This is a hard limit.  Attempts in allocating beyond the cgroup
+   limit will result in ENOMEM.  Shorthand understood by memparse
+   (such as k, m, g) can be used.
+
+   Set largest allocation for /dev/dri/card1 to 4MB
+   echo "226:1 4m" > drm.buffer.peak.max
+
 GEM Buffer Ownership
 
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 49c5d35ff6e1..d61b90beded5 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -14,6 +14,7 @@ struct drmcg_props {
boollimit_enforced;
 
s64 bo_limits_total_allocated_default;
+   s64 bo_limits_peak_allocated_default;
 };
 
 #ifdef CONFIG_CGROUP_DRM
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index eb54e56f20ae..87a2566c9fdd 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -29,6 +29,7 @@ struct drmcg_device_resource {
s64 bo_limits_total_allocated;
 
s64 bo_stats_peak_allocated;
+   s64 bo_limits_peak_allocated;
 
s64 bo_stats_count_allocated;
 };
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 7161fa40e156..2f54bff291e5 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -75,6 +75,9 @@ static inline int init_drmcg_single(struct drmcg *drmcg, 
struct drm_device *dev)
ddr->bo_limits_total_allocated =
dev->drmcg_props.bo_limits_total_allocated_default;
 
+   ddr->bo_limits_peak_allocated =
+   dev->drmcg_props.bo_limits_peak_allocated_default;
+
mutex_unlock(&dev->drmcg_mutex);
return 0;
 }
@@ -157,6 +160,9 @@ static void drmcg_print_limits(struct drmcg_device_resource 
*ddr,
case DRMCG_TYPE_BO_TOTAL:
seq_printf(sf, "%lld\n", ddr->bo_limits_total_allocated);
break;
+   case DRMCG_TYPE_BO_PEAK:
+   seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -171,6 +177,10 @@ static void drmcg_print_default(struct drmcg_props *props,
seq_printf(sf, "%lld\n",
props->bo_limits_total_allocated_default);
break;
+   case DRMCG_TYPE_BO_PEAK:
+   seq_printf(sf, "%lld\n",
+   props->bo_limits_peak_allocated_default);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -327,6 +337,24 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file 
*of, char *buf,
drmcg_value_apply(dm->dev,
>bo_limits_total_allocated, val);
break;
+   case DRMCG_

[PATCH RFC v4 03/16] drm, cgroup: Initialize drmcg properties

2019-08-29 Thread Kenny Ho
drmcg initialization involves allocating a per cgroup, per device data
structure and setting the defaults.  There are two entry points for
drmcg init:

1) When struct drmcg is created via css_alloc, initialization is done
for each device

2) When DRM devices are created after drmcgs are created
  a) Per device drmcg data structure is allocated at the beginning of
  DRM device creation such that drmcg can begin tracking usage
  statistics
  b) At the end of DRM device creation, drmcg_device_update is called in
  case device specific defaults need to be applied.

Entry point #2 usually applies to the root cgroup since it can be
created before DRM devices are available.  The drmcg controller will go
through all existing drm cgroups and initialize them with the new device
accordingly.
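
For illustration, a driver opts in through the new drmcg_custom_init
hook; a minimal sketch (the driver name is made up, and limit_enforced
is a per-device flag introduced later in this series):

static void foo_drmcg_custom_init(struct drm_device *dev,
				  struct drmcg_props *props)
{
	/* apply device specific defaults */
	props->limit_enforced = true;
}

static struct drm_driver foo_kms_driver = {
	.driver_features = DRIVER_GEM,
	.drmcg_custom_init = foo_drmcg_custom_init,
	/* usual fops, name, etc. omitted */
};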

Change-Id: I908ee6975ea0585e4c30eafde4599f87094d8c65
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/drm_drv.c  |   7 +++
 include/drm/drm_cgroup.h   |  27 
 include/drm/drm_device.h   |   7 +++
 include/drm/drm_drv.h  |   9 +++
 include/linux/cgroup_drm.h |  13 
 kernel/cgroup/drm.c| 123 +
 6 files changed, 186 insertions(+)
 create mode 100644 include/drm/drm_cgroup.h

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 000cddabd970..94265eba68ca 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "drm_crtc_internal.h"
 #include "drm_legacy.h"
@@ -672,6 +673,7 @@ int drm_dev_init(struct drm_device *dev,
mutex_init(>filelist_mutex);
mutex_init(>clientlist_mutex);
mutex_init(>master_mutex);
+   mutex_init(&dev->drmcg_mutex);
 
dev->anon_inode = drm_fs_inode_new();
if (IS_ERR(dev->anon_inode)) {
@@ -708,6 +710,7 @@ int drm_dev_init(struct drm_device *dev,
if (ret)
goto err_setunique;
 
+   drmcg_device_early_init(dev);
return 0;
 
 err_setunique:
@@ -722,6 +725,7 @@ int drm_dev_init(struct drm_device *dev,
drm_fs_inode_free(dev->anon_inode);
 err_free:
put_device(dev->dev);
+   mutex_destroy(&dev->drmcg_mutex);
mutex_destroy(&dev->master_mutex);
mutex_destroy(&dev->clientlist_mutex);
mutex_destroy(&dev->filelist_mutex);
@@ -798,6 +802,7 @@ void drm_dev_fini(struct drm_device *dev)
 
put_device(dev->dev);
 
+   mutex_destroy(&dev->drmcg_mutex);
mutex_destroy(&dev->master_mutex);
mutex_destroy(&dev->clientlist_mutex);
mutex_destroy(&dev->filelist_mutex);
@@ -1008,6 +1013,8 @@ int drm_dev_register(struct drm_device *dev, unsigned 
long flags)
 dev->dev ? dev_name(dev->dev) : "virtual device",
 dev->primary->index);
 
+   drmcg_device_update(dev);
+
goto out_unlock;
 
 err_minors:
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
new file mode 100644
index ..bef9f9245924
--- /dev/null
+++ b/include/drm/drm_cgroup.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef __DRM_CGROUP_H__
+#define __DRM_CGROUP_H__
+
+/**
+ * Per DRM device properties for DRM cgroup controller for the purpose
+ * of storing per device defaults
+ */
+struct drmcg_props {
+};
+
+#ifdef CONFIG_CGROUP_DRM
+
+void drmcg_device_update(struct drm_device *device);
+void drmcg_device_early_init(struct drm_device *device);
+#else
+static inline void drmcg_device_update(struct drm_device *device)
+{
+}
+
+static inline void drmcg_device_early_init(struct drm_device *device)
+{
+}
+#endif /* CONFIG_CGROUP_DRM */
+#endif /* __DRM_CGROUP_H__ */
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 7f9ef709b2b6..5d7d779a5083 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 
 struct drm_driver;
 struct drm_minor;
@@ -304,6 +305,12 @@ struct drm_device {
 */
struct drm_fb_helper *fb_helper;
 
+/** \name DRM Cgroup */
+   /*@{ */
+   struct mutex drmcg_mutex;
+   struct drmcg_props drmcg_props;
+   /*@} */
+
/* Everything below here is for legacy driver, never use! */
/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 24f8d054c570..c8a37a08d98d 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -660,6 +660,15 @@ struct drm_driver {
struct drm_device *dev,
uint32_t handle);
 
+   /**
+* @drmcg_custom_init
+*
+* Optional callback used to initialize drm cgroup per device properties
+* such as resource limit defaults.
+*/
+   void (*drmcg_custom_init)(struct drm_device *dev,
+   struct drmcg_props *props);
+
/**

[PATCH RFC v4 01/16] drm: Add drm_minor_for_each

2019-08-29 Thread Kenny Ho
To allow other subsystems to iterate through all stored DRM minors and
act upon them.

Also exposes drm_minor_acquire and drm_minor_release for other subsystems
to handle drm_minor.  The DRM cgroup controller is the initial consumer of
these new features.
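
For illustration, a consumer walks the minors with a callback; a
minimal sketch (the callback name and body are made up):

static int drmcg_visit_minor(int id, void *ptr, void *data)
{
	struct drm_minor *minor = ptr;

	/* act on minor->dev here; return non-zero to stop the walk */
	return 0;
}

static void drmcg_scan_minors(void)
{
	drm_minor_for_each(&drmcg_visit_minor, NULL);
}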

Change-Id: I7c4b67ce6b31f06d1037b03435386ff5b8144ca5
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/drm_drv.c  | 19 +++
 drivers/gpu/drm/drm_internal.h |  4 
 include/drm/drm_drv.h  |  4 
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 862621494a93..000cddabd970 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
 
return minor;
 }
+EXPORT_SYMBOL(drm_minor_acquire);
 
 void drm_minor_release(struct drm_minor *minor)
 {
drm_dev_put(minor->dev);
 }
+EXPORT_SYMBOL(drm_minor_release);
 
 /**
  * DOC: driver instance overview
@@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const 
char *name)
 }
 EXPORT_SYMBOL(drm_dev_set_unique);
 
+/**
+ * drm_minor_for_each - Iterate through all stored DRM minors
+ * @fn: Function to be called for each pointer.
+ * @data: Data passed to callback function.
+ *
+ * The callback function will be called for each @drm_minor entry, passing
+ * the minor, the entry and @data.
+ *
+ * If @fn returns anything other than %0, the iteration stops and that
+ * value is returned from this function.
+ */
+int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
+{
+   return idr_for_each(&drm_minors_idr, fn, data);
+}
+EXPORT_SYMBOL(drm_minor_for_each);
+
 /*
  * DRM Core
  * The DRM core module initializes all global DRM objects and makes them
diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index e19ac7ca602d..6bfad76f8e78 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct 
drm_prime_file_private *prime_fpriv);
 void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private 
*prime_fpriv,
struct dma_buf *dma_buf);
 
-/* drm_drv.c */
-struct drm_minor *drm_minor_acquire(unsigned int minor_id);
-void drm_minor_release(struct drm_minor *minor);
-
 /* drm_vblank.c */
 void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int pipe);
 void drm_vblank_cleanup(struct drm_device *dev);
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 68ca736c548d..24f8d054c570 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct 
drm_device *dev)
 
 int drm_dev_set_unique(struct drm_device *dev, const char *name);
 
+int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
+
+struct drm_minor *drm_minor_acquire(unsigned int minor_id);
+void drm_minor_release(struct drm_minor *minor);
 
 #endif
-- 
2.22.0


[PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-08-29 Thread Kenny Ho
 to artificially limit DRM
resources available to the applications.


Challenges

While there is common infrastructure in DRM that is shared across many vendors
(the scheduler [4] for example), there are also aspects of DRM that are vendor
specific.  To accommodate this, we borrowed the mechanism used by cgroups to
handle different kinds of cgroup controllers.

Resources for DRM are also often device (GPU) specific instead of system
specific, and a system may contain more than one GPU.  For this, we borrowed some
of the ideas from the RDMA cgroup controller.

Approach
===
To experiment with the idea of a DRM cgroup, we would like to start with basic
accounting and statistics, then continue to iterate and add regulating
mechanisms into the driver.

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
[2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
[3] https://www.spinics.net/lists/cgroups/msg20720.html
[4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
[5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
[6] 
https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
[7] https://github.com/RadeonOpenCompute/k8s-device-plugin
[8] https://github.com/kubernetes/kubernetes/issues/52757

Kenny Ho (16):
  drm: Add drm_minor_for_each
  cgroup: Introduce cgroup for drm subsystem
  drm, cgroup: Initialize drmcg properties
  drm, cgroup: Add total GEM buffer allocation stats
  drm, cgroup: Add peak GEM buffer allocation stats
  drm, cgroup: Add GEM buffer allocation count stats
  drm, cgroup: Add total GEM buffer allocation limit
  drm, cgroup: Add peak GEM buffer allocation limit
  drm, cgroup: Add TTM buffer allocation stats
  drm, cgroup: Add TTM buffer peak usage stats
  drm, cgroup: Add per cgroup bw measure and control
  drm, cgroup: Add soft VRAM limit
  drm, cgroup: Allow more aggressive memory reclaim
  drm, cgroup: Introduce lgpu as DRM cgroup resource
  drm, cgroup: add update trigger after limit change
  drm/amdgpu: Integrate with DRM cgroup

 Documentation/admin-guide/cgroup-v2.rst   |  163 +-
 Documentation/cgroup-v1/drm.rst   |1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   29 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |6 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |3 +
 .../amd/amdkfd/kfd_process_queue_manager.c|  140 ++
 drivers/gpu/drm/drm_drv.c |   26 +
 drivers/gpu/drm/drm_gem.c |   16 +-
 drivers/gpu/drm/drm_internal.h|4 -
 drivers/gpu/drm/ttm/ttm_bo.c  |   93 ++
 drivers/gpu/drm/ttm/ttm_bo_util.c |4 +
 include/drm/drm_cgroup.h  |  122 ++
 include/drm/drm_device.h  |7 +
 include/drm/drm_drv.h |   23 +
 include/drm/drm_gem.h |   13 +-
 include/drm/ttm/ttm_bo_api.h  |2 +
 include/drm/ttm/ttm_bo_driver.h   |   10 +
 include/linux/cgroup_drm.h|  151 ++
 include/linux/cgroup_subsys.h |4 +
 init/Kconfig  |5 +
 kernel/cgroup/Makefile|1 +
 kernel/cgroup/drm.c   | 1367 +
 25 files changed, 2193 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

-- 
2.22.0


[PATCH RFC v4 04/16] drm, cgroup: Add total GEM buffer allocation stats

2019-08-29 Thread Kenny Ho
The drm resources being measured here are GEM buffer objects.  User
applications allocate and free these buffers.  In addition, a process
can allocate a buffer and share it with another process.  The consumer
of a shared buffer can also outlive the allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of the
buffer is deemed to be the cgroup to which the allocating process
belongs.  There is one set of cgroup stats per drm device.  Each allocation
is charged to the owning cgroup as well as all its ancestors.
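
The ancestor charging amounts to a walk up the cgroup hierarchy;
roughly (a simplified sketch of the charge path -- the uncharge path in
a later patch mirrors it):

	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg))
		drmcg->dev_resources[devIdx]->bo_stats_total_allocated
			+= (s64)size;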

Similar to the memory cgroup, migrating a process to a different cgroup
does not move the GEM buffer usage that the process accumulated while in
the previous cgroup over to the new cgroup.

The following is an example to illustrate some of the operations.  Given
the following cgroup hierarchy (The letters are cgroup names with R
being the root cgroup.  The numbers in brackets are processes.  The
processes are placed with cgroup's 'No Internal Process Constraint' in
mind, so no process is placed in cgroup B.)

R (4, 5) -- A (6)
 \
  B  C (7,8)
   \
D (9)

Here is a list of operations and the associated effect on the sizes
tracked by the cgroups (for simplicity, each buffer is 1 unit in size.)

==  ==  ==  ==  ==  ===
R   A   B   C   D   Ops
==  ==  ==  ==  ==  ===
1   0   0   0   0   4 allocated a buffer
1   0   0   0   0   4 shared a buffer with 5
1   0   0   0   0   4 shared a buffer with 9
2   0   1   0   1   9 allocated a buffer
3   0   2   1   1   7 allocated a buffer
3   0   2   1   1   7 shared a buffer with 8
3   0   2   1   1   7 sharing with 9
3   0   2   1   1   7 release a buffer
3   0   2   1   1   7 migrate to cgroup D
3   0   2   1   1   9 release a buffer from 7
2   0   1   0   1   8 release a buffer from 7 (last ref to shared buf)
==  ==  ==  ==  ==  ===

drm.buffer.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Total GEM buffer allocation in bytes.

Change-Id: I9d662ec50d64bb40a37dbf47f018b2f3a1c033ad
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  50 +-
 drivers/gpu/drm/drm_gem.c   |   9 ++
 include/drm/drm_cgroup.h|  16 +++
 include/drm/drm_gem.h   |  11 +++
 include/linux/cgroup_drm.h  |   6 ++
 kernel/cgroup/drm.c | 126 
 6 files changed, 217 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 2936423a3fd5..0e29d136e2f9 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -63,6 +63,7 @@ v1 is available under Documentation/cgroup-v1/.
5-7-1. RDMA Interface Files
  5-8. DRM
5-8-1. DRM Interface Files
+   5-8-2. GEM Buffer Ownership
  5-9. Misc
5-9-1. perf_event
  5-N. Non-normative information
@@ -1900,7 +1901,54 @@ of DRM (Direct Rendering Manager) and GPU-related 
resources.
 DRM Interface Files
 
 
-TODO
+  drm.buffer.stats
+   A read-only flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Total GEM buffer allocation in bytes.
+
+GEM Buffer Ownership
+
+
+For the purpose of cgroup accounting and limiting, ownership of the
+buffer is deemed to be the cgroup for which the allocating process
+belongs to.  There is one cgroup stats per drm device.  Each allocation
+is charged to the owning cgroup as well as all its ancestors.
+
+Similar to the memory cgroup, migrating a process to a different cgroup
+does not move the GEM buffer usages that the process started while in
+previous cgroup, to the new cgroup.
+
+The following is an example to illustrate some of the operations.  Given
+the following cgroup hierarchy (The letters are cgroup names with R
+being the root cgroup.  The numbers in brackets are processes.  The
+processes are placed with cgroup's 'No Internal Process Constraint' in
+mind, so no process is placed in cgroup B.)
+
+R (4, 5) -- A (6)
+ \
+  B  C (7,8)
+   \
+D (9)
+
+Here is a list of operation and the associated effect on the size
+track by the cgroups (for simplicity, each buffer is 1 unit in size.)
+
+==  ==  ==  ==  ==  ===
+R   A   B   C   D   Ops
+==  ==  ==  ==  ==  ===
+1   0   0   0   0   4 allocated a buffer
+1   0   0   0   0   4 shared a buffer with 5
+1   0   0   0   0   4 shared a buffer with 9
+2   0   1   0   1   9 allocated a buffer
+3   0   2   1   1   7 allocated a buffer
+3   0   2   1   1   7 shared a buffer with 8
+3   0   2   1   1   7 sharing with 9
+3   0   2   1   1

[PATCH RFC v4 06/16] drm, cgroup: Add GEM buffer allocation count stats

2019-08-29 Thread Kenny Ho
drm.buffer.count.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Total number of GEM buffer allocated.

Change-Id: Id3e1809d5fee8562e47a7d2b961688956d844ec6
Signed-off-by: Kenny Ho 
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++
 include/linux/cgroup_drm.h  |  3 +++
 kernel/cgroup/drm.c | 22 +++---
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 8588a0ffc69d..4dc72339a9b6 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1913,6 +1913,12 @@ DRM Interface Files
 
Largest (high water mark) GEM buffer allocated in bytes.
 
+  drm.buffer.count.stats
+   A read-only flat-keyed file which exists on all cgroups.  Each
+   entry is keyed by the drm device's major:minor.
+
+   Total number of GEM buffer allocated.
+
 GEM Buffer Ownership
 
 
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 974d390cfa4f..972f7aa975b5 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -16,6 +16,7 @@
 enum drmcg_res_type {
DRMCG_TYPE_BO_TOTAL,
DRMCG_TYPE_BO_PEAK,
+   DRMCG_TYPE_BO_COUNT,
__DRMCG_TYPE_LAST,
 };
 
@@ -27,6 +28,8 @@ struct drmcg_device_resource {
s64 bo_stats_total_allocated;
 
s64 bo_stats_peak_allocated;
+
+   s64 bo_stats_count_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 0bf5b95668c4..85e46ece4a82 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -132,6 +132,9 @@ static void drmcg_print_stats(struct drmcg_device_resource 
*ddr,
case DRMCG_TYPE_BO_PEAK:
seq_printf(sf, "%lld\n", ddr->bo_stats_peak_allocated);
break;
+   case DRMCG_TYPE_BO_COUNT:
+   seq_printf(sf, "%lld\n", ddr->bo_stats_count_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -186,6 +189,12 @@ struct cftype files[] = {
.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
DRMCG_FTYPE_STATS),
},
+   {
+   .name = "buffer.count.stats",
+   .seq_show = drmcg_seq_show,
+   .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
+   DRMCG_FTYPE_STATS),
+   },
{ } /* terminate */
 };
 
@@ -272,6 +281,8 @@ void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct 
drm_device *dev,
 
if (ddr->bo_stats_peak_allocated < (s64)size)
ddr->bo_stats_peak_allocated = (s64)size;
+
+   ddr->bo_stats_count_allocated++;
}
mutex_unlock(&dev->drmcg_mutex);
 }
@@ -289,15 +300,20 @@ EXPORT_SYMBOL(drmcg_chg_bo_alloc);
 void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
size_t size)
 {
+   struct drmcg_device_resource *ddr;
int devIdx = dev->primary->index;
 
if (drmcg == NULL)
return;
 
mutex_lock(&dev->drmcg_mutex);
-   for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg))
-   drmcg->dev_resources[devIdx]->bo_stats_total_allocated
-   -= (s64)size;
+   for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+   ddr = drmcg->dev_resources[devIdx];
+
+   ddr->bo_stats_total_allocated -= (s64)size;
+
+   ddr->bo_stats_count_allocated--;
+   }
mutex_unlock(&dev->drmcg_mutex);
 }
 EXPORT_SYMBOL(drmcg_unchg_bo_alloc);
-- 
2.22.0


Re: [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem

2019-06-29 Thread Kenny Ho
On Thu, Jun 27, 2019 at 3:24 AM Daniel Vetter  wrote:
> Another question I have: What about HMM? With the device memory zone
> the core mm will be a lot more involved in managing that, but I also
> expect that we'll have classic buffer-based management for a long time
> still. So these need to work together, and I fear slightly that we'll
> have memcg and drmcg fighting over the same pieces a bit perhaps?
>
> Adding Jerome, maybe he has some thoughts on this.

I just did a bit of digging and this looks like the current behaviour:
https://www.kernel.org/doc/html/v5.1/vm/hmm.html#memory-cgroup-memcg-and-rss-accounting

"For now device memory is accounted as any regular page in rss
counters (either anonymous if device page is used for anonymous, file
if device page is used for file backed page or shmem if device page is
used for shared memory). This is a deliberate choice to keep existing
applications, that might start using device memory without knowing
about it, running unimpacted.

A drawback is that the OOM killer might kill an application using a
lot of device memory and not a lot of regular system memory and thus
not freeing much system memory. We want to gather more real world
experience on how applications and system react under memory pressure
in the presence of device memory before deciding to account device
memory differently."

Regards,
Kenny

Re: [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control

2019-06-28 Thread Kenny Ho
On Thu, Jun 27, 2019 at 2:11 AM Daniel Vetter  wrote:
> I feel like a better approach would by to add a cgroup for the various
> engines on the gpu, and then also account all the sdma (or whatever the
> name of the amd copy engines is again) usage by ttm_bo moves to the right
> cgroup.  I think that's a more meaningful limitation. For direct thrashing
> control I think there's both not enough information available in the
> kernel (you'd need some performance counters to watch how much bandwidth
> userspace batches/CS are wasting), and I don't think the ttm eviction
> logic is ready to step over all the priority inversion issues this will
> bring up. Managing sdma usage otoh will be a lot more straightforward (but
> still has all the priority inversion problems, but in the scheduler that
> might be easier to fix perhaps with the explicit dependency graph - in the
> i915 scheduler we already have priority boosting afaiui).
My concern with hooking into the engine / lower level is that the
engine may not be process/cgroup aware.  So the bandwidth tracking is
per device.  I am also wondering if this is potentially a case
of perfect getting in the way of good.  While ttm_bo_handle_move_mem
may not track everything, it is still a key function for a lot of the
memory operations.  Also, if the programming model is designed to
bypass the kernel, then I am not sure there is anything the kernel
can do.  (Things like kernel-bypass network stacks come to mind.)  All
that said, I will certainly dig deeper into the topic.

Regards,
Kenny

Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit

2019-06-28 Thread Kenny Ho
On Thu, Jun 27, 2019 at 5:24 PM Daniel Vetter  wrote:
> On Thu, Jun 27, 2019 at 02:42:43PM -0400, Kenny Ho wrote:
> > Um... I am going to get a bit philosophical here and suggest that the
> > idea of sharing (especially uncontrolled sharing) is inherently at odd
> > with containment.  It's like, if everybody is special, no one is
> > special.  Perhaps an alternative is to make this configurable so that
> > people can allow sharing knowing the caveat?  And just to be clear,
> > the current solution allows for sharing, even between cgroup.
>
> The thing is, why shouldn't we just allow it (with some documented
> caveat)?
>
> I mean if all people do is share it as your current patches allow, then
> there's nothing funny going on (at least if we go with just leaking the
> allocations). If we allow additional sharing, then that's a plus.
Um... perhaps I was being overly conservative :).  So let me
illustrate with an example to add more clarity and get more comments
on it.

Let's say we have the following cgroup hierarchy (the letters are
cgroups with R being the root cgroup.  The numbers in brackets are
processes.  The processes are placed with the 'No Internal Process
Constraint' in mind.)
R (4, 5) -- A (6)
  \
B  C (7,8)
 \
   D (9)

Here is a list of operations and the associated effect on the sizes
tracked by the cgroups (for simplicity, each buffer is 1 unit in size.)
With current implementation (charge on buffer creation with
restriction on sharing.)
R   A   B   C   D   |Ops

1   0   0   0   0   |4 allocated a buffer
1   0   0   0   0   |4 shared a buffer with 5
1   0   0   0   0   |4 shared a buffer with 9
2   0   1   0   1   |9 allocated a buffer
3   0   2   1   1   |7 allocated a buffer
3   0   2   1   1   |7 shared a buffer with 8
3   0   2   1   1   |7 sharing with 9 (not allowed)
3   0   2   1   1   |7 sharing with 4 (not allowed)
3   0   2   1   1   |7 release a buffer
2   0   1   0   1   |8 release a buffer from 7

The suggestion as I understand it (charge per buffer reference with
unrestricted sharing.)
R   A   B   C   D   |Ops

1   0   0   0   0   |4 allocated a buffer
2   0   0   0   0   |4 shared a buffer with 5
3   0   0   0   1   |4 shared a buffer with 9
4   0   1   0   2   |9 allocated a buffer
5   0   2   1   1   |7 allocated a buffer
6   0   3   2   1   |7 shared a buffer with 8
7   0   4   2   2   |7 sharing with 9
8   0   4   2   2   |7 sharing with 4
7   0   3   1   2   |7 release a buffer
6   0   2   0   2   |8 release a buffer from 7

Is this a correct understanding of the suggestion?

Regards,
Kenny

Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats

2019-06-27 Thread Kenny Ho
On Thu, Jun 27, 2019 at 2:01 AM Daniel Vetter  wrote:
>
> btw reminds me: I guess it would be good to have a per-type .total
> read-only exposed, so that userspace has an idea of how much there is?
> ttm is trying to be agnostic to the allocator that's used to manage a
> memory type/resource, so doesn't even know that. But I think something we
> need to expose to admins, otherwise they can't meaningfully set limits.

I don't think I understand this bit; do you mean the total across multiple
GPUs of the same mem type?  Or do you mean the total available per GPU
(or something else?)

Regards,
Kenny

Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit

2019-06-27 Thread Kenny Ho
On Thu, Jun 27, 2019 at 1:43 AM Daniel Vetter  wrote:
>
> On Wed, Jun 26, 2019 at 06:41:32PM -0400, Kenny Ho wrote:
> > So without the sharing restriction and some kind of ownership
> > structure, we will have to migrate/change the owner of the buffer when
> > the cgroup that created the buffer die before the receiving cgroup(s)
> > and I am not sure how to do that properly at the moment.  1) Should
> > each cgroup keep track of all the buffers that belongs to it and
> > migrate?  (Is that efficient?)  2) which cgroup should be the new
> > owner (and therefore have the limit applied?)  Having the creator
> > being the owner is kind of natural, but let say the buffer is shared
> > with 5 other cgroups, which of these 5 cgroups should be the new owner
> > of the buffer?
>
> Different answers:
>
> - Do we care if we leak bos like this in a cgroup, if the cgroup
>   disappears before all the bo are cleaned up?
>
> - Just charge the bo to each cgroup it's in? Will be quite a bit more
>   tracking needed to get that done ...
That seems to be the approach memcg takes, but as shown by the lwn
link you sent me from the last rfc (talk from Roman Gushchin), that
approach is not problem-free either.  And wouldn't this approach
disconnect resource management from the underlying resource one would
like to control?  For example, if you have 5 MB of memory, you can
have 5 users using 1 MB each.  But in the charge-everybody approach, a
1 MB usage shared 4 times will make it look like 5 MB is used.  So the
resource being controlled is no longer 'real' since the amount of
resource you have is now dynamic and depends on the amount of sharing
one does.

> - Also, there's the legacy way of sharing a bo, with the FLINK and
>   GEM_OPEN ioctls. We need to plug these holes too.
>
> Just feels like your current solution is technically well-justified, but
> it completely defeats the point of cgroups/containers and buffer sharing
> ...
Um... I am going to get a bit philosophical here and suggest that the
idea of sharing (especially uncontrolled sharing) is inherently at odds
with containment.  It's like, if everybody is special, no one is
special.  Perhaps an alternative is to make this configurable so that
people can allow sharing knowing the caveat?  And just to be clear,
the current solution allows for sharing, even between cgroup.

Regards,
Kenny

Re: [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control

2019-06-26 Thread Kenny Ho
On Wed, Jun 26, 2019 at 12:25 PM Daniel Vetter  wrote:
>
> On Wed, Jun 26, 2019 at 11:05:20AM -0400, Kenny Ho wrote:
> > The bandwidth is measured by keeping track of the amount of bytes moved
> > by ttm within a time period.  We defined two type of bandwidth: burst
> > and average.  Average bandwidth is calculated by dividing the total
> > amount of bytes moved within a cgroup by the lifetime of the cgroup.
> > Burst bandwidth is similar except that the byte and time measurement is
> > reset after a user configurable period.
>
> So I'm not too sure exposing this is a great idea, at least depending upon
> what you're trying to do with it. There's a few concerns here:
>
> - I think bo movement stats might be useful, but they're not telling you
>   everything. Applications can also copy data themselves and put buffers
>   where they want them, especially with more explicit apis like vk.
>
> - which kind of moves are we talking about here? Eviction related bo moves
>   seem not counted here, and if you have lots of gpus with funny
>   interconnects you might also get other kinds of moves, not just system
>   ram <-> vram.
Eviction moves are counted, but I think I placed the delay in the wrong
place (the tracking of bytes moved is in the previous patch in
ttm_bo_handle_move_mem, which is common to all moves as far as I can
tell.)

> - What happens if we slow down, but someone else needs to evict our
>   buffers/move them (ttm is atm not great at this, but Christian König is
>   working on patches). I think there's lots of priority inversion
>   potential here.
>
> - If the goal is to avoid thrashing the interconnects, then this isn't the
>   full picture by far - apps can use copy engines and explicit placement,
>   again that's how vulkan at least is supposed to work.
>
> I guess these all boil down to: What do you want to achieve here? The
> commit message doesn't explain the intended use-case of this.
Thrashing prevention is the intent.  I am not familiar with Vulkan so
I will have to get back to you on that.  I don't know how those
explicit placements translate into the kernel.  At this stage, I think
it's still worthwhile to have this as a resource even if some
applications bypass the kernel.  I certainly welcome more feedback on
this topic.

Regards,
Kenny

Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats

2019-06-26 Thread Kenny Ho
On Wed, Jun 26, 2019 at 12:12 PM Daniel Vetter  wrote:
>
> On Wed, Jun 26, 2019 at 11:05:18AM -0400, Kenny Ho wrote:
> > drm.memory.stats
> > A read-only nested-keyed file which exists on all cgroups.
> > Each entry is keyed by the drm device's major:minor.  The
> > following nested keys are defined.
> >
> >   == =
> >   system Host/system memory
>
> Shouldn't that be covered by gem bo stats already? Also, system memory is
> definitely something a lot of non-ttm drivers want to be able to track, so
> that needs to be separate from ttm.
The gem bo stats cover all of these types.  I treat the gem stats
more as the front end and a hard limit, and this set of stats as the
backing store, which can be of various types.  How do non-ttm drivers
identify the various memory types?

> >   tt Host memory used by the drm device (GTT/GART)
> >   vram   Video RAM used by the drm device
> >   priv   Other drm device, vendor specific memory
>
> So what's "priv". In general I think we need some way to register the
> different kinds of memory, e.g. stuff not in your list:
>
> - multiple kinds of vram (like numa-style gpus)
> - cma (for all those non-ttm drivers that's a big one, it's like system
>   memory but also totally different)
> - any carveouts and stuff
privs are vendor specific, which is why I have truncated it.  For
example, AMD has AMDGPU_PL_GDS, GWS, OA
https://elixir.bootlin.com/linux/v5.2-rc6/source/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h#L30

Since we are using the keyed file type, we should be able to support
vendor specific memory types, but I am not sure if this is acceptable
to cgroup upstream.  This is why I stuck to the 3 memory types that are
common across all ttm drivers.
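
(Hypothetically, if vendor specific keys were acceptable there, an AMD
device could then report something like "226:1 system=0 tt=9035776
vram=17768448 gds=... gws=... oa=..." instead of folding those into
priv.)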

> I think with all the ttm refactoring going on I think we need to de-ttm
> the interface functions here a bit. With Gerd Hoffmans series you can just
> use a gem_bo pointer here, so what's left to do is have some extracted
> structure for tracking memory types. I think Brian Welty has some ideas
> for this, even in patch form. Would be good to keep him on cc at least for
> the next version. We'd need to explicitly hand in the ttm_mem_reg (or
> whatever the specific thing is going to be).

I assume Gerd Hoffman's series you are referring to is this one?
https://www.spinics.net/lists/dri-devel/msg215056.html

I can certainly keep an eye out for Gerd's refactoring while
refactoring other parts of this RFC.

I have added Brian and Gerd to the thread for awareness.

Regards,
Kenny

Re: [RFC PATCH v3 11/11] drm, cgroup: Allow more aggressive memory reclaim

2019-06-26 Thread Kenny Ho
Ok.  I am not too familiar with shrinkers but I will dig into it.  Just
so that I am looking at the right things, you are referring to
things like struct shrinker and struct shrink_control?
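
I.e. is the idea roughly something along these lines (hypothetical
sketch, names made up by me)?

	/* explicit, cgroup-aware reclaim hook instead of a plain shrinker */
	unsigned long (*drmcg_reclaim)(struct drmcgrp *drmcgrp,
				       struct ttm_mem_type_manager *man,
				       unsigned long nr_to_reclaim);
	/* returns how much was actually reclaimed for that cgroup and
	 * memory region */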

Regards,
Kenny

On Wed, Jun 26, 2019 at 12:44 PM Daniel Vetter  wrote:
>
> On Wed, Jun 26, 2019 at 11:05:22AM -0400, Kenny Ho wrote:
> > Allow DRM TTM memory manager to register a work_struct, such that, when
> > a drmcgrp is under memory pressure, memory reclaiming can be triggered
> > immediately.
> >
> > Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
> > Signed-off-by: Kenny Ho 
> > ---
> >  drivers/gpu/drm/ttm/ttm_bo.c| 47 +
> >  include/drm/drm_cgroup.h| 14 ++
> >  include/drm/ttm/ttm_bo_driver.h |  2 ++
> >  kernel/cgroup/drm.c | 33 +++
> >  4 files changed, 96 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > index 79c530f4a198..5fc3bc5bd4c5 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -1509,6 +1509,44 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, 
> > unsigned mem_type)
> >  }
> >  EXPORT_SYMBOL(ttm_bo_evict_mm);
> >
> > +static void ttm_bo_reclaim_wq(struct work_struct *work)
> > +{
>
> I think a design a bit more inspired by memcg aware core shrinkers would
> be nice, i.e. explicitly passing:
> - which drm_cgroup needs to be shrunk
> - which ttm_mem_reg (well the fancy new abstracted out stuff for tracking
>   special gpu memory resources like tt or vram or whatever)
> - how much it needs to be shrunk
>
> I think with that a lot more the book-keeping could be pushed into the
> drm_cgroup code, and the callback just needs to actually shrink enough as
> requested.
> -Daniel
>
> > + struct ttm_operation_ctx ctx = {
> > + .interruptible = false,
> > + .no_wait_gpu = false,
> > + .flags = TTM_OPT_FLAG_FORCE_ALLOC
> > + };
> > + struct ttm_mem_type_manager *man =
> > + container_of(work, struct ttm_mem_type_manager, reclaim_wq);
> > + struct ttm_bo_device *bdev = man->bdev;
> > + struct dma_fence *fence;
> > + int mem_type;
> > + int ret;
> > +
> > + for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
> > + if (&bdev->man[mem_type] == man)
> > + break;
> > +
> > + BUG_ON(mem_type >= TTM_NUM_MEM_TYPES);
> > +
> > + if (!drmcgrp_mem_pressure_scan(bdev, mem_type))
> > + return;
> > +
> > + ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx);
> > + if (ret)
> > + return;
> > +
> > + spin_lock(&man->move_lock);
> > + fence = dma_fence_get(man->move);
> > + spin_unlock(&man->move_lock);
> > +
> > + if (fence) {
> > + ret = dma_fence_wait(fence, false);
> > + dma_fence_put(fence);
> > + }
> > +
> > +}
> > +
> >  int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
> >   unsigned long p_size)
> >  {
> > @@ -1543,6 +1581,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, 
> > unsigned type,
> >   INIT_LIST_HEAD(&man->lru[i]);
> >   man->move = NULL;
> >
> > + pr_err("drmcgrp %p type %d\n", bdev->ddev, type);
> > +
> > + if (type <= TTM_PL_VRAM) {
> > + INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
> > + drmcgrp_register_device_mm(bdev->ddev, type, &man->reclaim_wq);
> > + }
> > +
> >   return 0;
> >  }
> >  EXPORT_SYMBOL(ttm_bo_init_mm);
> > @@ -1620,6 +1665,8 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
> >   man = &bdev->man[i];
> >   if (man->has_type) {
> >   man->use_type = false;
> > + drmcgrp_unregister_device_mm(bdev->ddev, i);
> > + cancel_work_sync(&man->reclaim_wq);
> >   if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, i)) 
> > {
> >   ret = -EBUSY;
> >   pr_err("DRM memory manager type %d is not 
> > clean\n",
> > diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> > index 360c1e6c809f..134d6e5475f3 100644
> > --- a/include/drm/drm_cgroup.h
> > +++ b/include/drm/drm_cgroup.h
> > @@ -5,6 +5,7 @@
>

Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit

2019-06-26 Thread Kenny Ho
On Wed, Jun 26, 2019 at 5:41 PM Daniel Vetter  wrote:
> On Wed, Jun 26, 2019 at 05:27:48PM -0400, Kenny Ho wrote:
> > On Wed, Jun 26, 2019 at 12:05 PM Daniel Vetter  wrote:
> > > So what happens when you start a lot of threads all at the same time,
> > > allocating gem bo? Also would be nice if we could roll out at least the
> > > accounting part of this cgroup to all GEM drivers.
> >
> > When there is a large number of allocation, the allocation will be
> > checked in sequence within a device (since I used a per device mutex
> > in the check.)  Are you suggesting the overhead here is significant
> > enough to be a bottleneck?  The accounting part should be available to
> > all GEM drivers (unless I missed something) since the chg and unchg
> > function is called via the generic drm_gem_private_object_init and
> > drm_gem_object_release.
>
> thread 1: checks limits, still under the total
>
> thread 2: checks limits, still under the total
>
> thread 1: allocates, still under
>
> thread 2: allocates, now over the limit
>
> I think the check and chg need to be one step, or this wont work. Or I'm
> missing something somewhere.

Ok, I see what you are saying.
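
Something like a combined try-charge should close that window (rough
sketch only; the ancestor walk and the peak/count updates are omitted):

	/* check the limit and apply the charge under the same per-device
	 * mutex so two concurrent allocations cannot both pass the check */
	bool drmcgrp_try_chg_bo_alloc(struct drmcgrp *drmcgrp,
				      struct drm_device *dev, size_t size)
	{
		struct drmcgrp_device_resource *ddr;
		int devIdx = dev->primary->index;
		bool ok = false;

		mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
		ddr = drmcgrp->dev_resources[devIdx];
		if (ddr->bo_stats_total_allocated + (s64)size <=
				ddr->bo_limits_total_allocated) {
			ddr->bo_stats_total_allocated += (s64)size;
			ok = true;
		}
		mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);

		return ok;
	}

That would effectively collapse drmcgrp_bo_can_allocate and
drmcgrp_chg_bo_alloc into a single call.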

> Wrt rolling out the accounting for all drivers: Since you also roll out
> enforcement in this patch I'm not sure whether the accounting part is
> fully stand-alone. And as discussed a bit on an earlier patch, I think for
> DRIVER_GEM we should set up the accounting cgroup automatically.
I think I should be able to split the commit and restructure things a bit.

> > > What's the underlying technical reason for not allowing sharing across
> > > cgroups?
> > To be clear, sharing across cgroup is allowed, the buffer just needs
> > to be allocated by a process that is parent to the cgroup.  So in the
> > case of xorg allocating buffer for client, the xorg would be in the
> > root cgroup and the buffer can be passed around by different clients
> > (in root or other cgroup.)  The idea here is to establish some form of
> > ownership, otherwise there wouldn't be a way to account for or limit
> > the usage.
>
> But why? What's the problem if I allocate something and then hand it to
> someone else. E.g. one popular use of cgroups is to isolate clients, so
> maybe you'd do a cgroup + namespace for each X11 client (ok wayland, with
> X11 this is probably pointless).
>
> But with your current limitation those clients can't pass buffers to the
> compositor anymore, making cgroups useless. Your example here only works
> if Xorg is in the root and allocates all the buffers. That's not even true
> for DRI3 anymore.
>
> So pretty serious limitation on cgroups, and I'm not really understanding
> why we need this. I think if we want to prevent buffer sharing, what we
> need are some selinux hooks and stuff so you can prevent an import/access
> by someone who's not allowed to touch a buffer. But that kind of access
> right management should be separate from resource control imo.
So without the sharing restriction and some kind of ownership
structure, we will have to migrate/change the owner of the buffer when
the cgroup that created the buffer dies before the receiving cgroup(s),
and I am not sure how to do that properly at the moment.  1) Should
each cgroup keep track of all the buffers that belong to it and
migrate them?  (Is that efficient?)  2) Which cgroup should be the new
owner (and therefore have the limit applied)?  Having the creator
be the owner is kind of natural, but say the buffer is shared
with 5 other cgroups; which of these 5 cgroups should be the new owner
of the buffer?

Regards,
Kenny

Re: [RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices

2019-06-26 Thread Kenny Ho
On Wed, Jun 26, 2019 at 5:04 PM Daniel Vetter  wrote:
> On Wed, Jun 26, 2019 at 10:37 PM Kenny Ho  wrote:
> > (sending again, I keep missing the reply-all in gmail.)
> You can make it the default somewhere in the gmail options.
Um... interesting, my option was actually not set (neither reply nor reply-all.)

> > On Wed, Jun 26, 2019 at 11:56 AM Daniel Vetter  wrote:
> > >
> > > Why the separate, explicit registration step? I think a simpler design for
> > > drivers would be that we set up cgroups if there's anything to be
> > > controlled, and then for GEM drivers the basic GEM stuff would be set up
> > > automically (there's really no reason not to I think).
> >
> > Is this what you mean with the comment about drm_dev_register below?
> > I think I understand what you are saying but not super clear.  Are you
> > suggesting the use of driver feature bits (drm_core_check_feature,
> > etc.) similar to the way Brian Welty did in his proposal in May?
>
> Also not exactly a fan of driver feature bits tbh. What I had in mind was:
>
> - For stuff like the GEM accounting which we can do for all drivers
> easily (we can't do the enforcment, that needs a few changes), just
> roll it out for everyone. I.e. if you enable the DRMCG Kconfig, all
> DRIVER_GEM would get that basic gem cgroup accounting.
>
> - for other bits the driver just registers certain things, like "I can
> enforce gem limits" or "I have gpu memory regions vram, tt, and system
> and can enforce them" in their normal driver setup. Then at
> drm_dev_register time we register all these additional cgroups, like
> we today register all the other interafaces and pieces of a drm_device
> (drm_minor, drm_connectors, debugfs files, sysfs stuff, all these
> things).
>
> Since the concepts are still a bit in flux, let's take an example from
> the modeset side:
> - driver call drm_connector_init() to create connector object
> - drm_dev_register() also sets up all the public interfaces for that
> connector (debugfs, sysfs, ...)
>
> I think a similar setup would be good for cgroups here, you just
> register your special ttm_mem_reg or whatever, and the magic happens
> automatically.

Ok, I will look into those (I am not too familiar with them at this point.)

> > > I have no idea, but is this guaranteed to get them all?
> >
> > I believe so, base on my understanding about
> > css_for_each_descendant_pre and how I am starting from the root
> > cgroup.  Hopefully I didn't miss anything.
>
> Well it's rcu, so I expect it'll race with concurrent
> addition/removal. And the kerneldoc has some complicated sounding
> comments about how to synchronize that with some locks that I don't
> fully understand, but I think you're also not having any additional
> locking so not sure this all works correctly ...
>
> Do we still need the init_dmcgrp stuff if we'd just embedd? That would
> probably be the simplest way to solve this all :-)

I will need to dig into it a bit more to know for sure.  I think I
still need the init_drmcgrp stuff.  I implemented it like this because
the cgroup subsystem appears to be initialized before the drm subsystem,
so the root cgroup does not know about any drm devices and the per device
default limits are not set.  In theory, I should only need to set the
root cgroup (so I don't need to use css_for_each_descendant_pre, which
requires the rcu_lock.)  But I am not 100% confident there won't be
any additional cgroups being added to the hierarchy between cgroup
subsystem init and drm subsystem init.

Alternatively I can protect it with an additional mutex but I am not
sure if that's needed.

Regards,
Kenny

Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit

2019-06-26 Thread Kenny Ho
On Wed, Jun 26, 2019 at 12:05 PM Daniel Vetter  wrote:
>
> > drm.buffer.default
> > A read-only flat-keyed file which exists on the root cgroup.
> > Each entry is keyed by the drm device's major:minor.
> >
> > Default limits on the total GEM buffer allocation in bytes.
>
> Don't we need a "0 means no limit" semantics here?

I believe the convention is to use the 'max' keyword.
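
E.g. (assuming the usual cgroup "max" convention carries over here)
something like

echo "226:0 max" > drm.buffer.max

would lift the limit for that device.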

>
> I think we need a new drm-cgroup.rst which contains all this
> documentation.

Yes, I plan to do that when things are more finalized.  I am
actually writing the commit messages following the current doc format
so I can reuse them in the rst.

>
> With multiple GPUs, do we need an overall GEM bo limit, across all gpus?
> For other stuff later on like vram/tt/... and all that it needs to be
> per-device, but I think one overall limit could be useful.

This one I am not sure about, but it should be fairly straightforward
to add.  I'd love to hear more feedback on this as well.

> >   if (!amdgpu_bo_validate_size(adev, size, bp->domain))
> >   return -ENOMEM;
> >
> > + if (!drmcgrp_bo_can_allocate(current, adev->ddev, size))
> > + return -ENOMEM;
>
> So what happens when you start a lot of threads all at the same time,
> allocating gem bo? Also would be nice if we could roll out at least the
> accounting part of this cgroup to all GEM drivers.

When there is a large number of allocations, they will be
checked in sequence within a device (since I used a per device mutex
in the check.)  Are you suggesting the overhead here is significant
enough to be a bottleneck?  The accounting part should be available to
all GEM drivers (unless I missed something) since the chg and unchg
functions are called via the generic drm_gem_private_object_init and
drm_gem_object_release.

> > + /* only allow bo from the same cgroup or its ancestor to be imported 
> > */
> > + if (drmcgrp != NULL &&
>
> Quite a serious limitation here ...
>
> > + !drmcgrp_is_self_or_ancestor(drmcgrp, obj->drmcgrp)) {
>
> Also what happens if you actually share across devices? Then importing in
> the 2nd group is suddenly possible, and I think will be double-counted.
>
> What's the underlying technical reason for not allowing sharing across
> cgroups?

With the current implementation, there shouldn't be double counting as
the counting is done during the buffer init.

To be clear, sharing across cgroups is allowed; the buffer just needs
to be allocated by a process in a cgroup that is an ancestor of (or the
same as) the importing cgroup.  So in the case of xorg allocating
buffers for clients, xorg would be in the root cgroup and the buffer
can be passed around by different clients (in root or other cgroups.)
The idea here is to establish some form of ownership, otherwise there
wouldn't be a way to account for or limit the usage.

Regards,
Kenny

Re: [RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices

2019-06-26 Thread Kenny Ho
(sending again, I keep missing the reply-all in gmail.)

On Wed, Jun 26, 2019 at 11:56 AM Daniel Vetter  wrote:
>
> Why the separate, explicit registration step? I think a simpler design for
> drivers would be that we set up cgroups if there's anything to be
> controlled, and then for GEM drivers the basic GEM stuff would be set up
> automically (there's really no reason not to I think).

Is this what you mean with the comment about drm_dev_register below?
I think I understand what you are saying but not super clear.  Are you
suggesting the use of driver feature bits (drm_core_check_feature,
etc.) similar to the way Brian Welty did in his proposal in May?

> Also tying to the minor is a bit funky, since we have multiple of these.
> Need to make sure were at least consistent with whether we use the primary
> or render minor - I'd always go with the primary one like you do here.

Um... come to think of it, I can probably embed struct drmcgrp_device
into drm_device; that way I don't really need to keep a separate array
of known_drmcgrp_devs and can get rid of that max_minor thing.  Not
sure why I didn't think of this before.
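
(Roughly: a struct drmcgrp_device member embedded in struct drm_device,
so lookups become something like dev->drmcg_dev instead of
known_drmcgrp_devs[dev->primary->index].  Just a sketch of the idea,
the member name is made up and this is not what the series does yet.)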

> > +
> > +int drmcgrp_register_device(struct drm_device *dev)
>
> Imo this should be done as part of drm_dev_register (maybe only if the
> driver has set up a controller or something). Definitely with the
> unregister logic below. Also anything used by drivers needs kerneldoc.
>
>
> > + /* init cgroups created before registration (i.e. root cgroup) */
> > + if (root_drmcgrp != NULL) {
> > + struct cgroup_subsys_state *pos;
> > + struct drmcgrp *child;
> > +
> > + rcu_read_lock();
> > + css_for_each_descendant_pre(pos, &root_drmcgrp->css) {
> > + child = css_drmcgrp(pos);
> > + init_drmcgrp(child, dev);
> > + }
> > + rcu_read_unlock();
>
> I have no idea, but is this guaranteed to get them all?

I believe so, based on my understanding of
css_for_each_descendant_pre and how I am starting from the root
cgroup.  Hopefully I didn't miss anything.

Regards,
Kenny

Re: [RFC PATCH v3 01/11] cgroup: Introduce cgroup for drm subsystem

2019-06-26 Thread Kenny Ho
On Wed, Jun 26, 2019 at 11:49 AM Daniel Vetter  wrote:
>
> Bunch of naming bikesheds

I appreciate the suggestions, naming is hard :).

> > +#include 
> > +
> > +struct drmcgrp {
>
> drm_cgroup for more consistency how we usually call these things.

I was hoping to keep the symbol short if possible.  I started with
drmcg (following blkcg), but I believe that causes confusion with
other aspects of the drm subsystem.  I don't have too strong an
opinion on this but I'd prefer not to keep refactoring.  So if
there are other opinions on this, please speak up.

> > +
> > +static inline void put_drmcgrp(struct drmcgrp *drmcgrp)
>
> In drm we generally put _get/_put at the end, cgroup seems to do the same.

ok, I will refactor.

> > +{
> > + if (drmcgrp)
> > + css_put(&drmcgrp->css);
> > +}
> > +
> > +static inline struct drmcgrp *parent_drmcgrp(struct drmcgrp *cg)
>
> I'd also call this drm_cgroup_parent or so.
>
> Also all the above needs a bit of nice kerneldoc for the final version.
> -Daniel

Noted, will do, thanks.

Regards,
Kenny

[RFC PATCH v3 08/11] drm, cgroup: Add TTM buffer peak usage stats

2019-06-26 Thread Kenny Ho
drm.memory.peak.stats
A read-only nested-keyed file which exists on all cgroups.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

  == ==
  system Peak host memory used
  tt Peak host memory used by the device (GTT/GART)
  vram   Peak Video RAM used by the drm device
  priv   Other drm device specific memory peak usage
  == ==

Reading returns the following::

226:0 system=0 tt=0 vram=0 priv=0
226:1 system=0 tt=9035776 vram=17768448 priv=16809984
226:2 system=0 tt=9035776 vram=17768448 priv=16809984

Change-Id: I986e44533848f66411465bdd52105e78105a709a
Signed-off-by: Kenny Ho 
---
 include/linux/cgroup_drm.h |  1 +
 kernel/cgroup/drm.c| 20 
 2 files changed, 21 insertions(+)

diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 141bea06f74c..922529641df5 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -25,6 +25,7 @@ struct drmcgrp_device_resource {
s64 bo_stats_count_allocated;
 
s64 mem_stats[TTM_PL_PRIV+1];
+   s64 mem_peaks[TTM_PL_PRIV+1];
s64 mem_stats_evict;
 };
 
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 5aee42a628c1..5f5fa6a2b068 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -38,6 +38,7 @@ enum drmcgrp_res_type {
DRMCGRP_TYPE_BO_COUNT,
DRMCGRP_TYPE_MEM,
DRMCGRP_TYPE_MEM_EVICT,
+   DRMCGRP_TYPE_MEM_PEAK,
 };
 
 enum drmcgrp_file_type {
@@ -171,6 +172,13 @@ static inline void drmcgrp_print_stats(struct 
drmcgrp_device_resource *ddr,
case DRMCGRP_TYPE_MEM_EVICT:
seq_printf(sf, "%lld\n", ddr->mem_stats_evict);
break;
+   case DRMCGRP_TYPE_MEM_PEAK:
+   for (i = 0; i <= TTM_PL_PRIV; i++) {
+   seq_printf(sf, "%s=%lld ", ttm_placement_names[i],
+   ddr->mem_peaks[i]);
+   }
+   seq_puts(sf, "\n");
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -440,6 +448,12 @@ struct cftype files[] = {
.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM_EVICT,
DRMCGRP_FTYPE_STATS),
},
+   {
+   .name = "memory.peaks.stats",
+   .seq_show = drmcgrp_bo_show,
+   .private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM_PEAK,
+   DRMCGRP_FTYPE_STATS),
+   },
{ } /* terminate */
 };
 
@@ -608,6 +622,8 @@ void drmcgrp_chg_mem(struct ttm_buffer_object *tbo)
for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
ddr = drmcgrp->dev_resources[devIdx];
ddr->mem_stats[mem_type] += size;
+   ddr->mem_peaks[mem_type] = max(ddr->mem_peaks[mem_type],
+   ddr->mem_stats[mem_type]);
}
mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
 }
@@ -662,6 +678,10 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object 
*old_bo, bool evict,
ddr->mem_stats[old_mem_type] -= move_in_bytes;
ddr->mem_stats[new_mem_type] += move_in_bytes;
 
+   ddr->mem_peaks[new_mem_type] = max(
+   ddr->mem_peaks[new_mem_type],
+   ddr->mem_stats[new_mem_type]);
+
if (evict)
ddr->mem_stats_evict++;
}
-- 
2.21.0


[RFC PATCH v3 11/11] drm, cgroup: Allow more aggressive memory reclaim

2019-06-26 Thread Kenny Ho
Allow DRM TTM memory manager to register a work_struct, such that, when
a drmcgrp is under memory pressure, memory reclaiming can be triggered
immediately.

Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/ttm/ttm_bo.c| 47 +
 include/drm/drm_cgroup.h| 14 ++
 include/drm/ttm/ttm_bo_driver.h |  2 ++
 kernel/cgroup/drm.c | 33 +++
 4 files changed, 96 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 79c530f4a198..5fc3bc5bd4c5 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1509,6 +1509,44 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned 
mem_type)
 }
 EXPORT_SYMBOL(ttm_bo_evict_mm);
 
+static void ttm_bo_reclaim_wq(struct work_struct *work)
+{
+   struct ttm_operation_ctx ctx = {
+   .interruptible = false,
+   .no_wait_gpu = false,
+   .flags = TTM_OPT_FLAG_FORCE_ALLOC
+   };
+   struct ttm_mem_type_manager *man =
+   container_of(work, struct ttm_mem_type_manager, reclaim_wq);
+   struct ttm_bo_device *bdev = man->bdev;
+   struct dma_fence *fence;
+   int mem_type;
+   int ret;
+
+   for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
+   if (&bdev->man[mem_type] == man)
+   break;
+
+   BUG_ON(mem_type >= TTM_NUM_MEM_TYPES);
+
+   if (!drmcgrp_mem_pressure_scan(bdev, mem_type))
+   return;
+
+   ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx);
+   if (ret)
+   return;
+
+   spin_lock(&man->move_lock);
+   fence = dma_fence_get(man->move);
+   spin_unlock(&man->move_lock);
+
+   if (fence) {
+   ret = dma_fence_wait(fence, false);
+   dma_fence_put(fence);
+   }
+
+}
+
 int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
unsigned long p_size)
 {
@@ -1543,6 +1581,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned 
type,
INIT_LIST_HEAD(&man->lru[i]);
man->move = NULL;
 
+   pr_err("drmcgrp %p type %d\n", bdev->ddev, type);
+
+   if (type <= TTM_PL_VRAM) {
+   INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
+   drmcgrp_register_device_mm(bdev->ddev, type, &man->reclaim_wq);
+   }
+
return 0;
 }
 EXPORT_SYMBOL(ttm_bo_init_mm);
@@ -1620,6 +1665,8 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
man = &bdev->man[i];
if (man->has_type) {
man->use_type = false;
+   drmcgrp_unregister_device_mm(bdev->ddev, i);
+   cancel_work_sync(&man->reclaim_wq);
if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, i)) {
ret = -EBUSY;
pr_err("DRM memory manager type %d is not 
clean\n",
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 360c1e6c809f..134d6e5475f3 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -5,6 +5,7 @@
 #define __DRM_CGROUP_H__
 
 #include 
+#include 
 #include 
 #include 
 
@@ -12,6 +13,9 @@
 
 int drmcgrp_register_device(struct drm_device *device);
 int drmcgrp_unregister_device(struct drm_device *device);
+void drmcgrp_register_device_mm(struct drm_device *dev, unsigned type,
+   struct work_struct *wq);
+void drmcgrp_unregister_device_mm(struct drm_device *dev, unsigned type);
 bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
struct drmcgrp *relative);
 void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
@@ -40,6 +44,16 @@ static inline int drmcgrp_unregister_device(struct 
drm_device *device)
return 0;
 }
 
+static inline void drmcgrp_register_device_mm(struct drm_device *dev,
+   unsigned type, struct work_struct *wq)
+{
+}
+
+static inline void drmcgrp_unregister_device_mm(struct drm_device *dev,
+   unsigned type)
+{
+}
+
 static inline bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
struct drmcgrp *relative)
 {
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 4cbcb41e5aa9..0956ca7888fc 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -205,6 +205,8 @@ struct ttm_mem_type_manager {
 * Protected by @move_lock.
 */
struct dma_fence *move;
+
+   struct work_struct reclaim_wq;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 1ce13db36ce9..985a89e849d3 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -31,6 +31,8 @@ struct drmcgrp_device {
s64 mem_bw_avg_bytes_per_us_default;
 
s64 mem_highs_default[TTM_PL_PR

[RFC PATCH v3 10/11] drm, cgroup: Add soft VRAM limit

2019-06-26 Thread Kenny Ho
The drm resource being limited is the TTM (Translation Table Manager)
buffers.  TTM manages different types of memory that a GPU might access.
These memory types include dedicated Video RAM (VRAM) and host/system
memory accessible through IOMMU (GART/GTT).  TTM is currently used by
multiple drm drivers (amd, ast, bochs, cirrus, hisilicon, maga200,
nouveau, qxl, virtio, vmwgfx.)

TTM buffers belonging to drm cgroups under memory pressure will be
selected to be evicted first.

drm.memory.high
A read-write nested-keyed file which exists on all cgroups.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

   =
  vram Video RAM soft limit for a drm device in byte
   =

Reading returns the following::

226:0 vram=0
226:1 vram=17768448
226:2 vram=17768448

drm.memory.default
A read-only nested-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

   ===
  vram Video RAM default limit in byte
   ===

Reading returns the following::

226:0 vram=0
226:1 vram=17768448
226:2 vram=17768448

Change-Id: I7988e28a453b53140b40a28c176239acbc81d491
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/ttm/ttm_bo.c |   7 ++
 include/drm/drm_cgroup.h |  15 
 include/linux/cgroup_drm.h   |   2 +
 kernel/cgroup/drm.c  | 145 +++
 4 files changed, 169 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f06c2b9d8a4a..79c530f4a198 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -806,12 +806,19 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
struct ttm_mem_type_manager *man = &bdev->man[mem_type];
struct ttm_buffer_object *bo = NULL;
bool locked = false;
+bool check_drmcgrp;
unsigned i;
int ret;
 
+   check_drmcgrp = drmcgrp_mem_pressure_scan(bdev, mem_type);
+
spin_lock(&glob->lru_lock);
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
list_for_each_entry(bo, &man->lru[i], lru) {
+   if (check_drmcgrp &&
+   !drmcgrp_mem_should_evict(bo, mem_type))
+   continue;
+
if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
continue;
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 9b1dbd6a4eca..360c1e6c809f 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -6,6 +6,7 @@
 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_CGROUP_DRM
 
@@ -25,6 +26,8 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, 
bool evict,
struct ttm_mem_reg *new_mem);
 unsigned int drmcgrp_get_mem_bw_period_in_us(struct ttm_buffer_object *tbo);
 bool drmcgrp_mem_can_move(struct ttm_buffer_object *tbo);
+bool drmcgrp_mem_pressure_scan(struct ttm_bo_device *bdev, unsigned type);
+bool drmcgrp_mem_should_evict(struct ttm_buffer_object *tbo, unsigned type);
 
 #else
 static inline int drmcgrp_register_device(struct drm_device *device)
@@ -82,5 +85,17 @@ static inline bool drmcgrp_mem_can_move(struct 
ttm_buffer_object *tbo)
 {
return true;
 }
+
+static inline bool drmcgrp_mem_pressure_scan(struct ttm_bo_device *bdev,
+   unsigned type)
+{
+   return false;
+}
+
+static inline bool drmcgrp_mem_should_evict(struct ttm_buffer_object *tbo,
+   unsigned type)
+{
+   return true;
+}
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 94828da2104a..52ef02eaac70 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -35,6 +35,8 @@ struct drmcgrp_device_resource {
 
s64 mem_stats[TTM_PL_PRIV+1];
s64 mem_peaks[TTM_PL_PRIV+1];
+   s64 mem_highs[TTM_PL_PRIV+1];
+   boolmem_pressure[TTM_PL_PRIV+1];
s64 mem_stats_evict;
 
s64 mem_bw_stats_last_update_us;
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index bbc6612200a4..1ce13db36ce9 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -29,6 +29,8 @@ struct drmcgrp_device {
 
s64 mem_bw_bytes_in_period_default;
s64 mem_bw_avg_bytes_per_us_default;
+
+   s64 mem_highs_default[TTM_PL_PRIV+1];
 };
 
 #define DRMCG_CTF_PRIV_SIZE 3
@@ -114,6 +116,8 

[RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices

2019-06-26 Thread Kenny Ho
Change-Id: I908ee6975ea0585e4c30eafde4599f87094d8c65
Signed-off-by: Kenny Ho 
---
 include/drm/drm_cgroup.h   |  24 
 include/linux/cgroup_drm.h |  10 
 kernel/cgroup/drm.c| 116 +
 3 files changed, 150 insertions(+)
 create mode 100644 include/drm/drm_cgroup.h

diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
new file mode 100644
index ..ddb9eab64360
--- /dev/null
+++ b/include/drm/drm_cgroup.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef __DRM_CGROUP_H__
+#define __DRM_CGROUP_H__
+
+#ifdef CONFIG_CGROUP_DRM
+
+int drmcgrp_register_device(struct drm_device *device);
+
+int drmcgrp_unregister_device(struct drm_device *device);
+
+#else
+static inline int drmcgrp_register_device(struct drm_device *device)
+{
+   return 0;
+}
+
+static inline int drmcgrp_unregister_device(struct drm_device *device)
+{
+   return 0;
+}
+#endif /* CONFIG_CGROUP_DRM */
+#endif /* __DRM_CGROUP_H__ */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 9928e60037a5..27497f786c93 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -6,10 +6,20 @@
 
 #ifdef CONFIG_CGROUP_DRM
 
+#include 
 #include 
+#include 
+
+/* limit defined per the way drm_minor_alloc operates */
+#define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
+
+struct drmcgrp_device_resource {
+   /* for per device stats */
+};
 
 struct drmcgrp {
struct cgroup_subsys_state  css;
+   struct drmcgrp_device_resource  *dev_resources[MAX_DRM_DEV];
 };
 
 static inline struct drmcgrp *css_drmcgrp(struct cgroup_subsys_state *css)
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 66cb1dda023d..7da6e0d93991 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -1,28 +1,99 @@
 // SPDX-License-Identifier: MIT
 // Copyright 2019 Advanced Micro Devices, Inc.
+#include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
+#include 
+#include 
+#include 
+
+static DEFINE_MUTEX(drmcgrp_mutex);
+
+struct drmcgrp_device {
+   struct drm_device   *dev;
+   struct mutexmutex;
+};
+
+/* indexed by drm_minor for access speed */
+static struct drmcgrp_device   *known_drmcgrp_devs[MAX_DRM_DEV];
+
+static int max_minor;
+
 
 static struct drmcgrp *root_drmcgrp __read_mostly;
 
 static void drmcgrp_css_free(struct cgroup_subsys_state *css)
 {
struct drmcgrp *drmcgrp = css_drmcgrp(css);
+   int i;
+
+   for (i = 0; i <= max_minor; i++) {
+   if (drmcgrp->dev_resources[i] != NULL)
+   kfree(drmcgrp->dev_resources[i]);
+   }
 
kfree(drmcgrp);
 }
 
+static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
+{
+   struct drmcgrp_device_resource *ddr = drmcgrp->dev_resources[minor];
+
+   if (ddr == NULL) {
+   ddr = kzalloc(sizeof(struct drmcgrp_device_resource),
+   GFP_KERNEL);
+
+   if (!ddr)
+   return -ENOMEM;
+
+   drmcgrp->dev_resources[minor] = ddr;
+   }
+
+   /* set defaults here */
+
+   return 0;
+}
+
+static inline int init_drmcgrp(struct drmcgrp *drmcgrp, struct drm_device *dev)
+{
+   int rc = 0;
+   int i;
+
+   if (dev != NULL) {
+   rc = init_drmcgrp_single(drmcgrp, dev->primary->index);
+   return rc;
+   }
+
+   for (i = 0; i <= max_minor; i++) {
+   rc = init_drmcgrp_single(drmcgrp, i);
+   if (rc)
+   return rc;
+   }
+
+   return 0;
+}
+
 static struct cgroup_subsys_state *
 drmcgrp_css_alloc(struct cgroup_subsys_state *parent_css)
 {
struct drmcgrp *parent = css_drmcgrp(parent_css);
struct drmcgrp *drmcgrp;
+   int rc;
 
drmcgrp = kzalloc(sizeof(struct drmcgrp), GFP_KERNEL);
if (!drmcgrp)
return ERR_PTR(-ENOMEM);
 
+   rc = init_drmcgrp(drmcgrp, NULL);
+   if (rc) {
+   drmcgrp_css_free(&drmcgrp->css);
+   return ERR_PTR(rc);
+   }
+
if (!parent)
root_drmcgrp = drmcgrp;
 
@@ -40,3 +111,48 @@ struct cgroup_subsys drm_cgrp_subsys = {
.legacy_cftypes = files,
.dfl_cftypes= files,
 };
+
+int drmcgrp_register_device(struct drm_device *dev)
+{
+   struct drmcgrp_device *ddev;
+
+   ddev = kzalloc(sizeof(struct drmcgrp_device), GFP_KERNEL);
+   if (!ddev)
+   return -ENOMEM;
+
+   ddev->dev = dev;
+   mutex_init(>mutex);
+
+   mutex_lock(&drmcgrp_mutex);
+   known_drmcgrp_devs[dev->primary->index] = ddev;
+   max_minor = max(max_minor, dev->primary->index);
+   mutex_unlock(&drmcgrp_mutex);
+
+   /* init cgroups created before registration (i.e. root cgroup) */
+   if (root_drmcgrp != NULL) {
+   struct cgroup_subsys_stat

[RFC PATCH v3 05/11] drm, cgroup: Add peak GEM buffer allocation limit

2019-06-26 Thread Kenny Ho
drm.buffer.peak.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Largest GEM buffer allocated in bytes.

drm.buffer.peak.default
A read-only flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Default limits on the largest GEM buffer allocation in bytes.

drm.buffer.peak.max
A read-write flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Per device limits on the largest GEM buffer allocation in bytes.
This is a hard limit.  Attempts to allocate beyond the cgroup
limit will result in ENOMEM.  Shorthand understood by memparse
(such as k, m, g) can be used.

Set largest allocation for /dev/dri/card1 to 4MB
echo "226:1 4m" > drm.buffer.peak.max

Change-Id: I0830d56775568e1cf215b56cc892d5e7945e9f25
Signed-off-by: Kenny Ho 
---
 include/linux/cgroup_drm.h |  3 ++
 kernel/cgroup/drm.c| 61 ++
 2 files changed, 64 insertions(+)

diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index efa019666f1c..126c156ffd70 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -17,6 +17,9 @@ struct drmcgrp_device_resource {
/* for per device stats */
s64 bo_stats_total_allocated;
s64 bo_limits_total_allocated;
+
+   size_t  bo_stats_peak_allocated;
+   size_t  bo_limits_peak_allocated;
 };
 
 struct drmcgrp {
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index cfc1fe74dca3..265008197654 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -19,6 +19,7 @@ struct drmcgrp_device {
struct mutexmutex;
 
s64 bo_limits_total_allocated_default;
+   size_t  bo_limits_peak_allocated_default;
 };
 
 #define DRMCG_CTF_PRIV_SIZE 3
@@ -31,6 +32,7 @@ struct drmcgrp_device {
 
 enum drmcgrp_res_type {
DRMCGRP_TYPE_BO_TOTAL,
+   DRMCGRP_TYPE_BO_PEAK,
 };
 
 enum drmcgrp_file_type {
@@ -78,6 +80,9 @@ static inline int init_drmcgrp_single(struct drmcgrp 
*drmcgrp, int minor)
if (known_drmcgrp_devs[minor] != NULL) {
ddr->bo_limits_total_allocated =
  known_drmcgrp_devs[minor]->bo_limits_total_allocated_default;
+
+   ddr->bo_limits_peak_allocated =
+ known_drmcgrp_devs[minor]->bo_limits_peak_allocated_default;
}
 
return 0;
@@ -137,6 +142,9 @@ static inline void drmcgrp_print_stats(struct 
drmcgrp_device_resource *ddr,
case DRMCGRP_TYPE_BO_TOTAL:
seq_printf(sf, "%lld\n", ddr->bo_stats_total_allocated);
break;
+   case DRMCGRP_TYPE_BO_PEAK:
+   seq_printf(sf, "%zu\n", ddr->bo_stats_peak_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -155,6 +163,9 @@ static inline void drmcgrp_print_limits(struct 
drmcgrp_device_resource *ddr,
case DRMCGRP_TYPE_BO_TOTAL:
seq_printf(sf, "%lld\n", ddr->bo_limits_total_allocated);
break;
+   case DRMCGRP_TYPE_BO_PEAK:
+   seq_printf(sf, "%zu\n", ddr->bo_limits_peak_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -174,6 +185,10 @@ static inline void drmcgrp_print_default(struct 
drmcgrp_device *ddev,
seq_printf(sf, "%lld\n",
ddev->bo_limits_total_allocated_default);
break;
+   case DRMCGRP_TYPE_BO_PEAK:
+   seq_printf(sf, "%zu\n",
+   ddev->bo_limits_peak_allocated_default);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -315,6 +330,23 @@ ssize_t drmcgrp_bo_limit_write(struct kernfs_open_file 
*of, char *buf,
 
ddr->bo_limits_total_allocated = val;
break;
+   case DRMCGRP_TYPE_BO_PEAK:
+   p_max = parent == NULL ? SIZE_MAX :
+   parent->dev_resources[minor]->
+   bo_limits_peak_allocated;
+
+   rc = drmcgrp_process_limit_val(sattr, true,
+   ddev->bo_limits_peak_allocated_default,
+   p_max,
+   &val);
+
+   if (rc || val < 0) {
+   drmcgrp_pr_cft_err(drmcgrp, cft_name, minor);
+   continue;
+   }

[RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control

2019-06-26 Thread Kenny Ho
The bandwidth is measured by keeping track of the amount of bytes moved
by ttm within a time period.  We defined two type of bandwidth: burst
and average.  Average bandwidth is calculated by dividing the total
amount of bytes moved within a cgroup by the lifetime of the cgroup.
Burst bandwidth is similar except that the byte and time measurement is
reset after a user configurable period.

The bandwidth control is best effort since it is done on a per move
basis instead of per byte.  The bandwidth is limited by delaying the
move of a buffer.  The bandwidth limit can be exceeded when the next
move is larger than the remaining allowance.
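
Put another way (an informal summary of the accounting, not lifted
verbatim from the code):

    avg_bytes_per_us  = total_moved_byte / total_accum_us
                        (measured over the whole cgroup lifetime)
    burst_byte_per_us = moved_byte / accum_us
                        (moved_byte and accum_us reset every
                        burst_bw_period_in_us)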

drm.burst_bw_period_in_us
A read-write flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Length of a period use to measure burst bandwidth in us.
One period per device.

drm.burst_bw_period_in_us.default
A read-only flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Default length of a period in us (one per device.)

drm.bandwidth.stats
A read-only nested-keyed file which exists on all cgroups.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

  = ==
  burst_byte_per_us Burst bandwidth
  avg_bytes_per_us  Average bandwidth
  moved_byteAmount of byte moved within a period
  accum_us  Amount of time accumulated in a period
  total_moved_byte  Byte moved within the cgroup lifetime
  total_accum_usCgroup lifetime in us
  byte_credit   Available byte credit to limit avg bw
  = ==

Reading returns the following::
226:1 burst_byte_per_us=23 avg_bytes_per_us=0 moved_byte=2244608
accum_us=95575 total_moved_byte=45899776 total_accum_us=201634590
byte_credit=13214278590464
226:2 burst_byte_per_us=10 avg_bytes_per_us=219 moved_byte=430080
accum_us=39350 total_moved_byte=65518026752 total_accum_us=298337721
byte_credit=9223372036854644735

drm.bandwidth.high
A read-write nested-keyed file which exists on all cgroups.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

    ===
  bytes_in_period   Burst limit per period in byte
  avg_bytes_per_us  Average bandwidth limit in bytes per us
    ===

Reading returns the following::

226:1 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
226:2 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536

drm.bandwidth.default
A read-only nested-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

    
  bytes_in_period   Default burst limit per period in byte
  avg_bytes_per_us  Default average bw limit in bytes per us
    

Reading returns the following::

226:1 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
226:2 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536

Change-Id: Ie573491325ccc16535bb943e7857f43bd0962add
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/ttm/ttm_bo.c |   7 +
 include/drm/drm_cgroup.h |  13 ++
 include/linux/cgroup_drm.h   |  14 ++
 kernel/cgroup/drm.c  | 309 ++-
 4 files changed, 340 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index e9f70547f0ad..f06c2b9d8a4a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1176,6 +1177,12 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
 * Check whether we need to move buffer.
 */
if (!ttm_bo_mem_compat(placement, &bo->mem, &new_flags)) {
+   unsigned int move_delay = drmcgrp_get_mem_bw_period_in_us(bo);
+   move_delay /= 2000; /* check every half period in ms*/
+   while (bo->bdev->ddev != NULL && !drmcgrp_mem_can_move(bo)) {
+   msleep(move_delay);
+   }
+
ret = ttm_bo_move_buffer(bo, placement, ctx);
if (ret)
return ret;
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 48ab5450cf17..9b1dbd6a4eca 100644
--- a/include/drm/drm_cgroup.h

[RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats

2019-06-26 Thread Kenny Ho
The drm resource being measured is the TTM (Translation Table Manager)
buffers.  TTM manages different types of memory that a GPU might access.
These memory types include dedicated Video RAM (VRAM) and host/system
memory accessible through IOMMU (GART/GTT).  TTM is currently used by
multiple drm drivers (amd, ast, bochs, cirrus, hisilicon, maga200,
nouveau, qxl, virtio, vmwgfx.)

drm.memory.stats
A read-only nested-keyed file which exists on all cgroups.
Each entry is keyed by the drm device's major:minor.  The
following nested keys are defined.

  == =
  system Host/system memory
  tt Host memory used by the drm device (GTT/GART)
  vram   Video RAM used by the drm device
  priv   Other drm device, vendor specific memory
  == =

Reading returns the following::

226:0 system=0 tt=0 vram=0 priv=0
226:1 system=0 tt=9035776 vram=17768448 priv=16809984
226:2 system=0 tt=9035776 vram=17768448 priv=16809984

drm.memory.evict.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Total number of evictions.

Change-Id: Ice2c4cc845051229549bebeb6aa2d7d6153bdf6a
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |   3 +-
 drivers/gpu/drm/ttm/ttm_bo.c|  30 +++
 drivers/gpu/drm/ttm/ttm_bo_util.c   |   4 +
 include/drm/drm_cgroup.h|  19 
 include/drm/ttm/ttm_bo_api.h|   2 +
 include/drm/ttm/ttm_bo_driver.h |   8 ++
 include/linux/cgroup_drm.h  |   4 +
 kernel/cgroup/drm.c | 113 
 8 files changed, 182 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e9ecc3953673..a8dfc78ed45f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1678,8 +1678,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
mutex_init(>mman.gtt_window_lock);
 
/* No others user of address space so set it to 0 */
-   r = ttm_bo_device_init(&adev->mman.bdev,
+   r = ttm_bo_device_init_tmp(&adev->mman.bdev,
   &amdgpu_bo_driver,
+  adev->ddev,
   adev->ddev->anon_inode->i_mapping,
   adev->need_dma32);
if (r) {
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 2845fceb2fbd..e9f70547f0ad 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -42,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static void ttm_bo_global_kobj_release(struct kobject *kobj);
 
@@ -151,6 +153,10 @@ static void ttm_bo_release_list(struct kref *list_kref)
struct ttm_bo_device *bdev = bo->bdev;
size_t acc_size = bo->acc_size;
 
+   if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+   drmcgrp_unchg_mem(bo);
+   put_drmcgrp(bo->drmcgrp);
+
BUG_ON(kref_read(&bo->list_kref));
BUG_ON(kref_read(&bo->kref));
BUG_ON(atomic_read(&bo->cpu_writers));
@@ -353,6 +359,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object 
*bo,
if (bo->mem.mem_type == TTM_PL_SYSTEM) {
if (bdev->driver->move_notify)
bdev->driver->move_notify(bo, evict, mem);
+   if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+   drmcgrp_mem_track_move(bo, evict, mem);
bo->mem = *mem;
mem->mm_node = NULL;
goto moved;
@@ -361,6 +369,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object 
*bo,
 
if (bdev->driver->move_notify)
bdev->driver->move_notify(bo, evict, mem);
+   if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+   drmcgrp_mem_track_move(bo, evict, mem);
 
if (!(old_man->flags & TTM_MEMTYPE_FLAG_FIXED) &&
!(new_man->flags & TTM_MEMTYPE_FLAG_FIXED))
@@ -374,6 +384,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object 
*bo,
if (bdev->driver->move_notify) {
swap(*mem, bo->mem);
bdev->driver->move_notify(bo, false, mem);
+   if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+  

[RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit

2019-06-26 Thread Kenny Ho
The drm resource being measured and limited here is the GEM buffer
objects.  User applications allocate and free these buffers.  In
addition, a process can allocate a buffer and share it with another
process.  The consumer of a shared buffer can also outlive the
allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of the
buffer is deemed to be the cgroup for which the allocating process
belongs to.  There is one cgroup limit per drm device.

In order to prevent the buffer outliving the cgroup that owns it, a
process is prevented from importing buffers that are not owned by the
process' cgroup or the ancestors of the process' cgroup.  In other
words, in order for a buffer to be shared between two cgroups, the
buffer must be created by a common ancestor of the cgroups.
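
For example (illustrative only): a buffer allocated by a process in
/parent can be imported by processes in /parent/a and /parent/b, but a
buffer allocated in /parent/a cannot be imported by a process in
/parent/b, since /parent/a is neither /parent/b nor one of its
ancestors.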

drm.buffer.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Total GEM buffer allocation in bytes.

drm.buffer.default
A read-only flat-keyed file which exists on the root cgroup.
Each entry is keyed by the drm device's major:minor.

Default limits on the total GEM buffer allocation in bytes.

drm.buffer.max
A read-write flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Per device limits on the total GEM buffer allocation in bytes.
This is a hard limit.  Attempts to allocate beyond the cgroup
limit will result in ENOMEM.  Shorthand understood by memparse
(such as k, m, g) can be used.

Set allocation limit for /dev/dri/card1 to 1GB
echo "226:1 1g" > drm.buffer.total.max

Set allocation limit for /dev/dri/card0 to 512MB
echo "226:0 512m" > drm.buffer.total.max

Change-Id: I4c249d06d45ec709d6481d4cbe87c5168545c5d0
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   4 +
 drivers/gpu/drm/drm_gem.c  |   8 +
 drivers/gpu/drm/drm_prime.c|   9 +
 include/drm/drm_cgroup.h   |  34 ++-
 include/drm/drm_gem.h  |  11 +
 include/linux/cgroup_drm.h |   2 +
 kernel/cgroup/drm.c| 321 +
 7 files changed, 387 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 93b2c5a48a71..b4c078b7ad63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
@@ -446,6 +447,9 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
if (!amdgpu_bo_validate_size(adev, size, bp->domain))
return -ENOMEM;
 
+   if (!drmcgrp_bo_can_allocate(current, adev->ddev, size))
+   return -ENOMEM;
+
*bo_ptr = NULL;
 
acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 6a80db077dc6..e20c1034bf2b 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -37,10 +37,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include "drm_internal.h"
 
 /** @file drm_gem.c
@@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
obj->handle_count = 0;
obj->size = size;
drm_vma_node_reset(&obj->vma_node);
+
+   obj->drmcgrp = get_drmcgrp(current);
+   drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size);
 }
 EXPORT_SYMBOL(drm_gem_private_object_init);
 
@@ -804,6 +809,9 @@ drm_gem_object_release(struct drm_gem_object *obj)
if (obj->filp)
fput(obj->filp);
 
+   drmcgrp_unchg_bo_alloc(obj->drmcgrp, obj->dev, obj->size);
+   put_drmcgrp(obj->drmcgrp);
+
drm_gem_free_mmap_offset(obj);
 }
 EXPORT_SYMBOL(drm_gem_object_release);
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 231e3f6d5f41..eeb612116810 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "drm_internal.h"
 
@@ -794,6 +795,7 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
 {
struct dma_buf *dma_buf;
struct drm_gem_object *obj;
+   struct drmcgrp *drmcgrp = drmcgrp_from(current);
int ret;
 
dma_buf = dma_buf_get(prime_fd);
@@ -818,6 +820,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
goto out_unlock;
}
 
+   /* only allow bo from the same cgroup or its ancestor to be imported */
+   if (drmcgrp != NULL &&
+  

[RFC PATCH v3 06/11] drm, cgroup: Add GEM buffer allocation count stats

2019-06-26 Thread Kenny Ho
drm.buffer.count.stats
A read-only flat-keyed file which exists on all cgroups.  Each
entry is keyed by the drm device's major:minor.

Total number of GEM buffer allocated.

Change-Id: Id3e1809d5fee8562e47a7d2b961688956d844ec6
Signed-off-by: Kenny Ho 
---
 include/linux/cgroup_drm.h |  2 ++
 kernel/cgroup/drm.c| 23 ---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 126c156ffd70..e4400b21ab8e 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -20,6 +20,8 @@ struct drmcgrp_device_resource {
 
size_t  bo_stats_peak_allocated;
size_t  bo_limits_peak_allocated;
+
+   s64 bo_stats_count_allocated;
 };
 
 struct drmcgrp {
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 265008197654..9144f93b851f 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -33,6 +33,7 @@ struct drmcgrp_device {
 enum drmcgrp_res_type {
DRMCGRP_TYPE_BO_TOTAL,
DRMCGRP_TYPE_BO_PEAK,
+   DRMCGRP_TYPE_BO_COUNT,
 };
 
 enum drmcgrp_file_type {
@@ -145,6 +146,9 @@ static inline void drmcgrp_print_stats(struct 
drmcgrp_device_resource *ddr,
case DRMCGRP_TYPE_BO_PEAK:
seq_printf(sf, "%zu\n", ddr->bo_stats_peak_allocated);
break;
+   case DRMCGRP_TYPE_BO_COUNT:
+   seq_printf(sf, "%lld\n", ddr->bo_stats_count_allocated);
+   break;
default:
seq_puts(sf, "\n");
break;
@@ -396,6 +400,12 @@ struct cftype files[] = {
.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_PEAK,
DRMCGRP_FTYPE_LIMIT),
},
+   {
+   .name = "buffer.count.stats",
+   .seq_show = drmcgrp_bo_show,
+   .private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_COUNT,
+   DRMCGRP_FTYPE_STATS),
+   },
{ } /* terminate */
 };
 
@@ -518,6 +528,8 @@ void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct 
drm_device *dev,
 
if (ddr->bo_stats_peak_allocated < (size_t)size)
ddr->bo_stats_peak_allocated = (size_t)size;
+
+   ddr->bo_stats_count_allocated++;
}
mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
 }
@@ -526,15 +538,20 @@ EXPORT_SYMBOL(drmcgrp_chg_bo_alloc);
 void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
size_t size)
 {
+   struct drmcgrp_device_resource *ddr;
int devIdx = dev->primary->index;
 
if (drmcgrp == NULL || known_drmcgrp_devs[devIdx] == NULL)
return;
 
mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
-   for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp))
-   drmcgrp->dev_resources[devIdx]->bo_stats_total_allocated
-   -= (s64)size;
+   for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
+   ddr = drmcgrp->dev_resources[devIdx];
+
+   ddr->bo_stats_total_allocated -= (s64)size;
+
+   ddr->bo_stats_count_allocated--;
+   }
mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
 }
 EXPORT_SYMBOL(drmcgrp_unchg_bo_alloc);
-- 
2.21.0


[RFC PATCH v3 03/11] drm/amdgpu: Register AMD devices for DRM cgroup

2019-06-26 Thread Kenny Ho
Change-Id: I3750fc657b956b52750a36cb303c54fa6a265b44
Signed-off-by: Kenny Ho 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index da7b4fe8ade3..2568fd730161 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -28,6 +28,7 @@
 #include 
 #include "amdgpu.h"
 #include 
+#include 
 #include "amdgpu_sched.h"
 #include "amdgpu_uvd.h"
 #include "amdgpu_vce.h"
@@ -97,6 +98,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 
amdgpu_device_fini(adev);
 
+   drmcgrp_unregister_device(dev);
 done_free:
kfree(adev);
dev->dev_private = NULL;
@@ -141,6 +143,8 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
struct amdgpu_device *adev;
int r, acpi_status;
 
+   drmcgrp_register_device(dev);
+
 #ifdef CONFIG_DRM_AMDGPU_SI
if (!amdgpu_si_support) {
switch (flags & AMD_ASIC_MASK) {
-- 
2.21.0


[RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem

2019-06-26 Thread Kenny Ho
://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
[6] 
https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
[7] https://github.com/RadeonOpenCompute/k8s-device-plugin
[8] https://github.com/kubernetes/kubernetes/issues/52757

Kenny Ho (11):
  cgroup: Introduce cgroup for drm subsystem
  cgroup: Add mechanism to register DRM devices
  drm/amdgpu: Register AMD devices for DRM cgroup
  drm, cgroup: Add total GEM buffer allocation limit
  drm, cgroup: Add peak GEM buffer allocation limit
  drm, cgroup: Add GEM buffer allocation count stats
  drm, cgroup: Add TTM buffer allocation stats
  drm, cgroup: Add TTM buffer peak usage stats
  drm, cgroup: Add per cgroup bw measure and control
  drm, cgroup: Add soft VRAM limit
  drm, cgroup: Allow more aggressive memory reclaim

 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|3 +-
 drivers/gpu/drm/drm_gem.c  |8 +
 drivers/gpu/drm/drm_prime.c|9 +
 drivers/gpu/drm/ttm/ttm_bo.c   |   91 ++
 drivers/gpu/drm/ttm/ttm_bo_util.c  |4 +
 include/drm/drm_cgroup.h   |  115 ++
 include/drm/drm_gem.h  |   11 +
 include/drm/ttm/ttm_bo_api.h   |2 +
 include/drm/ttm/ttm_bo_driver.h|   10 +
 include/linux/cgroup_drm.h |  114 ++
 include/linux/cgroup_subsys.h  |4 +
 init/Kconfig   |5 +
 kernel/cgroup/Makefile |1 +
 kernel/cgroup/drm.c| 1171 
 16 files changed, 1555 insertions(+), 1 deletion(-)
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

-- 
2.21.0

