Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-02-03 Thread Tvrtko Ursulin



On 02/02/2023 23:43, T.J. Mercier wrote:

On Wed, Feb 1, 2023 at 6:23 AM Tvrtko Ursulin
 wrote:



On 01/02/2023 01:49, T.J. Mercier wrote:

On Tue, Jan 31, 2023 at 6:01 AM Tvrtko Ursulin
 wrote:



On 25/01/2023 20:04, T.J. Mercier wrote:

On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
 wrote:



Hi,

On 25/01/2023 11:52, Michal Hocko wrote:

On Tue 24-01-23 19:46:28, Shakeel Butt wrote:

On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:

On Mon 23-01-23 19:17:23, T.J. Mercier wrote:

When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.


Is there any reason why this memory cannot be charged during the
allocation (__GFP_ACCOUNT used)?
Also, you do charge and account the memory, but the underlying pages do not
know about their memcg (this is normally done with commit_charge for
user-mapped pages). This would become a problem if the memory is
migrated, for example.


I don't think this is movable memory.


This also means that you have to maintain a memcg
reference outside of the memcg proper, which is not really nice either.
This mimics the tcp kmem limit implementation, which I really have to say I
am not a great fan of, and this pattern shouldn't be copied.



I think we should keep the discussion on technical merits instead of
personal preference. To me using an skmem-like interface is totally fine,
but the pros/cons need to be very explicit and the clear reasons to
select that option should be included.


I do agree with that. I didn't want to sound personal wrt tcp kmem
accounting, but the overall code maintenance cost is higher because
of how tcp's take on accounting differs from anything else in the memcg
proper. I would prefer not to grow another example like that.


To me there are two options:

1. Using an skmem-like interface as this patch series does:

The main pro of this option is that it is very simple. Let me list down
the cons of this approach:

a. There is a time window between the actual memory allocation/free and
the charge/uncharge: the [un]charge happens only when the whole buffer is
allocated or freed. I think for the charge path that might not be a big
issue, but on the uncharge path this can cause problems. The application and
the potential shrinkers may have freed some of this dmabuf memory, but until
the whole dmabuf is freed, the memcg uncharge will not happen. This can
have consequences on the reclaim and oom behavior of the application.

b. Due to the usage model, i.e. a central daemon allocating the dmabuf
memory upfront, there is a requirement for memcg charge transfer
functionality to transfer the charge from the central daemon to the
client applications. This does introduce complexity and avenues of weird
reclaim and oom behavior.


2. Allocate and charge the memory on page fault by the actual user

In this approach, the memory is not allocated upfront by the central
daemon but rather on page fault by the client application, and the
memcg charge happens at the same time.

The only con I can think of is that this approach is more involved and may
need some clever tricks to track the page on the free path, i.e. we need to
decrement the dmabuf memcg stat on the free path. Maybe a page flag.

The pros of this approach are that there is no need for charge transfer
functionality and that the charge/uncharge is closely tied to the actual
memory allocation and free.

Personally I would prefer the second approach, but I don't want to just
block this work if the dmabuf folks are ok with the cons mentioned for
the first approach.


I am not familiar enough with dmabuf internals to judge complexity on their end,
but I fully agree that charge-when-used is much easier to reason
about and should have fewer subtle surprises.


Disclaimer that I don't seem to see patches 3&4 on dri-devel, so maybe I
am missing something, but in principle yes, I agree that the 2nd option
(charge the user, not the exporter) should be preferred. Thing being that at
export time there may not be any backing store allocated. Plus, if the
series restricts the charge transfer to just Android clients, then
it seems it has the potential to miss many other use cases. At the least it
needs to outline a description of how the feature will be useful outside
Android.


There is no restriction like that. It's available to anybody who wants
to call dma_buf_charge_transfer if they actually have a need for it,
which I don't really expect to be common, since most users/owners of
the buffers will be the ones causing the export in the first place.
It's just not like that on Android, with the extra allocator process in
the middle most of the time.


Yeah, I used the wrong term "restrict", apologies. What I meant was: if
the idea was to allow spotting memory leaks, with the charge transfer
being optional and in the series only wired up for Android Binder, then
it obviously only fully works for that one case. So a step back...


Oh, spotting kernel memory leaks is a side-benefit of accounting
kernel-only buffers in the root cgroup. The primary goal is to
attribute buffers to applications that originated

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-02-03 Thread Tvrtko Ursulin



On 02/02/2023 23:43, T.J. Mercier wrote:

On Wed, Feb 1, 2023 at 6:52 AM Tvrtko Ursulin
 wrote:



On 01/02/2023 14:23, Tvrtko Ursulin wrote:


On 01/02/2023 01:49, T.J. Mercier wrote:

On Tue, Jan 31, 2023 at 6:01 AM Tvrtko Ursulin
 wrote:



On 25/01/2023 20:04, T.J. Mercier wrote:

On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
 wrote:



Hi,

On 25/01/2023 11:52, Michal Hocko wrote:

On Tue 24-01-23 19:46:28, Shakeel Butt wrote:

On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:

On Mon 23-01-23 19:17:23, T.J. Mercier wrote:

When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.


Is there any reason why this memory cannot be charged during the
allocation (__GFP_ACCOUNT used)?
Also you do charge and account the memory but underlying pages
do not
know about their memcg (this is normally done with commit_charge
for
user mapped pages). This would become a problem if the memory is
migrated for example.


I don't think this is movable memory.


This also means that you have to maintain memcg
reference outside of the memcg proper which is not really nice
either.
This mimicks tcp kmem limit implementation which I really have
to say I
am not a great fan of and this pattern shouldn't be coppied.



I think we should keep the discussion on technical merits instead of
personal perference. To me using skmem like interface is totally
fine
but the pros/cons need to be very explicit and the clear reasons to
select that option should be included.


I do agree with that. I didn't want sound to be personal wrt tcp kmem
accounting but the overall code maintenance cost is higher because
of how tcp take on accounting differs from anything else in the memcg
proper. I would prefer to not grow another example like that.


To me there are two options:

1. Using skmem like interface as this patch series:

The main pros of this option is that it is very simple. Let me
list down
the cons of this approach:

a. There is time window between the actual memory allocation/free
and
the charge and uncharge and [un]charge happen when the whole
memory is
allocated or freed. I think for the charge path that might not be
a big
issue but on the uncharge, this can cause issues. The application
and
the potential shrinkers have freed some of this dmabuf memory but
until
the whole dmabuf is freed, the memcg uncharge will not happen.
This can
consequences on reclaim and oom behavior of the application.

b. Due to the usage model i.e. a central daemon allocating the
dmabuf
memory upfront, there is a requirement to have a memcg charge
transfer
functionality to transfer the charge from the central daemon to the
client applications. This does introduce complexity and avenues
of weird
reclaim and oom behavior.


2. Allocate and charge the memory on page fault by actual user

In this approach, the memory is not allocated upfront by the central
daemon but rather on the page fault by the client application and
the
memcg charge happen at the same time.

The only cons I can think of is this approach is more involved
and may
need some clever tricks to track the page on the free patch i.e.
we to
decrement the dmabuf memcg stat on free path. Maybe a page flag.

The pros of this approach is there is no need have a charge transfer
functionality and the charge/uncharge being closely tied to the
actual
memory allocation and free.

Personally I would prefer the second approach but I don't want to
just
block this work if the dmabuf folks are ok with the cons
mentioned of
the first approach.


I am not familiar with dmabuf internals to judge complexity on
their end
but I fully agree that charge-when-used is much more easier to reason
about and it should have less subtle surprises.


Disclaimer that I don't seem to see patches 3&4 on dri-devel so
maybe I
am missing something, but in principle yes, I agree that the 2nd
option
(charge the user, not exporter) should be preferred. Thing being
that at
export time there may not be any backing store allocated, plus if the
series is restricting the charge transfer to just Android clients then
it seems it has the potential to miss many other use cases. At least
needs to outline a description on how the feature will be useful
outside
Android.


There is no restriction like that. It's available to anybody who wants
to call dma_buf_charge_transfer if they actually have a need for that,
which I don't really expect to be common since most users/owners of
the buffers will be the ones causing the export in the first place.
It's just not like that on Android with the extra allocator process in
the middle most of the time.


Yeah I used the wrong term "restrict", apologies. What I meant was, if
the idea was to allow spotting memory leaks, with the charge transfer
being optional and in the series only wired up for Android Binder, then
it obviously only fully works for that one case. So a step back..


Oh, spotting kernel memory leaks 

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-02-02 Thread T.J. Mercier
On Wed, Feb 1, 2023 at 6:52 AM Tvrtko Ursulin
 wrote:
>
>
> On 01/02/2023 14:23, Tvrtko Ursulin wrote:
> >
> > On 01/02/2023 01:49, T.J. Mercier wrote:
> >> On Tue, Jan 31, 2023 at 6:01 AM Tvrtko Ursulin
> >>  wrote:
> >>>
> >>>
> >>> On 25/01/2023 20:04, T.J. Mercier wrote:
>  On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
>   wrote:
> >
> >
> > Hi,
> >
> > On 25/01/2023 11:52, Michal Hocko wrote:
> >> On Tue 24-01-23 19:46:28, Shakeel Butt wrote:
> >>> On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:
>  On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> > When a buffer is exported to userspace, use memcg to attribute the
> > buffer to the allocating cgroup until all buffer references are
> > released.
> 
>  Is there any reason why this memory cannot be charged during the
>  allocation (__GFP_ACCOUNT used)?
>  Also you do charge and account the memory but underlying pages
>  do not
>  know about their memcg (this is normally done with commit_charge
>  for
>  user mapped pages). This would become a problem if the memory is
>  migrated for example.
> >>>
> >>> I don't think this is movable memory.
> >>>
>  This also means that you have to maintain memcg
>  reference outside of the memcg proper which is not really nice
>  either.
>  This mimicks tcp kmem limit implementation which I really have
>  to say I
>  am not a great fan of and this pattern shouldn't be coppied.
> 
> >>>
> >>> I think we should keep the discussion on technical merits instead of
> >>> personal perference. To me using skmem like interface is totally
> >>> fine
> >>> but the pros/cons need to be very explicit and the clear reasons to
> >>> select that option should be included.
> >>
> >> I do agree with that. I didn't want sound to be personal wrt tcp kmem
> >> accounting but the overall code maintenance cost is higher because
> >> of how tcp take on accounting differs from anything else in the memcg
> >> proper. I would prefer to not grow another example like that.
> >>
> >>> To me there are two options:
> >>>
> >>> 1. Using skmem like interface as this patch series:
> >>>
> >>> The main pros of this option is that it is very simple. Let me
> >>> list down
> >>> the cons of this approach:
> >>>
> >>> a. There is time window between the actual memory allocation/free
> >>> and
> >>> the charge and uncharge and [un]charge happen when the whole
> >>> memory is
> >>> allocated or freed. I think for the charge path that might not be
> >>> a big
> >>> issue but on the uncharge, this can cause issues. The application
> >>> and
> >>> the potential shrinkers have freed some of this dmabuf memory but
> >>> until
> >>> the whole dmabuf is freed, the memcg uncharge will not happen.
> >>> This can
> >>> consequences on reclaim and oom behavior of the application.
> >>>
> >>> b. Due to the usage model i.e. a central daemon allocating the
> >>> dmabuf
> >>> memory upfront, there is a requirement to have a memcg charge
> >>> transfer
> >>> functionality to transfer the charge from the central daemon to the
> >>> client applications. This does introduce complexity and avenues
> >>> of weird
> >>> reclaim and oom behavior.
> >>>
> >>>
> >>> 2. Allocate and charge the memory on page fault by actual user
> >>>
> >>> In this approach, the memory is not allocated upfront by the central
> >>> daemon but rather on the page fault by the client application and
> >>> the
> >>> memcg charge happen at the same time.
> >>>
> >>> The only cons I can think of is this approach is more involved
> >>> and may
> >>> need some clever tricks to track the page on the free patch i.e.
> >>> we to
> >>> decrement the dmabuf memcg stat on free path. Maybe a page flag.
> >>>
> >>> The pros of this approach is there is no need have a charge transfer
> >>> functionality and the charge/uncharge being closely tied to the
> >>> actual
> >>> memory allocation and free.
> >>>
> >>> Personally I would prefer the second approach but I don't want to
> >>> just
> >>> block this work if the dmabuf folks are ok with the cons
> >>> mentioned of
> >>> the first approach.
> >>
> >> I am not familiar with dmabuf internals to judge complexity on
> >> their end
> >> but I fully agree that charge-when-used is much more easier to reason
> >> about and it should have less subtle surprises.
> >
> > Disclaimer that I don't seem to see patches 3&4 on dri-devel so
> > maybe I
> > am missing something, but in principle yes, I agree that the 2nd
> > option
> > (charge the user, not 

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-02-02 Thread T.J. Mercier
On Wed, Feb 1, 2023 at 6:23 AM Tvrtko Ursulin
 wrote:
>
>
> On 01/02/2023 01:49, T.J. Mercier wrote:
> > On Tue, Jan 31, 2023 at 6:01 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 25/01/2023 20:04, T.J. Mercier wrote:
> >>> On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
> >>>  wrote:
> 
> 
>  Hi,
> 
>  On 25/01/2023 11:52, Michal Hocko wrote:
> > On Tue 24-01-23 19:46:28, Shakeel Butt wrote:
> >> On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:
> >>> On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
>  When a buffer is exported to userspace, use memcg to attribute the
>  buffer to the allocating cgroup until all buffer references are
>  released.
> >>>
> >>> Is there any reason why this memory cannot be charged during the
> >>> allocation (__GFP_ACCOUNT used)?
> >>> Also you do charge and account the memory but underlying pages do not
> >>> know about their memcg (this is normally done with commit_charge for
> >>> user mapped pages). This would become a problem if the memory is
> >>> migrated for example.
> >>
> >> I don't think this is movable memory.
> >>
> >>> This also means that you have to maintain memcg
> >>> reference outside of the memcg proper which is not really nice either.
> >>> This mimicks tcp kmem limit implementation which I really have to say 
> >>> I
> >>> am not a great fan of and this pattern shouldn't be coppied.
> >>>
> >>
> >> I think we should keep the discussion on technical merits instead of
> >> personal perference. To me using skmem like interface is totally fine
> >> but the pros/cons need to be very explicit and the clear reasons to
> >> select that option should be included.
> >
> > I do agree with that. I didn't want sound to be personal wrt tcp kmem
> > accounting but the overall code maintenance cost is higher because
> > of how tcp take on accounting differs from anything else in the memcg
> > proper. I would prefer to not grow another example like that.
> >
> >> To me there are two options:
> >>
> >> 1. Using skmem like interface as this patch series:
> >>
> >> The main pros of this option is that it is very simple. Let me list 
> >> down
> >> the cons of this approach:
> >>
> >> a. There is time window between the actual memory allocation/free and
> >> the charge and uncharge and [un]charge happen when the whole memory is
> >> allocated or freed. I think for the charge path that might not be a big
> >> issue but on the uncharge, this can cause issues. The application and
> >> the potential shrinkers have freed some of this dmabuf memory but until
> >> the whole dmabuf is freed, the memcg uncharge will not happen. This can
> >> consequences on reclaim and oom behavior of the application.
> >>
> >> b. Due to the usage model i.e. a central daemon allocating the dmabuf
> >> memory upfront, there is a requirement to have a memcg charge transfer
> >> functionality to transfer the charge from the central daemon to the
> >> client applications. This does introduce complexity and avenues of 
> >> weird
> >> reclaim and oom behavior.
> >>
> >>
> >> 2. Allocate and charge the memory on page fault by actual user
> >>
> >> In this approach, the memory is not allocated upfront by the central
> >> daemon but rather on the page fault by the client application and the
> >> memcg charge happen at the same time.
> >>
> >> The only cons I can think of is this approach is more involved and may
> >> need some clever tricks to track the page on the free patch i.e. we to
> >> decrement the dmabuf memcg stat on free path. Maybe a page flag.
> >>
> >> The pros of this approach is there is no need have a charge transfer
> >> functionality and the charge/uncharge being closely tied to the actual
> >> memory allocation and free.
> >>
> >> Personally I would prefer the second approach but I don't want to just
> >> block this work if the dmabuf folks are ok with the cons mentioned of
> >> the first approach.
> >
> > I am not familiar with dmabuf internals to judge complexity on their end
> > but I fully agree that charge-when-used is much more easier to reason
> > about and it should have less subtle surprises.
> 
>  Disclaimer that I don't seem to see patches 3&4 on dri-devel so maybe I
>  am missing something, but in principle yes, I agree that the 2nd option
>  (charge the user, not exporter) should be preferred. Thing being that at
>  export time there may not be any backing store allocated, plus if the
>  series is restricting the charge transfer to just Android clients then
>  it seems it has the potential to miss many other use cases. At least
>  needs to outline a description on how the feature will be useful 

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-02-01 Thread Tvrtko Ursulin



On 01/02/2023 14:23, Tvrtko Ursulin wrote:


On 01/02/2023 01:49, T.J. Mercier wrote:

On Tue, Jan 31, 2023 at 6:01 AM Tvrtko Ursulin
 wrote:



On 25/01/2023 20:04, T.J. Mercier wrote:

On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
 wrote:



Hi,

On 25/01/2023 11:52, Michal Hocko wrote:

On Tue 24-01-23 19:46:28, Shakeel Butt wrote:

On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:

On Mon 23-01-23 19:17:23, T.J. Mercier wrote:

When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.


Is there any reason why this memory cannot be charged during the
allocation (__GFP_ACCOUNT used)?
Also you do charge and account the memory but underlying pages 
do not
know about their memcg (this is normally done with commit_charge 
for

user mapped pages). This would become a problem if the memory is
migrated for example.


I don't think this is movable memory.


This also means that you have to maintain memcg
reference outside of the memcg proper which is not really nice 
either.
This mimicks tcp kmem limit implementation which I really have 
to say I

am not a great fan of and this pattern shouldn't be coppied.



I think we should keep the discussion on technical merits instead of
personal perference. To me using skmem like interface is totally 
fine

but the pros/cons need to be very explicit and the clear reasons to
select that option should be included.


I do agree with that. I didn't want sound to be personal wrt tcp kmem
accounting but the overall code maintenance cost is higher because
of how tcp take on accounting differs from anything else in the memcg
proper. I would prefer to not grow another example like that.


To me there are two options:

1. Using skmem like interface as this patch series:

The main pros of this option is that it is very simple. Let me 
list down

the cons of this approach:

a. There is time window between the actual memory allocation/free 
and
the charge and uncharge and [un]charge happen when the whole 
memory is
allocated or freed. I think for the charge path that might not be 
a big
issue but on the uncharge, this can cause issues. The application 
and
the potential shrinkers have freed some of this dmabuf memory but 
until
the whole dmabuf is freed, the memcg uncharge will not happen. 
This can

consequences on reclaim and oom behavior of the application.

b. Due to the usage model i.e. a central daemon allocating the 
dmabuf
memory upfront, there is a requirement to have a memcg charge 
transfer

functionality to transfer the charge from the central daemon to the
client applications. This does introduce complexity and avenues 
of weird

reclaim and oom behavior.


2. Allocate and charge the memory on page fault by actual user

In this approach, the memory is not allocated upfront by the central
daemon but rather on the page fault by the client application and 
the

memcg charge happen at the same time.

The only cons I can think of is this approach is more involved 
and may
need some clever tricks to track the page on the free patch i.e. 
we to

decrement the dmabuf memcg stat on free path. Maybe a page flag.

The pros of this approach is there is no need have a charge transfer
functionality and the charge/uncharge being closely tied to the 
actual

memory allocation and free.

Personally I would prefer the second approach but I don't want to 
just
block this work if the dmabuf folks are ok with the cons 
mentioned of

the first approach.


I am not familiar with dmabuf internals to judge complexity on 
their end

but I fully agree that charge-when-used is much more easier to reason
about and it should have less subtle surprises.


Disclaimer that I don't seem to see patches 3&4 on dri-devel so 
maybe I
am missing something, but in principle yes, I agree that the 2nd 
option
(charge the user, not exporter) should be preferred. Thing being 
that at

export time there may not be any backing store allocated, plus if the
series is restricting the charge transfer to just Android clients then
it seems it has the potential to miss many other use cases. At least
needs to outline a description on how the feature will be useful 
outside

Android.


There is no restriction like that. It's available to anybody who wants
to call dma_buf_charge_transfer if they actually have a need for that,
which I don't really expect to be common since most users/owners of
the buffers will be the ones causing the export in the first place.
It's just not like that on Android with the extra allocator process in
the middle most of the time.


Yeah I used the wrong term "restrict", apologies. What I meant was, if
the idea was to allow spotting memory leaks, with the charge transfer
being optional and in the series only wired up for Android Binder, then
it obviously only fully works for that one case. So a step back..


Oh, spotting kernel memory leaks is a side-benefit of accounting
kernel-only buffers in the 

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-02-01 Thread Tvrtko Ursulin



On 01/02/2023 01:49, T.J. Mercier wrote:

On Tue, Jan 31, 2023 at 6:01 AM Tvrtko Ursulin
 wrote:



On 25/01/2023 20:04, T.J. Mercier wrote:

On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
 wrote:



Hi,

On 25/01/2023 11:52, Michal Hocko wrote:

On Tue 24-01-23 19:46:28, Shakeel Butt wrote:

On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:

On Mon 23-01-23 19:17:23, T.J. Mercier wrote:

When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.


Is there any reason why this memory cannot be charged during the
allocation (__GFP_ACCOUNT used)?
Also you do charge and account the memory but underlying pages do not
know about their memcg (this is normally done with commit_charge for
user mapped pages). This would become a problem if the memory is
migrated for example.


I don't think this is movable memory.


This also means that you have to maintain memcg
reference outside of the memcg proper which is not really nice either.
This mimicks tcp kmem limit implementation which I really have to say I
am not a great fan of and this pattern shouldn't be coppied.



I think we should keep the discussion on technical merits instead of
personal perference. To me using skmem like interface is totally fine
but the pros/cons need to be very explicit and the clear reasons to
select that option should be included.


I do agree with that. I didn't want sound to be personal wrt tcp kmem
accounting but the overall code maintenance cost is higher because
of how tcp take on accounting differs from anything else in the memcg
proper. I would prefer to not grow another example like that.


To me there are two options:

1. Using skmem like interface as this patch series:

The main pros of this option is that it is very simple. Let me list down
the cons of this approach:

a. There is time window between the actual memory allocation/free and
the charge and uncharge and [un]charge happen when the whole memory is
allocated or freed. I think for the charge path that might not be a big
issue but on the uncharge, this can cause issues. The application and
the potential shrinkers have freed some of this dmabuf memory but until
the whole dmabuf is freed, the memcg uncharge will not happen. This can
consequences on reclaim and oom behavior of the application.

b. Due to the usage model i.e. a central daemon allocating the dmabuf
memory upfront, there is a requirement to have a memcg charge transfer
functionality to transfer the charge from the central daemon to the
client applications. This does introduce complexity and avenues of weird
reclaim and oom behavior.


2. Allocate and charge the memory on page fault by actual user

In this approach, the memory is not allocated upfront by the central
daemon but rather on the page fault by the client application and the
memcg charge happen at the same time.

The only cons I can think of is this approach is more involved and may
need some clever tricks to track the page on the free patch i.e. we to
decrement the dmabuf memcg stat on free path. Maybe a page flag.

The pros of this approach is there is no need have a charge transfer
functionality and the charge/uncharge being closely tied to the actual
memory allocation and free.

Personally I would prefer the second approach but I don't want to just
block this work if the dmabuf folks are ok with the cons mentioned of
the first approach.


I am not familiar with dmabuf internals to judge complexity on their end
but I fully agree that charge-when-used is much more easier to reason
about and it should have less subtle surprises.


Disclaimer that I don't seem to see patches 3&4 on dri-devel so maybe I
am missing something, but in principle yes, I agree that the 2nd option
(charge the user, not exporter) should be preferred. Thing being that at
export time there may not be any backing store allocated, plus if the
series is restricting the charge transfer to just Android clients then
it seems it has the potential to miss many other use cases. At least
needs to outline a description on how the feature will be useful outside
Android.


There is no restriction like that. It's available to anybody who wants
to call dma_buf_charge_transfer if they actually have a need for that,
which I don't really expect to be common since most users/owners of
the buffers will be the ones causing the export in the first place.
It's just not like that on Android with the extra allocator process in
the middle most of the time.


Yeah I used the wrong term "restrict", apologies. What I meant was, if
the idea was to allow spotting memory leaks, with the charge transfer
being optional and in the series only wired up for Android Binder, then
it obviously only fully works for that one case. So a step back..


Oh, spotting kernel memory leaks is a side-benefit of accounting
kernel-only buffers in the root cgroup. The primary goal is to
attribute buffers to applications that originated 

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-31 Thread T.J. Mercier
On Tue, Jan 31, 2023 at 6:01 AM Tvrtko Ursulin
 wrote:
>
>
> On 25/01/2023 20:04, T.J. Mercier wrote:
> > On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> Hi,
> >>
> >> On 25/01/2023 11:52, Michal Hocko wrote:
> >>> On Tue 24-01-23 19:46:28, Shakeel Butt wrote:
>  On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:
> > On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> >> When a buffer is exported to userspace, use memcg to attribute the
> >> buffer to the allocating cgroup until all buffer references are
> >> released.
> >
> > Is there any reason why this memory cannot be charged during the
> > allocation (__GFP_ACCOUNT used)?
> > Also you do charge and account the memory but underlying pages do not
> > know about their memcg (this is normally done with commit_charge for
> > user mapped pages). This would become a problem if the memory is
> > migrated for example.
> 
>  I don't think this is movable memory.
> 
> > This also means that you have to maintain memcg
> > reference outside of the memcg proper which is not really nice either.
> > This mimicks tcp kmem limit implementation which I really have to say I
> > am not a great fan of and this pattern shouldn't be coppied.
> >
> 
>  I think we should keep the discussion on technical merits instead of
>  personal perference. To me using skmem like interface is totally fine
>  but the pros/cons need to be very explicit and the clear reasons to
>  select that option should be included.
> >>>
> >>> I do agree with that. I didn't want sound to be personal wrt tcp kmem
> >>> accounting but the overall code maintenance cost is higher because
> >>> of how tcp take on accounting differs from anything else in the memcg
> >>> proper. I would prefer to not grow another example like that.
> >>>
>  To me there are two options:
> 
>  1. Using skmem like interface as this patch series:
> 
>  The main pros of this option is that it is very simple. Let me list down
>  the cons of this approach:
> 
>  a. There is a time window between the actual memory allocation/free
>  and the charge/uncharge: the [un]charge happens when the whole memory
>  is allocated or freed. I think for the charge path that might not be a
>  big issue, but on the uncharge this can cause issues. The application
>  and the potential shrinkers have freed some of this dmabuf memory, but
>  until the whole dmabuf is freed the memcg uncharge will not happen.
>  This can have consequences on the reclaim and oom behavior of the
>  application.
> 
>  b. Due to the usage model i.e. a central daemon allocating the dmabuf
>  memory upfront, there is a requirement to have a memcg charge transfer
>  functionality to transfer the charge from the central daemon to the
>  client applications. This does introduce complexity and avenues of weird
>  reclaim and oom behavior.
> 
> 
>  2. Allocate and charge the memory on page fault by actual user
> 
>  In this approach, the memory is not allocated upfront by the central
>  daemon but rather on a page fault by the client application, and the
>  memcg charge happens at the same time.
> 
>  The only con I can think of is that this approach is more involved and
>  may need some clever tricks to track the page on the free path, i.e.
>  we need to decrement the dmabuf memcg stat on the free path. Maybe a
>  page flag.
> 
>  The pros of this approach are that there is no need to have a charge
>  transfer functionality and the charge/uncharge is closely tied to the
>  actual memory allocation and free.
> 
>  Personally I would prefer the second approach but I don't want to just
>  block this work if the dmabuf folks are ok with the cons mentioned of
>  the first approach.
> >>>
> >>> I am not familiar with dmabuf internals to judge complexity on their end
> >>> but I fully agree that charge-when-used is much easier to reason
> >>> about and should have fewer subtle surprises.
> >>
> >> Disclaimer that I don't seem to see patches 3&4 on dri-devel so maybe I
> >> am missing something, but in principle yes, I agree that the 2nd option
> >> (charge the user, not exporter) should be preferred. Thing being that at
> >> export time there may not be any backing store allocated, plus if the
> >> series is restricting the charge transfer to just Android clients then
> >> it seems it has the potential to miss many other use cases. At least
> >> needs to outline a description on how the feature will be useful outside
> >> Android.
> >>
> > There is no restriction like that. It's available to anybody who wants
> > to call dma_buf_charge_transfer if they actually have a need for that,
> > which I don't really expect to be common since most users/owners of
> > the buffers will be the ones causing the export in the first place.
> > It's 

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-31 Thread Tvrtko Ursulin



On 25/01/2023 20:04, T.J. Mercier wrote:

On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
 wrote:



Hi,

On 25/01/2023 11:52, Michal Hocko wrote:

On Tue 24-01-23 19:46:28, Shakeel Butt wrote:

On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:

On Mon 23-01-23 19:17:23, T.J. Mercier wrote:

When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.


Is there any reason why this memory cannot be charged during the
allocation (__GFP_ACCOUNT used)?
Also you do charge and account the memory but underlying pages do not
know about their memcg (this is normally done with commit_charge for
user mapped pages). This would become a problem if the memory is
migrated for example.


I don't think this is movable memory.


This also means that you have to maintain memcg
reference outside of the memcg proper which is not really nice either.
This mimics the tcp kmem limit implementation, which I really have to say
I am not a great fan of, and this pattern shouldn't be copied.



I think we should keep the discussion on technical merits instead of
personal preference. To me using an skmem-like interface is totally fine
but the pros/cons need to be very explicit and the clear reasons to
select that option should be included.


I do agree with that. I didn't want to sound personal wrt tcp kmem
accounting, but the overall code maintenance cost is higher because
of how the tcp take on accounting differs from anything else in the
memcg proper. I would prefer not to grow another example like that.


To me there are two options:

1. Using skmem like interface as this patch series:

The main pros of this option is that it is very simple. Let me list down
the cons of this approach:

a. There is a time window between the actual memory allocation/free and
the charge/uncharge: the [un]charge happens when the whole memory is
allocated or freed. I think for the charge path that might not be a big
issue, but on the uncharge this can cause issues. The application and
the potential shrinkers have freed some of this dmabuf memory, but until
the whole dmabuf is freed the memcg uncharge will not happen. This can
have consequences on the reclaim and oom behavior of the application.

b. Due to the usage model i.e. a central daemon allocating the dmabuf
memory upfront, there is a requirement to have a memcg charge transfer
functionality to transfer the charge from the central daemon to the
client applications. This does introduce complexity and avenues of weird
reclaim and oom behavior.


2. Allocate and charge the memory on page fault by actual user

In this approach, the memory is not allocated upfront by the central
daemon but rather on a page fault by the client application, and the
memcg charge happens at the same time.

The only con I can think of is that this approach is more involved and
may need some clever tricks to track the page on the free path, i.e. we
need to decrement the dmabuf memcg stat on the free path. Maybe a page
flag.

The pros of this approach are that there is no need to have a charge
transfer functionality and the charge/uncharge is closely tied to the
actual memory allocation and free.

Personally I would prefer the second approach but I don't want to just
block this work if the dmabuf folks are ok with the cons mentioned of
the first approach.


I am not familiar with dmabuf internals to judge complexity on their end
but I fully agree that charge-when-used is much easier to reason
about and should have fewer subtle surprises.


Disclaimer that I don't seem to see patches 3&4 on dri-devel so maybe I
am missing something, but in principle yes, I agree that the 2nd option
(charge the user, not exporter) should be preferred. Thing being that at
export time there may not be any backing store allocated, plus if the
series is restricting the charge transfer to just Android clients then
it seems it has the potential to miss many other use cases. At least
needs to outline a description on how the feature will be useful outside
Android.


There is no restriction like that. It's available to anybody who wants
to call dma_buf_charge_transfer if they actually have a need for that,
which I don't really expect to be common since most users/owners of
the buffers will be the ones causing the export in the first place.
It's just not like that on Android with the extra allocator process in
the middle most of the time.


Yeah I used the wrong term "restrict", apologies. What I meant was, if 
the idea was to allow spotting memory leaks, with the charge transfer 
being optional and in the series only wired up for Android Binder, then 
it obviously only fully works for that one case. So a step back..


.. For instance, is it not feasible to transfer the charge when the
dmabuf is attached, or imported? That would attribute the usage to the
user/importer and so give better visibility on who is actually causing
the memory leak.


Furthermore, if the above is feasible, then could it 

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-25 Thread T.J. Mercier
On Wed, Jan 25, 2023 at 4:05 AM Michal Hocko  wrote:
>
> On Tue 24-01-23 10:55:21, T.J. Mercier wrote:
> > On Tue, Jan 24, 2023 at 7:00 AM Michal Hocko  wrote:
> > >
> > > On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> > > > When a buffer is exported to userspace, use memcg to attribute the
> > > > buffer to the allocating cgroup until all buffer references are
> > > > released.
> > >
> > > Is there any reason why this memory cannot be charged during the
> > > allocation (__GFP_ACCOUNT used)?
> >
> > My main motivation was to keep code changes away from exporters and
> > implement the accounting in one common spot for all of them. This is a
> > bit of a carryover from a previous approach [1] where there was some
> > objection to pushing off this work onto exporters and forcing them to
> > adapt, but __GFP_ACCOUNT does seem like a smaller burden than before
> > at least initially. However in order to support charge transfer
> > between cgroups with __GFP_ACCOUNT we'd need to be able to get at the
> > pages backing dmabuf objects, and the exporters are the ones with that
> > access. Meaning I think we'd have to add some additional dma_buf_ops
> > to achieve that, which was the objection from [1].
> >
> > [1] 
> > https://lore.kernel.org/lkml/5cc27a05-8131-ce9b-dea1-5c75e9942...@amd.com/
> >
> > >
> > > Also you do charge and account the memory but underlying pages do not
> > > know about their memcg (this is normally done with commit_charge for
> > > user mapped pages). This would become a problem if the memory is
> > > migrated for example.
> >
> > Hmm, what problem do you see in this situation? If the backing pages
> > are to be migrated that requires the cooperation of the exporter,
> > which currently has no influence on how the cgroup charging is done
> > and that seems fine. (Unless you mean migrating the charge across
> > cgroups? In which case that's the next patch.)
>
> My main concern was that page migration could lose the external tracking
> without some additional steps on the dmabuf front.
>
I see, yes that would be true if an exporter moves data around between
system memory and VRAM for example. (I think TTM does this sort of
thing, but not sure if that's actually within a single dma buffer.)
VRAM feels like it maybe doesn't belong in memcg, yet it would still
be charged there under this series right now. I don't really see a way
around this except to involve the exporters directly in the accounting
(or don't attempt to distinguish between types of memory).

> > > This also means that you have to maintain memcg
> > > reference outside of the memcg proper which is not really nice either.
> > > This mimics the tcp kmem limit implementation, which I really have
> > > to say I am not a great fan of, and this pattern shouldn't be copied.
> > >
> > Ah, what can I say. This way looked simple to me. I think otherwise
> > we're back to making all exporters do more stuff for the accounting.
> >
> > > Also you are not really saying anything about the oom behavior. With
> > > this implementation the kernel will try to reclaim the memory and even
> > > trigger the memcg oom killer if the request size is <= 8 pages. Is this
> > > a desirable behavior?
> >
> > It will try to reclaim some memory, but not the dmabuf pages right?
> > Not *yet* anyway. This behavior sounds expected to me.
>
> Yes, we have discussed that shrinkers will follow up later which is
> fine. The question is how much reclaim actually makes sense at this
> stage. Charging interface usually copes with sizes resulting from
> allocation requests (so usually 1<<order based). A batch charge like
> the one implemented here could easily be 100s of MBs, and it is
> much harder to define reclaim targets for. At least that is something
> the memcg charging hasn't really considered yet.  Maybe the existing
> try_charge implementation can cope with that just fine but it would be
> really great to have the expected behavior described.
>
> E.g. should be memcg OOM killer be invoked? Should reclaim really target
> regular memory at all costs or just a lightweight memory reclaim is
> preferred (is the dmabuf charge failure an expensive operation wrt.
> memory refault due to reclaim).

Ah, in my experience very large individual buffers like that are rare.
Cumulative system-wide usage might reach 100s of megs or more spread
across many buffers. On my phone the majority of buffer sizes are 4
pages or less, but there are a few that reach into the tens of megs.
But now I see your point. I still think that where a memcg limit is
exceeded and we can't reclaim enough as a result of a new dmabuf
allocation, we should see a memcg OOM kill. Sounds like you are
looking for that to be written down, so I'll try to find a place for
that.

Part of the motivation for this accounting is to eventually have a
well defined limit for applications to know how much more they can
allocate. So where buffer size or number of buffers is a flexible
variable, I'd like to see an application checking this limit 
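
The limit-check-before-allocation idea above can be sketched as a toy
user-space model (Python, purely illustrative; `Memcg`, `try_charge`,
`headroom` and all the numbers are made up for the sketch, not the
kernel API):

```python
# Toy user-space model of a memcg-style limit, reclaim-then-charge, and
# an application checking its remaining budget before a large dmabuf
# allocation. All names and numbers here are hypothetical.

class OomError(Exception):
    """Raised when the charge cannot fit even after reclaim (memcg OOM)."""

class Memcg:
    def __init__(self, limit_pages):
        self.limit = limit_pages
        self.usage = 0
        self.reclaimable = 0  # pages a shrinker could hand back

    def try_charge(self, nr_pages):
        # Reclaim just enough to make room, if possible.
        if self.usage + nr_pages > self.limit:
            want = self.usage + nr_pages - self.limit
            freed = min(self.reclaimable, want)
            self.usage -= freed
            self.reclaimable -= freed
        if self.usage + nr_pages > self.limit:
            raise OomError(f"cannot charge {nr_pages} pages")
        self.usage += nr_pages

    def headroom(self):
        return self.limit - self.usage

memcg = Memcg(limit_pages=1024)
memcg.try_charge(1000)          # existing workload
memcg.reclaimable = 100         # some of it is shrinkable

# A well-behaved allocator can consult the limit before exporting:
buf_pages = 64
assert buf_pages <= memcg.headroom() + memcg.reclaimable
memcg.try_charge(buf_pages)     # reclaims 40 pages, then charges 64
assert memcg.usage == 1024 and memcg.reclaimable == 60
```

In this sketch a batch charge first triggers just enough reclaim to fit
under the limit and only OOMs when that fails, which is roughly the
behavior being asked about for written-down documentation.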

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-25 Thread T.J. Mercier
On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
 wrote:
>
>
> Hi,
>
> On 25/01/2023 11:52, Michal Hocko wrote:
> > On Tue 24-01-23 19:46:28, Shakeel Butt wrote:
> >> On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:
> >>> On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
>  When a buffer is exported to userspace, use memcg to attribute the
>  buffer to the allocating cgroup until all buffer references are
>  released.
> >>>
> >>> Is there any reason why this memory cannot be charged during the
> >>> allocation (__GFP_ACCOUNT used)?
> >>> Also you do charge and account the memory but underlying pages do not
> >>> know about their memcg (this is normally done with commit_charge for
> >>> user mapped pages). This would become a problem if the memory is
> >>> migrated for example.
> >>
> >> I don't think this is movable memory.
> >>
> >>> This also means that you have to maintain memcg
> >>> reference outside of the memcg proper which is not really nice either.
> >>> This mimics the tcp kmem limit implementation, which I really have
> >>> to say I am not a great fan of, and this pattern shouldn't be copied.
> >>>
> >>
> >> I think we should keep the discussion on technical merits instead of
> >> personal preference. To me using an skmem-like interface is totally fine
> >> but the pros/cons need to be very explicit and the clear reasons to
> >> select that option should be included.
> >
> > I do agree with that. I didn't want to sound personal wrt tcp kmem
> > accounting, but the overall code maintenance cost is higher because
> > of how the tcp take on accounting differs from anything else in the
> > memcg proper. I would prefer not to grow another example like that.
> >
> >> To me there are two options:
> >>
> >> 1. Using skmem like interface as this patch series:
> >>
> >> The main pros of this option is that it is very simple. Let me list down
> >> the cons of this approach:
> >>
> >> a. There is a time window between the actual memory allocation/free
> >> and the charge/uncharge: the [un]charge happens when the whole memory
> >> is allocated or freed. I think for the charge path that might not be
> >> a big issue, but on the uncharge this can cause issues. The
> >> application and the potential shrinkers have freed some of this
> >> dmabuf memory, but until the whole dmabuf is freed the memcg uncharge
> >> will not happen. This can have consequences on the reclaim and oom
> >> behavior of the application.
> >>
> >> b. Due to the usage model i.e. a central daemon allocating the dmabuf
> >> memory upfront, there is a requirement to have a memcg charge transfer
> >> functionality to transfer the charge from the central daemon to the
> >> client applications. This does introduce complexity and avenues of weird
> >> reclaim and oom behavior.
> >>
> >>
> >> 2. Allocate and charge the memory on page fault by actual user
> >>
> >> In this approach, the memory is not allocated upfront by the central
> >> daemon but rather on a page fault by the client application, and the
> >> memcg charge happens at the same time.
> >>
> >> The only con I can think of is that this approach is more involved
> >> and may need some clever tricks to track the page on the free path,
> >> i.e. we need to decrement the dmabuf memcg stat on the free path.
> >> Maybe a page flag.
> >>
> >> The pros of this approach are that there is no need to have a charge
> >> transfer functionality and the charge/uncharge is closely tied to the
> >> actual memory allocation and free.
> >>
> >> Personally I would prefer the second approach but I don't want to just
> >> block this work if the dmabuf folks are ok with the cons mentioned of
> >> the first approach.
> >
> > I am not familiar with dmabuf internals to judge complexity on their end
> > but I fully agree that charge-when-used is much easier to reason
> > about and should have fewer subtle surprises.
>
> Disclaimer that I don't seem to see patches 3&4 on dri-devel so maybe I
> am missing something, but in principle yes, I agree that the 2nd option
> (charge the user, not exporter) should be preferred. Thing being that at
> export time there may not be any backing store allocated, plus if the
> series is restricting the charge transfer to just Android clients then
> it seems it has the potential to miss many other use cases. At least
> needs to outline a description on how the feature will be useful outside
> Android.
>
There is no restriction like that. It's available to anybody who wants
to call dma_buf_charge_transfer if they actually have a need for that,
which I don't really expect to be common since most users/owners of
the buffers will be the ones causing the export in the first place.
It's just not like that on Android with the extra allocator process in
the middle most of the time.
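
For what it's worth, the transfer semantics amount to something like
this toy user-space model (Python, illustrative only; `transfer_charge`
is a stand-in for what dma_buf_charge_transfer does conceptually, not
its real signature, which operates on struct mem_cgroup in the kernel):

```python
# Toy model of moving an existing charge from the allocating daemon's
# cgroup to the client that ends up owning the buffer. Names and the
# page count are hypothetical.

class Memcg:
    def __init__(self, name):
        self.name = name
        self.usage = 0   # pages currently charged to this cgroup

def charge(memcg, pages):
    memcg.usage += pages

def transfer_charge(src, dst, pages):
    # Uncharge the source and charge the destination as one logical step,
    # so the buffer is never accounted twice (or not at all).
    if src.usage < pages:
        raise ValueError("transferring more than is charged")
    src.usage -= pages
    dst.usage += pages

daemon = Memcg("allocator-daemon")
client = Memcg("client-app")
charge(daemon, 256)                    # buffer exported by the daemon
transfer_charge(daemon, client, 256)   # ownership handed to the client
assert (daemon.usage, client.usage) == (0, 256)
```

The interesting design question is exactly the one raised here: whether
that hand-off happens explicitly (binder), or implicitly at attach or
import time.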

> Also stepping back for a moment - is a new memory category really
> needed, versus perhaps attempting to charge the actual backing store
> memory to the correct client? (There might have been many past
> discussions on this so 

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-25 Thread Tvrtko Ursulin



Hi,

On 25/01/2023 11:52, Michal Hocko wrote:

On Tue 24-01-23 19:46:28, Shakeel Butt wrote:

On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:

On Mon 23-01-23 19:17:23, T.J. Mercier wrote:

When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.


Is there any reason why this memory cannot be charged during the
allocation (__GFP_ACCOUNT used)?
Also you do charge and account the memory but underlying pages do not
know about their memcg (this is normally done with commit_charge for
user mapped pages). This would become a problem if the memory is
migrated for example.


I don't think this is movable memory.


This also means that you have to maintain memcg
reference outside of the memcg proper which is not really nice either.
This mimics the tcp kmem limit implementation, which I really have to say
I am not a great fan of, and this pattern shouldn't be copied.



I think we should keep the discussion on technical merits instead of
personal preference. To me using an skmem-like interface is totally fine
but the pros/cons need to be very explicit and the clear reasons to
select that option should be included.


I do agree with that. I didn't want to sound personal wrt tcp kmem
accounting, but the overall code maintenance cost is higher because
of how the tcp take on accounting differs from anything else in the
memcg proper. I would prefer not to grow another example like that.


To me there are two options:

1. Using skmem like interface as this patch series:

The main pros of this option is that it is very simple. Let me list down
the cons of this approach:

a. There is a time window between the actual memory allocation/free and
the charge/uncharge: the [un]charge happens when the whole memory is
allocated or freed. I think for the charge path that might not be a big
issue, but on the uncharge this can cause issues. The application and
the potential shrinkers have freed some of this dmabuf memory, but until
the whole dmabuf is freed the memcg uncharge will not happen. This can
have consequences on the reclaim and oom behavior of the application.

b. Due to the usage model i.e. a central daemon allocating the dmabuf
memory upfront, there is a requirement to have a memcg charge transfer
functionality to transfer the charge from the central daemon to the
client applications. This does introduce complexity and avenues of weird
reclaim and oom behavior.


2. Allocate and charge the memory on page fault by actual user

In this approach, the memory is not allocated upfront by the central
daemon but rather on a page fault by the client application, and the
memcg charge happens at the same time.

The only con I can think of is that this approach is more involved and
may need some clever tricks to track the page on the free path, i.e. we
need to decrement the dmabuf memcg stat on the free path. Maybe a page
flag.

The pros of this approach are that there is no need to have a charge
transfer functionality and the charge/uncharge is closely tied to the
actual memory allocation and free.

Personally I would prefer the second approach but I don't want to just
block this work if the dmabuf folks are ok with the cons mentioned of
the first approach.


I am not familiar with dmabuf internals to judge complexity on their end
but I fully agree that charge-when-used is much easier to reason
about and should have fewer subtle surprises.


Disclaimer that I don't seem to see patches 3&4 on dri-devel so maybe I 
am missing something, but in principle yes, I agree that the 2nd option 
(charge the user, not exporter) should be preferred. Thing being that at 
export time there may not be any backing store allocated, plus if the 
series is restricting the charge transfer to just Android clients then 
it seems it has the potential to miss many other use cases. At least 
needs to outline a description on how the feature will be useful outside 
Android.


Also stepping back for a moment - is a new memory category really 
needed, versus perhaps attempting to charge the actual backing store 
memory to the correct client? (There might have been many past 
discussions on this so it's okay to point me towards something in the 
archives.)


Regards,

Tvrtko


Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-25 Thread Michal Hocko
On Tue 24-01-23 10:55:21, T.J. Mercier wrote:
> On Tue, Jan 24, 2023 at 7:00 AM Michal Hocko  wrote:
> >
> > On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> > > When a buffer is exported to userspace, use memcg to attribute the
> > > buffer to the allocating cgroup until all buffer references are
> > > released.
> >
> > Is there any reason why this memory cannot be charged during the
> > allocation (__GFP_ACCOUNT used)?
> 
> My main motivation was to keep code changes away from exporters and
> implement the accounting in one common spot for all of them. This is a
> bit of a carryover from a previous approach [1] where there was some
> objection to pushing off this work onto exporters and forcing them to
> adapt, but __GFP_ACCOUNT does seem like a smaller burden than before
> at least initially. However in order to support charge transfer
> between cgroups with __GFP_ACCOUNT we'd need to be able to get at the
> pages backing dmabuf objects, and the exporters are the ones with that
> access. Meaning I think we'd have to add some additional dma_buf_ops
> to achieve that, which was the objection from [1].
> 
> [1] https://lore.kernel.org/lkml/5cc27a05-8131-ce9b-dea1-5c75e9942...@amd.com/
> 
> >
> > Also you do charge and account the memory but underlying pages do not
> > know about their memcg (this is normally done with commit_charge for
> > user mapped pages). This would become a problem if the memory is
> > migrated for example.
> 
> Hmm, what problem do you see in this situation? If the backing pages
> are to be migrated that requires the cooperation of the exporter,
> which currently has no influence on how the cgroup charging is done
> and that seems fine. (Unless you mean migrating the charge across
> cgroups? In which case that's the next patch.)

My main concern was that page migration could lose the external tracking
without some additional steps on the dmabuf front.

> > This also means that you have to maintain memcg
> > reference outside of the memcg proper which is not really nice either.
> > This mimicks tcp kmem limit implementation which I really have to say I
> > am not a great fan of and this pattern shouldn't be coppied.
> >
> Ah, what can I say. This way looked simple to me. I think otherwise
> we're back to making all exporters do more stuff for the accounting.
> 
> > Also you are not really saying anything about the oom behavior. With
> > this implementation the kernel will try to reclaim the memory and even
> > trigger the memcg oom killer if the request size is <= 8 pages. Is this
> > a desirable behavior?
> 
> It will try to reclaim some memory, but not the dmabuf pages right?
> Not *yet* anyway. This behavior sounds expected to me.

Yes, we have discussed that shrinkers will follow up later which is
fine. The question is how much reclaim actually makes sense at this
stage. Charging interface usually copes with sizes resulting from
allocation requests (so usually 1<

Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-25 Thread Michal Hocko
On Tue 24-01-23 19:46:28, Shakeel Butt wrote:
> On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:
> > On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> > > When a buffer is exported to userspace, use memcg to attribute the
> > > buffer to the allocating cgroup until all buffer references are
> > > released.
> > 
> > Is there any reason why this memory cannot be charged during the
> > allocation (__GFP_ACCOUNT used)?
> > Also you do charge and account the memory but underlying pages do not
> > know about their memcg (this is normally done with commit_charge for
> > user mapped pages). This would become a problem if the memory is
> > migrated for example.
> 
> I don't think this is movable memory.
> 
> > This also means that you have to maintain memcg
> > reference outside of the memcg proper which is not really nice either.
> > This mimics the tcp kmem limit implementation, which I really have to
> > say I am not a great fan of, and this pattern shouldn't be copied.
> > 
> 
> I think we should keep the discussion on technical merits instead of
> personal preference. To me using an skmem-like interface is totally fine
> but the pros/cons need to be very explicit and the clear reasons to
> select that option should be included.

I do agree with that. I didn't want to sound personal wrt tcp kmem
accounting, but the overall code maintenance cost is higher because
of how the tcp take on accounting differs from anything else in the
memcg proper. I would prefer not to grow another example like that.

> To me there are two options:
> 
> 1. Using skmem like interface as this patch series:
> 
> The main pros of this option is that it is very simple. Let me list down
> the cons of this approach:
> 
> a. There is a time window between the actual memory allocation/free
> and the charge/uncharge: the [un]charge happens when the whole memory
> is allocated or freed. I think for the charge path that might not be a
> big issue, but on the uncharge this can cause issues. The application
> and the potential shrinkers have freed some of this dmabuf memory, but
> until the whole dmabuf is freed the memcg uncharge will not happen.
> This can have consequences on the reclaim and oom behavior of the
> application.
> 
> b. Due to the usage model i.e. a central daemon allocating the dmabuf
> memory upfront, there is a requirement to have a memcg charge transfer
> functionality to transfer the charge from the central daemon to the
> client applications. This does introduce complexity and avenues of weird
> reclaim and oom behavior.
> 
> 
> 2. Allocate and charge the memory on page fault by actual user
> 
> In this approach, the memory is not allocated upfront by the central
> daemon but rather on a page fault by the client application, and the
> memcg charge happens at the same time.
> 
> The only con I can think of is that this approach is more involved and
> may need some clever tricks to track the page on the free path, i.e.
> we need to decrement the dmabuf memcg stat on the free path. Maybe a
> page flag.
> 
> The pros of this approach are that there is no need to have a charge
> transfer functionality and the charge/uncharge is closely tied to the
> actual memory allocation and free.
> 
> Personally I would prefer the second approach but I don't want to just
> block this work if the dmabuf folks are ok with the cons mentioned of
> the first approach.

I am not familiar with dmabuf internals to judge complexity on their end
but I fully agree that charge-when-used is much easier to reason
about and should have fewer subtle surprises.
-- 
Michal Hocko
SUSE Labs
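
The "page flag" trick mentioned in option 2 above could look roughly
like this toy user-space model (Python, illustrative only; `PG_DMABUF`
is a hypothetical flag invented for the sketch, not an existing kernel
page flag):

```python
# Toy model of tagging pages charged as dmabuf so the generic free path
# knows to decrement the dmabuf memcg stat. PG_DMABUF is hypothetical.

PG_DMABUF = 1 << 0

class Page:
    def __init__(self):
        self.flags = 0

dmabuf_stat = {"pages": 0}

def charge_dmabuf_page(page):
    page.flags |= PG_DMABUF
    dmabuf_stat["pages"] += 1

def free_page(page):
    # The free path only needs the flag to know what to uncharge.
    if page.flags & PG_DMABUF:
        dmabuf_stat["pages"] -= 1
        page.flags &= ~PG_DMABUF

pages = [Page() for _ in range(4)]
for p in pages[:3]:
    charge_dmabuf_page(p)       # three pages faulted in as dmabuf
assert dmabuf_stat["pages"] == 3

free_page(pages[0])             # freed early, stat drops immediately
assert dmabuf_stat["pages"] == 2

for p in pages:
    free_page(p)                # second free_page is a no-op here
assert dmabuf_stat["pages"] == 0
```

The point of the flag is that the free path needs no reference back to
the dmabuf object: the per-page tag alone tells it which stat to
decrement.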


Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-24 Thread Shakeel Butt
On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:
> On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> > When a buffer is exported to userspace, use memcg to attribute the
> > buffer to the allocating cgroup until all buffer references are
> > released.
> 
> Is there any reason why this memory cannot be charged during the
> allocation (__GFP_ACCOUNT used)?
> Also you do charge and account the memory but underlying pages do not
> know about their memcg (this is normally done with commit_charge for
> user mapped pages). This would become a problem if the memory is
> migrated for example.

I don't think this is movable memory.

> This also means that you have to maintain memcg
> reference outside of the memcg proper which is not really nice either.
> This mimics the tcp kmem limit implementation, which I really have to
> say I am not a great fan of, and this pattern shouldn't be copied.
> 

I think we should keep the discussion on technical merits instead of
personal preference. To me using an skmem-like interface is totally fine
but the pros/cons need to be very explicit and the clear reasons to
select that option should be included.

To me there are two options:

1. Using skmem like interface as this patch series:

The main pros of this option is that it is very simple. Let me list down
the cons of this approach:

a. There is a time window between the actual memory allocation/free and
the charge/uncharge: the [un]charge happens when the whole memory is
allocated or freed. I think for the charge path that might not be a big
issue, but on the uncharge this can cause issues. The application and
the potential shrinkers have freed some of this dmabuf memory, but until
the whole dmabuf is freed the memcg uncharge will not happen. This can
have consequences on the reclaim and oom behavior of the application.

b. Due to the usage model i.e. a central daemon allocating the dmabuf
memory upfront, there is a requirement to have a memcg charge transfer
functionality to transfer the charge from the central daemon to the
client applications. This does introduce complexity and avenues of weird
reclaim and oom behavior.


2. Allocate and charge the memory on page fault by actual user

In this approach, the memory is not allocated upfront by the central
daemon but rather on a page fault by the client application, and the
memcg charge happens at the same time.

The only con I can think of is that this approach is more involved and
may need some clever tricks to track the page on the free path, i.e. we
need to decrement the dmabuf memcg stat on the free path. Maybe a page
flag.

The pros of this approach are that there is no need to have a charge
transfer functionality and the charge/uncharge is closely tied to the
actual memory allocation and free.

Personally I would prefer the second approach but I don't want to just
block this work if the dmabuf folks are ok with the cons mentioned of
the first approach.

thanks,
Shakeel
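
Con (a) above can be made concrete with a toy model (Python, purely
illustrative, with made-up page counts) of how the charged amount
diverges from real usage when the uncharge only happens at final
release:

```python
# Toy model of con (a): with a batch charge at export time, pages freed
# early (e.g. by a shrinker) stay charged until the whole dmabuf is
# released; with per-page charge-on-fault the stat tracks real usage.

BUF_PAGES = 100
FREED_EARLY = 60

# Option 1: one batch charge at export, one batch uncharge at release.
batch_usage = BUF_PAGES          # export: whole buffer charged at once
# ... a shrinker frees 60 of the 100 pages; charge is NOT adjusted ...
usage_after_shrink_opt1 = batch_usage

# Option 2: charge per page on fault, uncharge per page on free.
perpage_usage = 0
for _ in range(BUF_PAGES):       # client faults pages in one by one
    perpage_usage += 1
perpage_usage -= FREED_EARLY     # the same 60 pages freed -> uncharged
usage_after_shrink_opt2 = perpage_usage

assert usage_after_shrink_opt1 == 100   # stale charge lingers
assert usage_after_shrink_opt2 == 40    # charge matches live pages
```

It is that lingering difference that can skew reclaim and oom decisions
for the application under option 1.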


Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-24 Thread T.J. Mercier
On Tue, Jan 24, 2023 at 7:00 AM Michal Hocko  wrote:
>
> On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> > When a buffer is exported to userspace, use memcg to attribute the
> > buffer to the allocating cgroup until all buffer references are
> > released.
>
> Is there any reason why this memory cannot be charged during the
> allocation (__GFP_ACCOUNT used)?

My main motivation was to keep code changes away from exporters and
implement the accounting in one common spot for all of them. This is a
bit of a carryover from a previous approach [1] where there was some
objection to pushing off this work onto exporters and forcing them to
adapt, but __GFP_ACCOUNT does seem like a smaller burden than before
at least initially. However in order to support charge transfer
between cgroups with __GFP_ACCOUNT we'd need to be able to get at the
pages backing dmabuf objects, and the exporters are the ones with that
access. Meaning I think we'd have to add some additional dma_buf_ops
to achieve that, which was the objection from [1].

[1] https://lore.kernel.org/lkml/5cc27a05-8131-ce9b-dea1-5c75e9942...@amd.com/

>
> Also you do charge and account the memory but underlying pages do not
> know about their memcg (this is normally done with commit_charge for
> user mapped pages). This would become a problem if the memory is
> migrated for example.

Hmm, what problem do you see in this situation? If the backing pages
are to be migrated that requires the cooperation of the exporter,
which currently has no influence on how the cgroup charging is done
and that seems fine. (Unless you mean migrating the charge across
cgroups? In which case that's the next patch.)

> This also means that you have to maintain memcg
> reference outside of the memcg proper which is not really nice either.
> This mimics tcp kmem limit implementation which I really have to say I
> am not a great fan of and this pattern shouldn't be copied.
>
Ah, what can I say. This way looked simple to me. I think otherwise
we're back to making all exporters do more stuff for the accounting.

> Also you are not really saying anything about the oom behavior. With
> this implementation the kernel will try to reclaim the memory and even
> trigger the memcg oom killer if the request size is <= 8 pages. Is this
> a desirable behavior?

It will try to reclaim some memory, but not the dmabuf pages right?
Not *yet* anyway. This behavior sounds expected to me. I would only
expect it to be surprising for cgroups making heavy use of dmabufs
(that weren't accounted before) *and* with hard limits already very
close to actual usage. I remember Johannes mentioning that what counts
under memcg use is already a bit of a moving target.

> --
> Michal Hocko
> SUSE Labs


Re: [PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-24 Thread Michal Hocko
On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> When a buffer is exported to userspace, use memcg to attribute the
> buffer to the allocating cgroup until all buffer references are
> released.

Is there any reason why this memory cannot be charged during the
allocation (__GFP_ACCOUNT used)?
Also you do charge and account the memory but underlying pages do not
know about their memcg (this is normally done with commit_charge for
user mapped pages). This would become a problem if the memory is
migrated for example. This also means that you have to maintain memcg
reference outside of the memcg proper which is not really nice either.
This mimics tcp kmem limit implementation which I really have to say I
am not a great fan of and this pattern shouldn't be copied.

Also you are not really saying anything about the oom behavior. With
this implementation the kernel will try to reclaim the memory and even
trigger the memcg oom killer if the request size is <= 8 pages. Is this
a desirable behavior?
-- 
Michal Hocko
SUSE Labs


[PATCH v2 1/4] memcg: Track exported dma-buffers

2023-01-23 Thread T.J. Mercier
When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.

Unlike the dmabuf sysfs stats implementation, this memcg accounting
avoids contention over the kernfs_rwsem incurred when creating or
removing nodes.

Signed-off-by: T.J. Mercier 
---
 Documentation/admin-guide/cgroup-v2.rst |  4 +++
 drivers/dma-buf/dma-buf.c   | 13 +
 include/linux/dma-buf.h |  3 ++
 include/linux/memcontrol.h  | 38 +
 mm/memcontrol.c | 19 +
 5 files changed, 77 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index c8ae7c897f14..538ae22bc514 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1455,6 +1455,10 @@ PAGE_SIZE multiple when read back.
Amount of memory used for storing in-kernel data
structures.
 
+ dmabuf (npn)
+   Amount of memory used for exported DMA buffers allocated by the cgroup.
+   Stays with the allocating cgroup regardless of how the buffer is shared.
+
  workingset_refault_anon
Number of refaults of previously evicted anonymous pages.
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e6528767efc7..a6a8cb5cb32d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -75,6 +75,9 @@ static void dma_buf_release(struct dentry *dentry)
 */
BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
 
+   mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
+   mem_cgroup_put(dmabuf->memcg);
+
dma_buf_stats_teardown(dmabuf);
dmabuf->ops->release(dmabuf);
 
@@ -673,6 +676,13 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
if (ret)
goto err_dmabuf;
 
+   dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
+   if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
+ GFP_KERNEL)) {
+   ret = -ENOMEM;
+   goto err_memcg;
+   }
+
file->private_data = dmabuf;
file->f_path.dentry->d_fsdata = dmabuf;
dmabuf->file = file;
@@ -683,6 +693,9 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 
return dmabuf;
 
+err_memcg:
+   mem_cgroup_put(dmabuf->memcg);
+   dma_buf_stats_teardown(dmabuf);
 err_dmabuf:
if (!resv)
dma_resv_fini(dmabuf->resv);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 6fa8d4e29719..1f0ffb8e4bf5 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct dma_buf;
@@ -446,6 +447,8 @@ struct dma_buf {
struct dma_buf *dmabuf;
} *sysfs_entry;
 #endif
+   /* The cgroup to which this buffer is currently attributed */
+   struct mem_cgroup *memcg;
 };
 
 /**
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d3c8203cab6c..c10b8565fdbf 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -37,6 +37,7 @@ enum memcg_stat_item {
MEMCG_KMEM,
MEMCG_ZSWAP_B,
MEMCG_ZSWAPPED,
+   MEMCG_DMABUF,
MEMCG_NR_STAT,
 };
 
@@ -673,6 +674,25 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
 
 int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
  gfp_t gfp, swp_entry_t entry);
+
+/**
+ * mem_cgroup_charge_dmabuf - Charge dma-buf memory to a cgroup and update stat counter
+ * @memcg: memcg to charge
+ * @nr_pages: number of pages to charge
+ * @gfp_mask: reclaim mode
+ *
+ * Charges @nr_pages to @memcg. Returns %true if the charge fit within
+ * @memcg's configured limit, %false if it doesn't.
+ */
+bool __mem_cgroup_charge_dmabuf(struct mem_cgroup *memcg, unsigned int nr_pages, gfp_t gfp_mask);
+static inline bool mem_cgroup_charge_dmabuf(struct mem_cgroup *memcg, unsigned int nr_pages,
+   gfp_t gfp_mask)
+{
+   if (mem_cgroup_disabled())
+   return true;
+   return __mem_cgroup_charge_dmabuf(memcg, nr_pages, gfp_mask);
+}
+
 void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
 
 void __mem_cgroup_uncharge(struct folio *folio);
@@ -690,6 +710,14 @@ static inline void mem_cgroup_uncharge(struct folio *folio)
__mem_cgroup_uncharge(folio);
 }
 
+void __mem_cgroup_uncharge_dmabuf(struct mem_cgroup *memcg, unsigned int nr_pages);
+static inline void mem_cgroup_uncharge_dmabuf(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+   if (mem_cgroup_disabled())
+   return;
+