[RFC] Per file OOM-badness / RSS once more
Hello everyone,

To summarize the issue I'm trying to address here: processes can allocate resources through a file descriptor without being held responsible for them. I'm not explaining all the details again; see here for a more detailed description of the problem: https://lwn.net/ml/linux-kernel/2022053117.174649-1-christian.koe...@amd.com/

With this iteration I'm trying to address a bunch of the comments Michal Hocko gave (thanks a lot for that) as well as adding some new ideas. Changes made so far:

1. Renamed the callback to file_rss(). This is at least a start at better describing what this is all about. I've been going back and forth over the naming here; if you have a better idea please speak up.

2. Cleanups, e.g. now providing a helper function in the fs layer to sum up all the pages allocated by the files in a file descriptor table.

3. Using the actual number of allocated pages for the shmem implementation instead of just the size. I also tried to ignore shmem files which are part of tmpfs, because that has a separate accounting/limitation approach.

4. The OOM killer now prints the memory of the killed process including the per-file pages, which makes the whole thing much more comprehensible.

5. I've added the per-file pages to the RSS reports in procfs. This has the interesting effect that tools like top suddenly give a much more accurate overview of memory use as well. This of course increases the overhead of gathering that information quite a bit, and I'm not sure how feasible that is for upstreaming. On the other hand it once more clearly shows that we need to do something about this issue.

Another rather interesting observation is that multiple subsystems (shmem, tmpfs, ttm) came up with the same workaround of limiting the memory which can be allocated through them to 50% of system memory. Unfortunately that isn't the same 50%, and it doesn't apply everywhere, so you can still easily crash the box.
Ideas and/or comments are really welcome. Thanks, Christian.
Re: [RFC] Per file OOM badness
On 2018-04-04 11:36 AM, Lucas Stach wrote:
> Am Mittwoch, den 04.04.2018, 11:09 +0200 schrieb Michel Dänzer:
>> On 2018-03-26 04:36 PM, Lucas Stach wrote:
>>> Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko:
>>>> On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
>>>>> On 2018-01-24 12:50 PM, Michal Hocko wrote:
>>>>>> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>>>>>>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>>>>>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
>>
>> [...]
>>
>>>>>>> 2. If the OOM killer kills a process which is sharing BOs with
>>>>>>> another process, this should result in the other process dropping
>>>>>>> its references to the BOs as well, at which point the memory is
>>>>>>> released.
>>>>>>>
>>>>>>>> OK. How exactly are those BOs mapped to the userspace?
>>>>>>>
>>>>>>> I'm not sure what you're asking. Userspace mostly uses a GEM
>>>>>>> handle to refer to a BO. There can also be userspace CPU mappings
>>>>>>> of the BO's memory, but userspace doesn't need CPU mappings for
>>>>>>> all BOs and only creates them as needed.
>>>>>>
>>>>>> OK, I guess you have to bear with me some more. This whole stack
>>>>>> is a complete unknown. I am mostly after finding a boundary where
>>>>>> you can charge the allocated memory to the process so that the oom
>>>>>> killer can consider it. Is there anything like that? Except for
>>>>>> the proposed file handle hack?
>>>>>
>>>>> How about the other way around: what APIs can we use to charge /
>>>>> "uncharge" memory to a process? If we have those, we can experiment
>>>>> with different places to call them.
>>>>
>>>> add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.
>>>
>>> So is anyone still working on this? This is hurting us bad enough that
>>> I don't want to keep this topic rotting for another year.
>>>
>>> If no one is currently working on this I would volunteer to give the
>>> simple "just account private, non-shared buffers in process RSS" a
>>> spin.
>>
>> Sounds good. FWIW, I think shared buffers can also be easily handled by
>> accounting them in each process which has a reference. But that's more
>> of a detail, shouldn't make a big difference overall either way.
>
> Yes, both options to either never account shared buffers or to always
> account them into every process having a reference should be pretty
> easy. Where it gets hard is when trying to account the buffer only in
> the last process holding a reference or something like this.

FWIW, I don't think that would make sense anyway. A shared buffer is
actually used by all processes which have a reference to it, so it should
be accounted the same in all of them.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC] Per file OOM badness
Am Mittwoch, den 04.04.2018, 11:09 +0200 schrieb Michel Dänzer: > On 2018-03-26 04:36 PM, Lucas Stach wrote: > > Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko: > > > On Tue 30-01-18 10:29:10, Michel Dänzer wrote: > > > > On 2018-01-24 12:50 PM, Michal Hocko wrote: > > > > > On Wed 24-01-18 12:23:10, Michel Dänzer wrote: > > > > > > On 2018-01-24 12:01 PM, Michal Hocko wrote: > > > > > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote: > > > > > > > > > > [...] > > > > > > > > 2. If the OOM killer kills a process which is sharing BOs > > > > > > > > with another > > > > > > > > process, this should result in the other process dropping > > > > > > > > its references > > > > > > > > to the BOs as well, at which point the memory is released. > > > > > > > > > > > > > > OK. How exactly are those BOs mapped to the userspace? > > > > > > > > > > > > I'm not sure what you're asking. Userspace mostly uses a GEM > > > > > > handle to > > > > > > refer to a BO. There can also be userspace CPU mappings of the > > > > > > BO's > > > > > > memory, but userspace doesn't need CPU mappings for all BOs and > > > > > > only > > > > > > creates them as needed. > > > > > > > > > > OK, I guess you have to bear with me some more. This whole stack > > > > > is a > > > > > complete uknonwn. I am mostly after finding a boundary where you > > > > > can > > > > > charge the allocated memory to the process so that the oom killer > > > > > can > > > > > consider it. Is there anything like that? Except for the proposed > > > > > file > > > > > handle hack? > > > > > > > > How about the other way around: what APIs can we use to charge / > > > > "uncharge" memory to a process? If we have those, we can experiment > > > > with > > > > different places to call them. > > > > > > add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES. > > > > So is anyone still working on this? This is hurting us bad enough that > > I don't want to keep this topic rotting for another year. 
> > If no one is currently working on this I would volunteer to give the
> > simple "just account private, non-shared buffers in process RSS" a
> > spin.
>
> Sounds good. FWIW, I think shared buffers can also be easily handled by
> accounting them in each process which has a reference. But that's more
> of a detail, shouldn't make a big difference overall either way.

Yes, both options to either never account shared buffers or to always
account them into every process having a reference should be pretty easy.
Where it gets hard is when trying to account the buffer only in the last
process holding a reference or something like this.

For the OOM case I think it makes more sense to never account shared
buffers, as this may lead to a process (like the compositor) having its
RSS inflated by shared buffers, rendering it the likely victim for the
OOM killer/reaper, while killing this process will not lead to freeing of
any shared graphics memory, at least if the clients sharing the buffer
survive killing of the compositor.

This opens up the possibility to "hide" buffers from the accounting by
sharing them, but I guess it's still much better than the nothing we do
today to account for graphics buffers.

Regards,
Lucas
Re: [RFC] Per file OOM badness
On 2018-03-26 04:36 PM, Lucas Stach wrote: > Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko: >> On Tue 30-01-18 10:29:10, Michel Dänzer wrote: >>> On 2018-01-24 12:50 PM, Michal Hocko wrote: On Wed 24-01-18 12:23:10, Michel Dänzer wrote: > On 2018-01-24 12:01 PM, Michal Hocko wrote: >> On Wed 24-01-18 11:27:15, Michel Dänzer wrote: [...] >>> 2. If the OOM killer kills a process which is sharing BOs >>> with another >>> process, this should result in the other process dropping >>> its references >>> to the BOs as well, at which point the memory is released. >> >> OK. How exactly are those BOs mapped to the userspace? > > I'm not sure what you're asking. Userspace mostly uses a GEM > handle to > refer to a BO. There can also be userspace CPU mappings of the > BO's > memory, but userspace doesn't need CPU mappings for all BOs and > only > creates them as needed. OK, I guess you have to bear with me some more. This whole stack is a complete uknonwn. I am mostly after finding a boundary where you can charge the allocated memory to the process so that the oom killer can consider it. Is there anything like that? Except for the proposed file handle hack? >>> >>> How about the other way around: what APIs can we use to charge / >>> "uncharge" memory to a process? If we have those, we can experiment >>> with >>> different places to call them. >> >> add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES. > > So is anyone still working on this? This is hurting us bad enough that > I don't want to keep this topic rotting for another year. > > If no one is currently working on this I would volunteer to give the > simple "just account private, non-shared buffers in process RSS" a > spin. Sounds good. FWIW, I think shared buffers can also be easily handled by accounting them in each process which has a reference. But that's more of a detail, shouldn't make a big difference overall either way. 
-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
Hi all,

Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko:
> On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
> > On 2018-01-24 12:50 PM, Michal Hocko wrote:
> > > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > > > > > [...]
> > > > > > 2. If the OOM killer kills a process which is sharing BOs
> > > > > > with another process, this should result in the other process
> > > > > > dropping its references to the BOs as well, at which point
> > > > > > the memory is released.
> > > > >
> > > > > OK. How exactly are those BOs mapped to the userspace?
> > > >
> > > > I'm not sure what you're asking. Userspace mostly uses a GEM
> > > > handle to refer to a BO. There can also be userspace CPU mappings
> > > > of the BO's memory, but userspace doesn't need CPU mappings for
> > > > all BOs and only creates them as needed.
> > >
> > > OK, I guess you have to bear with me some more. This whole stack
> > > is a complete unknown. I am mostly after finding a boundary where
> > > you can charge the allocated memory to the process so that the oom
> > > killer can consider it. Is there anything like that? Except for the
> > > proposed file handle hack?
> >
> > How about the other way around: what APIs can we use to charge /
> > "uncharge" memory to a process? If we have those, we can experiment
> > with different places to call them.
>
> add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.

So is anyone still working on this? This is hurting us bad enough that
I don't want to keep this topic rotting for another year.

If no one is currently working on this I would volunteer to give the
simple "just account private, non-shared buffers in process RSS" a
spin.

Regards,
Lucas
Re: [RFC] Per file OOM badness
On 2018-01-30 12:56 PM, Christian König wrote: > Am 30.01.2018 um 12:42 schrieb Michel Dänzer: >> On 2018-01-30 12:36 PM, Nicolai Hähnle wrote: >>> On 30.01.2018 12:34, Michel Dänzer wrote: On 2018-01-30 12:28 PM, Christian König wrote: > Am 30.01.2018 um 12:02 schrieb Michel Dänzer: >> On 2018-01-30 11:40 AM, Christian König wrote: >>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer: [SNIP] > Would it be ok to hang onto potentially arbitrary mmget references > essentially forever? If that's ok I think we can do your process > based > account (minus a few minor inaccuracies for shared stuff perhaps, > but no > one cares about that). Honestly, I think you and Christian are overthinking this. Let's try charging the memory to every process which shares a buffer, and go from there. >>> My problem is that this needs to be bullet prove. >>> >>> For example imagine an application which allocates a lot of BOs, >>> then >>> calls fork() and let the parent process die. The file descriptor >>> lives >>> on in the child process, but the memory is not accounted against the >>> child. >> What exactly are you referring to by "the file descriptor" here? > The file descriptor used to identify the connection to the driver. In > other words our drm_file structure in the kernel. > >> What happens to BO handles in general in this case? If both parent >> and >> child process keep the same handle for the same BO, one of them >> destroying the handle will result in the other one not being able to >> use >> it anymore either, won't it? > Correct. > > That usage is actually not useful at all, but we already had > applications which did exactly that by accident. > > Not to mention that somebody could do it on purpose. Can we just prevent child processes from using their parent's DRM file descriptors altogether? Allowing it seems like a bad idea all around. >>> Existing protocols pass DRM fds between processes though, don't they? 
>>> Not child processes perhaps, but special-casing that seems like awful
>>> design.
>>
>> Fair enough.
>>
>> Can we disallow passing DRM file descriptors which have any buffers
>> allocated? :)
>
> Hehe good point, but I'm sorry I have to ruin that.
>
> The root VM page table is allocated when the DRM file descriptor is
> created and we want to account those to whoever uses the file
> descriptor as well.

Alternatively, since the file descriptor is closed in the sending
process in this case, maybe we can "uncharge" the buffer memory from the
sending process and charge it to the receiving one during the transfer?

> Looking into the fs layer there actually only seem to be two functions
> which are involved when a file descriptor is installed/removed from a
> process. So we just need to add some callbacks there.

That could work for file descriptor passing, but I'm not sure it really
helps for the fork case. Let's say we charge the buffer memory to the
child process as well. If either process later destroys a buffer handle,
the buffer becomes inaccessible to the other process as well, however
its memory remains charged to it (even though it may already be freed).

I think using a DRM file descriptor in both parent and child processes
is a pathological case that we really want to prevent rather than
worrying about how to make it work well. It doesn't seem to be working
well in general already anyway.

Maybe we could keep track of which process "owns" a DRM file descriptor,
and return an error from any relevant system calls for it from another
process. When passing an fd, its ownership would transfer to the
receiving process. When forking, the ownership would remain with the
parent process.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
Am 30.01.2018 um 12:42 schrieb Michel Dänzer: On 2018-01-30 12:36 PM, Nicolai Hähnle wrote: On 30.01.2018 12:34, Michel Dänzer wrote: On 2018-01-30 12:28 PM, Christian König wrote: Am 30.01.2018 um 12:02 schrieb Michel Dänzer: On 2018-01-30 11:40 AM, Christian König wrote: Am 30.01.2018 um 10:43 schrieb Michel Dänzer: [SNIP] Would it be ok to hang onto potentially arbitrary mmget references essentially forever? If that's ok I think we can do your process based account (minus a few minor inaccuracies for shared stuff perhaps, but no one cares about that). Honestly, I think you and Christian are overthinking this. Let's try charging the memory to every process which shares a buffer, and go from there. My problem is that this needs to be bullet prove. For example imagine an application which allocates a lot of BOs, then calls fork() and let the parent process die. The file descriptor lives on in the child process, but the memory is not accounted against the child. What exactly are you referring to by "the file descriptor" here? The file descriptor used to identify the connection to the driver. In other words our drm_file structure in the kernel. What happens to BO handles in general in this case? If both parent and child process keep the same handle for the same BO, one of them destroying the handle will result in the other one not being able to use it anymore either, won't it? Correct. That usage is actually not useful at all, but we already had applications which did exactly that by accident. Not to mention that somebody could do it on purpose. Can we just prevent child processes from using their parent's DRM file descriptors altogether? Allowing it seems like a bad idea all around. Existing protocols pass DRM fds between processes though, don't they? Not child processes perhaps, but special-casing that seems like awful design. Fair enough. Can we disallow passing DRM file descriptors which have any buffers allocated? 
:)

Hehe good point, but I'm sorry I have to ruin that.

The root VM page table is allocated when the DRM file descriptor is
created and we want to account those to whoever uses the file descriptor
as well.

We could now make an exception for the root VM page table to not be
accounted (shouldn't be that much compared to the rest of the VM tree),
but Nicolai is right that all those exceptions are just an awful design :)

Looking into the fs layer there actually only seem to be two functions
which are involved when a file descriptor is installed/removed from a
process. So we just need to add some callbacks there.

Regards,
Christian.
Re: [RFC] Per file OOM badness
On 2018-01-30 12:36 PM, Nicolai Hähnle wrote: > On 30.01.2018 12:34, Michel Dänzer wrote: >> On 2018-01-30 12:28 PM, Christian König wrote: >>> Am 30.01.2018 um 12:02 schrieb Michel Dänzer: On 2018-01-30 11:40 AM, Christian König wrote: > Am 30.01.2018 um 10:43 schrieb Michel Dänzer: >> [SNIP] >>> Would it be ok to hang onto potentially arbitrary mmget references >>> essentially forever? If that's ok I think we can do your process >>> based >>> account (minus a few minor inaccuracies for shared stuff perhaps, >>> but no >>> one cares about that). >> Honestly, I think you and Christian are overthinking this. Let's try >> charging the memory to every process which shares a buffer, and go >> from >> there. > My problem is that this needs to be bullet prove. > > For example imagine an application which allocates a lot of BOs, then > calls fork() and let the parent process die. The file descriptor lives > on in the child process, but the memory is not accounted against the > child. What exactly are you referring to by "the file descriptor" here? >>> >>> The file descriptor used to identify the connection to the driver. In >>> other words our drm_file structure in the kernel. >>> What happens to BO handles in general in this case? If both parent and child process keep the same handle for the same BO, one of them destroying the handle will result in the other one not being able to use it anymore either, won't it? >>> Correct. >>> >>> That usage is actually not useful at all, but we already had >>> applications which did exactly that by accident. >>> >>> Not to mention that somebody could do it on purpose. >> >> Can we just prevent child processes from using their parent's DRM file >> descriptors altogether? Allowing it seems like a bad idea all around. > > Existing protocols pass DRM fds between processes though, don't they? > > Not child processes perhaps, but special-casing that seems like awful > design. Fair enough. 
Can we disallow passing DRM file descriptors which have any buffers
allocated? :)

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
On 30.01.2018 12:34, Michel Dänzer wrote:
> On 2018-01-30 12:28 PM, Christian König wrote:
>> Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
>>> On 2018-01-30 11:40 AM, Christian König wrote:
>>>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>>>> [SNIP]
>>>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>>>> essentially forever? If that's ok I think we can do your process
>>>>>> based account (minus a few minor inaccuracies for shared stuff
>>>>>> perhaps, but no one cares about that).
>>>>> Honestly, I think you and Christian are overthinking this. Let's
>>>>> try charging the memory to every process which shares a buffer,
>>>>> and go from there.
>>>> My problem is that this needs to be bullet proof.
>>>>
>>>> For example imagine an application which allocates a lot of BOs,
>>>> then calls fork() and lets the parent process die. The file
>>>> descriptor lives on in the child process, but the memory is not
>>>> accounted against the child.
>>> What exactly are you referring to by "the file descriptor" here?
>> The file descriptor used to identify the connection to the driver. In
>> other words our drm_file structure in the kernel.
>>> What happens to BO handles in general in this case? If both parent
>>> and child process keep the same handle for the same BO, one of them
>>> destroying the handle will result in the other one not being able to
>>> use it anymore either, won't it?
>> Correct.
>>
>> That usage is actually not useful at all, but we already had
>> applications which did exactly that by accident.
>>
>> Not to mention that somebody could do it on purpose.
>
> Can we just prevent child processes from using their parent's DRM file
> descriptors altogether? Allowing it seems like a bad idea all around.

Existing protocols pass DRM fds between processes though, don't they?

Not child processes perhaps, but special-casing that seems like awful
design.

Cheers,
Nicolai

--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
Re: [RFC] Per file OOM badness
On 30.01.2018 11:48, Michel Dänzer wrote:
> On 2018-01-30 11:42 AM, Daniel Vetter wrote:
>> On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:
>>> On 2018-01-30 10:31 AM, Daniel Vetter wrote:
>>>> I guess a good first order approximation would be if we simply
>>>> charge any newly allocated buffers to the process that created
>>>> them, but that means hanging onto lots of mm_struct pointers since
>>>> we want to make sure we then release those pages to the right mm
>>>> again (since the process that drops the last ref might be a totally
>>>> different one, depending upon how the buffers or DRM fd have been
>>>> shared).
>>>>
>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>> essentially forever? If that's ok I think we can do your process
>>>> based account (minus a few minor inaccuracies for shared stuff
>>>> perhaps, but no one cares about that).
>>> Honestly, I think you and Christian are overthinking this. Let's try
>>> charging the memory to every process which shares a buffer, and go
>>> from there.
>> I'm not concerned about wrongly accounting shared buffers (they don't
>> matter), but imbalanced accounting. I.e. allocate a buffer in the
>> client, share it, but then the compositor drops the last reference.
> I don't think the order matters. The memory is "uncharged" in each
> process when it drops its reference.

Daniel made a fair point about passing DRM fds between processes,
though. It's not a problem with how the fds are currently used, but
somebody could do the following:

1. Create a DRM fd in process A, allocate lots of buffers.
2. Pass the fd to process B via some IPC mechanism.
3. Exit process A.

There needs to be some assurance that the BOs are accounted as belonging
to process B in the end.

Cheers,
Nicolai

--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
Re: [RFC] Per file OOM badness
On 2018-01-30 12:28 PM, Christian König wrote:
> Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
>> On 2018-01-30 11:40 AM, Christian König wrote:
>>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>>> [SNIP]
>>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>>> essentially forever? If that's ok I think we can do your process
>>>>> based account (minus a few minor inaccuracies for shared stuff
>>>>> perhaps, but no one cares about that).
>>>> Honestly, I think you and Christian are overthinking this. Let's try
>>>> charging the memory to every process which shares a buffer, and go
>>>> from there.
>>> My problem is that this needs to be bullet proof.
>>>
>>> For example imagine an application which allocates a lot of BOs, then
>>> calls fork() and lets the parent process die. The file descriptor
>>> lives on in the child process, but the memory is not accounted
>>> against the child.
>> What exactly are you referring to by "the file descriptor" here?
>
> The file descriptor used to identify the connection to the driver. In
> other words our drm_file structure in the kernel.
>
>> What happens to BO handles in general in this case? If both parent and
>> child process keep the same handle for the same BO, one of them
>> destroying the handle will result in the other one not being able to
>> use it anymore either, won't it?
> Correct.
>
> That usage is actually not useful at all, but we already had
> applications which did exactly that by accident.
>
> Not to mention that somebody could do it on purpose.

Can we just prevent child processes from using their parent's DRM file
descriptors altogether? Allowing it seems like a bad idea all around.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
> On 2018-01-30 11:40 AM, Christian König wrote:
>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>> [SNIP]
>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>> essentially forever? If that's ok I think we can do your process
>>>> based account (minus a few minor inaccuracies for shared stuff
>>>> perhaps, but no one cares about that).
>>> Honestly, I think you and Christian are overthinking this. Let's try
>>> charging the memory to every process which shares a buffer, and go
>>> from there.
>> My problem is that this needs to be bullet proof.
>>
>> For example imagine an application which allocates a lot of BOs, then
>> calls fork() and lets the parent process die. The file descriptor
>> lives on in the child process, but the memory is not accounted against
>> the child.
> What exactly are you referring to by "the file descriptor" here?

The file descriptor used to identify the connection to the driver. In
other words our drm_file structure in the kernel.

> What happens to BO handles in general in this case? If both parent and
> child process keep the same handle for the same BO, one of them
> destroying the handle will result in the other one not being able to
> use it anymore either, won't it?

Correct.

That usage is actually not useful at all, but we already had
applications which did exactly that by accident.

Not to mention that somebody could do it on purpose.

Regards,
Christian.
Re: [RFC] Per file OOM badness
On 2018-01-30 11:40 AM, Christian König wrote:
> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>> [SNIP]
>>> Would it be ok to hang onto potentially arbitrary mmget references
>>> essentially forever? If that's ok I think we can do your process based
>>> account (minus a few minor inaccuracies for shared stuff perhaps, but
>>> no one cares about that).
>> Honestly, I think you and Christian are overthinking this. Let's try
>> charging the memory to every process which shares a buffer, and go from
>> there.
>
> My problem is that this needs to be bullet proof.
>
> For example imagine an application which allocates a lot of BOs, then
> calls fork() and lets the parent process die. The file descriptor lives
> on in the child process, but the memory is not accounted against the
> child.

What exactly are you referring to by "the file descriptor" here?

What happens to BO handles in general in this case? If both parent and
child process keep the same handle for the same BO, one of them
destroying the handle will result in the other one not being able to use
it anymore either, won't it?

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
On 2018-01-30 11:42 AM, Daniel Vetter wrote:
> On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:
>> On 2018-01-30 10:31 AM, Daniel Vetter wrote:
>>
>>> I guess a good first order approximation would be if we simply charge
>>> any newly allocated buffers to the process that created them, but that
>>> means hanging onto lots of mm_struct pointers since we want to make
>>> sure we then release those pages to the right mm again (since the
>>> process that drops the last ref might be a totally different one,
>>> depending upon how the buffers or DRM fd have been shared).
>>>
>>> Would it be ok to hang onto potentially arbitrary mmget references
>>> essentially forever? If that's ok I think we can do your process based
>>> account (minus a few minor inaccuracies for shared stuff perhaps, but
>>> no one cares about that).
>>
>> Honestly, I think you and Christian are overthinking this. Let's try
>> charging the memory to every process which shares a buffer, and go from
>> there.
>
> I'm not concerned about wrongly accounting shared buffers (they don't
> matter), but imbalanced accounting. I.e. allocate a buffer in the
> client, share it, but then the compositor drops the last reference.

I don't think the order matters. The memory is "uncharged" in each
process when it drops its reference.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:
> On 2018-01-30 10:31 AM, Daniel Vetter wrote:
> > On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
> > > Am 24.01.2018 um 12:50 schrieb Michal Hocko:
> > > > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > > [...]
> > > > > > > 2. If the OOM killer kills a process which is sharing BOs with
> > > > > > > another process, this should result in the other process
> > > > > > > dropping its references to the BOs as well, at which point the
> > > > > > > memory is released.
> > > > > >
> > > > > > OK. How exactly are those BOs mapped to the userspace?
> > > > >
> > > > > I'm not sure what you're asking. Userspace mostly uses a GEM handle
> > > > > to refer to a BO. There can also be userspace CPU mappings of the
> > > > > BO's memory, but userspace doesn't need CPU mappings for all BOs
> > > > > and only creates them as needed.
> > > >
> > > > OK, I guess you have to bear with me some more. This whole stack is a
> > > > complete unknown. I am mostly after finding a boundary where you can
> > > > charge the allocated memory to the process so that the oom killer can
> > > > consider it. Is there anything like that? Except for the proposed
> > > > file handle hack?
> > >
> > > Not that I know of.
> > >
> > > As I said before, we need some kind of callback that a process now
> > > starts to use a file descriptor, but without anything from that file
> > > descriptor mapped into the address space.
> >
> > For more context: With DRI3 and Wayland the compositor opens the DRM fd
> > and then passes it to the client, which then starts allocating stuff.
> > That makes book-keeping rather annoying.
>
> Actually, what you're describing is only true for the buffers shared by
> an X server with an X11 compositor. For the actual applications, the
> buffers are created on the client side and then shared with the X server
> / Wayland compositor.
>
> Anyway, it doesn't really matter. In all cases, the buffers are actually
> used by all parties that are sharing them, so charging the memory to all
> of them is perfectly appropriate.
>
> > I guess a good first order approximation would be if we simply charge any
> > newly allocated buffers to the process that created them, but that means
> > hanging onto lots of mm_struct pointers, since we want to make sure we
> > then release those pages to the right mm again (since the process that
> > drops the last ref might be a totally different one, depending upon how
> > the buffers or DRM fd have been shared).
> >
> > Would it be OK to hang onto potentially arbitrary mmget references
> > essentially forever? If that's OK, I think we can do your process-based
> > accounting (minus a few minor inaccuracies for shared stuff perhaps, but
> > no one cares about that).
>
> Honestly, I think you and Christian are overthinking this. Let's try
> charging the memory to every process which shares a buffer, and go from
> there.

I'm not concerned about wrongly accounting shared buffers (they don't
matter), but imbalanced accounting. I.e. allocate a buffer in the client,
share it, but then the compositor drops the last reference.

If we store the mm_struct pointer in drm_gem_object, we don't need any
callback from the vfs when fds are shared or anything like that. We can
simply account any newly allocated buffers to current->mm, and then use
that stored pointer later to drop the accounting when the GEM object is
released.

This would entirely ignore any complications with shared buffers, which I
think we can do because even when we pass the DRM fd to a different
process, the actual buffer allocations are not passed around like that for
private buffers. And private buffers are the only ones that really matter.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [RFC] Per file OOM badness
Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
[SNIP]
>> Would it be OK to hang onto potentially arbitrary mmget references
>> essentially forever? If that's OK, I think we can do your process-based
>> accounting (minus a few minor inaccuracies for shared stuff perhaps, but
>> no one cares about that).
>
> Honestly, I think you and Christian are overthinking this. Let's try
> charging the memory to every process which shares a buffer, and go from
> there.

My problem is that this needs to be bulletproof. For example, imagine an
application which allocates a lot of BOs, then calls fork() and lets the
parent process die. The file descriptor lives on in the child process, but
the memory is not accounted against the child.

Otherwise we would allow easy construction of denial-of-service attacks.

To avoid that, I think we need to add something like new file_operations
callbacks which inform a file descriptor that it is going to be used in a
new process, or stops being used in a process.

Regards,
Christian.
Re: [RFC] Per file OOM badness
On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
> On 2018-01-24 12:50 PM, Michal Hocko wrote:
> > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> >> On 2018-01-24 12:01 PM, Michal Hocko wrote:
> >>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > [...]
> >>>>> 2. If the OOM killer kills a process which is sharing BOs with
> >>>>> another process, this should result in the other process dropping
> >>>>> its references to the BOs as well, at which point the memory is
> >>>>> released.
> >>>
> >>> OK. How exactly are those BOs mapped to the userspace?
> >>
> >> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> >> refer to a BO. There can also be userspace CPU mappings of the BO's
> >> memory, but userspace doesn't need CPU mappings for all BOs and only
> >> creates them as needed.
> >
> > OK, I guess you have to bear with me some more. This whole stack is a
> > complete unknown. I am mostly after finding a boundary where you can
> > charge the allocated memory to the process so that the oom killer can
> > consider it. Is there anything like that? Except for the proposed file
> > handle hack?
>
> How about the other way around: what APIs can we use to charge /
> "uncharge" memory to a process? If we have those, we can experiment with
> different places to call them.

add_mm_counter(), and I would add a new counter, e.g. MM_KERNEL_PAGES.

--
Michal Hocko
SUSE Labs
Re: [RFC] Per file OOM badness
On 2018-01-30 10:31 AM, Daniel Vetter wrote:
> On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
>> Am 24.01.2018 um 12:50 schrieb Michal Hocko:
>>> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>>>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
>>> [...]
>>>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>>>> process, this should result in the other process dropping its
>>>>>> references to the BOs as well, at which point the memory is released.
>>>>>
>>>>> OK. How exactly are those BOs mapped to the userspace?
>>>>
>>>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>>>> refer to a BO. There can also be userspace CPU mappings of the BO's
>>>> memory, but userspace doesn't need CPU mappings for all BOs and only
>>>> creates them as needed.
>>>
>>> OK, I guess you have to bear with me some more. This whole stack is a
>>> complete unknown. I am mostly after finding a boundary where you can
>>> charge the allocated memory to the process so that the oom killer can
>>> consider it. Is there anything like that? Except for the proposed file
>>> handle hack?
>>
>> Not that I know of.
>>
>> As I said before, we need some kind of callback that a process now
>> starts to use a file descriptor, but without anything from that file
>> descriptor mapped into the address space.
>
> For more context: With DRI3 and Wayland the compositor opens the DRM fd
> and then passes it to the client, which then starts allocating stuff.
> That makes book-keeping rather annoying.

Actually, what you're describing is only true for the buffers shared by an
X server with an X11 compositor. For the actual applications, the buffers
are created on the client side and then shared with the X server / Wayland
compositor.

Anyway, it doesn't really matter. In all cases, the buffers are actually
used by all parties that are sharing them, so charging the memory to all of
them is perfectly appropriate.

> I guess a good first order approximation would be if we simply charge any
> newly allocated buffers to the process that created them, but that means
> hanging onto lots of mm_struct pointers, since we want to make sure we
> then release those pages to the right mm again (since the process that
> drops the last ref might be a totally different one, depending upon how
> the buffers or DRM fd have been shared).
>
> Would it be OK to hang onto potentially arbitrary mmget references
> essentially forever? If that's OK, I think we can do your process-based
> accounting (minus a few minor inaccuracies for shared stuff perhaps, but
> no one cares about that).

Honestly, I think you and Christian are overthinking this. Let's try
charging the memory to every process which shares a buffer, and go from
there.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
> Am 24.01.2018 um 12:50 schrieb Michal Hocko:
> > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > [...]
> > > > > 2. If the OOM killer kills a process which is sharing BOs with
> > > > > another process, this should result in the other process dropping
> > > > > its references to the BOs as well, at which point the memory is
> > > > > released.
> > > >
> > > > OK. How exactly are those BOs mapped to the userspace?
> > >
> > > I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> > > refer to a BO. There can also be userspace CPU mappings of the BO's
> > > memory, but userspace doesn't need CPU mappings for all BOs and only
> > > creates them as needed.
> >
> > OK, I guess you have to bear with me some more. This whole stack is a
> > complete unknown. I am mostly after finding a boundary where you can
> > charge the allocated memory to the process so that the oom killer can
> > consider it. Is there anything like that? Except for the proposed file
> > handle hack?
>
> Not that I know of.
>
> As I said before, we need some kind of callback that a process now starts
> to use a file descriptor, but without anything from that file descriptor
> mapped into the address space.

For more context: With DRI3 and Wayland the compositor opens the DRM fd and
then passes it to the client, which then starts allocating stuff. That
makes book-keeping rather annoying.

I guess a good first order approximation would be if we simply charge any
newly allocated buffers to the process that created them, but that means
hanging onto lots of mm_struct pointers, since we want to make sure we then
release those pages to the right mm again (since the process that drops the
last ref might be a totally different one, depending upon how the buffers
or DRM fd have been shared).

Would it be OK to hang onto potentially arbitrary mmget references
essentially forever? If that's OK, I think we can do your process-based
accounting (minus a few minor inaccuracies for shared stuff perhaps, but no
one cares about that).

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [RFC] Per file OOM badness
On 2018-01-24 12:50 PM, Michal Hocko wrote:
> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> [...]
>>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>>> process, this should result in the other process dropping its
>>>>> references to the BOs as well, at which point the memory is released.
>>>
>>> OK. How exactly are those BOs mapped to the userspace?
>>
>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>> refer to a BO. There can also be userspace CPU mappings of the BO's
>> memory, but userspace doesn't need CPU mappings for all BOs and only
>> creates them as needed.
>
> OK, I guess you have to bear with me some more. This whole stack is a
> complete unknown. I am mostly after finding a boundary where you can
> charge the allocated memory to the process so that the oom killer can
> consider it. Is there anything like that? Except for the proposed file
> handle hack?

How about the other way around: what APIs can we use to charge /
"uncharge" memory to a process? If we have those, we can experiment with
different places to call them.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
On 2018-01-24 12:50 PM, Michal Hocko wrote:
> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> [...]
>>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>>> process, this should result in the other process dropping its
>>>>> references to the BOs as well, at which point the memory is released.
>>>
>>> OK. How exactly are those BOs mapped to the userspace?
>>
>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>> refer to a BO. There can also be userspace CPU mappings of the BO's
>> memory, but userspace doesn't need CPU mappings for all BOs and only
>> creates them as needed.
>
> OK, I guess you have to bear with me some more. This whole stack is a
> complete unknown. I am mostly after finding a boundary where you can
> charge the allocated memory to the process so that the oom killer can
> consider it. Is there anything like that?

I think something like charging the memory of a BO to the process when a
userspace handle is created for it, and "uncharging" when a handle is
destroyed, could be a good start.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
Am 24.01.2018 um 12:50 schrieb Michal Hocko:
> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> [...]
>>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>>> process, this should result in the other process dropping its
>>>>> references to the BOs as well, at which point the memory is released.
>>>
>>> OK. How exactly are those BOs mapped to the userspace?
>>
>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>> refer to a BO. There can also be userspace CPU mappings of the BO's
>> memory, but userspace doesn't need CPU mappings for all BOs and only
>> creates them as needed.
>
> OK, I guess you have to bear with me some more. This whole stack is a
> complete unknown. I am mostly after finding a boundary where you can
> charge the allocated memory to the process so that the oom killer can
> consider it. Is there anything like that? Except for the proposed file
> handle hack?

Not that I know of.

As I said before, we need some kind of callback that a process now starts
to use a file descriptor, but without anything from that file descriptor
mapped into the address space.

Regards,
Christian.
Re: [RFC] Per file OOM badness
On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
[...]
> >> 2. If the OOM killer kills a process which is sharing BOs with another
> >> process, this should result in the other process dropping its
> >> references to the BOs as well, at which point the memory is released.
> >
> > OK. How exactly are those BOs mapped to the userspace?
>
> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> refer to a BO. There can also be userspace CPU mappings of the BO's
> memory, but userspace doesn't need CPU mappings for all BOs and only
> creates them as needed.

OK, I guess you have to bear with me some more. This whole stack is a
complete unknown. I am mostly after finding a boundary where you can charge
the allocated memory to the process so that the oom killer can consider it.
Is there anything like that? Except for the proposed file handle hack?

--
Michal Hocko
SUSE Labs
Re: [RFC] Per file OOM badness
On 2018-01-24 12:01 PM, Michal Hocko wrote:
> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
>> On 2018-01-24 10:28 AM, Michal Hocko wrote:
> [...]
>>> So how exactly does it help to kill one of those processes? The memory
>>> stays pinned behind, or do I still misunderstand?
>>
>> Fundamentally, the memory is only released once all references to the
>> BOs are dropped. That's true no matter how the memory is accounted for
>> between the processes referencing the BO.
>>
>> In practice, this should be fine:
>>
>> 1. The amount of memory used for shared BOs is normally small compared
>> to the amount of memory used for non-shared BOs (and other things). So
>> regardless of how shared BOs are accounted for, the OOM killer should
>> first target the process which is responsible for more memory overall.
>
> OK. So this is essentially the same as with the normal shared memory
> which is a part of the RSS in general.

Right.

>> 2. If the OOM killer kills a process which is sharing BOs with another
>> process, this should result in the other process dropping its
>> references to the BOs as well, at which point the memory is released.
>
> OK. How exactly are those BOs mapped to the userspace?

I'm not sure what you're asking. Userspace mostly uses a GEM handle to
refer to a BO. There can also be userspace CPU mappings of the BO's memory,
but userspace doesn't need CPU mappings for all BOs and only creates them
as needed.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> On 2018-01-24 10:28 AM, Michal Hocko wrote:
[...]
> > So how exactly does it help to kill one of those processes? The memory
> > stays pinned behind, or do I still misunderstand?
>
> Fundamentally, the memory is only released once all references to the
> BOs are dropped. That's true no matter how the memory is accounted for
> between the processes referencing the BO.
>
> In practice, this should be fine:
>
> 1. The amount of memory used for shared BOs is normally small compared
> to the amount of memory used for non-shared BOs (and other things). So
> regardless of how shared BOs are accounted for, the OOM killer should
> first target the process which is responsible for more memory overall.

OK. So this is essentially the same as with the normal shared memory which
is a part of the RSS in general.

> 2. If the OOM killer kills a process which is sharing BOs with another
> process, this should result in the other process dropping its references
> to the BOs as well, at which point the memory is released.

OK. How exactly are those BOs mapped to the userspace?

--
Michal Hocko
SUSE Labs
Re: [RFC] Per file OOM badness
On 2018-01-24 10:28 AM, Michal Hocko wrote:
> On Tue 23-01-18 17:39:19, Michel Dänzer wrote:
>> On 2018-01-23 04:36 PM, Michal Hocko wrote:
>>> On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
>>>> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
>>>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>>>> Hi, this series is a revised version of an RFC sent by Christian König
>>>>>> a few years ago. The original RFC can be found at
>>>>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>>>>> Here is the original cover letter text:
>>>>> : I'm currently working on the issue that when device drivers allocate
>>>>> : memory on behalf of an application, the OOM killer usually doesn't
>>>>> : know about it unless the application also gets this memory mapped
>>>>> : into its address space.
>>>>> :
>>>>> : This is especially annoying for graphics drivers, where a lot of the
>>>>> : VRAM usually isn't CPU accessible and so doesn't make sense to map
>>>>> : into the address space of the process using it.
>>>>> :
>>>>> : The problem now is that when an application starts to use a lot of
>>>>> : VRAM, those buffer objects sooner or later get swapped out to system
>>>>> : memory, but when we now run into an out-of-memory situation, the OOM
>>>>> : killer obviously doesn't know anything about that memory and so
>>>>> : usually kills the wrong process.
>>>>> :
>>>>> : The following set of patches tries to address this problem by
>>>>> : introducing a per-file OOM badness score, which device drivers can
>>>>> : use to give the OOM killer a hint how many resources are bound to a
>>>>> : file descriptor so that it can make better decisions which process
>>>>> : to kill.
>>>>> :
>>>>> : So, question to everyone: What do you think about this approach?
>>>>> :
>>>>> : My biggest concern right now is that the patches are messing with a
>>>>> : core kernel structure (adding a field to struct file). Any better
>>>>> : idea? I'm considering putting a callback into file_ops instead.
>>>>
>>>> Hello!
>>>>
>>>> I wonder if groupoom (aka the cgroup-aware OOM killer) can work for you?
>>>
>>> I do not think so. The problem is that the allocating context is not
>>> identical with the end consumer.
>>
>> That's actually not really true. Even in cases where a BO is shared with
>> a different process, it is still used at least occasionally in the
>> process which allocated it as well. Otherwise there would be no point in
>> sharing it between processes.
>
> OK, but somebody has to be made responsible. Otherwise you are just
> killing a process which doesn't really release any memory.
>
>> There should be no problem if the memory of a shared BO is accounted for
>> in each process sharing it. It might be nice to scale each process'
>> "debt" by 1 / (number of processes sharing it) if possible, but in the
>> worst case accounting it fully in each process should be fine.
>
> So how exactly does it help to kill one of those processes? The memory
> stays pinned behind, or do I still misunderstand?

Fundamentally, the memory is only released once all references to the BOs
are dropped. That's true no matter how the memory is accounted for between
the processes referencing the BO.

In practice, this should be fine:

1. The amount of memory used for shared BOs is normally small compared to
the amount of memory used for non-shared BOs (and other things). So
regardless of how shared BOs are accounted for, the OOM killer should first
target the process which is responsible for more memory overall.

2. If the OOM killer kills a process which is sharing BOs with another
process, this should result in the other process dropping its references
to the BOs as well, at which point the memory is released.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
On Tue 23-01-18 17:39:19, Michel Dänzer wrote:
> On 2018-01-23 04:36 PM, Michal Hocko wrote:
> > On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
> >> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
> >>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> >>>> Hi, this series is a revised version of an RFC sent by Christian König
> >>>> a few years ago. The original RFC can be found at
> >>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> >>> Here is the original cover letter text:
> >>> : I'm currently working on the issue that when device drivers allocate
> >>> : memory on behalf of an application, the OOM killer usually doesn't
> >>> : know about it unless the application also gets this memory mapped
> >>> : into its address space.
> >>> :
> >>> : This is especially annoying for graphics drivers, where a lot of the
> >>> : VRAM usually isn't CPU accessible and so doesn't make sense to map
> >>> : into the address space of the process using it.
> >>> :
> >>> : The problem now is that when an application starts to use a lot of
> >>> : VRAM, those buffer objects sooner or later get swapped out to system
> >>> : memory, but when we now run into an out-of-memory situation, the OOM
> >>> : killer obviously doesn't know anything about that memory and so
> >>> : usually kills the wrong process.
> >>> :
> >>> : The following set of patches tries to address this problem by
> >>> : introducing a per-file OOM badness score, which device drivers can
> >>> : use to give the OOM killer a hint how many resources are bound to a
> >>> : file descriptor so that it can make better decisions which process
> >>> : to kill.
> >>> :
> >>> : So, question to everyone: What do you think about this approach?
> >>> :
> >>> : My biggest concern right now is that the patches are messing with a
> >>> : core kernel structure (adding a field to struct file). Any better
> >>> : idea? I'm considering putting a callback into file_ops instead.
> >>
> >> Hello!
> >>
> >> I wonder if groupoom (aka the cgroup-aware OOM killer) can work for you?
> >
> > I do not think so. The problem is that the allocating context is not
> > identical with the end consumer.
>
> That's actually not really true. Even in cases where a BO is shared with
> a different process, it is still used at least occasionally in the
> process which allocated it as well. Otherwise there would be no point in
> sharing it between processes.

OK, but somebody has to be made responsible. Otherwise you are just killing
a process which doesn't really release any memory.

> There should be no problem if the memory of a shared BO is accounted for
> in each process sharing it. It might be nice to scale each process'
> "debt" by 1 / (number of processes sharing it) if possible, but in the
> worst case accounting it fully in each process should be fine.

So how exactly does it help to kill one of those processes? The memory
stays pinned behind, or do I still misunderstand?

--
Michal Hocko
SUSE Labs
Re: [RFC] Per file OOM badness
On 2018-01-23 04:36 PM, Michal Hocko wrote:
> On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
>> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>> Hi, this series is a revised version of an RFC sent by Christian König
>>>> a few years ago. The original RFC can be found at
>>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>>> Here is the original cover letter text:
>>> : I'm currently working on the issue that when device drivers allocate
>>> : memory on behalf of an application, the OOM killer usually doesn't
>>> : know about it unless the application also gets this memory mapped
>>> : into its address space.
>>> :
>>> : This is especially annoying for graphics drivers, where a lot of the
>>> : VRAM usually isn't CPU accessible and so doesn't make sense to map
>>> : into the address space of the process using it.
>>> :
>>> : The problem now is that when an application starts to use a lot of
>>> : VRAM, those buffer objects sooner or later get swapped out to system
>>> : memory, but when we now run into an out-of-memory situation, the OOM
>>> : killer obviously doesn't know anything about that memory and so
>>> : usually kills the wrong process.
>>> :
>>> : The following set of patches tries to address this problem by
>>> : introducing a per-file OOM badness score, which device drivers can
>>> : use to give the OOM killer a hint how many resources are bound to a
>>> : file descriptor so that it can make better decisions which process
>>> : to kill.
>>> :
>>> : So, question to everyone: What do you think about this approach?
>>> :
>>> : My biggest concern right now is that the patches are messing with a
>>> : core kernel structure (adding a field to struct file). Any better
>>> : idea? I'm considering putting a callback into file_ops instead.
>>
>> Hello!
>>
>> I wonder if groupoom (aka the cgroup-aware OOM killer) can work for you?
>
> I do not think so. The problem is that the allocating context is not
> identical with the end consumer.

That's actually not really true. Even in cases where a BO is shared with a
different process, it is still used at least occasionally in the process
which allocated it as well. Otherwise there would be no point in sharing it
between processes.

There should be no problem if the memory of a shared BO is accounted for in
each process sharing it. It might be nice to scale each process' "debt" by
1 / (number of processes sharing it) if possible, but in the worst case
accounting it fully in each process should be fine.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
> > On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> > > Hi, this series is a revised version of an RFC sent by Christian König
> > > a few years ago. The original RFC can be found at
> > > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> > Here is the original cover letter text:
> > : I'm currently working on the issue that when device drivers allocate
> > : memory on behalf of an application, the OOM killer usually doesn't
> > : know about it unless the application also gets this memory mapped
> > : into its address space.
> > :
> > : This is especially annoying for graphics drivers, where a lot of the
> > : VRAM usually isn't CPU accessible and so doesn't make sense to map
> > : into the address space of the process using it.
> > :
> > : The problem now is that when an application starts to use a lot of
> > : VRAM, those buffer objects sooner or later get swapped out to system
> > : memory, but when we now run into an out-of-memory situation, the OOM
> > : killer obviously doesn't know anything about that memory and so
> > : usually kills the wrong process.
> > :
> > : The following set of patches tries to address this problem by
> > : introducing a per-file OOM badness score, which device drivers can
> > : use to give the OOM killer a hint how many resources are bound to a
> > : file descriptor so that it can make better decisions which process
> > : to kill.
> > :
> > : So, question to everyone: What do you think about this approach?
> > :
> > : My biggest concern right now is that the patches are messing with a
> > : core kernel structure (adding a field to struct file). Any better
> > : idea? I'm considering putting a callback into file_ops instead.
>
> Hello!
>
> I wonder if groupoom (aka the cgroup-aware OOM killer) can work for you?

I do not think so. The problem is that the allocating context is not
identical with the end consumer.

--
Michal Hocko
SUSE Labs
Re: [RFC] Per file OOM badness
On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> > Hi, this series is a revised version of an RFC sent by Christian König
> > a few years ago. The original RFC can be found at
> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> Here is the original cover letter text:
> : I'm currently working on the issue that when device drivers allocate
> : memory on behalf of an application, the OOM killer usually doesn't
> : know about it unless the application also gets this memory mapped
> : into its address space.
> :
> : This is especially annoying for graphics drivers, where a lot of the
> : VRAM usually isn't CPU accessible and so doesn't make sense to map
> : into the address space of the process using it.
> :
> : The problem now is that when an application starts to use a lot of
> : VRAM, those buffer objects sooner or later get swapped out to system
> : memory, but when we now run into an out-of-memory situation, the OOM
> : killer obviously doesn't know anything about that memory and so
> : usually kills the wrong process.
> :
> : The following set of patches tries to address this problem by
> : introducing a per-file OOM badness score, which device drivers can
> : use to give the OOM killer a hint how many resources are bound to a
> : file descriptor so that it can make better decisions which process
> : to kill.
> :
> : So, question to everyone: What do you think about this approach?
> :
> : My biggest concern right now is that the patches are messing with a
> : core kernel structure (adding a field to struct file). Any better
> : idea? I'm considering putting a callback into file_ops instead.

Hello!

I wonder if groupoom (aka the cgroup-aware OOM killer) can work for you?
We do have kmem accounting on the memory cgroup level, and the
cgroup-aware OOM selection logic takes the cgroup's kmem size into
account. So you don't need to introduce another accounting mechanism for
OOM. You can find the current implementation in the mm tree.

Thanks!

Roman
Re: [RFC] Per file OOM badness
On Fri 19-01-18 17:54:36, Christian König wrote: > Am 19.01.2018 um 13:20 schrieb Michal Hocko: > > On Fri 19-01-18 13:13:51, Michal Hocko wrote: > > > On Fri 19-01-18 12:37:51, Christian König wrote: > > > [...] > > > > The per file descriptor badness is/was just the much easier approach to > > > > solve the issue, because the drivers already knew which client is > > > > currently > > > > using which buffer objects. > > > > > > > > I of course agree that file descriptors can be shared between processes > > > > and > > > > are by themselves not killable. But at least for our graphics driven use > > > > case I don't see much of a problem killing all processes when a file > > > > descriptor is used by more than one at the same time. > > > Ohh, I absolutely see why you have chosen this way for your particular > > > usecase. I am just arguing that this would rather be more generic to be > > > merged. If there is absolutely no other way around we can consider it > > > but right now I do not see that all other options have been considered > > > properly. Especially when the fd based approach is basically wrong for > > > almost anybody else. > > And more importantly. Iterating over _all_ fd which is what is your > > approach is based on AFAIU is not acceptable for the OOM path. Even > > though oom_badness is not a hot path we do not really want it to take a > > lot of time either. Even the current iteration over all processes is > > quite time consuming. Now you want to add the number of opened files and > > that might be quite many per process. > > Mhm, crap that is a really good argument. > > How about adding a linked list of callbacks to check for the OOM killer to > check for each process? > > This way we can avoid finding the process where we need to account things on > when memory is allocated and still allow the OOM killer to only check the > specific callbacks it needs to determine the score of a process? 
I might be oversimplifying but there really has to be a boundary when you have the target user context, no? Then do the accounting when you get the data to the user. -- Michal Hocko SUSE Labs
Re: [RFC] Per file OOM badness
On 01/22/2018 06:23 PM, Andrew Morton wrote: On Thu, 18 Jan 2018 11:47:48 -0500 Andrey Grodzovsky wrote: Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just addressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file. Should be in address_space_operations, I suspect. If an application opens a file twice, we only want to count it once? Makes sense. But we're putting the cart ahead of the horse here. Please provide us with a detailed description of the problem which you are addressing so that the MM developers can better consider how to address your requirements. I will just reiterate the problem statement from the original RFC, should have put it in the body of the RFC and not just a link, as already commented by Michal. Below is the quoted RFC. Thanks, Andrey P.S. You can also check the follow-up discussion after this first email. " I'm currently working on the issue that when device drivers allocate memory on behalf of an application the OOM killer usually doesn't know about that unless the application also gets this memory mapped into their address space. This is especially annoying for graphics drivers where a lot of the VRAM usually isn't CPU accessible and so doesn't make sense to map into the address space of the process using it. The problem now is that when an application starts to use a lot of VRAM those buffer objects sooner or later get swapped out to system memory, but when we now run into an out of memory situation the OOM killer obviously doesn't know anything about that memory and so usually kills the wrong process "
Re: [RFC] Per file OOM badness
On Thu, 18 Jan 2018 11:47:48 -0500 Andrey Grodzovsky wrote: > Hi, this series is a revised version of an RFC sent by Christian König > a few years ago. The original RFC can be found at > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html > > This is the same idea and I've just addressed his concern from the original RFC > and switched to a callback into file_ops instead of a new member in struct file. Should be in address_space_operations, I suspect. If an application opens a file twice, we only want to count it once? But we're putting the cart ahead of the horse here. Please provide us with a detailed description of the problem which you are addressing so that the MM developers can better consider how to address your requirements.
Re: [RFC] Per file OOM badness
Michel Dänzer writes: > On 2018-01-19 11:02 AM, Michel Dänzer wrote: >> On 2018-01-19 10:58 AM, Christian König wrote: >>> Am 19.01.2018 um 10:32 schrieb Michel Dänzer: On 2018-01-19 09:39 AM, Christian König wrote: > Am 19.01.2018 um 09:20 schrieb Michal Hocko: >> OK, in that case I would propose a different approach. We already >> have rss_stat. So why do not we simply add a new counter there >> MM_KERNELPAGES and consider those in oom_badness? The rule would be >> that such a memory is bound to the process life time. I guess we will >> find more users for this later. > I already tried that and the problem with that approach is that some > buffers are not created by the application which actually uses them. > > For example X/Wayland is creating and handing out render buffers to > application which want to use OpenGL. > > So the result is when you always account the application who created the > buffer the OOM killer will certainly reap X/Wayland first. And that is > exactly what we want to avoid here. FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland anymore. With DRI3 and Wayland, buffers are allocated by the clients and then shared with the X / Wayland server. >>> >>> Good point, when I initially looked at that problem DRI3 wasn't widely >>> used yet. >>> Also, in all cases, the amount of memory allocated for buffers shared between DRI/Wayland clients and the server should be relatively small compared to the amount of memory allocated for buffers used only locally in the client, particularly for clients which create significant memory pressure. >>> >>> That is unfortunately only partially true. When you have a single >>> runaway application which tries to allocate everything it would indeed >>> work as you described. >>> >>> But when I tested this a few years ago with X based desktop the >>> applications which actually used most of the memory where Firefox and >>> Thunderbird. Unfortunately they never got accounted for that. 
>>> >>> Now, on my current Wayland based desktop it actually doesn't look much >>> better. Taking a look at radeon_gem_info/amdgpu_gem_info the majority of >>> all memory was allocated either by gnome-shell or Xwayland. >> >> My guess would be this is due to pixmaps, which allow X clients to cause >> the X server to allocate essentially unlimited amounts of memory. It's a >> separate issue, which would require a different solution than what we're >> discussing in this thread. Maybe something that would allow the X server >> to tell the kernel that some of the memory it allocates is for the >> client process. > > Of course, such a mechanism could probably be abused to incorrectly > blame other processes for one's own memory consumption... > > > I'm not sure if the pixmap issue can be solved for the OOM killer. It's > an X design issue which is fixed with Wayland. So it's probably better > to ignore it for this discussion. > > Also, I really think the issue with DRM buffers being shared between > processes isn't significant for the OOM killer compared to DRM buffers > only used in the same process that allocates them. So I suggest focusing > on the latter. Agreed. The 95% case is non-shared buffers, so just don't account for them and we'll have a solution good enough that we probably never need to handle the shared case. On the DRM side, removing buffers from the accounting once they get shared would be easy.
Re: [RFC] Per file OOM badness
Am 19.01.2018 um 13:20 schrieb Michal Hocko: On Fri 19-01-18 13:13:51, Michal Hocko wrote: On Fri 19-01-18 12:37:51, Christian König wrote: [...] The per file descriptor badness is/was just the much easier approach to solve the issue, because the drivers already knew which client is currently using which buffer objects. I of course agree that file descriptors can be shared between processes and are by themselves not killable. But at least for our graphics driven use case I don't see much of a problem killing all processes when a file descriptor is used by more than one at the same time. Ohh, I absolutely see why you have chosen this way for your particular usecase. I am just arguing that this would rather be more generic to be merged. If there is absolutely no other way around we can consider it but right now I do not see that all other options have been considered properly. Especially when the fd based approach is basically wrong for almost anybody else. And more importantly. Iterating over _all_ fd which is what is your approach is based on AFAIU is not acceptable for the OOM path. Even though oom_badness is not a hot path we do not really want it to take a lot of time either. Even the current iteration over all processes is quite time consuming. Now you want to add the number of opened files and that might be quite many per process. Mhm, crap that is a really good argument. How about adding a linked list of callbacks to check for the OOM killer to check for each process? This way we can avoid finding the process where we need to account things on when memory is allocated and still allow the OOM killer to only check the specific callbacks it needs to determine the score of a process? Would still require some changes in the fs layer, but I think that should be doable. Regards, Christian.
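[Editor's note] Christian's callback-list idea can be sketched as a small user-space toy model. None of the names below are real kernel interfaces; this is only an illustration of the shape of the proposal: subsystems register a per-process callback, and the OOM killer sums callback results instead of walking file descriptors.

```c
#include <stddef.h>

/* Toy model: each process carries a list of badness callbacks registered
 * by subsystems (e.g. a DRM driver), so the OOM killer can sum
 * driver-known memory without iterating every fd. All names are
 * hypothetical, not actual kernel APIs. */

struct oom_badness_cb {
    struct oom_badness_cb *next;
    unsigned long (*pages)(void *data); /* pages attributable to the process */
    void *data;
};

struct toy_process {
    unsigned long rss_pages;    /* memory already visible to the OOM killer */
    struct oom_badness_cb *cbs; /* driver-registered callbacks */
};

static void register_badness_cb(struct toy_process *p, struct oom_badness_cb *cb)
{
    cb->next = p->cbs;
    p->cbs = cb;
}

/* Badness is plain RSS plus whatever each registered callback reports. */
static unsigned long toy_oom_badness(const struct toy_process *p)
{
    unsigned long total = p->rss_pages;
    for (const struct oom_badness_cb *cb = p->cbs; cb; cb = cb->next)
        total += cb->pages(cb->data);
    return total;
}
```

As Christian notes, the open question is where such a list would live and how the fs layer would keep it in sync with process lifetime.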
Re: [RFC] Per file OOM badness
On 2018-01-19 12:37 PM, Christian König wrote: > Am 19.01.2018 um 11:40 schrieb Michal Hocko: >> On Fri 19-01-18 09:39:03, Christian König wrote: >>> Am 19.01.2018 um 09:20 schrieb Michal Hocko: >> [...] OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. >>> I already tried that and the problem with that approach is that some >>> buffers >>> are not created by the application which actually uses them. >>> >>> For example X/Wayland is creating and handing out render buffers to >>> application which want to use OpenGL. >>> >>> So the result is when you always account the application who created the >>> buffer the OOM killer will certainly reap X/Wayland first. And that is >>> exactly what we want to avoid here. >> Then you have to find the target allocation context at the time of the >> allocation and account it. > > And exactly that's the root of the problem: The target allocation > context isn't known at the time of the allocation. > > We could add callbacks so that when the memory is passed from the > allocator to the actual user of the memory. In other words when the > memory is passed from the X server to the client the driver would need > to decrement the X servers accounting and increment the clients accounting. > > But I think that would go deep into the file descriptor handling (we > would at least need to handle dup/dup2 and passing the fd using unix > domain sockets) and most likely would be rather error prone. > > The per file descriptor badness is/was just the much easier approach to > solve the issue, because the drivers already knew which client is > currently using which buffer objects. > > I of course agree that file descriptors can be shared between processes > and are by themselves not killable. 
But at least for our graphics driven > use case I don't see much of a problem killing all processes when a file > descriptor is used by more than one at the same time. In that case, accounting a BO as suggested by Michal above, in every process that shares it, should work fine, shouldn't it? The OOM killer will first select the process which has more memory accounted for other things than the BOs shared with another process. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
On Fri 19-01-18 12:37:51, Christian König wrote: [...] > The per file descriptor badness is/was just the much easier approach to > solve the issue, because the drivers already knew which client is currently > using which buffer objects. > > I of course agree that file descriptors can be shared between processes and > are by themselves not killable. But at least for our graphics driven use > case I don't see much of a problem killing all processes when a file > descriptor is used by more than one at the same time. Ohh, I absolutely see why you have chosen this way for your particular usecase. I am just arguing that this would rather be more generic to be merged. If there is absolutely no other way around we can consider it but right now I do not see that all other options have been considered properly. Especially when the fd based approach is basically wrong for almost anybody else. -- Michal Hocko SUSE Labs
Re: [RFC] Per file OOM badness
On Fri 19-01-18 13:13:51, Michal Hocko wrote: > On Fri 19-01-18 12:37:51, Christian König wrote: > [...] > > The per file descriptor badness is/was just the much easier approach to > > solve the issue, because the drivers already knew which client is currently > > using which buffer objects. > > > > I of course agree that file descriptors can be shared between processes and > > are by themselves not killable. But at least for our graphics driven use > > case I don't see much of a problem killing all processes when a file > > descriptor is used by more than one at the same time. > > Ohh, I absolutely see why you have chosen this way for your particular > usecase. I am just arguing that this would rather be more generic to be > merged. If there is absolutely no other way around we can consider it > but right now I do not see that all other options have been considered > properly. Especially when the fd based approach is basically wrong for > almost anybody else. And more importantly. Iterating over _all_ fd which is what is your approach is based on AFAIU is not acceptable for the OOM path. Even though oom_badness is not a hot path we do not really want it to take a lot of time either. Even the current iteration over all processes is quite time consuming. Now you want to add the number of opened files and that might be quite many per process. -- Michal Hocko SUSE Labs
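[Editor's note] Michal's complexity argument can be made concrete with a toy model: walking every fd of every OOM candidate is O(processes × fds), while a counter maintained at allocation time keeps the OOM scan at O(processes). Everything below is made up for illustration; nothing is a real kernel interface.

```c
#include <stddef.h>

struct toy_file { unsigned long driver_pages; };

struct toy_proc {
    struct toy_file *files;
    size_t nr_files;
    unsigned long kernel_pages; /* updated whenever a driver (de)allocates */
};

/* The fd-walking variant: what oom_badness() would have to do per candidate. */
static unsigned long score_by_fd_walk(const struct toy_proc *p)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < p->nr_files; i++)
        sum += p->files[i].driver_pages;
    return sum;
}

/* The counter variant: constant time per candidate at OOM time. */
static unsigned long score_by_counter(const struct toy_proc *p)
{
    return p->kernel_pages;
}

/* Allocation-time accounting keeps the two views in agreement. */
static void toy_charge(struct toy_proc *p, struct toy_file *f, unsigned long pages)
{
    f->driver_pages += pages;
    p->kernel_pages += pages;
}
```

The two scores are identical by construction; the difference is purely where the cost is paid, at allocation time or during the OOM scan.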
Re: [RFC] Per file OOM badness
On Fri 19-01-18 09:39:03, Christian König wrote: > Am 19.01.2018 um 09:20 schrieb Michal Hocko: [...] > > OK, in that case I would propose a different approach. We already > > have rss_stat. So why do not we simply add a new counter there > > MM_KERNELPAGES and consider those in oom_badness? The rule would be > > that such a memory is bound to the process life time. I guess we will > > find more users for this later. > > I already tried that and the problem with that approach is that some buffers > are not created by the application which actually uses them. > > For example X/Wayland is creating and handing out render buffers to > application which want to use OpenGL. > > So the result is when you always account the application who created the > buffer the OOM killer will certainly reap X/Wayland first. And that is > exactly what we want to avoid here. Then you have to find the target allocation context at the time of the allocation and account it. As follow up emails show, implementations might differ and any robust oom solution has to rely on the common counters. -- Michal Hocko SUSE Labs
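[Editor's note] A rough user-space sketch of the rss_stat idea discussed above: the kernel keeps a small array of per-mm counters (MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, MM_SHMEMPAGES), and the proposal is to add an MM_KERNELPAGES slot that drivers charge when they allocate on behalf of a process, so oom_badness() picks it up along with the rest. The layout and helpers here are simplified stand-ins, not the actual kernel definitions.

```c
/* Simplified model of mm->rss_stat with the proposed extra slot. */
enum toy_mm_counter {
    MM_FILEPAGES,
    MM_ANONPAGES,
    MM_SWAPENTS,
    MM_SHMEMPAGES,
    MM_KERNELPAGES, /* the proposed new counter */
    NR_MM_COUNTERS,
};

struct toy_mm { long rss_stat[NR_MM_COUNTERS]; };

/* Drivers would charge/uncharge at allocation and free time. */
static void add_mm_counter(struct toy_mm *mm, enum toy_mm_counter member, long value)
{
    mm->rss_stat[member] += value;
}

/* Modeled loosely on how oom_badness() folds RSS-style counters into a score. */
static long toy_badness_pages(const struct toy_mm *mm)
{
    long total = 0;
    for (int i = 0; i < NR_MM_COUNTERS; i++)
        total += mm->rss_stat[i];
    return total;
}
```

The "bound to the process life time" rule maps to the requirement that every charge is undone when the process exits, so the counter never leaks.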
Re: [RFC] Per file OOM badness
On 2018-01-19 11:02 AM, Michel Dänzer wrote: > On 2018-01-19 10:58 AM, Christian König wrote: >> Am 19.01.2018 um 10:32 schrieb Michel Dänzer: >>> On 2018-01-19 09:39 AM, Christian König wrote: Am 19.01.2018 um 09:20 schrieb Michal Hocko: > OK, in that case I would propose a different approach. We already > have rss_stat. So why do not we simply add a new counter there > MM_KERNELPAGES and consider those in oom_badness? The rule would be > that such a memory is bound to the process life time. I guess we will > find more users for this later. I already tried that and the problem with that approach is that some buffers are not created by the application which actually uses them. For example X/Wayland is creating and handing out render buffers to application which want to use OpenGL. So the result is when you always account the application who created the buffer the OOM killer will certainly reap X/Wayland first. And that is exactly what we want to avoid here. >>> FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland >>> anymore. With DRI3 and Wayland, buffers are allocated by the clients and >>> then shared with the X / Wayland server. >> >> Good point, when I initially looked at that problem DRI3 wasn't widely >> used yet. >> >>> Also, in all cases, the amount of memory allocated for buffers shared >>> between DRI/Wayland clients and the server should be relatively small >>> compared to the amount of memory allocated for buffers used only locally >>> in the client, particularly for clients which create significant memory >>> pressure. >> >> That is unfortunately only partially true. When you have a single >> runaway application which tries to allocate everything it would indeed >> work as you described. >> >> But when I tested this a few years ago with X based desktop the >> applications which actually used most of the memory where Firefox and >> Thunderbird. Unfortunately they never got accounted for that. 
>> >> Now, on my current Wayland based desktop it actually doesn't look much >> better. Taking a look at radeon_gem_info/amdgpu_gem_info the majority of >> all memory was allocated either by gnome-shell or Xwayland. > > My guess would be this is due to pixmaps, which allow X clients to cause > the X server to allocate essentially unlimited amounts of memory. It's a > separate issue, which would require a different solution than what we're > discussing in this thread. Maybe something that would allow the X server > to tell the kernel that some of the memory it allocates is for the > client process. Of course, such a mechanism could probably be abused to incorrectly blame other processes for one's own memory consumption... I'm not sure if the pixmap issue can be solved for the OOM killer. It's an X design issue which is fixed with Wayland. So it's probably better to ignore it for this discussion. Also, I really think the issue with DRM buffers being shared between processes isn't significant for the OOM killer compared to DRM buffers only used in the same process that allocates them. So I suggest focusing on the latter. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
Am 19.01.2018 um 11:40 schrieb Michal Hocko: On Fri 19-01-18 09:39:03, Christian König wrote: Am 19.01.2018 um 09:20 schrieb Michal Hocko: [...] OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. I already tried that and the problem with that approach is that some buffers are not created by the application which actually uses them. For example X/Wayland is creating and handing out render buffers to application which want to use OpenGL. So the result is when you always account the application who created the buffer the OOM killer will certainly reap X/Wayland first. And that is exactly what we want to avoid here. Then you have to find the target allocation context at the time of the allocation and account it. And exactly that's the root of the problem: The target allocation context isn't known at the time of the allocation. We could add callbacks so that when the memory is passed from the allocator to the actual user of the memory. In other words when the memory is passed from the X server to the client the driver would need to decrement the X servers accounting and increment the clients accounting. But I think that would go deep into the file descriptor handling (we would at least need to handle dup/dup2 and passing the fd using unix domain sockets) and most likely would be rather error prone. The per file descriptor badness is/was just the much easier approach to solve the issue, because the drivers already knew which client is currently using which buffer objects. I of course agree that file descriptors can be shared between processes and are by themselves not killable. 
But at least for our graphics driven use case I don't see much of a problem killing all processes when a file descriptor is used by more than one at the same time. Regards, Christian. As follow up emails show, implementations might differ and any robust oom solution has to rely on the common counters.
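[Editor's note] The handoff problem Christian describes above can be reduced to a tiny sketch: if kernel-page charges live with the allocating process, passing a buffer to another process (dup/dup2, or an fd sent over a unix domain socket) would have to move the charge as well. The names are hypothetical; the point is that every transfer path in the fd handling would need a hook like this, which is where the error-proneness comes from.

```c
/* Toy per-task charge, standing in for an MM_KERNELPAGES-style counter. */
struct toy_task { long charged_pages; };

/* Move the charge for a shared buffer from the allocator (e.g. the X server
 * under DRI2) to the process that actually uses it. */
static void toy_transfer_charge(struct toy_task *from, struct toy_task *to, long pages)
{
    from->charged_pages -= pages;
    to->charged_pages += pages;
}
```

Any path that hands over the buffer without calling the transfer hook leaves the books wrong, which is why Christian considers the scheme fragile.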
Re: [RFC] Per file OOM badness
On 2018-01-19 10:58 AM, Christian König wrote: > Am 19.01.2018 um 10:32 schrieb Michel Dänzer: >> On 2018-01-19 09:39 AM, Christian König wrote: >>> Am 19.01.2018 um 09:20 schrieb Michal Hocko: On Thu 18-01-18 12:01:32, Eric Anholt wrote: > Michal Hocko writes: > >> On Thu 18-01-18 18:00:06, Michal Hocko wrote: >>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file. >>> Please add the full description to the cover letter and do not make >>> people hunt links. >>> >>> Here is the origin cover letter text >>> : I'm currently working on the issue that when device drivers >>> allocate memory on >>> : behalf of an application the OOM killer usually doesn't knew >>> about that unless >>> : the application also get this memory mapped into their address >>> space. >>> : >>> : This is especially annoying for graphics drivers where a lot of >>> the VRAM >>> : usually isn't CPU accessible and so doesn't make sense to map >>> into the >>> : address space of the process using it. >>> : >>> : The problem now is that when an application starts to use a lot >>> of VRAM those >>> : buffers objects sooner or later get swapped out to system memory, >>> but when we >>> : now run into an out of memory situation the OOM killer obviously >>> doesn't knew >>> : anything about that memory and so usually kills the wrong process. >> OK, but how do you attribute that memory to a particular OOM killable >> entity? And how do you actually enforce that those resources get >> freed >> on the oom killer action? 
>> >>> : The following set of patches tries to address this problem by >>> introducing a per >>> : file OOM badness score, which device drivers can use to give the >>> OOM killer a >>> : hint how many resources are bound to a file descriptor so that it >>> can make >>> : better decisions which process to kill. >> But files are not killable, they can be shared... In other words this >> doesn't help the oom killer to make an educated guess at all. > Maybe some more context would help the discussion? > > The struct file in patch 3 is the DRM fd. That's effectively "my > process's interface to talking to the GPU" not "a single GPU > resource". > Once that file is closed, all of the process's private, idle GPU > buffers > will be immediately freed (this will be most of their allocations), > and > some will be freed once the GPU completes some work (this will be most > of the rest of their allocations). > > Some GEM BOs won't be freed just by closing the fd, if they've been > shared between processes. Those are usually about 8-24MB total in a > process, rather than the GBs that modern apps use (or that our > testcases > like to allocate and thus trigger oomkilling of the test harness > instead > of the offending testcase...) > > Even if we just had the private+idle buffers being accounted in OOM > badness, that would be a huge step forward in system reliability. OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. >>> I already tried that and the problem with that approach is that some >>> buffers are not created by the application which actually uses them. >>> >>> For example X/Wayland is creating and handing out render buffers to >>> application which want to use OpenGL. 
>>> So the result is when you always account the application who created the >>> buffer the OOM killer will certainly reap X/Wayland first. And that is >>> exactly what we want to avoid here. >> FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland >> anymore. With DRI3 and Wayland, buffers are allocated by the clients and >> then shared with the X / Wayland server. > > Good point, when I initially looked at that problem DRI3 wasn't widely > used yet. > >> Also, in all cases, the amount of memory allocated for buffers shared >> between DRI/Wayland clients and the server should be relatively small >> compared to the amount of memory allocated for buffers used only locally >> in the client, particularly for clients which create significant memory >> pressure. > > That is unfortunately only partially true.
Re: [RFC] Per file OOM badness
On 2018年01月19日 16:25, Michal Hocko wrote: [removed the broken quoting - please try to use an email client which doesn't mess up the quoted text] On Fri 19-01-18 06:01:26, He, Roger wrote: [...] I think you are misunderstanding here. Actually for now, the memory in TTM Pools already has mm_shrink which is implemented in ttm_pool_mm_shrink_init. And here the memory we want to make it contribute to OOM badness is not in TTM Pools. Because when a TTM buffer allocation succeeds, the memory already is removed from TTM Pools. I have no idea what TTM buffers are. But this smells like something rather specific to the particular subsystem. And my main objection here is that struct file is not a proper vehicle to carry such information. So whatever the TTM subsystem does it should contribute to generic counters rather than abuse fd because it happens to use it to communicate with userspace. got it. thanks.
Re: [RFC] Per file OOM badness
Am 19.01.2018 um 10:32 schrieb Michel Dänzer: On 2018-01-19 09:39 AM, Christian König wrote: Am 19.01.2018 um 09:20 schrieb Michal Hocko: On Thu 18-01-18 12:01:32, Eric Anholt wrote: Michal Hocko writes: On Thu 18-01-18 18:00:06, Michal Hocko wrote: On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file. Please add the full description to the cover letter and do not make people hunt links. Here is the origin cover letter text : I'm currently working on the issue that when device drivers allocate memory on : behalf of an application the OOM killer usually doesn't knew about that unless : the application also get this memory mapped into their address space. : : This is especially annoying for graphics drivers where a lot of the VRAM : usually isn't CPU accessible and so doesn't make sense to map into the : address space of the process using it. : : The problem now is that when an application starts to use a lot of VRAM those : buffers objects sooner or later get swapped out to system memory, but when we : now run into an out of memory situation the OOM killer obviously doesn't knew : anything about that memory and so usually kills the wrong process. OK, but how do you attribute that memory to a particular OOM killable entity? And how do you actually enforce that those resources get freed on the oom killer action? : The following set of patches tries to address this problem by introducing a per : file OOM badness score, which device drivers can use to give the OOM killer a : hint how many resources are bound to a file descriptor so that it can make : better decisions which process to kill. 
But files are not killable, they can be shared... In other words this doesn't help the oom killer to make an educated guess at all. Maybe some more context would help the discussion? The struct file in patch 3 is the DRM fd. That's effectively "my process's interface to talking to the GPU" not "a single GPU resource". Once that file is closed, all of the process's private, idle GPU buffers will be immediately freed (this will be most of their allocations), and some will be freed once the GPU completes some work (this will be most of the rest of their allocations). Some GEM BOs won't be freed just by closing the fd, if they've been shared between processes. Those are usually about 8-24MB total in a process, rather than the GBs that modern apps use (or that our testcases like to allocate and thus trigger oomkilling of the test harness instead of the offending testcase...) Even if we just had the private+idle buffers being accounted in OOM badness, that would be a huge step forward in system reliability. OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. I already tried that and the problem with that approach is that some buffers are not created by the application which actually uses them. For example X/Wayland is creating and handing out render buffers to application which want to use OpenGL. So the result is when you always account the application who created the buffer the OOM killer will certainly reap X/Wayland first. And that is exactly what we want to avoid here. FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland anymore. With DRI3 and Wayland, buffers are allocated by the clients and then shared with the X / Wayland server. 
Good point, when I initially looked at that problem DRI3 wasn't widely used yet.

> Also, in all cases, the amount of memory allocated for buffers shared between DRI/Wayland clients and the server should be relatively small compared to the amount of memory allocated for buffers used only locally in the client, particularly for clients which create significant memory pressure.

That is unfortunately only partially true. When you have a single runaway application which tries to allocate everything, it would indeed work as you described. But when I tested this a few years ago with an X-based desktop, the applications which actually used most of the memory were Firefox and Thunderbird. Unfortunately they never got accounted for that. Now, on my current Wayland-based desktop it actually doesn't look much better. Taking a look at radeon_gem_info/amdgpu_gem_info, the majority of all memory was allocated either by gnome-shell or Xwayland. Regards, Christian. ___ dri-devel mailing list dri-devel@lists.freedesktop.org
Re: [RFC] Per file OOM badness
On 2018-01-19 09:39 AM, Christian König wrote: > Am 19.01.2018 um 09:20 schrieb Michal Hocko: >> On Thu 18-01-18 12:01:32, Eric Anholt wrote: >>> Michal Hocko writes: >>> On Thu 18-01-18 18:00:06, Michal Hocko wrote: > On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: >> Hi, this series is a revised version of an RFC sent by Christian >> König >> a few years ago. The original RFC can be found at >> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html >> >> >> This is the same idea and I've just adressed his concern from the >> original RFC >> and switched to a callback into file_ops instead of a new member >> in struct file. > Please add the full description to the cover letter and do not make > people hunt links. > > Here is the origin cover letter text > : I'm currently working on the issue that when device drivers > allocate memory on > : behalf of an application the OOM killer usually doesn't knew > about that unless > : the application also get this memory mapped into their address > space. > : > : This is especially annoying for graphics drivers where a lot of > the VRAM > : usually isn't CPU accessible and so doesn't make sense to map > into the > : address space of the process using it. > : > : The problem now is that when an application starts to use a lot > of VRAM those > : buffers objects sooner or later get swapped out to system memory, > but when we > : now run into an out of memory situation the OOM killer obviously > doesn't knew > : anything about that memory and so usually kills the wrong process. OK, but how do you attribute that memory to a particular OOM killable entity? And how do you actually enforce that those resources get freed on the oom killer action? 
> : The following set of patches tries to address this problem by > introducing a per > : file OOM badness score, which device drivers can use to give the > OOM killer a > : hint how many resources are bound to a file descriptor so that it > can make > : better decisions which process to kill. But files are not killable, they can be shared... In other words this doesn't help the oom killer to make an educated guess at all. >>> Maybe some more context would help the discussion? >>> >>> The struct file in patch 3 is the DRM fd. That's effectively "my >>> process's interface to talking to the GPU" not "a single GPU resource". >>> Once that file is closed, all of the process's private, idle GPU buffers >>> will be immediately freed (this will be most of their allocations), and >>> some will be freed once the GPU completes some work (this will be most >>> of the rest of their allocations). >>> >>> Some GEM BOs won't be freed just by closing the fd, if they've been >>> shared between processes. Those are usually about 8-24MB total in a >>> process, rather than the GBs that modern apps use (or that our testcases >>> like to allocate and thus trigger oomkilling of the test harness instead >>> of the offending testcase...) >>> >>> Even if we just had the private+idle buffers being accounted in OOM >>> badness, that would be a huge step forward in system reliability. >> OK, in that case I would propose a different approach. We already >> have rss_stat. So why do not we simply add a new counter there >> MM_KERNELPAGES and consider those in oom_badness? The rule would be >> that such a memory is bound to the process life time. I guess we will >> find more users for this later. > > I already tried that and the problem with that approach is that some > buffers are not created by the application which actually uses them. > > For example X/Wayland is creating and handing out render buffers to > application which want to use OpenGL. 
> > So the result is when you always account the application who created the > buffer the OOM killer will certainly reap X/Wayland first. And that is > exactly what we want to avoid here. FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland anymore. With DRI3 and Wayland, buffers are allocated by the clients and then shared with the X / Wayland server. Also, in all cases, the amount of memory allocated for buffers shared between DRI/Wayland clients and the server should be relatively small compared to the amount of memory allocated for buffers used only locally in the client, particularly for clients which create significant memory pressure. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
On Thu 18-01-18 18:00:06, Michal Hocko wrote: > On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: > > Hi, this series is a revised version of an RFC sent by Christian König > > a few years ago. The original RFC can be found at > > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html > > > > This is the same idea and I've just adressed his concern from the original > > RFC > > and switched to a callback into file_ops instead of a new member in struct > > file. > > Please add the full description to the cover letter and do not make > people hunt links. > > Here is the origin cover letter text > : I'm currently working on the issue that when device drivers allocate memory > on > : behalf of an application the OOM killer usually doesn't knew about that > unless > : the application also get this memory mapped into their address space. > : > : This is especially annoying for graphics drivers where a lot of the VRAM > : usually isn't CPU accessible and so doesn't make sense to map into the > : address space of the process using it. > : > : The problem now is that when an application starts to use a lot of VRAM > those > : buffers objects sooner or later get swapped out to system memory, but when > we > : now run into an out of memory situation the OOM killer obviously doesn't > knew > : anything about that memory and so usually kills the wrong process. OK, but how do you attribute that memory to a particular OOM killable entity? And how do you actually enforce that those resources get freed on the oom killer action? > : The following set of patches tries to address this problem by introducing a > per > : file OOM badness score, which device drivers can use to give the OOM killer > a > : hint how many resources are bound to a file descriptor so that it can make > : better decisions which process to kill. But files are not killable, they can be shared... In other words this doesn't help the oom killer to make an educated guess at all. 
> : > : So question at every one: What do you think about this approach? I think this is just wrong semantically. Non-reclaimable memory is a pain, especially when there is way too much of it. If you can free that memory somehow then you can hook into the slab shrinker API and react on memory pressure. If you can account such memory to a particular process and make sure that the consumption is bound by the process lifetime then we can think of an accounting that oom_badness can consider when selecting a victim. -- Michal Hocko SUSE Labs
Re: [RFC] Per file OOM badness
[removed the broken quoting - please try to use an email client which doesn't mess up the quoted text] On Fri 19-01-18 06:01:26, He, Roger wrote: [...] > I think you are misunderstanding here. > Actually for now, the memory in TTM Pools already has mm_shrink which is > implemented in ttm_pool_mm_shrink_init. > And here the memory we want to make it contribute to OOM badness is not in > TTM Pools. > Because when TTM buffer allocation success, the memory already is removed > from TTM Pools. I have no idea what TTM buffers are. But this smells like something rather specific to the particular subsystem. And my main objection here is that struct file is not a proper vehicle to carry such information. So whatever the TTM subsystem does it should contribute to generic counters rather than abuse the fd because it happens to use it to communicate with userspace. -- Michal Hocko SUSE Labs
Re: [RFC] Per file OOM badness
On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: > Hi, this series is a revised version of an RFC sent by Christian König > a few years ago. The original RFC can be found at > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html > > This is the same idea and I've just adressed his concern from the original > RFC > and switched to a callback into file_ops instead of a new member in struct > file. Please add the full description to the cover letter and do not make people hunt links. Here is the original cover letter text : I'm currently working on the issue that when device drivers allocate memory on : behalf of an application the OOM killer usually doesn't knew about that unless : the application also get this memory mapped into their address space. : : This is especially annoying for graphics drivers where a lot of the VRAM : usually isn't CPU accessible and so doesn't make sense to map into the : address space of the process using it. : : The problem now is that when an application starts to use a lot of VRAM those : buffers objects sooner or later get swapped out to system memory, but when we : now run into an out of memory situation the OOM killer obviously doesn't knew : anything about that memory and so usually kills the wrong process. : : The following set of patches tries to address this problem by introducing a per : file OOM badness score, which device drivers can use to give the OOM killer a : hint how many resources are bound to a file descriptor so that it can make : better decisions which process to kill. : : So question at every one: What do you think about this approach? : : My biggest concern right now is the patches are messing with a core kernel : structure (adding a field to struct file). Any better idea? I'm considering : to put a callback into file_ops instead. -- Michal Hocko SUSE Labs
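The mechanism the cover letter describes can be sketched in plain C. This is a userspace model, not the actual kernel code: the struct layouts, the callback name `oom_file_badness`, and the helper names are simplified stand-ins for illustration. The idea is a per-file callback that reports the driver memory bound to a file, summed over a process's fd table by the OOM pass.

```c
/* Userspace model of the RFC's per-file badness callback (illustrative
 * names and layouts, not the kernel's real structures). */
#include <stddef.h>

struct file; /* forward declaration for the callback signature */

struct file_operations {
    /* proposed callback: pages of driver memory bound to this file */
    long (*oom_file_badness)(struct file *file);
};

struct file {
    const struct file_operations *f_op;
    long driver_pages; /* toy backing state for the demo driver below */
};

/* What a DRM-like driver would report for its fd. */
static long drm_oom_file_badness(struct file *file)
{
    return file->driver_pages;
}

static const struct file_operations drm_fops = {
    .oom_file_badness = drm_oom_file_badness,
};

/* What the OOM killer would do per candidate process: walk the fd
 * table and add up whatever each file's callback reports. Files
 * without the callback contribute nothing. */
static long files_oom_badness(struct file **fdt, size_t max_fds)
{
    long pages = 0;
    for (size_t i = 0; i < max_fds; i++)
        if (fdt[i] && fdt[i]->f_op && fdt[i]->f_op->oom_file_badness)
            pages += fdt[i]->f_op->oom_file_badness(fdt[i]);
    return pages;
}
```

A process holding a DRM fd that reports 512 pages would thus be charged 512 pages on top of its normal RSS, while ordinary files change nothing; Michal's objection below is that the file, unlike the mm, is not tied to one killable process.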
Re: [RFC] Per file OOM badness
On Thu 18-01-18 12:01:32, Eric Anholt wrote: > Michal Hocko writes: > > > On Thu 18-01-18 18:00:06, Michal Hocko wrote: > >> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: > >> > Hi, this series is a revised version of an RFC sent by Christian König > >> > a few years ago. The original RFC can be found at > >> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html > >> > > >> > This is the same idea and I've just adressed his concern from the > >> > original RFC > >> > and switched to a callback into file_ops instead of a new member in > >> > struct file. > >> > >> Please add the full description to the cover letter and do not make > >> people hunt links. > >> > >> Here is the origin cover letter text > >> : I'm currently working on the issue that when device drivers allocate > >> memory on > >> : behalf of an application the OOM killer usually doesn't knew about that > >> unless > >> : the application also get this memory mapped into their address space. > >> : > >> : This is especially annoying for graphics drivers where a lot of the VRAM > >> : usually isn't CPU accessible and so doesn't make sense to map into the > >> : address space of the process using it. > >> : > >> : The problem now is that when an application starts to use a lot of VRAM > >> those > >> : buffers objects sooner or later get swapped out to system memory, but > >> when we > >> : now run into an out of memory situation the OOM killer obviously doesn't > >> knew > >> : anything about that memory and so usually kills the wrong process. > > > > OK, but how do you attribute that memory to a particular OOM killable > > entity? And how do you actually enforce that those resources get freed > > on the oom killer action? 
> > > >> : The following set of patches tries to address this problem by > >> introducing a per > >> : file OOM badness score, which device drivers can use to give the OOM > >> killer a > >> : hint how many resources are bound to a file descriptor so that it can > >> make > >> : better decisions which process to kill. > > > > But files are not killable, they can be shared... In other words this > > doesn't help the oom killer to make an educated guess at all. > > Maybe some more context would help the discussion? > > The struct file in patch 3 is the DRM fd. That's effectively "my > process's interface to talking to the GPU" not "a single GPU resource". > Once that file is closed, all of the process's private, idle GPU buffers > will be immediately freed (this will be most of their allocations), and > some will be freed once the GPU completes some work (this will be most > of the rest of their allocations). > > Some GEM BOs won't be freed just by closing the fd, if they've been > shared between processes. Those are usually about 8-24MB total in a > process, rather than the GBs that modern apps use (or that our testcases > like to allocate and thus trigger oomkilling of the test harness instead > of the offending testcase...) > > Even if we just had the private+idle buffers being accounted in OOM > badness, that would be a huge step forward in system reliability. OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. -- Michal Hocko SUSE Labs ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
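Michal's counter-proposal can be sketched the same way. Again a userspace model, not kernel code: the counter names mirror the kernel's `mm_rss_stat` enum, but the structs and the badness function are simplified stand-ins, and `charge_kernel_pages()` is a hypothetical helper standing in for `add_mm_counter()`.

```c
/* Userspace model of the MM_KERNELPAGES idea: extend the per-mm rss
 * counters and let the badness score include driver pages whose
 * lifetime is bound to the process. */
enum {
    MM_FILEPAGES,
    MM_ANONPAGES,
    MM_SWAPENTS,
    MM_SHMEMPAGES,
    MM_KERNELPAGES, /* proposed: driver pages bound to this mm's lifetime */
    NR_MM_COUNTERS,
};

struct mm_struct {
    long rss_stat[NR_MM_COUNTERS];
};

/* Simplified badness score: total pages charged to the process. The
 * real oom_badness() also folds in page tables and oom_score_adj. */
long oom_badness(const struct mm_struct *mm)
{
    long points = 0;
    for (int i = 0; i < NR_MM_COUNTERS; i++)
        points += mm->rss_stat[i];
    return points;
}

/* What a driver would do on allocation, by analogy to add_mm_counter():
 * only valid under Michal's rule that the memory is freed no later than
 * the process exits. */
void charge_kernel_pages(struct mm_struct *mm, long pages)
{
    mm->rss_stat[MM_KERNELPAGES] += pages;
}
```

With this, a process whose driver allocations dominate its footprint becomes the preferred OOM victim even though none of that memory shows up in its mapped RSS.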
Re: [RFC] Per file OOM badness
Am 19.01.2018 um 09:20 schrieb Michal Hocko: On Thu 18-01-18 12:01:32, Eric Anholt wrote: Michal Hocko writes: On Thu 18-01-18 18:00:06, Michal Hocko wrote: On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file. Please add the full description to the cover letter and do not make people hunt links. Here is the origin cover letter text : I'm currently working on the issue that when device drivers allocate memory on : behalf of an application the OOM killer usually doesn't knew about that unless : the application also get this memory mapped into their address space. : : This is especially annoying for graphics drivers where a lot of the VRAM : usually isn't CPU accessible and so doesn't make sense to map into the : address space of the process using it. : : The problem now is that when an application starts to use a lot of VRAM those : buffers objects sooner or later get swapped out to system memory, but when we : now run into an out of memory situation the OOM killer obviously doesn't knew : anything about that memory and so usually kills the wrong process. OK, but how do you attribute that memory to a particular OOM killable entity? And how do you actually enforce that those resources get freed on the oom killer action? : The following set of patches tries to address this problem by introducing a per : file OOM badness score, which device drivers can use to give the OOM killer a : hint how many resources are bound to a file descriptor so that it can make : better decisions which process to kill. But files are not killable, they can be shared... 
In other words this doesn't help the oom killer to make an educated guess at all. Maybe some more context would help the discussion? The struct file in patch 3 is the DRM fd. That's effectively "my process's interface to talking to the GPU" not "a single GPU resource". Once that file is closed, all of the process's private, idle GPU buffers will be immediately freed (this will be most of their allocations), and some will be freed once the GPU completes some work (this will be most of the rest of their allocations). Some GEM BOs won't be freed just by closing the fd, if they've been shared between processes. Those are usually about 8-24MB total in a process, rather than the GBs that modern apps use (or that our testcases like to allocate and thus trigger oomkilling of the test harness instead of the offending testcase...) Even if we just had the private+idle buffers being accounted in OOM badness, that would be a huge step forward in system reliability. OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. I already tried that and the problem with that approach is that some buffers are not created by the application which actually uses them. For example X/Wayland is creating and handing out render buffers to applications which want to use OpenGL. So the result is that when you always account the buffer to the application which created it, the OOM killer will certainly reap X/Wayland first. And that is exactly what we want to avoid here. Regards, Christian.
Re: [RFC] Per file OOM badness
Am 18.01.2018 um 21:01 schrieb Eric Anholt: Michal Hocko writes: [SNIP] But files are not killable, they can be shared... In other words this doesn't help the oom killer to make an educated guess at all. Maybe some more context would help the discussion? Thanks for doing this. Wanted to reply yesterday with that information as well, but was unfortunately on sick leave. The struct file in patch 3 is the DRM fd. That's effectively "my process's interface to talking to the GPU" not "a single GPU resource". Once that file is closed, all of the process's private, idle GPU buffers will be immediately freed (this will be most of their allocations), and some will be freed once the GPU completes some work (this will be most of the rest of their allocations). Some GEM BOs won't be freed just by closing the fd, if they've been shared between processes. Those are usually about 8-24MB total in a process, rather than the GBs that modern apps use (or that our testcases like to allocate and thus trigger oomkilling of the test harness instead of the offending testcase...) Even if we just had the private+idle buffers being accounted in OOM badness, that would be a huge step forward in system reliability. Yes, and that's exactly the intention here because currently the OOM killer usually kills X when a graphics-related application allocates too much memory and that is highly undesirable. : So question at every one: What do you think about this approach? I thing is just just wrong semantically. Non-reclaimable memory is a pain, especially when there is way too much of it. If you can free that memory somehow then you can hook into slab shrinker API and react on the memory pressure. If you can account such a memory to a particular process and make sure that the consumption is bound by the process life time then we can think of an accounting that oom_badness can consider when selecting a victim. For graphics, we can't free most of our memory without also effectively killing the process.
i915 and vc4 have "purgeable" interfaces for userspace (on i915 this is exposed all the way to GL applications and is hooked into shrinker, and on vc4 this is so far just used for userspace-internal buffer caches to be purged when a CMA allocation fails). However, those purgeable pools are expected to be a tiny fraction of the GPU allocations by the process. Same thing with TTM and amdgpu/radeon. We already have a shrinker hook as well and make room as much as we can when needed. But I think Michal's concerns are valid as well and I thought about them when I created the initial patch. One possible solution which came to my mind is that (IIRC) we not only store the usual reference count per GEM object, but also how many handles were created for it. So what we could do is to iterate over all GEM handles of a client and account only size/num_handles as badness for the client. The end result would be that X and the client application would both get 1/2 of the GEM object's size accounted for. Regards, Christian.
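Christian's handle-counting idea can be sketched as follows. This is a userspace model with made-up struct and function names, not the real GEM API (though GEM objects do track handles per client); each client is charged size/num_handles, so a buffer shared between the X server and one client counts half against each.

```c
/* Illustrative model of splitting a shared GEM BO's size across the
 * clients that hold handles to it. */
#include <stddef.h>

/* Toy stand-in for a GEM buffer object. */
struct gem_object {
    size_t size;           /* backing storage, in bytes */
    unsigned handle_count; /* handles open on it across all clients */
};

/* Per-client badness: each object is charged as size / handle_count,
 * so shared buffers are split between their users instead of all
 * landing on whoever created them (e.g. the X server). */
size_t client_gem_badness(struct gem_object **objs, size_t n)
{
    size_t badness = 0;
    for (size_t i = 0; i < n; i++)
        if (objs[i]->handle_count)
            badness += objs[i]->size / objs[i]->handle_count;
    return badness;
}
```

With a 16 MiB buffer shared between X and one client plus a 64 MiB private buffer, the client is charged 8 MiB + 64 MiB, matching the "both get 1/2 of the GEM object's size" outcome described above.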
Re: [RFC] Per file OOM badness
Am 19.01.2018 um 06:39 schrieb He, Roger: Basically the idea is right to me. 1. But we need smaller granularity to control the contribution to OOM badness. Because when the TTM buffer resides in VRAM rather than evict to system memory, we should not take this account into badness. But I think it is not easy to implement. I was considering that as well when I wrote the original patch set, but then decided against it at least for now. Basically all VRAM buffers can be swapped to system memory, so they potentially need system memory as well. That is especially important during suspend/resume. 2. If the TTM buffer(GTT here) is mapped to user for CPU access, not quite sure the buffer size is already taken into account for kernel. If yes, at last the size will be counted again by your patches. No that isn't accounted for as far as I know. So, I am thinking if we can counted the TTM buffer size into: struct mm_rss_stat { atomic_long_t count[NR_MM_COUNTERS]; }; Which is done by kernel based on CPU VM (page table). Something like that: When GTT allocate suceess: add_mm_counter(vma->vm_mm, MM_ANONPAGES, buffer_size); When GTT swapped out: dec_mm_counter from MM_ANONPAGES frist, then add_mm_counter(vma->vm_mm, MM_SWAPENTS, buffer_size); // or MM_SHMEMPAGES or add new item. Update the corresponding item in mm_rss_stat always. If that, we can control the status update accurately. What do you think about that? And is there any side-effect for this approach? I already tried this when I originally worked on the issue and that approach didn't work because allocated buffers are not associated with the process where they are created. E.g. most display surfaces are created by the X server, but used by other processes. So if you account the BO to the process who created it we would start to kill X again and that is exactly what we try to avoid. Regards, Christian.
Thanks Roger(Hongbo.He) -Original Message- From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of Andrey Grodzovsky Sent: Friday, January 19, 2018 12:48 AM To: linux-ker...@vger.kernel.org; linux...@kvack.org; dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org Cc: Koenig, Christian Subject: [RFC] Per file OOM badness Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file. Thanks, Andrey
RE: [RFC] Per file OOM badness
Basically the idea is right to me. 1. But we need smaller granularity to control the contribution to OOM badness. Because when the TTM buffer resides in VRAM rather than being evicted to system memory, we should not take this into account for badness. But I think it is not easy to implement. 2. If the TTM buffer(GTT here) is mapped to user for CPU access, not quite sure the buffer size is already taken into account for kernel. If yes, at last the size will be counted again by your patches. So, I am thinking if we can count the TTM buffer size into: struct mm_rss_stat { atomic_long_t count[NR_MM_COUNTERS]; }; Which is done by kernel based on CPU VM (page table). Something like that: When GTT allocation succeeds: add_mm_counter(vma->vm_mm, MM_ANONPAGES, buffer_size); When GTT is swapped out: dec_mm_counter from MM_ANONPAGES first, then add_mm_counter(vma->vm_mm, MM_SWAPENTS, buffer_size); // or MM_SHMEMPAGES or add new item. Update the corresponding item in mm_rss_stat always. That way, we can control the status update accurately. What do you think about that? And is there any side-effect for this approach? Thanks Roger(Hongbo.He) -Original Message- From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of Andrey Grodzovsky Sent: Friday, January 19, 2018 12:48 AM To: linux-ker...@vger.kernel.org; linux...@kvack.org; dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org Cc: Koenig, Christian Subject: [RFC] Per file OOM badness Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file.
Thanks, Andrey
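Roger's suggestion above can be modeled like this. A userspace sketch, not kernel code: the functions are hypothetical stand-ins mirroring what `add_mm_counter()`/`dec_mm_counter()` calls would do at GTT allocation and TTM swap-out time.

```c
/* Illustrative model: charge GTT buffer pages to the owning mm's rss
 * counters on allocation, and move them from MM_ANONPAGES to
 * MM_SWAPENTS when TTM swaps the buffer out. */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, MM_SHMEMPAGES, NR_MM_COUNTERS };

struct mm_struct {
    long rss_stat[NR_MM_COUNTERS];
};

/* On successful GTT allocation: the buffer is resident, so it counts
 * like anonymous memory against the owner. */
void gtt_account_alloc(struct mm_struct *mm, long pages)
{
    mm->rss_stat[MM_ANONPAGES] += pages;
}

/* On TTM swap-out: the pages are no longer resident but still owed,
 * so move the charge to the swap-entry counter. */
void gtt_account_swapout(struct mm_struct *mm, long pages)
{
    mm->rss_stat[MM_ANONPAGES] -= pages;
    mm->rss_stat[MM_SWAPENTS] += pages;
}
```

Note Christian's reply in the thread: `vma->vm_mm` belongs to whichever process created the buffer, so surfaces allocated by the X server on behalf of clients would still be charged to X, which is exactly the mis-attribution the series tries to avoid.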
RE: [RFC] Per file OOM badness
-Original Message- From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Michal Hocko Sent: Friday, January 19, 2018 1:14 AM To: Grodzovsky, Andrey Cc: linux...@kvack.org; amd-...@lists.freedesktop.org; linux-ker...@vger.kernel.org; dri-devel@lists.freedesktop.org; Koenig, Christian Subject: Re: [RFC] Per file OOM badness On Thu 18-01-18 18:00:06, Michal Hocko wrote: > On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: > > Hi, this series is a revised version of an RFC sent by Christian > > König a few years ago. The original RFC can be found at > > https://lists.freedesktop.org/archives/dri-devel/2015-September/0897 > > 78.html > > > > This is the same idea and I've just adressed his concern from the > > original RFC and switched to a callback into file_ops instead of a new > > member in struct file. > > Please add the full description to the cover letter and do not make > people hunt links. > > Here is the origin cover letter text > : I'm currently working on the issue that when device drivers allocate > memory on > : behalf of an application the OOM killer usually doesn't knew about > that unless > : the application also get this memory mapped into their address space. > : > : This is especially annoying for graphics drivers where a lot of the > VRAM > : usually isn't CPU accessible and so doesn't make sense to map into > the > : address space of the process using it. > : > : The problem now is that when an application starts to use a lot of > VRAM those > : buffers objects sooner or later get swapped out to system memory, > but when we > : now run into an out of memory situation the OOM killer obviously > doesn't knew > : anything about that memory and so usually kills the wrong process. OK, but how do you attribute that memory to a particular OOM killable entity? And how do you actually enforce that those resources get freed on the oom killer action? 
Here I think we need finer granularity to distinguish whether the buffer is taking VRAM or system memory. > : The following set of patches tries to address this problem by > introducing a per > : file OOM badness score, which device drivers can use to give the OOM > killer a > : hint how many resources are bound to a file descriptor so that it > can make > : better decisions which process to kill. But files are not killable, they can be shared... In other words this doesn't help the oom killer to make an educated guess at all. > : > : So question at every one: What do you think about this approach? I thing is just just wrong semantically. Non-reclaimable memory is a pain, especially when there is way too much of it. If you can free that memory somehow then you can hook into slab shrinker API and react on the memory pressure. If you can account such a memory to a particular process and make sure that the consumption is bound by the process life time then we can think of an accounting that oom_badness can consider when selecting a victim. I think you are misunderstanding here. Actually for now, the memory in TTM Pools already has mm_shrink which is implemented in ttm_pool_mm_shrink_init. And here the memory we want to make it contribute to OOM badness is not in TTM Pools. Because when TTM buffer allocation succeeds, the memory already is removed from TTM Pools. Thanks Roger(Hongbo.He) -- Michal Hocko SUSE Labs
Re: [RFC] Per file OOM badness
Michal Hocko writes: > On Thu 18-01-18 18:00:06, Michal Hocko wrote: >> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: >> > Hi, this series is a revised version of an RFC sent by Christian König >> > a few years ago. The original RFC can be found at >> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html >> > >> > This is the same idea and I've just adressed his concern from the original >> > RFC >> > and switched to a callback into file_ops instead of a new member in struct >> > file. >> >> Please add the full description to the cover letter and do not make >> people hunt links. >> >> Here is the origin cover letter text >> : I'm currently working on the issue that when device drivers allocate >> memory on >> : behalf of an application the OOM killer usually doesn't knew about that >> unless >> : the application also get this memory mapped into their address space. >> : >> : This is especially annoying for graphics drivers where a lot of the VRAM >> : usually isn't CPU accessible and so doesn't make sense to map into the >> : address space of the process using it. >> : >> : The problem now is that when an application starts to use a lot of VRAM >> those >> : buffers objects sooner or later get swapped out to system memory, but when >> we >> : now run into an out of memory situation the OOM killer obviously doesn't >> knew >> : anything about that memory and so usually kills the wrong process. > > OK, but how do you attribute that memory to a particular OOM killable > entity? And how do you actually enforce that those resources get freed > on the oom killer action? > >> : The following set of patches tries to address this problem by introducing >> a per >> : file OOM badness score, which device drivers can use to give the OOM >> killer a >> : hint how many resources are bound to a file descriptor so that it can make >> : better decisions which process to kill. > > But files are not killable, they can be shared... 
> In other words this doesn't help the oom killer to make an educated
> guess at all.

Maybe some more context would help the discussion?

The struct file in patch 3 is the DRM fd. That's effectively "my process's
interface to talking to the GPU", not "a single GPU resource". Once that
file is closed, all of the process's private, idle GPU buffers will be
freed immediately (this will be most of their allocations), and some will
be freed once the GPU completes some work (this will be most of the rest
of their allocations).

Some GEM BOs won't be freed just by closing the fd, if they've been shared
between processes. Those are usually about 8-24MB total in a process,
rather than the GBs that modern apps use (or that our testcases like to
allocate, thus triggering oom-killing of the test harness instead of the
offending testcase...).

Even if only the private+idle buffers were accounted in OOM badness, that
would be a huge step forward in system reliability.

>> : So question at every one: What do you think about this approach?
>
> I think this is just wrong semantically. Non-reclaimable memory is a
> pain, especially when there is way too much of it. If you can free that
> memory somehow then you can hook into the slab shrinker API and react to
> memory pressure. If you can account such memory to a particular process
> and make sure that the consumption is bound by the process's lifetime,
> then we can think of an accounting that oom_badness can consider when
> selecting a victim.

For graphics, we can't free most of our memory without also effectively
killing the process. i915 and vc4 have "purgeable" interfaces for
userspace (on i915 this is exposed all the way up to GL applications and
is hooked into the shrinker; on vc4 it is so far only used for
userspace-internal buffer caches, purged when a CMA allocation fails).
However, those purgeable pools are expected to be a tiny fraction of the
GPU allocations by the process.
[RFC] Per file OOM badness
Hi, this series is a revised version of an RFC sent by Christian König a
few years ago. The original RFC can be found at
https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html

This is the same idea; I've just addressed his concern from the original
RFC and switched to a callback in file_ops instead of a new member in
struct file.

Thanks,
Andrey
[RFC] Per file OOM badness
Hello everyone,

I'm currently working on the issue that when device drivers allocate
memory on behalf of an application, the OOM killer usually doesn't know
about it unless the application also gets this memory mapped into its
address space.

This is especially annoying for graphics drivers, where a lot of the VRAM
usually isn't CPU accessible and so doesn't make sense to map into the
address space of the process using it.

The problem is that when an application starts to use a lot of VRAM, those
buffer objects sooner or later get swapped out to system memory, but when
we then run into an out-of-memory situation the OOM killer obviously
doesn't know anything about that memory and so usually kills the wrong
process.

The following set of patches tries to address this problem by introducing
a per-file OOM badness score, which device drivers can use to give the OOM
killer a hint how many resources are bound to a file descriptor, so that
it can make better decisions about which process to kill.

So, a question to everyone: what do you think about this approach? My
biggest concern right now is that the patches mess with a core kernel
structure (adding a field to struct file). Any better idea? I'm
considering putting a callback into file_ops instead.

Best regards, and feel free to tear this idea apart,
Christian.