[RFC] Per file OOM-badness / RSS once more
Hello everyone,

To summarize the issue I'm trying to address here: processes can allocate resources through a file descriptor without being held responsible for them. I'm not explaining all the details again; see here for a more detailed description of the problem: https://lwn.net/ml/linux-kernel/2022053117.174649-1-christian.koe...@amd.com/

With this iteration I'm trying to address a bunch of the comments Michal Hocko gave (thanks a lot for that) as well as adding some new ideas. Changes made so far:

1. Renamed the callback to file_rss(). This is at least a start at better describing what this is all about. I've been going back and forth over the naming here; if you have a better idea please speak up.

2. Cleanups, e.g. now providing a helper function in the fs layer to sum up all the pages allocated by the files in a file descriptor table.

3. Using the actual number of allocated pages for the shmem implementation instead of just the size. I also tried to ignore shmem files which are part of tmpfs, because that has a separate accounting/limitation approach.

4. The OOM killer now prints the memory of the killed process including the per-file pages, which makes the whole thing much more comprehensible.

5. I've added the per-file pages to the RSS reports in procfs. This has the interesting effect that tools like top suddenly give a much more accurate overview of memory use as well. This of course increases the overhead of gathering that information quite a bit, and I'm not sure how feasible that is for upstreaming. On the other hand it once more clearly shows that we need to do something about this issue.

Another rather interesting observation is that multiple subsystems (shmem, tmpfs, ttm) came up with the same workaround of limiting the memory which can be allocated through them to 50% of system memory. Unfortunately that isn't the same 50%, and it doesn't apply everywhere, so you can still easily crash the box.
Ideas and/or comments are really welcome. Thanks, Christian.
Re: [RFC] Per file OOM badness
On 2018-04-04 11:36 AM, Lucas Stach wrote:
> Am Mittwoch, den 04.04.2018, 11:09 +0200 schrieb Michel Dänzer:
>> On 2018-03-26 04:36 PM, Lucas Stach wrote:
>>> Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko:
>>>> On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
>>>>> On 2018-01-24 12:50 PM, Michal Hocko wrote:
>>>>>> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>>>>>>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>>>>>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
>>
>> [...]
>>
>>>>>>> 2. If the OOM killer kills a process which is sharing BOs with
>>>>>>> another process, this should result in the other process dropping
>>>>>>> its references to the BOs as well, at which point the memory is
>>>>>>> released.
>>>>>>>
>>>>>>>> OK. How exactly are those BOs mapped to the userspace?
>>>>>>>
>>>>>>> I'm not sure what you're asking. Userspace mostly uses a GEM
>>>>>>> handle to refer to a BO. There can also be userspace CPU mappings
>>>>>>> of the BO's memory, but userspace doesn't need CPU mappings for
>>>>>>> all BOs and only creates them as needed.
>>>>>>
>>>>>> OK, I guess you have to bear with me some more. This whole stack
>>>>>> is a complete unknown. I am mostly after finding a boundary where
>>>>>> you can charge the allocated memory to the process so that the oom
>>>>>> killer can consider it. Is there anything like that? Except for
>>>>>> the proposed file handle hack?
>>>>>
>>>>> How about the other way around: what APIs can we use to charge /
>>>>> "uncharge" memory to a process? If we have those, we can experiment
>>>>> with different places to call them.
>>>>
>>>> add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.
>>>
>>> So is anyone still working on this? This is hurting us bad enough that
>>> I don't want to keep this topic rotting for another year.
>>>
>>> If no one is currently working on this I would volunteer to give the
>>> simple "just account private, non-shared buffers in process RSS" a
>>> spin.
>>
>> Sounds good. FWIW, I think shared buffers can also be easily handled by
>> accounting them in each process which has a reference. But that's more
>> of a detail, shouldn't make a big difference overall either way.
>
> Yes, both options to either never account shared buffers or to always
> account them into every process having a reference should be pretty
> easy. Where it gets hard is when trying to account the buffer only in
> the last process holding a reference or something like this.

FWIW, I don't think that would make sense anyway. A shared buffer is
actually used by all processes which have a reference to it, so it should
be accounted the same in all of them.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC] Per file OOM badness
Am Mittwoch, den 04.04.2018, 11:09 +0200 schrieb Michel Dänzer: > On 2018-03-26 04:36 PM, Lucas Stach wrote: > > Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko: > > > On Tue 30-01-18 10:29:10, Michel Dänzer wrote: > > > > On 2018-01-24 12:50 PM, Michal Hocko wrote: > > > > > On Wed 24-01-18 12:23:10, Michel Dänzer wrote: > > > > > > On 2018-01-24 12:01 PM, Michal Hocko wrote: > > > > > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote: > > > > > > > > > > [...] > > > > > > > > 2. If the OOM killer kills a process which is sharing BOs > > > > > > > > with another > > > > > > > > process, this should result in the other process dropping > > > > > > > > its references > > > > > > > > to the BOs as well, at which point the memory is released. > > > > > > > > > > > > > > OK. How exactly are those BOs mapped to the userspace? > > > > > > > > > > > > I'm not sure what you're asking. Userspace mostly uses a GEM > > > > > > handle to > > > > > > refer to a BO. There can also be userspace CPU mappings of the > > > > > > BO's > > > > > > memory, but userspace doesn't need CPU mappings for all BOs and > > > > > > only > > > > > > creates them as needed. > > > > > > > > > > OK, I guess you have to bear with me some more. This whole stack > > > > > is a > > > > > complete uknonwn. I am mostly after finding a boundary where you > > > > > can > > > > > charge the allocated memory to the process so that the oom killer > > > > > can > > > > > consider it. Is there anything like that? Except for the proposed > > > > > file > > > > > handle hack? > > > > > > > > How about the other way around: what APIs can we use to charge / > > > > "uncharge" memory to a process? If we have those, we can experiment > > > > with > > > > different places to call them. > > > > > > add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES. > > > > So is anyone still working on this? This is hurting us bad enough that > > I don't want to keep this topic rotting for another year. 
> > If no one is currently working on this I would volunteer to give the
> > simple "just account private, non-shared buffers in process RSS" a
> > spin.
>
> Sounds good. FWIW, I think shared buffers can also be easily handled by
> accounting them in each process which has a reference. But that's more
> of a detail, shouldn't make a big difference overall either way.

Yes, both options to either never account shared buffers or to always
account them into every process having a reference should be pretty easy.
Where it gets hard is when trying to account the buffer only in the last
process holding a reference or something like this.

For the OOM case I think it makes more sense to never account shared
buffers, as this may lead to a process (like the compositor) having its
RSS inflated by shared buffers, rendering it the likely victim for the
OOM killer/reaper, while killing this process will not lead to freeing of
any shared graphics memory, at least if the clients sharing the buffer
survive killing of the compositor.

This opens up the possibility to "hide" buffers from the accounting by
sharing them, but I guess it's still much better than the nothing we do
today to account for graphics buffers.

Regards,
Lucas
Re: [RFC] Per file OOM badness
On 2018-03-26 04:36 PM, Lucas Stach wrote: > Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko: >> On Tue 30-01-18 10:29:10, Michel Dänzer wrote: >>> On 2018-01-24 12:50 PM, Michal Hocko wrote: On Wed 24-01-18 12:23:10, Michel Dänzer wrote: > On 2018-01-24 12:01 PM, Michal Hocko wrote: >> On Wed 24-01-18 11:27:15, Michel Dänzer wrote: [...] >>> 2. If the OOM killer kills a process which is sharing BOs >>> with another >>> process, this should result in the other process dropping >>> its references >>> to the BOs as well, at which point the memory is released. >> >> OK. How exactly are those BOs mapped to the userspace? > > I'm not sure what you're asking. Userspace mostly uses a GEM > handle to > refer to a BO. There can also be userspace CPU mappings of the > BO's > memory, but userspace doesn't need CPU mappings for all BOs and > only > creates them as needed. OK, I guess you have to bear with me some more. This whole stack is a complete uknonwn. I am mostly after finding a boundary where you can charge the allocated memory to the process so that the oom killer can consider it. Is there anything like that? Except for the proposed file handle hack? >>> >>> How about the other way around: what APIs can we use to charge / >>> "uncharge" memory to a process? If we have those, we can experiment >>> with >>> different places to call them. >> >> add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES. > > So is anyone still working on this? This is hurting us bad enough that > I don't want to keep this topic rotting for another year. > > If no one is currently working on this I would volunteer to give the > simple "just account private, non-shared buffers in process RSS" a > spin. Sounds good. FWIW, I think shared buffers can also be easily handled by accounting them in each process which has a reference. But that's more of a detail, shouldn't make a big difference overall either way. 
-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
Hi all,

Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko:
> On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
> > On 2018-01-24 12:50 PM, Michal Hocko wrote:
> > > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > > > > > [...]
> > > > > > 2. If the OOM killer kills a process which is sharing BOs
> > > > > > with another process, this should result in the other process
> > > > > > dropping its references to the BOs as well, at which point
> > > > > > the memory is released.
> > > > >
> > > > > OK. How exactly are those BOs mapped to the userspace?
> > > >
> > > > I'm not sure what you're asking. Userspace mostly uses a GEM
> > > > handle to refer to a BO. There can also be userspace CPU mappings
> > > > of the BO's memory, but userspace doesn't need CPU mappings for
> > > > all BOs and only creates them as needed.
> > >
> > > OK, I guess you have to bear with me some more. This whole stack
> > > is a complete unknown. I am mostly after finding a boundary where
> > > you can charge the allocated memory to the process so that the oom
> > > killer can consider it. Is there anything like that? Except for the
> > > proposed file handle hack?
> >
> > How about the other way around: what APIs can we use to charge /
> > "uncharge" memory to a process? If we have those, we can experiment
> > with different places to call them.
>
> add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.

So is anyone still working on this? This is hurting us bad enough that
I don't want to keep this topic rotting for another year.

If no one is currently working on this I would volunteer to give the
simple "just account private, non-shared buffers in process RSS" a
spin.

Regards,
Lucas
Re: [RFC] Per file OOM badness
On 2018-01-30 12:56 PM, Christian König wrote: > Am 30.01.2018 um 12:42 schrieb Michel Dänzer: >> On 2018-01-30 12:36 PM, Nicolai Hähnle wrote: >>> On 30.01.2018 12:34, Michel Dänzer wrote: On 2018-01-30 12:28 PM, Christian König wrote: > Am 30.01.2018 um 12:02 schrieb Michel Dänzer: >> On 2018-01-30 11:40 AM, Christian König wrote: >>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer: [SNIP] > Would it be ok to hang onto potentially arbitrary mmget references > essentially forever? If that's ok I think we can do your process > based > account (minus a few minor inaccuracies for shared stuff perhaps, > but no > one cares about that). Honestly, I think you and Christian are overthinking this. Let's try charging the memory to every process which shares a buffer, and go from there. >>> My problem is that this needs to be bullet prove. >>> >>> For example imagine an application which allocates a lot of BOs, >>> then >>> calls fork() and let the parent process die. The file descriptor >>> lives >>> on in the child process, but the memory is not accounted against the >>> child. >> What exactly are you referring to by "the file descriptor" here? > The file descriptor used to identify the connection to the driver. In > other words our drm_file structure in the kernel. > >> What happens to BO handles in general in this case? If both parent >> and >> child process keep the same handle for the same BO, one of them >> destroying the handle will result in the other one not being able to >> use >> it anymore either, won't it? > Correct. > > That usage is actually not useful at all, but we already had > applications which did exactly that by accident. > > Not to mention that somebody could do it on purpose. Can we just prevent child processes from using their parent's DRM file descriptors altogether? Allowing it seems like a bad idea all around. >>> Existing protocols pass DRM fds between processes though, don't they? 
>>> Not child processes perhaps, but special-casing that seems like awful
>>> design.
>>
>> Fair enough.
>>
>> Can we disallow passing DRM file descriptors which have any buffers
>> allocated? :)
>
> Hehe good point, but I'm sorry I have to ruin that.
>
> The root VM page table is allocated when the DRM file descriptor is
> created and we want to account those to whoever uses the file
> descriptor as well.

Alternatively, since the file descriptor is closed in the sending
process in this case, maybe we can "uncharge" the buffer memory from the
sending process and charge it to the receiving one during the transfer?

> Looking into the fs layer there actually only seem to be two functions
> which are involved when a file descriptor is installed/removed from a
> process. So we just need to add some callbacks there.

That could work for file descriptor passing, but I'm not sure it really
helps for the fork case. Let's say we charge the buffer memory to the
child process as well. If either process later destroys a buffer handle,
the buffer becomes inaccessible to the other process as well, however
its memory remains charged to it (even though it may already be freed).

I think using a DRM file descriptor in both parent and child processes
is a pathological case that we really want to prevent rather than
worrying about how to make it work well. It doesn't seem to be working
well in general already anyway.

Maybe we could keep track of which process "owns" a DRM file descriptor,
and return an error from any relevant system calls for it from another
process. When passing an fd, its ownership would transfer to the
receiving process. When forking, the ownership would remain with the
parent process.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
Am 30.01.2018 um 12:42 schrieb Michel Dänzer: On 2018-01-30 12:36 PM, Nicolai Hähnle wrote: On 30.01.2018 12:34, Michel Dänzer wrote: On 2018-01-30 12:28 PM, Christian König wrote: Am 30.01.2018 um 12:02 schrieb Michel Dänzer: On 2018-01-30 11:40 AM, Christian König wrote: Am 30.01.2018 um 10:43 schrieb Michel Dänzer: [SNIP] Would it be ok to hang onto potentially arbitrary mmget references essentially forever? If that's ok I think we can do your process based account (minus a few minor inaccuracies for shared stuff perhaps, but no one cares about that). Honestly, I think you and Christian are overthinking this. Let's try charging the memory to every process which shares a buffer, and go from there. My problem is that this needs to be bullet prove. For example imagine an application which allocates a lot of BOs, then calls fork() and let the parent process die. The file descriptor lives on in the child process, but the memory is not accounted against the child. What exactly are you referring to by "the file descriptor" here? The file descriptor used to identify the connection to the driver. In other words our drm_file structure in the kernel. What happens to BO handles in general in this case? If both parent and child process keep the same handle for the same BO, one of them destroying the handle will result in the other one not being able to use it anymore either, won't it? Correct. That usage is actually not useful at all, but we already had applications which did exactly that by accident. Not to mention that somebody could do it on purpose. Can we just prevent child processes from using their parent's DRM file descriptors altogether? Allowing it seems like a bad idea all around. Existing protocols pass DRM fds between processes though, don't they? Not child processes perhaps, but special-casing that seems like awful design. Fair enough. Can we disallow passing DRM file descriptors which have any buffers allocated? 
:)

Hehe good point, but I'm sorry I have to ruin that.

The root VM page table is allocated when the DRM file descriptor is
created and we want to account those to whoever uses the file descriptor
as well.

We could now make an exception for the root VM page table to not be
accounted (shouldn't be that much compared to the rest of the VM tree),
but Nicolai is right that all those exceptions are just an awful design :)

Looking into the fs layer there actually only seem to be two functions
which are involved when a file descriptor is installed/removed from a
process. So we just need to add some callbacks there.

Regards,
Christian.
Re: [RFC] Per file OOM badness
On 2018-01-30 12:36 PM, Nicolai Hähnle wrote: > On 30.01.2018 12:34, Michel Dänzer wrote: >> On 2018-01-30 12:28 PM, Christian König wrote: >>> Am 30.01.2018 um 12:02 schrieb Michel Dänzer: On 2018-01-30 11:40 AM, Christian König wrote: > Am 30.01.2018 um 10:43 schrieb Michel Dänzer: >> [SNIP] >>> Would it be ok to hang onto potentially arbitrary mmget references >>> essentially forever? If that's ok I think we can do your process >>> based >>> account (minus a few minor inaccuracies for shared stuff perhaps, >>> but no >>> one cares about that). >> Honestly, I think you and Christian are overthinking this. Let's try >> charging the memory to every process which shares a buffer, and go >> from >> there. > My problem is that this needs to be bullet prove. > > For example imagine an application which allocates a lot of BOs, then > calls fork() and let the parent process die. The file descriptor lives > on in the child process, but the memory is not accounted against the > child. What exactly are you referring to by "the file descriptor" here? >>> >>> The file descriptor used to identify the connection to the driver. In >>> other words our drm_file structure in the kernel. >>> What happens to BO handles in general in this case? If both parent and child process keep the same handle for the same BO, one of them destroying the handle will result in the other one not being able to use it anymore either, won't it? >>> Correct. >>> >>> That usage is actually not useful at all, but we already had >>> applications which did exactly that by accident. >>> >>> Not to mention that somebody could do it on purpose. >> >> Can we just prevent child processes from using their parent's DRM file >> descriptors altogether? Allowing it seems like a bad idea all around. > > Existing protocols pass DRM fds between processes though, don't they? > > Not child processes perhaps, but special-casing that seems like awful > design. Fair enough. 
Can we disallow passing DRM file descriptors which have any buffers
allocated? :)

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
On 30.01.2018 12:34, Michel Dänzer wrote:
> On 2018-01-30 12:28 PM, Christian König wrote:
>> Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
>>> On 2018-01-30 11:40 AM, Christian König wrote:
>>>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>>>> [SNIP]
>>>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>>>> essentially forever? If that's ok I think we can do your process
>>>>>> based account (minus a few minor inaccuracies for shared stuff
>>>>>> perhaps, but no one cares about that).
>>>>> Honestly, I think you and Christian are overthinking this. Let's
>>>>> try charging the memory to every process which shares a buffer,
>>>>> and go from there.
>>>> My problem is that this needs to be bullet proof.
>>>>
>>>> For example imagine an application which allocates a lot of BOs,
>>>> then calls fork() and lets the parent process die. The file
>>>> descriptor lives on in the child process, but the memory is not
>>>> accounted against the child.
>>> What exactly are you referring to by "the file descriptor" here?
>> The file descriptor used to identify the connection to the driver. In
>> other words our drm_file structure in the kernel.
>>> What happens to BO handles in general in this case? If both parent
>>> and child process keep the same handle for the same BO, one of them
>>> destroying the handle will result in the other one not being able to
>>> use it anymore either, won't it?
>> Correct.
>>
>> That usage is actually not useful at all, but we already had
>> applications which did exactly that by accident.
>>
>> Not to mention that somebody could do it on purpose.
>
> Can we just prevent child processes from using their parent's DRM file
> descriptors altogether? Allowing it seems like a bad idea all around.

Existing protocols pass DRM fds between processes though, don't they?

Not child processes perhaps, but special-casing that seems like awful
design.

Cheers,
Nicolai

--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
Re: [RFC] Per file OOM badness
On 30.01.2018 11:48, Michel Dänzer wrote:
> On 2018-01-30 11:42 AM, Daniel Vetter wrote:
>> On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:
>>> On 2018-01-30 10:31 AM, Daniel Vetter wrote:
>>>> I guess a good first order approximation would be if we simply
>>>> charge any newly allocated buffers to the process that created
>>>> them, but that means hanging onto lots of mm_struct pointers since
>>>> we want to make sure we then release those pages to the right mm
>>>> again (since the process that drops the last ref might be a totally
>>>> different one, depending upon how the buffers or DRM fd have been
>>>> shared).
>>>>
>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>> essentially forever? If that's ok I think we can do your process
>>>> based account (minus a few minor inaccuracies for shared stuff
>>>> perhaps, but no one cares about that).
>>> Honestly, I think you and Christian are overthinking this. Let's try
>>> charging the memory to every process which shares a buffer, and go
>>> from there.
>> I'm not concerned about wrongly accounting shared buffers (they don't
>> matter), but imbalanced accounting. I.e. allocate a buffer in the
>> client, share it, but then the compositor drops the last reference.
> I don't think the order matters. The memory is "uncharged" in each
> process when it drops its reference.

Daniel made a fair point about passing DRM fds between processes,
though. It's not a problem with how the fds are currently used, but
somebody could do the following:

1. Create a DRM fd in process A, allocate lots of buffers.
2. Pass the fd to process B via some IPC mechanism.
3. Exit process A.

There needs to be some assurance that the BOs are accounted as belonging
to process B in the end.

Cheers,
Nicolai

--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
Re: [RFC] Per file OOM badness
On 2018-01-30 12:28 PM, Christian König wrote:
> Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
>> On 2018-01-30 11:40 AM, Christian König wrote:
>>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>>> [SNIP]
>>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>>> essentially forever? If that's ok I think we can do your process
>>>>> based account (minus a few minor inaccuracies for shared stuff
>>>>> perhaps, but no one cares about that).
>>>> Honestly, I think you and Christian are overthinking this. Let's try
>>>> charging the memory to every process which shares a buffer, and go
>>>> from there.
>>> My problem is that this needs to be bullet proof.
>>>
>>> For example imagine an application which allocates a lot of BOs, then
>>> calls fork() and lets the parent process die. The file descriptor
>>> lives on in the child process, but the memory is not accounted
>>> against the child.
>> What exactly are you referring to by "the file descriptor" here?
>
> The file descriptor used to identify the connection to the driver. In
> other words our drm_file structure in the kernel.
>
>> What happens to BO handles in general in this case? If both parent and
>> child process keep the same handle for the same BO, one of them
>> destroying the handle will result in the other one not being able to
>> use it anymore either, won't it?
> Correct.
>
> That usage is actually not useful at all, but we already had
> applications which did exactly that by accident.
>
> Not to mention that somebody could do it on purpose.

Can we just prevent child processes from using their parent's DRM file
descriptors altogether? Allowing it seems like a bad idea all around.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
> On 2018-01-30 11:40 AM, Christian König wrote:
>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>> [SNIP]
>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>> essentially forever? If that's ok I think we can do your process
>>>> based account (minus a few minor inaccuracies for shared stuff
>>>> perhaps, but no one cares about that).
>>> Honestly, I think you and Christian are overthinking this. Let's try
>>> charging the memory to every process which shares a buffer, and go
>>> from there.
>> My problem is that this needs to be bullet proof.
>>
>> For example imagine an application which allocates a lot of BOs, then
>> calls fork() and lets the parent process die. The file descriptor
>> lives on in the child process, but the memory is not accounted against
>> the child.
> What exactly are you referring to by "the file descriptor" here?

The file descriptor used to identify the connection to the driver. In
other words our drm_file structure in the kernel.

> What happens to BO handles in general in this case? If both parent and
> child process keep the same handle for the same BO, one of them
> destroying the handle will result in the other one not being able to
> use it anymore either, won't it?

Correct.

That usage is actually not useful at all, but we already had
applications which did exactly that by accident.

Not to mention that somebody could do it on purpose.

Regards,
Christian.
Re: [RFC] Per file OOM badness
On 2018-01-30 11:40 AM, Christian König wrote:
> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>> [SNIP]
>>> Would it be ok to hang onto potentially arbitrary mmget references
>>> essentially forever? If that's ok I think we can do your process based
>>> account (minus a few minor inaccuracies for shared stuff perhaps, but
>>> no one cares about that).
>> Honestly, I think you and Christian are overthinking this. Let's try
>> charging the memory to every process which shares a buffer, and go from
>> there.
>
> My problem is that this needs to be bullet proof.
>
> For example imagine an application which allocates a lot of BOs, then
> calls fork() and lets the parent process die. The file descriptor lives
> on in the child process, but the memory is not accounted against the
> child.

What exactly are you referring to by "the file descriptor" here?

What happens to BO handles in general in this case? If both parent and
child process keep the same handle for the same BO, one of them
destroying the handle will result in the other one not being able to use
it anymore either, won't it?

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
On 2018-01-30 11:42 AM, Daniel Vetter wrote:
> On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:
>> On 2018-01-30 10:31 AM, Daniel Vetter wrote:
>>
>>> I guess a good first order approximation would be if we simply charge
>>> any newly allocated buffers to the process that created them, but that
>>> means hanging onto lots of mm_struct pointers since we want to make
>>> sure we then release those pages to the right mm again (since the
>>> process that drops the last ref might be a totally different one,
>>> depending upon how the buffers or DRM fd have been shared).
>>>
>>> Would it be ok to hang onto potentially arbitrary mmget references
>>> essentially forever? If that's ok I think we can do your process based
>>> account (minus a few minor inaccuracies for shared stuff perhaps, but
>>> no one cares about that).
>>
>> Honestly, I think you and Christian are overthinking this. Let's try
>> charging the memory to every process which shares a buffer, and go from
>> there.
>
> I'm not concerned about wrongly accounting shared buffers (they don't
> matter), but imbalanced accounting. I.e. allocate a buffer in the
> client, share it, but then the compositor drops the last reference.

I don't think the order matters. The memory is "uncharged" in each
process when it drops its reference.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:
> On 2018-01-30 10:31 AM, Daniel Vetter wrote:
> > On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
> > > Am 24.01.2018 um 12:50 schrieb Michal Hocko:
> > > > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > > [...]
> > > > > > > 2. If the OOM killer kills a process which is sharing BOs with
> > > > > > > another process, this should result in the other process
> > > > > > > dropping its references to the BOs as well, at which point the
> > > > > > > memory is released.
> > > > > >
> > > > > > OK. How exactly are those BOs mapped to the userspace?
> > > > >
> > > > > I'm not sure what you're asking. Userspace mostly uses a GEM handle
> > > > > to refer to a BO. There can also be userspace CPU mappings of the
> > > > > BO's memory, but userspace doesn't need CPU mappings for all BOs
> > > > > and only creates them as needed.
> > > >
> > > > OK, I guess you have to bear with me some more. This whole stack is a
> > > > complete unknown. I am mostly after finding a boundary where you can
> > > > charge the allocated memory to the process so that the oom killer can
> > > > consider it. Is there anything like that? Except for the proposed
> > > > file handle hack?
> > >
> > > Not that I know of.
> > >
> > > As I said before, we need some kind of callback that a process now
> > > starts to use a file descriptor, but without anything from that file
> > > descriptor mapped into the address space.
> >
> > For more context: With DRI3 and Wayland the compositor opens the DRM fd
> > and then passes it to the client, which then starts allocating stuff.
> > That makes book-keeping rather annoying.
>
> Actually, what you're describing is only true for the buffers shared by
> an X server with an X11 compositor. For the actual applications, the
> buffers are created on the client side and then shared with the X server
> / Wayland compositor.
>
> Anyway, it doesn't really matter. In all cases, the buffers are actually
> used by all parties that are sharing them, so charging the memory to all
> of them is perfectly appropriate.
>
> > I guess a good first order approximation would be if we simply charge any
> > newly allocated buffers to the process that created them, but that means
> > hanging onto lots of mm_struct pointers, since we want to make sure we
> > then release those pages to the right mm again (since the process that
> > drops the last ref might be a totally different one, depending upon how
> > the buffers or DRM fd have been shared).
> >
> > Would it be OK to hang onto potentially arbitrary mmget references
> > essentially forever? If that's OK, I think we can do your process-based
> > accounting (minus a few minor inaccuracies for shared stuff perhaps, but
> > no one cares about that).
>
> Honestly, I think you and Christian are overthinking this. Let's try
> charging the memory to every process which shares a buffer, and go from
> there.

I'm not concerned about wrongly accounting shared buffers (they don't
matter), but imbalanced accounting. I.e. allocate a buffer in the client,
share it, but then the compositor drops the last reference.

If we store the mm_struct pointer in drm_gem_object, we don't need any
callback from the vfs when fds are shared or anything like that. We can
simply account any newly allocated buffers to current->mm, and then use
that stored pointer later to drop the accounting when the GEM object is
released.

This would entirely ignore any complications with shared buffers, which I
think we can do because even when we pass the DRM fd to a different
process, the actual buffer allocations are not passed around like that for
private buffers. And private buffers are the only ones that really matter.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [RFC] Per file OOM badness
Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
[SNIP]
>> Would it be OK to hang onto potentially arbitrary mmget references
>> essentially forever? If that's OK, I think we can do your process-based
>> accounting (minus a few minor inaccuracies for shared stuff perhaps, but
>> no one cares about that).
>
> Honestly, I think you and Christian are overthinking this. Let's try
> charging the memory to every process which shares a buffer, and go from
> there.

My problem is that this needs to be bulletproof. For example, imagine an
application which allocates a lot of BOs, then calls fork() and lets the
parent process die. The file descriptor lives on in the child process, but
the memory is not accounted against the child.

Otherwise we would allow easy construction of denial-of-service attacks.

To avoid that, I think we need to add something like new file_operations
callbacks which inform a file descriptor that it is going to be used in a
new process, or stops being used in a process.

Regards,
Christian.
Re: [RFC] Per file OOM badness
On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
> On 2018-01-24 12:50 PM, Michal Hocko wrote:
> > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> >> On 2018-01-24 12:01 PM, Michal Hocko wrote:
> >>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > [...]
> >>>>> 2. If the OOM killer kills a process which is sharing BOs with
> >>>>> another process, this should result in the other process dropping
> >>>>> its references to the BOs as well, at which point the memory is
> >>>>> released.
> >>>
> >>> OK. How exactly are those BOs mapped to the userspace?
> >>
> >> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> >> refer to a BO. There can also be userspace CPU mappings of the BO's
> >> memory, but userspace doesn't need CPU mappings for all BOs and only
> >> creates them as needed.
> >
> > OK, I guess you have to bear with me some more. This whole stack is a
> > complete unknown. I am mostly after finding a boundary where you can
> > charge the allocated memory to the process so that the oom killer can
> > consider it. Is there anything like that? Except for the proposed file
> > handle hack?
>
> How about the other way around: what APIs can we use to charge /
> "uncharge" memory to a process? If we have those, we can experiment with
> different places to call them.

add_mm_counter(), and I would add a new counter, e.g. MM_KERNEL_PAGES.

--
Michal Hocko
SUSE Labs
Re: [RFC] Per file OOM badness
On 2018-01-30 10:31 AM, Daniel Vetter wrote:
> On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
>> Am 24.01.2018 um 12:50 schrieb Michal Hocko:
>>> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>>>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
>>> [...]
>>>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>>>> process, this should result in the other process dropping its
>>>>>> references to the BOs as well, at which point the memory is released.
>>>>>
>>>>> OK. How exactly are those BOs mapped to the userspace?
>>>>
>>>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>>>> refer to a BO. There can also be userspace CPU mappings of the BO's
>>>> memory, but userspace doesn't need CPU mappings for all BOs and only
>>>> creates them as needed.
>>>
>>> OK, I guess you have to bear with me some more. This whole stack is a
>>> complete unknown. I am mostly after finding a boundary where you can
>>> charge the allocated memory to the process so that the oom killer can
>>> consider it. Is there anything like that? Except for the proposed file
>>> handle hack?
>>
>> Not that I know of.
>>
>> As I said before, we need some kind of callback that a process now
>> starts to use a file descriptor, but without anything from that file
>> descriptor mapped into the address space.
>
> For more context: With DRI3 and Wayland the compositor opens the DRM fd
> and then passes it to the client, which then starts allocating stuff.
> That makes book-keeping rather annoying.

Actually, what you're describing is only true for the buffers shared by an
X server with an X11 compositor. For the actual applications, the buffers
are created on the client side and then shared with the X server / Wayland
compositor.

Anyway, it doesn't really matter. In all cases, the buffers are actually
used by all parties that are sharing them, so charging the memory to all of
them is perfectly appropriate.

> I guess a good first order approximation would be if we simply charge any
> newly allocated buffers to the process that created them, but that means
> hanging onto lots of mm_struct pointers, since we want to make sure we
> then release those pages to the right mm again (since the process that
> drops the last ref might be a totally different one, depending upon how
> the buffers or DRM fd have been shared).
>
> Would it be OK to hang onto potentially arbitrary mmget references
> essentially forever? If that's OK, I think we can do your process-based
> accounting (minus a few minor inaccuracies for shared stuff perhaps, but
> no one cares about that).

Honestly, I think you and Christian are overthinking this. Let's try
charging the memory to every process which shares a buffer, and go from
there.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
> Am 24.01.2018 um 12:50 schrieb Michal Hocko:
> > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > [...]
> > > > > 2. If the OOM killer kills a process which is sharing BOs with
> > > > > another process, this should result in the other process dropping
> > > > > its references to the BOs as well, at which point the memory is
> > > > > released.
> > > >
> > > > OK. How exactly are those BOs mapped to the userspace?
> > >
> > > I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> > > refer to a BO. There can also be userspace CPU mappings of the BO's
> > > memory, but userspace doesn't need CPU mappings for all BOs and only
> > > creates them as needed.
> >
> > OK, I guess you have to bear with me some more. This whole stack is a
> > complete unknown. I am mostly after finding a boundary where you can
> > charge the allocated memory to the process so that the oom killer can
> > consider it. Is there anything like that? Except for the proposed file
> > handle hack?
>
> Not that I know of.
>
> As I said before, we need some kind of callback that a process now starts
> to use a file descriptor, but without anything from that file descriptor
> mapped into the address space.

For more context: With DRI3 and Wayland the compositor opens the DRM fd and
then passes it to the client, which then starts allocating stuff. That
makes book-keeping rather annoying.

I guess a good first order approximation would be if we simply charge any
newly allocated buffers to the process that created them, but that means
hanging onto lots of mm_struct pointers, since we want to make sure we then
release those pages to the right mm again (since the process that drops the
last ref might be a totally different one, depending upon how the buffers
or DRM fd have been shared).

Would it be OK to hang onto potentially arbitrary mmget references
essentially forever? If that's OK, I think we can do your process-based
accounting (minus a few minor inaccuracies for shared stuff perhaps, but no
one cares about that).

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [RFC] Per file OOM badness
On 2018-01-24 12:50 PM, Michal Hocko wrote:
> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> [...]
>>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>>> process, this should result in the other process dropping its
>>>>> references to the BOs as well, at which point the memory is released.
>>>
>>> OK. How exactly are those BOs mapped to the userspace?
>>
>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>> refer to a BO. There can also be userspace CPU mappings of the BO's
>> memory, but userspace doesn't need CPU mappings for all BOs and only
>> creates them as needed.
>
> OK, I guess you have to bear with me some more. This whole stack is a
> complete unknown. I am mostly after finding a boundary where you can
> charge the allocated memory to the process so that the oom killer can
> consider it. Is there anything like that? Except for the proposed file
> handle hack?

How about the other way around: what APIs can we use to charge /
"uncharge" memory to a process? If we have those, we can experiment with
different places to call them.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
On 2018-01-24 12:50 PM, Michal Hocko wrote:
> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> [...]
>>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>>> process, this should result in the other process dropping its
>>>>> references to the BOs as well, at which point the memory is released.
>>>
>>> OK. How exactly are those BOs mapped to the userspace?
>>
>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>> refer to a BO. There can also be userspace CPU mappings of the BO's
>> memory, but userspace doesn't need CPU mappings for all BOs and only
>> creates them as needed.
>
> OK, I guess you have to bear with me some more. This whole stack is a
> complete unknown. I am mostly after finding a boundary where you can
> charge the allocated memory to the process so that the oom killer can
> consider it. Is there anything like that?

I think something like charging the memory of a BO to the process when a
userspace handle is created for it, and "uncharging" when a handle is
destroyed, could be a good start.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
Am 24.01.2018 um 12:50 schrieb Michal Hocko:
> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> [...]
>>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>>> process, this should result in the other process dropping its
>>>>> references to the BOs as well, at which point the memory is released.
>>>
>>> OK. How exactly are those BOs mapped to the userspace?
>>
>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>> refer to a BO. There can also be userspace CPU mappings of the BO's
>> memory, but userspace doesn't need CPU mappings for all BOs and only
>> creates them as needed.
>
> OK, I guess you have to bear with me some more. This whole stack is a
> complete unknown. I am mostly after finding a boundary where you can
> charge the allocated memory to the process so that the oom killer can
> consider it. Is there anything like that? Except for the proposed file
> handle hack?

Not that I know of.

As I said before, we need some kind of callback that a process now starts
to use a file descriptor, but without anything from that file descriptor
mapped into the address space.

Regards,
Christian.
Re: [RFC] Per file OOM badness
On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
[...]
> >> 2. If the OOM killer kills a process which is sharing BOs with another
> >> process, this should result in the other process dropping its
> >> references to the BOs as well, at which point the memory is released.
> >
> > OK. How exactly are those BOs mapped to the userspace?
>
> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> refer to a BO. There can also be userspace CPU mappings of the BO's
> memory, but userspace doesn't need CPU mappings for all BOs and only
> creates them as needed.

OK, I guess you have to bear with me some more. This whole stack is a
complete unknown. I am mostly after finding a boundary where you can charge
the allocated memory to the process so that the oom killer can consider it.
Is there anything like that? Except for the proposed file handle hack?

--
Michal Hocko
SUSE Labs
Re: [RFC] Per file OOM badness
On 2018-01-24 12:01 PM, Michal Hocko wrote:
> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
>> On 2018-01-24 10:28 AM, Michal Hocko wrote:
> [...]
>>> So how exactly does it help to kill one of those processes? The memory
>>> stays pinned behind, or do I still misunderstand?
>>
>> Fundamentally, the memory is only released once all references to the
>> BOs are dropped. That's true no matter how the memory is accounted for
>> between the processes referencing the BO.
>>
>> In practice, this should be fine:
>>
>> 1. The amount of memory used for shared BOs is normally small compared
>> to the amount of memory used for non-shared BOs (and other things). So
>> regardless of how shared BOs are accounted for, the OOM killer should
>> first target the process which is responsible for more memory overall.
>
> OK. So this is essentially the same as with the normal shared memory
> which is a part of the RSS in general.

Right.

>> 2. If the OOM killer kills a process which is sharing BOs with another
>> process, this should result in the other process dropping its
>> references to the BOs as well, at which point the memory is released.
>
> OK. How exactly are those BOs mapped to the userspace?

I'm not sure what you're asking. Userspace mostly uses a GEM handle to
refer to a BO. There can also be userspace CPU mappings of the BO's memory,
but userspace doesn't need CPU mappings for all BOs and only creates them
as needed.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> On 2018-01-24 10:28 AM, Michal Hocko wrote:
[...]
> > So how exactly does it help to kill one of those processes? The memory
> > stays pinned behind, or do I still misunderstand?
>
> Fundamentally, the memory is only released once all references to the
> BOs are dropped. That's true no matter how the memory is accounted for
> between the processes referencing the BO.
>
> In practice, this should be fine:
>
> 1. The amount of memory used for shared BOs is normally small compared
> to the amount of memory used for non-shared BOs (and other things). So
> regardless of how shared BOs are accounted for, the OOM killer should
> first target the process which is responsible for more memory overall.

OK. So this is essentially the same as with the normal shared memory which
is a part of the RSS in general.

> 2. If the OOM killer kills a process which is sharing BOs with another
> process, this should result in the other process dropping its references
> to the BOs as well, at which point the memory is released.

OK. How exactly are those BOs mapped to the userspace?

--
Michal Hocko
SUSE Labs
Re: [RFC] Per file OOM badness
On 2018-01-24 10:28 AM, Michal Hocko wrote:
> On Tue 23-01-18 17:39:19, Michel Dänzer wrote:
>> On 2018-01-23 04:36 PM, Michal Hocko wrote:
>>> On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
>>>> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
>>>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>>>> Hi, this series is a revised version of an RFC sent by Christian König
>>>>>> a few years ago. The original RFC can be found at
>>>>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>>>>> Here is the original cover letter text:
>>>>> : I'm currently working on the issue that when device drivers allocate
>>>>> : memory on behalf of an application, the OOM killer usually doesn't
>>>>> : know about it unless the application also gets this memory mapped
>>>>> : into its address space.
>>>>> :
>>>>> : This is especially annoying for graphics drivers, where a lot of the
>>>>> : VRAM usually isn't CPU accessible and so doesn't make sense to map
>>>>> : into the address space of the process using it.
>>>>> :
>>>>> : The problem now is that when an application starts to use a lot of
>>>>> : VRAM, those buffer objects sooner or later get swapped out to system
>>>>> : memory, but when we now run into an out-of-memory situation, the OOM
>>>>> : killer obviously doesn't know anything about that memory and so
>>>>> : usually kills the wrong process.
>>>>> :
>>>>> : The following set of patches tries to address this problem by
>>>>> : introducing a per-file OOM badness score, which device drivers can
>>>>> : use to give the OOM killer a hint how many resources are bound to a
>>>>> : file descriptor so that it can make better decisions which process
>>>>> : to kill.
>>>>> :
>>>>> : So, question to everyone: What do you think about this approach?
>>>>> :
>>>>> : My biggest concern right now is that the patches are messing with a
>>>>> : core kernel structure (adding a field to struct file). Any better
>>>>> : idea? I'm considering putting a callback into file_ops instead.
>>>>
>>>> Hello!
>>>>
>>>> I wonder if groupoom (aka the cgroup-aware OOM killer) can work for you?
>>>
>>> I do not think so. The problem is that the allocating context is not
>>> identical with the end consumer.
>>
>> That's actually not really true. Even in cases where a BO is shared with
>> a different process, it is still used at least occasionally in the
>> process which allocated it as well. Otherwise there would be no point in
>> sharing it between processes.
>
> OK, but somebody has to be made responsible. Otherwise you are just
> killing a process which doesn't really release any memory.
>
>> There should be no problem if the memory of a shared BO is accounted for
>> in each process sharing it. It might be nice to scale each process'
>> "debt" by 1 / (number of processes sharing it) if possible, but in the
>> worst case accounting it fully in each process should be fine.
>
> So how exactly does it help to kill one of those processes? The memory
> stays pinned behind, or do I still misunderstand?

Fundamentally, the memory is only released once all references to the BOs
are dropped. That's true no matter how the memory is accounted for between
the processes referencing the BO.

In practice, this should be fine:

1. The amount of memory used for shared BOs is normally small compared to
the amount of memory used for non-shared BOs (and other things). So
regardless of how shared BOs are accounted for, the OOM killer should first
target the process which is responsible for more memory overall.

2. If the OOM killer kills a process which is sharing BOs with another
process, this should result in the other process dropping its references
to the BOs as well, at which point the memory is released.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
On Tue 23-01-18 17:39:19, Michel Dänzer wrote:
> On 2018-01-23 04:36 PM, Michal Hocko wrote:
> > On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
> >> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
> >>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> >>>> Hi, this series is a revised version of an RFC sent by Christian König
> >>>> a few years ago. The original RFC can be found at
> >>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> >>> Here is the original cover letter text:
> >>> : I'm currently working on the issue that when device drivers allocate
> >>> : memory on behalf of an application, the OOM killer usually doesn't
> >>> : know about it unless the application also gets this memory mapped
> >>> : into its address space.
> >>> :
> >>> : This is especially annoying for graphics drivers, where a lot of the
> >>> : VRAM usually isn't CPU accessible and so doesn't make sense to map
> >>> : into the address space of the process using it.
> >>> :
> >>> : The problem now is that when an application starts to use a lot of
> >>> : VRAM, those buffer objects sooner or later get swapped out to system
> >>> : memory, but when we now run into an out-of-memory situation, the OOM
> >>> : killer obviously doesn't know anything about that memory and so
> >>> : usually kills the wrong process.
> >>> :
> >>> : The following set of patches tries to address this problem by
> >>> : introducing a per-file OOM badness score, which device drivers can
> >>> : use to give the OOM killer a hint how many resources are bound to a
> >>> : file descriptor so that it can make better decisions which process
> >>> : to kill.
> >>> :
> >>> : So, question to everyone: What do you think about this approach?
> >>> :
> >>> : My biggest concern right now is that the patches are messing with a
> >>> : core kernel structure (adding a field to struct file). Any better
> >>> : idea? I'm considering putting a callback into file_ops instead.
> >>
> >> Hello!
> >>
> >> I wonder if groupoom (aka the cgroup-aware OOM killer) can work for you?
> >
> > I do not think so. The problem is that the allocating context is not
> > identical with the end consumer.
>
> That's actually not really true. Even in cases where a BO is shared with
> a different process, it is still used at least occasionally in the
> process which allocated it as well. Otherwise there would be no point in
> sharing it between processes.

OK, but somebody has to be made responsible. Otherwise you are just killing
a process which doesn't really release any memory.

> There should be no problem if the memory of a shared BO is accounted for
> in each process sharing it. It might be nice to scale each process'
> "debt" by 1 / (number of processes sharing it) if possible, but in the
> worst case accounting it fully in each process should be fine.

So how exactly does it help to kill one of those processes? The memory
stays pinned behind, or do I still misunderstand?

--
Michal Hocko
SUSE Labs
Re: [RFC] Per file OOM badness
On 2018-01-23 04:36 PM, Michal Hocko wrote:
> On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
>> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>> Hi, this series is a revised version of an RFC sent by Christian König
>>>> a few years ago. The original RFC can be found at
>>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>>> Here is the original cover letter text:
>>> : I'm currently working on the issue that when device drivers allocate
>>> : memory on behalf of an application, the OOM killer usually doesn't
>>> : know about it unless the application also gets this memory mapped
>>> : into its address space.
>>> :
>>> : This is especially annoying for graphics drivers, where a lot of the
>>> : VRAM usually isn't CPU accessible and so doesn't make sense to map
>>> : into the address space of the process using it.
>>> :
>>> : The problem now is that when an application starts to use a lot of
>>> : VRAM, those buffer objects sooner or later get swapped out to system
>>> : memory, but when we now run into an out-of-memory situation, the OOM
>>> : killer obviously doesn't know anything about that memory and so
>>> : usually kills the wrong process.
>>> :
>>> : The following set of patches tries to address this problem by
>>> : introducing a per-file OOM badness score, which device drivers can
>>> : use to give the OOM killer a hint how many resources are bound to a
>>> : file descriptor so that it can make better decisions which process
>>> : to kill.
>>> :
>>> : So, question to everyone: What do you think about this approach?
>>> :
>>> : My biggest concern right now is that the patches are messing with a
>>> : core kernel structure (adding a field to struct file). Any better
>>> : idea? I'm considering putting a callback into file_ops instead.
>>
>> Hello!
>>
>> I wonder if groupoom (aka the cgroup-aware OOM killer) can work for you?
>
> I do not think so. The problem is that the allocating context is not
> identical with the end consumer.

That's actually not really true. Even in cases where a BO is shared with a
different process, it is still used at least occasionally in the process
which allocated it as well. Otherwise there would be no point in sharing it
between processes.

There should be no problem if the memory of a shared BO is accounted for in
each process sharing it. It might be nice to scale each process' "debt" by
1 / (number of processes sharing it) if possible, but in the worst case
accounting it fully in each process should be fine.

--
Earthling Michel Dänzer            |              http://www.amd.com
Libre software enthusiast          |            Mesa and X developer
Re: [RFC] Per file OOM badness
On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
> > On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> > > Hi, this series is a revised version of an RFC sent by Christian König
> > > a few years ago. The original RFC can be found at
> > > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> > Here is the original cover letter text:
> > : I'm currently working on the issue that when device drivers allocate
> > : memory on behalf of an application, the OOM killer usually doesn't
> > : know about it unless the application also gets this memory mapped
> > : into its address space.
> > :
> > : This is especially annoying for graphics drivers, where a lot of the
> > : VRAM usually isn't CPU accessible and so doesn't make sense to map
> > : into the address space of the process using it.
> > :
> > : The problem now is that when an application starts to use a lot of
> > : VRAM, those buffer objects sooner or later get swapped out to system
> > : memory, but when we now run into an out-of-memory situation, the OOM
> > : killer obviously doesn't know anything about that memory and so
> > : usually kills the wrong process.
> > :
> > : The following set of patches tries to address this problem by
> > : introducing a per-file OOM badness score, which device drivers can
> > : use to give the OOM killer a hint how many resources are bound to a
> > : file descriptor so that it can make better decisions which process
> > : to kill.
> > :
> > : So, question to everyone: What do you think about this approach?
> > :
> > : My biggest concern right now is that the patches are messing with a
> > : core kernel structure (adding a field to struct file). Any better
> > : idea? I'm considering putting a callback into file_ops instead.
>
> Hello!
>
> I wonder if groupoom (aka the cgroup-aware OOM killer) can work for you?

I do not think so. The problem is that the allocating context is not
identical with the end consumer.

--
Michal Hocko
SUSE Labs
Re: [RFC] Per file OOM badness
On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> > Hi, this series is a revised version of an RFC sent by Christian König
> > a few years ago. The original RFC can be found at
> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> Here is the original cover letter text:
> : I'm currently working on the issue that when device drivers allocate
> : memory on behalf of an application, the OOM killer usually doesn't
> : know about it unless the application also gets this memory mapped
> : into its address space.
> :
> : This is especially annoying for graphics drivers, where a lot of the
> : VRAM usually isn't CPU accessible and so doesn't make sense to map
> : into the address space of the process using it.
> :
> : The problem now is that when an application starts to use a lot of
> : VRAM, those buffer objects sooner or later get swapped out to system
> : memory, but when we now run into an out-of-memory situation, the OOM
> : killer obviously doesn't know anything about that memory and so
> : usually kills the wrong process.
> :
> : The following set of patches tries to address this problem by
> : introducing a per-file OOM badness score, which device drivers can
> : use to give the OOM killer a hint how many resources are bound to a
> : file descriptor so that it can make better decisions which process
> : to kill.
> :
> : So, question to everyone: What do you think about this approach?
> :
> : My biggest concern right now is that the patches are messing with a
> : core kernel structure (adding a field to struct file). Any better
> : idea? I'm considering putting a callback into file_ops instead.

Hello!

I wonder if groupoom (aka the cgroup-aware OOM killer) can work for you?
We do have kmem accounting on the memory cgroup level, and the
cgroup-aware OOM selection logic takes the cgroup's kmem size into
account. So you don't need to introduce another accounting mechanism for
OOM. You can find the current implementation in the mm tree.

Thanks!

Roman
Re: [RFC] Per file OOM badness
On Fri 19-01-18 17:54:36, Christian König wrote: > Am 19.01.2018 um 13:20 schrieb Michal Hocko: > > On Fri 19-01-18 13:13:51, Michal Hocko wrote: > > > On Fri 19-01-18 12:37:51, Christian König wrote: > > > [...] > > > > The per file descriptor badness is/was just the much easier approach to > > > > solve the issue, because the drivers already knew which client is > > > > currently > > > > using which buffer objects. > > > > > > > > I of course agree that file descriptors can be shared between processes > > > > and > > > > are by themselves not killable. But at least for our graphics driven use > > > > case I don't see much of a problem killing all processes when a file > > > > descriptor is used by more than one at the same time. > > > Ohh, I absolutely see why you have chosen this way for your particular > > > usecase. I am just arguing that this would rather be more generic to be > > > merged. If there is absolutely no other way around we can consider it > > > but right now I do not see that all other options have been considered > > > properly. Especially when the fd based approach is basically wrong for > > > almost anybody else. > > And more importantly. Iterating over _all_ fd which is what is your > > approach is based on AFAIU is not acceptable for the OOM path. Even > > though oom_badness is not a hot path we do not really want it to take a > > lot of time either. Even the current iteration over all processes is > > quite time consuming. Now you want to add the number of opened files and > > that might be quite many per process. > > Mhm, crap that is a really good argument. > > How about adding a linked list of callbacks to check for the OOM killer to > check for each process? > > This way we can avoid finding the process where we need to account things on > when memory is allocated and still allow the OOM killer to only check the > specific callbacks it needs to determine the score of a process? 
I might be oversimplifying but there really has to be a boundary when you have the target user context, no? Then do the accounting when you get the data to the user. -- Michal Hocko SUSE Labs
Re: [RFC] Per file OOM badness
On 01/22/2018 06:23 PM, Andrew Morton wrote: On Thu, 18 Jan 2018 11:47:48 -0500 Andrey Grodzovsky wrote: Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just addressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file. Should be in address_space_operations, I suspect. If an application opens a file twice, we only want to count it once? Makes sense. But we're putting the cart ahead of the horse here. Please provide us with a detailed description of the problem which you are addressing so that the MM developers can better consider how to address your requirements. I will just reiterate the problem statement from the original RFC, should have put it in the body of the RFC and not just a link, as already commented by Michal. Below is the quoted RFC. Thanks, Andrey P.S. You can also check the follow-up discussion after this first email. " I'm currently working on the issue that when device drivers allocate memory on behalf of an application the OOM killer usually doesn't know about that unless the application also gets this memory mapped into their address space. This is especially annoying for graphics drivers where a lot of the VRAM usually isn't CPU accessible and so doesn't make sense to map into the address space of the process using it. The problem now is that when an application starts to use a lot of VRAM those buffer objects sooner or later get swapped out to system memory, but when we now run into an out of memory situation the OOM killer obviously doesn't know anything about that memory and so usually kills the wrong process "
Re: [RFC] Per file OOM badness
On Thu, 18 Jan 2018 11:47:48 -0500 Andrey Grodzovsky wrote: > Hi, this series is a revised version of an RFC sent by Christian König > a few years ago. The original RFC can be found at > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html > > This is the same idea and I've just addressed his concern from the original RFC > and switched to a callback into file_ops instead of a new member in struct file. Should be in address_space_operations, I suspect. If an application opens a file twice, we only want to count it once? But we're putting the cart ahead of the horse here. Please provide us with a detailed description of the problem which you are addressing so that the MM developers can better consider how to address your requirements.
Re: [RFC] Per file OOM badness
Michel Dänzer writes: > On 2018-01-19 11:02 AM, Michel Dänzer wrote: >> On 2018-01-19 10:58 AM, Christian König wrote: >>> Am 19.01.2018 um 10:32 schrieb Michel Dänzer: On 2018-01-19 09:39 AM, Christian König wrote: > Am 19.01.2018 um 09:20 schrieb Michal Hocko: >> OK, in that case I would propose a different approach. We already >> have rss_stat. So why do not we simply add a new counter there >> MM_KERNELPAGES and consider those in oom_badness? The rule would be >> that such a memory is bound to the process life time. I guess we will >> find more users for this later. > I already tried that and the problem with that approach is that some > buffers are not created by the application which actually uses them. > > For example X/Wayland is creating and handing out render buffers to > application which want to use OpenGL. > > So the result is when you always account the application who created the > buffer the OOM killer will certainly reap X/Wayland first. And that is > exactly what we want to avoid here. FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland anymore. With DRI3 and Wayland, buffers are allocated by the clients and then shared with the X / Wayland server. >>> >>> Good point, when I initially looked at that problem DRI3 wasn't widely >>> used yet. >>> Also, in all cases, the amount of memory allocated for buffers shared between DRI/Wayland clients and the server should be relatively small compared to the amount of memory allocated for buffers used only locally in the client, particularly for clients which create significant memory pressure. >>> >>> That is unfortunately only partially true. When you have a single >>> runaway application which tries to allocate everything it would indeed >>> work as you described. >>> >>> But when I tested this a few years ago with X based desktop the >>> applications which actually used most of the memory where Firefox and >>> Thunderbird. Unfortunately they never got accounted for that. 
>>> >>> Now, on my current Wayland based desktop it actually doesn't look much >>> better. Taking a look at radeon_gem_info/amdgpu_gem_info the majority of >>> all memory was allocated either by gnome-shell or Xwayland. >> >> My guess would be this is due to pixmaps, which allow X clients to cause >> the X server to allocate essentially unlimited amounts of memory. It's a >> separate issue, which would require a different solution than what we're >> discussing in this thread. Maybe something that would allow the X server >> to tell the kernel that some of the memory it allocates is for the >> client process. > > Of course, such a mechanism could probably be abused to incorrectly > blame other processes for one's own memory consumption... > > > I'm not sure if the pixmap issue can be solved for the OOM killer. It's > an X design issue which is fixed with Wayland. So it's probably better > to ignore it for this discussion. > > Also, I really think the issue with DRM buffers being shared between > processes isn't significant for the OOM killer compared to DRM buffers > only used in the same process that allocates them. So I suggest focusing > on the latter. Agreed. The 95% case is non-shared buffers, so just don't account for them and we'll have a solution good enough that we probably never need to handle the shared case. On the DRM side, removing buffers from the accounting once they get shared would be easy.
Re: [RFC] Per file OOM badness
Am 19.01.2018 um 13:20 schrieb Michal Hocko: On Fri 19-01-18 13:13:51, Michal Hocko wrote: On Fri 19-01-18 12:37:51, Christian König wrote: [...] The per file descriptor badness is/was just the much easier approach to solve the issue, because the drivers already knew which client is currently using which buffer objects. I of course agree that file descriptors can be shared between processes and are by themselves not killable. But at least for our graphics driven use case I don't see much of a problem killing all processes when a file descriptor is used by more than one at the same time. Ohh, I absolutely see why you have chosen this way for your particular usecase. I am just arguing that this would rather be more generic to be merged. If there is absolutely no other way around we can consider it but right now I do not see that all other options have been considered properly. Especially when the fd based approach is basically wrong for almost anybody else. And more importantly. Iterating over _all_ fd which is what is your approach is based on AFAIU is not acceptable for the OOM path. Even though oom_badness is not a hot path we do not really want it to take a lot of time either. Even the current iteration over all processes is quite time consuming. Now you want to add the number of opened files and that might be quite many per process. Mhm, crap that is a really good argument. How about adding a linked list of callbacks to check for the OOM killer to check for each process? This way we can avoid finding the process where we need to account things on when memory is allocated and still allow the OOM killer to only check the specific callbacks it needs to determine the score of a process? Would still require some changes in the fs layer, but I think that should be doable. Regards, Christian.
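[Editor's note] Christian's callback-list idea can be sketched as a small user-space toy model. None of the names below are real kernel interfaces; this is only an illustration of the shape of the proposal: subsystems register a per-process callback, and the OOM killer sums callback results instead of walking file descriptors.

```c
#include <stddef.h>

/* Toy model: each process carries a list of badness callbacks registered
 * by subsystems (e.g. a DRM driver), so the OOM killer can sum
 * driver-known memory without iterating every fd. All names are
 * hypothetical, not actual kernel APIs. */

struct oom_badness_cb {
    struct oom_badness_cb *next;
    unsigned long (*pages)(void *data); /* pages attributable to the process */
    void *data;
};

struct toy_process {
    unsigned long rss_pages;    /* memory already visible to the OOM killer */
    struct oom_badness_cb *cbs; /* driver-registered callbacks */
};

static void register_badness_cb(struct toy_process *p, struct oom_badness_cb *cb)
{
    cb->next = p->cbs;
    p->cbs = cb;
}

/* Badness is plain RSS plus whatever each registered callback reports. */
static unsigned long toy_oom_badness(const struct toy_process *p)
{
    unsigned long total = p->rss_pages;
    for (const struct oom_badness_cb *cb = p->cbs; cb; cb = cb->next)
        total += cb->pages(cb->data);
    return total;
}
```

As Christian notes, the open question is where such a list would live and how the fs layer would keep it in sync with process lifetime.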
Re: [RFC] Per file OOM badness
On 2018-01-19 12:37 PM, Christian König wrote: > Am 19.01.2018 um 11:40 schrieb Michal Hocko: >> On Fri 19-01-18 09:39:03, Christian König wrote: >>> Am 19.01.2018 um 09:20 schrieb Michal Hocko: >> [...] OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. >>> I already tried that and the problem with that approach is that some >>> buffers >>> are not created by the application which actually uses them. >>> >>> For example X/Wayland is creating and handing out render buffers to >>> application which want to use OpenGL. >>> >>> So the result is when you always account the application who created the >>> buffer the OOM killer will certainly reap X/Wayland first. And that is >>> exactly what we want to avoid here. >> Then you have to find the target allocation context at the time of the >> allocation and account it. > > And exactly that's the root of the problem: The target allocation > context isn't known at the time of the allocation. > > We could add callbacks so that when the memory is passed from the > allocator to the actual user of the memory. In other words when the > memory is passed from the X server to the client the driver would need > to decrement the X servers accounting and increment the clients accounting. > > But I think that would go deep into the file descriptor handling (we > would at least need to handle dup/dup2 and passing the fd using unix > domain sockets) and most likely would be rather error prone. > > The per file descriptor badness is/was just the much easier approach to > solve the issue, because the drivers already knew which client is > currently using which buffer objects. > > I of course agree that file descriptors can be shared between processes > and are by themselves not killable. 
But at least for our graphics driven > use case I don't see much of a problem killing all processes when a file > descriptor is used by more than one at the same time. In that case, accounting a BO as suggested by Michal above, in every process that shares it, should work fine, shouldn't it? The OOM killer will first select the process which has more memory accounted for other things than the BOs shared with another process. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
On Fri 19-01-18 12:37:51, Christian König wrote: [...] > The per file descriptor badness is/was just the much easier approach to > solve the issue, because the drivers already knew which client is currently > using which buffer objects. > > I of course agree that file descriptors can be shared between processes and > are by themselves not killable. But at least for our graphics driven use > case I don't see much of a problem killing all processes when a file > descriptor is used by more than one at the same time. Ohh, I absolutely see why you have chosen this way for your particular usecase. I am just arguing that this would rather be more generic to be merged. If there is absolutely no other way around we can consider it but right now I do not see that all other options have been considered properly. Especially when the fd based approach is basically wrong for almost anybody else. -- Michal Hocko SUSE Labs
Re: [RFC] Per file OOM badness
On Fri 19-01-18 13:13:51, Michal Hocko wrote: > On Fri 19-01-18 12:37:51, Christian König wrote: > [...] > > The per file descriptor badness is/was just the much easier approach to > > solve the issue, because the drivers already knew which client is currently > > using which buffer objects. > > > > I of course agree that file descriptors can be shared between processes and > > are by themselves not killable. But at least for our graphics driven use > > case I don't see much of a problem killing all processes when a file > > descriptor is used by more than one at the same time. > > Ohh, I absolutely see why you have chosen this way for your particular > usecase. I am just arguing that this would rather be more generic to be > merged. If there is absolutely no other way around we can consider it > but right now I do not see that all other options have been considered > properly. Especially when the fd based approach is basically wrong for > almost anybody else. And more importantly. Iterating over _all_ fd which is what is your approach is based on AFAIU is not acceptable for the OOM path. Even though oom_badness is not a hot path we do not really want it to take a lot of time either. Even the current iteration over all processes is quite time consuming. Now you want to add the number of opened files and that might be quite many per process. -- Michal Hocko SUSE Labs
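[Editor's note] Michal's complexity argument can be made concrete with a toy model: walking every fd of every OOM candidate is O(processes × fds), while a counter maintained at allocation time keeps the OOM scan at O(processes). Everything below is made up for illustration; nothing is a real kernel interface.

```c
#include <stddef.h>

struct toy_file { unsigned long driver_pages; };

struct toy_proc {
    struct toy_file *files;
    size_t nr_files;
    unsigned long kernel_pages; /* updated whenever a driver (de)allocates */
};

/* The fd-walking variant: what oom_badness() would have to do per candidate. */
static unsigned long score_by_fd_walk(const struct toy_proc *p)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < p->nr_files; i++)
        sum += p->files[i].driver_pages;
    return sum;
}

/* The counter variant: constant time per candidate at OOM time. */
static unsigned long score_by_counter(const struct toy_proc *p)
{
    return p->kernel_pages;
}

/* Allocation-time accounting keeps the two views in agreement. */
static void toy_charge(struct toy_proc *p, struct toy_file *f, unsigned long pages)
{
    f->driver_pages += pages;
    p->kernel_pages += pages;
}
```

The two scores are identical by construction; the difference is purely where the cost is paid, at allocation time or during the OOM scan.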
Re: [RFC] Per file OOM badness
On Fri 19-01-18 09:39:03, Christian König wrote: > Am 19.01.2018 um 09:20 schrieb Michal Hocko: [...] > > OK, in that case I would propose a different approach. We already > > have rss_stat. So why do not we simply add a new counter there > > MM_KERNELPAGES and consider those in oom_badness? The rule would be > > that such a memory is bound to the process life time. I guess we will > > find more users for this later. > > I already tried that and the problem with that approach is that some buffers > are not created by the application which actually uses them. > > For example X/Wayland is creating and handing out render buffers to > application which want to use OpenGL. > > So the result is when you always account the application who created the > buffer the OOM killer will certainly reap X/Wayland first. And that is > exactly what we want to avoid here. Then you have to find the target allocation context at the time of the allocation and account it. As follow up emails show, implementations might differ and any robust oom solution has to rely on the common counters. -- Michal Hocko SUSE Labs
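[Editor's note] A rough user-space sketch of the rss_stat idea discussed above: the kernel keeps a small array of per-mm counters (MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, MM_SHMEMPAGES), and the proposal is to add an MM_KERNELPAGES slot that drivers charge when they allocate on behalf of a process, so oom_badness() picks it up along with the rest. The layout and helpers here are simplified stand-ins, not the actual kernel definitions.

```c
/* Simplified model of mm->rss_stat with the proposed extra slot. */
enum toy_mm_counter {
    MM_FILEPAGES,
    MM_ANONPAGES,
    MM_SWAPENTS,
    MM_SHMEMPAGES,
    MM_KERNELPAGES, /* the proposed new counter */
    NR_MM_COUNTERS,
};

struct toy_mm { long rss_stat[NR_MM_COUNTERS]; };

/* Drivers would charge/uncharge at allocation and free time. */
static void add_mm_counter(struct toy_mm *mm, enum toy_mm_counter member, long value)
{
    mm->rss_stat[member] += value;
}

/* Modeled loosely on how oom_badness() folds RSS-style counters into a score. */
static long toy_badness_pages(const struct toy_mm *mm)
{
    long total = 0;
    for (int i = 0; i < NR_MM_COUNTERS; i++)
        total += mm->rss_stat[i];
    return total;
}
```

The "bound to the process life time" rule maps to the requirement that every charge is undone when the process exits, so the counter never leaks.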
Re: [RFC] Per file OOM badness
On 2018-01-19 11:02 AM, Michel Dänzer wrote: > On 2018-01-19 10:58 AM, Christian König wrote: >> Am 19.01.2018 um 10:32 schrieb Michel Dänzer: >>> On 2018-01-19 09:39 AM, Christian König wrote: Am 19.01.2018 um 09:20 schrieb Michal Hocko: > OK, in that case I would propose a different approach. We already > have rss_stat. So why do not we simply add a new counter there > MM_KERNELPAGES and consider those in oom_badness? The rule would be > that such a memory is bound to the process life time. I guess we will > find more users for this later. I already tried that and the problem with that approach is that some buffers are not created by the application which actually uses them. For example X/Wayland is creating and handing out render buffers to application which want to use OpenGL. So the result is when you always account the application who created the buffer the OOM killer will certainly reap X/Wayland first. And that is exactly what we want to avoid here. >>> FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland >>> anymore. With DRI3 and Wayland, buffers are allocated by the clients and >>> then shared with the X / Wayland server. >> >> Good point, when I initially looked at that problem DRI3 wasn't widely >> used yet. >> >>> Also, in all cases, the amount of memory allocated for buffers shared >>> between DRI/Wayland clients and the server should be relatively small >>> compared to the amount of memory allocated for buffers used only locally >>> in the client, particularly for clients which create significant memory >>> pressure. >> >> That is unfortunately only partially true. When you have a single >> runaway application which tries to allocate everything it would indeed >> work as you described. >> >> But when I tested this a few years ago with X based desktop the >> applications which actually used most of the memory where Firefox and >> Thunderbird. Unfortunately they never got accounted for that. 
>> >> Now, on my current Wayland based desktop it actually doesn't look much >> better. Taking a look at radeon_gem_info/amdgpu_gem_info the majority of >> all memory was allocated either by gnome-shell or Xwayland. > > My guess would be this is due to pixmaps, which allow X clients to cause > the X server to allocate essentially unlimited amounts of memory. It's a > separate issue, which would require a different solution than what we're > discussing in this thread. Maybe something that would allow the X server > to tell the kernel that some of the memory it allocates is for the > client process. Of course, such a mechanism could probably be abused to incorrectly blame other processes for one's own memory consumption... I'm not sure if the pixmap issue can be solved for the OOM killer. It's an X design issue which is fixed with Wayland. So it's probably better to ignore it for this discussion. Also, I really think the issue with DRM buffers being shared between processes isn't significant for the OOM killer compared to DRM buffers only used in the same process that allocates them. So I suggest focusing on the latter. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
Am 19.01.2018 um 11:40 schrieb Michal Hocko: On Fri 19-01-18 09:39:03, Christian König wrote: Am 19.01.2018 um 09:20 schrieb Michal Hocko: [...] OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. I already tried that and the problem with that approach is that some buffers are not created by the application which actually uses them. For example X/Wayland is creating and handing out render buffers to application which want to use OpenGL. So the result is when you always account the application who created the buffer the OOM killer will certainly reap X/Wayland first. And that is exactly what we want to avoid here. Then you have to find the target allocation context at the time of the allocation and account it. And exactly that's the root of the problem: The target allocation context isn't known at the time of the allocation. We could add callbacks so that when the memory is passed from the allocator to the actual user of the memory. In other words when the memory is passed from the X server to the client the driver would need to decrement the X servers accounting and increment the clients accounting. But I think that would go deep into the file descriptor handling (we would at least need to handle dup/dup2 and passing the fd using unix domain sockets) and most likely would be rather error prone. The per file descriptor badness is/was just the much easier approach to solve the issue, because the drivers already knew which client is currently using which buffer objects. I of course agree that file descriptors can be shared between processes and are by themselves not killable. 
But at least for our graphics driven use case I don't see much of a problem killing all processes when a file descriptor is used by more than one at the same time. Regards, Christian. As follow up emails show, implementations might differ and any robust oom solution has to rely on the common counters.
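[Editor's note] The handoff problem Christian describes above can be reduced to a tiny sketch: if kernel-page charges live with the allocating process, passing a buffer to another process (dup/dup2, or an fd sent over a unix domain socket) would have to move the charge as well. The names are hypothetical; the point is that every transfer path in the fd handling would need a hook like this, which is where the error-proneness comes from.

```c
/* Toy per-task charge, standing in for an MM_KERNELPAGES-style counter. */
struct toy_task { long charged_pages; };

/* Move the charge for a shared buffer from the allocator (e.g. the X server
 * under DRI2) to the process that actually uses it. */
static void toy_transfer_charge(struct toy_task *from, struct toy_task *to, long pages)
{
    from->charged_pages -= pages;
    to->charged_pages += pages;
}
```

Any path that hands over the buffer without calling the transfer hook leaves the books wrong, which is why Christian considers the scheme fragile.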
Re: [RFC] Per file OOM badness
On 2018-01-19 10:58 AM, Christian König wrote: > Am 19.01.2018 um 10:32 schrieb Michel Dänzer: >> On 2018-01-19 09:39 AM, Christian König wrote: >>> Am 19.01.2018 um 09:20 schrieb Michal Hocko: On Thu 18-01-18 12:01:32, Eric Anholt wrote: > Michal Hocko writes: > >> On Thu 18-01-18 18:00:06, Michal Hocko wrote: >>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file. >>> Please add the full description to the cover letter and do not make >>> people hunt links. >>> >>> Here is the origin cover letter text >>> : I'm currently working on the issue that when device drivers >>> allocate memory on >>> : behalf of an application the OOM killer usually doesn't knew >>> about that unless >>> : the application also get this memory mapped into their address >>> space. >>> : >>> : This is especially annoying for graphics drivers where a lot of >>> the VRAM >>> : usually isn't CPU accessible and so doesn't make sense to map >>> into the >>> : address space of the process using it. >>> : >>> : The problem now is that when an application starts to use a lot >>> of VRAM those >>> : buffers objects sooner or later get swapped out to system memory, >>> but when we >>> : now run into an out of memory situation the OOM killer obviously >>> doesn't knew >>> : anything about that memory and so usually kills the wrong process. >> OK, but how do you attribute that memory to a particular OOM killable >> entity? And how do you actually enforce that those resources get >> freed >> on the oom killer action? 
>> >>> : The following set of patches tries to address this problem by >>> introducing a per >>> : file OOM badness score, which device drivers can use to give the >>> OOM killer a >>> : hint how many resources are bound to a file descriptor so that it >>> can make >>> : better decisions which process to kill. >> But files are not killable, they can be shared... In other words this >> doesn't help the oom killer to make an educated guess at all. > Maybe some more context would help the discussion? > > The struct file in patch 3 is the DRM fd. That's effectively "my > process's interface to talking to the GPU" not "a single GPU > resource". > Once that file is closed, all of the process's private, idle GPU > buffers > will be immediately freed (this will be most of their allocations), > and > some will be freed once the GPU completes some work (this will be most > of the rest of their allocations). > > Some GEM BOs won't be freed just by closing the fd, if they've been > shared between processes. Those are usually about 8-24MB total in a > process, rather than the GBs that modern apps use (or that our > testcases > like to allocate and thus trigger oomkilling of the test harness > instead > of the offending testcase...) > > Even if we just had the private+idle buffers being accounted in OOM > badness, that would be a huge step forward in system reliability. OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. >>> I already tried that and the problem with that approach is that some >>> buffers are not created by the application which actually uses them. >>> >>> For example X/Wayland is creating and handing out render buffers to >>> application which want to use OpenGL. 
>>> So the result is when you always account the application who created the >>> buffer the OOM killer will certainly reap X/Wayland first. And that is >>> exactly what we want to avoid here. >> FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland >> anymore. With DRI3 and Wayland, buffers are allocated by the clients and >> then shared with the X / Wayland server. > > Good point, when I initially looked at that problem DRI3 wasn't widely > used yet. > >> Also, in all cases, the amount of memory allocated for buffers shared >> between DRI/Wayland clients and the server should be relatively small >> compared to the amount of memory allocated for buffers used only locally >> in the client, particularly for clients which create significant memory >> pressure. > > That is unfortunately only partially true.
Re: [RFC] Per file OOM badness
On 2018年01月19日 16:25, Michal Hocko wrote: [removed the broken quoting - please try to use an email client which doesn't mess up the quoted text] On Fri 19-01-18 06:01:26, He, Roger wrote: [...] I think you are misunderstanding here. Actually for now, the memory in TTM Pools already has mm_shrink which is implemented in ttm_pool_mm_shrink_init. And here the memory we want to make it contribute to OOM badness is not in TTM Pools. Because when a TTM buffer allocation succeeds, the memory already is removed from TTM Pools. I have no idea what TTM buffers are. But this smells like something rather specific to the particular subsystem. And my main objection here is that struct file is not a proper vehicle to carry such information. So whatever the TTM subsystem does it should contribute to generic counters rather than abuse fd because it happens to use it to communicate with userspace. got it. thanks.
Re: [RFC] Per file OOM badness
Am 19.01.2018 um 10:32 schrieb Michel Dänzer: On 2018-01-19 09:39 AM, Christian König wrote: Am 19.01.2018 um 09:20 schrieb Michal Hocko: On Thu 18-01-18 12:01:32, Eric Anholt wrote: Michal Hocko writes: On Thu 18-01-18 18:00:06, Michal Hocko wrote: On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file. Please add the full description to the cover letter and do not make people hunt links. Here is the origin cover letter text : I'm currently working on the issue that when device drivers allocate memory on : behalf of an application the OOM killer usually doesn't knew about that unless : the application also get this memory mapped into their address space. : : This is especially annoying for graphics drivers where a lot of the VRAM : usually isn't CPU accessible and so doesn't make sense to map into the : address space of the process using it. : : The problem now is that when an application starts to use a lot of VRAM those : buffers objects sooner or later get swapped out to system memory, but when we : now run into an out of memory situation the OOM killer obviously doesn't knew : anything about that memory and so usually kills the wrong process. OK, but how do you attribute that memory to a particular OOM killable entity? And how do you actually enforce that those resources get freed on the oom killer action? : The following set of patches tries to address this problem by introducing a per : file OOM badness score, which device drivers can use to give the OOM killer a : hint how many resources are bound to a file descriptor so that it can make : better decisions which process to kill. 
But files are not killable, they can be shared... In other words this doesn't help the oom killer to make an educated guess at all. Maybe some more context would help the discussion? The struct file in patch 3 is the DRM fd. That's effectively "my process's interface to talking to the GPU" not "a single GPU resource". Once that file is closed, all of the process's private, idle GPU buffers will be immediately freed (this will be most of their allocations), and some will be freed once the GPU completes some work (this will be most of the rest of their allocations). Some GEM BOs won't be freed just by closing the fd, if they've been shared between processes. Those are usually about 8-24MB total in a process, rather than the GBs that modern apps use (or that our testcases like to allocate and thus trigger oomkilling of the test harness instead of the offending testcase...) Even if we just had the private+idle buffers being accounted in OOM badness, that would be a huge step forward in system reliability. OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. I already tried that and the problem with that approach is that some buffers are not created by the application which actually uses them. For example X/Wayland is creating and handing out render buffers to application which want to use OpenGL. So the result is when you always account the application who created the buffer the OOM killer will certainly reap X/Wayland first. And that is exactly what we want to avoid here. FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland anymore. With DRI3 and Wayland, buffers are allocated by the clients and then shared with the X / Wayland server. 
Good point, when I initially looked at that problem DRI3 wasn't widely used yet.

> Also, in all cases, the amount of memory allocated for buffers shared between DRI/Wayland clients and the server should be relatively small compared to the amount of memory allocated for buffers used only locally in the client, particularly for clients which create significant memory pressure.

That is unfortunately only partially true. When you have a single runaway application which tries to allocate everything, it would indeed work as you described. But when I tested this a few years ago with an X-based desktop, the applications which actually used most of the memory were Firefox and Thunderbird. Unfortunately they never got accounted for that. Now, on my current Wayland-based desktop it actually doesn't look much better. Taking a look at radeon_gem_info/amdgpu_gem_info, the majority of all memory was allocated either by gnome-shell or Xwayland. Regards, Christian. ___ dri-devel mailing list dri-devel@lists.freedesktop.org
Re: [RFC] Per file OOM badness
On 2018-01-19 09:39 AM, Christian König wrote: > Am 19.01.2018 um 09:20 schrieb Michal Hocko: >> On Thu 18-01-18 12:01:32, Eric Anholt wrote: >>> Michal Hocko writes: >>> On Thu 18-01-18 18:00:06, Michal Hocko wrote: > On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: >> Hi, this series is a revised version of an RFC sent by Christian >> König >> a few years ago. The original RFC can be found at >> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html >> >> >> This is the same idea and I've just adressed his concern from the >> original RFC >> and switched to a callback into file_ops instead of a new member >> in struct file. > Please add the full description to the cover letter and do not make > people hunt links. > > Here is the origin cover letter text > : I'm currently working on the issue that when device drivers > allocate memory on > : behalf of an application the OOM killer usually doesn't knew > about that unless > : the application also get this memory mapped into their address > space. > : > : This is especially annoying for graphics drivers where a lot of > the VRAM > : usually isn't CPU accessible and so doesn't make sense to map > into the > : address space of the process using it. > : > : The problem now is that when an application starts to use a lot > of VRAM those > : buffers objects sooner or later get swapped out to system memory, > but when we > : now run into an out of memory situation the OOM killer obviously > doesn't knew > : anything about that memory and so usually kills the wrong process. OK, but how do you attribute that memory to a particular OOM killable entity? And how do you actually enforce that those resources get freed on the oom killer action? 
> : The following set of patches tries to address this problem by > introducing a per > : file OOM badness score, which device drivers can use to give the > OOM killer a > : hint how many resources are bound to a file descriptor so that it > can make > : better decisions which process to kill. But files are not killable, they can be shared... In other words this doesn't help the oom killer to make an educated guess at all. >>> Maybe some more context would help the discussion? >>> >>> The struct file in patch 3 is the DRM fd. That's effectively "my >>> process's interface to talking to the GPU" not "a single GPU resource". >>> Once that file is closed, all of the process's private, idle GPU buffers >>> will be immediately freed (this will be most of their allocations), and >>> some will be freed once the GPU completes some work (this will be most >>> of the rest of their allocations). >>> >>> Some GEM BOs won't be freed just by closing the fd, if they've been >>> shared between processes. Those are usually about 8-24MB total in a >>> process, rather than the GBs that modern apps use (or that our testcases >>> like to allocate and thus trigger oomkilling of the test harness instead >>> of the offending testcase...) >>> >>> Even if we just had the private+idle buffers being accounted in OOM >>> badness, that would be a huge step forward in system reliability. >> OK, in that case I would propose a different approach. We already >> have rss_stat. So why do not we simply add a new counter there >> MM_KERNELPAGES and consider those in oom_badness? The rule would be >> that such a memory is bound to the process life time. I guess we will >> find more users for this later. > > I already tried that and the problem with that approach is that some > buffers are not created by the application which actually uses them. > > For example X/Wayland is creating and handing out render buffers to > application which want to use OpenGL. 
> > So the result is when you always account the application who created the > buffer the OOM killer will certainly reap X/Wayland first. And that is > exactly what we want to avoid here. FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland anymore. With DRI3 and Wayland, buffers are allocated by the clients and then shared with the X / Wayland server. Also, in all cases, the amount of memory allocated for buffers shared between DRI/Wayland clients and the server should be relatively small compared to the amount of memory allocated for buffers used only locally in the client, particularly for clients which create significant memory pressure. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: [RFC] Per file OOM badness
On Thu 18-01-18 18:00:06, Michal Hocko wrote: > On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: > > Hi, this series is a revised version of an RFC sent by Christian König > > a few years ago. The original RFC can be found at > > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html > > > > This is the same idea and I've just adressed his concern from the original > > RFC > > and switched to a callback into file_ops instead of a new member in struct > > file. > > Please add the full description to the cover letter and do not make > people hunt links. > > Here is the origin cover letter text > : I'm currently working on the issue that when device drivers allocate memory > on > : behalf of an application the OOM killer usually doesn't knew about that > unless > : the application also get this memory mapped into their address space. > : > : This is especially annoying for graphics drivers where a lot of the VRAM > : usually isn't CPU accessible and so doesn't make sense to map into the > : address space of the process using it. > : > : The problem now is that when an application starts to use a lot of VRAM > those > : buffers objects sooner or later get swapped out to system memory, but when > we > : now run into an out of memory situation the OOM killer obviously doesn't > knew > : anything about that memory and so usually kills the wrong process. OK, but how do you attribute that memory to a particular OOM killable entity? And how do you actually enforce that those resources get freed on the oom killer action? > : The following set of patches tries to address this problem by introducing a > per > : file OOM badness score, which device drivers can use to give the OOM killer > a > : hint how many resources are bound to a file descriptor so that it can make > : better decisions which process to kill. But files are not killable, they can be shared... In other words this doesn't help the oom killer to make an educated guess at all. 
> : > : So question at every one: What do you think about this approach? I think this is just wrong semantically. Non-reclaimable memory is a pain, especially when there is way too much of it. If you can free that memory somehow then you can hook into the slab shrinker API and react on memory pressure. If you can account such memory to a particular process and make sure that the consumption is bound by the process lifetime then we can think of an accounting that oom_badness can consider when selecting a victim. -- Michal Hocko SUSE Labs
Re: [RFC] Per file OOM badness
[removed the broken quoting - please try to use an email client which doesn't mess up the quoted text] On Fri 19-01-18 06:01:26, He, Roger wrote: [...] > I think you are misunderstanding here. > Actually for now, the memory in TTM Pools already has mm_shrink which is > implemented in ttm_pool_mm_shrink_init. > And here the memory we want to make it contribute to OOM badness is not in > TTM Pools. > Because when TTM buffer allocation success, the memory already is removed > from TTM Pools. I have no idea what TTM buffers are. But this smells like something rather specific to the particular subsystem. And my main objection here is that struct file is not a proper vehicle to carry such information. So whatever the TTM subsystem does it should contribute to generic counters rather than abuse the fd because it happens to use it to communicate with userspace. -- Michal Hocko SUSE Labs
Re: [RFC] Per file OOM badness
On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: > Hi, this series is a revised version of an RFC sent by Christian König > a few years ago. The original RFC can be found at > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html > > This is the same idea and I've just adressed his concern from the original > RFC > and switched to a callback into file_ops instead of a new member in struct > file. Please add the full description to the cover letter and do not make people hunt links. Here is the original cover letter text : I'm currently working on the issue that when device drivers allocate memory on : behalf of an application the OOM killer usually doesn't knew about that unless : the application also get this memory mapped into their address space. : : This is especially annoying for graphics drivers where a lot of the VRAM : usually isn't CPU accessible and so doesn't make sense to map into the : address space of the process using it. : : The problem now is that when an application starts to use a lot of VRAM those : buffers objects sooner or later get swapped out to system memory, but when we : now run into an out of memory situation the OOM killer obviously doesn't knew : anything about that memory and so usually kills the wrong process. : : The following set of patches tries to address this problem by introducing a per : file OOM badness score, which device drivers can use to give the OOM killer a : hint how many resources are bound to a file descriptor so that it can make : better decisions which process to kill. : : So question at every one: What do you think about this approach? : : My biggest concern right now is the patches are messing with a core kernel : structure (adding a field to struct file). Any better idea? I'm considering : to put a callback into file_ops instead. -- Michal Hocko SUSE Labs
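The mechanism the cover letter describes can be sketched in plain C. This is a userspace model, not the actual kernel code: the struct layouts, the callback name `oom_file_badness`, and the helper names are simplified stand-ins for illustration. The idea is a per-file callback that reports the driver memory bound to a file, summed over a process's fd table by the OOM pass.

```c
/* Userspace model of the RFC's per-file badness callback (illustrative
 * names and layouts, not the kernel's real structures). */
#include <stddef.h>

struct file; /* forward declaration for the callback signature */

struct file_operations {
    /* proposed callback: pages of driver memory bound to this file */
    long (*oom_file_badness)(struct file *file);
};

struct file {
    const struct file_operations *f_op;
    long driver_pages; /* toy backing state for the demo driver below */
};

/* What a DRM-like driver would report for its fd. */
static long drm_oom_file_badness(struct file *file)
{
    return file->driver_pages;
}

static const struct file_operations drm_fops = {
    .oom_file_badness = drm_oom_file_badness,
};

/* What the OOM killer would do per candidate process: walk the fd
 * table and add up whatever each file's callback reports. Files
 * without the callback contribute nothing. */
static long files_oom_badness(struct file **fdt, size_t max_fds)
{
    long pages = 0;
    for (size_t i = 0; i < max_fds; i++)
        if (fdt[i] && fdt[i]->f_op && fdt[i]->f_op->oom_file_badness)
            pages += fdt[i]->f_op->oom_file_badness(fdt[i]);
    return pages;
}
```

A process holding a DRM fd that reports 512 pages would thus be charged 512 pages on top of its normal RSS, while ordinary files change nothing; Michal's objection below is that the file, unlike the mm, is not tied to one killable process.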
Re: [RFC] Per file OOM badness
On Thu 18-01-18 12:01:32, Eric Anholt wrote: > Michal Hocko writes: > > > On Thu 18-01-18 18:00:06, Michal Hocko wrote: > >> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: > >> > Hi, this series is a revised version of an RFC sent by Christian König > >> > a few years ago. The original RFC can be found at > >> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html > >> > > >> > This is the same idea and I've just adressed his concern from the > >> > original RFC > >> > and switched to a callback into file_ops instead of a new member in > >> > struct file. > >> > >> Please add the full description to the cover letter and do not make > >> people hunt links. > >> > >> Here is the origin cover letter text > >> : I'm currently working on the issue that when device drivers allocate > >> memory on > >> : behalf of an application the OOM killer usually doesn't knew about that > >> unless > >> : the application also get this memory mapped into their address space. > >> : > >> : This is especially annoying for graphics drivers where a lot of the VRAM > >> : usually isn't CPU accessible and so doesn't make sense to map into the > >> : address space of the process using it. > >> : > >> : The problem now is that when an application starts to use a lot of VRAM > >> those > >> : buffers objects sooner or later get swapped out to system memory, but > >> when we > >> : now run into an out of memory situation the OOM killer obviously doesn't > >> knew > >> : anything about that memory and so usually kills the wrong process. > > > > OK, but how do you attribute that memory to a particular OOM killable > > entity? And how do you actually enforce that those resources get freed > > on the oom killer action? 
> > > >> : The following set of patches tries to address this problem by > >> introducing a per > >> : file OOM badness score, which device drivers can use to give the OOM > >> killer a > >> : hint how many resources are bound to a file descriptor so that it can > >> make > >> : better decisions which process to kill. > > > > But files are not killable, they can be shared... In other words this > > doesn't help the oom killer to make an educated guess at all. > > Maybe some more context would help the discussion? > > The struct file in patch 3 is the DRM fd. That's effectively "my > process's interface to talking to the GPU" not "a single GPU resource". > Once that file is closed, all of the process's private, idle GPU buffers > will be immediately freed (this will be most of their allocations), and > some will be freed once the GPU completes some work (this will be most > of the rest of their allocations). > > Some GEM BOs won't be freed just by closing the fd, if they've been > shared between processes. Those are usually about 8-24MB total in a > process, rather than the GBs that modern apps use (or that our testcases > like to allocate and thus trigger oomkilling of the test harness instead > of the offending testcase...) > > Even if we just had the private+idle buffers being accounted in OOM > badness, that would be a huge step forward in system reliability. OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. -- Michal Hocko SUSE Labs ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
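Michal's counter-proposal can be sketched the same way. Again a userspace model, not kernel code: the counter names mirror the kernel's `mm_rss_stat` enum, but the structs and the badness function are simplified stand-ins, and `charge_kernel_pages()` is a hypothetical helper standing in for `add_mm_counter()`.

```c
/* Userspace model of the MM_KERNELPAGES idea: extend the per-mm rss
 * counters and let the badness score include driver pages whose
 * lifetime is bound to the process. */
enum {
    MM_FILEPAGES,
    MM_ANONPAGES,
    MM_SWAPENTS,
    MM_SHMEMPAGES,
    MM_KERNELPAGES, /* proposed: driver pages bound to this mm's lifetime */
    NR_MM_COUNTERS,
};

struct mm_struct {
    long rss_stat[NR_MM_COUNTERS];
};

/* Simplified badness score: total pages charged to the process. The
 * real oom_badness() also folds in page tables and oom_score_adj. */
long oom_badness(const struct mm_struct *mm)
{
    long points = 0;
    for (int i = 0; i < NR_MM_COUNTERS; i++)
        points += mm->rss_stat[i];
    return points;
}

/* What a driver would do on allocation, by analogy to add_mm_counter():
 * only valid under Michal's rule that the memory is freed no later than
 * the process exits. */
void charge_kernel_pages(struct mm_struct *mm, long pages)
{
    mm->rss_stat[MM_KERNELPAGES] += pages;
}
```

With this, a process whose driver allocations dominate its footprint becomes the preferred OOM victim even though none of that memory shows up in its mapped RSS.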
Re: [RFC] Per file OOM badness
Am 19.01.2018 um 09:20 schrieb Michal Hocko: On Thu 18-01-18 12:01:32, Eric Anholt wrote: Michal Hocko writes: On Thu 18-01-18 18:00:06, Michal Hocko wrote: On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file. Please add the full description to the cover letter and do not make people hunt links. Here is the origin cover letter text : I'm currently working on the issue that when device drivers allocate memory on : behalf of an application the OOM killer usually doesn't knew about that unless : the application also get this memory mapped into their address space. : : This is especially annoying for graphics drivers where a lot of the VRAM : usually isn't CPU accessible and so doesn't make sense to map into the : address space of the process using it. : : The problem now is that when an application starts to use a lot of VRAM those : buffers objects sooner or later get swapped out to system memory, but when we : now run into an out of memory situation the OOM killer obviously doesn't knew : anything about that memory and so usually kills the wrong process. OK, but how do you attribute that memory to a particular OOM killable entity? And how do you actually enforce that those resources get freed on the oom killer action? : The following set of patches tries to address this problem by introducing a per : file OOM badness score, which device drivers can use to give the OOM killer a : hint how many resources are bound to a file descriptor so that it can make : better decisions which process to kill. But files are not killable, they can be shared... 
In other words this doesn't help the oom killer to make an educated guess at all. Maybe some more context would help the discussion? The struct file in patch 3 is the DRM fd. That's effectively "my process's interface to talking to the GPU" not "a single GPU resource". Once that file is closed, all of the process's private, idle GPU buffers will be immediately freed (this will be most of their allocations), and some will be freed once the GPU completes some work (this will be most of the rest of their allocations). Some GEM BOs won't be freed just by closing the fd, if they've been shared between processes. Those are usually about 8-24MB total in a process, rather than the GBs that modern apps use (or that our testcases like to allocate and thus trigger oomkilling of the test harness instead of the offending testcase...) Even if we just had the private+idle buffers being accounted in OOM badness, that would be a huge step forward in system reliability. OK, in that case I would propose a different approach. We already have rss_stat. So why do not we simply add a new counter there MM_KERNELPAGES and consider those in oom_badness? The rule would be that such a memory is bound to the process life time. I guess we will find more users for this later. I already tried that and the problem with that approach is that some buffers are not created by the application which actually uses them. For example X/Wayland is creating and handing out render buffers to applications which want to use OpenGL. So the result is that when you always account the buffer to the application which created it, the OOM killer will certainly reap X/Wayland first. And that is exactly what we want to avoid here. Regards, Christian.
Re: [RFC] Per file OOM badness
Am 18.01.2018 um 21:01 schrieb Eric Anholt: Michal Hocko writes: [SNIP] But files are not killable, they can be shared... In other words this doesn't help the oom killer to make an educated guess at all. Maybe some more context would help the discussion? Thanks for doing this. Wanted to reply yesterday with that information as well, but was unfortunately on sick leave. The struct file in patch 3 is the DRM fd. That's effectively "my process's interface to talking to the GPU" not "a single GPU resource". Once that file is closed, all of the process's private, idle GPU buffers will be immediately freed (this will be most of their allocations), and some will be freed once the GPU completes some work (this will be most of the rest of their allocations). Some GEM BOs won't be freed just by closing the fd, if they've been shared between processes. Those are usually about 8-24MB total in a process, rather than the GBs that modern apps use (or that our testcases like to allocate and thus trigger oomkilling of the test harness instead of the offending testcase...) Even if we just had the private+idle buffers being accounted in OOM badness, that would be a huge step forward in system reliability. Yes, and that's exactly the intention here because currently the OOM killer usually kills X when a graphics-related application allocates too much memory and that is highly undesirable. : So question at every one: What do you think about this approach? I thing is just just wrong semantically. Non-reclaimable memory is a pain, especially when there is way too much of it. If you can free that memory somehow then you can hook into slab shrinker API and react on the memory pressure. If you can account such a memory to a particular process and make sure that the consumption is bound by the process life time then we can think of an accounting that oom_badness can consider when selecting a victim. For graphics, we can't free most of our memory without also effectively killing the process.
i915 and vc4 have "purgeable" interfaces for userspace (on i915 this is exposed all the way to GL applications and is hooked into shrinker, and on vc4 this is so far just used for userspace-internal buffer caches to be purged when a CMA allocation fails). However, those purgeable pools are expected to be a tiny fraction of the GPU allocations by the process. Same thing with TTM and amdgpu/radeon. We already have a shrinker hook as well and make room as much as we can when needed. But I think Michal's concerns are valid as well and I thought about them when I created the initial patch. One possible solution which came to my mind is that (IIRC) we not only store the usual reference count per GEM object, but also how many handles were created for it. So what we could do is to iterate over all GEM handles of a client and account only size/num_handles as badness for the client. The end result would be that X and the client application would both get 1/2 of the GEM object's size accounted for. Regards, Christian.
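Christian's handle-counting idea can be sketched as follows. This is a userspace model with made-up struct and function names, not the real GEM API (though GEM objects do track handles per client); each client is charged size/num_handles, so a buffer shared between the X server and one client counts half against each.

```c
/* Illustrative model of splitting a shared GEM BO's size across the
 * clients that hold handles to it. */
#include <stddef.h>

/* Toy stand-in for a GEM buffer object. */
struct gem_object {
    size_t size;           /* backing storage, in bytes */
    unsigned handle_count; /* handles open on it across all clients */
};

/* Per-client badness: each object is charged as size / handle_count,
 * so shared buffers are split between their users instead of all
 * landing on whoever created them (e.g. the X server). */
size_t client_gem_badness(struct gem_object **objs, size_t n)
{
    size_t badness = 0;
    for (size_t i = 0; i < n; i++)
        if (objs[i]->handle_count)
            badness += objs[i]->size / objs[i]->handle_count;
    return badness;
}
```

With a 16 MiB buffer shared between X and one client plus a 64 MiB private buffer, the client is charged 8 MiB + 64 MiB, matching the "both get 1/2 of the GEM object's size" outcome described above.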
Re: [RFC] Per file OOM badness
Am 19.01.2018 um 06:39 schrieb He, Roger: Basically the idea is right to me. 1. But we need smaller granularity to control the contribution to OOM badness. Because when the TTM buffer resides in VRAM rather than evict to system memory, we should not take this account into badness. But I think it is not easy to implement. I was considering that as well when I wrote the original patch set, but then decided against it at least for now. Basically all VRAM buffers can be swapped to system memory, so they potentially need system memory as well. That is especially important during suspend/resume. 2. If the TTM buffer(GTT here) is mapped to user for CPU access, not quite sure the buffer size is already taken into account for kernel. If yes, at last the size will be counted again by your patches. No that isn't accounted for as far as I know. So, I am thinking if we can counted the TTM buffer size into: struct mm_rss_stat { atomic_long_t count[NR_MM_COUNTERS]; }; Which is done by kernel based on CPU VM (page table). Something like that: When GTT allocate suceess: add_mm_counter(vma->vm_mm, MM_ANONPAGES, buffer_size); When GTT swapped out: dec_mm_counter from MM_ANONPAGES frist, then add_mm_counter(vma->vm_mm, MM_SWAPENTS, buffer_size); // or MM_SHMEMPAGES or add new item. Update the corresponding item in mm_rss_stat always. If that, we can control the status update accurately. What do you think about that? And is there any side-effect for this approach? I already tried this when I originally worked on the issue and that approach didn't work because allocated buffers are not associated with the process where they are created. E.g. most display surfaces are created by the X server, but used by other processes. So if you account the BO to the process who created it we would start to kill X again and that is exactly what we try to avoid. Regards, Christian.
Thanks Roger(Hongbo.He) -Original Message- From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of Andrey Grodzovsky Sent: Friday, January 19, 2018 12:48 AM To: linux-ker...@vger.kernel.org; linux...@kvack.org; dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org Cc: Koenig, Christian Subject: [RFC] Per file OOM badness Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file. Thanks, Andrey
RE: [RFC] Per file OOM badness
Basically the idea is right to me. 1. But we need smaller granularity to control the contribution to OOM badness. Because when the TTM buffer resides in VRAM rather than being evicted to system memory, we should not take this into account for badness. But I think it is not easy to implement. 2. If the TTM buffer(GTT here) is mapped to user for CPU access, not quite sure the buffer size is already taken into account for kernel. If yes, at last the size will be counted again by your patches. So, I am thinking if we can count the TTM buffer size into: struct mm_rss_stat { atomic_long_t count[NR_MM_COUNTERS]; }; Which is done by kernel based on CPU VM (page table). Something like that: When GTT allocation succeeds: add_mm_counter(vma->vm_mm, MM_ANONPAGES, buffer_size); When GTT is swapped out: dec_mm_counter from MM_ANONPAGES first, then add_mm_counter(vma->vm_mm, MM_SWAPENTS, buffer_size); // or MM_SHMEMPAGES or add new item. Update the corresponding item in mm_rss_stat always. That way, we can control the status update accurately. What do you think about that? And is there any side-effect for this approach? Thanks Roger(Hongbo.He) -Original Message- From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of Andrey Grodzovsky Sent: Friday, January 19, 2018 12:48 AM To: linux-ker...@vger.kernel.org; linux...@kvack.org; dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org Cc: Koenig, Christian Subject: [RFC] Per file OOM badness Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file.
Thanks, Andrey
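Roger's suggestion above can be modeled like this. A userspace sketch, not kernel code: the functions are hypothetical stand-ins mirroring what `add_mm_counter()`/`dec_mm_counter()` calls would do at GTT allocation and TTM swap-out time.

```c
/* Illustrative model: charge GTT buffer pages to the owning mm's rss
 * counters on allocation, and move them from MM_ANONPAGES to
 * MM_SWAPENTS when TTM swaps the buffer out. */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, MM_SHMEMPAGES, NR_MM_COUNTERS };

struct mm_struct {
    long rss_stat[NR_MM_COUNTERS];
};

/* On successful GTT allocation: the buffer is resident, so it counts
 * like anonymous memory against the owner. */
void gtt_account_alloc(struct mm_struct *mm, long pages)
{
    mm->rss_stat[MM_ANONPAGES] += pages;
}

/* On TTM swap-out: the pages are no longer resident but still owed,
 * so move the charge to the swap-entry counter. */
void gtt_account_swapout(struct mm_struct *mm, long pages)
{
    mm->rss_stat[MM_ANONPAGES] -= pages;
    mm->rss_stat[MM_SWAPENTS] += pages;
}
```

Note Christian's reply in the thread: `vma->vm_mm` belongs to whichever process created the buffer, so surfaces allocated by the X server on behalf of clients would still be charged to X, which is exactly the mis-attribution the series tries to avoid.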
RE: [RFC] Per file OOM badness
-Original Message- From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Michal Hocko Sent: Friday, January 19, 2018 1:14 AM To: Grodzovsky, Andrey Cc: linux...@kvack.org; amd-...@lists.freedesktop.org; linux-ker...@vger.kernel.org; dri-devel@lists.freedesktop.org; Koenig, Christian Subject: Re: [RFC] Per file OOM badness On Thu 18-01-18 18:00:06, Michal Hocko wrote: > On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: > > Hi, this series is a revised version of an RFC sent by Christian > > König a few years ago. The original RFC can be found at > > https://lists.freedesktop.org/archives/dri-devel/2015-September/0897 > > 78.html > > > > This is the same idea and I've just adressed his concern from the > > original RFC and switched to a callback into file_ops instead of a new > > member in struct file. > > Please add the full description to the cover letter and do not make > people hunt links. > > Here is the origin cover letter text > : I'm currently working on the issue that when device drivers allocate > memory on > : behalf of an application the OOM killer usually doesn't knew about > that unless > : the application also get this memory mapped into their address space. > : > : This is especially annoying for graphics drivers where a lot of the > VRAM > : usually isn't CPU accessible and so doesn't make sense to map into > the > : address space of the process using it. > : > : The problem now is that when an application starts to use a lot of > VRAM those > : buffers objects sooner or later get swapped out to system memory, > but when we > : now run into an out of memory situation the OOM killer obviously > doesn't knew > : anything about that memory and so usually kills the wrong process. OK, but how do you attribute that memory to a particular OOM killable entity? And how do you actually enforce that those resources get freed on the oom killer action? 
Here I think we need finer granularity to distinguish whether the buffer is taking VRAM or system memory. > : The following set of patches tries to address this problem by > introducing a per > : file OOM badness score, which device drivers can use to give the OOM > killer a > : hint how many resources are bound to a file descriptor so that it > can make > : better decisions which process to kill. But files are not killable, they can be shared... In other words this doesn't help the oom killer to make an educated guess at all. > : > : So question at every one: What do you think about this approach? I thing is just just wrong semantically. Non-reclaimable memory is a pain, especially when there is way too much of it. If you can free that memory somehow then you can hook into slab shrinker API and react on the memory pressure. If you can account such a memory to a particular process and make sure that the consumption is bound by the process life time then we can think of an accounting that oom_badness can consider when selecting a victim. I think you are misunderstanding here. Actually for now, the memory in TTM Pools already has mm_shrink which is implemented in ttm_pool_mm_shrink_init. And here the memory we want to make it contribute to OOM badness is not in TTM Pools. Because when TTM buffer allocation succeeds, the memory already is removed from TTM Pools. Thanks Roger(Hongbo.He) -- Michal Hocko SUSE Labs
Re: [RFC] Per file OOM badness
Michal Hocko writes: > On Thu 18-01-18 18:00:06, Michal Hocko wrote: >> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote: >> > Hi, this series is a revised version of an RFC sent by Christian König >> > a few years ago. The original RFC can be found at >> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html >> > >> > This is the same idea and I've just adressed his concern from the original >> > RFC >> > and switched to a callback into file_ops instead of a new member in struct >> > file. >> >> Please add the full description to the cover letter and do not make >> people hunt links. >> >> Here is the origin cover letter text >> : I'm currently working on the issue that when device drivers allocate >> memory on >> : behalf of an application the OOM killer usually doesn't knew about that >> unless >> : the application also get this memory mapped into their address space. >> : >> : This is especially annoying for graphics drivers where a lot of the VRAM >> : usually isn't CPU accessible and so doesn't make sense to map into the >> : address space of the process using it. >> : >> : The problem now is that when an application starts to use a lot of VRAM >> those >> : buffers objects sooner or later get swapped out to system memory, but when >> we >> : now run into an out of memory situation the OOM killer obviously doesn't >> knew >> : anything about that memory and so usually kills the wrong process. > > OK, but how do you attribute that memory to a particular OOM killable > entity? And how do you actually enforce that those resources get freed > on the oom killer action? > >> : The following set of patches tries to address this problem by introducing >> a per >> : file OOM badness score, which device drivers can use to give the OOM >> killer a >> : hint how many resources are bound to a file descriptor so that it can make >> : better decisions which process to kill. > > But files are not killable, they can be shared... 
> In other words this doesn't help the oom killer to make an educated
> guess at all.

Maybe some more context would help the discussion?

The struct file in patch 3 is the DRM fd. That's effectively "my process's
interface to talking to the GPU", not "a single GPU resource". Once that
file is closed, all of the process's private, idle GPU buffers will be
freed immediately (this will be most of their allocations), and some will
be freed once the GPU completes some work (this will be most of the rest
of their allocations).

Some GEM BOs won't be freed just by closing the fd, if they've been shared
between processes. Those are usually about 8-24MB total in a process,
rather than the GBs that modern apps use (or that our testcases like to
allocate, thus triggering oom-killing of the test harness instead of the
offending testcase...).

Even if only the private+idle buffers were accounted in OOM badness, that
would be a huge step forward in system reliability.

>> : So question at every one: What do you think about this approach?
>
> I think this is just wrong semantically. Non-reclaimable memory is a
> pain, especially when there is way too much of it. If you can free that
> memory somehow then you can hook into the slab shrinker API and react to
> memory pressure. If you can account such memory to a particular process
> and make sure that the consumption is bound by the process's lifetime,
> then we can think of an accounting that oom_badness can consider when
> selecting a victim.

For graphics, we can't free most of our memory without also effectively
killing the process. i915 and vc4 have "purgeable" interfaces for
userspace (on i915 this is exposed all the way up to GL applications and
is hooked into the shrinker; on vc4 it is so far only used for
userspace-internal buffer caches, purged when a CMA allocation fails).
However, those purgeable pools are expected to be a tiny fraction of the
GPU allocations by the process.
[RFC] Per file OOM badness
Hi, this series is a revised version of an RFC sent by Christian König a
few years ago. The original RFC can be found at
https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html

This is the same idea; I've just addressed his concern from the original
RFC and switched to a callback in file_ops instead of a new member in
struct file.

Thanks,
Andrey
[RFC] Per file OOM badness
Hello everyone,

I'm currently working on the issue that when device drivers allocate
memory on behalf of an application, the OOM killer usually doesn't know
about it unless the application also gets this memory mapped into its
address space.

This is especially annoying for graphics drivers, where a lot of the VRAM
usually isn't CPU accessible and so doesn't make sense to map into the
address space of the process using it.

The problem is that when an application starts to use a lot of VRAM, those
buffer objects sooner or later get swapped out to system memory, but when
we then run into an out-of-memory situation the OOM killer obviously
doesn't know anything about that memory and so usually kills the wrong
process.

The following set of patches tries to address this problem by introducing
a per-file OOM badness score, which device drivers can use to give the OOM
killer a hint how many resources are bound to a file descriptor, so that
it can make better decisions about which process to kill.

So, a question to everyone: what do you think about this approach? My
biggest concern right now is that the patches mess with a core kernel
structure (adding a field to struct file). Any better idea? I'm
considering putting a callback into file_ops instead.

Best regards, and feel free to tear this idea apart,
Christian.