Re: next-20200515: Xorg killed due to "OOM"
On Sun 31-05-20 14:16:01, Pavel Machek wrote:
> On Thu 2020-05-28 14:07:50, Michal Hocko wrote:
> > On Thu 28-05-20 14:03:54, Pavel Machek wrote:
> > > On Thu 2020-05-28 11:05:17, Michal Hocko wrote:
> > > > On Tue 26-05-20 11:10:54, Pavel Machek wrote:
> > > > [...]
> > > > > [38617.276517] oom_reaper: reaped process 31769 (chromium), now
> > > > > anon-rss:0kB, file-rss:0kB, shmem-rss:7968kB
> > > > > [38617.277232] Xorg invoked oom-killer: gfp_mask=0x0(), order=0,
> > > > > oom_score_adj=0
> > > > > [38617.277247] CPU: 0 PID: 2978 Comm: Xorg Not tainted
> > > > > 5.7.0-rc5-next-20200515+ #117
> > > > > [38617.277256] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW
> > > > > (2.19 ) 03/31/2011
> > > > > [38617.277266] Call Trace:
> > > > > [38617.277286] dump_stack+0x54/0x6e
> > > > > [38617.277300] dump_header+0x45/0x321
> > > > > [38617.277313] oom_kill_process.cold+0x9/0xe
> > > > > [38617.277324] ? out_of_memory+0x167/0x420
> > > > > [38617.277336] out_of_memory+0x1f2/0x420
> > > > > [38617.277348] pagefault_out_of_memory+0x34/0x56
> > > > > [38617.277361] mm_fault_error+0x4a/0x130
> > > > > [38617.277372] do_page_fault+0x3ce/0x416
> > > >
> > > > The reason the OOM killer has been invoked is that the page fault
> > > > handler has returned VM_FAULT_OOM. So this is not a result of the page
> > > > allocator struggling to allocate a memory. It would be interesting to
> > > > check which code path has returned this.
> > >
> > > Should the core WARN_ON if that happens and there's enough memory, or
> > > something like that?
> >
> > I wish it would simply go away. There shouldn't be really any reason for
> > VM_FAULT_OOM to exist. The real low on memory situation is already
> > handled in the page allocator.
>
> Umm. Maybe the WARN_ON is first step in that direction? So we can see
> what driver actually did that, and complain to its authors?

This is much harder done than it seems. But maybe this doesn't really
need a full coverage. Some of the code paths which return VM_FAULT_OOM
will simply not fail. But checking for vma->vm_ops->fault() failures
might be interesting. Does the following tell you more about the failure
you can see?

diff --git a/mm/memory.c b/mm/memory.c
index 9ab00dcb95d4..5ff023ab7b49 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3442,8 +3442,11 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 
 	ret = vma->vm_ops->fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
-			    VM_FAULT_DONE_COW)))
+			    VM_FAULT_DONE_COW))) {
+		if (unlikely(ret & VM_FAULT_OOM))
+			pr_warn("VM_FAULT_OOM returned from %ps\n", vma->vm_ops->fault);
 		return ret;
+	}
 
 	if (unlikely(PageHWPoison(vmf->page))) {
 		if (ret & VM_FAULT_LOCKED)
-- 
Michal Hocko
SUSE Labs
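The logic of the check in the patch above can be tried outside the kernel. What follows is only a userspace sketch of the same shape: call a handler through a function pointer, bail out early on any error-like bit, and name the handler when the OOM bit is set. The flag values, struct fault_ops, do_fault_model() and the two sample handlers are all hypothetical names made up for illustration; the name field stands in for the symbol lookup that %ps does in the kernel.

```c
#include <stdio.h>

/* Hypothetical flag values for this userspace model only; the kernel
 * has its own definitions for the VM_FAULT_* bits. */
enum {
	FAULT_OOM    = 0x01,
	FAULT_SIGBUS = 0x02,
	FAULT_NOPAGE = 0x04,
	FAULT_RETRY  = 0x08,
};
#define FAULT_ERROR (FAULT_OOM | FAULT_SIGBUS)

typedef unsigned int fault_t;

/* Hypothetical stand-in for vm_ops: a handler plus a printable name. */
struct fault_ops {
	const char *name;
	fault_t (*fault)(void);
};

/* Sample handlers: one succeeds, one claims to be out of memory. */
static fault_t good_fault(void) { return 0; }
static fault_t oom_fault(void)  { return FAULT_OOM; }

/* Same shape as the patched check: return early on any error-like bit
 * and report which handler produced the OOM bit. *blamed is set to the
 * handler name when OOM was reported, otherwise to NULL. */
static fault_t do_fault_model(const struct fault_ops *ops, const char **blamed)
{
	fault_t ret = ops->fault();

	*blamed = NULL;
	if (ret & (FAULT_ERROR | FAULT_NOPAGE | FAULT_RETRY)) {
		if (ret & FAULT_OOM) {
			fprintf(stderr, "FAULT_OOM returned from %s\n", ops->name);
			*blamed = ops->name;
		}
		return ret;
	}
	return ret;
}
```

Calling do_fault_model() with a struct fault_ops wrapping oom_fault sets *blamed to that entry's name, while one wrapping good_fault leaves it NULL. In the real patch the equivalent information would show up as a pr_warn line in the kernel log.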
Re: next-20200515: Xorg killed due to "OOM"
On Thu 2020-05-28 14:07:50, Michal Hocko wrote:
> On Thu 28-05-20 14:03:54, Pavel Machek wrote:
> > On Thu 2020-05-28 11:05:17, Michal Hocko wrote:
> > > On Tue 26-05-20 11:10:54, Pavel Machek wrote:
> > > [...]
> > > > [38617.276517] oom_reaper: reaped process 31769 (chromium), now
> > > > anon-rss:0kB, file-rss:0kB, shmem-rss:7968kB
> > > > [38617.277232] Xorg invoked oom-killer: gfp_mask=0x0(), order=0,
> > > > oom_score_adj=0
> > > > [38617.277247] CPU: 0 PID: 2978 Comm: Xorg Not tainted
> > > > 5.7.0-rc5-next-20200515+ #117
> > > > [38617.277256] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW
> > > > (2.19 ) 03/31/2011
> > > > [38617.277266] Call Trace:
> > > > [38617.277286] dump_stack+0x54/0x6e
> > > > [38617.277300] dump_header+0x45/0x321
> > > > [38617.277313] oom_kill_process.cold+0x9/0xe
> > > > [38617.277324] ? out_of_memory+0x167/0x420
> > > > [38617.277336] out_of_memory+0x1f2/0x420
> > > > [38617.277348] pagefault_out_of_memory+0x34/0x56
> > > > [38617.277361] mm_fault_error+0x4a/0x130
> > > > [38617.277372] do_page_fault+0x3ce/0x416
> > >
> > > The reason the OOM killer has been invoked is that the page fault
> > > handler has returned VM_FAULT_OOM. So this is not a result of the page
> > > allocator struggling to allocate a memory. It would be interesting to
> > > check which code path has returned this.
> >
> > Should the core WARN_ON if that happens and there's enough memory, or
> > something like that?
>
> I wish it would simply go away. There shouldn't be really any reason for
> VM_FAULT_OOM to exist. The real low on memory situation is already
> handled in the page allocator.

Umm. Maybe the WARN_ON is first step in that direction? So we can see
what driver actually did that, and complain to its authors?

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: next-20200515: Xorg killed due to "OOM"
On Thu 28-05-20 14:03:54, Pavel Machek wrote:
> On Thu 2020-05-28 11:05:17, Michal Hocko wrote:
> > On Tue 26-05-20 11:10:54, Pavel Machek wrote:
> > [...]
> > > [38617.276517] oom_reaper: reaped process 31769 (chromium), now
> > > anon-rss:0kB, file-rss:0kB, shmem-rss:7968kB
> > > [38617.277232] Xorg invoked oom-killer: gfp_mask=0x0(), order=0,
> > > oom_score_adj=0
> > > [38617.277247] CPU: 0 PID: 2978 Comm: Xorg Not tainted
> > > 5.7.0-rc5-next-20200515+ #117
> > > [38617.277256] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19
> > > ) 03/31/2011
> > > [38617.277266] Call Trace:
> > > [38617.277286] dump_stack+0x54/0x6e
> > > [38617.277300] dump_header+0x45/0x321
> > > [38617.277313] oom_kill_process.cold+0x9/0xe
> > > [38617.277324] ? out_of_memory+0x167/0x420
> > > [38617.277336] out_of_memory+0x1f2/0x420
> > > [38617.277348] pagefault_out_of_memory+0x34/0x56
> > > [38617.277361] mm_fault_error+0x4a/0x130
> > > [38617.277372] do_page_fault+0x3ce/0x416
> >
> > The reason the OOM killer has been invoked is that the page fault
> > handler has returned VM_FAULT_OOM. So this is not a result of the page
> > allocator struggling to allocate a memory. It would be interesting to
> > check which code path has returned this.
>
> Should the core WARN_ON if that happens and there's enough memory, or
> something like that?

I wish it would simply go away. There shouldn't be really any reason for
VM_FAULT_OOM to exist. The real low on memory situation is already
handled in the page allocator.
-- 
Michal Hocko
SUSE Labs
Re: next-20200515: Xorg killed due to "OOM"
On Thu 2020-05-28 11:05:17, Michal Hocko wrote:
> On Tue 26-05-20 11:10:54, Pavel Machek wrote:
> [...]
> > [38617.276517] oom_reaper: reaped process 31769 (chromium), now
> > anon-rss:0kB, file-rss:0kB, shmem-rss:7968kB
> > [38617.277232] Xorg invoked oom-killer: gfp_mask=0x0(), order=0,
> > oom_score_adj=0
> > [38617.277247] CPU: 0 PID: 2978 Comm: Xorg Not tainted
> > 5.7.0-rc5-next-20200515+ #117
> > [38617.277256] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 )
> > 03/31/2011
> > [38617.277266] Call Trace:
> > [38617.277286] dump_stack+0x54/0x6e
> > [38617.277300] dump_header+0x45/0x321
> > [38617.277313] oom_kill_process.cold+0x9/0xe
> > [38617.277324] ? out_of_memory+0x167/0x420
> > [38617.277336] out_of_memory+0x1f2/0x420
> > [38617.277348] pagefault_out_of_memory+0x34/0x56
> > [38617.277361] mm_fault_error+0x4a/0x130
> > [38617.277372] do_page_fault+0x3ce/0x416
>
> The reason the OOM killer has been invoked is that the page fault
> handler has returned VM_FAULT_OOM. So this is not a result of the page
> allocator struggling to allocate a memory. It would be interesting to
> check which code path has returned this.

Should the core WARN_ON if that happens and there's enough memory, or
something like that?

I grepped, and there are not too many users of VM_FAULT_OOM. These
might be relevant:

drivers/gpu/drm/ttm/ttm_bo_vm.c: * VM_FAULT_OOM on out-of-memory
drivers/gpu/drm/ttm/ttm_bo_vm.c:ret = VM_FAULT_OOM;
drivers/gpu/drm/ttm/ttm_bo_vm.c:ret = VM_FAULT_OOM;
drivers/gpu/drm/i915/gem/i915_gem_mman.c: return VM_FAULT_OOM;
drivers/gpu/drm/vkms/vkms_gem.c:ret = VM_FAULT_OOM;
drivers/gpu/drm/vgem/vgem_drv.c:ret = VM_FAULT_OOM;

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: next-20200515: Xorg killed due to "OOM"
On Tue 26-05-20 11:10:54, Pavel Machek wrote:
[...]
> [38617.276517] oom_reaper: reaped process 31769 (chromium), now anon-rss:0kB,
> file-rss:0kB, shmem-rss:7968kB
> [38617.277232] Xorg invoked oom-killer: gfp_mask=0x0(), order=0,
> oom_score_adj=0
> [38617.277247] CPU: 0 PID: 2978 Comm: Xorg Not tainted
> 5.7.0-rc5-next-20200515+ #117
> [38617.277256] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 )
> 03/31/2011
> [38617.277266] Call Trace:
> [38617.277286] dump_stack+0x54/0x6e
> [38617.277300] dump_header+0x45/0x321
> [38617.277313] oom_kill_process.cold+0x9/0xe
> [38617.277324] ? out_of_memory+0x167/0x420
> [38617.277336] out_of_memory+0x1f2/0x420
> [38617.277348] pagefault_out_of_memory+0x34/0x56
> [38617.277361] mm_fault_error+0x4a/0x130
> [38617.277372] do_page_fault+0x3ce/0x416

The reason the OOM killer has been invoked is that the page fault
handler has returned VM_FAULT_OOM. So this is not a result of the page
allocator struggling to allocate a memory. It would be interesting to
check which code path has returned this.
-- 
Michal Hocko
SUSE Labs
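The report quoted above shows gfp_mask=0x0() and order=0, with pagefault_out_of_memory in the trace. That fits the OOM killer being entered from the fault path and not from an allocation carrying real GFP flags. As a rough aid for scanning logs for similar reports, here is a small userspace sketch; the function name oom_line_from_fault_path() is made up for illustration, and treating "gfp_mask and order both zero" as a sign of the fault path is a heuristic assumed here, not something the kernel guarantees.

```c
#include <stdio.h>
#include <string.h>

/* Classify one kernel log line.
 * Returns  1 for an "invoked oom-killer" report with gfp_mask 0 and order 0,
 *          0 for an "invoked oom-killer" report with any other values,
 *         -1 if the line is not such a report or cannot be parsed. */
static int oom_line_from_fault_path(const char *line)
{
	static const char marker[] = "invoked oom-killer: gfp_mask=";
	const char *p = strstr(line, marker);
	unsigned int gfp = 0;
	int order = 0;

	if (!p)
		return -1;
	p += strlen(marker);

	/* %x accepts the leading "0x" and stops at the '(' that follows. */
	if (sscanf(p, "%x", &gfp) != 1)
		return -1;

	p = strstr(p, "order=");
	if (!p || sscanf(p, "order=%d", &order) != 1)
		return -1;

	return gfp == 0 && order == 0;
}
```

Fed the "Xorg invoked oom-killer" line from the report (joined onto one line), the function returns 1; a report with a non-zero gfp_mask returns 0, and unrelated lines such as "Call Trace:" return -1.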