On 2016/12/16 23:15, Kirill A. Shutemov wrote:
> Logic on whether we can reap pages from the VMA should match what we
> have in madvise_dontneed(). In particular, we should skip VM_PFNMAP
> VMAs, but we don't now.
> 
> Let's just call madvise_dontneed() from __oom_reap_task_mm(), so we
> won't need to sync the logic in the future.
> 
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> ---
>  mm/internal.h |  7 +++----
>  mm/madvise.c  |  2 +-
>  mm/memory.c   |  2 +-
>  mm/oom_kill.c | 15 ++-------------
>  4 files changed, 7 insertions(+), 19 deletions(-)

madvise_dontneed() calls zap_page_range().
zap_page_range() calls mmu_notifier_invalidate_range_start().
mmu_notifier_invalidate_range_start() calls __mmu_notifier_invalidate_range_start().
__mmu_notifier_invalidate_range_start() takes srcu_read_lock()/srcu_read_unlock()
and invokes each registered notifier's ->invalidate_range_start() callback;
since those callbacks run under SRCU, they are allowed to sleep.
This means that madvise_dontneed() might sleep.

I don't know what each individual notifier will do, but for example

  static const struct mmu_notifier_ops i915_gem_userptr_notifier = {
          .invalidate_range_start = i915_gem_userptr_mn_invalidate_range_start,
  };

i915_gem_userptr_mn_invalidate_range_start() calls flush_workqueue(), which
waits for pending work items to complete. If such a work item is itself
blocked on a memory allocation that is waiting for the OOM reaper to make
progress, we can OOM livelock. Some of the other notifiers call
mutex_lock()/mutex_unlock().

Even if none of the current in-tree notifier users block on memory
allocation, there is no guarantee that future changes/users won't.
