Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Wed 01-04-15 08:46:51, Dave Chinner wrote:
[...]
> GFP_NOFS has also been required in the mapping mask in the past
> because reclaim from page cache allocation points had a nasty habit
> of blowing the stack.

Yeah, I remember some scary traces, but we are talking about the page fault path and we definitely have to handle GFP_IOFS allocations there. We cannot use GFP_NOFS as a workaround e.g. for anonymous pages.

[...]
> > From 292cfcbbe18b2afc8d2bc0cf568ca4c5842d4c8f Mon Sep 17 00:00:00 2001
> > From: Michal Hocko
> > Date: Fri, 27 Mar 2015 13:33:51 +0100
> > Subject: [PATCH] mm: Allow GFP_IOFS for page_cache_read page cache
> >  allocation
> >
> > page_cache_read has been historically using page_cache_alloc_cold to
> > allocate a new page. This means that mapping_gfp_mask is used as the
> > base for the gfp_mask. Many filesystems are setting this mask to
> > GFP_NOFS to prevent fs recursion issues. page_cache_read is,
> > however, not called from the fs layer directly so it doesn't need this
> > protection normally.
>
> It can be called from a page fault while copying into or out of a
> user buffer from a read()/write() system call. Hence the page fault
> can be nested inside filesystem locks.

As pointed out above, the user buffer might be anonymous memory as well, so we have to be able to handle GFP_IOFS allocations from the page fault without reclaim deadlocks. Besides that, we are allocating page tables, which are GFP_KERNEL, and probably some more. So either we are broken by definition or GFP_IOFS is safe from under the i_mutex lock. My code inspection suggests the latter, but the code is really hard to follow and the dependencies might not be direct. I remember that nfs_release_page would be prone to an i_mutex deadlock when server and client are on the same machine. But this shouldn't be a problem anymore because the amount of time the client waits for the server is limited (9590544694bec).
I might be missing other places of course, but to me it sounds like GFP_IOFS must be safe under _some_ FS locks, and i_mutex is one of them.

> Indeed, the canonical reason
> for why we can't take the i_mutex in the page fault path is exactly
> this. i.e. the user buffer might be a mmap()d region of the same
> file and so we have mmap_sem/i_mutex inversion issues.
>
> This is the same case - we can be taking page faults with filesystem
> locks held, and that means we've got problems if the page fault then
> recurses back into the filesystem and trips over those locks...

Yeah, I am familiar with the generic meaning of the GFP_NOFS flag. I just think it is too big of a hammer here (all FS locks is just too broad). The page fault is not GFP_NOFS safe now and it never has been (anonymous pages are not GFP_NOFS, page tables etc...). And I am afraid we cannot simply change it to use GFP_NOFS all over. Are there any other fs locks (except for i_mutex) which might be held while doing {get,put}_user or get_user_pages? I haven't found many instances in fs/ but a lot is done via indirection.

That being said, I think the patch should be safe and an improvement over the current state. Unless I am missing something obvious or there are other objections, I will repost it along with the other cleanup patch later this week.
--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Mon, Mar 30, 2015 at 10:22:18AM +0200, Michal Hocko wrote:
> On Fri 27-03-15 08:43:54, Dave Chinner wrote:
> > On Thu, Mar 26, 2015 at 10:53:02AM +0100, Michal Hocko wrote:
> > > On Fri 20-03-15 14:48:20, Dave Chinner wrote:
> > > > On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> > > [...]
> > > > > Or did I miss your point? Are you concerned about some fs overloading
> > > > > filemap_fault and doing some locking before delegating to filemap_fault?
> > > >
> > > > The latter:
> > > >
> > > > https://git.kernel.org/cgit/linux/kernel/git/dgc/linux-xfs.git/commit/?h=xfs-mmap-lock&id=de0e8c20ba3a65b0f15040aabbefdc1999876e6b
> > >
> > > Hmm. I am completely unfamiliar with the xfs code but my reading of
> > > 964aa8d9e4d3..723cac484733 is that the newly introduced lock should be
> > > OK from the reclaim recursion POV. It protects against truncate and
> > > punch hole, right? Or are there any internal paths which I am missing
> > > and would cause problems if we do GFP_FS with XFS_MMAPLOCK_SHARED held?
> >
> > It might be OK, but you're only looking at the example I gave you,
> > not the fundamental issue it demonstrates. That is: filesystems may
> > have *internal dependencies that are unknown to the page cache or mm
> > subsystem*. Hence the page cache or mm allocations cannot
> > arbitrarily ignore allocation constraints the filesystem assigns to
> > mapping operations.
>
> I fully understand that. I am just trying to understand what the real
> requirements from filesystems wrt filemap_fault are. The mapping gfp mask
> is not of much use for that because e.g. xfs has GFP_NOFS set for the whole
> inode life AFAICS, and it seems that this context is not really required
> even after the recent code changes.
> We can add gfp_mask into struct vm_fault and initialize it to
> mapping_gfp_mask | GFP_IOFS, and the .fault() callback might overwrite it.
> This would be cleaner than unconditional gfp manipulation (the patch below).
> But we are in a really hard position if the GFP_NOFS context is really
> required here. We shouldn't really trigger the OOM killer because that
> could be premature and way too disruptive. We can retry the page fault or
> the allocation, but both of those sound suboptimal to me.

GFP_NOFS has also been required in the mapping mask in the past because reclaim from page cache allocation points had a nasty habit of blowing the stack. In XFS, we replaced AOP_FLAG_NOFS calls with the GFP_NOFS mapping mask because AOP_FLAG_NOFS didn't give us the coverage needed on mapping allocation calls to prevent the problems being reported. Yes, we now have a larger stack, but as I've said in the past, GFP_NOFS is used for several different reasons by subsystems - recursion can be bad in more ways than one, and GFP_NOFS is the only hammer we have...

> This hasn't been tested yet; it just shows the idea mentioned above.
> ---
> From 292cfcbbe18b2afc8d2bc0cf568ca4c5842d4c8f Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Fri, 27 Mar 2015 13:33:51 +0100
> Subject: [PATCH] mm: Allow GFP_IOFS for page_cache_read page cache allocation
>
> page_cache_read has been historically using page_cache_alloc_cold to
> allocate a new page. This means that mapping_gfp_mask is used as the
> base for the gfp_mask. Many filesystems are setting this mask to
> GFP_NOFS to prevent fs recursion issues. page_cache_read is,
> however, not called from the fs layer directly so it doesn't need this
> protection normally.

It can be called from a page fault while copying into or out of a user buffer from a read()/write() system call. Hence the page fault can be nested inside filesystem locks. Indeed, the canonical reason for why we can't take the i_mutex in the page fault path is exactly this, i.e. the user buffer might be a mmap()d region of the same file and so we have mmap_sem/i_mutex inversion issues.
This is the same case - we can be taking page faults with filesystem locks held, and that means we've got problems if the page fault then recurses back into the filesystem and trips over those locks...

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Fri 27-03-15 08:43:54, Dave Chinner wrote:
> On Thu, Mar 26, 2015 at 10:53:02AM +0100, Michal Hocko wrote:
> > On Fri 20-03-15 14:48:20, Dave Chinner wrote:
> > > On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> > [...]
> > > > Or did I miss your point? Are you concerned about some fs overloading
> > > > filemap_fault and doing some locking before delegating to filemap_fault?
> > >
> > > The latter:
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/dgc/linux-xfs.git/commit/?h=xfs-mmap-lock&id=de0e8c20ba3a65b0f15040aabbefdc1999876e6b
> >
> > Hmm. I am completely unfamiliar with the xfs code but my reading of
> > 964aa8d9e4d3..723cac484733 is that the newly introduced lock should be
> > OK from the reclaim recursion POV. It protects against truncate and
> > punch hole, right? Or are there any internal paths which I am missing
> > and would cause problems if we do GFP_FS with XFS_MMAPLOCK_SHARED held?
>
> It might be OK, but you're only looking at the example I gave you,
> not the fundamental issue it demonstrates. That is: filesystems may
> have *internal dependencies that are unknown to the page cache or mm
> subsystem*. Hence the page cache or mm allocations cannot
> arbitrarily ignore allocation constraints the filesystem assigns to
> mapping operations.

I fully understand that. I am just trying to understand what the real requirements from filesystems wrt filemap_fault are. The mapping gfp mask is not of much use for that because e.g. xfs has GFP_NOFS set for the whole inode life AFAICS, and it seems that this context is not really required even after the recent code changes.

We can add gfp_mask into struct vm_fault and initialize it to mapping_gfp_mask | GFP_IOFS, and the .fault() callback might overwrite it. This would be cleaner than unconditional gfp manipulation (the patch below).

But we are in a really hard position if the GFP_NOFS context is really required here.
We shouldn't really trigger the OOM killer because that could be premature and way too disruptive. We can retry the page fault or the allocation, but both of those sound suboptimal to me. Do you have any other suggestions?

This hasn't been tested yet; it just shows the idea mentioned above.
---
From 292cfcbbe18b2afc8d2bc0cf568ca4c5842d4c8f Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Fri, 27 Mar 2015 13:33:51 +0100
Subject: [PATCH] mm: Allow GFP_IOFS for page_cache_read page cache allocation

page_cache_read has been historically using page_cache_alloc_cold to allocate a new page. This means that mapping_gfp_mask is used as the base for the gfp_mask. Many filesystems are setting this mask to GFP_NOFS to prevent fs recursion issues. page_cache_read is, however, not called from the fs layer directly so it doesn't need this protection normally.

ceph and ocfs2, which call filemap_fault from their fault handlers, seem to be OK because they are not taking any fs lock before invoking the generic implementation. xfs, which takes XFS_MMAPLOCK_SHARED, is safe from the reclaim recursion POV because this lock serializes truncate and punch hole with the page faults and it doesn't get involved in the reclaim.

The GFP_NOFS protection might even be harmful. There is a push to fail GFP_NOFS allocations rather than loop within the allocator indefinitely with a very limited reclaim ability. Once we start failing those requests the OOM killer might be triggered prematurely because the page cache allocation failure is propagated up the page fault path and ends up in pagefault_out_of_memory.

We cannot play with mapping_gfp_mask directly because that would be racy wrt. parallel page faults and it might interfere with other users who really rely on NOFS semantics from the stored gfp_mask. The mask is also a property of the inode, so changing it from here would even be a layering violation.
What we can do instead is to push the gfp_mask into struct vm_fault and allow the fs layer to overwrite it should the callback need to be called with a different allocation context. Initialize the default to (mapping_gfp_mask | GFP_IOFS) because this should normally be safe from the page fault path. Why do we care about mapping_gfp_mask at all then? Because it doesn't hold only reclaim protection flags but might also contain zone and movability restrictions (GFP_DMA32, __GFP_MOVABLE and others), so we have to respect those.

Reported-by: Tetsuo Handa
Signed-off-by: Michal Hocko
---
 include/linux/mm.h | 4
 mm/filemap.c       | 9 -
 mm/memory.c        | 3 +++
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b720b5146a4e..c919e48664d5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -217,10 +217,14 @@ extern pgprot_t protection_map[16];
  * ->fault function. The vma's ->fault is responsible for returning a bitmask
  * of VM_FAULT_xxx flags that give details about how the fault was handled.
  *
+ * MM layer fills up gfp_mask for page allocations but fault handler might
+ * alter it if its implementation req
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Thu, Mar 26, 2015 at 10:53:02AM +0100, Michal Hocko wrote:
> On Fri 20-03-15 14:48:20, Dave Chinner wrote:
> > On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> [...]
> > > Or did I miss your point? Are you concerned about some fs overloading
> > > filemap_fault and doing some locking before delegating to filemap_fault?
> >
> > The latter:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/dgc/linux-xfs.git/commit/?h=xfs-mmap-lock&id=de0e8c20ba3a65b0f15040aabbefdc1999876e6b
>
> Hmm. I am completely unfamiliar with the xfs code but my reading of
> 964aa8d9e4d3..723cac484733 is that the newly introduced lock should be
> OK from the reclaim recursion POV. It protects against truncate and
> punch hole, right? Or are there any internal paths which I am missing
> and would cause problems if we do GFP_FS with XFS_MMAPLOCK_SHARED held?

It might be OK, but you're only looking at the example I gave you, not the fundamental issue it demonstrates. That is: filesystems may have *internal dependencies that are unknown to the page cache or mm subsystem*. Hence the page cache or mm allocations cannot arbitrarily ignore allocation constraints the filesystem assigns to mapping operations.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Fri 20-03-15 14:48:20, Dave Chinner wrote:
> On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
[...]
> > Or did I miss your point? Are you concerned about some fs overloading
> > filemap_fault and doing some locking before delegating to filemap_fault?
>
> The latter:
>
> https://git.kernel.org/cgit/linux/kernel/git/dgc/linux-xfs.git/commit/?h=xfs-mmap-lock&id=de0e8c20ba3a65b0f15040aabbefdc1999876e6b

Hmm. I am completely unfamiliar with the xfs code, but my reading of 964aa8d9e4d3..723cac484733 is that the newly introduced lock should be OK from the reclaim recursion POV. It protects against truncate and punch hole, right? Or are there any internal paths which I am missing that would cause problems if we do GFP_FS with XFS_MMAPLOCK_SHARED held?
--
Michal Hocko
SUSE Labs
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Sat 21-03-15 09:51:39, Dave Chinner wrote:
> On Fri, Mar 20, 2015 at 02:14:53PM +0100, Michal Hocko wrote:
> > On Fri 20-03-15 14:48:20, Dave Chinner wrote:
> > > On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> > > > > allocations where the caller cannot pass a gfp_mask need to obey
> > > > > the mapping_gfp_mask that is set by the mapping owner
> > > >
> > > > Hmm, I thought this is true only when the function might be called from
> > > > the fs path.
> > >
> > > How do you know in, say, mpage_readpages, you aren't being called
> > > from a fs path that holds locks? e.g. we can get there from ext4
> > > doing readdir, so it is holding an i_mutex lock at that point.
> > >
> > > Many other paths into mpage_readpages don't hold locks, but there
> > > are some that do, and those that do need functions like this to
> > > obey the mapping_gfp_mask because it is set appropriately for the
> > > allocation context of the inode that owns the mapping.
> >
> > What about the following?
> > ---
> > From 5d905cb291138d61bbab056845d6e53bc4451ec8 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko
> > Date: Thu, 19 Mar 2015 14:56:56 +0100
> > Subject: [PATCH 1/2] mm: do not ignore mapping_gfp_mask in page cache
> >  allocation paths
>
> Looks reasonable, though I thought there were more places than that
> in the mapping paths that need to be careful...

I have focused on those which involve page cache allocation because those are the obvious ones. We might need others, but I do not see them right now. I will include this patch in the next submission after I manage to wrap my head around the upcoming xfs changes and come up with something for the page_cache_read vs. OOM killer issue.
--
Michal Hocko
SUSE Labs
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Fri, Mar 20, 2015 at 02:14:53PM +0100, Michal Hocko wrote:
> On Fri 20-03-15 14:48:20, Dave Chinner wrote:
> > On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> > > > allocations where the caller cannot pass a gfp_mask need to obey
> > > > the mapping_gfp_mask that is set by the mapping owner
> > >
> > > Hmm, I thought this is true only when the function might be called from
> > > the fs path.
> >
> > How do you know in, say, mpage_readpages, you aren't being called
> > from a fs path that holds locks? e.g. we can get there from ext4
> > doing readdir, so it is holding an i_mutex lock at that point.
> >
> > Many other paths into mpage_readpages don't hold locks, but there
> > are some that do, and those that do need functions like this to
> > obey the mapping_gfp_mask because it is set appropriately for the
> > allocation context of the inode that owns the mapping.
>
> What about the following?
> ---
> From 5d905cb291138d61bbab056845d6e53bc4451ec8 Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Thu, 19 Mar 2015 14:56:56 +0100
> Subject: [PATCH 1/2] mm: do not ignore mapping_gfp_mask in page cache
>  allocation paths

Looks reasonable, though I thought there were more places than that in the mapping paths that need to be careful...

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Fri 20-03-15 14:48:20, Dave Chinner wrote:
> On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> > On Thu 19-03-15 18:14:39, Dave Chinner wrote:
> > > On Wed, Mar 18, 2015 at 03:55:28PM +0100, Michal Hocko wrote:
> > > > On Wed 18-03-15 10:44:11, Rik van Riel wrote:
> > > > > On 03/18/2015 10:09 AM, Michal Hocko wrote:
> > > > > > page_cache_read has been historically using page_cache_alloc_cold to
> > > > > > allocate a new page. This means that mapping_gfp_mask is used as the
> > > > > > base for the gfp_mask. Many filesystems are setting this mask to
> > > > > > GFP_NOFS to prevent fs recursion issues. page_cache_read is,
> > > > > > however, not called from the fs layer
> > > > >
> > > > > Is that true for filesystems that have directories in
> > > > > the page cache?
> > > >
> > > > I haven't found any explicit callers of filemap_fault except for ocfs2
> > > > and ceph and those seem OK to me. Which filesystems do you have in mind?
> > >
> > > Just about every major filesystem calls filemap_fault through the
> > > .fault callout.
> >
> > That is right but the callback is called from the VM layer where we
> > obviously do not take any fs locks (we are holding only mmap_sem
> > for reading).
> > Those who call filemap_fault directly (ocfs2 and ceph) and those
> > who call the callback directly: qxl_ttm_fault, radeon_ttm_fault,
> > kernfs_vma_fault, shm_fault seem to be safe from the reclaim recursion
> > POV. radeon_ttm_fault takes a lock for reading but that one doesn't seem
> > to be used from the reclaim context.
> >
> > Or did I miss your point? Are you concerned about some fs overloading
> > filemap_fault and doing some locking before delegating to filemap_fault?
>
> The latter:
>
> https://git.kernel.org/cgit/linux/kernel/git/dgc/linux-xfs.git/commit/?h=xfs-mmap-lock&id=de0e8c20ba3a65b0f15040aabbefdc1999876e6b

I will have a look at the code to see what we can do about it.

> > > GFP_KERNEL allocation for mappings is simply wrong. All mapping
> > > allocations where the caller cannot pass a gfp_mask need to obey
> > > the mapping_gfp_mask that is set by the mapping owner
> >
> > Hmm, I thought this is true only when the function might be called from
> > the fs path.
>
> How do you know in, say, mpage_readpages, you aren't being called
> from a fs path that holds locks? e.g. we can get there from ext4
> doing readdir, so it is holding an i_mutex lock at that point.
>
> Many other paths into mpage_readpages don't hold locks, but there
> are some that do, and those that do need functions like this to
> obey the mapping_gfp_mask because it is set appropriately for the
> allocation context of the inode that owns the mapping.

What about the following?
---
From 5d905cb291138d61bbab056845d6e53bc4451ec8 Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Thu, 19 Mar 2015 14:56:56 +0100
Subject: [PATCH 1/2] mm: do not ignore mapping_gfp_mask in page cache
 allocation paths

page_cache_read, do_generic_file_read, __generic_file_splice_read and __ntfs_grab_cache_pages currently ignore mapping_gfp_mask when calling add_to_page_cache_lru, which might cause recursion into the fs down in the direct reclaim path if the mapping really relies on GFP_NOFS semantics.

This doesn't seem to be the case now because page_cache_read (page fault path) doesn't seem to suffer from the reclaim recursion issues, and do_generic_file_read and __generic_file_splice_read also shouldn't be called under fs locks which would deadlock in the reclaim path. Anyway, it is better to obey the mapping gfp mask and prevent later breakage.
Signed-off-by: Michal Hocko
---
 fs/ntfs/file.c | 2 +-
 fs/splice.c    | 2 +-
 mm/filemap.c   | 6 --
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index 1da9b2d184dc..568c9dbc7e61 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -422,7 +422,7 @@ static inline int __ntfs_grab_cache_pages(struct address_space *mapping,
 		}
 	}
 	err = add_to_page_cache_lru(*cached_page, mapping, index,
-			GFP_KERNEL);
+			GFP_KERNEL & mapping_gfp_mask(mapping));
 	if (unlikely(err)) {
 		if (err == -EEXIST)
 			continue;
diff --git a/fs/splice.c b/fs/splice.c
index 75c6058eabf2..71f6c51f019a 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -360,7 +360,7 @@ __generic_file_splice_read(struct file *in, loff_t *ppos,
 		break;
 	error = add_to_page_cache_lru(page, mapping, index,
-			GFP_KERNEL);
+			GFP_KERNEL & mapping_gfp_mask(mapping));
 	if (unlikely(error)) {
 		page_cache_release(page);
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Thu, Mar 19, 2015 at 02:55:58PM +0100, Michal Hocko wrote:
> @@ -2701,13 +2701,24 @@ static int __do_fault(struct vm_area_struct *vma, unsigned long address,
> {
> 	struct vm_fault vmf;
> 	int ret;
> +	struct address_space *mapping = vma->vm_file->f_mapping;
> +	gfp_t mapping_gfp;
>
> 	vmf.virtual_address = (void __user *)(address & PAGE_MASK);
> 	vmf.pgoff = pgoff;
> 	vmf.flags = flags;
> 	vmf.page = NULL;
>
> +	/*
> +	 * Some filesystems always drop __GFP_FS to prevent reclaim
> +	 * recursion back to FS code. This is not the case here because
> +	 * we are at the top of the call chain. Add GFP_FS flags to prevent
> +	 * a premature OOM killer invocation.
> +	 */
> +	mapping_gfp = mapping_gfp_mask(mapping);
> +	mapping_set_gfp_mask(mapping, mapping_gfp | __GFP_FS | __GFP_IO);
> 	ret = vma->vm_ops->fault(vma, &vmf);
> +	mapping_set_gfp_mask(mapping, mapping_gfp);

Urk! The inode owns the mapping and makes these decisions, not the page fault path. These mapping flags may be set for reasons you don't expect or know about (e.g. a subsystem-specific shrinker constraint), so paths like this have no business clearing flags they don't own.

cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> On Thu 19-03-15 18:14:39, Dave Chinner wrote:
> > On Wed, Mar 18, 2015 at 03:55:28PM +0100, Michal Hocko wrote:
> > > On Wed 18-03-15 10:44:11, Rik van Riel wrote:
> > > > On 03/18/2015 10:09 AM, Michal Hocko wrote:
> > > > > page_cache_read has been historically using page_cache_alloc_cold to
> > > > > allocate a new page. This means that mapping_gfp_mask is used as the
> > > > > base for the gfp_mask. Many filesystems are setting this mask to
> > > > > GFP_NOFS to prevent fs recursion issues. page_cache_read is,
> > > > > however, not called from the fs layer
> > > >
> > > > Is that true for filesystems that have directories in
> > > > the page cache?
> > >
> > > I haven't found any explicit callers of filemap_fault except for ocfs2
> > > and ceph and those seem OK to me. Which filesystems do you have in mind?
> >
> > Just about every major filesystem calls filemap_fault through the
> > .fault callout.
>
> That is right but the callback is called from the VM layer where we
> obviously do not take any fs locks (we are holding only mmap_sem
> for reading).
> Those who call filemap_fault directly (ocfs2 and ceph) and those
> who call the callback directly: qxl_ttm_fault, radeon_ttm_fault,
> kernfs_vma_fault, shm_fault seem to be safe from the reclaim recursion
> POV. radeon_ttm_fault takes a lock for reading but that one doesn't seem
> to be used from the reclaim context.
>
> Or did I miss your point? Are you concerned about some fs overloading
> filemap_fault and doing some locking before delegating to filemap_fault?

The latter:

https://git.kernel.org/cgit/linux/kernel/git/dgc/linux-xfs.git/commit/?h=xfs-mmap-lock&id=de0e8c20ba3a65b0f15040aabbefdc1999876e6b

> > GFP_KERNEL allocation for mappings is simply wrong. All mapping
> > allocations where the caller cannot pass a gfp_mask need to obey
> > the mapping_gfp_mask that is set by the mapping owner
>
> Hmm, I thought this is true only when the function might be called from
> the fs path.

How do you know in, say, mpage_readpages, you aren't being called from a fs path that holds locks? e.g. we can get there from ext4 doing readdir, so it is holding an i_mutex lock at that point.

Many other paths into mpage_readpages don't hold locks, but there are some that do, and those that do need functions like this to obey the mapping_gfp_mask because it is set appropriately for the allocation context of the inode that owns the mapping.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Thu 19-03-15 14:55:58, Michal Hocko wrote:
> On Thu 19-03-15 08:38:35, Neil Brown wrote:
> [...]
> > Nearly half the places in the kernel which call mapping_gfp_mask() remove
> > the __GFP_FS bit.
> >
> > That suggests to me that it might make sense to have
> >    mapping_gfp_mask_fs()
> > and
> >    mapping_gfp_mask_nofs()
> >
> > and let the presence of __GFP_FS (and __GFP_IO) be determined by the
> > call-site rather than the filesystem.
>
> Sounds reasonable to me but filesystems tend to use this in very
> different ways.
> - xfs drops GFP_FS in xfs_setup_inode so all page cache allocations are
>   NOFS.
> - reiserfs drops GFP_FS only before calling read_mapping_page in
>   reiserfs_get_page and never restores the original mask.
> - btrfs doesn't seem to rely on mapping_gfp_mask for anything other than
>   btree_inode (unless it gets inherited in a way I haven't noticed).
> - ext* doesn't seem to rely on the mapping gfp mask at all.
>
> So it is not clear to me how we should change that into callsites. But I
> guess we can change at least the page fault path like the following. I
> like it much more than the previous way which is too hackish.

But this is racy instead... And I do not think we can make it raceless, so scratch this and get back to the original approach.

[...]
> +	/*
> +	 * Some filesystems always drop __GFP_FS to prevent reclaim
> +	 * recursion back to FS code. This is not the case here because
> +	 * we are at the top of the call chain. Add GFP_FS flags to prevent
> +	 * a premature OOM killer invocation.
> +	 */
> +	mapping_gfp = mapping_gfp_mask(mapping);
> +	mapping_set_gfp_mask(mapping, mapping_gfp | __GFP_FS | __GFP_IO);
> 	ret = vma->vm_ops->fault(vma, &vmf);
> +	mapping_set_gfp_mask(mapping, mapping_gfp);
> 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
> 		return ret;
--
Michal Hocko
SUSE Labs
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Thu 19-03-15 08:38:35, Neil Brown wrote:
[...]
> Nearly half the places in the kernel which call mapping_gfp_mask() remove
> the __GFP_FS bit.
>
> That suggests to me that it might make sense to have
>    mapping_gfp_mask_fs()
> and
>    mapping_gfp_mask_nofs()
>
> and let the presence of __GFP_FS (and __GFP_IO) be determined by the
> call-site rather than the filesystem.

Sounds reasonable to me, but filesystems tend to use this in very different ways.
- xfs drops GFP_FS in xfs_setup_inode so all page cache allocations are NOFS.
- reiserfs drops GFP_FS only before calling read_mapping_page in reiserfs_get_page and never restores the original mask.
- btrfs doesn't seem to rely on mapping_gfp_mask for anything other than btree_inode (unless it gets inherited in a way I haven't noticed).
- ext* doesn't seem to rely on the mapping gfp mask at all.

So it is not clear to me how we should change that into callsites. But I guess we can change at least the page fault path like the following. I like it much more than the previous way, which is too hackish.
---
From 0aff17ef2daf1e4d7b5c7a2fcf3c0aac2670f527 Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Thu, 19 Mar 2015 14:46:07 +0100
Subject: [PATCH] mm: Allow __GFP_FS for page_cache_read page cache allocation

page_cache_read has been historically using page_cache_alloc_cold to allocate a new page. This means that mapping_gfp_mask is used as the base for the gfp_mask. Many filesystems are setting this mask to GFP_NOFS to prevent fs recursion issues. page_cache_read is, however, not called from the fs layer so it doesn't need this protection.

The protection might even be harmful. There is a strong push to fail GFP_NOFS allocations rather than loop within the allocator indefinitely with a very limited reclaim ability. Once we start failing those requests the OOM killer might be triggered prematurely because the page cache allocation failure is propagated up the page fault path and ends up in pagefault_out_of_memory.
This patch updates mapping_gfp_mask in __do_fault, adding both __GFP_FS
and __GFP_IO before the address space fault handler is called, and
restores the original flags after the callback. This allows direct
reclaim to go into the FS if needed and prevents a premature OOM killer
for most filesystems, while still allowing the FS layer to override the
mask from its .fault handler should there be a need.

Reported-by: Tetsuo Handa
Signed-off-by: Michal Hocko
---
 mm/memory.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index c4b08e3ef058..ba528787e25f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2701,13 +2701,24 @@ static int __do_fault(struct vm_area_struct *vma, unsigned long address,
 {
 	struct vm_fault vmf;
 	int ret;
+	struct address_space *mapping = vma->vm_file->f_mapping;
+	gfp_t mapping_gfp;
 
 	vmf.virtual_address = (void __user *)(address & PAGE_MASK);
 	vmf.pgoff = pgoff;
 	vmf.flags = flags;
 	vmf.page = NULL;
 
+	/*
+	 * Some filesystems always drop __GFP_FS to prevent reclaim
+	 * recursion back into FS code. This is not the case here because
+	 * we are at the top of the call chain. Add __GFP_FS and __GFP_IO
+	 * to prevent a premature OOM killer.
+	 */
+	mapping_gfp = mapping_gfp_mask(mapping);
+	mapping_set_gfp_mask(mapping, mapping_gfp | __GFP_FS | __GFP_IO);
 	ret = vma->vm_ops->fault(vma, &vmf);
+	mapping_set_gfp_mask(mapping, mapping_gfp);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
-- 
2.1.4

> However I am a bit concerned about drivers/block/loop.c.
> Might a filesystem read on the loop block device wait for a page_cache_read()
> on the loop-mounted file? In that case you really don't want __GFP_FS set
> when allocating that page.

I guess I see what you mean, but I fail to see how the deadlock would happen from the page fault path.
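The save/override/restore idiom from the patch above can be sketched in plain C. Everything below — the gfp_t typedef, the flag values, the struct and helper definitions — is an illustrative stand-in for the kernel's real definitions in include/linux/gfp.h and include/linux/pagemap.h, not the actual implementation:

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative stand-ins for the real bits in include/linux/gfp.h. */
#define __GFP_IO 0x40u
#define __GFP_FS 0x80u

/* Toy stand-in for struct address_space and its gfp mask accessors. */
struct mapping { gfp_t gfp_mask; };

static gfp_t mapping_gfp_mask(const struct mapping *m) { return m->gfp_mask; }
static void mapping_set_gfp_mask(struct mapping *m, gfp_t g) { m->gfp_mask = g; }

/* Records the mask a (fake) .fault callback would observe. */
static gfp_t seen_by_fault;
static void fake_fault(struct mapping *m) { seen_by_fault = mapping_gfp_mask(m); }

/*
 * The idiom from the patch: widen the mapping mask around the fault
 * callback so page cache allocations may recurse into the FS, then
 * restore the owner's original mask afterwards.
 */
static void do_fault_sketch(struct mapping *m)
{
	gfp_t saved = mapping_gfp_mask(m);

	mapping_set_gfp_mask(m, saved | __GFP_FS | __GFP_IO);
	fake_fault(m);
	mapping_set_gfp_mask(m, saved);
}
```

Note that the fault handler sees the widened mask even if the filesystem originally masked __GFP_FS off, and the owner's mask is back in place once the fault returns — which is what lets a filesystem still override it from its own .fault handler.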
-- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Thu 19-03-15 18:14:39, Dave Chinner wrote: > On Wed, Mar 18, 2015 at 03:55:28PM +0100, Michal Hocko wrote: > > On Wed 18-03-15 10:44:11, Rik van Riel wrote: > > > On 03/18/2015 10:09 AM, Michal Hocko wrote: > > > > page_cache_read has been historically using page_cache_alloc_cold to > > > > allocate a new page. This means that mapping_gfp_mask is used as the > > > > base for the gfp_mask. Many filesystems are setting this mask to > > > > GFP_NOFS to prevent from fs recursion issues. page_cache_read is, > > > > however, not called from the fs layer > > > > > > Is that true for filesystems that have directories in > > > the page cache? > > > > I haven't found any explicit callers of filemap_fault except for ocfs2 > > and ceph and those seem OK to me. Which filesystems you have in mind? > > Just about every major filesystem calls filemap_fault through the > .fault callout. That is right, but the callback is called from the VM layer where we obviously do not take any fs locks (we are holding only mmap_sem for reading). Those who call filemap_fault directly (ocfs2 and ceph) and those who call the callback directly (qxl_ttm_fault, radeon_ttm_fault, kernfs_vma_fault, shm_fault) seem to be safe from the reclaim recursion POV. radeon_ttm_fault takes a lock for reading, but that one doesn't seem to be used from the reclaim context. Or did I miss your point? Are you concerned about some fs overloading filemap_fault and doing some locking before delegating to filemap_fault?
> C symbol: filemap_fault
>
>   File            Function  Line
> 0 9p/vfs_file.c              831  .fault = filemap_fault,
> 1 9p/vfs_file.c              838  .fault = filemap_fault,
> 2 btrfs/file.c              2081  .fault = filemap_fault,
> 3 cifs/file.c               3242  .fault = filemap_fault,
> 4 ext4/file.c                215  .fault = filemap_fault,
> 5 f2fs/file.c                 93  .fault = filemap_fault,
> 6 fuse/file.c               2062  .fault = filemap_fault,
> 7 gfs2/file.c                498  .fault = filemap_fault,
> 8 nfs/file.c                 653  .fault = filemap_fault,
> 9 nilfs2/file.c              128  .fault = filemap_fault,
> a ubifs/file.c              1536  .fault = filemap_fault,
> b xfs/xfs_file.c            1420  .fault = filemap_fault,
>
> > Btw. how would that work as we already have GFP_KERNEL allocation few
> > lines below?
>
> GFP_KERNEL allocation for mappings is simply wrong. All mapping
> allocations where the caller cannot pass a gfp_mask need to obey
> the mapping_gfp_mask that is set by the mapping owner

Hmm, I thought this is true only when the function might be called from the fs path.
--
Michal Hocko
SUSE Labs
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Wed, Mar 18, 2015 at 03:55:28PM +0100, Michal Hocko wrote:
> On Wed 18-03-15 10:44:11, Rik van Riel wrote:
> > On 03/18/2015 10:09 AM, Michal Hocko wrote:
> > > page_cache_read has been historically using page_cache_alloc_cold to
> > > allocate a new page. This means that mapping_gfp_mask is used as the
> > > base for the gfp_mask. Many filesystems are setting this mask to
> > > GFP_NOFS to prevent from fs recursion issues. page_cache_read is,
> > > however, not called from the fs layer
> >
> > Is that true for filesystems that have directories in
> > the page cache?
>
> I haven't found any explicit callers of filemap_fault except for ocfs2
> and ceph and those seem OK to me. Which filesystems you have in mind?

Just about every major filesystem calls filemap_fault through the
.fault callout.

C symbol: filemap_fault

  File            Function  Line
0 9p/vfs_file.c              831  .fault = filemap_fault,
1 9p/vfs_file.c              838  .fault = filemap_fault,
2 btrfs/file.c              2081  .fault = filemap_fault,
3 cifs/file.c               3242  .fault = filemap_fault,
4 ext4/file.c                215  .fault = filemap_fault,
5 f2fs/file.c                 93  .fault = filemap_fault,
6 fuse/file.c               2062  .fault = filemap_fault,
7 gfs2/file.c                498  .fault = filemap_fault,
8 nfs/file.c                 653  .fault = filemap_fault,
9 nilfs2/file.c              128  .fault = filemap_fault,
a ubifs/file.c              1536  .fault = filemap_fault,
b xfs/xfs_file.c            1420  .fault = filemap_fault,

> Btw. how would that work as we already have GFP_KERNEL allocation few
> lines below?

GFP_KERNEL allocation for mappings is simply wrong. All mapping
allocations where the caller cannot pass a gfp_mask need to obey
the mapping_gfp_mask that is set by the mapping owner.

Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Wed, 18 Mar 2015 16:45:40 +0100 Michal Hocko wrote:

> What do you think about this v2? I cannot say I would like it but I
> really dislike the whole mapping_gfp_mask API to be honest.
> ---
> >From d88010d6f5f59d7eb87b691e27e201d12cab9141 Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Wed, 18 Mar 2015 16:06:40 +0100
> Subject: [PATCH] mm: Allow __GFP_FS for page_cache_read page cache allocation
>
> page_cache_read has been historically using page_cache_alloc_cold to
> allocate a new page. This means that mapping_gfp_mask is used as the
> base for the gfp_mask. Many filesystems are setting this mask to
> GFP_NOFS to prevent from fs recursion issues. page_cache_read is,
> however, not called from the fs layer so it doesn't need this
> protection. Even ceph and ocfs2 which call filemap_fault from their
> fault handlers seem to be OK because they are not taking any fs lock
> before invoking generic implementation.
>
> The protection might be even harmful. There is a strong push to fail
> GFP_NOFS allocations rather than loop within allocator indefinitely with
> a very limited reclaim ability. Once we start failing those requests
> the OOM killer might be triggered prematurely because the page cache
> allocation failure is propagated up the page fault path and end up in
> pagefault_out_of_memory.
>
> Add __GFP_FS and __GFP_IO to the gfp mask which is coming from the
> mapping to fix this issue.
>
> Reported-by: Tetsuo Handa
> Signed-off-by: Michal Hocko
> ---
>  mm/filemap.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 968cd8e03d2e..8b50d5eb52b2 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1752,7 +1752,15 @@ static int page_cache_read(struct file *file, pgoff_t offset)
>  	int ret;
>
>  	do {
> -		page = page_cache_alloc_cold(mapping);
> +		gfp_t page_cache_gfp = mapping_gfp_mask(mapping)|__GFP_COLD;
> +
> +		/*
> +		 * This code is not called from the fs layer so we do not need
> +		 * reclaim recursion protection. !GFP_FS might fail too easily
> +		 * and trigger the OOM killer prematurely.
> +		 */
> +		page_cache_gfp |= __GFP_FS | __GFP_IO;
> +		page = __page_cache_alloc(page_cache_gfp);
>  		if (!page)
>  			return -ENOMEM;
>

Nearly half the places in the kernel which call mapping_gfp_mask() remove the
__GFP_FS bit.

That suggests to me that it might make sense to have
mapping_gfp_mask_fs() and mapping_gfp_mask_nofs()

and let the presence of __GFP_FS (and __GFP_IO) be determined by the
call-site rather than the filesystem.

However I am a bit concerned about drivers/block/loop.c.
Might a filesystem read on the loop block device wait for a page_cache_read()
on the loop-mounted file? In that case you really don't want __GFP_FS set
when allocating that page.

NeilBrown
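Neil's suggested split could look roughly like the sketch below. The two helper names come from his mail; everything else (the gfp_t typedef, the flag values, the struct) is a stand-in for the kernel's real definitions, and whether the nofs variant should also clear __GFP_IO is exactly the call-site policy question being debated:

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative stand-ins for the real bits in include/linux/gfp.h. */
#define __GFP_IO 0x40u
#define __GFP_FS 0x80u

/* Toy stand-in for struct address_space with a stored allocation mask. */
struct mapping { gfp_t gfp_mask; };

/* For call sites known to be safe (e.g. the page fault path):
 * allow reclaim to enter the FS and do IO. */
static gfp_t mapping_gfp_mask_fs(const struct mapping *m)
{
	return m->gfp_mask | __GFP_FS | __GFP_IO;
}

/* For call sites inside the filesystem itself:
 * forbid reclaim recursion back into the FS. */
static gfp_t mapping_gfp_mask_nofs(const struct mapping *m)
{
	return m->gfp_mask & ~(gfp_t)__GFP_FS;
}
```

The appeal of this shape is that the filesystem no longer has to pessimize every allocation through the mapping; each caller states whether it can tolerate FS recursion.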
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
What do you think about this v2? I cannot say I would like it, but I really dislike the whole mapping_gfp_mask API to be honest.
---
>From d88010d6f5f59d7eb87b691e27e201d12cab9141 Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Wed, 18 Mar 2015 16:06:40 +0100
Subject: [PATCH] mm: Allow __GFP_FS for page_cache_read page cache allocation

page_cache_read has been historically using page_cache_alloc_cold to
allocate a new page. This means that mapping_gfp_mask is used as the
base for the gfp_mask. Many filesystems are setting this mask to
GFP_NOFS to prevent from fs recursion issues. page_cache_read is,
however, not called from the fs layer so it doesn't need this
protection. Even ceph and ocfs2 which call filemap_fault from their
fault handlers seem to be OK because they are not taking any fs lock
before invoking the generic implementation.

The protection might be even harmful. There is a strong push to fail
GFP_NOFS allocations rather than loop within the allocator indefinitely
with a very limited reclaim ability. Once we start failing those
requests the OOM killer might be triggered prematurely because the page
cache allocation failure is propagated up the page fault path and ends
up in pagefault_out_of_memory.

Add __GFP_FS and __GFP_IO to the gfp mask which is coming from the
mapping to fix this issue.

Reported-by: Tetsuo Handa
Signed-off-by: Michal Hocko
---
 mm/filemap.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 968cd8e03d2e..8b50d5eb52b2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1752,7 +1752,15 @@ static int page_cache_read(struct file *file, pgoff_t offset)
 	int ret;
 
 	do {
-		page = page_cache_alloc_cold(mapping);
+		gfp_t page_cache_gfp = mapping_gfp_mask(mapping)|__GFP_COLD;
+
+		/*
+		 * This code is not called from the fs layer so we do not need
+		 * reclaim recursion protection. !GFP_FS might fail too easily
+		 * and trigger the OOM killer prematurely.
+		 */
+		page_cache_gfp |= __GFP_FS | __GFP_IO;
+		page = __page_cache_alloc(page_cache_gfp);
 		if (!page)
 			return -ENOMEM;
-- 
2.1.4

--
Michal Hocko
SUSE Labs
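Unlike the later __do_fault variant, this v2 only widens a local copy of the mask for one allocation and never writes anything back to the mapping. That difference can be sketched as follows, with stand-in flag values rather than the kernel's real ones:

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative stand-ins for the real bits in include/linux/gfp.h. */
#define __GFP_COLD 0x01u
#define __GFP_IO   0x40u
#define __GFP_FS   0x80u

/*
 * v2's approach: compute a widened mask locally for this single
 * page cache allocation. The mapping's stored mask is untouched,
 * so concurrent users still see the owner's restrictions.
 */
static gfp_t page_cache_read_gfp(gfp_t mapping_mask)
{
	return mapping_mask | __GFP_COLD | __GFP_FS | __GFP_IO;
}
```

Because nothing is stored, there is no window in which other allocators against the same mapping can observe a temporarily widened mask, which is the main structural difference from the save/override/restore variant.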
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Wed 18-03-15 10:44:11, Rik van Riel wrote: > On 03/18/2015 10:09 AM, Michal Hocko wrote: > > page_cache_read has been historically using page_cache_alloc_cold to > > allocate a new page. This means that mapping_gfp_mask is used as the > > base for the gfp_mask. Many filesystems are setting this mask to > > GFP_NOFS to prevent from fs recursion issues. page_cache_read is, > > however, not called from the fs layer > > Is that true for filesystems that have directories in > the page cache? I haven't found any explicit callers of filemap_fault except for ocfs2 and ceph, and those seem OK to me. Which filesystems do you have in mind? Btw. how would that work, as we already have a GFP_KERNEL allocation a few lines below? -- Michal Hocko SUSE Labs
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On 03/18/2015 10:09 AM, Michal Hocko wrote: > page_cache_read has been historically using page_cache_alloc_cold to > allocate a new page. This means that mapping_gfp_mask is used as the > base for the gfp_mask. Many filesystems are setting this mask to > GFP_NOFS to prevent from fs recursion issues. page_cache_read is, > however, not called from the fs layer Is that true for filesystems that have directories in the page cache?
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Wed 18-03-15 14:38:47, Mel Gorman wrote: > On Wed, Mar 18, 2015 at 03:09:26PM +0100, Michal Hocko wrote: > > page_cache_read has been historically using page_cache_alloc_cold to > > allocate a new page. This means that mapping_gfp_mask is used as the > > base for the gfp_mask. Many filesystems are setting this mask to > > GFP_NOFS to prevent from fs recursion issues. page_cache_read is, > > however, not called from the fs layer so it doesn't need this > > protection. Even ceph and ocfs2 which call filemap_fault from their > > fault handlers seem to be OK because they are not taking any fs lock > > before invoking generic implementation. > > > > The protection might be even harmful. There is a strong push to fail > > GFP_NOFS allocations rather than loop within allocator indefinitely with > > a very limited reclaim ability. Once we start failing those requests > > the OOM killer might be triggered prematurely because the page cache > > allocation failure is propagated up the page fault path and end up in > > pagefault_out_of_memory. > > > > Use GFP_KERNEL mask instead because it is safe from the reclaim > > recursion POV. We are already doing GFP_KERNEL allocations down > > add_to_page_cache_lru path. > > > > Reported-by: Tetsuo Handa > > Signed-off-by: Michal Hocko > > I'm very far behind after LSF/MM so do not know where this came out of > but it loses addressing restriction hints from the driver such as > > drivers/gpu/drm/gma500/gem.c: mapping_set_gfp_mask(r->gem.filp->f_mapping, > GFP_KERNEL | __GFP_DMA32); OK, I have missed these drivers. Scratch the patch for now; I will think about it more. Thanks! -- Michal Hocko SUSE Labs
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Wed, Mar 18, 2015 at 03:09:26PM +0100, Michal Hocko wrote: > page_cache_read has been historically using page_cache_alloc_cold to > allocate a new page. This means that mapping_gfp_mask is used as the > base for the gfp_mask. Many filesystems are setting this mask to > GFP_NOFS to prevent from fs recursion issues. page_cache_read is, > however, not called from the fs layer so it doesn't need this > protection. Even ceph and ocfs2 which call filemap_fault from their > fault handlers seem to be OK because they are not taking any fs lock > before invoking generic implementation. > > The protection might be even harmful. There is a strong push to fail > GFP_NOFS allocations rather than loop within allocator indefinitely with > a very limited reclaim ability. Once we start failing those requests > the OOM killer might be triggered prematurely because the page cache > allocation failure is propagated up the page fault path and end up in > pagefault_out_of_memory. > > Use GFP_KERNEL mask instead because it is safe from the reclaim > recursion POV. We are already doing GFP_KERNEL allocations down > add_to_page_cache_lru path. > > Reported-by: Tetsuo Handa > Signed-off-by: Michal Hocko I'm very far behind after LSF/MM so do not know where this came out of but it loses addressing restriction hints from the driver such as drivers/gpu/drm/gma500/gem.c: mapping_set_gfp_mask(r->gem.filp->f_mapping, GFP_KERNEL | __GFP_DMA32); It also loses mobility hints for fragmentation avoidance. fs/inode.c: mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE); If users of mapping_set_gfp_mask are now being ignored then it should at least trigger a once-off warning that the flags are being ignored so it's obvious if a recursion does occur and cause problems. 
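Mel's suggested once-off warning could be sketched like this — a plain-C analogue of the kernel's WARN_ONCE pattern. The function name and flag values are made up for illustration; only the idea (warn the first time an allocation uses gfp bits the mapping owner deliberately left out) comes from the mail:

```c
#include <assert.h>

typedef unsigned int gfp_t;

/*
 * Fire at most once when an allocation uses gfp bits the mapping owner
 * left out of mapping_gfp_mask (e.g. a driver that cleared highmem
 * bits for a DMA32-only device, as in the gma500 example above).
 */
static int warned_once;

static int warn_ignored_gfp_hints(gfp_t owner_mask, gfp_t used_mask)
{
	/* Bits the caller used that the owner did not permit. */
	gfp_t ignored = used_mask & ~owner_mask;

	if (ignored && !warned_once) {
		warned_once = 1;
		return 1;	/* the real kernel would WARN_ONCE() here */
	}
	return 0;
}
```

A check of this shape would make it obvious in the logs if overriding the owner's mask ever did lead to recursion or placement problems, instead of failing silently.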
-- Mel Gorman SUSE Labs
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On Wed 18-03-15 10:32:57, Rik van Riel wrote: > On 03/18/2015 10:09 AM, Michal Hocko wrote: > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index 968cd8e03d2e..26f62ba79f50 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -1752,7 +1752,7 @@ static int page_cache_read(struct file *file, pgoff_t > > offset) > > int ret; > > > > do { > > - page = page_cache_alloc_cold(mapping); > > + page = __page_cache_alloc(GFP_KERNEL|__GFP_COLD); > > if (!page) > > return -ENOMEM; > > Won't this break on highmem systems, by failing to > allocate the page cache from highmem, where previously > it would? It will! This is broken. I can see inode_init_always now. We need to add GFP_HIGHUSER_MOVABLE here. Thanks for pointing this out! -- Michal Hocko SUSE Labs
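Rik's point is that a bare GFP_KERNEL carries no highmem placement bits, which the proposed fix acknowledges by switching to GFP_HIGHUSER_MOVABLE. With stand-in bit values (the real, more layered composition lives in include/linux/gfp.h) the relationship can be shown as:

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative stand-in bits; not the kernel's actual values. */
#define __GFP_HIGHMEM 0x02u
#define __GFP_MOVABLE 0x08u
#define __GFP_IO      0x40u
#define __GFP_FS      0x80u

/* GFP_KERNEL allows FS/IO reclaim but says nothing about highmem. */
#define GFP_KERNEL            ((gfp_t)(__GFP_IO | __GFP_FS))

/* GFP_HIGHUSER_MOVABLE additionally permits highmem, movable placement,
 * which page cache pages want on 32-bit highmem systems. */
#define GFP_HIGHUSER_MOVABLE  ((gfp_t)(GFP_KERNEL | __GFP_HIGHMEM | __GFP_MOVABLE))
```

So replacing page_cache_alloc_cold's mapping-derived mask with plain GFP_KERNEL silently forbids highmem pages that the old code allowed, which is exactly the regression Rik spotted.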
Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
On 03/18/2015 10:09 AM, Michal Hocko wrote: > diff --git a/mm/filemap.c b/mm/filemap.c > index 968cd8e03d2e..26f62ba79f50 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -1752,7 +1752,7 @@ static int page_cache_read(struct file *file, pgoff_t > offset) > int ret; > > do { > - page = page_cache_alloc_cold(mapping); > + page = __page_cache_alloc(GFP_KERNEL|__GFP_COLD); > if (!page) > return -ENOMEM; Won't this break on highmem systems, by failing to allocate the page cache from highmem, where previously it would?