page_cache_read has historically used page_cache_alloc_cold to allocate
a new page. This means that mapping_gfp_mask is used as the base for the
gfp_mask. Many filesystems set this mask to GFP_NOFS to prevent fs
recursion issues. page_cache_read, however, is not called from the fs
layer, so it doesn't need this protection. Even ceph and ocfs2, which
call filemap_fault from their fault handlers, seem to be OK because they
do not take any fs lock before invoking the generic implementation.
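
To make the mask derivation above concrete, here is a minimal user-space
sketch, not kernel code. The MODEL_* bit values and the model_* helpers
are hypothetical names invented for this illustration (the real flags
are defined in include/linux/gfp.h). The sketch only assumes that
page_cache_alloc_cold ORs __GFP_COLD into mapping_gfp_mask, and that
GFP_NOFS is GFP_KERNEL without __GFP_FS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; NOT the kernel's real encoding. */
typedef unsigned int gfp_t;

#define MODEL_GFP_WAIT 0x1u /* allocation may sleep and reclaim */
#define MODEL_GFP_IO   0x2u /* reclaim may start physical I/O */
#define MODEL_GFP_FS   0x4u /* reclaim may call into the fs layer */
#define MODEL_GFP_COLD 0x8u /* prefer a cache-cold page */

#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)

/* Hypothetical stand-in for struct address_space. */
struct model_mapping {
	gfp_t gfp_mask;
};

/* Models mapping_gfp_mask(): the per-mapping base mask. */
static inline gfp_t model_mapping_gfp_mask(const struct model_mapping *m)
{
	assert(m != NULL);
	return m->gfp_mask;
}

/* Mask used by the old page_cache_alloc_cold(mapping) call. */
static inline gfp_t model_old_mask(const struct model_mapping *m)
{
	return model_mapping_gfp_mask(m) | MODEL_GFP_COLD;
}

/* Mask used by an explicit GFP_KERNEL|__GFP_COLD allocation. */
static inline gfp_t model_new_mask(void)
{
	return MODEL_GFP_KERNEL | MODEL_GFP_COLD;
}

/* True if reclaim on behalf of this allocation may enter the fs layer. */
static inline bool model_may_enter_fs(gfp_t gfp)
{
	return (gfp & MODEL_GFP_FS) != 0;
}
```

For a mapping whose mask is MODEL_GFP_NOFS, model_old_mask() yields a
mask without the FS bit, so reclaim is restricted even though the caller
holds no fs lock, while model_new_mask() always carries the FS bit.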

The protection might even be harmful. There is a strong push to fail
GFP_NOFS allocations rather than loop within the allocator indefinitely
with very limited reclaim ability. Once we start failing those requests,
the OOM killer might be triggered prematurely because the page cache
allocation failure is propagated up the page fault path and ends up in
pagefault_out_of_memory.

Use the GFP_KERNEL mask instead because it is safe from the reclaim
recursion POV. We are already doing GFP_KERNEL allocations down the
add_to_page_cache_lru path.

Reported-by: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp>
Signed-off-by: Michal Hocko <mho...@suse.cz>
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 968cd8e03d2e..26f62ba79f50 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1752,7 +1752,7 @@ static int page_cache_read(struct file *file, pgoff_t offset)
 	int ret;
 
 	do {
-		page = page_cache_alloc_cold(mapping);
+		page = __page_cache_alloc(GFP_KERNEL|__GFP_COLD);
 		if (!page)
 			return -ENOMEM;
 
-- 
2.1.4