[RFC 5/8] ext4: Do not fail journal due to block allocator

2015-08-05 Thread mhocko
From: Michal Hocko mho...@suse.com

Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM",
the memory allocator no longer loops endlessly to satisfy low-order
allocations; it fails them instead so that callers can handle the
failure gracefully.

Some callers are not yet prepared for this behavior, though. The ext4
block allocator relies solely on GFP_NOFS allocation requests, and
allocation failures lead to aborting the journal too easily:

[  345.028333] oom-trash: page allocation failure: order:0, mode:0x50
[  345.028336] CPU: 1 PID: 8334 Comm: oom-trash Tainted: GW   4.0.0-nofs3-6-gdfe9931f5f68 #588
[  345.028337] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150428_134905-gandalf 04/01/2014
[  345.028339]   880005a17708 81538a54 
8107a40f
[  345.028341]  0050 880005a17798 810fe854 
00018000
[  345.028342]  0046  81a52100 
0246
[  345.028343] Call Trace:
[  345.028348]  [81538a54] dump_stack+0x4f/0x7b
[  345.028370]  [810fe854] warn_alloc_failed+0x12a/0x13f
[  345.028373]  [81101bd2] __alloc_pages_nodemask+0x7f3/0x8aa
[  345.028375]  [810f9933] pagecache_get_page+0x12a/0x1c9
[  345.028390]  [a005bc64] ext4_mb_load_buddy+0x220/0x367 [ext4]
[  345.028414]  [a006014f] ext4_free_blocks+0x522/0xa4c [ext4]
[  345.028425]  [a0054e14] ext4_ext_remove_space+0x833/0xf22 [ext4]
[  345.028434]  [a005677e] ext4_ext_truncate+0x8c/0xb0 [ext4]
[  345.028441]  [a00342bf] ext4_truncate+0x20b/0x38d [ext4]
[  345.028462]  [a003573c] ext4_evict_inode+0x32b/0x4c1 [ext4]
[  345.028464]  [8116d04f] evict+0xa0/0x148
[  345.028466]  [8116dca8] iput+0x1a1/0x1f0
[  345.028468]  [811697b4] __dentry_kill+0x136/0x1a6
[  345.028470]  [81169a3e] dput+0x21a/0x243
[  345.028472]  [81157cda] __fput+0x184/0x19b
[  345.028473]  [81157d29] fput+0xe/0x10
[  345.028475]  [8105a05f] task_work_run+0x8a/0xa1
[  345.028477]  [810452f0] do_exit+0x3c6/0x8dc
[  345.028482]  [8104588a] do_group_exit+0x4d/0xb2
[  345.028483]  [8104eeeb] get_signal+0x5b1/0x5f5
[  345.028488]  [81002202] do_signal+0x28/0x5d0
[...]
[  345.028624] EXT4-fs error (device hdb1) in ext4_free_blocks:4879: Out of memory
[  345.033097] Aborting journal on device hdb1-8.
[  345.036339] EXT4-fs (hdb1): Remounting filesystem read-only
[  345.036344] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.036766] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.038583] EXT4-fs error (device hdb1) in ext4_ext_remove_space:3048: Journal has aborted
[  345.049115] EXT4-fs error (device hdb1) in ext4_ext_truncate:4669: Journal has aborted
[  345.050434] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053064] EXT4-fs error (device hdb1) in ext4_truncate:3668: Journal has aborted
[  345.053582] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053946] EXT4-fs error (device hdb1) in ext4_orphan_del:2686: Journal has aborted
[  345.055367] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted

The failure is really premature because the GFP_NOFS allocation context
is very restricted, especially under fs metadata heavy loads. Before we
go with a more sophisticated solution, let's simply imitate the previous
behavior of non-failing NOFS allocations and use __GFP_NOFAIL for the
buddy block allocator. With this patch applied I was no longer able to
trigger the issue.
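
As a rough userspace illustration of the difference between a failable
and a non-failing allocation request, the sketch below may help. It is
not kernel code: every sketch_* function and SKETCH_* flag is a made-up
name for this example, and the "memory pressure" is simulated with a
counter rather than a real reclaim path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag bits; the names only echo the kernel's GFP flags. */
#define SKETCH_NOFS   0x1u	/* caller must not recurse into the fs */
#define SKETCH_NOFAIL 0x2u	/* caller cannot handle failure: keep retrying */

/* Number of upcoming attempts that fail, to simulate memory pressure. */
static unsigned int sketch_failures_left;
/* Total attempts made, so the retries can be observed. */
static unsigned int sketch_attempts;

/* One allocation attempt; fails while the simulated pressure lasts. */
static void *sketch_try_alloc(size_t size)
{
	sketch_attempts++;
	if (sketch_failures_left > 0) {
		sketch_failures_left--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Without SKETCH_NOFAIL a failed attempt is reported to the caller as
 * NULL. With SKETCH_NOFAIL the allocator retries until it succeeds, so
 * the caller never sees NULL.
 */
static void *sketch_alloc(size_t size, unsigned int flags)
{
	for (;;) {
		void *p = sketch_try_alloc(size);

		if (p || !(flags & SKETCH_NOFAIL))
			return p;
		/* A real allocator would reclaim or wait before retrying. */
	}
}

/* Stand-in for the buddy page lookup: returns 0 or -ENOMEM. */
static int sketch_load_buddy(unsigned int extra_flags, void **out)
{
	void *page;

	assert(out != NULL);
	page = sketch_alloc(4096, SKETCH_NOFS | extra_flags);
	if (!page)
		return -ENOMEM;	/* the error that ends up aborting the journal */
	*out = page;
	return 0;
}
```

Calling sketch_load_buddy(0, &page) under simulated pressure returns
-ENOMEM after a single attempt, while passing SKETCH_NOFAIL makes the
same call retry until the allocation succeeds. The trade-off is that the
non-failing variant can block for as long as the pressure lasts.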

Signed-off-by: Michal Hocko mho...@suse.com
---
 fs/ext4/mballoc.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5b1613a54307..e6361622bfd5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -992,7 +992,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 	block = group * 2;
 	pnum = block / blocks_per_page;
 	poff = block % blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	if (!page)
 		return -ENOMEM;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -1006,7 +1007,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 
 	block++;
 	pnum = block / blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	if (!page)
 		return -ENOMEM;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -1158,7 +1160,8 @@ ext4_mb_load_buddy(struct super_block *sb, 

Re: [RFC 5/8] ext4: Do not fail journal due to block allocator

2015-08-05 Thread Jan Kara
On Wed 05-08-15 11:51:21, mho...@kernel.org wrote:
 
The patch looks good. You can add:

Reviewed-by: Jan Kara j...@suse.com

Honza

 Signed-off-by: Michal Hocko mho...@suse.com
 ---
  fs/ext4/mballoc.c | 12 ++++++++----
  1 file changed, 8 insertions(+), 4 deletions(-)
 
 diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
 index 5b1613a54307..e6361622bfd5 100644
 --- a/fs/ext4/mballoc.c
 +++ b/fs/ext4/mballoc.c
 @@ -992,7 +992,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
   block = group * 2;
   pnum = block / blocks_per_page;
   poff = block % blocks_per_page;
 - page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
 + page = find_or_create_page(inode->i_mapping, pnum,
 +GFP_NOFS|__GFP_NOFAIL);
   if (!page)
   return -ENOMEM;
   BUG_ON(page->mapping != inode->i_mapping);
 @@ -1006,7 +1007,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
  
   block++;
   pnum = block / blocks_per_page;
 - page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
 + page =