Re: [PATCH] kernel/module.c: Free lock-classes if parse_args failed
22.01.2015 3:40, Rusty Russell пишет: Andrey Tsyvarev writes: 21.01.2015 4:40, Rusty Russell пишет: Andrey Tsyvarev writes: 20.01.2015 9:37, Rusty Russell пишет: Andrey Tsyvarev writes: parse_args call module parameters' .set handlers, which may use locks defined in the module. So, these classes should be freed in case parse_args returns error(e.g. due to incorrect parameter passed). Thanks, this seems right. Applied. But this makes me ask: where is lockdep_free_key_range() called on the module init code? It doesn't seem to be at all... As I understand, locks are not allowed to be defined in the module init section. So, no needs to call lockdep_free_key_range() for it. This has a sense: objects from that section are allowed to be used only by module->init() function. But a single function call doesn't require any synchronization wrt itself. I don't know that we have any __initdata locks; it would be really weird. But change 'static DEFINE_MUTEX(mutex_param);' to 'static __initdata DEFINE_MUTEX(mutex_param);' to test. Compiler warns about sections mismatch, but the test works. According to lockdep_free_key_range() code, lock class is cleared not only according to its key(which is equal to lock address in the case of static lock) but also according to its name. What happens if you later register another lock at that address, since the memory is freed? Do you mean that scenario: 1) mutex1 is placed in module1 .init.data section, 2) after module1 is initialized, .init.data section is freed, 3) same memory is reused for module2 .data section, 4) mutex2 is placed in module2 .data section at the same address, as mutex1 was? It seems, mutex2 will share lock class with mutex1. That is, lockdep will confused: [kernel/locking/lockdep.c] 707 if (class->key == key) { 708 /* 709 * Huh! same key, different name? Did someone trample 710 * on some memory? We're most confused. 711 */ 712 WARN_ON_ONCE(class->name != lock->name); 713 return class; Things will go worse, when 5) module1 is exited, and lock class for mutex1 will be cleared because mutex2 will cache lock class which actually does not exist. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel/module.c: Free lock-classes if parse_args failed
22.01.2015 3:40, Rusty Russell пишет: Andrey Tsyvarev tsyva...@ispras.ru writes: 21.01.2015 4:40, Rusty Russell пишет: Andrey Tsyvarev tsyva...@ispras.ru writes: 20.01.2015 9:37, Rusty Russell пишет: Andrey Tsyvarev tsyva...@ispras.ru writes: parse_args call module parameters' .set handlers, which may use locks defined in the module. So, these classes should be freed in case parse_args returns error(e.g. due to incorrect parameter passed). Thanks, this seems right. Applied. But this makes me ask: where is lockdep_free_key_range() called on the module init code? It doesn't seem to be at all... As I understand, locks are not allowed to be defined in the module init section. So, no needs to call lockdep_free_key_range() for it. This has a sense: objects from that section are allowed to be used only by module-init() function. But a single function call doesn't require any synchronization wrt itself. I don't know that we have any __initdata locks; it would be really weird. But change 'static DEFINE_MUTEX(mutex_param);' to 'static __initdata DEFINE_MUTEX(mutex_param);' to test. Compiler warns about sections mismatch, but the test works. According to lockdep_free_key_range() code, lock class is cleared not only according to its key(which is equal to lock address in the case of static lock) but also according to its name. What happens if you later register another lock at that address, since the memory is freed? Do you mean that scenario: 1) mutex1 is placed in module1 .init.data section, 2) after module1 is initialized, .init.data section is freed, 3) same memory is reused for module2 .data section, 4) mutex2 is placed in module2 .data section at the same address, as mutex1 was? It seems, mutex2 will share lock class with mutex1. That is, lockdep will confused: [kernel/locking/lockdep.c] 707 if (class-key == key) { 708 /* 709 * Huh! same key, different name? Did someone trample 710 * on some memory? We're most confused. 711 */ 712 WARN_ON_ONCE(class-name != lock-name); 713 return class; Things will go worse, when 5) module1 is exited, and lock class for mutex1 will be cleared because mutex2 will cache lock class which actually does not exist. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel/module.c: Free lock-classes if parse_args failed
21.01.2015 4:40, Rusty Russell пишет: Andrey Tsyvarev writes: 20.01.2015 9:37, Rusty Russell пишет: Andrey Tsyvarev writes: parse_args call module parameters' .set handlers, which may use locks defined in the module. So, these classes should be freed in case parse_args returns error(e.g. due to incorrect parameter passed). Thanks, this seems right. Applied. But this makes me ask: where is lockdep_free_key_range() called on the module init code? It doesn't seem to be at all... As I understand, locks are not allowed to be defined in the module init section. So, no needs to call lockdep_free_key_range() for it. This has a sense: objects from that section are allowed to be used only by module->init() function. But a single function call doesn't require any synchronization wrt itself. I don't know that we have any __initdata locks; it would be really weird. But change 'static DEFINE_MUTEX(mutex_param);' to 'static __initdata DEFINE_MUTEX(mutex_param);' to test. Compiler warns about sections mismatch, but the test works. According to lockdep_free_key_range() code, lock class is cleared not only according to its key(which is equal to lock address in the case of static lock) but also according to its name. Lock class name are assigned from the name of lock itself, which is initialized using .name = #lockname construction inside a macro. While "__initdata" force mutex structure to be placed inside .init.data section, string for the .name field is placed in its normal section. That's why lock class for the "__initdata" mutex is cleared when lockdep_free_key_range() is called for "core" module section. I remember that message about lockdep crash(on unmodified kernel and with original test) was also be concerned with the lock class name: [ 184.903688] BUG: unable to handle kernel paging request at a00110be [ 184.903697] IP: [] strcmp+0x18/0x40 [ 184.903705] PGD 1e15067 PUD 1e16063 PMD 3793e067 PTE 0 [ 184.903711] Oops: [#1] SMP [ 184.903715] Modules linked in: test(O+) iptable_nat nf_nat_ipv4 nf_nat [ 184.903724] CPU: 0 PID: 5072 Comm: insmod Tainted: G O 3.19.0-rc4-newest+ #3 [ 184.903727] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 184.903730] task: 8800130222c0 ti: 880013fc8000 task.ti: 880013fc8000 [ 184.903732] RIP: 0010:[] [] strcmp+0x18/0x40 [ 184.903737] RSP: 0018:880013fcbbc8 EFLAGS: 00010082 [ 184.903739] RAX: RBX: 82566530 RCX: [ 184.903741] RDX: a0015168 RSI: a001509d RDI: a00110bf [ 184.903743] RBP: 880013fcbbc8 R08: R09: [ 184.903745] R10: fec17ffa5a615168 R11: 825666f0 R12: [ 184.903747] R13: a001509d R14: 825666e0 R15: fec0 [ 184.903750] FS: 7f250e13e740() GS:88003fc0() knlGS: [ 184.903753] CS: 0010 DS: ES: CR0: 8005003b [ 184.903755] CR2: a00110be CR3: 24188000 CR4: 06f0 [ 184.903763] Stack: [ 184.903765] 880013fcbbf8 81893958 8800130222c0 825666e0 [ 184.903769] a0015168 880013fcbc78 8108e539 [ 184.903773] 13d3629d 8800 [ 184.903778] Call Trace: [ 184.903786] [] count_matching_names+0x61/0x8e [ 184.903793] [] __lock_acquire.isra.30+0x8f9/0xa10 [ 184.903799] [] ? ioread32+0x2f/0x50 [ 184.903803] [] lock_acquire+0xaa/0x130 [ 184.903808] [] ? minit+0x15/0x28 [test] [ 184.903813] [] mutex_lock_nested+0x46/0x340 [ 184.903816] [] ? minit+0x15/0x28 [test] [ 184.903822] [] ? do_one_initcall+0x78/0x1c0 [ 184.903826] [] ? param_set+0x34/0x34 [test] [ 184.903830] [] minit+0x15/0x28 [test] [ 184.903835] [] do_one_initcall+0x84/0x1c0 [ 184.903840] [] ? __vunmap+0xc2/0x110 [ 184.903845] [] load_module+0x1cfe/0x2280 [ 184.903849] [] ? m_show+0x1a0/0x1a0 [ 184.903855] [] ? __audit_syscall_entry+0xac/0x100 [ 184.903859] [] SyS_finit_module+0x76/0x80 [ 184.903864] [] system_call_fastpath+0x12/0x17 [ 184.903866] Code: c9 88 4a ff 75 ed 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 eb 0e 66 2e 0f 1f 84 00 00 00 00 00 84 c0 74 1c 48 83 c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 0f [ 184.903912] RIP [] strcmp+0x18/0x40 [ 184.903916] RSP [ 184.903918] CR2: a00110be Cheers, Rusty. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel/module.c: Free lock-classes if parse_args failed
21.01.2015 4:40, Rusty Russell пишет: Andrey Tsyvarev tsyva...@ispras.ru writes: 20.01.2015 9:37, Rusty Russell пишет: Andrey Tsyvarev tsyva...@ispras.ru writes: parse_args call module parameters' .set handlers, which may use locks defined in the module. So, these classes should be freed in case parse_args returns error(e.g. due to incorrect parameter passed). Thanks, this seems right. Applied. But this makes me ask: where is lockdep_free_key_range() called on the module init code? It doesn't seem to be at all... As I understand, locks are not allowed to be defined in the module init section. So, no needs to call lockdep_free_key_range() for it. This has a sense: objects from that section are allowed to be used only by module-init() function. But a single function call doesn't require any synchronization wrt itself. I don't know that we have any __initdata locks; it would be really weird. But change 'static DEFINE_MUTEX(mutex_param);' to 'static __initdata DEFINE_MUTEX(mutex_param);' to test. Compiler warns about sections mismatch, but the test works. According to lockdep_free_key_range() code, lock class is cleared not only according to its key(which is equal to lock address in the case of static lock) but also according to its name. Lock class name are assigned from the name of lock itself, which is initialized using .name = #lockname construction inside a macro. While __initdata force mutex structure to be placed inside .init.data section, string for the .name field is placed in its normal section. That's why lock class for the __initdata mutex is cleared when lockdep_free_key_range() is called for core module section. I remember that message about lockdep crash(on unmodified kernel and with original test) was also be concerned with the lock class name: [ 184.903688] BUG: unable to handle kernel paging request at a00110be [ 184.903697] IP: [81321b48] strcmp+0x18/0x40 [ 184.903705] PGD 1e15067 PUD 1e16063 PMD 3793e067 PTE 0 [ 184.903711] Oops: [#1] SMP [ 184.903715] Modules linked in: test(O+) iptable_nat nf_nat_ipv4 nf_nat [ 184.903724] CPU: 0 PID: 5072 Comm: insmod Tainted: G O 3.19.0-rc4-newest+ #3 [ 184.903727] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 184.903730] task: 8800130222c0 ti: 880013fc8000 task.ti: 880013fc8000 [ 184.903732] RIP: 0010:[81321b48] [81321b48] strcmp+0x18/0x40 [ 184.903737] RSP: 0018:880013fcbbc8 EFLAGS: 00010082 [ 184.903739] RAX: RBX: 82566530 RCX: [ 184.903741] RDX: a0015168 RSI: a001509d RDI: a00110bf [ 184.903743] RBP: 880013fcbbc8 R08: R09: [ 184.903745] R10: fec17ffa5a615168 R11: 825666f0 R12: [ 184.903747] R13: a001509d R14: 825666e0 R15: fec0 [ 184.903750] FS: 7f250e13e740() GS:88003fc0() knlGS: [ 184.903753] CS: 0010 DS: ES: CR0: 8005003b [ 184.903755] CR2: a00110be CR3: 24188000 CR4: 06f0 [ 184.903763] Stack: [ 184.903765] 880013fcbbf8 81893958 8800130222c0 825666e0 [ 184.903769] a0015168 880013fcbc78 8108e539 [ 184.903773] 13d3629d 8800 [ 184.903778] Call Trace: [ 184.903786] [81893958] count_matching_names+0x61/0x8e [ 184.903793] [8108e539] __lock_acquire.isra.30+0x8f9/0xa10 [ 184.903799] [8132] ? ioread32+0x2f/0x50 [ 184.903803] [8108edba] lock_acquire+0xaa/0x130 [ 184.903808] [a0015049] ? minit+0x15/0x28 [test] [ 184.903813] [8189f236] mutex_lock_nested+0x46/0x340 [ 184.903816] [a0015049] ? minit+0x15/0x28 [test] [ 184.903822] [810002b8] ? do_one_initcall+0x78/0x1c0 [ 184.903826] [a0015034] ? param_set+0x34/0x34 [test] [ 184.903830] [a0015049] minit+0x15/0x28 [test] [ 184.903835] [810002c4] do_one_initcall+0x84/0x1c0 [ 184.903840] [81156fc2] ? __vunmap+0xc2/0x110 [ 184.903845] [810c76de] load_module+0x1cfe/0x2280 [ 184.903849] [810c41c0] ? m_show+0x1a0/0x1a0 [ 184.903855] [810e2f3c] ? __audit_syscall_entry+0xac/0x100 [ 184.903859] [810c7d96] SyS_finit_module+0x76/0x80 [ 184.903864] [818a2b69] system_call_fastpath+0x12/0x17 [ 184.903866] Code: c9 88 4a ff 75 ed 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 eb 0e 66 2e 0f 1f 84 00 00 00 00 00 84 c0 74 1c 48 83 c7 01 0f b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 0f [ 184.903912] RIP [81321b48] strcmp+0x18/0x40 [ 184.903916] RSP 880013fcbbc8 [ 184.903918] CR2: a00110be Cheers, Rusty. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http
Re: [PATCH] kernel/module.c: Free lock-classes if parse_args failed
20.01.2015 9:37, Rusty Russell пишет: Andrey Tsyvarev writes: parse_args call module parameters' .set handlers, which may use locks defined in the module. So, these classes should be freed in case parse_args returns error(e.g. due to incorrect parameter passed). Thanks, this seems right. Applied. But this makes me ask: where is lockdep_free_key_range() called on the module init code? It doesn't seem to be at all... As I understand, locks are not allowed to be defined in the module init section. So, no needs to call lockdep_free_key_range() for it. This has a sense: objects from that section are allowed to be used only by module->init() function. But a single function call doesn't require any synchronization wrt itself. If it helps, I used module below for verify effect of the patch: test.c: --- #include #include MODULE_AUTHOR("Tsyvarev Andrey"); MODULE_LICENSE("GPL"); static DEFINE_MUTEX(mutex_main); static DEFINE_MUTEX(mutex_param); static int param_set(const char* val, const struct kernel_param* kp) { mutex_lock(_param); //Use mutex defined in the module mutex_unlock(_param); return -EINVAL; // Setting this parameter is always invalid } static struct kernel_param_ops param_ops = { .set = param_set }; module_param_cb(param, _ops, NULL, S_IRUGO); static int minit(void) { mutex_lock(_main); // Use another mutex mutex_unlock(_main); return 0; } static void mexit(void) { } module_init(minit); module_exit(mexit); -- insmod test.ko param=1 # Will fail insmod test.ko If kernel is compiled with CONFIG_LOCKDEP, then, without patch, lockdep crashes on the second module insertion. Confused, Rusty. Signed-off-by: Andrey Tsyvarev --- kernel/module.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/module.c b/kernel/module.c index 3965511..2b44de4 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -3311,6 +3311,9 @@ static int load_module(struct load_info *info, const char __user *uargs, module_bug_cleanup(mod); mutex_unlock(_mutex); + /* Free lock-classes: */ + lockdep_free_key_range(mod->module_core, mod->core_size); + /* we can't deallocate the module until we clear memory protection */ unset_module_init_ro_nx(mod); unset_module_core_ro_nx(mod); -- 1.8.3.1 -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel/module.c: Free lock-classes if parse_args failed
20.01.2015 9:37, Rusty Russell пишет: Andrey Tsyvarev tsyva...@ispras.ru writes: parse_args call module parameters' .set handlers, which may use locks defined in the module. So, these classes should be freed in case parse_args returns error(e.g. due to incorrect parameter passed). Thanks, this seems right. Applied. But this makes me ask: where is lockdep_free_key_range() called on the module init code? It doesn't seem to be at all... As I understand, locks are not allowed to be defined in the module init section. So, no needs to call lockdep_free_key_range() for it. This has a sense: objects from that section are allowed to be used only by module-init() function. But a single function call doesn't require any synchronization wrt itself. If it helps, I used module below for verify effect of the patch: test.c: --- #include linux/module.h #include linux/moduleparam.h MODULE_AUTHOR(Tsyvarev Andrey); MODULE_LICENSE(GPL); static DEFINE_MUTEX(mutex_main); static DEFINE_MUTEX(mutex_param); static int param_set(const char* val, const struct kernel_param* kp) { mutex_lock(mutex_param); //Use mutex defined in the module mutex_unlock(mutex_param); return -EINVAL; // Setting this parameter is always invalid } static struct kernel_param_ops param_ops = { .set = param_set }; module_param_cb(param, param_ops, NULL, S_IRUGO); static int minit(void) { mutex_lock(mutex_main); // Use another mutex mutex_unlock(mutex_main); return 0; } static void mexit(void) { } module_init(minit); module_exit(mexit); -- insmod test.ko param=1 # Will fail insmod test.ko If kernel is compiled with CONFIG_LOCKDEP, then, without patch, lockdep crashes on the second module insertion. Confused, Rusty. Signed-off-by: Andrey Tsyvarev tsyva...@ispras.ru --- kernel/module.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/module.c b/kernel/module.c index 3965511..2b44de4 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -3311,6 +3311,9 @@ static int load_module(struct load_info *info, const char __user *uargs, module_bug_cleanup(mod); mutex_unlock(module_mutex); + /* Free lock-classes: */ + lockdep_free_key_range(mod-module_core, mod-core_size); + /* we can't deallocate the module until we clear memory protection */ unset_module_init_ro_nx(mod); unset_module_core_ro_nx(mod); -- 1.8.3.1 -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] kernel/module.c: Free lock-classes if parse_args failed
parse_args call module parameters' .set handlers, which may use locks defined in the module. So, these classes should be freed in case parse_args returns error(e.g. due to incorrect parameter passed). Signed-off-by: Andrey Tsyvarev --- kernel/module.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/module.c b/kernel/module.c index 3965511..2b44de4 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -3311,6 +3311,9 @@ static int load_module(struct load_info *info, const char __user *uargs, module_bug_cleanup(mod); mutex_unlock(_mutex); + /* Free lock-classes: */ + lockdep_free_key_range(mod->module_core, mod->core_size); + /* we can't deallocate the module until we clear memory protection */ unset_module_init_ro_nx(mod); unset_module_core_ro_nx(mod); -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] kernel/module.c: Free lock-classes if parse_args failed
parse_args call module parameters' .set handlers, which may use locks defined in the module. So, these classes should be freed in case parse_args returns error(e.g. due to incorrect parameter passed). Signed-off-by: Andrey Tsyvarev tsyva...@ispras.ru --- kernel/module.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/module.c b/kernel/module.c index 3965511..2b44de4 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -3311,6 +3311,9 @@ static int load_module(struct load_info *info, const char __user *uargs, module_bug_cleanup(mod); mutex_unlock(module_mutex); + /* Free lock-classes: */ + lockdep_free_key_range(mod-module_core, mod-core_size); + /* we can't deallocate the module until we clear memory protection */ unset_module_init_ro_nx(mod); unset_module_core_ro_nx(mod); -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem
Hi, With patch skipping invalidating pages for node_inode and meta_inode use-after-free error disappears too. 23.07.2014 7:39, Gu Zheng пишет: Hi, On 07/23/2014 10:12 AM, Chao Yu wrote: Hi Andrey Gu, -Original Message- From: Andrey Tsyvarev [mailto:tsyva...@ispras.ru] Sent: Tuesday, July 22, 2014 6:04 PM To: Gu Zheng Cc: Jaegeuk Kim; linux-kernel; Alexey Khoroshilov; linux-f2fs-de...@lists.sourceforge.net Subject: Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem Hi Gu, Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'. But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput(). It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free(). But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formely 'node_inode' of the first filesystem is returned. The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde and meta_inode? diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 870fe19..e114418 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb) if (sbi->s_dirty && get_pages(sbi, F2FS_DIRTY_NODES)) write_checkpoint(sbi, true); - iput(sbi->node_inode); iput(sbi->meta_inode); + iput(sbi->node_inode); /* destroy f2fs internal modules */ destroy_node_manager(sbi); Thanks, Gu With reclaim order of node_inode and meta_inode swapped, use-after-free error disappears. But shouldn't initialization order of these inodes be swapped too? As meta_inode uses node_inode, it seems logical that it should be initialized after it. The initialization order dose not affect anything, so swapping the order dose not make more sense here. IMO, it's not easy to exchange order of initialization between meta_inode and node_inode, because we should use meta_inode in get_valid_checkpoint for valid cp first for usual verification, then init node_inode. Yeah, but I think just moving node_inode's initialization to the front of meta_inode dose not break anything. As I checked, nids for both meta_inode and node_inode are reservation, so it's not necessary for us to invalidate pages which will never alloced. How about skipping it as following? It seems the right way to fix this issue. To Andrey: Could you please try this one? Thanks, Gu diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 2cf6962..cafba3c 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -273,7 +273,7 @@ void f2fs_evict_inode(struct inode *inode) if (inode->i_ino == F2FS_NODE_INO(sbi) || inode->i_ino == F2FS_META_INO(sbi)) - goto no_delete; + goto out_clear; f2fs_bug_on(get_dirty_dents(inode)); remove_dirty_dir_inode(inode); @@ -295,6 +295,7 @@ void f2fs_evict_inode(struct inode *inode) sb_end_intwrite(inode->i_sb); no_delete: - clear_inode(inode); invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino, inode->i_ino); +out_clear: + clear_inode(inode); } -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- Want fast and easy access to all the code in your enterprise? Index and search up to 200,000 lines of code with a free copy of Black Duck Code Sight - the same software that powers the world's largest code search on Ohloh, the Black Duck Open Hub! Try it now. http://p.sf.net/sfu/bds ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel . -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem
Hi, With patch skipping invalidating pages for node_inode and meta_inode use-after-free error disappears too. 23.07.2014 7:39, Gu Zheng пишет: Hi, On 07/23/2014 10:12 AM, Chao Yu wrote: Hi Andrey Gu, -Original Message- From: Andrey Tsyvarev [mailto:tsyva...@ispras.ru] Sent: Tuesday, July 22, 2014 6:04 PM To: Gu Zheng Cc: Jaegeuk Kim; linux-kernel; Alexey Khoroshilov; linux-f2fs-de...@lists.sourceforge.net Subject: Re: [f2fs-dev] f2fs: Possible use-after-free when umount filesystem Hi Gu, Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'. But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput(). It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free(). But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formely 'node_inode' of the first filesystem is returned. The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde and meta_inode? diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 870fe19..e114418 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb) if (sbi-s_dirty get_pages(sbi, F2FS_DIRTY_NODES)) write_checkpoint(sbi, true); - iput(sbi-node_inode); iput(sbi-meta_inode); + iput(sbi-node_inode); /* destroy f2fs internal modules */ destroy_node_manager(sbi); Thanks, Gu With reclaim order of node_inode and meta_inode swapped, use-after-free error disappears. But shouldn't initialization order of these inodes be swapped too? As meta_inode uses node_inode, it seems logical that it should be initialized after it. The initialization order dose not affect anything, so swapping the order dose not make more sense here. IMO, it's not easy to exchange order of initialization between meta_inode and node_inode, because we should use meta_inode in get_valid_checkpoint for valid cp first for usual verification, then init node_inode. Yeah, but I think just moving node_inode's initialization to the front of meta_inode dose not break anything. As I checked, nids for both meta_inode and node_inode are reservation, so it's not necessary for us to invalidate pages which will never alloced. How about skipping it as following? It seems the right way to fix this issue. To Andrey: Could you please try this one? Thanks, Gu diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 2cf6962..cafba3c 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -273,7 +273,7 @@ void f2fs_evict_inode(struct inode *inode) if (inode-i_ino == F2FS_NODE_INO(sbi) || inode-i_ino == F2FS_META_INO(sbi)) - goto no_delete; + goto out_clear; f2fs_bug_on(get_dirty_dents(inode)); remove_dirty_dir_inode(inode); @@ -295,6 +295,7 @@ void f2fs_evict_inode(struct inode *inode) sb_end_intwrite(inode-i_sb); no_delete: - clear_inode(inode); invalidate_mapping_pages(NODE_MAPPING(sbi), inode-i_ino, inode-i_ino); +out_clear: + clear_inode(inode); } -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- Want fast and easy access to all the code in your enterprise? Index and search up to 200,000 lines of code with a free copy of Black Duck Code Sight - the same software that powers the world's largest code search on Ohloh, the Black Duck Open Hub! Try it now. http://p.sf.net/sfu/bds ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel . -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: f2fs: Possible use-after-free when umount filesystem
Hi Gu, Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'. But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput(). It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free(). But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formely 'node_inode' of the first filesystem is returned. The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde and meta_inode? diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 870fe19..e114418 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb) if (sbi->s_dirty && get_pages(sbi, F2FS_DIRTY_NODES)) write_checkpoint(sbi, true); - iput(sbi->node_inode); iput(sbi->meta_inode); + iput(sbi->node_inode); /* destroy f2fs internal modules */ destroy_node_manager(sbi); Thanks, Gu With reclaim order of node_inode and meta_inode swapped, use-after-free error disappears. But shouldn't initialization order of these inodes be swapped too? As meta_inode uses node_inode, it seems logical that it should be initialized after it. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: f2fs: Possible use-after-free when umount filesystem
Hi Gu, Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'. But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput(). It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free(). But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formely 'node_inode' of the first filesystem is returned. The analysis seems reasonable. Have you tried to swap the reclaim order of node_inde and meta_inode? diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 870fe19..e114418 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -430,8 +430,8 @@ static void f2fs_put_super(struct super_block *sb) if (sbi-s_dirty get_pages(sbi, F2FS_DIRTY_NODES)) write_checkpoint(sbi, true); - iput(sbi-node_inode); iput(sbi-meta_inode); + iput(sbi-node_inode); /* destroy f2fs internal modules */ destroy_node_manager(sbi); Thanks, Gu With reclaim order of node_inode and meta_inode swapped, use-after-free error disappears. But shouldn't initialization order of these inodes be swapped too? As meta_inode uses node_inode, it seems logical that it should be initialized after it. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
f2fs: Possible use-after-free when umount filesystem
that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'. But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput(). It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free(). But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formely 'node_inode' of the first filesystem is returned. Found by Linux File System Verification project (linuxtesting.org). -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
f2fs: Possible use-after-free when umount filesystem
^ 880058786700: 880058786800: 880058786900: 880058786a00: 880058786b00: Legend: f - 8 freed bytes r - 8 redzone bytes . - 8 allocated bytes x=1..7 - x allocated bytes + (8-x) redzone bytes Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'. But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput(). It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free(). But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formely 'node_inode' of the first filesystem is returned. Found by Linux File System Verification project (linuxtesting.org). -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ext4: Do not destroy ext4_groupinfo_caches if ext4_mb_init() fails
12.05.2014 19:08, Lukáš Czerner пишет: On Mon, 12 May 2014, Andrey Tsyvarev wrote: Date: Mon, 12 May 2014 12:23:59 +0400 From: Andrey Tsyvarev To: Theodore Ts'o Cc: Andrey Tsyvarev, Andreas Dilger,linux-e...@vger.kernel.org, linux-kernel@vger.kernel.org, Alexey Khoroshilov Subject: [PATCH] ext4: Do not destroy ext4_groupinfo_caches if ext4_mb_init() fails Caches from 'ext4_groupinfo_caches' may be in use by other mounts, which have already existed. So, it is incorrect to destroy them when newly requested mount fails. Found by Linux File System Verification project (linuxtesting.org). Makes sense, thanks! Can you please share the test case which triggered this ? It might be worth including in xfstests. Actually it was triggered by xfstests themselves but run with fault simulation. The method of fault simulation is under development/evaluation now, we expect to publish a paper describing it in the near future. BUG_ON() in get_groupinfo_cache() was firstly triggered by test generic/003, but actually it could be any other test, which uses a scratch device: xftests itself requires test device(TEST_DEV) mounted, so a fault simulated while mount scratch device causes the problem described. Reviewed-by: Lukas Czerner Signed-off-by: Andrey Tsyvarev --- fs/ext4/mballoc.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 04a5c75..becea1d 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2607,7 +2607,7 @@ int ext4_mb_init(struct super_block *sb) sbi->s_locality_groups = alloc_percpu(struct ext4_locality_group); if (sbi->s_locality_groups == NULL) { ret = -ENOMEM; - goto out_free_groupinfo_slab; + goto out; } for_each_possible_cpu(i) { struct ext4_locality_group *lg; @@ -2632,8 +2632,6 @@ int ext4_mb_init(struct super_block *sb) out_free_locality_groups: free_percpu(sbi->s_locality_groups); sbi->s_locality_groups = NULL; -out_free_groupinfo_slab: - ext4_groupinfo_destroy_slabs(); out: kfree(sbi->s_mb_offsets); sbi->s_mb_offsets = NULL; -- Andrey Tsyvarev Linux Verification Center, ISPRAS -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ext4: Do not destroy ext4_groupinfo_caches if ext4_mb_init() fails
12.05.2014 19:08, Lukáš Czerner пишет: On Mon, 12 May 2014, Andrey Tsyvarev wrote: Date: Mon, 12 May 2014 12:23:59 +0400 From: Andrey Tsyvarevtsyva...@ispras.ru To: Theodore Ts'oty...@mit.edu Cc: Andrey Tsyvarevtsyva...@ispras.ru, Andreas Dilgeradilger.ker...@dilger.ca,linux-e...@vger.kernel.org, linux-kernel@vger.kernel.org, Alexey Khoroshilovkhoroshi...@ispras.ru Subject: [PATCH] ext4: Do not destroy ext4_groupinfo_caches if ext4_mb_init() fails Caches from 'ext4_groupinfo_caches' may be in use by other mounts, which have already existed. So, it is incorrect to destroy them when newly requested mount fails. Found by Linux File System Verification project (linuxtesting.org). Makes sense, thanks! Can you please share the test case which triggered this ? It might be worth including in xfstests. Actually it was triggered by xfstests themselves but run with fault simulation. The method of fault simulation is under development/evaluation now, we expect to publish a paper describing it in the near future. BUG_ON() in get_groupinfo_cache() was firstly triggered by test generic/003, but actually it could be any other test, which uses a scratch device: xftests itself requires test device(TEST_DEV) mounted, so a fault simulated while mount scratch device causes the problem described. Reviewed-by: Lukas Czernerlczer...@redhat.com Signed-off-by: Andrey Tsyvarevtsyva...@ispras.ru --- fs/ext4/mballoc.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 04a5c75..becea1d 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2607,7 +2607,7 @@ int ext4_mb_init(struct super_block *sb) sbi-s_locality_groups = alloc_percpu(struct ext4_locality_group); if (sbi-s_locality_groups == NULL) { ret = -ENOMEM; - goto out_free_groupinfo_slab; + goto out; } for_each_possible_cpu(i) { struct ext4_locality_group *lg; @@ -2632,8 +2632,6 @@ int ext4_mb_init(struct super_block *sb) out_free_locality_groups: free_percpu(sbi-s_locality_groups); sbi-s_locality_groups = NULL; -out_free_groupinfo_slab: - ext4_groupinfo_destroy_slabs(); out: kfree(sbi-s_mb_offsets); sbi-s_mb_offsets = NULL; -- Andrey Tsyvarev Linux Verification Center, ISPRAS -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ext4: Do not destroy ext4_groupinfo_caches if ext4_mb_init() fails
Caches from 'ext4_groupinfo_caches' may be in use by other mounts, which have already existed. So, it is incorrect to destroy them when newly requested mount fails. Found by Linux File System Verification project (linuxtesting.org). Signed-off-by: Andrey Tsyvarev --- fs/ext4/mballoc.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 04a5c75..becea1d 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2607,7 +2607,7 @@ int ext4_mb_init(struct super_block *sb) sbi->s_locality_groups = alloc_percpu(struct ext4_locality_group); if (sbi->s_locality_groups == NULL) { ret = -ENOMEM; - goto out_free_groupinfo_slab; + goto out; } for_each_possible_cpu(i) { struct ext4_locality_group *lg; @@ -2632,8 +2632,6 @@ int ext4_mb_init(struct super_block *sb) out_free_locality_groups: free_percpu(sbi->s_locality_groups); sbi->s_locality_groups = NULL; -out_free_groupinfo_slab: - ext4_groupinfo_destroy_slabs(); out: kfree(sbi->s_mb_offsets); sbi->s_mb_offsets = NULL; -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ext4: Do not destroy ext4_groupinfo_caches if ext4_mb_init() fails
Caches from 'ext4_groupinfo_caches' may be in use by other mounts, which have already existed. So, it is incorrect to destroy them when newly requested mount fails. Found by Linux File System Verification project (linuxtesting.org). Signed-off-by: Andrey Tsyvarev tsyva...@ispras.ru --- fs/ext4/mballoc.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 04a5c75..becea1d 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2607,7 +2607,7 @@ int ext4_mb_init(struct super_block *sb) sbi-s_locality_groups = alloc_percpu(struct ext4_locality_group); if (sbi-s_locality_groups == NULL) { ret = -ENOMEM; - goto out_free_groupinfo_slab; + goto out; } for_each_possible_cpu(i) { struct ext4_locality_group *lg; @@ -2632,8 +2632,6 @@ int ext4_mb_init(struct super_block *sb) out_free_locality_groups: free_percpu(sbi-s_locality_groups); sbi-s_locality_groups = NULL; -out_free_groupinfo_slab: - ext4_groupinfo_destroy_slabs(); out: kfree(sbi-s_mb_offsets); sbi-s_mb_offsets = NULL; -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: f2fs: BUG_ON() is triggered when mount valid f2fs filesystem
Hi, With this patch mounting of the image continues to fail (with similar BUG_ON). But when image is formatted again (and steps mentioned in the previous message are performed), mounting of it is now succeed. Is this is a true purpose of the patch? 15.04.2014 15:04, Jaegeuk Kim пишет: Hi, Thank you for the report. I retrieved the fault image and found out that previous garbage data wreak such the wrong behaviors. So, I wrote the following patch that fills one zero-block at the checkpoint procedure. If the underlying device supports discard, I expect that it mostly doesn't incur any performance regression significantly. Could you test this patch? >From 60588ceb7277aae2a79e7f67f5217d1256720d78 Mon Sep 17 00:00:00 2001 From: Jaegeuk Kim Date: Tue, 15 Apr 2014 13:57:55 +0900 Subject: [PATCH] f2fs: avoid to conduct roll-forward due to the remained garbage blocks The f2fs always scans the next chain of direct node blocks. But some garbage blocks are able to be remained due to no discard support or SSR triggers. This occasionally wreaks recovering wrong inodes that were used or BUG_ONs due to reallocating node ids as follows. When mount this f2fs image: http://linuxtesting.org/downloads/f2fs_fault_image.zip BUG_ON is triggered in f2fs driver (messages below are generated on kernel 3.13.2; for other kernels output is similar): kernel BUG at fs/f2fs/node.c:215! Call Trace: [] recover_inode_page+0x1fd/0x3e0 [f2fs] [] ? __lock_page+0x67/0x70 [] ? autoremove_wake_function+0x50/0x50 [] recover_fsync_data+0x1398/0x15d0 [f2fs] [] ? selinux_d_instantiate+0x1c/0x20 [] ? d_instantiate+0x5b/0x80 [] f2fs_fill_super+0xb04/0xbf0 [f2fs] [] ? mount_bdev+0x7e/0x210 [] mount_bdev+0x1c9/0x210 [] ? validate_superblock+0x210/0x210 [f2fs] [] f2fs_mount+0x1d/0x30 [f2fs] [] mount_fs+0x47/0x1c0 [] ? __alloc_percpu+0x10/0x20 [] vfs_kern_mount+0x72/0x110 [] do_mount+0x493/0x910 [] ? strndup_user+0x5b/0x80 [] SyS_mount+0x90/0xe0 [] system_call_fastpath+0x16/0x1b Found by Linux File System Verification project (linuxtesting.org). Reported-by: Andrey Tsyvarev Signed-off-by: Jaegeuk Kim --- fs/f2fs/checkpoint.c | 6 ++ fs/f2fs/f2fs.h | 1 + fs/f2fs/segment.c| 17 +++-- 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 4aa521a..890e23d 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -762,6 +762,12 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount) void *kaddr; int i; + /* +* This avoids to conduct wrong roll-forward operations and uses +* metapages, so should be called prior to sync_meta_pages below. +*/ + discard_next_dnode(sbi); + /* Flush all the NAT/SIT pages */ while (get_pages(sbi, F2FS_DIRTY_META)) sync_meta_pages(sbi, META, LONG_MAX); diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 2ecac83..2c5a5da 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -1179,6 +1179,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *); void invalidate_blocks(struct f2fs_sb_info *, block_t); void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t); void clear_prefree_segments(struct f2fs_sb_info *); +void discard_next_dnode(struct f2fs_sb_info *); int npages_for_summary_flush(struct f2fs_sb_info *); void allocate_new_segments(struct f2fs_sb_info *); struct page *get_sum_page(struct f2fs_sb_info *, unsigned int); diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 1e264e7..9993f94 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -335,13 +335,26 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno) mutex_unlock(_i->seglist_lock); } -static void f2fs_issue_discard(struct f2fs_sb_info *sbi, +static int f2fs_issue_discard(struct f2fs_sb_info *sbi, block_t blkstart, block_t blklen) { sector_t start = SECTOR_FROM_BLOCK(sbi, blkstart); sector_t len = SECTOR_FROM_BLOCK(sbi, blklen); - blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0); trace_f2fs_issue_discard(sbi->sb, blkstart, blklen); + return blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0); +} + +void discard_next_dnode(struct f2fs_sb_info *sbi) +{ + struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE); + block_t blkaddr = NEXT_FREE_BLKADDR(sbi, curseg); + + if (f2fs_issue_discard(sbi, blkaddr, 1)) { + struct page *page = grab_meta_page(sbi, blkaddr); + /* zero-filled page */ + set_page_dirty(page); + f2fs_put_page(page, 1); + } } static void add_discard_addrs(struct f2fs_sb_info *sbi, -- Andrey Tsyvarev -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo
Re: f2fs: BUG_ON() is triggered when mount valid f2fs filesystem
Hi, With this patch mounting of the image continues to fail (with similar BUG_ON). But when image is formatted again (and steps mentioned in the previous message are performed), mounting of it is now succeed. Is this is a true purpose of the patch? 15.04.2014 15:04, Jaegeuk Kim пишет: Hi, Thank you for the report. I retrieved the fault image and found out that previous garbage data wreak such the wrong behaviors. So, I wrote the following patch that fills one zero-block at the checkpoint procedure. If the underlying device supports discard, I expect that it mostly doesn't incur any performance regression significantly. Could you test this patch? From 60588ceb7277aae2a79e7f67f5217d1256720d78 Mon Sep 17 00:00:00 2001 From: Jaegeuk Kim jaegeuk@samsung.com Date: Tue, 15 Apr 2014 13:57:55 +0900 Subject: [PATCH] f2fs: avoid to conduct roll-forward due to the remained garbage blocks The f2fs always scans the next chain of direct node blocks. But some garbage blocks are able to be remained due to no discard support or SSR triggers. This occasionally wreaks recovering wrong inodes that were used or BUG_ONs due to reallocating node ids as follows. When mount this f2fs image: http://linuxtesting.org/downloads/f2fs_fault_image.zip BUG_ON is triggered in f2fs driver (messages below are generated on kernel 3.13.2; for other kernels output is similar): kernel BUG at fs/f2fs/node.c:215! Call Trace: [a032ebad] recover_inode_page+0x1fd/0x3e0 [f2fs] [811446e7] ? __lock_page+0x67/0x70 [81089990] ? autoremove_wake_function+0x50/0x50 [a0337788] recover_fsync_data+0x1398/0x15d0 [f2fs] [812b9e5c] ? selinux_d_instantiate+0x1c/0x20 [811cb20b] ? d_instantiate+0x5b/0x80 [a0321044] f2fs_fill_super+0xb04/0xbf0 [f2fs] [811b861e] ? mount_bdev+0x7e/0x210 [811b8769] mount_bdev+0x1c9/0x210 [a0320540] ? validate_superblock+0x210/0x210 [f2fs] [a031cf8d] f2fs_mount+0x1d/0x30 [f2fs] [811b9497] mount_fs+0x47/0x1c0 [81166e00] ? __alloc_percpu+0x10/0x20 [811d4032] vfs_kern_mount+0x72/0x110 [811d6763] do_mount+0x493/0x910 [811615cb] ? strndup_user+0x5b/0x80 [811d6c70] SyS_mount+0x90/0xe0 [8166f8d9] system_call_fastpath+0x16/0x1b Found by Linux File System Verification project (linuxtesting.org). Reported-by: Andrey Tsyvarev tsyva...@ispras.ru Signed-off-by: Jaegeuk Kim jaegeuk@samsung.com --- fs/f2fs/checkpoint.c | 6 ++ fs/f2fs/f2fs.h | 1 + fs/f2fs/segment.c| 17 +++-- 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 4aa521a..890e23d 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -762,6 +762,12 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount) void *kaddr; int i; + /* +* This avoids to conduct wrong roll-forward operations and uses +* metapages, so should be called prior to sync_meta_pages below. +*/ + discard_next_dnode(sbi); + /* Flush all the NAT/SIT pages */ while (get_pages(sbi, F2FS_DIRTY_META)) sync_meta_pages(sbi, META, LONG_MAX); diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 2ecac83..2c5a5da 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -1179,6 +1179,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *); void invalidate_blocks(struct f2fs_sb_info *, block_t); void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t); void clear_prefree_segments(struct f2fs_sb_info *); +void discard_next_dnode(struct f2fs_sb_info *); int npages_for_summary_flush(struct f2fs_sb_info *); void allocate_new_segments(struct f2fs_sb_info *); struct page *get_sum_page(struct f2fs_sb_info *, unsigned int); diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 1e264e7..9993f94 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -335,13 +335,26 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno) mutex_unlock(dirty_i-seglist_lock); } -static void f2fs_issue_discard(struct f2fs_sb_info *sbi, +static int f2fs_issue_discard(struct f2fs_sb_info *sbi, block_t blkstart, block_t blklen) { sector_t start = SECTOR_FROM_BLOCK(sbi, blkstart); sector_t len = SECTOR_FROM_BLOCK(sbi, blklen); - blkdev_issue_discard(sbi-sb-s_bdev, start, len, GFP_NOFS, 0); trace_f2fs_issue_discard(sbi-sb, blkstart, blklen); + return blkdev_issue_discard(sbi-sb-s_bdev, start, len, GFP_NOFS, 0); +} + +void discard_next_dnode(struct f2fs_sb_info *sbi) +{ + struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE); + block_t blkaddr = NEXT_FREE_BLKADDR(sbi, curseg); + + if (f2fs_issue_discard(sbi, blkaddr, 1)) { + struct page *page = grab_meta_page(sbi, blkaddr); + /* zero-filled page
f2fs: BUG_ON() is triggered when mount valid f2fs filesystem
Hello, When mount this f2fs image: http://linuxtesting.org/downloads/f2fs_fault_image.zip BUG_ON is triggered in f2fs driver (messages below are generated on kernel 3.13.2; for other kernels output is similar): [ 2416.364463] kernel BUG at fs/f2fs/node.c:215! [ 2416.364464] invalid opcode: [#1] SMP [ 2416.364466] Modules linked in: f2fs fuse ip6t_rpfilter ip6t_REJECT xt_conntrack bnep bluetooth rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vboxsf(OF) snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device ppdev snd_pcm snd_page_alloc snd_timer snd e1000 joydev soundcore microcode serio_raw parport_pc parport vboxvideo(OF) drm i2c_piix4 i2c_core vboxguest(OF) ata_generic pata_acpi [ 2416.364493] CPU: 0 PID: 2117 Comm: mount Tainted: GF O 3.10.11fs #4 [ 2416.364494] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 2416.364496] task: 8800304d3fc0 ti: 88000dbae000 task.ti: 88000dbae000 [ 2416.364497] RIP: 0010:[] [] set_node_addr.clone.1+0x1de/0x270 [f2fs] [ 2416.364503] RSP: 0018:88000dbafaa8 EFLAGS: 00010202 [ 2416.364504] RAX: 880034bc0030 RBX: 88000dbafaf8 RCX: [ 2416.364505] RDX: RSI: 0005 RDI: [ 2416.364505] RBP: 88000dbafae8 R08: 880034bc0030 R09: 88000860e6e8 [ 2416.364506] R10: 0001 R11: 0084642a R12: 88001f617020 [ 2416.364507] R13: 88001f617000 R14: 88001f617010 R15: [ 2416.364509] FS: 7f8597b25880() GS:88003fc0() knlGS: [ 2416.364510] CS: 0010 DS: ES: CR0: 8005003b [ 2416.364511] CR2: 7ffc645020b0 CR3: 3c699000 CR4: 06f0 [ 2416.364514] DR0: DR1: DR2: [ 2416.364515] DR3: DR6: 0ff0 DR7: 0400 [ 2416.364516] Stack: [ 2416.364517] 01fa0400 88001f617000 88000dbafae8 88003390 [ 2416.364519] eaddbec0 8800339008f8 88003bc4b000 8800 [ 2416.364521] 88000dbafb68 a032ebad 00050005 0001fa00 [ 2416.364523] Call Trace: [ 2416.364528] [] recover_inode_page+0x1fd/0x3e0 [f2fs] [ 2416.364531] [] ? __lock_page+0x67/0x70 [ 2416.364535] [] ? autoremove_wake_function+0x50/0x50 [ 2416.364538] [] recover_fsync_data+0x1398/0x15d0 [f2fs] [ 2416.364541] [] ? selinux_d_instantiate+0x1c/0x20 [ 2416.364544] [] ? d_instantiate+0x5b/0x80 [ 2416.364547] [] f2fs_fill_super+0xb04/0xbf0 [f2fs] [ 2416.364549] [] ? mount_bdev+0x7e/0x210 [ 2416.364551] [] mount_bdev+0x1c9/0x210 [ 2416.364554] [] ? validate_superblock+0x210/0x210 [f2fs] [ 2416.364557] [] f2fs_mount+0x1d/0x30 [f2fs] [ 2416.364559] [] mount_fs+0x47/0x1c0 [ 2416.364562] [] ? __alloc_percpu+0x10/0x20 [ 2416.364564] [] vfs_kern_mount+0x72/0x110 [ 2416.364566] [] do_mount+0x493/0x910 [ 2416.364568] [] ? strndup_user+0x5b/0x80 [ 2416.364570] [] SyS_mount+0x90/0xe0 [ 2416.364573] [] system_call_fastpath+0x16/0x1b [ 2416.364574] Code: a0 24 02 00 01 48 8b 13 48 89 50 18 48 8b 53 08 48 89 50 20 48 8b 53 10 48 89 50 28 48 83 7b 08 00 74 c4 48 83 05 82 24 02 00 01 <0f> 0b 48 83 05 80 24 02 00 01 48 83 05 58 24 02 00 01 0f 0b 48 [ 2416.364595] RIP [] set_node_addr.clone.1+0x1de/0x270 [f2fs] [ 2416.364598] RSP [ 2416.364600] ---[ end trace d203dddb09f4fc3d ]--- Found by Linux File System Verification project (linuxtesting.org). fsck.f2fs reports that given filesystem is valid. Moreover, on kernels 3.13.2, 3.14 mount continues to fail(with same error) even after these operations on given filesystem's image: mkfs -t f2fs mount -t f2fs -omand touch /file.txt setfacl /file.txt umount Initial filesystem's content for above operations is important: if one applies them to zero-filled or one-filled image, resulted filesystem is mounted successfully. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web: http://linuxtesting.org -- Andrey Tsyvarev -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
f2fs: BUG_ON() is triggered when mount valid f2fs filesystem
Hello, When mount this f2fs image: http://linuxtesting.org/downloads/f2fs_fault_image.zip BUG_ON is triggered in f2fs driver (messages below are generated on kernel 3.13.2; for other kernels output is similar): [ 2416.364463] kernel BUG at fs/f2fs/node.c:215! [ 2416.364464] invalid opcode: [#1] SMP [ 2416.364466] Modules linked in: f2fs fuse ip6t_rpfilter ip6t_REJECT xt_conntrack bnep bluetooth rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vboxsf(OF) snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device ppdev snd_pcm snd_page_alloc snd_timer snd e1000 joydev soundcore microcode serio_raw parport_pc parport vboxvideo(OF) drm i2c_piix4 i2c_core vboxguest(OF) ata_generic pata_acpi [ 2416.364493] CPU: 0 PID: 2117 Comm: mount Tainted: GF O 3.10.11fs #4 [ 2416.364494] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 2416.364496] task: 8800304d3fc0 ti: 88000dbae000 task.ti: 88000dbae000 [ 2416.364497] RIP: 0010:[a0329f2e] [a0329f2e] set_node_addr.clone.1+0x1de/0x270 [f2fs] [ 2416.364503] RSP: 0018:88000dbafaa8 EFLAGS: 00010202 [ 2416.364504] RAX: 880034bc0030 RBX: 88000dbafaf8 RCX: [ 2416.364505] RDX: RSI: 0005 RDI: [ 2416.364505] RBP: 88000dbafae8 R08: 880034bc0030 R09: 88000860e6e8 [ 2416.364506] R10: 0001 R11: 0084642a R12: 88001f617020 [ 2416.364507] R13: 88001f617000 R14: 88001f617010 R15: [ 2416.364509] FS: 7f8597b25880() GS:88003fc0() knlGS: [ 2416.364510] CS: 0010 DS: ES: CR0: 8005003b [ 2416.364511] CR2: 7ffc645020b0 CR3: 3c699000 CR4: 06f0 [ 2416.364514] DR0: DR1: DR2: [ 2416.364515] DR3: DR6: 0ff0 DR7: 0400 [ 2416.364516] Stack: [ 2416.364517] 01fa0400 88001f617000 88000dbafae8 88003390 [ 2416.364519] eaddbec0 8800339008f8 88003bc4b000 8800 [ 2416.364521] 88000dbafb68 a032ebad 00050005 0001fa00 [ 2416.364523] Call Trace: [ 2416.364528] [a032ebad] recover_inode_page+0x1fd/0x3e0 [f2fs] [ 2416.364531] [811446e7] ? __lock_page+0x67/0x70 [ 2416.364535] [81089990] ? autoremove_wake_function+0x50/0x50 [ 2416.364538] [a0337788] recover_fsync_data+0x1398/0x15d0 [f2fs] [ 2416.364541] [812b9e5c] ? selinux_d_instantiate+0x1c/0x20 [ 2416.364544] [811cb20b] ? d_instantiate+0x5b/0x80 [ 2416.364547] [a0321044] f2fs_fill_super+0xb04/0xbf0 [f2fs] [ 2416.364549] [811b861e] ? mount_bdev+0x7e/0x210 [ 2416.364551] [811b8769] mount_bdev+0x1c9/0x210 [ 2416.364554] [a0320540] ? validate_superblock+0x210/0x210 [f2fs] [ 2416.364557] [a031cf8d] f2fs_mount+0x1d/0x30 [f2fs] [ 2416.364559] [811b9497] mount_fs+0x47/0x1c0 [ 2416.364562] [81166e00] ? __alloc_percpu+0x10/0x20 [ 2416.364564] [811d4032] vfs_kern_mount+0x72/0x110 [ 2416.364566] [811d6763] do_mount+0x493/0x910 [ 2416.364568] [811615cb] ? strndup_user+0x5b/0x80 [ 2416.364570] [811d6c70] SyS_mount+0x90/0xe0 [ 2416.364573] [8166f8d9] system_call_fastpath+0x16/0x1b [ 2416.364574] Code: a0 24 02 00 01 48 8b 13 48 89 50 18 48 8b 53 08 48 89 50 20 48 8b 53 10 48 89 50 28 48 83 7b 08 00 74 c4 48 83 05 82 24 02 00 01 0f 0b 48 83 05 80 24 02 00 01 48 83 05 58 24 02 00 01 0f 0b 48 [ 2416.364595] RIP [a0329f2e] set_node_addr.clone.1+0x1de/0x270 [f2fs] [ 2416.364598] RSP 88000dbafaa8 [ 2416.364600] ---[ end trace d203dddb09f4fc3d ]--- Found by Linux File System Verification project (linuxtesting.org). fsck.f2fs reports that given filesystem is valid. Moreover, on kernels 3.13.2, 3.14 mount continues to fail(with same error) even after these operations on given filesystem's image: mkfs -t f2fs img mount -t f2fs -omand img mount-point touch mount-point/file.txt setfacl mount-point/file.txt umount mount-point Initial filesystem's content for above operations is important: if one applies them to zero-filled or one-filled image, resulted filesystem is mounted successfully. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web: http://linuxtesting.org -- Andrey Tsyvarevtsyva...@ispras.ru -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [f2fs-dev] f2fs: f2fs unmount hangs if f2fs_init_acl() fails during mkdir syscall
Hi, BTW, have you tested the case that added remove_dirty_dir_inode() into the fail path of init_inode_metadata? diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index e095a4f..d5a2c9e 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -375,6 +375,7 @@ put_error: /* once the failed inode becomes a bad inode, i_mode is S_IFREG */ truncate_inode_pages(>i_data, 0); truncate_blocks(inode, 0); + remove_dirty_dir_inode(inode); error: remove_inode_page(inode); return ERR_PTR(err); Yes, i have tested that case. Fail in init_inode_metadata has been processed correctly. Thanks. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [f2fs-dev] f2fs: f2fs unmount hangs if f2fs_init_acl() fails during mkdir syscall
Hi, BTW, have you tested the case that added remove_dirty_dir_inode() into the fail path of init_inode_metadata? diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index e095a4f..d5a2c9e 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -375,6 +375,7 @@ put_error: /* once the failed inode becomes a bad inode, i_mode is S_IFREG */ truncate_inode_pages(inode-i_data, 0); truncate_blocks(inode, 0); + remove_dirty_dir_inode(inode); error: remove_inode_page(inode); return ERR_PTR(err); Yes, i have tested that case. Fail in init_inode_metadata has been processed correctly. Thanks. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [f2fs-dev] f2fs: f2fs unmount hangs if f2fs_init_acl() fails during mkdir syscall
Hi, It turns out that make_bad_inode prior to iput sets i_mode to a regular file, so that f2fs_evict_inode -> truncate_inode_pages -> f2fs_invalidate_data_page doesn't decrement dirty_dents. It seems that remove_dirty_dir_inode() call should also be added to the error-path of init_inode_metadata, because its functionality is also based on inode->i_mode field which is changed by make_bad_inode(). Otherwise memory leak is reported when f2fs module is unloaded: [ 231.378192] BUG f2fs_dirty_dir_entry (Tainted: GF O): Objects remaining in f2fs_dirty_dir_entry on kmem_cache_close() [ 231.378193] - [ 231.378194] Disabling lock debugging due to kernel taint [ 231.378195] INFO: Slab 0xea437200 objects=102 used=1 fp=0x880010dc8fc8 flags=0x3fffc00080 [ 231.378197] CPU: 0 PID: 2605 Comm: rmmod Tainted: GF B O 3.14.0-rc1fs #4 [ 231.378198] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 231.378199] 88000e5e3200 88000cc9bd40 8166fd7e ea437200 [ 231.378202] 88000cc9be28 811c3fdf 88003fc10066 0cc9bda0 [ 231.378203] 0020 88000cc9be38 88000cc9bde0 656a624f0296 [ 231.378205] Call Trace: [ 231.378210] [] dump_stack+0x45/0x56 [ 231.378213] [] slab_err+0xaf/0xc0 [ 231.378215] [] ? kmem_cache_close+0x133/0x340 [ 231.378216] [] ? __kmalloc+0x1f5/0x250 [ 231.378218] [] kmem_cache_close+0x153/0x340 [ 231.378221] [] ? kmem_cache_destroy+0x27/0xf0 [ 231.378223] [] __kmem_cache_shutdown+0x14/0x80 [ 231.378224] [] kmem_cache_destroy+0x41/0xf0 [ 231.378229] [] destroy_checkpoint_caches+0x21/0x30 [f2fs] [ 231.378232] [] exit_f2fs_fs+0x28/0x34e [f2fs] [ 231.378235] [] SyS_delete_module+0x152/0x1f0 [ 231.378237] [] ? __audit_syscall_entry+0x9c/0xf0 [ 231.378239] [] system_call_fastpath+0x16/0x1b [ 231.378242] INFO: Object 0x880010dc8000 @offset=0 [ 231.378245] kmem_cache_destroy f2fs_dirty_dir_entry: Slab cache still has objects [ 231.378247] CPU: 0 PID: 2605 Comm: rmmod Tainted: GF B O 3.14.0-rc1fs #4 [ 231.378247] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 231.378248] 88000e5e3268 88000cc9beb8 8166fd7e 88000e5e3200 [ 231.378250] 88000cc9bed8 811934cf a0204f60 [ 231.378251] 88000cc9bee8 a01eab91 88000cc9bef8 a01facda [ 231.378253] Call Trace: [ 231.378255] [] dump_stack+0x45/0x56 [ 231.378256] [] kmem_cache_destroy+0xdf/0xf0 [ 231.378259] [] destroy_checkpoint_caches+0x21/0x30 [f2fs] [ 231.378262] [] exit_f2fs_fs+0x28/0x34e [f2fs] [ 231.378263] [] SyS_delete_module+0x152/0x1f0 [ 231.378265] [] ? __audit_syscall_entry+0x9c/0xf0 [ 231.378266] [] system_call_fastpath+0x16/0x1b Stack of allocation (obtained with KEDR, which is also used for fault simulation): [ 231.414875] [leak_check] Address: 0x880010dc8000, size: 24; stack trace of the allocation: [ 231.414886] [leak_check] [] set_dirty_dir_page+0x62/0xe0 [f2fs] [ 231.414893] [leak_check] [] f2fs_set_data_page_dirty+0x4e/0x90 [f2fs] [ 231.414898] [leak_check] [] set_page_dirty+0x3a/0x60 [ 231.414904] [leak_check] [] __f2fs_add_link+0x732/0x7d0 [f2fs] [ 231.414909] [leak_check] [] f2fs_mkdir+0xbb/0x150 [f2fs] [ 231.414914] [leak_check] [] vfs_mkdir+0xb7/0x160 [ 231.414918] [leak_check] [] SyS_mkdir+0x5f/0xc0 [ 231.414923] [leak_check] [] system_call_fastpath+0x16/0x1b [ 231.414931] [leak_check] [] 0x P.S. It was required to add 'slub_debug' kernel options for make SLUB output correct cache name, otherwise cache "f2fs_dirty_dir_entry" was merged into "free_nid" one. It was surprise for me, that's why patch investigation took so long time. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [f2fs-dev] f2fs: f2fs unmount hangs if f2fs_init_acl() fails during mkdir syscall
Hi, It turns out that make_bad_inode prior to iput sets i_mode to a regular file, so that f2fs_evict_inode - truncate_inode_pages - f2fs_invalidate_data_page doesn't decrement dirty_dents. It seems that remove_dirty_dir_inode() call should also be added to the error-path of init_inode_metadata, because its functionality is also based on inode-i_mode field which is changed by make_bad_inode(). Otherwise memory leak is reported when f2fs module is unloaded: [ 231.378192] BUG f2fs_dirty_dir_entry (Tainted: GF O): Objects remaining in f2fs_dirty_dir_entry on kmem_cache_close() [ 231.378193] - [ 231.378194] Disabling lock debugging due to kernel taint [ 231.378195] INFO: Slab 0xea437200 objects=102 used=1 fp=0x880010dc8fc8 flags=0x3fffc00080 [ 231.378197] CPU: 0 PID: 2605 Comm: rmmod Tainted: GF B O 3.14.0-rc1fs #4 [ 231.378198] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 231.378199] 88000e5e3200 88000cc9bd40 8166fd7e ea437200 [ 231.378202] 88000cc9be28 811c3fdf 88003fc10066 0cc9bda0 [ 231.378203] 0020 88000cc9be38 88000cc9bde0 656a624f0296 [ 231.378205] Call Trace: [ 231.378210] [8166fd7e] dump_stack+0x45/0x56 [ 231.378213] [811c3fdf] slab_err+0xaf/0xc0 [ 231.378215] [811c84a3] ? kmem_cache_close+0x133/0x340 [ 231.378216] [811c6b55] ? __kmalloc+0x1f5/0x250 [ 231.378218] [811c84c3] kmem_cache_close+0x153/0x340 [ 231.378221] [81193417] ? kmem_cache_destroy+0x27/0xf0 [ 231.378223] [811c86c4] __kmem_cache_shutdown+0x14/0x80 [ 231.378224] [81193431] kmem_cache_destroy+0x41/0xf0 [ 231.378229] [a01eab91] destroy_checkpoint_caches+0x21/0x30 [f2fs] [ 231.378232] [a01facda] exit_f2fs_fs+0x28/0x34e [f2fs] [ 231.378235] [810ffe32] SyS_delete_module+0x152/0x1f0 [ 231.378237] [8111d85c] ? __audit_syscall_entry+0x9c/0xf0 [ 231.378239] [81680729] system_call_fastpath+0x16/0x1b [ 231.378242] INFO: Object 0x880010dc8000 @offset=0 [ 231.378245] kmem_cache_destroy f2fs_dirty_dir_entry: Slab cache still has objects [ 231.378247] CPU: 0 PID: 2605 Comm: rmmod Tainted: GF B O 3.14.0-rc1fs #4 [ 231.378247] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 231.378248] 88000e5e3268 88000cc9beb8 8166fd7e 88000e5e3200 [ 231.378250] 88000cc9bed8 811934cf a0204f60 [ 231.378251] 88000cc9bee8 a01eab91 88000cc9bef8 a01facda [ 231.378253] Call Trace: [ 231.378255] [8166fd7e] dump_stack+0x45/0x56 [ 231.378256] [811934cf] kmem_cache_destroy+0xdf/0xf0 [ 231.378259] [a01eab91] destroy_checkpoint_caches+0x21/0x30 [f2fs] [ 231.378262] [a01facda] exit_f2fs_fs+0x28/0x34e [f2fs] [ 231.378263] [810ffe32] SyS_delete_module+0x152/0x1f0 [ 231.378265] [8111d85c] ? __audit_syscall_entry+0x9c/0xf0 [ 231.378266] [81680729] system_call_fastpath+0x16/0x1b Stack of allocation (obtained with KEDR, which is also used for fault simulation): [ 231.414875] [leak_check] Address: 0x880010dc8000, size: 24; stack trace of the allocation: [ 231.414886] [leak_check] [a01e9d72] set_dirty_dir_page+0x62/0xe0 [f2fs] [ 231.414893] [leak_check] [a01ec9be] f2fs_set_data_page_dirty+0x4e/0x90 [f2fs] [ 231.414898] [leak_check] [8117b02a] set_page_dirty+0x3a/0x60 [ 231.414904] [leak_check] [a01dfeb2] __f2fs_add_link+0x732/0x7d0 [f2fs] [ 231.414909] [leak_check] [a01e2f1b] f2fs_mkdir+0xbb/0x150 [f2fs] [ 231.414914] [leak_check] [811f2a37] vfs_mkdir+0xb7/0x160 [ 231.414918] [leak_check] [811f367f] SyS_mkdir+0x5f/0xc0 [ 231.414923] [leak_check] [81680729] system_call_fastpath+0x16/0x1b [ 231.414931] [leak_check] [] 0x P.S. It was required to add 'slub_debug' kernel options for make SLUB output correct cache name, otherwise cache f2fs_dirty_dir_entry was merged into free_nid one. It was surprise for me, that's why patch investigation took so long time. -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web:http://linuxtesting.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: f2fs: f2fs unmount hangs if f2fs_init_acl() fails during mkdir syscall
Hi, 06.02.2014 10:02, Jaegeuk Kim пишет: Hi, Thank you for the test and valuable report. This bug was fixed recently by: commit 03dea3129d558bf5293a6e9f12777176619ac876 Author: Jaegeuk Kim Date: Wed Feb 5 11:16:39 2014 +0900 f2fs: fix to truncate dentry pages in the error case Now remove_inode_page() succeed, but another assertion failed (tested on revision e964751c): [ 1272.747011] kernel BUG at fs/f2fs/inode.c:274! [ 1272.747011] invalid opcode: [#1] SMP [ 1272.747011] Modules linked in: f2fs kedr_fsim_indicator_common(OF) kedr_fsim_indicator_capable(OF) kedr_fsim_indicator_kmalloc(OF) kedr_fsim_vmm(OF) kedr_fsim_mem_util(OF) kedr_fsim_capable(OF) kedr_fsim_uaccess(OF) kedr_fsim_cmm(OF) kedr_fault_simulation(OF) kedr(OF) fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw parport_pc i2c_piix4 e1000 i2c_core microcode parport ata_generic pata_acpi [last unloaded: kedr] [ 1272.747011] CPU: 0 PID: 14613 Comm: fs-driver-tests Tainted: GF W O 3.14.0-rc1fs #1 [ 1272.747011] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 1272.747011] task: 88001e939190 ti: 88000d7ec000 task.ti: 88000d7ec000 [ 1272.747011] RIP: 0010:[] [] f2fs_evict_inode+0x178/0x180 [f2fs] [ 1272.747011] RSP: 0018:88000d7ede50 EFLAGS: 00010202 [ 1272.747011] RAX: 0001 RBX: 88000475cc30 RCX: 88001e9398a0 [ 1272.747011] RDX: 0002 RSI: RDI: 88000475ce10 [ 1272.747011] RBP: 88000d7ede68 R08: R09: [ 1272.747011] R10: R11: 0001 R12: 88000475cc30 [ 1272.747011] R13: 88000f147800 R14: a01e7080 R15: 88000f147b80 [ 1272.747011] FS: 7f1795424740() GS:88003fc0() knlGS: [ 1272.747011] CS: 0010 DS: ES: CR0: 8005003b [ 1272.747011] CR2: 7fc33bfa9000 CR3: 0f14e000 CR4: 06f0 [ 1272.747011] Stack: [ 1272.747011] 88000475cc30 88000475cdc8 a01e7080 88000d7ede90 [ 1272.747011] 811fde03 88000475cc30 88000475ccb8 88000f147000 [ 1272.747011] 88000d7edec0 811fe615 88000475cc30 88000f147800 [ 1272.747011] Call Trace: [ 1272.747011] [] evict+0xa3/0x1a0 [ 1272.747011] [] iput+0xf5/0x180 [ 1272.747011] [] f2fs_mkdir+0xf3/0x150 [f2fs] [ 1272.747011] [] vfs_mkdir+0xb7/0x160 [ 1272.747011] [] SyS_mkdir+0x5f/0xc0 [ 1272.747011] [] system_call_fastpath+0x16/0x1b [ 1272.747011] Code: 01 e1 4c 89 e7 e8 39 59 03 e1 5b 41 5c 41 5d 5d c3 31 c0 49 83 bc 24 c8 00 00 00 01 0f 97 c0 eb 8f 4c 89 e7 e8 fa ec ff ff eb 89 <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 c7 c0 dc ff ff ff [ 1272.747011] RIP [] f2fs_evict_inode+0x178/0x180 [f2fs] [ 1272.747011] RSP Failed assertion claims that dirty dentries counter should be zero when inode is deleted. This counter is incremented by mkdir()-> f2fs_add_link()-> init_inode_metadata()-> make_empty_dir()-> set_page_dirty(); but no one decrement it. May be, this should be done along with truncating directory inode in error-path of init_inode_metadata() ? -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web: http://linuxtesting.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: f2fs: f2fs unmount hangs if f2fs_init_acl() fails during mkdir syscall
Hi, 06.02.2014 10:02, Jaegeuk Kim пишет: Hi, Thank you for the test and valuable report. This bug was fixed recently by: commit 03dea3129d558bf5293a6e9f12777176619ac876 Author: Jaegeuk Kim jaegeuk@samsung.com Date: Wed Feb 5 11:16:39 2014 +0900 f2fs: fix to truncate dentry pages in the error case Now remove_inode_page() succeed, but another assertion failed (tested on revision e964751c): [ 1272.747011] kernel BUG at fs/f2fs/inode.c:274! [ 1272.747011] invalid opcode: [#1] SMP [ 1272.747011] Modules linked in: f2fs kedr_fsim_indicator_common(OF) kedr_fsim_indicator_capable(OF) kedr_fsim_indicator_kmalloc(OF) kedr_fsim_vmm(OF) kedr_fsim_mem_util(OF) kedr_fsim_capable(OF) kedr_fsim_uaccess(OF) kedr_fsim_cmm(OF) kedr_fault_simulation(OF) kedr(OF) fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw parport_pc i2c_piix4 e1000 i2c_core microcode parport ata_generic pata_acpi [last unloaded: kedr] [ 1272.747011] CPU: 0 PID: 14613 Comm: fs-driver-tests Tainted: GF W O 3.14.0-rc1fs #1 [ 1272.747011] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 1272.747011] task: 88001e939190 ti: 88000d7ec000 task.ti: 88000d7ec000 [ 1272.747011] RIP: 0010:[a01c74a8] [a01c74a8] f2fs_evict_inode+0x178/0x180 [f2fs] [ 1272.747011] RSP: 0018:88000d7ede50 EFLAGS: 00010202 [ 1272.747011] RAX: 0001 RBX: 88000475cc30 RCX: 88001e9398a0 [ 1272.747011] RDX: 0002 RSI: RDI: 88000475ce10 [ 1272.747011] RBP: 88000d7ede68 R08: R09: [ 1272.747011] R10: R11: 0001 R12: 88000475cc30 [ 1272.747011] R13: 88000f147800 R14: a01e7080 R15: 88000f147b80 [ 1272.747011] FS: 7f1795424740() GS:88003fc0() knlGS: [ 1272.747011] CS: 0010 DS: ES: CR0: 8005003b [ 1272.747011] CR2: 7fc33bfa9000 CR3: 0f14e000 CR4: 06f0 [ 1272.747011] Stack: [ 1272.747011] 88000475cc30 88000475cdc8 a01e7080 88000d7ede90 [ 1272.747011] 811fde03 88000475cc30 88000475ccb8 88000f147000 [ 1272.747011] 88000d7edec0 811fe615 88000475cc30 88000f147800 [ 1272.747011] Call Trace: [ 1272.747011] [811fde03] evict+0xa3/0x1a0 [ 1272.747011] [811fe615] iput+0xf5/0x180 [ 1272.747011] [a01c7f63] f2fs_mkdir+0xf3/0x150 [f2fs] [ 1272.747011] [811f2a77] vfs_mkdir+0xb7/0x160 [ 1272.747011] [811f36bf] SyS_mkdir+0x5f/0xc0 [ 1272.747011] [81680769] system_call_fastpath+0x16/0x1b [ 1272.747011] Code: 01 e1 4c 89 e7 e8 39 59 03 e1 5b 41 5c 41 5d 5d c3 31 c0 49 83 bc 24 c8 00 00 00 01 0f 97 c0 eb 8f 4c 89 e7 e8 fa ec ff ff eb 89 0f 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 c7 c0 dc ff ff ff [ 1272.747011] RIP [a01c74a8] f2fs_evict_inode+0x178/0x180 [f2fs] [ 1272.747011] RSP 88000d7ede50 Failed assertion claims that dirty dentries counter should be zero when inode is deleted. This counter is incremented by mkdir()- f2fs_add_link()- init_inode_metadata()- make_empty_dir()- set_page_dirty(); but no one decrement it. May be, this should be done along with truncating directory inode in error-path of init_inode_metadata() ? -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web: http://linuxtesting.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
f2fs: f2fs unmount hangs if f2fs_init_acl() fails during mkdir syscall
tion, fault injection kernel infrustructure. In my tests I use KEDR framework (http://forge.ispras.ru/projects/kedr) for that purpose, that is why kedr* modules are in the log above. Call stack of the fault simulated is: [ 117.863789] KEDR FAULT SIMULATION: forcing a failure [ 117.863792] CPU: 0 PID: 2766 Comm: fs-driver-tests Tainted: GF O 3.13.0fs #2 [ 117.863794] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 117.863795] 8800111adce8 8800111adcb0 8164bd96 0001 [ 117.863797] 8800111adcd8 a018ace2 a018ac45 a01e8b54 [ 117.863799] 1000 8800111add18 a0194816 a01e8b54 [ 117.863801] Call Trace: [ 117.863806] [] dump_stack+0x45/0x56 [ 117.863810] [] kedr_fsim_point_simulate+0xa2/0xb0 [kedr_fault_simulation] [ 117.863812] [] ? kedr_fsim_point_simulate+0x5/0xb0 [kedr_fault_simulation] [ 117.863816] [] ? read_all_xattrs+0x3c4/0x3e0 [f2fs] [ 117.863819] [] kedr_repl___kmalloc+0x36/0x80 [kedr_fsim_cmm] [ 117.863822] [] ? read_all_xattrs+0x3c4/0x3e0 [f2fs] [ 117.863824] [] kedr_intermediate_func___kmalloc+0x72/0xd0 [kedr_fsim_cmm] [ 117.863826] [] ? read_all_xattrs+0x3c4/0x3e0 [f2fs] [ 117.863829] [] read_all_xattrs+0x3c4/0x3e0 [f2fs] [ 117.863832] [] f2fs_getxattr+0x44/0xf0 [f2fs] [ 117.863835] [] f2fs_get_acl+0xe9/0x390 [f2fs] [ 117.863838] [] ? set_dirty_dir_page+0xb9/0xe0 [f2fs] [ 117.863841] [] f2fs_init_acl+0x10a/0x180 [f2fs] [ 117.863843] [] __f2fs_add_link+0x50c/0x7c0 [f2fs] [ 117.863846] [] f2fs_mkdir+0xbb/0x150 [f2fs] [ 117.863849] [] vfs_mkdir+0xb7/0x160 [ 117.863851] [] SyS_mkdir+0x5f/0xc0 [ 117.863854] [] system_call_fastpath+0x16/0x1b -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web: http://linuxtesting.org --- Это сообщение свободно от вирусов и вредоносного ПО благодаря защите от вирусов avast! http://www.avast.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
f2fs: f2fs unmount hangs if f2fs_init_acl() fails during mkdir syscall
for simulate faults in kernel function calls, e.g. manual faults insertion, fault injection kernel infrustructure. In my tests I use KEDR framework (http://forge.ispras.ru/projects/kedr) for that purpose, that is why kedr* modules are in the log above. Call stack of the fault simulated is: [ 117.863789] KEDR FAULT SIMULATION: forcing a failure [ 117.863792] CPU: 0 PID: 2766 Comm: fs-driver-tests Tainted: GF O 3.13.0fs #2 [ 117.863794] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 117.863795] 8800111adce8 8800111adcb0 8164bd96 0001 [ 117.863797] 8800111adcd8 a018ace2 a018ac45 a01e8b54 [ 117.863799] 1000 8800111add18 a0194816 a01e8b54 [ 117.863801] Call Trace: [ 117.863806] [8164bd96] dump_stack+0x45/0x56 [ 117.863810] [a018ace2] kedr_fsim_point_simulate+0xa2/0xb0 [kedr_fault_simulation] [ 117.863812] [a018ac45] ? kedr_fsim_point_simulate+0x5/0xb0 [kedr_fault_simulation] [ 117.863816] [a01e8b54] ? read_all_xattrs+0x3c4/0x3e0 [f2fs] [ 117.863819] [a0194816] kedr_repl___kmalloc+0x36/0x80 [kedr_fsim_cmm] [ 117.863822] [a01e8b54] ? read_all_xattrs+0x3c4/0x3e0 [f2fs] [ 117.863824] [a01955d2] kedr_intermediate_func___kmalloc+0x72/0xd0 [kedr_fsim_cmm] [ 117.863826] [a01e8b54] ? read_all_xattrs+0x3c4/0x3e0 [f2fs] [ 117.863829] [a01e8b54] read_all_xattrs+0x3c4/0x3e0 [f2fs] [ 117.863832] [a01e91d4] f2fs_getxattr+0x44/0xf0 [f2fs] [ 117.863835] [a01e9ba9] f2fs_get_acl+0xe9/0x390 [f2fs] [ 117.863838] [a01d99c9] ? set_dirty_dir_page+0xb9/0xe0 [f2fs] [ 117.863841] [a01e9ffa] f2fs_init_acl+0x10a/0x180 [f2fs] [ 117.863843] [a01d0ccc] __f2fs_add_link+0x50c/0x7c0 [f2fs] [ 117.863846] [a01d3b8b] f2fs_mkdir+0xbb/0x150 [f2fs] [ 117.863849] [811cf4c7] vfs_mkdir+0xb7/0x160 [ 117.863851] [811d010f] SyS_mkdir+0x5f/0xc0 [ 117.863854] [8165bf29] system_call_fastpath+0x16/0x1b -- Best regards, Andrey Tsyvarev Linux Verification Center, ISPRAS web: http://linuxtesting.org --- Это сообщение свободно от вирусов и вредоносного ПО благодаря защите от вирусов avast! http://www.avast.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/