On 8.05.2018 01:58, David Sterba wrote:
> On Fri, Apr 27, 2018 at 12:21:49PM +0300, Nikolay Borisov wrote:
>> After investigating crashes on generic/176 it turned out that the culprit is
>> in fact the random failure induced by generic/019. As it happens, if on
>> unmount the filesystem is in BTRFS_FS_STATE_ERROR then
>> btrfs_error_commit_super is called. This unveiled 2 bugs:
>>  1. btrfs_destroy_delalloc_inodes's implementation was completely bogus,
>>  since it only called btrfs_invalidate_inodes which only pruned dentries and
>>  didn't do anything to free any inodes with pending delalloc bytes. Once this
>>  is fixed with the use of invalidate_inode_pages2 the second bug transpired.
>>  2. The last call to run_delayed_iputs is made before
>>  btrfs_cleanup_transaction is called. The latter in turn could queue up more
>>  delayed iputs resulting from invalidate_inode_pages2.
>>
>> This series fixes the problem by first fixing btrfs_destroy_delalloc_inodes
>> to properly clean up delalloc inodes and as a result cleans up the code a bit.
>>
>> I've given it a good bashing through xfstests (4 full xfstests cycles + 100
>> iterations of generic/475 since it was hitting some early assertion failures,
>> which are fixed in the final version) so I am pretty confident in the change.
> 
> One qemu testmachine complains.
> 
> The branch was ext/nikbor/delalloc-invalidate in my github repo. Other
> tests seem "fine", unlikely to be related to this patchset.
> 
> The error here is a null pointer deref in end bio callback, which
> matches a use-after-free scenario, so I think there's still something
> left to fix.
> 
> generic/335             [22:34:50]

How easy is it to reproduce this with this particular test? Does it trigger
on, say, every other run, or is it sporadic?

> [26281.970322] run fstests generic/335 at 2018-05-07 22:34:50
> [26282.440728] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> [26282.445060] PGD 0 P4D 0
> [26282.446526] Oops: 0000 [#1] PREEMPT SMP
> [26282.448562] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet [last unloaded: libcrc32c]
> [26282.454384] CPU: 2 PID: 30005 Comm: btrfs-endio-met Tainted: G        W         4.17.0-rc4-default+ #73
> [26282.457247] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [26282.459386] RIP: 0010:__queue_work+0x189/0x3f0
> [26282.460342] RSP: 0018:ffff8ea47fd03f00 EFLAGS: 00010046
> [26282.461506] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [26282.463061] RDX: ffff8ea47fbda640 RSI: 000000007fffffff RDI: ffff8ea47fbda640
> [26282.464606] RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000001
> [26282.466169] R10: 0000000000001000 R11: ffff8ea469244000 R12: ffff8ea40819c800
> [26282.467697] R13: 0000000000000002 R14: 0000000000000200 R15: ffff8ea47fbda640
> [26282.469205] FS:  0000000000000000(0000) GS:ffff8ea47fd00000(0000) knlGS:0000000000000000
> [26282.470971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [26282.472173] CR2: 0000000000000000 CR3: 0000000042ed4000 CR4: 00000000000006e0
> [26282.473674] Call Trace:
> [26282.474300]  <IRQ>
> [26282.474848]  queue_work_on+0x34/0x40
> [26282.475717]  btrfs_end_bio+0x71/0x110 [btrfs]
> [26282.476750]  blk_update_request+0x78/0x2d0
> [26282.477675]  blk_mq_end_request+0x18/0x70
> [26282.478599]  flush_smp_call_function_queue+0x6f/0xe0
> [26282.479690]  smp_call_function_single_interrupt+0x2c/0xe0
> [26282.480825]  call_function_single_interrupt+0xf/0x20
> [26282.481888]  </IRQ>
> [26282.482423] RIP: 0010:exit_shm+0x0/0x1c0
> [26282.483281] RSP: 0018:ffffa9a1c4787ea0 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff04
> [26282.484944] RAX: ffffffffb2e37960 RBX: ffff8ea47ccc1c00 RCX: 0000000000000000
> [26282.486423] RDX: ffff8ea413270e40 RSI: 0000000000000282 RDI: ffff8ea47ccc1c00
> [26282.487870] RBP: 0000000000000000 R08: ffff8ea413297630 R09: 0000000000000000
> [26282.489104] R10: 0000000000000000 R11: 0000000000000256 R12: 0000000000000000
> [26282.490306] R13: ffff8ea47ccc2301 R14: 0000000000000001 R15: ffff8ea47ccc1c00
> [26282.491500]  do_exit+0x274/0xb00
> [26282.492198]  ? rescuer_thread+0x2be/0x310
> [26282.492990]  ? worker_thread+0x380/0x380
> [26282.493795]  kthread+0xe0/0x130
> [26282.494443]  ? kthread_create_worker_on_cpu+0x50/0x50
> [26282.495395]  ret_from_fork+0x1f/0x30
> [26282.499564] RIP: __queue_work+0x189/0x3f0 RSP: ffff8ea47fd03f00
> [26282.500614] CR2: 0000000000000000
> [26282.501248] ---[ end trace f7e701988bc2b82f ]---
> [26282.502141] Kernel panic - not syncing: Fatal exception in interrupt
> [26282.503419] Kernel Offset: 0x31000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [26282.505348] Rebooting in 90 seconds..
> 