Re: [PATCH] Btrfs: fix log replay failure after unlink and link combination

2018-03-02 Thread Filipe Manana
On Fri, Mar 2, 2018 at 7:00 PM, Liu Bo  wrote:
> On Fri, Mar 02, 2018 at 11:29:33AM -0700, Liu Bo wrote:
>> On Wed, Feb 28, 2018 at 03:56:10PM +, fdman...@kernel.org wrote:
>> > From: Filipe Manana 
>> >
>> > If we have a file with 2 (or more) hard links in the same directory,
>> > remove one of the hard links, create a new file (or link an existing file)
>> > in the same directory with the name of the removed hard link, and then
>> > finally fsync the new file, we end up with a log that fails to replay,
>> > causing a mount failure.
>> >
>> > Example:
>> >
>> >   $ mkfs.btrfs -f /dev/sdb
>> >   $ mount /dev/sdb /mnt
>> >
>> >   $ mkdir /mnt/testdir
>> >   $ touch /mnt/testdir/foo
>> >   $ ln /mnt/testdir/foo /mnt/testdir/bar
>> >
>> >   $ sync
>> >
>> >   $ unlink /mnt/testdir/bar
>> >   $ touch /mnt/testdir/bar
>
> So here, "unlink & touch" similar to rename, therefore we could also
> be conservative and set /mnt/testdir 's last_unlink_trans to force to
> commit transaction.

That would cause any fsync on any file inside the directory to result
in a full transaction commit.
We had in the past more "exotic" scenarios with renames and creating
new files with the same old name where the initial solution was just
that, forcing a full commit [1], just to find out later that it caused
big performance drop on real workloads [2]. That's why I'm avoiding
it, even because fsync'ing any file in the directory (existing or new)
isn't uncommon at all.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56f23fdbb600e6087db7b009775b95ce07cc3195
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=44f714dae50a2e795d3268a6831762aa6fa54f55

thanks

>
> Thanks,
>
> -liubo
>
>> >   $ xfs_io -c "fsync" /mnt/testdir/bar
>> >
>> >   
>> >
>> >   $ mount /dev/sdb /mnt
>> >   mount: mount(2) failed: /mnt: No such file or directory
>> >
>> > When replaying the log, for that example, we also see the following in
>> > dmesg/syslog:
>> >
>> >   [71813.671307] BTRFS info (device dm-0): failed to delete reference to 
>> > bar, inode 258 parent 257
>> >   [71813.674204] [ cut here ]
>> >   [71813.675694] BTRFS: Transaction aborted (error -2)
>> >   [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 
>> > __btrfs_unlink_inode+0x17b/0x355 [btrfs]
>> >   [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax 
>> > ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd 
>> > glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg 
>> > serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 
>> > zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov 
>> > async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c 
>> > crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod 
>> > virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy 
>> > virtio e1000 scsi_mod [last unloaded: btrfs]
>> >   [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: GW
>> > 4.15.0-rc9-btrfs-next-56+ #1
>> >   [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
>> > BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>> >   [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
>> >   [71813.679669] RSP: 0018:c90001cef738 EFLAGS: 00010286
>> >   [71813.679669] RAX: 0025 RBX: 880217ce4708 RCX: 
>> > 0001
>> >   [71813.679669] RDX:  RSI: 81c14bae RDI: 
>> > 
>> >   [71813.679669] RBP: c90001cef7c0 R08: 0001 R09: 
>> > 0001
>> >   [71813.679669] R10: c90001cef5e0 R11: 8343f007 R12: 
>> > 880217d474c8
>> >   [71813.679669] R13: fffe R14: 88021ccf1548 R15: 
>> > 0101
>> >   [71813.679669] FS:  7f7cee84c480() GS:88023fc8() 
>> > knlGS:
>> >   [71813.679669] CS:  0010 DS:  ES:  CR0: 80050033
>> >   [71813.679669] CR2: 7f7cedc1abf9 CR3: 0002354b4003 CR4: 
>> > 001606e0
>> >   [71813.679669] Call Trace:
>> >   [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
>> >   [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
>> >   [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
>> >   [71813.679669]  ? __lock_is_held+0x39/0x71
>> >   [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
>> >   [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
>> >   [71813.679669]  ? rcu_read_unlock+0x3a/0x57
>> >   [71813.679669]  ? __lock_is_held+0x39/0x71
>> >   [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
>> >   [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
>> >   [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
>> >   [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
>> >   [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
>> >   

Re: [PATCH] Btrfs: fix log replay failure after unlink and link combination

2018-03-02 Thread Filipe Manana
On Fri, Mar 2, 2018 at 6:29 PM, Liu Bo  wrote:
> On Wed, Feb 28, 2018 at 03:56:10PM +, fdman...@kernel.org wrote:
>> From: Filipe Manana 
>>
>> If we have a file with 2 (or more) hard links in the same directory,
>> remove one of the hard links, create a new file (or link an existing file)
>> in the same directory with the name of the removed hard link, and then
>> finally fsync the new file, we end up with a log that fails to replay,
>> causing a mount failure.
>>
>> Example:
>>
>>   $ mkfs.btrfs -f /dev/sdb
>>   $ mount /dev/sdb /mnt
>>
>>   $ mkdir /mnt/testdir
>>   $ touch /mnt/testdir/foo
>>   $ ln /mnt/testdir/foo /mnt/testdir/bar
>>
>>   $ sync
>>
>>   $ unlink /mnt/testdir/bar
>>   $ touch /mnt/testdir/bar
>>   $ xfs_io -c "fsync" /mnt/testdir/bar
>>
>>   
>>
>>   $ mount /dev/sdb /mnt
>>   mount: mount(2) failed: /mnt: No such file or directory
>>
>> When replaying the log, for that example, we also see the following in
>> dmesg/syslog:
>>
>>   [71813.671307] BTRFS info (device dm-0): failed to delete reference to 
>> bar, inode 258 parent 257
>>   [71813.674204] [ cut here ]
>>   [71813.675694] BTRFS: Transaction aborted (error -2)
>>   [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 
>> __btrfs_unlink_inode+0x17b/0x355 [btrfs]
>>   [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax 
>> ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd 
>> glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw 
>> parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress 
>> zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq 
>> async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 
>> multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata 
>> virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last 
>> unloaded: btrfs]
>>   [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: GW
>> 4.15.0-rc9-btrfs-next-56+ #1
>>   [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>>   [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
>>   [71813.679669] RSP: 0018:c90001cef738 EFLAGS: 00010286
>>   [71813.679669] RAX: 0025 RBX: 880217ce4708 RCX: 
>> 0001
>>   [71813.679669] RDX:  RSI: 81c14bae RDI: 
>> 
>>   [71813.679669] RBP: c90001cef7c0 R08: 0001 R09: 
>> 0001
>>   [71813.679669] R10: c90001cef5e0 R11: 8343f007 R12: 
>> 880217d474c8
>>   [71813.679669] R13: fffe R14: 88021ccf1548 R15: 
>> 0101
>>   [71813.679669] FS:  7f7cee84c480() GS:88023fc8() 
>> knlGS:
>>   [71813.679669] CS:  0010 DS:  ES:  CR0: 80050033
>>   [71813.679669] CR2: 7f7cedc1abf9 CR3: 0002354b4003 CR4: 
>> 001606e0
>>   [71813.679669] Call Trace:
>>   [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
>>   [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
>>   [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
>>   [71813.679669]  ? __lock_is_held+0x39/0x71
>>   [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
>>   [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
>>   [71813.679669]  ? rcu_read_unlock+0x3a/0x57
>>   [71813.679669]  ? __lock_is_held+0x39/0x71
>>   [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
>>   [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
>>   [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
>>   [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
>>   [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
>>   [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
>>   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
>>   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
>>   [71813.679669]  ? mount_fs+0x64/0x10b
>>   [71813.679669]  mount_fs+0x64/0x10b
>>   [71813.679669]  vfs_kern_mount+0x68/0xce
>>   [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
>>   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
>>   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
>>   [71813.679669]  ? mount_fs+0x64/0x10b
>>   [71813.679669]  mount_fs+0x64/0x10b
>>   [71813.679669]  vfs_kern_mount+0x68/0xce
>>   [71813.679669]  do_mount+0x6e5/0x973
>>   [71813.679669]  ? memdup_user+0x3e/0x5c
>>   [71813.679669]  SyS_mount+0x72/0x98
>>   [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
>>   [71813.679669] RIP: 0033:0x7f7cedf150ba
>>   [71813.679669] RSP: 002b:7ffca71da688 EFLAGS: 0206
>>   [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 
>> 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 
>> <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
>>   [71813.679669] ---[ end trace 

Re: [PATCH] Btrfs: fix log replay failure after unlink and link combination

2018-03-02 Thread Liu Bo
On Fri, Mar 02, 2018 at 11:29:33AM -0700, Liu Bo wrote:
> On Wed, Feb 28, 2018 at 03:56:10PM +, fdman...@kernel.org wrote:
> > From: Filipe Manana 
> > 
> > If we have a file with 2 (or more) hard links in the same directory,
> > remove one of the hard links, create a new file (or link an existing file)
> > in the same directory with the name of the removed hard link, and then
> > finally fsync the new file, we end up with a log that fails to replay,
> > causing a mount failure.
> > 
> > Example:
> > 
> >   $ mkfs.btrfs -f /dev/sdb
> >   $ mount /dev/sdb /mnt
> > 
> >   $ mkdir /mnt/testdir
> >   $ touch /mnt/testdir/foo
> >   $ ln /mnt/testdir/foo /mnt/testdir/bar
> > 
> >   $ sync
> > 
> >   $ unlink /mnt/testdir/bar
> >   $ touch /mnt/testdir/bar

So here, "unlink & touch" similar to rename, therefore we could also
be conservative and set /mnt/testdir 's last_unlink_trans to force to
commit transaction.

Thanks,

-liubo

> >   $ xfs_io -c "fsync" /mnt/testdir/bar
> > 
> >   
> > 
> >   $ mount /dev/sdb /mnt
> >   mount: mount(2) failed: /mnt: No such file or directory
> > 
> > When replaying the log, for that example, we also see the following in
> > dmesg/syslog:
> > 
> >   [71813.671307] BTRFS info (device dm-0): failed to delete reference to 
> > bar, inode 258 parent 257
> >   [71813.674204] [ cut here ]
> >   [71813.675694] BTRFS: Transaction aborted (error -2)
> >   [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 
> > __btrfs_unlink_inode+0x17b/0x355 [btrfs]
> >   [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax 
> > ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd 
> > glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw 
> > parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress 
> > zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq 
> > async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 
> > multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata 
> > virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last 
> > unloaded: btrfs]
> >   [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: GW
> > 4.15.0-rc9-btrfs-next-56+ #1
> >   [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> >   [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
> >   [71813.679669] RSP: 0018:c90001cef738 EFLAGS: 00010286
> >   [71813.679669] RAX: 0025 RBX: 880217ce4708 RCX: 
> > 0001
> >   [71813.679669] RDX:  RSI: 81c14bae RDI: 
> > 
> >   [71813.679669] RBP: c90001cef7c0 R08: 0001 R09: 
> > 0001
> >   [71813.679669] R10: c90001cef5e0 R11: 8343f007 R12: 
> > 880217d474c8
> >   [71813.679669] R13: fffe R14: 88021ccf1548 R15: 
> > 0101
> >   [71813.679669] FS:  7f7cee84c480() GS:88023fc8() 
> > knlGS:
> >   [71813.679669] CS:  0010 DS:  ES:  CR0: 80050033
> >   [71813.679669] CR2: 7f7cedc1abf9 CR3: 0002354b4003 CR4: 
> > 001606e0
> >   [71813.679669] Call Trace:
> >   [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
> >   [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
> >   [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
> >   [71813.679669]  ? __lock_is_held+0x39/0x71
> >   [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
> >   [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
> >   [71813.679669]  ? rcu_read_unlock+0x3a/0x57
> >   [71813.679669]  ? __lock_is_held+0x39/0x71
> >   [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
> >   [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
> >   [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
> >   [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
> >   [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
> >   [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
> >   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
> >   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
> >   [71813.679669]  ? mount_fs+0x64/0x10b
> >   [71813.679669]  mount_fs+0x64/0x10b
> >   [71813.679669]  vfs_kern_mount+0x68/0xce
> >   [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
> >   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
> >   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
> >   [71813.679669]  ? mount_fs+0x64/0x10b
> >   [71813.679669]  mount_fs+0x64/0x10b
> >   [71813.679669]  vfs_kern_mount+0x68/0xce
> >   [71813.679669]  do_mount+0x6e5/0x973
> >   [71813.679669]  ? memdup_user+0x3e/0x5c
> >   [71813.679669]  SyS_mount+0x72/0x98
> >   [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
> >   [71813.679669] RIP: 0033:0x7f7cedf150ba
> >   [71813.679669] RSP: 002b:7ffca71da688 EFLAGS: 

Re: [PATCH] Btrfs: fix log replay failure after unlink and link combination

2018-03-02 Thread Liu Bo
On Wed, Feb 28, 2018 at 03:56:10PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> If we have a file with 2 (or more) hard links in the same directory,
> remove one of the hard links, create a new file (or link an existing file)
> in the same directory with the name of the removed hard link, and then
> finally fsync the new file, we end up with a log that fails to replay,
> causing a mount failure.
> 
> Example:
> 
>   $ mkfs.btrfs -f /dev/sdb
>   $ mount /dev/sdb /mnt
> 
>   $ mkdir /mnt/testdir
>   $ touch /mnt/testdir/foo
>   $ ln /mnt/testdir/foo /mnt/testdir/bar
> 
>   $ sync
> 
>   $ unlink /mnt/testdir/bar
>   $ touch /mnt/testdir/bar
>   $ xfs_io -c "fsync" /mnt/testdir/bar
> 
>   
> 
>   $ mount /dev/sdb /mnt
>   mount: mount(2) failed: /mnt: No such file or directory
> 
> When replaying the log, for that example, we also see the following in
> dmesg/syslog:
> 
>   [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, 
> inode 258 parent 257
>   [71813.674204] [ cut here ]
>   [71813.675694] BTRFS: Transaction aborted (error -2)
>   [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 
> __btrfs_unlink_inode+0x17b/0x355 [btrfs]
>   [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax 
> ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd 
> glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw 
> parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress 
> zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq 
> async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 
> multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata 
> virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last 
> unloaded: btrfs]
>   [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: GW
> 4.15.0-rc9-btrfs-next-56+ #1
>   [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>   [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
>   [71813.679669] RSP: 0018:c90001cef738 EFLAGS: 00010286
>   [71813.679669] RAX: 0025 RBX: 880217ce4708 RCX: 
> 0001
>   [71813.679669] RDX:  RSI: 81c14bae RDI: 
> 
>   [71813.679669] RBP: c90001cef7c0 R08: 0001 R09: 
> 0001
>   [71813.679669] R10: c90001cef5e0 R11: 8343f007 R12: 
> 880217d474c8
>   [71813.679669] R13: fffe R14: 88021ccf1548 R15: 
> 0101
>   [71813.679669] FS:  7f7cee84c480() GS:88023fc8() 
> knlGS:
>   [71813.679669] CS:  0010 DS:  ES:  CR0: 80050033
>   [71813.679669] CR2: 7f7cedc1abf9 CR3: 0002354b4003 CR4: 
> 001606e0
>   [71813.679669] Call Trace:
>   [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
>   [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
>   [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
>   [71813.679669]  ? __lock_is_held+0x39/0x71
>   [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
>   [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
>   [71813.679669]  ? rcu_read_unlock+0x3a/0x57
>   [71813.679669]  ? __lock_is_held+0x39/0x71
>   [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
>   [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
>   [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
>   [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
>   [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
>   [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
>   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
>   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
>   [71813.679669]  ? mount_fs+0x64/0x10b
>   [71813.679669]  mount_fs+0x64/0x10b
>   [71813.679669]  vfs_kern_mount+0x68/0xce
>   [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
>   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
>   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
>   [71813.679669]  ? mount_fs+0x64/0x10b
>   [71813.679669]  mount_fs+0x64/0x10b
>   [71813.679669]  vfs_kern_mount+0x68/0xce
>   [71813.679669]  do_mount+0x6e5/0x973
>   [71813.679669]  ? memdup_user+0x3e/0x5c
>   [71813.679669]  SyS_mount+0x72/0x98
>   [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
>   [71813.679669] RIP: 0033:0x7f7cedf150ba
>   [71813.679669] RSP: 002b:7ffca71da688 EFLAGS: 0206
>   [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 
> 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 
> <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
>   [71813.679669] ---[ end trace 83bd473fc5b4663b ]---
>   [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: 
> errno=-2 No such entry
>   [71813.886994] BTRFS: error (device 

[PATCH] Btrfs: fix log replay failure after unlink and link combination

2018-02-28 Thread fdmanana
From: Filipe Manana 

If we have a file with 2 (or more) hard links in the same directory,
remove one of the hard links, create a new file (or link an existing file)
in the same directory with the name of the removed hard link, and then
finally fsync the new file, we end up with a log that fails to replay,
causing a mount failure.

Example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ mkdir /mnt/testdir
  $ touch /mnt/testdir/foo
  $ ln /mnt/testdir/foo /mnt/testdir/bar

  $ sync

  $ unlink /mnt/testdir/bar
  $ touch /mnt/testdir/bar
  $ xfs_io -c "fsync" /mnt/testdir/bar

  

  $ mount /dev/sdb /mnt
  mount: mount(2) failed: /mnt: No such file or directory

When replaying the log, for that example, we also see the following in
dmesg/syslog:

  [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, 
inode 258 parent 257
  [71813.674204] [ cut here ]
  [71813.675694] BTRFS: Transaction aborted (error -2)
  [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 
__btrfs_unlink_inode+0x17b/0x355 [btrfs]
  [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax 
ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd 
glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw 
parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress 
zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath 
linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci 
virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs]
  [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: GW
4.15.0-rc9-btrfs-next-56+ #1
  [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
  [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
  [71813.679669] RSP: 0018:c90001cef738 EFLAGS: 00010286
  [71813.679669] RAX: 0025 RBX: 880217ce4708 RCX: 
0001
  [71813.679669] RDX:  RSI: 81c14bae RDI: 

  [71813.679669] RBP: c90001cef7c0 R08: 0001 R09: 
0001
  [71813.679669] R10: c90001cef5e0 R11: 8343f007 R12: 
880217d474c8
  [71813.679669] R13: fffe R14: 88021ccf1548 R15: 
0101
  [71813.679669] FS:  7f7cee84c480() GS:88023fc8() 
knlGS:
  [71813.679669] CS:  0010 DS:  ES:  CR0: 80050033
  [71813.679669] CR2: 7f7cedc1abf9 CR3: 0002354b4003 CR4: 
001606e0
  [71813.679669] Call Trace:
  [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
  [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
  [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
  [71813.679669]  ? __lock_is_held+0x39/0x71
  [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
  [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
  [71813.679669]  ? rcu_read_unlock+0x3a/0x57
  [71813.679669]  ? __lock_is_held+0x39/0x71
  [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
  [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
  [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
  [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
  [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
  [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
  [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
  [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
  [71813.679669]  ? mount_fs+0x64/0x10b
  [71813.679669]  mount_fs+0x64/0x10b
  [71813.679669]  vfs_kern_mount+0x68/0xce
  [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
  [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
  [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
  [71813.679669]  ? mount_fs+0x64/0x10b
  [71813.679669]  mount_fs+0x64/0x10b
  [71813.679669]  vfs_kern_mount+0x68/0xce
  [71813.679669]  do_mount+0x6e5/0x973
  [71813.679669]  ? memdup_user+0x3e/0x5c
  [71813.679669]  SyS_mount+0x72/0x98
  [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
  [71813.679669] RIP: 0033:0x7f7cedf150ba
  [71813.679669] RSP: 002b:7ffca71da688 EFLAGS: 0206
  [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 
02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 
44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
  [71813.679669] ---[ end trace 83bd473fc5b4663b ]---
  [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: 
errno=-2 No such entry
  [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 
No such entry (Failed to recover log tree)
  [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned 
-30
  [71814.128078] BTRFS error (device dm-0): open_ctree failed

This happens because the log has inode reference