Dear BTRFS developers,

First of all -- thanks for developing BTRFS! So far it has served really well, while others fell (or failed) behind in my initial evaluation (http://datalad.org/test_fs_analysis.html). With btrbk, backups are a breeze. Unfortunately, it still fails completely for me at times.
I know that I should upgrade the kernel, and I will now... but I thought I would share these incident reports since they might be of some value.

Running Debian jessie, but with a manually built kernel. btrfs is used extensively for a metadata-heavy partition (lots of symlinks, lots of directories with a single file in them -- heavy use of git-annex), snapshots are taken regularly, etc.

Setup -- btrfs on top of software RAIDs:

# btrfs fi show /mnt/btrfs/
Label: 'tank' uuid: b5fe7f5e-3478-4293-a42c-bf9ca26ea724
    Total devices 4 FS bytes used 21.07TiB
    devid 2 size 10.92TiB used 5.30TiB path /dev/md10
    devid 3 size 10.92TiB used 5.30TiB path /dev/md11
    devid 4 size 10.92TiB used 5.30TiB path /dev/md12
    devid 5 size 10.92TiB used 5.30TiB path /dev/md13

Within the last 5 days, the beast has stalled twice. The last signs were:

* 20160605 -- kernel kaboomed at btrfs level

smaug login: [3675876.734400] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa03d0354
[3675876.734400]
[3675876.745680] CPU: 9 PID: 651474 Comm: git Tainted: G W IO 4.6.0-rc4+ #1
[3675876.753272] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 09/17/2014
[3675876.760431] 0000000000000086 000000005e62edd4 ffffffff813098f5 ffffffff817cd080
[3675876.768104] ffff880036f23da8 ffffffff811701af ffff881e00000010 ffff880036f23db8
[3675876.775763] ffff880036f23d50 000000005e62edd4 ffff880036f23d88 ffffffffa03d0354
[3675876.783426] Call Trace:
[3675876.786057] [<ffffffff813098f5>] ? dump_stack+0x5c/0x77
[3675876.791575] [<ffffffff811701af>] ? panic+0xdf/0x226
[3675876.796812] [<ffffffffa03d0354>] ? btrfs_add_link+0x384/0x3e0 [btrfs]
[3675876.803549] [<ffffffff8107abf7>] ? __stack_chk_fail+0x17/0x30
[3675876.809610] [<ffffffffa03d0354>] ? btrfs_add_link+0x384/0x3e0 [btrfs]
[3675876.816391] [<ffffffffa03d1273>] ? btrfs_link+0x143/0x220 [btrfs]
[3675876.822802] [<ffffffff811fea9f>] ? vfs_link+0x1af/0x280
[3675876.828331] [<ffffffff812020ba>] ? SyS_link+0x22a/0x260
[3675876.833859] [<ffffffff815ba436>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
[3675876.840740] Kernel Offset: disabled
[3675876.854050] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa03d0354
[3675876.854050]

* 20160610 -- again, different kaboom

[443370.085059] CPU: 10 PID: 1044513 Comm: git-annex Tainted: G W IO 4.6.0-rc4+ #1
[443370.093268] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 09/17/2014
[443370.100356] task: ffff8806c463d0c0 ti: ffff8808f9dc8000 task.ti: ffff8808f9dc8000
[443370.107953] RIP: 0010:[<ffff88090f67be10>] [<ffff88090f67be10>] 0xffff88090f67be10
[443370.115761] RSP: 0018:ffff8808f9dcbe18 EFLAGS: 00010292
[443370.121187] RAX: ffff88103fd95fc0 RBX: ffff8808f9dcc000 RCX: 0000000000000000
[443370.128438] RDX: 00000000ffffffff RSI: ffff8806c463d0c0 RDI: ffff88103fd95fc0
[443370.135693] RBP: ffff8808f9dcbe30 R08: ffff8808f9dc8000 R09: 0000000000000000
[443370.142940] R10: 000000000000000a R11: 0000000000000000 R12: ffff881035beedc8
[443370.150184] R13: ffff880ff1106800 R14: ffff88123d6c0000 R15: ffff88123d6c0068
[443370.157432] FS: 00007f0ab3d83740(0000) GS:ffff88103fd80000(0000) knlGS:0000000000000000
[443370.165645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[443370.171512] CR2: ffff88090f67be10 CR3: 0000000cf7516000 CR4: 00000000001406e0
[443370.178758] Stack:
[443370.180880] ffff88069dda93c0 ffffffffa0358700 ffff88069dda93c0 ffff880f00000000
[443370.188490] ffff8806c463d0c0 ffffffff810bb560 ffff8808f9dcbe48 ffff8808f9dcbe48
[443370.196107] 00000000d5ce3509 ffff88069dda93c0 0000000000000001 ffff8806a64835c8
[443370.203726] Call Trace:
[443370.206310] [<ffffffffa0358700>] ? btrfs_commit_transaction+0x350/0xa30 [btrfs]
[443370.213826] [<ffffffff810bb560>] ? wait_woken+0x90/0x90
[443370.219280] [<ffffffffa036fb6b>] ? btrfs_sync_file+0x2fb/0x3d0 [btrfs]
[443370.226012] [<ffffffff81222a48>] ? do_fsync+0x38/0x60
[443370.231267] [<ffffffff81222ccf>] ? SyS_fdatasync+0xf/0x20
[443370.236870] [<ffffffff815ba436>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
[443370.243604] Code: 88 ff ff 21 67 5b 81 ff ff ff ff 00 00 6c 3d 12 88 ff ff dd 77 35 a0 ff ff ff ff 00 00 00 00 00 00 00 00 40 e0 91 4b 08 88 ff ff <60> b5 0b 81 ff ff ff ff f0 fd 61 8a 0c 88 ff ff 18 7c 79 3e 00
[443370.264107] RIP [<ffff88090f67be10>] 0xffff88090f67be10
[443370.271044] RSP <ffff8808f9dcbe18>
[443370.276177] CR2: ffff88090f67be10
[443370.284979] ---[ end trace 2c4b690b49d17ebd ]---

For the last case, here are more details: the dmesg shows other tracebacks and errors that were apparently logged earlier, so it might be of help:

http://www.onerussian.com/tmp/dmesg-nonet.20160610.txt

Are those issues something that has been fixed since 4.6.0-rc4+, or should I be on the lookout for them to come back? What other information should I provide if I run into them again to help you troubleshoot/fix them?

P.S. Please CC me on replies.

--
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik