RichACLs for BTRFS? (this time complete)
Hi btrfs-developers,

I just read about the possible/planned merge of the richacl patches into Linux kernel 4.4; see:

http://lwn.net/Articles/661078/
http://lwn.net/Articles/661357/

Will btrfs support richacls with kernel 4.4? According to the btrfs wiki, this topic has not yet been claimed:

https://btrfs.wiki.kernel.org/index.php/Project_ideas#RichACLs_.2F_NFS4_ACLS

As we'd like to use btrfs with NFSv4, I'd really like to see richacls on btrfs. I hope someone can comment on this topic.

Bye,
   Marcel

PS: Please excuse my former incomplete posting.
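For context, richacls are NFSv4-style ACLs managed from userspace with the getrichacl/setrichacl tools described in the LWN articles above. A minimal sketch of what the intended usage might look like, assuming a richacl-enabled kernel with btrfs support and the richacl-tools package; the path, file and user name are made up, and the ACE text form is quoted from memory of richacl(7), so treat the exact syntax as an assumption:

    # Hypothetical usage - /srv/export and user 'alice' are placeholders
    touch /srv/export/report.txt

    # Grant user 'alice' read/write via an allow ACE
    # (ACE text format per richacl(7): who:permissions:flags:type)
    setrichacl --set 'alice:rw::allow' /srv/export/report.txt

    # Show the resulting ACL
    getrichacl /srv/export/report.txt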
linux 4.1 - memory leak (possibly dedup related)
Hi,

I've been running some btrfs tests (mainly duperemove-related) with Linux kernel 4.1 for the last few days. Now I noticed by accident (dying processes) that all my memory (128 GB!) is gone. Gone meaning: there's no user space process allocating this memory.

Digging deeper, I found the missing memory using slabtop (output of /proc/slabinfo is attached). It looks like I got a lot of kernel memory allocated by kmalloc-1024 (a memory leak?). Given the fact that the test machine does little more than btrfs testing, I think this may be btrfs-related. I was running duperemove on a 1.5 TB volume around the time the first out-of-memory errors were logged, so maybe the memory leak can be found somewhere in this code path.

I'm still waiting for a scrub run to finish; after that I'll reboot the machine and try to reproduce this behaviour with a fresh btrfs filesystem.

Have there been any fixes concerning memory leaks since the 4.1 release that I could try? Any other ideas how to track down this potential memory leak?

Bye,
   Marcel

slabinfo - version: 2.1
# name                 <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
btrfs_delayed_data_ref  2394  2982   96  42  1 : tunables 0 0 0 : slabdata  71  71  0
btrfs_delayed_tree_ref  3726  4600   88  46  1 : tunables 0 0 0 : slabdata 100 100  0
btrfs_delayed_ref_head  1900  2100  160  25  1 : tunables 0 0 0 : slabdata  84  84  0
btrfs_delayed_node      1561  1794  304  26  2 : tunables 0 0 0 : slabdata  69  69  0
btrfs_ordered_extent    1140  1558  424  38  4 : tunables 0 0 0 : slabdata  41  41  0
bio-2                   1600  1625  320  25  2 : tunables 0 0 0 : slabdata  65  65  0
btrfs_extent_buffer      688  1276  280  29  2 : tunables 0 0 0 : slabdata  44  44  0
btrfs_extent_state      4539  4590   80  51  1 : tunables 0 0 0 : slabdata  90  90  0
btrfs_delalloc_work        0     0  152  26  1 : tunables 0 0 0 : slabdata   0   0  0
btrfs_transaction        176   176  360  22  2 : tunables 0 0 0 : slabdata   8   8  0
btrfs_trans_handle       184   184  176  23  1 : tunables 0 0 0 : slabdata   8   8  0
btrfs_inode             1636  6388  984  33  8 : tunables 0 0 0 : slabdata 205 205  0
nfs4_layout_stateid        0     0  240  34  2 : tunables 0 0 0 : slabdata   0   0  0
nfsd4_delegations          0     0  224  36  2 : tunables 0 0 0 : slabdata   0   0  0
nfsd4_files                0     0  288  28  2 : tunables 0 0 0 : slabdata   0   0  0
nfsd4_openowners           0     0  440  37  4 : tunables 0 0 0 : slabdata   0   0  0
nfs_direct_cache           0     0  352  23  2 : tunables 0 0 0 : slabdata   0   0  0
nfs_commit_data           23    23  704  23  4 : tunables 0 0 0 : slabdata   1   1  0
nfs_inode_cache            0     0 1000  32  8 : tunables 0 0 0 : slabdata   0   0  0
rpc_inode_cache           50    50  640  25  4 : tunables 0 0 0 : slabdata   2   2  0
fscache_cookie_jar        46    46   88  46  1 : tunables 0 0 0 : slabdata   1   1  0
ext3_inode_cache         160   160  808  20  4 : tunables 0 0 0 : slabdata   8   8  0
journal_handle          1360  1360   24 170  1 : tunables 0 0 0 : slabdata   8   8  0
ext4_groupinfo_4k       3887  6636  144  28  1 : tunables 0 0 0 : slabdata 237 237  0
ip6-frags                  0     0  216  37  2 : tunables 0 0 0 : slabdata   0   0  0
UDPLITEv6                  0     0 1088  30  8 : tunables 0 0 0 : slabdata   0   0  0
UDPv6                    240   240 1088  30  8 : tunables 0 0 0 : slabdata   8   8  0
tw_sock_TCPv6             58    58  280  29  2 : tunables 0 0 0 : slabdata   2   2  0
TCPv6                    112   112 2240  14  8 : tunables 0 0 0 : slabdata   8   8  0
kcopyd_job                 0     0 3312   9  8 : tunables 0 0 0 : slabdata   0   0  0
dm_uevent                  0     0 2632  12  8 : tunables 0 0 0 : slabdata   0   0  0
cfq_queue                  0     0  232  35  2 : tunables 0 0 0 : slabdata   0   0  0
bsg_cmd                    0     0  312  26  2 : tunables 0 0 0 : slabdata   0   0  0
mqueue_inode_cache        36    36  896  36  8 : tunables 0 0 0 : slabdata   1   1  0
fuse_request               0     0  400  20  2 : tunables 0 0 0 : slabdata   0   0  0
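Not mentioned in the report itself, but one standard way to chase a suspected kernel memory leak like this is the kmemleak detector. A minimal sketch, assuming the test kernel is rebuilt with CONFIG_DEBUG_KMEMLEAK=y:

    # Requires a kernel built with CONFIG_DEBUG_KMEMLEAK=y
    # (plus kmemleak=on on the command line if built with DEFAULT_OFF)
    mount -t debugfs none /sys/kernel/debug   # if debugfs is not already mounted

    # ... run the duperemove workload, then trigger a scan ...
    echo scan > /sys/kernel/debug/kmemleak

    # List suspected leaks with their allocation backtraces
    cat /sys/kernel/debug/kmemleak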
Re: Status: converting raid levels
Hi,

I tried to revert the mentioned patch set (kernel 4.0.0-rc2). Starting a new rebalance with this kernel while running my I/O test (big dd write) on the same btrfs volume (14 disks) resulted in "cpu stuck" messages; the system was unusable just a few seconds later.

With a plain 4.0.0-rc2 kernel the rebalance at least finishes without crashing/hanging the system, but, on the downside, it does not change the raid layout of the btrfs volume :-(

Any other ideas?

2015-03-07 10:39 GMT+01:00 Filipe David Manana <fdman...@gmail.com>:
> On Fri, Mar 6, 2015 at 10:22 AM, Marcel Ritter <ritter.mar...@gmail.com> wrote:
>> Hi,
>>
>> please, can someone comment on the current status of raid level migration?
>> (kernel 4.0.0-rc2, btrfs-progs 3.19-rc2)
>>
>> I just started testing this feature, and it doesn't seem to work.
>>
>> Starting with Raid1:
>>
>> root@thunder[ ~/btrfs-progs ]# ./btrfs fi df /tmp/m
>> Data, RAID1: total=3.00GiB, used=512.00KiB
>> Data, single: total=8.00MiB, used=0.00B
>> System, RAID1: total=8.00MiB, used=16.00KiB
>> System, single: total=4.00MiB, used=0.00B
>> Metadata, RAID1: total=1.00GiB, used=112.00KiB
>> Metadata, single: total=8.00MiB, used=0.00B
>> GlobalReserve, single: total=16.00MiB, used=0.00B
>>
>> Converting to Raid10:
>>
>> root@thunder[ ~/btrfs-progs ]# ./btrfs balance start -dconvert=raid10 -mconvert=raid10 /tmp/m/
>> Done, had to relocate 9 out of 9 chunks
>>
>> Still Raid1 ...:
>>
>> root@thunder[ ~/btrfs-progs ]# ./btrfs fi df /tmp/m
>> Data, RAID1: total=35.00GiB, used=31.49GiB
>> System, RAID1: total=32.00MiB, used=16.00KiB
>> Metadata, RAID1: total=1.00GiB, used=33.86MiB
>> GlobalReserve, single: total=16.00MiB, used=0.00B
>>
>> I also tried conversion of raid6 - but it did not work either (according to btrfs fi df output).
>>
>> Did I miss something, or is it a bug?
>
> Seems like the regression Holger found due to a patch added to 4.0:
> http://www.spinics.net/lists/linux-btrfs/msg42084.html
> (he mentions a 3.18.x kernel, but that's because he builds his own
> kernels with patches backported)
>
>> Bye,
>>    Marcel
>
> --
> Filipe David Manana,
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."
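Not a fix, but for anyone reproducing this: it may be worth watching the balance state while the conversion runs, and re-running with the "soft" filter afterwards, which only touches chunks that do not yet have the target profile. A small sketch, reusing the /tmp/m mount point from the quoted output:

    # Start the conversion and watch its progress from another shell
    btrfs balance start -dconvert=raid10 -mconvert=raid10 /tmp/m &
    btrfs balance status -v /tmp/m

    # Afterwards, re-run with the 'soft' filter: chunks that already
    # have the target profile are skipped, so a second pass should be
    # a cheap no-op if the first conversion really worked
    btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft /tmp/m
    btrfs fi df /tmp/m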
Status: converting raid levels
Hi,

please, can someone comment on the current status of raid level migration?
(kernel 4.0.0-rc2, btrfs-progs 3.19-rc2)

I just started testing this feature, and it doesn't seem to work.

Starting with Raid1:

root@thunder[ ~/btrfs-progs ]# ./btrfs fi df /tmp/m
Data, RAID1: total=3.00GiB, used=512.00KiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

Converting to Raid10:

root@thunder[ ~/btrfs-progs ]# ./btrfs balance start -dconvert=raid10 -mconvert=raid10 /tmp/m/
Done, had to relocate 9 out of 9 chunks

Still Raid1 ...:

root@thunder[ ~/btrfs-progs ]# ./btrfs fi df /tmp/m
Data, RAID1: total=35.00GiB, used=31.49GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=33.86MiB
GlobalReserve, single: total=16.00MiB, used=0.00B

I also tried conversion of raid6 - but it did not work either (according to btrfs fi df output).

Did I miss something, or is it a bug?

Bye,
   Marcel
Re: Regression: kernel 4.0.0-rc1 - soft lockups
Hi,

just a short update on this topic: I also tried the Ubuntu 4.0.0-rc1 ppa kernel - the problems are still there. Luckily, kernel 4.0.0-rc2 was released yesterday: I updated my machine to kernel 4.0.0-rc2 and the problems are gone (the test script has been running fine for about 12 hours now).

Bye,
   Marcel

2015-03-03 12:05 GMT+01:00 Liu Bo <bo.li@oracle.com>:
> On Tue, Mar 03, 2015 at 08:31:10AM +0100, Marcel Ritter wrote:
>> Hi,
>>
>> yes, it is reproducible. Just creating a new btrfs filesystem (14 disks,
>> data/mdata raid6, latest git btrfs-progs) and mounting this filesystem
>> causes the system to hang (I think I once even got it mounted, but it did
>> hang shortly after, when dd started writing to it).
>>
>> I just ran some quick tests, and (at least at first sight) it looks like
>> the raid5/6 code may cause the trouble. I created different btrfs
>> filesystem types, mounted them and (if possible) did a big dd on the
>> filesystem:
>>
>> mkfs.btrfs /dev/cciss/c1d* -m raid0 -d raid0 -f  -> no problem (only short test)
>> mkfs.btrfs /dev/cciss/c1d* -m raid1 -d raid1 -f  -> no problem (only short test)
>> mkfs.btrfs /dev/cciss/c1d* -m raid5 -d raid5 -f  -> (almost) instant hang
>> mkfs.btrfs /dev/cciss/c1d* -m raid6 -d raid6 -f  -> (almost) instant hang (standard test)
>>
>> Once the machine is up again I'll do some more testing (varying the
>> combination of data and mdata raid levels).
>
> Hmm, just FYI, raid56 works well on my box with 4.0.0-rc1.
>
> Thanks,
> -liubo
>
>> Bye,
>>    Marcel
>>
>> 2015-03-03 7:37 GMT+01:00 Liu Bo <bo.li@oracle.com>:
>>> On Tue, Mar 03, 2015 at 07:02:01AM +0100, Marcel Ritter wrote:
>>>> Hi,
>>>>
>>>> yesterday I did a kernel update on my btrfs test system (Ubuntu 14.04.2)
>>>> from a custom-built kernel 3.19-rc6 to 4.0.0-rc1. Almost instantly after
>>>> starting my test script, the system got stuck with soft lockups (the
>>>> machine was running the very same test for weeks on the old kernel
>>>> without problems, basically doing massive streaming I/O on a raid6
>>>> btrfs volume). I found 2 types of messages in the logs, one btrfs-related
>>>> and one general one (related to native_flush_tlb_others):
>>>>
>>>> [...]
Regression: kernel 4.0.0-rc1 - soft lockups
Hi,

yesterday I did a kernel update on my btrfs test system (Ubuntu 14.04.2) from a custom-built kernel 3.19-rc6 to 4.0.0-rc1. Almost instantly after starting my test script, the system got stuck with soft lockups (the machine was running the very same test for weeks on the old kernel without problems, basically doing massive streaming I/O on a raid6 btrfs volume).

I found 2 types of messages in the logs. One is btrfs-related:

[34165.540004] INFO: rcu_sched detected stalls on CPUs/tasks: { 3 7} (detected by 6, t=6990777 jiffies, g=67455, c=67454, q=0)
[34165.540004] Task dump for CPU 3:
[34165.540004] mount D 8803ed266000 0 15156 15110 0x
[34165.540004] 0158 0014 8803ecc13718 8803ecc136d8
[34165.540004] 8106075a 0002
[34165.540004] ecc13728 8803eb603128
[34165.540004] Call Trace:
[34165.540004] [8106075a] ? __do_page_fault+0x2fa/0x440
[34165.540004] [810608d1] ? do_page_fault+0x31/0x70
[34165.540004] [81792778] ? page_fault+0x28/0x30
[34165.540004] [810ae2ce] ? pick_next_task_fair+0x53e/0x880
[34165.540004] [810ae2ce] ? pick_next_task_fair+0x53e/0x880
[34165.540004] [8109707c] ? dequeue_task+0x5c/0x80
[34165.540004] [8178b9a3] ? __schedule+0xf3/0x960
[34165.540004] [8178c247] ? schedule+0x37/0x90
[34165.540004] [a0896375] ? btrfs_start_ordered_extent+0xd5/0x110 [btrfs]
[34165.540004] [810b3cb0] ? prepare_to_wait_event+0x110/0x110
[34165.540004] [a0896884] ? btrfs_wait_ordered_range+0xc4/0x120 [btrfs]
[34165.540004] [a08c0c18] ? __btrfs_write_out_cache+0x378/0x470 [btrfs]
[34165.540004] [a08c104a] ? btrfs_write_out_cache+0x9a/0x100 [btrfs]
[34165.540004] [a086af79] ? btrfs_write_dirty_block_groups+0x159/0x560 [btrfs]
[34165.540004] [a08f2aa6] ? commit_cowonly_roots+0x18d/0x2a4 [btrfs]
[34165.540004] [a087bd31] ? btrfs_commit_transaction+0x521/0xa50 [btrfs]
[34165.540004] [a08a3fbe] ? btrfs_create_uuid_tree+0x5e/0x110 [btrfs]
[34165.540004] [a087963f] ? open_ctree+0x1dff/0x2200 [btrfs]
[34165.540004] [a084f7ce] ? btrfs_mount+0x75e/0x8f0 [btrfs]
[34165.540004] [811ecbf9] ? mount_fs+0x39/0x180
[34165.540004] [81192405] ? __alloc_percpu+0x15/0x20
[34165.540004] [812082bb] ? vfs_kern_mount+0x6b/0x120
[34165.540004] [8120afe4] ? do_mount+0x204/0xb30
[34165.540004] [8120bc0b] ? SyS_mount+0x8b/0xe0
[34165.540004] [817905ed] ? system_call_fastpath+0x16/0x1b
[34165.540004] Task dump for CPU 7:
[34165.540004] kworker/u16:1 R running task 0 14518 2 0x0008
[34165.540004] Workqueue: btrfs-freespace-write btrfs_freespace_write_helper [btrfs]
[34165.540004] 0200 8803eac6fdf8 a08ac242 8803eac6fe48
[34165.540004] 8108b64f f1091400 8803eca58000
[34165.540004] 8803ea9ed3c0 8803f1091418 8803f1091400 8803eca58000
[34165.540004] Call Trace:
[34165.540004] [a08ac242] ? btrfs_freespace_write_helper+0x12/0x20 [btrfs]
[34165.540004] [8108b64f] ? process_one_work+0x14f/0x420
[34165.540004] [8108be08] ? worker_thread+0x118/0x510
[34165.540004] [8108bcf0] ? rescuer_thread+0x3d0/0x3d0
[34165.540004] [81091212] ? kthread+0xd2/0xf0
[34165.540004] [81091140] ? kthread_create_on_node+0x180/0x180
[34165.540004] [8179053c] ? ret_from_fork+0x7c/0xb0
[34165.540004] [81091140] ? kthread_create_on_node+0x180/0x180

and one is general (related to native_flush_tlb_others):

[34152.604004] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [rs:main Q:Reg:490]
[34152.604004] Modules linked in: btrfs(E) xor(E) radeon(E) ttm(E) drm_kms_helper(E) kvm(E) drm(E) raid6_pq(E) i2c_algo_bit(E) ipmi_si(E) amd64_edac_mod(E) serio_raw(E) hpilo(E) hpwdt(E) edac_core(E) shpchp(E) k8temp(E) mac_hid(E) edac_mce_amd(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) lp(E) parport(E) hpsa(E) pata_acpi(E) hid_generic(E) psmouse(E) usbhid(E) bnx2(E) cciss(E) hid(E) pata_amd(E)
[34152.604004] CPU: 6 PID: 490 Comm: rs:main Q:Reg Tainted: G D W EL 4.0.0-rc1-custom #1
[34152.604004] Hardware name: HP ProLiant DL585 G2, BIOS A07 05/02/2011
[34152.604004] task: 8803eecd9910 ti: 8803ecb3 task.ti: 8803ecb3
[34152.604004] RIP: 0010:[810f1e3a] [810f1e3a] smp_call_function_many+0x20a/0x270
[34152.604004] RSP: 0018:8803ecb33cf8 EFLAGS: 0202
[34152.604004] RAX: RBX: 81cdd140 RCX: 8803ffc19700
[34152.604004] RDX: RSI: 0100 RDI:
[34152.604004] RBP: 8803ecb33d38 R08: 8803ffd961c8 R09: 0004
[34152.604004] R10: 0004 R11: 0246 R12:
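As an aside, traces like these are much easier to capture in full over netconsole when the machine locks up hard, since nothing may make it to the on-disk logs. A minimal sketch; the IP addresses, port numbers, interface name and MAC are all placeholders:

    # On the crashing machine: forward kernel messages via UDP
    # (source/target IPs, device and MAC below are made up)
    modprobe netconsole netconsole=6665@10.0.0.1/eth0,6666@10.0.0.2/00:11:22:33:44:55

    # On the receiving machine: log everything that arrives
    # (flag spelling varies between netcat flavours)
    nc -l -u 6666 | tee netconsole.log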
Re: Regression: kernel 4.0.0-rc1 - soft lockups
Hi,

yes, it is reproducible. Just creating a new btrfs filesystem (14 disks, data/mdata raid6, latest git btrfs-progs) and mounting this filesystem causes the system to hang (I think I once even got it mounted, but it did hang shortly after, when dd started writing to it).

I just ran some quick tests, and (at least at first sight) it looks like the raid5/6 code may cause the trouble. I created different btrfs filesystem types, mounted them and (if possible) did a big dd on the filesystem:

mkfs.btrfs /dev/cciss/c1d* -m raid0 -d raid0 -f  -> no problem (only short test)
mkfs.btrfs /dev/cciss/c1d* -m raid1 -d raid1 -f  -> no problem (only short test)
mkfs.btrfs /dev/cciss/c1d* -m raid5 -d raid5 -f  -> (almost) instant hang
mkfs.btrfs /dev/cciss/c1d* -m raid6 -d raid6 -f  -> (almost) instant hang (standard test)

Once the machine is up again I'll do some more testing, varying the combination of data and mdata raid levels (a rough sketch of that test matrix follows after the quoted text).

Bye,
   Marcel

2015-03-03 7:37 GMT+01:00 Liu Bo <bo.li@oracle.com>:
> On Tue, Mar 03, 2015 at 07:02:01AM +0100, Marcel Ritter wrote:
>> Hi,
>>
>> yesterday I did a kernel update on my btrfs test system (Ubuntu 14.04.2)
>> from a custom-built kernel 3.19-rc6 to 4.0.0-rc1. Almost instantly after
>> starting my test script, the system got stuck with soft lockups (the
>> machine was running the very same test for weeks on the old kernel
>> without problems, basically doing massive streaming I/O on a raid6
>> btrfs volume). I found 2 types of messages in the logs, one btrfs-related
>> and one general one (related to native_flush_tlb_others):
>>
>> [...]
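To make that follow-up testing concrete, here is a minimal sketch of such a raid-level test matrix. This is a reconstruction under assumptions, not the actual script: the device glob, mount point and dd size are placeholders, and after a raid5/6 hang the box of course needs a reboot before the loop can continue:

    #!/bin/bash
    # Hypothetical raid-level test matrix - adjust devices/mount point.
    DEVS="/dev/cciss/c1d*"
    MNT=/tmp/m

    for draid in raid0 raid1 raid5 raid6; do
        for mraid in raid0 raid1 raid5 raid6; do
            echo "=== data=$draid metadata=$mraid ==="
            mkfs.btrfs -f -d "$draid" -m "$mraid" $DEVS || continue
            # mounting any member device works here, since mkfs.btrfs
            # has just registered all of them with the kernel
            mount /dev/cciss/c1d0 "$MNT" || continue
            # big streaming write, synced so it actually hits the disks
            dd if=/dev/zero of="$MNT/bigfile" bs=1M count=10240 conv=fsync
            umount "$MNT"
        done
    done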
Kernel bug in 3.19-rc4
Hi,

I just started some btrfs stress testing on the latest Linux kernel, 3.19-rc4. A few hours later the filesystem stopped working - the kernel bug report can be found below.

The test consists of one massive I/O thread (writing 100 GB files with dd) and 2 tar instances extracting kernel sources and deleting them afterwards (I can provide the simple bash script doing this if needed; a rough sketch follows below the trace).

System information (Ubuntu 14.04.1, latest kernel):

root@thunder # uname -a
Linux thunder 3.19.0-rc4-custom #1 SMP Mon Jan 12 16:13:44 CET 2015 x86_64 x86_64 x86_64 GNU/Linux

root@thunder # /root/btrfs-progs/btrfs --version
Btrfs v3.18-36-g0173148

Tests are done on 14 SCSI disks, using raid6 for data and metadata:

root@thunder # /root/btrfs-progs/btrfs fi show
Label: 'raid6'  uuid: cbe34d2b-5f75-46cf-9263-9813028ebc19
        Total devices 14 FS bytes used 674.62GiB
        devid  1 size 279.39GiB used 59.24GiB path /dev/cciss/c1d0
        devid  2 size 279.39GiB used 59.22GiB path /dev/cciss/c1d1
        devid  3 size 279.39GiB used 59.22GiB path /dev/cciss/c1d10
        devid  4 size 279.39GiB used 59.22GiB path /dev/cciss/c1d11
        devid  5 size 279.39GiB used 59.22GiB path /dev/cciss/c1d12
        devid  6 size 279.39GiB used 59.22GiB path /dev/cciss/c1d13
        devid  7 size 279.39GiB used 59.22GiB path /dev/cciss/c1d2
        devid  8 size 279.39GiB used 59.22GiB path /dev/cciss/c1d3
        devid  9 size 279.39GiB used 59.22GiB path /dev/cciss/c1d4
        devid 10 size 279.39GiB used 59.22GiB path /dev/cciss/c1d5
        devid 11 size 279.39GiB used 59.22GiB path /dev/cciss/c1d6
        devid 12 size 279.39GiB used 59.22GiB path /dev/cciss/c1d7
        devid 13 size 279.39GiB used 59.22GiB path /dev/cciss/c1d8
        devid 14 size 279.39GiB used 59.22GiB path /dev/cciss/c1d9

Btrfs v3.18-36-g0173148

# This is provided for completeness only, and was taken
# some time *before* the kernel crash occurred, so the basic
# setup is the same, but allocated/free sizes won't match
root@thunder # /root/btrfs-progs/btrfs fi df /tmp/m
Data, single: total=8.00MiB, used=0.00B
Data, RAID6: total=727.45GiB, used=697.84GiB
System, single: total=4.00MiB, used=0.00B
System, RAID6: total=13.50MiB, used=64.00KiB
Metadata, single: total=8.00MiB, used=0.00B
Metadata, RAID6: total=3.43GiB, used=805.91MiB
GlobalReserve, single: total=272.00MiB, used=0.00B

Here's what happens after some hours of stress testing:

[85162.472989] ------------[ cut here ]------------
[85162.473071] kernel BUG at fs/btrfs/inode.c:3142!
[85162.473139] invalid opcode: [#1] SMP
[85162.473212] Modules linked in: btrfs(E) xor(E) raid6_pq(E) radeon(E) ttm(E) drm_kms_helper(E) drm(E) hpwdt(E) amd64_edac_mod(E) kvm(E) edac_core(E) shpchp(E) k8temp(E) serio_raw(E) hpilo(E) edac_mce_amd(E) mac_hid(E) i2c_algo_bit(E) ipmi_si(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) lp(E) fscache(E) parport(E) hid_generic(E) usbhid(E) hid(E) hpsa(E) psmouse(E) bnx2(E) cciss(E) pata_acpi(E) pata_amd(E)
[85162.473911] CPU: 4 PID: 3039 Comm: btrfs-cleaner Tainted: G E 3.19.0-rc4-custom #1
[85162.474028] Hardware name: HP ProLiant DL585 G2, BIOS A07 05/02/2011
[85162.474122] task: 88085b054aa0 ti: 88205ad4c000 task.ti: 88205ad4c000
[85162.474230] RIP: 0010:[a06a8182] [a06a8182] btrfs_orphan_add+0x1d2/0x1e0 [btrfs]
[85162.474422] RSP: 0018:88205ad4fc48 EFLAGS: 00010286
[85162.474497] RAX: ffe4 RBX: 8810a35d42f8 RCX: 88185b896000
[85162.474595] RDX: 6a54 RSI: 0004 RDI: 88185b896138
[85162.474694] RBP: 88205ad4fc88 R08: 0001e670 R09: 88016194b240
[85162.474793] R10: a06bd797 R11: ea0004f71800 R12: 88185baa2000
[85162.474892] R13: 88085f6d7630 R14: 88185baa2458 R15: 0001
[85162.474992] FS: 7fb3f27fb740() GS:88085fd0() knlGS:
[85162.475105] CS: 0010 DS: ES: CR0: 8005003b
[85162.475184] CR2: 7f896c02c220 CR3: 00085b328000 CR4: 07e0
[85162.475286] Stack:
[85162.475318] 88205ad4fc88 a06e6a14 88185b896b04 88105b03e800
[85162.475442] 88016194b240 8810a35d42f8 881e8ffe9a00 88133dc48ea0
[85162.475561] 88205ad4fd18 a0691a57 88016194b244 88016194b240
[85162.475680] Call Trace:
[85162.475738] [a06e6a14] ? lookup_free_space_inode+0x44/0x100 [btrfs]
[85162.475849] [a0691a57] btrfs_remove_block_group+0x137/0x740 [btrfs]
[85162.475964] [a06ca8d2] btrfs_remove_chunk+0x672/0x780 [btrfs]
[85162.476065] [a06922bf] btrfs_delete_unused_bgs+0x25f/0x280 [btrfs]
[85162.476172] [a0699e0c] cleaner_kthread+0x12c/0x190 [btrfs]
[85162.476269] [a0699ce0] ? check_leaf+0x350/0x350 [btrfs]
[85162.476355] [8108f8d2] kthread+0xd2/0xf0
[85162.476424] [8108f800] ? kthread_create_on_node+0x180/0x180
[85162.476519] [8177bcbc]
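For reference, a minimal sketch of the kind of stress script described above - a reconstruction from the description, not the exact script used; the mount point, the kernel tarball location and the file rotation are placeholders:

    #!/bin/bash
    # Hypothetical reconstruction of the stress test described above:
    # one big sequential writer plus two tar loops untarring and
    # removing kernel sources. MNT and SRC are placeholders.
    MNT=/tmp/m
    SRC=/root/linux-3.19-rc4.tar.xz

    writer() {
        local i=0
        while true; do
            # 100 GB streaming write, rotating over four file names
            dd if=/dev/zero of="$MNT/big.$((i++ % 4))" bs=1M count=102400 conv=fsync
        done
    }

    tarloop() {
        local dir="$MNT/tar.$1"
        mkdir -p "$dir"
        while true; do
            tar -xJf "$SRC" -C "$dir"
            rm -rf "$dir"/*
        done
    }

    writer &
    tarloop 1 &
    tarloop 2 &
    wait    # runs until interrupted (or the filesystem dies)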