RichACLs for BTRFS? (this time complete)

2015-10-30 Thread Marcel Ritter
Hi btrfs-developers,

I just read about the possible/planned merge of the richacl patches into
Linux kernel 4.4:

see http://lwn.net/Articles/661078/
see http://lwn.net/Articles/661357/

Will btrfs support richacls with kernel 4.4?

According to the btrfs wiki, this topic has not been claimed:

https://btrfs.wiki.kernel.org/index.php/Project_ideas#RichACLs_.2F_NFS4_ACLS

As we'd like to use btrfs with NFSv4 I'd really like to see richacls on btrfs.
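
To illustrate what we're after, here is roughly how the ACLs would be
managed once support lands (a sketch - syntax as I understand it from the
richacl tools, and the user and path are made up):

setrichacl --set 'user:alice:rwx::allow' /srv/nfs4/shared
getrichacl /srv/nfs4/shared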

Hope someone can comment on this topic.

Bye,
   Marcel

PS: Please excuse my former incomplete posting.

linux 4.1 - memory leak (possibly dedup related)

2015-07-03 Thread Marcel Ritter
Hi,

I've been running some btrfs tests (mainly duperemove related) with
linux kernel 4.1 for the last few days.

Now I noticed by accident (dying processes) that all my memory (128
GB!) is gone.
Gone meaning there's no user-space process allocating this memory.

Digging deeper, I found the missing memory using slabtop (the output of
/proc/slabinfo is attached): it looks like a lot of kernel memory is
allocated from kmalloc-1024 (a memory leak?).
Given that the test machine does little more than btrfs testing, I
think this may be btrfs-related.
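
For reference, this is a quick way to rank the caches by approximate
footprint (a sketch, assuming the usual /proc/slabinfo column order of
name / active_objs / num_objs / objsize):

awk 'NR>2 {printf "%-28s %10d KiB\n", $1, $3*$4/1024}' /proc/slabinfo | sort -rn -k2 | head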

I was running duperemove on a 1.5 TB volume around the time the first
out-of-memory errors were logged, so maybe the memory leak can be
found somewhere in that code path.

I'm still waiting for a scrub run to finish; after that I'll reboot
the machine and try to reproduce this behaviour with a fresh btrfs
filesystem.

Have there been any memory-leak fixes since the 4.1 release that I could try?
Any other ideas on how to track down this potential memory leak?
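
One thing I plan to try next is kmemleak (a sketch - this assumes the
kernel was built with CONFIG_DEBUG_KMEMLEAK=y):

mount -t debugfs none /sys/kernel/debug    # if not already mounted
echo scan > /sys/kernel/debug/kmemleak     # trigger a scan
cat /sys/kernel/debug/kmemleak             # list suspected leaks with backtraces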

Bye,
   Marcel
slabinfo - version: 2.1
# name                <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
btrfs_delayed_data_ref    2394   2982    96   42  1 : tunables 0 0 0 : slabdata  71  71  0
btrfs_delayed_tree_ref    3726   4600    88   46  1 : tunables 0 0 0 : slabdata 100 100  0
btrfs_delayed_ref_head    1900   2100   160   25  1 : tunables 0 0 0 : slabdata  84  84  0
btrfs_delayed_node        1561   1794   304   26  2 : tunables 0 0 0 : slabdata  69  69  0
btrfs_ordered_extent      1140   1558   424   38  4 : tunables 0 0 0 : slabdata  41  41  0
bio-2                     1600   1625   320   25  2 : tunables 0 0 0 : slabdata  65  65  0
btrfs_extent_buffer        688   1276   280   29  2 : tunables 0 0 0 : slabdata  44  44  0
btrfs_extent_state        4539   4590    80   51  1 : tunables 0 0 0 : slabdata  90  90  0
btrfs_delalloc_work          0      0   152   26  1 : tunables 0 0 0 : slabdata   0   0  0
btrfs_transaction          176    176   360   22  2 : tunables 0 0 0 : slabdata   8   8  0
btrfs_trans_handle         184    184   176   23  1 : tunables 0 0 0 : slabdata   8   8  0
btrfs_inode               1636   6388   984   33  8 : tunables 0 0 0 : slabdata 205 205  0
nfs4_layout_stateid          0      0   240   34  2 : tunables 0 0 0 : slabdata   0   0  0
nfsd4_delegations            0      0   224   36  2 : tunables 0 0 0 : slabdata   0   0  0
nfsd4_files                  0      0   288   28  2 : tunables 0 0 0 : slabdata   0   0  0
nfsd4_openowners             0      0   440   37  4 : tunables 0 0 0 : slabdata   0   0  0
nfs_direct_cache             0      0   352   23  2 : tunables 0 0 0 : slabdata   0   0  0
nfs_commit_data             23     23   704   23  4 : tunables 0 0 0 : slabdata   1   1  0
nfs_inode_cache              0      0  1000   32  8 : tunables 0 0 0 : slabdata   0   0  0
rpc_inode_cache             50     50   640   25  4 : tunables 0 0 0 : slabdata   2   2  0
fscache_cookie_jar          46     46    88   46  1 : tunables 0 0 0 : slabdata   1   1  0
ext3_inode_cache           160    160   808   20  4 : tunables 0 0 0 : slabdata   8   8  0
journal_handle            1360   1360    24  170  1 : tunables 0 0 0 : slabdata   8   8  0
ext4_groupinfo_4k         3887   6636   144   28  1 : tunables 0 0 0 : slabdata 237 237  0
ip6-frags                    0      0   216   37  2 : tunables 0 0 0 : slabdata   0   0  0
UDPLITEv6                    0      0  1088   30  8 : tunables 0 0 0 : slabdata   0   0  0
UDPv6                      240    240  1088   30  8 : tunables 0 0 0 : slabdata   8   8  0
tw_sock_TCPv6               58     58   280   29  2 : tunables 0 0 0 : slabdata   2   2  0
TCPv6                      112    112  2240   14  8 : tunables 0 0 0 : slabdata   8   8  0
kcopyd_job                   0      0  3312    9  8 : tunables 0 0 0 : slabdata   0   0  0
dm_uevent                    0      0  2632   12  8 : tunables 0 0 0 : slabdata   0   0  0
cfq_queue                    0      0   232   35  2 : tunables 0 0 0 : slabdata   0   0  0
bsg_cmd                      0      0   312   26  2 : tunables 0 0 0 : slabdata   0   0  0
mqueue_inode_cache          36     36   896   36  8 : tunables 0 0 0 : slabdata   1   1  0
fuse_request                 0      0   400   20  2 : tunables 0 0 0 : slabdata   0   0  0

Re: Status: converting raid levels

2015-03-09 Thread Marcel Ritter
Hi,

I tried to revert the mentioned patch set (kernel 4.0.0-rc2).

Starting a new rebalance with this kernel, while running my I/O test
(big dd write) on the same btrfs volume (14 disks), resulted in CPU
stuck messages - the system became unusable just a few seconds later.

With a plain 4.0.0-rc2 kernel the rebalance at least does finish
without crashing/hanging the system, but - on the downside - it
does not change the raid layout of the btrfs volume :-(

Any other ideas?

2015-03-07 10:39 GMT+01:00 Filipe David Manana fdman...@gmail.com:
 On Fri, Mar 6, 2015 at 10:22 AM, Marcel Ritter ritter.mar...@gmail.com wrote:
 Hi,

 please, can someone comment on the current status of raid level migration?
 (kernel 4.0.0-rc2, btrfs-progs 3.19-rc2)

 I just started testing this feature, and it doesn't seem to work:

 Starting with Raid1:

 root@thunder[ ~/btrfs-progs ]# ./btrfs fi df /tmp/m
 Data, RAID1: total=3.00GiB, used=512.00KiB
 Data, single: total=8.00MiB, used=0.00B
 System, RAID1: total=8.00MiB, used=16.00KiB
 System, single: total=4.00MiB, used=0.00B
 Metadata, RAID1: total=1.00GiB, used=112.00KiB
 Metadata, single: total=8.00MiB, used=0.00B
 GlobalReserve, single: total=16.00MiB, used=0.00B

 Converting to Raid10:

 root@thunder[ ~/btrfs-progs ]# ./btrfs balance start -dconvert=raid10
 -mconvert=raid10 /tmp/m/
 Done, had to relocate 9 out of 9 chunks

 Still Raid1 ...:

 root@thunder[ ~/btrfs-progs ]# ./btrfs fi df /tmp/m
 Data, RAID1: total=35.00GiB, used=31.49GiB
 System, RAID1: total=32.00MiB, used=16.00KiB
 Metadata, RAID1: total=1.00GiB, used=33.86MiB
 GlobalReserve, single: total=16.00MiB, used=0.00B

 I also tried a conversion to raid6, but it did not work either
 (according to btrfs fi df output).

 Did I miss something, or is it a bug?

 Seems like the regression Holger found due to a patch added to 4.0:

 http://www.spinics.net/lists/linux-btrfs/msg42084.html

 (he mentions a 3.18.x kernel, but that's because he builds his own
 kernels with patches backported)


 Bye,
 Marcel



 --
 Filipe David Manana,

 Reasonable men adapt themselves to the world.
  Unreasonable men adapt the world to themselves.
  That's why all progress depends on unreasonable men.


Status: converting raid levels

2015-03-06 Thread Marcel Ritter
Hi,

Please, can someone comment on the current status of raid level migration?
(kernel 4.0.0-rc2, btrfs-progs 3.19-rc2)

I just started testing this feature, and it doesn't seem to work:

Starting with Raid1:

root@thunder[ ~/btrfs-progs ]# ./btrfs fi df /tmp/m
Data, RAID1: total=3.00GiB, used=512.00KiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

Converting to Raid10:

root@thunder[ ~/btrfs-progs ]# ./btrfs balance start -dconvert=raid10
-mconvert=raid10 /tmp/m/
Done, had to relocate 9 out of 9 chunks

Still Raid1 ...:

root@thunder[ ~/btrfs-progs ]# ./btrfs fi df /tmp/m
Data, RAID1: total=35.00GiB, used=31.49GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=33.86MiB
GlobalReserve, single: total=16.00MiB, used=0.00B

I also tried a conversion to raid6, but it did not work either
(according to btrfs fi df output).

Did I miss something, or is it a bug?
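
Probably unrelated, but for completeness I also plan to retry the balance
with the system chunks included (a sketch - as far as I know -sconvert
requires -f to force it):

root@thunder[ ~/btrfs-progs ]# ./btrfs balance start -dconvert=raid10 -mconvert=raid10 -sconvert=raid10 -f /tmp/m
root@thunder[ ~/btrfs-progs ]# ./btrfs balance status /tmp/m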

Bye,
Marcel


Re: Regression: kernel 4.0.0-rc1 - soft lockups

2015-03-03 Thread Marcel Ritter
Hi,

just a short update on this topic:

I also tried the Ubuntu 4.0.0-rc1 ppa kernel - problems are still there.

Luckily, kernel 4.0.0-rc2 was released yesterday:
I updated my machine to kernel 4.0.0-rc2 and the problems are gone
(the test script has been running fine for about 12 hours now).

Bye,
Marcel

2015-03-03 12:05 GMT+01:00 Liu Bo bo.li@oracle.com:
 On Tue, Mar 03, 2015 at 08:31:10AM +0100, Marcel Ritter wrote:
 Hi,

 yes it is reproducible.

 Just creating a new btrfs filesystem (14 disks, data/mdata raid6, latest
 git btrfs-progs) and mounting it causes the system to hang (I think I
 once even got it mounted, but it hung shortly after, when dd started
 writing to it).

 I just ran some quick tests and (at least at first sight) it looks like
 the raid5/6 code may be causing the trouble:

 I created different btrfs filesystem types, mounted them and (if possible)
 did a big dd on the filesystem:

 mkfs.btrfs /dev/cciss/c1d* -m raid0 -d raid0 -f - no problem (only short test)
 mkfs.btrfs /dev/cciss/c1d* -m raid1 -d raid1 -f - no problem (only short test)
 mkfs.btrfs /dev/cciss/c1d* -m raid5 -d raid5 -f - (almost) instant hang
 mkfs.btrfs /dev/cciss/c1d* -m raid6 -d raid6 -f - (almost) instant hang (standard test)

 Once the machine is up again I'll do some more testing (varying the
 combination of data and mdata raid levels).

 Hmm, just FYI, raid56 works good on my box with 4.0.0 rc1.

 Thanks,

 -liubo


 Bye,
Marcel


 2015-03-03 7:37 GMT+01:00 Liu Bo bo.li@oracle.com:
  On Tue, Mar 03, 2015 at 07:02:01AM +0100, Marcel Ritter wrote:
  Hi,
 
  yesterday I did a kernel update on my btrfs test system (Ubuntu
  14.04.2) from custom-build kernel 3.19-rc6 to 4.0.0-rc1.
 
  Almost instantly after starting my test script, the system got stuck
  with soft lockups (the machine was running the very same test for
  weeks on the old kernel without problems,
  basically doing massive streaming i/o on a raid6 btrfs volume):
 
  I found 2 types of messages in the logs:
 
  one btrfs related:
 
  [34165.540004] INFO: rcu_sched detected stalls on CPUs/tasks: { 3 7}
  (detected by 6, t=6990777 jiffies, g=67455, c=67454, q=0)
  [34165.540004] Task dump for CPU 3:
  [34165.540004] mount   D 8803ed266000 0 15156  15110 
  0x
  [34165.540004]  0158 0014 8803ecc13718
  8803ecc136d8
  [34165.540004]  8106075a  0002
  
  [34165.540004]  ecc13728 8803eb603128 
  
  [34165.540004] Call Trace:
  [34165.540004]  [8106075a] ? __do_page_fault+0x2fa/0x440
  [34165.540004]  [810608d1] ? do_page_fault+0x31/0x70
  [34165.540004]  [81792778] ? page_fault+0x28/0x30
  [34165.540004]  [810ae2ce] ? pick_next_task_fair+0x53e/0x880
  [34165.540004]  [810ae2ce] ? pick_next_task_fair+0x53e/0x880
  [34165.540004]  [8109707c] ? dequeue_task+0x5c/0x80
  [34165.540004]  [8178b9a3] ? __schedule+0xf3/0x960
  [34165.540004]  [8178c247] ? schedule+0x37/0x90
  [34165.540004]  [a0896375] ?
  btrfs_start_ordered_extent+0xd5/0x110 [btrfs]
  [34165.540004]  [810b3cb0] ? prepare_to_wait_event+0x110/0x110
  [34165.540004]  [a0896884] ?
  btrfs_wait_ordered_range+0xc4/0x120 [btrfs]
  [34165.540004]  [a08c0c18] ?
  __btrfs_write_out_cache+0x378/0x470 [btrfs]
  [34165.540004]  [a08c104a] ? btrfs_write_out_cache+0x9a/0x100 
  [btrfs]
  [34165.540004]  [a086af79] ?
  btrfs_write_dirty_block_groups+0x159/0x560 [btrfs]
  [34165.540004]  [a08f2aa6] ? commit_cowonly_roots+0x18d/0x2a4 
  [btrfs]
  [34165.540004]  [a087bd31] ?
  btrfs_commit_transaction+0x521/0xa50 [btrfs]
  [34165.540004]  [a08a3fbe] ? btrfs_create_uuid_tree+0x5e/0x110 
  [btrfs]
  [34165.540004]  [a087963f] ? open_ctree+0x1dff/0x2200 [btrfs]
  [34165.540004]  [a084f7ce] ? btrfs_mount+0x75e/0x8f0 [btrfs]
  [34165.540004]  [811ecbf9] ? mount_fs+0x39/0x180
  [34165.540004]  [81192405] ? __alloc_percpu+0x15/0x20
  [34165.540004]  [812082bb] ? vfs_kern_mount+0x6b/0x120
  [34165.540004]  [8120afe4] ? do_mount+0x204/0xb30
  [34165.540004]  [8120bc0b] ? SyS_mount+0x8b/0xe0
  [34165.540004]  [817905ed] ? system_call_fastpath+0x16/0x1b
  [34165.540004] Task dump for CPU 7:
  [34165.540004] kworker/u16:1   R  running task0 14518  2 
  0x0008
  [34165.540004] Workqueue: btrfs-freespace-write
  btrfs_freespace_write_helper [btrfs]
  [34165.540004]  0200 8803eac6fdf8 a08ac242
  8803eac6fe48
  [34165.540004]  8108b64f f1091400 
  8803eca58000
  [34165.540004]  8803ea9ed3c0 8803f1091418 8803f1091400
  8803eca58000
  [34165.540004] Call Trace:
  [34165.540004]  [a08ac242] ?
  btrfs_freespace_write_helper+0x12/0x20 [btrfs

Regression: kernel 4.0.0-rc1 - soft lockups

2015-03-02 Thread Marcel Ritter
Hi,

yesterday I did a kernel update on my btrfs test system (Ubuntu
14.04.2) from custom-build kernel 3.19-rc6 to 4.0.0-rc1.

Almost instantly after starting my test script, the system got stuck
with soft lockups. The machine had been running the very same test for
weeks on the old kernel without problems, basically doing massive
streaming i/o on a raid6 btrfs volume.
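
For reference, the streaming part of the test boils down to a large
sequential dd, something like this sketch (mount point and size are
placeholders):

dd if=/dev/zero of=/mnt/raid6/bigfile bs=1M count=102400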

I found 2 types of messages in the logs:

one btrfs related:

[34165.540004] INFO: rcu_sched detected stalls on CPUs/tasks: { 3 7}
(detected by 6, t=6990777 jiffies, g=67455, c=67454, q=0)
[34165.540004] Task dump for CPU 3:
[34165.540004] mount   D 8803ed266000 0 15156  15110 0x
[34165.540004]  0158 0014 8803ecc13718
8803ecc136d8
[34165.540004]  8106075a  0002

[34165.540004]  ecc13728 8803eb603128 

[34165.540004] Call Trace:
[34165.540004]  [8106075a] ? __do_page_fault+0x2fa/0x440
[34165.540004]  [810608d1] ? do_page_fault+0x31/0x70
[34165.540004]  [81792778] ? page_fault+0x28/0x30
[34165.540004]  [810ae2ce] ? pick_next_task_fair+0x53e/0x880
[34165.540004]  [810ae2ce] ? pick_next_task_fair+0x53e/0x880
[34165.540004]  [8109707c] ? dequeue_task+0x5c/0x80
[34165.540004]  [8178b9a3] ? __schedule+0xf3/0x960
[34165.540004]  [8178c247] ? schedule+0x37/0x90
[34165.540004]  [a0896375] ?
btrfs_start_ordered_extent+0xd5/0x110 [btrfs]
[34165.540004]  [810b3cb0] ? prepare_to_wait_event+0x110/0x110
[34165.540004]  [a0896884] ?
btrfs_wait_ordered_range+0xc4/0x120 [btrfs]
[34165.540004]  [a08c0c18] ?
__btrfs_write_out_cache+0x378/0x470 [btrfs]
[34165.540004]  [a08c104a] ? btrfs_write_out_cache+0x9a/0x100 [btrfs]
[34165.540004]  [a086af79] ?
btrfs_write_dirty_block_groups+0x159/0x560 [btrfs]
[34165.540004]  [a08f2aa6] ? commit_cowonly_roots+0x18d/0x2a4 [btrfs]
[34165.540004]  [a087bd31] ?
btrfs_commit_transaction+0x521/0xa50 [btrfs]
[34165.540004]  [a08a3fbe] ? btrfs_create_uuid_tree+0x5e/0x110 [btrfs]
[34165.540004]  [a087963f] ? open_ctree+0x1dff/0x2200 [btrfs]
[34165.540004]  [a084f7ce] ? btrfs_mount+0x75e/0x8f0 [btrfs]
[34165.540004]  [811ecbf9] ? mount_fs+0x39/0x180
[34165.540004]  [81192405] ? __alloc_percpu+0x15/0x20
[34165.540004]  [812082bb] ? vfs_kern_mount+0x6b/0x120
[34165.540004]  [8120afe4] ? do_mount+0x204/0xb30
[34165.540004]  [8120bc0b] ? SyS_mount+0x8b/0xe0
[34165.540004]  [817905ed] ? system_call_fastpath+0x16/0x1b
[34165.540004] Task dump for CPU 7:
[34165.540004] kworker/u16:1   R  running task0 14518  2 0x0008
[34165.540004] Workqueue: btrfs-freespace-write
btrfs_freespace_write_helper [btrfs]
[34165.540004]  0200 8803eac6fdf8 a08ac242
8803eac6fe48
[34165.540004]  8108b64f f1091400 
8803eca58000
[34165.540004]  8803ea9ed3c0 8803f1091418 8803f1091400
8803eca58000
[34165.540004] Call Trace:
[34165.540004]  [a08ac242] ?
btrfs_freespace_write_helper+0x12/0x20 [btrfs]
[34165.540004]  [8108b64f] ? process_one_work+0x14f/0x420
[34165.540004]  [8108be08] ? worker_thread+0x118/0x510
[34165.540004]  [8108bcf0] ? rescuer_thread+0x3d0/0x3d0
[34165.540004]  [81091212] ? kthread+0xd2/0xf0
[34165.540004]  [81091140] ? kthread_create_on_node+0x180/0x180
[34165.540004]  [8179053c] ? ret_from_fork+0x7c/0xb0
[34165.540004]  [81091140] ? kthread_create_on_node+0x180/0x180


and one general one (related to native_flush_tlb_other):

[34152.604004] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[rs:main Q:Reg:490]
[34152.604004] Modules linked in: btrfs(E) xor(E) radeon(E) ttm(E)
drm_kms_helper(E) kvm(E) drm(E) raid6_pq(E) i2c_algo_bit(E) ipmi_si
(E) amd64_edac_mod(E) serio_raw(E) hpilo(E) hpwdt(E) edac_core(E)
shpchp(E) k8temp(E) mac_hid(E) edac_mce_amd(E) nfsd(E) auth_rpcgss(E
) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) lp(E)
parport(E) hpsa(E) pata_acpi(E) hid_generic(E) psmouse(E) usbhid(E) b
nx2(E) cciss(E) hid(E) pata_amd(E)
[34152.604004] CPU: 6 PID: 490 Comm: rs:main Q:Reg Tainted: G  D W
  EL  4.0.0-rc1-custom #1
[34152.604004] Hardware name: HP ProLiant DL585 G2   , BIOS A07 05/02/2011
[34152.604004] task: 8803eecd9910 ti: 8803ecb3 task.ti:
8803ecb3
[34152.604004] RIP: 0010:[810f1e3a]  [810f1e3a]
smp_call_function_many+0x20a/0x270
[34152.604004] RSP: 0018:8803ecb33cf8  EFLAGS: 0202
[34152.604004] RAX:  RBX: 81cdd140 RCX: 8803ffc19700
[34152.604004] RDX:  RSI: 0100 RDI: 
[34152.604004] RBP: 8803ecb33d38 R08: 8803ffd961c8 R09: 0004
[34152.604004] R10: 0004 R11: 0246 R12: 

Re: Regression: kernel 4.0.0-rc1 - soft lockups

2015-03-02 Thread Marcel Ritter
Hi,

yes it is reproducible.

Just creating a new btrfs filesystem (14 disks, data/mdata raid6, latest
git btrfs-progs) and mounting it causes the system to hang (I think I
once even got it mounted, but it hung shortly after, when dd started
writing to it).

I just ran some quick tests and (at least at first sight) it looks like
the raid5/6 code may be causing the trouble:

I created different btrfs filesystem types, mounted them and (if possible)
did a big dd on the filesystem:

mkfs.btrfs /dev/cciss/c1d* -m raid0 -d raid0 -f - no problem (only short test)
mkfs.btrfs /dev/cciss/c1d* -m raid1 -d raid1 -f - no problem (only short test)
mkfs.btrfs /dev/cciss/c1d* -m raid5 -d raid5 -f - (almost) instant hang
mkfs.btrfs /dev/cciss/c1d* -m raid6 -d raid6 -f - (almost) instant hang (standard test)

Once the machine is up again I'll do some more testing, varying the
combination of data and mdata raid levels.
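
Roughly what I have in mind (a sketch - device list and mount point are
specific to my box):

#!/bin/bash
# Cycle through raid profiles: mkfs, mount, big dd, unmount.
for level in raid0 raid1 raid5 raid6; do
    mkfs.btrfs -f -m $level -d $level /dev/cciss/c1d* || exit 1
    mount /dev/cciss/c1d0 /tmp/m || exit 1
    dd if=/dev/zero of=/tmp/m/bigfile bs=1M count=10240
    umount /tmp/m
done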

Bye,
   Marcel


2015-03-03 7:37 GMT+01:00 Liu Bo bo.li@oracle.com:
 On Tue, Mar 03, 2015 at 07:02:01AM +0100, Marcel Ritter wrote:
 Hi,

 yesterday I did a kernel update on my btrfs test system (Ubuntu
 14.04.2) from custom-build kernel 3.19-rc6 to 4.0.0-rc1.

 Almost instantly after starting my test script, the system got stuck
 with soft lockups (the machine was running the very same test for
 weeks on the old kernel without problems,
 basically doing massive streaming i/o on a raid6 btrfs volume):

 I found 2 types of messages in the logs:

 one btrfs related:

 [34165.540004] INFO: rcu_sched detected stalls on CPUs/tasks: { 3 7}
 (detected by 6, t=6990777 jiffies, g=67455, c=67454, q=0)
 [34165.540004] Task dump for CPU 3:
 [34165.540004] mount   D 8803ed266000 0 15156  15110 
 0x
 [34165.540004]  0158 0014 8803ecc13718
 8803ecc136d8
 [34165.540004]  8106075a  0002
 
 [34165.540004]  ecc13728 8803eb603128 
 
 [34165.540004] Call Trace:
 [34165.540004]  [8106075a] ? __do_page_fault+0x2fa/0x440
 [34165.540004]  [810608d1] ? do_page_fault+0x31/0x70
 [34165.540004]  [81792778] ? page_fault+0x28/0x30
 [34165.540004]  [810ae2ce] ? pick_next_task_fair+0x53e/0x880
 [34165.540004]  [810ae2ce] ? pick_next_task_fair+0x53e/0x880
 [34165.540004]  [8109707c] ? dequeue_task+0x5c/0x80
 [34165.540004]  [8178b9a3] ? __schedule+0xf3/0x960
 [34165.540004]  [8178c247] ? schedule+0x37/0x90
 [34165.540004]  [a0896375] ?
 btrfs_start_ordered_extent+0xd5/0x110 [btrfs]
 [34165.540004]  [810b3cb0] ? prepare_to_wait_event+0x110/0x110
 [34165.540004]  [a0896884] ?
 btrfs_wait_ordered_range+0xc4/0x120 [btrfs]
 [34165.540004]  [a08c0c18] ?
 __btrfs_write_out_cache+0x378/0x470 [btrfs]
 [34165.540004]  [a08c104a] ? btrfs_write_out_cache+0x9a/0x100 
 [btrfs]
 [34165.540004]  [a086af79] ?
 btrfs_write_dirty_block_groups+0x159/0x560 [btrfs]
 [34165.540004]  [a08f2aa6] ? commit_cowonly_roots+0x18d/0x2a4 
 [btrfs]
 [34165.540004]  [a087bd31] ?
 btrfs_commit_transaction+0x521/0xa50 [btrfs]
 [34165.540004]  [a08a3fbe] ? btrfs_create_uuid_tree+0x5e/0x110 
 [btrfs]
 [34165.540004]  [a087963f] ? open_ctree+0x1dff/0x2200 [btrfs]
 [34165.540004]  [a084f7ce] ? btrfs_mount+0x75e/0x8f0 [btrfs]
 [34165.540004]  [811ecbf9] ? mount_fs+0x39/0x180
 [34165.540004]  [81192405] ? __alloc_percpu+0x15/0x20
 [34165.540004]  [812082bb] ? vfs_kern_mount+0x6b/0x120
 [34165.540004]  [8120afe4] ? do_mount+0x204/0xb30
 [34165.540004]  [8120bc0b] ? SyS_mount+0x8b/0xe0
 [34165.540004]  [817905ed] ? system_call_fastpath+0x16/0x1b
 [34165.540004] Task dump for CPU 7:
 [34165.540004] kworker/u16:1   R  running task0 14518  2 
 0x0008
 [34165.540004] Workqueue: btrfs-freespace-write
 btrfs_freespace_write_helper [btrfs]
 [34165.540004]  0200 8803eac6fdf8 a08ac242
 8803eac6fe48
 [34165.540004]  8108b64f f1091400 
 8803eca58000
 [34165.540004]  8803ea9ed3c0 8803f1091418 8803f1091400
 8803eca58000
 [34165.540004] Call Trace:
 [34165.540004]  [a08ac242] ?
 btrfs_freespace_write_helper+0x12/0x20 [btrfs]
 [34165.540004]  [8108b64f] ? process_one_work+0x14f/0x420
 [34165.540004]  [8108be08] ? worker_thread+0x118/0x510
 [34165.540004]  [8108bcf0] ? rescuer_thread+0x3d0/0x3d0
 [34165.540004]  [81091212] ? kthread+0xd2/0xf0
 [34165.540004]  [81091140] ? kthread_create_on_node+0x180/0x180
 [34165.540004]  [8179053c] ? ret_from_fork+0x7c/0xb0
 [34165.540004]  [81091140] ? kthread_create_on_node+0x180/0x180


 and one general one (related to native_flush_tlb_other):

 [34152.604004] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
 [rs:main Q:Reg:490]
 [34152.604004

Kernel bug in 3.19-rc4

2015-01-15 Thread Marcel Ritter
Hi,

I just started some btrfs stress testing on the latest Linux kernel, 3.19-rc4.
A few hours later the filesystem stopped working - the kernel bug report
can be found below.

The test consists of one massive IO thread (writing 100 GB files with dd)
and two tar instances extracting kernel sources and deleting them
afterwards (a sketch follows; I can provide the actual script if needed).
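
In outline it does this (a simplified sketch of the real script - paths
and the kernel tarball name are placeholders):

#!/bin/bash
# One big sequential writer plus two tar extract/delete loops.
MNT=/tmp/m
( while true; do
      dd if=/dev/zero of=$MNT/bigfile bs=1M count=102400
      rm -f $MNT/bigfile
  done ) &
for i in 1 2; do
    ( while true; do
          mkdir -p $MNT/tar$i
          tar -C $MNT/tar$i -xf /root/linux-3.19-rc4.tar.xz
          rm -rf $MNT/tar$i
      done ) &
done
wait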

System information (Ubuntu 14.04.1, latest kernel):

root@thunder # uname -a
Linux thunder 3.19.0-rc4-custom #1 SMP Mon Jan 12 16:13:44 CET 2015
x86_64 x86_64 x86_64 GNU/Linux

root@thunder # /root/btrfs-progs/btrfs --version
Btrfs v3.18-36-g0173148

Tests are done on 14 SCSI disks, using raid6 for data and metadata:

root@thunder # /root/btrfs-progs/btrfs fi show
Label: 'raid6'  uuid: cbe34d2b-5f75-46cf-9263-9813028ebc19
Total devices 14 FS bytes used 674.62GiB
devid1 size 279.39GiB used 59.24GiB path /dev/cciss/c1d0
devid2 size 279.39GiB used 59.22GiB path /dev/cciss/c1d1
devid3 size 279.39GiB used 59.22GiB path /dev/cciss/c1d10
devid4 size 279.39GiB used 59.22GiB path /dev/cciss/c1d11
devid5 size 279.39GiB used 59.22GiB path /dev/cciss/c1d12
devid6 size 279.39GiB used 59.22GiB path /dev/cciss/c1d13
devid7 size 279.39GiB used 59.22GiB path /dev/cciss/c1d2
devid8 size 279.39GiB used 59.22GiB path /dev/cciss/c1d3
devid9 size 279.39GiB used 59.22GiB path /dev/cciss/c1d4
devid   10 size 279.39GiB used 59.22GiB path /dev/cciss/c1d5
devid   11 size 279.39GiB used 59.22GiB path /dev/cciss/c1d6
devid   12 size 279.39GiB used 59.22GiB path /dev/cciss/c1d7
devid   13 size 279.39GiB used 59.22GiB path /dev/cciss/c1d8
devid   14 size 279.39GiB used 59.22GiB path /dev/cciss/c1d9

Btrfs v3.18-36-g0173148

# This is provided for completeness only, and was taken
# sometime *before* the kernel crash occurred, so the basic
# setup is the same, but allocated/free sizes won't match
root@thunder # /root/btrfs-progs/btrfs fi df /tmp/m
Data, single: total=8.00MiB, used=0.00B
Data, RAID6: total=727.45GiB, used=697.84GiB
System, single: total=4.00MiB, used=0.00B
System, RAID6: total=13.50MiB, used=64.00KiB
Metadata, single: total=8.00MiB, used=0.00B
Metadata, RAID6: total=3.43GiB, used=805.91MiB
GlobalReserve, single: total=272.00MiB, used=0.00B


Here's what happens after some hours of stress testing:

[85162.472989] [ cut here ]
[85162.473071] kernel BUG at fs/btrfs/inode.c:3142!
[85162.473139] invalid opcode:  [#1] SMP
[85162.473212] Modules linked in: btrfs(E) xor(E) raid6_pq(E)
radeon(E) ttm(E) drm_kms_helper(E) drm(E) hpwdt(E) amd64_edac_mod(E)
kvm(E) edac_core(E) shpchp(E) k8temp(E) serio_raw(E) hpilo(E)
edac_mce_amd(E) mac_hid(E) i2c_algo_bit(E) ipmi_si(E) nfsd(E)
auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) lp(E)
fscache(E) parport(E) hid_generic(E) usbhid(E) hid(E) hpsa(E)
psmouse(E) bnx2(E) cciss(E) pata_acpi(E) pata_amd(E)
[85162.473911] CPU: 4 PID: 3039 Comm: btrfs-cleaner Tainted: G
   E  3.19.0-rc4-custom #1
[85162.474028] Hardware name: HP ProLiant DL585 G2   , BIOS A07 05/02/2011
[85162.474122] task: 88085b054aa0 ti: 88205ad4c000 task.ti:
88205ad4c000
[85162.474230] RIP: 0010:[a06a8182]  [a06a8182]
btrfs_orphan_add+0x1d2/0x1e0 [btrfs]
[85162.474422] RSP: 0018:88205ad4fc48  EFLAGS: 00010286
[85162.474497] RAX: ffe4 RBX: 8810a35d42f8 RCX: 88185b896000
[85162.474595] RDX: 6a54 RSI: 0004 RDI: 88185b896138
[85162.474694] RBP: 88205ad4fc88 R08: 0001e670 R09: 88016194b240
[85162.474793] R10: a06bd797 R11: ea0004f71800 R12: 88185baa2000
[85162.474892] R13: 88085f6d7630 R14: 88185baa2458 R15: 0001
[85162.474992] FS:  7fb3f27fb740() GS:88085fd0()
knlGS:
[85162.475105] CS:  0010 DS:  ES:  CR0: 8005003b
[85162.475184] CR2: 7f896c02c220 CR3: 00085b328000 CR4: 07e0
[85162.475286] Stack:
[85162.475318]  88205ad4fc88 a06e6a14 88185b896b04
88105b03e800
[85162.475442]  88016194b240 8810a35d42f8 881e8ffe9a00
88133dc48ea0
[85162.475561]  88205ad4fd18 a0691a57 88016194b244
88016194b240
[85162.475680] Call Trace:
[85162.475738]  [a06e6a14] ?
lookup_free_space_inode+0x44/0x100 [btrfs]
[85162.475849]  [a0691a57]
btrfs_remove_block_group+0x137/0x740 [btrfs]
[85162.475964]  [a06ca8d2] btrfs_remove_chunk+0x672/0x780 [btrfs]
[85162.476065]  [a06922bf] btrfs_delete_unused_bgs+0x25f/0x280 [btrfs]
[85162.476172]  [a0699e0c] cleaner_kthread+0x12c/0x190 [btrfs]
[85162.476269]  [a0699ce0] ? check_leaf+0x350/0x350 [btrfs]
[85162.476355]  [8108f8d2] kthread+0xd2/0xf0
[85162.476424]  [8108f800] ? kthread_create_on_node+0x180/0x180
[85162.476519]  [8177bcbc]