Reiterate: btrfs stuck with lots of files
Hi, guys, again. Looking at this issue, I suspect this is a bug in btrfs. We'll have to clean up this installation soon, so if there is any request to do some debugging, please ask. I'll try to reiterate what was said in this thread.

Short story: a btrfs filesystem made of 22 1TB disks with lots of files (~3024). Write load is 25 MByte/second. After some time the file system became unable to cope with this load. At this point `sync` takes ages to finish and shutdown -r hangs (I guess related to sync). I also see one kernel kworker that is the main suspect for this behaviour: it takes 100% of a CPU core all the time, jumping from core to core. At the same time, according to iostat, write/read speed is close to zero and everything is stuck.

Citing some details from previous messages:

top - 13:10:58 up 1 day, 9:26, 5 users, load average: 157.76, 156.61, 149.29
Tasks: 235 total, 2 running, 233 sleeping, 0 stopped, 0 zombie
%Cpu(s): 19.8 us, 15.0 sy, 0.0 ni, 60.7 id, 3.9 wa, 0.0 hi, 0.6 si, 0.0 st
KiB Mem:  65922104 total, 65414856 used,  507248 free,     1844 buffers
KiB Swap:        0 total,        0 used,       0 free. 62570804 cached Mem

  PID USER PR NI    VIRT    RES  SHR S %CPU %MEM     TIME+ COMMAND
 8644 root 20  0       0      0    0 R 96.5  0.0 127:21.95 kworker/u16:16
 5047 dvr  20  0 6884292 122668 4132 S  6.4  0.2 258:59.49 dvrserver
30223 root 20  0   20140   2600 2132 R  6.4  0.0   0:00.01 top
    1 root 20  0    4276   1628 1524 S  0.0  0.0   0:40.19 init

There are about 300 threads on the server, some of which are writing to disk.
A bit of information about this btrfs filesystem: this is a 22-disk file system with raid1 for metadata and raid0 for data:

# btrfs filesystem df /store/
Data, single: total=11.92TiB, used=10.86TiB
System, RAID1: total=8.00MiB, used=1.27MiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=46.00GiB, used=33.49GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=128.00KiB

# btrfs property get /store/
ro=false
label=store

# btrfs device stats /store/
(shows all zeros)

# btrfs balance status /store/
No balance found on '/store/'

# btrfs filesystem show
Label: 'store'  uuid: 296404d1-bd3f-417d-8501-02f8d7906bcf
        Total devices 22 FS bytes used 6.50TiB
        devid  1 size 931.51GiB used 558.02GiB path /dev/sdb
        devid  2 size 931.51GiB used 559.00GiB path /dev/sdc
        devid  3 size 931.51GiB used 559.00GiB path /dev/sdd
        devid  4 size 931.51GiB used 559.00GiB path /dev/sde
        devid  5 size 931.51GiB used 559.00GiB path /dev/sdf
        devid  6 size 931.51GiB used 559.00GiB path /dev/sdg
        devid  7 size 931.51GiB used 559.00GiB path /dev/sdh
        devid  8 size 931.51GiB used 559.00GiB path /dev/sdi
        devid  9 size 931.51GiB used 559.00GiB path /dev/sdj
        devid 10 size 931.51GiB used 559.00GiB path /dev/sdk
        devid 11 size 931.51GiB used 559.00GiB path /dev/sdl
        devid 12 size 931.51GiB used 559.00GiB path /dev/sdm
        devid 13 size 931.51GiB used 559.00GiB path /dev/sdn
        devid 14 size 931.51GiB used 559.00GiB path /dev/sdo
        devid 15 size 931.51GiB used 559.00GiB path /dev/sdp
        devid 16 size 931.51GiB used 559.00GiB path /dev/sdq
        devid 17 size 931.51GiB used 559.00GiB path /dev/sdr
        devid 18 size 931.51GiB used 559.00GiB path /dev/sds
        devid 19 size 931.51GiB used 559.00GiB path /dev/sdt
        devid 20 size 931.51GiB used 559.00GiB path /dev/sdu
        devid 21 size 931.51GiB used 559.01GiB path /dev/sdv
        devid 22 size 931.51GiB used 560.01GiB path /dev/sdw
Btrfs v3.17.1

iostat 1 exposes the following problem:

avg-cpu:  %user  %nice %system %iowait %steal  %idle
          16.96   0.00   17.09   65.95   0.00   0.00
Device:    tps  kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda       0.00       0.00       0.00        0        0
sdc       0.00       0.00       0.00        0        0
sdb       0.00       0.00       0.00        0        0
sde       0.00       0.00       0.00        0        0
sdd       0.00       0.00       0.00        0        0
sdf       0.00       0.00       0.00        0        0
sdg       0.00       0.00       0.00        0        0
sdj       0.00       0.00       0.00        0        0
sdh       0.00       0.00       0.00        0        0
sdk       0.00       0.00       0.00        0        0
sdi       1.00       0.00     200.00        0      200
sdl       0.00       0.00       0.00        0        0
sdn      48.00       0.00   17260.00
Re: Reiterate: btrfs stuck with lots of files
On Thu, Dec 4, 2014 at 3:58 PM, Peter Volkov <p...@gentoo.org> wrote:
> Hi, guys again. Looking at this issue, I suspect this is a bug in btrfs.
> [full quote of the original message trimmed; see above]
Re: btrfs stuck with lots of files
Peter Volkov posted on Tue, 02 Dec 2014 04:50:29 +0300 as excerpted:

> On Mon, 01/12/2014 at 10:47 -0800, Robert White wrote:
>> On 12/01/2014 03:46 AM, Peter Volkov wrote:
>> (stuff about getting hung up trying to write to one drive)
>> That drive (/dev/sdn) is probably starting to fail.
>> (about failed drive)
>
> Thank you Robert for the answer. It is not likely that the drive is
> failing here. A similar condition (writes going to a single drive)
> happens with other drives, i.e. such a write pattern may happen with
> any drive. After watching what happens for longer I see the following:
> during the stall a single processor core is busy at 100% in kernel
> space (some kworker is taking 100% CPU).

FWIW, agreed that it's unlikely to be the drive, especially if you're not seeing bus resets or drive errors in dmesg and SMART says the drive is fine, as I expect it does/will. It may be a btrfs bug or scaling issue, of which btrfs still has some, or it could simply be the single mode vs raid0 mode issue I explain below.

>> # btrfs filesystem df /store/
>> Data, single: total=11.92TiB, used=10.86TiB
>>
>> Regardless of the above... You have a terabyte of unused but
>> allocated data storage. You probably need to balance your system to
>> un-jam that. That's a lot of space that is unavailable to the
>> metadata (etc).
>
> Well, I'm afraid that balance will put the fs into an even longer
> stall.
>
>> ASIDE: Having your metadata set to RAID1 (as opposed to the default
>> of DUP) seems a little iffy since your data is still set to DUP.
>
> That's true. But why is the data duplicated? During btrfs volume
> creation I explicitly set -d single.

I believe Robert mis-wrote (thinko). The btrfs filesystem df clearly shows that your data is in single mode, the data default mode, not dup mode, which is normally only available to metadata (not data) on a single-device filesystem, where it is the metadata default.

However, in the original post you /did/ say raid1 for metadata, raid0 for data, and the above btrfs filesystem df again clearly says single, not raid0.
Which is very likely to be your problem. In single mode, btrfs will create chunks one at a time, picking the device with the most free space to allocate each one on. The normal data chunk size is 1 GiB. Because of the most-free-space allocation rule, with N devices (22 in your case) of the same size, after N (22) data chunks are allocated you'll tend to have one such chunk on each device.

Each of these 1 GiB chunks (along with space freed up by normal delete activity in other allocated data chunks) will be filled before another is allocated, which will mean you're writing a GiB worth of data to one device before you switch to the next one. With your mostly sub-MiB file write pattern, that's probably 1500-2000 files written to a chunk on that single device before another chunk is allocated on the next device. Thus all your activity on that single device!

In raid0 mode, by contrast, the same 1 GiB chunks will be allocated on each device, but a stripe of chunks will be allocated across all devices (22 in your case) at the same time, and data being written is broken up into much smaller per-device strips. I'm not sure what the actual per-device strip size is in raid0 mode, but it's *WELL* under a GiB, and I believe in the KiBs, not MiB, range. It might be 128 KiB, the compression block size when the compress mount option is used.

Obviously, were you using raid0 data, you'd see the load spread out at least somewhat better. But the df says it's single, not raid0. To get raid0 mode you can use a balance with filters (see the wiki or a recent btrfs-balance manpage), or blow away the existing filesystem and create a new one, setting --data raid0 when you mkfs.btrfs, and restore from backups (which you're already prepared to do if you value your data in any case[1]).

That missing btrfs filesystem show output, due to the terminating / in /store/ (simply /store should work), is somewhat frustrating here, as it'd show per-device sizes and utilization.
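[The 1500-2000-files figure above can be sanity-checked with shell arithmetic. The ~600 KiB average file size is an assumption, splitting the difference between the 313K and 800K file sizes Peter reported:]

```shell
# 1 GiB data chunk, expressed in KiB
chunk_kib=$((1024 * 1024))
# Assumed average file size, between the 313K and 800K figures in the thread
avg_file_kib=600
# Files written to one device before allocation moves to the next chunk
echo $((chunk_kib / avg_file_kib))   # prints 1747
```

So on the order of 1700 sub-MiB files land on a single device before the allocator rotates, matching the one-busy-disk pattern in the iostat output.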
Assuming near same-sized devices, with 11 TiB of data being far greater than the 1 GiB data chunk size times 22 devices, I'd guess you're pretty evened out, utilization-wise, but the output from both show and df is necessary to get the full story.

>> FURTHER ASIDE: raid1 metadata and raid5 data might be good for you.
>> Given 22 volumes and 10% empty space it would only cost you half of
>> your existing empty space. If you don't RAID your data, there is no
>> real point to putting your metadata in RAID.
>
> Is raid5 ready for use? As I read the post [1] mentioned on [2], it
> still has some way to go to become stable.

You are absolutely correct. I'd strongly recommend staying AWAY from btrfs raid5/6 modes at this time. While Robert is becoming an active regular and has the technical background to point out some things others miss, he's still reasonably new to this list and may not have been aware of the incomplete status of raid5/6 modes at this time. Effectively
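[For reference, the filtered-balance conversion mentioned above would look roughly like the following. This is a sketch based on the btrfs-balance manpage; the commands are only echoed here, since a full data rewrite on a stalled 11 TiB filesystem is a long operation that should be a deliberate choice:]

```shell
# Mount point from the thread; adjust as needed.
MNT=/store
# Rewrite all data block groups from single to raid0 (echoed, not run)
echo "btrfs balance start -dconvert=raid0 $MNT"
# Progress can be watched from another shell (echoed, not run)
echo "btrfs balance status $MNT"
```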
Re: btrfs stuck with lots of files
On Tue, 2 Dec 2014 12:48:21 +0000 (UTC) Duncan <1i5t5.dun...@cox.net> wrote:

> Peter Volkov posted on Tue, 02 Dec 2014 04:50:29 +0300 as excerpted:
>
>> On Mon, 01/12/2014 at 10:47 -0800, Robert White wrote:
>>> On 12/01/2014 03:46 AM, Peter Volkov wrote:
>>> (stuff about getting hung up trying to write to one drive)
>>> That drive (/dev/sdn) is probably starting to fail.
>>> (about failed drive)
>>
>> Thank you Robert for the answer. It is not likely that the drive is
>> failing here. A similar condition (writes going to a single drive)
>> happens with other drives, i.e. such a write pattern may happen with
>> any drive. After watching what happens for longer I see the
>> following: during the stall a single processor core is busy at 100%
>> in kernel space (some kworker is taking 100% CPU).
>
> FWIW, agreed that it's unlikely to be the drive, especially if you're
> not seeing bus resets or drive errors in dmesg and SMART says the
> drive is fine, as I expect it does/will. It may be a btrfs bug or
> scaling issue, of which btrfs still has some, or it could simply be
> the single mode vs raid0 mode issue I explain below.

I encountered a similar problem here a few days ago on a btrfs raid1 partition while using rsync to clone a (~30GB) directory. Everything started fine, but I came back an hour later to find rsync had apparently stalled at about 20%, with cpu usage at 100% on a single kworker thread. I was able to kill rsync eventually, and after a while (don't know how long, but under 10 minutes) cpu usage returned to normal. Restarting rsync resulted in kworker at 100% cpu in less than a minute. Once stalled, there was little drive access happening. Another raid1 partition (mdadm/ext4) on the same drive pair was having no problems. Nothing showed in the system logs.

In this instance I'd forgotten to delete a temporary 500GB file before starting rsync, so although recently balanced (musage=80/dusage=80) it was running at near capacity. After a reboot, deleting the 500GB file, and running balance, everything returned to normal. Ran rsync again and it completed fine.
Running slackware current, with kernel 3.16.4.

# btrfs filesystem df /mnt/general
Data, RAID1: total=1.38TiB, used=1.38TiB
System, RAID1: total=32.00MiB, used=256.00KiB
Metadata, RAID1: total=6.00GiB, used=4.67GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs filesystem show /mnt/general
Label: none  uuid: 592376ea-769f-4abb-915e-aa5e49162d90
        Total devices 2 FS bytes used 1.38TiB
        devid  1 size 1.79TiB used 1.39TiB path /dev/sda4
        devid  2 size 1.79TiB used 1.39TiB path /dev/sdd4
Btrfs v3.17.2

--
Ian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs stuck with lots of files
Ian Armstrong posted on Tue, 02 Dec 2014 18:56:13 +0000 as excerpted:

> On Tue, 2 Dec 2014 12:48:21 +0000 (UTC) Duncan <1i5t5.dun...@cox.net>
> wrote:
>
>> FWIW, agreed that it's unlikely to be the drive, especially if you're
>> not seeing bus resets or drive errors in dmesg and SMART says the
>> drive is fine, as I expect it does/will. It may be a btrfs bug or
>> scaling issue, of which btrfs still has some, or it could simply be
>> the single mode vs raid0 mode issue I explain below.
>
> I encountered a similar problem here a few days ago on a btrfs raid1
> partition while using rsync to clone a (~30GB) directory. Everything
> started fine, but I came back an hour later to find rsync had
> apparently stalled at about 20% with cpu usage at 100% on a single
> kworker thread. I was able to kill rsync eventually, and after a while
> (don't know how long, but under 10 minutes) cpu usage returned to
> normal. Restarting rsync resulted in kworker at 100% cpu in less than
> a minute. Once stalled there was little drive access happening.
> Another raid1 partition (mdadm/ext4) on the same drive pair was having
> no problems. Nothing showed in the system logs.
>
> In this instance I'd forgotten to delete a temporary 500GB file before
> starting rsync, so although recently balanced (musage=80/dusage=80) it
> was running at near capacity. After a reboot, deleting the 500GB file,
> and running balance, everything returned to normal. Ran rsync again
> and it completed fine.
>
> Running slackware current, with kernel 3.16.4

FWIW that was my point -- there are still such bugs out there, often corner-case so they don't affect most folks most of the time, but out there. I had a similar stall recently, a kworker stuck at 100% that went away after I killed whatever app had triggered the problem (pan, the news program I'm writing this with, as it happens).
In my case I chalked it up to a known corner-case bug in my slightly old 3.17.0 kernel (my use-case doesn't do read-only snapshots, so I'm not affected by that known bug that effectively blacklists 3.17.0 for some users; this would have been a different one). I don't /know/ it was that bug, but it most likely was, as it's a known but rare corner-case that AFAIK is already fixed in the late 3.18-rcs.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master."  Richard Stallman
btrfs stuck with lots of files
Hi, guys. We have a problem with a btrfs file system: sometimes it becomes stuck without leaving me any way to interrupt it (shutdown -r now is unable to restart the server). By "stuck" I mean that some processes that previously were able to write to disk become unable to cope with the load, and load average goes up:

top - 13:10:58 up 1 day, 9:26, 5 users, load average: 157.76, 156.61, 149.29
Tasks: 235 total, 2 running, 233 sleeping, 0 stopped, 0 zombie
%Cpu(s): 19.8 us, 15.0 sy, 0.0 ni, 60.7 id, 3.9 wa, 0.0 hi, 0.6 si, 0.0 st
KiB Mem:  65922104 total, 65414856 used,  507248 free,     1844 buffers
KiB Swap:        0 total,        0 used,       0 free. 62570804 cached Mem

  PID USER PR NI    VIRT    RES  SHR S %CPU %MEM     TIME+ COMMAND
 8644 root 20  0       0      0    0 R 96.5  0.0 127:21.95 kworker/u16:16
 5047 dvr  20  0 6884292 122668 4132 S  6.4  0.2 258:59.49 dvrserver
30223 root 20  0   20140   2600 2132 R  6.4  0.0   0:00.01 top
    1 root 20  0    4276   1628 1524 S  0.0  0.0   0:40.19 init

There are about 300 threads on the server, some of which are writing to disk.

A bit of information about this btrfs filesystem: this is a 22-disk file system with raid1 for metadata and raid0 for data:

# btrfs filesystem df /store/
Data, single: total=11.92TiB, used=10.86TiB
System, RAID1: total=8.00MiB, used=1.27MiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=46.00GiB, used=33.49GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=128.00KiB

# btrfs property get /store/
ro=false
label=store

# btrfs device stats /store/
(shows all zeros)

# btrfs balance status /store/
No balance found on '/store/'

# btrfs filesystem show /store/
Btrfs v3.17.1
(btw, is it supposed to show only the version here?)

As for the load, we write quite small files (some of 313K, some of 800K); that's why metadata takes that much. So back to the problem.
iostat 1 exposes the following problem:

avg-cpu:  %user  %nice %system %iowait %steal  %idle
          16.96   0.00   17.09   65.95   0.00   0.00

Device:    tps  kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda       0.00       0.00       0.00        0        0
sdc       0.00       0.00       0.00        0        0
sdb       0.00       0.00       0.00        0        0
sde       0.00       0.00       0.00        0        0
sdd       0.00       0.00       0.00        0        0
sdf       0.00       0.00       0.00        0        0
sdg       0.00       0.00       0.00        0        0
sdj       0.00       0.00       0.00        0        0
sdh       0.00       0.00       0.00        0        0
sdk       0.00       0.00       0.00        0        0
sdi       1.00       0.00     200.00        0      200
sdl       0.00       0.00       0.00        0        0
sdn      48.00       0.00   17260.00        0    17260
sdm       0.00       0.00       0.00        0        0
sdp       0.00       0.00       0.00        0        0
sdo       0.00       0.00       0.00        0        0
sdq       0.00       0.00       0.00        0        0
sdr       0.00       0.00       0.00        0        0
sds       0.00       0.00       0.00        0        0
sdt       0.00       0.00       0.00        0        0
sdv       0.00       0.00       0.00        0        0
sdw       0.00       0.00       0.00        0        0
sdu       0.00       0.00       0.00        0        0

Write goes to one disk. I've tried to debug what's going on in the kworker and did:

$ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
$ cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.out2

trace_pipe2.out.xz is in the attachment. Could you comment on what goes wrong here?

The server has 64Gb of RAM. Is it possible that it is unable to keep all metadata in memory? Can we increase this memory limit, if one exists?

Thanks in advance for any pointers,
--
Peter.
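[For anyone wanting to dig further into what such a kworker is doing, a few standard options besides the workqueue tracepoint above. This is a hedged sketch: PID 8644 comes from the top output earlier in the thread, and the commands are only echoed here since they require root on the affected machine:]

```shell
# The spinning kworker's PID, taken from the top output in this thread
pid=8644
# Kernel stack of the spinning thread at this instant (echoed, not run)
echo "cat /proc/$pid/stack"
# SysRq 'w': dump blocked (uninterruptible) tasks to dmesg (echoed, not run)
echo "echo w > /proc/sysrq-trigger"
# Sample where kernel CPU time is going, with call graphs (echoed, not run)
echo "perf top -g"
```

Repeating the /proc/<pid>/stack read a few times usually shows whether the thread is looping in one btrfs code path or bouncing around.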
Re: btrfs stuck with lots of files
On 12/01/2014 03:46 AM, Peter Volkov wrote:
> Hi, guys.
> (stuff about getting hung up trying to write to one drive)

That drive (/dev/sdn) is probably starting to fail. Some older drives basically go unresponsive when they start to go bad, particularly if they've gone bad enough to have run out of spare tracks/sectors. Sometimes they will just refuse to answer. Sometimes they will go into "try again" mode, and the same activity will be retried indefinitely. This will then fill up your write queues and jam up all sorts of subsystems.

Step 1: Back up your data. Since you didn't RAID your data at all, when that drive dies your data is going to go away in fascinating and unpredictable ways. (RAID1 metadata with no RAID1 or RAID5 of the data means you have essentially no media failure protection.)

Step 2: Turn on SMART (if you can) and check whether the drive is in its final moments of life. If your disk is all green lights according to SMART, you may be able to un-jam it by just doing a balance, as described and explained after the next time I quote you.

Step 3: Switch your data mode to RAID5. It will cost you about half of your currently free data space, but it won't leave you _as_ _vulnerable_ to complete data loss as you are now. SMART might be wrong about your drive being fine if it says it is.

> # btrfs filesystem df /store/
> Data, single: total=11.92TiB, used=10.86TiB

Regardless of the above... You have a terabyte of unused but allocated data storage. You probably need to balance your system to un-jam that. That's a lot of space that is unavailable to the metadata (etc).

ASIDE: Having your metadata set to RAID1 (as opposed to the default of DUP) seems a little iffy since your data is still set to DUP. This configuration is not going to leave you with a mountable filesystem if you lose a disk. I'm not sure if the RAID1 layout is going to want to put specific datum in specific places, but it might, which if it does might leave you in an irreconcilable position.
Either way, you will probably un-jam your system in the short run by doing a balance. A full balance (no filter args at all) would be your best bet.

FURTHER ASIDE: raid1 metadata and raid5 data might be good for you. Given 22 volumes and 10% empty space, it would only cost you half of your existing empty space. If you don't RAID your data, there is no real point to putting your metadata in RAID.

[Yes, I said my basic points about your current layout two different ways and times. You are either just a little over-committed on space or you are about to lose all your data, and it's impossible to tell which is the case from here.]

Back up your data. NOW!
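[The SMART check in Step 2 above might look like this in practice. Hedged sketch: smartctl comes from smartmontools, not btrfs-progs, and /dev/sdn is the drive under suspicion earlier in the thread; the commands are only echoed here since they require root on that host:]

```shell
# The drive suspected in this thread
dev=/dev/sdn
# Overall SMART health verdict (echoed, not run)
echo "smartctl -H $dev"
# Vendor attributes; watch reallocated/pending sector counts (echoed, not run)
echo "smartctl -A $dev"
# Kernel-side evidence: bus resets or I/O errors for that device (echoed, not run)
echo "dmesg | grep -i sdn"
```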
Re: btrfs stuck with lots of files
-------- Original Message --------
Subject: btrfs stuck with lots of files
From: Peter Volkov <p...@gentoo.org>
To: linux-btrfs@vger.kernel.org
Date: 2014-12-01 19:46

> Hi, guys. We have a problem with a btrfs file system: sometimes it
> becomes stuck without leaving me any way to interrupt it...
> [full quote of the original message trimmed; see above]
>
> # btrfs filesystem show /store/
> Btrfs v3.17.1
> (btw, is it supposed to show only the version here?)

This is a small bug: if there is a trailing '/' in the path given to 'btrfs fi show', it can't recognize it. A patch has already been sent and may be included in the next version.
> As for the load, we write quite small files (some of 313K, some of
> 800K); that's why metadata takes that much. So back to the problem.
>
> [quoted iostat output trimmed; see above]
>
> Write goes to one disk. I've tried to debug what's going on in the
> kworker and did:
> $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
> $ cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.out2
> trace_pipe2.out.xz is in the attachment. Could you comment on what
> goes wrong here?

It seems that the attachment was blocked by the mailing list, so I didn't see it.

> The server has 64Gb of RAM. Is it possible that it is unable to keep
> all metadata in memory? Can we increase this memory limit, if one
> exists?

Not possible; it will never happen (if nothing goes wrong). The kernel has the outstanding page cache mechanism: when memory comes up short, some cached metadata/data can be flushed back (if dirty) to disk to free space, and re-read from disk if needed later. So the kernel doesn't need to load all
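[As an aside on the page-cache point above, a cheap way to watch that flush behaviour is /proc/meminfo: Dirty is cached data modified but not yet on disk, Writeback is data being written out right now. This one is safe to run anywhere:]

```shell
# Pages modified in the cache but not yet flushed, and pages mid-writeback
grep -E '^(Dirty|Writeback):' /proc/meminfo
```

Watching these two counters during a stall shows whether writeback is actually making progress or sitting still while the kworker spins.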
Re: btrfs stuck with lots of files
On Tue, 02/12/2014 at 09:33 +0800, Qu Wenruo wrote:
> -------- Original Message --------
> Subject: btrfs stuck with lots of files
> From: Peter Volkov <p...@gentoo.org>
> Date: 2014-12-01 19:46
>
> [full quote of the original message trimmed; see above]
>
>> # btrfs filesystem show /store/
>> Btrfs v3.17.1
>> (btw, is it supposed to show only the version here?)
>
> This is a small bug: if there is a trailing '/' in the path given to
> 'btrfs fi show', it can't recognize it. A patch has already been sent
> and may be included in the next version.
>
> [quoted iostat output and tracing commands trimmed; see above]
>
>> Could you comment on what goes wrong here?
>
> It seems that the attachment was blocked by the mailing list, so I
> didn't see it.

I've put it here:
https://drive.google.com/file/d/0BygFL6N3ZVUAMWxCQ0tDREE1Uzg/view?usp=sharing

And some additional information I've put in another letter that was just sent to the mailing list.

Server has 64Gb