Reiterate: btrfs stuck with lots of files

2014-12-04 Thread Peter Volkov
Hi, guys, again. Looking at this issue, I suspect this is a bug in btrfs.
We'll have to clean up this installation soon, so if there is any
request to do some debugging, please ask. I'll try to reiterate what
was said in this thread.

Short story: a btrfs filesystem made of 22 1TB disks with lots of files
(~3024). The write load is 25 MByte/second. After some time the file system
became unable to cope with this load. At that point `sync` takes
ages to finish and shutdown -r hangs (I guess related to sync).

I also see one kernel kworker that is the main suspect for
this behavior: it constantly takes 100% of a CPU core, jumping from core
to core. At the same time, according to iostat, read/write throughput is close
to zero and everything is stuck.

Citing some details from previous messages:

  top - 13:10:58 up 1 day,  9:26,  5 users,  load average: 157.76, 156.61, 149.29
  Tasks: 235 total,   2 running, 233 sleeping,   0 stopped,   0 zombie
  %Cpu(s): 19.8 us, 15.0 sy,  0.0 ni, 60.7 id,  3.9 wa,  0.0 hi,  0.6 si,  0.0 st
  KiB Mem:  65922104 total, 65414856 used,   507248 free,     1844 buffers
  KiB Swap:        0 total,        0 used,        0 free. 62570804 cached Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   8644 root      20   0       0      0      0 R  96.5  0.0 127:21.95 kworker/u16:16
   5047 dvr       20   0 6884292 122668   4132 S   6.4  0.2 258:59.49 dvrserver
  30223 root      20   0   20140   2600   2132 R   6.4  0.0   0:00.01 top
      1 root      20   0    4276   1628   1524 S   0.0  0.0   0:40.19 init
 
  There are about 300 threads on the server, some of which are writing to disk.
  A bit of information about this btrfs filesystem: this is a 22-disk file
  system with raid1 for metadata and raid0 for data:
 
# btrfs filesystem df /store/
  Data, single: total=11.92TiB, used=10.86TiB
  System, RAID1: total=8.00MiB, used=1.27MiB
  System, single: total=4.00MiB, used=0.00B
  Metadata, RAID1: total=46.00GiB, used=33.49GiB
  Metadata, single: total=8.00MiB, used=0.00B
  GlobalReserve, single: total=512.00MiB, used=128.00KiB
# btrfs property get /store/
  ro=false
  label=store
# btrfs device stats /store/
  (shows all zeros)
# btrfs balance status /store/
  No balance found on '/store/'

 # btrfs filesystem show
Label: 'store'  uuid: 296404d1-bd3f-417d-8501-02f8d7906bcf
Total devices 22 FS bytes used 6.50TiB
devid    1 size 931.51GiB used 558.02GiB path /dev/sdb
devid    2 size 931.51GiB used 559.00GiB path /dev/sdc
devid    3 size 931.51GiB used 559.00GiB path /dev/sdd
devid    4 size 931.51GiB used 559.00GiB path /dev/sde
devid    5 size 931.51GiB used 559.00GiB path /dev/sdf
devid    6 size 931.51GiB used 559.00GiB path /dev/sdg
devid    7 size 931.51GiB used 559.00GiB path /dev/sdh
devid    8 size 931.51GiB used 559.00GiB path /dev/sdi
devid    9 size 931.51GiB used 559.00GiB path /dev/sdj
devid   10 size 931.51GiB used 559.00GiB path /dev/sdk
devid   11 size 931.51GiB used 559.00GiB path /dev/sdl
devid   12 size 931.51GiB used 559.00GiB path /dev/sdm
devid   13 size 931.51GiB used 559.00GiB path /dev/sdn
devid   14 size 931.51GiB used 559.00GiB path /dev/sdo
devid   15 size 931.51GiB used 559.00GiB path /dev/sdp
devid   16 size 931.51GiB used 559.00GiB path /dev/sdq
devid   17 size 931.51GiB used 559.00GiB path /dev/sdr
devid   18 size 931.51GiB used 559.00GiB path /dev/sds
devid   19 size 931.51GiB used 559.00GiB path /dev/sdt
devid   20 size 931.51GiB used 559.00GiB path /dev/sdu
devid   21 size 931.51GiB used 559.01GiB path /dev/sdv
devid   22 size 931.51GiB used 560.01GiB path /dev/sdw

Btrfs v3.17.1

  iostat 1 exposes the following problem:
 
  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            16.96    0.00   17.09   65.95    0.00    0.00
 
  Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
  sda   0.00 0.00 0.00  0  0
  sdc   0.00 0.00 0.00  0  0
  sdb   0.00 0.00 0.00  0  0
  sde   0.00 0.00 0.00  0  0
  sdd   0.00 0.00 0.00  0  0
  sdf   0.00 0.00 0.00  0  0
  sdg   0.00 0.00 0.00  0  0
  sdj   0.00 0.00 0.00  0  0
  sdh   0.00 0.00 0.00  0  0
  sdk   0.00 0.00 0.00  0  0
  sdi   1.00 0.00   200.00  0200
  sdl   0.00 0.00 0.00  0  0
  sdn  48.00 0.00 17260.00   

Re: Reiterate: btrfs stuck with lots of files

2014-12-04 Thread Chris Murphy

Re: btrfs stuck with lots of files

2014-12-02 Thread Duncan
Peter Volkov posted on Tue, 02 Dec 2014 04:50:29 +0300 as excerpted:

 On Mon, 01/12/2014 at 10:47 -0800, Robert White wrote:
 On 12/01/2014 03:46 AM, Peter Volkov wrote:
   (stuff about getting hung up trying to write to one drive)
 
 That drive (/dev/sdn) is probably starting to fail.
 (about failed drive)
 
 Thank you, Robert, for the answer. It is not likely that the drive is failing here.
 A similar condition (writes to a single drive) happens with other drives,
 i.e. such a write pattern may happen with any drive.

 After watching what happens for longer I see the following. During the stall a
 single processor core is busy at 100% in kernel space (some kworker
 is taking 100% CPU).

FWIW, agreed that it's unlikely to be the drive, especially if you're not 
seeing bus resets or drive errors in dmesg and smart says the drive is 
fine, as I expect it does/will.  It may be a btrfs bug or scaling issue, 
of which btrfs still has some, or it could simply be the single mode vs 
raid0 mode issue I explain below.
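
For completeness, those quick health checks would look roughly like this (a
hedged sketch; /dev/sdn is taken from the iostat output above, and smartctl
from smartmontools is assumed to be installed):

# dmesg | grep -iE 'ata[0-9]+|reset|i/o error'
# smartctl -H /dev/sdn

No link resets or I/O errors from the first, plus a PASSED verdict from the
second, would support the view that the drive itself is healthy.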

# btrfs filesystem df /store/
  Data, single: total=11.92TiB, used=10.86TiB
 
 Regardless of the above...
 
 You have a terabyte of unused but allocated data storage. You probably
 need to balance your system to un-jam that. That's a lot of space that
 is unavailable to the metadata (etc).
 
 Well, I'm afraid that a balance will put the fs into an even longer stall.
 
 ASIDE: Having your metadata set to RAID1 (as opposed to the default of
 DUP) seems a little iffy since your data is still set to DUP.
 
 That's true. But why is the data duplicated? During btrfs volume creation
 I explicitly set --data single.

I believe Robert mis-wrote (thinko).  The btrfs filesystem df clearly 
shows that your data is in single mode, the data default mode, not dup 
mode, which is normally only available to metadata (not data) on a single-
device filesystem, where it is the metadata default.

However, in the original post you /did/ say raid1 for metadata, raid0 for 
data, and the above btrfs filesystem df again clearly says single, not 
raid0.

Which is very likely to be your problem.  In single mode, btrfs will 
create chunks one at a time, picking the device with the most free space 
to allocate it on.  The normal data chunk size is 1 GiB.  Because of the 
most-free-space allocation rule, with N devices (22 in your case) of the 
same size, after N (22) data chunks are allocated you'll tend to have one 
such chunk on each device.

Each of these 1 GiB chunks (along with space freed up by normal delete 
activity in other allocated data chunks) will be filled before another is 
allocated.

Which will mean you're writing a GiB worth of data to one device before 
you switch to the next one.  With your mostly sub-MiB file write pattern 
(1 GiB divided by files of roughly 0.3-0.8 MiB each), that's probably 
1500-2000 files written to a chunk on that single device, before another 
chunk is allocated on the next device.

Thus all your activity on that single device!

In raid0 mode, by contrast, the same 1 GiB chunks will be allocated on 
each device, but a stripe of chunks will be allocated across all devices 
(22 in your case) at the same time, and data being written is broken up 
into much smaller per-device strips.  I'm not sure what the actual per-device 
strip size is in raid0 mode, but it's *WELL* under a GiB and I believe in the 
KiBs not MiB range.  It might be 128 KiB, the compression block size when 
the compress mount option is used.

Obviously were you using raid0 data, you'd see the load spread out at 
least somewhat better.  But the df says it's single, not raid0.

To get raid0 mode you can use a balance with filters (see the wiki or 
recent btrfs-balance manpage), or blow away the existing filesystem and 
create a new one, setting --data raid0 when you mkfs.btrfs, and restore 
from backups (which you're already prepared to do if you value your data 
in any case[1]).
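
For reference, the two routes would look roughly like this (a hedged sketch
assuming current btrfs-progs syntax, untested on this array; the conversion
balance has to rewrite every existing data chunk, so on ~11 TiB it will run
for a long time):

# btrfs balance start -dconvert=raid0 /store

or, recreating from scratch and restoring from backup:

# mkfs.btrfs -L store -m raid1 -d raid0 /dev/sd[b-w]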

That missing btrfs filesystem show, due to the terminating / in /store/ 
(simply /store should work) is somewhat frustrating here, as it'd show 
per-device sizes and utilization.  Assuming near same-sized devices, with 
11 TiB of data being far greater than the 1 GiB data chunk size times 22 
devices I'd guess you're pretty evened out, utilization-wise, but the 
output from both show and df is necessary to get the full story.
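
(Until the trailing-slash parsing fix Qu mentions elsewhere in this thread
lands, the workaround is simply to drop the slash or query by label:

# btrfs filesystem show /store
# btrfs filesystem show store

either of which should print the per-device sizes and usage.)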

 FURTHER ASIDE: raid1 metadata and raid5 data might be good for you; given
 22 volumes and 10% empty space it would only cost you half of
 your existing empty space. If you don't RAID your data, there is no
 real point to putting your metadata in RAID.
 
 Is raid5 ready for use? As I read the post [1] mentioned on [2], there is
 still some way to go before it is stable.

You are absolutely correct.  I'd strongly recommend staying AWAY from 
btrfs raid5/6 modes at this time.  While Robert is becoming an active 
regular and has the technical background to point out some things others 
miss, he's still reasonably new to this list and may not have been aware 
of the incomplete status of raid5/6 modes at this time.

Effectively 

Re: btrfs stuck with lots of files

2014-12-02 Thread Ian Armstrong
On Tue, 2 Dec 2014 12:48:21 +0000 (UTC)
Duncan 1i5t5.dun...@cox.net wrote:

 Peter Volkov posted on Tue, 02 Dec 2014 04:50:29 +0300 as excerpted:
 
  On Mon, 01/12/2014 at 10:47 -0800, Robert White wrote:
  On 12/01/2014 03:46 AM, Peter Volkov wrote:
(stuff about getting hung up trying to write to one drive)
  
  That drive (/dev/sdn) is probably starting to fail.
  (about failed drive)
  
  Thank you Robert for the answer. It is not likely that drive fails
  here. Similar condition (write to a single drive) happens with
  other drives i.e. such write pattern may happen with any drive.
 
  After looking at what happens longer I see the following. During
  stuck single processor core is busy 100% of CPU in kernel space
  (some kworker is taking 100% CPU).
 
 FWIW, agreed that it's unlikely to be the drive, especially if you're
 not seeing bus resets or drive errors in dmesg and smart says the
 drive is fine, as I expect it does/will.  It may be a btrfs bug or
 scaling issue, of which btrfs still has some, or it could simply be
 the single mode vs raid0 mode issue I explain below.

I encountered a similar problem here a few days ago on a btrfs raid1
partition while using rsync to clone a (~30GB) directory.

Everything started fine, but I came back an hour later to find rsync had
apparently stalled at about 20% with cpu usage at 100% on a single
kworker thread. I was able to kill rsync eventually, and after a while
(don't know how long, but 10 minutes) cpu usage returned to normal.
Restarting rsync resulted in kworker at 100% cpu in less than a minute.
Once stalled there was little drive access happening. Another raid1
partition (mdadm/ext4) on the same drive pair was having no problems.
Nothing showed in the system logs.

In this instance I'd forgotten to delete a temporary 500GB file before
starting rsync, so although recently balanced (musage=80/dusage=80) it
was running at near capacity.
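
(For reference, the usage-filtered balance mentioned above would have been
invoked roughly as

# btrfs balance start -musage=80 -dusage=80 /mnt/general

which only rewrites chunks that are less than 80% full, so it is much cheaper
than a full balance.)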

After a reboot, deleting the 500GB file & running a balance, everything
returned to normal. Ran rsync again & it completed fine.

Running Slackware current, with kernel 3.16.4

# btrfs filesystem df /mnt/general
Data, RAID1: total=1.38TiB, used=1.38TiB
System, RAID1: total=32.00MiB, used=256.00KiB
Metadata, RAID1: total=6.00GiB, used=4.67GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs filesystem show /mnt/general
Label: none  uuid: 592376ea-769f-4abb-915e-aa5e49162d90
Total devices 2 FS bytes used 1.38TiB
devid    1 size 1.79TiB used 1.39TiB path /dev/sda4
devid    2 size 1.79TiB used 1.39TiB path /dev/sdd4

Btrfs v3.17.2

-- 
Ian


Re: btrfs stuck with lots of files

2014-12-02 Thread Duncan
Ian Armstrong posted on Tue, 02 Dec 2014 18:56:13 +0000 as excerpted:

 On Tue, 2 Dec 2014 12:48:21 +0000 (UTC)
 Duncan 1i5t5.dun...@cox.net wrote:
 
 FWIW, agreed that it's unlikely to be the drive, especially if you're
 not seeing bus resets or drive errors in dmesg and smart says the drive
 is fine, as I expect it does/will.  It may be a btrfs bug or scaling
 issue, of which btrfs still has some, or it could simply be the single
 mode vs raid0 mode issue I explain below.
 
 I encountered a similar problem here a few days ago on a btrfs raid1
 partition while using rsync to clone a (~30GB) directory.
 
 Everything started fine, but I came back an hour later to find rsync had
 apparently stalled at about 20% with cpu usage at 100% on a single
 kworker thread. I was able to kill rsync eventually, and after a while
 (don't know how long, but 10 minutes) cpu usage returned to normal.
 Restarting rsync resulted in kworker at 100% cpu in less than a minute.
 Once stalled there was little drive access happening. Another raid1
 partition (mdadm/ext4) on the same drive pair was having no problems.
 Nothing showed in the system logs.
 
 In this instance I'd forgotten to delete a temporary 500GB file before
 starting rsync, so although recently balanced (musage=80/dusage=80) it
 was running at near capacity.
 
 After a reboot, deleting the 500GB file & running a balance, everything
 returned to normal. Ran rsync again & it completed fine.
 
 Running slackware current, with Kernel 3.16.4

FWIW that was my point -- there are still such bugs out there, often 
corner-case so they don't affect most folks most of the time, but out 
there.

I had a similar stall recently, a kworker stuck at 100% that went away 
after I killed whatever app had triggered the problem (pan, the news 
program I'm writing this with, as it happens).  In my case I chalked it 
up to a known corner-case bug in my slightly old 3.17.0 kernel (my use-
case doesn't do read-only snapshots so I'm not affected by that known bug 
that effectively blacklists 3.17.0 for some users; this would have been a 
different one).  I don't /know/ it was that bug, but it most likely was, 
as it's a known but rare corner-case that AFAIK is already fixed in the 
late 3.18-rcs.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



btrfs stuck with lots of files

2014-12-01 Thread Peter Volkov
Hi, guys.

We have a problem with a btrfs file system: sometimes it becomes stuck
without leaving me any way to interrupt it (shutdown -r now is unable to
restart the server). By stuck I mean that processes that previously were
able to write to disk become unable to cope with the load and the load average
goes up:

top - 13:10:58 up 1 day,  9:26,  5 users,  load average: 157.76, 156.61, 149.29
Tasks: 235 total,   2 running, 233 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.8 us, 15.0 sy,  0.0 ni, 60.7 id,  3.9 wa,  0.0 hi,  0.6 si,  0.0 st
KiB Mem:  65922104 total, 65414856 used,   507248 free,     1844 buffers
KiB Swap:        0 total,        0 used,        0 free. 62570804 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8644 root      20   0       0      0      0 R  96.5  0.0 127:21.95 kworker/u16:16
 5047 dvr       20   0 6884292 122668   4132 S   6.4  0.2 258:59.49 dvrserver
30223 root      20   0   20140   2600   2132 R   6.4  0.0   0:00.01 top
    1 root      20   0    4276   1628   1524 S   0.0  0.0   0:40.19 init



There are about 300 threads on the server, some of which are writing to disk.
A bit of information about this btrfs filesystem: this is a 22-disk file
system with raid1 for metadata and raid0 for data:

 # btrfs filesystem df /store/
Data, single: total=11.92TiB, used=10.86TiB
System, RAID1: total=8.00MiB, used=1.27MiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=46.00GiB, used=33.49GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=128.00KiB
 # btrfs property get /store/
ro=false
label=store
 # btrfs device stats /store/
(shows all zeros)
 # btrfs balance status /store/
No balance found on '/store/'
 # btrfs filesystem show /store/
Btrfs v3.17.1
(btw, is it supposed to show only the version here?)

As for the load, we write quite small files (some of 313K, some of
800K); that's why metadata takes that much space. So, back to the problem.
iostat 1 exposes the following problem:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.96    0.00   17.09   65.95    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda   0.00 0.00 0.00  0  0
sdc   0.00 0.00 0.00  0  0
sdb   0.00 0.00 0.00  0  0
sde   0.00 0.00 0.00  0  0
sdd   0.00 0.00 0.00  0  0
sdf   0.00 0.00 0.00  0  0
sdg   0.00 0.00 0.00  0  0
sdj   0.00 0.00 0.00  0  0
sdh   0.00 0.00 0.00  0  0
sdk   0.00 0.00 0.00  0  0
sdi   1.00 0.00   200.00  0200
sdl   0.00 0.00 0.00  0  0
sdn  48.00 0.00 17260.00  0  17260
sdm   0.00 0.00 0.00  0  0
sdp   0.00 0.00 0.00  0  0
sdo   0.00 0.00 0.00  0  0
sdq   0.00 0.00 0.00  0  0
sdr   0.00 0.00 0.00  0  0
sds   0.00 0.00 0.00  0  0
sdt   0.00 0.00 0.00  0  0
sdv   0.00 0.00 0.00  0  0
sdw   0.00 0.00 0.00  0  0
sdu   0.00 0.00 0.00  0  0


The write goes to one disk. I've tried to debug what's going on in that
kworker and did:

$ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
$ cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.out2

trace_pipe2.out.xz is in the attachment. Could you comment on what goes wrong
here?
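
A couple of additional things that might help pin down what that kworker is
doing (hedged suggestions, assuming debugfs/sysrq and perf are available on
this kernel; PID 8644 is taken from the top output above):

$ cat /proc/8644/stack
$ echo w > /proc/sysrq-trigger; dmesg | tail -n 100
$ perf top -g

The first prints the kernel stack of the kworker, the second dumps all
blocked (D-state) tasks to the kernel log, and the third samples where the
CPU time is actually going.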

The server has 64GB of RAM. Is it possible that it is unable to keep all
metadata in memory? Can we increase this memory limit, if it exists?


Thanks in advance for any pointers,
--
Peter.



Re: btrfs stuck with lots of files

2014-12-01 Thread Robert White

On 12/01/2014 03:46 AM, Peter Volkov wrote:

Hi, guys.

 (stuff about getting hung up trying to write to one drive)

That drive (/dev/sdn) is probably starting to fail. Some older drives 
basically go unresponsive when they start to go bad. Particularly if 
they've gone bad enough to have run out of spare tracks/sectors. 
Sometimes they will just refuse to answer. Sometimes they will go into 
try again mode, and the same activity will be retried indefinitely. 
This will then fill up your write queues and jam up all sorts of subsystems.


Step 1: Back up your data. Since you didn't RAID your data at all, when 
that drive dies your data is going to go away in fascinating and 
unpredictable ways. (RAID1 metadata with no RAID1 or RAID5 of the data 
means you have essentially no media failure protection.)


Step 2: Turn on SMART (if you can) and check whether the 
drive is in its final moments of life. If your disk is all green lights 
according to SMART, you may be able to un-jam it by just doing a 
balance as described and explained after the next time I quote you.
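
(A hedged sketch of those checks with smartmontools; substitute the actual
suspect device:

# smartctl -s on /dev/sdn
# smartctl -H /dev/sdn
# smartctl -A /dev/sdn

In the attribute output, non-zero Reallocated_Sector_Ct, Current_Pending_Sector
or Offline_Uncorrectable counts are the usual signs of a drive on its way out.)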


Step 3: Switch your data mode to RAID5. It will cost you about half of 
your currently free data space, but it won't leave you _as_ _vulnerable_ 
to complete data loss as you are now. SMART might be wrong about your 
drive being fine, even if it says it is.



  # btrfs filesystem df /store/
Data, single: total=11.92TiB, used=10.86TiB


Regardless of the above...

You have a terabyte of unused but allocated data storage. You probably 
need to balance your system to un-jam that. That's a lot of space that 
is unavailable to the metadata (etc).


ASIDE: Having your metadata set to RAID1 (as opposed to the default of 
DUP) seems a little iffy since your data is still set to DUP. This 
configuration is not going to leave you with a mountable filesystem if 
you lose a disk. I'm not sure if the RAID1 layout is going to want to 
put specific datum in specific places, but it might, which if it does 
might leave you in an irreconcilable position.


Either way, you will probably un-jam your system in the short run by 
doing a balance. A full balance (no filter args at all) would be your 
best bet.
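
(A full balance would be started and monitored roughly like this; expect it
to take a long time on a nearly full 22-device filesystem, and I/O will be
noticeably slower while it runs:

# btrfs balance start /store
# btrfs balance status /store

with the status command run from a second shell.)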


FURTHER ASIDE: raid1 metadata and raid5 data might be good for you; given 
22 volumes and 10% empty space it would only cost you half of your 
existing empty space. If you don't RAID your data, there is no real 
point to putting your metadata in RAID.


[Yes, I said my basic points about your current layout two different 
ways and times. You are either just a little over-committed on space 
or you are about to lose all your data and it's impossible to tell 
which is the case from here.]


Back up your data. NOW!



Re: btrfs stuck with lots of files

2014-12-01 Thread Qu Wenruo


 Original Message 
Subject: btrfs stuck with lots of files
From: Peter Volkov p...@gentoo.org
To: linux-btrfs@vger.kernel.org linux-btrfs@vger.kernel.org
Date: 2014-12-01 19:46

Hi, guys.

We have a problem with a btrfs file system: sometimes it becomes stuck
without leaving me any way to interrupt it (shutdown -r now is unable to
restart the server). By stuck I mean that processes that previously were
able to write to disk become unable to cope with the load and the load average
goes up:

top - 13:10:58 up 1 day,  9:26,  5 users,  load average: 157.76, 156.61, 149.29
Tasks: 235 total,   2 running, 233 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.8 us, 15.0 sy,  0.0 ni, 60.7 id,  3.9 wa,  0.0 hi,  0.6 si,  0.0 st
KiB Mem:  65922104 total, 65414856 used,   507248 free,     1844 buffers
KiB Swap:        0 total,        0 used,        0 free. 62570804 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8644 root      20   0       0      0      0 R  96.5  0.0 127:21.95 kworker/u16:16
 5047 dvr       20   0 6884292 122668   4132 S   6.4  0.2 258:59.49 dvrserver
30223 root      20   0   20140   2600   2132 R   6.4  0.0   0:00.01 top
    1 root      20   0    4276   1628   1524 S   0.0  0.0   0:40.19 init



There are about 300 threads on the server, some of which are writing to disk.
A bit of information about this btrfs filesystem: this is a 22-disk file
system with raid1 for metadata and raid0 for data:

  # btrfs filesystem df /store/
Data, single: total=11.92TiB, used=10.86TiB
System, RAID1: total=8.00MiB, used=1.27MiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=46.00GiB, used=33.49GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=128.00KiB
  # btrfs property get /store/
ro=false
label=store
  # btrfs device stats /store/
(shows all zeros)
  # btrfs balance status /store/
No balance found on '/store/'
  # btrfs filesystem show /store/
Btrfs v3.17.1
(btw, is it supposed to show only the version here?)
This is a small bug: if there is a trailing '/' in the path given to
'btrfs fi show', it can't recognize it.

A patch has already been sent and may be included in the next version.


As for the load, we write quite small files (some of 313K, some of
800K); that's why metadata takes that much space. So, back to the problem.
iostat 1 exposes the following problem:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.96    0.00   17.09   65.95    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda   0.00 0.00 0.00  0  0
sdc   0.00 0.00 0.00  0  0
sdb   0.00 0.00 0.00  0  0
sde   0.00 0.00 0.00  0  0
sdd   0.00 0.00 0.00  0  0
sdf   0.00 0.00 0.00  0  0
sdg   0.00 0.00 0.00  0  0
sdj   0.00 0.00 0.00  0  0
sdh   0.00 0.00 0.00  0  0
sdk   0.00 0.00 0.00  0  0
sdi   1.00 0.00   200.00  0200
sdl   0.00 0.00 0.00  0  0
sdn  48.00 0.00 17260.00  0  17260
sdm   0.00 0.00 0.00  0  0
sdp   0.00 0.00 0.00  0  0
sdo   0.00 0.00 0.00  0  0
sdq   0.00 0.00 0.00  0  0
sdr   0.00 0.00 0.00  0  0
sds   0.00 0.00 0.00  0  0
sdt   0.00 0.00 0.00  0  0
sdv   0.00 0.00 0.00  0  0
sdw   0.00 0.00 0.00  0  0
sdu   0.00 0.00 0.00  0  0


The write goes to one disk. I've tried to debug what's going on in that
kworker and did:

$ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
$ cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.out2

trace_pipe2.out.xz is in the attachment. Could you comment on what goes wrong
here?
It seems the attachment was blocked by the mailing list, so I didn't see the
attachment.


The server has 64GB of RAM. Is it possible that it is unable to keep all
metadata in memory? Can we increase this memory limit, if it exists?

Not possible; it will never happen (if nothing goes wrong).
The kernel has the page cache mechanism: when memory runs short,
some cached metadata/data can be flushed back (if dirty) to disk to free
space.

And re-read from disk if needed later.

So the kernel doesn't need to load all
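
(To see how much of the cached data is actually dirty and waiting for
writeback at any given moment, plain procfs is enough, for example

# grep -E '^(Dirty|Writeback):' /proc/meminfo
# free -m

nothing btrfs-specific is needed.)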

Re: btrfs stuck with lots of files

2014-12-01 Thread Peter Volkov
On Tue, 02/12/2014 at 09:33 +0800, Qu Wenruo wrote:
  Original Message 
 Subject: btrfs stuck with lots of files
 From: Peter Volkov p...@gentoo.org
 To: linux-btrfs@vger.kernel.org linux-btrfs@vger.kernel.org
 Date: 2014-12-01 19:46
 
  trace_pipe2.out.xz is in the attachment. Could you comment on what goes wrong
  here?
 It seems the attachment was blocked by the mailing list, so I didn't see the
 attachment.

I've put it here:
https://drive.google.com/file/d/0BygFL6N3ZVUAMWxCQ0tDREE1Uzg/view?usp=sharing

And I've put some additional information in another letter that I just
sent to the mailing list.

  Server has 64Gb