Reiterate: btrfs stuck with lot's of files

2014-12-04 Thread Peter Volkov
Hi again, guys. Looking at this issue, I suspect this is a bug in btrfs.
We'll have to clean up this installation soon, so if there are any
requests for debugging, please ask. I'll try to reiterate what
was said in this thread.

Short story: a btrfs filesystem made of 22 1 TB disks with lots of files
(~3024). Write load is 25 MB/s. After some time the file system
became unable to cope with this load. At that point `sync` takes
ages to finish and shutdown -r hangs (I guess this is related to sync).

I also see one kernel kworker that is the main suspect for
this behavior: it takes 100% of a CPU core all the time, jumping from core
to core. At the same time, according to iostat, write/read speed is close
to zero and everything is stuck.

Citing some details from previous messages:

> > top - 13:10:58 up 1 day,  9:26,  5 users,  load average: 157.76, 156.61, 149.29
> > Tasks: 235 total,   2 running, 233 sleeping,   0 stopped,   0 zombie
> > %Cpu(s): 19.8 us, 15.0 sy,  0.0 ni, 60.7 id,  3.9 wa,  0.0 hi,  0.6 si,  0.0 st
> > KiB Mem:  65922104 total, 65414856 used,   507248 free,     1844 buffers
> > KiB Swap:        0 total,        0 used,        0 free. 62570804 cached Mem
> >
> >   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> >  8644 root      20   0       0      0      0 R  96.5  0.0 127:21.95 kworker/u16:16
> >  5047 dvr       20   0 6884292 122668   4132 S   6.4  0.2 258:59.49 dvrserver
> > 30223 root      20   0   20140   2600   2132 R   6.4  0.0   0:00.01 top
> >     1 root      20   0    4276   1628   1524 S   0.0  0.0   0:40.19 init
> >
> > There are about 300 threads on the server, some of which are writing to disk.
> > A bit of information about this btrfs filesystem: this is a 22-disk file
> > system with raid1 for metadata and raid0 for data:
> >
> >   # btrfs filesystem df /store/
> > Data, single: total=11.92TiB, used=10.86TiB
> > System, RAID1: total=8.00MiB, used=1.27MiB
> > System, single: total=4.00MiB, used=0.00B
> > Metadata, RAID1: total=46.00GiB, used=33.49GiB
> > Metadata, single: total=8.00MiB, used=0.00B
> > GlobalReserve, single: total=512.00MiB, used=128.00KiB
> >   # btrfs property get /store/
> > ro=false
> > label=store
> >   # btrfs device stats /store/
> > (shows all zeros)
> >   # btrfs balance status /store/
> > No balance found on '/store/'

 # btrfs filesystem show
Label: 'store'  uuid: 296404d1-bd3f-417d-8501-02f8d7906bcf
Total devices 22 FS bytes used 6.50TiB
devid    1 size 931.51GiB used 558.02GiB path /dev/sdb
devid    2 size 931.51GiB used 559.00GiB path /dev/sdc
devid    3 size 931.51GiB used 559.00GiB path /dev/sdd
devid    4 size 931.51GiB used 559.00GiB path /dev/sde
devid    5 size 931.51GiB used 559.00GiB path /dev/sdf
devid    6 size 931.51GiB used 559.00GiB path /dev/sdg
devid    7 size 931.51GiB used 559.00GiB path /dev/sdh
devid    8 size 931.51GiB used 559.00GiB path /dev/sdi
devid    9 size 931.51GiB used 559.00GiB path /dev/sdj
devid   10 size 931.51GiB used 559.00GiB path /dev/sdk
devid   11 size 931.51GiB used 559.00GiB path /dev/sdl
devid   12 size 931.51GiB used 559.00GiB path /dev/sdm
devid   13 size 931.51GiB used 559.00GiB path /dev/sdn
devid   14 size 931.51GiB used 559.00GiB path /dev/sdo
devid   15 size 931.51GiB used 559.00GiB path /dev/sdp
devid   16 size 931.51GiB used 559.00GiB path /dev/sdq
devid   17 size 931.51GiB used 559.00GiB path /dev/sdr
devid   18 size 931.51GiB used 559.00GiB path /dev/sds
devid   19 size 931.51GiB used 559.00GiB path /dev/sdt
devid   20 size 931.51GiB used 559.00GiB path /dev/sdu
devid   21 size 931.51GiB used 559.01GiB path /dev/sdv
devid   22 size 931.51GiB used 560.01GiB path /dev/sdw

Btrfs v3.17.1

> > iostat 1 exposes following problem:
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >           16.96    0.00   17.09   65.95    0.00    0.00
> >
> > Device:  tps kB_read/s kB_wrtn/s kB_read kB_wrtn
> > sda   0.00 0.00 0.00  0  0
> > sdc   0.00 0.00 0.00  0  0
> > sdb   0.00 0.00 0.00  0  0
> > sde   0.00 0.00 0.00  0  0
> > sdd   0.00 0.00 0.00  0  0
> > sdf   0.00 0.00 0.00  0  0
> > sdg   0.00 0.00 0.00  0  0
> > sdj   0.00 0.00 0.00  0  0
> > sdh   0.00 0.00 0.00  0  0
> > sdk   0.00 0.00 0.00  0  0
> > sdi   1.00 0.00   200.00  0  200
> > sdl   0.00

btrfs stuck with lot's of files

2014-12-01 Thread Peter Volkov
Hi, guys.

We have a problem with a btrfs file system: sometimes it gets stuck
without leaving me any way to interrupt it (shutdown -r now is unable to
restart the server). By stuck I mean that some processes that previously were
able to write to disk are unable to cope with the load, and the load average
goes up:

top - 13:10:58 up 1 day,  9:26,  5 users,  load average: 157.76, 156.61, 149.29
Tasks: 235 total,   2 running, 233 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.8 us, 15.0 sy,  0.0 ni, 60.7 id,  3.9 wa,  0.0 hi,  0.6 si,  0.0 st
KiB Mem:  65922104 total, 65414856 used,   507248 free,     1844 buffers
KiB Swap:        0 total,        0 used,        0 free. 62570804 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8644 root      20   0       0      0      0 R  96.5  0.0 127:21.95 kworker/u16:16
 5047 dvr       20   0 6884292 122668   4132 S   6.4  0.2 258:59.49 dvrserver
30223 root      20   0   20140   2600   2132 R   6.4  0.0   0:00.01 top
    1 root      20   0    4276   1628   1524 S   0.0  0.0   0:40.19 init



There are about 300 threads on the server, some of which are writing to disk.
A bit of information about this btrfs filesystem: this is a 22-disk file
system with raid1 for metadata and raid0 for data:

 # btrfs filesystem df /store/
Data, single: total=11.92TiB, used=10.86TiB
System, RAID1: total=8.00MiB, used=1.27MiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=46.00GiB, used=33.49GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=128.00KiB
 # btrfs property get /store/
ro=false
label=store
 # btrfs device stats /store/
(shows all zeros)
 # btrfs balance status /store/
No balance found on '/store/'
 # btrfs filesystem show /store/
Btrfs v3.17.1
(btw, is it supposed to have only version here?)

As for the load, we write quite small files (some of 313K, some of
800K), which is why metadata takes that much space. So back to the problem.
iostat 1 exposes the following problem:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.96    0.00   17.09   65.95    0.00    0.00

Device:  tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda   0.00 0.00 0.00  0  0
sdc   0.00 0.00 0.00  0  0
sdb   0.00 0.00 0.00  0  0
sde   0.00 0.00 0.00  0  0
sdd   0.00 0.00 0.00  0  0
sdf   0.00 0.00 0.00  0  0
sdg   0.00 0.00 0.00  0  0
sdj   0.00 0.00 0.00  0  0
sdh   0.00 0.00 0.00  0  0
sdk   0.00 0.00 0.00  0  0
sdi   1.00 0.00   200.00  0  200
sdl   0.00 0.00 0.00  0  0
sdn  48.00 0.00 17260.00  0  17260
sdm   0.00 0.00 0.00  0  0
sdp   0.00 0.00 0.00  0  0
sdo   0.00 0.00 0.00  0  0
sdq   0.00 0.00 0.00  0  0
sdr   0.00 0.00 0.00  0  0
sds   0.00 0.00 0.00  0  0
sdt   0.00 0.00 0.00  0  0
sdv   0.00 0.00 0.00  0  0
sdw   0.00 0.00 0.00  0  0
sdu   0.00 0.00 0.00  0  0
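
The per-device figures above can be summarised with a short script. This is
only a minimal sketch: it assumes the six-column device layout of `iostat 1`
shown above, and the function name busy_writers and the 1000 kB/s threshold
are illustrative choices, not anything from the report.

```python
# Sample rows in the (device, tps, kB_read/s, kB_wrtn/s, kB_read, kB_wrtn)
# layout of `iostat 1`; the values are copied from the report above.
SAMPLE = """\
Device:  tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdi   1.00 0.00   200.00  0  200
sdl   0.00 0.00     0.00  0  0
sdn  48.00 0.00 17260.00  0  17260
sdm   0.00 0.00     0.00  0  0
"""


def busy_writers(iostat_text, min_kb_per_s=1000.0):
    """Return {device: kB_wrtn/s} for devices writing at or above the threshold.

    The threshold is an arbitrary, assumed cut-off used to separate the one
    busy disk from the idle ones.
    """
    busy = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        # Device rows have exactly six columns and start with a device name.
        if len(fields) != 6 or not fields[0][0].isalpha():
            continue
        try:
            written = float(fields[3])
        except ValueError:
            continue  # header row: the fourth column is a label, not a number
        if written >= min_kb_per_s:
            busy[fields[0]] = written
    return busy


print(busy_writers(SAMPLE))
```

On the sample rows only sdn passes the threshold, which matches the
observation that practically all writes land on a single disk.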


All writes go to one disk. I've tried to debug what's going on in the kworker
and did

$ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
$ cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.out2

trace_pipe2.out.xz is attached. Could you comment on what goes wrong
here?

The server has 64 GB of RAM. Is it possible that it is unable to keep all
metadata in memory? Can we increase this memory limit, if one exists?


Thanks in advance for any pointers,
--
Peter.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs stuck with lot's of files

2014-12-01 Thread Peter Volkov
On Mon, 01/12/2014 at 10:47 -0800, Robert White wrote:
> On 12/01/2014 03:46 AM, Peter Volkov wrote:
>  > (stuff about getting hung up trying to write to one drive)
> 
> That drive (/dev/sdn) is probably starting to fail.
> (about failed drive)

Thank you, Robert, for the answer. It is not likely that the drive is failing
here. A similar condition (writes going to a single drive) happens with other
drives too, i.e. this write pattern may happen with any drive.

After watching what happens for longer, I see the following. During the stall
a single processor core is 100% busy in kernel space (some kworker
is taking 100% CPU). Ftrace reveals that
btrfs_async_reclaim_metadata_space is the most frequently called function.
So it looks like btrfs is doing some operation on metadata, and until
it finishes everything is stuck (practically no writes happen to
disk). So I'm looking for suggestions on how to cope with this.
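
A tally like this can be produced from a saved trace with a few lines of
Python. This is a minimal sketch, assuming the trace was captured with the
workqueue:workqueue_queue_work event, whose lines carry a function= field.
The sample lines, their addresses and timestamps, and the two other function
names in them are made up for illustration and are not taken from the real
trace.

```python
import re
from collections import Counter

FUNC_RE = re.compile(r"\bfunction=(\S+)")


def _sample_line(func):
    # Hypothetical line in the shape of a workqueue_queue_work trace event.
    return ("kworker/u16:16-8644 [002] d... 100.000001: workqueue_queue_work: "
            f"work struct=ffff880000000100 function={func} "
            "workqueue=ffff880000000200 req_cpu=16 cpu=4294967295")


SAMPLE_TRACE = "\n".join(_sample_line(f) for f in [
    "btrfs_async_reclaim_metadata_space",
    "vmstat_update",
    "btrfs_async_reclaim_metadata_space",
    "flush_to_ldisc",
    "btrfs_async_reclaim_metadata_space",
])


def tally_functions(trace_text):
    """Count how often each work function appears in the trace text."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = FUNC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


for func, n in tally_functions(SAMPLE_TRACE).most_common():
    print(n, func)
```

For a real capture, the text read from the saved trace file would be passed
to tally_functions in place of SAMPLE_TRACE.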

> >   # btrfs filesystem df /store/
> > Data, single: total=11.92TiB, used=10.86TiB
> 
> Regardless of the above...
> 
> You have a terabyte of unused but allocated data storage. You probably 
> need to balance your system to un-jam that. That's a lot of space that 
> is unavailable to the metadata (etc).

Well, I'm afraid that a balance will put the fs into an even longer "stuck"
state.
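
The "terabyte of unused but allocated" figure follows directly from the
`btrfs filesystem df` lines quoted above (total minus used per chunk type).
A minimal sketch of that arithmetic, assuming the "Type, profile: total=...,
used=..." line format shown in this thread; the function name slack_bytes is
illustrative only:

```python
import re

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
LINE_RE = re.compile(
    r"^(\w+), (\w+): total=([\d.]+)(\w+), used=([\d.]+)(\w+)$")

# Lines copied from the `btrfs filesystem df /store/` output in this thread.
DF_OUTPUT = """\
Data, single: total=11.92TiB, used=10.86TiB
Metadata, RAID1: total=46.00GiB, used=33.49GiB
"""


def slack_bytes(df_text):
    """Return {(type, profile): allocated-but-unused bytes}."""
    result = {}
    for line in df_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        kind, profile, total, total_unit, used, used_unit = match.groups()
        allocated = float(total) * UNITS[total_unit]
        in_use = float(used) * UNITS[used_unit]
        result[(kind, profile)] = allocated - in_use
    return result


slack = slack_bytes(DF_OUTPUT)
print(round(slack[("Data", "single")] / 2**40, 2), "TiB data slack")
print(round(slack[("Metadata", "RAID1")] / 2**30, 2), "GiB metadata slack")
```

With the numbers from this thread, that is 11.92 - 10.86 = 1.06 TiB of
allocated but unused data chunks and 46.00 - 33.49 = 12.51 GiB of metadata
slack.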

> ASIDE: Having your metadata set to RAID1 (as opposed to the default of 
> DUP) seems a little iffy since your data is still set to DUP.

That's true. But why is data duplicated? During btrfs volume creation
I explicitly set -d single.

> FURTHER ASIDE: raid1 metadata and raid5 data might be good for you given 
> 22 volumes and 10% empty space; it would only cost you half of your 
> existing empty space. If you don't RAID your data, there is no real 
> point to putting your metadata in RAID.

Is raid5 ready for use? As I read in post[1], mentioned on[2], there is
still some way to go before it is stable.

[1]
http://marc.merlins.org/perso/btrfs/post_2014-03-23_Btrfs-Raid5-Status.html
[2] https://btrfs.wiki.kernel.org/index.php/RAID56

--
Peter.



Re: btrfs stuck with lot's of files

2014-12-01 Thread Peter Volkov
On Tue, 02/12/2014 at 09:33 +0800, Qu Wenruo wrote:
>  Original Message 
> Subject: btrfs stuck with lot's of files
> From: Peter Volkov 
> To: linux-btrfs@vger.kernel.org 
> Date: 2014-12-01 19:46
> > Hi, guys.
> >
> > We have a problem with a btrfs file system: sometimes it gets stuck
> > without leaving me any way to interrupt it (shutdown -r now is unable to
> > restart the server). By stuck I mean that some processes that previously
> > were able to write to disk are unable to cope with the load, and the load
> > average goes up:
> >
> > top - 13:10:58 up 1 day,  9:26,  5 users,  load average: 157.76, 156.61, 149.29
> > Tasks: 235 total,   2 running, 233 sleeping,   0 stopped,   0 zombie
> > %Cpu(s): 19.8 us, 15.0 sy,  0.0 ni, 60.7 id,  3.9 wa,  0.0 hi,  0.6 si,  0.0 st
> > KiB Mem:  65922104 total, 65414856 used,   507248 free,     1844 buffers
> > KiB Swap:        0 total,        0 used,        0 free. 62570804 cached Mem
> >
> >   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> >  8644 root      20   0       0      0      0 R  96.5  0.0 127:21.95 kworker/u16:16
> >  5047 dvr       20   0 6884292 122668   4132 S   6.4  0.2 258:59.49 dvrserver
> > 30223 root      20   0   20140   2600   2132 R   6.4  0.0   0:00.01 top
> >     1 root      20   0    4276   1628   1524 S   0.0  0.0   0:40.19 init
> >
> >
> >
> > There are about 300 threads on the server, some of which are writing to disk.
> > A bit of information about this btrfs filesystem: this is a 22-disk file
> > system with raid1 for metadata and raid0 for data:
> >
> >   # btrfs filesystem df /store/
> > Data, single: total=11.92TiB, used=10.86TiB
> > System, RAID1: total=8.00MiB, used=1.27MiB
> > System, single: total=4.00MiB, used=0.00B
> > Metadata, RAID1: total=46.00GiB, used=33.49GiB
> > Metadata, single: total=8.00MiB, used=0.00B
> > GlobalReserve, single: total=512.00MiB, used=128.00KiB
> >   # btrfs property get /store/
> > ro=false
> > label=store
> >   # btrfs device stats /store/
> > (shows all zeros)
> >   # btrfs balance status /store/
> > No balance found on '/store/'
> >   # btrfs filesystem show /store/
> > Btrfs v3.17.1
> > (btw, is it supposed to have only version here?)
> This is a small bug: if there is a trailing '/' in the path given to 
> 'btrfs fi show', it can't recognize it.
> A patch has already been sent and may be included in the next version.
> >
> > As for the load, we write quite small files (some of 313K, some of
> > 800K), which is why metadata takes that much space. So back to the problem.
> > iostat 1 exposes the following problem:
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >           16.96    0.00   17.09   65.95    0.00    0.00
> >
> > Device:  tps kB_read/s kB_wrtn/s kB_read kB_wrtn
> > sda   0.00 0.00 0.00  0  0
> > sdc   0.00 0.00 0.00  0  0
> > sdb   0.00 0.00 0.00  0  0
> > sde   0.00 0.00 0.00  0  0
> > sdd   0.00 0.00 0.00  0  0
> > sdf   0.00 0.00 0.00  0  0
> > sdg   0.00 0.00 0.00  0  0
> > sdj   0.00 0.00 0.00  0  0
> > sdh   0.00 0.00 0.00  0  0
> > sdk   0.00 0.00 0.00  0  0
> > sdi   1.00 0.00   200.00  0  200
> > sdl   0.00 0.00 0.00  0  0
> > sdn  48.00 0.00 17260.00  0  17260
> > sdm   0.00 0.00 0.00  0  0
> > sdp   0.00 0.00 0.00  0  0
> > sdo   0.00 0.00 0.00  0  0
> > sdq   0.00 0.00 0.00  0  0
> > sdr   0.00 0.00 0.00  0  0
> > sds   0.00 0.00 0.00  0  0
> > sdt   0.00 0.00 0.00  0  0
> > sdv   0.00 0.00 0.00  0  0
> > sdw   0.00 0.00 0.00