On Mon, Dec 29, 2014 at 10:32:00AM +0100, Martin Steigerwald wrote:
> Am Sonntag, 28. Dezember 2014, 21:07:05 schrieb Zygo Blaxell:
> > On Sat, Dec 27, 2014 at 08:23:59PM +0100, Martin Steigerwald wrote:
> > > My simple test case didn't trigger it, and I do not have another
> > > twice 160 GiB available on these SSDs to try with a copy of my home
> > > filesystem. Then I could safely test without bringing the desktop
> > > session to a halt. Maybe someone has an idea on how to "enhance" my
> > > test case in order to reliably trigger the issue.
> > >
> > > It may be challenging, though. My /home is quite a filesystem. It
> > > has a maildir with at least one million files (yeah, I am
> > > performance testing KMail and Akonadi to the limit as well!), and
> > > it has git repos, this one VM image, the desktop search, and the
> > > Akonadi database. In other words: it has been hit nicely with
> > > various, mostly random I think, workloads over the last six months
> > > or so. I bet it's not that easy to simulate. Maybe some runs of
> > > compilebench to age the filesystem before the fio test?
> > >
> > > That said, BTRFS performs a lot better. The complete lockups
> > > without any CPU usage of 3.15 and 3.16 are gone for sure. That's
> > > wonderful. But there is this kworker issue now. I noticed it that
> > > gravely just while trying to complete this tax returns stuff with
> > > the Windows XP VM. Otherwise it may have happened (I have seen some
> > > backtraces in kern.log), but it didn't last for minutes. So this is
> > > indeed less severe than the full lockups with 3.15 and 3.16.
> > >
> > > Zygo, what are the characteristics of your filesystem? Do you use
> > > compress=lzo and skinny metadata as well? How are the chunks
> > > allocated? What kind of data do you have on it?
> >
> > compress-force (default zlib), no skinny-metadata. Chunks are
> > d=single, m=dup.
> > Data is a mix of various desktop applications, most active file
> > sizes from a few hundred K to a few MB, maybe 300k-400k files.
> > No database or VM workloads. Filesystem is 100GB and is usually
> > between 98 and 99% full (about 1-2GB free).
> >
> > I have another filesystem which has similar problems when it's
> > 99.99% full (it's 13TB, so 0.01% is 1.3GB). That filesystem is
> > RAID1 with skinny-metadata and no-holes.
> >
> > On various filesystems I have the above CPU-burning problem, a bunch
> > of irreproducible random crashes, and a hang with a kernel stack
> > that goes through SyS_unlinkat and btrfs_evict_inode.
>
> Zygo, thanks. That desktop filesystem sounds a bit similar to my use
> case, with the interesting difference that you have no databases or
> VMs on it.
>
> That said, I use the Windows XP VM rarely, but using it was what made
> the issue so visible for me. Is your desktop filesystem on SSD?
No, but I recently stumbled across the same symptoms on an 8GB SD card
on kernel 3.12.24 (Raspberry Pi). When the filesystem went over ~97%
full, all accesses were blocked for several minutes. I was able to work
around it by adjusting the threshold on a garbage collector daemon
(i.e. deleting a lot of expendable files) to keep usage below 90%.
I didn't try to balance the filesystem, and didn't seem to need to.

ext3 has a related problem when it's nearly full: it will search
gigabytes of block allocation bitmaps for a free block, which can
result in a single 'mkdir' call spending 45 minutes reading a large,
slow, 99.5%-full filesystem.

I'd expect a nearly full btrfs filesystem to have a small tree of
cached free-space extents and to be able to search it quickly even if
the result is negative (i.e. there is no free space). It seems to be
doing something else... :-P

> Do you have the chance to extend one of the affected filesystems to
> check my theory that this does not happen as long as BTRFS can still
> allocate new data chunks? If it's right, your FS should be fluent
> again as long as you see more than 1 GiB free
>
> Label: none  uuid: 53bdf47c-4298-45bc-a30f-8a310c274069
>         Total devices 2 FS bytes used 512.00KiB
>         devid 1 size 10.00GiB used 6.53GiB path /dev/mapper/sata-btrfsraid1
>         devid 2 size 10.00GiB used 6.53GiB path /dev/mapper/msata-btrfsraid1
>
> between "size" and "used" in btrfs fi sh. I suggest going with at
> least 2-3 GiB, as BTRFS may allocate just one chunk so quickly that
> you do not have the chance to recognize the difference.

So far I've found that problems start when free space drops below 1GB
(although it can go as low as 400MB) and stop when free space gets
above 1GB, even without resizing or balancing the filesystem. I've
adjusted my free space monitoring thresholds accordingly for now, and
that seems to be keeping things working so far.
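For what it's worth, the garbage-collector workaround I described is
roughly the following shape (a minimal sketch, not my actual daemon;
the spool directory and the high/low water marks are placeholders you
would pick for your own system):

```shell
#!/bin/sh
# Sketch of a high/low-water-mark garbage collector: when filesystem
# usage crosses a high-water mark, delete the oldest files in an
# expendable spool directory until usage falls below a low-water mark.
# The mount point, spool directory, and thresholds are all assumptions.
gc_spool() {
    mnt=$1 spool=$2 high=$3 low=$4
    # Current usage of $mnt as a bare percentage (df -P column 5).
    usage() { df -P "$mnt" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'; }
    [ "$(usage)" -lt "$high" ] && return 0
    # List expendable files oldest-first (GNU find, mtime ascending),
    # deleting until usage drops below the low-water mark.
    find "$spool" -type f -printf '%T@ %p\n' | sort -n | cut -d' ' -f2- |
    while read -r f; do
        [ "$(usage)" -lt "$low" ] && break
        rm -f -- "$f"
    done
}

# Example: if / is over 97% full, free space from a hypothetical
# /var/cache/expendable until usage is back under 90%:
#   gc_spool / /var/cache/expendable 97 90
```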
> Well, and if that works for you, we are back to my recommendation:
>
> More so than with other filesystems, give BTRFS plenty of free space
> to operate with. At best, keep a minimum of 2-3 GiB of unused device
> space left for chunk reservation at all times. One could even do some
> Nagios/Icinga monitoring plugin for that :)
>
> --
> Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
> GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
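Such a plugin could be quite small. A sketch of the core check (not an
existing Nagios/Icinga plugin; it just parses the per-device lines of
"btrfs filesystem show" in the format you quoted above, and the
function name and threshold are my own invention):

```shell
#!/bin/sh
# Given "btrfs filesystem show" output on stdin and a threshold in GiB,
# print one line per device with OK/WARN and its unallocated space
# (size minus used, i.e. space still available for chunk allocation).
# Assumes per-device lines like:
#   devid 1 size 10.00GiB used 6.53GiB path /dev/mapper/sata-btrfsraid1
check_unallocated() {
    awk -v thr="$1" '
        $1 == "devid" {
            for (i = 1; i <= NF; i++) {
                if ($i == "size") size = $(i + 1)
                if ($i == "used") used = $(i + 1)
                if ($i == "path") path = $(i + 1)
            }
            sub(/GiB/, "", size); sub(/GiB/, "", used)
            free = size - used
            printf "%s %s %.2fGiB unallocated\n", (free < thr) ? "WARN" : "OK", path, free
        }'
}

# Example, using Martin's suggested 2 GiB threshold:
#   btrfs filesystem show | check_unallocated 2
```

Per-device (rather than per-filesystem) reporting matters here because
on multi-device RAID1 a single device running out of unallocated space
is enough to block new chunk allocation.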