On Sunday, 28 December 2014, 21:07:05, Zygo Blaxell wrote:
> On Sat, Dec 27, 2014 at 08:23:59PM +0100, Martin Steigerwald wrote:
> > My simple test case didn't trigger it, and I do not have another twice
> > 160 GiB available on these SSDs to try with a copy of my home filesystem.
> > Then I could safely test without bringing the desktop session to a halt.
> > Maybe someone has an idea on how to "enhance" my test case in order to
> > reliably trigger the issue.
> > 
> > It may be challenging, though. My /home is quite a filesystem. It has a
> > maildir with at least one million files (yeah, I am performance testing
> > KMail and Akonadi to the limit as well!), and it has git repos, this one
> > VM image, the desktop search, and the Akonadi database. In other words:
> > it has been hit nicely with various, mostly random (I think) workloads
> > over the last six months or so. I bet it's not that easy to simulate.
> > Maybe some runs of compilebench to age the filesystem before the fio
> > test?
> > 
> > That said, BTRFS performs a lot better. The complete lockups without any
> > CPU usage of 3.15 and 3.16 are gone for sure. That's wonderful. But there
> > is this kworker issue now. I only noticed it this gravely while trying to
> > complete the tax returns stuff with the Windows XP VM. It may have
> > happened otherwise, too; I have seen some backtraces in kern.log, but it
> > didn't last for minutes. So this is indeed of less severity than the
> > full lockups with 3.15 and 3.16.
> > 
> > Zygo, what are the characteristics of your filesystem? Do you use
> > compress=lzo and skinny metadata as well? How are the chunks allocated?
> > What kind of data do you have on it?
> 
> compress-force (default zlib), no skinny-metadata.  Chunks are d=single,
> m=dup.  Data is a mix of various desktop applications, most active
> file sizes from a few hundred K to a few MB, maybe 300k-400k files.
> No database or VM workloads.  Filesystem is 100GB and is usually between
> 98 and 99% full (about 1-2GB free).
> 
> I have another filesystem which has similar problems when it's 99.99%
> full (it's 13TB, so 0.01% is 1.3GB).  That filesystem is RAID1 with
> skinny-metadata and no-holes.
> 
> On various filesystems I have the above CPU-burning problem, a bunch of
> irreproducible random crashes, and a hang with a kernel stack that goes
> through SyS_unlinkat and btrfs_evict_inode.

Zygo, thanks. That desktop filesystem sounds a bit similar to my use case,
with the interesting difference that you have no databases or VMs on it.

That said, I use the Windows XP VM rarely, but using it was what made the
issue so visible to me. Is your desktop filesystem on an SSD?

Do you have the chance to extend one of the affected filesystems to check
my theory that this does not happen as long as BTRFS can still allocate new
data chunks? If it's right, your FS should run smoothly again as long as
you see more than 1 GiB free

Label: none  uuid: 53bdf47c-4298-45bc-a30f-8a310c274069
        Total devices 2 FS bytes used 512.00KiB
        devid    1 size 10.00GiB used 6.53GiB path /dev/mapper/sata-btrfsraid1
        devid    2 size 10.00GiB used 6.53GiB path /dev/mapper/msata-btrfsraid1

between "size" and "used" in btrfs fi show. I suggest going with at least
2-3 GiB, as BTRFS may allocate a single chunk so quickly that you would not
have the chance to notice the difference.

Well, and if that works for you, we are back to my recommendation:

More so than with other filesystems, give BTRFS plenty of free space to
operate with. Ideally enough that you always have a minimum of 2-3 GiB of
unused device space left for chunk reservation. One could even write a
Nagios/Icinga monitoring plugin for that :)
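Such a check could be sketched roughly like this. This is only an
illustration, not a tested plugin: it assumes "btrfs fi show" reports
every size/used value in GiB, and the mount point and warning threshold
in the example invocation are made up.

```shell
#!/bin/sh
# Sketch of a Nagios/Icinga-style check for unallocated BTRFS device
# space. Illustrative only: assumes all "size"/"used" values in the
# "btrfs fi show" output are reported in GiB; a real plugin would
# normalize units.

# Sum (size - used) over all "devid" lines read on stdin, print GiB.
sum_unallocated() {
    awk '$1 == "devid" {
        size = $4; used = $6
        gsub(/GiB/, "", size); gsub(/GiB/, "", used)
        free += size - used
    }
    END { printf "%.2f\n", free }'
}

# Warn when total unallocated space drops below the given threshold.
check_btrfs_unallocated() {
    mount="$1"; warn_gib="$2"
    unalloc=$(btrfs fi show "$mount" | sum_unallocated)
    if awk -v f="$unalloc" -v w="$warn_gib" 'BEGIN { exit !(f + 0 < w + 0) }'
    then
        echo "WARNING: only $unalloc GiB unallocated on $mount"
        return 1
    fi
    echo "OK: $unalloc GiB unallocated on $mount"
}

# Example invocation (requires btrfs-progs and a mounted filesystem):
# check_btrfs_unallocated /home 3
```

Fed the example output above (two 10 GiB devices with 6.53 GiB used
each), this would report 6.94 GiB unallocated.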

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
