On 12/27/2014 06:21 AM, Martin Steigerwald wrote:
On Saturday, 27 December 2014, at 15:14:05, Martin Steigerwald wrote:
On Saturday, 27 December 2014, at 06:00:48, Robert White wrote:
On 12/27/2014 05:16 AM, Martin Steigerwald wrote:
It can easily be reproduced without even using VirtualBox, just by a nice, simple fio job.
TL;DR: If you want a worst-case example of consuming a BTRFS filesystem
with one single file...
#!/bin/bash
# Not tested, so correct any syntax errors. conv=notrunc keeps dd from
# truncating (and thus freeing) the file between passes; conv=fsync
# forces each pass to disk, matching the actual run shown further down.
typeset -i counter
for ((counter=250; counter>0; counter--)); do
    dd if=/dev/urandom of=/some/file conv=notrunc,fsync bs=4k count=$counter
done
exit 0
Each pass over /some/file is 4 KiB shorter than the previous one, but none
of the extents can be deallocated. The file will be 1000 KiB (just under
1 MiB) in size, and usage will be something like 125,500 KiB, about
122.6 MiB (if I've done the math correctly).
Larger values of counter will result in quadratically larger amounts of
waste.
Robert, I experienced these hang issues even before the defragmenting case.
It happened while I merely installed a 400 MiB tax-return application into
the VM (that is no joke, it really is that big).
It happens while just using the VM.
Yes, I recommend not using BTRFS for any VM image or any larger database on
rotating storage, for exactly those COW semantics.
But on SSD?
It's busy-looping a CPU core while the flash is basically idling.
I refuse to believe that this is by design.
I do think there is a *bug*.
Either acknowledge it and try to fix it, or declare it by design *without even
looking at it closely enough to be sure that it is not a bug* and limit your
own possibilities by it.
I'd rather see it treated as a bug for now.
Come on: 254 IOPS on a filesystem with still 17 GiB of free space, while
randomly writing to a 4 GiB file.
People do these kinds of things. Ditch that defrag-a-Windows-XP-VM case; I had
performance issues even before that, just from installing things into it.
Databases, VMs, emulators. And heck, even while just *creating* the file with
fio, as I showed.
Add to these use cases things like this:
martin@merkaba:~/.local/share/akonadi/db_data/akonadi> ls -lSh | head -5
total 2.2G
-rw-rw---- 1 martin martin 1.7G Dec 27 15:17 parttable.ibd
-rw-rw---- 1 martin martin 488M Dec 27 15:17 pimitemtable.ibd
-rw-rw---- 1 martin martin 23M Dec 27 15:17 pimitemflagrelation.ibd
-rw-rw---- 1 martin martin 240K Dec 27 15:17 collectiontable.ibd
Or this:
martin@merkaba:~/.local/share/baloo> du -sch * | sort -rh
9.2G total
8.0G email
1.2G file
51M emailContacts
408K contacts
76K notes
16K calendars
martin@merkaba:~/.local/share/baloo> ls -lSh email | head -5
total 8.0G
-rw-r--r-- 1 martin martin 4.0G Dec 27 15:16 postlist.DB
-rw-r--r-- 1 martin martin 3.9G Dec 27 15:16 termlist.DB
-rw-r--r-- 1 martin martin 143M Dec 27 15:16 record.DB
-rw-r--r-- 1 martin martin 63K Dec 27 15:16 postlist.baseA
/usr/bin/du and /usr/bin/df and /bin/ls are all _useless_ for showing the
amount of file space used by a file in BTRFS.
Look at a nice paste of the previously described "worst case" allocation.
Gust rwhite # btrfs fi df /
Data, single: total=344.00GiB, used=340.41GiB
System, DUP: total=32.00MiB, used=80.00KiB
Metadata, DUP: total=8.00GiB, used=4.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Gust rwhite # for ((counter=250;counter>0;counter--)); do dd if=/dev/urandom of=some_file conv=notrunc,fsync bs=4k count=$counter >/dev/null 2>&1; done
Gust rwhite # btrfs fi df /
Data, single: total=344.00GiB, used=340.48GiB
System, DUP: total=32.00MiB, used=80.00KiB
Metadata, DUP: total=8.00GiB, used=4.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Gust rwhite # du some_file
1000 some_file
Gust rwhite # ls -lh some_file
-rw-rw-r--+ 1 root root 1000K Dec 27 07:00 some_file
Gust rwhite # rm some_file
Gust rwhite # btrfs fi df /
Data, single: total=344.00GiB, used=340.41GiB
System, DUP: total=32.00MiB, used=80.00KiB
Metadata, DUP: total=8.00GiB, used=4.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Notice that "some_file" shows 1000 blocks in du and 1000K bytes in ls.
But notice that data used jumps from 340.41GiB to 340.48GiB when the
file is created, then drops back down to 340.41GiB when it's deleted.
Now, I have compression turned on, so the amount of growth/shrinkage
changes between runs, but it's _way_ more than 1 MiB; that's about
70 MiB (give or take significant rounding, since df only shows two
decimal places). So I wrote this file in a way that leads to it taking up
_seventy_ _times_ its base size in actually allocated storage. Real files
do not perform this terribly, but they can get pretty ugly in some cases.
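As a quick check on that seventy-times figure (my arithmetic; the exact delta is uncertain because df rounds to two decimals):

```shell
# used went from 340.41 GiB to 340.48 GiB for a file of 1000 KiB logical size:
delta_kib=$(( 7 * 1024 * 1024 / 100 ))   # 0.07 GiB expressed in KiB
echo "delta: ${delta_kib} KiB, ratio ~$(( delta_kib / 1000 ))x the file's 1000 KiB"
```

That comes out around 73x, i.e. roughly the seventy-times figure, give or take the rounding.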
You _really_ need to learn how the system works and what its best and
worst cases look like before you start shouting "bug!"
You are using the wrong numbers (e.g. "df") for available space and you
don't know how to estimate what your tools _should_ do for the
conditions observed.
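For what it's worth, here is a sketch of tools that come closer to the truth than du/ls (filefrag needs FIEMAP support, and btrfs fi df usually needs root; the sparse-file demo below just shows that ls and du already disagree about "size" on any filesystem):

```shell
# ls reports logical size and du reports allocated blocks, but neither
# sees what btrfs has really pinned. Two better probes:
#   filefrag -v some_file     # FIEMAP: the extents actually referenced
#   btrfs fi df /             # per-profile totals, as in the paste above
# A sparse file shows the du/ls mismatch in the opposite direction:
f=$(mktemp)
truncate -s 1M "$f"             # 1 MiB logical size, nothing written
ls -l "$f" | awk '{print $5}'   # logical size: 1048576
du -k "$f" | awk '{print $1}'   # allocated: 0, or near it
rm -f "$f"
```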
But yes, if you open a file and scribble all over it when your disk is
full to within the same order of magnitude as the size of the file you
are scribbling on, you will get into a condition where the _application_
will aggressively retry the IO. Particularly if that application is a
"test program" or a virtual machine doing asynchronous IO.
That's what those sorts of systems do when they crash against a limit in
the underlying system.
So yeah... out of space plus aggressive writer equals spinning CPU.
Before you can assign blame, you need to strace your application to see
what call it's making over and over again, and whether it's just being stupid.
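A concrete way to do that (a sketch; the PID is a placeholder, and -c just prints a per-syscall count/time summary):

```shell
# Attach to the spinning process (replace 1234 with the real PID):
#   strace -c -p 1234                          # Ctrl-C for the summary table
#   strace -ttT -e trace=write,fsync -p 1234   # timestamps and call durations
# The same summary mode on a throwaway writer, runnable without a stuck VM:
strace -c -e trace=write \
    dd if=/dev/zero of=/dev/null bs=4k count=100
```

If one syscall dominates the count and keeps returning an error, the application is the retry loop, not the kernel.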
These will not be as bad as the fio test case, but these files are still
written into; they are updated in place.
And that's running on every Plasma desktop by default. And there is similar
stuff on GNOME desktops.
I haven't seen this spike out a kworker yet, though, so maybe the workload is
light enough not to trigger it that easily.