On Sat, Dec 27, 2014 at 06:54:33AM -0800, Robert White wrote:
> On 12/27/2014 05:55 AM, Martin Steigerwald wrote:
[snip]
> >while fio was just *laying* out the 4 GiB file. Yes, that's 100% system CPU
> >for 10 seconds while allocating a 4 GiB file on a filesystem like:
> >
> >martin@merkaba:~> LANG=C df -hT /home
> >Filesystem             Type   Size  Used Avail Use% Mounted on
> >/dev/mapper/msata-home btrfs  170G  156G   17G  91% /home
> >
> >where a 4 GiB file should easily fit, no? (And this output is with the 4
> >GiB file in place, so there was 4 GiB more free before.)
> 
> No. /usr/bin/df is an _approximation_ in BTRFS because of the limits
> of the statfs() system call. The statfs() interface was defined
> in 1990 and "can't understand" the dynamic allocation model used in
> BTRFS, as it assumes a fixed geometry for filesystems. You do _not_
> have 17G actually available. You need to rely on btrfs fi df and
> btrfs fi show to figure out how much space you _really_ have.
> 
> According to this block, you have a RAID1 filesystem spanning ~160GiB (two 160GiB devices):
> 
> > merkaba:~> date; btrfs fi sh /home ; btrfs fi df /home
> > Sa 27. Dez 13:26:39 CET 2014
> > Label: 'home'  uuid: [some UUID]
> >          Total devices 2 FS bytes used 152.83GiB
> >          devid    1 size 160.00GiB used 160.00GiB path /dev/mapper/msata-home
> >          devid    2 size 160.00GiB used 160.00GiB path /dev/mapper/sata-home
> 
> And according to this block you have about 5.39GiB of unused data space:
> 
> > Btrfs v3.17
> > Data, RAID1: total=154.97GiB, used=149.58GiB
> > System, RAID1: total=32.00MiB, used=48.00KiB
> > Metadata, RAID1: total=5.00GiB, used=3.26GiB
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> 154.97
>   5.00
>   0.032
> + 0.512
> -------
> 160.514
> 
> Pretty much as close to 160GiB as you are going to get (those
> numbers being rounded in places for "human readability"). BTRFS
> has allocated 100% of the raw storage into typed chunks.
> 
> A large datafile can only fit in the 154.97 - 149.58 = 5.39GiB of unused data space.

   I appreciate that this is something of a minor point in the grand
scheme of things, but I'm afraid I've lost the enthusiasm to engage
with the broader (somewhat rambling, possibly-at-cross-purposes)
conversation in this thread. However...

> Trying to allocate that 4GiB file into that 5.39GiB of space becomes
> an NP-complete (i.e. "very hard") problem if it is very fragmented.

   This is... badly mistaken, at best. The problem of where to write a
file into a set of free extents is definitely *not* an NP-hard
problem. It's a P problem, with an O(n log n) solution, where n is the
number of free extents in the free space cache. The simple approach:
fill the first hole with as many bytes as you can, then move on to the
next hole. More complex: order the free extents by size first. Both of
these are O(n log n) algorithms, given an efficient general-purpose
index of free space.

   The problem of placing file data isn't a bin-packing problem; it's
not like allocating RAM (where each allocation must be contiguous).
The items being placed may be split as much as you like, although
minimising the amount of splitting is a goal.
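
   To make that concrete, here's a minimal sketch in plain C (not
btrfs code; the names and the in-memory structures are invented for
the example) of the "sort the free extents by size, then fill" idea
above. The file is split freely across the holes, so one O(n log n)
sort plus a linear scan is all it takes:

#include <stdio.h>
#include <stdlib.h>

struct free_extent {
	unsigned long long start;	/* byte offset of the hole */
	unsigned long long len;		/* size of the hole in bytes */
};

/* qsort comparator: biggest holes first. */
static int by_len_desc(const void *a, const void *b)
{
	const struct free_extent *x = a, *y = b;

	if (x->len == y->len)
		return 0;
	return x->len > y->len ? -1 : 1;
}

/*
 * Place 'want' bytes into the holes, splitting across them as needed.
 * Returns the number of fragments used, or -1 if the total free space
 * is too small.  One sort, one linear scan: O(n log n) overall.
 */
static int place_file(struct free_extent *holes, size_t n,
		      unsigned long long want)
{
	int frags = 0;
	size_t i;

	qsort(holes, n, sizeof(*holes), by_len_desc);
	for (i = 0; i < n && want > 0; i++) {
		unsigned long long take =
			holes[i].len < want ? holes[i].len : want;

		if (take == 0)
			break;	/* remaining holes are empty too */
		printf("fragment %d: %llu bytes at offset %llu\n",
		       ++frags, take, holes[i].start);
		want -= take;
	}
	return want == 0 ? frags : -1;
}

int main(void)
{
	struct free_extent holes[] = {
		{ 0,           1ULL << 20 },	/* 1 MiB hole */
		{ 4ULL << 20,  3ULL << 20 },	/* 3 MiB hole */
		{ 16ULL << 20, 2ULL << 20 },	/* 2 MiB hole */
	};
	/* A 5 MiB file lands in two fragments (3 MiB + 2 MiB). */
	int frags = place_file(holes, 3, 5ULL << 20);

	if (frags < 0)
		printf("file does not fit\n");
	else
		printf("placed in %d fragments\n", frags);
	return 0;
}

   (This ignores updating the free-space index afterwards, of course,
but it shows why the placement decision itself is cheap.)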

   I suspect that the performance problems that Martin is seeing may
indeed be related to free space fragmentation, in that finding and
creating all of those tiny extents for a huge file is causing
problems. I believe that btrfs isn't alone in this, but it may well be
showing the problem to a far greater degree than other FSes. I don't
have figures to compare, I'm afraid.

> I also don't know what kind of tool you are using, but it might be
> repeatedly trying and failing to fallocate the file as a single
> extent or something equally dumb.

   Userspace doesn't, as far as I know, get to make that decision. I've
just read the fallocate(2) man page, and it says nothing at all about
the contiguity of the extent(s) of storage allocated by the call.
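
   For illustration, this is roughly all a tool can do (a sketch, not
fio's actual code; "testfile" and the sizes are made up for the
example): fallocate(2) takes a mode, an offset and a length, and
nothing more, so there's no way for userspace to ask for a contiguous
extent. Placement is entirely the filesystem's decision.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("testfile", O_RDWR | O_CREAT, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Reserve 4 GiB up front; how many extents that becomes is
	 * entirely up to the filesystem. */
	if (fallocate(fd, 0, 0, 4ULL << 30) < 0)
		perror("fallocate");

	close(fd);
	return 0;
}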

   Hugo.

[snip]

-- 
Hugo Mills             | O tempura! O moresushi!
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: 65E74AC0          |
