On 12/11/2014 03:01 PM, Patrik Lundquist wrote:
On 11 December 2014 at 11:18, Robert White <rwh...@pobox.com> wrote:
So far I don't see a "bug".

Fair enough, let's call it a huge problem with btrfs-convert. I think
it warrants a note in the wiki.


On 12/11/2014 12:18 AM, Patrik Lundquist wrote:

Running defrag several more times and balance again doesn't help.

That sounds correct: defrag operates on files; it does not relocate block groups the way balance does.

From https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3

"A notable caveat is that a balance can fail with "ENOSPC" if the
defragment is skipped. This is usually due to large extents on ext
being larger than the maximum size btrfs normally operates with (1
GB). A defrag of all large files will avoid this:"

I interpreted it as breaking down large extents and reallocating them,
thus avoiding my current situation.
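
(For reference, the pass the wiki has in mind amounts to rewriting every large file so its extents come back down to a size balance can shuffle. The command below is only a sketch of that idea, not necessarily the wiki's exact invocation; the 1G size threshold and target extent size are illustrative:)

  # Rewrite every file larger than 1G so that no single extent ends up
  # bigger than roughly 1G (-t sets the target extent size).
  $ find /mnt -xdev -type f -size +1G \
        -exec btrfs filesystem defragment -t 1G '{}' \;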


There's a good chance that if you balanced again and again, the number of
no-space errors would decrease. With only one 2-ish GiB empty slot, it gets
slid around like one of those puzzles where you have to sort the numbers
from 1 to 15 by sliding them around a 4x4 grid of 16 cells.

I was never fond of those puzzles.


The first step is admitting that you _don't_ have a problem.

I've got 99 problems and balance is one of them (the others are block
groups). :-)

Of course the filesystem is in a problematic state after the
conversion, even if it's not a bug. ~1.5TB of free space, yet out
of space, and it can't be fixed with a balance. It might not be wrong
per se, but it's very problematic from a user perspective.

Anyway, this thread has turned up lots of good information.


You are _not_ out of space in which to create files. (Or so I presume; you
still haven't posted the output of /bin/df or btrfs filesystem df.)

I'm not; creating new files works.

Exactly!

I know you are not; that's why the df output below doesn't say you are out of space. You have 1526389096 1K blocks available on the drive for file storage.

You just don't have enough _contiguous_ unallocated _raw_ storage to copy a 2-gig-ish extent.

Until you "get" this idea you are not going to understand what is happening.

When you make an EXT4/3/2 file system, it goes out and allocates _all_ _the_ _space_ to the file system's structures. It writes block groups everywhere block groups go, with zeroed-out allocation bitmaps and zeroed-out inodes. (Actually the kernel does most of that work on "first mount" in the modern lazy-init case, which is why a mkfs.ext4 takes _way_ _less_ _time_ than a mkfs.ext2 on the same expanse of disk.)
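
(If you want to see that fixed layout, dumpe2fs from e2fsprogs will print it; the header alone lists the total block count, blocks per group, inodes per group, and so on, all decided at mkfs time. Just a sketch, pointed at whatever ext4 partition you like:)

  # dumpe2fs ships with e2fsprogs; /dev/sdXn stands in for an ext4
  # partition (the converted /dev/sdc1 is btrfs now, so not that one).
  # -h prints only the superblock/header summary of the fixed layout.
  $ dumpe2fs -h /dev/sdXn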

BTRFS, on the other hand, does something _completely_ _different_. It just writes a few things: the superblock(s) [up to three per physical disk, depending on media size], the metadata chunk(s) that hold the initial trees, the tree that tracks the raw storage, and so on. It never _touches_ the bulk of the disk(s).

As you fill your metadata and data extents, the raw storage manager makes new ones on-demand. This requires space from the "raw storage".

In EXT4 terms, it would be as if EXT4 only wrote the _first_ block group, then added the second block group once the first was filled, then a third when the first two were both filled.

There will come a point when BTRFS has allocated _all_ the raw storage. And at that point it will fail to allocate new storage for block groups because there is no more raw storage. At that moment, the BTRFS filesystem is "just like" the freshly built EXT4 in that 100% of the raw storage has "initialized file system bits" written on it.
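
(You can watch both layers of that accounting from userspace. Roughly:)

  # Per-device raw space: "size" is what the device contributes, and
  # "used" is how much of it has already been carved into block groups
  # (allocated to chunks, not necessarily occupied by file data).
  $ btrfs filesystem show /mnt

  # Inside the allocated chunks: "total" is the chunk space handed out
  # so far, "used" is what files and metadata actually occupy.
  $ btrfs filesystem df /mnt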

Recent kernel patches have added a feature to BTRFS that says "if this bit of the file system is completely empty, let's de-initialize it and return it to the raw pool". This was added because one could fill one's file system with "large files" (and so data extents), then delete them all and fill the same space with tiny files (and so need metadata extents). If there were no way to "remove the empty data extent" so that it could be replaced by one or more metadata extents, you could end up in the very same jam you can end up with in EXT4: running out of inodes when you've got plenty of "storage space".
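
(On kernels that don't do that automatically, a filtered balance is the manual way to hand back empty or mostly-empty block groups, and it's the cheap thing to try first because it needs little or no spare raw space. A sketch; the percentages are arbitrary starting points:)

  # Rewrite only data block groups that are completely empty; this just
  # returns them to the raw pool and needs essentially no work space.
  $ btrfs balance start -dusage=0 /mnt

  # Then work upward: data block groups under 5% full, then 20%, etc.
  # Each pass only has to find room for the data it actually moves.
  $ btrfs balance start -dusage=5 /mnt
  $ btrfs balance start -dusage=20 /mnt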

So BTRFS has this give-and-take. But when it comes to "relocating" something, as in "balancing it", you have to be able to make the new space before you can move the files over and remove the old space. That's the core idea behind Copy On Write. In COW there is no "move" operation for storage. You have only "copy" and "delete".

So when the raw storage pool gets full you run out of space to make more BTRFS. When the allocated storage space gets full, you run out of space to make files.

It's unlike other filesystems: every byte is allocated twice. First it has to be allocated to the file system, then the file system has to allocate it to the individual files.

This double layer makes it more adaptable as described above.

So Is There A Precedent?

Yes.

ITEM: A normal program running in a system has an address space (on a 32-bit Linux box this is typically 2G for code and 2G for data). No matter how "big" the program is, it gets the same 4G of "space" in its virtual memory map. That is, the map is _capable_ _of_ containing 4G in two 2G regions. This map is assembled by the kernel at exec() time, and it is assembled _before_ the code of the program is even read off the disk.

This is very like what happens in a mkfs.btrfs. We point it at a region of disk and it builds the _ability_ to make the map by setting up the minimal structures needed to maintain the map that will be built.

ITEM: The kernel then loads the core executable into the start of the 2G space by using mmap(). But because of dynamic linking, this load may bring in lots of other files, each of which will take up some of that 2G of code space. Many will take up some of the 2G of data space as well, since most code requires data.

This is like how BTRFS makes the first extents for data and whatnot.

ITEM: As the program runs it may dynamically load even more code (like, say, loading the Flash player plugin into your web browser when you go to a site that has some stupid advert or movie); but what it's really going to start doing is using the data area. If you want to look at that picture of a kitten, you are going to need to take a chunk out of that 2G of data and "realize" it, that is, access it after doing an anonymous mmap(), or just grow your effective amount of "RAM in use" with sbrk().

This is like how BTRFS allocates metadata (analogous to code space) and data space extents.

ITEM: If you go to another page with a picture of a puppy instead of a kitten, the space for the kitten picture has been freed up, and if the puppy picture can fit there then no additional mmap() or sbrk() calls need to be made. This sort of thing is normally dealt with at the deepest level by the memory allocation library. I don't have to know when sbrk() or mmap() is being called for this data memory; I just call malloc() when I need more and free() when I am done with something I got from malloc(). Under the hood there are linked lists, trees, and bitmaps for allocations too small to be worth calling sbrk() for each one, etc.

This is normal runtime behavior for BTRFS. It serves up already-allocated bits when it can, and it goes out and allocates and initializes more when the need arises.

This all happens under the hood, and normally the only way you'd ever see an out of space error ("ENOSPC") is if you asked for a bit and it didn't have the right kind of space, so it went to allocate more space on the disk and that failed as well.

In the same sense, a program could fill up its 2G for data and still have nearly 2G available for code, but it's the wrong kind of space, so a malloc() would fail with "out of memory".

==== SO WTF HERE THEN, AM I RIGHT? ====

EXT4 had already allocated all the raw space and decided what kind of storage each segment was for.

You ran btrfs-convert, and it went in and adopted all those spaces for all those purposes. Then it did what it could to free up as much of that allocated space as possible.

Since your EXT4 file system had been full to bursting at least once in the past, there was very little the convert program could do to free up those data extents, because on average the now half-empty file system consisted of "all the data extents, each half empty at best".

If your original EXT4 image had been less something-is-everywhere, the convert process, and the balance thereafter, would have had more raw space to do the "sliding block puzzle" that is a balance operation.

By adding another storage volume (disk/partition/etc) you would be giving it that room.

Where the EXT4/3/2 raw storage map looks like a fully built brick wall made with fixed sized bricks, a BTRFS raw storage layout looks like a sliding block puzzle.

To really understand what balance is trying to do, play this game for a while

  http://www.agame.com/game/sliding-block-puzzle

Only imagine that each block had to be copied to the new location before it could be removed from the old one. And keep in mind that the little blocks are the metadata, the medium blocks are partly full data blocks, and the big blocks are the giant 2-gig-ish extents crammed full of files that you were asking balance to juggle around.



$ df
Filesystem      1K-blocks       Used  Available Use% Mounted on
/dev/sdc1      2930265088 1402223656 1526389096  48% /mnt

See, there's your space.

It's not "lost" its just not on hand in 2Gig contiguous chunks that balance can copy around. Chances are there a good number of 1 gig chunks for future data, and lots more 256M chunks for future metadata (and the small files that fit wholly therein).

Nothing broken here. No "missing space", just a sliding block puzzle with no current solution, one that would be easily solvable if more raw space were available.


$ btrfs fi df /mnt
Data, single: total=1.41TiB, used=1.30TiB
System, DUP: total=32.00MiB, used=124.00KiB
Metadata, DUP: total=2.50GiB, used=1.49GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


Your next step is to either add storage in accordance with your plan of
adding four more volumes to make a RAID (as expressed elsewhere), or make a
clean filesystem and copy your files over.

I've already decided to start over with a clean filesystem to get rid
of the ext4 legacy. I'm only curious about how to solve the balance
problem, and now I know how.

You solve the balance problem either by adding enough working space for it to proceed, or by removing enough files that it can coalesce and delete a few of those big extents, which frees up that working space on its own.
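
(The usual trick for the first option, if you have any spare device at all, even a USB stick or a loop device of a few GiB, is to lend it to the filesystem just long enough for the balance to breathe, then take it back. A sketch, with /dev/sdd1 standing in for whatever spare you use:)

  # Lend the filesystem some temporary raw space to work with.
  $ btrfs device add /dev/sdd1 /mnt

  # Now balance has somewhere to build new block groups before it
  # deletes the old ones.
  $ btrfs balance start /mnt

  # Removing the device migrates anything still on it back onto the
  # main disk, so the spare can be taken away again afterwards.
  $ btrfs device delete /dev/sdd1 /mnt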

