On 12/11/2014 03:01 PM, Patrik Lundquist wrote:
On 11 December 2014 at 11:18, Robert White <rwh...@pobox.com> wrote:
So far I don't see a "bug".
Fair enough, let's call it a huge problem with btrfs convert. I think
it warrants a note in the wiki.
On 12/11/2014 12:18 AM, Patrik Lundquist wrote:
Running defrag several more times and balance again doesn't help.
That sounds correct as defrag defrags files, it does not reallocate extents.
From https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3
"A notable caveat is that a balance can fail with "ENOSPC" if the
defragment is skipped. This is usually due to large extents on ext
being larger than the maximum size btrfs normally operates with (1
GB). A defrag of all large files will avoid this:"
I interpreted it as breaking down large extents and reallocating them,
thus avoiding my current situation.
There's a good chance that if you balanced again and again the number of no
space errors might decrease. With only one 2-ish gig empty slot sliding
around like one of those puzzles where you have to sort the numbers from 1
to 15 by sliding them around in the 4x4=16 element grid.
I was never fond of those puzzles.
The first step is admitting that you _don't_ have a problem.
I've got 99 problems and balance is one of them (the others are block
groups). :-)
Of course the filesystem is in a problematic state after the
conversion, even if it's not a bug. ~1.5TB of free space and yet out
of space and it can't be fixed with a balance. It might not be wrong
per se but it's very problematic from a user perspective.
Anyway, this thread has turned up lots of good information.
You are _not_ out of space in which to create files. (or so I presume, you
still haven't posted the output of /bin/df or btrfs filesystem df).
I'm not; creating new files works.
Exactly!
I know you are not, that's why the DF below doesn't say you are out of
space. You have 1526389096 one-K blocks available on the drive for file
storage.
You just don't have enough _contiguous_ unallocated _raw_ storage to
copy a 2-gig-ish extent.
Until you "get" this idea you are not going to understand what is happening.
When you make an EXT4/3/2 file system it goes out and allocates _all_
_the_ _space_ to the file system's structures. It goes out and writes
block groups where all the block groups go, with zero-ed out allocation
bitmaps and zeroed-out inodes. (Actually the kernel does most of that
work on "first mount" in the modern cases, which is why a mkfs.ext4
takes _way_ _less_ _time_ than an mkfs.ext2 on the same expanse of disk).
BTRFS, on the other hand, does something _completely_ _different_. It
just writes a few things. One is the superblock(s) [up to three per
physical disk, depending on media size]. One is the metadata chunk(s)
that contain those blocks. One is the raw space storage tree. And so on.
It never _touches_ the bulk of the disk(s).
As you fill your metadata and data extents, the raw storage manager
makes new ones on-demand. This requires space from the "raw storage".
In EXT4 speak it would be like if EXT4 only wrote the _first_ block
group and then only added the second block group once the first was
filled. Then a third when the second and first were both filled.
There will come a point when BTRFS has allocated _all_ the raw storage.
And at that point it will fail to allocate new storage for block groups
because there is no more raw storage. At that moment, the BTRFS
filesystem is "just like" the freshly built EXT4 in that 100% of the raw
storage has "initialized file system bits" written on it.
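If it helps, here is a toy model of that on-demand behavior in Python. It is only a sketch: the class name ToyBtrfs, the 4-unit chunk size and the method names are made up for illustration and have nothing to do with the real on-disk structures or kernel code.

```python
class ToyBtrfs:
    """Toy model: raw space is handed to the filesystem one chunk at a time."""

    def __init__(self, raw_units, chunk_units=4):
        self.raw_free = raw_units        # units the filesystem has never touched
        self.chunk_units = chunk_units   # size of one chunk (hypothetical)
        self.chunks = []                 # free units left inside each allocated chunk

    def write(self, units):
        """Store `units` of data, allocating a new chunk only on demand."""
        # First try the space that already belongs to the filesystem.
        for i, free in enumerate(self.chunks):
            if free >= units:
                self.chunks[i] -= units
                return
        # Otherwise a new chunk has to come out of the raw pool.
        if units > self.chunk_units or self.raw_free < self.chunk_units:
            raise OSError("ENOSPC")
        self.raw_free -= self.chunk_units
        self.chunks.append(self.chunk_units - units)


fs = ToyBtrfs(raw_units=12)
print("raw free:", fs.raw_free, "chunks:", len(fs.chunks))
for _ in range(5):
    fs.write(2)
print("raw free:", fs.raw_free, "chunks:", len(fs.chunks))
try:
    fs.write(3)   # no chunk has 3 units free and the raw pool is empty
except OSError as err:
    print("write failed:", err)
```

Before the first write nothing is allocated at all. After the loop the raw pool is empty and every unit belongs to some chunk, which is the "just like a freshly built EXT4" state described above.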
Recent kernel patches have added a feature to BTRFS that says "if this
file system bit is completely empty, let's de-initialize it and return it
to the raw pool". This was added because one could fill one's file system
with "large files" (and so data extents) then delete all those and fill
the same space with tiny files (and so need metadata extents). If there
was no way to "remove the empty data extent" so that it could be
replaced by one or more metadata extents, you could end up in the very
same jam you can end up with in EXT4, which is running out of inodes
when you've got plenty of "storage space".
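The difference that returning empty chunks makes can be sketched in a few lines of Python. This is a toy, not the kernel logic: Pool, CHUNK, the "data"/"metadata" labels and the reclaim_empty switch are all names invented for the illustration.

```python
CHUNK = 4  # hypothetical chunk size, in arbitrary units


class Pool:
    def __init__(self, raw_units):
        self.raw_free = raw_units
        self.chunks = []   # each chunk: {"kind": "data" or "metadata", "used": int}

    def alloc(self, kind, units):
        # Reuse a chunk of the right kind if one has room.
        for c in self.chunks:
            if c["kind"] == kind and CHUNK - c["used"] >= units:
                c["used"] += units
                return c
        # Otherwise carve a new chunk of that kind out of the raw pool.
        if self.raw_free < CHUNK:
            raise OSError("ENOSPC: no raw space for a new %s chunk" % kind)
        self.raw_free -= CHUNK
        c = {"kind": kind, "used": units}
        self.chunks.append(c)
        return c

    def free(self, chunk, units, reclaim_empty):
        chunk["used"] -= units
        if reclaim_empty and chunk["used"] == 0:
            # Hand the completely empty chunk back to the raw pool.
            self.chunks = [c for c in self.chunks if c is not chunk]
            self.raw_free += CHUNK


def demo(reclaim_empty):
    pool = Pool(raw_units=8)
    big = [pool.alloc("data", CHUNK) for _ in range(2)]   # fill the disk with large files
    for c in big:
        pool.free(c, CHUNK, reclaim_empty)                # delete them all
    try:
        pool.alloc("metadata", 1)                         # now create one tiny file
        return "ok"
    except OSError:
        return "ENOSPC"


print("without reclaim:", demo(False))
print("with reclaim:   ", demo(True))
```

Without the reclaim step the two data chunks stay data chunks forever, so the metadata request fails even though nothing is stored on the disk. With it, the empty chunks go back to the raw pool and can be reused for either kind.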
So BTRFS has this give-and-take. But when it comes to "relocating"
something, as in "balancing it", you have to be able to make the new
space before you can move the files over and remove the old space.
That's the core idea behind Copy On Write. In COW there is no "move"
operation for storage. You have only "copy" and "delete".
So when the raw storage pool gets full you run out of space to make more
BTRFS. When the allocated storage space gets full, you run out of space
to make files.
It's unlike other filesystems. Every byte is allocated twice. First it
has to be allocated to the file system, then the file system has to
allocate it to the individual files.
This double layer makes it more adaptable as described above.
So Is There A Precedent?
Yes.
ITEM: A normal program running in a system has an address space (on a
32-bit linux box this is typically 2G for code, and 2G for data). No
matter how "big" the program is, it gets the same 4G of "space" in its
virtual memory map. That is, the map is _capable_ _of_ containing 4G in
two 2G regions. This map is assembled by the kernel when it calls exec,
and it is assembled _before_ the code of the program is even read off
the disk.
This is very like what happens in a mkfs.btrfs. We point it at a region
of disk and it builds the _ability_ to make the map by setting up the
minimal necessary structures needed to maintain the map that will be built.
ITEM: The kernel then loads the core executable into the start of the 2G
space by using mmap(). But because of dynamic linking, this load may
bring in lots of other files, each will take up some of that 2G of code
space. Many will take up some of the 2G of data space since most code
requires data.
This is like how BTRFS makes the first extents for data and whatnot.
ITEM: As the program runs it may dynamically load even more code (like
say loading the flash player extension into your web browser when you go
to a site that has some stupid advert or movie); but what it's really
going to start doing is using the data area. You want to look at that
picture of that kitten, you are going to need to take a chunk out of
that 2G of data and "realize it", that is access it after doing an
anonymous mmap() or just make your effective amount of "ram in use"
bigger with sbrk().
This is like how BTRFS allocates metadata (analogous to code space) and
data space extents.
ITEM: If you go to another page with a picture of a puppy instead of a
kitten, the space for the kitten picture has been freed up and if the
puppy picture can fit there then no additional mmap() or sbrk() calls
need to be made. This sort of thing is normally dealt-with/controlled at
the deepest level by the memory allocation library. I don't have to know
when sbrk() or mmap() is being called for this data memory, I just
call malloc() when I need more and free() when I am done with something
I got from malloc(). Under the hood there are linked list structures,
and trees, and bitmaps for allocations too small to be worth
calling sbrk() for each one, etc.
This is normal runtime behavior for BTRFS. It serves up already
controlled bits when it can, and it goes out and allocates and
initializes more when the need arises.
This all happens under the hood, and normally the only way you'd ever
see an out of space error ("ENOSPC") is if you asked for a bit and it
didn't have the right kind of space, so it went to allocate more space
on the disk and that failed as well.
In the same sense a program could fill up its 2G for data, and still
have nearly 2G available for code, but it's the wrong kind of space, so a
malloc would return "out of memory".
==== SO WTF HERE THEN, AM I RIGHT? ====
EXT4 had already allocated all the raw space and decided what kind of
storage each segment was for.
You ran btrfs-convert, and it went in and adopted all those spaces for
all those purposes. Then it did "what it could" to free up as much of
that allocated space as was possible.
Since your EXT4 file system had been full-to-busting at least once in
the past, there was very little the convert program could do to free up
those data extents because on the average the now half-empty file system
consisted of "all the data extents, each half empty at best".
If your original EXT4 image had been less something-is-everywhere, the
convert process, and the balance process thereafter, would have had more
raw space to do the "sliding block puzzle" that is a balance operation.
By adding another storage volume (disk/partition/etc) you would be
giving it that room.
Where the EXT4/3/2 raw storage map looks like a fully built brick wall
made with fixed sized bricks, a BTRFS raw storage layout looks like a
sliding block puzzle.
To really understand what balance is trying to do, play this game for a
while
http://www.agame.com/game/sliding-block-puzzle
Only imagine that each block had to be copied to the new location before
it could be removed from the old one. And keep in mind that the little
blocks are the metadata, the medium blocks are partly full data blocks,
and the big blocks are the giant 2-gig-ish extents crammed full of files
that you were asking balance to juggle around.
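The copy-before-delete rule is easy to play with in a few lines of Python. Treat this as a toy under loud assumptions: the device is a string of slots, '.' means unallocated raw space, letters are allocated blocks, and the helper names largest_free_run and relocate are invented here. Real balance works on block groups, not characters.

```python
def largest_free_run(layout):
    """Length of the longest run of '.' (unallocated) slots."""
    best = run = 0
    for slot in layout:
        run = run + 1 if slot == "." else 0
        best = max(best, run)
    return best


def relocate(layout, start, length):
    """Copy the block at layout[start:start+length] into free space,
    then delete the original. There is no in-place 'move'."""
    block = layout[start:start + length]
    hole = "." * length
    dest = layout.find(hole)            # needs ONE contiguous free run
    if dest == -1:
        raise OSError("ENOSPC")
    layout = layout[:dest] + block + layout[dest + length:]   # copy
    layout = layout[:start] + hole + layout[start + length:]  # delete
    return layout


disk = "AA.BBBB.CC.D.."
print("free slots:", disk.count("."), "largest run:", largest_free_run(disk))

try:
    relocate(disk, 3, 4)                # the big block "BBBB"
except OSError as err:
    print("cannot move BBBB:", err)

bigger = disk + "...."                  # pretend a second device was added
print(relocate(bigger, 3, 4))
```

The first layout has five free slots in total but no run longer than two, so the four-slot block cannot be copied anywhere. Appending more raw space gives it somewhere to land, and the slots it vacates then merge with their neighbours into a larger hole.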
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdc1 2930265088 1402223656 1526389096 48% /mnt
See, there's your space.
It's not "lost", it's just not on hand in 2Gig contiguous chunks that
balance can copy around. Chances are there are a good number of 1 gig chunks
for future data, and lots more 256M chunks for future metadata (and the
small files that fit wholly therein).
Nothing broken here. No "missing space", just a sliding block puzzle with
no current solution that would be easily solvable if more raw space was
available.
$ btrfs fi df /mnt
Data, single: total=1.41TiB, used=1.30TiB
System, DUP: total=32.00MiB, used=124.00KiB
Metadata, DUP: total=2.50GiB, used=1.49GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
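For anyone who wants to cross-check the two outputs, the arithmetic is short enough to do in Python. The figures are copied from the df and btrfs fi df listings above; the one assumption made here is that the DUP totals are logical sizes, so each occupies twice that much raw space. The variable names are just for this sketch.

```python
KIB_PER_TIB = 1024 ** 3

# From `df` (1K blocks).
total_kib = 2930265088
avail_kib = 1526389096

# From `btrfs fi df` (space allocated to chunks vs. space used inside them).
data_total_tib, data_used_tib = 1.41, 1.30
meta_total_gib = 2.50    # DUP: assumed to occupy twice this in raw space
sys_total_mib = 32.00    # DUP as well

device_tib = total_kib / KIB_PER_TIB
avail_tib = avail_kib / KIB_PER_TIB

allocated_raw_tib = (data_total_tib
                     + 2 * meta_total_gib / 1024
                     + 2 * sys_total_mib / 1024 ** 2)
unallocated_tib = device_tib - allocated_raw_tib
slack_in_data_tib = data_total_tib - data_used_tib

print(f"device size          : {device_tib:.2f} TiB")
print(f"df 'Available'       : {avail_tib:.2f} TiB")
print(f"never allocated (raw): {unallocated_tib:.2f} TiB")
print(f"free inside data     : {slack_in_data_tib:.2f} TiB")
```

The unallocated raw space plus the slack inside the data chunks lands within rounding distance of what df reports as available. What none of these totals can tell you is how that raw space is laid out, and a total says nothing about whether any single piece of it is big enough to receive a 2-gig-ish extent.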
Your next step is to either add storage in accordance with your plan of
adding four more volumes to make a RAID (as expressed elsewhere), or make a
clean filesystem and copy your files over.
I've already decided to start over with a clean filesystem to get rid
of the ext4 legacy. I'm only curious about how to solve the balance
problem, and now I know how.
You solve the balance problem by adding enough working space for it to
proceed, or by removing enough files that it can coalesce enough to
delete a few of those big extents, thus providing enough working space
to proceed.