On 12/09/2014 11:19 PM, Patrik Lundquist wrote:
On 10 December 2014 at 00:13, Robert White <rwh...@pobox.com> wrote:
On 12/09/2014 02:29 PM, Patrik Lundquist wrote:

Label: none  uuid: 770fe01d-6a45-42b9-912e-e8f8b413f6a4
      Total devices 1 FS bytes used 1.35TiB
      devid    1 size 2.73TiB used 1.36TiB path /dev/sdc1


Data, single: total=1.35TiB, used=1.35TiB
System, single: total=32.00MiB, used=112.00KiB
Metadata, single: total=3.00GiB, used=1.55GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


Are you trying to convert a filesystem on a single device/partition to RAID 1?

Not yet. I'm stuck at the full balance after the conversion from ext4.
I haven't added the disks for RAID1 and might need them for starting
over instead.

You are not "stuck" here as this step is not mandatory. (see below)


A balance with -musage=100 -dusage=99 works but a full balance fails. It would
be nice to nail down the bug, since the fs passes btrfs check and it looks
like a clear ENOSPC bug.

Conversion from ext2/3/4 is constrained because it needs to be reversible.

If you are out of space this isn't a "bug", you are just out of space. By telling the system to ignore the 100% full chunks it is free to juggle the fragments. But once you get into moving the completely full chunks, the COW machinery _MUST_ have access to _contiguous_ 1GiB regions to make the new chunks into which the Copy will be Written. If your file system was nearly full, it's quite likely that there are no such contiguous regions available.

BUT FIRST UNDERSTAND: you do _not_ need to balance a newly converted filesystem. That is, the recommended balance (and recursive defrag) is _not_ a usability issue, it's an efficiency issue.

Check what you've got. Make sure it is good. Make sure you are cool with it all. When you know everything is usable, remove the undo-information snapshot. That snapshot is pinning a _lot_ of data to exact positions on disk. It's memorializing your previous fragmentation and the original positions of all the EXT4 data structures. Since your system is basically full, that undo information has to go.

At that point your balance will probably have the room it needs.

_Then_ you can balance if you feel the desire.

If you are _still_ out of space you'll need to add some, at least temporarily, to give the system enough room to work.

Since we all _know_ you are a diligent system administrator and architect with a good, recent, and well tested backup, we know we can recommend that you just dump the undo subvolume with a nice btrfs subvol delete, right? Because you made a backup and everything, yes?
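In concrete terms, assuming the filesystem is mounted at /mnt (ext2_saved is the subvolume name btrfs-convert creates by default):

```shell
# Confirm the data is good and backed up FIRST; deleting this is what
# makes the conversion irreversible.
btrfs subvolume delete /mnt/ext2_saved   # drop the undo snapshot
sync                                     # cleanup of the freed space is asynchronous
btrfs balance start /mnt                 # then the full balance, if you still want it
```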

So anyway. Your system isn't "bugged" or "broken", it's "full". But it's a fragmented fullness that has lots of free sectors but insufficient contiguous free sectors, so it cannot satisfy the request.

That Said...

I suspect you _have_ revealed a problem with the error reporting in the case of "scary and wrong error message".

The allocator in extent-tree.c just tells you the raw free space on the disk and says "huh... there are lots of bytes out there".

Which is _WAY_ different than "there are enough bytes all in one clump to satisfy my needs." I.e. there is _not_ a lot of brains behind the message.


        ret = find_free_extent(root, num_bytes, empty_size, hint_byte, ins,
                               flags, delalloc);

        if (ret == -ENOSPC) {
                if (!final_tried && ins->offset) {
                        num_bytes = min(num_bytes >> 1, ins->offset);
                        num_bytes = round_down(num_bytes, root->sectorsize);
                        num_bytes = max(num_bytes, min_alloc_size);
                        if (num_bytes == min_alloc_size)
                                final_tried = true;
                        goto again;
                } else if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
                        struct btrfs_space_info *sinfo;

                        sinfo = __find_space_info(root->fs_info, flags);
                        btrfs_err(root->fs_info,
                                "allocation failed flags %llu, wanted %llu",
                                flags, num_bytes);
                        if (sinfo)
                                dump_space_info(sinfo, num_bytes, 1);
                }
        }
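For clarity, here is the fallback path from that snippet sketched as shell arithmetic: each retry halves the request, clamps it to the largest hole the search saw (ins->offset), rounds down to the sector size, and marks the attempt final once it bottoms out at min_alloc_size. All the numbers below are hypothetical, not taken from the log above.

```shell
num_bytes=$(( 2 * 1024 * 1024 * 1024 ))   # initial 2GiB request
offset=$(( 900 * 1024 * 1024 ))           # biggest free hole the search found
sectorsize=4096
min_alloc=$(( 64 * 1024 * 1024 ))         # stand-in for min_alloc_size
final_tried=0
while [ "$final_tried" -eq 0 ]; do
    num_bytes=$(( num_bytes / 2 ))                        # num_bytes >> 1
    [ "$num_bytes" -gt "$offset" ] && num_bytes=$offset   # min(.., ins->offset)
    num_bytes=$(( num_bytes / sectorsize * sectorsize ))  # round_down
    [ "$num_bytes" -lt "$min_alloc" ] && num_bytes=$min_alloc  # max(.., min_alloc_size)
    if [ "$num_bytes" -eq "$min_alloc" ]; then
        final_tried=1                                     # last attempt
    fi
    echo "retrying find_free_extent with $num_bytes bytes"
done
```

So the allocator keeps shrinking its ask; only when even the minimum-sized request fails does it give up and print the "allocation failed" message.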





I don't know how to interpret the space_info error. Why is only
4773171200 (4,4GiB) free?
Can I inspect block group 1821099687936 to try to find out what makes
it problematic?

BTRFS info (device sdc1): relocating block group 1821099687936 flags 1
BTRFS error (device sdc1): allocation failed flags 1, wanted 2013265920
BTRFS: space_info 1 has 4773171200 free, is not full
BTRFS: space_info total=1494648619008, used=1489775505408, pinned=0,
reserved=99700736, may_use=2102390784, readonly=241664

So it was looking for a single chunk 2013265920 bytes long and it couldn't find one because all the spaces were smaller and there was no room to make a new suitable space.

The problem is that it wanted 2013265920 bytes and the system as a whole had no way to satisfy that desire. It asked for something just shy of two gigs as a single allocation. That's a tough order on a full platter.

Since your free space is only 4773171200 bytes, and 2102390784 of that is already spoken for (may_use), that is an attempt to allocate over 40% of your free space, and nearly all of what is actually still available, as one contiguous block. That's never going to happen. 8-)
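Plugging the numbers from the space_info dump into a quick shell check:

```shell
# Figures straight from the log above.
wanted=2013265920
free=4773171200
may_use=2102390784
echo "wanted/free:    $(( wanted * 100 / free ))%"
echo "wanted/may_use: $(( wanted * 100 / may_use ))%"
```

The request is ~42% of the raw free space, and ~95% of what may_use leaves uncommitted, and it needs it all in one piece.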

I don't even know if 2GiB is normally a legal size for an extent. My understanding is that data is allocated in 1GiB chunks, so I'd expect all extents to be smaller than 1GiB.

Normally...

But... I would bet that this 2GiB monster is the image file, or part thereof, that btrfs-convert left behind, and it may well be a magical allocation of some sort. It may even be beyond the reach of balance et al. for being so large. But it _is_ within the bounds of the byte offsets and sizes the file system uses.

After a quick glance at btrfs-convert, it looks like it might make some pretty atypical extents if the underlying donor filesystem needed them. It wouldn't have had a choice. So it's easily within the realm of reason that you'd have some really fascinating data as a result of converting a nearly full EXT4 file system of the terabyte+ size. This would be quadruply true if you'd tweaked the block group ratios when you made the original file system.

So since you have nice backups... you should probably drop the ext2_saved subvolume and then get on with your life for good or ill.

But it's do or undo time.

AND UNDO IS NOT A BAD OPTION.

If you've got the media, building a fresh filesystem and copying the contents onto it is my preferred method anyway. I get to set the options I want (compression, skinny metadata, whatever) and I know I've got a good backup on the original media. It's also the perfectly natural way to get the subvolume boundaries where I want them and all that stuff.
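For what it's worth, a sketch of that copy-to-fresh-media approach. The device names, mount points, and options here are all hypothetical; pick your own:

```shell
mkfs.btrfs -O skinny-metadata /dev/sdd1   # fresh fs with the features you want
mkdir -p /mnt/new
mount -o compress=zlib /dev/sdd1 /mnt/new
btrfs subvolume create /mnt/new/home      # subvolume boundaries where you want them
rsync -aHAX /mnt/old/ /mnt/new/           # old fs stays intact as the backup
```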

Think of the time and worry you'd have saved if you'd copied the thing in the first place. 8-)

So anyway...

Probably fine.
Probably just very full filesystem.
Clearly got some big whale files that just won't balance due to space.
Probably those files are the leftover EXT4 structures.
Probably okay to revert.
Probably okay to just delete the revert info.
The prior two items are mutually exclusive.

Since you have nice and validated backups you can't go wrong either way.


P.S. you should re-balance your System and Metadata as "DUP" for now. Two
copies of that stuff is better than one as right now you have no real
recovery path for that stuff. If you didn't make that change on purpose it
probably got down-revved from DUP automagically when you tried to RAID it.

Good point. Maybe btrfs-convert should do that by default? I don't
think it has ever been DUP.

Eyup.
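Converting the existing System and Metadata chunks back to DUP is a single, metadata-only balance; assuming the fs is mounted at /mnt:

```shell
# -f is required whenever -sconvert is used, since balance refuses to
# touch system chunks without --force.
btrfs balance start -f -mconvert=dup -sconvert=dup /mnt
```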

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
