Re: Recovery from full metadata with all device space consumed?

Hugo Mills Thu, 19 Apr 2018 15:44:39 -0700

On Thu, Apr 19, 2018 at 03:08:48PM -0700, Drew Bloechl wrote:
> I've got a btrfs filesystem that I can't seem to get back to a useful
> state. The symptom I started with is that rename() operations started
> dying with ENOSPC, and it looks like the metadata allocation on the
> filesystem is full:
> 
> # btrfs fi df /broken
> Data, RAID0: total=3.63TiB, used=67.00GiB
> System, RAID1: total=8.00MiB, used=224.00KiB
> Metadata, RAID1: total=3.00GiB, used=2.50GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> All of the consumable space on the backing devices also seems to be in
> use:
> 
> # btrfs fi show /broken
> Label: 'mon_data'  uuid: 85e52555-7d6d-4346-8b37-8278447eb590
>       Total devices 4 FS bytes used 69.50GiB
>       devid    1 size 931.51GiB used 931.51GiB path /dev/sda1
>       devid    2 size 931.51GiB used 931.51GiB path /dev/sdb1
>       devid    3 size 931.51GiB used 931.51GiB path /dev/sdc1
>       devid    4 size 931.51GiB used 931.51GiB path /dev/sdd1
> 
> Even the smallest balance operation I can start fails (this doesn't
> change even with an extra temporary device added to the filesystem):


   Both the data (RAID0) and metadata (RAID1) profiles here need to
allocate space on at least two devices at once, so a single extra
device isn't enough. Try adding _two_ temporary devices so that the
balance can allocate a new block group.

   Hugo.
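
   To illustrate the reasoning above, here is a rough sketch of why one
temporary device doesn't help but two might. It is a simplified model,
not how the kernel allocator actually works: it assumes, as the reply
says, that both RAID0 and RAID1 need free space on two devices at once.
The function name, the 4 GiB temporary device size and the 1 GiB stripe
size are made up for the example.

```python
# Simplified model: a new chunk needs unallocated space on at least
# MIN_DEVICES[profile] devices at the same time. The minimum of 2 for both
# profiles is taken from the reply above; chunk_gib is an illustrative
# per-device stripe size, not a value read from the filesystem.
MIN_DEVICES = {"RAID0": 2, "RAID1": 2}


def can_allocate(profile, unallocated_gib, chunk_gib=1.0):
    """Return True if enough devices have room for one stripe of a new chunk."""
    roomy = [free for free in unallocated_gib if free >= chunk_gib]
    return len(roomy) >= MIN_DEVICES[profile]


# The four original devices: `btrfs fi show` reports size == used on each,
# so every one of them has 0 GiB unallocated.
original = [931.51 - 931.51] * 4

one_temp = original + [4.0]        # one hypothetical 4 GiB temporary device
two_temps = original + [4.0, 4.0]  # two hypothetical 4 GiB temporary devices

for label, devices in [("none", original), ("one", one_temp), ("two", two_temps)]:
    print(label, {p: can_allocate(p, devices) for p in MIN_DEVICES})
```

   In this model, allocation fails for both profiles with zero or one
temporary device and succeeds only once two are present, which matches
the advice to add two.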

> # btrfs balance start -v -dusage=1 /broken
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=1
> ERROR: error during balancing '/broken': No space left on device
> There may be more info in syslog - try dmesg | tail
> # dmesg | tail -1
> [11554.296805] BTRFS info (device sdc1): 757 enospc errors during
> balance
> 
> The current kernel is 4.15.0 from Debian's stretch-backports
> (specifically linux-image-4.15.0-0.bpo.2-amd64), but it was Debian's
> 4.9.30 when the filesystem got into this state. I upgraded it in the
> hopes that a newer kernel would be smarter, but no dice.
> 
> btrfs-progs is currently at v4.7.3.
> 
> Most of what this filesystem stores is Prometheus 1.8's TSDB for its
> metrics, which are constantly written at around 50MB/second. The
> filesystem never really gets full as far as data goes, but there's a lot
> of never-ending churn for what data is there.
> 
> Question 1: Are there other steps that can be tried to rescue a
> filesystem in this state? I still have it mounted in the same state, and
> I'm willing to try other things or extract debugging info.
> 
> Question 2: Is there something I could have done to prevent this from
> happening in the first place?
> 
> Thanks!

-- 
Hugo Mills             | Always be sincere, whether you mean it or not.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |                                      Flanders & Swann
PGP: E2AB1DE4          |                                The Reluctant Cannibal
