Thanks for the responses.

All of this is *very* surprising. I'm not new to BTRFS, I've been
using it on my own machines for multiple years. I didn't realise there
was an un-holstered footgun on my lap at this point. How can it be
made clear to me and other sysadmins how to avoid the ENOSPC problem?
Or, preferably, how can it be made not to exist as a problem at all?

One thing which continues to puzzle me is "How do I make an alarm to
warn of an impending ENOSPC condition on BTRFS?". ENOSPC is a bad
place to be.

All of the standard monitoring tools warn on the output of `df`.

My first thought was to graph `metadata total - used` and put a
threshold on it. I was fortunate enough in this case to know about
`btrfs fi df`, but when I looked at "metadata free" I concluded that
there was plenty free, not knowing that metadata is allocated in blocks
larger than the amount presented as free (total - used = 0.5GiB). So
these numbers were quite misleading in this case. If I had seen
total=used, or available=0, the problem would have been much clearer.
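
A rough sketch of such an alarm, for anyone wanting something to start
from. It assumes the `btrfs fi df` output format shown in this thread,
and that the "unknown"/"GlobalReserve" line is carved out of metadata,
so the usable headroom is roughly `metadata total - used - reserve`.
The function names and the 256MiB threshold are made up for
illustration; they are not anything btrfs-progs provides.

```python
import re

# Multipliers for the unit suffixes printed by `btrfs fi df`.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

LINE_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[A-Za-z]*), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[A-Za-z]*)$"
)

# Output copied from the original report in this thread.
SAMPLE = """\
Data, single: total=489.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.00GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=820.00KiB
"""


def to_bytes(number, unit):
    # A bare "0.00" is printed without a unit; treat it as bytes.
    return int(float(number) * UNITS[unit or "B"])


def parse_fi_df(text):
    """Return {(kind, profile): (total_bytes, used_bytes)}."""
    rows = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows[(m["kind"], m["profile"])] = (
                to_bytes(m["total"], m["tunit"]),
                to_bytes(m["used"], m["uunit"]),
            )
    return rows


def metadata_headroom(rows):
    """Metadata (total - used) minus the size of the global reserve."""
    meta_free = sum(
        total - used
        for (kind, _), (total, used) in rows.items()
        if kind == "Metadata"
    )
    # Older btrfs-progs label the global reserve "unknown".
    reserve = sum(
        total
        for (kind, _), (total, _) in rows.items()
        if kind in ("unknown", "GlobalReserve")
    )
    return meta_free - reserve


def should_alarm(rows, threshold_bytes=256 * 1024**2):
    # Hypothetical threshold; tune it for the workload.
    return metadata_headroom(rows) < threshold_bytes


if __name__ == "__main__":
    rows = parse_fi_df(SAMPLE)
    print(metadata_headroom(rows), should_alarm(rows))
```

With the numbers from this thread the headroom comes out at 8MiB (the
unused "Metadata, single" chunk) rather than the 0.5GiB that
total - used suggests, so this check would have fired.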

Why present space as available when it can't be used?

In the end, it seems that metadata should be able to steal space from
"data" on demand. That would make the output of "df" more informative,
since you wouldn't see "60 GB free" and get ENOSPC, which is an
utterly confusing situation and harmful to production.
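
For what it's worth, the raw allocation can be added up from the same
output. This is only a sketch: it assumes DUP block groups occupy twice
their reported total on disk, that the "unknown"/"GlobalReserve" line is
not a separate allocation, and it handles only the single and DUP
profiles seen in this thread. `raw_allocated_gib` is a made-up helper
name.

```python
import re

# Copies kept on disk per profile; only the profiles seen in this thread.
COPIES = {"single": 1, "DUP": 2}

# Factors to convert each printed unit to GiB ("" covers a bare "0.00").
TO_GIB = {"": 1 / 1024**3, "B": 1 / 1024**3, "KiB": 1 / 1024**2,
          "MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}

ROW_RE = re.compile(r"^(\w+), (\w+): total=([\d.]+)([A-Za-z]*), used=")

# `btrfs fi df` output before and after the volume was enlarged,
# copied from the original report.
BEFORE = """\
Data, single: total=489.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.00GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=820.00KiB
"""

AFTER = """\
Data, single: total=490.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.50GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=0.00
"""


def raw_allocated_gib(fi_df_text):
    """Sum the raw space taken by all block groups, in GiB."""
    total = 0.0
    for line in fi_df_text.splitlines():
        m = ROW_RE.match(line.strip())
        if not m:
            continue
        kind, profile, number, unit = m.groups()
        if kind in ("unknown", "GlobalReserve"):
            continue  # assumed to live inside metadata, not a chunk of its own
        total += float(number) * TO_GIB[unit] * COPIES[profile]
    return total


if __name__ == "__main__":
    print(round(raw_allocated_gib(BEFORE), 2))
    print(round(raw_allocated_gib(AFTER), 2))
```

Comparing the result with the size of the block device (which is not
stated in this thread) would show how much raw space is still
unallocated and so available for new metadata chunks.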

Is there something fundamental preventing that from happening or is it
just that no-one has gotten around to it yet?

Thanks,

- Peter


On 4 August 2014 02:38, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
> Hi, Peter
>
> Some explanations inline below.
>
> -------- Original Message --------
> Subject: ENOSPC with mkdir and rename
> From: Peter Waller <pe...@scraperwiki.com>
> To: <linux-btrfs@vger.kernel.org>
> Date: 2014-08-03 07:35
>>
>> Hi All,
>>
>> My TL;DR questions are at the bottom, before the stack trace.
>>
>> I'm running Ubuntu 14.04. I wonder if this problem is related to the
>> thread titled "Machine lockup due to btrfs-transaction on AWS EC2
>> Ubuntu 14.04" which I started on the 29th of July:
>>
>>> http://thread.gmane.org/gmane.comp.file-systems.btrfs/37224
>>
>> Kernel: 3.15.7-031507-generic
>>
>> I'm on a single block device system, i.e., no RAID.
>>
>> I was observing ENOSPC from `mkdir` and `rename` on this system, with
>> a good amount of free disk space (df -h reports 62 GB remaining). I added
>> enospc_debug (full umount/mount, not just mount -o remount), but this
>> had no apparent effect when receiving ENOSPC from userland.
>>
>> $ sudo btrfs fi df /path/to/volume
>> Data, single: total=489.97GiB, used=427.75GiB
>> System, DUP: total=8.00MiB, used=60.00KiB
>> System, single: total=4.00MiB, used=0.00
>> Metadata, DUP: total=5.00GiB, used=4.50GiB
>
> In fact, all your metadata is used.
> That seems strange, since there appears to be 500MB (to be precise, 512MiB)
> free, but I'll explain it below.
>
>> Metadata, single: total=8.00MiB, used=0.00
>> unknown, single: total=512.00MiB, used=820.00KiB
>
> Here the "unknown" is in fact the "global data reserve", reserved for COW tree
> writes (except the FS tree and subvolume trees, if I'm right).
> If you use the latest btrfs-progs, it will show "GlobalReserve" instead of
> "unknown". In most cases it should not be used at all, but here it is being
> used, which really shows the shortage of space.
>
> So sadly, there is really no space left for metadata for mkdir and rename(*).
>
> *: rename modifies metadata, btrfs does COW for the metadata trees, and
> rename/mkdir will not use space from the global reserve, so ENOSPC is
> expected here.
>
> The good thing is that rm will steal space from the global reserve, so you
> should be able to remove files and, hopefully, free
> enough metadata space.
> Or you can try adding another device to this btrfs.
>
> Thanks,
> Qu
>>
>>
>> After a thorough search of the internet for ENOSPC BTRFS I found
>> various resources and came to understand a little bit more. One thing
>> which broke my intuition severely is that I expected that, with a
>> large number of GiB free, things would continue to work.
>>
>> In this case, for example, metadata has 0.5GiB free ("sounds like
>> plenty for metadata for one mkdir to me"). Data has 62GiB free. Why
>> would I get ENOSPC for a file rename?
>>
>> I expected that if metadata needed more space, it would just eat it
>> from the 'data'. Now I believe this not to be the case and that it
>> wanted to allocate > 0.5GiB, and this is why I was getting ENOSPC.
>>
>> I tried a rebalance with btrfs balance start -dusage=10 and tried
>> increasing the value until I saw reallocations in dmesg.
>>
>> This spat out a large number of messages in dmesg, of this form:
>>
>>> [376096.546353] BTRFS info (device dm-0): relocating block group
>>> 530457821184 flags 1
>>> [376010.736879] BTRFS info (device dm-0): 40 enospc errors during balance
>>
>> (and a full stack trace at the end of this message).
>>
>> The rebalance printed:
>>
>>> ERROR: error during balancing '/path/to/volume' - No space left on device
>>> There may be more info in syslog - try dmesg | tail
>>
>> Eventually, not knowing what else to do I had to take my escape hatch
>> and enlarge the volume. When I did this, metadata grew by 1GiB:
>>
>>> Data, single: total=490.97GiB, used=427.75GiB
>>> System, DUP: total=8.00MiB, used=60.00KiB
>>> System, single: total=4.00MiB, used=0.00
>>> Metadata, DUP: total=5.50GiB, used=4.50GiB
>>> Metadata, single: total=8.00MiB, used=0.00
>>> unknown, single: total=512.00MiB, used=0.00
>>
>> A few questions:
>>
>> * Why didn't the metadata grow before enlarging the disk?
>> * Why didn't the rebalance enable the metadata to grow?
>> * Why is it necessary to rebalance? Can't it automatically take some
>> free space from 'data'?
>> * Are my machine lockups related to the fact I was low on space?
>> * Can we improve the documentation/FAQ for this? I was scratching my
>> head in particular because my notion of free space definitely does not
>> match up with BTRFS', and I didn't find the FAQ very helpful for
>> getting out of this mess.
>> * It isn't documented on the wiki what enospc_debug is supposed to do,
>> so I couldn't tell whether I should have expected it to tell me
>> anything in my circumstances.
>> * What is the best course of action to take (other than enlarging the
>> disk or deleting files) if I encounter this situation again?
>>
>> Thanks in advance,
>>
>> - Peter
>>
>> [376007.681938] ------------[ cut here ]------------
>> [376007.681957] WARNING: CPU: 1 PID: 27021 at
>> /home/apw/COD/linux/fs/btrfs/extent-tree.c:6946
>> use_block_rsv+0xfd/0x1a0 [btrfs]()
>> [376007.681958] BTRFS: block rsv returned -28
>> [376007.681959] Modules linked in: softdog tcp_diag inet_diag dm_crypt
>> ppdev xen_fbfront fb_sys_fops syscopyarea sysfillrect sysimgblt
>> i2c_piix4 serio_raw parport_pc parport mac_hid isofs xt_tcpudp
>> iptable_filter xt_owner ip_tables x_tables btrfs xor raid6_pq
>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
>> aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd floppy psmouse
>> [376007.681980] CPU: 1 PID: 27021 Comm: pam_script_ses_ Tainted: G
>>     W     3.15.7-031507-generic #201407281235
>> [376007.681981] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/23/2014
>> [376007.681983]  0000000000001b22 ffff8800acca39d8 ffffffff8176f115
>> 0000000000000007
>> [376007.681986]  ffff8800acca3a28 ffff8800acca3a18 ffffffff8106ceac
>> ffff8801efc37870
>> [376007.681989]  ffff88017db0ff00 ffff8801aedcd800 0000000000001000
>> ffff88001c987000
>> [376007.681992] Call Trace:
>> [376007.682000]  [<ffffffff8176f115>] dump_stack+0x46/0x58
>> [376007.682005]  [<ffffffff8106ceac>] warn_slowpath_common+0x8c/0xc0
>> [376007.682008]  [<ffffffff8106cf96>] warn_slowpath_fmt+0x46/0x50
>> [376007.682016]  [<ffffffffa00d9d1d>] use_block_rsv+0xfd/0x1a0 [btrfs]
>> [376007.682024]  [<ffffffffa00de687>] btrfs_alloc_free_block+0x57/0x220
>> [btrfs]
>> [376007.682027]  [<ffffffff8178033c>] ? __do_page_fault+0x28c/0x550
>> [376007.682031]  [<ffffffff8119749f>] ? page_add_file_rmap+0x6f/0xb0
>> [376007.682037]  [<ffffffffa00c8a3c>] btrfs_copy_root+0xfc/0x2b0 [btrfs]
>> [376007.682041]  [<ffffffff811c60b9>] ? memcg_check_events+0x29/0x50
>> [376007.682051]  [<ffffffffa013a583>] ? create_reloc_root+0x33/0x2c0
>> [btrfs]
>> [376007.682061]  [<ffffffffa013a743>] create_reloc_root+0x1f3/0x2c0
>> [btrfs]
>> [376007.682064]  [<ffffffff811dd073>] ? generic_permission+0xf3/0x120
>> [376007.682073]  [<ffffffffa0140eb8>] btrfs_init_reloc_root+0xb8/0xd0
>> [btrfs]
>> [376007.682082]  [<ffffffffa00ee967>]
>> record_root_in_trans.part.30+0x97/0x100 [btrfs]
>> [376007.682090]  [<ffffffffa00ee9f4>] record_root_in_trans+0x24/0x30
>> [btrfs]
>> [376007.682098]  [<ffffffffa00efeb1>]
>> btrfs_record_root_in_trans+0x51/0x80 [btrfs]
>> [376007.682106]  [<ffffffffa00f13d6>]
>> start_transaction.part.35+0x86/0x560 [btrfs]
>> [376007.682109]  [<ffffffff8132c197>] ? apparmor_capable+0x27/0x80
>> [376007.682117]  [<ffffffffa00f18d9>] start_transaction+0x29/0x30 [btrfs]
>> [376007.682125]  [<ffffffffa00f19a7>] btrfs_join_transaction+0x17/0x20
>> [btrfs]
>> [376007.682133]  [<ffffffffa00f7fa8>] btrfs_dirty_inode+0x58/0xe0 [btrfs]
>> [376007.682141]  [<ffffffffa00fcaf2>] btrfs_setattr+0xa2/0xf0 [btrfs]
>> [376007.682144]  [<ffffffff811eec74>] notify_change+0x1c4/0x3b0
>> [376007.682146]  [<ffffffff811dde96>] ? final_putname+0x26/0x50
>> [376007.682149]  [<ffffffff811d088d>] chown_common+0x16d/0x1a0
>> [376007.682153]  [<ffffffff811f2b08>] ? __mnt_want_write+0x58/0x70
>> [376007.682156]  [<ffffffff811d1a8f>] SyS_fchownat+0xbf/0x100
>> [376007.682159]  [<ffffffff811d1aed>] SyS_chown+0x1d/0x20
>> [376007.682163]  [<ffffffff817858bf>] tracesys+0xe1/0xe6
>> [376007.682165] ---[ end trace 1853311c87a5cd94 ]---
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>