On 16.11.18 г. 17:18 ч., David Sterba wrote:
> On Fri, Oct 26, 2018 at 02:43:16PM +0300, Nikolay Borisov wrote:
>> While investigating the balance hang I came across various inconsistencies
>> in
>> the source. This series aims to fix those.
>>
>> The first patch is (I believe) a fix to a longstanding bug that could cause
>> balance to fail due to ENOSPC. The code no properly ensures that there is
>> at least 1g of unallocated space on every device being balance.
>>
>> Patch 2 makes btrfs_can_relocate a bit more obvious and removes leftovers
>> from
>> previous cleanup
>>
>> Patches 3/4/5 remove some redundant code from various functions.
>>
>> This series has survived multiple xfstest runs.
>>
>> Nikolay Borisov (5):
>> btrfs: Ensure at least 1g is free for balance
>> btrfs: Refactor btrfs_can_relocate
>> btrfs: Remove superfluous check form btrfs_remove_chunk
>> btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument
>> btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range
>
> Patches 2-5 on the way to misc-next, thanks. The first one can have user
> visible consequences, so I'd rather first find out why the 1MB was there
> and if it's really a bug and what exactly will change when it's 1G.
I'm fine with that, one thing I don't agree with, though, is the
conclusion that patch 1 has user visible consequence. As a matter of
fact it does not. If btrfs_shrink_device fails with ENOSPC we just
break, i/e we don't try to free balance space for any of the other
devices, but this doesn't stop from balance actually continuing. As it
stands today I think "step one" in __btrfs_balance is more or less
null-op, i.e cycles are wasted since all we do is shrink every device by
1mb, ensuring one more mb is already free. I think this is very
insufficient for a balance operation and when it succeeds during normal
operation it's due to the device already having enough unallocated space
before the balance.
Anyway, I will speak with Chris to try and find out why the code uses 1mb
>