Thanks.  These are all things which take substantial fractions of a
day to try, unfortunately.    Last time I ended up fixing it in a
fairly kludged way, which was to convert from raid-1 to single long
enough to get enough single blocks that, when I converted back to
raid-1, they got distributed to the right drives.  But this is, aside
from being a kludge, a procedure with some minor risk.  Of course I am
taking a backup first, but still...
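For the record, that kludge was roughly the following (the mount point
and exact filters are illustrative -- I may not have them exactly right):

   btrfs balance start -dconvert=single /mnt
   # cancel once enough data chunks have been rewritten as single:
   btrfs balance cancel /mnt
   btrfs balance start -dconvert=raid1 /mnt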

This strikes me as something that should be a fairly common event --
your raid is filling up, and so you expand it by replacing the oldest
and smallest drive with a new much bigger one.   In the old days of
RAID, you could not do that, you had to grow all drives at the same
time, and this is one of the ways that BTRFS is quite superior.
When I had MD raid, I went through a strange process of always having
a raid 5 that consisted of different sized drives.  The raid-5 was
based on the smallest of the 3 drives, and then the larger ones had
extra space which could either be in raid-1, or more simply was in solo
disk mode and used for less critical data (such as backups and old
archives).   Slowly, and in a messy way, each time I replaced the
smallest drive, I could then grow the raid 5.  Yuck.     BTRFS is so
much better, except for this issue.

So if somebody has a thought of a procedure that is fairly sure to
work and doesn't involve too many copying passes -- copying 4TB is not
a quick operation -- it would be much appreciated, and it might be a
good thing to add to a wiki page, which I would be happy to do.

On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
>
> On 2018-05-27 09:49, Brad Templeton wrote:
>> That is what did not work last time.
>>
>> I say I think there can be a "fix" because I hope the goal of BTRFS
>> raid is to be superior to traditional RAID.   That is, if one replaces a
>> drive and asks to balance, it should figure out what needs to be done to
>> make that work.  I understand that the current balance algorithm may
>> have trouble with that.   In this situation, the ideal result would be
>> the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB
>> free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie
>> extents which are currently on both the 4TB and 6TB -- by moving only
>> one copy.
>
> Btrfs can only do balance in units of whole chunks.
> Thus btrfs can only do:
> 1) Create a new chunk
> 2) Copy the data
> 3) Remove the old chunk.
>
> So it can't do it the way you mentioned.
> But your purpose sounds pretty valid and maybe we could enhance btrfs
> to do such a thing.
> (Currently only replace can behave like that)
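> For example, something like the following (device paths, devid and
> mount point are illustrative):
>
>   btrfs replace start /dev/old4tb /dev/new8tb /mnt
>   btrfs filesystem resize <new_devid>:max /mnt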
>
>> It is not strictly a "bug" in that the code is operating
>> as designed, but it is undesired behavior.
>>
>> The problem is the approach you describe did not work in the prior upgrade.
>
> Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance?
> Before and after the balance, "btrfs fi usage" and "btrfs fi show"
> output could also help.
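> For example (mount point illustrative):
>
>   btrfs fi usage /mnt
>   btrfs fi show /mnt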
>
> Thanks,
> Qu
>
>>
>> On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>
>>>
>>> On 2018-05-27 09:27, Brad Templeton wrote:
>>>> A few years ago, I encountered an issue (halfway between a bug and a
>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was
>>>> fairly full.   The problem was that after replacing (by add/delete) a
>>>> small drive with a larger one, there were now 2 full drives and one
>>>> new half-full one, and balance was not able to correct this situation
>>>> to produce the desired result, which is 3 drives, each with a roughly
>>>> even amount of free space.  It can't do it because the 2 smaller
>>>> drives are full, and it doesn't realize it could just move one of the
>>>> copies of a block off the smaller drive onto the larger drive to free
>>>> space on the smaller drive; it wants to move them both, and there is
>>>> nowhere to put them both.
>>>
>>> It's not that easy.
>>> For balance, btrfs must first find a large enough space to locate both
>>> copies, then copy the data.
>>> Otherwise, if a power loss happens, it would cause data corruption.
>>>
>>> So in your case, btrfs can only find enough space for one copy, and is
>>> thus unable to relocate any chunk.
>>>
>>>>
>>>> I'm about to do it again, taking my nearly full array which is 4TB,
>>>> 4TB, 6TB and replacing one of the 4TB with an 8TB.  I don't want to
>>>> repeat the very time consuming situation, so I wanted to find out if
>>>> things were fixed now.   I am running Xenial (kernel 4.4.0) and could
>>>> consider the upgrade to bionic (4.15), though that adds a lot more to
>>>> my plate before a long trip, and I would prefer to avoid it if I can.
>>>
>>> Since there is nothing to fix, the behavior will not change at all.
>>>
>>>>
>>>> So what is the best strategy:
>>>>
>>>> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic" 
>>>> strategy)
>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks
>>>> from 4TB but possibly not enough)
>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with
>>>> recently vacated 6TB -- much longer procedure but possibly better
>>>>
>>>> Or has this all been fixed and method A will work fine and get to the
>>>> ideal goal -- 3 drives, with available space suitably distributed to
>>>> allow full utilization over time?
>>>
>>> The btrfs chunk allocator has already been trying to utilize all devices
>>> for a long, long time.
>>> When allocating chunks, btrfs will choose the device with the most free
>>> space. However, the nature of RAID1 requires btrfs to allocate extents
>>> from 2 different devices, which makes your replaced 4/4/6 a little complex.
>>> (If your 4/4/6 array had been set up from scratch and then filled to its
>>> current state, btrfs would be able to utilize all the space.)
>>>
>>>
>>> Personally speaking, if you're confident enough, just add a new device,
>>> and then do balance.
>>> If enough chunks get balanced, there should be enough space freed on
>>> existing disks.
>>> Then remove the newly added device, and btrfs should handle the
>>> remaining space well.
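>>> I.e. something like this (device path and mount point illustrative):
>>>
>>>   btrfs device add /dev/newdev /mnt
>>>   btrfs balance start /mnt
>>>   btrfs device remove /dev/newdev /mnt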
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <brad...@gmail.com> wrote:
>>>>> A few years ago, I encountered an issue (halfway between a bug and a
>>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly
>>>>> full.   The problem was that after replacing (by add/delete) a small drive
>>>>> with a larger one, there were now 2 full drives and one new half-full one,
>>>>> and balance was not able to correct this situation to produce the desired
>>>>> result, which is 3 drives, each with a roughly even amount of free space.
>>>>> It can't do it because the 2 smaller drives are full, and it doesn't 
>>>>> realize
>>>>> it could just move one of the copies of a block off the smaller drive onto
>>>>> the larger drive to free space on the smaller drive; it wants to move them
>>>>> both, and there is nowhere to put them both.
>>>>>
>>>>> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 
>>>>> 6TB
>>>>> and replacing one of the 4TB with an 8TB.  I don't want to repeat the very
>>>>> time consuming situation, so I wanted to find out if things were fixed 
>>>>> now.
>>>>> I am running Xenial (kernel 4.4.0) and could consider the upgrade to  
>>>>> bionic
>>>>> (4.15), though that adds a lot more to my plate before a long trip and I
>>>>> would prefer to avoid it if I can.
>>>>>
>>>>> So what is the best strategy:
>>>>>
>>>>> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic"
>>>>> strategy)
>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks 
>>>>> from
>>>>> 4TB but possibly not enough)
>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently
>>>>> vacated 6TB -- much longer procedure but possibly better
>>>>>
>>>>> Or has this all been fixed and method A will work fine and get to the 
>>>>> ideal
>>>>> goal -- 3 drives, with available space suitably distributed to allow full
>>>>> utilization over time?
>>>>>
>>>>> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye...@gmail.com> wrote:
>>>>>>
>>>>>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist
>>>>>> <patrik.lundqu...@gmail.com> wrote:
>>>>>>> On 23 March 2016 at 20:33, Chris Murphy <li...@colorremedies.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <brad...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I am surprised to hear it said that having the mixed sizes is an odd
>>>>>>>>> case.
>>>>>>>>
>>>>>>>> Not odd as in wrong, just uncommon compared to other arrangements being
>>>>>>>> tested.
>>>>>>>
>>>>>>> I think mixed drive sizes in raid1 is a killer feature for a home NAS,
>>>>>>> where you replace an old smaller drive with the latest and largest
>>>>>>> when you need more storage.
>>>>>>>
>>>>>>> My raid1 currently consists of 6TB+3TB+3*2TB.
>>>>>>
>>>>>> For the original OP's situation, with chunks all filled up with extents
>>>>>> and devices all filled up with chunks, 'integrating' a new 6TB drive
>>>>>> into a 4TB+3TB+2TB raid1 array could probably be done in a somewhat
>>>>>> unusual way in order to avoid immediate balancing needs:
>>>>>> - 'plug-in' the 6TB
>>>>>> - btrfs-replace  4TB by 6TB
>>>>>> - btrfs fi resize max 6TB_devID
>>>>>> - btrfs-replace  2TB by 4TB
>>>>>> - btrfs fi resize max 4TB_devID
>>>>>> - 'unplug' the 2TB
>>>>>>
>>>>>> So then there would be 2 devices with roughly 2TB space available, so
>>>>>> good for continued btrfs raid1 writes.
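>>>>>> In command form that would be roughly (device paths and devids are
>>>>>> illustrative):
>>>>>>
>>>>>>   btrfs replace start /dev/sd_4tb /dev/sd_6tb /mnt
>>>>>>   btrfs filesystem resize <6TB_devid>:max /mnt
>>>>>>   btrfs replace start /dev/sd_2tb /dev/sd_4tb /mnt
>>>>>>   btrfs filesystem resize <4TB_devid>:max /mnt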
>>>>>>
>>>>>> An offline variant with dd instead of btrfs-replace could also be done
>>>>>> (I used to do that sometimes when btrfs-replace was not implemented).
>>>>>> My experience is that btrfs-replace runs at roughly maximum speed (i.e.
>>>>>> hard-disk magnetic media transfer speed) during the whole replace
>>>>>> process, and it does in a more direct way what you actually want. So in
>>>>>> total the device replace/upgrade is usually much faster than with the
>>>>>> add+delete method. And raid1 redundancy is active all the time. Of
>>>>>> course, it means first making sure the system runs up-to-date/latest
>>>>>> kernel+tools.
>>>>>
>>>>>
>>>
>
