On 2018-05-27 10:06, Brad Templeton wrote:
> Thanks.  These are all things which take substantial fractions of a
> day to try, unfortunately.

Normally I would suggest just using a VM and several small disks (~10G),
along with fallocate (the fastest way to fill space) to get a basic view
of the procedure.
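
For example, a minimal test setup using loop devices (purely a sketch;
file names, sizes and the mount point are illustrative):

  # Create three small backing files; fallocate is near-instant
  fallocate -l 10G disk1.img
  fallocate -l 10G disk2.img
  fallocate -l 10G disk3.img

  # Attach them as loop devices (each prints e.g. /dev/loop0, /dev/loop1, ...)
  losetup -f --show disk1.img
  losetup -f --show disk2.img
  losetup -f --show disk3.img

  # Make a 3-disk RAID1 filesystem and mount it
  mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1 /dev/loop2
  mkdir -p /mnt/test
  mount /dev/loop0 /mnt/test

  # Fill it quickly with fallocated files to simulate a nearly full array
  fallocate -l 8G /mnt/test/filler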

> Last time I ended up fixing it in a
> fairly kluged way, which was to convert from raid-1 to single long
> enough to get enough single blocks that when I converted back to
> raid-1 they got distributed to the right drives.

Yep, that's the ultimate one-size-fits-all solution.
Also, this reminds me that we could do the RAID1->Single (or DUP->Single)
downgrade in a much, much faster way.
I think it's worth considering as a later enhancement.
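
For reference, that workaround corresponds roughly to the following
balance invocations (a sketch; the mount point is illustrative, and
metadata can be converted the same way with -mconvert):

  # Downgrade data to single so each block needs only one drive
  btrfs balance start -dconvert=single /mnt

  # ... replace/resize devices as needed, then restore redundancy
  btrfs balance start -dconvert=raid1 /mnt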

>  But this is, aside
> from being a kludge, a procedure with some minor risk.  Of course I am
> taking a backup first, but still...
> 
> This strikes me as something that should be a fairly common event --
> your raid is filling up, and so you expand it by replacing the oldest
> and smallest drive with a new much bigger one.   In the old days of
> RAID, you could not do that, you had to grow all drives at the same
> time, and this is one of the ways that BTRFS is quite superior.
> When I had MD raid, I went through a strange process of always having
> a raid 5 that consisted of different sized drives.  The raid-5 was
> based on the smallest of the 3 drives, and then the larger ones had
> extra space which could either be in raid-1, or more simply was in solo
> disk mode and used for less critical data (such as backups and old
> archives.)   Slowly, and in a messy way, each time I replaced the
> smallest drive, I could then grow the raid 5.  Yuck.     BTRFS is so
> much better, except for this issue.
> 
> So if somebody has a thought of a procedure that is fairly sure to
> work and doesn't involve too many copying passes -- copying 4tb is not
> a quick operation -- it is much appreciated and might be a good thing
> to add to a wiki page, which I would be happy to do.

Anyway, the output of "btrfs fi show" and "btrfs fi usage" would help
before any further advice from the community.
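
For example, assuming the array is mounted at /mnt (illustrative):

  btrfs filesystem show /mnt
  btrfs filesystem usage /mnt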

Thanks,
Qu

> 
> On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>
>>
>> On 2018-05-27 09:49, Brad Templeton wrote:
>>> That is what did not work last time.
>>>
>>> I say I think there can be a "fix" because I hope the goal of BTRFS
>>> raid is to be superior to traditional RAID: that if one replaces a
>>> drive and asks to balance, it figures out what needs to be done to
>>> make that work.  I understand that the current balance algorithm may
>>> have trouble with that.   In this situation, the ideal result would be
>>> the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB
>>> free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie
>>> extents which are currently on both the 4TB and 6TB -- by moving only
>>> one copy.
>>
>> Btrfs can only do balance at chunk granularity.
>> Thus btrfs can only do:
>> 1) Create a new chunk
>> 2) Copy the data
>> 3) Remove the old chunk
>>
>> So it can't work the way you mentioned.
>> But your purpose sounds pretty valid, and maybe we could enhance btrfs
>> to do such a thing.
>> (Currently only replace behaves like that.)
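>>
>> As an aside, balance filters can at least narrow which chunks get
>> relocated. A sketch (the mount point and devid are illustrative):
>>
>>   # Relocate only data chunks that have a stripe on device 1
>>   btrfs balance start -ddevid=1 /mnt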
>>
>>> It is not strictly a "bug", in that the code is operating
>>> as designed, but it is undesired behavior.
>>>
>>> The problem is that the approach you describe did not work in the prior upgrade.
>>
>> Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance?
>> The "btrfs fi usage" and "btrfs fi show" output from before and after
>> the balance would also help.
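>>
>> For example (the device path and mount point are illustrative):
>>
>>   btrfs filesystem usage /mnt > usage-before.txt
>>   btrfs device add /dev/sdX /mnt
>>   btrfs balance start /mnt
>>   btrfs filesystem usage /mnt > usage-after.txt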
>>
>> Thanks,
>> Qu
>>
>>>
>>> On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>>
>>>>
>>>> On 2018-05-27 09:27, Brad Templeton wrote:
>>>>> A few years ago, I encountered an issue (halfway between a bug and a
>>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was
>>>>> fairly full.   The problem was that after replacing (by add/delete) a
>>>>> small drive with a larger one, there were now 2 full drives and one
>>>>> new half-full one, and balance was not able to correct this situation
>>>>> to produce the desired result, which is 3 drives, each with a roughly
>>>>> even amount of free space.  It can't do it because the 2 smaller
>>>>> drives are full, and it doesn't realize it could just move one of the
>>>>> copies of a block off the smaller drive onto the larger drive to free
>>>>> space on the smaller drive; it wants to move them both, and there is
>>>>> nowhere to put them both.
>>>>
>>>> It's not that easy.
>>>> For balance, btrfs must first find a large enough space to hold both
>>>> copies, then copy the data.
>>>> Otherwise, if a power loss happens, it would cause data corruption.
>>>>
>>>> So in your case, btrfs can only find enough space for one copy, and is
>>>> thus unable to relocate any chunk.
>>>>
>>>>>
>>>>> I'm about to do it again, taking my nearly full array which is 4TB,
>>>>> 4TB, 6TB and replacing one of the 4TB with an 8TB.  I don't want to
>>>>> repeat the very time consuming situation, so I wanted to find out if
>>>>> things were fixed now.   I am running Xenial (kernel 4.4.0) and could
>>>>> consider the upgrade to bionic (4.15), though that adds a lot more to
>>>>> my plate before a long trip, and I would prefer to avoid it if I can.
>>>>
>>>> Since there is nothing to fix, the behavior will not change at all.
>>>>
>>>>>
>>>>> So what is the best strategy:
>>>>>
>>>>> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic" 
>>>>> strategy)
>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks
>>>>> from 4TB but possibly not enough)
>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with
>>>>> recently vacated 6TB -- much longer procedure but possibly better
>>>>>
>>>>> Or has this all been fixed and method A will work fine and get to the
>>>>> ideal goal -- 3 drives, with available space suitably distributed to
>>>>> allow full utilization over time?
>>>>
>>>> The btrfs chunk allocator has already been trying to utilize all
>>>> drives for a long, long time.
>>>> When allocating chunks, btrfs will choose the device with the most free
>>>> space. However, the nature of RAID1 requires btrfs to allocate extents
>>>> from 2 different devices, which makes your replaced 4/4/6 a little
>>>> complex.
>>>> (If your 4/4/6 array had been set up from scratch and then filled to
>>>> the current stage, btrfs should be able to utilize all the space.)
>>>>
>>>>
>>>> Personally speaking, if you're confident enough, just add a new device
>>>> and then do a balance.
>>>> If enough chunks get balanced, there should be enough space freed on
>>>> the existing disks.
>>>> Then remove the newly added device, and btrfs should handle the
>>>> remaining space well.
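>>>>
>>>> A sketch of that sequence (the device path and mount point are
>>>> illustrative):
>>>>
>>>>   # Temporarily add a device to create free space for balance
>>>>   btrfs device add /dev/sdX /mnt
>>>>
>>>>   # Rebalance so chunks spread across all devices
>>>>   btrfs balance start /mnt
>>>>
>>>>   # Remove the temporary device; its chunks are migrated back
>>>>   btrfs device delete /dev/sdX /mnt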
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <brad...@gmail.com> wrote:
>>>>>> A few years ago, I encountered an issue (halfway between a bug and a
>>>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly
>>>>>> full.   The problem was that after replacing (by add/delete) a small 
>>>>>> drive
>>>>>> with a larger one, there were now 2 full drives and one new half-full 
>>>>>> one,
>>>>>> and balance was not able to correct this situation to produce the desired
>>>>>> result, which is 3 drives, each with a roughly even amount of free space.
>>>>>> It can't do it because the 2 smaller drives are full, and it doesn't 
>>>>>> realize
>>>>>> it could just move one of the copies of a block off the smaller drive 
>>>>>> onto
>>>>>> the larger drive to free space on the smaller drive; it wants to move 
>>>>>> them
>>>>>> both, and there is nowhere to put them both.
>>>>>>
>>>>>> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 
>>>>>> 6TB
>>>>>> and replacing one of the 4TB with an 8TB.  I don't want to repeat the 
>>>>>> very
>>>>>> time consuming situation, so I wanted to find out if things were fixed 
>>>>>> now.
>>>>>> I am running Xenial (kernel 4.4.0) and could consider the upgrade to  
>>>>>> bionic
>>>>>> (4.15) though that adds a lot more to my plate before a long trip and I
>>>>>> would prefer to avoid if I can.
>>>>>>
>>>>>> So what is the best strategy:
>>>>>>
>>>>>> a) Replace 4TB with 8TB, resize up and balance?  (This is the "basic"
>>>>>> strategy)
>>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks 
>>>>>> from
>>>>>> 4TB but possibly not enough)
>>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently
>>>>>> vacated 6TB -- much longer procedure but possibly better
>>>>>>
>>>>>> Or has this all been fixed and method A will work fine and get to the 
>>>>>> ideal
>>>>>> goal -- 3 drives, with available space suitably distributed to allow full
>>>>>> utilization over time?
>>>>>>
>>>>>> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye...@gmail.com> wrote:
>>>>>>>
>>>>>>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist
>>>>>>> <patrik.lundqu...@gmail.com> wrote:
>>>>>>>> On 23 March 2016 at 20:33, Chris Murphy <li...@colorremedies.com> 
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <brad...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> I am surprised to hear it said that having the mixed sizes is an odd
>>>>>>>>>> case.
>>>>>>>>>
>>>>>>>>> Not odd as in wrong, just uncommon compared to other arrangements 
>>>>>>>>> being
>>>>>>>>> tested.
>>>>>>>>
>>>>>>>> I think mixed drive sizes in raid1 is a killer feature for a home NAS,
>>>>>>>> where you replace an old smaller drive with the latest and largest
>>>>>>>> when you need more storage.
>>>>>>>>
>>>>>>>> My raid1 currently consists of 6TB+3TB+3*2TB.
>>>>>>>
>>>>>>> For the original OP's situation, with chunks all filled up with extents
>>>>>>> and devices all filled up with chunks, 'integrating' a new 6TB drive
>>>>>>> into a 4TB+3TB+2TB raid1 array could probably be done in a somewhat
>>>>>>> unusual way in order to avoid immediate balancing needs:
>>>>>>> - 'plug-in' the 6TB
>>>>>>> - btrfs-replace  4TB by 6TB
>>>>>>> - btrfs fi resize max 6TB_devID
>>>>>>> - btrfs-replace  2TB by 4TB
>>>>>>> - btrfs fi resize max 4TB_devID
>>>>>>> - 'unplug' the 2TB
>>>>>>>
>>>>>>> So then there would be 2 devices with roughly 2TB of space available,
>>>>>>> which is good for continued btrfs raid1 writes.
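>>>>>>>
>>>>>>> In concrete commands, that procedure is roughly (a sketch; device
>>>>>>> paths, devids and the mount point are illustrative):
>>>>>>>
>>>>>>>   # Replace the 4TB drive with the new 6TB one, then grow it
>>>>>>>   btrfs replace start /dev/sd_4TB /dev/sd_6TB /mnt
>>>>>>>   btrfs filesystem resize <6TB_devID>:max /mnt
>>>>>>>
>>>>>>>   # Replace the 2TB drive with the now-free 4TB one, then grow it
>>>>>>>   btrfs replace start /dev/sd_2TB /dev/sd_4TB /mnt
>>>>>>>   btrfs filesystem resize <4TB_devID>:max /mnt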
>>>>>>>
>>>>>>> An offline variant with dd instead of btrfs-replace could also be done
>>>>>>> (I used to do that sometimes before btrfs-replace was implemented).
>>>>>>> My experience is that btrfs-replace runs at roughly maximum speed (i.e.
>>>>>>> the hard disk's magnetic-media transfer speed) during the whole replace
>>>>>>> process, and it does in a more direct way what you actually want. So in
>>>>>>> total it is mostly a much faster device replace/upgrade than the
>>>>>>> add+delete method, and raid1 redundancy stays active the whole time. Of
>>>>>>> course, it means first making sure the system runs an up-to-date/latest
>>>>>>> kernel+tools.
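>>>>>>>
>>>>>>> A rough sketch of the offline dd variant (assumptions: the filesystem
>>>>>>> is unmounted, and the old disk is disconnected before the next mount,
>>>>>>> since both copies carry the same filesystem UUID; device paths are
>>>>>>> illustrative):
>>>>>>>
>>>>>>>   # Clone the old device onto the new, larger one
>>>>>>>   dd if=/dev/sd_old of=/dev/sd_new bs=64M status=progress
>>>>>>>
>>>>>>>   # After removing the old disk and mounting again, grow the device
>>>>>>>   btrfs filesystem resize <devID>:max /mnt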
>>>>>>
>>>>>>
>>>>>
>>>>
>>
> 
