Certainly. My apologies for not including them before. As described,
the disks are reasonably balanced -- not as full as the last time. As
such, it may be that a balance would (slowly) free up enough chunks to
get things going. And if I have to, I will partially convert to single
again. btrfs replace certainly seems like the most predictable and
simple path, but it will result in a strange distribution of the
chunks.
Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
        Total devices 3 FS bytes used 6.11TiB
        devid    1 size 3.62TiB used 3.47TiB path /dev/sdj2
        devid    2 size 3.64TiB used 3.49TiB path /dev/sda
        devid    3 size 5.43TiB used 5.28TiB path /dev/sdi2

Overall:
    Device size:                  12.70TiB
    Device allocated:             12.25TiB
    Device unallocated:          459.95GiB
    Device missing:                  0.00B
    Used:                         12.21TiB
    Free (estimated):            246.35GiB  (min: 246.35GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 1.32MiB)

Data,RAID1: Size:6.11TiB, Used:6.09TiB
   /dev/sda        3.48TiB
   /dev/sdi2       5.28TiB
   /dev/sdj2       3.46TiB

Metadata,RAID1: Size:14.00GiB, Used:12.38GiB
   /dev/sda        8.00GiB
   /dev/sdi2       7.00GiB
   /dev/sdj2      13.00GiB

System,RAID1: Size:32.00MiB, Used:888.00KiB
   /dev/sdi2      32.00MiB
   /dev/sdj2      32.00MiB

Unallocated:
   /dev/sda      153.02GiB
   /dev/sdi2     154.56GiB
   /dev/sdj2     152.36GiB

On Sat, May 26, 2018 at 7:16 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
>
> On 2018年05月27日 10:06, Brad Templeton wrote:
>> Thanks.  These are all things which take substantial fractions of a
>> day to try, unfortunately.
>
> Normally I would suggest just using a VM and several small disks
> (~10G), along with fallocate (the fastest way to use space), to get a
> basic view of the procedure.
>
>> Last time I ended up fixing it in a fairly kludged way, which was to
>> convert from raid-1 to single long enough to get enough single
>> blocks that when I converted back to raid-1 they got distributed to
>> the right drives.
>
> Yep, that's the ultimate one-size-fits-all solution.
> Also, this reminds me that we could do the RAID1->Single/DUP->Single
> downgrade in a much, much faster way.
> I think it's worth considering as a later enhancement.
>
>> But this is, aside from being a kludge, a procedure with some minor
>> risk.  Of course I am taking a backup first, but still...
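Qu's VM suggestion can also be tried without a VM, using loop devices.
A minimal sketch, assuming root, btrfs-progs, and an existing
/mnt/test mount point (the file names are illustrative):

```shell
# Stand-in "disks": three sparse 10G files attached as loop devices.
truncate -s 10G disk1.img disk2.img disk3.img
DEV1=$(losetup --find --show disk1.img)
DEV2=$(losetup --find --show disk2.img)
DEV3=$(losetup --find --show disk3.img)

# RAID1 for both data and metadata, as in the real array.
mkfs.btrfs -d raid1 -m raid1 "$DEV1" "$DEV2" "$DEV3"
mount "$DEV1" /mnt/test

# fallocate reserves extents without writing data, so the filesystem
# fills almost instantly -- handy for reproducing the "2 full drives"
# state before experimenting with add/replace/balance.
fallocate -l 8G /mnt/test/filler
```

Deleting the filler files and re-running fallocate makes it cheap to
iterate on the different upgrade strategies discussed below.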
>>
>> This strikes me as something that should be a fairly common event --
>> your raid is filling up, and so you expand it by replacing the
>> oldest and smallest drive with a new, much bigger one.  In the old
>> days of RAID you could not do that; you had to grow all drives at
>> the same time, and this is one of the ways that BTRFS is quite
>> superior.
>>
>> When I had MD raid, I went through a strange process of always
>> having a raid 5 that consisted of different sized drives.  The
>> raid-5 was based on the smallest of the 3 drives, and then the
>> larger ones had extra space which could either be in raid-1, or more
>> simply was in solo disk mode and used for less critical data (such
>> as backups and old archives).  Slowly, and in a messy way, each time
>> I replaced the smallest drive, I could then grow the raid 5.  Yuck.
>> BTRFS is so much better, except for this issue.
>>
>> So if somebody has a thought of a procedure that is fairly sure to
>> work and doesn't involve too many copying passes -- copying 4TB is
>> not a quick operation -- it would be much appreciated and might be a
>> good thing to add to a wiki page, which I would be happy to do.
>
> Anyway, "btrfs fi show" and "btrfs fi usage" output would help before
> any further advice from the community.
>
> Thanks,
> Qu
>
>>
>> On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>
>>>
>>> On 2018年05月27日 09:49, Brad Templeton wrote:
>>>> That is what did not work last time.
>>>>
>>>> I say I think there can be a "fix" because I hope the goal of
>>>> BTRFS raid is to be superior to traditional RAID -- that if one
>>>> replaces a drive and asks to balance, it figures out what needs to
>>>> be done to make that work.  I understand that the current balance
>>>> algorithm may have trouble with that.
>>>> In this situation, the ideal result would be for the system to
>>>> take the 3 drives (4TB and 6TB full, 8TB with 4TB free) and move
>>>> extents strictly from the 4TB and 6TB to the 8TB -- i.e. extents
>>>> which are currently on both the 4TB and 6TB -- by moving only one
>>>> copy.
>>>
>>> Btrfs can only do balance in chunk units.
>>> Thus btrfs can only do:
>>> 1) Create a new chunk
>>> 2) Copy the data
>>> 3) Remove the old chunk
>>>
>>> So it can't work the way you mentioned.
>>> But your purpose sounds pretty valid, and maybe we could enhance
>>> btrfs to do such a thing.
>>> (Currently only replace can behave like that.)
>>>
>>>> It is not strictly a "bug" in that the code is operating as
>>>> designed, but it is an undesired behavior.
>>>>
>>>> The problem is that the approach you describe did not work in the
>>>> prior upgrade.
>>>
>>> Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance?
>>> Before and after the balance, "btrfs fi usage" and "btrfs fi show"
>>> output would also help.
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>>>
>>>>>
>>>>> On 2018年05月27日 09:27, Brad Templeton wrote:
>>>>>> A few years ago, I encountered an issue (halfway between a bug
>>>>>> and a problem) when attempting to grow a fairly full BTRFS
>>>>>> 3-disk raid1.  The problem was that after replacing (by
>>>>>> add/delete) a small drive with a larger one, there were now 2
>>>>>> full drives and one new half-full one, and balance was not able
>>>>>> to correct this situation to produce the desired result: 3
>>>>>> drives, each with a roughly even amount of free space.  It can't
>>>>>> do it because the 2 smaller drives are full, and it doesn't
>>>>>> realize it could just move one of the copies of a block off a
>>>>>> smaller drive onto the larger drive to free space; it wants to
>>>>>> move them both, and there is nowhere to put them both.
>>>>>
>>>>> It's not that easy.
>>>>> For balance, btrfs must first find a large enough space to locate
>>>>> both copies, then copy the data.
>>>>> Otherwise, if power loss happens, it would cause data corruption.
>>>>>
>>>>> So in your case, btrfs can only find enough space for one copy,
>>>>> and is thus unable to relocate any chunk.
>>>>>
>>>>>>
>>>>>> I'm about to do it again, taking my nearly full array, which is
>>>>>> 4TB, 4TB, 6TB, and replacing one of the 4TB drives with an 8TB.
>>>>>> I don't want to repeat the very time-consuming situation, so I
>>>>>> wanted to find out if things were fixed now.  I am running
>>>>>> Xenial (kernel 4.4.0) and could consider the upgrade to bionic
>>>>>> (4.15), though that adds a lot more to my plate before a long
>>>>>> trip and I would prefer to avoid it if I can.
>>>>>
>>>>> Since there is nothing to fix, the behavior will not change at
>>>>> all.
>>>>>
>>>>>>
>>>>>> So what is the best strategy:
>>>>>>
>>>>>> a) Replace 4TB with 8TB, resize up and balance?  (This is the
>>>>>> "basic" strategy)
>>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some
>>>>>> blocks from the 4TB, but possibly not enough)
>>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with
>>>>>> the recently vacated 6TB -- a much longer procedure, but
>>>>>> possibly better
>>>>>>
>>>>>> Or has this all been fixed, so that method A will work fine and
>>>>>> get to the ideal goal -- 3 drives, with available space suitably
>>>>>> distributed to allow full utilization over time?
>>>>>
>>>>> The btrfs chunk allocator has been trying to utilize all drives
>>>>> for a long, long time.
>>>>> When allocating chunks, btrfs will choose the device with the
>>>>> most free space.  However, the nature of RAID1 requires btrfs to
>>>>> allocate extents from 2 different devices, which makes your
>>>>> replaced 4/4/6 a little complex.
>>>>> (If your 4/4/6 array were set up and then filled to the current
>>>>> stage, btrfs should be able to utilize all the space.)
>>>>>
>>>>>
>>>>> Personally speaking, if you're confident enough, just add a new
>>>>> device and then do a balance.
>>>>> If enough chunks get balanced, there should be enough space freed
>>>>> on the existing disks.
>>>>> Then remove the newly added device, and btrfs should handle the
>>>>> remaining space well.
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>>
>>>>>> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <brad...@gmail.com> wrote:
>>>>>>> A few years ago, I encountered an issue (halfway between a bug
>>>>>>> and a problem) when attempting to grow a fairly full BTRFS
>>>>>>> 3-disk raid1.  The problem was that after replacing (by
>>>>>>> add/delete) a small drive with a larger one, there were now 2
>>>>>>> full drives and one new half-full one, and balance was not able
>>>>>>> to correct this situation to produce the desired result: 3
>>>>>>> drives, each with a roughly even amount of free space.  It
>>>>>>> can't do it because the 2 smaller drives are full, and it
>>>>>>> doesn't realize it could just move one of the copies of a block
>>>>>>> off a smaller drive onto the larger drive to free space; it
>>>>>>> wants to move them both, and there is nowhere to put them both.
>>>>>>>
>>>>>>> I'm about to do it again, taking my nearly full array, which is
>>>>>>> 4TB, 4TB, 6TB, and replacing one of the 4TB drives with an 8TB.
>>>>>>> I don't want to repeat the very time-consuming situation, so I
>>>>>>> wanted to find out if things were fixed now.  I am running
>>>>>>> Xenial (kernel 4.4.0) and could consider the upgrade to bionic
>>>>>>> (4.15), though that adds a lot more to my plate before a long
>>>>>>> trip and I would prefer to avoid it if I can.
>>>>>>>
>>>>>>> So what is the best strategy:
>>>>>>>
>>>>>>> a) Replace 4TB with 8TB, resize up and balance?  (This is the
>>>>>>> "basic" strategy)
>>>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some
>>>>>>> blocks from the 4TB, but possibly not enough)
>>>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with
>>>>>>> the recently vacated 6TB -- a much longer procedure, but
>>>>>>> possibly better
>>>>>>>
>>>>>>> Or has this all been fixed, so that method A will work fine and
>>>>>>> get to the ideal goal -- 3 drives, with available space
>>>>>>> suitably distributed to allow full utilization over time?
>>>>>>>
>>>>>>> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist
>>>>>>>> <patrik.lundqu...@gmail.com> wrote:
>>>>>>>>> On 23 March 2016 at 20:33, Chris Murphy <li...@colorremedies.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <brad...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I am surprised to hear it said that having the mixed sizes
>>>>>>>>>>> is an odd case.
>>>>>>>>>>
>>>>>>>>>> Not odd as in wrong, just uncommon compared to other
>>>>>>>>>> arrangements being tested.
>>>>>>>>>
>>>>>>>>> I think mixed drive sizes in raid1 is a killer feature for a
>>>>>>>>> home NAS, where you replace an old, smaller drive with the
>>>>>>>>> latest and largest when you need more storage.
>>>>>>>>>
>>>>>>>>> My raid1 currently consists of 6TB+3TB+3*2TB.
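Qu's description of the allocator earlier in the thread (each RAID1
chunk goes on the two devices with the most unallocated space) also
explains both outcomes above. The following is a toy model, not btrfs
code -- the `simulate` function and the 1GiB chunk granularity are
illustrative:

```shell
#!/usr/bin/env bash
# Toy model of the RAID1 chunk allocator: each 1GiB chunk is placed on
# the two devices with the most unallocated space.  Arguments are
# per-device free space in GiB; prints how many chunks fit.
simulate() {
  local -a free=("$@")
  local chunks=0 i a b
  while :; do
    a=-1; b=-1    # indices of the largest and second-largest
    for i in "${!free[@]}"; do
      if [ "$a" -lt 0 ] || [ "${free[i]}" -gt "${free[a]}" ]; then
        b=$a; a=$i
      elif [ "$b" -lt 0 ] || [ "${free[i]}" -gt "${free[b]}" ]; then
        b=$i
      fi
    done
    # Stop when no two devices both have a free GiB left.
    if [ "${free[b]}" -lt 1 ]; then
      echo "$chunks"
      return
    fi
    free[a]=$((free[a] - 1))
    free[b]=$((free[b] - 1))
    chunks=$((chunks + 1))
  done
}

simulate 4096 4096 6144   # fresh 4TB+4TB+6TB: prints 7168 (7 TiB usable)
simulate 0 0 8192         # two full drives plus an empty 8TB: prints 0
```

The second case is Brad's situation: with only one device holding free
space, no RAID1 chunk can be allocated at all, which is why neither
new writes nor balance can make progress until space is freed on a
second device.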
>>>>>>>>
>>>>>>>> For the original OP situation, with chunks all filled up with
>>>>>>>> extents and devices all filled up with chunks, 'integrating' a
>>>>>>>> new 6TB drive into a 4TB+3TB+2TB raid1 array could probably be
>>>>>>>> done in a somewhat unusual way in order to avoid immediate
>>>>>>>> balancing needs:
>>>>>>>> - 'plug in' the 6TB
>>>>>>>> - btrfs replace the 4TB with the 6TB
>>>>>>>> - btrfs fi resize 6TB_devid:max
>>>>>>>> - btrfs replace the 2TB with the 4TB
>>>>>>>> - btrfs fi resize 4TB_devid:max
>>>>>>>> - 'unplug' the 2TB
>>>>>>>>
>>>>>>>> Then there would be 2 devices with roughly 2TB of space
>>>>>>>> available, so good for continued btrfs raid1 writes.
>>>>>>>>
>>>>>>>> An offline variant with dd instead of btrfs replace could also
>>>>>>>> be done (I used to do that sometimes when btrfs replace was
>>>>>>>> not yet implemented).  My experience is that btrfs replace
>>>>>>>> runs at roughly maximum speed (i.e. hard disk magnetic media
>>>>>>>> transfer speed) during the whole replace process, and it does
>>>>>>>> in a more direct way what you actually want.  So in total it
>>>>>>>> is usually far faster for a device replace/upgrade than the
>>>>>>>> add+delete method.  And raid1 redundancy is active all the
>>>>>>>> time.  Of course, it means first making sure the system runs
>>>>>>>> an up-to-date/latest kernel+tools.
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>> the body of a message to majord...@vger.kernel.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
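Henk's steps map onto concrete commands roughly as follows. This is a
sketch only: the device paths, devids, and mount point are
placeholders, and the real devids should be read from
"btrfs fi show" first.

```shell
# 1. Replace the 4TB with the newly plugged-in 6TB; raid1 redundancy
#    stays active for the whole copy.
btrfs replace start /dev/disk_4tb /dev/disk_6tb /mnt

# 2. The new device starts at the old 4TB size; grow it to the full
#    6TB (the devid is the one reported for the new drive).
btrfs fi resize <devid_of_6tb>:max /mnt

# 3. Replace the 2TB with the now-vacated 4TB drive, and grow again.
btrfs replace start /dev/disk_2tb /dev/disk_4tb /mnt
btrfs fi resize <devid_of_4tb>:max /mnt

# 4. The 2TB drive is no longer part of the array and can be unplugged.
```

Note that "btrfs replace start" requires the target device to be at
least as large as the source, which is why the chain runs largest
drive first, and the resize step is easy to forget: until it is run,
the filesystem only uses as much of the new drive as the old one had.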