On Mon, Jun 10, 2019 at 02:29:40PM +0200, David Sterba wrote:
> Hi,
> 
> this patchset brings the RAID1 with 3 and 4 copies as a separate
> feature as outlined in V1
> (https://lore.kernel.org/linux-btrfs/cover.1531503452.git.dste...@suse.com/).
> 
> This should help a bit in the raid56 situation, where the write hole
> hurts most for metadata and there is so far no block group profile
> that offers resistance against the loss of 2 devices.
> 
> I've gathered some feedback from knowledgeable people on IRC and the
> following setup is considered good enough (certainly better than what we
> have now):
> 
> - data: RAID6
> - metadata: RAID1C3
> 
> RAID1C3 and RAID6 have different characteristics in terms of space
> consumption and repair.
> 
> 
> Space consumption
> ~~~~~~~~~~~~~~~~~
> 
> * RAID6 expands metadata by a factor of N/(N-2), so with more devices
>   the parity overhead ratio (2/N) becomes small
> 
> * RAID1C3 will always consume 67% of the raw metadata chunk space for
>   redundancy, independent of the number of devices
> 
> The overall size of metadata is typically in the range of gigabytes to
> hundreds of gigabytes (depending on the use case), a rough estimate is
> 1%-10% of the filesystem. With larger filesystems the percentage is
> usually smaller.
> 
> So, for the 3-copy raid1 the cost of redundancy is better expressed as
> the absolute number of gigabytes "wasted" on redundancy than as the
> ratio, which does look scary compared to raid6.
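> 
> To make this concrete, a rough worked example (the 10 GiB of logical
> metadata is an illustrative number, not a measurement):
> 
>   6 devices, 10 GiB of metadata:
>     RAID6:   10 GiB * 6/4 = 15 GiB raw, parity is 2/6 = 33%
>     RAID1C3: 10 GiB * 3   = 30 GiB raw, redundancy is 2/3 = 67%
> 
>   The ratio is much worse for RAID1C3, but the absolute difference is
>   15 GiB, negligible on a multi-terabyte filesystem.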
> 
> 
> Repair
> ~~~~~~
> 
> RAID6 needs to read all remaining devices to reconstruct the missing
> data from the P and Q parity, whether 1 or 2 devices are missing.
> 
> RAID1C3 can utilize the independence of each copy and also the way
> RAID1 works in btrfs. In the scenario with 1 missing device, one of the
> 2 remaining correct copies is read and written to the replacement
> device.
> 
> Given how the 2-copy RAID1 works on btrfs, the block groups could be
> spread over several devices so the load during repair would be spread as
> well.
> 
> Additionally, device replace works sequentially and in big chunks so on
> a lightly used system the read pattern is seek-friendly.
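> 
> For reference, the repair itself is the usual device replace, eg.
> (the devid 2 and the device paths are only illustrative):
> 
>   $ mount -o degraded /dev/sda /mnt
>   $ btrfs replace start -B 2 /dev/sdd /mnt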
> 
> 
> Compatibility
> ~~~~~~~~~~~~~
> 
> The new block group types cost an incompatibility bit, so old kernels
> will refuse to mount a filesystem with the RAID1C3 feature, ie. with
> any chunk of the new type present.
> 
> To upgrade an existing filesystem, use the balance filters, eg. to
> convert metadata from RAID6:
> 
>   $ btrfs balance start -mconvert=raid1c3 /path
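> 
> The data profile can be converted in the same run, and the result can
> be verified afterwards (converting data to raid6 matches the setup
> discussed above, but is not required):
> 
>   $ btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /path
>   $ btrfs filesystem df /path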
> 
> 
> Merge target
> ~~~~~~~~~~~~
> 
> I'd like to push that to misc-next for wider testing and merge it to
> 5.3, unless something bad pops up. Given that the code changes are
> small, just new types with their constraints, and the rest is done by
> the generic code, I'm not expecting problems that can't be fixed before
> the full release.
> 
> 
> Testing so far
> ~~~~~~~~~~~~~~
> 
> * mkfs with the profiles
> * fstests (no specific tests, only check that it does not break)
> * profile conversions between single/raid1/raid5/raid1c3/raid6/raid1c4
>   with added devices where needed
> * scrub
> 
> TODO:
> 
> * 1 missing device followed by repair
> * 2 missing devices followed by repair

Unfortunately, neither of the two cases works as expected and I don't have
time to fix it before the 5.3 deadline. As the 3-copy profile is supposed to
be a replacement for raid6, I consider the lack of repair capability a show
stopper, so the main part of the patchset is postponed.

The test I did was something like this (a rough shell sketch follows below):

- create fs with 3 devices, raid1c3
- fill with some data
- unmount
- wipe 2nd device
- mount degraded
- replace missing
- remount read-write, write data to verify that it works
- unmount
- mount as usual   <-- here it fails and the device is still reported missing

The same happens for the 2 missing devices.
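
Roughly, as a shell sketch (the device names, mount point and data fill
are illustrative, not the exact test script):

  $ mkfs.btrfs -f -m raid1c3 -d raid1c3 /dev/sda /dev/sdb /dev/sdc
  $ mount /dev/sda /mnt; cp -a /usr/share/doc /mnt; umount /mnt
  $ wipefs -a /dev/sdb                        # wipe 2nd device
  $ mount -o degraded /dev/sda /mnt
  $ btrfs replace start -B 2 /dev/sdd /mnt    # replace missing devid 2
  $ mount -o remount,rw /mnt; touch /mnt/x    # verify writes work
  $ umount /mnt
  $ mount /dev/sda /mnt    # fails here, device still reported missing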
