On 29.11.2016 23:52, Chris Murphy wrote:
> On Tue, Nov 29, 2016 at 3:34 PM, Wilson Meier <wilson.me...@gmail.com> wrote:
>> On 29.11.2016 18:54, Austin S. Hemmelgarn wrote:
>>> On 2016-11-29 12:20, Florian Lindner wrote:
>>>> Hello,
>>>>
>>>> I have 4 hard disks with 3TB capacity each. They are all used in a
>>>> btrfs RAID 5. It has come to my attention that there
>>>> seem to be major flaws in btrfs' raid 5 implementation. Because of
>>>> that, I want to convert the raid 5 to a raid 10
>>>> and I have several questions.
>>>>
>>>> * Is that possible as an online conversion?
>>> Yes, as long as you have a complete array to begin with (converting from
>>> a degraded raid5/6 array has the same issues as rebuilding a degraded
>>> raid5/6 array).
>>>>
>>>> * Since my effective capacity will shrink during the conversion, does
>>>> btrfs check if there is enough free capacity to
>>>> convert? As you can see below, right now it's probably too full, but
>>>> I'm going to delete some stuff.
>>> No, you'll have to do the math yourself.  This would be a great project
>>> idea to place on the wiki though.
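
(For the record, the back-of-the-envelope numbers for Florian's box:
4 x 3TB under raid5 gives roughly (4 - 1) x 3TB = 9TB of usable space,
while raid10 gives (4 x 3TB) / 2 = 6TB, so data plus metadata has to
fit comfortably under ~6TB before the conversion can complete. A quick
way to check the current usage, assuming a reasonably recent
btrfs-progs:

$ sudo btrfs filesystem usage /mnt

and then compare the "Used" figure against the 6TB raid10 ceiling.)
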
>>>>
>>>> * I understand the command to convert is
>>>>
>>>> btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt
>>>>
>>>> Correct?
>>> Yes, but I would personally convert the metadata first, then the data.
>>> The raid10 profile gets better performance than raid5, so converting
>>> the metadata first (by issuing a balance covering just the metadata)
>>> should speed up the data conversion a bit.
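
(Spelled out, and assuming /mnt is the mount point from the original
mail, that two-step conversion would look something like:

$ sudo btrfs balance start -mconvert=raid10 /mnt
$ sudo btrfs balance start -dconvert=raid10 /mnt

with "btrfs balance status /mnt" in another shell to watch progress.)
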
>>>>
>>>> * Which disks are allowed to fail? My understanding of a raid 10 is
>>>> like this:
>>>>
>>>> disks = {a, b, c, d}
>>>>
>>>> raid0( raid1(a, b), raid1(c, d) )
>>>>
>>>> This way (a XOR b) AND (c XOR d) are allowed to fail without the raid
>>>> failing (either a or b, and c or d, may fail at the same time).
>>>>
>>>> How is that with a btrfs raid 10?
>>> A BTRFS raid10 can only sustain one disk failure.  Ideally, it would
>>> work like you show, but in practice it doesn't.
>> I'm a little bit concerned right now. I migrated my 4 disk raid6 to
>> raid10 because of the known raid5/6 problems. I assumed that btrfs
>> raid10 can handle 2 disk failures as long as they occur in different
>> stripes.
>> Could you please point out why it cannot sustain 2 disk failures?
> 
> Conventional raid10 has a fixed assignment of which drives are
> mirrored pairs; with Btrfs this pairing doesn't happen at the device
> level but rather at the chunk level. A chunk stripe number is not
> fixed to a particular device, so it's possible for a device to carry
> more than one chunk stripe number. What that means is that the loss
> of two devices has a pretty decent chance of destroying both copies
> of some chunk, whereas conventional RAID 10 only loses data when both
> drives of the same mirrored pair fail.
> 
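(To make that concrete with a made-up chunk layout: if chunk A has its
two copies on devids 1 and 2, and chunk B has its copies on devids 1
and 3, then losing devids 2 and 3 is survivable, but losing devid 1
together with either of the others destroys both copies of one chunk.)
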
> With very cursory testing, what I've found is that btrfs-progs
> establishes an initial stripe-number-to-device mapping that differs
> from the kernel code's. The kernel code appears to be pretty
> consistent as long as the member devices are identically sized. So
> it's probably not an unfixable problem, but the effect is that right
> now the Btrfs raid10 profile behaves more like raid0+1.
> 
> You can use
> $ sudo btrfs insp dump-tr -t 3 /dev/
> 
> That will dump the chunk tree, and you can see if any device has more
> than one chunk stripe number associated with it.
> 
> 
Huh, that makes sense. That probably should be fixed :)

Given your advised command (extended a bit for readability):
# btrfs insp dump-tr -t 3 /dev/mapper/luks-2.1 | grep "stripe " \
    | awk '{ print $1" "$2" "$3" "$4 }' | sort -u

I get:
stripe 0 devid 1
stripe 0 devid 4
stripe 1 devid 2
stripe 1 devid 3
stripe 1 devid 4
stripe 2 devid 1
stripe 2 devid 2
stripe 2 devid 3
stripe 3 devid 1
stripe 3 devid 2
stripe 3 devid 3
stripe 3 devid 4

Now I'm even more concerned!
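
For anyone who wants to repeat the check, a slightly more direct
variant (untested sketch, same luks device path as above) that counts
how many distinct stripe numbers each devid appears under:

# btrfs insp dump-tr -t 3 /dev/mapper/luks-2.1 | grep "stripe " \
    | awk '{ print $4, $2 }' | sort -u | awk '{ print $1 }' | uniq -c

Any count greater than 1 means that device holds chunk copies under
more than one stripe number.
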
That said, btrfs shouldn't be used with anything other than raid1, as
every other raid level either has serious problems or at least doesn't
behave like the conventional raid level of the same name (in terms of
failure recovery).
