On 30.11.2016 00:49, Chris Murphy wrote:
> On Tue, Nov 29, 2016 at 4:16 PM, Wilson Meier <wilson.me...@gmail.com> wrote:
>>
>>
>> On 29.11.2016 23:52, Chris Murphy wrote:
>>> On Tue, Nov 29, 2016 at 3:34 PM, Wilson Meier <wilson.me...@gmail.com> 
>>> wrote:
>>>> On 29.11.2016 18:54, Austin S. Hemmelgarn wrote:
>>>>> On 2016-11-29 12:20, Florian Lindner wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I have 4 harddisks with 3TB capacity each. They are all used in a
>>>>>> btrfs RAID 5. It has come to my attention that there
>>>>>> seem to be major flaws in btrfs' raid 5 implementation. Because of
>>>>>> that, I want to convert the raid 5 to a raid 10,
>>>>>> and I have several questions.
>>>>>>
>>>>>> * Is that possible as an online conversion?
>>>>> Yes, as long as you have a complete array to begin with (converting from
>>>>> a degraded raid5/6 array has the same issues as rebuilding a degraded
>>>>> raid5/6 array).
>>>>>>
>>>>>> * Since my effective capacity will shrink during conversions, does
>>>>>> btrfs check if there is enough free capacity to
>>>>>> convert? As you see below, right now it's probably too full, but I'm
>>>>>> going to delete some stuff.
>>>>> No, you'll have to do the math yourself.  This would be a great project
>>>>> idea to place on the wiki though.
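(For what it's worth, a quick way to sanity-check the free space before
converting is something like:

# btrfs filesystem usage /mnt

assuming the filesystem is mounted at /mnt. With 4 x 3TB devices, raid10
leaves roughly 6TB usable (two copies of everything), so the data currently
on the raid5 has to fit below that before starting the conversion.)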
>>>>>>
>>>>>> * I understand the command to convert is
>>>>>>
>>>>>> btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt
>>>>>>
>>>>>> Correct?
>>>>> Yes, but I would personally convert the metadata first, then the data.  The
>>>>> raid10 profile gets better performance than raid5, so converting the
>>>>> metadata first (by issuing a balance just covering the metadata) should
>>>>> speed up the data conversion a bit.
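(For the record, the two-step approach suggested here would presumably look
something like:

# btrfs balance start -mconvert=raid10 /mnt
# btrfs balance start -dconvert=raid10 /mnt

i.e. a metadata-only balance first, followed by a data-only one, instead of
the combined command above.)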
>>>>>>
>>>>>> * What disks are allowed to fail? My understanding of a raid 10 is
>>>>>> like that
>>>>>>
>>>>>> disks = {a, b, c, d}
>>>>>>
>>>>>> raid0( raid1(a, b), raid1(c, d) )
>>>>>>
>>>>>> This way (a XOR b) AND (c XOR d) are allowed to fail without the raid
>>>>>> failing (either a or b, and either c or d, may fail)
>>>>>>
>>>>>> How is that with a btrfs raid 10?
>>>>> A BTRFS raid10 can only sustain one disk failure.  Ideally, it would
>>>>> work like you show, but in practice it doesn't.
>>>> I'm a little bit concerned right now. I migrated my 4 disk raid6 to
>>>> raid10 because of the known raid5/6 problems. I assumed that btrfs
>>>> raid10 can handle 2 disk failures as long as they occur in different
>>>> stripes.
>>>> Could you please point out why it cannot sustain 2 disk failures?
>>>
>>> Conventional raid10 has a fixed assignment of which drives are
>>> mirrored pairs, and this doesn't happen with Btrfs at the device level
>>> but rather the chunk level. And a chunk stripe number is not fixed to
>>> a particular device, therefore it's possible a device will have more
>>> than one chunk stripe number. So what that means is the loss of two
>>> devices has a pretty decent chance of resulting in the loss of both
>>> copies of a chunk, whereas conventional RAID 10 only loses data when
>>> both members of the same mirrored pair fail.
>>>
>>> With very cursory testing what I've found is btrfs-progs establishes
>>> an initial stripe number to device mapping that's different than the
>>> kernel code. The kernel code appears to be pretty consistent so long
>>> as the member devices are identically sized. So it's probably not an
>>> unfixable problem, but the effect is that right now Btrfs raid10
>>> profile is more like raid0+1.
>>>
>>> You can use
>>> $ sudo btrfs insp dump-tr -t 3 /dev/
>>>
>>> That will dump the chunk tree, and you can see if any device has more
>>> than one chunk stripe number associated with it.
>>>
>>>
>> Huh, that makes sense. That probably should be fixed :)
>>
>> Given your advised command (extended it a bit for readability):
>> # btrfs insp dump-tr -t 3 /dev/mapper/luks-2.1 | grep "stripe " | awk '{
>> print $1" "$2" "$3" "$4 }' | sort -u
>>
>> I get:
>> stripe 0 devid 1
>> stripe 0 devid 4
>> stripe 1 devid 2
>> stripe 1 devid 3
>> stripe 1 devid 4
>> stripe 2 devid 1
>> stripe 2 devid 2
>> stripe 2 devid 3
>> stripe 3 devid 1
>> stripe 3 devid 2
>> stripe 3 devid 3
>> stripe 3 devid 4
>>
>> Now I'm even more concerned!
> 
> Uhh yeah, this is a four device raid10? I'm a little confused why it's
> not consistently showing four stripes per chunk, which would mean the
> same number of stripe 0's as stripe 3's. I don't know what that's
> about.
> 
Yes, 4 devices. It does show 4 stripes per chunk, but the command above
sorts the results and removes duplicates (sort -u). This gives a quick
overview of which devices carry more than one stripe number.
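For a rough per-device count (same dump-tree output, just counting instead
of deduplicating), something like this should also work:

# btrfs insp dump-tr -t 3 /dev/mapper/luks-2.1 | grep "stripe " | awk '{
print $3" "$4 }' | sort | uniq -c

which shows how many chunk stripes land on each devid.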

> A full balance might make the mapping consistent.
>
Will give it a try.
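I assume that means a plain, unfiltered balance, something like:

# btrfs balance start --full-balance /mnt

with /mnt standing in for the actual mount point.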

>> That said, btrfs shouldn't be used with anything other than raid1, as
>> every other raid level has serious problems or at least doesn't behave
>> like the expected raid level (in terms of failure recovery).
> 
> Well, raid1 also only tolerates a single device failure.
> There is no n-way raid1.
> 
Sure, but this is the expected behaviour of raid1. So at least no
surprise here :)


