On 2014-07-09 22:10, Russell Coker wrote:
> On Wed, 9 Jul 2014 16:48:05 Martin Steigerwald wrote:
>>> - for someone using SAS or enterprise SATA drives with Linux, I
>>> understand btrfs gives the extra benefit of checksums, are there any
>>> other specific benefits over using mdadm or dmraid?
>>
>> I think I can answer this one.
>>
>> The most important advantage, I think, is that BTRFS is aware of which
>> blocks of the RAID are in use and need to be synced:
>>
>> - Instant initialization of RAID regardless of size (unless at some
>> capacity mkfs.btrfs needs more time)
> 
> From mdadm(8):
> 
>        --assume-clean
>               Tell mdadm that the array pre-existed and is known to be
>               clean.  It can be useful when trying to recover from a major
>               failure as you can be sure that no data will be affected
>               unless you actually write to the array.  It can also be used
>               when creating a RAID1 or RAID10 if you want to avoid the
>               initial resync, however this practice - while normally safe -
>               is not recommended.  Use this only if you really know what
>               you are doing.
> 
>               When the devices that will be part of a new array were filled
>               with zeros before creation the operator knows the array is
>               actually clean.  If that is the case, such as after running
>               badblocks, this argument can be used to tell mdadm the facts
>               the operator knows.
> 
> While it might be regarded as a hack, it is possible to do a fairly instant 
> initialisation of a Linux software RAID-1.
>
This has the notable disadvantage, however, that the first scrub you run
will essentially perform a full resync if you didn't make sure that the
disks had identical data to begin with.
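For reference, the "instant" creation and the follow-up scrub look roughly
like this (device names are placeholders, and --assume-clean is only safe
if the members really do hold identical or zeroed data):

    # Create a RAID-1 without the initial resync:
    mdadm --create /dev/md0 --level=1 --raid-devices=2 --assume-clean \
        /dev/sdX1 /dev/sdY1

    # The first repair pass will then read everything and rewrite any
    # mismatched regions, which on non-identical disks amounts to a
    # full resync:
    echo repair > /sys/block/md0/md/sync_action
    cat /proc/mdstat    # watch the resync progress
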
>> - Rebuild after disk failure or disk replace will only copy *used* blocks
> 
> Have you done any benchmarks on this?  The down-side of copying used
> blocks is that you first need to discover which blocks are used.  Given
> that seek time is a major bottleneck, past a certain proportion of used
> space it will be faster to just copy the entire disk.
> 
> I haven't done any tests on BTRFS in this regard, but I've seen a disk 
> replacement on ZFS run significantly slower than a dd of the block device 
> would.
> 
First of all, this isn't really a good comparison, for two reasons:
1. EVERYTHING on ZFS (or any filesystem that tries to do that much work)
is slower than a dd of the raw block device.
2. Even if the throughput is lower, it's only really an issue if the disk
is more than half full, because the unused blocks are never copied (see
the btrfs replace sketch below).
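On that note, a btrfs disk replacement is a single command and only has
to copy the allocated chunks rather than the whole device; something like
the following (device names and mount point are placeholders):

    # Replace a failing or missing disk in place; only allocated chunks
    # are read and copied, not the raw block device:
    btrfs replace start /dev/sdOLD /dev/sdNEW /mnt
    btrfs replace status /mnt
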

Also, while it isn't really a recovery situation, I recently upgraded
from a BTRFS RAID1 setup on two 1TB disks to a BTRFS RAID10 setup on
four 1TB disks, and the performance of the re-balance really wasn't all
that bad.  I have maybe 100GB of actual data, so the array started out
roughly 10% full, and the re-balance only took about 2 minutes.  Of
course, it probably helps that I make a point of keeping my filesystems
defragmented, scrubbing and balancing regularly, and not using a lot of
sub-volumes or snapshots, so the filesystem in question is not too
different from what it would have looked like if I had just wiped the FS
and restored from a backup.
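For anyone wanting to do the same, the conversion boils down to something
like this (mount point and device names are illustrative):

    # Add the two new disks, then convert data and metadata to RAID10:
    btrfs device add /dev/sdc /dev/sdd /mnt
    btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt

    # Confirm the new profiles afterwards:
    btrfs filesystem df /mnt
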
>> Scrubbing can repair from the good disk if the RAID has redundancy, but
>> SoftRAID should be able to do this as well. But also for scrubbing:
>> BTRFS only checks and repairs used blocks.
> 
> When you scrub Linux Software RAID (and in fact pretty much every RAID) it 
> will only correct errors that the disks flag.  If a disk returns bad data and 
> says that it's good then the RAID scrub will happily copy the bad data over 
> the good data (for a RAID-1) or generate new valid parity blocks for bad data 
> (for RAID-5/6).
> 
> http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html
> 
> Page 12 of the above document says that "nearline" disks (i.e. the ones
> people like me can afford for home use) have a 0.466% incidence of
> returning bad data and claiming it's good in a year.  Currently I run
> about 20 such disks in a variety of servers, workstations, and laptops.
> Therefore the probability of having no such errors on all those disks in
> a year would be .99534^20=.91081.  The probability of having no such
> errors over a period of 10 years would be (.99534^20)^10=.39290, which
> means that over 10 years I should expect to have such errors, which is
> why BTRFS RAID-1 and DUP metadata on single disks are necessary features.
> 
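The difference shows up directly in the two scrub interfaces: md can only
count blocks whose copies disagree, while btrfs knows from its checksums
which copy is the good one and repairs from it (md0 and /mnt below are
placeholders):

    # md scrub: counts mismatches, but cannot tell which copy is bad
    echo check > /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt

    # btrfs scrub: verifies checksums and rewrites bad copies from good ones
    btrfs scrub start -Bd /mnt    # -B: run in foreground, -d: per-device stats
    btrfs device stats /mnt       # cumulative corruption / IO error counters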

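To generalise the back-of-the-envelope numbers above: with a per-disk
annual probability p of silently returning bad data, n disks, and t years,
the chance of never seeing such an error is

    P(no silent error) = (1 - p)^(n*t)
                       = (1 - 0.00466)^(20*10) ~= 0.393

so over that period there is roughly a 60% chance of hitting at least one
silently corrupted read somewhere in the fleet.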
