On Fri, May 02, 2014 at 01:21:50PM -0600, Chris Murphy wrote:
> 
> On May 2, 2014, at 2:23 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> > 
> > Something tells me btrfs replace (not device replace, simply replace) 
> > should be moved to btrfs device replace...
> 
> The syntax for "btrfs device" is different though; replace is like balance: 
> btrfs balance start and btrfs replace start. And you can also get a status on 
> it. We don't (yet) have options to stop, start, resume, which could maybe 
> come in handy for long rebuilds when a reboot is required (?), although maybe 
> that just gets handled automatically: set it to pause, then unmount, then 
> reboot, then mount and resume.
> 
> > Well, I'd say two copies if it's only two devices in the raid1... would 
> > be true raid1.  But if it's say four devices in the raid1, as is 
> > certainly possible with btrfs raid1, then if it's not mirrored 4-way 
> > across all devices, it's not true raid1, but rather some sort of hybrid 
> > raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
> > arranged that way, or some form that doesn't nicely fall into a well 
> > defined raid level categorization.
> 
> Well, md raid1 is always n-way. So if you use -n 3 and specify three devices, 
> you'll get 3-way mirroring (3 mirrors). But I don't know any hardware raid 
> that works this way. They all seem to treat raid 1 as strictly two devices. At 4 
> devices it's raid10, and only in pairs.
> 
> Btrfs raid1 with 3+ devices is unique as far as I can tell. It is something 
> like raid1 (2 copies) + linear/concat. But that allocation is round robin. I 
> don't read code but based on how a 3 disk raid1 volume grows VDI files as 
> it's filled it looks like 1GB chunks are copied like this
> 
> Disk1   Disk2   Disk3
> 134     124     235
> 679     578     689
> 
> So 1 through 9 each represent a 1GB chunk. Disk 1 and 2 each have a chunk 1; 
> disk 2 and 3 each have a chunk 2, and so on. Total of 9GB of data taking up 
> 18GB of space, 6GB on each drive. You can't do this with any other raid1 as 
> far as I know. You do definitely run out of space on one disk first though 
> because of uneven metadata to data chunk allocation.

   The algorithm is this: when the chunk allocator is asked for a block
group (a pair of chunks for RAID-1), it takes the number of chunks it
needs, each from a different device, choosing devices in order of most
free space first. So, with disks of size 8, 4, 4, you get:

Disk 1: 12345678
Disk 2: 1357
Disk 3: 2468

and with 8, 8, 4, you get:

Disk 1: 1234568A
Disk 2: 1234579A
Disk 3: 6789
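
   As a rough illustration of the rule described above, here is a
minimal Python sketch. It is not the kernel code: the function name
allocate_raid1 is made up for this example, sizes are counted in whole
chunks, and ties between devices with equal free space are broken in
favour of the lower-numbered device, which is purely an assumption.

```python
def allocate_raid1(sizes, copies=2):
    """Greedy sketch: each block group takes one chunk on each of the
    `copies` devices that currently have the most free space.

    `sizes` is a list of device sizes in chunks (hypothetical units).
    Returns, per device, the list of block-group numbers stored on it.
    """
    free = list(sizes)
    layout = [[] for _ in sizes]
    group = 0
    while True:
        # Devices that still have room, most free space first.
        # sorted() is stable, so ties go to the lower-numbered device
        # (an assumption made for this sketch only).
        candidates = sorted(
            (i for i, f in enumerate(free) if f > 0),
            key=lambda i: -free[i],
        )
        if len(candidates) < copies:
            break  # not enough distinct devices left for another group
        group += 1
        for i in candidates[:copies]:
            free[i] -= 1
            layout[i].append(group)
    return layout


if __name__ == "__main__":
    for n, groups in enumerate(allocate_raid1([8, 4, 4]), start=1):
        print(f"Disk {n}:", groups)
```

   With sizes 8, 4, 4 this gives the same layout as the first diagram.
With 8, 8, 4 every device still fills completely (ten block groups in
total), but the placement of the last few groups depends on how ties
are broken, so it need not match the second diagram chunk for chunk.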

   Hugo.

> Anyway I think we're off the rails with raid1 nomenclature as soon as we have 
> 3 devices. It's probably better to call it replication, with an assumed 
> default of 2 replicates unless otherwise specified.
> 
> There's definitely a benefit to a 3 device volume with 2 replicates, 
> efficiency wise. As soon as we go to four disks 2 replicates it makes more 
> sense to do raid10, although I haven't tested odd device raid10 setups so I'm 
> not sure what happens.
> 
> 
> Chris Murphy
> 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
               --- Prisoner unknown:  Return to Zenda. ---               
