On Tue, Nov 19, 2013 at 11:16:58PM +0000, Duncan wrote:
> Hugo Mills posted on Tue, 19 Nov 2013 09:06:02 +0000 as excerpted:
>
> > This will happen with RAID-10. The allocator will write stripes as wide
> > as it can: in this case, the first stripes will run across all 8
> > devices, until the SSDs are full, and then will write across the
> > remaining 4 devices.
>
> Hugo, it doesn't change the outcome for this case, but either your
> assertion above is incorrect, or the wiki discussion is incorrect (of
> course, or possibly I'm the one misunderstanding something, in which case
> hopefully replies to this will correct my understanding).
>
> Because I distinctly recall reading on the wiki that for raid, regardless
> of the raid level, btrfs always allocates in pairs (well, I guess it'd be
> pairs of pairs for raid10 mode, and I believe that statement pre-dated
> raid5/6 support so that isn't included). I was actually shocked by that
> because while I knew that was the case for raid1, I had thought that
> other raid levels would stripe as widely as possible, which is what you
> assert above as well.

   That's incorrect. I used to think that, a few years ago, and it got
into at least one piece of documentation as a result, but once I
worked out the actual behaviour, I did try to correct it (I definitely
remember fixing the sysadmin guide this way).

   For striped levels (RAID-0, 10, 5, 6), the FS will use as many
stripes as possible -- for RAID-10, this means an even number; for the
others, this is all the devices with free space on, down to a
RAID-level dependent minimum.

RAID-0:  min 2 devices
RAID-10: min 4 devices
RAID-5:  min 2 devices (I think)
RAID-6:  min 3 devices (I think)

> Now I just have to find where I read that on the wiki...
>
> OK, here's one spot, FAQ, md-raid/device-mapper-raid/btrfs-raid
> differences, btrfs:
>
> https://btrfs.wiki.kernel.org/index.php/FAQ#btrfs
>
> >>>>
>
> btrfs combines all the drives into a storage pool first, and then
> duplicates the chunks as file data is created. RAID-1 is defined
> currently as "2 copies of all the data on different disks". This differs
> from MD-RAID and dmraid, in that those make exactly n copies for n disks.
> In a btrfs RAID-1 on 3 1TB drives we get 1.5TB of usable data. Because
> each block is only copied to 2 drives, writing a given block only
> requires exactly 2 drives spin up, reading requires only 1 drive to
> spinup.

   This is correct.

> RAID-0 is similarly defined, with the stripe split among exactly 2 disks.
> 3 1TB drives yield 3TB usable space, but to read a given stripe only
> requires 2 disks.

   This is definitely wrong. RAID-0 will use all 3 drives for each
stripe.

> RAID-10 is built on top of these definitions. Every stripe is split
> across to exactly 2 RAID1 sets and those RAID1 sets are written to
> exactly 2 disk (hense 4 disk minimum). A btrfs raid-10 volume with 6 1TB
> drives will yield 3TB usable space with 2 copies of all data, but only 4

   This is also wrong. You will get 3 TB of usable space out of
6 × 1 TB drives, but the individual stripes will be 3 drives wide.
You would have the same behaviour (2 copies of 3 stripes wide) on a
7-device array.

> <<<<
>
> [Yes, that ending sentence is incomplete in the wiki.]
>
> So we have:
>
> 1) raid1 is exactly two copies of data, paired devices.
>
> 2) raid0 is a stripe exactly two devices wide (reinforced by to read a
> stripe takes only two devices), so again paired devices.
>
> 3) raid10 is a combination of the above raid0 and raid1 definitions,
> exactly two raid1 pairs, paired in raid0.
>
> So btrfs raid10 is pairs of pairs, each raid0 stripe a pair of raid1
> mirrors. If there's 8 devices, four smaller, four larger, the first
> allocated chunks should be one per device, until the smaller devices fill
> up it'll chunk across the remaining four, but it'll be pairs of pairs of
> pairs, two pair(0)-of-pair(1) stripes wide instead of a single quad(0)-of-
> pair(1) stripe wide.

   If the RAID code used pairs for its stripes, that'd be the case,
but it doesn't...

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
         --- emacs: Emacs Makes A Computer Slow. ---
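
   For anyone who wants to play with the widths described above, here
is a rough Python model of the rule (use every device with free space,
rounded down to an even count for RAID-10, subject to a per-level
minimum). It is only a sketch, not the kernel allocator: the names
MIN_DEVICES, stripe_devices and simulate are made up for illustration,
space is counted in abstract equal-sized units, the "fill the emptiest
devices first" ordering is an assumption, and the RAID-5/6 minimums
carry the same "I think" as in the list above.

```python
# Minimum device counts as listed in the message above. The RAID-5 and
# RAID-6 values were given with an "I think", so treat them as unconfirmed.
MIN_DEVICES = {"raid0": 2, "raid10": 4, "raid5": 2, "raid6": 3}


def stripe_devices(level, free_space, chunk=1):
    """Return how many devices one new chunk would span (0 if none fits).

    free_space is a list of per-device free space in abstract units;
    a device is usable if it can still hold one chunk-sized piece.
    """
    usable = sum(1 for free in free_space if free >= chunk)
    if level == "raid10":
        usable -= usable % 2  # RAID-10 needs an even number of devices
    return usable if usable >= MIN_DEVICES[level] else 0


def simulate(level, free_space, chunk=1):
    """Allocate chunks until none fits; return the device count of each."""
    free = list(free_space)
    widths = []
    while True:
        n = stripe_devices(level, free, chunk)
        if n == 0:
            return widths
        # Assumption for this sketch: prefer the devices with most free space.
        order = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        for i in order[:n]:
            free[i] -= chunk
        widths.append(n)
```

   With three devices, stripe_devices("raid0", [1, 1, 1]) gives 3, and
RAID-10 on either six or seven devices gives 6 (2 copies of 3 stripes).
For the 8-device case from the start of the thread, modelled as four
devices of 2 units and four of 4 units, simulate("raid10", [2]*4 + [4]*4)
gives [8, 8, 4, 4]: all 8 devices until the small ones are full, then
the remaining 4.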