--- Begin Message ---
Package: mdadm
Version: 3.2.5-5
Severity: wishlist
Tags: patch
Hi.
Attached are two patches against the FAQ.
Their headers describe what they change and why.
For the 2nd patch, I'd suggest against using the non-unicode
version.
Unicode is available pretty much everywhere by now... even on
the consoles... and this information is unlikely to be needed
on small embedded/rescue systems where one actually might
not have Unicode support.
Another patchset will probably follow.
Cheers,
Chris.
This contains many possible improvements and fixes.
Please carefully check them all, as in some cases I assumed
typos and swapped the meaning of something.
* Link to the RAW version of the FAQ in git, instead of the annotated one.
* At nearly all places, replaced "disk/drive/hdd" by "device", even in the
  quote from Doug Ledford which already said it's an adapted version.
  Reasons:
  * It doesn't need to be physical disks; any logical block device will do
    as well.
  * Nowadays SSD RAIDs aren't that uncommon, and SSDs are not disks anymore.
    Device is the better, more neutral term IMHO.
* RAID0 is not "pseudo-redundant" (it isn't redundant at all), it's rather a
"pseudo-RAID" as the R in RAID already implies the redundancy.
* Q1 is, AFAIU, not targeted at removing the superblock of /dev/mdX
(if that works at all, since /dev/mdX itself has no superblock), but rather
the superblocks of the underlying block devices.
* Q1: Improved the wording a bit, to point out that it's a problem when you
reuse a device with an mdadm superblock in another array.
* Q1: Add a warning why one should always use mdadm --zero-superblock to
erase it.
* Added the manpage section (e.g. mdadm(8)) at such places where I found it
missing.
* For the RAID levels, added links to the Wikipedia entries which give a lot
of details how they work.
The link to the "Non-standard_RAID_levels" contains info about the Linux
kernel's RAID10 layouts.
* Removed the notice about "(Thanks to Per Olofsson for clarifying this in
#493577)"... such things belong to the changelog.
* Made it easier to find the targets of e.g. "[0]" by changing it from "0."
to the usual "[0]" as well.
* Q5. How to convert RAID5 to RAID10
Seems to be specific to 3-device RAID5s, right? Mentioned that.
Added a note as well that with the one type of spare, a hot spare is
meant.
And it's probably the fourth disk, not the forth.
* The http://marc.theaimsgroup.com/ links all seemed to be dead. Searched
for the same entries on Gmane, which also has a nicer interface and
shows the message as part of the whole tree of the thread.
* Q9b: LVM doesn't need to span over devices, and as such doesn't need to
be "dangerous".
* Q11: newer kernels don't use /dev/hd* anymore... so I used the neutral
style of /dev/sdXY instead of a real example.
* Q11: Part b: Don't refer to the "above example", as there was really no
direct connection (it didn't mention /dev/hd[ab]* /dev/hdc[13] at all).
* Q19: Made it an example for physical non-hotswappable devices, as the
commands used won't work in many other cases.
Also use poweroff instead of halt, as halt might not power off, which
IS important when dealing with non-hotswappable devices.
* Q20: The one pair wouldn't be broken already (i.e. data loss) but only
degraded.
* Q26: More thoroughly explain what rootdelay actually does.
Especially (but not limited to), check these:
* Q7: 1,2 and 3,4 are each RAID1 pairs (and not RAID0)... and these are
combined into a RAID0 (not a RAID1).
* Q20: I think the ordering is in DEcreasing not INcreasing numbers?!
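By the way, the (n-2)/(n-1) survival chance cited in Q4b's footnote is easy to sanity-check by brute-force enumeration. A minimal sketch (Python; the mirror-pair layout mirrors the A,B / C,D example from Q4b, everything else is illustrative):

```python
from fractions import Fraction
from itertools import permutations

# Four-device RAID10 with near=2: two RAID1 mirror pairs striped by RAID0,
# as in the FAQ's A,B / C,D example.
pairs = [("A", "B"), ("C", "D")]
devices = [d for p in pairs for d in p]

def survives(first, second):
    # The array dies only if both failed devices form one mirror pair.
    return not any({first, second} == set(p) for p in pairs)

# Enumerate all ordered two-failure sequences and count the survivors.
cases = list(permutations(devices, 2))
prob = Fraction(sum(survives(a, b) for a, b in cases), len(cases))

n = len(devices)
assert prob == Fraction(n - 2, n - 1)  # 2/3 for four devices
```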
Index: x/FAQ
===================================================================
--- x.orig/FAQ 2013-07-04 16:59:15.212163773 +0200
+++ x/FAQ 2013-07-04 17:29:14.861181639 +0200
@@ -4,15 +4,16 @@
Also see /usr/share/doc/mdadm/README.recipes.gz .
The latest version of this FAQ is available here:
- http://git.debian.org/?p=pkg-mdadm/mdadm.git;a=blob;f=debian/FAQ;hb=HEAD
+ http://anonscm.debian.org/gitweb/?p=pkg-mdadm/mdadm.git;a=blob_plain;f=debian/FAQ;hb=HEAD
0. What does MD stand for?
~~~~~~~~~~~~~~~~~~~~~~~~~~
MD is an abbreviation for "multiple device" (also often called "multi-
disk"). The Linux MD implementation implements various strategies for
- combining multiple physical devices into single logical ones. The most
- common use case is commonly known as "Software RAID". Linux supports RAID
- levels 1, 4, 5, 6, and 10, as well as the "pseudo-redundant" RAID level 0.
+ combining multiple (typically but not necessarily physical) block devices
+ into single logical ones. The most common use case is commonly known as
+ "Software RAID". Linux supports RAID levels 1, 4, 5, 6 and 10 as well
+ as the "pseudo" RAID level 0.
In addition, the MD implementation covers linear and multipath
configurations.
@@ -21,18 +22,19 @@
1. How do I overwrite ("zero") the superblock?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- mdadm --zero-superblock /dev/mdX
+ mdadm --zero-superblock /dev/sdXY
Note that this is a destructive operation. It does not actually delete any
data, but the device will have lost its "authority". You cannot assemble the
- array with it anymore, and if you add the device to another array, the
+ array with it anymore and if you add the device to another array, the
synchronisation process *will* *overwrite* all data on the device.
Nevertheless, sometimes it is necessary to zero the superblock:
- - If you are reusing a disk that has been part of an array with an different
- superblock version and/or location. In this case you zero the superblock
- before you assemble the array, or add the device to an array.
+ - If you want to re-use a device (e.g. an HDD or SSD) that has been part of an
+ array (with a different superblock version and/or location) in another one.
+ In this case you zero the superblock before you assemble the array or add
+ the device to a new array.
- If you are trying to prevent a device from being recognised as part of an
array. Say for instance you are trying to change an array spanning sd[ab]1
@@ -42,9 +44,15 @@
this may not be what you want. Instead, zeroing the superblock will
(permanently) prevent a device from being considered as part of an array.
+ WARNING: Depending on which superblock version you use, it won't work to just
+ overwrite the first few MiBs of the block device with 0x0 (e.g. via
+ dd), since the superblock may be at other locations (especially the
+ end of the device).
+ Therefore always use mdadm --zero-superblock .
+
2. How do I change the preferred minor of an MD array (RAID)?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- See item 12 in /usr/share/doc/mdadm/README.recipes.gz and read the mdadm
+ See item 12 in /usr/share/doc/mdadm/README.recipes.gz and read the mdadm(8)
manpage (search for 'preferred').
3. How does mdadm determine which /dev/mdX or /dev/md/X to use?
@@ -105,72 +113,73 @@
4. Which RAID level should I use?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Please read /usr/share/doc/mdadm/RAID5_versus_RAID10.txt.gz .
+ Please read /usr/share/doc/mdadm/RAID5_versus_RAID10.txt.gz,
+ https://en.wikipedia.org/wiki/Standard_RAID_levels and perhaps even
+ https://en.wikipedia.org/wiki/Non-standard_RAID_levels .
Many people seem to prefer RAID4/5/6 because it makes more efficient use of
- space. For example, if you have disks of size X, then in order to get 2X
- storage, you need 3 disks for RAID5, but 4 if you use RAID10 or RAID1+0 (or
+ space. For example, if you have devices of size X, then in order to get 2X
+ storage, you need 3 devices for RAID5, but 4 if you use RAID10 or RAID1+0 (or
RAID6).
This gain in usable space comes at a price: performance; RAID1/10 can be up
to four times faster than RAID4/5/6.
At the same time, however, RAID4/5/6 provide somewhat better redundancy in
- the event of two failing disks. In a RAID10 configuration, if one disk is
- already dead, the RAID can only survive if any of the two disks in the other
- RAID1 array fails, but not if the second disk in the degraded RAID1 array
- fails (see next item, 4b). A RAID6 across four disks can cope with any two
- disks failing. However, RAID6 is noticeably slower than RAID5. RAID5 and
- RAID4 do not differ much, but can only handle single-disk failures.
+ the event of two failing devices. In a RAID10 configuration, if one device is
+ already dead, the RAID can only survive if any of the two devices in the other
+ RAID1 array fails, but not if the second device in the degraded RAID1 array
+ fails (see next item, 4b). A RAID6 across four devices can cope with any two
+ devices failing. However, RAID6 is noticeably slower than RAID5. RAID5 and
+ RAID4 do not differ much, but can only handle single-device failures.
- If you can afford the extra disks (storage *is* cheap these days), I suggest
+ If you can afford the extra devices (storage *is* cheap these days), I suggest
RAID1/10 over RAID4/5/6. If you don't care about performance but need as
much space as possible, go with RAID4/5/6, but make sure to have backups.
Heck, make sure to have backups whatever you do.
Let it be said, however, that I thoroughly regret putting my primary
- workstation on RAID5. Anything disk-intensive brings the system to its
+ workstation on RAID5. Anything device-intensive brings the system to its
knees; I will have to migrate to RAID10 at one point.
-4b. Can a 4-disk RAID10 survive two disk failures?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+4b. Can a 4-device RAID10 survive two device failures?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I am assuming that you are talking about a setup with two copies of each
block, so --layout=near2/far2/offset2:
In two thirds of the cases, yes[0], and it does not matter which layout you
- use. When you assemble 4 disks into a RAID10, you essentially stripe a RAID0
- across two RAID1, so the four disks A,B,C,D become two pairs: A,B and C,D.
- If A fails, the RAID10 can only survive if the second failing disk is either
- C or D; If B fails, your array is dead.
+ use. When you assemble 4 devices into a RAID10, you essentially stripe a RAID0
+ across two RAID1, so the four devices A,B,C,D become two pairs: A,B and C,D.
+ If A fails, the RAID10 can only survive if the second failing device is either
+ C or D; if B fails, your array is dead.
- Thus, if you see a disk failing, replace it as soon as possible!
+ Thus, if you see a device failing, replace it as soon as possible!
- If you need to handle two failing disks out of a set of four, you have to
+ If you need to handle two failing devices out of a set of four, you have to
use RAID6, or store more than two copies of each block (see the --layout
option in the mdadm(8) manpage).
See also question 18 further down.
- 0. it's actually (n-2)/(n-1), where n is the number of disks. I am not
- a mathematician, see http://aput.net/~jheiss/raid10/, which gives the
- chance of *failure* as 1/(n-1), so the chance of success is 1-1/(n-1), or
- (n-2)/(n-1), or 2/3 in the four disk example.
- (Thanks to Per Olofsson for clarifying this in #493577).
+ [0] It's actually (n-2)/(n-1), where n is the number of devices. I am not
+ a mathematician, see http://aput.net/~jheiss/raid10/, which gives the
+ chance of *failure* as 1/(n-1), so the chance of success is 1-1/(n-1), or
+ (n-2)/(n-1), or 2/3 in the four device example.
5. How to convert RAID5 to RAID10?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- To convert RAID5 to RAID10, you need a spare disk (either a spare, forth
- disk in the array, or a new one). Then you remove the spare and one of the
- three disks from the RAID5, create a degraded RAID10 across them, create
- the filesystem and copy the data (or do a raw copy), then add the other two
- disks to the new RAID10. However, mdadm cannot assemble a RAID10 with 50%
- missing devices the way you might like it:
+ To convert 3 device RAID5 to RAID10, you need a spare device (either a hot
+ spare, fourth device in the array, or a new one). Then you remove the spare
+ and one of the three devices from the RAID5, create a degraded RAID10 across
+ them, create the filesystem and copy the data (or do a raw copy), then add the
+ other two devices to the new RAID10. However, mdadm cannot assemble a RAID10
+ with 50% missing devices the way you might like it:
mdadm --create -l 10 -n4 -pn2 /dev/md1 /dev/sd[cd] missing missing
For reasons that may be answered by question 20 further down, mdadm actually
- cares about the order of devices you give it. If you intersperse the missing
- keywords with the physical drives, it should work:
+ cares about the order of devices you give it. If you intersperse the "missing"
+ keywords with the physical devices, it should work:
mdadm --create -l 10 -n4 -pn2 /dev/md1 /dev/sdc missing /dev/sdd missing
@@ -179,7 +188,7 @@
mdadm --create -l 10 -n4 -pn2 /dev/md1 missing /dev/sd[cd] missing
Also see item (4b) further up, and this thread:
- http://marc.theaimsgroup.com/?l=linux-raid&m=116004333406395&w=2
+ http://thread.gmane.org/gmane.linux.raid/13469/focus=13472
6. What is the difference between RAID1+0 and RAID10?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -189,8 +198,8 @@
The Linux kernel provides the RAID10 level to do pretty much exactly the
same for you, but with greater flexibility (and somewhat improved
- performance). While RAID1+0 makes sense with 4 disks, RAID10 can be
- configured to work with only 3 disks. Also, RAID10 has a little less
+ performance). While RAID1+0 makes sense with 4 devices, RAID10 can be
+ configured to work with only 3 devices. Also, RAID10 has a little less
overhead than RAID1+0, which has data pass the md layer twice.
I prefer RAID10 over RAID1+0.
@@ -203,40 +212,40 @@
The linux MD driver supports RAID10, which is equivalent to the above
RAID1+0 definition.
- RAID1+0/10 has a greater chance to survive two disk failures, its
+ RAID1+0/10 has a greater chance to survive two device failures, its
performance suffers less when in degraded state, and it resyncs faster after
- replacing a failed disk.
+ replacing a failed device.
See http://aput.net/~jheiss/raid10/ for more details.
7. Which RAID10 layout scheme should I use
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RAID10 gives you the choice between three ways of laying out the blocks on
- the disk. Assuming a simple 4 drive setup with 2 copies of each block, then
+ the device. Assuming a simple 4 device setup with 2 copies of each block, then
if A,B,C are data blocks, a,b their parts, and 1,2 denote their copies, the
- following would be a classic RAID1+0 where 1,2 and 3,4 are RAID0 pairs
- combined into a RAID1:
+ following would be a classic RAID1+0 where device1,2 and device3,4 are each
+ RAID1 pairs combined into a RAID0:
near=2 would be (this is the classic RAID1+0)
- hdd1 Aa1 Ba1 Ca1
- hdd2 Aa2 Ba2 Ca2
- hdd3 Ab1 Bb1 Cb1
- hdd4 Ab2 Bb2 Cb2
+ device1 Aa1 Ba1 Ca1
+ device2 Aa2 Ba2 Ca2
+ device3 Ab1 Bb1 Cb1
+ device4 Ab2 Bb2 Cb2
offset=2 would be
- hdd1 Aa1 Bb2 Ca1 Db2
- hdd2 Ab1 Aa2 Cb1 Ca2
- hdd3 Ba1 Ab2 Da1 Cb2
- hdd4 Bb1 Ba2 Db1 Da2
+ device1 Aa1 Bb2 Ca1 Db2
+ device2 Ab1 Aa2 Cb1 Ca2
+ device3 Ba1 Ab2 Da1 Cb2
+ device4 Bb1 Ba2 Db1 Da2
far=2 would be
- hdd1 Aa1 Ca1 .... Bb2 Db2
- hdd2 Ab1 Cb1 .... Aa2 Ca2
- hdd3 Ba1 Da1 .... Ab2 Cb2
- hdd4 Bb1 Db1 .... Ba2 Da2
+ device1 Aa1 Ca1 .... Bb2 Db2
+ device2 Ab1 Cb1 .... Aa2 Ca2
+ device3 Ba1 Da1 .... Ab2 Cb2
+ device4 Bb1 Db1 .... Ba2 Da2
Where the second set start half-way through the drives.
@@ -276,10 +285,10 @@
9b. Why not?
~~~~~~~~~~~~
- RAID0 has zero redundancy. If you stripe a RAID0 across X disks, you
+ RAID0 has zero redundancy. If you stripe a RAID0 across X devices, you
increase the likelyhood of complete loss of the filesystem by a factor of X.
- The same applies to LVM by the way.
+ The same applies to LVM by the way (when LVs are placed over X PVs).
If you want/must used LVM or RAID0, stripe it across RAID1 arrays
(RAID10/RAID1+0, or LVM on RAID1), and keep backups!
@@ -291,9 +300,9 @@
11. mdadm warns about duplicate/similar superblocks; what gives?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In certain configurations, especially if your last partition extends all the
- way to the end of the disk, mdadm may display a warning like:
+ way to the end of the device, mdadm may display a warning like:
- mdadm: WARNING /dev/hdc3 and /dev/hdc appear to have very similar
+ mdadm: WARNING /dev/sdXY and /dev/sdX appear to have very similar
superblocks. If they are really different, please --zero the superblock on
one. If they are the same or overlap, please remove one from the DEVICE
list in mdadm.conf.
@@ -305,10 +314,13 @@
existing arrays.
(b) instead of 'DEVICE partitions', list exactly those devices that are
- components of MD arrays on your system. So in the above example:
+ components of MD arrays on your system. So instead of:
+
+ DEVICE partitions
- - DEVICE partitions
- + DEVICE /dev/hd[ab]* /dev/hdc[123]
+ for example:
+
+ DEVICE /dev/sd[ab]* /dev/sdc[123]
12. mdadm -E / mkconf report different arrays with the same device
name / minor number. What gives?
@@ -352,7 +364,7 @@
permission):
First, not all MD types make sense to be split up, e.g. multipath. For
- those types, when a disk fails, the *entire* disk is considered to have
+ those types, when a device fails, the *entire* device is considered to have
failed, but with different arrays you won't switch over to the next path
until each MD array has attempted to access the bad path. This can have
obvious bad consequences for certain array types that do automatic
@@ -362,38 +374,38 @@
array stayed on the old path because it didn't send any commands during
the path down time period).
- Second, convenience. Assume you have a 6 disk RAID5 array. If a disk
+ Second, convenience. Assume you have a 6 device RAID5 array. If a device
fails and you are using a partitioned MD array, then all the partitions on
- the disk will already be handled without using that disk. No need to
+ the device will already be handled without using that device. No need to
manually fail any still active array members from other arrays.
- Third, safety. Again with the raid5 array. If you use multiple arrays on
- a single disk, and that disk fails, but it only failed on one array, then
- you now need to manually fail that disk from the other arrays before
- shutting down or hot swapping the disk. Generally speaking, that's not
+ Third, safety. Again with the RAID5 array. If you use multiple arrays on
+ a single device, and that device fails, but it only failed on one array, then
+ you now need to manually fail that device from the other arrays before
+ shutting down or hot swapping the device. Generally speaking, that's not
a big deal, but people do occasionally have fat finger syndrome and this
- is a good opportunity for someone to accidentally fail the wrong disk, and
- when you then go to remove the disk you create a two disk failure instead
+ is a good opportunity for someone to accidentally fail the wrong device, and
+ when you then go to remove the device you create a two device failure instead
of one and now you are in real trouble.
Forth, to respond to what you wrote about independent of each other --
part of the reason why you partition. I would argue that's not true. If
- your goal is to salvage as much use from a failing disk as possible, then
+ your goal is to salvage as much use from a failing device as possible, then
OK. But, generally speaking, people that have something of value on their
- disks don't want to salvage any part of a failing disk, they want that
- disk gone and replaced immediately. There simply is little to no value in
- an already malfunctioning disk. They're too cheap and the data stored on
+ devices don't want to salvage any part of a failing device, they want that
+ device gone and replaced immediately. There simply is little to no value in
+ an already malfunctioning device. They're too cheap and the data stored on
them too valuable to risk loosing something in an effort to further
utilize broken hardware. This of course is written with the understanding
that the latest MD RAID code will do read error rewrites to compensate for
- minor disk issues, so anything that will throw a disk out of an array is
+ minor device issues, so anything that will throw a device out of an array is
more than just a minor sector glitch.
- 0. http://marc.theaimsgroup.com/?l=linux-raid&m=116117813315590&w=2
+ [0] http://thread.gmane.org/gmane.linux.raid/13594/focus=13597
15. How can I start a dirty degraded array?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- A degraded array (e.g. a RAID5 with only two disks) that has not been
+ A degraded array (e.g. a RAID5 with only two devices) that has not been
properly stopped cannot be assembled just like that; mdadm will refuse and
complain about a "dirty degraded array", for good reasons.
@@ -425,16 +437,16 @@
overridden with the --force and --assume-clean options, but it is not
recommended. Read the manpage.
-18. How many failed disks can a RAID10 handle?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+18. How many failed devices can a RAID10 handle?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(see also question 4b)
- The following table shows how many disks you can lose and still have an
+ The following table shows how many devices you can lose and still have an
operational array. In some cases, you *can* lose more than the given number
- of disks, but there is no guarantee that the array survives. Thus, the
- following is the guaranteed number of failed disks a RAID10 array survives
- and the maximum number of failed disks the array can (but is not guaranteed
- to) handle, given the number of disks used and the number of data block
+ of devices, but there is no guarantee that the array survives. Thus, the
+ following is the guaranteed number of failed devices a RAID10 array survives
+ and the maximum number of failed devices the array can (but is not guaranteed
+ to) handle, given the number of devices used and the number of data block
copies. Note that 2 copies means original + 1 copy. Thus, if you only have
one copy (the original), you cannot handle any failures.
@@ -447,28 +459,30 @@
6 0/0 1/3 2/3 3/3
7 0/0 1/3 2/3 3/3
8 0/0 1/4 2/3 3/4
- (# of disks)
+ (# of devices)
Note: I have not really verified the above information. Please don't count
- on it. If a disk fails, replace it as soon as possible. Corrections welcome.
+ on it. If a device fails, replace it as soon as possible. Corrections welcome.
+
+19. What should I do if a device fails?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Replace it as soon as possible.
-19. What should I do if a disk fails?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Replace it as soon as possible:
+ In case of physical devices with no hot-swap capabilities, for example via:
mdadm --remove /dev/md0 /dev/sda1
- halt
- <replace disk and start the machine>
+ poweroff
+ <replace device and start the machine>
mdadm --add /dev/md0 /dev/sda1
-20. So how do I find out which other disk(s) can fail without killing the
+20. So how do I find out which other device(s) can fail without killing the
array?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Did you read the previous question and its answer?
For cases when you have two copies of each block, the question is easily
- answered by looking at the output of /proc/mdstat. For instance on a four
- disk array:
+ answered by looking at the output of /proc/mdstat. For instance on a 4 device
+ array:
md3 : active raid10 sdg7[3] sde7[0] sdh7[2] sdf7[1]
@@ -478,25 +492,25 @@
md3 : active raid10 sdg7[3] sde7[0] sdh7[4](F) sdf7[1]
- So now the second pair is broken; the array could take another failure in
+ So now the second pair is degraded; the array could take another failure in
the first pair, but if sdg now also fails, you're history.
Now go and read question 19.
For cases with more copies per block, it becomes more complicated. Let's
- think of a seven disk array with three copies:
+ think of a 7 device array with three copies:
md5 : active raid10 sdg7[6] sde7[4] sdb7[5] sdf7[2] sda7[3] sdc7[1] sdd7[0]
- Each mirror now has 7/3 = 2.33 disks to it, so in order to determine groups,
- you need to round up. Note how the disks are arranged in increasing order of
+ Each mirror now has 7/3 = 2.33 devices to it, so in order to determine groups,
+ you need to round up. Note how the devices are arranged in decreasing order of
their indices (the number in brackes in /proc/mdstat):
- disk: -sdd7- -sdc7- -sdf7- -sda7- -sde7- -sdb7- -sdg7-
+ device: -sdd7- -sdc7- -sdf7- -sda7- -sde7- -sdb7- -sdg7-
group: [ one ][ two ][ three ]
- Basically this means that after two disk failed, you need to make sure that
- the third failed disk doesn't destroy all copies of any given block. And
+ Basically this means that after two devices failed, you need to make sure that
+ the third failed device doesn't destroy all copies of any given block. And
that's not always easy as it depends on the layout chosen: whether the
blocks are near (same offset within each group), far (spread apart in a way
to maximise the mean distance), or offset (offset by size/n within each
@@ -506,8 +520,7 @@
21. Why does the kernel speak of 'resync' when using checkarray?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Please see README.checkarray and
- http://www.mail-archive.com/[email protected]/msg04835.html .
+ Please see README.checkarray and http://thread.gmane.org/gmane.linux.raid/11864 .
In short: it's a bug. checkarray is actually not a resync, but the kernel
does not distinguish between them.
@@ -524,8 +537,8 @@
echo idle >/sys/block/md1/md/sync_action
- This will cause md1 to go idle and md to synchronise md3 (or whatever is
- queued next; repeat the above for other devices if necessary). md will also
+ This will cause md1 to go idle and MD to synchronise md3 (or whatever is
+ queued next; repeat the above for other devices if necessary). MD will also
realise that md1 is still not in sync and queue it for resynchronisation,
so it will sync automatically when its turn has come.
@@ -533,13 +546,13 @@
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you don't have any arrays on your system, then mdadm's init script will
fail to assemble them and print a warning. If you don't like that, disable
- AUTOSTART in /etc/default/mdadm.
+ AUTOSTART in /etc/default/mdadm .
24. What happened to mdrun? How do I replace it?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mdrun used to be the sledgehammer approach to assembling arrays. It has
- accumulated several problems over the years (e.g. #354705) and thus has been
- deprecated and removed with the 2.6.7-2 version of this package.
+ accumulated several problems over the years (e.g. Debian bug #354705) and
+ thus has been deprecated and removed with the 2.6.7-2 version of this package.
If you are still using mdrun, please ensure that you have a valid
/etc/mdadm/mdadm.conf file (run /usr/share/mdadm/mkconf --generate to get
@@ -574,13 +587,14 @@
Short answer: It doesn't, the underlying devices aren't yet available yet
when mdadm runs during the early boot process.
- Long answer: It doesn't. but the drivers of those devices incorrectly
+ Long answer: It doesn't, but the drivers of those devices incorrectly
communicate to the kernel that the devices are ready, when in fact they are
not. I consider this a bug in those drivers. Please consider reporting it.
Workaround: there is nothing mdadm can or will do against this. Fortunately
though, initramfs provides a method, documented at
- http://wiki.debian.org/InitramfsDebug. Please append rootdelay=10 to the
+ http://wiki.debian.org/InitramfsDebug. Please append rootdelay=10 (which sets
+ a delay of 10 seconds before trying to mount the root filesystem) to the
kernel command line and try if the boot now works.
-- martin f. krafft <[email protected]> Wed, 13 May 2009 09:59:53 +0200
To be honest... I didn't understand the current block visualisation
for the RAID10 layout question at all.
And the different meanings of ABC abc 123 are not really well explained.
I looked around for other examples on the web and, based on the md(4)
manpage and some answers from Neil Brown, I made this patch.
Ideally, I think, this information should go into the md(4)
manpage... but until that is the case, here is the content for the FAQ.
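The chunk placement the patch visualises can also be written down as a tiny address-mapping function, under the same simplifying assumption (chunk size equals block size, and for "far" a total chunk count divisible by the device count). This is only an illustrative sketch; the function name and the total_chunks parameter are mine, not anything from mdadm:

```python
def raid10_position(chunk, copy, n_devices, layout, copies=2, total_chunks=8):
    """Map (logical chunk, copy index) to (device index, device block address)."""
    if layout == "near":    # --layout=n2: copies placed side by side
        idx = chunk * copies + copy
        return idx % n_devices, idx // n_devices
    if layout == "offset":  # --layout=o2: each stripe followed by a rotated copy
        stripe, col = divmod(chunk, n_devices)
        return (col + copy) % n_devices, stripe * copies + copy
    if layout == "far":     # --layout=f2: one full, cyclically shifted
        # sequence of all chunks per copy
        rows = total_chunks // n_devices  # rows taken by one complete sequence
        return (chunk + copy) % n_devices, copy * rows + chunk // n_devices
    raise ValueError(layout)

# near=2 on 5 devices: the second copy of chunk 2 wraps onto device 0, next row
assert raid10_position(2, 1, 5, "near") == (0, 1)
# far=2 on 4 devices, 8 chunks: devices holding the second copies of chunks 0..3
assert [raid10_position(c, 1, 4, "far")[0] for c in range(4)] == [1, 2, 3, 0]
```

(Device indices here are 0-based, while the tables below use device1..deviceN.)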
Index: mdadm-faq/FAQ
===================================================================
--- mdadm-faq.orig/FAQ 2013-07-04 17:29:14.861181639 +0200
+++ mdadm-faq/FAQ 2013-07-08 18:31:19.741175429 +0200
@@ -220,41 +220,119 @@
7. Which RAID10 layout scheme should I use
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- RAID10 gives you the choice between three ways of laying out the blocks on
- the device. Assuming a simple 4 device setup with 2 copies of each block, then
- if A,B,C are data blocks, a,b their parts, and 1,2 denote their copies, the
- following would be a classic RAID1+0 where device1,2 and device3,4 are each
- RAID1 pairs combined into a RAID0:
-
- near=2 would be (this is the classic RAID1+0)
-
- device1 Aa1 Ba1 Ca1
- device2 Aa2 Ba2 Ca2
- device3 Ab1 Bb1 Cb1
- device4 Ab2 Bb2 Cb2
-
- offset=2 would be
-
- device1 Aa1 Bb2 Ca1 Db2
- device2 Ab1 Aa2 Cb1 Ca2
- device3 Ba1 Ab2 Da1 Cb2
- device4 Bb1 Ba2 Db1 Da2
-
- far=2 would be
-
- device1 Aa1 Ca1 .... Bb2 Db2
- device2 Ab1 Cb1 .... Aa2 Ca2
- device3 Ba1 Da1 .... Ab2 Cb2
- device4 Bb1 Db1 .... Ba2 Da2
-
- Where the second set start half-way through the drives.
-
- The advantage of far= is that you can easily spread a long sequential read
- across the drives. The cost is more seeking for writes. offset= can
- possibly get similar benefits with large enough chunk size. Neither upstream
- nor the package maintainer have tried to understand all the implications of
- that layout. It was added simply because it is a supported layout in DDF and
- DDF support is a goal.
+ RAID10 gives you the choice between three ways of laying out chunks on the
+ devices: near, far and offset.
+
+
+ The examples below explain the chunk distribution for each of these layouts
+ with 2 copies per chunk, using either an even number (below 4) or an odd number
+ (below 5) of devices.
+
+ For simplicity it is assumed that the size of the chunks equals the size of
+ the blocks of the underlying devices as well as the RAID10 device exported
+ by the kernel (e.g. /dev/md/name).
+ Therefore the chunks/chunk numbers map directly to the blocks/block addresses
+ of the exported RAID10 device.
+
+ Decimal numbers (0, 1, 2, …) are the chunks of the RAID10 and due to the above
+ assumption also the blocks and block addresses of the exported RAID10 device.
+ Same numbers mean copies of a chunk/block (obviously on different underlying
+ devices).
+ Hexadecimal numbers (0x00, 0x01, 0x02, …) are the block addresses of the
+ underlying devices.
+
+
+ near with 2 copies per chunk (--layout=n2):
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ The chunk copies are placed "as close to each other as possible".
+
+ With an even number of devices, they lie at the very same offset on the
+ different devices and this is a classic RAID1+0, i.e. two groups of mirrored
+ devices (in the example with an even number of devices, the groups device1/2 and
+ device3/4 are each a RAID1), both in turn forming a striped RAID0.
+
+ device1 device2 device3 device4 device1 device2 device3 device4 device5
+ ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── ───────
+ 0 0 1 1 0x00 0 0 1 1 2
+ 2 2 3 3 0x01 2 3 3 4 4
+ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯
+ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
+ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯
+ 254 254 255 255 0x80 317 318 318 319 319
+ ╰──────┬──────╯ ╰──────┬──────╯
+ RAID1 RAID1
+ ╰──────────────┬──────────────╯
+ RAID0
+
+
+ far with 2 copies per chunk (--layout=f2):
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ The chunk copies are placed "as far from each other as possible".
+ This means that first a complete sequence of all different chunks (i.e. all data
+ on the exported block device) is striped over the devices, followed by a second
+ complete sequence of all different chunks (and so on in the case of f>2).
+
+ The "shift" needed to prevent placing copies of the same chunk on the same
+ device is achieved by a cyclic permutation of each stripe in the second complete
+ sequence of all different chunks (and, in the case of f>2, by further cyclic
+ permutations of each stripe in any further complete sequence of all different chunks).
+
+ device1 device2 device3 device4 device1 device2 device3 device4 device5
+ ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── ───────
+ 0 1 2 3 0x00 0 1 2 3 4 ╮
+ 4 5 6 7 0x01 5 6 7 8 9 ├ ▒
+ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆
+ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ┆
+ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆
+ 252 253 254 255 0x3F 315 316 317 318 319 ╯
+ 3 0 1 2 0x40 4 0 1 2 3 ╮
+ 7 4 5 6 0x42 9 5 6 7 8 ├ ▒ₚ
+ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆
+ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ┆
+ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ┆
+ 255 252 253 254 0x7F 319 315 316 317 318 ╯
+
+ With ▒ being the complete sequence of all different chunks and ▒ₚ the cyclic
+ permutation thereof (in the case of f>2 there would be (▒ₚ)ₚ, ((▒ₚ)ₚ)ₚ, …).
+
+ The advantage of the far layout is that MD can easily spread sequential reads over
+ the devices, making it similar to RAID0 in terms of speed.
+ The cost is more seeking for writes, making them substantially slower.
+
+
+ offset with 2 copies per chunk (--layout=o2):
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ <number of devices> consecutive chunks are striped over the devices, immediately
+ followed by a copy (or more in the case of o>2) of these chunks.
+ This pattern goes on for all further consecutive chunks of the exported RAID10
+ device.
+
+ The "shift" needed to prevent placing copies of the same chunk on the same
+ device is achieved by a cyclic permutation of all the second sets of chunks (and,
+ in the case of o>2, by further cyclic permutations of the third, etc. sets of chunks).
+
+ device1 device2 device3 device4 device1 device2 device3 device4 device5
+ ─────── ─────── ─────── ─────── ─────── ─────── ─────── ─────── ───────
+ 0 1 2 3 0x00 0 1 2 3 4 ) AA
+ 3 0 1 2 0x01 4 0 1 2 3 ) AAₚ
+ 4 5 6 7 0x02 5 6 7 8 9 ) AB
+ 7 4 5 6 0x03 9 5 6 7 8 ) ABₚ
+ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ) ⋯
+ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
+ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ) ⋯
+ 252 253 254 255 0x7E 315 316 317 318 319 ) CL
+ 255 252 253 254 0x7F 319 315 316 317 318 ) CLₚ
+
+ With AA, AB, …, AZ, BA, … being the sets of <number of devices> consecutive chunks
+ and AAₚ, ABₚ, …, AZₚ, BAₚ, … the cyclic permutations thereof (in the case of
+ o>2 there would be (AAₚ)ₚ, … as well as ((AAₚ)ₚ)ₚ, … and so on).
+
+ This should give read characteristics similar to the far layout when a suitably
+ large chunk size is used, but without as much seeking on writes.
+
+
+
+ See the md(4) manpage for more details.
8. (One of) my RAID arrays is busy and cannot be stopped. What gives?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To be honest... I didn't understand the current block visualisation
for the RAID10 layout question at all.
And the different meanings of ABC, abc and 123 are not really well explained.
I looked around for other examples on the web and, based on the md(4)
manpage and some answers from Neil Brown, I made this patch.
Ideally, I think, this information should go into the md(4)
manpage... but until that is the case, here is the content for the FAQ.
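The placement rules for the near and far layouts described in the patch can be sketched in a few lines of Python. This is purely my own illustration of the tables (function names are made up, device indices are 0-based here, and it is not the kernel's actual code):

```python
# Sketch of chunk placement for the MD RAID10 "near" and "far" layouts,
# 2 copies per chunk as in the tables (illustration only).

def near_placement(chunk, copies, devices):
    """Return (device, block) pairs for all copies of a chunk, near layout.

    Copies go to consecutive device slots, so with an even number of
    devices all copies share the same per-device offset (classic RAID1+0).
    """
    positions = []
    for k in range(copies):
        slot = chunk * copies + k                   # position in the flat slot stream
        positions.append((slot % devices, slot // devices))
    return positions

def far_placement(chunk, copies, devices, blocks_per_device):
    """Return (device, block) pairs for all copies of a chunk, far layout.

    The full data sequence is stored f times; repetition r starts
    blocks_per_device // copies blocks further down and each of its
    stripes is cyclically shifted r devices to the right.
    """
    seq_len = blocks_per_device // copies           # stripes per complete sequence
    stripe, slot = divmod(chunk, devices)
    return [((slot + r) % devices, r * seq_len + stripe) for r in range(copies)]

# 4 devices of 128 blocks each, 2 copies -> 256 chunks, as in the tables
print(near_placement(0, 2, 4))      # [(0, 0), (1, 0)]
print(far_placement(3, 2, 4, 128))  # [(3, 0), (0, 64)]
```

The two printed results match the tables: chunk 0 sits on the first two devices at block 0x00 (near), and chunk 3 on device4 at 0x00 plus device1 at the start of the second sequence (far).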
Index: mdadm-faq/FAQ
===================================================================
--- mdadm-faq.orig/FAQ 2013-07-04 17:29:14.861181639 +0200
+++ mdadm-faq/FAQ 2013-07-08 17:57:18.458706236 +0200
@@ -220,41 +220,111 @@
7. Which RAID10 layout scheme should I use
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- RAID10 gives you the choice between three ways of laying out the blocks on
- the device. Assuming a simple 4 device setup with 2 copies of each block, then
- if A,B,C are data blocks, a,b their parts, and 1,2 denote their copies, the
- following would be a classic RAID1+0 where device1,2 and device3,4 are each
- RAID1 pairs combined into a RAID0:
-
- near=2 would be (this is the classic RAID1+0)
-
- device1 Aa1 Ba1 Ca1
- device2 Aa2 Ba2 Ca2
- device3 Ab1 Bb1 Cb1
- device4 Ab2 Bb2 Cb2
-
- offset=2 would be
-
- device1 Aa1 Bb2 Ca1 Db2
- device2 Ab1 Aa2 Cb1 Ca2
- device3 Ba1 Ab2 Da1 Cb2
- device4 Bb1 Ba2 Db1 Da2
-
- far=2 would be
-
- device1 Aa1 Ca1 .... Bb2 Db2
- device2 Ab1 Cb1 .... Aa2 Ca2
- device3 Ba1 Da1 .... Ab2 Cb2
- device4 Bb1 Db1 .... Ba2 Da2
-
- Where the second set start half-way through the drives.
-
- The advantage of far= is that you can easily spread a long sequential read
- across the drives. The cost is more seeking for writes. offset= can
- possibly get similar benefits with large enough chunk size. Neither upstream
- nor the package maintainer have tried to understand all the implications of
- that layout. It was added simply because it is a supported layout in DDF and
- DDF support is a goal.
+ RAID10 gives you the choice between three ways of laying out chunks on the
+ devices: near, far and offset.
+
+
+ The examples below explain the chunk distribution for each of these layouts
+ with 2 copies per chunk, using either an even number (4 in the examples) or
+ an odd number (5) of devices.
+
+ For simplicity it is assumed that the chunk size equals the block size of
+ the underlying devices as well as that of the RAID10 device exported by the
+ kernel (e.g. /dev/md/name).
+ The chunks/chunk numbers therefore map directly to the blocks/block addresses
+ of the exported RAID10 device.
+
+ Decimal numbers (0, 1, 2, ...) are the chunks of the RAID10 and due to the above
+ assumption also the blocks and block addresses of the exported RAID10 device.
+ Same numbers mean copies of a chunk/block (obviously on different underlying
+ devices).
+ Hexadecimal numbers (0x00, 0x01, 0x02, ...) are the block addresses of the
+ underlying devices.
+
+
+ near with 2 copies per chunk (--layout=n2):
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ The chunk copies are placed "as close to each other as possible".
+
+ With an even number of devices, the copies lie at the very same offset on
+ different devices, giving a classic RAID1+0: groups of mirrored devices (in
+ the 4-device example, device1/2 and device3/4 each form a RAID1), which in
+ turn are combined into a striped RAID0.
+
+ device1 device2 device3 device4 device1 device2 device3 device4 device5
+ ------- ------- ------- ------- ------- ------- ------- ------- -------
+ 0 0 1 1 0x00 0 0 1 1 2
+ 2 2 3 3 0x01 2 3 3 4 4
+ ... ... ... ... .... ... ... ... ... ...
+ 254 254 255 255 0x7F 317 318 318 319 319
+ \_______/ \_______/
+ RAID1 RAID1
+ \_______________/
+ RAID0
+
+
+ far with 2 copies per chunk (--layout=f2):
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ The chunk copies are placed "as far from each other as possible".
+ This means that first a complete sequence of all different chunks (i.e. all data
+ on the exported block device) is striped over the devices, followed by a second
+ complete sequence of all different chunks (and so on in the case of f>2).
+
+ The "shift" needed to prevent placing copies of the same chunk on the same
+ device is achieved by a cyclic permutation of each stripe in the second complete
+ sequence of all different chunks (and, in the case of f>2, by further cyclic
+ permutations of each stripe in any further complete sequence of all different chunks).
+
+ device1 device2 device3 device4 device1 device2 device3 device4 device5
+ ------- ------- ------- ------- ------- ------- ------- ------- -------
+ 0 1 2 3 0x00 0 1 2 3 4 \
+ 4 5 6 7 0x01 5 6 7 8 9 | *
+ ... ... ... ... .... ... ... ... ... ... |
+ 252 253 254 255 0x3F 315 316 317 318 319 /
+ 3 0 1 2 0x40 4 0 1 2 3 \
+ 7 4 5 6 0x42 9 5 6 7 8 | *^p
+ ... ... ... ... .... ... ... ... ... ... |
+ 255 252 253 254 0x7F 319 315 316 317 318 /
+
+ With * being the complete sequence of all different chunks and *^p the cyclic
+ permutation thereof (in the case of f>2 there would be (*^p)^p, ((*^p)^p)^p, ...).
+
+ The advantage of the far layout is that MD can easily spread sequential reads over
+ the devices, making it similar to RAID0 in terms of speed.
+ The cost is more seeking for writes, making them substantially slower.
+
+
+ offset with 2 copies per chunk (--layout=o2):
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ <number of devices> consecutive chunks are striped over the devices, immediately
+ followed by a copy (or more in the case of o>2) of these chunks.
+ This pattern goes on for all further consecutive chunks of the exported RAID10
+ device.
+
+ The "shift" needed to prevent placing copies of the same chunk on the same
+ device is achieved by a cyclic permutation of all the second sets of chunks (and,
+ in the case of o>2, by further cyclic permutations of the third, etc. sets of chunks).
+
+ device1 device2 device3 device4 device1 device2 device3 device4 device5
+ ------- ------- ------- ------- ------- ------- ------- ------- -------
+ 0 1 2 3 0x00 0 1 2 3 4 ) AA
+ 3 0 1 2 0x01 4 0 1 2 3 ) AA^p
+ 4 5 6 7 0x02 5 6 7 8 9 ) AB
+ 7 4 5 6 0x03 9 5 6 7 8 ) AB^p
+ ... ... ... ... .... ... ... ... ... ... ) ....
+ 252 253 254 255 0x7E 315 316 317 318 319 ) CL
+ 255 252 253 254 0x7F 319 315 316 317 318 ) CL^p
+
+ With AA, AB, ..., AZ, BA, ... being the sets of <number of devices> consecutive chunks
+ and AA^p, AB^p, ..., AZ^p, BA^p, ... the cyclic permutations thereof (in the case of
+ o>2 there would be (AA^p)^p, ... as well as ((AA^p)^p)^p, ... and so on).
+
+ This should give read characteristics similar to the far layout when a suitably
+ large chunk size is used, but without as much seeking on writes.
+
+
+
+ See the md(4) manpage for more details.
8. (One of) my RAID arrays is busy and cannot be stopped. What gives?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--- End Message ---