Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
On Fri, 14 Jan 2000, D. Lance Robinson wrote:

> Ingo,
>
> I can fairly regularly generate corruption (data or ext2 filesystem) on a
> busy RAID-5 by adding a spare drive to a degraded array and letting it
> build the parity. Could the problem be from the bad (illegal) buffer
> interactions you mentioned, or are there other areas that need fixing as
> well? I have been looking into this issue for a long time with no
> resolution. Since you may be aware of possible problem areas: any ideas,
> code or encouragement is greatly welcome.

Hi Lance,

Which RAID code were you using when you hit the above problem? You were using the raid 0.90 patches for 2.2.x, right?

Andrea
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Hi,

Chris Wedgwood writes:

> > This may affect data which was not being written at the time of the
> > crash. Only raid 5 is affected.
>
> Long term -- if you journal to something outside the RAID5 array (ie.
> to raid-1 protected log disks) then you should be safe against this
> type of failure?

Indeed. The jfs journaling layer in ext3 is a completely generic block device journaling layer which could be used for such a purpose (and raid/LVM journaling is one of the reasons it was designed this way).

--Stephen
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Hi,

Benno Senoner writes:

> wow, really good idea to journal to a RAID1 array !
>
> do you think it is possible to do the following:
>
> - N disks holding a soft RAID5 array.
> - reserve a small partition on at least 2 disks of the array to hold a
>   RAID1 array.
> - keep the journal on this partition.

Yes. My jfs code will eventually support this. The main thing it is missing right now is the ability to journal multiple devices to a single journal: the on-disk structure is already designed with that in mind, but the code does not yet support it.

--Stephen
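For concreteness, the layout Benno proposes could be expressed to the raid 0.90 tools roughly as below. This is only a sketch: the device names, partition sizes and md numbering are illustrative assumptions, not details taken from the thread.

    # /etc/raidtab (sketch)
    # md0: small RAID-1 across spare partitions, to hold the journal
    raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        persistent-superblock   1
        chunk-size              4
        device                  /dev/sda1
        raid-disk               0
        device                  /dev/sdb1
        raid-disk               1

    # md1: the main RAID-5 array across the large partitions
    raiddev /dev/md1
        raid-level              5
        nr-raid-disks           3
        persistent-superblock   1
        chunk-size              32
        device                  /dev/sda2
        raid-disk               0
        device                  /dev/sdb2
        raid-disk               1
        device                  /dev/sdc2
        raid-disk               2

The filesystem would then live on /dev/md1 with its journal on /dev/md0, once the external-journal support described above exists.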
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Ingo,

I can fairly regularly generate corruption (data or ext2 filesystem) on a busy RAID-5 by adding a spare drive to a degraded array and letting it build the parity. Could the problem be from the bad (illegal) buffer interactions you mentioned, or are there other areas that need fixing as well? I have been looking into this issue for a long time with no resolution. Since you may be aware of possible problem areas: any ideas, code or encouragement is greatly welcome.

<>< Lance.

Ingo Molnar wrote:

> On Wed, 12 Jan 2000, Gadi Oxman wrote:
>
> > As far as I know, we took care not to poke into the buffer cache to
> > find clean buffers -- in raid5.c, the only code which does a
> > find_buffer() is:
>
> yep, this is still the case. (Sorry Stephen, my bad.) We will have these
> problems once we try to eliminate the current copying overhead.
> Nevertheless there are bad (illegal) interactions between the RAID code
> and the buffer cache; I'm cleaning this up for 2.3 right now. Especially
> the reconstruction code is a rathole. Unfortunately, blocking
> reconstruction until b_count == 0 is not acceptable because several
> filesystems (such as ext2fs) keep metadata caches around (eg. the block
> group descriptors in the ext2fs case) which have b_count == 1 for a
> longer time.
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Chris Wedgwood wrote:

> > In the power+disk failure case, there is a very narrow window in which
> > parity may be incorrect, so loss of the disk may result in inability to
> > correctly restore the lost data.
>
> For some people, this very narrow window may still be a problem.
> Especially when you consider the case of a disk failing because of a
> power surge -- which also kills a drive.
>
> > This may affect data which was not being written at the time of the
> > crash. Only raid 5 is affected.
>
> Long term -- if you journal to something outside the RAID5 array (ie.
> to raid-1 protected log disks) then you should be safe against this
> type of failure?
>
> -cw

Wow, really good idea to journal to a RAID1 array!

Do you think it is possible to do the following:

- N disks holding a soft RAID5 array.
- reserve a small partition on at least 2 disks of the array to hold a RAID1 array.
- keep the journal on this partition.

Do you think that this will be possible? Is ext3 / reiserfs capable of keeping the journal on a different partition than the one holding the FS?

That would really be great!

Benno.
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Hi,

On Wed, 12 Jan 2000 22:09:35 +0100, Benno Senoner <[EMAIL PROTECTED]> said:

> Sorry for my ignorance, I got a little confused by this post:
> Ingo said we are 100% journal-safe, you said the contrary.

Raid resync is safe in the presence of journaling. Journaling is not safe in the presence of raid resync.

> can you or Ingo please explain to us in which situation (power loss)
> running linux-raid + journaled FS we risk a corrupted filesystem ?

Please read my previous reply on the subject (the one that started off with "I'm tired of answering the same question a million times, so here's a definitive answer"). Basically, there will always be a small risk of data loss if power-down is accompanied by loss of a disk (it's a double failure); and the current implementation of raid resync means that journaling will be broken by the raid1 or raid5 resync code after a reboot on a journaled filesystem (ext3 is likely to panic; reiserfs will not, but will still get its IO ordering requirements messed up by the resync).

> After the reboot, if all disks remain intact physically, will we only
> lose the data that was being written, or is there a possibility to end
> up in a corrupted filesystem which could cause more damage in future ?

In the power+disk failure case, there is a very narrow window in which parity may be incorrect, so loss of the disk may result in inability to correctly restore the lost data. This may affect data which was not being written at the time of the crash. Only raid 5 is affected.

--Stephen
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Hi,

On Wed, 12 Jan 2000 07:21:17 -0500 (EST), Ingo Molnar <[EMAIL PROTECTED]> said:

> On Wed, 12 Jan 2000, Gadi Oxman wrote:
>
> > As far as I know, we took care not to poke into the buffer cache to
> > find clean buffers -- in raid5.c, the only code which does a
> > find_buffer() is:
>
> yep, this is still the case.

OK, that's good to know.

> Especially the reconstruction code is a rathole. Unfortunately,
> blocking reconstruction until b_count == 0 is not acceptable because
> several filesystems (such as ext2fs) keep metadata caches around
> (eg. the block group descriptors in the ext2fs case) which have
> b_count == 1 for a longer time.

That's not a problem: we don't need reconstruction to interact with the buffer cache at all. Ideally, what I'd like to see the reconstruction code do is to:

 * lock a stripe
 * read a new copy of that stripe locally
 * recalc parity and write back whatever disks are necessary for the stripe
 * unlock the stripe

so that the data never goes through the buffer cache at all, but the stripe is locked with respect to other IOs going on below the level of ll_rw_block (remember there may be IOs coming in to ll_rw_block which are not from the buffer cache, eg. swap or journal IOs).

> We are '100% journal-safe' if power fails during resync.

Except for the fact that resync isn't remotely journal-safe in the first place, yes. :-)

--Stephen
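A minimal user-space sketch of that locked-stripe scheme might look like the following. It is illustrative only: the geometry, the per-stripe mutexes and the helper names are invented for the example, not the actual raid5.c code.

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    #define NDISK   3        /* two data disks + one parity disk */
    #define NSTRIPE 4
    #define BSIZE   16

    static unsigned char disk[NDISK][NSTRIPE][BSIZE];   /* fake platters */
    static pthread_mutex_t stripe_lock[NSTRIPE];

    static void resync_stripe(int s)
    {
        unsigned char buf[NDISK][BSIZE];

        pthread_mutex_lock(&stripe_lock[s]);        /* 1. lock the stripe */

        for (int d = 0; d < NDISK; d++)             /* 2. private read -- */
            memcpy(buf[d], disk[d][s], BSIZE);      /*    no buffer cache */

        for (int i = 0; i < BSIZE; i++)             /* 3. recalc parity   */
            buf[2][i] = buf[0][i] ^ buf[1][i];
        memcpy(disk[2][s], buf[2], BSIZE);          /*    write it back   */

        pthread_mutex_unlock(&stripe_lock[s]);      /* 4. unlock          */
    }

    int main(void)
    {
        for (int s = 0; s < NSTRIPE; s++)
            pthread_mutex_init(&stripe_lock[s], NULL);
        for (int s = 0; s < NSTRIPE; s++)
            resync_stripe(s);
        printf("resynced %d stripes\n", NSTRIPE);
        return 0;
    }

The point of the per-stripe lock is that ordinary writes would take the same lock, so resync and in-flight IO below ll_rw_block serialize per stripe without either of them ever consulting the buffer cache.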
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Hi,

On Tue, 11 Jan 2000 16:41:55 -0600, "Mark Ferrell" <[EMAIL PROTECTED]> said:

> Perhaps I am confused. How is it that a power outage while attached
> to the UPS becomes "unpredictable"?

One of the most common ways to get an outage while on a UPS is somebody tripping over, or otherwise removing, the cable between the UPS and the computer. How exactly is that predictable?

Just because you reduce the risk of unexpected power outage doesn't mean we can ignore the possibility.

--Stephen
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
"Stephen C. Tweedie" wrote: > Ideally, what I'd like to see the reconstruction code do is to: > > * lock a stripe > * read a new copy of that stripe locally > * recalc parity and write back whatever disks are necessary for the stripe > * unlock the stripe > > so that the data never goes through the buffer cache at all, but that > the stripe is locked with respect to other IOs going on below the level > of ll_rw_block (remember there may be IOs coming in to ll_rw_block which > are not from the buffer cache, eg. swap or journal IOs). > > > We are '100% journal-safe' if power fails during resync. > > Except for the fact that resync isn't remotely journal-safe in the first > place, yes. :-) > > --Stephen Sorry for my ignorance I got a little confused by this post: Ingo said we are 100% journal-safe, you said the contrary, can you or Ingo please explain us in which situation (power-loss) running linux-raid+ journaled FS we risk a corrupted filesystem ? I am interested what happens if the power goes down while you write heavily to a ext3/reiserfs (journaled FS) on soft-raid5 array. After the reboot if all disk remain intact physically, will we only lose the data that was being written, or is there a possibility to end up in a corrupted filesystem which could more damages in future ? (or do we need to wait for the raid code in 2.3 ?) sorry for re-asking that question, but I am still confused. regards, Benno.
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
On Wed, 12 Jan 2000, Gadi Oxman wrote:

> As far as I know, we took care not to poke into the buffer cache to
> find clean buffers -- in raid5.c, the only code which does a
> find_buffer() is:

yep, this is still the case. (Sorry Stephen, my bad.) We will have these problems once we try to eliminate the current copying overhead. Nevertheless there are bad (illegal) interactions between the RAID code and the buffer cache; I'm cleaning this up for 2.3 right now. Especially the reconstruction code is a rathole. Unfortunately, blocking reconstruction until b_count == 0 is not acceptable because several filesystems (such as ext2fs) keep metadata caches around (eg. the block group descriptors in the ext2fs case) which have b_count == 1 for a longer time.

If both power and a disk fail at once then we still might get local corruption for partially written RAID5 stripes. If either power or a disk fails, then the Linux RAID5 code is safe wrt. journalling, because it behaves like an ordinary disk. We are '100% journal-safe' if power fails during resync. We are also 100% journal-safe if power fails during reconstruction of a failed disk or in degraded mode.

The 2.3 buffer-cache enhancements I wrote ensure that 'cache snooping' and adding to the buffer-cache can be done safely by 'external' cache managers. I also added means to do atomic IO operations which in fact are several underlying IO operations -- without the need of allocating a separate bh. The RAID code uses these facilities now.

Ingo
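One common way to realize the kind of compound IO Ingo mentions -- one logical operation fanned out into several device IOs, completing only when the last piece lands -- is a shared completion count. The sketch below is a user-space illustration of that general pattern only; it is not Ingo's 2.3 code, and all names are invented.

    #include <stdio.h>

    /* One logical IO fanned out into several device-level IOs; the whole
     * thing completes exactly once, when the last piece finishes. */
    struct compound_io {
        int remaining;                         /* outstanding sub-IOs */
        void (*done)(struct compound_io *);
    };

    static void stripe_write_done(struct compound_io *io)
    {
        printf("whole stripe write complete\n");
    }

    /* Completion handler run at the end of each per-disk IO. */
    static void sub_io_end(struct compound_io *io)
    {
        if (--io->remaining == 0)              /* atomic_dec_and_test() */
            io->done(io);                      /* in a real kernel      */
    }

    int main(void)
    {
        struct compound_io io = { 3, stripe_write_done };

        for (int d = 0; d < 3; d++) {          /* pretend 3 disk writes */
            printf("disk %d write done\n", d); /* finish one by one     */
            sub_io_end(&io);
        }
        return 0;
    }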
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
James Manning wrote:

> [ Tuesday, January 11, 2000 ] Benno Senoner wrote:
> > The problem is that power outages are unpredictable even in presence
> > of UPSes, therefore it is important to have some protection against
> > power losses.
>
> I gotta ask: dying power supply? cord getting ripped out?
> Most ppl run serial lines (of course :) and with powerd they
> get nice shutdowns :)
>
> Just wanna make sure I'm understanding you...
>
> James
> --
> Miscellaneous Engineer --- IBM Netfinity Performance Development

Yep, obviously the UPS has a serial line to shut down the machine nicely before a failure, but it happened to me that the serial cable was disconnected and the power outage lasted SEVERAL hours during a weekend, when no one was in the machine room (of an ISP). You know Murphy's law... :-)

But I am mainly interested in power-failure protection in the case where you want to set up a workstation with a reliable disk array (soft raid5) and do not always have a UPS handy. You will lose the file that was being written, but the important thing is that the disk array remains in a safe state, just like a single disk + journaled FS.

Stephen Tweedie said that this is possible (by fixing the remaining races in the RAID code). If these problems get fixed sometime, then our fears of a corrupted soft-RAID array in the case of a power failure on a machine without a UPS will completely go away.

cheers,
Benno.
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Perhaps I am confused. How is it that a power outage while attached to the UPS becomes "unpredictable"?

We run a Dell PowerEdge 2300/400 using Linux software raid, and the system monitors its own UPS. When a power failure occurs, the system will bring itself down to a minimal state (runlevel 1) after the batteries are below 50% .. and once below 15% it will shut down, which turns off the UPS. When power comes back on, the UPS fires up and the system resumes as normal.

Admittedly this won't prevent issues like god reaching out and slapping my system via lightning or something, nor will it resolve issues where someone decides to grab the power cable and swing around on it, severing the connection from the UPS to the system .. but for the most part it has thus far proven to be a fairly decent configuration.

Benno Senoner wrote:

> "Stephen C. Tweedie" wrote:
> (...)
>
> > 3) The soft-raid background rebuild code reads and writes through the
> >    buffer cache with no synchronisation at all with other fs activity.
> >    After a crash, this background rebuild code will kill the
> >    write-ordering attempts of any journalling filesystem.
> >
> >    This affects both ext3 and reiserfs, under both RAID-1 and RAID-5.
> >
> > Interaction 3) needs a bit more work from the raid core to fix, but it's
> > still not that hard to do.
> >
> > So, can any of these problems affect other, non-journaled filesystems
> > too? Yes, 1) can: throughout the kernel there are places where buffers
> > are modified before the dirty bits are set. In such places we will
> > always mark the buffers dirty soon, so the window in which an incorrect
> > parity can be calculated is _very_ narrow (almost non-existent on
> > non-SMP machines), and the window in which it will persist on disk is
> > also very small.
> >
> > This is not a problem. It is just another example of a race window
> > which exists already with _all_ non-battery-backed RAID-5 systems (both
> > software and hardware): even with perfect parity calculations, it is
> > simply impossible to guarantee that an entire stripe update on RAID-5
> > completes in a single, atomic operation. If you write a single data
> > block and its parity block to the RAID array, then on an unexpected
> > reboot you will always have some risk that the parity will have been
> > written, but not the data. On a reboot, if you lose a disk then you can
> > reconstruct it incorrectly due to the bogus parity.
> >
> > THIS IS EXPECTED. RAID-5 isn't proof against multiple failures, and the
> > only way you can get bitten by this failure mode is to have a system
> > failure and a disk failure at the same time.
> >
> > --Stephen
>
> thank you very much for these clear explanations,
>
> Last doubt: :-)
> Assume all RAID code - FS interaction problems get fixed; since a
> linux soft-RAID5 box has no battery backup, does this mean that we
> will lose data ONLY if there is a power failure AND a successive disk
> failure? If we lose the power, and after reboot all disks remain
> intact, can the RAID layer reconstruct all information in a safe way?
>
> The problem is that power outages are unpredictable even in presence
> of UPSes, therefore it is important to have some protection against
> power losses.
>
> regards,
> Benno.
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
----- Original Message -----
From: "Benno Senoner" <[EMAIL PROTECTED]>
To: "Stephen C. Tweedie" <[EMAIL PROTECTED]>
Cc: "Linux RAID" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; "Ingo Molnar" <[EMAIL PROTECTED]>
Sent: Tuesday, January 11, 2000 1:17 PM
Subject: Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?

-- much snippage here

> The problem is that power outages are unpredictable even in presence
> of UPSes, therefore it is important to have some protection against
> power losses.
>
> regards,
> Benno.

I run an MGE UPS on my RH6.1 box running RAID 1. They have software for Linux that communicates with the UPS and performs an orderly system shutdown if the box goes on battery and stays on battery for a given (user-selectable) length of time. I have tested and verified that this actually works; it's a Good Thing(tm).

I did have to cut one pin on the standard RS-232 cable that came with the UPS for use on the Linux box, and download the software and install it (scripted, easy...).

bwilling
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Hi,

On Wed, 12 Jan 2000 00:12:55 +0200 (IST), Gadi Oxman <[EMAIL PROTECTED]> said:

> Stephen, I'm afraid that there are some misconceptions about the
> RAID-5 code.

I don't think so --- I've been through this with Ingo --- but I appreciate your feedback, since I'm getting inconsistent advice here! Please let me explain...

> In an early pre-release version of the RAID code (more than two years
> ago?), which didn't protect against that race, we indeed saw locked
> buffers changing under us from the point in which we computed the
> parity till the point in which they were actually written to the disk,
> leading to corrupted parity.

That is not the race. The race has nothing at all to do with buffers changing while they are being used for parity: that's a different problem, long ago fixed by copying the buffers.

The race I'm concerned about can occur when the raid driver wants to compute parity for a stripe and finds some of the blocks present, and clean, in the buffer cache. Raid assumes that those buffers represent what is on disk, naturally enough, so it uses them to calculate parity without rereading all of the disk blocks in the stripe.

The trouble is that the standard practice in the kernel, when modifying a buffer, is to make the change and _then_ mark the buffer dirty. If you hit that window, then the raid driver will find a buffer which doesn't match what is on disk, and will compute parity from that buffer rather than from the on-disk contents.

> 1. n dirty blocks are scheduled for a stripe write.

That's not the race. The problem occurs when only one single dirty block is scheduled for a write, and we need to find the contents of the rest of the stripe to compute parity.

> Point (2) is also incorrect; we have taken care *not* to peek into
> the buffer cache to find clean buffers and use them for parity
> calculations. We make no such assumptions.

Not according to Ingo --- can we get a definitive answer on this, please?

Many thanks,
Stephen
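To make the window concrete, here is a small self-contained C model of the sequence Stephen describes. It is a user-space illustration only: the one-byte "blocks", the dirty flag and all the names are invented for the example, not kernel code.

    #include <stdio.h>

    /* One-byte "blocks" on a 3-disk RAID-5 stripe: D0, D1, parity P. */
    static unsigned char disk_d0 = 0xAA, disk_d1 = 0x55, disk_p = 0xAA ^ 0x55;

    struct buffer { unsigned char data; int dirty; };
    static struct buffer cache_d0 = { 0xAA, 0 };    /* clean cached D0 */

    int main(void)
    {
        /* Standard 2.2 practice: modify the buffer first, set the dirty
         * bit afterwards.  The race window opens here. */
        cache_d0.data = 0xFF;                       /* b_dirty still 0! */

        /* Meanwhile raid5 writes a new D1.  To avoid re-reading D0 from
         * disk it snoops the cache, sees a "clean" D0, and trusts it: */
        unsigned char new_d1 = 0x66;
        disk_p  = cache_d0.data ^ new_d1;           /* parity from wrong D0 */
        disk_d1 = new_d1;

        cache_d0.dirty = 1;                         /* window closes -- too late */

        /* Crash before the modified D0 is ever written, then disk 0 dies
         * on reboot; reconstruct D0 from parity: */
        unsigned char rebuilt = disk_p ^ disk_d1;   /* 0x99 ^ 0x66 = 0xFF */
        printf("on-disk D0 was 0xAA, rebuilt as 0x%02x -> %s\n",
               rebuilt, rebuilt == 0xAA ? "ok" : "CORRUPT");
        return 0;
    }

Run it and the rebuilt block comes back as 0xFF instead of 0xAA: the parity was computed from a buffer that never matched the disk, so the casualty is a block that was not even being written at the time of the crash.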
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
"Stephen C. Tweedie" wrote: > > Hi, > > On Tue, 11 Jan 2000 15:03:03 +0100, mauelsha > <[EMAIL PROTECTED]> said: > > >> THIS IS EXPECTED. RAID-5 isn't proof against multiple failures, and the > >> only way you can get bitten by this failure mode is to have a system > >> failure and a disk failure at the same time. > > > To try to avoid this kind of problem some brands do have additional > > logging (to disk which is slow for sure or to NVRAM) in place, which > > enables them to at least recognize the fault to avoid the > > reconstruction of invalid data or even enables them to recover the > > data by using redundant copies of it in NVRAM + logging information > > what could be written to the disks and what not. > > Absolutely: the only way to avoid it is to make the data+parity updates > atomic, either in NVRAM or via transactions. I'm not aware of any > software RAID solutions which do such logging at the moment: do you know > of any? > AFAIK Veritas only does the first part of what i mentioned above (invalid on disk data recognition). They do logging by default for RAID5 volumes and optionaly also for RAID1 volumes. In the RAID5 (with logging) case they can figure out if an n-1 disk write took place and can rebuild the data. In case an n-m (1 < m < n) took place they can therefore at least recognize the desaster ;-) In the RAID1 (with logging) scenario they are able to recognize, which of the n mirrors have actual data and which ones don't to deliver the actual data to the user and to try to make the other mirrors consistent. But because it's a software solution without any NVRAM support they can't handle the data redundancy case. Heinz
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Hi,

On Tue, 11 Jan 2000 15:03:03 +0100, mauelsha <[EMAIL PROTECTED]> said:

> > THIS IS EXPECTED. RAID-5 isn't proof against multiple failures, and the
> > only way you can get bitten by this failure mode is to have a system
> > failure and a disk failure at the same time.
>
> To try to avoid this kind of problem some brands do have additional
> logging (to disk, which is slow for sure, or to NVRAM) in place, which
> enables them to at least recognize the fault to avoid the
> reconstruction of invalid data, or even enables them to recover the
> data by using redundant copies of it in NVRAM + logging information
> about what could be written to the disks and what not.

Absolutely: the only way to avoid it is to make the data+parity updates atomic, either in NVRAM or via transactions. I'm not aware of any software RAID solutions which do such logging at the moment: do you know of any?

--Stephen
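A transactional stripe update of the kind Stephen alludes to could, in outline, use a write-intent log like the sketch below. This is a generic illustration, not Veritas' or any shipping product's format; every structure and name here is invented.

    #include <stdio.h>

    /* A write-intent log record: made durable before data or parity is
     * touched, marked committed only after both writes land.  A real
     * log would also carry sequence numbers and checksums. */
    struct intent_record {
        int valid, committed;
        unsigned char new_data, new_parity;
    };

    #define NSTRIPE 4
    static unsigned char data_disk[NSTRIPE], parity_disk[NSTRIPE];
    static struct intent_record intent_log[NSTRIPE];

    static void stripe_write(int s, unsigned char d, unsigned char p)
    {
        /* 1. log the intent (an NVRAM or log-disk write in real life) */
        intent_log[s] = (struct intent_record){ 1, 0, d, p };
        /* 2. the two non-atomic device writes may now proceed */
        data_disk[s] = d;
        parity_disk[s] = p;
        /* 3. retire the record: the dangerous window is closed */
        intent_log[s].committed = 1;
    }

    /* After a crash: any valid-but-uncommitted record marks a stripe
     * whose parity cannot be trusted; redo it from the log instead of
     * reconstructing from possibly bogus parity. */
    static void recover(void)
    {
        for (int s = 0; s < NSTRIPE; s++)
            if (intent_log[s].valid && !intent_log[s].committed) {
                data_disk[s] = intent_log[s].new_data;
                parity_disk[s] = intent_log[s].new_parity;
                printf("stripe %d replayed from intent log\n", s);
            }
    }

    int main(void)
    {
        stripe_write(0, 0x11, 0x10);                 /* clean update */
        intent_log[1] = (struct intent_record){ 1, 0, 0x33, 0x30 };
        data_disk[1] = 0x33;   /* data landed, parity did not: crash! */
        recover();             /* replays stripe 1 only               */
        return 0;
    }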
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
"Stephen C. Tweedie" wrote: > > Hi, > > This is a FAQ: I've answered it several times, but in different places, > THIS IS EXPECTED. RAID-5 isn't proof against multiple failures, and the > only way you can get bitten by this failure mode is to have a system > failure and a disk failure at the same time. > To try to avoid this kind of problem some brands do have additional logging (to disk which is slow for sure or to NVRAM) in place, which enables them to at least recognize the fault to avoid the reconstruction of invalid data or even enables them to recover the data by using redundant copies of it in NVRAM + logging information what could be written to the disks and what not. Heinz
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Hi,

On Tue, 11 Jan 2000 20:17:22 +0100, Benno Senoner <[EMAIL PROTECTED]> said:

> Assume all RAID code - FS interaction problems get fixed; since a
> linux soft-RAID5 box has no battery backup, does this mean that we
> will lose data ONLY if there is a power failure AND a successive disk
> failure? If we lose the power, and after reboot all disks remain
> intact, can the RAID layer reconstruct all information in a safe way?

Yes.

--Stephen
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
"Stephen C. Tweedie" wrote: (...) > > 3) The soft-raid backround rebuild code reads and writes through the >buffer cache with no synchronisation at all with other fs activity. >After a crash, this background rebuild code will kill the >write-ordering attempts of any journalling filesystem. > >This affects both ext3 and reiserfs, under both RAID-1 and RAID-5. > > Interaction 3) needs a bit more work from the raid core to fix, but it's > still not that hard to do. > > So, can any of these problems affect other, non-journaled filesystems > too? Yes, 1) can: throughout the kernel there are places where buffers > are modified before the dirty bits are set. In such places we will > always mark the buffers dirty soon, so the window in which an incorrect > parity can be calculated is _very_ narrow (almost non-existant on > non-SMP machines), and the window in which it will persist on disk is > also very small. > > This is not a problem. It is just another example of a race window > which exists already with _all_ non-battery-backed RAID-5 systems (both > software and hardware): even with perfect parity calculations, it is > simply impossible to guarantee that an entire stipe update on RAID-5 > completes in a single, atomic operation. If you write a single data > block and its parity block to the RAID array, then on an unexpected > reboot you will always have some risk that the parity will have been > written, but not the data. On a reboot, if you lose a disk then you can > reconstruct it incorrectly due to the bogus parity. > > THIS IS EXPECTED. RAID-5 isn't proof against multiple failures, and the > only way you can get bitten by this failure mode is to have a system > failure and a disk failure at the same time. > > > --Stephen thank you very much for these clear explanations, Last doubt: :-) Assume all RAID code - FS interaction problems get fixed, since a linux soft-RAID5 box has no battery backup, does this mean that we will loose data ONLY if there is a power failure AND successive disk failure ? If we loose the power and then after reboot all disks remain intact can the RAID layer reconstruct all information in a safe way ? The problem is that power outages are unpredictable even in presence of UPSes therefore it is important to have some protection against power losses. regards, Benno.
[FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Hi,

This is a FAQ: I've answered it several times, but in different places, so here's a definitive answer which will be my last one: future questions will be directed to the list archives. :-)

On Tue, 11 Jan 2000 16:20:35 +0100, Benno Senoner <[EMAIL PROTECTED]> said:

>> then raid can miscalculate parity by assuming that the buffer matches
>> what is on disk, and that can actually cause damage to other data
>> than the data being written if a disk dies and we have to start using
>> parity for that stripe.

> do you know if using soft RAID5 + regular ext2 causes the same sort of
> damage, or if the corruption chances are lower when using a
> non-journaled FS ?

Sort of. See below.

> is the potential corruption caused by the RAID layer or by the FS
> layer ? (does the FS code or the RAID code need to be fixed ?)

It is caused by neither: it is an interaction effect.

> if it's caused by the FS layer, how do XFS (not here yet ;-) or
> ReiserFS behave in this case ?

They will both fail in the same way.

Right, here's the problem:

The semantics of the linux-2.2 buffer cache are not well defined with respect to write ordering. There is no policy to guide what gets written and when: the writeback caching can trickle to disk at any time, and other system components such as filesystems and the VM can force a write-back of data to disk at any time.

Journaling imposes write-ordering constraints which insist that data in the buffer cache *MUST NOT* be written to disk unless the filesystem explicitly says so.

RAID-5 needs to interact directly with the buffer cache in order to be able to improve performance. There are three nasty interactions which result:

1) RAID-5 tries to bunch writes of dirty buffers up so that all the data in a stripe gets written to disk at once. For RAID-5, this is very much faster than dribbling the stripe back one disk at a time.

   Unfortunately, this can result in dirty buffers being written to disk earlier than the filesystem expected, with the result that on a crash, the filesystem journal may not be entirely consistent.

   This interaction hits ext3, which stores its pending transaction buffer updates in the buffer cache with the b_dirty bit set.

2) RAID-5 peeks into the buffer cache to look for buffer contents in order to calculate parity without reading all of the disks in a stripe. If a journaling system tries to prevent modified data from being flushed to disk by deferring the setting of the buffer dirty flag, then RAID-5 will think that the buffer, being clean, matches the state of the disk, and so it will calculate parity which doesn't actually match what is on disk. If we crash and one disk fails on reboot, wrong parity may prevent recovery of the lost data.

   This interaction hits reiserfs, which stores its pending transaction buffer updates in the buffer cache with the b_dirty bit clear.

Both interactions 1) and 2) can be solved by making RAID-5 completely avoid buffers which have an incremented b_count reference count, and making sure that the filesystems all hold that count raised when the buffers are in an inconsistent or pinned state.

3) The soft-raid background rebuild code reads and writes through the buffer cache with no synchronisation at all with other fs activity. After a crash, this background rebuild code will kill the write-ordering attempts of any journalling filesystem.

   This affects both ext3 and reiserfs, under both RAID-1 and RAID-5.

Interaction 3) needs a bit more work from the raid core to fix, but it's still not that hard to do.
So, can any of these problems affect other, non-journaled filesystems too? Yes, 1) can: throughout the kernel there are places where buffers are modified before the dirty bits are set. In such places we will always mark the buffers dirty soon, so the window in which an incorrect parity can be calculated is _very_ narrow (almost non-existent on non-SMP machines), and the window in which it will persist on disk is also very small.

This is not a problem. It is just another example of a race window which exists already with _all_ non-battery-backed RAID-5 systems (both software and hardware): even with perfect parity calculations, it is simply impossible to guarantee that an entire stripe update on RAID-5 completes in a single, atomic operation. If you write a single data block and its parity block to the RAID array, then on an unexpected reboot you will always have some risk that the parity will have been written, but not the data. On a reboot, if you lose a disk then you can reconstruct it incorrectly due to the bogus parity.

THIS IS EXPECTED. RAID-5 isn't proof against multiple failures, and the only way you can get bitten by this failure mode is to have a system failure and a disk failure at the same time.

--Stephen
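To see why a torn stripe update alone (no buffer-cache races at all) already creates this exposure, here is a small self-contained C example; the one-byte blocks and three-disk layout are invented for illustration.

    #include <stdio.h>

    /* Three-disk RAID-5, one-byte blocks: D0, D1, P = D0 ^ D1. */
    int main(void)
    {
        unsigned char d0 = 0xAA, d1 = 0x55, p = 0xAA ^ 0x55;

        /* Update D1 to 0x77: two device writes, never atomic. */
        unsigned char new_d1 = 0x77;
        p = d0 ^ new_d1;        /* parity write completes ...          */
        /* d1 = new_d1;            ... but power fails before the data
                                    write ever reaches the platter     */

        /* On reboot D0 dies.  Reconstruction from parity gives:       */
        unsigned char rebuilt_d0 = p ^ d1;   /* 0xDD ^ 0x55 = 0x88     */
        printf("D0 was 0x%02x, rebuilt as 0x%02x -> %s\n",
               0xAA, rebuilt_d0,
               rebuilt_d0 == 0xAA ? "ok" : "CORRUPT");
        return 0;
    }

Note that the lost block, D0, was not part of the write at all: no journal can protect it, which is why this window is inherent to any non-battery-backed RAID-5, software or hardware.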