Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Sat, 13 Sep 2003 03:54:07 +1000, Russell Coker <[EMAIL PROTECTED]> wrote in message <[EMAIL PROTECTED]>:

> On Sat, 13 Sep 2003 02:01, Rich Puhek wrote:
> > Ted will know a lot more about this than I do, but I'd think that if the first two superblocks are corrupt, the likelihood of superblock number 3 or whatever being good is pretty low compared to the odds that the drive/partition is shot. Perhaps that's why e2fsck just gives up on the extra superblocks? Of course, then why bother including them?
>
> In principle it seems to be always a good idea to have more copies of your data than the software knows how to deal with automatically. Then if the software screws up and mangles everything it touches you may still have a chance to manually do whatever is necessary to save it.
>
> I recall a story about a tape drive that became damaged in a way that made it destroy every tape put in it. When some data needed to be restored, the first tape didn't work; they tried it in a second drive and it was proven to be dead. They got a second backup and repeated the same procedure...
>
> It was only when they were down to their last backup that someone got wise and used a different tape drive for the first attempt, which resulted in the data being read without any errors.
>
> In that situation if a tape robot had control then it would certainly have trashed all copies of the data. I can imagine similar things happening to a file system with a dying hard disk.

..agreed, but there are vast differences between "the first 2", "every other" and "all". ;-)

--
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
Scenarios always come in sets of three: best case, worst case, and just in case.

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Sat, 13 Sep 2003 02:01, Rich Puhek wrote:
> Ted will know a lot more about this than I do, but I'd think that if the first two superblocks are corrupt, the likelihood of superblock number 3 or whatever being good is pretty low compared to the odds that the drive/partition is shot. Perhaps that's why e2fsck just gives up on the extra superblocks? Of course, then why bother including them?

In principle it seems to be always a good idea to have more copies of your data than the software knows how to deal with automatically. Then if the software screws up and mangles everything it touches you may still have a chance to manually do whatever is necessary to save it.

I recall a story about a tape drive that became damaged in a way that made it destroy every tape put in it. When some data needed to be restored, the first tape didn't work; they tried it in a second drive and it was proven to be dead. They got a second backup and repeated the same procedure...

It was only when they were down to their last backup that someone got wise and used a different tape drive for the first attempt, which resulted in the data being read without any errors.

In that situation if a tape robot had control then it would certainly have trashed all copies of the data. I can imagine similar things happening to a file system with a dying hard disk.

--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
Arnt Karlsen wrote:
> ..and after a journal death, and fsck, the raid set will be able to re-establish itself, no? Or does the journal do both/all disks in a raid set?

The FS doesn't know or care about RAID-anything, as far as I know. Doesn't the FS just tell /dev/hda1, /dev/sda1, or /dev/md1 to "write this data to this block"? Very oversimplified, I know, but it doesn't seem like RAID should be part of the discussion here (aside from the fact that a RAID1 or RAID5 config *may* reduce the occurrence of problems that would bring journaling into play).

> > > ..how does the journalling system choose which blocks to work from? What I've been able to see, the journal dies when their super blocks go bad?
> >
> > The filesystem needs the superblock in order to find the journal. If you have a single gigantic filesystem mounted on /, then if the primary superblock is corrupted, the kernel will not be able to mount /, and you're hosed.
> >
> > E2fsck will automatically try the primary superblock, and if that is corrupt, it will try the first backup superblock. If that is corrupted as well, a human will need to manually try one of the other backup superblocks.
>
> ..this can be tuned to try more blocks before whining for manpower?

Ted will know a lot more about this than I do, but I'd think that if the first two superblocks are corrupt, the likelihood of superblock number 3 or whatever being good is pretty low compared to the odds that the drive/partition is shot. Perhaps that's why e2fsck just gives up on the extra superblocks? Of course, then why bother including them?

I've had a bunch of Debian systems running on various (sometimes crappy) hardware for years. I've seen very few cases where a superblock was corrupt and e2fsck puked. In each case, it was on a drive that was old enough that it wasn't worth fussing over any more, so I just replaced the drive. Some of the drives are happy running on wintel boxes, others are just paperweights.

> > If your primary superblock is getting corrupted often, then first of all, you should try to figure out why this is happening, and take affirmative action to prevent it. (The fact that you're reporting marginal power is supremely suspicious; marginal power can cause disk corruptions very easily. Getting higher quality power supplies will help, but a UPS is the first thing I would get.)
>
> ..yeah, I'm working on the power bit. ;-)
>
> > Secondly, you're better off using a small root filesystem that generally isn't modified often. What I normally do is use a 128 meg root filesystem, with a separate /var partition (or /var symlinked to /usr/var), and /tmp as a ram disk. With the root filesystem rarely changing, it's much less likely that it will be corrupted due to hardware problems. Then the root filesystem can come up, and e2fsck can repair the other filesystems.
>
> ..yeah, except for /tmp on ramdisk, that's how I do my boxes, and my isp business client is learning his lesson good. ;-)
>
> > But I repeat, your filesystems shouldn't be getting corrupted in the first place. Using a separate root filesystem is a good idea, and will help you recover from hardware problems, but your primary priority should be to avoid the hardware problems in the first place.
> >
> > - Ted

--
Rich Puhek
ETN Systems Inc.
2125 1st Ave East
Hibbing MN 55746
tel: 218.262.1130
email: [EMAIL PROTECTED]
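
For readers who end up having to "manually try one of the other backup superblocks", here is a rough sketch of where those backups normally live. It is not taken from the thread: it assumes the usual mke2fs defaults (8 * block_size blocks per group, first data block 1 for 1 KiB blocks and 0 otherwise) and the sparse_super layout, where backups sit in group 1 and in groups that are powers of 3, 5 and 7. The function names are made up for illustration; on a real system, dumpe2fs or "mke2fs -n" reports the actual locations and should be trusted over this arithmetic.

```python
from math import ceil


def is_power_of(n, base):
    """Return True if n == base**k for some integer k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def backup_superblock_blocks(total_blocks, block_size=4096, sparse_super=True):
    """List the block numbers that should hold backup superblocks.

    Assumes default geometry (hypothetical helper, not an e2fsprogs API):
    blocks_per_group = 8 * block_size, and the first data block is 1 for
    1 KiB blocks, 0 for larger block sizes.
    """
    blocks_per_group = 8 * block_size
    first_data_block = 1 if block_size == 1024 else 0
    group_count = ceil((total_blocks - first_data_block) / blocks_per_group)

    backups = []
    for group in range(1, group_count):
        # Without sparse_super every group carries a copy; with it, only
        # group 1 and groups that are powers of 3, 5 or 7 do.
        if (not sparse_super or group == 1
                or any(is_power_of(group, b) for b in (3, 5, 7))):
            backups.append(first_data_block + group * blocks_per_group)
    return backups


if __name__ == "__main__":
    # A filesystem of 1,000,000 blocks of 4 KiB each.
    for block in backup_superblock_blocks(1_000_000, 4096):
        print(block)
```

One of the printed block numbers is what would be handed to "e2fsck -b <block>" (with "-B <blocksize>" if e2fsck cannot work out the block size on its own).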
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Thu, 11 Sep 2003 14:03:17 -0400, Theodore Ts'o <[EMAIL PROTECTED]> wrote in message <[EMAIL PROTECTED]>: > On Thu, Sep 11, 2003 at 02:04:19AM +0200, Arnt Karlsen wrote: > > ..I still believe in raid-1, but, ext3fs??? > > > > ..how does xfs, jfs and Reiserfs compare? > > If you have random disk corruptions happening as often as you are, no > filesystem is going to be able to help you. The only question is how > quickly the filesystem notices *before* user data starts getting > irrecovably lost. Ext3 generally tends to be one of the more paranoid > filesystems about checking assertions and "should never happen cases", > although I don't know how it compares to reiserfs, jfs, et. al. ..ok, how about ext3 versus ext2 on raid-1? > > > Unless you're talking about *software* RAID-1 under Linux, and the > > > > ..bingo, I should have said so. > > > > > fact that you have to rebuild mirror after an unclean shutdown, > > > but that's arguably a defect in the software RAID 1 > > > implementation. On other systems, such as AIX's software RAID-1, > > > the RAID-1 is implemented with a journal, > > > > ..but software RAID-1 under Linux is not or did I miss something > > here? > > No, software RAID-1 does not do journalling at the RAID level. That > means that in the case of a unclean shutdown, the RAID system will > need to restablish the mirror. ..and after a journal death, and fsck, the raid set will be able to re-establish itself, no? Or does the journal do both/all disks in a raid set? > As I said, this is a performance issue, since half the disk bandwidth > of the RAID array will be diverted to restablishing the mirror during > the unclean shutdown. Note also this is true *regardless* of what > filesystem you use, journaling and non-journaling. ..noted, non-issue in my case. > > ..ok, for my throttle boxes, here is where I should honk the > > horn and divert logging to a log server and schedule a fsck? > > (And ofcourse just reboot my mailservers on the same error.) 
> > For your throttle boxes, do you need to have any writes to your > filesystems at all? If what you care about is zero downtime, why not > just run syslog over the network, and keep all of your filesystems > mounted read/only? Some extreme configurations I've seen (especially > where ISP's don't have direct/easy access to their systems at remote > POP's), use a read-only flash filesystem, and a ramdisk for /tmp, and > no spinning disks at all. This significantly increases reliability > caused by disk failures, since the hard drive is often the most > vulnerable part of the system, especially in the face of heat > vibrations, etc. ..sounds like an idea. The major point against is geography, I like to arrive at stand-alone one-box solutions, but networked logging is a good way to verify the network status. What is used, ssh tunnels? > > ..IMHO the debian bootstrap should first read the rpm database > > and generate a deb database, and then do 'apt-get update && \ > > apt-get dist-upgrade'. _Is_ there such a bootstrap beast? > > While this would be interesting for those people who are converting > from Red Hat to Debian, it's a lot more complicated than that, since > you also have to convert over the configuration files; Red Hat and > Debian don't necessarily store files in the same location. ..I know. ;-) > I generally find that for production systems, it's much safer and > simpler to install Debian on a new disk (and on a new system), and > then copy over the new configuration files over. That way, you can > test the system and make sure everything is A-OK before cutting over > something on a production system. ..yeah, my pipe dream. ;-) > (By the way, it seems like 50% of your problems is that you're doing > things on the cheap, and yet you still want 100% reliability. 
If you > want "carrier-grade reliability", you need to pay a little bit extra, > and do things like have hot spares, and installation scripts that > allow you to create and configure new servers automatically, without > needing manual handwork.) ..hey, the isp shop is not mine, and it _is_ a small operation, so I need to grow it so I can charge'em. ;-) These guys are Wintendo convertites, and I do the hard stuff for 'em. ;-) > > ..256MB, but the disks may be marginal, on the known bad disks I get > > write errors. I have seen this same error on power "blinks", > > failures lasting for about a 1/3 of a second without losing monitor > > sync etc on my desktops, once frying a power supply, but usually > > these "blinks" cause no harm. > > Sounds like you have marginal power. Do you have a UPS (preferably a > continuous UPS) to protect your systems? If not, why not? (Again, > it's a bad idea to expect "carrier-grade relaibility" when you're not > willing pay for the basic high-quality equipment, backup equipment, > and devices such as UPS's to protect your equipment.) ..2 different sites, I have marginal power in my l
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Thu, Sep 11, 2003 at 02:04:19AM +0200, Arnt Karlsen wrote:
> ..I still believe in raid-1, but, ext3fs???
>
> ..how does xfs, jfs and Reiserfs compare?

If you have random disk corruptions happening as often as you are, no filesystem is going to be able to help you. The only question is how quickly the filesystem notices *before* user data starts getting irrecoverably lost. Ext3 generally tends to be one of the more paranoid filesystems about checking assertions and "should never happen" cases, although I don't know how it compares to reiserfs, jfs, et al.

There have certainly been cases in the past where people were convinced that there was a bug in ext2, since other filesystems (minix in this particular case) weren't reporting the problem. But it turned out to be a buffer cache bug, and it was simply that other filesystems were not doing the appropriate assertion checks, and user data was getting lost; the system administrator was just left in blissful ignorance.

> > Unless you're talking about *software* RAID-1 under Linux, and the
>
> ..bingo, I should have said so.
>
> > fact that you have to rebuild mirror after an unclean shutdown, but that's arguably a defect in the software RAID 1 implementation. On other systems, such as AIX's software RAID-1, the RAID-1 is implemented with a journal,
>
> ..but software RAID-1 under Linux is not or did I miss something here?

No, software RAID-1 does not do journalling at the RAID level. That means that in the case of an unclean shutdown, the RAID system will need to re-establish the mirror.

As I said, this is a performance issue, since half the disk bandwidth of the RAID array will be diverted to re-establishing the mirror after the unclean shutdown. Note also this is true *regardless* of what filesystem you use, journaling or non-journaling.

> ..ok, for my throttle boxes, here is where I should honk the horn and divert logging to a log server and schedule a fsck? (And of course just reboot my mailservers on the same error.)

For your throttle boxes, do you need to have any writes to your filesystems at all? If what you care about is zero downtime, why not just run syslog over the network, and keep all of your filesystems mounted read-only? Some extreme configurations I've seen (especially where ISPs don't have direct/easy access to their systems at remote POPs) use a read-only flash filesystem, a ramdisk for /tmp, and no spinning disks at all. This significantly improves reliability against disk failures, since the hard drive is often the most vulnerable part of the system, especially in the face of heat, vibrations, etc.

> ..IMHO the debian bootstrap should first read the rpm database and generate a deb database, and then do 'apt-get update && apt-get dist-upgrade'. _Is_ there such a bootstrap beast?

While this would be interesting for those people who are converting from Red Hat to Debian, it's a lot more complicated than that, since you also have to convert over the configuration files; Red Hat and Debian don't necessarily store files in the same location.

I generally find that for production systems, it's much safer and simpler to install Debian on a new disk (and on a new system), and then copy the configuration files over. That way, you can test the system and make sure everything is A-OK before cutting over something on a production system.

(By the way, it seems like 50% of your problems is that you're doing things on the cheap, and yet you still want 100% reliability. If you want "carrier-grade reliability", you need to pay a little bit extra, and do things like have hot spares, and installation scripts that allow you to create and configure new servers automatically, without needing manual handwork.)

> ..256MB, but the disks may be marginal, on the known bad disks I get write errors. I have seen this same error on power "blinks", failures lasting for about a 1/3 of a second without losing monitor sync etc on my desktops, once frying a power supply, but usually these "blinks" cause no harm.

Sounds like you have marginal power. Do you have a UPS (preferably a continuous UPS) to protect your systems? If not, why not? (Again, it's a bad idea to expect "carrier-grade reliability" when you're not willing to pay for the basic high-quality equipment, backup equipment, and devices such as UPSes to protect your equipment.)

> ..ah. So with a 30GB /var ext3fs raid-1 I would have 25% or 13% consumed by backup copies of the superblock and block group descriptors?

It's an order n**2 problem; so it's not a linear relationship. And most people get annoyed by that kind of overhead, long before it gets to 10% or above.

> ..how does the journalling system choose which blocks to work from? What I've been able to see, the journal dies when their super blocks go bad?

The filesystem needs the superblock in order to find the journal. If you have a single gigantic filesyste
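
The "order n**2" remark about backup superblocks and group descriptors can be illustrated with a little arithmetic. This is only a sketch under stated assumptions, not something from the thread: it assumes a layout without sparse_super (every block group keeps a copy of the superblock and of the whole group descriptor table), 32-byte group descriptors, and the default 8 * block_size blocks per group. The function name is hypothetical and the numbers are illustrative rather than a measurement of any real filesystem.

```python
from math import ceil

DESCRIPTOR_BYTES = 32  # assumed size of one ext2/ext3 group descriptor


def backup_overhead(fs_bytes, block_size):
    """Estimate the space taken by superblock and descriptor-table copies.

    Returns (groups, descriptor_table_blocks, fraction_of_filesystem).
    Assumes every group, including group 0, holds one superblock block
    plus a full copy of the group descriptor table (no sparse_super).
    """
    total_blocks = fs_bytes // block_size
    blocks_per_group = 8 * block_size
    groups = ceil(total_blocks / blocks_per_group)
    # The descriptor table grows with the number of groups...
    gdt_blocks = ceil(groups * DESCRIPTOR_BYTES / block_size)
    # ...and is stored once per group, hence groups * groups growth.
    overhead_blocks = groups * (1 + gdt_blocks)
    return groups, gdt_blocks, overhead_blocks / total_blocks


if __name__ == "__main__":
    for gib in (30, 60, 120):
        groups, gdt, fraction = backup_overhead(gib * 2**30, 1024)
        print(f"{gib} GiB, 1 KiB blocks: {groups} groups, "
              f"{gdt} descriptor blocks per copy, {fraction:.2%} of the disk")
```

Doubling the filesystem size doubles both the number of groups and the size of each descriptor-table copy, so the total space spent on copies roughly quadruples, which is the non-linear behaviour described above.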
Re: FS performace with lots of files, was: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
Cameron Moore wrote:
> * [EMAIL PROTECTED] (Russell Coker) [2003.09.10 20:16]:
> > Also you can't have a ReiserFS file system mounted read-only while fsck'ing it. Which makes recovering errors on the root FS very interesting to say the least.
>
> What I hate about ext3 is that it poorly handles dirs with 1000+ files. Haven't seen if they've fixed that yet.

There exists a patch (http://people.nl.linux.org/~phillips/htree/ - I think there are other resources out there somewhere ;)) for 2.4.x, but the code should be in the kernel since 2.4.20 for ext2, and for ext3 it seems that it was available before (but there are some 2.4.19 patches out there: http://lwn.net/Articles/11330/) - hopefully somebody can shed some light on this...

regards
Markus
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Thu, 11 Sep 2003 13:22, Cameron Moore wrote:
> > Having a file system decide to panic the kernel because your mount options instructed it to (ext3) is one thing. Having the file system driver corrupt random kernel memory and cause an Oops (Reiser) is another. The ReiserFS team's response to such issues has not made me happy so I am removing it from all my machines and converting to Ext3.
>
> Can you provide links to your discussions with the ReiserFS team? I'm considering using ReiserFS on some mail servers. Please share your experiences.

It was on the reiserfs list a couple of months ago. They told me that it would be impossible to check all data for consistency when reading it from disk without having a huge performance hit. Ext3 appears to manage this (or at least corrupt ext2/3 file systems tend not to cause kernel memory corruption).

--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
* [EMAIL PROTECTED] (Russell Coker) [2003.09.10 20:16]:
> On Thu, 11 Sep 2003 10:04, Arnt Karlsen wrote:
> > ..I still believe in raid-1, but, ext3fs???
> > ..how does xfs, jfs and Reiserfs compare?
>
> ReiserFS has many situations where file system corruption can make operations such as "find /" trigger a kernel Oops.
>
> Having a file system decide to panic the kernel because your mount options instructed it to (ext3) is one thing. Having the file system driver corrupt random kernel memory and cause an Oops (Reiser) is another. The ReiserFS team's response to such issues has not made me happy so I am removing it from all my machines and converting to Ext3.

Can you provide links to your discussions with the ReiserFS team? I'm considering using ReiserFS on some mail servers. Please share your experiences.

> Also you can't have a ReiserFS file system mounted read-only while fsck'ing it. Which makes recovering errors on the root FS very interesting to say the least.

What I hate about ext3 is that it poorly handles dirs with 1000+ files. Haven't seen if they've fixed that yet.

--
Cameron Moore
[ Smoking cures weight problems... eventually. ]
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Thu, 11 Sep 2003 10:04, Arnt Karlsen wrote:
> ..I still believe in raid-1, but, ext3fs???
>
> ..how does xfs, jfs and Reiserfs compare?

ReiserFS has many situations where file system corruption can make operations such as "find /" trigger a kernel Oops.

Having a file system decide to panic the kernel because your mount options instructed it to (ext3) is one thing. Having the file system driver corrupt random kernel memory and cause an Oops (Reiser) is another. The ReiserFS team's response to such issues has not made me happy so I am removing it from all my machines and converting to Ext3.

Also you can't have a ReiserFS file system mounted read-only while fsck'ing it. Which makes recovering errors on the root FS very interesting to say the least.

--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Wed, 10 Sep 2003 14:39:44 -0400, Theodore Ts'o <[EMAIL PROTECTED]> wrote in message <[EMAIL PROTECTED]>:

> On Wed, Sep 10, 2003 at 01:36:32AM +0200, Arnt Karlsen wrote:
> > > But for an unattended server, most of the time it's probably better to force the system to reboot so you can restore service ASAP.
> >
> > ..even for raid-1 disks??? _Is_ there a combination of raid-1 and journalling fs'es for linux that's ready for carrier grade service?
>
> I'm not sure what you're referring to here.

..isp gateway boxes that I like to keep running 24/7/365.2442etc. The idea behind using ext3fs and raid-1 was to minimize the risks and downtime.

..I still believe in raid-1, but, ext3fs???

..how does xfs, jfs and Reiserfs compare?

..what I like with ext3 is it can be mounted as ext2 in a pinch, assuming you can get to /etc/fstab. ;-)

> As far as I'm concerned, if the filesystem is inconsistent, panic'ing and letting the system get back to a known state is always the right answer. RAID-1 shouldn't be an issue here.

..shouldn't be a problem, agreed.

> Unless you're talking about *software* RAID-1 under Linux, and the

..bingo, I should have said so.

> fact that you have to rebuild mirror after an unclean shutdown, but that's arguably a defect in the software RAID 1 implementation. On other systems, such as AIX's software RAID-1, the RAID-1 is implemented with a journal,

..but software RAID-1 under Linux is not or did I miss something here?

> so that there is no need to rebuild the mirror after an unclean shutdown. Alternatively, you could use a hardware RAID-1 solution, which also wouldn't have a problem with unclean shutdowns.
>
> In any case, the speed hit for doing a panic with the current Linux MD implementation is a performance issue, and in my book reliability takes precedence over performance. So yes, even for RAID-1, and it doesn't matter what filesystem, if there's a problem, you should reboot. If you don't like the resulting performance hit after the panic, get a hardware RAID controller.

..agreed, and disagreed; for my isp gateway throttles, reboots mean isp service downtime. Logs can be tee'ed to log servers. For a mail server I agree fully.

> > > I'm not sure what you mean by this. When there is a filesystem error
> >
> > ..add an "healthy" dose of irony to repair in "repair". ;-)
> >
> > > detected, all writes to the filesystem are immediately aborted, which
> >
> > ...precludes reporting the error?
>
> No, if you are using a networked syslog daemon, it certainly does not preclude reporting the error. If you mean the case where there is a filesystem error on the partition where /var/log resides, yes, we consider it better to abort writes to the filesystem than to attempt to write out the log message to a compromised filesystem.

..ok, for my throttle boxes, here is where I should honk the horn and divert logging to a log server and schedule a fsck? (And of course just reboot my mailservers on the same error.)

..bottom line is that same journal death needs different medication depending on which purpose etc the box serves.

> > .._exactly_, but it is not reported to any of the system users. A system reboot _is_ reported usefully to the system users, all tty users get the news.
>
> The message that a filesystem has been remounted read-only is logged as a KERN_CRIT message. If you wish, you can configure your syslog.conf so that all tty users are notified of kern.crit level

..doh! I _like_ fixes this simple. ;-)

> errors. That's probably a good thing, although it's not clear that a typical user will understand what to do when they are told that a filesystem has been remounted read-only.

..so clue whack'em. On a desktop they are not gonna lose much more than 5 seconds worth of work with the default commits, and scaring them with 30 years research work loss is good way to slap'em into doing the right things.

> Certainly it is trivial to configure sysklogd to grab that message and do whatever you would like with it, if you were to so choose. If you want to "honk the big horn", that is certainly within your power to make the system do that.
>
> If you believe that Red Hat should configure their syslog.conf files to do this by default, feel free to submit a bug report / suggestion with Red Hat.

..heh, the last time I tried that, was: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=89171

..the "This is a duplicate of various other bugs. We're looking at the issues involved." is what brought me over to Debian.

..now, if installing and raid disks could be as easy...

..IMHO the debian bootstrap should first read the rpm database and generate a deb database, and then do 'apt-get update && apt-get dist-upgrade'. _Is_ there such a bootstrap beast?

> > > of uncommitted data which has not been written out to disk.) So in general, not r
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Wed, Sep 10, 2003 at 01:36:32AM +0200, Arnt Karlsen wrote:
> > But for an unattended server, most of the time it's probably better to force the system to reboot so you can restore service ASAP.
>
> ..even for raid-1 disks??? _Is_ there a combination of raid-1 and journalling fs'es for linux that's ready for carrier grade service?

I'm not sure what you're referring to here. As far as I'm concerned, if the filesystem is inconsistent, panic'ing and letting the system get back to a known state is always the right answer. RAID-1 shouldn't be an issue here.

Unless you're talking about *software* RAID-1 under Linux, and the fact that you have to rebuild mirror after an unclean shutdown, but that's arguably a defect in the software RAID 1 implementation. On other systems, such as AIX's software RAID-1, the RAID-1 is implemented with a journal, so that there is no need to rebuild the mirror after an unclean shutdown. Alternatively, you could use a hardware RAID-1 solution, which also wouldn't have a problem with unclean shutdowns.

In any case, the speed hit for doing a panic with the current Linux MD implementation is a performance issue, and in my book reliability takes precedence over performance. So yes, even for RAID-1, and it doesn't matter what filesystem, if there's a problem, you should reboot. If you don't like the resulting performance hit after the panic, get a hardware RAID controller.

> > I'm not sure what you mean by this. When there is a filesystem error
>
> ..add an "healthy" dose of irony to repair in "repair". ;-)
>
> > detected, all writes to the filesystem are immediately aborted, which
>
> ...precludes reporting the error?

No, if you are using a networked syslog daemon, it certainly does not preclude reporting the error. If you mean the case where there is a filesystem error on the partition where /var/log resides, yes, we consider it better to abort writes to the filesystem than to attempt to write out the log message to a compromised filesystem.

> .._exactly_, but it is not reported to any of the system users. A system reboot _is_ reported usefully to the system users, all tty users get the news.

The message that a filesystem has been remounted read-only is logged as a KERN_CRIT message. If you wish, you can configure your syslog.conf so that all tty users are notified of kern.crit level errors. That's probably a good thing, although it's not clear that a typical user will understand what to do when they are told that a filesystem has been remounted read-only.

Certainly it is trivial to configure sysklogd to grab that message and do whatever you would like with it, if you were to so choose. If you want to "honk the big horn", that is certainly within your power to make the system do that.

If you believe that Red Hat should configure their syslog.conf files to do this by default, feel free to submit a bug report / suggestion with Red Hat.

> > of uncommitted data which has not been written out to disk.) So in general, not running the journal will leave you in a worse state after rebooting, compared to running the journal.
>
> ..it appears my experience disagrees with your expertise here. With more data, I would have been able to advise intelligently on when to and when not to run the journal, I believe we agree not running the journal is advisable if the system has been left limping like this for a few hours.

How long the system has been left limping doesn't really matter. The real issue is that there may be critical data that has been written to the journal that was not written to the filesystem before the journal was aborted and the filesystem left in a read-only state. This might, for example, include a user's thesis or several years of research. (Why such work might not be backed up is a question I will leave for another day, and falls into the "criminally negligent system administrator" category.)

In general, you're better off running the journal after a journal abort. You may think you have experiences to the contrary, but are you sure? Unless you snapshot the entire filesystem, and try it both ways, you can't really know for sure.

There are classes of errors where the filesystem has been completely trashed, and whether or not you run the journal won't make a bit of difference.

The much more important question is to figure out why the filesystem got trashed in the first place. Do you have marginal memory? Hard drives? Are you running a beta-test kernel that might be buggy? Fixing the proximate cause is always the most important thing to do; since in the end, no matter how clever a filesystem, if you have buggy hardware or buggy device drivers, in the end you *will* be screwed. A filesystem can't compensate for those sorts of shortcomings.

> ..and, on a raid-1 disk set, a failure oughtta cut off the one bad fs and not shoot down the entire raid set because that one fs fails.

I agree. When i
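
As a complement to the syslog.conf route described above, a periodic check can also "honk the horn" when a filesystem has quietly gone read-only. The following is a minimal sketch, not anything proposed in the thread: it assumes the usual /proc/mounts layout (device, mount point, type, comma-separated options) and the function name is invented for the example. Note that it cannot tell an error-triggered remount from a filesystem that was deliberately mounted read-only, so the caller has to know which mounts are expected to be writable.

```python
def readonly_mounts(mounts_text, fstypes=("ext2", "ext3")):
    """Return (device, mountpoint) pairs currently mounted read-only.

    mounts_text is the content of /proc/mounts (or text in that format).
    Only filesystem types listed in fstypes are considered.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # "ro" must be matched as a whole option, not as a substring
        # of something like "errors=remount-ro".
        if fstype in fstypes and "ro" in options.split(","):
            hits.append((device, mountpoint))
    return hits


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        for device, mountpoint in readonly_mounts(handle.read()):
            print(f"WARNING: {device} on {mountpoint} is mounted read-only")
```

Run from cron, the printed warning would typically be mailed to the administrator; forwarding it to a remote log host instead is a matter of local preference.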
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Mon, 8 Sep 2003 12:05:24 -0400,
Theodore Ts'o <[EMAIL PROTECTED]> wrote in message
<[EMAIL PROTECTED]>:

> On Sun, Sep 07, 2003 at 07:24:27PM +0200, Arnt Karlsen wrote:
> > > What happens on error conditions can be set through tune2fs or as
> > > a mount option.  Having it remount read-only is probably better
> > > than panicking the kernel.
> >
> > ..yeah, except in /var/log, /var/spool et al, I also lean towards
> > panic in /home.
>
> I tend to use remount read-only feature on desktops, where it's useful
> for me to be able to save my work on some other filesystem before I
> reboot my system.

..remount read-only is ok, as long as the bugle blows.  IME, it doesn't.

> But for an unattended server, most of the time it's probably better to
> force the system to reboot so you can restore service ASAP.

..even for raid-1 disks???  _Is_ there a combination of raid-1 and
journalling fs'es for linux that's ready for carrier grade service?

> > > When it happens a reboot may be a good idea, in which case a fsck
> > > to fix the problem should occur automatically.
> >
> > ..should, agrrrRRRrrreed.  IME (RH73 - RH9 and woody) it does
> > not.
> >
> > ..what happens is the journaling dies, leaving a good fs intact,
> > on rebooting, the dead journal will "repair" the fs wiping good
> > data off the fs.
>
> I'm not sure what you mean by this.  When there is a filesystem error

..add a "healthy" dose of irony to repair in "repair". ;-)

> detected, all writes to the filesystem are immediately aborted, which

...precludes reporting the error?

> means the filesystem on disk is left in an unstable state.  (It may
> look consistent while the system is still running, but there is a lot

.._exactly_, but it is not reported to any of the system users.
A system reboot _is_ reported usefully to the system users, all
tty users get the news.

> of uncommitted data which has not been written out to disk.)  So in
> general, not running the journal will leave you in a worse state after
> rebooting, compared to running the journal.

..it appears my experience disagrees with your expertise here.
With more data, I would have been able to advise intelligently
on when to and when not to run the journal, I believe we agree
not running the journal is advisable if the system has been
left limping like this for a few hours.

> An alternative course of action, which we don't currently support
> would be to attempt to write everything to disk and quiesce the
> filesystem before remounting it read-only.  The problem is that trying
> to flush everything out to disk might leave things in a worse state
> than just freezing all writes.

..could a ramdisk help?  As in; store in ramdisk between journal
commits and honk the big horn on non-recoverable errors?

..and, on a raid-1 disk set, a failure oughtta cut off the one bad
fs and not shoot down the entire raid set because that one fs fails.

> The real problem is that in the face of filesystem corruption, by the
> time the filesystem notices that something is wrong, there may be
> significant damage that has already taken place.  Some of it may
> already have been written to journal, in which case not replaying the
> journal might leave you with more data to recover; on the other hand,
> not replaying the journal could also risk leaving your filesystem very
> badly corrupted with data which the mail server had promised it had
> accepted, not actually getting saved by the filesystem.
>
> A human could make a read/write snapshot of the filesystem and try it
> both ways, but if you want automatic recovery, it's probably better to
> run the journal than not to run it.

..agreed, and with ext3 on a raid-1 set, this _oughtta_ be easy.

> > ..the errors=remount-ro fstab option remounts the fs ro but fails
> > to tell the system, so the system merrily "logs" data and "accepts"
> > mail etc 'till Dooms Day, and especially on raid-1 disks I sort of
> > expected redundancy, like in "autofeather the bad prop and trim out
> > the yaw" and "autopatch that holed fuel tank", and "auto-sync the
> > props", I mean, this was done _60_years_ ago in aviation to help
> > win WWII, and ext3 on raid-1 floats around USS Yorktown-style???
>
> If the system merrily logs data and accepts it, even after the
> filesystem is remounted read-only, that implies that the MTA is
> horribly buggy, not doing the most basic of error return code checks.

..agreed, any pointers or hints on such basics?

> If the filesystem is remounted read-only, then writes to the
> filesystem *will* return an error.  If the application doesn't notice,
> then it's the application which is at fault, not ext3.

..on Woody, ext3 actually reports the remount to /dev/console. ;-)
_Nothing_ elsewhere.  Dunno about Red Hat, never had one hooked
to a monitor upon a journal failure.

..all I know is RH-7.3-8-9 and Woody do _not_ report ext3 journal
failures in any way I am aware of and can make use of
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Sun, Sep 07, 2003 at 07:24:27PM +0200, Arnt Karlsen wrote:
> > What happens on error conditions can be set through tune2fs or as a
> > mount option.  Having it remount read-only is probably better than
> > panicking the kernel.
>
> ..yeah, except in /var/log, /var/spool et al, I also lean towards
> panic in /home.

I tend to use the remount read-only feature on desktops, where it's
useful for me to be able to save my work on some other filesystem
before I reboot my system.

But for an unattended server, most of the time it's probably better to
force the system to reboot so you can restore service ASAP.

> > When it happens a reboot may be a good idea, in which case a fsck to
> > fix the problem should occur automatically.
>
> ..should, agrrrRRRrrreed.  IME (RH73 - RH9 and woody) it does not.
>
> ..what happens is the journaling dies, leaving a good fs intact,
> on rebooting, the dead journal will "repair" the fs wiping good
> data off the fs.

I'm not sure what you mean by this.  When there is a filesystem error
detected, all writes to the filesystem are immediately aborted, which
means the filesystem on disk is left in an unstable state.  (It may
look consistent while the system is still running, but there is a lot
of uncommitted data which has not been written out to disk.)  So in
general, not running the journal will leave you in a worse state after
rebooting, compared to running the journal.

An alternative course of action, which we don't currently support,
would be to attempt to write everything to disk and quiesce the
filesystem before remounting it read-only.  The problem is that trying
to flush everything out to disk might leave things in a worse state
than just freezing all writes.

The real problem is that in the face of filesystem corruption, by the
time the filesystem notices that something is wrong, there may be
significant damage that has already taken place.  Some of it may
already have been written to the journal, in which case not replaying
the journal might leave you with more data to recover; on the other
hand, not replaying the journal could also risk leaving your
filesystem very badly corrupted with data which the mail server had
promised it had accepted, not actually getting saved by the
filesystem.

A human could make a read/write snapshot of the filesystem and try it
both ways, but if you want automatic recovery, it's probably better to
run the journal than not to run it.

> ..the errors=remount-ro fstab option remounts the fs ro but fails
> to tell the system, so the system merrily "logs" data and "accepts"
> mail etc 'till Dooms Day, and especially on raid-1 disks I sort of
> expected redundancy, like in "autofeather the bad prop and trim out
> the yaw" and "autopatch that holed fuel tank", and "auto-sync the
> props", I mean, this was done _60_years_ ago in aviation to help
> win WWII, and ext3 on raid-1 floats around USS Yorktown-style???

If the system merrily logs data and accepts it, even after the
filesystem is remounted read-only, that implies that the MTA is
horribly buggy, not doing the most basic of error return code checks.

If the filesystem is remounted read-only, then writes to the
filesystem *will* return an error.  If the application doesn't notice,
then it's the application which is at fault, not ext3.

That being said, my preference for servers is to panic immediately on
the first sign of trouble, and let the system fsck and come back
again.  Even if your MTA is non-criminally-negligent, and checks error
codes, the best it can do is return an SMTP temporary failure, which
still doesn't keep the mail flowing.  You're probably best off
rebooting the machine and restoring service.

                                                - Ted
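
The point above about checking error return codes can be sketched
roughly as follows.  This is only an illustration, not how any
particular MTA is implemented: store_message and _temp_failure are
hypothetical names, and mapping every storage failure to an SMTP 451
reply is an assumption made for the sketch.  Once a filesystem has
been remounted read-only, open() or write() fails with EROFS, so a
caller that checks for the error cannot quietly "accept" mail it never
stored.

```python
import errno
import os
from typing import Tuple


def _temp_failure(exc: OSError) -> Tuple[int, str]:
    """Map a storage error to an SMTP temporary-failure reply (hypothetical helper)."""
    if exc.errno == errno.EROFS:
        reason = "read-only filesystem"
    else:
        reason = os.strerror(exc.errno)
    return 451, "Requested action aborted: " + reason


def store_message(path: str, data: bytes) -> Tuple[int, str]:
    """Write a message to disk and only report success once fsync() has returned."""
    try:
        # O_EXCL: refuse to overwrite an existing spool file.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError as exc:
        return _temp_failure(exc)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)   # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)                       # force the data out before replying
    except OSError as exc:
        return _temp_failure(exc)
    finally:
        os.close(fd)
    return 250, "OK: message stored"
```

A caller would pass the reply straight back to the SMTP client, so a
failed write shows up as a deferred delivery on the sending side
rather than as silently lost mail.  The sketch leaves a partial file
behind on failure; cleaning that up is out of scope here.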
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Mon, 8 Sep 2003 00:20:12 +1000,
Russell Coker <[EMAIL PROTECTED]> wrote in message
<[EMAIL PROTECTED]>:

> On Mon, 8 Sep 2003 00:17, Arnt Karlsen wrote:
> > ..I have had a few cases of ext3fs'es, even on raid-1, going
> > read-only on errors, what do you guys use to bring them back
> > into service?
>
> What happens on error conditions can be set through tune2fs or as a
> mount option.  Having it remount read-only is probably better than
> panicking the kernel.

..yeah, except in /var/log, /var/spool et al, I also lean towards
panic in /home.

> When it happens a reboot may be a good idea, in which case a fsck to
> fix the problem should occur automatically.

..should, agrrrRRRrrreed.  IME (RH73 - RH9 and woody) it does not.

..what happens is the journaling dies, leaving a good fs intact,
on rebooting, the dead journal will "repair" the fs wiping good
data off the fs.

..compare 'df -h' and 'cat /proc/mounts' on such a system.

..the errors=remount-ro fstab option remounts the fs ro but fails
to tell the system, so the system merrily "logs" data and "accepts"
mail etc 'till Dooms Day, and especially on raid-1 disks I sort of
expected redundancy, like in "autofeather the bad prop and trim out
the yaw" and "autopatch that holed fuel tank", and "auto-sync the
props", I mean, this was done _60_years_ ago in aviation to help
win WWII, and ext3 on raid-1 floats around USS Yorktown-style???

--
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
  Scenarios always come in sets of three:
  best case, worst case, and just in case.
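
The suggestion to look at /proc/mounts can be turned into a small
check, sketched below.  It assumes the usual whitespace-separated
layout of that file (device, mount point, filesystem type, options,
...); the function names and the list of mount points that are
expected to be writable are made up for the example.

```python
def readonly_mounts(proc_mounts_text):
    """Return {mount_point: device} for entries the kernel reports as 'ro'."""
    state = {}
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # A later entry for the same mount point overrides an earlier one.
        state[mount_point] = (device, "ro" in options.split(","))
    return {mp: dev for mp, (dev, is_ro) in state.items() if is_ro}


def unexpected_readonly(proc_mounts_text, expected_writable):
    """List mount points that should be writable but are currently read-only."""
    ro = readonly_mounts(proc_mounts_text)
    return sorted(mp for mp in expected_writable if mp in ro)
```

Run from cron with the contents of /proc/mounts and a list such as
["/var/log", "/var/spool", "/home"], a non-empty result is the cue to
mail the admin or otherwise raise an alarm.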
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Mon, 8 Sep 2003 00:17, Arnt Karlsen wrote:
> ..I have had a few cases of ext3fs'es, even on raid-1, going
> read-only on errors, what do you guys use to bring them back
> into service?

What happens on error conditions can be set through tune2fs or as a
mount option.  Having it remount read-only is probably better than
panicking the kernel.

When it happens a reboot may be a good idea, in which case a fsck to
fix the problem should occur automatically.

--
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page
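
For reference, the two knobs mentioned above are the -e option of
tune2fs and the errors= mount option, both of which take one of
continue, remount-ro or panic.  The sketch below only parses an option
string and builds a command line; the helper names are invented for
the example and nothing here actually runs tune2fs.

```python
VALID_ERROR_BEHAVIOURS = ("continue", "remount-ro", "panic")


def errors_policy(mount_options):
    """Return the errors= value from a comma-separated option string, or None."""
    policy = None
    for opt in mount_options.split(","):
        key, sep, value = opt.strip().partition("=")
        if key == "errors" and sep:
            if value not in VALID_ERROR_BEHAVIOURS:
                raise ValueError("unknown errors= behaviour: %r" % value)
            policy = value  # the last errors= option wins
    return policy


def tune2fs_args(device, behaviour):
    """Build (but do not run) a tune2fs command that sets the on-disk default."""
    if behaviour not in VALID_ERROR_BEHAVIOURS:
        raise ValueError("unknown error behaviour: %r" % behaviour)
    return ["tune2fs", "-e", behaviour, device]
```

For example, errors_policy("defaults,errors=remount-ro") gives
"remount-ro", and tune2fs_args("/dev/md0", "panic") gives the argument
list for making panic the default on that device (the device name is
just a placeholder).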
..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Sun, 7 Sep 2003 12:34:45 +1000,
Russell Coker <[EMAIL PROTECTED]> wrote in message
<[EMAIL PROTECTED]>:
>
> Also I believe that in Ext3 if you write data to a file and then
> unlink the file before the data is committed to disk then the data
> will never be written.  So there seems no loss as long as the file
> isn't opened with O_SYNC and you don't call fsync() (and no-one calls
> sync()).
>

..I have had a few cases of ext3fs'es, even on raid-1, going
read-only on errors, what do you guys use to bring them back
into service?

--
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
  Scenarios always come in sets of three:
  best case, worst case, and just in case.