Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
Arnt Karlsen wrote:

..and after a journal death, and fsck, the raid set will be able to re-establish itself, no? Or does the journal do both/all disks in a raid set?

The FS doesn't know or care about RAID-anything, as far as I know. Doesn't the FS just tell /dev/hda1, /dev/sda1, or /dev/md1 to write this data to this block? Very oversimplified, I know, but it doesn't seem like RAID should be part of the discussion here (aside from the fact that a RAID1 or RAID5 config *may* reduce the occurrence of problems that would bring journaling into play).

..how does the journalling system choose which blocks to work from? From what I've been able to see, the journal dies when its superblock goes bad?

The filesystem needs the superblock in order to find the journal. If you have a single gigantic filesystem mounted on /, then if the primary superblock is corrupted, the kernel will not be able to mount /, and you're hosed. E2fsck will automatically try the primary superblock, and if that is corrupt, it will try the first backup superblock. Failing that, if the first backup is corrupted as well, a human will need to manually try one of the other backup superblocks.

..this can be tuned to try more blocks before whining for manpower?

Ted will know a lot more about this than I do, but I'd think that if the first two superblocks are corrupt, the likelihood of superblock number 3 or whatever being good is pretty low compared to the odds that the drive/partition is shot. Perhaps that's why e2fsck just gives up on the extra superblocks? Of course, then why bother including them?

I've had a bunch of Debian systems running on various (sometimes crappy) hardware for years. I've seen very few cases where a superblock was corrupt and e2fsck puked. In each case, it was on a drive that was old enough that it wasn't worth fussing over any more, so I just replaced the drive. Some of the drives are happy running on wintel boxes; others are just paperweights.
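Rich's question above (why include extra superblocks that e2fsck won't try automatically?) has a concrete shape: with the sparse_super feature, ext2/ext3 keeps backup superblocks only in block groups 0, 1, and powers of 3, 5, and 7. The sketch below mirrors that layout in awk so the numbers can be checked; it is illustrative only, and on a real filesystem you would get the authoritative list from 'mke2fs -n /dev/XXX' and hand one of the blocks to 'e2fsck -b'.

```shell
# backup_sb_blocks: print the block numbers where ext2/ext3 keeps backup
# superblocks under sparse_super (block groups 1 and powers of 3, 5, 7).
# Args: blocks-per-group, first-data-block, number-of-block-groups.
backup_sb_blocks() {
    awk -v bpg="$1" -v first="$2" -v groups="$3" 'BEGIN {
        for (g = 1; g < groups; g++) {
            sparse = (g == 1)
            for (p = 3; p <= g && !sparse; p *= 3) if (p == g) sparse = 1
            for (p = 5; p <= g && !sparse; p *= 5) if (p == g) sparse = 1
            for (p = 7; p <= g && !sparse; p *= 7) if (p == g) sparse = 1
            if (sparse) print g * bpg + first
        }
    }'
}

# A 1k-block filesystem has 8192 blocks per group and first data block 1,
# so the first backups land at the numbers the e2fsck man page mentions:
backup_sb_blocks 8192 1 10    # -> 8193 24577 40961 57345 73729
```

To actually use one: e2fsck -b 8193 /dev/hda1 (for 4k-block filesystems the first backup is typically block 32768).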
If your primary superblock is getting corrupted often, then first of all, you should try to figure out why this is happening, and take affirmative action to prevent it. (The fact that you're reporting marginal power is supremely suspicious; marginal power can cause disk corruption very easily. Getting higher quality power supplies will help, but a UPS is the first thing I would get.)

..yeah, I'm working on the power bit. ;-)

Secondly, you're better off using a small root filesystem that generally isn't modified often. What I normally do is use a 128 meg root filesystem, with a separate /var partition (or /var symlinked to /usr/var), and /tmp as a ram disk. With the root filesystem rarely changing, it's much less likely that it will be corrupted due to hardware problems. Then the root filesystem can come up, and e2fsck can repair the other filesystems.

..yeah, except for /tmp on ramdisk, that's how I do my boxes, and my isp business client is learning his lesson well. ;-)

But I repeat, your filesystems shouldn't be getting corrupted in the first place. Using a separate root filesystem is a good idea, and will help you recover from hardware problems, but your primary priority should be to avoid the hardware problems in the first place.

- Ted

--
Rich Puhek
ETN Systems Inc.
2125 1st Ave East
Hibbing MN 55746
tel: 218.262.1130
email: [EMAIL PROTECTED]

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
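Ted's layout above can be sketched as an fstab fragment (device names and sizes are hypothetical, not a recommendation for any particular box):

```
# /etc/fstab sketch: small, rarely-written root; separate /var; /tmp in RAM
/dev/md0   /      ext3    defaults,errors=remount-ro  0  1
/dev/md1   /var   ext3    defaults                    0  2
tmpfs      /tmp   tmpfs   defaults,size=64m           0  0
```

The point is that the root filesystem sees almost no writes, so it is far more likely to survive a crash and come up cleanly to run e2fsck on the busier partitions.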
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Sat, 13 Sep 2003 02:01, Rich Puhek wrote:

Ted will know a lot more about this than I do, but I'd think that if the first two superblocks are corrupt, the likelihood of superblock number 3 or whatever being good is pretty low compared to the odds that the drive/partition is shot. Perhaps that's why e2fsck just gives up on the extra superblocks? Of course, then why bother including them?

In principle it seems to be always a good idea to have more copies of your data than the software knows how to deal with automatically. Then if the software screws up and mangles everything it touches, you may still have a chance to manually do whatever is necessary to save it.

I recall a story about a tape drive that became damaged in a way that made it destroy every tape put in it. When some data needed to be restored, the first tape didn't work; they tried it in a second drive and it was proven to be dead. They got a second backup and repeated the same procedure... It was only when they were down to their last backup that someone got wise and used a different tape drive for the first attempt, which resulted in the data being read without any errors. In that situation, if a tape robot had control then it would certainly have trashed all copies of the data.

I can imagine similar things happening to a file system with a dying hard disk.

--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Sat, 13 Sep 2003 03:54:07 +1000, Russell Coker [EMAIL PROTECTED] wrote in message [EMAIL PROTECTED]:

On Sat, 13 Sep 2003 02:01, Rich Puhek wrote:

Ted will know a lot more about this than I do, but I'd think that if the first two superblocks are corrupt, the likelihood of superblock number 3 or whatever being good is pretty low compared to the odds that the drive/partition is shot. Perhaps that's why e2fsck just gives up on the extra superblocks? Of course, then why bother including them?

In principle it seems to be always a good idea to have more copies of your data than the software knows how to deal with automatically. Then if the software screws up and mangles everything it touches, you may still have a chance to manually do whatever is necessary to save it.

I recall a story about a tape drive that became damaged in a way that made it destroy every tape put in it. When some data needed to be restored, the first tape didn't work; they tried it in a second drive and it was proven to be dead. They got a second backup and repeated the same procedure... It was only when they were down to their last backup that someone got wise and used a different tape drive for the first attempt, which resulted in the data being read without any errors. In that situation, if a tape robot had control then it would certainly have trashed all copies of the data. I can imagine similar things happening to a file system with a dying hard disk.

..agreed, but there are vast differences between the first 2, every other and all. ;-)

--
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
Scenarios always come in sets of three: best case, worst case, and just in case.
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Thu, Sep 11, 2003 at 02:04:19AM +0200, Arnt Karlsen wrote:

..I still believe in raid-1, but, ext3fs??? ..how does xfs, jfs and Reiserfs compare?

If you have random disk corruptions happening as often as you are, no filesystem is going to be able to help you. The only question is how quickly the filesystem notices *before* user data starts getting irrecoverably lost. Ext3 generally tends to be one of the more paranoid filesystems about checking assertions and should-never-happen cases, although I don't know how it compares to reiserfs, jfs, et al. There have certainly been cases in the past where people were convinced that there was a bug in ext2, since other filesystems (minix in this particular case) weren't reporting the problem. But it turned out to be a buffer cache bug, and it was simply that other filesystems were not doing the appropriate assertion checks, and user data was getting lost; the system administrator was just left in blissful ignorance.

Unless you're talking about *software* RAID-1 under Linux, and the

..bingo, I should have said so.

fact that you have to rebuild the mirror after an unclean shutdown, but that's arguably a defect in the software RAID-1 implementation. On other systems, such as AIX's software RAID-1, the RAID-1 is implemented with a journal,

..but software RAID-1 under Linux is not, or did I miss something here?

No, software RAID-1 does not do journalling at the RAID level. That means that in the case of an unclean shutdown, the RAID system will need to re-establish the mirror. As I said, this is a performance issue, since half the disk bandwidth of the RAID array will be diverted to re-establishing the mirror after the unclean shutdown. Note also this is true *regardless* of what filesystem you use, journaling and non-journaling.

..ok, for my throttle boxes, here is where I should honk the horn and divert logging to a log server and schedule a fsck? (And of course just reboot my mailservers on the same error.)
For your throttle boxes, do you need to have any writes to your filesystems at all? If what you care about is zero downtime, why not just run syslog over the network, and keep all of your filesystems mounted read-only? Some extreme configurations I've seen (especially where ISPs don't have direct/easy access to their systems at remote POPs) use a read-only flash filesystem, and a ramdisk for /tmp, and no spinning disks at all. This significantly reduces unreliability caused by disk failures, since the hard drive is often the most vulnerable part of the system, especially in the face of heat, vibration, etc.

..IMHO the debian bootstrap should first read the rpm database and generate a deb database, and then do 'apt-get update && apt-get dist-upgrade'. _Is_ there such a bootstrap beast?

While this would be interesting for those people who are converting from Red Hat to Debian, it's a lot more complicated than that, since you also have to convert over the configuration files; Red Hat and Debian don't necessarily store files in the same location. I generally find that for production systems, it's much safer and simpler to install Debian on a new disk (and on a new system), and then copy the new configuration files over. That way, you can test the system and make sure everything is A-OK before cutting over something on a production system.

(By the way, it seems like 50% of your problems are that you're doing things on the cheap, and yet you still want 100% reliability. If you want carrier-grade reliability, you need to pay a little bit extra, and do things like have hot spares, and installation scripts that allow you to create and configure new servers automatically, without needing manual handwork.)

..256MB, but the disks may be marginal; on the known bad disks I get write errors.
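A hedged sketch of the networked-syslog setup Ted describes, using stock sysklogd (plain UDP, not ssh tunnels; the hostname is made up): the sending box writes nothing locally, and the log host must run syslogd with -r to accept remote messages.

```
# /etc/syslog.conf on the read-only box: ship everything to the log host
*.*    @loghost.example.com

# On loghost, enable reception of remote messages, e.g. on Debian by
# setting SYSLOGD="-r" in the sysklogd init configuration.
```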
I have seen this same error on power blinks: failures lasting for about 1/3 of a second without losing monitor sync etc. on my desktops, once frying a power supply, but usually these blinks cause no harm.

Sounds like you have marginal power. Do you have a UPS (preferably a continuous UPS) to protect your systems? If not, why not? (Again, it's a bad idea to expect carrier-grade reliability when you're not willing to pay for the basic high-quality equipment, backup equipment, and devices such as UPSs to protect your equipment.)

..ah. So with a 30GB /var ext3fs raid-1 I would have 25% or 13% consumed by backup copies of the superblock and block group descriptors?

It's an order n**2 problem, so it's not a linear relationship. And most people get annoyed by that kind of overhead long before it gets to 10% or above.

..how does the journalling system choose which blocks to work from? From what I've been able to see, the journal dies when its superblock goes bad?

The filesystem needs the superblock in order to find the journal.
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Thu, 11 Sep 2003 14:03:17 -0400, Theodore Ts'o [EMAIL PROTECTED] wrote in message [EMAIL PROTECTED]:

On Thu, Sep 11, 2003 at 02:04:19AM +0200, Arnt Karlsen wrote:

..I still believe in raid-1, but, ext3fs??? ..how does xfs, jfs and Reiserfs compare?

If you have random disk corruptions happening as often as you are, no filesystem is going to be able to help you. The only question is how quickly the filesystem notices *before* user data starts getting irrecoverably lost. Ext3 generally tends to be one of the more paranoid filesystems about checking assertions and should-never-happen cases, although I don't know how it compares to reiserfs, jfs, et al.

..ok, how about ext3 versus ext2 on raid-1?

Unless you're talking about *software* RAID-1 under Linux, and the

..bingo, I should have said so.

fact that you have to rebuild the mirror after an unclean shutdown, but that's arguably a defect in the software RAID-1 implementation. On other systems, such as AIX's software RAID-1, the RAID-1 is implemented with a journal,

..but software RAID-1 under Linux is not, or did I miss something here?

No, software RAID-1 does not do journalling at the RAID level. That means that in the case of an unclean shutdown, the RAID system will need to re-establish the mirror.

..and after a journal death, and fsck, the raid set will be able to re-establish itself, no? Or does the journal do both/all disks in a raid set?

As I said, this is a performance issue, since half the disk bandwidth of the RAID array will be diverted to re-establishing the mirror after the unclean shutdown. Note also this is true *regardless* of what filesystem you use, journaling and non-journaling.

..noted, non-issue in my case.

..ok, for my throttle boxes, here is where I should honk the horn and divert logging to a log server and schedule a fsck? (And of course just reboot my mailservers on the same error.)

For your throttle boxes, do you need to have any writes to your filesystems at all?
If what you care about is zero downtime, why not just run syslog over the network, and keep all of your filesystems mounted read-only? Some extreme configurations I've seen (especially where ISPs don't have direct/easy access to their systems at remote POPs) use a read-only flash filesystem, and a ramdisk for /tmp, and no spinning disks at all. This significantly reduces unreliability caused by disk failures, since the hard drive is often the most vulnerable part of the system, especially in the face of heat, vibration, etc.

..sounds like an idea. The major point against is geography; I like to arrive at stand-alone one-box solutions, but networked logging is a good way to verify the network status. What is used, ssh tunnels?

..IMHO the debian bootstrap should first read the rpm database and generate a deb database, and then do 'apt-get update && apt-get dist-upgrade'. _Is_ there such a bootstrap beast?

While this would be interesting for those people who are converting from Red Hat to Debian, it's a lot more complicated than that, since you also have to convert over the configuration files; Red Hat and Debian don't necessarily store files in the same location.

..I know. ;-)

I generally find that for production systems, it's much safer and simpler to install Debian on a new disk (and on a new system), and then copy the new configuration files over. That way, you can test the system and make sure everything is A-OK before cutting over something on a production system.

..yeah, my pipe dream. ;-)

(By the way, it seems like 50% of your problems are that you're doing things on the cheap, and yet you still want 100% reliability. If you want carrier-grade reliability, you need to pay a little bit extra, and do things like have hot spares, and installation scripts that allow you to create and configure new servers automatically, without needing manual handwork.)

..hey, the isp shop is not mine, and it _is_ a small operation, so I need to grow it so I can charge 'em.
;-) These guys are Wintendo converts, and I do the hard stuff for 'em. ;-)

..256MB, but the disks may be marginal; on the known bad disks I get write errors.

I have seen this same error on power blinks: failures lasting for about 1/3 of a second without losing monitor sync etc. on my desktops, once frying a power supply, but usually these blinks cause no harm.

Sounds like you have marginal power. Do you have a UPS (preferably a continuous UPS) to protect your systems? If not, why not? (Again, it's a bad idea to expect carrier-grade reliability when you're not willing to pay for the basic high-quality equipment, backup equipment, and devices such as UPSs to protect your equipment.)

..2 different sites; I have marginal power in my lab, but the isp gear is on ups, and that again is on a priority grid feed. ..will be producing my own power on this;
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Wed, Sep 10, 2003 at 01:36:32AM +0200, Arnt Karlsen wrote:

But for an unattended server, most of the time it's probably better to force the system to reboot so you can restore service ASAP.

..even for raid-1 disks??? _Is_ there a combination of raid-1 and journalling fs'es for linux that's ready for carrier grade service?

I'm not sure what you're referring to here. As far as I'm concerned, if the filesystem is inconsistent, panicking and letting the system get back to a known state is always the right answer. RAID-1 shouldn't be an issue here. Unless you're talking about *software* RAID-1 under Linux, and the fact that you have to rebuild the mirror after an unclean shutdown, but that's arguably a defect in the software RAID-1 implementation. On other systems, such as AIX's software RAID-1, the RAID-1 is implemented with a journal, so that there is no need to rebuild the mirror after an unclean shutdown. Alternatively, you could use a hardware RAID-1 solution, which also wouldn't have a problem with unclean shutdowns.

In any case, the speed hit for doing a panic with the current Linux MD implementation is a performance issue, and in my book reliability takes precedence over performance. So yes, even for RAID-1, and it doesn't matter what filesystem: if there's a problem, you should reboot. If you don't like the resulting performance hit after the panic, get a hardware RAID controller.

I'm not sure what you mean by this. When there is a filesystem error

..add a healthy dose of irony to repair in repair. ;-)

detected, all writes to the filesystem are immediately aborted, which

...precludes reporting the error?

No, if you are using a networked syslog daemon, it certainly does not preclude reporting the error. If you mean the case where there is a filesystem error on the partition where /var/log resides, yes, we consider it better to abort writes to the filesystem than to attempt to write out the log message to a compromised filesystem.
.._exactly_, but it is not reported to any of the system users. A system reboot _is_ reported usefully to the system users; all tty users get the news.

The message that a filesystem has been remounted read-only is logged as a KERN_CRIT message. If you wish, you can configure your syslog.conf so that all tty users are notified of kern.crit level errors. That's probably a good thing, although it's not clear that a typical user will understand what to do when they are told that a filesystem has been remounted read-only. Certainly it is trivial to configure sysklogd to grab that message and do whatever you would like with it, if you were to so choose. If you want to honk the big horn, that is certainly within your power to make the system do that. If you believe that Red Hat should configure their syslog.conf files to do this by default, feel free to submit a bug report / suggestion with Red Hat.

of uncommitted data which has not been written out to disk.) So in general, not running the journal will leave you in a worse state after rebooting, compared to running the journal.

..it appears my experience disagrees with your expertise here. With more data, I would have been able to advise intelligently on when to and when not to run the journal; I believe we agree not running the journal is advisable if the system has been left limping like this for a few hours.

How long the system has been left limping doesn't really matter. The real issue is that there may be critical data that has been written to the journal that was not written to the filesystem before the journal was aborted and the filesystem left in a read-only state. This might, for example, include a user's thesis or several years of research. (Why such work might not be backed up is a question I will leave for another day, and falls into the criminally negligent system administrator category.) In general, you're better off running the journal after a journal abort.
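Ted's suggestion is a one-line syslog.conf change; a hedged sketch with sysklogd (the fifo path is made up, and would need to be created with mkfifo and watched by a script of your own):

```
# /etc/syslog.conf: tell every logged-in user about kern.crit and above,
# which includes the "remounting filesystem read-only" message
kern.crit    *

# ...and/or copy it into a named pipe that a watcher script can read to
# "honk the big horn" (pager, mail, whatever):
#kern.crit    |/var/run/fs-alarm.fifo
```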
You may think you have experiences to the contrary, but are you sure? Unless you snapshot the entire filesystem, and try it both ways, you can't really know for sure. There are classes of errors where the filesystem has been completely trashed, and whether or not you run the journal won't make a bit of difference.

The much more important question is to figure out why the filesystem got trashed in the first place. Do you have marginal memory? Hard drives? Are you running a beta-test kernel that might be buggy? Fixing the proximate cause is always the most important thing to do, since no matter how clever a filesystem, if you have buggy hardware or buggy device drivers, in the end you *will* be screwed. A filesystem can't compensate for those sorts of shortcomings.

..and, on a raid-1 disk set, a failure oughtta cut off the one bad fs and not shoot down the entire raid set because that one fs fails.

I agree. When is that not happening?

..sparse_super
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
* [EMAIL PROTECTED] (Russell Coker) [2003.09.10 20:16]:

On Thu, 11 Sep 2003 10:04, Arnt Karlsen wrote:

..I still believe in raid-1, but, ext3fs??? ..how does xfs, jfs and Reiserfs compare?

ReiserFS has many situations where file system corruption can make operations such as find / trigger a kernel Oops. Having a file system decide to panic the kernel because your mount options instructed it to (ext3) is one thing. Having the file system driver corrupt random kernel memory and cause an Oops (Reiser) is another. The ReiserFS team's response to such issues has not made me happy, so I am removing it from all my machines and converting to Ext3.

Can you provide links to your discussions with the ReiserFS team? I'm considering using ReiserFS on some mail servers. Please share your experiences.

Also, you can't have a ReiserFS file system mounted read-only while fsck'ing it, which makes recovering errors on the root FS very interesting, to say the least.

What I hate about ext3 is that it handles dirs with 1000+ files poorly. Haven't seen if they've fixed that yet.

--
Cameron Moore
[ Smoking cures weight problems... eventually. ]
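On the large-directory complaint: ext3 later gained a hashed-directory feature (dir_index, also called htree) aimed at exactly this. Whether a given kernel and e2fsprogs support it needs checking first; the device name below is hypothetical, and this is a sketch rather than an endorsement.

```
# Enable hashed directory indexes on an existing filesystem
tune2fs -O dir_index /dev/md2
# -D makes e2fsck rebuild/optimize existing directories so they get indexes
e2fsck -fD /dev/md2
```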
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Thu, 11 Sep 2003 13:22, Cameron Moore wrote:

Having a file system decide to panic the kernel because your mount options instructed it to (ext3) is one thing. Having the file system driver corrupt random kernel memory and cause an Oops (Reiser) is another. The ReiserFS team's response to such issues has not made me happy, so I am removing it from all my machines and converting to Ext3.

Can you provide links to your discussions with the ReiserFS team? I'm considering using ReiserFS on some mail servers. Please share your experiences.

It was on the reiserfs list a couple of months ago. They told me that it would be impossible to check all data for consistency when reading it from disk without having a huge performance hit. Ext3 appears to manage this (or at least corrupt ext2/3 file systems tend not to cause kernel memory corruption).

--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Mon, 8 Sep 2003 12:05:24 -0400, Theodore Ts'o [EMAIL PROTECTED] wrote in message [EMAIL PROTECTED]:

On Sun, Sep 07, 2003 at 07:24:27PM +0200, Arnt Karlsen wrote:

What happens on error conditions can be set through tune2fs or as a mount option. Having it remount read-only is probably better than panicking the kernel.

..yeah, except in /var/log, /var/spool et al; I also lean towards panic in /home.

I tend to use the remount read-only feature on desktops, where it's useful for me to be able to save my work on some other filesystem before I reboot my system.

..remount read-only is ok, as long as the bugle blows. IME, it doesn't.

But for an unattended server, most of the time it's probably better to force the system to reboot so you can restore service ASAP.

..even for raid-1 disks??? _Is_ there a combination of raid-1 and journalling fs'es for linux that's ready for carrier grade service?

When it happens a reboot may be a good idea, in which case a fsck to fix the problem should occur automatically.

..should, agrrrRRRrrreed. IME (RH73 - RH9 and woody) it does not. ..what happens is the journaling dies, leaving a good fs intact; on rebooting, the dead journal will repair the fs, wiping good data off the fs.

I'm not sure what you mean by this. When there is a filesystem error

..add a healthy dose of irony to repair in repair. ;-)

detected, all writes to the filesystem are immediately aborted, which

...precludes reporting the error?

means the filesystem on disk is left in an unstable state. (It may look consistent while the system is still running, but there is a lot

.._exactly_, but it is not reported to any of the system users. A system reboot _is_ reported usefully to the system users; all tty users get the news.

of uncommitted data which has not been written out to disk.) So in general, not running the journal will leave you in a worse state after rebooting, compared to running the journal.

..it appears my experience disagrees with your expertise here.
With more data, I would have been able to advise intelligently on when to and when not to run the journal; I believe we agree not running the journal is advisable if the system has been left limping like this for a few hours.

An alternative course of action, which we don't currently support, would be to attempt to write everything to disk and quiesce the filesystem before remounting it read-only. The problem is that trying to flush everything out to disk might leave things in a worse state than just freezing all writes.

..could a ramdisk help? As in: store in ramdisk between journal commits and honk the big horn on non-recoverable errors?

..and, on a raid-1 disk set, a failure oughtta cut off the one bad fs and not shoot down the entire raid set because that one fs fails.

The real problem is that in the face of filesystem corruption, by the time the filesystem notices that something is wrong, there may be significant damage that has already taken place. Some of it may already have been written to the journal, in which case not replaying the journal might leave you with more data to recover; on the other hand, not replaying the journal could also risk leaving your filesystem very badly corrupted, with data which the mail server had promised it had accepted not actually getting saved by the filesystem. A human could make a read/write snapshot of the filesystem and try it both ways, but if you want automatic recovery, it's probably better to run the journal than not to run it.

..agreed, and with ext3 on a raid-1 set, this _oughtta_ be easy.
..the errors=remount-ro fstab option remounts the fs ro but fails to tell the system, so the system merrily logs data and accepts mail etc. 'till Dooms Day, and especially on raid-1 disks I sort of expected redundancy, like in autofeather the bad prop and trim out the yaw and autopatch that holed fuel tank, and auto-sync the props; I mean, this was done _60_years_ ago in aviation to help win WWII, and ext3 on raid-1 floats around USS Yorktown-style???

If the system merrily logs data and accepts it, even after the filesystem is remounted read-only, that implies that the MTA is horribly buggy, not doing the most basic of error return code checks.

..agreed, any pointers to such basics?

If the filesystem is remounted read-only, then writes to the filesystem *will* return an error. If the application doesn't notice, then it's the application which is at fault, not ext3.

..on Woody, ext3 actually reports the remount to /dev/console. ;-) _Nothing_ elsewhere. Dunno about Red Hat, never had one hooked to a monitor upon a journal failure. ..all I know is RH-7.3-8-9 and Woody do _not_ report ext3 journal failures in any way I am aware of and can make use of, other than these wee sad hints in dumpe2fs:

Filesystem revision #: 1 (dynamic)
Filesystem features:
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Mon, 8 Sep 2003 00:17, Arnt Karlsen wrote:

..I have had a few cases of ext3fs'es, even on raid-1, going read-only on errors; what do you guys use to bring them back into service?

What happens on error conditions can be set through tune2fs or as a mount option. Having it remount read-only is probably better than panicking the kernel. When it happens a reboot may be a good idea, in which case a fsck to fix the problem should occur automatically.

--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
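The two knobs Russell mentions, sketched concretely (device name hypothetical): tune2fs sets the persistent default in the superblock, and an fstab mount option overrides it per mount.

```
# Persistent default, stored in the superblock:
#   tune2fs -e remount-ro /dev/md1    (alternatives: -e continue, -e panic)

# Per-mount override in /etc/fstab, e.g. panic-and-reboot for a mail spool:
/dev/md1   /var   ext3   defaults,errors=panic   0  2
```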
Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..
On Mon, 8 Sep 2003 00:20:12 +1000, Russell Coker [EMAIL PROTECTED] wrote in message [EMAIL PROTECTED]:

On Mon, 8 Sep 2003 00:17, Arnt Karlsen wrote:

..I have had a few cases of ext3fs'es, even on raid-1, going read-only on errors; what do you guys use to bring them back into service?

What happens on error conditions can be set through tune2fs or as a mount option. Having it remount read-only is probably better than panicking the kernel.

..yeah, except in /var/log, /var/spool et al; I also lean towards panic in /home.

When it happens a reboot may be a good idea, in which case a fsck to fix the problem should occur automatically.

..should, agrrrRRRrrreed. IME (RH73 - RH9 and woody) it does not. ..what happens is the journaling dies, leaving a good fs intact; on rebooting, the dead journal will repair the fs, wiping good data off the fs. ..compare 'df -h' and 'cat /proc/mounts' on such a system.

..the errors=remount-ro fstab option remounts the fs ro but fails to tell the system, so the system merrily logs data and accepts mail etc. 'till Dooms Day, and especially on raid-1 disks I sort of expected redundancy, like in autofeather the bad prop and trim out the yaw and autopatch that holed fuel tank, and auto-sync the props; I mean, this was done _60_years_ ago in aviation to help win WWII, and ext3 on raid-1 floats around USS Yorktown-style???

--
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
Scenarios always come in sets of three: best case, worst case, and just in case.
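Arnt's "compare 'df -h' and 'cat /proc/mounts'" check can be put in cron: /proc/mounts shows the kernel's real view even when /etc/mtab (which df reads) went stale at the errors=remount-ro event. A small sketch (the mail command in the comment is illustrative):

```shell
# ro_mounts: print mount points whose kernel-side mount options include
# "ro".  Pass the mounts table to read, normally /proc/mounts.
ro_mounts() {
    awk '"," $4 "," ~ /,ro,/ { print $2 }' "$1"
}

# Typical cron use: honk the horn if anything is unexpectedly read-only.
# ro_mounts /proc/mounts | mail -s "filesystems gone read-only" root
```

Note the comma-padding in the awk match: it keeps "rw,errors=remount-ro" from being mistaken for a read-only mount just because the option string ends in "ro".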