Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-12 Thread Rich Puhek


Arnt Karlsen wrote:


 ..and after a journal death, and fsck, the raid set will be able 
 to re-establish itself, no?  Or does the journal do both/all disks 
 in a raid set?


The FS doesn't know or care about RAID-anything, as far as I know. 
Doesn't the FS just tell /dev/hda1, /dev/sda1, or /dev/md1 to write 
this data to this block?  Very oversimplified, I know, but it doesn't 
seem like RAID should be part of the discussion here (aside from the 
fact that a RAID1 or RAID5 config *may* reduce the occurrence of problems 
that would bring journaling into play).


   ..how does the journalling system choose which blocks to work from?
   What I've been able to see, the journal dies when their super blocks
   go bad?

  The filesystem needs the superblock in order to find the journal.  If
  you have a single gigantic filesystem mounted on /, then if the
  primary superblock is corrupted, the kernel will not be able to mount
  /, and you're hosed.  E2fsck will automatically try the primary
  superblock, and if that is corrupt, it will try the first backup
  superblock.  Failing that, a human will need to manually try one of
  the other backup superblocks, if it is corrupted as well.
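
For the "manually try one of the other backup superblocks" step, it helps to know where the backups are likely to live. What follows is only a rough Python sketch, not anything from e2fsprogs itself. It assumes the filesystem was made with the sparse_super feature (copies only in block groups 0, 1 and powers of 3, 5 and 7) and with the usual geometry of 8 * block_size blocks per group; the function names are made up for illustration. The block numbers it returns are the kind of value one would pass to e2fsck -b.

```python
def groups_with_backups(group_count):
    """Block groups that hold a superblock copy, assuming sparse_super."""
    groups = {0, 1}
    for base in (3, 5, 7):
        power = base
        while power < group_count:
            groups.add(power)
            power *= base
    return sorted(g for g in groups if g < group_count)


def backup_superblock_blocks(total_blocks, block_size=4096):
    """Likely block numbers of the backup superblocks (assumed geometry)."""
    blocks_per_group = 8 * block_size          # one bitmap block per group
    # With 1 KiB blocks the first data block is 1, otherwise 0.
    first_data_block = 1 if block_size == 1024 else 0
    usable = total_blocks - first_data_block
    group_count = -(-usable // blocks_per_group)   # ceiling division
    return [first_data_block + g * blocks_per_group
            for g in groups_with_backups(group_count)
            if g != 0]                         # group 0 holds the primary


if __name__ == "__main__":
    # A 30 GiB filesystem with 4 KiB blocks, as an example size.
    for block in backup_superblock_blocks(30 * 2**30 // 4096):
        print(block)
```

If the filesystem was created with other options, mke2fs -n on the same device with the same parameters will list the real locations without writing anything, which is safer than trusting a calculation like this one.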


 ..this can be tuned to try more blocks before whining for manpower?

Ted will know a lot more about this than I do, but I'd think that if the 
first two superblocks are corrupt, the likelihood of superblock number 3 
or whatever being good is pretty low compared to the odds that the 
drive/partition is shot. Perhaps that's why e2fsck just gives up on the 
extra superblocks? Of course, then why bother including them?

I've had a bunch of Debian systems running on various (sometimes crappy) 
hardware for years. I've seen very few cases where a superblock was 
corrupt and e2fsck puked. In each case, it was on a drive that was old 
enough that it wasn't worth fussing over any more, so I just replaced 
the drive. Some of the drives are happy running on wintel boxes, others 
are just paperweights.


  If your primary superblock is getting corrupted often, then first of
  all, you should try to figure out why this is happening, and take
  affirmative actions to prevent them.  (The fact that you're reporting
  marginal power is supremely suspicious; marginal power can cause disk
  corruptions very easily.  Getting higher quality power supplies will
  help, but a UPS is the first thing I would get.)


 ..yeah, I'm working on the power bit.  ;-)


  Secondly, you're better off using a small root filesystem that
  generally isn't modified often.  What I normally do is use a 128 meg
  root filesystem, with a separate /var partition (or /var symlinked to
  /usr/var), and /tmp as a ram disk.  With the root filesystem rarely
  changing, it's much less likely that it will be corrupted due to
  hardware problems.  Then the root filesystem can come up, and e2fsck
  can repair the other filesystems.


 ..yeah, except for /tmp on ramdisk, that's how I do my boxes, 
 and my isp business client is learning his lesson good.  ;-)


  But I repeat, your filesystems shouldn't be getting corrupted in the
  first place.  Using a separate root filesystem is a good idea, and
  will help you recover from hardware problems, but your primary
  priority should be to avoid the hardware problems in the first place.
  		- Ted


--

_

Rich Puhek
ETN Systems Inc.
2125 1st Ave East
Hibbing MN 55746
tel:   218.262.1130
email: [EMAIL PROTECTED]
_
--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]


Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-12 Thread Russell Coker
On Sat, 13 Sep 2003 02:01, Rich Puhek wrote:
 Ted will know a lot more about this than I do, but I'd think that if the
 first two superblocks are corrupt, the likelihood of superblock number 3
 or whatever being good is pretty low compared to the odds that the
 drive/partition is shot. Perhaps that's why e2fsck just gives up on the
 extra superblocks? Of course, then why bother including them?

In principle it seems to be always a good idea to have more copies of your 
data than the software knows how to deal with automatically.  Then if the 
software screws up and mangles everything it touches you may still have a 
chance to manually do whatever is necessary to save it.

I recall a story about a tape drive that became damaged in a way that made it 
destroy every tape put in it.  When some data needed to be restored the first 
tape didn't work, they tried it in a second drive and it was proven to be 
dead.  They got a second backup and repeated the same procedure...

It was only when they were down to their last backup that someone got wise and 
used a different tape drive for the first attempt, which resulted in the data 
being read without any errors.

In that situation if a tape robot had control then it would certainly have 
trashed all copies of the data.  I can imagine similar things happening to a 
file system with a dying hard disk.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page





Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-12 Thread Arnt Karlsen
On Sat, 13 Sep 2003 03:54:07 +1000, 
Russell Coker [EMAIL PROTECTED] wrote in message 
[EMAIL PROTECTED]:

 On Sat, 13 Sep 2003 02:01, Rich Puhek wrote:
  Ted will know a lot more about this than I do, but I'd think that if
  the first two superblocks are corrupt, the likelihood of superblock
  number 3 or whatever being good is pretty low compared to the odds
  that the drive/partition is shot. Perhaps that's why e2fsck just
  gives up on the extra superblocks? Of course, then why bother
  including them?
 
 In principle it seems to be always a good idea to have more copies of
 your data than the software knows how to deal with automatically. 
 Then if the software screws up and mangles everything it touches you
 may still have a chance to manually do whatever is necessary to save
 it.
 
 I recall a story about a tape drive that became damaged in a way that
 made it destroy every tape put in it.  When some data needed to be
 restored the first tape didn't work, they tried it in a second drive
 and it was proven to be dead.  They got a second backup and repeated
 the same procedure...
 
 It was only when they were down to their last backup that someone got
 wise and used a different tape drive for the first attempt, which
 resulted in the data being read without any errors.
 
 In that situation if a tape robot had control then it would certainly
 have trashed all copies of the data.  I can imagine similar things
 happening to a file system with a dying hard disk.

..agreed, but there are vast differences between 
the first 2, every other and all.  ;-)

-- 
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
  Scenarios always come in sets of three: 
  best case, worst case, and just in case.





Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-11 Thread Theodore Ts'o
On Thu, Sep 11, 2003 at 02:04:19AM +0200, Arnt Karlsen wrote:
 ..I still believe in raid-1, but, ext3fs???  
 
 ..how does xfs, jfs and Reiserfs compare?  

If you have random disk corruptions happening as often as you are, no
filesystem is going to be able to help you.  The only question is how
quickly the filesystem notices *before* user data starts getting
irrecoverably lost.  Ext3 generally tends to be one of the more paranoid
filesystems about checking assertions and "should never happen" cases,
although I don't know how it compares to reiserfs, jfs, et al.

There have certainly been cases in the past where people were
convinced that there was a bug in ext2, since other filesystems (minix
in this particular case) weren't reporting the problem.  But, it
turned out to be a buffer cache bug, and it was simply that other
filesystems were not doing the appropriate assertion checks, and user
data was getting lost; the system administrator was just left in
blissful ignorance.

  Unless you're talking about *software* RAID-1 under Linux, and the
 
 ..bingo, I should have said so.
 
  fact that you have to rebuild mirror after an unclean shutdown, but
  that's arguably a defect in the software RAID 1 implementation.  On
  other systems, such as AIX's software RAID-1, the RAID-1 is
  implemented with a journal, 
 
 ..but software RAID-1 under Linux is not or did I miss something here?

No, software RAID-1 does not do journalling at the RAID level.  That
means that in the case of an unclean shutdown, the RAID system will
need to re-establish the mirror.  As I said, this is a performance
issue, since half the disk bandwidth of the RAID array will be
diverted to re-establishing the mirror after the unclean shutdown.
Note also this is true *regardless* of what filesystem you use,
journaling or non-journaling.


 ..ok, for my throttle boxes, here is where I should honk the 
 horn and divert logging to a log server and schedule a fsck?
 (And of course just reboot my mailservers on the same error.)

For your throttle boxes, do you need to have any writes to your
filesystems at all?  If what you care about is zero downtime, why not
just run syslog over the network, and keep all of your filesystems
mounted read/only?  Some extreme configurations I've seen (especially
where ISP's don't have direct/easy access to their systems at remote
POP's), use a read-only flash filesystem, and a ramdisk for /tmp, and
no spinning disks at all.  This significantly increases reliability
by removing disk failures from the picture, since the hard drive is
often the most vulnerable part of the system, especially in the face
of heat, vibration, etc.

 ..IMHO the debian bootstrap should first read the rpm database 
 and generate a deb database, and then do 'apt-get update && \
 apt-get dist-upgrade'.  _Is_ there such a bootstrap beast?

While this would be interesting for those people who are converting
from Red Hat to Debian, it's a lot more complicated than that, since
you also have to convert over the configuration files; Red Hat and
Debian don't necessarily store files in the same location.

I generally find that for production systems, it's much safer and
simpler to install Debian on a new disk (and on a new system), and
then copy the configuration files over.  That way, you can
test the system and make sure everything is A-OK before cutting over
something on a production system.

(By the way, it seems like 50% of your problems is that you're doing
things on the cheap, and yet you still want 100% reliability.  If you
want carrier-grade reliability, you need to pay a little bit extra,
and do things like have hot spares, and installation scripts that
allow you to create and configure new servers automatically, without
needing manual handwork.)

 ..256MB, but the disks may be marginal, on the known bad disks I get 
 write errors.  I have seen this same error on power blinks, failures 
 lasting for about a 1/3 of a second without losing monitor sync etc 
 on my desktops, once frying a power supply, but usually these blinks 
 cause no harm.

Sounds like you have marginal power.  Do you have a UPS (preferably a
continuous UPS) to protect your systems?  If not, why not?  (Again,
it's a bad idea to expect carrier-grade reliability when you're not
willing to pay for the basic high-quality equipment, backup equipment,
and devices such as UPS's to protect your equipment.)

 ..ah.  So with a 30GB /var ext3fs raid-1 I would have 25% or 13%
 consumed by backup copies of the superblock and block group descriptors?

It's an order n**2 problem; so it's not a linear relationship.  And
most people get annoyed by that kind of overhead, long before it gets
to 10% or above.  
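
To put rough numbers on the n**2 remark: without sparse_super every block group carries a copy of the superblock and of the group descriptor table, and the descriptor table itself grows with the number of groups, so the total grows roughly with the square of the filesystem size. The Python sketch below is a back-of-the-envelope estimate only. It assumes 32-byte group descriptors and 8 * block_size blocks per group, ignores any blocks reserved for online resizing, and uses made-up function names.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def sparse_copy_count(group_count):
    """Number of groups with a copy under sparse_super (0, 1, powers of 3, 5, 7)."""
    groups = {0, 1}
    for base in (3, 5, 7):
        power = base
        while power < group_count:
            groups.add(power)
            power *= base
    return sum(1 for g in groups if g < group_count)


def metadata_copy_blocks(fs_bytes, block_size, sparse_super):
    """Blocks spent on superblock plus group-descriptor copies (primary included)."""
    total_blocks = fs_bytes // block_size
    group_count = ceil_div(total_blocks, 8 * block_size)
    gdt_blocks = ceil_div(group_count * 32, block_size)   # assumed 32-byte descriptors
    copies = sparse_copy_count(group_count) if sparse_super else group_count
    return copies * (1 + gdt_blocks)


if __name__ == "__main__":
    for gib in (30, 60, 120):
        size = gib * 2**30
        for sparse in (False, True):
            blocks = metadata_copy_blocks(size, 1024, sparse)
            percent = 100.0 * blocks * 1024 / size
            print(gib, "GiB, sparse_super =", sparse, ":", blocks,
                  "blocks,", round(percent, 2), "percent")
```

Under these assumptions a 30 GiB filesystem with 1 KiB blocks and no sparse_super spends about 1.5% of its space on the copies, and doubling the size roughly quadruples the block count; with sparse_super the cost is negligible.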

 ..how does the journalling system choose which blocks to work from?
 What I've been able to see, the journal dies when their super blocks 
 go bad?

The filesystem needs the superblock in order to find the journal.  If
you have a single gigantic filesystem mounted on /, then if the
primary 

Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-11 Thread Arnt Karlsen
On Thu, 11 Sep 2003 14:03:17 -0400, 
Theodore Ts'o [EMAIL PROTECTED] wrote in message 
[EMAIL PROTECTED]:

 On Thu, Sep 11, 2003 at 02:04:19AM +0200, Arnt Karlsen wrote:
  ..I still believe in raid-1, but, ext3fs???  
  
  ..how does xfs, jfs and Reiserfs compare?  
 
 If you have random disk corruptions happening as often as you are, no
 filesystem is going to be able to help you.  The only question is how
 quickly the filesystem notices *before* user data starts getting
 irrecoverably lost.  Ext3 generally tends to be one of the more paranoid
 filesystems about checking assertions and "should never happen" cases,
 although I don't know how it compares to reiserfs, jfs, et al.

..ok, how about ext3 versus ext2 on raid-1?

   Unless you're talking about *software* RAID-1 under Linux, and the
  
  ..bingo, I should have said so.
  
   fact that you have to rebuild mirror after an unclean shutdown,
   but that's arguably a defect in the software RAID 1
   implementation.  On other systems, such as AIX's software RAID-1,
   the RAID-1 is implemented with a journal, 
  
  ..but software RAID-1 under Linux is not or did I miss something
  here?
 
 No, software RAID-1 does not do journalling at the RAID level.  That
 means that in the case of an unclean shutdown, the RAID system will
 need to re-establish the mirror.  

..and after a journal death, and fsck, the raid set will be able 
to re-establish itself, no?  Or does the journal do both/all disks 
in a raid set?

 As I said, this is a performance issue, since half the disk bandwidth
 of the RAID array will be diverted to re-establishing the mirror after
 the unclean shutdown. Note also this is true *regardless* of what
 filesystem you use, journaling or non-journaling.

..noted, non-issue in my case. 
 
  ..ok, for my throttle boxes, here is where I should honk the 
  horn and divert logging to a log server and schedule a fsck?
  (And of course just reboot my mailservers on the same error.)
 
 For your throttle boxes, do you need to have any writes to your
 filesystems at all?  If what you care about is zero downtime, why not
 just run syslog over the network, and keep all of your filesystems
 mounted read/only?  Some extreme configurations I've seen (especially
 where ISP's don't have direct/easy access to their systems at remote
 POP's), use a read-only flash filesystem, and a ramdisk for /tmp, and
 no spinning disks at all.  This significantly increases reliability
 by removing disk failures from the picture, since the hard drive is
 often the most vulnerable part of the system, especially in the face
 of heat, vibration, etc.

..sounds like an idea.  The major point against is geography, 
I like to arrive at stand-alone one-box solutions, but networked 
logging is a good way to verify the network status.  What is 
used, ssh tunnels?

  ..IMHO the debian bootstrap should first read the rpm database 
  and generate a deb database, and then do 'apt-get update && \
  apt-get dist-upgrade'.  _Is_ there such a bootstrap beast?
 
 While this would be interesting for those people who are converting
 from Red Hat to Debian, it's a lot more complicated than that, since
 you also have to convert over the configuration files; Red Hat and
 Debian don't necessarily store files in the same location.

..I know.  ;-)

 I generally find that for production systems, it's much safer and
 simpler to install Debian on a new disk (and on a new system), and
 then copy the configuration files over.  That way, you can
 test the system and make sure everything is A-OK before cutting over
 something on a production system.
 
..yeah, my pipe dream.  ;-)

 (By the way, it seems like 50% of your problems is that you're doing
 things on the cheap, and yet you still want 100% reliability.  If you
 want carrier-grade reliability, you need to pay a little bit extra,
 and do things like have hot spares, and installation scripts that
 allow you to create and configure new servers automatically, without
 needing manual handwork.)

..hey, the isp shop is not mine, and it _is_ a small operation, 
so I need to grow it so I can charge'em.  ;-)  These guys are 
Wintendo convertites, and I do the hard stuff for 'em.  ;-)
 
  ..256MB, but the disks may be marginal, on the known bad disks I get
  write errors.  I have seen this same error on power blinks,
  failures lasting for about a 1/3 of a second without losing monitor
  sync etc on my desktops, once frying a power supply, but usually
  these blinks cause no harm.
 
 Sounds like you have marginal power.  Do you have a UPS (preferably a
 continuous UPS) to protect your systems?  If not, why not?  (Again,
 it's a bad idea to expect carrier-grade reliability when you're not
 willing to pay for the basic high-quality equipment, backup equipment,
 and devices such as UPS's to protect your equipment.)

..2 different sites, I have marginal power in my lab, but the 
isp gear is on ups, and that again is on a priority grid feed.

..will be producing my own power on this; 

Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-10 Thread Theodore Ts'o
On Wed, Sep 10, 2003 at 01:36:32AM +0200, Arnt Karlsen wrote:
  But for an unattended server, most of the time it's probably better to
  force the system to reboot so you can restore service ASAP.
 
 ..even for raid-1 disks???  _Is_ there a combination of raid-1 and 
 journalling fs'es for linux that's ready for carrier grade service?

I'm not sure what you're referring to here.  As far as I'm concerned,
if the filesystem is inconsistent, panic'ing and letting the system
get back to a known state is always the right answer.  RAID-1
shouldn't be an issue here.  

Unless you're talking about *software* RAID-1 under Linux, and the
fact that you have to rebuild mirror after an unclean shutdown, but
that's arguably a defect in the software RAID 1 implementation.  On
other systems, such as AIX's software RAID-1, the RAID-1 is
implemented with a journal, so that there is no need to rebuild the
mirror after an unclean shutdown.  Alternatively, you could use a
hardware RAID-1 solution, which also wouldn't have a problem with
unclean shutdowns.

In any case, the speed hit for doing a panic with the current Linux
MD implementation is a performance issue, and in my book reliability
takes precedence over performance.  So yes, even for RAID-1, and it
doesn't matter what filesystem, if there's a problem, you should
reboot.  If you don't like the resulting performance hit after the
panic, get a hardware RAID controller.

  I'm not sure what you mean by this.  When there is a filesystem error
 
 ..add an healthy dose of irony to repair in repair.  ;-)
 
  detected, all writes to the filesystem are immediately aborted, which
 
 ...precludes reporting the error?  

No, if you are using a networked syslog daemon, it certainly does not
preclude reporting the error.  If you mean the case where there is a
filesystem error on the partition where /var/log resides, yes, we
consider it better to abort writes to the filesystem than to attempt
to write out the log message to a compromised filesystem.

 .._exactly_, but it is not reported to any of the system users.  
 A system reboot _is_ reported usefully to the system users, all 
 tty users get the news.

The message that a filesystem has been remounted read-only is logged
as a KERN_CRIT message.  If you wish, you can configure your
syslog.conf so that all tty users are notified of kern.crit level
errors.  That's probably a good thing, although it's not clear that a
typical user will understand what to do when they are told that a
filesystem has been remounted read-only.

Certainly it is trivial to configure sysklogd to grab that message and
do whatever you would like with it, if you were to so choose.  If you
want to honk the big horn, that is certainly within your power to
make the system do that.
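
Besides a syslog.conf rule, one way to "grab that message" is a small filter over raw kernel log lines. The Python sketch below assumes the raw kernel log format in which each message starts with a priority tag such as <2> (KERN_CRIT is priority 2, lower numbers are more severe). The function name and the sample lines in the usage note are invented for illustration and are not real ext3 output.

```python
import re

KERN_CRIT = 2   # kernel priority: 0 = emerg, 1 = alert, 2 = crit, ...

_PRIORITY_TAG = re.compile(r"^<(\d)>(.*)$")


def critical_messages(lines, threshold=KERN_CRIT):
    """Return (priority, text) for lines at or above the given severity.

    Lines without a leading <N> priority tag are skipped.
    """
    hits = []
    for line in lines:
        match = _PRIORITY_TAG.match(line.rstrip("\n"))
        if match and int(match.group(1)) <= threshold:
            hits.append((int(match.group(1)), match.group(2)))
    return hits
```

Fed with made-up input such as "<6>informational example" and "<2>critical example", only the second line comes back. What to do with a hit (mail, pager, the big horn) is left to the caller.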

If you believe that Red Hat should configure their syslog.conf files
to do this by default, feel free to submit a bug report / suggestion
with Red Hat.

  of uncommitted data which has not been written out to disk.)  So in
  general, not running the journal will leave you in a worse state after
  rebooting, compared to running the journal.
 
 ..it appears my experience disagrees with your expertise here.
 With more data, I would have been able to advise intelligently 
 on when to and when not to run the journal, I believe we agree 
 not running the journal is advisable if the system has been 
 left limping like this for a few hours.

How long the system has been left limping doesn't really matter.  The
real issue is that there may be critical data that has been written to
the journal that was not written to the filesystem before the journal
was aborted and the filesystem left in a read-only state.  This might,
for example, include a user's thesis or several years of research.
(Why such work might not be backed up is a question I will leave for
another day, and falls into the criminally negligent system
administrator category)

In general, you're better off running the journal after a journal
abort.  You may think you have experiences to the contrary, but
are you sure?  Unless you snapshot the entire filesystem, and try it
both ways, you can't really know for sure.  There are classes of
errors where the filesystem has been completely trashed, and whether
or not you run the journal won't make a bit of difference.  

The much more important question is to figure out why the filesystem
got trashed in the first place.  Do you have marginal memory?  hard
drives?  Are you running a beta-test kernel that might be buggy?
Fixing the proximate cause is always the most important thing to do;
since in the end, no matter how clever a filesystem, if you have buggy
hardware or buggy device drivers, in the end you *will* be screwed.  A
filesystem can't compensate for those sorts of shortcomings.

 ..and, on a raid-1 disk set, a failure oughtta cut off the one bad 
 fs and not shoot down the entire raid set because that one fs fails.

I agree.  When is that not happening?

 ..sparse_super 

Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-10 Thread Cameron Moore
* [EMAIL PROTECTED] (Russell Coker) [2003.09.10 20:16]:
 On Thu, 11 Sep 2003 10:04, Arnt Karlsen wrote:
  ..I still believe in raid-1, but, ext3fs???
  ..how does xfs, jfs and Reiserfs compare?
 
 ReiserFS has many situations where file system corruption can make operations 
 such as find / trigger a kernel Oops.
 
 Having a file system decide to panic the kernel because your mount options 
 instructed it to (ext3) is one thing.  Having the file system driver corrupt 
 random kernel memory and cause an Oops (Reiser) is another.  The ReiserFS 
 team's response to such issues has not made me happy so I am removing it from 
 all my machines and converting to Ext3.

Can you provide links to your discussions with the ReiserFS team?  I'm
considering using ReiserFS on some mail servers.  Please share your
experiences.

 Also you can't have a ReiserFS file system mounted read-only while fsck'ing 
 it.  Which makes recovering errors on the root FS very interesting to say the 
 least.

What I hate about ext3 is that it poorly handles dirs with 1000+
files.  Haven't seen if they've fixed that yet.
-- 
Cameron Moore
[ Smoking cures weight problems... eventually. ]





Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-10 Thread Russell Coker
On Thu, 11 Sep 2003 13:22, Cameron Moore wrote:
  Having a file system decide to panic the kernel because your mount
  options instructed it to (ext3) is one thing.  Having the file system
  driver corrupt random kernel memory and cause an Oops (Reiser) is
  another.  The ReiserFS team's response to such issues has not made me
  happy so I am removing it from all my machines and converting to Ext3.

 Can you provide links to your discussions with the ReiserFS team?  I'm
 considering using ReiserFS on some mail servers.  Please share your
 experiences.

It was on the reiserfs list a couple of months ago.

They told me that it would be impossible to check all data for consistency 
when reading it from disk without having a huge performance hit.

Ext3 appears to manage this (or at least corrupt ext2/3 file systems tend not 
to cause kernel memory corruption).




Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-09 Thread Arnt Karlsen
On Mon, 8 Sep 2003 12:05:24 -0400, 
Theodore Ts'o [EMAIL PROTECTED] wrote in message 
[EMAIL PROTECTED]:

 On Sun, Sep 07, 2003 at 07:24:27PM +0200, Arnt Karlsen wrote:
   What happens on error conditions can be set through tune2fs or as
   a mount option.  Having it remount read-only is probably better
   than panicking the kernel.
  
  ..yeah, except in /var/log, /var/spool et al, I also lean towards 
  panic in /home.
 
 I tend to use remount read-only feature on desktops, where it's useful
 for me to be able to save my work on some other filesystem before I
 reboot my system. 

..remount read-only is ok, as long as the bugle blows.  
IME, it doesn't.

 But for an unattended server, most of the time it's probably better to
 force the system to reboot so you can restore service ASAP.

..even for raid-1 disks???  _Is_ there a combination of raid-1 and 
journalling fs'es for linux that's ready for carrier grade service?

   When it happens a reboot may be a good idea, in which case a fsck
   to fix the problem should occur automatically.
  
  ..should, agrrrRRRrrreed.  IME (RH73 - RH9 and woody) it does
  not.
  
  ..what happens is the journaling dies, leaving a good fs intact, 
  on rebooting, the dead journal will repair the fs wiping good 
  data off the fs.
 
 I'm not sure what you mean by this.  When there is a filesystem error

..add an healthy dose of irony to repair in repair.  ;-)

 detected, all writes to the filesystem are immediately aborted, which

...precludes reporting the error?  

 means the filesystem on disk is left in an unstable state.  (It may
 look consistent while the system is still running, but there is a lot

.._exactly_, but it is not reported to any of the system users.  
A system reboot _is_ reported usefully to the system users, all 
tty users get the news.

 of uncommitted data which has not been written out to disk.)  So in
 general, not running the journal will leave you in a worse state after
 rebooting, compared to running the journal.

..it appears my experience disagrees with your expertise here.
With more data, I would have been able to advise intelligently 
on when to and when not to run the journal, I believe we agree 
not running the journal is advisable if the system has been 
left limping like this for a few hours.

 An alternative course of action, which we don't currently support
 would be to attempt to write everything to disk and quiesce the
 filesystem before remounting it read-only.  The problem is that trying
 to flush everything out to disk might leave things in a worse state
 than just freezing all writes.

..could a ramdisk help?  As in; store in ramdisk between journal 
commits and honk the big horn on non-recoverable errors?

..and, on a raid-1 disk set, a failure oughtta cut off the one bad 
fs and not shoot down the entire raid set because that one fs fails.

 The real problem is that in the face of filesystem corruption, by the
 time the filesystem notices that something is wrong, there may be
 significant damage that has already taken place.  Some of it may
 already have been written to journal, in which case not replaying the
 journal might leave you with more data to recover; on the other hand,
 not replaying the journal could also risk leaving your filesystem very
 badly corrupted with data which the mail server had promised it had
 accepted, not actually getting saved by the filesystem.
 
 A human could make a read/write snapshot of the filesystem and try it
 both ways, but if you want automatic recovery, it's probably better to
 run the journal than not to run it.  

..agreed, and with ext3 on a raid-1 set, this _oughtta_ be easy.
 
  ..the errors=remount,ro fstab option remounts the fs ro but fails 
  to tell the system, so the system merrily logs data and accepts 
  mail etc 'till Dooms Day, and especially on raid-1 disks I sort of 
  expected redundancy, like in autofeather the bad prop and trim out 
  the yaw and autopatch that holed fuel tank, and auto-sync the 
  props, I mean, this was done _60_years_ ago in aviation to help 
  win WWII, and ext3 on raid-1 floats around USS Yorktown-style???
 
 If the system merrily logs data and accepts it, even after the
 filesystem is remounted read-only, that implies that the MTA is
 horribly buggy, not doing the most basic of error return code checks.

..agreed, any pointers or hints to such basics?

 If the filesystem is remounted read-only, then writes to the
 filesystem *will* return an error.  If the application doesn't notice,
 then it's the application which is at fault, not ext3.
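
As an illustration of the "most basic of error return code checks": the Python sketch below is a generic example, not how any particular MTA does it. It appends data, forces it to disk, and reports the errno instead of pretending the write worked. On a filesystem that has been remounted read-only, one would expect the open or the write to fail with EROFS. The function name is hypothetical.

```python
import errno
import os


def durable_append(path, data):
    """Append bytes to path and fsync; return (ok, errno_or_None)."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    except OSError as exc:
        return False, exc.errno
    try:
        view = memoryview(data)
        while view:                      # os.write may write only part of it
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # do not claim success before this
    except OSError as exc:
        return False, exc.errno
    finally:
        os.close(fd)
    return True, None


def is_read_only_error(err):
    """True if the errno is the one a read-only filesystem returns."""
    return err == errno.EROFS
```

A mail server built along these lines would refuse the message (or defer it) whenever ok is false, rather than telling the sender the mail was accepted.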

..on Woody, ext3 actually reports the remount to /dev/console.  ;-)
_Nothing_ elsewhere.  Dunno about Red Hat, never had one hooked 
to a monitor upon a journal failure. 

..all I know is RH-7.3-8-9 and Woody do _not_ report ext3 journal 
failures in any way I am aware of and can make use of, other than 
these wee sad hints in dumpe2fs:
Filesystem revision #:1 (dynamic)
Filesystem features:   

Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-07 Thread Russell Coker
On Mon, 8 Sep 2003 00:17, Arnt Karlsen wrote:
 ..I have had a few cases of ext3fs'es, even on raid-1, going
 read-only on errors, what do you guys use to bring them back
 into service?

What happens on error conditions can be set through tune2fs or as a mount 
option.  Having it remount read-only is probably better than panicking the 
kernel.

When it happens a reboot may be a good idea, in which case a fsck to fix the 
problem should occur automatically.




Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-07 Thread Arnt Karlsen
On Mon, 8 Sep 2003 00:20:12 +1000, 
Russell Coker [EMAIL PROTECTED] wrote in message 
[EMAIL PROTECTED]:

 On Mon, 8 Sep 2003 00:17, Arnt Karlsen wrote:
  ..I have had a few cases of ext3fs'es, even on raid-1, going
  read-only on errors, what do you guys use to bring them back
  into service?
 
 What happens on error conditions can be set through tune2fs or as a
 mount option.  Having it remount read-only is probably better than
 panicking the kernel.

..yeah, except in /var/log, /var/spool et al, I also lean towards 
panic in /home.

 When it happens a reboot may be a good idea, in which case a fsck to
 fix the problem should occur automatically.

..should, agrrrRRRrrreed.  IME (RH73 - RH9 and woody) it does not.

..what happens is the journaling dies, leaving a good fs intact, 
on rebooting, the dead journal will repair the fs wiping good 
data off the fs.

..compare 'df -h' and 'cat /proc/mounts' on such a system.
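
The comparison can be automated. The difference presumably shows up because df on these systems reads the mount table from /etc/mtab, which is not updated when the kernel remounts a filesystem read-only by itself, while /proc/mounts reflects the kernel's current view. The Python sketch below only parses the /proc/mounts format (device, mount point, type, options, two numbers); the function name and the choice of ext2/ext3 as the types to watch are assumptions for illustration.

```python
def read_only_mounts(proc_mounts_text, fstypes=("ext2", "ext3")):
    """Return (device, mountpoint) pairs the kernel has mounted read-only."""
    result = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                      # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if fstype in fstypes and "ro" in options.split(","):
            result.append((device, mountpoint))
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        for device, mountpoint in read_only_mounts(handle.read()):
            print("read-only:", device, "on", mountpoint)
```

Run from cron, something like this could honk the horn for any filesystem that is meant to be read-write but shows up in the list.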

..the errors=remount,ro fstab option remounts the fs ro but fails 
to tell the system, so the system merrily logs data and accepts 
mail etc 'till Dooms Day, and especially on raid-1 disks I sort of 
expected redundancy, like in autofeather the bad prop and trim out 
the yaw and autopatch that holed fuel tank, and auto-sync the 
props, I mean, this was done _60_years_ ago in aviation to help 
win WWII, and ext3 on raid-1 floats around USS Yorktown-style???
