Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-12 Thread Arnt Karlsen
On Sat, 13 Sep 2003 03:54:07 +1000, 
Russell Coker <[EMAIL PROTECTED]> wrote in message 
<[EMAIL PROTECTED]>:

> On Sat, 13 Sep 2003 02:01, Rich Puhek wrote:
> > Ted will know a lot more about this than I do, but I'd think that if
> > the first two superblocks are corrupt, the likelihood of superblock
> > number 3 or whatever being good is pretty low compared to the odds
> > that the drive/partition is shot. Perhaps that's why e2fsck just
> > gives up on the extra superblocks? Of course, then why bother
> > including them?
> 
> In principle it seems to be always a good idea to have more copies of
> your data than the software knows how to deal with automatically. 
> Then if the software screws up and mangles everything it touches you
> may still have a chance to manually do whatever is necessary to save
> it.
> 
> I recall a story about a tape drive that became damaged in a way that
> made it destroy every tape put in it.  When some data needed to be
> restored the first tape didn't work, they tried it in a second drive
> and it was proven to be dead.  They got a second backup and repeated
> the same procedure...
> 
> It was only when they were down to their last backup that someone got
> wise and used a different tape drive for the first attempt, which
> resulted in the data being read without any errors.
> 
> In that situation if a tape robot had control then it would certainly
> have trashed all copies of the data.  I can imagine similar things
> happening to a file system with a dying hard disk.

..agreed, but there are vast differences between 
"the first 2", "every other" and "all".  ;-)

-- 
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
  Scenarios always come in sets of three: 
  best case, worst case, and just in case.


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-12 Thread Russell Coker
On Sat, 13 Sep 2003 02:01, Rich Puhek wrote:
> Ted will know a lot more about this than I do, but I'd think that if the
> first two superblocks are corrupt, the likelihood of superblock number 3
> or whatever being good is pretty low compared to the odds that the
> drive/partition is shot. Perhaps that's why e2fsck just gives up on the
> extra superblocks? Of course, then why bother including them?

In principle it seems to be always a good idea to have more copies of your 
data than the software knows how to deal with automatically.  Then if the 
software screws up and mangles everything it touches you may still have a 
chance to manually do whatever is necessary to save it.

I recall a story about a tape drive that became damaged in a way that made it 
destroy every tape put in it.  When some data needed to be restored the first 
tape didn't work, they tried it in a second drive and it was proven to be 
dead.  They got a second backup and repeated the same procedure...

It was only when they were down to their last backup that someone got wise and 
used a different tape drive for the first attempt, which resulted in the data 
being read without any errors.

In that situation if a tape robot had control then it would certainly have 
trashed all copies of the data.  I can imagine similar things happening to a 
file system with a dying hard disk.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page





Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-12 Thread Rich Puhek


Arnt Karlsen wrote:


> ..and after a journal death, and fsck, the raid set will be able 
> to re-establish itself, no?  Or does the journal do both/all disks 
> in a raid set?


The FS doesn't know or care about RAID-anything, as far as I know. 
Doesn't the FS just tell /dev/hda1, /dev/sda1, or /dev/md1 to "write 
this data to this block"? Very oversimplified, I know, but it doesn't 
seem like RAID should be part of the discussion here (aside from the 
fact that a RAID1 or RAID5 config *may* reduce the occurrence of problems 
that would bring journaling into play).


> > > ..how does the journalling system choose which blocks to work from?
> > > What I've been able to see, the journal dies when their super blocks
> > > go bad?
> > 
> > The filesystem needs the superblock in order to find the journal.  If
> > you have a single gigantic filesystem mounted on /, then if the
> > primary superblock is corrupted, the kernel will not be able to mount
> > /, and you're hosed.  E2fsck will automatically try the primary
> > superblock, and if that is corrupt, it will try the first backup
> > superblock.  Failing that, a human will need to manually try one of
> > the other backup superblocks, if it is corrupted as well.
> 
> ..this can be tuned to try more blocks before whining for manpower?

Ted will know a lot more about this than I do, but I'd think that if the 
first two superblocks are corrupt, the likelihood of superblock number 3 
or whatever being good is pretty low compared to the odds that the 
drive/partition is shot. Perhaps that's why e2fsck just gives up on the 
extra superblocks? Of course, then why bother including them?
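
For anyone who does end up trying the remaining backups by hand, here is a
minimal sketch of where they would live. It assumes the filesystem was made
with the sparse_super feature (copies only in block groups 0, 1 and powers
of 3, 5 and 7) and with 8 * block_size blocks per group, which is the usual
mke2fs default. The function name and parameters are made up for
illustration and are not part of e2fsprogs; dumpe2fs on the real device is
the authoritative source for the actual locations.

```python
def backup_superblocks(total_blocks, block_size=4096):
    """Return block numbers of backup superblocks (sparse_super layout).

    Assumes 8 * block_size blocks per group; check dumpe2fs for real values.
    """
    blocks_per_group = 8 * block_size            # one bitmap block maps one group
    first_data_block = 1 if block_size == 1024 else 0
    # Ceiling division: a partial last group still counts as a group.
    groups = -(-(total_blocks - first_data_block) // blocks_per_group)

    # Groups holding a backup: group 1 and every power of 3, 5 and 7.
    holders = {1}
    for base in (3, 5, 7):
        g = base
        while g < groups:
            holders.add(g)
            g *= base

    return [first_data_block + g * blocks_per_group
            for g in sorted(holders) if g < groups]


if __name__ == "__main__":
    # A 30 GB filesystem with 4 KiB blocks (hypothetical example size).
    for block in backup_superblocks(30 * 1024**3 // 4096):
        print("candidate for e2fsck -b:", block)
```

Each printed number is only a candidate to hand to "e2fsck -b"; if the
drive itself is dying, none of them may be readable.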

I've had a bunch of Debian systems running on various (sometimes crappy) 
hardware for years. I've seen very few cases where a superblock was 
corrupt and e2fsck puked. In each case, it was on a drive that was old 
enough that it wasn't worth fussing over any more, so I just replaced 
the drive. Some of the drives are happy running on wintel boxes, others 
are just paperweights.


> > If your primary superblock is getting corrupted often, then first of
> > all, you should try to figure out why this is happening, and take
> > affirmative actions to prevent them.  (The fact that you're reporting
> > marginal power is supremely suspicious; marginal power can cause disk
> > corruptions very easily.  Getting higher quality power supplies will
> > help, but a UPS is the first thing I would get.)
> 
> ..yeah, I'm working on the power bit.  ;-)
> 
> > Secondly, you're better off using a small root filesystem that
> > generally isn't modified often.  What I normally do is use a 128 meg
> > root filesystem, with a separate /var partition (or /var symlinked to
> > /usr/var), and /tmp as a ram disk.  With the root filesystem rarely
> > changing, it's much less likely that it will be corrupted due to
> > hardware problems.  Then the root filesystem can come up, and e2fsck
> > can repair the other filesystems.
> 
> ..yeah, except for /tmp on ramdisk, that's how I do my boxes, 
> and my isp business client is learning his lesson good.  ;-)
> 
> > But I repeat, your filesystems shouldn't be getting corrupted in the
> > first place.  Using a separate root filesystem is a good idea, and
> > will help you recover from hardware problems, but your primary
> > priority should be to avoid the hardware problems in the first place.
> > 
> >                                                 - Ted


-- 
Rich Puhek
ETN Systems Inc.
2125 1st Ave East
Hibbing MN 55746
tel:   218.262.1130
email: [EMAIL PROTECTED]


Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-11 Thread Arnt Karlsen
On Thu, 11 Sep 2003 14:03:17 -0400, 
Theodore Ts'o <[EMAIL PROTECTED]> wrote in message 
<[EMAIL PROTECTED]>:

> On Thu, Sep 11, 2003 at 02:04:19AM +0200, Arnt Karlsen wrote:
> > ..I still believe in raid-1, but, ext3fs???  
> > 
> > ..how does xfs, jfs and Reiserfs compare?  
> 
> If you have random disk corruptions happening as often as you are, no
> filesystem is going to be able to help you.  The only question is how
> quickly the filesystem notices *before* user data starts getting
> irrecoverably lost.  Ext3 generally tends to be one of the more paranoid
> filesystems about checking assertions and "should never happen" cases,
> although I don't know how it compares to reiserfs, jfs, et al.

..ok, how about ext3 versus ext2 on raid-1?

> > > Unless you're talking about *software* RAID-1 under Linux, and the
> > 
> > ..bingo, I should have said so.
> > 
> > > fact that you have to rebuild mirror after an unclean shutdown,
> > > but that's arguably a defect in the software RAID 1
> > > implementation.  On other systems, such as AIX's software RAID-1,
> > > the RAID-1 is implemented with a journal, 
> > 
> > ..but software RAID-1 under Linux is not or did I miss something
> > here?
> 
> No, software RAID-1 does not do journalling at the RAID level.  That
> means that in the case of an unclean shutdown, the RAID system will
> need to reestablish the mirror.  

..and after a journal death, and fsck, the raid set will be able 
to re-establish itself, no?  Or does the journal do both/all disks 
in a raid set?

> As I said, this is a performance issue, since half the disk bandwidth
> of the RAID array will be diverted to reestablishing the mirror after
> the unclean shutdown. Note also this is true *regardless* of what
> filesystem you use, journaling and non-journaling.

..noted, non-issue in my case. 
 
> > ..ok, for my throttle boxes, here is where I should honk the 
> > horn and divert logging to a log server and schedule a fsck?
> > (And of course just reboot my mailservers on the same error.)
> 
> For your throttle boxes, do you need to have any writes to your
> filesystems at all?  If what you care about is zero downtime, why not
> just run syslog over the network, and keep all of your filesystems
> mounted read/only?  Some extreme configurations I've seen (especially
> where ISP's don't have direct/easy access to their systems at remote
> POP's), use a read-only flash filesystem, and a ramdisk for /tmp, and
> no spinning disks at all.  This significantly increases reliability,
> since the hard drive is often the most vulnerable part of the
> system, especially in the face of heat, vibration, etc.

..sounds like an idea.  The major point against is geography, 
I like to arrive at stand-alone one-box solutions, but networked 
logging is a good way to verify the network status.  What is 
used, ssh tunnels?

> > ..IMHO the debian bootstrap should first read the rpm database 
> > and generate a deb database, and then do 'apt-get update && \
> > apt-get dist-upgrade'.  _Is_ there such a bootstrap beast?
> 
> While this would be interesting for those people who are converting
> from Red Hat to Debian, it's a lot more complicated than that, since
> you also have to convert over the configuration files; Red Hat and
> Debian don't necessarily store files in the same location.

..I know.  ;-)

> I generally find that for production systems, it's much safer and
> simpler to install Debian on a new disk (and on a new system), and
> then copy the new configuration files over.  That way, you can
> test the system and make sure everything is A-OK before cutting over
> something on a production system.
 
..yeah, my pipe dream.  ;-)

> (By the way, it seems like 50% of your problems is that you're doing
> things on the cheap, and yet you still want 100% reliability.  If you
> want "carrier-grade reliability", you need to pay a little bit extra,
> and do things like have hot spares, and installation scripts that
> allow you to create and configure new servers automatically, without
> needing manual handwork.)

..hey, the isp shop is not mine, and it _is_ a small operation, 
so I need to grow it so I can charge'em.  ;-)  These guys are 
Wintendo convertites, and I do the hard stuff for 'em.  ;-)
 
> > ..256MB, but the disks may be marginal, on the known bad disks I get
> > write errors.  I have seen this same error on power "blinks",
> > failures lasting for about a 1/3 of a second without losing monitor
> > sync etc on my desktops, once frying a power supply, but usually
> > these "blinks" cause no harm.
> 
> Sounds like you have marginal power.  Do you have a UPS (preferably a
> continuous UPS) to protect your systems?  If not, why not?  (Again,
> it's a bad idea to expect "carrier-grade reliability" when you're not
> willing to pay for the basic high-quality equipment, backup equipment,
> and devices such as UPS's to protect your equipment.)

..2 different sites, I have marginal power in my l

Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-11 Thread Theodore Ts'o
On Thu, Sep 11, 2003 at 02:04:19AM +0200, Arnt Karlsen wrote:
> ..I still believe in raid-1, but, ext3fs???  
> 
> ..how does xfs, jfs and Reiserfs compare?  

If you have random disk corruptions happening as often as you are, no
filesystem is going to be able to help you.  The only question is how
quickly the filesystem notices *before* user data starts getting
irrecoverably lost.  Ext3 generally tends to be one of the more paranoid
filesystems about checking assertions and "should never happen" cases,
although I don't know how it compares to reiserfs, jfs, et al.

There have certainly been cases in the past where people were
convinced that there was a bug in ext2, since other filesystems (minix
in this particular case) weren't reporting the problem.  But, it
turned out to be a buffer cache bug, and it was simply that other
filesystems were not doing the appropriate assertion checks, and user
data was getting lost; the system administrator was just left in
blissful ignorance.

> > Unless you're talking about *software* RAID-1 under Linux, and the
> 
> ..bingo, I should have said so.
> 
> > fact that you have to rebuild mirror after an unclean shutdown, but
> > that's arguably a defect in the software RAID 1 implementation.  On
> > other systems, such as AIX's software RAID-1, the RAID-1 is
> > implemented with a journal, 
> 
> ..but software RAID-1 under Linux is not or did I miss something here?

No, software RAID-1 does not do journalling at the RAID level.  That
means that in the case of an unclean shutdown, the RAID system will
need to reestablish the mirror.  As I said, this is a performance
issue, since half the disk bandwidth of the RAID array will be
diverted to reestablishing the mirror after the unclean shutdown.
Note also this is true *regardless* of what filesystem you use,
journaling or non-journaling.
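
One way to see whether a mirror is still being rebuilt after such a
shutdown is to look at /proc/mdstat.  The following is only a sketch: the
function name is made up, the sample text is hand-written in the usual
/proc/mdstat layout rather than captured from a real machine, and the exact
layout can differ between kernel versions.

```python
import re


def resyncing_arrays(mdstat_text):
    """Map md array name -> percent complete for arrays being rebuilt."""
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)      # remember which array we are in
            continue
        found = re.search(r"(resync|recovery)\s*=\s*([\d.]+)%", line)
        if found and current:
            progress[current] = float(found.group(2))
    return progress


# Illustrative sample only (assumed layout, not real output).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 hdc1[1] hda1[0]
      1048512 blocks [2/2] [UU]
      [==>..................]  resync = 12.6% (132608/1048512) finish=1.2min speed=12000K/sec
md1 : active raid1 hdc2[1] hda2[0]
      29302464 blocks [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, percent in resyncing_arrays(SAMPLE).items():
        print("%s is rebuilding: %.1f%% done" % (name, percent))
```

On a live box the same function could be fed the contents of /proc/mdstat
instead of the sample string.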


> ..ok, for my throttle boxes, here is where I should honk the 
> horn and divert logging to a log server and schedule a fsck?
> (And of course just reboot my mailservers on the same error.)

For your throttle boxes, do you need to have any writes to your
filesystems at all?  If what you care about is zero downtime, why not
just run syslog over the network, and keep all of your filesystems
mounted read/only?  Some extreme configurations I've seen (especially
where ISP's don't have direct/easy access to their systems at remote
POP's), use a read-only flash filesystem, and a ramdisk for /tmp, and
no spinning disks at all.  This significantly increases reliability,
since the hard drive is often the most vulnerable part of the
system, especially in the face of heat, vibration, etc.

> ..IMHO the debian bootstrap should first read the rpm database 
> and generate a deb database, and then do 'apt-get update && \
> apt-get dist-upgrade'.  _Is_ there such a bootstrap beast?

While this would be interesting for those people who are converting
from Red Hat to Debian, it's a lot more complicated than that, since
you also have to convert over the configuration files; Red Hat and
Debian don't necessarily store files in the same location.

I generally find that for production systems, it's much safer and
simpler to install Debian on a new disk (and on a new system), and
then copy the new configuration files over.  That way, you can
test the system and make sure everything is A-OK before cutting over
something on a production system.

(By the way, it seems like 50% of your problems is that you're doing
things on the cheap, and yet you still want 100% reliability.  If you
want "carrier-grade reliability", you need to pay a little bit extra,
and do things like have hot spares, and installation scripts that
allow you to create and configure new servers automatically, without
needing manual handwork.)

> ..256MB, but the disks may be marginal, on the known bad disks I get 
> write errors.  I have seen this same error on power "blinks", failures 
> lasting for about a 1/3 of a second without losing monitor sync etc 
> on my desktops, once frying a power supply, but usually these "blinks" 
> cause no harm.

Sounds like you have marginal power.  Do you have a UPS (preferably a
continuous UPS) to protect your systems?  If not, why not?  (Again,
it's a bad idea to expect "carrier-grade reliability" when you're not
willing to pay for the basic high-quality equipment, backup equipment,
and devices such as UPS's to protect your equipment.)

> ..ah.  So with a 30GB /var ext3fs raid-1 I would have 25% or 13%
> consumed by backup copies of the superblock and block group descriptors?

It's an order n**2 problem; so it's not a linear relationship.  And
most people get annoyed by that kind of overhead, long before it gets
to 10% or above.  
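
To put rough numbers on the n**2 remark, here is a back-of-the-envelope
sketch.  It assumes 8 * block_size blocks per group and 32-byte group
descriptors, and it ignores reserved descriptor blocks and everything else
mke2fs sets aside, so treat the output as an order-of-magnitude estimate
only.  The function name is invented for this example.  If every group
carries a copy, the total is (number of groups) times (size of the
descriptor table), and the table itself grows with the number of groups;
with sparse_super only a handful of groups carry copies.

```python
def superblock_copy_blocks(total_blocks, block_size=4096, sparse=True):
    """Blocks used by superblock + group descriptor copies (primary included)."""
    blocks_per_group = 8 * block_size
    groups = -(-total_blocks // blocks_per_group)         # ceiling division
    descriptors_per_block = block_size // 32              # 32-byte descriptors
    gdt_blocks = -(-groups // descriptors_per_block)      # descriptor table size

    if sparse:
        # Copies only in groups 0, 1 and powers of 3, 5 and 7.
        holders = {0, 1}
        for base in (3, 5, 7):
            g = base
            while g < groups:
                holders.add(g)
                g *= base
        copies = len([g for g in holders if g < groups])
    else:
        copies = groups                                    # one copy per group

    return copies * (1 + gdt_blocks)                       # superblock + table


if __name__ == "__main__":
    total = 30 * 1024**3 // 4096        # 30 GB with 4 KiB blocks
    for sparse in (False, True):
        used = superblock_copy_blocks(total, sparse=sparse)
        print("sparse_super=%s: %d blocks (%.4f%% of the filesystem)"
              % (sparse, used, 100.0 * used / total))
```

Doubling total_blocks in the non-sparse case roughly quadruples the
descriptor part of the result, which is the quadratic growth in question.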

> ..how does the journalling system choose which blocks to work from?
> What I've been able to see, the journal dies when their super blocks 
> go bad?

The filesystem needs the superblock in order to find the journal.  If
you have a single gigantic filesyste

Re: FS performance with lots of files, was: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-11 Thread Markus Schabel
Cameron Moore wrote:
> * [EMAIL PROTECTED] (Russell Coker) [2003.09.10 20:16]:
> > Also you can't have a ReiserFS file system mounted read-only while fsck'ing 
> > it.  Which makes recovering errors on the root FS very interesting to say the 
> > least.
> 
> What I hate about ext3 is that it poorly handles dirs with 1000+
> files.  Haven't seen if they've fixed that yet.

There is a patch for 2.4.x (http://people.nl.linux.org/~phillips/htree/ - I
think there are other resources out there somewhere ;)), but the code
should be in the kernel since 2.4.20 for ext2, and for ext3 it seems
that it was available before that (there are some 2.4.19 patches out
there: http://lwn.net/Articles/11330/) - hopefully somebody can shed
some light on this...
regards
Markus


Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-10 Thread Russell Coker
On Thu, 11 Sep 2003 13:22, Cameron Moore wrote:
> > Having a file system decide to panic the kernel because your mount
> > options instructed it to (ext3) is one thing.  Having the file system
> > driver corrupt random kernel memory and cause an Oops (Reiser) is
> > another.  The ReiserFS team's response to such issues has not made me
> > happy so I am removing it from all my machines and converting to Ext3.
>
> Can you provide links to your discussions with the ReiserFS team?  I'm
> considering using ReiserFS on some mail servers.  Please share your
> experiences.

It was on the reiserfs list a couple of months ago.

They told me that it would be impossible to check all data for consistency 
when reading it from disk without having a huge performance hit.

Ext3 appears to manage this (or at least corrupt ext2/3 file systems tend not 
to cause kernel memory corruption).

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page





Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-10 Thread Cameron Moore
* [EMAIL PROTECTED] (Russell Coker) [2003.09.10 20:16]:
> On Thu, 11 Sep 2003 10:04, Arnt Karlsen wrote:
> > ..I still believe in raid-1, but, ext3fs???
> > ..how does xfs, jfs and Reiserfs compare?
> 
> ReiserFS has many situations where file system corruption can make operations 
> such as "find /" trigger a kernel Oops.
> 
> Having a file system decide to panic the kernel because your mount options 
> instructed it to (ext3) is one thing.  Having the file system driver corrupt 
> random kernel memory and cause an Oops (Reiser) is another.  The ReiserFS 
> team's response to such issues has not made me happy so I am removing it from 
> all my machines and converting to Ext3.

Can you provide links to your discussions with the ReiserFS team?  I'm
considering using ReiserFS on some mail servers.  Please share your
experiences.

> Also you can't have a ReiserFS file system mounted read-only while fsck'ing 
> it.  Which makes recovering errors on the root FS very interesting to say the 
> least.

What I hate about ext3 is that it poorly handles dirs with 1000+
files.  Haven't seen if they've fixed that yet.
-- 
Cameron Moore
[ Smoking cures weight problems... eventually. ]





Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-10 Thread Russell Coker
On Thu, 11 Sep 2003 10:04, Arnt Karlsen wrote:
> ..I still believe in raid-1, but, ext3fs???
>
> ..how does xfs, jfs and Reiserfs compare?

ReiserFS has many situations where file system corruption can make operations 
such as "find /" trigger a kernel Oops.

Having a file system decide to panic the kernel because your mount options 
instructed it to (ext3) is one thing.  Having the file system driver corrupt 
random kernel memory and cause an Oops (Reiser) is another.  The ReiserFS 
team's response to such issues has not made me happy so I am removing it from 
all my machines and converting to Ext3.

Also you can't have a ReiserFS file system mounted read-only while fsck'ing 
it.  Which makes recovering errors on the root FS very interesting to say the 
least.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page





Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-10 Thread Arnt Karlsen
On Wed, 10 Sep 2003 14:39:44 -0400, 
Theodore Ts'o <[EMAIL PROTECTED]> wrote in message 
<[EMAIL PROTECTED]>:

> On Wed, Sep 10, 2003 at 01:36:32AM +0200, Arnt Karlsen wrote:
> > > But for an unattended server, most of the time it's probably
> > > better to force the system to reboot so you can restore service
> > > ASAP.
> > 
> > ..even for raid-1 disks???  _Is_ there a combination of raid-1 and 
> > journalling fs'es for linux that's ready for carrier grade service?
> 
> I'm not sure what you're referring to here.  

..isp gateway boxes that I like to keep running 24/7/365.2442etc.
The idea behind using ext3fs and raid-1 was to minimize the 
risks and downtime.

..I still believe in raid-1, but, ext3fs???  

..how does xfs, jfs and Reiserfs compare?  

..what I like with ext3 is it can be mounted as ext2 
in a pinch, assuming you can get to /etc/fstab.  ;-)

> As far as I'm concerned, if the filesystem is inconsistent, panic'ing
> and letting the system get back to a known state is always the right
> answer.  RAID-1 shouldn't be an issue here.  

..shouldn't be a problem, agreed.

> Unless you're talking about *software* RAID-1 under Linux, and the

..bingo, I should have said so.

> fact that you have to rebuild mirror after an unclean shutdown, but
> that's arguably a defect in the software RAID 1 implementation.  On
> other systems, such as AIX's software RAID-1, the RAID-1 is
> implemented with a journal, 

..but software RAID-1 under Linux is not or did I miss something here?

> so that there is no need to rebuild the 
> mirror after an unclean shutdown.  Alternatively, you could use a
> hardware RAID-1 solution, which also wouldn't have a problem with
> unclean shutdowns.
> 
> In any case, the speed hit for doing a panic with the current Linux
> MD implementation is a performance issue, and in my book reliability
> takes precedence over performance.  So yes, even for RAID-1, and it
> doesn't matter what filesystem, if there's a problem, you should
> reboot.  If you don't like the resulting performance hit after the
> panic, get a hardware RAID controller.

..agreed, and disagreed; for my isp gateway throttles, reboots mean 
isp service downtime.  Logs can be tee'ed to log servers.  For a mail 
server I agree fully.

> > > I'm not sure what you mean by this.  When there is a filesystem
> > > error
> > 
> > ..add an "healthy" dose of irony to repair in "repair".  ;-)
> > 
> > > detected, all writes to the filesystem are immediately aborted,
> > > which
> > 
> > ...precludes reporting the error?  
> 
> No, if you are using a networked syslog daemon, it certainly does not
> preclude reporting the error.  If you mean the case where there is a
> filesystem error on the partition where /var/log resides, yes, we
> consider it better to abort writes to the filesystem than to attempt
> to write out the log message to a compromised filesystem.

..ok, for my throttle boxes, here is where I should honk the 
horn and divert logging to a log server and schedule a fsck?
(And of course just reboot my mailservers on the same error.)

..bottom line is that same journal death needs different 
medication depending on which purpose etc the box serves.

> > .._exactly_, but it is not reported to any of the system users.  
> > A system reboot _is_ reported usefully to the system users, all 
> > tty users get the news.
> 
> The message that a filesystem has been remounted read-only is logged
> as a KERN_CRIT message.  If you wish, you can configure your
> syslog.conf so that all tty users are notified of kern.crit level

..doh!  I _like_ fixes this simple.  ;-)

> errors.  That's probably a good thing, although it's not clear that a
> typical user will understand what to do when they are told that a
> filesystem has been remounted read-only.

..so clue whack'em.  On a desktop they are not gonna lose much more 
than 5 seconds worth of work with the default commits, and scaring 
them with 30 years research work loss is a good way to slap'em into 
doing the right things.
 
> Certainly it is trivial to configure sysklogd to grab that message and
> do whatever you would like with it, if you were to so choose.  If you
> want to "honk the big horn", that is certainly within your power to
> make the system do that.
> 
> If you believe that Red Hat should configure their syslog.conf files
> to do this by default, feel free to submit a bug report / suggestion
> with Red Hat.

..heh, the last time I tried that, was: 
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=89171

..the "This is a duplicate of various other bugs. We're looking at the
issues involved." is what brought me over to Debian. 

..now, if installing and raid disks could be as easy... 

..IMHO the debian bootstrap should first read the rpm database 
and generate a deb database, and then do 'apt-get update && \
apt-get dist-upgrade'.  _Is_ there such a bootstrap beast?

> > > of uncommitted data which has not been written out to disk.)  So
> > > in general, not r

Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-10 Thread Theodore Ts'o
On Wed, Sep 10, 2003 at 01:36:32AM +0200, Arnt Karlsen wrote:
> > But for an unattended server, most of the time it's probably better to
> > force the system to reboot so you can restore service ASAP.
> 
> ..even for raid-1 disks???  _Is_ there a combination of raid-1 and 
> journalling fs'es for linux that's ready for carrier grade service?

I'm not sure what you're referring to here.  As far as I'm concerned,
if the filesystem is inconsistent, panic'ing and letting the system
get back to a known state is always the right answer.  RAID-1
shouldn't be an issue here.  

Unless you're talking about *software* RAID-1 under Linux, and the
fact that you have to rebuild mirror after an unclean shutdown, but
that's arguably a defect in the software RAID 1 implementation.  On
other systems, such as AIX's software RAID-1, the RAID-1 is
implemented with a journal, so that there is no need to rebuild the
mirror after an unclean shutdown.  Alternatively, you could use a
hardware RAID-1 solution, which also wouldn't have a problem with
unclean shutdowns.

In any case, the speed hit for doing a panic with the current Linux
MD implementation is a performance issue, and in my book reliability
takes precedence over performance.  So yes, even for RAID-1, and it
doesn't matter what filesystem, if there's a problem, you should
reboot.  If you don't like the resulting performance hit after the
panic, get a hardware RAID controller.

> > I'm not sure what you mean by this.  When there is a filesystem error
> 
> ..add an "healthy" dose of irony to repair in "repair".  ;-)
> 
> > detected, all writes to the filesystem are immediately aborted, which
> 
> ...precludes reporting the error?  

No, if you are using a networked syslog daemon, it certainly does not
preclude reporting the error.  If you mean the case where there is a
filesystem error on the partition where /var/log resides, yes, we
consider it better to abort writes to the filesystem than to attempt
to write out the log message to a compromised filesystem.

> .._exactly_, but it is not reported to any of the system users.  
> A system reboot _is_ reported usefully to the system users, all 
> tty users get the news.

The message that a filesystem has been remounted read-only is logged
as a KERN_CRIT message.  If you wish, you can configure your
syslog.conf so that all tty users are notified of kern.crit level
errors.  That's probably a good thing, although it's not clear that a
typical user will understand what to do when they are told that a
filesystem has been remounted read-only.

Certainly it is trivial to configure sysklogd to grab that message and
do whatever you would like with it, if you were to so choose.  If you
want to "honk the big horn", that is certainly within your power to
make the system do that.
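
Besides catching the kernel message via syslog, a periodic check of the
mount table is another way to make the horn honk.  This is a minimal
sketch, assuming a Linux /proc/mounts and assuming the kernel reflects the
error remount there; the function name and the sample text are invented
for illustration.  Note that it will also flag filesystems that are
mounted read-only on purpose, so a real script would need a list of
expected ones.

```python
def read_only_mounts(mounts_text, fstypes=("ext2", "ext3")):
    """Return mount points of the given filesystem types mounted read-only."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                      # skip malformed or empty lines
        device, mount_point, fstype, options = fields[:4]
        if fstype in fstypes and "ro" in options.split(","):
            hits.append(mount_point)
    return hits


# Illustrative sample in /proc/mounts layout (not from a real system).
SAMPLE = """\
/dev/md0 / ext3 rw 0 0
/dev/md1 /var ext3 ro 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE                     # fall back to the sample off-Linux
    for mount_point in read_only_mounts(text):
        print("WARNING: %s is mounted read-only" % mount_point)
```

Run from cron, the output could be mailed or sent to a log server, which
keeps working even when /var/log is the filesystem that went read-only.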

If you believe that Red Hat should configure their syslog.conf files
to do this by default, feel free to submit a bug report / suggestion
with Red Hat.

> > of uncommitted data which has not been written out to disk.)  So in
> > general, not running the journal will leave you in a worse state after
> > rebooting, compared to running the journal.
> 
> ..it appears my experience disagrees with your expertise here.
> With more data, I would have been able to advise intelligently 
> on when to and when not to run the journal, I believe we agree 
> not running the journal is advisable if the system has been 
> left limping like this for a few hours.

How long the system has been left limping doesn't really matter.  The
real issue is that there may be critical data that has been written to
the journal that was not written to the filesystem before the journal
was aborted and the filesystem left in a read-only state.  This might,
for example, include a user's thesis or several years of research.
(Why such work might not be backed up is a question I will leave for
another day, and falls into the "criminally negligent system
administrator" category)

In general, you're better off running the journal after a journal
abort.  You may think you have experiences to the contrary, but
are you sure?  Unless you snapshot the entire filesystem, and try it
both ways, you can't really know for sure.  There are classes of
errors where the filesystem has been completely trashed, and whether
or not you run the journal won't make a bit of difference.  

The much more important question is to figure out why the filesystem
got trashed in the first place.  Do you have marginal memory?  Marginal
hard drives?  Are you running a beta-test kernel that might be buggy?
Fixing the proximate cause is always the most important thing to do,
since no matter how clever a filesystem is, if you have buggy
hardware or buggy device drivers, in the end you *will* be screwed.  A
filesystem can't compensate for those sorts of shortcomings.

> ..and, on a raid-1 disk set, a failure oughtta cut off the one bad 
> fs and not shoot down the entire raid set because that one fs fails.

I agree.  When i

Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-09 Thread Arnt Karlsen
On Mon, 8 Sep 2003 12:05:24 -0400, 
Theodore Ts'o <[EMAIL PROTECTED]> wrote in message 
<[EMAIL PROTECTED]>:

> On Sun, Sep 07, 2003 at 07:24:27PM +0200, Arnt Karlsen wrote:
> > > What happens on error conditions can be set through tune2fs or as
> > > a mount option.  Having it remount read-only is probably better
> > > than panicking the kernel.
> > 
> > ..yeah, except in /var/log, /var/spool et al, I also lean towards 
> > panic in /home.
> 
> I tend to use remount read-only feature on desktops, where it's useful
> for me to be able to save my work on some other filesystem before I
> reboot my system. 

..remount read-only is ok, as long as the bugle blows.  
IME, it doesn't.

> But for an unattended server, most of the time it's probably better to
> force the system to reboot so you can restore service ASAP.

..even for raid-1 disks???  _Is_ there a combination of raid-1 and 
journalling fs'es for linux that's ready for carrier grade service?

> > > When it happens a reboot may be a good idea, in which case a fsck
> > > to fix the problem should occur automatically.
> > 
> > ..should, agrrrRRRrrreed.  IME (RH73 - RH9 and woody) it does
> > not.
> > 
> > ..what happens is the journaling dies, leaving a good fs intact, 
> > on rebooting, the dead journal will "repair" the fs wiping good 
> > data off the fs.
> 
> I'm not sure what you mean by this.  When there is a filesystem error

..add a "healthy" dose of irony to repair in "repair".  ;-)

> detected, all writes to the filesystem are immediately aborted, which

...precludes reporting the error?  

> means the filesystem on disk is left in an unstable state.  (It may
> look consistent while the system is still running, but there is a lot

.._exactly_, but it is not reported to any of the system users.  
A system reboot _is_ reported usefully to the system users, all 
tty users get the news.

> of uncommitted data which has not been written out to disk.)  So in
> general, not running the journal will leave you in a worse state after
> rebooting, compared to running the journal.

..it appears my experience disagrees with your expertise here.
With more data, I would have been able to advise intelligently 
on when to and when not to run the journal.  I believe we agree 
not running the journal is advisable if the system has been 
left limping like this for a few hours.

> An alternative course of action, which we don't currently support
> would be to attempt to write everything to disk and quiesce the
> filesystem before remounting it read-only.  The problem is that trying
> to flush everything out to disk might leave things in a worse state
> than just freezing all writes.

..could a ramdisk help?  As in; store in ramdisk between journal 
commits and honk the big horn on non-recoverable errors?

..and, on a raid-1 disk set, a failure oughtta cut off the one bad 
fs and not shoot down the entire raid set because that one fs fails.

> The real problem is that in the face of filesystem corruption, by the
> time the filesystem notices that something is wrong, there may be
> significant damage that has already taken place.  Some of it may
> already have been written to journal, in which case not replaying the
> journal might leave you with more data to recover; on the other hand,
> not replaying the journal could also risk leaving your filesystem very
> badly corrupted with data which the mail server had promised it had
> accepted, not actually getting saved by the filesystem.
> 
> A human could make a read/write snapshot of the filesystem and try it
> both ways, but if you want automatic recovery, it's probably better to
> run the journal than not to run it.  

..agreed, and with ext3 on a raid-1 set, this _oughtta_ be easy.
 
> > ..the errors=remount-ro fstab option remounts the fs ro but fails 
> > to tell the system, so the system merrily "logs" data and "accepts" 
> > mail etc 'till Dooms Day, and especially on raid-1 disks I sort of 
> > expected redundancy, like in "autofeather the bad prop and trim out 
> > the yaw" and "autopatch that holed fuel tank", and "auto-sync the 
> > props", I mean, this was done _60_years_ ago in aviation to help 
> > win WWII, and ext3 on raid-1 floats around USS Yorktown-style???
> 
> If the system merrily logs data and accepts it, even after the
> filesystem is remounted read-only, that implies that the MTA is
> horribly buggy, not doing the most basic of error return code checks.

..agreed, pointer hints to such basics?

> If the filesystem is remounted read-only, then writes to the
> filesystem *will* return an error.  If the application doesn't notice,
> then it's the application which is at fault, not ext3.

..on Woody, ext3 actually reports the remount to /dev/console.  ;-)
_Nothing_ elsewhere.  Dunno about Red Hat, never had one hooked 
to a monitor upon a journal failure. 

..all I know is RH-7.3-8-9 and Woody do _not_ report ext3 journal 
failures in any way I am aware of and can make use of

Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-08 Thread Theodore Ts'o
On Sun, Sep 07, 2003 at 07:24:27PM +0200, Arnt Karlsen wrote:
> > What happens on error conditions can be set through tune2fs or as a
> > mount option.  Having it remount read-only is probably better than
> > panicking the kernel.
> 
> ..yeah, except in /var/log, /var/spool et al, I also lean towards 
> panic in /home.

I tend to use remount read-only feature on desktops, where it's useful
for me to be able to save my work on some other filesystem before I
reboot my system.  But for an unattended server, most of the time it's
probably better to force the system to reboot so you can restore
service ASAP.

> > When it happens a reboot may be a good idea, in which case a fsck to
> > fix the problem should occur automatically.
> 
> ..should, agrrrRRRrrreed.  IME (RH73 - RH9 and woody) it does not.
> 
> ..what happens is the journaling dies, leaving a good fs intact, 
> on rebooting, the dead journal will "repair" the fs wiping good 
> data off the fs.

I'm not sure what you mean by this.  When there is a filesystem error
detected, all writes to the filesystem are immediately aborted, which
means the filesystem on disk is left in an unstable state.  (It may
look consistent while the system is still running, but there is a lot
of uncommitted data which has not been written out to disk.)  So in
general, not running the journal will leave you in a worse state after
rebooting, compared to running the journal.

An alternative course of action, which we don't currently support
would be to attempt to write everything to disk and quiesce the
filesystem before remounting it read-only.  The problem is that trying
to flush everything out to disk might leave things in a worse state
than just freezing all writes.

The real problem is that in the face of filesystem corruption, by the
time the filesystem notices that something is wrong, there may be
significant damage that has already taken place.  Some of it may
already have been written to journal, in which case not replaying the
journal might leave you with more data to recover; on the other hand,
not replaying the journal could also risk leaving your filesystem very
badly corrupted with data which the mail server had promised it had
accepted, not actually getting saved by the filesystem.

A human could make a read/write snapshot of the filesystem and try it
both ways, but if you want automatic recovery, it's probably better to
run the journal than not to run it.  

> ..the errors=remount-ro fstab option remounts the fs ro but fails 
> to tell the system, so the system merrily "logs" data and "accepts" 
> mail etc 'till Dooms Day, and especially on raid-1 disks I sort of 
> expected redundancy, like in "autofeather the bad prop and trim out 
> the yaw" and "autopatch that holed fuel tank", and "auto-sync the 
> props", I mean, this was done _60_years_ ago in aviation to help 
> win WWII, and ext3 on raid-1 floats around USS Yorktown-style???

If the system merrily logs data and accepts it, even after the
filesystem is remounted read-only, that implies that the MTA is
horribly buggy, not doing the most basic of error return code checks.
If the filesystem is remounted read-only, then writes to the
filesystem *will* return an error.  If the application doesn't notice,
then it's the application which is at fault, not ext3.

That being said, my preference for servers is to panic immediately on
the first sign of trouble, and let the system fsck and come back
again.  Even if your MTA is non-criminally-negligent, and checks error
codes, the best it can do is return a SMTP temporary failure, which
still doesn't keep the mail flowing.  You're probably best off
rebooting the machine and restoring service.
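
To make the error-checking point concrete, here is a minimal Python
sketch of a spool write that only claims success once the data has been
written and fsync()ed, and otherwise answers with an SMTP-style
temporary failure.  The function name and the reply strings are
hypothetical; a real MTA would pick its own 4xx wording.

```python
import errno
import os


def spool_message(spool_dir, name, data):
    """Write data (bytes) to spool_dir/name and return an SMTP-style reply.

    A 250 reply is returned only after write() and fsync() both succeeded.
    Any OSError, for example EROFS on a filesystem that was remounted
    read-only, is turned into a 451 temporary failure instead.
    """
    path = os.path.join(spool_dir, name)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            remaining = data
            while remaining:
                # os.write() may write fewer bytes than requested.
                written = os.write(fd, remaining)
                remaining = remaining[written:]
            os.fsync(fd)
        finally:
            os.close(fd)
    except OSError as exc:
        reason = errno.errorcode.get(exc.errno, "unknown")
        return "451 temporary failure (%s)" % reason
    return "250 ok"
```

On a read-only filesystem the open() call itself should fail, so the
sender sees the 451 and retries later instead of the mail being
silently dropped.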

- Ted





Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-07 Thread Arnt Karlsen
On Mon, 8 Sep 2003 00:20:12 +1000, 
Russell Coker <[EMAIL PROTECTED]> wrote in message 
<[EMAIL PROTECTED]>:

> On Mon, 8 Sep 2003 00:17, Arnt Karlsen wrote:
> > ..I have had a few cases of ext3fs'es, even on raid-1, going
> > read-only on errors, what do you guys use to bring them back
> > into service?
> 
> What happens on error conditions can be set through tune2fs or as a
> mount option.  Having it remount read-only is probably better than
> panicking the kernel.

..yeah, except in /var/log, /var/spool et al, I also lean towards 
panic in /home.

> When it happens a reboot may be a good idea, in which case a fsck to
> fix the problem should occur automatically.

..should, agrrrRRRrrreed.  IME (RH73 - RH9 and woody) it does not.

..what happens is the journaling dies, leaving a good fs intact, 
on rebooting, the dead journal will "repair" the fs wiping good 
data off the fs.

..compare 'df -h' and 'cat /proc/mounts' on such a system.
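
..a minimal Python sketch of that check, assuming the usual
whitespace-separated /proc/mounts layout (device, mount point, fs type,
options, ...); the function name is hypothetical:

```python
def readonly_mounts(mounts_text, fstypes=("ext3",)):
    """Return mount points of the given fs types that are mounted read-only."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        if fstype in fstypes and "ro" in options.split(","):
            result.append(mount_point)
    return result
```

..feed it the contents of /proc/mounts, e.g.
readonly_mounts(open("/proc/mounts").read()), and any fs that fstab
says should be rw but shows up here deserves a closer look.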

..the errors=remount-ro fstab option remounts the fs ro but fails 
to tell the system, so the system merrily "logs" data and "accepts" 
mail etc 'till Dooms Day, and especially on raid-1 disks I sort of 
expected redundancy, like in "autofeather the bad prop and trim out 
the yaw" and "autopatch that holed fuel tank", and "auto-sync the 
props", I mean, this was done _60_years_ ago in aviation to help 
win WWII, and ext3 on raid-1 floats around USS Yorktown-style???

-- 
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
  Scenarios always come in sets of three: 
  best case, worst case, and just in case.





Re: ..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-07 Thread Russell Coker
On Mon, 8 Sep 2003 00:17, Arnt Karlsen wrote:
> ..I have had a few cases of ext3fs'es, even on raid-1, going
> read-only on errors, what do you guys use to bring them back
> into service?

What happens on error conditions can be set through tune2fs or as a mount 
option.  Having it remount read-only is probably better than panicking the 
kernel.
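
As a rough sketch of auditing the mount-option side, the Python below
reads fstab-style text and reports which errors= policy each ext2/ext3
mount requests.  The accepted values (continue, remount-ro, panic) are
the ones ext2/ext3 document; the function names are hypothetical.  A
mount with no errors= option falls back to the default stored in the
superblock, which is what tune2fs -e sets.

```python
VALID_POLICIES = ("continue", "remount-ro", "panic")


def errors_policies(fstab_text):
    """Map each ext2/ext3 mount point to its errors= value (None if unset)."""
    policies = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("ext2", "ext3"):
            continue
        policy = None
        for option in fields[3].split(","):
            if option.startswith("errors="):
                policy = option[len("errors="):]
        policies[fields[1]] = policy
    return policies


def invalid_policies(policies):
    """Return the entries whose errors= value is not a recognised policy."""
    return {
        mount_point: policy
        for mount_point, policy in policies.items()
        if policy is not None and policy not in VALID_POLICIES
    }
```

Running invalid_policies() over the result is a cheap way to catch a
mistyped option before the day it matters.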

When it happens a reboot may be a good idea, in which case a fsck to fix the 
problem should occur automatically.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page





..fixing ext3 fs going read-only, was : Sendmail or Qmail ? ..

2003-09-07 Thread Arnt Karlsen
On Sun, 7 Sep 2003 12:34:45 +1000, 
Russell Coker <[EMAIL PROTECTED]> wrote in message 
<[EMAIL PROTECTED]>:
> 
> Also I believe that in Ext3 if you write data to a file and then
> unlink the file before the data is committed to disk then the data
> will never be written.  So there seems no loss as long as the file
> isn't opened with O_SYNC and you don't call fsync() (and no-one calls
> sync()).
> 

..I have had a few cases of ext3fs'es, even on raid-1, going 
read-only on errors, what do you guys use to bring them back 
into service?

-- 
..med vennlig hilsen = with Kind Regards from Arnt... ;-)
...with a number of polar bear hunters in his ancestry...
  Scenarios always come in sets of three: 
  best case, worst case, and just in case.

