Re: several background fsck panics

2003-03-30 Thread David Schultz
Thus spake Alexander Langer [EMAIL PROTECTED]:
 I had several panics related to background fsck now.  Once I disabled
 background fsck, all went ok.
 
 It began when I pressed the reset buttons on several boots while the
 system was still doing fscks.
[...]
 Mar 24 21:48:59 fump kernel: panic: ufs_dirbad: bad dir

You would have gotten this one without bgfsck as well the next
time you tried to look up the offending directory.  Background fsck
only expedited the panic by reading all the directories on the
system in order to perform its checks.  Basically, the panic is
the kernel's way of telling you that something is unexpectedly
wrong with the filesystem (due in this case to ATA write caching),
and that it is going to give up rather than risk causing further
damage.  UFS, like most other filesystems, is not designed
to tolerate failures on the part of the hardware to honor its
guarantees, so it's hard to do better without inventing a new
filesystem.
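To make "unexpectedly wrong" concrete: the ufs_dirbad panic quoted above fires when a directory record fails a sanity check. The sketch below is a minimal, hypothetical version of such a check, loosely modeled on the UFS directory entry layout (struct direct). The struct, the 512-byte block size, the 4-byte alignment rule and the name dirent_is_bad() are assumptions for illustration; this is not FreeBSD's ufs_dirbadentry() code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified directory entry header, loosely modeled on the
 * UFS struct direct layout.  Field widths and limits are assumptions. */
struct dirent_hdr {
    uint32_t d_ino;     /* inode number; 0 marks an unused slot */
    uint16_t d_reclen;  /* total length of this record, in bytes */
    uint8_t  d_type;    /* file type */
    uint8_t  d_namlen;  /* length of the name that follows the header */
};

#define SKETCH_HDR_SIZE  8u    /* header bytes before the name (assumed) */
#define SKETCH_DIRBLKSIZ 512u  /* directory block size (assumed) */

/* Smallest record that can hold the header, the name and a NUL byte,
 * rounded up to a 4-byte boundary. */
static uint32_t dirent_min_size(uint8_t namlen)
{
    return (SKETCH_HDR_SIZE + namlen + 1u + 3u) & ~3u;
}

/* Return true if the record at byte `offset` of the directory looks
 * mangled.  d_namlen cannot exceed 255 by construction (uint8_t). */
static bool dirent_is_bad(const struct dirent_hdr *ep, uint32_t offset)
{
    uint32_t in_blk = offset % SKETCH_DIRBLKSIZ;

    if (ep->d_reclen == 0 || (ep->d_reclen & 3u) != 0)
        return true;   /* zero or misaligned record length */
    if (in_blk + ep->d_reclen > SKETCH_DIRBLKSIZ)
        return true;   /* record runs past the end of its block */
    if (ep->d_reclen < dirent_min_size(ep->d_namlen))
        return true;   /* name does not fit inside the record */
    return false;
}
```

A zero or misaligned record length, or a record that runs past the end of its block, is the sort of damage reported as a "mangled entry"; once the kernel sees it, it cannot safely walk the rest of the directory.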
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: [Re: several background fsck panics

2003-03-30 Thread David Schultz
Thus spake Terry Lambert [EMAIL PROTECTED]:
 o Put a counter in the first superblock; it would be
   incremented when the BG fsck is started, and reset
   to zero when it completes.  If the counter reaches
   3 (or some command line specified number), then the
   BG flagging is ignored, and a full FG fsck is then
   performed instead.  I like this idea because it will
   always work, and it's not actually a hack, it's a
   correct solution.

I'm glad you like it because AFAIK, it is already implemented.  ;-)

 o Implement soft read-only.  The place that most of
   the complaints are coming from is desktop users, with
   relatively quiescent machines.  Though swap is used,
   it does not occur in an FS partition.  As a result,
   the FS could be marked read-only for long period of
   time.  This marking would be in memory.  The clean bit
   would be set on the superblock.  When a write occurs,
   the clean bit would be reset to dirty, and committed
   to disk prior to the write operation being permitted
   to proceed (a stall barrier).  I like this idea because,
   for the most part, it eliminates fsck, both BG and FG,
   on systems that crash while it's in effect.  The net
   result is a system that is statistically much more
   tolerant of failures, but which still requires another
   safety net, such as the previous solution.

I was thinking of doing something like this myself as part of an
``idle timeout'' for disks.  (Marking the filesystem clean after a
period of quiescence would actually interfere with ATA disks'
built-in mechanism for spinning down after a timeout, which is
important for laptops, so the OS would have to track the true
amount of idle time.)  Annoyingly, I can never get the disk
containing /var to remain quiescent for long while cron is running
(even without any crontabs), and I hope this can be solved without
disabling cron or adding a nontrivial hack to bio.


Re: [Re: several background fsck panics

2003-03-28 Thread Terry Lambert
David Schultz wrote:
 Thus spake Terry Lambert [EMAIL PROTECTED]:
  o Put a counter in the first superblock; it would be
    incremented when the BG fsck is started, and reset
    to zero when it completes.  If the counter reaches
    3 (or some command line specified number), then the
    BG flagging is ignored, and a full FG fsck is then
    performed instead.  I like this idea because it will
    always work, and it's not actually a hack, it's a
    correct solution.
 
 I'm glad you like it because AFAIK, it is already implemented.  ;-)

Nope.  What's implemented is the FS_NEEDSFSCK flag.  But that
flag is not set in the superblock flags field as *the very first
thing done*.

Thus a failure that results in a panic will not set the flag in
pfatal(), since it never gets there.

Probably the correct thing to do is to set the flag as the very
first operation, and then it will work as expected.

FWIW, it looks like the code in pfatal() wanted to be in main(),
since it complains about not being able to run in the background,
the same way main() does.
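A minimal sketch of the ordering described here: the "needs fsck" flag is set and committed before any checking starts, and cleared only when the check succeeds, so a panic or reset part-way through leaves it set for the next boot. The struct, the helper names and the flag value are hypothetical stand-ins; FreeBSD's real FS_NEEDSFSCK is defined in <ufs/ffs/fs.h>, and the numeric value used here is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bit; the value is an assumption for this sketch. */
#define SKETCH_FS_NEEDSFSCK 0x04u

/* Hypothetical in-memory stand-in for the on-disk superblock. */
struct sketch_sb {
    uint32_t fs_flags;
    uint32_t writes;   /* how many times the superblock was committed */
};

/* Hypothetical: pretend to write the superblock to stable storage. */
static void sb_commit(struct sketch_sb *sb)
{
    sb->writes++;
}

/* Background check with the flag set *first*.  `crash_midway` simulates a
 * panic or reset after checking has started: the function returns without
 * clearing the flag, just as a real crash would. */
static bool bg_fsck(struct sketch_sb *sb, bool crash_midway)
{
    sb->fs_flags |= SKETCH_FS_NEEDSFSCK;   /* step 1: mark "needs fsck" */
    sb_commit(sb);                         /* ...and make it durable */

    if (crash_midway)
        return false;                      /* flag stays set on disk */

    /* ... snapshot-based checks would run here ... */

    sb->fs_flags &= ~SKETCH_FS_NEEDSFSCK;  /* success: clear the flag */
    sb_commit(sb);
    return true;
}

/* At the next boot, a set flag forces a foreground fsck. */
static bool must_run_foreground(const struct sketch_sb *sb)
{
    return (sb->fs_flags & SKETCH_FS_NEEDSFSCK) != 0;
}
```

The first commit only helps if the drive really writes it, and a panic can still happen before fsck runs at all, which is the remaining race window.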

However, this still leaves a race window.

The reason the panic happens is that FreeBSD is running processes
on a corrupt FS.

Even in the best case, this panic may occur when anything is
loaded off the FS, so it could happen on init, or on fsck
itself, etc.

So really, the only solution is a counter that the FS kernel
code counts up, which is reset to zero when a BG fsck completes
successfully (say, by grabbing the first byte of fs_sparecon32[]).

BTW: This still leaves a failure case: the BG fsck has to be
able to complete successfully... but that's not enough to stave
off a future panic from an undetected error that the fsck didn't
see, because it was only pruning CG bitmaps.

So the correct place to zero the counter is, once again, in the
kernel: on a successful unmount during a non-panic shutdown.

This does mean that three (or the configured count) consecutive power
failures get you a FG fsck, but that's probably livable (if you were that
certain there was no corruption, you could boot to a shell and
override the count parameter to the FG fsck trigger threshold).
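The kernel-side variant can be sketched as follows, assuming the counter lives in one spare superblock byte as suggested (modeled here as a plain field of a hypothetical struct). The mount and unmount hooks and want_foreground_fsck() are illustrative names, not existing kernel functions. The counter saturates rather than wrapping, so a long run of crashes cannot reset it to zero by overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical superblock fragment.  The suggestion is to borrow the first
 * byte of fs_sparecon32[]; here that byte is modeled as a plain field. */
struct sketch_sb {
    uint8_t unclean_mounts;  /* read-write mounts since last clean unmount */
};

/* Kernel side, at read-write mount time: count up (saturating at 255 so
 * that overflow can never wrap the counter back to zero). */
static void on_mount_rw(struct sketch_sb *sb)
{
    if (sb->unclean_mounts < UINT8_MAX)
        sb->unclean_mounts++;
}

/* Kernel side: zero the counter only on a successful unmount during a
 * non-panic shutdown. */
static void on_clean_unmount(struct sketch_sb *sb)
{
    sb->unclean_mounts = 0;
}

/* Boot side: once the count reaches the threshold (default 3, overridable
 * if you are certain there is no corruption), force a foreground fsck. */
static bool want_foreground_fsck(const struct sketch_sb *sb, unsigned threshold)
{
    return sb->unclean_mounts >= threshold;
}
```

Starting from a clean shutdown, three boots that each mount the filesystem and then crash leave the counter at 3, so the fourth boot is forced into a foreground fsck.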


  o Implement soft read-only.  The place that most of
    the complaints are coming from is desktop users, with
    relatively quiescent machines.  Though swap is used,
    it does not occur in an FS partition.  As a result,
    the FS could be marked read-only for long period of
    time.  This marking would be in memory.  The clean bit
    would be set on the superblock.  When a write occurs,
    the clean bit would be reset to dirty, and committed
    to disk prior to the write operation being permitted
    to proceed (a stall barrier).  I like this idea because,
    for the most part, it eliminates fsck, both BG and FG,
    on systems that crash while it's in effect.  The net
    result is a system that is statistically much more
    tolerant of failures, but which still requires another
    safety net, such as the previous solution.
 
 I was thinking of doing something like this myself as part of an
 ``idle timeout'' for disks.  (Marking the filesystem clean after a
 period of quiescence would actually interfere with ATA disks'
 built-in mechanism for spinning down after a timeout, which is
 important for laptops, so the OS would have to track the true
 amount of idle time.)  Annoyingly, I can never get the disk
 containing /var to remain quiescent for long while cron is running
 (even without any crontabs), and I hope this can be solved without
 disabling cron or adding a nontrivial hack to bio.

We implemented this when we implemented soft updates in FFS under
Windows at Artisoft.  That was back before ATX power supplies were
widespread, and we needed to be tolerant of users who simply
turned off the power switch, without running the Windows 95
shutdown sequence.
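A minimal sketch of the soft read-only state machine quoted above: the clean bit is committed once the filesystem goes quiet, and the first subsequent write stalls until the dirty bit is on disk. The struct and function names are hypothetical, and sb_write_sync() stands in for a synchronous superblock write that genuinely reaches the media; with a lying write cache, that is exactly the assumption that fails.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: `sb_clean_on_disk` is the clean bit as stored on the
 * disk; `soft_ro` is the in-memory "soft read-only" marking. */
struct soft_ro_fs {
    bool     soft_ro;
    bool     sb_clean_on_disk;
    uint32_t sb_writes;   /* superblock commits, to show the barrier cost */
};

/* Stand-in for a synchronous superblock write that really reaches media. */
static void sb_write_sync(struct soft_ro_fs *fs, bool clean)
{
    fs->sb_clean_on_disk = clean;
    fs->sb_writes++;
}

/* Called after a period of quiescence, once all dirty data is flushed. */
static void enter_soft_ro(struct soft_ro_fs *fs)
{
    if (!fs->soft_ro) {
        sb_write_sync(fs, true);   /* FS is now marked clean on disk */
        fs->soft_ro = true;
    }
}

/* Every write path calls this first: the stall barrier. */
static void before_write(struct soft_ro_fs *fs)
{
    if (fs->soft_ro) {
        sb_write_sync(fs, false);  /* dirty bit must reach the disk first */
        fs->soft_ro = false;
    }
    /* ... the actual write may proceed from here ... */
}

/* What a boot after a crash at this moment would conclude. */
static bool crash_needs_fsck(const struct soft_ro_fs *fs)
{
    return !fs->sb_clean_on_disk;
}
```

Only the first write after a quiet period pays for the extra superblock write; later writes go straight through until the filesystem is marked soft read-only again.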

I dunno about cron.  I think its noticing crontab changes
automatically may have made it too smart for its own good.

Cron updates the access time on the crontab file every time it
runs, which is once a second.  If you disabled this for fstat,
the problem would go away.  I'm not sure the semantics are OK,
though.

The old pre-smarter cron would not have this problem, as it
would run on intervals, and sleep for long periods (until the
next job was scheduled to run), and you had to hit it over the
head with kill -HUP to tell it the file changed.

Probably the correct thing to do is to use old-style long delta
intervals, and register a kevent interest in file modifications.
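The "long delta interval" half of that suggestion amounts to sleeping until the earliest scheduled job instead of waking every second. The helper below is a hypothetical sketch of that computation only. The kevent half (on FreeBSD, a kqueue(2) EVFILT_VNODE filter with NOTE_WRITE on the crontab or its spool directory) is left out to keep the example portable; it would wake the daemon early when a crontab changes.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: given the next run time of every scheduled job,
 * return how long the daemon may sleep, capped at `max_sleep`.  A crontab
 * change would be signalled separately (e.g. by a kevent), not by polling. */
static time_t sleep_seconds(time_t now, const time_t *next_run, size_t njobs,
                            time_t max_sleep)
{
    time_t best = max_sleep;          /* no jobs: sleep the maximum */

    for (size_t i = 0; i < njobs; i++) {
        time_t delta = next_run[i] - now;
        if (delta <= 0)
            return 0;                 /* a job is already due */
        if (delta < best)
            best = delta;
    }
    return best;
}
```

With jobs due at t = 400, 700 and 1000 and the clock at t = 100, the daemon could sleep 300 seconds (subject to the cap) instead of waking 300 times.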

The cruddy thing is, if it were really read-only, then the access
time update wouldn't happen.  Catch-22.

I think maybe it's useful to distinguish the POSIX semantics here:
"shall be scheduled for update" is not the same thing, really, as
"shall be updated".  So, in practice, you could cache the access
time update for long periods, as long as the correct time was
marked in 

Re: several background fsck panics

2003-03-26 Thread Matthias Schuendehuette
Terry Lambert wrote:

 The issue with the repeating background fsck's is important.
 I suggest a counter that gets reset to zero each time the
 FS is marked clean by fsck, and incremented each time the
 background fsck process is started.

 When this counter reaches a predefined value (I suggest a
 command line option to background fsck, which defaults to
 3, if left unspecified), then the fsck is automatically
 converted to a foreground fsck.

 This counter would be recorded in the superblock.

This sounds like a good idea! I vote for a counter of 2... :-)

I also suggest stating as clearly as possible that running Soft
Updates with the write cache enabled is somewhat 'out of spec': it
cannot work across a crash! (As you stated clearly!) So I'm also voting
for 'WC disabled' on every kind of disk. SCSI disks don't need the
write cache because of Tagged Queuing, and only those ATA disks that
*have* TQ can/should be operated 'the fast way', hoping that Soeren
gets it working again... :-/
-- 
Ciao/BSD - Matthias

Matthias Schuendehuette msch [at] snafu.de, Berlin (Germany)
Powered by FreeBSD 5.0-CURRENT



Re: several background fsck panics

2003-03-25 Thread Alexander Langer
Thus spake Terry Lambert ([EMAIL PROTECTED]):

 Disable write caching on your ATA drive.  You should be able to
 safely reset after that.

Good idea, thanks.  Nevertheless:  I don't think the system should
panic on background fsck's, while a manual fsck works.

Alex

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message


Re: several background fsck panics

2003-03-25 Thread Terry Lambert
Alexander Langer wrote:
 Thus spake Terry Lambert ([EMAIL PROTECTED]):
  Disable write caching on your ATA drive.  You should be able to
  safely reset after that.
 
 Good idea, thanks.  Nevertheless:  I don't think the system should
 panic on background fsck's, while a manual fsck works.

A manual fsck can deal with corrupt data.

A background fsck can only deal with invalid cylinder group
bitmaps, and operates on a snapshot.

For a background fsck to be feasible, the FS has to be in a
self-consistent state already, which it wasn't.
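Why a background check needs a self-consistent filesystem can be shown with a toy model. Soft updates is designed so that, after a crash, the expected damage is limited to resources marked in use but no longer referenced, which a background pass can safely free. A block that is referenced but marked free is a different kind of damage and needs a full foreground fsck. The function and types below are hypothetical and ignore fragments, inodes and the snapshot mechanics entirely.

```c
#include <stddef.h>
#include <stdint.h>

enum bg_result {
    BG_OK,           /* bitmap matched the block references */
    BG_FIXED_LEAKS,  /* only "in use but unreferenced" blocks: freed */
    BG_NEED_FG       /* a referenced block is marked free: not BG-fixable */
};

/* Compare a cylinder-group style bitmap (1 = in use) against the blocks
 * actually referenced by inodes.  The first pass only looks for damage a
 * background check may not repair, so the bitmap is left untouched in that
 * case; the second pass reclaims leaked blocks. */
static enum bg_result reconcile_bitmap(uint8_t *inuse, const uint8_t *referenced,
                                       size_t nblocks, size_t *freed)
{
    *freed = 0;
    for (size_t b = 0; b < nblocks; b++) {
        if (referenced[b] && !inuse[b])
            return BG_NEED_FG;
    }
    for (size_t b = 0; b < nblocks; b++) {
        if (inuse[b] && !referenced[b]) {
            inuse[b] = 0;          /* reclaim a leaked block */
            (*freed)++;
        }
    }
    return *freed ? BG_FIXED_LEAKS : BG_OK;
}
```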

When you killed the power on your system and reset it, you
lost the cached data sitting in the ATA disk.  This is due
to the fact that the ATA disk lied, and claimed that it had
committed some writes to stable storage, when in fact it had
only copied them to the disk cache.  As a result, when the
device reset happened, you lost some writes which were in
progress.  Therefore your disk image was corrupt, and so your
FS was *not* in a self-consistent state.
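A toy model of the failure described above: with the write cache on, the drive reports a write complete while the data is still in volatile cache, and it may move cached data to the media in its own order. Everything here (the struct, sector numbers, function names) is hypothetical and greatly simplified.

```c
#include <stdbool.h>
#include <stddef.h>

#define NSECT 8

/* Toy drive: `media` survives power loss, `cache` does not. */
struct toy_drive {
    bool write_cache;    /* models the drive's write-cache setting */
    int  media[NSECT];
    int  cache[NSECT];
    bool cached[NSECT];
};

/* Returns once the drive reports the write "complete".  With the write
 * cache on, that report comes before the data reaches the media. */
static void drive_write(struct toy_drive *d, size_t sect, int val)
{
    if (d->write_cache) {
        d->cache[sect] = val;
        d->cached[sect] = true;    /* acknowledged, but still volatile */
    } else {
        d->media[sect] = val;      /* acknowledged only once on the media */
    }
}

/* The drive writes back cached sectors in whatever order it likes. */
static void drive_flush_sector(struct toy_drive *d, size_t sect)
{
    if (d->cached[sect]) {
        d->media[sect] = d->cache[sect];
        d->cached[sect] = false;
    }
}

/* Reset or power loss: everything still in the cache is gone. */
static void drive_power_loss(struct toy_drive *d)
{
    for (size_t i = 0; i < NSECT; i++)
        d->cached[i] = false;
}
```

In the test scenario the filesystem issues the inode write (sector 1) before the directory entry that names it (sector 2), as soft updates requires, but the drive flushes sector 2 first and then loses power, leaving a name on disk that points at an uninitialized inode. With the cache off, the acknowledged write survives.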

This type of error happens on ATA disks because they do not
permit disconnects during writes, only during reads.  If you
want to be able to reset your machine out from under your
disk, with caching turned on, buy SCSI hardware, instead of
ATA hardware: it does not lie to the host system, and claim
tagged writes have been committed to stable storage when they
have not, and are only in (volatile) cache RAM.

The panic was not, in fact, a result of the background fsck
itself: it was a result of an attempt to access FS structures
by the kernel through the FS, assuming -- incorrectly -- that
the FS structures were in a self-consistent state.

This assumption was bogus, but there was no way for the kernel
to know this because the failure state was not recovered, and
that happened because PC hardware is bogus.

This happened because you had background fsck enabled, and it
was unable to tell the difference between a power failure vs.
a panic vs. some other cause for a system crash (hardware or
other failure).  This is because the PC hardware itself doesn't
record these types of events in NVRAM (e.g. CMOS), nor does it
have sufficient DC holdup time that it could write a failure
code to NVRAM, before losing its marbles.

Hope this explains why you had the problem, and why real servers
tend to specify SCSI hardware, and tend not to be PC-class hardware
(e.g. an RS/6000 would have read the failure cause from its NVRAM
when it came back up, and performed a full recovery appropriate
to the failure).
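The NVRAM idea can be sketched as a small protocol: at boot, record "power fail" as the presumed cause; overwrite it with the real cause on a panic or hardware failure; clear it on a clean shutdown; and pick the recovery at the next boot from what was recorded. The failure codes, the single NVRAM cell and all function names are hypothetical, since PC hardware offers no standard OS-usable area for this.

```c
/* Hypothetical failure codes an OS might keep in an NVRAM scratch area. */
enum fail_code {
    FAIL_NONE,          /* clean shutdown completed */
    FAIL_POWER,         /* written at boot: assume power loss until told otherwise */
    FAIL_PANIC_SYNCED,  /* panic, but the disks were synced */
    FAIL_PANIC,         /* panic without a successful sync */
    FAIL_HARDWARE
};

enum fsck_mode { FSCK_NONE, FSCK_BACKGROUND, FSCK_FOREGROUND };

/* Hypothetical NVRAM cell standing in for a real, OS-allocated area. */
static enum fail_code nvram_cell = FAIL_NONE;

static void boot_arm(void)                   { nvram_cell = FAIL_POWER; }
static void record_failure(enum fail_code c) { nvram_cell = c; }
static void clean_shutdown(void)             { nvram_cell = FAIL_NONE; }

/* Decide the recovery at the next boot from the recorded cause. */
static enum fsck_mode recovery_for(enum fail_code c)
{
    switch (c) {
    case FAIL_NONE:
        return FSCK_NONE;
    case FAIL_PANIC_SYNCED:
        return FSCK_BACKGROUND;
    default:
        return FSCK_FOREGROUND;   /* power loss, unsynced panic, hardware */
    }
}
```

Arming the cell with "power fail" at boot is what makes the scheme work without DC holdup time: if nothing else gets written, the most pessimistic cause is assumed.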


PS: Unfortunately, this will not change on PCs any time soon,
because people have been trained by computer vendors, disk
vendors, and OS vendors that it's OK for PCs to need
rebooting, and/or to crash unexpectedly in catastrophic
ways that require reinstalling the OS.  So people tolerate
hardware that has ambiguous failure modes, as long as it
costs less.


-- Terry



Re: several background fsck panics

2003-03-25 Thread Alexander Langer
Thus spake Terry Lambert ([EMAIL PROTECTED]):

 A manual fsck can deal with corrupt data.

[...]

Yes, I recall the discussion about WC on ata vs. softupdates a few
months back.  I even have it disabled on more important machines than
this one :-)

 The panic was not, in fact, a result of the background fsck
 itself: it was a result of an attempt to access FS structures
 by the kernel through the FS, assuming -- incorrectly -- that
 the FS structures were in a self-consistent state.

Actually I don't care _where_ the panic happened.  If I hadn't manually
interrupted the boot process, this kernel would have booted and panicked
on that error for the next three years.  I could fix that by simply
doing a manual (background_fsck=NO), so something is broken, for some
definition of broken:  If my system panics, I call that broken.

We claim background fsck as a cool new feature in the release notes,
which is even the DEFAULT, including WC on ATA disks, which is ALSO the
default.  So , and if this is broken, there is a serious design flaw,
which must be fixed.  It doesn't help to explain why the error is there,
the next user will have the same error, running a verbatim system.

Ciao

Alex



Re: several background fsck panics

2003-03-25 Thread Alexander Leidinger
On Tue, 25 Mar 2003 16:04:07 +0100
Alexander Langer [EMAIL PROTECTED] wrote:

 We claim background fsck as a cool new feature in the release notes,
 which is even the DEFAULT, including WC on ATA disks, which is ALSO the
 default.  So , and if this is broken, there is a serious design flaw,
 which must be fixed.  It doesn't help to explain why the error is there,
 the next user will have the same error, running a verbatim system.

AFAIK: Søren had the WC off for a while on -current, but a lot of people
complained, so he switched it back on (I'm sure he regrets it every time
he is reminded about it). So -- including you and me -- there are at
least 3 committers who would like to see the WC turned off by default.

There are a lot of other people without special @FreeBSD.org privileges
who also don't like the current default (if you can get hold of
iX-10/2002, read the BSD-Softupdates vs. Linux-Journaling article; the
tests for this article were done on SCSI hardware, but that doesn't
matter in this case, since it explains the interactions of WC, TQ and SO
and how they affect the speed of some FS operations).

Maybe we can gain some momentum and restore POLA (in this case the
default of going the safe way instead of the fast (but sometimes
dangerous) way).

Bye,
Alexander.

-- 
Yes, I've heard of decaf. What's your point?

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7



Re: several background fsck panics

2003-03-25 Thread The Anarcat
On Tue Mar 25, 2003 at 03:54:58AM -0800, Terry Lambert wrote:
 Alexander Langer wrote:
  Thus spake Terry Lambert ([EMAIL PROTECTED]):
   Disable write caching on your ATA drive.  You should be able to
   safely reset after that.
  
  Good idea, thanks.  Nevertheless:  I don't think the system should
  panic on background fsck's, while a manual fsck works.
 
 A manual fsck can deal with corrupt data.
 
 A background fsck can only deal with invalid cylinder group
 bitmaps, and operates on a snapshot.
 
 For a background fsck to be feasible, the FS has to be in a
 self-consistent state already, which it wasn't.
 
 When you killed the power on your system and reset it, you
 lost the cached data sitting in the ATA disk.  This is due
 to the fact that the ATA disk lied, and claimed that it had
 committed some writes to stable storage, when in fact it had
 only copied them to the disk cache.  As a result, when the
 device reset happened, you lost some writes which were in
 progress.  Therefore your disk image was corrupt, and so your
 FS was *not* in a self-consistent state.

Shouldn't fsck run in the foreground for disks set up with WC? That
would be a quick hack solving this issue altogether.
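The suggested hack amounts to one extra condition in the boot-time decision. A hypothetical sketch, assuming the boot scripts can learn whether the drive has write caching enabled (how that is discovered is not shown); as with FreeBSD's background fsck, a background check is only considered for soft updates filesystems.

```c
#include <stdbool.h>

enum fsck_mode { FSCK_NONE, FSCK_BACKGROUND, FSCK_FOREGROUND };

/* Hypothetical boot-time policy for one filesystem.  `fs_clean` is the
 * superblock clean flag, `softdep` whether soft updates is enabled, and
 * `drive_wc` whether the underlying drive has write caching on. */
static enum fsck_mode choose_fsck(bool fs_clean, bool softdep, bool drive_wc)
{
    if (fs_clean)
        return FSCK_NONE;
    if (!softdep || drive_wc)
        return FSCK_FOREGROUND;  /* write ordering cannot be trusted */
    return FSCK_BACKGROUND;
}
```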

A.

-- 
Conformity-the natural instinct to passively yield to that vague something
recognized as authority.
- Mark Twain




Re: several background fsck panics

2003-03-25 Thread Terry Lambert
Alexander Langer wrote:
 Actually I don't care _where_ the panic happened.  If I hadn't manually
 interrupted the boot process, this kernel would have booted and panicked
 on that error for the next three years.  I could fix that by simply
 doing a manual (background_fsck=NO), so something is broken, for some
 definition of broken:  If my system panics, I call that broken.

Actually, you *do* care where the panic occurred.  8-).

The issue with the repeating background fsck's is important.
I suggest a counter that gets reset to zero each time the
FS is marked clean by fsck, and incremented each time the
background fsck process is started.

When this counter reaches a predefined value (I suggest a
command line option to background fsck, which defaults to
3, if left unspecified), then the fsck is automatically
converted to a foreground fsck.

This counter would be recorded in the superblock.
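A minimal sketch of that counter, assuming a hypothetical superblock field and a limit passed on the command line (0 meaning "use the default of 3"). The increment has to be committed to disk before the background check begins, or a crash during the check would lose it. As pointed out elsewhere in the thread, a panic that happens before fsck even starts bypasses an fsck-side counter like this one.

```c
#include <stdbool.h>
#include <stdint.h>

#define BGFSCK_DEFAULT_LIMIT 3u  /* default when no command line option */

/* Hypothetical superblock fields holding the counter. */
struct sketch_sb {
    uint32_t bgfsck_starts;  /* BG fscks started since fsck last marked clean */
    bool     clean;
};

/* Returns true if this run should be converted to a foreground fsck.
 * `limit` is the (hypothetical) command line value; 0 means the default. */
static bool start_fsck(struct sketch_sb *sb, unsigned limit)
{
    if (limit == 0)
        limit = BGFSCK_DEFAULT_LIMIT;
    if (sb->bgfsck_starts >= limit)
        return true;             /* too many BG attempts: go foreground */
    sb->bgfsck_starts++;         /* must reach the disk before checking */
    return false;
}

/* Called when fsck (BG or FG) marks the filesystem clean. */
static void fsck_marked_clean(struct sketch_sb *sb)
{
    sb->bgfsck_starts = 0;
    sb->clean = true;
}
```

With the default limit, three background starts without an intervening clean mark are allowed and the fourth run goes to the foreground; a limit of 2 would simply be passed on the command line.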


 We claim background fsck as a cool new feature in the release notes,

I don't.  I'm convinced it's technically infeasible, and Kirk
has validated my reasoning on this, previously.  It is about
as safe or unsafe as running with async mounts.  Maybe worse,
depending on the MTBF for your disk drives (i.e. ATA drives
fail fairly often, if not catastrophically, in the presence
of power failures; this can be mitigated by dual power supplies
and UPS equipment).


 which is even the DEFAULT, including WC on ATA disks, which is ALSO the
 default.  So , and if this is broken, there is a serious design flaw,
 which must be fixed.  It doesn't help to explain why the error is there,
 the next user will have the same error, running a verbatim system.

The explanation is that the very idea of a background fsck,
without additional hardware support, is flawed.  Rather than
the problem occurring in the snapshot code, it could just as
easily have occurred as a result of some process running before it
had the opportunity to fsck at all.

-- Terry



[Re: several background fsck panics

2003-03-25 Thread Terry Lambert
The Anarcat wrote:
  When you killed the power on your system and reset it, you
  lost the cached data sitting in the ATA disk.  This is due
  to the fact that the ATA disk lied, and claimed that it had
  committed some writes to stable storage, when in fact it had
  only copied them to the disk cache.  As a result, when the
  device reset happened, you lost some writes which were in
  progress.  Therefore your disk image was corrupt, and so your
  FS was *not* in a self-consistent state.
 
 Shouldn't fsck run in the foreground for disks set up with WC? That
 would be a quick hack solving this issue altogether.

There are a lot of quick hacks that can be done to solve the
issue.  There are also real fixes:

o   Disable BG fsck if WC is on; I dislike this hack,
    mostly because of postings by drive engineers to
    FreeBSD lists, indicating a willingness to address
    ATA issues like this, and the fact that most SCSI
    drives don't actually have this issue.

o   Put a counter in the first superblock; it would be
    incremented when the BG fsck is started, and reset
    to zero when it completes.  If the counter reaches
    3 (or some command line specified number), then the
    BG flagging is ignored, and a full FG fsck is then
    performed instead.  I like this idea because it will
    always work, and it's not actually a hack, it's a
    correct solution.

o   Implement soft read-only.  The place that most of
    the complaints are coming from is desktop users, with
    relatively quiescent machines.  Though swap is used,
    it does not occur in an FS partition.  As a result,
    the FS could be marked read-only for long periods of
    time.  This marking would be in memory.  The clean bit
    would be set on the superblock.  When a write occurs,
    the clean bit would be reset to dirty, and committed
    to disk prior to the write operation being permitted
    to proceed (a stall barrier).  I like this idea because,
    for the most part, it eliminates fsck, both BG and FG,
    on systems that crash while it's in effect.  The net
    result is a system that is statistically much more
    tolerant of failures, but which still requires another
    safety net, such as the previous solution.

o   Disk manufacturers could fix the ATA write caching
    problem.  I think this will happen eventually, so the
    first solution is out.

o   PC manufacturers could provide OS-usable NVRAM scratch
    areas, which would permit an OS to allocate a section,
    and use it.  The OS would then write the FreeBSD marker
    into an area to allocate it, and then write "power fail"
    as the failure code into the allocated area.  When a
    panic or hardware failure occurred, it could write "panic"
    or "hardware fail" as the failure code.  When the system
    came back up, it would be able to distinguish which type
    of failure it was by reading the NVRAM area.  If it was
    something like "panic with sync", it could run the BG fsck,
    otherwise it would run the FG fsck.  I really like this idea,
    too.  I believe that more modern systems have this capability,
    but it has not yet been standardized.  Therefore we should
    take a wait-and-see attitude towards it.

o   Disk manufacturers could provide a lithium battery on board
    disks.  This would not only bound their planned obsolescence
    curve to 5 years or so (lifetime of the battery), it would
    give them an aftermarket.  The battery would trickle-charge
    from the disk drive power, and would be used to commit the
    write cache in the event of power failure.  I like this too; it
    makes disk drives obsolete at about 2X the distance that they
    become obsolete, it gives the drive manufacturers a bone for
    playing along, and it actually solves the problem at its
    source.  People might not like "your disk lasts 5 years" vs.
    "your warranty is one year", but smoothing the market demand
    function is probably worth more, in terms of lower cost to
    consumers and assured profit to disk manufacturers, and it
    can be billed as a marketing checkbox item, to force all the
    other disk manufacturers into implementing it, too, so there
    should be no downside.

o   We can change our file system structure to journalled; I like
    this as well, but there are some issues with manufacturers who
    do not provide track boundary information, so you can assure
    yourselves that a track boundary doesn't span a corruption
    boundary, in the event of a power failure.  If you can do this,
    journalling actually becomes incredibly fast, since you know
    the disk writes backwards on a given track, so you can just
    implement the completed write datestamp, and perform a single
 

several background fsck panics

2003-03-24 Thread Alexander Langer
Hi!

I had several panics related to background fsck now.  Once I disabled
background fsck, all went ok.

It began when I pressed the reset buttons on several boots while the
system was still doing fscks.

Then at some point this happened:

Mar 24 21:31:12 fump root: /dev/ad0s2g: 701589 files, 12766670 used,
32836022 free (76598 frags, 4094928 blocks, 0.2% fragmentation) 
Mar 24 21:32:27 fump kernel: handle_workitem_freeblocks: block count
Mar 24 21:37:36 fump root: fsck_ufs: cannot find inode 1443360 

and a bit later:

Mar 24 21:48:59 fump syslogd: kernel boot file is /boot/kernel/kernel
Mar 24 21:48:59 fump kernel: /usr: bad dir ino 500641 at offset 0:
mangled entry
Mar 24 21:48:59 fump kernel: panic: ufs_dirbad: bad dir
Mar 24 21:48:59 fump kernel: 
Mar 24 21:48:59 fump kernel: syncing disks, buffers remaining... 3810
3810 3810 3809 3809 3809 3806 3807 3807 3807 3807 3807 3803 3803 3803
3780 3782 3782 3780 3780 3780 3780 3780 3780 3780 3780 3780 3780 3780 3780 3780 3780
3780 3780 3780 3780 3780 3780 3780 
Mar 24 21:48:59 fump kernel: giving up on 2299 buffers
Mar 24 21:48:59 fump kernel: Uptime: 36m18s
Mar 24 21:48:59 fump kernel: Dumping 511 MB
Mar 24 21:48:59 fump kernel: ata0: resetting devices ..
Mar 24 21:48:59 fump kernel: done
Mar 24 21:48:59 fump kernel: 16 32 48 64 80[CTRL-C to abort]  96 112 128
144 160 176 192 208 224 240 256 272 288 304 320 336 352[CTRL-C to abort]
[C
TRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]
[CTRL-
C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]
[CTRL-C to
 abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]

As I was in X, I thought my system had simply hung without dumping core
(my HDD LED is obviously broken), so I pressed reset.

Next turn:

Mar 24 21:48:59 fump kernel: WARNING: /data was not properly dismounted
Mar 24 21:48:59 fump kernel: /data: mount pending error: blocks 32 files 0
Mar 24 21:48:59 fump kernel: /data: superblock summary recomputed

The system entered multi-user mode; I logged in but did not start
anything except my login shell.

Mar 24 21:53:37 fump syslogd: kernel boot file is /boot/kernel/kernel
Mar 24 21:53:37 fump kernel: dev = ad0s2g, block = 1, fs = /data
Mar 24 21:53:37 fump kernel: panic: ffs_blkfree: freeing free block
Mar 24 21:53:37 fump kernel: 

(it entered the debugger; I typed c to continue)

Mar 24 21:53:37 fump kernel: syncing disks, buffers remaining... panic:
bremfree: removing a buffer not on a queue
Mar 24 21:53:37 fump kernel: Uptime: 2m10s
Mar 24 21:53:37 fump kernel: Dumping 511 MB
Mar 24 21:53:37 fump kernel: ata0: resetting devices ..
Mar 24 21:53:37 fump kernel: done
Mar 24 21:53:37 fump kernel: 16 32 48 64 80 96 112 128 144 160 176 192 

I pressed reset again as it would have dumped the wrong panic
and I wanted to dump the panic that caused the initial panic.
Next boot:

Mar 24 22:22:58 fump kernel: RENT WAS I=2404025
Mar 24 22:22:58 fump savecore: reboot after panic: bremfree: removing a
buffer not on a queue
Mar 24 22:22:58 fump savecore: writing core to vmcore.1

ok, here's the stuff:


Script started on Mon Mar 24 22:50:14 2003
[EMAIL PROTECTED] /data/crash # gdb -k
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd".
(kgdb) exec-file /boot/kernel/kernel
(kgdb) symbol-file /usr/obj/usr/src/sys/ZEROGRAVITY/kernel.debug 
Reading symbols from /usr/obj/usr/src/sys/ZEROGRAVITY/kernel.debug...done.
(kgdb) core-file vmcore.1
panic: bremfree: removing a buffer not on a queue
panic messages:
---
panic: ufs_dirbad: bad dir

syncing disks, buffers remaining... 3810 3810 3810 3809 3809 3809 3806 3807 3807 3807 
3807 3807 3803 3803 3803 3780 3782 3782 3780 3780 3780 3780 3780 3780 3780 3780 3780 
3780 3780 3780 3780 3780 3780 3780 3780 3780 3780 3780 
giving up on 2299 buffers
Uptime: 36m18s
Dumping 511 MB
ata0: resetting devices ..
done
 16 32 48 64 80[CTRL-C to abort]  96 112 128 144 160 176 192 208 224 240 256 272 288 
304 320 336 352[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] 
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to 
abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C 
to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] 
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] Copyright (c) 1992-2003 The 
FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #5: Tue Mar 18 16:04:55 

Re: several background fsck panics

2003-03-24 Thread Terry Lambert
Alexander Langer wrote:
 I had several panics related to background fsck now.  Once I disabled
 background fsck, all went ok.
 
 It began when I pressed the reset buttons on several boots while the
 system was still doing fscks.

Disable write caching on your ATA drive.  You should be able to
safely reset after that.

-- Terry
