Re: I've just had a massive file system crash
On Sun, 2003-01-26 at 18:38, Greg Lehey wrote:
> Did you use shutdown -p? If my hypothesis is correct, it's possible to get this result with shutdown -h if you press the power switch as soon as the System halted message appears, but normally you'd give it a few seconds longer. With shutdown -p, it's immediate, modulo delay.

Not certain if I did, but it's likely.

--
Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au
The nice thing about standards is that there are so many of them to choose from. -- Andrew Tanenbaum
GPG Fingerprint - 9A8C 569F 685A D928 5140 AE4B 319B 41F4 5D17 FDD5

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: I've just had a massive file system crash
On Sun, Jan 26, 2003 at 04:08:31PM +0800, Greg Lehey wrote:
> On Sunday, 26 January 2003 at 14:24:02 +1030, Daniel O'Connor wrote:
>> On Sun, 2003-01-26 at 08:08, David Schultz wrote:
>>> Good. I was referring to IDE in this case, because I assume that's what Greg's laptop uses. The ATA driver flushes the cache when the device is closed, but I don't think that happens during shutdown. It probably needs to register a shutdown hook like the SCSI driver. Also, the driver is a bit optimistic about how long the flush will take; it times out after 5 seconds, whereas the ATA spec says a flush can take up to 30 seconds.
>>
>> I am wondering if I experienced this problem with my -stable laptop.. I shut it down and then booted it up later to find fsck having a nice good chew on the drive (deleting REAMS of files).
>
> Did you use shutdown -p? If my hypothesis is correct, it's possible to get this result with shutdown -h if you press the power switch as soon as the System halted message appears, but normally you'd give it a few seconds longer. With shutdown -p, it's immediate, modulo delay.

Just a random idea: If that poses an issue, how about this patch?

Eugene

--- src/sys/kern_shutdown.c	Sun Jan 26 14:24:56 2003
+++ src/sys/kern_shutdown.c.new	Sun Jan 26 14:25:42 2003
@@ -545,7 +545,7 @@
 static void
 poweroff_wait(void *junk, int howto)
 {
-	if(!(howto & RB_POWEROFF) || poweroff_delay <= 0)
+	if(!(howto & (RB_POWEROFF | RB_HALT)) || poweroff_delay <= 0)
 		return;
 	DELAY(poweroff_delay * 1000);
 }
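
For anyone reading the patch above outside a kernel tree, here is a minimal user-space sketch of the condition it changes. The function names `waits_before_patch` and `waits_after_patch` are made up for illustration, and the flag values are stand-ins so the snippet compiles on its own; it only models the test in poweroff_wait(), not the actual delay.

```c
#include <stdbool.h>

/*
 * Stand-in flag values so the sketch builds outside the kernel. They are
 * assumptions for illustration; the real definitions live in <sys/reboot.h>.
 */
#define RB_HALT     0x0008
#define RB_POWEROFF 0x4000

/* Unpatched condition: wait only when powering off and the delay is positive. */
bool
waits_before_patch(int howto, int poweroff_delay)
{
	return (howto & RB_POWEROFF) != 0 && poweroff_delay > 0;
}

/* Patched condition: a plain halt (shutdown -h) now waits as well. */
bool
waits_after_patch(int howto, int poweroff_delay)
{
	return (howto & (RB_POWEROFF | RB_HALT)) != 0 && poweroff_delay > 0;
}
```

With a positive delay, a halt request is the only case where the two versions disagree: the unpatched test returns immediately, while the patched one would go on to call DELAY().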
Re: I've just had a massive file system crash
Thus spake Greg Lehey [EMAIL PROTECTED]:
> I've been thinking about what happened, and I have a possibility: the session before shutdown included a lot of writing to that file system, and I did a shutdown -p. It's possible that the shutdown powered off the system before the disk had flushed its cache. For the moment I'm avoiding shutdown -p, but when I get home I'll try to provoke it again.

Just a heads up: Soeren tells me he will commit a fix for this in his next ATA meta-commit. I have patches if wanted.

I still can't figure out why the problem would trash your entire home directory, though. Even if the disk reordered writes and failed to write some sectors, directory entries that were not being actively modified shouldn't have become corrupted, as far as I know. (Maybe your disk does track-at-once writes and just happened to be flushing the last few sectors from its cache when the power was cut.) Perhaps someone could ask Kirk, although it may take an actual hosed filesystem to diagnose what happened.
Re: I've just had a massive file system crash
David Schultz wrote:
> I still can't figure out why the problem would trash your entire home directory, though. Even if the disk reordered writes and failed to write some sectors, directory entries that were not being actively modified shouldn't have become corrupted, as far as I know.

Something similar happened to me in 4-STABLE several months ago. After a panic/crash (caused by an unstable USB audio driver) the automatic fsck failed. This happened twice; the second time my filesystem was totally messed up, and after fsck did its thing, several files were missing, including files in /usr/bin and /usr/sbin that had not been touched for many weeks (i.e., since the last installworld). The damage wasn't as extensive as Greg reports, and my home directory was spared, but I had to reinstall the base system to get things working smoothly again.

I then turned off write caching on the IDE drive. Afterwards I had several such crashes (caused by the same driver) but never again had filesystem damage -- automatic fsck always worked.

Nevertheless, as you say, it's strange that files which had not been touched went missing.

- Rahul
Re: I've just had a massive file system crash
On Sunday, 26 January 2003 at 14:24:02 +1030, Daniel O'Connor wrote:
> On Sun, 2003-01-26 at 08:08, David Schultz wrote:
>> Good. I was referring to IDE in this case, because I assume that's what Greg's laptop uses. The ATA driver flushes the cache when the device is closed, but I don't think that happens during shutdown. It probably needs to register a shutdown hook like the SCSI driver. Also, the driver is a bit optimistic about how long the flush will take; it times out after 5 seconds, whereas the ATA spec says a flush can take up to 30 seconds.
>
> I am wondering if I experienced this problem with my -stable laptop.. I shut it down and then booted it up later to find fsck having a nice good chew on the drive (deleting REAMS of files).

Did you use shutdown -p? If my hypothesis is correct, it's possible to get this result with shutdown -h if you press the power switch as soon as the System halted message appears, but normally you'd give it a few seconds longer. With shutdown -p, it's immediate, modulo delay.

Greg
--
See complete headers for address and phone numbers
Re: I've just had a massive file system crash
On Fri, Jan 24, 2003 at 11:03:52PM -0800, David Schultz wrote:
> FreeBSD's ``fix'' for this problem is the same as Windows 98's. Specifically, there is a 5-second delay (tuneable: kern.shutdown.poweroff_delay) after all buffers are flushed but before the power is cut. Maybe we ought to be sending FLUSH CACHE commands to all drives and waiting for them to finish.

I've heard it is longer than 5 seconds on more recent systems like 2000 or XP. I even heard one claim that some shops were using 30 seconds internally.

-- Brooks

--
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4
Re: I've just had a massive file system crash
On Fri, 24 Jan 2003, David Schultz wrote:
> Thus spake Greg Lehey [EMAIL PROTECTED]:
>> I've been thinking about what happened, and I have a possibility: the session before shutdown included a lot of writing to that file system, and I did a shutdown -p. It's possible that the shutdown powered off the system before the disk had flushed its cache. For the moment I'm avoiding shutdown -p, but when I get home I'll try to provoke it again.
>
> FreeBSD's ``fix'' for this problem is the same as Windows 98's. Specifically, there is a 5-second delay (tuneable: kern.shutdown.poweroff_delay) after all buffers are flushed but before the power is cut. Maybe we ought to be sending FLUSH CACHE commands to all drives and waiting for them to finish.

da(4) does a SYNC CACHE (see daclose() and dashutdown()).

-Nate
Re: I've just had a massive file system crash
Thus spake Nate Lawson [EMAIL PROTECTED]:
> On Fri, 24 Jan 2003, David Schultz wrote:
>> Thus spake Greg Lehey [EMAIL PROTECTED]:
>>> I've been thinking about what happened, and I have a possibility: the session before shutdown included a lot of writing to that file system, and I did a shutdown -p. It's possible that the shutdown powered off the system before the disk had flushed its cache. For the moment I'm avoiding shutdown -p, but when I get home I'll try to provoke it again.
>>
>> FreeBSD's ``fix'' for this problem is the same as Windows 98's. Specifically, there is a 5-second delay (tuneable: kern.shutdown.poweroff_delay) after all buffers are flushed but before the power is cut. Maybe we ought to be sending FLUSH CACHE commands to all drives and waiting for them to finish.
>
> da(4) does a SYNC CACHE (see daclose() and dashutdown()).

Good. I was referring to IDE in this case, because I assume that's what Greg's laptop uses. The ATA driver flushes the cache when the device is closed, but I don't think that happens during shutdown. It probably needs to register a shutdown hook like the SCSI driver. Also, the driver is a bit optimistic about how long the flush will take; it times out after 5 seconds, whereas the ATA spec says a flush can take up to 30 seconds.
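
To make the timeout point above concrete, here is a rough user-space sketch of "issue a flush, then wait for it with a deadline". Nothing in it is the ATA driver's code: `wait_for_flush`, `struct fake_disk` and `fake_disk_done` are hypothetical names, the disk is simulated, and time is modelled by counting polling steps rather than by sleeping.

```c
#include <stdbool.h>

/* Hypothetical callback: returns true once the device's write cache is empty. */
typedef bool (*flush_done_fn)(void *dev);

/*
 * Poll the device every step_ms (simulated) until the flush completes or
 * timeout_ms has passed. Returns true if the flush finished in time.
 */
bool
wait_for_flush(flush_done_fn done, void *dev, int timeout_ms, int step_ms)
{
	if (step_ms <= 0)
		return false;	/* refuse a polling interval that never advances */
	for (int waited = 0; waited <= timeout_ms; waited += step_ms) {
		if (done(dev))
			return true;
		/* a real driver would sleep or DELAY() here before polling again */
	}
	return false;
}

/* Simulated disk that reports "still flushing" for a fixed number of polls. */
struct fake_disk {
	int polls_until_done;
};

bool
fake_disk_done(void *dev)
{
	struct fake_disk *d = dev;

	if (d->polls_until_done > 0) {
		d->polls_until_done--;
		return false;
	}
	return true;
}
```

A simulated disk that needs about 12 seconds to drain its cache is abandoned by a 5-second wait but completes under a 30-second one, which is the gap between the driver's timeout and the figure quoted from the ATA spec.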
Re: I've just had a massive file system crash
On Sun, 2003-01-26 at 08:08, David Schultz wrote:
> Good. I was referring to IDE in this case, because I assume that's what Greg's laptop uses. The ATA driver flushes the cache when the device is closed, but I don't think that happens during shutdown. It probably needs to register a shutdown hook like the SCSI driver. Also, the driver is a bit optimistic about how long the flush will take; it times out after 5 seconds, whereas the ATA spec says a flush can take up to 30 seconds.

I am wondering if I experienced this problem with my -stable laptop.. I shut it down and then booted it up later to find fsck having a nice good chew on the drive (deleting REAMS of files).

I stopped it and then ripped it out of the lappy and mounted it read only to recover most of my files.

Lots of things in /etc got toasted, and it was rather annoying to recover from :(

--
Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au
The nice thing about standards is that there are so many of them to choose from. -- Andrew Tanenbaum
GPG Fingerprint - 9A8C 569F 685A D928 5140 AE4B 319B 41F4 5D17 FDD5
I've just had a massive file system crash
I'm rather astounded. I'm currently at a Linux conference, and have of course been boasting about the stability of ufs, and today I had a crash which tore apart my /home file system.

This is on a laptop, one which has been running -CURRENT for years with no trouble. At the moment it's running 5.0-RELEASE. Today I shut it down cleanly, and a couple of hours later rebooted it. It has three file systems, one of which came up dirty. fsck -y reported thousands of errors, and when it was finished, my home directory and some other files were gone, and all the subdirectories of my home directory were in lost+found, a total of 1.4 GB. Most of the errors appear to be duplicate Inode numbers.

Obviously it's too late to work out what happened, but I thought it's worth mentioning in case somebody else is having the same trouble.

Greg
--
See complete headers for address and phone numbers
Re: I've just had a massive file system crash
Greg Lehey [EMAIL PROTECTED] wrote:
> It has three file systems, one of which came up dirty. fsck -y reported thousands of errors, and when it was finished, my home directory and some other files were gone, and all the subdirectories of my home directory were in lost+found, a total of 1.4 GB. Most of the errors appear to be duplicate Inode numbers.

Don't be too hasty to blame UFS. Every time this has happened to me (even on Linux) it has been because the disk drive was failing.

It has happened to me *many* times with IDE drives. I wind up replacing about 1/4 of them every year, on average. But, I did go through a run of those bad IBM drives :-)

Did you happen to drop the laptop? :-)

- Dave Rivers -

--
[EMAIL PROTECTED] Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com
Re: I've just had a massive file system crash
Next time you run fsck -y in this scenario, log the output to an md partition and stick it somewhere for analysis. At least, that was the moral of the story last time I hosed a box in this form (incidentally, I think it ended up being a failing hard disk).

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED] Network Associates Laboratories

On Fri, 24 Jan 2003, Greg Lehey wrote:
> I'm rather astounded. I'm currently at a Linux conference, and have of course been boasting about the stability of ufs, and today I had a crash which tore apart my /home file system.
>
> This is on a laptop, one which has been running -CURRENT for years with no trouble. At the moment it's running 5.0-RELEASE. Today I shut it down cleanly, and a couple of hours later rebooted it. It has three file systems, one of which came up dirty. fsck -y reported thousands of errors, and when it was finished, my home directory and some other files were gone, and all the subdirectories of my home directory were in lost+found, a total of 1.4 GB. Most of the errors appear to be duplicate Inode numbers.
>
> Obviously it's too late to work out what happened, but I thought it's worth mentioning in case somebody else is having the same trouble.
>
> Greg
> --
> See complete headers for address and phone numbers
RE: I've just had a massive file system crash
> three file systems, one of which came up dirty. fsck -y reported thousands of errors, and when it was finished, my home directory and some other files were gone, and all the subdirectories of my home

This may (or may not) have anything to do with it, but I had a problem with a couple of filesystems back in September that had the error (running on RELENG_4 that was very recent at the time):

CG 22: BAD MAGIC NUMBER

fsck -y gave thousands of errors (similar to what you had) and when it was done, nothing was on the filesystem. (I didn't think to check lost+found at the time, just restored the filesystem, so the files may have been placed in there.)

During the space of 2 days, I had a total of 3 of these on two different systems. Forcing a mount (without cleaning) on the other two showed a perfect filesystem (which I backed up, newfs'd and restored). I even compared one of these with a backup and there wasn't a single thing different.

It sort of baffled me at the time, since one of those filesystems didn't have any writing (other than atime perhaps) and still had the error.

I haven't had a problem since then, and I know there are quite a few changes between 4 and 5, but it really does sound similar. At least the fsck part sounds almost exactly the same.

Jaime Bozza
Re: I've just had a massive file system crash
On Friday, 24 January 2003 at 20:34:24 +1000, Andy Farkas wrote:
>> I'm rather astounded. I'm currently at a Linux conference, and have of course been boasting about the stability of ufs, and today I had a crash which tore apart my /home file system.
>>
>> This is on a laptop, one which has been running -CURRENT for years with no trouble. At the moment it's running 5.0-RELEASE. Today I shut it down cleanly, and a couple of hours later rebooted it. It has three file systems, one of which came up dirty. fsck -y reported thousands of errors, and when it was finished, my home directory and some other files were gone, and all the subdirectories of my home directory were in lost+found, a total of 1.4 GB. Most of the errors appear to be duplicate Inode numbers.
>>
>> Obviously it's too late to work out what happened, but I thought it's worth mentioning in case somebody else is having the same trouble.
>
> I can only think that your disk is going bad.

That was one of my thoughts too.

> Try a dd if=/dev/ad0 of=/dev/null and see if you get any read errors.

Nope, runs fine. It also doesn't explain why it happened at startup time.

On Friday, 24 January 2003 at 6:53:41 -0500, Thomas David Rivers wrote:
> Don't be too hasty to blame UFS.

I'm not. I've just reported what happened, in case others see it.

On Friday, 24 January 2003 at 11:06:26 -0500, Robert Watson wrote:
> Next time you run fsck -y in this scenario, log the output to an md partition and stick it somewhere for analysis. At least, that was the moral of the story last time I hosed a box in this form (incidentally, I think it ended up being a failing hard disk).

Yes, if you know it's going to happen. I could easily have written it to /var/tmp, which was mounted. I just wasn't expecting anything like this to happen. I've been using UFS on a daily basis for over 10 years, and this is the first time this has happened to me.

I've been thinking about what happened, and I have a possibility: the session before shutdown included a lot of writing to that file system, and I did a shutdown -p. It's possible that the shutdown powered off the system before the disk had flushed its cache. For the moment I'm avoiding shutdown -p, but when I get home I'll try to provoke it again.

Greg
--
See complete headers for address and phone numbers
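
The dd read test suggested above can be approximated in a few lines of C, for anyone who wants to see what it amounts to. This is only a sketch: `read_whole_device` is a made-up name, the 64 KB buffer size is an arbitrary choice, and like plain dd (without conv=noerror) it gives up at the first failed read instead of skipping the bad block.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Read a device or file from start to end and discard the data.
 * Returns 0 if every read succeeded, 1 if a read failed,
 * and -1 if the path could not be opened at all.
 */
int
read_whole_device(const char *path)
{
	static char buf[65536];
	int result = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	for (;;) {
		ssize_t n = read(fd, buf, sizeof buf);

		if (n == 0)		/* end of device or file */
			break;
		if (n < 0) {		/* read error: stop, as plain dd does */
			result = 1;
			break;
		}
	}
	close(fd);
	return result;
}
```

A clean pass, as in the "runs fine" result reported above, only shows that every sector was readable at the time of the scan; it does not say whether the data in those sectors is what the file system expected.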
Re: I've just had a massive file system crash
Thus spake Greg Lehey [EMAIL PROTECTED]:
> I've been thinking about what happened, and I have a possibility: the session before shutdown included a lot of writing to that file system, and I did a shutdown -p. It's possible that the shutdown powered off the system before the disk had flushed its cache. For the moment I'm avoiding shutdown -p, but when I get home I'll try to provoke it again.

FreeBSD's ``fix'' for this problem is the same as Windows 98's. Specifically, there is a 5-second delay (tuneable: kern.shutdown.poweroff_delay) after all buffers are flushed but before the power is cut. Maybe we ought to be sending FLUSH CACHE commands to all drives and waiting for them to finish.