Re: fsck UNEXPECTED INCONSISTENCY
On Tuesday 02 March 2010 13:35:23 J.C. Roberts wrote: > On Tue, 02 Mar 2010 11:06:46 -0500 and...@msu.edu wrote: > > Quoting "J.C. Roberts" : > > > And I thought I was expected to be inconsistent. ;) > > > > > > Anyhow, I was upgrading from the Feb 2, to the most recent > > > snapshot, and fsck is coming up with a problem on one of my > > > partitions. I can probably get it working ("fix" is such a strong > > > word) with `fsck -fy` but my real concern is if the drive is > > > failing? > > > > > > atactl tells me everything is just fine? > > > > > > I have a nearly identical system, with the same type of disk, which > > > reports similar atactl attributes... but then again, I don't really > > > trust SATA/PATA drives very much or their supposedly "smart" > > > monitoring. > > > > > > The data on the system is not only backed up, but it's also easily > > > replaced since the machine is only used for src and ports builds. I > > > think I might lose a total of a few newly downloaded distfiles > > > since the last backup. > > > > > > What I really want to do here is understand *why* some portion of > > > the disk has become unreadable? > > > > I've seen the smart system report errors and have had them become > > true a few times, but far more often I've seen the damn things report > > "No proble, Boss" and then died a little later... > > > > You could have any number of things going on giving you those read > > errors. With the right test jig from the manufacturer you'd likely > > know. My guess would be an op amp for one of the heads isn't quite > > working right. The other possibility is that part of the disk wasn't > > coated right and you have a weak spot, magnetically speaking. > > > > But you'll never really know. At least you now have a new target, for > > practice with. Sabot rounds are great for little disks... > > Where's Nick and his nail gun when we need him? 
> > I got to work with some people from the "disk industry" and know how > secretive they must be about how stuff actually works due to NDA's. I'd > have better odds as a snowball in hell than getting the needed test > equipment and docs from the vendor. Very likely true, which is why I say that you have an interesting target! --STeve Andre'
Re: fsck UNEXPECTED INCONSISTENCY
On Tue, 02 Mar 2010 11:06:46 -0500 and...@msu.edu wrote: > Quoting "J.C. Roberts" : > > > And I thought I was expected to be inconsistent. ;) > > > > Anyhow, I was upgrading from the Feb 2, to the most recent > > snapshot, and fsck is coming up with a problem on one of my > > partitions. I can probably get it working ("fix" is such a strong > > word) with `fsck -fy` but my real concern is if the drive is > > failing? > > > > atactl tells me everything is just fine? > > > > I have a nearly identical system, with the same type of disk, which > > reports similar atactl attributes... but then again, I don't really > > trust SATA/PATA drives very much or their supposedly "smart" > > monitoring. > > > > The data on the system is not only backed up, but it's also easily > > replaced since the machine is only used for src and ports builds. I > > think I might lose a total of a few newly downloaded distfiles > > since the last backup. > > > > What I really want to do here is understand *why* some portion of > > the disk has become unreadable? > > I've seen the smart system report errors and have had them become > true a few times, but far more often I've seen the damn things report > "No proble, Boss" and then died a little later... > > You could have any number of things going on giving you those read > errors. With the right test jig from the manufacturer you'd likely > know. My guess would be an op amp for one of the heads isn't quite > working right. The other possibility is that part of the disk wasn't > coated right and you have a weak spot, magnetically speaking. > > But you'll never really know. At least you now have a new target, for > practice with. Sabot rounds are great for little disks... > Where's Nick and his nail gun when we need him? I got to work with some people from the "disk industry" and know how secretive they must be about how stuff actually works due to NDA's. 
I'd have better odds as a snowball in hell than getting the needed test equipment and docs from the vendor. --
Re: fsck UNEXPECTED INCONSISTENCY
On Tue, Mar 2, 2010 at 8:06 AM, wrote: ... > I've seen the smart system report errors and have had them become > true a few times, but far more often I've seen the damn things report > "No proble, Boss" and then died a little later... I seem to recall a USENIX paper from Google (perhaps for the FAST conference?) in which they analyzed the failure statistics for their server farms against things like the SMART stats and a bunch of other stats they collected. IIRC, they found SMART error reports to be useful in predicting failure and that while there were some correlations in other stats, the false positive rates would make using them uneconomical. Or something like that. If you're concerned about this disk failure then you should hunt up and read the actual paper...and continue your research beyond that. Philip Guenther
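For anyone following along, here is a rough sketch of what the "No SMART threshold exceeded" answer in this thread amounts to, and why it can coexist with unreadable sectors. It assumes the usual SMART convention that an attribute only counts as failed when its normalized value drops to or below its threshold. The three rows are copied from the `atactl wd0 readattr` output quoted in this thread; the function names and the choice of which raw counters to watch (IDs 5 and 197) are assumptions made for illustration, and raw values are vendor-specific.

```python
# Rows copied from the readattr output in this thread:
# (id, attribute name, threshold, normalized value, raw value)
ATTRS = [
    (3, "Spin Up Time", 63, 180, 0x46F2),
    (5, "Reallocated Sector Count", 63, 253, 0x0007),
    (197, "Current Pending Sector Count", 0, 253, 0x0001),
]

# Assumption: these two raw counters are the ones worth eyeballing
# when a disk starts returning read errors.
WATCH_RAW = {5, 197}


def threshold_failures(attrs):
    """Names of attributes whose normalized value is at or below threshold."""
    return [name for _id, name, thr, val, _raw in attrs if val <= thr]


def nonzero_watched(attrs, watch=WATCH_RAW):
    """Raw counters from the watch list that are not zero."""
    return {
        name: raw
        for attr_id, name, _thr, _val, raw in attrs
        if attr_id in watch and raw != 0
    }


if __name__ == "__main__":
    print("threshold failures:", threshold_failures(ATTRS))
    print("non-zero raw counters:", nonzero_watched(ATTRS))
```

With the numbers posted here, no attribute trips its threshold, yet both watched raw counters are non-zero. That would be consistent with a drive that reports itself healthy while still having a sector it could not read, though how much weight to give the raw counters is a judgment call.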
Re: fsck UNEXPECTED INCONSISTENCY
On Tue, 02 Mar 2010 11:27:50 -0500 "Brad Tilley" wrote: > > What I really want to do here is understand *why* some portion of > > the disk has become unreadable? > > > cd /bad_partition && dd if=/dev/zero of=big_file.zero bs=512 > conv=sync,noerror > > Let it run until it finishes. That won't explain why the sectors are > bad, but it may give a good indication of the problem area and answer > the failing drive question. If dd reports IO issues, you may want to > replace the drive. > > Brad Thanks Brad. If it was an unnecessary partition, I'd do a destructive overwrite to see what it does. Unfortunately, it's /usr/. I'm going to toss a new disk in the box, and do a fresh install on the new disk, so I can reliably play with the old one. --
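A read-only pass is one way to look for bad areas without overwriting anything on a partition like /usr. The following is only a sketch of that idea, not something anyone in this thread ran: it reads a file or device block by block and records the offsets where the read fails. The function name, the block size, and the `max_errors` cutoff are all made up for illustration.

```python
import os


def scan_for_read_errors(path, block_size=64 * 1024, max_errors=100):
    """Read `path` sequentially; return (bytes_read_ok, failing_offsets).

    Nothing is written. A block that raises an I/O error is skipped and
    its starting offset is recorded. The scan gives up after `max_errors`
    failures (a hypothetical safety limit so a dead device cannot make
    the loop run forever).
    """
    good_bytes = 0
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while len(bad_offsets) < max_errors:
            try:
                data = os.pread(fd, block_size, offset)
            except OSError:
                bad_offsets.append(offset)
                offset += block_size  # skip the unreadable block
                continue
            if not data:  # end of file or device
                break
            good_bytes += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return good_bytes, bad_offsets


if __name__ == "__main__":
    import sys

    ok, bad = scan_for_read_errors(sys.argv[1])
    print(f"{ok} bytes read cleanly, {len(bad)} unreadable block(s)")
    for off in bad:
        print(f"  read error at byte offset {off}")
```

Pointing it at the raw whole-disk device (assumed here to be something like `/dev/rwd0c` for `wd0`) as root would cover the entire drive. Like the dd approach, it says where the reads fail, not why.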
Re: fsck UNEXPECTED INCONSISTENCY
Quoting "J.C. Roberts" : > And I thought I was expected to be inconsistent. ;) > > Anyhow, I was upgrading from the Feb 2, to the most recent snapshot, and > fsck is coming up with a problem on one of my partitions. I can probably > get it working ("fix" is such a strong word) with `fsck -fy` but my real > concern is if the drive is failing? > > atactl tells me everything is just fine? > > I have a nearly identical system, with the same type of disk, which > reports similar atactl attributes... but then again, I don't really trust > SATA/PATA drives very much or their supposedly "smart" monitoring. > > The data on the system is not only backed up, but it's also easily > replaced since the machine is only used for src and ports builds. I think > I might lose a total of a few newly downloaded distfiles since the last > backup. > > What I really want to do here is understand *why* some portion of the > disk has become unreadable? I've seen the smart system report errors and have had them become true a few times, but far more often I've seen the damn things report "No problem, Boss" and then die a little later... You could have any number of things going on giving you those read errors. With the right test jig from the manufacturer you'd likely know. My guess would be an op amp for one of the heads isn't quite working right. The other possibility is that part of the disk wasn't coated right and you have a weak spot, magnetically speaking. But you'll never really know. At least you now have a new target for practice. Sabot rounds are great for little disks... --STeve Andre'
Re: fsck UNEXPECTED INCONSISTENCY
On Tue, 02 Mar 2010 07:50 -0800, "J.C. Roberts" wrote: > And I thought I was expected to be inconsistent. ;) > > Anyhow, I was upgrading from the Feb 2, to the most recent snapshot, and > fsck is coming up with a problem on one of my partitions. I can probably > get it working ("fix" is such a strong word) with `fsck -fy` but my real > concern is if the drive is failing? > > atactl tells me everything is just fine? > > I have a nearly identical system, with the same type of disk, which > reports similar atactl attributes... but then again, I don't really trust > SATA/PATA drives very much or their supposedly "smart" monitoring. > > The data on the system is not only backed up, but it's also easily > replaced since the machine is only used for src and ports builds. I think > I might lose a total of a few newly downloaded distfiles since the last > backup. > > What I really want to do here is understand *why* some portion of the > disk has become unreadable? cd /bad_partition && dd if=/dev/zero of=big_file.zero bs=512 conv=sync,noerror Let it run until it finishes. That won't explain why the sectors are bad, but it may give a good indication of the problem area and answer the failing drive question. If dd reports IO issues, you may want to replace the drive. Brad > All of the below were done in single user mode over serial. 
> (sorry about the width)
>
> # atactl wd0 smartenable
> # atactl wd0 readattr
> Attributes table revision: 16
> ID   Attribute name                    Threshold  Value  Raw
>   3  Spin Up Time                             63    180  0x46f2
>   4  Start/Stop Count                          0    253  0x00d2
>   5  Reallocated Sector Count                 63    253  0x0007
>   6  Read Channel Margin                     100    253  0x
>   7  Seek Error Rate                           0    253  0x
>   8  Seek Time Performance                   187    253  0x9edb
>   9  Power-On Hours Count                      0    235  0xee5c
>  10  Spin Retry Count                        157    253  0x
>  11  Calibration Retry Count                 223    253  0x
>  12  Device Power Cycle Count                  0    253  0x00f0
> 192  Power-Off Retract Count                   0    253  0x
> 193  Load Cycle Count                          0    253  0x
> 194  Temperature                               0    253  0x000f
> 195  Hardware ECC Recovered                    0    253  0x170d
> 196  Reallocation Event Count                  0    253  0x
> 197  Current Pending Sector Count              0    253  0x0001
> 198  Off-Line Scan Uncorrectable Sect          0    253  0x
> 199  Ultra DMA CRC Error Count                 0    199  0x
> 200  Write Error Rate                          0    253  0x
> 201  Soft Read Error Rate                      0    253  0x
> 202  Data Address Mark Errors                  0    253  0x
> 203  Run Out Cancel                          180    253  0x0001
> 204  Soft ECC Correction                       0    253  0x
> 205  Thermal Asperity Check                    0    253  0x
> 207  Spin High Current                         0    253  0x
> 208  Spin Buzz                                 0    253  0x
> 209  Offline Seek Performance                  0    253  0x
>  99  Unknown                                   0    253  0x
> 100  Unknown                                   0    253  0x
> 101  Unknown                                   0    253  0x
> #
>
> # atactl wd0 smartstatus
> No SMART threshold exceeded
> #
>
> # atactl wd0 identify
> Model: 6Y250L6, Rev: YAR41BW0, Serial #:
> Device type: ATA, fixed
> Cylinders: 16383, heads: 16, sec/track: 63, total sectors: 490234752
> Device capabilities:
>     ATA standby timer values
>     IORDY operation
>     IORDY disabling
> Device supports the following standards:
>     ATA-1 ATA-2 ATA-3 ATA-4 ATA-5 ATA-6 ATA-7
> Device supports the following command sets:
>     NOP command
>     READ BUFFER command
>     WRITE BUFFER command
>     Host Protected Area feature set
>     Read look-ahead
>     Write cache
>     Power Management feature set
>     SMART feature set
>     Flush Cache Ext command
>     Flush Cache command
>     Device Configuration Overlay feature set
>     48bit address feature set
>     Automatic Acoustic Management feature set
>     Set Max security extension commands
>     Advanced Power Management feature set
>     DOWNLOAD MICROCODE command
>     SMART self-t