Hello there,

I recently noticed that one of our servers was horribly slow during an aptitude upgrade, to the point that installing a simple MySQL update took about a minute, even though this server is in no way overloaded: about twenty mail accounts, not heavily used, an intranet with its MySQL daemon, a front web page, and under 1 Mbps of traffic on the network interface. I forced a check of the mdadm arrays: they are all clean. I checked the SMART attributes and got a full-scale reading for the read error rate on one of the two disks, in fact a value of 2^16, so I assumed this was a positive integer counter that had reached its maximum. However, when I tried to investigate, this value disappeared and dropped back to zero. That is already a problem because, as far as I know, this value can only increase, never decrease. Am I right to suspect a faulty hard disk?
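To confirm whether the read error rate's raw value really goes down, one option is to save the output of smartctl -A (from smartmontools) at two points in time and compare them. The Python sketch below is a minimal illustration, assuming the usual ATA attribute table layout that smartctl prints; the function names and both sample snapshots are hypothetical, and only the 65536 (2^16) figure comes from the observation above.

```python
"""Compare two saved `smartctl -A` outputs and report raw values that decreased.

Sketch only: assumes the usual ATA attribute table printed by smartctl.
The snapshots below are hypothetical; only the 65536 (2**16) figure comes
from the observation described in the message.
"""
from typing import Dict, Tuple


def parse_raw_values(smartctl_output: str) -> Dict[str, int]:
    """Map attribute name -> integer RAW_VALUE (first token of that column)."""
    raw: Dict[str, int] = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this skips the header line and any other text.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        token = fields[9]  # RAW_VALUE may be followed by extra text such as "(Min/Max ...)"
        if token.isdigit():
            raw[fields[1]] = int(token)
    return raw


def decreased_counters(before: str, after: str) -> Dict[str, Tuple[int, int]]:
    """Return {attribute: (old, new)} for every raw value that went down."""
    old, new = parse_raw_values(before), parse_raw_values(after)
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if new[name] < old[name]
    }


# Hypothetical snapshots, e.g. saved from `smartctl -A /dev/sdX` a few minutes apart.
SNAPSHOT_BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   006    Pre-fail  Always       -       65536
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
"""

SNAPSHOT_AFTER = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   006    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
"""

if __name__ == "__main__":
    for name, (old, new) in decreased_counters(SNAPSHOT_BEFORE, SNAPSHOT_AFTER).items():
        print(f"{name}: raw value dropped from {old} to {new}")
```

This only detects that a raw value dropped between two snapshots; it does not tell what the drive firmware means by that raw value.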
Besides that, I listed the parameters of all the filesystems (all of them ext3) with tune2fs, and then I saw strange values for the last mount and last write dates: every filesystem reports these dates as a few weeks ago, when I last restarted the server (cleanly, I mean). Worse, the / filesystem says it has not been written to since 7 December 2013. That is more than seven months ago! I should make clear that this server's clock is NTP-synchronised; I just checked it and the date and time are correct. In addition, the inode counts are fine, with plenty of free inodes, and the filesystems are not even 10% full. In fact, I see nothing else wrong with these filesystems. Apart from the strange change in the SMART attribute values and the inconsistent last write dates, virtually nothing is wrong with these disks.

I noticed that our other servers also show a last write date a few weeks ago, so I assume those values are consistent. But seven months ago, with at least one reboot since then and the server otherwise running continuously since December? I cannot think of a logical reason for such a long period. Do you know whether this long period is consistent? If so, why? If not, does it mean that a hard disk needs to be replaced, namely the one whose SMART attribute values are so erratic? By the way, how can such a value drop from a full-scale reading to zero in a matter of minutes (during an extended self-test, I should add)?

I was considering running the sync command to force the disk caches to be flushed. However, the filesystems are so slow that removing a single file with rm took around a minute and slowed I/O to the point that half the CPU cores were in I/O wait, so I am not sure launching a sync is a good idea. In fact, would a sync even be effective? fsck, maybe? What else could be effective?

Thank you in advance for your answers.

Regards.
PS: Of course, I will provide any additional information you need, as long as it is not critical to our server's security.

--
David Guyot
System, network and telecommunications administrator / Sysadmin
Europe Camions Interactive / Stockway
Moulin Collot
F-88500 Ambacourt
Tel: +33 (0)3 29 30 47 85
Fax: +33 (0)3 29 31 31 31