On 2013-03-21 01:21, Phil Kennedy wrote:
> Previous admin had us using monitoring with Zenoss (with SNMP hardware
> monitoring.) I'm in the process of moving us to Nagios with a much, MUCH
> deeper level of monitoring. I've inherited mess that I was led to
> believe was a finely tuned machine. Somehow I doubt my experiences are
> unique....
Been there, seen that. I am actually in the same position, working my way through lots of "interestingly" configured machines and legacy stuff (to put it nicely).

But blaming the data loss on the software RAID, and not on the missing monitoring and care, is just plain wrong. As others noted, with a hardware RAID you trade a "proven" name for far fewer monitoring options. Reading /proc/mdstat or the output of "mdadm -D <raid>" works as soon as you have an ssh connection, and no monitoring solution can claim to be usable without checking /proc/mdstat. For hardware RAIDs, on the other hand, you get a different tool for each brand (and possibly each series), a different way of alerting, and then incompatibilities between controllers, hard disks and firmware revisions.

Once you have lost data because a hardware controller only notified you with an LED on the back before failing completely, and the newer controller releases did not understand the old revision's on-disk format, you will be very glad that you can simply plug your software RAID's disks into any other Linux machine and access the data. If you use the old metadata format (0.90, which is stored at the end of the device), you don't even need software RAID support, at least for a mirror: you can simply mount the partitions to restore your data.

Your story reminds me of one of my first all-nighters here, with a customer's server. The former admin had set up the customer's new storage server just fine. But then we wondered why only one of the three disks was used for the RAID. Mainly because he "didn't yet find the time to sync". But also because he had done the setup on a broken disk with read errors (in currently unused space), so syncing the complete disk to the other RAID members would freeze the system and abort the sync... Something I noticed just seconds after I started the sync via ssh...

Anyway, stay calm. If in doubt, stop all automatic backups on the machine concerned (there is a config option for that), and don't blame human deficits on the software or hardware in use.
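For what it's worth, the mdstat check I mean fits in a few lines. The following is only a minimal sketch in Python, not a finished plugin: the function name degraded_arrays and the SAMPLE text are made up for illustration, and the exit codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN). It only looks at the "[UU]" / "[U_]" member map and ignores things like an ongoing resync.

```python
import re
import sys

# Array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\S*)\s*:\s*(\w+)")
# Member status on the following line, e.g. "[2/1] [U_]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose member map contains a '_'."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            # '_' marks a missing or failed member, 'U' a healthy one
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


# Made-up sample in the usual /proc/mdstat layout (illustration only)
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0]
      104320 blocks [3/1] [U__]

unused devices: <none>
"""

if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            text = f.read()
    except OSError as exc:
        print("UNKNOWN: cannot read /proc/mdstat: %s" % exc)
        sys.exit(3)
    bad = degraded_arrays(text)
    if bad:
        print("CRITICAL: degraded arrays: %s" % ", ".join(bad))
        sys.exit(2)
    print("OK: all md arrays have all members")
    sys.exit(0)
```

Run from cron or as a Nagios check, something like this would have flagged a one-out-of-three array like the one in my story on day one.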
Good luck,
Arnold

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/