On Wednesday 30 August 2006 23:49, Erik Mouw wrote:
> On Fri, Aug 25, 2006 at 08:37:03PM +0200, Francesco Pietra wrote:
> > Sent again: The external scsi HD was not connected when the accident
> > occurred
> >
> >
> > Hi Erik:
> > Thanks for your attention.
> >
> > Main board: Tyan K8WE S2895
> > SATA II controllers nForce Pro 2200.
> > Added graphic card Pixel view 6600 256M PCI.
> > Added SCSI controller LSI PCI for external scsi HD (old IBM for backup).
> > CPU1 and CPU2: Opteron Dual Core 265.
> > ram: 8 x KingstonKVR 400D43a/1GB DDR2 CL3 Ecc Reg.
> > HD: 2 x Maxtor 6V300F0; ATA version 7; ATA standard ATA/ATAPI-7 T13 1532
> > D.
>
> There appear to be problems with Nvidia Nforce chipsets with certain
> Maxtor drives that result in data corruption. From what I could figure
> out it appears to be a problem in the nforce SATA engine that show up
> with certain Maxtor drives, though sometimes also with other brands.
> Maxtor has a firmware update that works around the Nvidia bug, you
> might want to ask their support department.
>
> > OS: debian etch amd6a, kernel 2.6.15-1-amd64-k8-smp, filesystem ext3,
> > grub on boot partition, partitions for proc home tmp usr var swap, raid1
> > software, no Xsystem when the accident occurred.
>
> Make sure that you don't have the proprietary Nvidia kernel module
> loaded for the graphics card. Because it's a proprietary module it's
> not properly reviewed so it might silently corrupt memory.
>
> > #smartctl -a -d ata /dev/sda (or sdb) reported PASSED (run after the
> > accident described above). Unable to see the result of short self test
> > (don't know where it is written, if at all; disks are not in database).
>
> You could try to get smartmontools from debian-unstable and see if it
> has support for your drives.
>
> > While I plan to replace the HD cables as soon as this computation has
> > attained convergence, I wonder whether lack of a power protection unit
> > may have been responsible for the failure of disks.
>
> Possible, though drives usually tend to die completely when they get
> damaged by overvoltage.
>
> > I plan anyway to buy one; only
> > uncertain about the power for this machine and an Athlon k7 pc. 800VA
> > enough? I do not need long energy supply because calculations can be
> > resumed from last HD written result; perhaps one minute energy supply?
>
> APC has a nice product selector on their website, see
> http://www.apc.com/tools/ups_selector/ . There's good support for APC
> UPSes by nut, and knutclient even gives you a nice graphical monitor
> application (and of course nut can be monitored by nagios).
>
> Anyway, back to your problem:
> - Make sure you don't use the proprietary nvidia kernel module
> - Replace the cables
> - If it still persists, check Maxtor support

Thank you for all the extremely useful information.

Last night, while computing with mpqc2.3.1 (no X system loaded, as always), the same problem with the disks occurred. The HD LED was lit continuously. The screen showed:

---SCSI error
---<o> kernel panic - not syncing: Aiee, killing interrupt handler!

All the checks I could carry out, the same as before, showed the disks in order, and my data also in order.

I have now replaced the data cables to both disks with new, unused cables and restarted mpqc. I'll do nothing else today because I have received the two new disks I recently ordered:

---WD 1500ADFD WD Raptor
MDL WD1500ADFD-OONLR1
Date 13 Jun 2006
DCM HBCA2AB
5VDC 0.90A 12VDC 0.75

I plan to replace the Maxtor disks with these. My only concern is ventilation. Although the Enermax CS-721 has 4 fans (in addition to the two on the dual Opterons), there is no direct ventilation to the disks. I wonder whether an aluminum Cooldrive 4002 (with its own small-diameter fan) for each disk gives adequate heat dissipation, or whether a large-diameter external fan directed toward the disks would be better. Your advice?

Next Monday I'll receive a 1500VA UPS.

Thanks again
francesco

>
>
> Erik