Currently running TSM server version 5.1.8 on an AIX 5.2 machine. We have a Qualstar TLS-412600 library with three AIT-2 drives.
We are having continual problems with I/O errors, almost exclusively during tape reclamation. Usually 98-99% of a reclaim completes successfully. However, in perhaps a quarter to a half of our reclaims we see a number of read errors (fewer than 100). In almost all cases, when we then run a 'move data' on the errant tape, the remaining data is read off without any problems.

Errors have been seen on two of the drives over the past month. I'm not sure how often each of the three drives is used, so I can't tell whether the third drive is error-free or just lucky.

What is most frustrating about this problem is that a drive experiencing read errors then hangs. The reclaim process is cancelled once no reads have been logged for the volume for some time. However, the process usually takes between 4 and 12 hours to actually stop, presumably because it is waiting on some I/O timeout. The drive can be seen performing some activity during this time (continual retries, perhaps?). If we can't wait 12 hours for the drives to become available again, the whole AIX box has to be rebooted to clear the situation.

Drives have been replaced following tape jams, but the replacement drives still exhibit the same problems. We have set the drives up with a cleaning frequency of 1000 GB, so they do get cleaned periodically.

I would be interested to hear whether anyone else has experienced the same problems with these drives (assuming anyone else uses AIT drives). I wonder whether the problem is symptomatic of these drives, or whether there are any firmware upgrades that might fix it. How do you find out which firmware version is on a drive?

I would also be very interested in any suggestions for preventing these interminable hangs. Is there anywhere this timeout can be reduced?

Examples of errors logged:

2004-06-24 15:12:14 ANR8302E I/O error on drive DRIVE0 (/dev/mt0) (OP=READ, Error Number=78, CC=205, KEY=FF, ASC=FF, ASCQ=FF, SENSE=**NONE**, Description=SCSI adapter failure). Refer to Appendix D in the 'Messages' manual for recommended action.

Then eventually, when the cancel process completes:

2004-06-24 23:17:49 ANR8302E I/O error on drive DRIVE0 (/dev/mt0) (OP=FSR, Error Number=78, CC=205, KEY=FF, ASC=FF, ASCQ=FF, SENSE=**NONE**, Description=SCSI adapter failure). Refer to Appendix D in the 'Messages' manual for recommended action.

Steven Bridge
Systems Group, Information Systems, EISD
University College London