Re: Problems with Tape Backups - Continued
Thanks for the comments. I checked the system logs, and we are not using an Adaptec SCSI controller. It is a Fusion MPT (integrated on the Intel motherboard), so I guess that rules out the aic79xx driver as the problem, doesn't it?

The other comment was that our kernel, 2.4.21-37.EL.XFSsmp, is too old for SCSI tape drives and that I need at least 2.4.24. I do not see any SCSI errors in any of the logs. Should I see errors in a log? I only have problems when the number of files gets over some unknown limit.

We've got 3 systems here, all running the same version of SL and all with the same tape drive (the 3rd one has an Adaptec controller). I haven't seen any problems on the 3rd system, but I haven't tried pushing up the number of files on it to test that.

The problem with upgrading to a newer kernel is that SL doesn't have an XFS-capable kernel newer than the one I have installed, at least not for the SL 3.0.x series. Any ideas? Thanks for the help.
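For what it's worth, the version comparison being discussed can be sketched in a few lines of Python. This is only a rough check: the function names are made up for illustration, and it compares the leading x.y.z of the release string and nothing else. Vendor kernels often carry backported fixes, so a "too old" base version does not by itself prove the tape fix is missing.

```python
import re

# Threshold quoted in this thread for 2.4-series kernels.
MIN_TAPE_KERNEL = (2, 4, 24)


def base_version(release):
    """Return the leading numeric x.y.z of a kernel release string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def older_than_threshold(release, minimum=MIN_TAPE_KERNEL):
    """True if the base version sorts before the minimum (ignores vendor suffix)."""
    return base_version(release) < minimum


if __name__ == "__main__":
    release = "2.4.21-37.EL.XFSsmp"  # the kernel named in this thread
    print(release, "older than 2.4.24:", older_than_threshold(release))
```

On a live system the release string could come from `platform.release()` instead of being typed in.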
Re: Problems with Tape Backups - Continued
See, e.g.: http://linuxtapecert.org/index.php?option=com_content&task=view&id=2&Itemid=1

"if you are using SCSI tape drives under Linux with a 2.4 kernel, the use of a kernel prior to 2.4.24 will result in SCSI errors and failed tape I/O operations"

I spent a long time fruitlessly trying to diagnose a similar problem, which went away after a kernel update.

R.

On 05/06/07, Brent L. Bates <[EMAIL PROTECTED]> wrote:
> I've been doing a LOT of testing since my last post. To recap, we're
> running SL 3.0.5 with XFS file systems and the kernel is
> 2.4.21-37.EL.XFSsmp.
> [...]
Problems with Tape Backups - Continued
I've been doing a LOT of testing since my last post. To recap, we're running SL 3.0.5 with XFS file systems, and the kernel is 2.4.21-37.EL.XFSsmp. We have 4 SATA drives on 2 controllers. Each drive has 3 partitions. One partition on each drive belongs to a mirrored software RAID for /boot. A second partition on each drive is a swap partition. The final partition on each drive belongs to one large software RAID stripe, and root (/) is on this file system. We have a Sony AIT-3 tape drive on a SCSI controller, all internal.

Tape backups suddenly started taking forever, or never finishing because we reached the end of the tape, or so it said. We have 27GB to back up, and the tape will take 100GB with no compression. Someone thought we might need to clean the tape drive, but AIT drives don't need cleaning as they are self-cleaning. When I use `xfsdump' to dump to a file on the hard drive, the process goes lightning fast. Only when I try to back up to the tape drive are there problems.

In my last post I thought it was a hardware problem with the tape drive. I do not think that any more. When I thought it was a hardware problem, I decided to copy everything from the problem system to a subdirectory on another, almost identical system using `rsync'. I figured I could use that system's working tape drive to do backups for both machines. Well, the problem moved to that system.

I tried all sorts of timing tests. Some people thought I might have a bad file that was causing backups to fail, so I started deleting various whole directory trees from the subdirectory on the second system, trying to find the bad file. I've not found a single file that fixes things. I've found that when I delete some directories, things speed up. The speed-up doesn't seem to depend on the disk space used by the directory, but on the number of files/directories under that directory.

On the test system, I recently deleted all the files and directories that had been copied from the primary system and ran a backup. Everything worked fine. I then backed up this test system to a file and extracted the whole backup to a subdirectory of that system, so I basically have the entire file system on the drives twice. I then did a backup to tape, and that was dog slow. To me this says the problem is tied to the number of files and directories that need to be backed up. The problem ONLY occurs when backing up to tape and not to a file on the disk drive.

Anyone seen anything like this? Anyone have any ideas on how to fix this problem? I find this whole thing very weird. Any and all ideas welcome, as I'm not sure where to go from here. Thanks.

--
Brent L. Bates (UNIX Sys. Admin.)
M.S. 912                          Phone: (757) 865-1400, x204
NASA Langley Research Center      FAX: (757) 865-8177
Hampton, Virginia 23681-0001
Email: [EMAIL PROTECTED]          http://www.vigyan.com/~blbates/
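To put numbers on the "file count, not disk space" observation, something like the Python sketch below could be run against each directory tree before deleting it. The function name `tree_stats` is just an illustration, and it counts every file and directory entry it can see, which may not match exactly what `xfsdump' counts.

```python
import os
import sys


def tree_stats(root):
    """Return (entry_count, total_bytes) for everything under root.

    entry_count covers files and directories; total_bytes sums the
    sizes of non-directory entries without following symlinks.
    """
    entries = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        entries += len(dirnames) + len(filenames)
        for name in filenames:
            try:
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # entry vanished or is unreadable; skip it
    return entries, total_bytes


if __name__ == "__main__":
    # Usage: python tree_stats.py /some/dir /another/dir
    for root in sys.argv[1:]:
        count, size = tree_stats(root)
        print("%s: %d entries, %.1f MB" % (root, count, size / 1e6))
```

Comparing backup times against the entry count and against the byte total for the same trees would show which of the two the slowdown actually tracks.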
RE: Problems with Tape Backups - Continued
Brent,

Have you tried listening to the tape drive as it works? Is it, for instance, writing for a bit, then stopping, backing the tape up, then writing some more?

Modern tape units 'stream': that is, they manoeuvre the tape back to well before the end of the last write, wind the tape up to speed, and start writing when they pass the appropriate tape mark. If you keep the buffer full, the drive streams at full speed and all is well. If the buffer gets empty, it has to stop and wind back, then start again. The start-stop is usually clearly audible.

So if it is doing start-stop, two guesses: either your box is doing something else while the backup is going on, and this renders it unable to keep the buffer full, OR the tape heads need cleaning: dirt can cause all sorts of weird problems.

You may also see errors in the SCSI transport layer (/var/log/messages) if the tape drive is ailing: SCSI bus resets and the like. If so, check the cable first.

Regards,
Martin.

--
---
Martin Bly
RAL Tier1 Fabric Team
---

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On
> Behalf Of Brent L. Bates
> Sent: 20 April 2007 21:54
> To: Scientific Linux Users mailing list
> Subject: Problems with Tape Backups - Continued
>
> I tried backing up to a file instead of the tape drive
> in order to see if I could narrow down where the problem is
> located. It completed the full backup of 27GB in 1618
> seconds (27 minutes). I used xfsdump to do it. This would
> indicate to me that there isn't anything wrong with the file
> system or any problems with large sparse files as others
> suggested I check. Since that worked without problems, it
> would seem to me that the real problem is with the tape drive
> connection. I've already tried a couple of different tapes
> without any luck. Could it be the drive itself? Anyone
> have any other ideas for me to check out? Thanks.
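As a rough aid for the log check suggested above, a small Python filter could pull out lines from /var/log/messages that mention both SCSI (or an `st` tape device) and some sign of trouble. The keyword lists are assumptions for illustration only; the actual wording of driver messages varies by kernel and controller, so an empty result does not prove the bus is healthy.

```python
import re

# Assumed keywords, not an authoritative list of kernel messages.
SCSI_HINT = re.compile(r"scsi|\bst\d+\b", re.IGNORECASE)
TROUBLE_HINT = re.compile(r"reset|abort|timeout|timed out|error|sense",
                          re.IGNORECASE)


def suspicious_lines(lines):
    """Return lines that mention SCSI/tape and also look like a problem."""
    return [line.rstrip("\n") for line in lines
            if SCSI_HINT.search(line) and TROUBLE_HINT.search(line)]


def scan_log(path="/var/log/messages"):
    """Scan a syslog file (reading it usually needs root)."""
    with open(path, errors="replace") as handle:
        return suspicious_lines(handle)


if __name__ == "__main__":
    for hit in scan_log():
        print(hit)
```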
Problems with Tape Backups - Continued
I tried backing up to a file instead of the tape drive in order to see if I could narrow down where the problem is located. It completed the full backup of 27GB in 1618 seconds (27 minutes). I used xfsdump to do it. This would indicate to me that there isn't anything wrong with the file system, nor any problem with large sparse files, as others suggested I check. Since that worked without problems, it would seem to me that the real problem is with the tape drive connection. I've already tried a couple of different tapes without any luck. Could it be the drive itself? Anyone have any other ideas for me to check out? Thanks.
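The arithmetic behind that timing can be worked through in a few lines of Python. The 27GB and 1618 seconds come from the test above; the 12 MB/s native tape rate is an assumption (it should be checked against the drive's own specification), and 27GB is taken as 27 x 10^9 bytes. The result is roughly 16.7 MB/s averaged over the whole dump to disk.

```python
def throughput_mb_per_s(total_bytes, seconds):
    """Average throughput in MB/s (1 MB = 10**6 bytes)."""
    return total_bytes / seconds / 1e6


DUMP_BYTES = 27 * 10**9   # 27GB, as reported for the dump to a file
DUMP_SECONDS = 1618       # elapsed time reported for that dump

# Assumption: native (uncompressed) rate of the tape drive in MB/s.
ASSUMED_NATIVE_MB_PER_S = 12.0

disk_rate = throughput_mb_per_s(DUMP_BYTES, DUMP_SECONDS)
best_case_tape_seconds = DUMP_BYTES / 1e6 / ASSUMED_NATIVE_MB_PER_S

print("average dump-to-file rate: %.1f MB/s" % disk_rate)
print("best-case tape time at %.0f MB/s: %.1f minutes"
      % (ASSUMED_NATIVE_MB_PER_S, best_case_tape_seconds / 60))
```

This is only an average. A dump that crawls through many small files could still fall below the drive's rate for stretches even when the overall figure looks comfortable.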