Re: Problems with Tape Backups - Continued

2007-06-07 Thread Brent L. Bates
 Thanks for the comments.  I checked the system logs and we are not using
an Adaptec SCSI controller.  It is a Fusion MPT (an integral part of the Intel
motherboard), so I guess that rules out the aic79xx driver as the problem.
 Doesn't it?

 The other comment was that our kernel, 2.4.21-37.EL.XFSsmp, is too old
for SCSI tape drives and that I need at least 2.4.24.  I do not see any SCSI
errors in any of the logs.  Should I see errors in a log?  I only have
problems when the number of files gets over some unknown limit.  We've got 3
systems here, all running the same version of SL and all with the same tape
drive (the 3rd one has an Adaptec controller).  I haven't seen any problems on
the 3rd system, but I haven't tried pushing the number of files on that system
to test it.  The problem with upgrading to a newer kernel is that SL doesn't
have an XFS-capable kernel newer than what I have installed, at least not for
the SL 3.0.x series.  Any ideas?
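
For reference, the version comparison being discussed can be sketched in a
few lines of Python.  This is only an illustration: the helper names are made
up for the example, and it compares the leading numeric part of the release
string only.  Vendor kernels often carry backported fixes, so a lower base
number does not by itself prove that a particular fix is missing.

```python
import re

def kernel_base_version(release):
    """Return the leading numeric triple of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())

def older_than(release, minimum):
    """True if the base version of `release` sorts before `minimum`."""
    return kernel_base_version(release) < kernel_base_version(minimum)

# The installed kernel from this thread against the suggested minimum.
print(older_than("2.4.21-37.EL.XFSsmp", "2.4.24"))
```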

 Thanks for the help.


Re: Problems with Tape Backups - Continued

2007-06-05 Thread Richard Balthazor

See, e.g.:

http://linuxtapecert.org/index.php?option=com_content&task=view&id=2&Itemid=1

"if you are using SCSI tape drives under Linux with a 2.4 kernel, the
use of a kernel prior to 2.4.24 will result in SCSI errors and failed
tape I/O operations"

I spent a long time fruitlessly trying to diagnose a similar problem
that went away after a kernel update.

R.





Problems with Tape Backups - Continued

2007-06-05 Thread Brent L. Bates
 I've been doing a LOT of testing since my last post.  To recap, we're
running SL 3.0.5 with XFS file systems and the kernel is 2.4.21-37.EL.XFSsmp.
 We have 4 SATA drives on 2 controllers.  Each drive has 3 partitions.  One
partition on each drive is part of a mirrored software RAID for /boot.  A
second partition on each drive is a swap partition.  The final partition on
each drive is part of one large software RAID stripe, and root (/) is on this
file system.  We have a Sony AIT-3 tape drive on a SCSI controller, all
internal.

 Tape backups suddenly started taking forever or never finishing because
we reached the end of the tape, or so it said.  We have 27GB to back up and
the tape will take 100GB with no compression.  Someone thought we might need
to clean the tape drive, but AIT drives don't need cleaning as they are self
cleaning.  When I use `xfsdump' to dump to a file on the hard drive, the
process goes lightning fast.  Only when I try to back up to the tape drive are
there problems.

 When I last posted, I thought it was a hardware problem with the tape
drive.  I do not think it is that any more.  When I thought it was a hardware
problem, I decided to copy everything from the problem system to a
subdirectory on another, almost identical system using `rsync'.  I figured I
could use that system's working tape drive to do backups for both machines.
 Well, the problem moved to that system.

 I tried all sorts of timing tests.  Some people thought I might have a
bad file that was causing backups to fail, so I started deleting different
whole directory trees in the subdirectory on the second system, trying to find
the bad file.  I've not found a single file that fixes things.  I've found
that when I delete some directories things speed up.  The speed up doesn't
seem to depend on the disk space used by the directory, but on the number of
files/directories under that directory.

 On the test system, I recently deleted all the files and directories that
had been copied from the primary system and ran a backup.  Everything worked
fine.  I then backed up this test system to a file and then extracted the
whole backup to a subdirectory of that system.  I basically have the entire
file system on the drives twice.  I then did a backup to tape and that was dog
slow.  To me this says the problem is tied to the number of files and
directories that need to be backed up.  The problem ONLY occurs when backing
up to tape and not to a file on the disk drive.
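
One way to put numbers on the "number of files" observation is to count the
files and directories under each top-level directory and compare those counts
with the backup timings.  The following is a minimal Python sketch; the
function names are invented for the example and nothing here is specific to
xfsdump.

```python
import os

def count_entries(root):
    """Count files and directories below root.

    os.walk does not descend into symlinked directories by default,
    although such links are still counted as directory entries.
    """
    files = dirs = 0
    for _path, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs

def entries_by_subdirectory(root):
    """Map each immediate subdirectory of root to its (files, dirs) counts."""
    result = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                result[entry.name] = count_entries(entry.path)
    return result
```

Calling entries_by_subdirectory("/") and sorting the result by file count
would show which trees contribute most of the entries, which can then be
lined up against the directories whose removal sped things up.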

 Anyone seen anything like this?  Anyone have any ideas on how to fix this
problem?  I find this whole thing very weird.  Any and all ideas welcome as
I'm not sure where to go from here.  Thanks.

-- 

  Brent L. Bates (UNIX Sys. Admin.)
  M.S. 912  Phone:(757) 865-1400, x204
  NASA Langley Research Center  FAX:(757) 865-8177
  Hampton, Virginia  23681-0001
  Email: [EMAIL PROTECTED]  http://www.vigyan.com/~blbates/


RE: Problems with Tape Backups - Continued

2007-04-20 Thread Bly, MJ (Martin)
Brent,

Have you tried listening to the tape drive as it works?   Is it, for
instance, writing for a bit, then stopping, backing the tape up, then
writing some more?

Modern tape units 'stream': that is, they manoeuvre the tape back to well
before the end of the last write, wind the tape up to speed, and start
writing when they pass the appropriate tape mark.  If you keep the
buffer full, it streams at full speed and all is well.  If the buffer
gets empty, it has to stop and wind back, then start again.   The
start-stop is usually clearly audible.
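
The cost of start-stop operation can be illustrated with a toy model in
Python.  Everything in it is an assumption made for the sake of the example:
the drive is taken to start writing only when its buffer is full, to write at
a fixed rate until the buffer runs dry, and then to spend a fixed time
repositioning.  Real drive firmware is more sophisticated, so treat the
numbers as qualitative only.

```python
def effective_rate(host_rate, drive_rate, buffer_mb, reposition_s):
    """Long-run write rate (MB/s) under the toy start-stop model.

    host_rate    -- rate at which the host can supply data (MB/s)
    drive_rate   -- rate at which the drive writes while streaming (MB/s)
    buffer_mb    -- size of the drive's buffer (MB)
    reposition_s -- time lost to each stop, rewind and restart (s)
    """
    if host_rate <= 0 or drive_rate <= 0 or buffer_mb <= 0:
        raise ValueError("rates and buffer size must be positive")
    if host_rate >= drive_rate:
        return drive_rate  # buffer never empties, the drive streams
    # The buffer drains at (drive_rate - host_rate) while writing.
    write_time = buffer_mb / (drive_rate - host_rate)
    written = drive_rate * write_time
    # The drive restarts once it has repositioned AND the buffer is full.
    refill_time = max(reposition_s, buffer_mb / host_rate)
    return written / (write_time + refill_time)
```

In this model, if repositioning takes no longer than refilling the buffer,
the drive simply runs at whatever rate the host supplies.  Once repositioning
takes longer than that, the effective rate falls below even the host's
rate, which is the kind of slowdown that start-stop operation produces.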

So if it is doing start-stop, two guesses: either your box is doing
something else while the backup is going on, and this renders it unable
to keep the buffer full, OR, the tape heads need cleaning: dirt can
cause all sorts of weird problems.

You may also see errors in the SCSI transport layer (/var/log/messages)
if the tape drive is ailing.  SCSI bus resets and the like.  If so,
check the cable first.
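
A quick way to look for such messages is to filter the log for lines that
mention SCSI together with words such as reset, abort, timeout or error.  The
Python sketch below does that; the keyword list is a guess at what is worth
looking for, not an exact list of kernel messages, so it may need adjusting.

```python
import re

# Assumed keywords; adjust to whatever the local kernel actually logs.
SUSPECT = re.compile(r"\b(reset\w*|abort\w*|timed? ?out|error\w*)\b",
                     re.IGNORECASE)

def suspect_scsi_lines(lines):
    """Return the lines that mention SCSI and one of the suspect keywords."""
    return [line.rstrip("\n") for line in lines
            if "scsi" in line.lower() and SUSPECT.search(line)]
```

To use it, open /var/log/messages and pass the file object to
suspect_scsi_lines(), then print whatever comes back.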

Regards,
Martin.

-- 
   ---
  Martin Bly
  RAL Tier1 Fabric Team
   --- 



Problems with Tape Backups - Continued

2007-04-20 Thread Brent L. Bates
 I tried backing up to a file instead of the tape drive in order to see if
I could narrow down where the problem is located.  It completed the full
backup of 27GB in 1618 seconds (27 minutes).  I used xfsdump to do it.  This
would indicate to me that there isn't anything wrong with the file system or
any problem with large sparse files, as others suggested I check.  Since that
worked without problems, it would seem to me that the real problem is with
the tape drive connection.  I've already tried a couple of different tapes
without any luck.  Could it be the drive itself?  Anyone have any other ideas
for me to check out?  Thanks.
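
For scale, the arithmetic behind those figures can be worked through in a
short Python sketch.  It takes 1 GB as 1000 MB.  The tape rate used for
comparison is an assumption (12 MB/s is the native figure commonly quoted for
AIT-3 and should be checked against the drive's data sheet), and the function
names are invented for the example.

```python
def throughput_mb_per_s(size_gb, seconds):
    """Average rate in MB/s, taking 1 GB as 1000 MB."""
    return size_gb * 1000.0 / seconds

def minutes_at_rate(size_gb, rate_mb_per_s):
    """Minutes needed to move size_gb at a steady rate."""
    return size_gb * 1000.0 / rate_mb_per_s / 60.0

# Figures from the message: 27 GB dumped to a file in 1618 seconds.
disk_rate = throughput_mb_per_s(27, 1618)
print(f"dump to file: {disk_rate:.1f} MB/s")

# Assumed native tape rate, not taken from the thread.
ASSUMED_TAPE_RATE = 12.0
tape_minutes = minutes_at_rate(27, ASSUMED_TAPE_RATE)
print(f"27 GB at {ASSUMED_TAPE_RATE} MB/s: {tape_minutes:.1f} min")
```

The dump to a file works out to roughly 16.7 MB/s, so the file system can
supply data faster than the assumed tape rate, and a streaming backup of
27 GB would be expected to take well under an hour.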