Re: tape throughput - lto1

2006-07-06 Thread Brian Cuttler

Joshua,

On Thu, Jul 06, 2006 at 11:12:39AM -0400, Joshua Baker-LePain wrote:
> On Thu, 6 Jul 2006 at 10:24am, Brian Cuttler wrote
> 
> >I've added more work area to amanda, have been trying to find
> >what other problems we may be seeing with the job, since it
> >still seems to take longer than it should.
> >
> >Upon looking more closely at the amanda report from amdump I see
> >that the tape I/O rate is around 1800 KB/s, whereas 2 months ago
> >15,000 was not unusual.
> >
> >The reduction does not seem to be tied to a system reboot (patches,
> >installation of HBA [host bus adapter] for the LTO3) nor any other
> >event that I can identify, and in fact I notice that we seem to have
> >two step downs in the I/O rate, separated by approx. one tape cycle.
> 
> I'm going to assume that the drive is LTO1, since you say that twice (in 
> the subject, and in the part I snipped below) and LTO3 only once.  :) 
> Based on some quick specs I found 16MB/s is native rate for LTO1, so your 
> 15K above was normal.

Yes, the drive is actually an LTO1; we've run the L9/LTO1 jukebox
for more than 3 years and do not have a service contract for it. The
system library died a couple of months back, circuit failure. The
LTO3 is part of the StorEdge C2 Library that we have not yet put into
production... maybe tonight is the night.

> My first suspicion would be that your DLE(s) outgrew your holding space, 
> so now they're dumping straight to tape over the network.  But the amflush 
> you mention below would appear to speak against that.

This was last week's problem: too many DLEs, not enough holding disk, though
I now know that this was due in large part to the fact that I have not
been able to clean off the work area by putting the DLEs to tape in a
timely fashion.

> >I have tried to clean the tape drive, have tried to relabel the tape
> >(amflush running as I write) and will next try a brand new tape with
> >the assumption that the max number of tape cycles has been reached
> >on all volumes at the same time. While that would mean remarkable
> >quality control in manufacture, the tapes were all purchased at the
> >same time and have been used an almost identical number of times.
> >
> >If the new tape doesn't help (I expect it will but who knows) I don't
> >know what else it might be, wear of the tape heads ?
> 
> Have you tested the tape performance outside of amanda?  amflush *should* 
> go as fast as the tape and disk drives will let it, but it never hurts to 
> take as many things out of the equation as you can.  Try 'dd'ing from 
> /dev/zero (or the Solaris equivalent) to the tape drive and see how fast 
> that goes.  Ditto for tar with various block sizes.

I haven't (yet) tested the tape performance outside of amanda. If/when
I get access to the drive (once amflush completes) I need to archive my FW
and proxy logs. That is a fairly substantial quantity of data and will
run outside of amanda.

> The blocksize settings didn't get mucked up, did they?

Great question, but I don't see how they could have been stepped down
twice, several weeks apart, with the first occurrence approx. 2 weeks
after the most recent reboot. I will check that further if the current
set of write tests doesn't show any improvement; while not my first
guess, it is one of the more readily fixable problems.

> If anything you do to the drive only goes at 1800KB/s, I'd say it's time 
> to call support.  Did I hear the word Dell?  *shudder*  Good luck with 
> that (from a fellow Dell "user").  ;)

A flush of a DLE to the new LTO3 showed the expected write rate, so I
don't see a bus or disk problem on that side of the CPU. We still have
the bus/tape on this side to worry about. Yes, the LTO1 and LTO3 are
actually on separate buses; for that matter, the work areas don't share
either of the buses used by the tape libraries.

A relabel of the tape did nothing for performance. I am running an
amflush with a brand new tape now, though some sort of error on the
library console (an LCD window) or in the messages file would have been
nice to find if it were the heads or the media.

Will let you know but if my estimates of flush time are any good I'm
not getting the results I'd hoped for.
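
A rough sketch of the flush-time arithmetic behind that estimate. The two rates are the ones quoted in this thread; the data volume is only an assumption for illustration (100 GB, roughly one LTO1 tape at native capacity), not a figure from the amanda report.

```python
def transfer_hours(total_kb, rate_kb_per_s):
    """Hours needed to move total_kb at a steady rate_kb_per_s."""
    if rate_kb_per_s <= 0:
        raise ValueError("rate must be positive")
    return total_kb / rate_kb_per_s / 3600.0


# ASSUMPTION: 100 GB of holding-disk data, chosen only for illustration.
DATA_KB = 100 * 1024 * 1024

# Rates taken from the thread: ~1800 KB/s now, ~15,000 KB/s two months ago.
for label, rate in (("current", 1800), ("two months ago", 15000)):
    hours = transfer_hours(DATA_KB, rate)
    print(f"{label:>15}: {hours:.1f} h at {rate} KB/s")
```

Whatever the real volume is, the ratio between the two times is the same as the ratio between the two rates, a bit more than 8x.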

> -- 
> Joshua Baker-LePain
> Department of Biomedical Engineering
> Duke University
---
   Brian R Cuttler [EMAIL PROTECTED]
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773



Re: tape throughput - lto1

2006-07-06 Thread Joshua Baker-LePain

On Thu, 6 Jul 2006 at 10:24am, Brian Cuttler wrote


>I've added more work area to amanda, have been trying to find
>what other problems we may be seeing with the job, since it
>still seems to take longer than it should.
>
>Upon looking more closely at the amanda report from amdump I see
>that the tape I/O rate is around 1800 KB/s, whereas 2 months ago
>15,000 was not unusual.
>
>The reduction does not seem to be tied to a system reboot (patches,
>installation of HBA [host bus adapter] for the LTO3) nor any other
>event that I can identify, and in fact I notice that we seem to have
>two step downs in the I/O rate, separated by approx. one tape cycle.


I'm going to assume that the drive is LTO1, since you say that twice (in 
the subject, and in the part I snipped below) and LTO3 only once.  :) 
Based on some quick specs I found 16MB/s is native rate for LTO1, so your 
15K above was normal.


My first suspicion would be that your DLE(s) outgrew your holding space, 
so now they're dumping straight to tape over the network.  But the amflush 
you mention below would appear to speak against that.



>I have tried to clean the tape drive, have tried to relabel the tape
>(amflush running as I write) and will next try a brand new tape with
>the assumption that the max number of tape cycles has been reached
>on all volumes at the same time. While that would mean remarkable
>quality control in manufacture, the tapes were all purchased at the
>same time and have been used an almost identical number of times.
>
>If the new tape doesn't help (I expect it will but who knows) I don't
>know what else it might be, wear of the tape heads?


Have you tested the tape performance outside of amanda?  amflush *should* 
go as fast as the tape and disk drives will let it, but it never hurts to 
take as many things out of the equation as you can.  Try 'dd'ing from 
/dev/zero (or the Solaris equivalent) to the tape drive and see how fast 
that goes.  Ditto for tar with various block sizes.
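
A minimal sketch of that kind of test, written in Python rather than as a dd command line, in case it helps to script runs at several block sizes. The function name and the idea of passing the tape device as a plain path are assumptions for illustration; nothing here comes from amanda itself.

```python
import time


def write_throughput_kbps(path, block_size, total_bytes):
    """Write zero-filled blocks to path and return the observed rate in KB/s.

    Only whole blocks are written, so the amount written is
    (total_bytes // block_size) * block_size.
    """
    block = bytes(block_size)            # one block of zeros, like /dev/zero
    blocks = total_bytes // block_size
    start = time.monotonic()
    # buffering=0: each write() call hands exactly one block to the OS.
    with open(path, "wb", buffering=0) as dev:
        for _ in range(blocks):
            dev.write(block)
    elapsed = time.monotonic() - start
    # Guard against a zero interval on very small, very fast writes.
    return (blocks * block_size / 1024) / max(elapsed, 1e-9)
```

It could be called as, say, write_throughput_kbps("/dev/rmt/0n", 32 * 1024, 512 * 1024 * 1024), where the device path is a guess at a Solaris tape device name and would need to match the real drive. Writing to it overwrites whatever is on the loaded tape, so use a scratch volume. Against a regular file the number mostly reflects the OS cache, not the disk.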


The blocksize settings didn't get mucked up, did they?

If anything you do to the drive only goes at 1800KB/s, I'd say it's time 
to call support.  Did I hear the word Dell?  *shudder*  Good luck with 
that (from a fellow Dell "user").  ;)


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


tape throughput - lto1

2006-07-06 Thread Brian Cuttler
Hello All,

I've added more work area to amanda, have been trying to find
what other problems we may be seeing with the job, since it
still seems to take longer than it should.

Upon looking more closely at the amanda report from amdump I see
that the tape I/O rate is around 1800 KB/s, whereas 2 months ago
15,000 was not unusual.

The reduction does not seem to be tied to a system reboot (patches,
installation of HBA [host bus adapter] for the LTO3) nor any other
event that I can identify, and in fact I notice that we seem to have
two step downs in the I/O rate, separated by approx. one tape cycle.

I have tried to clean the tape drive, have tried to relabel the tape
(amflush running as I write) and will next try a brand new tape with
the assumption that the max number of tape cycles has been reached
on all volumes at the same time. While that would mean remarkable
quality control in manufacture, the tapes were all purchased at the
same time and have been used an almost identical number of times.

If the new tape doesn't help (I expect it will but who knows) I don't
know what else it might be, wear of the tape heads?

I don't see anything in the /var/adm/messages file (Solaris 9 host)
nor errors on the tape library console, LTO1 is part of a StorEdge L9
library.

I do see the library green light pulse and then go idle for a while.
I don't know if this is typical since the library is down in the 
computer room and we usually run amanda at night. I don't know if the
tape is retrying or we aren't feeding it quickly enough. [Note to self:
there must be some I/O metrics I can check.]

Any comments or suggestions are welcome.

thank you,

Brian
---
   Brian R Cuttler [EMAIL PROTECTED]
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773