Re: Urgent - Library Master mount queue breaking down, tapes going into RESERVED status and never getting mounted

2010-09-17 Thread John D. Schneider
My thanks to all who replied to my requests for help last Friday.

I thought I would reply and let everybody know how this played out.  

In our situation, we had 128 virtual tape drives, and for two nights in
a row, the TSM Library Master instance was getting into a state where
there would be 80 or more virtual tapes in RESERVED status, and at the
same time hundreds of clients in MediaWait waiting for virtual tape
mounts.  

The basic underlying problem was a Windows LAN-free server that was
using about 40 of our 128 virtual tape mounts, and not giving them back.
 The Storage Agent wasn't down, but it wasn't responding properly, either.
For example, when it is working normally, you can issue a q mount
command to it from the Library Master and get a response back instantly.
 But last week it was causing the Library Master to hang for 10 seconds,
then give us an error that the Storage Agent had replied with errors.
So not only did the Library Master not know how to get back the 40
virtual tapes, but under heavy load the Library Master's queue would
grow rapidly while it was issuing requests to the Storage Agent over and
over, and waiting 10 seconds between each reply. 
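
To see why a 10-second stall per request is enough to swamp the queue, here is a rough back-of-the-envelope sketch. It is a toy model only: it assumes requests are handled one at a time, and the arrival rate is a made-up figure, not something measured in this environment. It does not describe how TSM schedules mount requests internally.

```python
def backlog_after(minutes, arrivals_per_minute, seconds_per_request):
    """Estimate queued requests after `minutes`, assuming strictly serial handling.

    Toy model: one request is served at a time and each takes
    `seconds_per_request`. All inputs are hypothetical.
    """
    served_per_minute = 60.0 / seconds_per_request
    # The queue only grows when requests arrive faster than they are served.
    growth_per_minute = max(0.0, arrivals_per_minute - served_per_minute)
    return growth_per_minute * minutes


# Hypothetical load of 10 mount requests per minute.
stalled = backlog_after(60, arrivals_per_minute=10, seconds_per_request=10)
healthy = backlog_after(60, arrivals_per_minute=10, seconds_per_request=3)
print(stalled, healthy)
```

Under these assumptions a 10-second response leaves a backlog that grows all night, while a response of a few seconds keeps up with the same load.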

The problem would seem to go away for a while if we restarted the Library
Master. It also seemed to go away during the day, because we don't need
all 128 virtual drives then and the tape mounts are fewer and farther
between.  But as soon as backup load picked up at night, the Library
Master would get into trouble again.

Once we understood the underlying problem, we restarted the Windows
LAN-free server, and the 40 virtual tapes freed up, and we were in
business.  We also realized that under normal circumstances we were
using over 110 virtual tapes at night, and so we allocated an additional
64 virtual tape drives to the environment, just to relieve that
potential bottleneck.  
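
The drive arithmetic behind that decision can be sketched as follows. The figures are the approximate ones given in this message ("about 40" held, "over 110" in use at night); the function name is just for illustration.

```python
TOTAL_DRIVES = 128           # virtual tape drives originally defined
HELD_BY_STORAGE_AGENT = 40   # "about 40" mounts held by the hung LAN-free server
PEAK_NIGHTLY_DEMAND = 110    # "over 110" virtual tapes in use at night
ADDED_DRIVES = 64            # extra virtual drives allocated afterwards


def headroom(total, held, demand):
    """Drives left over once held drives and peak demand are subtracted."""
    return (total - held) - demand


print(headroom(TOTAL_DRIVES, HELD_BY_STORAGE_AGENT, PEAK_NIGHTLY_DEMAND))  # -22
print(headroom(TOTAL_DRIVES, 0, PEAK_NIGHTLY_DEMAND))                      # 18
print(headroom(TOTAL_DRIVES + ADDED_DRIVES, 0, PEAK_NIGHTLY_DEMAND))       # 82
```

With 40 drives stuck, the nightly peak could not fit in what was left. Even with all 128 drives free, the margin at peak was thin, which is why the extra 64 were added.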

For now we have turned off the LAN-free storage agent, and have come to
the conclusion that running that particular client LAN-free does nothing
to improve its performance.  Its backup runs just as fast across the
LAN as it did directly to tape, so we will probably just leave it that
way.



Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



 Original Message 
Subject: Re: [ADSM-L] Urgent - Library Master mount queue breaking
down, tapes going into RESERVED status and never getting mounted
From: Prather, Wanda wprat...@icfi.com
Date: Fri, September 10, 2010 4:03 pm
To: ADSM-L@VM.MARIST.EDU

And you've probably done this already, but you should be able to log
into the CDL and look at its CPU busy, make sure IT isn't
overwhelmed...


-Original Message-
From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of
Richard Rhodes
Sent: Friday, September 10, 2010 5:01 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Urgent - Library Master mount queue breaking down,
tapes going into RESERVED status and never getting mounted

One time when we had problems like this it was caused by rmt devices being
out of sync with TSM paths. We never did figure out how it occurred, but we
ended up blowing away all our paths and drives, and recreating it.

Rick






From: John D. Schneider john.schnei...@COMPUTERCOACHINGCOMMUNITY.COM
Sent by: ADSM: Dist Stor Manager ads...@vm.marist.EDU
Date: 09/10/2010 04:39 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: Urgent - Library Master mount queue breaking down, tapes
going into RESERVED status and never getting mounted
Please respond to: ADSM: Dist Stor Manager ads...@vm.marist.EDU






Richard,
 All good suggestions. No AIX errors with the VTL or VTL drives. We
are using the Atape driver, because the VTL is emulating a 3584 with
LTO1 drives.

But there are a number of Atape files, in particular Atape.smc0.traceX.
I look in them and see regular errors in them; but I wonder if this is a
red herring. Because I look on the Library Master for a physical 3584
library, and I see similar trace files, and the same sort of errors on
the smc1 device for a real 3584 library.

So are these libraries always getting these errors?

I looked at our SAN switches a couple days ago, and zeroed out the error
counters for the AIX host, the EDL, and the ISLs between the switches.
Two days later, and all those ports are totally error free. So I don't
see how it could be in the switches.

All good ideas, and I don't mean to disparage them. I just don't see a
smoking gun, yet.

Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



Re: Urgent - Library Master mount queue breaking down, tapes going into RESERVED status and never getting mounted

2010-09-10 Thread Richard Rhodes
Sounds like maybe the library manager is not communicating with the VTL.
Some things to check:

- any errors in the AIX error log?
- any errors in the VTL?
- any san errors?

If you are running atape . . .
- check the logs in /var/adm/ras
- are you running multi-pathing?  If yes, what is the status of the paths?

Atape with multi-paths is very good at hiding hardware problems.
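
As a small aid for the /var/adm/ras check above, here is a minimal sketch that lists the Atape files in that directory, newest first. The `Atape.*` pattern is an assumption based on the `Atape.smc0.traceX` name mentioned elsewhere in this thread. The function name is hypothetical, and it only lists files; it does not try to interpret the trace contents.

```python
from pathlib import Path


def list_atape_traces(ras_dir="/var/adm/ras"):
    """Return Atape files under ras_dir, most recently modified first.

    Assumes the files are named like 'Atape.smc0.traceX'; returns an
    empty list if the directory does not exist.
    """
    directory = Path(ras_dir)
    if not directory.is_dir():
        return []
    traces = [p for p in directory.glob("Atape.*") if p.is_file()]
    return sorted(traces, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for trace in list_atape_traces():
        print(trace.name, trace.stat().st_size)
```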


Rick





From: John D. Schneider john.schnei...@COMPUTERCOACHINGCOMMUNITY.COM
Sent by: ADSM: Dist Stor Manager ads...@vm.marist.EDU
Date: 09/10/2010 01:05 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Urgent - Library Master mount queue breaking down, tapes
going into RESERVED status and never getting mounted
Please respond to: ADSM: Dist Stor Manager ads...@vm.marist.EDU






 Greetings,
   Our environment is 8 TSM instances on AIX, running AIX 5.3ML11, and
TSM 5.4.3.0.  I know we are rather far behind, but this has been an
extremely stable version for us, until just recently.  There are 4
instances on one AIX host, and 4 on the other. The hosts are pSeries
570s. There is also a Windows LAN-free client in the mix.  Total client
count is about 1500, in schedules more or less spread across the night.
Performance of backups is OK; the AIX hosts are generally 20-30% CPU
loaded across 8 CPUs.
   One of the TSM instances serves as a TSM Library Master for the
others, and has no other workload. It mounts tapes for an EMC Disk
Library (virtual library), configured with 128 virtual LTO1 tape drives,
shared between all the instances.  The device class for the library has
a 15 minute mount retention period.  The clients mostly can only mount a
single virtual tape.  A few larger database servers are allowed to mount
more.  All have keep mount point set to yes.
   This basic configuration has been in place about three years.  At
first we had problems, and had to put LIBSHRTIMEOUT 60 and COMMTIMEOUT
3600 in the dsmserv.opt of the Library Master.  But it has been many
months since we had to make any configuration changes to the
environment.  I like STABLE.
   But things are growing, and we are adding new clients all the time,
and have added about forty in the last few weeks.
   A couple weeks ago, the Library Master instance got into a state
where there were lots of tapes in RESERVED status when we did a 'q
mount'.  There were still occasional mounts happening, but lots of
clients were in Media wait.  We restarted the Library Master and the
problem went away, but then it came back about a week later.
  Now it is happening every day.  Last night we stayed up all night
watching it, and at first could see just a couple of RESERVED tape
drives, and lots of normal mounts coming and going.  Then slowly the
number of RESERVED ones would creep up over the course of an hour or two
until there were 80 or more in RESERVED status, and dozens of clients in
Media wait.  Ordinarily virtual tape mounts take 2-4 seconds.  Last
night during the problem they were taking 15-20 seconds.  At about 1am
we restarted the Library Master, and the RESERVED drives went away, but
were back again within the hour.
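
One way to watch that creep without eyeballing the console is to save the 'q mount' output at intervals and tally the status at the end of each line. The sketch below is only an illustration. It assumes each mount line ends with "status: <STATE>.", and the sample lines are invented placeholders (device class, volume and drive names are hypothetical), so the real output wording may differ.

```python
import re
from collections import Counter

# Assumption: each mount line ends with "status: <STATE>." (trailing period optional).
STATUS_RE = re.compile(r"status:\s*([A-Za-z ]+?)\s*\.?\s*$")


def tally_mount_status(lines):
    """Count mount lines per status keyword, ignoring lines without a status."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1).upper()] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    "Mount point reserved in device class VTLCLASS, status: RESERVED.",
    "Mount point reserved in device class VTLCLASS, status: RESERVED.",
    "LTO volume V00001 is mounted R/W in drive VDRIVE001 (/dev/rmt1), status: IN USE.",
    "3 matches found.",
]
print(tally_mount_status(sample))
```

Run against snapshots taken every few minutes, the RESERVED count over time shows whether the queue is clearing or falling further behind.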
   One thing I noticed then was that the Library Master had over 300
sessions, all admin. Usually it has very few.  Our MAXSESSIONS was set
to 500, so I wondered if perhaps we were overrunning it.  We bumped it
up to 1000 on all instances.  We restarted all TSM instances this time,
including the lan-free one.  (The lan-free Windows server was hung,
although we don't know if this is coincidence, or has something to do
with anything).
   After we restarted, we appeared to be stable for about 4 hours, so we
started rerunning a bunch of the TSM clients that failed last night
during the problem.  In no time at all the RESERVED list grew huge,
clients were in Media wait again, and we had to restart the Library
Master again.

   So it seems to me the problem has to do with the Library
Master's queuing mechanism.  Somehow it is becoming overwhelmed with
tape mount requests, and can't satisfy them all, so they go into
RESERVED status.  This is somewhat normal behavior, and we see drives go
into RESERVED status lots of times when a burst of mounts happens at
once, but then the queue clears after a few minutes.  But even after an
hour or two it never catches up, and things go from bad to worse.

   One other tidbit, which might not even be related.  Back on 8/23 our
EMC Disk library had a drive fail, but within 24 hours had rebuilt onto
a spare.  We just found out about it, and haven't replaced the drive.  I
don't think it is related, but I 

Re: Urgent - Library Master mount queue breaking down, tapes going into RESERVED status and never getting mounted

2010-09-10 Thread Dwight Cook
Also have to wonder whether the VTL has lost multiple drives in a RAID
array and is (possibly) working off parity or, at minimum, rebuilding onto a
spare drive.  A rebuild would generally happen once and be done with (not be
seen over and over again), but if you have a RAID array that has lost
excessive drives and is operating off parity, that would greatly slow down
processing when things get busy.


Re: Urgent - Library Master mount queue breaking down, tapes going into RESERVED status and never getting mounted

2010-09-10 Thread Curtis Stewart
Good Afternoon Jeff,

Thank you for the email.

Clearly there's a discrepancy around the capacity required for your 
environment. Our sizing is based on the capacity of your full backup.  I'm not 
sure how EMC is sizing their solution. I do know through experience, that EMC 
will often undersize their solution to meet a price point. We prefer to size 
things appropriately from the start. If your full backup is truly 6TB, we can 
surely lower our configuration to that level. But, I think you'd like to have a 
bit of room for growth. To the best of my recollection, the initial EMC 
configuration was 9TB. If that's the case, then it's come quite a bit to the 
6TB level.

In addition, there is a large amount of functionality in the Dell/CommVault 
solution that doesn't appear to be included with the EMC design. The primary 
issue I see is your very long retention requirement that is only well served by 
traditional tape based storage. The Dell/CommVault solution will allow you to 
maintain this data in deduplicated format on tape. I do not know what the EMC  
solution is for this data, but I suspect it requires more cost than what is 
currently proposed.

Looking forward to next week!

Curtis Stewart

Re: Urgent - Library Master mount queue breaking down, tapes going into RESERVED status and never getting mounted

2010-09-10 Thread Richard Rhodes
Or . .. maybe the vtl is simply full.  Or, if it's a background dedup box,
maybe it can't keep up with the rate of data coming in.






Re: Urgent - Library Master mount queue breaking down, tapes going into RESERVED status and never getting mounted

2010-09-10 Thread John D. Schneider
Richard,
   All good suggestions.  No AIX errors with the VTL or VTL drives.  We
are using the Atape driver, because the VTL is emulating a 3584 with
LTO1 drives.

But there are a number of Atape files, in particular Atape.smc0.traceX. 
I look in them and see regular errors in them; but I wonder if this is a
red herring.  Because I look on the Library Master for a physical 3584
library, and I see similar trace files, and the same sort of errors on
the smc1 device for a real 3584 library.

So are these libraries always getting these errors?

I looked at our SAN switches a couple days ago, and zeroed out the error
counters for the AIX host, the EDL, and the ISLs between the switches. 
Two days later, and all those ports are totally error free.  So I don't
see how it could be in the switches.

All good ideas, and I don't mean to disparage them.  I just don't see a
smoking gun, yet.

Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



Re: Urgent - Library Master mount queue breaking down, tapes going into RESERVED status and never getting mounted

2010-09-10 Thread Richard Rhodes
One time when we had problems like this it was caused by rmt devices being
out of sync with TSM paths.  We never did figure out how it occurred, but we
ended up blowing away all our paths and drives, and recreating it.

Rick






Re: Urgent - Library Master mount queue breaking down, tapes going into RESERVED status and never getting mounted

2010-09-10 Thread Prather, Wanda
And you've probably done this already, but you should be able to log into the
CDL and look at its CPU busy, make sure IT isn't overwhelmed...


-Original Message-
From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of 
Richard Rhodes
Sent: Friday, September 10, 2010 5:01 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Urgent - Library Master mount queue breaking down, tapes 
going into RESERVED status and never getting mounted

One time when we had problems like this it was caused by rmt devices being
out of sync with TSM paths.  We never did figure out how it occurred, but we
ended up blowing away all our paths and drives, and recreating them.

Rick
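
The "rmt devices out of sync with TSM paths" check Rick describes can be
sketched as a comparison of two inventories. This is only a minimal
illustration, not anything from the thread: it assumes you have already
collected, by whatever means you trust, the serial number the OS reports
for each rmt device and the device/serial pair TSM has recorded for each
drive. The function name, the drive names, and the serial numbers below
are all hypothetical.

```python
def find_mismatched_paths(os_serials, tsm_drives):
    """Return TSM drives whose path points at the wrong (or a missing) device.

    os_serials: dict mapping device name -> serial number seen by the OS.
    tsm_drives: dict mapping TSM drive name -> (device name, expected serial).
    Both inputs are assumed to be gathered separately; this function only
    compares them.
    """
    mismatches = []
    for drive, (device, expected_serial) in sorted(tsm_drives.items()):
        actual_serial = os_serials.get(device)  # None if the device is gone
        if actual_serial != expected_serial:
            mismatches.append((drive, device, expected_serial, actual_serial))
    return mismatches


# Hypothetical sample data: rmt1 and rmt2 have swapped, rmt3 no longer exists.
os_serials = {
    "/dev/rmt0": "SN0001",
    "/dev/rmt1": "SN0003",
    "/dev/rmt2": "SN0002",
}
tsm_drives = {
    "DRIVE01": ("/dev/rmt0", "SN0001"),
    "DRIVE02": ("/dev/rmt1", "SN0002"),
    "DRIVE03": ("/dev/rmt2", "SN0003"),
    "DRIVE04": ("/dev/rmt3", "SN0004"),
}

mismatches = find_mismatched_paths(os_serials, tsm_drives)
for drive, device, expected, actual in mismatches:
    print(f"{drive}: {device} expected {expected}, OS reports {actual}")
```

A report like this would at least show which paths need to be redefined
before resorting to deleting and recreating all of them.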






From: John D. Schneider john.schnei...@COMPUTERCOACHINGCOMMUNITY.COM
Sent by: ADSM: Dist Stor Manager ads...@vm.marist.EDU
Date: 09/10/2010 04:39 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: Urgent - Library Master mount queue breaking down, tapes going
into RESERVED status and never getting mounted
Please respond to: ADSM: Dist Stor Manager ads...@vm.marist.EDU






Richard,
   All good suggestions.  No AIX errors with the VTL or VTL drives.  We
are using the Atape driver, because the VTL is emulating a 3584 with
LTO1 drives.

But there are a number of Atape files, in particular Atape.smc0.traceX.
I look in them and see regular errors in them; but I wonder if this is a
red herring.  Because I look on the Library Master for a physical 3584
library, and I see similar trace files, and the same sort of errors on
the smc1 device for a real 3584 library.

So are these libraries always getting these errors?

I looked at our SAN switches a couple days ago, and zeroed out the error
counters for the AIX host, the EDL, and the ISLs between the switches.
Two days later, and all those ports are totally error free.  So I don't
see how it could be in the switches.

All good ideas, and I don't mean to disparage them.  I just don't see a
smoking gun, yet.

Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721




Re: Urgent - Library Master mount queue breaking down, tapes going into RESERVED status and never getting mounted

2010-09-10 Thread John D. Schneider
Richard,
We have looked at the EDL console, and the EDL is not full, or
getting any other errors.  But we have seen situations before where a
memory leak in the EDL caused it to just run out of swap space on the
engine, and cause various errors and hangs.
It is conceivable that we are just seeing something like that.  This
afternoon we are going to restart the EDL engine.


   The other line of inquiry we have pursued is looking at the library
clients, such as the Windows LAN-free server.  It is a funny coincidence
that when we were fighting this last night, we discovered that the LAN-free
server was hung.  We couldn't log in to it, but it wasn't down.  It just
had a blank screen.  So we rebooted it.  But four hours later we were
back in the same position, with 80+ RESERVED tape drives, and the
LAN-free server was not doing anything that we could tell.


Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



 Original Message 
Subject: Re: [ADSM-L] Urgent - Library Master mount queue breaking
down, tapes going into RESERVED status and never getting mounted
From: Richard Rhodes rrho...@firstenergycorp.com
Date: Fri, September 10, 2010 2:27 pm
To: ADSM-L@VM.MARIST.EDU

Or... maybe the VTL is simply full. Or, if it's a background dedup box,
maybe it can't keep up with the rate of data coming in.






From: Dwight Cook coo...@cox.net
Sent by: ADSM: Dist Stor Manager ads...@vm.marist.EDU
Date: 09/10/2010 02:51 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: Urgent - Library Master mount queue breaking down, tapes going
into RESERVED status and never getting mounted
Please respond to: ADSM: Dist Stor Manager ads...@vm.marist.EDU






Also have to wonder if maybe the VTL hasn't lost multiple drives in a
RAID array and is (possibly) working off of parity or, at minimum,
rebuilding on a spare drive, though that would generally happen and be
done with (not be seen over and over again). But if you have a RAID
array that has lost excessive drives and is operating off parity, that
would greatly slow down processing when things get busy.



Re: Urgent - Library Master mount queue breaking down, tapes going into RESERVED status and never getting mounted

2010-09-10 Thread John D. Schneider
Wanda,
We just now took down all our virtual tape drives, and rebooted the
EDL engine. So we will see tonight if that is the problem. 

We also looked at our schedules, to see if it is possible that we
have too many schedules in too narrow a timeframe asking for tape
mounts, pushing the library master to queue them and not be able to
fulfil them.  We have moved some clients to other schedules hours later,
and reduced resourceutilization on some clients that may be using more
simultaneous tape drives than they need.

Tonight we will monitor it carefully and see.
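
The schedule review described above can be sketched as a rough count of
mount demand per hour against the number of drives. This is only an
illustration under stated assumptions: the schedule list, the fixed
"hold" time per client, and the function names are hypothetical, and the
model ignores mount retention and LAN-free behaviour. Only the figure of
128 virtual drives comes from this thread.

```python
from collections import Counter


def hourly_mount_demand(schedules, hold_hours=1):
    """Estimate how many mount points are wanted in each hour of the day.

    schedules: list of (start_hour, client_count, mounts_per_client).
    hold_hours: assumed number of hours each client keeps its mounts.
    """
    demand = Counter()
    for start_hour, clients, mounts_per_client in schedules:
        for offset in range(hold_hours):
            demand[(start_hour + offset) % 24] += clients * mounts_per_client
    return demand


def overloaded_hours(demand, drives):
    """Return the hours in which estimated demand exceeds the drive count."""
    return sorted(hour for hour, wanted in demand.items() if wanted > drives)


# Hypothetical schedules: (start hour, clients, mount points per client).
schedules = [
    (20, 100, 1),
    (21, 90, 1),
    (21, 10, 4),   # a few database servers allowed several mounts each
    (23, 60, 1),
]

demand = hourly_mount_demand(schedules, hold_hours=2)
print(overloaded_hours(demand, drives=128))
```

Moving a schedule to a later start hour, or lowering the mount points
allowed per client, changes the tuples in the list, so the effect of
each adjustment on the peak hours can be compared before trying it.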

Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



 Original Message 
Subject: Re: [ADSM-L] Urgent - Library Master mount queue breaking
down, tapes going into RESERVED status and never getting mounted
From: Prather, Wanda wprat...@icfi.com
Date: Fri, September 10, 2010 4:03 pm
To: ADSM-L@VM.MARIST.EDU

And you've probably done this already, but you should be able to log
into the CDL and look at its CPU busy, make sure IT isn't
overwhelmed...



Re: Urgent - Library Master mount queue breaking down, tapes going into RESERVED status and never getting mounted

2010-09-10 Thread John D. Schneider
Dwight,
   As I said in my original email, we did have a drive fail on 8/23, and
the EDL rebuilt onto a spare.  The event log shows it took about a day
to rebuild.  So it doesn't look like it is running in degraded mode at
this point.  I will call in the problem and get the drive replaced, but
I don't think it should be the culprit for our problem with tape drives
going into Reserved status.


Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



 Original Message 
Subject: Re: [ADSM-L] Urgent - Library Master mount queue breaking
down, tapes going into RESERVED status and never getting mounted
From: Dwight Cook coo...@cox.net
Date: Fri, September 10, 2010 1:51 pm
To: ADSM-L@VM.MARIST.EDU

Also have to wonder if maybe the VTL hasn't lost multiple drives in a
RAID array and is (possibly) working off of parity or, at minimum,
rebuilding on a spare drive, though that would generally happen and be
done with (not be seen over and over again). But if you have a RAID
array that has lost excessive drives and is operating off parity, that
would greatly slow down processing when things get busy.







From: John D. Schneider john.schnei...@COMPUTERCOACHINGCOMMUNITY.COM
Sent by: ADSM: Dist Stor Manager ads...@vm.marist.EDU
Date: 09/10/2010 01:05 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Urgent - Library Master mount queue breaking down, tapes going
into RESERVED status and never getting mounted
Please respond to: ADSM: Dist Stor Manager ads...@vm.marist.EDU






 Greetings,
 Our environment is 8 TSM instances on AIX, running AIX 5.3ML11, and
TSM 5.4.3.0. I know we are rather far behind, but this has been an
extremely stable version for us, until just recently. There are 4
instances on one AIX host, and 4 on the other. The hosts are pSeries
570s. There is also a Windows LAN-free client in the mix. Total client
count is about 1500, in schedules more or less spread across the night.
Performance of backups is OK; the AIX hosts are generally 20-30% CPU
loaded across 8 CPUs.
 One of the TSM instances serves as a TSM Library Master for the
others, and has no other workload. It mounts tapes for an EMC Disk
library (virtual library), configured with 128 virtual LTO1 tape drives,
shared between all the instances. The device class for the library has
a 15 minute mount retention period. The clients mostly can only mount a
single virtual tape. A few larger database servers are allowed to mount
more. All have keep mount point set to yes.
 This basic configuration has been in place about three years. At
first we had problems, and had to put LIBSHRTIMEOUT 60 and COMMTIMEOUT
3600 in the dsmserv.opt of the Library Master. But it has been many
months since we had to make any configuration changes to the
environment. I like STABLE.
 But things are growing, and we are adding new clients all the time,
and have added about forty in the last few weeks.
 A couple weeks ago, the Library Master instance got into a state
where there were lots of tapes in RESERVED status when we did a 'q
mount'. There were still occasional mounts happening, but lots of
clients were in Media wait. We restarted the Library Master and the
problem went away, but then it came back like a week later.
 Now it is happening every day. Last night we stayed up all night
watching it, and at first could see just a couple of RESERVED tape
drives, and lots of normal mounts coming and going. Then slowly the
number of RESERVED ones would creep up over the course of an hour or two
until there were 80 or more in RESERVED status, and dozens of clients in
Media wait. Ordinarily virtual tape mounts take 2-4 seconds. Last
night during the problem they were taking 15-20 seconds. At about 1am
we restarted the Library Master, and the RESERVED drives went away, but
were back again within the hour.
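
To put the mount times above in perspective, here is a rough
back-of-the-envelope sketch. It assumes, purely for illustration, that
the Library Master works through mount requests one at a time; nothing
in this thread confirms that, and the function names are hypothetical.
The 2-4 second, 15-20 second, and 80-drive figures are the ones quoted
above.

```python
def mounts_per_hour(seconds_per_mount):
    """Mounts completed per hour if requests are served strictly in series."""
    return 3600 / seconds_per_mount


def backlog_minutes(pending_mounts, seconds_per_mount):
    """Minutes needed to clear a backlog of pending mounts, served in series."""
    return pending_mounts * seconds_per_mount / 60


# Normal mounts take 2-4 seconds; during the problem they took 15-20 seconds.
normal = (mounts_per_hour(4), mounts_per_hour(2))
degraded = (mounts_per_hour(20), mounts_per_hour(15))
print("normal mounts/hour:  ", normal)
print("degraded mounts/hour:", degraded)

# Time to work through 80 RESERVED drives at the slowest observed mount time.
print("backlog minutes:", round(backlog_minutes(80, 20), 1))
```

Under that assumption the slower mounts alone cut mount throughput by
roughly a factor of five or more, so a queue that is easily absorbed
during the day could plausibly keep growing under the nightly load.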
 One thing I noticed then was that the Library Master had over 300
sessions, all admin. Usually it has very few. Our MAXSESSIONS was set
to 500, so I wondered if perhaps we were overrunning it. We bumped it
up to 1000 on all instances. We restarted all TSM instances this time,
including the LAN-free one. (The LAN-free Windows server was hung,
although we don't know if this is coincidence, or