Re: Dwindling Performance

2004-01-15 Thread Ben Bullock
I ~LOVE~ to hear words like that  ;-)

BUY MORE MEMORY BUY MORE MEMORY... YOU ARE FEELING VERY
SLEEPY

Ben
(FYI, I work for Micron, a DRAM manufacturer) 
http://www.crucial.com or http://www.micron.com

Before I get inundated with "don't solicit" flame-mail, I'm just kidding
guys :-)


Re: Dwindling Performance

2004-01-15 Thread Chris Murphy


Memory is so cheap that you can afford to throw lots of it at it.

> I'm running TSM 5.1.8 on Win2K with 2G
>RAM.

Hi Dwight,

I wanted to throw out one note of caution.  I completely agree with Roger:
memory is too cheap to rob your system of it!  However, you said you are
running your TSM server on a Windows 2000 box.  Remember, Windows limits
each application's memory space to 2GB.  This can be increased to 3GB with a
startup option, at the expense of some operating system memory space (i.e.,
use it carefully!).  Thus, you cannot allocate all 2GB to bufferpool cache,
as TSM still needs additional space for object creation, I/O buffering, and
its own execution.  We have found that we could only safely allocate
512-768MB to the bufferpool.  If we increased it beyond that, TSM would dump
(go down HARD) when under load.  While our DB is rather small (20GB - 40%
utilized), we still get a cache hit rate of 99.9%.
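
To put rough numbers on that budget, here is a minimal Python sketch. It only does the subtraction described above; the function name and the 1024MB test size are illustrative assumptions, and the 512-768MB figures are our own empirical limits, not anything IBM documents.

```python
# Rough address-space budget for a TSM server process on Windows 2000.
# USER_SPACE_MB is the default per-application limit mentioned above;
# pass 3072 instead if the 3GB startup option is in use.
USER_SPACE_MB = 2048


def headroom_mb(bufpool_mb: int, user_space_mb: int = USER_SPACE_MB) -> int:
    """Address space left for everything other than the buffer pool."""
    if bufpool_mb < 0 or bufpool_mb > user_space_mb:
        raise ValueError("buffer pool must fit inside the user address space")
    return user_space_mb - bufpool_mb


if __name__ == "__main__":
    for size in (512, 768, 1024):  # 1024 is an illustrative larger setting
        print(f"bufpool {size}MB -> {headroom_mb(size)}MB for the rest of the server")
```

The point is only that every MB given to the bufferpool comes out of the same 2GB that TSM needs for its other work.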

Other options are: use a different OS, use an Itanium-class server, or talk
IBM into adding AWE support when they write/compile TSM for Windows (least
likely, I think <8) ).

HTH

Chris Murphy
IT Network Analyst
Idaho Dept. of Lands
(208) 334-0293


Re: Dwindling Performance

2004-01-15 Thread Rushforth, Tim
Note that with SELFTUNEBUFPOOLSIZE, there are some limits on the size of the
bufferpool.  It used to be 10% of real memory for Windows. (Which you are
not at if you have 2GB memory.)

Check for the following message after Expire inventory.
ANR0386I The BUFPoolsize has been changed to x

You get this if you have selftuning on.  But if it keeps on changing to the
current value then you are probably at the limit.

You then have to turn SELFTUNEBUFPOOLSIZE off and manually tune.

BTW, we have 2GB memory on a Windows 2000 server and set BUFPOOLSIZE to 1
GB.
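
If you save the activity log to a file, the "keeps changing to the same value" check could be scripted along these lines. This is a sketch, not a tested tool: the message text is taken from the line quoted above, the numeric format of the value is an assumption, and the sample log lines are made up for illustration.

```python
import re

# Message text as quoted above; the format of the value (plain digits,
# optionally with commas) is an assumption.
ANR0386I = re.compile(r"ANR0386I The BUFPoolsize has been changed to (\d[\d,]*)")


def bufpool_changes(actlog_lines):
    """Return the BUFPOOLSIZE values reported by ANR0386I, in log order."""
    values = []
    for line in actlog_lines:
        match = ANR0386I.search(line)
        if match:
            values.append(int(match.group(1).replace(",", "")))
    return values


def probably_at_limit(values, repeats=2):
    """True if the last `repeats` self-tuning changes report the same value."""
    if len(values) < repeats:
        return False
    return len(set(values[-repeats:])) == 1


def ten_percent_cap_kb(real_memory_mb):
    """The old Windows self-tuning ceiling: 10% of real memory, in KB."""
    return real_memory_mb * 1024 // 10


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "01/13/2004 19:37:58  ANR0386I The BUFPoolsize has been changed to 131072",
        "01/14/2004 19:58:20  ANR0386I The BUFPoolsize has been changed to 131072",
    ]
    print(probably_at_limit(bufpool_changes(sample)))
    print(ten_percent_cap_kb(2048))
```

With 2GB of real memory the 10% ceiling works out to about 205MB, so a bufferpool stuck below that is probably not limited by the ceiling itself.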


Re: Dwindling Performance

2004-01-15 Thread Roger Deschner
Oh, how I love quoting directly from IBM manuals:

"If the value falls below 98%, consider increasing the size of the
database buffer pool. For larger installations, performance could
improve significantly if your cache hit percentage is greater than 99%."

--Page 396, TSM V5.1 Administrators Guide for AIX

Memory is so cheap that you can afford to throw lots of it at it.

Roger Deschner  University of Illinois at Chicago [EMAIL PROTECTED]


Re: Dwindling Performance

2004-01-15 Thread Dwight McCann
Roger,

I really enjoyed and appreciated your response on this issue, but you've
got me wondering a bit.  I'm running TSM 5.1.8 on Win2K with 2 GB of
RAM.  I looked at my migration times, which are about 20 minutes (so
that's no problem) and then did the Q DB F=D and saw that my Cache Hit
was only 96.81.  You said that if it wasn't 99% it needed help.  I have
it autotuning and here are my outputs:

tsm: DATAPLUS_SERVER1>q actlog begind=-2 search='expiration'

Date/Time            Message
-------------------- ----------------------------------------------------------
01/13/2004 19:19:22  ANR0984I Process 177 for EXPIRATION started in the
                      BACKGROUND at 19:19:22.
01/13/2004 19:19:22  ANR0811I Inventory client file expiration started as
                      process 177.
01/13/2004 19:37:58  ANR0812I Inventory file expiration process 177 completed:
                      examined 437237 objects, deleting 87348 backup objects, 0
                      archive objects, 0 DB backup volumes, and 0 recovery plan
                      files. 0 errors were encountered.
01/13/2004 19:37:58  ANR0987I Process 177 for EXPIRATION running in the
                      BACKGROUND processed 87348 items with a completion state
                      of SUCCESS at 19:37:58.
01/14/2004 19:37:58  ANR0984I Process 182 for EXPIRATION started in the
                      BACKGROUND at 19:37:58.
01/14/2004 19:37:58  ANR0811I Inventory client file expiration started as
                      process 182.
01/14/2004 19:58:20  ANR0812I Inventory file expiration process 182 completed:
                      examined 433897 objects, deleting 104134 backup objects,
                      0 archive objects, 0 DB backup volumes, and 0 recovery
                      plan files. 0 errors were encountered.
01/14/2004 19:58:20  ANR0987I Process 182 for EXPIRATION running in the
                      BACKGROUND processed 104134 items with a completion state
                      of SUCCESS at 19:58:20.
01/15/2004 09:43:06  ANR2017I Administrator DWIGHT issued command: QUERY ACTLOG
                      begind=-2 search=expiration

tsm: DATAPLUS_SERVER1>q db f=d

          Available Space (MB): 30,000
        Assigned Capacity (MB): 25,292
        Maximum Extension (MB): 4,708
        Maximum Reduction (MB): 19,564
             Page Size (bytes): 4,096
            Total Usable Pages: 6,474,752
                    Used Pages: 1,442,366
                      Pct Util: 22.3
                 Max. Pct Util: 22.7
              Physical Volumes: 2
             Buffer Pool Pages: 32,768
         Total Buffer Requests: 28,394,716
                Cache Hit Pct.: 96.81
               Cache Wait Pct.: 0.00
           Backup in Progress?: No
    Type of Backup In Progress:
  Incrementals Since Last Full: 4
Changed Since Last Backup (MB): 0.98
            Percentage Changed: 0.02
Last Complete Backup Date/Time: 01/15/2004 05:45:26

Is there something I should do to get the cache hits up to 99% even
though I don't yet have a performance issue?  I realize that my
environment is quite small but it is about to double and I'd rather be
proactive.  TIA,
--
Dwight McCann
Computer and Network Technologist, UCSB Info Systems & Computing
[EMAIL PROTECTED], [EMAIL PROTECTED]
http://borg.isc.ucsb.edu/dmm/  - office: 805-893-3113
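
For anyone following along, the Q DB F=D figures above translate into a buffer pool size and a miss count with a little arithmetic. The Python below is a sketch using only the numbers from that output; the function names are made up for illustration, and the miss count is an estimate because the hit percentage is rounded to two decimals.

```python
def bufpool_mb(buffer_pool_pages: int, page_size_bytes: int) -> float:
    """Buffer pool size in MB: pages times page size."""
    return buffer_pool_pages * page_size_bytes / 2**20


def estimated_misses(total_requests: int, cache_hit_pct: float) -> int:
    """Approximate number of buffer requests that were not served from cache."""
    return round(total_requests * (1 - cache_hit_pct / 100))


if __name__ == "__main__":
    # Values copied from the Q DB F=D output above.
    print(bufpool_mb(32_768, 4_096))            # buffer pool size in MB
    print(estimated_misses(28_394_716, 96.81))  # requests that missed the cache
    print(estimated_misses(28_394_716, 99.0))   # the same load at a 99% hit rate
```

So the self-tuned pool here is 128 MB on a 2 GB box, and moving from 96.81% to 99% would cut the misses to roughly a third for the same number of requests.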


Re: Dwindling Performance

2004-01-14 Thread Ben Bullock
Hmmm, interesting that the expire inventory grinds to a halt
during incremental backups. My setup is AIX similar to yours (host, DB
size) although my disks are on locally attached SSA drives. I recently
upgraded my 8 TSM servers from TSM 5.1.1.0 to 5.2.1.3 (mainly to get the
NDMP file-level backups, finally).

 On one of them I saw the same issue. They are all set up almost
identically, so why 1 would misbehave is a mystery to me. To fix the
immediate problem, I put a " duration=" on the expire inventory job so
that it would only run during the day when backups are less likely.
Sure, the expire inventory now takes 2 days to run, but it's better than
having all the backups go extremely slow and not complete.

I then started to look into the performance issues. Some of the
things I have done:
- Changed the DB volumes from JFS to raw (that made a very good
  improvement).
- Turned on the SSA fast-write cache on the DB volumes.
- Tried out these settings for vmtune (gleaned from this listserv):
    /usr/samples/kernel/vmtune -t10 -P10 -p5 -s1 -W16 -c8 -R256 -F512 -u25 -b2200 -B2200

All of these changes have improved the speed of the expire
inventory, but to be honest I haven't tried to run the expire inventory
during the incremental backups since. Once bitten twice shy, and I can
live with the expire inventory taking 2 days to complete.

That's kind of where I am now. No solid solution, but improved
performance enough that it's workable now.

I'd love to hear what other changes you make to resolve your
situation.

Ben
Micron Technology Inc.
Boise, Id 


Re: Dwindling Performance

2004-01-14 Thread Scott, Brian
Andy,

Have you thought about using the TSM Journal Service? If you're building a
ton of directories/files but not backing up much, the journal will cut down
on the processing and keep your sessions down to a minimum.

Just a thought...

Brian Scott
EDS - EOGDE
GM Distributed Management Systems Engineering
MS 3234
750 Tower Drive
Troy, MI  48098

* phone: +01-248-265-4596 (8-365)
* mailto:[EMAIL PROTECTED]




Re: Dwindling Performance

2004-01-14 Thread Andy Carlson
I'll supply as much information as anyone wants if they have any clues
about what could be going on.  I will take a look at the current
performance book, but I have looked at it in the past.


Andy Carlson|\  _,,,---,,_
Senior Technical Specialist   ZZZzz /,`.-'`'-.  ;-;;,_
BJC Health Care|,4-  ) )-,_. ,\ (  `'-'
St. Louis, Missouri   '---''(_/--'  `-'\_)
Cat Pics: http://andyc.dyndns.org/animal.html


Re: Dwindling Performance

2004-01-14 Thread Andy Carlson
Thanks for the quick response.

Expiration is not finishing.  Before the main backups start, it maybe
expires 20 objects, but during the backup window it slows to a
crawl.  It picks up some during the day when the backups and migrations
are running, but since we now have some 100 sessions not finished, it's
slow then too.

I didn't look at randomize, but these sessions are staying out there for
hours.  I will take a look at that today.

I currently have them doing an incrbydate every other day, and a full
incr on the other days.

The cache hit ratio of the database is about 98.5%, but we have about
3.5GB of memory in the cache.  I don't think I can go much higher, but
I will try it if I can.

P.S.  The TSMI clients are Windows and Netware, the TSMU are Unix and a
couple of VMS.

Thanks for the input.


Andy Carlson|\  _,,,---,,_
Senior Technical Specialist   ZZZzz /,`.-'`'-.  ;-;;,_
BJC Health Care|,4-  ) )-,_. ,\ (  `'-'
St. Louis, Missouri   '---''(_/--'  `-'\_)
Cat Pics: http://andyc.dyndns.org/animal.html


Re: Dwindling Performance

2004-01-14 Thread Richard Sims
>We are having terrible performance with one of our instances of TSM. ...

Andy - We'd like to help, but would need a lot more information about the
context of the issue, and in particular what you've already investigated.

I'd refer you first to the TSM Performance Tuning Guide at
 http://publib.boulder.ibm.com/tividd/td/IBMStorageManagerMessages5.2.2.html
I also have performance issue summaries in ADSM QuickFacts, compiled from
our community experiences over the years.

  Richard Sims, BU


Re: Dwindling Performance

2004-01-14 Thread Joe Howell
Andy - what kind of clients do you have?  I've had problems with Windows clients all 
trying to do systemobject backups simultaneously and had to reorganize my nightly work 
to split out the systemobject backups from the regular files and then to serialize 
them because of contention problems with the TSM database.  It's supposed to be better 
in 5.2 but I'm not there yet.

Also, you may be right about the directory trees; if your stalls seem to happen at the 
beginning of the backups you can look at the cpu being used by the TSM BA client 
process and see if it's churning away.  If so, that may be what's happening.

Joe Howell
Shelter Insurance Companies
Columbia, MO


Re: Dwindling Performance

2004-01-14 Thread Roger Deschner
I have posted many times in the past saying you should never do a
database unload/reload to gain performance. But this just might be the
one case where it might make sense - the remaining half of a split
server. But before you do something that drastic, dangerous, and
time-consuming, look for the things that are easier to fix.

My basic metric of whether or not you are in trouble is, how long does
expiration take? If you start it daily, the closer it is to 24 hours
running time, the closer you are to doomsday. Never-ending expiration is
the classic symptom of TSM Server Meltdown.
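
As a rough way to track that metric, the start and completion timestamps that the activity log prints for an expiration process (the ANR0984I and ANR0987I lines) can be turned into a running time and a share of the day. This is only a sketch: the function names are invented, and the sample timestamps are illustrative values in the MM/DD/YYYY HH:MM:SS format the log uses.

```python
from datetime import datetime

ACTLOG_FORMAT = "%m/%d/%Y %H:%M:%S"  # timestamp layout used by Q ACTLOG


def expiration_hours(started: str, finished: str) -> float:
    """Running time of one expiration process, in hours."""
    start = datetime.strptime(started, ACTLOG_FORMAT)
    end = datetime.strptime(finished, ACTLOG_FORMAT)
    return (end - start).total_seconds() / 3600


def share_of_day(hours: float) -> float:
    """Fraction of a 24-hour day consumed; close to 1.0 means trouble."""
    return hours / 24


if __name__ == "__main__":
    # Illustrative timestamps: started (ANR0984I) and completed (ANR0987I).
    hours = expiration_hours("01/14/2004 19:37:58", "01/14/2004 19:58:20")
    print(f"{hours:.2f} hours, {share_of_day(hours):.1%} of the day")
```

A run of twenty minutes or so barely registers; one that creeps toward 24 hours is the doomsday case.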

But on the other hand, if your expiration runs nice and fast, your
server and its database are probably OK. Look to clients as the problem.
They can't all squeeze in the door at once, so don't let them try. If
they use the client-polling scheduler, how long is the backup window,
and what is your setting for Schedule Randomization Percentage? Make it
as high as possible - SET RANDOMIZE 50. This will also help if you are
having any kind of a network bottleneck.
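
To picture what randomization buys you, here is a small Python simulation. It assumes my understanding of the setting, namely that client start times are spread across the first N percent of the startup window and that 50 is the maximum; the function name, the uniform distribution, and the two-hour window are assumptions for illustration, not a description of the server's actual algorithm.

```python
import random


def randomized_starts(n_clients, window_minutes, randomize_pct, seed=0):
    """Simulated start offsets (minutes into the window) for each client."""
    if not 0 <= randomize_pct <= 50:
        raise ValueError("randomize percentage is assumed to range from 0 to 50")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    spread = window_minutes * randomize_pct / 100
    return sorted(rng.uniform(0, spread) for _ in range(n_clients))


if __name__ == "__main__":
    # 233 sessions as in the original post; the 120-minute window is hypothetical.
    for pct in (0, 25, 50):
        starts = randomized_starts(233, 120, pct)
        print(f"RANDOMIZE {pct}: last client starts {starts[-1]:.0f} minutes in")
```

At 0 everyone arrives at the door in the same minute; at 50 the same clients are spread over the first hour of a two-hour window.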

Look at these clients on a micro level. About how much are they each
actually backing up? If it's not much, then your theory might be
right, that they are very busy downloading their lists of backed up
files. In that case, load spreading will be the best thing you could
do. You might consider a schedule where not every client does a full
"Incremental" every night - perhaps they only do one every other night
and on the other nights they do an "incrbydate" backup which is much
faster, because it goes only by the timestamps in the file system.
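
The alternating schedule can be sketched as a simple rule. The parity test below is a made-up convention for illustration; a real setup would express this as two client schedules on the server, and nothing here is TSM's own logic.

```python
from datetime import date, timedelta


def backup_mode(day: date) -> str:
    """Full incremental on even-numbered days, incrbydate on odd ones.

    Uses the day's ordinal number so the pattern keeps alternating
    across month boundaries.
    """
    return "incremental" if day.toordinal() % 2 == 0 else "incrbydate"


if __name__ == "__main__":
    first = date(2004, 1, 14)
    for offset in range(4):
        day = first + timedelta(days=offset)
        print(day.isoformat(), backup_mode(day))
```

Splitting the clients so that half use the opposite parity would halve the number of full incrementals on any given night.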

Not to ask the obvious, but what's your Database Cache Hit Percentage?
(Q DB F=D) If it's below 99%, it needs help. Even (especially) a badly
fragmented database will run a lot faster if you have it swimming in
cache.

Look at other differences between your two instances - are they
basically different types of clients?

Roger Deschner  University of Illinois at Chicago [EMAIL PROTECTED]
The short fortuneteller who escaped from prison=
==was a small medium at large.==



On Tue, 13 Jan 2004, Andy Carlson wrote:

>We are having terrible performance with one of our instances of TSM.  I
>have suspicions, but I want to hear what you guys say.  Here is what we
>have:
>
>2 instances of TSM - TSMI and TSMU (TSMI is the problem)
>
>TSM 5.2.1.1
>AIX 51.ML4
>RS/6000 P670 - 8 processors, 16GB memory
>Fastt700 SAN
>STK9840 Tape drives
>
>The Database is 85% of 88GB (with room to expand another 50GB or so).
>
>Right at this moment, we have 233 sessions with TSMI.  The backup
>sessions grind to a halt for hours at a time, with nothing apparently
>happening.  I suspect that the directory trees are being downloaded and
>built, but I'm not sure.
>
>When we split TSMI and TSMU, we created the TSMU instance, and did a
>full backup on all the servers that moved there.  The TSMI database is a
>restored copy of the original database, with the TSMU stuff deleted out.
>
>Any ideas would be greatly appreciated.
>
>
>Andy Carlson|\  _,,,---,,_
>Senior Technical Specialist   ZZZzz /,`.-'`'-.  ;-;;,_
>BJC Health Care|,4-  ) )-,_. ,\ (  `'-'
>St. Louis, Missouri   '---''(_/--'  `-'\_)
>Cat Pics: http://andyc.dyndns.org/animal.html
>