Re: AIX TSM performance improvement (was Re: OS390 TSM Performance questions.)

2003-02-17 Thread Roger Deschner
onday! You've got something to play with for the whole day :-)
>
>Zlatko Krastev
>IT Consultant
>
>P.S. I charge my customers for such advice, but hopefully I can get a
>beer (or Swiss chocolate) for this one :-)
>
>
>
>
>
>
>
>PAC Brion Arnaud <[EMAIL PROTECTED]>
>Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
>14.02.2003 16:08
>Please respond to "ADSM: Dist Stor Manager"
>
>
>To: [EMAIL PROTECTED]
>cc:
>Subject:Re: OS390 TSM Performance questions.
>
>
>Hi Rodney,
>
>The big picture :
>System : aix 4.3.3.0 on a 6h1 machine, 2Gb memory, 2 cpu
>Vmtune settings : -p10 -P40 -R 256 -F376 -W256 -s1
>TSM version : 4.2.3.1
>Bufpoolsize : 524288
>Iostat:
>tty:      tin     tout    avg-cpu:  % user   % sys   % idle   % iowait
>          0.0     25.9              55.0     13.0    10.9     21.1
>
>Vmstat :
>kthr     memory              page                      faults        cpu
>----- ------------- ------------------------- ------------- -----------
> r  b    avm    fre  re  pi  po    fr    sr  cy    in    sy   cs us sy id wa
> 3  3 288460 112615   0   0   0  1220  2288   0  2034  3080  369 55 13 11 21
>
>>q dbvol
>
>Volume Name          Copy     Volume Name            Copy     Volume Name   Copy
>(Copy 1)             Status   (Copy 2)               Status   (Copy 3)      Status
>------------------   ------   --------------------   ------   -----------   ------
>/tsmdb/db01.dsm      Sync'd   /tsmdb_m/db01.dsm      Sync'd                 Undefined
>/tsmdb/db02.dsm      Sync'd   /tsmdb_m/db02.dsm      Sync'd                 Undefined
>/tsmdb/db03.dsm      Sync'd   /tsmdb_m/db03.dsm      Sync'd                 Undefined
>/tsmdb/db04.dsm      Sync'd   /tsmdb_m/db04.dsm      Sync'd                 Undefined
>/tsmdb/db05.dsm      Sync'd   /tsmdb_m/db05.dsm      Sync'd                 Undefined
>/tsmdb/db06.dsm      Sync'd   /tsmdb_m/db06.dsm      Sync'd                 Undefined
>/tsmdb/db07.dsm      Sync'd   /tsmdb_m/db07.dsm      Sync'd                 Undefined
>/tsmdb/db08.dsm      Sync'd   /tsmdb_m/db08.dsm      Sync'd                 Undefined
>
>>q logvol
>
>Volume Name          Copy     Volume Name            Copy     Volume Name   Copy
>(Copy 1)             Status   (Copy 2)               Status   (Copy 3)      Status
>------------------   ------   --------------------   ------   -----------   ------
>/tsmlog2/log02.dsm   Sync'd   /tsmlog_m/log02.dsm    Sync'd                 Undefined
>/tsmlog2/log01.dsm   Sync'd   /tsmlog_m/log01.dsm    Sync'd                 Undefined
>/tsmlog2/log03.dsm   Sync'd   /tsmlog_m/log03.dsm    Sync'd                 Undefined
>/tsmlog2/log04.dsm   Sync'd   /tsmlog_m/log04.dsm    Sync'd                 Undefined
>/tsmlog2/log05.dsm   Sync'd                          Undefined              Undefined
>/tsmlog2/log06.dsm   Sync'd                          Undefined              Undefined
>
>Some info about disk layout :
>
>tsmvg:
>LV NAME       TYPE     LPs   PPs   PVs   LV STATE     MOUNT POINT
>loglv00       jfslog     1     2     2   open/syncd   N/A
>lvtsmdb       jfs      128   256     8   open/syncd   /tsmdb
>lvtsmdb1      jfs      128   256     8   open/syncd   /tsmdb1
>tsmvg_log_m:
>LV NAME       TYPE     LPs   PPs   PVs   LV STATE     MOUNT POINT
>lvtsmlog_m    jfs       96    96     3   open/syncd   /tsmlog_m
>loglv05       jfslog     1     1     1   open/syncd   N/A
>tsmvg_m:
>LV NAME       TYPE     LPs   PPs   PVs   LV STATE     MOUNT POINT
>lvtsmdb_m     jfs      128   128     4   open/syncd   /tsmdb_m
>loglv01       jfslog     1     2     2   open/syncd   N/A
>tsmvg_log:
>LV NAME       TYPE     LPs   PPs   PVs   LV STATE     MOUNT POINT
>lvtsmlog2     jfs       96   192     6   open/syncd   /tsmlog2
>loglv04       jfslog     1     2     2   open/syncd   N/A
>
>All volumes for TSM db and logs are striped.
>What I'm experiencing is very high CPU usage (min. 50%, max. 99%) and
>paging as soon as backups or expire inventory are started. Also, the Cache
>Hit Pct is low (98.69) although I increased bufpoolsize from 151552 to
>524288 (which is where my performance problems began)! Expire inventory needs
>more or less 20 hours to explor

Re: AIX TSM performance improvement (was Re: OS390 TSM Performance questions.)

2003-02-17 Thread PAC Brion Arnaud
To all who responded to my request:

I'm really grateful for the help you provided; hopefully I'll manage to
use your precious advice properly and thus stop whining about my weak
system ;-)
What still amazes me is that our TSM server ran flawlessly with a
"faulty" configuration for such a long time, until it reached the
20-GB-DB-size zone, where it began coughing and slowing down like an
elderly person.
Anyway, thanks again !

Arnaud (soon exploring the depths and joys of LVM, thanks to Zlatko ;-))

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Arnaud Brion, Panalpina Management Ltd., IT Group |
| Viaduktstrasse 42, P.O. Box, 4002 Basel - Switzerland |
| Phone: +41 61 226 19 78 / Fax: +41 61 226 17 01   | 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



-Original Message-
From: Zlatko Krastev/ACIT [mailto:[EMAIL PROTECTED]] 
Sent: Monday, 17 February, 2003 2:52
To: [EMAIL PROTECTED]
Subject: AIX TSM performance improvement (was Re: OS390 TSM Performance
questions.)


Arnaud,

--> 6h1 machine, 2Gb memory ...
--> ... I increased bufpoolsize from 151552 to 524288 (where my 
--> performance
problems began)!

512k pages, 4 kB each equals 2 GB. So you left no space for AIX, file
buffers, TSM code, etc. Consequences: excessive paging and performance
degradation.

--> Vmtune settings : -p10 -P40

This means file buffers will occupy 10-40% of the real memory. Call it
50% once the AIX kernel, TCP/IP buffers and TSM code are counted in. The
result: TSM data structures (mainly the DB bufferpool and log bufferpool)
should not exceed 1 GB. LOWER the bufpoolsize to 262144 !!!

--> (Copy 1)  Status  (Copy 2)
--> /tsmdb/db01.dsm   Sync'd  /tsmdb_m/db01.d
+
--> LV NAME TYPELPs PPs PVs LV STATEMOUNT
--> lvtsmdb jfs 128 256 8   open/syncd  /tsmdb
--> lvtsmdb_m   jfs 128 128 4   open/syncd  /tsmdb_m
+
--> /tsmdb/db01.dsm ... /tsmdb/db08.dsm

Rodney already pointed out that you have excessive mirroring. Actually it
is not 4-way but 2+1-way (the secondary copies are not AIX-mirrored).
However, the result is the same - both AIX and TSM mirroring are applied
sequentially, introducing the sum of all consistency delays (you have not
shown the "mirrorread db", "mirrorread log", "mirrorwrite db" and
"mirrorwrite log" options from dsmserv.opt, so I am assuming the defaults
are used).

The discussion of raw LVs vs. filesystem dbvols has always been short - the
performance benefit on AIX is small (on Solaris it is much higher). However,
it is worth trying raw LVs and further reducing file buffering with
"vmtune -p 5 -P 10" (as Rodney already suggested). This may also allow
raising bufpoolsize to 65-75% of the RAM.

The discussion of how many dbvols per HDD has come up several times on
this list. I am personally in the group of believers that a single dbvol
per HDD is better. The argument is simple - TSM's attempt to
"parallelize" the load over many dbvols results in disk-head thrashing.
The example in your case:
-   8 dbvols within a single filesystem;
-   the filesystem is on 4x2 disks (32 PPs each);
-   suppose TSM has to write 16 pages;
1. dbvol1 occupies PPs 1 through 4 on each disk, dbvol2 is on PPs 5-8, etc.
2. the TSM server's attempt to "parallelize" will write page 1 on dbvol1,
2 on dbvol2, ..., 8 on dbvol8, 9 again on dbvol1, etc.
3. the result will be writes of pages 1&9 to PP 1 (for dbvol1), pages 2&10
to PP 5 (for dbvol2), ..., and pages 8&16 to PP 29 (for dbvol8). ALL OF
THOSE on HDD1 !!! What happened to the parallelism :-((( No wonder the
disk has to move its heads back and forth.

The medicine (all can be done under load and without restarting AIX
and/or TSM; however, some performance impact should be expected):
1. eliminate AIX mirroring (use the rmlvcopy command)
2. rearrange the tsmvg and free 4 disks (migratepv and reducevg)
3. create a new VG from those 4 disks using a smaller PP size (more PPs
per PV)
4. create *separate* jfs logs on each disk (mklv -y <loglvname> -t jfslog
<vgname> 1 <hdisk#>)
5. initialize each log (logform /dev/<loglvname>)
6. create the filesystem LVs (mklv -y <lvname> <vgname> XYZ <hdisk#>)
7. create each filesystem with its *own* log (crfs -v jfs -d /dev/<lvname>
-m /tsm/dbN -A yes -a logname=/dev/<loglvname>). Note the "-a logname"
option of the AIX crfs/chfs commands.
8. define a *single* big volume on each filesystem
9. if the volumes created in step 8 can be made the same size as the
existing ones, define the new volumes as a third copy and delete the first
copy. If not, dbvol delete/migrate ought to be used.
10. delete the rest of tsmvg and mirror using only one method (use
mklvcopy for AIX mirroring; use extendvg, repeat steps 4-8 and "def dbc"
for TSM mirroring).
11. repeat steps 1-10 for the TSM log
12. get rid of tsmvg_m and tsmvg_log_m. Use the disks for the diskpool or
add them as a third copy within *the same* mirroring scheme (both AIX and
TSM use LVM, which allows three copies).

Re: AIX TSM performance improvement (was Re: OS390 TSM Performance questions.)

2003-02-17 Thread PAC Brion Arnaud
Zlatko,

You said :
>> P.S. I charge my customers for such advice, but hopefully I can
get a beer (or Swiss chocolate) for this one :-)

I promise to send you that chocolate (which kind do you prefer: black,
white, milk, bitter? Just ask, we have lots of varieties) as soon as my
system is fine again!
For that, though, you'll have to give me a snail-mail address I can ship
it to ...

I sincerely appreciate your help, and would really be glad if a person
like you could be our TSM consultant! Unfortunately, none of the ones I
met here in Switzerland had half your skills, and nobody pointed out the
messy, faulty, or call-it-whatever-you-will disk configuration!
Best regards.

Arnaud

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Arnaud Brion, Panalpina Management Ltd., IT Group |
| Viaduktstrasse 42, P.O. Box, 4002 Basel - Switzerland |
| Phone: +41 61 226 19 78 / Fax: +41 61 226 17 01   | 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



-Original Message-
From: Zlatko Krastev/ACIT [mailto:[EMAIL PROTECTED]] 
Sent: Monday, 17 February, 2003 2:52
To: [EMAIL PROTECTED]
Subject: AIX TSM performance improvement (was Re: OS390 TSM Performance
questions.)


Arnaud,

--> 6h1 machine, 2Gb memory ...
--> ... I increased bufpoolsize from 151552 to 524288 (where my 
--> performance
problems began)!

512k pages, 4 kB each equals 2 GB. So you left no space for AIX, file
buffers, TSM code, etc. Consequences: excessive paging and performance
degradation.

--> Vmtune settings : -p10 -P40

This means file buffers will occupy 10-40% of the real memory. Call it
50% once the AIX kernel, TCP/IP buffers and TSM code are counted in. The
result: TSM data structures (mainly the DB bufferpool and log bufferpool)
should not exceed 1 GB. LOWER the bufpoolsize to 262144 !!!

--> (Copy 1)  Status  (Copy 2)
--> /tsmdb/db01.dsm   Sync'd  /tsmdb_m/db01.d
+
--> LV NAME TYPELPs PPs PVs LV STATEMOUNT
--> lvtsmdb jfs 128 256 8   open/syncd  /tsmdb
--> lvtsmdb_m   jfs 128 128 4   open/syncd  /tsmdb_m
+
--> /tsmdb/db01.dsm ... /tsmdb/db08.dsm

Rodney already pointed out that you have excessive mirroring. Actually it
is not 4-way but 2+1-way (the secondary copies are not AIX-mirrored).
However, the result is the same - both AIX and TSM mirroring are applied
sequentially, introducing the sum of all consistency delays (you have not
shown the "mirrorread db", "mirrorread log", "mirrorwrite db" and
"mirrorwrite log" options from dsmserv.opt, so I am assuming the defaults
are used).

The discussion of raw LVs vs. filesystem dbvols has always been short - the
performance benefit on AIX is small (on Solaris it is much higher). However,
it is worth trying raw LVs and further reducing file buffering with
"vmtune -p 5 -P 10" (as Rodney already suggested). This may also allow
raising bufpoolsize to 65-75% of the RAM.

The discussion of how many dbvols per HDD has come up several times on
this list. I am personally in the group of believers that a single dbvol
per HDD is better. The argument is simple - TSM's attempt to
"parallelize" the load over many dbvols results in disk-head thrashing.
The example in your case:
-   8 dbvols within a single filesystem;
-   the filesystem is on 4x2 disks (32 PPs each);
-   suppose TSM has to write 16 pages;
1. dbvol1 occupies PPs 1 through 4 on each disk, dbvol2 is on PPs 5-8, etc.
2. the TSM server's attempt to "parallelize" will write page 1 on dbvol1,
2 on dbvol2, ..., 8 on dbvol8, 9 again on dbvol1, etc.
3. the result will be writes of pages 1&9 to PP 1 (for dbvol1), pages 2&10
to PP 5 (for dbvol2), ..., and pages 8&16 to PP 29 (for dbvol8). ALL OF
THOSE on HDD1 !!! What happened to the parallelism :-((( No wonder the
disk has to move its heads back and forth.

The medicine (all can be done under load and without restarting AIX
and/or TSM; however, some performance impact should be expected):
1. eliminate AIX mirroring (use the rmlvcopy command)
2. rearrange the tsmvg and free 4 disks (migratepv and reducevg)
3. create a new VG from those 4 disks using a smaller PP size (more PPs
per PV)
4. create *separate* jfs logs on each disk (mklv -y <loglvname> -t jfslog
<vgname> 1 <hdisk#>)
5. initialize each log (logform /dev/<loglvname>)
6. create the filesystem LVs (mklv -y <lvname> <vgname> XYZ <hdisk#>)
7. create each filesystem with its *own* log (crfs -v jfs -d /dev/<lvname>
-m /tsm/dbN -A yes -a logname=/dev/<loglvname>). Note the "-a logname"
option of the AIX crfs/chfs commands.
8. define a *single* big volume on each filesystem
9. if the volumes created in step 8 can be made the same size as the
existing ones, define the new volumes as a third copy and delete the first
copy. If not, dbvol delete/migrate ought to be used.
10. delete the rest of tsmvg and mirror using only one method (use
mklvcopy for AIX mirroring; use extendvg, repeat steps 4-8 and "def dbc"
for TSM mirroring).
11. repeat steps 1-10 for the TSM log

Re: AIX TSM performance improvement (was Re: OS390 TSM Performance questions.)

2003-02-16 Thread Zlatko Krastev/ACIT
Sorry, fingers were faster than brain.

The bufpoolsize option is in kB, while the "Buffer Pool Pages" output of
"q db f=d" is in 4 kB pages. Therefore the DB bufferpool might be occupying
only 25% of the real memory, plus up to 40% for file buffers ==> memory
contention might not be taking place.
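
For illustration, the arithmetic behind the correction (still assuming
bufpoolsize really is interpreted in kB):

   bufpoolsize 524288          = 524288 kB = 512 MB  (25% of the 2 GB real memory)
   "Buffer Pool Pages"         = 512 MB / 4 kB = 131072 pages in "q db f=d"
   earlier (mistaken) reading  = 524288 pages x 4 kB = 2 GB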

Zlatko Krastev
IT Consultant






Zlatko Krastev/ACIT <[EMAIL PROTECTED]>
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
17.02.2003 03:51
Please respond to "ADSM: Dist Stor Manager"


To: [EMAIL PROTECTED]
cc:
Subject:    AIX TSM performance improvement (was Re: OS390 TSM Performance 
questions.)


Arnaud,

--> 6h1 machine, 2Gb memory ...
--> ... I increased bufpoolsize from 151552 to 524288 (where my
performance
problems began)!

512k pages, 4 kB each equals 2 GB. So you left no space for AIX, file
buffers, TSM code, etc. Consequences: excessive paging and performance
degradation.

...



AIX TSM performance improvement (was Re: OS390 TSM Performance questions.)

2003-02-16 Thread Zlatko Krastev/ACIT
Arnaud,

--> 6h1 machine, 2Gb memory ...
--> ... I increased bufpoolsize from 151552 to 524288 (where my performance
problems began)!

512k pages, 4 kB each equals 2 GB. So you left no space for AIX, file
buffers, TSM code, etc. Consequences: excessive paging and performance
degradation.

--> Vmtune settings : -p10 -P40

This means file buffers will occupy 10-40% of the real memory. Call it
50% once the AIX kernel, TCP/IP buffers and TSM code are counted in. The
result: TSM data structures (mainly the DB bufferpool and log bufferpool)
should not exceed 1 GB. LOWER the bufpoolsize to 262144 !!!
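
For illustration only - a sketch, not a tested recipe (check the exact option
and command syntax on your server level) - lowering it could look like this:

* in dsmserv.opt:
BUFPoolsize 262144

* or on the fly from a dsmadmc session, if your level supports it:
setopt bufpoolsize 262144
q opt bufpoolsize
q db f=d            (watch "Cache Hit Pct." and "Buffer Pool Pages")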

--> (Copy 1)  Status  (Copy 2)
--> /tsmdb/db01.dsm   Sync'd  /tsmdb_m/db01.d
+
--> LV NAME TYPELPs PPs PVs LV STATEMOUNT
--> lvtsmdb jfs 128 256 8   open/syncd  /tsmdb
--> lvtsmdb_m   jfs 128 128 4   open/syncd  /tsmdb_m
+
--> /tsmdb/db01.dsm ... /tsmdb/db08.dsm

Rodney already pointed out that you have excessive mirroring. Actually it
is not 4-way but 2+1-way (the secondary copies are not AIX-mirrored).
However, the result is the same - both AIX and TSM mirroring are applied
sequentially, introducing the sum of all consistency delays (you have not
shown the "mirrorread db", "mirrorread log", "mirrorwrite db" and
"mirrorwrite log" options from dsmserv.opt, so I am assuming the defaults
are used).

The discussion of raw LVs vs. filesystem dbvols has always been short - the
performance benefit on AIX is small (on Solaris it is much higher). However,
it is worth trying raw LVs and further reducing file buffering with
"vmtune -p 5 -P 10" (as Rodney already suggested). This may also allow
raising bufpoolsize to 65-75% of the RAM.
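
A minimal sketch of that tuning (the path below is the usual AIX 4.3 location
of the sample tools; note vmtune settings do not survive a reboot, so they are
typically re-applied from an rc script or /etc/inittab):

# shrink the JFS file-cache share so more real memory stays with dsmserv
/usr/samples/kernel/vmtune -p 5 -P 10
# confirm the new minperm/maxperm percentages
/usr/samples/kernel/vmtune | grep -i perm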

The discussion of how many dbvols per HDD has come up several times on this
list. I am personally in the group of believers that a single dbvol per HDD
is better. The argument is simple - TSM's attempt to "parallelize" the load
over many dbvols results in disk-head thrashing. The example in your case:
-   8 dbvols within a single filesystem;
-   the filesystem is on 4x2 disks (32 PPs each);
-   suppose TSM has to write 16 pages;
1. dbvol1 occupies PPs 1 through 4 on each disk, dbvol2 is on PPs 5-8,
etc.
2. the TSM server's attempt to "parallelize" will write page 1 on dbvol1, 2 on
dbvol2, ..., 8 on dbvol8, 9 again on dbvol1, etc.
3. the result will be writes of pages 1&9 to PP 1 (for dbvol1), pages 2&10
to PP 5 (for dbvol2), ..., and pages 8&16 to PP 29 (for dbvol8). ALL OF THOSE
on HDD1 !!! What happened to the parallelism :-((( No wonder the disk has to
move its heads back and forth.
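
A toy sketch of that arithmetic (pure illustration, assuming the layout above -
8 dbvols, each starting 4 PPs apart on the same disk - and simple round-robin
page placement):

page=1
while [ $page -le 16 ]; do
  dbvol=$(( (page - 1) % 8 + 1 ))   # pages round-robin over the 8 dbvols
  pp=$(( (dbvol - 1) * 4 + 1 ))     # first PP of that dbvol on hdisk1
  echo "page $page -> dbvol$dbvol, PP $pp on hdisk1"
  page=$(( page + 1 ))
done
# all 16 writes land on hdisk1, just at different PPs - hence the head seeks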

The medicine (all can be done under load and without restarting AIX and/or
TSM; however, some performance impact should be expected):
1. eliminate AIX mirroring (use the rmlvcopy command)
2. rearrange the tsmvg and free 4 disks (migratepv and reducevg)
3. create a new VG from those 4 disks using a smaller PP size (more PPs per
PV)
4. create *separate* jfs logs on each disk (mklv -y <loglvname> -t jfslog
<vgname> 1 <hdisk#>)
5. initialize each log (logform /dev/<loglvname>)
6. create the filesystem LVs (mklv -y <lvname> <vgname> XYZ <hdisk#>)
7. create each filesystem with its *own* log (crfs -v jfs -d /dev/<lvname> -m
/tsm/dbN -A yes -a logname=/dev/<loglvname>). Note the "-a logname" option
of the AIX crfs/chfs commands.
8. define a *single* big volume on each filesystem
9. if the volumes created in step 8 can be made the same size as the existing
ones, define the new volumes as a third copy and delete the first copy. If
not, dbvol delete/migrate ought to be used.
10. delete the rest of tsmvg and mirror using only one method (use
mklvcopy for AIX mirroring; use extendvg, repeat steps 4-8 and "def dbc"
for TSM mirroring).
11. repeat steps 1-10 for the TSM log
12. get rid of tsmvg_m and tsmvg_log_m. Use the disks for the diskpool or add
them as a third copy within *the same* mirroring scheme (both AIX and TSM
use LVM, which allows three copies).
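
To make steps 4-8 concrete, a hedged sketch for one disk (all names - newtsmvg,
hdisk10, tsmdblog10, tsmdblv10, /tsm/db10 - and all sizes are invented for
illustration; adapt them to your VG):

# 4. a dedicated jfslog on the disk
mklv -y tsmdblog10 -t jfslog newtsmvg 1 hdisk10
# 5. format that log
logform /dev/tsmdblog10
# 6. the LV that will hold the filesystem (e.g. 100 PPs on the same disk)
mklv -y tsmdblv10 newtsmvg 100 hdisk10
# 7. a filesystem that uses its *own* log, then mount it
crfs -v jfs -d /dev/tsmdblv10 -m /tsm/db10 -A yes -a logname=/dev/tsmdblog10
mount /tsm/db10
# 8. one big TSM DB volume on that filesystem (format it, then define it)
dsmfmt -m -db /tsm/db10/db10.dsm 1000
# ... and from a dsmadmc session:  def dbvol /tsm/db10/db10.dsm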

--> ... and won't be back before Monday

Happy Monday! You've got something to play with for the whole day :-)

Zlatko Krastev
IT Consultant

P.S. I charge my customers for such advice, but hopefully I can get a
beer (or Swiss chocolate) for this one :-)







PAC Brion Arnaud <[EMAIL PROTECTED]>
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
14.02.2003 16:08
Please respond to "ADSM: Dist Stor Manager"


To: [EMAIL PROTECTED]
cc:
Subject:Re: OS390 TSM Performance questions.


Hi Rodney,

The big picture :
System : aix 4.3.3.0 on a 6h1 machine, 2Gb memory, 2 cpu
Vmtune settings : -p10 -P40 -R 256 -F376 -W256 -s1
TSM version : 4.2.3.1
Bufpoolsize : 524288
Iostat:
tty:      tin     tout    avg-cpu:  % user   % sys   % idle   % iowait
          0.0     25.9              55.0     13.0    10.9     21.1

Vmstat :
kthr memory page  faultscpu
- ---   ---
 r  b   av

Re: OS390 TSM Performance questions.

2003-02-14 Thread Paul Ripke
On Friday, Feb 14, 2003, at 19:44 Australia/Sydney, PAC Brion Arnaud
wrote:


Hi all,

I followed your discussion with much interest, as I'm suffering from
huge performance problems too. Unfortunately I'm not under OS390,
but using AIX 4.3.3: could someone tell me if there is some trick
like this one that should be considered when using this OS?


Is the system paging, by any chance?
When we were running TSM on AIX, we needed to do the following to
prevent
the AIX file/VM cache from paging out TSM (as must be done on any SAP
database server):
/usr/samples/kernel/vmtune -p 5 -P 8
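
A quick, hedged way to check for that (standard AIX commands; what counts as
"too much" paging is a judgment call):

lsps -a         # paging-space utilisation
vmstat 5 5      # sustained non-zero pi/po columns while TSM runs => paging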


Another thing that annoys me : using "show memu SHORT" on my server
(TSM
4.2.3.1) returns : ANR2000E Unknown command - SHOW MEMU
Could it be that this command is only available for OS390 TSM version ?


I'm guessing so.

--
Paul Ripke
Unix/OpenVMS/TSM/DBA
101 reasons why you can't find your Sysadmin:
68: It's 9AM. He/She is not working that late.
-- Koos van den Hout



Re: OS390 TSM Performance questions.

2003-02-14 Thread Darby, Mark
 by the way, if you really need to do so.
I doubt you will ever need that, but if GETMAIN/FREEMAIN activity rises to a
point where it does severely degrade performance AND you have some way of
determining that it is, indeed, the culprit of your performance problem,
call TSM support and ask them how to do that.

It is unclear at what point such memory management thrashing begins because
there are no indicators of which I am aware that one can use to determine
that, specifically, but I would say that, generally, the more stress you
place on the server (i.e., the more and more varied work you try to have the
server perform) the more likely it is to enter such a state.

I hope this helps someone, and does not frustrate too many of you who were
willing to read it but found it to be too verbose and, perhaps,
unenlightening.  Also, please forgive me if I appear to be an arrogant
know-it-all.  I do not mean to be that way - I just "come off" that way - or
so I have been told by many close and dear associates.  It's the only way I
know.

Finally, if anyone has any cause to disbelieve, or can refute anything I
have stated here, I am very much interested in any corrections, amendments,
or disputations you might wish to provide.

Kindest Regards,
Mark Darby (or, as known by my peers and close associates, Mr. Verbose - you
see why?)
(301) 903-5229

-Original Message-
From: Bill Kelly [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 14, 2003 8:59 AM
To: [EMAIL PROTECTED]
Subject: Re: OS390 TSM Performance questions.

Hi,

I wanted to clarify, or perhaps retract, something about the 'show memu
SHORT' command that has been mentioned in this thread.  Not surprisingly,
the numbers you get from this command vary depending on what's been going
on in the server.  Specifically, just after startup in a 512 MB region, I
get:

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes   69409  (0.1 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
83 Page buffers of 12683 : 0 buffers of 1585.
0 Large buffers of 792 : 1 XLarge buffers of 99.
   61 buffers free: 148 hiAlloc buffers: 87 current buffers.
   12 units of 56 bytes hiAlloc: 12 units of 104 bytes hiCur.

A couple of hours later, after a storage pool copy has run and nightly
backups are in full swing, I get:

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes 10397099  (9.9 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
83 Page buffers of 12683 : 14 buffers of 1585.
2 Large buffers of 792 : 18 XLarge buffers of 99.
   21616 buffers free: 48260 hiAlloc buffers: 13604 current buffers.
   13400 units of 104 bytes hiAlloc: 4879 units of 56 bytes hiCur.

Note that the Freeheld number, which initially looked 'bad', now looks
'good'.  As has been pointed out to me off-list, unless you know how to
interpret the numbers, they're just that - a bunch of numbers.  I
should've known better.  :-)

Regards,
Bill

Bill Kelly
Auburn University
[EMAIL PROTECTED]



Re: OS390 TSM Performance questions.

2003-02-14 Thread John Naylor
I am resending this to the list, because I only copied it to Bill earlier, and
the thread is still going so there may be some interest in it


-- Forwarded by John Naylor/HAV/SSE on 02/14/2003 03:54 PM
---


John Naylor
02/14/2003 09:43 AM

To:   Bill Kelly <[EMAIL PROTECTED]>
cc:
Subject:  Re: OS390 TSM Performance questions.  (Document link: John Naylor)

Hi Bill,
We are running TSM 4.2.2 on OS/390 2.10 on a 9672 (5 CPUs).
I have the region at 512 MB and the bufferpool at 48 MB.
I am running alongside other business applications, and it is only when they
are busy and the LPAR becomes CPU constrained - especially when a DB2
application is very busy - that TSM suffers significantly performance-wise.
This morning I ran migration from disk to tape alongside expiration and I
got 450 MB per minute.
I do occasionally bounce the server, maybe every three weeks, if my perception
is that the performance is a bit more sluggish than normal.
Do you have access to RMF or similar performance tools? That should help to
isolate what is causing your performance issue.
I personally would look at your bufferpool size, maybe reducing it. What do
your stats show?
I ran the show memu (see results below), but unless someone who really
understands the figures explains what they mean, what good figures are, what
bad figures are, and how they are impacted by the current TSM activity, I do
not think they are worth very much.
For example, my Freeheld bytes shows 0.7 MB.
Good, bad, indifferent? Who knows?

 MAX initial storage  536870912  (512.0 MB)
 Freeheld bytes  741499  (0.7 MB)
 MaxQuickFree bytes 10391797  (9.9 MB)
 35 Page buffers of 12685 : 79 buffers of 1585.
 6 Large buffers of 792 : 54 XLarge buffers of 99.
1221 buffers free: 5544 hiAlloc buffers: 4323 current buffers.
3290 units of 40 bytes hiAlloc: 3289 units of 40 bytes hiCur.

So in summary, my advice would be: make sure TSM is getting the CPU it needs.
Check your RMF or similar.
Ensure you are running MPTHREADING.
Have a look through ADSM.org (search "region").
regards,
John





Bill Kelly <[EMAIL PROTECTED]> on 02/13/2003 07:48:37 PM

Please respond to Bill Kelly <[EMAIL PROTECTED]>

To:   [EMAIL PROTECTED]
cc:(bcc: John Naylor/HAV/SSE)
Subject:  Re: OS390 TSM Performance questions.



Hi,

We seem to be experiencing symptoms similar (identical?) to Alan's.

We're at z/OS 1.2, running on a 2066-002 w/ 8GB of memory and virtually no
paging; TSM is at 4.2.3.0.; database is 55% of 106 GB.  Network
connectivity is via GB ethernet. Disk pool is 190GB on an ESS. Nightly
backup load is approximately 230 clients (a mix of desktops and servers),
averaging in the 130-140GB range per night total.

For some weeks now (I'm not sure when this started, but I know the problem
was there at 4.2.2.10), we've been seeing horrible performance after TSM
has been up for a few hours.  For example, I can watch 3 migration
processes that run along fine for a little while, each getting approx. 400
MB/min throughput, then suddenly CPU utilization by TSM shoots up to 95%
and throughput on the migrations drops to approx. 50 MB/min per process.
Stopping and restarting the processes does no good, but cycling the server
clears up the problem.  I'm certain this problem affects other server
activities, such as client backups, storage pool backups, etc.

Like Alan, I've been ratcheting up the region size (up to 1.5 GB) and the
db bufferpool size (up to 384 MB) in a vain attempt to help matters.

I recently resorted to cycling the server 4 times per day just to get the
performance needed to keep up with things.

Based on the comments in this thread, I last night changed our region size
to 512 MB and db bufferpool size to 128 MB.  Until now, I wasn't aware of
the 'show memu' diagnostic command (thanks Alan/Mark! I finally have
*something* to quantify directly); here's the output from our server:

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes   63678  (0.1 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
83 Page buffers of 12683 : 0 buffers of 1585.
0 Large buffers of 792 : 1 XLarge buffers of 99.
   68 buffers free: 134 hiAlloc buffers: 66 current buffers.
   12 units of 56 bytes hiAlloc: 11 units of 88 bytes hiCur.

So apparently I still have the 'tiny Freeheld' problem; I suspect strongly
I had the same trouble at 1.5 GB region size. (I don't suppose the
functions of and relationships among these buffer pools is documented
anywhere?  I haven't found anything in the list archives or at the support
web site.)  I wonder if there's a factor other than db bufferpool size
and region size that's affecting these buffer pool allocations?

I suspect that our server performance goes south once we run out of
one/some type(s) of these buffers and the server starts
GETMAINing/FREEMAINing itself to death?

Lacking any further information

Re: OS390 TSM Performance questions.

2003-02-14 Thread Rodney clark
Well, you don't mention the stripe size used for the database and log
filesystems.
We had our database and log on a striped filesystem and we had really bad
expire performance; the stripe size was 64 KB.

From memory, all database actions are 4 KB.

Are these SSA disks? If yes, do you have the write cache installed, and if
yes, is it enabled for these disks?

We moved the database and logs back to plain (non-striped) LVs and had a
marked improvement in expiration performance. We also did an unload/load,
which also really helped.
Before the changes we were receiving more objects per day than we could
expire.

You seem to have 4-way mirroring, TSM and AIX - or am I misreading this?

We use only AIX LVM mirroring.
Be aware that using AIX mirroring you get hit by the mirror write
consistency (MWC) update.

To minimize this, have the busiest LVs positioned at the outer edge, right
next to track 0.
Write cache eliminates the performance hit of MWC.
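
A hedged sketch of how to check and influence that (the LV, VG and disk names
are invented for illustration, not taken from the system above):

lslv tsmdblv                    # shows INTRA-POLICY and MIRROR WRITE CONSISTENCY
lslv -m tsmdblv                 # physical-partition map: where the LV really sits
mklv -y tsmdblv -a e tsmvg 128 hdisk3   # '-a e' asks for outer-edge allocation
chlv -a e tsmdblv               # or change an existing LV's policy ...
reorgvg tsmvg tsmdblv           # ... and let reorgvg move it toward the edge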

Also, the setup of multiple VGs can result in having multiple active
jfslogs, whereas with one large VG you can isolate the jfslog to one or more
disks - but I assume the jfslog would not be very busy. filemon can
answer that question.
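
A minimal filemon sketch to answer it (the output file name is just an example):

filemon -o /tmp/filemon.out -O lv   # start LV-level monitoring
sleep 60
trcstop                             # stop tracing and write the report
# the "Most Active Logical Volumes" section shows whether the jfslog LVs are busy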

Could the -W 256 (maxrandwrt) in your vmtune settings be your problem?


Here is our vmtune output for comparison; use it if you dare.
# ./vmtune
vmtune:  current values:
  -p   -P-r  -R -f   -F   -N-W
minperm  maxperm  minpgahead maxpgahead  minfree  maxfree  pd_npages
maxrandwrt
  2621352426   2128   10001 5242880

  -M  -w  -k  -c-b -B   -u-l
-d
maxpin npswarn npskill numclust numfsbufs hd_pbuf_cnt lvm_bufcnt lrubucket
defps
419411   163844096   1 930560 64  131072
1

-s  -n -S -L  -g   -h
sync_release_ilock  nokilluid  v_pinshm  lgpg_regions  lgpg_size
strict_maxperm
0   0   0   000

 -t
maxclient
 418588

number of valid memory pages = 524263   maxperm=10.0% of real memory
maximum pinable=80.0% of real memoryminperm=5.0% of real memory
number of file memory pages = 92187 numperm=17.6% of real memory

number of compressed memory pages = 0   compressed=0.0% of real memory
number of client memory pages = 0   numclient=0.0% of real memory
# of remote pgs sched-pageout = 0   maxclient=79.8% of real memory



Re: OS390 TSM Performance questions.

2003-02-14 Thread Richard Sims
>Richard,
>
>I totally agree with you, but if I tell you that my current situation
>happened due to the "help" from IBM support, what would you answer ? ...

I'd agree with your frustration, Arnaud, given the premium price being
paid to a top vendor.  Generally, though, unless a vendor specialist is
one I know, who is familiar with my site, I accept the advice as
suggestive, and thereafter check it to the extent possible for
usability in my site, and test it if at all possible first, as for
example deploying new drive microcode on just one drive first.  Some
technician on the other end of the phone line knows nothing about your
site and is just rendering an isolated suggestion - which we *expect*
has been reasonably tested by the product specialists.  For new
products, we can often look to the various industry reviews for the
worthiness of offerings, and do web searches for customer experiences.
As for chronic problem areas like certain software or 8mm drives -
well, that's its own story, which we know painfully well.

>...but I'm really tired of getting white hair over problems that would not
>exist if some people had done their job properly...

Hah!  I'm way ahead of you: vendor exasperation started giving me
gray hair when I was about 34!

   Richard Sims, BU

"Most vendors have a view of time that would amaze Albert Einstein."



Re: OS390 TSM Performance questions.

2003-02-14 Thread Alan Davenport
Hi,

 I hoped I made it clear it was pure supposition as to what the number
meant. (: I'm trying to make sense of why performance increased so
dramatically when I reduced my region size by 768M. My daily maintenance
cycle was a good 3+ hours ahead of schedule when I left for the day
yesterday. I'm on track to see a similar completion time today. Who knows
why this is so? Perhaps no one. My reaction is that I am glad it has helped
solve my problem. At least it has provided a lively discussion!

   Take care,
   Al

-Original Message-
From: Bill Kelly [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 14, 2003 8:59 AM
To: [EMAIL PROTECTED]
Subject: Re: OS390 TSM Performance questions.


Hi,

I wanted to clarify, or perhaps retract, something about the 'show memu
SHORT' command that has been mentioned in this thread.  Not surprisingly,
the numbers you get from this command vary depending on what's been going
on in the server.  Specifically, just after startup in a 512 MB region, I
get:

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes   69409  (0.1 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
83 Page buffers of 12683 : 0 buffers of 1585.
0 Large buffers of 792 : 1 XLarge buffers of 99.
   61 buffers free: 148 hiAlloc buffers: 87 current buffers.
   12 units of 56 bytes hiAlloc: 12 units of 104 bytes hiCur.

A couple of hours later, after a storage pool copy has run and nightly
backups are in full swing, I get:

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes 10397099  (9.9 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
83 Page buffers of 12683 : 14 buffers of 1585.
2 Large buffers of 792 : 18 XLarge buffers of 99.
   21616 buffers free: 48260 hiAlloc buffers: 13604 current buffers.
   13400 units of 104 bytes hiAlloc: 4879 units of 56 bytes hiCur.

Note that the Freeheld number, which initially looked 'bad', now looks
'good'.  As has been pointed out to me off-list, unless you know how to
interpret the numbers, they're just that - a bunch of numbers.  I
should've known better.  :-)

Regards,
Bill

Bill Kelly
Auburn University
[EMAIL PROTECTED]



Re: OS390 TSM Performance questions.

2003-02-14 Thread PAC Brion Arnaud
Richard,

I totally agree with you, but if I tell you that my current situation
happened due to the "help" from IBM support, what would you answer? I
had a rock-solid system running TSM 4.2.2.1, until we had a problem with
a tape library which was unable to locate some tapes; I was then asked to
update the library microcode, which I did. Of course another problem
appeared, which this time was supposed to be corrected by upgrading the TSM
server: I followed the IBM experts' advice, and this was the beginning of an
upgrade waltz: microcode again, server, etc. etc. ... up to the
current state! Sorry to tell you that, but I have 3 years of practice
with TSM, and from what I've been seeing for approximately the last 6 months,
the product and service quality is really sinking. Just look at this list:
why are so many people asking what version of TSM they should use? Answer:
because ex-Tivoli/IBM is not even able to deliver a bug-free version
of its product. The last time I opened a PMR with IBM, do you know what
their answer was: wait until the next M.L. delivery, and jump to Version 5!
Sorry, but under such conditions I can't afford to buy another 6H0 and
two 3584 libraries just to test whether the "experts'" advice would improve
or impair my production environment: I (unfortunately) have to trust them!
Please do not consider this a personal attack, as I consider this
list (and its eminent members) the most reliable source of
information concerning TSM, but I'm really tired of getting white hair over
problems that would not exist if some people had done their job properly
...
Thanks anyway.

Arnaud (now calmed).

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Arnaud Brion, Panalpina Management Ltd., IT Group |
| Viaduktstrasse 42, P.O. Box, 4002 Basel - Switzerland |
| Phone: +41 61 226 19 78 / Fax: +41 61 226 17 01   | 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



-Original Message-
From: Richard Sims [mailto:[EMAIL PROTECTED]] 
Sent: Friday, 14 February, 2003 14:22
To: [EMAIL PROTECTED]
Subject: Re: OS390 TSM Performance questions.


>I followed your discussion with much interest, as I'm suffering from
>huge performance problems too. Unfortunately I'm not under
>OS390, but using AIX 4.3.3 : could someone tell me if there is some
>trick like this one that should be considered when using this OS?

Arnaud - Your posting did not say whether you've first pursued the
 recommendations in the TSM Performance Tuning Guide. If you are
not also the AIX systems programmer at your site, you should confer with
those people to have them review the environment in which you are
running.  In http://people.bu.edu/rbs/ADSM.QuickFacts
I've collected various performance factors to check for.

Rather than looking for "tricks", it is far better to master the facets
of your hardware and software environment, and thus have a solid grasp
of maximum performance capabilities so as to recognize when things
aren't working right and know what to do.  Anything from overloaded SCSI
adapters to misconfigured ethernet connections can be a factor.

I can't stress strongly enough one major aspect of implementing systems
that novices always overlook: doing a benchmark, first.  It is vital
that you "road test" whatever you implement before committing it to
production.  At a minimum, that should be your basis for accepting
hardware and software from vendors, to see if they measure up to
brochure specifications.  Simple example: a printer which the vendor
states will run at 100 pages per minute.  If you don't initially test
it, to discover that it only produces 82 ppm when printing duplex, you're
going to be in frantic mode when full-load printing comes along.  The
basic principle is that you MUST first assure that new hardware and
software can perform at expected levels before committing them to
production so as to be certain of their capabilities, and thus to truly
know what is normal and what is abnormal when a problem situation
occurs.  And repeat your measurements if at all possible when doing
firmware or microcode upgrades to avoid unpleasant surprises.  Never
assume that newer stuff is better: it may indeed have been created by
someone equally new in the vendor company.

  Richard Sims, BU


 



Re: OS390 TSM Performance questions.

2003-02-14 Thread Bill Kelly
Hi,

I wanted to clarify, or perhaps retract, something about the 'show memu
SHORT' command that has been mentioned in this thread.  Not surprisingly,
the numbers you get from this command vary depending on what's been going
on in the server.  Specifically, just after startup in a 512 MB region, I
get:

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes   69409  (0.1 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
83 Page buffers of 12683 : 0 buffers of 1585.
0 Large buffers of 792 : 1 XLarge buffers of 99.
   61 buffers free: 148 hiAlloc buffers: 87 current buffers.
   12 units of 56 bytes hiAlloc: 12 units of 104 bytes hiCur.

A couple of hours later, after a storage pool copy has run and nightly
backups are in full swing, I get:

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes 10397099  (9.9 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
83 Page buffers of 12683 : 14 buffers of 1585.
2 Large buffers of 792 : 18 XLarge buffers of 99.
   21616 buffers free: 48260 hiAlloc buffers: 13604 current buffers.
   13400 units of 104 bytes hiAlloc: 4879 units of 56 bytes hiCur.

Note that the Freeheld number, which initially looked 'bad', now looks
'good'.  As has been pointed out to me off-list, unless you know how to
interpret the numbers, they're just that - a bunch of numbers.  I
should've known better.  :-)

Regards,
Bill

Bill Kelly
Auburn University
[EMAIL PROTECTED]



Re: OS390 TSM Performance questions.

2003-02-14 Thread Rodney clark
Post us some details - iostat, vmstat, and how much memory, disks, etc.
The big quick win on AIX is vmtune -p5 -P10,
but I guess you already know that.


-Original Message-
From: PAC Brion Arnaud [mailto:[EMAIL PROTECTED]]
Sent: Friday 14 February 2003 09:44
To: [EMAIL PROTECTED]
Subject: Re: OS390 TSM Performance questions.


Hi all,

I followed your discussion with much interest, as I'm suffering from
huge performance problems too. Unfortunately I'm not under OS390,
but using AIX 4.3.3: could someone tell me if there is some trick
like this one that should be considered when using this OS?
Another thing that annoys me : using "show memu SHORT" on my server (TSM
4.2.3.1) returns : ANR2000E Unknown command - SHOW MEMU
Could it be that this command is only available for OS390 TSM version ?
Thanks in advance.

Arnaud
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Arnaud Brion, Panalpina Management Ltd., IT Group |
| Viaduktstrasse 42, P.O. Box, 4002 Basel - Switzerland |
| Phone: +41 61 226 19 78 / Fax: +41 61 226 17 01   |
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



-Original Message-
From: Alan Davenport [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 13 February, 2003 21:41
To: [EMAIL PROTECTED]
Subject: Re: OS390 TSM Performance questions.


Hi Bill,

  Another thing that came up in our discussion was that the DB
buffpoolsize should not exceed 131072 (128K). You might want to try that
as well. Sounds like you have little to lose, like I did when I tried
reducing the region size. Another observation: my cache hit ratio has
gone up nearly a full percentage point since I made my adjustment this
morning. I'm fairly happy at this point, but I sure wish I knew WHY this
has worked! I can see myself trying to explain this to management. "I solved
the TSM performance problem!" "Really! How?" "I gave it less than half
the memory to work with!" "OK Al, just stay calm, the men with the white
coats will be along shortly!" At least it would be vacation time! (:

 Al

-Original Message-
From: Bill Kelly [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 13, 2003 2:49 PM
To: [EMAIL PROTECTED]
Subject: Re: OS390 TSM Performance questions.


Hi,

We seem to be experiencing symptoms similar (identical?) to Alan's.

We're at z/OS 1.2, running on a 2066-002 w/ 8GB of memory and virtually
no paging; TSM is at 4.2.3.0.; database is 55% of 106 GB.  Network
connectivity is via GB ethernet. Disk pool is 190GB on an ESS. Nightly
backup load is approximately 230 clients (a mix of desktops and
servers), averaging in the 130-140GB range per night total.

For some weeks now (I'm not sure when this started, but I know the
problem was there at 4.2.2.10), we've been seeing horrible performance
after TSM has been up for a few hours.  For example, I can watch 3
migration processes that run along fine for a little while, each getting
approx. 400 MB/min throughput, then suddenly CPU utilization by TSM
shoots up to 95% and throughput on the migrations drops to approx. 50
MB/min per process. Stopping and restarting the processes does no good,
but cycling the server clears up the problem.  I'm certain this problem
affects other server activities, such as client backups, storage pool
backups, etc.

Like Alan, I've been ratcheting up the region size (up to 1.5 GB) and
the db bufferpool size (up to 384 MB) in a vain attempt to help matters.

I recently resorted to cycling the server 4 times per day just to get
the performance needed to keep up with things.

Based on the comments in this thread, I last night changed our region
size to 512 MB and db bufferpool size to 128 MB.  Until now, I wasn't
aware of the 'show memu' diagnostic command (thanks Alan/Mark! I finally
have
*something* to quantify directly); here's the output from our server:

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes   63678  (0.1 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
83 Page buffers of 12683 : 0 buffers of 1585.
0 Large buffers of 792 : 1 XLarge buffers of 99.
   68 buffers free: 134 hiAlloc buffers: 66 current buffers.
   12 units of 56 bytes hiAlloc: 11 units of 88 bytes hiCur.

So apparently I still have the 'tiny Freeheld' problem; I suspect
strongly I had the same trouble at 1.5 GB region size. (I don't suppose
the functions of and relationships among these buffer pools is
documented anywhere?  I haven't found anything in the list archives or
at the support web site.)  I wonder if there's a factor other than db
bufferpool size and region size that's affecting these buffer pool
allocations?

I suspect that our server performance goes south once we run out of
one/some type(s) of these buffers and the server starts
GETMAINing/FREEMAINing itself to death?

Lacking any further information, I plan to do some bouncing o

Re: OS390 TSM Performance questions.

2003-02-14 Thread PAC Brion Arnaud
Hi all,

I followed your discussion with much interest, as I'm suffering from
huge performance problems too. Unfortunately I'm not under OS390,
but using AIX 4.3.3: could someone tell me if there is some trick
like this one that should be considered when using this OS?
Another thing that annoys me: using "show memu SHORT" on my server (TSM
4.2.3.1) returns: ANR2000E Unknown command - SHOW MEMU
Could it be that this command is only available for the OS390 TSM version?
Thanks in advance.

Arnaud
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Arnaud Brion, Panalpina Management Ltd., IT Group |
| Viaduktstrasse 42, P.O. Box, 4002 Basel - Switzerland |
| Phone: +41 61 226 19 78 / Fax: +41 61 226 17 01   | 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



-Original Message-
From: Alan Davenport [mailto:[EMAIL PROTECTED]] 
Sent: Thursday, 13 February, 2003 21:41
To: [EMAIL PROTECTED]
Subject: Re: OS390 TSM Performance questions.


Hi Bill,

  Another thing that came up in our discussion was that the DB
buffpoolsize should not exceed 131072 (128K). You might want to try that
as well. Sounds like you have little to lose, like I did when I tried
reducing the region size. Another observation: my cache hit ratio has
gone up nearly a full percentage point since I made my adjustment this
morning. I'm fairly happy at this point, but I sure wish I knew WHY this
has worked! I can see myself trying to explain this to management. "I solved
the TSM performance problem!" "Really! How?" "I gave it less than half
the memory to work with!" "OK Al, just stay calm, the men with the white
coats will be along shortly!" At least it would be vacation time! (:

 Al

-Original Message-
From: Bill Kelly [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 13, 2003 2:49 PM
To: [EMAIL PROTECTED]
Subject: Re: OS390 TSM Performance questions.


Hi,

We seem to be experiencing symptoms similar (identical?) to Alan's.

We're at z/OS 1.2, running on a 2066-002 w/ 8GB of memory and virtually
no paging; TSM is at 4.2.3.0.; database is 55% of 106 GB.  Network
connectivity is via GB ethernet. Disk pool is 190GB on an ESS. Nightly
backup load is approximately 230 clients (a mix of desktops and
servers), averaging in the 130-140GB range per night total.

For some weeks now (I'm not sure when this started, but I know the
problem was there at 4.2.2.10), we've been seeing horrible performance
after TSM has been up for a few hours.  For example, I can watch 3
migration processes that run along fine for a little while, each getting
approx. 400 MB/min throughput, then suddenly CPU utilization by TSM
shoots up to 95% and throughput on the migrations drops to approx. 50
MB/min per process. Stopping and restarting the processes does no good,
but cycling the server clears up the problem.  I'm certain this problem
affects other server activities, such as client backups, storage pool
backups, etc.

Like Alan, I've been ratcheting up the region size (up to 1.5 GB) and
the db bufferpool size (up to 384 MB) in a vain attempt to help matters.

I recently resorted to cycling the server 4 times per day just to get
the performance needed to keep up with things.

Based on the comments in this thread, I last night changed our region
size to 512 MB and db bufferpool size to 128 MB.  Until now, I wasn't
aware of the 'show memu' diagnostic command (thanks Alan/Mark! I finally
have
*something* to quantify directly); here's the output from our server:

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes   63678  (0.1 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
83 Page buffers of 12683 : 0 buffers of 1585.
0 Large buffers of 792 : 1 XLarge buffers of 99.
   68 buffers free: 134 hiAlloc buffers: 66 current buffers.
   12 units of 56 bytes hiAlloc: 11 units of 88 bytes hiCur.

So apparently I still have the 'tiny Freeheld' problem; I suspect
strongly I had the same trouble at 1.5 GB region size. (I don't suppose
the functions of and relationships among these buffer pools is
documented anywhere?  I haven't found anything in the list archives or
at the support web site.)  I wonder if there's a factor other than db
bufferpool size and region size that's affecting these buffer pool
allocations?

I suspect that our server performance goes south once we run out of
one/some type(s) of these buffers and the server starts
GETMAINing/FREEMAINing itself to death?

Lacking any further information, I plan to do some bouncing of our
server this weekend to see if I can come up with a region and db bufpool
combination that will get the 'Freeheld bytes' (and presumably the
'buffers free') numbers into a reasonable range.  Perhaps if I can do
that, I'll be able to stop this insane cycling of the server every 5-8
hours.

Thanks 

Re: OS390 TSM Performance questions.

2003-02-13 Thread Alan Davenport
Hi Bill,

  Another thing that came up in our discussion was that the DB
buffpoolsize should not exceed 131072 (128K). You might want to try that as
well. Sounds like you have little to lose, like I did when I tried reducing
the region size. Another observation: my cache hit ratio has gone up nearly
a full percentage point since I made my adjustment this morning. I'm fairly
happy at this point, but I sure wish I knew WHY this has worked! I can see
myself trying to explain this to management. "I solved the TSM performance
problem!" "Really! How?" "I gave it less than half the memory to work with!"
"OK Al, just stay calm, the men with the white coats will be along shortly!"
At least it would be vacation time! (:
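
For what it's worth, the arithmetic behind that figure (assuming BUFPoolsize
is specified in KB, as noted elsewhere in this thread):

   BUFPoolsize 131072  ->  131072 KB = 128 MB of DB buffer pool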

 Al

-Original Message-
From: Bill Kelly [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 13, 2003 2:49 PM
To: [EMAIL PROTECTED]
Subject: Re: OS390 TSM Performance questions.


Hi,

We seem to be experiencing symptoms similar (identical?) to Alan's.

We're at z/OS 1.2, running on a 2066-002 w/ 8GB of memory and virtually no
paging; TSM is at 4.2.3.0.; database is 55% of 106 GB.  Network
connectivity is via GB ethernet. Disk pool is 190GB on an ESS. Nightly
backup load is approximately 230 clients (a mix of desktops and servers),
averaging in the 130-140GB range per night total.

For some weeks now (I'm not sure when this started, but I know the problem
was there at 4.2.2.10), we've been seeing horrible performance after TSM
has been up for a few hours.  For example, I can watch 3 migration
processes that run along fine for a little while, each getting approx. 400
MB/min throughput, then suddenly CPU utilization by TSM shoots up to 95%
and throughput on the migrations drops to approx. 50 MB/min per process.
Stopping and restarting the processes does no good, but cycling the server
clears up the problem.  I'm certain this problem affects other server
activities, such as client backups, storage pool backups, etc.

Like Alan, I've been ratcheting up the region size (up to 1.5 GB) and the
db bufferpool size (up to 384 MB) in a vain attempt to help matters.

I recently resorted to cycling the server 4 times per day just to get the
performance needed to keep up with things.

Based on the comments in this thread, I last night changed our region size
to 512 MB and db bufferpool size to 128 MB.  Until now, I wasn't aware of
the 'show memu' diagnostic command (thanks Alan/Mark! I finally have
*something* to quantify directly); here's the output from our server:

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes   63678  (0.1 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
83 Page buffers of 12683 : 0 buffers of 1585.
0 Large buffers of 792 : 1 XLarge buffers of 99.
   68 buffers free: 134 hiAlloc buffers: 66 current buffers.
   12 units of 56 bytes hiAlloc: 11 units of 88 bytes hiCur.

So apparently I still have the 'tiny Freeheld' problem; I suspect strongly
I had the same trouble at 1.5 GB region size. (I don't suppose the
functions of and relationships among these buffer pools is documented
anywhere?  I haven't found anything in the list archives or at the support
web site.)  I wonder if there's a factor other than db bufferpool size
and region size that's affecting these buffer pool allocations?

I suspect that our server performance goes south once we run out of
one/some type(s) of these buffers and the server starts
GETMAINing/FREEMAINing itself to death?

Lacking any further information, I plan to do some bouncing of our server
this weekend to see if I can come up with a region and db bufpool
combination that will get the 'Freeheld bytes' (and presumably the
'buffers free') numbers into a reasonable range.  Perhaps if I can do
that, I'll be able to stop this insane cycling of the server every 5-8
hours.

Thanks for your help and insight!
Bill

Bill Kelly
Auburn University
[EMAIL PROTECTED]

On Thu, 13 Feb 2003, Alan Davenport wrote:

> I had my region size at 1280M and
> TSM was running just awful. I had a phone conversation with Mark and
> afterwards, I tried his suggestion of REDUCING the region size. Note the
> before/after output to the "show memu SHORT" (Case sensitive!) display:
>
> Region Size = 1280M
>
> MAX initial storage  1342177280 (1280.0 MB)
> Freeheld bytes  145620  (0.1 MB)
> MaxQuickFree bytes 26387005  (25.2MB)
> 56 Page buffers of 32210 : 315 buffers of 4026.
> 4 Large buffers of 2013 : 222 XLarge buffers of 251.
> 202 buffers free: 336 hiAlloc buffers: 134 current buffers.
> 50 units of 688 bytes hiAlloc: 44 units of 72 bytes hiCur.
> Region Size=512M
>
> MAX initial storage  536870912  (512.0 MB)
> Freeheld bytes 10280787  (9.8 MB)
> MaxQuickFree bytes 10280878  (9.8 MB)
> 56 Page buffers of 12549 : 4 buffers of 1568.

Re: OS390 TSM Performance questions.

2003-02-13 Thread Bill Kelly
Hi,

We seem to be experiencing symptoms similar (identical?) to Alan's.

We're at z/OS 1.2, running on a 2066-002 w/ 8GB of memory and virtually no
paging; TSM is at 4.2.3.0.; database is 55% of 106 GB.  Network
connectivity is via GB ethernet. Disk pool is 190GB on an ESS. Nightly
backup load is approximately 230 clients (a mix of desktops and servers),
averaging in the 130-140GB range per night total.

For some weeks now (I'm not sure when this started, but I know the problem
was there at 4.2.2.10), we've been seeing horrible performance after TSM
has been up for a few hours.  For example, I can watch 3 migration
processes that run along fine for a little while, each getting approx. 400
MB/min throughput, then suddenly CPU utilization by TSM shoots up to 95%
and throughput on the migrations drops to approx. 50 MB/min per process.
Stopping and restarting the processes does no good, but cycling the server
clears up the problem.  I'm certain this problem affects other server
activities, such as client backups, storage pool backups, etc.

Like Alan, I've been ratcheting up the region size (up to 1.5 GB) and the
db bufferpool size (up to 384 MB) in a vain attempt to help matters.

I recently resorted to cycling the server 4 times per day just to get the
performance needed to keep up with things.

Based on the comments in this thread, I last night changed our region size
to 512 MB and db bufferpool size to 128 MB.  Until now, I wasn't aware of
the 'show memu' diagnostic command (thanks Alan/Mark! I finally have
*something* to quantify directly); here's the output from our server:

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes   63678  (0.1 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
83 Page buffers of 12683 : 0 buffers of 1585.
0 Large buffers of 792 : 1 XLarge buffers of 99.
   68 buffers free: 134 hiAlloc buffers: 66 current buffers.
   12 units of 56 bytes hiAlloc: 11 units of 88 bytes hiCur.

So apparently I still have the 'tiny Freeheld' problem; I suspect strongly
I had the same trouble at 1.5 GB region size. (I don't suppose the
functions of and relationships among these buffer pools is documented
anywhere?  I haven't found anything in the list archives or at the support
web site.)  I wonder if there's a factor other than db bufferpool size
and region size that's affecting these buffer pool allocations?

I suspect that our server performance goes south once we run out of
one/some type(s) of these buffers and the server starts
GETMAINing/FREEMAINing itself to death?

Lacking any further information, I plan to do some bouncing of our server
this weekend to see if I can come up with a region and db bufpool
combination that will get the 'Freeheld bytes' (and presumably the
'buffers free') numbers into a reasonable range.  Perhaps if I can do
that, I'll be able to stop this insane cycling of the server every 5-8
hours.

Thanks for your help and insight!
Bill

Bill Kelly
Auburn University
[EMAIL PROTECTED]

On Thu, 13 Feb 2003, Alan Davenport wrote:

> I had my region size at 1280M and
> TSM was running just awful. I had a phone conversation with Mark and
> afterwards, I tried his suggestion of REDUCING the region size. Note the
> before/after output to the "show memu SHORT" (Case sensitive!) display:
>
> Region Size = 1280M
>
> MAX initial storage  1342177280 (1280.0 MB)
> Freeheld bytes  145620  (0.1 MB)
> MaxQuickFree bytes 26387005  (25.2MB)
> 56 Page buffers of 32210 : 315 buffers of 4026.
> 4 Large buffers of 2013 : 222 XLarge buffers of 251.
> 202 buffers free: 336 hiAlloc buffers: 134 current buffers.
> 50 units of 688 bytes hiAlloc: 44 units of 72 bytes hiCur.
> Region Size=512M
>
> MAX initial storage  536870912  (512.0 MB)
> Freeheld bytes 10280787  (9.8 MB)
> MaxQuickFree bytes 10280878  (9.8 MB)
> 56 Page buffers of 12549 : 4 buffers of 1568.
> 2 Large buffers of 784 : 18 XLarge buffers of 98.
> 66992 buffers free: 81083 hiAlloc buffers: 1903 current buffers.
> 28969 units of 56 bytes hiAlloc: 1532 units of 104 bytes hiCur.
>
> Look at the second line of the displays. It appears that with region=1280M
> the "Freeheld bytes" buffer was WAY under allocated. Only 145K was
> allocated. With the region size set to 512M 9.8MB was allocated to the
> buffer and TSM is running significantly better. Whether or not this will
> help someone else I do not know. This is the first I've heard that REDUCING
> region size will help performance. It is counter-intuitive. I had been
> increasing it slowly over a period of time based on information I had found
> on ADSM.ORG. It's hard to argue with results however. My maintenance cycle
> is currently around 3 hours further along today than it usually is.
>
>  Take care,
>  Al
>



Re: OS390 TSM Performance questions.

2003-02-13 Thread Alan Davenport
Hello All, many thanks to all who responded to my inquiry. It looks like
Mark's response, item #2, was the answer. I had my region size at 1280M and
TSM was running just awful. I had a phone conversation with Mark and
afterwards, I tried his suggestion of REDUCING the region size. Note the
before/after output to the "show memu SHORT" (Case sensitive!) display:

Region Size = 1280M

MAX initial storage  1342177280 (1280.0 MB)
Freeheld bytes  145620  (0.1 MB)
MaxQuickFree bytes 26387005  (25.2 MB)
56 Page buffers of 32210 : 315 buffers of 4026.
4 Large buffers of 2013 : 222 XLarge buffers of 251.
202 buffers free: 336 hiAlloc buffers: 134 current buffers.
50 units of 688 bytes hiAlloc: 44 units of 72 bytes hiCur.

Region Size = 512M

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes 10280787  (9.8 MB)
MaxQuickFree bytes 10280878  (9.8 MB)
56 Page buffers of 12549 : 4 buffers of 1568.
2 Large buffers of 784 : 18 XLarge buffers of 98.
66992 buffers free: 81083 hiAlloc buffers: 1903 current buffers.
28969 units of 56 bytes hiAlloc: 1532 units of 104 bytes hiCur.

Look at the "Freeheld bytes" line in each display. It appears that with
region=1280M the Freeheld buffer was WAY under-allocated: only about 145K
bytes. With the region size set to 512M, 9.8MB was allocated to that buffer
and TSM is running significantly better. Whether or not this will help
someone else I do not know. This is the first I've heard that REDUCING the
region size can help performance; it is counter-intuitive. I had been
increasing it slowly over a period of time based on information I had found
on ADSM.ORG. It's hard to argue with results, however: my maintenance cycle
is currently around 3 hours further along today than it usually is.

 Take care,
 Al

Alan Davenport
Senior Storage Administrator
Selective Insurance Co. of America
[EMAIL PROTECTED]
(973) 948-1306


-Original Message-
From: Darby, Mark [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 12, 2003 12:03 PM
To: [EMAIL PROTECTED]
Subject: Re: OS390 TSM Performance questions.


Hello, Al.

We have much to share.  We are OS/390 2.10 on a 7060-H50 (~120 MIPS) with
approx. 100Mbit network connectivity and have had many, long-standing TSM
performance problems.  We are currently running 4.2.3.2.  We have discovered
in working with TSM support (without a technical explanation as to "why?")
that reducing the TSM server's region to 512M and setting (by reducing)
bufpoolsize to 131072 (i.e., 128MB) works for us.  We had previously tried
several region settings from 1.75G down to 960M with the same problematic
results until "happening upon" the severely reduced, storage-constrained
"settings" with which we are now running (or should I say, limping).  This
was determined with the help of the Tivoli "performance team" in response to
a long string of numerous performance-related PMRs.
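
For reference, a minimal sketch of where those two knobs live on the OS/390
server (the PGM name shown is only a placeholder; keep whatever your existing
proc and options member use, since REGION= and BUFPOOLSIZE are the only parts
that matter here):

   //TSMSERV  EXEC PGM=DSMSERV,REGION=512M

and, in the server options member (BUFPOOLSIZE is specified in KB, so
131072 = 128 MB):

   BUFPOOLSIZE 131072

Both are picked up the next time the server is cycled.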

Here are some things we have discovered - and which work best for us:
1. Region over 512M causes serious and pervasive performance problems
2. BufPoolSize much over 131072 MAY also cause/contribute similarly (and
definitely doesn't help)
3. CPU utilization is VERY high for any database-intensive processes
4. Database corruption may be the root cause for our severe symptoms (this
is purely conjecture on my part at this point, but supported, to some
degree, by TSM support statements recommending we fix known DB corruption -
which, of course, with dump/reload/audit performance being what it is, is an
impossible "hit" to take).  FYI: We plan to "move out" of the TSM server
with database corruption "into" a new, virgin server(s) as soon as time and
other factors permit.

Prior to adjusting our "settings" as indicated above, we were experiencing
severe, pervasive, and nearly continual performance problems (and CPU
over-utilization), server unresponsiveness, and what I would call
"stress-related" failures of all sorts, and a whole plethora of other,
unmentioned "problems".  After making "the adjustments" we have found that,
although the TSM server still frequently gets "tangled up in its shorts",
the problems are not as severe nor are they as frequent or pervasive, and
performance is better than when we ran it in the "larger memory footprint".
Although it is closer to acceptable, it is still well below the kind of
performance I expect from an application running on the platform (i.e.,
S/390).

We cannot even imagine a reason why these adjustments have helped, but they
have.  It is totally counter-intuitive to me that reducing the memory
footprint would yield these results, but it has.

I would call IBM/Tivoli support, if I were you, and start a diagnostic
regimen with them on your particular issues.  We were told by them that many
OS/390 shops are getting far superior performance, throughput, and (I
presume) a much better CPU utilization picture than we experience.

Re: OS390 TSM Performance questions.

2003-02-12 Thread Matt Simpson
At 12:02 PM -0500 2/12/03, Darby, Mark wrote:

> It is totally counter-intuitive to me that reducing the memory
> footprint would yield these results, but it has.


Is your system memory-constrained?  Are you experiencing high paging
rates?  It's possible that increasing TSM's memory demand increases
paging to a point where overall system performance goes down the
toilet if you don't have sufficient real memory to back it.
--


Matt Simpson --  OS/390 Support
219 McVey Hall  -- (859) 257-2900 x300
University Of Kentucky, Lexington, KY 40506

mainframe --   An obsolete device still used by thousands of obsolete
companies serving billions of obsolete customers and making huge obsolete
profits for their obsolete shareholders.  And this year's run twice as fast
as last year's.



Re: OS390 TSM Performance questions.

2003-02-12 Thread Darby, Mark
Hello, Al.

We have much to share.  We are OS/390 2.10 on a 7060-H50 (~120 MIPS) with
approx. 100Mbit network connectivity and have had many, long-standing TSM
performance problems.  We are currently running 4.2.3.2.  We have discovered
in working with TSM support (without a technical explanation as to "why?")
that reducing the TSM server's region to 512M and setting (by reducing)
bufpoolsize to 131072 (i.e., 128MB) works for us.  We had previously tried
several region settings from 1.75G down to 960M with the same problematic
results until "happening upon" the severely reduced, storage-constrained
"settings" with which we are now running (or should I say, limping).  This
was determined with the help of the Tivoli "performance team" in response to
a long string of numerous performance-related PMRs.

Here are some things we have discovered - and which work best for us:
1. Region over 512M causes serious and pervasive performance problems
2. BufPoolSize much over 131072 MAY also cause/contribute similarly (and
definitely doesn't help)
3. CPU utilization is VERY high for any database-intensive processes
4. Database corruption may be the root cause for our severe symptoms (this
is purely conjecture on my part at this point, but supported, to some
degree, by TSM support statements recommending we fix known DB corruption -
which, of course, with dump/reload/audit performance being what it is, is an
impossible "hit" to take).  FYI: We plan to "move out" of the TSM server
with database corruption "into" a new, virgin server(s) as soon as time and
other factors permit.

Prior to adjusting our "settings" as indicated above, we were experiencing
severe, pervasive, and nearly continual performance problems (and CPU
over-utilization), server unresponsiveness, and what I would call
"stress-related" failures of all sorts, and a whole plethora of other,
unmentioned "problems".  After making "the adjustments" we have found that,
although the TSM server still frequently gets "tangled up in its shorts",
the problems are not as severe nor are they as frequent or pervasive, and
performance is better than when we ran it in the "larger memory footprint".
Although it is closer to acceptable, it is still well below the kind of
performance I expect from an application running on the platform (i.e.,
S/390).

We cannot even imagine a reason why these adjustments have helped, but they
have.  It is totally counter-intuitive to me that reducing the memory
footprint would yield these results, but it has.

I would call IBM/Tivoli support, if I were you, and start a diagnostic
regimen with them on your particular issues.  We were told by them that many
OS/390 shops are getting far superior performance, throughput, and (I
presume) a much better CPU utilization picture than we experience.  Further,
their stated position is that some environmental factor, unique to "us", is
the root cause for our performance issues.  Aside from our limited bandwidth
and database corruption "issues", I cannot think of any other factor that
makes us extremely unique among all the other users of the TSM server on
OS/390.

You are the first shop I have heard reporting an experience similar to ours.

Please feel free to explore this further with me off-line if you wish.

Regards,
Mark Darby
(301) 903-5229

-Original Message-
From: Alan Davenport [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 12, 2003 10:46 AM
To: [EMAIL PROTECTED]
Subject: OS390 TSM Performance questions.

Hello,

We're running TSM v5.1.5.4 on an IBM 20660A2 processor running OS390
v10. There is a 100Mbit, single port OSA card on the processor. We are
backing up 197 clients per night. MAXSCHEDSESSIONS is set to allow 116
simultaneous backup sessions. Our backup window begins at 20:00 and ends at
07:30 the next morning. We are seeing poor performance on our backups during
the window.  For example, one server that will back up in 6-7 minutes outside
the window takes hours to complete during the window. The TSM server has a
region size of 1280M and MPTHREADING is set to YES. Self-tuning buffer size
and TXN size are enabled. We are backing up to a 100GB disk buffer on an EMC
model 8830 drive array. On average we back up 30-40GB per night with a peak
of 75-80GB.

I know there are much larger shops backing up many more servers out
there running OS390 also. What I would like to know is, on large shops, what
is your OSA configuration? Are you running multi-port OSAs and/or gigabit
cards? For comparison, I would also like to know how many clients you are
backing up per night. Where do you think the bottleneck is? Have you seen
similar problems and what did you do to help alleviate the problem? I am
fairly confident that TSM is not CPU constrained during the window. We
recently moved TSM to a higher service class with little effect on the problem.

Re: OS390 TSM Performance questions.

2003-02-12 Thread MC Matt Cooper (2838)
Alan,
As can be expected, there have got to be a dozen different things to
look at.  However, from what you have said so far I would cut the maximum
number of concurrent sessions in half.  I found that when I get too many
concurrent sessions, TSM thrashes.  I just started backing up desktops during
the day.  I back up 200+ desktops in about 2.5 hours, averaging about 90GB.
My adapter is ATM but the desktops' network is Fast Ethernet.  To get my
desktop window to work I did the following:
The clients are kicked off by POLLING
MAXSCHEDSESSIONS reduced to 25
RANDOMIZE set to 50
MAXCMDRETRIES set to 12
RETRYPERIOD set to 15 minutes
This spread out the start of the backups over the first half of the schedule
window, reduced the concurrent backups, and had the backups that didn't get
a session retry every 15 minutes for 3 hours.
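
For anyone wanting to try the same thing, those values map onto ordinary
administrative commands (the numbers are just what worked here, not
recommendations); MAXSCHEDSESSIONS is a server option in the server options
file, and polling mode comes from SCHEDMODE POLLING in each client's options
file:

   set randomize 50
   set maxcmdretries 12
   set retryperiod 15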
Matt
PS: I am running TSM 5.1.5.4 on z/OS 1.1 on a 9672 x57.  My server backups
that run overnight use multiple adapters.  I expand the concurrent
sessions to 50 for them and manage to get the first 100GB of 300GB moved in
about 3.5 hours.
Ask your network guys what the network is doing overnight, or dial in and look
at CPU utilization and the rate of the adapter (TSO RMFMON, then PFK4, shows
the MBytes/sec for that CHPID; OSA-Es report as CHPID IDs).
 -Original Message-
From:   Alan Davenport [mailto:[EMAIL PROTECTED]]
Sent:   Wednesday, February 12, 2003 10:46 AM
To: [EMAIL PROTECTED]
Subject:    OS390 TSM Performance questions.

Hello,

We're running TSM v5.1.5.4 on an IBM 20660A2 processor running OS390
v10. There is a 100Mbit, single port OSA card on the processor. We are
backing up 197 clients per night. MAXSCHEDSESSIONS is set to allow 116
simultaneous backup sessions. Our backup window begins at 20:00 and ends at
07:30 the next morning. We are seeing poor performance on our backups during
the window.  For example, one server that will back up in 6-7 minutes outside
the window takes hours to complete during the window. The TSM server has a
region size of 1280M and MPTHREADING is set to YES. Self-tuning buffer size
and TXN size are enabled. We are backing up to a 100GB disk buffer on an EMC
model 8830 drive array. On average we back up 30-40GB per night with a peak
of 75-80GB.

I know there are much larger shops backing up many more servers out
there running OS390 also. What I would like to know is, on large shops, what
is your OSA configuration? Are you running multi-port OSAs and/or gigabit
cards? For comparison, I would also like to know how many clients you are
backing up per night. Where do you think the bottleneck is? Have you seen
similar problems and what did you do to help alleviate the problem? I am
fairly confident that TSM is not CPU constrained during the window. We
recently moved TSM to a higher service class with little effect on the
problem.  Do you feel we are saturating the OSA card?

Any thoughts and suggestions would be greatly appreciated.

  Take care,
   Al

Alan Davenport
Senior Storage Administrator
Selective Insurance Co. of America
[EMAIL PROTECTED]
(973) 948-1306



Re: OS390 TSM Performance questions.

2003-02-12 Thread John Naylor
Alan,
We are at TSM 4.2.2, region=512M, on OS/390 2.10 with a gigabit OSA.
My summary stats from last night for 102 clients show:
TOTAL FILES BACKED UP:   214388
TOTAL MEGABYTES BACKED UP:   106219
TOTAL NETWORK TIME(MINUTES): 2723
TOTAL BACKUP TIME(MINUTES):  4863
You should be able to see from your query active statistics (not 100% but near
enough) where your clients are spending their time.
You need to look at these values to see whether you are network constrained
Data transfer time:
Network data transfer rate:
Aggregate data transfer rate:
Elapsed processing time:

If the majority of the elapsed processing time is accounted for by the data
transfer time, then you likely have a network issue.
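As a rough illustration from the summary figures above, 2723 of the 4863 total
backup minutes (a little over half) were network time, which still leaves
plenty of headroom; if that fraction were creeping up towards 90-100% of the
elapsed time, the network would be the obvious place to look first.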
You do not say if anything else is using the network overnight, as this will have
an impact.
Also, are the backups spread evenly during your window?
Schedule some half-hourly q storage checks to cover your backup window; this will give
you a good clue as to whether there is a particular period of the night
when TSM throughput drops right off.
Hope that helps,
John




Alan Davenport <[EMAIL PROTECTED]> on 02/12/2003 03:46:23 PM

Please respond to "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>

To:   [EMAIL PROTECTED]
cc:(bcc: John Naylor/HAV/SSE)
Subject:  OS390 TSM Performance questions.



Hello,

We're running TSM v5.1.5.4 on an IBM 20660A2 processor running OS390
v10. There is a 100Mbit, single port OSA card on the processor. We are
backing up 197 clients per night. MAXSCHEDSESSIONS is set to allow 116
simultaneous backup sessions. Our backup window begins at 20:00 and ends at
07:30 the next morning. We are seeing poor performance on our backups during
the window.  For example, one server that will back up in 6-7 minutes outside
the window takes hours to complete during the window. The TSM server has a
region size of 1280M and MPTHREADING is set to YES. Self-tuning buffer size
and TXN size are enabled. We are backing up to a 100GB disk buffer on an EMC
model 8830 drive array. On average we back up 30-40GB per night with a peak
of 75-80GB.

I know there are much larger shops backing up many more servers out
there running OS390 also. What I would like to know is, on large shops, what
is your OSA configuration? Are you running multi-port OSAs and/or gigabit
cards? For comparison, I would also like to know how many clients you are
backing up per night. Where do you think the bottleneck is? Have you seen
similar problems and what did you do to help alleviate the problem? I am
fairly confident that TSM is not CPU constrained during the window. We
recently moved TSM to a higher service class with little effect on the
problem.  Do you feel we are saturating the OSA card?

Any thoughts and suggestions would be greatly appreciated.

  Take care,
   Al

Alan Davenport
Senior Storage Administrator
Selective Insurance Co. of America
[EMAIL PROTECTED]
(973) 948-1306










**
The information in this E-Mail is confidential and may be legally
privileged. It may not represent the views of Scottish and Southern
Energy plc.
It is intended solely for the addressees. Access to this E-Mail by
anyone else is unauthorised. If you are not the intended recipient,
any disclosure, copying, distribution or any action taken or omitted
to be taken in reliance on it, is prohibited and may be unlawful.
Any unauthorised recipient should advise the sender immediately of
the error in transmission.

Scottish Hydro-Electric, Southern Electric, SWALEC and S+S
are trading names of the Scottish and Southern Energy Group.
**



OS390 TSM Performance questions.

2003-02-12 Thread Alan Davenport
Hello,

We're running TSM v5.1.5.4 on an IBM 20660A2 processor running OS390
v10. There is a 100Mbit, single port OSA card on the processor. We are
backing up 197 clients per night. MAXSCHEDSESSIONS is set to allow 116
simultaneous backup sessions. Our backup window begins at 20:00 and ends at
07:30 the next morning. We are seeing poor performance on our backups during
the window.  For example, one server that will back up in 6-7 minutes outside
the window takes hours to complete during the window. The TSM server has a
region size of 1280M and MPTHREADING is set to YES. Self-tuning buffer size
and TXN size are enabled. We are backing up to a 100GB disk buffer on an EMC
model 8830 drive array. On average we back up 30-40GB per night with a peak
of 75-80GB.

I know there are much larger shops backing up many more servers out
there running OS390 also. What I would like to know is, on large shops, what
is your OSA configuration? Are you running multi-port OSAs and/or gigabit
cards? For comparison, I would also like to know how many clients you are
backing up per night. Where do you think the bottleneck is? Have you seen
similar problems and what did you do to help alleviate the problem? I am
fairly confident that TSM is not CPU constrained during the window. We
recently moved TSM to a higher service class with little effect on the
problem.  Do you feel we are saturating the OSA card?
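
(As a back-of-the-envelope check: 100 Mbit/s is roughly 12 MB/s, on the order
of 40-45 GB per hour, so an 11.5-hour window could in theory carry several
hundred GB; our 30-80 GB per night averaged over the whole window is nowhere
near that, so if the card is the problem it would have to be bursts of many
concurrent sessions rather than sustained saturation.)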

Any thoughts and suggestions would be greatly appreciated.

  Take care,
   Al

Alan Davenport
Senior Storage Administrator
Selective Insurance Co. of America
[EMAIL PROTECTED]
(973) 948-1306