Re: Our TSM system is a mess. Suggestions? Ideas?

2010-02-15 Thread Jerry Michalak
Look at that Clariion system. While you may have multiple links to the SAN, the
Clariion might not. I ran into this where our Clariion had lots of disks
(spindles) but only 2 ports for I/O into the box. It turned out we were running
both ports at over 80%!

Try to get your DB and log volumes on local disks for better throughput.



Jerry Michalak
jerry_...@yahoo.com

From: Dury, John C. jd...@duqlight.com
To: ADSM-L@VM.MARIST.EDU
Sent: Sun, February 14, 2010 7:12:16 AM
Subject: [ADSM-L] Our TSM system is a mess. Suggestions? Ideas?

We have about 500 nodes and a backup window from 5pm until 7am. I have our
backup schedule set up so that about 30 nodes do an incremental per hour, with
a few exceptions. We have a 3T disk storage pool and 4 LTO4 drives in our tape
library. Our dbbackuptrigger is set at logfull 30% and numincrementals of 4.
Our recovery log is filling up almost once per hour while backups are running
and not emptying fast enough before it hits 80%, at which point all backups
come to a crawl until it is emptied below 80%. Sometimes the recovery log is
pinned at 70% or so and another backup kicks off immediately, which again does
not empty fast enough, and the whole system goes into slowdown after the
recovery log is past 80%. Expiration, which used to run in about 6 hours, is
not completing even after running for 24 hours. Our DB is about 97gig and about
74% full. The recovery log is maxed at 13gig. I don't see anything out of the
ordinary in the activity log. The TSM server is AIX 5.3.10.1 TL10 running on an
IBM 9131-52A in a logical partition with 20 CPUs configured and about 32G of
RAM. The TSM DB and disk storage pools are attached to a Clariion CX3-80 via 4G
HBAs. I have the recovery log and TSM DB set to use different HBAs than the
disk or tape storage pools so the HBAs aren't fighting each other. I've read
the tuning and performance manual and matched our settings to its suggestions,
with some small exceptions.

We have purchased new hardware, a monster of a box, to move the whole system to
Linux, since we want to get to TSM v6.x eventually, hopefully sooner rather
than later. AIX hardware and support are tremendously expensive compared to an
Intel-based box and, like a lot of people, we have a very small budget for
anything IT related.

One of the biggest problems we are having is the recovery log filling up too
quickly and not emptying fast enough. Even with a log full trigger of 30%, the
incremental backup won't finish before the recovery log hits 80%, and with the
log full setting so low, we are doing TSM DB backups almost every hour while
clients are backing up. This really seems excessive to me. Why would an
incremental backup of the TSM DB take an hour or so to run, and is it normal
for the recovery log to fill up so fast while backups are running?

We even attempted a reorg of the TSM DB, but unfortunately it was going to run
for much longer than our window allowed, so it had to be cancelled. I'm going
to try again next weekend and hopefully talk the powers that be into a 24 hour
window for the reorg. We did a reorg years ago and the performance improvements
were amazing, i.e. expiration ran in less than an hour. I know that is a
band-aid, but I have to do something until I can get to version 6, when I can
have a bigger recovery log and a new, more powerful server in place.

I guess I'm just not sure what to look at at this point and frankly I'm
exhausted. Our help desk is calling me every day at 6am or earlier because TSM
is running slow again.

Any suggestions on what else to look at? (Sorry for such a fragmented email.
I've had about 3 hours sleep at this point)
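
As a rough illustration of the numbers in the post above, the following Python
sketch works out how much recovery log space sits between the 30% backup
trigger and the 80% slowdown point. It assumes the log fills linearly and is
only emptied when the triggered DB backup completes; both are simplifying
assumptions, and the function names are made up for this example.

```python
# Rough headroom estimate between the DB backup trigger and the slowdown point.
# Assumption: the log fills at a steady rate and is not emptied until the
# triggered DB backup finishes. Real fill rates vary with client activity.

LOG_SIZE_GB = 13.0      # recovery log size quoted in the post
TRIGGER_PCT = 30.0      # dbbackuptrigger logfull percentage quoted in the post
SLOWDOWN_PCT = 80.0     # utilization at which backups slow to a crawl
DB_BACKUP_HOURS = 1.0   # approximate duration of an incremental DB backup


def headroom_gb(log_size_gb, trigger_pct, slowdown_pct):
    """Log space available between the trigger firing and the slowdown point."""
    return log_size_gb * (slowdown_pct - trigger_pct) / 100.0


def max_fill_rate_gb_per_hour(log_size_gb, trigger_pct, slowdown_pct, backup_hours):
    """Highest steady fill rate the log can absorb while the DB backup runs."""
    return headroom_gb(log_size_gb, trigger_pct, slowdown_pct) / backup_hours


if __name__ == "__main__":
    room = headroom_gb(LOG_SIZE_GB, TRIGGER_PCT, SLOWDOWN_PCT)
    rate = max_fill_rate_gb_per_hour(
        LOG_SIZE_GB, TRIGGER_PCT, SLOWDOWN_PCT, DB_BACKUP_HOURS
    )
    print(f"headroom: {room:.1f} GB, tolerable fill rate: {rate:.1f} GB/hour")
```

Under these assumptions there are about 6.5 GB of headroom, so a one-hour DB
backup only keeps ahead of the log if clients generate less than roughly
6.5 GB of log data per hour.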


Re: Our TSM system is a mess. Suggestions? Ideas?

2010-02-15 Thread Huebner,Andy,FORT WORTH,IT
Verify that the CX3-80 is using different physical disks for DB, log and disk
pools. Your AIX server can easily outrun a CX3-80 unless care is taken. Also
make sure that the disks are spread between the 2 SPs in the CX3-80.
We are running 500 clients on an AIX LPAR with .9 CPU (can steal up to 3 in an
8-way box) and our disk pools are on an overworked CX-300. DB and logs are on
DMX. To me it sounds like the disks may be configured in a less than optimal
way and, as others have said, find and fix/reschedule the nodes that are
pinning the log.

Andy Huebner



Re: Our TSM system is a mess. Suggestions? Ideas?

2010-02-15 Thread Dury, John C.
First, I want to thank all of you for your replies. I definitely got some good
ideas and have some things to look at. I'm going to make some changes to where
the DB and recovery log are stored. Right now, they are in the same RAID 1
group with 2 133g drives. I created a new LUN of 6 133g drives set up as
RAID 1/0. Eventually all of this will be moved to a bigger box with the storage
pools, DB and recovery log living on local disks.
Thanks everyone!


Re: Our TSM system is a mess. Suggestions? Ideas?

2010-02-15 Thread Marcel Anthonijsz
<plug>
When running AIX (or Linux), run nmon to keep an eye on your system's
performance.

  http://www.ibm.com/developerworks/aix/library/au-analyze_aix/

And of course put the collected nmon files through nmon analyser:

  http://www.ibm.com/developerworks/aix/library/au-nmon_analyser/

We do this once a month or so (if time permits) to identify bottlenecks and
hot spots.

</plug>

Good luck,





--
Kind Regards, Groetje,

Marcel Anthonijsz


Re: Our TSM system is a mess. Suggestions? Ideas?

2010-02-14 Thread Lamb, Charles P.
Hi...

We have a similar TSM system. Our TSM DB (over 400GB) is only about 1/3 full
and we proactively run incrementals. We have fourteen LTO3 tape drives
directly connected using 4Gbps FC adapters. The server is an IBM 9155-55A with
8-way/64GB of memory, and an IBM SVC (four nodes)/FAStT system using DS4800s
provides about 6TB of TSM disk cache over four 4Gbps FC adapters. Using fast
disk space helps with TSM DB backups and other TSM activities. Our server
environment is SAP R/3 landscapes on RISCs, Intel/MS and VMware farms, etc.

I would think increasing the TSM DB size and using a faster disk system would
help. Placing SVC in front of the disk space helps with caching the data.



Re: Our TSM system is a mess. Suggestions? Ideas?

2010-02-14 Thread Lamb, Charles P.
BTW, we run TSM V5.5.2.0 with an IBM 3584-L32 and three 3584-D32s. A TSM system
needs I/Os, I/Os, I/Os and fast I/Os, and a lot of disk space.



Re: Our TSM system is a mess. Suggestions? Ideas?

2010-02-14 Thread Grigori Solonovitch
Hello John,
I am sorry, maybe I do not understand your request, but I have very simple
advice: just increase the logs at least 3-4 times. You will have enough time
for everything.
I have 130 nodes and a 16GB database with 8GB logs, and sometimes it is not
enough. I think 13GB logs for such a big TSM database and such a large number
of nodes is not enough.
Regards,
Grigori




Re: Our TSM system is a mess. Suggestions? Ideas?

2010-02-14 Thread Marcel Anthonijsz
John,

A few things come to mind. Which nodes are pinning the recovery log? In my
experience it is always a few slow nodes (typically with a lot of small files)
that pin the log. Try to find out which ones do, and try to improve these
nodes so that they back up faster. Hell of a job when you have 500 nodes, but
try to find those that take longer than 4-5 hours or have really slow
throughput. A speed/duplex mismatch on a TSM client has killed my log
performance more than once. You can look in TSM reporting for the slowest
nodes.
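
A minimal Python sketch of that search, assuming you have already exported
per-node elapsed time and data volume from your reporting into a list. The
record layout, the node names and the 1 MB/s throughput floor are hypothetical;
only the 4 hour limit comes from the advice above.

```python
from dataclasses import dataclass


@dataclass
class BackupSession:
    """One node's nightly backup, as exported from reporting (hypothetical layout)."""
    node: str
    hours: float       # elapsed backup time
    gigabytes: float   # data transferred

    @property
    def mb_per_sec(self) -> float:
        # Treat a zero or negative duration as zero throughput so it gets flagged.
        if self.hours <= 0:
            return 0.0
        return self.gigabytes * 1024 / (self.hours * 3600)


def slow_nodes(sessions, max_hours=4.0, min_mb_per_sec=1.0):
    """Return sessions that ran too long or too slowly, longest first."""
    flagged = [
        s for s in sessions
        if s.hours > max_hours or s.mb_per_sec < min_mb_per_sec
    ]
    return sorted(flagged, key=lambda s: s.hours, reverse=True)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        BackupSession("NODE_A", hours=6.0, gigabytes=20.0),
        BackupSession("NODE_B", hours=0.5, gigabytes=10.0),
        BackupSession("NODE_C", hours=2.0, gigabytes=1.0),
    ]
    for s in slow_nodes(sample):
        print(f"{s.node}: {s.hours:.1f} h, {s.mb_per_sec:.2f} MB/s")
```

Nodes flagged this way are candidates for a closer look (network settings,
small-file workloads, scheduling), not proof that they are pinning the log.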

IMHO, I think that TSM 6.1.x will not solve your problem.

Another solution would be to turn the cell phone off every other day ;-)


Good luck,





--
Kind Regards, Groetje,

Marcel Anthonijsz


Re: Our TSM system is a mess. Suggestions? Ideas?

2010-02-14 Thread Richard Sims

On Feb 14, 2010, at 11:32 AM, Grigori Solonovitch wrote:

> Hello John,
> I am sorry, maybe I do not understand your request, but I have very
> simple advice - just increase logs at least 3-4 times. You will have
> enough time for everything.
> I have 130 nodes and 16Gb database with 8GB logs and sometimes it is
> not enough. I think 13GB logs for so big TSM database and big number
> of nodes is not enough.


The architectural size limit for the TSM4,5 Recovery Log is 13 GB.

Recovery Log pinning is often due to scheduling congestion or data
transmission speed of clients, usually involving very large files.

Richard Sims


Re: Our TSM system is a mess. Suggestions? Ideas?

2010-02-14 Thread Remco Post


Hi John,

It looks like you may have a few nodes that are backing up much more slowly
than the majority. You could try to reduce the transaction size for those
nodes; that could help if these nodes are not backing up just a single huge
file. If you really need to, move these nodes off to a separate TSM instance on
the same server.

Check out the bufferpool: in 'q db' you'll find the cache hit percentage. If
that drops, your database is hitting the disk more often. Below 98% is
unacceptable; above 99% is recommended. You do mention the type of controller,
but not the type of disks. There is a lot to be gained by using LUNs that
either stripe across a huge number of very fast disks, or by setting up (in
your case) about 4 to 6 dedicated RAID-1 LUNs of 15k RPM disks for the
database.
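
The cache hit thresholds above can be written down as a small Python check.
This is only a sketch: the function name is made up, the value is assumed to
have been read off 'q db' by hand, and the "acceptable" label for the 98-99%
band is an assumption, since the advice above only names the two ends.

```python
def classify_cache_hit(pct: float) -> str:
    """Classify a database cache hit percentage using the thresholds above."""
    if not 0.0 <= pct <= 100.0:
        raise ValueError(f"cache hit percentage out of range: {pct}")
    if pct < 98.0:
        return "unacceptable"   # database is going to disk too often
    if pct > 99.0:
        return "recommended"
    return "acceptable"         # assumed label for the band in between


if __name__ == "__main__":
    for value in (97.5, 98.5, 99.4):
        print(f"{value:.1f}% -> {classify_cache_hit(value)}")
```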

It sounds like you're using the log in roll-forward mode. This is of course the
recommended setting, but it might be worsening the problem. You might want to
think about using normal mode until you upgrade to 6.1.

Btw, it sounds like you have quite a large LPAR for your TSM server, much
larger than needed. With a database of this size, I'd guess that 2 to 4 CPUs
and 4 GB of RAM should be plenty. Do you run other applications on your TSM
LPAR?

-- 
Met vriendelijke groeten/Kind Regards,

Remco Post
r.p...@plcs.nl
+31 6 248 21 622