Re: Dwindling Performance
I ~LOVE~ to hear words like that ;-)

BUY MORE MEMORY
BUY MORE MEMORY...
YOU ARE FEELING VERY SLEEPY

Ben
(FYI, I work for Micron, a DRAM manufacturer)
http://www.crucial.com or http://www.micron.com

Before I get inundated with "don't solicit" flame-mail, I'm just kidding, guys :-)

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of Chris Murphy
Sent: Thursday, January 15, 2004 2:57 PM
To: [EMAIL PROTECTED]
Subject: Re: Dwindling Performance

> I completely agree with Roger: memory is too cheap to rob your system of it!
Re: Dwindling Performance
> Memory is so cheap that you can afford to throw lots of it at it.

> I'm running TSM 5.1.8 on Win2K with 2G RAM.

Hi Dwight,

I wanted to throw out one note of caution. I completely agree with Roger: memory is too cheap to rob your system of it! However, you said you are running your TSM server on a Windows 2000 box. Remember, 32-bit Windows limits each application's address space to 2GB. This can be increased to 3GB with a startup option, at the expense of some operating system memory space (i.e., use it carefully!). Thus, you cannot allocate all 2GB to bufferpool cache, as TSM still needs additional space for object creation, I/O buffering, and its own execution.

We have found that we could only safely allocate 512-768MB to the bufferpool. If we increased it beyond that, TSM would dump (go down HARD) when under load. While our DB is rather small (20GB, 40% utilized), we still get a cache hit percentage of 99.9%.

Other options are: use a different OS, use an Itanium-class server, or talk IBM into adding AWE support when they write/compile TSM for Windows (least likely, I think <8) ).

HTH

Chris Murphy
IT Network Analyst
Idaho Dept. of Lands
(208) 334-0293
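The address-space arithmetic in Chris's note can be sketched in a few lines. This is only an illustration under stated assumptions: the 2GB limit and the 512-768MB range come from the message above, the helper names are made up, and the idea that BUFPOOLSIZE is expressed in kilobytes is an assumption to check against the server documentation for your release.

```python
# Illustrative only: figures come from the message above; the helper names
# are hypothetical and not part of TSM.

USER_ADDRESS_SPACE_MB = 2048   # default per-process limit on 32-bit Windows
REPORTED_SAFE_MAX_MB = 768     # upper end of the range Chris found stable


def headroom_mb(bufpool_mb, address_space_mb=USER_ADDRESS_SPACE_MB):
    """Address space left for everything other than the buffer pool."""
    return address_space_mb - bufpool_mb


def bufpoolsize_kb(bufpool_mb):
    """Convert megabytes to kilobytes (assumed unit of BUFPOOLSIZE)."""
    return bufpool_mb * 1024


for mb in (512, 768, 1024, 2048):
    note = "within" if mb <= REPORTED_SAFE_MAX_MB else "above"
    print(f"{mb:>5} MB pool -> {headroom_mb(mb):>5} MB left, "
          f"BUFPOOLSIZE {bufpoolsize_kb(mb)} ({note} the reported safe range)")
```

Another poster in this thread reports running a 1 GB buffer pool on Windows 2000, so the safe ceiling is evidently site-specific; treat the 768MB figure as one data point, not a rule.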
Re: Dwindling Performance
Note that with SELFTUNEBUFPOOLSIZE, there are some limits on the size of the bufferpool. It used to be 10% of real memory for Windows. (With 2GB of memory, you are not at that limit.)

Check for the following message after expire inventory:

    ANR0386I The BUFPoolsize has been changed to x

You get this if you have self-tuning on. But if it keeps on changing to the current value, then you are probably at the limit. You then have to turn SELFTUNEBUFPOOLSIZE off and tune manually.

BTW, we have 2GB memory on a Windows 2000 server and set BUFPOOLSIZE to 1 GB.

-----Original Message-----
From: Roger Deschner [mailto:[EMAIL PROTECTED]
Sent: January 15, 2004 1:56 PM
To: [EMAIL PROTECTED]
Subject: Re: Dwindling Performance

> Memory is so cheap that you can afford to throw lots of it at it.
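The check described above (does ANR0386I keep reporting the same value?) could be automated roughly as follows. This is a hedged sketch: the message text is taken from the post, but the sample log lines are invented, the assumption that the value is printed as plain digits is mine, and both function names are hypothetical.

```python
import re

# Message text as quoted in the post above; the numeric format of "x" is an
# assumption (plain digits, no thousands separators).
ANR0386I = re.compile(r"ANR0386I The BUFPoolsize has been changed to (\d+)")


def bufpool_changes(actlog_lines):
    """Return the BUFPoolsize values reported by ANR0386I, in log order."""
    values = []
    for line in actlog_lines:
        match = ANR0386I.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


def looks_pinned(values, repeats=3):
    """True if the last `repeats` reported values are all identical."""
    return len(values) >= repeats and len(set(values[-repeats:])) == 1


# Made-up activity log lines, for illustration only.
sample = [
    "01/13/2004 19:37:58 ANR0386I The BUFPoolsize has been changed to 32768.",
    "01/14/2004 19:58:20 ANR0386I The BUFPoolsize has been changed to 32768.",
    "01/15/2004 20:01:07 ANR0386I The BUFPoolsize has been changed to 32768.",
]
values = bufpool_changes(sample)
print(values, "pinned at limit?", looks_pinned(values))
```

If the value repeats run after run, that matches the "probably at the limit" case described above, and manual tuning is the next step.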
Re: Dwindling Performance
Oh, how I love quoting directly from IBM manuals:

"If the value falls below 98%, consider increasing the size of the database buffer pool. For larger installations, performance could improve significantly if your cache hit percentage is greater than 99%."
--Page 396, TSM V5.1 Administrators Guide for AIX

Memory is so cheap that you can afford to throw lots of it at it.

Roger Deschner
University of Illinois at Chicago
[EMAIL PROTECTED]

On Thu, 15 Jan 2004, Dwight McCann wrote:

>I looked at my migration times which are about 20 minutes (so
>that's no problem) and then did the Q DB F=D and saw that my Cache Hit
>was only 96.81. You said that if it wasn't 99% it needed help.
>
>Is there something I should do to get the cache hits up to 99% even
>though I don't yet have a performance issue?
Re: Dwindling Performance
Roger,

I really enjoyed and appreciated your response on this issue, but you've got me wondering a bit. I'm running TSM 5.1.8 on Win2K with 2G RAM. I looked at my migration times, which are about 20 minutes (so that's no problem), and then did the Q DB F=D and saw that my Cache Hit was only 96.81. You said that if it wasn't 99% it needed help. I have it autotuning and here are my outputs:

tsm: DATAPLUS_SERVER1>q actlog begind=-2 search='expiration'

Date/Time            Message
-------------------- ----------------------------------------------------------
01/13/2004 19:19:22  ANR0984I Process 177 for EXPIRATION started in the
                      BACKGROUND at 19:19:22.
01/13/2004 19:19:22  ANR0811I Inventory client file expiration started as
                      process 177.
01/13/2004 19:37:58  ANR0812I Inventory file expiration process 177 completed:
                      examined 437237 objects, deleting 87348 backup objects, 0
                      archive objects, 0 DB backup volumes, and 0 recovery plan
                      files. 0 errors were encountered.
01/13/2004 19:37:58  ANR0987I Process 177 for EXPIRATION running in the
                      BACKGROUND processed 87348 items with a completion state
                      of SUCCESS at 19:37:58.
01/14/2004 19:37:58  ANR0984I Process 182 for EXPIRATION started in the
                      BACKGROUND at 19:37:58.
01/14/2004 19:37:58  ANR0811I Inventory client file expiration started as
                      process 182.
01/14/2004 19:58:20  ANR0812I Inventory file expiration process 182 completed:
                      examined 433897 objects, deleting 104134 backup objects,
                      0 archive objects, 0 DB backup volumes, and 0 recovery
                      plan files. 0 errors were encountered.
01/14/2004 19:58:20  ANR0987I Process 182 for EXPIRATION running in the
                      BACKGROUND processed 104134 items with a completion state
                      of SUCCESS at 19:58:20.
01/15/2004 09:43:06  ANR2017I Administrator DWIGHT issued command: QUERY ACTLOG
                      begind=-2 search=expiration

tsm: DATAPLUS_SERVER1>q db f=d

          Available Space (MB): 30,000
        Assigned Capacity (MB): 25,292
        Maximum Extension (MB): 4,708
        Maximum Reduction (MB): 19,564
             Page Size (bytes): 4,096
            Total Usable Pages: 6,474,752
                    Used Pages: 1,442,366
                      Pct Util: 22.3
                 Max. Pct Util: 22.7
              Physical Volumes: 2
             Buffer Pool Pages: 32,768
         Total Buffer Requests: 28,394,716
                Cache Hit Pct.: 96.81
               Cache Wait Pct.: 0.00
           Backup in Progress?: No
    Type of Backup In Progress:
  Incrementals Since Last Full: 4
Changed Since Last Backup (MB): 0.98
            Percentage Changed: 0.02
Last Complete Backup Date/Time: 01/15/2004 05:45:26

Is there something I should do to get the cache hits up to 99% even though I don't yet have a performance issue? I realize that my environment is quite small, but it is about to double and I'd rather be proactive.

TIA,

--
Dwight McCann
Computer and Network Technologist, UCSB Info Systems & Computing
[EMAIL PROTECTED], [EMAIL PROTECTED]
http://borg.isc.ucsb.edu/dmm/ - office: 805-893-3113
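To put the Q DB F=D figures above in perspective, here is a small worked calculation. It is a sketch, not a tuning tool: the input numbers are copied from the output above, the 98% and 99% thresholds are the ones from the IBM guidance quoted elsewhere in this thread, and the function names are hypothetical.

```python
# Figures copied from the Q DB F=D output above.
PAGE_SIZE_BYTES = 4096
BUFFER_POOL_PAGES = 32768
TOTAL_BUFFER_REQUESTS = 28394716
CACHE_HIT_PCT = 96.81


def bufpool_mb(pages, page_size_bytes):
    """Buffer pool size in megabytes."""
    return pages * page_size_bytes / (1024 * 1024)


def cache_misses(requests, hit_pct):
    """Approximate number of buffer requests that missed the cache."""
    return round(requests * (100 - hit_pct) / 100)


def assessment(hit_pct):
    """Classify a cache hit percentage using the thresholds quoted in the thread."""
    if hit_pct < 98:
        return "below 98%: consider increasing the buffer pool"
    if hit_pct < 99:
        return "98-99%: larger installations may still benefit from more"
    return "99% or better"


print(bufpool_mb(BUFFER_POOL_PAGES, PAGE_SIZE_BYTES), "MB buffer pool")
print(cache_misses(TOTAL_BUFFER_REQUESTS, CACHE_HIT_PCT), "misses at 96.81%")
print(cache_misses(TOTAL_BUFFER_REQUESTS, 99.0), "misses if the hit rate were 99%")
print(assessment(CACHE_HIT_PCT))
```

The pool works out to 128 MB on a machine with 2G of RAM, and reaching 99% would cut misses from roughly 906,000 to roughly 284,000 for the same request count.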
Re: Dwindling Performance
Hmmm, interesting that the expire inventory grinds to a halt during incremental backups. My setup is AIX, similar to yours (host, DB size), although my disks are on locally attached SSA drives.

I recently upgraded my 8 TSM servers from TSM 5.1.1.0 to 5.2.1.3 (mainly to get the NDMP file-level backups, finally). On one of them I saw the same issue. They are all set up almost identically, so why one would misbehave is a mystery to me.

To fix the immediate problem, I put a "duration=" on the expire inventory job so that it would only run during the day, when backups are less likely. Sure, the expire inventory now takes 2 days to run, but it's better than having all the backups go extremely slowly and not complete.

I then started to look into the performance issues. Some of the things I have done:

- Changed the DB volumes from JFS to raw (that made a very good improvement).
- Turned on the SSA fast-write cache on the DB volumes.
- Tried out these settings for vmtune (gleaned from this listserv):

  /usr/samples/kernel/vmtune -t10 -P10 -p5 -s1 -W16 -c8 -R256 -F512 -u25 -b2200 -B2200

All of these changes have improved the speed of the expire inventory, but to be honest I haven't tried to run the expire inventory during the incremental backups since. Once bitten, twice shy, and I can live with the expire inventory taking 2 days to complete.

That's kind of where I am now. No solid solution, but performance has improved enough that it's workable. I'd love to hear what other changes you make to resolve your situation.

Ben
Micron Technology Inc.
Boise, Id

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of Andy Carlson
Sent: Wednesday, January 14, 2004 7:07 AM
To: [EMAIL PROTECTED]
Subject: Re: Dwindling Performance

> Expiration is not finishing. Before the main backups start, it maybe
> expires 20 objects, but during the backup window it slows to a crawl.
Re: Dwindling Performance
Andy,

Have you thought about using the TSM Journal Service? If you're building a ton of directories/files but not backing up much, the journal will cut down on the processing and keep your sessions down to a minimum. Just a thought...

Brian Scott
EDS - EOGDE
GM Distributed Management Systems Engineering
MS 3234
750 Tower Drive
Troy, MI 48098
* phone: +01-248-265-4596 (8-365)
* mailto:[EMAIL PROTECTED]

-----Original Message-----
From: Andy Carlson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 14, 2004 9:07 AM
To: [EMAIL PROTECTED]
Subject: Re: Dwindling Performance

> It picks up some during the day when the backups and migrations are
> running, but since we now have some 100 sessions not finished, it's slow
> then too. I didn't look at randomize, but these sessions are staying out
> there for hours.
Re: Dwindling Performance
I'll supply as much information as anyone wants if they have any clues about what could be going on. I will take a look at the current performance book, but I have looked at it in the past.

Andy Carlson
Senior Technical Specialist
BJC Health Care
St. Louis, Missouri
Cat Pics: http://andyc.dyndns.org/animal.html

On Wed, 14 Jan 2004, Richard Sims wrote:

> >We are having terrible performance with one of our instances of TSM. ...
>
> Andy - We'd like to help, but would need a lot more information about the
> context of the issue, and in particular what you've already investigated.
>
> I'd refer you first to the TSM Performance Tuning Guide at
> http://publib.boulder.ibm.com/tividd/td/IBMStorageManagerMessages5.2.2.html
> I also have performance issue summaries in ADSM QuickFacts, compiled from
> our community experiences over the years.
>
> Richard Sims, BU
Re: Dwindling Performance
Thanks for the quick response. Expiration is not finishing. Before the main backups start, it maybe expires 20 objects, but during the backup window it slows to a crawl. It picks up some during the day when the backups and migrations are running, but since we now have some 100 sessions not finished, it's slow then too.

I didn't look at randomize, but these sessions are staying out there for hours. I will take a look at that today. I currently have them doing an incrbydate every other day, and a full incr the other. The cache hit ratio of the database is about 98.5%, but we have about 3.5GB of memory in the cache. I don't think I can go much higher, but I will try it if I can.

P.S. The TSMI clients are Windows and Netware; the TSMU clients are Unix and a couple of VMS.

Thanks for the input.

Andy Carlson
Senior Technical Specialist
BJC Health Care
St. Louis, Missouri
Cat Pics: http://andyc.dyndns.org/animal.html

On Wed, 14 Jan 2004, Roger Deschner wrote:

> My basic metric of whether or not you are in trouble is, how long does
> expiration take? If you start it daily, the closer it is to 24 hours
> running time, the closer you are to doomsday. Never-ending expiration is
> the classic symptom of TSM Server Meltdown.
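Roger's "how long does expiration take?" metric can be computed from the start and completion timestamps in the activity log. The sketch below is only an illustration: the two timestamps are taken from the ANR0984I and ANR0987I lines for process 182 that Dwight posted in this thread, and the function names are hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the activity log excerpts in this thread.
FMT = "%m/%d/%Y %H:%M:%S"


def expiration_hours(started, finished):
    """Elapsed hours between the ANR0984I start and ANR0987I completion times."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds() / 3600


def fraction_of_day(hours):
    """Share of a 24-hour cycle consumed by one daily expiration run."""
    return hours / 24


# Process 182 from the activity log posted in this thread.
hours = expiration_hours("01/14/2004 19:37:58", "01/14/2004 19:58:20")
print(f"{hours * 60:.1f} minutes, {fraction_of_day(hours):.1%} of the day")
```

A run of about 20 minutes is a tiny share of the day; by the same measure, an expiration that never finishes before the next one is due is at or beyond 100%.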
Re: Dwindling Performance
>We are having terrible performance with one of our instances of TSM. ... Andy - We'd like to help, but would need a lot more information about the context of the issue, and in particular what you've already investigated. I'd refer you first to the TSM Performance Tuning Guide at http://publib.boulder.ibm.com/tividd/td/IBMStorageManagerMessages5.2.2.html I also have performance issue summaries in ADSM QuickFacts, compiled from our community experiences over the years. Richard Sims, BU
Re: Dwindling Performance
Andy - what kind of clients do you have? I've had problems with Windows clients all trying to do systemobject backups simultaneously, and had to reorganize my nightly work to split out the systemobject backups from the regular files and then serialize them because of contention problems with the TSM database. It's supposed to be better in 5.2, but I'm not there yet.

Also, you may be right about the directory trees; if your stalls seem to happen at the beginning of the backups, you can look at the CPU being used by the TSM BA client process and see if it's churning away. If so, that may be what's happening.

Andy Carlson <[EMAIL PROTECTED]> wrote:

> Right at this moment, we have 233 sessions with TSMI. The backup
> sessions grind to a halt for hours at a time, with nothing apparently
> happening. I suspect that the directory trees are being downloaded and
> built, but not sure

Joe Howell
Shelter Insurance Companies
Columbia, MO
Re: Dwindling Performance
I have posted many times in the past saying you should never do a database unload/reload to gain performance. But this just might be the one case where it might make sense - the remaining half of a split server. But before you do something that drastic, dangerous, and time consuming, look for the things that are easier to fix.

My basic metric of whether or not you are in trouble is, how long does expiration take? If you start it daily, the closer it is to 24 hours running time, the closer you are to doomsday. Never-ending expiration is the classic symptom of TSM Server Meltdown.

But on the other hand, if your expiration runs nice and fast, your server and its database are probably OK. Look to clients as the problem. They can't all squeeze in the door at once, so don't let them try. If they use the client-polling scheduler, how long is the backup window, and what is your setting for Schedule Randomization Percentage? Make it as high as possible - SET RANDOMIZE 50. This will also help if you are having any kind of a network bottleneck.

Look at these clients on a micro level. About how much are they each actually backing up? If it's not much, then your theory might be right, that they are very busy downloading their lists of backed up files. In that case, load spreading will be the best thing you could do. You might consider a schedule where not every client does a full "Incremental" every night - perhaps they only do one every other night and on the other nights they do an "incrbydate" backup, which is much faster because it goes only by the timestamps in the file system.

Not to ask the obvious, but what's your Database Cache Hit Percentage? (Q DB F=D) If it's below 99%, it needs help. Even (especially) a badly fragmented database will run a lot faster if you have it swimming in cache.

Look at other differences between your two instances - are they basically different types of clients?

Roger Deschner
University of Illinois at Chicago
[EMAIL PROTECTED]
The short fortuneteller who escaped from prison was a small medium at large.

On Tue, 13 Jan 2004, Andy Carlson wrote:

>We are having terrible performance with one of our instances of TSM. I
>have suspicions, but I want to hear what you guys say. Here is what we
>have:
>
>2 instances of TSM - TSMI and TSMU (TSMI is the problem)
>
>TSM 5.2.1.1
>AIX 51.ML4
>RS/6000 P670 - 8 processors, 16GB memory
>Fastt700 SAN
>STK9840 Tape drives
>
>The Database is 85% of 88GB (with room to expand another 50GB or so).
>
>Right at this moment, we have 233 sessions with TSMI. The backup
>sessions grind to a halt for hours at a time, with nothing apparently
>happening. I suspect that the directory trees are being downloaded and
>built, but not sure
>
>When we split TSMI and TSMU, we created the TSMU instance, and did a
>full backup on all the servers that moved there. The TSMI database is a
>restored copy of the original database, with the TSMU stuff deleted out.
>
>Any ideas would be greatly appreciated.
>
>
>Andy Carlson
>Senior Technical Specialist
>BJC Health Care
>St. Louis, Missouri
>Cat Pics: http://andyc.dyndns.org/animal.html
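The load-spreading effect of SET RANDOMIZE can be pictured with a small simulation. This is a rough sketch under assumptions that are not stated in the thread: that the percentage defines the leading portion of the startup window over which client start times are spread, that 50 is the largest accepted value, and that start times are spread uniformly. The 4-hour window and the function name are invented for illustration; only the session count of 233 comes from Andy's message.

```python
import random


def start_offsets(n_clients, window_minutes, randomize_pct, seed=0):
    """Simulated client start offsets (minutes from the window opening).

    Assumption: starts are spread uniformly over the first `randomize_pct`
    percent of the startup window. The real server behaviour may differ.
    """
    if not 0 <= randomize_pct <= 50:
        raise ValueError("randomize_pct is assumed to range from 0 to 50")
    spread = window_minutes * randomize_pct / 100
    rng = random.Random(seed)
    return sorted(rng.uniform(0, spread) for _ in range(n_clients))


# 233 sessions (the count Andy reports) and a hypothetical 4-hour window.
for pct in (0, 25, 50):
    offsets = start_offsets(233, 240, pct)
    print(f"randomize {pct:>2}%: last client starts at minute {offsets[-1]:.0f}")
```

With 0% every client arrives at the moment the window opens; with 50% the same 233 clients arrive over the first two hours, which is the "don't let them all squeeze in the door at once" effect described above.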