Re: TSM performance very poor, Recovery log is being pinned
Switching to RLV:

Some CPU time that would otherwise go to OS overhead for the filesystem is freed. Could this be used for running one more TSM instance?

Memory that would otherwise be used for filesystem cache is freed. Could this be used by TSM for buffers?

I don't know what effect either would have on overall throughput or efficiency, but it would be fascinating to find out.

Doesn't the relative ease of setting up an RLV depend on the filesystem it is compared with? On extent-based filesystems (like NTFS), dsmfmt finishes very quickly, I believe.

[RC]

On Wednesday, August 01, 2007, at 04:32AM, "Richard Sims" <[EMAIL PROTECTED]> wrote:
>The pure simplicity of RLVs makes them a joy to implement, compared
>to the time-consuming work entailed in implementing a TSM volume
>within a file system.
Re: TSM performance very poor, Recovery log is being pinned
On Jul 31, 2007, at 11:59 PM, Stuart Lamble wrote:

> I am not going to enter into a debate about the relative merits of
> raw volumes versus files on filesystems, as I have insufficient
> direct knowledge to judge either way (I'm trusting a more senior
> colleague to make the right call there. :)

I'll jump in anyway...

The pure simplicity of RLVs makes them a joy to implement, compared to the time-consuming work entailed in implementing a TSM volume within a file system. Their simplicity almost makes them mandatory where rapid disaster recovery is vital, as there's far less TSM server set-up time getting in the way of recovering your organization's functionality.

However, RLV access amounts to unbuffered I/O, and that exacts a performance penalty relative to file system volumes, where read-ahead provides a nice boost when the task at hand involves stepping through the volume. I use RLVs, and it's apparent that Migration is relatively sluggish, as in a disk storage pool struggling to stay empty enough to handle all the incoming client backup data so as to prevent some backups having to go directly to tape. The TSM Performance Tuning Guide cautions about this.

Richard Sims, Sr. Systems Programmer at Boston University
Re: TSM performance very poor, Recovery log is being pinned
On 29/07/2007, at 10:03 PM, Stapleton, Mark wrote:

> From: ADSM: Dist Stor Manager on behalf of Craig Ross
> >TSM is installed on Solaris 10
>
> This is something that popped right out for me. Do you have your
> storage pools located on raw logical volumes or mounted filesystems?
> If the latter, that might be your problem. Solaris has traditionally
> had incredibly poor throughput performance on mounted filesystems.
>
> You might give thought to rebuilding those storage pools on raw
> logical volumes. Of course, that will require that you completely
> flush all data from your disk storage pools to tape storage pools
> first, so as not to lose client data.

A small trap for young players: TSM has constraints in place to stop it writing to cylinder 0 of a raw volume on Solaris. If you direct TSM at slice 2, or some other slice that includes cylinder 0, it will barf, and the error message is rather cryptic (sorry for the vagueness; it's been a year or so since I bumped into this. At the time, I was working on the storage pool volume level, but I would expect to see similar behaviour for a DB or log volume.)

Workaround is simple: make slice 0 include the entire disk starting from cylinder 1, and use slice 0 as the raw volume.

I am not going to enter into a debate about the relative merits of raw volumes versus files on filesystems, as I have insufficient direct knowledge to judge either way (I'm trusting a more senior colleague to make the right call there. :)
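One way to spot the cylinder 0 trap before pointing TSM at a slice is to look for slices whose first sector is 0 in the disk's VTOC. The following is only a minimal sketch: it assumes the usual prtvtoc column order (partition, tag, flags, first sector, sector count, last sector), and the function name and the sample table are made up for illustration.

```shell
#!/bin/sh
# Print the slices that start at sector 0 and therefore include cylinder 0.
# Reads prtvtoc-style output on stdin; comment lines start with '*'.
slices_including_cyl0() {
    awk '$1 !~ /^\*/ && NF >= 6 && $4 == 0 { print $1 }'
}

# Made-up sample in prtvtoc layout: slice 2 covers the whole disk,
# slice 0 starts one cylinder in (16065 sectors per cylinder assumed).
sample_vtoc='* Partition  Tag  Flags  First Sector  Sector Count  Last Sector
       0      0    00        16065     143315865    143331929
       2      5    00            0     143331930    143331929'

printf '%s\n' "$sample_vtoc" | slices_including_cyl0
```

On a real Solaris host the input would come from prtvtoc run against the disk in question; any slice the function prints is one to avoid handing to TSM as a raw volume.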
Re: TSM performance very poor, Recovery log is being pinned
In the TSM admin guide for AIX, look up "raw volumes". It has examples.

> Evening I have been watching these comments with interest as we are
> currently in the process of building a new TSM server.
> Discussing with colleagues we are baffled by how you create the TSM log
> or DB on a raw presented Lun without creating at least a JFS for a mount
> point first. We are running AIX 5.3.
>
> Look forward to your response
>
> Regards
Re: TSM performance very poor, Recovery log is being pinned
To create a db with raw volumes . . . .

create a vg: mkvg . . . .
create log vols: mklv for each log vol
create db vols: mklv for each db vol
create db . . .

Here is the script/cmd I used to create a tsm db on the raw vols

rsfebkup7p.fenetwork.com:/tsmdata/tsm3/config==>cat z_tsm3_s1_format_db.ksh

    #!/bin/ksh
    #
    # Set the language
    #
    export LANG=en_US
    #
    # Max out size of data area
    #
    ulimit -d unlimited
    #
    # Allow the server to pack shared memory segments
    #
    export EXTSHM=ON
    # setup to run tsm3
    cd /tsmdata/tsm3/config
    export PATH=${PATH}:/usr/tivoli/tsm/server/bin
    export DSMSERV_DIR=/usr/tivoli/tsm/server/bin
    export DSMSERV_CONFIG=/tsmdata/tsm3/config/dsmserv.opt
    export DSMSERV_ACCOUNTING_DIR=/tsmdata/tsm3/config
    dsmserv_tsm3 format 3 /dev/rtsm3log01 \
                          /dev/rtsm3log02 \
                          /dev/rtsm3log03 \
                        9 /dev/rtsm3db01 \
                          /dev/rtsm3db02 \
                          /dev/rtsm3db03 \
                          /dev/rtsm3db04 \
                          /dev/rtsm3db05 \
                          /dev/rtsm3db06 \
                          /dev/rtsm3db07 \
                          /dev/rtsm3db08 \
                          /dev/rtsm3db09

On 07/31/2007 08:16 AM, Mark Scott wrote:

> Evening I have been watching these comments with interest as we are
> currently in the process of building a new TSM server.
> Discussing with colleagues we are baffled by how you create the TSM log
> or DB on a raw presented Lun without creating at least a JFS for a mount
> point first. We are running AIX 5.3.
>
> Look forward to your response
>
> Regards
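On the question of needing a JFS mount point: on AIX each logical volume gets a character device named /dev/r<lvname>, and that device is what the format command in the script is given, so no file system is involved. Below is a rough sketch of the "mklv for each vol" step, written as a dry run that only prints the commands. The volume group name tsm3vg and the logical partition count are assumptions, not values from the post; only the LV naming pattern is taken from the script.

```shell
#!/bin/sh
# Dry run: print one mklv command per raw logical volume.
# VG name and LP count are hypothetical; adjust before running anything for real.
VG=tsm3vg
LPS=16

print_mklv_cmds() {   # usage: print_mklv_cmds <prefix> <count>
    prefix=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # LV names follow the script's pattern: tsm3log01 -> /dev/rtsm3log01
        printf 'mklv -y %s%02d %s %s\n' "$prefix" "$i" "$VG" "$LPS"
        i=$((i + 1))
    done
}

print_mklv_cmds tsm3log 3
print_mklv_cmds tsm3db 9
```

Printing the commands first makes it easy to review the names and sizes before creating anything; the three log and nine DB volumes match the counts passed to the format command.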
Re: TSM performance very poor, Recovery log is being pinned
Evening

I have been watching these comments with interest as we are currently in the process of building a new TSM server. Discussing with colleagues we are baffled by how you create the TSM log or DB on a raw presented Lun without creating at least a JFS for a mount point first. We are running AIX 5.3.

Look forward to your response

Regards
Re: TSM performance very poor, Recovery log is being pinned
Thanks guys.

All this advice is much appreciated. For the record, my TSM server seems to have returned to more normal routines. However, the log is still being pinned: it fills up to about 12% and then flushes, which is not ideal but acceptable. I am starting to wonder if my install has always had the log pinning issue and I just did not know!! I will keep a close eye on it, but I feel that ultimately I will install a second TSM server and migrate some nodes to it.

I did find a badly configured batch of new SAN storage which was contributing to the slowdown/pinning. I have stopped using this particular disk and performance has returned to normal. This was not obvious before because the server was busy. I could not pinpoint it easily because, once I stopped the trouble spot, TSM did not jump back into life straight away; it had to process the backlog of requests. And of course I could not afford to leave the server on go-slow for long periods.

As a result I am currently reviewing all the newly installed storage to ensure it is running optimally.

I will propose to management that we install new fast disk for the log and DB, as the pinning is still an issue.

However, this is where the debate continues. I have experimented with DB volumes, log volumes and storage volumes on FS and on raw volumes and have not seen any performance difference (on the same disks), i.e. I have deleted volumes, created them as RAW and as FS, and seen no difference. I have read material on both sides of this story and neither seems more convincing than the other. The only eyebrow raiser I remember reading about was that with RAW on Solaris you can have issues; I cannot remember the exact issue, but the potential is there. So, after testing both formats and finding no difference, I use FS for everything. If this is an indisputable mistake, please let me know.

Also, we have 10 x 10 GB volumes for the DB. Should I create more, smaller ones?

While I am here, I have another cloudy area. Since upgrading the server from 5.1 to 5.3, installing the IBM tape device driver and adding 4 LTO3 drives, I swear my 6 old LTO1 drives are running slower than before. Are there some gotchas when installing IBM Tape to get drives running well? The 6 LTO1 drives are SCSI attached and the 4 new LTO3 drives are fibre.

Cheers

A happier TSM administrator the last 2 days :>
Re: TSM performance very poor, Recovery log is being pinned
I think you are right about the Log - it need not be spread across multiple volumes. It's only got one writer.

Your RAID type can affect the performance of the Disk Storage Pools and the Database dramatically. In particular, RAID5 is very poorly suited for the Disk Storage Pools, because their I/O is 50% writes. RAID5 is also not ideal for the Database, though it can be tolerated for the Log. RAID10 is much better.

You should be using fast disks, not SATA, for the primary Disk Storage Pools. I've got 10,000rpm IBM SSA disks for these.

I use RAID10 for the Disk Storage Pools. I use JBOD disks with TSM mirroring for the Log and Database. This is slightly slower than OS mirroring or RAID-array mirroring, but it is somewhat safer. Each physical volume for Storage Pools and Database is broken into many Logical Volumes.

You should be saving your fastest disks for the Database. I've got 15,000rpm disks for the Database. When I moved the Database from 10,000rpm disks to 15,000rpm disks, everything in TSM got noticeably faster. For instance, DB backups now take 1/3 less time. RAID boxes just get in the way for the Database; it really runs best on JBOD disks with TSM doing the mirroring.

Here's a controversial paper written by a guy at Oracle. He says you should "Stripe And Mirror Everything" (S.A.M.E.) I've read and reread this several times, and while I definitely do not agree with everything said, it does raise some very interesting points that definitely apply to TSM. For one thing he strongly advocates RAID10, as do I.
http://www.oracle.com/technology/deploy/availability/pdf/oow2000_same.pdf

Most of my Log pinning problems have been caused by clients. If a client suffers a networking problem (typically a half-duplex vs. full-duplex conflict) and if that client tries to back up a large file such as a movie, that can pin the log on our system until it fills completely. Minimum throughput controls in TSM can help here, though it can still happen.

I wrote a daemon that watches the Log fullness and if it gets to about 70% it cancels the session that has the Log pinned. I still have problems, because the cancel command can take hours to work if the client is backing up a large file slowly. If the Log gets to 95% it does a TSM shutdown command, which is vastly easier to recover from than a 100% full log. At least with a full TSM shutdown, our novice sysadmin's first impulse, which is to try to restart it, is generally a good thing to do. It usually restarts with an empty Log in these cases, so they can claim, "I fixed it!" without knowing the underlying complexities.

Roger Deschner
University of Illinois at Chicago
[EMAIL PROTECTED]
"Standards are great. That's why there are so many of them."
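The two thresholds Roger describes for his log watcher can be sketched as a small decision function. This is not his daemon, only an illustration of the 70% / 95% rule: the function name is made up, and how the utilisation figure is obtained and how the pinning session is identified are deliberately left out.

```shell
#!/bin/sh
# Thresholds from the description above: about 70% cancel, 95% shut down.
CANCEL_AT=70
HALT_AT=95

# Map a recovery log utilisation (percent, may have decimals) to an action.
log_action() {
    pct=${1%%.*}            # drop any fractional part: 71.3 -> 71
    if [ "$pct" -ge "$HALT_AT" ]; then
        echo halt           # shut the server down before the log hits 100%
    elif [ "$pct" -ge "$CANCEL_AT" ]; then
        echo cancel         # cancel the session that has the log pinned
    else
        echo none
    fi
}

# Try a few sample readings.
for sample in 12.0 70.4 94.9 95 100; do
    printf '%s%% -> %s\n' "$sample" "$(log_action "$sample")"
done
```

Checking the shutdown threshold first means a reading at or above 95% never falls through to the slower cancel path, which matters given that a cancel can take hours to complete.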
Re: TSM performance very poor, Recovery log is being pinned
Could you elaborate on why the log should be in smaller volumes? I have always heard the DB should, because it opens multiple threads with multiple volumes, but since the log is written sequentially for the most part, I can't figure out why it should be in multiple volumes. Thanks.

On 7/30/07, Charles A Hart <[EMAIL PROTECTED]> wrote:
>
> Your DB and Log should be RAW as well, and in small vols (i.e. a 12GB log
> should be in 2-3GB vols; DB vols, depending on the size of the DB, should
> be 5-10GB vols). Also try to make sure the raw logical vols are evenly
> spread across as many LUNs as possible.
>
> Charles Hart

--
Andy Carlson
---
Gamecube:$150,PSO:$50,Broadband Adapter: $35, Hunters License: $8.95/month, The feeling of seeing the red box with the item you want in it:Priceless.
Re: TSM performance very poor, Recovery log is being pinned
Your DB and Log should be RAW as well, and in small vols (i.e. a 12GB log should be in 2-3GB vols; DB vols, depending on the size of the DB, should be 5-10GB vols). Also try to make sure the raw logical vols are evenly spread across as many LUNs as possible.

Charles Hart
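As a quick worked example of the sizing rule above, the number of equal-sized volumes is just the total size divided by the volume size, rounded up. The helper name and the 100 GB database figure are made up for illustration; only the 12 GB log and the 2-3 GB and 5-10 GB ranges come from the message.

```shell
#!/bin/sh
# Number of equal-sized volumes needed to hold total_gb, rounding up.
vols_needed() {   # usage: vols_needed <total_gb> <vol_gb>
    echo $(( ($1 + $2 - 1) / $2 ))
}

vols_needed 12 3     # 12 GB log in 3 GB volumes
vols_needed 12 2     # 12 GB log in 2 GB volumes
vols_needed 100 8    # hypothetical 100 GB DB in 8 GB volumes
```

So a 12 GB log comes out at four to six volumes under this advice, which can then be spread across the available LUNs.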
Re: TSM performance very poor, Recovery log is being pinned
From: ADSM: Dist Stor Manager on behalf of Craig Ross
>TSM is installed on Solaris 10

This is something that popped right out for me. Do you have your storage pools located on raw logical volumes or mounted filesystems? If the latter, that might be your problem. Solaris has traditionally had incredibly poor throughput performance on mounted filesystems.

You might give thought to rebuilding those storage pools on raw logical volumes. Of course, that will require that you completely flush all data from your disk storage pools to tape storage pools first, so as not to lose client data.

-- Mark Stapleton ([EMAIL PROTECTED]) Berbee Information Networks (a CDW company)
Re: TSM performance very poor, Recovery log is being pinned
Thanks for all the input, guys. Firstly, sorry for the lack of detail.

TSM is installed on Solaris 10. No, I did not do any benchmarking, as we were not replacing an existing setup, just adding more. I have 6 LTO drives already installed and about 17TB of SAN storage, which is primary random-access. I have since added 4 new LTO 3 drives (with different device classes), and they run better than the LTO 1 drives when the server is not log-pinning. Of the new 15TB of SATA, approximately 1TB is primary random-access storage and the remainder is sequential FILE, and it's all filesystem, not raw. I have had heavy discussions over raw vs. filesystem and have not been able to find a definitive answer.

I don't think the clients are causing the issue. Any TSM process can pin the log, from migrations and DB backups to client sessions, once the server starts getting busy.

Currently (sorry, I'm not in front of the installation) I'd guess that of the 15TB I have about 60 sequential volumes across 4 storage pools, and I still have more to define. I have not had clients use this new storage yet; all I have done is start to migrate data into these storage pools to free up some of the legacy storage pools.

The transport to the DB and recovery log is SAN; however, yesterday I created a local copy and this did not improve things. The storage pools are on WMS SAN and AMS500 SAN, both SATA disk, all across Fibre. The SAN engineer saw the expected performance out of the disks when installing them.

I also don't see it being maxsessions, because I can pin the log with 3 or 4 sessions and 3 processes! I think it's safe to say it's configuration somewhere, because now that I think about it, it's not taking much load to pin the log. It is load with which TSM normally copes OK.

The next step may be to remove the new disks. Will I just need to unmount the filesystems, or will I need to migrate data off the new storage and delete the volumes and the new storage pools?
Thanks

On 7/28/07, Lawrence Clark <[EMAIL PROTECTED]> wrote:
> Assuming the SATA are on AIX, were the logical volumes set up to hold the volumes defined as JFS2?
Re: TSM performance very poor, Recovery log is being pinned
Do the client backup sessions pin the log? What is the throughput on the actual client sessions, and are these backups direct to disk? If the sessions are cancelled, does the system come back to life?

15 TB of SATA sounds like a lot of storage. How has this been added/configured? What raw throughput do you get on these disks outside of TSM itself?

You say the LTO3 drives are new. Do you have existing LTO3 drives? Have you configured them correctly, with a new device class etc., if you are mixing LTO generations in the library?

I have seen this type of pinning/dramatic slowdown before. I saw it manifest itself as the server hitting the maxsessions limit, as all the sessions were running so slowly to the disk pool.

Lots of questions, I know, but as you have made multiple changes at the same time, it's going to be difficult to nail down without additional info.

Ian Smith
---
Core Engineering - Storage

Robert Clark <[EMAIL PROTECTED]> wrote on 27/07/2007 18:01:

> Is the SATA setup as disk storage pools? Is it filesystem or raw logical volumes?
> What is the OS? vmstat or top/topas may give some ideas.
> What is the network transport? Fast ethernet?
> [RC]
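Ian Smith's question about raw throughput outside TSM can be answered by timing a large sequential write or read and converting the result to MB/s. A minimal sketch of the conversion, assuming the byte count and elapsed seconds come from a timed copy; the function name mb_per_sec and the path in the comment are hypothetical placeholders:

```shell
#!/bin/sh
# mb_per_sec BYTES SECONDS
# Prints throughput in MB/s (1 MB = 1048576 bytes) to one decimal place.
# Hypothetical helper for illustration only; not part of TSM.
#
# The inputs could come from timing a sequential write, for example:
#   time dd if=/dev/zero of=/path/on/new/disk/testfile bs=1048576 count=1024
# (the output path is a placeholder; filesystem and array caches can
# flatter the result, so a file much larger than the cache is safer).
mb_per_sec() {
    awk -v b="$1" -v s="$2" 'BEGIN {
        if (s <= 0) exit 1          # reject a zero or negative duration
        printf "%.1f\n", b / 1048576 / s
    }'
}

# Example: 1 GiB written in 20 seconds.
mb_per_sec 1073741824 20
```

Running the same measurement against the legacy SAN disk and the new SATA disk gives a like-for-like comparison before TSM is involved at all.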
Re: TSM performance very poor, Recovery log is being pinned
Assuming the SATA are on AIX, were the logical volumes set up to hold the volumes defined as JFS2?

>>> [EMAIL PROTECTED] 07/27/2007 2:30:54 PM >>>
Do the client backup sessions pin the log? What is the throughput on the actual client sessions, and are these backups direct to disk? If the sessions are cancelled, does the system come back to life?

15 TB of SATA sounds like a lot of storage. How has this been added/configured? What raw throughput do you get on these disks outside of TSM itself?

You say the LTO3 drives are new. Do you have existing LTO3 drives? Have you configured them correctly, with a new device class etc., if you are mixing LTO generations in the library?

I have seen this type of pinning/dramatic slowdown before. I saw it manifest itself as the server hitting the maxsessions limit, as all the sessions were running so slowly to the disk pool.

Lots of questions, I know, but as you have made multiple changes at the same time, it's going to be difficult to nail down without additional info.

Ian Smith
---
Core Engineering - Storage
Re: TSM performance very poor, Recovery log is being pinned
Is the SATA set up as disk storage pools? Is it filesystem or raw logical volumes?

What is the OS? vmstat or top/topas may give some ideas.

What is the network transport? Fast ethernet?

[RC]

On Jul 27, 2007, at 2:49 AM, Craig Ross wrote:

> 10 days ago I added 15TB of SATA storage and a new fabric with 4 new LTO drives to our 3584 library. The TSM DB is approx 90GB.
>
> A few days ago I noticed processing had ground to a halt. After digging around, I have found that as soon as the server gets busy (maybe 4 processes and 8 or so sessions), the recovery log begins to pin ("sh logpinned") and the database gets locks, as shown by running "sh locks". As a result the server suffers!
>
> Today I have stopped using the new-tech LTO 3 and SATA, and things are coping better but still worse than before: as soon as load is increased, the log pins and processing slows drastically.
>
> Are there any steps I can take which will help my scenario? Would a DB UNLOAD RELOAD help that much?
>
> For reference: the recovery log has heaps of room and the DB has heaps of room (90GB DB with 100GB of room).
>
> Any advice is much appreciated.
Re: TSM performance very poor, Recovery log is being pinned
You might also check your disk pool volume count. If it's low, assuming you're using raw logical volumes, you might want to try decreasing the size of your volumes and thereby increasing their count. A small number of large volumes does not allow several clients or processes to stream data to the storage pools efficiently.

Joy Hanna
Enterprise Storage Group
I.T. Computer Operations
(503)745-7748
[EMAIL PROTECTED]
Re: TSM performance very poor, Recovery log is being pinned
Craig - You need to perform analysis to identify the problem cause, where the TSM Problem Determination Guide and Performance Tuning Guide will help. Log pinning is due to prolonged transactions, and is aggravated by sluggish networking and sluggish TSM servicing of transactions (often due to underlying disk/tape issues).

You can quickly see if your TSM server is "behind" in its rate of servicing incoming client data by inspecting the TCP receive queue backlog. In AIX that can be done via the command:

    netstat | head -2 ; netstat | grep -vi dns | grep tcp

If the various entries show a large receive queue value, then it is likely that your networking is good, but that TSM is not keeping up with the incoming data, as may be caused by the underlying disk, tape, and I/O path technology that it is using.

If your clients have recently started backing up very large files (digital movies is a stereotypical case), then that would certainly contribute to what you're seeing. A quick look at TSM accounting data or ANE Activity Log messages would give a sense of that, and Query CONTent with a negative Count value on the collocated tape volumes that the clients are writing to will show biggies. Query SESSion during client activity will also help identify consumptive sessions.

Before you gave TSM the new LTO 3 and SATA hardware, I would hope that you benchmarked it first, to assure that it was providing the performance you would need in production, and thus uncover any issues with it beforehand. A bad RAID choice in disk implementation will also slow throughput. Old microcode may have performance-impairing defects. A mismatched device driver can cause operational delays.

Don't waste your time or jeopardize your server in doing a TSM db unload/reload. You may want to confer with your operating system people to have them help narrow down the problem area, where they are familiar with all the specifics of your environment.

Richard Sims
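The receive-queue check Richard describes can be turned into a small filter that shows only the backlogged connections. A minimal sketch, assuming the AIX/Linux-style netstat layout in which the first column is the protocol and the second is Recv-Q; the function name recvq_backlog and the threshold are hypothetical choices, and Solaris netstat arranges its columns differently, so the field numbers would need adjusting there:

```shell
#!/bin/sh
# recvq_backlog THRESHOLD
# Reads netstat output on stdin and prints "Recv-Q local remote" for
# each TCP connection whose Recv-Q (second column) is at or above
# THRESHOLD. Header lines and non-TCP entries are skipped.
# Hypothetical helper for illustration only; assumes the column order
# Proto, Recv-Q, Send-Q, Local Address, Foreign Address.
recvq_backlog() {
    awk -v min="$1" '
        $1 ~ /^tcp/ && $2 ~ /^[0-9]+$/ && ($2 + 0) >= (min + 0) {
            print $2, $4, $5
        }'
}

# Typical use (not run here):
#   netstat -an | recvq_backlog 10000
```

Connections to the TSM server port that stay in this list across repeated runs would suggest the server is not draining incoming client data as fast as the network delivers it.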