Re: Strange problem with the same ACTIVELOG files opened multiple times
Well, at least I feel a little better that it is happening to someone else also. We were also having a problem with the DB backup completely hanging. I thought maybe it was just stuck, but it actually started on a Friday and was still running as of Monday morning, making no progress. Originally the DB backup was going to an NFS-mounted filesystem, but after moving it to a local RAID 1 group and changing some settings in /etc/sysctl.conf per other recommendations, the hung DB backup issue seems to be solved. Our DB and ACTIVELOG files are all on local SSD also. The BACKUPPOOL volumes are on an NFS-mounted (10G) filesystem that lives on a Data Domain. I do have an open PMR with IBM about the excessive number of times each /tsmactivelog S00* file is open. It has been sent to development for investigation. I will post any updates here. John

Graham Stewart<https://www.mail-archive.com/search?l=adsm-l@vm.marist.edu&q=from:%22Graham+Stewart%22> Tue, 06 Dec 2016 11:09:00 -0800<https://www.mail-archive.com/search?l=adsm-l@vm.marist.edu&q=date:20161206>

We see something very similar (TSM 7.1.6.0, RHEL 7.2):
- 83570 open active S00*.log files, under db2sysc
- 193 S00*.log files
- DB backups sometimes (but not always) slow down, showing no progress, but eventually get going again.
We've never rebooted under this scenario, and the DB backups always complete successfully, but some days take twice as long as others. No NFS. All local disk: SSD (DB and logs) and SATA (storage pools). No deduplication. We haven't yet placed a support call with IBM about the variance in DB backup times, but intend to. -- Graham Stewart Network and Storage Services Manager Information Technology Services University of Toronto Libraries 416-978-6337

On 12/06/2016 01:41 PM, Dury, John C. wrote: We continue to see this on all 4 TSM 7.x servers. I do not see this behavior on the TSM 6.x servers at all. Anyone else running TSM 7.x on RHEL 7.x?
The /tsmactivelog/NODE/LOGSTREAM/S0018048.LOG file is open simultaneously a total of 386 times. This can't be normal behavior.

We have two TSM v7.1.7.0 servers running on different RHEL 7.x servers. The primary storage pool is BACKUPPOOL, which has its volumes mounted in the local OS as NFS volumes across a 10G network connection. The volumes live on the Data Domain, which does its own deduplication in the background. We have a schedule that does a full TSM DB backup daily. The target is a separate filesystem, but it is also NFS-mounted on the Data Domain across a 10G network connection. The TSM active log is on local disk. The TSM DB is also mounted locally on a RAID group of SSD drives for performance.

The issue I am seeing is that although there are only 261 S00*.LOG files in /tsmactivelog, they all appear to be open multiple times. The command "lsof | grep -i tsmactive | wc -l" tells me that there are 94576 open files in /tsmactivelog. The process that has the /tsmactivelog files open is db2sysc. I've never seen this on our TSM 6.x server. It's almost as if the active log file is opened but never closed. It isn't a gradual climb to a high number of open files either: 10 minutes after booting the server, there is already an excessive number of open files in /tsmactivelog. This behavior happens even when the server is extremely idle, with very few (or no) sessions and/or processes running.

The issue we keep seeing every few days is that processes like the full TSM DB backup run for a few minutes and then progress just stops. After cancelling such a process, it never comes down, so I am forced to HALT the server and reboot it. The excessive number of open files in /tsmactivelog and the hanging DB backup feel related, but I am not sure. I've been working with IBM on the hanging processes; so far they are also stumped, but they agree the two issues seem like they should be related. I'm hoping someone out there might have some ideas.
Re: Strange problem with the same ACTIVELOG files opened multiple times
We continue to see this on all 4 TSM 7.x servers. I do not see this behavior on the TSM 6.x servers at all. Anyone else running TSM 7.x on RHEL 7.x?

The /tsmactivelog/NODE/LOGSTREAM/S0018048.LOG file is open simultaneously a total of 386 times. This can't be normal behavior. We have two TSM v7.1.7.0 servers running on different RHEL 7.x servers. The primary storage pool is BACKUPPOOL, which has its volumes mounted in the local OS as NFS volumes across a 10G network connection. The volumes live on the Data Domain, which does its own deduplication in the background. We have a schedule that does a full TSM DB backup daily. The target is a separate filesystem, but it is also NFS-mounted on the Data Domain across a 10G network connection. The TSM active log is on local disk. The TSM DB is also mounted locally on a RAID group of SSD drives for performance.

The issue I am seeing is that although there are only 261 S00*.LOG files in /tsmactivelog, they all appear to be open multiple times. The command "lsof | grep -i tsmactive | wc -l" tells me that there are 94576 open files in /tsmactivelog. The process that has the /tsmactivelog files open is db2sysc. I've never seen this on our TSM 6.x server. It's almost as if the active log file is opened but never closed. It isn't a gradual climb to a high number of open files either: 10 minutes after booting the server, there is already an excessive number of open files in /tsmactivelog. This behavior happens even when the server is extremely idle, with very few (or no) sessions and/or processes running. The issue we keep seeing every few days is that processes like the full TSM DB backup run for a few minutes and then progress just stops. After cancelling such a process, it never comes down, so I am forced to HALT the server and reboot it. The excessive number of open files in /tsmactivelog and the hanging DB backup feel related, but I am not sure.
I've been working with IBM on the hanging processes; so far they are also stumped, but they agree the two issues seem like they should be related. I'm hoping someone out there might have some ideas.
Strange problem with the same ACTIVELOG files opened multiple times
We have two TSM v7.1.7.0 servers running on different RHEL 7.x servers. The primary storage pool is BACKUPPOOL, which has its volumes mounted in the local OS as NFS volumes across a 10G network connection. The volumes live on the Data Domain, which does its own deduplication in the background. We have a schedule that does a full TSM DB backup daily. The target is a separate filesystem, but it is also NFS-mounted on the Data Domain across a 10G network connection. The TSM active log is on local disk. The TSM DB is also mounted locally on a RAID group of SSD drives for performance.

The issue I am seeing is that although there are only 261 S00*.LOG files in /tsmactivelog, they all appear to be open multiple times. The command "lsof | grep -i tsmactive | wc -l" tells me that there are 94576 open files in /tsmactivelog. The process that has the /tsmactivelog files open is db2sysc. I've never seen this on our TSM 6.x server. It's almost as if the active log file is opened but never closed. It isn't a gradual climb to a high number of open files either: 10 minutes after booting the server, there is already an excessive number of open files in /tsmactivelog. This behavior happens even when the server is extremely idle, with very few (or no) sessions and/or processes running.

The issue we keep seeing every few days is that processes like the full TSM DB backup run for a few minutes and then progress just stops. After cancelling such a process, it never comes down, so I am forced to HALT the server and reboot it. The excessive number of open files in /tsmactivelog and the hanging DB backup feel related, but I am not sure. I've been working with IBM on the hanging processes; so far they are also stumped, but they agree the two issues seem like they should be related. I'm hoping someone out there might have some ideas.
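To see how many times each individual log file is open (rather than just the grand total from "lsof | wc -l"), the lsof output can be grouped per path. This is a sketch: the sample below stands in for real lsof output (fields abridged, PID and user invented), since on the affected server you would feed the pipeline from lsof itself.

```shell
# On the real server the input would be:  lsof | grep -i tsmactive
# A captured sample (abridged, values invented) stands in for it here.
sample='db2sysc 1234 tsminst1 12r REG /tsmactivelog/NODE/LOGSTREAM/S0018048.LOG
db2sysc 1234 tsminst1 13r REG /tsmactivelog/NODE/LOGSTREAM/S0018048.LOG
db2sysc 1234 tsminst1 14r REG /tsmactivelog/NODE/LOGSTREAM/S0018049.LOG'

# Group by the last field (the file path), count occurrences per file,
# and print the most-opened files first.
echo "$sample" | awk '{count[$NF]++} END {for (f in count) print count[f], f}' | sort -rn
```

On the sample this prints each path once with its open count, which makes it easy to see whether a few log files account for most of the 94576 handles or whether the opens are spread evenly, something worth attaching to the PMR.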
server script to find what schedules a node is associated with?
Anyone have any idea how to query a TSM server (v6 or v7) to find which backup schedules a node is associated with? I'm trying to do the reverse of a "q assoc" command: I'm looking for a way to supply a node parameter to a script and have the TSM server tell me the list of schedules the node is in.
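For what it's worth, the server's SQL interface can answer this directly: the ASSOCIATIONS table maps nodes to schedules. The dsmadmc invocation below is a sketch (admin ID, password, and node name are placeholders), and since a live server isn't available here, captured sample output stands in so the script-side parsing can be shown.

```shell
# Against a live server, something like:
#   dsmadmc -id=admin -password=secret -dataonly=yes \
#     "select domain_name, schedule_name from associations where node_name=upper('mynode')"
# Sample output from such a query (domain and schedule names invented):
assoc='STANDARD DAILY_INCR
STANDARD WEEKLY_FULL'

# Print just the schedule names, one per line, for use in a script.
echo "$assoc" | awk '{print $2}'
```

On the sample this prints DAILY_INCR and WEEKLY_FULL; in a wrapper script the node name would come in as a parameter and be substituted into the select.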
Re: All volumes in STGPOOL show as full. Please help.
We do not have a backup of the DEDUP pool, but it is being replicated to a different TSM server. The copy stgpool option is disabled. John

-Original Message- From: Stef Coene [mailto:stef.co...@docum.org] Sent: Thursday, January 14, 2016 8:54 AM To: ADSM: Dist Stor Manager; Dury, John C. Subject: Re: All volumes in STGPOOL show as full. Please help.

Hi, There is one question not asked: do you have a backup of the dedup pool? By default, TSM will only reclaim volumes after the data has been copied to a copy storage pool. You can overrule that in the dsmserv.opt config file. Stef
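If I remember correctly, the server option Stef is referring to is DEDUPREQUIRESBACKUP: setting it to NO in dsmserv.opt lets the server reclaim deduplicated volumes even when the data has not been copied to a copy storage pool. Treat this as a sketch to verify against the documentation for your level, since skipping the copy-pool safeguard has obvious recovery implications.

```
* dsmserv.opt fragment (verify the option name at your server level
* before relying on it; requires a server restart to take effect)
DEDUPREQUIRESBACKUP NO
```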
Re: All volumes in STGPOOL show as full. Please help.
I tried reclamation but it fails, since all volumes in stgpool DEDUP show as full. An audit runs to successful completion, but the volume stays FULL. Even a MOVE DATA of a volume in the DEDUP pool runs to successful completion, but the status stays FULL. John

-Original Message- From: Rettenberger Sabine [mailto:sabine.rettenber...@wolf-heiztechnik.de] Sent: Wednesday, January 13, 2016 10:35 AM To: ADSM: Dist Stor Manager Cc: Dury, John C. Subject: Re: All volumes in STGPOOL show as full. Please help.

Hi John, did you try a RECLAIM on the DEDUP pool to get the volumes free? What does an AUDIT VOLUME of the empty volumes say? Are they really empty? Best regards Sabine

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Dury, John C. Sent: Wednesday, January 13, 2016 16:01 To: ADSM-L@VM.MARIST.EDU Subject: All volumes in STGPOOL show as full. Please help.

We have 2 storage pools: a BACKUPPOOL and a DEDUP pool. All nightly backups come into the BACKUPPOOL and then migrate to the DEDUP pool for permanent storage. All volumes in the DEDUP pool are showing FULL although the pool is only 69% in use. I tried doing a MOVE DATA on a volume in the DEDUP pool to the BACKUPPOOL just to free space so reclamation could run, and although it says it ran to successful completion, the volume still shows as FULL. So for whatever reason, the volumes in the DEDUP pool are never freeing up. I ran an audit on the same volume I tried the MOVE DATA command on, and it also ran to successful completion. No idea what is going on here, but hopefully someone else has an idea. If our BACKUPPOOL fills up, we can't back anything up any more and we will have a catastrophe. The BACKUPPOOL is roughly 15T and 15% full, and I have no way to increase it. Please reply directly to my email address and the list, as I am currently subscribed as digest only. TSM Server is 6.3.5.300 on RHEL 5
All volumes in STGPOOL show as full. Please help.
We have 2 storage pools: a BACKUPPOOL and a DEDUP pool. All nightly backups come into the BACKUPPOOL and then migrate to the DEDUP pool for permanent storage. All volumes in the DEDUP pool are showing FULL although the pool is only 69% in use. I tried doing a MOVE DATA on a volume in the DEDUP pool to the BACKUPPOOL just to free space so reclamation could run, and although it says it ran to successful completion, the volume still shows as FULL. So for whatever reason, the volumes in the DEDUP pool are never freeing up. I ran an audit on the same volume I tried the MOVE DATA command on, and it also ran to successful completion. No idea what is going on here, but hopefully someone else has an idea. If our BACKUPPOOL fills up, we can't back anything up any more and we will have a catastrophe. The BACKUPPOOL is roughly 15T and 15% full, and I have no way to increase it. Please reply directly to my email address and the list, as I am currently subscribed as digest only. TSM Server is 6.3.5.300 on RHEL 5
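When every volume reports FULL while the pool is only ~69% utilized, a per-volume view from the VOLUMES table is a cheap first capture for the support case. The select itself is standard; the sample output and the filter that flags suspicious volumes (FULL status with low utilization) are a sketch with invented volume names.

```shell
# Against a live server:
#   dsmadmc -id=admin -password=secret -dataonly=yes \
#     "select volume_name, pct_utilized, status from volumes where stgpool_name='DEDUP'"
# Sample output (volume names and numbers invented) so the filter can run:
vols='/dedup/vol001.bfs 12.3 FULL
/dedup/vol002.bfs 98.7 FULL
/dedup/vol003.bfs 5.0 FULL'

# Flag volumes that claim FULL status while under 50% utilized --
# exactly the ones that reclamation should have been emptying.
echo "$vols" | awk '$3 == "FULL" && $2 < 50 {print $1, $2"%"}'
```

On the sample this flags vol001 and vol003; on a healthy pool, low-utilization volumes should be FILLING or EMPTY, not FULL.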
Re: How can I exclude files from dedupe processing?
Unfortunately the final storage pool is the dedupe pool.

Date: Sun, 2 Aug 2015 16:32:17 +0000 From: Paul Zarnowski <p...@cornell.edu> Subject: Re: How can I exclude files from dedupe processing?

Since the files are already using a separate management class, you can just change the destination storage pool for that class to go to a non-deduplicated storage pool. ..Paul (sent from my iPhone)

On Aug 2, 2015, at 11:07 AM, Dury, John C. <jd...@duqlight.com> wrote: I have a 6.3.5.100 Linux server and several TSM 7.1.0.0 Linux clients. Those Linux clients are dumping several large Oracle databases using compression, and those dump files are then backed up to TSM. Because the files are compressed when dumped via RMAN, they are not good candidates for dedupe processing. Is there any way to have them excluded from server dedupe processing? I know I can exclude them from client dedupe processing, which I am not doing on this client anyway. I have the SERVERDEDUPTXNLIMIT limit set to 200, but these RMAN dumps are smaller than 200G. I have our DBAs investigating using TDP for Oracle, but until then I would like to exclude these files from dedupe processing, as I suspect it is causing issues with space reclamation. If it helps, these files are in their own management class. Ideas?
How can I exclude files from dedupe processing?
I have a 6.3.5.100 Linux server and several TSM 7.1.0.0 Linux clients. Those linux clients are dumping several Large Oracle databases using compression, and then those files are being backed up to TSM. Because the files are compressed when dumped via RMAN, they are not good candidates for dedupe processing. Is there any way to have them excluded from dedupe server processing ? I know I can exclude them from client dedupe processing which I am not doing on this client anyways. I have the SERVERDEDUPTXNLIMIT limit set to 200, but these rman dumps are smaller than 200g. I have our DBAs investigating using TDP for Oracle, but until then, I would like to exclude these files from dedupe processing as I suspect it is causing issues with space reclamation. If it helps, these files are in their own management class also. Ideas?
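Since the files already land in their own management class, the usual way to keep them out of server-side dedup is to point that class's backup copy group at a storage pool with deduplication disabled, along the lines suggested elsewhere in this thread. The command sequence below is a sketch: the pool, device class, domain, policy set, and class names are all placeholders, and the device class details would need to match your environment.

```
* All names below are placeholders; run from an admin session.
define stgpool nodedup_file filedevc maxscratch=50 deduplicate=no
update copygroup mydomain mypolicyset rmanclass standard type=backup destination=nodedup_file
validate policyset mydomain mypolicyset
activate policyset mydomain mypolicyset
```

New backups bound to that class then bypass server dedup entirely; data already stored in the dedup pool is unaffected until it expires or is moved.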
Re: recovering a DB from full backup when one of TSM DB DIR became corrupt and DB is in roll forward mode (need help)
The issue was that I needed a way to recreate the actual files in the dbdirs. It is fixed now.

Original Message From: Erwann Simon Sent: Mon, Nov 24, 2014 12:42 AM To: ADSM: Dist Stor Manager ; Dury, John C. CC: Subject: Re: [ADSM-L] recovering a DB from full backup when one of TSM DB DIR became corrupt and DB is in roll forward mode (need help)

Hello John, See both the Admin Guide and Admin Reference regarding "dsmserv restore db". There are two recovery methods: one that deletes the logs (point-in-time restore; ignore it) and one that keeps the logs and rolls them forward after the db restore (most current state; use this one). The command is pretty simple: log in as the instance owner, change to the instance directory, and simply run: dsmserv restore db. Note that if you use the todate option of the dsmserv restore db command, you'll have a point-in-time restore, which deletes the logs!

On 24 November 2014 00:57:56 CET, "Dury, John C." wrote:
>Long story short, I have a full DB backup taken on a linux system
>running TSM 6.3.4.300 with the DB in roll forward mode. The DB is
>spread across 4 different dbdirs file systems. One of the file systems
>became corrupt and I need to recover the DB from the full backup.
>I have the dbdirs file systems back and the active log and archive log
>directories are ok and were untouched. The dbdirs directories are back
>but there are no files in them.
>The active log and archive log directories are ok and uncorrupted.
>How can I restore the DB to the most recent point since the archivelog
>is still intact and recreate the files in the dbdirs so a restore can
>repopulate them from the full DB backup? -- Erwann SIMON Sent from my Android phone with K-9 Mail. Please excuse my brevity.
recovering a DB from full backup when one of TSM DB DIR became corrupt and DB is in roll forward mode (need help)
Long story short: I have a full DB backup taken on a Linux system running TSM 6.3.4.300 with the DB in roll forward mode. The DB is spread across 4 different dbdirs file systems. One of the file systems became corrupt, and I need to recover the DB from the full backup. I have the dbdirs file systems back; the active log and archive log directories are ok and were untouched. The dbdirs directories are back, but there are no files in them. How can I restore the DB to the most recent point, since the archive log is still intact, and recreate the files in the dbdirs so a restore can repopulate them from the full DB backup?
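For reference, the procedure described later in the thread boils down to a rollforward-style restore run as the DB2 instance owner. This is a sketch: the instance user and directory are placeholders, and the TODATE option is deliberately omitted, because adding it would turn this into a point-in-time restore that deletes the logs.

```
# As the DB2 instance owner (user and path are placeholders):
su - tsminst1
cd /home/tsminst1/tsminst1
# Rollforward restore: recreates the database files from the full
# backup, then replays the intact active/archive logs to reach the
# most recent state.  Do NOT add todate= here.
dsmserv restore db
```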
Re: Anyone else doing replica backups of exchange datastores? Need some help please.
This is exactly how I am trying to get the replica backup to work. It is running on the passive server, and I believe the DSMAGENT configuration is correct in terms of the proxy settings. I continue to work with IBM to get it to work.

Hi John, You can't run a backup of a replica from the primary server. You must be running that command on the passive server. The Microsoft Exchange Replica Writer will be running on the passive server. That is the machine you need to have DP/Exchange and the Windows BA Client installed and configured on. Also keep in mind that it's the DSMAGENT (Windows BA Client) node that actually sends the data to the TSM Server. It uses the proxy capability to store the data on behalf of the DP/Exchange node. The service team can assist you through this. Thank you, Del

"ADSM: Dist Stor Manager" wrote on 10/31/2014 02:47:25 PM:
> From: "Dury, John C."
> To: ADSM-L AT VM.MARIST DOT EDU
> Date: 10/31/2014 02:49 PM
> Subject: Re: Anyone else doing replica backups of exchange
> datastores? Need some help please.
> Sent by: "ADSM: Dist Stor Manager"
>
> Thanks for the reply. I agree that it sounds like a configuration
> issue. We are trying to do this via command line so it can be
> scripted and therefore automated. I do have a problem open with IBM
> and sent them some logs and lots of information, but their first
> response was to tell me that it wasn't configured correctly, as they
> were looking at early entries in the logs I sent them instead of
> further down. Lesson learned: delete all of your logs before
> recreating a problem, as support won't look at timestamps and the
> first error they see must be the problem.
> The command line I am using looks like this:
> C:\Program Files\Tivoli\TSM\TDPExchange>tdpexcc backup
> incr /fromreplica /backupmethod=vss /backupdestination=tsm /
> tsmoptfile="c:\Program Files\Tivoli\TSM\TDPExchange\dsm.opt"
> I was getting it to create a TSM session, but the session would just
> be WAITING on the server for hours and never actually did anything.
> FWIW, the datastore I am testing with is tiny.
> Now when I try running the backup on the inactive CCR replica I get:
> Updating mailbox history on TSM Server...
> Mailbox history has been updated successfully.
> Querying Exchange Server to gather component information, please wait...
> ACN5241E The Microsoft Exchange Information Store is currently not running.
> but as I mentioned before, according to my Exchange admins, that
> service should never be running on both sides of the CCR cluster.
> I have changed the dsm.opt and tdpexc.cfg options so many times with
> every option I can think of, which is why I was hoping someone had a
> sanitized version of their dsm.opt and tdpexc.cfg that are working
> in their environment that I could look at.
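Since the point above is that the DSMAGENT node stores data on behalf of the DP/Exchange node, it may be worth double-checking the proxy grant on the TSM server itself. The node names below are placeholders taken from the sanitized config elsewhere in this thread; verify the exact names against your registered nodes.

```
* On the TSM server (node names are placeholders):
grant proxynode target=exchange agent=exchangeofflineclustertsmnodename
* Verify the agent/target pairing afterwards:
query proxynode
```

A missing or mismatched grant here is one plausible cause of a session that connects but then sits in WAITING without sending data.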
Re: Anyone else doing replica backups of exchange datastores? Need some help please.
Thanks for the reply. I agree that it sounds like a configuration issue. We are trying to do this via command line so it can be scripted and therefore automated. I do have a problem open with IBM and sent them some logs and lots of information, but their first response was to tell me that it wasn't configured correctly, as they were looking at early entries in the logs I sent them instead of further down. Lesson learned: delete all of your logs before recreating a problem, as support won't look at timestamps and the first error they see must be the problem.

The command line I am using looks like this: C:\Program Files\Tivoli\TSM\TDPExchange>tdpexcc backup incr /fromreplica /backupmethod=vss /backupdestination=tsm /tsmoptfile="c:\Program Files\Tivoli\TSM\TDPExchange\dsm.opt" I was getting it to create a TSM session, but the session would just be WAITING on the server for hours and never actually did anything. FWIW, the datastore I am testing with is tiny. Now when I try running the backup on the inactive CCR replica I get: Updating mailbox history on TSM Server... Mailbox history has been updated successfully. Querying Exchange Server to gather component information, please wait... ACN5241E The Microsoft Exchange Information Store is currently not running. but as I mentioned before, according to my Exchange admins, that service should never be running on both sides of the CCR cluster. I have changed the dsm.opt and tdpexc.cfg options so many times with every option I can think of, which is why I was hoping someone had a sanitized version of their dsm.opt and tdpexc.cfg that are working in their environment that I could look at.

Hi John, This looks like a configuration issue to me. Exchange Server 2007 CCR and LCR replica copies can be backed up and restored by using the VSS method only. Microsoft does not allow Legacy backups of Exchange Server 2007 CCR and LCR replica copies.
Also keep in mind that VSS restores of a CCR or LCR replica can be restored only into the running instance of a storage group (primary, recovery, or alternate). Microsoft does not support VSS restores into a replica instance. If you want to back up from the replica copy when running in a CCR or LCR environment, specify the "FromReplica True" backup option in the Protect tab of the MMC GUI. You can also specify the /fromreplica parameter with the tdpexcc backup command on the command-line interface.

Here is the important one... For CCR copies, you must run the backup while logged on to the secondary node of the cluster that currently contains the replica copy, and you must use the "FROMREPLICA" option. Here are a few more things: http://www-01.ibm.com/support/knowledgecenter/SSTG2D_6.4.0/com.ibm.itsm.mail.exc.doc/c_dpfcm_bup_replica_exc.html?cp=SSTG2D_6.4.0&lang=en If you are not able to get this working, you should open a PMR so that the service team can help you get the configuration working. Del

"ADSM: Dist Stor Manager" wrote on 10/31/2014 08:40:00 AM:
> From: "Dury, John C."
> To: ADSM-L AT VM.MARIST DOT EDU
> Date: 10/31/2014 08:42 AM
> Subject: Anyone else doing replica backups of exchange datastores?
> Need some help please.
> Sent by: "ADSM: Dist Stor Manager"
>
> I have been trying to get this to work for days now and I don't seem
> to be making any progress. I have tried all kinds of options in both
> the dsm.opt and tdpexc.cfg files and I get various messages. The
> documentation in the TDP for Exchange manual on doing replica
> backups is not very detailed at all.
> Our environment looks like this.
> We have two Windows 2008 Exchange 2007 servers set up for CCR
> replication to each other. I am trying to do replica backups on the
> offline cluster member so it doesn't affect performance on the live
> cluster member.
> The TSM server is 6.3.4.300 running on RHEL5 Linux.
> I thought I had it working, but nothing was actually backed up.
> The TSM session was established on the TSM server but no data was
> actually sent. The sessions appeared to be hung and seemed to
> eventually time out.
> The other message I was receiving on the offline Exchange cluster member was
> Updating mailbox history on TSM Server...
> Mailbox history has been updated successfully.
>
> Querying Exchange Server to gather component information, please wait...
>
> ACN5241E The Microsoft Exchange Information Store is currently not running.
> but from what I was told by our Exchange experts, that
> service only runs on the active cluster member and not on the offline member.
> Below are my sanitized dsm.opt and tdpexc.cfg
> dsm.opt
> NODename exchange
> deduplication no
> CLUSTERnode yes
> COMPRESSIon Off
> COMPRESSalways On
> PASSWORDAccess Generate
Anyone else doing replica backups of exchange datastores? Need some help please.
I have been trying to get this to work for days now and I don't seem to be making any progress. I have tried all kinds of options in both the dsm.opt and tdpexc.cfg files, and I get various messages. The documentation in the TDP for Exchange manual on doing replica backups is not very detailed at all.

Our environment looks like this. We have two Windows 2008 Exchange 2007 servers set up for CCR replication to each other. I am trying to do replica backups on the offline cluster member so it doesn't affect performance on the live cluster member. The TSM server is 6.3.4.300 running on RHEL5 Linux.

I thought I had it working, but nothing was actually backed up. The TSM session was established on the TSM server but no data was actually sent. The sessions appeared to be hung and seemed to eventually time out. The other message I was receiving on the offline Exchange cluster member was: Updating mailbox history on TSM Server... Mailbox history has been updated successfully. Querying Exchange Server to gather component information, please wait... ACN5241E The Microsoft Exchange Information Store is currently not running. but from what I was told by our Exchange experts, that service only runs on the active cluster member and not on the offline member.
Below are my sanitized dsm.opt and tdpexc.cfg.

dsm.opt:
NODename exchange
deduplication no
CLUSTERnode yes
COMPRESSIon Off
COMPRESSalways On
PASSWORDAccess Generate
resourceutilization 5
COMMMethod TCPip
TCPPort 1500
TCPServeraddress tsmserver
TCPWindowsize 128
TCPBuffSize 64
diskbuffsize 32
SCHEDMODE Prompted
SCHEDLOGRetention 14,d
HTTPport 1581

tdpexc.cfg:
BUFFers 4
BUFFERSIze 8192
clusternode yes
compression off
compressalways on
LOGFile tdpexc.log
LOGPrune 60
MOUNTWait Yes
TEMPLOGRestorepath P:\TempRestoreLoc
LASTPRUNEDate 10/31/2014 06:55:09
BACKUPMETHod LEGACY
* BACKUPMETHod vss
RETRies 0
LANGuage ENU
BACKUPDESTination TSM
LOCALDSMAgentnode exchangeofflineclustertsmnodename
REMOTEDSMAgentnode exchange
TEMPDBRestorepath P:\TempRestoreLoc
CLIENTACcessserver

Could someone who actually has this working send me their sanitized dsm.opt and tdpexc.cfg and the actual command used to do a replica backup from the offline cluster member?
Re: file system backups of a Dell NDMP Equallogic device
Sorry for revisiting this, but I'm in a predicament now. Trying to back up the NDMP device is a miserable failure and frankly just ugly. I honestly can't see why anyone would use TSM to back up any NDMP devices except maybe for speed. We decided to mount all of the NFS shares locally on the TSM server and back them up that way, but now the problem is that even with resourceutilization set to 20, it still takes 18+ hours just to do an incremental, because there are millions and millions of files in all of those NFS shares. So this isn't going to work either. I can try the proxy node solution, but frankly I'm skeptical about it too because of the tremendous number of small files. Of course this is all for a mission-critical application, so I have to come up with a workable solution and I'm running out of ideas. Help!

Oh, that works just fine. Then you're backing up over NFS; no NDMP involved. And TSM will not back up an NFS-mounted volume by default, so you won't get multiple copies. Put the virtualmountpoint names in the DOMAIN statement in dsm.sys of the client you want to run the backups (or create dsmc incr commands that list the share names, however you roll), fight through whatever permissions issues pop up, and Bob's your uncle. You'll get incremental-only backups of those files. What you won't know for a while is how long it takes to noodle through the filesystems across the NFS mount; it depends on how many kazillion objects are in the directories. If you list the names in the DOMAIN statement, you can add "RESOURCEUTILIZATION 10" to the dsm.sys and process 4 shares at once, if the directory noodling is more time-consuming than the actual data transfer, which it usually is if these shares are made of a lot of small files.
If you can't get through them by running 4 at a time, I've solved that before by setting up multiple proxy clients (using GRANT PROXYNODE) to get even more parallel streams running, but with all the backups stored under one nodename so that it's easy to find them at restore time. W

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Dury, John C. Sent: Friday, February 14, 2014 3:46 PM To: ADSM-L AT VM.MARIST DOT EDU Subject: Re: [ADSM-L] file system backups of a Dell NDMP Equallogic device

We are more concerned about file-level backups than an image backup. Eventually the NAS devices will be replicating using Equallogic replication once we get some more storage, but for now we want to make sure that the files in the NFS shares are correctly backed up. I really wanted to avoid backing up the same NFS data to multiple TSM nodes, since some of the NFS mounts are shared amongst several servers/nodes. My strategy is to pick one TSM node, make sure it has NFS mounts for all of the NFS shares that live on the NAS, and then just back them up as virtualmountpoint(s). So something like this:

/NAS exists off of root on the TSM node and is local
mount NFS1 as /NAS/NFS1
mount NFS2 as /NAS/NFS2
etc.
put an entry in dsm.sys: virtualmountpoint /NAS

and then just let incrementals run normally. All restores would need to be done on the node that can see all of the NFS mounts. Think that will work?

I agree with Wanda. Our strategy for our filers (BlueARC, Isilon) is to back up at the file level exclusively, using NFS. Modern TSM servers support no-query restores well enough that we can get a restore of the latest data very quickly (make sure you have plenty of CPU and memory, along with very fast database disks). To perform the backups efficiently, you might want to think about splitting your data up into separate nodes or filespaces, backed up with independent schedules, so that you're not bottlenecked on a single component.
As far as I can tell, NDMP was written by storage vendors to make one buy more expensive storage, and more of it than one needs.

You don't have to use tape. You can do NDMP backups via TCP/IP to your regular TSM storage pool hierarchy. But AFAIK you still have to do it at the volume/share level that the NAS device understands; I don't think you can do it at the root. Using "virtualmountpoint" is for backing up incrementally at the *file* level via NFS or CIFS mounts, not NDMP, so I'm not sure which way you are headed. The question is, what are you doing this for? NDMP is a stupid, simplistic protocol. You won't like what you have to do to achieve an individual file restore. If you are trying to get DR capability to rebuild your NDMP shares in case of an emergency, it makes sense. If you are just trying to provide backup coverage to restore people's files like you would from a file server, it may not.
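Pulling together the dsm.sys suggestions from this thread, a client stanza along these lines would back up each share at the file level over NFS with several shares processed in parallel. This is a sketch: the server name and mount points are the hypothetical ones discussed above, and RESOURCEUTILIZATION would be tuned to taste.

```
* dsm.sys client stanza sketch (server and mount-point names are
* the placeholders used in this thread)
SErvername tsmserver
   COMMMethod          TCPip
   TCPServeraddress    tsmserver
   VIRTUALMountpoint   /NAS/NFS1
   VIRTUALMountpoint   /NAS/NFS2
   DOMain              /NAS/NFS1 /NAS/NFS2
   RESOURCEUTILIZATION 10
```

Listing each share as its own VIRTUALMountpoint (rather than a single /NAS) gives each one its own filespace, which is what lets the DOMAIN entries be walked in parallel and backed up or restored independently.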
Tivoli Admin Center frustrations
I apologize in advance but I'm just getting more frustrated. I've installed both the windows and linux version of TSM Admin Center 6.3.4.300 and both seemed to install correctly with no errors. After correctly logging in, I can't add a TSM server under the "Manage Server" option. I've tried 4 different web browsers (opera,firefox,chrome and IE) and the only combination I could get to work correctly was IE and the linux Admin Center. I've done some searching and followed some suggestions like enabling javascript in firefox but it still hasn't helped. We also have TSM Manager licensed but were looking for more functionality (specifically Automatic client deployment) that Admin Center supplies plus maybe saving some money. Admin Center has been available for how many years now and is still this buggy? One browser of four is all that seems to work out of the box and only with the linux version? It seems like I am not the only one feeling the pain. Anyone have any suggestions? I can call IBM of course but I'm afraid that will result in hours and hours trying to get it to work, and frankly I don't have the time (or maybe patience).
Re: server takes long time to connect to via admin or backup client
Thanks for the reply. We aren't using ldap at all so I don't think that is the cause. John Frank Fegert<http://www.mail-archive.com/search?l=adsm-l@vm.marist.edu&q=from:%22Frank+Fegert%22> Mon, 31 Mar 2014 13:20:38 -0700<http://www.mail-archive.com/search?l=adsm-l@vm.marist.edu&q=date:20140331> Hello, On Mon, Mar 31, 2014 at 09:00:10AM -0400, Dury, John C. wrote: > This is a weird one and I have opened a pmr with IBM but I thought I would > check with you all and see if anyone had any ideas what is going on. I have > a TSM 6.3.4.300 server running on RHEL5 with all the latest maintenance > installed, and when I try to connect to it via an admin session, either > locally from within the server via loopback (127.0.0.1) or remotely using an > admin client, it seems to take several minutes before I connect and get any > response back. SSHing into the server is almost immediate so it's not the OS. > The server is not extremely busy and this happens when it is doing almost > nothing and is very consistent. I have an almost identical TSM server that > does not have this problem at all. I can immediately connect via an admin > client and I immediately get a response. I have compared both the dsmserv.opt > files on both servers as well as the /etc/sysctl.conf files and nothing seems > to be out of place. I don't see anything odd in the db2diag.*.log file > either. I'm just not sure where else to look or what could be causing this > but it is definitely affecting backup performance since the backup clients > can take several minutes just to connect to the server. > Ideas? just a wild guess, but take a look at this: http://www.ibm.com/support/docview.wss?uid=swg21667740&myns=swgtiv&mynp=OCSSGSG7&mync=E HTH & best regards, Frank
server takes long time to connect to via admin or backup client
This is a weird one, and I have opened a PMR with IBM, but I thought I would check with you all and see if anyone had any ideas what is going on. I have a TSM 6.3.4.300 server running on RHEL5 with all the latest maintenance installed, and when I try to connect to it via an admin session, either locally from within the server via loopback (127.0.0.1) or remotely using an admin client, it seems to take several minutes before I connect and get any response back. SSHing into the server is almost immediate, so it's not the OS. The server is not extremely busy, and this happens when it is doing almost nothing; it is very consistent. I have an almost identical TSM server that does not have this problem at all. I can immediately connect via an admin client and I immediately get a response. I have compared the dsmserv.opt files on both servers as well as the /etc/sysctl.conf files, and nothing seems to be out of place. I don't see anything odd in the db2diag.*.log file either. I'm just not sure where else to look or what could be causing this, but it is definitely affecting backup performance since the backup clients can take several minutes just to connect to the server. Ideas?
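One common culprit for a multi-minute connect with a responsive OS is slow name resolution on the server side. This is a hypothetical diagnosis, not confirmed as this poster's cause, but it is cheap to rule out from the server itself (port 1500 is the TSM default; adjust for your setup):

```shell
#!/bin/bash
# Separate name-resolution time from raw TCP connect time.
# A slow getent points below TSM (DNS/nsswitch); a fast getent plus a
# slow connect points at dsmserv itself.
SERVER=localhost
PORT=1500
time getent hosts "$SERVER"
time bash -c "exec 3<>/dev/tcp/$SERVER/$PORT" 2>/dev/null \
    || echo "no listener on $PORT"
```

If the reverse lookup of the connecting client's address is what stalls, the delay shows up only on incoming sessions, which would match admin and backup clients hanging while ssh into the box stays fast.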
Re: file system backups of a Dell NDMP Equallogic device
We are more concerned about file-level backups than an image backup. Eventually the NAS devices will be replicating using Equallogic replication once we get some more storage, but for now we want to make sure that the files in the NFS shares are correctly backed up. I really wanted to avoid backing up the same NFS data to multiple TSM nodes, since some of the NFS mounts are shared amongst several servers/nodes. My strategy is to pick one TSM node, make sure it has NFS mounts for all of the NFS file systems that live on the NAS, and then just back them up as virtualmountpoint(s), something like this:

/NAS exists off of root on the TSM node and is a local directory
mount NFS1 as /NAS/NFS1
mount NFS2 as /NAS/NFS2
etc.
put an entry in dsm.sys: virtualmountpoint /NAS

and then just let incrementals run normally. All restores would need to be done on the NODE that can see all of the NFS mounts. Think that will work? I agree with Wanda. Our strategy for our filers (BlueARC, Isilon) is to back up at the file level exclusively, using NFS. Modern TSM servers support no-query restores well enough that we can get a restore of the latest data very quickly (make sure you have plenty of CPU and memory, along with very fast database disks). To perform the backups efficiently, you might want to think about splitting your data up into separate nodes or filespaces, backed up with independent schedules, so that you're not bottlenecked on a single component. As far as I can tell, NDMP was written by storage vendors to make one buy more expensive storage, and more of it than one needs. You don't have to use tape. You can do NDMP backups via TCP/IP to your regular TSM storage pool hierarchy. But AFAIK you still have to do it at the volume/share level that the NAS device understands; I don't think you can do it at the root. Using "virtualmountpoint" is for backing up incrementally at the *file* level via NFS or CIFS mounts, not NDMP, so I'm not sure which way you are headed. Question is, what are you doing this for?
NDMP is a stupid, simplistic protocol. You won't like what you have to do to achieve an individual file restore. If you are trying to get DR capability to rebuild your NDMP shares in case of an emergency, it makes sense. If you are just trying to provide backup coverage to restore people's files like you would from a file server, it may not. If you want to do NDMP via TCP/IP instead of direct to tape, reply with your TSM server platform and server level, and I'll send you back the page reference in the manual you need... W -Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Dury, John C. Sent: Thursday, February 13, 2014 2:11 PM To: ADSM-L AT VM.MARIST DOT EDU Subject: [ADSM-L] file system backups of a Dell NDMP Equallogic device We have two Dell NDMP storage devices and a TSM server at both sites. We'd like to be able to do file-level backups (and restores if necessary) of the entire NDMP device, from the root down, to the local TSM server (image backups don't help much). Can someone point me in the right direction or tell me how they did it? NAS/NDMP is pretty new to me, and from what I have read so far, the documentation talks about backing up directly to tape, which we don't have any more. All of our storage is online. What I was originally planning on doing was creating all of the NFS shares on one linux server and backing them up as /virtualmountpoints. I'd like to set up just one which points to the root of all the NFS file systems on the NAS device, but I see no way to do that either. Any help is appreciated. On 13 Feb 2014, at 20:11, Dury, John C. wrote: > We have two Dell NDMP storage devices and a TSM server at both sites. We'd > like to be able to do file-level backups (and restores if necessary) of the > entire NDMP device, from the root down, to the local TSM server (image > backups don't help much). Can someone point me in the right direction or tell me how they did > it?
NAS/NDMP is pretty new to me and from what I have read so far, the > documentation talks about backing up directly to tape, which we don't have > any more. All of our storage is online. > > What I was originally planning on doing, was creating all of the NFS shares > on one linux server, and backing them up as /virtualmountpoints. I'd like to > setup just one which points to the root of all the NFS systems on the NAS > device but I see no way to do that either. > Any help is appreciated. if supported by the Dell, NDMP to disk is even simpler than NDMP to tape... just don't define any paths from the datamover to tape (which you don't have any way) -- Met vriendelijke groeten/Kind Regards, Remco Post r.post AT plcs DOT nl +31 6 248 21 622
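The file-level approach discussed in this thread boils down to a client option file along these lines. Mount points and share names are illustrative; note that the TSM client treats each NFS mount as its own file system, so one VIRTUALMOUNTPOINT per mount (with matching DOMAIN statements) is the safer form than a single entry at /NAS:

```
* /etc/fstab (or automounter): mount each NAS share under one tree
*   nas01:/share1  /NAS/NFS1  nfs  defaults  0 0
*   nas01:/share2  /NAS/NFS2  nfs  defaults  0 0

* dsm.sys server stanza on the designated backup node:
VIRTUALMOUNTPOINT /NAS/NFS1
VIRTUALMOUNTPOINT /NAS/NFS2
DOMAIN /NAS/NFS1
DOMAIN /NAS/NFS2
```

With that in place, a plain scheduled `dsmc incremental` picks up the shares as separate filespaces under the one node, and restores run from any client that can see the same mounts.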
file system backups of a Dell NDMP Equallogic device
We have two Dell NDMP storage devices and a TSM server at both sites. We'd like to be able to do file-level backups (and restores if necessary) of the entire NDMP device, from the root down, to the local TSM server (image backups don't help much). Can someone point me in the right direction or tell me how they did it? NAS/NDMP is pretty new to me, and from what I have read so far, the documentation talks about backing up directly to tape, which we don't have any more. All of our storage is online. What I was originally planning on doing was creating all of the NFS shares on one linux server and backing them up as /virtualmountpoints. I'd like to set up just one which points to the root of all the NFS file systems on the NAS device, but I see no way to do that either. Any help is appreciated.
Re: ANR1817W when exporting node to another server
That is exactly what I am trying to do. John - What happens if you specify a specific node name that hasn't been exported - yet? On Wed, Sep 11, 2013 at 12:20 PM, Dury, John C. wrote: > I have tried several combinations of the following and they all fail with > the output listed below. > > export node toserver=LINUXSERVER filespace=* filedata=all > export node toserver=LINUXSERVER filedata=all > export node toserver=LINUXSERVER filedata=all replacedefs=yes > mergefilespaces=yes > export node toserver=LINUXSERVER filedata=all replacedefs=no > mergefilespaces=no > export node toserver=LINUXSERVER filedata=all domain=standard > > > > - Didn't notice the actual EXPORT command/options you used? > > > On Wed, Sep 11, 2013 at 8:30 AM, Dury, John C. > wrote: > > > We have 2 6.3.4.0 TSM servers, one AIX and one linux. We want to move all > > of the data from the AIX server to the linux server and then eventually > > retire the AIX server. I understand I can also do this with replication, > > but because the lijnux server will eventually be replicating to a second > > linux server at a remote site and until the whole process is complete, > the > > AIX server will still be doing backups, I thought exporting node data > would > > be a simpler solution. > > I have the AIX and linux servers talking to each other and was able to > > successfully export admins from the AIX server directory to the linux > > server but when I try to export a node from AIX to linux, I get the > > following messages during the process and I'm not sure why. I've tried > > every combination of replacedefs and mergefilespaces but all get the same > > errors. Here is the output from the "export node". The output below was > > sanitized but should be easily readable. > > > > Ideas anyone? > > > > > > 09/11/2013 08:03:32 ANR0984I Process 667 for EXPORT NODE started in the > > BACKGROUND at 08:03:32. (SESSION: 744051, PROCESS: > > 667) > > 09/11/2013 08:03:32 ANR0609I EXPORT NODE started as process 667. 
> (SESSION: > > 744051, PROCESS: 667) > > 09/11/2013 08:03:32 ANR0402I Session 744080 started for administrator > > ADMINID > > (Server) (Memory IPC). (SESSION: 744051, PROCESS: > > 667) > > 09/11/2013 08:03:32 ANR0408I Session 744081 started for server > LINUXSERVER > > (Linux/x86_64) (Tcp/Ip) for general administration. > > (SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:03:32 ANR0610I EXPORT NODE started by ADMINID as process > > 667. > > (SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:03:33 ANR0635I EXPORT NODE: Processing node NODENAME in > > domain > > STANDARD. (SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:03:33 ANR0637I EXPORT NODE: Processing file space > > \\NODENAME\c$ > > for node NODENAME fsId 1 . (SESSION: 744051, > PROCESS: > > 667) > > 09/11/2013 08:03:33 ANR0637I EXPORT NODE: Processing file space > > \\NODENAME\d$ > > for node NODENAME fsId 2 . (SESSION: 744051, > PROCESS: > > 667) > > 09/11/2013 08:04:09 ANR8337I LTO volume TAPEVOLUME mounted in drive > > TAPEDRIVE > > (/dev/rmt4). (SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:04:09 ANR0512I Process 667 opened input volume TAPEVOLUME. > > (SESSION: > > 744051, PROCESS: 667) > > 09/11/2013 08:05:37 ANR1817W EXPORT NODE: No valid nodes were identified > > for > > import on target server LINUXSERVER. (SESSION: > > 744051, > > PROCESS: 667) > > 09/11/2013 08:05:37 ANR0562I EXPORT NODE: Data transfer complete, > deleting > > temporary data. (SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:05:37 ANR0405I Session 744081 ended for administrator > > LINUXSERVER > > (Linux/x86_64). (SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:05:37 ANR0515I Process 667 closed volume TAPEVOLUME. > > (SESSION: > > 744051, PROCESS: 667) > > 09/11/2013 08:05:37 ANR0724E EXPORT NODE: Processing terminated > > abnormally - > > transaction failure. (SESSION: 744051, PROCESS: > 667) > > 09/11/2013 08:05:37 ANR0626I EXPORT NODE: Copied 1 node definitions. 
> > (SESSION: > > 744051, PROCESS: 667) > > 09/11/2013 08:05:37 A
Re: ANR1817W when exporting node to another server
I found the problem. There was a leftover replicated node from previous testing, with the same name. Replicated nodes don't show up when doing a "q node" so I wasn't seeing it on the new server. Thanks for the fresh eyes. From: Dury, John C. Sent: Wednesday, September 11, 2013 2:02 PM To: 'ADSM-L (ADSM-L@VM.MARIST.EDU)' Subject: Re: [ADSM-L] ANR1817W when exporting node to another server That is exactly what I am trying to do. John - What happens if you specify a specific node name that hasn't been exported - yet? On Wed, Sep 11, 2013 at 12:20 PM, Dury, John C. wrote: > I have tried several combinations of the following and they all fail with > the output listed below. > > export node toserver=LINUXSERVER filespace=* filedata=all > export node toserver=LINUXSERVER filedata=all > export node toserver=LINUXSERVER filedata=all replacedefs=yes > mergefilespaces=yes > export node toserver=LINUXSERVER filedata=all replacedefs=no > mergefilespaces=no > export node toserver=LINUXSERVER filedata=all domain=standard > > > > - Didn't notice the actual EXPORT command/options you used? > > > On Wed, Sep 11, 2013 at 8:30 AM, Dury, John C. > wrote: > > > We have 2 6.3.4.0 TSM servers, one AIX and one linux. We want to move all > > of the data from the AIX server to the linux server and then eventually > > retire the AIX server. I understand I can also do this with replication, > > but because the lijnux server will eventually be replicating to a second > > linux server at a remote site and until the whole process is complete, > the > > AIX server will still be doing backups, I thought exporting node data > would > > be a simpler solution. > > I have the AIX and linux servers talking to each other and was able to > > successfully export admins from the AIX server directory to the linux > > server but when I try to export a node from AIX to linux, I get the > > following messages during the process and I'm not sure why. 
I've tried > > every combination of replacedefs and mergefilespaces but all get the same > > errors. Here is the output from the "export node". The output below was > > sanitized but should be easily readable. > > > > Ideas anyone? > > > > > > 09/11/2013 08:03:32 ANR0984I Process 667 for EXPORT NODE started in the > > BACKGROUND at 08:03:32. (SESSION: 744051, PROCESS: > > 667) > > 09/11/2013 08:03:32 ANR0609I EXPORT NODE started as process 667. > (SESSION: > > 744051, PROCESS: 667) > > 09/11/2013 08:03:32 ANR0402I Session 744080 started for administrator > > ADMINID > > (Server) (Memory IPC). (SESSION: 744051, PROCESS: > > 667) > > 09/11/2013 08:03:32 ANR0408I Session 744081 started for server > LINUXSERVER > > (Linux/x86_64) (Tcp/Ip) for general administration. > > (SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:03:32 ANR0610I EXPORT NODE started by ADMINID as process > > 667. > > (SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:03:33 ANR0635I EXPORT NODE: Processing node NODENAME in > > domain > > STANDARD. (SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:03:33 ANR0637I EXPORT NODE: Processing file space > > \\NODENAME\c$ > > for node NODENAME fsId 1 . (SESSION: 744051, > PROCESS: > > 667) > > 09/11/2013 08:03:33 ANR0637I EXPORT NODE: Processing file space > > \\NODENAME\d$ > > for node NODENAME fsId 2 . (SESSION: 744051, > PROCESS: > > 667) > > 09/11/2013 08:04:09 ANR8337I LTO volume TAPEVOLUME mounted in drive > > TAPEDRIVE > > (/dev/rmt4). (SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:04:09 ANR0512I Process 667 opened input volume TAPEVOLUME. > > (SESSION: > > 744051, PROCESS: 667) > > 09/11/2013 08:05:37 ANR1817W EXPORT NODE: No valid nodes were identified > > for > > import on target server LINUXSERVER. (SESSION: > > 744051, > > PROCESS: 667) > > 09/11/2013 08:05:37 ANR0562I EXPORT NODE: Data transfer complete, > deleting > > temporary data. 
(SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:05:37 ANR0405I Session 744081 ended for administrator > > LINUXSERVER > > (Linux/x86_64). (SESSION: 744051, PROCESS: 667) > > 09/11/2013 08:05:37 ANR0515I Process 667 closed volume TAPEVOLUME. > > (S
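For anyone hitting the same ANR1817W: the fix here was a leftover replicated node on the target server, which (as the poster notes) does not show up under a plain QUERY NODE. A sketch of the cleanup, with a hypothetical node name:

```
/* On the target server: check the replication view, not QUERY NODE. */
QUERY REPLNODE NODENAME
/* Remove the leftover replication state, then retry the export.     */
REMOVE REPLNODE NODENAME
```

REMOVE REPLNODE only clears the replication bookkeeping for the node; it does not delete backed-up data, so it is safe to run before re-attempting the EXPORT NODE.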
Re: ANR1817W when exporting node to another server
I have tried several combinations of the following and they all fail with the output listed below. export node toserver=LINUXSERVER filespace=* filedata=all export node toserver=LINUXSERVER filedata=all export node toserver=LINUXSERVER filedata=all replacedefs=yes mergefilespaces=yes export node toserver=LINUXSERVER filedata=all replacedefs=no mergefilespaces=no export node toserver=LINUXSERVER filedata=all domain=standard - Didn't notice the actual EXPORT command/options you used? On Wed, Sep 11, 2013 at 8:30 AM, Dury, John C. wrote: > We have 2 6.3.4.0 TSM servers, one AIX and one linux. We want to move all > of the data from the AIX server to the linux server and then eventually > retire the AIX server. I understand I can also do this with replication, > but because the lijnux server will eventually be replicating to a second > linux server at a remote site and until the whole process is complete, the > AIX server will still be doing backups, I thought exporting node data would > be a simpler solution. > I have the AIX and linux servers talking to each other and was able to > successfully export admins from the AIX server directory to the linux > server but when I try to export a node from AIX to linux, I get the > following messages during the process and I'm not sure why. I've tried > every combination of replacedefs and mergefilespaces but all get the same > errors. Here is the output from the "export node". The output below was > sanitized but should be easily readable. > > Ideas anyone? > > > 09/11/2013 08:03:32 ANR0984I Process 667 for EXPORT NODE started in the > BACKGROUND at 08:03:32. (SESSION: 744051, PROCESS: > 667) > 09/11/2013 08:03:32 ANR0609I EXPORT NODE started as process 667. (SESSION: > 744051, PROCESS: 667) > 09/11/2013 08:03:32 ANR0402I Session 744080 started for administrator > ADMINID > (Server) (Memory IPC). 
(SESSION: 744051, PROCESS: > 667) > 09/11/2013 08:03:32 ANR0408I Session 744081 started for server LINUXSERVER > (Linux/x86_64) (Tcp/Ip) for general administration. > (SESSION: 744051, PROCESS: 667) > 09/11/2013 08:03:32 ANR0610I EXPORT NODE started by ADMINID as process > 667. > (SESSION: 744051, PROCESS: 667) > 09/11/2013 08:03:33 ANR0635I EXPORT NODE: Processing node NODENAME in > domain > STANDARD. (SESSION: 744051, PROCESS: 667) > 09/11/2013 08:03:33 ANR0637I EXPORT NODE: Processing file space > \\NODENAME\c$ > for node NODENAME fsId 1 . (SESSION: 744051, PROCESS: > 667) > 09/11/2013 08:03:33 ANR0637I EXPORT NODE: Processing file space > \\NODENAME\d$ > for node NODENAME fsId 2 . (SESSION: 744051, PROCESS: > 667) > 09/11/2013 08:04:09 ANR8337I LTO volume TAPEVOLUME mounted in drive > TAPEDRIVE > (/dev/rmt4). (SESSION: 744051, PROCESS: 667) > 09/11/2013 08:04:09 ANR0512I Process 667 opened input volume TAPEVOLUME. > (SESSION: > 744051, PROCESS: 667) > 09/11/2013 08:05:37 ANR1817W EXPORT NODE: No valid nodes were identified > for > import on target server LINUXSERVER. (SESSION: > 744051, > PROCESS: 667) > 09/11/2013 08:05:37 ANR0562I EXPORT NODE: Data transfer complete, deleting > temporary data. (SESSION: 744051, PROCESS: 667) > 09/11/2013 08:05:37 ANR0405I Session 744081 ended for administrator > LINUXSERVER > (Linux/x86_64). (SESSION: 744051, PROCESS: 667) > 09/11/2013 08:05:37 ANR0515I Process 667 closed volume TAPEVOLUME. > (SESSION: > 744051, PROCESS: 667) > 09/11/2013 08:05:37 ANR0724E EXPORT NODE: Processing terminated > abnormally - > transaction failure. (SESSION: 744051, PROCESS: 667) > 09/11/2013 08:05:37 ANR0626I EXPORT NODE: Copied 1 node definitions. > (SESSION: > 744051, PROCESS: 667) > 09/11/2013 08:05:37 ANR0627I EXPORT NODE: Copied 2 file spaces 0 archive > files, 1 backup files, and 0 space managed files. > (SESSION: 744051, PROCESS: 667) > 09/11/2013 08:05:37 ANR0629I EXPORT NODE: Copied 1589 bytes of data. 
> (SESSION: > 744051, PROCESS: 667) > 09/11/2013 08:05:37 ANR0611I EXPORT NODE started by ADMINID as process > 667 has > ended. (SESSION: 744051, PROCESS: 667) > 09/11/2013 08:05:37 ANR0986I Process 667 for EXPORT NODE running in the > BACKGROUND processed 4 items for a total of 1,589 > bytes >
ANR1817W when exporting node to another server
We have 2 6.3.4.0 TSM servers, one AIX and one linux. We want to move all of the data from the AIX server to the linux server and then eventually retire the AIX server. I understand I can also do this with replication, but because the linux server will eventually be replicating to a second linux server at a remote site, and until the whole process is complete the AIX server will still be doing backups, I thought exporting node data would be a simpler solution. I have the AIX and linux servers talking to each other and was able to successfully export admins from the AIX server directly to the linux server, but when I try to export a node from AIX to linux, I get the following messages during the process and I'm not sure why. I've tried every combination of replacedefs and mergefilespaces but all get the same errors. Here is the output from the "export node". The output below was sanitized but should be easily readable. Ideas anyone?

09/11/2013 08:03:32 ANR0984I Process 667 for EXPORT NODE started in the BACKGROUND at 08:03:32. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:03:32 ANR0609I EXPORT NODE started as process 667. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:03:32 ANR0402I Session 744080 started for administrator ADMINID (Server) (Memory IPC). (SESSION: 744051, PROCESS: 667)
09/11/2013 08:03:32 ANR0408I Session 744081 started for server LINUXSERVER (Linux/x86_64) (Tcp/Ip) for general administration. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:03:32 ANR0610I EXPORT NODE started by ADMINID as process 667. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:03:33 ANR0635I EXPORT NODE: Processing node NODENAME in domain STANDARD. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:03:33 ANR0637I EXPORT NODE: Processing file space \\NODENAME\c$ for node NODENAME fsId 1. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:03:33 ANR0637I EXPORT NODE: Processing file space \\NODENAME\d$ for node NODENAME fsId 2. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:04:09 ANR8337I LTO volume TAPEVOLUME mounted in drive TAPEDRIVE (/dev/rmt4). (SESSION: 744051, PROCESS: 667)
09/11/2013 08:04:09 ANR0512I Process 667 opened input volume TAPEVOLUME. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:05:37 ANR1817W EXPORT NODE: No valid nodes were identified for import on target server LINUXSERVER. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:05:37 ANR0562I EXPORT NODE: Data transfer complete, deleting temporary data. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:05:37 ANR0405I Session 744081 ended for administrator LINUXSERVER (Linux/x86_64). (SESSION: 744051, PROCESS: 667)
09/11/2013 08:05:37 ANR0515I Process 667 closed volume TAPEVOLUME. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:05:37 ANR0724E EXPORT NODE: Processing terminated abnormally - transaction failure. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:05:37 ANR0626I EXPORT NODE: Copied 1 node definitions. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:05:37 ANR0627I EXPORT NODE: Copied 2 file spaces, 0 archive files, 1 backup files, and 0 space managed files. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:05:37 ANR0629I EXPORT NODE: Copied 1589 bytes of data. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:05:37 ANR0611I EXPORT NODE started by ADMINID as process 667 has ended. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:05:37 ANR0986I Process 667 for EXPORT NODE running in the BACKGROUND processed 4 items for a total of 1,589 bytes with a completion state of FAILURE at 08:05:37. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:05:37 ANR1893E Process 667 for EXPORT NODE completed with a completion state of FAILURE. (SESSION: 744051, PROCESS: 667)
09/11/2013 08:07:40 ANR2017I Administrator ADMINID issued command: QUERY ACTLOG search='process: 667' (SESSION: 744051)
Re: expire inventory seems to be hanging and not processing all nodes
-How long have you been at 6.3.4.0? I've seen this on slightly older servers
-when Windows 2008 Server clients (and similar era Windows desktop, from what I
-hear) back up their system states. You get huge numbers of small objects, and
-expiration takes a long, long time to process those with no obvious progress.
-
-However, that's supposed to be fixed in 6.3.4.0. I'm wondering if you backed up
-some such clients before upgrading to 6.3.4.0?
-
-Nick

We've been running v6.3.4.0 for a few weeks now.

-I saw a similar behavior on my 6.3.3.100 on Windows earlier this week.
-
-Expiration appeared to be stuck for a couple of hours having processed only 10
-or 15 nodes of 200.
-
-Do you have a duration specified to limit time? That might explain your 462 of
-595.
-
-I had also tried to cancel, and also did the halt to clear the process.
-
-Upon restarting, then a new expire, it again appeared to stall, but in time
-picked up and cruised through without issue.

I do not have a duration limit on the process. I looked at a previous night and it ran in 90 minutes or less. The current one has been running for a few hours and is still hung at 462 out of 595 nodes, and the number of objects hasn't changed. From: Dury, John C. Sent: Friday, August 09, 2013 12:03 PM To: ADSM-L (ADSM-L@VM.MARIST.EDU) Subject: expire inventory seems to be hanging and not processing all nodes I have an AIX server with TSM 6.3.4.0. Expiration is set to run at 2am and seems to get so far and then hangs. But what's odd is that it also seems to only process some of the nodes, but not all. For example, I have it running now and it seems to be stuck: 1 Expiration Processed 462 nodes out of 595 total nodes, examined 4315 objects, deleting 4271 backup objects, 0 archive objects, 0 DB backup volumes, 0 recovery plan files; 0 objects have been retried and 0 errors encountered.
I looked at previous nights and it ran successfully but only processed 462 nodes out of 595 and says it ended successfully. If I cancel the one that is running now, it just hangs and never cancels. The only way to get rid of the process is to halt the server. I have opened a problem with IBM but I thought I would see if any of you have seen this issue. Any idea why it isn't processing all 595 nodes?
expire inventory seems to be hanging and not processing all nodes
I have an AIX server with TSM 6.3.4.0. Expiration is set to run at 2am and seems to get so far and then hangs. But what's odd is that it also seems to only process some of the nodes, but not all. For example, I have it running now and it seems to be stuck: 1 Expiration Processed 462 nodes out of 595 total nodes, examined 4315 objects, deleting 4271 backup objects, 0 archive objects, 0 DB backup volumes, 0 recovery plan files; 0 objects have been retried and 0 errors encountered. I looked at previous nights and it ran successfully but only processed 462 nodes out of 595 and says it ended successfully. If I cancel the one that is running now, it just hangs and never cancels. The only way to get rid of the process is to halt the server. I have opened a problem with IBM, but I thought I would see if any of you have seen this issue. Any idea why it isn't processing all 595 nodes?
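The DURATION limit mentioned in the replies is a parameter on the command itself. A hedged sketch of ways to bound and narrow an expiration run (parameters per the 6.3 EXPIRE INVENTORY command; values illustrative, and NODENAME is hypothetical):

```
/* Cap the run at 120 minutes and spread work across 4 threads. */
EXPIRE INVENTORY DURATION=120 RESOURCE=4

/* Or run expiration for a single node, to isolate one that stalls. */
EXPIRE INVENTORY NODE=NODENAME
```

Running node by node (or in small batches) can help pin down whether the hang follows one node's inventory, which would fit the system-state-object theory raised above.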
I could scream!
Way back when we had 1 TSM server at v5.x, we asked IBM several times if there was a way to migrate a TSM server to a different platform (AIX --> Linux) and they said "no". So we came up with a strategy to install a 2nd TSM server (Linux) at our alternate site, have all local clients back up to it, and then replicate to the main TSM server at our primary site (AIX). All local clients at the primary site will do the same: back up to the AIX server and then replicate to the Linux server. Once both servers have full copies of each other's data, we will change all replicated nodes on the linux server to regular nodes and change all remote nodes to then back up to the linux server. Afterwards the AIX server would be phased out, a new Linux server would be created, and the whole process reversed. Basically the goal was to get rid of the TSM AIX server and move to linux only, since AIX hardware is much more expensive. So now with version 6.3.4, there is a way to migrate from an AIX 5.x server to a Linux x86_64 server, which would have solved our problems. Where was this years ago? Of course now we have 2 v6 servers and I am in the middle of the process described above. Being able to move from AIX 5.x to Linux 6.3.4 would have been the solution we were looking for, but now it's too late. Sorry for my rant, but this would have saved me a year's worth of work, at least.
suggestions on setting up a Dell MD3260 as primary storage pool
We are changing our backup system and plan on using a Dell MD3260 as a primary disk storage device. It will house all backups from the local site as well as replicated data from the remote site. It will function as a primary stgpool only. The tape library that is also attached will eventually be phased out. My question is, what recommendations does anyone have on setting up this device. I was thinking of creating several 2T virtual disks and then presenting them to the linux TSM server but I am not sure whether I want to put them all in a volume group or not since all of the striping is done on the back end of the device as all of the hard drives are in one giant dynamic disk pool. In each 2T disk, I was going to create 4 500G disk storage volumes. Or maybe 2 1T disk storage volumes. Because the disk storage volumes will be permanent and not eventually offloaded to tape, I want to make sure I set this up correctly. Any suggestions for best performance? Ext3? Ext2? Thanks.
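Whatever filesystem you choose, the TSM side of this usually comes down to a FILE device class spread over the filesystems carved from the array. A hypothetical sketch (device class name, paths, and sizes are illustrative, not a recommendation for the MD3260 specifically):

```
/* One FILE device class over several filesystems on the array;    */
/* TSM round-robins new volumes across the DIRECTORY list.         */
DEFINE DEVCLASS MD3260FILE DEVTYPE=FILE MAXCAPACITY=50G MOUNTLIMIT=32 DIRECTORY=/tsmpool/fs1,/tsmpool/fs2,/tsmpool/fs3

/* A sequential-access primary pool on that device class, letting  */
/* the server create scratch volumes as needed.                    */
DEFINE STGPOOL DISKPOOL MD3260FILE MAXSCRATCH=200
```

With the striping handled inside the array's dynamic disk pool, the main tunables left on the TSM side are MAXCAPACITY (volume size) and MOUNTLIMIT (concurrent sessions per device class).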
Re: define server for replication
Excellent! This did the trick and I can now export a node to the other server. Thanks for your help. I believe this will also solve my other thread, "* [ADSM-L] move nodes from one server to a different server via export<http://adsm.org/lists/html/ADSM-L/2012-10/msg00018.html>" Hi, Define server uses the servername and serverpassword to validate, whereas ping uses the admin name and password to validate the session. Make sure the admin and password on the server where you issue the ping are also available on the server you ping to. Kind regards, Karel - Original Message - From: Dury, John C. Sent: Wednesday, October 3, 2012 23:34 To: ADSM-L AT VM.MARIST DOT EDU Subject: [ADSM-L] define server for replication We have 2 datacenters and plan on having a tsm server at each site. Every client will back up to the tsm server at its site, and each server will then replicate its data to the server at the other site. Both servers are running tsm 6.3.2. One is aix and one is linux. The linux server is brand new and was just built. The aix server has been around seemingly forever and is currently backing up nodes from both sites. I've read through the manual on replication and believe I understand how it works. Just as a test, I successfully defined both servers to each other and successfully replicated one small node from one server to the other. For whatever reason, the servers stopped talking to each other (I can't find any reason anywhere), so I decided I would delete the replicated node on both servers and also delete each server from the other and redefine them. When I try "def server <2ndserver> serverpassword= hladdress=2ndserver.blah.blah lladdress=1500" (with correct values of course), it works successfully, but when I try "ping server 2ndserver" it always fails with "ANR4373E Session rejected by target server 2ndserver, reason: Authentication Failure."
I've tried making sure that all of the below were done on both servers: set serverhladdress, set serverlladdress, and set serverpassword, and I tried with both crossdefine on and crossdefine off. I can't find any logic as to why this doesn't work when I know for a fact it worked previously. There are no nodes of type=server defined on either TSM server. The userid I am using to issue the command is the same on both servers and has the same password, although I don't think that matters. Neither server has anything currently set for the target replication server. The replicated node I used for testing is also still defined on the target server, and I can't delete it either because it says "ANR1633E REMOVE NODE: Node NODENAME is set up for replication and cannot be renamed or removed." The replication settings on the replicated node on the target server are replstate=disabled replmode=receive. I can't change the replmode no matter what either. Is anyone else successfully using replication? It worked once and hasn't since, and now, as I am trying to set it all up again from scratch, I can't even get the two servers to talk to each other anymore. Help!
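For reference, the usual sequence for pairing two servers sets the identity values on each side before issuing DEFINE SERVER. A sketch with placeholder names and addresses (SITEA/SITEB, the passwords, and the hostnames are illustrative assumptions):

```shell
# dsmadmc sketch; run the equivalent on each server with names reversed.
# The server password set here (not an admin password) is what the
# DEFINE SERVER handshake validates.
# On SITEA:
#   set servername sitea
#   set serverpassword secretA
#   set serverhladdress sitea.example.com
#   set serverlladdress 1500
#   define server siteb serverpassword=secretB \
#       hladdress=siteb.example.com lladdress=1500
```

Note that "ping server" authenticates with your administrative ID, so it can fail with ANR4373E even when the server-to-server definition is sound, unless the same admin ID and password exist on the target.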
Re: define server for replication
I was able to fix this and remove the replicated nodes. I had to redefine the whole infrastructure for replication just to delete it. It turns out this wasn't going to work for what we are trying to do anyway. See the next message, "move nodes from one server to a different server via export". Unfortunately I had already removed the replicated nodes on both servers and it still won't let me delete it. John Hi John, About the second part of your message, "ANR1633E REMOVE NODE: Node NODENAME is set up for replication and cannot be renamed or removed." You first need to remove the REPLNODE: remove replnode NODENAME from both servers (source and target) before you can remove the node. Regards Robert -- Date: Wed, 3 Oct 2012 17:34:22 -0400 From: "Dury, John C." Subject: define server for replication We have 2 datacenters and plan on having a TSM server at each site. All of the nodes at each site will back up to the TSM server at that site, which will then replicate their data to the server at the other site. Both servers are running TSM 6.3.2. One is AIX and one is Linux. The Linux server is brand new and was just built. The AIX server has been around seemingly forever and is currently backing up nodes from both sites. I've read through the manual on replication and believe I understand how it works. Just as a test, I successfully defined both servers to each other and successfully replicated one small node from one server to the other. For whatever reason, the servers stopped talking to each other (I can't find any reason anywhere), so I decided I would delete the replicated node on both servers and also delete each server from the other and redefine them.
When I try "def server <2ndserver> serverpassword= hladdress=2ndserver.blah.blah lladdress=1500" (with correct values of course), it works successfully, but when I try "ping server 2ndserver" it always fails with "ANR4373E Session rejected by target server 2ndserver, reason: Authentication Failure." I've tried making sure that all of the below were done on both servers: set serverhladdress, set serverlladdress, and set serverpassword, and I tried with both crossdefine on and crossdefine off. I can't find any logic as to why this doesn't work when I know for a fact it worked previously. There are no nodes of type=server defined on either TSM server. The userid I am using to issue the command is the same on both servers and has the same password, although I don't think that matters. Neither server has anything currently set for the target replication server. The replicated node I used for testing is also still defined on the target server, and I can't delete it either because it says "ANR1633E REMOVE NODE: Node NODENAME is set up for replication and cannot be renamed or removed." The replication settings on the replicated node on the target server are replstate=disabled replmode=receive. I can't change the replmode no matter what either. Is anyone else successfully using replication? It worked once and hasn't since, and now, as I am trying to set it all up again from scratch, I can't even get the two servers to talk to each other anymore. Help! -- End of ADSM-L Digest - 2 Oct 2012 to 3 Oct 2012 (#2012-229) ***
move nodes from one server to a different server via export
Please ignore my previous post about trying to define a target server to a source server and it failing. It looks like it works, even though I still can't ping the target server from the source server; the replication works correctly though. A brief description of our environment: we have 2 datacenters. Let's call them Site A and Site B. Both sites have TSM nodes, which are all currently backing up to the TSM server at Site A. We would like to have a TSM server at both sites, have the clients at each respective site back up to their local site, and then replicate that data to the other site. I was hoping to be able to replicate node data from Site A to Site B and then convert each node to a regular node, so the node at that site could then back up to Site B. It doesn't look like there is any way to do that, so the next choice is to export the node from server A to server B and then change the node so it backs up to server B instead of server A. Once that is done, I can enable replication between sites for that node so it will replicate from Site B to Site A, and thus we have a DR solution for all of our nodes. Because this is a tremendous amount of data, I'd like to only have to send the data once from Site A to Site B via the export, and then use replication to replicate daily changes from Site B back to Site A, which should be minimal since the node data already exists at Site A. Is this possible? I have both servers defined to each other, and when I try to export a node's data from Site A to Site B I get: ANR0561E EXPORT NODE: Process aborted - sign on to server, failed. (SESSION: 118112, PROCESS: 318) Both servers have my admin ID defined with the same password, and both servers have the same password (although I don't think that matters).
Site B was defined to Site A via: def server siteB pass=server hladdress=*.*.*.* lladdress=1500 I also tried the other syntax: def server siteB serverpass=password hladdress=*.*.*.* lladdress=1500 I also tried defining a node on Site B with the name of the server at Site A and type=server. Forcesync also didn't help. No matter what, I get the "sign on to server failed" message. Is it me, or is "def server" extremely buggy?!
Re: ADSM-L Digest - 2 Oct 2012 to 3 Oct 2012 (#2012-229)
Unfortunately I had already removed the replicated nodes on both servers and it still won't let me delete it. John Hi John, About the second part of your message, "ANR1633E REMOVE NODE: Node NODENAME is set up for replication and cannot be renamed or removed." You first need to remove the REPLNODE: remove replnode NODENAME from both servers (source and target) before you can remove the node. Regards Robert -- Date: Wed, 3 Oct 2012 17:34:22 -0400 From: "Dury, John C." Subject: define server for replication We have 2 datacenters and plan on having a TSM server at each site. All of the nodes at each site will back up to the TSM server at that site, which will then replicate their data to the server at the other site. Both servers are running TSM 6.3.2. One is AIX and one is Linux. The Linux server is brand new and was just built. The AIX server has been around seemingly forever and is currently backing up nodes from both sites. I've read through the manual on replication and believe I understand how it works. Just as a test, I successfully defined both servers to each other and successfully replicated one small node from one server to the other. For whatever reason, the servers stopped talking to each other (I can't find any reason anywhere), so I decided I would delete the replicated node on both servers and also delete each server from the other and redefine them. When I try "def server <2ndserver> serverpassword= hladdress=2ndserver.blah.blah lladdress=1500" (with correct values of course), it works successfully, but when I try "ping server 2ndserver" it always fails with "ANR4373E Session rejected by target server 2ndserver, reason: Authentication Failure." I've tried making sure that all of the below were done on both servers: set serverhladdress, set serverlladdress, and set serverpassword, and I tried with both crossdefine on and crossdefine off. I can't find any logic as to why this doesn't work when I know for a fact it worked previously.
There are no nodes of type=server defined on either TSM server. The userid I am using to issue the command is the same on both servers and has the same password, although I don't think that matters. Neither server has anything currently set for the target replication server. The replicated node I used for testing is also still defined on the target server, and I can't delete it either because it says "ANR1633E REMOVE NODE: Node NODENAME is set up for replication and cannot be renamed or removed." The replication settings on the replicated node on the target server are replstate=disabled replmode=receive. I can't change the replmode no matter what either. Is anyone else successfully using replication? It worked once and hasn't since, and now, as I am trying to set it all up again from scratch, I can't even get the two servers to talk to each other anymore. Help! -- End of ADSM-L Digest - 2 Oct 2012 to 3 Oct 2012 (#2012-229) ***
define server for replication
We have 2 datacenters and plan on having a TSM server at each site. All of the nodes at each site will back up to the TSM server at that site, which will then replicate their data to the server at the other site. Both servers are running TSM 6.3.2. One is AIX and one is Linux. The Linux server is brand new and was just built. The AIX server has been around seemingly forever and is currently backing up nodes from both sites. I've read through the manual on replication and believe I understand how it works. Just as a test, I successfully defined both servers to each other and successfully replicated one small node from one server to the other. For whatever reason, the servers stopped talking to each other (I can't find any reason anywhere), so I decided I would delete the replicated node on both servers and also delete each server from the other and redefine them. When I try "def server <2ndserver> serverpassword= hladdress=2ndserver.blah.blah lladdress=1500" (with correct values of course), it works successfully, but when I try "ping server 2ndserver" it always fails with "ANR4373E Session rejected by target server 2ndserver, reason: Authentication Failure." I've tried making sure that all of the below were done on both servers: set serverhladdress, set serverlladdress, and set serverpassword, and I tried with both crossdefine on and crossdefine off. I can't find any logic as to why this doesn't work when I know for a fact it worked previously. There are no nodes of type=server defined on either TSM server. The userid I am using to issue the command is the same on both servers and has the same password, although I don't think that matters. Neither server has anything currently set for the target replication server. The replicated node I used for testing is also still defined on the target server, and I can't delete it either because it says "ANR1633E REMOVE NODE: Node NODENAME is set up for replication and cannot be renamed or removed."
The replication settings on the replicated node on the target server are replstate=disabled replmode=receive. I can't change the replmode no matter what either. Is anyone else successfully using replication? It worked once and hasn't since and now as I am trying to set it all up again as if from scratch, I can't even get the two servers to talk anymore. Help!
Re: TSM Server Upgrade from 5.5.5.2 to 6.3.2
I did the upgrade and for the most part it went smoothly. The upgrade wizard did insist on the directories being empty, but I was able to create a subdirectory under my target directory and use that, so I didn't need as much SAN space since I could reuse some directories. John With my upgrades I never did test the theory, but I think the directory you specify must be empty. You might try something like this: /tsmdb/v6 instead of /tsmdb Again, this is a theory I did not test, but somewhere I saw an example of the DB location that was not on the root of the file system. Please post if this works; I am sure others would be interested. Andy Huebner -Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Dury, John C. Sent: Friday, August 17, 2012 11:06 AM To: ADSM-L AT VM.MARIST DOT EDU Subject: [ADSM-L] TSM Server Upgrade from 5.5.5.2 to 6.3.2 I am planning on upgrading our TSM server from v5.5.5.2 to v6.3.2 this weekend. I have checked and all of the prerequisites are in place, and I think I am ready. My only question is: we currently have the v5 database in several directories that have several .DSM files in them but also have lots of free space, more than enough to house the v6 DB even if it doubles in size. According to the upgrade manual, the new directories for the v6 server must be empty, but I was hoping to be able to use the same directories where the current v5 database files live, since they have more than enough space; I would eventually delete the old .DSM files from the v5 database once the upgrade is complete. When I go to do the upgrade/migration, will it let me use the same directories even though they already have one or two files in them? I realize it might slow the process down, but I am limited on SAN space right now.
TSM Server Upgrade from 5.5.5.2 to 6.3.2
I am planning on upgrading our TSM server from v5.5.5.2 to v6.3.2 this weekend. I have checked and all of the prerequisites are in place, and I think I am ready. My only question is: we currently have the v5 database in several directories that have several .DSM files in them but also have lots of free space, more than enough to house the v6 DB even if it doubles in size. According to the upgrade manual, the new directories for the v6 server must be empty, but I was hoping to be able to use the same directories where the current v5 database files live, since they have more than enough space; I would eventually delete the old .DSM files from the v5 database once the upgrade is complete. When I go to do the upgrade/migration, will it let me use the same directories even though they already have one or two files in them? I realize it might slow the process down, but I am limited on SAN space right now.
backup tape stgpool to tape stgpool pins recovery log
I have 2 SL500 tape libraries attached via dark fiber (which doesn't appear to be getting anywhere close to the speeds it should, but that's another issue) to an AIX TSM 5.5 server. One is a copy storage pool and the other a primary. I back up the primary to the copy storage pool daily. This process is taking an extremely long time and seems to hang: there is at least one extremely large file that, while it is copying, pins the recovery log and does not finish before the recovery log is over 80%, which then causes TSM to slow to a crawl, so I end up cancelling the backup storage pool process. This is hanging my TSM server on a daily basis because the recovery log keeps getting pinned by the backup storage process. Is there any way at all I can find out what node, or node and filespace, is taking so long to back up, so I can verify it really needs to be backed up at all?
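One way to chase this down is to ask the server what is pinning the log and then list what the busy process's input volume contains. Hedged sketch: the SHOW commands are undocumented diagnostics whose output varies by server level, and 'A00017' below is a placeholder volume name:

```shell
# dsmadmc sketch, placeholder volume name.
#   show logpinned    -- undocumented diagnostic; reports which session or
#                        process is holding the oldest recovery-log page
#   query process     -- note the input volume of the long-running
#                        BACKUP STGPOOL process
#   select node_name, filespace_name, file_size from contents
#       where volume_name='A00017'
```

If the CONTENTS select on the input volume comes back with one huge file, the owning node and filespace identify what is worth excluding or handling separately.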
Migrating from AIX to Linux (again)
Our current environment looks like this: We have a production TSM server that all of our clients back up to throughout the day. This server has 2 SL500 tape libraries attached via fiber. One is local and the other is at a remote site connected by dark fiber. The backup data is sent to the remote SL500 library several times a day in an effort to keep them in sync. The strategy is to bring up the TSM DR server at the remote site and have it do backups and restores from the SL500 at that site in case of a DR scenario. I've done a lot of reading in the past, and some just recently, on the possible ways to migrate from an AIX TSM server to a Linux TSM server. I understand that earlier versions of the TSM server (we are currently at 5.5.5.2) allowed you to back up the DB on one platform (AIX, for instance) and restore it on another platform (Linux, for instance), and if you were keeping the same library, it would just work; but apparently that was removed from the TSM server code by IBM, presumably to prevent customers from moving to less expensive hardware. (Gee, thanks, IBM!) I posted several years ago about possible ways to migrate the TSM server from AIX to Linux. The feasible solutions were as follows: 1. Build a new Linux server with access to the same tape library, export nodes from one server to the other, and change each node as it is exported to back up to the new TSM server instead. Then the old data on the old server can be purged. A lengthy and time-consuming process, depending on the amount of data in your tape library. 2. Build a new TSM Linux server and point all TSM clients to it, but keep the old TSM server around for restores for a specified period of time until it can be removed. There may have been more options, but those seemed the most reasonable given our environment.
Our biggest problem with scenario 1 above is that exporting the data that lives on the remote SL500 tape library would take much longer, as the connection to that library is slower than to the local library. I can probably get some of our SLAs adjusted so we only have to export the active data rather than all of it, but that remains to be seen. My question: has any of this changed with v6 TSM, or has anyone come up with a way to do this in a less painful and time-consuming way? Hacking the DB so the platform check doesn't block restoring an AIX TSM DB on a Linux box? Anything? Thanks again, and sorry to revisit all of this. Just hoping something has changed in the last few years. John
Re: nightmares with a STK SL500 tape library
I'll try to answer your questions, but there were a lot! The drives in both libraries are IBM LTO4 drives. The two libraries currently have different versions of library code, although the faulty library has been upgraded several times to different versions as they became available, and the problem still occurred. Both libraries are connected to the same TSM server, and each is on a different HBA. I've tried many different zoning options, and they solved nothing. The LTO4 drives have also been upgraded to newer firmware levels as they have become available, again solving nothing. I've looked in the SAN switch and in TSM and see nothing explaining why the robot is going offline. I've sent several logs from the robot itself, and STK/Sun/Oracle support says they see no errors except that the robot has gone offline. They have said, when visiting, that the robot seems to vibrate/shimmy sometimes, which is why they think it is going offline. They also told me that there was a known issue in previous library firmware versions where, if the management Ethernet cable was plugged in, it could cause the robot to go offline (something to do with surges from the network switch), but this was supposedly fixed in the version we are running. Currently we have the Ethernet unplugged (remember, data goes across fiber only) to see if this solves anything. Unfortunately it may be a month or longer before we even know. Absurd! Hopefully that answered some of your questions. Thanks for responding, btw.
nightmares with a STK SL500 tape library
We purchased an STK SL500 tape library with 4 LTO4 drives in it a few years ago, and we have had nothing but problems with it, almost from the beginning. It is fully loaded with LTO4 cartridges (about 160) and seems to randomly crash and take all of the drives offline to TSM. We also have a second SL500 at a remote site, connected to the same TSM server, and it has no problems at all. The remote SL500 has copies (backup stgpool) of the local SL500. We've gone round and round with STK/Oracle support; they have actually come onsite and physically replaced the entire robot and all of its parts, several times, and they can never find a reason for it going offline. Keep in mind this has been happening about once a month or so for over a year. My question to all of you is not so much what could be wrong (although if you have ideas, that would be great also), but: we are considering a new robot and are hoping to be able to reuse our existing LTO4 tapes. Right now it has about 80 scratches, so if we were to go to a second library, I should be able to have both defined to TSM and move the data from one to the other, after putting some of the scratches in the new library and labeling/initializing them, until all data is in the new library; then I can light the old one on fire (j/k)! Like most IT departments we are severely budget constrained, so we would like to reuse the tape drives and the tape cartridges and only purchase a robot that can handle 160 slots or so. Suggestions on whether this is even an option, or which robots and/or models to look at? Remember, very little budget for this, if I could even get it approved at all, but we really don't know what else to do with the bad SL500 at this point, and we have a project coming up that is going to increase the amount and flow of data to our TSM system significantly within the next few years. Help! John
DISREGARD: backup storage pool slow - find offending file
Ignore this. I got lucky, and querying the contents of the volumes in use shows only 1 file. I was expecting it to be much larger, since these are LTO4 drives and can hold lots of data. From: Dury, John C. Sent: Wednesday, November 03, 2010 4:37 PM To: ADSM-L (ADSM-L@VM.MARIST.EDU) Subject: backup storage pool slow - find offending file We back all of our nightly data up to an SL500 LTO4 tape library, and then during the day that data is also copied to another SL500 LTO4 tape library at a remote location. The backup storage pool process is taking a significantly long time to run, apparently because one file takes an especially long time to copy from one tape library to the other: the backup storage pool is using 3 drives at once, and every single day two of the three processes finish and one runs much, much longer. How can I track down the file that is causing the backup storage pool process to run for such a long time? I'm hoping to narrow it down to a file/server to see if it can possibly be excluded or dealt with a different way. Thanks.
backup storage pool slow - find offending file
We back all of our nightly data up to an SL500 LTO4 tape library, and then during the day that data is also copied to another SL500 LTO4 tape library at a remote location. The backup storage pool process is taking a significantly long time to run, apparently because one file takes an especially long time to copy from one tape library to the other: the backup storage pool is using 3 drives at once, and every single day two of the three processes finish and one runs much, much longer. How can I track down the file that is causing the backup storage pool process to run for such a long time? I'm hoping to narrow it down to a file/server to see if it can possibly be excluded or dealt with a different way. Thanks.
backup storage pool seems to run forever
We have an AIX server running TSM server v5.5.4.0 that is attached to two tape libraries, one remote and one local. The tape libraries are both STK SL500 libraries with 4 LTO4 drives in each. The remote library is configured as a copy storage pool. Several times through the day, we run "backup stgpool" with 3 processes to back up the local tape library to the remote tape library, to try to keep them in sync. Two of the processes run to completion, but the third seems to hang. It backs up only 1 file from library to library and then never progresses any further, even after many, many hours. Because it seems to hang, it ends up pinning the recovery log, which just keeps filling up until I cancel the "backup stg" process, which unpins the recovery log. I've checked, and it doesn't seem to be the same tapes each time. Unfortunately this keeps happening, often in the middle of the night. Is there any way to diagnose why the backup storage pool process is stopping/hanging at 1 file? I don't see any errors anywhere, either in the activity log or in the tape library. If at all possible, is there a way to find the one file that seemingly takes forever to complete? This has been going on for quite a while now, and pinning the recovery log leads to quite a few other problems (as I'm sure you all know). Thanks, John
problems with STK SL500 library and drives constantly going offline
We have an STK SL500 tape library with 4 LTO4 drives in it, and it seems to randomly take all of the drives offline. STK/Sun has replaced the robot and controller card (more than once) and nothing seems to help. This has been going on for several months now, and I'm at a loss to figure out what is causing it. The SL500 and its drives are all attached to a Brocade DS-4900 switch. Nothing has changed on the switch in over a year, as putting maintenance on them is a real chore since we have so many devices attached. The only message I see in the act log is: ANR8848W Drive WRLTO4-1 of library WRSL500 is inaccessible; server has begun polling drive. (PROCESS: 832) As best I can tell, it is a hardware problem, as even rebooting the robot or power cycling the whole box doesn't fix it. The robot stays parked at the top and never seems to go through the audit it usually does when it is rebooted. If it is indeed a hardware problem, I realize there isn't much that can be done, but they have replaced almost all the parts in the box and it keeps happening. Anyone else ever have similar problems? John
Disk Storage pool filling faster than it is emptying. ideas?
We have a 3 TB disk storage pool that, until recently, has been filling up partially during backups and then emptying as the data is migrated to tape. The storage pool is now at 84% and climbing. I have 4 LTO4 Ultrium drives that can't seem to migrate the data fast enough. Watching nmon, I am only getting around 30MB/s total when all four drives are in use. They are all assigned to one HBA because of a lack of slots in the AIX server. I previously had cache turned on for the disk storage pool (it is now off), so I am wondering if the terrible performance is because the disk storage pool is extremely fragmented from cache being turned on for so long. Any ideas how to offload the data to tape faster? John
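A few knobs that sometimes help drain a disk pool faster, sketched with a placeholder pool name (DISKPOOL); whether they help here depends on whether the real bottleneck is the fragmented disk pool, the single shared HBA, or tape drive streaming:

```shell
# dsmadmc sketch; DISKPOOL is a placeholder.
# MIGPROCESS adds parallel migration processes (each consumes a tape
# drive); lowering HIGHMIG/LOWMIG starts migration earlier and drains
# the pool further before backups refill it:
#   update stgpool diskpool migprocess=4 highmig=50 lowmig=0
#   query stgpool diskpool f=d     -- confirm pct migr and thresholds
```

With all four drives on one HBA, 30MB/s aggregate may simply be the shared path saturating at small-block random reads from a fragmented pool, so spreading drives across HBAs (when a slot frees up) may matter more than any threshold.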
Re: Our TSM system is a mess. Suggestions? Ideas?
First, I want to thank all of you for your replies. I definitely got some good ideas and have some things to look at. I'm going to make some changes to where the DB and recovery log are stored. Right now, they are in the same RAID 1 group with two 133G drives. I created a new LUN of six 133G drives set up as RAID 1/0. Eventually all of this will be moved to a bigger box, with the storage pools, DB, and recovery log living on local disks. Thanks everyone!
Our TSM system is a mess. Suggestions? Ideas?
We have about 500 nodes and a backup window from 5pm until 7am. I have our backup schedule set up so that about 30 nodes run incrementals per hour, with a few exceptions. We have a 3T disk storage pool and 4 LTO4 drives in our tape library. Our dbbackuptrigger is set at logfullpct 30% and numincremental 4. Our recovery log is filling up almost once per hour while backups are running and not emptying fast enough before it hits 80%, at which point all backups come to a crawl until it empties below 80%. Sometimes the recovery log is pinned at 70% or so and another DB backup kicks off immediately, which again does not empty it fast enough, and the whole system goes into slowdown once the recovery log passes 80%. Expiration, which used to run in about 6 hours, is not completing even after running for 24 hours. Our DB is about 97 gig and about 74% full. The recovery log is maxed at 13 gig. I don't see anything out of the ordinary in the activity log. The TSM server is AIX 5.3.10.1 TL10 running on an IBM 9131-52A in a logical partition with 20 CPUs configured and about 32G of RAM. The TSM DB and disk storage pools are attached to a CLARiiON CX3-80 via 4G HBAs. I have the recovery log and TSM DB set to use different HBAs than the disk or tape storage pools, so the HBAs aren't fighting each other. I've read the tuning and performance manual and matched our settings to its suggestions, with some small exceptions. We have purchased new hardware to move the whole system to Linux, a monster of a box, since we want to get to TSM v6.x eventually, hopefully sooner rather than later. AIX hardware and support are tremendously expensive compared to an Intel-based box, and like a lot of people, we have a very small budget for anything IT related. One of the biggest problems we are having is the recovery log filling up too quickly and not emptying fast enough.
Even with a log full trigger of 30%, the incremental backup won't finish before the recovery log hits 80%, and with the log full setting so low, we are doing TSM DB backups almost every hour while clients are backing up. This really seems excessive to me. Why would an incremental backup of the TSM DB take an hour or so to run, and is it normal for the recovery log to fill up so fast while backups are running? We even attempted a reorg of the TSM DB, but unfortunately it was going to run for much longer than our window allowed, so it had to be cancelled. I'm going to try again next weekend and hopefully talk the powers that be into a 24-hour window for the reorg. We did do a reorg years ago and the performance improvements were amazing, i.e., expiration ran in less than an hour. I know that is a band-aid, but I have to do something until I can get to version 6, when I can have a bigger recovery log and a new, more powerful server in place. I guess I'm just not sure what to look at at this point, and frankly I'm exhausted. Our help desk is calling me daily, every day, at 6am or earlier, as "TSM is running slow again". Any suggestions on what else to look at? (Sorry for such a fragmented email. I've had about 3 hours sleep at this point.)
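For context, the v5 knobs in play here, sketched with a placeholder device class name (DBBDEV); the log-mode trade-off is real, so treat this as a sketch rather than a recommendation:

```shell
# dsmadmc sketch for a v5 server; DBBDEV is a placeholder device class.
#   define dbbackuptrigger devclass=dbbdev logfullpct=30 numincremental=4
#   query log f=d        -- watch pct utilized and the consumption rate
#   set logmode normal   -- stops rollforward logging, so the log frees as
#                           transactions commit instead of waiting for DB
#                           backups, at the cost of point-in-time recovery
```

With the log hard-capped at 13 GB in v5, the usual choices really are the ones described above: a reorg to speed log draining, or giving up rollforward mode until the v6 migration removes the 13 GB ceiling.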
VSS and system state failures on windows 2003 clients
I've been fighting this for several weeks now and I'm getting very frustrated. I have several Windows 2003 clients that consistently get the errors listed below. I've done some research and installed the patches recommended by several others, and nothing fixes the errors. I've tried multiple versions of the TSM client (v5 and v6). I've deleted the filespace that holds the current system state. All of the latest maintenance is installed on the Windows 2003 clients. Sometimes I can get "dsmc backup systemstate" to work for one night, but then the very next night it fails again; this is on several Windows 2003 clients. I did add the line "PRESCHEDULECMD "ntbackup backup systemstate /F C:\temp\SystemStateBackup.bkf /L:S"" to the dsm.opt for the offending clients, so I at least have a clean backup of the system state. I'm not sure where to turn now. Is Windows 2003 support in TSM really this bad? Any ideas, suggestions, or hints are more than welcome. TSM client versions tried (all 32-bit): 5.5.2.7 5.5.2.10 6.1.2.0 6.1.3.1 Errors received on several Windows 2003 clients CLIENT 1 02/11/2010 09:26:25 ANS1577I The Windows console event handler received a 'Ctrl-C' console event. 02/11/2010 09:33:29 VssRequestor::checkWriterStatus: VssRequestor::checkWriterStatus failed with hr=VSS_E_WRITER_NOT_RESPONDING 02/11/2010 09:33:29 ANS5268W The Microsoft Volume Shadow Copy Services writer 'Removable Storage Manager' current state (VSS_WS_STABLE) is not valid for the current operation. 02/11/2010 09:33:29 ANS5274E A Microsoft Volume Shadow Copy Services writer is in an invalid state after backup completion. 02/11/2010 09:33:29 ANS5250E An unexpected error was encountered. TSM function name : CompleteVssSnapshot TSM function : psVssBackupComplete() failed TSM return code : 4345 TSM file : txncon.cpp (4324) 02/11/2010 09:33:34 ANS1999E Incremental processing of '\\client1\d$' stopped.
02/11/2010 09:33:35 ANS4006E Error processing '\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy3': directory path not found

CLIENT 2
02/11/2010 04:10:24 VssRequestor::checkWriterStatus: VssRequestor::checkWriterStatus failed with hr=VSS_E_WRITER_NOT_RESPONDING
02/11/2010 04:10:24 ANS5268W The Microsoft Volume Shadow Copy Services writer 'Removable Storage Manager' current state (VSS_WS_FAILED_AT_FREEZE) is not valid for the current operation.
02/11/2010 04:10:54 VssRequestor::checkWriterStatus: VssRequestor::checkWriterStatus failed with hr=VSS_E_WRITER_NOT_RESPONDING
02/11/2010 04:10:54 ANS5268W The Microsoft Volume Shadow Copy Services writer 'Removable Storage Manager' current state (VSS_WS_FAILED_AT_FREEZE) is not valid for the current operation.
02/11/2010 04:11:24 VssRequestor::checkWriterStatus: VssRequestor::checkWriterStatus failed with hr=VSS_E_WRITER_NOT_RESPONDING
02/11/2010 04:11:24 ANS5268W The Microsoft Volume Shadow Copy Services writer 'Removable Storage Manager' current state (VSS_WS_FAILED_AT_FREEZE) is not valid for the current operation.
02/11/2010 04:11:54 VssRequestor::checkWriterStatus: VssRequestor::checkWriterStatus failed with hr=VSS_E_WRITER_NOT_RESPONDING
02/11/2010 04:11:54 ANS5268W The Microsoft Volume Shadow Copy Services writer 'Removable Storage Manager' current state (VSS_WS_FAILED_AT_FREEZE) is not valid for the current operation.
02/11/2010 04:11:54 ANS5271E A Microsoft Volume Shadow Copy Services writer is in an invalid state before snapshot initialization.
02/11/2010 04:11:54 ANS5250E An unexpected error was encountered.
   TSM function name : baHandleSnapshot
   TSM function      : BaStartSnapshot() failed.
   TSM return code   : 4353
   TSM file          : backsnap.cpp (3767)
02/11/2010 04:11:54 ANS1327W The snapshot operation for 'client2\SystemState\NULL\System State\SystemState' failed with error code: 4353.
02/11/2010 04:11:54 ANS5283E The operation was unsuccessful.
CLIENT 3
02/11/2010 08:59:14 ANS1577I The Windows console event handler received a 'Ctrl-C' console event.
02/11/2010 09:06:14 VssRequestor::checkWriterStatus: VssRequestor::checkWriterStatus failed with hr=VSS_E_WRITER_NOT_RESPONDING
02/11/2010 09:06:14 ANS5268W The Microsoft Volume Shadow Copy Services writer 'Removable Storage Manager' current state (VSS_WS_STABLE) is not valid for the current operation.
02/11/2010 09:06:14 ANS5274E A Microsoft Volume Shadow Copy Services writer is in an invalid state after backup completion.
02/11/2010 09:06:14 ANS5250E An unexpected error was encountered.
   TSM function name : CompleteVssSnapshot
   TSM function      : psVssBackupComplete() failed
   TSM return code   : 4345
   TSM file          : txncon.cpp (6428)
02/11/2010 09:06:15 ANS1228E Sending of object '\\CLIENT3\e$\ABURNS\Work Plan Analyst\2008 Files\2008 Damage Claims\WO #315451' failed
02/11/2010 09:06:15 ANS4021E Error processing '\\CLIENT3\e$\ABURNS\Work Plan Analyst\2008 Files\2008 Damage Claims\WO #315451': file system not ready
02/11/201
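For what it's worth, the ntbackup workaround described above can be wired into dsm.opt roughly as follows. This is only a sketch: the output path is the one quoted in the email, and the DOMAIN exclusion line is an assumption on my part (it skips system state from the scheduled incremental until the VSS writer problem is resolved); adapt both to your environment.

```text
* Sketch of a dsm.opt workaround while the VSS writers misbehave.
* Take a clean system state copy with ntbackup before each scheduled backup:
PRESCHEDULECMD "ntbackup backup systemstate /F C:\temp\SystemStateBackup.bkf /L:S"

* Optional (assumed syntax): drop system state from the scheduled
* incremental; remove this line once the VSS writers behave again.
DOMAIN ALL-LOCAL -SYSTEMSTATE
```

A common first diagnostic step on the client side is "vssadmin list writers" to see which writer is stuck; since the failing writer here is 'Removable Storage Manager', restarting that service between backups is also worth trying.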
Performance and migration: AIX vs Linux
We are currently running TSM server v5530 under AIX. The AIX server has a mixture of different-speed PCI-X slots (266MHz and 133MHz, both 64-bit), with four 4Gb HBAs. Our system is connected to a CLARiiON CX3-80, where the TSM DB, recovery log, and disk storage pools live. The disk side of TSM has two 4Gb HBAs with PowerPath installed and configured for load balancing across both. We have two STK SL500 libraries, each with four LTO4 drives. One is local and the other is connected via two 2Gb fibre paths to our remote site; the remote SL500 is defined as a copy storage pool. All of the local tape traffic is on one HBA and the remote traffic is on another, so disk and tape never share an HBA.

Using a script I found in past emails, the best performance we can get on the local SL500 is around 30 MB/s, and the best on the remote library is about 10 MB/s. We recently redid the AIX box and rezoned so the disks would use the fastest PCI-X slots and tape the slower ones, to try to improve disk performance. This did help some: expiration was running for about 8 hours and now runs in about 4. Our TSM DB is 76800 MB and is 75% full right now. I've looked through the latest TSM performance and tuning manual and tweaked everything as per the recommendations for our setup.

We are considering migrating our AIX server to an Intel-based Linux box that has a mixture of 8x and 4x PCI Express slots, in hopes of maximizing performance on both disk and tape and also saving money, as AIX boxes are considerably more expensive. So here are my questions:

1. How hard is it to move from an AIX TSM server box to a Linux TSM server? I'm hoping it's as easy as building the new box (tape drives, storage pools, etc.), restoring the DB, and tweaking the new config. I know there is more to it than that, but without having researched it yet, that seems like a logical high-level overview.
2. Will there be much of a performance difference between an AIX-based TSM server and a Linux-based TSM server?

3. Going from mixed-speed PCI-X slots to PCI Express 8x and 4x slots should be a significant improvement, correct?

I know there are a lot of factors here, but we are concerned we aren't getting the best performance from our existing hardware. 30 MB/s (not Mbps) for an LTO4 drive seems pretty slow, and with four 4x PCI-E and four 8x PCI-E slots I can balance the I/O across cards and slots much better than I can now. Comments and criticisms? Linux vs AIX? Thanks for any insight!
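To put question 3 in context, here is a quick sanity-numbers sketch comparing nominal slot bandwidth against what one LTO4 drive can stream. These are published spec figures (PCI-X and PCIe 1.x theoretical rates, LTO4 native 120 MB/s), not measurements from this environment, and they ignore bus sharing overhead and FC link limits.

```python
# Nominal bandwidth of the slot types under discussion (MB/s).
# PCI-X is a shared parallel bus; PCIe bandwidth is per slot, per direction.
slots = {
    "PCI-X 266 MHz, 64-bit (shared bus)": 2132,
    "PCI-X 133 MHz, 64-bit (shared bus)": 1064,
    "PCIe 1.x x4 (per slot, per direction)": 1000,
    "PCIe 1.x x8 (per slot, per direction)": 2000,
}

LTO4_NATIVE = 120   # MB/s native (uncompressed) streaming rate per drive
OBSERVED = 30       # MB/s reported in this thread for the local SL500

for name, bw in slots.items():
    print(f"{name}: {bw} MB/s -> ~{bw // LTO4_NATIVE} LTO4 drives at native speed")

print(f"Observed {OBSERVED} MB/s is {OBSERVED / LTO4_NATIVE:.0%} of LTO4 native")
```

The takeaway is that even a single 133 MHz PCI-X slot has roughly 8x the bandwidth one LTO4 drive needs, so 30 MB/s (about a quarter of native speed) points at the data path feeding the drives (disk reads, FC zoning, streaming/shoe-shining) rather than raw slot bandwidth; PCIe gives more headroom and dedicated per-slot lanes, but it is unlikely to be the bottleneck by itself.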