Re: versioning / expiring / multiple backups under same nodename
Quick and dirty answer - the risk is *between 0 and 100* and cannot be evaluated *exactly*! If there is no failover during the month the risk is certainly 0. In case of a failover or failback during the backup window the risk is ~= 100. And can you or your customer, after several failovers and failbacks, really forget about backups for a while and still say which resource was on which machine at any given time? The configuration you described is somewhat messy, and the main question points at the cluster configuration rather than at TSM/backup itself.

Let's assume we have a good cluster configuration (other people here called it "supported"). Let's take the bull by the horns and go straight to the five systems: machines M1-M5, local resources LR1-LR5 and cluster resources CR1-CR4 (or even CR5, it does not change things):

- create dsm.sys for M1-M4 containing server/node stanzas for nodes LR1+CR1, ..., LR4+CR4
- create dsm.sys for M5 containing stanzas for LR5+CR1+CR2+CR3+CR4
- create dsmLRx.opt for each of M1-M5 on a local (non-shared) resource and use it to start the local (OS) scheduler instance
- create dsmCRn.opt for each of CR1-CR4 on a shared filesystem that remains accessible after failover/failback
- configure, at OS start, the startup of the LRx TSM client scheduler using dsmLRx.opt
- configure, at boot of the primary nodes, the startup of the cluster TSM client scheduler using dsmCRn.opt
- ensure the failover/failback scripts stop the cluster-resource TSM client scheduler on the failing server and then start it on the new active node (in your case always Mn -> M5 or M5 -> Mn); trigger a new incremental backup immediately after failover/failback to cover any backup interrupted by the failover
- ensure correct domain statements in each stanza (sketched below)

Using something similar the risk can be estimated - it is close to the risk of backing up an ordinary server. The increased server availability due to clustering does not lower backup risk, it lowers downtime risk. And this works not only for an all-fail-over-to-one cluster but even for an any-fail-anywhere cluster. So do it by the book and you will travel the paved road; do it as you wish and enjoy the jungle. If nobody has used a configuration before, nobody can evaluate it.

Zlatko Krastev
IT Consultant
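As a rough sketch of what those stanzas could look like (the server stanza names, addresses and paths below are invented for illustration only):

    dsm.sys on M1 (M2-M4 analogous; M5 carries its LR5 stanza plus all four CR stanzas):

      SErvername  tsm_lr1
         COMMMethod        tcpip
         TCPServeraddress  tsmserver.example.com
         NODename          LR1
         DOMain            / /usr /var /opt

      SErvername  tsm_cr1
         COMMMethod        tcpip
         TCPServeraddress  tsmserver.example.com
         NODename          CR1
         DOMain            /shared_cr1
         PASSWORDAccess    generate
         PASSWORDDIR       /shared_cr1/tsm

dsmLR1.opt would then simply contain "SErvername tsm_lr1" and dsmCR1.opt "SErvername tsm_cr1", so each scheduler instance picks up its own stanza. Keeping the CR1 password directory on the shared filesystem lets the stored TSM password follow the resource from node to node.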
Re: versioning / expiring / multiple backups under same nodename
We know the correct way to back up the cluster. The customer did not implement it this way, but we are recommending that they do so. Before they switch from their current set-up to the one we are recommending (with two TSM environments on each machine, one for local disk, the other for shared disk), they would like to know exactly what risk they are running with their current setup.

Matt.
Re: versioning / expiring / multiple backups under same nodename
Hi

To start with, they are running a configuration that is not supported by either Tivoli or IBM. Second, running this configuration will result in inconsistent data, and it is questionable whether a complete, full restore of the system would even be possible. The problem you have to describe to your customer is that there is a large chance they won't be able to restore the cluster environment in case of a disaster. Therefore, it is in their own interest to move over to the supported configuration a.s.a.p. This is not a question of which way to configure the clustered environment, but of moving to the only way that will work.

Best Regards

Daniel Sparrman
---
Daniel Sparrman
Exist i Stockholm AB
Bergkällavägen 31D
192 79 SOLLENTUNA
Switchboard: 08 - 754 98 00
Mobile: 070 - 399 27 51
versioning / expiring / multiple backups under same nodename
Hi TSM'ers,

(this one's a little long-winded, sorry :) )

I have a customer who wishes to assess the maximum risk he would incur in the following situation:

We have a backup copygroup set for 31-day point-in-time recovery. We do not have nolimit for any copygroup parameters - we assume there will only be a single backup each day.

The customer has a 5-node cluster. Machines 1-4 are production machines, 5 is a failover machine. They would like to know the risk involved when, should a machine be failed over to 5, they back up the data now visible to 5 under the nodename of 5 instead of the original machine's nodename, while the original machine continues to run a backup as well (this would only see local disk, since a portion of the failed-over machine's disk is now visible to 5, and would hence mark all the non-visible files as inactive).

We have told them backups would become inconsistent within filespaces that have the same names across machines, and showed them how fiddly it would be to restore a machine if they had had only one failover occur in a 31-day period. They would like to know exactly what the risks are if they have multiple failovers within a month, with multiple machines backing up same-named files under a single nodename!!

They won't take 'It won't work' as an answer; they would like to know how it will impact the point-in-time restore capability for a particular machine, if they keep track of which machines failed over when.

As far as I can work out with pen and paper, in the worst case, for a 3-machine cluster where 1 and 2 can fail over to 3 at any time, the maximum impact would be to reduce the point-in-time restore capability for a particular machine by the number of days that machines have been failed over to 3 in the last 31-day period, because files with the same path and filename on machines 1 and 2 would expire early if they change more often on one machine than they do on the other (there is a worked example of this effect below). I get a headache if I try to extend this to a 5-machine cluster. Do you other TSM'ers agree?

And, I know from our perspective this is a 'silly' thing to work out, because they should listen to the advice of the people who know and switch to backing things up correctly, but they are insisting they have this info...

Any help is much appreciated!

Thank you,
Matt.
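To make the pen-and-paper estimate concrete, here is a small worked example; the retention value and the file path are assumed purely for illustration (they are not our real copygroup values):

    Assume the copygroup keeps at most 4 versions of a file (verexists=4), and
    the path /data/config.dat exists, with different contents, on both M1 and
    M2 and changes every day.

    Days 1-3: M1 is failed over to M3. Node M3 backs up M1's copy each night,
              giving versions v1, v2, v3 of /data/config.dat under node M3.
    Days 4-6: M1 fails back, M2 fails over to M3. Node M3 now backs up M2's
              copy of the same path, giving versions v4, v5, v6 under node M3.

    After day 6 the single version chain under node M3 holds six versions of
    what TSM sees as one file, so expiration under verexists=4 removes the two
    oldest - v1 and v2, which were M1's day-1 and day-2 copies - even though
    they are less than a week old and the 31-day retention on its own would
    have kept them.

So the point-in-time window M1 gets out of node M3 for that file shrinks roughly by the number of days the *other* machine's copies of the same path were backed up under M3, which is the effect estimated above; with four production machines sharing paths and one failover machine, each machine's window can be eaten into by all the others' failover days.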
Re: versioning / expiring / multiple backups under same nodename
Hi,

> I have a customer who wishes to assess the maximum risk he would incur in
> the following situation: We have a backup copygroup set for 31-day
> point-in-time recovery. [...] The customer has a 5-node cluster. 1-4 are
> production machines, 5 is a failover machine. They would like to know the
> risk involved when, should a machine be failed over to 5, they back up the
> data now visible to 5 under the nodename of 5 instead of the original
> machine's nodename, while the original machine continues to run a backup as
> well. [...] They would like to know exactly what the risks are if they have
> multiple failovers within a month, and have multiple machines backing up
> same-named files under a single nodename!!

It depends how the cluster is configured. The TSM client must be part of the resource group, and inside dsm.sys you must create several stanzas with the TCPPort and nodename forced to different numbers and names. And when you start the TSM client scheduler you must force the right dsm.opt with the -optfile option (a short sketch follows at the end of this message).

> They won't take 'It won't work' as an answer; they would like to know how it
> will impact the point-in-time restore capability for a particular machine,
> if they keep track of which machines failed over when. As far as I can work
> out with pen and paper, in the worst case, for a 3-machine cluster where 1
> and 2 can fail over to 3 at any time, the maximum impact would be to reduce
> the point-in-time restore capability for a particular machine by the number
> of days that machines have been failed over to 3 in the last 31-day period
> [...]

It's impossible for two machines to fail over to a third machine at the same time if the first two have the same filesystems - they would even have to import the same VGs. You must check whether this information is true.

> I get a headache if I try to extend this to a 5-machine cluster. Do you
> other TSM'ers agree?

Yes, it would be a little bit hard to manage. But the client probably has good administrators. Or, in the worst case, you can sell a support contract to administer his environment. :-)
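For illustration, starting one scheduler per nodename from its own option file could look like this (the paths and stanza names are made up; the -optfile option itself is standard):

    # local-disk backup, started at boot on M1
    dsmc sched -optfile=/usr/tivoli/tsm/client/ba/bin/dsm_local.opt &

    # shared-disk backup, started by the cluster start/stop scripts on
    # whichever node currently owns the resource group
    dsmc sched -optfile=/shared/tsm/dsm_cluster.opt &

Each dsm.opt names a different SErvername stanza in dsm.sys, and each stanza forces its own nodename; if the server prompts the schedulers (schedmode prompted), each stanza also needs its own client port (e.g. a distinct tcpclientport), which is presumably the port setting meant above.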
Re: versioning / expiring / multiple backups under same nodename
Thanks, but the mechanics of the failovers etc. are fine - only one machine will be failed over at any one time. I'll try and clarify:

M1 and M2 share some common filespace / dirpath names. M3 is the failover machine.

Normal: M1 backs up to TSM under node M1, M2 backs up to TSM under node M2, M3 backs up to TSM under nodename M3.

If M1 fails over to M3, M3 will now capture M1's files from the shared disk under the nodename M3. M1 backs up, but cannot see the shared disk area, so TSM marks all the shared-disk files under nodename M1 as inactive. That goes on for a couple of days. Then M1 fails back to M1. M3 backs up, all of M1's shared-disk files go inactive under nodename M3, and become active files again on M1 under nodename M1.

..Then(!) M2 fails over to M3. The above process is repeated, but is complicated because M2 shares filespace names with M1, so any duplicate filenames will back up and increase the version count of that file under nodename M3; but the version count will be too high, as it counts versions from both M1 and M2. This will cause the files to expire earlier under M3 than they would have done if they had only ever been backed up under the original machine's nodename.

..Does anyone follow this? :-/

Basically(!): M1 and M2 share dirpaths and filenames. The actual data is unique to each machine and is held on a slice of disk that only that machine has access to. M3 is a failover. When a machine is failed over to M3, that machine's slice of disk is mounted on M3. The original machine still backs up, but can only see its local O/S disk. M3 runs backups of all the disk it can see each evening, under the nodename M3. So, if M1 is failed over, its files are backed up under the nodename M3.

..So far, no problem. If you know what days you were failed over you can just get the files from the M3 nodename using -pitd / -pitt or -pick (example below).

But, M1 fails back to M1, and then M2 fails over to M3. When M3 backs up, it will see M2's disk and save it under the nodename M3. PROBLEM! The shared filespace names between M2 and M1 will now cause TSM to mark files inactive, or back them up creating versions / expirations that should not be happening.

Arg! Can anyone see what I'm getting at?
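As an example of that (node names, path and date are placeholders; the date format follows the client's locale), pulling M1's failed-over files back out of the M3 filespaces for a given day could look like:

    dsmc restore -virtualnodename=M3 -pitdate=01/20/2002 -pittime=23:59:00 \
         -subdir=yes "/shareddata/m1/*" /restore/m1/

Run from M1, -virtualnodename=M3 will prompt for M3's TSM password; adding -pick shows the matching versions so you can choose interactively.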
Re: versioning / expiring / multiple backups under same nodename
Hi

The description you gave tells me there is something wrong with your configuration. Normally, when you set up TSM to handle clustering, you have one TSM nodename for each cluster node (M1, M2, M3). These nodenames are only for backing up local files on the node. Then you have either one nodename for each cluster resource, or one nodename for all cluster resources. You also have to bind the nodename to the cluster resource, so that the TSM service that handles the cluster nodename moves with the cluster resource. This way, when the resource moves from one node to another, the TSM nodename will follow (a sketch of such a setup follows below). There are some good books about this on Tivoli's website.

Best Regards

Daniel Sparrman
---
Daniel Sparrman
Exist i Stockholm AB
Bergkällavägen 31D
192 79 SOLLENTUNA
Switchboard: 08 - 754 98 00
Mobile: 070 - 399 27 51
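On an MS cluster, that binding is usually achieved by giving the cluster nodename its own scheduler service and putting that service in the same resource group as the shared disk. Very roughly - the service name, node name, drive letter and exact dsmcutil syntax below are only illustrative, so check the Windows client manual for the level in use:

    q:\tsm\dsm.opt (used only by the cluster nodename):
       NODENAME     CLUSTER1
       CLUSTERNODE  YES
       DOMAIN       q:

    dsmcutil install /name:"TSM Sched CLUSTER1" /node:CLUSTER1 /password:xxx
       /optfile:q:\tsm\dsm.opt /clusternode:yes /clustername:CLUSTER1
       /autostart:no

The scheduler service is then defined as a Generic Service resource in the same cluster group as the Q: drive, so it stops and starts with the group - which is exactly the "moves with the clusterresource" behaviour described above.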
Re: versioning / expiring / multiple backups under same nodename
Exactly - while I'm not doing TSM on my clusters (yet), Daniel is correct. In my MS-clustered environment, I have the following entries in DNS:

SRV01
SRV02
CLUSTER1
DB2-01

The last two are logical names of cluster resources. This allows each resource to be managed separately from the physical machine and from other resources. I would caution you to be careful: as you indicated, the resources DO show up as being owned by the controlling node. I.e., DON'T back them up via that attachment even if you can see it, because when the resource changes to a new node you will have a different name, as you mentioned.

Please keep us posted, as I'll be headed down this road shortly.

-mike

-
Michael Yager
IBM Global Services
(919)382-4808
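In practice that caution comes down to the domain statements: the physical nodes' option files should list only their local drives, and the shared drive should appear only in the option file used by the logical (cluster) nodename. The drive letters here are just an example:

    dsm.opt for SRV01 (and SRV02):   DOMAIN  c: d:     (local drives only)
    dsm.opt for CLUSTER1:            DOMAIN  q:        (the shared drive only)

That way SRV01 never backs up Q: under its own nodename, no matter which physical node happens to own the resource at backup time.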
Re: versioning / expiring / multiple backups under same nodename
Run a local backup on M1, M2 and M3 that backs up the data that is not failed over. Run a shared1 and shared2 backup on all systems, which backs up the shared data on whichever system it currently exists on. I.e. local backups for M1 (/usr /var / /opt), M2 (/usr /var / /opt) and M3, plus shared1 (/u /shareddata). Run a script to determine whether the shared disk is available and, if it is, back it up under shared1; do the same for shared2 (a sketch of such a script is below). Schedule all the local and shared scripts under the nodes M1, M2 and M3.

This buys:
1. Data always backs up under the same nodename.
2. Filesystem last-backup statistics can be used to determine backup success (to double-check an unreliable scheduler status).

Jeff Bach
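A minimal sketch of such a wrapper script (the filesystem, option-file path and node name are invented; it assumes the shared1 data lives in /shareddata and has its own option file pointing at a shared1 nodename stanza):

    #!/bin/ksh
    # Back up the shared1 data, but only on the system that currently
    # has the shared filesystem mounted.
    if mount | grep -q '/shareddata'
    then
        dsmc incremental -optfile=/usr/tivoli/tsm/client/ba/bin/dsm_shared1.opt \
             /shareddata/
    else
        echo "shared1 filesystem not mounted on this node - nothing to do" >&2
    fi

Scheduled under each of M1, M2 and M3, the script is a no-op on the nodes that do not own the disk, so the shared data always lands under the same nodename regardless of where it is currently running.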