Re: versioning / expiring / multiple backups under same nodename
Quick and dirty answer: the risk is *between 0 and 100* and cannot be evaluated *exactly*! If there is no failover during the month, the risk is certainly 0. In case of a failover or failback during the backup window, the risk is ~100.

And can either you or your customer forget about backups for a while and just explain which resource is where at a specific time after several failovers and failbacks? The configuration you described is somewhat messy, and the main question concerns the cluster configuration rather than TSM/backup.

Let's assume we have a good cluster configuration (other people here called it "supported"). Let's take the bull by the horns and go directly to five systems with machines M1-M5, local resources LR1-LR5 and cluster resources CR1-CR4 (or even CR5, it does not change things):

- create dsm.sys for M1-M4 containing server/node stanzas for nodes LR1+CR1, ..., LR4+CR4
- create dsm.sys for M5 containing stanzas for LR5+CR1+CR2+CR3+CR4
- create dsmLR.opt for each of M1-M5 on a local (non-shared) resource and use it to back up the OS and cluster binaries
- create dsmCR.opt for each of CR1-CR4 on a shared filesystem that is accessible after failover/failback
- configure the LR TSM client scheduler to start at OS start, using dsmLR.opt
- configure the cluster TSM client scheduler to start at boot of the primary nodes, using dsmCR.opt
- ensure the failover/failback scripts stop the cluster resource TSM client scheduler on the failing server and subsequently start it on the newly active node (in your case it would always be M -> M5 or M5 -> M). Trigger a new incremental backup immediately after failover/failback to cover any backup interrupted by the failover.
- ensure correct domain statements in each stanza

Using something similar, the risk can be estimated: it is close to the risk of backing up an ordinary server. The increased server availability due to clustering does not lower backup risk, it lowers downtime risk. And this can be used not only for an all-fail-over-to-one cluster but even for an any-fail-anywhere cluster.

So do it by the book and you will travel the paved road; do it as you wish and enjoy the jungle. If nobody has used this before, nobody could have evaluated it.

Zlatko Krastev
IT Consultant

"Warren, Matthew James" <[EMAIL PROTECTED]> on 21.01.2002 14:22:43
Please respond to "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc:
Subject: Re: versioning / expiring / multiple backups under same nodename

We know the correct way to back up the cluster. The customer did not implement it this way, but we are recommending that they do so. Before they switch from their current set-up to how we are recommending they back up the cluster (with 2 TSM environments on each machine, one for local disk, the other for shared disk), they would like to know exactly what risk they are running with their current setup.

Matt.

-----Original Message-----
From: Mike Yager [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 18, 2002 4:27 PM
To: [EMAIL PROTECTED]
Subject: Re: versioning / expiring / multiple backups under same nodename

Exactly. While I'm not doing TSM on my clusters (yet), Daniel is correct. In my MS-clustered environment, I have the following entries in DNS:

SRV01
SRV02
CLUSTER1
DB2-01

The last 2 are "logical" names of cluster resources. This allows the "resource" to be managed separately from the physical machine and from other "resources".

I would caution you to be careful: as you indicated, the resources DO show up as being owned by the controlling node. I.e., DON'T back it up via that attachment even if you can see it, because when it changes to a new node you will have a different name, as you mentioned.

Please keep us posted, as I'll be headed down this road shortly.

-mike

---
The description you gave tells me there is something wrong with your configuration. Normally when you set up TSM to handle clustering, you have one TSM nodename for each cluster node (M1, M2, M3). These nodenames are only for backing up local files on the node. Then you have either one nodename for each cluster resource, or one nodename for all cluster resources. You also have to bind the nodename to the cluster resource, so that the TSM service that handles the cluster nodename moves with the cluster resource.

-
Michael Yager
IBM Global Services
(919)382-4808
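The stanza layout from Zlatko's list can be sketched as below. This is only an illustration under stated assumptions, not a tested client configuration: the server address, the port and the stanza names (lr1_srv and so on) are placeholders invented here, and the domain statements he mentions are left out. Only the machine-to-node mapping comes from the post.

```python
"""Sketch: which dsm.sys stanzas each machine needs in the M1-M5 layout."""

# Placeholders, not taken from the thread.
TSM_SERVER = "tsm.example.com"
SERVER_PORT = 1500

# M1-M4 carry their own local node plus their own cluster resource node.
MACHINE_NODES = {f"M{i}": [f"LR{i}", f"CR{i}"] for i in range(1, 5)}
# M5 is the failover target: its local node plus every cluster resource node.
MACHINE_NODES["M5"] = ["LR5", "CR1", "CR2", "CR3", "CR4"]


def stanza(node):
    """Return one server stanza; the stanza name is a hypothetical convention."""
    return "\n".join([
        f"SErvername  {node.lower()}_srv",
        f"   NODename          {node}",
        f"   TCPServeraddress  {TSM_SERVER}",
        f"   TCPPort           {SERVER_PORT}",
    ])


def dsm_sys(machine):
    """Return the full dsm.sys text for one machine."""
    return "\n\n".join(stanza(n) for n in MACHINE_NODES[machine]) + "\n"


if __name__ == "__main__":
    print(dsm_sys("M5"))
```

Because every cluster resource keeps its own node name on every machine that can host it, the data is stored under the same name no matter where the resource is currently running.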
Re: versioning / expiring / multiple backups under same nodename
Hi

To start with, they are running a configuration that is supported by neither Tivoli nor IBM.

Second, running this configuration will result in inconsistent data, and you could question whether a complete, full restore of the system is possible at all.

The problem you have to describe to your customer is that there is a large chance that they won't be able to restore the cluster environment in case of a disaster. Therefore, it is in their own interest to move over to the supported configuration a.s.a.p.

This is not a question of which way to configure the clustered environment, but of moving to the only way that will work.

Best Regards

Daniel Sparrman
---
Daniel Sparrman
Exist i Stockholm AB
Bergkällavägen 31D
192 79 SOLLENTUNA
Switchboard: 08 - 754 98 00
Mobile: 070 - 399 27 51

"Warren, Matthew James"
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
2002-01-21 13:22
Please respond to "ADSM: Dist Stor Manager"
To: [EMAIL PROTECTED]
Subject: Re: versioning / expiring / multiple backups under same nodename

We know the correct way to back up the cluster. The customer did not implement it this way, but we are recommending that they do so. Before they switch from their current set-up to how we are recommending they back up the cluster (with 2 TSM environments on each machine, one for local disk, the other for shared disk), they would like to know exactly what risk they are running with their current setup.

Matt.

-----Original Message-----
From: Mike Yager [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 18, 2002 4:27 PM
To: [EMAIL PROTECTED]
Subject: Re: versioning / expiring / multiple backups under same nodename

Exactly. While I'm not doing TSM on my clusters (yet), Daniel is correct. In my MS-clustered environment, I have the following entries in DNS:

SRV01
SRV02
CLUSTER1
DB2-01

The last 2 are "logical" names of cluster resources. This allows the "resource" to be managed separately from the physical machine and from other "resources".

I would caution you to be careful: as you indicated, the resources DO show up as being owned by the controlling node. I.e., DON'T back it up via that attachment even if you can see it, because when it changes to a new node you will have a different name, as you mentioned.

Please keep us posted, as I'll be headed down this road shortly.

-mike

---
The description you gave tells me there is something wrong with your configuration. Normally when you set up TSM to handle clustering, you have one TSM nodename for each cluster node (M1, M2, M3). These nodenames are only for backing up local files on the node. Then you have either one nodename for each cluster resource, or one nodename for all cluster resources. You also have to bind the nodename to the cluster resource, so that the TSM service that handles the cluster nodename moves with the cluster resource.

-
Michael Yager
IBM Global Services
(919)382-4808
Re: versioning / expiring / multiple backups under same nodename
We know the correct way to back up the cluster. The customer did not implement it this way, but we are recommending that they do so. Before they switch from their current set-up to how we are recommending they back up the cluster (with 2 TSM environments on each machine, one for local disk, the other for shared disk), they would like to know exactly what risk they are running with their current setup.

Matt.

-----Original Message-----
From: Mike Yager [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 18, 2002 4:27 PM
To: [EMAIL PROTECTED]
Subject: Re: versioning / expiring / multiple backups under same nodename

Exactly. While I'm not doing TSM on my clusters (yet), Daniel is correct. In my MS-clustered environment, I have the following entries in DNS:

SRV01
SRV02
CLUSTER1
DB2-01

The last 2 are "logical" names of cluster resources. This allows the "resource" to be managed separately from the physical machine and from other "resources".

I would caution you to be careful: as you indicated, the resources DO show up as being owned by the controlling node. I.e., DON'T back it up via that attachment even if you can see it, because when it changes to a new node you will have a different name, as you mentioned.

Please keep us posted, as I'll be headed down this road shortly.

-mike

---
The description you gave tells me there is something wrong with your configuration. Normally when you set up TSM to handle clustering, you have one TSM nodename for each cluster node (M1, M2, M3). These nodenames are only for backing up local files on the node. Then you have either one nodename for each cluster resource, or one nodename for all cluster resources. You also have to bind the nodename to the cluster resource, so that the TSM service that handles the cluster nodename moves with the cluster resource.

-
Michael Yager
IBM Global Services
(919)382-4808
Re: versioning / expiring / multiple backups under same nodename
Run a local backup on M1, M2, and M3 that backs up the data that is not failed over. Run a shared1 and a shared2 backup on all systems, each backing up the data on whichever system it currently exists on. I.e.:

M1       /usr /var / /opt
M2       /usr /var / /opt
M3       ""
shared1  /u /shareddata
         run a script to determine if the disk is available, and back it up to shared1 if it is
shared2  same

Schedule all local and shared scripts under the nodes M1, M2, M3.

This buys:
1. Data always backs up under the same nodename.
2. Filesystem last-backup stats can be used to determine backup success (to double-check unreliable scheduler status).

Jeff Bach

> -----Original Message-----
> From: Warren, Matthew James [SMTP:[EMAIL PROTECTED]]
> Sent: Friday, January 18, 2002 9:00 AM
> To: [EMAIL PROTECTED]
> Subject: Re: versioning / expiring / multiple backups under same nodename
>
> Thanks,
>
> but the mechanics of the failovers etc. are fine. Only 1 machine will be failed over at any one time.
>
> I'll try and clarify:
>
> M1 and M2 share some common filespace / dirpath names. M3 is the failover machine.
>
> Normal: M1 backs up to TSM under node M1, M2 backs up to TSM under node M2, M3 backs up to TSM under nodename M3.
>
> If M1 fails over to M3, M3 will now capture M1's files from the shared disk under the nodename M3. M1 backs up, but cannot see the shared disk area, so TSM marks all the shared disk files under nodename M1 as inactive.
>
> That goes on for a couple of days. Then M1 fails back to M1. M3 backs up, all M1's shared disk files go inactive under nodename M3, and become active files again on M1 under nodename M1.
>
> ..Then(!) M2 fails over to M3. The above process is repeated, but is complicated because M3 shares filespace names with M1, so any duplicate filenames will back up and increase the version count of that file under nodename M3; but the version count will be too high, as it counts versions from both M1 and M2. This will cause the files to expire from M3 earlier than they would have done if they had only ever been backed up under the original machine nodename.
>
> ..Does anyone follow this? :-/
>
> Basically (!):
>
> M1 and M2 share dirpaths and filenames. The actual data is unique to each machine and is held on a slice of disk that only that machine has access to.
>
> M3 is a failover. When a machine is failed over to M3, that machine's slice of disk is mounted on M3. The original machine still backs up, but can only see its local O/S disk.
>
> M3 runs backups of all the disk it can see each evening, under the nodename M3.
>
> So, if M1 is failed over, its files are backed up under the nodename M3.
>
> ..So far, no problem. If you know what days you were failed over, you can just get the files from the M3 nodename using -pitd / -pitt or -pick.
>
> But M1 fails back to M1, and then M2 fails over to M3.
>
> When M3 backs up, it will see M2's disk and save it under the nodename M3. PROBLEM! The shared filespace names between M2 and M1 will now cause TSM to mark files inactive, or back them up, creating versions / expirations that should not be happening.
>
> Arg!
>
> Can anyone see what I'm getting at?
>
> -----Original Message-----
> From: Anderson F. Nobre [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, January 16, 2002 6:30 PM
> To: [EMAIL PROTECTED]
> Subject: Re: versioning / expiring / multiple backups under same nodename
>
> Hi,
>
> > I have a customer who wishes to assess the maximum risk he would incur in the following situation:
> >
> > We have a copygroup for backup set for 31-day point-in-time recovery. We do not have nolimit for any copygroup parameters - we assume there will only be a single backup each day.
> >
> > The customer has a 5-node cluster. 1 -> 4 are production machines, 5 is a failover machine.
> >
> > They would like to know the risk involved when, should a machine be failed over to 5, they back up the data now visible to 5 under the nodename of 5 instead of the original machine's nodename, & the original machine continues to run a backup as well (this would only see local disk, as a portion of the failed-over machine's disk is now visible to 5, and hence mark all the non-visible files as inactive).
> >
> > We have told them backups would become inconsistent within filespaces that have the same names across machines, and showed them how fiddly it would be to restore
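The "determine if the disk is available" step in Jeff's scheme can be sketched as follows. The mount points for shared2 and the option file paths are hypothetical (the post only names /u and /shareddata for shared1), and the function only builds the command lines; actually running them is left to the caller.

```python
"""Sketch: back up a shared resource only on the machine that currently mounts it."""
import os

# Hypothetical layout. Each option file is assumed to select the fixed node
# name for that shared resource, so data lands under the same name every time.
SHARED = {
    "shared1": {"mounts": ["/u", "/shareddata"],
                "optfile": "/etc/tsm/dsm_shared1.opt"},
    "shared2": {"mounts": ["/u2"],
                "optfile": "/etc/tsm/dsm_shared2.opt"},
}


def backup_commands(shared=SHARED, is_mounted=os.path.ismount):
    """Return one dsmc command line per shared resource visible on this host."""
    commands = []
    for cfg in shared.values():
        present = [m for m in cfg["mounts"] if is_mounted(m)]
        if present:
            commands.append(
                ["dsmc", "incremental", *present, f"-optfile={cfg['optfile']}"]
            )
    return commands


if __name__ == "__main__":
    for cmd in backup_commands():
        print(" ".join(cmd))
```

Passing the mount check in as a parameter keeps the decision logic testable on a machine that has none of the shared disks attached.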
Re: versioning / expiring / multiple backups under same nodename
Exactly. While I'm not doing TSM on my clusters (yet), Daniel is correct. In my MS-clustered environment, I have the following entries in DNS:

SRV01
SRV02
CLUSTER1
DB2-01

The last 2 are "logical" names of cluster resources. This allows the "resource" to be managed separately from the physical machine and from other "resources".

I would caution you to be careful: as you indicated, the resources DO show up as being owned by the controlling node. I.e., DON'T back it up via that attachment even if you can see it, because when it changes to a new node you will have a different name, as you mentioned.

Please keep us posted, as I'll be headed down this road shortly.

-mike

---
The description you gave tells me there is something wrong with your configuration. Normally when you set up TSM to handle clustering, you have one TSM nodename for each cluster node (M1, M2, M3). These nodenames are only for backing up local files on the node. Then you have either one nodename for each cluster resource, or one nodename for all cluster resources. You also have to bind the nodename to the cluster resource, so that the TSM service that handles the cluster nodename moves with the cluster resource.

-
Michael Yager
IBM Global Services
(919)382-4808
Re: versioning / expiring / multiple backups under same nodename
Hi

The description you gave tells me there is something wrong with your configuration. Normally when you set up TSM to handle clustering, you have one TSM nodename for each cluster node (M1, M2, M3). These nodenames are only for backing up local files on the node. Then you have either one nodename for each cluster resource, or one nodename for all cluster resources. You also have to bind the nodename to the cluster resource, so that the TSM service that handles the cluster nodename moves with the cluster resource. This way, when the resource moves from one node to another, the TSM nodename will follow.

There are some good books about this on Tivoli's website.

Best Regards

Daniel Sparrman
---
Daniel Sparrman
Exist i Stockholm AB
Bergkällavägen 31D
192 79 SOLLENTUNA
Switchboard: 08 - 754 98 00
Mobile: 070 - 399 27 51

"Warren, Matthew James"
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
2002-01-18 16:00
Please respond to "ADSM: Dist Stor Manager"
To: [EMAIL PROTECTED]
Subject: Re: versioning / expiring / multiple backups under same nodename

Thanks,

but the mechanics of the failovers etc. are fine. Only 1 machine will be failed over at any one time.

I'll try and clarify:

M1 and M2 share some common filespace / dirpath names. M3 is the failover machine.

Normal: M1 backs up to TSM under node M1, M2 backs up to TSM under node M2, M3 backs up to TSM under nodename M3.

If M1 fails over to M3, M3 will now capture M1's files from the shared disk under the nodename M3. M1 backs up, but cannot see the shared disk area, so TSM marks all the shared disk files under nodename M1 as inactive.

That goes on for a couple of days. Then M1 fails back to M1. M3 backs up, all M1's shared disk files go inactive under nodename M3, and become active files again on M1 under nodename M1.

..Then(!) M2 fails over to M3. The above process is repeated, but is complicated because M3 shares filespace names with M1, so any duplicate filenames will back up and increase the version count of that file under nodename M3; but the version count will be too high, as it counts versions from both M1 and M2. This will cause the files to expire from M3 earlier than they would have done if they had only ever been backed up under the original machine nodename.

..Does anyone follow this? :-/

Basically (!):

M1 and M2 share dirpaths and filenames. The actual data is unique to each machine and is held on a slice of disk that only that machine has access to.

M3 is a failover. When a machine is failed over to M3, that machine's slice of disk is mounted on M3. The original machine still backs up, but can only see its local O/S disk.

M3 runs backups of all the disk it can see each evening, under the nodename M3.

So, if M1 is failed over, its files are backed up under the nodename M3.

..So far, no problem. If you know what days you were failed over, you can just get the files from the M3 nodename using -pitd / -pitt or -pick.

But M1 fails back to M1, and then M2 fails over to M3.

When M3 backs up, it will see M2's disk and save it under the nodename M3. PROBLEM! The shared filespace names between M2 and M1 will now cause TSM to mark files inactive, or back them up, creating versions / expirations that should not be happening.

Arg!

Can anyone see what I'm getting at?

-----Original Message-----
From: Anderson F. Nobre [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 16, 2002 6:30 PM
To: [EMAIL PROTECTED]
Subject: Re: versioning / expiring / multiple backups under same nodename

Hi,

> I have a customer who wishes to assess th
Re: versioning / expiring / multiple backups under same nodename
Thanks,

but the mechanics of the failovers etc. are fine. Only 1 machine will be failed over at any one time.

I'll try and clarify:

M1 and M2 share some common filespace / dirpath names. M3 is the failover machine.

Normal: M1 backs up to TSM under node M1, M2 backs up to TSM under node M2, M3 backs up to TSM under nodename M3.

If M1 fails over to M3, M3 will now capture M1's files from the shared disk under the nodename M3. M1 backs up, but cannot see the shared disk area, so TSM marks all the shared disk files under nodename M1 as inactive.

That goes on for a couple of days. Then M1 fails back to M1. M3 backs up, all M1's shared disk files go inactive under nodename M3, and become active files again on M1 under nodename M1.

..Then(!) M2 fails over to M3. The above process is repeated, but is complicated because M3 shares filespace names with M1, so any duplicate filenames will back up and increase the version count of that file under nodename M3; but the version count will be too high, as it counts versions from both M1 and M2. This will cause the files to expire from M3 earlier than they would have done if they had only ever been backed up under the original machine nodename.

..Does anyone follow this? :-/

Basically (!):

M1 and M2 share dirpaths and filenames. The actual data is unique to each machine and is held on a slice of disk that only that machine has access to.

M3 is a failover. When a machine is failed over to M3, that machine's slice of disk is mounted on M3. The original machine still backs up, but can only see its local O/S disk.

M3 runs backups of all the disk it can see each evening, under the nodename M3.

So, if M1 is failed over, its files are backed up under the nodename M3.

..So far, no problem. If you know what days you were failed over, you can just get the files from the M3 nodename using -pitd / -pitt or -pick.

But M1 fails back to M1, and then M2 fails over to M3.

When M3 backs up, it will see M2's disk and save it under the nodename M3. PROBLEM! The shared filespace names between M2 and M1 will now cause TSM to mark files inactive, or back them up, creating versions / expirations that should not be happening.

Arg!

Can anyone see what I'm getting at?

-----Original Message-----
From: Anderson F. Nobre [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 16, 2002 6:30 PM
To: [EMAIL PROTECTED]
Subject: Re: versioning / expiring / multiple backups under same nodename

Hi,

> I have a customer who wishes to assess the maximum risk he would incur in the following situation:
>
> We have a copygroup for backup set for 31-day point-in-time recovery. We do not have nolimit for any copygroup parameters - we assume there will only be a single backup each day.
>
> The customer has a 5-node cluster. 1 -> 4 are production machines, 5 is a failover machine.
>
> They would like to know the risk involved when, should a machine be failed over to 5, they back up the data now visible to 5 under the nodename of 5 instead of the original machine's nodename, & the original machine continues to run a backup as well (this would only see local disk, as a portion of the failed-over machine's disk is now visible to 5, and hence mark all the non-visible files as inactive).
>
> We have told them backups would become inconsistent within filespaces that have the same names across machines, and showed them how fiddly it would be to restore a machine if they had only had one failover occur in a 31-day period. They would like to know exactly what the risks are if they have multiple failovers within a month, and have multiple machines backing up same-named files under a single nodename!!

It depends on how the cluster is configured. The TSM client must be part of the resource group, and inside dsm.sys you must create several stanzas with the TCPPort and nodename forced to different numbers and names. And when you start the TSM client scheduler you must force the right dsm.opt with the -optfile option.

> They won't take 'It won't work' as an answer; they would like to know how it will impact the point-in-time restore capability for a particular machine, if they keep track of what machines failed over when.
>
> As far as I can work out with pen & paper, in a worst case, for a 3-machine cluster where 1 & 2 can fail over to 3 at any time, the maximum impact would be to reduce the point-in-time restore capability for a particular machine by the number of days that machines have been failed over to 3 in the last 31-day period, because files with the same path filename on machines 1 and 2 would expire early if they change more often on one machine than they do on another.

It's impossible for two machines to fail over at the same time to a third machine if the first two have the same filesystems.
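The active/inactive flip-flop Matt describes can be reproduced with a toy model. This is a deliberately simplified sketch, not TSM's real inventory logic: it assumes an incremental backup marks every file it sees as active and every previously known file it no longer sees as inactive, and the two file paths are made up for the example.

```python
"""Toy model of active/inactive marking when a shared disk moves between nodes."""


def incremental(inventory, nodename, visible_paths):
    """Files seen become active; known files that are not seen become inactive."""
    files = inventory.setdefault(nodename, {})
    for path in files:
        files[path] = path in visible_paths
    for path in visible_paths:
        files[path] = True


def run(days):
    """Apply one backup per machine per day and snapshot the inventory each day."""
    inventory = {}
    history = []
    for visible_by_machine in days:
        for machine, visible in visible_by_machine.items():
            incremental(inventory, machine, visible)
        history.append({node: dict(files) for node, files in inventory.items()})
    return history


LOCAL, SHARED = "/usr/bin/tool", "/data/app.db"  # hypothetical paths
DAYS = [
    {"M1": {LOCAL, SHARED}, "M3": {LOCAL}},  # normal operation
    {"M1": {LOCAL}, "M3": {LOCAL, SHARED}},  # M1 failed over to M3
    {"M1": {LOCAL, SHARED}, "M3": {LOCAL}},  # M1 failed back
]

if __name__ == "__main__":
    for day, state in enumerate(run(DAYS), start=1):
        print(day, {node: files.get(SHARED) for node, files in state.items()})
```

Even with a single failover and failback, the shared file ends up with history split across two node names, which is the bookkeeping burden the thread is warning about.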
Re: versioning / expiring / multiple backups under same nodename
Hi,

> I have a customer who wishes to assess the maximum risk he would incur in the following situation:
>
> We have a copygroup for backup set for 31-day point-in-time recovery. We do not have nolimit for any copygroup parameters - we assume there will only be a single backup each day.
>
> The customer has a 5-node cluster. 1 -> 4 are production machines, 5 is a failover machine.
>
> They would like to know the risk involved when, should a machine be failed over to 5, they back up the data now visible to 5 under the nodename of 5 instead of the original machine's nodename, & the original machine continues to run a backup as well (this would only see local disk, as a portion of the failed-over machine's disk is now visible to 5, and hence mark all the non-visible files as inactive).
>
> We have told them backups would become inconsistent within filespaces that have the same names across machines, and showed them how fiddly it would be to restore a machine if they had only had one failover occur in a 31-day period. They would like to know exactly what the risks are if they have multiple failovers within a month, and have multiple machines backing up same-named files under a single nodename!!

It depends on how the cluster is configured. The TSM client must be part of the resource group, and inside dsm.sys you must create several stanzas with the TCPPort and nodename forced to different numbers and names. And when you start the TSM client scheduler you must force the right dsm.opt with the -optfile option.

> They won't take 'It won't work' as an answer; they would like to know how it will impact the point-in-time restore capability for a particular machine, if they keep track of what machines failed over when.
>
> As far as I can work out with pen & paper, in a worst case, for a 3-machine cluster where 1 & 2 can fail over to 3 at any time, the maximum impact would be to reduce the point-in-time restore capability for a particular machine by the number of days that machines have been failed over to 3 in the last 31-day period, because files with the same path filename on machines 1 and 2 would expire early if they change more often on one machine than they do on another.

It's impossible for two machines to fail over at the same time to a third machine if the first two have the same filesystems. They would not even be able to import the VGs. You must check whether this information is true.

> I get a headache if I try and extend this to a 5 machine cluster.
>
> do you other TSM'ers agree?

Yes, it would be a little bit hard to manage. But the customer probably has good administrators. Or in the worst case you can sell a support contract to administer his environment. :-)

> and, I know from our perspective this is a 'silly' thing to work out because they should listen to the advice of the people that know & switch to backing things up correctly, but they are insisting they have this info...
>
> Any help is much appreciated!
> Thankyou,
>
> Matt.
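Anderson's point about forcing the right option file can be sketched like this. The resource group names and paths are hypothetical, and the assumption that each option file selects its dsm.sys stanza through a SErvername line is ours rather than something stated in the post; the functions only build text and argument lists.

```python
"""Sketch: one option file and one scheduler per resource group."""
from pathlib import PurePosixPath

# Hypothetical: each resource group keeps its option file on its own shared
# disk, so the file moves together with the resource group.
RESOURCE_OPTFILES = {
    "RG1": PurePosixPath("/shared/rg1/tsm/dsm.opt"),
    "RG2": PurePosixPath("/shared/rg2/tsm/dsm.opt"),
}


def opt_file_text(stanza_name):
    """Option file content that points the client at one dsm.sys stanza."""
    return f"SErvername {stanza_name}\n"


def scheduler_command(optfile):
    """Argument list that starts a scheduler bound to one option file."""
    return ["dsmc", "schedule", f"-optfile={optfile}"]


def commands_for(owned_groups):
    """Schedulers a machine should run for the resource groups it currently owns."""
    return [scheduler_command(RESOURCE_OPTFILES[g]) for g in owned_groups]


if __name__ == "__main__":
    for cmd in commands_for(["RG1", "RG2"]):
        print(" ".join(cmd))
```

A cluster start script for a resource group would run the matching command after the shared disk is mounted, and the stop script would end that scheduler before the disk is released.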
versioning / expiring / multiple backups under same nodename
Hi TSM'ers,

(this one's a little long-winded, sorry :) )

I have a customer who wishes to assess the maximum risk he would incur in the following situation:

We have a copygroup for backup set for 31-day point-in-time recovery. We do not have nolimit for any copygroup parameters - we assume there will only be a single backup each day.

The customer has a 5-node cluster. 1 -> 4 are production machines, 5 is a failover machine.

They would like to know the risk involved when, should a machine be failed over to 5, they back up the data now visible to 5 under the nodename of 5 instead of the original machine's nodename, & the original machine continues to run a backup as well (this would only see local disk, as a portion of the failed-over machine's disk is now visible to 5, and hence mark all the non-visible files as inactive).

We have told them backups would become inconsistent within filespaces that have the same names across machines, and showed them how fiddly it would be to restore a machine if they had only had one failover occur in a 31-day period. They would like to know exactly what the risks are if they have multiple failovers within a month, and have multiple machines backing up same-named files under a single nodename!!

They won't take 'It won't work' as an answer; they would like to know how it will impact the point-in-time restore capability for a particular machine, if they keep track of what machines failed over when.

As far as I can work out with pen & paper, in a worst case, for a 3-machine cluster where 1 & 2 can fail over to 3 at any time, the maximum impact would be to reduce the point-in-time restore capability for a particular machine by the number of days that machines have been failed over to 3 in the last 31-day period, because files with the same path filename on machines 1 and 2 would expire early if they change more often on one machine than they do on another.

I get a headache if I try and extend this to a 5-machine cluster.

Do you other TSM'ers agree?

And, I know from our perspective this is a 'silly' thing to work out because they should listen to the advice of the people that know & switch to backing things up correctly, but they are insisting they have this info...

Any help is much appreciated!

Thank you,

Matt.
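The pen-and-paper estimate above can be illustrated with a small worst-case model. It is a sketch under loud assumptions: the same-named file changes every day, the failover node keeps only a fixed number of versions per path (the cap of 5 is chosen to keep the example short, not taken from the copygroup), and one backup runs per day.

```python
"""Toy worst case: versions of a same-named file pooled under one node name."""
from collections import deque


def surviving_versions(daily_owner, version_cap):
    """Return the (day, owner) versions still kept under the failover node.

    daily_owner lists, per day, whose disk the failover node could see
    (None when nothing was failed over that day).
    """
    kept = deque(maxlen=version_cap)  # oldest version drops out when full
    for day, owner in enumerate(daily_owner):
        if owner is not None:
            kept.append((day, owner))
    return list(kept)


def restorable_days(daily_owner, version_cap, machine):
    """Days for which the given machine's copy of the file is still kept."""
    return [d for d, o in surviving_versions(daily_owner, version_cap) if o == machine]


if __name__ == "__main__":
    # M1 is failed over for 3 days, then M2 for 4 days.
    timeline = ["M1", "M1", "M1", "M2", "M2", "M2", "M2"]
    print(restorable_days(timeline, 5, "M1"))
```

In this model every day that M2 spends failed over pushes one of M1's versions out of the pool, which matches the intuition that the restore window shrinks by the number of failed-over days.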