Re: [Veritas-bu] Number of retries query
Thanks guys, there are some useful options there. To give more info, we run the RMAN job as follows:

- We have an Oracle admin station which holds various Oracle DBA scripts.
- We have a policy which controls the scheduling and kicks off a client backup of this Oracle management station, backing up a single file to a disk storage unit on the master (a simple directory). We back up one file to avoid any status 71.
- The reason the policy and schedule exist is to run a bpstart script on the management station.
- This bpstart script checks that no Oracle tape DBA script is already running (if one is, it exits non-zero, which gives status 73 in NetBackup).
- Once the checks pass, it launches an Oracle DBA script (not maintained by me).
- This Oracle script talks to 3 Oracle RAC servers and works out which one is running the particular DB instance.
- These Oracle RAC servers are all NetBackup media servers, and they then initiate the Oracle backup through a NetBackup Oracle agent on the relevant media server. This backs up using the application schedules on the master server for the policy associated with each media server. (Sorry, that sounds confusing.)
- If the Oracle script fails and exits non-zero, then in turn our bpstart script fails with status 73 and we can alert the DBAs.

We want to launch via NetBackup this way so we can trap the exit status and report to the DBAs that there has been a problem, and so that it appears on a daily report.

The case we have experienced is that when a backup fails, which could be due to no tapes or various Oracle failures, the DBAs don't want an automatic retry, as it starts doing things with flash recovery areas and runs into the normal working day.

Indeed, perhaps some logic in the bpstart script to create a lock file would be useful, but the lock file would need to be removed upon completion or failure, and this would then give us no benefit when try 2 happens.
If a lock file were used, we could do some date matching and perhaps only run a job if the lock file was older than x hours (a lot of date parsing though, which could be difficult), then touch it again and run the backup. I'll have to explore this method.

Cheers
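A possible shape for the lock-file age check described above, as a sketch only. It assumes a find that supports -mmin (GNU and BSD find do; it is not in POSIX), and the function names and the 720-minute window in the usage line are made up for illustration. The lock file is deliberately never removed, so a second try inside the window sees a fresh file and refuses to run, which avoids any date parsing.

```shell
#!/bin/sh
# Sketch only: function names and the time window are assumptions,
# not part of NetBackup or of the script discussed in this thread.

# lock_is_fresh FILE MINUTES
# Returns 0 if FILE exists and was modified less than MINUTES ago.
lock_is_fresh() {
    [ -f "$1" ] || return 1
    # find prints the file name only when its mtime is inside the window
    [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}

# guard_backup FILE MINUTES
# Returns 1 if a run was already attempted inside the window; otherwise
# stamps the lock file and returns 0 so the caller can start the backup.
guard_backup() {
    if lock_is_fresh "$1" "$2"; then
        echo "ERROR: backup already attempted in the last $2 minutes" >&2
        return 1
    fi
    touch "$1"
}
```

In a bpstart script this might be called as `guard_backup /var/tmp/rman_backup.lock 720 || exit 1`, where the path is hypothetical.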
Re: [Veritas-bu] Number of retries query
And if it's of use to anyone else, here is what I shall implement:

    Prev_Job=`bperror -backstat -client <oracle mgmt client> -hoursago 12 | awk '$14 == "<policy>" { print "Client [" $12 "], STATUS [" $19 "]" }'`
    if [ -n "$Prev_Job" ]; then
        echo "ERROR: A previous job has run in the past [$hours] hours" >> "$log"
        echo "$Prev_Job" >> "$log"
        exit 1
    fi

ken_zuf...@goodyear.com wrote:

Wow, glad I don't have your job... that's pretty convoluted :P

But I may have an answer, building off the lock file idea, but much simpler. Just put logic in the bpstart script to check whether the policy you're executing has run in the past X hours and failed. If it has, exit gracefully; if it hasn't, continue the backup. Quick and dirty logic:

    bperror -backstat -hoursago [hours] -l | awk '{print $19,$14}' | grep -v ^0 | grep [policy_name]

In the above, $19 = backup status code and $14 = policy name. Strip out any successful backups and grep for the policy name; if the result is not null, you've had a failure in the past X hours.

Of course, there are different ways to parse the bperror output, but the above would work. In fact, you shouldn't even have to grep out successes, because the process shouldn't be trying to submit the policy if it has run successfully.
Re: [Veritas-bu] Number of retries query
Sorry, correction:

    hours=12
    Prev_Job=`bperror -backstat -client <oracle mgmt client> -hoursago $hours | awk '$14 == "<policy>" { print "Client [" $12 "], STATUS [" $19 "]" }'`
    if [ -n "$Prev_Job" ]; then
        echo "ERROR: A previous job has run in the past [$hours] hours" >> "$log"
        echo "$Prev_Job" >> "$log"
        exit 1
    fi
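The check discussed in this thread can also be written as a small filter that reads from standard input, which makes it possible to try out without a master server. This is a sketch: it takes the field positions quoted in the thread on trust ($12 client, $14 policy, $19 status in bperror -l output), and recent_failures is a hypothetical name. It reports only non-zero statuses, along the lines Ken suggested.

```shell
#!/bin/sh
# Sketch only: relies on the field positions quoted in the thread
# ($12 = client, $14 = policy, $19 = status); not verified here.

# recent_failures POLICY
# Reads `bperror -backstat ... -l` style lines on stdin and prints
# "client status" for every line that matches POLICY with a non-zero status.
recent_failures() {
    awk -v policy="$1" '$14 == policy && $19 != 0 { print $12, $19 }'
}
```

A bpstart script might then do something like `bperror -backstat -hoursago 12 -l | recent_failures [policy_name]` and exit non-zero if the output is not empty.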
[Veritas-bu] Number of retries query
Guys, does anyone know if you can change the number of job retries within a given time period on a per-client basis?

I currently have the global setting at 2 tries per 12 hours, which is fine for our needs and good in that it will retry a failed backup. However, the DBA for an RMAN/Oracle policy doesn't want a backup re-run after a failure, so I need to find a way of setting it to 1 try for just one client.

Any ideas?

Cheers

___
Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
Re: [Veritas-bu] Number of retries query
Good Morning Dave,

I know of no way to change the number of job retries on a policy, client or schedule object. I can see where this would be a nice feature to have.

There are many different reasons that an RMAN backup job can fail. On the NetBackup end of things one could have a 96 error (no scratch tapes), a media fault, a network issue, etc. Or it could be an Oracle issue. For something like a media issue that is cleared up on the NetBackup end, I would think that the DBAs would want the backup to be retried. For an Oracle issue I do not know enough.

But either way I believe that you could add the control you require into the script that NetBackup runs on the client to run the RMAN commands. Might not be easy. I am sure others who know Oracle can give you a better answer than this, and I look forward to learning.

As a simple case of go or no-go, without any variance based on the prior failure, you could try the following. At the beginning of the script, set a state value of STARTED in a file on the client. At the end of the script, change the value to COMPLETE. At the start of the script, if the value is not COMPLETE, the script gives an error return and exits. Someone would then have to change the state value back to COMPLETE to enable the script to run again. This could be done after clearing the problem.

This can also be used to bypass the backup at the script level when the Oracle DBAs are doing maintenance work on the database. If you check for a state value of BYPASS, the script could exit with a normal return code; NetBackup would not have a backup, but would think that everything is OK and not retry.

You could also use touch files instead of one state file.

Let us know what you end up doing to solve this issue.

len
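The state-file scheme described above could be sketched roughly as follows. The function names are hypothetical, and treating a missing state file as "go ahead" is an assumption made here so that the very first run works; the thread does not say how that case should be handled.

```shell
#!/bin/sh
# Sketch only: names and the handling of a missing state file are assumptions.

# check_state FILE
#   returns 0 = last run COMPLETE (or no state file yet), go ahead
#   returns 2 = BYPASS requested, skip the backup but report success
#   returns 1 = anything else (e.g. STARTED left by a failed run)
check_state() {
    [ -f "$1" ] || return 0
    case "$(cat "$1")" in
        COMPLETE) return 0 ;;
        BYPASS)   return 2 ;;
        *)        return 1 ;;
    esac
}

# run_guarded FILE COMMAND [ARGS...]
# Runs COMMAND only if the state allows it, recording STARTED before
# and COMPLETE after a successful run.
run_guarded() {
    state_file=$1; shift
    rc=0
    check_state "$state_file" || rc=$?
    case $rc in
        2) return 0 ;;   # bypass: normal return code, nothing is run
        1) echo "ERROR: previous run did not complete; reset $state_file to COMPLETE" >&2
           return 1 ;;
    esac
    echo STARTED > "$state_file"
    "$@" || return $?    # on failure the file stays at STARTED
    echo COMPLETE > "$state_file"
}
```

After a failure the file is left at STARTED, so a retry is refused until someone sets it back to COMPLETE by hand.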
Re: [Veritas-bu] Number of retries query
Dave,

This isn't an ideal fix, but it will work: schedule the backups from the client. Basically, just put entries in cron (root or oracle will work) with the commands (or a script wrapper around the command) to launch the backup instead of using the NBU scheduler. You will have to remove the current full/incremental schedules and replace them with a user-directed schedule that has the appropriate windows.

The reason this will work is that the automatic retries only affect backups launched from the master. If a backup is submitted by the client, it will not be retried on failure.

The only real issues off the top of my head are:

1) If the client is down or doesn't have network connectivity, you won't see a failure to run the backup in NBU, because the backup will never be submitted.
2) You lose visibility of backup schedules within NBU.

Ken Zufall
Technical Analyst
D660C
The Goodyear Tire & Rubber Company
GTN 446.0592 or 330.796.0592
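For the cron approach, the "script wrapper around the command" could look something like the sketch below. It is only an illustration: run_once is a hypothetical name, the crontab path and schedule in the comment are invented, and the actual backup command is left to the caller. It runs the command exactly once and logs the exit status, which goes some way towards the lost visibility mentioned above.

```shell
#!/bin/sh
# Sketch only: run_once and the crontab entry below are hypothetical.

# run_once LOGFILE COMMAND [ARGS...]
# Runs COMMAND a single time, appends the outcome to LOGFILE and
# returns COMMAND's exit status. No retry is attempted on failure.
run_once() {
    log=$1; shift
    rc=0
    "$@" || rc=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') exit=$rc cmd=$*" >> "$log"
    return "$rc"
}

# Example crontab entry (path and schedule are made up):
# 0 22 * * * /usr/local/bin/rman_backup_wrapper.sh
```

The log file gives the DBAs something to alert on when the job fails, since NBU itself will not show a backup that was never submitted.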