Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
And you see no errors in acsss_event.log on the ACSLS server when the drives go down?

Hampus Lind
Rikspolisstyrelsen
National Police Board
Tel dir: +46 (0)8 - 401 99 43
Tel mob: +46 (0)70 - 217 92 66
E-mail: [EMAIL PROTECTED]

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz
Sent: 8 December 2006 21:05
To: Hall, Christian N.
Cc: Mike Dunn (veritas-bu); veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.

Yes, everything matches perfectly. Remember, if I run the backups slowly, one at a time, I can see each of the 4 drives on each media server being used. When I run a burst of jobs, though, 29-30 of them work (1 tape per drive) and a RANDOM 2-3 drives do not work (it differs each time I do it). Currently I am not using MPX, so I can easily test, i.e. 1 job = 1 tape drive.

Justin.
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
Are you using UDP for communication with your ACS server? (UDP is the default.) If so, try switching to TCP.

Cheers
Mike

On 2:16:50 pm 2006-12-08 Justin Piszcz <[EMAIL PROTECTED]> wrote:
> Nope, only 1 NIC. And even so, yeah, I do specify that in the vm.conf
> just in case.
>
> Justin.

___
Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
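Mike's suggestion concerns the transport the NetBackup ACS daemon uses to reach ACSLS. As far as I know, vm.conf selects it with a bare ACS_TCP_RPCSERVICE or ACS_UDP_RPCSERVICE line; treat both entry names as assumptions to confirm against the vm.conf reference for your NetBackup release. The minimal sketch below only reads vm.conf text you pass in and reports which transport the entries request, treating UDP as the default (per Mike) and flagging the case where both entries are present rather than guessing how NetBackup resolves it. The host name in the example is hypothetical.

```python
# Sketch only. ASSUMPTION: vm.conf selects the acsd-to-ACSLS transport with a
# bare ACS_TCP_RPCSERVICE or ACS_UDP_RPCSERVICE line; confirm both names
# against the vm.conf reference for your NetBackup release.
from typing import Set


def vm_conf_keywords(text: str) -> Set[str]:
    """Return the keyword (first token) of every non-blank, non-comment line."""
    keywords: Set[str] = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Entries look like "KEYWORD" or "KEYWORD = value"; keep only KEYWORD.
        head = line.split("=", 1)[0].split()
        if head:
            keywords.add(head[0])
    return keywords


def acs_transport(text: str) -> str:
    """Report which transport the entries request, treating UDP as the default."""
    keys = vm_conf_keywords(text)
    tcp = "ACS_TCP_RPCSERVICE" in keys
    udp = "ACS_UDP_RPCSERVICE" in keys
    if tcp and udp:
        return "ambiguous: both entries present"
    if tcp:
        return "TCP"
    return "UDP"  # explicit UDP entry, or no entry at all (Mike: UDP is the default)


if __name__ == "__main__":
    # Hypothetical vm.conf contents; mediaserver01 is an invented host name.
    sample = "ACS_SSI_HOSTNAME = mediaserver01\nACS_TCP_RPCSERVICE\n"
    print(acs_transport(sample))  # TCP
```

The sketch never edits vm.conf or contacts ACSLS; it is a read-only sanity check to run on each media server's copy of the file.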
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
Nope, only 1 NIC. And even so, yeah, I do specify that in the vm.conf just in case.

Justin.

On Fri, 8 Dec 2006, Mike Dunn (veritas-bu) wrote:

> Hmmm, do your media servers have multiple NICs, and are you using IP
> multipathing software (like in.mpathd under Solaris)? If so, then make
> sure that you have set ACS_SSI_HOSTNAME appropriately in your vm.conf
> file. The ACS daemon (acsd) inserts the value (or inferred value) of
> ACS_SSI_HOSTNAME into all communications with the ACS server. Also, if
> you are using ACLs on the ACS server, make sure that they match the
> name/IP used in ACS_SSI_HOSTNAME.
>
> Cheers
> Mike
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
I believe I am. I will try this next week, thanks!

Justin.

On Fri, 8 Dec 2006, Mike Dunn (veritas-bu) wrote:

> Are you using UDP for communication with your ACS server? (UDP is the
> default.) If so, try switching to TCP.
>
> Cheers
> Mike
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
Hmmm, do your media servers have multiple NICs, and are you using IP multipathing software (like in.mpathd under Solaris)? If so, then make sure that you have set ACS_SSI_HOSTNAME appropriately in your vm.conf file. The ACS daemon (acsd) inserts the value (or inferred value) of ACS_SSI_HOSTNAME into all communications with the ACS server. Also, if you are using ACLs on the ACS server, make sure that they match the name/IP used in ACS_SSI_HOSTNAME.

Cheers
Mike

On 1:43:52 pm 2006-12-08 Justin Piszcz <[EMAIL PROTECTED]> wrote:
> It is 100% correct. Yep. I ran about 5 test backups to each drive
> in the robot. No problems. It is only when there is a burst of jobs.
>
> Justin.
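Mike's point is that the name acsd presents must be the one ACSLS expects, including in any access-control entries. A minimal sketch of that comparison, assuming vm.conf lines of the form `ACS_SSI_HOSTNAME = name` and that you supply the name the ACSLS ACLs use. When no entry is present, the sketch substitutes the local host name for the "inferred value" Mike mentions; that substitution is an assumption, and the real inference may differ. The host names are hypothetical.

```python
# Sketch only: compare the ACS_SSI_HOSTNAME in vm.conf with the name the
# ACSLS access controls expect. ASSUMPTION: entries have the form
# "ACS_SSI_HOSTNAME = name"; when none is present, the local host name is used
# as a stand-in for the value acsd would infer, which may not match exactly.
import socket
from typing import List, Tuple


def ssi_hostnames(vm_conf_text: str) -> List[str]:
    """Return every ACS_SSI_HOSTNAME value found in vm.conf text."""
    values = []
    for raw in vm_conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "ACS_SSI_HOSTNAME":
            values.append(value.strip())
    return values


def check_ssi_hostname(vm_conf_text: str, acl_name: str) -> Tuple[bool, str]:
    """Return (matches, explanation) for the name acsd would present to ACSLS."""
    values = ssi_hostnames(vm_conf_text)
    if len(set(values)) > 1:
        return False, f"conflicting ACS_SSI_HOSTNAME entries: {values}"
    if values:
        name, source = values[0], "vm.conf"
    else:
        name, source = socket.gethostname(), "local host name (assumed)"
    # Host names compare case-insensitively; names are not resolved to IPs here.
    ok = name.lower() == acl_name.lower()
    return ok, f"acsd name {name!r} from {source}; ACSLS expects {acl_name!r}"


if __name__ == "__main__":
    # Hypothetical host names for illustration.
    conf = "ACS_SSI_HOSTNAME = mediaserver01-bk\n"
    print(check_ssi_hostname(conf, "mediaserver01"))
```

Because Mike mentions "name/IP", a mismatch here is only a prompt to look closer: an ACL keyed by IP address would need a name lookup that this sketch deliberately does not perform.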
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
Justin,

Do the ACS,LSM,PANEL,DRIVE numbers in ACSLS match the serial numbers reported by tpautoconf -t for /dev/rmt/*cbn on the master server? Can you please post the output? Did you perform this test from your master server, or from each host that is a media server? After you attempt your multiplexed backups, do you have stuck tapes?

Chris

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz
Sent: Friday, December 08, 2006 2:44 PM
To: Mike Dunn (veritas-bu)
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.

It is 100% correct. Yep. I ran about 5 test backups to each drive in the robot. No problems. It is only when there is a burst of jobs.

Justin.
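Chris's check amounts to a join on drive serial numbers: the /dev/rmt path NetBackup uses for an ACS drive should report the same serial number as the drive ACSLS lists at the configured ACS,LSM,PANEL,DRIVE address. A minimal sketch, assuming you have already transcribed three tables by hand (serials from tpautoconf -t, serials from ACSLS, and the path-to-address mapping from the NetBackup drive configuration); no command output is parsed, and all paths, addresses, and serials below are invented.

```python
from typing import Dict, List, Tuple

AcsAddr = Tuple[int, int, int, int]  # (ACS, LSM, PANEL, DRIVE)


def find_mismatches(
    path_serial: Dict[str, str],      # /dev/rmt path -> serial seen by the host (e.g. tpautoconf -t)
    acs_serial: Dict[AcsAddr, str],   # ACS address -> serial reported by ACSLS
    configured: Dict[str, AcsAddr],   # /dev/rmt path -> ACS address configured in NetBackup
) -> List[str]:
    """List configured drives whose host-side and library-side serials disagree."""
    problems = []
    for path, addr in sorted(configured.items()):
        os_serial = path_serial.get(path)
        lib_serial = acs_serial.get(addr)
        if os_serial is None or lib_serial is None:
            problems.append(f"{path} -> {addr}: serial missing on one side")
        elif os_serial != lib_serial:
            problems.append(
                f"{path} -> {addr}: host sees {os_serial}, ACSLS has {lib_serial}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical data: two drives whose ACS addresses were swapped in NetBackup.
    path_serial = {"/dev/rmt/0cbn": "SN-A", "/dev/rmt/1cbn": "SN-B"}
    acs_serial = {(0, 1, 1, 0): "SN-A", (0, 1, 1, 1): "SN-B"}
    configured = {"/dev/rmt/0cbn": (0, 1, 1, 1), "/dev/rmt/1cbn": (0, 1, 1, 0)}
    for line in find_mismatches(path_serial, acs_serial, configured):
        print(line)
```

Since /dev/rmt paths are local to each host, the comparison would be repeated per media server, which is what Chris's question about master versus media servers is getting at.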
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
Yes, everything matches perfectly. Remember, if I run the backups slowly, one at a time, I can see each of the 4 drives on each media server being used. When I run a burst of jobs, though, 29-30 of them work (1 tape per drive) and a RANDOM 2-3 drives do not work (it differs each time I do it). Currently I am not using MPX, so I can easily test, i.e. 1 job = 1 tape drive.

Justin.

On Fri, 8 Dec 2006, Hall, Christian N. wrote:

> Justin,
>
> Do the ACS,LSM,PANEL,DRIVE numbers in ACSLS match the serial numbers
> reported by tpautoconf -t for /dev/rmt/*cbn on the master server?
> Can you please post the output? Did you perform this test from your
> master server, or from each host that is a media server? After you
> attempt your multiplexed backups, do you have stuck tapes?
>
> Chris
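Mike suggested tracking down which drives fail to mount and looking for a correlation. With one job per drive and no multiplexing, as Justin describes, each burst yields a clean set of failed drives, and tallying those sets across runs makes the pattern explicit. A minimal sketch; the drive labels are hypothetical. A drive that fails in every run would point toward a per-drive cause such as mapping, while failures landing on different drives each time, as Justin reports, would suggest looking at something all drives share instead.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def failure_profile(runs: Iterable[Set[str]]) -> Dict[str, object]:
    """Tally which drives failed to mount across several burst tests."""
    run_list: List[Set[str]] = list(runs)
    counts = Counter(drive for failed in run_list for drive in failed)
    # Drives that failed in every single run are the strongest per-drive suspects.
    every_run = sorted(d for d, n in counts.items() if n == len(run_list))
    return {
        "runs": len(run_list),
        "failures_per_drive": dict(sorted(counts.items())),
        "failed_every_run": every_run,
        "distinct_drives_failed": len(counts),
    }


if __name__ == "__main__":
    # Hypothetical results from three bursts (drive labels invented).
    runs = [
        {"ms3-drv2", "ms7-drv1"},
        {"ms1-drv4", "ms5-drv3", "ms6-drv1"},
        {"ms2-drv2", "ms8-drv4"},
    ]
    print(failure_profile(runs))
```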
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
It is 100% correct. Yep. I ran about 5 test backups to each drive in the robot. No problems. It is only when there is a burst of jobs.

Justin.

On Fri, 8 Dec 2006, Mike Dunn (veritas-bu) wrote:

> Justin,
>
> Are you absolutely certain that you have your drive mapping done properly?
> The fact that the job fails 30 minutes after the initial mount attempt
> makes it sound like you are failing with a media mount timeout. The most
> common cause (especially with ACS environments) is a simple mismatch
> between the /dev/rmt path and your ACS path (i.e. ACS,LSM,PANEL,DRIVE).
> The SL8500 is also very difficult to address properly, since the ACS path
> has little correlation with the physical location of the drive.
>
> Probably the quickest test you can perform is to verify that your jobs are
> being affected by the media mount timeout. If you shorten the media mount
> timeout parameter to, say, 10 minutes, your jobs should fail 10 minutes
> after they start if the mount timeout is what fails the jobs.
>
> You should also track down which drives are failing to mount, and see if
> there is a correlation.
>
> Cheers
> Mike
>
> > Date: Fri, 8 Dec 2006 11:08:39 -0500 (EST)
> > From: Justin Piszcz <[EMAIL PROTECTED]>
> > Subject: [Veritas-bu] Question posed to ACSLS/STK8500 users.
> > To: veritas-bu@mailman.eng.auburn.edu
> >
> > All,
> >
> > My group is setting up two Sun/StorageTek SL8500s. Sun did the
> > install of ACSLS; there were no problems on their side. Each SL8500
> > is in its own environment. On each SL8500, we have 8 media servers,
> > connected to four drives each, giving us a total of 32 drives. For
> > testing, I did the following: I ran a NON-MULTIPLEXED backup to each
> > drive, to ensure each drive worked properly. To do this I kicked off
> > four jobs in succession. When I do this, I utilize all 4 drives. I
> > did this with each media server without a single problem. However,
> > when testing everything together, all 32 drives, I kick off 45 jobs,
> > for example. It says there are 32 active jobs in NetBackup, which is
> > correct. The problem is, randomly, 2 or 3 jobs will hang at
> > "Mounting MediaID.." and then the drive will go down after 30
> > minutes. Why is this? With an L700, I can send 500-1000 jobs to all
> > of the drives in it and there is never a mounting problem. There is
> > nothing wrong with any of the drives; they are brand new. I can use
> > ACSLS to dismount the media from the drives and then re-run my
> > earlier test backups, one at a time, to each of the four drives per
> > media server without any issues. It is only when the robot receives
> > a 'burst' of jobs that this happens.
> >
> > Has anyone experienced anything like this before?
> >
> > Thanks for any help and responses,
> >
> > Justin.
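Mike's timeout test is a timing comparison: if the media mount timeout is what ends the jobs, each hung job should fail roughly one timeout interval after its mount attempt began, and that interval should follow the setting (30 minutes originally, 10 minutes in his suggested test). A minimal sketch, assuming you note the mount-request and failure times of each hung job yourself; the job IDs, times, and two-minute tolerance are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Job = Tuple[str, datetime, datetime]  # (job id, mount requested, job failed)


def timed_out_jobs(jobs: List[Job], timeout: timedelta,
                   tolerance: timedelta = timedelta(minutes=2)) -> List[str]:
    """Return IDs of jobs that failed about one mount timeout after the request."""
    hits = []
    for job_id, requested, failed in jobs:
        elapsed = failed - requested
        if abs(elapsed - timeout) <= tolerance:
            hits.append(job_id)
    return hits


if __name__ == "__main__":
    t0 = datetime(2006, 12, 8, 12, 0)
    jobs = [
        ("1001", t0, t0 + timedelta(minutes=10, seconds=20)),
        ("1002", t0, t0 + timedelta(minutes=4)),
    ]
    print(timed_out_jobs(jobs, timeout=timedelta(minutes=10)))  # ['1001']
```

Running it against one burst at the original 30-minute setting and one at the shortened setting shows whether the failures track the parameter; if they do, the timeout is only the messenger, and the open question is why those particular mounts never complete.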
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
We have it set to 200 seconds. But sometimes we have issues with NetBackup releasing tapes after backups: the tape stays in the drive for 30-40 minutes, even when we have jobs in the queue. We have logged cases about this with both Symantec and STK/Sun, but no one can tell us what's wrong.

We had similar problems a year or so ago. Back then the SL8500 was fairly new here, and after some firmware upgrades and some new elevators and arms were put in place, the problem disappeared.

I would also recommend what Mike recommended. In the beginning it was hard keeping track of all the device names and which drive each one was connected to. Also check the acsss_event.log to see if there are any errors mounting/unmounting tapes or sending them away with the elevator.

Hampus Lind
Rikspolisstyrelsen
National Police Board
Tel. (direct): +46 (0)8 - 401 99 43
Tel. (mobile): +46 (0)70 - 217 92 66
E-mail: [EMAIL PROTECTED]

-Original Message-
From: Justin Piszcz [mailto:[EMAIL PROTECTED]
Sent: 8 December 2006 18:15
To: Hampus Lind
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: RE: [Veritas-bu] Question posed to ACSLS/STK8500 users.

Yeah, it's either 120 or 180 seconds, I believe. What is yours set to?

Justin.

On Fri, 8 Dec 2006, Hampus Lind wrote:
> Hi,
>
> Do you have unmount delay set?
> [...]
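A minimal Python sketch of the serial-number cross-check discussed in this thread, i.e. confirming that each /dev/rmt path really sits at the ACS,LSM,PANEL,DRIVE coordinate it is configured for. It assumes you have already collected three tables by whatever means you normally use: the serial number of each drive as reported by the library, the serial number behind each device path on a media server, and the device-path-to-ACS mapping configured in NetBackup. The function name, the dictionary layout, and the example serials and paths are all hypothetical; nothing here parses real ACSLS or NetBackup output.

```python
from typing import Dict, List, Tuple

# (ACS, LSM, PANEL, DRIVE) coordinate as ACSLS addresses a drive.
AcsCoord = Tuple[int, int, int, int]


def find_mapping_mismatches(
    acsls_serials: Dict[str, AcsCoord],
    host_serials: Dict[str, str],
    configured: Dict[str, AcsCoord],
) -> List[str]:
    """Join library and host reports on drive serial number and flag any
    device path whose configured ACS coordinate disagrees with the hardware.

    acsls_serials: serial number -> ACS coordinate (as the library reports it)
    host_serials:  serial number -> device path (as the media server sees it)
    configured:    device path -> ACS coordinate (as configured for backups)
    """
    problems: List[str] = []
    for serial, path in sorted(host_serials.items()):
        actual = acsls_serials.get(serial)
        if actual is None:
            problems.append(f"{path}: serial {serial} not reported by the library")
            continue
        expected = configured.get(path)
        if expected is None:
            problems.append(f"{path}: no ACS coordinate configured")
        elif expected != actual:
            problems.append(
                f"{path}: configured as {expected}, but serial {serial} is at {actual}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical data: two drives whose device paths were swapped in the config.
    acsls = {"SN001": (0, 0, 1, 0), "SN002": (0, 0, 1, 1)}
    host = {"SN001": "/dev/rmt/0cbn", "SN002": "/dev/rmt/1cbn"}
    config = {"/dev/rmt/0cbn": (0, 0, 1, 1), "/dev/rmt/1cbn": (0, 0, 1, 0)}
    for line in find_mapping_mismatches(acsls, host, config):
        print(line)
```

Joining on the serial number rather than on the device path is the point of the exercise: the path and the ACS coordinate are the two things that can silently disagree, while the serial number is the one identifier both sides report about the same physical drive.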
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
Justin,

Are you absolutely certain that you have your drive mapping done properly? The fact that the job fails 30 minutes after the initial mount attempt makes it sound like you are failing with a media mount timeout. The most common cause (especially in ACS environments) is a simple mismatch between the /dev/rmt path and your ACS path (i.e. ACS,LSM,PANEL,DRIVE). The SL8500 is also very difficult to address properly, since the ACS path has little correlation with the physical location of the drive.

Probably the quickest test you can perform is to verify that your jobs are being affected by the media mount timeout. If you shorten the media mount timeout parameter to, say, 10 minutes, your jobs should fail 10 minutes after they start if the mount timeout is what is failing the jobs.

You should also track down which drives are failing to mount, and see if there is a correlation.

Cheers
Mike

> Message: 7
> Date: Fri, 8 Dec 2006 11:08:39 -0500 (EST)
> From: Justin Piszcz <[EMAIL PROTECTED]>
> Subject: [Veritas-bu] Question posed to ACSLS/STK8500 users.
> [...]
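Mike's two checks (do the failures line up with the mount timeout, and do they cluster on particular drives?) can be tallied with a short Python sketch. It assumes you have noted, for each hung job, which drive it was waiting on and how many seconds passed between the first mount attempt and the failure. The FailedMount record, the drive names, and the 60-second tolerance are hypothetical choices for illustration, not NetBackup fields or defaults.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FailedMount:
    """One hung job, recorded by hand (hypothetical record, not a NetBackup type)."""
    drive: str            # drive the job was waiting on
    seconds_to_fail: int  # seconds from the first mount attempt to the failure


def diagnose(
    failures: List[FailedMount], mount_timeout_s: int, tolerance_s: int = 60
) -> Tuple[bool, List[Tuple[str, int]]]:
    """Return (all_hit_timeout, failures_per_drive).

    all_hit_timeout is True only if there is at least one failure and every
    failure happened within tolerance_s of the configured mount timeout.
    failures_per_drive lists (drive, count), most frequent first.
    """
    all_hit_timeout = bool(failures) and all(
        abs(f.seconds_to_fail - mount_timeout_s) <= tolerance_s for f in failures
    )
    failures_per_drive = Counter(f.drive for f in failures).most_common()
    return all_hit_timeout, failures_per_drive


if __name__ == "__main__":
    # Hypothetical burst-test results after lowering the mount timeout to 10 minutes.
    runs = [
        FailedMount("DRIVE_07", 605),
        FailedMount("DRIVE_21", 598),
        FailedMount("DRIVE_07", 610),
    ]
    print(diagnose(runs, mount_timeout_s=600))
    # (True, [('DRIVE_07', 2), ('DRIVE_21', 1)])
```

If the first value comes back True after shortening the timeout, that supports Mike's reading that the mount timeout is what ends the jobs. Running the tally over several bursts shows whether the same drives keep recurring or whether, as reported elsewhere in this thread, the affected drives differ each time.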
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
> Yeah, it's either 120 or 180 seconds, I believe. What is yours set to?

Ours is set to 10 minutes, and we never have a problem with the same model of library.

On Fri, 8 Dec 2006, Hampus Lind wrote:
> Hi,
>
> Do you have unmount delay set? It takes some time for tapes to travel
> within the SL8500, especially if they need to take the elevator.
>
> Do you have the latest microcode/firmware on the drives and the library?

I want to echo that. They've released firmware rather recently that *greatly* improves traffic routing within the library. Also make sure you're running the latest and greatest ACSLS patches; the two go hand in hand to improve operation times. As an example of the speed-up, we patched in mid-September, and our daily vault ejects started taking half as long as they had before, for the same number of tapes.

Cheers,
jf
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
Yeah, it's either 120 or 180 seconds, I believe. What is yours set to?

Justin.

On Fri, 8 Dec 2006, Hampus Lind wrote:
> Hi,
>
> Do you have unmount delay set? It takes some time for tapes to travel within
> the SL8500, especially if they need to take the elevator.
>
> Do you have the latest microcode/firmware on the drives and the library?
>
> Hampus Lind
> [...]
Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
Hi,

Do you have unmount delay set? It takes some time for tapes to travel within the SL8500, especially if they need to take the elevator.

Do you have the latest microcode/firmware on the drives and the library?

Hampus Lind
Rikspolisstyrelsen
National Police Board

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz
Sent: 8 December 2006 17:09
To: veritas-bu@mailman.eng.auburn.edu
Subject: [Veritas-bu] Question posed to ACSLS/STK8500 users.

[...]
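To see why a burst of mounts can behave differently from one-at-a-time tests, here is an idealised Python estimate: N mount requests arrive together, k robots serve them first-come-first-served in parallel, and every mount (including any elevator trip) takes the same t seconds. The function names, the robot count, and the per-mount time are hypothetical inputs you would need to measure for your own library; real SL8500 scheduling, retries, and elevator contention are not modelled, so treat the result as a rough, optimistic estimate rather than a prediction.

```python
import math
from typing import List


def burst_mount_waits(n_requests: int, robots: int, seconds_per_mount: float) -> List[float]:
    """Seconds after the burst starts at which each request's mount completes.

    Idealised model (an assumption, not SL8500 behaviour): FIFO service,
    `robots` robots working in parallel, identical time for every mount.
    """
    if robots < 1:
        raise ValueError("robots must be at least 1")
    # Request i (0-based) is handled in wave i // robots; each wave takes one mount time.
    return [(i // robots + 1) * seconds_per_mount for i in range(n_requests)]


def worst_case_wait(n_requests: int, robots: int, seconds_per_mount: float) -> float:
    """Completion time of the last request in the burst under the same model."""
    if robots < 1:
        raise ValueError("robots must be at least 1")
    return math.ceil(n_requests / robots) * seconds_per_mount


if __name__ == "__main__":
    # Hypothetical figures: 32 simultaneous mounts, 4 robots, 90 s per mount.
    print(worst_case_wait(32, 4, 90.0))  # 8 waves * 90 s = 720.0
```

Comparing worst_case_wait with the configured mount timeout gives a rough sense of whether library travel alone could plausibly account for the hangs, or whether something else raised in this thread (drive mapping, firmware, ACSLS patch level) is the more likely culprit.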
[Veritas-bu] Question posed to ACSLS/STK8500 users.
All,

My group is setting up two Sun/StorageTek SL8500s. Sun did the install of ACSLS; there were no problems on their side. Each SL8500 is in its own environment. On each SL8500, we have 8 media servers, connected to four drives each, giving us a total of 32 drives.

For testing, I did the following: I ran a NON-MULTIPLEXED backup to each drive to ensure each drive worked properly. To do this I kicked off four jobs in succession. When I do this, I utilize all 4 drives. I did this with each media server without a single problem.

However, when testing everything together, all 32 drives, I kick off 45 jobs, for example. It says there are 32 active jobs in NetBackup, which is correct. The problem is that, randomly, 2 or 3 jobs will hang at "Mounting MediaID.." and then the drive will go down after 30 minutes. Why is this? With an L700, I can send 500-1000 jobs to all of the drives in it and there is never a mounting problem. There is nothing wrong with any of the drives; they are brand new. I can use ACSLS to dismount the media from the drives and then re-run my earlier test backups, one at a time, to each of the four drives per media server without any issues. It is only when the robot receives a 'burst' of jobs that this happens.

Has anyone experienced anything like this before?

Thanks for any help and responses,

Justin.