[Veritas-bu] semaphore issue
I should have done a little more research before replying the first time. Google returns references to the Troubleshooting Guide for that error. I would recommend that you check the current semaphore settings to make sure they are adequate.

It's also not clear what the behavior is: do you start the processes and ltid runs for a while and then returns the error (in which case a reboot probably will not help), or does it simply not start (in which case a reboot probably will help)?

For reference, the following is from the NetBackup Troubleshooting Guide:

Device Management Status Code: 32
Message: Error in getting semaphore

Explanation: An attempt was made by ltid (the Media Manager device daemon on UNIX or the NetBackup Device Manager service on Windows) to obtain a semaphore used for arbitrating access to shared memory, and the request failed due to a system error. The error probably indicates a lack of system resources for semaphores, or mismatched software components.

Recommended Action:

1. Examine command output (if available), debug logs, and system logs for messages related to the error. Enable debug logging by creating the necessary directories/folders. Increase the level of verbosity by adding the VERBOSE option in the vm.conf file and restarting ltid (the device daemon on UNIX or NetBackup Device Manager service on Windows).

2. On UNIX servers, gather the output of the ipcs -a command to see what resources are currently in use. Check the installed software components and verify that they are all at a compatible release version.

For reference, the Solaris 8 and 9 minimum kernel parameters for NetBackup are in Sun Document ID 73373 and NetBackup technote 238063:
http://seer.support.veritas.com/docs/238063.htm

This was sent by [EMAIL PROTECTED] via Backup Central.
___
Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
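The VERBOSE step from the Recommended Action quoted above could be scripted roughly as follows. This is only a sketch: the default vm.conf path is the usual UNIX location but is an assumption here, the function name is made up for illustration, and the demo at the bottom works on a scratch file rather than a real configuration. It needs a POSIX shell (on Solaris 10, ksh or /usr/xpg4/bin/sh rather than the legacy /bin/sh).

```shell
# Sketch only: add the VERBOSE entry to vm.conf if it is not already there.
# VM_CONF is an assumed default location; override it to experiment safely.
VM_CONF="${VM_CONF:-/usr/openv/volmgr/vm.conf}"

enable_verbose() {
    conf="$1"
    # Create the file if it is missing so grep and the append both work.
    [ -f "$conf" ] || : > "$conf"
    # Append VERBOSE only when no line consists of exactly that word.
    if grep '^VERBOSE$' "$conf" > /dev/null 2>&1; then
        echo "VERBOSE already set in $conf"
    else
        echo "VERBOSE" >> "$conf"
        echo "VERBOSE added to $conf"
    fi
}

# Demo against a scratch file; calling twice shows the edit is idempotent.
demo_conf="${TMPDIR:-/tmp}/vm.conf.demo.$$"
: > "$demo_conf"
enable_verbose "$demo_conf"
enable_verbose "$demo_conf"
verbose_lines=$(grep -c '^VERBOSE$' "$demo_conf")
rm -f "$demo_conf"
echo "VERBOSE lines after two calls: $verbose_lines"
```

On a real server you would call enable_verbose "$VM_CONF" and then restart ltid, as the guide says, before collecting the logs and the ipcs -a output.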
Re: [Veritas-bu] semaphore issue
I just saw your question to Jeff on the Backup Central link. Reboot the media server (or servers) that is affected.

D
Re: [Veritas-bu] semaphore issue
Hi Jeff,

Do I reboot the master, the master and the media servers, or just the media server? This is happening on 2 of my SAN media servers. The master and my main media server appear to be OK.

Thanks,
Dominik

From: Jeff Lightner [mailto:[EMAIL PROTECTED]
Sent: Sunday, 27 January 2008 1:12 AM
To: Dominik Pietrzykowski; VERITAS-BU@mailman.eng.auburn.edu
Subject: RE: [Veritas-bu] semaphore issue

Haven't seen it in relation to NBU, but if you are sure that the semaphore parameters are all adequate, it may be that something stopped abnormally and left semaphores or even shared memory segments in use at a memory address that NBU wants.

In NBU 6.x there is a database instead of flat files, and it is Sybase. Most modern databases use a combination of shared memory segments and semaphores for control.

You can use the ipcs command to examine what semaphores/shared memory segments are in use. You can use ipcrm to remove any.

WARNING: Deleting shared memory segments or semaphores that are still required by a running application can cause your system to crash.

If you're not sure what can be cleared, a reboot will clear both IPC types.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dominik Pietrzykowski
Sent: Friday, January 25, 2008 6:20 PM
To: VERITAS-BU@mailman.eng.auburn.edu
Subject: [Veritas-bu] semaphore issue

Anyone seen this on a Solaris 10 server?

# ltid -v
# Error in getting semaphore
#

ltid keeps on dying and it complains about semaphores. My other Solaris 10 servers are fine, but I have two with this issue. Both use different hardware, and no, you don't need to tune the kernel on Solaris 10, as its defaults are much bigger than anything Symantec recommends.

Hope someone can help.

Thanks,
Dominik
[Veritas-bu] semaphore issue
Actually, the Solaris 10 system defaults are higher, leading Sun to say that many of the values are obsolete, but they really aren't. An example is the msgmnb setting: the "official" Solaris tuning guide from Sun states that it's obsolete. But run ipcs sometime on your master running NetBackup: without the /etc/system parameter in place, the message queue size is 65536. Then change the kernel parameter to, say, twice the current value (131072; if you use mdb to make the change you still need to recycle the processes) and note that the size has increased after recycling. I also work for Symantec, and if you call Sun and talk to a kernel engineer, they will admit they really haven't obsoleted the values to the point that the OS actually ignores them.

OK, on to the problem (just trying to dispel the myth). I haven't seen that issue before, but:

Prior to starting the processes, are there any 'stale' semaphores left in ipcs?
What does the ltid log say (i.e. in /usr/openv/volmgr/debug)?
Can you try to start it using truss?
Does a reboot help?
Is anything in maintenance mode that shouldn't be when you run svcs -a?

Also make sure that your /etc/system file doesn't actually have any of the semaphore settings in there (typically the default semaphore settings on Solaris 10 are just fine, although Sun HAS recommended additional settings on occasion; see the following technote: http://support.veritas.com/docs/295295).

Hope that helps.

D

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dominik Pietrzykowski
Sent: Friday, January 25, 2008 6:20 PM
To: VERITAS-BU@mailman.eng.auburn.edu
Subject: [Veritas-bu] semaphore issue

Anyone seen this on a Solaris 10 server?

# ltid -v
# Error in getting semaphore
#

ltid keeps on dying and it complains about semaphores. My other Solaris 10 servers are fine, but I have two with this issue. Both use different hardware, and no, you don't need to tune the kernel on Solaris 10, as its defaults are much bigger than anything Symantec recommends.

Hope someone can help.

Thanks,
Dominik
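The /etc/system check mentioned in the reply above could look something like the sketch below. It assumes the usual /etc/system conventions ("set module:variable=value", comment lines starting with an asterisk). The function name and the sample file contents are made up for illustration and are not from any real server. On Solaris 10 the -E option needs the XPG4 grep (/usr/xpg4/bin/grep), and the script needs a POSIX shell rather than the legacy /bin/sh.

```shell
# Sketch: list uncommented semaphore and message-queue tunables in an
# /etc/system-style file. Comment lines there start with '*'.
find_ipc_tunables() {
    # Match "set semsys:..." or "set msgsys:..." at the start of a line.
    grep -E '^[[:space:]]*set[[:space:]]+(semsys|msgsys):' "$1"
}

# Demo on a made-up sample file (values are illustrative only).
demo="${TMPDIR:-/tmp}/system.demo.$$"
cat > "$demo" <<'EOF'
* made-up sample, not taken from a real server
* set semsys:seminfo_semmni=1024
set semsys:seminfo_semmns=2048
set msgsys:msginfo_msgmnb=131072
set shmsys:shminfo_shmmax=4294967295
EOF
tunables=$(find_ipc_tunables "$demo")
tunable_count=$(printf '%s\n' "$tunables" | grep -c .)
rm -f "$demo"
printf '%s\n' "$tunables"
echo "found $tunable_count semaphore/message-queue settings"
```

Running find_ipc_tunables /etc/system on one of the failing media servers and on a healthy one would give a quick comparison of what is actually set.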
[Veritas-bu] FYI - Solaris 10 mpt driver patch
Just an FYI: if you are running Solaris 10 on a NetBackup server, do not apply this patch:

mpt driver patch 125081-10 (or above) (I tried 125081-14)

This patch introduces a change which breaks the sg driver. See
http://bugs.opensolaris.org/view_bug.do?bug_id=6651884

Basically, all your tape drives and robotic devices go away. Not so good.

Cheers,
--
Roy McMorran
Systems Administrator
MDI Biological Laboratory
[EMAIL PROTECTED]
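One way to check whether a server already has the patch named above is to filter the output of showrev -p. The sketch below assumes that output has lines of the form "Patch: 125081-NN ..."; the function name is hypothetical and the two sample lines are made up for the demo. On Solaris 10, nawk may be a safer choice than the legacy awk.

```shell
# Sketch: read `showrev -p` style output on stdin and succeed (exit 0)
# if patch 125081 is present at revision 10 or higher.
has_bad_mpt_patch() {
    awk '
        $1 == "Patch:" && $2 ~ /^125081-[0-9]+$/ {
            # Split "125081-14" into the patch id and its revision.
            split($2, parts, "-")
            if (parts[2] + 0 >= 10) hit = 1
        }
        END { if (hit) exit 0; exit 1 }
    '
}

# Demo with made-up input lines (not captured from a real system).
if printf 'Patch: 125081-14 Obsoletes: Requires: Incompatibles: Packages:\n' | has_bad_mpt_patch
then new_rev=flagged; else new_rev=ok; fi
if printf 'Patch: 125081-09 Obsoletes: Requires: Incompatibles: Packages:\n' | has_bad_mpt_patch
then old_rev=flagged; else old_rev=ok; fi
echo "125081-14: $new_rev, 125081-09: $old_rev"
```

On a live system the call would be along the lines of: showrev -p | has_bad_mpt_patch && echo "mpt patch 125081-10 or later is installed".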
Re: [Veritas-bu] For those of you backing up millions of files....
Great story! Restoring data is overrated anyway. =P

-Jonathan

From: [EMAIL PROTECTED] on behalf of Bobby Williams
Sent: Sat 1/26/2008 8:39 AM
To: 'veritas-bu'
Subject: [Veritas-bu] For those of you backing up millions of files
Re: [Veritas-bu] For those of you backing up millions of files....
On Sat, 26 Jan 2008, Bobby Williams wrote:

> There is a "-w" switch on the bprestore command. I now know what it is for.
> If you are scripting, it prevents the next restore from firing off until the
> previous restore is finished.

This should be part of an FAQ, good to know!
Re: [Veritas-bu] semaphore issue
Haven't seen it in relation to NBU, but if you are sure that the semaphore parameters are all adequate, it may be that something stopped abnormally and left semaphores or even shared memory segments in use at a memory address that NBU wants.

In NBU 6.x there is a database instead of flat files, and it is Sybase. Most modern databases use a combination of shared memory segments and semaphores for control.

You can use the ipcs command to examine what semaphores/shared memory segments are in use. You can use ipcrm to remove any.

WARNING: Deleting shared memory segments or semaphores that are still required by a running application can cause your system to crash.

If you're not sure what can be cleared, a reboot will clear both IPC types.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dominik Pietrzykowski
Sent: Friday, January 25, 2008 6:20 PM
To: VERITAS-BU@mailman.eng.auburn.edu
Subject: [Veritas-bu] semaphore issue

Anyone seen this on a Solaris 10 server?

# ltid -v
# Error in getting semaphore
#

ltid keeps on dying and it complains about semaphores. My other Solaris 10 servers are fine, but I have two with this issue. Both use different hardware, and no, you don't need to tune the kernel on Solaris 10, as its defaults are much bigger than anything Symantec recommends.

Hope someone can help.

Thanks,
Dominik
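As a cautious way to follow the ipcs/ipcrm advice above, the sketch below only prints the ipcrm commands for semaphore sets owned by a given user and never runs them. It assumes Solaris-style ipcs -s rows with the columns T, ID, KEY, MODE, OWNER, GROUP; the function name is hypothetical and the sample rows are made up. On Solaris 10, use nawk, since the legacy awk lacks -v.

```shell
# Sketch: read `ipcs -s` output on stdin and print (not run) an ipcrm
# command for each semaphore set owned by the given user.
print_ipcrm_for_owner() {
    owner="$1"
    awk -v owner="$owner" '
        # Assumed row layout: T ID KEY MODE OWNER GROUP, with T = "s".
        $1 == "s" && $5 == owner { print "ipcrm -s " $2 }
    '
}

# Demo on made-up rows (owners and IDs are illustrative only).
sample='T         ID      KEY        MODE        OWNER    GROUP
Semaphores:
s          3   0x00000000 --ra-------     root     root
s          7   0x00000000 --ra-ra----     demo     demo
m         12   0x00000000 --rw-------     root     root'
cmds=$(printf '%s\n' "$sample" | print_ipcrm_for_owner root)
echo "$cmds"
```

On a live server the input would come from ipcs -s itself. Given the warning above, review each printed command, and confirm that nothing still running uses that semaphore set, before executing any of them.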
[Veritas-bu] For those of you backing up millions of files....
We have warned, begged, pleaded, and threatened, but some application owners want to keep everything forever.

I have a system with a file system with over 29 million files. Of course no one can afford advanced client. No one wants a raw partition backup because they may want that 1 file. You have heard the excuses.

Well, the storm hit. I am moving a server to another data center and had to move the SAN volumes via tape.

(Don't start telling me a better way of moving this stuff; that is not the point of this email and I have been suggesting ways for a while.)

I could not fire off a restore of the entire file system. It would just stay in the queue. I started seeing what I could fire off. I started selecting some subdirectories and was able to restore.

There were only 21,300 individual subdirectories, so clicking a few in the GUI was NOT an option.

I did a bplist and got the subdir names. Using split, I split the subdir names into groups of 50. Gave me 425 file lists.

I ran a script to brute force the restores. Uh-oh. 1 tape with the data on it. Not enough memory to calculate the restore list for 425 restore jobs concurrently.

There is a "-w" switch on the bprestore command. I now know what it is for. If you are scripting, it prevents the next restore from firing off until the previous restore is finished. I had to go with it to keep everything from timing out in the queue and not knowing what had run and what had not. I did include the "-L" to keep up with what had / had not fired.

Data is going back and the restore will be successful. However, someone promised that the system would be online for testing 10 hours after it was installed.

I had told them several times this week that the full backup took 35 hours, so don't expect a quick restore.

Point of the email is that "yes, we can back up millions of files without paying for advanced client, but we can't restore the data per your RTO/SLA".

Bobby Williams
2205 Peterson Drive
Chattanooga, Tennessee 37421
423-296-8200
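The batching approach described in the message above (split the directory list into groups of 50, then run one restore at a time with -w and -L) might be sketched like this. It is not the script that was actually used. RESTORE_CMD defaults to "echo bprestore" so that nothing is really restored, the function and variable names are made up, the directory names in the demo are fabricated, and other bprestore options a real run would likely need (client, server, dates) are left out. It needs a POSIX shell (on Solaris 10, ksh or /usr/xpg4/bin/sh).

```shell
# Sketch: split a list of directories into groups and restore the groups
# one at a time. The default RESTORE_CMD only prints the command line.
RESTORE_CMD="${RESTORE_CMD:-echo bprestore}"
GROUP_SIZE=50

run_batches() {
    list="$1"      # file with one directory per line
    workdir="$2"   # where the chunk files and progress logs go
    split -l "$GROUP_SIZE" "$list" "$workdir/chunk."
    batches=0
    for chunk in "$workdir"/chunk.*; do
        # -w waits for the restore to finish, -L names a progress log,
        # -f reads the list of paths to restore from a file.
        $RESTORE_CMD -w -L "$workdir/progress.${chunk##*/}" -f "$chunk" || return 1
        batches=$((batches + 1))
    done
}

# Dry-run demo with 120 made-up directory names: expect 50 + 50 + 20.
work="${TMPDIR:-/tmp}/restore.demo.$$"
mkdir -p "$work"
i=1
while [ "$i" -le 120 ]; do
    echo "/data/dir$i"
    i=$((i + 1))
done > "$work/dirs.txt"
run_batches "$work/dirs.txt" "$work"
echo "ran $batches batches"
rm -rf "$work"
```

Because each call waits for the previous one, only one restore list is being calculated at a time, and the per-chunk progress logs show which groups have and have not run.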