[Veritas-bu] semaphore issue

2008-01-26 Thread comp_geek_gal

I should have done a little more research before replying the first time.
Google returns references to the Troubleshooting Guide for that error.
I would recommend that you check the current semaphore settings to make sure
they are adequate.


 It's also not clear what the behavior is - i.e. do you start the processes and 
ltid runs for a while and then returns the error (in which case a reboot 
probably will not help) or does it simply not start (in which case a reboot 
probably will help).


Then for reference, the following is from the NetBackup Troubleshooting Guide
(Status Codes, page 436):

Device Management Status Code: 32
Message: Error in getting semaphore
Explanation: An attempt was made by ltid (the Media Manager device daemon on
UNIX or the NetBackup Device Manager service on Windows) to obtain a semaphore
used for arbitrating access to shared memory, and the request failed due to a
system error. The error probably indicates a lack of system resources for
semaphores, or mismatched software components.
Recommended Action:
1. Examine command output (if available), debug logs, and system logs for
   messages related to the error. Enable debug logging by creating the
   necessary directories/folders. Increase the level of verbosity by adding
   the VERBOSE option in the vm.conf file and restarting ltid (the device
   daemon on UNIX or NetBackup Device Manager service on Windows).
2. On UNIX servers, gather the output of the ipcs -a command to see what
   resources are currently in use. Check the installed software components
   and verify that they are all at a compatible release version.
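
As a rough illustration of the two recommended actions above, the following
shell sketch adds VERBOSE to vm.conf and saves an ipcs -a snapshot. The
function names are hypothetical, and the default paths
(/usr/openv/volmgr/vm.conf and /tmp) are assumptions that do not appear in the
guide excerpt. It assumes a POSIX shell such as ksh or bash rather than the
legacy Solaris /bin/sh.

```sh
# Hypothetical helpers; the default paths are assumptions, pass your own.

# Add the VERBOSE option to vm.conf unless it is already present.
enable_vm_verbose() {
    conf="${1:-/usr/openv/volmgr/vm.conf}"
    touch "$conf" || return 1
    if grep -q '^VERBOSE$' "$conf"; then
        echo "VERBOSE already set in $conf"
    else
        echo "VERBOSE" >> "$conf"
        echo "VERBOSE added to $conf"
    fi
}

# Save the current IPC usage (ipcs -a) to a timestamped file and print its name.
collect_ipc_snapshot() {
    out_dir="${1:-/tmp}"
    out="$out_dir/ipcs.$(date +%Y%m%d%H%M%S).txt"
    ipcs -a > "$out" 2>&1 || return 1
    echo "$out"
}
```

ltid still has to be restarted afterwards for the VERBOSE setting to take
effect, as the guide says.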


For reference, the Solaris 8 and 9 minimum kernel parameters for NetBackup
are in Sun Document ID 73373 and NetBackup technote ID 238063:
http://seer.support.veritas.com/docs/238063.htm



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] semaphore issue

2008-01-26 Thread d w
I just saw your question to Jeff on the Backup Central link.

Reboot the media server (or servers) that are affected.

D


   


Re: [Veritas-bu] semaphore issue

2008-01-26 Thread Dominik Pietrzykowski
 

Hi Jeff,

Do I reboot the master, the master and the media servers, or just the media
server?

This is happening on 2 of my SAN media servers.

The master and my main media server appear to be OK.

Thanks,

Dominik

From: Jeff Lightner [mailto:[EMAIL PROTECTED] 
Sent: Sunday, 27 January 2008 1:12 AM
To: Dominik Pietrzykowski; VERITAS-BU@mailman.eng.auburn.edu
Subject: RE: [Veritas-bu] semaphore issue

 

Haven't seen it in relation to NBU but if you are sure that the semaphore
parameters are all adequate it may be that something stopped abnormally and
left semaphores or even shared memory segments in use at a memory address
that NBU wants.  In NBU 6.x there is a database instead of flat files and it
is Sybase.   Most modern databases use a combination of shared memory
segments and semaphores for control.

You can use the ipcs command to examine what semaphores/shared memory
segments are in use.   You can use ipcrm to remove any.  WARNING:  Deleting
shared memory segments or semaphores that are still required by a running
application can cause your system to crash.

If you're not sure what can be cleared a reboot will clear both IPC types.

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dominik
Pietrzykowski
Sent: Friday, January 25, 2008 6:20 PM
To: VERITAS-BU@mailman.eng.auburn.edu
Subject: [Veritas-bu] semaphore issue

 

 

Anyone seen this on a Solaris 10 server?

# ltid -v
# Error in getting semaphore
#

ltid keeps on dying and it complains about semaphores. My other Solaris 10
servers are fine but I have two with this issue.

Both use different hardware, and no, you don't need to tune the kernel on
Solaris 10 as its defaults are much bigger than anything Symantec
recommends.

Hope someone can help.

Thanks,

Dominik



[Veritas-bu] semaphore issue

2008-01-26 Thread d w
Actually, the Solaris 10 system defaults are higher, leading Sun to say that
many of the values are obsolete, but they really aren't.

An example is the msgmnb setting. The "official" Solaris tuning guide from
Sun states that it's obsolete. But run ipcs sometime on your master running
NetBackup: without the /etc/system parameter in place, the message queue
size is 65536.

Then change the kernel parameter to, say, twice the current value (131072; if
you use mdb to make the change you still need to recycle the processes) and
note that the size has increased after recycling.

I also work for Symantec, and if you call Sun and talk to a kernel engineer,
they will admit they really haven't obsoleted the values to the point that
the OS actually ignores them.

OK, so on to the problem (just trying to dispel the myth).

I haven't seen that issue before, but prior to starting the processes, are
there any 'stale' semaphores left in ipcs?

What does the ltid log say (i.e. in /usr/openv/volmgr/debug)?

Can you try to start it using truss?

Does a reboot help?

Is anything in maintenance mode that shouldn't be when you run svcs -a?

Also make sure that your /etc/system file doesn't actually have any of the
semaphore settings in there (typically the default semaphore settings on
Solaris 10 are just fine, although Sun HAS recommended additional settings on
occasion; see the following technote: http://support.veritas.com/docs/295295).
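
Two of the checks above (services in maintenance and semaphore entries in
/etc/system) are easy to script. The sketch below is only an illustration: the
function names are made up, the svcs -a column layout (STATE STIME FMRI) is
assumed, and it expects a POSIX shell such as ksh or bash rather than the
legacy /bin/sh.

```sh
# List SMF services in maintenance state; reads `svcs -a` output on stdin.
# Assumes the usual "STATE STIME FMRI" column order.
maintenance_services() {
    while read -r state stime fmri; do
        [ "$state" = "maintenance" ] && echo "$fmri"
    done
    return 0
}

# Show active (uncommented) semaphore tunables in an /etc/system style file.
# Comment lines in /etc/system start with "*". Prints nothing (and returns
# non-zero) when no semsys: entries are set.
sem_settings() {
    grep -v '^\*' "${1:-/etc/system}" | grep 'semsys:'
}
```

Typical use would be `svcs -a | maintenance_services` and `sem_settings`. For
the truss suggestion, something like `truss -f -o /tmp/ltid.truss ltid -v`
(the output path is just an example) captures the system calls leading up to
the semaphore error.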
   
  Hope that helps.
   
  D
   
  
   
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Dominik
  Pietrzykowski
  Sent: Friday, January 25, 2008 6:20 PM
  To: VERITAS-BU@mailman.eng.auburn.edu
  Subject: [Veritas-bu] semaphore issue
   
   
   
   
   
Anyone seen this on a Solaris 10 server?

# ltid -v
# Error in getting semaphore
#

ltid keeps on dying and it complains about semaphores. My other Solaris 10
servers are fine but I have two with this issue.

Both use different hardware, and no, you don't need to tune the kernel on
Solaris 10 as its defaults are much bigger than anything Symantec
recommends.

Hope someone can help.

Thanks,

Dominik

   


[Veritas-bu] FYI - Solaris 10 mpt driver patch

2008-01-26 Thread Roy McMorran
Just an FYI, if you are running Solaris 10 on a Netbackup server, do not 
apply this patch:

mpt driver patch 125081-10 (or above) (I tried 125081-14)

This patch introduces a change which breaks the sg driver.  See
http://bugs.opensolaris.org/view_bug.do?bug_id=6651884

Basically all your tape drives and robotic devices go away.  Not so good.
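
A quick way to see whether a server already has the patch is to scan the
showrev -p output. This is a sketch only: the function name is made up, it
assumes showrev -p lines of the form "Patch: 125081-14 Obsoletes: ...", and it
expects a POSIX shell such as ksh or bash.

```sh
# Reads `showrev -p` output on stdin and reports whether mpt patch 125081
# is installed at revision 10 or later.
check_mpt_patch() {
    found=0
    while read -r label patch rest; do
        [ "$label" = "Patch:" ] || continue
        case "$patch" in
            125081-*)
                rev="${patch#125081-}"      # revision part after the dash
                if [ "$rev" -ge 10 ]; then
                    echo "affected: $patch installed"
                    found=1
                fi
                ;;
        esac
    done
    [ "$found" -eq 1 ] || echo "ok: no 125081-10 or later found"
}
```

Usage: `showrev -p | check_mpt_patch`.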

Cheers,
-- 

Roy McMorran
Systems Administrator
MDI Biological Laboratory
[EMAIL PROTECTED]



Re: [Veritas-bu] For those of you backing up millions of files....

2008-01-26 Thread Martin, Jonathan
Great story!  Restoring data is overrated anyway. =P
 
-Jonathan



From: [EMAIL PROTECTED] on behalf of Bobby Williams
Sent: Sat 1/26/2008 8:39 AM
To: 'veritas-bu'
Subject: [Veritas-bu] For those of you backing up millions of files



We have warned, begged, pleaded, and threatened, but some application owners 
want to keep everything forever. 

I have a system with a file system with over 29 million files.  Of course no 
one can afford advanced client.  No one wants a raw partition backup because 
they may want that 1 file...   You have heard the excuses.

Well, the storm hit.  I am moving a server to another data center and had to 
move the SAN volumes via tape. 

(Don't start telling me a better way of moving this stuff, that is not the 
point of this email and I have been suggesting ways for a while).

I could not fire off a restore of the entire file system.  It would just stay 
in the queue.  I started seeing what I could fire off.  I started selecting 
some subdirectories and was able to restore.

There were only 21,300 individual subdirectories, so clicking a few in the GUI 
was NOT an option. 

I did a bplist and got the subdir names.  Using split, I split the subdir names 
into groups of 50.  Gave me 425 file lists.

I ran a script to brute force the restores.  Uh-oh.  1 tape with the data on 
it.  Not enough memory to calculate the restore list for 425 restore jobs 
concurrently.

There is a "-w" switch on the bprestore command.  I now know what it is for.  
If you are scripting, it prevents the next restore from firing off until the 
previous restore is finished.  I had to go with it to keep everything from 
timing out in the queue and not knowing what had run and what had not.  I did 
include the "-L" to keep up with what had / had not fired.

Data is going back and the restore will be successful.  However, someone 
promised that the system would be online for testing 10 hours after it was 
installed.

I had told them several times this week that the full backup took 35 hours, so 
don't expect a quick restore. 

Point of the email is that "yes, we can back up millions of files without 
paying for advanced client, but we can't restore the data per your RTO/SLA".




Bobby Williams 
2205 Peterson Drive 
Chattanooga, Tennessee  37421 
423-296-8200 




Re: [Veritas-bu] For those of you backing up millions of files....

2008-01-26 Thread Justin Piszcz


On Sat, 26 Jan 2008, Bobby Williams wrote:

> We have warned, begged, pleaded, and threatened, but some application owners
> want to keep everything forever.
>
> I have a system with a file system with over 29 million files.  Of course no
> one can afford advanced client.  No one wants a raw partition backup because
> they may want that 1 file.   You have heard the excuses.
>
> Well, the storm hit.  I am moving a server to another data center and had to
> move the SAN volumes via tape.
>
> (Don't start telling me a better way of moving this stuff, that is not the
> point of this email and I have been suggesting ways for a while).
>
> I could not fire off a restore of the entire file system.  It would just
> stay in the queue.  I started seeing what I could fire off.  I started
> selecting some subdirectories and was able to restore.
>
> There were only 21,300 individual subdirectories, so clicking a few in the
> GUI was NOT an option.
>
> I did a bplist and got the subdir names.  Using split, I split the subdir
> names into groups of 50.  Gave me 425 file lists.
>
> I ran a script to brute force the restores.  Uh-oh.  1 tape with the data on
> it.  Not enough memory to calculate the restore list for 425 restore jobs
> concurrently.
>
> There is a "-w" switch on the bprestore command.  I now know what it is for.
> If you are scripting, it prevents the next restore from firing off until the
> previous restore is finished.  I had to go with it to keep everything from
> timing out in the queue and not knowing what had run and what had not.  I
> did include the "-L" to keep up with what had / had not fired.
>
> Data is going back and the restore will be successful.  However, someone
> promised that the system would be online for testing 10 hours after it was
> installed.
>
> I had told them several times this week that the full backup took 35 hours,
> so don't expect a quick restore.
>
> Point of the email is that "yes, we can back up millions of files without
> paying for advanced client, but we can't restore the data per your RTO/SLA".
>
>
>
>
> Bobby Williams
> 2205 Peterson Drive
> Chattanooga, Tennessee  37421
> 423-296-8200
>
>

This should be part of an FAQ, good to know!

There is a "-w" switch on the bprestore command.  I now know what it is 
for. If you are scripting, it prevents the next restore from firing off 
until the previous restore is finished.


Re: [Veritas-bu] semaphore issue

2008-01-26 Thread Jeff Lightner
Haven't seen it in relation to NBU but if you are sure that the
semaphore parameters are all adequate it may be that something stopped
abnormally and left semaphores or even shared memory segments in use at
a memory address that NBU wants.  In NBU 6.x there is a database instead
of flat files and it is Sybase.   Most modern databases use a
combination of shared memory segments and semaphores for control.

 

You can use the ipcs command to examine what semaphores/shared memory
segments are in use.   You can use ipcrm to remove any.  WARNING:
Deleting shared memory segments or semaphores that are still required by
a running application can cause your system to crash.  

 

If you're not sure what can be cleared a reboot will clear both IPC
types.
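
One cautious way to act on the ipcs/ipcrm advice above is to list candidate
semaphore sets and print, rather than run, the matching ipcrm commands. The
sketch below is an illustration only: the function name is made up, it assumes
the Solaris `ipcs -s` column layout (T ID KEY MODE OWNER GROUP), and it expects
a POSIX shell such as ksh or bash. It cannot tell whether a set is really
stale; that judgment still has to be made with NetBackup stopped.

```sh
# Print (do not run) an ipcrm command for every semaphore set owned by the
# given user. Reads `ipcs -s` output on stdin.
stale_sem_candidates() {
    owner="${1:-root}"
    while read -r type id key mode sem_owner rest; do
        if [ "$type" = "s" ] && [ "$sem_owner" = "$owner" ]; then
            echo "ipcrm -s $id"
        fi
    done
}
```

Usage: `ipcs -s | stale_sem_candidates root`, then review each printed line
before running any of them by hand.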

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dominik
Pietrzykowski
Sent: Friday, January 25, 2008 6:20 PM
To: VERITAS-BU@mailman.eng.auburn.edu
Subject: [Veritas-bu] semaphore issue

 

 

Anyone seen this on a Solaris 10 server?

# ltid -v
# Error in getting semaphore
#

ltid keeps on dying and it complains about semaphores. My other Solaris 10
servers are fine but I have two with this issue.

Both use different hardware, and no, you don't need to tune the kernel on
Solaris 10 as its defaults are much bigger than anything Symantec
recommends.

Hope someone can help.

Thanks,

Dominik


[Veritas-bu] For those of you backing up millions of files....

2008-01-26 Thread Bobby Williams
We have warned, begged, pleaded, and threatened, but some application owners
want to keep everything forever.

I have a system with a file system with over 29 million files.  Of course no
one can afford advanced client.  No one wants a raw partition backup because
they may want that 1 file.   You have heard the excuses.

Well, the storm hit.  I am moving a server to another data center and had to
move the SAN volumes via tape.

(Don't start telling me a better way of moving this stuff, that is not the
point of this email and I have been suggesting ways for a while).

I could not fire off a restore of the entire file system.  It would just
stay in the queue.  I started seeing what I could fire off.  I started
selecting some subdirectories and was able to restore.

There were only 21,300 individual subdirectories, so clicking a few in the
GUI was NOT an option.

I did a bplist and got the subdir names.  Using split, I split the subdir
names into groups of 50.  Gave me 425 file lists.

I ran a script to brute force the restores.  Uh-oh.  1 tape with the data on
it.  Not enough memory to calculate the restore list for 425 restore jobs
concurrently.

There is a "-w" switch on the bprestore command.  I now know what it is for.
If you are scripting, it prevents the next restore from firing off until the
previous restore is finished.  I had to go with it to keep everything from
timing out in the queue and not knowing what had run and what had not.  I
did include the "-L" to keep up with what had / had not fired.
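
The approach described above (split the directory list into groups of 50, then
run one restore at a time with -w and -L) might look roughly like the sketch
below. It is not the script that was actually used: the function name, file
names and log location are hypothetical, and any other bprestore options a
real restore would need (client, dates and so on) are left out. RESTORE_CMD can
be set to echo for a dry run.

```sh
# Command used for each restore; override with RESTORE_CMD=echo for a dry run.
RESTORE_CMD="${RESTORE_CMD:-bprestore}"

# Split a list of directories into chunks of 50 and restore them one chunk
# at a time.
run_restores() {
    list="$1"      # one directory path per line (e.g. taken from bplist)
    workdir="$2"   # scratch directory for chunk files and progress logs
    mkdir -p "$workdir" || return 1
    split -l 50 "$list" "$workdir/chunk." || return 1
    for chunk in "$workdir"/chunk.*; do
        # -w waits for the restore to finish before returning,
        # -L writes a progress log, -f names the file list.
        $RESTORE_CMD -w -L "$workdir/log.${chunk##*.}" -f "$chunk" \
            || echo "restore failed for $chunk" >&2
    done
}
```

Because each call blocks until its restore completes, only one restore job is
queued at a time, and the per-chunk progress logs show which groups have run.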

Data is going back and the restore will be successful.  However, someone
promised that the system would be online for testing 10 hours after it was
installed.

I had told them several times this week that the full backup took 35 hours,
so don't expect a quick restore.

Point of the email is that "yes, we can back up millions of files without
paying for advanced client, but we can't restore the data per your RTO/SLA".




Bobby Williams
2205 Peterson Drive
Chattanooga, Tennessee  37421
423-296-8200
