Re: [FW1] Numerous panics on Sol7, FW1 4.1

Rajeev Kumar Wed, 07 Jun 2000 07:37:52 -0700

Alain,
        I am not sure exactly what the problem is here, but I suggest you save the
crash dump and analyze it.
On Solaris 6, you need to uncomment these lines in /etc/init.d/sysetup:
========================================
##
## Default is to not do a savecore
##
#if [ ! -d /var/crash/`uname -n` ]
#then mkdir -m 0700 -p /var/crash/`uname -n`
#fi
#                echo 'checking for crash dump...\c '
#savecore /var/crash/`uname -n`
#                echo ''
=========================================

On Solaris 7, savecore is already enabled by default and is managed by dumpadm (which
also lets you do nice things such as dumping to a device other than swap).

In any case, you need the crash dump saved somewhere under /var/crash/<hostname>
(on Solaris 7 the directory is specified in /etc/dumpadm.conf), and you need these
two files:
        vmcore.n  unix.n   (where n is the number of the saved dump)
You can then run the crash utility on these files; they are the system image from
the moment the machine panicked. So my suggestion is:

(The following is what I would do on Solaris 6, but it should apply to Solaris 7 as
well; reading the crash man page is a *MUST*.)
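
To pick the dump number to work with, a small script can look for the highest n
that has both a unix.n and a vmcore.n file. This is only a rough sketch: the function
name latest_dump_pair is made up for illustration, and it assumes the files follow
the unix.n / vmcore.n naming described above.

```python
import os
import re


def latest_dump_pair(crash_dir):
    """Return (n, unix_path, vmcore_path) for the highest n that has
    both unix.n and vmcore.n in crash_dir, or None if there is no pair."""
    numbers = {"unix": set(), "vmcore": set()}
    for name in os.listdir(crash_dir):
        match = re.fullmatch(r"(unix|vmcore)\.(\d+)", name)
        if match:
            numbers[match.group(1)].add(int(match.group(2)))

    # Only dump numbers for which both files exist are usable.
    complete = numbers["unix"] & numbers["vmcore"]
    if not complete:
        return None

    n = max(complete)
    return (
        n,
        os.path.join(crash_dir, f"unix.{n}"),
        os.path.join(crash_dir, f"vmcore.{n}"),
    )
```

Calling latest_dump_pair("/var/crash/saturn") (hostname taken from the quoted log
lines) would return the number and the two paths to hand to crash.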

-> After the panic, boot the system and run dmesg (which shows the last console
messages) to find the reason for the panic and which process caused it. Note down
its PID.
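
If the dmesg output is long, something like the following can pull out the
panic-related lines. It is a sketch only: the sample lines are the ones quoted from
Alain's message below, the keywords "panic" and "BAD TRAP" are my assumption about
what is worth matching, and the function names are made up for illustration.

```python
import re

# Sample messages copied from the quoted report (wrapped lines rejoined).
SAMPLE_LOG = """\
Jun  1 17:23:02 saturn unix: fw_send: NULL q (700c2320)
Jun  1 17:32:05 saturn unix: FW-1: panic(1): fw_lock
Jun  1 17:32:01 saturn unix: BAD TRAP: cpu=0 type=0x31 rp=0x704f6c78 addr=0x24 mmu_fsr=0x0
Jun  1 17:32:01 saturn unix: BAD TRAP occurred in module "fw" due to a NULL pointer dereference.
"""

# Assumed keywords; adjust to whatever your console messages contain.
PANIC_PATTERN = re.compile(r"panic|BAD TRAP", re.IGNORECASE)
MODULE_PATTERN = re.compile(r'occurred in module "([^"]+)"')


def find_panic_lines(lines):
    """Return the lines that mention a panic or a bad trap."""
    return [line for line in lines if PANIC_PATTERN.search(line)]


def trap_module(lines):
    """Return the kernel module named in a BAD TRAP message, or None."""
    for line in lines:
        match = MODULE_PATTERN.search(line)
        if match:
            return match.group(1)
    return None
```

On the sample above, find_panic_lines(SAMPLE_LOG.splitlines()) keeps the three
panic and trap lines, and trap_module reports the module "fw".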

-> Run the crash utility (let's say on the 8th dump):
        crash -d vmcore.8 -n unix.8
   (This gives you a > prompt, where you run crash's own commands to investigate.
   You may also want to send the output of your whole crash session to a file;
   see the -w option.)
  Ex: To see the process table:
        > p -e     (This shows a listing like the ps -ef command.)
        
        > p -e
PROC TABLE SIZE = 7978
SLOT ST  PID  PPID  PGID   SID   UID PRI   NAME        FLAGS
   0 t     0     0     0     0     0  96 sched          load sys lock
   1 s     1     0     0     0     0  58 init           load
   2 s     2     0     0     0     0  98 pageout        load sys lock nowait
   3 s     3     0     0     0     0  60 fsflush        load sys lock nowait
   4 s  1055     1  1055  1055     0  58 sac            load jctl

                (Look through this listing for the PID of your process, the one that
                caused the crash. Sometimes it is the child of some parent process
                (PPID). Find the entry whose PID equals that PPID and identify that
                process. This way you can trace down the culprit process.)
Example: a copy (cp) process might have caused the panic on your system (you find
this out from the dmesg output). Find that process in the > p -e output of the crash
dump. Here PID=4957 and PPID=4955:
93 p  4957  4955   452   452   225  60 cp             load

It was invoked from a shell process (PID=4955, PPID=4949):
107 s  4955  4949   452   452   225  29 sh             load
and so on. Keep tracing and you will find the originating process.
The last process in the trace will (in general) be the init process with PID 1 (the
very first process Solaris starts).
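
The PID -> PPID tracing above can also be done mechanically if you saved the p -e
output to a file (see the -w option of crash). A minimal sketch, assuming the column
layout shown in the listing above; the function names are made up, and the entry for
PID 4949 in the sample is invented purely so the chain is complete.

```python
SAMPLE_PROC_TABLE = """\
PROC TABLE SIZE = 7978
SLOT ST  PID  PPID  PGID   SID   UID PRI   NAME        FLAGS
   1 s     1     0     0     0     0  58 init           load
  93 p  4957  4955   452   452   225  60 cp             load
 105 s  4949     1   452   452   225  58 sh             load
 107 s  4955  4949   452   452   225  29 sh             load
"""
# Note: the slot 105 line (PID 4949) is hypothetical, not from a real dump.


def parse_proc_table(text):
    """Map PID -> (PPID, NAME) from the text of a 'p -e' listing."""
    procs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric slot; this skips the header lines.
        if len(fields) < 9 or not fields[0].isdigit():
            continue
        pid, ppid, name = int(fields[2]), int(fields[3]), fields[8]
        procs[pid] = (ppid, name)
    return procs


def ancestry(procs, pid):
    """Return [(pid, name), ...] from the given process up through its parents."""
    chain = []
    seen = set()
    while pid in procs and pid not in seen:
        seen.add(pid)  # guard against loops in a damaged table
        ppid, name = procs[pid]
        chain.append((pid, name))
        pid = ppid
    return chain
```

For the sample table, ancestry(parse_proc_table(SAMPLE_PROC_TABLE), 4957) walks
cp (4957) -> sh (4955) -> sh (4949) -> init (1).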

This way you can find more interesting things in the crash dump. I strongly recommend
reading the crash man page, since Sun changes crash dump analysis with every kernel
release. I have done crash dump analysis on Solaris 6 but not on Solaris 7;
extrapolating from the above, I could try it on Solaris 7, although I am still
waiting for a Solaris 7 machine to crash somewhere within my reach.

Hope this information helps!

Good Luck!

Rajeev



 
Alain Fauconnet wrote:
> 
> Hello,
> 
> I'm very sorry if this is a repost, but from the list archives it seems
> obvious that my two previous attempts have not made it to the list, so I
> try again.
> 
> We have CP-FW1 v4.1 (base):
> 
> # fw ver
> This is Check Point VPN-1(TM) & FireWall-1(R) Version 4.1 Build 41439
> [VPN]
> 
> running on a Sun Ultra-10 machine (128Mb memory) with a quad-Ethernet
> (qfe) card. O/S is Solaris 7 with recommended + relevant (qfe driver)
> patches installed.
> 
> This machine is causing us a lot of worry. It panics up to 4 times a day,
> always with the same kind of message:
> 
> Jun  1 17:23:02 saturn unix: fw_send: NULL q (700c2320)
> Jun  1 17:32:04 saturn unix: fw_lock: already locked. current = fw_filter (in), previous = fw_filter (in), level=2
> Jun  1 17:32:05 saturn unix: FW-1: panic(1): fw_lock
> 
> Then it goes:
> 
> Jun  1 17:32:01 saturn unix: BAD TRAP: cpu=0 type=0x31 rp=0x704f6c78 addr=0x24 mmu_fsr=0x0
> Jun  1 17:32:01 saturn unix: BAD TRAP occurred in module "fw" due to a NULL pointer dereference.
> 
> Hardware failure has been ruled out, because the disk has been moved to
> another Ultra-10 which showed the same panics. File corruption is very
> unlikely (all packages including CP-FW1 itself have been pkgchk'ed)
> 
> The machine has had a fresh Sol. 7 install, no would-be-guru system
> tuning and no unusual things running on it.
> Support from the local CP dealer has been... well... almost nonexistent,
> so I'm sending this message in a bottle, hoping that someone on the list
> has heard of such problems.
> 
> Would this be a known problem fixed in SP-1 ?
> 
> By the way, if I ever manage to get SP-1 from our local distributor (I've
> almost given up hope, they've been so clueless so far), will I need a new
> key or will my 4.1 "base" key work?
> 
> Any hints will be GREATLY appreciated, this is becoming my nightmare.
> 
> Alain Fauconnet
> Sr. Unix Sysadmin
> CS Communications Co. Ltd.
> Thailand
> 
> ================================================================================
>      To unsubscribe from this mailing list, please see the instructions at
>                http://www.checkpoint.com/services/mailing.html
> ================================================================================

-- 
#########################################################################
 (Titanic creators used Linux to simulate the sinking of the great ship)
######################################################################### 
                    Rajeev  Kumar ([EMAIL PROTECTED])
        Fluent Inc. 10, Cavendish Court, Lebanon NH-03766
-------------------------------------------------------------------------
Phone :: (603)-643-2600 x 349    Fax :: (603)-643-3967
                Web:: http://www.fluent.com 
#########################################################################

