kernel panic on 2.2.14 in sg driver

2000-11-16 Thread Paul Clements


I am seeing a kernel panic on 2.2.14. It looks like 2.2.16 also has the
same problem.

Details:

I have been able to reproduce a kernel panic several times with kdb
compiled in and some added printk debug messages and I have now
pinpointed the problem. The panic occurs when the following call is made
in scsi_ioctl_send_command() ("scsi_ioctl.c", line 329):

  if (SDpnt->scsi_request_fn)
      (*SDpnt->scsi_request_fn)();

I have verified that scsi_request_fn is a function pointer that is meant
to point to the do_sd_request() function ("sd.c", line 530). By adding a debug
printk() right before this call I can see that this function pointer
contains: 0x8489ab80. This is the address where the panic occurs,
labelled "?unknown?" in the stack trace below. After a panic, when I
list the instructions at that address, the code that is there is the
middle of a switch statement in the sg_ioctl() function. The function
pointer is pointing to a bogus address.

The problem appears to be due to the following: 

do_sd_request() is in the sd_mod.o module
sg_ioctl() is in sg.o
scsi_ioctl_send_command() is in scsi_mod.o 

Now sg.o and sd_mod.o have a dependency on scsi_mod.o, but sg.o and
sd_mod.o are independent of each other. I can cause the kernel to panic
by simply unloading sd_mod.o and then performing an ioctl
(SCSI_IOCTL_SEND_COMMAND) on an open sg device. 
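
The failure mode described above can be modelled outside the kernel. The
following is only a user-space sketch, not the actual 2.2 SCSI code:
`struct fake_device`, `fake_sd_request()`, `sd_attach()`, `sd_detach()` and
`send_command()` are hypothetical names standing in for SDpnt,
do_sd_request(), loading and unloading sd_mod.o, and
scsi_ioctl_send_command(). It assumes one possible remedy, namely that the
module providing the handler clears the shared pointer when it goes away, so
that the NULL check in the caller actually protects the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device structure (SDpnt in the post). */
struct fake_device {
    void (*request_fn)(void);   /* plays the role of scsi_request_fn */
};

int requests_handled;           /* counts calls that reached the handler */

/* Plays the role of do_sd_request(), which lives in sd_mod.o. */
void fake_sd_request(void)
{
    requests_handled++;
}

/* Models loading sd_mod: publish the handler in the shared structure. */
void sd_attach(struct fake_device *dev)
{
    assert(dev != NULL);
    dev->request_fn = fake_sd_request;
}

/* Models unloading sd_mod: clear the pointer so nobody calls stale code.
 * This step is the assumption of the sketch, not existing 2.2 behaviour. */
void sd_detach(struct fake_device *dev)
{
    assert(dev != NULL);
    dev->request_fn = NULL;
}

/* Models the call site: returns 1 if the handler ran, 0 if it was skipped. */
int send_command(struct fake_device *dev)
{
    assert(dev != NULL);
    if (dev->request_fn) {
        (*dev->request_fn)();
        return 1;
    }
    return 0;
}
```

If `sd_detach()` left the pointer untouched, `send_command()` would still
pass its NULL check after the unload and jump to whatever now occupies that
address, which matches the panic reported here.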

Since sg does not depend on sd_mod, the kernel loads sg.o when the sg
device is opened, but does not know to load sd_mod.o. The call to
"scsi_request_fn" then of course causes a panic. This problem is
compounded by the fact that most modern Linux distributions (I'm running
Caldera eServer 2.3 on this box), have an /etc/crontab entry that does:
"/sbin/rmmod -a" every five minutes, so sd_mod gets autocleaned
frequently.
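
"rmmod -a" only removes modules whose use count is zero. The sketch below
models that rule in user space to show why holding a reference would keep
sd_mod in memory while a pointer into it is still live. All names
(`struct fake_module`, `module_get()`, `module_put()`, `autoclean()`) are
hypothetical and are not kernel interfaces; this illustrates the idea and is
not a patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a loadable module's bookkeeping. */
struct fake_module {
    const char *name;
    int loaded;      /* 1 while the module's code is in memory */
    int use_count;   /* number of outstanding references */
};

/* Take a reference; returns 0 if the module is not loaded, 1 otherwise. */
int module_get(struct fake_module *m)
{
    assert(m != NULL);
    if (!m->loaded)
        return 0;
    m->use_count++;
    return 1;
}

/* Drop a reference previously taken with module_get(). */
void module_put(struct fake_module *m)
{
    assert(m != NULL && m->use_count > 0);
    m->use_count--;
}

/* Models one autoclean pass over a single module: it is unloaded only
 * when nobody holds a reference. Returns 1 if the module was unloaded. */
int autoclean(struct fake_module *m)
{
    assert(m != NULL);
    if (m->loaded && m->use_count == 0) {
        m->loaded = 0;
        return 1;
    }
    return 0;
}
```

In this model, whoever stores a function pointer into the module would call
`module_get()` first and `module_put()` only after the pointer has been
dropped, so an autoclean pass in between leaves the module alone.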

So I guess I have a couple of questions:

Does anyone know if the SCSI drivers have been redesigned to avoid this
type of problem in the 2.4 kernel?

Does anyone have a solution or workaround to this problem? As a
workaround, I guess I could avoid the autoclean problem by doing "insmod
sd_mod" in some startup script.

Are there other instances of this type of problem elsewhere in the
kernel? 

If you need/want any more information about this, just let me know...

Thanks,
Paul



kernel stack trace:
--
?unknown? (?pointer?, arg, filp, -25, 1)
sg_ioctl (filp->f_dentry->d_inode, filp, SCSI_IOCTL_SEND_COMMAND, arg)
sys_ioctl (fd, SCSI_IOCTL_SEND_COMMAND, arg)
system_call
 
where `fd' is an open file descriptor for the scsi device
 
and `filp' is a "struct file *" corresponding to the open file
descriptor for the scsi device
 
and `arg' is the "Scsi_Ioctl_Command *" (struct scsi_ioctl_command *)
sent in from the original ioctl call, in this case containing the
following:
 
 inlen = 0
 outlen = 0
 command = { 0 (TEST UNIT READY), 6, 0, 70, 98, 0 }
 data = {empty}
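
For reference, the argument listed above could be filled in roughly as
follows. This is a sketch under assumptions: `struct
fake_scsi_ioctl_command` is a local, hypothetical mirror of the real
structure, assuming two unsigned length fields followed by a byte buffer
that begins with the command bytes, and `build_test_unit_ready()` is a
made-up helper name. It only builds the buffer and does not issue the ioctl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CDB_LEN 6   /* the command shown in the trace is 6 bytes long */

/* Hypothetical local mirror of the ioctl argument; the layout is assumed. */
struct fake_scsi_ioctl_command {
    unsigned int inlen;            /* bytes of data sent to the device */
    unsigned int outlen;           /* bytes of data expected back */
    unsigned char data[CDB_LEN];   /* command bytes; no payload follows */
};

/* Fill in the argument with the values listed in the trace. */
void build_test_unit_ready(struct fake_scsi_ioctl_command *cmd)
{
    static const unsigned char cdb[CDB_LEN] = { 0, 6, 0, 70, 98, 0 };

    assert(cmd != NULL);
    cmd->inlen = 0;
    cmd->outlen = 0;
    memcpy(cmd->data, cdb, sizeof cdb);
}
```

A real caller would pass a pointer to such a buffer as the third argument of
ioctl() with SCSI_IOCTL_SEND_COMMAND on the open sg file descriptor; that
step is left out because it needs an actual device.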
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Raid 1/upgrade to 2.2.16

2000-08-29 Thread Paul Clements

David Lang wrote:
> 
> One thing is that Redhat patches their kernels so the stock kernels
> will not work with the tools that redhat ships.
> 
> another problem appears to be the fact that your new kernel is attempting
> to load the modules created for the old kernel.
>

Yes. You need to run mkinitrd in order to create a ram disk with the new
2.2.16 modules that you just made. There's a man page for mkinitrd that
should help...

Paul

 
> I can't tell you the redhat way of doing a kernel upgrade (things like
> this are one of the reasons I don't use redhat) but if you are using the
> stock kernel you will need to first apply the raid-0.9x patches (I don't
> have the location handy, it is in the recent archive if they are available
> somewhere) and for the modules issue, when you compile a new kernel you
> will need to do a 'make modules' and 'make modules_install' unless you
> configure the kernel to include everything you need for your server and not
> use modules.
> 
> David Lang
> 
>  On Tue, 29 Aug 2000, Kevin Jones wrote:
> 
> > Date: Tue, 29 Aug 2000 15:09:01 -0600
> > From: Kevin Jones <[EMAIL PROTECTED]>
> > To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>,
> >  "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>,
> >  "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
> > Subject: Raid 1/upgrade to 2.2.16
> >
> > Installed Redhat 6.2. Set up 6 md's as Raid 1 during installation going
> > across two 9 gig scsi drives.  Installation went perfectly. Decided to
> > upgrade to 2.2.16. Went through the regular steps for installing a kernel.
> > Included the Raid 1 support, SCSI support, and initrd support in xconfig.
> > Made the 2.2.16 kernel as "newlinux" in lilo. Copied initrd-2.2.14-5.0.img
> > to init-2.2.16.0.img for the ramdisk. Relinked System.map to the System.map
> > made by the make dep.  Relinked Linux in /usr/src to the new kernel area.
> >
> > Now, all that being said, when I boot up the newlinux in lilo, I get this:
> >
> > 
> > Loading ncr53c8xx module
> > /lib/ncr53c8xx.o: kernel-module version mismatch
> >  /lib/ncr53c8xx.o was compiled for kernel version 2.2.14-5.0
> >  while this kernel version is 2.2.16.
> >
> > Loading raid1 module
> > /lib/raid1.o: kernel-module version mismatch
> >  /lib/raid1.o was compiled for kernel version 2.2.14-5.0
> >  while this kernel version is 2.2.16.
> >
> > Opps! Unable to load md0
> > KERNEL PANIC: Unable to load root filesystem at 09:00
> > 
> >
> > I have tried a few things to "attempt" to patch the raid and scsi drivers,
> > but nothing seems to work.  Have I missed a step somewhere or are there new
> > drivers/patches I should be looking for?  I have tried installing
> > raidtools-19990824-0.90, but to no avail.
> >
> > The Software-Raid-HOWTO does explain some things, however it is based upon
> > making the raid from scratch once you are logged into the system on regular
> > sdaX or hdaX partitions. Do I have to set raid up in this manner in order to
> > get this working?
> >
> > Sorry for posting this to 3 different mailing lists, but this seems to
> > relate to all three.
> >
> > \\|//
> > (o^o)
> > +-+-+-+-+-+-+-+-oOOo-(_)-oOOo-+-+-+-+-+--+-+
> >  Kevin C Jones
> >   System Administrator - NOC (Engineering)
> >  Shaw Cablesystems G.P.  [EMAIL PROTECTED]
> >   Voice:(403)303-4812  Fax:(403)750-4504
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-