kernel panic on 2.2.14 in sg driver
I am seeing a kernel panic on 2.2.14. It looks like 2.2.16 also has the
same problem.

Details:

I have been able to reproduce a kernel panic several times with kdb
compiled in and some added printk debug messages, and I have now
pinpointed the problem.

The panic occurs when the following call is made in
scsi_ioctl_send_command() ("scsi_ioctl.c", line 329):

    if(SDpnt->scsi_request_fn)
        (*SDpnt->scsi_request_fn)();

I have verified that scsi_request_fn is a function pointer that points to
the do_sd_request() function ("sd.c", line 530). By adding a debug
printk() right before this call I can see that this function pointer
contains: 0x8489ab80. This is the address where the panic occurs, labelled
"?unknown?" in the stack trace below. After a panic, when I list the
instructions at that address, the code that is there is the middle of a
switch statement in the sg_ioctl() function. The function pointer is
pointing to a bogus address.

The problem appears to be due to the following:

    do_sd_request() is in the sd_mod.o module
    sg_ioctl() is in sg.o
    scsi_ioctl_send_command() is in scsi_mod.o

Now sg.o and sd_mod.o have a dependency on scsi_mod.o, but sg.o and
sd_mod.o are independent of each other.

I can cause the kernel to panic by simply unloading sd_mod.o and then
performing an ioctl (SCSI_IOCTL_SEND_COMMAND) on an open sg device. Since
sg does not depend on sd_mod, the kernel loads sg.o when the sg device is
opened, but does not know to load sd_mod.o. The call to "scsi_request_fn"
then of course causes a panic.

This problem is compounded by the fact that most modern Linux
distributions (I'm running Caldera eServer 2.3 on this box) have an
/etc/crontab entry that does "/sbin/rmmod -a" every five minutes, so
sd_mod gets autocleaned frequently.

So I guess I have a couple of questions:

Does anyone know if the SCSI drivers have been redesigned to avoid this
type of problem in the 2.4 kernel?

Does anyone have a solution or workaround to this problem?
As a workaround, I guess I could avoid the autoclean problem by doing
"insmod sd_mod" in some startup script.

Are there other instances of this type of problem elsewhere in the kernel?

If you need/want any more information about this, just let me know...

Thanks,
Paul

kernel stack trace:
--
?unknown? (?pointer?, arg, filp, -25, 1)
sg_ioctl (filp->f_dentry->d_inode, filp, SCSI_IOCTL_SEND_COMMAND, arg)
sys_ioctl (fd, SCSI_IOCTL_SEND_COMMAND, arg)
system_call

where `fd' is an open file descriptor for the scsi device,
`filp' is a "struct file *" corresponding to the open file descriptor
for the scsi device, and
`arg' is the "Scsi_Ioctl_Command *" (struct scsi_ioctl_command *) sent in
from the original ioctl call, in this case containing the following:

    inlen = 0
    outlen = 0
    command = { 0 (TEST UNIT READY), 6, 0, 70, 98, 0 }
    data = {empty}

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
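
The failure described in the report above comes down to a function pointer
outliving the module that installed it: the NULL check in
scsi_ioctl_send_command() passes because the stale pointer is non-NULL.
What follows is a minimal user-space sketch of that lifetime problem and of
one possible guard (clearing the hook at unload time so the existing NULL
check catches it). It is not kernel code and is not claimed to be the fix
used by any kernel release; `struct device`, `disk_module_load()`,
`disk_module_unload()` and `core_send_command()` are hypothetical names
invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device record owned by the core layer
 * (the role scsi_mod.o plays in the report). */
struct device {
    /* Hook installed by the "disk module" (the role of sd_mod.o). */
    void (*request_fn)(void);
};

/* Counts how many requests the disk module's handler has processed. */
static int requests_handled;

/* Handler that lives in the disk module (the role of do_sd_request()). */
static void disk_request(void)
{
    requests_handled++;
}

/* Module load: publish the handler in the shared device record. */
void disk_module_load(struct device *dev)
{
    assert(dev != NULL);
    dev->request_fn = disk_request;
}

/* Module unload: withdraw the handler BEFORE the code goes away.
 * Leaving the old address in place is what produces a stale,
 * non-NULL pointer like the one seen in the report. */
void disk_module_unload(struct device *dev)
{
    assert(dev != NULL);
    dev->request_fn = NULL;
}

/* Core path (the role of scsi_ioctl_send_command()): only call the
 * hook if one is currently installed. Returns 0 on dispatch, -1 if
 * no handler is available. */
int core_send_command(struct device *dev)
{
    assert(dev != NULL);
    if (dev->request_fn == NULL)
        return -1;
    dev->request_fn();
    return 0;
}
```

With this arrangement, core_send_command() returns -1 before the disk
module is loaded and again after it is unloaded, instead of jumping through
a stale address. A real kernel version would additionally need locking or a
module use count to close the window between the check and the call, which
this sketch leaves out.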
Re: Raid 1/upgrade to 2.2.16
David Lang wrote:
>
> One thing is that Redhat patches their kernels so the stock kernels
> will not work with the tools that redhat ships.
>
> another problem appears to be the fact that your new kernel is attempting
> to load the modules created for the old kernel.
>

Yes. You need to run mkinitrd in order to create a ram disk with the new
2.2.16 modules that you just made. There's a man page for mkinitrd that
should help...

Paul

> I can't tell you the redhat way of doing a kernel upgrade (things like
> this are one of the reasons I don't use redhat) but if you are using the
> stock kernel you will need to first apply the raid-0.9x patches (I don't
> have the location handy, it is in the recent archive if they are available
> somewhere) and for the modules issue, when you compile a new kernel you
> will need to do a 'make modules' and 'make modules_install' unless you
> configure the kernel to include everything you need for your server and
> not use modules.
>
> David Lang
>
> On Tue, 29 Aug 2000, Kevin Jones wrote:
>
> > Date: Tue, 29 Aug 2000 15:09:01 -0600
> > From: Kevin Jones <[EMAIL PROTECTED]>
> > To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>,
> >     "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>,
> >     "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
> > Subject: Raid 1/upgrade to 2.2.16
> >
> > Installed Redhat 6.2. Set up 6 md's as Raid 1 during installation going
> > across two 9 gig scsi drives. Installation went perfectly. Decided to
> > upgrade to 2.2.16. Went through the regular steps to installing a kernel.
> > Included the Raid 1 support, SCSI support, and initrd support in xconfig.
> > Made the 2.2.16 kernel as "newlinux" in lilo. Copied initrd-2.2.14-5.0.img
> > to init-2.2.16.0.img for the ramdisk. Relinked System.map to the System.map
> > made by the make dep. Relinked Linux in /usr/src to the new kernel area.
> >
> > Now, all that being said, when I boot up the newlinux in lilo, I get this:
> >
> >
> > Loading ncr53c8xx module
> > /lib/ncr53c8xx.o: kernel-module version mismatch
> >         /lib/ncr53c8xx.o was compiled for kernel version 2.2.14-5.0
> >         while this kernel version is 2.2.16.
> >
> > Loading raid1 module
> > /lib/raid1.o: kernel-module version mismatch
> >         /lib/raid1.o was compiled for kernel version 2.2.14-5.0
> >         while this kernel version is 2.2.16.
> >
> > Opps! Unable to load md0
> > KERNEL PANIC: Unable to load root filesystem at 09:00
> >
> >
> > I have tried a few things to "attempt" to patch the raid and scsi drivers,
> > but nothing seems to work. Have I missed a step somewhere or are there new
> > drivers/patches I should be looking for? I have tried installing
> > raidtools-19990824-0.90, but to no avail.
> >
> > The Software-Raid-HOWTO does explain some things, however it is based upon
> > making the raid from scratch once you are logged into the system on regular
> > sdaX or hdaX partitions. Do I have to set raid up in this manner in order to
> > get this working?
> >
> > Sorry for posting this to 3 different mailing lists, but this seems to
> > relate to all three.
> >
> >                 \\|//
> >                 (o^o)
> > +-+-+-+-+-+-+-+-oOOo-(_)-oOOo-+-+-+-+-+--+-+
> > Kevin C Jones
> > System Administrator - NOC (Engineering)
> > Shaw Cablesystems G.P. [EMAIL PROTECTED]
> > Voice:(403)303-4812 Fax:(403)750-4504
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-