Re: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test [take 4]
Hi Mark,

I was going to try and test this patch rather than the last, but I am getting this compile error again, where line 640 is the beginning of function aac_rx_init():

  CC [M]  drivers/scsi/aacraid/rx.o
drivers/scsi/aacraid/rx.c: In function '_aac_rx_init':
drivers/scsi/aacraid/rx.c:640: warning: ISO C90 forbids mixed declarations and code
drivers/scsi/aacraid/rx.c:649: error: expected declaration or statement at end of input
drivers/scsi/aacraid/rx.c:649: warning: control reaches end of non-void function
make[3]: *** [drivers/scsi/aacraid/rx.o] Error 1
make[2]: *** [drivers/scsi/aacraid] Error 2
make[1]: *** [drivers/scsi] Error 2
make: *** [drivers] Error 2

I applied it to the scsi-misc tree I pulled yesterday after removing the old patch.

Judith

On Tue, Apr 03, 2007 at 11:58:17AM -0400, Salyzyn, Mark wrote:
I will do you one better, James: I will slip in a little cleanup in sa.c (the support file for the old PPC-based ARC cards), where I discovered the restart platform function was ALSO left unset, which could result in a similarly painful null pointer discovery.

Please note: the issue Judith ran into, where the card took longer than 3 minutes to initialize because of a problem drive, may require extending the timeout to address (the insmod parameter aacraid.startup_timeout=540 may do the trick). Extending the timeout may have been a fact of life, given that the adapter is normally restarted on BIOS load, long before the driver instantiates, which settles the problem drives. If this is the case, a small, lower-priority follow-up hardening patch can help users who find adding the insmod parameter repugnant in order to support kexec and kdump in the face of problem drives. Problem drives may have led to the need to get a kernel dump ...

You will find enclosed the pristine patch based on the initial patch, dropping the static function and adding the three missing platform function initializations.

Attached is the patch I feel will address this interrupt issue.
As an added 'perk' I have also added code to detect whether the controller was previously initialized for interrupted operations by ANY operating system, for the case where the reset_devices kernel parameter is not set and we are dealing with a naïve kexec that omits this kernel parameter. The reset handler is also improved. Related to reset operations, but not pertinent specifically to this issue, I have also altered the handling somewhat so that we reset the adapter if we feel it is taking too long (three minutes) to start up.

Obligatory Disclaimer: Please accept my condolences regarding Outlook's handling of patches.

This attached patch is against current scsi-misc-2.6 MINUS the initial version of this patch and the first patch that sets the missing platform function related to this discussion.

Signed-off-by: Mark Salyzyn [EMAIL PROTECTED]
---
Sincerely -- Mark Salyzyn

-----Original Message-----
From: James Bottomley [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 03, 2007 10:52 AM
To: Salyzyn, Mark
Cc: Judith Lebzelter; [EMAIL PROTECTED]
Subject: RE: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test.

On Tue, 2007-04-03 at 09:30 -0400, Salyzyn, Mark wrote:
0x48 status code means the Firmware is trying to boot the Kernel. This phase is most likely blocked because of the hard drive failure, as you suspected; the kernel is not declared up and running until after the drives have spun up, and a problem drive could be tricking the Firmware into a recovery loop, holding things back ...

I'm constructing what I hope will be the last pre-2.6.21 merge tree ... do you have a clean patch with the two necessary fixes for the panic you can send to the list?

James

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
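The start-up policy Mark describes (wait for the firmware, and reset the adapter if start-up takes longer than three minutes, i.e. 180 seconds, with startup_timeout available to raise the limit) can be sketched as a small userspace simulation. Everything here is hypothetical: the sim_adapter model, the one-reset-then-give-up rule and the simulated clock are assumptions for illustration, not the aacraid code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical adapter model (not the real aacraid structures): the firmware
 * reports "ready" once `elapsed` reaches `ready_after` seconds. */
struct sim_adapter {
	int ready_after;       /* seconds until ready after power-up */
	int ready_after_reset; /* seconds until ready after a reset */
	int elapsed;           /* simulated seconds since power-up/last reset */
	int resets;            /* number of resets issued so far */
};

static bool sim_firmware_ready(const struct sim_adapter *a)
{
	return a->elapsed >= a->ready_after;
}

/* Simulated adapter reset: the firmware restarts from scratch. */
static void sim_reset(struct sim_adapter *a)
{
	a->resets++;
	a->elapsed = 0;
	a->ready_after = a->ready_after_reset;
}

/* Poll once per simulated second for up to `timeout` seconds. */
static bool wait_ready(struct sim_adapter *a, int timeout)
{
	for (int t = 0; t < timeout; t++) {
		if (sim_firmware_ready(a))
			return true;
		a->elapsed++; /* stands in for sleeping one second */
	}
	return sim_firmware_ready(a);
}

/* Hypothetical start-up policy: wait up to startup_timeout seconds; on
 * timeout, reset the adapter once and wait again. Returns true if the
 * firmware came up. */
static bool start_adapter(struct sim_adapter *a, int startup_timeout)
{
	assert(startup_timeout > 0);
	if (wait_ready(a, startup_timeout))
		return true;
	sim_reset(a);
	return wait_ready(a, startup_timeout);
}
```

With firmware that needs 300 seconds to come up, a 180-second timeout forces one reset (which, per the thread, costs another 45 seconds or more on real hardware), while a 540-second timeout lets the adapter come up without any reset.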
Re: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test.
On Fri, Mar 30, 2007 at 10:30:48AM -0400, Salyzyn, Mark wrote:
Thanks for the info. Attached is the patch I feel will address this issue.

As an added 'perk' I have also added code to detect whether the controller was previously initialized for interrupted operations by ANY operating system, for the case where the reset_devices kernel parameter is not set and we are dealing with a naïve kexec that omits this kernel parameter. The reset handler is also improved. Related to reset operations, but not pertinent specifically to this issue, I have also altered the handling somewhat so that we reset the adapter if we feel it is taking too long (three minutes) to start up.

We have not unit tested the propagation of the reset_devices flag to this driver code, nor have we unit tested the check for interrupted operations under the conditions of a naively issued kexec. We are submitting this modified driver to our Q/A department for integration testing in our current programs. I would appreciate an ACK to this patch should it resolve the issue described in this thread...

Mark;

I am getting an error applying this patch:

-bash-3.1# patch -p1 < ../../aacraid_kexec.patch
patching file drivers/scsi/aacraid/rx.c
patch: malformed patch at line 36: @@ -526,6 +529,7 @@

Do you think you could regenerate it?

Thanks;
Judith

Obligatory Disclaimer: Please accept my condolences regarding Outlook's handling of patches.

This attached patch is against current scsi-misc-2.6

Signed-off-by: Mark Salyzyn [EMAIL PROTECTED]
---
Sincerely -- Mark Salyzyn

-----Original Message-----
From: Vivek Goyal [mailto:[EMAIL PROTECTED]
Sent: Friday, March 30, 2007 2:06 AM
To: Salyzyn, Mark
Cc: Judith Lebzelter; linux-scsi@vger.kernel.org; AACRAID; fastboot@lists.osdl.org
Subject: Re: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test.

On Thu, Mar 29, 2007 at 10:17:18AM -0400, Salyzyn, Mark wrote:
I have been working on a patch to the driver to do just this: reset the adapter during init if necessary.
We want to limit the adapter's reset, as it takes time (an additional 45 seconds or longer) for the Firmware to cycle... I will bump the priority of the testing for this patch.

Hi,

Thanks for looking into this. You can make the device reset conditional. A new kernel command-line parameter, reset_devices, has now been defined. You can reset the device only if the user has passed the reset_devices command-line option; otherwise you can continue to boot normally. I introduced this parameter to address the concern that total boot time would increase on a normal BIOS boot. kexec/kdump will pass this parameter to the second kernel, so the device will be reset during initialization while normal BIOS boot remains unaffected.

Thanks
Vivek
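Vivek's conditional-reset suggestion, combined with Mark's extra check for a controller left initialized by a previous OS instance, reduces to a small decision. The userspace sketch below parses a command-line string only to illustrate token matching; inside the kernel, reset_devices is parsed at boot and drivers read it as a global variable rather than parsing anything. The function names cmdline_has_flag and should_reset_adapter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool is_sep(char c)
{
	return c == ' ' || c == '\t' || c == '\n';
}

/* True if `flag` occurs in `cmdline` as a whole whitespace-separated token,
 * e.g. "reset_devices" but not "foo.reset_devices=1". */
static bool cmdline_has_flag(const char *cmdline, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = cmdline;

	assert(n > 0);
	while ((p = strstr(p, flag)) != NULL) {
		bool starts = (p == cmdline) || is_sep(p[-1]);
		bool ends = (p[n] == '\0') || is_sep(p[n]);

		if (starts && ends)
			return true;
		p++; /* partial match only: keep scanning */
	}
	return false;
}

/* Hypothetical policy: reset when the user asked for it (kexec/kdump pass
 * reset_devices to the second kernel) or when the controller appears to have
 * been initialized already (the naive-kexec case). Otherwise skip the reset
 * and its extra 45+ seconds on a normal BIOS boot. */
static bool should_reset_adapter(bool reset_devices, bool already_initialized)
{
	return reset_devices || already_initialized;
}
```

Keeping the two inputs separate mirrors the thread: reset_devices covers kdump, while the "already initialized" probe is the fallback for a kexec issued without that parameter.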
Re: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test.
On Fri, Mar 30, 2007 at 01:21:33PM -0400, Salyzyn, Mark wrote:
Resending patch file. I looked at the submission that showed up on the list, and at the original email, and a blank line dropped away at line 20 of the patch (!) Dunno, hope this comes through this second time. But if not, please add the blank line as referenced.

Now I got these errors, which do not seem to be the result of the missing line:

Hunk #3 FAILED at 529.
Hunk #4 succeeded at 541 (offset 3 lines).
Hunk #5 FAILED at 576.

I tried manually editing in those two hunks and got an error on compile:

  CC [M]  drivers/scsi/aacraid/rx.o
drivers/scsi/aacraid/rx.c: In function '_aac_rx_init':
drivers/scsi/aacraid/rx.c:641: warning: ISO C90 forbids mixed declarations and code
drivers/scsi/aacraid/rx.c:650: error: expected declaration or statement at end of input
drivers/scsi/aacraid/rx.c:650: warning: control reaches end of non-void function
make[3]: *** [drivers/scsi/aacraid/rx.o] Error 1
make[2]: *** [drivers/scsi/aacraid] Error 2
make[1]: *** [drivers/scsi] Error 2
make: *** [drivers] Error 2

I am pretty sure that I pasted them in correctly; it is not that big a hunk and I tried it twice. Are you sure that the git tree you used is up to date? I am not sure why this is failing; it doesn't look off. Line 641 is actually the start of the next function, aac_rx_init(), not _aac_rx_init().

Judith

-----Original Message-----
From: Judith Lebzelter [mailto:[EMAIL PROTECTED]
Sent: Friday, March 30, 2007 1:10 PM
To: Salyzyn, Mark
Cc: [EMAIL PROTECTED]; Judith Lebzelter; linux-scsi@vger.kernel.org; fastboot@lists.osdl.org
Subject: Re: [PATCH] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test.

On Fri, Mar 30, 2007 at 10:30:48AM -0400, Salyzyn, Mark wrote:
Thanks for the info. Attached is the patch I feel will address this issue.
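One plausible reading of the "malformed patch" error, consistent with Mark's note about a dropped blank line: every unified-diff hunk header promises how many old and new lines follow, and a blank context line is really a single space. If a mail client drops that line, the hunk comes up one line short and patch reads the next "@@" header as hunk body. The hypothetical check_hunk() below illustrates that bookkeeping only; it is not GNU patch's parser and, as a simplification, requires the ",count" fields that unified diff omits when a count is 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 0 if the hunk body matches the line counts promised by its
 * "@@ -a,b +c,d @@" header, -1 otherwise. Context lines (leading ' ') count
 * toward both sides, '-' lines toward the old side, '+' lines toward the new
 * side; any other line (an empty line, or a following "@@" header read as
 * body) makes the hunk malformed. */
static int check_hunk(const char *header, const char *const *body, size_t nlines)
{
	int old_start, old_count, new_start, new_count;
	int old_seen = 0, new_seen = 0;

	assert(header != NULL && body != NULL);
	if (sscanf(header, "@@ -%d,%d +%d,%d @@", &old_start, &old_count,
		   &new_start, &new_count) != 4)
		return -1;
	for (size_t i = 0; i < nlines; i++) {
		switch (body[i][0]) {
		case ' ':
			old_seen++;
			new_seen++;
			break;
		case '-':
			old_seen++;
			break;
		case '+':
			new_seen++;
			break;
		default:
			return -1;
		}
	}
	return (old_seen == old_count && new_seen == new_count) ? 0 : -1;
}
```

A hunk whose final blank context line (" ") survives passes. The same hunk fails if that line arrives as an empty string, or if it is missing so that the next "@@" line is consumed in its place.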
Panics for AACRAID driver during 'insmod' for kexec test.
Hello,

I have been running a series of kexec tests using LKDTT on the aacraid driver on this card (ASR-4805SAS (Marauder-E)) on x86_64, using the latest top of the scsi-misc git tree (as of yesterday), and I have found that it does not come up consistently when booted through kexec. I have included 4 different types of failures here because I assume they might be related, and I thought there could be an issue with the card's state on reboot (through kexec).

The most common problem is this oops/panic, which has happened with various types of crash points (6 times out of 40):

Loading aacraid.Adaptec aacraid driver (1.1-5[2437]-mh4)
ko module
ACPI: PCI Interrupt :03:0e.0[A] -> Link [LNKC] -> GSI 3 (level, low) -> IRQ 3
general protection fault: [1]
CPU 0
Modules linked in: aacraid
Pid: 0, comm: swapper Not tainted 2.6.21-rc3-kdump #1
RIP: 0010:[88008a99] [88008a99] :aacraid:aac_intr_normal+0x17a/0x1b1
RSP: :81523ed8 EFLAGS: 00010006
RAX: 810004102000 RBX: 8100014f01e0 RCX: 0086
RDX: 810004041540 RSI: 8100014f01e0 RDI:
RBP: 810004702cd8 R08: a6037e6c R09: 0016001562d7
R10: 0023 R11: R12: 0011
R13: 810004702cd8 R14: 810004001400 R15:
FS: () GS:814d5000() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 006ba5a0 CR3: 0474d000 CR4: 06e0
Process swapper (pid: 0, threadinfo 814e4000, task 81470360)
Stack: 0011 810004702cd8 0100 0003
0001 88009470 810004041540
814d5080 810428f4 814d5080
Call Trace:
IRQ [88009470] :aacraid:aac_rx_intr_message+0x2c/0x60
[810428f4] note_interrupt+0xd3/0x1db
[8104319b] handle_level_irq+0x7e/0xab
[8100b0b1] do_IRQ+0xd7/0x132
[810085a1] mwait_idle+0x0/0x43
[81009651] ret_from_intr+0x0/0xa
EOI [810085e0] mwait_idle+0x3f/0x43
[81008540] cpu_idle+0x3d/0x5c
[814e78d2] start_kernel+0x28f/0x29b
[814e7140] _sinittext+0x140/0x144

Code: ff 53 38 eb 20 9c 58 fa 83 7b 30 00 75 07 c7 43 30 01 00 00
RIP [88008a99] :aacraid:aac_intr_normal+0x17a/0x1b1
Kernel panic - not syncing: Aiee, killing interrupt handler!

Another failure: for crash point 'TIMERADD-bug' I got this error during insmod:

Loading aacraid.Adaptec aacraid driver (1.1-5[2437]-mh4)
ko module
ACPI: PCI Interrupt :03:0e.0[A] -> Link [LNKC] -> GSI 3 (level, low) -> IRQ 3
input: ImExPS/2 Generic Explorer Mouse as /class/input/input3
aacraid: aac_fib_send: adapter blinkLED 0xc2.
Usually a result of a serious unrecoverable hardware problem
aac_fib_free, XferState != 0, fibptr = 0x8100014f, XferState = 0x810ad
aacraid: probe of :03:0e.0 failed with error -14

Yet another failure: for crash point 'TIMERADD-panic' I got this error during insmod:

Loading aacraid.Adaptec aacraid driver (1.1-5[2437]-mh4)
ko module
ACPI: PCI Interrupt :03:0e.0[A] -> Link [LNKC] -> GSI 3 (level, low) -> IRQ 3
input: ImExPS/2 Generic Explorer Mouse as /class/input/input3
BUG: soft lockup detected on CPU#0!

Call Trace:
IRQ [8102bcbb] update_process_times+0x3b/0x5f
[8100bebf] main_timer_handler+0x2f/0x1ae
[8102b504] run_timer_softirq+0x14/0x161
[8100c050] timer_interrupt+0x12/0x27
[81041f9c] handle_IRQ_event+0x25/0x53
[81028c1b] __do_softirq+0x46/0x90
[81043186] handle_level_irq+0x69/0xab
[8100b0b1] do_IRQ+0xd7/0x132
[81009651] ret_from_intr+0x0/0xa
EOI [811229ed] __delay+0x8/0x10
[88007c68] :aacraid:aac_fib_send+0x1ba/0x234
[880048aa] :aacraid:aac_get_adapter_info+0x76/0x536
[88002bb3] :aacraid:aac_probe_one+0x236/0x457
[8112bd6d] pci_device_probe+0x4c/0x75
[8117d0da] really_probe+0xc4/0x148
[8117d30b] __driver_attach+0x6d/0xab
[8117d29e] __driver_attach+0x0/0xab
[8117d29e] __driver_attach+0x0/0xab
[8117c5b2] bus_for_each_dev+0x43/0x6e
[8117c8f4] bus_add_driver+0x6b/0x18d
[8112bf0b] __pci_register_driver+0x72/0xa7
[8801203a] :aacraid:aac_init+0x3a/0x75
[8103bafc] sys_init_module+0x1195/0x12e6
[8100913e] system_call+0x7e/0x83

BUG: soft lockup detected on CPU#0!

One last error I got, for INT_TASKLET_ENTRY-exception, appeared after the filesystem was mounted and I was copying the vmcore file to it:

Copying the dump
aacraid: Host adapter abort request (4,0,0,0)
aacraid: Host adapter abort request (4,0,0,0)
aacraid: Host adapter reset request. SCSI hang
[ PATCH ] mptsas: Fix oops during driver load time (rev 2)
This fixes an oops during driver load time. mptsas_probe calls mpt_attach (over in mptbase.c). Inside that call, we read some manufacturing config pages to set up some defaults. While reading the config pages, the firmware doesn't complete the reply in time, and we have a timeout. The timeout results in the hardreset handler being called. The hardreset handler calls all the Fusion upper-layer driver reset callback handlers. The mptsas_ioc_reset function is the callback handler in mptsas.c. In summary, mptsas_ioc_reset is getting called before scsi_host_alloc is called, and the pointer ioc->sh is NULL, as is the hostdata.

Signed-off-by: Judith Lebzelter [EMAIL PROTECTED]
---
Sorry I was not more descriptive. Here is the patch with Eric's description, as requested.

Index: linux-2.6.21-rc3/drivers/message/fusion/mptsas.c
===================================================================
--- linux-2.6.21-rc3.orig/drivers/message/fusion/mptsas.c
+++ linux-2.6.21-rc3/drivers/message/fusion/mptsas.c
@@ -815,7 +815,7 @@ mptsas_taskmgmt_complete(MPT_ADAPTER *io
 static int
 mptsas_ioc_reset(MPT_ADAPTER *ioc, int reset_phase)
 {
-	MPT_SCSI_HOST *hd = (MPT_SCSI_HOST *)ioc->sh->hostdata;
+	MPT_SCSI_HOST *hd;
 	struct mptsas_target_reset_event *target_reset_list, *n;
 	int rc;
 
@@ -827,7 +827,10 @@ mptsas_ioc_reset(MPT_ADAPTER *ioc, int r
 	if (reset_phase != MPT_IOC_POST_RESET)
 		goto out;
 
-	if (!hd || !hd->ioc)
+	if (!ioc->sh || !ioc->sh->hostdata)
+		goto out;
+	hd = (MPT_SCSI_HOST *)ioc->sh->hostdata;
+	if (!hd->ioc)
 		goto out;
 
 	if (list_empty(&hd->target_reset_list))
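The essence of the fix, reduced to a userspace sketch: the callback must not compute hd from ioc->sh->hostdata in its declaration, because the hard-reset handler can run it before scsi_host_alloc() has set ioc->sh. The structures, fields and return values below are hypothetical simplifications, not the real MPT_ADAPTER / MPT_SCSI_HOST layouts or the real return codes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-ins for the Fusion MPT structures. */
struct host_data { int has_ioc; int reset_events; };
struct scsi_host { void *hostdata; };
struct adapter   { struct scsi_host *sh; };

/* Post-reset callback in the style of the fixed mptsas_ioc_reset(): nothing
 * is dereferenced until ioc->sh and its hostdata are known to exist.
 * Returns 1 if post-reset work was done, 0 if it was skipped. */
static int ioc_reset_cb(struct adapter *ioc)
{
	struct host_data *hd;

	assert(ioc != NULL);
	if (!ioc->sh || !ioc->sh->hostdata)
		return 0; /* host not allocated yet: nothing to clean up */
	hd = ioc->sh->hostdata;
	if (!hd->has_ioc)
		return 0;
	hd->reset_events = 0; /* stand-in for draining the target reset list */
	return 1;
}
```

Called on an adapter whose sh is still NULL (the mpt_attach() timeout case described above), the sketch returns early instead of faulting; once the host and its hostdata exist, the post-reset work runs as before.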
[ PATCH ] mptsas: Fix oops for insmod during kexec
Hello,

This patch is to fix an oops on insmod for mptsas during kexec. This applies to 2.6.21-rc3.

Signed-off-by: Judith Lebzelter [EMAIL PROTECTED]
---
Index: linux-2.6.21-rc3/drivers/message/fusion/mptsas.c
===================================================================
--- linux-2.6.21-rc3.orig/drivers/message/fusion/mptsas.c
+++ linux-2.6.21-rc3/drivers/message/fusion/mptsas.c
@@ -815,7 +815,7 @@ mptsas_taskmgmt_complete(MPT_ADAPTER *io
 static int
 mptsas_ioc_reset(MPT_ADAPTER *ioc, int reset_phase)
 {
-	MPT_SCSI_HOST *hd = (MPT_SCSI_HOST *)ioc->sh->hostdata;
+	MPT_SCSI_HOST *hd;
 	struct mptsas_target_reset_event *target_reset_list, *n;
 	int rc;
 
@@ -827,7 +827,10 @@ mptsas_ioc_reset(MPT_ADAPTER *ioc, int r
 	if (reset_phase != MPT_IOC_POST_RESET)
 		goto out;
 
-	if (!hd || !hd->ioc)
+	if (!ioc->sh || !ioc->sh->hostdata)
+		goto out;
+	hd = (MPT_SCSI_HOST *)ioc->sh->hostdata;
+	if (!hd->ioc)
 		goto out;
 
 	if (list_empty(&hd->target_reset_list))