On Wed, 09 Feb 2005, [EMAIL PROTECTED] wrote:

> > seems like sdev->shost is bogus when fc_remote_port_block() is
> > called...
>
> We haven't seen this in our testing....
>

Actually it's not the sdev->host that's bogus -- it appears the sdev is
referenced after it's been freed -- a reference still present in the
shost->__devices list.  Here's the scenario:

* 1 lun connected to HBA1 -- sdev created for the lun via rport:

*** sdev=d76196e8 host=d36f0000 state=2 gdev=d761987c

* mid-layer performs linear scan of non-existent ID's (via
  scsi_sysfs_target_initialize()):

*** adding sdev=dd2bc738 sdev->siblings=dd2bc740, __dev=d36f0000 id=1 emp=0
...
*** adding sdev=dd2bc738 sdev->siblings=dd2bc740, __dev=d36f0000 id=509 emp=0
*** adding sdev=dd2bc738 sdev->siblings=dd2bc740, __dev=d36f0000 id=510 emp=0
*** adding sdev=dd2bc738 sdev->siblings=dd2bc740, __dev=d36f0000 id=511 emp=0

* remove lun from the fabric (port-side cable pull).

* driver recognizes loss via RSCN, issues fc_remote_port_block(),
  starget_for_each_device() -> shost_for_each_device() ->
  __scsi_iterate_devices() where scsi_device_get() is called for
  reference.

  1st sdev valid (ok):

*** ctr=0 sdev=d76196e8 host=d36f0000 state=2 gdev=d761987c id=0
*** sdev=d76196e8 host=d36f0000 state=2 gdev=d761987c

  2nd sdev invalid -- note old sdev (dd2bc738) from previous linear
  scan:

*** ctr=0 sdev=dd2bc738 host=6b6b6b6b state=1802201963 gdev=dd2bc8cc id=1802201963
*** sdev=dd2bc738 host=6b6b6b6b state=1802201963 gdev=dd2bc8cc

[BLAH]

Unable to handle kernel paging request at virtual address 6b6b6be7
 printing eip:
c028ef06
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: qla2322 qla2xxx
CPU:    0
EIP:    0060:[<c028ef06>]    Not tainted VLI
EFLAGS: 00010086   (2.6.11-rport)
EIP is at scsi_device_get+0x56/0xa0
eax: 6b6b6b6b   ebx: dd2bc738   ecx: c035f844   edx: fffffffa
esi: dd2bc8cc   edi: d36f0000   ebp: 00000001   esp: df693dd4
ds: 007b   es: 007b   ss: 0068
Process qla2322_1_dpc (pid: 11316, threadinfo=df692000 task=d9fa8530)
Stack: c0341fcc dd2bc738 6b6b6b6b 6b6b6b6b dd2bc8cc dd2bc738 d76196f0 c028f011
       c0341ff4 00000000 dd2bc738 6b6b6b6b 6b6b6b6b dd2bc8cc 6b6b6b6b 00000282
       d76196e8 d76196e8 ddd7e790 d36f0000 c029af50 c028f0bd 00000000 dbe8512c
Call Trace:
 [<c028f011>] __scsi_iterate_devices+0x71/0xb0
 [<c029af50>] fc_device_block+0x0/0x10
 [<c028f0bd>] starget_for_each_device+0x6d/0x80
 [<c029afff>] fc_remote_port_block+0x3f/0x70
 [<e08633d3>] qla2x00_mark_device_lost+0x53/0xe0 [qla2xxx]

signature very consistent.

Another quirk when run with no storage connected to HBAs and the driver
is loaded, then unloaded -- is a consistent BUG() hit in
_raw_spin_lock() via scsi_forget_host():

kernel BUG at include/asm/spinlock.h:149!
invalid operand: 0000 [#1]
SMP
Modules linked in: qla2322 qla2xxx
CPU:    1
EIP:    0060:[<c030b373>]    Not tainted VLI
EFLAGS: 00010096   (2.6.11-rport)
EIP is at _spin_lock_irqsave+0x53/0x60
eax: 0000000e   ebx: 00000282   ecx: c035f80c   edx: 00000082
esi: 6b6b6bab   edi: d86f1ecc   ebp: d348d530   esp: d86f1ea4
ds: 007b   es: 007b   ss: 0068
Process rmmod (pid: 11209, threadinfo=d86f0000 task=d348d530)
Stack: c031e548 c030960c 6b6b6ba3 6b6b6bab c030960c 00000000 d348d530 c0117610
       00000000 00000000 0000006b d3920000 6b6b6b6b da0c3b74 d3920000 6b6b6b6b
       d86f0000 c03097d3 d392002c 0000006b c0297656 6b6b6b63 d3920000 6b6b6b63
Call Trace:
 [<c030960c>] __down+0x3c/0xe0
 [<c030960c>] __down+0x3c/0xe0
 [<c0117610>] default_wake_function+0x0/0x10
 [<c03097d3>] __down_failed+0x7/0xc
 [<c0297656>] .text.lock.scsi_sysfs+0x8/0x22
 [<c0296061>] scsi_forget_host+0x31/0x60
 [<c028f3e1>] scsi_remove_host+0x11/0x60
 [<e08629df>] qla2x00_remove_one+0x1f/0x40 [qla2xxx]
 [<c01f9108>] pci_device_remove+0x28/0x30
 [<c024cc04>] device_release_driver+0x74/0x80
 [<c024cc28>] driver_detach+0x18/0x30
 [<c024d13c>] bus_remove_driver+0x5c/0xa0
 [<c024d6c8>] driver_unregister+0x8/0x30
 [<c01f933b>] pci_unregister_driver+0xb/0x20
 [<c013279e>] sys_delete_module+0x16e/0x190
 [<c014b61a>] unmap_vma_list+0x1a/0x30
 [<c014b9c5>] do_munmap+0x115/0x160
 [<c014ba5a>] sys_munmap+0x4a/0x70
 [<c010308d>] sysenter_past_esp+0x52/0x75
Code: 90 80 3e 00 7e f9 fa eb e8 89 d8 8b 74 24 0c 8b 5c 24 08 83 c4 10 c3 c7 04 24 48 e5 31 c0 8b 44

host
variable seems to be hosed.  Perhaps I'm doing something wrong during
shutdown -- just the standard scsi_remove_host(); I also tried adding
the fc_remove_host() call (as directed in the comments), but the same
results occurred...

Andrew Vasquez