Re: Debugging scsi abort handling ?
On Thu, 2014-08-28 at 14:04 +0200, Hannes Reinecke wrote: On 08/25/2014 01:15 PM, Paolo Bonzini wrote: - abort the command, and then the driver should never call the -scsi_done callback for the Scsi_Cmnd*. In practice we rely on the latter behaviour; when -scsi_done is called while the command is under eh_abort _really bad things_ will happen. As soon as eh_abort is called control is transferred back to the SCSI midlayer, so any LLDD should never send completions for these commands back to the midlayer. Mmh, then there is a small race window starting at the point in time when the abort function of an LLDD is called up to the point in time when that abort function has taken the necessary measures to make sure that scsi_done won't be called for a racing command completion , isn't it? Cheers Martin -- Linux on System z Development IBM Deutschland Research Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] scsi: Use Scsi_Host as argument for eh_host_reset_handler
On Fri, 2014-06-27 at 13:04 +0200, Hannes Reinecke wrote: On 06/27/2014 12:47 PM, Steffen Maier wrote: On 06/27/2014 08:27 AM, Hannes Reinecke wrote: Issuing a host reset should not rely on any commands. So use Scsi_Host as argument for eh_host_reset_handler. Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/s390/scsi/zfcp_scsi.c | 3 +- 69 files changed, 289 insertions(+), 379 deletions(-) diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c index dc42c93..fe50f69 100644 --- a/drivers/s390/scsi/zfcp_scsi.c +++ b/drivers/s390/scsi/zfcp_scsi.c @@ -281,13 +281,14 @@ static int zfcp_scsi_eh_target_reset_handler(struct scsi_cmnd *scpnt) return zfcp_task_mgmt_function(scpnt, FCP_TMF_TGT_RESET); } -static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt) +static int zfcp_scsi_eh_host_reset_handler(struct Scsi_Host *host) { struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt-device); Scpnt argument no longer exists, so this line must likely go away to compile. struct zfcp_adapter *adapter = zfcp_sdev-port-adapter; This needs the assignment from the added line below since zfcp_sdev no longer exists. Alternatively, drop the assignment here, only have the auto variable definition, and do the assignment below with the added line. struct fc_rport *rport = zfcp_sdev-port-rport; int ret; +adapter = (struct zfcp_adapter *)host-hostdata[0]; zfcp_erp_adapter_reopen(adapter, 0, schrh_1); zfcp_erp_wait(adapter); ret = fc_block_scsi_eh(rport); A question just for my understanding: Calling fc_block_scsi_eh at the end *after* our adapter recovery is OK to wait for rports to become unblocked (or fast_io_failed)? I would guess so. Hehe. That's actually a question for _you_ :-) git says: commit af4de36d911ab907b92c5f3f81ceff8474ed7485 Author: Christof Schmitt christof.schm...@de.ibm.com Date: Tue Nov 24 16:54:16 2009 +0100 [SCSI] zfcp: Block scsi_eh thread for rport state BLOCKED In case the SCSI error recovery starts because of a SCSI command timeout, but then something else triggers the rport to be deleted, the SCSI error recovery will run to the end and set the SCSI device offline. To prevent this, call the FC transport function fc_block_scsi_eh which waits until the rport leaves the BLOCKED state. This guarantees that communication is possible if the rport is ONLINE, or the SCSI devices will be removed if the rport state switches to NOT_PRESENT. Not sure if this reasoning is (still) valid. Isn't command timeouts the only reason for scsi eh to kick in? Martin -- Linux on System z Development IBM Deutschland Research Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] scsi_debug: fix endianness bug in sdebug_build_parts()
On Sat, 2013-07-27 at 02:07 +0900, Akinobu Mita wrote: 2013/7/26 Martin Peschke mpesc...@linux.vnet.ibm.com: On Mon, 2013-07-15 at 20:52 +0900, Akinobu Mita wrote: With module parameter num_parts 0, partition table is built on the ramdisk storage when loading the driver. Unfortunately, there is an endianness bug in sdebug_build_parts(). So the partition table is not correctly initialized on big-endian systems. Signed-off-by: Akinobu Mita akinobu.m...@gmail.com Cc: James E.J. Bottomley jbottom...@parallels.com Cc: Douglas Gilbert dgilb...@interlog.com Cc: linux-scsi@vger.kernel.org --- drivers/scsi/scsi_debug.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c index cb4fefa..2f39b13 100644 --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -2659,8 +2659,8 @@ static void __init sdebug_build_parts(unsigned char *ramp, / sdebug_sectors_per; pp-end_sector = (end_sec % sdebug_sectors_per) + 1; - pp-start_sect = start_sec; - pp-nr_sects = end_sec - start_sec + 1; + pp-start_sect = cpu_to_le32(start_sec); + pp-nr_sects = cpu_to_le32(end_sec - start_sec + 1); pp-sys_ind = 0x83; /* plain Linux partition */ } } I have posted the same fix several times, e.g. http://marc.info/?l=linux-scsim=137051617907423w=2 Good luck! Acked-by: Martin Peschke mpesc...@linux.vnet.ibm.com Thanks. I found this problem from sparse warning noticed by 0-DAY kernel build testing. It surfaced as a real problem on System z (big endian). That is why I can confirm that the above patch works. Thanks, Martin -- Linux on System z Development IBM Deutschland Research Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] scsi_debug: fix endianness bug in sdebug_build_parts()
On Mon, 2013-07-15 at 20:52 +0900, Akinobu Mita wrote: With module parameter num_parts 0, partition table is built on the ramdisk storage when loading the driver. Unfortunately, there is an endianness bug in sdebug_build_parts(). So the partition table is not correctly initialized on big-endian systems. Signed-off-by: Akinobu Mita akinobu.m...@gmail.com Cc: James E.J. Bottomley jbottom...@parallels.com Cc: Douglas Gilbert dgilb...@interlog.com Cc: linux-scsi@vger.kernel.org --- drivers/scsi/scsi_debug.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c index cb4fefa..2f39b13 100644 --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -2659,8 +2659,8 @@ static void __init sdebug_build_parts(unsigned char *ramp, / sdebug_sectors_per; pp-end_sector = (end_sec % sdebug_sectors_per) + 1; - pp-start_sect = start_sec; - pp-nr_sects = end_sec - start_sec + 1; + pp-start_sect = cpu_to_le32(start_sec); + pp-nr_sects = cpu_to_le32(end_sec - start_sec + 1); pp-sys_ind = 0x83; /* plain Linux partition */ } } I have posted the same fix several times, e.g. http://marc.info/?l=linux-scsim=137051617907423w=2 Good luck! Acked-by: Martin Peschke mpesc...@linux.vnet.ibm.com -- Linux on System z Development IBM Deutschland Research Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND] scsi_debug: Fix endianess in partition table
James, could you pick up this fix, please? Btw., it looks like others where bitten by this one: https://nazar.karan.org/blob/distro! parted.git/8780c767126938173f49dfbcd4360813932f7756/SOURCES! disable-t9020.patch Thanks, Martin Both start_sect and nr_sects in struct partition are __le32 and require cpu_to_le32() on assignment. Without this fix tools like fdisk show an invalid partition table for SCSI devices emulated by scsi_debug on big-endian architectures, like s390x. Besides a kernel message like this was emitted: sda: p1 start 536870912 is beyond EOD, enabling native capacity sda: p1 start 536870912 is beyond EOD, truncated For verification 'xxd -l 512 /dev/sda' has been used to make sure that this fix makes scsi_debug generated partition tables on s390x look like the ones generated on my laptop. Signed-off-by: Martin Peschke mpesc...@linux.vnet.ibm.com Reviewed-by: Steffen Maier ma...@linux.vnet.ibm.com Acked-by: Douglas Gilbert dgilb...@interlog.com --- drivers/scsi/scsi_debug.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -2662,8 +2662,8 @@ static void __init sdebug_build_parts(un / sdebug_sectors_per; pp-end_sector = (end_sec % sdebug_sectors_per) + 1; - pp-start_sect = start_sec; - pp-nr_sects = end_sec - start_sec + 1; + pp-start_sect = cpu_to_le32(start_sec); + pp-nr_sects = cpu_to_le32(end_sec - start_sec + 1); pp-sys_ind = 0x83; /* plain Linux partition */ } } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
How to fix this sleep-inside-lock problem?
Hi, I would like to ask for advice, prior to submitting a patch for our lldd zfcp, or alternatively for common code. Someone reported this warning and function call stack: BUG: sleeping function called from invalid context at kernel/workqueue.c:2752 in_atomic(): 1, irqs_disabled(): 1, pid: 360, name: zfcperp0.0.1700 CPU: 1 Not tainted 3.9.3+ #69 Process zfcperp0.0.1700 (pid: 360, task: 75b7e080, ksp: 7476bc30) snip Call Trace: ([001165de] show_trace+0x106/0x154) [001166a0] show_stack+0x74/0xf4 [006ff646] dump_stack+0xc6/0xd4 [0017f3a0] __might_sleep+0x128/0x148 [0015ece8] flush_work+0x54/0x1f8 [001630de] __cancel_work_timer+0xc6/0x128 [005067ac] scsi_device_dev_release_usercontext+0x164/0x23c [00161816] execute_in_process_context+0x96/0xa8 [004d33d8] device_release+0x60/0xc0 [0048af48] kobject_release+0xa8/0x1c4 [004f4bf2] __scsi_iterate_devices+0xfa/0x130 [03ff801b307a] zfcp_erp_strategy+0x4da/0x1014 [zfcp] [03ff801b3caa] zfcp_erp_thread+0xf6/0x2b0 [zfcp] [0016b75a] kthread+0xf2/0xfc [0070c9de] kernel_thread_starter+0x6/0xc [0070c9d8] kernel_thread_starter+0x0/0xc Apparently, the ref_count for some scsi_device drops down to zero, triggering device removal through execute_in_process_context(), while the lldd error recovery thread iterates through a scsi device list. Unfortunately, execute_in_process_context() decides to immediately execute that device removal function, instead of scheduling asynchronous execution, since it detects process context and thinks it is safe to do so. But almost all calls to shost_for_each_device() in our lldd are inside spin_lock_irq, even in thread context. Obviously, schedule() inside spin_lock_irq sections is a bad idea. Does it make sense to teach execute_in_process_context() to run a function asynchronously when irqs are disabled, as inside a spin_lock_irq section? If so, is the addition of an irqs_disabled() check sufficient? Or is it preferable to change the lldd to use __shost_for_each_device() with shost-host_lock? (hoping that doesn't result in locking order issues with host_lock inside our lldd erp_lock...) Other suggestions? The problem has been introduced when our LUN related data was moved to the midlayer (commit b62a8d9b45b971a67a0f8413338c230e3117dff5) back in 2010. Thanks, Martin -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
PATCH] scsi_debug: Fix endianess in partition table
James, could you pick up this fix, preferably for 3.10, please? I am not the only one you has run into the issue: https://nazar.karan.org/blob/distro! parted.git/8780c767126938173f49dfbcd4360813932f7756/SOURCES! disable-t9020.patch Thanks, Martin Both start_sect and nr_sects in struct partition are __le32 and require cpu_to_le32() on assignment. Without this fix tools like fdisk show an invalid partition table for SCSI devices emulated by scsi_debug on big-endian architectures, like s390x. Besides a kernel message like this was emitted: sda: p1 start 536870912 is beyond EOD, enabling native capacity sda: p1 start 536870912 is beyond EOD, truncated For verification 'xxd -l 512 /dev/sda' has been used to make sure that this fix makes scsi_debug generated partition tables on s390x look like the ones generated on my laptop. Signed-off-by: Martin Peschke mpesc...@linux.vnet.ibm.com Reviewed-by: Steffen Maier ma...@linux.vnet.ibm.com Acked-by: Douglas Gilbert dgilb...@interlog.com --- drivers/scsi/scsi_debug.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -2662,8 +2662,8 @@ static void __init sdebug_build_parts(un / sdebug_sectors_per; pp-end_sector = (end_sec % sdebug_sectors_per) + 1; - pp-start_sect = start_sec; - pp-nr_sects = end_sec - start_sec + 1; + pp-start_sect = cpu_to_le32(start_sec); + pp-nr_sects = cpu_to_le32(end_sec - start_sec + 1); pp-sys_ind = 0x83; /* plain Linux partition */ } } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] scsi_debug: Fix endianess in partition table
James, could you pick up this fix, please? Btw., it looks like others where bitten by this one: https://nazar.karan.org/blob/distro! parted.git/8780c767126938173f49dfbcd4360813932f7756/SOURCES! disable-t9020.patch Thanks, Martin Both start_sect and nr_sects in struct partition are __le32 and require cpu_to_le32() on assignment. Without this fix tools like fdisk show an invalid partition table for SCSI devices emulated by scsi_debug on big-endian architectures, like s390x. Besides a kernel message like this was emitted: sda: p1 start 536870912 is beyond EOD, enabling native capacity sda: p1 start 536870912 is beyond EOD, truncated For verification 'xxd -l 512 /dev/sda' has been used to make sure that this fix makes scsi_debug generated partition tables on s390x look like the ones generated on my laptop. Signed-off-by: Martin Peschke mpesc...@linux.vnet.ibm.com Reviewed-by: Steffen Maier ma...@linux.vnet.ibm.com Acked-by: Douglas Gilbert dgilb...@interlog.com --- drivers/scsi/scsi_debug.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -2662,8 +2662,8 @@ static void __init sdebug_build_parts(un / sdebug_sectors_per; pp-end_sector = (end_sec % sdebug_sectors_per) + 1; - pp-start_sect = start_sec; - pp-nr_sects = end_sec - start_sec + 1; + pp-start_sect = cpu_to_le32(start_sec); + pp-nr_sects = cpu_to_le32(end_sec - start_sec + 1); pp-sys_ind = 0x83; /* plain Linux partition */ } } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: Fix endianess in partition table
On Tue, 2013-02-12 at 09:45 -0500, Douglas Gilbert wrote: However since SCSI is big endian and you are introducing some le code then a line or so of explanation (comments) in your revised patch might be helpful. I would argue that the definition of struct partition itself should be sufficient explanation. Only scsi_debug failed to assign values in a correct manner. BTW Finding a big endian architecture to test this patch on is not easy. No big deal ;-) I can help out with test data from my System z (aka s390x). Without fix: [root@ ~]# xxd -l 512 /dev/sda snip 1b0: 0001 1c0: 0100 8307 203f 0020 3fe0 ?... ..?... 1d0: 1e0: 1f0: 55aa ..U. With fix: [root@ ~]# xxd -l 512 /dev/sda snip 1b0: 0001 1c0: 0100 8307 203f 2000 e03f ? ? 1d0: 1e0: 1f0: 55aa ..U. Which makes it exactly look like the table seen on my laptop: root@:~# xxd -l 512 /dev/sdb snip 1b0: 0001 1c0: 0100 8307 203f 2000 e03f ? ? 1d0: 1e0: 1f0: 55aa ..U. Assuming: modprobe scsi_debug physblk_exp=3 lowest_aligned=7 num_parts=1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: Fix endianess in partition table
On Mon, 2013-02-11 at 18:34 +0100, Martin Peschke wrote: Both start_sect and nr_sects in struct partition are __le32 and require cpu_to_le32() on assignment. Steffen Maier has pointed me at: block/partitions/msdos.c: return (sector_t)get_unaligned_le32(p-start_sect); Unfortunately, both get_unaligned_le32() and le32_to_cpu() appear to be in use for start_sect and nr_sects. Any one who would argue for changing my patch from cpu_to_le32 to put_unaligned_le32()? Thanks, Martin Without this fix tools like fdisk show an invalid partition table for SCSI devices emulated by scsi_debug on big-endian architectures, like s390x. Besides a kernel message like this was emitted: sda: p1 start 536870912 is beyond EOD, enabling native capacity sda: p1 start 536870912 is beyond EOD, truncated For verification 'xxd -l 512 /dev/sda' has been used to make sure that this fix makes scsi_debug generated partition tables on s390x look like the ones generated on my laptop. Signed-off-by: Martin Peschke mpesc...@linux.vnet.ibm.com Reviewed-by: Steffen Maier ma...@linux.vnet.ibm.com --- drivers/scsi/scsi_debug.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -2662,8 +2662,8 @@ static void __init sdebug_build_parts(un / sdebug_sectors_per; pp-end_sector = (end_sec % sdebug_sectors_per) + 1; - pp-start_sect = start_sec; - pp-nr_sects = end_sec - start_sec + 1; + pp-start_sect = cpu_to_le32(start_sec); + pp-nr_sects = cpu_to_le32(end_sec - start_sec + 1); pp-sys_ind = 0x83; /* plain Linux partition */ } } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] scsi_debug: Fix endianess in partition table
Both start_sect and nr_sects in struct partition are __le32 and require cpu_to_le32() on assignment. Without this fix tools like fdisk show an invalid partition table for SCSI devices emulated by scsi_debug on big-endian architectures, like s390x. Besides a kernel message like this was emitted: sda: p1 start 536870912 is beyond EOD, enabling native capacity sda: p1 start 536870912 is beyond EOD, truncated For verification 'xxd -l 512 /dev/sda' has been used to make sure that this fix makes scsi_debug generated partition tables on s390x look like the ones generated on my laptop. Signed-off-by: Martin Peschke mpesc...@linux.vnet.ibm.com Reviewed-by: Steffen Maier ma...@linux.vnet.ibm.com --- drivers/scsi/scsi_debug.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -2662,8 +2662,8 @@ static void __init sdebug_build_parts(un / sdebug_sectors_per; pp-end_sector = (end_sec % sdebug_sectors_per) + 1; - pp-start_sect = start_sec; - pp-nr_sects = end_sec - start_sec + 1; + pp-start_sect = cpu_to_le32(start_sec); + pp-nr_sects = cpu_to_le32(end_sec - start_sec + 1); pp-sys_ind = 0x83; /* plain Linux partition */ } } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
Hello Hannes, fc_eh_it_nexus_loss_handler() is invoked as the eh_target_reset_handler() callback and the eh_bus_reset_handler() is removed. lpfc_target_reset_handler(), which is replaced by your patch, used to issue a TARGET_RESET task management function over FCP in the eh_target_reset_handler() callback. What's wrong with that? Martin -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Patch 0/2] zfcp: bug fixes for error recovery
I am posting 2 bug fixes which are required for the zfcp error recovery to work correctly. It takes some error injection tests to strain the zfcp error recovery to the extend which triggers both bugs. The bugs are related to recovery actions which need to be dismissed because some stronger action has been requested. We have verified both patches running our error injection tests. Patches are against 2.6.24-rc2-git5. Please apply. [Patch 1/2] zfcp: fix dismissal of error recovery actions [Patch 2/2] zfcp: fix cleanup of dismissed error recovery actions Signed-off-by: Martin Peschke [EMAIL PROTECTED] Acked-by: Swen Schillig [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[Patch 2/2] zfcp: fix cleanup of dismissed error recovery actions
Calling zfcp_erp_strategy_check_action() after zfcp_erp_action_to_running() in zfcp_erp_strategy() might cause an unbalanced up() for erp_ready_sem, which makes the zfcp recovery fail somewhere along the way: erp thread processing erp_action: | | someone waking up erp thread for erp_action | | | | someone else dismissing erp_action: | | | V V V write_lock_irqsave(adapter-erp_lock, flags); ... if (zfcp_erp_action_exists(erp_action) == ZFCP_ERP_ACTION_RUNNING) { zfcp_erp_action_to_ready(erp_action); up(adapter-erp_ready_sem);/* first up() for erp_action */ } write_unlock_irqrestore(adapter-erp_lock, flags); write_lock_irqsave(adapter-erp_lock, flags); ... zfcp_erp_action_to_running(erp_action); write_unlock_restore(adapter-erp_lock, flags); /* processing erp_action */ write_lock_irqsave(adapter-erp_lock, flags); ... erp_action-status |= ZFCP_STATUS_ERP_DISMISSED; if (zfcp_erp_action_exists(erp_action) == ZFCP_ERP_ACTION_RUNNING) { zfcp_erp_action_to_ready(erp_action); up(adapter-erp_ready_sem); /* second, unbalanced up() for erp_action */ } ... write_unlock_restore(adapter-erp_lock, flags); write_lock_irqsave(adapter-erp_lock, flags); if (erp_action-status ZFCP_STATUS_ERP_DISMISSED) { zfcp_erp_action_dequeue(erp_action); retval = ZFCP_ERP_DISMISSED; } ... write_unlock_restore(adapter-erp_lock, flags); down(adapter-erp_ready_sem); /* this down() is meant to balance the first up() */ The erp thread must not dismiss an erp_action after moving that action to erp_running_head. Instead it should just go through the down() operation, which balances the first up(), and run through zfcp_erp_strategy one more time for the second up(), which eventually cleans up erp_action. Which is similar to the normal processing of an event for erp_action doing something asynchronously (e.g. waiting for the completion of an fsf_req). This only works if we make sure that a dismissed erp_action is passed to zfcp_erp_strategy() prior to the other action, which caused actions to be dismissed. Therefore the patch implements this rule: running actions go to the head of the ready list; new actions go to the tail of the ready list; the erp thread picks actions to be processed from the ready list's head. Signed-off-by: Martin Peschke [EMAIL PROTECTED] Acked-by: Swen Schillig [EMAIL PROTECTED] --- drivers/s390/scsi/zfcp_erp.c | 14 ++ 1 files changed, 6 insertions(+), 8 deletions(-) --- linux-2.6.24-rc2-git5.orig/drivers/s390/scsi/zfcp_erp.c +++ linux-2.6.24-rc2-git5/drivers/s390/scsi/zfcp_erp.c @@ -1065,7 +1065,7 @@ zfcp_erp_thread(void *data) adapter-status)) { write_lock_irqsave(adapter-erp_lock, flags); - next = adapter-erp_ready_head.prev; + next = adapter-erp_ready_head.next; write_unlock_irqrestore(adapter-erp_lock, flags); if (next != adapter-erp_ready_head) { @@ -1155,15 +1155,13 @@ zfcp_erp_strategy(struct zfcp_erp_action /* * check for dismissed status again to avoid follow-up actions, -* failing of targets and so on for dismissed actions +* failing of targets and so on for dismissed actions, +* we go through down() here because there has been an up() */ - retval = zfcp_erp_strategy_check_action(erp_action, retval); + if (erp_action-status ZFCP_STATUS_ERP_DISMISSED) + retval = ZFCP_ERP_CONTINUES; switch (retval) { - case ZFCP_ERP_DISMISSED: - /* leave since this action has ridden to its ancestors */ - debug_text_event(adapter-erp_dbf, 6, a_st_dis2); - goto unlock; case ZFCP_ERP_NOMEM: /* no memory to continue immediately, let it sleep */ if (!(erp_action-status ZFCP_STATUS_ERP_LOWMEM)) { @@ -3091,7 +3089,7 @@ zfcp_erp_action_enqueue(int action, ++adapter-erp_total_count; /* finally put it into 'ready' queue and kick erp thread */ - list_add(erp_action-list, adapter-erp_ready_head); + list_add_tail(erp_action-list, adapter-erp_ready_head); up(adapter-erp_ready_sem); retval = 0; out: - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] zfcp: zfcp: add statistics and other zfcp relatedinformation to sysfs
I am afraid this patch can't be applied as it is. The problem is that the code doesn't take into account that older adapters do not provide the data which your patch tries to extract. There is a feature flag for that in the hardware specification. So the right response to Heiko's complaint about an unused feature flag wasn't to remove the flag definition... :-) Is this patch against 2.6.23-rc or something? It doesn't apply without some minor noise. Besides there are some HEADs below, which makes me think this is against some internal CVS revision... For more nit-picking please see below. Thanks, Martin Swen Schillig wrote: From: Swen Schillig [EMAIL PROTECTED] add statistics and other zfcp related information to sysfs The zfcp adapter provides a variety of information which was never published to the external world. This patch adds a few of those statistics to the sysfs tree structure. These are reflected in the following attributes /sys/bus/ccw/drivers/zfcp/0.0.1707 /* information for the virtual adapter */ statistic_services/ requests megabytes utilization seconds_active /* LUN specific info for channel- and fabric-latency */ 0x500507630300c562/0x401040a6/ read_latency write_latency cmd_latency Signed-off-by: Swen Schillig [EMAIL PROTECTED] --- drivers/s390/scsi/Makefile|2 drivers/s390/scsi/zfcp_aux.c | 30 - drivers/s390/scsi/zfcp_def.h | 14 ++ drivers/s390/scsi/zfcp_ext.h |2 drivers/s390/scsi/zfcp_fsf.c |1 drivers/s390/scsi/zfcp_fsf.h | 28 - drivers/s390/scsi/zfcp_qdio.c | 39 +++ drivers/s390/scsi/zfcp_sysfs_statistics.c | 162 ++ drivers/s390/scsi/zfcp_sysfs_unit.c | 63 +++ 9 files changed, 333 insertions(+), 8 deletions(-) Index: HEAD/drivers/s390/scsi/zfcp_def.h === --- HEAD.orig/drivers/s390/scsi/zfcp_def.h +++ HEAD/drivers/s390/scsi/zfcp_def.h @@ -868,6 +868,17 @@ struct zfcp_erp_action { struct timer_list timer; }; +struct latency_cont { + u32 channel; + u32 fabric; + u32 counter; +}; Are you sure you want just 32 bits for sums? Given the unit used (hw ticks) it overruns quickly. + +struct zfcp_latencies { + struct latency_cont read; + struct latency_cont write; + struct latency_cont cmd; +}; struct zfcp_adapter { struct list_headlist; /* list of adapters */ @@ -883,6 +894,7 @@ struct zfcp_adapter { u32 adapter_features; /* FCP channel features */ u32 connection_features; /* host connection features */ u32hardware_version; /* of FCP channel */ + u16 timer_ticks; /* time int for a tick */ struct Scsi_Host*scsi_host;/* Pointer to mid-layer */ struct list_headport_list_head;/* remote port list */ struct list_headport_remove_lh;/* head of ports to be @@ -930,6 +942,7 @@ struct zfcp_adapter { struct zfcp_scsi_dbf_record scsi_dbf_buf; struct zfcp_adapter_mempool pool; /* Adapter memory pools */ struct qdio_initialize qdio_init_data;/* for qdio_establish */ + struct device stat_services; After my fancy, stat_services is not a good choice. Shorten it; statistics is descriptive enough. In case you have been inspired by generic_services below: generic_services refers to Generic Services (GS) as defined by FC standard documents. struct device generic_services; /* directory for WKA ports */ struct fc_host_statistics *fc_stats; struct fsf_qtcb_bottom_port *stats_reset_data; @@ -986,6 +999,7 @@ struct zfcp_unit { all scsi_scan_target requests have been completed. */ + struct zfcp_latencies latencies; }; /* FSF request */ Index: HEAD/drivers/s390/scsi/zfcp_fsf.c === --- HEAD.orig/drivers/s390/scsi/zfcp_fsf.c +++ HEAD/drivers/s390/scsi/zfcp_fsf.c @@ -2079,6 +2079,7 @@ zfcp_fsf_exchange_config_evaluate(struct fc_host_supported_classes(shost) = FC_COS_CLASS2 | FC_COS_CLASS3; adapter-hydra_version = bottom-adapter_type; + adapter-timer_ticks = bottom-timer_interval; if (fc_host_permanent_port_name(shost) == -1) fc_host_permanent_port_name(shost) =
Re: [PATCH, RFC] SCSI cleanups for 2.5
On Tue, 16 Jan 2001, Prasenjit Sarkar wrote: Well, WWN may be either WWPN (port name) or WWNN (node name). According to the mapping of targetID and D_ID, WWPN seems to be the right choice. Of course, some devices allow access to a given storage through different nodes as well as different ports. There are a vendor an example: http://www.storage.ibm.com/hardsoft/products/ess/support/essfcwp.pdf unique methods of determining when that is the case, unfortunately. I presume such devices are beyond the scope of this proposal. Do you know more about such methods? - Actually, the SCSI INQUIRY command with EVPD bit on and page code 83 should return the VPID of every LU -- this command is used by some SCSI initiators and most multipathing drivers to determine whether the same LU has been accessed through different paths. You can find more details in SPC-2 (or SPC-3). I am aware of these drafts or standards. But I can't find VPID. I know VPD (Vital Product Data). Regards Martin Peschke - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]