[LSF/MM ATTEND] T10-PI, scsi target core, FCoE target/initiator
Hello,

I have worked on and contributed to the SRP initiator driver, the SRP target, and the iSER target transport drivers for the SCSI target core (LIO core). I would like to attend the discussions about the SCSI error handler, scsi-mq, T10-PI, and the FCoE target/initiator drivers.

thanks,
-vu

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
>>> Though, now that I've unpacked it -- I don't think it is OK for dev_loss_tmo
>>> to be off, but fast IO to be on? That drops another conditional.
>>
>> The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine in my
>> tests. An I/O failure was detected shortly after the cable to the target was
>> pulled. I/O resumed shortly after the cable to the target was reinserted.
>
> Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo < 0 and
> fast_io_fail_tmo = 0. The other transports do not allow this scenario, and I'm
> asking if it makes sense for SRP to allow it.
>
> But now that you mention reconnect_delay, what is the meaning of that when it
> is negative? That's not in the documentation. And should it be considered in
> srp_tmo_valid() -- are there values of reconnect_delay that cause problems?
>
> I'm starting to get a bit concerned about this patch -- can you, Vu, and
> Sebastian comment on the testing you have done?

Hello Bart,

After running a cable-pull test on two local IB links for several hours, I/Os got stuck. Subsequent commands (multipath -ll or fdisk -l) also got stuck and never returned.

Here are the stack dumps for the srp-x kernel threads. I'll rerun with #DEBUG to get more debug info on the SCSI host and rport.

-vu

[Attachment: srp_threads.txt.tgz]
Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
David Dillow wrote:
> On Wed, 2013-07-03 at 20:24 +0200, Bart Van Assche wrote:
>> On 07/03/13 19:27, David Dillow wrote:
>>> On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
>>>> The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine
>>>> in my tests. An I/O failure was detected shortly after the cable to the
>>>> target was pulled. I/O resumed shortly after the cable to the target was
>>>> reinserted.
>>>
>>> Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo < 0
>>> and fast_io_fail_tmo = 0. The other transports do not allow this scenario,
>>> and I'm asking if it makes sense for SRP to allow it.
>>>
>>> But now that you mention reconnect_delay, what is the meaning of that when
>>> it is negative? That's not in the documentation. And should it be
>>> considered in srp_tmo_valid() -- are there values of reconnect_delay that
>>> cause problems?
>>
>> None of the combinations that can be configured from user space can bring
>> the kernel in trouble. If reconnect_delay <= 0 that means that the
>> time-based reconnect mechanism is disabled.
>
> Then it should use the same semantics as the other attributes, and have the
> user store "off" to turn it off. And I'm getting the strong sense that the
> answer to my question about fast_io_fail_tmo = 0 when dev_loss_tmo is off is
> that we should not allow that combination, even if it doesn't break the
> kernel. If it doesn't make sense, there is no reason to create an opportunity
> for user confusion.

Hello Dave,

When dev_loss_tmo expires, srp not only removes the rport but also removes the associated scsi_host. One may wish to set fast_io_fail_tmo = 0 so that I/Os fail over quickly to other paths, and dev_loss_tmo off to keep the scsi_host around until the target comes back.

-vu
Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
Bart Van Assche wrote:
On 06/14/13 19:59, Vu Pham wrote:
On 06/13/13 21:43, Vu Pham wrote:

    +/**
    + * srp_tmo_valid() - check timeout combination validity
    + *
    + * If no fast I/O fail timeout has been configured then the device loss timeout
    + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail timeout has
    + * been configured then it must be below the device loss timeout.
    + */
    +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
    +{
    +	return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
    +		dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
    +		|| (0 <= fast_io_fail_tmo &&
    +		    (dev_loss_tmo < 0 ||
    +		     (fast_io_fail_tmo < dev_loss_tmo &&
    +		      dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
    +}
    +EXPORT_SYMBOL_GPL(srp_tmo_valid);

If fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with a negative value; if dev_loss_tmo is off, one cannot turn off fast_io_fail_tmo with a negative value.

OK, I will update the documentation such that it correctly refers to "off" instead of a negative value, and I will also mention that dev_loss_tmo can now be disabled.

It's not only the documentation but also the code logic: with the return statement above, you cannot turn dev_loss_tmo off if fast_io_fail_tmo is already turned off, and vice versa.

Does this mean that you think it would be useful to disable both the fast_io_fail and the dev_loss mechanisms, and hence rely on the user to remove remote ports that have disappeared and on the SCSI command timeout to detect path failures?

Yes.

I'll start testing this to see whether that combination does not trigger any adverse behavior.

Ok.

If the rport's state is already SRP_RPORT_BLOCKED, I don't think we need to do an extra block with scsi_block_requests().

Please keep in mind that srp_reconnect_rport() can be called from two different contexts: that function can not only be called from inside the SRP transport layer but also from inside the SCSI error handler (see also the srp_reset_device() modifications in a later patch in this series). If this function is invoked from the context of the SCSI error handler, the chance is high that the SCSI device will be in another state than SDEV_BLOCK. Hence the scsi_block_requests() call in this function.

Yes, srp_reconnect_rport() can be called from two contexts; however, it deals with the same rport and the rport's state. I'm thinking of something like this:

    if (rport->state != SRP_RPORT_BLOCKED) {
    	scsi_block_requests(shost);

Sorry, but I'm afraid that that approach would still allow the user to unblock one or more SCSI devices via sysfs during the i->f->reconnect(rport) call, something we do not want.

I don't think that the user can unblock SCSI device(s) via sysfs if you use scsi_block_requests(shost) in srp_start_tl_fail_timers().

-vu

I think that we can use only the pair scsi_block_requests()/scsi_unblock_requests() unless the advantage of multipathd recognizing SDEV_BLOCK is noticeable.

I think the advantage of multipathd recognizing the SDEV_BLOCK state before the fast_io_fail_tmo timer has expired is important. Multipathd does not queue I/O to paths that are in the SDEV_BLOCK state, so setting that state helps I/O fail over more quickly, especially for large values of fast_io_fail_tmo.

Hope this helps,
Bart.
Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
Hello Bart,

On 06/13/13 21:43, Vu Pham wrote:

Hello Bart,

    +What:		/sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
    +Date:		September 1, 2013
    +KernelVersion:	3.11
    +Contact:	linux-scsi@vger.kernel.org, linux-r...@vger.kernel.org
    +Description:	Number of seconds the SCSI layer will wait after a transport
    +		layer error has been observed before removing a target port.
    +		Zero means immediate removal. A negative value will disable
    +		the target port removal.

<snip>

    +/**
    + * srp_tmo_valid() - check timeout combination validity
    + *
    + * If no fast I/O fail timeout has been configured then the device loss timeout
    + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail timeout has
    + * been configured then it must be below the device loss timeout.
    + */
    +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
    +{
    +	return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
    +		dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
    +		|| (0 <= fast_io_fail_tmo &&
    +		    (dev_loss_tmo < 0 ||
    +		     (fast_io_fail_tmo < dev_loss_tmo &&
    +		      dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
    +}
    +EXPORT_SYMBOL_GPL(srp_tmo_valid);

If fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with a negative value; if dev_loss_tmo is off, one cannot turn off fast_io_fail_tmo with a negative value.

OK, I will update the documentation such that it correctly refers to "off" instead of a negative value, and I will also mention that dev_loss_tmo can now be disabled.

It's not only the documentation but also the code logic: with the return statement above, you cannot turn dev_loss_tmo off if fast_io_fail_tmo is already turned off, and vice versa.

<snip>

    + * srp_reconnect_rport - reconnect by invoking srp_function_template.reconnect()
    + *
    + * Blocks SCSI command queueing before invoking reconnect() such that the
    + * scsi_host_template.queuecommand() won't be invoked concurrently with
    + * reconnect(). This is important since a reconnect() implementation may
    + * reallocate resources needed by queuecommand(). Please note that this
    + * function neither waits until outstanding requests have finished nor tries
    + * to abort these. It is the responsibility of the reconnect() function to
    + * finish outstanding commands before reconnecting to the target port.
    + */
    +int srp_reconnect_rport(struct srp_rport *rport)
    +{
    +	struct Scsi_Host *shost = rport_to_shost(rport);
    +	struct srp_internal *i = to_srp_internal(shost->transportt);
    +	struct scsi_device *sdev;
    +	int res;
    +
    +	pr_debug("SCSI host %s\n", dev_name(&shost->shost_gendev));
    +
    +	res = mutex_lock_interruptible(&rport->mutex);
    +	if (res) {
    +		pr_debug("%s: mutex_lock_interruptible() returned %d\n",
    +			 dev_name(&shost->shost_gendev), res);
    +		goto out;
    +	}
    +
    +	spin_lock_irq(shost->host_lock);
    +	scsi_block_requests(shost);
    +	spin_unlock_irq(shost->host_lock);
    +

In scsi_block_requests()'s definition, no locks are assumed held.

Good catch :-) However, if you look around in drivers/scsi you will see that several SCSI LLDs invoke scsi_block_requests() with the host lock held. I'm not sure whether these LLDs or the scsi_block_requests() documentation is incorrect. Anyway, I'll leave the locking statements out since they are not necessary around this call of scsi_block_requests().

If the rport's state is already SRP_RPORT_BLOCKED, I don't think we need to do an extra block with scsi_block_requests().

Please keep in mind that srp_reconnect_rport() can be called from two different contexts: that function can not only be called from inside the SRP transport layer but also from inside the SCSI error handler (see also the srp_reset_device() modifications in a later patch in this series). If this function is invoked from the context of the SCSI error handler, the chance is high that the SCSI device will be in another state than SDEV_BLOCK. Hence the scsi_block_requests() call in this function.

Yes, srp_reconnect_rport() can be called from two contexts; however, it deals with the same rport and the rport's state. I'm thinking of something like this:

    if (rport->state != SRP_RPORT_BLOCKED) {
    	scsi_block_requests(shost);

    while (scsi_request_fn_active(shost))
    	msleep(20);
    res = i->f->reconnect(rport);
    pr_debug("%s (state %d): transport.reconnect() returned %d\n",
    	 dev_name(&shost->shost_gendev), rport->state, res);
    if (res == 0) {
    	cancel_delayed_work(&rport->fast_io_fail_work);
    	cancel_delayed_work(&rport->dev_loss_work);
    	rport->failed_reconnects = 0;
    	scsi_unblock_requests(shost);
    	srp_rport_set_state(rport, SRP_RPORT_RUNNING);
    	/*
    	 * It can occur that after fast_io_fail_tmo expired and before
    	 * dev_loss_tmo expired that the SCSI error handler has
    	 * offlined one or more devices. scsi_target_unblock() doesn't
    	 * change the state of these devices into running
Re: [PATCH 00/11] First pass at merging Bart's HA work
Alex Turin wrote:
On 12/6/2012 5:04 PM, Bart Van Assche wrote:
On 12/06/12 15:27, Or Gerlitz wrote:
The core problem here seems to be that scsi_remove_host simply never ends.

Hello Or,

The later patches in the srp-ha patch series avoided such behavior by checking whether the connection between SRP initiator and target is unique, and by removing duplicate SCSI hosts for which the transport layer failed. Unfortunately these patches are still under review. Unless someone can come up with a better solution I will post a patch one of the next days that makes ib_srp again fail all commands after host removal has started. That will avoid spending a long time doing error recovery. Also, you might have noticed that Hannes Reinecke reported a few days ago that the SCSI error handler may need a lot of time for other transport types - this behavior is not SRP specific.

Bart.

Hello Bart,

In our case we don't have duplicate hosts or targets. We are working with a single SCSI disk. To make scsi_remove_host hang, we simply disable an IB port and run dd if=/dev/sdb of=/dev/null count=1.

Hello Bart,

I applied your latest patch ([PATCH for-next] IB/srp: Make SCSI error handling finish) and tested it. Let me capture what I'm seeing.

The host has two paths (scsi_hosts 7 and 8) to the target through two physical ports (1 and 2):

    [root@rsws42 ~]# multipath -l
    size=50G features='0' hwhandler='0' wp=rw
    |-+- policy='round-robin 0' prio=0 status=active
    | `- 7:0:0:11 sdb 8:16 active undef running
    `-+- policy='round-robin 0' prio=0 status=enabled
      `- 8:0:0:11 sdc 8:32 active undef running

After a cable pull (by disabling port 1), I/Os fail over fine; the problem is the cleanup of scsi_host 7 on the failed path.

On the IB RC failure, SCSI error recovery kicks in. srp_reconnect_target() failed, and srp_remove_target() ran to remove scsi_host 7; however, I think it gets stuck at device_del(dev) inside __scsi_remove_device(dev). Error recovery then happens again and again on scsi_host 7 for 9-10 minutes. scsi_host 7 cannot be cleaned up; its sysfs entry is still there (/sys/class/scsi_host/host7) and its state is SHOST_CANCEL.

I brought port 1 back online, but scsi_host 7 cannot reconnect to the target because its state is SRP_TARGET_REMOVED. The scsi_host 7 sysfs entry does not contain the target login info (ioc_guid, id_ext, dgid, ...). I think srp_daemon could reconnect to the target by creating a new path with a new scsi_host; however, I cannot check because I currently don't have a working srp_daemon. I need to manually reconnect to the target with an echo command.

Bottom line: I/Os can fail over and fail back; however, the old scsi_hosts cannot be removed (their sysfs entries are still there), they stay in state SHOST_CANCEL, and error recovery keeps happening on the old scsi_hosts for 10-20 minutes.

thanks,
-vu
Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel
FUJITA Tomonori wrote:
On Tue, 29 Jan 2008 13:31:52 -0800 Roland Dreier [EMAIL PROTECTED] wrote:

                            STGT read     SCST read     STGT read     SCST read
                            performance   performance   performance   performance
                            (0.5K, MB/s)  (0.5K, MB/s)  (1 MB, MB/s)  (1 MB, MB/s)
    iSER (8 Gb/s network)   250           N/A           360           N/A
    SRP  (8 Gb/s network)   N/A           421           N/A           683

On the comparable figures, which only seem to be IPoIB, they're showing a 13-18% variance, aren't they? Which isn't an incredible difference.

Maybe I'm all wet, but I think iSER vs. SRP should be roughly comparable. The exact formatting of various messages etc. is different, but the data path using RDMA is pretty much identical. So the big difference between STGT iSER and SCST SRP hints at some big difference in the efficiency of the two implementations.

iSER has parameters to limit the maximum size of RDMA (it needs to repeat RDMA with a poor configuration)? Anyway, here are the results from Robin Humble:

iSER to 7G ramfs, x86_64, CentOS 4.6, 2.6.22 kernels, git tgtd, initiator end booted with mem=512M, target with 8G RAM:

    direct I/O dd write/read: 800/751 MB/s
      dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
      dd of=/dev/null if=/dev/sdc bs=1M count=5000 iflag=direct

Both Robin (iSER/STGT) and Bart (SCST/SRP) are using ramfs. Robin's numbers come from DDR IB HCAs; Bart's numbers come from SDR IB HCAs.

Results with /dev/ram0 configured as backing store on the target (buffered I/O):

                 Read          Write         Read          Write
                 performance   performance   performance   performance
                 (0.5K, MB/s)  (0.5K, MB/s)  (1 MB, MB/s)  (1 MB, MB/s)
    STGT + iSER  250           48            349           781
    SCST + SRP   411           66            659           746

Results with /dev/ram0 configured as backing store on the target (direct I/O):

                 Read          Write         Read          Write
                 performance   performance   performance   performance
                 (0.5K, MB/s)  (0.5K, MB/s)  (1 MB, MB/s)  (1 MB, MB/s)
    STGT + iSER  7.9           9.8           589           647
    SCST + SRP   12.3          9.7           811           794

http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg13514.html

Here are my numbers with DDR IB HCAs, SCST/SRP, a 5G /dev/ram0 in block_io mode, RHEL5 2.6.18-8.el5:

    direct I/O dd write/read: 1100/895 MB/s
      dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
      dd of=/dev/null if=/dev/sdc bs=1M count=5000 iflag=direct

    buffered I/O dd write/read: 950/770 MB/s
      dd if=/dev/zero of=/dev/sdc bs=1M count=5000
      dd of=/dev/null if=/dev/sdc bs=1M count=5000

So when using DDR IB HCAs (write/read, MB/s):

                  stgt/iser   scst/srp
    direct I/O    800/751     1100/895
    buffered I/O  1109/350    950/770

-vu

http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg13502.html

I think that STGT is pretty fast with fast backing storage. I don't think that there is a notable performance difference between kernel-space and user-space SRP (or iSER) implementations when it comes to moving data between hosts. IB is expected to enable user-space applications to move data between hosts quickly (if not, what can IB provide us?). I think that the question is how fast user-space applications can do I/O compared with I/O in kernel space. STGT is eager for the advent of good asynchronous I/O and event notification interfaces.

One more possible optimization for STGT is zero-copy data transfer. STGT uses pre-registered buffers, moves data between the page cache and these buffers, and then does the RDMA transfer. If we implement our own caching mechanism to use the pre-registered buffers directly (with AIO and O_DIRECT), then STGT could move data without data copies.