[LSF/MM ATTEND] T10-PI, scsi target core, FCoE target/initiator

2014-01-28 Thread Vu Pham

Hello,

I have worked on and contributed to the SRP initiator driver, and to the 
SRP target and iSER target transport drivers for the SCSI target core 
(LIO core). I would like to attend the discussions about the SCSI error 
handler, scsi-mq, T10-PI and the FCoE target/initiator drivers.


thanks,
-vu


Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-08 Thread Vu Pham


  

Though, now that I've unpacked it -- I don't think it is OK for
dev_loss_tmo to be off, but fast IO to be on? That drops another
conditional.
  
The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine 
in my tests. An I/O failure was detected shortly after the cable to the 
target was pulled. I/O resumed shortly after the cable to the target was 
reinserted.



Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo
< 0, and fast_io_fail_tmo >= 0. The other transports do not allow this
scenario, and I'm asking if it makes sense for SRP to allow it.

But now that you mention reconnect_delay, what is the meaning of that
when it is negative? That's not in the documentation. And should it be
considered in srp_tmo_valid() -- are there values of reconnect_delay
that cause problems?

I'm starting to get a bit concerned about this patch -- can you, Vu, and
Sebastian comment on the testing you have done?

  

Hello Bart,

After running a cable pull test on two local IB links for several hours, 
I/Os got stuck.

Subsequent commands like multipath -ll or fdisk -l also got stuck and 
never returned.
Here are the stack dumps for the srp-x kernel threads.
I'll run with #DEBUG to get more debug info on the scsi host & rport.

-vu


srp_threads.txt.tgz
Description: application/compressed


Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread Vu Pham

David Dillow wrote:

On Wed, 2013-07-03 at 20:24 +0200, Bart Van Assche wrote:
  

On 07/03/13 19:27, David Dillow wrote:


On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
  

The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine
in my tests. An I/O failure was detected shortly after the cable to the
target was pulled. I/O resumed shortly after the cable to the target was
reinserted.


Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo
< 0, and fast_io_fail_tmo >= 0. The other transports do not allow this
scenario, and I'm asking if it makes sense for SRP to allow it.

But now that you mention reconnect_delay, what is the meaning of that
when it is negative? That's not in the documentation. And should it be
considered in srp_tmo_valid() -- are there values of reconnect_delay
that cause problems?
  
None of the combinations that can be configured from user space can get 
the kernel into trouble. If reconnect_delay <= 0, that means that the 
time-based reconnect mechanism is disabled.
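For illustration only, a minimal sketch of that convention: a transport
would arm its time-based reconnect work only when the delay is positive.
The field and work-item names here (reconnect_delay, reconnect_work) are
assumptions made for the sketch, not necessarily the actual
scsi_transport_srp code.

	/* Needs <linux/workqueue.h>; names are illustrative only. */
	static void srp_schedule_reconnect_sketch(struct srp_rport *rport)
	{
		int delay = rport->reconnect_delay;	/* hypothetical field */

		if (delay > 0)
			queue_delayed_work(system_long_wq,
					   &rport->reconnect_work, delay * HZ);
		/* delay <= 0: the time-based reconnect mechanism stays off. */
	}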



Then it should use the same semantics as the other attributes, and have
the user store "off" to turn it off.

And I'm getting the strong sense that the answer to my question about
fast_io_fail_tmo >= 0 when dev_loss_tmo is off is that we should not allow
that combination, even if it doesn't break the kernel. If it doesn't make
sense, there is no reason to create an opportunity for user confusion.
  

Hello Dave,

When dev_loss_tmo expires, srp not only removes the rport but also 
removes the associated scsi_host.
One may wish to set fast_io_fail_tmo >= 0 so that I/Os fail over quickly 
to other paths, and dev_loss_tmo off to keep the scsi_host around until 
the target comes back.


-vu


Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-18 Thread Vu Pham

Bart Van Assche wrote:

On 06/14/13 19:59, Vu Pham wrote:

On 06/13/13 21:43, Vu Pham wrote:

+/**
+ * srp_tmo_valid() - check timeout combination validity
+ *
+ * If no fast I/O fail timeout has been configured then the device loss
+ * timeout must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail
+ * timeout has been configured then it must be below the device loss timeout.
+ */
+int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
+{
+	return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
+		dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
+		|| (0 <= fast_io_fail_tmo &&
+		    (dev_loss_tmo < 0 ||
+		     (fast_io_fail_tmo < dev_loss_tmo &&
+		      dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
+}
+EXPORT_SYMBOL_GPL(srp_tmo_valid);
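For reference, a few illustrative calls showing how the return statement
above evaluates (the timeout values are arbitrary examples, not taken from
the patch):

	srp_tmo_valid(-1, 60);  /* fast_io_fail off; 0 (valid) provided
	                           60 <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT       */
	srp_tmo_valid(5, 60);   /* 5 < 60 and 60 < LONG_MAX / HZ -> 0 (valid) */
	srp_tmo_valid(15, -1);  /* fast_io_fail on, dev_loss off -> 0 (valid) */
	srp_tmo_valid(-1, -1);  /* both off -> -EINVAL                        */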

If fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with a
negative value; if dev_loss_tmo is off, one cannot turn off
fast_io_fail_tmo with a negative value.


OK, will update the documentation such that it correctly refers to
"off" instead of a negative value and I will also mention that
dev_loss_tmo can now be disabled.


It's not only the documentation but also the code logic: with the return
statement above, you cannot turn dev_loss_tmo off if fast_io_fail_tmo is
already turned off, and vice versa.


Does this mean that you think it would be useful to disable both the 
fast_io_fail and the dev_loss mechanisms, and hence rely on the user 
to remove remote ports that have disappeared and on the SCSI command 
timeout to detect path failures?

Yes.

I'll start testing this to see whether that combination does not 
trigger any adverse behavior.




Ok
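A relaxed check that would also accept the both-off combination being
agreed on here might look roughly like the sketch below. This is only an
illustration of the intended semantics, not the code that was eventually
merged.

	/* Sketch: also accept fast_io_fail_tmo and dev_loss_tmo both off. */
	int srp_tmo_valid_sketch(int fast_io_fail_tmo, int dev_loss_tmo)
	{
		/* Both timers disabled: rely on the SCSI command timeout. */
		if (fast_io_fail_tmo < 0 && dev_loss_tmo < 0)
			return 0;
		/* fast_io_fail off: dev_loss_tmo must stay within the block limit. */
		if (fast_io_fail_tmo < 0)
			return (1 <= dev_loss_tmo &&
				dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT) ?
				0 : -EINVAL;
		/* fast_io_fail on: dev_loss_tmo may be off or must be larger. */
		if (dev_loss_tmo < 0 ||
		    (fast_io_fail_tmo < dev_loss_tmo &&
		     dev_loss_tmo < LONG_MAX / HZ))
			return 0;
		return -EINVAL;
	}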


If the rport's state is already SRP_RPORT_BLOCKED, I don't think we need
to do an extra block with scsi_block_requests().


Please keep in mind that srp_reconnect_rport() can be called from two
different contexts: that function can not only be called from inside
the SRP transport layer but also from inside the SCSI error handler
(see also the srp_reset_device() modifications in a later patch in
this series). If this function is invoked from the context of the SCSI
error handler the chance is high that the SCSI device will have
another state than SDEV_BLOCK. Hence the scsi_block_requests() call in
this function.

Yes, srp_reconnect_rport() can be called from two contexts; however, it
deals with the same rport & the rport's state.
I'm thinking something like this:

	if (rport->state != SRP_RPORT_BLOCKED) {
		scsi_block_requests(shost);


Sorry but I'm afraid that that approach would still allow the user to 
unblock one or more SCSI devices via sysfs during the 
i->f->reconnect(rport) call, something we do not want.


I don't think that the user can unblock SCSI device(s) via sysfs if you use 
scsi_block_requests(shost) in srp_start_tl_fail_timers().


-vu


I think that we can use only the pair
scsi_block_requests()/scsi_unblock_requests() unless the advantage of
multipathd recognizing the SDEV_BLOCK state is noticeable.


I think the advantage of multipathd recognizing the SDEV_BLOCK state 
before the fast_io_fail_tmo timer has expired is important. Multipathd 
does not queue I/O to paths that are in the SDEV_BLOCK state so 
setting that state helps I/O to fail over more quickly, especially for 
large values of fast_io_fail_tmo.
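To make this trade-off concrete, a rough sketch of blocking at both levels
is shown below: scsi_block_requests() stops command queueing at the host
level, while scsi_target_block() additionally moves the child SCSI devices
into SDEV_BLOCK, which multipathd can observe. This is only an illustration
of the argument above, not the exact code in the patch.

	/* Sketch: block the host and expose SDEV_BLOCK to user space. */
	static void srp_block_rport_sketch(struct srp_rport *rport)
	{
		struct Scsi_Host *shost = rport_to_shost(rport);

		/* Stops queuecommand() at the host level (not visible to multipathd). */
		scsi_block_requests(shost);

		/*
		 * Also put the SCSI devices below this host into SDEV_BLOCK so
		 * multipathd stops queueing I/O to this path right away.
		 */
		scsi_target_block(&shost->shost_gendev);
	}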


Hope this helps,

Bart.




Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-14 Thread Vu Pham

Hello Bart,


On 06/13/13 21:43, Vu Pham wrote:
  

Hello Bart,



+What:		/sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
+Date:		September 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi@vger.kernel.org, linux-r...@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before removing a target port.
+		Zero means immediate removal.
  

A negative value will disable the target port removal.

snip


+
+/**
+ * srp_tmo_valid() - check timeout combination validity
+ *
+ * If no fast I/O fail timeout has been configured then the device loss
+ * timeout must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail
+ * timeout has been configured then it must be below the device loss timeout.
+ */
+int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
+{
+	return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
+		dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
+		|| (0 <= fast_io_fail_tmo &&
+		    (dev_loss_tmo < 0 ||
+		     (fast_io_fail_tmo < dev_loss_tmo &&
+		      dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
+}
+EXPORT_SYMBOL_GPL(srp_tmo_valid);
  
If fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with a
negative value; if dev_loss_tmo is off, one cannot turn off
fast_io_fail_tmo with a negative value.



OK, will update the documentation such that it correctly refers to "off" 
instead of a negative value and I will also mention that dev_loss_tmo can 
now be disabled.

  
It's not only the documentation but also the code logic: with the return 
statement above, you cannot turn dev_loss_tmo off if fast_io_fail_tmo is 
already turned off, and vice versa.

snip

+ * srp_reconnect_rport - reconnect by invoking srp_function_template.reconnect()
+ *
+ * Blocks SCSI command queueing before invoking reconnect() such that the
+ * scsi_host_template.queuecommand() won't be invoked concurrently with
+ * reconnect(). This is important since a reconnect() implementation may
+ * reallocate resources needed by queuecommand(). Please note that this
+ * function neither waits until outstanding requests have finished nor tries
+ * to abort these. It is the responsibility of the reconnect() function to
+ * finish outstanding commands before reconnecting to the target port.
+ */
+int srp_reconnect_rport(struct srp_rport *rport)
+{
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	struct srp_internal *i = to_srp_internal(shost->transportt);
+	struct scsi_device *sdev;
+	int res;
+
+	pr_debug("SCSI host %s\n", dev_name(&shost->shost_gendev));
+
+	res = mutex_lock_interruptible(&rport->mutex);
+	if (res) {
+		pr_debug("%s: mutex_lock_interruptible() returned %d\n",
+			 dev_name(&shost->shost_gendev), res);
+		goto out;
+	}
+
+	spin_lock_irq(shost->host_lock);
+	scsi_block_requests(shost);
+	spin_unlock_irq(shost->host_lock);
+
  

In the scsi_block_requests() definition, no locks are assumed to be held.



Good catch :-) However, if you look around in drivers/scsi you will see that 
several SCSI LLD drivers invoke scsi_block_requests() with the host lock held. 
I'm not sure whether these LLDs or the scsi_block_requests() documentation is 
incorrect. Anyway, I'll leave the locking statements out since these are not 
necessary around this call of scsi_block_requests().

  
If the rport's state is already SRP_RPORT_BLOCKED, I don't think we need to 
do an extra block with scsi_block_requests().



Please keep in mind that srp_reconnect_rport() can be called from two different 
contexts: that function can not only be called from inside the SRP transport 
layer but also from inside the SCSI error handler (see also the 
srp_reset_device() modifications in a later patch in this series). If this 
function is invoked from the context of the SCSI error handler the chance is 
high that the SCSI device will have another state than SDEV_BLOCK. Hence the 
scsi_block_requests() call in this function.
  
Yes, srp_reconnect_rport() can be called from two contexts; however, it 
deals with the same rport & the rport's state.

I'm thinking something like this:

	if (rport->state != SRP_RPORT_BLOCKED) {
		scsi_block_requests(shost);

	while (scsi_request_fn_active(shost))
		msleep(20);

	res = i->f->reconnect(rport);

	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
		 dev_name(&shost->shost_gendev), rport->state, res);
	if (res == 0) {
		cancel_delayed_work(&rport->fast_io_fail_work);
		cancel_delayed_work(&rport->dev_loss_work);
		rport->failed_reconnects = 0;
		scsi_unblock_requests(shost);
		srp_rport_set_state(rport, SRP_RPORT_RUNNING);
		/*
		 * It can occur that after fast_io_fail_tmo expired and before
		 * dev_loss_tmo expired that the SCSI error handler has
		 * offlined one or more devices. scsi_target_unblock() doesn't
		 * change the state of these devices into running

Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-12-07 Thread Vu Pham

Alex Turin wrote:

On 12/6/2012 5:04 PM, Bart Van Assche wrote:
  

On 12/06/12 15:27, Or Gerlitz wrote:

The core problem here seems to be that scsi_remove_host simply never 
ends.
  

Hello Or,

The later patches in the srp-ha patch series avoided such behavior by 
checking whether the connection between SRP initiator and target is 
unique, and by removing duplicate SCSI hosts for which the transport 
layer failed. Unfortunately these patches are still under review. 
Unless someone can come up with a better solution, I will post a patch 
in the next few days that makes ib_srp again fail all commands after 
host removal has started. That will avoid spending a long time doing 
error recovery.


Also, you might have noticed that Hannes Reinecke reported a few days 
ago that the SCSI error handler may need a lot of time for other 
transport types - this behavior is not SRP specific.


Bart.



Hello Bart,

In our case we don't have duplicate hosts or targets. We are working 
with a single SCSI disk.
To make scsi_remove_host hang we simply disable an IB port and run dd 
if=/dev/sdb of=/dev/null count=1.


  

Hello Bart,

I applied your latest patch [PATCH for-next] IB/srp: Make SCSI error 
handling finish and tested it.

Let me capture what I'm seeing:

The host has two paths (scsi_host 7 & 8) to the target through two physical ports 1 & 2:

[root@rsws42 ~]# multipath -l
size=50G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| `- 7:0:0:11 sdb 8:16 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
 `- 8:0:0:11 sdc 8:32 active undef running

Cable pull by disabling port 1: I/Os fail over fine; the problem is the 
cleanup of scsi_host 7 of the failed path.

IB RC failure, SCSI error recovery kicks in.
srp_reconnect_target() fails, and srp_remove_target() runs to remove 
scsi_host 7; however, I think it gets stuck at device_del(dev) inside 
__scsi_remove_device(dev).


Error recovery continuously happens again and again on scsi_host 7 for 
9-10 minutes.
scsi_host 7 cannot be cleaned up; its sysfs entry is still there 
(/sys/class/scsi_host/host7), and its state is SHOST_CANCEL.


I brought port 1 back online; scsi_host 7 cannot reconnect to the target 
because its state is SRP_TARGET_REMOVED.


The scsi_host 7 sysfs entry does not contain the target login info 
(ioc_guid, id_ext, dgid...).
I think srp_daemon can reconnect to the target by creating a new path with 
a new scsi host; however, I cannot check because I currently don't have a 
working srp_daemon.

I need to manually reconnect to the target with an echo command.

Bottom line: I/Os can fail over and fail back; however, old scsi hosts 
cannot be removed (their sysfs entries are still there) and remain in 
state SHOST_CANCEL, and error recovery keeps happening on the old scsi 
hosts for 10-20 minutes.


thanks,
-vu


Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel

2008-01-29 Thread Vu Pham

FUJITA Tomonori wrote:

On Tue, 29 Jan 2008 13:31:52 -0800
Roland Dreier [EMAIL PROTECTED] wrote:


                        STGT read     SCST read     STGT read     SCST read
                        performance   performance   performance   performance
                        (0.5K, MB/s)  (0.5K, MB/s)  (1 MB, MB/s)  (1 MB, MB/s)
 iSER (8 Gb/s network)  250           N/A           360           N/A
 SRP  (8 Gb/s network)  N/A           421           N/A           683

  On the comparable figures, which only seem to be IPoIB they're showing a
  13-18% variance, aren't they?  Which isn't an incredible difference.

Maybe I'm all wet, but I think iSER vs. SRP should be roughly
comparable.  The exact formatting of various messages etc. is
different but the data path using RDMA is pretty much identical.  So
the big difference between STGT iSER and SCST SRP hints at some big
difference in the efficiency of the two implementations.


iSER has parameters to limit the maximum size of RDMA (it needs to
repeat RDMA with a poor configuration)?


Anyway, here's the results from Robin Humble:

iSER to 7G ramfs, x86_64, centos4.6, 2.6.22 kernels, git tgtd,
initiator end booted with mem=512M, target with 8G ram

 direct i/o dd
  write/read  800/751 MB/s
dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
dd of=/dev/null if=/dev/sdc bs=1M count=5000 iflag=direct



Both Robin (iser/stgt) and Bart (scst/srp) are using ramfs.

Robin's numbers come from DDR IB HCAs.

Bart's numbers come from SDR IB HCAs:
Results with /dev/ram0 configured as backing store on the target 
(buffered I/O):
              Read          Write         Read          Write
              performance   performance   performance   performance
              (0.5K, MB/s)  (0.5K, MB/s)  (1 MB, MB/s)  (1 MB, MB/s)
STGT + iSER   250           48            349           781
SCST + SRP    411           66            659           746


Results with /dev/ram0 configured as backing store on the target 
(direct I/O):
              Read          Write         Read          Write
              performance   performance   performance   performance
              (0.5K, MB/s)  (0.5K, MB/s)  (1 MB, MB/s)  (1 MB, MB/s)
STGT + iSER   7.9           9.8           589           647
SCST + SRP    12.3          9.7           811           794


http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg13514.html

Here are my numbers with DDR IB HCAs, SCST/SRP 5G /dev/ram0 
block_io mode, RHEL5 2.6.18-8.el5


direct i/o dd
  write/read  1100/895 MB/s
  dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
  dd of=/dev/null if=/dev/sdc bs=1M count=5000 iflag=direct

buffered i/o dd
  write/read  950/770 MB/s
  dd if=/dev/zero of=/dev/sdc bs=1M count=5000
  dd of=/dev/null if=/dev/sdc bs=1M count=5000

So when using DDR IB HCAs:

              stgt/iser    scst/srp
direct I/O    800/751      1100/895
buffered I/O  1109/350     950/770


-vu

http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg13502.html

I think that STGT is pretty fast with the fast backing storage. 



I don't think that there is a notable performance difference between
kernel-space and user-space SRP (or iSER) implementations when it comes to
moving data between hosts. IB is expected to enable user-space applications
to move data between hosts quickly (if not, what can IB provide us?).

I think that the question is how fast user-space applications can do
I/Os compared with I/Os in kernel space. STGT is eager for the advent
of good asynchronous I/O and event notification interfaces.


One more possible optimization for STGT is zero-copy data
transfer. STGT uses pre-registered buffers and moves data between the page
cache and these buffers, and then does the RDMA transfer. If we implement
our own caching mechanism to use pre-registered buffers directly (with AIO
and O_DIRECT), then STGT can move data without data copies.
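As a rough illustration of that idea, the sketch below registers an aligned
buffer with the RDMA stack once and then reads file data into it with
O_DIRECT, so the same memory could later be posted for RDMA without an
intermediate copy. This is only a user-space sketch of the concept; the
protection domain, file path and length are placeholders, not STGT code.

#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <infiniband/verbs.h>

/*
 * Sketch: read file data straight into an RDMA-registered buffer so that no
 * copy between the page cache and a separate bounce buffer is needed.
 * 'pd' is an already created protection domain; 'len' is assumed to be a
 * multiple of the block size (an O_DIRECT requirement).
 */
static struct ibv_mr *read_into_registered_buf(struct ibv_pd *pd,
					       const char *path, size_t len)
{
	void *buf;
	struct ibv_mr *mr;
	int fd;

	if (posix_memalign(&buf, 4096, len))	/* O_DIRECT needs alignment */
		return NULL;

	/* Register once; the buffer can then be used directly as an RDMA source. */
	mr = ibv_reg_mr(pd, buf, len,
			IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ);
	if (!mr) {
		free(buf);
		return NULL;
	}

	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0 || read(fd, buf, len) < 0) {
		if (fd >= 0)
			close(fd);
		ibv_dereg_mr(mr);
		free(buf);
		return NULL;
	}
	close(fd);

	return mr;	/* mr->addr now holds the data, ready for RDMA */
}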



