Re: [dm-devel] [PATCH V4 1/2] multipath-tools: intermittent IO error accounting to improve reliability

Muneendra Kumar M Wed, 20 Sep 2017 06:27:07 -0700

Hi Guan,
>>> Shall we use the existing PATH_SHAKY?
Since PATH_SHAKY indicates a path that is not available for "normal" 
operations, we can use this state. That's a good idea.

Regarding marginal paths, my explanation is below. Brocade is also publishing 
a couple of white papers on the subject to educate SAN administrators and the 
SAN community.

Marginal path:

A host, target, LUN flow (an ITL path) goes through the SAN. Note that each 
I/O request that goes to the SCSI layer is transformed into a single SCSI 
exchange. In a single SAN, there are typically multiple SAN network paths for 
an ITL flow/path. Each SCSI exchange can take any one of the network paths 
that are available for the ITL path. A SAN can be based on Ethernet, FC, or 
InfiniBand physical networks to carry block storage traffic (SCSI, NVMe, etc.).

There are typically two types of SAN network problems that are categorized as 
marginal issues. By nature these issues are not permanent; they come and go 
over time.
1) Switches in the SAN can have intermittent frame drops or intermittent frame 
corruption due to a bad optics cable (SFP) or similar port wear-and-tear 
issues. This causes ITL flows that go through the faulty switch/port to 
intermittently experience frame drops.
2) There exist SAN topologies in which some switch ports in the fabric become 
the only conduit for many different ITL flows across multiple hosts. These 
single network paths are essentially shared across multiple ITL flows. Under 
these conditions, if the port link bandwidth cannot handle the net sum of the 
bandwidth of the shared ITL flows going through the single path, we could see 
intermittent network congestion problems. This condition is called network 
oversubscription. The intermittent congestion can delay SCSI exchange 
completion (an increase in I/O latency is observed).
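
As a rough illustration of the oversubscription condition in point 2, here is 
a minimal Python sketch. The function name, the flow figures and the port 
speed are made up for the example and do not come from any real fabric; the 
only point is that the shared port is congested when the summed demand of the 
ITL flows exceeds the link capacity.

```python
def oversubscription_ratio(flow_demands_gbps, link_capacity_gbps):
    """Return total offered load divided by link capacity.

    A result above 1.0 means the shared port is oversubscribed.
    Both arguments are illustrative; real fabrics expose such
    figures through switch counters, not through this function.
    """
    if link_capacity_gbps <= 0:
        raise ValueError("link capacity must be positive")
    return sum(flow_demands_gbps) / link_capacity_gbps


# Three hypothetical ITL flows funnelled through one 8 Gb/s switch port.
ratio = oversubscription_ratio([4.0, 4.0, 4.0], 8.0)
print(f"offered/capacity = {ratio:.2f}, oversubscribed = {ratio > 1.0}")
```

Because the individual flows are bursty, the ratio crosses 1.0 only some of 
the time, which is why the congestion shows up as an intermittent problem.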

To overcome the above network issues and many more such target issues, 
frame-level retries are done in the HBA device firmware and I/O retries are 
done in the SCSI layer. These retries might succeed for any of three reasons:
1) The intermittent switch/port issue is not observed on the retry.
2) The retry I/O is a new SCSI exchange. This SCSI exchange can take an 
alternate SAN path for the ITL flow, if such a SAN path exists.
3) Network congestion disappears momentarily because the net I/O bandwidth 
coming from multiple ITL flows on the single shared network path is something 
the path can handle.

However, in some cases we have seen that I/O retries don’t succeed because 
the retried I/Os hit a SAN network path that has an intermittent switch/port 
issue and/or network congestion.

On the host we therefore see configurations with two or more ITL paths sharing 
the same target/LUN and going through two or more HBA ports. These HBA ports 
are connected through two or more SANs to the same target/LUN.
If an I/O fails at the multipath layer, the ITL path is moved to the Failed 
state. Because of the marginal nature of the network, the next health-check 
command sent from the multipath layer might succeed, which moves the ITL path 
back to the Active state. You end up seeing the DM path state go through 
Active, Failed, Active transitions. This results in an overall reduction in 
application I/O throughput and sometimes in application I/O failures (because 
of timing constraints). All this can happen because of I/O retries and I/O 
requests moving across multiple paths of the DM device. Note that on the host, 
all I/O retries on a single path and all I/O movement across multiple paths 
slow down the forward progress of new application I/O. The reason is that 
these I/O re-queue actions are given higher priority than the newer I/O 
requests coming from the application.
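
To make the Active, Failed, Active pattern concrete, here is a minimal Python 
sketch of how such flapping could be recognised from a log of path state 
changes. It is only an illustration: the event format, the function name 
is_flapping and the window_s parameter are assumptions for this example and 
are not taken from multipath-tools. The window plays the role of the "time 
between two failed states" discussed in the quoted thread below.

```python
def is_flapping(events, window_s):
    """Return True if the path fails, recovers, and fails again quickly.

    events   -- list of (timestamp_seconds, state) tuples in time order,
                where state is "active" or "failed" (hypothetical format)
    window_s -- maximum time allowed between the two failures
    """
    last_failed = None      # timestamp of the most recent failure
    recovered = False       # has the path been active since that failure?
    for ts, state in events:
        if state == "failed":
            if (last_failed is not None and recovered
                    and ts - last_failed <= window_s):
                return True
            last_failed = ts
            recovered = False
        elif state == "active" and last_failed is not None:
            recovered = True
    return False


history = [(0, "active"), (10, "failed"), (15, "active"), (40, "failed")]
print(is_flapping(history, window_s=60))
```

With the sample history, the two failures are 30 seconds apart, so a 60 second 
window reports flapping while a 20 second window does not.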

The above condition of the ITL path is hence called “marginal”.

What we want is for DM to deterministically categorize an ITL path as 
“marginal” and move all pending I/Os from the marginal path to an Active 
path. This will help in meeting application I/O timing constraints. We would 
also like a capability to automatically reinstate the marginal path to Active 
once the marginal condition in the network is fixed.
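
The desired behaviour can be sketched as a small state machine. This is a 
simplified Python illustration under stated assumptions, not the code in the 
patch: the class PathMonitor, its fields and the "errors per 1000 I/Os" unit 
are hypothetical, loosely modelled on the error-rate threshold and recovery 
time parameters named in the quoted thread below.

```python
from dataclasses import dataclass


@dataclass
class PathMonitor:
    rate_threshold: float        # errors per 1000 I/Os (assumed unit)
    recovery_time_s: float       # quiet time required before reinstating
    state: str = "active"
    marginal_since: float = 0.0

    def end_of_sample(self, now, total_ios, failed_ios):
        """Update the path state at the end of one sampling period."""
        if total_ios == 0:
            return self.state    # nothing measured, keep the current state
        rate = failed_ios * 1000 / total_ios
        if rate > self.rate_threshold:
            # Too many errors: mark (or keep) the path marginal and
            # restart the recovery timer.
            self.state = "marginal"
            self.marginal_since = now
        elif (self.state == "marginal"
              and now - self.marginal_since >= self.recovery_time_s):
            # Error rate stayed low for long enough: reinstate the path.
            self.state = "active"
        return self.state


def usable_paths(monitors):
    """Return the names of paths that should carry I/O (not marginal)."""
    return [name for name, m in monitors.items() if m.state == "active"]


mon = PathMonitor(rate_threshold=10, recovery_time_s=60)
print(mon.end_of_sample(now=0, total_ios=1000, failed_ios=50))
print(mon.end_of_sample(now=61, total_ios=1000, failed_ios=0))
```

Keeping the path out of the usable set until the recovery time has elapsed is 
what prevents the Active, Failed, Active oscillation described above.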

Based on the above explanation, I want to rename the parameters to 
marginal_path_XXXX; this naming applies irrespective of the storage network.

Regards,
Muneendra.

-----Original Message-----
From: Guan Junxiong
Sent: Tuesday, September 19, 2017 6:23 PM
To: Muneendra Kumar M; Martin Wilck
Subject: Re: [PATCH V4 1/2] multipath-tools: intermittent IO error accounting 
to improve reliability

Hi Muneendra,
Thanks for your suggestion. My comments inline.

On 2017/9/19 18:59, Muneendra Kumar M wrote:
> Hi Guan/Martin,
> Below are my points.
> 
>>>> "san_path_double_fault_time" is great. Having one less parameter while 
>>>> still covering most scenarios is appreciated.
> This looks good and I completely agree with Guan.
> 
> One question: is san_path_double_fault_time the time between two failed 
> states (failed-active-failed)? If so, this holds good.
> 
> Instead of san_path_double_fault_time, can we call it 
> san_path_double_failed_time, since the name suggests the time between two 
> failed states? Is this OK?
> 

Both names are fine for me.

> In a SAN topology (FC, NVMe, SCSI), transient intermittent network errors 
> turn ITL paths into marginal paths.
>
>
> So instead of "path_io_err_sample_time", "path_io_err_rate_threshold" and 
> "path_io_err_recovery_time", can we use the names 
> "marginal_path_err_detection_time", "marginal_path_err_rate_threshold" and 
> "marginal_path_err_recovery_time"?
> 
> Some other names would also be fine, as io_path is a general term in my 
> view.
>

Can you explain "marginal paths" in detail? Can users easily grasp the 
meaning of marginal paths?
IMO, the path_io_err_XXX names are easy to understand.

> If we agree on this, there is one more thing I would like to add as part of 
> this patch.
> 
> Whenever the path is in XXX_io_error_recovery_time and the user runs the 
> multipath -ll command, the state of the path is shown as failed, as shown 
> below.
> 
>       | `- 6:0:0:0 sdb 8:16  failed ready  running
> 
> Can we add a new state, marginal, so that when the admin runs the multipath 
> command and finds the path in the marginal state, he can quickly see that 
> this is a marginal path that needs to be recovered? If we keep the state as 
> failed, the admin cannot tell how long the device has been in the failed 
> state.
> 
>

If the user runs multipathd -k and then enters "show paths", multipathd will 
show the path in the "delayed" state.
But we can't determine the exact reason for the delayed state, because other 
features such as path waiting also use it.
Shall we use the existing PATH_SHAKY?

Regards.
Guan

