RE: [PATCH] sd: Limit WRITE SAME / WRITE SAME(16) w/UNMAP length for certain devices

2017-09-27 Thread Knight, Frederick
I agree that it is disappointing that so many vendors seem to have trouble 
reading the spec.  This case is pretty clear.

The best the T10 committee could do is add a bit to indicate that the device 
uses the length from MAXIMUM UNMAP LBA COUNT field for the length of unmaps via 
the WRITE SAME w/UNMAP=1 rather than the MAXIMUM WRITE SAME LENGTH field.  BUT, 
I'll be very clear that any such new bit would be defined so that bit=0 is 
backward compatible for COMPLIANT devices, and bit=1 is the new setting 
for "backwards" devices - which means they would STILL require a firmware 
change to tell you they are backwards, and you'd STILL need a blacklist for 
their older revisions.  And this would just make the host's job all that much 
harder!

Once a device is broken (violates the spec), there is not very much we can do 
in the spec to fix it (they have to fix their broken device).

Fred

-Original Message-
From: Ewan D. Milne [mailto:emi...@redhat.com] 
Sent: Wednesday, September 27, 2017 12:28 PM
To: Martin K. Petersen <martin.peter...@oracle.com>
Cc: linux-scsi@vger.kernel.org; Knight, Frederick <frederick.kni...@netapp.com>
Subject: Re: [PATCH] sd: Limit WRITE SAME / WRITE SAME(16) w/UNMAP length for 
certain devices

On Mon, 2017-09-25 at 21:46 -0400, Martin K. Petersen wrote:
> Ewan,
> 
> > Some devices do not support a WRITE SAME / WRITE SAME(16) with the
> > UNMAP bit set up to the length specified in the MAXIMUM WRITE SAME
> > LENGTH field in the block limits VPD page (or, the field is zero,
> > indicating there is no limit).  Limit the length by the MAXIMUM UNMAP
> > LBA COUNT value.  Otherwise the command might be rejected.
> 
> From SBC4:
> 
>   "A MAXIMUM UNMAP LBA COUNT field set to a non-zero value indicates the
>   maximum number of LBAs that may be unmapped by an UNMAP command"
> 
> Note that it explicitly states "UNMAP command" and not "unmap
> operation".
> 
>   "A MAXIMUM WRITE SAME LENGTH field set to a non-zero value indicates
>   the maximum number of contiguous logical blocks that the device server
>   allows to be unmapped or written in a single WRITE SAME command."
> 
> It says "unmapped or written" and "WRITE SAME command".
> 
> The spec is crystal clear. The device needs to be fixed. We can
> blacklist older firmware revs.
> 

Yes, I know that is what SBC-4 says, and I agree that the devices
are not conforming.  Unfortunately, I've come across 3 different
arrays now from 3 different manufacturers that exhibit this behavior.

cc: Fred Knight for his opinion on this (NetApp was not one of the
arrays that I've run into, though).

-Ewan
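
As a rough illustration of the host-side workaround discussed in this thread, the sketch below clamps the WRITE SAME w/UNMAP length to MAXIMUM UNMAP LBA COUNT only for devices matched by a blacklist. The function names, the blacklist entries, and the choice to skip the clamp when MAXIMUM UNMAP LBA COUNT is zero are assumptions made for illustration; this is not the actual sd driver code.

```python
# Hypothetical blacklist of (vendor, model, firmware revision) tuples for
# devices that reject WRITE SAME w/UNMAP longer than MAXIMUM UNMAP LBA COUNT.
# The entries are made up for illustration.
QUIRKY_DEVICES = {
    ("ACME", "ARRAY-1", "1.00"),
    ("ACME", "ARRAY-1", "1.01"),
}


def is_quirky(vendor, model, rev):
    """Return True if this exact firmware revision is on the blacklist."""
    return (vendor, model, rev) in QUIRKY_DEVICES


def ws_unmap_limit(max_write_same_len, max_unmap_lba_count, quirky):
    """Return the largest block count to send in one WRITE SAME w/UNMAP.

    A return value of 0 means "no limit reported", mirroring how a zero
    MAXIMUM WRITE SAME LENGTH is described in the thread.
    """
    if not quirky or max_unmap_lba_count == 0:
        # Compliant device (or nothing to clamp to, an assumption here):
        # trust the field the spec assigns to WRITE SAME.
        return max_write_same_len
    if max_write_same_len == 0:
        # No WRITE SAME limit reported: the unmap count becomes the limit.
        return max_unmap_lba_count
    return min(max_write_same_len, max_unmap_lba_count)
```

Keying the blacklist on the firmware revision reflects the point made above: a fixed firmware would report correct limits and should not be clamped.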






RE: [Lsf] Notes from the four separate IO track sessions at LSF/MM

2016-04-28 Thread Knight, Frederick
There are multiple possible situations being intermixed in this discussion.  
First, I assume you're talking only about random access devices (if you try 
transport level error recovery on a sequential access device - tape or SMR disk 
- there are lots of additional complexities).

Failures can occur at multiple places:
a) Transport layer failures that the transport layer is able to detect quickly;
b) SCSI device layer failures that the transport layer never even knows about.

For (a) there are two competing goals.  If a port drops off the fabric and 
comes back again, should you be able to just recover and continue?  But how 
long do you wait during that drop?  Some devices use this technique to "move" a 
WWPN from one place to another.  The port drops from the fabric, and a short 
time later, shows up again (the WWPN moves from one physical port to a 
different physical port). There are FC driver layer timers that define the 
length of time allowed for this operation.  The goal is fast failover, but not 
too fast - because too fast will break this kind of "transparent failover".  
This timer also allows for the "OH crap, I pulled the wrong cable - put it back 
in; quick" kind of stupid user bug.
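
The trade-off in (a) can be sketched as a decision driven by how long the port has been gone. The two-timer model below borrows the names fast_io_fail_tmo (mentioned in the quoted message below) and dev_loss_tmo from the Linux FC transport class, but the function, the returned strings, and the boundary handling are assumptions for illustration, not driver code.

```python
def port_loss_action(seconds_gone, fast_io_fail_tmo, dev_loss_tmo):
    """Decide what to do with I/O for a remote port that dropped off the fabric."""
    if seconds_gone < fast_io_fail_tmo:
        # The port may be moving its WWPN, or a cable is being re-plugged:
        # hold I/O so a quick return is transparent.
        return "hold"
    if seconds_gone < dev_loss_tmo:
        # Waited long enough: fail outstanding I/O so another path can be used.
        return "fail_io"
    # The port never came back: give up on it entirely.
    return "remove_port"
```

A smaller fast_io_fail_tmo gives faster failover, at the risk of breaking the "transparent failover" case when the WWPN takes longer than that to reappear.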

For (b) the transport never has a failure.  A LUN (or a group of LUNs) has an 
ALUA transition from one set of ports to a different set of ports.  Some of the 
LUNs on the port continue to work just fine, but others enter ALUA TRANSITION 
state so they can "move" to a different part of the hardware.  After the move 
completes, you now have different sets of optimized and non-optimized paths (or 
possibly standby, or unavailable).  The transport will never even know this 
happened.  This kind of "failure" is handled by the SCSI layer drivers.
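
To make (b) concrete, here is a minimal sketch of per-LUN path selection by ALUA state. The state strings, the function name, and the policy of re-checking transitioning paths rather than failing them are assumptions for illustration; real multipath code is considerably more involved.

```python
# Usable ALUA access states, in order of preference.
USABLE_STATES = ["optimized", "non-optimized"]


def select_paths(paths):
    """Pick the paths to use for one LUN.

    `paths` maps a path name to its ALUA state string. Paths reporting
    "transitioning" are not failed; they are returned separately so the
    caller can re-check them once the move completes.
    """
    chosen = []
    for state in USABLE_STATES:
        chosen = sorted(p for p, s in paths.items() if s == state)
        if chosen:
            break
    recheck = sorted(p for p, s in paths.items() if s == "transitioning")
    return chosen, recheck
```

Note that nothing here depends on the transport: both ports stay logged in the whole time, and only the per-LUN state reported over SCSI changes.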

There are other cases too, but these are the most common.

Fred

-Original Message-
From: lsf-boun...@lists.linux-foundation.org 
[mailto:lsf-boun...@lists.linux-foundation.org] On Behalf Of Bart Van Assche
Sent: Thursday, April 28, 2016 11:54 AM
To: James Bottomley; Mike Snitzer
Cc: linux-bl...@vger.kernel.org; l...@lists.linux-foundation.org; device-mapper 
development; linux-scsi
Subject: Re: [Lsf] Notes from the four separate IO track sessions at LSF/MM

On 04/28/2016 08:40 AM, James Bottomley wrote:
> Well, the entire room, that's vendors, users and implementors
> complained that path failover takes far too long.  I think in their
> minds this is enough substance to go on.

The only complaints I heard about path failover taking too long came 
from people working on FC drivers. Aren't SCSI transport layer 
implementations expected to fail I/O after fast_io_fail_tmo expired 
instead of waiting until the SCSI error handler has finished? If so, why 
is it considered an issue that error handling for the FC protocol can 
take very long (hours)?

Thanks,

Bart.
___
Lsf mailing list
l...@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lsf
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: T10 adds locally assigned UUID designation descriptor

2016-02-08 Thread Knight, Frederick
To add a little more information, the reason for the "NO" votes was as follows:

If a storage device implements this TODAY - and the only unique identifier in 
VPD page 0x83 is the UUID identifier, then, any existing shipping host will not 
find any unique identifier that it recognizes.  That host could do any number 
of other things (including but not limited to):
1) prevent the device from being used;
2) enable only a SINGLE path to the device (do not allow MPIO to operate on a 
device for which it cannot find a unique ID);
3) enable MPIO to the device using the unique ID of "NONE".

Both 1 & 2 are workable situations.  BUT, #3 is a problem; not if you have just 
1 of these UUID only devices, but if you have a bunch of them, and the host 
incorrectly assumes they are all the SAME device, and it tries to do IO based 
on that assumption.
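
The difference between situations 2 and 3 can be shown with a small sketch of how a host might group paths into multipath devices. The designator names, the dictionary layout, and the "single:" key prefix are all hypothetical; the point is only that a path with no recognized identifier should never share a grouping key with another such path.

```python
# Designator types an older host is assumed to recognize (illustrative).
LEGACY_TYPES = ("NAA", "EUI")


def unique_id(designators, known_types=LEGACY_TYPES):
    """Return the first recognized identifier from VPD page 0x83, or None."""
    for kind in known_types:
        if kind in designators:
            return designators[kind]
    return None


def group_paths(paths, known_types=LEGACY_TYPES):
    """Group paths into multipath devices keyed by unique ID.

    `paths` maps a path name to its designator dict. A path with no
    recognized identifier gets its own single-path device (situation 2)
    instead of being merged under a shared "NONE" key (situation 3).
    """
    groups = {}
    for name, designators in sorted(paths.items()):
        uid = unique_id(designators, known_types)
        key = uid if uid is not None else "single:" + name
        groups.setdefault(key, []).append(name)
    return groups
```

With an older host, two UUID-only LUNs each end up as separate single-path devices; once "UUID" is added to known_types, they are grouped by their own identifiers like any other LUN.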

So that is the background.  The "NO" votes were based on the belief by those 
companies that situation #3 was a foregone conclusion, and they didn't want to 
add any new features to the storage until after the hosts added code to support 
those new features - which the hosts can't do until there are storage devices 
built (based on a standard) which they can use for testing - CATCH-22.

The "YES" votes were based on the assumption that storage would not be 
configured with ONLY the UUID value unless the storage manager knew that the 
host to which it would be connected could actually support a UUID only storage 
system.  A configuration of a UUID only storage and a host that does not 
support UUID only storage is a configuration error.  No different than a "thin 
provisioned" LUN being configured for use by a host that prohibits the use of 
thin provisioned LUNs. Basically it is assumed that initial deployments of UUID 
identifiers would be in conjunction with other (NAA/EUI/etc.) identifiers in 
page 0x83. Remember, real H/W vendors already own NAA and EUI values.  The 
primary creator of the UUID form will be S/W defined storage LUNs (as indicated 
in the preface material in the proposal), where there is no NAA or EUI 
available.

It simply goes back to the catch-22 - which comes first, the host support or 
the storage device support.  The solution is expected to show up in the next 
revision of the standard - there will be a temporary editor's note added 
indicating something along the lines of: a UUID only VPD page 0x83 should not 
be implemented in a storage device until it is known that the host supports 
such a configuration.  That note will be removed before final ANSI/ISO 
publication, but it will remain during the draft cycle.   At least, that is 
where the discussion ended up last I knew - we'll find out at the next meeting.

There was some minor discussion about the lack of uniqueness guarantees, but 
basically the committee said, you get what you get, and if you don't like it, 
don't use it.  You can also see that the data structure is already primed for 
the addition of a 32-byte UUID value (if/when anyone ever invents such a 
beast, we'll examine whether it too should be added).

So I hope that clarifies some of the background around the controversy.

Fred Knight


-Original Message-
From: Douglas Gilbert [mailto:dgilb...@interlog.com] 
Sent: Monday, February 08, 2016 3:04 PM
To: James Bottomley; SCSI development list
Cc: Knight, Frederick
Subject: Re: T10 adds locally assigned UUID designation descriptor

On 16-02-08 02:00 PM, James Bottomley wrote:
> On Mon, 2016-02-08 at 12:33 -0500, Douglas Gilbert wrote:
>> Recently, in draft spc5r08, T10 added a locally assigned RFC 4122
>> UUID *** designation descriptor. That descriptor can now be
>> returned for VPD page 0x83 (device identification) amongst others.
>> It can be used anywhere SCSI needs a unique identifier expanding
>> the previous set of preferred identifiers: EUI, NAA and SCSI_name
>> (iSCSI).
>>
>> In the soon to be released sg3_utils version 1.42 the new UUID
>> designation descriptor is decoded including Hannes' --export
>> option found in sg_inq, for example:
>>
>> # sg_inq --export /dev/sg0
>> ...
>> SCSI_IDENT_LUN_UUID=11223344-5566-7788-aabb-ccddeeee
>>
>> Perhaps some udev work is needed to incorporate this new identifier.
>
> Hm, we're going to have to do this carefully.  With the move to GPT
> partitions, both the UUID= designator in fstab and the /dev/disk/by
> -uuid/ of udev means the GPT UUID.  In theory the design of the UUID
> space is to allow random selection without clashing, so we could just
> place the SCSI ones in here as well and perhaps there won't be a
> problem, but I'd like us to think about the consequences first.

The UUID proposal (16-005r1 from Fred Knight and "Dr. Hannes Reinecke")
was somewhat controversial with five T10 members voting against it. The
mi

FW: [LSF/MM TOPIC] New Storage capabilities

2014-02-06 Thread Knight, Frederick
Several new features are becoming a reality in SCSI and ATA this year, and I 
would like to participate in the discussions on supporting these new features.

  a) SCSI conglomerate LUNs (using more bits in the LUN to manage groupings 
     of logical units);
  b) Atomic commands;
  c) IO and LBA HINTS (for both SCSI and ATA/IDE);
       a. For storage tiering;
       b. For cache management;
  d) FC - 128Gig parallel and breakout mode

I attend T10 (SCSI), T11 (FC), T13 (ATA/IDE), IETF (iSCSI), and SNIA and can 
provide expertise in the areas listed above as well as the topics covered in 
those standards committees.

Fred Knight