RE: Looking for some help understanding error handling

2018-10-19 Thread Chris.Moore


> -----Original Message-----
> From: linux-scsi-ow...@vger.kernel.org On Behalf Of John Garry
> Sent: Friday, October 19, 2018 2:19 AM
> To: Chris Moore - C33997; h...@suse.de; linux-scsi@vger.kernel.org;
> Jason Yan
> Subject: Re: Looking for some help understanding error handling
> 
> On 05/10/2018 16:51, chris.mo...@microchip.com wrote:
> > Thanks Hannes,
> >
> > After some pointers from Shane Seymour I found that the FC and SRP
> > transport layers have a devloss timer, so that when a device
> > disappears they hold on to the target information for a time waiting
> > to see if it comes back.  The SAS transport layer doesn't have that feature.
> >
> > The options for me then would be to modify scsi_transport_sas.c to
> > implement the devloss timeout, or to put that functionality into my LLDD.
> >
> > I'm willing to put the work into the SAS transport and libsas, but I
> > suspect there's not a universal need for it.  And since my LLDD is for
> > internal use at our company and won't be upstreamed, I'll probably
> > just do the work there.  If anyone feels that this is a feature that
> > more people would want then I'll look into doing that.
> 
> Hello,
> 
> This feature sounds interesting for libsas. However, I have a question
> on the feasibility of devloss here (note: I'm not familiar with the
> concept/realization for other standards): if a device is detached and
> re-attached, how can we confirm it is the same device? For a SAS device
> it's OK as a disk has the WWN, but what about SATA?
> 
> Thanks,
> John

Would the serial number work?  I haven't worked a lot with SATA drives, but
ATA8-ACS says the IDENTIFY DEVICE response must contain a unique serial
number.
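
A minimal sketch of what matching on the serial number could look like, assuming the 512-byte IDENTIFY DEVICE data is already in hand (how it is fetched is out of scope here). The function names are hypothetical. The word offsets are the standard ones: words 10-19 hold the serial number and words 27-46 the model number, with two ASCII characters per 16-bit word, first character in the high byte.

```python
def ata_string(identify: bytes, first_word: int, last_word: int) -> str:
    """Decode an ATA string field from raw IDENTIFY DEVICE data.

    Each 16-bit word carries two ASCII characters with the first one in
    the high byte, so every byte pair in the little-endian buffer has to
    be swapped before decoding.
    """
    raw = identify[first_word * 2:(last_word + 1) * 2]
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]
    swapped[1::2] = raw[0::2]
    # Fields are space padded; strip the padding on both sides.
    return swapped.decode("ascii", errors="replace").strip()


def drive_identity(identify: bytes) -> tuple:
    """Return (model, serial) as a candidate key for recognizing a drive."""
    if len(identify) != 512:
        raise ValueError("IDENTIFY DEVICE data is 256 words (512 bytes)")
    serial = ata_string(identify, 10, 19)   # words 10-19: serial number
    model = ata_string(identify, 27, 46)    # words 27-46: model number
    return model, serial
```

Pairing the serial with the model number is a design choice in this sketch, not something the thread settles; it only narrows the match in case a serial alone turns out not to be unique across vendors.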

Chris




RE: Looking for some help understanding error handling

2018-10-05 Thread Chris.Moore
Thanks Hannes,

After some pointers from Shane Seymour I found that the FC and SRP transport
layers have a devloss timer, so that when a device disappears they hold on to
the target information for a time waiting to see if it comes back.  The SAS
transport layer doesn't have that feature.

The options for me then would be to modify scsi_transport_sas.c to implement
the devloss timeout, or to put that functionality into my LLDD.
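
To make the devloss idea concrete, here is a rough user-space model of the bookkeeping, not kernel code. The names `DevlossTable` and `devloss_tmo` are hypothetical, time is passed in explicitly instead of using a real timer, and the identity key is assumed to be something stable such as a WWN.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HeldTarget:
    target_id: int
    lost_at: Optional[float] = None   # None while the device is present


class DevlossTable:
    """Toy model: keep a lost target's id around for devloss_tmo seconds."""

    def __init__(self, devloss_tmo: float):
        self.devloss_tmo = devloss_tmo
        self.targets: Dict[str, HeldTarget] = {}
        self.next_id = 0

    def device_lost(self, identity: str, now: float) -> None:
        # Mark the target as lost but keep its entry for now.
        t = self.targets.get(identity)
        if t is not None and t.lost_at is None:
            t.lost_at = now

    def expire(self, now: float) -> None:
        # Drop every target that has been gone for the full timeout.
        for ident, t in list(self.targets.items()):
            if t.lost_at is not None and now - t.lost_at >= self.devloss_tmo:
                del self.targets[ident]

    def device_found(self, identity: str, now: float) -> int:
        # A device returning inside the window gets its old id back;
        # otherwise it is treated as new and takes the next counter value.
        self.expire(now)
        t = self.targets.get(identity)
        if t is None:
            t = HeldTarget(self.next_id)
            self.next_id += 1
            self.targets[identity] = t
        t.lost_at = None
        return t.target_id
```

With a 30 second timeout, a device lost at t=10 and found again at t=20 keeps its target id, while one that stays away past the timeout comes back with a new id, which is the behaviour seen today without the timer.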

I'm willing to put the work into the SAS transport and libsas, but I suspect
there's not a universal need for it.  And since my LLDD is for internal use at
our company and won't be upstreamed, I'll probably just do the work there.  If
anyone feels that this is a feature that more people would want then I'll look
into doing that.

Thanks,
Chris

> -----Original Message-----
> From: Hannes Reinecke [mailto:h...@suse.de]
> Sent: Friday, October 5, 2018 8:01 AM
> To: Chris Moore - C33997; linux-s...@vger.kernel.org
> Subject: Re: Looking for some help understanding error handling
> 
> On 10/2/18 11:04 PM, chris.mo...@microchip.com wrote:
> > I'm working on an LLDD for a SAS/SATA host adapter, and trying to
> > understand how the system handles link loss and recovery.
> >
> > Say I have a device that gets recognized and attached as sd 12:0:4:0,
> > at /dev/sdb.
> > The drive goes offline temporarily, then comes back online.
> > When it does, it comes back as sd 12:0:5:0, and maybe /dev/sdb, maybe
> > /dev/sdc.
> >
> > I'm not sure how the Id gets assigned.  Since this is the same drive,
> > is there some way my driver can tell libsas and/or SCSI core that it's
> > the same drive coming back?
> > Or is there no way to control that?
> >
> Not really. The target device is destroyed once the device disconnects,
> and when it reconnects a new structure is allocated. Because the target
> number is a simple counter, it is incremented on each allocation.
> 
> > I looked into /dev/disk/by-id, but that also didn't quite do what I
> > expected.  If I open /dev/disk/by-id/some_identifier, that's a symlink
> > to, say, /dev/sdb.
> 
> Yes.
> 
> > /dev/sdb goes away, comes back as /dev/sdc, but my process doesn't
> > know that, it still has /dev/disk/by-id/some_identifier opened and so
> > it will never recover without closing and reopening the file.
> >
> Simply don't keep hold of the symlink; once you have opened it you'll
> miss any updates to the symlink itself.
> So better to open the symlink, check the device, do whatever needs to be
> done, and _close the symlink_ again.
> Then you can listen for udev events telling you when a device appears or
> vanishes.
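
The open / work / close pattern described above can be sketched as follows. This is only an illustration: `with_device` and the `work` callback are hypothetical names, and the resolved path is informational, since the symlink could in principle change between resolving it and opening it.

```python
import os


def with_device(link_path, work):
    """Open link_path fresh, run work(fd, resolved_path), and always close.

    Because the symlink is followed again on every call, a device that
    went away and came back under a different node is picked up the next
    time this runs.
    """
    resolved = os.path.realpath(link_path)   # where the symlink points now
    fd = os.open(link_path, os.O_RDONLY)
    try:
        return work(fd, resolved)
    finally:
        os.close(fd)                          # never hold the node open
```

A caller would invoke this once per operation, or again whenever a udev add/remove event for the device arrives, instead of keeping one long-lived file descriptor.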
> 
> Cheers,
> 
> Hannes
> --
> Dr. Hannes Reinecke  Teamlead Storage & Networking
> h...@suse.de +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284
> (AG Nürnberg)


Looking for some help understanding error handling

2018-10-02 Thread Chris.Moore
I'm working on an LLDD for a SAS/SATA host adapter, and trying to understand
how the system handles link loss and recovery.

Say I have a device that gets recognized and attached as sd 12:0:4:0, at 
/dev/sdb.
The drive goes offline temporarily, then comes back online.
When it does, it comes back as sd 12:0:5:0, and maybe /dev/sdb, maybe /dev/sdc.

I'm not sure how the Id gets assigned.  Since this is the same drive, is there 
some way my driver can tell libsas and/or SCSI core that it's the same drive 
coming back?
Or is there no way to control that?

I looked into /dev/disk/by-id, but that also didn't quite do what I expected.
If I open /dev/disk/by-id/some_identifier, that's a symlink to, say, /dev/sdb.
/dev/sdb goes away and comes back as /dev/sdc, but my process doesn't know
that: it still has /dev/disk/by-id/some_identifier open, so it will never
recover without closing and reopening the file.

Thanks for any help or insight you can give.

Chris Moore