RE: [RFC] hv_storvsc: error handling.

2017-03-07 Thread KY Srinivasan


> -Original Message-
> From: Christoph Hellwig [mailto:h...@lst.de]
> Sent: Monday, March 6, 2017 9:06 PM
> To: KY Srinivasan <k...@microsoft.com>
> Cc: Stephen Hemminger <step...@networkplumber.org>; James
> Bottomley <james.bottom...@hansenpartnership.com>; Hannes Reinecke
> <h...@suse.de>; Christoph Hellwig <h...@lst.de>; James Bottomley
> <j...@linux.vnet.ibm.com>; Jens Axboe <ax...@kernel.dk>; Linus Torvalds
> <torva...@linux-foundation.org>; Martin K. Petersen
> <martin.peter...@oracle.com>; Dexuan Cui <de...@microsoft.com>; Long
> Li <lon...@microsoft.com>; Josh Poulson <jopou...@microsoft.com>; Adrian
> Suhov (Cloudbase Solutions SRL) <v-ads...@microsoft.com>; linux-
> s...@vger.kernel.org; Haiyang Zhang <haiya...@microsoft.com>
> Subject: Re: [RFC] hv_storvsc: error handling.
> 
> > Was the invalid LUN in the LUN0 position. Inquiry of LUN0 support (when
> LUN0 is not populated)
> > was added only recently to address host side issue.
> 
> How does HyperV expect device scanning to happen for a not populated
> LUN?
> 
> REPORT SUPPORTED LUNS but nothing else on LUN 0?  Maybe a TEST UNIT
> READY
> thrown?  Or does it actually support the REPORT LUNS well known LU?

LUN0 on every target is supposed at least return enough data to support
report luns  scan for the target. It looks like if LUN0 on a target is empty DVD
device, the INQUIRY data that is returned from the host is bogus. Stephen can 
give
additional information on this.

K. Y  


Re: [RFC] hv_storvsc: error handling.

2017-03-07 Thread Christoph Hellwig
> Was the invalid LUN in the LUN0 position. Inquiry of LUN0 support (when LUN0 
> is not populated)
> was added only recently to address host side issue.

How does HyperV expect device scanning to happen for a not populated LUN?

REPORT SUPPORTED LUNS but nothing else on LUN 0?  Maybe a TEST UNIT READY
thrown?  Or does it actually support the REPORT LUNS well known LU?


Re: [RFC] hv_storvsc: error handling.

2017-03-06 Thread Stephen Hemminger

> > I will try it, but it can't work for two reasons.
> > First, the INVALID_LUN error is masked off on INQUIRY in current code.
> > Second, the scsi_device is instantiated already as part of scan probe 
> > process
> > before it gets here.  
> 
> Was the invalid LUN in the LUN0 position. Inquiry of LUN0 support (when LUN0 
> is not populated)
> was added only recently to address host side issue.

When probing the code probes with LUN 1, ...

There is a cause where kernel does INQUIRY on LUN0, it looks kernel is asking 
for
page code 80 which is optional "Unit Serial Number".  And then WS2016 is 
returning
an error and invalid sense data.  The old masking of errors caused kernel to
interpret sense data as Unit Serial Number which is also not good but looks 
harmless.

 
> > The best solution so far is:
> > - remove old INQUIRY/SENSE error masking
> > + add new workaround for INQUIRY of device id on LUN 0
> >   which appears to be the reason for old masking
> > + return errors on missing LUN
> > + provide better transport services for hot remove (rather
> >   than detecting by failed I/O).  
> 
> This the mechanism used by the host for notifying LUN removal - invalid LUN 
> error code.

This has a couple of problems. First, it means that hotplug doesn't occur until
an I/O is done. Second the current code was not being truthful to block layer.
If it has to handle hotplug in this manner, it should have still failed the I/O.
If application was using direct I/O it would want to know that write failed.

Perhaps the existing channel mechanism can be used as notification path.



RE: [RFC] hv_storvsc: error handling.

2017-03-06 Thread KY Srinivasan


> -Original Message-
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Monday, March 6, 2017 8:36 AM
> To: KY Srinivasan <k...@microsoft.com>
> Cc: James Bottomley <james.bottom...@hansenpartnership.com>; Hannes
> Reinecke <h...@suse.de>; Christoph Hellwig <h...@lst.de>; James
> Bottomley <j...@linux.vnet.ibm.com>; Jens Axboe <ax...@kernel.dk>;
> Linus Torvalds <torva...@linux-foundation.org>; Martin K. Petersen
> <martin.peter...@oracle.com>; Dexuan Cui <de...@microsoft.com>; Long
> Li <lon...@microsoft.com>; Josh Poulson <jopou...@microsoft.com>; Adrian
> Suhov (Cloudbase Solutions SRL) <v-ads...@microsoft.com>; linux-
> s...@vger.kernel.org; Haiyang Zhang <haiya...@microsoft.com>
> Subject: Re: [RFC] hv_storvsc: error handling.
> 
> On Sat, 4 Mar 2017 21:03:41 +
> KY Srinivasan <k...@microsoft.com> wrote:
> 
> > > -Original Message-
> > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > Sent: Friday, March 3, 2017 4:50 PM
> > > To: James Bottomley <james.bottom...@hansenpartnership.com>
> > > Cc: Hannes Reinecke <h...@suse.de>; Christoph Hellwig <h...@lst.de>;
> > > James Bottomley <j...@linux.vnet.ibm.com>; Jens Axboe
> > > <ax...@kernel.dk>; Linus Torvalds <torva...@linux-foundation.org>;
> > > Martin K. Petersen <martin.peter...@oracle.com>; KY Srinivasan
> > > <k...@microsoft.com>; Dexuan Cui <de...@microsoft.com>; Long Li
> > > <lon...@microsoft.com>; Josh Poulson <jopou...@microsoft.com>;
> Adrian
> > > Suhov (Cloudbase Solutions SRL) <v-ads...@microsoft.com>; linux-
> > > s...@vger.kernel.org; Haiyang Zhang <haiya...@microsoft.com>
> > > Subject: [RFC] hv_storvsc: error handling.
> > >
> > > Needs more testing but this does fix the observed problem.
> > >
> > > From: Stephen Hemminger <sthem...@microsoft.com>
> > >
> > > Subject: [PATCH] hv_storvsc: fix error handling
> > >
> > > The Hyper-V storvsc SCSI driver was hiding all errors in INQUIRY and
> > > MODE_SENSE commands. This caused the scan process to incorrectly
> think
> > > devices were present and online.  Also invalid LUN errors were not
> > > being handled correctly.
> > >
> > > This fixes problems booting a GEN2 VM on Hyper-V. It effectively
> > > reverts commit 4ed51a21c0f69 ("Staging: hv: storvsc: Fixup
> > > srb and scsi status for INQUIRY and MODE_SENSE")
> > >
> > > Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
> > > ---
> > >  drivers/scsi/storvsc_drv.c | 48 
> > > --
> > >  1 file changed, 4 insertions(+), 44 deletions(-)
> > >
> > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > > index 638e5f427c90..8cc241fc54b8 100644
> > > --- a/drivers/scsi/storvsc_drv.c
> > > +++ b/drivers/scsi/storvsc_drv.c
> > > @@ -543,28 +543,6 @@ static void storvsc_host_scan(struct work_struct
> > > *work)
> > >   kfree(wrk);
> > >  }
> > >
> > > -static void storvsc_remove_lun(struct work_struct *work)
> > > -{
> > > - struct storvsc_scan_work *wrk;
> > > - struct scsi_device *sdev;
> > > -
> > > - wrk = container_of(work, struct storvsc_scan_work, work);
> > > - if (!scsi_host_get(wrk->host))
> > > - goto done;
> > > -
> > > - sdev = scsi_device_lookup(wrk->host, 0, wrk->tgt_id, wrk->lun);
> > > -
> > > - if (sdev) {
> > > - scsi_remove_device(sdev);
> > > - scsi_device_put(sdev);
> > > - }
> > > - scsi_host_put(wrk->host);
> > > -
> > > -done:
> > > - kfree(wrk);
> > > -}
> > > -
> > > -
> > >  /*
> > >   * We can get incoming messages from the host that are not in response
> to
> > >   * messages that we have sent out. An example of this would be
> messages
> > > @@ -955,8 +933,7 @@ static void storvsc_handle_error(struct
> > > vmscsi_request *vm_srb,
> > >   }
> > >   break;
> > >   case SRB_STATUS_INVALID_LUN:
> > > - do_work = true;
> > > - process_err_fn = storvsc_remove_lun;
> > > + set_host_byte(scmnd, DID_NO_CONNECT);
> > >   break;
> > >   case SRB_STATUS_ABO

Re: [RFC] hv_storvsc: error handling.

2017-03-06 Thread Stephen Hemminger
On Sat, 4 Mar 2017 21:03:41 +
KY Srinivasan  wrote:

> > -Original Message-
> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > Sent: Friday, March 3, 2017 4:50 PM
> > To: James Bottomley 
> > Cc: Hannes Reinecke ; Christoph Hellwig ;
> > James Bottomley ; Jens Axboe
> > ; Linus Torvalds ;
> > Martin K. Petersen ; KY Srinivasan
> > ; Dexuan Cui ; Long Li
> > ; Josh Poulson ; Adrian
> > Suhov (Cloudbase Solutions SRL) ; linux-
> > s...@vger.kernel.org; Haiyang Zhang 
> > Subject: [RFC] hv_storvsc: error handling.
> > 
> > Needs more testing but this does fix the observed problem.
> > 
> > From: Stephen Hemminger 
> > 
> > Subject: [PATCH] hv_storvsc: fix error handling
> > 
> > The Hyper-V storvsc SCSI driver was hiding all errors in INQUIRY and
> > MODE_SENSE commands. This caused the scan process to incorrectly think
> > devices were present and online.  Also invalid LUN errors were not
> > being handled correctly.
> > 
> > This fixes problems booting a GEN2 VM on Hyper-V. It effectively
> > reverts commit 4ed51a21c0f69 ("Staging: hv: storvsc: Fixup
> > srb and scsi status for INQUIRY and MODE_SENSE")
> > 
> > Signed-off-by: Stephen Hemminger 
> > ---
> >  drivers/scsi/storvsc_drv.c | 48 
> > --
> >  1 file changed, 4 insertions(+), 44 deletions(-)
> > 
> > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > index 638e5f427c90..8cc241fc54b8 100644
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -543,28 +543,6 @@ static void storvsc_host_scan(struct work_struct
> > *work)
> > kfree(wrk);
> >  }
> > 
> > -static void storvsc_remove_lun(struct work_struct *work)
> > -{
> > -   struct storvsc_scan_work *wrk;
> > -   struct scsi_device *sdev;
> > -
> > -   wrk = container_of(work, struct storvsc_scan_work, work);
> > -   if (!scsi_host_get(wrk->host))
> > -   goto done;
> > -
> > -   sdev = scsi_device_lookup(wrk->host, 0, wrk->tgt_id, wrk->lun);
> > -
> > -   if (sdev) {
> > -   scsi_remove_device(sdev);
> > -   scsi_device_put(sdev);
> > -   }
> > -   scsi_host_put(wrk->host);
> > -
> > -done:
> > -   kfree(wrk);
> > -}
> > -
> > -
> >  /*
> >   * We can get incoming messages from the host that are not in response to
> >   * messages that we have sent out. An example of this would be messages
> > @@ -955,8 +933,7 @@ static void storvsc_handle_error(struct
> > vmscsi_request *vm_srb,
> > }
> > break;
> > case SRB_STATUS_INVALID_LUN:
> > -   do_work = true;
> > -   process_err_fn = storvsc_remove_lun;
> > +   set_host_byte(scmnd, DID_NO_CONNECT);
> > break;
> > case SRB_STATUS_ABORTED:
> > if (vm_srb->srb_status & SRB_STATUS_AUTOSENSE_VALID
> > &&
> > @@ -1050,32 +1027,15 @@ static void storvsc_on_io_completion(struct
> > storvsc_device *stor_device,
> > 
> > stor_pkt = >vstor_packet;
> > 
> > -   /*
> > -* The current SCSI handling on the host side does
> > -* not correctly handle:
> > -* INQUIRY command with page code parameter set to 0x80
> > -* MODE_SENSE command with cmd[2] == 0x1c
> > -*
> > -* Setup srb and scsi status so this won't be fatal.
> > -* We do this so we can distinguish truly fatal failues
> > -* (srb status == 0x4) and off-line the device in that case.
> > -*/
> > -
> > -   if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> > -  (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
> > -   vstor_packet->vm_srb.scsi_status = 0;
> > -   vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
> > -   }
> > -
> > -
> > /* Copy over the status...etc */
> > stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
> > stor_pkt->vm_srb.srb_status = vstor_packet->vm_srb.srb_status;
> > stor_pkt->vm_srb.sense_info_length =
> > vstor_packet->vm_srb.sense_info_length;
> > 
> > -   if (vstor_packet->vm_srb.scsi_status != 0 ||
> > -   vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS)
> > +   if (stor_pkt->vm_srb.cdb[0] != INQUIRY &&
> > +   (vstor_packet->vm_srb.scsi_status != 0 ||
> > +vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS))
> > storvsc_log(device, STORVSC_LOGGING_WARN,
> > "cmd 0x%x scsi status 0x%x srb status 0x%x\n",
> > stor_pkt->vm_srb.cdb[0],
> > --  
> 
> This patch gets rid of the ability to "hot remove" LUNs. I don't think that 
> can be part of any
> solution. The INQUIRY hack I put in a long time ago was to deal with host 
> bugs on prior 

RE: [RFC] hv_storvsc: error handling.

2017-03-04 Thread KY Srinivasan


> -Original Message-
> From: KY Srinivasan
> Sent: Saturday, March 4, 2017 1:40 PM
> To: 'James Bottomley' <j...@linux.vnet.ibm.com>; Stephen Hemminger
> <step...@networkplumber.org>
> Cc: Hannes Reinecke <h...@suse.de>; Christoph Hellwig <h...@lst.de>; Jens
> Axboe <ax...@kernel.dk>; Linus Torvalds <torvalds@linux-
> foundation.org>; Martin K. Petersen <martin.peter...@oracle.com>;
> Dexuan Cui <de...@microsoft.com>; Long Li <lon...@microsoft.com>; Josh
> Poulson <jopou...@microsoft.com>; Adrian Suhov (Cloudbase Solutions SRL)
> <v-ads...@microsoft.com>; linux-scsi@vger.kernel.org; Haiyang Zhang
> <haiya...@microsoft.com>
> Subject: RE: [RFC] hv_storvsc: error handling.
> 
> 
> 
> > -Original Message-
> > From: James Bottomley [mailto:j...@linux.vnet.ibm.com]
> > Sent: Saturday, March 4, 2017 1:37 PM
> > To: KY Srinivasan <k...@microsoft.com>; Stephen Hemminger
> > <step...@networkplumber.org>
> > Cc: Hannes Reinecke <h...@suse.de>; Christoph Hellwig <h...@lst.de>;
> Jens
> > Axboe <ax...@kernel.dk>; Linus Torvalds <torvalds@linux-
> > foundation.org>; Martin K. Petersen <martin.peter...@oracle.com>;
> > Dexuan Cui <de...@microsoft.com>; Long Li <lon...@microsoft.com>; Josh
> > Poulson <jopou...@microsoft.com>; Adrian Suhov (Cloudbase Solutions
> SRL)
> > <v-ads...@microsoft.com>; linux-scsi@vger.kernel.org; Haiyang Zhang
> > <haiya...@microsoft.com>
> > Subject: Re: [RFC] hv_storvsc: error handling.
> >
> > On Sat, 2017-03-04 at 21:03 +, KY Srinivasan wrote:
> > >
> > > > -Original Message-
> > > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > > Sent: Friday, March 3, 2017 4:50 PM
> > > > To: James Bottomley <james.bottom...@hansenpartnership.com>
> > > > Cc: Hannes Reinecke <h...@suse.de>; Christoph Hellwig
> <h...@lst.de>;
> > > > James Bottomley <j...@linux.vnet.ibm.com>; Jens Axboe
> > > > <ax...@kernel.dk>; Linus Torvalds <torva...@linux-foundation.org>;
> > > > Martin K. Petersen <martin.peter...@oracle.com>; KY Srinivasan
> > > > <k...@microsoft.com>; Dexuan Cui <de...@microsoft.com>; Long Li
> > > > <lon...@microsoft.com>; Josh Poulson <jopou...@microsoft.com>;
> > > > Adrian
> > > > Suhov (Cloudbase Solutions SRL) <v-ads...@microsoft.com>; linux-
> > > > s...@vger.kernel.org; Haiyang Zhang <haiya...@microsoft.com>
> > > > Subject: [RFC] hv_storvsc: error handling.
> > > >
> > > > Needs more testing but this does fix the observed problem.
> > > >
> > > > From: Stephen Hemminger <sthem...@microsoft.com>
> > > >
> > > > Subject: [PATCH] hv_storvsc: fix error handling
> > > >
> > > > The Hyper-V storvsc SCSI driver was hiding all errors in INQUIRY
> > > > and
> > > > MODE_SENSE commands. This caused the scan process to incorrectly
> > > > think
> > > > devices were present and online.  Also invalid LUN errors were not
> > > > being handled correctly.
> > > >
> > > > This fixes problems booting a GEN2 VM on Hyper-V. It effectively
> > > > reverts commit 4ed51a21c0f69 ("Staging: hv: storvsc: Fixup
> > > > srb and scsi status for INQUIRY and MODE_SENSE")
> > > >
> > > > Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
> > > > ---
> > > >  drivers/scsi/storvsc_drv.c | 48 --
> > > > 
> > > >  1 file changed, 4 insertions(+), 44 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/storvsc_drv.c
> > > > b/drivers/scsi/storvsc_drv.c
> > > > index 638e5f427c90..8cc241fc54b8 100644
> > > > --- a/drivers/scsi/storvsc_drv.c
> > > > +++ b/drivers/scsi/storvsc_drv.c
> > > > @@ -543,28 +543,6 @@ static void storvsc_host_scan(struct
> > > > work_struct
> > > > *work)
> > > > kfree(wrk);
> > > >  }
> > > >
> > > > -static void storvsc_remove_lun(struct work_struct *work)
> > > > -{
> > > > -   struct storvsc_scan_work *wrk;
> > > > -   struct scsi_device *sdev;
> > > > -
> > > > -   wrk = container_of(work, struct storvsc_scan_work, work);
> >

RE: [RFC] hv_storvsc: error handling.

2017-03-04 Thread KY Srinivasan


> -Original Message-
> From: James Bottomley [mailto:j...@linux.vnet.ibm.com]
> Sent: Saturday, March 4, 2017 1:37 PM
> To: KY Srinivasan <k...@microsoft.com>; Stephen Hemminger
> <step...@networkplumber.org>
> Cc: Hannes Reinecke <h...@suse.de>; Christoph Hellwig <h...@lst.de>; Jens
> Axboe <ax...@kernel.dk>; Linus Torvalds <torvalds@linux-
> foundation.org>; Martin K. Petersen <martin.peter...@oracle.com>;
> Dexuan Cui <de...@microsoft.com>; Long Li <lon...@microsoft.com>; Josh
> Poulson <jopou...@microsoft.com>; Adrian Suhov (Cloudbase Solutions SRL)
> <v-ads...@microsoft.com>; linux-scsi@vger.kernel.org; Haiyang Zhang
> <haiya...@microsoft.com>
> Subject: Re: [RFC] hv_storvsc: error handling.
> 
> On Sat, 2017-03-04 at 21:03 +, KY Srinivasan wrote:
> >
> > > -Original Message-
> > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > Sent: Friday, March 3, 2017 4:50 PM
> > > To: James Bottomley <james.bottom...@hansenpartnership.com>
> > > Cc: Hannes Reinecke <h...@suse.de>; Christoph Hellwig <h...@lst.de>;
> > > James Bottomley <j...@linux.vnet.ibm.com>; Jens Axboe
> > > <ax...@kernel.dk>; Linus Torvalds <torva...@linux-foundation.org>;
> > > Martin K. Petersen <martin.peter...@oracle.com>; KY Srinivasan
> > > <k...@microsoft.com>; Dexuan Cui <de...@microsoft.com>; Long Li
> > > <lon...@microsoft.com>; Josh Poulson <jopou...@microsoft.com>;
> > > Adrian
> > > Suhov (Cloudbase Solutions SRL) <v-ads...@microsoft.com>; linux-
> > > s...@vger.kernel.org; Haiyang Zhang <haiya...@microsoft.com>
> > > Subject: [RFC] hv_storvsc: error handling.
> > >
> > > Needs more testing but this does fix the observed problem.
> > >
> > > From: Stephen Hemminger <sthem...@microsoft.com>
> > >
> > > Subject: [PATCH] hv_storvsc: fix error handling
> > >
> > > The Hyper-V storvsc SCSI driver was hiding all errors in INQUIRY
> > > and
> > > MODE_SENSE commands. This caused the scan process to incorrectly
> > > think
> > > devices were present and online.  Also invalid LUN errors were not
> > > being handled correctly.
> > >
> > > This fixes problems booting a GEN2 VM on Hyper-V. It effectively
> > > reverts commit 4ed51a21c0f69 ("Staging: hv: storvsc: Fixup
> > > srb and scsi status for INQUIRY and MODE_SENSE")
> > >
> > > Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
> > > ---
> > >  drivers/scsi/storvsc_drv.c | 48 --
> > > 
> > >  1 file changed, 4 insertions(+), 44 deletions(-)
> > >
> > > diff --git a/drivers/scsi/storvsc_drv.c
> > > b/drivers/scsi/storvsc_drv.c
> > > index 638e5f427c90..8cc241fc54b8 100644
> > > --- a/drivers/scsi/storvsc_drv.c
> > > +++ b/drivers/scsi/storvsc_drv.c
> > > @@ -543,28 +543,6 @@ static void storvsc_host_scan(struct
> > > work_struct
> > > *work)
> > >   kfree(wrk);
> > >  }
> > >
> > > -static void storvsc_remove_lun(struct work_struct *work)
> > > -{
> > > - struct storvsc_scan_work *wrk;
> > > - struct scsi_device *sdev;
> > > -
> > > - wrk = container_of(work, struct storvsc_scan_work, work);
> > > - if (!scsi_host_get(wrk->host))
> > > - goto done;
> > > -
> > > - sdev = scsi_device_lookup(wrk->host, 0, wrk->tgt_id, wrk
> > > ->lun);
> > > -
> > > - if (sdev) {
> > > - scsi_remove_device(sdev);
> > > - scsi_device_put(sdev);
> > > - }
> > > - scsi_host_put(wrk->host);
> > > -
> > > -done:
> > > - kfree(wrk);
> > > -}
> > > -
> > > -
> > >  /*
> > >   * We can get incoming messages from the host that are not in
> > > response to
> > >   * messages that we have sent out. An example of this would be
> > > messages
> > > @@ -955,8 +933,7 @@ static void storvsc_handle_error(struct
> > > vmscsi_request *vm_srb,
> > >   }
> > >   break;
> > >   case SRB_STATUS_INVALID_LUN:
> > > - do_work = true;
> > > - process_err_fn = storvsc_remove_lun;
> > > + set_host_byte(scmnd, DID_NO_CONNECT);
> > >   break;
> > >   case SRB_

Re: [RFC] hv_storvsc: error handling.

2017-03-04 Thread James Bottomley
On Sat, 2017-03-04 at 21:03 +, KY Srinivasan wrote:
> 
> > -Original Message-
> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > Sent: Friday, March 3, 2017 4:50 PM
> > To: James Bottomley 
> > Cc: Hannes Reinecke ; Christoph Hellwig ;
> > James Bottomley ; Jens Axboe
> > ; Linus Torvalds ;
> > Martin K. Petersen ; KY Srinivasan
> > ; Dexuan Cui ; Long Li
> > ; Josh Poulson ;
> > Adrian
> > Suhov (Cloudbase Solutions SRL) ; linux-
> > s...@vger.kernel.org; Haiyang Zhang 
> > Subject: [RFC] hv_storvsc: error handling.
> > 
> > Needs more testing but this does fix the observed problem.
> > 
> > From: Stephen Hemminger 
> > 
> > Subject: [PATCH] hv_storvsc: fix error handling
> > 
> > The Hyper-V storvsc SCSI driver was hiding all errors in INQUIRY
> > and
> > MODE_SENSE commands. This caused the scan process to incorrectly
> > think
> > devices were present and online.  Also invalid LUN errors were not
> > being handled correctly.
> > 
> > This fixes problems booting a GEN2 VM on Hyper-V. It effectively
> > reverts commit 4ed51a21c0f69 ("Staging: hv: storvsc: Fixup
> > srb and scsi status for INQUIRY and MODE_SENSE")
> > 
> > Signed-off-by: Stephen Hemminger 
> > ---
> >  drivers/scsi/storvsc_drv.c | 48 --
> > 
> >  1 file changed, 4 insertions(+), 44 deletions(-)
> > 
> > diff --git a/drivers/scsi/storvsc_drv.c
> > b/drivers/scsi/storvsc_drv.c
> > index 638e5f427c90..8cc241fc54b8 100644
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -543,28 +543,6 @@ static void storvsc_host_scan(struct
> > work_struct
> > *work)
> > kfree(wrk);
> >  }
> > 
> > -static void storvsc_remove_lun(struct work_struct *work)
> > -{
> > -   struct storvsc_scan_work *wrk;
> > -   struct scsi_device *sdev;
> > -
> > -   wrk = container_of(work, struct storvsc_scan_work, work);
> > -   if (!scsi_host_get(wrk->host))
> > -   goto done;
> > -
> > -   sdev = scsi_device_lookup(wrk->host, 0, wrk->tgt_id, wrk
> > ->lun);
> > -
> > -   if (sdev) {
> > -   scsi_remove_device(sdev);
> > -   scsi_device_put(sdev);
> > -   }
> > -   scsi_host_put(wrk->host);
> > -
> > -done:
> > -   kfree(wrk);
> > -}
> > -
> > -
> >  /*
> >   * We can get incoming messages from the host that are not in
> > response to
> >   * messages that we have sent out. An example of this would be
> > messages
> > @@ -955,8 +933,7 @@ static void storvsc_handle_error(struct
> > vmscsi_request *vm_srb,
> > }
> > break;
> > case SRB_STATUS_INVALID_LUN:
> > -   do_work = true;
> > -   process_err_fn = storvsc_remove_lun;
> > +   set_host_byte(scmnd, DID_NO_CONNECT);
> > break;
> > case SRB_STATUS_ABORTED:
> > if (vm_srb->srb_status &
> > SRB_STATUS_AUTOSENSE_VALID
> > &&
> > @@ -1050,32 +1027,15 @@ static void storvsc_on_io_completion(struct
> > storvsc_device *stor_device,
> > 
> > stor_pkt = >vstor_packet;
> > 
> > -   /*
> > -* The current SCSI handling on the host side does
> > -* not correctly handle:
> > -* INQUIRY command with page code parameter set to 0x80
> > -* MODE_SENSE command with cmd[2] == 0x1c
> > -*
> > -* Setup srb and scsi status so this won't be fatal.
> > -* We do this so we can distinguish truly fatal failues
> > -* (srb status == 0x4) and off-line the device in that
> > case.
> > -*/
> > -
> > -   if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> > -  (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
> > -   vstor_packet->vm_srb.scsi_status = 0;
> > -   vstor_packet->vm_srb.srb_status =
> > SRB_STATUS_SUCCESS;
> > -   }
> > -
> > -
> > /* Copy over the status...etc */
> > stor_pkt->vm_srb.scsi_status = vstor_packet
> > ->vm_srb.scsi_status;
> > stor_pkt->vm_srb.srb_status = vstor_packet
> > ->vm_srb.srb_status;
> > stor_pkt->vm_srb.sense_info_length =
> > vstor_packet->vm_srb.sense_info_length;
> > 
> > -   if (vstor_packet->vm_srb.scsi_status != 0 ||
> > -   vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS)
> > +   if (stor_pkt->vm_srb.cdb[0] != INQUIRY &&
> > +   (vstor_packet->vm_srb.scsi_status != 0 ||
> > +vstor_packet->vm_srb.srb_status !=
> > SRB_STATUS_SUCCESS))
> > storvsc_log(device, STORVSC_LOGGING_WARN,
> > "cmd 0x%x scsi status 0x%x srb status
> > 0x%x\n",
> > stor_pkt->vm_srb.cdb[0],
> > --
> 
> This patch gets rid of the ability to "hot remove" LUNs. I don't
> think that can be part of any
> solution. The INQUIRY hack I put in a long time ago 

RE: [RFC] hv_storvsc: error handling.

2017-03-04 Thread KY Srinivasan


> -Original Message-
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Friday, March 3, 2017 4:50 PM
> To: James Bottomley 
> Cc: Hannes Reinecke ; Christoph Hellwig ;
> James Bottomley ; Jens Axboe
> ; Linus Torvalds ;
> Martin K. Petersen ; KY Srinivasan
> ; Dexuan Cui ; Long Li
> ; Josh Poulson ; Adrian
> Suhov (Cloudbase Solutions SRL) ; linux-
> s...@vger.kernel.org; Haiyang Zhang 
> Subject: [RFC] hv_storvsc: error handling.
> 
> Needs more testing but this does fix the observed problem.
> 
> From: Stephen Hemminger 
> 
> Subject: [PATCH] hv_storvsc: fix error handling
> 
> The Hyper-V storvsc SCSI driver was hiding all errors in INQUIRY and
> MODE_SENSE commands. This caused the scan process to incorrectly think
> devices were present and online.  Also invalid LUN errors were not
> being handled correctly.
> 
> This fixes problems booting a GEN2 VM on Hyper-V. It effectively
> reverts commit 4ed51a21c0f69 ("Staging: hv: storvsc: Fixup
> srb and scsi status for INQUIRY and MODE_SENSE")
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/scsi/storvsc_drv.c | 48 
> --
>  1 file changed, 4 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 638e5f427c90..8cc241fc54b8 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -543,28 +543,6 @@ static void storvsc_host_scan(struct work_struct
> *work)
>   kfree(wrk);
>  }
> 
> -static void storvsc_remove_lun(struct work_struct *work)
> -{
> - struct storvsc_scan_work *wrk;
> - struct scsi_device *sdev;
> -
> - wrk = container_of(work, struct storvsc_scan_work, work);
> - if (!scsi_host_get(wrk->host))
> - goto done;
> -
> - sdev = scsi_device_lookup(wrk->host, 0, wrk->tgt_id, wrk->lun);
> -
> - if (sdev) {
> - scsi_remove_device(sdev);
> - scsi_device_put(sdev);
> - }
> - scsi_host_put(wrk->host);
> -
> -done:
> - kfree(wrk);
> -}
> -
> -
>  /*
>   * We can get incoming messages from the host that are not in response to
>   * messages that we have sent out. An example of this would be messages
> @@ -955,8 +933,7 @@ static void storvsc_handle_error(struct
> vmscsi_request *vm_srb,
>   }
>   break;
>   case SRB_STATUS_INVALID_LUN:
> - do_work = true;
> - process_err_fn = storvsc_remove_lun;
> + set_host_byte(scmnd, DID_NO_CONNECT);
>   break;
>   case SRB_STATUS_ABORTED:
>   if (vm_srb->srb_status & SRB_STATUS_AUTOSENSE_VALID
> &&
> @@ -1050,32 +1027,15 @@ static void storvsc_on_io_completion(struct
> storvsc_device *stor_device,
> 
>   stor_pkt = >vstor_packet;
> 
> - /*
> -  * The current SCSI handling on the host side does
> -  * not correctly handle:
> -  * INQUIRY command with page code parameter set to 0x80
> -  * MODE_SENSE command with cmd[2] == 0x1c
> -  *
> -  * Setup srb and scsi status so this won't be fatal.
> -  * We do this so we can distinguish truly fatal failues
> -  * (srb status == 0x4) and off-line the device in that case.
> -  */
> -
> - if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> -(stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
> - vstor_packet->vm_srb.scsi_status = 0;
> - vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
> - }
> -
> -
>   /* Copy over the status...etc */
>   stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
>   stor_pkt->vm_srb.srb_status = vstor_packet->vm_srb.srb_status;
>   stor_pkt->vm_srb.sense_info_length =
>   vstor_packet->vm_srb.sense_info_length;
> 
> - if (vstor_packet->vm_srb.scsi_status != 0 ||
> - vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS)
> + if (stor_pkt->vm_srb.cdb[0] != INQUIRY &&
> + (vstor_packet->vm_srb.scsi_status != 0 ||
> +  vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS))
>   storvsc_log(device, STORVSC_LOGGING_WARN,
>   "cmd 0x%x scsi status 0x%x srb status 0x%x\n",
>   stor_pkt->vm_srb.cdb[0],
> --

This patch gets rid of the ability to "hot remove" LUNs. I don't think that can 
be part of any
solution. The INQUIRY hack I put in a long time ago was to deal with host bugs 
on prior versions of
Windows server. WS2016 should not be trigerring this code. Stephen, could you 
please test this patch -
a quick hack:

>From b97f24f224a71a6e745c42e5640045a553eb407c Mon Sep 17 00:00:00 2001
From: K. Y. Srinivasan 

Re: [RFC] hv_storvsc: error handling.

2017-03-04 Thread Hannes Reinecke

On 03/04/2017 01:50 AM, Stephen Hemminger wrote:

Needs more testing but this does fix the observed problem.

From: Stephen Hemminger 

Subject: [PATCH] hv_storvsc: fix error handling

The Hyper-V storvsc SCSI driver was hiding all errors in INQUIRY and
MODE_SENSE commands. This caused the scan process to incorrectly think
devices were present and online.  Also invalid LUN errors were not
being handled correctly.

This fixes problems booting a GEN2 VM on Hyper-V. It effectively
reverts commit 4ed51a21c0f69 ("Staging: hv: storvsc: Fixup
srb and scsi status for INQUIRY and MODE_SENSE")

Signed-off-by: Stephen Hemminger 
---
 drivers/scsi/storvsc_drv.c | 48 --
 1 file changed, 4 insertions(+), 44 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 638e5f427c90..8cc241fc54b8 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -543,28 +543,6 @@ static void storvsc_host_scan(struct work_struct *work)
kfree(wrk);
 }

-static void storvsc_remove_lun(struct work_struct *work)
-{
-   struct storvsc_scan_work *wrk;
-   struct scsi_device *sdev;
-
-   wrk = container_of(work, struct storvsc_scan_work, work);
-   if (!scsi_host_get(wrk->host))
-   goto done;
-
-   sdev = scsi_device_lookup(wrk->host, 0, wrk->tgt_id, wrk->lun);
-
-   if (sdev) {
-   scsi_remove_device(sdev);
-   scsi_device_put(sdev);
-   }
-   scsi_host_put(wrk->host);
-
-done:
-   kfree(wrk);
-}
-
-
 /*
  * We can get incoming messages from the host that are not in response to
  * messages that we have sent out. An example of this would be messages
@@ -955,8 +933,7 @@ static void storvsc_handle_error(struct vmscsi_request 
*vm_srb,
}
break;
case SRB_STATUS_INVALID_LUN:
-   do_work = true;
-   process_err_fn = storvsc_remove_lun;
+   set_host_byte(scmnd, DID_NO_CONNECT);
break;
case SRB_STATUS_ABORTED:
if (vm_srb->srb_status & SRB_STATUS_AUTOSENSE_VALID &&
@@ -1050,32 +1027,15 @@ static void storvsc_on_io_completion(struct 
storvsc_device *stor_device,

stor_pkt = >vstor_packet;

-   /*
-* The current SCSI handling on the host side does
-* not correctly handle:
-* INQUIRY command with page code parameter set to 0x80
-* MODE_SENSE command with cmd[2] == 0x1c
-*
-* Setup srb and scsi status so this won't be fatal.
-* We do this so we can distinguish truly fatal failues
-* (srb status == 0x4) and off-line the device in that case.
-*/
-
-   if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
-  (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
-   vstor_packet->vm_srb.scsi_status = 0;
-   vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
-   }
-
-
/* Copy over the status...etc */
stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
stor_pkt->vm_srb.srb_status = vstor_packet->vm_srb.srb_status;
stor_pkt->vm_srb.sense_info_length =
vstor_packet->vm_srb.sense_info_length;

-   if (vstor_packet->vm_srb.scsi_status != 0 ||
-   vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS)
+   if (stor_pkt->vm_srb.cdb[0] != INQUIRY &&
+   (vstor_packet->vm_srb.scsi_status != 0 ||
+vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS))
storvsc_log(device, STORVSC_LOGGING_WARN,
"cmd 0x%x scsi status 0x%x srb status 0x%x\n",
stor_pkt->vm_srb.cdb[0],

I really do wonder; according to the comments above the wrong error 
handling really only affects inquiry VPD page 0x80 and MODE_SENSE page 
0x1c. So why do we need to blank out _every_ inquiry?
Wouldn't it far better to check _what_ goes wrong when asking for page 
0x80, and fix up things correctly?


Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)