Re: [PATCH] scsi: mptfusion: Remove unnecessary parentheses and simplify null checks

2018-09-27 Thread Martin K. Petersen


Nathan,

> Clang warns when multiple pairs of parentheses are used for a single
> conditional statement.

Applied to 4.20/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: qla4xxx: remove redundant check on drvr_wait

2018-09-27 Thread Martin K. Petersen


Colin,

> The check for a non-zero drvr_wait is redundant as the same check is
> performed earlier in the outer while loop, the inner check will always
> be true if we reached this point inside the while loop.  Remove the
> redundant if check.

Applied to 4.20/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 00/10] NCR5380: Various improvements

2018-09-27 Thread Martin K. Petersen


Finn,

> This series addresses issues which became apparent when Michael
> Schmitz tried to use an AztecMonster II SATA/SCSI adapter with a 5380
> host.
>
> That target seems to have some bugs which thoroughly exercise driver
> error paths and EH.
>
> The patch by Hannes Reinecke was cherry-picked from his 'eh-reset.v5'
> branch.

Applied to 4.20/scsi-queue, thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 0/8] qla2xxx: Bug fixes for the driver

2018-09-27 Thread Martin K. Petersen


Himanshu,

> This series fixes issues found during out last test cycle.
>
> Patch 1,2 and 5 fixes misc NVMe discovery and unload hang in the driver. 
>
> Patch 3 fixes issue discovered during BFS test setup resulting into
> installation failure and hand because driver could not discover BFS Luns.
>
> Patch 4 fixes case where driver was not clearing up Loop ID resulting into
> PLOGI failure.
>
> Patch 6 was corner case which could lead to recursive mailbox timeout.
>
> Patch 7 and 8 are fixes for SRB double free. 
>
> These patches were made against 4.19/fixes branch. Please consider this
> for next rc inclusion. 

Even though the changes were individually pretty trivial, it was a big
amount of churn this late in the cycle. So I applied the series to
4.20/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] qedi: Initialize the stats mutex lock

2018-09-27 Thread Martin K. Petersen


Nilesh,

> Fix kernel NULL pointer dereference,

Applied to 4.19/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH v2] scsi: bfa: Avoid implicit enum conversion in bfad_im_post_vendor_event

2018-09-27 Thread Nathan Chancellor
Clang warns when one enumerated type is implicitly converted to another.

drivers/scsi/bfa/bfa_fcs_lport.c:379:26: warning: implicit conversion
from enumeration type 'enum bfa_lport_aen_event' to different
enumeration type 'enum bfa_ioc_aen_event' [-Wenum-conversion]
  BFA_AEN_CAT_LPORT, event);
 ^

The root cause of these warnings is the bfad_im_post_vendor_event
function, which expects a value from enum bfa_ioc_aen_event but there
are multiple instances of values from enums bfa_port_aen_event,
bfa_audit_aen_event, and bfa_lport_aen_event being used in this
function.

Given that this doesn't appear to be a problem since cat helps with
differentiating the events, just change evt's type to int so that no
conversion needs to happen and Clang won't warn. Update aen_type's type
in bfa_aen_entry_s as members that hold enumerated types should be int.

Link: https://github.com/ClangBuiltLinux/linux/issues/147
Signed-off-by: Nathan Chancellor 
---

v1 -> v2:

* Update aen_type's type in bfa_aen_entry_s to match evt

 drivers/scsi/bfa/bfa_defs_svc.h | 2 +-
 drivers/scsi/bfa/bfad_im.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/bfa/bfa_defs_svc.h b/drivers/scsi/bfa/bfa_defs_svc.h
index 3d0c96a5c873..c19c26e0e405 100644
--- a/drivers/scsi/bfa/bfa_defs_svc.h
+++ b/drivers/scsi/bfa/bfa_defs_svc.h
@@ -1453,7 +1453,7 @@ union bfa_aen_data_u {
 struct bfa_aen_entry_s {
struct list_headqe;
enum bfa_aen_category   aen_category;
-   u32 aen_type;
+   int aen_type;
union bfa_aen_data_uaen_data;
u64 aen_tv_sec;
u64 aen_tv_usec;
diff --git a/drivers/scsi/bfa/bfad_im.h b/drivers/scsi/bfa/bfad_im.h
index e61ed8dad0b4..bd4ac187fd8e 100644
--- a/drivers/scsi/bfa/bfad_im.h
+++ b/drivers/scsi/bfa/bfad_im.h
@@ -143,7 +143,7 @@ struct bfad_im_s {
 static inline void bfad_im_post_vendor_event(struct bfa_aen_entry_s *entry,
 struct bfad_s *drv, int cnt,
 enum bfa_aen_category cat,
-enum bfa_ioc_aen_event evt)
+int evt)
 {
struct timespec64 ts;
 
-- 
2.19.0



Re: [PATCH] scsi: qla2xxx: don't allow negative thresholds

2018-09-27 Thread Martin K. Petersen


Dan,

> We shouldn't allow negative thresholds.  I don't know what it would do
> but it can't be good.

Applied to 4.20/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: qla2xxx: Fix comment in MODULE_PARM_DESC in qla2xxx

2018-09-27 Thread Martin K. Petersen


Masanari,

> Default value of ql2xasynctmfenable for qla2xxx driver was set to 1 in
> commit 043dc1d7e8501.  But comment in MODULE_PARAM_DESC was not
> modified.

Applied to 4.20/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: bfa: Avoid implicit enum conversion in bfad_im_post_vendor_event

2018-09-27 Thread Nathan Chancellor
On Thu, Sep 27, 2018 at 04:35:37PM -0700, Nick Desaulniers wrote:
> On Tue, Sep 25, 2018 at 9:58 PM Nathan Chancellor
>  wrote:
> >
> > Clang warns when one enumerated type is implicitly converted to another.
> >
> > drivers/scsi/bfa/bfa_fcs_lport.c:379:26: warning: implicit conversion
> > from enumeration type 'enum bfa_lport_aen_event' to different
> > enumeration type 'enum bfa_ioc_aen_event' [-Wenum-conversion]
> >   BFA_AEN_CAT_LPORT, event);
> >  ^
> >
> > The root cause of these warnings is the bfad_im_post_vendor_event
> > function, which expects a value from enum bfa_ioc_aen_event but there
> > are multiple instances of values from enums bfa_port_aen_event,
> > bfa_audit_aen_event, and bfa_lport_aen_event being used in this
> > function.
> 
> Indeed, it seems that bfad_im_post_vendor_event() assigns this parameter to
> 161 entry->aen_type = evt;
> 
> which is defined as:
> 
> 1456 u32 aen_type;
> 
> so already we know that aen_type is meant to be a grab bag of enum
> values.  bfad_im_post_vendor_event() is already passed many different
> types of enums, as you mention.  Does changing aen_type to an `int`
> produce further warnings, because it would be nice to have that change
> in this one, too.  AFAICT, it's only ever saved away in a containing
> struct.
> 

No, it doesn't introduce any new warnings. I can send that as a v2 here shortly.

Nathan

> >
> > Given that this doesn't appear to be a problem since cat helps with
> > differentiating the events, just change evt's type to int so that no
> > conversion needs to happen and Clang won't warn.
> >
> > Link: https://github.com/ClangBuiltLinux/linux/issues/147
> > Signed-off-by: Nathan Chancellor 
> > ---
> >
> > The alternate way of fixing these warnings is to explicitly cast the
> > conversion when calling the function but since there are about 8-10 of
> > these warnings, it seems logical to just change the function definiton
> > which is cleaner in my opinion.
> >
> > See commits 3eb95feac113 ("mm/zsmalloc.c: change stat type parameter to
> > int") and 04fecbf51b3c ("mm: memcontrol: use int for event/state
> > parameter in several functions") for similar fixes.
> >
> >  drivers/scsi/bfa/bfad_im.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/scsi/bfa/bfad_im.h b/drivers/scsi/bfa/bfad_im.h
> > index e61ed8dad0b4..bd4ac187fd8e 100644
> > --- a/drivers/scsi/bfa/bfad_im.h
> > +++ b/drivers/scsi/bfa/bfad_im.h
> > @@ -143,7 +143,7 @@ struct bfad_im_s {
> >  static inline void bfad_im_post_vendor_event(struct bfa_aen_entry_s *entry,
> >  struct bfad_s *drv, int cnt,
> >  enum bfa_aen_category cat,
> > -enum bfa_ioc_aen_event evt)
> > +int evt)
> >  {
> > struct timespec64 ts;
> >
> > --
> > 2.19.0
> >
> 
> 
> -- 
> Thanks,
> ~Nick Desaulniers


Re: [PATCH] scsi: bfa: Avoid implicit enum conversion in bfad_im_post_vendor_event

2018-09-27 Thread Nick Desaulniers
On Tue, Sep 25, 2018 at 9:58 PM Nathan Chancellor
 wrote:
>
> Clang warns when one enumerated type is implicitly converted to another.
>
> drivers/scsi/bfa/bfa_fcs_lport.c:379:26: warning: implicit conversion
> from enumeration type 'enum bfa_lport_aen_event' to different
> enumeration type 'enum bfa_ioc_aen_event' [-Wenum-conversion]
>   BFA_AEN_CAT_LPORT, event);
>  ^
>
> The root cause of these warnings is the bfad_im_post_vendor_event
> function, which expects a value from enum bfa_ioc_aen_event but there
> are multiple instances of values from enums bfa_port_aen_event,
> bfa_audit_aen_event, and bfa_lport_aen_event being used in this
> function.

Indeed, it seems that bfad_im_post_vendor_event() assigns this parameter to
161 entry->aen_type = evt;

which is defined as:

1456 u32 aen_type;

so already we know that aen_type is meant to be a grab bag of enum
values.  bfad_im_post_vendor_event() is already passed many different
types of enums, as you mention.  Does changing aen_type to an `int`
produce further warnings, because it would be nice to have that change
in this one, too.  AFAICT, it's only ever saved away in a containing
struct.

>
> Given that this doesn't appear to be a problem since cat helps with
> differentiating the events, just change evt's type to int so that no
> conversion needs to happen and Clang won't warn.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/147
> Signed-off-by: Nathan Chancellor 
> ---
>
> The alternate way of fixing these warnings is to explicitly cast the
> conversion when calling the function but since there are about 8-10 of
> these warnings, it seems logical to just change the function definiton
> which is cleaner in my opinion.
>
> See commits 3eb95feac113 ("mm/zsmalloc.c: change stat type parameter to
> int") and 04fecbf51b3c ("mm: memcontrol: use int for event/state
> parameter in several functions") for similar fixes.
>
>  drivers/scsi/bfa/bfad_im.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/bfa/bfad_im.h b/drivers/scsi/bfa/bfad_im.h
> index e61ed8dad0b4..bd4ac187fd8e 100644
> --- a/drivers/scsi/bfa/bfad_im.h
> +++ b/drivers/scsi/bfa/bfad_im.h
> @@ -143,7 +143,7 @@ struct bfad_im_s {
>  static inline void bfad_im_post_vendor_event(struct bfa_aen_entry_s *entry,
>  struct bfad_s *drv, int cnt,
>  enum bfa_aen_category cat,
> -enum bfa_ioc_aen_event evt)
> +int evt)
>  {
> struct timespec64 ts;
>
> --
> 2.19.0
>


-- 
Thanks,
~Nick Desaulniers


Re: [PATCH v4] scsi: ufs: Make sysfs attributes writable

2018-09-27 Thread Evan Green
On Thu, Sep 27, 2018 at 7:01 AM Christoph Hellwig  wrote:
>
> On Thu, Sep 27, 2018 at 06:32:47AM +, Avri Altman wrote:
> > Also, in this context there is the series in
> > https://www.spinics.net/lists/linux-scsi/msg123479.html
> > which allows to send UPIUs via a bsg device.
> >
> > It's not a provisioning series per-se like Evan's and Sayali's.
> > It covers the provisioning functionality,
> > But also allow to send task management UPIU, and UIC commands,
> > Which can be used for testing and validation.
>
> And as someone having been involved with review of a few different
> UFS provisioning bits this is what I think we should be merging.
>
> Instead of being in a rat race of adding ever new sysfs or configfs
> attributes for things that don't matter to normal driver operation
> I'd rather have a relatively clean pass through interface and move
> policy to userspace.  Especially given that there are plenty of
> vendor specific commands at these levels as well.

There's no policy in my patches (nor Sayali's), nor are there any
vendor-specific commands here. The sysfs interface has exposed knobs
defined by the UFS specification, which to me seems like the kernel
providing a sane abstraction of device functionality. I don't see
there being a rat race, as these attributes and the config descriptor
are all that's needed to provision a device, and any reasonable future
versions of the UFS spec would likely be backwards compatible with
respect to attributes and flags.

The patches Avri linked to seem fine as well, but I don't see why
there's not room for both the "roll your own driver completely in user
mode" approach, and the "kernel provides a reasonable abstraction of
device functionality" approach to co-exist. We do the same sort of
thing for simple buses like I2C for example, where you can both write
a kernel driver or do bus transactions directly from user mode.

-Evan


Re: [PATCH 0/8] qla2xxx: Bug fixes for the driver

2018-09-27 Thread Ewan D. Milne
On Wed, 2018-09-26 at 22:05 -0700, Himanshu Madhani wrote:
> Hi Martin, 
> 
> This series fixes issues found during out last test cycle. 
> 
> Patch 1,2 and 5 fixes misc NVMe discovery and unload hang in the driver. 
> 
> Patch 3 fixes issue discovered during BFS test setup resulting into
> installation failure and hand because driver could not discover BFS Luns.
> 
> Patch 4 fixes case where driver was not clearing up Loop ID resulting into
> PLOGI failure.
> 
> Patch 6 was corner case which could lead to recursive mailbox timeout.
> 
> Patch 7 and 8 are fixes for SRB double free. 
> 
> These patches were made against 4.19/fixes branch. Please consider this
> for next rc inclusion. 
> 
> Thanks,
> Himanshu
>  
> 
> Giridhar Malavali (2):
>   qla2xxx: Fix for double free of SRB structure used in Async switch
> query commands
>   qla2xxx: Move log messages before issuing command to firmware
> 
> Himanshu Madhani (1):
>   qla2xxx: Fix driver hang when FC-NVMe LUNs are configured
> 
> Quinn Tran (5):
>   qla2xxx: fix nvme session hang on unload
>   qla2xxx: Fix NVMe Target discovery
>   qla2xxx: Fix duplicate switch database entries
>   qla2xxx: Fix re-using LoopID when handle is in use
>   qla2xxx: Fix recursive mailbox timeout
> 
>  drivers/scsi/qla2xxx/qla_gs.c | 24 +++-
>  drivers/scsi/qla2xxx/qla_init.c   | 81 
> ---
>  drivers/scsi/qla2xxx/qla_mbx.c|  2 +-
>  drivers/scsi/qla2xxx/qla_nvme.c   |  5 +--
>  drivers/scsi/qla2xxx/qla_os.c |  4 +-
>  drivers/scsi/qla2xxx/qla_target.c |  3 +-
>  6 files changed, 62 insertions(+), 57 deletions(-)
> 

All patches in series Reviewed-by: Ewan D. Milne 




Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available

2018-09-27 Thread Lukas Wunner
On Thu, Sep 27, 2018 at 01:47:46PM -0500, Bjorn Helgaas wrote:
>   mpt3sas_wait_for_commands_to_complete(...)
>   {
> ...
>  +  if (!mpt3sas_base_pci_device_is_available(ioc))
>  +return;
> 
> # In the definitive case, we returned above.  If we get here, we
> # think the device *might* be present.  The readl() inside
> # mpt3sas_base_get_iocstate() might fail (returning ~0 data).
> # It happens that (~0 & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL
> # so we will return below if the device was removed after the
> # check above.
> 
> ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
> if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
>   return;
> ...
> 
> 
> I think this code is unreasonably complicated.  The multiple layers
> and negations make it very difficult to figure out what's definitive.

I agree there's room for improvement.


> I'm not sure how mpt3sas benefits from adding
> mpt3sas_base_pci_device_is_available() here, other than the fact that
> it avoids an MMIO read to the device in most cases.  I think it would
> be simpler and better to make mpt3sas_base_get_iocstate() smarter
> about handling ~0 values from the readl().

Avoiding an MMIO read when it's known to run into a Completion Timeout
buys about 17 ms.  If the function is executed many times (I don't know
if that's the case here, I'm referring to the general case), or if bailing
out of it early avoids significant amounts of further I/O, then checking
for disconnectedness makes sense.

The 17 ms number is from this talk:
https://www.youtube.com/watch?v=GJ6B0xzgvlM&t=832

It's the typical Completion Timeout on an Intel chip, but it can be up to
50 ms in the default setting or up to 64 s if so configured in the Device
Control 2 register (PCIe r4.0 sec 7.8.16).

Thanks,

Lukas


Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available

2018-09-27 Thread Bjorn Helgaas
[+cc Ben, Russell, Sam, Oliver in case they have pertinent experience from
powerpc error handling; thread begins at
https://lore.kernel.org/linux-pci/1537935759-14754-1-git-send-email-suganath-prabu.subram...@broadcom.com/]

On Thu, Sep 27, 2018 at 09:03:27AM +0200, Lukas Wunner wrote:
> On Wed, Sep 26, 2018 at 04:32:41PM -0500, Bjorn Helgaas wrote:
> > On Wed, Sep 26, 2018 at 09:52:34AM +0530, Suganath Prabu S wrote:
> > > @@ -6853,6 +6872,13 @@ mpt3sas_wait_for_commands_to_complete(struct 
> > > MPT3SAS_ADAPTER *ioc)
> > >  
> > >   ioc->pending_io_count = 0;
> > >  
> > > + if (!mpt3sas_base_pci_device_is_available(ioc)) {
> > > + pr_err(MPT3SAS_FMT
> > > + "%s: pci error recovery reset or pci device unplug 
> > > occurred\n",
> > > + ioc->name, __func__);
> > > + return;
> > > + }
> > > +
> > >   ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
> > 
> > This is a good example of why I don't like pci_device_is_present(): it
> > is fundamentally racy and gives a false sense of security.  Here we
> > *think* we're making the code safer, but in fact we could have this
> > sequence:
> > 
> >   mpt3sas_base_pci_device_is_available()# returns true
> >   # device is removed
> >   ioc_state = mpt3sas_base_get_iocstate()
> > 
> > In this case the readl() inside mpt3sas_base_get_iocstate() will
> > probably return 0x data, and we assume that's valid and
> > continue on our merry way, pretending that "ioc_state" makes sense
> > when it really doesn't.
> 
> The function does the following:
> 
>   ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
>   if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
>   return;
> 
> where MPI2_IOC_STATE_MASK is 0xF000 and MPI2_IOC_STATE_OPERATIONAL
> is 0x2000.  If the device is removed after the call to
> mpt3sas_base_pci_device_is_available(), the result of the bitwise "and"
> operation would be 0xF000, which is unequal to 0x2000.
> Hence this looks safe.

I agree this particular case is technically safe, but figuring that
out requires an unreasonable amount of analysis.  And there's no hint
in the code that we need to be concerned about whether the readl()
returns valid data, so the need for the analysis won't even occur to
most readers.

I don't feel good about encouraging this style of adding an explicit
test for whether the device is available, followed by a completely
implicit test that accidentally happens to correctly handle a device
that was removed after the explicit test.

If we instead added a test for ~0 after the readl(), we would avoid
the race and give the reader a clue that *any* read from the device
can potentially fail without advance warning.

> I agree that pci_device_is_present() (and the pci_dev_is_disconnected()
> it calls) must be used judiciously, but here it seems to have been done
> correctly.
> 
> One thing to be aware of is that a return value of "true" from
> pci_dev_is_disconnected() is definitive and can be trusted.
> On the other hand a return value of "false" is more like a fuzzy
> "likely not disconnected, but can't give any guarantees".
> So the boolean return value is kind of the problem here.
> Boolean logic doesn't really fit these "definitive if true,
> not definitive if false" semantics.
> 
> However being able to get the definitive answer in the disconnected
> case is valuable:  pciehp is the only entity that can determine
> surprise removal authoritatively and unambiguously (albeit with
> a latency).  All the other tools that we have at our disposal don't
> have that quality:  E.g. checking the Vendor ID is ambiguous because
> it returns a valid value if a device was quickly replaced with another
> one.  Also, all ones may be returned in the case of an Uncorrectable
> Error, but the device may revert to valid responses if the error can
> be recovered.  (Please correct me if I'm wrong.)

I think everything you said above is true, but I'm not yet convinced
that it's being applied usefully in mpt3sas.

  bool pci_dev_is_disconnected(pdev)   # "true" is definitive
  {
return test_bit(PCI_DEV_DISCONNECTED, &dev->priv_flags);
  }

  bool pci_device_is_present(pdev) # "false" is definitive
  {
if (pci_dev_is_disconnected(pdev))
  return false;
return pci_bus_read_dev_vendor_id(...);
  }

  mpt3sas_base_pci_device_is_available(ioc)  # "false" is definitive
  {
return !ioc->pci_error_recovery && pci_device_is_present(ioc->pdev);
  }

  mpt3sas_wait_for_commands_to_complete(...)
  {
...
 +  if (!mpt3sas_base_pci_device_is_available(ioc))
 +return;

# In the definitive case, we returned above.  If we get here, we
# think the device *might* be present.  The readl() inside
# mpt3sas_base_get_iocstate() might fail (returning ~0 data).
# It happens that (~0 & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL
# so we will return below if the device was removed after the
# check above.


Re: [PATCH v6 3/7] scsi: ufs: Add ufs-bsg module

2018-09-27 Thread Christoph Hellwig
On Thu, Sep 27, 2018 at 06:16:44AM +, Avri Altman wrote:
> In V6, we removed the host and device indices from the bsg device name,
> But I have some seconds thoughts about it.
> 
> We are using the bsg device in passthrough mode (bsg_transport_ops),
> But the device name: "ufs-bsg" does not imply that.
> 
> Given that the ABI should never change,
> if someone in the future will want to add a bsg device that uses the 
> bsg_scsi_ops,
> ufs-bsg-scsi seems a little bit awkward, does it?

We should already have a bsg_scsi_ops instance for every SCSI LU, so
they already exist - without any bsg in the name.

I think ufs-bsg is ok.


[PATCH] scsi: qla2xxx: I/Os timing out on surprise removal of

2018-09-27 Thread Bill Kuzeja
When removing an adapter through sysfs, some in flight I/Os can get 
stuck and take a while to complete (they actually timeout and are 
retried). We are not handling an early error exit from 
qla2xxx_eh_abort properly.

Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip") 
Signed-off-by: Bill Kuzeja 
---

When doing a sysfs remove of a QLogic adapter, the driver's remove
function gets called and we end up aborting all in progress I/Os.
Here is the code flow:

qla2x00_remove_one
  qla2x00_abort_isp_cleanup
qla2x00_abort_all_cmds
  __qla2x00_abort_all_cmds
qla2xxx_eh_abort

At the start of qla2xxx_eh_abort, there are some sanity checks done 
before actually sending the abort. One of these checks is a call to 
fc_block_scsi_eh. In the case of a sysfs remove, it turns out that this 
routine can exit with FAST_IO_FAIL.

When this occurs, we return back to __qla2x00_abort_all_cmds with an 
extra reference on sp (because the abort never gets sent). Originally, I 
remedied this kind of situation with another fix:

commit 4cd3b6ebff85 scsi: qla2xxx: Fix extraneous ref on sp's after adapter 
break

But this later added change complicated matters:

commit 45235022da99 scsi: qla2xxx: Fix driver unload by shutting down chip

Because the abort is now being done earlier in the teardown (through 
qla2x00_abort_isp_cleanup), in qla2xxx_eh_abort we make it past 
the first check because qla2x00_isp_reg_stat(ha) returns zero. When we
fail a few lines later in fc_block_scsi_eh, this error is not handled
properly in __qla2x00_abort_all_cmds and the I/O ends up hanging and 
timing out because of the extra reference.

For this fix, I will add this case to __qla2x00_abort_all_cmds where we
check to see if qla2xxx_eh_abort succeeded or not. 

This removes the extra reference in this additional early exit case. In 
my testing, this eliminates the timeouts and delays and the remove 
proceeds smoothly.

---
 drivers/scsi/qla2xxx/qla_os.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 42b8f0d..3ba3765 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1771,8 +1771,9 @@ uint32_t qla2x00_isp_reg_stat(struct qla_hw_data *ha)
 * if immediate exit from
 * ql2xxx_eh_abort
 */
-   if (status == FAILED &&
-   (qla2x00_isp_reg_stat(ha)))
+   if (((status == FAILED) &&
+   (qla2x00_isp_reg_stat(ha))) ||
+(status == FAST_IO_FAIL))
atomic_dec(
&sp->ref_count);
}
-- 
1.8.3.1



Re: [PATCH v4] scsi: ufs: Make sysfs attributes writable

2018-09-27 Thread Christoph Hellwig
On Thu, Sep 27, 2018 at 06:32:47AM +, Avri Altman wrote:
> Also, in this context there is the series in 
> https://www.spinics.net/lists/linux-scsi/msg123479.html
> which allows to send UPIUs via a bsg device.
> 
> It's not a provisioning series per-se like Evan's and Sayali's.
> It covers the provisioning functionality,
> But also allow to send task management UPIU, and UIC commands,
> Which can be used for testing and validation.

And as someone having been involved with review of a few different
UFS provisioning bits this is what I think we should be merging.

Instead of being in a rat race of adding ever new sysfs or configfs
attributes for things that don't matter to normal driver operation
I'd rather have a relatively clean pass through interface and move
policy to userspace.  Especially given that there are plenty of
vendor specific commands at these levels as well.


RE: [PATCH v6 7/7] scsi: ufs-bsg: Add support for uic commands in ufs_bsg_request()

2018-09-27 Thread Avri Altman
> 
> > -ufshcd_send_uic_cmd(struct ufs_hba *hba, struct uic_command *uic_cmd)
> > +int ufshcd_send_uic_cmd(struct ufs_hba *hba, struct uic_command
> *uic_cmd)
> >  {
> > int ret;
> > unsigned long flags;
> > @@ -2081,6 +2080,7 @@ static inline u8 ufshcd_get_upmcrs(struct ufs_hba
> *hba)
> > ufshcd_release(hba);
> > return ret;
> >  }
> > +EXPORT_SYMBOL_GPL(ufshcd_send_uic_cmd);
> 
> As far as I can tell the bsg code is now build into the main ufshcd
> module, so you shouldn't need this export.
Done.

Thanks,
Avri


RE: [PATCH v6 6/7] scsi: ufs-bsg: Add support for raw upiu in ufs_bsg_request()

2018-09-27 Thread Avri Altman
> 
> > +   case UPIU_TRANSACTION_QUERY_REQ:
> > +   qr = &bsg_request->upiu_req.qr;
> > +   if (qr->opcode == UPIU_QUERY_OPCODE_READ_DESC)
> > +   goto not_supported;
> > +
> > +   if (ufs_bsg_get_query_desc_size(hba, qr, &desc_len))
> > +   goto null_desc_buff;
> > +
> > +   if (qr->opcode == UPIU_QUERY_OPCODE_WRITE_DESC) {
> > +   rw = WRITE;
> > +   desc_buff = (uint8_t *)(bsg_request + 1);
> > +   }
> > +
> > +null_desc_buff:
> > +   /* fall through */
> > +   case UPIU_TRANSACTION_NOP_OUT:
> > +   case UPIU_TRANSACTION_TASK_REQ:
> > +   /* Now that we know if its a read or write, verify again */
> > +   if (rw != UFS_BSG_NOP || desc_len) {
> > +   ret = ufs_bsg_verify_query_size(request_len,
> reply_len,
> > +   rw, desc_len);
> > +   if (ret) {
> > +   dev_err(job->dev,
> > +   "not enough space assigned\n");
> > +   goto out;
> > +   }
> > +   }
> 
> I think this check should be moved into the above switch case
> as it can only be hit for UPIU_TRANSACTION_QUERY_REQ requests.  In
> fact I think the code would be a lot cleaner if you could factor
> the UPIU_TRANSACTION_QUERY_REQ case into a little helper function
> (ufs_bsg_get_query_desc_size should probably be merged into that
> helper also).
Done.

> 
> > +   case UPIU_TRANSACTION_UIC_CMD:
This should introduced only in the next patch

> > +   /* later */
> > +   case UPIU_TRANSACTION_COMMAND:
> > +   case UPIU_TRANSACTION_DATA_OUT:
> > +not_supported:
> > +   /* for the time being, we do not support data transfer upiu */
> > +   ret = -ENOTSUPP;
> > +   dev_err(job->dev, "unsupported msgcode 0x%x\n", msgcode);
> > +
> > +   break;
> > +   default:
> > +   ret = -EINVAL;
> > +
> > +   break;
> > +   }
> 
> This difference between known but not supported and not recognized is
> a bit odd.  I think there should be just the generic not supported
> case, and you can decide if you want to log it or not.
Done.

> 
> Also please try to avoid goto labels jumping into switch statements,
> code is generlaly a lot more readable if you keep the goto targets
> outside the switch statement.
Done.

Thanks,
Avri


Re: [PATCH v6 6/7] scsi: ufs-bsg: Add support for raw upiu in ufs_bsg_request()

2018-09-27 Thread Christoph Hellwig
> + case UPIU_TRANSACTION_QUERY_REQ:
> + qr = &bsg_request->upiu_req.qr;
> + if (qr->opcode == UPIU_QUERY_OPCODE_READ_DESC)
> + goto not_supported;
> +
> + if (ufs_bsg_get_query_desc_size(hba, qr, &desc_len))
> + goto null_desc_buff;
> +
> + if (qr->opcode == UPIU_QUERY_OPCODE_WRITE_DESC) {
> + rw = WRITE;
> + desc_buff = (uint8_t *)(bsg_request + 1);
> + }
> +
> +null_desc_buff:
> + /* fall through */
> + case UPIU_TRANSACTION_NOP_OUT:
> + case UPIU_TRANSACTION_TASK_REQ:
> + /* Now that we know if its a read or write, verify again */
> + if (rw != UFS_BSG_NOP || desc_len) {
> + ret = ufs_bsg_verify_query_size(request_len, reply_len,
> + rw, desc_len);
> + if (ret) {
> + dev_err(job->dev,
> + "not enough space assigned\n");
> + goto out;
> + }
> + }

I think this check should be moved into the above switch case
as it can only be hit for UPIU_TRANSACTION_QUERY_REQ requests.  In
fact I think the code would be a lot cleaner if you could factor
the UPIU_TRANSACTION_QUERY_REQ case into a little helper function
(ufs_bsg_get_query_desc_size should probably be merged into that
helper also).

> + case UPIU_TRANSACTION_UIC_CMD:
> + /* later */
> + case UPIU_TRANSACTION_COMMAND:
> + case UPIU_TRANSACTION_DATA_OUT:
> +not_supported:
> + /* for the time being, we do not support data transfer upiu */
> + ret = -ENOTSUPP;
> + dev_err(job->dev, "unsupported msgcode 0x%x\n", msgcode);
> +
> + break;
> + default:
> + ret = -EINVAL;
> +
> + break;
> + }

This difference between known but not supported and not recognized is
a bit odd.  I think there should be just the generic not supported
case, and you can decide if you want to log it or not.

Also please try to avoid goto labels jumping into switch statements,
code is generlaly a lot more readable if you keep the goto targets
outside the switch statement.


[PATCH -next] scsi: ufs-qcom: Remove set but not used variable 'val'

2018-09-27 Thread YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/scsi/ufs/ufs-qcom.c: In function 'ufs_qcom_pwr_change_notify':
drivers/scsi/ufs/ufs-qcom.c:919:6: warning:
 variable 'val' set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing 
---
 drivers/scsi/ufs/ufs-qcom.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c
index 3dc4501c..0fe957d 100644
--- a/drivers/scsi/ufs/ufs-qcom.c
+++ b/drivers/scsi/ufs/ufs-qcom.c
@@ -916,7 +916,6 @@ static int ufs_qcom_pwr_change_notify(struct ufs_hba *hba,
struct ufs_pa_layer_attr *dev_max_params,
struct ufs_pa_layer_attr *dev_req_params)
 {
-   u32 val;
struct ufs_qcom_host *host = ufshcd_get_variant(hba);
struct ufs_qcom_dev_params ufs_qcom_cap;
int ret = 0;
@@ -985,8 +984,6 @@ static int ufs_qcom_pwr_change_notify(struct ufs_hba *hba,
ret = -EINVAL;
}
 
-   val = ~(MAX_U32 << dev_req_params->lane_tx);
-
/* cache the power mode parameters to use internally */
memcpy(&host->dev_req_params,
dev_req_params, sizeof(*dev_req_params));



Re: [PATCH v6 7/7] scsi: ufs-bsg: Add support for uic commands in ufs_bsg_request()

2018-09-27 Thread Christoph Hellwig
> -ufshcd_send_uic_cmd(struct ufs_hba *hba, struct uic_command *uic_cmd)
> +int ufshcd_send_uic_cmd(struct ufs_hba *hba, struct uic_command *uic_cmd)
>  {
>   int ret;
>   unsigned long flags;
> @@ -2081,6 +2080,7 @@ static inline u8 ufshcd_get_upmcrs(struct ufs_hba *hba)
>   ufshcd_release(hba);
>   return ret;
>  }
> +EXPORT_SYMBOL_GPL(ufshcd_send_uic_cmd);

As far as I can tell the bsg code is now build into the main ufshcd
module, so you shouldn't need this export.


Re: aacraid: latest driver results in Host adapter abort request. / Outstanding commands on (0,0,0,0):

2018-09-27 Thread Stefan Priebe - Profihost AG


Am 27.09.2018 um 12:59 schrieb Emmanuel Florac:
> Le Wed, 19 Sep 2018 08:10:41 +0200
> Stefan Priebe - Profihost AG  écrivait:
> 
>> Hello,
>>
>> after upgrading the aacraid driver / kernel from aacraid 50792 to
>> aacraid 50877.
>>
>> I get aborted comands every night.
>>
> 
> I have exactly the same problem since 4.15 was out, and nobody
> answered... aacraid doesn't work properly neither with ASR7xx5 or
> ASR8xx5 cards.

Than it must be something different. ASR7 and ASR8 cards work fine for
me. Only series ASR6 cards trigger Host adapter abort request under high
load.

Stefan


[PATCH] qedi: Initialize the stats mutex lock

2018-09-27 Thread Nilesh Javali
Fix kernel NULL pointer dereference,

Call Trace:
  [] __mutex_lock_slowpath+0xa6/0x1d0
  [] mutex_lock+0x1f/0x2f
  [] qedi_get_protocol_tlv_data+0x61/0x450 [qedi]
  [] ? map_vm_area+0x2e/0x40
  [] ? __vmalloc_node_range+0x170/0x280
  [] ? qed_mfw_process_tlv_req+0x27d/0xbd0 [qed]
  [] qed_mfw_fill_tlv_data+0x4b/0xb0 [qed]
  [] qed_mfw_process_tlv_req+0x299/0xbd0 [qed]
  [] ? __switch_to+0xce/0x580
  [] qed_slowpath_task+0x5b/0x80 [qed]

Signed-off-by: Nilesh Javali 
---
 drivers/scsi/qedi/qedi_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index cc8e64d..e5bd035 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2472,6 +2472,7 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
/* start qedi context */
spin_lock_init(&qedi->hba_lock);
spin_lock_init(&qedi->task_idx_lock);
+   mutex_init(&qedi->stats_lock);
}
qedi_ops->ll2->register_cb_ops(qedi->cdev, &qedi_ll2_cb_ops, qedi);
qedi_ops->ll2->start(qedi->cdev, ¶ms);
-- 
1.8.3.1



RE: [PATCH] scsi: qla4xxx: remove redundant check on drvr_wait

2018-09-27 Thread Rangankar, Manish

> -Original Message-
> From: Colin King 
> Sent: Wednesday, September 26, 2018 6:39 PM
> To: Dept-Eng QLogic Storage Upstream  upstr...@cavium.com>; James E . J . Bottomley ;
> Martin K . Petersen ; linux-scsi@vger.kernel.org
> Cc: kernel-janit...@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: [PATCH] scsi: qla4xxx: remove redundant check on drvr_wait
> 
> External Email
> 
> From: Colin Ian King 
> 
> The check for a non-zero drvr_wait is redundant as the same check is performed
> earlier in the outer while loop, the inner check will always be true if we 
> reached
> this point inside the while loop.
> Remove the redundant if check.
> 
> Detected by cppcheck:
> (warning) Identical inner 'if' condition is always true.
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/scsi/qla4xxx/ql4_init.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/qla4xxx/ql4_init.c 
> b/drivers/scsi/qla4xxx/ql4_init.c index
> 52b1a0bc93c9..1ef74aa2d00a 100644
> --- a/drivers/scsi/qla4xxx/ql4_init.c
> +++ b/drivers/scsi/qla4xxx/ql4_init.c
> @@ -766,12 +766,10 @@ int ql4xxx_lock_drvr_wait(struct scsi_qla_host *a)
> while (drvr_wait) {
> if (ql4xxx_lock_drvr(a) == 0) {
> ssleep(QL4_LOCK_DRVR_SLEEP);
> -   if (drvr_wait) {
> -   DEBUG2(printk("scsi%ld: %s: Waiting for "
> - "Global Init 
> Semaphore(%d)...\n",
> - a->host_no,
> - __func__, drvr_wait));
> -   }
> +   DEBUG2(printk("scsi%ld: %s: Waiting for "
> + "Global Init Semaphore(%d)...\n",
> + a->host_no,
> + __func__, drvr_wait));
> drvr_wait -= QL4_LOCK_DRVR_SLEEP;
> } else {
> DEBUG2(printk("scsi%ld: %s: Global Init Semaphore "
> --
> 2.17.1

Thanks,

Acked-by: Manish Rangankar 


Re: aacraid: latest driver results in Host adapter abort request. / Outstanding commands on (0,0,0,0):

2018-09-27 Thread Emmanuel Florac
Le Mon, 24 Sep 2018 08:11:20 +0200
Stefan Priebe - Profihost AG  écrivait:

> all series 6 controllers are those with problems when high load
> happens.
> 

As I mentioned in my messages to the ML since late june, the driver
doesn't work either with 7xx5 and 8xx5 controllers. Who's maintaining
this driver?

-- 

Emmanuel Florac |   Direction technique
|   Intellique
|   
|   +33 1 78 94 84 02



pgp3m6dCcF7E3.pgp
Description: Signature digitale OpenPGP


Re: aacraid: latest driver results in Host adapter abort request. / Outstanding commands on (0,0,0,0):

2018-09-27 Thread Emmanuel Florac
Le Sun, 23 Sep 2018 20:22:56 +0200
Stefan Priebe - Profihost AG  écrivait:

> Am 22.09.2018 um 23:40 schrieb Bart Van Assche:
> > On 9/18/18 11:10 PM, Stefan Priebe - Profihost AG wrote:  
> >> after upgrading the aacraid driver / kernel from aacraid 50792 to
> >> aacraid 50877.  
> > 
> > The aacraid driver version was updated to 50792 in commit
> > 0662cc968ace ("scsi: aacraid: Update driver version") and to 50877
> > in commit 1cdb74b80f93 ("scsi: aacraid: Update driver version to
> > 50877"). That means that the regression you encountered got
> > introduced after commit 0662cc968ace. 114 changes got checked in
> > after that commit. That's too much to find the root cause by
> > rereading all these changes. Is there any way to trigger the
> > problem faster such that it becomes feasible to run a bisect?  
> 
> Sadly i'm not able. May be also something else in the kernel has
> changed.
> 
> I'm now trying the original out of tree driver from microsemi /
> adaptec: Adaptec aacraid driver 1.2.1.56008src
> 
> No idea how those driver versions corespond to the kernel ones.

I can tell you that the last working version for me is 50834. You can
find the version number in aacraid.h, that's the value of
the AAC_DRIVER_BUILD constant.

the 50877 driver has been broken since it was out several months ago!

-- 

Emmanuel Florac |   Direction technique
|   Intellique
|   
|   +33 1 78 94 84 02



pgpnWeRtVzBvT.pgp
Description: Signature digitale OpenPGP


Re: aacraid: latest driver results in Host adapter abort request. / Outstanding commands on (0,0,0,0):

2018-09-27 Thread Emmanuel Florac
Le Wed, 19 Sep 2018 08:10:41 +0200
Stefan Priebe - Profihost AG  écrivait:

> Hello,
> 
> after upgrading the aacraid driver / kernel from aacraid 50792 to
> aacraid 50877.
> 
> I get aborted comands every night.
> 

I have exactly the same problem since 4.15 was out, and nobody
answered... aacraid doesn't work properly neither with ASR7xx5 or
ASR8xx5 cards.


-- 

Emmanuel Florac |   Direction technique
|   Intellique
|   
|   +33 1 78 94 84 02



pgpONEVFsp4Xf.pgp
Description: Signature digitale OpenPGP


Re: [PATCH v4 2/6] mpt3sas: Separate out mpt3sas_wait_for_ioc_to_operational

2018-09-27 Thread Suganath Prabu Subramani
Hi Bjorn,

Thanks for reviewing.

On Thu, Sep 27, 2018 at 2:33 AM Bjorn Helgaas  wrote:
>
> On Wed, Sep 26, 2018 at 09:52:35AM +0530, Suganath Prabu S wrote:
> > Introduce mpt3sas_wait_for_ioc_to_operational.
> >
> > This section of code "wait for IOC to be operational"
> > is used in many places across the driver,
> > and hence moved this section of code in to the function
> > "mpt3sas_wait_for_ioc_to_operational".
> >
> > Also added HBA hot unplug checks, and this returns with
> > error code EFAULT, if it detects HBA is hot unplugged
> > or IOC is not in operational state.
>
> This should be two patches:
>
>   1) Factor out the "wait for IOC" code.  This should not cause any
>  functional changes (I didn't verify in your code, but this is the
>  objective).
>
>   2) Add the hot unplug checks.
>
> This makes the patches much easier to review.
> Will move hot unplug check to separate patch.
> > V2 change set:
> > used pci_device_is_present instead of
> > mpt3sas_base_pci_device_is_unplugged
> >
> > v4 Change set:
> > Dont split strings in print statement
>
> I don't know the convention in drivers/scsi, but in driver/pci, I
> remove this sort of v2/v3 commentary from the changelog because it's
> really not of interest after the patch is merged.  It's nice to have
> it in the email, but I think if you put it after a line containing
> only "--" it will be in the email but not the changelog.
>Thanks, Will add change sets after --
> > Signed-off-by: Suganath Prabu S 
> > ---
> >  drivers/scsi/mpt3sas/mpt3sas_base.c  | 92 
> > +++-
> >  drivers/scsi/mpt3sas/mpt3sas_base.h  |  4 ++
> >  drivers/scsi/mpt3sas/mpt3sas_config.c| 28 +++---
> >  drivers/scsi/mpt3sas/mpt3sas_ctl.c   | 26 ++---
> >  drivers/scsi/mpt3sas/mpt3sas_transport.c | 75 +-
> >  5 files changed, 81 insertions(+), 144 deletions(-)
> >
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
> > b/drivers/scsi/mpt3sas/mpt3sas_base.c
> > index c880e72..9f1d8fb 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> > @@ -5176,6 +5176,53 @@ _base_send_ioc_reset(struct MPT3SAS_ADAPTER *ioc, u8 
> > reset_type, int timeout)
> >  }
> >
> >  /**
> > + * mpt3sas_wait_for_ioc_to_operational - IOC's operational
> > + *   state and HBA hot unplug status are checked here.
> > + * @ioc: per adapter object
> > + * @wait_count: timeout in seconds
>
> "wait_count" is really a timeout and maybe should be named "timeout".
>Yes, wait_count is timeout, will rename wait_count to timeout
> > + * Return:  Returns EFAULT, if HBA is hot unplugged or IOC is
> > + * not in operational state, within the wait_count.
> > + * And returns 0, If not hot unplugged Or ioc is in
> > + * operational state.
>
> I think you mean something like:
>
>   Waits up to wait_count seconds for the IOC to become operational.
>   Returns 0 if IOC is present and operational; otherwise returns
>   -EFAULT.
> Yes.
> > + */
> > +
> > +int
> > +mpt3sas_wait_for_ioc_to_operational(struct MPT3SAS_ADAPTER *ioc,
> > + int wait_count)


Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available

2018-09-27 Thread Lukas Wunner
On Wed, Sep 26, 2018 at 04:32:41PM -0500, Bjorn Helgaas wrote:
> On Wed, Sep 26, 2018 at 09:52:34AM +0530, Suganath Prabu S wrote:
> > @@ -6853,6 +6872,13 @@ mpt3sas_wait_for_commands_to_complete(struct 
> > MPT3SAS_ADAPTER *ioc)
> >  
> > ioc->pending_io_count = 0;
> >  
> > +   if (!mpt3sas_base_pci_device_is_available(ioc)) {
> > +   pr_err(MPT3SAS_FMT
> > +   "%s: pci error recovery reset or pci device unplug 
> > occurred\n",
> > +   ioc->name, __func__);
> > +   return;
> > +   }
> > +
> > ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
> 
> This is a good example of why I don't like pci_device_is_present(): it
> is fundamentally racy and gives a false sense of security.  Here we
> *think* we're making the code safer, but in fact we could have this
> sequence:
> 
>   mpt3sas_base_pci_device_is_available()# returns true
>   # device is removed
>   ioc_state = mpt3sas_base_get_iocstate()
> 
> In this case the readl() inside mpt3sas_base_get_iocstate() will
> probably return 0x data, and we assume that's valid and
> continue on our merry way, pretending that "ioc_state" makes sense
> when it really doesn't.

The function does the following:

ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
return;

where MPI2_IOC_STATE_MASK is 0xF000 and MPI2_IOC_STATE_OPERATIONAL
is 0x2000.  If the device is removed after the call to
mpt3sas_base_pci_device_is_available(), the result of the bitwise "and"
operation would be 0xF000, which is unequal to 0x2000.
Hence this looks safe.

I agree that pci_device_is_present() (and the pci_dev_is_disconnected()
it calls) must be used judiciously, but here it seems to have been done
correctly.

One thing to be aware of is that a return value of "true" from
pci_dev_is_disconnected() is definitive and can be trusted.
On the other hand a return value of "false" is more like a fuzzy
"likely not disconnected, but can't give any guarantees".
So the boolean return value is kind of the problem here.
Boolean logic doesn't really fit these "definitive if true,
not definitive if false" semantics.

However being able to get the definitive answer in the disconnected
case is valuable:  pciehp is the only entity that can determine
surprise removal authoritatively and unambiguously (albeit with
a latency).  All the other tools that we have at our disposal don't
have that quality:  E.g. checking the Vendor ID is ambiguous because
it returns a valid value if a device was quickly replaced with another
one.  Also, all ones may be returned in the case of an Uncorrectable
Error, but the device may revert to valid responses if the error can
be recovered.  (Please correct me if I'm wrong.)

Thanks,

Lukas