[PATCH] hpsa: correct enclosure sas address

2018-07-03 Thread Don Brace
- separate enclosure logical identifier from the
  SAS address.

The original complaint was that lsscsi -t showed the same SAS
address for the two enclosures (SEP devices). In fact, the
SAS address was being set to the Enclosure Logical Identifier (ELI).

Reviewed-by: Scott Teel 
Reviewed-by: Kevin Barnett 
Signed-off-by: Don Brace 
---
 drivers/scsi/hpsa.c |   25 +++++++++++++++++++++----
 drivers/scsi/hpsa.h |    1 +
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 15c7f3b6f35e..58bb70b886d7 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3440,11 +3440,11 @@ static void hpsa_get_enclosure_info(struct ctlr_info *h,
 	struct ext_report_lun_entry *rle = &rlep->LUN[rle_index];
 	u16 bmic_device_index = 0;
 
-	bmic_device_index = GET_BMIC_DRIVE_NUMBER(&rle->lunid[0]);
-
-	encl_dev->sas_address =
+	encl_dev->eli =
 		hpsa_get_enclosure_logical_identifier(h, scsi3addr);
 
+	bmic_device_index = GET_BMIC_DRIVE_NUMBER(&rle->lunid[0]);
+
 	if (encl_dev->target == -1 || encl_dev->lun == -1) {
 		rc = IO_OK;
 		goto out;
@@ -9697,7 +9697,24 @@ hpsa_sas_get_linkerrors(struct sas_phy *phy)
 static int
 hpsa_sas_get_enclosure_identifier(struct sas_rphy *rphy, u64 *identifier)
 {
-	*identifier = rphy->identify.sas_address;
+	struct Scsi_Host *shost = phy_to_shost(rphy);
+	struct ctlr_info *h;
+	struct hpsa_scsi_dev_t *sd;
+
+	if (!shost)
+		return -ENXIO;
+
+	h = shost_to_hba(shost);
+
+	if (!h)
+		return -ENXIO;
+
+	sd = hpsa_find_device_by_sas_rphy(h, rphy);
+	if (!sd)
+		return -ENXIO;
+
+	*identifier = sd->eli;
+
 	return 0;
 }
 
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index fb9f5e7f8209..59e023696fff 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -68,6 +68,7 @@ struct hpsa_scsi_dev_t {
 #define RAID_CTLR_LUNID "\0\0\0\0\0\0\0\0"
 	unsigned char device_id[16];	/* from inquiry pg. 0x83 */
 	u64 sas_address;
+	u64 eli;			/* from report diags. */
 	unsigned char vendor[8];	/* bytes 8-15 of inquiry data */
 	unsigned char model[16];	/* bytes 16-31 of inquiry data */
 	unsigned char rev;		/* byte 2 of inquiry data */
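
For readers following the change outside the kernel tree, the lookup that the new hpsa_sas_get_enclosure_identifier() performs can be modelled in plain C. This is only a userspace sketch: struct model_dev, struct model_ctlr, model_find_device_by_rphy() and model_get_enclosure_identifier() are hypothetical stand-ins, not the driver's types or helpers. It shows the identifier being taken from the separately stored ELI rather than from the SAS address, with -ENXIO returned when no device matches.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct hpsa_scsi_dev_t: the SAS address and
 * the enclosure logical identifier (ELI) live in separate fields.
 */
struct model_dev {
	const void *rphy;	/* opaque handle used as the lookup key */
	uint64_t sas_address;
	uint64_t eli;
};

/* Hypothetical stand-in for the controller's device table. */
struct model_ctlr {
	struct model_dev *devs;
	size_t ndevs;
};

/* Linear search for the device that owns the given rphy handle. */
struct model_dev *model_find_device_by_rphy(struct model_ctlr *h,
					    const void *rphy)
{
	size_t i;

	if (!h || !rphy)
		return NULL;

	for (i = 0; i < h->ndevs; i++)
		if (h->devs[i].rphy == rphy)
			return &h->devs[i];

	return NULL;
}

/* Report the ELI, not the SAS address, as the enclosure identifier. */
int model_get_enclosure_identifier(struct model_ctlr *h, const void *rphy,
				   uint64_t *identifier)
{
	struct model_dev *sd = model_find_device_by_rphy(h, rphy);

	if (!sd)
		return -ENXIO;

	*identifier = sd->eli;
	return 0;
}
```

Keeping the two values in separate fields means a transport-level query for the enclosure identifier no longer overwrites, or depends on, the address reported for the SEP device itself.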



RE: aacraid driver, kernel 4.14 and up, ASR8xxx controller : doesn't work

2018-07-03 Thread Dave Carroll


> I have already set up many drives of this size previously, using
> ASR8xx5 controllers, without any problem.
> 
> On another machine with a simple 8x1TB array, it doesn't work any better, 
> while
> an older kernel works perfectly fine too:
> 
> 4.14.48 :
> 
> [   61.069190] Adaptec aacraid driver 1.2.1[50834]-custom
> [   61.069527] aacraid :21:00.0: SME is active, device will require DMA
> bounce buffers
> [   61.076949] SME is active and system is using DMA bounce buffers
> [   61.076954] aacraid: Comm Interface type2 enabled
> 

Both controllers are capable of 64-bit DMA, however the communication area 
should be 32-bit.
Is this issue specific to Secure Memory Encryption? Does it occur without that 
enabled?

> 
> Nothing else happens for minutes. Attached devices are unavailable.
> "arcconf" finds no controller. "rmmod aacraid" doesn't work (device in use).
> 
> Running kernel 4.16.14 with the 8 TB array is exactly the same:

Thanks for that info ... I won't have to find a lot of drives.

> 
> [   40.380488] Adaptec aacraid driver 1.2.1[50877]-custom
> [   40.380871] aacraid :21:00.0: SME is active, device will require DMA
> bounce buffers
> [   40.388991] SME is active and system is using DMA bounce buffers
> [   40.388995] aacraid: Comm Interface type2 enabled
> 
> In contrast, kernel 4.13, though the driver version is the same as 4.14, just
> works:
> 
> [   25.286437] Adaptec aacraid driver 1.2.1[50834]-custom
> [   25.293694] aacraid: Comm Interface type2 enabled
> [   25.300799] AAC0: kernel 7.13-0[33263] Mar 16 2018
> [   25.300801] AAC0: monitor 7.13-0[33263]
> [   25.300802] AAC0: bios 7.13-0[33263]
> [   25.300804] AAC0: serial 6A46639462D
> [   25.300804] AAC0: Non-DASD support enabled.
> [   25.300805] AAC0: 64bit support enabled.
> [   25.300807] aacraid :21:00.0: 64 Bit DAC enabled
> ...
> [   27.307013] scsi host16: aacraid
> [   27.307228] scsi 16:0:0:0: Direct-Access ASR8805  LogicalDrv 0 
> V1.0 PQ: 0
> ANSI: 2
> [   27.307364] sd 16:0:0:0: [sdm] Very big device. Trying to use READ
> CAPACITY(16).
> [   27.307376] sd 16:0:0:0: [sdm] 11714670592 512-byte logical blocks: (6.00
> TB/5.45 TiB)
> [   27.307385] sd 16:0:0:0: [sdm] Write Protect is off
> [   27.307386] sd 16:0:0:0: [sdm] Mode Sense: 12 00 10 08
> [   27.307403] sd 16:0:0:0: [sdm] Write cache: disabled, read cache: enabled,
> supports DPO and FUA
> [   27.307411] sd 16:0:0:0: Attached scsi generic sg12 type 0
> [   27.307507] sd 16:0:0:0: [sdm] Very big device. Trying to use READ
> CAPACITY(16).
> [   27.307830] sd 16:0:0:0: [sdm] Very big device. Trying to use READ
> CAPACITY(16).
> [   27.307855] sd 16:0:0:0: [sdm] Attached SCSI removable disk
> [   27.332731] scsi 16:1:0:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01
> PQ: 1 ANSI: 6
> [   27.355431] scsi 16:1:0:0: Attached scsi generic sg13 type 0
> [   27.355830] scsi 16:1:1:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01
> PQ: 1 ANSI: 6
> [   27.385377] scsi 16:1:1:0: Attached scsi generic sg14 type 0
> [   27.385788] scsi 16:1:2:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01
> PQ: 1 ANSI: 6
> [   27.413412] scsi 16:1:2:0: Attached scsi generic sg15 type 0
> [   27.413813] scsi 16:1:3:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01
> PQ: 1 ANSI: 6
> [   27.437420] scsi 16:1:3:0: Attached scsi generic sg16 type 0
> [   27.437836] scsi 16:1:4:0: Direct-Access ATA  WDC WD10TPVT-00H 1A01
> PQ: 1 ANSI: 6
> [   27.636675] scsi 16:1:4:0: Attached scsi generic sg17 type 0
> [   27.637077] scsi 16:1:5:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01
> PQ: 1 ANSI: 6
> [   27.664591] scsi 16:1:5:0: Attached scsi generic sg18 type 0
> [   27.664996] scsi 16:1:6:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01
> PQ: 1 ANSI: 6
> [   27.697619] scsi 16:1:6:0: Attached scsi generic sg19 type 0
> [   27.698027] scsi 16:1:7:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01
> PQ: 1 ANSI: 6
> [   27.732645] scsi 16:1:7:0: Attached scsi generic sg20 type 0
> 

Thanks for the additional info ... -Dave


Re: aacraid driver, kernel 4.14 and up, ASR8xxx controller : doesn't work

2018-07-03 Thread Emmanuel Florac
On Tue, 3 Jul 2018 16:59:41 +
Dave Carroll  wrote:

> Hi Emmanuel,
> 
> It is curious that the FW is having outstanding commands ... I've
> created a ticket to identify the differences. I suspect that the
> large drive size may be related, but all options are open.
> 

I have already set up many drives of this size previously, using
ASR8xx5 controllers, without any problem.

On another machine with a simple 8x1TB array, it doesn't work any
better, while an older kernel works perfectly fine too:

4.14.48 :

[   61.069190] Adaptec aacraid driver 1.2.1[50834]-custom
[   61.069527] aacraid :21:00.0: SME is active, device will require DMA 
bounce buffers
[   61.076949] SME is active and system is using DMA bounce buffers
[   61.076954] aacraid: Comm Interface type2 enabled


Nothing else happens for minutes. Attached devices are unavailable.
"arcconf" finds no controller. "rmmod aacraid" doesn't work (device in
use). 

Running kernel 4.16.14 with the 8 TB array is exactly the same:

[   40.380488] Adaptec aacraid driver 1.2.1[50877]-custom
[   40.380871] aacraid :21:00.0: SME is active, device will require DMA 
bounce buffers
[   40.388991] SME is active and system is using DMA bounce buffers
[   40.388995] aacraid: Comm Interface type2 enabled

In contrast, kernel 4.13, though the driver version is the same as
4.14, just works:

[   25.286437] Adaptec aacraid driver 1.2.1[50834]-custom
[   25.293694] aacraid: Comm Interface type2 enabled
[   25.300799] AAC0: kernel 7.13-0[33263] Mar 16 2018
[   25.300801] AAC0: monitor 7.13-0[33263]
[   25.300802] AAC0: bios 7.13-0[33263]
[   25.300804] AAC0: serial 6A46639462D
[   25.300804] AAC0: Non-DASD support enabled.
[   25.300805] AAC0: 64bit support enabled.
[   25.300807] aacraid :21:00.0: 64 Bit DAC enabled
...
[   27.307013] scsi host16: aacraid
[   27.307228] scsi 16:0:0:0: Direct-Access ASR8805  LogicalDrv 0 V1.0 
PQ: 0 ANSI: 2
[   27.307364] sd 16:0:0:0: [sdm] Very big device. Trying to use READ 
CAPACITY(16).
[   27.307376] sd 16:0:0:0: [sdm] 11714670592 512-byte logical blocks: (6.00 
TB/5.45 TiB)
[   27.307385] sd 16:0:0:0: [sdm] Write Protect is off
[   27.307386] sd 16:0:0:0: [sdm] Mode Sense: 12 00 10 08
[   27.307403] sd 16:0:0:0: [sdm] Write cache: disabled, read cache: enabled, 
supports DPO and FUA
[   27.307411] sd 16:0:0:0: Attached scsi generic sg12 type 0
[   27.307507] sd 16:0:0:0: [sdm] Very big device. Trying to use READ 
CAPACITY(16).
[   27.307830] sd 16:0:0:0: [sdm] Very big device. Trying to use READ 
CAPACITY(16).
[   27.307855] sd 16:0:0:0: [sdm] Attached SCSI removable disk
[   27.332731] scsi 16:1:0:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.355431] scsi 16:1:0:0: Attached scsi generic sg13 type 0
[   27.355830] scsi 16:1:1:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.385377] scsi 16:1:1:0: Attached scsi generic sg14 type 0
[   27.385788] scsi 16:1:2:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.413412] scsi 16:1:2:0: Attached scsi generic sg15 type 0
[   27.413813] scsi 16:1:3:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.437420] scsi 16:1:3:0: Attached scsi generic sg16 type 0
[   27.437836] scsi 16:1:4:0: Direct-Access ATA  WDC WD10TPVT-00H 1A01 
PQ: 1 ANSI: 6
[   27.636675] scsi 16:1:4:0: Attached scsi generic sg17 type 0
[   27.637077] scsi 16:1:5:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.664591] scsi 16:1:5:0: Attached scsi generic sg18 type 0
[   27.664996] scsi 16:1:6:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.697619] scsi 16:1:6:0: Attached scsi generic sg19 type 0
[   27.698027] scsi 16:1:7:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.732645] scsi 16:1:7:0: Attached scsi generic sg20 type 0




-- 

Emmanuel Florac |   Direction technique
|   Intellique
|   
|   +33 1 78 94 84 02





RE: aacraid driver, kernel 4.14 and up, ASR8xxx controller : doesn't work

2018-07-03 Thread Dave Carroll


> After a very long time, it finally boots up and sees the disks, but
> here's the output from dmesg | grep aacraid:
> 
> [1.357760] Adaptec aacraid driver 1.2.1[50877]-custom
> [1.388119] aacraid: Comm Interface type2 enabled
> [3.405113] scsi host0: aacraid
> [   50.156024] aacraid: Host adapter abort request.
>aacraid: Outstanding commands on (0,0,0,0):
> [   50.156126] aacraid: Host adapter abort request.
>aacraid: Outstanding commands on (0,0,0,0):
> [   50.172032] aacraid: Host adapter reset request. SCSI hang ?
> [   65.536106] aacraid: Host adapter reset request. SCSI hang ?
> [   65.536204] aacraid :01:00.0: outstanding cmd: midlevel-0
> [   65.536206] aacraid :01:00.0: outstanding cmd: lowlevel-0
> [   65.536207] aacraid :01:00.0: outstanding cmd: error handler-0
> [   65.536208] aacraid :01:00.0: outstanding cmd: firmware-2
> [   65.536210] aacraid :01:00.0: outstanding cmd: kernel-0
> [   65.536228] aacraid :01:00.0: Controller reset type is 3
> [   65.536314] aacraid :01:00.0: Issuing IOP reset
> [   98.675617] aacraid :01:00.0: IOP reset succeded
> [   98.684161] aacraid: Comm Interface type2 enabled
> [  110.015352] aacraid :01:00.0: Scheduling bus rescan
> [  166.896116] aacraid: Host adapter reset request. SCSI hang ?
> [  166.896214] aacraid :01:00.0: outstanding cmd: midlevel-0
> [  166.896216] aacraid :01:00.0: outstanding cmd: lowlevel-0
> [  166.896217] aacraid :01:00.0: outstanding cmd: error handler-0
> [  166.896218] aacraid :01:00.0: outstanding cmd: firmware-2
> [  166.896220] aacraid :01:00.0: outstanding cmd: kernel-0
> [  166.896236] aacraid :01:00.0: Controller reset type is 3
> [  166.896322] aacraid :01:00.0: Issuing IOP reset
> [  198.858466] aacraid :01:00.0: IOP reset succeded
> [  198.870660] aacraid: Comm Interface type2 enabled
> [  211.129896] aacraid :01:00.0: Scheduling bus rescan
> [  228.844034] aacraid: Host adapter abort request.
>aacraid: Outstanding commands on (0,0,0,0):
> [  266.835610] aacraid: Host adapter reset request. SCSI hang ?
> [  266.837891] aacraid :01:00.0: outstanding cmd: midlevel-0
> [  266.837894] aacraid :01:00.0: outstanding cmd: lowlevel-0
> [  266.837897] aacraid :01:00.0: outstanding cmd: error handler-0
> [  266.837899] aacraid :01:00.0: outstanding cmd: firmware-2
> [  266.837902] aacraid :01:00.0: outstanding cmd: kernel-0
> [  266.837939] aacraid :01:00.0: Controller reset type is 3
> [  266.840415] aacraid :01:00.0: Issuing IOP reset
> [  299.846642] aacraid :01:00.0: IOP reset succeded
> [  299.858811] aacraid: Comm Interface type2 enabled
> [  312.098221] aacraid :01:00.0: Scheduling bus rescan
> [  367.869277] aacraid: Host adapter reset request. SCSI hang ?
> [  367.871382] aacraid :01:00.0: outstanding cmd: midlevel-0
> [  367.871385] aacraid :01:00.0: outstanding cmd: lowlevel-0
> [  367.871387] aacraid :01:00.0: outstanding cmd: error handler-0
> [  367.871389] aacraid :01:00.0: outstanding cmd: firmware-2
> [  367.871391] aacraid :01:00.0: outstanding cmd: kernel-0
> [  367.871480] aacraid :01:00.0: Controller reset type is 3
> [  367.873982] aacraid :01:00.0: Issuing IOP reset
> [  400.765513] aacraid :01:00.0: IOP reset succeded
> [  400.776673] aacraid: Comm Interface type2 enabled
> [  413.036505] aacraid :01:00.0: Scheduling bus rescan
> [  468.995678] aacraid: Host adapter reset request. SCSI hang ?
> [  468.995700] aacraid :01:00.0: outstanding cmd: midlevel-0
> [  468.995704] aacraid :01:00.0: outstanding cmd: lowlevel-0
> [  468.995706] aacraid :01:00.0: outstanding cmd: error handler-0
> [  468.995709] aacraid :01:00.0: outstanding cmd: firmware-2
> [  468.995711] aacraid :01:00.0: outstanding cmd: kernel-0
> [  468.995740] aacraid :01:00.0: Controller reset type is 3
> [  468.995745] aacraid :01:00.0: Issuing IOP reset
> [  501.875537] aacraid :01:00.0: IOP reset succeded
> [  501.887288] aacraid: Comm Interface type2 enabled
> [  514.148609] aacraid :01:00.0: Scheduling bus rescan
> 
> The RAID controller is unusable, obviously. Rebooting with 4.13, now...

Hi Emmanuel,

It is curious that the FW is having outstanding commands ... I've created a 
ticket to identify the differences. I suspect that the large drive size may be 
related, but all options are open.

Thanks, -Dave


Re: aacraid driver, kernel 4.14 and up, ASR8xxx controller : doesn't work

2018-07-03 Thread Emmanuel Florac
On Wed, 27 Jun 2018 18:48:56 +
Dave Carroll  wrote:

> > Is this size consistent with the 4.13 kernel? That size is greater
> > than the 64-bit LBA addressing (0x93 539F B000).  
> 
> Sorry, that comment was incorrect, but I would like to see if the
> size is consistent between the kernels.


After a very long time, it finally boots up and sees the disks, but
here's the output from dmesg | grep aacraid:

[1.357760] Adaptec aacraid driver 1.2.1[50877]-custom
[1.388119] aacraid: Comm Interface type2 enabled
[3.405113] scsi host0: aacraid
[   50.156024] aacraid: Host adapter abort request.
   aacraid: Outstanding commands on (0,0,0,0):
[   50.156126] aacraid: Host adapter abort request.
   aacraid: Outstanding commands on (0,0,0,0):
[   50.172032] aacraid: Host adapter reset request. SCSI hang ?
[   65.536106] aacraid: Host adapter reset request. SCSI hang ?
[   65.536204] aacraid :01:00.0: outstanding cmd: midlevel-0
[   65.536206] aacraid :01:00.0: outstanding cmd: lowlevel-0
[   65.536207] aacraid :01:00.0: outstanding cmd: error handler-0
[   65.536208] aacraid :01:00.0: outstanding cmd: firmware-2
[   65.536210] aacraid :01:00.0: outstanding cmd: kernel-0
[   65.536228] aacraid :01:00.0: Controller reset type is 3
[   65.536314] aacraid :01:00.0: Issuing IOP reset
[   98.675617] aacraid :01:00.0: IOP reset succeded
[   98.684161] aacraid: Comm Interface type2 enabled
[  110.015352] aacraid :01:00.0: Scheduling bus rescan
[  166.896116] aacraid: Host adapter reset request. SCSI hang ?
[  166.896214] aacraid :01:00.0: outstanding cmd: midlevel-0
[  166.896216] aacraid :01:00.0: outstanding cmd: lowlevel-0
[  166.896217] aacraid :01:00.0: outstanding cmd: error handler-0
[  166.896218] aacraid :01:00.0: outstanding cmd: firmware-2
[  166.896220] aacraid :01:00.0: outstanding cmd: kernel-0
[  166.896236] aacraid :01:00.0: Controller reset type is 3
[  166.896322] aacraid :01:00.0: Issuing IOP reset
[  198.858466] aacraid :01:00.0: IOP reset succeded
[  198.870660] aacraid: Comm Interface type2 enabled
[  211.129896] aacraid :01:00.0: Scheduling bus rescan
[  228.844034] aacraid: Host adapter abort request.
   aacraid: Outstanding commands on (0,0,0,0):
[  266.835610] aacraid: Host adapter reset request. SCSI hang ?
[  266.837891] aacraid :01:00.0: outstanding cmd: midlevel-0
[  266.837894] aacraid :01:00.0: outstanding cmd: lowlevel-0
[  266.837897] aacraid :01:00.0: outstanding cmd: error handler-0
[  266.837899] aacraid :01:00.0: outstanding cmd: firmware-2
[  266.837902] aacraid :01:00.0: outstanding cmd: kernel-0
[  266.837939] aacraid :01:00.0: Controller reset type is 3
[  266.840415] aacraid :01:00.0: Issuing IOP reset
[  299.846642] aacraid :01:00.0: IOP reset succeded
[  299.858811] aacraid: Comm Interface type2 enabled
[  312.098221] aacraid :01:00.0: Scheduling bus rescan
[  367.869277] aacraid: Host adapter reset request. SCSI hang ?
[  367.871382] aacraid :01:00.0: outstanding cmd: midlevel-0
[  367.871385] aacraid :01:00.0: outstanding cmd: lowlevel-0
[  367.871387] aacraid :01:00.0: outstanding cmd: error handler-0
[  367.871389] aacraid :01:00.0: outstanding cmd: firmware-2
[  367.871391] aacraid :01:00.0: outstanding cmd: kernel-0
[  367.871480] aacraid :01:00.0: Controller reset type is 3
[  367.873982] aacraid :01:00.0: Issuing IOP reset
[  400.765513] aacraid :01:00.0: IOP reset succeded
[  400.776673] aacraid: Comm Interface type2 enabled
[  413.036505] aacraid :01:00.0: Scheduling bus rescan
[  468.995678] aacraid: Host adapter reset request. SCSI hang ?
[  468.995700] aacraid :01:00.0: outstanding cmd: midlevel-0
[  468.995704] aacraid :01:00.0: outstanding cmd: lowlevel-0
[  468.995706] aacraid :01:00.0: outstanding cmd: error handler-0
[  468.995709] aacraid :01:00.0: outstanding cmd: firmware-2
[  468.995711] aacraid :01:00.0: outstanding cmd: kernel-0
[  468.995740] aacraid :01:00.0: Controller reset type is 3
[  468.995745] aacraid :01:00.0: Issuing IOP reset
[  501.875537] aacraid :01:00.0: IOP reset succeded
[  501.887288] aacraid: Comm Interface type2 enabled
[  514.148609] aacraid :01:00.0: Scheduling bus rescan

The RAID controller is unusable, obviously. Rebooting with 4.13, now...

-- 

Emmanuel Florac |   Direction technique
|   Intellique
|   
|   +33 1 78 94 84 02





Re: aacraid driver, kernel 4.14 and up, ASR8xxx controller : doesn't work

2018-07-03 Thread Emmanuel Florac
On Wed, 27 Jun 2018 18:48:56 +
Dave Carroll  wrote:

> Sorry, that comment was incorrect, but I would like to see if the
> size is consistent between the kernels.
> 

I just booted the 4.16 from Debian testing, same problem, so this is
not an artefact of my custom compilation:

aacraid: Host adapter abort request.
aacraid: Outstanding commands on (0,0,0,0):
aacraid: Host adapter abort request.
aacraid: Outstanding commands on (0,0,0,0):
aacraid: Host adapter abort request.
aacraid: Host adapter reset request. SCSI hang ?

The very same adapter connected to the very same disks works perfectly
fine with a 4.13 kernel, and not at all with 4.14, 4.15, 4.16. I didn't
test 4.17 yet but...

-- 

Emmanuel Florac |   Direction technique
|   Intellique
|   
|   +33 1 78 94 84 02





[Bug 199703] HPSA blocking boot on HP smart Array P400

2018-07-03 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199703

--- Comment #24 from Rich Reamer (richr...@yahoo.com) ---
UPDATE:
* found out CONFIG_BLK_CPQ_CISS_DA was removed and replaced with the HPSA driver.
* the "hpsa_allow_any=1" boot parameter does sort of find the raid/disk -- and
creates a "/dev/disk/by-path/pci-:00:06.1-scsi-0:0:0:0" -- but that's it.
(no "by-uuid" or "by-id" entries)
* blkid just hangs completely when tried



Re: [PATCH] mpt3sas: Fix for regression caused due to cf6bf9710c patch

2018-07-03 Thread James Bottomley
On Tue, 2018-07-03 at 22:49 +0900, David Miller wrote:
> From: Sreekanth Reddy 
> Date: Tue, 3 Jul 2018 17:48:49 +0530
> 
> > Any suggestions or updates on my previous mail? I am using the 4.13
> > kernel.
> 
> I think the issue is that if you are reading a 32-bit word and then
> interpreting it as a struct full of individual bytes, you have to
> order the bytes in the structure appropriately for the cpu
> endianness.

This is undoubtedly it.  The point being if you read from a structure
using readX, you have to read every element at its correct length for
the endian swaps to work.  You can't do a readq on 2 32 bit words and
expect the endianness to be correct (you'll find they come out in the
wrong order).
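
The readq point can be checked with a small userspace model. The sketch below is an illustration only: the model_* helpers are hypothetical names, the "device" is just a byte array, and no real readX() accessor is involved. It treats two adjacent little-endian 32-bit fields as one little-endian 64-bit quantity and shows which half ends up first in memory once that value is stored in host byte order.

```c
#include <stdint.h>

/* Read one little-endian 32-bit field from device-visible bytes. */
uint32_t model_read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read the same eight bytes as a single little-endian 64-bit value. */
uint64_t model_read_le64(const uint8_t *p)
{
	return (uint64_t)model_read_le32(p) |
	       ((uint64_t)model_read_le32(p + 4) << 32);
}

/*
 * The 32-bit word that occupies the lower address once a CPU-order
 * 64-bit value has been stored to memory in host byte order.
 */
uint32_t model_first_word_le_host(uint64_t v)
{
	return (uint32_t)v;		/* low half comes first */
}

uint32_t model_first_word_be_host(uint64_t v)
{
	return (uint32_t)(v >> 32);	/* high half comes first */
}
```

With fields A at offset 0 and B at offset 4, the 64-bit read carries A in its low half. A little-endian host that stores the value into a structure of two 32-bit members still sees A first, while a big-endian host sees B first, which is the "wrong order" described above.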

I think you're using a shared (device and cpu) memory mapped structured
data with a doorbell register, which is pretty identical to how the
qla1280 does it.  We went through several iterations of fixing that
driver for big endian but finally settled on putting __le annotations
on all the structures and doing cpu_to_leX() swaps as we wrote them
(and obviously leX_to_cpu() swaps to read them), meaning the structure
in memory is always correct for the device.  Then we used a writeX to
poke the doorbell and the device just picked up the correct
information.

The rule you want to be following is: memory mapped structure, you're
responsible for annotation and swapping; readX/writeX to correctly
sized data, the API will swap for you.
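
As a rough userspace illustration of the first half of that rule, the sketch below fills a shared descriptor field by field in little-endian order, so the bytes the device sees do not depend on the host. The descriptor layout (a 32-bit length, 16-bit flags and a 16-bit tag) and every model_* name are assumptions made up for this example; in a driver the same effect comes from __leX-annotated members written with cpu_to_leX().

```c
#include <stdint.h>

/* Hypothetical descriptor: le32 length @0, le16 flags @4, le16 tag @6. */
#define MODEL_DESC_SIZE 8

/* Store a 16-bit value as little-endian bytes on any host. */
void model_put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/* Store a 32-bit value as little-endian bytes on any host. */
void model_put_le32(uint8_t *p, uint32_t v)
{
	model_put_le16(p, (uint16_t)(v & 0xffff));
	model_put_le16(p + 2, (uint16_t)(v >> 16));
}

/* Load a little-endian 16-bit value into CPU order. */
uint16_t model_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Load a little-endian 32-bit value into CPU order. */
uint32_t model_get_le32(const uint8_t *p)
{
	return (uint32_t)model_get_le16(p) |
	       ((uint32_t)model_get_le16(p + 2) << 16);
}

/* Each member is converted at its own width before it is stored. */
void model_fill_desc(uint8_t desc[MODEL_DESC_SIZE], uint32_t len,
		     uint16_t flags, uint16_t tag)
{
	model_put_le32(desc + 0, len);
	model_put_le16(desc + 4, flags);
	model_put_le16(desc + 6, tag);
}
```

Converting each member at its own width is what keeps the in-memory image correct for the device; the doorbell write that follows needs no extra swap because the accessor handles it.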

So, can we just revert the original patch which is clearly now a
regression and try to get this fixed in the merge window?  I think the
actual bug is simply you're missing __leX annotations on the shared
memory mapped structure to fix sparse, but otherwise everything is
working.

James



[PATCH] qedi: Send driver state to mfw.

2018-07-03 Thread Manish Rangankar
In an iSCSI offload BFS environment, the MFW needs to mark the
virtual link based on the qedi load status.

Signed-off-by: Manish Rangankar 
---
 drivers/scsi/qedi/qedi_main.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 682f3ce..253f305 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2273,6 +2273,7 @@ static int qedi_setup_boot_info(struct qedi_ctx *qedi)
 static void __qedi_remove(struct pci_dev *pdev, int mode)
 {
 	struct qedi_ctx *qedi = pci_get_drvdata(pdev);
+	int rval;
 
 	if (qedi->tmf_thread) {
 		flush_workqueue(qedi->tmf_thread);
@@ -2302,6 +2303,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
 	if (mode == QEDI_MODE_NORMAL)
 		qedi_free_iscsi_pf_param(qedi);
 
+	rval = qedi_ops->common->update_drv_state(qedi->cdev, false);
+	if (rval)
+		QEDI_ERR(&qedi->dbg_ctx, "Failed to send drv state to MFW\n");
+
 	if (!test_bit(QEDI_IN_OFFLINE, &qedi->flags)) {
 		qedi_ops->common->slowpath_stop(qedi->cdev);
 		qedi_ops->common->remove(qedi->cdev);
@@ -2576,6 +2581,12 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
 		if (qedi_setup_boot_info(qedi))
 			QEDI_ERR(&qedi->dbg_ctx,
 				 "No iSCSI boot target configured\n");
+
+		rc = qedi_ops->common->update_drv_state(qedi->cdev, true);
+		if (rc)
+			QEDI_ERR(&qedi->dbg_ctx,
+				 "Failed to send drv state to MFW\n");
+
 	}
 
 	return 0;
-- 
1.8.3.1



Re: [PATCH] mpt3sas: Fix for regression caused due to cf6bf9710c patch

2018-07-03 Thread David Miller
From: Sreekanth Reddy 
Date: Tue, 3 Jul 2018 17:48:49 +0530

> Any suggestions or updates on my previous mail? I am using the 4.13 kernel.

I think the issue is that if you are reading a 32-bit word and then
interpreting it as a struct full of individual bytes, you have to
order the bytes in the structure appropriately for the cpu endianness.

Look at the 32-bit value read, it is identical in both the x86 and
sparc cases.
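
The effect can be reproduced in userspace with the doorbell value from the report. This is a minimal sketch under stated assumptions: MODEL_DOORBELL_DATA_MASK is assumed to select the low 16 bits, struct model_reply_word1 is a hypothetical cut-down view of bytes 0x02 and 0x03 of the reply, and the big_endian_host flag merely simulates how a plain 16-bit store lays out its bytes on each kind of machine.

```c
#include <stdint.h>
#include <string.h>

/* Assumed: the data portion of the doorbell is its low 16 bits. */
#define MODEL_DOORBELL_DATA_MASK 0x0000ffffu

/* Hypothetical view of reply bytes 0x02..0x03 (second 16-bit word). */
struct model_reply_word1 {
	uint8_t msg_length;	/* offset 0x02 in the full reply */
	uint8_t function;	/* offset 0x03 in the full reply */
};

/*
 * Store the 16-bit doorbell data the way a host of the given endianness
 * would, then reinterpret those two bytes as the structure above.
 */
uint8_t model_msg_length(uint32_t doorbell, int big_endian_host)
{
	uint16_t data = (uint16_t)(doorbell & MODEL_DOORBELL_DATA_MASK);
	uint8_t raw[2];
	struct model_reply_word1 w;

	if (big_endian_host) {
		raw[0] = (uint8_t)(data >> 8);	/* most significant byte first */
		raw[1] = (uint8_t)(data & 0xff);
	} else {
		raw[0] = (uint8_t)(data & 0xff);	/* least significant byte first */
		raw[1] = (uint8_t)(data >> 8);
	}

	memcpy(&w, raw, sizeof(w));
	return w.msg_length;
}
```

For 0x1c000311 the little-endian layout yields a MsgLength of 17 and the big-endian layout yields 3, the same two numbers seen on x86 and Sparc64 in the report, even though the 32-bit value read is identical.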


Re: [PATCH] mpt3sas: Fix for regression caused due to cf6bf9710c patch

2018-07-03 Thread Sreekanth Reddy
Hi,

Any suggestions or updates on my previous mail? I am using the 4.13 kernel.

Thanks,
Sreekanth

On Sat, Jun 30, 2018 at 12:34 AM, Sreekanth Reddy
 wrote:
> Hi All,
>
> Here is the issue we observe on a Sparc64 machine when the driver does not
> use le16_to_cpu() in the code snippet below, where the driver reads
> 2 bytes of data posted by the HBA firmware:
>
> u32 reply1;
> reply1 = readl(&ioc->chip->Doorbell);
> reply[1] = (reply1 & MPI2_DOORBELL_DATA_MASK);
>
> printk("LSI debug.. 0x%x, 0x%x\n", reply1, reply[1]);
> writel(0, &ioc->chip->HostInterruptStatus);
>
> printk("LSI MsgLength :%d\n", default_reply->MsgLength);
>
> When I execute the above code on a Sparc64 machine, I get the following output:
>
> LSI debug.. 0x1c000311, 0x311
> LSI MsgLength :3
>
> When I execute the same code on an x86 machine, I get the following output:
>
> LSI debug.. 0x1c000311, 0x311
> LSI MsgLength :17
>
> The correct message length (here I am referring to the IOCFacts message) is 17
> words. But on the Sparc64 machine we got a message length of 3 words, which
> is wrong.
>
> Here is the data structure of the default reply message:
>
> typedef struct _MPI2_DEFAULT_REPLY {
> U16 FunctionDependent1; /*0x00 */
> U8 MsgLength;   /*0x02 */
> U8 Function;/*0x03 */
> U16 FunctionDependent2; /*0x04 */
> U8 FunctionDependent3;  /*0x06 */
> U8 MsgFlags;/*0x07 */
> U8 VP_ID;   /*0x08 */
> U8 VF_ID;   /*0x09 */
> U16 Reserved1;  /*0x0A */
> U16 FunctionDependent5; /*0x0C */
> U16 IOCStatus;  /*0x0E */
> U32 IOCLogInfo; /*0x10 */
> } MPI2_DEFAULT_REPLY, *PTR_MPI2_DEFAULT_REPLY,
> MPI2DefaultReply_t, *pMPI2DefaultReply_t;
>
> Until the host reads the correct number of reply words, the IOC won't clear the
> Doorbell Used bit, and hence we see the error messages below while loading
> the driver, and IOC initialization fails.
>
> Jun 28 02:21:57 localhost kernel: mpt4sas_cm0: _base_get_ioc_facts
> Jun 28 02:21:57 localhost kernel: mpt4sas_cm0: _base_wait_for_iocstate
> Jun 28 02:21:57 localhost kernel: mpt4sas_cm0: doorbell is in use (line=5241)
> Jun 28 02:21:57 localhost kernel: mpt4sas_cm0: _base_get_ioc_facts:
> handshake failed (r=-14)
> Jun 28 02:21:57 localhost kernel: mpt4sas_cm0: mpt3sas_base_free_resources
> Jun 28 02:21:57 localhost kernel: mpt4sas_cm0: _base_make_ioc_ready
> Jun 28 02:21:57 localhost kernel: mpt4sas_cm0: mpt3sas_base_unmap_resources
> Jun 28 02:21:57 localhost kernel: mpt4sas_cm0: _base_release_memory_pools
> Jun 28 02:21:57 localhost kernel: mpt4sas_cm0: failure at
> /home/chaitra/mpt3sas_with_sparse_patch/mpt3sas_scsih.c:10776/_scsih_probe()
>
>
> Thanks,
> Sreekanth
>
> On Sat, Jun 30, 2018 at 12:06 AM, Andy Shevchenko
>  wrote:
>> On Fri, Jun 29, 2018 at 7:06 PM, James Bottomley
>>  wrote:
>>> On Fri, 2018-06-29 at 10:58 -0400, Chaitra P B wrote:
>>>> "scsi: mpt3sas: Bug fix for big endian systems"
>>>>
>>>> The above patch with commit id "cf6bf9710cabba1fe94a4349f4eb8db623c77ebc"
>>>> was posted to fix sparse warnings. While posting this patch it was
>>>> assumed that the readl() & writel() APIs internally call le32_to_cpu() &
>>>> cpu_to_le32() respectively. It looks like this is not true for all
>>>> architectures.
>>>
>>> Just a minute, it damn well should be.  The definition of readl/writel
>>> is barriers and little endian (you can see this in asm-generic/io.h).
>>>
>>> Which architecture is getting this wrong?  Because it sounds like
>>> that's what we need to fix rather than doing something like this in all
>>> drivers.
>>>
>>> Sparc (and parisc) definitely do the little endian thing, so if this
>>> code is what it takes to get them working again, it looks like you're
>>> double swapping somewhere.  I really think cf6bf9710c needs to be
>>> reverted and you should look again at your sparse warnings.  Since the
>>> driver is using the non-raw readX/writeX it should be cpu endian
>>> internally which cf6bf9710c upsets.
>>
>> And we definitely won't see the constructions like
>> writeq(cpu_to_le64()) in the code, because it's weird.
>> If I get it correctly it's equivalent to __raw_writeq().
>>
>> --
>> With Best Regards,
>> Andy Shevchenko
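
The byte-order half of that last remark can be sketched in userspace. The model below is an assumption-laden illustration: the model_* functions are hypothetical, the big_endian_host flag simulates the host type, and the barriers that the real writeq() also implies are ignored entirely. It models cpu_to_le64() as a swap on big-endian hosts only, and writeq() as applying that same conversion before a raw, CPU-order store.

```c
#include <stdint.h>

/* Reverse the byte order of a 64-bit value. */
uint64_t model_swab64(uint64_t v)
{
	v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
	    ((v >> 8) & 0x00ff00ff00ff00ffULL);
	v = ((v & 0x0000ffff0000ffffULL) << 16) |
	    ((v >> 16) & 0x0000ffff0000ffffULL);
	return (v << 32) | (v >> 32);
}

/* cpu_to_le64() as seen by a host of the given endianness. */
uint64_t model_cpu_to_le64(uint64_t v, int big_endian_host)
{
	return big_endian_host ? model_swab64(v) : v;
}

/*
 * Value handed to the raw CPU-order store inside a modelled writeq():
 * the accessor converts from CPU order to little endian first.
 */
uint64_t model_writeq_raw_value(uint64_t v, int big_endian_host)
{
	return model_cpu_to_le64(v, big_endian_host);
}
```

Feeding model_cpu_to_le64(v) into model_writeq_raw_value() returns v unchanged on both kinds of host: two swaps cancel on big endian and nothing happens on little endian, so as far as byte order goes the combination behaves like a raw store of v.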