On 2020/11/29 13:12, Douglas Gilbert wrote:
On 2020-11-28 6:27 p.m., James Bottomley wrote:
On Sat, 2020-11-28 at 20:23 +0800, Ding Hui wrote:
We can get a crash when disconnecting the iSCSI session;
the call trace looks like this:

   [ffff00002a00fb70] kfree at ffff00000830e224
   [ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
   [ffff00002a00fbd0] device_del at ffff0000086b6a98
   [ffff00002a00fc50] device_unregister at ffff0000086b6d58
   [ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
   [ffff00002a00fca0] scsi_remove_device at ffff000008706134
   [ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
   [ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
   [ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
   [ffff00002a00fdb0] process_one_work at ffff00000810f35c
   [ffff00002a00fe00] worker_thread at ffff00000810f648
   [ffff00002a00fe70] kthread at ffff000008116e98

In ses_intf_add, the components count can be 0. In that case kcalloc
allocates a zero-size scomp, but the result is never saved in
edev->component[i].scratch.

In this situation, edev->component[0].scratch is an invalid pointer,
and calling kfree on it in ses_intf_remove_enclosure causes a crash
like the one above.
The call trace can also take other, random forms when kfree does not
catch the invalid pointer.

We should not use the edev->component[] array when the components
count is 0.
We also need to check the index when using the edev->component[]
array in ses_enclosure_data_process.
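
A minimal user-space sketch of the guard described above, not the actual
driver code. The names model_enclosure, model_add and model_remove are
hypothetical stand-ins, calloc/free stand in for kcalloc/kfree, and the
sketch assumes that all per-component scratch data lives in one block
whose start is stored in component[0].scratch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; names are illustrative. */
struct model_component {
    void *scratch;                      /* points into one shared scratch block */
};

struct model_enclosure {
    int components;                     /* may legitimately be 0 */
    struct model_component component[]; /* flexible array, like edev->component[] */
};

/* Allocate an enclosure with n components and one shared scratch block. */
static struct model_enclosure *model_add(int n, size_t scratch_size)
{
    struct model_enclosure *e;
    char *scratch;
    int i;

    assert(n >= 0 && scratch_size > 0);
    e = calloc(1, sizeof(*e) + (size_t)n * sizeof(e->component[0]));
    if (e == NULL)
        return NULL;
    e->components = n;
    if (n == 0)
        return e;                       /* no slots exist, so nothing is stored */

    scratch = calloc((size_t)n, scratch_size);
    if (scratch == NULL) {
        free(e);
        return NULL;
    }
    for (i = 0; i < n; i++)
        e->component[i].scratch = scratch + (size_t)i * scratch_size;
    return e;
}

/* Free the enclosure; returns 1 if a scratch block was released, else 0. */
static int model_remove(struct model_enclosure *e)
{
    int freed = 0;

    if (e == NULL)
        return 0;
    if (e->components > 0) {            /* component[0] only exists when count > 0 */
        free(e->component[0].scratch);
        freed = 1;
    }
    free(e);
    return freed;
}
```

With a count of 0, model_remove never reads component[0], which is the
access the description above says must be avoided.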

Tested-by: Zeng Zhicong <timmyz...@163.com>
Cc: stable <sta...@vger.kernel.org> # 2.6.25+
Signed-off-by: Ding Hui <ding...@sangfor.com.cn>

This doesn't really look to be the right thing to do: an enclosure
which has no components can't usefully be controlled by the driver since
there's nothing for it to do, so what we should do in this situation is
refuse to attach, as in the proposed patch below.

It does seem a bit odd that someone would build an enclosure that
doesn't enclose anything, so would you mind running

sg_ses -e

'-e' is the short form of '--enumerate'. That will report the names
and abbreviations of the diagnostic pages that the utility itself
knows about (and supports). It won't show anything specific about
the environment that sg_ses is executed in.

You probably meant:
   sg_ses <ses_device>

Examples of the likely forms are:
   sg_ses /dev/bsg/1:0:0:0
   sg_ses /dev/sg2
   sg_ses /dev/ses0

This is from a nearby machine:

$ lsscsi -gs
[3:0:0:0]  disk     ATA      Samsung SSD 850  1B6Q  /dev/sda      /dev/sg0   120GB
[4:0:0:0]  disk     IBM-207x HUSMM8020ASS20   J4B6  /dev/sdc      /dev/sg2   200GB
[4:0:1:0]  disk     ATA      INTEL SSDSC2KW25 003C  /dev/sdd      /dev/sg3   256GB
[4:0:2:0]  disk     SEAGATE  ST10000NM0096    E005  /dev/sde      /dev/sg4  10.0TB
[4:0:3:0]  enclosu  Areca Te ARC-802801.37.69 0137  -             /dev/sg5       -
[4:0:4:0]  enclosu  Intel    RES2SV240        0d00  -             /dev/sg6       -
[7:0:0:0]  disk     Kingston DataTravelerMini PMAP  /dev/sdb      /dev/sg1  1.03GB
[N:0:0:1]  disk     WDC WDS256G1X0C-00ENX0__1       /dev/nvme0n1  -          256GB

# sg_ses /dev/sg5
   Areca Te  ARC-802801.37.69  0137
Supported diagnostic pages:
   Supported Diagnostic Pages [sdp] [0x0]
   Configuration (SES) [cf] [0x1]
   Enclosure Status/Control (SES) [ec,es] [0x2]
   String In/Out (SES) [str] [0x4]
   Threshold In/Out (SES) [th] [0x5]
   Element Descriptor (SES) [ed] [0x7]
   Additional Element Status (SES-2) [aes] [0xa]
   Supported SES Diagnostic Pages (SES-2) [ssp] [0xd]
   Download Microcode (SES-2) [dm] [0xe]
   Subenclosure Nickname (SES-2) [snic] [0xf]
   Protocol Specific (SAS transport) [] [0x3f]

# sg_ses -p cf /dev/sg5
   Areca Te  ARC-802801.37.69  0137
Configuration diagnostic page:
   number of secondary subenclosures: 0
   generation code: 0x0
   enclosure descriptor list
     Subenclosure identifier: 0 [primary]
       relative ES process id: 1, number of ES processes: 1
       number of type descriptor headers: 9
       enclosure logical identifier (hex): d5b401503fc0ec16
       enclosure vendor: Areca Te  product: ARC-802801.37.69  rev: 0137
       vendor-specific data:
         11 22 33 44 55 00 00 00                             ."3DU...

   type descriptor header and text list
     Element type: Array device slot, subenclosure id: 0
       number of possible elements: 24
       text: ArrayDevicesInSubEnclsr0
     Element type: Enclosure, subenclosure id: 0
       number of possible elements: 1
       text: EnclosureElementInSubEnclsr0
     Element type: SAS expander, subenclosure id: 0
       number of possible elements: 1
       text: SAS Expander
     Element type: Cooling, subenclosure id: 0
       number of possible elements: 5
       text: CoolingElementInSubEnclsr0
     Element type: Temperature sensor, subenclosure id: 0
       number of possible elements: 2
       text: TempSensorsInSubEnclsr0
     Element type: Voltage sensor, subenclosure id: 0
       number of possible elements: 2
       text: VoltageSensorsInSubEnclsr0
     Element type: SAS connector, subenclosure id: 0
       number of possible elements: 3
       text: ConnectorsInSubEnclsr0
     Element type: Power supply, subenclosure id: 0
       number of possible elements: 2
       text: PowerSupplyInSubEnclsr0
     Element type: Audible alarm, subenclosure id: 0
       number of possible elements: 1
       text: AudibleAlarmInSubEnclsr0

Doug Gilbert

on it and reporting back what it shows?  It's possible there's another
type that the enclosure device should be tracking.

kernel log:

2020-11-30 09:29:44.228339 info [kernel:] [425726.567579] scsi host18: iSCSI Initiator over TCP/IP
2020-11-30 09:29:44.476319 notice [kernel:] [425726.817417] scsi 18:0:0:0: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:29:44.480314 notice [kernel:] [425726.820591] scsi 18:0:0:0: rdac: LUN 0 (IOSHIP) (owned)
2020-11-30 09:29:44.480319 notice [kernel:] [425726.820810] sd 18:0:0:0: Attached scsi generic sg30 type 0
2020-11-30 09:29:44.480320 notice [kernel:] [425726.820812] sd 18:0:0:0: Embedded Enclosure Device
2020-11-30 09:29:44.480321 warning [kernel:] [425726.821119] sd 18:0:0:0: Mode parameters changed
2020-11-30 09:29:44.492316 info [kernel:] [425726.831444] sd 18:0:0:0: [ses_intf_add]:777 sdev: ffff8027ca170000 edev:ffff80271459cc00 com: 0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:29:44.492326 notice [kernel:] [425726.832326] sd 18:0:0:0: [sdt] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
2020-11-30 09:29:44.496308 notice [kernel:] [425726.836252] sd 18:0:0:0: [sdt] Write Protect is off
2020-11-30 09:29:44.496313 debug [kernel:] [425726.836255] sd 18:0:0:0: [sdt] Mode Sense: 83 00 10 08
2020-11-30 09:29:44.500310 notice [kernel:] [425726.838436] sd 18:0:0:0: [sdt] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:29:44.500315 notice [kernel:] [425726.838573] scsi 18:0:0:1: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:29:44.501437 notice [kernel:] [425726.842314] scsi 18:0:0:1: rdac: LUN 1 (IOSHIP) (owned)
2020-11-30 09:29:44.501442 notice [kernel:] [425726.842501] sd 18:0:0:1: Attached scsi generic sg31 type 0
2020-11-30 09:29:44.501442 notice [kernel:] [425726.842502] sd 18:0:0:1: Embedded Enclosure Device
2020-11-30 09:29:44.501443 warning [kernel:] [425726.842688] sd 18:0:0:1: Mode parameters changed
2020-11-30 09:29:44.512325 info [kernel:] [425726.852905] sd 18:0:0:1: [ses_intf_add]:777 sdev: ffff8027ca162000 edev:ffff80271459d400 com: 0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:29:44.512335 notice [kernel:] [425726.853249] sd 18:0:0:1: [sdu] 104857600 512-byte logical blocks: (53.7 GB/50.0 GiB)
2020-11-30 09:29:44.514333 notice [kernel:] [425726.853778] sd 18:0:0:1: [sdu] Write Protect is off
2020-11-30 09:29:44.514338 debug [kernel:] [425726.853780] sd 18:0:0:1: [sdu] Mode Sense: 83 00 10 08
2020-11-30 09:29:44.514339 notice [kernel:] [425726.853908] scsi 18:0:0:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
2020-11-30 09:29:44.514340 notice [kernel:] [425726.854524] sd 18:0:0:1: [sdu] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:29:44.514340 notice [kernel:] [425726.855140] sd 18:0:0:0: [sdt] Attached SCSI disk
2020-11-30 09:29:44.514341 notice [kernel:] [425726.855221] scsi 18:0:0:31: Attached scsi generic sg32 type 0
2020-11-30 09:29:44.514342 notice [kernel:] [425726.855223] scsi 18:0:0:31: Embedded Enclosure Device
2020-11-30 09:29:44.514383 warning [kernel:] [425726.855571] scsi 18:0:0:31: Power-on or device reset occurred
2020-11-30 09:29:44.514383 err [kernel:] [425726.855587] scsi 18:0:0:31: Failed to get diagnostic page 0x1
2020-11-30 09:29:44.514384 err [kernel:] [425726.855594] scsi 18:0:0:31: Failed to bind enclosure -19
2020-11-30 09:29:44.514388 info [kernel:] [425726.855599] scsi 18:0:0:31: sdev: ffff8027ca173000
2020-11-30 09:29:44.524329 notice [kernel:] [425726.863747] sd 18:0:0:1: [sdu] Attached SCSI disk
2020-11-30 09:29:44.764335 err [kernel:] [425727.103405] I/O scheduler mq-deadline not found
2020-11-30 09:29:44.888323 err [kernel:] [425727.228994] I/O scheduler mq-deadline not found
2020-11-30 09:29:45.001274 warning [kernel:] [425727.342284] device-mapper: multipath round-robin: repeat_count > 1 is deprecated, using 1 instead
2020-11-30 09:31:41.116334 info [kernel:] [425843.460697] scsi host19: iSCSI Initiator over TCP/IP
2020-11-30 09:31:41.356325 notice [kernel:] [425843.700556] scsi 19:0:0:0: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:31:41.360325 notice [kernel:] [425843.704270] scsi 19:0:0:0: rdac: LUN 0 (IOSHIP) (owned)
2020-11-30 09:31:41.360334 notice [kernel:] [425843.704478] sd 19:0:0:0: Attached scsi generic sg33 type 0
2020-11-30 09:31:41.360334 notice [kernel:] [425843.704480] sd 19:0:0:0: Embedded Enclosure Device
2020-11-30 09:31:41.360335 warning [kernel:] [425843.704732] sd 19:0:0:0: Mode parameters changed
2020-11-30 09:31:41.380310 info [kernel:] [425843.723464] sd 19:0:0:0: [ses_intf_add]:777 sdev: ffff8027c8984000 edev:ffff802723f68400 com: 0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:31:41.380317 notice [kernel:] [425843.724232] sd 19:0:0:0: [sdv] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
2020-11-30 09:31:41.384308 notice [kernel:] [425843.729742] sd 19:0:0:0: [sdv] Write Protect is off
2020-11-30 09:31:41.384313 debug [kernel:] [425843.729745] sd 19:0:0:0: [sdv] Mode Sense: 83 00 10 08
2020-11-30 09:31:41.388310 notice [kernel:] [425843.730436] sd 19:0:0:0: [sdv] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:31:41.388316 notice [kernel:] [425843.730557] scsi 19:0:0:1: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:31:41.392312 notice [kernel:] [425843.736948] scsi 19:0:0:1: rdac: LUN 1 (IOSHIP) (owned)
2020-11-30 09:31:41.392317 notice [kernel:] [425843.737146] sd 19:0:0:1: Attached scsi generic sg34 type 0
2020-11-30 09:31:41.392317 notice [kernel:] [425843.737148] sd 19:0:0:1: Embedded Enclosure Device
2020-11-30 09:31:41.392318 warning [kernel:] [425843.737829] sd 19:0:0:1: Mode parameters changed
2020-11-30 09:31:41.404316 info [kernel:] [425843.749482] sd 19:0:0:1: [ses_intf_add]:777 sdev: ffff8027c899e000 edev:ffff802723f6e800 com: 0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:31:41.404326 notice [kernel:] [425843.749862] sd 19:0:0:1: [sdw] 104857600 512-byte logical blocks: (53.7 GB/50.0 GiB)
2020-11-30 09:31:41.408331 notice [kernel:] [425843.750597] sd 19:0:0:1: [sdw] Write Protect is off
2020-11-30 09:31:41.408346 debug [kernel:] [425843.750598] sd 19:0:0:1: [sdw] Mode Sense: 83 00 10 08
2020-11-30 09:31:41.408347 notice [kernel:] [425843.751325] scsi 19:0:0:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
2020-11-30 09:31:41.408347 notice [kernel:] [425843.751979] sd 19:0:0:1: [sdw] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:31:41.408348 notice [kernel:] [425843.753105] scsi 19:0:0:31: Attached scsi generic sg35 type 0
2020-11-30 09:31:41.408349 notice [kernel:] [425843.753109] scsi 19:0:0:31: Embedded Enclosure Device
2020-11-30 09:31:41.408350 notice [kernel:] [425843.753165] sd 19:0:0:0: [sdv] Attached SCSI disk
2020-11-30 09:31:41.408351 warning [kernel:] [425843.753377] scsi 19:0:0:31: Power-on or device reset occurred
2020-11-30 09:31:41.408352 err [kernel:] [425843.753388] scsi 19:0:0:31: Failed to get diagnostic page 0x1
2020-11-30 09:31:41.408352 err [kernel:] [425843.753390] scsi 19:0:0:31: Failed to bind enclosure -19
2020-11-30 09:31:41.408353 info [kernel:] [425843.753391] scsi 19:0:0:31: sdev: ffff8027c899c000
2020-11-30 09:31:41.416332 notice [kernel:] [425843.759795] sd 19:0:0:1: [sdw] Attached SCSI disk


# cat /sys/class/enclosure/18\:0\:0\:0/components
0
# cat /sys/class/enclosure/18\:0\:0\:1/components
0
# cat /sys/class/enclosure/19\:0\:0\:0/components
0
# cat /sys/class/enclosure/19\:0\:0\:1/components
0

# sg_ses -e
Diagnostic pages, followed by abbreviation(s) then page code:
    Supported Diagnostic Pages  [sdp] [0x0]
    Configuration (SES)  [cf] [0x1]
    Enclosure Status/Control (SES)  [ec,es] [0x2]
    Help Text (SES)  [ht] [0x3]
    String In/Out (SES)  [str] [0x4]
    Threshold In/Out (SES)  [th] [0x5]
    Array Status/Control (SES, obsolete)  [ac,as] [0x6]
    Element Descriptor (SES)  [ed] [0x7]
    Short Enclosure Status (SES)  [ses] [0x8]
    Enclosure Busy (SES-2)  [eb] [0x9]
    Additional Element Status (SES-2)  [aes] [0xa]
    Subenclosure Help Text (SES-2)  [sht] [0xb]
    Subenclosure String In/Out (SES-2)  [sstr] [0xc]
    Supported SES Diagnostic Pages (SES-2)  [ssp] [0xd]
    Download Microcode (SES-2)  [dm] [0xe]
    Subenclosure Nickname (SES-2)  [snic] [0xf]
    Protocol Specific (SAS transport)  [] [0x3f]
    Translate Address (SBC)  [] [0x40]
    Device Status (SBC)  [] [0x41]
    Rebuild Assist (SBC)  [] [0x42]

SES element type names, followed by abbreviation and element type code:
    Unspecified  [un] [0x0]
    Device slot  [dev] [0x1]
    Power supply  [ps] [0x2]
    Cooling  [coo] [0x3]
    Temperature sensor  [ts] [0x4]
    Door  [do] [0x5]
    Audible alarm  [aa] [0x6]
    Enclosure services controller electronics  [esc] [0x7]
    SCC controller electronics  [sce] [0x8]
    Nonvolatile cache  [nc] [0x9]
    Invalid operation reason  [ior] [0xa]
    Uninterruptible power supply  [ups] [0xb]
    Display  [dis] [0xc]
    Key pad entry  [kpe] [0xd]
    Enclosure  [enc] [0xe]
    SCSI port/transceiver  [sp] [0xf]
    Language  [lan] [0x10]
    Communication port  [cp] [0x11]
    Voltage sensor  [vs] [0x12]
    Current sensor  [cs] [0x13]
    SCSI target port  [stp] [0x14]
    SCSI initiator port  [sip] [0x15]
    Simple subenclosure  [ss] [0x16]
    Array device slot  [arr] [0x17]
    SAS expander  [sse] [0x18]
    SAS connector  [ssc] [0x19]

# sg_ses /dev/sdu
  DELL      MD32xxi           0784
    disk device has EncServ bit set
Supported diagnostic pages:
  Supported Diagnostic Pages [sdp] [0x0]
  Configuration (SES) [cf] [0x1]
  Enclosure Status/Control (SES) [ec,es] [0x2]
  Array Status/Control (SES, obsolete) [ac,as] [0x6]

# sg_ses -p cf /dev/sdu
  DELL      MD32xxi           0784
    disk device has EncServ bit set
Configuration diagnostic page:
  number of secondary subenclosures: 0
  generation code: 0x12c
  enclosure descriptor list
    Subenclosure identifier: 0 (primary)
      relative ES process id: 0, number of ES processes: 0
      number of type descriptor headers: 5
      enclosure logical identifier (hex): 0000000000000000
      enclosure vendor: DELL      product: MD32xxi           rev: 0784
      vendor-specific data:
        30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
        00 00 00 00
  type descriptor header/text list
    Element type: Device slot, subenclosure id: 0
      number of possible elements: 12
    Element type: Power supply, subenclosure id: 0
      number of possible elements: 2
    Element type: Cooling, subenclosure id: 0
      number of possible elements: 4
    Element type: Temperature sensor, subenclosure id: 0
      number of possible elements: 4
    Element type: SCC controller electronics, subenclosure id: 0
      number of possible elements: 1

I'm not good at ses, but it seems that ses does not get the right component count.
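
For comparison, a rough user-space sketch of the counting step visible in
the patch context (type_ptr[0] is the element type, type_ptr[1] the
number of possible elements). The function name, the macro names and the
4-byte stride between headers are assumptions for illustration, not
copied from ses.c; the element type codes are the ones listed by
sg_ses -e above, and bytes 2 and 3 of each header are simply zeroed here.

```c
#include <assert.h>
#include <stddef.h>

/* Element type codes as listed by "sg_ses -e" (macro names are made up). */
#define ELEM_DEVICE_SLOT        0x01
#define ELEM_ARRAY_DEVICE_SLOT  0x17

/*
 * Sum the "number of possible elements" over all device-slot and
 * array-device-slot type descriptor headers. Each header is assumed
 * to occupy 4 bytes: byte 0 = element type, byte 1 = element count.
 */
static int count_device_components(const unsigned char *hdrs, size_t n_hdrs)
{
    int components = 0;
    size_t i;

    assert(hdrs != NULL || n_hdrs == 0);
    for (i = 0; i < n_hdrs; i++) {
        const unsigned char *type_ptr = hdrs + 4 * i;

        if (type_ptr[0] == ELEM_DEVICE_SLOT ||
            type_ptr[0] == ELEM_ARRAY_DEVICE_SLOT)
            components += type_ptr[1];
    }
    return components;
}

/* The five headers sg_ses reported for the MD32xxi (bytes 2 and 3 zeroed). */
static const unsigned char md32xxi_headers[] = {
    0x01, 12, 0, 0,   /* Device slot */
    0x02,  2, 0, 0,   /* Power supply */
    0x03,  4, 0, 0,   /* Cooling */
    0x04,  4, 0, 0,   /* Temperature sensor */
    0x08,  1, 0, 0,   /* SCC controller electronics */
};
```

Fed those five headers, count_device_components() returns 12, while the
kernel log above shows com: 0 for the same device.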


Regards,

James

---8>8>8><8<8<8--------
From: James Bottomley <james.bottom...@hansenpartnership.com>
Subject: [PATCH] scsi: ses: don't attach if enclosure has no components

An enclosure with no components can't usefully be operated by the
driver (since effectively it has nothing to manage), so report the
problem and don't attach.  Not attaching also fixes an oops which
could occur if the driver tries to manage a zero component enclosure.

Reported-by: Ding Hui <ding...@sangfor.com.cn>
Cc: sta...@vger.kernel.org
Signed-off-by: James Bottomley <james.bottom...@hansenpartnership.com>
---
  drivers/scsi/ses.c | 5 +++++
  1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index c2afba2a5414..9624298b9c89 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -690,6 +690,11 @@ static int ses_intf_add(struct device *cdev,
              type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE)
              components += type_ptr[1];
      }
+    if (components == 0) {
+        sdev_printk(KERN_ERR, sdev, "enclosure has no enumerated components\n");
+        goto err_free;
+    }
+
      ses_dev->page1 = buf;
      ses_dev->page1_len = len;
      buf = NULL;




--
Thanks,
- Ding Hui
