The fix was bikeshedded a tiny bit on LKML, but is now accepted upstream
and AIUI will be in linux-next soon:
https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git/commit/?id=0d98ba8d70b0070ac117452ea0b663e26bbf46bf

This change has been tested as backwards compatible with the Ubuntu 4.15
kernel, and an SRU would be appreciated.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1771467

Title:
  Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged

Bug description:
  Verified on multiple DL360 Gen9 servers with up to date firmware.
  Just before reboot or shutdown, there is the following panic:

  [  289.093083] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
  [  289.093085] {1}[Hardware Error]: event severity: fatal
  [  289.093087] {1}[Hardware Error]:  Error 0, type: fatal
  [  289.093088] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093090] {1}[Hardware Error]:   port_type: 4, root port
  [  289.093091] {1}[Hardware Error]:   version: 1.16
  [  289.093093] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
  [  289.093094] {1}[Hardware Error]:   device_id: 0000:00:01.0
  [  289.093095] {1}[Hardware Error]:   slot: 0
  [  289.093096] {1}[Hardware Error]:   secondary_bus: 0x03
  [  289.093097] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
  [  289.093098] {1}[Hardware Error]:   class_code: 040600
  [  289.093378] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
  [  289.093380] {1}[Hardware Error]:  Error 1, type: fatal
  [  289.093381] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093382] {1}[Hardware Error]:   port_type: 4, root port
  [  289.093383] {1}[Hardware Error]:   version: 1.16
  [  289.093384] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
  [  289.093386] {1}[Hardware Error]:   device_id: 0000:00:01.0
  [  289.093386] {1}[Hardware Error]:   slot: 0
  [  289.093387] {1}[Hardware Error]:   secondary_bus: 0x03
  [  289.093388] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
  [  289.093674] {1}[Hardware Error]:   class_code: 040600
  [  289.093676] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
  [  289.093678] Kernel panic - not syncing: Fatal hardware error!
  [  289.093745] Kernel Offset: 0x1cc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
  [  289.105835] ERST: [Firmware Warn]: Firmware does not respond in time.
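For comparing this panic across several affected servers, the
"[Hardware Error]" lines above can be grouped by error section with a
short script. This is only a rough sketch, not part of the original
report: the function name parse_hardware_errors and the SAMPLE text are
made up for illustration, and it assumes each log entry sits on a single
line in the format shown above.

```python
import re

# Matches "[ timestamp] {N}[Hardware Error]: <payload>" lines as printed above.
LINE_RE = re.compile(r"^\s*\[\s*\d+\.\d+\]\s+\{\d+\}\[Hardware Error\]:\s*(.*)$")
# Matches the payload that starts a new error section, e.g. "Error 0, type: fatal".
SECTION_RE = re.compile(r"^Error (\d+), type: (\w+)$")


def parse_hardware_errors(lines):
    """Group hardware error lines into one dict per "Error N" section.

    Hypothetical helper: each field line is split at its first ": ", so a
    line such as "command: 0x6010, status: 0x0143" is stored under the key
    "command" with the rest of the line kept verbatim as the value.
    """
    sections = []
    current = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a hardware error line (e.g. the panic message)
        payload = match.group(1).strip()
        header = SECTION_RE.match(payload)
        if header:
            current = {"index": int(header.group(1)), "type": header.group(2)}
            sections.append(current)
        elif current is not None:
            key, sep, value = payload.partition(": ")
            if sep:
                current[key] = value
    return sections


# A few lines copied from the log above, one entry per line.
SAMPLE = """\
  [  289.093083] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
  [  289.093085] {1}[Hardware Error]: event severity: fatal
  [  289.093087] {1}[Hardware Error]:  Error 0, type: fatal
  [  289.093088] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093093] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
  [  289.093094] {1}[Hardware Error]:   device_id: 0000:00:01.0
  [  289.093378] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
  [  289.093380] {1}[Hardware Error]:  Error 1, type: fatal
  [  289.093381] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093678] Kernel panic - not syncing: Fatal hardware error!
"""

if __name__ == "__main__":
    for section in parse_hardware_errors(SAMPLE.splitlines()):
        print(section)
```

Lines that appear before the first "Error N" header (the error source and
event severity) are skipped in this sketch; they could be collected
separately if needed.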

  It does eventually restart after this.  Then during the subsequent
  POST, the following warning appears:

  Embedded RAID 1 : Smart Array P440ar Controller - (2048 MB, V6.30) 7 Logical
  Drive(s) - Operation Failed
   - 1719-Slot 0 Drive Array - A controller failure event occurred prior
     to this power-up.  (Previous lock up code = 0x13) Action: Install the
     latest controller firmware. If the problem persists, replace the
     controller.

  The latter's symptoms are described in
  https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04805565
  but the running storage controller firmware is much newer than the
  doc's resolution.

  Neither of these problems occurs during shutdown/reboot on the xenial
  kernel.

  FWIW, when running on the old P89 BIOS (1.50 (07/20/2015) vs 2.56
  (01/22/2018)), the shutdown failure mode was a loop like so:

  [529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.
  [529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.222884] Do you have a strange power saving mode enabled?
  [529153.222884] Dazed and confused, but trying to continue
  [529153.554447] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554448] Do you have a strange power saving mode enabled?
  [529153.554449] Dazed and confused, but trying to continue
  [529153.554450] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554451] Do you have a strange power saving mode enabled?
  [529153.554452] Dazed and confused, but trying to continue
  [529153.554452] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554453] Do you have a strange power saving mode enabled?
  [529153.554454] Dazed and confused, but trying to continue
  [529153.554454] Uhhuh. NMI received for unknown reason 35 on CPU 0.
  [529153.554455] Do you have a strange power saving mode enabled?
  [529153.554456] Dazed and confused, but trying to continue
  [529153.554457] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554458] Do you have a strange power saving mode enabled?
  [529153.554458] Dazed and confused, but trying to continue
  [529153.554459] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554460] Do you have a strange power saving mode enabled?
  [529153.554460] Dazed and confused, but trying to continue
  [529154.953916] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529154.953917] Do you have a strange power saving mode enabled?
  [529154.953918] Dazed and confused, but trying to continue

  But upgrading to 2.56 changes that to a kernel panic.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-signed-image-generic 4.15.0.21.22
  ProcVersionSignature: Ubuntu 4.15.0-21.22-generic 4.15.17
  Uname: Linux 4.15.0-21-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 May 15 23:11 seq
   crw-rw---- 1 root audio 116, 33 May 15 23:11 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Wed May 16 00:17:53 2018
  HibernationDevice: RESUME=UUID=696e8063-c668-4c89-a478-bfc23a450369
  InstallationDate: Installed on 2016-06-01 (713 days ago)
  InstallationMedia: Ubuntu-Server 14.04.5 LTS "Trusty Tahr" - Beta amd64 (20160527)
  MachineType: HP ProLiant DL360 Gen9
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 mgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-21-generic root=UUID=6e6d422d-8ffb-4db3-b8c7-6c81e320b1b2 ro console=tty0 console=ttyS1,38400 nosplash console=ttyS1,38400 console=tty0 nosplash
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-21-generic N/A
   linux-backports-modules-4.15.0-21-generic  N/A
   linux-firmware                             1.173
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2018-05-09 (6 days ago)
  dmi.bios.date: 01/22/2018
  dmi.bios.vendor: HP
  dmi.bios.version: P89
  dmi.board.name: ProLiant DL360 Gen9
  dmi.board.vendor: HP
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrP89:bd01/22/2018:svnHP:pnProLiantDL360Gen9:pvr:rvnHP:rnProLiantDL360Gen9:rvr:cvnHP:ct23:cvr:
  dmi.product.family: ProLiant
  dmi.product.name: ProLiant DL360 Gen9
  dmi.sys.vendor: HP

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1771467/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
