Thanks for testing!
I will SRU this to the Disco kernel.

https://lists.ubuntu.com/archives/kernel-team/2019-December/106569.html

** Description changed:

- With the Linux kernel, when a 1-bit ECC error is injected, MCE log
- entries are recorded in dmesg, for example:
+ == SRU Justification ==
+ With the 5.0 Disco kernel, no MCE log is recorded in dmesg when a 1-bit
+ ECC error is injected.
+ 
+ == Fix ==
+   * 09cbd219 (RAS/CEC: Increment cec_entered under the mutex lock)
+   * de0e0624 (RAS/CEC: Check count_threshold unconditionally)
+ 
+ Commit de0e0624 is the actual fix for this issue. 09cbd219 fixes a race
+ condition and is included so that de0e0624 applies as a clean
+ cherry-pick.
+ 
+ Both commits have already landed in newer kernels.
+ 
+ == Test ==
+ A test kernel can be found here:
+ https://people.canonical.com/~phlin/kernel/lp-1857413-ras-err-msg/
+ 
+ The bug reporter, fan jinke, verified that the patched kernel logs the
+ error correctly.
+ 
+ == Regression Potential ==
+ Low. Changes are limited to the RAS Correctable Errors Collector, and
+ the fix has been verified to work as expected.
+ 
+ 
+ == Original Bug Report ==
+ With the Linux kernel, when a 1-bit ECC error is injected, MCE log
+ entries are recorded in dmesg, for example:
  
  [ 1561.511210] mce: [Hardware Error]: Machine check events logged
  [ 1561.511221] [Hardware Error]: Corrected error, no action required.
  [ 1561.511311] [Hardware Error]: CPU:0 (18:0:2) MC16_STATUS[Over|CE|MiscV|-|AddrV|-|-|SyndV|-|CECC]: 0xdc2040000000011b
  [ 1561.511388] [Hardware Error]: Error Addr: 0x000000077cd66940
  [ 1561.511439] [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x000010ce0a400d01
  [ 1561.511499] [Hardware Error]: Unified Memory Controller Extended Error Code: 0
  [ 1561.511556] [Hardware Error]: Unified Memory Controller Error: DRAM ECC error.
  [ 1561.511646] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7fcd66 offset:0x940 grain:0 syndrome:0x10ce)
  [ 1561.511648] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
  
  *But there is no such log when using "Ubuntu 18.04.3 LTS"*
  
  The related upstream commit is de0e0624d86ff9fc512dedb297f8978698abf21a.
  
  After merging this commit, the Ubuntu kernel's dmesg records the MCE log as well.
- --- 
+ ---
  ProblemType: Bug
  AlsaDevices:
-  total 0
-  crw-rw----+ 1 root audio 116,  1 Dec 24 17:20 seq
-  crw-rw----+ 1 root audio 116, 33 Dec 24 17:20 timer
+  total 0
+  crw-rw----+ 1 root audio 116,  1 Dec 24 17:20 seq
+  crw-rw----+ 1 root audio 116, 33 Dec 24 17:20 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.10-0ubuntu27
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 19.04
  InstallationDate: Installed on 2019-12-24 (0 days ago)
  InstallationMedia: Ubuntu-Server 19.04 "Disco Dingo" - Release amd64 (20190416.1)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  MachineType: Sugon HygonH210
  Package: linux (not installed)
  PciMultimedia:
-  
+ 
  ProcEnviron:
-  TERM=linux
-  PATH=(custom, no user)
-  LANG=en_US.UTF-8
-  SHELL=/bin/bash
+  TERM=linux
+  PATH=(custom, no user)
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.0.0-13-generic root=UUID=43f8bc11-d850-4e79-9d14-1232ef50040f ro
  ProcVersionSignature: Ubuntu 5.0.0-13.14-generic 5.0.6
  RelatedPackageVersions:
-  linux-restricted-modules-5.0.0-13-generic N/A
-  linux-backports-modules-5.0.0-13-generic  N/A
-  linux-firmware                            1.178
+  linux-restricted-modules-5.0.0-13-generic N/A
+  linux-backports-modules-5.0.0-13-generic  N/A
+  linux-firmware                            1.178
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  Tags:  disco
  Uname: Linux 5.0.0-13-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
-  
+ 
  _MarkForUpload: True
  dmi.bios.date: 03/15/2019
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 210ER119
  dmi.board.asset.tag: Default string
  dmi.board.name: HygonH210
  dmi.board.vendor: Sugon
  dmi.board.version: Default string
  dmi.chassis.asset.tag: Default string
  dmi.chassis.type: 17
  dmi.chassis.vendor: Sugon
  dmi.chassis.version: Default string
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr210ER119:bd03/15/2019:svnSugon:pnHygonH210:pvrDefaultstring:rvnSugon:rnHygonH210:rvrDefaultstring:cvnSugon:ct17:cvrDefaultstring:
  dmi.product.family: Rack
  dmi.product.name: HygonH210
  dmi.product.sku: Default string
  dmi.product.version: Default string
  dmi.sys.vendor: Sugon

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1857413

Title:
  mce: ras:  When inject 1bit ecc error,  there is no mce log recorded
  in the dmesg

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Disco:
  In Progress

Bug description:
  == SRU Justification ==
  With the 5.0 Disco kernel, no MCE log is recorded in dmesg when a 1-bit
  ECC error is injected.

  == Fix ==
    * 09cbd219 (RAS/CEC: Increment cec_entered under the mutex lock)
    * de0e0624 (RAS/CEC: Check count_threshold unconditionally)

  Commit de0e0624 is the actual fix for this issue. 09cbd219 fixes a race
  condition and is included so that de0e0624 applies as a clean
  cherry-pick.

  Both commits have already landed in newer kernels.

  == Test ==
  A test kernel can be found here:
  https://people.canonical.com/~phlin/kernel/lp-1857413-ras-err-msg/

  The bug reporter, fan jinke, verified that the patched kernel logs the
  error correctly.

  == Regression Potential ==
  Low. Changes are limited to the RAS Correctable Errors Collector, and
  the fix has been verified to work as expected.

  
  == Original Bug Report ==
  With the Linux kernel, when a 1-bit ECC error is injected, MCE log
  entries are recorded in dmesg, for example:

  [ 1561.511210] mce: [Hardware Error]: Machine check events logged
  [ 1561.511221] [Hardware Error]: Corrected error, no action required.
  [ 1561.511311] [Hardware Error]: CPU:0 (18:0:2) MC16_STATUS[Over|CE|MiscV|-|AddrV|-|-|SyndV|-|CECC]: 0xdc2040000000011b
  [ 1561.511388] [Hardware Error]: Error Addr: 0x000000077cd66940
  [ 1561.511439] [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x000010ce0a400d01
  [ 1561.511499] [Hardware Error]: Unified Memory Controller Extended Error Code: 0
  [ 1561.511556] [Hardware Error]: Unified Memory Controller Error: DRAM ECC error.
  [ 1561.511646] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7fcd66 offset:0x940 grain:0 syndrome:0x10ce)
  [ 1561.511648] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

  *But there is no such log when using "Ubuntu 18.04.3 LTS"*

  The related upstream commit is
  de0e0624d86ff9fc512dedb297f8978698abf21a.

  After merging this commit, the Ubuntu kernel's dmesg records the MCE log
  as well.
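
  One rough way to check a kernel for the behaviour described above is to
  scan saved dmesg output for the same kinds of lines as in the excerpt. The
  sketch below is only an illustration: the function names are hypothetical,
  and the patterns are taken from the sample log in this report, so other
  hardware or kernel versions may print different text.

```python
import re

# Pattern for the summary line shown in the excerpt above.
# Assumption: the wording matches the sample log in this report.
MCE_LOGGED = re.compile(r"mce: \[Hardware Error\]: Machine check events logged")

# Pattern for the detail lines: any "[Hardware Error]" record or an
# EDAC corrected-error (CE) report such as "EDAC MC0: 1 CE on ...".
DETAIL = re.compile(r"\[Hardware Error\]|EDAC MC\d+: \d+ CE")


def mce_was_logged(dmesg_text):
    """Return True if the text contains the 'Machine check events logged' line."""
    return any(MCE_LOGGED.search(line) for line in dmesg_text.splitlines())


def find_mce_lines(dmesg_text):
    """Return the stripped lines that look like MCE or EDAC CE records."""
    return [line.strip() for line in dmesg_text.splitlines() if DETAIL.search(line)]
```

  To use it, save the output of dmesg to a file after injecting the error,
  read the file into a string, and pass it to mce_was_logged(). On an
  affected kernel it would be expected to return False.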
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw----+ 1 root audio 116,  1 Dec 24 17:20 seq
   crw-rw----+ 1 root audio 116, 33 Dec 24 17:20 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.10-0ubuntu27
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 19.04
  InstallationDate: Installed on 2019-12-24 (0 days ago)
  InstallationMedia: Ubuntu-Server 19.04 "Disco Dingo" - Release amd64 (20190416.1)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  MachineType: Sugon HygonH210
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=linux
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.0.0-13-generic root=UUID=43f8bc11-d850-4e79-9d14-1232ef50040f ro
  ProcVersionSignature: Ubuntu 5.0.0-13.14-generic 5.0.6
  RelatedPackageVersions:
   linux-restricted-modules-5.0.0-13-generic N/A
   linux-backports-modules-5.0.0-13-generic  N/A
   linux-firmware                            1.178
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  Tags:  disco
  Uname: Linux 5.0.0-13-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  _MarkForUpload: True
  dmi.bios.date: 03/15/2019
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 210ER119
  dmi.board.asset.tag: Default string
  dmi.board.name: HygonH210
  dmi.board.vendor: Sugon
  dmi.board.version: Default string
  dmi.chassis.asset.tag: Default string
  dmi.chassis.type: 17
  dmi.chassis.vendor: Sugon
  dmi.chassis.version: Default string
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr210ER119:bd03/15/2019:svnSugon:pnHygonH210:pvrDefaultstring:rvnSugon:rnHygonH210:rvrDefaultstring:cvnSugon:ct17:cvrDefaultstring:
  dmi.product.family: Rack
  dmi.product.name: HygonH210
  dmi.product.sku: Default string
  dmi.product.version: Default string
  dmi.sys.vendor: Sugon

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857413/+subscriptions
