Public bug reported:

Running any of the Unigine benchmarks (Heaven-4.0, Valley-1.0,
Superposition-1.0) on Ubuntu's 4.10.0-22-generic kernel causes the screen
to go black and the graphics system to crash.
The graphics card's fan stops spinning and sensors reports 511C, which is
clearly wrong.

I can still log in via SSH and attempt to stop X, but the application
(e.g. heaven) remains in a zombie state and the system is unusable:
I can't start X again. The graphics card itself ends up in a bad state:
if I press the reset button, the UEFI BIOS can no longer detect it, and
I have to power the whole system off and on again to make the card work.

Upgrading to mainline 4.11.3 avoids this problem: all 3 benchmarks run
fine, with no crashes.

I've attached two dmesg logs: one from a default boot, where IOMMU is on
and lots of AMD-Vi warnings are logged:
[  439.903842] ------------[ cut here ]------------
[  439.903848] WARNING: CPU: 5 PID: 0 at /build/linux-nOqmtv/linux-4.10.0/drivers/iommu/amd_iommu.c:1252 __domain_flush_pages+0x1f7/0x220
[  439.903848] Modules linked in: overlay ccm xt_CHECKSUM iptable_mangle 
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT 
nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter 
ip6_tables iptable_filter binfmt_misc nls_iso8859_1 eeepc_wmi asus_wmi 
sparse_keymap video edac_mce_amd edac_core kvm_amd kvm irqbypass 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel 
snd_hda_codec_realtek arc4 aes_x86_64 crypto_simd glue_helper cryptd 
snd_hda_codec_generic ath9k snd_hda_codec_hdmi ath9k_common ath9k_hw 
snd_hda_intel snd_hda_codec snd_hda_core ath snd_hwdep input_leds joydev 
mac80211 snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi cfg80211 snd_seq 
fam15h_power i2c_piix4 snd_seq_device
[  439.903873]  snd_timer snd k10temp mac_hid soundcore tpm_infineon shpchp 
tcp_bbr sch_fq cuse parport_pc ppdev lp parport ip_tables x_tables autofs4 
btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx 
xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid 
amdkfd amd_iommu_v2 amdgpu mxm_wmi i2c_algo_bit ttm drm_kms_helper syscopyarea 
sysfillrect r8169 sysimgblt fb_sys_fops mii drm ahci libahci fjes wmi
[  439.903893] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.10.0-22-generic #24-Ubuntu
[  439.903894] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2501 04/07/2014
[  439.903895] Call Trace:
[  439.903896]  <IRQ>
[  439.903899]  dump_stack+0x63/0x81
[  439.903900]  __warn+0xcb/0xf0
[  439.903901]  warn_slowpath_null+0x1d/0x20
[  439.903903]  __domain_flush_pages+0x1f7/0x220
[  439.903904]  __queue_flush+0x4b/0xd0
[  439.903905]  ? queue_flush_all+0x90/0x90
[  439.903907]  queue_flush_all+0x77/0x90
[  439.903908]  queue_flush_timeout+0x18/0x20
[  439.903910]  call_timer_fn+0x35/0x140
[  439.903911]  run_timer_softirq+0x215/0x4b0
[  439.903912]  ? ktime_get+0x41/0xb0
[  439.903914]  ? lapic_next_event+0x1d/0x30
[  439.903916]  ? clockevents_program_event+0x7f/0x120
[  439.903918]  __do_softirq+0x104/0x2af
[  439.903919]  irq_exit+0xb6/0xc0
[  439.903921]  smp_apic_timer_interrupt+0x3d/0x50
[  439.903922]  apic_timer_interrupt+0x89/0x90
[  439.903924] RIP: 0010:cpuidle_enter_state+0x122/0x2c0
[  439.903925] RSP: 0018:ffffb4e181a23e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[  439.903926] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 000000000000001f
[  439.903926] RDX: 0000006665f96c97 RSI: ffff9dbcded56a98 RDI: 0000000000000000
[  439.903927] RBP: ffffb4e181a23e98 R08: cccccccccccccccd R09: 0000000000000018
[  439.903927] R10: 0000000000000da8 R11: 0000000000003557 R12: ffff9dbcd036b600
[  439.903928] R13: ffffffffbaeeba38 R14: 0000000000000002 R15: ffffffffbaeeba20
[  439.903929]  </IRQ>
[  439.903930]  ? cpuidle_enter_state+0x110/0x2c0
[  439.903931]  cpuidle_enter+0x17/0x20
[  439.903933]  call_cpuidle+0x23/0x40
[  439.903934]  do_idle+0x189/0x200
[  439.903935]  cpu_startup_entry+0x71/0x80
[  439.903937]  start_secondary+0x154/0x190
[  439.903938]  start_cpu+0x14/0x14
[  439.903939] ---[ end trace 9edd64d3e01a6c8c ]---
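
Since the default-IOMMU log contains many of these warnings, it may help
to check whether they all come from the same place. The following is only
a rough sketch, not part of the original report: the function name
count_warning_sites is made up for illustration, and it assumes the
"WARNING: CPU: N PID: M at file:line symbol+offset" header format shown in
the trace above. It tolerates the header being hard-wrapped across lines.

```python
import re
from collections import Counter

# Header of a kernel warning as it appears in the trace above:
#   WARNING: CPU: 5 PID: 0 at <file>:<line> <symbol>+0x.../0x...
# \s+ also matches newlines, so headers wrapped by a mail client still match.
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at\s+"
    r"(?P<file>\S+):(?P<line>\d+)\s+"
    r"(?P<func>[\w.]+)\+"
)


def count_warning_sites(dmesg_text):
    """Count kernel warnings per (source file, line, function).

    count_warning_sites is a hypothetical helper name, not an existing tool.
    """
    counts = Counter()
    for match in WARNING_RE.finditer(dmesg_text):
        site = (match.group("file"), int(match.group("line")), match.group("func"))
        counts[site] += 1
    return counts
```

Feeding it the contents of the attached log and printing
counts.most_common() would show whether amd_iommu.c:1252 in
__domain_flush_pages is the only warning site or just the most frequent one.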

The other dmesg is from a boot with the iommu=soft option: nothing
interesting shows up in it, but the system still crashes.

Note: if I turn IOMMU off completely, USB devices stop working and I
cannot use my keyboard or mouse, so I cannot test that scenario.

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-generic 4.10.0.22.24
ProcVersionSignature: Ubuntu 4.10.0-22.24-generic 4.10.15
Uname: Linux 4.10.0-22-generic x86_64
ApportVersion: 2.20.4-0ubuntu4.1
Architecture: amd64
AudioDevicesInUse:
 USER        PID ACCESS COMMAND
 /dev/snd/controlC0:  edwin      2753 F.... pulseaudio
 /dev/snd/controlC2:  edwin      2753 F.... pulseaudio
 /dev/snd/controlC1:  edwin      2753 F.... pulseaudio
Date: Tue Jun  6 21:09:45 2017
HibernationDevice: RESUME=UUID=3401e45a-9619-4ae8-9e4d-6dc1e7982524
InstallationDate: Installed on 2017-03-25 (72 days ago)
InstallationMedia: Ubuntu-MATE 17.04 "Zesty Zapus" - Beta amd64 (20170321.1)
MachineType: To be filled by O.E.M. To be filled by O.E.M.
ProcEnviron:
 LANGUAGE=en_GB:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.10.0-22-generic 
root=/dev/mapper/ubuntu--mate--vg-root ro quiet splash vt.handoff=7
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not 
accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-22-generic N/A
 linux-backports-modules-4.10.0-22-generic  N/A
 linux-firmware                             1.164.1
RfKill:
 0: phy0: Wireless LAN
        Soft blocked: no
        Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/07/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2501
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: M5A99FX PRO R2.0
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: 
dmi:bvnAmericanMegatrendsInc.:bvr2501:bd04/07/2014:svnTobefilledbyO.E.M.:pnTobefilledbyO.E.M.:pvrTobefilledbyO.E.M.:rvnASUSTeKCOMPUTERINC.:rnM5A99FXPROR2.0:rvrRev1.xx:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: To be filled by O.E.M.
dmi.product.version: To be filled by O.E.M.
dmi.sys.vendor: To be filled by O.E.M.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Confirmed


** Tags: amd64 apport-bug package-from-proposed zesty

** Attachment added: "dmesg IOMMU default"
   https://bugs.launchpad.net/bugs/1696240/+attachment/4890310/+files/log

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1696240

Title:
  linux 4.10 and AMD Polaris11 card -> graphics crash

Status in linux package in Ubuntu:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1696240/+subscriptions
