Public bug reported:

Using 4.10.0-22-generic from Ubuntu, running any of the Unigine benchmarks (Heaven-4.0, Valley-1.0, Superposition-1.0) causes the screen to go black and the graphics system to crash. The graphics card's fan stops working and sensors reports a temperature of 511C, which is clearly wrong.
I can still log in via SSH and attempt to stop X, but the application (e.g. heaven) remains in a zombie state and the system is unusable; I can't start X again. In fact, the graphics card ends up in a pretty bad state: if I press the reset button, the UEFI BIOS can no longer detect it, and I have to power the whole system off and on again to make the card work.

Upgrading to mainline 4.11.3 avoids this problem: all 3 benchmarks run fine, with no crashes.

I've attached two dmesg logs. The first uses the default settings, where the IOMMU is on and lots of AMD-Vi warnings are logged:

[ 439.903842] ------------[ cut here ]------------
[ 439.903848] WARNING: CPU: 5 PID: 0 at /build/linux-nOqmtv/linux-4.10.0/drivers/iommu/amd_iommu.c:1252 __domain_flush_pages+0x1f7/0x220
[ 439.903848] Modules linked in: overlay ccm xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter binfmt_misc nls_iso8859_1 eeepc_wmi asus_wmi sparse_keymap video edac_mce_amd edac_core kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_codec_realtek arc4 aes_x86_64 crypto_simd glue_helper cryptd snd_hda_codec_generic ath9k snd_hda_codec_hdmi ath9k_common ath9k_hw snd_hda_intel snd_hda_codec snd_hda_core ath snd_hwdep input_leds joydev mac80211 snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi cfg80211 snd_seq fam15h_power i2c_piix4 snd_seq_device
[ 439.903873] snd_timer snd k10temp mac_hid soundcore tpm_infineon shpchp tcp_bbr sch_fq cuse parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid amdkfd amd_iommu_v2 amdgpu mxm_wmi i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect r8169 sysimgblt fb_sys_fops mii drm ahci libahci fjes wmi
[ 439.903893] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.10.0-22-generic #24-Ubuntu
[ 439.903894] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2501 04/07/2014
[ 439.903895] Call Trace:
[ 439.903896] <IRQ>
[ 439.903899] dump_stack+0x63/0x81
[ 439.903900] __warn+0xcb/0xf0
[ 439.903901] warn_slowpath_null+0x1d/0x20
[ 439.903903] __domain_flush_pages+0x1f7/0x220
[ 439.903904] __queue_flush+0x4b/0xd0
[ 439.903905] ? queue_flush_all+0x90/0x90
[ 439.903907] queue_flush_all+0x77/0x90
[ 439.903908] queue_flush_timeout+0x18/0x20
[ 439.903910] call_timer_fn+0x35/0x140
[ 439.903911] run_timer_softirq+0x215/0x4b0
[ 439.903912] ? ktime_get+0x41/0xb0
[ 439.903914] ? lapic_next_event+0x1d/0x30
[ 439.903916] ? clockevents_program_event+0x7f/0x120
[ 439.903918] __do_softirq+0x104/0x2af
[ 439.903919] irq_exit+0xb6/0xc0
[ 439.903921] smp_apic_timer_interrupt+0x3d/0x50
[ 439.903922] apic_timer_interrupt+0x89/0x90
[ 439.903924] RIP: 0010:cpuidle_enter_state+0x122/0x2c0
[ 439.903925] RSP: 0018:ffffb4e181a23e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 439.903926] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 000000000000001f
[ 439.903926] RDX: 0000006665f96c97 RSI: ffff9dbcded56a98 RDI: 0000000000000000
[ 439.903927] RBP: ffffb4e181a23e98 R08: cccccccccccccccd R09: 0000000000000018
[ 439.903927] R10: 0000000000000da8 R11: 0000000000003557 R12: ffff9dbcd036b600
[ 439.903928] R13: ffffffffbaeeba38 R14: 0000000000000002 R15: ffffffffbaeeba20
[ 439.903929] </IRQ>
[ 439.903930] ? cpuidle_enter_state+0x110/0x2c0
[ 439.903931] cpuidle_enter+0x17/0x20
[ 439.903933] call_cpuidle+0x23/0x40
[ 439.903934] do_idle+0x189/0x200
[ 439.903935] cpu_startup_entry+0x71/0x80
[ 439.903937] start_secondary+0x154/0x190
[ 439.903938] start_cpu+0x14/0x14
[ 439.903939] ---[ end trace 9edd64d3e01a6c8c ]---

The second uses the iommu=soft boot option; nothing interesting shows up in dmesg, but the system still crashes.

Note: if I turn the IOMMU off completely, USB devices stop working and I cannot use my keyboard/mouse, so I cannot test that scenario.

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-generic 4.10.0.22.24
ProcVersionSignature: Ubuntu 4.10.0-22.24-generic 4.10.15
Uname: Linux 4.10.0-22-generic x86_64
ApportVersion: 2.20.4-0ubuntu4.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: edwin 2753 F.... pulseaudio
 /dev/snd/controlC2: edwin 2753 F.... pulseaudio
 /dev/snd/controlC1: edwin 2753 F.... pulseaudio
Date: Tue Jun 6 21:09:45 2017
HibernationDevice: RESUME=UUID=3401e45a-9619-4ae8-9e4d-6dc1e7982524
InstallationDate: Installed on 2017-03-25 (72 days ago)
InstallationMedia: Ubuntu-MATE 17.04 "Zesty Zapus" - Beta amd64 (20170321.1)
MachineType: To be filled by O.E.M. To be filled by O.E.M.
ProcEnviron:
 LANGUAGE=en_GB:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.10.0-22-generic root=/dev/mapper/ubuntu--mate--vg-root ro quiet splash vt.handoff=7
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-22-generic N/A
 linux-backports-modules-4.10.0-22-generic N/A
 linux-firmware 1.164.1
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/07/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2501
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: M5A99FX PRO R2.0
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2501:bd04/07/2014:svnTobefilledbyO.E.M.:pnTobefilledbyO.E.M.:pvrTobefilledbyO.E.M.:rvnASUSTeKCOMPUTERINC.:rnM5A99FXPROR2.0:rvrRev1.xx:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: To be filled by O.E.M.
dmi.product.version: To be filled by O.E.M.
dmi.sys.vendor: To be filled by O.E.M.

** Affects: linux (Ubuntu)
   Importance: Undecided
   Status: Confirmed

** Tags: amd64 apport-bug package-from-proposed zesty

** Attachment added: "dmesg IOMMU default"
   https://bugs.launchpad.net/bugs/1696240/+attachment/4890310/+files/log

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1696240

Title:
  linux 4.10 and AMD Polaris11 card -> graphics crash

Status in linux package in Ubuntu:
  Confirmed