Hi everybody,
I'm really grateful to have stumbled upon this report, because it
describes exactly the issue I've been experiencing for a while now.
Basically, whenever I upload files via rsync/Firefox/Chromium, my entire
Linux system crashes within several seconds. I've experienced this
issue on Debian 10, but it also shows up on Arch Linux. In my case the
modem in use is a Huawei ME906s M.2 module (USB ID 12d1:15c1).
I've also tried debugging via kdump and got different kernel errors
across multiple crashes. I've been logging my debugging attempts in
this gist [0].
It doesn't matter if I'm uploading files from a ramfs (/tmp/) or my SATA
SSD.
I'm also using modemmanager and network-manager.
I switched ISPs and thought the issue was resolved, but I've just tried
uploading a file again and it still crashes my Linux 4.17.2-1-ARCH
kernel (so I guess this is a general Linux issue rather than a
Debian-specific one).
[0]: https://gist.github.com/norpol/d5b043d6082ace9fc232527d4835f045 or
attachment
# Debugging Linux Kernel Crash
## Error description:
Almost every time I upload a larger file (65MB in this case) via my browser (Firefox, build provided by mozilla.org as `.tar.gz`), my system crashes.
The issue happens especially when I'm doing several things at the same time (watching a video, reading email, and uploading a file). The system uses an SSD, but the bug also appears if the file is served from `/tmp`.
Component | Details
--- | ---
OS | Debian Testing (Release Buster / 10)
Kernel | `Linux 4.14.0-3-amd64 #1 SMP Debian 4.14.17-1 (2018-02-14) x86_64 GNU/Linux`
CPU | `Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz`
Machine | Thinkpad T560
EFI | `EFI v2.40 by Lenovo, efi: SMBIOS=0xb705e000 ACPI=0xb7ffe000 ACPI 2.0=0xb7ffe014 MPS=0xb7f48000 ESRT=0xb6aa8000`
Boot method | efistub
Storage | `Samsung SSD 840 EVO (256GB)`, LUKS (with LVM), rootfs=btrfs, homefs=ext4, cryptswap in LVM
The issue has persisted across multiple kernel upgrades, though. It also showed up back when Debian testing was called Stretch.
The issue mostly appears on file uploads via the LTE modem.
## Actions
- [ ] Intel uCode upgrade didn't help.
- [ ] Vendor BIOS/uEFI upgrade didn't help.
- [ ] Disabling apparmor didn't help.
- [ ] Disabling/changing the IO scheduler didn't help.
- [ ] Switching the operating system from Debian to Arch Linux didn't help.
- [ ] Disabling everything power-saving related in the BIOS didn't help.
  - See [Skylake crash bug arstechnica (2017)](https://arstechnica.com/information-technology/2017/06/skylake-kaby-lake-chips-have-a-crash-bug-with-hyperthreading-enabled/)
  - Basically setting c-state to 1 [might also work](https://askubuntu.com/questions/749349/how-to-set-intel-idle-max-cstate-1)
Installed and set up `kdump-tools` (I had to add `nmi_watchdog=1` to the kernel command line, see `/proc/cmdline`; otherwise kdump failed to load the kdump kernel on crash).
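A minimal sketch (Python, not part of the original debugging notes) for checking that a kernel command line carries the two parameters this setup relied on, `crashkernel=` and `nmi_watchdog=1`. The helper names `parse_cmdline` and `missing_kdump_params` are hypothetical, and the sample string is the command line from the boot log under "Other".

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so "resume=UUID=..." keeps its full value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_kdump_params(cmdline: str) -> list:
    """List the parameters from this report that are absent from cmdline."""
    params = parse_cmdline(cmdline)
    missing = []
    if not params.get("crashkernel"):
        missing.append("crashkernel=<size>")
    if params.get("nmi_watchdog") != "1":
        missing.append("nmi_watchdog=1")
    return missing


# Command line copied from the boot log in this report.
SAMPLE = (
    r"initrd=\initrd.img root=/dev/mapper/system-root "
    r"resume=UUID=d9506118-b9e2-49db-9385-f731ef1c8615 "
    r"ro quiet splash crashkernel=384M nmi_watchdog=1"
)

print(missing_kdump_params(SAMPLE))  # prints []
```

On a live system, the contents of `/proc/cmdline` can be passed in place of `SAMPLE`.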
## Other
Early bootup BIOS warning:
```
[ +0.000000] Kernel command line: initrd=\initrd.img root=/dev/mapper/system-root resume=UUID=d9506118-b9e2-49db-9385-f731ef1c8615 ro quiet splash crashkernel=384M nmi_watchdog=1
[ +0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ +0.000000] Calgary: detecting Calgary via BIOS EBDA area
[ +0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
```
### kdump-tools dmesg error trace
Note: I have logs from multiple crashes; this is the only one containing a `[ cut here ]` section.
```
------------[ cut here ]------------
WARNING: CPU: 2 PID: 2206 at /build/linux-K4nuoe/linux-4.14.17/mm/vmacache.c:102 vmacache_find+0x96/0xa0
Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter cdc_mbim cdc_wdm cdc_ncm snd_hrtimer snd_seq snd_seq_device cpufreq_userspace cpufreq_powersave cpufreq_conservative wireguard(O) ip6_udp_tunnel udp_tunnel binfmt_misc nls_ascii nls_cp437 vfat fat ext4 mbcache jbd2 fscrypto ecb arc4 iwlmvm snd_soc_skl snd_hda_codec_hdmi snd_soc_skl_ipc intel_rapl snd_soc_sst_ipc btusb x86_pkg_temp_thermal snd_soc_sst_dsp intel_powerclamp btrtl mac80211 btbcm snd_hda_ext_core snd_hda_codec_realtek coretemp btintel snd_soc_sst_match efi_pstore snd_hda_codec_generic kvm_intel bluetooth snd_soc_core snd_compress kvm snd_hda_intel irqbypass uvcvideo videobuf2_vmalloc intel_cstate videobuf2_memops intel_uncore videobuf2_v4l2 iwlwifi intel_rapl_perf snd_hda_codec serio_raw wmi_bmof videobuf2_core snd_hda_core efivars rtsx_pci_ms drbg cfg80211 memstick ansi_cprng snd_hwdep cdc_ether option videodev snd_pcm usb_wwan thinkpad_acpi usbnet iTCO_wdt usbserial mei_me snd_timer ecdh_generic nvram mii iTCO_vendor_support media sg crc16 joydev shpchp mei snd soundcore intel_pch_thermal rfkill battery ac evdev nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_counter nft_ct nf_conntrack nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 nf_tables nfnetlink sunrpc efivarfs ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress xxhash algif_skcipher af_alg dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc rtsx_pci_sdmmc mmc_core aesni_intel ahci i915 libahci i2c_algo_bit aes_x86_64 e1000e rtsx_pci crypto_simd 
glue_helper xhci_pci ptp cryptd pps_core drm_kms_helper psmouse i2c_i801 mfd_core xhci_hcd libata usbcore scsi_mod drm usb_common thermal wmi video button
CPU: 2 PID: 2206 Comm: firefox Tainted: G O 4.14.0-3-amd64 #1 Debian 4.14.17-1
Hardware name: LENOVO [REMOVED], BIOS N1KET21W (1.08 ) 04/20/2016
task: ffff92143f2de000 task.stack: ffffaf9d81dfc000
RIP: 0010:vmacache_find+0x96/0xa0
RSP: 0000:ffffaf9d81dffec0 EFLAGS: 00010207
RAX: ffff921404f23410 RBX: 00007fe232700008 RCX: 0000000000000002
RDX: 0000000000000002 RSI: 00007fe232700008 RDI: ffff92146e447140
RBP: ffff92146e447140 R08: 00007fe250400018 R09: 00000000ffffffff
R10: 00000000ffffffff R11: 00007fe21b600000 R12: ffffaf9d81dfff58
R13: ffff92146e447140 R14: 0000000000000054 R15: ffff92143f2de000
FS: 00007fe251a64740(0000) GS:ffff921481500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe232700008 CR3: 00000001ff380002 CR4: 00000000003606e0
Call Trace:
find_vma+0x16/0x70
__do_page_fault+0x172/0x4e0
? SyS_read+0x76/0xc0
? page_fault+0x36/0x60
page_fault+0x4c/0x60
RIP: 0033:0x41a0dc
RSP: 002b:00007ffc405b1730 EFLAGS: 00010206
Code: 01 00 48 8b 84 c8 80 04 00 00 48 85 c0 74 11 48 39 78 40 75 16 48 39 30 77 06 48 39 70 08 77 8e 83 c2 01 83 fa 04 75 ce 31 c0 c3 <0f> ff 31 c0 c3 f3 c3 90 90 90 0f 1f 44 00 00 41 54 55 ba ff ff
---[ end trace 8a3827954d6da8d6 ]---
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: (null)
PGD 800000022ac5b067 P4D 800000022ac5b067 PUD 231192067 PMD 0
SMP PTI
Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter cdc_mbim cdc_wdm cdc_ncm snd_hrtimer snd_seq snd_seq_device cpufreq_userspace cpufreq_powersave cpufreq_conservative wireguard(O) ip6_udp_tunnel udp_tunnel binfmt_misc nls_ascii nls_cp437 vfat fat ext4 mbcache jbd2 fscrypto ecb arc4 iwlmvm snd_soc_skl snd_hda_codec_hdmi snd_soc_skl_ipc intel_rapl snd_soc_sst_ipc btusb x86_pkg_temp_thermal snd_soc_sst_dsp intel_powerclamp btrtl mac80211 btbcm snd_hda_ext_core snd_hda_codec_realtek coretemp btintel snd_soc_sst_match efi_pstore snd_hda_codec_generic kvm_intel bluetooth snd_soc_core snd_compress
kvm snd_hda_intel irqbypass uvcvideo videobuf2_vmalloc intel_cstate videobuf2_memops intel_uncore videobuf2_v4l2 iwlwifi intel_rapl_perf snd_hda_codec serio_raw wmi_bmof videobuf2_core snd_hda_core efivars rtsx_pci_ms drbg cfg80211 memstick ansi_cprng snd_hwdep cdc_ether option videodev snd_pcm usb_wwan thinkpad_acpi usbnet iTCO_wdt usbserial mei_me snd_timer ecdh_generic nvram mii iTCO_vendor_support media sg crc16 joydev shpchp mei snd soundcore intel_pch_thermal rfkill battery ac evdev nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_counter nft_ct nf_conntrack nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 nf_tables nfnetlink sunrpc efivarfs ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress xxhash
algif_skcipher af_alg dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc rtsx_pci_sdmmc mmc_core aesni_intel ahci i915 libahci i2c_algo_bit aes_x86_64 e1000e rtsx_pci crypto_simd glue_helper xhci_pci ptp cryptd pps_core drm_kms_helper psmouse i2c_i801 mfd_core xhci_hcd libata usbcore scsi_mod drm usb_common thermal wmi video button
CPU: 1 PID: 3171 Comm: Chrome_~dThread Tainted: G W O 4.14.0-3-amd64 #1 Debian 4.14.17-1
Hardware name: LENOVO [REMOVED], BIOS N1KET21W (1.08 ) 04/20/2016
task: ffff92141d6d2040 task.stack: ffffaf9d882cc000
RIP: 0010: (null)
RSP: 0000:ffff921481483f38 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff92148149cd00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff92148149cd80 RDI: ffffaf9d83b57d08
RBP: ffffaf9d83b57d08 R08: 00000000003d0900 R09: 00000062126e1800
R10: 0000000000000000 R11: 0000000000000001 R12: ffff92148149cd80
R13: 0000000000000000 R14: 0000000000000001 R15: ffff92148149ce28
FS: 00007f3646728700(0000) GS:ffff921481480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000022e71c004 CR4: 00000000003606e0
Call Trace:
<IRQ>
? __hrtimer_run_queues+0xde/0x230
? hrtimer_interrupt+0xa6/0x1f0
? smp_apic_timer_interrupt+0x66/0x120
? apic_timer_interrupt+0x98/0xa0
</IRQ>
? __clear_user+0xe/0x50
? copy_fpstate_to_sigframe+0x8e/0x1e0
? get_sigframe.isra.13.constprop.14+0x19f/0x1c0
? do_signal+0x1ba/0x6b0
? force_sig_info_fault+0x97/0xf0
? is_prefetch.isra.24+0x91/0x1a0
? __bad_area_nosemaphore+0x9b/0x1b0
? __do_page_fault+0x37b/0x4e0
? page_fault+0x36/0x60
? exit_to_usermode_loop+0x6e/0xc0
? prepare_exit_to_usermode+0x5e/0x60
? retint_user+0x8/0x8
Code: Bad RIP value.
RIP: (null) RSP: ffff921481483f38
CR2: 0000000000000000
```
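The two header lines in the trace carry different `Tainted:` fields (`G O` for the warning, `G W O` for the NULL pointer dereference). A small sketch, assuming the kernel's standard single-letter taint flags, for reading them; the table covers only the three letters that occur in this trace, and `decode_taint` is a hypothetical helper name.

```python
# Meanings for the taint letters seen in this trace only (not the full list).
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded (P would mean proprietary)",
    "W": "a kernel warning was issued earlier",
    "O": "an out-of-tree module is loaded",
}


def decode_taint(line: str) -> list:
    """Describe the taint letters that follow 'Tainted:' in an oops header line."""
    _, sep, rest = line.partition("Tainted:")
    if not sep:
        return []  # e.g. "Not tainted"
    flags = []
    for token in rest.split():
        if not (token.isalpha() and token.isupper()):
            break  # first non-flag token is the kernel release string
        flags.extend(token)
    return [f"{f}: {TAINT_FLAGS.get(f, 'not covered in this sketch')}" for f in flags]


warning_line = "CPU: 2 PID: 2206 Comm: firefox Tainted: G O 4.14.0-3-amd64 #1 Debian 4.14.17-1"
oops_line = "CPU: 1 PID: 3171 Comm: Chrome_~dThread Tainted: G W O 4.14.0-3-amd64 #1 Debian 4.14.17-1"

for line in (warning_line, oops_line):
    print(decode_taint(line))
```

The `O` flag is consistent with `wireguard(O)` in the module list. The `W` that appears only in the second header was presumably set by the preceding `vmacache_find` warning.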
`echo "Code: [...]" | linux.git/scripts/decodecode` output:
```
Code: 01 00 48 8b 84 c8 80 04 00 00 48 85 c0 74 11 48 39 78 40 75 16 48 39 30 77 06 48 39 70 08 77 8e 83 c2 01 83 fa 04 75 ce 31 c0 c3 <0f> ff 31 c0 c3 f3 c3 90 90 90 0f 1f 44 00 00 41 54 55 ba ff ff
All code
========
0: 01 00 add %eax,(%rax)
2: 48 8b 84 c8 80 04 00 mov 0x480(%rax,%rcx,8),%rax
9: 00
a: 48 85 c0 test %rax,%rax
d: 74 11 je 0x20
f: 48 39 78 40 cmp %rdi,0x40(%rax)
13: 75 16 jne 0x2b
15: 48 39 30 cmp %rsi,(%rax)
18: 77 06 ja 0x20
1a: 48 39 70 08 cmp %rsi,0x8(%rax)
1e: 77 8e ja 0xffffffffffffffae
20: 83 c2 01 add $0x1,%edx
23: 83 fa 04 cmp $0x4,%edx
26: 75 ce jne 0xfffffffffffffff6
28: 31 c0 xor %eax,%eax
2a: c3 retq
2b:* 0f ff 31 ud0 (%rcx),%esi <-- trapping instruction
2e: c0 c3 f3 rol $0xf3,%bl
31: c3 retq
32: 90 nop
33: 90 nop
34: 90 nop
35: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
3a: 41 54 push %r12
3c: 55 push %rbp
3d: ba .byte 0xba
3e: ff (bad)
3f: ff .byte 0xff
Code starting with the faulting instruction
===========================================
0: 0f ff 31 ud0 (%rcx),%esi
3: c0 c3 f3 rol $0xf3,%bl
6: c3 retq
7: 90 nop
8: 90 nop
9: 90 nop
a: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
f: 41 54 push %r12
11: 55 push %rbp
12: ba .byte 0xba
13: ff (bad)
14: ff .byte 0xff
```
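The `<0f>` marker in the `Code:` line is what `decodecode` uses to find the trapping instruction. A minimal Python sketch of that step, assuming the usual format of space-separated hex bytes with the byte at the faulting RIP wrapped in `<..>`; `parse_code_line` is a hypothetical helper, not part of the kernel scripts.

```python
def parse_code_line(line: str):
    """Return (code_bytes, trap_offset) from an oops 'Code:' line.

    trap_offset is the index of the byte wrapped in <..>, or None if no
    byte is marked. Raises ValueError on anything that is not a hex byte.
    """
    tokens = line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    code = bytearray()
    trap_offset = None
    for token in tokens:
        if token.startswith("<") and token.endswith(">"):
            trap_offset = len(code)
            token = token[1:-1]
        if len(token) != 2:
            raise ValueError(f"not a hex byte: {token!r}")
        code.append(int(token, 16))
    return bytes(code), trap_offset


# Code: line copied from the warning in the trace.
CODE_LINE = (
    "Code: 01 00 48 8b 84 c8 80 04 00 00 48 85 c0 74 11 48 39 78 40 75 16 "
    "48 39 30 77 06 48 39 70 08 77 8e 83 c2 01 83 fa 04 75 ce 31 c0 c3 "
    "<0f> ff 31 c0 c3 f3 c3 90 90 90 0f 1f 44 00 00 41 54 55 ba ff ff"
)

code, offset = parse_code_line(CODE_LINE)
print(hex(offset), code[offset:offset + 2].hex())  # prints 0x2b 0fff
```

The offset `0x2b` matches the `2b:*` line in the `decodecode` output. The bytes `0f ff` at that position look like the two-byte `ud0` trap this kernel appears to use for `WARN()`, which would fit the `WARNING: ... vmacache.c:102` line; if so, the `31 c0` and `c3` that follow are likely a separate `xor %eax,%eax` and `retq` rather than operands of `ud0`.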