Hi Tino,

Thanks for the report.

On Mon, Jul 23, 2018 at 02:29:32PM +0200, Tino Lehnig wrote:
> Hello,
>
> after enabling the writeback feature in zram, I encountered the kernel bug
> below with heavy swap utilization. There is one specific workload that
> triggers the bug reliably and that is running Windows in KVM while
> overcommitting memory. The Windows VMs would fill all allocated memory with
> zero pages while booting. A few seconds after the host hits zram swap, the
> console on the host is flooded with the bug message. A few more seconds
> later I also encountered filesystem errors on the host causing the root
> filesystem to be mounted read-only. The filesystem errors do not occur when
> leaving RAM available for the host OS by limiting physical memory of the
> QEMU processes via cgroups.
>
> I started three KVM instances with the following commands in my tests. Any
> Windows ISO or disk image can be used. Less instances and smaller allocated
> memory will also trigger the bug as long as swapping occurs. The type of
> writeback device does not seem to matter. I have tried a SATA SSD and an
> NVMe Optane drive so far. My test machine has 256 GB of RAM and one CPU. I
> saw the same behavior on another machine with two CPUs and 128 GB of RAM.
>
> The bug does not occur when using zram as swap without "backing_dev" being
> set, but I had even more severe problems when running the same test on
> Ubuntu Kernels 4.15 and 4.17. Regardless of the writeback feature being used
> or not, the host would eventually lock up entirely when swap is in use on
> zram. The lockups may not be related directly to zram though and were
> apparently fixed in 4.18. I had absolutely no problems on Ubuntu Kernel 4.13
> either, before the writeback feature was introduced.

v4.18 has not been released yet. Could you tell me which kernel tree and
which version you used?

I don't have enough time to dig into this right now. Sergey, I would really
appreciate it if you had time to look into it.
Anyway, I can try to look at it as soon as possible if Sergey is not
available.

No worries. Thanks.

>
> Thank you for your attention.
>
> --
>
> commands used:
>
> modprobe zram
> echo 1 > /sys/block/zram0/reset
> echo lz4 > /sys/block/zram0/comp_algorithm
> echo /dev/nvme0n1 > /sys/block/zram0/backing_dev
> echo 256G > /sys/block/zram0/disksize
> mkswap /dev/zram0
> swapon /dev/zram0
>
> kvm -nographic -smp 20 -m 131072 -cdrom winpe.iso
>
> --
>
> log message:
>
> BUG: Bad page state in process qemu-system-x86 pfn:3dfab21
> page:ffffdfb137eac840 count:0 mapcount:0 mapping:0000000000000000 index:0x1
> flags: 0x17fffc000000008(uptodate)
> raw: 017fffc000000008 dead000000000100 dead000000000200 0000000000000000
> raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
> page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
> bad because of flags: 0x8(uptodate)
> Modules linked in: lz4 lz4_compress zram zsmalloc intel_rapl sb_edac
> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel binfmt_misc
> pcbc aesni_intel aes_x86_64 crypto_simd cryptd iTCO_wdt glue_helper
> iTCO_vendor_support intel_cstate lpc_ich mei_me intel_uncore intel_rapl_perf
> pcspkr joydev sg mfd_core ioatdma mei wmi evdev ipmi_si ipmi_devintf
> ipmi_msghandler
> acpi_power_meter acpi_pad button ip_tables x_tables autofs4 ext4
> crc32c_generic crc16 mbcache jbd2 fscrypto hid_generic usbhid hid sd_mod
> xhci_pci ehci_pci ahci libahci xhci_hcd ehci_hcd libata igb i2c_algo_bit
> crc32c_intel scsi_mod i2c_i801
> dca usbcore
> CPU: 4 PID: 1039 Comm: qemu-system-x86 Tainted: G B 4.18.0-rc5+ #1
> Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> Call Trace:
> dump_stack+0x5c/0x7b
> bad_page+0xba/0x120
> get_page_from_freelist+0x1016/0x1250
> __alloc_pages_nodemask+0xfa/0x250
> alloc_pages_vma+0x7c/0x1c0
> do_swap_page+0x347/0x920
> ? __update_load_avg_se.isra.38+0x1eb/0x1f0
> ? cpumask_next_wrap+0x3d/0x60
> __handle_mm_fault+0x7b4/0x1110
> ? update_load_avg+0x5ea/0x720
> handle_mm_fault+0xfc/0x1f0
> __get_user_pages+0x12f/0x690
> get_user_pages_unlocked+0x148/0x1f0
> __gfn_to_pfn_memslot+0xff/0x3c0 [kvm]
> try_async_pf+0x87/0x230 [kvm]
> tdp_page_fault+0x132/0x290 [kvm]
> ? vmexit_fill_RSB+0xc/0x30 [kvm_intel]
> kvm_mmu_page_fault+0x74/0x570 [kvm]
> ? vmexit_fill_RSB+0xc/0x30 [kvm_intel]
> ? vmexit_fill_RSB+0x18/0x30 [kvm_intel]
> ? vmexit_fill_RSB+0xc/0x30 [kvm_intel]
> ? vmexit_fill_RSB+0x18/0x30 [kvm_intel]
> ? vmexit_fill_RSB+0xc/0x30 [kvm_intel]
> ? vmexit_fill_RSB+0x18/0x30 [kvm_intel]
> ? vmexit_fill_RSB+0xc/0x30 [kvm_intel]
> ? vmexit_fill_RSB+0x18/0x30 [kvm_intel]
> ? vmexit_fill_RSB+0xc/0x30 [kvm_intel]
> ? vmexit_fill_RSB+0x18/0x30 [kvm_intel]
> ? vmexit_fill_RSB+0xc/0x30 [kvm_intel]
> ? vmx_vcpu_run+0x375/0x620 [kvm_intel]
> kvm_arch_vcpu_ioctl_run+0x9b3/0x1990 [kvm]
> ? __update_load_avg_se.isra.38+0x1eb/0x1f0
> ? kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
> kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
> ? __switch_to+0x395/0x450
> ? __switch_to+0x395/0x450
> do_vfs_ioctl+0xa2/0x630
> ? __schedule+0x3fd/0x890
> ksys_ioctl+0x70/0x80
> ? exit_to_usermode_loop+0xca/0xf0
> __x64_sys_ioctl+0x16/0x20
> do_syscall_64+0x55/0x100
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7fb30361add7
> Code: 00 00 00 48 8b 05 c1 80 2b 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff
> ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff
> 73 01 c3 48 8b 0d 91 80 2b 00 f7 d8 64 89 01 48
> RSP: 002b:00007fb2e97f98b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fb30361add7
> RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
> RBP: 00005652b984e0f0 R08: 00005652b7d513d0 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007fb308c66000 R14: 0000000000000000 R15: 00005652b984e0f0
>
> --
>
> ver_linux: Debian 9.5 with Kernel 4.18.0-rc5+
>
> GNU C 6.3.0
> GNU Make 4.1
> Binutils 2.28
> Util-linux 2.29.2
> Mount 2.29.2
> Module-init-tools 23
> E2fsprogs 1.43.4
> Linux C Library 2.24
> Dynamic linker (ldd) 2.24
> Linux C++ Library 6.0.22
> Procps 3.3.12
> Kbd 2.0.3
> Console-tools 2.0.3
> Sh-utils 8.26
> Udev 232
>
> --
>
> cpuinfo:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 79
> model name : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> stepping : 1
> microcode : 0xb000021
> cpu MHz : 1200.632
> cache size : 25600 KB
> physical id : 0
> siblings : 20
> core id : 0
> cpu cores : 10
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 20
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat
> pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
> tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt
> tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
> cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin tpr_shadow vnmi
> flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms
> invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc
> cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
> bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
> bogomips : 4400.00
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
> --
> Kind regards,
>
> Tino Lehnig