Re: [PATCH v3] sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2
I did a rudimentary benchmark on the same 8-node Sun Fire X4600-M2, on top of today's 5.11.0-rc7-2-ge0756cfc7d7c.

The test: building a clean kernel with make -j64 after make clean and drop_caches.

While running the clean kernel (3 tries):
real 2m38.574s
user 46m18.387s
sys 6m8.724s

real 2m37.647s
user 46m34.171s
sys 6m11.993s

real 2m37.832s
user 46m34.910s
sys 6m12.013s

While running the patched kernel:
real 2m40.072s
user 46m22.610s
sys 6m6.658s

For real time, it seems to be 1.5s-2s slower out of 160s (noise?). User and system time are slightly less, on the other hand, so it seems good to me.

-- Meelis Roos
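The comparison above can be checked with a little arithmetic. The sketch below converts the quoted timings to seconds and compares the mean of the three clean runs with the single patched run. The helper name `to_seconds` is an illustration, not something from the thread, and with only one patched sample the result cannot separate a real slowdown from noise.

```python
import re
from statistics import mean


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '2m38.574s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the message above.
clean = {
    "real": ["2m38.574s", "2m37.647s", "2m37.832s"],
    "user": ["46m18.387s", "46m34.171s", "46m34.910s"],
    "sys": ["6m8.724s", "6m11.993s", "6m12.013s"],
}
patched = {"real": "2m40.072s", "user": "46m22.610s", "sys": "6m6.658s"}

deltas = {}
for kind, stamps in clean.items():
    baseline = mean(to_seconds(s) for s in stamps)
    deltas[kind] = to_seconds(patched[kind]) - baseline
    print(f"{kind}: clean mean {baseline:.3f}s, patched delta {deltas[kind]:+.3f}s")
```

Against the clean means, the patched run is about 2.05s slower in real time (roughly 1.3%), while user and sys time are lower by about 6.5s and 4.3s.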
Re: [PATCH v2] sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2
03.02.21 13:12 Barry Song wrote:
kernel/sched/topology.c | 85 +
1 file changed, 53 insertions(+), 32 deletions(-)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5d3675c7a76b..964ed89001fe 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c

This one still works on the Sun X4600-M2, on top of v5.11-rc6-55-g3aaf0a27ffc2.

Performance-wise - is there some simple benchmark to run to measure the impact? Compared to what - 5.10.0 or the kernel with the warning? Drop caches and time the build of the Linux kernel with make -j64?

-- Meelis Roos
Re: [RFC PATCH v2] sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2
Tested by the below topology:
qemu-system-aarch64 -M virt -nographic \

Also works on the initial 8-node Sun Fire X4600-M2. No strange messages in dmesg and no problems on kernel build with make -j64.

Tested-by: Meelis Roos
Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
Could you paste the output of the below?

$ cat /sys/devices/system/node/node*/distance
10 12 12 14 14 14 14 16
12 10 14 12 14 14 12 14
12 14 10 14 12 12 14 14
14 12 14 10 12 12 14 14
14 14 12 12 10 14 12 14
14 14 12 12 14 10 14 12
14 12 14 14 12 14 10 12
16 14 14 14 14 12 12 10

Additionally, booting your system with CONFIG_SCHED_DEBUG=y and appending 'sched_debug' to your cmdline should yield some extra data.

[0.00] Linux version 5.11.0-rc4-00015-g45dfb8a5659a (mroos@x4600m2) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1) #55 SMP Thu Jan 21 19:23:10 EET 2021
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc4-00015-g45dfb8a5659a root=/dev/sda1 ro quiet
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00099bff] usable
[0.00] BIOS-e820: [mem 0x00099c00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e6000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xd7f9] usable
[0.00] BIOS-e820: [mem 0xd7fae000-0xd7fa] type 9
[0.00] BIOS-e820: [mem 0xd7fb-0xd7fbdfff] ACPI data
[0.00] BIOS-e820: [mem 0xd7fbe000-0xd7fe] ACPI NVS
[0.00] BIOS-e820: [mem 0xd7ff-0xd7ff] reserved
[0.00] BIOS-e820: [mem 0xdc00-0xefff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff70-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x002027ff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.5 present.
[0.00] DMI: Sun Microsystems Sun Fire X4600 M2/Sun Fire X4600 M2, BIOS 0ABIT132 12/03/2009
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 2293.794 MHz processor
[0.005734] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.005740] e820: remove [mem 0x000a-0x000f] usable
[0.011432] AGP: No AGP bridge found
[0.011578] last_pfn = 0x2028000 max_arch_pfn = 0x4
[0.011601] MTRR default type: uncachable
[0.011604] MTRR fixed ranges enabled:
[0.011607] 0-9 write-back
[0.011610] A-E uncachable
[0.011612] F-F write-protect
[0.011614] MTRR variable ranges enabled:
[0.011616] 0 base mask 8000 write-back
[0.011620] 1 base 8000 mask C000 write-back
[0.011623] 2 base C000 mask F000 write-back
[0.011626] 3 base D000 mask F800 write-back
[0.011629] 4 disabled
[0.011630] 5 disabled
[0.011632] 6 disabled
[0.011633] 7 disabled
[0.011634] TOM2: 00202800 aka 131712M
[0.012697] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[0.013048] e820: update [mem 0xd800-0x] usable ==> reserved
[0.013083] last_pfn = 0xd7fa0 max_arch_pfn = 0x4
[0.018157] found SMP MP-table at [mem 0x000ff780-0x000ff78f]
[0.018215] Using GB pages for direct mapping
[0.018603] ACPI: Early table checksum verification disabled
[0.018613] ACPI: RSDP 0x000F9EE0 24 (v02 SUN )
[0.018623] ACPI: XSDT 0xD7FB0100 9C (v01 SUNX4600 M2 0132 MSFT 0097)
[0.018635] ACPI: FACP 0xD7FB0290 F4 (v03 SUNX4600 M2 0132 MSFT 0097)
[0.018645] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20201113/tbfadt-564)
[0.018652] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe1Block: 128/64 (20201113/tbfadt-564)
[0.018658] ACPI: DSDT 0xD7FB0710 007DF7 (v01 SUNX4600 M2 0132 INTL 20051117)
[0.018664] ACPI: FACS 0xD7FBE000 40
[0.018667] ACPI: FACS 0xD7FBE000 40
[0.018671] ACPI: APIC 0xD7FB0390 000170 (v01 SUNX4600 M2 0132 MSFT 0097)
[0.018676] ACPI: SPCR 0xD7FB0500 50 (v01 SUNX4600 M2 0132 MSFT 0097)
[0.018681] ACPI: MCFG 0xD7FB0550 3C (v01 SUNX4600 M2 0132 MSFT 0097)
[0.018686] ACPI: SLIT 0xD7FB064C 6C (v01 SUNX4600 M2 0132 MSFT 0097)
[0.018691] ACPI: SPMI 0xD7FB06C0 41 (v05 SUNOEMSPMI 0132 MSFT 0097)
[0.018695] ACPI: OEMB 0xD7FBE040 63 (v01 SUNX4600 M2 0132 MSFT 0097)
[0.018700] ACPI: SRAT 0xD7FB8510 0003C0 (v01 AMDFAM_F_10 0002 AMD 0001)
[0.018705] ACPI: HPET 0xD7FB88D0 38 (v01 SUNX4600 M2 0132 MSFT 0097)
[0.018709] ACPI: IPET 0xD7FB8910 38 (v01 SUNX4600
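The node distance table pasted earlier in this message is what makes this machine a "NUMA diameter > 2" case. As a rough sketch, the 64 values can be arranged as an 8x8 matrix, checked for symmetry, and used to estimate the hop diameter. The assumption that the smallest remote distance (12) marks a direct link is mine, not something stated in the thread, and `hop_diameter` is an illustrative helper name.

```python
from collections import deque

# Values from /sys/devices/system/node/node*/distance, one row per node.
DISTANCES = """\
10 12 12 14 14 14 14 16
12 10 14 12 14 14 12 14
12 14 10 14 12 12 14 14
14 12 14 10 12 12 14 14
14 14 12 12 10 14 12 14
14 14 12 12 14 10 14 12
14 12 14 14 12 14 10 12
16 14 14 14 14 12 12 10
"""


def parse(text):
    """Turn the whitespace-separated table into a list of integer rows."""
    return [[int(x) for x in line.split()] for line in text.strip().splitlines()]


def hop_diameter(matrix):
    """Longest shortest path, treating the smallest remote distance as one hop."""
    n = len(matrix)
    local = matrix[0][0]
    nearest = min(d for row in matrix for d in row if d > local)
    adjacent = [[j for j in range(n) if matrix[i][j] == nearest] for i in range(n)]
    worst = 0
    for start in range(n):
        hops = {start: 0}
        queue = deque([start])
        while queue:  # plain breadth-first search
            node = queue.popleft()
            for peer in adjacent[node]:
                if peer not in hops:
                    hops[peer] = hops[node] + 1
                    queue.append(peer)
        worst = max(worst, max(hops.values()))
    return worst


matrix = parse(DISTANCES)
symmetric = all(matrix[i][j] == matrix[j][i] for i in range(8) for j in range(8))
levels = sorted({d for row in matrix for d in row})
diameter = hop_diameter(matrix)
print("symmetric:", symmetric)
print("distance levels:", levels)
print("hop diameter:", diameter)
```

Under that assumption the table has three remote distance levels (12, 14, 16) and a hop diameter of 3, with nodes 0 and 7 being the farthest pair.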
VGA text console corruption in 5.9.0 and 5.10-rc4
5.9 introduces VGA console corruption in one of my test PCs (I do not have a VGA console on most). The PC has an Intel D2550MUD2 board with an Atom D2550.

The symptoms include:
* missing screen updates on VT switch
* fragments of other VTs appear during scrolling (kernel compilation output on visible VT1 scrolls up, sometimes it includes 5 or so lines from a curses application on VT2 or its scroll-back history)
* missing up-scrolling of lines/fragments in curses applications. Visible in make menuconfig and mc and maybe more (these are the ones I can describe most clearly).

5.9.0 with fbcon (as packaged by Debian) does not show these symptoms. 5.9.0 and today's 5.10-rc4+git exhibit this behavior if I let them use the VGA text console.

$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Atom Processor D2xxx/N2xxx DRAM Controller [8086:0bf3] (rev 04)
00:02.0 VGA compatible controller [0300]: Intel Corporation Atom Processor D2xxx/N2xxx Integrated Graphics Controller [8086:0be2] (rev 0b)
00:1b.0 Audio device [0403]: Intel Corporation NM10/ICH7 Family High Definition Audio Controller [8086:27d8] (rev 02)
00:1c.0 PCI bridge [0604]: Intel Corporation NM10/ICH7 Family PCI Express Port 1 [8086:27d0] (rev 02)
00:1d.0 USB controller [0c03]: Intel Corporation NM10/ICH7 Family USB UHCI Controller #1 [8086:27c8] (rev 02)
00:1d.1 USB controller [0c03]: Intel Corporation NM10/ICH7 Family USB UHCI Controller #2 [8086:27c9] (rev 02)
00:1d.2 USB controller [0c03]: Intel Corporation NM10/ICH7 Family USB UHCI Controller #3 [8086:27ca] (rev 02)
00:1d.3 USB controller [0c03]: Intel Corporation NM10/ICH7 Family USB UHCI Controller #4 [8086:27cb] (rev 02)
00:1d.7 USB controller [0c03]: Intel Corporation NM10/ICH7 Family USB2 EHCI Controller [8086:27cc] (rev 02)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev e2)
00:1f.0 ISA bridge [0601]: Intel Corporation NM10 Family LPC Controller [8086:27bc] (rev 02)
00:1f.2 SATA controller [0106]: Intel Corporation NM10/ICH7 Family SATA Controller [AHCI mode] [8086:27c1] (rev 02)
00:1f.3 SMBus [0c05]: Intel Corporation NM10/ICH7 Family SMBus Controller [8086:27da] (rev 02)
01:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network Connection [8086:10d3]

Nothing interesting in dmesg, selected lines:
[0.00] Linux version 5.10.0-rc4-00067-g9c87c9f41245 (mroos@d2550) (gcc (Debian 10.2.0-15) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #8 SMP Tue Nov 17 14:39:11 EET 2020
[0.00] DMI: /D2550MUD2, BIOS MUCDT10N.86A.0075.2013.0427.1548 04/27/2013
[0.001878] MTRR default type: uncachable
[0.001881] MTRR fixed ranges enabled:
[0.001885] 0-9 write-back
[0.001888] A-B uncachable
[0.001891] C-D write-protect
[0.001893] E-F uncachable
[0.001896] MTRR variable ranges enabled:
[0.001900] 0 base 0 mask F8000 write-back
[0.001903] 1 base 07F00 mask FFF00 uncachable
[0.001907] 2 base 0FFE0 mask FFFE0 write-protect
[0.001909] 3 disabled
[0.001911] 4 disabled
[0.001913] 5 disabled
[0.001915] 6 disabled
[0.002024] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[0.028625] ACPI: HPET id: 0x8086a201 base: 0xfed0
[0.028636] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[0.095056] Console: colour VGA+ 80x25
[0.099221] printk: console [tty0] enabled
[0.226357] smpboot: CPU0: Intel(R) Atom(TM) CPU D2550 @ 1.86GHz (family: 0x6, model: 0x36, stepping: 0x1)
[0.095056] Console: colour VGA+ 80x25
[0.227697] smp: Bringing up secondary CPUs ...
[0.227697] x86: Booting SMP configuration:
[0.227697] node #0, CPUs: #1
[0.010909] Disabled fast string operations
[0.228016] #2
[0.010909] Disabled fast string operations
[0.231720] #3
[0.010909] Disabled fast string operations
[0.233935] smp: Brought up 1 node, 4 CPUs
[0.233935] smpboot: Max logical packages: 1
[0.233935] smpboot: Total of 4 processors activated (14934.80 BogoMIPS)
[0.238692] PCI: MMCONFIG for domain [bus 00-3f] at [mem 0xe000-0xe3ff] (base 0xe000)
[0.238756] PCI: MMCONFIG at [mem 0xe000-0xe3ff] reserved in E820
[0.238824] PCI: Using configuration type 1 for base access
[0.243986] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages

Machine-specific config, from compiling current git:
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.10.0-rc4 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Debian 10.2.0-15) 10.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100200
CONFIG_LD_VERSION=23501
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFI
page granularity memory corruption on alpha (5.8, 5.9)
I have an AlphaServer DS20E that ran 5.6.0 fine. 5.8.0 had a problem during the rc's - ext4 mounting failed due to corrupt data (looked like memory corruption but was very deterministic). The 5.8.0 release booted fine once, but when 5.9-git failed again, I recompiled 5.8.0 and that failed too. The next 5.9-git kernels booted but corrupted files - I updated the debian-ports distro and it broke a files list file for some package or another (garbage at end of file).

Tried 5.9.0-00282-g1e6d1d96461e yesterday and that fails too: I tried git pull and building the kernel with the newest gcc, and drivers/mfd/Makefile had 8192 bytes of correct contents followed by structured binary garbage.

I also checked the debian-ports packaged 5.8.0-3-alpha-generic kernel and it seemed to work without corruption - perhaps something is wrong with my configuration (but it worked before).

Sample corruption from the Makefile: od -A d -c shows
...
0008160 - c o r e . o l m 3 5 3 3 - c
0008176 t r l b a n k . o \n o b j - $ (
0008192 \0 \0 \0 \0 \0 \0 \0 \0 002 \0 \0 \0 \0 \0 \0 \0
0008208 341 242 003 001 \0 \0 \0 247 361 \a 001 \0 \0 \0
0008224 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0008240 304 277 \0 \0 \0 002 \0 \0 : 373 237 | \0 \0 \0 \0
0008256 001 \0 \0 \0 \0 \0 \0 \0 320 345 002 \0 \0 002 \0 \0
0008272 205 9 \0 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0008288 033 001 \0 \0 \0 \0 \0 \0 H 330 006 \0 \0 002 \0 \0
0008304 \0 340 002 \0 \0 002 \0 \0 320 340 274 037 001 \0 \0 \0
0008320 330 340 274 037 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0008336 004 311 \0 \0 \0 002 \0 \0 : 373 237 | \0 \0 \0 \0
0008352 205 9 \0 001 \0 \0 \0 \0 343 274 037 001 \0 \0 \0
0008368 320 345 002 \0 \0 002 \0 \0 @ 342 274 037 001 \0 \0 \0
0008384 ( 342 274 037 001 \0 \0 \0 210 265 003 \0 \0 002 \0 \0
0008400 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0008416 H 330 006 \0 \0 002 \0 \0 354 177 362 001 \0 \0 \0 \0

and the same in od -x (corruption starting from 2 octal) - looks like 64-bit values with two bytes of zero.
0017660 0929 3d2b 7320 7379 6f63 2e6e 0a6f 626f
0017700 2d6a 2824 4f43 464e 4749 4d5f 4446 4c5f
0017720 334d 3335 2933 2b09 203d 6d6c 3533
0017740 632d 726f 2e65 206f 6d6c 3533 632d
0017760 7274 626c 6e61 2e6b 0a6f 626f 2d6a 2824
002 0002
0020020 a2e1 2003 0001 f1a7 2007 0001
0020040
0020060 bfc4 0200 fb3a 7c9f
0020100 0001 e5d0 0002 0200
0020120 3985 2000 0001
0020140 011b d848 0006 0200
0020160 e000 0002 0200 e0d0 1fbc 0001
0020200 e0d8 1fbc 0001
0020220 c904 0200 fb3a 7c9f
0020240 3985 2000 0001 e300 1fbc 0001
0020260 e5d0 0002 0200 e240 1fbc 0001
0020300 e228 1fbc 0001 b588 0003 0200
0020320
0020340 d848 0006 0200 7fec 01f2
0020360 e0d8 1fbc 0001 d990 0005 0200
0020400 e228 1fbc 0001 1350 2000 0001

It has custom kernel configuration:
#
# Automatically generated file; DO NOT EDIT.
# Linux/alpha 5.9.0 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Debian 10.2.0-13) 10.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100200
CONFIG_LD_VERSION=23501
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="ds20e"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_GENERIC_CLOCKEVENTS=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_PREEMPT_NONE=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

#
# RCU Subs
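The report above describes a text file whose first 8192 bytes are intact and whose following bytes are binary garbage, which suggests damage at page granularity. A minimal sketch for spotting such pages in a text file: it assumes 8 KiB pages (matching the 8192-byte boundary in the report) and uses NUL bytes as the sign of trouble. The function name and the synthetic sample are illustrative, not taken from the report.

```python
PAGE_SIZE = 8192  # assumption: 8 KiB pages, matching the boundary in the report


def suspicious_pages(data: bytes, page_size: int = PAGE_SIZE) -> list[int]:
    """Return indexes of pages containing NUL bytes, which a Makefile should never have."""
    bad = []
    for offset in range(0, len(data), page_size):
        if b"\0" in data[offset:offset + page_size]:
            bad.append(offset // page_size)
    return bad


# Synthetic stand-in for the damaged file: one good page, then binary junk.
good = (b"obj-$(CONFIG_EXAMPLE) += example.o\n" * 400)[:PAGE_SIZE]
junk = bytes([0, 0, 0, 0, 0, 0, 0, 0, 0o002, 0, 0, 0, 0, 0, 0, 0]) * 4
sample = good + junk

print("suspicious pages:", suspicious_pages(sample))
print(f"page 1 starts at {PAGE_SIZE} decimal = {PAGE_SIZE:07o} octal")
```

On a real file, the same function could be applied to `open(path, "rb").read()`. The last line only shows how the decimal offsets of `od -A d` relate to the octal offsets that `od -x` prints by default.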
Re: gcc crashes with general protection faults in 5.9.0-rc5
e029f3c0 R12: 01ab53e0
[ 1513.209020] R13: 0003 R14: R15: 7f60da91b1f8
[ 1513.209023] Modules linked in: dm_mod md_mod cpufreq_conservative cpufreq_userspace cpufreq_powersave pktcdvd joydev snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi coretemp radeon snd_hda_intel snd_intel_dspcfg intel_powerclamp snd_hda_codec hwmon snd_hwdep kvm_intel ttm snd_hda_core kvm irqbypass iTCO_wdt snd_pcm_oss tpm_infineon iTCO_vendor_support crc32c_intel snd_mixer_oss mei_wdt psmouse evdev pcspkr tpm_tis snd_pcm tpm_tis_core snd_timer e1000e snd lpc_ich tpm mfd_core rng_core soundcore acpi_cpufreq loop i2c_dev parport_pc lp parport ip_tables x_tables autofs4
[ 1513.209048] ---[ end trace 5ccb97e370c341f7 ]---
[ 1513.209051] RIP: 0010:ext4_readpage+0xa/0x50
[ 1513.209053] Code: ff a9 00 00 00 10 74 0b 66 83 bf e2 02 00 00 00 74 01 c3 31 d2 e9 46 ea 01 00 66 0f 1f 44 00 00 41 54 49 89 f4 55 48 8b 46 18 <48> 8b 28 48 8b 85 68 ff ff ff a9 00 00 00 10 74 1b 66 83 bd e2 02
[ 1513.209055] RSP: :96b18b09fd88 EFLAGS: 00010286
[ 1513.209057] RAX: dead0400 RBX: 96b18b09fe60 RCX:
[ 1513.209058] RDX: 0001 RSI: c54a413fdf80 RDI: 8d8f41c3c800
[ 1513.209059] RBP: c54a413fdf80 R08: 0005 R09: 8d8f9bd61e50
[ 1513.209061] R10: R11: 8d8f41c3c800 R12: c54a413fdf80
[ 1513.209062] R13: 0b7e R14: 8d8efa04bb18 R15: 8d8efa04bc90
[ 1513.209065] FS: 7f60e012bf00() GS:8d8f9bc0() knlGS:
[ 1513.209067] CS: 0010 DS: ES: CR0: 80050033
[ 1513.209068] CR2: 0112cc96 CR3: 0000bb72e000 CR4: 06f0

-- Meelis Roos
Re: gcc crashes with general protection faults in 5.9.0-rc3-00091-ge28f0104343d
Replying to myself:

This is 5.9.0-rc3-00091-ge28f0104343d on a Lenovo t460s that has run fine up to 5.8.0.

Now I reproduced the same problem with 5.9.0-rc3 on an HP desktop with a Core2Quad CPU. The call trace is very similar and it's crashing gcc again while compiling 5.9-rc4. But it seems 5.9-rc4 cures it here as well - whatever the reason might have been.

Nope, the reason was nondeterminism - it happened on the Core2Quad running 5.9-rc4 while trying to compile today's Linux from git.

-- Meelis Roos
Re: gcc crashes with general protection faults in 5.9.0-rc3-00091-ge28f0104343d
a08308d03d58 EFLAGS: 00010286
[307299.392060] RAX: dead0400 RBX: a08308d03e38 RCX:
[307299.392061] RDX: 0001 RSI: de94c0d00ec0 RDI: 9661c786ca00
[307299.392062] RBP: de94c0d00ec0 R08: 0001 R09:
[307299.392063] R10: 0071 R11: 9661c786ca00 R12: de94c0d00ec0
[307299.392064] R13: 063b R14: 96636d3aaea0 R15: 96636d3ab018
[307299.392065] FS: 7f7871446f00() GS:966396c8() knlGS:
[307299.392067] CS: 0010 DS: ES: CR0: 80050033
[307299.392068] CR2: 00a3b1f0 CR3: 6c3e2003 CR4: 003706e0
[307299.392069] Call Trace:
[307299.392073] filemap_fault+0x193/0x7c0
[307299.392075] ext4_filemap_fault+0x28/0x3a
[307299.392078] __do_fault+0x31/0xf0
[307299.392080] handle_mm_fault+0xf1a/0x14c0
[307299.392084] do_user_addr_fault+0x1b3/0x3e0
[307299.392087] exc_page_fault+0x61/0x130
[307299.392089] ? asm_exc_page_fault+0x8/0x30
[307299.392091] asm_exc_page_fault+0x1e/0x30
[307299.392093] RIP: 0033:0xa3b620
[307299.392096] Code: Bad RIP value.
[307299.392097] RSP: 002b:7ffe4b382018 EFLAGS: 00010202
[307299.392099] RAX: 7f786fd32980 RBX: 7f786fd32a18 RCX:
[307299.392100] RDX: 0002 RSI: 0001 RDI: 7f786fd4ee70
[307299.392101] RBP: R08: R09: 00c0
[307299.392102] R10: 0140 R11: 002f R12: 0001
[307299.392103] R13: R14: R15: 0000

-- Meelis Roos
Re: 5.9-rc4: modpost undefined symbols + relocation in read-only section `.head.text'
Replying to myself:

This is 5.9-rc4 git on a specific amd64 machine with Debian unstable and custom kernel config. 5.8 compiled and worked fine, I have seen something like this with different 5.9-git commits. I made sure my binutils and gcc-10 are up to date in Debian unstable and retried with 5.9-rc4. Still I see the same during build (have not tried booting it more than once after a failed boot). This only happens on this specific computer and is reproducible after make clean, other tested machines with Debian unstable toolchain are fine. Kernel config is below.

I found another Debian amd64 machine that exhibits the "relocation in read-only section `.head.text'" warning but no symbol errors from MODPOST. The kernel fails to boot; grub selects the next kernel automatically, so the image format is probably bad.

LDS arch/x86/boot/compressed/vmlinux.lds
AS arch/x86/boot/compressed/head_64.o
VOFFSET arch/x86/boot/compressed/../voffset.h
CC arch/x86/boot/compressed/string.o
CC arch/x86/boot/compressed/cmdline.o
CC arch/x86/boot/compressed/error.o
OBJCOPY arch/x86/boot/compressed/vmlinux.bin
RELOCS arch/x86/boot/compressed/vmlinux.relocs
CC arch/x86/boot/compressed/cpuflags.o
CC arch/x86/boot/compressed/early_serial_console.o
CC arch/x86/boot/compressed/kaslr.o
CC arch/x86/boot/compressed/kaslr_64.o
AS arch/x86/boot/compressed/mem_encrypt.o
CC arch/x86/boot/compressed/pgtable_64.o
CC arch/x86/boot/compressed/acpi.o
AS arch/x86/boot/compressed/efi_thunk_64.o
CC arch/x86/boot/compressed/misc.o
LZMA arch/x86/boot/compressed/vmlinux.bin.lzma
MKPIGGY arch/x86/boot/compressed/piggy.S
AS arch/x86/boot/compressed/piggy.o
LD arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
ZOFFSET arch/x86/boot/zoffset.h
OBJCOPY arch/x86/boot/vmlinux.bin
AS arch/x86/boot/header.o
LD arch/x86/boot/setup.elf
OBJCOPY arch/x86/boot/setup.bin
BUILD arch/x86/boot/bzImage
Setup is 14460 bytes (padded to 14848 bytes).
System is 4785 kB
CRC f036c6cb
Kernel: arch/x86/boot/bzImage is ready (#322)
^Cmake[1]: *** [scripts/Makefile.modpost:117: __modpost] Interrupt
make: *** [Makefile:1392: modules] Interrupt

Config:
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.9.0-rc4 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Debian 10.2.0-6) 10.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100200
CONFIG_LD_VERSION=23500
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
CONFIG_KERNEL_LZMA=y
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="prometheus"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_WATCH_QUEUE=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_CONTEXT_TRACKING=y
# CONFIG_CONTEXT_TRACKING_FORCE is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CON
Re: gcc crashes with general protection faults in 5.9.0-rc3-00091-ge28f0104343d
Following up on yesterday's mail:

This is 5.9.0-rc3-00091-ge28f0104343d on a Lenovo t460s that has run fine up to 5.8.0. Today I tried reproducing my linking problem with a git kernel on my laptop and got segmentation faults in gcc. This is probably the corresponding dmesg part: 0xdead0400 looks like some kind of poisoning.

[307299.392045] general protection fault, probably for non-canonical address 0xdead0400: [#1] SMP PTI

Was not reproducible in 5.9-rc4 while recompiling the kernel in a loop for 8 hours.

-- Meelis Roos
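The fault message above calls the address non-canonical. On x86-64 with 48-bit virtual addresses (4-level paging, which is an assumption here), an address is canonical only when bits 63 down to 47 are all equal. A minimal sketch of that check follows; the sample values are illustrative full-width addresses chosen for the example, not reconstructions of the truncated value in the log.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) of a 64-bit address are all 0 or all 1."""
    top = addr >> (va_bits - 1)                # bits 63..47 when va_bits is 48
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits when va_bits is 48
    return top == 0 or top == all_ones


# Illustrative values only (hypothetical, not copied from the oops):
samples = {
    "user-space style": 0x00007FFE4B382018,
    "kernel-space style": 0xFFFF888000000000,
    "0xdead... poison style": 0xDEAD000000000400,
}
for label, addr in samples.items():
    print(f"{label}: {addr:#018x} canonical={is_canonical(addr)}")
```

A pointer whose upper bits carry a marker pattern such as 0xdead fails this check, so dereferencing it raises a general protection fault rather than an ordinary page fault, which fits the "some kind of poisoning" reading above.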
5.9-rc4: modpost undefined symbols + relocation in read-only section `.head.text'
This is 5.9-rc4 git on a specific amd64 machine with Debian unstable and custom kernel config. 5.8 compiled and worked fine, I have seen something like this with different 5.9-git commits. I made sure my binutils and gcc-10 are up to date in Debian unstable and retried with 5.9-rc4. Still I see the same during build (have not tried booting it more than once after a failed boot). This only happens on this specific computer and is reproducible after make clean, other tested machines with Debian unstable toolchain are fine. Kernel config is below.

...
CC arch/x86/boot/cpu.o
LDS arch/x86/boot/compressed/vmlinux.lds
AS arch/x86/boot/compressed/kernel_info.o
AS arch/x86/boot/compressed/head_64.o
VOFFSET arch/x86/boot/compressed/../voffset.h
CC arch/x86/boot/compressed/string.o
CC arch/x86/boot/compressed/cmdline.o
CC arch/x86/boot/compressed/error.o
OBJCOPY arch/x86/boot/compressed/vmlinux.bin
RELOCS arch/x86/boot/compressed/vmlinux.relocs
HOSTCC arch/x86/boot/compressed/mkpiggy
CC arch/x86/boot/compressed/cpuflags.o
CC arch/x86/boot/compressed/early_serial_console.o
CC arch/x86/boot/compressed/kaslr.o
CC arch/x86/boot/compressed/kaslr_64.o
AS arch/x86/boot/compressed/mem_encrypt.o
CC arch/x86/boot/compressed/pgtable_64.o
CC arch/x86/boot/compressed/acpi.o
XZKERN arch/x86/boot/compressed/vmlinux.bin.xz
ERROR: modpost: "irq_poll_init" [drivers/scsi/lpfc/lpfc.ko] undefined!
ERROR: modpost: "irq_poll_sched" [drivers/scsi/lpfc/lpfc.ko] undefined!
ERROR: modpost: "irq_poll_complete" [drivers/scsi/lpfc/lpfc.ko] undefined!
CC arch/x86/boot/compressed/misc.o
make[1]: *** [scripts/Makefile.modpost:111: Module.symvers] Error 1
make[1]: *** Deleting file 'Module.symvers'
make: *** [Makefile:1392: modules] Error 2
make: *** Waiting for unfinished jobs
MKPIGGY arch/x86/boot/compressed/piggy.S
AS arch/x86/boot/compressed/piggy.o
LD arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
ZOFFSET arch/x86/boot/zoffset.h
OBJCOPY arch/x86/boot/vmlinux.bin
AS arch/x86/boot/header.o
LD arch/x86/boot/setup.elf
OBJCOPY arch/x86/boot/setup.bin
BUILD arch/x86/boot/bzImage
Setup is 14396 bytes (padded to 14848 bytes).
System is 4649 kB
CRC 3b22552a
Kernel: arch/x86/boot/bzImage is ready (#38)

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.9.0-rc4 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Debian 10.2.0-6) 10.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100200
CONFIG_LD_VERSION=23500
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

#
# CPU/Tas
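In a parallel build the modpost errors are interleaved with unrelated compiler output, as in the log quoted in this message. A small sketch for pulling them out and grouping the undefined symbols by module is shown below. The regular expression is written against the three error lines quoted above and is an assumption about the message shape, not a documented format; `undefined_symbols` is an illustrative name.

```python
import re
from collections import defaultdict

# Pattern modelled on the error lines quoted in the build log.
MODPOST_RE = re.compile(
    r'ERROR: modpost: "(?P<symbol>[^"]+)" \[(?P<module>[^\]]+)\] undefined!'
)


def undefined_symbols(log: str) -> dict[str, list[str]]:
    """Group undefined symbols reported by modpost by the module that needs them."""
    found = defaultdict(list)
    for match in MODPOST_RE.finditer(log):
        found[match["module"]].append(match["symbol"])
    return dict(found)


log = """\
XZKERN arch/x86/boot/compressed/vmlinux.bin.xz
ERROR: modpost: "irq_poll_init" [drivers/scsi/lpfc/lpfc.ko] undefined!
ERROR: modpost: "irq_poll_sched" [drivers/scsi/lpfc/lpfc.ko] undefined!
ERROR: modpost: "irq_poll_complete" [drivers/scsi/lpfc/lpfc.ko] undefined!
CC arch/x86/boot/compressed/misc.o
"""

result = undefined_symbols(log)
for module, symbols in result.items():
    print(module, "->", ", ".join(symbols))
```

Here all three missing symbols belong to the same module and share the `irq_poll_` prefix, which narrows the search to whatever provides those functions in the given config.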
Re: [bisected] "mm/vmalloc: Add flag for freeing of special permsissions" corrupts memory on ia64
I am out of the office and don't have access to this hardware either. I will try to find someone at Intel that does to speed this up. In the meantime I can send you a logging patch to do some sanity checks if you are able to run it.

I am also cut off from testing anything - it seems the air conditioning unit in my test site has failed for good now and the earliest I can test anything is next week.

I think I found your earlier mail, and it said 5.2-rc1 did not show the problem. I guess this wasn't the case after further testing, but 5.1 continued to be problem free?

Yes, 5.2-rc1 was problematic in retesting, and 5.1 was OK. I also started suspecting the binutils upgrade meanwhile - I upgraded binutils to 2.31.1-p5 in Gentoo right after booting into 5.1, but the bisection results were finally consistent so I did not look into binutils versions further. gcc has not changed for me recently.

-- Meelis Roos
[bisected] "mm/vmalloc: Add flag for freeing of special permsissions" corrupts memory on ia64
I noticed that while 5.1 works on my HP Integrity RX2620, 5.2-rc6 crashed on boot nondeterministically. Bisecting it took many tries since it does not happen on each boot, and when it happens, the symptoms are different each time. But now the bisection converged to

868b104d7379e28013e9d48bdd2db25e0bdcf751 is the first bad commit
commit 868b104d7379e28013e9d48bdd2db25e0bdcf751
Author: Rick Edgecombe
Date: Thu Apr 25 17:11:36 2019 -0700

mm/vmalloc: Add flag for freeing of special permsissions

Add a new flag VM_FLUSH_RESET_PERMS, for enabling vfree operations to immediately clear executable TLB entries before freeing pages, and handle resetting permissions on the directmap. This flag is useful for any kind of memory with elevated permissions, or where there can be related permissions changes on the directmap. Today this is RO+X and RO memory.

Although this enables directly vfreeing non-writeable memory now, non-writable memory cannot be freed in an interrupt because the allocation itself is used as a node on deferred free list. So when RO memory needs to be freed in an interrupt the code doing the vfree needs to have its own work queue, as was the case before the deferred vfree list was added to vmalloc.

For architectures with set_direct_map_ implementations this whole operation can be done with one TLB flush when centralized like this. For others with directmap permissions, currently only arm64, a backup method using set_memory functions is used to reset the directmap. When arm64 adds set_direct_map_ functions, this backup can be removed.

When the TLB is flushed to both remove TLB entries for the vmalloc range mapping and the direct map permissions, the lazy purge operation could be done to try to save a TLB flush later. However today vm_unmap_aliases could flush a TLB range that does not include the directmap. So a helper is added with extra parameters that can allow both the vmalloc address and the direct mapping to be flushed during this operation. The behavior of the normal vm_unmap_aliases function is unchanged.

Suggested-by: Dave Hansen
Suggested-by: Andy Lutomirski
Suggested-by: Will Deacon
Signed-off-by: Rick Edgecombe
Signed-off-by: Peter Zijlstra (Intel)
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc: Borislav Petkov
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Nadav Amit
Cc: Rik van Riel
Cc: Thomas Gleixner
Link: https://lkml.kernel.org/r/20190426001143.4983-17-na...@vmware.com
Signed-off-by: Ingo Molnar

:04 04 6af7c46e4736f2b80e363d7d7793253f9f279ea4 58066de53107eab0705398b5d0c407424c138a86 M include
:04 04 87cf40e161342a2a1c2dd49099740dc413b32449 19a0d6f5ba799f7f1d43ee1f0aebcc46be0e96bd M mm

The symptoms often seem to be related to module loading. One typical scenario is modprobes failing and udevd agents killed:

Jul 1 09:17:57 rx2620 kernel: udevd[421]: worker [504] /devices/pci:00/:00:01.0 is taking a long time
Jul 1 09:17:57 rx2620 kernel: udevd[421]: worker [495] /devices/pci:00/:00:01.1 is taking a long time
Jul 1 09:19:57 rx2620 kernel: udevd[421]: worker [504] /devices/pci:00/:00:01.0 timeout; kill it
Jul 1 09:19:57 rx2620 kernel: udevd[421]: seq 626 '/devices/pci:00/:00:01.0' killed
Jul 1 09:19:57 rx2620 kernel: udevd[421]: worker [495] /devices/pci:00/:00:01.1 timeout; kill it
Jul 1 09:19:57 rx2620 kernel: udevd[421]: seq 627 '/devices/pci:00/:00:01.1' killed
Jul 1 09:19:57 rx2620 kernel: udevd[421]: worker [495] terminated by signal 9 (Killed)
Jul 1 09:19:57 rx2620 kernel: udevd[421]: worker [495] failed while handling '/devices/pci:00/:00:01.1'
Jul 1 09:19:57 rx2620 kernel: udevd[421]: worker [504] terminated by signal 9 (Killed)
Jul 1 09:19:57 rx2620 kernel: udevd[421]: worker [504] failed while handling '/devices/pci:00/:00:01.0'

Or:

[ 13.363452] udevd[498]: IA-64 Illegal operation fault 0 [1]
[ 13.363452] Modules linked in: ehci_pci(+) e1000(+) ehci_hcd usbcore usb_common pata_cmd64x libata efivars
[ 13.363452]
[ 13.363452] CPU: 0 PID: 498 Comm: udevd Not tainted 5.2.0-rc6 #46
[ 13.363452] Hardware name: hp server rx2620 , BIOS 04.29 11/30/2007
[ 13.363452] psr : 101008026010 ifs : 8003 ip : []Not tainted (5.2.0-rc6)

Or (as mentioned in my first mail about the crash):

[ 13.471600] udevd[498]: NaT consumption 2216203124768 [1]
[ 13.471600] Modules linked in: L^A() ohci_hcd ehci_pci ehci_hcd usbcore pata_cmd64x e1000(+) usb_common libata efivars
[ 13.471600] CPU: 0 PID: 498 Comm: udevd Not tainted 5.2.0-rc6-00015-g249155c20f9b #47
[ 13.473692] Hardware name: hp server rx2620 , BIOS 04.29 11/30
sock_prot_inuse_add unaligned access and crash on sparc64
Tried todays git on Sun Netra 240 (sparc64). Got bootup crash with custom, machine-specific config: [ 47.760841] Kernel unaligned access at TPC[7bf124] sock_prot_inuse_add+0x4/0x20 [ 47.856969] Unable to handle kernel paging request in mna handler [ 47.856972] at virtual address 14ee258a [ 47.997703] current->{active_,}mm->context = 0001 [ 48.073193] current->{active_,}mm->pgd = fff000133cc0c000 [ 48.144105] \|/ \|/ [ 48.144105] "@'/ .. \`@" [ 48.144105] /_| \__/ |_\ [ 48.144105] \__U_/ [ 48.337408] systemd(1): Oops [#1] [ 48.380862] CPU: 0 PID: 1 Comm: systemd Not tainted 5.2.0-rc5-00224-gbed3c0d84e7e #8 [ 48.482657] TSTATE: 004411001605 TPC: 007bf124 TNPC: 007bf128 Y: Not tainted [ 48.611912] TPC: [ 48.671370] g0: ff00 g1: 0200 g2: 0006 g3: [ 48.785748] g4: fff000133c0a5760 g5: fff000133ecc4000 g6: fff000133c0bc000 g7: 001e [ 48.900121] o0: 14ee240a o1: 00afef30 o2: o3: fff000133c0a5d50 [ 49.014495] o4: fff000133c0a5760 o5: sp: fff000133c0bf061 ret_pc: 008aac78 [ 49.133456] RPC: [ 49.196349] l0: 07feff80d5d0 l1: l2: fff000133cd11f88 l3: [ 49.310725] l4: l5: l6: l7: fff100869da0 [ 49.425099] i0: 008aac48 i1: 0200 i2: 0001 i3: [ 49.539475] i4: i5: i6: fff000133c0bf111 i7: 007c05f0 [ 49.653851] I7: <__sk_destruct+0x10/0x180> [ 49.707597] Call Trace: [ 49.739626] [007c05f0] __sk_destruct+0x10/0x180 [ 49.809396] [008abb1c] unix_release_sock+0x1bc/0x260 [ 49.884882] [008abbd0] unix_release+0x10/0x40 [ 49.952361] [007ba96c] __sock_release+0x2c/0xc0 [ 50.022130] [007baa0c] sock_close+0xc/0x20 [ 50.086187] [00594a70] __fput+0x90/0x220 [ 50.147944] [0047ee80] task_work_run+0x80/0xc0 [ 50.216574] [0042e23c] do_notify_resume+0x5c/0x80 [ 50.288624] [00404b48] __handle_signal+0xc/0x30 [ 50.358387] Disabling lock debugging due to kernel taint [ 50.428159] Caller[007c05f0]: __sk_destruct+0x10/0x180 [ 50.504788] Caller[008abb1c]: unix_release_sock+0x1bc/0x260 [ 50.587138] Caller[008abbd0]: unix_release+0x10/0x40 [ 50.661483] Caller[007ba96c]: __sock_release+0x2c/0xc0 [ 50.738110] 
Caller[007baa0c]: sock_close+0xc/0x20 [ 50.809023] Caller[00594a70]: __fput+0x90/0x220 [ 50.877647] Caller[0047ee80]: task_work_run+0x80/0xc0 [ 50.953136] Caller[0042e23c]: do_notify_resume+0x5c/0x80 [ 51.032053] Caller[00404b48]: __handle_signal+0xc/0x30 [ 51.108685] Caller[fff100205934]: 0xfff100205934 [ 51.178449] Instruction DUMP: [ 51.178451] 0100 [ 51.217335] 0100 [ 51.248217] c40260c8 [ 51.279096] [ 51.309979] 8528b002 [ 51.340858] 82004002 [ 51.371742] c4004005 [ 51.402622] 9400800a [ 51.433505] 81c3e008 [ 51.464385] [ 51.514706] Kernel panic - not syncing: Aiee, killing interrupt handler! [ 51.602778] Press Stop-A (L1-A) from sun keyboard or send break [ 51.602778] twice on console to return to the boot prom [ 51.749170] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]--- Config: # # Automatically generated file; DO NOT EDIT. # Linux/sparc64 5.2.0-rc5 Kernel Configuration # # # Compiler: gcc (Debian 8.3.0-7) 8.3.0 # CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=80300 CONFIG_CLANG_VERSION=0 CONFIG_CC_HAS_ASM_GOTO=y CONFIG_CC_HAS_WARN_MAYBE_UNINITIALIZED=y CONFIG_CONSTRUCTORS=y CONFIG_IRQ_WORK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_BUILD_SALT="" CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y # CONFIG_USELIB is not set # CONFIG_AUDIT is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_SHOW=y CONFIG_IRQ_PREFLOW_FASTEOI=y CONFIG_IRQ_DOMAIN=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_SPARSE_IRQ=y # CONFIG_GENERIC_IRQ_DEBUGFS is not set # end of IRQ subsystem CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ_FULL is not set # CONFIG_NO_HZ 
is not set CONFIG_HIGH_RES_TIMERS=y # end of Timers subsystem CONFIG_PREEMPT_NONE=y # CONFIG_PREEMPT_VOLUNTARY is not set # CONFIG_PREEMPT is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_VIRT_CPU_ACCOUNTING_GEN
sparc64 crash around deactivate_slab
The same Sun V445 that gave me BPF errors had a different error with today's git, just idling: [ 51.530195] Kernel unaligned access at TPC[58265c] deactivate_slab.isra.28+0xfc/0x420 [ 51.675010] Unable to handle kernel paging request in mna handler [ 51.675013] at virtual address 91d0200591d02005 [ 51.828736] current->{active_,}mm->context = 0026 [ 51.911239] current->{active_,}mm->pgd = fff000323d3d8000 [ 51.988743] \|/ \|/ [ 51.988743] "@'/ .. \`@" [ 51.988743] /_| \__/ |_\ [ 51.988743] \__U_/ [ 52.200013] swapper/0(0): Oops [#1] [ 52.250008] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc5-00224-gbed3c0d84e7e #33 [ 52.365015] TSTATE: 004480e01600 TPC: 0058265c TNPC: 00582660 Y: Not tainted [ 52.506274] TPC: [ 52.578772] g0: 0001 g1: g2: fff000323ed26000 g3: [ 52.703780] g4: fff000323c0e6300 g5: fff000323ed26000 g6: fff000323c134000 g7: 0200 [ 52.828786] o0: o1: 000c71090c88 o2: 0001 o3: [ 52.953792] o4: 000e o5: 000e sp: fff000323c1370b1 ret_pc: 005825a0 [ 53.083799] RPC: [ 53.156299] l0: fff000323d378340 l1: 007f0101 l2: 0001 l3: 000c7109baa8 [ 53.281307] l4: 000f l5: 00210d00 l6: 000c71090c88 l7: [ 53.406312] i0: fff000323d18b1e0 i1: 00800101 i2: 91d0200591d02005 i3: fff000323f814e80 [ 53.531319] i4: fff000323f814e90 i5: 91d0200591d02005 i6: fff000323c1371c1 i7: 00582c18 [ 53.656328] I7: [ 53.715073] Call Trace: [ 53.750076] [00582c18] flush_cpu_slab+0x38/0x60 [ 53.826333] [004d02a8] flush_smp_call_function_queue+0x68/0x180 [ 53.922593] [0093585c] smp_call_function_client+0x1c/0x40 [ 54.011341] [004208d4] tl0_irq6+0x14/0x20 [ 54.080098] [0042c8b4] arch_cpu_idle+0x94/0xa0 [ 54.155104] [0048b118] do_idle+0x118/0x1a0 [ 54.225099] [0048b3bc] cpu_startup_entry+0x1c/0x40 [ 54.305102] [00a71984] 0xa71984 [ 54.361354] [4000] 0x4000 [ 54.420106] Disabling lock debugging due to kernel taint [ 54.496362] Caller[00582c18]: flush_cpu_slab+0x38/0x60 [ 54.580116] Caller[004d02a8]: flush_smp_call_function_queue+0x68/0x180 [ 54.683873] Caller[0093585c]: 
smp_call_function_client+0x1c/0x40 [ 54.780124] Caller[004208d4]: tl0_irq6+0x14/0x20 [ 54.856378] Caller[0042c8a8]: arch_cpu_idle+0x88/0xa0 [ 54.938882] Caller[0048b118]: do_idle+0x118/0x1a0 [ 55.016386] Caller[0048b3bc]: cpu_startup_entry+0x1c/0x40 [ 55.103889] Caller[00a71984]: 0xa71984 [ 55.167641] Caller[4000]: 0x4000 [ 55.233894] Instruction DUMP: [ 55.233895] c2758000 [ 55.276395] c2062020 [ 55.310146] b410001d [ 55.343897] [ 55.377650] 02c1c004 [ 55.411401] ee5da020 [ 55.445153] 106fffdf [ 55.478904] ba17 [ 55.512655] f85da028 [ 55.546407] [ 55.601412] Kernel panic - not syncing: Aiee, killing interrupt handler! [ 55.697685] Press Stop-A (L1-A) from sun keyboard or send break [ 55.697685] twice on console to return to the boot prom [ 55.857678] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]--- Config: # # Automatically generated file; DO NOT EDIT. # Linux/sparc64 5.2.0-rc5 Kernel Configuration # # # Compiler: gcc (Debian 8.3.0-7) 8.3.0 # CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=80300 CONFIG_CLANG_VERSION=0 CONFIG_CC_HAS_ASM_GOTO=y CONFIG_CC_HAS_WARN_MAYBE_UNINITIALIZED=y CONFIG_IRQ_WORK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_BUILD_SALT="" CONFIG_DEFAULT_HOSTNAME="v445" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y # CONFIG_USELIB is not set CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_SHOW=y CONFIG_IRQ_PREFLOW_FASTEOI=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_SPARSE_IRQ=y # CONFIG_GENERIC_IRQ_DEBUGFS is not set # end of IRQ subsystem CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ_FULL is not set # CONFIG_NO_HZ is not set 
CONFIG_HIGH_RES_TIMERS=y # end of Timers subsystem CONFIG_PREEMPT_NONE=y # CONFIG_PREEMPT_VOLUNTARY is not set # CONFIG_PREEMPT is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TA
Re: [PATCH] vmalloc: Don't use flush flag when no exec perm
The addition of VM_FLUSH_RESET_PERMS for BPF JIT allocations was bisected to prevent boot on an UltraSparc III machine. It was found that sometime shortly after the TLB flush this flag does on vfree of the BPF program, the machine hung. Further investigation showed that before any of the changes for this flag were introduced, with CONFIG_DEBUG_PAGEALLOC configured (which does a similar TLB flush of the vmalloc range on every vfree), this machine also hung shortly after the first vmalloc unmap/free. So the evidence points to there being some existing issue with the vmalloc TLB flushes, but it's still unknown exactly why these hangs are happening on sparc. It is also unknown when someone with this hardware could resolve this, and in the meantime using this flag on it turns a lurking behavior into something that prevents boot. The sparc TLB flush issue has been bisected and is being worked on now, so hopefully we won't need this patch: https://marc.info/?l=linux-sparc&m=155915694304118&w=2 And the sparc64 patch that fixes CONFIG_DEBUG_PAGEALLOC also fixes booting of the latest git kernel on Sun V445 where my problem initially happened. -- Meelis Roos
Re: [PATCH v2] vmalloc: Fix issues with flush flag
Switch VM_FLUSH_RESET_PERMS to use a regular TLB flush instead of vm_unmap_aliases() and fix calculation of the direct map for the CONFIG_ARCH_HAS_SET_DIRECT_MAP case. Meelis Roos reported issues with the new VM_FLUSH_RESET_PERMS flag on a sparc machine. On investigation some issues were noticed: 1. The calculation of the direct map address range to flush was wrong. This could cause problems on x86 if a RO direct map alias ever got loaded into the TLB. This shouldn't normally happen, but it could cause the permissions to remain RO on the direct map alias, and then the page would return from the page allocator to some other component as RO and cause a crash. 2. Calling vm_unmap_aliases() on vfree could potentially be a lot of work to do on a free operation. Simply flushing the TLB instead of the whole vm_unmap_aliases() operation makes the frees faster and pushes the heavy work to happen on allocation where it would be more expected. In addition to the extra work, vm_unmap_aliases() takes some locks including a long hold of vmap_purge_lock, which will make all other VM_FLUSH_RESET_PERMS vfrees wait while the purge operation happens. 3. page_address() can have locking on some configurations, so skip calling this when possible to further speed this up. Fixes: 868b104d7379 ("mm/vmalloc: Add flag for freeing of special permsissions") Reported-by: Meelis Roos Cc: Meelis Roos Cc: Peter Zijlstra Cc: "David S. Miller" Cc: Dave Hansen Cc: Borislav Petkov Cc: Andy Lutomirski Cc: Ingo Molnar Cc: Nadav Amit Signed-off-by: Rick Edgecombe --- Changes since v1: - Update commit message with more detail - Fix flush end range on !CONFIG_ARCH_HAS_SET_DIRECT_MAP case It does not work on my V445 where the initial problem happened. [ 46.582633] systemd[1]: Detected architecture sparc64. Welcome to Debian GNU/Linux 10 (buster)! [ 46.759048] systemd[1]: Set hostname to . 
[ 46.831383] systemd[1]: Failed to bump fs.file-max, ignoring: Invalid argument [ 67.989695] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 68.074706] rcu: 0-...!: (0 ticks this GP) idle=5c6/1/0x4000 softirq=33/33 fqs=0 [ 68.198443] rcu: 2-...!: (0 ticks this GP) idle=e7e/1/0x4000 softirq=67/67 fqs=0 [ 68.322198] (detected by 1, t=5252 jiffies, g=-939, q=108) [ 68.402204] CPU[ 0]: TSTATE[80001603] TPC[0043f298] TNPC[0043f29c] TASK[systemd-debug-g:89] [ 68.556001] TPC[smp_synchronize_tick_client+0x18/0x1a0] O7[0xfff1691c] I7[xcall_sync_tick+0x1c/0x2c] RPC[alloc_set_pte+0xf4/0x300] [ 68.750973] CPU[ 2]: TSTATE[80001600] TPC[0043f298] TNPC[0043f29c] TASK[systemd-cryptse:88] [ 68.904741] TPC[smp_synchronize_tick_client+0x18/0x1a0] O7[filemap_map_pages+0x3cc/0x3e0] I7[xcall_sync_tick+0x1c/0x2c] RPC[handle_mm_fault+0xa0/0x180] [ 69.115991] rcu: rcu_sched kthread starved for 5252 jiffies! g-939 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=3 [ 69.262239] rcu: RCU grace-period kthread stack dump: [ 69.334741] rcu_sched I010 2 0x0600 [ 69.413495] Call Trace: [ 69.448501] [0093325c] schedule+0x1c/0xc0 [ 69.517253] [00936c74] schedule_timeout+0x154/0x260 [ 69.598514] [004b65a4] rcu_gp_kthread+0x4e4/0xac0 [ 69.677261] [0047ecfc] kthread+0xfc/0x120 [ 69.746018] [004060a4] ret_from_fork+0x1c/0x2c [ 69.821014] [] 0x0 and hangs here, software watchdog kicks in soon. -- Meelis Roos
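Item 1 of the quoted patch description is about computing the address range to flush. As a rough illustration only (this is not the kernel code from the patch), the idea of finding the smallest range that covers a set of page addresses can be sketched in plain C. The names covering_range, struct flush_range and SKETCH_PAGE_SIZE are hypothetical, and treating address 0 as "page has no direct-map address" is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical type: a half-open address range [start, end). */
struct flush_range {
    uintptr_t start;  /* inclusive */
    uintptr_t end;    /* exclusive */
};

/*
 * Compute the smallest [start, end) range that covers every page in addrs[].
 * An address of 0 is treated as "no address for this page" and skipped
 * (an assumption of this sketch).
 * Returns 0 if at least one page contributed to the range, -1 otherwise.
 */
static int covering_range(const uintptr_t *addrs, size_t n,
                          struct flush_range *out)
{
    uintptr_t start = UINTPTR_MAX;
    uintptr_t end = 0;
    size_t i;

    assert(out != NULL);

    for (i = 0; i < n; i++) {
        uintptr_t a = addrs[i];

        if (a == 0)
            continue;
        if (a < start)
            start = a;
        /* The end must include the whole page, not just its first byte. */
        if (a + SKETCH_PAGE_SIZE > end)
            end = a + SKETCH_PAGE_SIZE;
    }

    if (end == 0)
        return -1;  /* nothing to flush */

    out->start = start;
    out->end = end;
    return 0;
}
```

Forgetting to add the page size to the end, or mixing up the minimum and maximum, gives a range that misses part of the mapping; that is the general class of mistake a range calculation like this can contain.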
Re: DISCONTIGMEM is deprecated
ia64 (looks complicated ...) Well as far as I can tell it was not even used 12 or so years ago on Itanium when I worked on that stuff. My notes say that on UP ia64 (RX2620), !NUMA was broken with both SPARSEMEM and DISCONTIGMEM. NUMA+SPARSEMEM or !NUMA worked. Even NUMA+DISCONTIGMEM worked, that was my config on 2-CPU RX2660. -- Meelis Roos
5.1-rc6: UBSAN: Undefined behaviour in mm/compaction.c:1167:30
The warning UBSAN: Undefined behaviour in mm/compaction.c:1167:30 happened with 5.1-rc6 on UP 32-bit P4 PC with highmem. [ 95.135408] [ 95.135478] UBSAN: Undefined behaviour in mm/compaction.c:1167:30 [ 95.135528] shift exponent 32 is too large for 32-bit type 'long unsigned int' [ 95.135579] CPU: 0 PID: 13 Comm: kcompactd0 Not tainted 5.1.0-rc6 #71 [ 95.135626] Hardware name: MSI MS-6547 /MS-6547 , BIOS 07.00T [ 95.135681] Call Trace: [ 95.135742] dump_stack+0x16/0x1e [ 95.135791] ubsan_epilogue+0xb/0x29 [ 95.135836] __ubsan_handle_shift_out_of_bounds.cold.14+0x20/0x6a [ 95.135887] ? page_vma_mapped_walk+0x125/0x410 [ 95.135935] ? page_counter_cancel+0x16/0x30 [ 95.135984] compaction_alloc.cold.43+0x56/0xbc [ 95.136033] ? free_unref_page_commit.isra.95+0x7a/0x80 [ 95.136082] migrate_pages+0x99/0x732 [ 95.136127] ? isolate_migratepages_block+0x940/0x940 [ 95.136172] ? __ClearPageMovable+0x10/0x10 [ 95.136217] compact_zone+0x7e2/0xb70 [ 95.136262] ? compaction_suitable+0x49/0x60 [ 95.136306] kcompactd_do_work+0xdb/0x1d0 [ 95.136389] ? __switch_to_asm+0x26/0x4c [ 95.136470] kcompactd+0x4f/0x110 [ 95.136550] ? wait_woken+0x60/0x60 [ 95.136630] kthread+0xe5/0x100 [ 95.136709] ? kcompactd_do_work+0x1d0/0x1d0 [ 95.136789] ? kthread_create_worker_on_cpu+0x20/0x20 [ 95.136870] ret_from_fork+0x2e/0x38 [ 95.136949] It is not reproducible at will - did not happen on 2 next reboots, so it probably originates from an earlier version. Full dmesg and config are below. 
[0.00] Linux version 5.1.0-rc6 (mroos@kukeseen) (gcc version 8.3.0 (Debian 8.3.0-6)) #71 Mon Apr 22 01:30:01 EEST 2019 [0.00] x86/fpu: x87 FPU will use FXSAVE [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009fbff] usable [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved [0.00] BIOS-e820: [mem 0x000ec000-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0x3fff] usable [0.00] BIOS-e820: [mem 0xfec0-0xfecf] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfeef] reserved [0.00] BIOS-e820: [mem 0xffee-0xfff0fffe] reserved [0.00] BIOS-e820: [mem 0xfffc-0x] reserved [0.00] Notice: NX (Execute Disable) protection missing in CPU! [0.00] Legacy DMI 2.3 present. [0.00] DMI: MSI MS-6547 /MS-6547 , BIOS 07.00T [0.00] tsc: Fast TSC calibration using PIT [0.00] tsc: Detected 2000.078 MHz processor [0.009834] e820: update [mem 0x-0x0fff] usable ==> reserved [0.009838] e820: remove [mem 0x000a-0x000f] usable [0.009849] last_pfn = 0x4 max_arch_pfn = 0x10 [0.009866] MTRR default type: uncachable [0.009868] MTRR fixed ranges enabled: [0.009871] 0-9 write-back [0.009873] A-B uncachable [0.009875] C-C7FFF write-protect [0.009878] C8000-E uncachable [0.009879] F-F write-protect [0.009881] MTRR variable ranges enabled: [0.009885] 0 base 0 mask FC000 write-back [0.009886] 1 disabled [0.009887] 2 disabled [0.009888] 3 disabled [0.009889] 4 disabled [0.009890] 5 disabled [0.009893] 6 base 0E000 mask FFC00 write-combining [0.009895] 7 base 0E000 mask FFC00 write-combining [0.010289] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- UC [0.032447] initial memory mapped: [mem 0x-0x11bf] [0.032502] BRK [0x11831000, 0x11831fff] PGTABLE [0.032536] ACPI: Early table checksum verification disabled [0.033564] ACPI BIOS Error (bug): A valid RSDP was not found (20190215/tbxfroot-210) [0.033571] 140MB HIGHMEM available. [0.033575] 883MB LOWMEM available. 
[0.033578] mapped low ram: 0 - 373fe000 [0.033581] low ram: 0 - 373fe000 [0.033585] BRK [0x11832000, 0x11832fff] PGTABLE [0.038164] Zone ranges: [0.038177] DMA [mem 0x1000-0x00ff] [0.038181] Normal [mem 0x0100-0x373fdfff] [0.038185] HighMem [mem 0x373fe000-0x3fff] [0.038188] Movable zone start for each node [0.038190] Early memory node ranges [0.038193] node 0: [mem 0x1000-0x0009efff] [0.038196] node 0: [mem 0x0010-0x3fff] [0.038206] Zeroed struct page in unavailable ranges: 98 pages [0.038210] Initmem setup node 0 [
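The UBSAN message in this report ("shift exponent 32 is too large for 32-bit type 'long unsigned int'") is about a C rule: shifting a value by its full width or more is undefined behaviour. A minimal sketch of a shift that is defined for every count follows. The helper shl_checked is hypothetical and is not the fix applied in mm/compaction.c; returning 0 for an oversized count is an assumption that may not match what the calling code needs.

```c
#include <assert.h>
#include <limits.h>

/* Width of unsigned long in bits on this build (32 on i386, 64 on x86-64). */
#define ULONG_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Left shift that is defined for every count: a count equal to or larger
 * than the type width yields 0 instead of invoking undefined behaviour.
 */
static unsigned long shl_checked(unsigned long value, unsigned int count)
{
    if (count >= ULONG_BITS)
        return 0UL;
    return value << count;
}
```

On a 32-bit kernel a count of 32 takes the guarded branch, which is exactly the case UBSAN flagged.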
Re: CONFIG_DEBUG_VIRTUAL breaks boot on x86-32
You might be hitting a bug I found. Try applying this patch: https://marc.info/?l=linux-kernel&m=155355953012985&w=2 Unfortunately it did not change anything. -- Meelis Roos
Re: CONFIG_DEBUG_VIRTUAL breaks boot on x86-32
13.104639] [drm] radeon: 1 quad pipes, 1 Z pipes initialized [ 13.105883] radeon :01:00.0: WB disabled [ 13.105921] radeon :01:00.0: fence driver on ring 0 use gpu addr 0xe000 and cpu addr 0xc1c2fb20 [ 13.105942] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 13.105948] [drm] Driver supports precise vblank timestamp query. [ 13.106006] [drm] radeon: irq initialized. [ 13.106061] [drm] Loading R300 Microcode [ 13.145700] Registered IR keymap rc-hauppauge [ 13.146003] rc rc0: Hauppauge as /devices/pci:00/:00:0a.2/i2c-1/1-0018/rc/rc0 [ 13.146230] input: Hauppauge as /devices/pci:00/:00:0a.2/i2c-1/1-0018/rc/rc0/input6 [ 13.152323] rc rc0: lirc_dev: driver ir_kbd_i2c registered at minor = 0, scancode receiver, no transmitter [ 13.205146] cx88_blackbird: cx2388x blackbird driver version 1.0.0 loaded [ 13.205174] cx8802: registering cx8802 driver, type: blackbird access: shared [ 13.205183] cx8802: subsystem: 107d:663c, board: Leadtek PVR 2000 [card=9] [ 13.205538] cx88_blackbird: cx23416 based mpeg encoder (blackbird reference design) [ 13.205767] cx88_blackbird: blackbird_mbox_func: blackbird:Firmware and/or mailbox pointer not initialized or corrupted [ 15.612593] cx88_blackbird: blackbird_load_firmware: blackbird:Firmware upload successful. 
[ 15.630492] [drm] radeon: ring at 0xE0001000 [ 15.630545] [drm] ring test succeeded in 0 usecs [ 15.630875] [drm] ib test succeeded in 0 usecs [ 15.632854] [drm] Radeon Display Connectors [ 15.632867] [drm] Connector 0: [ 15.632872] [drm] VGA-1 [ 15.632877] [drm] DDC: 0x60 0x60 0x60 0x60 0x60 0x60 0x60 0x60 [ 15.632883] [drm] Encoders: [ 15.632887] [drm] CRT1: INTERNAL_DAC1 [ 15.632892] [drm] Connector 1: [ 15.632896] [drm] DVI-I-1 [ 15.632900] [drm] HPD1 [ 15.632905] [drm] DDC: 0x64 0x64 0x64 0x64 0x64 0x64 0x64 0x64 [ 15.632910] [drm] Encoders: [ 15.632913] [drm] CRT2: INTERNAL_DAC2 [ 15.632918] [drm] DFP1: INTERNAL_TMDS1 [ 15.632922] [drm] Connector 2: [ 15.632925] [drm] SVIDEO-1 [ 15.632929] [drm] Encoders: [ 15.632933] [drm] TV1: INTERNAL_DAC2 [ 15.749890] [drm] fb mappable at 0xC004 [ 15.749914] [drm] vram apper at 0xC000 [ 15.749919] [drm] size 5242880 [ 15.749923] [drm] fb depth is 24 [ 15.749927] [drm]pitch is 5120 [ 15.752277] fbcon: radeondrmfb (fb0) is primary device [ 15.803402] Console: switching to colour frame buffer device 160x64 [ 15.930197] radeon :01:00.0: fb0: radeondrmfb frame buffer device [ 15.930273] [drm] Initialized radeon 2.50.0 20080528 for :01:00.0 on minor 0 [ 16.272511] cx88_blackbird: blackbird_initialize_codec: blackbird:Firmware version is 0x02060039 [ 16.284001] cx88_blackbird: registered device video1 [mpeg] [ 16.287894] modprobe (155) used greatest stack depth: 5496 bytes left [ 16.803253] Adding 2096124k swap on /dev/sda5. Priority:-2 extents:1 across:2096124k [ 20.717229] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s18: link becomes ready [ 21.027559] systemd-udevd (100) used greatest stack depth: 4416 bytes left -- Meelis Roos
CONFIG_DEBUG_VIRTUAL breaks boot on x86-32
I tried to debug another problem and turned on most debug options for memory. The resulting kernel failed to boot. Bisecting the configurations led to CONFIG_DEBUG_VIRTUAL - if I turned it on in addition to some other debug options, the machine crashed with kernel BUG at arch/x86/mm/physaddr.c:79! Screenshot at http://kodu.ut.ee/~mroos/debug_virtual-boot-hang-1.jpg The machine was Athlon XP with VIA KT600 chipset and 2G RAM. -- Meelis Roos
Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28
First, I found out that both the problematic alphas had memory compaction and page migration and bounce buffers turned on, and working alphas had them off. Next, turning off these options makes the problematic alphas work. OK, thanks for testing! Can you narrow down whether the problem is due to CONFIG_BOUNCE or CONFIG_MIGRATION + CONFIG_COMPACTION? These are two completely different things so knowing where to look will help. Thanks! Tested both. Just CONFIG_MIGRATION + CONFIG_COMPACTION breaks the alpha. Just CONFIG_BOUNCE has no effect in 5 tries. -- Meelis Roos
Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28
Could https://lore.kernel.org/linux-mm/20190219123212.29838-1-lar...@axis.com/T/#u be relevant? Tried it, still broken. I wrote: But my kernel config had memory compaction (that turned on page migration) and bounce buffers. I do not remember why I found them necessary but I will try without them. First, I found out that both the problematic alphas had memory compaction and page migration and bounce buffers turned on, and working alphas had them off. Next, turning off these options makes the problematic alphas work. -- Meelis Roos
Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28
Thanks for the information. Yeah, that makes somewhat more sense. Can you ever see the failure if you disable CONFIG_TRANSPARENT_HUGEPAGE? HAVE_ARCH_TRANSPARENT_HUGEPAGE [=n] Seems there is no THP on alpha. Because your findings still seem to indicate that there's some problem with page migration and Alpha (added MM list to CC). But my kernel config had memory compaction (that turned on page migration) and bounce buffers. I do not remember why I found them necessary but I will try without them. -- Meelis Roos
Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28
The result of the bisection is [88dbcbb3a4847f5e6dfeae952d3105497700c128] blkdev: avoid migration stalls for blkdev pages Is that result relevant for the problem or should I continue bisecting between 4.20.0 and the so far first bad commit? Can you try reverting the commit and see if it makes the problem go away? Tried reverting it on top of 5.0.0-rc6-00153-g5ded5871030e and it seems to make the kernel work - emerge --sync succeeded. There is more to it. After running 5.0.0-rc6-00153-g5ded5871030e-dirty (with the revert of that patch) successfully for a Gentoo update, I upgraded the kernel to 5.0.0-rc7-00011-gb5372fe5dc84-dirty (today's git + revert of this patch) and it broke on rsync again: RepoStorageException: command exited with status -6: rsync -a --link-dest /usr/portage --exclude=/distfiles --exclude=/local --exclude=/lost+found --exclude=/packages --exclude /.tmp-unverified-download-quarantine /usr/portage/ /usr/portage/.tmp-unverified-download-quarantine/ Nothing in dmesg. This means the real root cause is somewhere deeper and reverting this commit just made it less likely to happen. -- Meelis Roos
Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28
Hum, weird. I have a hard time understanding how that change could be causing fs corruption on Alpha but OTOH it is not completely unthinkable. With this commit we may migrate some block device pages we were not able to migrate previously and that could be causing some unexpected issue. I'll look into this. To make things more interesting, it does not happen on any alpha but only one subarch so far: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1889207.html is my original bug report. -- Meelis Roos
Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28
The result of the bisection is [88dbcbb3a4847f5e6dfeae952d3105497700c128] blkdev: avoid migration stalls for blkdev pages Is that result relevant for the problem or should I continue bisecting between 4.20.0 and the so far first bad commit? Can you try reverting the commit and see if it makes the problem go away? Tried reverting it on top of 5.0.0-rc6-00153-g5ded5871030e and it seems to make the kernel work - emerge --sync succeeded. Unfinished further bisection has also not yielded any other bad revisions so far. -- Meelis Roos
Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28
I have noticed ext4 filesystem corruption on two of my test alphas with 4.20.0-09062-gd8372ba8ce28. Retried it, still happens with 5.0.0-rc5-00358-gdf3865f8f568 - rsync of emerge --sync just fails with nothing in dmesg. Finished second round of bisecting, first round did not get me far enough so I may still have false "goods" in my bisection history. The command I used for bisecting was Gentoo's emerge --sync, which sometimes failed with error -6 or -11 from rsync. Usually the file system corruption did not happen and nothing was in dmesg, just file IO error from rsync. The result of the bisection is [88dbcbb3a4847f5e6dfeae952d3105497700c128] blkdev: avoid migration stalls for blkdev pages Is that result relevant for the problem or should I continue bisecting between 4.20.0 and the so far first bad commit? On AlphaServer DS10: [10749.664418] EXT4-fs error (device sda2): __ext4_iget:5052: inode #1853093: block 1: comm rsync: invalid block On AlphaServer DS10L: [ 5325.064656] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode #1191951: block 4731728: comm rm: bad entry in directory: directory entry overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096 [ 5325.069539] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode #1191951: block 4731728: comm rm: bad entry in directory: directory entry overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096 [ 5325.077351] EXT4-fs error (device sda2): ext4_empty_dir:2718: inode #1191951: block 4731728: comm rm: bad entry in directory: directory entry overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096 Two other alphas, PC-164 and Eiger, worked fine with the same kernel version (different kernel configs according to hardware). The details: 4.20 worked fine, with gentoo emerge package update after bootup. Next, 4.20.0-06428-g00c569b567c7 worked fine, with gentoo emerge after bootup. 
Next, 4.20.0-09062-gd8372ba8ce28 booted up fine but rsync and rm during start of gentoo emerge errored out like above. So the corruption _might_ have happened during bootup of previous kernel but it looks more likely that only the latest kernel with blk-mq introduced the problems. mq-deadline is in use on all the alphas. DS10 has Symbios 53C896 SCSI (sym2 driver), DS10L has QLogic ISP1040, so they are different. Working Eiger and PC164 have sym2 based scsi controllers too. -- Meelis Roos
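The remark about possible false "goods" in the bisection history can be made concrete with a little arithmetic. If a bad kernel fails a single test run with probability p, then n clean runs in a row still happen with probability (1 - p)^n, and that is the chance of wrongly marking a bad commit as good. The sketch below computes how many clean runs are needed to push that chance under a chosen limit. The function name clean_runs_needed is hypothetical, the failure rate has to be estimated by the tester (the mails do not state one), and the model assumes runs are independent.

```c
#include <assert.h>

/*
 * Smallest number n of consecutive clean test runs such that
 * (1 - p_fail)^n <= max_false_good, i.e. the chance of marking a bad
 * commit "good" stays at or below max_false_good.
 * p_fail is the assumed per-run failure probability on a bad kernel.
 * Returns -1 if the arguments are out of range.
 */
static int clean_runs_needed(double p_fail, double max_false_good)
{
    double miss = 1.0;  /* probability that all runs so far were clean */
    int n = 0;

    if (p_fail <= 0.0 || p_fail > 1.0 ||
        max_false_good <= 0.0 || max_false_good >= 1.0)
        return -1;

    while (miss > max_false_good) {
        miss *= 1.0 - p_fail;
        n++;
    }

    assert(n >= 1);  /* the loop always runs at least once here */
    return n;
}
```

For example, with an assumed failure rate of one run in two, five clean runs are needed before the chance of a false "good" drops below 5%; with rarer failures the count grows quickly, which is why bisecting intermittent bugs takes many tries per step.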
Undefined behaviour in drivers/gpu/drm/radeon/r200.c:480:34 - shift exponent 4096 is too large
Got UBSAN warning from Dell D600 running 5.0.0-rc4-00218-g12491ed354d2. The warning did not happen on bootup but during xfce session start or console switch. [ 15.323113] radeon :01:00.0: putting AGP V2 device into 4x mode [ 15.323134] radeon :01:00.0: GTT: 128M 0xE000 - 0xE7FF [ 15.323142] radeon :01:00.0: VRAM: 128M 0xE800 - 0xEFFF (32M used) [ 15.323459] [drm] Detected VRAM RAM=128M, BAR=128M [ 15.323463] [drm] RAM width 64bits DDR [ 15.323566] [TTM] Zone kernel: Available graphics memory: 412446 kiB [ 15.323567] [TTM] Initializing pool allocator [ 15.323580] [TTM] Initializing DMA pool allocator [ 15.323609] [drm] radeon: 32M of VRAM memory ready [ 15.323611] [drm] radeon: 128M of GTT memory ready. [ 15.323621] [drm] radeon: power management initialized [ 15.331289] radeon :01:00.0: WB disabled [ 15.331296] radeon :01:00.0: fence driver on ring 0 use gpu addr 0xe000 and cpu addr 0x712386dd [ 15.331299] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 15.331300] [drm] Driver supports precise vblank timestamp query. [ 15.331315] [drm] radeon: irq initialized. [ 15.331317] [drm] Loading R200 Microcode [...] 
[ 15.795041] [drm] radeon: ring at 0xE0001000 [ 15.795073] [drm] ring test succeeded in 1 usecs [ 15.795316] [drm] ib test succeeded in 0 usecs [ 15.801857] [drm] Panel ID String: 2K077141X13 [ 15.801861] [drm] Panel Size 1024x768 [ 15.801938] [drm] No TV DAC info found in BIOS [ 15.802012] [drm] Radeon Display Connectors [ 15.802015] [drm] Connector 0: [ 15.802017] [drm] VGA-1 [ 15.802023] [drm] DDC: 0x60 0x60 0x60 0x60 0x60 0x60 0x60 0x60 [ 15.802024] [drm] Encoders: [ 15.802027] [drm] CRT1: INTERNAL_DAC1 [ 15.802030] [drm] Connector 1: [ 15.802031] [drm] DVI-D-1 [ 15.802033] [drm] HPD1 [ 15.802038] [drm] DDC: 0x64 0x64 0x64 0x64 0x64 0x64 0x64 0x64 [ 15.802040] [drm] Encoders: [ 15.802042] [drm] DFP1: INTERNAL_TMDS1 [ 15.802044] [drm] Connector 2: [ 15.802046] [drm] LVDS-1 [ 15.802047] [drm] Encoders: [ 15.802049] [drm] LCD1: INTERNAL_LVDS [ 15.802051] [drm] Connector 3: [ 15.802053] [drm] SVIDEO-1 [ 15.802054] [drm] Encoders: [ 15.802056] [drm] TV1: INTERNAL_DAC2 [ 15.845987] [drm] fb mappable at 0xE804 [ 15.845988] [drm] vram apper at 0xE800 [ 15.845989] [drm] size 1572864 [ 15.845990] [drm] fb depth is 16 [ 15.845990] [drm]pitch is 2048 [ 15.848183] fbcon: radeondrmfb (fb0) is primary device [ 15.892233] Console: switching to colour frame buffer device 128x48 [ 15.901408] radeon :01:00.0: fb0: radeondrmfb frame buffer device [ 15.905786] [drm] Initialized radeon 2.50.0 20080528 for :01:00.0 on minor 0 [...] [ 447.146334] [ 447.146347] UBSAN: Undefined behaviour in drivers/gpu/drm/radeon/r200.c:480:34 [ 447.146351] shift exponent 4096 is too large for 32-bit type 'int' [ 447.146357] CPU: 0 PID: 386 Comm: Xorg Not tainted 5.0.0-rc4-00218-g12491ed354d2 #7 [ 447.146358] Hardware name: Dell Computer Corporation Latitude D600 /0X2034, BIOS A16 06/29/2005 [ 447.146359] Call Trace: [ 447.146375] dump_stack+0x16/0x19 [ 447.146379] ubsan_epilogue+0xb/0x29 [ 447.146381] __ubsan_handle_shift_out_of_bounds.cold.14+0x26/0x80 [ 447.146486] ? 
radeon_cs_packet_next_reloc+0x3c/0x150 [radeon] [ 447.146521] ? r100_reloc_pitch_offset+0x27/0x150 [radeon] [ 447.146551] r200_packet0_check.cold.0+0xf/0x45 [radeon] [ 447.146592] ? r200_copy_dma+0x430/0x430 [radeon] [ 447.146626] r100_cs_parse_packet0+0x53/0xe0 [radeon] [ 447.146661] r100_cs_parse+0x12e/0x440 [radeon] [ 447.146700] ? r200_copy_dma+0x430/0x430 [radeon] [ 447.146734] radeon_cs_ioctl+0x256/0x890 [radeon] [ 447.146743] ? ttm_bo_init_reserved+0x338/0x390 [ttm] [ 447.146779] ? radeon_cs_parser_init+0x550/0x550 [radeon] [ 447.146804] drm_ioctl_kernel+0x96/0xe0 [drm] [ 447.146816] drm_ioctl+0x25f/0x530 [drm] [ 447.146850] ? radeon_cs_parser_init+0x550/0x550 [radeon] [ 447.146855] ? ktime_get_mono_fast_ns+0xb6/0x1f0 [ 447.146880] radeon_drm_ioctl+0x40/0x80 [radeon] [ 447.146905] ? radeon_pci_shutdown+0x30/0x30 [radeon] [ 447.146909] do_vfs_ioctl+0x90/0x6c0 [ 447.146913] ? handle_mm_fault+0xa48/0xfe0 [ 447.146918] ? vm_mmap_pgoff+0x88/0xd0 [ 447.146923] ? ktime_get_ts64+0x5f/0x1e0 [ 447.146925] ksys_ioctl+0x39/0x70 [ 447.146927] sys_ioctl+0x11/0x13 [ 447.146930] do_fast_syscall_32+0x95/0x1d0 [ 447.146934] entry_SYSENTER_32+0x6b/0xbd [ 447.146936] EIP: 0xb7f937cd [ 447.146939] Code: 54 cd ff ff 85 d2 8b 98 58 cd ff ff 89 c8 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76 [ 447.146941] EAX: ffda EBX: 000e ECX: c0206466 EDX: 02311c40 [ 447.1469
Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28
02.01.19 17:52 I wrote: I have noticed ext4 filesystem corruption on two of my test alphas with 4.20.0-09062-gd8372ba8ce28. Retried it, still happens with 5.0.0-rc5-00358-gdf3865f8f568 - rsync of emerge --sync just fail with nothing in dmesg. On AlphaServer DS10: [10749.664418] EXT4-fs error (device sda2): __ext4_iget:5052: inode #1853093: block 1: comm rsync: invalid block On AlphaServer DS10L: [ 5325.064656] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode #1191951: block 4731728: comm rm: bad entry in directory: directory entry overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096 [ 5325.069539] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode #1191951: block 4731728: comm rm: bad entry in directory: directory entry overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096 [ 5325.077351] EXT4-fs error (device sda2): ext4_empty_dir:2718: inode #1191951: block 4731728: comm rm: bad entry in directory: directory entry overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096 Two other alphas, PC-164 and Eiger, worked fine with the same kernel version (different kernel configs according to hardware). The details: 4.20 worked fine, with gentoo emerge package update after bootup. Next, 4.20.0-06428-g00c569b567c7 worked fine, with gentoo emerge after bootup. Next, 4.20.0-09062-gd8372ba8ce28 booted up fine but rsync and rm during start of gentoo emerge errored out like above. So the corruption _might_ have happened during bootup of previous kernel but it looks more likely that only the latest kernel with blk-mq introduced the problems. mq-deadline is in use on all the alphas. DS10 has Symbios 53C896 SCSI (sym2 driver), DS10L has QLogic ISP1040, so they are different. Working Eiger and PC164 have sym2 based scsi controllers too. -- Meelis Roos
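For reference, the "directory entry overrun" errors above can be checked by hand from the numbers in the log. The following is only a minimal sketch of the sanity rules as I understand them, not the kernel's actual code: the function name check_dirent and the exact set of checks are assumptions made for illustration.

```python
def check_dirent(offset, rec_len, name_len, block_size):
    """Return a list of problems for one directory entry (simplified sketch)."""
    problems = []
    # Assumed minimum record: 8-byte header plus the name, rounded up to 4 bytes.
    min_len = (name_len + 8 + 3) & ~3
    if rec_len % 4 != 0:
        problems.append("rec_len is not a multiple of 4")
    if rec_len < min_len:
        problems.append("rec_len is too small for name_len")
    # The entry must end inside the directory block it starts in.
    if offset + rec_len > block_size:
        problems.append("directory entry overrun")
    return problems


# Values copied from the DS10L error message above.
print(check_dirent(offset=76, rec_len=61816, name_len=35, block_size=4096))
```

With the logged values the entry would end at byte 76 + 61816 = 61892, far past the 4096-byte block, so only the overrun rule fires. That fits a garbled rec_len field rather than a garbled name.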
Re: bisected: ttyS panic on pa-risc
The patch below was just applied to my tree, hopefully it fixes this issue. Yes, it cures both the HP A500 (parisc) and the HP RX2620 (ia64) that I also found broken in the meantime. -- Meelis Roos
bisected: ttyS panic on pa-risc
My HP 9000 A500 (pa-risc architecture) paniced in 5.0-rc1. It happened after printing dmesg lines about ttyS and before moving on to scsi printk-s. I bisected it and the panic symptoms changed during that (some had backtrace, some had just panic). This is one of the crashes I got: Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled serial :00:04.0: enabling device (0146 -> 0147) printk: console [ttyS0] disabled :00:04.0: ttyS0 at MMIO 0xf800 (irq = 21, base_baud = 115200) is a 16550A printk: console [ttyS0] enabled printk: console [ttyS0] enabled printk: bootconsole [ttyB0] disabled printk: bootconsole [ttyB0] disabled :00:04.0: ttyS1 at MMIO 0xf808 (irq = 21, base_baud = 115200) is a 16550A :00:04.0: ttyS2 at MMIO 0xf810 (irq = 21, base_baud = 115200) is a 16550A serial :00:05.0: enabling device (0140 -> 0143) :00:05.0: ttyS3 at MMIO 0xf8005000 (irq = 22, base_baud = 115200) is a 16550A Backtrace: [<40502268>] pciserial_init_ports+0x128/0x240 [<405040b8>] pciserial_init_one+0x1e0/0x2f0 [<404b2b8c>] pci_device_probe+0xfc/0x180 [<40513958>] really_probe+0x268/0x3d0 [<40513d28>] driver_probe_device+0xf8/0x100 [<40513e54>] __driver_attach+0x124/0x130 [<40510dc4>] bus_for_each_dev+0x9c/0xe8 [<40513040>] driver_attach+0x28/0x38 [<405128c0>] bus_a Normal dmesg excerpt from working kernel before the problem: [6.746131] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [6.771772] serial :00:04.0: enabling device (0146 -> 0147) [6.792657] printk: console [ttyS0] disabled [6.829825] :00:04.0: ttyS0 at MMIO 0xf800 (irq = 21, base_baud = 115200) is a 16550A [6.837151] printk: console [ttyS0] enabled [6.877768] printk: bootconsole [ttyB0] disabled [6.904352] :00:04.0: ttyS1 at MMIO 0xf808 (irq = 21, base_baud = 115200) is a 16550A [6.961051] :00:04.0: ttyS2 at MMIO 0xf810 (irq = 21, base_baud = 115200) is a 16550A [6.969881] serial :00:05.0: enabling device ( -> 0003) [7.004160] serial :00:05.0: enabling SERR and PARITY (0003 -> 0143) [7.030298] :00:05.0: ttyS3 at 
MMIO 0xf8005000 (irq = 22, base_baud = 115200) is a 16550A [7.041663] serial :00:05.0: Couldn't register serial port 0, irq 22, type 2, error -28 [7.145456] sym53c8xx :00:01.0: enabling device ( -> 0003) Bisection leads to this commit: 6d7f677a2afa1c82d7fc7af7f9159cbffd5dc010 is the first bad commit commit 6d7f677a2afa1c82d7fc7af7f9159cbffd5dc010 Author: Darwin Dingel Date: Mon Dec 10 11:29:09 2018 +1300 serial: 8250: Rate limit serial port rx interrupts during input overruns When a serial port gets faulty or gets flooded with inputs, its interrupt handler starts to work double time to get the characters to the workqueue for the tty layer to handle them. When this busy time on the serial/tty subsystem happens during boot, where it is also busy on the userspace trying to initialise, some processes can continuously get preempted and will be on hold until the interrupts subside. The fix is to backoff on processing received characters for a specified amount of time when an input overrun is seen (received a new character before the previous one is processed). This only stops receive and will continue to transmit characters to serial port. After the backoff period is done, it receive will be re-enabled. This is optional and will only be enabled by setting 'overrun-throttle-ms' in the dts. Signed-off-by: Darwin Dingel Signed-off-by: Greg Kroah-Hartman :04 04 4ea6cd68ededa0c9ffaa218668ffeb35557070a5 a011db1916fbf5cfdcfff836a81e4fb5ee737003 M drivers :04 04 b1b1dc977965eb2db6b2cc79939446a1cf2f684d 41322ab1c199f504cfcc5b2ca211b4638d41351c M include -- Meelis Roos
ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28
I have noticed ext4 filesystem corruption on two of my test alphas with 4.20.0-09062-gd8372ba8ce28. On AlphaServer DS10: [10749.664418] EXT4-fs error (device sda2): __ext4_iget:5052: inode #1853093: block 1: comm rsync: invalid block On AlphaServer DS10L: [ 5325.064656] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode #1191951: block 4731728: comm rm: bad entry in directory: directory entry overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096 [ 5325.069539] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode #1191951: block 4731728: comm rm: bad entry in directory: directory entry overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096 [ 5325.077351] EXT4-fs error (device sda2): ext4_empty_dir:2718: inode #1191951: block 4731728: comm rm: bad entry in directory: directory entry overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096 Two other alphas, PC-164 and Eiger, worked fine with the same kernel version (different kernel configs according to hardware). The details: 4.20 worked fine, with gentoo emerge package update after bootup. Next, 4.20.0-06428-g00c569b567c7 worked fine, with gentoo emerge after bootup. Next, 4.20.0-09062-gd8372ba8ce28 booted up fine but rsync and rm during start of gentoo emerge errored out like above. So the corruption _might_ have happened during bootup of previous kernel but it looks more likely that only the latest kernel with blk-mq introduced the problems. mq-deadline is in use on all the alphas. DS10 has Symbios 53C896 SCSI (sym2 driver), DS10L has QLogic ISP1040, so they are different. Working Eiger and PC164 have sym2 based scsi controllers too. 
Full dmesg of DS10: [0.00] Linux version 4.20.0-09062-gd8372ba8ce28 (mroos@ds10) (gcc version 7.3.0 (Gentoo 7.3.0-r3 p1.4)) #92 Sun Dec 30 01:29:49 EET 2018 [0.00] Booting GENERIC on Tsunami variation Webbrick using machine vector Webbrick from SRM [0.00] Major Options: LEGACY_START VERBOSE_MCHECK MAGIC_SYSRQ [0.00] Command line: root=/dev/sda2 console=ttyS0 [0.00] memcluster 0, usage 1, start0, end 256 [0.00] memcluster 1, usage 0, start 256, end65443 [0.00] memcluster 2, usage 1, start65443, end65536 [0.00] 2048K Bcache detected; load hit latency 20 cycles, load miss latency 95 cycles [0.00] On node 0 totalpages: 65443 [0.00] DMA zone: 448 pages used for memmap [0.00] DMA zone: 0 pages reserved [0.00] DMA zone: 65443 pages, LIFO batch:15 [0.00] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768 [0.00] pcpu-alloc: [0] 0 [0.00] Built 1 zonelists, mobility grouping on. Total pages: 64995 [0.00] Kernel command line: root=/dev/sda2 console=ttyS0 [0.00] Dentry cache hash table entries: 65536 (order: 6, 524288 bytes) [0.00] Inode-cache hash table entries: 32768 (order: 5, 262144 bytes) [0.00] Sorting __ex_table... [0.00] Memory: 508584K/523544K available (5571K kernel code, 413K rwdata, 1456K rodata, 256K init, 206K bss, 14960K reserved, 0K cma-reserved) [0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 [0.00] NR_IRQS: 128 [0.00] HWRPB cycle frequency bogus. Estimated 462413354 Hz [0.00] clocksource: rpcc: mask: 0x max_cycles: 0x, max_idle_ns: 4133229351 ns [0.002929] Console: colour VGA+ 80x25 [0.021484] printk: console [ttyS0] enabled [0.022460] Calibrating delay loop... 
916.72 BogoMIPS (lpj=447488) [0.032226] pid_max: default: 32768 minimum: 301 [0.033203] Mount-cache hash table entries: 1024 (order: 0, 8192 bytes) [0.034179] Mountpoint-cache hash table entries: 1024 (order: 0, 8192 bytes) [0.038085] devtmpfs: initialized [0.040039] random: get_random_u32 called from bucket_table_alloc.isra.17+0xc4/0x290 with crng_init=0 [0.041015] clocksource: jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 1866466235866741 ns [0.041992] futex hash table entries: 256 (order: -1, 6144 bytes) [0.043945] NET: Registered protocol family 16 [0.045898] EISA bus registered [0.047851] random: get_random_bytes called from kcmp_cookies_init+0x2c/0x74 with crng_init=0 [0.048828] PCI host bridge to bus :00 [0.050781] pci_bus :00: root bus resource [io 0x-0x1ff] [0.052734] pci_bus :00: root bus resource [mem 0x-0x3fff] [0.053710] pci_bus :00: No busn resource found for root bus, will use [bus 00-ff] [0.054687] pci :00:01.0: [10b9:5237] type 00 class 0x0c0310 [0.054687] pci :00:01.0: reg 0x10: [mem 0x020b4000-0x020b4fff] [0.054687] pci :00:07.0: [10b9:1533] type 00 class 0x060100 [0.055664] pci :00:09.0: [1011:0019] type 00 class 0x02 [0.055664] pci :00:09.0: reg 0x10: [io 0x1200-0x127f] [0.055664] pci :00:09.0: reg 0x14: [mem 0x
Re: [PATCH v2] x86/build: fix compiler support check for CONFIG_RETPOLINE
05.12.18 08:27 Masahiro Yamada wrote: The easiest fix is to move this check to the "archprepare" like commit 829fe4aa9ac1 ("x86: Allow generating user-space headers without a compiler") did. Link: https://lkml.org/lkml/2018/12/4/206 Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support") Reported-by: Meelis Roos Signed-off-by: Masahiro Yamada --- Changes in v2: - Revive ifdef CONFIG_RETPOLINE surrounding the KBUILD_CFLAGS addition - Rephrase the commit log a bit, hoping the cause of the issue will be clearer Works for me - first it did scripts/kconfig/conf --syncconfig Kconfig and then started compiling. The #define is gone from include/linux. Thank you! -- Meelis Roos
Compiling with old gcc breaks when CONFIG_RETPOLINE is off
Just tried 4.20-rc5 on an old K6-2 PC with gcc 5.3.1, got an error about non-retpoline compiler, turned CONFIG_RETPOLINE off and retried. To my surprise, compilation still breaks with arch/x86/Makefile:224: *** You are building kernel with non-retpoline compiler, please update your compiler.. Stop. As I read the Makefile, it should error only when CONFIG_RETPOLINE is enabled, but it still breaks. $ grep -r CONFIG_RETPOLINE .config # CONFIG_RETPOLINE is not set $ grep -r CONFIG_RETPOLINE include/ include/generated/autoconf.h:#define CONFIG_RETPOLINE 1 include/config/auto.conf:CONFIG_RETPOLINE=y So the headers have not been updated yet, maybe? -- Meelis Roos
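The mismatch shown by the two grep commands above (symbol unset in .config but still set in the generated files) can also be detected mechanically. This is a minimal sketch under the assumption that both files use the usual "CONFIG_FOO=value" and "# CONFIG_FOO is not set" line formats; the helper names parse_kconfig and stale_symbols are made up for this example and are not part of kbuild.

```python
import re


def parse_kconfig(text):
    """Map CONFIG_* symbol -> value; 'is not set' lines map to None."""
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            symbols[unset.group(1)] = None
            continue
        assigned = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if assigned:
            symbols[assigned.group(1)] = assigned.group(2)
    return symbols


def stale_symbols(dot_config_text, auto_conf_text):
    """Return symbols whose value differs between .config and auto.conf."""
    wanted = parse_kconfig(dot_config_text)
    generated = parse_kconfig(auto_conf_text)
    names = set(wanted) | set(generated)
    # A symbol missing from one file is treated the same as "not set".
    return sorted(n for n in names if wanted.get(n) != generated.get(n))


# The two states reported above.
dot_config = "# CONFIG_RETPOLINE is not set\n"
auto_conf = "CONFIG_RETPOLINE=y\n"
print(stale_symbols(dot_config, auto_conf))
```

In a real tree the two strings would be read from .config and include/config/auto.conf. A non-empty result means the generated files lag behind .config, which is the state described in this report.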
insecure W+X mappings on HP DL365 G5
This HP DL365 G5 is the second old server where I see massive W+X mapped pages. Is it some BIOS defect? [0.714956] x86/mm: Found insecure W+X mapping at address 0x8ed98000 [0.715101] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:266 note_page+0x4c7/0x780 [0.715298] Modules linked in: [0.715421] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-11807-g310c7585e830 #9 [0.715612] Hardware name: HP ProLiant DL365 G5 , BIOS A10 05/02/2011 [0.715741] RIP: 0010:note_page+0x4c7/0x780 [0.715864] Code: fd 01 0f 87 1a 09 00 00 41 83 e5 01 0f 85 3f fc ff ff 49 8b 74 24 18 48 c7 c7 20 72 f2 bc c6 05 13 7f e9 00 01 e8 8a bf 00 00 <0f> 0b e9 20 fc ff ff 45 84 ed 0f 85 2b 08 00 00 4d 85 ff 0f 85 91 [0.716141] RSP: 0018:b262c0c5be10 EFLAGS: 00010282 [0.716265] RAX: RBX: 0161 RCX: bd06b778 [0.716393] RDX: 0001 RSI: 0082 RDI: bd4a972c [0.716511] RBP: R08: 02bb R09: bd4eb701 [0.716638] R10: 8ed9800bc240 R11: 00032084 R12: b262c0c5bec0 [0.716775] R13: R14: 0002 R15: [0.716903] FS: () GS:8edaaba0() knlGS: [0.717085] CS: 0010 DS: ES: CR0: 80050033 [0.717208] CR2: b262c0e24000 CR3: 9e60a000 CR4: 06f0 [0.717343] Call Trace: [0.717470] ? vprintk_emit+0x18a/0x1e0 [0.717592] ptdump_walk_pgd_level_core+0x352/0x410 [0.717720] ? rest_init+0x1/0xcc [0.717839] kernel_init+0x39/0x114 [0.717960] ? rest_init+0xcc/0xcc [0.718085] ret_from_fork+0x22/0x40 [0.718207] ---[ end trace 34c16f2bb7a914e2 ]--- [0.744838] x86/mm: Checked W+X mappings: FAILED, 2182367 W+X pages found. -- Meelis Roos
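To put the page count from the last dmesg line into perspective, here is a small back-of-the-envelope calculation. It assumes that the reported count is in units of 4 KiB base pages, which is my reading of the message and not something the log itself states.

```python
PAGE_SIZE = 4096  # assumption: the count is in 4 KiB base pages

wx_pages = 2182367  # from "Checked W+X mappings: FAILED" above
wx_bytes = wx_pages * PAGE_SIZE
wx_gib = wx_bytes / 2**30

print(f"{wx_bytes} bytes, about {wx_gib:.2f} GiB")
```

Under that assumption the W+X range covers roughly 8.3 GiB of mappings, so this is not a matter of a few stray pages.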
Re: HP DL585 warm boot fail (old)
Can you try the patch below? This is extracted from the code here: https://github.com/joyent/illumos-joyent/blob/b6a0b04d591f5b877cfe05f45e81f0e8a5cfc2b3/usr/src/uts/intel/io/pci/pci_boot.c#L1805 Thank you. Unfortunately it does not change anything noticable. Do you see the "disabling NMI on error" message?> Can you boot with "pci=earlydump vga=0xf07" and capture the output? Drop the "vga=0xf07" if it doesn't work or makes the screen unreadable. vga= modes did not work with any LCD available there, vga=6 worked with old CRT only. But I connected serial console and got full dmesg. There is no "disabling NMI on error" in the dmesg. This also caused(?) a working boot with the same kernel that failed before. Both 9600 and 115200 worked the same. dmesg from pci=earlydump from serial console: [0.00] Linux version 4.19.0-dirty (mroos@dl585) (gcc version 8.2.0 (Debian 8.2.0-4)) #97 SMP Wed Oct 24 17:36:06 EEST 2018 [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-dirty root=/dev/sda1 ro ignore_loglevel pci=earlydump console=ttyS0,115200 [0.00] x86/fpu: x87 FPU will use FXSAVE [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009f3ff] usable [0.00] BIOS-e820: [mem 0x0009f400-0x0009] reserved [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xf57f67ff] usable [0.00] BIOS-e820: [mem 0xf57f6800-0xf57f] ACPI data [0.00] BIOS-e820: [mem 0xfdc0-0xfdc00fff] reserved [0.00] BIOS-e820: [mem 0xfdc1-0xfdc10fff] reserved [0.00] BIOS-e820: [mem 0xfdc2-0xfdc20fff] reserved [0.00] BIOS-e820: [mem 0xfdc3-0xfdc30fff] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved [0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved [0.00] BIOS-e820: [mem 0xfec2-0xfec20fff] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfee0] reserved [0.00] BIOS-e820: [mem 0xff80-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x0003efff] usable [0.00] debug: ignoring loglevel setting. [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.3 present. 
[0.00] DMI: HP ProLiant DL585 G1, BIOS A01 02/14/2007 [0.00] tsc: Fast TSC calibration using PIT [0.00] tsc: Detected 2196.908 MHz processor [0.008307] e820: update [mem 0x-0x0fff] usable ==> reserved [0.008311] e820: remove [mem 0x000a-0x000f] usable [0.015723] AGP: No AGP bridge found [0.015858] last_pfn = 0x3f max_arch_pfn = 0x4 [0.015865] MTRR default type: write-back [0.015866] MTRR fixed ranges enabled: [0.015869] 0-9 write-back [0.015871] A-B uncachable [0.015873] C-F write-back [0.015874] MTRR variable ranges enabled: [0.015878] 0 base 00F580 mask 80 uncachable [0.015881] 1 base 00F600 mask FFFE00 uncachable [0.015883] 2 base 00F800 mask FFF800 uncachable [0.015884] 3 disabled [0.015885] 4 disabled [0.015886] 5 disabled [0.015887] 6 disabled [0.015888] 7 disabled [0.016523] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT [0.016767] last_pfn = 0xf57f6 max_arch_pfn = 0x4 [0.016863] Base memory trampoline at [(ptrval)] 99000 size 24576 [0.016878] BRK [0x2be401000, 0x2be401fff] PGTABLE [0.016887] BRK [0x2be402000, 0x2be402fff] PGTABLE [0.016891] BRK [0x2be403000, 0x2be403fff] PGTABLE [0.016964] BRK [0x2be404000, 0x2be404fff] PGTABLE [0.016971] BRK [0x2be405000, 0x2be405fff] PGTABLE [0.017198] BRK [0x2be406000, 0x2be406fff] PGTABLE [0.017209] BRK [0x2be407000, 0x2be407fff] PGTABLE [0.017219] BRK [0x2be408000, 0x2be408fff] PGTABLE [0.017284] BRK [0x2be409000, 0x2be409fff] PGTABLE [0.017521] BRK [0x2be40a000, 0x2be40afff] PGTABLE [0.017583] ACPI: Early table checksum verification disabled [0.018039] ACPI: RSDP 0x000F4F20 24 (v02 HP) [0.018046] ACPI: XSDT 0xF57F6C00 44 (v01 HP A01 0002 �? 162E) [0.018058] ACPI: FACP 0xF57F6C80 F4 (v03 HP A01 0002 �? 
162E) [0.018074] ACPI BIOS Warning (bug): Invalid length for FADT/Pm1aControlBlock: 32, using default 16 (20180810/tbfadt-674) [0.018079] ACPI BIOS Warning (bug): Invalid length for FADT/Pm1bControlBlock: 32, using default 16 (20180810/tbfadt-674) [0.018085] ACPI: DSDT 0xF57F6D80 0051D5 (v01 HP DSDT 0001 MSFT 0201) [0.018091] ACPI: FACS 0xF57F68C0 40 [0.018094] ACPI: FACS 0xF
Re: HP DL585 warm boot fail (old)
Can you try the patch below? This is extracted from the code here: https://github.com/joyent/illumos-joyent/blob/b6a0b04d591f5b877cfe05f45e81f0e8a5cfc2b3/usr/src/uts/intel/io/pci/pci_boot.c#L1805 Thank you. Unfortunately it does not change anything noticeable. I'm not sure why this would be only an intermittent problem, but at least we can see if this is related. It seems 4.19 and current git are 100% reproducers so far - I have not managed to successfully boot either of them yet. I have seen a 4.19-rc1 era git kernel boot at least once. Of the kernels I have in the grub menu, the Debian-packaged 4.17 with initramfs has worked fine in my tests so far. My self-compiled kernels do not use initramfs. -- Meelis Roos
Re: HP DL585 warm boot fail (old)
Would you mind opening a report at https://bugzilla.kernel.org? I'm not sure if anybody will be able to do anything about this, but it's always possible. Submitted now, https://bugzilla.kernel.org/show_bug.cgi?id=201503 A complete dmesg log and "sudo lspci -vv" output from a successful boot would be a good start. And if you have a screenshot of the failure, that would help, too. You can use the "ignore_loglevel" kernel parameter to make sure we see everything on the console. Added. Does this machine have an iLO? If so, it may have logs that could be useful if this is related to some sort of bus error. Nothing in the ILO logs. -- Meelis Roos
Re: 32-bit PTI with THP = userspace corruption
> 4) Disable PTI support on 2-level paging by making it dependent > on CONFIG_X86_PAE. This is, imho, the least ugly option > because the machines that do not support PAE are most likely > too old to be affected by Meltdown anyway. We might also > consider switching i386_defconfig to PAE? > > Any other thoughts? The machines where I have PAE off are the ones that have less memory. PAE is off just for performance reasons, not for lack of PAE support. PAE should be present on all of my affected machines anyway, and current distributions seem to mostly assume 686 and PAE for 32-bit systems. -- Meelis Roos (mr...@ut.ee) http://www.cs.ut.ee/~mroos/
rng_dev_read: Kernel memory exposure attempt detected from SLUB object 'kmalloc-64'
This is this weekend's 4.19.0-rc2-00246-gd7b686ebf704 on a ThinkPad T460s. There seems to be a usercopy warning from rng_dev_read (full dmesg below). [0.00] microcode: microcode updated early to revision 0xc6, date = 2018-04-17 [0.00] Linux version 4.19.0-rc2-00246-gd7b686ebf704 (mroos@t460s) (gcc version 8.2.0 (Debian 8.2.0-5)) #36 SMP Sat Sep 8 16:27:54 EEST 2018 [0.00] Command line: BOOT_IMAGE=/vmlinuz-4.19.0-rc2-00246-gd7b686ebf704 root=/dev/mapper/TP-ROOT ro [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' [0.00] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers' [0.00] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR' [0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [0.00] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64 [0.00] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64 [0.00] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format. 
[0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009cfff] usable [0.00] BIOS-e820: [mem 0x0009d000-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xb100afff] usable [0.00] BIOS-e820: [mem 0xb100b000-0xc3ed5fff] reserved [0.00] BIOS-e820: [mem 0xc3ed6000-0xc3ed6fff] ACPI NVS [0.00] BIOS-e820: [mem 0xc3ed7000-0xcff75fff] reserved [0.00] BIOS-e820: [mem 0xcff76000-0xcff77fff] ACPI NVS [0.00] BIOS-e820: [mem 0xcff78000-0xcff78fff] reserved [0.00] BIOS-e820: [mem 0xcff79000-0xcffc5fff] ACPI NVS [0.00] BIOS-e820: [mem 0xcffc6000-0xcfffdfff] ACPI data [0.00] BIOS-e820: [mem 0xcfffe000-0xd7ff] reserved [0.00] BIOS-e820: [mem 0xd860-0xdc7f] reserved [0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved [0.00] BIOS-e820: [mem 0xfd00-0xfe7f] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved [0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved [0.00] BIOS-e820: [mem 0xfed1-0xfed19fff] reserved [0.00] BIOS-e820: [mem 0xfed84000-0xfed84fff] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved [0.00] BIOS-e820: [mem 0xff80-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x0003227f] usable [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.8 present. 
[0.00] DMI: LENOVO 20F9003SMS/20F9003SMS, BIOS N1CET65W (1.33 ) 02/16/2018 [0.00] tsc: Detected 2400.000 MHz processor [0.002224] e820: update [mem 0x-0x0fff] usable ==> reserved [0.002226] e820: remove [mem 0x000a-0x000f] usable [0.002234] last_pfn = 0x322800 max_arch_pfn = 0x4 [0.002238] MTRR default type: write-back [0.002239] MTRR fixed ranges enabled: [0.002240] 0-9 write-back [0.002241] A-B uncachable [0.002242] C-F write-protect [0.002242] MTRR variable ranges enabled: [0.002244] 0 base 00E000 mask 7FE000 uncachable [0.002245] 1 base 00DC00 mask 7FFC00 uncachable [0.002246] 2 base 00DA00 mask 7FFE00 uncachable [0.002246] 3 disabled [0.002246] 4 disabled [0.002247] 5 disabled [0.002247] 6 disabled [0.002248] 7 disabled [0.002248] 8 disabled [0.002248] 9 disabled [0.003223] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT [0.003726] last_pfn = 0xb100b max_arch_pfn = 0x4 [0.011684] Scanning 1 areas for low memory corruption [0.011688] Base memory trampoline at [(ptrval)] 97000 size 24576 [0.011691] Using GB pages for direct mapping [0.011693] BRK [0x2422f6000, 0x2422f6fff] PGTABLE [0.011695] BRK [0x2422f7000, 0x2422f7fff] PGTABLE [0.011696] BRK [0x2422f8000, 0x2422f8fff] PGTABLE [0.011724] BRK [0x2422f9000, 0x2422f9fff] PGTABLE [0.011726] BRK [0x2422fa000, 0x2422fafff] PGTABLE [0.011888] BRK [0x2422fb000, 0x2422fbfff] PGTABLE [0.011917] BRK [0x2422fc000, 0x2422fcfff] PGTABLE [0.011986] RAMDISK: [mem 0x36a31000-0x3750] [0.011996] ACPI: Early table checksum verification disabled [0.012029] ACPI: RSDP 0x000F0120 24 (v02 LENOVO) [0.012033] ACPI: XSDT 0xCFFCF188 EC (v01 LENOVO TP-N1C PTE
4.19-rc1: usercopy warning from rng_dev_read()
Some time yesterday I have got this warning in dmesg. [55255.629421] usercopy: Kernel memory exposure attempt detected from SLUB object 'kmalloc-64' (offset 0, size 379)! [55255.629440] [ cut here ] [55255.629446] kernel BUG at mm/usercopy.c:102! [55255.629465] invalid opcode: [#1] SMP PTI [55255.629477] CPU: 3 PID: 1719 Comm: rngd Not tainted 4.19.0-rc1 #34 [55255.629483] Hardware name: LENOVO 20F9003SMS/20F9003SMS, BIOS N1CET65W (1.33 ) 02/16/2018 [55255.629499] RIP: 0010:usercopy_abort+0x6f/0x71 [55255.629508] Code: 0f 45 c6 48 c7 c2 2c 27 e0 bd 48 c7 c6 d5 53 df bd 51 48 0f 45 f2 48 89 f9 41 52 48 89 c2 48 c7 c7 f8 27 e0 bd e8 0e 3c ed ff <0f> 0b 49 89 e8 31 c9 44 89 e2 31 f6 48 c7 c7 60 27 e0 bd e8 79 ff [55255.629516] RSP: 0018:a2394078bdb0 EFLAGS: 00010246 [55255.629527] RAX: 0065 RBX: 8d2e5464afc0 RCX: 0006 [55255.629535] RDX: RSI: 0086 RDI: 8d2e56b95500 [55255.629541] RBP: 017b R08: bd5116c0 R09: 0065 [55255.629548] R10: be6902a0 R11: be67efad R12: 0001 [55255.629555] R13: 8d2e5464b13b R14: 017b R15: 017b [55255.629564] FS: 7fc22d165700() GS:8d2e56b8() knlGS: [55255.629572] CS: 0010 DS: ES: CR0: 80050033 [55255.629579] CR2: 1d6de2d36018 CR3: 000309eae004 CR4: 003606e0 [55255.629584] Call Trace: [55255.629605] __check_heap_object+0xd5/0x100 [55255.629615] __check_object_size+0xf5/0x17c [55255.629627] rng_dev_read+0x6e/0x270 [55255.629642] __vfs_read+0x31/0x170 [55255.629657] vfs_read+0x85/0x130 [55255.629670] ksys_read+0x4a/0xb0 [55255.629682] do_syscall_64+0x4a/0xf0 [55255.629696] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [55255.629706] RIP: 0033:0x7fc22d337394 [55255.629715] Code: 84 00 00 00 00 00 41 54 55 49 89 d4 53 48 89 f5 89 fb 48 83 ec 10 e8 8b fc ff ff 4c 89 e2 41 89 c0 48 89 ee 89 df 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 38 44 89 c7 48 89 44 24 08 e8 c7 fc ff ff 48 [55255.629723] RSP: 002b:7fc22d164e10 EFLAGS: 0246 ORIG_RAX: [55255.629733] RAX: ffda RBX: 0003 RCX: 7fc22d337394 [55255.629739] RDX: 09c4 RSI: 55a5a95d0b50 RDI: 0003 [55255.629746] RBP: 
55a5a95d0b50 R08: R09: 7fff68b5b080 [55255.629752] R10: 0001 R11: 0246 R12: 09c4 [55255.629759] R13: 7fff68ac3a9f R14: 7fff68ac3aa0 R15: [55255.629766] Modules linked in: tun ipt_MASQUERADE nf_conntrack_netlink iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc overlay fuse cpufreq_userspace bnep iwlmvm mac80211 snd_hda_codec_hdmi btusb btrtl btbcm btintel iwlwifi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_hda_intel snd_hda_codec joydev pcspkr videobuf2_common cfg80211 iTCO_wdt snd_hwdep iTCO_vendor_support snd_hda_core cdc_mbim cdc_acm cdc_wdm videodev cdc_ncm usbnet mii media bluetooth ecdh_generic mei_me mei intel_pch_thermal thinkpad_acpi tpm_crb tpm_tis tpm_tis_core pcc_cpufreq tpm ip_tables dm_crypt dm_mod dax [55255.629913] hid_generic rtsx_pci_sdmmc mmc_core crct10dif_pclmul e1000e i2c_i801 rtsx_pci mfd_core [55255.629987] ---[ end trace 26cd21a5b2d7ec20 ]--- [55255.630022] RIP: 0010:usercopy_abort+0x6f/0x71 [55255.630046] Code: 0f 45 c6 48 c7 c2 2c 27 e0 bd 48 c7 c6 d5 53 df bd 51 48 0f 45 f2 48 89 f9 41 52 48 89 c2 48 c7 c7 f8 27 e0 bd e8 0e 3c ed ff <0f> 0b 49 89 e8 31 c9 44 89 e2 31 f6 48 c7 c7 60 27 e0 bd e8 79 ff [55255.630069] RSP: 0018:a2394078bdb0 EFLAGS: 00010246 [55255.630102] RAX: 0065 RBX: 8d2e5464afc0 RCX: 0006 [55255.630134] RDX: RSI: 0086 RDI: 8d2e56b95500 [55255.630154] RBP: 017b R08: bd5116c0 R09: 0065 [55255.630173] R10: be6902a0 R11: be67efad R12: 0001 [55255.630197] R13: 8d2e5464b13b R14: 017b R15: 017b [55255.630218] FS: 7fc22d165700() GS:8d2e56b8() knlGS: [55255.630246] CS: 0010 DS: ES: CR0: 80050033 [55255.630266] CR2: 1d6de2d36018 CR3: 000309eae004 CR4: 003606e0 Config: # # Automatically generated file; DO NOT EDIT. 
# Linux/x86 4.19.0-rc1 Kernel Configuration # # # Compiler: gcc (Debian 8.2.0-4) 8.2.0 # CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=80200 CONFIG_CLANG_VERSION=0 CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_BUILD_SALT="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG
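As a side note on the usercopy report in this message ("SLUB object 'kmalloc-64' (offset 0, size 379)"): the numbers alone already show why the hardened usercopy check fired. The sketch below is a simplification under my own assumptions; the function name usercopy_ok is invented for illustration, and as far as I understand the real check also takes per-cache whitelisted regions into account.

```python
def usercopy_ok(object_size, offset, length):
    """True if [offset, offset + length) lies inside an object of object_size bytes."""
    return offset >= 0 and length >= 0 and offset + length <= object_size


# kmalloc-64 object, copy of 379 bytes starting at offset 0, as in the report.
print(usercopy_ok(object_size=64, offset=0, length=379))
```

A 379-byte copy cannot fit into a 64-byte object, so the copy to userspace would have read past the end of the buffer that rng_dev_read() was using.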
Re: cmpxchg.h:245:2: error: ‘asm’ operand has impossible constraints
> > > > 5.3.1-14, seems to be available in > > > > http://snapshot.debian.org/package/gcc-5/5.3.1-14/#gcc-5_5.3.1-14 - the > > > > whole system is a snapshot of Debian unstable when they stopped > > > > supporting pre-686 CPUs. > > > > > > Uurgh. That's going to be a nightmare to set that one up. Let's try to > > > nail > > > it on your machine then. Can you try to generate the intermediate file by > > > invoking: make mm/slub.i ? > > > > Here you are. > > Looks unsuspicious. Is this an entirely new issue on 4.19-rc or can you see > the same with older kernel versions? 4.18 was fine with the same toolchain, so this is a new issue. -- Meelis Roos (mr...@linux.ee)
Re: cmpxchg.h:245:2: error: ‘asm’ operand has impossible constraints
> > While trying to compile v4.18-13105-gaba16dc5cf93 with gcc 5.3.1 on a > > 32-bit x86 configured for AMD K6: > > I tried to get hold of that debian gcc 5.3.1 compiler, but no luck so far. 5.3.1-14, seems to be available in http://snapshot.debian.org/package/gcc-5/5.3.1-14/#gcc-5_5.3.1-14 - the whole system is a snapshot of Debian unstable when they stopped supporting pre-686 CPUs. -- Meelis Roos (mr...@linux.ee)
Re: 32-bit PTI with THP = userspace corruption
> > I am seeing userland corruption and application crashes on multiple > > 32-bit machines with 4.19-rc1+git. The machines vary: PII, PIII, P4. > > They are all Intel. AMD Duron/Athlon/AthlonMP have been fine in my tests > > so far (may be configuration dependent). > > Thanks for the report! I'll try to reproduce the problem tomorrow and > investigate it. Can you please check if any of the kernel configurations > that show the bug has CONFIG_X86_PAE set? If not, can you please test > if enabling this option still triggers the problem? PAE was not visible itself, but when I changed HIGHMEM_4G to HIGHMEM_64G, X86_PAE was also selected and the resulting kernel works. Also, I verified that the old ProLiants with 6G RAM already have HIGHMEM_64G set and they do not exhibit the problem either. -- Meelis Roos (mr...@linux.ee)
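The pattern found so far in this thread (PTI enabled, THP set to "always", no PAE) is easy to check for in a saved kernel config. This is a minimal sketch only; the helper names enabled_symbols and pti_thp_risk are invented here, and the rule encodes nothing more than the correlation observed on my test machines, not a confirmed root cause.

```python
def enabled_symbols(config_text):
    """Return the set of CONFIG_* symbols that are set to 'y'."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and line.endswith("=y"):
            enabled.add(line[:-2])  # strip the trailing "=y"
    return enabled


def pti_thp_risk(config_text):
    """True for the combination that broke in the tests described in this thread."""
    enabled = enabled_symbols(config_text)
    return (
        "CONFIG_PAGE_TABLE_ISOLATION" in enabled
        and "CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS" in enabled
        and "CONFIG_X86_PAE" not in enabled
    )


broken = """CONFIG_HIGHMEM4G=y
CONFIG_PAGE_TABLE_ISOLATION=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_X86_PAE is not set
"""
working = broken.replace("# CONFIG_X86_PAE is not set", "CONFIG_X86_PAE=y")

print(pti_thp_risk(broken), pti_thp_risk(working))
```

The two sample strings are shortened stand-ins for real .config files, not copies of the configurations used in the tests.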
Re: 32-bit PTI with THP = userspace corruption
> > I am seeing userland corruption and application crashes on multiple > > 32-bit machines with 4.19-rc1+git. The machines vary: PII, PIII, P4. > > They are all Intel. AMD Duron/Athlon/AthlonMP have been fine in my tests > > so far (may be configuration dependent). > > Thanks for the report! I'll try to reproduce the problem tomorrow and > investigate it. Can you please check if any of the kernel configurations > that show the bug has CONFIG_X86_PAE set? If not, can you please test > if enabling this option still triggers the problem? Will check, but from memory there were 2 G3 HP ProLiants that did not fit into the pattern (the problem did not appear). I have more than 4G RAM in those and HIGHMEM_4G there, maybe that's it? -- Meelis Roos (mr...@linux.ee)
32-bit PTI with THP = userspace corruption
I am seeing userland corruption and application crashes on multiple 32-bit machines with 4.19-rc1+git. The machines vary: PII, PIII, P4. They are all Intel. AMD Duron/Athlon/AthlonMP have been fine in my tests so far (may be configuration dependent). Typical problem is running aptitude in Debian unstable, doing package list update and seeing glibc warning about linked list corruption and some other corruption, causing SIGABRT-s: corrupted double-linked list Ouch! Got SIGABRT, dying.. malloc_consolidate(): invalid chunk size Ouch! Got SIGABRT, dying.. I bisected the problem. It was tricky because it led to 32-bit bpf problem commit range, but that could be worked around with the patch that was later applied. The result is 32-bit PTI introduction commit (PTI was turned on on all the test machines): 7757d607c6b3186de42e1fb0210b9c5d8b70 is the first bad commit commit 7757d607c6b3186de42e1fb0210b9c5d8b70 Author: Joerg Roedel Date: Wed Jul 18 11:41:14 2018 +0200 x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 Allow PTI to be compiled on x86_32. Signed-off-by: Joerg Roedel Signed-off-by: Thomas Gleixner Tested-by: Pavel Machek Cc: "H . Peter Anvin" Cc: linux...@kvack.org Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Dave Hansen Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Peter Zijlstra Cc: Borislav Petkov Cc: Jiri Kosina Cc: Boris Ostrovsky Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: Will Deacon Cc: aligu...@amazon.com Cc: daniel.gr...@iaik.tugraz.at Cc: hu...@google.com Cc: keesc...@google.com Cc: Andrea Arcangeli Cc: Waiman Long Cc: "David H . 
Gutteridge" Cc: j...@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-38-git-send-email-j...@8bytes.org :04 04 dbab9a897d534d7b14f900f0c6779b6848833892 f0674017544bc95fafa431d1e638f994eca37b51 M security However, not all of my 32-bit Intel machines showed the problem, so I looked for correlations in kernel configs (6 working and 6 non-working) and found a suspect of CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, as well as multiple CPU hotplug options (not turned on directly but by something else, I think - and not investigated further). I retested v4.19-rc1-95-g3f16503b7d22 with changed configuration options and found that it starts to work as soon as I turn CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS to madvise or turn off CONFIG_PAGE_TABLE_ISOLATION. So the combination of PTI and THP always-on breaks it. Here is a sample configuration that is broken: # # Automatically generated file; DO NOT EDIT. # Linux/x86 4.19.0-rc1 Kernel Configuration # # # Compiler: gcc (Debian 8.2.0-3) 8.2.0 # CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=80200 CONFIG_CLANG_VERSION=0 CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_BUILD_SALT="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set CONFIG_KERNEL_XZ=y # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y # CONFIG_USELIB is not set CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y 
CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_GENERIC_IRQ_MIGRATION=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_GENERIC_MSI_IRQ_DOMAIN=y CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y CONFIG_GENERIC_IRQ_RESERVATION_MODE=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y # CONFIG_GENERIC_IRQ_DEBUGFS is not set CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ is not set CONFIG_HIGH_RES_TIMERS=y # CONFIG_PREEMPT_NONE is not set CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_IRQ_TIME_ACCOUNTING is not set # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # CONFIG_CPU_ISOLATION is not set # #
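The config correlation described in the report above (6 working and 6 non-working configs) can be approximated with a small script. This is only a minimal sketch, not the method actually used for the report; the names parse_config and suspect_options are made up for illustration, and treating an absent option as 'n' is a simplifying assumption.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines of a kernel .config.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' get the value 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def suspect_options(good_configs, bad_configs):
    """Options that have one common value in every bad config and that value in no good config.

    Both arguments are lists of dicts from parse_config(). An option missing
    from a config is treated as 'n' (assumption made for this sketch).
    """
    names = set()
    for cfg in good_configs + bad_configs:
        names.update(cfg)
    suspects = {}
    for name in sorted(names):
        bad_values = {cfg.get(name, "n") for cfg in bad_configs}
        good_values = {cfg.get(name, "n") for cfg in good_configs}
        if len(bad_values) == 1 and not (bad_values & good_values):
            suspects[name] = bad_values.pop()
    return suspects
```

Feeding it the text of each machine's .config (for example via open(path).read()) gives a list of candidate options to retest one at a time, which is how a suspect such as CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y would surface.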
cmpxchg.h:245:2: error: ‘asm’ operand has impossible constraints
While trying to compile v4.18-13105-gaba16dc5cf93 with gcc 5.3.1 on a 32-bit x86 configured for AMD K6: CC mm/slub.o In file included from ./arch/x86/include/asm/atomic.h:8:0, from ./include/linux/atomic.h:7, from ./arch/x86/include/asm/thread_info.h:54, from ./include/linux/thread_info.h:38, from ./arch/x86/include/asm/preempt.h:7, from ./include/linux/preempt.h:81, from ./include/linux/spinlock.h:51, from ./include/linux/mmzone.h:8, from ./include/linux/gfp.h:6, from ./include/linux/mm.h:10, from mm/slub.c:13: mm/slub.c: In function ‘__slab_free’: ./arch/x86/include/asm/cmpxchg.h:245:2: error: ‘asm’ operand has impossible constraints asm volatile(pfx "cmpxchg%c4b %2; sete %0" \ ^ ./arch/x86/include/asm/cmpxchg.h:254:2: note: in expansion of macro ‘__cmpxchg_double’ __cmpxchg_double(LOCK_PREFIX, p1, p2, o1, o2, n1, n2) ^ ./include/asm-generic/atomic-instrumented.h:457:2: note: in expansion of macro ‘arch_cmpxchg_double’ arch_cmpxchg_double(__ai_p1, (p2), (o1), (o2), (n1), (n2)); \ ^ mm/slub.c:404:7: note: in expansion of macro ‘cmpxchg_double’ if (cmpxchg_double(&page->freelist, &page->counters, ^ scripts/Makefile.build:307: recipe for target 'mm/slub.o' failed make[1]: *** [mm/slub.o] Error 1 Config: # # Automatically generated file; DO NOT EDIT. 
# Linux/x86 4.18.0 Kernel Configuration # # # Compiler: gcc (Debian 5.3.1-14) 5.3.1 20160409 # CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=50301 CONFIG_CLANG_VERSION=0 CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_BUILD_SALT="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set CONFIG_KERNEL_XZ=y # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y # CONFIG_CROSS_MEMORY_ATTACH is not set # CONFIG_USELIB is not set CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_IRQ_DOMAIN=y CONFIG_GENERIC_IRQ_RESERVATION_MODE=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y # CONFIG_GENERIC_IRQ_DEBUGFS is not set CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ is not set CONFIG_HIGH_RES_TIMERS=y CONFIG_PREEMPT_NONE=y # CONFIG_PREEMPT_VOLUNTARY is not set # CONFIG_PREEMPT is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_IRQ_TIME_ACCOUNTING is not set # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # # RCU Subsystem # CONFIG_TINY_RCU=y # 
CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y CONFIG_TINY_SRCU=y # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=17 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y CONFIG_CGROUPS=y # CONFIG_MEMCG is not set # CONFIG_BLK_CGROUP is not set # CONFIG_CGROUP_SCHED is not set # CONFIG_CGROUP_PIDS is not set # CONFIG_CGROUP_RDMA is not set # CONFIG_CGROUP_FREEZER is not set # CONFIG_CGROUP_HUGETLB is not set # CONFIG_CGROUP_DEVICE is not set # CONFIG_CGROUP_CPUACCT is not set # CONFIG_CGROUP_PERF is not set CONFIG_CGROUP_BPF=y # CONFIG_CGROUP_DEBUG is not set CONFIG_SOCK_CGROUP_DATA=y CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_IPC_NS is not set # CONFIG_USER_NS is not set # CONFIG_PID_NS is not set # CONFIG_NET_NS is not set # CONFIG_CHECKPOINT_RESTORE is not set # CONFIG_SCHED_AUTOGROUP is not set # CONFIG_SYSFS_DEPRECATED is not set # CONFIG_RELAY is not set # CONFIG_BLK_DEV_INITRD is not set CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y CONFIG_ANON_INODES=y CONFIG_HAVE_UID16=y CONFIG_SYSCTL_EXCEPTION_TRACE=y CONFIG_HAVE_PCSPKR_PLATFORM=y CONFIG_BPF=y # CONFIG_EXPERT is not set CONFIG_UID16=y CONFIG_MULTIUSER=y CONFIG_SGETMASK_SYSCALL=y CONFIG_SYSFS_SYSCALL=y CONFIG_FHANDLE=y CONFIG_POSIX_TIMERS=y CONFIG_PRINTK=y CONFIG_PRINTK_NMI=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_PCSPKR_PLATFORM=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_FUTEX_PI=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIM
make *config regression: pkg-build
Just tried to run 'make menuconfig' on v4.18-10568-g08b5fa819970 and found a bad surprise:

'make *config' requires 'pkg-config'. Please install it.
make[1]: *** [scripts/kconfig/Makefile:219: scripts/kconfig/.mconf-cfg] Error 1

This is clearly a regression - I have the libncurses development package installed in the default system location (as do probably 99%+ of actual developers) and in this case, pkg-config is useless.

pkg-config is needed only when libraries and headers are installed in non-default locations, but it is bad to require installation of pkg-config on all the machines where make menuconfig could possibly be run (for example, I have a kernel testbed of about 100 machines with self-hosted kernel compilation and machine-specific kernel configurations that occasionally need tweaking).

I notice 4.18 complained that it could not find pkg-config but still worked. This is clearly better than now.

If we want to support developers with libraries in non-default locations, why not - but the common case of the system include path should work without any trouble or warnings. For example, test if compilation against ncurses works, and if not, retry it with pkg-config (and error out if that does not give a working result).

--
Meelis Roos (mr...@linux.ee)
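The fallback order proposed above (try the default system location first, ask pkg-config only if that fails) could look roughly like the sketch below. It is an illustration of the idea only, not the logic of the kernel's kconfig scripts; the names run and find_ncurses_flags, the test program, and the default "-lncurses" flag are assumptions.

```python
import os
import shlex
import subprocess
import tempfile

# Tiny program that only builds if the ncurses headers and library are usable.
TEST_PROGRAM = "#include <ncurses.h>\nint main(void) { initscr(); endwin(); return 0; }\n"


def run(cmd, stdin_text=None):
    """Run a command; return (ok, stdout). A missing binary counts as failure."""
    try:
        proc = subprocess.run(cmd, input=stdin_text, capture_output=True, text=True)
    except OSError:
        return False, ""
    return proc.returncode == 0, proc.stdout


def find_ncurses_flags(run=run, cc="cc"):
    """Return extra compiler flags that make ncurses build, or None if nothing works."""
    def builds(flags):
        # Link into a throwaway directory so nothing is left behind.
        with tempfile.TemporaryDirectory() as tmp:
            out = os.path.join(tmp, "ncurses-test")
            ok, _ = run([cc, "-x", "c", "-", "-o", out] + flags, TEST_PROGRAM)
        return ok

    # Common case: headers and library in the default system location.
    default_flags = ["-lncurses"]
    if builds(default_flags):
        return default_flags

    # Fallback: ask pkg-config, and accept its answer only if it really builds.
    ok, out = run(["pkg-config", "--cflags", "--libs", "ncurses"])
    if ok:
        flags = shlex.split(out)
        if builds(flags):
            return flags
    return None
```

With this ordering pkg-config is never invoked on a machine where the plain compile already works, so its absence produces neither an error nor a warning there.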
ptrace compile failure with gcc-8.2 on 32-bit powerpc
After upgrading my distro compiler to gcc-8.2, Linux fails to compile on 32-bit powerpc (tested with 4.17, 4.18 and v4.18-7873-gf91e654474d4). CC arch/powerpc/kernel/ptrace.o In file included from ./include/linux/bitmap.h:9, from ./include/linux/cpumask.h:12, from ./include/linux/rcupdate.h:44, from ./include/linux/rculist.h:11, from ./include/linux/pid.h:5, from ./include/linux/sched.h:14, from arch/powerpc/kernel/ptrace.c:19: In function ‘memcpy’, inlined from ‘user_regset_copyin’ at ./include/linux/regset.h:295:4, inlined from ‘vr_set’ at arch/powerpc/kernel/ptrace.c:619:9: ./include/linux/string.h:345:9: error: ‘__builtin_memcpy’ offset [-527, -529] is out of the bounds [0, 16] of object ‘vrsave’ with type ‘union ’ [-Werror=array-bounds] return __builtin_memcpy(p, q, size); ^~~~ arch/powerpc/kernel/ptrace.c: In function ‘vr_set’: arch/powerpc/kernel/ptrace.c:614:5: note: ‘vrsave’ declared here } vrsave; ^~ In file included from ./include/linux/bitmap.h:9, from ./include/linux/cpumask.h:12, from ./include/linux/rcupdate.h:44, from ./include/linux/rculist.h:11, from ./include/linux/pid.h:5, from ./include/linux/sched.h:14, from arch/powerpc/kernel/ptrace.c:19: In function ‘memcpy’, inlined from ‘user_regset_copyout’ at ./include/linux/regset.h:270:4, inlined from ‘vr_get’ at arch/powerpc/kernel/ptrace.c:572:9: ./include/linux/string.h:345:9: error: ‘__builtin_memcpy’ offset [-527, -529] is out of the bounds [0, 16] of object ‘vrsave’ with type ‘union ’ [-Werror=array-bounds] return __builtin_memcpy(p, q, size); ^~~~ arch/powerpc/kernel/ptrace.c: In function ‘vr_get’: arch/powerpc/kernel/ptrace.c:567:5: note: ‘vrsave’ declared here } vrsave; ^~ cc1: all warnings being treated as errors make[1]: *** [scripts/Makefile.build:311: arch/powerpc/kernel/ptrace.o] Error 1 -- Meelis Roos (mr...@linux.ee)
apparmor unaligned accesses on sparc64 in 4.18+git
Just tried 4.18.0-02978-g1eb46908b35d on a sparc64 box with Debian Ports sparc64 unstable (apparmor packages recommended by linux-image package) and got the following on bootup: [ 46.315721] Kernel unaligned access at TPC[6b8b98] aa_dfa_unpack+0x38/0x620 [ 46.412375] Kernel unaligned access at TPC[6b8ba8] aa_dfa_unpack+0x48/0x620 [ 46.412392] Kernel unaligned access at TPC[6b8c28] aa_dfa_unpack+0xc8/0x620 [ 46.698283] Kernel unaligned access at TPC[6b8ce8] aa_dfa_unpack+0x188/0x620 [ 46.789536] Kernel unaligned access at TPC[6b8cfc] aa_dfa_unpack+0x19c/0x620 -- Meelis Roos (mr...@linux.ee)
4.18+git: undefined reference to `l1tf_vmx_mitigation'
Tried to compile current git (v4.18-1934-gbe718b524d8d) with AMD KVM and got the following linking error: MODPOST vmlinux.o ld: arch/x86/kvm/x86.o: in function `kvm_get_arch_capabilities': x86.c:(.text+0x5132): undefined reference to `l1tf_vmx_mitigation' # # Automatically generated file; DO NOT EDIT. # Linux/x86 4.18.0 Kernel Configuration # # # Compiler: gcc (Debian 8.2.0-3) 8.2.0 # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=28 CONFIG_ARCH_MMAP_RND_BITS_MAX=32 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_ARCH_HAS_FILTER_PGPROT=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_64_SMP=y CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=4 CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=80200 CONFIG_CLANG_VERSION=0 CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set # 
CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set CONFIG_KERNEL_XZ=y # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="x4200m2" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y # CONFIG_USELIB is not set CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_GENERIC_IRQ_MIGRATION=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_GENERIC_MSI_IRQ_DOMAIN=y CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y CONFIG_GENERIC_IRQ_RESERVATION_MODE=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y # CONFIG_GENERIC_IRQ_DEBUGFS is not set CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ_FULL is not set # CONFIG_NO_HZ is not set CONFIG_HIGH_RES_TIMERS=y # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set # CONFIG_IRQ_TIME_ACCOUNTING is not set CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # CONFIG_CPU_ISOLATION is not set # # RCU Subsystem # CONFIG_TREE_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y CONFIG_TREE_SRCU=y CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_NEED_SEGCBLIST=y # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=18 CONFIG_LOG_CPU_MAX_BUF_SHIFT=12 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13 
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y CONFIG_ARCH_SUPPORTS_INT128=y CONFIG_NUMA_BALANCING=y CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y CONFIG_CGROUPS=y CONFIG_PAGE_COUNTER=y CONFIG_MEMCG=y CONFIG_MEMCG_SWAP=y CONFIG_MEMCG_SWAP_ENABLED=y CONFIG_BLK_CGROUP=y # CONFIG_DEBUG_BLK_CGROUP is not set CONFIG_CGROUP_WRITEBACK=y CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y # CONFIG_CFS_BANDWIDTH is not set # CONFIG_RT_GROUP_SCHED is not set CONFIG_CGROUP_PIDS=y # CONFIG_CGROUP_RDMA is not set CONFIG_CGROUP_FREEZER=y CONFIG_CGROUP_HUGETLB=y CONFIG_CPUSETS=y CONFIG_PROC_PID_CPUSET=y CONFIG_CGROUP_DEVICE=y CONFIG_CGROUP_CPUACCT=y CONFIG_CGROUP_PERF=y CONFIG_CGROUP_BPF=y # CONFIG_CGROUP_DEBUG is not set CONFIG_SOCK_CGROUP_DATA=y CONFIG_NAMESPACES=y CONFIG_UTS_NS=y CONFIG_IPC_NS=y CONFIG_USER_NS=y CONFIG_PID_NS=y CONFIG_NET_NS=y # CONFIG_SCHED_AUTOGROUP is not set # CONFIG_SYSFS_DEPRECATED is not set # CONFIG_RELAY is not
Re: bisected: 4.18-rc* regression: x86-32 troubles (with timers?)
> >> Now this seems more relevant:
> >>
> >> mroos@rx100s2:~/linux$ nice git bisect good
> >> 24dea04767e6e5175f4750770281b0c17ac6a2fb is the first bad commit
> >> commit 24dea04767e6e5175f4750770281b0c17ac6a2fb
> >> Author: Daniel Borkmann
> >> Date: Fri May 4 01:08:23 2018 +0200
> >>
> >>     bpf, x32: remove ld_abs/ld_ind
> >>
> >>     Since LD_ABS/LD_IND instructions are now removed from the core and
> >>     reimplemented through a combination of inlined BPF instructions and
> >>     a slow-path helper, we can get rid of the complexity from x32 JIT.
> >
> > This does seem much more likely than the previous bisection, given
> > that you ended up in an x86-32 specific commit (the subject says x32,
> > but that is a mistake). I also checked that systemd indeed does
> > call into bpf in a number of places, possibly for the journald socket.
> >
> > OTOH, it's still hard to tell how that commit can have ended up
> > corrupting the clock read function in systemd. To cross-check,
> > could you try reverting that commit on the latest kernel and see
> > if it still works?
>
> I would be curious as well about that whether revert would make it
> work. What's the value of sysctl net.core.bpf_jit_enable ? Does it
> change anything if you set it to 0 (only interpreter) or 1 (JIT
> enabled). Seems a bit strange to me that bisect ended at this commit
> given the issue you have. The JIT itself was also new in this window
> fwiw. In any case some more debug info would be great to have.

net.core.bpf_jit_enable is 1. Since it breaks bootup, I can not easily change the value at runtime (it would be post factum). Do you mean changing the CONFIG_BPF_JIT_ALWAYS_ON=y option?

Anyway, I started a compile of v4.18-rc5, the latest I tested, with the commit in question reverted. Will see if I can test tomorrow morning. But I will leave tomorrow for a week and can only test further things if they happen to boot fine (no manual reboot possible for a week).

--
Meelis Roos (mr...@linux.ee)
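For reference, the sysctl asked about above can also be read straight from /proc/sys. The snippet is a small sketch; the helper name describe_bpf_jit is made up, and only the two values mentioned in the thread (0 and 1) are interpreted.

```python
from pathlib import Path

# The sysctl net.core.bpf_jit_enable is exposed under /proc/sys,
# with the dots of the sysctl name turned into path separators.
BPF_JIT_SYSCTL = Path("/proc/sys/net/core/bpf_jit_enable")


def describe_bpf_jit(path=BPF_JIT_SYSCTL):
    """Return a short description of the current bpf_jit_enable setting."""
    try:
        raw = Path(path).read_text().strip()
    except OSError:
        # File missing or unreadable, e.g. a kernel built without the BPF JIT.
        return "unavailable"
    if raw == "0":
        return "interpreter only"
    if raw == "1":
        return "JIT enabled"
    return "other value: " + raw


if __name__ == "__main__":
    print(describe_bpf_jit())
```

Reading the value needs no special privileges; changing it does, and as noted above a runtime change comes too late when the failure already happens during bootup.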
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
449 # bad: [e64d52569f6e847495091db40ab58d2d379748ef] tools: bpftool: move get_possible_cpus() to common code git bisect bad e64d52569f6e847495091db40ab58d2d379748ef # bad: [b4264c96b5cbc00c4c07deb9fbab928d43dffcf9] nfp: bpf: rewrite map pointers with NFP TIDs git bisect bad b4264c96b5cbc00c4c07deb9fbab928d43dffcf9 # bad: [9816dd35ececc095f3e3be29d30d3adc755908d9] nfp: bpf: perf event output helpers support git bisect bad 9816dd35ececc095f3e3be29d30d3adc755908d9 # first bad commit: [9816dd35ececc095f3e3be29d30d3adc755908d9] nfp: bpf: perf event output helpers support -- Meelis Roos (mr...@linux.ee)
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
> > Everything below here is 'bad', which can be an indication that you
> > misclassified one of
> > the commits above as 'good' when it should have been 'bad'. The most likely
> > explanations are that you either typed the 'git bisect good' by accident, or
> > that the failure is not 100% reliable, and it sometimes works fine even on a
> > broken kernel.
> >
> > 0bc5fe857274133ca0 follows directly after 3a443bd6dd7c, "net/9p: correct the
> > variable name in v9fs_get_trans_by_name() comment", which is marked "good",
> > and can't really be good if 0bc5fe85727413 is bad and you are not using the
> > 'qed' driver.
> >
> > I'd retest 3a443bd6dd7c again to see if that should have been 'bad', and
> > if it was, test v4.17-rc4, which is what the net-next tree was based on.
>
> Yes, the same prebuilt 3a443bd6dd7c appeared to be bad when retesting
> it. Building v4.17-rc4 now.

v4.17-rc4 seems good after 2 reboots.

--
Meelis Roos (mr...@ut.ee) http://www.cs.ut.ee/~mroos/
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
> Everything below here is 'bad', which can be an indication that you
> misclassified one of
> the commits above as 'good' when it should have been 'bad'. The most likely
> explanations are that you either typed the 'git bisect good' by accident, or
> that the failure is not 100% reliable, and it sometimes works fine even on a
> broken kernel.
>
> 0bc5fe857274133ca0 follows directly after 3a443bd6dd7c, "net/9p: correct the
> variable name in v9fs_get_trans_by_name() comment", which is marked "good",
> and can't really be good if 0bc5fe85727413 is bad and you are not using the
> 'qed' driver.
>
> I'd retest 3a443bd6dd7c again to see if that should have been 'bad', and
> if it was, test v4.17-rc4, which is what the net-next tree was based on.

Yes, the same prebuilt 3a443bd6dd7c appeared to be bad when retesting it. Building v4.17-rc4 now.

--
Meelis Roos (mr...@linux.ee)
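The reasoning in the quoted reply (one wrong 'good' makes every later step 'bad' and pins the result right after the misjudged commit) can be reproduced with a toy model. This is a simplified sketch over a linear history; real git bisect works on a commit graph, and the function names and commit numbering here are invented for the illustration.

```python
def bisect_first_bad(n_commits, is_bad):
    """Binary search over commits 0..n_commits-1 (oldest first).

    Assumes commit 0 is good and the last commit is bad, like the initial
    'git bisect good' / 'git bisect bad' markers.
    Returns (first_bad_index, list_of_tested_indices).
    """
    good, bad = 0, n_commits - 1
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(mid)
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad, tested


def simulate(n_commits, true_first_bad, misjudged=frozenset()):
    """Run the search where commits in `misjudged` are wrongly reported as good."""
    def is_bad(i):
        return i >= true_first_bad and i not in misjudged
    return bisect_first_bad(n_commits, is_bad)


if __name__ == "__main__":
    print(simulate(100, 30))        # every answer is correct
    print(simulate(100, 30, {49}))  # commit 49 is broken but was marked good
```

In the second run the search can never go back before the misjudged commit, so all remaining tests come out bad and the reported "first bad commit" is simply its direct successor, which matches the pattern seen in the bisect log.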
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
7104a] qed: Add support for Unified Fabric Port. git bisect bad cac6f691546b9efd50c31c0db97fe50d0357104a # bad: [27bf96e32c92599dc7523b36d6c761fc8312c8c0] qed: Remove unused data member 'is_mf_default'. git bisect bad 27bf96e32c92599dc7523b36d6c761fc8312c8c0 # bad: [0bc5fe857274133ca028ebb15ff2e8549a369916] qed*: Refactor mf_mode to consist of bits. git bisect bad 0bc5fe857274133ca028ebb15ff2e8549a369916 # first bad commit: [0bc5fe857274133ca028ebb15ff2e8549a369916] qed*: Refactor mf_mode to consist of bits. -- Meelis Roos (mr...@linux.ee)
HP DL585 warm boot fail (old)
I have a first gen HP Proliant DL585 ("G1", but the name was not used back then) that boots up fine from power-on but usually fails to boot after a warm reboot, somewhere in PCI detection (will try to photograph the screen some time).

I just stumbled upon an old OpenSolaris thread about the same DL585 and the same symptoms:
http://opensolaris-discuss.opensolaris.narkive.com/T0UTXYGZ/solaris-10-06-06-x86-hp-dl585-boot-hang-aftrer-reboot-help

Their conclusion was the following, and they seem to have found a fix (although I have not tested any version of Solaris on this DL585 myself):

"The hang is caused when, during PCI enumeration, a PCI-PCI bridge is partially disabled when the PCI command register bits which enable IO and memory windows are cleared."

Is this information useful in some way for debugging it? What else besides a screenshot can be useful in debugging?

--
Meelis Roos (mr...@linux.ee)
UBSAN: Undefined behaviour in lib/percpu_counter.c:92:14
This is on a AMD Athlon64 X2 compiling kernel with make -2: [91550.438790] [91550.438832] UBSAN: Undefined behaviour in lib/percpu_counter.c:92:14 [91550.438862] signed integer overflow: [91550.43] 91550438785688 + 9223336756968817285 cannot be represented in type 'long long int' [91550.438923] CPU: 0 PID: 8875 Comm: cc1 Not tainted 4.18.0-rc3-00113-gfc36def997cf #11 [91550.438924] Hardware name: HP-Pavilion RT589AA-ABU t3709.uk/Nance, BIOS 5.02 11/26/2006 [91550.438924] Call Trace: [91550.438929] [91550.438937] dump_stack+0x5a/0x9b [91550.438941] ubsan_epilogue+0x9/0x40 [91550.438944] handle_overflow+0xf2/0x100 [91550.438946] percpu_counter_add_batch+0xfb/0x120 [91550.438949] cfq_completed_request+0x320/0xb00 [91550.438953] __blk_put_request+0x15d/0x390 [91550.438957] scsi_end_request+0x154/0x370 [91550.438960] scsi_io_completion+0x603/0x9e0 [91550.438963] blk_done_softirq+0xe6/0x1c0 [91550.438967] __do_softirq+0x118/0x414 [91550.438970] irq_exit+0xa2/0xd0 [91550.438972] do_IRQ+0xac/0x160 [91550.438974] common_interrupt+0xf/0xf [91550.438976] [91550.438978] RIP: 0033:0x7f54e89631b7 [91550.438979] Code: 83 f9 02 48 0f 47 cf 83 c1 7c e9 a9 fa ff ff 4c 8b 41 08 4c 89 c2 48 83 e2 f8 48 39 d3 0f 87 fb 00 00 00 48 8d 3c 11 48 8b 07 <48> 39 d0 0f 85 35 01 00 00 48 8b 51 10 48 8b 71 18 48 39 4a 18 0f [91550.439006] RSP: 002b:7ffc7398a470 EFLAGS: 0287 ORIG_RAX: ffde [91550.439007] RAX: 02a0 RBX: 0060 RCX: 03311380 [91550.439009] RDX: 02a0 RSI: 7f54e8c96f30 RDI: 03311620 [91550.439010] RBP: 0004 R08: 02a1 R09: 7f54e8c96cb0 [91550.439011] R10: R11: 0001 R12: [91550.439012] R13: 7f54e8c96c40 R14: 02e3b010 R15: 7f54e8c96ca0 [91550.439013] ======== -- Meelis Roos (mr...@linux.ee)
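The two operands in the UBSAN message can be checked by hand. The sketch below redoes the addition with Python's unbounded integers, assuming 'long long int' is a 64-bit two's-complement type as on x86-64; the helper name overflows_s64 is made up.

```python
LLONG_MAX = 2**63 - 1   # largest value of a 64-bit signed 'long long int'
LLONG_MIN = -2**63


def overflows_s64(a, b):
    """Return (overflowed, exact_sum) for a + b evaluated as 64-bit signed."""
    total = a + b  # Python integers are unbounded, so this sum is exact
    return not (LLONG_MIN <= total <= LLONG_MAX), total


# Operands copied from the UBSAN report above.
A = 91550438785688
B = 9223336756968817285

overflowed, total = overflows_s64(A, B)
excess = total - LLONG_MAX  # how far the exact sum lies beyond the limit

if __name__ == "__main__":
    print(overflowed, total, excess)
```

The second operand alone is already close to the 64-bit limit, so adding any moderately large positive amount to it pushes the sum past LLONG_MAX.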
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
> > I tried 4.18.0-rc1-00023-g9ffc59d57228 and now
> > 4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other
> > 32-bit machines, and got half-failed bootup - kernel and userspace come
> > up but some services fail to start, including network and
> > systemd-journald:
> >
> > systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id), &ts)
> > == 0' failed at ../src/basic/time-util.c:53, function now(). Aborting.
> >
> > I then tried multiple other machines. All x86-64 machines seem
> > unaffected, some x86-32 machines are affected (Athlon with AMD750
> > chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
> > some very similar x86-32 machines are unaffected. I have different
> > customized kernel configuration on them, so far I have not pinpointed
> > any configuration option to be at fault.
> >
> > All machines run Debian unstable.
> >
> > 4.17.0 was working fine.
> >
> > Will continue with bisecting between 4.17.0 and
> > 4.18.0-rc1-00023-g9ffc59d57228.
>
> That does sound like it is related to my patches indeed. If you are not
> yet done bisecting, please checkout commit e27c49291a7f ("x86: Convert
> x86_platform_ops to timespec64") before you try anything else, that
> one is the top of the branch with my changes. If that fails, the bisection
> will be much quicker.

This commit was fine. So it's likely something else.

--
Meelis Roos (mr...@linux.ee)
4.18-rc* regression: x86-32 troubles (with timers?)
I tried 4.18.0-rc1-00023-g9ffc59d57228 and now 4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other 32-bit machines, and got half-failed bootup - kernel and userspace come up but some services fail to start, including network and systemd-journald: systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id), &ts) == 0' failed at ../src/basic/time-util.c:53, function now(). Aborting. I then tried multiple other machines. All x86-64 machines seem unaffected, some x86-32 machines are affected (Athlon with AMD750 chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset), some very similar x86-32 machines are unaffected. I have different customized kernel configuration on them, so far I have not pinpointed any configuration option to be at fault. All machines run Debian unstable. 4.17.0 was working fine. Will continue with bisecting between 4.17.0 and 4.18.0-rc1-00023-g9ffc59d57228. [0.00] Linux version 4.18.0-rc3-00113-gfc36def997cf (mroos@rx100s2) (gcc version 7.3.0 (Debian 7.3.0-23)) #27 SMP Wed Jul 4 13:06:34 EEST 2018 [0.00] x86/fpu: x87 FPU will use FXSAVE [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009afff] usable [0.00] BIOS-e820: [mem 0x0009b000-0x0009] reserved [0.00] BIOS-e820: [mem 0x000ca000-0x000cbfff] reserved [0.00] BIOS-e820: [mem 0x000dc000-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0x3ff6] usable [0.00] BIOS-e820: [mem 0x3ff7-0x3ff79fff] ACPI data [0.00] BIOS-e820: [mem 0x3ff7a000-0x3ff7] ACPI NVS [0.00] BIOS-e820: [mem 0x3ff8-0x3fff] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec0] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved [0.00] BIOS-e820: [mem 0xff80-0xffbf] reserved [0.00] BIOS-e820: [mem 0xfc00-0x] reserved [0.00] Notice: NX (Execute Disable) protection missing in CPU! [0.00] SMBIOS 2.3 present. [0.00] DMI: FUJITSU SIEMENS PRIMERGY RX100S2/D1571/M71IXG, BIOS 6.0 Rev. 
C0F2.1571 04/27/2005 [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] last_pfn = 0x3ff70 max_arch_pfn = 0x10 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-B uncachable [0.00] C-C7FFF write-protect [0.00] C8000-D uncachable [0.00] E-F write-protect [0.00] MTRR variable ranges enabled: [0.00] 0 base 0 mask FC000 write-back [0.00] 1 base 03FF8 mask 8 uncachable [0.00] 2 disabled [0.00] 3 disabled [0.00] 4 disabled [0.00] 5 disabled [0.00] 6 disabled [0.00] 7 disabled [0.00] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- UC [0.00] total RAM covered: 1023M [0.00] Found optimal setting for mtrr clean up [0.00] gran_size: 64K chunk_size: 1M num_reg: 2 lose cover RAM: 0G [0.00] found SMP MP-table at [mem 0x000f6680-0x000f668f] mapped at [(ptrval)] [0.00] initial memory mapped: [mem 0x-0x04ff] [0.00] Base memory trampoline at [(ptrval)] 97000 size 16384 [0.00] BRK [0x04d97000, 0x04d97fff] PGTABLE [0.00] ACPI: Early table checksum verification disabled [0.00] ACPI: RSDP 0x000F66B0 14 (v00 PTLTD ) [0.00] ACPI: RSDT 0x3FF75B79 38 (v01 PTLTDRSDT 0604 LTP ) [0.00] ACPI: FACP 0x3FF79E69 74 (v01 INTEL CANTWOOD 0604 PTL 0003) [0.00] ACPI: DSDT 0x3FF75BB1 0042B8 (v01 INTEL CANTWOOD 0604 MSFT 010B) [0.00] ACPI: FACS 0x3FF7AFC0 40 [0.00] ACPI: SPCR 0x3FF79EDD 50 (v01 PTLTD $UCRTBL$ 0604 PTL 0001) [0.00] ACPI: APIC 0x3FF79F2D 74 (v01 PTLTD ? APIC 0604 LTP ) [0.00] ACPI: BOOT 0x3FF79FA1 28 (v01 PTLTD $SBFTBL$ 0604 LTP 0001) [0.00] ACPI: SSDT 0x3FF79FC9 37 (v01 PTLTD ACPIHT 0604 LTP 0001) [0.00] ACPI: Local APIC address 0xfee0 [0.00] 135MB HIGHMEM available. [0.00] 887MB LOWMEM available. [0.00] mapped low ram: 0 - 377fe000 [0.00] low ram: 0 - 377fe000 [0.00] tsc: Fast TSC calibration using PIT [0.00] BRK [0x04d98000, 0x04d98fff] PGTABLE [0.00] Zone ranges: [0.00] DMA [mem 0x1000-0x00ff] [0.00]
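The journald assertion quoted in this report fires when clock_gettime() returns an error. A quick userspace probe along the following lines can show whether the call fails for a given clock on an affected machine. This is only a sketch: the report does not say which clock id was involved, so the list of clocks below is an assumption, and probe_clocks is a made-up helper name.

```python
import time

# Clocks to try; which one systemd-journald was reading is not stated in the report.
CLOCK_NAMES = ["CLOCK_REALTIME", "CLOCK_MONOTONIC", "CLOCK_BOOTTIME"]


def probe_clocks(names=CLOCK_NAMES):
    """Call clock_gettime() for each named clock and record the value or the error."""
    results = {}
    for name in names:
        clock_id = getattr(time, name, None)
        if clock_id is None:
            results[name] = "not available in this Python build"
            continue
        try:
            results[name] = time.clock_gettime(clock_id)
        except OSError as exc:
            # This is the situation the journald assertion complains about.
            results[name] = "failed: %s" % exc
    return results


if __name__ == "__main__":
    for name, value in probe_clocks().items():
        print(name, value)
```

Running it once on a good kernel and once on a bad kernel on the same machine would narrow down whether every clock is affected or only some.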
4.18-rc1: Bad or missing .orc_unwind table. Disabling unwinder.
HP Proliant DL360 G6 displays the following on bootup with 4.18.0-rc1-00023-g9ffc59d57228 (4.17 did not display this warning): [0.00] WARNING: WARNING: Bad or missing .orc_unwind table. Disabling unwinder. Debian unstable, gcc 7.3.0-21, config below. # # Automatically generated file; DO NOT EDIT. # Linux/x86 4.18.0-rc1 Kernel Configuration # # # Compiler: gcc (Debian 7.3.0-21) 7.3.0 # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=28 CONFIG_ARCH_MMAP_RND_BITS_MAX=32 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_ARCH_HAS_FILTER_PGPROT=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_HAVE_INTEL_TXT=y CONFIG_X86_64_SMP=y CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=4 CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=70300 CONFIG_CLANG_VERSION=0 CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # 
CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set CONFIG_KERNEL_XZ=y # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y # CONFIG_USELIB is not set CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_GENERIC_IRQ_MIGRATION=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_GENERIC_MSI_IRQ_DOMAIN=y CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y CONFIG_GENERIC_IRQ_RESERVATION_MODE=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y # CONFIG_GENERIC_IRQ_DEBUGFS is not set CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ_FULL is not set # CONFIG_NO_HZ is not set CONFIG_HIGH_RES_TIMERS=y # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set # CONFIG_IRQ_TIME_ACCOUNTING is not set # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # CONFIG_CPU_ISOLATION is not set # # RCU Subsystem # CONFIG_TREE_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y CONFIG_TREE_SRCU=y CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_NEED_SEGCBLIST=y # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=17 CONFIG_LOG_CPU_MAX_BUF_SHIFT=12 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13 
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y CONFIG_ARCH_SUPPORTS_INT128=y CONFIG_NUMA_BALANCING=y CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y CONFIG_CGROUPS=y CONFIG_PAGE_COUNTER=y CONFIG_MEMCG=y CONFIG_MEMCG_SWAP=y # CONFIG_MEMCG_SWAP_ENABLED is not set CONFIG_BLK_CGROUP=y # CONFIG_DEBUG_BLK_CGROUP is not set CONFIG_CGROUP_WRITEBACK=y CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_CFS_BANDWIDTH=y # CONFIG_RT_GROUP_SCHED is not set CONFIG_CGROUP_PIDS=y # CONFIG_CGROUP_RDMA is not set CONFIG_CGROUP_FREEZER=y CONFIG_CGROUP_HUGETLB=y CONFIG_CPUSETS=y CONFIG_PROC_PID_CPUSET=y CONFIG_CGROUP_DEVICE=y CONFIG_CGROUP_CPUACCT=y CONFIG_CGROUP_PERF=y CONFIG_CGROUP_BPF=y # CONFIG_CGROUP_DEBUG is not set CONFIG_SOCK_CGROUP_DATA=y CONFIG_NAMESPACES=y CONFIG_UTS_NS=y CONFIG_IPC_NS=y CONFIG_USER_NS=y CONFIG_PID_NS=y CONFIG_NET_NS=y # CONFIG_SCHED_AUTOGROUP is not set # CONFIG_SYSFS_DEPRECATED is not set # CONFIG_RELAY is n
iomi-si UBSAN warning and NULL pointer dereference
: 0006 R12: c0181aa8 [7.611822] R13: R14: 8e8e3b2df240 R15: c0181260 [7.611894] FS: 7fef3a80b8c0() GS:8e8e3dd0() knlGS: [7.611988] CS: 0010 DS: ES: CR0: 80050033 [7.612067] CR2: CR3: 3ab1a000 CR4: 000006e0 -- Meelis Roos (mr...@linux.ee)
4.17.0-10146-gf0dc7f9c6dd9: hw csum failure on powerpc+sungem
I am seeing this on a PowerMac G4 with the sungem ethernet driver. 4.17 was OK, 4.17.0-10146-gf0dc7f9c6dd9 is problematic.

[ 140.518664] eth0: hw csum failure
[ 140.518699] CPU: 0 PID: 1237 Comm: postconf Not tainted 4.17.0-10146-gf0dc7f9c6dd9 #83
[ 140.518707] Call Trace:
[ 140.518734] [effefd90] [c03d6db8] __skb_checksum_complete+0xd8/0xdc (unreliable)
[ 140.518759] [effefdb0] [c04c1284] icmpv6_rcv+0x248/0x4ec
[ 140.518775] [effefdd0] [c049a448] ip6_input_finish.constprop.0+0x11c/0x5f4
[ 140.518786] [effefe10] [c049b1c0] ip6_mc_input+0xcc/0x100
[ 140.518807] [effefe20] [c03e110c] __netif_receive_skb_core+0x310/0x944
[ 140.518820] [effefe70] [c03e76ec] napi_gro_receive+0xd0/0xe8
[ 140.518845] [effefe80] [f3e1f66c] gem_poll+0x618/0x1274 [sungem]
[ 140.518856] [effeff30] [c03e6f0c] net_rx_action+0x198/0x374
[ 140.518872] [effeff90] [c0501a88] __do_softirq+0x120/0x278
[ 140.518890] [effeffe0] [c0036188] irq_exit+0xd8/0xdc
[ 140.518908] [effefff0] [c000f478] call_do_irq+0x24/0x3c
[ 140.518925] [d05a5d30] [c0007120] do_IRQ+0x74/0xf0
[ 140.518941] [d05a5d50] [c0012474] ret_from_except+0x0/0x14
[ 140.518960] --- interrupt: 501 at copy_page+0x40/0x90
    LR = copy_user_page+0x18/0x30
[ 140.518973] [d05a5e10] [d058cd80] 0xd058cd80 (unreliable)
[ 140.518989] [d05a5e20] [c00fa2bc] wp_page_copy+0xec/0x654
[ 140.519002] [d05a5e60] [c00fd3a4] do_wp_page+0xa8/0x5b4
[ 140.519013] [d05a5e90] [c00fe934] handle_mm_fault+0x564/0xa84
[ 140.519025] [d05a5f00] [c0016230] do_page_fault+0x1bc/0x7e8
[ 140.519037] [d05a5f40] [c0012300] handle_page_fault+0x14/0x40
[ 140.519048] --- interrupt: 301 at 0xb78b6864
    LR = 0xb78b6c54

--
Meelis Roos (mr...@linux.ee)
Re: 85f1abe001 ("kthread, sched/wait: Fix kthread_parkme() .."): WARNING: CPU: 0 PID: 1 at kernel/kthread.c:486 kthread_park
I had the same kthread_parkme warning on many machines I tested with 4.17.0-rc6-00158-gbee797529d7c (x86, amd64, sparc, parisc, alpha). Your patch https://lkml.org/lkml/2018/5/4/212 fixed the problem for me.

Sorry for the off-thread response; I only found your mail on the web.

--
Meelis Roos (mr...@linux.ee)
Re: [PATCH v1 0/4] sparc/PCI: VGA resource and other fixes
> [+cc sparclinux, sorry I missed this first time around]
>
> sparc/PCI: Use dev_printk() when possible

This patch causes compile errors for me:

  CC      arch/sparc/kernel/pci.o
In file included from ./include/linux/pci.h:31:0,
                 from arch/sparc/kernel/pci.c:18:
arch/sparc/kernel/pci.c: In function ‘pcibios_enable_device’:
arch/sparc/kernel/pci.c:754:53: error: ‘old_cmd’ undeclared (first use in this function)
   pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd);
                                                     ^
./include/linux/device.h:1384:58: note: in definition of macro ‘dev_info’
 #define dev_info(dev, fmt, arg...) _dev_info(dev, fmt, ##arg)
                                                          ^
arch/sparc/kernel/pci.c:754:3: note: in expansion of macro ‘pci_info’
   pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd);
   ^
arch/sparc/kernel/pci.c:754:53: note: each undeclared identifier is reported only once for each function it appears in
   pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd);
                                                     ^
./include/linux/device.h:1384:58: note: in definition of macro ‘dev_info’
 #define dev_info(dev, fmt, arg...) _dev_info(dev, fmt, ##arg)
                                                          ^
arch/sparc/kernel/pci.c:754:3: note: in expansion of macro ‘pci_info’
   pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd);
   ^
scripts/Makefile.build:312: recipe for target 'arch/sparc/kernel/pci.o' failed

--
Meelis Roos (mr...@linux.ee)
Re: 4.16-rc2+git: pata_serverworks: hanging ata detection thread on HP DL380G3
> Hello, Meelis. > > Can you please verify whether the following patch fixes the problem? > > Thanks. > > Subject: blk-mq: Directly schedule q->timeout_work when aborting a request Yes, this patch on top of 4.16 fixes it for me. dmesg shows CD detection works fast now: [2.278383] libata version 3.00 loaded. [2.292212] scsi host1: pata_serverworks [2.292618] scsi host2: pata_serverworks [2.292844] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x2000 irq 14 [2.292973] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x2008 irq 15 [...] [2.578720] ata1.00: ATAPI: COMPAQ CD-ROM SN-124, N104, max PIO4 [2.583705] ata1.00: configured for PIO4 [2.584526] scsi 1:0:0:0: CD-ROMCOMPAQ CD-ROM SN-124N104 PQ: 0 ANSI: 5 [2.812963] scsi 1:0:0:0: Attached scsi generic sg3 type 5 [...] [3.179602] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray [3.179826] cdrom: Uniform CD-ROM driver Revision: 3.20 [3.180198] sr 1:0:0:0: Attached scsi CD-ROM sr0 config at the last step of bisection: # # Automatically generated file; DO NOT EDIT. 
# Linux/x86 4.15.0-rc4 Kernel Configuration # # CONFIG_64BIT is not set CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_BITS_MAX=16 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_SMP=y CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=3 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set CONFIG_KERNEL_XZ=y # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y # CONFIG_USELIB is not set CONFIG_AUDIT=y 
CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_GENERIC_IRQ_MIGRATION=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_GENERIC_MSI_IRQ_DOMAIN=y CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y CONFIG_GENERIC_IRQ_RESERVATION_MODE=y # CONFIG_IRQ_DOMAIN_DEBUG is not set CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y # CONFIG_GENERIC_IRQ_DEBUGFS is not set CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ is not set CONFIG_HIGH_RES_TIMERS=y # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_IRQ_TIME_ACCOUNTING is not set # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # CONFIG_CPU_ISOLATION is not set # # RCU Subsystem # CONFIG_TREE_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y CONFIG_TREE_SRCU=y # CONFIG_TASKS_RCU is not set CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_NEED_SEGCBLIST=y # CONFIG_BUILD_BIN2C is not set # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=17 CONFIG_LOG_CPU_MAX_BUF_SHIFT=12 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y CONFIG_CGROUPS=y # CONFIG_MEMCG is not set # CONFIG_BLK_CGROUP is not set # CONFIG_CGROUP_SCHED is not set # CONFIG_CGROUP_PIDS is not set # CONFIG_CGROUP_RDMA is not set # CONFIG
Re: 4.16-rc2+git: pata_serverworks: hanging ata detection thread on HP DL380G3
Added CC-s, start of the thread is at https://lkml.org/lkml/2018/2/26/165

> > > 4.16 git bootup on HP Proliant DL380 G3 pauses for a minute or two and
> > > then continues with "blocked for more than 120 seconds" message with
> > > libata detection functions in the stack -
> > > async_synchronize_cookie_domain() as the last. It seems to happen during
> > > IDE CD-ROM detection (detected before but registered as sr0 after the
> > > warning). After detection, the eject button on the drive did not work.
> > >
> > >
> > > pata_serverworks is the libata driver in use.
>
> There were no changes to pata_serverworks since 2014 and libata changes
> in v4.16 look obviously correct..
>
> > This is still the same in 4.16.0-rc7-00062-g0b412605ef5f.
>
> Any chance that you could bisect this issue?

Bisected to the following commit:

358f70da49d77c43f2ca11b5da584213b2add29c is the first bad commit
commit 358f70da49d77c43f2ca11b5da584213b2add29c
Author: Tejun Heo
Date: Tue Jan 9 08:29:50 2018 -0800

    blk-mq: make blk_abort_request() trigger timeout path

    With issue/complete and timeout paths now using the generation number
    and state based synchronization, blk_abort_request() is the only one
    which depends on REQ_ATOM_COMPLETE for arbitrating completion.

    There's no reason for blk_abort_request() to be a completely separate
    path. This patch makes blk_abort_request() piggyback on the timeout
    path instead of trying to terminate the request directly.

    This removes the last dependency on REQ_ATOM_COMPLETE in blk-mq.

    Note that this makes blk_abort_request() asynchronous - it initiates
    abortion but the actual termination will happen after a short while,
    even when the caller owns the request. AFAICS, SCSI and ATA should be
    fine with that and I think mtip32xx and dasd should be safe but not
    completely sure. It'd be great if people who know the drivers take a
    look.

    v2: - Add comment explaining the lack of synchronization around
          ->deadline update as requested by Bart.

    Signed-off-by: Tejun Heo
    Cc: Asai Thambi SP
    Cc: Stefan Haberland
    Cc: Jan Hoeppner
    Cc: Bart Van Assche
    Signed-off-by: Jens Axboe

:04 04 b5c8c2fd69850021865071f9641d54ab4fd20a15 e2dbd2a15a6baeec1332cc1416e51d537ff5040a M block

--
Meelis Roos (mr...@linux.ee)
Re: 4.16-rc2+git: pata_serverworks: hanging ata detection thread on HP DL380G3
> On Thursday, March 29, 2018 11:54:09 AM Meelis Roos wrote:
> > > 4.16 git bootup on HP Proliant DL380 G3 pauses for a minute or two and
> > > then continues with "blocked for more than 120 seconds" message with
> > > libata detection functions in the stack -
> > > async_synchronize_cookie_domain() as the last. It seems to happen during
> > > IDE CD-ROM detection (detected before but registered as sr0 after the
> > > warning). After detection, the eject button on the drive did not work.
> > >
> > >
> > > pata_serverworks is the libata driver in use.
>
> There were no changes to pata_serverworks since 2014 and libata changes
> in v4.16 look obviously correct..
>
> > This is still the same in 4.16.0-rc7-00062-g0b412605ef5f.
>
> Any chance that you could bisect this issue?

Yes, will do.

--
Meelis Roos (mr...@linux.ee)
Re: 4.16-rc2+git: pata_serverworks: hanging ata detection thread on HP DL380G3
> 4.16 git bootup on HP Proliant DL380 G3 pauses for a a minute or two and > then continues with "blocked for more than 120 seconds" message with > libata detection functions in ther stack - > async_synchronize_cookie_domain() as the last. It seems to happen during > IDE CD-ROM detection (detected before but registered as sr0 after the > warning). After detection, the eject button on the drive did not work. > > > pata_serverworks is the libata driver in use. This is still the same in 4.16.0-rc7-00062-g0b412605ef5f. > [ 242.652061] INFO: task kworker/u8:4:613 blocked for more than 120 seconds. > [ 242.652230] Not tainted 4.16.0-rc2-00374-g3664ce2d9309 #36 > [ 242.654171] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 242.654386] kworker/u8:4D0 613 2 0x8000 > [ 242.654517] Workqueue: events_unbound async_run_entry_fn > [ 242.654637] Call Trace: > [ 242.654759] __schedule+0x1bc/0x8d3 > [ 242.654877] ? set_next_entity+0xc1/0x39a > [ 242.654994] schedule+0x28/0xb2 > [ 242.655096] async_synchronize_cookie_domain+0xac/0xf4 > [ 242.655217] ? __clear_rsb+0x1d/0x32 > [ 242.655334] ? wait_woken+0xb7/0xb7 > [ 242.655449] async_synchronize_cookie+0xd/0x15 > [ 242.655583] async_port_probe+0x57/0x87 [libata] > [ 242.655703] ? __clear_rsb+0xd/0x32 > [ 242.655825] ? ata_port_probe+0x52/0x52 [libata] > [ 242.655945] async_run_entry_fn+0x49/0x1f2 > [ 242.656075] process_one_work+0x20a/0x568 > [ 242.656191] worker_thread+0x4c/0x631 > [ 242.656312] kthread+0x140/0x1e4 > [ 242.656428] ? process_one_work+0x568/0x568 > [ 242.656547] ? kthread_create_on_node+0x23/0x23 > [ 242.656667] ret_from_fork+0x2e/0x38 > [ 242.656793] INFO: task systemd-udevd:803 blocked for more than 120 seconds. > [ 242.656920] Not tainted 4.16.0-rc2-00374-g3664ce2d9309 #36 > [ 242.657039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. 
> [ 242.657257] systemd-udevd D0 803758 0x8004 > [ 242.657379] Call Trace: > [ 242.657495] __schedule+0x1bc/0x8d3 > [ 242.657614] ? kfree_skbmem+0x65/0x85 > [ 242.657730] schedule+0x28/0xb2 > [ 242.657846] async_synchronize_cookie_domain+0xac/0xf4 > [ 242.657968] ? wait_woken+0xb7/0xb7 > [ 242.658082] async_synchronize_full+0x14/0x16 > [ 242.658206] do_init_module+0x10f/0x24b > [ 242.658323] load_module+0x29c9/0x3865 > [ 242.658443] ? kernel_read+0x50/0xa7 > [ 242.658558] SyS_finit_module+0x78/0x8d > [ 242.658681] do_fast_syscall_32+0xc7/0x323 > [ 242.658800] entry_SYSENTER_32+0x4e/0x7c > [ 242.658916] EIP: 0xb7f0cad5 > [ 242.659030] EFLAGS: 0292 CPU: 0 > [ 242.659145] EAX: ffda EBX: 000d ECX: b7d03bdd EDX: > [ 242.659265] ESI: 011ba740 EDI: 011bdb50 EBP: ESP: bfcf1bcc > [ 242.659388] DS: 007b ES: 007b FS: GS: 0033 SS: 007b > [ 244.422337] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready > [ 246.012767] tg3 :02:01.0 eth0: Link is up at 100 Mbps, full duplex > [ 246.012875] tg3 :02:01.0 eth0: Flow control is off for TX and off for > RX > [ 246.012990] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > [ 316.432903] scsi 1:0:0:0: Attached scsi generic sg3 type 5 > [ 316.667528] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda > tray > [ 316.667571] cdrom: Uniform CD-ROM driver Revision: 3.20 > [ 316.667837] sr 1:0:0:0: Attached scsi CD-ROM sr0 > [ 4097.814125] random: crng init done > > -- Meelis Roos (mr...@linux.ee)
Re: 4.15-rc9 new insecure W+X mapping warning
> > This is Intel SE7520JR22S mainboard with 2 64-bit P4 xeons. Earlier > > kernels up to 4.14 have had W+X checking on but found nothing. Now I > > tried 4.15.0-rc9-00023-g1f07476ec143 and it gives a new W+X warning. > > This still happens in 4.15 and 4.16-rc2+. > > What can I do to help resolving it? Below is kernel_page_tables from debugfs. > > [ 10.880663] [ cut here ] > > [ 10.880755] x86/mm: Found insecure W+X mapping at address > > d051fb08/0x8800 > > [ 10.880900] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:266 > > note_page+0x718/0xb89 > > [ 10.881035] Modules linked in: > > [ 10.881128] CPU: 2 PID: 1 Comm: swapper/0 Not tainted > > 4.15.0-rc9-00023-g1f07476ec143 #104 > > [ 10.881264] Hardware name: Intel > > /SE7520JR22S, BIOS SE7520JR22.86B.P.10.00.0087.120820051348 12/08/2005 > > [ 10.881405] RIP: 0010:note_page+0x718/0xb89 > > [ 10.881491] RSP: :c9013e48 EFLAGS: 00010296 > > [ 10.881578] RAX: 0051 RBX: c9013ec8 RCX: > > 8164f938 > > [ 10.881666] RDX: 0001 RSI: 0092 RDI: > > 82b468cc > > [ 10.881756] RBP: 0061 R08: 0177 R09: > > 01d7 > > [ 10.881844] R10: 0720072007200720 R11: 0720072007200720 R12: > > > > [ 10.881932] R13: R14: 0001 R15: > > 88099000 > > [ 10.882022] FS: () GS:88003fc8() > > knlGS: > > [ 10.882156] CS: 0010 DS: ES: CR0: 80050033 > > [ 10.882243] CR2: CR3: 0200a000 CR4: > > 06e0 > > [ 10.882331] Call Trace: > > [ 10.882423] ptdump_walk_pgd_level_core+0x367/0x3a5 > > [ 10.882511] ptdump_walk_pgd_level_checkwx+0x10/0x3e > > [ 10.882602] kernel_init+0x2e/0x10f > > [ 10.882688] ? rest_init+0xb9/0xb9 > > [ 10.882775] ret_from_fork+0x35/0x40 > > [ 10.882861] Code: fb ff ff 41 f7 c7 00 10 00 00 0f 85 e2 fe ff ff e9 36 > > fd ff ff c6 05 7d 45 6f 01 01 48 89 f2 48 c7 c7 08 5b ee 81 e8 4b d6 00 00 > > <0f> ff 48 8b 73 10 e9 bc f9 ff ff 4d 85 ed 0f 84 b9 01 00 00 41 > > [ 10.883103] ---[ end trace bc3e2cf1a1adfa39 ]--- > > [ 10.896336] x86/mm: Checked W+X mappings: FAILED, 266243 W+X pages found. 
> > [ 10.896430] x86/mm: Checking user space page tables > > [ 10.909522] x86/mm: Checked W+X mappings: FAILED, 56 W+X pages found. ---[ User Space ]--- 0x-0x800016777088T pgd ---[ Kernel Space ]--- 0x8000-0x8800 8T pgd ---[ Low Kernel Mapping ]--- 0x8800-0x88099000 612K RW GLB x pte 0x88099000-0x8809b000 8K ro GLB x pte 0x8809b000-0x88201428K RW GLB x pte 0x8820-0x88000100 14M RW PSE GLB x pmd 0x88000100-0x88000180 8M ro PSE GLB x pmd 0x88000180-0x880001892000 584K ro GLB x pte 0x880001892000-0x880001a01464K RW GLB x pte 0x880001a0-0x880001b520001352K ro GLB x pte 0x880001b52000-0x880001c0 696K RW GLB x pte 0x880001c0-0x8800dfe03554M RW PSE GLB x pmd 0x8800dfe0-0x8800dffe1920K RW GLB x pte 0x8800dffe-0x8800e000 128K pte 0x8800e000-0x8801 512M pmd 0x8801-0x88030ec08428M RW PSE GLB x pmd 0x88030ec0-0x88030ec12000 72K RW GLB x pte 0x88030ec12000-0x88030ec1a000 32K ro GLB x pte 0x88030ec1a000-0x88030ec28000 56K RW GLB x pte 0x88030ec28000-0x88030ec3 32K ro GLB x pte 0x88030ec3-0x88030ec37000 28K RW GLB x pte 0x88030ec37000-0x88030ec4c000 84K ro GLB x pte 0x88030ec4c000-0x88030ec5 16K RW GLB x pte 0x88030ec5-0x88030ec54000 16K ro GLB x pte 0x88030ec54000-0x88030ec82000 184K RW GLB x pte 0x88030ec82000-0x88030ec84000 8K ro GLB x pte 0x88030ec84000-0x88030ec92000 56K RW GLB x pte 0x88030ec92000-0x88030ec97000 20K ro GLB x pte 0
Re: 4.15-rc9 new insecure W+X mapping warning
> This is Intel SE7520JR22S mainboard with 2 64-bit P4 xeons. Earlier
> kernels up to 4.14 have had W+X checking on but found nothing. Now I
> tried 4.15.0-rc9-00023-g1f07476ec143 and it gives a new W+X warning.

Actually, I was wrong about earlier kernels - I just did not have CONFIG_DEBUG_WX turned on before, so the earlier kernels did not check for it. I recompiled 4.14 with CONFIG_DEBUG_WX=y and the problem is there as well.

So this is not a Linux regression but a peculiarity of the SE7520JR22S, it seems. Is there anything that Linux might be doing wrong?

> [ 10.880663] [ cut here ]
> [ 10.880755] x86/mm: Found insecure W+X mapping at address
> d051fb08/0x8800
> [ 10.880900] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:266
> note_page+0x718/0xb89
> [ 10.881035] Modules linked in:
> [ 10.881128] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
> 4.15.0-rc9-00023-g1f07476ec143 #104
> [ 10.881264] Hardware name: Intel
> /SE7520JR22S, BIOS SE7520JR22.86B.P.10.00.0087.120820051348 12/08/2005
> [ 10.881405] RIP: 0010:note_page+0x718/0xb89
> [ 10.881491] RSP: :c9013e48 EFLAGS: 00010296
> [ 10.881578] RAX: 0051 RBX: c9013ec8 RCX:
> 8164f938
> [ 10.881666] RDX: 0001 RSI: 0092 RDI:
> 82b468cc
> [ 10.881756] RBP: 0061 R08: 0177 R09:
> 01d7
> [ 10.881844] R10: 0720072007200720 R11: 0720072007200720 R12:
>
> [ 10.881932] R13: R14: 0001 R15:
> 88099000
> [ 10.882022] FS: () GS:88003fc8()
> knlGS:
> [ 10.882156] CS: 0010 DS: ES: CR0: 80050033
> [ 10.882243] CR2: CR3: 0200a000 CR4:
> 06e0
> [ 10.882331] Call Trace:
> [ 10.882423] ptdump_walk_pgd_level_core+0x367/0x3a5
> [ 10.882511] ptdump_walk_pgd_level_checkwx+0x10/0x3e
> [ 10.882602] kernel_init+0x2e/0x10f
> [ 10.882688] ? rest_init+0xb9/0xb9
> [ 10.882775] ret_from_fork+0x35/0x40
> [ 10.882861] Code: fb ff ff 41 f7 c7 00 10 00 00 0f 85 e2 fe ff ff e9 36 fd
> ff ff c6 05 7d 45 6f 01 01 48 89 f2 48 c7 c7 08 5b ee 81 e8 4b d6 00 00 <0f>
> ff 48 8b 73 10 e9 bc f9 ff ff 4d 85 ed 0f 84 b9 01 00 00 41
> [ 10.883103] ---[ end trace bc3e2cf1a1adfa39 ]---
> [ 10.896336] x86/mm: Checked W+X mappings: FAILED, 266243 W+X pages found.
> [ 10.896430] x86/mm: Checking user space page tables
> [ 10.909522] x86/mm: Checked W+X mappings: FAILED, 56 W+X pages found.

--
Meelis Roos (mr...@linux.ee)
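When comparing kernels built with and without CONFIG_DEBUG_WX=y, it can help to pull just the W+X summary lines out of a saved dmesg. The following is a minimal sketch, not part of the original report: the function name wx_results is made up for illustration, the FAILED wording is taken from the log above, and the "passed" wording is assumed to be the message the kernel prints when the check finds nothing.

```python
import re

# Summary lines printed by the kernel's W+X check (CONFIG_DEBUG_WX=y).
# The FAILED form is copied from the dmesg above; the "passed" form is
# assumed to be the wording used when no W+X pages are found.
WX_SUMMARY = re.compile(
    r"x86/mm: Checked W\+X mappings: "
    r"(?:FAILED, (?P<count>\d+) W\+X pages found|passed, no W\+X pages found)"
)


def wx_results(dmesg_text):
    """Return the W+X page count of each check summary, in log order.

    A passing check is reported as 0. An empty list means no summary line
    was found at all, e.g. because the kernel lacks CONFIG_DEBUG_WX.
    """
    results = []
    for match in WX_SUMMARY.finditer(dmesg_text):
        count = match.group("count")
        results.append(int(count) if count is not None else 0)
    return results


if __name__ == "__main__":
    sample = (
        "[ 10.896336] x86/mm: Checked W+X mappings: FAILED, 266243 W+X pages found.\n"
        "[ 10.896430] x86/mm: Checking user space page tables\n"
        "[ 10.909522] x86/mm: Checked W+X mappings: FAILED, 56 W+X pages found.\n"
    )
    print(wx_results(sample))  # prints [266243, 56]
```

An empty result is the case described above for the earlier kernels: no summary line means the check never ran, not that it passed.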
UBSAN warning in nouveau_bios.c:1528:8
This is the first time I have tried UBSAN on this specific machine (onboard nforce 420 with HP BIOS on Nance mainboard). nouveau seems to be working fine but gives this UBSAN warning: [7.953957] nouveau :00:0d.0: NVIDIA C61 (04c000a2) [7.965101] nouveau :00:0d.0: bios: version 05.61.32.25.02 [7.966141] nouveau :00:0d.0: fb: 128 MiB of unknown memory type [8.015336] [TTM] Zone kernel: Available graphics memory: 952564 kiB [8.015339] [TTM] Initializing pool allocator [8.015344] [TTM] Initializing DMA pool allocator [8.015370] nouveau :00:0d.0: DRM: VRAM: 125 MiB [8.015372] nouveau :00:0d.0: DRM: GART: 512 MiB [8.015377] nouveau :00:0d.0: DRM: TMDS table version 1.1 [8.015379] nouveau :00:0d.0: DRM: DCB version 3.0 [8.015382] nouveau :00:0d.0: DRM: DCB outp 00: 01000310 0023 [8.015385] nouveau :00:0d.0: DRM: DCB outp 01: 00110204 98830003 [8.015386] [8.015423] UBSAN: Undefined behaviour in drivers/gpu/drm/nouveau/nouveau_bios.c:1528:8 [8.015455] shift exponent -1 is negative [8.015482] CPU: 1 PID: 148 Comm: systemd-udevd Not tainted 4.16.0-rc3-00167-g97ace515f014 #1 [8.015483] Hardware name: HP-Pavilion RT589AA-ABU t3709.uk/Nance, BIOS 5.02 11/26/2006 [8.015485] Call Trace: [8.015496] dump_stack+0x5a/0x99 [8.015500] ubsan_epilogue+0x9/0x40 [8.015503] __ubsan_handle_shift_out_of_bounds+0x124/0x160 [8.015506] ? _dev_info+0x67/0x90 [8.015509] ? dev_printk_emit+0x49/0x70 [8.015632] parse_dcb_entry+0x91e/0xd90 [nouveau] [8.015712] ? parse_bit_M_tbl_entry+0x150/0x150 [nouveau] [8.015791] olddcb_outp_foreach+0x66/0xa0 [nouveau] [8.015870] nouveau_bios_init+0x23a/0x2250 [nouveau] [8.015950] ? nouveau_ttm_init+0x3a4/0x710 [nouveau] [8.016029] nouveau_drm_load+0x229/0xf10 [nouveau] [8.016033] ? sysfs_do_create_link_sd+0xa6/0x170 [8.016067] drm_dev_register+0x1b7/0x330 [drm] [8.016070] ? 
pci_enable_device_flags+0x160/0x1f0 [8.016091] drm_get_pci_dev+0xee/0x2e0 [drm] [8.016172] nouveau_drm_probe+0x1dd/0x270 [nouveau] [8.016175] pci_device_probe+0x113/0x1d0 [8.016178] driver_probe_device+0x375/0x720 [8.016180] __driver_attach+0xeb/0x150 [8.016181] ? driver_probe_device+0x720/0x720 [8.016183] bus_for_each_dev+0x84/0xe0 [8.016186] bus_add_driver+0x19f/0x340 [8.016188] driver_register+0x67/0x110 [8.016190] ? 0xc0cfb000 [8.016193] do_one_initcall+0x66/0x210 [8.016197] do_init_module+0xa7/0x2a9 [8.016199] load_module+0x2548/0x3d30 [8.016202] ? __symbol_put+0x60/0x60 [8.016205] ? kernel_read_file+0x21b/0x390 [8.016208] ? kernel_read_file_from_fd+0x52/0x90 [8.016210] SYSC_finit_module+0x124/0x150 [8.016212] do_syscall_64+0x7a/0x1f0 [8.016214] ? page_fault+0x2f/0x50 [8.016217] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [8.016219] RIP: 0033:0x7f2e47b82e19 [8.016220] RSP: 002b:7ffdcdc157b8 EFLAGS: 0246 ORIG_RAX: 0139 [8.016223] RAX: ffda RBX: 5638b23c7250 RCX: 7f2e47b82e19 [8.016224] RDX: RSI: 7f2e4788d0ed RDI: 0019 [8.016225] RBP: 7f2e4788d0ed R08: R09: [8.016226] R10: 0019 R11: 0246 R12: [8.016227] R13: 5638b23c2ce0 R14: 0002 R15: 5638b23c7250 [8.016228] [8.016299] nouveau :00:0d.0: DRM: DCB conn 00: [8.016301] nouveau :00:0d.0: DRM: DCB conn 01: 1131 [8.016302] nouveau :00:0d.0: DRM: DCB conn 02: 0110 [8.016304] nouveau :00:0d.0: DRM: DCB conn 03: 0111 [8.016305] nouveau :00:0d.0: DRM: DCB conn 04: 0113 [8.016626] nouveau :00:0d.0: DRM: Saving VGA fonts [8.052781] nouveau :00:0d.0: DRM: DCB type 4 not known [8.052784] nouveau :00:0d.0: DRM: Unknown-1 has no encoders, removing [8.053728] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [8.053729] [drm] Driver supports precise vblank timestamp query. 
[8.055836] nouveau :00:0d.0: DRM: MM: using M2MF for buffer copies [8.084488] nouveau :00:0d.0: DRM: allocated 1280x1024 fb: 0x9000, bo 50f4b5d0 [8.084678] fbcon: nouveaufb (fb0) is primary device [8.193959] Console: switching to colour frame buffer device 160x64 [8.195378] nouveau :00:0d.0: fb0: nouveaufb frame buffer device [8.212083] [drm] Initialized nouveau 1.3.1 20120801 for :00:0d.0 on minor 0 -- Meelis Roos (mr...@linux.ee)
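The "shift exponent -1 is negative" message means a shift count evaluated to -1 at run time. One common way to get there is deriving the count from "index of the lowest set bit" of a mask that turns out to be zero. Whether that is what happens at nouveau_bios.c:1528 is an assumption; the sketch below only illustrates the general pattern and a guard for it, in Python rather than C, with hypothetical function names.

```python
def lowest_set_bit_index(mask):
    """Index of the lowest set bit of mask, or -1 when mask is 0.

    This mirrors the common C idiom ffs(mask) - 1.
    """
    return (mask & -mask).bit_length() - 1


def lowest_bit_only(mask):
    """Return mask reduced to its lowest set bit, guarding the empty case.

    Without the guard, a zero mask would lead to a shift by -1: undefined
    behaviour in C (which UBSAN reports), a ValueError in Python.
    """
    idx = lowest_set_bit_index(mask)
    if idx < 0:
        return 0  # empty mask: nothing to shift
    return 1 << idx


if __name__ == "__main__":
    print(lowest_bit_only(0b1100))  # prints 4
    print(lowest_bit_only(0))       # prints 0
```

The guard makes the empty-mask case explicit instead of relying on whatever the shift happens to produce.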
Re: 4.15-rc9 new insecure W+X mapping warning
> This is Intel SE7520JR22S mainboard with 2 64-bit P4 xeons. Earlier > kernels up to 4.14 have had W+X checking on but found nothing. Now I > tried 4.15.0-rc9-00023-g1f07476ec143 and it gives a new W+X warning. This still happens in 4.15 and 4.16-rc2+. What can I do to help resolving it? > [ 10.880663] [ cut here ] > [ 10.880755] x86/mm: Found insecure W+X mapping at address > d051fb08/0x8800 > [ 10.880900] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:266 > note_page+0x718/0xb89 > [ 10.881035] Modules linked in: > [ 10.881128] CPU: 2 PID: 1 Comm: swapper/0 Not tainted > 4.15.0-rc9-00023-g1f07476ec143 #104 > [ 10.881264] Hardware name: Intel > /SE7520JR22S, BIOS SE7520JR22.86B.P.10.00.0087.120820051348 12/08/2005 > [ 10.881405] RIP: 0010:note_page+0x718/0xb89 > [ 10.881491] RSP: :c9013e48 EFLAGS: 00010296 > [ 10.881578] RAX: 0051 RBX: c9013ec8 RCX: > 8164f938 > [ 10.881666] RDX: 0001 RSI: 0092 RDI: > 82b468cc > [ 10.881756] RBP: 0061 R08: 0177 R09: > 01d7 > [ 10.881844] R10: 0720072007200720 R11: 0720072007200720 R12: > > [ 10.881932] R13: R14: 0001 R15: > 88099000 > [ 10.882022] FS: () GS:88003fc8() > knlGS: > [ 10.882156] CS: 0010 DS: ES: CR0: 80050033 > [ 10.882243] CR2: CR3: 0200a000 CR4: > 06e0 > [ 10.882331] Call Trace: > [ 10.882423] ptdump_walk_pgd_level_core+0x367/0x3a5 > [ 10.882511] ptdump_walk_pgd_level_checkwx+0x10/0x3e > [ 10.882602] kernel_init+0x2e/0x10f > [ 10.882688] ? rest_init+0xb9/0xb9 > [ 10.882775] ret_from_fork+0x35/0x40 > [ 10.882861] Code: fb ff ff 41 f7 c7 00 10 00 00 0f 85 e2 fe ff ff e9 36 fd > ff ff c6 05 7d 45 6f 01 01 48 89 f2 48 c7 c7 08 5b ee 81 e8 4b d6 00 00 <0f> > ff 48 8b 73 10 e9 bc f9 ff ff 4d 85 ed 0f 84 b9 01 00 00 41 > [ 10.883103] ---[ end trace bc3e2cf1a1adfa39 ]--- > [ 10.896336] x86/mm: Checked W+X mappings: FAILED, 266243 W+X pages found. > [ 10.896430] x86/mm: Checking user space page tables > [ 10.909522] x86/mm: Checked W+X mappings: FAILED, 56 W+X pages found. -- Meelis Roos (mr...@linux.ee)
4.16-rc2+git: pata_serverworks: hanging ata detection thread on HP DL380G3
654386] kworker/u8:4D0 613 2 0x8000 [ 242.654517] Workqueue: events_unbound async_run_entry_fn [ 242.654637] Call Trace: [ 242.654759] __schedule+0x1bc/0x8d3 [ 242.654877] ? set_next_entity+0xc1/0x39a [ 242.654994] schedule+0x28/0xb2 [ 242.655096] async_synchronize_cookie_domain+0xac/0xf4 [ 242.655217] ? __clear_rsb+0x1d/0x32 [ 242.655334] ? wait_woken+0xb7/0xb7 [ 242.655449] async_synchronize_cookie+0xd/0x15 [ 242.655583] async_port_probe+0x57/0x87 [libata] [ 242.655703] ? __clear_rsb+0xd/0x32 [ 242.655825] ? ata_port_probe+0x52/0x52 [libata] [ 242.655945] async_run_entry_fn+0x49/0x1f2 [ 242.656075] process_one_work+0x20a/0x568 [ 242.656191] worker_thread+0x4c/0x631 [ 242.656312] kthread+0x140/0x1e4 [ 242.656428] ? process_one_work+0x568/0x568 [ 242.656547] ? kthread_create_on_node+0x23/0x23 [ 242.656667] ret_from_fork+0x2e/0x38 [ 242.656793] INFO: task systemd-udevd:803 blocked for more than 120 seconds. [ 242.656920] Not tainted 4.16.0-rc2-00374-g3664ce2d9309 #36 [ 242.657039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 242.657257] systemd-udevd D0 803758 0x8004 [ 242.657379] Call Trace: [ 242.657495] __schedule+0x1bc/0x8d3 [ 242.657614] ? kfree_skbmem+0x65/0x85 [ 242.657730] schedule+0x28/0xb2 [ 242.657846] async_synchronize_cookie_domain+0xac/0xf4 [ 242.657968] ? wait_woken+0xb7/0xb7 [ 242.658082] async_synchronize_full+0x14/0x16 [ 242.658206] do_init_module+0x10f/0x24b [ 242.658323] load_module+0x29c9/0x3865 [ 242.658443] ? 
kernel_read+0x50/0xa7 [ 242.658558] SyS_finit_module+0x78/0x8d [ 242.658681] do_fast_syscall_32+0xc7/0x323 [ 242.658800] entry_SYSENTER_32+0x4e/0x7c [ 242.658916] EIP: 0xb7f0cad5 [ 242.659030] EFLAGS: 0292 CPU: 0 [ 242.659145] EAX: ffda EBX: 000d ECX: b7d03bdd EDX: [ 242.659265] ESI: 011ba740 EDI: 011bdb50 EBP: ESP: bfcf1bcc [ 242.659388] DS: 007b ES: 007b FS: GS: 0033 SS: 007b [ 244.422337] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 246.012767] tg3 :02:01.0 eth0: Link is up at 100 Mbps, full duplex [ 246.012875] tg3 :02:01.0 eth0: Flow control is off for TX and off for RX [ 246.012990] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 316.432903] scsi 1:0:0:0: Attached scsi generic sg3 type 5 [ 316.667528] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray [ 316.667571] cdrom: Uniform CD-ROM driver Revision: 3.20 [ 316.667837] sr 1:0:0:0: Attached scsi CD-ROM sr0 [ 4097.814125] random: crng init done -- Meelis Roos (mr...@linux.ee)
PCI BAR allocation failures in 4.15 and 4.16-rc2+ on HP DL585
I added an Atto U320 SCSI card into my HP Proliant DL585 (a G1, although the G naming was not used then). I have not tried earlier kernels. In dmesg there are actually two sets of BAR assignment failures; the first bridge may or may not be related.
[0.353439] pci :00:03.0: BAR 15: no space for [mem size 0x0010 pref]
[0.353621] pci :00:03.0: BAR 15: failed to assign [mem size 0x0010 pref]
[...]
[0.355801] pci :04:09.0: PCI bridge to [bus 05]
[0.355801] pci :06:0e.0: BAR 6: no space for [mem size 0x0010 pref]
[0.355916] pci :06:0e.0: BAR 6: failed to assign [mem size 0x0010 pref]
[0.356207] pci :06:0e.1: BAR 6: no space for [mem size 0x0010 pref]
[0.356387] pci :06:0e.1: BAR 6: failed to assign [mem size 0x0010 pref]
[0.356661] pci :04:0a.0: PCI bridge to [bus 06]
[0.356834] pci :04:0a.0: bridge window [io 0x6000-0x6fff]
[0.357013] pci :04:0a.0: bridge window [mem 0xf7f0-0xf7ff]
[0.357197] pci :04:0b.0: PCI bridge to [bus 07]
[0.357375] pci :04:0c.0: PCI bridge to [bus 08]
lspci -vvvxxx and full dmesg are below.
00:03.0 PCI bridge: Advanced Micro Devices, Inc. 
[AMD] AMD-8111 PCI (rev 07) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [c0] HyperTransport: Slave or Primary Interface Command: BaseUnitID=3 UnitCnt=4 MastHost- DefDir- DUL- Link Control 0: CFlE- CST- CFE- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [a0] PCI-X bridge device Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=100MHz Status: Dev=00:07.0 64bit+ 133MHz+ SCD- USC- SCO- SRD- Upstream: Capacity=14 CommitmentLimit=65535 Downstream: Capacity=2 CommitmentLimit=65535 Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration Capabilities: [c0] HyperTransport: Slave or Primary Interface Command: BaseUnitID=7 UnitCnt=2 MastHost- DefDir- DUL- Link Control 0: CFlE- CST- CFE- TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [a0] PCI-X bridge device Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=100MHz Status: Dev=00:08.0 64bit+ 133MHz+ SCD- USC- SCO- SRD- Upstream: Capacity=14 CommitmentLimit=65535 Downstream: Capacity=2 CommitmentLimit=65535 Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration Kernel modules: shpchp 00: 22 10 50 74 47 01 30 02 12 00 04 06 00 40 81 00 10: 00 00 00 00 00 00 00 00 00 03 03 40 f1 01 20 22 20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00 30: 00 00 00 00 a0 00 00 00 00 00 00 00 ff 00 01 00 40: 05 00 1f 00 01 00 00 00 00 00 00 00 01 2c 00 00 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 a0: 07 b8 83 00 40 00 03 00 0e 00 ff ff 02 00 ff ff b0: 00 00 00 00 00 00 00 00 08 00 00 80 00 00 00 06 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00:08.1 PIC: Advanced Micro Devices, Inc. [AMD] AMD-8131 PCI-X IOAPIC (rev 01) (prog-if 10 [IO-APIC]) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- [disabled] Capabilities: [40] PCI-X non-bridge device Command: DPERE- ERO- RBC=2048 OST=1 Status: Dev=02:06.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz- Capabilities: [48] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable+ DSel=0 DScale=1 PME- Capabilities: [50] Vital Product Data
Re: hpsa crashes on boot in 4.16-rc2-00062
> This happens on an HP DL360 G6 with Smart Array 410i.
>
> Will try to bisect.
>
> IO completion timeout could be because of some IRQ troubles?
Reverting 84676c1f21e8ff54befe985f4f14dc1edc10046b fixes it for me (as suggested by Laurence Oberman).
-- Meelis Roos (mr...@linux.ee)
hpsa crashes on boot in 4.16-rc2-00062
This happens on an HP DL360 G6 with Smart Array 410i. Will try to bisect. IO completion timeout could be because of some IRQ troubles?
(sorry, the rest has scrolled away in ilo2 textcons)
[ 242.655025] Call Trace:
[ 242.655077] ? __schedule+0x1dd/0x5e0
[ 242.655130] schedule+0x23/0x70
[ 242.655182] schedule_timeout+0xe1/0x290
[ 242.655236] io_schedule_timeout+0x14/0x40
[ 242.655290] wait_for_completion_io+0xa4/0x120
[ 242.655346] ? wake_up_q+0x70/0x70
[ 242.655401] hpsa_scsi_do_simple_cmd+0xa7/0xf0
[ 242.655456] hpsa_scsi_do_simple_cmd_with_retry+0x4a/0x150
[ 242.655512] hpsa_scsi_do_inquiry+0x5d/0xc0
[ 242.655567] hpsa_scan_start+0xf67/0x1fa0
[ 242.655621] ? sched_clock_local+0x12/0x80
[ 242.655675] ? sched_clock_local+0x12/0x80
[ 242.655729] ? select_idle_sibling+0x21/0x3b0
[ 242.655785] ? do_scsi_scan_host+0x2d/0x90
[ 242.655839] do_scsi_scan_host+0x2d/0x90
[ 242.655892] do_scan_async+0x12/0x180
[ 242.655945] async_run_entry_fn+0x2c/0x140
[ 242.656002] process_one_work+0x1a6/0x320
[ 242.656062] worker_thread+0x26/0x3c0
[ 242.656115] ? create_worker+0x190/0x190
[ 242.656170] kthread+0x107/0x120
[ 242.656222] ? kthread_create_worker_on_cpu+0x70/0x70
[ 242.656278] ret_from_fork+0x35/0x40
-- Meelis Roos (mr...@linux.ee)
Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
> Actually this was brought up to me already, there's a fix on the mailing list > for this I reviewed a little while ago from nvidia that we should pull in: > > https://patchwork.freedesktop.org/patch/203205/ > > Would you guys mind confirming that this patch fixes your issues? It works on my amd64, P4 is still compiling. [1.124987] nouveau :04:05.0: NVIDIA NV05 (20154000) [1.161464] nouveau :04:05.0: bios: version 03.05.00.10.00 [1.161475] nouveau :04:05.0: bios: DCB table not found [1.161535] nouveau :04:05.0: bios: DCB table not found [1.161577] nouveau :04:05.0: bios: DCB table not found [1.161586] nouveau :04:05.0: bios: DCB table not found [1.344008] tsc: Refined TSC clocksource calibration: 2200.078 MHz [1.344024] clocksource: tsc: mask: 0x max_cycles: 0x1fb67c69f81, max_idle_ns: 440795210317 ns [1.344037] clocksource: Switched to clocksource tsc [1.408102] nouveau :04:05.0: tmr: unknown input clock freq [1.409471] nouveau :04:05.0: fb: 32 MiB SDRAM [1.414459] nouveau :04:05.0: DRM: VRAM: 31 MiB [1.414467] nouveau :04:05.0: DRM: GART: 128 MiB [1.414476] nouveau :04:05.0: DRM: BMP version 5.17 [1.414484] nouveau :04:05.0: DRM: No DCB data found in VBIOS [1.415629] nouveau :04:05.0: DRM: Adaptor not initialised, running VBIOS init tables. [1.415829] nouveau :04:05.0: bios: DCB table not found [1.416125] nouveau :04:05.0: DRM: Saving VGA fonts [1.477526] nouveau :04:05.0: DRM: No DCB data found in VBIOS [1.478428] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [1.478438] [drm] Driver supports precise vblank timestamp query. [1.479618] nouveau :04:05.0: DRM: MM: using M2MF for buffer copies [1.517930] nouveau :04:05.0: DRM: allocated 1024x768 fb: 0x4000, bo a09f4d1f [1.519294] nouveau :04:05.0: fb1: nouveaufb frame buffer device [1.519313] [drm] Initialized nouveau 1.3.1 20120801 for :04:05.0 on minor 1 -- Meelis Roos (mr...@linux.ee)
apm_32.c: undefined reference to `cpuidle_poll_state_init'
This is 4.16-rc1+git as of today, on a IBM PC 365 that uses APM instead of ACPI. APM linking fails: MODPOST vmlinux.o arch/x86/kernel/apm_32.o: In function `apm_init': apm_32.c:(.init.text+0x597): undefined reference to `cpuidle_poll_state_init' Config: # # Automatically generated file; DO NOT EDIT. # Linux/x86 4.16.0-rc1 Kernel Configuration # # CONFIG_64BIT is not set CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_BITS_MAX=16 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_LAZY_GS=y CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=2 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is 
not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y # CONFIG_USELIB is not set # CONFIG_AUDIT is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_RESERVATION_MODE=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_IRQ_TIME_ACCOUNTING is not set # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # # RCU Subsystem # CONFIG_TINY_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y CONFIG_TINY_SRCU=y # CONFIG_TASKS_RCU is not set # CONFIG_RCU_STALL_COMMON is not set # CONFIG_RCU_NEED_SEGCBLIST is not set # CONFIG_BUILD_BIN2C is not set # CONFIG_IKCONFIG is not set CONFIG_LOG_BUF_SHIFT=16 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y CONFIG_CGROUPS=y # CONFIG_MEMCG is not set # CONFIG_BLK_CGROUP is not set # CONFIG_CGROUP_SCHED is not set # CONFIG_CGROUP_PIDS is not set # CONFIG_CGROUP_RDMA is not set # CONFIG_CGROUP_FREEZER is not set # CONFIG_CGROUP_HUGETLB is not set # CONFIG_CGROUP_DEVICE is not set # CONFIG_CGROUP_CPUACCT is not set # CONFIG_CGROUP_PERF is not set # CONFIG_CGROUP_BPF is not set # CONFIG_SOCK_CGROUP_DATA is not set CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_IPC_NS is not set # CONFIG_USER_NS is not set # CONFIG_PID_NS is not set # 
CONFIG_NET_NS is not set # CONFIG_SCHED_AUTOGROUP is not set # CONFIG_SYSFS_DEPRECATED is not set # CONFIG_RELAY is not set # CONFIG_BLK_DEV_INITRD is not set CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y CONFIG_ANON_INODES=y CONFIG_HAVE_UID16=y CONFIG_SYSCTL_EXCEPTION_TRACE=y CONFIG_HAVE_PCSPKR_PLATFORM=y CONFIG_BPF=y # CONFIG_EXPERT is not set CONFIG_UID16=y CONFIG_MULTIUSER=y CONFIG_SGETMASK_SYSCALL=y CONFIG_SYSFS_SYSCALL=y # CONFIG_SYSCTL_SYSCALL is not set CONFIG_FHANDLE=y CONFIG_POSIX_TIMERS=y CONFIG_PRINTK=y CONFIG_PRINTK_NMI=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_PCSPKR_PLATFORM=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_FUTEX_PI=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_AIO=y CONFIG_ADVISE_SYSCALLS=y CONFIG_MEMBARRIER=y # CONFIG_CHECKPOINT_RESTORE is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ABSOLUTE_PERCPU is not set CONFIG_KALLSYMS_BASE_RELATIVE=y CONFIG_BPF_SYSCALL
Re: 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
> This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in 4.15:
An NV5 in another PC (secondary card in an x86-64 machine) made the system crash on boot, in nvkm_therm_clkgate_fini.
-- Meelis Roos (mr...@linux.ee)
4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
d_intel8x0 snd_ac97_codec button rng_core ac97_bus snd_pcm snd_timer snd soundcore eeprom adm1031 adm1025 hwmon_vid i2c_core ip_tables x_tables ipv6 autofs4 [7.410357] CPU: 0 PID: 125 Comm: systemd-udevd Not tainted 4.16.0-rc1-00010-g178e834c47b0 #65 [7.410499] Hardware name: /D850GB , BIOS GB85010A.86A.0078.P18.0110081719 10/08/2001 [7.410824] EIP: nvkm_therm_clkgate_fini+0x15/0x174 [nouveau] [7.410921] EFLAGS: 00010286 CPU: 0 [7.411014] EAX: f6b3b800 EBX: ECX: 0006 EDX: 0007 [7.411109] ESI: EDI: EBP: f6155858 ESP: f6155834 [7.411205] DS: 007b ES: 007b FS: GS: 00e0 SS: 0068 [7.411299] CR0: 80050033 CR2: CR3: 3614b000 CR4: 06d0 [7.411395] Call Trace: [7.411662] ? nvkm_device_subdev+0x1b9/0x1fa [nouveau] [7.411926] nvkm_device_fini+0x113/0x3e9 [nouveau] [7.412030] ? ktime_get+0x4b/0x135 [7.412274] ? nvkm_devinit_post+0x35/0xbf [nouveau] [7.412536] nvkm_device_init+0x228/0x5b0 [nouveau] [7.412640] ? kmem_cache_alloc+0xbd/0x12a [7.412906] nvkm_udevice_init+0x51/0xa9 [nouveau] [7.413146] nvkm_object_init+0xc8/0x442 [nouveau] [7.413248] ? check_preempt_wakeup+0xc2/0x1c1 [7.413602] ? nvkm_client_child_new+0x1d/0x38 [nouveau] [7.413956] nvkm_ioctl_new+0x152/0x3d9 [nouveau] [7.414055] ? default_wake_function+0x1a/0x35 [7.414409] ? nvif_vmm_init+0x2ce/0x2ce [nouveau] [7.414788] ? nvkm_udevice_rd08+0x5b/0x5b [nouveau] [7.415150] nvkm_ioctl+0x1c6/0x48d [nouveau] [7.416466] ? nvif_client_init+0xc3/0x114 [nouveau] [7.416832] ? nvkm_client_map+0xf/0xf [nouveau] [7.417201] nvkm_client_ioctl+0x1c/0x22 [nouveau] [7.417554] nvif_object_ioctl+0x6f/0xff [nouveau] [7.417909] nvif_object_init+0xd4/0x1de [nouveau] [7.418271] nvif_device_init+0x21/0x5c [nouveau] [7.418536] nouveau_cli_init+0x21f/0xe1f [nouveau] [7.418799] ? nouveau_drm_load+0x1d/0xe11 [nouveau] [7.419058] nouveau_drm_load+0x54/0xe11 [nouveau] [7.419158] ? kernfs_new_node+0x2b/0x8e [7.419255] ? kernfs_create_link+0x55/0xcd [7.419369] ? 
drm_dev_register+0x12f/0x2e0 [drm] [7.419496] drm_dev_register+0x168/0x2e0 [drm] [7.419596] ? pci_enable_device_flags+0xeb/0x15e [7.419724] drm_get_pci_dev+0xbf/0x230 [drm] [7.420102] nouveau_drm_probe+0x183/0x1ea [nouveau] [7.420207] pci_device_probe+0xaa/0x163 [7.420305] driver_probe_device+0x1db/0x383 [7.420402] __driver_attach+0x86/0xb8 [7.420497] ? driver_probe_device+0x383/0x383 [7.420597] bus_for_each_dev+0x4e/0x83 [7.420694] driver_attach+0x1d/0x33 [7.420790] ? driver_probe_device+0x383/0x383 [7.420886] bus_add_driver+0x184/0x273 [7.420983] driver_register+0x66/0x107 [7.421215] ? nouveau_drm_init+0x66/0x1000 [nouveau] [7.421322] __pci_register_driver+0x47/0x71 [7.421555] nouveau_drm_init+0x18a/0x1000 [nouveau] [7.421654] ? 0xf831a000 [7.421751] do_one_initcall+0x4f/0x1e2 [7.421850] ? free_unref_page_commit.isra.88+0xd5/0x176 [7.421947] ? kvfree+0x3c/0x3e [7.422041] ? __vunmap+0x89/0xef [7.422136] ? do_init_module+0x1a/0x23f [7.422232] do_init_module+0x82/0x23f [7.422329] load_module+0x243c/0x36ae [7.422428] ? kernel_read+0x4c/0xa1 [7.422524] SyS_finit_module+0x78/0x8d [7.422624] do_fast_syscall_32+0xc1/0x31b [7.422722] entry_SYSENTER_32+0x4e/0x7c [7.422817] EIP: 0xb7ee9ad5 [7.422907] EFLAGS: 0296 CPU: 0 [7.423001] EAX: ffda EBX: 0019 ECX: b7ce0bdd EDX: [7.423098] ESI: 00eb6670 EDI: 00ebe610 EBP: ESP: bff8704c [7.423195] DS: 007b ES: 007b FS: GS: 0033 SS: 007b [7.423291] Code: e9 30 ff ff ff 31 d2 b8 78 cf b0 f8 e8 ba 07 a2 c8 e9 0f ff ff ff 55 89 e5 57 56 53 83 ec 18 89 c3 89 d6 85 c0 0f 84 2c 01 00 00 <8b> 3b 85 ff 0f 84 11 01 00 00 8b 47 30 85 c0 0f 84 a1 00 00 00 [7.423757] EIP: nvkm_therm_clkgate_fini+0x15/0x174 [nouveau] SS:ESP: 0068:f6155834 [7.423899] CR2: [7.424033] ---[ end trace cad535783d11d7b9 ]--- -- Meelis Roos (mr...@linux.ee)
Re: pata-macio WARNING at dmam_alloc_coherent+0xec/0x110
> Does this fix your warning?
>
> diff --git a/drivers/macintosh/macio_asic.c b/drivers/macintosh/macio_asic.c
> index 62f541f968f6..07074820a167 100644
> --- a/drivers/macintosh/macio_asic.c
> +++ b/drivers/macintosh/macio_asic.c
> @@ -375,6 +375,7 @@ static struct macio_dev * macio_add_one_device(struct macio_chip *chip,
>  dev->ofdev.dev.of_node = np;
>  dev->ofdev.archdata.dma_mask = 0xUL;
>  dev->ofdev.dev.dma_mask = &dev->ofdev.archdata.dma_mask;
> + dev->ofdev.dev.coherent_dma_mask = dev->ofdev.archdata.dma_mask;
>  dev->ofdev.dev.parent = parent;
>  dev->ofdev.dev.bus = &macio_bus_type;
>  dev->ofdev.dev.release = macio_release_dev;
Yes, it does - thank you!
Tested-by: Meelis Roos
-- Meelis Roos (mr...@linux.ee)
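To illustrate why the one-line change in the quoted patch can silence the warning, here is a minimal user-space sketch. It assumes (this is an assumption, not something stated in the thread) that the check behind the warning fires when a device's coherent DMA mask has been left at zero. The struct and both helpers are hypothetical mocks for illustration, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock holding only the two mask fields discussed in the patch.
 * This is NOT the kernel's struct device. */
struct mock_device {
    uint64_t *dma_mask;          /* points at the streaming DMA mask */
    uint64_t coherent_dma_mask;  /* mask used for coherent allocations */
};

/* Assumed form of the check: complain when a device exists but its
 * coherent mask was never set. Returns 1 if the warning would fire. */
static int coherent_mask_missing(const struct mock_device *dev)
{
    return dev != NULL && dev->coherent_dma_mask == 0;
}

/* Hypothetical setup routine modelled on the quoted hunk.
 * With apply_fix == 0 only dma_mask is wired up (the old behaviour);
 * with apply_fix != 0 the coherent mask is copied from the same value,
 * mirroring the added "+" line. */
static void mock_setup(struct mock_device *dev, uint64_t *mask_storage,
                       int apply_fix)
{
    dev->dma_mask = mask_storage;
    dev->coherent_dma_mask = 0;
    if (apply_fix)
        dev->coherent_dma_mask = *mask_storage;
}
```

Calling mock_setup() without the fix and then coherent_mask_missing() reports the missing mask; with the fix applied the check passes, which matches the Tested-by above.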
pata-macio WARNING at dmam_alloc_coherent+0xec/0x110
I tested 4.16-rc1 on my PowerMac G4 and got the following warning from macio pata driver. Since pata-macio has no recent changes, dma-mapping.h changes seem to be related. [0.228408] MacIO PCI driver attached to Keylargo chipset [1.283931] pata-macio 0.0001f000:ata-4: Activating pata-macio chipset KeyLargo ATA-4, Apple bus ID 2 [1.284398] WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 dmam_alloc_coherent+0xec/0x110 [1.284689] Modules linked in: [1.284797] CPU: 0 PID: 1 Comm: swapper Not tainted 4.16.0-rc1 #60 [1.284991] NIP: c03259ec LR: c0325948 CTR: [1.285150] REGS: ef047c10 TRAP: 0700 Not tainted (4.16.0-rc1) [1.285337] MSR: 00029032 CR: 24fff228 XER: 2000 [1.285559] GPR00: c0325948 ef047cc0 ef048000 ef1321b0 ef1321bc GPR08: c04f1bd0 22fff884 c0004c80 GPR16: c066 c05f0960 GPR24: 0007 c063d7a8 ef1e59ac 1020 ef1321b0 ef135c18 014000c0 [1.303085] NIP [c03259ec] dmam_alloc_coherent+0xec/0x110 [1.308751] LR [c0325948] dmam_alloc_coherent+0x48/0x110 [1.314511] Call Trace: [1.320187] [ef047cc0] [c0325948] dmam_alloc_coherent+0x48/0x110 (unreliable) [1.326133] [ef047ce0] [c0370a90] pata_macio_port_start+0x44/0xb8 [1.332110] [ef047d00] [c0355ed4] ata_host_start.part.5+0x138/0x254 [1.338100] [ef047d30] [c035c1e8] ata_host_activate+0x84/0x1a0 [1.344007] [ef047d50] [c0371214] pata_macio_common_init+0x3b0/0x608 [1.349890] [ef047db0] [c0336f9c] macio_device_probe+0x60/0x120 [1.355761] [ef047dd0] [c031868c] driver_probe_device+0x25c/0x35c [1.361576] [ef047e00] [c031887c] __driver_attach+0xf0/0xf4 [1.367320] [ef047e20] [c0316340] bus_for_each_dev+0x80/0xc0 [1.373051] [ef047e50] [c031782c] bus_add_driver+0x144/0x258 [1.378805] [ef047e70] [c03190dc] driver_register+0x8c/0x140 [1.384580] [ef047e80] [c060ce14] pata_macio_init+0x5c/0x8c [1.390303] [ef047ea0] [c0004aa0] do_one_initcall+0x48/0x18c [1.396000] [ef047f00] [c05f1214] kernel_init_freeable+0x12c/0x1ec [1.401615] [ef047f30] [c0004c98] kernel_init+0x18/0x128 [1.407208] [ef047f40] [c00122e4] 
ret_from_kernel_thread+0x5c/0x64 [1.412829] Instruction dump: [1.418409] 939d 4bff6329 80010024 7fe3fb78 8361000c 83810010 7c0803a6 83a10014 [1.424201] 83c10018 83e1001c 38210020 4e800020 <0fe0> 4b84 7fa3eb78 3be0 [1.430020] ---[ end trace 89c0f4a91a110769 ]--- -- Meelis Roos (mr...@linux.ee)
Re: 4.15: WARNING: CPU: 3 PID: 258 at kernel/irq/chip.c:244 __irq_startup+0x80/0x100
> > I'll do a proper fix and queue it so your museum is kept alive. Thank you. > Museum, space heater and ventilation system all in one? :-) Actually, I do have a computer museum that is open for groups in Tartu, Estonia, at University of Tartu, Institute of Computer Science. But this museum displays older stuff than P3. In the queue for the museum, I have lots of servers and desktops and laptops that look too similar for presentation but are interesting for testing kernels. This set includes 100+ machines that are occasionally powered on and most test 1-2 RCs and the release kernels - I cannot afford to run them 24x7. Currently, there are 30+ sparc64 machines, 30 x86 towers (mostly desktop, mostly 32-bit), 7 laptops, 25 x86 rack servers, 6 ia64, 2 powerpc, 4 alpha and 5 parisc machines. At any moment, at least some of them are out of order but the majority are alive. -- Meelis Roos (mr...@linux.ee)
Re: 4.15: WARNING: CPU: 3 PID: 258 at kernel/irq/chip.c:244 __irq_startup+0x80/0x100
> > Your supply of vintage hardware is amazing. :-)
> Does the patch below fix the issue for you?
CC kernel/irq/autoprobe.o
kernel/irq/autoprobe.c: In function ‘probe_irq_on’:
kernel/irq/autoprobe.c:74:8: error: void value not ignored as it ought to be
if (irq_activate_and_startup(desc, IRQ_NORESEND))
^~~~
Just irq_activate_and_startup(desc, IRQ_NORESEND); cures the warning and at least the first bootup was working otherwise too.
-- Meelis Roos (mr...@linux.ee)
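For readers unfamiliar with the compiler message above: GCC reports "void value not ignored as it ought to be" when the result of a function returning void is used as a value, for example as an if condition. The sketch below shows the same situation in plain C. The function names are hypothetical stand-ins chosen for illustration; they are not the kernel's irq_activate_and_startup() or probe_irq_on().

```c
#include <assert.h>

/* Hypothetical stand-in for a helper that returns void. */
static int last_started = -1;

static void activate_and_startup(int irq)
{
    last_started = irq;
}

/* Hypothetical caller. */
static int probe_one(int irq)
{
    assert(irq >= 0);

    /*
     * if (activate_and_startup(irq))   <- rejected by the compiler:
     *     ...                             "void value not ignored
     *                                      as it ought to be"
     */
    activate_and_startup(irq);          /* a plain statement call compiles */

    return last_started;
}
```

Dropping the if and calling the function as a statement, as described in the message, is the smallest change that compiles. Whether the return value should exist at all is a separate design question for the patch author.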
4.15: WARNING: CPU: 3 PID: 258 at kernel/irq/chip.c:244 __irq_startup+0x80/0x100
Upgraded some of my older machines to v4.15 today. On a quad P3 HP NetServer, I get a bootup warning at kernel/irq/chip.c:244 __irq_startup+0x80/0x100 (full dmesg below). It seems it was there before but I did not notice it. Reading older kernel logs, I found that up to 4.15.0-rc4-00041-gace52288edf0 it did not have the warning. 4.15.0-rc6 did not have the warning but had an oops with AACRAID (NULL dereference when the battery died). 4.15.0-rc6-dirty has the warning; dirty means my aacraid init order patch (submitted to linux-scsi, it initializes function pointers before using them in the error handler, and does not seem related to IRQs?). I also found it for the next 2 boots, 4.15.0-rc9-00023-g1f07476ec143-dirty and 4.15.0-dirty. Sometimes on CPU 0, sometimes on CPU 3. Config is also below.
[0.00] Linux version 4.15.0-dirty (mroos@ninasarvik) (gcc version 7.2.0 (Debian 7.2.0-20)) #89 SMP Mon Jan 29 13:18:49 EET 2018 [0.00] x86/fpu: x87 FPU will use FXSAVE [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009dbff] usable [0.00] BIOS-e820: [mem 0x0009dc00-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e5800-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xbffe] usable [0.00] BIOS-e820: [mem 0xbfff-0xbbff] ACPI data [0.00] BIOS-e820: [mem 0xbc00-0xbfff] ACPI NVS [0.00] BIOS-e820: [mem 0xfec0-0xfecf] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfeef] reserved [0.00] BIOS-e820: [mem 0xfff8-0x] reserved [0.00] Notice: NX (Execute Disable) protection missing in CPU! [0.00] random: fast init done [0.00] SMBIOS 2.3 present. 
[0.00] DMI: Hewlett Packard HP NetServer/HP System Board, BIOS 4.06.46 PW 06/25/2003 [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] e820: last_pfn = 0xbfff0 max_arch_pfn = 0x10 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-B uncachable [0.00] C-C7FFF write-protect [0.00] C8000-E uncachable [0.00] F-F write-protect [0.00] MTRR variable ranges enabled: [0.00] 0 base 0 mask F8000 write-back [0.00] 1 base 08000 mask FC000 write-back [0.00] 2 disabled [0.00] 3 disabled [0.00] 4 disabled [0.00] 5 disabled [0.00] 6 disabled [0.00] 7 disabled [0.00] x86/PAT: PAT not supported by CPU. [0.00] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC [0.00] found SMP MP-table at [mem 0x000f7610-0x000f761f] mapped at [(ptrval)] [0.00] initial memory mapped: [mem 0x-0x01ff] [0.00] Base memory trampoline at [(ptrval)] 99000 size 16384 [0.00] BRK [0x01d97000, 0x01d97fff] PGTABLE [0.00] ACPI: Early table checksum verification disabled [0.00] ACPI: RSDP 0x000F75A0 14 (v00 PTLTD ) [0.00] ACPI: RSDT 0xBFFFC11A 30 (v01 PTLTD HWPC20C 0001 LTP ) [0.00] ACPI: FACP 0xBAC1 74 (v01 HP LH 6000 0001 PTL 0001) [0.00] ACPI BIOS Warning (bug): Invalid length for FADT/Pm1aControlBlock: 32, using default 16 (20170831/tbfadt-708) [0.00] ACPI BIOS Warning (bug): Invalid length for FADT/Pm1bControlBlock: 32, using default 16 (20170831/tbfadt-708) [0.00] ACPI: DSDT 0xBFFFC14A 003977 (v01 HP LT 6000 0001 MSFT 010B) [0.00] ACPI: FACS 0xBFC0 40 [0.00] ACPI: APIC 0xBB35 A4 (v01 PTLTDAPIC 0001 LTP ) [0.00] ACPI: BOOT 0xBBD9 27 (v01 PTLTD $SBFTBL$ 0001 LTP 0001) [0.00] ACPI: Local APIC address 0xfee0 [0.00] 2183MB HIGHMEM available. [0.00] 887MB LOWMEM available. 
[0.00] mapped low ram: 0 - 377fe000 [0.00] low ram: 0 - 377fe000 [0.00] tsc: Fast TSC calibration using PIT [0.00] BRK [0x01d98000, 0x01d98fff] PGTABLE [0.00] Zone ranges: [0.00] DMA [mem 0x1000-0x00ff] [0.00] Normal [mem 0x0100-0x377fdfff] [0.00] HighMem [mem 0x377fe000-0xbffe] [0.00] Movable zone start for each node [0.00] Early memory node ranges [0.00] node 0: [mem 0x1000-0x0009cfff] [0.00] node 0: [mem 0x0010-0xbffe] [0.00] Initmem setup node 0 [mem 0x1000-0xbff
4.15-rc9 new insecure W+X mapping warning
.20 [ 16.301660] mptctl: Registered with Fusion MPT base driver [ 16.301764] mptctl: /dev/mptctl @ (major,minor=10,220) [ 17.020409] e1000: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX [ 17.020573] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 17.535582] audit: type=1400 audit(1516811432.091:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=1013 comm="apparmor_parser" [ 17.536021] audit: type=1400 audit(1516811432.091:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man//filter" pid=1013 comm="apparmor_parser" [ 17.536432] audit: type=1400 audit(1516811432.095:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man//groff" pid=1013 comm="apparmor_parser" [ 17.610155] audit: type=1400 audit(1516811432.167:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/ntpd" pid=1014 comm="apparmor_parser" -- Meelis Roos (mr...@linux.ee)
Re: powersaving-related hangs on T460s
> > > I tried 4.15-git on my Thinkpad T460s laptop. It is working but I have
> > > problems waking up the computer after it has been idle.
> > >
> > I seem to have found a better reproducer - when running on battery, it
> > will hang after some minutes, with screen on. It just hangs.
>
> And as of last Friday's git, it seems to have been fixed, so I did not
> try to bisect it.
And as of yesterday's git, the problem is back again :( Will see if I can bisect it this time on battery power.
-- Meelis Roos (mr...@linux.ee)
Re: lapic-related boot crash in 4.15-rc1
> I am compiling the x86/urgent pull that you suggested. And it works. -- Meelis Roos (mr...@linux.ee)
Re: lapic-related boot crash in 4.15-rc1
> I've reverted the commit which Dou pointed out in rc8. Can you please confirm that
> this fixes the issue for you?
I am compiling the x86/urgent pull that you suggested. Meanwhile the bisect finished and it came to the exact same commit by Dou Liyang that he sent me for the revert test. Reverting this patch worked on 2 of the machines, the 3rd one is compiling.
-- Meelis Roos (mr...@linux.ee)
Re: powersaving-related hangs on T460s
> > I tried 4.15-git on my Thinkpad T460s laptop. It is working but I have
> > problems waking up the computer after it has been idle.
> >
> I seem to have found a better reproducer - when running on battery, it
> will hang after some minutes, with screen on. It just hangs.
And as of last Friday's git, it seems to have been fixed, so I did not try to bisect it.
-- Meelis Roos (mr...@linux.ee)
Re: lapic-related boot crash in 4.15-rc1
> I've reverted the commit which Dou pointed out in rc8. Can you please confirm that
> this fixes the issue for you?
Tried rc8 on the P3, it still hangs.
-- Meelis Roos (mr...@linux.ee)
Re: lapic-related boot crash in 4.15-rc1
On Wed, 10 Jan 2018, Thomas Gleixner wrote:
> On Wed, 10 Jan 2018, Meelis Roos wrote:
>
> > > > On 3 of my test computers, boot hangs with 4.15 git kernels. So far I
> > > > have traced it down to 4.14.0 being good and 4.15-rc1 being bad (bisect
> > > > is slow because the computers are somewhat remote). Also because of
> > > > trying to find when it started, I have not tried newer than rc5
> > > > kernels.
> > >
> > > Please do so. We have fixes post rc5 in that area.
> >
> > P4 was the quickest to rebuild the kernel and it is still hanging like
> > before with today's 4.15-rc7-00102-gcf1fb158230e.
So far I have bisected it to 4f45ed9f848f good, ae41a2a40ed4 bad. Will continue tomorrow.
1be2172e96e3 bad
2cd83ba5bede bad
449fcf3ab0ba bad
43ff2f4db9d0 good
313144c1bcd6 good
b18d62891aaf bad
b24591e2fcf8 good
0696d059f23c bad
023a611748fd bad
ae41a2a40ed4 bad
4f45ed9f848f good
-- Meelis Roos (mr...@linux.ee)
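The good/bad list above is the trail left by a bisection: each test halves the range between the last known good and the first known bad kernel. As a rough illustration of why roughly ten builds are enough to search on the order of a thousand commits, here is a small sketch of the halving search over a linear history. It is a simplification: real git history is a graph rather than a line, and the names bisect_first_bad and example_is_bad and the index 700 are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Predicate type: returns nonzero if the build at this index is bad. */
typedef int (*is_bad_fn)(size_t index);

/* Hypothetical test outcome: everything from index 700 on hangs. */
static int example_is_bad(size_t index)
{
    return index >= 700;
}

/* Halving search over a linear history.
 * Precondition: 'good' is known good, 'bad' is known bad, good < bad,
 * and once a commit is bad all later ones are bad too.
 * Returns the index of the first bad commit and stores the number of
 * builds that had to be tested in *steps. */
static size_t bisect_first_bad(size_t good, size_t bad,
                               is_bad_fn is_bad, unsigned *steps)
{
    assert(good < bad);
    *steps = 0;

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid))
            bad = mid;      /* first bad commit is at or before mid */
        else
            good = mid;     /* first bad commit is after mid */
        (*steps)++;
    }
    return bad;
}
```

With a range of 1000 commits this needs at most 10 test builds, which is why the slow part on remote machines is each rebuild and reboot, not the number of candidates.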
Re: powersaving-related hangs on T460s
> I tried 4.15-git on my Thinkpad T460s laptop. It is working but I have > problems waking up the computer after it has been idle. > > There should be no suspend (to keep network connections alive) when the > laptop is on AC power, even when the lid is closed. In dmesg, I have > seen no indication of suspend happening. It is configured to just lock > the screen when the lid is closed. > > Normally, I have to press a key after opening the lid to unblank the > screen and get to password prompt. Usually this works but sometimes > there is no response - power LED is on that is all, holding down power > button is the only way out. > > Sometimes it happens overnight, sometimes it is alive in the morning. It > almost never happens with short 5-15 minutes breaks but it can happen > for about hour long breaks. There is no reliable way to reproduce the > problem. I seem to have found a better reproducer - when running on battery, it will hang after some minutes, with screen on. It just hangs. -- Meelis Roos (mr...@linux.ee)
Re: lapic-related boot crash in 4.15-rc1
> > P4 was the quickest to rebuild the kernel and it is still hanging like
> > before with today's 4.15-rc7-00102-gcf1fb158230e.
>
> I try to find a time slot for this ...
And I will try to bisect.
-- Meelis Roos (mr...@linux.ee)
Re: lapic-related boot crash in 4.15-rc1
> > On 3 of my test computers, boot hangs with 4.15 git kernels. So far I
> > have traced it down to 4.14.0 being good and 4.15-rc1 being bad (bisect
> > is slow because the computers are somewhat remote). Also because of
> > trying to find when it started, I have not tried newer than rc5
> > kernels.
>
> Please do so. We have fixes post rc5 in that area.
P4 was the quickest to rebuild the kernel and it is still hanging like before with today's 4.15-rc7-00102-gcf1fb158230e.
-- Meelis Roos (mr...@linux.ee)
powersaving-related hangs on T460s
I tried 4.15-git on my Thinkpad T460s laptop. It is working but I have problems waking up the computer after it has been idle. There should be no suspend (to keep network connections alive) when the laptop is on AC power, even when the lid is closed. In dmesg, I have seen no indication of suspend happening. It is configured to just lock the screen when the lid is closed. Normally, I have to press a key after opening the lid to unblank the screen and get to the password prompt. Usually this works but sometimes there is no response - the power LED is on and that is all, and holding down the power button is the only way out. Sometimes it happens overnight, sometimes it is alive in the morning. It almost never happens with short breaks of 5-15 minutes but it can happen with breaks of about an hour. There is no reliable way to reproduce the problem. 4.14 with the same config (modulo any new config options) was working fine. Nothing in the log files afterwards. Network connection is WiFi. There is a USB mouse connected. What do I check next? -- Meelis Roos (mr...@linux.ee)