Re: Xen Domu kernel crash at start of boot
2018-06-22 2:54 GMT+02:00 Chuck Zmudzinski:
> I am getting a kernel crash almost immediately after booting the current
> kernel. I am running NetBSD/xen amd64 on a Debian Linux 8.10 DOM0 which uses
> Xen-4.4. Last week's kernel was good. I built a kernel from a cvs update a
> couple of days ago and tried it. It crashed. I tried the most recent daily
> snapshot available from NetBSD daily builds. It crashed too. Here is the
> information from the console about the daily snapshot kernel that crashed
> (it was built earlier today):
>
> [ 1.000] vcpu0: Intel(R) Core(TM) i5-4590S CPU @ 3.00GHz, id 0x306c3
> [ 1.000] vcpu0: package 0, core 3, smt 0
> [ 1.000] vcpu1 at hypervisor0
> [ 1.000] vcpu1: Intel(R) Core(TM) i5-4590S CPU @ 3.00GHz, id 0x306c3
> [ 1.000] vcpu1: package 0, core 3, smt 0
> [ 1.000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
> [ 1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
> [ 1.030] fatal protection fault in supervisor mode
> [ 1.030] trap type 4 code 0 rip 0x80205968 cs 0x1e030
> rflags 0x10046 cr2 0 ilevel 0 rsp 0xa000a570fbf0
> [ 1.030] curlwp 0xa642d4a0 pid 0.15 lowest kstack
> 0xa000a570b2c0
> kernel: protection fault trap, code=0
> Stopped in pid 0.15 (system) at 80205968: fxsavel

That would be my fault.

Can you send me the moral equivalent of "cpuctl identify 0" from the DOM0? I
want to know what CPUID is saying about supported features on the CPU.

Can you also check whether you use the no-xsave flag for your DOM0 by chance?
It should not be needed on Intel CPUs.

Meanwhile, this change in sys/arch/xen/x86/cpu.c can be made to avoid the
crash:

@@ -551,7 +551,7 @@ cpu_init(struct cpu_info *ci)
 	 * does, here we only set CR4_OSXSAVE if the feature is already
 	 * enabled according to CPUID.
 	 */
-	if (cpu_feature[1] & CPUID2_OSXSAVE)
+	if (0 && cpu_feature[1] & CPUID2_OSXSAVE)
 		cr4 |= CR4_OSXSAVE;
 	else {
 		x86_xsave_features = 0;

The change was tested on Xen 4.2 and Xen 4.8; I wonder if Xen 4.4 has yet
another quirk.

Any chance you could try your DOM0 updated to a newer Xen?

Jaromir
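For readers following the patch above, here is a minimal sketch of the check it touches. It is not the NetBSD code: the EX_/ex_ names are made up for illustration, and only the bit positions (CPUID leaf 1 ECX bit 26 = XSAVE, bit 27 = OSXSAVE, CR4 bit 18 = OSXSAVE) come from the Intel architecture manuals. The ECX value is passed in as a plain integer; on GCC or Clang for x86 it could be obtained with __get_cpuid(1, ...) from <cpuid.h>, though what a PV guest sees there may be filtered by the hypervisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits. Names are local to this sketch. */
#define EX_CPUID1_ECX_XSAVE   (UINT32_C(1) << 26) /* CPU implements XSAVE */
#define EX_CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27) /* CR4.OSXSAVE already set */

/* CR4 control bit that enables XSAVE/XRSTOR and XGETBV/XSETBV. */
#define EX_CR4_OSXSAVE        (UINT64_C(1) << 18)

/* True if the CPU advertises the XSAVE instruction family at all. */
bool ex_cpu_has_xsave(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & EX_CPUID1_ECX_XSAVE) != 0;
}

/* True if CPUID reports that XSAVE has already been enabled in CR4. */
bool ex_xsave_already_enabled(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & EX_CPUID1_ECX_OSXSAVE) != 0;
}

/*
 * Hypothetical helper modelling the guarded branch in the patch: add
 * CR4.OSXSAVE to the guest's CR4 image only when CPUID says the feature
 * is already enabled, and leave cr4 unchanged otherwise.
 */
uint64_t ex_cr4_for_guest(uint64_t cr4, uint32_t cpuid1_ecx)
{
    if (ex_xsave_already_enabled(cpuid1_ecx))
        cr4 |= EX_CR4_OSXSAVE;
    return cr4;
}
```

Feeding the ECX value reported in the DOM0 into ex_xsave_already_enabled() shows which branch the unpatched kernel would take; the workaround in the patch simply forces the "not enabled" branch regardless of that value.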
Re: Xen Domu kernel crash at start of boot
> Stopped in pid 0.15 (system) at 80205968: fxsavel

I seem to recall some recent changes related to save/restore of fpu state.

On Thu, 21 Jun 2018, Chuck Zmudzinski wrote:

> I am getting a kernel crash almost immediately after booting the current
> kernel. I am running NetBSD/xen amd64 on a Debian Linux 8.10 DOM0 which uses
> Xen-4.4. Last week's kernel was good. I built a kernel from a cvs update a
> couple of days ago and tried it. It crashed. I tried the most recent daily
> snapshot available from NetBSD daily builds. It crashed too. Here is the
> information from the console about the daily snapshot kernel that crashed
> (it was built earlier today):
>
> [ 1.000] NetBSD 8.99.19 (XEN3_DOMU) #0: Thu Jun 21 11:48:05 UTC 2018
> [ 1.000] mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/xen/compile/XEN3_DOMU
>
> Here is the information I got from the console about the crash:
>
> [ 1.000] total memory = 3072 MB
> [ 1.000] avail memory = 2963 MB
> [ 1.000] cpu_rng: RDRAND
> [ 1.000] running cgd selftest aes-xts-256 aes-xts-512 done
> [ 1.000] mainbus0 (root)
> [ 1.000] hypervisor0 at mainbus0: Xen version 4.4.1
> [ 1.000] vcpu0 at hypervisor0
> [ 1.000] vcpu0: Intel(R) Core(TM) i5-4590S CPU @ 3.00GHz, id 0x306c3
> [ 1.000] vcpu0: package 0, core 3, smt 0
> [ 1.000] vcpu1 at hypervisor0
> [ 1.000] vcpu1: Intel(R) Core(TM) i5-4590S CPU @ 3.00GHz, id 0x306c3
> [ 1.000] vcpu1: package 0, core 3, smt 0
> [ 1.000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
> [ 1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
> [ 1.030] fatal protection fault in supervisor mode
> [ 1.030] trap type 4 code 0 rip 0x80205968 cs 0x1e030 rflags 0x10046 cr2 0 ilevel 0 rsp 0xa000a570fbf0
> [ 1.030] curlwp 0xa642d4a0 pid 0.15 lowest kstack 0xa000a570b2c0
> kernel: protection fault trap, code=0
> Stopped in pid 0.15 (system) at 80205968: fxsavel
> ds      0
> es      0
> fs      fd00
> gs      0
> rdi     a000a570fbf8
> rsi     1
> rbp     a000a570fe68
> rbx     0
> rdx     2
> rcx     0
> rax     0
> r8      a668
> r9      cccd
> r10     64
> r11     0
> r12     a000a570fbf8
> r13     1
> r14     0
> r15     0
> rip     80205968
> cs      e030
> rflags  10046
> rsp     a000a570fbf0
> ss      e02b
> 80205968: fxsavel
> db{1}> bt
> ?() at 80205968
> ?() at 802313f0
> ?() at 8023155e
>
> I could not get any useful info from addr2line.
>
> Any ideas why it crashes?
>
> Thanks,
>
> Chuck Zmudzinski

+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+
Re: Xen Domu kernel crash at start of boot
The next few lines of the working kernel from last week:

Jun 21 15:38:33 ave /netbsd: [ 1.000] xencons0: console major 143, unit 0
Jun 21 15:38:33 ave /netbsd: [ 1.000] xencons0: using event channel 2
Jun 21 15:38:33 ave /netbsd: [ 1.000] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
Jun 21 15:38:33 ave /netbsd: [ 1.030] timecounter: Timecounter "xen_system_time" frequency 10 Hz quality 10

It looks like it panics when configuring the xen console. I don't have time to
investigate this now. Maybe next week if it's not fixed by then...

On 06/21/2018 09:06 PM, Greg Troxel wrote:
> you might see what the first line not printed is, in other words, what
> did the working kernel print next?
>
> And the usual bisecting